From patchwork Wed Dec 6 19:43:28 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120885
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v3 02/20] crypto: arm64/aes-ce-ccm - move kernel mode neon
 en/disable into loop
Date: Wed, 6 Dec 2017 19:43:28 +0000
Message-Id: <20171206194346.24393-3-ard.biesheuvel@linaro.org>
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action is
only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 47 ++++++++++----------
 1 file changed, 23 insertions(+), 24 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index a1254036f2b1..68b11aa690e4 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -107,11 +107,13 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
 }
 
 static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
-			   u32 abytes, u32 *macp, bool use_neon)
+			   u32 abytes, u32 *macp)
 {
-	if (likely(use_neon)) {
+	if (may_use_simd()) {
+		kernel_neon_begin();
 		ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
 				     num_rounds(key));
+		kernel_neon_end();
 	} else {
 		if (*macp > 0 && *macp < AES_BLOCK_SIZE) {
 			int added = min(abytes, AES_BLOCK_SIZE - *macp);
@@ -143,8 +145,7 @@ static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
 	}
 }
 
-static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
-				   bool use_neon)
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
@@ -163,7 +164,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
 		ltag.len = 6;
 	}
 
-	ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp, use_neon);
+	ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp);
 	scatterwalk_start(&walk, req->src);
 
 	do {
@@ -175,7 +176,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
 			n = scatterwalk_clamp(&walk, len);
 		}
 		p = scatterwalk_map(&walk);
-		ccm_update_mac(ctx, mac, p, n, &macp, use_neon);
+		ccm_update_mac(ctx, mac, p, n, &macp);
 		len -= n;
 
 		scatterwalk_unmap(p);
@@ -242,43 +243,42 @@ static int ccm_encrypt(struct aead_request *req)
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
 	u8 buf[AES_BLOCK_SIZE];
 	u32 len = req->cryptlen;
-	bool use_neon = may_use_simd();
 	int err;
 
 	err = ccm_init_mac(req, mac, len);
 	if (err)
 		return err;
 
-	if (likely(use_neon))
-		kernel_neon_begin();
-
 	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac, use_neon);
+		ccm_calculate_auth_mac(req, mac);
 
 	/* preserve the original iv for the final round */
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_encrypt(&walk, req, true);
 
-	if (likely(use_neon)) {
+	if (may_use_simd()) {
 		while (walk.nbytes) {
 			u32 tail = walk.nbytes % AES_BLOCK_SIZE;
 
 			if (walk.nbytes == walk.total)
 				tail = 0;
 
+			kernel_neon_begin();
 			ce_aes_ccm_encrypt(walk.dst.virt.addr,
 					   walk.src.virt.addr,
 					   walk.nbytes - tail, ctx->key_enc,
 					   num_rounds(ctx), mac, walk.iv);
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk, tail);
 		}
-		if (!err)
+		if (!err) {
+			kernel_neon_begin();
 			ce_aes_ccm_final(mac, buf, ctx->key_enc,
 					 num_rounds(ctx));
-
-		kernel_neon_end();
+			kernel_neon_end();
+		}
 	} else {
 		err = ccm_crypt_fallback(&walk, mac, buf, ctx, true);
 	}
@@ -301,43 +301,42 @@ static int ccm_decrypt(struct aead_request *req)
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
 	u8 buf[AES_BLOCK_SIZE];
 	u32 len = req->cryptlen - authsize;
-	bool use_neon = may_use_simd();
 	int err;
 
 	err = ccm_init_mac(req, mac, len);
 	if (err)
 		return err;
 
-	if (likely(use_neon))
-		kernel_neon_begin();
-
 	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac, use_neon);
+		ccm_calculate_auth_mac(req, mac);
 
 	/* preserve the original iv for the final round */
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_decrypt(&walk, req, true);
 
-	if (likely(use_neon)) {
+	if (may_use_simd()) {
 		while (walk.nbytes) {
 			u32 tail = walk.nbytes % AES_BLOCK_SIZE;
 
 			if (walk.nbytes == walk.total)
 				tail = 0;
 
+			kernel_neon_begin();
 			ce_aes_ccm_decrypt(walk.dst.virt.addr,
 					   walk.src.virt.addr,
 					   walk.nbytes - tail, ctx->key_enc,
 					   num_rounds(ctx), mac, walk.iv);
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk, tail);
 		}
-		if (!err)
+		if (!err) {
+			kernel_neon_begin();
 			ce_aes_ccm_final(mac, buf, ctx->key_enc,
 					 num_rounds(ctx));
-
-		kernel_neon_end();
+			kernel_neon_end();
+		}
 	} else {
 		err = ccm_crypt_fallback(&walk, mac, buf, ctx, false);
 	}
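[The same refactoring pattern recurs throughout this series, so a compact
illustration may help. The sketch below is not kernel code: begin()/end()
merely model kernel_neon_begin()/kernel_neon_end(), and xor_chunk() stands
in for a NEON cipher routine; all names are illustrative.]

    #include <stddef.h>

    static void begin(void) { /* models kernel_neon_begin(): preemption off */ }
    static void end(void)   { /* models kernel_neon_end(): preemption on again */ }

    static void xor_chunk(unsigned char *buf, size_t len)
    {
            for (size_t i = 0; i < len; i++)
                    buf[i] ^= 0x5a;         /* placeholder for real SIMD work */
    }

    /* Before: one begin()/end() pair brackets the whole request, so
     * preemption stays off for however long the request takes. */
    static void crypt_old(unsigned char *buf, size_t len, size_t chunk)
    {
            begin();
            while (len > 0) {
                    size_t n = len < chunk ? len : chunk;
                    xor_chunk(buf, n);
                    buf += n;
                    len -= n;
            }
            end();
    }

    /* After: each chunk gets its own pair, so the scheduler may run
     * between chunks; only the SIMD work itself is non-preemptible. */
    static void crypt_new(unsigned char *buf, size_t len, size_t chunk)
    {
            while (len > 0) {
                    size_t n = len < chunk ? len : chunk;
                    begin();
                    xor_chunk(buf, n);
                    end();
                    buf += n;
                    len -= n;
            }
    }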
From patchwork Wed Dec 6 19:43:29 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120886
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v3 03/20] crypto: arm64/aes-blk - move kernel mode neon
 en/disable into loop
Date: Wed, 6 Dec 2017 19:43:29 +0000
Message-Id: <20171206194346.24393-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action is
only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Note that this requires some reshuffling of the registers in the asm
code, because the XTS routines can no longer rely on the registers to
retain their contents between invocations.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-glue.c        | 95 ++++++++++----------
 arch/arm64/crypto/aes-modes.S       | 90 +++++++++----------
 arch/arm64/crypto/aes-neonbs-glue.c | 14 ++-
 3 files changed, 97 insertions(+), 102 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 998ba519a026..00a3e2fd6a48 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2");
 
 /* defined in aes-modes.S */
 asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
-				int rounds, int blocks, int first);
+				int rounds, int blocks);
 asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
-				int rounds, int blocks, int first);
+				int rounds, int blocks);
 
 asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
-				int rounds, int blocks, u8 iv[], int first);
+				int rounds, int blocks, u8 iv[]);
 asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
-				int rounds, int blocks, u8 iv[], int first);
+				int rounds, int blocks, u8 iv[]);
 
 asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
-				int rounds, int blocks, u8 ctr[], int first);
+				int rounds, int blocks, u8 ctr[]);
 
 asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[],
 				int rounds, int blocks, u8 const rk2[], u8 iv[],
@@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err, first, rounds = 6 + ctx->key_length / 4;
+	int err, rounds = 6 + ctx->key_length / 4;
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
-	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+		kernel_neon_begin();
 		aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				(u8 *)ctx->key_enc, rounds, blocks, first);
+				(u8 *)ctx->key_enc, rounds, blocks);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 	return err;
 }
 
@@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err, first, rounds = 6 + ctx->key_length / 4;
+	int err, rounds = 6 + ctx->key_length / 4;
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
-	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+		kernel_neon_begin();
 		aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				(u8 *)ctx->key_dec, rounds, blocks, first);
+				(u8 *)ctx->key_dec, rounds, blocks);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 	return err;
 }
 
@@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err, first, rounds = 6 + ctx->key_length / 4;
+	int err, rounds = 6 + ctx->key_length / 4;
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
-	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+		kernel_neon_begin();
 		aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				(u8 *)ctx->key_enc, rounds, blocks, walk.iv,
-				first);
+				(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 	return err;
 }
 
@@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err, first, rounds = 6 + ctx->key_length / 4;
+	int err, rounds = 6 + ctx->key_length / 4;
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
-	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+		kernel_neon_begin();
 		aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				(u8 *)ctx->key_dec, rounds, blocks, walk.iv,
-				first);
+				(u8 *)ctx->key_dec, rounds, blocks, walk.iv);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 	return err;
 }
 
@@ -215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err, first, rounds = 6 + ctx->key_length / 4;
+	int err, rounds = 6 + ctx->key_length / 4;
 	struct skcipher_walk walk;
 	int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	first = 1;
-	kernel_neon_begin();
 	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+		kernel_neon_begin();
 		aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				(u8 *)ctx->key_enc, rounds, blocks, walk.iv,
-				first);
+				(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
-		first = 0;
+		kernel_neon_end();
 	}
 	if (walk.nbytes) {
 		u8 __aligned(8) tail[AES_BLOCK_SIZE];
@@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req)
 		 */
 		blocks = -1;
 
+		kernel_neon_begin();
 		aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds,
-				blocks, walk.iv, first);
+				blocks, walk.iv);
+		kernel_neon_end();
 		crypto_xor_cpy(tdst, tsrc, tail, nbytes);
 		err = skcipher_walk_done(&walk, 0);
 	}
-	kernel_neon_end();
 
 	return err;
 }
@@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req)
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+		kernel_neon_begin();
 		aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				(u8 *)ctx->key1.key_enc, rounds, blocks,
 				(u8 *)ctx->key2.key_enc, walk.iv, first);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 
 	return err;
 }
@@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req)
 	struct skcipher_walk walk;
 	unsigned int blocks;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+		kernel_neon_begin();
 		aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				(u8 *)ctx->key1.key_dec, rounds, blocks,
 				(u8 *)ctx->key2.key_enc, walk.iv, first);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 
 	return err;
 }
@@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key,
 
 	/* encrypt the zero vector */
 	kernel_neon_begin();
-	aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1);
+	aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1);
 	kernel_neon_end();
 
 	cmac_gf128_mul_by_x(consts, consts);
@@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key,
 		return err;
 
 	kernel_neon_begin();
-	aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1);
-	aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0);
+	aes_ecb_encrypt(key, ks[0], rk, rounds, 1);
+	aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2);
 	kernel_neon_end();
 
 	return cbcmac_setkey(tfm, key, sizeof(key));
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 2674d43d1384..65b273667b34 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -40,24 +40,24 @@
 #if INTERLEAVE == 2
 
 aes_encrypt_block2x:
-	encrypt_block2x	v0, v1, w3, x2, x6, w7
+	encrypt_block2x	v0, v1, w3, x2, x8, w7
 	ret
 ENDPROC(aes_encrypt_block2x)
 
 aes_decrypt_block2x:
-	decrypt_block2x	v0, v1, w3, x2, x6, w7
+	decrypt_block2x	v0, v1, w3, x2, x8, w7
 	ret
 ENDPROC(aes_decrypt_block2x)
 
 #elif INTERLEAVE == 4
 
 aes_encrypt_block4x:
-	encrypt_block4x	v0, v1, v2, v3, w3, x2, x6, w7
+	encrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
 	ret
 ENDPROC(aes_encrypt_block4x)
 
 aes_decrypt_block4x:
-	decrypt_block4x	v0, v1, v2, v3, w3, x2, x6, w7
+	decrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
 	ret
 ENDPROC(aes_decrypt_block4x)
 
@@ -86,33 +86,32 @@ ENDPROC(aes_decrypt_block4x)
 #define FRAME_POP
 
 	.macro		do_encrypt_block2x
-	encrypt_block2x	v0, v1, w3, x2, x6, w7
+	encrypt_block2x	v0, v1, w3, x2, x8, w7
 	.endm
 
 	.macro		do_decrypt_block2x
-	decrypt_block2x	v0, v1, w3, x2, x6, w7
+	decrypt_block2x	v0, v1, w3, x2, x8, w7
 	.endm
 
 	.macro		do_encrypt_block4x
-	encrypt_block4x	v0, v1, v2, v3, w3, x2, x6, w7
+	encrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
 	.endm
 
 	.macro		do_decrypt_block4x
-	decrypt_block4x	v0, v1, v2, v3, w3, x2, x6, w7
+	decrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
 	.endm
 
 #endif
 
 	/*
 	 * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
-	 *		   int blocks, int first)
+	 *		   int blocks)
 	 * aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
-	 *		   int blocks, int first)
+	 *		   int blocks)
 	 */
 
 AES_ENTRY(aes_ecb_encrypt)
 	FRAME_PUSH
-	cbz		w5, .LecbencloopNx
 
 	enc_prepare	w3, x2, x5
 
@@ -148,7 +147,6 @@ AES_ENDPROC(aes_ecb_encrypt)
 
 AES_ENTRY(aes_ecb_decrypt)
 	FRAME_PUSH
-	cbz		w5, .LecbdecloopNx
 
 	dec_prepare	w3, x2, x5
 
@@ -184,14 +182,12 @@ AES_ENDPROC(aes_ecb_decrypt)
 
 	/*
 	 * aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
-	 *		   int blocks, u8 iv[], int first)
+	 *		   int blocks, u8 iv[])
 	 * aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
-	 *		   int blocks, u8 iv[], int first)
+	 *		   int blocks, u8 iv[])
 	 */
 
 AES_ENTRY(aes_cbc_encrypt)
-	cbz		w6, .Lcbcencloop
-
 	ld1		{v0.16b}, [x5]			/* get iv */
 	enc_prepare	w3, x2, x6
 
@@ -209,7 +205,6 @@ AES_ENDPROC(aes_cbc_encrypt)
 
 AES_ENTRY(aes_cbc_decrypt)
 	FRAME_PUSH
-	cbz		w6, .LcbcdecloopNx
 
 	ld1		{v7.16b}, [x5]			/* get iv */
 	dec_prepare	w3, x2, x6
 
@@ -264,20 +259,19 @@ AES_ENDPROC(aes_cbc_decrypt)
 
 	/*
 	 * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
-	 *		   int blocks, u8 ctr[], int first)
+	 *		   int blocks, u8 ctr[])
 	 */
 
 AES_ENTRY(aes_ctr_encrypt)
 	FRAME_PUSH
-	cbz		w6, .Lctrnotfirst	/* 1st time around? */
+
 	enc_prepare	w3, x2, x6
 	ld1		{v4.16b}, [x5]
 
-.Lctrnotfirst:
-	umov		x8, v4.d[1]		/* keep swabbed ctr in reg */
-	rev		x8, x8
+	umov		x6, v4.d[1]		/* keep swabbed ctr in reg */
+	rev		x6, x6
 #if INTERLEAVE >= 2
-	cmn		w8, w4			/* 32 bit overflow? */
+	cmn		w6, w4			/* 32 bit overflow? */
 	bcs		.Lctrloop
 .LctrloopNx:
 	subs		w4, w4, #INTERLEAVE
@@ -285,11 +279,11 @@ AES_ENTRY(aes_ctr_encrypt)
 #if INTERLEAVE == 2
 	mov		v0.8b, v4.8b
 	mov		v1.8b, v4.8b
-	rev		x7, x8
-	add		x8, x8, #1
+	rev		x7, x6
+	add		x6, x6, #1
 	ins		v0.d[1], x7
-	rev		x7, x8
-	add		x8, x8, #1
+	rev		x7, x6
+	add		x6, x6, #1
 	ins		v1.d[1], x7
 	ld1		{v2.16b-v3.16b}, [x1], #32	/* get 2 input blocks */
 	do_encrypt_block2x
@@ -298,7 +292,7 @@ AES_ENTRY(aes_ctr_encrypt)
 	st1		{v0.16b-v1.16b}, [x0], #32
 #else
 	ldr		q8, =0x30000000200000001	/* addends 1,2,3[,0] */
-	dup		v7.4s, w8
+	dup		v7.4s, w6
 	mov		v0.16b, v4.16b
 	add		v7.4s, v7.4s, v8.4s
 	mov		v1.16b, v4.16b
@@ -316,9 +310,9 @@ AES_ENTRY(aes_ctr_encrypt)
 	eor		v2.16b, v7.16b, v2.16b
 	eor		v3.16b, v5.16b, v3.16b
 	st1		{v0.16b-v3.16b}, [x0], #64
-	add		x8, x8, #INTERLEAVE
+	add		x6, x6, #INTERLEAVE
 #endif
-	rev		x7, x8
+	rev		x7, x6
 	ins		v4.d[1], x7
 	cbz		w4, .Lctrout
 	b		.LctrloopNx
@@ -328,10 +322,10 @@ AES_ENTRY(aes_ctr_encrypt)
 #endif
 .Lctrloop:
 	mov		v0.16b, v4.16b
-	encrypt_block	v0, w3, x2, x6, w7
+	encrypt_block	v0, w3, x2, x8, w7
 
-	adds		x8, x8, #1		/* increment BE ctr */
-	rev		x7, x8
+	adds		x6, x6, #1		/* increment BE ctr */
+	rev		x7, x6
 	ins		v4.d[1], x7
 	bcs		.Lctrcarry		/* overflow? */
@@ -385,15 +379,17 @@ CPU_BE(	.quad		0x87, 1		)
 
 AES_ENTRY(aes_xts_encrypt)
 	FRAME_PUSH
-	cbz		w7, .LxtsencloopNx
-
 	ld1		{v4.16b}, [x6]
-	enc_prepare	w3, x5, x6
-	encrypt_block	v4, w3, x5, x6, w7		/* first tweak */
-	enc_switch_key	w3, x2, x6
+	cbz		w7, .Lxtsencnotfirst
+
+	enc_prepare	w3, x5, x8
+	encrypt_block	v4, w3, x5, x8, w7		/* first tweak */
+	enc_switch_key	w3, x2, x8
 	ldr		q7, .Lxts_mul_x
 	b		.LxtsencNx
 
+.Lxtsencnotfirst:
+	enc_prepare	w3, x2, x8
 .LxtsencloopNx:
 	ldr		q7, .Lxts_mul_x
 	next_tweak	v4, v4, v7, v8
@@ -442,7 +438,7 @@ AES_ENTRY(aes_xts_encrypt)
 .Lxtsencloop:
 	ld1		{v1.16b}, [x1], #16
 	eor		v0.16b, v1.16b, v4.16b
-	encrypt_block	v0, w3, x2, x6, w7
+	encrypt_block	v0, w3, x2, x8, w7
 	eor		v0.16b, v0.16b, v4.16b
 	st1		{v0.16b}, [x0], #16
 	subs		w4, w4, #1
@@ -450,6 +446,7 @@ AES_ENTRY(aes_xts_encrypt)
 	next_tweak	v4, v4, v7, v8
 	b		.Lxtsencloop
 .Lxtsencout:
+	st1		{v4.16b}, [x6]
 	FRAME_POP
 	ret
 AES_ENDPROC(aes_xts_encrypt)
@@ -457,15 +454,17 @@ AES_ENDPROC(aes_xts_encrypt)
 
 AES_ENTRY(aes_xts_decrypt)
 	FRAME_PUSH
-	cbz		w7, .LxtsdecloopNx
-
 	ld1		{v4.16b}, [x6]
-	enc_prepare	w3, x5, x6
-	encrypt_block	v4, w3, x5, x6, w7		/* first tweak */
-	dec_prepare	w3, x2, x6
+	cbz		w7, .Lxtsdecnotfirst
+
+	enc_prepare	w3, x5, x8
+	encrypt_block	v4, w3, x5, x8, w7		/* first tweak */
+	dec_prepare	w3, x2, x8
 	ldr		q7, .Lxts_mul_x
 	b		.LxtsdecNx
 
+.Lxtsdecnotfirst:
+	dec_prepare	w3, x2, x8
 .LxtsdecloopNx:
 	ldr		q7, .Lxts_mul_x
 	next_tweak	v4, v4, v7, v8
@@ -514,7 +513,7 @@ AES_ENTRY(aes_xts_decrypt)
 .Lxtsdecloop:
 	ld1		{v1.16b}, [x1], #16
 	eor		v0.16b, v1.16b, v4.16b
-	decrypt_block	v0, w3, x2, x6, w7
+	decrypt_block	v0, w3, x2, x8, w7
 	eor		v0.16b, v0.16b, v4.16b
 	st1		{v0.16b}, [x0], #16
 	subs		w4, w4, #1
@@ -522,6 +521,7 @@ AES_ENTRY(aes_xts_decrypt)
 	next_tweak	v4, v4, v7, v8
 	b		.Lxtsdecloop
 .Lxtsdecout:
+	st1		{v4.16b}, [x6]
 	FRAME_POP
 	ret
 AES_ENDPROC(aes_xts_decrypt)
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index c55d68ccb89f..9d823c77ec84 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
 
 /* borrowed from aes-neon-blk.ko */
 asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
-				     int rounds, int blocks, int first);
+				     int rounds, int blocks);
 asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[],
-				     int rounds, int blocks, u8 iv[],
-				     int first);
+				     int rounds, int blocks, u8 iv[]);
 
 struct aesbs_ctx {
 	u8	rk[13 * (8 * AES_BLOCK_SIZE) + 32];
@@ -157,7 +156,7 @@ static int cbc_encrypt(struct skcipher_request *req)
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct skcipher_walk walk;
-	int err, first = 1;
+	int err;
 
 	err = skcipher_walk_virt(&walk, req, true);
 
@@ -167,10 +166,9 @@ static int cbc_encrypt(struct skcipher_request *req)
 
 		/* fall back to the non-bitsliced NEON implementation */
 		neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				     ctx->enc, ctx->key.rounds, blocks, walk.iv,
-				     first);
+				     ctx->enc, ctx->key.rounds, blocks,
+				     walk.iv);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
-		first = 0;
 	}
 	kernel_neon_end();
 	return err;
@@ -311,7 +309,7 @@ static int __xts_crypt(struct skcipher_request *req,
 	kernel_neon_begin();
 
 	neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey,
-			     ctx->key.rounds, 1, 1);
+			     ctx->key.rounds, 1);
 
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
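[The register reshuffling mentioned in the commit message comes down to one
contract change: state that used to stay resident in a NEON register (v4,
the running XTS tweak) for the whole request must now round-trip through
memory, because nothing in a SIMD register is guaranteed to survive a
kernel_neon_end()/kernel_neon_begin() gap; hence the new st1 {v4.16b}, [x6]
stores at .Lxtsencout/.Lxtsdecout. A hedged C analogue of that contract,
with illustrative names, not kernel code:]

    #include <string.h>

    /* The tweak field plays the role of v4 in aes-modes.S. */
    struct xts_state {
            unsigned char tweak[16];
    };

    /* Placeholder for the GF(2^128) multiply-by-x tweak update. */
    static void next_tweak(unsigned char t[16])
    {
            unsigned char carry = t[15] >> 7;

            for (int i = 15; i > 0; i--)
                    t[i] = (unsigned char)((t[i] << 1) | (t[i - 1] >> 7));
            t[0] = (unsigned char)((t[0] << 1) ^ (carry ? 0x87 : 0x00));
    }

    /* One invocation between begin()/end(): load the tweak from memory,
     * process some blocks, then store the updated tweak back so the next
     * invocation can resume where this one left off. */
    static void xts_chunk(struct xts_state *st, unsigned char *buf, int nblocks)
    {
            unsigned char t[16];

            memcpy(t, st->tweak, sizeof(t));        /* ld1 {v4.16b}, [x6] */
            for (int i = 0; i < nblocks; i++) {
                    for (int j = 0; j < 16; j++)    /* stand-in for the cipher */
                            buf[16 * i + j] ^= t[j];
                    next_tweak(t);
            }
            memcpy(st->tweak, t, sizeof(t));        /* st1 {v4.16b}, [x6] */
    }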
From patchwork Wed Dec 6 19:43:30 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120887
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v3 04/20] crypto: arm64/aes-bs - move kernel mode neon
 en/disable into loop
Date: Wed, 6 Dec 2017 19:43:30 +0000
Message-Id: <20171206194346.24393-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action is
only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-neonbs-glue.c | 36 +++++++++-----------
 1 file changed, 17 insertions(+), 19 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index 9d823c77ec84..e7a95a566462 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req,
 	struct skcipher_walk walk;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
@@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req,
 			blocks = round_down(blocks,
 					    walk.stride / AES_BLOCK_SIZE);
 
+		kernel_neon_begin();
 		fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk,
 		   ctx->rounds, blocks);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 
 	return err;
 }
@@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req)
 	struct skcipher_walk walk;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
 		/* fall back to the non-bitsliced NEON implementation */
+		kernel_neon_begin();
 		neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				     ctx->enc, ctx->key.rounds, blocks,
 				     walk.iv);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 	return err;
 }
 
@@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req)
 	struct skcipher_walk walk;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
@@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req)
 			blocks = round_down(blocks,
 					    walk.stride / AES_BLOCK_SIZE);
 
+		kernel_neon_begin();
 		aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				  ctx->key.rk, ctx->key.rounds, blocks,
 				  walk.iv);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
 
 	return err;
 }
@@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req)
 	u8 buf[AES_BLOCK_SIZE];
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
-	kernel_neon_begin();
 	while (walk.nbytes > 0) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 		u8 *final = (walk.total % AES_BLOCK_SIZE) ? buf : NULL;
@@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req)
 			final = NULL;
 		}
 
+		kernel_neon_begin();
 		aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				  ctx->rk, ctx->rounds, blocks, walk.iv, final);
+		kernel_neon_end();
 
 		if (final) {
 			u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
@@ -258,8 +259,6 @@ static int ctr_encrypt(struct skcipher_request *req)
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
-
 	return err;
 }
@@ -304,12 +303,11 @@ static int __xts_crypt(struct skcipher_request *req,
 	struct skcipher_walk walk;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
 	kernel_neon_begin();
-
-	neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey,
-			     ctx->key.rounds, 1);
+	neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1);
+	kernel_neon_end();
 
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -318,13 +316,13 @@ static int __xts_crypt(struct skcipher_request *req,
 			blocks = round_down(blocks,
 					    walk.stride / AES_BLOCK_SIZE);
 
+		kernel_neon_begin();
 		fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk,
 		   ctx->key.rounds, blocks, walk.iv);
+		kernel_neon_end();
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
-	kernel_neon_end();
-
 	return err;
 }
From patchwork Wed Dec 6 19:43:31 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120888
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v3 05/20] crypto: arm64/chacha20 - move kernel mode neon
 en/disable into loop
Date: Wed, 6 Dec 2017 19:43:31 +0000
Message-Id: <20171206194346.24393-6-ard.biesheuvel@linaro.org>
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action is
only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/chacha20-neon-glue.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
index cbdb75d15cd0..727579c93ded 100644
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
 	u8 buf[CHACHA20_BLOCK_SIZE];
 
 	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		kernel_neon_begin();
 		chacha20_4block_xor_neon(state, dst, src);
+		kernel_neon_end();
 		bytes -= CHACHA20_BLOCK_SIZE * 4;
 		src += CHACHA20_BLOCK_SIZE * 4;
 		dst += CHACHA20_BLOCK_SIZE * 4;
 		state[12] += 4;
 	}
+
+	if (!bytes)
+		return;
+
+	kernel_neon_begin();
 	while (bytes >= CHACHA20_BLOCK_SIZE) {
 		chacha20_block_xor_neon(state, dst, src);
 		bytes -= CHACHA20_BLOCK_SIZE;
@@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
 		chacha20_block_xor_neon(state, buf, buf);
 		memcpy(dst, buf, bytes);
 	}
+	kernel_neon_end();
 }
 
 static int chacha20_neon(struct skcipher_request *req)
@@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req)
 	if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
 		return crypto_chacha20_crypt(req);
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
 	crypto_chacha20_init(state, ctx, walk.iv);
 
-	kernel_neon_begin();
 	while (walk.nbytes > 0) {
 		unsigned int nbytes = walk.nbytes;
 
@@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req)
 				 nbytes);
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
-	kernel_neon_end();
 
 	return err;
 }
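[One detail of the chacha20 conversion worth calling out: after the 4-block
loop, the function returns early when nothing remains, so it never opens a
kernel_neon_begin()/kernel_neon_end() section around zero work. A minimal
sketch of that shape, using stub names rather than kernel code:]

    #include <stddef.h>

    static void begin(void)                            { /* kernel_neon_begin() */ }
    static void end(void)                              { /* kernel_neon_end() */ }
    static void wide_batch(unsigned char *p)           { p[0] ^= 1; /* 4-block path */ }
    static void tail_bytes(unsigned char *p, size_t n) { while (n--) p[n] ^= 1; }

    static void crypt(unsigned char *p, size_t bytes, size_t wide)
    {
            while (bytes >= wide) {         /* one begin/end per wide batch */
                    begin();
                    wide_batch(p);
                    end();
                    p += wide;
                    bytes -= wide;
            }

            if (!bytes)
                    return;                 /* skip an empty begin()/end() pair */

            begin();                        /* single section for the tail */
            tail_bytes(p, bytes);
            end();
    }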
h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=5zk1lw0Of4wZ9+5EpqGNCIfS63EqvIFydBGYbdg5eLE=; b=YfKZJXeV20Y+xe3eM1QXqRaOeG0roSiA8ZVYNfE3jmdFFvDP6WzW8Zor7/SF0Pm5dq m0txPNC24WGUqbhVlkKAw/1eD0Q2mLYil1HXY14W9N5Kf2UtzaeYbES/7QPsmkNupYUS u7LaA5q+DCspSyEssDYa9cZR9JCPXYkfFA52rdTBkZOxIReUb+I18QGVnOQ8s7Vu9ORL KXvAWqW4mQen4byyUS5Z5ujbqpuGBy+rV4HIZ9M14iI+NxeVdLD8aJBNpk/8JMC2GnsB RyQjUgaZDLwRrKDfc1h6xkNgGNSyAizPWNB3sair59RzlxYNoJoAIyjSBGCMMnxEaAcY kkeA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TJRcRkR6; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i16si2382524pgv.496.2017.12.06.11.44.33; Wed, 06 Dec 2017 11:44:33 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TJRcRkR6; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752357AbdLFToc (ORCPT + 1 other); Wed, 6 Dec 2017 14:44:32 -0500 Received: from mail-wr0-f194.google.com ([209.85.128.194]:42676 "EHLO mail-wr0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752317AbdLFToX (ORCPT ); Wed, 6 Dec 2017 14:44:23 -0500 Received: by mail-wr0-f194.google.com with SMTP id s66so5114430wrc.9 for ; Wed, 06 Dec 2017 11:44:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=5zk1lw0Of4wZ9+5EpqGNCIfS63EqvIFydBGYbdg5eLE=; b=TJRcRkR6osoVuJGMqyJEfzWXl3vKd0OKLO3W/3HR9ZP5LFhsD4j/QxxLUdD47IPduF Jx62UC3CMOXGDYSDbD1YlNBXWLT8yjPKis7VEM7IRmd20dxP4vL1EckdmOzwJpO1Ohf8 tJV62Wyy/9NeYPOET5kTczlyULqJaR7sOJFg4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=5zk1lw0Of4wZ9+5EpqGNCIfS63EqvIFydBGYbdg5eLE=; b=XT0aXqrkBEMXp4bEogaqIntSHB/26E1wotXUCLkXMMCet8BZjqPwspvKFfT6KpruMv IFfy80DWWRObiExb7C0AW1XTHPSHBZNth5LoZTOpFsQzyDTwrzr/f1dE/a3pltcAhdGR pz/QoqdX3ZlujhIXbUMU9zKiQFxsQtZ01YM60f/b4YyRTTxcQE1+JYzd1CcXiYrCIA7l fi6WLYzd8/lHXdUoO9DE2BIP7Gu/TadWf3xPlu3o9FVILhabi06gDmSpY/7FpqQowJmc 0d/KtUFj4K211AjzouBiwRhRPb4TAu7zN6k8atHFFinORBNFgbDz8Oqno+TIc2V4Bp4p aXNQ== X-Gm-Message-State: AJaThX5EwuM7o3eGDAWcugUJP7/wqze/4YzQEd5UwftQUMWi/OpjlScK OUjvTwG2Gqkxo13ABp9ai5C+XT5HrIc= X-Received: by 10.223.148.69 with SMTP id 63mr22031648wrq.89.1512589461631; Wed, 06 Dec 2017 11:44:21 -0800 (PST) Received: from localhost.localdomain ([105.150.171.234]) by smtp.gmail.com with ESMTPSA id b66sm3596594wmh.32.2017.12.06.11.44.18 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Dec 2017 11:44:20 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, 
linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 09/20] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels Date: Wed, 6 Dec 2017 19:43:35 +0000 Message-Id: <20171206194346.24393-10-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON algorithm running with preemption disabled. Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha256-glue.c | 36 +++++++++++++------- 1 file changed, 23 insertions(+), 13 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c index b064d925fe2a..e8880ccdc71f 100644 --- a/arch/arm64/crypto/sha256-glue.c +++ b/arch/arm64/crypto/sha256-glue.c @@ -89,21 +89,32 @@ static struct shash_alg algs[] = { { static int sha256_update_neon(struct shash_desc *desc, const u8 *data, unsigned int len) { - /* - * Stacking and unstacking a substantial slice of the NEON register - * file may significantly affect performance for small updates when - * executing in interrupt context, so fall back to the scalar code - * in that case. - */ + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return sha256_base_do_update(desc, data, len, (sha256_block_fn *)sha256_block_data_order); - kernel_neon_begin(); - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); - kernel_neon_end(); + while (len > 0) { + unsigned int chunk = len; + + /* + * Don't hog the CPU for the entire time it takes to process all + * input when running on a preemptible kernel, but process the + * data block by block instead. 
+ */ + if (IS_ENABLED(CONFIG_PREEMPT) && + chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) + chunk = SHA256_BLOCK_SIZE - + sctx->count % SHA256_BLOCK_SIZE; + kernel_neon_begin(); + sha256_base_do_update(desc, data, chunk, + (sha256_block_fn *)sha256_block_neon); + kernel_neon_end(); + data += chunk; + len -= chunk; + } return 0; } @@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data, sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_data_order); } else { - kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); + sha256_update_neon(desc, data, len); + kernel_neon_begin(); sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_neon); kernel_neon_end();
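The clipping arithmetic above bounds each kernel_neon_begin()/kernel_neon_end() section to a single SHA256 block on CONFIG_PREEMPT kernels. A minimal stand-alone sketch of the same pattern (the helper name is hypothetical, and both the !may_use_simd() fallback and the sctx->count bookkeeping done by sha256_base_do_update() are elided):

	/*
	 * Sketch only, not part of the patch: hold the NEON for at most one
	 * block at a time. For example, with sctx->count % 64 == 48 and
	 * len == 100, the chunks come out as 16, 64 and 20 bytes.
	 */
	static void neon_update_by_block(struct sha256_state *sctx,
					 const u8 *data, unsigned int len)
	{
		while (len > 0) {
			unsigned int chunk = len;

			/* clip the chunk at the next block boundary */
			if (IS_ENABLED(CONFIG_PREEMPT) &&
			    chunk + sctx->count % SHA256_BLOCK_SIZE >
					SHA256_BLOCK_SIZE)
				chunk = SHA256_BLOCK_SIZE -
					sctx->count % SHA256_BLOCK_SIZE;

			kernel_neon_begin();
			/* ... feed 'chunk' bytes to the NEON block transform,
			 * which also advances sctx->count ... */
			kernel_neon_end();

			data += chunk;
			len -= chunk;
		}
	}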
From patchwork Wed Dec 6 19:43:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120894 From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 10/20] arm64: assembler: add utility macros to push/pop stack frames Date: Wed, 6 Dec 2017 19:43:36 +0000 Message-Id: <20171206194346.24393-11-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org We are going to add code to all the NEON crypto routines that will turn them into non-leaf functions, so we need to manage the stack frames.
To make this less tedious and error prone, add some macros that take the number of callee saved registers to preserve and the extra size to allocate in the stack frame (for locals) and emit the ldp/stp sequences. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 60 ++++++++++++++++++++ 1 file changed, 60 insertions(+) -- 2.11.0 diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index aef72d886677..5f61487e9f93 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -499,6 +499,66 @@ alternative_else_nop_endif #endif .endm + /* + * frame_push - Push @regcount callee saved registers to the stack, + * starting at x19, as well as x29/x30, and set x29 to + * the new value of sp. Add @extra bytes of stack space + * for locals. + */ + .macro frame_push, regcount:req, extra + __frame st, \regcount, \extra + .endm + + /* + * frame_pop - Pop @regcount callee saved registers from the stack, + * starting at x19, as well as x29/x30. Also pop @extra + * bytes of stack space for locals. + */ + .macro frame_pop, regcount:req, extra + __frame ld, \regcount, \extra + .endm + + .macro __frame, op, regcount:req, extra=0 + .ifc \op, st + stp x29, x30, [sp, #-((\regcount + 3) / 2) * 16 - \extra]! + mov x29, sp + .endif + .if \regcount < 0 || \regcount > 10 + .error "regcount should be in the range [0 ... 10]" + .endif + .if (\extra % 16) != 0 + .error "extra should be a multiple of 16 bytes" + .endif + .if \regcount > 1 + \op\()p x19, x20, [sp, #16] + .if \regcount > 3 + \op\()p x21, x22, [sp, #32] + .if \regcount > 5 + \op\()p x23, x24, [sp, #48] + .if \regcount > 7 + \op\()p x25, x26, [sp, #64] + .if \regcount > 9 + \op\()p x27, x28, [sp, #80] + .elseif \regcount == 9 + \op\()r x27, [sp, #80] + .endif + .elseif \regcount == 7 + \op\()r x25, [sp, #64] + .endif + .elseif \regcount == 5 + \op\()r x23, [sp, #48] + .endif + .elseif \regcount == 3 + \op\()r x21, [sp, #32] + .endif + .elseif \regcount == 1 + \op\()r x19, [sp, #16] + .endif + .ifc \op, ld + ldp x29, x30, [sp], #((\regcount + 3) / 2) * 16 + \extra + .endif + .endm + /* * Errata workaround post TTBR0_EL1 update. 
*/
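To illustrate how the new macros are meant to be used (hypothetical routine and callee, not taken from the series): a function that becomes non-leaf must preserve x29/x30, and any state it needs across a call has to live in the callee saved range. frame_push 4, 16 reserves ((4 + 3) / 2) * 16 = 48 bytes for x29/x30 plus x19..x22, and 16 further bytes for locals at [sp, #48]:

	ENTRY(example_non_leaf)
	frame_push	4, 16		// save x29/x30, x19..x22; 16 bytes of locals

	mov	x19, x0			// arguments survive the bl below
	mov	x20, x1
	str	x2, [sp, #48]		// locals sit above the saved registers

	bl	some_helper		// hypothetical callee

	frame_pop	4, 16		// must mirror the frame_push exactly
	ret
	ENDPROC(example_non_leaf)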
From patchwork Wed Dec 6 19:43:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120895 From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 11/20] arm64: assembler: add macros to conditionally yield the NEON under PREEMPT Date: Wed, 6 Dec 2017 19:43:37 +0000 Message-Id: <20171206194346.24393-12-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add support macros to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code.
In some cases, yielding the NEON involves saving and restoring a non trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into three, and the code in between is only executed when the yield path is taken, allowing the context to be preserved. The third macro takes an optional label argument that marks the resume path after a yield has been performed. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 51 ++++++++++++++++++++ 1 file changed, 51 insertions(+) -- 2.11.0 diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 5f61487e9f93..c54e408fd5a7 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -572,4 +572,55 @@ alternative_else_nop_endif #endif .endm +/* + * Check whether to yield to another runnable task from kernel mode NEON code + * (which runs with preemption disabled). + * + * if_will_cond_yield_neon + * // pre-yield patchup code + * do_cond_yield_neon + * // post-yield patchup code + * endif_yield_neon + * + * - Check whether the preempt count is exactly 1, in which case disabling + * preemption once will make the task preemptible. If this is not the case, + * yielding is pointless. + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable + * kernel mode NEON (which will trigger a reschedule), and branch to the + * yield fixup code. + * + * This macro sequence clobbers x0, x1 and the flags register unconditionally, + * and may clobber x2 .. x18 if the yield path is taken. + */ + + .macro cond_yield_neon, lbl + if_will_cond_yield_neon + do_cond_yield_neon + endif_yield_neon \lbl + .endm + + .macro if_will_cond_yield_neon +#ifdef CONFIG_PREEMPT + get_thread_info x0 + ldr w1, [x0, #TSK_TI_PREEMPT] + ldr x0, [x0, #TSK_TI_FLAGS] + cmp w1, #1 // == PREEMPT_OFFSET + csel x0, x0, xzr, eq + tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling? 
+#endif + .subsection 1 +5555: + .endm + + .macro do_cond_yield_neon + bl kernel_neon_end + bl kernel_neon_begin + .endm + + .macro endif_yield_neon, lbl=6666f + b \lbl + .previous +6666: + .endm + #endif /* __ASM_ASSEMBLER_H */
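Typical use, matching the pattern the crypto patches later in the series adopt (a sketch, with x19 standing in for a pointer, held in a callee saved register, to wherever the partial state lives in memory):

	0:	// ... process one block of input with the NEON ...
		subs	w21, w21, #1		// more blocks left?
		beq	1f

		if_will_cond_yield_neon
		st1	{v0.16b}, [x19]		// pre-yield: spill live NEON state
		do_cond_yield_neon
		ld1	{v0.16b}, [x19]		// post-yield: reload it
		endif_yield_neon

		b	0b
	1:	ret

The spill and reload are needed because do_cond_yield_neon calls kernel_neon_end(), which hands the NEON unit back to other users, so vector register contents do not survive the yield.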
From patchwork Wed Dec 6 19:43:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120898 From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 14/20] crypto: arm64/aes-ccm - yield NEON after every block of input Date: Wed, 6 Dec 2017 19:43:40 +0000 Message-Id: <20171206194346.24393-15-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
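Much of the diff that follows is register renumbering rather than new logic: do_cond_yield_neon expands to bl instructions, and under the AAPCS64 calling convention x0-x18 may be clobbered across any call, so loop state that previously lived in argument registers has to move into the callee saved range first. The resulting prologue pattern (register assignments as in the patch):

	frame_push	7		// the function is no longer a leaf
	mov	x19, x0			// mac pointer, survives any bl
	mov	x20, x1			// input pointer
	mov	x21, x2			// remaining byte count
	// ... and so on for the remaining arguments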
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-core.S | 150 +++++++++++++------- 1 file changed, 95 insertions(+), 55 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index e3a375c4cb83..22ee196cae00 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -19,24 +19,33 @@ * u32 *macp, u8 const rk[], u32 rounds); */ ENTRY(ce_aes_ccm_auth_data) - ldr w8, [x3] /* leftover from prev round? */ + frame_push 7 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + + ldr w25, [x22] /* leftover from prev round? */ ld1 {v0.16b}, [x0] /* load mac */ - cbz w8, 1f - sub w8, w8, #16 + cbz w25, 1f + sub w25, w25, #16 eor v1.16b, v1.16b, v1.16b -0: ldrb w7, [x1], #1 /* get 1 byte of input */ - subs w2, w2, #1 - add w8, w8, #1 +0: ldrb w7, [x20], #1 /* get 1 byte of input */ + subs w21, w21, #1 + add w25, w25, #1 ins v1.b[0], w7 ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */ beq 8f /* out of input? */ - cbnz w8, 0b + cbnz w25, 0b eor v0.16b, v0.16b, v1.16b -1: ld1 {v3.4s}, [x4] /* load first round key */ - prfm pldl1strm, [x1] - cmp w5, #12 /* which key size? */ - add x6, x4, #16 - sub w7, w5, #2 /* modified # of rounds */ +1: ld1 {v3.4s}, [x23] /* load first round key */ + prfm pldl1strm, [x20] + cmp w24, #12 /* which key size? */ + add x6, x23, #16 + sub w7, w24, #2 /* modified # of rounds */ bmi 2f bne 5f mov v5.16b, v3.16b @@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data) ld1 {v5.4s}, [x6], #16 /* load next round key */ bpl 3b aese v0.16b, v4.16b - subs w2, w2, #16 /* last data? */ + subs w21, w21, #16 /* last data? */ eor v0.16b, v0.16b, v5.16b /* final round */ bmi 6f - ld1 {v1.16b}, [x1], #16 /* load next input block */ + ld1 {v1.16b}, [x20], #16 /* load next input block */ eor v0.16b, v0.16b, v1.16b /* xor with mac */ - bne 1b -6: st1 {v0.16b}, [x0] /* store mac */ + beq 6f + + if_will_cond_yield_neon + st1 {v0.16b}, [x19] /* store mac */ + do_cond_yield_neon + ld1 {v0.16b}, [x19] /* reload mac */ + endif_yield_neon + + b 1b +6: st1 {v0.16b}, [x19] /* store mac */ beq 10f - adds w2, w2, #16 + adds w21, w21, #16 beq 10f - mov w8, w2 -7: ldrb w7, [x1], #1 + mov w25, w21 +7: ldrb w7, [x20], #1 umov w6, v0.b[0] eor w6, w6, w7 - strb w6, [x0], #1 - subs w2, w2, #1 + strb w6, [x19], #1 + subs w21, w21, #1 beq 10f ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ b 7b -8: mov w7, w8 - add w8, w8, #16 +8: mov w7, w25 + add w25, w25, #16 9: ext v1.16b, v1.16b, v1.16b, #1 adds w7, w7, #1 bne 9b eor v0.16b, v0.16b, v1.16b - st1 {v0.16b}, [x0] -10: str w8, [x3] + st1 {v0.16b}, [x19] +10: str w25, [x22] + + frame_pop 7 ret ENDPROC(ce_aes_ccm_auth_data) @@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final) ENDPROC(ce_aes_ccm_final) .macro aes_ccm_do_crypt,enc - ldr x8, [x6, #8] /* load lower ctr */ - ld1 {v0.16b}, [x5] /* load mac */ -CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ + frame_push 8 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + ldr x26, [x25, #8] /* load lower ctr */ + ld1 {v0.16b}, [x24] /* load mac */ +CPU_LE( rev x26, x26 ) /* keep swabbed ctr in reg */ 0: /* outer loop */ - ld1 {v1.8b}, [x6] /* load upper ctr */ - prfm pldl1strm, [x1] - add x8, x8, #1 - rev x9, x8 - cmp w4, #12 /* which key size? 
*/ - sub w7, w4, #2 /* get modified # of rounds */ + ld1 {v1.8b}, [x25] /* load upper ctr */ + prfm pldl1strm, [x20] + add x26, x26, #1 + rev x9, x26 + cmp w23, #12 /* which key size? */ + sub w7, w23, #2 /* get modified # of rounds */ ins v1.d[1], x9 /* no carry in lower ctr */ - ld1 {v3.4s}, [x3] /* load first round key */ - add x10, x3, #16 + ld1 {v3.4s}, [x22] /* load first round key */ + add x10, x22, #16 bmi 1f bne 4f mov v5.16b, v3.16b @@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ bpl 2b aese v0.16b, v4.16b aese v1.16b, v4.16b - subs w2, w2, #16 - bmi 6f /* partial block? */ - ld1 {v2.16b}, [x1], #16 /* load next input block */ + subs w21, w21, #16 + bmi 7f /* partial block? */ + ld1 {v2.16b}, [x20], #16 /* load next input block */ .if \enc == 1 eor v2.16b, v2.16b, v5.16b /* final round enc+mac */ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */ @@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ eor v1.16b, v2.16b, v5.16b /* final round enc */ .endif eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ - st1 {v1.16b}, [x0], #16 /* write output block */ - bne 0b -CPU_LE( rev x8, x8 ) - st1 {v0.16b}, [x5] /* store mac */ - str x8, [x6, #8] /* store lsb end of ctr (BE) */ -5: ret - -6: eor v0.16b, v0.16b, v5.16b /* final round mac */ + st1 {v1.16b}, [x19], #16 /* write output block */ + beq 5f + + if_will_cond_yield_neon + st1 {v0.16b}, [x24] /* store mac */ + do_cond_yield_neon + ld1 {v0.16b}, [x24] /* reload mac */ + endif_yield_neon + + b 0b +5: +CPU_LE( rev x26, x26 ) + st1 {v0.16b}, [x24] /* store mac */ + str x26, [x25, #8] /* store lsb end of ctr (BE) */ + +6: frame_pop 8 + ret + +7: eor v0.16b, v0.16b, v5.16b /* final round mac */ eor v1.16b, v1.16b, v5.16b /* final round enc */ - st1 {v0.16b}, [x5] /* store mac */ - add w2, w2, #16 /* process partial tail block */ -7: ldrb w9, [x1], #1 /* get 1 byte of input */ + st1 {v0.16b}, [x24] /* store mac */ + add w21, w21, #16 /* process partial tail block */ +8: ldrb w9, [x20], #1 /* get 1 byte of input */ umov w6, v1.b[0] /* get top crypted ctr byte */ umov w7, v0.b[0] /* get top mac byte */ .if \enc == 1 @@ -197,13 +237,13 @@ CPU_LE( rev x8, x8 ) eor w9, w9, w6 eor w7, w7, w9 .endif - strb w9, [x0], #1 /* store out byte */ - strb w7, [x5], #1 /* store mac byte */ - subs w2, w2, #1 - beq 5b + strb w9, [x19], #1 /* store out byte */ + strb w7, [x24], #1 /* store mac byte */ + subs w21, w21, #1 + beq 6b ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */ - b 7b + b 8b .endm /*
From patchwork Wed Dec 6 19:43:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120899
From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 15/20] crypto: arm64/aes-blk - yield NEON after every block of input Date: Wed, 6 Dec 2017 19:43:41 +0000 Message-Id: <20171206194346.24393-16-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce.S | 15 +- arch/arm64/crypto/aes-modes.S | 331 ++++++++++++-------- 2 files changed, 216 insertions(+), 130 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 50330f5c3adc..623e74ed1c67 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -30,18 +30,21 @@ .endm /* prepare for encryption with key in rk[] */ - .macro enc_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for encryption (again) but with new key in rk[] */ - .macro enc_switch_key, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_switch_key, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for decryption with key in rk[] */ - .macro dec_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro dec_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index a68412e1e3a4..ab05772ce385 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,57 +31,71 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 5 - enc_prepare w3, x2, x5 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +.Lecbencrestart: + enc_prepare w22, x21, x5 .LecbencloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + cond_yield_neon .Lecbencrestart b .LecbencloopNx .Lecbenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ - encrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next pt block */ + encrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbencloop .Lecbencout: - ldp x29, x30, [sp], #16 + frame_pop 5 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 - dec_prepare w3, x2, x5 +.Lecbdecrestart: + dec_prepare w22, x21, x5 .LecbdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + cond_yield_neon .Lecbdecrestart b .LecbdecloopNx .Lecbdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x1], #16 /* get next ct block */ - decrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next ct block */ + decrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbdecloop .Lecbdecout: - ldp x29, x30, [sp], #16 + frame_pop 5 ret AES_ENDPROC(aes_ecb_decrypt) @@ -94,78 +108,100 @@ AES_ENDPROC(aes_ecb_decrypt) */ AES_ENTRY(aes_cbc_encrypt) - ld1 {v4.16b}, [x5] /* get iv */ - enc_prepare w3, x2, x6 + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcencrestart: + ld1 {v4.16b}, [x24] /* get iv */ + enc_prepare w22, x21, x6 .Lcbcencloop4x: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w22, x21, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w3, x2, x6, w7 + encrypt_block v1, w22, x21, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w3, x2, x6, w7 + encrypt_block v2, w22, x21, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w3, x2, x6, w7 - st1 {v0.16b-v3.16b}, [x0], #64 + encrypt_block v3, w22, x21, x6, w7 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v3.16b + st1 {v4.16b}, [x24] /* return iv */ + cond_yield_neon .Lcbcencrestart b .Lcbcencloop4x .Lcbcenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ + ld1 {v0.16b}, [x20], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w3, x2, x6, w7 - st1 {v4.16b}, [x0], #16 - subs w4, w4, #1 + encrypt_block v4, w22, x21, x6, w7 + st1 {v4.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x5] /* return iv */ + st1 {v4.16b}, [x24] /* return iv */ + frame_pop 6 ret 
AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - ld1 {v7.16b}, [x5] /* get iv */ - dec_prepare w3, x2, x6 +.Lcbcdecrestart: + ld1 {v7.16b}, [x24] /* get iv */ + dec_prepare w22, x21, x6 .LcbcdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x1, x1, #16 + sub x20, x20, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v7.16b}, [x24] /* return iv */ + cond_yield_neon .Lcbcdecrestart b .LcbcdecloopNx .Lcbcdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x1], #16 /* get next ct block */ + ld1 {v1.16b}, [x20], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w22, x21, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x5] /* return iv */ - ldp x29, x30, [sp], #16 + st1 {v7.16b}, [x24] /* return iv */ + frame_pop 6 ret AES_ENDPROC(aes_cbc_decrypt) @@ -176,19 +212,26 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 6 - enc_prepare w3, x2, x6 - ld1 {v4.16b}, [x5] + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lctrrestart: + enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x24] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 - cmn w6, w4 /* 32 bit overflow? */ - bcs .Lctrloop .LctrloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lctr1x + cmn w6, #4 /* 32 bit overflow? */ + bcs .Lctr1x ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ dup v7.4s, w6 mov v0.16b, v4.16b @@ -200,25 +243,27 @@ AES_ENTRY(aes_ctr_encrypt) mov v1.s[3], v8.s[0] mov v2.s[3], v8.s[1] mov v3.s[3], v8.s[2] - ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x1], #16 /* get 1 input block */ + ld1 {v5.16b}, [x20], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w4, .Lctrout + cbz w23, .Lctrout + st1 {v4.16b}, [x24] /* return next CTR value */ + cond_yield_neon .Lctrrestart b .LctrloopNx .Lctr1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -226,22 +271,22 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? 
*/ .Lctrcarrydone: - subs w4, w4, #1 + subs w23, w23, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x5] /* return next CTR value */ - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] /* return next CTR value */ +.Lctrret: + frame_pop 6 ret .Lctrtailblock: - st1 {v0.16b}, [x0] - ldp x29, x30, [sp], #16 - ret + st1 {v0.16b}, [x19] + b .Lctrret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -274,10 +319,16 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 - ld1 {v4.16b}, [x6] + ld1 {v4.16b}, [x24] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -286,15 +337,17 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencrestart: + ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w3, x2, x8 + enc_prepare w22, x21, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -307,35 +360,43 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsencout + cbz w23, .Lxtsencout + st1 {v4.16b}, [x24] + cond_yield_neon .Lxtsencrestart b .LxtsencloopNx .Lxtsenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + frame_pop 6 ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 6 - ld1 {v4.16b}, [x6] + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -344,15 +405,17 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecrestart: + ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w3, x2, x8 + dec_prepare w22, x21, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -365,26 +428,28 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsdecout + cbz w23, .Lxtsdecout + st1 {v4.16b}, [x24] + cond_yield_neon .Lxtsdecrestart b .LxtsdecloopNx .Lxtsdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x8, w7 + decrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + frame_pop 6 ret AES_ENDPROC(aes_xts_decrypt) @@ -393,43 +458,61 @@ AES_ENDPROC(aes_xts_decrypt) * int blocks, u8 dg[], int enc_before, int enc_after) */ AES_ENTRY(aes_mac_update) - ld1 {v0.16b}, [x4] /* get dg */ + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v0.16b}, [x23] /* get dg */ enc_prepare w2, x1, x7 cbz w5, .Lmacloop4x encrypt_block v0, w2, x1, x7, w8 .Lmacloop4x: - subs w3, w3, #4 + subs w22, w22, #4 bmi .Lmac1x - ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v2.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v3.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v4.16b - cmp w3, wzr - csinv x5, x6, xzr, eq + cmp w22, wzr + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 + st1 {v0.16b}, [x23] /* return dg */ + cond_yield_neon .Lmacrestart b .Lmacloop4x .Lmac1x: - add w3, w3, #4 + add w22, w22, #4 .Lmacloop: - cbz w3, .Lmacout - ld1 {v1.16b}, [x0], #16 /* get next pt block */ + cbz w22, .Lmacout + ld1 {v1.16b}, [x19], #16 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - subs w3, w3, #1 - csinv x5, x6, xzr, eq + subs w22, w22, #1 + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 +.Lmacenc: + encrypt_block v0, w21, x20, x7, w8 b .Lmacloop .Lmacout: - st1 {v0.16b}, [x4] /* return dg */ + st1 {v0.16b}, [x23] /* return dg */ + frame_pop 6 ret + +.Lmacrestart: + ld1 {v0.16b}, [x23] /* get dg */ + enc_prepare w21, x20, x0 + b .Lmacloop4x AES_ENDPROC(aes_mac_update) From patchwork Wed Dec 6 19:43:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120900
From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 16/20] crypto: arm64/aes-bs - yield NEON after every block of input Date: Wed, 6 Dec 2017 19:43:42 +0000 Message-Id: <20171206194346.24393-17-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
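Two details of the diff that follows are worth noting. First, __xts_crypt spills four tweak values to the stack, so it uses the extra-space argument of frame_push ('frame_push 6, 64'): with x29/x30 plus six callee saved registers occupying the first 64 bytes of the frame, the tweak spill slots move from [sp, #16]..[sp, #64] up to [sp, #64]..[sp, #112]. Second, as in the aes-blk patch, each yield site passes a restart label to cond_yield_neon so that state that does not survive kernel_neon_end() is reloaded before the loop resumes. In outline (a sketch of the shape used by aesbs_ctr_encrypt below):

	98:	ldp	x7, x8, [x24]		// (re)load the counter from memory
		// ...
	99:	// ... encrypt up to eight blocks ...
		cbz	x23, 0f			// all input consumed?
		cond_yield_neon	98b		// after a yield, resume at 98b
		b	99b
	0:	frame_pop	8
		ret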
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 305 +++++++++++--------- 1 file changed, 170 insertions(+), 135 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..23659369da78 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,61 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99: mov x5, #1 - lsl x5, x5, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x5, x5, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnz x5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnz x5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnz x5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnz x5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnz x5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnz x5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnz x5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnz x5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnz x5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnz x5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnz x5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + cond_yield_neon + b 99b -1: ldp x29, x30, [sp], #16 +1: frame_pop 5 ret .endm @@ -632,43 +639,49 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 mov v25.16b, v0.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 mov v26.16b, v1.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 mov v27.16b, v2.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 mov v28.16b, v3.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 mov v29.16b, v4.16b tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 mov v30.16b, v5.16b tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 mov v31.16b, v6.16b tbnz x6, #7, 0f - ld1 {v7.16b}, [x1] + ld1 {v7.16b}, [x20] -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_decrypt8 - ld1 {v24.16b}, [x5] // load IV + ld1 {v24.16b}, [x24] // load IV eor v1.16b, v1.16b, v25.16b eor v6.16b, v6.16b, v26.16b @@ -679,34 +692,36 @@ ENTRY(aesbs_cbc_decrypt) eor v3.16b, v3.16b, v30.16b eor v5.16b, v5.16b, v31.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 mov v24.16b, v25.16b tbnz x6, #1, 1f - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 mov v24.16b, v26.16b tbnz x6, #2, 1f - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 mov v24.16b, v27.16b tbnz x6, #3, 1f - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 mov v24.16b, v28.16b tbnz x6, #4, 1f - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 mov v24.16b, v29.16b tbnz x6, #5, 1f - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 mov v24.16b, v30.16b tbnz x6, #6, 1f - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 mov v24.16b, v31.16b tbnz x6, #7, 1f - ld1 {v24.16b}, [x1], #16 - st1 {v5.16b}, [x0], #16 -1: st1 {v24.16b}, [x5] // store IV + ld1 {v24.16b}, [x20], #16 + st1 {v5.16b}, [x19], #16 +1: st1 {v24.16b}, [x24] // store IV - cbnz x4, 99b + cbz x23, 2f + cond_yield_neon + b 99b - ldp x29, x30, [sp], #16 +2: frame_pop 6 ret ENDPROC(aesbs_cbc_decrypt) @@ -731,87 +746,93 @@ CPU_BE( .quad 0x87, 1 ) */ __xts_crypt8: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 next_tweak v26, v25, v30, v31 eor v0.16b, v0.16b, v25.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 next_tweak v27, v26, v30, v31 eor v1.16b, v1.16b, v26.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 next_tweak v28, v27, v30, v31 eor v2.16b, v2.16b, v27.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 next_tweak v29, v28, v30, v31 eor v3.16b, v3.16b, v28.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 - str q29, [sp, #16] + ld1 {v4.16b}, [x20], #16 + str q29, [sp, #64] eor v4.16b, v4.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 - str q29, [sp, #32] + ld1 {v5.16b}, [x20], #16 + str q29, [sp, #80] eor v5.16b, v5.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 - str q29, [sp, #48] + ld1 {v6.16b}, [x20], #16 + str q29, [sp, #96] eor v6.16b, v6.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #7, 0f - ld1 {v7.16b}, [x1], #16 - str q29, [sp, #64] + ld1 {v7.16b}, [x20], #16 + str q29, [sp, #112] eor v7.16b, v7.16b, v29.16b 
 	next_tweak	v29, v29, v30, v31
 
-0:	mov		bskey, x2
-	mov		rounds, x3
+0:	mov		bskey, x21
+	mov		rounds, x22
 	br		x7
 ENDPROC(__xts_crypt8)
 
 	.macro		__xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
-	stp		x29, x30, [sp, #-80]!
-	mov		x29, sp
+	frame_push	6, 64
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
 
-	ldr		q30, .Lxts_mul_x
-	ld1		{v25.16b}, [x5]
+0:	ldr		q30, .Lxts_mul_x
+	ld1		{v25.16b}, [x24]
 
 99:	adr		x7, \do8
 	bl		__xts_crypt8
 
-	ldp		q16, q17, [sp, #16]
-	ldp		q18, q19, [sp, #48]
+	ldp		q16, q17, [sp, #64]
+	ldp		q18, q19, [sp, #96]
 
 	eor		\o0\().16b, \o0\().16b, v25.16b
 	eor		\o1\().16b, \o1\().16b, v26.16b
 	eor		\o2\().16b, \o2\().16b, v27.16b
 	eor		\o3\().16b, \o3\().16b, v28.16b
 
-	st1		{\o0\().16b}, [x0], #16
+	st1		{\o0\().16b}, [x19], #16
 	mov		v25.16b, v26.16b
 	tbnz		x6, #1, 1f
-	st1		{\o1\().16b}, [x0], #16
+	st1		{\o1\().16b}, [x19], #16
 	mov		v25.16b, v27.16b
 	tbnz		x6, #2, 1f
-	st1		{\o2\().16b}, [x0], #16
+	st1		{\o2\().16b}, [x19], #16
 	mov		v25.16b, v28.16b
 	tbnz		x6, #3, 1f
-	st1		{\o3\().16b}, [x0], #16
+	st1		{\o3\().16b}, [x19], #16
 	mov		v25.16b, v29.16b
 	tbnz		x6, #4, 1f
 
@@ -820,18 +841,22 @@ ENDPROC(__xts_crypt8)
 	eor		\o6\().16b, \o6\().16b, v18.16b
 	eor		\o7\().16b, \o7\().16b, v19.16b
 
-	st1		{\o4\().16b}, [x0], #16
+	st1		{\o4\().16b}, [x19], #16
 	tbnz		x6, #5, 1f
-	st1		{\o5\().16b}, [x0], #16
+	st1		{\o5\().16b}, [x19], #16
 	tbnz		x6, #6, 1f
-	st1		{\o6\().16b}, [x0], #16
+	st1		{\o6\().16b}, [x19], #16
 	tbnz		x6, #7, 1f
-	st1		{\o7\().16b}, [x0], #16
+	st1		{\o7\().16b}, [x19], #16
 
-	cbnz		x4, 99b
+	cbz		x23, 1f
+	st1		{v25.16b}, [x24]
 
-1:	st1		{v25.16b}, [x5]
-	ldp		x29, x30, [sp], #80
+	cond_yield_neon	0b
+	b		99b
+
+1:	st1		{v25.16b}, [x24]
+	frame_pop	6, 64
 	ret
 	.endm
 
@@ -856,24 +881,31 @@ ENDPROC(aesbs_xts_decrypt)
 	 *		     int rounds, int blocks, u8 iv[], u8 final[])
 	 */
 ENTRY(aesbs_ctr_encrypt)
-	stp		x29, x30, [sp, #-16]!
-	mov		x29, sp
-
-	cmp		x6, #0
-	cset		x10, ne
-	add		x4, x4, x10		// do one extra block if final
-
-	ldp		x7, x8, [x5]
-	ld1		{v0.16b}, [x5]
+	frame_push	8
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
+	mov		x25, x6
+
+	cmp		x25, #0
+	cset		x26, ne
+	add		x23, x23, x26		// do one extra block if final
+
+98:	ldp		x7, x8, [x24]
+	ld1		{v0.16b}, [x24]
CPU_LE(	rev		x7, x7		)
CPU_LE(	rev		x8, x8		)
 	adds		x8, x8, #1
 	adc		x7, x7, xzr
 
 99:	mov		x9, #1
-	lsl		x9, x9, x4
-	subs		w4, w4, #8
-	csel		x4, x4, xzr, pl
+	lsl		x9, x9, x23
+	subs		w23, w23, #8
+	csel		x23, x23, xzr, pl
 	csel		x9, x9, xzr, le
 
 	tbnz		x9, #1, 0f
@@ -891,82 +923,85 @@ CPU_LE(	rev		x8, x8		)
 	tbnz		x9, #7, 0f
 	next_ctr	v7
 
-0:	mov		bskey, x2
-	mov		rounds, x3
+0:	mov		bskey, x21
+	mov		rounds, x22
 	bl		aesbs_encrypt8
 
-	lsr		x9, x9, x10		// disregard the extra block
+	lsr		x9, x9, x26		// disregard the extra block
 	tbnz		x9, #0, 0f
 
-	ld1		{v8.16b}, [x1], #16
+	ld1		{v8.16b}, [x20], #16
 	eor		v0.16b, v0.16b, v8.16b
-	st1		{v0.16b}, [x0], #16
+	st1		{v0.16b}, [x19], #16
 	tbnz		x9, #1, 1f
 
-	ld1		{v9.16b}, [x1], #16
+	ld1		{v9.16b}, [x20], #16
 	eor		v1.16b, v1.16b, v9.16b
-	st1		{v1.16b}, [x0], #16
+	st1		{v1.16b}, [x19], #16
 	tbnz		x9, #2, 2f
 
-	ld1		{v10.16b}, [x1], #16
+	ld1		{v10.16b}, [x20], #16
 	eor		v4.16b, v4.16b, v10.16b
-	st1		{v4.16b}, [x0], #16
+	st1		{v4.16b}, [x19], #16
 	tbnz		x9, #3, 3f
 
-	ld1		{v11.16b}, [x1], #16
+	ld1		{v11.16b}, [x20], #16
 	eor		v6.16b, v6.16b, v11.16b
-	st1		{v6.16b}, [x0], #16
+	st1		{v6.16b}, [x19], #16
 	tbnz		x9, #4, 4f
 
-	ld1		{v12.16b}, [x1], #16
+	ld1		{v12.16b}, [x20], #16
 	eor		v3.16b, v3.16b, v12.16b
-	st1		{v3.16b}, [x0], #16
+	st1		{v3.16b}, [x19], #16
 	tbnz		x9, #5, 5f
 
-	ld1		{v13.16b}, [x1], #16
+	ld1		{v13.16b}, [x20], #16
 	eor		v7.16b, v7.16b, v13.16b
-	st1		{v7.16b}, [x0], #16
+	st1		{v7.16b}, [x19], #16
 	tbnz		x9, #6, 6f
 
-	ld1		{v14.16b}, [x1], #16
+	ld1		{v14.16b}, [x20], #16
 	eor		v2.16b, v2.16b, v14.16b
-	st1		{v2.16b}, [x0], #16
+	st1		{v2.16b}, [x19], #16
 	tbnz		x9, #7, 7f
 
-	ld1		{v15.16b}, [x1], #16
+	ld1		{v15.16b}, [x20], #16
 	eor		v5.16b, v5.16b, v15.16b
-	st1		{v5.16b}, [x0], #16
+	st1		{v5.16b}, [x19], #16
 
 8:	next_ctr	v0
-	cbnz		x4, 99b
+	st1		{v0.16b}, [x24]
+	cbz		x23, 0f
+
+	cond_yield_neon	98b
+	b		99b
 
-0:	st1		{v0.16b}, [x5]
-	ldp		x29, x30, [sp], #16
+0:	frame_pop	8
 	ret
 
 	/*
 	 * If we are handling the tail of the input (x6 != NULL), return the
 	 * final keystream block back to the caller.
 	 */
-1:	cbz		x6, 8b
-	st1		{v1.16b}, [x6]
+1:	cbz		x25, 8b
+	st1		{v1.16b}, [x25]
 	b		8b
-2:	cbz		x6, 8b
-	st1		{v4.16b}, [x6]
+2:	cbz		x25, 8b
+	st1		{v4.16b}, [x25]
 	b		8b
-3:	cbz		x6, 8b
-	st1		{v6.16b}, [x6]
+3:	cbz		x25, 8b
+	st1		{v6.16b}, [x25]
 	b		8b
-4:	cbz		x6, 8b
-	st1		{v3.16b}, [x6]
+4:	cbz		x25, 8b
+	st1		{v3.16b}, [x25]
 	b		8b
-5:	cbz		x6, 8b
-	st1		{v7.16b}, [x6]
+5:	cbz		x25, 8b
+	st1		{v7.16b}, [x25]
 	b		8b
-6:	cbz		x6, 8b
-	st1		{v2.16b}, [x6]
+6:	cbz		x25, 8b
+	st1		{v2.16b}, [x25]
 	b		8b
-7:	cbz		x6, 8b
-	st1		{v5.16b}, [x6]
+7:	cbz		x25, 8b
+	st1		{v5.16b}, [x25]
 	b		8b
ENDPROC(aesbs_ctr_encrypt)

From patchwork Wed Dec 6 19:43:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 120901
Delivered-To: patch@linaro.org
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
    Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
    Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
    Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
    Thomas Gleixner
Subject: [PATCH v3 17/20] crypto: arm64/aes-ghash - yield NEON after every block of input
Date: Wed, 6 Dec 2017 19:43:43 +0000
Message-Id: <20171206194346.24393-18-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by yielding
the NEON after every block of input.
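For reference, the C side of this change follows the pattern sketched below.
This is a minimal sketch distilled from the gcm_encrypt() hunk further down
(tag computation, walk setup and error handling are elided): the skcipher
walk may now sleep, and the NEON is claimed and released around each batch
of blocks rather than around the entire walk, which gives the scheduler a
preemption point per batch:

	/* illustrative sketch only -- see the actual gcm_encrypt() hunk below */
	err = skcipher_walk_aead_encrypt(&walk, req, false);	/* may sleep now */

	while (walk.nbytes >= AES_BLOCK_SIZE) {
		int blocks = walk.nbytes / AES_BLOCK_SIZE;

		kernel_neon_begin();		/* claim NEON for this batch only */
		pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
				  walk.src.virt.addr, &ctx->ghash_key,
				  iv, ctx->aes_key.key_enc,
				  num_rounds(&ctx->aes_key), ks);
		kernel_neon_end();		/* preemption point between batches */

		err = skcipher_walk_done(&walk,
					 walk.nbytes % AES_BLOCK_SIZE);
	}

This assumes kernel_neon_begin()/kernel_neon_end() are cheap enough to be
called once per batch, which is the premise of this series.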
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------
 arch/arm64/crypto/ghash-ce-glue.c |  28 +++--
 2 files changed, 97 insertions(+), 44 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 11ebf1ae248a..8da87cfcce66 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -213,22 +213,31 @@
 	.endm
 
 	.macro		__pmull_ghash, pn
-	ld1		{SHASH.2d}, [x3]
-	ld1		{XL.2d}, [x1]
+	frame_push	5
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+
+0:	ld1		{SHASH.2d}, [x22]
+	ld1		{XL.2d}, [x20]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 
 	__pmull_pre_\pn
 
 	/* do the head block first, if supplied */
-	cbz		x4, 0f
-	ld1		{T1.2d}, [x4]
-	b		1f
+	cbz		x23, 1f
+	ld1		{T1.2d}, [x23]
+	mov		x23, xzr
+	b		2f
 
-0:	ld1		{T1.2d}, [x2], #16
-	sub		w0, w0, #1
+1:	ld1		{T1.2d}, [x21], #16
+	sub		w19, w19, #1
 
-1:	/* multiply XL by SHASH in GF(2^128) */
+2:	/* multiply XL by SHASH in GF(2^128) */
CPU_LE(	rev64		T1.16b, T1.16b	)
 
 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +259,18 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b
 
-	cbnz		w0, 0b
+	cbz		w19, 3f
+
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon
+
+	b		1b
 
-	st1		{XL.2d}, [x1]
+3:	st1		{XL.2d}, [x20]
+	frame_pop	5
 	ret
 	.endm
 
@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
 	.endm
 
 	.macro		pmull_gcm_do_crypt, enc
-	ld1		{SHASH.2d}, [x4]
-	ld1		{XL.2d}, [x1]
-	ldr		x8, [x5, #8]			// load lower counter
+	frame_push	10
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
+	mov		x25, x6
+	mov		x26, x7
+	.if		\enc == 1
+	ldr		x27, [sp, #96]			// first stacked arg
+	.endif
+
+	ldr		x28, [x24, #8]			// load lower counter
+CPU_LE(	rev		x28, x28	)
+
+0:	mov		x0, x25
+	load_round_keys	w26, x0
+	ld1		{SHASH.2d}, [x23]
+	ld1		{XL.2d}, [x20]
 	movi		MASK.16b, #0xe1
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-CPU_LE(	rev		x8, x8		)
 	shl		MASK.2d, MASK.2d, #57
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 
 	.if		\enc == 1
-	ld1		{KS.16b}, [x7]
+	ld1		{KS.16b}, [x27]
 	.endif
 
-0:	ld1		{CTR.8b}, [x5]			// load upper counter
-	ld1		{INP.16b}, [x3], #16
-	rev		x9, x8
-	add		x8, x8, #1
-	sub		w0, w0, #1
+1:	ld1		{CTR.8b}, [x24]			// load upper counter
+	ld1		{INP.16b}, [x22], #16
+	rev		x9, x28
+	add		x28, x28, #1
+	sub		w19, w19, #1
 	ins		CTR.d[1], x9			// set lower counter
 
 	.if		\enc == 1
 	eor		INP.16b, INP.16b, KS.16b	// encrypt input
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif
 
 	rev64		T1.16b, INP.16b
 
-	cmp		w6, #12
-	b.ge		2f				// AES-192/256?
+	cmp		w26, #12
+	b.ge		4f				// AES-192/256?
 
-1:	enc_round	CTR, v21
+2:	enc_round	CTR, v21
 
 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8
@@ -390,27 +425,39 @@ CPU_LE(	rev		x8, x8		)
 
 	.if		\enc == 0
 	eor		INP.16b, INP.16b, KS.16b
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif
 
-	cbnz		w0, 0b
+	cbz		w19, 3f
 
-CPU_LE(	rev		x8, x8		)
-	st1		{XL.2d}, [x1]
-	str		x8, [x5, #8]			// store lower counter
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	.if		\enc == 1
+	st1		{KS.16b}, [x27]
+	.endif
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon
+
+	b		1b
+
+3:	st1		{XL.2d}, [x20]
 	.if		\enc == 1
-	st1		{KS.16b}, [x7]
+	st1		{KS.16b}, [x27]
 	.endif
 
+CPU_LE(	rev		x28, x28	)
+	str		x28, [x24, #8]			// store lower counter
+
+	frame_pop	10
 	ret
 
-2:	b.eq		3f				// AES-192?
+4:	b.eq		5f				// AES-192?
 	enc_round	CTR, v17
 	enc_round	CTR, v18
-3:	enc_round	CTR, v19
+5:	enc_round	CTR, v19
 	enc_round	CTR, v20
-	b		1b
+	b		2b
 	.endm
 
 	/*
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index cfc9c92814fd..7cf0b1aa6ea8 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
 
 asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds, u8 ks[]);
+				  u8 ctr[], u32 const rk[], int rounds,
+				  u8 ks[]);
 
 asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds);
+				  u8 ctr[], u32 const rk[], int rounds);
 
 asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
 					u32 const rk[], int rounds);
@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
 		pmull_gcm_encrypt_block(ks, iv, NULL,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(3, iv + GCM_IV_SIZE);
+		kernel_neon_end();
 
-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+			kernel_neon_begin();
 			pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key), ks);
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key), ks);
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
 		}
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req)
 		pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
+		kernel_neon_end();
 
-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+			kernel_neon_begin();
 			pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key));
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key));
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
@@ -483,14 +491,12 @@
 		if (walk.nbytes)
 			pmull_gcm_encrypt_block(iv, iv, NULL,
 						num_rounds(&ctx->aes_key));
-
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;