From patchwork Fri Dec  9 13:47:26 2016
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 87478
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can
Date: Fri,  9 Dec 2016 13:47:26 +0000
Message-Id: <1481291246-20216-1-git-send-email-ard.biesheuvel@linaro.org>

The bit-sliced NEON implementation of AES only performs optimally if it
can process 8 blocks of input in parallel.
This is due to the nature of bit slicing, where the n-th bit of each
byte of AES state of each input block is collected into NEON register
'n', for registers q0 - q7. This implies that the amount of work for the
transform is fixed, regardless of whether we are handling just one block
or 8 in parallel.

So let's try a bit harder to iterate over the input in suitably sized
chunks, by increasing the chunksize to 8 * AES_BLOCK_SIZE, and tweaking
the loops to only process multiples of the chunk size, unless we are
handling the last chunk in the input stream.

Note that the skcipher walk API guarantees that a step in the walk never
returns less than 'chunksize' bytes if there are at least that many bytes
of input still available. However, it does *not* guarantee that those
steps produce an exact multiple of the chunk size.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/aesbs-glue.c | 68 +++++++++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 30 deletions(-)
-- 
2.7.4

diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/crypto/aesbs-glue.c
index d8e06de72ef3..938d1e1bf9a3 100644
--- a/arch/arm/crypto/aesbs-glue.c
+++ b/arch/arm/crypto/aesbs-glue.c
@@ -121,39 +121,26 @@ static int aesbs_cbc_encrypt(struct skcipher_request *req)
         return crypto_cbc_encrypt_walk(req, aesbs_encrypt_one);
 }
 
-static inline void aesbs_decrypt_one(struct crypto_skcipher *tfm,
-                                     const u8 *src, u8 *dst)
-{
-        struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
-
-        AES_decrypt(src, dst, &ctx->dec.rk);
-}
-
 static int aesbs_cbc_decrypt(struct skcipher_request *req)
 {
         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
         struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
         struct skcipher_walk walk;
-        unsigned int nbytes;
         int err;
 
-        for (err = skcipher_walk_virt(&walk, req, false);
-             (nbytes = walk.nbytes); err = skcipher_walk_done(&walk, nbytes)) {
-                u32 blocks = nbytes / AES_BLOCK_SIZE;
-                u8 *dst = walk.dst.virt.addr;
-                u8 *src = walk.src.virt.addr;
-                u8 *iv = walk.iv;
-
-                if (blocks >= 8) {
-                        kernel_neon_begin();
-                        bsaes_cbc_encrypt(src, dst, nbytes, &ctx->dec, iv);
-                        kernel_neon_end();
-                        nbytes %= AES_BLOCK_SIZE;
-                        continue;
-                }
+        err = skcipher_walk_virt(&walk, req, false);
+
+        while (walk.nbytes) {
+                unsigned int nbytes = walk.nbytes;
+
+                if (nbytes < walk.total)
+                        nbytes = round_down(nbytes, walk.chunksize);
 
-                nbytes = crypto_cbc_decrypt_blocks(&walk, tfm,
-                                                   aesbs_decrypt_one);
+                kernel_neon_begin();
+                bsaes_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+                                  nbytes, &ctx->dec, walk.iv);
+                kernel_neon_end();
+                err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
         }
         return err;
 }
@@ -186,6 +173,12 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
                 __be32 *ctr = (__be32 *)walk.iv;
                 u32 headroom = UINT_MAX - be32_to_cpu(ctr[3]);
 
+                if (walk.nbytes < walk.total) {
+                        blocks = round_down(blocks,
+                                            walk.chunksize / AES_BLOCK_SIZE);
+                        tail = walk.nbytes - blocks * AES_BLOCK_SIZE;
+                }
+
                 /* avoid 32 bit counter overflow in the NEON code */
                 if (unlikely(headroom < blocks)) {
                         blocks = headroom + 1;
@@ -198,6 +191,9 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
                 kernel_neon_end();
                 inc_be128_ctr(ctr, blocks);
 
+                if (tail > 0 && tail < AES_BLOCK_SIZE)
+                        break;
+
                 err = skcipher_walk_done(&walk, tail);
         }
         if (walk.nbytes) {
@@ -227,11 +223,16 @@ static int aesbs_xts_encrypt(struct skcipher_request *req)
         AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
 
         while (walk.nbytes) {
+                unsigned int nbytes = walk.nbytes;
+
+                if (nbytes < walk.total)
+                        nbytes = round_down(nbytes, walk.chunksize);
+
                 kernel_neon_begin();
                 bsaes_xts_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
-                                  walk.nbytes, &ctx->enc, walk.iv);
+                                  nbytes, &ctx->enc, walk.iv);
                 kernel_neon_end();
-                err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+                err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
         }
         return err;
 }
@@ -249,11 +250,16 @@ static int aesbs_xts_decrypt(struct skcipher_request *req)
         AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
 
         while (walk.nbytes) {
+                unsigned int nbytes = walk.nbytes;
+
+                if (nbytes < walk.total)
+                        nbytes = round_down(nbytes, walk.chunksize);
+
                 kernel_neon_begin();
                 bsaes_xts_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
-                                  walk.nbytes, &ctx->dec, walk.iv);
+                                  nbytes, &ctx->dec, walk.iv);
                 kernel_neon_end();
-                err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+                err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
         }
         return err;
 }
@@ -272,6 +278,7 @@ static struct skcipher_alg aesbs_algs[] = { {
         .min_keysize    = AES_MIN_KEY_SIZE,
         .max_keysize    = AES_MAX_KEY_SIZE,
         .ivsize         = AES_BLOCK_SIZE,
+        .chunksize      = 8 * AES_BLOCK_SIZE,
         .setkey         = aesbs_cbc_set_key,
         .encrypt        = aesbs_cbc_encrypt,
         .decrypt        = aesbs_cbc_decrypt,
@@ -289,7 +296,7 @@ static struct skcipher_alg aesbs_algs[] = { {
         .min_keysize    = AES_MIN_KEY_SIZE,
         .max_keysize    = AES_MAX_KEY_SIZE,
         .ivsize         = AES_BLOCK_SIZE,
-        .chunksize      = AES_BLOCK_SIZE,
+        .chunksize      = 8 * AES_BLOCK_SIZE,
         .setkey         = aesbs_ctr_set_key,
         .encrypt        = aesbs_ctr_encrypt,
         .decrypt        = aesbs_ctr_encrypt,
@@ -307,6 +314,7 @@ static struct skcipher_alg aesbs_algs[] = { {
         .min_keysize    = 2 * AES_MIN_KEY_SIZE,
         .max_keysize    = 2 * AES_MAX_KEY_SIZE,
         .ivsize         = AES_BLOCK_SIZE,
+        .chunksize      = 8 * AES_BLOCK_SIZE,
         .setkey         = aesbs_xts_set_key,
         .encrypt        = aesbs_xts_encrypt,
         .decrypt        = aesbs_xts_decrypt,
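
[Editor's note] For readers less familiar with the skcipher walk pattern the
loops above rely on, the sketch below shows the chunking logic in isolation:
every step except the last processes only whole multiples of the 8-block
chunk size and hands the remainder back to the walker, which re-presents it
on the next step. This is a minimal, self-contained user-space sketch under
stated assumptions, not the kernel code: the fixed 300-byte step, the
round_down_to() helper, and the printf() driver are illustrative stand-ins
for the real skcipher_walk machinery.

#include <stdio.h>

#define AES_BLOCK_SIZE  16
#define CHUNK_SIZE      (8 * AES_BLOCK_SIZE)    /* mirrors .chunksize above */

/* round x down to a multiple of m (stand-in for the kernel's round_down()) */
static unsigned int round_down_to(unsigned int x, unsigned int m)
{
        return x - (x % m);
}

int main(void)
{
        unsigned int total = 37 * AES_BLOCK_SIZE;       /* example request length */
        unsigned int remaining = total;

        while (remaining) {
                /* pretend the walker hands us at most 300 bytes per step */
                unsigned int step = remaining < 300 ? remaining : 300;
                unsigned int nbytes = step;

                /*
                 * Not the last step: trim to a whole number of 8-block
                 * chunks so the NEON code always sees full chunks.  The
                 * walk API guarantees step >= CHUNK_SIZE in this case.
                 */
                if (step < remaining)
                        nbytes = round_down_to(nbytes, CHUNK_SIZE);

                printf("step %u bytes -> process %u, return %u to the walker\n",
                       step, nbytes, step - nbytes);

                remaining -= nbytes;    /* unprocessed bytes reappear next step */
        }
        return 0;
}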