From patchwork Tue Jan 17 15:22:37 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 91680
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH 10/10] crypto: arm64/aes - replace scalar fallback with
 plain NEON fallback
Date: Tue, 17 Jan 2017 15:22:37 +0000
Message-Id: <1484666557-31458-11-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1484666557-31458-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1484666557-31458-1-git-send-email-ard.biesheuvel@linaro.org>
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-crypto.vger.kernel.org>
X-Mailing-List: linux-crypto@vger.kernel.org

The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.

The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low-end cores (which is what
the NEON implementations target, since high-end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig           |  2 +-
 arch/arm64/crypto/aes-neonbs-glue.c | 38 ++++++++++++++------
 2 files changed, 29 insertions(+), 11 deletions(-)
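
Note on why CBC encryption needs a fallback: the commit message calls it
strictly sequential. The sketch below illustrates that dependency in
user-space C. It is not kernel code and is not part of the patch. The
toy_* names are hypothetical, and toy_encrypt_block() is an insecure
stand-in for AES, chosen only because it is invertible. Encryption must
finish block i before it can start block i+1, whereas every decryption
step reads only ciphertext and the IV, so a driver could batch decryption
8 blocks at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 16 /* AES block size in bytes */

/* Hypothetical stand-in for a block cipher: NOT AES and not secure. */
static void toy_encrypt_block(uint8_t out[BLK], const uint8_t in[BLK],
			      const uint8_t key[BLK])
{
	for (int i = 0; i < BLK; i++)
		out[i] = (uint8_t)((in[i] ^ key[i]) + 1);
}

/* Inverse of toy_encrypt_block(). */
static void toy_decrypt_block(uint8_t out[BLK], const uint8_t in[BLK],
			      const uint8_t key[BLK])
{
	for (int i = 0; i < BLK; i++)
		out[i] = (uint8_t)(in[i] - 1) ^ key[i];
}

/*
 * CBC encryption: C[b] = E(P[b] ^ C[b-1]), with C[-1] = IV.
 * Each iteration consumes the ciphertext produced by the previous one,
 * so the blocks cannot be processed in parallel. iv is updated in place.
 */
static void toy_cbc_encrypt(uint8_t *dst, const uint8_t *src, int blocks,
			    const uint8_t key[BLK], uint8_t iv[BLK])
{
	uint8_t x[BLK];

	for (int b = 0; b < blocks; b++) {
		for (int i = 0; i < BLK; i++)
			x[i] = src[b * BLK + i] ^ iv[i];
		toy_encrypt_block(dst + b * BLK, x, key);
		memcpy(iv, dst + b * BLK, BLK); /* chain into next block */
	}
}

/*
 * CBC decryption: P[b] = D(C[b]) ^ C[b-1].
 * Every iteration reads only src and iv, so the iterations are
 * independent of each other. This sketch requires dst != src.
 */
static void toy_cbc_decrypt(uint8_t *dst, const uint8_t *src, int blocks,
			    const uint8_t key[BLK], const uint8_t iv[BLK])
{
	assert(dst != src);

	for (int b = 0; b < blocks; b++) {
		const uint8_t *prev = b ? src + (b - 1) * BLK : iv;

		toy_decrypt_block(dst + b * BLK, src + b * BLK, key);
		for (int i = 0; i < BLK; i++)
			dst[b * BLK + i] ^= prev[i];
	}
}

/* Returns 1 if four blocks survive an encrypt/decrypt round trip. */
static int toy_cbc_roundtrip_ok(void)
{
	uint8_t key[BLK], iv[BLK], iv_work[BLK];
	uint8_t pt[4 * BLK], ct[4 * BLK], back[4 * BLK];

	for (int i = 0; i < BLK; i++) {
		key[i] = (uint8_t)(0xA5 ^ i);
		iv[i] = (uint8_t)(i * 7);
	}
	for (int i = 0; i < 4 * BLK; i++)
		pt[i] = (uint8_t)i;

	memcpy(iv_work, iv, BLK); /* encryption overwrites its IV copy */
	toy_cbc_encrypt(ct, pt, 4, key, iv_work);
	toy_cbc_decrypt(back, ct, 4, key, iv);
	return memcmp(pt, back, sizeof(pt)) == 0;
}
```

Calling toy_cbc_roundtrip_ok() from a test harness exercises both
directions. The in-place IV update in toy_cbc_encrypt() mirrors the way
the patch passes walk.iv through successive calls, although the real
neon_aes_cbc_encrypt() routine is assembly and its internals are not
shown here.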

-- 
2.7.4

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 5de75c3dcbd4..bed7feddfeed 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -86,7 +86,7 @@ config CRYPTO_AES_ARM64_BS
 	tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_BLKCIPHER
-	select CRYPTO_AES_ARM64
+	select CRYPTO_AES_ARM64_NEON_BLK
 	select CRYPTO_SIMD
 
 endif
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index 323dd76ae5f0..863e436ecf89 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -10,7 +10,6 @@
 
 #include <asm/neon.h>
 #include <crypto/aes.h>
-#include <crypto/cbc.h>
 #include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/xts.h>
@@ -42,7 +41,12 @@ asmlinkage void aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[],
 asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
 				  int rounds, int blocks, u8 iv[]);
 
-asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
+/* borrowed from aes-neon-blk.ko */
+asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
+				     int rounds, int blocks, int first);
+asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[],
+				     int rounds, int blocks, u8 iv[],
+				     int first);
 
 struct aesbs_ctx {
 	u8	rk[13 * (8 * AES_BLOCK_SIZE) + 32];
@@ -140,16 +144,28 @@ static int aesbs_cbc_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
 	return 0;
 }
 
-static void cbc_encrypt_one(struct crypto_skcipher *tfm, const u8 *src, u8 *dst)
+static int cbc_encrypt(struct skcipher_request *req)
 {
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err, first = 1;
 
-	__aes_arm64_encrypt(ctx->enc, dst, src, ctx->key.rounds);
-}
+	err = skcipher_walk_virt(&walk, req, true);
 
-static int cbc_encrypt(struct skcipher_request *req)
-{
-	return crypto_cbc_encrypt_walk(req, cbc_encrypt_one);
+	kernel_neon_begin();
+	while (walk.nbytes >= AES_BLOCK_SIZE) {
+		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+		/* fall back to the non-bitsliced NEON implementation */
+		neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				     ctx->enc, ctx->key.rounds, blocks, walk.iv,
+				     first);
+		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+		first = 0;
+	}
+	kernel_neon_end();
+	return err;
 }
 
 static int cbc_decrypt(struct skcipher_request *req)
@@ -254,9 +270,11 @@ static int __xts_crypt(struct skcipher_request *req,
 
 	err = skcipher_walk_virt(&walk, req, true);
 
-	__aes_arm64_encrypt(ctx->twkey, walk.iv, walk.iv, ctx->key.rounds);
-
 	kernel_neon_begin();
+
+	neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey,
+			     ctx->key.rounds, 1, 1);
+
 	while (walk.nbytes >= AES_BLOCK_SIZE) {
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;