From patchwork Mon Apr 30 16:18:23 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 134713
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
 will.deacon@arm.com, Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH resend 03/10] crypto: arm64/aes-ccm - yield NEON after every
 block of input
Date: Mon, 30 Apr 2018 18:18:23 +0200
Message-Id: <20180430161830.14892-4-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-crypto.vger.kernel.org>
X-Mailing-List: linux-crypto@vger.kernel.org

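Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input. The MAC
is stored before each yield and reloaded afterwards, and the function
arguments are kept in callee-saved registers so that they survive the
yield.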

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 150 +++++++++++++-------
 1 file changed, 95 insertions(+), 55 deletions(-)
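
For reviewers less familiar with the yield pattern, here is a rough C
model of what the per-block loop does around a yield. It is only a
sketch under stated assumptions: struct fake_neon, fake_yield(),
toy_mix(), always_resched() and toy_mac_update() are hypothetical names
invented for illustration, the "cipher" is a toy mixing step rather
than AES, and the real code uses the if_will_cond_yield_neon /
do_cond_yield_neon / endif_yield_neon macros shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical stand-in for the NEON register holding the running MAC (v0). */
struct fake_neon {
	uint8_t v0[BLOCK_SIZE];
};

/* Hypothetical scheduler query: returns true when a reschedule is pending. */
typedef bool (*need_resched_fn)(void *ctx);

/* Model of a yield: another task may run, so the register contents are lost. */
void fake_yield(struct fake_neon *regs)
{
	memset(regs->v0, 0xAA, sizeof(regs->v0));
}

/* Toy mixing step standing in for the AES rounds. This is NOT real crypto. */
void toy_mix(uint8_t v[BLOCK_SIZE])
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		v[i] = (uint8_t)(v[i] * 5u + 1u + i);
}

/* Example query that always reports a pending reschedule. */
bool always_resched(void *ctx)
{
	(void)ctx;
	return true;
}

/*
 * Process 'blocks' full blocks of input. Between blocks, if a reschedule
 * is pending, store the MAC to memory, yield, and reload the MAC.
 * Returns the number of yields taken.
 */
unsigned int toy_mac_update(uint8_t mac[BLOCK_SIZE], const uint8_t *in,
			    size_t blocks, need_resched_fn need_resched,
			    void *ctx)
{
	struct fake_neon regs;
	unsigned int yields = 0;

	memcpy(regs.v0, mac, BLOCK_SIZE);		/* load mac */

	for (size_t n = 0; n < blocks; n++) {
		for (size_t i = 0; i < BLOCK_SIZE; i++)
			regs.v0[i] ^= in[n * BLOCK_SIZE + i];
		toy_mix(regs.v0);

		/* Only consider yielding while more input remains. */
		if (n + 1 < blocks && need_resched && need_resched(ctx)) {
			memcpy(mac, regs.v0, BLOCK_SIZE);	/* store mac */
			fake_yield(&regs);
			memcpy(regs.v0, mac, BLOCK_SIZE);	/* reload mac */
			yields++;
		}
	}

	memcpy(mac, regs.v0, BLOCK_SIZE);		/* store mac */
	return yields;
}
```

The point of the model is that the result must not depend on whether or
how often a yield occurs: anything live in NEON registers has to be
written back to memory first and reloaded afterwards. In the assembly,
the same reasoning is presumably why the arguments move from x0-x6 into
x19 and up, since the yield involves function calls that may clobber
caller-saved registers.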

-- 
2.17.0

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index e3a375c4cb83..88f5aef7934c 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -19,24 +19,33 @@
 	 *			     u32 *macp, u8 const rk[], u32 rounds);
 	 */
 ENTRY(ce_aes_ccm_auth_data)
-	ldr	w8, [x3]			/* leftover from prev round? */
+	frame_push	7
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+
+	ldr	w25, [x22]			/* leftover from prev round? */
 	ld1	{v0.16b}, [x0]			/* load mac */
-	cbz	w8, 1f
-	sub	w8, w8, #16
+	cbz	w25, 1f
+	sub	w25, w25, #16
 	eor	v1.16b, v1.16b, v1.16b
-0:	ldrb	w7, [x1], #1			/* get 1 byte of input */
-	subs	w2, w2, #1
-	add	w8, w8, #1
+0:	ldrb	w7, [x20], #1			/* get 1 byte of input */
+	subs	w21, w21, #1
+	add	w25, w25, #1
 	ins	v1.b[0], w7
 	ext	v1.16b, v1.16b, v1.16b, #1	/* rotate in the input bytes */
 	beq	8f				/* out of input? */
-	cbnz	w8, 0b
+	cbnz	w25, 0b
 	eor	v0.16b, v0.16b, v1.16b
-1:	ld1	{v3.4s}, [x4]			/* load first round key */
-	prfm	pldl1strm, [x1]
-	cmp	w5, #12				/* which key size? */
-	add	x6, x4, #16
-	sub	w7, w5, #2			/* modified # of rounds */
+1:	ld1	{v3.4s}, [x23]			/* load first round key */
+	prfm	pldl1strm, [x20]
+	cmp	w24, #12			/* which key size? */
+	add	x6, x23, #16
+	sub	w7, w24, #2			/* modified # of rounds */
 	bmi	2f
 	bne	5f
 	mov	v5.16b, v3.16b
@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
 	ld1	{v5.4s}, [x6], #16		/* load next round key */
 	bpl	3b
 	aese	v0.16b, v4.16b
-	subs	w2, w2, #16			/* last data? */
+	subs	w21, w21, #16			/* last data? */
 	eor	v0.16b, v0.16b, v5.16b		/* final round */
 	bmi	6f
-	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	ld1	{v1.16b}, [x20], #16		/* load next input block */
 	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
-	bne	1b
-6:	st1	{v0.16b}, [x0]			/* store mac */
+	beq	6f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x19]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x19]			/* reload mac */
+	endif_yield_neon
+
+	b	1b
+6:	st1	{v0.16b}, [x19]			/* store mac */
 	beq	10f
-	adds	w2, w2, #16
+	adds	w21, w21, #16
 	beq	10f
-	mov	w8, w2
-7:	ldrb	w7, [x1], #1
+	mov	w25, w21
+7:	ldrb	w7, [x20], #1
 	umov	w6, v0.b[0]
 	eor	w6, w6, w7
-	strb	w6, [x0], #1
-	subs	w2, w2, #1
+	strb	w6, [x19], #1
+	subs	w21, w21, #1
 	beq	10f
 	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
 	b	7b
-8:	mov	w7, w8
-	add	w8, w8, #16
+8:	mov	w7, w25
+	add	w25, w25, #16
 9:	ext	v1.16b, v1.16b, v1.16b, #1
 	adds	w7, w7, #1
 	bne	9b
 	eor	v0.16b, v0.16b, v1.16b
-	st1	{v0.16b}, [x0]
-10:	str	w8, [x3]
+	st1	{v0.16b}, [x19]
+10:	str	w25, [x22]
+
+	frame_pop
 	ret
 ENDPROC(ce_aes_ccm_auth_data)
 
@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
 ENDPROC(ce_aes_ccm_final)
 
 	.macro	aes_ccm_do_crypt,enc
-	ldr	x8, [x6, #8]			/* load lower ctr */
-	ld1	{v0.16b}, [x5]			/* load mac */
-CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
+	frame_push	8
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+	mov	x25, x6
+
+	ldr	x26, [x25, #8]			/* load lower ctr */
+	ld1	{v0.16b}, [x24]			/* load mac */
+CPU_LE(	rev	x26, x26		)	/* keep swabbed ctr in reg */
 0:	/* outer loop */
-	ld1	{v1.8b}, [x6]			/* load upper ctr */
-	prfm	pldl1strm, [x1]
-	add	x8, x8, #1
-	rev	x9, x8
-	cmp	w4, #12				/* which key size? */
-	sub	w7, w4, #2			/* get modified # of rounds */
+	ld1	{v1.8b}, [x25]			/* load upper ctr */
+	prfm	pldl1strm, [x20]
+	add	x26, x26, #1
+	rev	x9, x26
+	cmp	w23, #12			/* which key size? */
+	sub	w7, w23, #2			/* get modified # of rounds */
 	ins	v1.d[1], x9			/* no carry in lower ctr */
-	ld1	{v3.4s}, [x3]			/* load first round key */
-	add	x10, x3, #16
+	ld1	{v3.4s}, [x22]			/* load first round key */
+	add	x10, x22, #16
 	bmi	1f
 	bne	4f
 	mov	v5.16b, v3.16b
@@ -165,9 +194,9 @@ CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
 	bpl	2b
 	aese	v0.16b, v4.16b
 	aese	v1.16b, v4.16b
-	subs	w2, w2, #16
-	bmi	6f				/* partial block? */
-	ld1	{v2.16b}, [x1], #16		/* load next input block */
+	subs	w21, w21, #16
+	bmi	7f				/* partial block? */
+	ld1	{v2.16b}, [x20], #16		/* load next input block */
 	.if	\enc == 1
 	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
 	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
@@ -176,18 +205,29 @@ CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
 	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
 	.endif
 	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
-	st1	{v1.16b}, [x0], #16		/* write output block */
-	bne	0b
-CPU_LE(	rev	x8, x8			)
-	st1	{v0.16b}, [x5]			/* store mac */
-	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
-5:	ret
-
-6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
+	st1	{v1.16b}, [x19], #16		/* write output block */
+	beq	5f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x24]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x24]			/* reload mac */
+	endif_yield_neon
+
+	b	0b
+5:
+CPU_LE(	rev	x26, x26		)
+	st1	{v0.16b}, [x24]			/* store mac */
+	str	x26, [x25, #8]			/* store lsb end of ctr (BE) */
+
+6:	frame_pop
+	ret
+
+7:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
 	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
-	st1	{v0.16b}, [x5]			/* store mac */
-	add	w2, w2, #16			/* process partial tail block */
-7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	st1	{v0.16b}, [x24]			/* store mac */
+	add	w21, w21, #16			/* process partial tail block */
+8:	ldrb	w9, [x20], #1			/* get 1 byte of input */
 	umov	w6, v1.b[0]			/* get top crypted ctr byte */
 	umov	w7, v0.b[0]			/* get top mac byte */
 	.if	\enc == 1
@@ -197,13 +237,13 @@ CPU_LE(	rev	x8, x8			)
 	eor	w9, w9, w6
 	eor	w7, w7, w9
 	.endif
-	strb	w9, [x0], #1			/* store out byte */
-	strb	w7, [x5], #1			/* store mac byte */
-	subs	w2, w2, #1
-	beq	5b
+	strb	w9, [x19], #1			/* store out byte */
+	strb	w7, [x24], #1			/* store mac byte */
+	subs	w21, w21, #1
+	beq	6b
 	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
 	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
-	b	7b
+	b	8b
 	.endm
 
 	/*