From patchwork Sat Aug 4 18:46:24 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 143449
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, jerome.forissier@linaro.org, jens.wiklander@linaro.org, Ard Biesheuvel
Subject: [PATCH 1/2] crypto: arm64/ghash-ce - replace NEON yield check with block limit
Date: Sat, 4 Aug 2018 20:46:24 +0200
Message-Id: <20180804184625.28523-2-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
References: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores
with fast crypto instructions and comparatively slow memory accesses.
On algorithms such as GHASH, which executes at ~1 cycle per byte on cores
that implement support for 64-bit polynomial multiplication, there is
really no need to check the TIF_NEED_RESCHED flag particularly often, and
so we can remove the NEON yield check from the assembler routines.
However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take
arbitrary input lengths, so some sanity check is needed to ensure that we
don't hog the CPU for excessive amounts of time. So let's simply cap the
maximum input size that is processed in one go to 64 KB.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-core.S | 39 ++++++--------------
 arch/arm64/crypto/ghash-ce-glue.c | 16 ++++++--
 2 files changed, 23 insertions(+), 32 deletions(-)
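For context (not part of the patch): at the ~1 cycle per byte quoted
above, the 64 KB cap bounds the cost of each chunk roughly as

    SZ_64K = 65536 bytes x ~1 cycle/byte ~= 65536 cycles ~= 65 us at 1 GHz

so each non-preemptible NEON section stays short even without the
per-block TIF_NEED_RESCHED check that this patch removes.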
--
2.18.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 913e49932ae6..344811c6a0ca 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -213,31 +213,23 @@
 	.endm

 	.macro		__pmull_ghash, pn
-	frame_push	5
-
-	mov		x19, x0
-	mov		x20, x1
-	mov		x21, x2
-	mov		x22, x3
-	mov		x23, x4
-
-0:	ld1		{SHASH.2d}, [x22]
-	ld1		{XL.2d}, [x20]
+	ld1		{SHASH.2d}, [x3]
+	ld1		{XL.2d}, [x1]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	__pmull_pre_\pn

 	/* do the head block first, if supplied */
-	cbz		x23, 1f
-	ld1		{T1.2d}, [x23]
-	mov		x23, xzr
-	b		2f
+	cbz		x4, 0f
+	ld1		{T1.2d}, [x4]
+	mov		x4, xzr
+	b		1f

-1:	ld1		{T1.2d}, [x21], #16
-	sub		w19, w19, #1
+0:	ld1		{T1.2d}, [x2], #16
+	sub		w0, w0, #1

-2:	/* multiply XL by SHASH in GF(2^128) */
+1:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)

 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -259,18 +251,9 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b

-	cbz		w19, 3f
-
-	if_will_cond_yield_neon
-	st1		{XL.2d}, [x20]
-	do_cond_yield_neon
-	b		0b
-	endif_yield_neon
-
-	b		1b
+	cbnz		w0, 0b

-3:	st1		{XL.2d}, [x20]
-	frame_pop
+	st1		{XL.2d}, [x1]
 	ret
 	.endm

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 88e3d93fa7c7..03ce71ea81a2 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -113,6 +113,9 @@ static void ghash_do_update(int blocks, u64 dg[], const char *src,
 	}
 }

+/* avoid hogging the CPU for too long */
+#define MAX_BLOCKS	(SZ_64K / GHASH_BLOCK_SIZE)
+
 static int ghash_update(struct shash_desc *desc, const u8 *src,
 			unsigned int len)
 {
@@ -136,11 +139,16 @@ static int ghash_update(struct shash_desc *desc, const u8 *src,
 		blocks = len / GHASH_BLOCK_SIZE;
 		len %= GHASH_BLOCK_SIZE;

-		ghash_do_update(blocks, ctx->digest, src, key,
-				partial ? ctx->buf : NULL);
+		do {
+			int chunk = min(blocks, MAX_BLOCKS);
+
+			ghash_do_update(chunk, ctx->digest, src, key,
+					partial ? ctx->buf : NULL);

-		src += blocks * GHASH_BLOCK_SIZE;
-		partial = 0;
+			blocks -= chunk;
+			src += chunk * GHASH_BLOCK_SIZE;
+			partial = 0;
+		} while (unlikely(blocks > 0));
 	}
 	if (len)
 		memcpy(ctx->buf + partial, src, len);

From patchwork Sat Aug 4 18:46:25 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 143450
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, jerome.forissier@linaro.org, jens.wiklander@linaro.org, Ard Biesheuvel
Subject: [PATCH 2/2] crypto: arm64/ghash-ce - implement 4-way aggregation
Date: Sat, 4 Aug 2018 20:46:25 +0200
Message-Id: <20180804184625.28523-3-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
References: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This more than
doubles the performance, from 2.4 cycles per byte to 1.1 cpb on
Cortex-A53.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-core.S | 122 +++++++++++++++++---
 arch/arm64/crypto/ghash-ce-glue.c |  71 ++++++------
 2 files changed, 142 insertions(+), 51 deletions(-)
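For context (not part of the patch): 4-way aggregation relies on the usual
GHASH folding identity. With X0 the running digest, C1..C4 four input
blocks and H the hash key, the four serial steps

    X1 = (X0 ^ C1) . H
    X2 = (X1 ^ C2) . H
    X3 = (X2 ^ C3) . H
    X4 = (X3 ^ C4) . H

expand to

    X4 = (X0 ^ C1) . H^4  ^  C2 . H^3  ^  C3 . H^2  ^  C4 . H

so the four GF(2^128) multiplications are independent once H^2, H^3 and
H^4 are precomputed at setkey time, and only one reduction is needed per
four blocks; this is what the HH/HH3/HH4/HH34 values and the single
__pmull_reduce_p64 in the 4-way loop below correspond to.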
--
2.18.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 344811c6a0ca..1b319b716d5e 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -46,6 +46,19 @@
 	ss3		.req	v26
 	ss4		.req	v27

+	XL2		.req	v8
+	XM2		.req	v9
+	XH2		.req	v10
+	XL3		.req	v11
+	XM3		.req	v12
+	XH3		.req	v13
+	TT3		.req	v14
+	TT4		.req	v15
+	HH		.req	v16
+	HH3		.req	v17
+	HH4		.req	v18
+	HH34		.req	v19
+
 	.text
 	.arch		armv8-a+crypto

@@ -134,11 +147,25 @@
 	.endm

 	.macro		__pmull_pre_p64
+	add		x8, x3, #16
+	ld1		{HH.2d-HH4.2d}, [x8]
+
+	trn1		SHASH2.2d, SHASH.2d, HH.2d
+	trn2		T1.2d, SHASH.2d, HH.2d
+	eor		SHASH2.16b, SHASH2.16b, T1.16b
+
+	trn1		HH34.2d, HH3.2d, HH4.2d
+	trn2		T1.2d, HH3.2d, HH4.2d
+	eor		HH34.16b, HH34.16b, T1.16b
+
 	movi		MASK.16b, #0xe1
 	shl		MASK.2d, MASK.2d, #57
 	.endm

 	.macro		__pmull_pre_p8
+	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
+	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
+
 	// k00_16 := 0x0000000000000000_000000000000ffff
 	// k32_48 := 0x00000000ffffffff_0000ffffffffffff
 	movi		k32_48.2d, #0xffffffff
@@ -215,8 +242,6 @@
 	.macro		__pmull_ghash, pn
 	ld1		{SHASH.2d}, [x3]
 	ld1		{XL.2d}, [x1]
-	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	__pmull_pre_\pn

@@ -224,12 +249,79 @@
 	cbz		x4, 0f
 	ld1		{T1.2d}, [x4]
 	mov		x4, xzr
-	b		1f
+	b		3f
+
+0:	.ifc		\pn, p64
+	tbnz		w0, #0, 2f		// skip until #blocks is a
+	tbnz		w0, #1, 2f		// round multiple of 4
+
+1:	ld1		{XM3.16b-TT4.16b}, [x2], #64
+
+	sub		w0, w0, #4
+
+	rev64		T1.16b, XM3.16b
+	rev64		T2.16b, XH3.16b
+	rev64		TT4.16b, TT4.16b
+	rev64		TT3.16b, TT3.16b
+
+	ext		IN1.16b, TT4.16b, TT4.16b, #8
+	ext		XL3.16b, TT3.16b, TT3.16b, #8
+
+	eor		TT4.16b, TT4.16b, IN1.16b
+	pmull2		XH2.1q, SHASH.2d, IN1.2d	// a1 * b1
+	pmull		XL2.1q, SHASH.1d, IN1.1d	// a0 * b0
+	pmull		XM2.1q, SHASH2.1d, TT4.1d	// (a1 + a0)(b1 + b0)
+
+	eor		TT3.16b, TT3.16b, XL3.16b
+	pmull2		XH3.1q, HH.2d, XL3.2d		// a1 * b1
+	pmull		XL3.1q, HH.1d, XL3.1d		// a0 * b0
+	pmull2		XM3.1q, SHASH2.2d, TT3.2d	// (a1 + a0)(b1 + b0)
+
+	ext		IN1.16b, T2.16b, T2.16b, #8
+	eor		XL2.16b, XL2.16b, XL3.16b
+	eor		XH2.16b, XH2.16b, XH3.16b
+	eor		XM2.16b, XM2.16b, XM3.16b
+
+	eor		T2.16b, T2.16b, IN1.16b
+	pmull2		XH3.1q, HH3.2d, IN1.2d		// a1 * b1
+	pmull		XL3.1q, HH3.1d, IN1.1d		// a0 * b0
+	pmull		XM3.1q, HH34.1d, T2.1d		// (a1 + a0)(b1 + b0)

-0:	ld1		{T1.2d}, [x2], #16
+	eor		XL2.16b, XL2.16b, XL3.16b
+	eor		XH2.16b, XH2.16b, XH3.16b
+	eor		XM2.16b, XM2.16b, XM3.16b
+
+	ext		IN1.16b, T1.16b, T1.16b, #8
+	ext		TT3.16b, XL.16b, XL.16b, #8
+	eor		XL.16b, XL.16b, IN1.16b
+	eor		T1.16b, T1.16b, TT3.16b
+
+	pmull2		XH.1q, HH4.2d, XL.2d		// a1 * b1
+	eor		T1.16b, T1.16b, XL.16b
+	pmull		XL.1q, HH4.1d, XL.1d		// a0 * b0
+	pmull2		XM.1q, HH34.2d, T1.2d		// (a1 + a0)(b1 + b0)
+
+	eor		XL.16b, XL.16b, XL2.16b
+	eor		XH.16b, XH.16b, XH2.16b
+	eor		XM.16b, XM.16b, XM2.16b
+
+	eor		T2.16b, XL.16b, XH.16b
+	ext		T1.16b, XL.16b, XH.16b, #8
+	eor		XM.16b, XM.16b, T2.16b
+
+	__pmull_reduce_p64
+
+	eor		T2.16b, T2.16b, XH.16b
+	eor		XL.16b, XL.16b, T2.16b
+
+	cbz		w0, 5f
+	b		1b
+	.endif
+
+2:	ld1		{T1.2d}, [x2], #16
 	sub		w0, w0, #1

-1:	/* multiply XL by SHASH in GF(2^128) */
+3:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)

 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -242,7 +334,7 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	__pmull_\pn	XL, XL, SHASH		// a0 * b0
 	__pmull_\pn	XM, T1, SHASH2		// (a1 + a0)(b1 + b0)

-	eor		T2.16b, XL.16b, XH.16b
+4:	eor		T2.16b, XL.16b, XH.16b
 	ext		T1.16b, XL.16b, XH.16b, #8
 	eor		XM.16b, XM.16b, T2.16b

@@ -253,7 +345,7 @@ CPU_LE(	rev64		T1.16b, T1.16b	)

 	cbnz		w0, 0b

-	st1		{XL.2d}, [x1]
+5:	st1		{XL.2d}, [x1]
 	ret
 	.endm

@@ -269,14 +361,10 @@ ENTRY(pmull_ghash_update_p8)
 	__pmull_ghash	p8
ENDPROC(pmull_ghash_update_p8)

-	KS0		.req	v8
-	KS1		.req	v9
-	INP0		.req	v10
-	INP1		.req	v11
-	HH		.req	v12
-	XL2		.req	v13
-	XM2		.req	v14
-	XH2		.req	v15
+	KS0		.req	v12
+	KS1		.req	v13
+	INP0		.req	v14
+	INP1		.req	v15

 	.macro		load_round_keys, rounds, rk
 	cmp		\rounds, #12
@@ -310,8 +398,8 @@ ENDPROC(pmull_ghash_update_p8)
 	.endm

 	.macro		pmull_gcm_do_crypt, enc
-	ld1		{HH.2d}, [x4], #16
-	ld1		{SHASH.2d}, [x4]
+	ld1		{SHASH.2d}, [x4], #16
+	ld1		{HH.2d}, [x4]
 	ld1		{XL.2d}, [x1]
 	ldr		x8, [x5, #8]		// load lower counter

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 03ce71ea81a2..08b49fd621cb 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -33,9 +33,12 @@ MODULE_ALIAS_CRYPTO("ghash");
 #define GCM_IV_SIZE		12

 struct ghash_key {
-	u64 a;
-	u64 b;
-	be128 k;
+	u64			h[2];
+	u64			h2[2];
+	u64			h3[2];
+	u64			h4[2];
+
+	be128			k;
 };

 struct ghash_desc_ctx {
@@ -46,7 +49,6 @@ struct ghash_desc_ctx {

 struct gcm_aes_ctx {
 	struct crypto_aes_ctx	aes_key;
-	u64			h2[2];
 	struct ghash_key	ghash_key;
 };

@@ -63,11 +65,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
 				  const char *head);

 asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
-				  const u8 src[], u64 const *k, u8 ctr[],
-				  u32 const rk[], int rounds, u8 ks[]);
+				  const u8 src[], struct ghash_key const *k,
+				  u8 ctr[], u32 const rk[], int rounds,
+				  u8 ks[]);

 asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
-				  const u8 src[], u64 const *k,
+				  const u8 src[], struct ghash_key const *k,
 				  u8 ctr[], u32 const rk[], int rounds);

 asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
@@ -174,23 +177,36 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
 	return 0;
 }

+static void ghash_reflect(u64 h[], const be128 *k)
+{
+	u64 carry = be64_to_cpu(k->a) & BIT(63) ? 1 : 0;
+
+	h[0] = (be64_to_cpu(k->b) << 1) | carry;
+	h[1] = (be64_to_cpu(k->a) << 1) | (be64_to_cpu(k->b) >> 63);
+
+	if (carry)
+		h[1] ^= 0xc200000000000000UL;
+}
+
 static int __ghash_setkey(struct ghash_key *key, const u8 *inkey,
 			  unsigned int keylen)
 {
-	u64 a, b;
+	be128 h;

 	/* needed for the fallback */
 	memcpy(&key->k, inkey, GHASH_BLOCK_SIZE);

-	/* perform multiplication by 'x' in GF(2^128) */
-	b = get_unaligned_be64(inkey);
-	a = get_unaligned_be64(inkey + 8);
+	ghash_reflect(key->h, &key->k);
+
+	h = key->k;
+	gf128mul_lle(&h, &key->k);
+	ghash_reflect(key->h2, &h);

-	key->a = (a << 1) | (b >> 63);
-	key->b = (b << 1) | (a >> 63);
+	gf128mul_lle(&h, &key->k);
+	ghash_reflect(key->h3, &h);

-	if (b >> 63)
-		key->b ^= 0xc200000000000000UL;
+	gf128mul_lle(&h, &key->k);
+	ghash_reflect(key->h4, &h);

 	return 0;
 }
@@ -241,8 +257,7 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *inkey,
 		      unsigned int keylen)
 {
 	struct gcm_aes_ctx *ctx = crypto_aead_ctx(tfm);
-	be128 h1, h2;
-	u8 *key = (u8 *)&h1;
+	u8 key[GHASH_BLOCK_SIZE];
 	int ret;

 	ret = crypto_aes_expand_key(&ctx->aes_key, inkey, keylen);
@@ -254,19 +269,7 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *inkey,
 	__aes_arm64_encrypt(ctx->aes_key.key_enc, key, (u8[AES_BLOCK_SIZE]){},
 			    num_rounds(&ctx->aes_key));

-	__ghash_setkey(&ctx->ghash_key, key, sizeof(be128));
-
-	/* calculate H^2 (used for 2-way aggregation) */
-	h2 = h1;
-	gf128mul_lle(&h2, &h1);
-
-	ctx->h2[0] = (be64_to_cpu(h2.b) << 1) | (be64_to_cpu(h2.a) >> 63);
-	ctx->h2[1] = (be64_to_cpu(h2.a) << 1) | (be64_to_cpu(h2.b) >> 63);
-
-	if (be64_to_cpu(h2.a) >> 63)
-		ctx->h2[1] ^= 0xc200000000000000UL;
-
-	return 0;
+	return __ghash_setkey(&ctx->ghash_key, key, sizeof(be128));
 }

 static int gcm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
@@ -402,8 +405,8 @@ static int gcm_encrypt(struct aead_request *req)

 			kernel_neon_begin();
 			pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
-					  walk.src.virt.addr, ctx->h2, iv,
-					  rk, nrounds, ks);
+					  walk.src.virt.addr, &ctx->ghash_key,
+					  iv, rk, nrounds, ks);
 			kernel_neon_end();

 			err = skcipher_walk_done(&walk,
@@ -513,8 +516,8 @@ static int gcm_decrypt(struct aead_request *req)

 			kernel_neon_begin();
 			pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
-					  walk.src.virt.addr, ctx->h2, iv,
-					  rk, nrounds);
+					  walk.src.virt.addr, &ctx->ghash_key,
+					  iv, rk, nrounds);

 			/* check if this is the final iteration of the loop */
 			if (rem < (2 * AES_BLOCK_SIZE)) {