From patchwork Wed Aug 22 15:28:55 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 144839
Delivered-To: patch@linaro.org
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, herbert@gondor.apana.org.au,
    Ard Biesheuvel
Subject: [PATCH] crypto: arm/ghash-ce - implement support for 4-way aggregation
Date: Wed, 22 Aug 2018 16:28:55 +0100
Message-Id: <20180822152855.7699-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.18.0
X-Mailing-List: linux-crypto@vger.kernel.org

Speed up the GHASH algorithm based on 64-bit polynomial multiplication
by adding support for 4-way aggregation. This improves throughput by
~60% on Cortex-A53, from 1.70 cycles per byte to 1.05 cycles per byte.
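The idea behind 4-way aggregation: the serial GHASH recurrence
X_i = (X_{i-1} xor B_i) * H can be unrolled over four blocks into
X_4 = (X_0 xor B_1)*H^4 xor B_2*H^3 xor B_3*H^2 xor B_4*H, so the four
per-block multiplications become independent and their reductions can be
merged. The sketch below is illustrative only, not the kernel code: it
models GF(2^128) elements as plain Python integers under GCM's reduction
polynomial, abstracting away the bit-reflection the real implementation
handles, and all names (gf128_mul, ghash_serial, ghash_4way) are invented.

```python
# x^128 + x^7 + x^2 + x + 1, the GCM field polynomial
POLY = (1 << 128) | (1 << 7) | (1 << 2) | (1 << 1) | 1

def gf128_mul(a, b):
    """Carryless multiply, then reduce modulo POLY."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    # fold bits above position 127 back down
    for i in range(p.bit_length() - 1, 127, -1):
        if (p >> i) & 1:
            p ^= POLY << (i - 128)
    return p

def ghash_serial(h, blocks):
    """One dependent multiply per block: X = (X xor B_i) * H."""
    x = 0
    for b in blocks:
        x = gf128_mul(x ^ b, h)
    return x

def ghash_4way(h, blocks):
    """Fold four blocks per iteration using precomputed powers
    H^2, H^3, H^4 -- the aggregation this patch implements."""
    h2 = gf128_mul(h, h)
    h3 = gf128_mul(h2, h)
    h4 = gf128_mul(h3, h)
    x = 0
    tail = len(blocks) % 4
    for i in range(0, len(blocks) - tail, 4):
        b0, b1, b2, b3 = blocks[i:i + 4]
        x = (gf128_mul(x ^ b0, h4) ^ gf128_mul(b1, h3) ^
             gf128_mul(b2, h2) ^ gf128_mul(b3, h))
    for b in blocks[len(blocks) - tail:]:  # leftover blocks, serial
        x = gf128_mul(x ^ b, h)
    return x
```

Both functions compute the same digest; the aggregated form simply
exposes four multiplications per iteration to the pipeline, which is
where the Cortex-A53 speedup comes from.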
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig         |   1 +
 arch/arm/crypto/ghash-ce-core.S | 101 ++++++++++++++++++++++++++++++--
 arch/arm/crypto/ghash-ce-glue.c |  38 ++++++++----
 3 files changed, 124 insertions(+), 16 deletions(-)

-- 
2.18.0

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 925d1364727a..07dd12efeea4 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -99,6 +99,7 @@ config CRYPTO_GHASH_ARM_CE
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
 	select CRYPTO_CRYPTD
+	select CRYPTO_GF128MUL
 	help
 	  Use an implementation of GHASH (used by the GCM AEAD chaining mode)
 	  that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
diff --git a/arch/arm/crypto/ghash-ce-core.S b/arch/arm/crypto/ghash-ce-core.S
index 2f78c10b1881..c982c63877a6 100644
--- a/arch/arm/crypto/ghash-ce-core.S
+++ b/arch/arm/crypto/ghash-ce-core.S
@@ -63,6 +63,27 @@
 	k48		.req	d31
 	SHASH2_p64	.req	d31
 
+	HH		.req	q10
+	HH3		.req	q11
+	HH4		.req	q12
+	HH34		.req	q13
+
+	HH_L		.req	d20
+	HH_H		.req	d21
+	HH3_L		.req	d22
+	HH3_H		.req	d23
+	HH4_L		.req	d24
+	HH4_H		.req	d25
+	HH34_L		.req	d26
+	HH34_H		.req	d27
+	SHASH2_H	.req	d29
+
+	XL2		.req	q5
+	XM2		.req	q6
+	XH2		.req	q7
+	XL3		.req	q8
+	XM3		.req	q9
+
 	.text
 	.fpu		crypto-neon-fp-armv8
 
@@ -175,12 +196,76 @@
 	beq		0f
 	vld1.64		{T1}, [ip]
 	teq		r0, #0
-	b		1f
+	b		3f
+
+0:	.ifc		\pn, p64
+	tst		r0, #3			// skip until #blocks is a
+	bne		2f			// round multiple of 4
+
+1:	vld1.8		{XL2-XM2}, [r2]!
+	vld1.8		{XL3}, [r2]!
+	vrev64.8	T1, XL2
+
+	subs		r0, r0, #4
+
+	vext.8		T2, T1, T1, #8
+	veor		T1_H, T1_H, XL_L
+	veor		XL, XL, T2
+
+	vmull.p64	XH, HH4_H, XL_H		// a1 * b1
+	veor		T1_H, T1_H, XL_H
+	vmull.p64	XL, HH4_L, XL_L		// a0 * b0
+	vmull.p64	XM, HH34_H, T1_H	// (a1 + a0)(b1 + b0)
+
+	vrev64.8	T1, XM2
+
+	vmull.p64	XH2, HH3_H, T1_L	// a1 * b1
+	veor		T1_L, T1_L, T1_H
+	vmull.p64	XL2, HH3_L, T1_H	// a0 * b0
+	vmull.p64	XM2, HH34_L, T1_L	// (a1 + a0)(b1 + b0)
+
+	vrev64.8	T1, XL3
+
+	vmull.p64	XL3, HH_H, T1_L		// a1 * b1
+	veor		T1_L, T1_L, T1_H
+	veor		XH2, XH2, XL3
+	vmull.p64	XL3, HH_L, T1_H		// a0 * b0
+	vmull.p64	XM3, SHASH2_H, T1_L	// (a1 + a0)(b1 + b0)
+
+	vld1.8		{T1}, [r2]!
+	veor		XL2, XL2, XL3
+	vrev64.8	T1, T1
+	veor		XM2, XM2, XM3
+
+	vmull.p64	XL3, SHASH_H, T1_L	// a1 * b1
+	veor		T1_L, T1_L, T1_H
+	veor		XH2, XH2, XL3
+	vmull.p64	XL3, SHASH_L, T1_H	// a0 * b0
+	vmull.p64	XM3, SHASH2_p64, T1_L	// (a1 + a0)(b1 + b0)
 
-0:	vld1.64		{T1}, [r2]!
+	veor		XL2, XL2, XL3
+	veor		XM2, XM2, XM3
+
+	veor		XL, XL, XL2
+	veor		XH, XH, XH2
+	veor		XM, XM, XM2
+
+	veor		T1, XL, XH
+	veor		XM, XM, T1
+
+	__pmull_reduce_p64
+
+	veor		T1, T1, XH
+	veor		XL, XL, T1
+
+	beq		4f
+	b		1b
+	.endif
+
+2:	vld1.64		{T1}, [r2]!
 	subs		r0, r0, #1
 
-1:	/* multiply XL by SHASH in GF(2^128) */
+3:	/* multiply XL by SHASH in GF(2^128) */
 #ifndef CONFIG_CPU_BIG_ENDIAN
 	vrev64.8	T1, T1
 #endif
@@ -203,7 +288,7 @@
 
 	bne		0b
 
-	vst1.64		{XL}, [r1]
+4:	vst1.64		{XL}, [r1]
 	bx		lr
 	.endm
 
@@ -212,8 +297,14 @@
 	 *			   struct ghash_key const *k, const char *head)
 	 */
 ENTRY(pmull_ghash_update_p64)
-	vld1.64		{SHASH}, [r3]
+	vld1.64		{SHASH}, [r3]!
+	vld1.64		{HH}, [r3]!
+	vld1.64		{HH3-HH4}, [r3]
+
+	veor		SHASH2_p64, SHASH_L, SHASH_H
+	veor		SHASH2_H, HH_L, HH_H
+	veor		HH34_L, HH3_L, HH3_H
+	veor		HH34_H, HH4_L, HH4_H
 
 	vmov.i8		MASK, #0xe1
 	vshl.u64	MASK, MASK, #57
diff --git a/arch/arm/crypto/ghash-ce-glue.c b/arch/arm/crypto/ghash-ce-glue.c
index 8930fc4e7c22..b7d30b6cf49c 100644
--- a/arch/arm/crypto/ghash-ce-glue.c
+++ b/arch/arm/crypto/ghash-ce-glue.c
@@ -1,7 +1,7 @@
 /*
  * Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
  *
- * Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2015 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
@@ -28,8 +28,10 @@ MODULE_ALIAS_CRYPTO("ghash");
 #define GHASH_DIGEST_SIZE	16
 
 struct ghash_key {
-	u64	a;
-	u64	b;
+	u64	h[2];
+	u64	h2[2];
+	u64	h3[2];
+	u64	h4[2];
 };
 
 struct ghash_desc_ctx {
@@ -117,26 +119,40 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
 	return 0;
 }
 
+static void ghash_reflect(u64 h[], const be128 *k)
+{
+	u64 carry = be64_to_cpu(k->a) >> 63;
+
+	h[0] = (be64_to_cpu(k->b) << 1) | carry;
+	h[1] = (be64_to_cpu(k->a) << 1) | (be64_to_cpu(k->b) >> 63);
+
+	if (carry)
+		h[1] ^= 0xc200000000000000UL;
+}
+
 static int ghash_setkey(struct crypto_shash *tfm,
 			const u8 *inkey, unsigned int keylen)
 {
 	struct ghash_key *key = crypto_shash_ctx(tfm);
-	u64 a, b;
+	be128 h, k;
 
 	if (keylen != GHASH_BLOCK_SIZE) {
 		crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
 		return -EINVAL;
 	}
 
-	/* perform multiplication by 'x' in GF(2^128) */
-	b = get_unaligned_be64(inkey);
-	a = get_unaligned_be64(inkey + 8);
+	memcpy(&k, inkey, GHASH_BLOCK_SIZE);
+	ghash_reflect(key->h, &k);
+
+	h = k;
+	gf128mul_lle(&h, &k);
+	ghash_reflect(key->h2, &h);
 
-	key->a = (a << 1) | (b >> 63);
-	key->b = (b << 1) | (a >> 63);
+	gf128mul_lle(&h, &k);
+	ghash_reflect(key->h3, &h);
 
-	if (b >> 63)
-		key->b ^= 0xc200000000000000UL;
+	gf128mul_lle(&h, &k);
+	ghash_reflect(key->h4, &h);
 
 	return 0;
 }
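The "a1 * b1", "a0 * b0" and "(a1 + a0)(b1 + b0)" comments in the
assembly refer to a Karatsuba-style split: vmull.p64 produces a
64x64-to-128-bit carryless product, and a full 128x128-bit product is
assembled from three such multiplies instead of the schoolbook four
(addition in GF(2) is XOR, so the cross terms fall out of the middle
product). The sketch below is illustrative only; clmul64 and clmul128
are invented names standing in for the NEON instruction and the
composed multiply.

```python
M64 = (1 << 64) - 1

def clmul64(a, b):
    """64x64 -> 128-bit carryless multiply (the role of vmull.p64)."""
    p = 0
    for i in range(64):
        if (b >> i) & 1:
            p ^= a << i
    return p

def clmul128(a, b):
    """128x128 -> 256-bit carryless multiply via Karatsuba:
    three 64-bit multiplies instead of four."""
    a1, a0 = a >> 64, a & M64
    b1, b0 = b >> 64, b & M64
    hi = clmul64(a1, b1)             # a1 * b1
    lo = clmul64(a0, b0)             # a0 * b0
    mid = clmul64(a1 ^ a0, b1 ^ b0)  # (a1 + a0)(b1 + b0)
    mid ^= hi ^ lo                   # recover a1*b0 + a0*b1
    return (hi << 128) ^ (mid << 64) ^ lo
```

Saving the fourth multiply matters here because the p64 multiplier is
the scarce resource; the extra XORs are effectively free.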