From patchwork Thu Dec 23 04:35:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527772 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE504C433F5 for ; Thu, 23 Dec 2021 04:35:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346360AbhLWEf6 (ORCPT ); Wed, 22 Dec 2021 23:35:58 -0500 Received: from out30-130.freemail.mail.aliyun.com ([115.124.30.130]:52001 "EHLO out30-130.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346347AbhLWEfy (ORCPT ); Wed, 22 Dec 2021 23:35:54 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R181e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04407; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.UQwWK_1640234149; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.UQwWK_1640234149) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:50 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 1/6] crypto: sm3 - create SM3 stand-alone library Date: Thu, 23 Dec 2021 12:35:42 +0800 Message-Id: <20211223043547.32297-2-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Stand-alone implementation of the SM3 algorithm. It is designed to have as little dependencies as possible. In other cases you should generally use the hash APIs from include/crypto/hash.h. Especially when hashing large amounts of data as those APIs may be hw-accelerated. In the new SM3 stand-alone library, sm3_transform() has also been optimized, instead of simply using the code in sm3_generic. Signed-off-by: Tianjia Zhang Reviewed-by: Gilad Ben-Yossef --- include/crypto/sm3.h | 32 ++++++ lib/crypto/Kconfig | 3 + lib/crypto/Makefile | 3 + lib/crypto/sm3.c | 246 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 284 insertions(+) create mode 100644 lib/crypto/sm3.c diff --git a/include/crypto/sm3.h b/include/crypto/sm3.h index 42ea21289ba9..b5fb6d1bf247 100644 --- a/include/crypto/sm3.h +++ b/include/crypto/sm3.h @@ -1,5 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Common values for SM3 algorithm + * + * Copyright (C) 2017 ARM Limited or its affiliates. + * Copyright (C) 2017 Gilad Ben-Yossef + * Copyright (C) 2021 Tianjia Zhang */ #ifndef _CRYPTO_SM3_H @@ -39,4 +44,31 @@ extern int crypto_sm3_final(struct shash_desc *desc, u8 *out); extern int crypto_sm3_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *hash); + +/* + * Stand-alone implementation of the SM3 algorithm. It is designed to + * have as little dependencies as possible so it can be used in the + * kexec_file purgatory. In other cases you should generally use the + * hash APIs from include/crypto/hash.h. Especially when hashing large + * amounts of data as those APIs may be hw-accelerated. + * + * For details see lib/crypto/sm3.c + */ + +static inline void sm3_init(struct sm3_state *sctx) +{ + sctx->state[0] = SM3_IVA; + sctx->state[1] = SM3_IVB; + sctx->state[2] = SM3_IVC; + sctx->state[3] = SM3_IVD; + sctx->state[4] = SM3_IVE; + sctx->state[5] = SM3_IVF; + sctx->state[6] = SM3_IVG; + sctx->state[7] = SM3_IVH; + sctx->count = 0; +} + +void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len); +void sm3_final(struct sm3_state *sctx, u8 *out); + #endif diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig index 545ccbddf6a1..36963e8f4eaa 100644 --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -129,5 +129,8 @@ config CRYPTO_LIB_CHACHA20POLY1305 config CRYPTO_LIB_SHA256 tristate +config CRYPTO_LIB_SM3 + tristate + config CRYPTO_LIB_SM4 tristate diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 73205ed269ba..8149bc00b627 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -38,6 +38,9 @@ libpoly1305-y += poly1305.o obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o libsha256-y := sha256.o +obj-$(CONFIG_CRYPTO_LIB_SM3) += libsm3.o +libsm3-y := sm3.o + obj-$(CONFIG_CRYPTO_LIB_SM4) += libsm4.o libsm4-y := sm4.o diff --git a/lib/crypto/sm3.c b/lib/crypto/sm3.c new file mode 100644 index 000000000000..d473e358a873 --- /dev/null +++ b/lib/crypto/sm3.c @@ -0,0 +1,246 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * SM3 secure hash, as specified by OSCCA GM/T 0004-2012 SM3 and described + * at https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02 + * + * Copyright (C) 2017 ARM Limited or its affiliates. + * Copyright (C) 2017 Gilad Ben-Yossef + * Copyright (C) 2021 Tianjia Zhang + */ + +#include +#include +#include + +static const u32 ____cacheline_aligned K[64] = { + 0x79cc4519, 0xf3988a32, 0xe7311465, 0xce6228cb, + 0x9cc45197, 0x3988a32f, 0x7311465e, 0xe6228cbc, + 0xcc451979, 0x988a32f3, 0x311465e7, 0x6228cbce, + 0xc451979c, 0x88a32f39, 0x11465e73, 0x228cbce6, + 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c, + 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce, + 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec, + 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5, + 0x7a879d8a, 0xf50f3b14, 0xea1e7629, 0xd43cec53, + 0xa879d8a7, 0x50f3b14f, 0xa1e7629e, 0x43cec53d, + 0x879d8a7a, 0x0f3b14f5, 0x1e7629ea, 0x3cec53d4, + 0x79d8a7a8, 0xf3b14f50, 0xe7629ea1, 0xcec53d43, + 0x9d8a7a87, 0x3b14f50f, 0x7629ea1e, 0xec53d43c, + 0xd8a7a879, 0xb14f50f3, 0x629ea1e7, 0xc53d43ce, + 0x8a7a879d, 0x14f50f3b, 0x29ea1e76, 0x53d43cec, + 0xa7a879d8, 0x4f50f3b1, 0x9ea1e762, 0x3d43cec5 +}; + +/* + * Transform the message X which consists of 16 32-bit-words. See + * GM/T 004-2012 for details. + */ +#define R(i, a, b, c, d, e, f, g, h, t, w1, w2) \ + do { \ + ss1 = rol32((rol32((a), 12) + (e) + (t)), 7); \ + ss2 = ss1 ^ rol32((a), 12); \ + d += FF ## i(a, b, c) + ss2 + ((w1) ^ (w2)); \ + h += GG ## i(e, f, g) + ss1 + (w1); \ + b = rol32((b), 9); \ + f = rol32((f), 19); \ + h = P0((h)); \ + } while (0) + +#define R1(a, b, c, d, e, f, g, h, t, w1, w2) \ + R(1, a, b, c, d, e, f, g, h, t, w1, w2) +#define R2(a, b, c, d, e, f, g, h, t, w1, w2) \ + R(2, a, b, c, d, e, f, g, h, t, w1, w2) + +#define FF1(x, y, z) (x ^ y ^ z) +#define FF2(x, y, z) ((x & y) | (x & z) | (y & z)) + +#define GG1(x, y, z) FF1(x, y, z) +#define GG2(x, y, z) ((x & y) | (~x & z)) + +/* Message expansion */ +#define P0(x) ((x) ^ rol32((x), 9) ^ rol32((x), 17)) +#define P1(x) ((x) ^ rol32((x), 15) ^ rol32((x), 23)) +#define I(i) (W[i] = get_unaligned_be32(data + i * 4)) +#define W1(i) (W[i & 0x0f]) +#define W2(i) (W[i & 0x0f] = \ + P1(W[i & 0x0f] \ + ^ W[(i-9) & 0x0f] \ + ^ rol32(W[(i-3) & 0x0f], 15)) \ + ^ rol32(W[(i-13) & 0x0f], 7) \ + ^ W[(i-6) & 0x0f]) + +static void sm3_transform(struct sm3_state *sctx, u8 const *data, u32 W[16]) +{ + u32 a, b, c, d, e, f, g, h, ss1, ss2; + + a = sctx->state[0]; + b = sctx->state[1]; + c = sctx->state[2]; + d = sctx->state[3]; + e = sctx->state[4]; + f = sctx->state[5]; + g = sctx->state[6]; + h = sctx->state[7]; + + R1(a, b, c, d, e, f, g, h, K[0], I(0), I(4)); + R1(d, a, b, c, h, e, f, g, K[1], I(1), I(5)); + R1(c, d, a, b, g, h, e, f, K[2], I(2), I(6)); + R1(b, c, d, a, f, g, h, e, K[3], I(3), I(7)); + R1(a, b, c, d, e, f, g, h, K[4], W1(4), I(8)); + R1(d, a, b, c, h, e, f, g, K[5], W1(5), I(9)); + R1(c, d, a, b, g, h, e, f, K[6], W1(6), I(10)); + R1(b, c, d, a, f, g, h, e, K[7], W1(7), I(11)); + R1(a, b, c, d, e, f, g, h, K[8], W1(8), I(12)); + R1(d, a, b, c, h, e, f, g, K[9], W1(9), I(13)); + R1(c, d, a, b, g, h, e, f, K[10], W1(10), I(14)); + R1(b, c, d, a, f, g, h, e, K[11], W1(11), I(15)); + R1(a, b, c, d, e, f, g, h, K[12], W1(12), W2(16)); + R1(d, a, b, c, h, e, f, g, K[13], W1(13), W2(17)); + R1(c, d, a, b, g, h, e, f, K[14], W1(14), W2(18)); + R1(b, c, d, a, f, g, h, e, K[15], W1(15), W2(19)); + + R2(a, b, c, d, e, f, g, h, K[16], W1(16), W2(20)); + R2(d, a, b, c, h, e, f, g, K[17], W1(17), W2(21)); + R2(c, d, a, b, g, h, e, f, K[18], W1(18), W2(22)); + R2(b, c, d, a, f, g, h, e, K[19], W1(19), W2(23)); + R2(a, b, c, d, e, f, g, h, K[20], W1(20), W2(24)); + R2(d, a, b, c, h, e, f, g, K[21], W1(21), W2(25)); + R2(c, d, a, b, g, h, e, f, K[22], W1(22), W2(26)); + R2(b, c, d, a, f, g, h, e, K[23], W1(23), W2(27)); + R2(a, b, c, d, e, f, g, h, K[24], W1(24), W2(28)); + R2(d, a, b, c, h, e, f, g, K[25], W1(25), W2(29)); + R2(c, d, a, b, g, h, e, f, K[26], W1(26), W2(30)); + R2(b, c, d, a, f, g, h, e, K[27], W1(27), W2(31)); + R2(a, b, c, d, e, f, g, h, K[28], W1(28), W2(32)); + R2(d, a, b, c, h, e, f, g, K[29], W1(29), W2(33)); + R2(c, d, a, b, g, h, e, f, K[30], W1(30), W2(34)); + R2(b, c, d, a, f, g, h, e, K[31], W1(31), W2(35)); + + R2(a, b, c, d, e, f, g, h, K[32], W1(32), W2(36)); + R2(d, a, b, c, h, e, f, g, K[33], W1(33), W2(37)); + R2(c, d, a, b, g, h, e, f, K[34], W1(34), W2(38)); + R2(b, c, d, a, f, g, h, e, K[35], W1(35), W2(39)); + R2(a, b, c, d, e, f, g, h, K[36], W1(36), W2(40)); + R2(d, a, b, c, h, e, f, g, K[37], W1(37), W2(41)); + R2(c, d, a, b, g, h, e, f, K[38], W1(38), W2(42)); + R2(b, c, d, a, f, g, h, e, K[39], W1(39), W2(43)); + R2(a, b, c, d, e, f, g, h, K[40], W1(40), W2(44)); + R2(d, a, b, c, h, e, f, g, K[41], W1(41), W2(45)); + R2(c, d, a, b, g, h, e, f, K[42], W1(42), W2(46)); + R2(b, c, d, a, f, g, h, e, K[43], W1(43), W2(47)); + R2(a, b, c, d, e, f, g, h, K[44], W1(44), W2(48)); + R2(d, a, b, c, h, e, f, g, K[45], W1(45), W2(49)); + R2(c, d, a, b, g, h, e, f, K[46], W1(46), W2(50)); + R2(b, c, d, a, f, g, h, e, K[47], W1(47), W2(51)); + + R2(a, b, c, d, e, f, g, h, K[48], W1(48), W2(52)); + R2(d, a, b, c, h, e, f, g, K[49], W1(49), W2(53)); + R2(c, d, a, b, g, h, e, f, K[50], W1(50), W2(54)); + R2(b, c, d, a, f, g, h, e, K[51], W1(51), W2(55)); + R2(a, b, c, d, e, f, g, h, K[52], W1(52), W2(56)); + R2(d, a, b, c, h, e, f, g, K[53], W1(53), W2(57)); + R2(c, d, a, b, g, h, e, f, K[54], W1(54), W2(58)); + R2(b, c, d, a, f, g, h, e, K[55], W1(55), W2(59)); + R2(a, b, c, d, e, f, g, h, K[56], W1(56), W2(60)); + R2(d, a, b, c, h, e, f, g, K[57], W1(57), W2(61)); + R2(c, d, a, b, g, h, e, f, K[58], W1(58), W2(62)); + R2(b, c, d, a, f, g, h, e, K[59], W1(59), W2(63)); + R2(a, b, c, d, e, f, g, h, K[60], W1(60), W2(64)); + R2(d, a, b, c, h, e, f, g, K[61], W1(61), W2(65)); + R2(c, d, a, b, g, h, e, f, K[62], W1(62), W2(66)); + R2(b, c, d, a, f, g, h, e, K[63], W1(63), W2(67)); + + sctx->state[0] ^= a; + sctx->state[1] ^= b; + sctx->state[2] ^= c; + sctx->state[3] ^= d; + sctx->state[4] ^= e; + sctx->state[5] ^= f; + sctx->state[6] ^= g; + sctx->state[7] ^= h; +} +#undef R +#undef R1 +#undef R2 +#undef I +#undef W1 +#undef W2 + +static inline void sm3_block(struct sm3_state *sctx, + u8 const *data, int blocks, u32 W[16]) +{ + while (blocks--) { + sm3_transform(sctx, data, W); + data += SM3_BLOCK_SIZE; + } +} + +void sm3_update(struct sm3_state *sctx, const u8 *data, unsigned int len) +{ + unsigned int partial = sctx->count % SM3_BLOCK_SIZE; + u32 W[16]; + + sctx->count += len; + + if ((partial + len) >= SM3_BLOCK_SIZE) { + int blocks; + + if (partial) { + int p = SM3_BLOCK_SIZE - partial; + + memcpy(sctx->buffer + partial, data, p); + data += p; + len -= p; + + sm3_block(sctx, sctx->buffer, 1, W); + } + + blocks = len / SM3_BLOCK_SIZE; + len %= SM3_BLOCK_SIZE; + + if (blocks) { + sm3_block(sctx, data, blocks, W); + data += blocks * SM3_BLOCK_SIZE; + } + + memzero_explicit(W, sizeof(W)); + + partial = 0; + } + if (len) + memcpy(sctx->buffer + partial, data, len); +} +EXPORT_SYMBOL_GPL(sm3_update); + +void sm3_final(struct sm3_state *sctx, u8 *out) +{ + const int bit_offset = SM3_BLOCK_SIZE - sizeof(u64); + __be64 *bits = (__be64 *)(sctx->buffer + bit_offset); + __be32 *digest = (__be32 *)out; + unsigned int partial = sctx->count % SM3_BLOCK_SIZE; + u32 W[16]; + int i; + + sctx->buffer[partial++] = 0x80; + if (partial > bit_offset) { + memset(sctx->buffer + partial, 0, SM3_BLOCK_SIZE - partial); + partial = 0; + + sm3_block(sctx, sctx->buffer, 1, W); + } + + memset(sctx->buffer + partial, 0, bit_offset - partial); + *bits = cpu_to_be64(sctx->count << 3); + sm3_block(sctx, sctx->buffer, 1, W); + + for (i = 0; i < 8; i++) + put_unaligned_be32(sctx->state[i], digest++); + + /* Zeroize sensitive information. */ + memzero_explicit(W, sizeof(W)); + memzero_explicit(sctx, sizeof(*sctx)); +} +EXPORT_SYMBOL_GPL(sm3_final); + +MODULE_DESCRIPTION("Generic SM3 library"); +MODULE_LICENSE("GPL v2"); From patchwork Thu Dec 23 04:35:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527501 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0212C433FE for ; Thu, 23 Dec 2021 04:36:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346372AbhLWEf7 (ORCPT ); Wed, 22 Dec 2021 23:35:59 -0500 Received: from out30-132.freemail.mail.aliyun.com ([115.124.30.132]:60393 "EHLO out30-132.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346358AbhLWEf6 (ORCPT ); Wed, 22 Dec 2021 23:35:58 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R211e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04423; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.ULz2o_1640234151; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.ULz2o_1640234151) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:51 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 2/6] crypto: arm64/sm3-ce - make dependent on sm3 library Date: Thu, 23 Dec 2021 12:35:43 +0800 Message-Id: <20211223043547.32297-3-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org SM3 generic library is stand-alone implementation, sm3-ce can depend on the SM3 library instead of sm3-generic. Signed-off-by: Tianjia Zhang --- arch/arm64/crypto/Kconfig | 2 +- arch/arm64/crypto/sm3-ce-glue.c | 20 ++++++++++++++------ 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index addfa413650b..2a965aa0188d 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -45,7 +45,7 @@ config CRYPTO_SM3_ARM64_CE tristate "SM3 digest algorithm (ARMv8.2 Crypto Extensions)" depends on KERNEL_MODE_NEON select CRYPTO_HASH - select CRYPTO_SM3 + select CRYPTO_LIB_SM3 config CRYPTO_SM4_ARM64_CE tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)" diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c index d71faca322f2..3198f31c9446 100644 --- a/arch/arm64/crypto/sm3-ce-glue.c +++ b/arch/arm64/crypto/sm3-ce-glue.c @@ -27,7 +27,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data, unsigned int len) { if (!crypto_simd_usable()) - return crypto_sm3_update(desc, data, len); + return sm3_update(shash_desc_ctx(desc), data, len); kernel_neon_begin(); sm3_base_do_update(desc, data, len, sm3_ce_transform); @@ -39,7 +39,7 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data, static int sm3_ce_final(struct shash_desc *desc, u8 *out) { if (!crypto_simd_usable()) - return crypto_sm3_finup(desc, NULL, 0, out); + return sm3_final(shash_desc_ctx(desc), out); kernel_neon_begin(); sm3_base_do_finalize(desc, sm3_ce_transform); @@ -51,14 +51,22 @@ static int sm3_ce_final(struct shash_desc *desc, u8 *out) static int sm3_ce_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - if (!crypto_simd_usable()) - return crypto_sm3_finup(desc, data, len, out); + if (!crypto_simd_usable()) { + struct sm3_state *sctx = shash_desc_ctx(desc); + + if (len) + sm3_update(sctx, data, len); + sm3_final(sctx, out); + return 0; + } kernel_neon_begin(); - sm3_base_do_update(desc, data, len, sm3_ce_transform); + if (len) + sm3_base_do_update(desc, data, len, sm3_ce_transform); + sm3_base_do_finalize(desc, sm3_ce_transform); kernel_neon_end(); - return sm3_ce_final(desc, out); + return sm3_base_finish(desc, out); } static struct shash_alg sm3_alg = { From patchwork Thu Dec 23 04:35:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527502 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C502C433EF for ; Thu, 23 Dec 2021 04:36:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346365AbhLWEf6 (ORCPT ); Wed, 22 Dec 2021 23:35:58 -0500 Received: from out30-132.freemail.mail.aliyun.com ([115.124.30.132]:60087 "EHLO out30-132.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346356AbhLWEf5 (ORCPT ); Wed, 22 Dec 2021 23:35:57 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R141e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04426; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.TzZlD_1640234152; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.TzZlD_1640234152) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:53 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 3/6] crypto: sm2 - make dependent on sm3 library Date: Thu, 23 Dec 2021 12:35:44 +0800 Message-Id: <20211223043547.32297-4-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org SM3 generic library is stand-alone implementation, it is necessary for the calculation of sm2 z digest to depends on SM3 library instead of sm3-generic. Signed-off-by: Tianjia Zhang --- crypto/Kconfig | 2 +- crypto/sm2.c | 38 +++++++++++++++++++------------------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index 01b9ca0836a5..60b252975dc4 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -267,7 +267,7 @@ config CRYPTO_ECRDSA config CRYPTO_SM2 tristate "SM2 algorithm" - select CRYPTO_SM3 + select CRYPTO_LIB_SM3 select CRYPTO_AKCIPHER select CRYPTO_MANAGER select MPILIB diff --git a/crypto/sm2.c b/crypto/sm2.c index db8a4a265669..ae3f77a66070 100644 --- a/crypto/sm2.c +++ b/crypto/sm2.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include "sm2signature.asn1.h" @@ -213,7 +213,7 @@ int sm2_get_signature_s(void *context, size_t hdrlen, unsigned char tag, return 0; } -static int sm2_z_digest_update(struct shash_desc *desc, +static int sm2_z_digest_update(struct sm3_state *sctx, MPI m, unsigned int pbytes) { static const unsigned char zero[32]; @@ -226,20 +226,20 @@ static int sm2_z_digest_update(struct shash_desc *desc, if (inlen < pbytes) { /* padding with zero */ - crypto_sm3_update(desc, zero, pbytes - inlen); - crypto_sm3_update(desc, in, inlen); + sm3_update(sctx, zero, pbytes - inlen); + sm3_update(sctx, in, inlen); } else if (inlen > pbytes) { /* skip the starting zero */ - crypto_sm3_update(desc, in + inlen - pbytes, pbytes); + sm3_update(sctx, in + inlen - pbytes, pbytes); } else { - crypto_sm3_update(desc, in, inlen); + sm3_update(sctx, in, inlen); } kfree(in); return 0; } -static int sm2_z_digest_update_point(struct shash_desc *desc, +static int sm2_z_digest_update_point(struct sm3_state *sctx, MPI_POINT point, struct mpi_ec_ctx *ec, unsigned int pbytes) { MPI x, y; @@ -249,8 +249,8 @@ static int sm2_z_digest_update_point(struct shash_desc *desc, y = mpi_new(0); if (!mpi_ec_get_affine(x, y, point, ec) && - !sm2_z_digest_update(desc, x, pbytes) && - !sm2_z_digest_update(desc, y, pbytes)) + !sm2_z_digest_update(sctx, x, pbytes) && + !sm2_z_digest_update(sctx, y, pbytes)) ret = 0; mpi_free(x); @@ -265,7 +265,7 @@ int sm2_compute_z_digest(struct crypto_akcipher *tfm, struct mpi_ec_ctx *ec = akcipher_tfm_ctx(tfm); uint16_t bits_len; unsigned char entl[2]; - SHASH_DESC_ON_STACK(desc, NULL); + struct sm3_state sctx; unsigned int pbytes; if (id_len > (USHRT_MAX / 8) || !ec->Q) @@ -278,17 +278,17 @@ int sm2_compute_z_digest(struct crypto_akcipher *tfm, pbytes = MPI_NBYTES(ec->p); /* ZA = H256(ENTLA | IDA | a | b | xG | yG | xA | yA) */ - sm3_base_init(desc); - crypto_sm3_update(desc, entl, 2); - crypto_sm3_update(desc, id, id_len); - - if (sm2_z_digest_update(desc, ec->a, pbytes) || - sm2_z_digest_update(desc, ec->b, pbytes) || - sm2_z_digest_update_point(desc, ec->G, ec, pbytes) || - sm2_z_digest_update_point(desc, ec->Q, ec, pbytes)) + sm3_init(&sctx); + sm3_update(&sctx, entl, 2); + sm3_update(&sctx, id, id_len); + + if (sm2_z_digest_update(&sctx, ec->a, pbytes) || + sm2_z_digest_update(&sctx, ec->b, pbytes) || + sm2_z_digest_update_point(&sctx, ec->G, ec, pbytes) || + sm2_z_digest_update_point(&sctx, ec->Q, ec, pbytes)) return -EINVAL; - crypto_sm3_final(desc, dgst); + sm3_final(&sctx, dgst); return 0; } EXPORT_SYMBOL(sm2_compute_z_digest); From patchwork Thu Dec 23 04:35:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527771 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0239CC4332F for ; Thu, 23 Dec 2021 04:36:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346366AbhLWEgA (ORCPT ); Wed, 22 Dec 2021 23:36:00 -0500 Received: from out30-42.freemail.mail.aliyun.com ([115.124.30.42]:44826 "EHLO out30-42.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346359AbhLWEf6 (ORCPT ); Wed, 22 Dec 2021 23:35:58 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R641e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04423; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.ULz3c_1640234153; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.ULz3c_1640234153) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:54 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 4/6] crypto: sm3 - make dependent on sm3 library Date: Thu, 23 Dec 2021 12:35:45 +0800 Message-Id: <20211223043547.32297-5-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org SM3 generic library is stand-alone implementation, it is necessary making the sm3-generic implementation to depends on SM3 library. The functions crypto_sm3_*() provided by sm3_generic is no longer exported. Signed-off-by: Tianjia Zhang --- crypto/Kconfig | 1 + crypto/sm3_generic.c | 142 +++++-------------------------------------- include/crypto/sm3.h | 10 --- 3 files changed, 16 insertions(+), 137 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index 60b252975dc4..8e780bf66c0a 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -999,6 +999,7 @@ config CRYPTO_SHA3 config CRYPTO_SM3 tristate "SM3 digest algorithm" select CRYPTO_HASH + select CRYPTO_LIB_SM3 help SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3). It is part of the Chinese Commercial Cryptography suite. diff --git a/crypto/sm3_generic.c b/crypto/sm3_generic.c index 193c4584bd00..a215c1c37e73 100644 --- a/crypto/sm3_generic.c +++ b/crypto/sm3_generic.c @@ -5,6 +5,7 @@ * * Copyright (C) 2017 ARM Limited or its affiliates. * Written by Gilad Ben-Yossef + * Copyright (C) 2021 Tianjia Zhang */ #include @@ -26,143 +27,29 @@ const u8 sm3_zero_message_hash[SM3_DIGEST_SIZE] = { }; EXPORT_SYMBOL_GPL(sm3_zero_message_hash); -static inline u32 p0(u32 x) -{ - return x ^ rol32(x, 9) ^ rol32(x, 17); -} - -static inline u32 p1(u32 x) -{ - return x ^ rol32(x, 15) ^ rol32(x, 23); -} - -static inline u32 ff(unsigned int n, u32 a, u32 b, u32 c) -{ - return (n < 16) ? (a ^ b ^ c) : ((a & b) | (a & c) | (b & c)); -} - -static inline u32 gg(unsigned int n, u32 e, u32 f, u32 g) -{ - return (n < 16) ? (e ^ f ^ g) : ((e & f) | ((~e) & g)); -} - -static inline u32 t(unsigned int n) -{ - return (n < 16) ? SM3_T1 : SM3_T2; -} - -static void sm3_expand(u32 *t, u32 *w, u32 *wt) -{ - int i; - unsigned int tmp; - - /* load the input */ - for (i = 0; i <= 15; i++) - w[i] = get_unaligned_be32((__u32 *)t + i); - - for (i = 16; i <= 67; i++) { - tmp = w[i - 16] ^ w[i - 9] ^ rol32(w[i - 3], 15); - w[i] = p1(tmp) ^ (rol32(w[i - 13], 7)) ^ w[i - 6]; - } - - for (i = 0; i <= 63; i++) - wt[i] = w[i] ^ w[i + 4]; -} - -static void sm3_compress(u32 *w, u32 *wt, u32 *m) -{ - u32 ss1; - u32 ss2; - u32 tt1; - u32 tt2; - u32 a, b, c, d, e, f, g, h; - int i; - - a = m[0]; - b = m[1]; - c = m[2]; - d = m[3]; - e = m[4]; - f = m[5]; - g = m[6]; - h = m[7]; - - for (i = 0; i <= 63; i++) { - - ss1 = rol32((rol32(a, 12) + e + rol32(t(i), i & 31)), 7); - - ss2 = ss1 ^ rol32(a, 12); - - tt1 = ff(i, a, b, c) + d + ss2 + *wt; - wt++; - - tt2 = gg(i, e, f, g) + h + ss1 + *w; - w++; - - d = c; - c = rol32(b, 9); - b = a; - a = tt1; - h = g; - g = rol32(f, 19); - f = e; - e = p0(tt2); - } - - m[0] = a ^ m[0]; - m[1] = b ^ m[1]; - m[2] = c ^ m[2]; - m[3] = d ^ m[3]; - m[4] = e ^ m[4]; - m[5] = f ^ m[5]; - m[6] = g ^ m[6]; - m[7] = h ^ m[7]; - - a = b = c = d = e = f = g = h = ss1 = ss2 = tt1 = tt2 = 0; -} - -static void sm3_transform(struct sm3_state *sst, u8 const *src) -{ - unsigned int w[68]; - unsigned int wt[64]; - - sm3_expand((u32 *)src, w, wt); - sm3_compress(w, wt, sst->state); - - memzero_explicit(w, sizeof(w)); - memzero_explicit(wt, sizeof(wt)); -} - -static void sm3_generic_block_fn(struct sm3_state *sst, u8 const *src, - int blocks) -{ - while (blocks--) { - sm3_transform(sst, src); - src += SM3_BLOCK_SIZE; - } -} - -int crypto_sm3_update(struct shash_desc *desc, const u8 *data, +static int crypto_sm3_update(struct shash_desc *desc, const u8 *data, unsigned int len) { - return sm3_base_do_update(desc, data, len, sm3_generic_block_fn); + sm3_update(shash_desc_ctx(desc), data, len); + return 0; } -EXPORT_SYMBOL(crypto_sm3_update); -int crypto_sm3_final(struct shash_desc *desc, u8 *out) +static int crypto_sm3_final(struct shash_desc *desc, u8 *out) { - sm3_base_do_finalize(desc, sm3_generic_block_fn); - return sm3_base_finish(desc, out); + sm3_final(shash_desc_ctx(desc), out); + return 0; } -EXPORT_SYMBOL(crypto_sm3_final); -int crypto_sm3_finup(struct shash_desc *desc, const u8 *data, +static int crypto_sm3_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *hash) { - sm3_base_do_update(desc, data, len, sm3_generic_block_fn); - return crypto_sm3_final(desc, hash); + struct sm3_state *sctx = shash_desc_ctx(desc); + + if (len) + sm3_update(sctx, data, len); + sm3_final(sctx, hash); + return 0; } -EXPORT_SYMBOL(crypto_sm3_finup); static struct shash_alg sm3_alg = { .digestsize = SM3_DIGEST_SIZE, @@ -174,6 +61,7 @@ static struct shash_alg sm3_alg = { .base = { .cra_name = "sm3", .cra_driver_name = "sm3-generic", + .cra_priority = 100, .cra_blocksize = SM3_BLOCK_SIZE, .cra_module = THIS_MODULE, } diff --git a/include/crypto/sm3.h b/include/crypto/sm3.h index b5fb6d1bf247..1f021ad0533f 100644 --- a/include/crypto/sm3.h +++ b/include/crypto/sm3.h @@ -35,16 +35,6 @@ struct sm3_state { u8 buffer[SM3_BLOCK_SIZE]; }; -struct shash_desc; - -extern int crypto_sm3_update(struct shash_desc *desc, const u8 *data, - unsigned int len); - -extern int crypto_sm3_final(struct shash_desc *desc, u8 *out); - -extern int crypto_sm3_finup(struct shash_desc *desc, const u8 *data, - unsigned int len, u8 *hash); - /* * Stand-alone implementation of the SM3 algorithm. It is designed to * have as little dependencies as possible so it can be used in the From patchwork Thu Dec 23 04:35:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527770 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE6DAC433FE for ; Thu, 23 Dec 2021 04:36:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346413AbhLWEgF (ORCPT ); Wed, 22 Dec 2021 23:36:05 -0500 Received: from out30-56.freemail.mail.aliyun.com ([115.124.30.56]:44373 "EHLO out30-56.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346371AbhLWEgA (ORCPT ); Wed, 22 Dec 2021 23:36:00 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R621e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04426; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.VW5QU_1640234155; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.VW5QU_1640234155) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:56 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 5/6] crypto: x86/sm3 - add AVX assembly implementation Date: Thu, 23 Dec 2021 12:35:46 +0800 Message-Id: <20211223043547.32297-6-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org This patch adds AVX assembly accelerated implementation of SM3 secure hash algorithm. From the benchmark data, compared to pure software implementation sm3-generic, the performance increase is up to 38%. The main algorithm implementation based on SM3 AES/BMI2 accelerated work by libgcrypt at: https://gnupg.org/software/libgcrypt/index.html Benchmark on Intel i5-6200U 2.30GHz, performance data of two implementations, pure software sm3-generic and sm3-avx acceleration. The data comes from the 326 mode and 422 mode of tcrypt. The abscissas are different lengths of per update. The data is tabulated and the unit is Mb/s: update-size | 16 64 256 1024 2048 4096 8192 -------------------------------------------------------------------- sm3-generic | 105.97 129.60 182.12 189.62 188.06 193.66 194.88 sm3-avx | 119.87 163.05 244.44 260.92 257.60 264.87 265.88 Signed-off-by: Tianjia Zhang --- arch/x86/crypto/Makefile | 3 + arch/x86/crypto/sm3-avx-asm_64.S | 517 +++++++++++++++++++++++++++++++ arch/x86/crypto/sm3_avx_glue.c | 134 ++++++++ crypto/Kconfig | 13 + 4 files changed, 667 insertions(+) create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S create mode 100644 arch/x86/crypto/sm3_avx_glue.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index f307c93fc90a..7cbe860f6201 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o +obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o +sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o + obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S new file mode 100644 index 000000000000..71e6aae23e17 --- /dev/null +++ b/arch/x86/crypto/sm3-avx-asm_64.S @@ -0,0 +1,517 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * SM3 AVX accelerated transform. + * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02 + * + * Copyright (C) 2021 Jussi Kivilinna + * Copyright (C) 2021 Tianjia Zhang + */ + +/* Based on SM3 AES/BMI2 accelerated work by libgcrypt at: + * https://gnupg.org/software/libgcrypt/index.html + */ + +#include +#include + +/* Context structure */ + +#define state_h0 0 +#define state_h1 4 +#define state_h2 8 +#define state_h3 12 +#define state_h4 16 +#define state_h5 20 +#define state_h6 24 +#define state_h7 28 + +/* Constants */ + +/* Round constant macros */ + +#define K0 2043430169 /* 0x79cc4519 */ +#define K1 -208106958 /* 0xf3988a32 */ +#define K2 -416213915 /* 0xe7311465 */ +#define K3 -832427829 /* 0xce6228cb */ +#define K4 -1664855657 /* 0x9cc45197 */ +#define K5 965255983 /* 0x3988a32f */ +#define K6 1930511966 /* 0x7311465e */ +#define K7 -433943364 /* 0xe6228cbc */ +#define K8 -867886727 /* 0xcc451979 */ +#define K9 -1735773453 /* 0x988a32f3 */ +#define K10 823420391 /* 0x311465e7 */ +#define K11 1646840782 /* 0x6228cbce */ +#define K12 -1001285732 /* 0xc451979c */ +#define K13 -2002571463 /* 0x88a32f39 */ +#define K14 289824371 /* 0x11465e73 */ +#define K15 579648742 /* 0x228cbce6 */ +#define K16 -1651869049 /* 0x9d8a7a87 */ +#define K17 991229199 /* 0x3b14f50f */ +#define K18 1982458398 /* 0x7629ea1e */ +#define K19 -330050500 /* 0xec53d43c */ +#define K20 -660100999 /* 0xd8a7a879 */ +#define K21 -1320201997 /* 0xb14f50f3 */ +#define K22 1654563303 /* 0x629ea1e7 */ +#define K23 -985840690 /* 0xc53d43ce */ +#define K24 -1971681379 /* 0x8a7a879d */ +#define K25 351604539 /* 0x14f50f3b */ +#define K26 703209078 /* 0x29ea1e76 */ +#define K27 1406418156 /* 0x53d43cec */ +#define K28 -1482130984 /* 0xa7a879d8 */ +#define K29 1330705329 /* 0x4f50f3b1 */ +#define K30 -1633556638 /* 0x9ea1e762 */ +#define K31 1027854021 /* 0x3d43cec5 */ +#define K32 2055708042 /* 0x7a879d8a */ +#define K33 -183551212 /* 0xf50f3b14 */ +#define K34 -367102423 /* 0xea1e7629 */ +#define K35 -734204845 /* 0xd43cec53 */ +#define K36 -1468409689 /* 0xa879d8a7 */ +#define K37 1358147919 /* 0x50f3b14f */ +#define K38 -1578671458 /* 0xa1e7629e */ +#define K39 1137624381 /* 0x43cec53d */ +#define K40 -2019718534 /* 0x879d8a7a */ +#define K41 255530229 /* 0x0f3b14f5 */ +#define K42 511060458 /* 0x1e7629ea */ +#define K43 1022120916 /* 0x3cec53d4 */ +#define K44 2044241832 /* 0x79d8a7a8 */ +#define K45 -206483632 /* 0xf3b14f50 */ +#define K46 -412967263 /* 0xe7629ea1 */ +#define K47 -825934525 /* 0xcec53d43 */ +#define K48 -1651869049 /* 0x9d8a7a87 */ +#define K49 991229199 /* 0x3b14f50f */ +#define K50 1982458398 /* 0x7629ea1e */ +#define K51 -330050500 /* 0xec53d43c */ +#define K52 -660100999 /* 0xd8a7a879 */ +#define K53 -1320201997 /* 0xb14f50f3 */ +#define K54 1654563303 /* 0x629ea1e7 */ +#define K55 -985840690 /* 0xc53d43ce */ +#define K56 -1971681379 /* 0x8a7a879d */ +#define K57 351604539 /* 0x14f50f3b */ +#define K58 703209078 /* 0x29ea1e76 */ +#define K59 1406418156 /* 0x53d43cec */ +#define K60 -1482130984 /* 0xa7a879d8 */ +#define K61 1330705329 /* 0x4f50f3b1 */ +#define K62 -1633556638 /* 0x9ea1e762 */ +#define K63 1027854021 /* 0x3d43cec5 */ + +/* Register macros */ + +#define RSTATE %rdi +#define RDATA %rsi +#define RNBLKS %rdx + +#define t0 %eax +#define t1 %ebx +#define t2 %ecx + +#define a %r8d +#define b %r9d +#define c %r10d +#define d %r11d +#define e %r12d +#define f %r13d +#define g %r14d +#define h %r15d + +#define W0 %xmm0 +#define W1 %xmm1 +#define W2 %xmm2 +#define W3 %xmm3 +#define W4 %xmm4 +#define W5 %xmm5 + +#define XTMP0 %xmm6 +#define XTMP1 %xmm7 +#define XTMP2 %xmm8 +#define XTMP3 %xmm9 +#define XTMP4 %xmm10 +#define XTMP5 %xmm11 +#define XTMP6 %xmm12 + +#define BSWAP_REG %xmm15 + +/* Stack structure */ + +#define STACK_W_SIZE (32 * 2 * 3) +#define STACK_REG_SAVE_SIZE (64) + +#define STACK_W (0) +#define STACK_REG_SAVE (STACK_W + STACK_W_SIZE) +#define STACK_SIZE (STACK_REG_SAVE + STACK_REG_SAVE_SIZE) + +/* Instruction helpers. */ + +#define roll2(v, reg) \ + roll $(v), reg; + +#define roll3mov(v, src, dst) \ + movl src, dst; \ + roll $(v), dst; + +#define roll3(v, src, dst) \ + rorxl $(32-(v)), src, dst; + +#define addl2(a, out) \ + leal (a, out), out; + +/* Round function macros. */ + +#define GG1(x, y, z, o, t) \ + movl x, o; \ + xorl y, o; \ + xorl z, o; + +#define FF1(x, y, z, o, t) GG1(x, y, z, o, t) + +#define GG2(x, y, z, o, t) \ + andnl z, x, o; \ + movl y, t; \ + andl x, t; \ + addl2(t, o); + +#define FF2(x, y, z, o, t) \ + movl y, o; \ + xorl x, o; \ + movl y, t; \ + andl x, t; \ + andl z, o; \ + xorl t, o; + +#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype) \ + /* rol(a, 12) => t0 */ \ + roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \ + /* rol (t0 + e + t), 7) => t1 */ \ + leal K##round(t0, e, 1), t1; \ + roll2(7, t1); \ + /* h + w1 => h */ \ + addl wtype##_W1_ADDR(round, widx), h; \ + /* h + t1 => h */ \ + addl2(t1, h); \ + /* t1 ^ t0 => t0 */ \ + xorl t1, t0; \ + /* w1w2 + d => d */ \ + addl wtype##_W1W2_ADDR(round, widx), d; \ + /* FF##i(a,b,c) => t1 */ \ + FF##i(a, b, c, t1, t2); \ + /* d + t1 => d */ \ + addl2(t1, d); \ + /* GG#i(e,f,g) => t2 */ \ + GG##i(e, f, g, t2, t1); \ + /* h + t2 => h */ \ + addl2(t2, h); \ + /* rol (f, 19) => f */ \ + roll2(19, f); \ + /* d + t0 => d */ \ + addl2(t0, d); \ + /* rol (b, 9) => b */ \ + roll2(9, b); \ + /* P0(h) => h */ \ + roll3(9, h, t2); \ + roll3(17, h, t1); \ + xorl t2, h; \ + xorl t1, h; + +#define R1(a, b, c, d, e, f, g, h, round, widx, wtype) \ + R(1, a, b, c, d, e, f, g, h, round, widx, wtype) + +#define R2(a, b, c, d, e, f, g, h, round, widx, wtype) \ + R(2, a, b, c, d, e, f, g, h, round, widx, wtype) + +/* Input expansion macros. */ + +/* Byte-swapped input address. */ +#define IW_W_ADDR(round, widx, offs) \ + (STACK_W + ((round) / 4) * 64 + (offs) + ((widx) * 4))(%rsp) + +/* Expanded input address. */ +#define XW_W_ADDR(round, widx, offs) \ + (STACK_W + ((((round) / 3) - 4) % 2) * 64 + (offs) + ((widx) * 4))(%rsp) + +/* Rounds 1-12, byte-swapped input block addresses. */ +#define IW_W1_ADDR(round, widx) IW_W_ADDR(round, widx, 0) +#define IW_W1W2_ADDR(round, widx) IW_W_ADDR(round, widx, 32) + +/* Rounds 1-12, expanded input block addresses. */ +#define XW_W1_ADDR(round, widx) XW_W_ADDR(round, widx, 0) +#define XW_W1W2_ADDR(round, widx) XW_W_ADDR(round, widx, 32) + +/* Input block loading. */ +#define LOAD_W_XMM_1() \ + vmovdqu 0*16(RDATA), XTMP0; /* XTMP0: w3, w2, w1, w0 */ \ + vmovdqu 1*16(RDATA), XTMP1; /* XTMP1: w7, w6, w5, w4 */ \ + vmovdqu 2*16(RDATA), XTMP2; /* XTMP2: w11, w10, w9, w8 */ \ + vmovdqu 3*16(RDATA), XTMP3; /* XTMP3: w15, w14, w13, w12 */ \ + vpshufb BSWAP_REG, XTMP0, XTMP0; \ + vpshufb BSWAP_REG, XTMP1, XTMP1; \ + vpshufb BSWAP_REG, XTMP2, XTMP2; \ + vpshufb BSWAP_REG, XTMP3, XTMP3; \ + vpxor XTMP0, XTMP1, XTMP4; \ + vpxor XTMP1, XTMP2, XTMP5; \ + vpxor XTMP2, XTMP3, XTMP6; \ + leaq 64(RDATA), RDATA; \ + vmovdqa XTMP0, IW_W1_ADDR(0, 0); \ + vmovdqa XTMP4, IW_W1W2_ADDR(0, 0); \ + vmovdqa XTMP1, IW_W1_ADDR(4, 0); \ + vmovdqa XTMP5, IW_W1W2_ADDR(4, 0); + +#define LOAD_W_XMM_2() \ + vmovdqa XTMP2, IW_W1_ADDR(8, 0); \ + vmovdqa XTMP6, IW_W1W2_ADDR(8, 0); + +#define LOAD_W_XMM_3() \ + vpshufd $0b00000000, XTMP0, W0; /* W0: xx, w0, xx, xx */ \ + vpshufd $0b11111001, XTMP0, W1; /* W1: xx, w3, w2, w1 */ \ + vmovdqa XTMP1, W2; /* W2: xx, w6, w5, w4 */ \ + vpalignr $12, XTMP1, XTMP2, W3; /* W3: xx, w9, w8, w7 */ \ + vpalignr $8, XTMP2, XTMP3, W4; /* W4: xx, w12, w11, w10 */ \ + vpshufd $0b11111001, XTMP3, W5; /* W5: xx, w15, w14, w13 */ + +/* Message scheduling. Note: 3 words per XMM register. */ +#define SCHED_W_0(round, w0, w1, w2, w3, w4, w5) \ + /* Load (w[i - 16]) => XTMP0 */ \ + vpshufd $0b10111111, w0, XTMP0; \ + vpalignr $12, XTMP0, w1, XTMP0; /* XTMP0: xx, w2, w1, w0 */ \ + /* Load (w[i - 13]) => XTMP1 */ \ + vpshufd $0b10111111, w1, XTMP1; \ + vpalignr $12, XTMP1, w2, XTMP1; \ + /* w[i - 9] == w3 */ \ + /* XMM3 ^ XTMP0 => XTMP0 */ \ + vpxor w3, XTMP0, XTMP0; + +#define SCHED_W_1(round, w0, w1, w2, w3, w4, w5) \ + /* w[i - 3] == w5 */ \ + /* rol(XMM5, 15) ^ XTMP0 => XTMP0 */ \ + vpslld $15, w5, XTMP2; \ + vpsrld $(32-15), w5, XTMP3; \ + vpxor XTMP2, XTMP3, XTMP3; \ + vpxor XTMP3, XTMP0, XTMP0; \ + /* rol(XTMP1, 7) => XTMP1 */ \ + vpslld $7, XTMP1, XTMP5; \ + vpsrld $(32-7), XTMP1, XTMP1; \ + vpxor XTMP5, XTMP1, XTMP1; \ + /* XMM4 ^ XTMP1 => XTMP1 */ \ + vpxor w4, XTMP1, XTMP1; \ + /* w[i - 6] == XMM4 */ \ + /* P1(XTMP0) ^ XTMP1 => XMM0 */ \ + vpslld $15, XTMP0, XTMP5; \ + vpsrld $(32-15), XTMP0, XTMP6; \ + vpslld $23, XTMP0, XTMP2; \ + vpsrld $(32-23), XTMP0, XTMP3; \ + vpxor XTMP0, XTMP1, XTMP1; \ + vpxor XTMP6, XTMP5, XTMP5; \ + vpxor XTMP3, XTMP2, XTMP2; \ + vpxor XTMP2, XTMP5, XTMP5; \ + vpxor XTMP5, XTMP1, w0; + +#define SCHED_W_2(round, w0, w1, w2, w3, w4, w5) \ + /* W1 in XMM12 */ \ + vpshufd $0b10111111, w4, XTMP4; \ + vpalignr $12, XTMP4, w5, XTMP4; \ + vmovdqa XTMP4, XW_W1_ADDR((round), 0); \ + /* W1 ^ W2 => XTMP1 */ \ + vpxor w0, XTMP4, XTMP1; \ + vmovdqa XTMP1, XW_W1W2_ADDR((round), 0); + + +.section .rodata.cst16, "aM", @progbits, 16 +.align 16 + +.Lbe32mask: + .long 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f + +.text + +/* + * Transform nblocks*64 bytes (nblocks*16 32-bit words) at DATA. + * + * void sm3_transform_avx(struct sm3_state *state, + * const u8 *data, int nblocks); + */ +.align 16 +SYM_FUNC_START(sm3_transform_avx) + /* input: + * %rdi: ctx, CTX + * %rsi: data (64*nblks bytes) + * %rdx: nblocks + */ + vzeroupper; + + pushq %rbp; + movq %rsp, %rbp; + + movq %rdx, RNBLKS; + + subq $STACK_SIZE, %rsp; + andq $(~63), %rsp; + + movq %rbx, (STACK_REG_SAVE + 0 * 8)(%rsp); + movq %r15, (STACK_REG_SAVE + 1 * 8)(%rsp); + movq %r14, (STACK_REG_SAVE + 2 * 8)(%rsp); + movq %r13, (STACK_REG_SAVE + 3 * 8)(%rsp); + movq %r12, (STACK_REG_SAVE + 4 * 8)(%rsp); + + vmovdqa .Lbe32mask (%rip), BSWAP_REG; + + /* Get the values of the chaining variables. */ + movl state_h0(RSTATE), a; + movl state_h1(RSTATE), b; + movl state_h2(RSTATE), c; + movl state_h3(RSTATE), d; + movl state_h4(RSTATE), e; + movl state_h5(RSTATE), f; + movl state_h6(RSTATE), g; + movl state_h7(RSTATE), h; + +.align 16 +.Loop: + /* Load data part1. */ + LOAD_W_XMM_1(); + + leaq -1(RNBLKS), RNBLKS; + + /* Transform 0-3 + Load data part2. */ + R1(a, b, c, d, e, f, g, h, 0, 0, IW); LOAD_W_XMM_2(); + R1(d, a, b, c, h, e, f, g, 1, 1, IW); + R1(c, d, a, b, g, h, e, f, 2, 2, IW); + R1(b, c, d, a, f, g, h, e, 3, 3, IW); LOAD_W_XMM_3(); + + /* Transform 4-7 + Precalc 12-14. */ + R1(a, b, c, d, e, f, g, h, 4, 0, IW); + R1(d, a, b, c, h, e, f, g, 5, 1, IW); + R1(c, d, a, b, g, h, e, f, 6, 2, IW); SCHED_W_0(12, W0, W1, W2, W3, W4, W5); + R1(b, c, d, a, f, g, h, e, 7, 3, IW); SCHED_W_1(12, W0, W1, W2, W3, W4, W5); + + /* Transform 8-11 + Precalc 12-17. */ + R1(a, b, c, d, e, f, g, h, 8, 0, IW); SCHED_W_2(12, W0, W1, W2, W3, W4, W5); + R1(d, a, b, c, h, e, f, g, 9, 1, IW); SCHED_W_0(15, W1, W2, W3, W4, W5, W0); + R1(c, d, a, b, g, h, e, f, 10, 2, IW); SCHED_W_1(15, W1, W2, W3, W4, W5, W0); + R1(b, c, d, a, f, g, h, e, 11, 3, IW); SCHED_W_2(15, W1, W2, W3, W4, W5, W0); + + /* Transform 12-14 + Precalc 18-20 */ + R1(a, b, c, d, e, f, g, h, 12, 0, XW); SCHED_W_0(18, W2, W3, W4, W5, W0, W1); + R1(d, a, b, c, h, e, f, g, 13, 1, XW); SCHED_W_1(18, W2, W3, W4, W5, W0, W1); + R1(c, d, a, b, g, h, e, f, 14, 2, XW); SCHED_W_2(18, W2, W3, W4, W5, W0, W1); + + /* Transform 15-17 + Precalc 21-23 */ + R1(b, c, d, a, f, g, h, e, 15, 0, XW); SCHED_W_0(21, W3, W4, W5, W0, W1, W2); + R2(a, b, c, d, e, f, g, h, 16, 1, XW); SCHED_W_1(21, W3, W4, W5, W0, W1, W2); + R2(d, a, b, c, h, e, f, g, 17, 2, XW); SCHED_W_2(21, W3, W4, W5, W0, W1, W2); + + /* Transform 18-20 + Precalc 24-26 */ + R2(c, d, a, b, g, h, e, f, 18, 0, XW); SCHED_W_0(24, W4, W5, W0, W1, W2, W3); + R2(b, c, d, a, f, g, h, e, 19, 1, XW); SCHED_W_1(24, W4, W5, W0, W1, W2, W3); + R2(a, b, c, d, e, f, g, h, 20, 2, XW); SCHED_W_2(24, W4, W5, W0, W1, W2, W3); + + /* Transform 21-23 + Precalc 27-29 */ + R2(d, a, b, c, h, e, f, g, 21, 0, XW); SCHED_W_0(27, W5, W0, W1, W2, W3, W4); + R2(c, d, a, b, g, h, e, f, 22, 1, XW); SCHED_W_1(27, W5, W0, W1, W2, W3, W4); + R2(b, c, d, a, f, g, h, e, 23, 2, XW); SCHED_W_2(27, W5, W0, W1, W2, W3, W4); + + /* Transform 24-26 + Precalc 30-32 */ + R2(a, b, c, d, e, f, g, h, 24, 0, XW); SCHED_W_0(30, W0, W1, W2, W3, W4, W5); + R2(d, a, b, c, h, e, f, g, 25, 1, XW); SCHED_W_1(30, W0, W1, W2, W3, W4, W5); + R2(c, d, a, b, g, h, e, f, 26, 2, XW); SCHED_W_2(30, W0, W1, W2, W3, W4, W5); + + /* Transform 27-29 + Precalc 33-35 */ + R2(b, c, d, a, f, g, h, e, 27, 0, XW); SCHED_W_0(33, W1, W2, W3, W4, W5, W0); + R2(a, b, c, d, e, f, g, h, 28, 1, XW); SCHED_W_1(33, W1, W2, W3, W4, W5, W0); + R2(d, a, b, c, h, e, f, g, 29, 2, XW); SCHED_W_2(33, W1, W2, W3, W4, W5, W0); + + /* Transform 30-32 + Precalc 36-38 */ + R2(c, d, a, b, g, h, e, f, 30, 0, XW); SCHED_W_0(36, W2, W3, W4, W5, W0, W1); + R2(b, c, d, a, f, g, h, e, 31, 1, XW); SCHED_W_1(36, W2, W3, W4, W5, W0, W1); + R2(a, b, c, d, e, f, g, h, 32, 2, XW); SCHED_W_2(36, W2, W3, W4, W5, W0, W1); + + /* Transform 33-35 + Precalc 39-41 */ + R2(d, a, b, c, h, e, f, g, 33, 0, XW); SCHED_W_0(39, W3, W4, W5, W0, W1, W2); + R2(c, d, a, b, g, h, e, f, 34, 1, XW); SCHED_W_1(39, W3, W4, W5, W0, W1, W2); + R2(b, c, d, a, f, g, h, e, 35, 2, XW); SCHED_W_2(39, W3, W4, W5, W0, W1, W2); + + /* Transform 36-38 + Precalc 42-44 */ + R2(a, b, c, d, e, f, g, h, 36, 0, XW); SCHED_W_0(42, W4, W5, W0, W1, W2, W3); + R2(d, a, b, c, h, e, f, g, 37, 1, XW); SCHED_W_1(42, W4, W5, W0, W1, W2, W3); + R2(c, d, a, b, g, h, e, f, 38, 2, XW); SCHED_W_2(42, W4, W5, W0, W1, W2, W3); + + /* Transform 39-41 + Precalc 45-47 */ + R2(b, c, d, a, f, g, h, e, 39, 0, XW); SCHED_W_0(45, W5, W0, W1, W2, W3, W4); + R2(a, b, c, d, e, f, g, h, 40, 1, XW); SCHED_W_1(45, W5, W0, W1, W2, W3, W4); + R2(d, a, b, c, h, e, f, g, 41, 2, XW); SCHED_W_2(45, W5, W0, W1, W2, W3, W4); + + /* Transform 42-44 + Precalc 48-50 */ + R2(c, d, a, b, g, h, e, f, 42, 0, XW); SCHED_W_0(48, W0, W1, W2, W3, W4, W5); + R2(b, c, d, a, f, g, h, e, 43, 1, XW); SCHED_W_1(48, W0, W1, W2, W3, W4, W5); + R2(a, b, c, d, e, f, g, h, 44, 2, XW); SCHED_W_2(48, W0, W1, W2, W3, W4, W5); + + /* Transform 45-47 + Precalc 51-53 */ + R2(d, a, b, c, h, e, f, g, 45, 0, XW); SCHED_W_0(51, W1, W2, W3, W4, W5, W0); + R2(c, d, a, b, g, h, e, f, 46, 1, XW); SCHED_W_1(51, W1, W2, W3, W4, W5, W0); + R2(b, c, d, a, f, g, h, e, 47, 2, XW); SCHED_W_2(51, W1, W2, W3, W4, W5, W0); + + /* Transform 48-50 + Precalc 54-56 */ + R2(a, b, c, d, e, f, g, h, 48, 0, XW); SCHED_W_0(54, W2, W3, W4, W5, W0, W1); + R2(d, a, b, c, h, e, f, g, 49, 1, XW); SCHED_W_1(54, W2, W3, W4, W5, W0, W1); + R2(c, d, a, b, g, h, e, f, 50, 2, XW); SCHED_W_2(54, W2, W3, W4, W5, W0, W1); + + /* Transform 51-53 + Precalc 57-59 */ + R2(b, c, d, a, f, g, h, e, 51, 0, XW); SCHED_W_0(57, W3, W4, W5, W0, W1, W2); + R2(a, b, c, d, e, f, g, h, 52, 1, XW); SCHED_W_1(57, W3, W4, W5, W0, W1, W2); + R2(d, a, b, c, h, e, f, g, 53, 2, XW); SCHED_W_2(57, W3, W4, W5, W0, W1, W2); + + /* Transform 54-56 + Precalc 60-62 */ + R2(c, d, a, b, g, h, e, f, 54, 0, XW); SCHED_W_0(60, W4, W5, W0, W1, W2, W3); + R2(b, c, d, a, f, g, h, e, 55, 1, XW); SCHED_W_1(60, W4, W5, W0, W1, W2, W3); + R2(a, b, c, d, e, f, g, h, 56, 2, XW); SCHED_W_2(60, W4, W5, W0, W1, W2, W3); + + /* Transform 57-59 + Precalc 63 */ + R2(d, a, b, c, h, e, f, g, 57, 0, XW); SCHED_W_0(63, W5, W0, W1, W2, W3, W4); + R2(c, d, a, b, g, h, e, f, 58, 1, XW); + R2(b, c, d, a, f, g, h, e, 59, 2, XW); SCHED_W_1(63, W5, W0, W1, W2, W3, W4); + + /* Transform 60-62 + Precalc 63 */ + R2(a, b, c, d, e, f, g, h, 60, 0, XW); + R2(d, a, b, c, h, e, f, g, 61, 1, XW); SCHED_W_2(63, W5, W0, W1, W2, W3, W4); + R2(c, d, a, b, g, h, e, f, 62, 2, XW); + + /* Transform 63 */ + R2(b, c, d, a, f, g, h, e, 63, 0, XW); + + /* Update the chaining variables. */ + xorl state_h0(RSTATE), a; + xorl state_h1(RSTATE), b; + xorl state_h2(RSTATE), c; + xorl state_h3(RSTATE), d; + movl a, state_h0(RSTATE); + movl b, state_h1(RSTATE); + movl c, state_h2(RSTATE); + movl d, state_h3(RSTATE); + xorl state_h4(RSTATE), e; + xorl state_h5(RSTATE), f; + xorl state_h6(RSTATE), g; + xorl state_h7(RSTATE), h; + movl e, state_h4(RSTATE); + movl f, state_h5(RSTATE); + movl g, state_h6(RSTATE); + movl h, state_h7(RSTATE); + + cmpq $0, RNBLKS; + jne .Loop; + + vzeroall; + + movq (STACK_REG_SAVE + 0 * 8)(%rsp), %rbx; + movq (STACK_REG_SAVE + 1 * 8)(%rsp), %r15; + movq (STACK_REG_SAVE + 2 * 8)(%rsp), %r14; + movq (STACK_REG_SAVE + 3 * 8)(%rsp), %r13; + movq (STACK_REG_SAVE + 4 * 8)(%rsp), %r12; + + vmovdqa %xmm0, IW_W1_ADDR(0, 0); + vmovdqa %xmm0, IW_W1W2_ADDR(0, 0); + vmovdqa %xmm0, IW_W1_ADDR(4, 0); + vmovdqa %xmm0, IW_W1W2_ADDR(4, 0); + vmovdqa %xmm0, IW_W1_ADDR(8, 0); + vmovdqa %xmm0, IW_W1W2_ADDR(8, 0); + + movq %rbp, %rsp; + popq %rbp; + ret; +SYM_FUNC_END(sm3_transform_avx) diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c new file mode 100644 index 000000000000..661b6f22ffcd --- /dev/null +++ b/arch/x86/crypto/sm3_avx_glue.c @@ -0,0 +1,134 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * SM3 Secure Hash Algorithm, AVX assembler accelerated. + * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02 + * + * Copyright (C) 2021 Tianjia Zhang + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void sm3_transform_avx(struct sm3_state *state, + const u8 *data, int nblocks); + +static int sm3_avx_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + struct sm3_state *sctx = shash_desc_ctx(desc); + + if (!crypto_simd_usable() || + (sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) { + sm3_update(sctx, data, len); + return 0; + } + + /* + * Make sure struct sm3_state begins directly with the SM3 + * 256-bit internal state, as this is what the asm functions expect. + */ + BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0); + + kernel_fpu_begin(); + sm3_base_do_update(desc, data, len, sm3_transform_avx); + kernel_fpu_end(); + + return 0; +} + +static int sm3_avx_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (!crypto_simd_usable()) { + struct sm3_state *sctx = shash_desc_ctx(desc); + + if (len) + sm3_update(sctx, data, len); + + sm3_final(sctx, out); + return 0; + } + + kernel_fpu_begin(); + if (len) + sm3_base_do_update(desc, data, len, sm3_transform_avx); + sm3_base_do_finalize(desc, sm3_transform_avx); + kernel_fpu_end(); + + return sm3_base_finish(desc, out); +} + +static int sm3_avx_final(struct shash_desc *desc, u8 *out) +{ + if (!crypto_simd_usable()) { + sm3_final(shash_desc_ctx(desc), out); + return 0; + } + + kernel_fpu_begin(); + sm3_base_do_finalize(desc, sm3_transform_avx); + kernel_fpu_end(); + + return sm3_base_finish(desc, out); +} + +static struct shash_alg sm3_avx_alg = { + .digestsize = SM3_DIGEST_SIZE, + .init = sm3_base_init, + .update = sm3_avx_update, + .final = sm3_avx_final, + .finup = sm3_avx_finup, + .descsize = sizeof(struct sm3_state), + .base = { + .cra_name = "sm3", + .cra_driver_name = "sm3-avx", + .cra_priority = 300, + .cra_blocksize = SM3_BLOCK_SIZE, + .cra_module = THIS_MODULE, + } +}; + +static int __init sm3_avx_mod_init(void) +{ + const char *feature_name; + + if (!boot_cpu_has(X86_FEATURE_AVX)) { + pr_info("AVX instruction are not detected.\n"); + return -ENODEV; + } + + if (!boot_cpu_has(X86_FEATURE_BMI2)) { + pr_info("BMI2 instruction are not detected.\n"); + return -ENODEV; + } + + if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, + &feature_name)) { + pr_info("CPU feature '%s' is not supported.\n", feature_name); + return -ENODEV; + } + + return crypto_register_shash(&sm3_avx_alg); +} + +static void __exit sm3_avx_mod_exit(void) +{ + crypto_unregister_shash(&sm3_avx_alg); +} + +module_init(sm3_avx_mod_init); +module_exit(sm3_avx_mod_exit); + +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Tianjia Zhang "); +MODULE_DESCRIPTION("SM3 Secure Hash Algorithm, AVX assembler accelerated"); +MODULE_ALIAS_CRYPTO("sm3"); +MODULE_ALIAS_CRYPTO("sm3-avx"); diff --git a/crypto/Kconfig b/crypto/Kconfig index 8e780bf66c0a..0af5216c083a 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1008,6 +1008,19 @@ config CRYPTO_SM3 http://www.oscca.gov.cn/UpFile/20101222141857786.pdf https://datatracker.ietf.org/doc/html/draft-shen-sm3-hash +config CRYPTO_SM3_AVX_X86_64 + tristate "SM3 digest algorithm (x86_64/AVX)" + depends on X86 && 64BIT + select CRYPTO_HASH + select CRYPTO_LIB_SM3 + help + SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3). + It is part of the Chinese Commercial Cryptography suite. This is + SM3 optimized implementation using Advanced Vector Extensions (AVX) + when available. + + If unsure, say N. + config CRYPTO_STREEBOG tristate "Streebog Hash Function" select CRYPTO_HASH From patchwork Thu Dec 23 04:35:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "tianjia.zhang" X-Patchwork-Id: 527500 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE831C433F5 for ; Thu, 23 Dec 2021 04:36:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346402AbhLWEgE (ORCPT ); Wed, 22 Dec 2021 23:36:04 -0500 Received: from out30-44.freemail.mail.aliyun.com ([115.124.30.44]:49378 "EHLO out30-44.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346334AbhLWEgB (ORCPT ); Wed, 22 Dec 2021 23:36:01 -0500 X-Alimail-AntiSpam: AC=PASS; BC=-1|-1; BR=01201311R141e4; CH=green; DM=||false|; DS=||; FP=0|-1|-1|-1|0|-1|-1|-1; HT=e01e04394; MF=tianjia.zhang@linux.alibaba.com; NM=1; PH=DS; RN=20; SR=0; TI=SMTPD_---0V.ULz4B_1640234156; Received: from localhost(mailfrom:tianjia.zhang@linux.alibaba.com fp:SMTPD_---0V.ULz4B_1640234156) by smtp.aliyun-inc.com(127.0.0.1); Thu, 23 Dec 2021 12:35:57 +0800 From: Tianjia Zhang To: Herbert Xu , "David S. Miller" , Vitaly Chikunov , Eric Biggers , Eric Biggers , Gilad Ben-Yossef , Ard Biesheuvel , Jussi Kivilinna , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Tianjia Zhang Subject: [PATCH v3 6/6] crypto: tcrypt - add asynchronous speed test for SM3 Date: Thu, 23 Dec 2021 12:35:47 +0800 Message-Id: <20211223043547.32297-7-tianjia.zhang@linux.alibaba.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> References: <20211223043547.32297-1-tianjia.zhang@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org tcrypt supports testing of SM3 hash algorithms that use AVX instruction acceleration. In order to add the sm3 asynchronous test to the appropriate position, shift the testcase sequence number of the multi buffer backward and start from 450. Signed-off-by: Tianjia Zhang --- crypto/tcrypt.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c index 00149657a4bc..82b5eef2246a 100644 --- a/crypto/tcrypt.c +++ b/crypto/tcrypt.c @@ -2571,31 +2571,35 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb) if (mode > 400 && mode < 500) break; fallthrough; case 422: + test_ahash_speed("sm3", sec, generic_hash_speed_template); + if (mode > 400 && mode < 500) break; + fallthrough; + case 450: test_mb_ahash_speed("sha1", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break; fallthrough; - case 423: + case 451: test_mb_ahash_speed("sha256", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break; fallthrough; - case 424: + case 452: test_mb_ahash_speed("sha512", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break; fallthrough; - case 425: + case 453: test_mb_ahash_speed("sm3", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break; fallthrough; - case 426: + case 454: test_mb_ahash_speed("streebog256", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break; fallthrough; - case 427: + case 455: test_mb_ahash_speed("streebog512", sec, generic_hash_speed_template, num_mb); if (mode > 400 && mode < 500) break;