From patchwork Thu Dec 17 22:21:29 2020
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 345023
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A. Donenfeld", Paul Crowley
Subject: [PATCH v2 02/11] crypto: blake2b - define shash_alg structs using macros
Date: Thu, 17 Dec 2020 14:21:29 -0800
Message-Id: <20201217222138.170526-3-ebiggers@kernel.org>
In-Reply-To: <20201217222138.170526-1-ebiggers@kernel.org>
References: <20201217222138.170526-1-ebiggers@kernel.org>

From: Eric Biggers

The shash_alg structs for the four variants of BLAKE2b are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.
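(A quick aside for readers unfamiliar with this pattern: it is plain C
preprocessor usage plus C99 designated initializers, nothing
kernel-specific.  A minimal stand-alone sketch, using an invented struct
rather than the real struct shash_alg:)

#include <stdio.h>

/* Invented stand-in for the real struct shash_alg. */
struct alg {
	const char *name;
	const char *driver_name;
	unsigned int digestsize;
};

/* One macro expands to a full initializer; only the varying fields are
 * passed as arguments, and everything shared is written once. */
#define ALG(alg_name, drv_name, digest_size)	\
	{					\
		.name = alg_name,		\
		.driver_name = drv_name,	\
		.digestsize = digest_size,	\
	}

static const struct alg algs[] = {
	ALG("blake2b-160", "blake2b-160-generic", 20),
	ALG("blake2b-256", "blake2b-256-generic", 32),
	ALG("blake2b-384", "blake2b-384-generic", 48),
	ALG("blake2b-512", "blake2b-512-generic", 64),
};

int main(void)
{
	for (size_t i = 0; i < sizeof(algs) / sizeof(algs[0]); i++)
		printf("%s (%s): %u-byte digest\n", algs[i].name,
		       algs[i].driver_name, algs[i].digestsize);
	return 0;
}

Each ALG() invocation expands to one array entry, so the shared fields
live in exactly one place, which is the whole point of the patch below.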
Reviewed-by: David Sterba Signed-off-by: Eric Biggers --- crypto/blake2b_generic.c | 82 ++++++++++++---------------------------- 1 file changed, 25 insertions(+), 57 deletions(-) diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c index 83942f511075e..0e38e3e48297c 100644 --- a/crypto/blake2b_generic.c +++ b/crypto/blake2b_generic.c @@ -236,64 +236,32 @@ static int blake2b_final(struct shash_desc *desc, u8 *out) return 0; } -static struct shash_alg blake2b_algs[] = { - { - .base.cra_name = "blake2b-160", - .base.cra_driver_name = "blake2b-160-generic", - .base.cra_priority = 100, - .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, - .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, - .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), - .base.cra_module = THIS_MODULE, - .digestsize = BLAKE2B_160_HASH_SIZE, - .setkey = blake2b_setkey, - .init = blake2b_init, - .update = blake2b_update, - .final = blake2b_final, - .descsize = sizeof(struct blake2b_state), - }, { - .base.cra_name = "blake2b-256", - .base.cra_driver_name = "blake2b-256-generic", - .base.cra_priority = 100, - .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, - .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, - .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), - .base.cra_module = THIS_MODULE, - .digestsize = BLAKE2B_256_HASH_SIZE, - .setkey = blake2b_setkey, - .init = blake2b_init, - .update = blake2b_update, - .final = blake2b_final, - .descsize = sizeof(struct blake2b_state), - }, { - .base.cra_name = "blake2b-384", - .base.cra_driver_name = "blake2b-384-generic", - .base.cra_priority = 100, - .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, - .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, - .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), - .base.cra_module = THIS_MODULE, - .digestsize = BLAKE2B_384_HASH_SIZE, - .setkey = blake2b_setkey, - .init = blake2b_init, - .update = blake2b_update, - .final = blake2b_final, - .descsize = sizeof(struct blake2b_state), - }, { - .base.cra_name = "blake2b-512", - .base.cra_driver_name = "blake2b-512-generic", - .base.cra_priority = 100, - .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, - .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, - .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), - .base.cra_module = THIS_MODULE, - .digestsize = BLAKE2B_512_HASH_SIZE, - .setkey = blake2b_setkey, - .init = blake2b_init, - .update = blake2b_update, - .final = blake2b_final, - .descsize = sizeof(struct blake2b_state), +#define BLAKE2B_ALG(name, driver_name, digest_size) \ + { \ + .base.cra_name = name, \ + .base.cra_driver_name = driver_name, \ + .base.cra_priority = 100, \ + .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, \ + .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, \ + .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), \ + .base.cra_module = THIS_MODULE, \ + .digestsize = digest_size, \ + .setkey = blake2b_setkey, \ + .init = blake2b_init, \ + .update = blake2b_update, \ + .final = blake2b_final, \ + .descsize = sizeof(struct blake2b_state), \ } + +static struct shash_alg blake2b_algs[] = { + BLAKE2B_ALG("blake2b-160", "blake2b-160-generic", + BLAKE2B_160_HASH_SIZE), + BLAKE2B_ALG("blake2b-256", "blake2b-256-generic", + BLAKE2B_256_HASH_SIZE), + BLAKE2B_ALG("blake2b-384", "blake2b-384-generic", + BLAKE2B_384_HASH_SIZE), + BLAKE2B_ALG("blake2b-512", "blake2b-512-generic", + BLAKE2B_512_HASH_SIZE), }; static int __init blake2b_mod_init(void) From patchwork Thu Dec 17 22:21:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 345022 
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A. Donenfeld", Paul Crowley
Subject: [PATCH v2 03/11] crypto: blake2b - export helpers for optimized implementations
Date: Thu, 17 Dec 2020 14:21:30 -0800
Message-Id: <20201217222138.170526-4-ebiggers@kernel.org>
In-Reply-To: <20201217222138.170526-1-ebiggers@kernel.org>
References: <20201217222138.170526-1-ebiggers@kernel.org>

From: Eric Biggers

In preparation for adding optimized implementations of BLAKE2b (as well
as possibly supporting BLAKE2b through the library API in the future),
create headers <crypto/blake2b.h> and <crypto/internal/blake2b.h> that
contain common constants, structs, and helper functions for BLAKE2b.

Furthermore, export helper functions that reduce the amount of
boilerplate that needs to be duplicated in optimized implementations of
BLAKE2b.  This includes exporting the generic setkey() and init()
functions as-is, as well as exporting the update() and final() functions
with a function pointer argument added to provide an implementation of
the compression function.  (The compression function is the only thing
that optimized implementations really want to override.)  This is
similar to what is already done for nhpoly1305, sha1, and sha256.  It's
also similar to what I'll be doing for blake2s.

I didn't go so far as to put the helper functions in a separate module
blake2b_helpers.ko, like I'm doing for BLAKE2s.  This would be needed if
BLAKE2b gets exposed through the library API, but it can be done later.
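To make the intended usage concrete, here is a hypothetical glue sketch
(not part of this patch; my_blake2b_compress() and the my_*() wrappers
are invented names, while crypto_blake2b_update(), crypto_blake2b_final(),
and blake2b_compress_t are the real interfaces added below) showing how
an optimized driver is expected to plug in only its compression function:

#include <crypto/internal/blake2b.h>
#include <crypto/internal/hash.h>

/* Invented compression function; a real driver would dispatch to its
 * SIMD code here, falling back to the generic version when needed. */
static void my_blake2b_compress(struct blake2b_state *state,
				const u8 *block, size_t nblocks, u32 inc)
{
	blake2b_compress_generic(state, block, nblocks, inc);
}

/* Thin wrappers: all buffering and finalization logic is shared. */
static int my_blake2b_update(struct shash_desc *desc, const u8 *in,
			     unsigned int inlen)
{
	return crypto_blake2b_update(desc, in, inlen, my_blake2b_compress);
}

static int my_blake2b_final(struct shash_desc *desc, u8 *out)
{
	return crypto_blake2b_final(desc, out, my_blake2b_compress);
}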
Signed-off-by: Eric Biggers --- crypto/blake2b_generic.c | 95 ++++++++++++++++--------------- include/crypto/blake2b.h | 27 +++++++++ include/crypto/internal/blake2b.h | 33 +++++++++++ 3 files changed, 110 insertions(+), 45 deletions(-) create mode 100644 include/crypto/blake2b.h create mode 100644 include/crypto/internal/blake2b.h diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c index 0e38e3e48297c..ee5084f3c92e1 100644 --- a/crypto/blake2b_generic.c +++ b/crypto/blake2b_generic.c @@ -23,27 +23,9 @@ #include #include #include +#include #include - -enum blake2b_lengths { - BLAKE2B_BLOCK_SIZE = 128, - BLAKE2B_KEY_SIZE = 64, - - BLAKE2B_160_HASH_SIZE = 20, - BLAKE2B_256_HASH_SIZE = 32, - BLAKE2B_384_HASH_SIZE = 48, - BLAKE2B_512_HASH_SIZE = 64, -}; - -struct blake2b_state { - u64 h[8]; - u64 t[2]; - u64 f[2]; - u8 buf[BLAKE2B_BLOCK_SIZE]; - size_t buflen; -}; - static const u64 blake2b_IV[8] = { 0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL, 0x3c6ef372fe94f82bULL, 0xa54ff53a5f1d36f1ULL, @@ -96,8 +78,8 @@ static void blake2b_increment_counter(struct blake2b_state *S, const u64 inc) G(r,7,v[ 3],v[ 4],v[ 9],v[14]); \ } while (0) -static void blake2b_compress(struct blake2b_state *S, - const u8 block[BLAKE2B_BLOCK_SIZE]) +static void blake2b_compress_one_generic(struct blake2b_state *S, + const u8 block[BLAKE2B_BLOCK_SIZE]) { u64 m[16]; u64 v[16]; @@ -140,12 +122,18 @@ static void blake2b_compress(struct blake2b_state *S, #undef G #undef ROUND -struct blake2b_tfm_ctx { - u8 key[BLAKE2B_KEY_SIZE]; - unsigned int keylen; -}; +void blake2b_compress_generic(struct blake2b_state *state, + const u8 *block, size_t nblocks, u32 inc) +{ + do { + blake2b_increment_counter(state, inc); + blake2b_compress_one_generic(state, block); + block += BLAKE2B_BLOCK_SIZE; + } while (--nblocks); +} +EXPORT_SYMBOL_GPL(blake2b_compress_generic); -static int blake2b_setkey(struct crypto_shash *tfm, const u8 *key, +int crypto_blake2b_setkey(struct crypto_shash *tfm, const u8 *key, unsigned int keylen) { struct blake2b_tfm_ctx *tctx = crypto_shash_ctx(tfm); @@ -158,8 +146,9 @@ static int blake2b_setkey(struct crypto_shash *tfm, const u8 *key, return 0; } +EXPORT_SYMBOL_GPL(crypto_blake2b_setkey); -static int blake2b_init(struct shash_desc *desc) +int crypto_blake2b_init(struct shash_desc *desc) { struct blake2b_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); struct blake2b_state *state = shash_desc_ctx(desc); @@ -181,9 +170,10 @@ static int blake2b_init(struct shash_desc *desc) } return 0; } +EXPORT_SYMBOL_GPL(crypto_blake2b_init); -static int blake2b_update(struct shash_desc *desc, const u8 *in, - unsigned int inlen) +int crypto_blake2b_update(struct shash_desc *desc, const u8 *in, + unsigned int inlen, blake2b_compress_t compress) { struct blake2b_state *state = shash_desc_ctx(desc); const size_t left = state->buflen; @@ -194,18 +184,19 @@ static int blake2b_update(struct shash_desc *desc, const u8 *in, if (inlen > fill) { state->buflen = 0; - /* Fill buffer */ memcpy(state->buf + left, in, fill); - blake2b_increment_counter(state, BLAKE2B_BLOCK_SIZE); - /* Compress */ - blake2b_compress(state, state->buf); + (*compress)(state, state->buf, 1, BLAKE2B_BLOCK_SIZE); in += fill; inlen -= fill; - while (inlen > BLAKE2B_BLOCK_SIZE) { - blake2b_increment_counter(state, BLAKE2B_BLOCK_SIZE); - blake2b_compress(state, in); - in += BLAKE2B_BLOCK_SIZE; - inlen -= BLAKE2B_BLOCK_SIZE; + if (inlen > BLAKE2B_BLOCK_SIZE) { + /* Hash one less (full) block than strictly possible */ + size_t nbytes = round_up(inlen - 
BLAKE2B_BLOCK_SIZE, + BLAKE2B_BLOCK_SIZE); + + (*compress)(state, in, nbytes / BLAKE2B_BLOCK_SIZE, + BLAKE2B_BLOCK_SIZE); + in += nbytes; + inlen -= nbytes; } } memcpy(state->buf + state->buflen, in, inlen); @@ -213,20 +204,28 @@ static int blake2b_update(struct shash_desc *desc, const u8 *in, return 0; } +EXPORT_SYMBOL_GPL(crypto_blake2b_update); + +static int crypto_blake2b_update_generic(struct shash_desc *desc, + const u8 *in, unsigned int inlen) +{ + return crypto_blake2b_update(desc, in, inlen, blake2b_compress_generic); +} -static int blake2b_final(struct shash_desc *desc, u8 *out) +int crypto_blake2b_final(struct shash_desc *desc, u8 *out, + blake2b_compress_t compress) { struct blake2b_state *state = shash_desc_ctx(desc); const int digestsize = crypto_shash_digestsize(desc->tfm); size_t i; - blake2b_increment_counter(state, state->buflen); /* Set last block */ state->f[0] = (u64)-1; /* Padding */ memset(state->buf + state->buflen, 0, BLAKE2B_BLOCK_SIZE - state->buflen); - blake2b_compress(state, state->buf); + + (*compress)(state, state->buf, 1, state->buflen); /* Avoid temporary buffer and switch the internal output to LE order */ for (i = 0; i < ARRAY_SIZE(state->h); i++) @@ -235,6 +234,12 @@ static int blake2b_final(struct shash_desc *desc, u8 *out) memcpy(out, state->h, digestsize); return 0; } +EXPORT_SYMBOL_GPL(crypto_blake2b_final); + +static int crypto_blake2b_final_generic(struct shash_desc *desc, u8 *out) +{ + return crypto_blake2b_final(desc, out, blake2b_compress_generic); +} #define BLAKE2B_ALG(name, driver_name, digest_size) \ { \ @@ -246,10 +251,10 @@ static int blake2b_final(struct shash_desc *desc, u8 *out) .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), \ .base.cra_module = THIS_MODULE, \ .digestsize = digest_size, \ - .setkey = blake2b_setkey, \ - .init = blake2b_init, \ - .update = blake2b_update, \ - .final = blake2b_final, \ + .setkey = crypto_blake2b_setkey, \ + .init = crypto_blake2b_init, \ + .update = crypto_blake2b_update_generic, \ + .final = crypto_blake2b_final_generic, \ .descsize = sizeof(struct blake2b_state), \ } diff --git a/include/crypto/blake2b.h b/include/crypto/blake2b.h new file mode 100644 index 0000000000000..7c3045df597c0 --- /dev/null +++ b/include/crypto/blake2b.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef _CRYPTO_BLAKE2B_H +#define _CRYPTO_BLAKE2B_H + +#include + +enum blake2b_lengths { + BLAKE2B_BLOCK_SIZE = 128, + BLAKE2B_KEY_SIZE = 64, + + BLAKE2B_160_HASH_SIZE = 20, + BLAKE2B_256_HASH_SIZE = 32, + BLAKE2B_384_HASH_SIZE = 48, + BLAKE2B_512_HASH_SIZE = 64, +}; + +struct blake2b_state { + /* 'h', 't', and 'f' are used in assembly code, so keep them as-is. 
 */
+	u64 h[8];
+	u64 t[2];
+	u64 f[2];
+	u8 buf[BLAKE2B_BLOCK_SIZE];
+	size_t buflen;
+};
+
+#endif /* _CRYPTO_BLAKE2B_H */
diff --git a/include/crypto/internal/blake2b.h b/include/crypto/internal/blake2b.h
new file mode 100644
index 0000000000000..822dff79bab91
--- /dev/null
+++ b/include/crypto/internal/blake2b.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _CRYPTO_INTERNAL_BLAKE2B_H
+#define _CRYPTO_INTERNAL_BLAKE2B_H
+
+#include <crypto/blake2b.h>
+
+struct blake2b_tfm_ctx {
+	u8 key[BLAKE2B_KEY_SIZE];
+	unsigned int keylen;
+};
+
+void blake2b_compress_generic(struct blake2b_state *state,
+			      const u8 *block, size_t nblocks, u32 inc);
+
+typedef void (*blake2b_compress_t)(struct blake2b_state *state,
+				   const u8 *block, size_t nblocks, u32 inc);
+
+struct crypto_shash;
+struct shash_desc;
+
+int crypto_blake2b_setkey(struct crypto_shash *tfm, const u8 *key,
+			  unsigned int keylen);
+
+int crypto_blake2b_init(struct shash_desc *desc);
+
+int crypto_blake2b_update(struct shash_desc *desc, const u8 *in,
+			  unsigned int inlen, blake2b_compress_t compress);
+
+int crypto_blake2b_final(struct shash_desc *desc, u8 *out,
+			 blake2b_compress_t compress);
+
+#endif /* _CRYPTO_INTERNAL_BLAKE2B_H */

From patchwork Thu Dec 17 22:21:32 2020
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 345019
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A. Donenfeld", Paul Crowley
Subject: [PATCH v2 05/11] crypto: arm/blake2b - add NEON-accelerated BLAKE2b
Date: Thu, 17 Dec 2020 14:21:32 -0800
Message-Id: <20201217222138.170526-6-ebiggers@kernel.org>
In-Reply-To: <20201217222138.170526-1-ebiggers@kernel.org>
References: <20201217222138.170526-1-ebiggers@kernel.org>

From: Eric Biggers

Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast
as SHA-256, and slightly faster than SHA-1.
It is also almost three times as fast as the generic implementation of
BLAKE2b:

	Algorithm		Cycles per byte (on 4096-byte messages)
	===================	=======================================
	blake2b-256-neon	14.0
	sha1-neon		16.3
	sha1-asm		20.8
	blake2s-256-generic	26.1
	sha256-neon		28.9
	sha256-asm		32.0
	blake2b-256-generic	39.9

This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP).  It does
only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it.  The best I've been able to do with BLAKE2s
on Cortex-A7 is 19.0 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only exposed via the shash API,
since the library API doesn't support BLAKE2b yet.  However, I've tried
to keep things somewhat consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.

Signed-off-by: Eric Biggers
---
 arch/arm/crypto/Kconfig             |  10 +
 arch/arm/crypto/Makefile            |   2 +
 arch/arm/crypto/blake2b-neon-core.S | 345 ++++++++++++++++++++++++++++
 arch/arm/crypto/blake2b-neon-glue.c | 105 +++++++++
 4 files changed, 462 insertions(+)
 create mode 100644 arch/arm/crypto/blake2b-neon-core.S
 create mode 100644 arch/arm/crypto/blake2b-neon-glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index c9bf2df85cb90..f6a14c186b4ec 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -62,6 +62,16 @@ config CRYPTO_SHA512_ARM
 	  SHA-512 secure hash standard (DFIPS 180-2) implemented
 	  using optimized ARM assembler and NEON, when available.
 
+config CRYPTO_BLAKE2B_NEON
+	tristate "BLAKE2b digest algorithm (ARM NEON)"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLAKE2B
+	help
+	  BLAKE2b digest algorithm optimized with ARM NEON instructions.
+	  On ARM processors that have NEON support but not the ARMv8
+	  Crypto Extensions, typically this BLAKE2b implementation is
+	  much faster than SHA-2 and slightly faster than SHA-1.
+ config CRYPTO_AES_ARM tristate "Scalar AES cipher for ARM" select CRYPTO_ALGAPI diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index b745c17d356fe..ab835ceeb4f2e 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -9,6 +9,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o +obj-$(CONFIG_CRYPTO_BLAKE2B_NEON) += blake2b-neon.o obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o obj-$(CONFIG_CRYPTO_POLY1305_ARM) += poly1305-arm.o obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o @@ -29,6 +30,7 @@ sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y) sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y) +blake2b-neon-y := blake2b-neon-core.o blake2b-neon-glue.o sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o diff --git a/arch/arm/crypto/blake2b-neon-core.S b/arch/arm/crypto/blake2b-neon-core.S new file mode 100644 index 0000000000000..f246884c29cb4 --- /dev/null +++ b/arch/arm/crypto/blake2b-neon-core.S @@ -0,0 +1,345 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * BLAKE2b digest algorithm, NEON accelerated + * + * Copyright 2020 Google LLC + * + * Author: Eric Biggers + */ + +#include + + .text + .fpu neon + + // The arguments to blake2b_compress_neon() + STATE .req r0 + BLOCK .req r1 + NBLOCKS .req r2 + INC .req r3 + + // Pointers to the rotation tables + ROR24_TABLE .req r4 + ROR16_TABLE .req r5 + + // The original stack pointer + ORIG_SP .req r6 + + // NEON registers which contain the message words of the current block. + // M_0-M_3 are occasionally used for other purposes too. + M_0 .req d16 + M_1 .req d17 + M_2 .req d18 + M_3 .req d19 + M_4 .req d20 + M_5 .req d21 + M_6 .req d22 + M_7 .req d23 + M_8 .req d24 + M_9 .req d25 + M_10 .req d26 + M_11 .req d27 + M_12 .req d28 + M_13 .req d29 + M_14 .req d30 + M_15 .req d31 + + .align 4 + // Tables for computing ror64(x, 24) and ror64(x, 16) using the vtbl.8 + // instruction. This is the most efficient way to implement these + // rotation amounts with NEON. (On Cortex-A53 it's the same speed as + // vshr.u64 + vsli.u64, while on Cortex-A7 it's faster.) +.Lror24_table: + .byte 3, 4, 5, 6, 7, 0, 1, 2 +.Lror16_table: + .byte 2, 3, 4, 5, 6, 7, 0, 1 + // The BLAKE2b initialization vector +.Lblake2b_IV: + .quad 0x6a09e667f3bcc908, 0xbb67ae8584caa73b + .quad 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1 + .quad 0x510e527fade682d1, 0x9b05688c2b3e6c1f + .quad 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179 + +// Execute one round of BLAKE2b by updating the state matrix v[0..15] in the +// NEON registers q0-q7. The message block is in q8..q15. The stack pointer +// points to a 32-byte aligned buffer containing a copy of q8 and q9, so that +// they can be reloaded if q8 and q9 are used as temporary registers. The macro +// arguments s0-s15 give the order in which the message words are used in this +// round. 'final' is 1 if this is the final round. +.macro _blake2b_round s0, s1, s2, s3, s4, s5, s6, s7, \ + s8, s9, s10, s11, s12, s13, s14, s15, final=0 + + // Mix the columns: + // (v[0], v[4], v[8], v[12]), (v[1], v[5], v[9], v[13]), + // (v[2], v[6], v[10], v[14]), and (v[3], v[7], v[11], v[15]). 
+ + // a += b + m[blake2b_sigma[r][2*i + 0]]; + vadd.u64 q0, q0, q2 + vadd.u64 q1, q1, q3 + vadd.u64 d0, d0, M_\s0 + vadd.u64 d1, d1, M_\s2 + vadd.u64 d2, d2, M_\s4 + vadd.u64 d3, d3, M_\s6 + + // d = ror64(d ^ a, 32); + veor q6, q6, q0 + veor q7, q7, q1 + vrev64.32 q6, q6 + vrev64.32 q7, q7 + + // c += d; + vadd.u64 q4, q4, q6 + vadd.u64 q5, q5, q7 + + // b = ror64(b ^ c, 24); + vld1.8 {M_0}, [ROR24_TABLE, :64] + veor q2, q2, q4 + veor q3, q3, q5 + vtbl.8 d4, {d4}, M_0 + vtbl.8 d5, {d5}, M_0 + vtbl.8 d6, {d6}, M_0 + vtbl.8 d7, {d7}, M_0 + + // a += b + m[blake2b_sigma[r][2*i + 1]]; + // + // M_0 got clobbered above, so we have to reload it if any of the four + // message words this step needs happens to be M_0. Otherwise we don't + // need to reload it here, as it will just get clobbered again below. +.if \s1 == 0 || \s3 == 0 || \s5 == 0 || \s7 == 0 + vld1.8 {M_0}, [sp, :64] +.endif + vadd.u64 q0, q0, q2 + vadd.u64 q1, q1, q3 + vadd.u64 d0, d0, M_\s1 + vadd.u64 d1, d1, M_\s3 + vadd.u64 d2, d2, M_\s5 + vadd.u64 d3, d3, M_\s7 + + // d = ror64(d ^ a, 16); + vld1.8 {M_0}, [ROR16_TABLE, :64] + veor q6, q6, q0 + veor q7, q7, q1 + vtbl.8 d12, {d12}, M_0 + vtbl.8 d13, {d13}, M_0 + vtbl.8 d14, {d14}, M_0 + vtbl.8 d15, {d15}, M_0 + + // c += d; + vadd.u64 q4, q4, q6 + vadd.u64 q5, q5, q7 + + // b = ror64(b ^ c, 63); + // + // This rotation amount isn't a multiple of 8, so it has to be + // implemented using a pair of shifts, which requires temporary + // registers. Use q8-q9 (M_0-M_3) for this, and reload them afterwards. + veor q8, q2, q4 + veor q9, q3, q5 + vshr.u64 q2, q8, #63 + vshr.u64 q3, q9, #63 + vsli.u64 q2, q8, #1 + vsli.u64 q3, q9, #1 + vld1.8 {q8-q9}, [sp, :256] + + // Mix the diagonals: + // (v[0], v[5], v[10], v[15]), (v[1], v[6], v[11], v[12]), + // (v[2], v[7], v[8], v[13]), and (v[3], v[4], v[9], v[14]). + // + // There are two possible ways to do this: use 'vext' instructions to + // shift the rows of the matrix so that the diagonals become columns, + // and undo it afterwards; or just use 64-bit operations on 'd' + // registers instead of 128-bit operations on 'q' registers. We use the + // latter approach, as it performs much better on Cortex-A7. 
+ + // a += b + m[blake2b_sigma[r][2*i + 0]]; + vadd.u64 d0, d0, d5 + vadd.u64 d1, d1, d6 + vadd.u64 d2, d2, d7 + vadd.u64 d3, d3, d4 + vadd.u64 d0, d0, M_\s8 + vadd.u64 d1, d1, M_\s10 + vadd.u64 d2, d2, M_\s12 + vadd.u64 d3, d3, M_\s14 + + // d = ror64(d ^ a, 32); + veor d15, d15, d0 + veor d12, d12, d1 + veor d13, d13, d2 + veor d14, d14, d3 + vrev64.32 d15, d15 + vrev64.32 d12, d12 + vrev64.32 d13, d13 + vrev64.32 d14, d14 + + // c += d; + vadd.u64 d10, d10, d15 + vadd.u64 d11, d11, d12 + vadd.u64 d8, d8, d13 + vadd.u64 d9, d9, d14 + + // b = ror64(b ^ c, 24); + vld1.8 {M_0}, [ROR24_TABLE, :64] + veor d5, d5, d10 + veor d6, d6, d11 + veor d7, d7, d8 + veor d4, d4, d9 + vtbl.8 d5, {d5}, M_0 + vtbl.8 d6, {d6}, M_0 + vtbl.8 d7, {d7}, M_0 + vtbl.8 d4, {d4}, M_0 + + // a += b + m[blake2b_sigma[r][2*i + 1]]; +.if \s9 == 0 || \s11 == 0 || \s13 == 0 || \s15 == 0 + vld1.8 {M_0}, [sp, :64] +.endif + vadd.u64 d0, d0, d5 + vadd.u64 d1, d1, d6 + vadd.u64 d2, d2, d7 + vadd.u64 d3, d3, d4 + vadd.u64 d0, d0, M_\s9 + vadd.u64 d1, d1, M_\s11 + vadd.u64 d2, d2, M_\s13 + vadd.u64 d3, d3, M_\s15 + + // d = ror64(d ^ a, 16); + vld1.8 {M_0}, [ROR16_TABLE, :64] + veor d15, d15, d0 + veor d12, d12, d1 + veor d13, d13, d2 + veor d14, d14, d3 + vtbl.8 d12, {d12}, M_0 + vtbl.8 d13, {d13}, M_0 + vtbl.8 d14, {d14}, M_0 + vtbl.8 d15, {d15}, M_0 + + // c += d; + vadd.u64 d10, d10, d15 + vadd.u64 d11, d11, d12 + vadd.u64 d8, d8, d13 + vadd.u64 d9, d9, d14 + + // b = ror64(b ^ c, 63); + veor d16, d4, d9 + veor d17, d5, d10 + veor d18, d6, d11 + veor d19, d7, d8 + vshr.u64 q2, q8, #63 + vshr.u64 q3, q9, #63 + vsli.u64 q2, q8, #1 + vsli.u64 q3, q9, #1 + // Reloading q8-q9 can be skipped on the final round. +.if ! \final + vld1.8 {q8-q9}, [sp, :256] +.endif +.endm + +// +// void blake2b_compress_neon(struct blake2b_state *state, +// const u8 *block, size_t nblocks, u32 inc); +// +// Only the first three fields of struct blake2b_state are used: +// u64 h[8]; (inout) +// u64 t[2]; (in) +// u64 f[2]; (in) +// + .align 5 +ENTRY(blake2b_compress_neon) + push {r4-r10} + + // Allocate a 32-byte stack buffer that is 32-byte aligned. + mov ORIG_SP, sp + sub ip, sp, #32 + bic ip, ip, #31 + mov sp, ip + + adr ROR24_TABLE, .Lror24_table + adr ROR16_TABLE, .Lror16_table + + mov ip, STATE + vld1.64 {q0-q1}, [ip]! // Load h[0..3] + vld1.64 {q2-q3}, [ip]! // Load h[4..7] +.Lnext_block: + adr r10, .Lblake2b_IV + vld1.64 {q14-q15}, [ip] // Load t[0..1] and f[0..1] + vld1.64 {q4-q5}, [r10]! // Load IV[0..3] + vmov r7, r8, d28 // Copy t[0] to (r7, r8) + vld1.64 {q6-q7}, [r10] // Load IV[4..7] + adds r7, r7, INC // Increment counter + bcs .Lslow_inc_ctr + vmov.i32 d28[0], r7 + vst1.64 {d28}, [ip] // Update t[0] +.Linc_ctr_done: + + // Load the next message block and finish initializing the state matrix + // 'v'. Fortunately, there are exactly enough NEON registers to fit the + // entire state matrix in q0-q7 and the entire message block in q8-15. + // + // However, _blake2b_round also needs some extra registers for rotates, + // so we have to spill some registers. It's better to spill the message + // registers than the state registers, as the message doesn't change. + // Therefore we store a copy of the first 32 bytes of the message block + // (q8-q9) in an aligned buffer on the stack so that they can be + // reloaded when needed. (We could just reload directly from the + // message buffer, but it's faster to use aligned loads.) + vld1.8 {q8-q9}, [BLOCK]! + veor q6, q6, q14 // v[12..13] = IV[4..5] ^ t[0..1] + vld1.8 {q10-q11}, [BLOCK]! 
+ veor q7, q7, q15 // v[14..15] = IV[6..7] ^ f[0..1] + vld1.8 {q12-q13}, [BLOCK]! + vst1.8 {q8-q9}, [sp, :256] + mov ip, STATE + vld1.8 {q14-q15}, [BLOCK]! + + // Execute the rounds. Each round is provided the order in which it + // needs to use the message words. + _blake2b_round 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 + _blake2b_round 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 + _blake2b_round 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 + _blake2b_round 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 + _blake2b_round 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 + _blake2b_round 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 + _blake2b_round 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 + _blake2b_round 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 + _blake2b_round 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 + _blake2b_round 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 + _blake2b_round 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 + _blake2b_round 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 \ + final=1 + + // Fold the final state matrix into the hash chaining value: + // + // for (i = 0; i < 8; i++) + // h[i] ^= v[i] ^ v[i + 8]; + // + vld1.64 {q8-q9}, [ip]! // Load old h[0..3] + veor q0, q0, q4 // v[0..1] ^= v[8..9] + veor q1, q1, q5 // v[2..3] ^= v[10..11] + vld1.64 {q10-q11}, [ip] // Load old h[4..7] + veor q2, q2, q6 // v[4..5] ^= v[12..13] + veor q3, q3, q7 // v[6..7] ^= v[14..15] + veor q0, q0, q8 // v[0..1] ^= h[0..1] + veor q1, q1, q9 // v[2..3] ^= h[2..3] + mov ip, STATE + subs NBLOCKS, NBLOCKS, #1 // nblocks-- + vst1.64 {q0-q1}, [ip]! // Store new h[0..3] + veor q2, q2, q10 // v[4..5] ^= h[4..5] + veor q3, q3, q11 // v[6..7] ^= h[6..7] + vst1.64 {q2-q3}, [ip]! // Store new h[4..7] + bne .Lnext_block // nblocks != 0? + + mov sp, ORIG_SP + pop {r4-r10} + mov pc, lr + +.Lslow_inc_ctr: + // Handle the case where the counter overflowed its low 32 bits, by + // carrying the overflow bit into the full 128-bit counter. 
+ vmov r9, r10, d29 + adcs r8, r8, #0 + adcs r9, r9, #0 + adc r10, r10, #0 + vmov d28, r7, r8 + vmov d29, r9, r10 + vst1.64 {q14}, [ip] // Update t[0] and t[1] + b .Linc_ctr_done +ENDPROC(blake2b_compress_neon) diff --git a/arch/arm/crypto/blake2b-neon-glue.c b/arch/arm/crypto/blake2b-neon-glue.c new file mode 100644 index 0000000000000..34d73200e7fa6 --- /dev/null +++ b/arch/arm/crypto/blake2b-neon-glue.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * BLAKE2b digest algorithm, NEON accelerated + * + * Copyright 2020 Google LLC + */ + +#include +#include +#include + +#include +#include + +#include +#include + +asmlinkage void blake2b_compress_neon(struct blake2b_state *state, + const u8 *block, size_t nblocks, u32 inc); + +static void blake2b_compress_arch(struct blake2b_state *state, + const u8 *block, size_t nblocks, u32 inc) +{ + if (!crypto_simd_usable()) { + blake2b_compress_generic(state, block, nblocks, inc); + return; + } + + do { + const size_t blocks = min_t(size_t, nblocks, + SZ_4K / BLAKE2B_BLOCK_SIZE); + + kernel_neon_begin(); + blake2b_compress_neon(state, block, blocks, inc); + kernel_neon_end(); + + nblocks -= blocks; + block += blocks * BLAKE2B_BLOCK_SIZE; + } while (nblocks); +} + +static int crypto_blake2b_update_neon(struct shash_desc *desc, + const u8 *in, unsigned int inlen) +{ + return crypto_blake2b_update(desc, in, inlen, blake2b_compress_arch); +} + +static int crypto_blake2b_final_neon(struct shash_desc *desc, u8 *out) +{ + return crypto_blake2b_final(desc, out, blake2b_compress_arch); +} + +#define BLAKE2B_ALG(name, driver_name, digest_size) \ + { \ + .base.cra_name = name, \ + .base.cra_driver_name = driver_name, \ + .base.cra_priority = 200, \ + .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, \ + .base.cra_blocksize = BLAKE2B_BLOCK_SIZE, \ + .base.cra_ctxsize = sizeof(struct blake2b_tfm_ctx), \ + .base.cra_module = THIS_MODULE, \ + .digestsize = digest_size, \ + .setkey = crypto_blake2b_setkey, \ + .init = crypto_blake2b_init, \ + .update = crypto_blake2b_update_neon, \ + .final = crypto_blake2b_final_neon, \ + .descsize = sizeof(struct blake2b_state), \ + } + +static struct shash_alg blake2b_neon_algs[] = { + BLAKE2B_ALG("blake2b-160", "blake2b-160-neon", BLAKE2B_160_HASH_SIZE), + BLAKE2B_ALG("blake2b-256", "blake2b-256-neon", BLAKE2B_256_HASH_SIZE), + BLAKE2B_ALG("blake2b-384", "blake2b-384-neon", BLAKE2B_384_HASH_SIZE), + BLAKE2B_ALG("blake2b-512", "blake2b-512-neon", BLAKE2B_512_HASH_SIZE), +}; + +static int __init blake2b_neon_mod_init(void) +{ + if (!(elf_hwcap & HWCAP_NEON)) + return -ENODEV; + + return crypto_register_shashes(blake2b_neon_algs, + ARRAY_SIZE(blake2b_neon_algs)); +} + +static void __exit blake2b_neon_mod_exit(void) +{ + return crypto_unregister_shashes(blake2b_neon_algs, + ARRAY_SIZE(blake2b_neon_algs)); +} + +module_init(blake2b_neon_mod_init); +module_exit(blake2b_neon_mod_exit); + +MODULE_DESCRIPTION("BLAKE2b digest algorithm, NEON accelerated"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Eric Biggers "); +MODULE_ALIAS_CRYPTO("blake2b-160"); +MODULE_ALIAS_CRYPTO("blake2b-160-neon"); +MODULE_ALIAS_CRYPTO("blake2b-256"); +MODULE_ALIAS_CRYPTO("blake2b-256-neon"); +MODULE_ALIAS_CRYPTO("blake2b-384"); +MODULE_ALIAS_CRYPTO("blake2b-384-neon"); +MODULE_ALIAS_CRYPTO("blake2b-512"); +MODULE_ALIAS_CRYPTO("blake2b-512-neon"); From patchwork Thu Dec 17 22:21:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 345018 Return-Path: 
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A. Donenfeld", Paul Crowley
Subject: [PATCH v2 09/11] crypto: blake2s - share the "shash" API boilerplate code
Date: Thu, 17 Dec 2020 14:21:36 -0800
Message-Id: <20201217222138.170526-10-ebiggers@kernel.org>
In-Reply-To: <20201217222138.170526-1-ebiggers@kernel.org>
References: <20201217222138.170526-1-ebiggers@kernel.org>

From: Eric Biggers

Move the boilerplate code for setkey(), init(), update(), and final()
from blake2s_generic.ko into a new module blake2s_helpers.ko, and
export it so that it can be used by other shash implementations of
BLAKE2s.

setkey() and init() are exported as-is, while update() and final() have
a blake2s_compress_t function pointer argument added.  This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.

The helper functions are defined in a separate module blake2s_helpers.ko
(rather than just in blake2s_generic.ko) because we can't simply select
CRYPTO_BLAKE2S from CRYPTO_BLAKE2S_X86.  Doing this selection
unconditionally would make the library API select the shash API, while
doing it conditionally on CRYPTO_HASH would create a recursive kconfig
dependency on CRYPTO_HASH.  As a bonus, using a separate module also
allows the generic implementation to be omitted when unneeded.

These helper functions very closely match the ones I defined for
BLAKE2b, except the BLAKE2b ones didn't go in a separate module yet
because BLAKE2b isn't exposed through the library API yet.

Finally, use these new helper functions in the x86 implementation of
BLAKE2s.  (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)
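One subtlety worth spelling out: the shared update() helper deliberately
leaves at least one byte of input unconsumed (note the strict '>'
comparisons and the "one less block" rule in the code below), so that
final() always has a last, possibly partial, block left to pad and flag.
A minimal stand-alone sketch of just that buffering rule, in plain
user-space C with invented names (ctx, absorb, compress), not the kernel
code:

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 64	/* BLAKE2s block size */

struct ctx {
	unsigned char buf[BLOCK_SIZE];
	size_t buflen;
};

/* Stand-in for blake2s_compress_*(): just counts blocks consumed. */
static size_t blocks_compressed;

static void compress(struct ctx *c, const unsigned char *in, size_t nblocks)
{
	(void)c; (void)in;
	blocks_compressed += nblocks;
}

static void absorb(struct ctx *c, const unsigned char *in, size_t inlen)
{
	const size_t fill = BLOCK_SIZE - c->buflen;

	if (!inlen)
		return;
	if (inlen > fill) {	/* strictly more than fits: flush buffer */
		memcpy(c->buf + c->buflen, in, fill);
		compress(c, c->buf, 1);
		c->buflen = 0;
		in += fill;
		inlen -= fill;
	}
	if (inlen > BLOCK_SIZE) {
		/* Hash one less (full) block than strictly possible, so
		 * at least one byte remains for the final block. */
		size_t nblocks = (inlen + BLOCK_SIZE - 1) / BLOCK_SIZE;

		compress(c, in, nblocks - 1);
		in += BLOCK_SIZE * (nblocks - 1);
		inlen -= BLOCK_SIZE * (nblocks - 1);
	}
	memcpy(c->buf + c->buflen, in, inlen);	/* 1..BLOCK_SIZE bytes */
	c->buflen += inlen;
}

int main(void)
{
	struct ctx c = { .buflen = 0 };
	unsigned char data[3 * BLOCK_SIZE] = { 0 };

	absorb(&c, data, sizeof(data));
	printf("compressed %zu blocks, %zu bytes buffered\n",
	       blocks_compressed, c.buflen);
	return 0;
}

Feeding exactly three blocks compresses only two of them; the third
stays buffered so final() can mark it as the last block.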
Signed-off-by: Eric Biggers --- arch/x86/crypto/blake2s-glue.c | 74 +++----------------------- crypto/Kconfig | 5 ++ crypto/Makefile | 1 + crypto/blake2s_generic.c | 79 ++++------------------------ crypto/blake2s_helpers.c | 87 +++++++++++++++++++++++++++++++ include/crypto/internal/blake2s.h | 17 ++++++ 6 files changed, 126 insertions(+), 137 deletions(-) create mode 100644 crypto/blake2s_helpers.c diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c index 4dcb2ee89efc9..1306b3272c77f 100644 --- a/arch/x86/crypto/blake2s-glue.c +++ b/arch/x86/crypto/blake2s-glue.c @@ -58,75 +58,15 @@ void blake2s_compress_arch(struct blake2s_state *state, } EXPORT_SYMBOL(blake2s_compress_arch); -static int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key, - unsigned int keylen) +static int crypto_blake2s_update_x86(struct shash_desc *desc, const u8 *in, + unsigned int inlen) { - struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm); - - if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE) - return -EINVAL; - - memcpy(tctx->key, key, keylen); - tctx->keylen = keylen; - - return 0; -} - -static int crypto_blake2s_init(struct shash_desc *desc) -{ - struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); - struct blake2s_state *state = shash_desc_ctx(desc); - const int outlen = crypto_shash_digestsize(desc->tfm); - - if (tctx->keylen) - blake2s_init_key(state, outlen, tctx->key, tctx->keylen); - else - blake2s_init(state, outlen); - - return 0; -} - -static int crypto_blake2s_update(struct shash_desc *desc, const u8 *in, - unsigned int inlen) -{ - struct blake2s_state *state = shash_desc_ctx(desc); - const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen; - - if (unlikely(!inlen)) - return 0; - if (inlen > fill) { - memcpy(state->buf + state->buflen, in, fill); - blake2s_compress_arch(state, state->buf, 1, BLAKE2S_BLOCK_SIZE); - state->buflen = 0; - in += fill; - inlen -= fill; - } - if (inlen > BLAKE2S_BLOCK_SIZE) { - const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE); - /* Hash one less (full) block than strictly possible */ - blake2s_compress_arch(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE); - in += BLAKE2S_BLOCK_SIZE * (nblocks - 1); - inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1); - } - memcpy(state->buf + state->buflen, in, inlen); - state->buflen += inlen; - - return 0; + return crypto_blake2s_update(desc, in, inlen, blake2s_compress_arch); } -static int crypto_blake2s_final(struct shash_desc *desc, u8 *out) +static int crypto_blake2s_final_x86(struct shash_desc *desc, u8 *out) { - struct blake2s_state *state = shash_desc_ctx(desc); - - blake2s_set_lastblock(state); - memset(state->buf + state->buflen, 0, - BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */ - blake2s_compress_arch(state, state->buf, 1, state->buflen); - cpu_to_le32_array(state->h, ARRAY_SIZE(state->h)); - memcpy(out, state->h, state->outlen); - memzero_explicit(state, sizeof(*state)); - - return 0; + return crypto_blake2s_final(desc, out, blake2s_compress_arch); } #define BLAKE2S_ALG(name, driver_name, digest_size) \ @@ -141,8 +81,8 @@ static int crypto_blake2s_final(struct shash_desc *desc, u8 *out) .digestsize = digest_size, \ .setkey = crypto_blake2s_setkey, \ .init = crypto_blake2s_init, \ - .update = crypto_blake2s_update, \ - .final = crypto_blake2s_final, \ + .update = crypto_blake2s_update_x86, \ + .final = crypto_blake2s_final_x86, \ .descsize = sizeof(struct blake2s_state), \ } diff --git a/crypto/Kconfig b/crypto/Kconfig index a367fcfeb5d45..e3a51154ac0cf 100644 --- a/crypto/Kconfig 
+++ b/crypto/Kconfig @@ -681,6 +681,7 @@ config CRYPTO_BLAKE2B config CRYPTO_BLAKE2S tristate "BLAKE2s digest algorithm" select CRYPTO_LIB_BLAKE2S_GENERIC + select CRYPTO_BLAKE2S_HELPERS select CRYPTO_HASH help Implementation of cryptographic hash function BLAKE2s @@ -696,11 +697,15 @@ config CRYPTO_BLAKE2S See https://blake2.net for further information. +config CRYPTO_BLAKE2S_HELPERS + tristate + config CRYPTO_BLAKE2S_X86 tristate "BLAKE2s digest algorithm (x86 accelerated version)" depends on X86 && 64BIT select CRYPTO_LIB_BLAKE2S_GENERIC select CRYPTO_ARCH_HAVE_LIB_BLAKE2S + select CRYPTO_BLAKE2S_HELPERS if CRYPTO_HASH config CRYPTO_CRCT10DIF tristate "CRCT10DIF algorithm" diff --git a/crypto/Makefile b/crypto/Makefile index b279483fba50b..75f99ab82bfc2 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -82,6 +82,7 @@ CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns) # https://gcc.gnu.org/b obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o obj-$(CONFIG_CRYPTO_BLAKE2B) += blake2b_generic.o obj-$(CONFIG_CRYPTO_BLAKE2S) += blake2s_generic.o +obj-$(CONFIG_CRYPTO_BLAKE2S_HELPERS) += blake2s_helpers.o obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o obj-$(CONFIG_CRYPTO_ECB) += ecb.o obj-$(CONFIG_CRYPTO_CBC) += cbc.o diff --git a/crypto/blake2s_generic.c b/crypto/blake2s_generic.c index b89536c3671cf..ea2f1c6eb3231 100644 --- a/crypto/blake2s_generic.c +++ b/crypto/blake2s_generic.c @@ -1,84 +1,23 @@ // SPDX-License-Identifier: GPL-2.0 OR MIT /* + * shash interface to the generic implementation of BLAKE2s + * * Copyright (C) 2015-2019 Jason A. Donenfeld . All Rights Reserved. */ #include #include - -#include -#include #include -static int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key, - unsigned int keylen) -{ - struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm); - - if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE) - return -EINVAL; - - memcpy(tctx->key, key, keylen); - tctx->keylen = keylen; - - return 0; -} - -static int crypto_blake2s_init(struct shash_desc *desc) -{ - struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); - struct blake2s_state *state = shash_desc_ctx(desc); - const int outlen = crypto_shash_digestsize(desc->tfm); - - if (tctx->keylen) - blake2s_init_key(state, outlen, tctx->key, tctx->keylen); - else - blake2s_init(state, outlen); - - return 0; -} - -static int crypto_blake2s_update(struct shash_desc *desc, const u8 *in, - unsigned int inlen) +static int crypto_blake2s_update_generic(struct shash_desc *desc, + const u8 *in, unsigned int inlen) { - struct blake2s_state *state = shash_desc_ctx(desc); - const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen; - - if (unlikely(!inlen)) - return 0; - if (inlen > fill) { - memcpy(state->buf + state->buflen, in, fill); - blake2s_compress_generic(state, state->buf, 1, BLAKE2S_BLOCK_SIZE); - state->buflen = 0; - in += fill; - inlen -= fill; - } - if (inlen > BLAKE2S_BLOCK_SIZE) { - const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE); - /* Hash one less (full) block than strictly possible */ - blake2s_compress_generic(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE); - in += BLAKE2S_BLOCK_SIZE * (nblocks - 1); - inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1); - } - memcpy(state->buf + state->buflen, in, inlen); - state->buflen += inlen; - - return 0; + return crypto_blake2s_update(desc, in, inlen, blake2s_compress_generic); } -static int crypto_blake2s_final(struct shash_desc *desc, u8 *out) +static int crypto_blake2s_final_generic(struct shash_desc *desc, u8 *out) { - struct blake2s_state *state = 
shash_desc_ctx(desc); - - blake2s_set_lastblock(state); - memset(state->buf + state->buflen, 0, - BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */ - blake2s_compress_generic(state, state->buf, 1, state->buflen); - cpu_to_le32_array(state->h, ARRAY_SIZE(state->h)); - memcpy(out, state->h, state->outlen); - memzero_explicit(state, sizeof(*state)); - - return 0; + return crypto_blake2s_final(desc, out, blake2s_compress_generic); } #define BLAKE2S_ALG(name, driver_name, digest_size) \ @@ -93,8 +32,8 @@ static int crypto_blake2s_final(struct shash_desc *desc, u8 *out) .digestsize = digest_size, \ .setkey = crypto_blake2s_setkey, \ .init = crypto_blake2s_init, \ - .update = crypto_blake2s_update, \ - .final = crypto_blake2s_final, \ + .update = crypto_blake2s_update_generic, \ + .final = crypto_blake2s_final_generic, \ .descsize = sizeof(struct blake2s_state), \ } diff --git a/crypto/blake2s_helpers.c b/crypto/blake2s_helpers.c new file mode 100644 index 0000000000000..0c3b9fcbd022c --- /dev/null +++ b/crypto/blake2s_helpers.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Helper functions for shash implementations of BLAKE2s. The caller must + * provide a pointer to their blake2s_compress_t implementation. + * + * Copyright (C) 2015-2019 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include +#include + +int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key, + unsigned int keylen) +{ + struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm); + + if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE) + return -EINVAL; + + memcpy(tctx->key, key, keylen); + tctx->keylen = keylen; + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_blake2s_setkey); + +int crypto_blake2s_init(struct shash_desc *desc) +{ + struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + struct blake2s_state *state = shash_desc_ctx(desc); + const int outlen = crypto_shash_digestsize(desc->tfm); + + if (tctx->keylen) + blake2s_init_key(state, outlen, tctx->key, tctx->keylen); + else + blake2s_init(state, outlen); + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_blake2s_init); + +int crypto_blake2s_update(struct shash_desc *desc, const u8 *in, + unsigned int inlen, blake2s_compress_t compress) +{ + struct blake2s_state *state = shash_desc_ctx(desc); + const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen; + + if (unlikely(!inlen)) + return 0; + if (inlen > fill) { + memcpy(state->buf + state->buflen, in, fill); + (*compress)(state, state->buf, 1, BLAKE2S_BLOCK_SIZE); + state->buflen = 0; + in += fill; + inlen -= fill; + } + if (inlen > BLAKE2S_BLOCK_SIZE) { + const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE); + /* Hash one less (full) block than strictly possible */ + (*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE); + in += BLAKE2S_BLOCK_SIZE * (nblocks - 1); + inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1); + } + memcpy(state->buf + state->buflen, in, inlen); + state->buflen += inlen; + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_blake2s_update); + +int crypto_blake2s_final(struct shash_desc *desc, u8 *out, + blake2s_compress_t compress) +{ + struct blake2s_state *state = shash_desc_ctx(desc); + + blake2s_set_lastblock(state); + memset(state->buf + state->buflen, 0, + BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */ + (*compress)(state, state->buf, 1, state->buflen); + cpu_to_le32_array(state->h, ARRAY_SIZE(state->h)); + memcpy(out, state->h, state->outlen); + memzero_explicit(state, sizeof(*state)); + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_blake2s_final); diff --git 
a/include/crypto/internal/blake2s.h b/include/crypto/internal/blake2s.h
index 6e376ae6b6b58..8c79af7662aa2 100644
--- a/include/crypto/internal/blake2s.h
+++ b/include/crypto/internal/blake2s.h
@@ -23,4 +23,21 @@ static inline void blake2s_set_lastblock(struct blake2s_state *state)
 	state->f[0] = -1;
 }
 
+typedef void (*blake2s_compress_t)(struct blake2s_state *state,
+				   const u8 *block, size_t nblocks, u32 inc);
+
+struct crypto_shash;
+struct shash_desc;
+
+int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key,
+			  unsigned int keylen);
+
+int crypto_blake2s_init(struct shash_desc *desc);
+
+int crypto_blake2s_update(struct shash_desc *desc, const u8 *in,
+			  unsigned int inlen, blake2s_compress_t compress);
+
+int crypto_blake2s_final(struct shash_desc *desc, u8 *out,
+			 blake2s_compress_t compress);
+
 #endif /* BLAKE2S_INTERNAL_H */

From patchwork Thu Dec 17 22:21:38 2020
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 345020
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A. Donenfeld", Paul Crowley
Subject: [PATCH v2 11/11] wireguard: Kconfig: select CRYPTO_BLAKE2S_ARM
Date: Thu, 17 Dec 2020 14:21:38 -0800
Message-Id: <20201217222138.170526-12-ebiggers@kernel.org>
In-Reply-To: <20201217222138.170526-1-ebiggers@kernel.org>
References: <20201217222138.170526-1-ebiggers@kernel.org>

From: Eric Biggers

When available, select the new implementation of BLAKE2s for 32-bit
ARM.  This is faster than the generic C implementation.

Signed-off-by: Eric Biggers
Reviewed-by: Jason A. Donenfeld
---
 drivers/net/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 260f9f46668b8..672fcdd9aecbb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -90,6 +90,7 @@ config WIREGUARD
 	select CRYPTO_CHACHA20_NEON if (ARM || ARM64) && KERNEL_MODE_NEON
 	select CRYPTO_POLY1305_NEON if ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_POLY1305_ARM if ARM
+	select CRYPTO_BLAKE2S_ARM if ARM
 	select CRYPTO_CURVE25519_NEON if ARM && KERNEL_MODE_NEON
 	select CRYPTO_CHACHA_MIPS if CPU_MIPS32_R2
 	select CRYPTO_POLY1305_MIPS if CPU_MIPS32 || (CPU_MIPS64 && 64BIT)