From patchwork Tue Nov 10 19:04:42 2020
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 323060
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel, Ondrej Mosnacek,
 Eric Biggers
Subject: [PATCH v2 2/4] crypto: aegis128/neon - optimize tail block handling
Date: Tue, 10 Nov 2020 20:04:42 +0100
Message-Id: <20201110190444.10634-3-ardb@kernel.org>
In-Reply-To: <20201110190444.10634-1-ardb@kernel.org>
References: <20201110190444.10634-1-ardb@kernel.org>

Avoid copying the tail block via a stack buffer if the total size
exceeds a single AEGIS block. In this case, we can use overlapping loads
and stores and NEON permutation instructions instead, which leads to a
modest performance improvement on some cores (< 5%), and is slightly
cleaner.

Note that we still need to use a stack buffer if the entire input is
smaller than 16 bytes, given that we cannot use 16-byte NEON loads and
stores safely in this case.

Signed-off-by: Ard Biesheuvel
---
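A note for reviewers, kept below the fold since it is not part of the
commit message: the tail handling added here leans entirely on the
tbl/tbx byte-permute instructions and the sliding permute[] table. The
sketch below is illustrative only (standalone AArch64 C, not kernel
code; load_tail() and its arguments are not names used by this patch).
It shows the load half of the trick, i.e. how one overlapping 16-byte
load plus one table lookup yields the zero-padded tail block:

#include <arm_neon.h>	/* AArch64 provides vqtbl1q_u8() natively */
#include <stdint.h>

/* same three-row layout as the permute[] table added by this patch */
static const uint8_t permute[48] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	   0,    1,    2,    3,    4,    5,    6,    7,
	   8,    9,   10,   11,   12,   13,   14,   15,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};

/*
 * Load the final 'size' bytes (0 < size < 16) of a message that is at
 * least 16 bytes long, zero padded to a full block, without reading
 * past either end of the message. 'end' points one past the last byte.
 */
static uint8x16_t load_tail(const uint8_t *end, unsigned int size)
{
	/* overlapping load: the low 16 - size lanes hold earlier bytes */
	uint8x16_t last16 = vld1q_u8(end - 16);

	/* 0xff indices select zero; the rest shift the tail to lane 0 */
	return vqtbl1q_u8(last16, vld1q_u8(permute + 32 - size));
}

The store side uses the same table at offset 'size' to shift the
processed tail back up before an overlapping store, which clobbers the
last 16 - size bytes of the previous output block; that is why the
non-short path in the hunks below re-stores msg at
out - AEGIS_BLOCK_SIZE afterwards.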
 crypto/aegis128-neon-inner.c | 89 +++++++++++++++++---
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c
index 2a660ac1bc3a..cd1b3ad1d1f3 100644
--- a/crypto/aegis128-neon-inner.c
+++ b/crypto/aegis128-neon-inner.c
@@ -20,7 +20,6 @@ extern int aegis128_have_aes_insn;
 
 void *memcpy(void *dest, const void *src, size_t n);
-void *memset(void *s, int c, size_t n);
 
 struct aegis128_state {
 	uint8x16_t v[5];
 };
@@ -173,10 +172,46 @@ void crypto_aegis128_update_neon(void *state, const void *msg)
 	aegis128_save_state_neon(st, state);
 }
 
+#ifdef CONFIG_ARM
+/*
+ * AArch32 does not provide these intrinsics natively because it does not
+ * implement the underlying instructions. AArch32 only provides 64-bit wide
+ * vtbl.8/vtbx.8 instructions, so use those instead.
+ */
+static uint8x16_t vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
+{
+	union {
+		uint8x16_t	val;
+		uint8x8x2_t	pair;
+	} __a = { a };
+
+	return vcombine_u8(vtbl2_u8(__a.pair, vget_low_u8(b)),
+			   vtbl2_u8(__a.pair, vget_high_u8(b)));
+}
+
+static uint8x16_t vqtbx1q_u8(uint8x16_t v, uint8x16_t a, uint8x16_t b)
+{
+	union {
+		uint8x16_t	val;
+		uint8x8x2_t	pair;
+	} __a = { a };
+
+	return vcombine_u8(vtbx2_u8(vget_low_u8(v), __a.pair, vget_low_u8(b)),
+			   vtbx2_u8(vget_high_u8(v), __a.pair, vget_high_u8(b)));
+}
+#endif
+
+static const uint8_t permute[] __aligned(64) = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
 					unsigned int size)
 {
 	struct aegis128_state st = aegis128_load_state_neon(state);
+	const int short_input = size < AEGIS_BLOCK_SIZE;
 	uint8x16_t msg;
 
 	preload_sbox();
@@ -186,7 +221,8 @@ void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
 
 		msg = vld1q_u8(src);
 		st = aegis128_update_neon(st, msg);
-		vst1q_u8(dst, msg ^ s);
+		msg ^= s;
+		vst1q_u8(dst, msg);
 
 		size -= AEGIS_BLOCK_SIZE;
 		src += AEGIS_BLOCK_SIZE;
@@ -195,13 +231,26 @@ void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
 	if (size > 0) {
 		uint8x16_t s = st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
-		uint8_t buf[AEGIS_BLOCK_SIZE] = {};
+		uint8_t buf[AEGIS_BLOCK_SIZE];
+		const void *in = src;
+		void *out = dst;
+		uint8x16_t m;
 
-		memcpy(buf, src, size);
-		msg = vld1q_u8(buf);
-		st = aegis128_update_neon(st, msg);
-		vst1q_u8(buf, msg ^ s);
-		memcpy(dst, buf, size);
+		if (__builtin_expect(short_input, 0))
+			in = out = memcpy(buf + AEGIS_BLOCK_SIZE - size, src, size);
+
+		m = vqtbl1q_u8(vld1q_u8(in + size - AEGIS_BLOCK_SIZE),
+			       vld1q_u8(permute + 32 - size));
+
+		st = aegis128_update_neon(st, m);
+
+		vst1q_u8(out + size - AEGIS_BLOCK_SIZE,
+			 vqtbl1q_u8(m ^ s, vld1q_u8(permute + size)));
+
+		if (__builtin_expect(short_input, 0))
+			memcpy(dst, out, size);
+		else
+			vst1q_u8(out - AEGIS_BLOCK_SIZE, msg);
 	}
 
 	aegis128_save_state_neon(st, state);
@@ -211,6 +260,7 @@ void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
 					unsigned int size)
 {
 	struct aegis128_state st = aegis128_load_state_neon(state);
+	const int short_input = size < AEGIS_BLOCK_SIZE;
 	uint8x16_t msg;
 
 	preload_sbox();
@@ -228,14 +278,25 @@ void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
 	if (size > 0) {
 		uint8x16_t s = st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
 		uint8_t buf[AEGIS_BLOCK_SIZE];
+		const void *in = src;
+		void *out = dst;
+		uint8x16_t m;
 
-		vst1q_u8(buf, s);
-		memcpy(buf, src, size);
-		msg = vld1q_u8(buf) ^ s;
-		vst1q_u8(buf, msg);
-		memcpy(dst, buf, size);
+		if (__builtin_expect(short_input, 0))
+			in = out = memcpy(buf + AEGIS_BLOCK_SIZE - size, src, size);
 
-		st = aegis128_update_neon(st, msg);
+		m = s ^ vqtbx1q_u8(s, vld1q_u8(in + size - AEGIS_BLOCK_SIZE),
+				   vld1q_u8(permute + 32 - size));
+
+		st = aegis128_update_neon(st, m);
+
+		vst1q_u8(out + size - AEGIS_BLOCK_SIZE,
+			 vqtbl1q_u8(m, vld1q_u8(permute + size)));
+
+		if (__builtin_expect(short_input, 0))
+			memcpy(dst, out, size);
+		else
+			vst1q_u8(out - AEGIS_BLOCK_SIZE, msg);
 	}
 
 	aegis128_save_state_neon(st, state);

From patchwork Tue Nov 10 19:04:44 2020
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 323059
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel, Ondrej Mosnacek,
 Eric Biggers
Subject: [PATCH v2 4/4] crypto: aegis128 - expose SIMD code path as separate
 driver
Date: Tue, 10 Nov 2020 20:04:44 +0100
Message-Id: <20201110190444.10634-5-ardb@kernel.org>
In-Reply-To: <20201110190444.10634-1-ardb@kernel.org>
References: <20201110190444.10634-1-ardb@kernel.org>

Wiring the SIMD code into the generic driver has the unfortunate side
effect that the tcrypt testing code cannot distinguish the two code
paths, and will therefore not use the generic implementation to fuzz
test the SIMD one, as it does for other algorithms.

So let's refactor the code a bit so we can register two implementations:
aegis128-generic and aegis128-simd.

Signed-off-by: Ard Biesheuvel
---
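One more note below the fold: with two algs sharing the cra_name
"aegis128", a plain lookup by algorithm name resolves to the
highest-priority registration, while the testing code can still reach
each implementation through its cra_driver_name. A minimal sketch of a
hypothetical in-kernel caller (illustrative only; aegis128_lookup_demo()
is not part of this patch):

#include <crypto/aead.h>
#include <linux/err.h>

static int aegis128_lookup_demo(void)
{
	struct crypto_aead *tfm;

	/* cra_priority 200 beats 100, so this resolves to aegis128-simd
	 * whenever the SIMD driver is registered */
	tfm = crypto_alloc_aead("aegis128", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);
	crypto_free_aead(tfm);

	/* the generic code path stays reachable by its driver name */
	tfm = crypto_alloc_aead("aegis128-generic", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);
	crypto_free_aead(tfm);

	return 0;
}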
 crypto/aegis128-core.c | 176 +++++++++++++-------
 1 file changed, 119 insertions(+), 57 deletions(-)

diff --git a/crypto/aegis128-core.c b/crypto/aegis128-core.c
index 859c7b905618..19f38e8c1627 100644
--- a/crypto/aegis128-core.c
+++ b/crypto/aegis128-core.c
@@ -396,7 +396,7 @@ static int crypto_aegis128_setauthsize(struct crypto_aead *tfm,
 	return 0;
 }
 
-static int crypto_aegis128_encrypt(struct aead_request *req)
+static int crypto_aegis128_encrypt_generic(struct aead_request *req)
 {
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	union aegis_block tag = {};
@@ -407,27 +407,18 @@ static int crypto_aegis128_encrypt(struct aead_request *req)
 	struct aegis_state state;
 
 	skcipher_walk_aead_encrypt(&walk, req, false);
-	if (aegis128_do_simd()) {
-		crypto_aegis128_init_simd(&state, &ctx->key, req->iv);
-		crypto_aegis128_process_ad(&state, req->src, req->assoclen);
-		crypto_aegis128_process_crypt(&state, &walk,
-					      crypto_aegis128_encrypt_chunk_simd);
-		crypto_aegis128_final_simd(&state, &tag, req->assoclen,
-					   cryptlen, 0);
-	} else {
-		crypto_aegis128_init(&state, &ctx->key, req->iv);
-		crypto_aegis128_process_ad(&state, req->src, req->assoclen);
-		crypto_aegis128_process_crypt(&state, &walk,
-					      crypto_aegis128_encrypt_chunk);
-		crypto_aegis128_final(&state, &tag, req->assoclen, cryptlen);
-	}
+	crypto_aegis128_init(&state, &ctx->key, req->iv);
+	crypto_aegis128_process_ad(&state, req->src, req->assoclen);
+	crypto_aegis128_process_crypt(&state, &walk,
+				      crypto_aegis128_encrypt_chunk);
+	crypto_aegis128_final(&state, &tag, req->assoclen, cryptlen);
 
 	scatterwalk_map_and_copy(tag.bytes, req->dst, req->assoclen + cryptlen,
 				 authsize, 1);
 	return 0;
 }
 
-static int crypto_aegis128_decrypt(struct aead_request *req)
+static int crypto_aegis128_decrypt_generic(struct aead_request *req)
 {
 	static const u8 zeros[AEGIS128_MAX_AUTH_SIZE] = {};
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
@@ -442,27 +433,11 @@ static int crypto_aegis128_decrypt(struct aead_request *req)
 			     authsize, 0);
 
 	skcipher_walk_aead_decrypt(&walk, req, false);
-	if (aegis128_do_simd()) {
-		crypto_aegis128_init_simd(&state, &ctx->key, req->iv);
-		crypto_aegis128_process_ad(&state, req->src, req->assoclen);
-		crypto_aegis128_process_crypt(&state, &walk,
-					      crypto_aegis128_decrypt_chunk_simd);
-		if (unlikely(crypto_aegis128_final_simd(&state, &tag,
-							req->assoclen,
-							cryptlen, authsize))) {
-			skcipher_walk_aead_decrypt(&walk, req, false);
-			crypto_aegis128_process_crypt(NULL, &walk,
-						      crypto_aegis128_wipe_chunk);
-			return -EBADMSG;
-		}
-		return 0;
-	} else {
-		crypto_aegis128_init(&state, &ctx->key, req->iv);
-		crypto_aegis128_process_ad(&state, req->src, req->assoclen);
-		crypto_aegis128_process_crypt(&state, &walk,
-					      crypto_aegis128_decrypt_chunk);
-		crypto_aegis128_final(&state, &tag, req->assoclen, cryptlen);
-	}
+	crypto_aegis128_init(&state, &ctx->key, req->iv);
+	crypto_aegis128_process_ad(&state, req->src, req->assoclen);
+	crypto_aegis128_process_crypt(&state, &walk,
+				      crypto_aegis128_decrypt_chunk);
+	crypto_aegis128_final(&state, &tag, req->assoclen, cryptlen);
 
 	if (unlikely(crypto_memneq(tag.bytes, zeros, authsize))) {
 		/*
@@ -482,42 +457,128 @@ static int crypto_aegis128_decrypt(struct aead_request *req)
 	return 0;
 }
 
-static struct aead_alg crypto_aegis128_alg = {
-	.setkey = crypto_aegis128_setkey,
-	.setauthsize = crypto_aegis128_setauthsize,
-	.encrypt = crypto_aegis128_encrypt,
-	.decrypt = crypto_aegis128_decrypt,
+static int crypto_aegis128_encrypt_simd(struct aead_request *req)
+{
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	union aegis_block tag = {};
+	unsigned int authsize = crypto_aead_authsize(tfm);
+	struct aegis_ctx *ctx = crypto_aead_ctx(tfm);
+	unsigned int cryptlen = req->cryptlen;
+	struct skcipher_walk walk;
+	struct aegis_state state;
 
-	.ivsize = AEGIS128_NONCE_SIZE,
-	.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
-	.chunksize = AEGIS_BLOCK_SIZE,
+	if (!aegis128_do_simd())
+		return crypto_aegis128_encrypt_generic(req);
 
-	.base = {
-		.cra_blocksize = 1,
-		.cra_ctxsize = sizeof(struct aegis_ctx),
-		.cra_alignmask = 0,
+	skcipher_walk_aead_encrypt(&walk, req, false);
+	crypto_aegis128_init_simd(&state, &ctx->key, req->iv);
+	crypto_aegis128_process_ad(&state, req->src, req->assoclen);
+	crypto_aegis128_process_crypt(&state, &walk,
+				      crypto_aegis128_encrypt_chunk_simd);
+	crypto_aegis128_final_simd(&state, &tag, req->assoclen, cryptlen, 0);
 
-		.cra_priority = 100,
+	scatterwalk_map_and_copy(tag.bytes, req->dst, req->assoclen + cryptlen,
+				 authsize, 1);
+	return 0;
+}
 
-		.cra_name = "aegis128",
-		.cra_driver_name = "aegis128-generic",
+static int crypto_aegis128_decrypt_simd(struct aead_request *req)
+{
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	union aegis_block tag;
+	unsigned int authsize = crypto_aead_authsize(tfm);
+	unsigned int cryptlen = req->cryptlen - authsize;
+	struct aegis_ctx *ctx = crypto_aead_ctx(tfm);
+	struct skcipher_walk walk;
+	struct aegis_state state;
+
+	if (!aegis128_do_simd())
+		return crypto_aegis128_decrypt_generic(req);
+
+	scatterwalk_map_and_copy(tag.bytes, req->src, req->assoclen + cryptlen,
+				 authsize, 0);
+
+	skcipher_walk_aead_decrypt(&walk, req, false);
+	crypto_aegis128_init_simd(&state, &ctx->key, req->iv);
+	crypto_aegis128_process_ad(&state, req->src, req->assoclen);
+	crypto_aegis128_process_crypt(&state, &walk,
+				      crypto_aegis128_decrypt_chunk_simd);
 
-		.cra_module = THIS_MODULE,
+	if (unlikely(crypto_aegis128_final_simd(&state, &tag, req->assoclen,
+						cryptlen, authsize))) {
+		skcipher_walk_aead_decrypt(&walk, req, false);
+		crypto_aegis128_process_crypt(NULL, &walk,
+					      crypto_aegis128_wipe_chunk);
+		return -EBADMSG;
 	}
+	return 0;
+}
+
+static struct aead_alg crypto_aegis128_alg_generic = {
+	.setkey			= crypto_aegis128_setkey,
+	.setauthsize		= crypto_aegis128_setauthsize,
+	.encrypt		= crypto_aegis128_encrypt_generic,
+	.decrypt		= crypto_aegis128_decrypt_generic,
+
+	.ivsize			= AEGIS128_NONCE_SIZE,
+	.maxauthsize		= AEGIS128_MAX_AUTH_SIZE,
+	.chunksize		= AEGIS_BLOCK_SIZE,
+
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct aegis_ctx),
+	.base.cra_alignmask	= 0,
+	.base.cra_priority	= 100,
+	.base.cra_name		= "aegis128",
+	.base.cra_driver_name	= "aegis128-generic",
+	.base.cra_module	= THIS_MODULE,
+};
+
+static struct aead_alg crypto_aegis128_alg_simd = {
+	.setkey			= crypto_aegis128_setkey,
+	.setauthsize		= crypto_aegis128_setauthsize,
+	.encrypt		= crypto_aegis128_encrypt_simd,
+	.decrypt		= crypto_aegis128_decrypt_simd,
+
+	.ivsize			= AEGIS128_NONCE_SIZE,
+	.maxauthsize		= AEGIS128_MAX_AUTH_SIZE,
+	.chunksize		= AEGIS_BLOCK_SIZE,
+
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct aegis_ctx),
+	.base.cra_alignmask	= 0,
+	.base.cra_priority	= 200,
+	.base.cra_name		= "aegis128",
+	.base.cra_driver_name	= "aegis128-simd",
+	.base.cra_module	= THIS_MODULE,
 };
 
 static int __init crypto_aegis128_module_init(void)
 {
+	int ret;
+
+	ret = crypto_register_aead(&crypto_aegis128_alg_generic);
+	if (ret)
+		return ret;
+
 	if (IS_ENABLED(CONFIG_CRYPTO_AEGIS128_SIMD) &&
-	    crypto_aegis128_have_simd())
+	    crypto_aegis128_have_simd()) {
+		ret = crypto_register_aead(&crypto_aegis128_alg_simd);
+		if (ret) {
+			crypto_unregister_aead(&crypto_aegis128_alg_generic);
+			return ret;
+		}
 		static_branch_enable(&have_simd);
-
-	return crypto_register_aead(&crypto_aegis128_alg);
+	}
+	return 0;
 }
 
 static void __exit crypto_aegis128_module_exit(void)
 {
-	crypto_unregister_aead(&crypto_aegis128_alg);
+	if (IS_ENABLED(CONFIG_CRYPTO_AEGIS128_SIMD) &&
+	    crypto_aegis128_have_simd())
+		crypto_unregister_aead(&crypto_aegis128_alg_simd);
+
+	crypto_unregister_aead(&crypto_aegis128_alg_generic);
 }
 
 subsys_initcall(crypto_aegis128_module_init);
@@ -528,3 +589,4 @@ MODULE_AUTHOR("Ondrej Mosnacek");
 MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm");
 MODULE_ALIAS_CRYPTO("aegis128");
 MODULE_ALIAS_CRYPTO("aegis128-generic");
+MODULE_ALIAS_CRYPTO("aegis128-simd");
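As a quick sanity check after applying this (an expectation, not output
captured from a real system): on hardware with the relevant SIMD
extensions, /proc/crypto should now list two entries under the same
algorithm name, roughly

  name     : aegis128
  driver   : aegis128-simd
  priority : 200

  name     : aegis128
  driver   : aegis128-generic
  priority : 100

which is exactly the distinction the tcrypt testing code needs in order
to cross-check the SIMD code path against the generic one.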