From patchwork Fri Jun 28 17:07:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 168112 Delivered-To: patch@linaro.org Received: by 2002:a92:4782:0:0:0:0:0 with SMTP id e2csp3943335ilk; Fri, 28 Jun 2019 10:08:16 -0700 (PDT) X-Google-Smtp-Source: APXvYqzyjuEFU+Ss2t5qsCYTHlqv/fjfuZjGSEc+SgEKSPTcRr322db4P1avmXvXpy6m9ZV/oIS7 X-Received: by 2002:a63:6b46:: with SMTP id g67mr10287511pgc.45.1561741696760; Fri, 28 Jun 2019 10:08:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561741696; cv=none; d=google.com; s=arc-20160816; b=pChCYiP7p1V69e59a6PjwBJjlRpijMjqog0LUr55VoDTPaq0PPOu9SqjVNuA0ogidc zWnk/2DqcDK8Lnufl0wjSRkkWluc/AwA3GCQAy0Hms6r5510ArWehUm4OlUfi/0Hn5x+ XL5e6hHPLi7nqhzymSJewiROOMlQ17dzWun3lfZdIHi7xRjtd3g7rEz7XNkHs0DdWrSu +FZtlc6Nj43AAATDydC3AkYe4SbajKNglxlbYRxK8p1XlW8qA5NaY7WNtKNwe2enx9tN vdiq29aIXWdMq6WFg6dWRDb6RaK/Kzum+0xYZnfUN9Jnh7Uq2FOczrxceaaVgQpp51zL Ay4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Pl6rgvK8O4r0l+FZOe/I16ciKJFVn0p0xgITZq5Cfpc=; b=0f+6cfLGJAY//7mFOe4nuW44QZeOHUpW+lpZO57s7ZExNzar9FbRmukJ2rmmSb1b+N 0SIcoG3g/ClN2Y1hy73/BoeUKuymt+0bpc2Qt8UWVF/ZzC69fjep+OJUufvjTfUCq4so 6ni5sPrvunNZTe0Ai6eIiqq8r1rUUKz2L0DbsMK3o534NiS2yFxmlQiWq66bzuS1AlWB moVEJ5c+GBWYdT/a3CvfTwT6w/hMdBofYgyuw6c1LJyv9r1wUcr6cGg0WUcJLm3OYta6 Q09mGFkL4z5YX//gNnZUgHanIf/h6Jt1dMmPps2vIPm+7oYmELVbqx3EJ9DWQc9vdJ7F HzKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JCK771zE; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i3si2338787pgq.440.2019.06.28.10.08.16; Fri, 28 Jun 2019 10:08:16 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JCK771zE; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726408AbfF1RIQ (ORCPT + 3 others); Fri, 28 Jun 2019 13:08:16 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:40232 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726631AbfF1RIP (ORCPT ); Fri, 28 Jun 2019 13:08:15 -0400 Received: by mail-wr1-f65.google.com with SMTP id p11so7043007wre.7 for ; Fri, 28 Jun 2019 10:08:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Pl6rgvK8O4r0l+FZOe/I16ciKJFVn0p0xgITZq5Cfpc=; b=JCK771zETGF3OokXVwahC28/u9TgS5P+8SUtUDkjj3m2SHRrtxJL/4PvppDelb2o2I hnXYtygyD3uSwPaPoDlW6aWJgNOzw7w92JAgPMrvCOCiCEa3zanmop15Ydvugz3roref 8hAXcMk+pyMM1lK8pap2N+TBDesy0Bi6CihDs32PdQ6Cd3GCD4/Y85ZqOp7LbwaZ51nf QF5VHWeB18A4QGpMY/dAX0ghtJQ/XFBY+mVjabr5GObXm2se3VtxxFlLPC+1SxZSZcWL pn0IZILTaOTmS84D5IPoaMc6TB7sl1PrR31BZsdowJAxKHP3c47zApJIDYvSOea2QJd5 oI2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Pl6rgvK8O4r0l+FZOe/I16ciKJFVn0p0xgITZq5Cfpc=; b=lDzdrTJXcdvqXbr8dfXdVjwNadgirwIOXSvTqoBH2haX3n+CMD18/7ikzBaQB7F+Wy 1DArRwdWkYv4XEYD1GukLraneVWgKIYlNT8wIi7SCt/7Uv9TAvBWKdX2Ub7dn2H1Cin5 AANd3VMMI40PxkTpUkZcYUiV3+8OJKU5NmJHaqFcSIJo5LK7HAxzW4G3dV8vZCJ7rsBc oPeA7+7eAKQtG3ZfqBY1PUJb9PpB429RS7FG0PPcXRDQa9vg8oc4dXWCg2eGVzQCmOYH WpAhL+Yn2NssQBgPXAq8Mrd/2BSsjBgR6AfgvT3b52KSIvk65tKe53d8eAPmi5yifZLs RBUw== X-Gm-Message-State: APjAAAWdXSgydH3eMTuU0RZLsgZ4c+o4rJcqBOh9a1lrnLHSopKrSXHK xpNtgbkdJm+T2yicE3gQT3ib53RLjm8/yA== X-Received: by 2002:a05:6000:11c2:: with SMTP id i2mr5172431wrx.199.1561741692928; Fri, 28 Jun 2019 10:08:12 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id c15sm3833251wrd.88.2019.06.28.10.08.11 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Fri, 28 Jun 2019 10:08:12 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Eric Biggers , Ondrej Mosnacek , Herbert Xu , Steve Capper , Milan Broz Subject: [PATCH v2 6/7] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Date: Fri, 28 Jun 2019 19:07:45 +0200 Message-Id: <20190628170746.28768-7-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190628170746.28768-1-ard.biesheuvel@linaro.org> References: <20190628170746.28768-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Provide an accelerated implementation of aegis128 by wiring up the SIMD hooks in the generic driver to an implementation based on NEON intrinsics, which can be compiled to both ARM and arm64 code. This results in a performance of 2.2 cycles per byte on Cortex-A53, which is a performance increase of ~11x compared to the generic code. Signed-off-by: Ard Biesheuvel --- crypto/Kconfig | 5 + crypto/Makefile | 11 ++ crypto/aegis128-neon-inner.c | 132 ++++++++++++++++++++ crypto/aegis128-neon.c | 43 +++++++ 4 files changed, 191 insertions(+) -- 2.20.1 diff --git a/crypto/Kconfig b/crypto/Kconfig index 4b948dac21db..1af99189d2b6 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -306,6 +306,11 @@ config CRYPTO_AEGIS128 help Support for the AEGIS-128 dedicated AEAD algorithm. +config CRYPTO_AEGIS128_SIMD + bool "Support SIMD acceleration for AEGIS-128" + depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON) + default y + config CRYPTO_AEGIS128L tristate "AEGIS-128L AEAD algorithm" select CRYPTO_AEAD diff --git a/crypto/Makefile b/crypto/Makefile index 362a36f0bd2f..b3e16b4fb414 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -91,6 +91,17 @@ obj-$(CONFIG_CRYPTO_CCM) += ccm.o obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o aegis128-y := aegis128-core.o + +ifeq ($(ARCH),arm) +CFLAGS_aegis128-neon-inner.o += -ffreestanding -march=armv7-a -mfloat-abi=softfp -mfpu=crypto-neon-fp-armv8 +aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o +endif +ifeq ($(ARCH),arm64) +CFLAGS_aegis128-neon-inner.o += -ffreestanding -mcpu=generic+crypto +CFLAGS_REMOVE_aegis128-neon-inner.o += -mgeneral-regs-only +aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o +endif + obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o obj-$(CONFIG_CRYPTO_DES) += des_generic.o diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c new file mode 100644 index 000000000000..0479970a2f25 --- /dev/null +++ b/crypto/aegis128-neon-inner.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2019 Linaro, Ltd. + */ + +#ifdef CONFIG_ARM64 +#include +#else +#include +#endif + +#define AEGIS_BLOCK_SIZE 16 + +#include + +void *memcpy(void *dest, const void *src, size_t n); +void *memset(void *s, int c, size_t n); + +struct aegis128_state { + uint8x16_t v[5]; +}; + +static struct aegis128_state aegis128_load_state_neon(const void *state) +{ + return (struct aegis128_state){ { + vld1q_u8(state), + vld1q_u8(state + 16), + vld1q_u8(state + 32), + vld1q_u8(state + 48), + vld1q_u8(state + 64) + } }; +} + +static void aegis128_save_state_neon(struct aegis128_state st, void *state) +{ + vst1q_u8(state, st.v[0]); + vst1q_u8(state + 16, st.v[1]); + vst1q_u8(state + 32, st.v[2]); + vst1q_u8(state + 48, st.v[3]); + vst1q_u8(state + 64, st.v[4]); +} + +static struct aegis128_state aegis128_update_neon(struct aegis128_state st, + uint8x16_t m) +{ + uint8x16_t z = {}; + uint8x16_t t; + + t = vaesmcq_u8(vaeseq_u8(st.v[3], z)); + st.v[3] ^= vaesmcq_u8(vaeseq_u8(st.v[2], z)); + st.v[2] ^= vaesmcq_u8(vaeseq_u8(st.v[1], z)); + st.v[1] ^= vaesmcq_u8(vaeseq_u8(st.v[0], z)); + st.v[0] ^= vaesmcq_u8(vaeseq_u8(st.v[4], z)) ^ m; + st.v[4] ^= t; + + return st; +} + +void crypto_aegis128_update_neon(void *state, const void *msg) +{ + struct aegis128_state st = aegis128_load_state_neon(state); + + st = aegis128_update_neon(st, vld1q_u8(msg)); + + aegis128_save_state_neon(st, state); +} + +void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src, + unsigned int size) +{ + struct aegis128_state st = aegis128_load_state_neon(state); + uint8x16_t tmp; + + while (size >= AEGIS_BLOCK_SIZE) { + uint8x16_t s = vld1q_u8(src); + + tmp = s ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4]; + st = aegis128_update_neon(st, s); + vst1q_u8(dst, tmp); + + size -= AEGIS_BLOCK_SIZE; + src += AEGIS_BLOCK_SIZE; + dst += AEGIS_BLOCK_SIZE; + } + + if (size > 0) { + uint8_t buf[AEGIS_BLOCK_SIZE] = {}; + uint8x16_t msg; + + memcpy(buf, src, size); + msg = vld1q_u8(buf); + tmp = msg ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4]; + st = aegis128_update_neon(st, msg); + vst1q_u8(buf, tmp); + memcpy(dst, buf, size); + } + + aegis128_save_state_neon(st, state); +} + +void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src, + unsigned int size) +{ + struct aegis128_state st = aegis128_load_state_neon(state); + uint8x16_t tmp; + + while (size >= AEGIS_BLOCK_SIZE) { + tmp = vld1q_u8(src) ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4]; + st = aegis128_update_neon(st, tmp); + vst1q_u8(dst, tmp); + + size -= AEGIS_BLOCK_SIZE; + src += AEGIS_BLOCK_SIZE; + dst += AEGIS_BLOCK_SIZE; + } + + if (size > 0) { + uint8_t buf[AEGIS_BLOCK_SIZE] = {}; + uint8x16_t msg; + + memcpy(buf, src, size); + msg = vld1q_u8(buf) ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4]; + vst1q_u8(buf, msg); + memcpy(dst, buf, size); + + memset(buf + size, 0, AEGIS_BLOCK_SIZE - size); + msg = vld1q_u8(buf); + st = aegis128_update_neon(st, msg); + } + + aegis128_save_state_neon(st, state); +} diff --git a/crypto/aegis128-neon.c b/crypto/aegis128-neon.c new file mode 100644 index 000000000000..c1c0a1686f67 --- /dev/null +++ b/crypto/aegis128-neon.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2019 Linaro Ltd + */ + +#include +#include + +#include "aegis.h" + +void crypto_aegis128_update_neon(void *state, const void *msg); +void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src, + unsigned int size); +void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src, + unsigned int size); + +bool crypto_aegis128_have_simd(void) +{ + return cpu_have_feature(cpu_feature(AES)); +} + +void crypto_aegis128_update_simd(union aegis_block *state, const void *msg) +{ + kernel_neon_begin(); + crypto_aegis128_update_neon(state, msg); + kernel_neon_end(); +} + +void crypto_aegis128_encrypt_chunk_simd(union aegis_block *state, u8 *dst, + const u8 *src, unsigned int size) +{ + kernel_neon_begin(); + crypto_aegis128_encrypt_chunk_neon(state, dst, src, size); + kernel_neon_end(); +} + +void crypto_aegis128_decrypt_chunk_simd(union aegis_block *state, u8 *dst, + const u8 *src, unsigned int size) +{ + kernel_neon_begin(); + crypto_aegis128_decrypt_chunk_neon(state, dst, src, size); + kernel_neon_end(); +}