From patchwork Fri Aug 16 11:07:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xi Ruoyao X-Patchwork-Id: 819807 Received: from xry111.site (xry111.site [89.208.246.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0056A17BEB5 for ; Fri, 16 Aug 2024 11:07:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=89.208.246.23 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723806471; cv=none; b=pgUXeg56ShssOJ78L+179EN/OchiAOke+XcQgP5+6WCMLbymM+VHKM82iWaDkX5JkzRt4x3ONaBokJQa6e39maHNd5yvM+kqOIk/ayOtuKVsrrPUNs2iS9F9FoKTGyhobLZ4zo5Vo8amOb0avGZZWSQqAn1a4iOluVRPEER7dtE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723806471; c=relaxed/simple; bh=8OC7nzC/nicmDbBE5a9SQEEtMU/SQrKF2XChYp742k8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B2MH562qTUBnHK1n7d59R44c5tJqLdOeGzls2q8YNKuppVsWONOSO0codFJH2IOv3VzxlAp+qGxSWPPlD1WQEMQJVv5trQjrGitwB6SOMQrRaJWedq02+FV9uy+4RZvi6SVfHcRG5p+EicMC5pShzQw5Sm+PCQqYhDQt89AZITY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=xry111.site; spf=pass smtp.mailfrom=xry111.site; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b=gP+l2mdQ; arc=none smtp.client-ip=89.208.246.23 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=xry111.site Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=xry111.site Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b="gP+l2mdQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=xry111.site; s=default; t=1723806467; bh=8OC7nzC/nicmDbBE5a9SQEEtMU/SQrKF2XChYp742k8=; 
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gP+l2mdQjRfhB1n2oqrVWePUFpRSIy/VAa+aRH+3Ab49bghDZ0NSvyefFTTbZALl4 9PC48uN1sYrduYw9q/WDJrZA/BvmQufFdZegKTS/ARaCJASoIZFBTByOr1Bp8Mr1ek vy/jx4qVkWTXvWoe336doPJUlWYcrw1OjPxgAh7s= Received: from stargazer.. (unknown [IPv6:240e:457:1000:1603:4ab7:c07d:7ab1:44b2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (Client did not present a certificate) (Authenticated sender: xry111@xry111.site) by xry111.site (Postfix) with ESMTPSA id 96D0066F27; Fri, 16 Aug 2024 07:07:41 -0400 (EDT) From: Xi Ruoyao To: "Jason A . Donenfeld" , Huacai Chen , WANG Xuerui Cc: Xi Ruoyao , linux-crypto@vger.kernel.org, loongarch@lists.linux.dev, Jinyang He , Tiezhu Yang , Arnd Bergmann Subject: [PATCH v3 1/3] LoongArch: vDSO: Wire up getrandom() vDSO implementation Date: Fri, 16 Aug 2024 19:07:14 +0800 Message-ID: <20240816110717.10249-2-xry111@xry111.site> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240816110717.10249-1-xry111@xry111.site> References: <20240816110717.10249-1-xry111@xry111.site> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Hook up the generic vDSO implementation to the LoongArch vDSO data page: embed struct vdso_rng_data into struct loongarch_vdso_data, and use an assembler hack to resolve the symbol name "_vdso_rng_data" (which is expected by the generic vDSO implementation) to the rng_data field in loongarch_vdso_data. The compiler (GCC 14.2) calls memset() to initialize a "large" struct in a cold path of the generic vDSO getrandom() code. There seems to be no way to prevent it from calling memset(), and since it's a cold path the performance does not matter, so just provide a naive memset() implementation for the vDSO.
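For readers who do not read LoongArch assembly, the naive memset() mentioned above can be modelled in C roughly as follows. This is only an illustrative sketch: the name naive_memset is hypothetical, and the patch itself provides the routine in assembly (arch/loongarch/vdso/memset.S), presumably because a C loop of this shape may itself be compiled back into a memset() call.

```c
#include <stddef.h>

/*
 * Illustrative C model of the byte-wise loop in memset.S.
 * naive_memset is a hypothetical name used only for this sketch.
 */
void *naive_memset(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	/* Store the low byte of c, one byte at a time (the st.b loop). */
	while (n--)
		*p++ = (unsigned char)c;

	/* Return the original pointer, as the assembly restores a0 from a3. */
	return dst;
}
```

A byte-at-a-time loop is slow for large buffers, which is acceptable here only because the commit message states that the caller sits on a cold path.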
Signed-off-by: Xi Ruoyao --- arch/loongarch/Kconfig | 1 + arch/loongarch/include/asm/vdso/getrandom.h | 47 ++++ arch/loongarch/include/asm/vdso/vdso.h | 8 + arch/loongarch/kernel/asm-offsets.c | 10 + arch/loongarch/kernel/vdso.c | 6 + arch/loongarch/vdso/Makefile | 2 + arch/loongarch/vdso/memset.S | 24 ++ arch/loongarch/vdso/vdso.lds.S | 1 + arch/loongarch/vdso/vgetrandom-chacha.S | 239 ++++++++++++++++++++ arch/loongarch/vdso/vgetrandom.c | 19 ++ 10 files changed, 357 insertions(+) create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h create mode 100644 arch/loongarch/vdso/memset.S create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S create mode 100644 arch/loongarch/vdso/vgetrandom.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 70f169210b52..14821c2aba5b 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -190,6 +190,7 @@ config LOONGARCH select TRACE_IRQFLAGS_SUPPORT select USE_PERCPU_NUMA_NODE_ID select USER_STACKTRACE_SUPPORT + select VDSO_GETRANDOM select ZONE_DMA32 config 32BIT diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h new file mode 100644 index 000000000000..a369588a4ebf --- /dev/null +++ b/arch/loongarch/include/asm/vdso/getrandom.h @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Xi Ruoyao . All Rights Reserved. 
+ */ +#ifndef __ASM_VDSO_GETRANDOM_H +#define __ASM_VDSO_GETRANDOM_H + +#ifndef __ASSEMBLY__ + +#include +#include + +static __always_inline ssize_t getrandom_syscall(void *_buffer, + size_t _len, + unsigned int _flags) +{ + register long ret asm("a0"); + register long int nr asm("a7") = __NR_getrandom; + register void *buffer asm("a0") = _buffer; + register size_t len asm("a1") = _len; + register unsigned int flags asm("a2") = _flags; + + asm volatile( + " syscall 0\n" + : "+r" (ret) + : "r" (nr), "r" (buffer), "r" (len), "r" (flags) + : "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8", + "memory"); + + return ret; +} + +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data( + void) +{ + return (const struct vdso_rng_data *)( + get_vdso_data() + + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + + offsetof(struct loongarch_vdso_data, rng_data)); +} + +extern void __arch_chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key, + u32 *counter, size_t nblocks); + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETRANDOM_H */ diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h index 5a12309d9fb5..a2e24c3007e2 100644 --- a/arch/loongarch/include/asm/vdso/vdso.h +++ b/arch/loongarch/include/asm/vdso/vdso.h @@ -4,6 +4,9 @@ * Copyright (C) 2020-2022 Loongson Technology Corporation Limited */ +#ifndef _ASM_VDSO_VDSO_H +#define _ASM_VDSO_VDSO_H + #ifndef __ASSEMBLY__ #include @@ -16,6 +19,9 @@ struct vdso_pcpu_data { struct loongarch_vdso_data { struct vdso_pcpu_data pdata[NR_CPUS]; +#ifdef CONFIG_VDSO_GETRANDOM + struct vdso_rng_data rng_data; +#endif }; /* @@ -63,3 +69,5 @@ static inline unsigned long get_vdso_data(void) } #endif /* __ASSEMBLY__ */ + +#endif diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c index bee9f7a3108f..86f6d8a6dc23 100644 --- a/arch/loongarch/kernel/asm-offsets.c +++ b/arch/loongarch/kernel/asm-offsets.c @@ -14,6 +14,7 @@ #include #include 
#include +#include static void __used output_ptreg_defines(void) { @@ -321,3 +322,12 @@ static void __used output_kvm_defines(void) OFFSET(KVM_GPGD, kvm, arch.pgd); BLANK(); } + +#ifdef CONFIG_VDSO_GETRANDOM +static void __used output_vdso_rng_defines(void) +{ + COMMENT("LoongArch VDSO getrandom offsets."); + OFFSET(VDSO_RNG_DATA, loongarch_vdso_data, rng_data); + BLANK(); +} +#endif diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c index 90dfccb41c14..15b65d8e2fdc 100644 --- a/arch/loongarch/kernel/vdso.c +++ b/arch/loongarch/kernel/vdso.c @@ -22,6 +22,7 @@ #include #include #include +#include #include extern char vdso_start[], vdso_end[]; @@ -34,6 +35,11 @@ static union { struct loongarch_vdso_data vdata; } loongarch_vdso_data __page_aligned_data; +#ifdef CONFIG_VDSO_GETRANDOM +asm(".globl _vdso_rng_data\n" + ".set _vdso_rng_data, loongarch_vdso_data + " __stringify(VDSO_RNG_DATA)); +#endif + static struct page *vdso_pages[] = { NULL }; struct vdso_data *vdso_data = generic_vdso_data.data; struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata; diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile index 2ddf0480e710..c8c5d9a7c80c 100644 --- a/arch/loongarch/vdso/Makefile +++ b/arch/loongarch/vdso/Makefile @@ -6,6 +6,8 @@ include $(srctree)/lib/vdso/Makefile obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o +obj-vdso-$(CONFIG_VDSO_GETRANDOM) += vgetrandom.o vgetrandom-chacha.o memset.o + # Common compiler flags between ABIs. ccflags-vdso := \ $(filter -I%,$(KBUILD_CFLAGS)) \ diff --git a/arch/loongarch/vdso/memset.S b/arch/loongarch/vdso/memset.S new file mode 100644 index 000000000000..ec1531683936 --- /dev/null +++ b/arch/loongarch/vdso/memset.S @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A copy of __memset_generic from arch/loongarch/lib/memset.S for vDSO. 
+ * + * Copyright (C) 2020-2024 Loongson Technology Corporation Limited + */ + +#include +#include + +SYM_FUNC_START(memset) + move a3, a0 + beqz a2, 2f + +1: st.b a1, a0, 0 + addi.d a0, a0, 1 + addi.d a2, a2, -1 + bgt a2, zero, 1b + +2: move a0, a3 + jr ra +SYM_FUNC_END(memset) + +.hidden memset diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S index 56ad855896de..2c965a597d9e 100644 --- a/arch/loongarch/vdso/vdso.lds.S +++ b/arch/loongarch/vdso/vdso.lds.S @@ -63,6 +63,7 @@ VERSION __vdso_clock_gettime; __vdso_gettimeofday; __vdso_rt_sigreturn; + __vdso_getrandom; local: *; }; } diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S new file mode 100644 index 000000000000..2e42198f2faf --- /dev/null +++ b/arch/loongarch/vdso/vgetrandom-chacha.S @@ -0,0 +1,239 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024 Xi Ruoyao . All Rights Reserved. + */ + +#include +#include +#include + +.text + +/* Salsa20 quarter-round */ +.macro QR a b c d + add.w \a, \a, \b + xor \d, \d, \a + rotri.w \d, \d, 16 + + add.w \c, \c, \d + xor \b, \b, \c + rotri.w \b, \b, 20 + + add.w \a, \a, \b + xor \d, \d, \a + rotri.w \d, \d, 24 + + add.w \c, \c, \d + xor \b, \b, \c + rotri.w \b, \b, 25 +.endm + +/* + * Very basic LoongArch implementation of ChaCha20. Produces a given positive + * number of blocks of output with a nonce of 0, taking an input key and + * 8-byte counter. Importantly does not spill to the stack. 
Its arguments + * are: + * + * a0: output bytes + * a1: 32-byte key input + * a2: 8-byte counter input/output + * a3: number of 64-byte blocks to write to output + */ +SYM_FUNC_START(__arch_chacha20_blocks_nostack) + +/* We don't need a frame pointer */ +#define s9 fp + +#define output a0 +#define key a1 +#define counter a2 +#define nblocks a3 +#define i a4 +#define state0 s0 +#define state1 s1 +#define state2 s2 +#define state3 s3 +#define state4 s4 +#define state5 s5 +#define state6 s6 +#define state7 s7 +#define state8 s8 +#define state9 s9 +#define state10 a5 +#define state11 a6 +#define state12 a7 +#define state13 t0 +#define state14 t1 +#define state15 t2 +#define cnt_lo t3 +#define cnt_hi t4 +#define copy0 t5 +#define copy1 t6 +#define copy2 t7 + +/* Reuse i as copy3 */ +#define copy3 i + + /* + * The ABI requires s0-s9 saved, and sp aligned to 16-byte. + * This does not violate the stack-less requirement: no sensitive data + * is spilled onto the stack. + */ + PTR_ADDI sp, sp, (-SZREG * 10) & STACK_ALIGN + REG_S s0, sp, 0 + REG_S s1, sp, SZREG + REG_S s2, sp, SZREG * 2 + REG_S s3, sp, SZREG * 3 + REG_S s4, sp, SZREG * 4 + REG_S s5, sp, SZREG * 5 + REG_S s6, sp, SZREG * 6 + REG_S s7, sp, SZREG * 7 + REG_S s8, sp, SZREG * 8 + REG_S s9, sp, SZREG * 9 + + li.w copy0, 0x61707865 + li.w copy1, 0x3320646e + li.w copy2, 0x79622d32 + + ld.w cnt_lo, counter, 0 + ld.w cnt_hi, counter, 4 + +.Lblock: + /* state[0,1,2,3] = "expand 32-byte k" */ + move state0, copy0 + move state1, copy1 + move state2, copy2 + li.w state3, 0x6b206574 + + /* state[4,5,..,11] = key */ + ld.w state4, key, 0 + ld.w state5, key, 4 + ld.w state6, key, 8 + ld.w state7, key, 12 + ld.w state8, key, 16 + ld.w state9, key, 20 + ld.w state10, key, 24 + ld.w state11, key, 28 + + /* state[12,13] = counter */ + move state12, cnt_lo + move state13, cnt_hi + + /* state[14,15] = 0 */ + move state14, zero + move state15, zero + + li.w i, 10 +.Lpermute: + /* odd round */ + QR state0, state4, state8, state12 + 
QR state1, state5, state9, state13 + QR state2, state6, state10, state14 + QR state3, state7, state11, state15 + + /* even round */ + QR state0, state5, state10, state15 + QR state1, state6, state11, state12 + QR state2, state7, state8, state13 + QR state3, state4, state9, state14 + + addi.w i, i, -1 + bnez i, .Lpermute + + /* copy[3] = "te k" */ + li.w copy3, 0x6b206574 + + /* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */ + add.w state0, state0, copy0 + add.w state1, state1, copy1 + add.w state2, state2, copy2 + add.w state3, state3, copy3 + st.w state0, output, 0 + st.w state1, output, 4 + st.w state2, output, 8 + st.w state3, output, 12 + + /* from now on state[0,1,2,3] are scratch registers */ + + /* state[0,1,2,3] = key[0,1,2,3] */ + ld.w state0, key, 0 + ld.w state1, key, 4 + ld.w state2, key, 8 + ld.w state3, key, 12 + + /* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */ + add.w state4, state4, state0 + add.w state5, state5, state1 + add.w state6, state6, state2 + add.w state7, state7, state3 + st.w state4, output, 16 + st.w state5, output, 20 + st.w state6, output, 24 + st.w state7, output, 28 + + /* state[0,1,2,3] = key[4,5,6,7] */ + ld.w state0, key, 16 + ld.w state1, key, 20 + ld.w state2, key, 24 + ld.w state3, key, 28 + + /* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */ + add.w state8, state8, state0 + add.w state9, state9, state1 + add.w state10, state10, state2 + add.w state11, state11, state3 + st.w state8, output, 32 + st.w state9, output, 36 + st.w state10, output, 40 + st.w state11, output, 44 + + /* output[12,13,14,15] = state[12,13,14,15] + [cnt_lo, cnt_hi, 0, 0] */ + add.w state12, state12, cnt_lo + add.w state13, state13, cnt_hi + st.w state12, output, 48 + st.w state13, output, 52 + st.w state14, output, 56 + st.w state15, output, 60 + + /* ++counter */ + addi.w cnt_lo, cnt_lo, 1 + sltui state0, cnt_lo, 1 + add.w cnt_hi, cnt_hi, state0 + + /* output += 64 */ + PTR_ADDI output, output, 64 + /* --nblocks */ + PTR_ADDI nblocks, 
nblocks, -1 + bnez nblocks, .Lblock + + /* counter = [cnt_lo, cnt_hi] */ + st.w cnt_lo, counter, 0 + st.w cnt_hi, counter, 4 + + /* + * Zero out the potentially sensitive regs, in case nothing uses these + * again. As of now copy[0,1,2,3] just contain "expand 32-byte k" and + * state[0,...,9] are s0-s9, which we'll restore in the epilogue, so we + * only need to zero state[10,...,15]. + */ + move state10, zero + move state11, zero + move state12, zero + move state13, zero + move state14, zero + move state15, zero + + REG_L s0, sp, 0 + REG_L s1, sp, SZREG + REG_L s2, sp, SZREG * 2 + REG_L s3, sp, SZREG * 3 + REG_L s4, sp, SZREG * 4 + REG_L s5, sp, SZREG * 5 + REG_L s6, sp, SZREG * 6 + REG_L s7, sp, SZREG * 7 + REG_L s8, sp, SZREG * 8 + REG_L s9, sp, SZREG * 9 + PTR_ADDI sp, sp, -((-SZREG * 10) & STACK_ALIGN) + + jr ra +SYM_FUNC_END(__arch_chacha20_blocks_nostack) diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c new file mode 100644 index 000000000000..0b3b30ecd68a --- /dev/null +++ b/arch/loongarch/vdso/vgetrandom.c @@ -0,0 +1,19 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Xi Ruoyao . All Rights Reserved. 
+ */ +#include + +#include "../../../../lib/vdso/getrandom.c" + +typeof(__cvdso_getrandom) __vdso_getrandom; + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, + void *opaque_state, size_t opaque_len) +{ + return __cvdso_getrandom(buffer, len, flags, opaque_state, + opaque_len); +} + +typeof(__cvdso_getrandom) getrandom + __attribute__((weak, alias("__vdso_getrandom"))); From patchwork Fri Aug 16 11:07:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xi Ruoyao X-Patchwork-Id: 820187 Received: from xry111.site (xry111.site [89.208.246.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4CE7E198E7E for ; Fri, 16 Aug 2024 11:08:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=89.208.246.23 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723806506; cv=none; b=dq8YMYxXo7/mOsUpE08B/a1IJchkuPh6gqXILJJoF2Ldui6+xHsGfxTpUD56U/PTHg2OK+QzLGKa3jOXi4jpnglQi15uaQlM8WXYrW6n1rKhCHfGUq0yDf6J9md5DaPIwiE7Q+llzb+X/wSGaRv3aiPzHsdyK05rm0U8yslrNUc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723806506; c=relaxed/simple; bh=huVSt3hvB1YK7wxUMvdb+dL43EiEEO3UtFynjBvkLQM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bmNf5Am8MknOZ1T3uZLaSAAuXcNuHQsVjkaka9JOwoDvHVXi8+exfyORODtH5Rwbi64tLW7BOogLQDAdKl1KPozkYkAdK3fqWzF06PYcBYX7DdljKcN26h+BWwu+aWHY7sXPLR5LSx5Lw66p2ntF8yRkL8BGilQ8YqlREKq1aLM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=xry111.site; spf=pass smtp.mailfrom=xry111.site; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b=Rf9WnYLF; arc=none smtp.client-ip=89.208.246.23 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) 
header.from=xry111.site Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=xry111.site Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b="Rf9WnYLF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=xry111.site; s=default; t=1723806504; bh=huVSt3hvB1YK7wxUMvdb+dL43EiEEO3UtFynjBvkLQM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Rf9WnYLF7dhpfGSPL3fanhdK5ixjzjhn5pozRh3h+jFRfQWPgJU2uMYuQzewavq9Y NN+tT1K4JWKfa0mv7mp6++XFAck9L0eq03kYVZnrON/N4NFKpB0bWUMTr2Aqyzkq/0 wVcFANnVlp0jnDlMXALdaopmHFVbVUFQEdrx/tT0= Received: from stargazer.. (unknown [IPv6:240e:457:1000:1603:4ab7:c07d:7ab1:44b2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (Client did not present a certificate) (Authenticated sender: xry111@xry111.site) by xry111.site (Postfix) with ESMTPSA id 11D4266F26; Fri, 16 Aug 2024 07:08:18 -0400 (EDT) From: Xi Ruoyao To: "Jason A . Donenfeld" , Huacai Chen , WANG Xuerui Cc: Xi Ruoyao , linux-crypto@vger.kernel.org, loongarch@lists.linux.dev, Jinyang He , Tiezhu Yang , Arnd Bergmann Subject: [PATCH v3 2/3] LoongArch: Perform alternative runtime patching on vDSO Date: Fri, 16 Aug 2024 19:07:15 +0800 Message-ID: <20240816110717.10249-3-xry111@xry111.site> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240816110717.10249-1-xry111@xry111.site> References: <20240816110717.10249-1-xry111@xry111.site> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To implement getrandom() in vDSO, we need to implement stack-less ChaCha20. ChaCha20 is designed to be SIMD-friendly, but LSX is not guaranteed to be available on all LoongArch CPU models. Perform alternative runtime patching on vDSO so we'll be able to use LSX in vDSO. 
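As a rough mental model of what "alternative runtime patching" means here, the sketch below walks a table of patch sites and overwrites an instruction word only when the CPU reports the required feature. Everything in it is a simplified assumption made for illustration: struct toy_alt and toy_apply_alternatives() are hypothetical names, and the kernel's real struct alt_instr and apply_alternatives() differ in layout and detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one alternatives table entry. */
struct toy_alt {
	size_t   offset;      /* byte offset of the instruction to patch */
	uint32_t replacement; /* instruction word to write instead */
	unsigned feature;     /* feature bit that enables the replacement */
};

/*
 * Walk the table [start, end) and patch the image in place for every
 * entry whose feature bit is set in cpu_features.
 */
void toy_apply_alternatives(uint8_t *image,
			    const struct toy_alt *start,
			    const struct toy_alt *end,
			    unsigned long cpu_features)
{
	for (const struct toy_alt *a = start; a < end; a++) {
		if (!(cpu_features & (1UL << a->feature)))
			continue; /* feature missing: keep the default code */
		memcpy(image + a->offset, &a->replacement,
		       sizeof(a->replacement));
	}
}
```

In this series the table comes from the vDSO's own .altinstructions section, and patch 3 uses the mechanism to replace the first instruction of the scalar routine with a branch to the LSX routine.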
Signed-off-by: Xi Ruoyao --- arch/loongarch/kernel/vdso.c | 8 +++++++- arch/loongarch/vdso/vdso.lds.S | 6 ++++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c index 15b65d8e2fdc..d500436f252b 100644 --- a/arch/loongarch/kernel/vdso.c +++ b/arch/loongarch/kernel/vdso.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -105,7 +106,7 @@ struct loongarch_vdso_info vdso_info = { static int __init init_vdso(void) { - unsigned long i, cpu, pfn; + unsigned long i, cpu, pfn, vdso; BUG_ON(!PAGE_ALIGNED(vdso_info.vdso)); BUG_ON(!PAGE_ALIGNED(vdso_info.size)); @@ -117,6 +118,11 @@ static int __init init_vdso(void) for (i = 0; i < vdso_info.size / PAGE_SIZE; i++) vdso_info.code_mapping.pages[i] = pfn_to_page(pfn + i); + vdso = (unsigned long)vdso_info.vdso; + + apply_alternatives((struct alt_instr *)(vdso + vdso_offset_alt), + (struct alt_instr *)(vdso + vdso_offset_alt_end)); + return 0; } subsys_initcall(init_vdso); diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S index 2c965a597d9e..ac63dc080bc9 100644 --- a/arch/loongarch/vdso/vdso.lds.S +++ b/arch/loongarch/vdso/vdso.lds.S @@ -35,6 +35,12 @@ SECTIONS .rodata : { *(.rodata*) } :text + .altinstructions : ALIGN(4) { + VDSO_alt = .; + *(.altinstructions) + VDSO_alt_end = .; + } :text + _end = .; PROVIDE(end = .); From patchwork Fri Aug 16 11:07:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xi Ruoyao X-Patchwork-Id: 819806 Received: from xry111.site (xry111.site [89.208.246.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 071A31B3730 for ; Fri, 16 Aug 2024 11:08:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=89.208.246.23 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1723806516; cv=none; b=S6B2t2stHCMVXnZU44gW8m2uBpVSzso6hGYq37e3qDeuVJZIppPW4NCkkUb+y5gmyB3T1Tfp+8cJd5RuXBopuobxezk1DKiGsSDDKiUYFy9sx8aldXpsjckP8s7VyweIRxjOC5sZj5RPitmUflXWMooKh2hOcK2JhhN1EARXguI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723806516; c=relaxed/simple; bh=N6IxPQvc73cA34ZLFBJ+Wb3X71a6U0xPaAu/Ebm+NH8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DorlV8zC83SANbVqvqsG/RbIPsDfZGMiYJhcO90eQXTKyxvnwENA6hn+USqqi8v07JB7Mxj2XaWtc44TdoFa3+cdQS30vZLAmWPZ0Je2k74oQNgpMwStGQWTC414Ao4IEK/MhfDjnKyxwBrIxhUEP041lEj6S4KG1ZDnkB/KyO8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=xry111.site; spf=pass smtp.mailfrom=xry111.site; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b=U+c+SwVl; arc=none smtp.client-ip=89.208.246.23 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=xry111.site Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=xry111.site Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=xry111.site header.i=@xry111.site header.b="U+c+SwVl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=xry111.site; s=default; t=1723806514; bh=N6IxPQvc73cA34ZLFBJ+Wb3X71a6U0xPaAu/Ebm+NH8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=U+c+SwVl+0r96mp/pcLyUErptnO0LAhYULTsdRU82Y+dS1tsw/w6jyqzjeJzgv0ND /xkLT3+ecLzu5bcpp7ROPiyM/Yf41gMzn8ztorullIdBzcqMS+GoHLKJ+KH8oFl633 TaZCjPFpROQmPuCJEbqKK2CdGOC2Pq2BkrfE5F8c= Received: from stargazer.. 
(unknown [IPv6:240e:457:1000:1603:4ab7:c07d:7ab1:44b2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (Client did not present a certificate) (Authenticated sender: xry111@xry111.site) by xry111.site (Postfix) with ESMTPSA id 8BEDD66F27; Fri, 16 Aug 2024 07:08:28 -0400 (EDT) From: Xi Ruoyao To: "Jason A . Donenfeld" , Huacai Chen , WANG Xuerui Cc: Xi Ruoyao , linux-crypto@vger.kernel.org, loongarch@lists.linux.dev, Jinyang He , Tiezhu Yang , Arnd Bergmann Subject: [PATCH v3 3/3] LoongArch: vDSO: Add LSX implementation of vDSO getrandom() Date: Fri, 16 Aug 2024 19:07:16 +0800 Message-ID: <20240816110717.10249-4-xry111@xry111.site> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240816110717.10249-1-xry111@xry111.site> References: <20240816110717.10249-1-xry111@xry111.site> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 It's 7% faster in vdso_test_getrandom bench-single test and 21% faster in vdso_test_getrandom bench-multi test than the generic LoongArch implementation. Signed-off-by: Xi Ruoyao --- arch/loongarch/vdso/Makefile | 4 + arch/loongarch/vdso/vgetrandom-chacha-lsx.S | 162 ++++++++++++++++++++ arch/loongarch/vdso/vgetrandom-chacha.S | 13 ++ 3 files changed, 179 insertions(+) create mode 100644 arch/loongarch/vdso/vgetrandom-chacha-lsx.S diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile index c8c5d9a7c80c..cab92c3a70a4 100644 --- a/arch/loongarch/vdso/Makefile +++ b/arch/loongarch/vdso/Makefile @@ -8,6 +8,10 @@ obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o obj-vdso-$(CONFIG_VDSO_GETRANDOM) += vgetrandom.o vgetrandom-chacha.o memset.o +ifdef CONFIG_CPU_HAS_LSX +obj-vdso-$(CONFIG_VDSO_GETRANDOM) += vgetrandom-chacha-lsx.o +endif + # Common compiler flags between ABIs. 
ccflags-vdso := \ $(filter -I%,$(KBUILD_CFLAGS)) \ diff --git a/arch/loongarch/vdso/vgetrandom-chacha-lsx.S b/arch/loongarch/vdso/vgetrandom-chacha-lsx.S new file mode 100644 index 000000000000..6d8c886d78c8 --- /dev/null +++ b/arch/loongarch/vdso/vgetrandom-chacha-lsx.S @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024 Xi Ruoyao . All Rights Reserved. + * + * Based on arch/x86/entry/vdso/vgetrandom-chacha.S: + * + * Copyright (C) 2022-2024 Jason A. Donenfeld . All Rights + * Reserved. + */ + +#include +#include +#include + +.section .rodata +.align 4 +CONSTANTS: .octa 0x6b20657479622d323320646e61707865 + +.text + +/* + * Loongson SIMD eXtension implementation of ChaCha20. Produces a given + * positive number of blocks of output with a nonce of 0, taking an input + * key and 8-byte counter. Importantly does not spill to the stack. Its + * arguments are: + * + * a0: output bytes + * a1: 32-byte key input + * a2: 8-byte counter input/output + * a3: number of 64-byte blocks to write to output + */ +SYM_FUNC_START(__arch_chacha20_blocks_nostack_lsx) +#define output a0 +#define key a1 +#define counter a2 +#define nblocks a3 +#define i t0 +/* LSX registers vr0-vr23 are caller-save. 
*/ +#define state0 $vr0 +#define state1 $vr1 +#define state2 $vr2 +#define state3 $vr3 +#define copy0 $vr4 +#define copy1 $vr5 +#define copy2 $vr6 +#define copy3 $vr7 +#define one $vr8 + + /* copy0 = "expand 32-byte k" */ + la.pcrel t1, CONSTANTS + vld copy0, t1, 0 + /* copy1, copy2 = key */ + vld copy1, key, 0 + vld copy2, key, 0x10 + /* copy3 = counter || zero nonce */ + vldrepl.d copy3, counter, 0 + vinsgr2vr.d copy3, zero, 1 + /* one = 1 || 0 */ + vldi one, 0b0110000000001 + vinsgr2vr.d one, zero, 1 + +.Lblock: + /* state = copy */ + vori.b state0, copy0, 0 + vori.b state1, copy1, 0 + vori.b state2, copy2, 0 + vori.b state3, copy3, 0 + + li.w i, 10 +.Lpermute: + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + vadd.w state0, state0, state1 + vxor.v state3, state3, state0 + vrotri.w state3, state3, 16 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + vadd.w state2, state2, state3 + vxor.v state1, state1, state2 + vrotri.w state1, state1, 20 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + vadd.w state0, state0, state1 + vxor.v state3, state3, state0 + vrotri.w state3, state3, 24 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + vadd.w state2, state2, state3 + vxor.v state1, state1, state2 + vrotri.w state1, state1, 25 + + /* state1[0,1,2,3] = state1[1,2,3,0] */ + vshuf4i.w state1, state1, 0b00111001 + /* state2[0,1,2,3] = state2[2,3,0,1] */ + vshuf4i.w state2, state2, 0b01001110 + /* state3[0,1,2,3] = state3[3,0,1,2] */ + vshuf4i.w state3, state3, 0b10010011 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ + vadd.w state0, state0, state1 + vxor.v state3, state3, state0 + vrotri.w state3, state3, 16 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ + vadd.w state2, state2, state3 + vxor.v state1, state1, state2 + vrotri.w state1, state1, 20 + + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ + vadd.w state0, state0, state1 + vxor.v state3, state3, state0 + 
vrotri.w state3, state3, 24 + + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ + vadd.w state2, state2, state3 + vxor.v state1, state1, state2 + vrotri.w state1, state1, 25 + + /* state1[0,1,2,3] = state1[3,0,1,2] */ + vshuf4i.w state1, state1, 0b10010011 + /* state2[0,1,2,3] = state2[2,3,0,1] */ + vshuf4i.w state2, state2, 0b01001110 + /* state3[0,1,2,3] = state3[1,2,3,0] */ + vshuf4i.w state3, state3, 0b00111001 + + addi.w i, i, -1 + bnez i, .Lpermute + + /* output0 = state0 + copy0 */ + vadd.w state0, state0, copy0 + vst state0, output, 0 + /* output1 = state1 + copy1 */ + vadd.w state1, state1, copy1 + vst state1, output, 0x10 + /* output2 = state2 + copy2 */ + vadd.w state2, state2, copy2 + vst state2, output, 0x20 + /* output3 = state3 + copy3 */ + vadd.w state3, state3, copy3 + vst state3, output, 0x30 + + /* ++copy3.counter */ + vadd.d copy3, copy3, one + + /* output += 64 */ + PTR_ADDI output, output, 64 + /* --nblocks */ + PTR_ADDI nblocks, nblocks, -1 + bnez nblocks, .Lblock + + /* counter = copy3.counter */ + vstelm.d copy3, counter, 0, 0 + + /* Zero out the potentially sensitive regs, in case nothing uses these again. */ + vldi state0, 0 + vldi state1, 0 + vldi state2, 0 + vldi state3, 0 + vldi copy1, 0 + vldi copy2, 0 + + jr ra +SYM_FUNC_END(__arch_chacha20_blocks_nostack_lsx) diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S index 2e42198f2faf..1931119e12a6 100644 --- a/arch/loongarch/vdso/vgetrandom-chacha.S +++ b/arch/loongarch/vdso/vgetrandom-chacha.S @@ -7,6 +7,11 @@ #include #include +#ifdef CONFIG_CPU_HAS_LSX +# include +# include +#endif + .text /* Salsa20 quarter-round */ @@ -78,8 +83,16 @@ SYM_FUNC_START(__arch_chacha20_blocks_nostack) * The ABI requires s0-s9 saved, and sp aligned to 16-byte. * This does not violate the stack-less requirement: no sensitive data * is spilled onto the stack. 
+ * + * Rewrite the very first instruction to jump to the LSX implementation + * if LSX is available. */ +#ifdef CONFIG_CPU_HAS_LSX + ALTERNATIVE __stringify(PTR_ADDI sp, sp, (-SZREG * 10) & STACK_ALIGN), \ + "b __arch_chacha20_blocks_nostack_lsx", CPU_FEATURE_LSX +#else PTR_ADDI sp, sp, (-SZREG * 10) & STACK_ALIGN +#endif REG_S s0, sp, 0 REG_S s1, sp, SZREG REG_S s2, sp, SZREG * 2
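Because the scalar and LSX routines in this series must produce identical output, a portable C model can be handy for cross-checking either one. The sketch below is such a model under stated assumptions: the function name chacha20_blocks_model and its helpers are hypothetical, the key is taken as 32 bytes read in little-endian order (which matches the native word loads of the assembly on little-endian LoongArch), and the nonce is fixed to zero with a 64-bit block counter, as the comments in the assembly describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Same add/xor/rotate sequence as the QR macro (rotri by 32 - n). */
#define QR(a, b, c, d) do {				\
	a += b; d ^= a; d = rotl32(d, 16);		\
	c += d; b ^= c; b = rotl32(b, 12);		\
	a += b; d ^= a; d = rotl32(d, 8);		\
	c += d; b ^= c; b = rotl32(b, 7);		\
} while (0)

/* Hypothetical reference model; counter[0] is the low word. */
void chacha20_blocks_model(uint8_t *dst, const uint8_t key[32],
			   uint32_t counter[2], size_t nblocks)
{
	/* "expand 32-byte k" as four little-endian words. */
	static const uint32_t sigma[4] = {
		0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
	};

	assert(nblocks > 0); /* the assembly also expects a positive count */

	while (nblocks--) {
		uint32_t init[16], x[16];
		int i;

		for (i = 0; i < 4; i++)
			init[i] = sigma[i];
		for (i = 0; i < 8; i++)
			init[4 + i] = load_le32(key + 4 * i);
		init[12] = counter[0];
		init[13] = counter[1];
		init[14] = 0; /* zero nonce */
		init[15] = 0;

		for (i = 0; i < 16; i++)
			x[i] = init[i];

		/* 10 double rounds: column round, then diagonal round. */
		for (i = 0; i < 10; i++) {
			QR(x[0], x[4], x[8], x[12]);
			QR(x[1], x[5], x[9], x[13]);
			QR(x[2], x[6], x[10], x[14]);
			QR(x[3], x[7], x[11], x[15]);
			QR(x[0], x[5], x[10], x[15]);
			QR(x[1], x[6], x[11], x[12]);
			QR(x[2], x[7], x[8], x[13]);
			QR(x[3], x[4], x[9], x[14]);
		}

		for (i = 0; i < 16; i++)
			store_le32(dst + 4 * i, x[i] + init[i]);

		/* 64-bit counter increment with carry into the high word. */
		if (++counter[0] == 0)
			++counter[1];
		dst += 64;
	}
}
```

With an all-zero key and a zero counter, the first block should start with the well-known ChaCha20 keystream bytes 76 b8 e0 ad a0 f1 3d 90, which gives a quick sanity check before comparing against the vDSO routines.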