[v2,0/2] LoongArch: Implement getrandom() in vDSO

Message ID 20240815133357.35829-1-xry111@xry111.site

Message

Xi Ruoyao Aug. 15, 2024, 1:33 p.m. UTC
For the rationale to implement getrandom() in vDSO see [1].

The vDSO getrandom() needs a stack-less ChaCha20 implementation, so we
need to add architecture-specific code and wire it up with the generic
code.

Without LSX it's not easy to implement ChaCha20 without stack.  So the
current implementation just falls back to a getrandom() syscall if LSX
is unavailable.  The 1st patch expands the existing alternative runtime
patching mechanism to cover the vDSO, so we don't need to invoke cpucfg
for each vDSO getrandom() call.
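
As a rough userspace analogy only (the kernel's alternative mechanism
rewrites instructions in place rather than using a function pointer), the
idea of deciding once whether LSX is available, instead of querying
cpucfg on every call, could be sketched as below.  All names here
(select_getrandom_impl, my_getrandom, and the two stub paths) are
hypothetical, and the stubs only return a tag so the chosen path is
visible:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags so a caller can see which path was taken. */
enum path { PATH_SYSCALL = 0, PATH_LSX = 1 };

typedef int (*getrandom_fn)(void *buf, size_t len);

/* Stand-in for the LSX ChaCha20 fast path (does no real work here). */
static int lsx_path(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return PATH_LSX;
}

/* Stand-in for falling back to the getrandom() syscall. */
static int syscall_path(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return PATH_SYSCALL;
}

/* Default to the fallback until the CPU feature check has run. */
static getrandom_fn resolved = syscall_path;

/* Run once at startup; later calls never repeat the feature check. */
void select_getrandom_impl(bool cpu_has_lsx)
{
	resolved = cpu_has_lsx ? lsx_path : syscall_path;
}

int my_getrandom(void *buf, size_t len)
{
	return resolved(buf, len);
}
```

The point of the sketch is only that the feature test moves out of the
hot path; patching the code itself additionally avoids the indirect call.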

Then in the 2nd patch stack-less ChaCha20 is implemented with LSX.  The
code is basically a direct translation of the x86 SSE2 implementation.
One annoying thing here is that the compiler generates a memset() call for
a "large" struct initialization in a cold path and there seems to be no
way to prevent it.  So a naive memset implementation is copied from the
kernel code into the vDSO.
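
For illustration, a naive byte-at-a-time memset could look like the C
sketch below.  This is not the code from the patch (the diffstat shows an
assembly file, arch/loongarch/vdso/memset.S); the name naive_memset is
hypothetical and chosen to avoid clashing with the libc symbol.  Note
that a compiler may recognize such a loop and turn it back into a
memset() call, which is one reason a freestanding copy is often written
in assembly or built with builtins disabled:

```c
#include <stddef.h>

/*
 * Fill n bytes at dst with the byte value c, one byte at a time.
 * Returns dst, matching the contract of the standard memset().
 */
void *naive_memset(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;

	return dst;
}
```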

The implementation is tested with the kernel selftests added by the last
patch in [1].  I had to make some adjustments to make it work on
LoongArch (see [2]; I've not submitted the changes yet because I'm
unsure about the KHDR_INCLUDES addition).  The vdso_test_getrandom
bench-single result:

       vdso: 25000000 times in 0.631345201 seconds
       libc: 25000000 times in 6.953121083 seconds
    syscall: 25000000 times in 6.992112386 seconds

The vdso_test_getrandom bench-multi result:

       vdso: 25000000 x 256 times in 29.558284986 seconds
       libc: 25000000 x 256 times in 356.633930139 seconds
    syscall: 25000000 x 256 times in 334.885555338 seconds
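
Read as ratios, the figures above put the vDSO path at roughly 11x the
speed of the syscall path in both benchmarks, and at about 25 ns per call
in bench-single.  A small sketch of that arithmetic, with hypothetical
helper names:

```c
/* Ratio of a baseline wall-clock time to a faster one (both in seconds). */
double speedup(double baseline_s, double fast_s)
{
	return baseline_s / fast_s;
}

/* Average cost of one call in nanoseconds. */
double ns_per_call(double total_s, double calls)
{
	return total_s / calls * 1e9;
}
```

For example, speedup(6.992112386, 0.631345201) gives the bench-single
syscall/vdso ratio and ns_per_call(0.631345201, 25000000.0) the average
vDSO call cost.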

[1]:https://lore.kernel.org/all/20240712014009.281406-1-Jason@zx2c4.com/
[2]:https://github.com/xry111/linux/commits/xry111/la-vdso/

v1->v2: Remove the Cc: lists from the cover letter and pass them on the
git send-email command line instead.  I assumed the Cc: lists in the
cover letter would be "propagated" to the patches by git send-email, but
I was wrong, so v1 was never properly delivered to the lists.

Xi Ruoyao (2):
  LoongArch: Perform alternative runtime patching on vDSO
  LoongArch: vDSO: Wire up getrandom() vDSO implementation

 arch/loongarch/Kconfig                      |   1 +
 arch/loongarch/include/asm/vdso/getrandom.h |  47 ++++++
 arch/loongarch/include/asm/vdso/vdso.h      |   8 +
 arch/loongarch/kernel/asm-offsets.c         |  10 ++
 arch/loongarch/kernel/vdso.c                |  14 +-
 arch/loongarch/vdso/Makefile                |   2 +
 arch/loongarch/vdso/memset.S                |  24 +++
 arch/loongarch/vdso/vdso.lds.S              |   7 +
 arch/loongarch/vdso/vgetrandom-alt.S        |  19 +++
 arch/loongarch/vdso/vgetrandom-chacha.S     | 162 ++++++++++++++++++++
 arch/loongarch/vdso/vgetrandom.c            |  16 ++
 11 files changed, 309 insertions(+), 1 deletion(-)
 create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h
 create mode 100644 arch/loongarch/vdso/memset.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-alt.S
 create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S
 create mode 100644 arch/loongarch/vdso/vgetrandom.c

Comments

Jason A. Donenfeld Aug. 15, 2024, 2:04 p.m. UTC | #1
Hi Xi,

Thanks for posting this! That's very nice to see.

I'm currently traveling without my laptop (actually in Yunnan, China!),
so I'll be able to take a look at this for real starting the 26th, as
right now I'm just on my cellphone using lore+mutt.

One thing I wanted to ask, though, is - doesn't LoongArch have 32 8-byte
registers? Shouldn't that be enough to implement ChaCha without spilling
and without using LSX?

Jason
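
For context on the register question: the ChaCha20 working state is
sixteen 32-bit words, updated by quarter rounds that each touch four of
them.  The scalar C sketch below shows that core operation as defined in
RFC 8439.  It is not code from the series, and it takes no position on
whether the whole state plus temporaries fits in the general-purpose
registers that are actually free on LoongArch:

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on words a, b, c, d of the 16-word state. */
void chacha_quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* One double round: four column rounds, then four diagonal rounds. */
void chacha_double_round(uint32_t x[16])
{
	chacha_quarter_round(x, 0, 4, 8, 12);
	chacha_quarter_round(x, 1, 5, 9, 13);
	chacha_quarter_round(x, 2, 6, 10, 14);
	chacha_quarter_round(x, 3, 7, 11, 15);
	chacha_quarter_round(x, 0, 5, 10, 15);
	chacha_quarter_round(x, 1, 6, 11, 12);
	chacha_quarter_round(x, 2, 7, 8, 13);
	chacha_quarter_round(x, 3, 4, 9, 14);
}
```

ChaCha20 applies ten such double rounds and then adds the original state
back in, so a spill-free scalar version has to keep the sixteen working
words live across all of them.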

Xi Ruoyao Aug. 15, 2024, 2:22 p.m. UTC | #2
On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> Hi Xi,
> 
> Thanks for posting this! That's very nice to see.
> 
> I'm currently traveling without my laptop (actually in Yunnan, China!),

Have fun!

> so I'll be able to take a look at this for real starting the 26th, as
> right now I'm just on my cellphone using lore+mutt.
> 
> One thing I wanted to ask, though, is - doesn't LoongArch have 32 8-byte
> registers? Shouldn't that be enough to implement ChaCha without spilling
> and without using LSX?

I'll work on it, but I need to ask a question before starting to code
(it may be stupid because I know little about security):

Does "stack-less" simply mean "don't spill any sensitive data onto the
stack," or more strictly "the stack shouldn't be used at all"?

For example, is it OK to save all the callee-saved registers in the
function prologue onto the stack, and restore them in the epilogue?

Xi Ruoyao Aug. 26, 2024, 6:32 a.m. UTC | #3
On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> Thanks for posting this! That's very nice to see.
> 
> I'm currently traveling without my laptop (actually in Yunnan, China!),
> so I'll be able to take a look at this for real starting the 26th, as
> right now I'm just on my cellphone using lore+mutt.

Hi Jason,

When you start reviewing, I guess you can check out the powerpc
implementation first and add me to the Cc of your reply.  There seems
to be something useful to me in the powerpc implementation (avoiding
memset, adding __arch_get_k_vdso_data so I wouldn't need the inline asm
trick for the _vdso_rng_data symbol, and the selftest support).

Jason A. Donenfeld Aug. 26, 2024, 8:54 a.m. UTC | #4
On Mon, Aug 26, 2024 at 02:32:05PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-15 at 14:04 +0000, Jason A. Donenfeld wrote:
> > Thanks for posting this! That's very nice to see.
> > 
> > I'm currently traveling without my laptop (actually in Yunnan, China!),
> > so I'll be able to take a look at this for real starting the 26th, as
> > right now I'm just on my cellphone using lore+mutt.
> 
> Hi Jason,
> 
> When you start the reviewing I guess you can check out the powerpc
> implementation first and add me into the Cc of your reply.  There seems
> something useful to me in the powerpc implementation (avoiding memset,
> adding __arch_get_k_vdso_data so I wouldn't need the inline asm trick
> for the _vdso_rng_data symbol, and the selftest support).

Indeed, I just committed a bit of those fixups to the random.git tree,
if you want to base your work on that for the time being:

   https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/