[v4,0/5] Wire up getrandom() vDSO implementation on powerpc

Message ID cover.1725278148.git.christophe.leroy@csgroup.eu

Message

Christophe Leroy Sept. 2, 2024, 12:04 p.m. UTC
This series wires up the getrandom() vDSO implementation on powerpc.

Tested on PPC32 on real hardware.
Tested on PPC64 (both BE and LE) on QEMU.

Performance on powerpc 885:
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 62.938002291 seconds
	   libc: 25000000 times in 535.581916866 seconds
	syscall: 25000000 times in 531.525042806 seconds

Performance on powerpc 8321:
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 16.899318858 seconds
	   libc: 25000000 times in 131.050596522 seconds
	syscall: 25000000 times in 129.794790389 seconds

Performance on QEMU pseries:
	~ # ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 4.977777162 seconds
	   libc: 25000000 times in 75.516749981 seconds
	syscall: 25000000 times in 86.842242014 seconds
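
For a quick comparison, the figures above can be turned into per-call cost and speedup. This is only a sketch using the timings quoted in this mail; the names CALLS, RESULTS, per_call_us and speedup are made up for illustration and are not part of the selftest.

```python
# Timings copied from the bench-single output above (25,000,000 calls each).
CALLS = 25_000_000

RESULTS = {
    "powerpc 885": {"vdso": 62.938002291, "libc": 535.581916866, "syscall": 531.525042806},
    "powerpc 8321": {"vdso": 16.899318858, "libc": 131.050596522, "syscall": 129.794790389},
    "QEMU pseries": {"vdso": 4.977777162, "libc": 75.516749981, "syscall": 86.842242014},
}


def per_call_us(seconds, calls=CALLS):
    """Average cost of one call, in microseconds."""
    return seconds / calls * 1e6


def speedup(platform, baseline="syscall"):
    """How many times faster the vDSO path is than the given baseline."""
    timings = RESULTS[platform]
    return timings[baseline] / timings["vdso"]


if __name__ == "__main__":
    for platform, timings in RESULTS.items():
        print(f"{platform}: vdso {per_call_us(timings['vdso']):.3f} us/call, "
              f"{speedup(platform):.1f}x faster than syscall")
```

With these numbers the vDSO path comes out roughly 8 times faster than the syscall on the two hardware boards and roughly 17 times faster on QEMU pseries.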

Changes in v4:
- Rebased on recent random git tree (963233ff0133) (The new tree includes selftests fixes)
- Read/write counter in native byte order
- No longer use compat macros to write output
- Fixed selftests build failure with patch 4 (without patch 5) on little endian on PPC64
- Implement a __kernel_getrandom() stub returning ENOSYS on ppc64 in patch 4 (without patch 5) to make selftests happy.

Changes in v3:
- Rebased on recent random git tree (0c7e00e22c21)
- Fixed build failures reported by robots around VM_DROPPABLE
- Fixed crash on PPC64 due to clobbered r13 by not using r13 anymore (saving it was not enough for signals).
- Split final patch in two, first for PPC32, second for PPC64
- Moved selftest fixes out of this series

Changes in v2:
- Define VM_DROPPABLE for powerpc/32
- Fixed generic vDSO getrandom headers to enable CONFIG_COMPAT build
- Fixed size of generation counter
- Fixed selftests to work on non x86 architectures

Christophe Leroy (5):
  mm: Define VM_DROPPABLE for powerpc/32
  powerpc/vdso32: Add crtsavres
  powerpc/vdso: Refactor CFLAGS for CVDSO build
  powerpc/vdso: Wire up getrandom() vDSO implementation on PPC32
  powerpc/vdso: Wire up getrandom() vDSO implementation on PPC64

 arch/powerpc/Kconfig                         |   1 +
 arch/powerpc/include/asm/mman.h              |   2 +-
 arch/powerpc/include/asm/vdso/getrandom.h    |  54 ++++
 arch/powerpc/include/asm/vdso/vsyscall.h     |   6 +
 arch/powerpc/include/asm/vdso_datapage.h     |   2 +
 arch/powerpc/kernel/asm-offsets.c            |   1 +
 arch/powerpc/kernel/vdso/Makefile            |  57 ++--
 arch/powerpc/kernel/vdso/getrandom.S         |  58 ++++
 arch/powerpc/kernel/vdso/gettimeofday.S      |  13 -
 arch/powerpc/kernel/vdso/vdso32.lds.S        |   1 +
 arch/powerpc/kernel/vdso/vdso64.lds.S        |   1 +
 arch/powerpc/kernel/vdso/vgetrandom-chacha.S | 320 +++++++++++++++++++
 arch/powerpc/kernel/vdso/vgetrandom.c        |  14 +
 fs/proc/task_mmu.c                           |   4 +-
 include/linux/mm.h                           |   4 +-
 include/trace/events/mmflags.h               |   4 +-
 tools/testing/selftests/vDSO/Makefile        |   2 +-
 17 files changed, 501 insertions(+), 43 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/getrandom.h
 create mode 100644 arch/powerpc/kernel/vdso/getrandom.S
 create mode 100644 arch/powerpc/kernel/vdso/vgetrandom-chacha.S
 create mode 100644 arch/powerpc/kernel/vdso/vgetrandom.c

Comments

Christophe Leroy Sept. 2, 2024, 1:12 p.m. UTC | #1
On 02/09/2024 at 14:41, Jason A. Donenfeld wrote:
> On Mon, Sep 02, 2024 at 02:04:42PM +0200, Christophe Leroy wrote:
>>   SYM_FUNC_START(__arch_chacha20_blocks_nostack)
>>   #ifdef __powerpc64__
>> -	blr
>> +	std	r5, -216(r1)
>> +
>> +	std	r14, -144(r1)
>> +	std	r15, -136(r1)
>> +	std	r16, -128(r1)
>> +	std	r17, -120(r1)
>> +	std	r18, -112(r1)
>> +	std	r19, -104(r1)
>> +	std	r20, -96(r1)
>> +	std	r21, -88(r1)
>> +	std	r22, -80(r1)
>> +	std	r23, -72(r1)
>> +	std	r24, -64(r1)
>> +	std	r25, -56(r1)
>> +	std	r26, -48(r1)
>> +	std	r27, -40(r1)
>> +	std	r28, -32(r1)
>> +	std	r29, -24(r1)
>> +	std	r30, -16(r1)
>> +	std	r31, -8(r1)
>>   #else
>>   	stwu	r1, -96(r1)
>>   	stw	r5, 20(r1)
>> +#ifdef __BIG_ENDIAN__
>>   	stmw	r14, 24(r1)
>> +#else
>> +	stw	r14, 24(r1)
>> +	stw	r15, 28(r1)
>> +	stw	r16, 32(r1)
>> +	stw	r17, 36(r1)
>> +	stw	r18, 40(r1)
>> +	stw	r19, 44(r1)
>> +	stw	r20, 48(r1)
>> +	stw	r21, 52(r1)
>> +	stw	r22, 56(r1)
>> +	stw	r23, 60(r1)
>> +	stw	r24, 64(r1)
>> +	stw	r25, 68(r1)
>> +	stw	r26, 72(r1)
>> +	stw	r27, 76(r1)
>> +	stw	r28, 80(r1)
>> +	stw	r29, 84(r1)
>> +	stw	r30, 88(r1)
>> +	stw	r31, 92(r1)
>> +#endif
>> +#endif
> 
> This confuses me. Why are you adding code to the !__powerpc64__ branch
> in this commit? (Also, why does stmw not work on LE?)

That's for VDSO32, i.e. running 32-bit binaries on a 64-bit kernel.

The "Programming Environments Manual for 32-Bit Implementations of the
PowerPC™ Architecture" says: "In some implementations operating with
little-endian byte order, execution of an lmw or stmw instruction
causes the system alignment error handler to be invoked."

And GCC doesn't like it either:

tools/arch/powerpc/vdso/vgetrandom-chacha.S:84: Error: `stmw' invalid when little-endian
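
For anyone checking the open-coded stores against the single stmw: stmw rS, D(rA) stores rS through r31 to consecutive words starting at D(rA), so the little-endian replacement has to hit the same slots. A small sketch that enumerates them from the offsets in the quoted diff (the helper save_slots is hypothetical, written only for this check):

```python
def save_slots(first_reg, base_offset, word_size, last_reg=31):
    """Return {register_name: stack_offset} for registers saved contiguously."""
    return {
        f"r{reg}": base_offset + (reg - first_reg) * word_size
        for reg in range(first_reg, last_reg + 1)
    }


# 32-bit path: after stwu r1, -96(r1), r14..r31 are saved from 24(r1), 4 bytes each.
ppc32 = save_slots(14, 24, 4)

# 64-bit path: r14..r31 are saved below r1 starting at -144(r1), 8 bytes each.
ppc64 = save_slots(14, -144, 8)

if __name__ == "__main__":
    for name, offset in ppc32.items():
        print(f"stw\t{name}, {offset}(r1)")
    for name, offset in ppc64.items():
        print(f"std\t{name}, {offset}(r1)")
```

The 32-bit slots end at 92 + 4 = 96, which matches the 96-byte frame allocated by stwu, and the 64-bit slots end at -8 + 8 = 0, i.e. right at r1.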

Christophe Leroy Sept. 2, 2024, 2:16 p.m. UTC | #2
On 02/09/2024 at 16:00, Jason A. Donenfeld wrote:
> On Mon, Sep 02, 2024 at 03:12:47PM +0200, Christophe Leroy wrote:
>>
>>
>> On 02/09/2024 at 14:41, Jason A. Donenfeld wrote:
>>> On Mon, Sep 02, 2024 at 02:04:42PM +0200, Christophe Leroy wrote:
>>>>    SYM_FUNC_START(__arch_chacha20_blocks_nostack)
>>>>    #ifdef __powerpc64__
>>>> -	blr
>>>> +	std	r5, -216(r1)
>>>> +
>>>> +	std	r14, -144(r1)
>>>> +	std	r15, -136(r1)
>>>> +	std	r16, -128(r1)
>>>> +	std	r17, -120(r1)
>>>> +	std	r18, -112(r1)
>>>> +	std	r19, -104(r1)
>>>> +	std	r20, -96(r1)
>>>> +	std	r21, -88(r1)
>>>> +	std	r22, -80(r1)
>>>> +	std	r23, -72(r1)
>>>> +	std	r24, -64(r1)
>>>> +	std	r25, -56(r1)
>>>> +	std	r26, -48(r1)
>>>> +	std	r27, -40(r1)
>>>> +	std	r28, -32(r1)
>>>> +	std	r29, -24(r1)
>>>> +	std	r30, -16(r1)
>>>> +	std	r31, -8(r1)
>>>>    #else
>>>>    	stwu	r1, -96(r1)
>>>>    	stw	r5, 20(r1)
>>>> +#ifdef __BIG_ENDIAN__
>>>>    	stmw	r14, 24(r1)
>>>> +#else
>>>> +	stw	r14, 24(r1)
>>>> +	stw	r15, 28(r1)
>>>> +	stw	r16, 32(r1)
>>>> +	stw	r17, 36(r1)
>>>> +	stw	r18, 40(r1)
>>>> +	stw	r19, 44(r1)
>>>> +	stw	r20, 48(r1)
>>>> +	stw	r21, 52(r1)
>>>> +	stw	r22, 56(r1)
>>>> +	stw	r23, 60(r1)
>>>> +	stw	r24, 64(r1)
>>>> +	stw	r25, 68(r1)
>>>> +	stw	r26, 72(r1)
>>>> +	stw	r27, 76(r1)
>>>> +	stw	r28, 80(r1)
>>>> +	stw	r29, 84(r1)
>>>> +	stw	r30, 88(r1)
>>>> +	stw	r31, 92(r1)
>>>> +#endif
>>>> +#endif
>>>
>>> This confuses me. Why are you adding code to the !__powerpc64__ branch
>>> in this commit? (Also, why does stmw not work on LE?)
>>
>> That's for VDSO32, i.e. running 32-bit binaries on a 64-bit kernel.
>>
>> The "Programming Environments Manual for 32-Bit Implementations of the
>> PowerPC™ Architecture" says: "In some implementations operating with
>> little-endian byte order, execution of an lmw or stmw instruction
>> causes the system alignment error handler to be invoked."
>>
>> And GCC doesn't like it either:
>>
>> tools/arch/powerpc/vdso/vgetrandom-chacha.S:84: Error: `stmw' invalid
>> when little-endian
> 
> Does it make sense to do all the 32-bit stuff in the PPC32 commit (and
> then you can introduce the selftests there without the error you
> mentioned), and then add the 64-bit stuff in this commit?

I can do that, but there will still be a problem with the chacha selftests
if I don't opt out the entire function body on ppc64. It will build
properly, but if someone runs it on ppc64 it will likely crash because
only the low 32 bits of the registers will be saved.
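
To illustrate the failure mode: a 32-bit store (stw) writes only the low word of a register, and a 32-bit load (lwz) zero-extends on a 64-bit CPU, so a non-volatile register holding a 64-bit value comes back with its upper half cleared. The following is a toy model of that round trip, not kernel code; save_restore_32 is a made-up name.

```python
MASK32 = 0xFFFF_FFFF


def save_restore_32(reg64):
    """Model saving a 64-bit register with stw and reloading it with lwz."""
    saved_word = reg64 & MASK32  # stw keeps only the low 32 bits
    return saved_word            # lwz zero-extends the word on reload


value = 0x0123_4567_89AB_CDEF
restored = save_restore_32(value)

if __name__ == "__main__":
    print(f"before: {value:#018x}")
    print(f"after:  {restored:#018x}")
```

Values that already fit in 32 bits survive the round trip, which is why the problem only shows up when the caller keeps a full 64-bit value (a pointer, for instance) in one of the saved registers.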

That's the reason why I really preferred the approach where I set
something in vdso_config.h so that the assembly is used only for
powerpc32; when building for powerpc64 the assembly part is left out and
vdso_test_chacha simply reports that it is not supported.

Christophe

Jason A. Donenfeld Sept. 2, 2024, 2:19 p.m. UTC | #3
On Mon, Sep 02, 2024 at 04:16:48PM +0200, Christophe Leroy wrote:
> I can do that, but there will still be a problem with the chacha selftests
> if I don't opt out the entire function body on ppc64. It will build
> properly, but if someone runs it on ppc64 it will likely crash because
> only the low 32 bits of the registers will be saved.

What if you don't wire up the selftests _at all_ until the ppc64 commit?
Then there'll be no risk.

(And I think I would prefer to see the 32-bit code all in the 32-bit
commit; that'd make it more straightforward to review too.)

Christophe Leroy Sept. 2, 2024, 2:27 p.m. UTC | #4
Hi Jason, hi Michael,

On 02/09/2024 at 16:19, Jason A. Donenfeld wrote:
> On Mon, Sep 02, 2024 at 04:16:48PM +0200, Christophe Leroy wrote:
>> I can do that, but there will still be a problem with the chacha selftests
>> if I don't opt out the entire function body on ppc64. It will build
>> properly, but if someone runs it on ppc64 it will likely crash because
>> only the low 32 bits of the registers will be saved.
> 
> What if you don't wire up the selftests _at all_ until the ppc64 commit?
> Then there'll be no risk.
> 
> (And I think I would prefer to see the 32-bit code all in the 32-bit
> commit; that'd make it more straightforward to review too.)

I'd be fine with that, but I'd like feedback from Michael on it: is there
a risk that only the PPC32 part gets merged as a first step, or will both
PPC32 and PPC64 go in together anyway?

I would prefer not to delay PPC32 because someone doesn't feel confident 
with PPC64.

Christophe