Message ID | 20231208113218.3001940-6-ardb@google.com |
---|---|
Headers | show |
Series | arm64: Run kernel mode NEON with preemption enabled | expand |
Hey Ard, On Fri, 8 Dec 2023 12:32:19 +0100, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Currently, kernel mode NEON (SIMD) support is implemented in a way that > requires preemption to be disabled while the SIMD registers are live. > The reason for this is that those registers are not in the set that is > preserved/restored on exception entry/exit and context switch, as this > would impact performance generally, even for workloads where kernel mode > SIMD is not the bottleneck. > > [...] I applied the first three patches to for-next/fpsimd: [1/4] arm64: fpsimd: Drop unneeded 'busy' flag https://git.kernel.org/arm64/c/e109130b0e5e [2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch https://git.kernel.org/arm64/c/1e3a3de1ff6c [3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD https://git.kernel.org/arm64/c/035262623959 It would be nice to have an Ack from Herbert on the last one so that he's aware of the possible conflicts. The other thing I tangentially wondered about is what happens now if code calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end} section? I think previously the fact that preemption had to be disabled would've caused the might_fault() to explode, but now I suppose the BUG_ON() in kernel_neon_begin() will save us. Is that right? Cheers,
On Mon, 11 Dec 2023 at 21:27, Will Deacon <will@kernel.org> wrote: > > Hey Ard, > > On Fri, 8 Dec 2023 12:32:19 +0100, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Currently, kernel mode NEON (SIMD) support is implemented in a way that > > requires preemption to be disabled while the SIMD registers are live. > > The reason for this is that those registers are not in the set that is > > preserved/restored on exception entry/exit and context switch, as this > > would impact performance generally, even for workloads where kernel mode > > SIMD is not the bottleneck. > > > > [...] > > I applied the first three patches to for-next/fpsimd: > Thanks > [1/4] arm64: fpsimd: Drop unneeded 'busy' flag > https://git.kernel.org/arm64/c/e109130b0e5e > [2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch > https://git.kernel.org/arm64/c/1e3a3de1ff6c I spotted a typo in the commit log of this patch: TIF_USING_KMODE_FPSIMD -> TIF_KERNEL_FPSTATE > [3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD > https://git.kernel.org/arm64/c/035262623959 > > It would be nice to have an Ack from Herbert on the last one so that > he's aware of the possible conflicts. > > The other thing I tangentially wondered about is what happens now if code > calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end} > section? I think previously the fact that preemption had to be disabled > would've caused the might_fault() to explode, but now I suppose the BUG_ON() > in kernel_neon_begin() will save us. Is that right? > Not sure what you mean by 'save us'. Is there anything fundamentally wrong with doing user access from a kernel mode NEON region if preemption remains enabled? The BUG_ON() will only catch uses from hardirq/NMI context, or cases where FP/SIMD is not implemented/enabled in the first place so it will not trigger on a user access.
On Tue, Dec 12, 2023 at 09:16:13AM +0100, Ard Biesheuvel wrote: > On Mon, 11 Dec 2023 at 21:27, Will Deacon <will@kernel.org> wrote: > > [1/4] arm64: fpsimd: Drop unneeded 'busy' flag > > https://git.kernel.org/arm64/c/e109130b0e5e > > [2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch > > https://git.kernel.org/arm64/c/1e3a3de1ff6c > > I spotted a typo in the commit log of this patch: > > TIF_USING_KMODE_FPSIMD -> TIF_KERNEL_FPSTATE Cheers, I'll go in and fix that (so the SHAs will change). > > [3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD > > https://git.kernel.org/arm64/c/035262623959 > > > > It would be nice to have an Ack from Herbert on the last one so that > > he's aware of the possible conflicts. > > > > The other thing I tangentially wondered about is what happens now if code > > calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end} > > section? I think previously the fact that preemption had to be disabled > > would've caused the might_fault() to explode, but now I suppose the BUG_ON() > > in kernel_neon_begin() will save us. Is that right? > > > > Not sure what you mean by 'save us'. Is there anything fundamentally > wrong with doing user access from a kernel mode NEON region if > preemption remains enabled? > > The BUG_ON() will only catch uses from hardirq/NMI context, or cases > where FP/SIMD is not implemented/enabled in the first place so it will > not trigger on a user access. As discussed off-list, the vague concern was if kernel_neon_begin() is nested off the back of a user fault. The BUG_ON() should fire in that case, so we're all good. Thanks! Will
On Tue, Dec 12, 2023 at 10:55:13AM +0000, Will Deacon wrote: > As discussed off-list, the vague concern was if kernel_neon_begin() is > nested off the back of a user fault. The BUG_ON() should fire in that case, > so we're all good. I think we should really document all these rules, and Samuel's series to add the portable FPU API would be the right place for it. I'd also really love to see objtool support for enforcing the various rules where possible.
From: Ard Biesheuvel <ardb@kernel.org> Currently, kernel mode NEON (SIMD) support is implemented in a way that requires preemption to be disabled while the SIMD registers are live. The reason for this is that those registers are not in the set that is preserved/restored on exception entry/exit and context switch, as this would impact performance generally, even for workloads where kernel mode SIMD is not the bottleneck. However, doing substantial work with preemption disabled is not great, as it affects scheduling latency, which is especially problematic for real-time use cases. So ideally, we should keep preemption enabled when we can, and find another way to ensure that this does not corrupt the NEON register state of in-kernel SIMD users. This series implements a suggestion by Mark Rutland, and introduces a thread_info flag TIF_KERNEL_FPSTATE, which indicates to the thread switch machinery that the task in question has live kernel mode SIMD state which needs to be preserved and restored. The space needed for this is allocated in thread_struct. (*) Given that currently, we run kernel mode NEON with softirqs disabled (to avoid the need for preserving kernel mode NEON context belonging to task context while the SIMD unit is being used by code running in softirq context), just removing the preempt_disable/enable calls is not sufficient, and we also need to leave softirqs enabled. This means that we may need to preserve kernel mode NEON state not only on a context switch, but also when code running in softirq context takes ownership of the SIMD unit, but this is straight-forward once we add the scratch space to thread_struct. (On PREEMPT_RT, softirqs execute with preemption enabled, making kernel mode FPSIMD in softirq context preemptible as well. We rely on the fact that the task that hosts the softirq dispatch logic does not itself use kernel mode FPSIMD in task context to ensure that there is only a single kernel mode FPSIMD state that may need to be preserved and restored.) (*) We might decide to allocate this space (~512 bytes) dynamically, if the thread_struct memory footprint causes issues. However, we should also explore doing the same for the user space FPSIMD state, as kernel threads never return to user space and have no need for this allocation. v4: - for the time being, make the existing yield logic depend on CONFIG_PREEMPT_VOLUNTARY, so it can be retired once the prerequisite core preempt changes (which remove CONFIG_PREEMPT_VOLUNTARY) have been merged [0] - incorporate feedback from Mark Rutland and include his acks v3: - add patch to drop yield logic from crypto C glue code - add R-b from Mark v2: - tweak some commit logs for clarity - integrate with the existing lazy restore logic - add Mark's R-b to patch #1 [0] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/ Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Eric Biggers <ebiggers@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Ard Biesheuvel (4): arm64: fpsimd: Drop unneeded 'busy' flag arm64: fpsimd: Preserve/restore kernel mode NEON at context switch arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: crypto: Disable yielding logic unless CONFIG_PREEMPT_VOLUNTARY=y arch/arm64/crypto/aes-ce-ccm-glue.c | 8 +- arch/arm64/crypto/chacha-neon-glue.c | 5 +- arch/arm64/crypto/crct10dif-ce-glue.c | 6 +- arch/arm64/crypto/nhpoly1305-neon-glue.c | 5 +- arch/arm64/crypto/poly1305-glue.c | 5 +- arch/arm64/crypto/polyval-ce-glue.c | 9 +- arch/arm64/include/asm/assembler.h | 4 +- arch/arm64/include/asm/processor.h | 3 + arch/arm64/include/asm/simd.h | 11 +- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/fpsimd.c | 163 +++++++++++++------- 11 files changed, 140 insertions(+), 80 deletions(-)