Message ID | 20250520152938.21881-3-ebiggers@kernel.org |
---|---|
State | New |
Series | x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining |
Hello,

kernel test robot noticed "WARNING:at_arch/x86/kernel/fpu/init.c:#fpu__init_cpu" on:

commit: b88c4665c7f43e1898f695642fd159c6c542e49b ("[PATCH v3 2/2] x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Biggers/x86-fpu-Replace-in_kernel_fpu-with-kernel_fpu_allowed/20250520-233322
patch link: https://lore.kernel.org/all/20250520152938.21881-3-ebiggers@kernel.org/
patch subject: [PATCH v3 2/2] x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining

in testcase: boot

config: i386-randconfig-003-20250522
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+------------------------------------------------------+------------+------------+
|                                                      | 5454801de7 | b88c4665c7 |
+------------------------------------------------------+------------+------------+
| WARNING:at_arch/x86/kernel/fpu/init.c:#fpu__init_cpu | 0          | 12         |
| EIP:fpu__init_cpu                                    | 0          | 12         |
+------------------------------------------------------+------------+------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202505280957.3efe5bf5-lkp@intel.com

[ 0.324937][ T0] ------------[ cut here ]------------
[ 0.325455][ T0] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/init.c:56 fpu__init_cpu (arch/x86/kernel/fpu/init.c:56 (discriminator 15))
[ 0.326299][ T0] Modules linked in:
[ 0.326690][ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G T 6.15.0-rc7-00706-gb88c4665c7f4 #1 PREEMPTLAZY c76c09082e833299bbcc71b75dc4abf37758b94a
[ 0.328079][ T0] Tainted: [T]=RANDSTRUCT
[ 0.328493][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.329028][ T0] EIP: fpu__init_cpu (arch/x86/kernel/fpu/init.c:56 (discriminator 15))
[ 0.329483][ T0] Code: 75 00 00 db e3 e8 64 28 00 00 a0 10 76 8a 83 84 c0 75 17 c6 05 10 76 8a 83 01 83 c4 08 5e 5f 5d 31 c0 31 d2 2e e9 58 cc 8c 01 <0f> 0b eb e5 e8 b3 0b 8c 01 f7 c7 00 02 00 00 75 bc eb bb 90 55 89

All code
========
   0:   75 00                   jne    0x2
   2:   00 db                   add    %bl,%bl
   4:   e3 e8                   jrcxz  0xffffffffffffffee
   6:   64 28 00                sub    %al,%fs:(%rax)
   9:   00 a0 10 76 8a 83       add    %ah,-0x7c7589f0(%rax)
   f:   84 c0                   test   %al,%al
  11:   75 17                   jne    0x2a
  13:   c6 05 10 76 8a 83 01    movb   $0x1,-0x7c7589f0(%rip)        # 0xffffffff838a762a
  1a:   83 c4 08                add    $0x8,%esp
  1d:   5e                      pop    %rsi
  1e:   5f                      pop    %rdi
  1f:   5d                      pop    %rbp
  20:   31 c0                   xor    %eax,%eax
  22:   31 d2                   xor    %edx,%edx
  24:   2e e9 58 cc 8c 01       cs jmp 0x18ccc82
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   eb e5                   jmp    0x13
  2e:   e8 b3 0b 8c 01          call   0x18c0be6
  33:   f7 c7 00 02 00 00       test   $0x200,%edi
  39:   75 bc                   jne    0xfffffffffffffff7
  3b:   eb bb                   jmp    0xfffffffffffffff8
  3d:   90                      nop
  3e:   55                      push   %rbp
  3f:   89                      .byte 0x89

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb e5                   jmp    0xffffffffffffffe9
   4:   e8 b3 0b 8c 01          call   0x18c0bbc
   9:   f7 c7 00 02 00 00       test   $0x200,%edi
   f:   75 bc                   jne    0xffffffffffffffcd
  11:   eb bb                   jmp    0xffffffffffffffce
  13:   90                      nop
  14:   55                      push   %rbp
  15:   89                      .byte 0x89

[ 0.331175][ T0] EAX: 00000001 EBX: 00020801 ECX: 00000000 EDX: 00000000
[ 0.332290][ T0] ESI: 00000600 EDI: 00200206 EBP: 83841f78 ESP: 83841f68
[ 0.332937][ T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210202
[ 0.333655][ T0] CR0: 80050033 CR2: ffbff000 CR3: 0413a000 CR4: 000406b0
[ 0.334516][ T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.335460][ T0] DR6: fffe0ff0 DR7: 00000400
[ 0.336093][ T0] Call Trace:
[ 0.336599][ T0] arch_cpu_finalize_init (arch/x86/kernel/cpu/common.c:2536)
[ 0.337340][ T0] start_kernel (init/main.c:1067)
[ 0.338019][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79 (discriminator 1))
[ 0.338725][ T0] startup_32_smp (arch/x86/kernel/head_32.S:290)
[ 0.339030][ T0] irq event stamp: 2667
[ 0.339652][ T0] hardirqs last enabled at (2677): __console_unlock (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:109 arch/x86/include/asm/irqflags.h:151 kernel/printk/printk.c:344 kernel/printk/printk.c:2885)
[ 0.340902][ T0] hardirqs last disabled at (2686): __console_unlock (kernel/printk/printk.c:342 (discriminator 9))
[ 0.342501][ T0] softirqs last enabled at (0): 0x0
[ 0.345464][ T0] softirqs last disabled at (0): 0x0
[ 0.346433][ T0] ---[ end trace 0000000000000000 ]---

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250528/202505280957.3efe5bf5-lkp@intel.com
On Wed, May 28, 2025 at 10:04:39AM +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed "WARNING:at_arch/x86/kernel/fpu/init.c:#fpu__init_cpu" on:
>
> commit: b88c4665c7f43e1898f695642fd159c6c542e49b ("[PATCH v3 2/2] x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining")
> url: https://github.com/intel-lab-lkp/linux/commits/Eric-Biggers/x86-fpu-Replace-in_kernel_fpu-with-kernel_fpu_allowed/20250520-233322
> patch link: https://lore.kernel.org/all/20250520152938.21881-3-ebiggers@kernel.org/
> patch subject: [PATCH v3 2/2] x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining

Right, this is because fpu__init_cpu() is actually called twice on the boot CPU.
So the WARN_ON_FPU I added in v2 of this patch trips.

Fortunately, the version that was applied was v1, and it does not have the
problematic WARN_ON_FPU.

I wonder if fpu__init_cpu() really should be called twice. The flow is:

    arch_cpu_finalize_init()
        fpu__init_system()
            fpu__init_system_early_generic()
            fpu__init_cpu()
            fpu__init_system_generic();
            fpu__init_system_xstate_size_legacy();
            fpu__init_system_xstate(fpu_kernel_cfg.max_size);
            fpu__init_task_struct_size();
        fpu__init_cpu()

- Eric
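
The double call described above can be modelled in user space. The following is only a rough sketch, not kernel code: a plain `bool` stands in for one CPU's `kernel_fpu_allowed` flag, and every `model_*` name plus the warning counter is hypothetical, introduced purely for illustration. It shows why a "warn if the flag is already true" check in `fpu__init_cpu()` fires on the boot CPU when the function is reached twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space stand-ins; none of these are the kernel's real definitions. */
static bool kernel_fpu_allowed;  /* models the per-CPU flag for one CPU */
static int warn_count;           /* counts simulated WARN_ON_FPU hits */

/* Simulated WARN_ON_FPU(): record and report when the condition is true. */
static void model_warn_on(bool cond, const char *where)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: at %s\n", where);
	}
}

/* Models fpu__init_cpu() with the check from the diff: warn if already set. */
static void model_fpu_init_cpu(void)
{
	model_warn_on(kernel_fpu_allowed, "model_fpu_init_cpu");
	kernel_fpu_allowed = true;
}

/* Models fpu__init_system(), which initializes the boot CPU's FPU itself. */
static void model_fpu_init_system(void)
{
	model_fpu_init_cpu();
}

/* Models arch_cpu_finalize_init(); returns the number of warnings seen. */
int model_arch_cpu_finalize_init(void)
{
	warn_count = 0;
	kernel_fpu_allowed = false;

	model_fpu_init_system();     /* first call: flag goes false -> true */
	assert(kernel_fpu_allowed);  /* sanity: the first call set the flag */
	model_fpu_init_cpu();        /* second call: flag already true, warns */

	return warn_count;
}
```

Calling `model_arch_cpu_finalize_init()` from a small `main()` yields exactly one simulated warning, from the second `model_fpu_init_cpu()` call, which matches the single WARNING per boot in the robot's report.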
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 8e6848f55dcdb..2983acd95f5de 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -116,10 +116,11 @@ extern void fpu_reset_from_exception_fixup(void);
 /* Boot, hotplug and resume */
 extern void fpu__init_cpu(void);
 extern void fpu__init_system(void);
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
+extern void fpu__disable_cpu(void);
 
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
 #else
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6495259a23962..ea138583dd92a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -42,12 +42,15 @@ struct fpu_state_config fpu_user_cfg __ro_after_init;
  * Represents the initial FPU state. It's mostly (but not completely) zeroes,
  * depending on the FPU hardware format:
  */
 struct fpstate init_fpstate __ro_after_init;
 
-/* Track in-kernel FPU usage */
-static DEFINE_PER_CPU(bool, kernel_fpu_allowed) = true;
+/*
+ * Track FPU initialization and kernel-mode usage. 'true' means the FPU is
+ * initialized and is not currently being used by the kernel:
+ */
+DEFINE_PER_CPU(bool, kernel_fpu_allowed);
 
 /*
  * Track which context is using the FPU on the CPU:
  */
 DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
@@ -70,19 +73,22 @@ bool irq_fpu_usable(void)
 {
 	if (WARN_ON_ONCE(in_nmi()))
 		return false;
 
 	/*
-	 * In kernel FPU usage already active? This detects any explicitly
-	 * nested usage in task or softirq context, which is unsupported. It
-	 * also detects attempted usage in a hardirq that has interrupted a
-	 * kernel-mode FPU section.
+	 * Return false in the following cases:
+	 *
+	 * - FPU is not yet initialized. This can happen only when the call is
+	 *   coming from CPU onlining, for example for microcode checksumming.
+	 * - The kernel is already using the FPU, either because of explicit
+	 *   nesting (which should never be done), or because of implicit
+	 *   nesting when a hardirq interrupted a kernel-mode FPU section.
+	 *
+	 * The single boolean check below handles both cases:
 	 */
-	if (!this_cpu_read(kernel_fpu_allowed)) {
-		WARN_ON_FPU(!in_hardirq());
+	if (!this_cpu_read(kernel_fpu_allowed))
 		return false;
-	}
 
 	/*
 	 * When not in NMI or hard interrupt context, FPU can be used in:
 	 *
 	 * - Task context except from within fpregs_lock()'ed critical
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6bb3e35c40e24..c581a3e452dfd 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -49,10 +49,23 @@ static void fpu__init_cpu_generic(void)
  */
 void fpu__init_cpu(void)
 {
 	fpu__init_cpu_generic();
 	fpu__init_cpu_xstate();
+
+	/* Start allowing kernel-mode FPU: */
+	WARN_ON_FPU(this_cpu_read(kernel_fpu_allowed));
+	this_cpu_write(kernel_fpu_allowed, true);
+}
+
+/*
+ * Stop allowing kernel-mode FPU. Called when a CPU is brought offline:
+ */
+void fpu__disable_cpu(void)
+{
+	WARN_ON_FPU(!this_cpu_read(kernel_fpu_allowed));
+	this_cpu_write(kernel_fpu_allowed, false);
 }
 
 static bool __init fpu__probe_without_cpuid(void)
 {
 	unsigned long cr0;
diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h
index 975de070c9c98..9782152d609c7 100644
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -2,10 +2,12 @@
 #ifndef __X86_KERNEL_FPU_INTERNAL_H
 #define __X86_KERNEL_FPU_INTERNAL_H
 
 extern struct fpstate init_fpstate;
 
+DECLARE_PER_CPU(bool, kernel_fpu_allowed);
+
 /* CPU feature check wrappers */
 static __always_inline __pure bool use_xsave(void)
 {
 	return cpu_feature_enabled(X86_FEATURE_XSAVE);
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d7d61b3de2bf6..cf42a7632dd49 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1186,10 +1186,16 @@ void cpu_disable_common(void)
 {
 	int cpu = smp_processor_id();
 
 	remove_siblinginfo(cpu);
 
+	/*
+	 * Stop allowing kernel-mode FPU. This is needed so that if the CPU is
+	 * brought online again, the initial state is not allowed:
+	 */
+	fpu__disable_cpu();
+
 	/* It's now safe to remove this processor from the online map */
 	lock_vector_lock();
 	remove_cpu_from_maps(cpu);
 	unlock_vector_lock();
 	fixup_irqs();
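
For readers following the patch, here is a small user-space sketch of how one boolean can cover both "FPU not yet initialized" and "kernel already using the FPU". It is an illustration under stated assumptions, not the kernel implementation: all `sim_*` names are hypothetical, the flag is a plain `bool` rather than a per-CPU variable, and the begin/end helpers assume that a kernel-mode FPU section clears the flag on entry and sets it again on exit, which this diff implies through its comment but does not itself show. The real `irq_fpu_usable()` also performs further context checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's kernel_fpu_allowed flag; starts false. */
static bool sim_kernel_fpu_allowed;

/* Models fpu__init_cpu() and fpu__disable_cpu() without the warnings. */
static void sim_fpu_init_cpu(void)    { sim_kernel_fpu_allowed = true; }
static void sim_fpu_disable_cpu(void) { sim_kernel_fpu_allowed = false; }

/* Assumption: a kernel-mode FPU section clears the flag, then restores it. */
static void sim_kernel_fpu_begin(void) { sim_kernel_fpu_allowed = false; }
static void sim_kernel_fpu_end(void)   { sim_kernel_fpu_allowed = true; }

/* Models only the flag check; NMI and other context checks are left out. */
static bool sim_irq_fpu_usable(void)
{
	return sim_kernel_fpu_allowed;
}

/* Walks one CPU through onlining, a kernel FPU section, and offlining. */
void sim_walkthrough(void)
{
	sim_kernel_fpu_allowed = false;
	assert(!sim_irq_fpu_usable());  /* onlining: FPU not initialized yet */

	sim_fpu_init_cpu();
	assert(sim_irq_fpu_usable());   /* initialized and idle */

	sim_kernel_fpu_begin();
	assert(!sim_irq_fpu_usable());  /* nested use, e.g. from a hardirq */
	sim_kernel_fpu_end();
	assert(sim_irq_fpu_usable());

	sim_fpu_disable_cpu();
	assert(!sim_irq_fpu_usable());  /* offline: not allowed again */
}
```

The last step is the reason for the `fpu__disable_cpu()` call in `cpu_disable_common()`: without it, the flag would still read true when the CPU is brought online again, before its FPU has been reinitialized.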