Message ID: 20231023160018.164054-3-mario.limonciello@amd.com
State: New
Series: Fixes for s3 with parallel bootup
* Mario Limonciello <mario.limonciello@amd.com> wrote:

> Fixes a BUG reported during suspend to ram testing.
>
> ```
> [  478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948
> [  478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0
> ```
>
> Cc: stable@vger.kernel.org # 6.1+
> Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  arch/x86/events/amd/lbr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
> index eb31f850841a..5b98e8c7d8b7 100644
> --- a/arch/x86/events/amd/lbr.c
> +++ b/arch/x86/events/amd/lbr.c
> @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event)
>
>  void amd_pmu_lbr_reset(void)
>  {
> -	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events);
>  	int i;
>
>  	if (!x86_pmu.lbr_nr)
> @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void)
>
>  	cpuc->last_task_ctx = NULL;
>  	cpuc->last_log_id = 0;
> +	put_cpu_ptr(&cpu_hw_events);
>  	wrmsrl(MSR_AMD64_LBR_SELECT, 0);
>  }

Weird, amd_pmu_lbr_reset() is called from these places:

 - amd_pmu_lbr_sched_task(): during task sched-in during
   context-switching, this should already have preemption disabled.

 - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add
   (amd_pmu_add_event()), called by event_sched_in(), which too should
   have preemption disabled.

I clearly must have missed some additional place it gets called in.

Could you please cite the full log of the amd_pmu_lbr_reset() call that
caused the critical section warning?

Thanks,

	Ingo
On 10/24/2023 03:02, Ingo Molnar wrote:
> * Mario Limonciello <mario.limonciello@amd.com> wrote:
>
[...]
>
> Weird, amd_pmu_lbr_reset() is called from these places:
>
>  - amd_pmu_lbr_sched_task(): during task sched-in during
>    context-switching, this should already have preemption disabled.
>
>  - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add
>    (amd_pmu_add_event()), called by event_sched_in(), which too should
>    have preemption disabled.
>
> I clearly must have missed some additional place it gets called in.
>
> Could you please cite the full log of the amd_pmu_lbr_reset() call that
> caused the critical section warning?
>
> Thanks,
>
> 	Ingo

Below is the call trace in case you think it's better to disable
preemption by the caller instead. If you think it's better to keep it in
amd_pmu_lbr_reset() I'll add this trace to the commit message.

Call Trace:
 <TASK>
 dump_stack_lvl+0x44/0x60
 check_preemption_disabled+0xce/0xf0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_lbr_reset+0x19/0xc0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_cpu_reset.constprop.0+0x51/0x60
 amd_pmu_cpu_dead+0x3e/0x90
 x86_pmu_dead_cpu+0x13/0x20
 cpuhp_invoke_callback+0x169/0x4b0
 ? __pfx_virtnet_cpu_dead+0x10/0x10
 __cpuhp_invoke_callback_range+0x76/0xe0
 _cpu_down+0x112/0x270
 freeze_secondary_cpus+0x8e/0x280
 suspend_devices_and_enter+0x342/0x900
 pm_suspend+0x2fd/0x690
 state_store+0x71/0xd0
 kernfs_fop_write_iter+0x128/0x1c0
 vfs_write+0x2db/0x400
 ksys_write+0x5f/0xe0
 do_syscall_64+0x59/0x90
On Tue, Oct 24, 2023 at 10:32:27AM -0500, Mario Limonciello wrote:
> On 10/24/2023 03:02, Ingo Molnar wrote:
> > * Mario Limonciello <mario.limonciello@amd.com> wrote:
[...]
> >
> > Could you please cite the full log of the amd_pmu_lbr_reset() call
> > that caused the critical section warning?
>
> Below is the call trace in case you think it's better to disable
> preemption by the caller instead. If you think it's better to keep it
> in amd_pmu_lbr_reset() I'll add this trace to the commit message.

You cut too much; what task is running this?

IIRC this is the hotplug thread running a teardown function on that CPU
itself. It being a strict per-cpu thread should not trip
smp_processor_id() warns.

> Call Trace:
> <TASK>
> dump_stack_lvl+0x44/0x60
> check_preemption_disabled+0xce/0xf0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_lbr_reset+0x19/0xc0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_cpu_reset.constprop.0+0x51/0x60
> amd_pmu_cpu_dead+0x3e/0x90
> x86_pmu_dead_cpu+0x13/0x20
> cpuhp_invoke_callback+0x169/0x4b0
> ? __pfx_virtnet_cpu_dead+0x10/0x10
> __cpuhp_invoke_callback_range+0x76/0xe0
> _cpu_down+0x112/0x270
> freeze_secondary_cpus+0x8e/0x280
> suspend_devices_and_enter+0x342/0x900
> pm_suspend+0x2fd/0x690
> state_store+0x71/0xd0
> kernfs_fop_write_iter+0x128/0x1c0
> vfs_write+0x2db/0x400
> ksys_write+0x5f/0xe0
> do_syscall_64+0x59/0x90
On 10/24/2023 10:59, Peter Zijlstra wrote:
> On Tue, Oct 24, 2023 at 10:32:27AM -0500, Mario Limonciello wrote:
[...]
>
> You cut too much; what task is running this?
>
> IIRC this is the hotplug thread running a teardown function on that
> CPU itself. It being a strict per-cpu thread should not trip
> smp_processor_id() warns.

BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
caller is amd_pmu_lbr_reset+0x19/0xc0
CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f #1025
Call Trace:
 <TASK>
 dump_stack_lvl+0x44/0x60
 check_preemption_disabled+0xce/0xf0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_lbr_reset+0x19/0xc0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_cpu_reset.constprop.0+0x51/0x60
 amd_pmu_cpu_dead+0x3e/0x90
 x86_pmu_dead_cpu+0x13/0x20
 cpuhp_invoke_callback+0x169/0x4b0
 ? __pfx_virtnet_cpu_dead+0x10/0x10
 __cpuhp_invoke_callback_range+0x76/0xe0
 _cpu_down+0x112/0x270
 freeze_secondary_cpus+0x8e/0x280
 suspend_devices_and_enter+0x342/0x900
 pm_suspend+0x2fd/0x690
 state_store+0x71/0xd0
 kernfs_fop_write_iter+0x128/0x1c0
 vfs_write+0x2db/0x400
 ksys_write+0x5f/0xe0
 do_syscall_64+0x59/0x90
 ? srso_alias_return_thunk+0x5/0x7f
 ? count_memcg_events.constprop.0+0x1a/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? handle_mm_fault+0x1e9/0x340
 ? srso_alias_return_thunk+0x5/0x7f
 ? preempt_count_add+0x4d/0xa0
 ? srso_alias_return_thunk+0x5/0x7f
 ? up_read+0x38/0x70
 ? srso_alias_return_thunk+0x5/0x7f
 ? do_user_addr_fault+0x343/0x6b0
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f32f8d14a77
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77
RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004
RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670
R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004
R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00
 </TASK>
On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote:
> > IIRC this is the hotplug thread running a teardown function on that
> > CPU itself. It being a strict per-cpu thread should not trip
> > smp_processor_id() warns.
>
> BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
> caller is amd_pmu_lbr_reset+0x19/0xc0
> CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f #1025

Very much not the cpuhp/%u thread :/, let me try and figure out how
that happens.

> Call Trace:
> <TASK>
> dump_stack_lvl+0x44/0x60
> check_preemption_disabled+0xce/0xf0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_lbr_reset+0x19/0xc0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_cpu_reset.constprop.0+0x51/0x60
> amd_pmu_cpu_dead+0x3e/0x90
> x86_pmu_dead_cpu+0x13/0x20
> cpuhp_invoke_callback+0x169/0x4b0
[...]
On Tue, Oct 24, 2023 at 06:30:38PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote:
>
> > BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
> > caller is amd_pmu_lbr_reset+0x19/0xc0
> > CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f
>
> Very much not the cpuhp/%u thread :/, let me try and figure out how
> that happens.

Uhh, my bad, these are the PREPARE/DEAD handlers, they run before
online and after dying. The CPU is completely dead. Running lbr_reset()
here makes no sense.

Did that want to be in amd_pmu_cpu_dying()?

> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x44/0x60
> > check_preemption_disabled+0xce/0xf0
> > ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> > amd_pmu_lbr_reset+0x19/0xc0
> > ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> > amd_pmu_cpu_reset.constprop.0+0x51/0x60
> > amd_pmu_cpu_dead+0x3e/0x90
> > x86_pmu_dead_cpu+0x13/0x20
> > cpuhp_invoke_callback+0x169/0x4b0
> > ? __pfx_virtnet_cpu_dead+0x10/0x10
> > __cpuhp_invoke_callback_range+0x76/0xe0
> > _cpu_down+0x112/0x270
> > freeze_secondary_cpus+0x8e/0x280
On 10/24/2023 11:51, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> * Mario Limonciello <mario.limonciello@amd.com> wrote:
>>
[...]
>>
>> Weird, amd_pmu_lbr_reset() is called from these places:
>>
>>  - amd_pmu_lbr_sched_task(): during task sched-in during
>>    context-switching, this should already have preemption disabled.
>>
>>  - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add
>>    (amd_pmu_add_event()), called by event_sched_in(), which too should
>>    have preemption disabled.
>>
>> I clearly must have missed some additional place it gets called in.
>
> Just for completeness, the additional place I missed is
> amd_pmu_cpu_reset():
>
> 	static_call(amd_pmu_branch_reset)();
>
> ... and the amd_pmu_branch_reset static call is set up with
> amd_pmu_lbr_reset, which is why git grep missed it.
>
> Anyway, amd_pmu_cpu_reset() is very much something that should run
> non-preemptible to begin with, so your patch only papers over the real
> problem AFAICS.
>
> Thanks,
>
> 	Ingo

In that case - should preemption be disabled for all of
x86_pmu_dying_cpu() perhaps?

For good measure x86_pmu_starting_cpu() too?
On Tue, Oct 24, 2023 at 01:30:59PM -0500, Mario Limonciello wrote:
> In that case - should preemption be disabled for all of
> x86_pmu_dying_cpu() perhaps?
>
> For good measure x86_pmu_starting_cpu() too?

starting and dying are with IRQs disabled.
On 10/24/2023 10:04 PM, Peter Zijlstra wrote:
> On Tue, Oct 24, 2023 at 06:30:38PM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 24, 2023 at 11:04:06AM -0500, Mario Limonciello wrote:
>>
>>> BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
>>> caller is amd_pmu_lbr_reset+0x19/0xc0
>>> CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f
>>
>> Very much not the cpuhp/%u thread :/, let me try and figure out how
>> that happens.
>
> Uhh, my bad, these are the PREPARE/DEAD handlers, they run before
> online and after dying. The CPU is completely dead. Running
> lbr_reset() here makes no sense.
>
> Did that want to be in amd_pmu_cpu_dying()?

Agreed, it should have gone into the cpu_dying() callback.

lbr_reset() is called once from cpu_starting() so I wonder if it's
necessary to call it again in the CPU offline path.
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index eb31f850841a..5b98e8c7d8b7 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event)
 
 void amd_pmu_lbr_reset(void)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events);
 	int i;
 
 	if (!x86_pmu.lbr_nr)
@@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void)
 
 	cpuc->last_task_ctx = NULL;
 	cpuc->last_log_id = 0;
+	put_cpu_ptr(&cpu_hw_events);
 	wrmsrl(MSR_AMD64_LBR_SELECT, 0);
 }
Fixes a BUG reported during suspend to ram testing.

```
[  478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948
[  478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0
```

Cc: stable@vger.kernel.org # 6.1+
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 arch/x86/events/amd/lbr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)