| Message ID | 1627997128-57891-1-git-send-email-kan.liang@linux.intel.com |
|---|---|
| State | New |
| Series | [V2] perf/x86/intel: Apply mid ACK for small core |
On Tue, Aug 03, 2021 at 06:25:28AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> A warning as below may occasionally be triggered on an ADL machine when
> these conditions occur:
> - Two perf record commands run one by one. Both record a PEBS event.
> - Both run on small cores.
> - They have different adaptive PEBS configurations (PEBS_DATA_CFG).
>
> [ 673.663291] WARNING: CPU: 4 PID: 9874 at
> arch/x86/events/intel/ds.c:1743
> setup_pebs_adaptive_sample_data+0x55e/0x5b0
> [ 673.663348] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
> [ 673.663357] Call Trace:
> [ 673.663357]  <NMI>
> [ 673.663357]  intel_pmu_drain_pebs_icl+0x48b/0x810
> [ 673.663360]  perf_event_nmi_handler+0x41/0x80
> [ 673.663368]  </NMI>
> [ 673.663370]  __perf_event_task_sched_in+0x2c2/0x3a0
>
> Unlike the big core, the small core requires the ACK right before
> re-enabling counters in the NMI handler; otherwise a stale PEBS record
> may be dumped into the later NMI handler, which triggers the warning.
>
> Add a new mid_ack flag to track the case. Add all the PMI handler bits
> to struct x86_hybrid_pmu so the bits can differ per PMU type. Apply mid
> ACK for the small cores on an Alder Lake machine.

Why do we need a new option? Why isn't early (as in not late) good
enough?
On 8/3/2021 10:55 AM, Peter Zijlstra wrote:
> On Tue, Aug 03, 2021 at 06:25:28AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> A warning as below may occasionally be triggered on an ADL machine when
>> these conditions occur:
>> - Two perf record commands run one by one. Both record a PEBS event.
>> - Both run on small cores.
>> - They have different adaptive PEBS configurations (PEBS_DATA_CFG).
>>
>> [...]
>>
>> Unlike the big core, the small core requires the ACK right before
>> re-enabling counters in the NMI handler; otherwise a stale PEBS record
>> may be dumped into the later NMI handler, which triggers the warning.
>>
>> Add a new mid_ack flag to track the case. Add all the PMI handler bits
>> to struct x86_hybrid_pmu so the bits can differ per PMU type. Apply mid
>> ACK for the small cores on an Alder Lake machine.
>
> Why do we need a new option? Why isn't early (as in not late) good
> enough?

The early ACK can fix this issue, but it triggers a spurious NMI during
the stress test. I'm told to do the ACK right before re-enabling
counters for the small cores. That indeed fixes all the issues.

Thanks,
Kan
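For reference, the three ACK placements under discussion differ only in
where the LVTPC unmask write sits relative to draining PEBS and
re-enabling the counters. A minimal C sketch of the control flow follows;
the helper names (`ack_lvtpc()`, `counters_off()`, and so on) are
hypothetical stand-ins for the real kernel functions, not the actual
handler:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real kernel helpers. */
static void ack_lvtpc(void)        { /* apic_write(APIC_LVTPC, APIC_DM_NMI) */ }
static void counters_off(void)     { /* intel_pmu_disable_all() */ }
static void counters_on(void)      { /* intel_pmu_enable_all(0) */ }
static int  drain_and_handle(void) { return 1; /* drain PEBS, handle overflows */ }

/* The three ACK placements: exactly one branch fires per handler run. */
static int pmi_handler_sketch(bool mid_ack, bool late_ack)
{
	int handled;

	if (!late_ack && !mid_ack)
		ack_lvtpc();	/* early ACK: older platforms */

	counters_off();
	handled = drain_and_handle();

	if (mid_ack)
		ack_lvtpc();	/* mid ACK: ADL small core; the PEBS buffer is
				 * drained, the counters are not yet re-enabled */
	counters_on();

	if (late_ack)
		ack_lvtpc();	/* late ACK: big core; done after the counters
				 * have been reset, avoiding spurious NMIs */
	return handled;
}
```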
On 8/3/2021 12:17 PM, Peter Zijlstra wrote:
> On Tue, Aug 03, 2021 at 11:20:20AM -0400, Liang, Kan wrote:
>> On 8/3/2021 10:55 AM, Peter Zijlstra wrote:
>>> On Tue, Aug 03, 2021 at 06:25:28AM -0700, kan.liang@linux.intel.com wrote:
>>>> [...]
>>>>
>>>> Add a new mid_ack flag to track the case. Add all the PMI handler bits
>>>> to struct x86_hybrid_pmu so the bits can differ per PMU type. Apply mid
>>>> ACK for the small cores on an Alder Lake machine.
>>>
>>> Why do we need a new option? Why isn't early (as in not late) good
>>> enough?
>>
>> The early ACK can fix this issue, but it triggers a spurious NMI during
>> the stress test. I'm told to do the ACK right before re-enabling
>> counters for the small cores. That indeed fixes all the issues.
>
> Any chance that would also work for the chips that now use late_ack?

Let me check and do some tests.

Thanks,
Kan
>> Any chance that would also work for the chips that now use late_ack?
>
> Let me check and do some tests.

I suspect we won't be able to test all of them, so we'll probably have
to keep all three options.

-Andi
```diff
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d76be3b..511d1f9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2904,24 +2904,28 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
  */
 static int intel_pmu_handle_irq(struct pt_regs *regs)
 {
-	struct cpu_hw_events *cpuc;
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	bool late_ack = hybrid_bit(cpuc->pmu, late_ack);
+	bool mid_ack = hybrid_bit(cpuc->pmu, mid_ack);
 	int loops;
 	u64 status;
 	int handled;
 	int pmu_enabled;
 
-	cpuc = this_cpu_ptr(&cpu_hw_events);
-
 	/*
 	 * Save the PMU state.
 	 * It needs to be restored when leaving the handler.
 	 */
 	pmu_enabled = cpuc->enabled;
 	/*
-	 * No known reason to not always do late ACK,
-	 * but just in case do it opt-in.
+	 * In general, the early ACK is only applied for old platforms.
+	 * For the big core starts from Haswell, the late ACK should be
+	 * applied.
+	 * For the small core after Tremont, we have to do the ACK right
+	 * before re-enabling counters, which is in the middle of the
+	 * NMI handler.
 	 */
-	if (!x86_pmu.late_ack)
+	if (!late_ack && !mid_ack)
 		apic_write(APIC_LVTPC, APIC_DM_NMI);
 	intel_bts_disable_local();
 	cpuc->enabled = 0;
@@ -2958,6 +2962,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 		goto again;
 
 done:
+	if (mid_ack)
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
 	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
 	cpuc->enabled = pmu_enabled;
 	if (pmu_enabled)
@@ -2969,7 +2975,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 * have been reset. This avoids spurious NMIs on
 	 * Haswell CPUs.
 	 */
-	if (x86_pmu.late_ack)
+	if (late_ack)
 		apic_write(APIC_LVTPC, APIC_DM_NMI);
 	return handled;
 }
@@ -6123,7 +6129,6 @@ __init int intel_pmu_init(void)
 		static_branch_enable(&perf_is_hybrid);
 		x86_pmu.num_hybrid_pmus = X86_HYBRID_NUM_PMUS;
 
-		x86_pmu.late_ack = true;
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.pebs_block = true;
@@ -6161,6 +6166,7 @@ __init int intel_pmu_init(void)
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
 		pmu->name = "cpu_core";
 		pmu->cpu_type = hybrid_big;
+		pmu->late_ack = true;
 		if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) {
 			pmu->num_counters = x86_pmu.num_counters + 2;
 			pmu->num_counters_fixed = x86_pmu.num_counters_fixed + 1;
@@ -6186,6 +6192,7 @@ __init int intel_pmu_init(void)
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX];
 		pmu->name = "cpu_atom";
 		pmu->cpu_type = hybrid_small;
+		pmu->mid_ack = true;
 		pmu->num_counters = x86_pmu.num_counters;
 		pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
 		pmu->max_pebs_events = x86_pmu.max_pebs_events;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ad87cb3..eec7ce8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -655,6 +655,10 @@ struct x86_hybrid_pmu {
 	struct event_constraint		*event_constraints;
 	struct event_constraint		*pebs_constraints;
 	struct extra_reg		*extra_regs;
+
+	unsigned int			late_ack	:1,
+					mid_ack		:1,
+					enabled_ack	:1;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
@@ -685,6 +689,16 @@ extern struct static_key_false perf_is_hybrid;
 	__Fp;						\
 }))
 
+#define hybrid_bit(_pmu, _field)			\
+({							\
+	bool __Fp = x86_pmu._field;			\
+							\
+	if (is_hybrid() && (_pmu))			\
+		__Fp = hybrid_pmu(_pmu)->_field;	\
+							\
+	__Fp;						\
+})
+
 enum hybrid_pmu_type {
 	hybrid_big		= 0x40,
 	hybrid_small		= 0x20,
@@ -754,6 +768,7 @@ struct x86_pmu {
 
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
+			mid_ack			:1,
 			enabled_ack		:1;
 	/*
 	 * sysfs attrs
```
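A note on the new `hybrid_bit()` accessor: like the existing `hybrid()` /
`hybrid_var()` helpers, it falls back to the global `x86_pmu` field on
non-hybrid systems and reads the per-PMU bit on hybrid ones, so with this
patch ADL resolves `late_ack` on cpu_core and `mid_ack` on cpu_atom. A
simplified sketch of that resolution, written as a function with stand-in
types and names rather than the kernel definitions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in type; the real ones live in arch/x86/events/perf_event.h. */
struct hybrid_pmu_bits { bool late_ack, mid_ack; };

static struct hybrid_pmu_bits x86_pmu_bits;  /* global x86_pmu fallback */
static bool hybrid_enabled;                  /* perf_is_hybrid static key */

/* What hybrid_bit(pmu, mid_ack) expands to, written out as a function. */
static bool hybrid_mid_ack(const struct hybrid_pmu_bits *pmu)
{
	bool v = x86_pmu_bits.mid_ack;       /* non-hybrid default */

	if (hybrid_enabled && pmu)
		v = pmu->mid_ack;            /* per-PMU override */
	return v;
}
```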