[v4,11/38] perf/x86: Forbid PMI handler when guest own PMU

Message ID

20250324173121.1275209-12-mizhang@google.com

State

New

Headers

Reply-To: Mingwei Zhang <mizhang@google.com>
Date: Mon, 24 Mar 2025 17:30:51 +0000
In-Reply-To: <20250324173121.1275209-1-mizhang@google.com>
Precedence: bulk
Mime-Version: 1.0
References: <20250324173121.1275209-1-mizhang@google.com>
Message-ID: <20250324173121.1275209-12-mizhang@google.com>
Subject: [PATCH v4 11/38] perf/x86: Forbid PMI handler when guest own PMU
From: Mingwei Zhang <mizhang@google.com>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
 Namhyung Kim <namhyung@kernel.org>,
	Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
 Jiri Olsa <jolsa@kernel.org>,
	Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Mingwei Zhang <mizhang@google.com>, Yongwei Ma <yongwei.ma@intel.com>,
	Xiong Zhang <xiong.y.zhang@linux.intel.com>,
 Dapeng Mi <dapeng1.mi@linux.intel.com>,
	Jim Mattson <jmattson@google.com>, Sandipan Das <sandipan.das@amd.com>,
	Zide Chen <zide.chen@intel.com>, Eranian Stephane <eranian@google.com>,
	Das Sandipan <Sandipan.Das@amd.com>, Shukla Manali <Manali.Shukla@amd.com>,
	Nikunj Dadhania <nikunj.dadhania@amd.com>
Content-Type: text/plain; charset="UTF-8"

Series

[v4,01/38] perf: Support get/put mediated PMU interfaces

Commit Message

Mingwei Zhang March 24, 2025, 5:30 p.m. UTC

If a guest PMI is delivered after VM-exit, the KVM maskable interrupt will
be held pending until EFLAGS.IF is set. In the meantime, if the logical
processor receives an NMI for any reason at all, perf_event_nmi_handler()
will be invoked. If there is any active perf event anywhere on the system,
x86_pmu_handle_irq() will be invoked, and it will clear
IA32_PERF_GLOBAL_STATUS. By the time KVM's PMI handler is invoked, it will
be a mystery which counter(s) overflowed.

When LVTPC is programmed with the KVM PMI vector, the PMU is owned by the
guest. If a host NMI then lets x86_pmu_handle_irq() run, it restores the
LVTPC vector to NMI and clears IA32_PERF_GLOBAL_STATUS, which breaks the
guest's passthrough vPMU environment.

So modify perf_event_nmi_handler() to check the per-CPU variable
pmi_vector_is_nmi and, if it is false, simply return without calling
x86_pmu_handle_irq().

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/events/core.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)
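
The race described in the commit message can be replayed with a small userspace model. This is only a sketch, not kernel code: the `model_*` functions, `replay_race()`, and the plain globals are hypothetical stand-ins for the MSR, the per-CPU flag, and the handlers, and the model assumes a host perf event is active so that the IRQ path always runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these names mirror the patch but are not kernel APIs. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

static uint64_t global_status;        /* stands in for IA32_PERF_GLOBAL_STATUS */
static bool pmi_vector_is_nmi = true; /* per-CPU in the patch, a plain global here */

/* Models x86_pmu_handle_irq(): consumes and clears every overflow bit. */
static int model_handle_irq(void)
{
	int handled = global_status != 0;

	global_status = 0;
	return handled;
}

/* Models perf_event_nmi_handler(); 'guarded' selects post-patch behavior. */
static int model_nmi_handler(bool guarded)
{
	if (guarded && !pmi_vector_is_nmi)
		return NMI_DONE;	/* guest owns the PMU: leave its state alone */

	return model_handle_irq() ? NMI_HANDLED : NMI_DONE;
}

/*
 * Replays the race: a guest counter overflows, the KVM PMI stays pending,
 * and an unrelated NMI arrives first. Returns the status bits that KVM's
 * PMI handler would still be able to read afterwards.
 */
static uint64_t replay_race(bool guarded)
{
	pmi_vector_is_nmi = false;	/* guest context loaded */
	global_status = 1ull << 3;	/* guest counter 3 overflowed */
	model_nmi_handler(guarded);	/* unrelated host NMI */
	return global_status;
}
```

In this model, `replay_race(false)` leaves no overflow bits behind, while `replay_race(true)` keeps bit 3 set for the later KVM PMI handler.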

Comments

Sean Christopherson May 15, 2025, midnight UTC | #1

On Mon, Mar 24, 2025, Mingwei Zhang wrote:
> If a guest PMI is delivered after VM-exit, the KVM maskable interrupt will
> be held pending until EFLAGS.IF is set. In the meantime, if the logical
> processor receives an NMI for any reason at all, perf_event_nmi_handler()
> will be invoked. If there is any active perf event anywhere on the system,
> x86_pmu_handle_irq() will be invoked, and it will clear
> IA32_PERF_GLOBAL_STATUS. By the time KVM's PMI handler is invoked, it will
> be a mystery which counter(s) overflowed.
> 
> When LVTPC is programmed with the KVM PMI vector, the PMU is owned by the
> guest. If a host NMI then lets x86_pmu_handle_irq() run, it restores the
> LVTPC vector to NMI and clears IA32_PERF_GLOBAL_STATUS, which breaks the
> guest's passthrough vPMU environment.
> 
> So modify perf_event_nmi_handler() to check the per-CPU variable
> pmi_vector_is_nmi and, if it is false, simply return without calling
> x86_pmu_handle_irq().
> 
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  arch/x86/events/core.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 28161d6ff26d..96a173bbbec2 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -54,6 +54,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>  	.pmu = &pmu,
>  };
>  
> +static DEFINE_PER_CPU(bool, pmi_vector_is_nmi) = true;

I strongly prefer guest_ctx_loaded.  pmi_vector_is_nmi is very inflexible and
doesn't communicate *why* perf's NMI handler needs to ignore NMIs.

>  DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
>  DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
>  DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
> @@ -1737,6 +1739,24 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>  	u64 finish_clock;
>  	int ret;
>  
> +	/*
> +	 * While the guest PMU context is loaded, this handler must not run,
> +	 * for the following reasons:
> +	 * 1. After perf_guest_enter() is called and before the CPU enters
> +	 *    non-root mode, a host non-PMI NMI can arrive; x86_pmu_handle_irq()
> +	 *    would restore the NMI vector, destroying KVM's PMI vector setting.
> +	 * 2. While the VM is running, a host non-PMI NMI causes a VM exit, and
> +	 *    KVM calls the host NMI handler (vmx_vcpu_enter_exit()) before it
> +	 *    saves the guest PMU context (kvm_pmu_put_guest_context());
> +	 *    x86_pmu_handle_irq() would clear the global_status MSR, which
> +	 *    still holds guest state, destroying the guest PMU status.
> +	 * 3. After a VM exit but before KVM saves the guest PMU context, a host
> +	 *    non-PMI NMI can arrive; x86_pmu_handle_irq() would clear the
> +	 *    global_status MSR, which still holds guest state.
> +	 */

This *might* be useful for a changelog, but even then it's probably overkill.
NMIs can happen at any time, that's the full story.  Enumerating the exact
edge cases adds a lot of noise and not much value.
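
As a rough illustration of the two suggestions above (the guest_ctx_loaded name and a much shorter comment), here is a userspace sketch rather than a proposed kernel change. `MODEL_APIC_DM_NMI`, `model_lvtpc`, and the `model_*` functions are hypothetical stand-ins for the APIC register and the perf callbacks, and the flag is a plain global instead of a per-CPU variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; none of these names are real kernel interfaces. */
#define MODEL_APIC_DM_NMI 0x400u

enum { NMI_DONE = 0, NMI_HANDLED = 1 };

static uint32_t model_lvtpc = MODEL_APIC_DM_NMI;
static bool guest_ctx_loaded;	/* defaults to false: the host owns the PMU */

/* Models x86_pmu_switch_guest_ctx(): swap the LVTPC entry and track ownership. */
static void model_switch_guest_ctx(bool enter, uint32_t guest_lvtpc)
{
	model_lvtpc = enter ? guest_lvtpc : MODEL_APIC_DM_NMI;
	guest_ctx_loaded = enter;
}

/* Models the early return at the top of perf_event_nmi_handler(). */
static int model_nmi_handler(void)
{
	/* The guest owns the PMU and NMIs can arrive at any time: ignore them. */
	if (guest_ctx_loaded)
		return NMI_DONE;

	return NMI_HANDLED;	/* placeholder for the normal handling path */
}
```

With this polarity the flag needs no non-zero initializer, and its name states the condition under which the handler bails out.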

Patch

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 28161d6ff26d..96a173bbbec2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -54,6 +54,8 @@  DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.pmu = &pmu,
 };
 
+static DEFINE_PER_CPU(bool, pmi_vector_is_nmi) = true;
+
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
 DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
@@ -1737,6 +1739,24 @@  perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 	u64 finish_clock;
 	int ret;
 
+	/*
+	 * While the guest PMU context is loaded, this handler must not run,
+	 * for the following reasons:
+	 * 1. After perf_guest_enter() is called and before the CPU enters
+	 *    non-root mode, a host non-PMI NMI can arrive; x86_pmu_handle_irq()
+	 *    would restore the NMI vector, destroying KVM's PMI vector setting.
+	 * 2. While the VM is running, a host non-PMI NMI causes a VM exit, and
+	 *    KVM calls the host NMI handler (vmx_vcpu_enter_exit()) before it
+	 *    saves the guest PMU context (kvm_pmu_put_guest_context());
+	 *    x86_pmu_handle_irq() would clear the global_status MSR, which
+	 *    still holds guest state, destroying the guest PMU status.
+	 * 3. After a VM exit but before KVM saves the guest PMU context, a host
+	 *    non-PMI NMI can arrive; x86_pmu_handle_irq() would clear the
+	 *    global_status MSR, which still holds guest state.
+	 */
+	if (!this_cpu_read(pmi_vector_is_nmi))
+		return NMI_DONE;
+
 	/*
 	 * All PMUs/events that share this PMI handler should make sure to
 	 * increment active_events for their events.
@@ -2681,10 +2701,13 @@  static void x86_pmu_switch_guest_ctx(bool enter, void *data)
 {
 	u32 guest_lvtpc = *(u32 *)data;
 
-	if (enter)
+	if (enter) {
 		apic_write(APIC_LVTPC, guest_lvtpc);
-	else
+		this_cpu_write(pmi_vector_is_nmi, false);
+	} else {
 		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		this_cpu_write(pmi_vector_is_nmi, true);
+	}
 }
 
 static struct pmu pmu = {