diff mbox series

[v5,2/3] KVM: SVM: Add Idle HLT intercept support

Message ID 20250103081828.7060-3-manali.shukla@amd.com
State New
Headers show
Series Add support for the Idle HLT intercept feature | expand

Commit Message

Manali Shukla Jan. 3, 2025, 8:18 a.m. UTC
From: Manali Shukla <Manali.Shukla@amd.com>

The hypervisor can intercept the HLT instruction by setting the
HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if
there are pending V_INTR and V_NMI events, as the hypervisor must then
initiate a VMRUN to handle them.

If the HLT-Intercept Bit is cleared and the vCPU executes HLT while
there are pending V_INTR and V_NMI events, the hypervisor won’t detect
them, potentially causing indefinite suspension of the vCPU. This poses
a problem for enlightened guests who  wish to securely handle the
events.

For Secure AVIC scenarios, if a guest does a HLT while an interrupt is
pending (in IRR), the hypervisor does not have a way to figure out
whether the guest needs to be re-entered, as it cannot read the guest
backing page.  The Idle HLT intercept feature allows the hypervisor to
intercept HLT execution only if there are no pending V_INTR and V_NMI
events.

There are two use cases for the Idle HLT intercept feature:
- Secure VMs that wish to handle pending events securely without exiting
  to the hypervisor on HLT (Secure AVIC).
- Optimization for all the VMs to avoid a wasteful VMEXIT during HLT
  when there are pending events.

On discovering the Idle HLT Intercept, the KVM hypervisor,
Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB.
When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0),
offset 0xFh) should be cleared.

Before entering the HLT state, the HLT instruction performs checks in
following order:
- The HLT intercept check, if set, it unconditionally triggers
  SVM_EXIT_HLT (0x78).
- The Idle HLT intercept check, if set and there are no pending V_INTR
  or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).

Details about the Idle HLT intercept feature can be found in AMD APM [1].

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April
     2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
     https://bugzilla.kernel.org/attachment.cgi?id=306250

Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 arch/x86/include/asm/svm.h      |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 13 ++++++++++---
 3 files changed, 13 insertions(+), 3 deletions(-)
diff mbox series

Patch

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 2b59b9951c90..992050cb83d0 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@  enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_IDLE_HLT = 166,
 };
 
 
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 1814b413fd57..ec1321248dac 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@ 
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID       0x0a2
+#define SVM_EXIT_IDLE_HLT      0x0a6
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -224,6 +225,7 @@ 
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,     "invpcid" }, \
+	{ SVM_EXIT_IDLE_HLT,     "idle-halt" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 78daedf6697b..36f307e71d5d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1296,8 +1296,12 @@  static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}
 
-	if (!kvm_hlt_in_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_HLT);
+	if (!kvm_hlt_in_guest(vcpu->kvm)) {
+		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
+			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
+		else
+			svm_set_intercept(svm, INTERCEPT_HLT);
+	}
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
@@ -3341,6 +3345,7 @@  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]                      = invpcid_interception,
+	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3503,7 +3508,7 @@  int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 		return interrupt_window_interception(vcpu);
 	else if (exit_code == SVM_EXIT_INTR)
 		return intr_interception(vcpu);
-	else if (exit_code == SVM_EXIT_HLT)
+	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
 		return kvm_emulate_halt(vcpu);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(vcpu);
@@ -5224,6 +5229,8 @@  static __init void svm_set_cpu_caps(void)
 		if (vnmi)
 			kvm_cpu_cap_set(X86_FEATURE_VNMI);
 
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}