Message ID | 20231108183003.5981-14-xin3.li@intel.com |
---|---|
State | New |
Series | Enable FRED with KVM VMX |
> 	/* Require Write-Back (WB) memory type for VMCS accesses. */
>@@ -7313,11 +7328,12 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
> 			}
> 		}
>
>-		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
>-			u32 err = vmcs_read32(error_code_field);
>-			kvm_requeue_exception_e(vcpu, vector, err);
>-		} else
>-			kvm_requeue_exception(vcpu, vector);
>+		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
>+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
>+						idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
>+		else
>+			kvm_requeue_exception(vcpu, vector,
>+					      idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);

The exiting-event identification can also have bit 13 set, indicating that a
nested exception was encountered and caused the VM exit. When reinjecting that
exception into the guest, KVM needs to set the "nested" bit, right? I suspect
some changes to e.g. handle_exception_nmi() are needed.
> Subject: RE: [PATCH v1 13/23] KVM: VMX: Handle VMX nested exception for FRED
>
> > Exiting-event identification can also have bit 13 set, indicating a
> > nested exception encountered and caused VM-exit. when reinjecting the
> > exception to guests, kvm needs to set the "nested" bit, right? I
> > suspect some changes to e.g., handle_exception_nmi() are needed.
>
> The current patch relies on kvm_multiple_exception() to do that. But TBH, I'm
> not sure it can recognize all nested cases. I probably should revisit it.

So the conclusion is that kvm_multiple_exception() is smart enough, and
a VMM doesn't have to check bit 13 of the exiting-event identification.

In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
Support, there is a statement at the end of the description of the
exiting-event identification:

(The value of this bit is always identical to that of the valid bit of
the original-event identification field.)

It means that even w/o VMX nested-exception support, a VMM already knows
whether an exception is a nested exception encountered during delivery of
another event in an exception-caused VM exit (exit reason 0). KVM does this
by reading IDT_VECTORING_INFO_FIELD and calling vmx_complete_interrupts()
immediately after VM exits.

vmx_complete_interrupts() simply queues the original exception if there is
one, and later the nested exception causing the VM exit could be cancelled
if it is a shadow page fault. However, if the shadow page fault is caused
by a guest page fault, KVM injects it as a nested exception to have the
guest fix its page table.

I will add comments about this background in the next iteration.
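For readers following the bit layout discussed here, a minimal standalone sketch (not KVM code) that decodes a 32-bit VMX event-identification word may help. The values for bits 11, 13 and 31 are the ones in the patch's vmx.h hunk; the vector mask 0xff and the hardware-exception type encoding are assumed to match the existing kernel header, and `struct event_info` and `decode_event_info()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a 32-bit VMX event-identification word. */
#define INTR_INFO_VECTOR_MASK           0xffu       /* 7:0 */
#define INTR_INFO_INTR_TYPE_MASK        0x700u      /* 10:8 */
#define INTR_INFO_DELIVER_CODE_MASK     0x800u      /* 11 */
#define INTR_INFO_NESTED_EXCEPTION_MASK 0x2000u     /* 13, added by this patch */
#define INTR_INFO_VALID_MASK            0x80000000u /* 31 */

#define INTR_TYPE_HARD_EXCEPTION        (3u << 8)

/* Compile-time check: bit 13 does not collide with the other fields. */
static_assert((INTR_INFO_NESTED_EXCEPTION_MASK &
	       (INTR_INFO_VECTOR_MASK | INTR_INFO_INTR_TYPE_MASK |
		INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK)) == 0,
	      "bit 13 must not overlap the other fields");

/* Hypothetical decoded view of the word; this is not a KVM type. */
struct event_info {
	bool valid;
	uint8_t vector;
	uint32_t type;		/* still shifted, compare with INTR_TYPE_* */
	bool has_error_code;
	bool nested;
};

static inline struct event_info decode_event_info(uint32_t info)
{
	struct event_info ev = { 0 };

	ev.valid = (info & INTR_INFO_VALID_MASK) != 0;
	if (!ev.valid)
		return ev;	/* the other bits carry no meaning if invalid */

	ev.vector = (uint8_t)(info & INTR_INFO_VECTOR_MASK);
	ev.type = info & INTR_INFO_INTR_TYPE_MASK;
	ev.has_error_code = (info & INTR_INFO_DELIVER_CODE_MASK) != 0;
	ev.nested = (info & INTR_INFO_NESTED_EXCEPTION_MASK) != 0;
	return ev;
}
```

As a usage example, decoding 0x80002b0e would describe a valid hardware exception with vector 14 (#PF), an error code, and the nested bit set.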
On Wed, Dec 06, 2023 at 04:37:39PM +0800, Li, Xin3 wrote:
>So the conclusion is that kvm_multiple_exception() is smart enough, and
>a VMM doesn't have to check bit 13 of the Exiting-event identification.
>
>In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
>Support, there is a statement at the end of Exiting-event identification:
>
>(The value of this bit is always identical to that of the valid bit of
>the original-event identification field.)
>
>It means that even w/o VMX Nested-Exception support, a VMM already knows
>if an exception is a nested exception encountered during delivery of
>another event in an exception caused VM exit (exit reason 0). This is
>done in KVM through reading IDT_VECTORING_INFO_FIELD and calling
>vmx_complete_interrupts() immediately after VM exits.
>
>vmx_complete_interrupts() simply queues the original exception if there is
>one, and later the nested exception causing the VM exit could be cancelled
>if it is a shadow page fault. However if the shadow page fault is caused
>by a guest page fault, KVM injects it as a nested exception to have guest
>fix its page table.
>
>I will add comments about this background in the next iteration.

Is it possible that the CPU encounters an exception and causes a VM exit while
injecting an __interrupt__? In this case, no __exception__ will be (re-)queued
by vmx_complete_interrupts().
On Thu, Dec 07, 2023 at 06:09:46PM +0800, Li, Xin3 wrote:
>> is it possible that the CPU encounters an exception and causes VM-exit during
>> injecting an __interrupt__? in this case, no __exception__ will be (re-)queued
>> by vmx_complete_interrupts().
>
>I guess the following case is what you're suggesting:
>KVM injects an external interrupt after shadow page tables are nuked.
>
>vmx_complete_interrupts() is called after each VM exit to clear both
>interrupt and exception queues, which means it always pushes the
>deepest event if there is an original event. In the above case, the
>original event is the external interrupt KVM just tried to inject.

In my understanding, your point is:

1. If bit 13 of the exiting-event identification is set, the original-event
   identification field should be valid.
2. vmx_complete_interrupts() runs immediately after VM exits; it reads the
   original-event identification and reinjects the event there.
3. If KVM injects the exception in the exiting-event identification into the
   guest, KVM doesn't need to read bit 13, because kvm_multiple_exception()
   is "smart enough" to recognize the exception as a nested exception: if
   bit 13 is 1, an exception must have been queued in #2.

My question is: what if the event in the original-event identification is an
interrupt, e.g. an external interrupt or NMI, rather than an exception?
vmx_complete_interrupts() won't queue an exception, so how can KVM or
kvm_multiple_exception() know that the exception that caused the VM exit is a
nested exception w/o reading bit 13 of the exiting-event identification?
> in my understanding, your point is:
> 1. if bit 13 of the Exiting-event identification is set. the original-event
>    identification field should be valid.
> 2. vmx_complete_interrupts() is done immediately after VM exits and reads
>    original-event identification and reinjects the event there.
> 3. if KVM injects the exception in exiting-event identification
>    to guest, KVM doesn't need to read the bit 13 because
>    kvm_multiple_exception() is "smart enough" and recognize the exception as
>    nested-exception because if bit 13 is 1, one exception must have been
>    queued in #2.
>
> my question is:
> what if the event in original-event identification is an interrupt e.g.,
> external interrupt or NMI, rather than exception. vmx_complete_interrupts()
> won't queue an exception, then how can KVM or kvm_multiple_exception()
> know the exception that caused VM-exit is a nested exception w/o reading
> bit 13 of the Exiting-event identification?

The good news is that vmx_complete_interrupts() still queues the event even
if it's not a hardware exception. It's just that kvm_multiple_exception()
doesn't check whether there is an original interrupt or NMI, because IDT
event delivery doesn't care about such a case.

I think your point is more that we should check it when FRED is enabled
for a guest. Yes, architecturally we should do it.

What I want to emphasize is that bit 13 of the exiting-event identification
is set to the valid bit of the original-event identification; they are
logically the same thing when FRED is enabled. It doesn't matter which one
a VMM reads and uses. But a VMM doesn't need to differentiate FRED and IDT
if it reads the info from the original-event identification.
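The two positions in this exchange can be contrasted with a toy model. This is only a sketch of the argument, not KVM's actual logic: both helper names are hypothetical, and the event-type encodings are assumed to match the kernel's vmx.h. The first helper treats the exiting exception as nested only when the original event was itself a hardware exception; the second follows the spec statement quoted earlier and treats any valid original event as making it nested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_INTR_TYPE_MASK  0x700u      /* 10:8 */
#define INTR_INFO_VALID_MASK      0x80000000u /* 31 */

#define INTR_TYPE_EXT_INTR        (0u << 8)   /* external interrupt */
#define INTR_TYPE_NMI_INTR        (2u << 8)   /* NMI */
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)   /* hardware exception */

static_assert(INTR_INFO_VALID_MASK == (1u << 31), "valid bit is bit 31");

/*
 * Hypothetical model 1: the exiting exception counts as nested only if
 * the original event was a hardware exception that gets requeued.
 */
static inline bool nested_if_exception_requeued(uint32_t orig_event_info)
{
	return (orig_event_info & INTR_INFO_VALID_MASK) &&
	       (orig_event_info & INTR_INFO_INTR_TYPE_MASK) ==
			INTR_TYPE_HARD_EXCEPTION;
}

/*
 * Hypothetical model 2: the exiting exception counts as nested whenever
 * the original-event identification is valid, whatever its type.
 */
static inline bool nested_if_original_event_valid(uint32_t orig_event_info)
{
	return (orig_event_info & INTR_INFO_VALID_MASK) != 0;
}
```

For an original event that is an external interrupt or an NMI, the two models disagree, which is exactly the case raised in the question above; for an original #PF they agree.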
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e5a6d9439f8..2ae8cc83dbb3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -721,6 +721,7 @@ struct kvm_queued_exception {
 	u32 error_code;
 	unsigned long payload;
 	bool has_payload;
+	bool nested;
 };
 
 struct kvm_vcpu_arch {
@@ -2015,8 +2016,9 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested);
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, bool nested);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 97729248e844..020dfd3f6b44 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -132,6 +132,7 @@
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_32BIT_PHYS_ADDR_ONLY		BIT_ULL(48)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
+#define VMX_BASIC_NESTED_EXCEPTION		BIT_ULL(58)
 
 /* VMX_MISC bits and bitmasks */
 #define VMX_MISC_INTEL_PT			BIT_ULL(14)
@@ -404,8 +405,9 @@ enum vmcs_field {
 #define INTR_INFO_INTR_TYPE_MASK	0x700		/* 10:8 */
 #define INTR_INFO_DELIVER_CODE_MASK	0x800		/* 11 */
 #define INTR_INFO_UNBLOCK_NMI		0x1000		/* 12 */
+#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000		/* 13 */
 #define INTR_INFO_VALID_MASK		0x80000000	/* 31 */
-#define INTR_INFO_RESVD_BITS_MASK	0x7ffff000
+#define INTR_INFO_RESVD_BITS_MASK	0x7fffd000
 
 #define VECTORING_INFO_VECTOR_MASK	INTR_INFO_VECTOR_MASK
 #define VECTORING_INFO_TYPE_MASK	INTR_INFO_INTR_TYPE_MASK
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 712146312358..78a9ff5cfcad 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4047,10 +4047,10 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 			u32 err = svm->vmcb->control.exit_int_info_err;
-			kvm_requeue_exception_e(vcpu, vector, err);
+			kvm_requeue_exception_e(vcpu, vector, err, false);
 
 		} else
-			kvm_requeue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector, false);
 		break;
 	case SVM_EXITINTINFO_TYPE_INTR:
 		kvm_queue_interrupt(vcpu, vector, false);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 67fd4a56d031..518e68ee5a0d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1901,6 +1901,8 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 				event_data = vcpu->arch.guest_fpu.xfd_err;
 
 			vmcs_write64(INJECTED_EVENT_DATA, event_data);
+
+			intr_info |= ex->nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;
 		}
 	}
 
@@ -2851,6 +2853,19 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	/* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
 	if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EIO;
+
+	/*
+	 * FRED draft Spec 5.0 Section 9.2:
+	 *
+	 * Any processor that enumerates support for FRED transitions
+	 * will also enumerate VMX nested-exception support.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
+	    !(basic_msr & VMX_BASIC_NESTED_EXCEPTION)) {
+		pr_warn_once("FRED enabled but no VMX nested-exception support\n");
+		if (error_on_inconsistent_vmcs_config)
+			return -EIO;
+	}
 #endif
 
 	/* Require Write-Back (WB) memory type for VMCS accesses. */
@@ -7313,11 +7328,12 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 			}
 		}
 
-		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			u32 err = vmcs_read32(error_code_field);
-			kvm_requeue_exception_e(vcpu, vector, err);
-		} else
-			kvm_requeue_exception(vcpu, vector);
+		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
+						idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
+		else
+			kvm_requeue_exception(vcpu, vector,
+					      idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d190bfc63fc4..51c07730f1b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -645,7 +645,8 @@ static void kvm_leave_nested(struct kvm_vcpu *vcpu)
 
 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		unsigned nr, bool has_error, u32 error_code,
-		bool has_payload, unsigned long payload, bool reinject)
+		bool has_payload, unsigned long payload,
+		bool reinject, bool nested)
 {
 	u32 prev_nr;
 	int class1, class2;
@@ -678,6 +679,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			 */
 			WARN_ON_ONCE(kvm_is_exception_pending(vcpu));
 			vcpu->arch.exception.injected = true;
+			vcpu->arch.exception.nested = nested;
 			if (WARN_ON_ONCE(has_payload)) {
 				/*
 				 * For a reinjected event, KVM delivers its
@@ -727,6 +729,8 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
 	} else {
+		vcpu->arch.exception.nested = true;
+
 		/* replace previous exception with a new one in a hope
 		   that instruction re-execution will regenerate lost
 		   exception */
@@ -736,20 +740,20 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 			   unsigned long payload)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
@@ -757,7 +761,7 @@ static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
 				    u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
-			       true, payload, false);
+			       true, payload, false, false);
 }
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
@@ -829,13 +833,13 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 60da8cbe6759..63e543c6834b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -108,6 +108,7 @@ static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.exception.pending = false;
 	vcpu->arch.exception.injected = false;
+	vcpu->arch.exception.nested = false;
 	vcpu->arch.exception_vmexit.pending = false;
 }
 
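The mask change in the vmx.h hunk and the new line in vmx_inject_exception() can be checked with a small standalone sketch. It assumes the hardware-exception type encoding (3 << 8) from the existing kernel header. `build_exception_intr_info()` is a hypothetical helper that only mirrors the way the patch ORs bit 13 into the injected word; it is not the kernel function, and the `RESVD_BITS_MASK_OLD`/`RESVD_BITS_MASK_NEW` names are introduced here for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_DELIVER_CODE_MASK      0x800u      /* 11 */
#define INTR_INFO_NESTED_EXCEPTION_MASK  0x2000u     /* 13 */
#define INTR_INFO_VALID_MASK             0x80000000u /* 31 */
#define INTR_TYPE_HARD_EXCEPTION         (3u << 8)

/* Reserved-bit masks before and after the patch. */
#define RESVD_BITS_MASK_OLD  0x7ffff000u
#define RESVD_BITS_MASK_NEW  0x7fffd000u

/* Un-reserving bit 13 is the only difference between the two masks. */
static_assert((RESVD_BITS_MASK_OLD & ~INTR_INFO_NESTED_EXCEPTION_MASK) ==
	      RESVD_BITS_MASK_NEW,
	      "new reserved mask must equal the old one minus bit 13");

/* Hypothetical helper: compose an injection word for a hardware exception. */
static inline uint32_t build_exception_intr_info(uint8_t vector,
						 bool has_error_code,
						 bool nested)
{
	uint32_t intr_info = vector | INTR_TYPE_HARD_EXCEPTION |
			     INTR_INFO_VALID_MASK;

	if (has_error_code)
		intr_info |= INTR_INFO_DELIVER_CODE_MASK;

	/* Same shape as the line the patch adds to vmx_inject_exception(). */
	intr_info |= nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;

	return intr_info;
}
```

A nested #PF with an error code composes to 0x80002b0e. That value would have tripped the old reserved-bits mask (bit 13 set) but passes the new one.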