mbox series

[v1,00/23] Enable FRED with KVM VMX

Message ID 20231108183003.5981-1-xin3.li@intel.com
Headers show
Series Enable FRED with KVM VMX | expand

Message

Li, Xin3 Nov. 8, 2023, 6:29 p.m. UTC
This patch set enables the Intel flexible return and event delivery
(FRED) architecture with KVM VMX to allow guests to utilize FRED.

The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:

1) Improve overall performance and response time by replacing event
   delivery through the interrupt descriptor table (IDT event
   delivery) and event return by the IRET instruction with lower
   latency transitions.

2) Improve software robustness by ensuring that event delivery
   establishes the full supervisor context and that event return
   establishes the full user context.

The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0. One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.

Intel VMX architecture is extended to run FRED guests, and the changes
are majorly:

1) New VMCS fields for FRED context management, which includes two new
event data VMCS fields, eight new guest FRED context VMCS fields and
eight new host FRED context VMCS fields.

2) VMX nested-Exception support for proper virtualization of stack
levels introduced with FRED architecture.

Search for the latest FRED spec in most search engines with this search pattern:

  site:intel.com FRED (flexible return and event delivery) specification

We want to send out the FRED VMX patch set for review while the FRED
native patch set v12 is being reviewed @
https://lkml.kernel.org/kvm/20231003062458.23552-1-xin3.li@intel.com/.
For easier review, I have set up a base tree with the latest FRED native
patch set on top of tip tree in the 'fred_v12' branch of repo
    https://github.com/xinli-intel/linux-fred-public.git.

Patch 1-2 are cleanups to VMX basic and misc MSRs, which were sent
out earlier as a preparation for FRED changes:
https://lore.kernel.org/kvm/20231030233940.438233-1-xin@zytor.com/.

Patch 3-14 add FRED support to VMX.
Patch 15-18 add FRED support to nested VMX.
Patch 19 exposes FRED to KVM guests to complete the enabling.
Patch 20-23 adds FRED selftests.


Shan Kang (1):
  KVM: selftests: Add fred exception tests

Xin Li (22):
  KVM: VMX: Cleanup VMX basic information defines and usages
  KVM: VMX: Cleanup VMX misc information defines and usages
  KVM: VMX: Add support for the secondary VM exit controls
  KVM: x86: Mark CR4.FRED as not reserved
  KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
  KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
  KVM: VMX: Disable intercepting FRED MSRs
  KVM: VMX: Initialize VMCS FRED fields
  KVM: VMX: Switch FRED RSP0 between host and guest
  KVM: VMX: Add support for FRED context save/restore
  KVM: x86: Add kvm_is_fred_enabled()
  KVM: VMX: Handle FRED event data
  KVM: VMX: Handle VMX nested exception for FRED
  KVM: VMX: Dump FRED context in dump_vmcs()
  KVM: nVMX: Add support for the secondary VM exit controls
  KVM: nVMX: Add FRED VMCS fields
  KVM: nVMX: Add support for VMX FRED controls
  KVM: nVMX: Add VMCS FRED states checking
  KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests
  KVM: selftests: Add FRED VMCS fields to evmcs
  KVM: selftests: Run debug_regs test with FRED enabled
  KVM: selftests: Add a new VM guest mode to run user level code

 Documentation/virt/kvm/x86/nested-vmx.rst     |  19 +
 arch/x86/include/asm/hyperv-tlfs.h            |  19 +
 arch/x86/include/asm/kvm_host.h               |   9 +-
 arch/x86/include/asm/msr-index.h              |  15 +-
 arch/x86/include/asm/vmx.h                    |  57 ++-
 arch/x86/kvm/cpuid.c                          |   4 +-
 arch/x86/kvm/kvm_cache_regs.h                 |  10 +
 arch/x86/kvm/svm/svm.c                        |   4 +-
 arch/x86/kvm/vmx/capabilities.h               |  20 +-
 arch/x86/kvm/vmx/hyperv.c                     |  61 ++-
 arch/x86/kvm/vmx/nested.c                     | 315 ++++++++++++--
 arch/x86/kvm/vmx/nested.h                     |   2 +-
 arch/x86/kvm/vmx/vmcs.h                       |   1 +
 arch/x86/kvm/vmx/vmcs12.c                     |  19 +
 arch/x86/kvm/vmx/vmcs12.h                     |  38 ++
 arch/x86/kvm/vmx/vmcs_shadow_fields.h         |   6 +-
 arch/x86/kvm/vmx/vmx.c                        | 404 ++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h                        |  14 +-
 arch/x86/kvm/x86.c                            |  55 ++-
 arch/x86/kvm/x86.h                            |   5 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |   1 +
 .../selftests/kvm/include/x86_64/evmcs.h      | 146 +++++++
 .../selftests/kvm/include/x86_64/processor.h  |  33 ++
 .../selftests/kvm/include/x86_64/vmx.h        |  20 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   5 +-
 .../selftests/kvm/lib/x86_64/processor.c      |  15 +-
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   4 +-
 .../testing/selftests/kvm/x86_64/debug_regs.c |  50 ++-
 .../testing/selftests/kvm/x86_64/fred_test.c  | 262 ++++++++++++
 30 files changed, 1464 insertions(+), 150 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c


base-commit: d49b86c24e836941c85c4906e9519fca9426a6e0

Comments

Li, Xin3 Nov. 9, 2023, 11:50 p.m. UTC | #1
> >+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> >+{
> >+	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >+
> >+	if (!cpu_feature_enabled(X86_FEATURE_FRED) ||
> >+	    !guest_cpuid_has(vcpu, X86_FEATURE_FRED))
> >+		return;
> >+
> >+	/* Enable loading guest FRED MSRs from VMCS */
> >+	vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
> >+
> >+	/*
> >+	 * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs
> >+	 * from VMCS.
> >+	 */
> >+	vm_exit_controls_setbit(vmx,
> VM_EXIT_ACTIVATE_SECONDARY_CONTROLS);
> >+	secondary_vm_exit_controls_setbit(vmx,
> >+					  SECONDARY_VM_EXIT_SAVE_IA32_FRED
> |
> >+
> SECONDARY_VM_EXIT_LOAD_IA32_FRED);
> 
> all above vmcs controls need to be cleared if guest doesn't enumerate FRED, see
> 
> https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/

Good point, the user space could set cpuid multiple times...
 
> Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when
> new bits are added to secondary vmcs controls. Why not keep
> VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or you
> see any perf impact?

I think it from the other way, why keeps hw loading it on every vmentry
even if it's not used by a guest?

Different CPUs may implement it in different ways, which we can't assume.

Other features needing it should set it separately, say with a refcount.
Li, Xin3 Nov. 10, 2023, 12:04 a.m. UTC | #2
> > >+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
> > >+	    !(_vmentry_control & VM_ENTRY_LOAD_IA32_FRED)) {
> > >+		pr_warn_once("FRED enabled but no VMX VM-Entry
> LOAD_IA32_FRED control: %x\n",
> > >+			     _vmentry_control);
> >
> > Can we just hide FRED from guests like what KVM does for other
> > features which have similar dependencies? see vmx_set_cpu_caps().
> 
> Both of these warnings should simply be dropped.  The
> error_on_inconsistent_vmcs_config stuff is for inconsistencies within the allowed
> VMCS fields.  Having a feature that is supported in bare metal but not virtualized
> is perfectly legal, if uncommon.

I deliberately keep it, at least for now, because these checks are helpful
during the development of nVMX FRED.  It will be helpful for other VMMs
being developed/tested on KVM.

> What *is* needed is for KVM to refuse to virtualize FRED if the entry/exit controls
> aren't consistent.  E.g. if at least one control is present, and at least one
> control is missing.   I.e. KVM needs a version of vmcs_entry_exit_pairs that can
> deal with SECONDAY_VM_EXIT controls.

I agree there are better ways.  But maybe after or before VMX FRED.

>  I'll circle back to this when I give the
> series a proper review, which is going to be 3+ weeks.

The traffic in KVM mailing list is surprisingly high recently.  So that is
totally expected.
Sean Christopherson Nov. 10, 2023, 12:18 a.m. UTC | #3
On Thu, Nov 09, 2023, Xin3 Li wrote:
> > >+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >+{
> > >+	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > >+
> > >+	if (!cpu_feature_enabled(X86_FEATURE_FRED) ||
> > >+	    !guest_cpuid_has(vcpu, X86_FEATURE_FRED))
> > >+		return;
> > >+
> > >+	/* Enable loading guest FRED MSRs from VMCS */
> > >+	vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
> > >+
> > >+	/*
> > >+	 * Enable saving guest FRED MSRs into VMCS and loading host FRED MSRs
> > >+	 * from VMCS.
> > >+	 */
> > >+	vm_exit_controls_setbit(vmx,
> > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS);
> > >+	secondary_vm_exit_controls_setbit(vmx,
> > >+					  SECONDARY_VM_EXIT_SAVE_IA32_FRED
> > |
> > >+
> > SECONDARY_VM_EXIT_LOAD_IA32_FRED);
> > 
> > all above vmcs controls need to be cleared if guest doesn't enumerate FRED, see
> > 
> > https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/
> 
> Good point, the user space could set cpuid multiple times...
>  
> > Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when
> > new bits are added to secondary vmcs controls. Why not keep
> > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or you
> > see any perf impact?
> 
> I think it from the other way, why keeps hw loading it on every vmentry
> even if it's not used by a guest?

Oh, yeesh, this is clearing the activation control.  Yeah, NAK to that, just
leave it set.  If it weren't for the fact that there is apparently a metrict ton
of FRED state (I thought the whole point of FRED was to kill off legacy crap like
CPL1 and CPL2 stacks?) _and_ that KVM needs to toggle MSR intercepts, I'd probably
push back on toggling even the FRED controls.

> Different CPUs may implement it in different ways, which we can't assume.

Implement what in a different way?  The VMCS fields and FRED are architectural.
The internal layout of the VMCS is uarch specific, but the encodings and semantics
absolutely cannot change without breaking software.  And if Intel does something
asinine like make a control active-low then we have far, far bigger problems.

> Other features needing it should set it separately, say with a refcount.

Absolutely not.  Unless Intel screwed up the implementation, the cost of leaving
VM_EXIT_ACTIVATE_SECONDARY_CONTROLS set when it's supported shouldn't even be
measurable.
Sean Christopherson Nov. 10, 2023, 3:01 p.m. UTC | #4
On Fri, Nov 10, 2023, Xin3 Li wrote:
> > > >+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
> > > >+	    !(_vmentry_control & VM_ENTRY_LOAD_IA32_FRED)) {
> > > >+		pr_warn_once("FRED enabled but no VMX VM-Entry
> > LOAD_IA32_FRED control: %x\n",
> > > >+			     _vmentry_control);
> > >
> > > Can we just hide FRED from guests like what KVM does for other
> > > features which have similar dependencies? see vmx_set_cpu_caps().
> > 
> > Both of these warnings should simply be dropped.  The
> > error_on_inconsistent_vmcs_config stuff is for inconsistencies within the allowed
> > VMCS fields.  Having a feature that is supported in bare metal but not virtualized
> > is perfectly legal, if uncommon.
> 
> I deliberately keep it, at least for now, because these checks are helpful
> during the development of nVMX FRED.  It will be helpful for other VMMs
> being developed/tested on KVM.

No, remove it.  It's architecturally legal for a CPU to support a feature in bare
metal but not provide virtualization support.

> > What *is* needed is for KVM to refuse to virtualize FRED if the entry/exit controls
> > aren't consistent.  E.g. if at least one control is present, and at least one
> > control is missing.   I.e. KVM needs a version of vmcs_entry_exit_pairs that can
> > deal with SECONDAY_VM_EXIT controls.
> 
> I agree there are better ways.  But maybe after or before VMX FRED.

Uh, no.  This is not optional.  FRED is the first feature that uses SECONDAY_VM_EXIT
controls, so FRED gets the honor of adding the necessary infrastructure support.
Li, Xin3 Nov. 14, 2023, 2:50 a.m. UTC | #5
> > > Clearing VM_EXIT_ACTIVATE_SECONDARY_CONTROLS may be problematic when
> > > new bits are added to secondary vmcs controls. Why not keep
> > > VM_EXIT_ACTIVATE_SECONDARY_CONTROLS always on if it is supported? or
> > > you see any perf impact?
> >
> > I think it from the other way, why keeps hw loading it on every
> > vmentry even if it's not used by a guest?
> 
> Oh, yeesh, this is clearing the activation control.  Yeah, NAK to that, just leave it
> set.  If it weren't for the fact that there is apparently a metrict ton of FRED state (I
> thought the whole point of FRED was to kill off legacy crap like
> CPL1 and CPL2 stacks?) _and_ that KVM needs to toggle MSR intercepts, I'd
> probably push back on toggling even the FRED controls.

To me, FRED is to _architecturally_ do the right thing for x86 event handling.

I don't think FRED's major purpose is to kill legacy craps, but longer term
it paves the way for that.

Yeah, I would like to discuss whether to toggle FRED controls.

> 
> > Different CPUs may implement it in different ways, which we can't assume.
> 
> Implement what in a different way?  The VMCS fields and FRED are architectural.
> The internal layout of the VMCS is uarch specific, but the encodings and semantics
> absolutely cannot change without breaking software.  And if Intel does something
> asinine like make a control active-low then we have far, far bigger problems.

I should have made it clear that I wasn't talking at the ISA level.  And
of course CPU uarch implementations should be transparent to software.

I mean a CPU uarch could choose to check the activation bit in the VM exit
controls first and then decide whether to load the 2nd VM exit controls.
While if resources allow, a CPU uarch could always load the 2nd VM exit
controls.

BTW, I believe the active-low controls are really gone with new features.
All new controls are all 0s by default.

> > Other features needing it should set it separately, say with a refcount.
> 
> Absolutely not.  Unless Intel screwed up the implementation, the cost of leaving
> VM_EXIT_ACTIVATE_SECONDARY_CONTROLS set when it's supported shouldn't
> even be measurable.

I do hope so.  However, I don't know whether this is guaranteed or not on
all uarch implementations.

A decision to leave it set is good enough for now.

Thanks!
    Xin
Li, Xin3 Nov. 14, 2023, 4:05 a.m. UTC | #6
> > > > Can we just hide FRED from guests like what KVM does for other
> > > > features which have similar dependencies? see vmx_set_cpu_caps().
> > >
> > > Both of these warnings should simply be dropped.  The
> > > error_on_inconsistent_vmcs_config stuff is for inconsistencies within the
> allowed
> > > VMCS fields.  Having a feature that is supported in bare metal but not
> virtualized
> > > is perfectly legal, if uncommon.
> >
> > I deliberately keep it, at least for now, because these checks are helpful
> > during the development of nVMX FRED.  It will be helpful for other VMMs
> > being developed/tested on KVM.
> 
> No, remove it.  It's architecturally legal for a CPU to support a feature in bare
> metal but not provide virtualization support.

Like the stage when native Linux has FRED support while KVM not yet?

> > > What *is* needed is for KVM to refuse to virtualize FRED if the entry/exit
> controls
> > > aren't consistent.  E.g. if at least one control is present, and at least one
> > > control is missing.   I.e. KVM needs a version of vmcs_entry_exit_pairs that can
> > > deal with SECONDAY_VM_EXIT controls.
> >
> > I agree there are better ways.  But maybe after or before VMX FRED.
> 
> Uh, no.  This is not optional.  FRED is the first feature that uses
> SECONDAY_VM_EXIT
> controls, so FRED gets the honor of adding the necessary infrastructure support.

The 2nd VM exit controls is a must for FRED, so I should do it.

I think you mean the consistency checks can be done in a better way (which
is not just for FRED controls consistency checks).  No?

Thanks!
    Xin