Message ID | 20220608144517.251109029@infradead.org |
---|---|
State | Superseded |
Headers | show |
Series | cpuidle,rcu: Cleanup the mess | expand |
On Wed, Jun 8, 2022 at 10:48 PM Peter Zijlstra <peterz@infradead.org> wrote: > > Now that arch_cpu_idle() is expected to return with IRQs disabled, > avoid the useless STI/CLI dance. > > Per the specs this is supposed to work, but nobody has yet relied up > this behaviour so broken implementations are possible. I'm totally newbie here. The point of safe_halt() is that STI must be used and be used directly before HLT to enable IRQ during the halting and stop the halting if there is any IRQ. In TDX case, STI must be used directly before the hypercall. Otherwise, no IRQ can come and the vcpu would be stalled forever. Although the hypercall has an "irq_disabled" argument. But the hypervisor doesn't (and can't) touch the IRQ flags no matter what the "irq_disabled" argument is. The IRQ is not enabled during the halting if the IRQ is disabled before the hypercall even if irq_disabled=false. The "irq_disabled" argument is used for workaround purposes: https://lore.kernel.org/kvm/c020ee0b90c424a7010e979c9b32a28e9c488a51.1651774251.git.isaku.yamahata@intel.com/ Hope my immature/incorrect reply elicits a real response from others. Thanks Lai
On Mon, Jun 13, 2022 at 04:26:01PM +0800, Lai Jiangshan wrote: > On Wed, Jun 8, 2022 at 10:48 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > Now that arch_cpu_idle() is expected to return with IRQs disabled, > > avoid the useless STI/CLI dance. > > > > Per the specs this is supposed to work, but nobody has yet relied up > > this behaviour so broken implementations are possible. > > I'm totally newbie here. > > The point of safe_halt() is that STI must be used and be used > directly before HLT to enable IRQ during the halting and stop > the halting if there is any IRQ. Correct; on real hardware. But this is virt... > In TDX case, STI must be used directly before the hypercall. > Otherwise, no IRQ can come and the vcpu would be stalled forever. > > Although the hypercall has an "irq_disabled" argument. > But the hypervisor doesn't (and can't) touch the IRQ flags no matter > what the "irq_disabled" argument is. The IRQ is not enabled during > the halting if the IRQ is disabled before the hypercall even if > irq_disabled=false. All we need the VMM to do is wake the vCPU, and it can do that, irrespective of the guest's IF. So the VMM can (and does) know if there's an interrupt pending, and that's all that's needed to wake from this hypercall. Once the vCPU is back up and running again, we'll eventually set IF again and the pending interrupt will get delivered and all's well. Think of this like MWAIT with ECX[0] set if you will.
--- a/arch/x86/coco/tdx/tdcall.S +++ b/arch/x86/coco/tdx/tdcall.S @@ -139,19 +139,6 @@ SYM_FUNC_START(__tdx_hypercall) movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx - /* - * For the idle loop STI needs to be called directly before the TDCALL - * that enters idle (EXIT_REASON_HLT case). STI instruction enables - * interrupts only one instruction later. If there is a window between - * STI and the instruction that emulates the HALT state, there is a - * chance for interrupts to happen in this window, which can delay the - * HLT operation indefinitely. Since this is the not the desired - * result, conditionally call STI before TDCALL. - */ - testq $TDX_HCALL_ISSUE_STI, %rsi - jz .Lskip_sti - sti -.Lskip_sti: tdcall /* --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -124,7 +124,7 @@ static u64 get_cc_mask(void) return BIT_ULL(gpa_width - 1); } -static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti) +static u64 __cpuidle __halt(const bool irq_disabled) { struct tdx_hypercall_args args = { .r10 = TDX_HYPERCALL_STANDARD, @@ -144,20 +144,14 @@ static u64 __cpuidle __halt(const bool i * can keep the vCPU in virtual HLT, even if an IRQ is * pending, without hanging/breaking the guest. */ - return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0); + return __tdx_hypercall(&args, 0); } static bool handle_halt(void) { - /* - * Since non safe halt is mainly used in CPU offlining - * and the guest will always stay in the halt state, don't - * call the STI instruction (set do_sti as false). - */ const bool irq_disabled = irqs_disabled(); - const bool do_sti = false; - if (__halt(irq_disabled, do_sti)) + if (__halt(irq_disabled)) return false; return true; @@ -165,22 +159,13 @@ static bool handle_halt(void) void __cpuidle tdx_safe_halt(void) { - /* - * For do_sti=true case, __tdx_hypercall() function enables - * interrupts using the STI instruction before the TDCALL. So - * set irq_disabled as false. - */ const bool irq_disabled = false; - const bool do_sti = true; /* * Use WARN_ONCE() to report the failure. */ - if (__halt(irq_disabled, do_sti)) + if (__halt(irq_disabled)) WARN_ONCE(1, "HLT instruction emulation failed\n"); - - /* XXX I can't make sense of what @do_sti actually does */ - raw_local_irq_disable(); } static bool read_msr(struct pt_regs *regs) --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -8,7 +8,6 @@ #define TDX_HYPERCALL_STANDARD 0 #define TDX_HCALL_HAS_OUTPUT BIT(0) -#define TDX_HCALL_ISSUE_STI BIT(1) #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX "
Now that arch_cpu_idle() is expected to return with IRQs disabled, avoid the useless STI/CLI dance. Per the specs this is supposed to work, but nobody has yet relied up this behaviour so broken implementations are possible. Cc: Isaku Yamahata <isaku.yamahata@gmail.com> Cc: kirill.shutemov@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/coco/tdx/tdcall.S | 13 ------------- arch/x86/coco/tdx/tdx.c | 23 ++++------------------- arch/x86/include/asm/shared/tdx.h | 1 - 3 files changed, 4 insertions(+), 33 deletions(-)