Message ID: 20210531140526.42932-1-krzysztof.kozlowski@canonical.com
Series: x86/kvm: fixes for hibernation
On 31/05/21 16:05, Krzysztof Kozlowski wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> commit 8b79feffeca28c5459458fe78676b081e87c93a4 upstream.
>
> Various PV features (Async PF, PV EOI, steal time) work through memory
> shared with hypervisor and when we restore from hibernation we must
> properly teardown all these features to make sure hypervisor doesn't
> write to stale locations after we jump to the previously hibernated kernel
> (which can try to place anything there). For secondary CPUs the job is
> already done by kvm_cpu_down_prepare(), register syscore ops to do
> the same for boot CPU.
>
> Krzysztof:
> This fixes memory corruption visible after second resume from
> hibernation:
>
> BUG: Bad page state in process dbus-daemon  pfn:18b01
> page:ffffea000062c040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 compound_mapcount: -30591
> flags: 0xfffffc0078141(locked|error|workingset|writeback|head|mappedtodisk|reclaim)
> raw: 000fffffc0078141 dead0000000002d0 dead000000000100 0000000000000000
> raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> bad because of flags: 0x78141(locked|error|workingset|writeback|head|mappedtodisk|reclaim)
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
> [krzysztof: Extend the commit message, adjust for v5.10 context]
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
> ---
>  arch/x86/kernel/kvm.c | 57 +++++++++++++++++++++++++++++++------------
>  1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 7f57ede3cb8e..6af3f9c3956c 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -26,6 +26,7 @@
>  #include <linux/kprobes.h>
>  #include <linux/nmi.h>
>  #include <linux/swait.h>
> +#include <linux/syscore_ops.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -460,6 +461,25 @@ static bool pv_tlb_flush_supported(void)
>  
>  static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
>  
> +static void kvm_guest_cpu_offline(void)
> +{
> +	kvm_disable_steal_time();
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> +		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +	kvm_pv_disable_apf();
> +	apf_task_wake_all();
> +}
> +
> +static int kvm_cpu_online(unsigned int cpu)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	kvm_guest_cpu_init();
> +	local_irq_restore(flags);
> +	return 0;
> +}
> +
>  #ifdef CONFIG_SMP
>  
>  static bool pv_ipi_supported(void)
> @@ -587,31 +607,34 @@ static void __init kvm_smp_prepare_boot_cpu(void)
>  	kvm_spinlock_init();
>  }
>  
> -static void kvm_guest_cpu_offline(void)
> +static int kvm_cpu_down_prepare(unsigned int cpu)
>  {
> -	kvm_disable_steal_time();
> -	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> -		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> -	kvm_pv_disable_apf();
> -	apf_task_wake_all();
> -}
> +	unsigned long flags;
>  
> -static int kvm_cpu_online(unsigned int cpu)
> -{
> -	local_irq_disable();
> -	kvm_guest_cpu_init();
> -	local_irq_enable();
> +	local_irq_save(flags);
> +	kvm_guest_cpu_offline();
> +	local_irq_restore(flags);
>  	return 0;
>  }
>  
> -static int kvm_cpu_down_prepare(unsigned int cpu)
> +#endif
> +
> +static int kvm_suspend(void)
>  {
> -	local_irq_disable();
>  	kvm_guest_cpu_offline();
> -	local_irq_enable();
> +
>  	return 0;
>  }
> -#endif
> +
> +static void kvm_resume(void)
> +{
> +	kvm_cpu_online(raw_smp_processor_id());
> +}
> +
> +static struct syscore_ops kvm_syscore_ops = {
> +	.suspend	= kvm_suspend,
> +	.resume		= kvm_resume,
> +};
>  
>  static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  			const struct flush_tlb_info *info)
> @@ -681,6 +704,8 @@ static void __init kvm_guest_init(void)
>  	kvm_guest_cpu_init();
>  #endif
>  
> +	register_syscore_ops(&kvm_syscore_ops);
> +
>  	/*
>  	 * Hard lockup detection is enabled by default. Disable it, as guests
>  	 * can get false positives too easily, for example if the host is

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On 31/05/21 16:05, Krzysztof Kozlowski wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> commit 3d6b84132d2a57b5a74100f6923a8feb679ac2ce upstream.
>
> Crash shutdown handler only disables kvmclock and steal time, other PV
> features remain active so we risk corrupting memory or getting some
> side-effects in kdump kernel. Move crash handler to kvm.c and unify
> with CPU offline.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
> ---
>  arch/x86/include/asm/kvm_para.h |  6 -----
>  arch/x86/kernel/kvm.c           | 44 ++++++++++++++++++++++++---------
>  arch/x86/kernel/kvmclock.c      | 21 ----------------
>  3 files changed, 32 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9c56e0defd45..69299878b200 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -92,7 +92,6 @@ unsigned int kvm_arch_para_hints(void);
>  void kvm_async_pf_task_wait_schedule(u32 token);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_apf_flags(void);
> -void kvm_disable_steal_time(void);
>  bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token);
>  
>  DECLARE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> @@ -137,11 +136,6 @@ static inline u32 kvm_read_and_reset_apf_flags(void)
>  	return 0;
>  }
>  
> -static inline void kvm_disable_steal_time(void)
> -{
> -	return;
> -}
> -
>  static __always_inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
>  {
>  	return false;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index be1c42e663c6..7462b79c39de 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -38,6 +38,7 @@
>  #include <asm/tlb.h>
>  #include <asm/cpuidle_haltpoll.h>
>  #include <asm/ptrace.h>
> +#include <asm/reboot.h>
>  #include <asm/svm.h>
>  
>  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> @@ -375,6 +376,14 @@ static void kvm_pv_disable_apf(void)
>  	pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
>  }
>  
> +static void kvm_disable_steal_time(void)
> +{
> +	if (!has_steal_clock)
> +		return;
> +
> +	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
> +}
> +
>  static void kvm_pv_guest_cpu_reboot(void *unused)
>  {
>  	/*
> @@ -417,14 +426,6 @@ static u64 kvm_steal_clock(int cpu)
>  	return steal;
>  }
>  
> -void kvm_disable_steal_time(void)
> -{
> -	if (!has_steal_clock)
> -		return;
> -
> -	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
> -}
> -
>  static inline void __set_percpu_decrypted(void *ptr, unsigned long size)
>  {
>  	early_set_memory_decrypted((unsigned long) ptr, size);
> @@ -461,13 +462,14 @@ static bool pv_tlb_flush_supported(void)
>  
>  static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
>  
> -static void kvm_guest_cpu_offline(void)
> +static void kvm_guest_cpu_offline(bool shutdown)
>  {
>  	kvm_disable_steal_time();
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
>  	kvm_pv_disable_apf();
> -	apf_task_wake_all();
> +	if (!shutdown)
> +		apf_task_wake_all();
>  	kvmclock_disable();
>  }
>  
> @@ -613,7 +615,7 @@ static int kvm_cpu_down_prepare(unsigned int cpu)
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> -	kvm_guest_cpu_offline();
> +	kvm_guest_cpu_offline(false);
>  	local_irq_restore(flags);
>  	return 0;
>  }
> @@ -622,7 +624,7 @@ static int kvm_cpu_down_prepare(unsigned int cpu)
>  
>  static int kvm_suspend(void)
>  {
> -	kvm_guest_cpu_offline();
> +	kvm_guest_cpu_offline(false);
>  
>  	return 0;
>  }
> @@ -637,6 +639,20 @@ static struct syscore_ops kvm_syscore_ops = {
>  	.resume		= kvm_resume,
>  };
>  
> +/*
> + * After a PV feature is registered, the host will keep writing to the
> + * registered memory location. If the guest happens to shutdown, this memory
> + * won't be valid. In cases like kexec, in which you install a new kernel, this
> + * means a random memory location will be kept being written.
> + */
> +#ifdef CONFIG_KEXEC_CORE
> +static void kvm_crash_shutdown(struct pt_regs *regs)
> +{
> +	kvm_guest_cpu_offline(true);
> +	native_machine_crash_shutdown(regs);
> +}
> +#endif
> +
>  static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  			const struct flush_tlb_info *info)
>  {
> @@ -705,6 +721,10 @@ static void __init kvm_guest_init(void)
>  	kvm_guest_cpu_init();
>  #endif
>  
> +#ifdef CONFIG_KEXEC_CORE
> +	machine_ops.crash_shutdown = kvm_crash_shutdown;
> +#endif
> +
>  	register_syscore_ops(&kvm_syscore_ops);
>  
>  	/*
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 327a0de01066..c4ac26333bc4 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -20,7 +20,6 @@
>  #include <asm/hypervisor.h>
>  #include <asm/mem_encrypt.h>
>  #include <asm/x86_init.h>
> -#include <asm/reboot.h>
>  #include <asm/kvmclock.h>
>  
>  static int kvmclock __initdata = 1;
> @@ -204,23 +203,6 @@ static void kvm_setup_secondary_clock(void)
>  }
>  #endif
>  
> -/*
> - * After the clock is registered, the host will keep writing to the
> - * registered memory location. If the guest happens to shutdown, this memory
> - * won't be valid. In cases like kexec, in which you install a new kernel, this
> - * means a random memory location will be kept being written. So before any
> - * kind of shutdown from our side, we unregister the clock by writing anything
> - * that does not have the 'enable' bit set in the msr
> - */
> -#ifdef CONFIG_KEXEC_CORE
> -static void kvm_crash_shutdown(struct pt_regs *regs)
> -{
> -	native_write_msr(msr_kvm_system_time, 0, 0);
> -	kvm_disable_steal_time();
> -	native_machine_crash_shutdown(regs);
> -}
> -#endif
> -
>  void kvmclock_disable(void)
>  {
>  	native_write_msr(msr_kvm_system_time, 0, 0);
> @@ -350,9 +332,6 @@ void __init kvmclock_init(void)
>  #endif
>  	x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
>  	x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
> -#ifdef CONFIG_KEXEC_CORE
> -	machine_ops.crash_shutdown = kvm_crash_shutdown;
> -#endif
>  	kvm_get_preset_lpj();
>  
>  	/*

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On Mon, May 31, 2021 at 04:05:23PM +0200, Krzysztof Kozlowski wrote:
> Hi,
>
> This is version 2 of a backport for v5.10.
>
> Changes since v2:
> 1. Rebased v3 of v5.4 version, I kept numbering to be consistent.
> 2. The context in patch 1/3 had to be adjusted.

I also need a 5.12.y version of this series before I can apply these to
older kernel releases.

thanks,

greg k-h
On 01/06/2021 08:40, Greg KH wrote:
> On Mon, May 31, 2021 at 04:05:23PM +0200, Krzysztof Kozlowski wrote:
>> Hi,
>>
>> This is version 2 of a backport for v5.10.
>>
>> Changes since v2:
>> 1. Rebased v3 of v5.4 version, I kept numbering to be consistent.
>> 2. The context in patch 1/3 had to be adjusted.
>
> I also need a 5.12.y version of this series before I can apply these to
> older kernel releases.

A cherry-pick from v5.10 should work, although it may fail to apply due to
context differences. I'll send a v5.12 port shortly.

Best regards,
Krzysztof