Message ID | 1803209.Mvru99baaF@kreacher
---|---
State | New
Series | x86: PM: Register syscore_ops for scale invariance
On Fri, 2021-01-08 at 19:05 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> On x86, scale invariance tends to be disabled during resume from
> suspend-to-RAM, because the MPERF or APERF MSR values are not as
> expected then due to updates taking place after the platform
> firmware has been invoked to complete the suspend transition.
>
> That, of course, is not desirable, especially if the schedutil
> scaling governor is in use, because the lack of scale invariance
> causes it to be less reliable.
>
> To counter that effect, modify init_freq_invariance() to register
> a syscore_ops object for scale invariance with the ->resume callback
> pointing to init_counter_refs(), which will run on the CPU starting
> the resume transition (the other CPUs will be taken care of by the
> "online" operations taking place later).
>
> Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> <snip>

Thanks for writing this, Rafael. Peter Zijlstra asked me to fix this
problem months ago; I started but got stuck and never finished.

Giovanni Gherdovich
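For readers not familiar with the mechanism under discussion, below is a minimal,
self-contained sketch of the syscore_ops pattern the patch relies on. The
struct syscore_ops layout and register_syscore_ops()/unregister_syscore_ops()
are the real API from <linux/syscore_ops.h>; the "example_*" module and
callback names are hypothetical and only illustrate the registration pattern,
not the actual patch.

/*
 * Illustrative syscore_ops registration (hypothetical module).
 * The ->resume callback runs on the CPU leading the resume transition,
 * with interrupts disabled, before the other CPUs are brought back online.
 */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/syscore_ops.h>

static void example_syscore_resume(void)
{
	/* Refresh whatever state the suspend/resume cycle may have clobbered. */
	pr_info("example: refreshing state on the resuming CPU\n");
}

static struct syscore_ops example_syscore_ops = {
	.resume = example_syscore_resume,
};

static int __init example_init(void)
{
	register_syscore_ops(&example_syscore_ops);
	return 0;
}

static void __exit example_exit(void)
{
	unregister_syscore_ops(&example_syscore_ops);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Illustrative syscore_ops registration");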
On Fri, Jan 08, 2021 at 07:05:59PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> On x86, scale invariance tends to be disabled during resume from
> suspend-to-RAM, because the MPERF or APERF MSR values are not as
> expected then due to updates taking place after the platform
> firmware has been invoked to complete the suspend transition.
>
> That, of course, is not desirable, especially if the schedutil
> scaling governor is in use, because the lack of scale invariance
> causes it to be less reliable.
>
> To counter that effect, modify init_freq_invariance() to register
> a syscore_ops object for scale invariance with the ->resume callback
> pointing to init_counter_refs(), which will run on the CPU starting
> the resume transition (the other CPUs will be taken care of by the
> "online" operations taking place later).
>
> Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks! I'll take it through the sched/urgent tree?
On Tue, Jan 12, 2021 at 4:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jan 08, 2021 at 07:05:59PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > [...]
> >
> > Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Thanks! I'll take it through the sched/urgent tree?

That works, thanks!
On Tue, Jan 12, 2021 at 4:10 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jan 12, 2021 at 4:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jan 08, 2021 at 07:05:59PM +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > [...]
> >
> > Thanks! I'll take it through the sched/urgent tree?
>
> That works, thanks!

Any news on this front? It's been a few days ...
On Tue, Jan 19, 2021 at 04:12:20PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 12, 2021 at 4:10 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Jan 12, 2021 at 4:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Jan 08, 2021 at 07:05:59PM +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > [...]
> > >
> > > Thanks! I'll take it through the sched/urgent tree?
> >
> > That works, thanks!
>
> Any news on this front? It's been a few days ...

My bad, it's been held up behind me trying to fix another sched regression.
Lemme push out just this one so it doesn't go walk-about.
Index: linux-pm/arch/x86/kernel/smpboot.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/smpboot.c
+++ linux-pm/arch/x86/kernel/smpboot.c
@@ -56,6 +56,7 @@
 #include <linux/numa.h>
 #include <linux/pgtable.h>
 #include <linux/overflow.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/acpi.h>
 #include <asm/desc.h>
@@ -2083,6 +2084,23 @@ static void init_counter_refs(void)
 	this_cpu_write(arch_prev_mperf, mperf);
 }
 
+#ifdef CONFIG_PM_SLEEP
+static struct syscore_ops freq_invariance_syscore_ops = {
+	.resume = init_counter_refs,
+};
+
+static void register_freq_invariance_syscore_ops(void)
+{
+	/* Bail out if registered already. */
+	if (freq_invariance_syscore_ops.node.prev)
+		return;
+
+	register_syscore_ops(&freq_invariance_syscore_ops);
+}
+#else
+static inline void register_freq_invariance_syscore_ops(void) {}
+#endif
+
 static void init_freq_invariance(bool secondary, bool cppc_ready)
 {
 	bool ret = false;
@@ -2109,6 +2127,7 @@ static void init_freq_invariance(bool se
 	if (ret) {
 		init_counter_refs();
 		static_branch_enable(&arch_scale_freq_key);
+		register_freq_invariance_syscore_ops();
 		pr_info("Estimated ratio of average max frequency by base frequency (times 1024): %llu\n", arch_max_freq_ratio);
 	} else {
 		pr_debug("Couldn't determine max cpu frequency, necessary for scale-invariant accounting.\n");
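A note on the design: register_freq_invariance_syscore_ops() checks
freq_invariance_syscore_ops.node.prev and returns early if it is already set.
The list pointers of the static object stay NULL until register_syscore_ops()
links it into the syscore list, so the check makes repeated calls harmless.
That appears to matter here because init_freq_invariance() (note its
secondary/cppc_ready parameters) can be reached more than once, while the
syscore_ops object must only be registered a single time.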