mbox series

[v1,0/2] cpufreq: ACPI: Address performance regression related to scale-invariance

Message ID 13690581.X0sz4iL7V8@kreacher
Headers show
Series cpufreq: ACPI: Address performance regression related to scale-invariance | expand

Message

Rafael J. Wysocki Feb. 4, 2021, 5:23 p.m. UTC
Hi All,

These 2 patches address a performance regression related to scale-invariance
found by Michael and analyzed by Giovanni (see the patch from Giovanni at
https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/).

Patch [1/2] is a replacement for the one mentioned above (it was posted without
a changelog and tested somewhat) and patch [2/2] takes care of systems without
CPPC on top of that.

Please see the patch changelogs for details and let me know if you have any
concerns.

Thanks!

Comments

Giovanni Gherdovich Feb. 5, 2021, 7:06 a.m. UTC | #1
On Thu, 2021-02-04 at 18:25 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> A severe performance regression on AMD EPYC processors when using
> the schedutil scaling governor was discovered by Phoronix.com and
> attributed to the following commits:
> 
>     41ea667227ba ("x86, sched: Calculate frequency invariance for
>     AMD systems")
> 
>     976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P
>     for frequency invariance on AMD EPYC")
> 
> [snip]
> 
> Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems")
> Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
> Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
> Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1
> Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
> Reported-by: Michael Larabel <Michael@phoronix.com>
> Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/acpi-cpufreq.c |  107 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 95 insertions(+), 12 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
> +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
> [...]

Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>

Note there is also the Tested-by: Michael, from the other thread
https://lore.kernel.org/lkml/5ea06dbe-255c-3d22-b9bd-6e627c5f94af@phoronix.com/

I tested this patch with the "NASA Parallel Benchmarks" from [link below], the
results confirms that the 5.10 performance is recovered:


Ratios of completion times, lower is better (5.10 is the baseline)

                              5.10     5.11-rc6 5.11-rc6-ggherdov 5.11-rc6-rafael
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Integer Sort              1.00        1.21        0.91           0.93
    Embarrassingly Parallel   1.00        1.60        1.00           1.00
    Discrete FFT              1.00        1.68        0.67           0.67


    CPU     : MODEL            : 2x AMD EPYC 7742
              FREQUENCY TABLE  : P2: 1.50 GHz
                                 P1: 2.00 GHz
                                 P0: 2.25 GHz
              MAX BOOST        :     3.40 GHz

[link] https://www.nas.nasa.gov/publications/npb.html

Thanks,
Giovanni
Giovanni Gherdovich Feb. 5, 2021, 7:07 a.m. UTC | #2
On Thu, 2021-02-04 at 18:34 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> 

> If the maximum performance level taken for computing the

> arch_max_freq_ratio value used in the x86 scale-invariance code is

> higher than the one corresponding to the cpuinfo.max_freq value

> coming from the acpi_cpufreq driver, the scale-invariant utilization

> falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly

> faster, which causes the schedutil governor to select a frequency

> below cpuinfo.max_freq.  That frequency corresponds to a frequency

> table entry below the maximum performance level necessary to get to

> the "boost" range of CPU frequencies which prevents "boost"

> frequencies from being used in some workloads.

> 

> While this issue is related to scale-invariance, it may be amplified

> by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as

> default with intel_pstate") from the 5.10 development cycle which

> made it extremely easy to default to schedutil even if the preferred

> driver is acpi_cpufreq as long as intel_pstate is built too, because

> the mere presence of the latter effectively removes the ondemand

> governor from the defaults.  Distro kernels are likely to include

> both intel_pstate and acpi_cpufreq on x86, so their users who cannot

> use intel_pstate or choose to use acpi_cpufreq may easily be

> affectecd by this issue.

> 

> If CPPC is available, it can be used to address this issue by

> extending the frequency tables created by acpi_cpufreq to cover the

> entire available frequency range (including "boost" frequencies) for

> each CPU, but if CPPC is not there, acpi_cpufreq has no idea what

> the maximum "boost" frequency is and the frequency tables created by

> it cannot be extended in a meaningful way, so in that case make it

> ask the arch scale-invariance code to to use the "nominal" performance

> level for CPU utilization scaling in order to avoid the issue at hand.

> 

> Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate")

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---

>  arch/x86/kernel/smpboot.c      |    1 +

>  drivers/cpufreq/acpi-cpufreq.c |    8 ++++++++

>  2 files changed, 9 insertions(+)

> 

> Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c

> ===================================================================

> --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c

> +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c

> @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct

>  		state_count++;

>  		valid_states++;

>  		data->first_perf_state = valid_states;

> +	} else {

> +		/*

> +		 * If the maximum "boost" frequency is unknown, ask the arch

> +		 * scale-invariance code to use the "nominal" performance for

> +		 * CPU utilization scaling so as to prevent the schedutil

> +		 * governor from selecting inadequate CPU frequencies.

> +		 */

> +		arch_set_max_freq_ratio(true);

>  	}

>  

>  	freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL);

> Index: linux-pm/arch/x86/kernel/smpboot.c

> ===================================================================

> --- linux-pm.orig/arch/x86/kernel/smpboot.c

> +++ linux-pm/arch/x86/kernel/smpboot.c

> @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_

>  	arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE :

>  					arch_turbo_freq_ratio;

>  }

> +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio);

>  

>  static bool turbo_disabled(void)

>  {


Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>


Thanks,
Giovanni
Peter Zijlstra Feb. 5, 2021, 11:56 a.m. UTC | #3
On Thu, Feb 04, 2021 at 06:34:32PM +0100, Rafael J. Wysocki wrote:

>  arch/x86/kernel/smpboot.c      |    1 +
>  drivers/cpufreq/acpi-cpufreq.c |    8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
> +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
> @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct
>  		state_count++;
>  		valid_states++;
>  		data->first_perf_state = valid_states;
> +	} else {
> +		/*
> +		 * If the maximum "boost" frequency is unknown, ask the arch
> +		 * scale-invariance code to use the "nominal" performance for
> +		 * CPU utilization scaling so as to prevent the schedutil
> +		 * governor from selecting inadequate CPU frequencies.
> +		 */
> +		arch_set_max_freq_ratio(true);
>  	}
>  
>  	freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL);
> Index: linux-pm/arch/x86/kernel/smpboot.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/smpboot.c
> +++ linux-pm/arch/x86/kernel/smpboot.c
> @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_
>  	arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE :
>  					arch_turbo_freq_ratio;
>  }
> +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio);

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Rafael J. Wysocki Feb. 5, 2021, 12:23 p.m. UTC | #4
On Fri, Feb 5, 2021 at 12:59 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Feb 04, 2021 at 06:34:32PM +0100, Rafael J. Wysocki wrote:
>
> >  arch/x86/kernel/smpboot.c      |    1 +
> >  drivers/cpufreq/acpi-cpufreq.c |    8 ++++++++
> >  2 files changed, 9 insertions(+)
> >
> > Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
> > +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
> > @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct
> >               state_count++;
> >               valid_states++;
> >               data->first_perf_state = valid_states;
> > +     } else {
> > +             /*
> > +              * If the maximum "boost" frequency is unknown, ask the arch
> > +              * scale-invariance code to use the "nominal" performance for
> > +              * CPU utilization scaling so as to prevent the schedutil
> > +              * governor from selecting inadequate CPU frequencies.
> > +              */
> > +             arch_set_max_freq_ratio(true);
> >       }
> >
> >       freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL);
> > Index: linux-pm/arch/x86/kernel/smpboot.c
> > ===================================================================
> > --- linux-pm.orig/arch/x86/kernel/smpboot.c
> > +++ linux-pm/arch/x86/kernel/smpboot.c
> > @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_
> >       arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE :
> >                                       arch_turbo_freq_ratio;
> >  }
> > +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio);
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks, I'm queuing up this lot for a post-rc7 late push.