diff mbox series

[RFT,v1,8/8] cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache

Message ID 47159248.fMDQidcC6G@rjwysocki.net
State New
Series cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT

Commit Message

Rafael J. Wysocki April 16, 2025, 6:12 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

On some hybrid platforms some efficient CPUs (E-cores) are not connected
to the L3 cache, but there are no other differences between them and the
other E-cores that use L3.  In that case, it is generally more efficient
to run "light" workloads on the E-cores that do not use L3 and allow all
of the cores using L3, including P-cores, to go into idle states.

For this reason, slightly increase the cost for all CPUs sharing the L3
cache to make EAS prefer CPUs that do not use it to the other CPUs with
the same perf-to-frequency scaling factor (if any).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Christian Loehle April 25, 2025, 9:32 p.m. UTC | #1
On 4/16/25 19:12, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> On some hybrid platforms some efficient CPUs (E-cores) are not connected
> to the L3 cache, but there are no other differences between them and the
> other E-cores that use L3.  In that case, it is generally more efficient
> to run "light" workloads on the E-cores that do not use L3 and allow all
> of the cores using L3, including P-cores, to go into idle states.
> 
> For this reason, slightly increase the cost for all CPUs sharing the L3
> cache to make EAS prefer CPUs that do not use it to the other CPUs with
> the same perf-to-frequency scaling factor (if any).
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -979,6 +979,7 @@
>  			   unsigned long *cost)
>  {
>  	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> +	struct cpu_cacheinfo *cacheinfo = get_cpu_cacheinfo(dev->id);
>  
>  	/*
>  	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
> @@ -991,6 +992,13 @@
>  	 * of the same type in different "utilization bins" is different.
>  	 */
>  	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> +	/*
> +	 * Inrease the cost slightly for CPUs able to access L3 to avoid litting

s/Inrease/Increase
and I guess s/litting/littering

> +	 * it up too eagerly in case some other CPUs of the same type cannot
> +	 * access it.
> +	 */
> +	if (cacheinfo->num_levels >= 3)
> +		(*cost)++;

This makes cost(OPP1) of the SoC Tile e-core as expensive as cost(OPP0) of a
normal e-core.
Is that the intended behaviour?

Rafael J. Wysocki April 25, 2025, 9:39 p.m. UTC | #2
On Fri, Apr 25, 2025 at 11:32 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 4/16/25 19:12, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > On some hybrid platforms some efficient CPUs (E-cores) are not connected
> > to the L3 cache, but there are no other differences between them and the
> > other E-cores that use L3.  In that case, it is generally more efficient
> > to run "light" workloads on the E-cores that do not use L3 and allow all
> > of the cores using L3, including P-cores, to go into idle states.
> >
> > For this reason, slightly increase the cost for all CPUs sharing the L3
> > cache to make EAS prefer CPUs that do not use it to the other CPUs with
> > the same perf-to-frequency scaling factor (if any).
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/cpufreq/intel_pstate.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -979,6 +979,7 @@
> >                          unsigned long *cost)
> >  {
> >       struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> > +     struct cpu_cacheinfo *cacheinfo = get_cpu_cacheinfo(dev->id);
> >
> >       /*
> >        * The smaller the perf-to-frequency scaling factor, the larger the IPC
> > @@ -991,6 +992,13 @@
> >        * of the same type in different "utilization bins" is different.
> >        */
> >       *cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> > +     /*
> > +      * Inrease the cost slightly for CPUs able to access L3 to avoid litting
>
> s/Inrease/Increase
> and I guess s/litting/littering
>
> > +      * it up too eagerly in case some other CPUs of the same type cannot
> > +      * access it.
> > +      */
> > +     if (cacheinfo->num_levels >= 3)

This check actually doesn't work on Intel processors.  I have a
replacement patch for this one.

> > +             (*cost)++;
>
> This makes cost(OPP1) of the SoC Tile e-core as expensive as cost(OPP0) of a
> normal e-core.

If "a normal E-core" is one using L3, then yes.

> Is that the intended behaviour?

Yes, it is.  I wanted the E-cores on L3 to appear somewhat more
expensive, but not too much.

It looks like *cost += 2 would work better, though.
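
The tie discussed above can be checked with a small userspace model of the
cost formula.  This is only a sketch, not the driver code: model_cost(), its
parameters and CORE_SCALING are hypothetical names, CORE_SCALING is assumed
to equal INTEL_PSTATE_CORE_SCALING (100000), and the freq argument is assumed
to grow by 1 between adjacent OPPs, as the review comment implies.

```c
#include <stdint.h>

/* Assumption: same value as INTEL_PSTATE_CORE_SCALING in intel_pstate.c. */
#define CORE_SCALING 100000ULL

/*
 * Hypothetical model of the cost computed in the patch:
 *   cost = 100 * CORE_SCALING / scaling + freq, plus a bump for L3 users.
 *
 * scaling  - perf-to-frequency scaling factor of the CPU (must be nonzero)
 * opp      - stands in for the freq argument (assumed to step by 1 per OPP)
 * uses_l3  - nonzero if the CPU is connected to the L3 cache
 * l3_bump  - amount added for L3 users (1 in the patch, 2 as suggested)
 */
unsigned long model_cost(uint64_t scaling, unsigned long opp,
			 int uses_l3, unsigned long l3_bump)
{
	unsigned long cost = (unsigned long)(100ULL * CORE_SCALING / scaling) + opp;

	if (uses_l3)
		cost += l3_bump;

	return cost;
}
```

With an illustrative scaling factor of 100000 for both kinds of E-cores,
model_cost(100000, 0, 1, 1) and model_cost(100000, 1, 0, 1) both evaluate to
101, so an E-core on L3 at OPP0 costs the same as an E-core without L3 at
OPP1.  With a bump of 2, the E-core on L3 at OPP0 ties with the E-core
without L3 at OPP2 instead.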

Patch

--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -979,6 +979,7 @@ 
 			   unsigned long *cost)
 {
 	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
+	struct cpu_cacheinfo *cacheinfo = get_cpu_cacheinfo(dev->id);
 
 	/*
 	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
@@ -991,6 +992,13 @@ 
 	 * of the same type in different "utilization bins" is different.
 	 */
 	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
+	/*
+	 * Inrease the cost slightly for CPUs able to access L3 to avoid litting
+	 * it up too eagerly in case some other CPUs of the same type cannot
+	 * access it.
+	 */
+	if (cacheinfo->num_levels >= 3)
+		(*cost)++;
 
 	return 0;
 }