diff mbox series

[1/6] cpuidle: teo: Increase util-threshold

Message ID 20240606090050.327614-2-christian.loehle@arm.com
State New
Series cpuidle: teo: fixes and improvements

Commit Message

Christian Loehle June 6, 2024, 9 a.m. UTC
Increase the util-threshold by a lot as it was low enough for some
minor load to always be active, especially on smaller CPUs.

For small cap CPUs (Pixel6) the util threshold is as low as 1.
For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.

Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
Reported-by: Qais Yousef <qyousef@layalina.io>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Suggested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 drivers/cpuidle/governors/teo.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Comments

Dietmar Eggemann June 7, 2024, 8:01 a.m. UTC | #1
On 06/06/2024 11:00, Christian Loehle wrote:
> Increase the util-threshold by a lot as it was low enough for some
> minor load to always be active, especially on smaller CPUs.

We see the blocked part of the CPU utilization as something telling the
task scheduler that the corresponding tasks might be runnable soon again
on this CPU.

This model seems to be used here as well. I guess folks are still
debating whether the amount of blocked utilization is a good enough
indicator for the length of idle time.

> For small cap CPUs (Pixel6) the util threshold is as low as 1.
> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.

So before, this threshold was 16 on a capacity-1024 CPU; now it's 256?

A CPU of capacity <= 200 now has a threshold of 50.

Where do those numbers come from? Just from running another workload on
a specific device?

[...]
Christian Loehle June 7, 2024, 9:35 a.m. UTC | #2
On 6/7/24 09:01, Dietmar Eggemann wrote:
> On 06/06/2024 11:00, Christian Loehle wrote:
>> Increase the util-threshold by a lot as it was low enough for some
>> minor load to always be active, especially on smaller CPUs.
> 
> We see the blocked part of the CPU utilization as something telling the
> task scheduler that the corresponding tasks might be runnable soon again
> on this CPU.
> 
> This model seems to be used here as well. I guess folks are still
> debating whether the amount of blocked utilization is a good enough
> indicator for the length of idle time.

Right, the blocked utilization is treated as an indicator that we will
be brought out of sleep by a non-timer wakeup.

> 
>> For small cap CPUs (Pixel6) the util threshold is as low as 1.
>> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
> 
> So before, this threshold was 16 on a capacity-1024 CPU; now it's 256?
> 
> A CPU of capacity <= 200 now has a threshold of 50.
> 
> Where do those numbers come from? Just from running another workload on
> a specific device?
> 
> [...]

More or less, yes.
Kajetan identified two broad use-cases for the utilization-based state
bypass: early utilization ramp-up and high-utilization scenarios.
The reports made it clear that the former can't be handled with a
threshold on just a single value, as it will be too aggressive in
sustained (non-ramp-up) workloads.
To be fair, with patches 5 and 6 of this series, the ramp-up is
also handled quite early by the intercepts logic itself.
So as a fix I increased the value high enough to not trigger in
low-utilization scenarios.
There is likely room for optimization here; e.g. many wakeups
are IPIs that relate more to overall system utilization than
to the current CPU.
Qais Yousef June 9, 2024, 10:47 p.m. UTC | #3
On 06/06/24 10:00, Christian Loehle wrote:
> Increase the util-threshold by a lot as it was low enough for some
> minor load to always be active, especially on smaller CPUs.
> 
> For small cap CPUs (Pixel6) the util threshold is as low as 1.
> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
> 
> Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
> Reported-by: Qais Yousef <qyousef@layalina.io>
> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> Suggested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> ---
>  drivers/cpuidle/governors/teo.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index 7244f71c59c5..45f43e2ee02d 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -146,13 +146,11 @@
>   * The number of bits to shift the CPU's capacity by in order to determine
>   * the utilized threshold.
>   *
> - * 6 was chosen based on testing as the number that achieved the best balance
> - * of power and performance on average.
> - *
>   * The resulting threshold is high enough to not be triggered by background
> - * noise and low enough to react quickly when activity starts to ramp up.
> + * noise.
>   */
> -#define UTIL_THRESHOLD_SHIFT 6
> +#define UTIL_THRESHOLD_SHIFT 2
> +#define UTIL_THRESHOLD_MIN 50
>  
>  /*
>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
> @@ -671,7 +669,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
>  	int i;
>  
>  	memset(cpu_data, 0, sizeof(*cpu_data));
> -	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
> +	cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
> +				max_capacity >> UTIL_THRESHOLD_SHIFT);

Thanks for trying to fix this. But I am afraid this is not a solution. There's
no magic number that can truly work here - we tried. As I tried to explain
before, a higher util value doesn't mean long idle time is unlikely. And
blocked load can cause problems where a decay can take too long.

We are following up with the suggestions I have thrown back then and we'll
share results if anything actually works.

For now, I think a revert is more appropriate. There was some perf benefit, but
the power regressions were bad and there's no threshold value that actually
works. The thresholding concept itself is incorrect and flawed - it seemed the
correct thing back then, yes. But in hindsight it doesn't work.


Thanks!

--
Qais Yousef

>  
>  	for (i = 0; i < NR_RECENT; i++)
>  		cpu_data->recent_idx[i] = -1;
> -- 
> 2.34.1
>
Christian Loehle June 10, 2024, 9:11 a.m. UTC | #4
On 6/9/24 23:47, Qais Yousef wrote:
> On 06/06/24 10:00, Christian Loehle wrote:
>> Increase the util-threshold by a lot as it was low enough for some
>> minor load to always be active, especially on smaller CPUs.
>>
>> For small cap CPUs (Pixel6) the util threshold is as low as 1.
>> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
>>
>> Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
>> Reported-by: Qais Yousef <qyousef@layalina.io>
>> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
>> Suggested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
>> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
>> ---
>>  drivers/cpuidle/governors/teo.c | 11 +++++------
>>  1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
>> index 7244f71c59c5..45f43e2ee02d 100644
>> --- a/drivers/cpuidle/governors/teo.c
>> +++ b/drivers/cpuidle/governors/teo.c
>> @@ -146,13 +146,11 @@
>>   * The number of bits to shift the CPU's capacity by in order to determine
>>   * the utilized threshold.
>>   *
>> - * 6 was chosen based on testing as the number that achieved the best balance
>> - * of power and performance on average.
>> - *
>>   * The resulting threshold is high enough to not be triggered by background
>> - * noise and low enough to react quickly when activity starts to ramp up.
>> + * noise.
>>   */
>> -#define UTIL_THRESHOLD_SHIFT 6
>> +#define UTIL_THRESHOLD_SHIFT 2
>> +#define UTIL_THRESHOLD_MIN 50
>>  
>>  /*
>>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
>> @@ -671,7 +669,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
>>  	int i;
>>  
>>  	memset(cpu_data, 0, sizeof(*cpu_data));
>> -	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
>> +	cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
>> +				max_capacity >> UTIL_THRESHOLD_SHIFT);
> 
> Thanks for trying to fix this. But I am afraid this is not a solution. There's
> no magic number that can truly work here - we tried. As I tried to explain
> before, a higher util value doesn't mean long idle time is unlikely. And
> blocked load can cause problems where a decay can take too long.
> 
> We are following up with the suggestions I have thrown back then and we'll
> share results if anything actually works.

Namely watching increase / decrease of utilization?
I think you would have to watch at least a couple of values before entering such
a logic and at that point the intercepts logic will handle it anyway.
Furthermore IMO we should be wary about introducing any state in teo that persists
across calls if not absolutely necessary (like intercept-detection) as it really
makes teo much less predictable.

> 
> For now, I think a revert is more appropriate. There was some perf benefit, but
> the power regressions were bad and there's no threshold value that actually
> works. The thresholding concept itself is incorrect and flawed - it seemed the
> correct thing back then, yes. But in hindsight it doesn't work.

Unfortunate :/
OK. I'll do some more testing with that, too. From what I can see a revert wouldn't
have terrible fallout with the series altogether, so I might just change this for
v2 and drop 2/6.

Kind Regards,
Christian
Qais Yousef June 16, 2024, 9:54 p.m. UTC | #5
On 06/10/24 10:11, Christian Loehle wrote:
> On 6/9/24 23:47, Qais Yousef wrote:
> > On 06/06/24 10:00, Christian Loehle wrote:
> >> Increase the util-threshold by a lot as it was low enough for some
> >> minor load to always be active, especially on smaller CPUs.
> >>
> >> For small cap CPUs (Pixel6) the util threshold is as low as 1.
> >> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
> >>
> >> Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
> >> Reported-by: Qais Yousef <qyousef@layalina.io>
> >> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> >> Suggested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
> >> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> >> ---
> >>  drivers/cpuidle/governors/teo.c | 11 +++++------
> >>  1 file changed, 5 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> >> index 7244f71c59c5..45f43e2ee02d 100644
> >> --- a/drivers/cpuidle/governors/teo.c
> >> +++ b/drivers/cpuidle/governors/teo.c
> >> @@ -146,13 +146,11 @@
> >>   * The number of bits to shift the CPU's capacity by in order to determine
> >>   * the utilized threshold.
> >>   *
> >> - * 6 was chosen based on testing as the number that achieved the best balance
> >> - * of power and performance on average.
> >> - *
> >>   * The resulting threshold is high enough to not be triggered by background
> >> - * noise and low enough to react quickly when activity starts to ramp up.
> >> + * noise.
> >>   */
> >> -#define UTIL_THRESHOLD_SHIFT 6
> >> +#define UTIL_THRESHOLD_SHIFT 2
> >> +#define UTIL_THRESHOLD_MIN 50
> >>  
> >>  /*
> >>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
> >> @@ -671,7 +669,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
> >>  	int i;
> >>  
> >>  	memset(cpu_data, 0, sizeof(*cpu_data));
> >> -	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
> >> +	cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
> >> +				max_capacity >> UTIL_THRESHOLD_SHIFT);
> > 
> > Thanks for trying to fix this. But I am afraid this is not a solution. There's
> > no magic number that can truly work here - we tried. As I tried to explain
> > before, a higher util value doesn't mean long idle time is unlikely. And
> > blocked load can cause problems where a decay can take too long.
> > 
> > We are following up with the suggestions I have thrown back then and we'll
> > share results if anything actually works.
> 
> Namely watching increase / decrease of utilization?

We still have to explore and see. I think we need multiple cues if we are to
try to predict the likelihood of a task waking up sooner than min_residency,
and those cues need to be scalable across workloads/systems.

> I think you would have to watch at least a couple of values before entering such
> a logic and at that point the intercepts logic will handle it anyway.
> Furthermore IMO we should be wary about introducing any state in teo that persists
> across calls if not absolutely necessary (like intercept-detection) as it really
> makes teo much less predictable.
> 
> > 
> > For now, I think a revert is more appropriate. There was some perf benefit, but
> > the power regressions were bad and there's no threshold value that actually
> > works. The thresholding concept itself is incorrect and flawed - it seemed the
> > correct thing back then, yes. But in hindsight it doesn't work.
> 
> Unfortunate :/
> OK. I'll do some more testing with that, too. From what I can see a revert wouldn't
> have terrible fallout with the series altogether, so I might just change this for
> v2 and drop 2/6.
> 
> Kind Regards,
> Christian
>
Christian Loehle June 19, 2024, 9:53 a.m. UTC | #6
On Mon, Jun 10, 2024 at 11:57:55AM +0200, Ulf Hansson wrote:
> On Mon, 10 Jun 2024 at 00:47, Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 06/06/24 10:00, Christian Loehle wrote:
> > > Increase the util-threshold by a lot as it was low enough for some
> > > minor load to always be active, especially on smaller CPUs.
> > >
> > > For small cap CPUs (Pixel6) the util threshold is as low as 1.
> > > For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
> > >
> > > Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
> > > Reported-by: Qais Yousef <qyousef@layalina.io>
> > > Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > Suggested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
> > > Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> > > ---
> > >  drivers/cpuidle/governors/teo.c | 11 +++++------
> > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> > > index 7244f71c59c5..45f43e2ee02d 100644
> > > --- a/drivers/cpuidle/governors/teo.c
> > > +++ b/drivers/cpuidle/governors/teo.c
> > > @@ -146,13 +146,11 @@
> > >   * The number of bits to shift the CPU's capacity by in order to determine
> > >   * the utilized threshold.
> > >   *
> > > - * 6 was chosen based on testing as the number that achieved the best balance
> > > - * of power and performance on average.
> > > - *
> > >   * The resulting threshold is high enough to not be triggered by background
> > > - * noise and low enough to react quickly when activity starts to ramp up.
> > > + * noise.
> > >   */
> > > -#define UTIL_THRESHOLD_SHIFT 6
> > > +#define UTIL_THRESHOLD_SHIFT 2
> > > +#define UTIL_THRESHOLD_MIN 50
> > >
> > >  /*
> > >   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
> > > @@ -671,7 +669,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
> > >       int i;
> > >
> > >       memset(cpu_data, 0, sizeof(*cpu_data));
> > > -     cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
> > > +     cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
> > > +                             max_capacity >> UTIL_THRESHOLD_SHIFT);
> >
> > Thanks for trying to fix this. But I am afraid this is not a solution. There's
> > no magic number that can truly work here - we tried. As I tried to explain
> > before, a higher util value doesn't mean long idle time is unlikely. And
> > blocked load can cause problems where a decay can take too long.
> >
> > We are following up with the suggestions I have thrown back then and we'll
> > share results if anything actually works.
> >
> > For now, I think a revert is more appropriate. There was some perf benefit, but
> > the power regressions were bad and there's no threshold value that actually
> > works. The thresholding concept itself is incorrect and flawed - it seemed the
> > correct thing back then, yes. But in hindsight it doesn't work.
> >
>
> For the record, I fully agree with the above. A revert seems to be the
> best option in my opinion too.
>
> Besides for the above reasons; when using cpuidle-psci with PSCI OSI
> mode, the approach leads to disabling *all* of cluster's idle-states
> too, as those aren't even visible for the teo governor. I am sure,
> that was not the intent with commit 9ce0f7c4bc64.
>

Just for my understanding, what does OSI mode have to do with this?
My understanding is that util-awareness (and also intercepts) will
disable cluster idle for the entire cluster of the affected CPU.
Why would this not be the commit's intent? The assumption behind
util-awareness is that such a CPU will a) be woken up early anyway
and b) see its performance affected by the exit latency of
cluster idle.

We can argue if the assumption of util-awareness is true, but I don't
quite follow your argument with OSI. The behavior is the same with PC
mode, isn't it?

Patch

diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 7244f71c59c5..45f43e2ee02d 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -146,13 +146,11 @@ 
  * The number of bits to shift the CPU's capacity by in order to determine
  * the utilized threshold.
  *
- * 6 was chosen based on testing as the number that achieved the best balance
- * of power and performance on average.
- *
  * The resulting threshold is high enough to not be triggered by background
- * noise and low enough to react quickly when activity starts to ramp up.
+ * noise.
  */
-#define UTIL_THRESHOLD_SHIFT 6
+#define UTIL_THRESHOLD_SHIFT 2
+#define UTIL_THRESHOLD_MIN 50
 
 /*
  * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
@@ -671,7 +669,8 @@  static int teo_enable_device(struct cpuidle_driver *drv,
 	int i;
 
 	memset(cpu_data, 0, sizeof(*cpu_data));
-	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
+	cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
+				max_capacity >> UTIL_THRESHOLD_SHIFT);
 
 	for (i = 0; i < NR_RECENT; i++)
 		cpu_data->recent_idx[i] = -1;