[RFT,v2,0/3] cpuidle: teo: Do not check timers unconditionally every time

Message ID 5712331.DvuYhMxLoT@kreacher

Message

Rafael J. Wysocki Aug. 3, 2023, 8:57 p.m. UTC
Hi Folks,

This is the second iteration of:

https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/

with an additional patch.

There are some small modifications of patch [1/3] and the new
patch causes governor statistics to play a role in deciding whether
or not to stop the scheduler tick.
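
To illustrate the direction, here is a rough user-space model of the idea
(this is not the kernel or patch code; the names, thresholds and numbers
are made up for the example): keep a small window of recent idle durations
and only pay for the "when is the next timer?" query - and only consider
stopping the tick - when those statistics suggest the upcoming idle period
is likely to be long:

    /* Toy model only: not the teo governor or any kernel API. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TICK_PERIOD_NS 1000000ULL   /* 1 ms tick period (HZ=1000) */
    #define NR_RECENT      8            /* recent idle-duration samples */

    static uint64_t recent_idle_ns[NR_RECENT];
    static unsigned int next_slot;

    static void record_idle_duration(uint64_t ns)
    {
        recent_idle_ns[next_slot] = ns;
        next_slot = (next_slot + 1) % NR_RECENT;
    }

    /* Stand-in for the expensive tick_nohz_get_sleep_length() query. */
    static uint64_t query_time_to_next_timer_ns(void)
    {
        return 4000000ULL;              /* pretend the next timer is 4 ms away */
    }

    static bool should_stop_tick(void)
    {
        unsigned int i, short_wakeups = 0;

        for (i = 0; i < NR_RECENT; i++)
            if (recent_idle_ns[i] < TICK_PERIOD_NS)
                short_wakeups++;

        /*
         * If most recent idle periods ended before the next tick would
         * have fired, assume this one will too: skip the expensive timer
         * query and keep the tick running.
         */
        if (short_wakeups > NR_RECENT / 2)
            return false;

        /* Statistics look "long", so the timer query is now worth its cost. */
        return query_time_to_next_timer_ns() > TICK_PERIOD_NS;
    }

    int main(void)
    {
        for (int i = 0; i < NR_RECENT; i++)
            record_idle_duration(5000000ULL);   /* a run of long idle periods */

        printf("stop the tick? %s\n", should_stop_tick() ? "yes" : "no");
        return 0;
    }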

Testing would be much appreciated!

Thanks!

Comments

Anna-Maria Behnsen Aug. 7, 2023, 3:38 p.m. UTC | #1
On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:

> On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > Hi Folks,
> >
> > This is the second iteration of:
> >
> > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> >
> > with an additional patch.
> >
> > There are some small modifications of patch [1/3] and the new
> > patch causes governor statistics to play a role in deciding whether
> > or not to stop the scheduler tick.
> >
> > Testing would be much appreciated!
> 
> For convenience, this series is now available in the following git branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>  pm-cpuidle-teo
> 

Gautham's tests and the distribution of idle time durations look pretty
good. Also, the prevention of calling tick_nohz_get_sleep_length() is very
nice (21477 calls of tick_nohz_next_event(), and the tick was stopped 2670
times).

Here is the distribution of idle time durations (based on your branch):

Idle Total            2670   100.00%
x >= 4ms              2537    95.02%
4ms > x >= 2ms          19     0.71%
2ms > x >= 1ms          10     0.37%
1ms > x >= 500us         7     0.26%
500us > x >= 250us       6     0.22%
250us > x >= 100us      13     0.49%
100us > x >= 50us       17     0.64%
50us > x >= 25us        25     0.94%
25us > x >= 10us        22     0.82%
10us > x > 5us           9     0.34%
5us > x                  5     0.19%


Thanks,

	Anna-Maria
Rafael J. Wysocki Aug. 7, 2023, 4:46 p.m. UTC | #2
Hi Kajetan,

On Mon, Aug 7, 2023 at 4:04 PM Kajetan Puchalski
<kajetan.puchalski@arm.com> wrote:
>
> Hi Rafael,
>
> On Thu, Aug 03, 2023 at 10:57:04PM +0200, Rafael J. Wysocki wrote:
> > Hi Folks,
> >
> > This is the second iteration of:
> >
> > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> >
> > with an additional patch.
> >
> > There are some small modifications of patch [1/3] and the new
> > patch causes governor statistics to play a role in deciding whether
> > or not to stop the scheduler tick.
> >
> > Testing would be much appreciated!
> >
> > Thanks!
> >
>
> My test results including the v2 are below.
>
> 1. Geekbench 6
>
> +---------------------------+---------------+-----------------+-------------------+----------------------+
> |          metric           |      teo      |     teo_tick    |    teo_tick_rfc   |    teo_tick_rfc_v2   |
> +---------------------------+---------------+-----------------+-------------------+----------------------+
> |      multicore_score      | 3320.9 (0.0%) | 3303.3 (-0.53%) |  3293.6 (-0.82%)  |   3302.3 (-0.56%)    |
> |           score           | 1415.7 (0.0%) | 1417.7 (0.14%)  |  1423.4 (0.54%)   |    1425.8 (0.71%)    |
> |      CPU_total_power      | 2421.3 (0.0%) | 2429.3 (0.33%)  |  2442.2 (0.86%)   |    2461.9 (1.67%)    |
> |  latency (AsyncTask #1)   | 49.41μ (0.0%) | 51.07μ (3.36%)  |   50.1μ (1.4%)    |    50.76μ (2.73%)    |
> | latency (labs.geekbench6) | 65.63μ (0.0%) | 77.47μ (18.03%) | 55.82μ (-14.95%)  |    66.12μ (0.75%)    |
> | latency (surfaceflinger)  | 39.46μ (0.0%) | 36.94μ (-6.39%) |  35.79μ (-9.28%)  |    40.36μ (2.3%)     |
> +---------------------------+---------------+-----------------+-------------------+----------------------+
>
> +----------------------+-------------+------------+
> |         tag          |    type     | count_perc |
> +----------------------+-------------+------------+
> |         teo          |  too deep   |   2.034    |
> |       teo_tick       |  too deep   |    2.16    |
> |     teo_tick_rfc     |  too deep   |   2.071    |
> |    teo_tick_rfc_v2   |  too deep   |   2.548    |
> |         teo          | too shallow |   15.791   |
> |       teo_tick       | too shallow |   20.881   |
> |     teo_tick_rfc     | too shallow |   20.337   |
> |    teo_tick_rfc_v2   | too shallow |   19.886   |
> +----------------------+-------------+------------+
>
>
> 2. JetNews
>
> +-----------------+---------------+----------------+-----------------+-----------------+
> |     metric      |      teo      |    teo_tick    |  teo_tick_rfc   | teo_tick_rfc_v2 |
> +-----------------+---------------+----------------+-----------------+-----------------+
> |       fps       |  86.2 (0.0%)  |  86.4 (0.16%)  |  86.0 (-0.28%)  |  86.6 (0.41%)   |
> |    janks_pc     |  0.8 (0.0%)   |  0.8 (-0.66%)  |  0.8 (-1.37%)   |  0.7 (-11.37%)  |
> | CPU_total_power | 185.2 (0.0%)  | 178.2 (-3.76%) |  182.2 (-1.6%)  | 169.4 (-8.53%)  | <- very interesting
> +-----------------+---------------+----------------+-----------------+-----------------+
>
> +----------------------+-------------+--------------------+
> |         tag          |    type     |     count_perc     |
> +----------------------+-------------+--------------------+
> |         teo          |  too deep   |       0.992        |
> |       teo_tick       |  too deep   |       0.945        |
> |     teo_tick_rfc     |  too deep   |       1.035        |
> |    teo_tick_rfc_v2   |  too deep   |       1.127        |
> |         teo          | too shallow |       17.085       |
> |       teo_tick       | too shallow |       15.236       |
> |     teo_tick_rfc     | too shallow |       15.379       |
> |    teo_tick_rfc_v2   | too shallow |       15.34        |
> +----------------------+-------------+--------------------+
>
> All in all looks pretty good. Unfortunately there's a slightly larger
> percentage of too deep sleeps with the v2 (which is probably where the
> increase in GB6 power usage comes from) but the lower jank percentage +
> substantially lower power usage for the UI workload are very promising.
>
> Since we don't care about GB6 power usage as much as UI power usage, I'd
> say that the patchset looks good :)
>
> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Thanks a lot, much appreciated!
Rafael J. Wysocki Aug. 7, 2023, 4:47 p.m. UTC | #3
On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
>
> On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
>
> > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > Hi Folks,
> > >
> > > This is the second iteration of:
> > >
> > > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > >
> > > with an additional patch.
> > >
> > > There are some small modifications of patch [1/3] and the new
> > > patch causes governor statistics to play a role in deciding whether
> > > or not to stop the scheduler tick.
> > >
> > > Testing would be much appreciated!
> >
> > For convenience, this series is now available in the following git branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> >  pm-cpuidle-teo
> >
>
> Gauthams tests and the distribution of idle time durations looks pretty
> good. Also the prevention of calling tick_nohz_get_sleep_length() is very
> nice (21477 calls of tick_nohz_next_event() and the tick was stopped 2670
> times).
>
> Here is the deviation of idle time durations (based on your branch):
>
> Idle Total              2670    100.00%
> x >= 4ms                2537    95.02%
> 4ms> x >= 2ms           19      0.71%
> 2ms > x >= 1ms          10      0.37%
> 1ms > x >= 500us        7       0.26%
> 500us > x >= 250us      6       0.22%
> 250us > x >=100us       13      0.49%
> 100us > x >= 50us       17      0.64%
> 50us > x >= 25us        25      0.94%
> 25us > x >= 10us        22      0.82%
> 10us > x > 5us          9       0.34%
> 5us > x                 5       0.19%

Thanks a lot for the data!

Can I add a Tested-by: tag from you to this series?
Doug Smythies Aug. 8, 2023, 3:22 p.m. UTC | #4
On 2023.08.03 14:33 Rafael wrote:
> On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>
>> Hi Folks,
>>
>> This is the second iteration of:
>>
>> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
>>
>> with an additional patch.
>>
>> There are some small modifications of patch [1/3] and the new
>> patch causes governor statistics to play a role in deciding whether
>> or not to stop the scheduler tick.
>>
>> Testing would be much appreciated!
>
> For convenience, this series is now available in the following git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> pm-cpuidle-teo

Hi Rafael,

Thank you for the git branch link.

I did some testing:

Disclaimer: I used areas of focus derived
from the original teo-util work last fall,
and did not check if they were still the best
places to look for issues.

CPU: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
HWP: enabled
CPU frequency scaling driver: intel_pstate
CPU frequency scaling governor: performance
Kernel 1: 6.5-rc4 (1000 Hz tick rate)
Kernel 2: kernel 1 + this patch series (called "rjw")
System is extremely idle, other than the test work.

All tests were done with all idle governors:
menu, teo, ladder, rjw.

Test 1: 2 core ping pong sweep:

Pass a token between 2 CPUs on 2 different cores.
Do a variable amount of work at each stop.

Purpose: To utilize the shallowest idle states
and observe the transition from using more of one
idle state to another.
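
A minimal sketch of such a ping-pong (assumptions: Linux with glibc; the
CPU numbers, iteration count and the per-stop busy loop are illustrative
placeholders, not the actual test program):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    #define ITERS 100000
    static unsigned long work = 1000;   /* swept across runs in the real test */
    static int pipe_a[2], pipe_b[2];    /* token: a -> player 0 -> b -> player 1 -> a */

    static void pin_to_cpu(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *player(void *arg)
    {
        long id = (long)arg;
        int *rx = id ? pipe_b : pipe_a;
        int *tx = id ? pipe_a : pipe_b;
        char token;

        pin_to_cpu(id ? 2 : 0);         /* two CPUs on two different cores */

        for (int i = 0; i < ITERS; i++) {
            read(rx[0], &token, 1);     /* idle until the token arrives */
            for (volatile unsigned long n = 0; n < work; n++)
                ;                       /* variable amount of work per stop */
            write(tx[1], &token, 1);    /* pass the token on */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        char token = 't';

        pipe(pipe_a);
        pipe(pipe_b);
        write(pipe_a[1], &token, 1);    /* inject the first token */
        pthread_create(&t0, NULL, player, (void *)0L);
        pthread_create(&t1, NULL, player, (void *)1L);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("done\n");
        return 0;
    }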

Results: 
 
teo and rjw track fairly well, with
rjw reducing its use of idle state 0 before
teo as the work packet increases. The menu governor
does best overall, but performs worse over a greater
range of token loop times.

Details (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/2-core-ping-pong-sweep.png

Test 2: 6 core ping pong sweep:

Pass a token between 6 CPUs on 6 different cores.
Do a variable amount of work at each stop.

Purpose: To utilize the midrange idle states
and observe the transitions between the use of
different idle states.

Results: There is some instability in the results
in the early stages.
For unknown reasons, the rjw governor sometimes works
slower and at lower power. The condition is not 100%
repeatable.

Overall teo completed the test fastest (54.9 minutes),
followed by menu (56.2 minutes), then rjw (56.7 minutes),
then ladder (58.4 minutes). teo is faster throughout the
latter stages of the test, but at the cost of more power.
The differences seem to be in the transition from idle
state 1 to idle state 2 usage.

Details (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-a.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-b.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-diffs.png

A re-run of the power and idle stats, showing inconsistent behaviour
(teo and rjw only, no timing data):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-1/perf/

Test 3: sleeping ebizzy - 128 threads.

Purpose: This test has given interesting results in the past.
The test varies the sleep interval between record lookups.
The result is varying usage of idle states.

Results: It can be difficult to see any differences in
the overall timing graph, but a graph of differences
is revealing. teo outperforms rjw in the longer intervals
region of the test, at the cost of more power.

Details: (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf.png
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf-diffs.png

Test 4: 2 X 2 pair token passing. Dwell test. Fast:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 3.0723 usecs/loop average. +3.15%
teo: 2.9917 usecs/loop average. +0.44%
menu: 2.97845 usecs/loop average. Reference
ladder: 4.077375 usecs/loop average. +36.9%

Powers are all similar, with ladder a bit lower.

Details: (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/times.txt

Test 5: 2 X 2 pair token passing. Dwell test. Medium:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 11.3406 usecs/loop average. -0.69%
teo: 11.36765 usecs/loop average. -0.45%
menu: 11.41905 usecs/loop average. reference
ladder: 11.9535 usecs/loop average. +4.68%

Powers are all similar.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/times.txt

Test 6: 2 X 2 pair token passing. Dwell test. Slow:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 2591.70 usecs/loop average. +0.26%
teo: 2566.34 usecs/loop average. -0.72%
menu: 2585.00 usecs/loop average. reference
ladder: 2635.36 usecs/loop average. +1.95%

Powers are all similar, with ladder a bit lower. 
Due to the strong temperature-to-power-use curve,
a much longer dwell test would need to be run to
be sure of reaching steady-state power usage.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/times.txt

Test 7: 500 low load threads.

Purpose: This test has given interesting results
in the past.

500 threads at approximately 10 hertz work/sleep frequency
and about 0.0163 load per thread, 8.15 total.
CPUs are about 32% idle.
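
A sketch of what one such thread might look like (the 100 ms period and
~1.63 ms of busy time per period follow from the ~10 Hz / 0.0163-load
figures above; the code itself is an illustrative placeholder, not the
actual test source):

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <time.h>

    #define PERIOD_NS 100000000L        /* 10 Hz -> 100 ms period */
    #define BUSY_NS     1630000L        /* 0.0163 load -> ~1.63 ms of work */

    static int64_t now_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* Body of one low-load thread; the real test starts 500 of these. */
    static void low_load_worker(void)
    {
        struct timespec next;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            int64_t deadline = now_ns() + BUSY_NS;

            while (now_ns() < deadline)
                ;                       /* burn ~1.63 ms of CPU */

            /* Sleep until the start of the next 100 ms period. */
            next.tv_nsec += PERIOD_NS;
            if (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }

    int main(void)
    {
        low_load_worker();              /* never returns */
        return 0;
    }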

Results:
rjw executed 0.01% faster than teo.
rjw used 5% less energy than teo.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/waiter/perf/
http://smythies.com/~doug/linux/idle/teo-util2/waiter/times.txt

Conclusions: Overall, I am not seeing a compelling reason to
proceed with this patch set.

... Doug
Rafael J. Wysocki Aug. 8, 2023, 4:43 p.m. UTC | #5
On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2023.08.03 14:33 Rafael wrote:
> > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >>
> >> Hi Folks,
> >>
> >> This is the second iteration of:
> >>
> >> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> >>
> >> with an additional patch.
> >>
> >> There are some small modifications of patch [1/3] and the new
> >> patch causes governor statistics to play a role in deciding whether
> >> or not to stop the scheduler tick.
> >>
> >> Testing would be much appreciated!
> >
> > For convenience, this series is now available in the following git branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > pm-cpuidle-teo
>
> Hi Rafael,
>
> Thank you for the git branch link.
>
> I did some testing:
>
> Disclaimer: I used areas of focus derived
> from the original teo-util work last fall,
> and did not check if they were still the best
> places to look for issues.
>
> CPU: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> HWP: enabled
> CPU frequency scaling driver: intel_pstate
> CPU frequency scaling governor: performance
> Kernel 1: 6.5-rc4 (1000 Hz tick rate)
> Kernel 2: kernel 1 + this patch series (called "rjw")
> System is extremely idle, other than the test work.
>
> All tests were done with all idle governors:
> menu, teo, ladder, rjw.
>
> Test 1: 2 core ping pong sweep:
>
> Pass a token between 2 CPUs on 2 different cores.
> Do a variable amount of work at each stop.
>
> Purpose: To utilize the shallowest idle states
> and observe the transition from using more of 1
> idle state to another.
>
> Results:
>
> teo and rjw track fairly well, with
> rjw reducing its use of idle state 0 before
> teo as the work packet increases. The menu governor
> does best overall, but performs worse over a greater
> range of token loop times.
>
> Details (power and idle stats; times):
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/2-core-ping-pong-sweep.png
>
> Test 2: 6 core ping pong sweep:
>
> Pass a token between 6 CPUs on 6 different cores.
> Do a variable amount of work at each stop.
>
> Purpose: To utilize the midrange idle states
> and observe the transitions from between use of
> idle states.
>
> Results: There is some instability in the results
> in the early stages.
> For unknown reasons, the rjw governor sometimes works
> slower and at lower power. The condition is not 100%
> repeatable.
>
> Overall teo completed the test fastest (54.9 minutes)
> Followed by menu (56.2 minutes), then rjw (56.7 minutes),
> then ladder (58.4 minutes). teo is faster throughout the
> latter stages of the test, but at the cost of more power.
> The differences seem to be in the transition from idle
> state 1 to idle state 2 usage.
>
> Details (power and idle stats; times):
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep.png
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-a.png
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-b.png
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-diffs.png
>
> a re-run power and idle stats, showing inconsistent behaviour.
> teo and rjw only, and no timing data:
> http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-1/perf/
>
> Test 3: sleeping ebizzy - 128 threads.
>
> Purpose: This test has given interesting results in the past.
> The test varies the sleep interval between record lookups.
> The result is varying usage of idle states.
>
> Results: It can be difficult to see any differences in
> the overall timing graph, but a graph of differences
> is revealing. teo outperforms rjw in the longer intervals
> region of the test, at the cost of more power.
>
> Details: (power and idle stats; times):
> http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf.png
> http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf-diffs.png
>
> Test 4: 2 X 2 pair token passing. Dwell test. Fast:
>
> Purpose: Dwell under one set of conditions. Observe
> noise and/or any bi-stability.
>
> Results (reference time is menu):
> rjw: 3.0723 usecs/loop average. +3.15%
> teo: 2.9917 usecs/loop average. +0.44%
> menu: 2.97845 usecs/loop average. Reference
> ladder: 4.077375 usecs/loop average. +36.9%
>
> Powers are all similar, with ladder a bit lower.
>
> Details: (power and idle stats; times):
> http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/times.txt
>
> Test 5: 2 X 2 pair token passing. Dwell test. Medium:
>
> Purpose: Dwell under one set of conditions. Observe
> noise and/or any bi-stability.
>
> Results (reference time is menu):
> rjw: 11.3406 usecs/loop average. -0.69%
> teo: 11.36765 usecs/loop average. -0.45%
> menu: 11.41905 usecs/loop average. reference
> ladder: 11.9535 usecs/loop average. +4.68%
>
> Powers are all similar.
>
> Details:
> http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/times.txt
>
> Test 6: 2 X 2 pair token passing. Dwell test. Slow:
>
> Purpose: Dwell under one set of conditions. Observe
> noise and/or any bi-stability.
>
> Results (reference time is menu):
> rjw: 2591.70 usecs/loop average. +0.26%
> teo: 2566.34 usecs/loop average. -0.72%
> menu: 2585.00 usecs/loop average. reference
> ladder: 2635.36 usecs/loop average. +1.95%
>
> Powers are all similar, with ladder a bit lower.
> Due to the strong temperature to power use curve,
> a much longer dwell test would need to be run to
> be sure to get to steady state power usage.
>
> Details:
> http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/times.txt
>
> Test 7: 500 low load threads.
>
> Purpose: This test has given interesting results
> in the past.
>
> 500 threads at approximately 10 hertz work/sleep frequency
> and about 0.0163 load per thread, 8.15 total.
> CPUs about 32% idle.
>
> Results:
> rjw executed 0.01% faster than teo.
> rjw used 5% less energy than teo.
>
> Details:
> http://smythies.com/~doug/linux/idle/teo-util2/waiter/perf/
> http://smythies.com/~doug/linux/idle/teo-util2/waiter/times.txt

Thanks a lot for doing this work, much appreciated!

> Conclusions: Overall, I am not seeing a compelling reason to
> proceed with this patch set.

On the other hand, if there is a separate compelling reason to do
that, it doesn't appear to lead to a major regression.
Doug Smythies Aug. 8, 2023, 10:40 p.m. UTC | #6
On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
> > On 2023.08.03 14:33 Rafael wrote:
> > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >>
> > >> Hi Folks,
> > >>
> > >> This is the second iteration of:
> > >>
> > >> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > >>
> > >> with an additional patch.
> > >>
> > >> There are some small modifications of patch [1/3] and the new
> > >> patch causes governor statistics to play a role in deciding whether
> > >> or not to stop the scheduler tick.
> > >>
> > >> Testing would be much appreciated!
> > >
> > > For convenience, this series is now available in the following git branch:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > pm-cpuidle-teo
> >
> > Hi Rafael,
> >
> > Thank you for the git branch link.
> >
> > I did some testing:


... deleted ...

> > Test 2: 6 core ping pong sweep:
> >
> > Pass a token between 6 CPUs on 6 different cores.
> > Do a variable amount of work at each stop.
> >
> > Purpose: To utilize the midrange idle states
> > and observe the transitions from between use of
> > idle states.
> >
> > Results: There is some instability in the results
> > in the early stages.
> > For unknown reasons, the rjw governor sometimes works
> > slower and at lower power. The condition is not 100%
> > repeatable.
> >
> > Overall teo completed the test fastest (54.9 minutes)
> > Followed by menu (56.2 minutes), then rjw (56.7 minutes),
> > then ladder (58.4 minutes). teo is faster throughout the
> > latter stages of the test, but at the cost of more power.
> > The differences seem to be in the transition from idle
> > state 1 to idle state 2 usage.

The magnitude of the later-stage differences is significant.

... deleted ...

> Thanks a lot for doing this work, much appreciated!
>
> > Conclusions: Overall, I am not seeing a compelling reason to
> > proceed with this patch set.
>
> On the other hand, if there is a separate compelling reason to do
> that, it doesn't appear to lead to a major regression.

Agreed.

Just for additional information, a 6 core dwell test was run.
The test conditions were cherry picked for dramatic effect:

teo: average: 1162.13 uSec/loop ; Std dev: 0.38
rjw: average: 1266.45 uSec/loop ; Std dev: 6.53 ; +9%

teo: average: 29.98 watts
rjw: average: 30.30 watts
(the same within thermal experimental error)

Details (power and idle stats over the 45 minute test period):
http://smythies.com/~doug/linux/idle/teo-util2/6-13568-147097/perf/
Anna-Maria Behnsen Aug. 9, 2023, 3:09 p.m. UTC | #7
On Mon, 7 Aug 2023, Rafael J. Wysocki wrote:

> On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
> <anna-maria@linutronix.de> wrote:
> >
> > On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
> >
> > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > Hi Folks,
> > > >
> > > > This is the second iteration of:
> > > >
> > > > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > >
> > > > with an additional patch.
> > > >
> > > > There are some small modifications of patch [1/3] and the new
> > > > patch causes governor statistics to play a role in deciding whether
> > > > or not to stop the scheduler tick.
> > > >
> > > > Testing would be much appreciated!
> > >
> > > For convenience, this series is now available in the following git branch:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > >  pm-cpuidle-teo
> > >
> >
> > Gauthams tests and the distribution of idle time durations looks pretty
> > good. Also the prevention of calling tick_nohz_get_sleep_length() is very
> > nice (21477 calls of tick_nohz_next_event() and the tick was stopped 2670
> > times).
> >
> > Here is the deviation of idle time durations (based on your branch):
> >
> > Idle Total              2670    100.00%
> > x >= 4ms                2537    95.02%
> > 4ms> x >= 2ms           19      0.71%
> > 2ms > x >= 1ms          10      0.37%
> > 1ms > x >= 500us        7       0.26%
> > 500us > x >= 250us      6       0.22%
> > 250us > x >=100us       13      0.49%
> > 100us > x >= 50us       17      0.64%
> > 50us > x >= 25us        25      0.94%
> > 25us > x >= 10us        22      0.82%
> > 10us > x > 5us          9       0.34%
> > 5us > x                 5       0.19%
> 
> Thanks a lot for the data!
> 
> Can I add a Tested-by: tag from you to this series?
> 

Sure - sorry for the delay!
Rafael J. Wysocki Aug. 9, 2023, 3:11 p.m. UTC | #8
On Wed, Aug 9, 2023 at 5:10 PM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
>
> On Mon, 7 Aug 2023, Rafael J. Wysocki wrote:
>
> > On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
> > <anna-maria@linutronix.de> wrote:
> > >
> > > On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
> > >
> > > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > > Hi Folks,
> > > > >
> > > > > This is the second iteration of:
> > > > >
> > > > > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > > >
> > > > > with an additional patch.
> > > > >
> > > > > There are some small modifications of patch [1/3] and the new
> > > > > patch causes governor statistics to play a role in deciding whether
> > > > > or not to stop the scheduler tick.
> > > > >
> > > > > Testing would be much appreciated!
> > > >
> > > > For convenience, this series is now available in the following git branch:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > >  pm-cpuidle-teo
> > > >
> > >
> > > Gauthams tests and the distribution of idle time durations looks pretty
> > > good. Also the prevention of calling tick_nohz_get_sleep_length() is very
> > > nice (21477 calls of tick_nohz_next_event() and the tick was stopped 2670
> > > times).
> > >
> > > Here is the deviation of idle time durations (based on your branch):
> > >
> > > Idle Total              2670    100.00%
> > > x >= 4ms                2537    95.02%
> > > 4ms> x >= 2ms           19      0.71%
> > > 2ms > x >= 1ms          10      0.37%
> > > 1ms > x >= 500us        7       0.26%
> > > 500us > x >= 250us      6       0.22%
> > > 250us > x >=100us       13      0.49%
> > > 100us > x >= 50us       17      0.64%
> > > 50us > x >= 25us        25      0.94%
> > > 25us > x >= 10us        22      0.82%
> > > 10us > x > 5us          9       0.34%
> > > 5us > x                 5       0.19%
> >
> > Thanks a lot for the data!
> >
> > Can I add a Tested-by: tag from you to this series?
> >
>
> Sure - sorry for the delay!

No worries, thanks!
Anna-Maria Behnsen Aug. 9, 2023, 4:24 p.m. UTC | #9
Hi,

On Tue, 8 Aug 2023, Doug Smythies wrote:
> On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:

[...]

> > > Conclusions: Overall, I am not seeing a compelling reason to
> > > proceed with this patch set.
> >
> > On the other hand, if there is a separate compelling reason to do
> > that, it doesn't appear to lead to a major regression.
> 
> Agreed.
> 

Regarding the compelling reason:

On a fully loaded machine with 256 CPUs, tick_nohz_next_event() is executed
~48000 times per second. With this patchset it is reduced to ~120 times per
second - a factor of 400. This is already an improvement.

tick_nohz_next_event() marks timer bases idle whenever possible - even if
the tick is not stopped afterwards. When a timer is enqueued remotely into
an idle timer base, an IPI is sent. Calling tick_nohz_next_event() only when
the system is not that busy prevents those unnecessary IPIs.

Besides those facts, I'm working on the timer pull model [0]. With this,
non-pinned timers can also be expired by other CPUs and do not prevent CPUs
from going idle. Those timers will be enqueued on the local CPU without any
heuristics. This helps to improve behavior (regarding power) when a system
is idle. But the call of tick_nohz_next_event() will be more expensive, which
led to a regression during testing. That regression is gone with the new
teo implementation - it seems that there is also an improvement under
load. I do not have finalized numbers, as it is still WIP (I came across
some other possible optimizations while analyzing the regression, which
I'm evaluating at the moment).

Thanks,

        Anna-Maria


[0] https://lore.kernel.org/lkml/20230524070629.6377-1-anna-maria@linutronix.de/
Doug Smythies Aug. 10, 2023, 12:43 a.m. UTC | #10
On Wed, Aug 9, 2023 at 9:24 AM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
> On Tue, 8 Aug 2023, Doug Smythies wrote:
> > On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> [...]
>
> > > > Conclusions: Overall, I am not seeing a compelling reason to
> > > > proceed with this patch set.
> > >
> > > On the other hand, if there is a separate compelling reason to do
> > > that, it doesn't appear to lead to a major regression.
> >
> > Agreed.
> >
>
> Regarding the compelling reason:
>
> On a fully loaded machine with 256 CPUs tick_nohz_next_event() is executed
> ~48000 times per second. With this patchset it is reduced to ~120 times per
> second. The factor for the difference is 400. This is already an
> improvement.
>
> tick_nohz_next_event() marks timer bases idle, whenever possible - even if
> the tick is not stopped afterwards. When a timer is enqueued remote into an
> idle timer base an IPI is sent. Calling tick_nohz_next_event() only when
> the system is not that busy, prevents those unnecessary IPIs.
>
> Beside of those facts, I'm working on the timer pull model [0]. With this,
> non pinned timers can also be expired by other CPUs and do not prevent CPUs
> from going idle. Those timers will be enqueued on the local CPU without any
> heuristics. This helps to improve behavior when a system is idle (regarding
> power). But the call of tick_nohz_next_event() will be more expensive which
> led to a regression during testing. This regression is gone with the new
> teo implementation - it seems that there is also an improvement under
> load. I do not have finalized numbers, as it is still WIP (I came across
> some other possible optimizations during analyzing the regression, which
> I'm evaluating at the moment).
>
> Thanks,
>
>         Anna-Maria
>
>
> [0] https://lore.kernel.org/lkml/20230524070629.6377-1-anna-maria@linutronix.de/

Thank you for the context and the link.

... Doug
Rafael J. Wysocki Aug. 10, 2023, 7:27 a.m. UTC | #11
Hi Doug,

On Thu, Aug 10, 2023 at 3:08 AM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> Please bear with me. As you know I have many tests
> that search over a wide range of operating conditions
> looking for areas to focus on in more detail.
>
> On Tue, Aug 8, 2023 at 3:40 PM Doug Smythies <dsmythies@telus.net> wrote:
> > On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
> > > > On 2023.08.03 14:33 Rafael wrote:
> > > > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >>
> > > > >> Hi Folks,
> > > > >>
> > > > >> This is the second iteration of:
> > > > >>
> > > > >> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > > >>
> > > > >> with an additional patch.
> > > > >>
> > > > >> There are some small modifications of patch [1/3] and the new
> > > > >> patch causes governor statistics to play a role in deciding whether
> > > > >> or not to stop the scheduler tick.
> > > > >>
> > > > >> Testing would be much appreciated!
> > > > >
> > > > > For convenience, this series is now available in the following git branch:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > > > pm-cpuidle-teo
> > > >
> > > > Hi Rafael,
> > > >
> > > > Thank you for the git branch link.
> > > >
> > > > I did some testing:
> >
> >
> > ... deleted ...
> >
> > > > Test 2: 6 core ping pong sweep:
> > > >
> > > > Pass a token between 6 CPUs on 6 different cores.
> > > > Do a variable amount of work at each stop.
> > > >
> > > > Purpose: To utilize the midrange idle states
> > > > and observe the transitions from between use of
> > > > idle states.
> > > >
> > > > Results: There is some instability in the results
> > > > in the early stages.
> > > > For unknown reasons, the rjw governor sometimes works
> > > > slower and at lower power. The condition is not 100%
> > > > repeatable.
> > > >
> > > > Overall teo completed the test fastest (54.9 minutes)
> > > > Followed by menu (56.2 minutes), then rjw (56.7 minutes),
> > > > then ladder (58.4 minutes). teo is faster throughout the
> > > > latter stages of the test, but at the cost of more power.
> > > > The differences seem to be in the transition from idle
> > > > state 1 to idle state 2 usage.
> >
> > the magnitude of the later stages differences are significant.
> >
> > ... deleted ...
> >
> > > Thanks a lot for doing this work, much appreciated!
> > >
> > > > Conclusions: Overall, I am not seeing a compelling reason to
> > > > proceed with this patch set.
> > >
> > > On the other hand, if there is a separate compelling reason to do
> > > that, it doesn't appear to lead to a major regression.
> >
> > Agreed.
> >
> > Just for additional information, a 6 core dwell test was run.
> > The test conditions were cherry picked for dramatic effect:
> >
> > teo: average: 1162.13 uSec/loop ; Std dev: 0.38
> > ryw: average: 1266.45 uSec/loop ; Std dev: 6.53 ; +9%
> >
> > teo: average: 29.98 watts
> > rjw: average: 30.30 watts
> > (the same within thermal experimental error)
> >
> > Details (power and idle stats over the 45 minute test period):
> > http://smythies.com/~doug/linux/idle/teo-util2/6-13568-147097/perf/
>
> Okay, so while the occasional selection of a deeper idle state might be
> detrimental to latency-sensitive workflows such as the above, it is an
> overwhelming benefit to periodic workflows:
>
> Test 8: low load periodic workflow.
>
> There is an enormous range of work/sleep frequencies and loads
> to pick from. There was no cherry picking for this test.
>
> The only criterion is that the periodic fixed packet of work is
> completed before the start of the next period.
>
> Test 8 A: 1 load at about 3% and 347 Hz work/sleep frequency:
> teo average processor package power: 16.38 watts
> rjw average processor package power: 4.29 watts
> or 73.8% improvement!!!!!
>
> Test 8 B: 2 loads at about 3% and 347 Hz work/sleep frequency:
> teo average processor package power: 18.35 watts
> rjw average processor package power: 6.67 watts
> or 63.7% improvement!!!!!

This is very interesting, thank you!
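
A rough back-of-the-envelope for Test 8 A helps explain the gap (assuming
the work is spread evenly over each period): at 347 Hz the period is about
2.9 ms, and a ~3% load corresponds to only ~86 us of work per cycle. That
leaves roughly 2.8 ms of idle time in every period - long enough for a deep
idle state whenever the governor predicts the wakeup pattern correctly.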