Message ID | 20221105174225.28673-1-rui.zhang@intel.com |
---|---|
State | New |
Headers | show |
Series | [RFC,1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 | expand |
On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote: > > ladder_device_state.threshold.promotion_time_ns/demotion_time_ns > are u64 type. > > In ladder_select_state(), variable 'last_residency', as calculated by > > last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns > > are s64 type, and it can be negative value. The code changes are fine AFAICS, but the description below could be more precise. > When this happens, comparing between 'last_residency' and > 'promotion_time_ns/demotion_time_ns' become bogus. IIUC, what happens is that last_residency is converted to u64 in the comparison expression and that conversion causes it to become a large positive number if it is negative. > As a result, the ladder governor promotes or stays with current state errornously. "promotes or retains the current state erroneously". > > <idle>-0 [001] d..1. 151.893396: ladder_select_state: last_idx 7, last_residency -373033 > <idle>-0 [001] d..1. 151.893399: ladder_select_state: dev->last_residency_ns 106967, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 151.893402: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 > <idle>-0 [001] d..1. 151.893404: ladder_select_state: ---> new state 7 > <idle>-0 [001] d..1. 151.893465: ladder_select_state: last_idx 7, last_residency -463800 > <idle>-0 [001] d..1. 151.893467: ladder_select_state: dev->last_residency_ns 16200, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 151.893468: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 > <idle>-0 [001] dn.1. 151.893470: ladder_select_state: ---> new state 8 > > Given that promotion_time_ns/demotion_time_ns are initialized with > cpuidle_state.exit_latency_ns, which is s64 type, and they are used to > compare with 'last_residency', which is also s64 type, there is no "they are compared with" > reason to use u64 for promotion_time_ns/demotion_time_ns. "so change them both to be s64". > With this patch, > <idle>-0 [001] d..1. 523.578531: ladder_select_state: last_idx 8, last_residency -879453 > <idle>-0 [001] d..1. 523.578531: ladder_select_state: dev->last_residency_ns 10547, drv->states[last_idx].exit_latency_ns 890000 > <idle>-0 [001] d..1. 523.578532: ladder_select_state: demote , last_state->threshold.demotion_time_ns 890000 > <idle>-0 [001] d..1. 523.578532: ladder_select_state: ---> new state 7 > <idle>-0 [001] d..1. 523.580220: ladder_select_state: last_idx 7, last_residency -169629 > <idle>-0 [001] d..1. 523.580221: ladder_select_state: dev->last_residency_ns 310371, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 523.580221: ladder_select_state: demote , last_state->threshold.demotion_time_ns 480000 > <idle>-0 [001] d..1. 523.580222: ladder_select_state: ---> new state 6 > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > drivers/cpuidle/governors/ladder.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c > index 8e9058c4ea63..fb61118aef37 100644 > --- a/drivers/cpuidle/governors/ladder.c > +++ b/drivers/cpuidle/governors/ladder.c > @@ -27,8 +27,8 @@ struct ladder_device_state { > struct { > u32 promotion_count; > u32 demotion_count; > - u64 promotion_time_ns; > - u64 demotion_time_ns; > + s64 promotion_time_ns; > + s64 demotion_time_ns; > } threshold; > struct { > int promotion_count; > --
On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote: > > After fixing the bogus comparison between u64 and s64, the ladder > governor stops making promotion decisions errornously. > > However, after this, it is found that the ladder governor demotes much > easier than promotes. "After fixing an error related to using signed and unsigned integers in the ladder governor in a previous patch, that governor turns out to demote much easier than promote" > Below is captured using turbostat after a 30 seconds runtime idle, > > Without previous patch, > Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt > 0.30 2373 0 0 0 4 9 25 122 326 2857 0.36 0.04 0.57 98.73 1.48 Why is the above relevant? > With previous patch, > Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt > 0.42 3071 0 771 838 447 327 336 382 299 344 34.18 16.21 17.69 31.51 2.00 > > And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT. I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT imbalance causes this. I guess more residency in the deeper idle state is expected? Or desired?? > With this patch, > Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt > 0.39 2436 0 1 72 177 51 194 243 799 1883 0.50 0.32 0.35 98.45 1.53 > > Note that this is an experimental patch to illustrate the problem, > and it is checked with idle scenario only for now. > I will try to evaluate with more scenarios, and if someone can help > evaluate with more scenarios at the same time and provide data for the > benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that > would be great. So yes, this requires more work. Overall, I think that you are concerned that the previous change might be regarded as a regression and are trying to compensate for it with a PROMOTION_COUNT/DEMOTION_COUNT change. I'm not sure I can agree with that approach, because the shallower idle states might be preferred by the original ladder design intentionally, for performance reasons. > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > drivers/cpuidle/governors/ladder.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c > index fb61118aef37..4b47aa0a4da9 100644 > --- a/drivers/cpuidle/governors/ladder.c > +++ b/drivers/cpuidle/governors/ladder.c > @@ -20,8 +20,8 @@ > #include <asm/io.h> > #include <linux/uaccess.h> > > -#define PROMOTION_COUNT 4 > -#define DEMOTION_COUNT 1 > +#define PROMOTION_COUNT 2 > +#define DEMOTION_COUNT 4 > > struct ladder_device_state { > struct { > --
On 2022.11.23 09:50 Rafael wrote: > On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote: >> >> After fixing the bogus comparison between u64 and s64, the ladder >> governor stops making promotion decisions errornously. >> >> However, after this, it is found that the ladder governor demotes much >> easier than promotes. > > "After fixing an error related to using signed and unsigned integers > in the ladder governor in a previous patch, that governor turns out to > demote much easier than promote" > >> Below is captured using turbostat after a 30 seconds runtime idle, >> >> Without previous patch, >> Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt >> 0.30 2373 0 0 0 4 9 25 122 326 2857 0.36 0.04 0.57 98.73 1.48 > > Why is the above relevant? > >> With previous patch, >> Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt >> 0.42 3071 0 771 838 447 327 336 382 299 344 34.18 16.21 17.69 31.51 2.00 >> >> And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT. > > I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT > imbalance causes this. > > I guess more residency in the deeper idle state is expected? Or desired?? > >> With this patch, >> Busy% IRQ POLL C1 C1E C3 C6 C7s C8 C9 C10 CPU%c1 CPU%c3 CPU%c6 CPU%c7 PkgWatt >> 0.39 2436 0 1 72 177 51 194 243 799 1883 0.50 0.32 0.35 98.45 1.53 >> >> Note that this is an experimental patch to illustrate the problem, >> and it is checked with idle scenario only for now. >> I will try to evaluate with more scenarios, and if someone can help >> evaluate with more scenarios at the same time and provide data for the >> benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that >> would be great. > > So yes, this requires more work. > > Overall, I think that you are concerned that the previous change might > be regarded as a regression and are trying to compensate for it with a > PROMOTION_COUNT/DEMOTION_COUNT change. > > I'm not sure I can agree with that approach, because the shallower > idle states might be preferred by the original ladder design > intentionally, for performance reasons. Hi All, Because I was continuing to test the teo governor with the util patch version 4, it was fairly easy for me to test this patch set also. However, I have had difficulties having enough time to write up my results. The best improvement was for a slow speed ping-pong (I did 3 speeds, fast, medium, and slow) 2 pairs of ping pongs, not forced CPU affinity, schedutil CPU scaling governor, intel_cpufreq CPU scaling driver, HWP disabled. The menu governor was considered the master reference: Old ladder was 44% slower. New ladder was 5.9% slower. Just for reference: Old teo was 29% slower. teo util V4 was 13% slower. The worst degradation was for a fast speed ping-pong 2 pairs of ping pongs, not forced CPU affinity, schedutil CPU scaling governor, intel_cpufreq CPU scaling driver, HWP disabled. The menu governor was considered the master reference: Old ladder was 64% slower. New ladder was 71% slower. Interestingly, the old ladder governor outperformed the menu governor on the slow 2 pair ping-pong with the performance governor: Old ladder was 0.56% faster. New ladder was 0.81% slower. Disclaimer: Test results using the schedutil CPU scaling governor are noisy, with questionable repeatability. I'll try to get all the test results written up soon. ... Doug
> > > I don't have a solid proof for this. But at least for the pure idle > > scenario, I don't think 30% deep idle residency is the right > > behavior, > > and it needs to be tuned anyway. > > Well, have you checked what happens if the counts are set to the same > value, e.g. 2? Well, this is embarrassing. I found a problem with my previous data when I re-evaluate following your suggestion. In short, 1. the 30% deep idle residency problem was got when I added some trace_printk() in the ladder_select_state() 2, without those trace_printk(), after patch 1, the ladder governor can still get 98% CPU%c7 in pure idle scenario. Currently, my understanding is that trace_printk() can call __schedule() and this increased the chance that call_cpuidle() returns immediately. When this happens, dev->last_residency_ns is set to 0 and results in a real demotion next time. Anyway, you are right on questioning this approach, because this seems to be a different problem or even a false alarm. So, I think I will submit patch 1/3 and 3/3 as they are bug fixes, and drop this patch for now, and leave the tuning work, if there is any, for the real ladder governor users. What do you think? thanks, rui
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c index 8e9058c4ea63..fb61118aef37 100644 --- a/drivers/cpuidle/governors/ladder.c +++ b/drivers/cpuidle/governors/ladder.c @@ -27,8 +27,8 @@ struct ladder_device_state { struct { u32 promotion_count; u32 demotion_count; - u64 promotion_time_ns; - u64 demotion_time_ns; + s64 promotion_time_ns; + s64 demotion_time_ns; } threshold; struct { int promotion_count;
ladder_device_state.threshold.promotion_time_ns/demotion_time_ns are u64 type. In ladder_select_state(), variable 'last_residency', as calculated by last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns are s64 type, and it can be negative value. When this happens, comparing between 'last_residency' and 'promotion_time_ns/demotion_time_ns' become bogus. As a result, the ladder governor promotes or stays with current state errornously. <idle>-0 [001] d..1. 151.893396: ladder_select_state: last_idx 7, last_residency -373033 <idle>-0 [001] d..1. 151.893399: ladder_select_state: dev->last_residency_ns 106967, drv->states[last_idx].exit_latency_ns 480000 <idle>-0 [001] d..1. 151.893402: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 <idle>-0 [001] d..1. 151.893404: ladder_select_state: ---> new state 7 <idle>-0 [001] d..1. 151.893465: ladder_select_state: last_idx 7, last_residency -463800 <idle>-0 [001] d..1. 151.893467: ladder_select_state: dev->last_residency_ns 16200, drv->states[last_idx].exit_latency_ns 480000 <idle>-0 [001] d..1. 151.893468: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 <idle>-0 [001] dn.1. 151.893470: ladder_select_state: ---> new state 8 Given that promotion_time_ns/demotion_time_ns are initialized with cpuidle_state.exit_latency_ns, which is s64 type, and they are used to compare with 'last_residency', which is also s64 type, there is no reason to use u64 for promotion_time_ns/demotion_time_ns. With this patch, <idle>-0 [001] d..1. 523.578531: ladder_select_state: last_idx 8, last_residency -879453 <idle>-0 [001] d..1. 523.578531: ladder_select_state: dev->last_residency_ns 10547, drv->states[last_idx].exit_latency_ns 890000 <idle>-0 [001] d..1. 523.578532: ladder_select_state: demote , last_state->threshold.demotion_time_ns 890000 <idle>-0 [001] d..1. 523.578532: ladder_select_state: ---> new state 7 <idle>-0 [001] d..1. 523.580220: ladder_select_state: last_idx 7, last_residency -169629 <idle>-0 [001] d..1. 523.580221: ladder_select_state: dev->last_residency_ns 310371, drv->states[last_idx].exit_latency_ns 480000 <idle>-0 [001] d..1. 523.580221: ladder_select_state: demote , last_state->threshold.demotion_time_ns 480000 <idle>-0 [001] d..1. 523.580222: ladder_select_state: ---> new state 6 Signed-off-by: Zhang Rui <rui.zhang@intel.com> --- drivers/cpuidle/governors/ladder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)