Message ID | 20240428092852.1588188-1-liwei728@huawei.com |
---|---|
State | New |
Headers | show |
Series | cpufreq/cppc: changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init() | expand |
CC'ing few folks who are working with the driver. On 28-04-24, 17:28, liwei wrote: > When turning on turbo, if frequency configuration takes effect slowly, > the updated policy->cur may be equal to the frequency configured in > governor->limits(), performance governor will not adjust the frequency, > configured frequency will remain at turbo-freq. > > Simplified call stack looks as follows: > cpufreq_register_driver(&cppc_cpufreq_driver) > ... > cppc_cpufreq_cpu_init() > cppc_get_perf_caps() > policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) > cppc_set_perf(highest_perf) // set highest_perf > policy->cur = cpufreq_driver->get() // if cur == policy->max > cpufreq_init_policy() > ... > cpufreq_start_governor() // governor: performance > new_freq = cpufreq_driver->get() // if new_freq == policy->max > if (policy->cur != new_freq) > cpufreq_out_of_sync(policy, new_freq) > ... > policy->cur = new_freq > ... > policy->governor->limits() > __cpufreq_driver_target(policy->max) > if (policy->cur==target) > // generate error, keep set highest_perf > ret > cppc_set_perf(target) > > Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). > > Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") > Signed-off-by: liwei <liwei728@huawei.com> > --- > drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c > index 64420d9cfd1e..db04a82b8a97 100644 > --- a/drivers/cpufreq/cppc_cpufreq.c > +++ b/drivers/cpufreq/cppc_cpufreq.c > @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) > if (caps->highest_perf > caps->nominal_perf) > boost_supported = true; > > - /* Set policy->cur to max now. The governors will adjust later. */ > - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); > - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; > + /* Set policy->cur to norm now. */ > + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); > + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; > > ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); > if (ret) { > pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", > - caps->highest_perf, cpu, ret); > + caps->nominal_perf, cpu, ret); > goto out; > } > > -- > 2.25.1
On Mon, Apr 29, 2024 at 04:19:45PM +0530, Viresh Kumar wrote: >CC'ing few folks who are working with the driver. > >On 28-04-24, 17:28, liwei wrote: >> When turning on turbo, if frequency configuration takes effect slowly, >> the updated policy->cur may be equal to the frequency configured in >> governor->limits(), performance governor will not adjust the frequency, >> configured frequency will remain at turbo-freq. >> >> Simplified call stack looks as follows: >> cpufreq_register_driver(&cppc_cpufreq_driver) >> ... >> cppc_cpufreq_cpu_init() >> cppc_get_perf_caps() >> policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) >> cppc_set_perf(highest_perf) // set highest_perf >> policy->cur = cpufreq_driver->get() // if cur == policy->max >> cpufreq_init_policy() >> ... >> cpufreq_start_governor() // governor: performance >> new_freq = cpufreq_driver->get() // if new_freq == policy->max >> if (policy->cur != new_freq) >> cpufreq_out_of_sync(policy, new_freq) >> ... >> policy->cur = new_freq >> ... >> policy->governor->limits() >> __cpufreq_driver_target(policy->max) >> if (policy->cur==target) >> // generate error, keep set highest_perf >> ret >> cppc_set_perf(target) >> >> Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). >> >> Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") >> Signed-off-by: liwei <liwei728@huawei.com> >> --- >> drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c >> index 64420d9cfd1e..db04a82b8a97 100644 >> --- a/drivers/cpufreq/cppc_cpufreq.c >> +++ b/drivers/cpufreq/cppc_cpufreq.c >> @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) >> if (caps->highest_perf > caps->nominal_perf) >> boost_supported = true; >> >> - /* Set policy->cur to max now. The governors will adjust later. */ >> - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); >> - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; >> + /* Set policy->cur to norm now. */ >> + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); >> + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; >> This seems like the safer thing to do. Reviewed-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com> >> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); >> if (ret) { >> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", >> - caps->highest_perf, cpu, ret); >> + caps->nominal_perf, cpu, ret); >> goto out; >> } >> >> -- >> 2.25.1 > >-- >viresh
Hello Liwei, The change itself seems ok, but I'm not sure I understand what the issue is exactly. On 4/29/24 12:49, Viresh Kumar wrote: > CC'ing few folks who are working with the driver. > > On 28-04-24, 17:28, liwei wrote: >> When turning on turbo, if frequency configuration takes effect slowly, >> the updated policy->cur may be equal to the frequency configured in >> governor->limits(), performance governor will not adjust the frequency, >> configured frequency will remain at turbo-freq. >> >> Simplified call stack looks as follows: >> cpufreq_register_driver(&cppc_cpufreq_driver) >> ... >> cppc_cpufreq_cpu_init() >> cppc_get_perf_caps() >> policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) >> cppc_set_perf(highest_perf) // set highest_perf >> policy->cur = cpufreq_driver->get() // if cur == policy->max During the driver initialization, we have: cppc_cpufreq_cpu_init() \-policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) \-policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); \-cpu_data->perf_ctrls.desired_perf = caps->highest_perf; \-cppc_set_perf(cpu, &cpu_data->perf_ctrls); // set freq to highest_perf so here: policy->max = nominal_freq policy->cur = highest_freq And then for the cpufreq framework: cpufreq_online() // IIUC there is some delay here, so policy->cur = nominal_freq ? // i.e. the freq. was requested to change to the highest_freq, // but the change is not effective yet ? \-policy->cur = cpufreq_driver->get(policy->cpu); \-cpufreq_init_policy() \-cpufreq_set_policy() \-cpufreq_start_governor() \-cpufreq_verify_current_freq() \-new_freq = cpufreq_driver->get(policy->cpu); // new_freq = nominal_freq ? \-if (policy->cur != new_freq) \- cpufreq_out_of_sync() \- policy->cur = new_freq; \-cpufreq_start_governor() \-cpufreq_gov_performance_limits() \-__cpufreq_driver_target(target_freq=policy->max) // with policy->max = nominal_freq ? \-if (target_freq == policy->cur) \- // do nothing I am not sure I understand when you are turning the turbo on with: # echo 1 > /sys/devices/system/cpu/cpufreq/boost Or do you mean that turbo is available but not turned on ? Regards, Pierre >> cpufreq_init_policy() >> ... >> cpufreq_start_governor() // governor: performance >> new_freq = cpufreq_driver->get() // if new_freq == policy->max >> if (policy->cur != new_freq) >> cpufreq_out_of_sync(policy, new_freq) >> ... >> policy->cur = new_freq >> ... >> policy->governor->limits() >> __cpufreq_driver_target(policy->max) >> if (policy->cur==target) >> // generate error, keep set highest_perf >> ret >> cppc_set_perf(target) >> >> Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). >> >> Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") >> Signed-off-by: liwei <liwei728@huawei.com> >> --- >> drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c >> index 64420d9cfd1e..db04a82b8a97 100644 >> --- a/drivers/cpufreq/cppc_cpufreq.c >> +++ b/drivers/cpufreq/cppc_cpufreq.c >> @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) >> if (caps->highest_perf > caps->nominal_perf) >> boost_supported = true; >> >> - /* Set policy->cur to max now. The governors will adjust later. */ >> - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); >> - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; >> + /* Set policy->cur to norm now. */ >> + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); >> + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; >> >> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); >> if (ret) { >> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", >> - caps->highest_perf, cpu, ret); >> + caps->nominal_perf, cpu, ret); >> goto out; >> } >> >> -- >> 2.25.1 >
在 2024/5/3 22:19, Pierre Gondois 写道: > Hello Liwei, > The change itself seems ok, but I'm not sure I understand what the > issue is exactly. > > On 4/29/24 12:49, Viresh Kumar wrote: >> CC'ing few folks who are working with the driver. >> >> On 28-04-24, 17:28, liwei wrote: >>> When turning on turbo, if frequency configuration takes effect slowly, >>> the updated policy->cur may be equal to the frequency configured in >>> governor->limits(), performance governor will not adjust the frequency, >>> configured frequency will remain at turbo-freq. >>> >>> Simplified call stack looks as follows: >>> cpufreq_register_driver(&cppc_cpufreq_driver) >>> ... >>> cppc_cpufreq_cpu_init() >>> cppc_get_perf_caps() >>> policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) >>> cppc_set_perf(highest_perf) // set highest_perf >>> policy->cur = cpufreq_driver->get() // if cur == policy->max > > During the driver initialization, we have: > cppc_cpufreq_cpu_init() > \-policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) > \-policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); > \-cpu_data->perf_ctrls.desired_perf = caps->highest_perf; > \-cppc_set_perf(cpu, &cpu_data->perf_ctrls); // set freq to highest_perf > so here: > policy->max = nominal_freq > policy->cur = highest_freq > > > And then for the cpufreq framework: > cpufreq_online() > // IIUC there is some delay here, so policy->cur = nominal_freq ? > // i.e. the freq. was requested to change to the highest_freq, > // but the change is not effective yet ? > \-policy->cur = cpufreq_driver->get(policy->cpu); > \-cpufreq_init_policy() > \-cpufreq_set_policy() > \-cpufreq_start_governor() > \-cpufreq_verify_current_freq() > \-new_freq = cpufreq_driver->get(policy->cpu); // new_freq = > nominal_freq ? > \-if (policy->cur != new_freq) > \- cpufreq_out_of_sync() > \- policy->cur = new_freq; > \-cpufreq_start_governor() > \-cpufreq_gov_performance_limits() > \-__cpufreq_driver_target(target_freq=policy->max) // with > policy->max = nominal_freq ? > \-if (target_freq == policy->cur) > \- // do nothing > > I am not sure I understand when you are turning the turbo on with: > # echo 1 > /sys/devices/system/cpu/cpufreq/boost > > Or do you mean that turbo is available but not turned on ? > Sorry, my description is not clear enough. The scenario described above is during the kernel initialization process, turbo is available but boost is not turned on. I found this problem is to read /sys/devices/system/cpu/cpufreq/policyX/cpuinfo_cur_freq directly after OS startup, and found that some frequencies are in turbo state and /sys/devices/system/cpu/cpufreq/boost has not been modified, its value is still 0. LiWei > > Regards, > Pierre > >>> cpufreq_init_policy() >>> ... >>> cpufreq_start_governor() // governor: performance >>> new_freq = cpufreq_driver->get() // if new_freq == >>> policy->max >>> if (policy->cur != new_freq) >>> cpufreq_out_of_sync(policy, new_freq) >>> ... >>> policy->cur = new_freq >>> ... >>> policy->governor->limits() >>> __cpufreq_driver_target(policy->max) >>> if (policy->cur==target) >>> // generate error, keep set highest_perf >>> ret >>> cppc_set_perf(target) >>> >>> Fix this by changing highest_perf to nominal_perf in >>> cppc_cpufreq_cpu_init(). >>> >>> Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with >>> CPPC") >>> Signed-off-by: liwei <liwei728@huawei.com> >>> --- >>> drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cppc_cpufreq.c >>> b/drivers/cpufreq/cppc_cpufreq.c >>> index 64420d9cfd1e..db04a82b8a97 100644 >>> --- a/drivers/cpufreq/cppc_cpufreq.c >>> +++ b/drivers/cpufreq/cppc_cpufreq.c >>> @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct >>> cpufreq_policy *policy) >>> if (caps->highest_perf > caps->nominal_perf) >>> boost_supported = true; >>> - /* Set policy->cur to max now. The governors will adjust later. */ >>> - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); >>> - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; >>> + /* Set policy->cur to norm now. */ >>> + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); >>> + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; >>> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); >>> if (ret) { >>> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", >>> - caps->highest_perf, cpu, ret); >>> + caps->nominal_perf, cpu, ret); >>> goto out; >>> } >>> -- >>> 2.25.1 >>
Hi, Thanks for adding me to this. On Monday 29 Apr 2024 at 16:19:45 (+0530), Viresh Kumar wrote: > CC'ing few folks who are working with the driver. > > On 28-04-24, 17:28, liwei wrote: > > When turning on turbo, if frequency configuration takes effect slowly, > > the updated policy->cur may be equal to the frequency configured in > > governor->limits(), performance governor will not adjust the frequency, > > configured frequency will remain at turbo-freq. > > > > Simplified call stack looks as follows: > > cpufreq_register_driver(&cppc_cpufreq_driver) > > ... > > cppc_cpufreq_cpu_init() > > cppc_get_perf_caps() > > policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) > > cppc_set_perf(highest_perf) // set highest_perf > > policy->cur = cpufreq_driver->get() // if cur == policy->max > > cpufreq_init_policy() > > ... > > cpufreq_start_governor() // governor: performance > > new_freq = cpufreq_driver->get() // if new_freq == policy->max > > if (policy->cur != new_freq) > > cpufreq_out_of_sync(policy, new_freq) > > ... > > policy->cur = new_freq I believe the problem is here ^^^^^^^^^^^^^^^^^^^^^^. cpufreq_verify_current_freq() should not update policy->cur unless a request to change frequency has actually reached the driver. I believe policy->cur should always reflect the request, not the actual current frequency of the CPU. Given that new_freq is the current (hardware) frequency of the CPU, obtained via .get(), it can be the nominal frequency, as it is in your case, or any frequency, if there is any firmware/hardware capping in place. This causes the issue in your scenario, in which __cpufreq_driver_target() filters the request from the governor as it finds it equal to policy->cur, and it believes it's already set by hardware. This causes another issue in which scaling_cur_freq, which for some systems returns policy->cur, ends up returning the hardware frequency of the CPUs, and not the last frequency request, as it should: "scaling_cur_freq Current frequency of all of the CPUs belonging to this policy (in kHz). In the majority of cases, this is the frequency of the last P-state requested by the scaling driver from the hardware using the scaling interface provided by it, which may or may not reflect the frequency the CPU is actually running at (due to hardware design and other limitations)." [1] Therefore policy->cur gets polluted with the hardware frequency of the CPU sampled at that one time, and this affects governor decisions, as in your case, and scaling_cur_freq feedback as well. This bad value will not change until there's another .target() or cpufreq_out_of_sync() call, which will never happen for fixed frequency governors like the performance governor. Thanks, Ionela. [1] https://docs.kernel.org/admin-guide/pm/cpufreq.html > > ... > > policy->governor->limits() > > __cpufreq_driver_target(policy->max) > > if (policy->cur==target) > > // generate error, keep set highest_perf > > ret > > cppc_set_perf(target) > > > > Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). > > > > Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") > > Signed-off-by: liwei <liwei728@huawei.com> > > --- > > drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c > > index 64420d9cfd1e..db04a82b8a97 100644 > > --- a/drivers/cpufreq/cppc_cpufreq.c > > +++ b/drivers/cpufreq/cppc_cpufreq.c > > @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) > > if (caps->highest_perf > caps->nominal_perf) > > boost_supported = true; > > > > - /* Set policy->cur to max now. The governors will adjust later. */ > > - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); > > - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; > > + /* Set policy->cur to norm now. */ > > + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); > > + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; > > > > ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); > > if (ret) { > > pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", > > - caps->highest_perf, cpu, ret); > > + caps->nominal_perf, cpu, ret); > > goto out; > > } > > > > -- > > 2.25.1 > > -- > viresh
Hello, Thanks for for your reply. Maybe my description has caused you some misunderstandings, please allow me to supplement the description 在 2024/5/7 18:25, Ionela Voinescu 写道: > Hi, > > Thanks for adding me to this. > > On Monday 29 Apr 2024 at 16:19:45 (+0530), Viresh Kumar wrote: >> CC'ing few folks who are working with the driver. >> >> On 28-04-24, 17:28, liwei wrote: >>> When turning on turbo, if frequency configuration takes effect slowly, >>> the updated policy->cur may be equal to the frequency configured in >>> governor->limits(), performance governor will not adjust the frequency, >>> configured frequency will remain at turbo-freq. >>> >>> Simplified call stack looks as follows: >>> cpufreq_register_driver(&cppc_cpufreq_driver) >>> ... >>> cppc_cpufreq_cpu_init() >>> cppc_get_perf_caps() >>> policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) >>> cppc_set_perf(highest_perf) // set highest_perf >>> policy->cur = cpufreq_driver->get() // if cur == policy->max >>> cpufreq_init_policy() >>> ... >>> cpufreq_start_governor() // governor: performance >>> new_freq = cpufreq_driver->get() // if new_freq == policy->max >>> if (policy->cur != new_freq) >>> cpufreq_out_of_sync(policy, new_freq) >>> ... >>> policy->cur = new_freq > I believe the problem is here ^^^^^^^^^^^^^^^^^^^^^^. > > cpufreq_verify_current_freq() should not update policy->cur unless a > request to change frequency has actually reached the driver. I believe > policy->cur should always reflect the request, not the actual current > frequency of the CPU. > > Given that new_freq is the current (hardware) frequency of the CPU, > obtained via .get(), it can be the nominal frequency, as it is in your > case, or any frequency, if there is any firmware/hardware capping in > place. > > This causes the issue in your scenario, in which __cpufreq_driver_target() > filters the request from the governor as it finds it equal to policy->cur, > and it believes it's already set by hardware. > > This causes another issue in which scaling_cur_freq, which for some > systems returns policy->cur, ends up returning the hardware frequency of > the CPUs, and not the last frequency request, as it should: > > "scaling_cur_freq > Current frequency of all of the CPUs belonging to this policy (in kHz). > > In the majority of cases, this is the frequency of the last P-state > requested by the scaling driver from the hardware using the scaling > interface provided by it, which may or may not reflect the frequency > the CPU is actually running at (due to hardware design and other > limitations)." [1] > > Therefore policy->cur gets polluted with the hardware frequency of the > CPU sampled at that one time, and this affects governor decisions, as > in your case, and scaling_cur_freq feedback as well. This bad value will > not change until there's another .target() or cpufreq_out_of_sync() > call, which will never happen for fixed frequency governors like the > performance governor. > > Thanks, > Ionela. > In the above function calling process, the frequency is obtained twice. The first time is in cpufreq_online(), and the second time is in cpufreq_verify_current_freq(). When the frequency configuration takes effect slowly, the kernel cannot sense when the frequency configuration takes effect. It may take effect before the frequency is read twice, between the frequencies read twice, or after the frequency is read twice. |------------------|--------------------|---------------------| set highest_freq get() get() target() If it takes effect before two read operations, there will be no problem. If it takes effect between two read operations, policy->cur will be updated in cpufreq_verify_current_freq(), the execution path is as follows: new_freq = cpufreq_driver->get() // new_freq = turbo_freq if (policy->cur != new_freq) cpufreq_out_of_sync(policy, new_freq) ... policy->cur = new_freq // cur = turbo_freq ... __cpufreq_driver_target(policy->max) cppc_set_perf(target) // policy->cur!=target Reconfigure frequency to policy->max. If policy->cur is not set to turbo_freq after two read operations, policy->cur will not be updated in cpufreq_verify_current_freq(), the execution path is as follows: new_freq = cpufreq_driver->get() // new_freq == policy->cur if (policy->cur != new_freq) ... __cpufreq_driver_target(policy->max) ret // policy->cur==target Configured frequency will remain at turbo-freq. When reading scaling_cur_freq, the frequency value that may be read is policy->cur. If arch does not implement arch_freq_get_on_cpu(), and the registered cpufreq_driver does not define setpolicy()/get(), the frequency will not be obtained through the get() and will directly feed back policy->cur. If the above problem occurs, no exception will be detected when reading scaling_cur_freq. But reading cpuinfo_cur_freq will reacquire the frequency through the get() interface and feedback the newly acquired frequency value. Thanks liwei > > [1] https://docs.kernel.org/admin-guide/pm/cpufreq.html > >>> ... >>> policy->governor->limits() >>> __cpufreq_driver_target(policy->max) >>> if (policy->cur==target) >>> // generate error, keep set highest_perf >>> ret >>> cppc_set_perf(target) >>> >>> Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). >>> >>> Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") >>> Signed-off-by: liwei <liwei728@huawei.com> >>> --- >>> drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c >>> index 64420d9cfd1e..db04a82b8a97 100644 >>> --- a/drivers/cpufreq/cppc_cpufreq.c >>> +++ b/drivers/cpufreq/cppc_cpufreq.c >>> @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) >>> if (caps->highest_perf > caps->nominal_perf) >>> boost_supported = true; >>> >>> - /* Set policy->cur to max now. The governors will adjust later. */ >>> - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); >>> - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; >>> + /* Set policy->cur to norm now. */ >>> + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); >>> + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; >>> >>> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); >>> if (ret) { >>> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", >>> - caps->highest_perf, cpu, ret); >>> + caps->nominal_perf, cpu, ret); >>> goto out; >>> } >>> >>> -- >>> 2.25.1 >> >> -- >> viresh
Hi, On Friday 10 May 2024 at 11:06:50 (+0800), liwei (JK) wrote: > Hello, > > Thanks for for your reply. > > Maybe my description has caused you some misunderstandings, please allow me > to supplement the description > > 在 2024/5/7 18:25, Ionela Voinescu 写道: > > Hi, > > > > Thanks for adding me to this. > > > > On Monday 29 Apr 2024 at 16:19:45 (+0530), Viresh Kumar wrote: > > > CC'ing few folks who are working with the driver. > > > > > > On 28-04-24, 17:28, liwei wrote: > > > > When turning on turbo, if frequency configuration takes effect slowly, > > > > the updated policy->cur may be equal to the frequency configured in > > > > governor->limits(), performance governor will not adjust the frequency, > > > > configured frequency will remain at turbo-freq. > > > > > > > > Simplified call stack looks as follows: > > > > cpufreq_register_driver(&cppc_cpufreq_driver) > > > > ... > > > > cppc_cpufreq_cpu_init() > > > > cppc_get_perf_caps() > > > > policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) > > > > cppc_set_perf(highest_perf) // set highest_perf > > > > policy->cur = cpufreq_driver->get() // if cur == policy->max > > > > cpufreq_init_policy() > > > > ... > > > > cpufreq_start_governor() // governor: performance > > > > new_freq = cpufreq_driver->get() // if new_freq == policy->max > > > > if (policy->cur != new_freq) > > > > cpufreq_out_of_sync(policy, new_freq) > > > > ... > > > > policy->cur = new_freq > > I believe the problem is here ^^^^^^^^^^^^^^^^^^^^^^. > > > > cpufreq_verify_current_freq() should not update policy->cur unless a > > request to change frequency has actually reached the driver. I believe > > policy->cur should always reflect the request, not the actual current > > frequency of the CPU. > > > > Given that new_freq is the current (hardware) frequency of the CPU, > > obtained via .get(), it can be the nominal frequency, as it is in your > > case, or any frequency, if there is any firmware/hardware capping in > > place. > > > > This causes the issue in your scenario, in which __cpufreq_driver_target() > > filters the request from the governor as it finds it equal to policy->cur, > > and it believes it's already set by hardware. > > > > This causes another issue in which scaling_cur_freq, which for some > > systems returns policy->cur, ends up returning the hardware frequency of > > the CPUs, and not the last frequency request, as it should: > > > > "scaling_cur_freq > > Current frequency of all of the CPUs belonging to this policy (in kHz). > > > > In the majority of cases, this is the frequency of the last P-state > > requested by the scaling driver from the hardware using the scaling > > interface provided by it, which may or may not reflect the frequency > > the CPU is actually running at (due to hardware design and other > > limitations)." [1] > > > > Therefore policy->cur gets polluted with the hardware frequency of the > > CPU sampled at that one time, and this affects governor decisions, as > > in your case, and scaling_cur_freq feedback as well. This bad value will > > not change until there's another .target() or cpufreq_out_of_sync() > > call, which will never happen for fixed frequency governors like the > > performance governor. > > > > Thanks, > > Ionela. > > > > In the above function calling process, the frequency is obtained twice. The > first time is in cpufreq_online(), and the second time is in > cpufreq_verify_current_freq(). > > When the frequency configuration takes effect slowly, the kernel cannot > sense when the frequency configuration takes effect. It may take effect > before the frequency is read twice, between the frequencies read twice, or > after the frequency is read twice. > > |------------------|--------------------|---------------------| > set highest_freq get() get() target() > > If it takes effect before two read operations, there will be no problem. > > If it takes effect between two read operations, policy->cur will be updated > in cpufreq_verify_current_freq(), the execution path is as follows: > new_freq = cpufreq_driver->get() // new_freq = turbo_freq > if (policy->cur != new_freq) > cpufreq_out_of_sync(policy, new_freq) > ... > policy->cur = new_freq // cur = turbo_freq > ... > __cpufreq_driver_target(policy->max) > cppc_set_perf(target) // policy->cur!=target > > Reconfigure frequency to policy->max. > > If policy->cur is not set to turbo_freq after two read operations, > policy->cur will not be updated in cpufreq_verify_current_freq(), the > execution path is as follows: > new_freq = cpufreq_driver->get() // new_freq == policy->cur > if (policy->cur != new_freq) > ... > __cpufreq_driver_target(policy->max) > ret // policy->cur==target > > Configured frequency will remain at turbo-freq. > > When reading scaling_cur_freq, the frequency value that may be read is > policy->cur. If arch does not implement arch_freq_get_on_cpu(), and the > registered cpufreq_driver does not define setpolicy()/get(), the frequency > will not be obtained through the get() and will directly feed back > policy->cur. If the above problem occurs, no exception will be detected when > reading scaling_cur_freq. But reading cpuinfo_cur_freq will reacquire the > frequency through the get() interface and feedback the newly acquired > frequency value. Thank you for the details. I did understand the problem, but I believe the underlying cause is cpufreq_out_of_sync() setting policy->cur to the current frequency and not keeping the value of the last frequency request. @Viresh, do you happen to know the reason behind this? There are multiple issues caused by this, detailed at [1] (your patch), [2] (the other issue described by me above), and more recently [3]. I agree that your code is a good fix for [1] and [3] is a fix for both [2] and [3], if I'm not mistaken, but to me these are "tweaks" that bypass the fundamental issue in the cpufreq core and I would not be surprised to see other issues in the future caused by this, and not covered by the fixes at [1] and [3]. This being said, I would like to see these issues fixed, even by [1] and [3], if fixing the underlying cause is not feasible (or at least not easy to evaluate). [1] https://lore.kernel.org/lkml/20240428092852.1588188-1-liwei728@huawei.com/ [2] https://lore.kernel.org/lkml/3e6077bb-907c-057f-0896-d0a5814a4229@nvidia.com/ [3] https://lore.kernel.org/lkml/TYCP286MB2486B1D734F8E2D74BFBEEB1B1F32@TYCP286MB2486.JPNP286.PROD.OUTLOOK.COM/ Hope it helps, Ionela. > > Thanks > liwei > > > > > [1] https://docs.kernel.org/admin-guide/pm/cpufreq.html > > > > > > ... > > > > policy->governor->limits() > > > > __cpufreq_driver_target(policy->max) > > > > if (policy->cur==target) > > > > // generate error, keep set highest_perf > > > > ret > > > > cppc_set_perf(target) > > > > > > > > Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). > > > > > > > > Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") > > > > Signed-off-by: liwei <liwei728@huawei.com> > > > > --- > > > > drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c > > > > index 64420d9cfd1e..db04a82b8a97 100644 > > > > --- a/drivers/cpufreq/cppc_cpufreq.c > > > > +++ b/drivers/cpufreq/cppc_cpufreq.c > > > > @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) > > > > if (caps->highest_perf > caps->nominal_perf) > > > > boost_supported = true; > > > > - /* Set policy->cur to max now. The governors will adjust later. */ > > > > - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); > > > > - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; > > > > + /* Set policy->cur to norm now. */ > > > > + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); > > > > + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; > > > > ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); > > > > if (ret) { > > > > pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", > > > > - caps->highest_perf, cpu, ret); > > > > + caps->nominal_perf, cpu, ret); > > > > goto out; > > > > } > > > > -- > > > > 2.25.1 > > > > > > -- > > > viresh
On 10-05-24, 11:06, liwei (JK) wrote: > In the above function calling process, the frequency is obtained twice. The > first time is in cpufreq_online(), and the second time is in > cpufreq_verify_current_freq(). > > When the frequency configuration takes effect slowly, the kernel cannot > sense when the frequency configuration takes effect. Who is making this frequency change. I am not sure I understand how is that taking place slowly. > It may take effect > before the frequency is read twice, between the frequencies read twice, or > after the frequency is read twice. > > |------------------|--------------------|---------------------| > set highest_freq get() get() target() > > If it takes effect before two read operations, there will be no problem. > > If it takes effect between two read operations, policy->cur will be updated > in cpufreq_verify_current_freq(), the execution path is as follows: > new_freq = cpufreq_driver->get() // new_freq = turbo_freq > if (policy->cur != new_freq) > cpufreq_out_of_sync(policy, new_freq) > ... > policy->cur = new_freq // cur = turbo_freq > ... > __cpufreq_driver_target(policy->max) > cppc_set_perf(target) // policy->cur!=target > > Reconfigure frequency to policy->max. > > If policy->cur is not set to turbo_freq after two read operations, > policy->cur will not be updated in cpufreq_verify_current_freq(), the > execution path is as follows: > new_freq = cpufreq_driver->get() // new_freq == policy->cur > if (policy->cur != new_freq) > ... > __cpufreq_driver_target(policy->max) > ret // policy->cur==target > > Configured frequency will remain at turbo-freq. > > When reading scaling_cur_freq, the frequency value that may be read is > policy->cur. If arch does not implement arch_freq_get_on_cpu(), and the > registered cpufreq_driver does not define setpolicy()/get(), the frequency > will not be obtained through the get() and will directly feed back > policy->cur. If the above problem occurs, no exception will be detected when > reading scaling_cur_freq. But reading cpuinfo_cur_freq will reacquire the > frequency through the get() interface and feedback the newly acquired > frequency value.
On 05-06-24, 15:26, Ionela Voinescu wrote: > > > > > cpufreq_start_governor() // governor: performance > > > > > new_freq = cpufreq_driver->get() // if new_freq == policy->max > > > > > if (policy->cur != new_freq) > > > > > cpufreq_out_of_sync(policy, new_freq) > > > > > ... > > > > > policy->cur = new_freq > > > I believe the problem is here ^^^^^^^^^^^^^^^^^^^^^^. > > > > > > cpufreq_verify_current_freq() should not update policy->cur unless a > > > request to change frequency has actually reached the driver. I believe > > > policy->cur should always reflect the request, not the actual current > > > frequency of the CPU. There are times when the core doesn't have any prior information about the frequency, for example at driver probe time and resume. And so needs to set policy->cur by reading it from the hardware. > > > Given that new_freq is the current (hardware) frequency of the CPU, > > > obtained via .get(), it can be the nominal frequency, as it is in your > > > case, or any frequency, if there is any firmware/hardware capping in > > > place. > > > > > > This causes the issue in your scenario, in which __cpufreq_driver_target() > > > filters the request from the governor as it finds it equal to policy->cur, > > > and it believes it's already set by hardware. I am still not sure why mismatch happens at boot time here. > > > This causes another issue in which scaling_cur_freq, which for some > > > systems returns policy->cur, ends up returning the hardware frequency of > > > the CPUs, and not the last frequency request, as it should: > > > > > > "scaling_cur_freq > > > Current frequency of all of the CPUs belonging to this policy (in kHz). > > > > > > In the majority of cases, this is the frequency of the last P-state > > > requested by the scaling driver from the hardware using the scaling > > > interface provided by it, which may or may not reflect the frequency > > > the CPU is actually running at (due to hardware design and other > > > limitations)." [1] There is discussion going on about this in another thread [1] now. > > > Therefore policy->cur gets polluted with the hardware frequency of the > > > CPU sampled at that one time, and this affects governor decisions, as > > > in your case, and scaling_cur_freq feedback as well. This bad value will > > > not change until there's another .target() or cpufreq_out_of_sync() > > > call, which will never happen for fixed frequency governors like the > > > performance governor. -- viresh [1] https://lore.kernel.org/all/20240603081331.3829278-2-beata.michalska@arm.com/
On 11-06-24, 11:29, Ionela Voinescu wrote: > Yes, I mean we should only update policy->cur to a requested frequency > from a governor, after we start it (cpufreq_start_governor()). Hmm, I went through the code one more time and there is more to it I guess. During the earlier days (when the cpufreq core was taking shape), I think the basic idea behind keeping the policy->cur field was (which is true today as well): - Know current frequency of a CPU, quickly and so we cached it. - Avoid a frequency change, and not waste time, if we know that we aren't going to change the frequency eventually. - Ideally we can just remove the field, if the get() operation has zero overhead. But it doesn't and so we have this field. What the core does now is this: - Once the frequency is changed, set policy->cur to the frequency that the software believes the hardware is running at. This can be different from the requested frequency for example. - If we end up getting the real frequency from the hardware somehow, update policy->cur to match and so to avoid wasting of time later. - And I think this is the right thing to do even today. > But currently policy->cur gets updated to the .get() returned value in > multiple places, via cpufreq_verify_current_freq() (for example from > show_cpuinfo_cur_freq() or cpufreq_get(). Which makes sense based on above.. > .get() is meant to return the current frequency of the hardware and that > can opportunistically be different from the last request made. Right, and so if we have called get() for some other reason, like the user requesting it, we opportunistically fix policy->cur as well. > (+ we probably should force the first request from a governor to go > through to the driver to make sure the policy->cur obtained before, > via .get(), did not just happen to coincide with the governor request, > therefore making the request no longer go through to the driver: see > __cpufreq_driver_target) Not sure why would we want to do that. > > needs to be up to date all the times, it is an important part of the > > entire working of the cpufreq core.. > > When you say that "policy->cur must be kept up to date at all times", > I suppose you mean that it should be kept up to date with any frequency > change requests not with any changes happening in hardware? Yes and no. It should be up to date with the best knowledge that the software has, without any extra overheads.
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 64420d9cfd1e..db04a82b8a97 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -669,14 +669,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) if (caps->highest_perf > caps->nominal_perf) boost_supported = true; - /* Set policy->cur to max now. The governors will adjust later. */ - policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); - cpu_data->perf_ctrls.desired_perf = caps->highest_perf; + /* Set policy->cur to norm now. */ + policy->cur = cppc_perf_to_khz(caps, caps->nominal_perf); + cpu_data->perf_ctrls.desired_perf = caps->nominal_perf; ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); if (ret) { pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", - caps->highest_perf, cpu, ret); + caps->nominal_perf, cpu, ret); goto out; }
When turning on turbo, if frequency configuration takes effect slowly, the updated policy->cur may be equal to the frequency configured in governor->limits(), performance governor will not adjust the frequency, configured frequency will remain at turbo-freq. Simplified call stack looks as follows: cpufreq_register_driver(&cppc_cpufreq_driver) ... cppc_cpufreq_cpu_init() cppc_get_perf_caps() policy->max = cppc_perf_to_khz(caps, caps->nominal_perf) cppc_set_perf(highest_perf) // set highest_perf policy->cur = cpufreq_driver->get() // if cur == policy->max cpufreq_init_policy() ... cpufreq_start_governor() // governor: performance new_freq = cpufreq_driver->get() // if new_freq == policy->max if (policy->cur != new_freq) cpufreq_out_of_sync(policy, new_freq) ... policy->cur = new_freq ... policy->governor->limits() __cpufreq_driver_target(policy->max) if (policy->cur==target) // generate error, keep set highest_perf ret cppc_set_perf(target) Fix this by changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init(). Fixes: 5477fb3bd1e8 ("ACPI / CPPC: Add a CPUFreq driver for use with CPPC") Signed-off-by: liwei <liwei728@huawei.com> --- drivers/cpufreq/cppc_cpufreq.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)