[v3,0/2] cpufreq/amd-pstate: Set initial min_freq to lowest_nonlinear_freq

Message ID 20241017053927.25285-1-Dhananjay.Ugwekar@amd.com

Message

Dhananjay Ugwekar Oct. 17, 2024, 5:39 a.m. UTC
According to the AMD architectural programmer's manual volume 2 [1], 
in section "17.6.4.1 CPPC_CAPABILITY_1" lowest_nonlinear_perf is described 
as "Reports the most energy efficient performance level (in terms of 
performance per watt). Above this threshold, lower performance levels 
generally result in increased energy efficiency. Reducing performance 
below this threshold does not result in total energy savings for a given 
computation, although it reduces instantaneous power consumption". So 
lowest_nonlinear_perf is the most power efficient performance level, and 
going below that would lead to a worse performance/watt.
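
To make the quoted statement concrete, here is a toy energy model, not taken from the manual. It assumes that below the nonlinear threshold the voltage is already at its floor, so only frequency scales, and that above it voltage rises in proportion to frequency. All constants and function names are made up for illustration, and idle power after the work completes is ignored.

```c
#include <assert.h>

/* Illustrative constants only; none of these come from real hardware. */
#define P_STATIC_W      0.5  /* static/leakage power while running, in watts */
#define DYN_W_PER_GHZ   1.0  /* dynamic power per GHz at the voltage floor */
#define F_NONLINEAR_GHZ 1.0  /* assumed lowest nonlinear frequency */

/* Instantaneous power at a given frequency under the toy model. */
static double power_w_sketch(double f_ghz)
{
    double v_scale = 1.0; /* voltage relative to the floor */

    assert(f_ghz > 0.0);
    /* Above the threshold, assume voltage scales linearly with frequency. */
    if (f_ghz > F_NONLINEAR_GHZ)
        v_scale = f_ghz / F_NONLINEAR_GHZ;

    /* Dynamic power is proportional to V^2 * f. */
    return P_STATIC_W + DYN_W_PER_GHZ * v_scale * v_scale * f_ghz;
}

/* Energy in joules to finish a fixed amount of work (in giga-cycles). */
static double energy_j_sketch(double f_ghz, double gcycles)
{
    /* Run time in seconds is cycles divided by frequency. */
    return power_w_sketch(f_ghz) * (gcycles / f_ghz);
}
```

With these assumed numbers, one giga-cycle of work costs 2.0 J at 0.5 GHz (1.0 W), 1.5 J at 1.0 GHz (1.5 W), and 4.25 J at 2.0 GHz (8.5 W). Dropping below the threshold lowers instantaneous power but raises the total energy for the computation, which is the behavior the manual describes.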

Also setting the minimum frequency to lowest_nonlinear_freq (instead of
lowest_freq) allows the CPU to idle at a higher frequency which leads
to more time being spent in a deeper idle state (as trivial idle tasks
are completed sooner). This has shown a power benefit in some systems.
In other systems, power consumption has increased but so has the
throughput/watt.

Our objective here is to update the initial lower frequency limit to
lowest_nonlinear_freq, while allowing the user to later update the lower
limit to anywhere between lowest_freq and highest_freq for the platform.

So, set the policy->min to lowest_nonlinear_freq in the ->verify() 
callback, only if the original value is equal to FREQ_QOS_MIN_DEFAULT_VALUE
(i.e. 0). Merge the two identical verify functions while at it.
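
The intended behaviour can be sketched in userspace C as follows. This is a simplified model and not the patch itself: the struct, the helper names, and the clamping step are hypothetical stand-ins for the kernel's policy data and limit checks. Only the FREQ_QOS_MIN_DEFAULT_VALUE comparison is taken from the description above.

```c
#include <assert.h>

/* Value taken from the description above ("i.e. 0"). */
#define FREQ_QOS_MIN_DEFAULT_VALUE 0

/* Hypothetical, simplified stand-in for the kernel's policy data (kHz). */
struct policy_data_sketch {
    unsigned int min;         /* requested lower limit */
    unsigned int max;         /* requested upper limit */
    unsigned int cpuinfo_min; /* lowest_freq of the platform */
    unsigned int cpuinfo_max; /* highest_freq of the platform */
};

static unsigned int clamp_uint_sketch(unsigned int v, unsigned int lo,
                                      unsigned int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Model of a ->verify() style callback with the new default applied. */
static void verify_sketch(struct policy_data_sketch *p,
                          unsigned int lowest_nonlinear_freq)
{
    assert(p != 0);
    assert(p->cpuinfo_min <= p->cpuinfo_max);

    /* Only replace the untouched default; a user-set value is kept. */
    if (p->min == FREQ_QOS_MIN_DEFAULT_VALUE)
        p->min = lowest_nonlinear_freq;

    /* Keep both limits inside the hardware range (assumed behaviour). */
    p->min = clamp_uint_sketch(p->min, p->cpuinfo_min, p->cpuinfo_max);
    p->max = clamp_uint_sketch(p->max, p->cpuinfo_min, p->cpuinfo_max);
    if (p->min > p->max)
        p->min = p->max;
}
```

In this model, a policy whose min is still 0 comes out with min equal to lowest_nonlinear_freq, while a policy where the user already requested lowest_freq keeps that value.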

Link: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf [1]

Changes from v2:
* Fix the misplaced NULL pointer check (Mario)
* Move all new code inside the if condition
* Add comment to explain the rationale

v2 Link: https://lore.kernel.org/linux-pm/20241016144639.135610-1-Dhananjay.Ugwekar@amd.com/

Changes from v1:
* Modify the initial min_freq from verify callback, instead of adding a
  new callback in cpufreq_driver struct (Rafael)

v1 Link: https://lore.kernel.org/linux-pm/20241003083952.3186-1-Dhananjay.Ugwekar@amd.com/

Dhananjay Ugwekar (2):
  cpufreq/amd-pstate: Remove the redundant verify() function
  cpufreq/amd-pstate: Set the initial min_freq to lowest_nonlinear_freq

 drivers/cpufreq/amd-pstate.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

Comments

Hanabishi Dec. 8, 2024, 7:54 a.m. UTC | #1
Hello. Maybe I'm too late on this, but I have some concerns.

On 10/17/24 05:39, Dhananjay Ugwekar wrote:
> In other systems, power consumption has increased but so has the
> throughput/watt.

I just want to bring up the fact that this change affects all governors. It sounds good for the performance governor, but not so much for the powersave governor.

So the question is: don't we want the lowest power consumption possible in powersave mode, even if it means decreased efficiency? Powersave is by definition supposed to make the battery last as long as possible no matter what, isn't it?

Mario Limonciello Dec. 8, 2024, 4:35 p.m. UTC | #2
On 12/8/2024 01:54, Hanabishi wrote:
> Hello. Maybe I'm too late on this, but I have some concerns.
> 
> On 10/17/24 05:39, Dhananjay Ugwekar wrote:
>> In other systems, power consumption has increased but so has the
>> throughput/watt.
> 
> I just want to bring up the fact that this change affects all governors. 
> It sounds good for the performance governor, but not so much for the 
> powersave governor.
> 
> So the question is: don't we want the lowest power consumption possible 
> in the powersave mode? Even if it means decreased efficiency. Powersave 
> by definition supposed to make battery last as long as possible no 
> matter what, isn't it?
> 

No, the powersave governor isn't a one stop shop to bring everything to 
longest battery.

By your argument we should set the EPP to "power" by default and "boost" 
to off by default when the powersave governor is enacted?

All of those are far too aggressive for a default behavior.  Setting the 
lowest nonlinear frequency as the default lowest scaling frequency is 
about having a good default that balances responsiveness, battery life 
and performance.

As with all knobs, anyone who doesn't agree with it can of course modify
it from sysfs.
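
For reference, a minimal sketch of what changing the limit from sysfs could look like from a C program. The helper names are hypothetical. The paths are the generic cpufreq policy attributes for cpu0 and are assumed to be present on the system; writing to them needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Generic cpufreq attributes for cpu0 (assumed present on the system). */
#define SCALING_MIN_FREQ_PATH \
    "/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq"
#define CPUINFO_MIN_FREQ_PATH \
    "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq"

/* Hypothetical helper: read one kHz value from a file, -1 on failure. */
static long read_khz_sketch(const char *path)
{
    long khz = -1;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Hypothetical helper: write one kHz value to a file, 0 on success. */
static int write_khz_sketch(const char *path, long khz)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%ld\n", khz) > 0;
    /* Buffered write errors may only show up when the file is closed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

To go back to the hardware minimum on cpu0, one would call write_khz_sketch(SCALING_MIN_FREQ_PATH, read_khz_sketch(CPUINFO_MIN_FREQ_PATH)) as root, after checking that the read did not return -1.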

Russell Haley Jan. 5, 2025, 3:37 a.m. UTC | #3
On 12/8/24 10:35 AM, Mario Limonciello wrote:
> On 12/8/2024 01:54, Hanabishi wrote:
>> Hello. Maybe I'm too late on this, but I have some concerns.
>>
>> On 10/17/24 05:39, Dhananjay Ugwekar wrote:
>>> In other systems, power consumption has increased but so has the
>>> throughput/watt.
>>
>> I just want to bring up the fact that this change affects all
>> governors. It sounds good for the performance governor, but not so
>> much for the powersave governor.
>>
>> So the question is: don't we want the lowest power consumption
>> possible in the powersave mode? Even if it means decreased efficiency.
>> Powersave by definition supposed to make battery last as long as
>> possible no matter what, isn't it?
>>
> 
> No, the powersave governor isn't a one stop shop to bring everything to
> longest battery.
> 
> By your argument we should set the EPP to "power" by default and "boost"
> to off by default when the powersave governor is enacted?
> 
> All of those are far too aggressive for a default behavior.  Setting the
> lowest nonlinear frequency as the default lowest scaling frequency is
> about having a good default that balances responsiveness, battery life
> and performance.
> 
> Like all knobs anyone that doesn't agree with it can of course modify it
> from sysfs.
> 

If the documentation is correct, the lowest_nonlinear_frequency *does*
result in the lowest battery consumption unless you are running one or
more threads at 100% utilization until the battery dies. In that case,
lowest nonlinear frequency should result in greatest number of
instructions retired when the battery dies. I say instructions retired
rather than work completed, because "100% until the battery dies" is
only stress tests, malware, and damn-the-torpedoes concurrency frameworks
that use spinwaits.

If that is not true, then either the documentation is wrong, or the
CPU's reporting of its lowest nonlinear frequency is wrong.

I am puzzled why the CPU even exposes frequencies below
lowest-nonlinear. They should always be worse than PWM-ing between C0 at
lowest nonlinear freq and some deeper C-state. Testing software that has
to run on much slower CPUs, I guess?

Dhananjay Ugwekar Jan. 6, 2025, 4:43 a.m. UTC | #4
On 1/5/2025 9:07 AM, Russell Haley wrote:
> 
> 
> On 12/8/24 10:35 AM, Mario Limonciello wrote:
>> On 12/8/2024 01:54, Hanabishi wrote:
>>> Hello. Maybe I'm too late on this, but I have some concerns.
>>>
>>> On 10/17/24 05:39, Dhananjay Ugwekar wrote:
>>>> In other systems, power consumption has increased but so has the
>>>> throughput/watt.
>>>
>>> I just want to bring up the fact that this change affects all
>>> governors. It sounds good for the performance governor, but not so
>>> much for the powersave governor.
>>>
>>> So the question is: don't we want the lowest power consumption
>>> possible in the powersave mode? Even if it means decreased efficiency.
>>> Powersave by definition supposed to make battery last as long as
>>> possible no matter what, isn't it?
>>>
>>
>> No, the powersave governor isn't a one stop shop to bring everything to
>> longest battery.
>>
>> By your argument we should set the EPP to "power" by default and "boost"
>> to off by default when the powersave governor is enacted?
>>
>> All of those are far too aggressive for a default behavior.  Setting the
>> lowest nonlinear frequency as the default lowest scaling frequency is
>> about having a good default that balances responsiveness, battery life
>> and performance.
>>
>> Like all knobs anyone that doesn't agree with it can of course modify it
>> from sysfs.
>>
> 
> If the documentation is correct, the lowest_nonlinear_frequency *does*
> result in the lowest battery consumption unless you are running one or
> more threads at 100% utilization until the battery dies. In that case,
> lowest nonlinear frequency should result in greatest number of
> instructions retired when the battery dies. I say instructions retired
> rather than work completed, because "100% until the battery dies" is
> only stress tests, malware, and damn-the-torpedos concurrency frameworks
> that use spinwaits.
> 
> If that is not true, then either the documentation is wrong, or the
> CPU's reporting of its lowest nonlinear frequency is wrong.
> 
> I am puzzled why the CPU even exposes frequencies below
> lowest-nonlinear. They should always be worse than PWM-ing between C0 at
> lowest nonlinear freq and some deeper C-state.

I don't think we can assume that idling at the lowest frequency would *always*
be worse than going to the shallowest C-state (considering the C-state
entry/exit latency), in terms of power, performance, or tail latencies. This
might vary between different systems and scenarios.

> Testing software that has
> to run on much slower CPUs, I guess?