[V7,0/7] amd-pstate preferred core

Message ID 20230918081407.756858-1-li.meng@amd.com

Message

Meng, Li (Jassmine) Sept. 18, 2023, 8:14 a.m. UTC
Hi all:

Core frequency is subject to semiconductor process variation.
Not all cores are able to reach the maximum frequency within the
infrastructure limits. Consequently, AMD has redefined the concept of
maximum frequency of a part: only a fraction of the cores can reach it.
To find the best process scheduling policy for a given scenario, the OS
needs to know the core ordering, which the platform reports through the
highest performance capability register of the CPPC interface.

Earlier implementations of amd-pstate preferred core supported only a static
core ranking and targeted performance. The preferred core can now change
dynamically based on the workload and platform conditions, accounting for
thermals and aging.

The amd-pstate driver uses the functions and data structures provided by
the ITMT architecture to let the scheduler favor scheduling on cores
which can reach a higher frequency at a lower voltage.
We call this amd-pstate preferred core.

Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU: a higher value means a higher priority.
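
As a rough illustration of the priority rule above, the following userspace
sketch models how per-CPU highest performance values could be used as
scheduler priorities. It is not the driver code: mock_set_core_prio(),
prio_table and preferred_cpu() are hypothetical stand-ins (the first one
plays the role of sched_set_itmt_core_prio()), and NR_CPUS is fixed at 4
only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the scheduler's per-CPU priority table. */
static int prio_table[NR_CPUS];

/* Hypothetical stand-in for sched_set_itmt_core_prio(prio, cpu). */
static void mock_set_core_prio(int prio, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	prio_table[cpu] = prio;
}

/* Use each CPU's highest performance value directly as its priority. */
static void init_core_prio(const int *highest_perf, size_t n)
{
	for (size_t cpu = 0; cpu < n && cpu < NR_CPUS; cpu++)
		mock_set_core_prio(highest_perf[cpu], (int)cpu);
}

/* Return the CPU with the highest priority (lowest index wins ties). */
static int preferred_cpu(void)
{
	int best = 0;

	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (prio_table[cpu] > prio_table[best])
			best = cpu;
	return best;
}
```

With made-up highest performance values of 166, 231, 196 and 231 for CPUs
0 to 3, preferred_cpu() picks CPU 1 in this model, since it has the largest
value and the lowest index among the tied CPUs.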

The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS chooses the cores with the
highest performance first when scheduling processes. When the amd-pstate
driver receives a notification that the highest performance has changed,
it updates the core ranking.
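
The dynamic update can be pictured with another small userspace model.
This is only a sketch under assumptions: struct core_rank and
on_highest_perf_changed() are hypothetical names that do not appear in the
series, and the real driver does this work through the cpufreq notification
path rather than a direct call.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model of the ranking state kept for all CPUs. */
struct core_rank {
	int prio[NR_CPUS];	/* current priority per CPU */
	int updates;		/* number of ranking changes applied */
};

/*
 * Hypothetical handler, called when the platform reports a new highest
 * performance value for one CPU. Returns true if the ranking changed.
 */
static bool on_highest_perf_changed(struct core_rank *r, int cpu, int new_perf)
{
	assert(r && cpu >= 0 && cpu < NR_CPUS);

	if (r->prio[cpu] == new_perf)
		return false;	/* same value, nothing to update */

	r->prio[cpu] = new_perf;
	r->updates++;
	return true;
}
```

In this model a notification that repeats the current value is ignored,
while a changed value replaces the stored priority for that CPU only.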

Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.

Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.

Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq: 
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``

Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate: 
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate: 
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate: 
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.

Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking
    dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferrd core mode kernel command
    line options

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst   |  58 +++++-
 arch/x86/Kconfig                              |   5 +-
 drivers/acpi/cppc_acpi.c                      |  13 ++
 drivers/acpi/processor_driver.c               |   6 +
 drivers/cpufreq/amd-pstate.c                  | 197 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                     |  13 ++
 include/acpi/cppc_acpi.h                      |   5 +
 include/linux/amd-pstate.h                    |   6 +
 include/linux/cpufreq.h                       |   5 +
 10 files changed, 291 insertions(+), 22 deletions(-)

Comments

Huang Rui Sept. 20, 2023, 2:50 a.m. UTC | #1
On Mon, Sep 18, 2023 at 04:14:00PM +0800, Meng, Li (Jassmine) wrote:
> Hi all:
> 
> The core frequency is subjected to the process variation in semiconductors.
> Not all cores are able to reach the maximum frequency respecting the
> infrastructure limits. Consequently, AMD has redefined the concept of
> maximum frequency of a part. This means that a fraction of cores can reach
> maximum frequency. To find the best process scheduling policy for a given
> scenario, OS needs to know the core ordering informed by the platform through
> highest performance capability register of the CPPC interface.
> 
> Earlier implementations of amd-pstate preferred core only support a static
> core ranking and targeted performance. Now it has the ability to dynamically
> change the preferred core based on the workload and platform conditions and
> accounting for thermals and aging.
> 
> Amd-pstate driver utilizes the functions and data structures provided by
> the ITMT architecture to enable the scheduler to favor scheduling on cores
> which can be get a higher frequency with lower voltage.
> We call it amd-pstate preferred core.
> 
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable ITMT feature.
> Amd-pstate driver uses the highest performance value to indicate
> the priority of CPU. The higher value has a higher priority.
> 
> Amd-pstate driver will provide an initial core ordering at boot time.
> It relies on the CPPC interface to communicate the core ranking to the
> operating system and scheduler to make sure that OS is choosing the cores
> with highest performance firstly for scheduling the process. When amd-pstate
> driver receives a message with the highest performance change, it will
> update the core ranking.
> 
> Changes form V6->V7:
> - x86:
> - - Modify kconfig about X86_AMD_PSTATE.
> - cpufreq: amd-pstate:
> - - modify incorrect comments about scheduler_work().
> - - convert highest_perf data type.
> - - modify preferred core init when cpu init and online.
> - acpi: cppc:
> - - modify link of CPPC highest performance.
> - cpufreq:
> - - modify link of CPPC highest performance changed.
> 
> Changes form V5->V6:
> - cpufreq: amd-pstate:
> - - modify the wrong tag order.
> - - modify warning about hw_prefcore sysfs attribute.
> - - delete duplicate comments.
> - - modify the variable name cppc_highest_perf to prefcore_ranking.
> - - modify judgment conditions for setting highest_perf.
> - - modify sysfs attribute for CPPC highest perf to pr_debug message.
> - Documentation: amd-pstate:
> - - modify warning: title underline too short.

Apart from the comment on patch 3, the others look good to me.

Please feel free to add my RB to the other patches:

Reviewed-by: Huang Rui <ray.huang@amd.com>

> 
> Changes form V4->V5:
> - cpufreq: amd-pstate:
> - - modify sysfs attribute for CPPC highest perf.
> - - modify warning about comments
> - - rebase linux-next
> - cpufreq: 
> - - Moidfy warning about function declarations.
> - Documentation: amd-pstate:
> - - align with ``amd-pstat``
> 
> Changes form V3->V4:
> - Documentation: amd-pstate:
> - - Modify inappropriate descriptions.
> 
> Changes form V2->V3:
> - x86:
> - - Modify kconfig and description.
> - cpufreq: amd-pstate: 
> - - Add Co-developed-by tag in commit message.
> - cpufreq:
> - - Modify commit message.
> - Documentation: amd-pstate:
> - - Modify inappropriate descriptions.
> 
> Changes form V1->V2:
> - acpi: cppc:
> - - Add reference link.
> - cpufreq:
> - - Moidfy link error.
> - cpufreq: amd-pstate: 
> - - Init the priorities of all online CPUs
> - - Use a single variable to represent the status of preferred core.
> - Documentation:
> - - Default enabled preferred core.
> - Documentation: amd-pstate: 
> - - Modify inappropriate descriptions.
> - - Default enabled preferred core.
> - - Use a single variable to represent the status of preferred core.
> 
> Meng Li (7):
>   x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
>   acpi: cppc: Add get the highest performance cppc control
>   cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
>   cpufreq: Add a notification message that the highest perf has changed
>   cpufreq: amd-pstate: Update amd-pstate preferred core ranking
>     dynamically
>   Documentation: amd-pstate: introduce amd-pstate preferred core
>   Documentation: introduce amd-pstate preferrd core mode kernel command
>     line options
> 
>  .../admin-guide/kernel-parameters.txt         |   5 +
>  Documentation/admin-guide/pm/amd-pstate.rst   |  58 +++++-
>  arch/x86/Kconfig                              |   5 +-
>  drivers/acpi/cppc_acpi.c                      |  13 ++
>  drivers/acpi/processor_driver.c               |   6 +
>  drivers/cpufreq/amd-pstate.c                  | 197 ++++++++++++++++--
>  drivers/cpufreq/cpufreq.c                     |  13 ++
>  include/acpi/cppc_acpi.h                      |   5 +
>  include/linux/amd-pstate.h                    |   6 +
>  include/linux/cpufreq.h                       |   5 +
>  10 files changed, 291 insertions(+), 22 deletions(-)
> 
> -- 
> 2.34.1
>
Mario Limonciello Sept. 20, 2023, 4:56 p.m. UTC | #2
On 9/19/2023 14:01, Oleksandr Natalenko wrote:
>> Meng Li (7):
>>    x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
>>    acpi: cppc: Add get the highest performance cppc control
>>    cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
>>    cpufreq: Add a notification message that the highest perf has changed
>>    cpufreq: amd-pstate: Update amd-pstate preferred core ranking
>>      dynamically
>>    Documentation: amd-pstate: introduce amd-pstate preferred core
>>    Documentation: introduce amd-pstate preferrd core mode kernel command
>>      line options
>>
>>   .../admin-guide/kernel-parameters.txt         |   5 +
>>   Documentation/admin-guide/pm/amd-pstate.rst   |  58 +++++-
>>   arch/x86/Kconfig                              |   5 +-
>>   drivers/acpi/cppc_acpi.c                      |  13 ++
>>   drivers/acpi/processor_driver.c               |   6 +
>>   drivers/cpufreq/amd-pstate.c                  | 197 ++++++++++++++++--
>>   drivers/cpufreq/cpufreq.c                     |  13 ++
>>   include/acpi/cppc_acpi.h                      |   5 +
>>   include/linux/amd-pstate.h                    |   6 +
>>   include/linux/cpufreq.h                       |   5 +
>>   10 files changed, 291 insertions(+), 22 deletions(-)
> 
> When applied on top of v6.5.3 this breaks turbo on my 5950X after suspend/resume cycle. Please see the scenario description below.
> 
> If I boot v6.5.3 + this patchset, then `turbostat` reports ~4.9 GHz on core 0 where `taskset -c 0 dd if=/dev/zero of=/dev/null` is being run.
> 
> After I suspend the machine and then resume it, and run `dd` again, `turbostat` reports the core to be capped to a stock frequency of ~3.4 GHz. Rebooting the machine fixes this, and the CPU can boost again.
> 
> If this patchset is reverted, then the CPU can turbo after suspend/resume cycle just fine.
> 
> I'm using `amd_pstate=guided`.
> 
> Is this behaviour expected?

To help confirm where the issue is, can I ask you to do three 
experiments with the patch series applied:

1) 'amd_pstate=active' on your kernel command line.
2) 'amd_pstate=active amd_prefcore=disable' on your kernel command line.
3) 'amd_pstate=guided amd_prefcore=disable' on your kernel command line.

Looking through the code, I anticipate from your report that it 
reproduces on "1" but not "2" and "3".

Meng,

Can you try to repro?

I think that it's probably a call to amd_pstate_init_prefcore() missing
from amd_pstate_cpu_resume() and also amd_pstate_epp_resume().
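
To make the hypothesis concrete, here is a toy userspace model of it. It is
a sketch only and says nothing about the real driver internals: struct
cpu_model, the model_*() helpers and the NOMINAL_PERF/HIGHEST_PERF values
are all made up, and the idea that suspend drops the preferred-core state
is an assumption taken from the guess above, not a confirmed root cause.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up performance levels, for illustration only. */
enum { NOMINAL_PERF = 120, HIGHEST_PERF = 166 };

/* Hypothetical per-CPU state. */
struct cpu_model {
	bool prefcore_ready;	/* preferred-core state initialized */
	int max_perf;		/* performance ceiling in effect */
};

/* Plays the role of amd_pstate_init_prefcore() in this model. */
static void model_init_prefcore(struct cpu_model *c)
{
	assert(c);
	c->prefcore_ready = true;
	c->max_perf = HIGHEST_PERF;
}

/* Assumption: suspend loses the state and leaves the nominal ceiling. */
static void model_suspend(struct cpu_model *c)
{
	assert(c);
	c->prefcore_ready = false;
	c->max_perf = NOMINAL_PERF;
}

/* Resume path, with or without the re-initialization call. */
static void model_resume(struct cpu_model *c, bool reinit_prefcore)
{
	assert(c);
	if (reinit_prefcore)
		model_init_prefcore(c);
}
```

In this model, resuming without the re-initialization leaves max_perf at
the nominal value, which would look like the reported cap, while resuming
with it restores the boost ceiling.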
Oleksandr Natalenko Sept. 20, 2023, 7:34 p.m. UTC | #3
Hello.

On Wednesday, 20 September 2023 18:56:09 CEST Mario Limonciello wrote:
> > When applied on top of v6.5.3 this breaks turbo on my 5950X after suspend/resume cycle. Please see the scenario description below.
> > 
> > If I boot v6.5.3 + this patchset, then `turbostat` reports ~4.9 GHz on core 0 where `taskset -c 0 dd if=/dev/zero of=/dev/null` is being run.
> > 
> > After I suspend the machine and then resume it, and run `dd` again, `turbostat` reports the core to be capped to a stock frequency of ~3.4 GHz. Rebooting the machine fixes this, and the CPU can boost again.
> > 
> > If this patchset is reverted, then the CPU can turbo after suspend/resume cycle just fine.
> > 
> > I'm using `amd_pstate=guided`.
> > 
> > Is this behaviour expected?
> 
> To help confirm where the issue is, can I ask you to do three 
> experiments with the patch series applied:
> 
> 1) 'amd_pstate=active' on your kernel command line.

The issue is reproducible. If I toggle the governor in cpupower to `powersave` and back to `performance`, boost is restored.

> 2) 'amd_pstate=active amd_prefcore=disable' on your kernel command line.

The issue is not reproducible.

> 3) 'amd_pstate=guided amd_prefcore=disable' on your kernel command line.

The issue is not reproducible.

I should also mention that in my initial configuration I use `amd_pstate=guided` and `schedutil`. If I switch to `performance` after suspend-resume cycle, the boost is restored. However, if I switch back to `schedutil`, the freq is capped.

Does this info help?

> Looking through the code, I anticipate from your report that it 
> reproduces on "1" but not "2" and "3".
> 
> Meng,
> 
> Can you try to repro?
> 
> I think that it's probably a call to amd_pstate_init_prefcore() missing
> from amd_pstate_cpu_resume() and also amd_pstate_epp_resume().
Mario Limonciello Sept. 20, 2023, 8:11 p.m. UTC | #4
On 9/20/2023 14:34, Oleksandr Natalenko wrote:
> Hello.
> 
> On Wednesday, 20 September 2023 18:56:09 CEST Mario Limonciello wrote:
>>> When applied on top of v6.5.3 this breaks turbo on my 5950X after suspend/resume cycle. Please see the scenario description below.
>>>
>>> If I boot v6.5.3 + this patchset, then `turbostat` reports ~4.9 GHz on core 0 where `taskset -c 0 dd if=/dev/zero of=/dev/null` is being run.
>>>
>>> After I suspend the machine and then resume it, and run `dd` again, `turbostat` reports the core to be capped to a stock frequency of ~3.4 GHz. Rebooting the machine fixes this, and the CPU can boost again.
>>>
>>> If this patchset is reverted, then the CPU can turbo after suspend/resume cycle just fine.
>>>
>>> I'm using `amd_pstate=guided`.
>>>
>>> Is this behaviour expected?
>>
>> To help confirm where the issue is, can I ask you to do three
>> experiments with the patch series applied:
>>
>> 1) 'amd_pstate=active' on your kernel command line.
> 
> The issue is reproducible. If I toggle the governor in cpupower to `powersave` and back to `performance`, boost is restored.
> 
>> 2) 'amd_pstate=active amd_prefcore=disable' on your kernel command line.
> 
> The issue is not reproducible.
> 
>> 3) 'amd_pstate=guided amd_prefcore=disable' on your kernel command line.
> 
> The issue is not reproducible.
> 
> I should also mention that in my initial configuration I use `amd_pstate=guided` and `schedutil`. If I switch to `performance` after suspend-resume cycle, the boost is restored. However, if I switch back to `schedutil`, the freq is capped.
> 
> Does this info help?
> 

Yeah, it matches my expectations for this issue you reported.
Thanks!

Jassmine can dig into a fix for another spin of this series.

>> Looking through the code, I anticipate from your report that it
>> reproduces on "1" but not "2" and "3".
>>
>> Meng,
>>
>> Can you try to repro?
>>
>> I think that it's probably a call to amd_pstate_init_prefcore() missing
>> from amd_pstate_cpu_resume() and also amd_pstate_epp_resume().
>
Meng, Li (Jassmine) Sept. 21, 2023, 5:51 a.m. UTC | #5
Hi Natalenko and Mario:

> -----Original Message-----
> From: Limonciello, Mario <Mario.Limonciello@amd.com>
> Sent: Thursday, September 21, 2023 4:12 AM
> To: Oleksandr Natalenko <oleksandr@natalenko.name>; Huang, Ray
> <Ray.Huang@amd.com>; Meng, Li (Jassmine) <Li.Meng@amd.com>
> Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan
> <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; Fontenot,
> Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak
> <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Huang, Shimmer
> <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> Borislav Petkov <bp@alien8.de>; Rafael J . Wysocki
> <rafael.j.wysocki@intel.com>
> Subject: Re: [PATCH V7 0/7] amd-pstate preferred core
>
> On 9/20/2023 14:34, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Wednesday, 20 September 2023 18:56:09 CEST Mario Limonciello wrote:
> >>> When applied on top of v6.5.3 this breaks turbo on my 5950X after suspend/resume cycle. Please see the scenario description below.
> >>>
> >>> If I boot v6.5.3 + this patchset, then `turbostat` reports ~4.9 GHz on core 0 where `taskset -c 0 dd if=/dev/zero of=/dev/null` is being run.
> >>>
> >>> After I suspend the machine and then resume it, and run `dd` again, `turbostat` reports the core to be capped to a stock frequency of ~3.4 GHz. Rebooting the machine fixes this, and the CPU can boost again.
> >>>
> >>> If this patchset is reverted, then the CPU can turbo after suspend/resume cycle just fine.
> >>>
> >>> I'm using `amd_pstate=guided`.
> >>>
> >>> Is this behaviour expected?
> >>
> >> To help confirm where the issue is, can I ask you to do three
> >> experiments with the patch series applied:
> >>
> >> 1) 'amd_pstate=active' on your kernel command line.
> >
> > The issue is reproducible. If I toggle the governor in cpupower to `powersave` and back to `performance`, boost is restored.
> >
> >> 2) 'amd_pstate=active amd_prefcore=disable' on your kernel command line.
> >
> > The issue is not reproducible.
> >
> >> 3) 'amd_pstate=guided amd_prefcore=disable' on your kernel command line.
> >
> > The issue is not reproducible.
> >
> > I should also mention that in my initial configuration I use `amd_pstate=guided` and `schedutil`. If I switch to `performance` after suspend-resume cycle, the boost is restored. However, if I switch back to `schedutil`, the freq is capped.
> >
> > Does this info help?
> >
>
> Yeah, it matches my expectations for this issue you reported.
> Thanks!
>
> Jassmine can dig into a fix for another spin of this series.
[Meng, Li (Jassmine)]
Thank you very much!
I will fix this issue in the next version of the series.
>
> >> Looking through the code, I anticipate from your report that it
> >> reproduces on "1" but not "2" and "3".
> >>
> >> Meng,
> >>
> >> Can you try to repro?
> >>
> >> I think that it's probably a call to amd_pstate_init_prefcore() missing
> >> from amd_pstate_cpu_resume() and also amd_pstate_epp_resume().
> >