Message ID | 20231013033118.3759311-1-li.meng@amd.com |
---|---|
Headers | show |
Series | amd-pstate preferred core | expand |
[AMD Official Use Only - General] Hi Oleksandr: > -----Original Message----- > From: Oleksandr Natalenko <oleksandr@natalenko.name> > Sent: Friday, October 13, 2023 11:45 PM > To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray > <Ray.Huang@amd.com>; Meng, Li (Jassmine) <Li.Meng@amd.com> > Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; > x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan > <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; Fontenot, > Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak > <Deepak.Sharma@amd.com>; Deucher, Alexander > <Alexander.Deucher@amd.com>; Limonciello, Mario > <Mario.Limonciello@amd.com>; Huang, Shimmer > <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du, > Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>; > Borislav Petkov <bp@alien8.de>; Meng, Li (Jassmine) <Li.Meng@amd.com> > Subject: Re: [RESEND PATCH V9 0/7] amd-pstate preferred core > > Hello. > > On pátek 13. října 2023 5:31:11 CEST Meng Li wrote: > > Hi all: > > > > The core frequency is subjected to the process variation in semiconductors. > > Not all cores are able to reach the maximum frequency respecting the > > infrastructure limits. Consequently, AMD has redefined the concept of > > maximum frequency of a part. This means that a fraction of cores can > > reach maximum frequency. To find the best process scheduling policy > > for a given scenario, OS needs to know the core ordering informed by > > the platform through highest performance capability register of the CPPC > interface. > > > > Earlier implementations of amd-pstate preferred core only support a > > static core ranking and targeted performance. Now it has the ability > > to dynamically change the preferred core based on the workload and > > platform conditions and accounting for thermals and aging. > > > > Amd-pstate driver utilizes the functions and data structures provided > > by the ITMT architecture to enable the scheduler to favor scheduling > > on cores which can be get a higher frequency with lower voltage. > > We call it amd-pstate preferred core. > > > > Here sched_set_itmt_core_prio() is called to set priorities and > > sched_set_itmt_support() is called to enable ITMT feature. > > Amd-pstate driver uses the highest performance value to indicate the > > priority of CPU. The higher value has a higher priority. > > > > Amd-pstate driver will provide an initial core ordering at boot time. > > It relies on the CPPC interface to communicate the core ranking to the > > operating system and scheduler to make sure that OS is choosing the > > cores with highest performance firstly for scheduling the process. > > When amd-pstate driver receives a message with the highest performance > > change, it will update the core ranking. > > > > Changes form V8->V9: > > - all: > > - - pick up Tested-By flag added by Oleksandr. > > - cpufreq: amd-pstate: > > - - pick up Review-By flag added by Wyes. > > - - ignore modification of bug. > > Thanks for this submission. > > The bug you refer to, I assume it should have been fixed by this hunk: > > ``` > --- a/drivers/cpufreq/amd-pstate.c > +++ b/drivers/cpufreq/amd-pstate.c > @@ -542,7 +542,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu, > if (target_perf < capacity) > des_perf = DIV_ROUND_UP(cap_perf * target_perf, > capacity); > > - min_perf = READ_ONCE(cpudata->highest_perf); > + min_perf = READ_ONCE(cpudata->lowest_perf); > if (_min_perf < capacity) > min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity); > ``` > > which is now missing from this patchset as it was suggested to send it as a > separate patch. > > Am I correct? If so, are you going to send it as a separate patch within the > next round of this patchset, or it will be sent separately (if it hasn't yet)? > [Meng, Li (Jassmine)] Thank you! It is now missing from this serial patches. And I will send a separately patch for this issue. > > - - add a attribute of prefcore_ranking. > > - - modify data type conversion from u32 to int. > > - Documentation: amd-pstate: > > - - pick up Review-By flag added by Wyes. > > > > Changes form V7->V8: > > - all: > > - - pick up Review-By flag added by Mario and Ray. > > - cpufreq: amd-pstate: > > - - use hw_prefcore embeds into cpudata structure. > > - - delete preferred core init from cpu online/off. > > > > Changes form V6->V7: > > - x86: > > - - Modify kconfig about X86_AMD_PSTATE. > > - cpufreq: amd-pstate: > > - - modify incorrect comments about scheduler_work(). > > - - convert highest_perf data type. > > - - modify preferred core init when cpu init and online. > > - acpi: cppc: > > - - modify link of CPPC highest performance. > > - cpufreq: > > - - modify link of CPPC highest performance changed. > > > > Changes form V5->V6: > > - cpufreq: amd-pstate: > > - - modify the wrong tag order. > > - - modify warning about hw_prefcore sysfs attribute. > > - - delete duplicate comments. > > - - modify the variable name cppc_highest_perf to prefcore_ranking. > > - - modify judgment conditions for setting highest_perf. > > - - modify sysfs attribute for CPPC highest perf to pr_debug message. > > - Documentation: amd-pstate: > > - - modify warning: title underline too short. > > > > Changes form V4->V5: > > - cpufreq: amd-pstate: > > - - modify sysfs attribute for CPPC highest perf. > > - - modify warning about comments > > - - rebase linux-next > > - cpufreq: > > - - Moidfy warning about function declarations. > > - Documentation: amd-pstate: > > - - align with ``amd-pstat`` > > > > Changes form V3->V4: > > - Documentation: amd-pstate: > > - - Modify inappropriate descriptions. > > > > Changes form V2->V3: > > - x86: > > - - Modify kconfig and description. > > - cpufreq: amd-pstate: > > - - Add Co-developed-by tag in commit message. > > - cpufreq: > > - - Modify commit message. > > - Documentation: amd-pstate: > > - - Modify inappropriate descriptions. > > > > Changes form V1->V2: > > - acpi: cppc: > > - - Add reference link. > > - cpufreq: > > - - Moidfy link error. > > - cpufreq: amd-pstate: > > - - Init the priorities of all online CPUs > > - - Use a single variable to represent the status of preferred core. > > - Documentation: > > - - Default enabled preferred core. > > - Documentation: amd-pstate: > > - - Modify inappropriate descriptions. > > - - Default enabled preferred core. > > - - Use a single variable to represent the status of preferred core. > > > > Meng Li (7): > > x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion. > > acpi: cppc: Add get the highest performance cppc control > > cpufreq: amd-pstate: Enable amd-pstate preferred core supporting. > > cpufreq: Add a notification message that the highest perf has changed > > cpufreq: amd-pstate: Update amd-pstate preferred core ranking > > dynamically > > Documentation: amd-pstate: introduce amd-pstate preferred core > > Documentation: introduce amd-pstate preferrd core mode kernel > command > > line options > > > > .../admin-guide/kernel-parameters.txt | 5 + > > Documentation/admin-guide/pm/amd-pstate.rst | 59 ++++- > > arch/x86/Kconfig | 5 +- > > drivers/acpi/cppc_acpi.c | 13 ++ > > drivers/acpi/processor_driver.c | 6 + > > drivers/cpufreq/amd-pstate.c | 204 ++++++++++++++++-- > > drivers/cpufreq/cpufreq.c | 13 ++ > > include/acpi/cppc_acpi.h | 5 + > > include/linux/amd-pstate.h | 10 + > > include/linux/cpufreq.h | 5 + > > 10 files changed, 305 insertions(+), 20 deletions(-) > > > > > > > -- > Oleksandr Natalenko (post-factum)
[AMD Official Use Only - General] Hi Peter: > -----Original Message----- > From: Peter Zijlstra <peterz@infradead.org> > Sent: Saturday, October 14, 2023 12:01 AM > To: Meng, Li (Jassmine) <Li.Meng@amd.com> > Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray > <Ray.Huang@amd.com>; linux-pm@vger.kernel.org; linux- > kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah > Khan <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; > Fontenot, Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak > <Deepak.Sharma@amd.com>; Deucher, Alexander > <Alexander.Deucher@amd.com>; Limonciello, Mario > <Mario.Limonciello@amd.com>; Huang, Shimmer > <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du, > Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>; > Borislav Petkov <bp@alien8.de>; Oleksandr Natalenko > <oleksandr@natalenko.name>; Karny, Wyes <Wyes.Karny@amd.com> > Subject: Re: [RESEND PATCH V9 3/7] cpufreq: amd-pstate: Enable amd- > pstate preferred core supporting. > > Caution: This message originated from an External Source. Use proper > caution when opening attachments, clicking links, or responding. > > > On Fri, Oct 13, 2023 at 11:31:14AM +0800, Meng Li wrote: > > > +#define AMD_PSTATE_PREFCORE_THRESHOLD 166 > > +#define AMD_PSTATE_MAX_CPPC_PERF 255 > > > +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) { > > + int ret, prio; > > + u32 highest_perf; > > + static u32 max_highest_perf = 0, min_highest_perf = U32_MAX; > > What serializes these things? > > Also, *why* are you using u32 here, what's wrong with something like: > > int max_hp = INT_MIN, min_hp = INT_MAX; > [Meng, Li (Jassmine)] We use ITMT architecture to utilize preferred core features. Therefore, we need to try to be consistent with Intel's implementation as much as possible. For details, please refer to the intel_pstate_set_itmt_prio function in file intel_pstate.c. (Line 355) I think using the data type of u32 is consistent with the data structures of cppc_perf_ctrls and amd_cpudata etc. > > + > > + ret = amd_pstate_get_highest_perf(cpudata->cpu, &highest_perf); > > + if (ret) > > + return; > > + > > + cpudata->hw_prefcore = true; > > + /* check if CPPC preferred core feature is enabled*/ > > + if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) { > > Which effectively means <255 (also, seems to suggest MAX_CPPC_PERF > might not be the best name, hmm?) > > Should you not write '>= 255' then? Just in case something 'funny' > happens? > [Meng, Li (Jassmine)] OK, I will modify these. > > + pr_debug("AMD CPPC preferred core is unsupported!\n"); > > + cpudata->hw_prefcore = false; > > + return; > > + } > > + > > + if (!amd_pstate_prefcore) > > + return; > > + > > + /* The maximum value of highest perf is 255 */ > > + prio = (int)(highest_perf & 0xff); > > If for some weird reason you get 0x1ff or whatever above (dodgy BIOS never > happens, right) then this makes sense how? > > Perhaps stop sending patches at break-nek speed and think for a little while > on how to write this and not be confused? > [Meng, Li (Jassmine)] If I use '>= 255' to check, the issue mentioned will not exist. Because it will be returned when highest_perff>0xff. > > > + /* > > + * The priorities can be set regardless of whether or not > > + * sched_set_itmt_support(true) has been called and it is valid to > > + * update them at any time after it has been called. > > + */ > > + sched_set_itmt_core_prio(prio, cpudata->cpu); > > + > > + if (max_highest_perf <= min_highest_perf) { > > + if (highest_perf > max_highest_perf) > > + max_highest_perf = highest_perf; > > + > > + if (highest_perf < min_highest_perf) > > + min_highest_perf = highest_perf; > > + > > + if (max_highest_perf > min_highest_perf) { > > + /* > > + * This code can be run during CPU online under the > > + * CPU hotplug locks, so sched_set_itmt_support() > > + * cannot be called from here. Queue up a work item > > + * to invoke it. > > + */ > > + schedule_work(&sched_prefcore_work); > > + } > > + } > > Not a word about what serializes these variables. > > > +}
On Mon, Oct 16, 2023 at 06:20:53AM +0000, Meng, Li (Jassmine) wrote: > > > +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) { > > > + int ret, prio; > > > + u32 highest_perf; > > > + static u32 max_highest_perf = 0, min_highest_perf = U32_MAX; > > > > What serializes these things? > > > > Also, *why* are you using u32 here, what's wrong with something like: > > > > int max_hp = INT_MIN, min_hp = INT_MAX; > > > [Meng, Li (Jassmine)] > We use ITMT architecture to utilize preferred core features. > Therefore, we need to try to be consistent with Intel's implementation > as much as possible. For details, please refer to the > intel_pstate_set_itmt_prio function in file intel_pstate.c. (Line 355) > > I think using the data type of u32 is consistent with the data > structures of cppc_perf_ctrls and amd_cpudata etc. Rafael, should we fix intel_pstate too? The point is, that sched_asym_prefer(), the final consumer of these values uses int and thus an explicitly signed compare. Using u32 and U32_MAX anywhere near the setting the priority makes absolutely no sense. If you were to have the high bit set, things do not behave as expected. Also, same question as to the amd folks; what serializes those static variables?
On 10/16/2023 12:58 PM, Peter Zijlstra wrote: > On Mon, Oct 16, 2023 at 06:20:53AM +0000, Meng, Li (Jassmine) wrote: >>>> +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) { >>>> + int ret, prio; >>>> + u32 highest_perf; >>>> + static u32 max_highest_perf = 0, min_highest_perf = U32_MAX; >>> What serializes these things? >>> >>> Also, *why* are you using u32 here, what's wrong with something like: >>> >>> int max_hp = INT_MIN, min_hp = INT_MAX; >>> >> [Meng, Li (Jassmine)] >> We use ITMT architecture to utilize preferred core features. >> Therefore, we need to try to be consistent with Intel's implementation >> as much as possible. For details, please refer to the >> intel_pstate_set_itmt_prio function in file intel_pstate.c. (Line 355) >> >> I think using the data type of u32 is consistent with the data >> structures of cppc_perf_ctrls and amd_cpudata etc. > Rafael, should we fix intel_pstate too? Srinivas should be more familiar with this code than I am, so adding him. > The point is, that sched_asym_prefer(), the final consumer of these > values uses int and thus an explicitly signed compare. > > Using u32 and U32_MAX anywhere near the setting the priority makes > absolutely no sense. > > If you were to have the high bit set, things do not behave as expected. Right, but in practice these values are always between 0 and 255 inclusive AFAICS. It would have been better to use u8 I suppose. > Also, same question as to the amd folks; what serializes those static > variables? That's a good one.
On Mon, 2023-10-16 at 19:27 +0200, Wysocki, Rafael J wrote: > On 10/16/2023 12:58 PM, Peter Zijlstra wrote: > > On Mon, Oct 16, 2023 at 06:20:53AM +0000, Meng, Li (Jassmine) > > wrote: > > > > > +static void amd_pstate_init_prefcore(struct amd_cpudata > > > > > *cpudata) { > > > > > + int ret, prio; > > > > > + u32 highest_perf; > > > > > + static u32 max_highest_perf = 0, min_highest_perf = > > > > > U32_MAX; > > > > What serializes these things? > > > > > > > > Also, *why* are you using u32 here, what's wrong with something > > > > like: > > > > > > > > int max_hp = INT_MIN, min_hp = INT_MAX; > > > > > > > [Meng, Li (Jassmine)] > > > We use ITMT architecture to utilize preferred core features. > > > Therefore, we need to try to be consistent with Intel's > > > implementation > > > as much as possible. For details, please refer to the > > > intel_pstate_set_itmt_prio function in file intel_pstate.c. (Line > > > 355) > > > > > > I think using the data type of u32 is consistent with the data > > > structures of cppc_perf_ctrls and amd_cpudata etc. > > Rafael, should we fix intel_pstate too? > > Srinivas should be more familiar with this code than I am, so adding > him. > If we make static u32 max_highest_perf = 0, min_highest_perf = U32_MAX; to static int max_highest_perf = INT_MIN, min_highest_perf = INT_MAX; Then in intel_pstate we will compare signed vs unsigned comparison as cppc_perf.highest_perf is u32. In reality this will be fine to change to "int" as we will never reach u32 max as performance on any Intel platform. > > > The point is, that sched_asym_prefer(), the final consumer of these > > values uses int and thus an explicitly signed compare. > > > > Using u32 and U32_MAX anywhere near the setting the priority makes > > absolutely no sense. > > > > If you were to have the high bit set, things do not behave as > > expected. > > Right, but in practice these values are always between 0 and 255 > inclusive AFAICS. > > It would have been better to use u8 I suppose. Should be fine as over clocked parts will set to max 0xff. > > > > Also, same question as to the amd folks; what serializes those > > static > > variables? > > That's a good one. This function which is checking static variables is called from cpufreq ->init callback. Which in turn is called from a function which is passed as startup() function pointer to cpuhp_setup_state_nocalls_cpuslocked(). I see that startup() callbacks are called under a mutex cpuhp_state_mutex for each present CPUs. So if some tear down happen, that is also protected by the same mutex. The assumption is here is that cpuhp_invoke_callback() in hotplug state machine is not called in parallel on two CPUs by the hotplug state machine. But I see activity on parallel bringup, so this is questionable now. Thanks, Srinivas > >
On Mon, Oct 16, 2023 at 11:50:34AM -0700, srinivas pandruvada wrote: I'll respond to the rest tomorrow, it's far too late. > > > Also, same question as to the amd folks; what serializes those > > > static > > > variables? > > > > That's a good one. > > This function which is checking static variables is called from cpufreq > ->init callback. Which in turn is called from a function which is > passed as startup() function pointer to > cpuhp_setup_state_nocalls_cpuslocked(). > > I see that startup() callbacks are called under a mutex > cpuhp_state_mutex for each present CPUs. So if some tear down happen, > that is also protected by the same mutex. The assumption is here is > that cpuhp_invoke_callback() in hotplug state machine is not called in > parallel on two CPUs by the hotplug state machine. But I see activity > on parallel bringup, so this is questionable now. Parallel bringup should still serialise this. It mostly only does the hardware bringup in parallel. Having a pointer back to the cpu hotplug lock would make it easier to untangle this code though.