Message ID: 3310447.aeNJFYEL58@rjwysocki.net
Series: x86 / intel_pstate: Set asymmetric CPU capacity on hybrid systems
On Wed, Aug 28, 2024 at 01:45:00PM +0200, Rafael J. Wysocki wrote:
> Hi Everyone,
>
> This is an update of
>
> https://lore.kernel.org/linux-pm/4941491.31r3eYUQgx@rjwysocki.net/
>
> which was an update of
>
> https://lore.kernel.org/linux-pm/4908113.GXAFRqVoOG@rjwysocki.net/
>
> It addresses Ricardo's review comments and fixes an issue with intel_pstate
> operation mode changes that would cause it to attempt to enable hybrid CPU
> capacity scaling after it has been already enabled during initialization.
>
> The most visible difference with respect to the previous version is that
> patch [1/3] has been dropped because it is not needed any more after using
> the observation that sched_clear_itmt_support() would cause sched domains
> to be rebuilt.
>
> Other than this, there are cosmetic differences in patch [1/2] (previously [2/3])
> and the new code in intel_pstate_register_driver() in patch [2/2] (previously [3/3])
> has been squashed into hybrid_init_cpu_scaling() which now checks whether or
> not to enable hybrid CPU capacity scaling (as it may have been enabled already).
>
> This series is available from the following git branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=intel_pstate-testing
>
> (with an extra debug commit on top).
>
> The original cover letter quoted below still applies:
>
> The purpose of this series is to provide the scheduler with asymmetric CPU
> capacity information on x86 hybrid systems based on Intel hardware.
>
> The asymmetric CPU capacity information is important on hybrid systems as it
> allows utilization to be computed for tasks in a consistent way across all
> CPUs in the system, regardless of their capacity. This, in turn, allows
> the schedutil cpufreq governor to set CPU performance levels consistently
> in the cases when tasks migrate between CPUs of different capacities.
> It should also help to improve task placement and load balancing decisions on
> hybrid systems and it is key for anything along the lines of EAS.
>
> The information in question comes from the MSR_HWP_CAPABILITIES register and
> is provided to the scheduler by the intel_pstate driver, as per the changelog
> of patch [3/3]. Patch [2/3] introduces the arch infrastructure needed for
> that (in the form of a per-CPU capacity variable) and patch [1/3] is a
> preliminary code adjustment.
>
> This is based on an RFC posted previously
>
> https://lore.kernel.org/linux-pm/7663799.EvYhyI6sBW@kreacher/
>
> but differs from it quite a bit (except for the first patch). The most
> significant difference is based on the observation that frequency
> invariance needs to be adjusted to the capacity scaling on hybrid systems
> for the complete scale invariance to work as expected.
>
> Thank you!

Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance

You can see the scale invariance these patches achieve here:

https://pasteboard.co/dhBAUjfr36Tx.png

I tested these patches on a Meteor Lake system. It has CPUs with three
levels of capacity (Pcore, Ecore, and Lcore).

The "Requested work" plot shows a sawtooth pattern of the amount of work
requested as a percentage of the maximum amount of work that can be
obtained from the biggest CPU running at its maximum frequency. The
workload calls getcpu() continuously in time windows of constant
duration, with a varying percentage of work per window.

The "Achieved work" plot shows that the Ecore and Lcore cannot complete
as much work as the Pcore even when fully busy (see the "Busy %" plot).
Also, bigger CPUs have more idle time.

The "Scale freq capacity" plot shows that the current frequency of each
CPU is now scaled to 1024 by that CPU's own maximum frequency. It no
longer uses the single arch_max_freq_ratio value.
Capacity now scales correctly: when a CPU runs at its maximum frequency,
its current capacity (see the "Current capacity" plot and refer to
cap_scale()) matches the value from arch_scale_cpu_capacity() (see the
"CPU capacity" plot).

The "Task utilization" plot shows that task->util_avg is now invariant
across CPUs.
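The fixed-point arithmetic behind the "CPU capacity", "Scale freq capacity" and "Current capacity" plots can be sketched in user-space C. This is a hedged sketch under stated assumptions, not the series' actual code. hybrid_capacity(), max_freq_ratio(), freq_scale() and current_capacity() are hypothetical helpers. The per-CPU capacity is assumed to be the ratio of a CPU's highest HWP performance level to that of the biggest CPU. freq_scale() only follows the general shape of the x86 APERF/MPERF tick computation, with a per-CPU ratio in place of one system-wide value. SCHED_CAPACITY_SHIFT, SCHED_CAPACITY_SCALE and cap_scale() are redefined locally with the same meaning they have in the scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point capacity scale as in the scheduler: 1024 represents 1.0. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* Same arithmetic as the scheduler's cap_scale(): v * s / 1024. */
#define cap_scale(v, s) (((v) * (s)) >> SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper: capacity of a CPU as its highest performance level
 * (assumed to come from MSR_HWP_CAPABILITIES) relative to the highest
 * performance level of the biggest CPU, in 1024 units.
 */
static uint64_t hybrid_capacity(uint64_t cpu_highest_perf,
				uint64_t max_highest_perf)
{
	assert(max_highest_perf > 0 && cpu_highest_perf <= max_highest_perf);
	return (cpu_highest_perf << SCHED_CAPACITY_SHIFT) / max_highest_perf;
}

/*
 * Hypothetical helper: per-CPU ratio of maximum frequency to base (MPERF)
 * frequency, in 1024 units. Each CPU type gets its own ratio instead of
 * one system-wide value.
 */
static uint64_t max_freq_ratio(uint64_t max_freq_mhz, uint64_t base_freq_mhz)
{
	assert(base_freq_mhz > 0);
	return (max_freq_mhz << SCHED_CAPACITY_SHIFT) / base_freq_mhz;
}

/*
 * Frequency scale from APERF/MPERF deltas:
 *   (acnt / mcnt) / (ratio / 1024) * 1024 == cur_freq / max_freq * 1024,
 * clamped to 1024.
 */
static uint64_t freq_scale(uint64_t acnt, uint64_t mcnt, uint64_t ratio)
{
	assert(mcnt > 0 && ratio > 0);
	uint64_t scale = (acnt << (2 * SCHED_CAPACITY_SHIFT)) / (mcnt * ratio);

	return scale > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : scale;
}

/* Current capacity: the CPU's capacity scaled by its frequency scale. */
static uint64_t current_capacity(uint64_t cpu_cap, uint64_t fscale)
{
	return cap_scale(cpu_cap, fscale);
}
```

The following example uses made-up numbers: highest performance levels of 60 (big CPU) and 40 (small CPU), and max/base frequencies of 4000/2000 MHz and 3000/2000 MHz. With these, the small CPU gets a capacity of 682 and a per-CPU ratio of 1536. At its maximum frequency (APERF/MPERF deltas of 3000 and 2000), freq_scale() returns 1024, so current_capacity(682, 1024) equals the CPU capacity of 682, which is what the plots show. Reusing the big CPU's ratio of 2048 for the small CPU instead would cap its frequency scale at 768.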
On Wed, Sep 4, 2024 at 9:19 AM Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Aug 28, 2024 at 01:45:00PM +0200, Rafael J. Wysocki wrote:
> [...]
>
> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance
>
> [...]

Thank you!
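Ricardo's final observation, that task->util_avg becomes invariant across CPUs, follows from how PELT scales running time. The sketch below is modeled loosely on the scheduler's update_rq_clock_pelt(), which scales each time delta by arch_scale_cpu_capacity() and then by arch_scale_freq_capacity(). invariant_time() and wall_time_for_work() are hypothetical helpers. The workload model assumes that work completed is exactly proportional to capacity times frequency, which real workloads only approximate.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point capacity scale as in the scheduler: 1024 represents 1.0. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)
#define cap_scale(v, s) (((v) * (s)) >> SCHED_CAPACITY_SHIFT)

/*
 * Invariant running time, modeled loosely on update_rq_clock_pelt():
 * the wall-clock delta is scaled by the CPU capacity and then by the
 * current frequency scale, so it reflects the work actually done.
 */
static uint64_t invariant_time(uint64_t delta_us, uint64_t cpu_cap,
			       uint64_t fscale)
{
	delta_us = cap_scale(delta_us, cpu_cap);
	return cap_scale(delta_us, fscale);
}

/*
 * Hypothetical workload model: a CPU completes cap_scale(cpu_cap, fscale)
 * work units per microsecond, i.e. work is assumed to be exactly
 * proportional to capacity times frequency.
 */
static uint64_t wall_time_for_work(uint64_t work, uint64_t cpu_cap,
				   uint64_t fscale)
{
	uint64_t units_per_us = cap_scale(cpu_cap, fscale);

	assert(units_per_us > 0);
	return work / units_per_us;
}
```

As an example, consider a fixed job of 10000 * 1024 work units. It takes 10000 us on a 1024-capacity CPU at full speed, 20000 us on a 512-capacity CPU at full speed, and 40000 us on that CPU at half speed. In all three cases invariant_time() reports 10000, so the utilization derived from it does not depend on where the task ran.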