mbox series

[v5,0/7] consolidate and cleanup CPU capacity

Message ID 20231104105907.1365392-1-vincent.guittot@linaro.org
Headers show
Series consolidate and cleanup CPU capacity | expand

Message

Vincent Guittot Nov. 4, 2023, 10:59 a.m. UTC
This is the 1st part of consolidating how the max compute capacity is
used in the scheduler and how we calculate the frequency for a level of
utilization.

Fix some unconsistancy when computing frequency for an utilization. There
can be a mismatch between energy model and schedutil.

Next step will be to make a difference between the original
max compute capacity of a CPU and what is currently available when
there is a capping applying forever (i.e. seconds or more).

Changes since v4:
- Capitalize the verb in subject
- Remove usless parentheses in cppc_get_dmi_max_khz()
- Use freq_ref pattern everywhere
- Fix MHz / kHz units conversion for cppc_cpufreq
- Move default definition of arch_scale_freq_ref() in
  include/linux/sched/topology.h beside arch_scale_cpu_capacity
  which faces similar default declaration behavior. This location covers
  all cases with arch and CONFIG_* which was not the case with previous
  attempts.

Changes since v3:
- Split patch 5 cpufreq/cppc
- Fix topology_init_cpu_capacity_cppc() 
- Fix init if AMU ratio
- Added some tags

Changes since v2:
- Remove the 1st patch which has been queued in tip
- Rework how to initialize the reference frequency for cppc_cpufreq and
  change topology_init_cpu_capacity_cppc() to also set capacity_ref_freq
- Add a RFC to convert AMU to use arch_scale_freq_ref and move the config
  of the AMU ratio to be done when intializing cpu capacity and
  capacity_ref_freq
- Added some tags

Changes since v1:
- Fix typos
- Added changes in cpufreq to use arch_scale_freq_ref() when calling
  arch_set_freq_scale (patch 3).
- arch_scale_freq_ref() is always defined and returns 0 (as proposed
  by Ionela) when not defined by the arch. This simplifies the code with
  the addition of patch 3.
- Simplify Energy Model which always uses arch_scale_freq_ref(). The
  latter returns 0 when not defined by arch instead of last item of the 
  perf domain. This is not a problem because the function is only defined
  for compilation purpose in this case and we don't care about the
  returned value. (patch 5)
- Added changes in cppc cpufreq to set capacity_ref_freq (patch 6)
- Added reviewed tag for patch 1 which got a minor change but not for
  others as I did some changes which could make previous reviewed tag
  no more relevant.

Vincent Guittot (7):
  topology: Add a new arch_scale_freq_reference
  cpufreq: Use the fixed and coherent frequency for scaling capacity
  cpufreq/schedutil: Use a fixed reference frequency
  energy_model: Use a fixed reference frequency
  cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}
  cpufreq/cppc: Set the frequency used for computing the capacity
  arm64/amu: Use capacity_ref_freq to set AMU ratio

 arch/arm/include/asm/topology.h   |   1 +
 arch/arm64/include/asm/topology.h |   1 +
 arch/arm64/kernel/topology.c      |  26 +++---
 arch/riscv/include/asm/topology.h |   1 +
 drivers/acpi/cppc_acpi.c          | 104 ++++++++++++++++++++++
 drivers/base/arch_topology.c      |  56 ++++++++----
 drivers/cpufreq/cppc_cpufreq.c    | 139 ++++--------------------------
 drivers/cpufreq/cpufreq.c         |   4 +-
 include/acpi/cppc_acpi.h          |   2 +
 include/linux/arch_topology.h     |   8 ++
 include/linux/cpufreq.h           |   1 +
 include/linux/energy_model.h      |   6 +-
 include/linux/sched/topology.h    |   8 ++
 kernel/sched/cpufreq_schedutil.c  |  26 +++++-
 14 files changed, 225 insertions(+), 158 deletions(-)

Comments

Rafael J. Wysocki Nov. 6, 2023, 2:26 p.m. UTC | #1
On Sat, Nov 4, 2023 at 11:59 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> Move and rename cppc_cpufreq_perf_to_khz and cppc_cpufreq_khz_to_perf to
> use them outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
>
> Modify the interface to use struct cppc_perf_caps *caps instead of
> struct cppc_cpudata *cpu_data as we only use the fields of cppc_perf_caps.
>
> cppc_cpufreq was converting the lowest and nominal freq from MHz to kHz
> before using them. We move this conversion inside cppc_perf_to_khz and
> cppc_khz_to_perf to make them generic and usable outside cppc_cpufreq.
>
> No functional change
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
>  drivers/acpi/cppc_acpi.c       | 104 ++++++++++++++++++++++++
>  drivers/cpufreq/cppc_cpufreq.c | 139 ++++-----------------------------
>  include/acpi/cppc_acpi.h       |   2 +
>  3 files changed, 123 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 7ff269a78c20..d155a86a8614 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -39,6 +39,9 @@
>  #include <linux/rwsem.h>
>  #include <linux/wait.h>
>  #include <linux/topology.h>
> +#include <linux/dmi.h>
> +#include <linux/units.h>
> +#include <asm/unaligned.h>
>
>  #include <acpi/cppc_acpi.h>
>
> @@ -1760,3 +1763,104 @@ unsigned int cppc_get_transition_latency(int cpu_num)
>         return latency_ns;
>  }
>  EXPORT_SYMBOL_GPL(cppc_get_transition_latency);
> +
> +/* Minimum struct length needed for the DMI processor entry we want */
> +#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
> +
> +/* Offset in the DMI processor structure for the max frequency */
> +#define DMI_PROCESSOR_MAX_SPEED                0x14
> +
> +/* Callback function used to retrieve the max frequency from DMI */
> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
> +{
> +       const u8 *dmi_data = (const u8 *)dm;
> +       u16 *mhz = (u16 *)private;
> +
> +       if (dm->type == DMI_ENTRY_PROCESSOR &&
> +           dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
> +               u16 val = (u16)get_unaligned((const u16 *)
> +                               (dmi_data + DMI_PROCESSOR_MAX_SPEED));
> +               *mhz = val > *mhz ? val : *mhz;
> +       }
> +}
> +
> +/* Look up the max frequency in DMI */
> +static u64 cppc_get_dmi_max_khz(void)
> +{
> +       u16 mhz = 0;
> +
> +       dmi_walk(cppc_find_dmi_mhz, &mhz);
> +
> +       /*
> +        * Real stupid fallback value, just in case there is no
> +        * actual value set.
> +        */
> +       mhz = mhz ? mhz : 1;
> +
> +       return KHZ_PER_MHZ * mhz;
> +}
> +
> +/*
> + * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> + * use them to convert perf to freq and vice versa. The conversion is
> + * extrapolated as an affine function passing by the 2 points:
> + *  - (Low perf, Low freq)
> + *  - (Nominal perf, Nominal freq)
> + */
> +unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf)
> +{
> +       s64 retval, offset = 0;
> +       static u64 max_khz;
> +       u64 mul, div;
> +
> +       if (caps->lowest_freq && caps->nominal_freq) {
> +               mul = caps->nominal_freq - caps->lowest_freq;
> +               mul *= KHZ_PER_MHZ;
> +               div = caps->nominal_perf - caps->lowest_perf;
> +               offset = caps->nominal_freq * KHZ_PER_MHZ -
> +                        div64_u64(caps->nominal_perf * mul, div);
> +       } else {
> +               if (!max_khz)
> +                       max_khz = cppc_get_dmi_max_khz();
> +               mul = max_khz;
> +               div = caps->highest_perf;
> +       }
> +
> +       retval = offset + div64_u64(perf * mul, div);
> +       if (retval >= 0)
> +               return retval;
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(cppc_perf_to_khz);
> +
> +unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq)
> +{
> +       s64 retval, offset = 0;
> +       static u64 max_khz;
> +       u64  mul, div;
> +
> +       if (caps->lowest_freq && caps->nominal_freq) {
> +               mul = caps->nominal_perf - caps->lowest_perf;
> +               div = caps->nominal_freq - caps->lowest_freq;
> +               /*
> +                * We don't need to convert to kHz for computing offset and can
> +                * directly use nominal_freq and lowest_freq as the div64_u64
> +                * will remove the frequency unit.
> +                */
> +               offset = caps->nominal_perf -
> +                        div64_u64(caps->nominal_freq * mul, div);
> +               /* But we need it for computing the perf level. */
> +               div *= KHZ_PER_MHZ;
> +       } else {
> +               if (!max_khz)
> +                       max_khz = cppc_get_dmi_max_khz();
> +               mul = caps->highest_perf;
> +               div = max_khz;
> +       }
> +
> +       retval = offset + div64_u64(freq * mul, div);
> +       if (retval >= 0)
> +               return retval;
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(cppc_khz_to_perf);
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index fe08ca419b3d..64420d9cfd1e 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -16,7 +16,6 @@
>  #include <linux/delay.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
> -#include <linux/dmi.h>
>  #include <linux/irq_work.h>
>  #include <linux/kthread.h>
>  #include <linux/time.h>
> @@ -27,12 +26,6 @@
>
>  #include <acpi/cppc_acpi.h>
>
> -/* Minimum struct length needed for the DMI processor entry we want */
> -#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
> -
> -/* Offset in the DMI processor structure for the max frequency */
> -#define DMI_PROCESSOR_MAX_SPEED                0x14
> -
>  /*
>   * This list contains information parsed from per CPU ACPI _CPC and _PSD
>   * structures: e.g. the highest and lowest supported performance, capabilities,
> @@ -291,97 +284,9 @@ static inline void cppc_freq_invariance_exit(void)
>  }
>  #endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
>
> -/* Callback function used to retrieve the max frequency from DMI */
> -static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
> -{
> -       const u8 *dmi_data = (const u8 *)dm;
> -       u16 *mhz = (u16 *)private;
> -
> -       if (dm->type == DMI_ENTRY_PROCESSOR &&
> -           dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
> -               u16 val = (u16)get_unaligned((const u16 *)
> -                               (dmi_data + DMI_PROCESSOR_MAX_SPEED));
> -               *mhz = val > *mhz ? val : *mhz;
> -       }
> -}
> -
> -/* Look up the max frequency in DMI */
> -static u64 cppc_get_dmi_max_khz(void)
> -{
> -       u16 mhz = 0;
> -
> -       dmi_walk(cppc_find_dmi_mhz, &mhz);
> -
> -       /*
> -        * Real stupid fallback value, just in case there is no
> -        * actual value set.
> -        */
> -       mhz = mhz ? mhz : 1;
> -
> -       return (1000 * mhz);
> -}
> -
> -/*
> - * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> - * use them to convert perf to freq and vice versa. The conversion is
> - * extrapolated as an affine function passing by the 2 points:
> - *  - (Low perf, Low freq)
> - *  - (Nominal perf, Nominal perf)
> - */
> -static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu_data,
> -                                            unsigned int perf)
> -{
> -       struct cppc_perf_caps *caps = &cpu_data->perf_caps;
> -       s64 retval, offset = 0;
> -       static u64 max_khz;
> -       u64 mul, div;
> -
> -       if (caps->lowest_freq && caps->nominal_freq) {
> -               mul = caps->nominal_freq - caps->lowest_freq;
> -               div = caps->nominal_perf - caps->lowest_perf;
> -               offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div);
> -       } else {
> -               if (!max_khz)
> -                       max_khz = cppc_get_dmi_max_khz();
> -               mul = max_khz;
> -               div = caps->highest_perf;
> -       }
> -
> -       retval = offset + div64_u64(perf * mul, div);
> -       if (retval >= 0)
> -               return retval;
> -       return 0;
> -}
> -
> -static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data,
> -                                            unsigned int freq)
> -{
> -       struct cppc_perf_caps *caps = &cpu_data->perf_caps;
> -       s64 retval, offset = 0;
> -       static u64 max_khz;
> -       u64  mul, div;
> -
> -       if (caps->lowest_freq && caps->nominal_freq) {
> -               mul = caps->nominal_perf - caps->lowest_perf;
> -               div = caps->nominal_freq - caps->lowest_freq;
> -               offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div);
> -       } else {
> -               if (!max_khz)
> -                       max_khz = cppc_get_dmi_max_khz();
> -               mul = caps->highest_perf;
> -               div = max_khz;
> -       }
> -
> -       retval = offset + div64_u64(freq * mul, div);
> -       if (retval >= 0)
> -               return retval;
> -       return 0;
> -}
> -
>  static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
>                                    unsigned int target_freq,
>                                    unsigned int relation)
> -
>  {
>         struct cppc_cpudata *cpu_data = policy->driver_data;
>         unsigned int cpu = policy->cpu;
> @@ -389,7 +294,7 @@ static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
>         u32 desired_perf;
>         int ret = 0;
>
> -       desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
> +       desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
>         /* Return if it is exactly the same perf */
>         if (desired_perf == cpu_data->perf_ctrls.desired_perf)
>                 return ret;
> @@ -417,7 +322,7 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>         u32 desired_perf;
>         int ret;
>
> -       desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
> +       desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
>         cpu_data->perf_ctrls.desired_perf = desired_perf;
>         ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>
> @@ -530,7 +435,7 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
>         min_step = min_cap / CPPC_EM_CAP_STEP;
>         max_step = max_cap / CPPC_EM_CAP_STEP;
>
> -       perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> +       perf_prev = cppc_khz_to_perf(perf_caps, *KHz);
>         step = perf_prev / perf_step;
>
>         if (step > max_step)
> @@ -550,8 +455,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
>                         perf = step * perf_step;
>         }
>
> -       *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
> -       perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> +       *KHz = cppc_perf_to_khz(perf_caps, perf);
> +       perf_check = cppc_khz_to_perf(perf_caps, *KHz);
>         step_check = perf_check / perf_step;
>
>         /*
> @@ -561,8 +466,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
>          */
>         while ((*KHz == prev_freq) || (step_check != step)) {
>                 perf++;
> -               *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
> -               perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> +               *KHz = cppc_perf_to_khz(perf_caps, perf);
> +               perf_check = cppc_khz_to_perf(perf_caps, *KHz);
>                 step_check = perf_check / perf_step;
>         }
>
> @@ -591,7 +496,7 @@ static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
>         perf_caps = &cpu_data->perf_caps;
>         max_cap = arch_scale_cpu_capacity(cpu_dev->id);
>
> -       perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, KHz);
> +       perf_prev = cppc_khz_to_perf(perf_caps, KHz);
>         perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap;
>         step = perf_prev / perf_step;
>
> @@ -679,10 +584,6 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
>                 goto free_mask;
>         }
>
> -       /* Convert the lowest and nominal freq from MHz to KHz */
> -       cpu_data->perf_caps.lowest_freq *= 1000;
> -       cpu_data->perf_caps.nominal_freq *= 1000;
> -
>         list_add(&cpu_data->node, &cpu_data_list);
>
>         return cpu_data;
> @@ -724,20 +625,16 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>          * Set min to lowest nonlinear perf to avoid any efficiency penalty (see
>          * Section 8.4.7.1.1.5 of ACPI 6.1 spec)
>          */
> -       policy->min = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                              caps->lowest_nonlinear_perf);
> -       policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                              caps->nominal_perf);
> +       policy->min = cppc_perf_to_khz(caps, caps->lowest_nonlinear_perf);
> +       policy->max = cppc_perf_to_khz(caps, caps->nominal_perf);
>
>         /*
>          * Set cpuinfo.min_freq to Lowest to make the full range of performance
>          * available if userspace wants to use any perf between lowest & lowest
>          * nonlinear perf
>          */
> -       policy->cpuinfo.min_freq = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                                           caps->lowest_perf);
> -       policy->cpuinfo.max_freq = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                                           caps->nominal_perf);
> +       policy->cpuinfo.min_freq = cppc_perf_to_khz(caps, caps->lowest_perf);
> +       policy->cpuinfo.max_freq = cppc_perf_to_khz(caps, caps->nominal_perf);
>
>         policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu);
>         policy->shared_type = cpu_data->shared_type;
> @@ -773,7 +670,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>                 boost_supported = true;
>
>         /* Set policy->cur to max now. The governors will adjust later. */
> -       policy->cur = cppc_cpufreq_perf_to_khz(cpu_data, caps->highest_perf);
> +       policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>         cpu_data->perf_ctrls.desired_perf =  caps->highest_perf;
>
>         ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> @@ -863,7 +760,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
>         delivered_perf = cppc_perf_from_fbctrs(cpu_data, &fb_ctrs_t0,
>                                                &fb_ctrs_t1);
>
> -       return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
> +       return cppc_perf_to_khz(&cpu_data->perf_caps, delivered_perf);
>  }
>
>  static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
> @@ -878,11 +775,9 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
>         }
>
>         if (state)
> -               policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                                      caps->highest_perf);
> +               policy->max = cppc_perf_to_khz(caps, caps->highest_perf);
>         else
> -               policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> -                                                      caps->nominal_perf);
> +               policy->max = cppc_perf_to_khz(caps, caps->nominal_perf);
>         policy->cpuinfo.max_freq = policy->max;
>
>         ret = freq_qos_update_request(policy->max_freq_req, policy->max);
> @@ -937,7 +832,7 @@ static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu)
>         if (ret < 0)
>                 return -EIO;
>
> -       return cppc_cpufreq_perf_to_khz(cpu_data, desired_perf);
> +       return cppc_perf_to_khz(&cpu_data->perf_caps, desired_perf);
>  }
>
>  static void cppc_check_hisi_workaround(void)
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 6126c977ece0..3a0995f8bce8 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -144,6 +144,8 @@ extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
>  extern int cppc_set_enable(int cpu, bool enable);
>  extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
>  extern bool cppc_perf_ctrs_in_pcc(void);
> +extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
> +extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
>  extern bool acpi_cpc_valid(void);
>  extern bool cppc_allow_fast_switch(void);
>  extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
> --
> 2.34.1
>