Message ID | 20210617060546.26933-4-cw00.choi@samsung.com |
---|---|
State | New |
Series | PM / devfreq: Add cpu based scaling support to passive governor |
On 6/17/21 2:51 PM, Hsin-Yi Wang wrote: > On Thu, Jun 17, 2021 at 1:46 PM Chanwoo Choi <cw00.choi@samsung.com> wrote: >> >> From: Saravana Kannan <skannan@codeaurora.org> >> >> Many CPU architectures have caches that can scale independent of the >> CPUs. Frequency scaling of the caches is necessary to make sure that the >> cache is not a performance bottleneck that leads to poor performance and >> power. The same idea applies for RAM/DDR. >> >> To achieve this, this patch adds support for cpu based scaling to the >> passive governor. This is accomplished by taking the current frequency >> of each CPU frequency domain and then adjust the frequency of the cache >> (or any devfreq device) based on the frequency of the CPUs. It listens >> to CPU frequency transition notifiers to keep itself up to date on the >> current CPU frequency. >> >> To decide the frequency of the device, the governor does one of the >> following: >> * Derives the optimal devfreq device opp from required-opps property of >> the parent cpu opp_table. >> >> * Scales the device frequency in proportion to the CPU frequency. So, if >> the CPUs are running at their max frequency, the device runs at its >> max frequency. If the CPUs are running at their min frequency, the >> device runs at its min frequency. It is interpolated for frequencies >> in between. 
>> >> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >> [Sibi: Integrated cpu-freqmap governor into passive_governor] >> Signed-off-by: Sibi Sankar <sibis@codeaurora.org> >> [Chanwoo: Fix conflict with latest code and clean code up] >> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> >> --- >> drivers/devfreq/governor.h | 22 +++ >> drivers/devfreq/governor_passive.c | 264 ++++++++++++++++++++++++++++- >> include/linux/devfreq.h | 16 +- >> 3 files changed, 293 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h >> index 9a9495f94ac6..3c36c92c89a9 100644 >> --- a/drivers/devfreq/governor.h >> +++ b/drivers/devfreq/governor.h >> @@ -47,6 +47,28 @@ >> #define DEVFREQ_GOV_ATTR_POLLING_INTERVAL BIT(0) >> #define DEVFREQ_GOV_ATTR_TIMER BIT(1) >> >> +/** >> + * struct devfreq_cpu_data - Hold the per-cpu data >> + * @dev: reference to cpu device. >> + * @first_cpu: the cpumask of the first cpu of a policy. >> + * @opp_table: reference to cpu opp table. >> + * @cur_freq: the current frequency of the cpu. >> + * @min_freq: the min frequency of the cpu. >> + * @max_freq: the max frequency of the cpu. >> + * >> + * This structure stores the required cpu_data of a cpu. >> + * This is auto-populated by the governor. 
>> + */ >> +struct devfreq_cpu_data { >> + struct device *dev; >> + unsigned int first_cpu; >> + >> + struct opp_table *opp_table; >> + unsigned int cur_freq; >> + unsigned int min_freq; >> + unsigned int max_freq; >> +}; >> + >> /** >> * struct devfreq_governor - Devfreq policy governor >> * @node: list node - contains registered devfreq governors >> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c >> index fc09324a03e0..07e864509b7e 100644 >> --- a/drivers/devfreq/governor_passive.c >> +++ b/drivers/devfreq/governor_passive.c >> @@ -8,11 +8,84 @@ >> */ >> >> #include <linux/module.h> >> +#include <linux/cpu.h> >> +#include <linux/cpufreq.h> >> +#include <linux/cpumask.h> >> +#include <linux/slab.h> >> #include <linux/device.h> >> #include <linux/devfreq.h> >> #include "governor.h" >> >> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq, >> +#define HZ_PER_KHZ 1000 >> + >> +static unsigned long get_taget_freq_by_required_opp(struct device *p_dev, >> + struct opp_table *p_opp_table, >> + struct opp_table *opp_table, >> + unsigned long freq) >> +{ >> + struct dev_pm_opp *opp = NULL, *p_opp = NULL; >> + >> + if (!p_dev || !p_opp_table || !opp_table || !freq) >> + return 0; >> + >> + p_opp = devfreq_recommended_opp(p_dev, &freq, 0); >> + if (IS_ERR(p_opp)) >> + return 0; >> + >> + opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp); >> + dev_pm_opp_put(p_opp); >> + >> + if (IS_ERR(opp)) >> + return 0; >> + >> + freq = dev_pm_opp_get_freq(opp); >> + dev_pm_opp_put(opp); >> + >> + return freq; >> +} >> + >> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq, >> + unsigned long *target_freq) >> +{ >> + struct devfreq_passive_data *p_data = >> + (struct devfreq_passive_data *)devfreq->data; >> + struct devfreq_cpu_data *cpu_data; > We might have to rename the cpu_data variable. > > For some architectures, cpu_data is defined as macro. 
This results in
> errors such as
>
> include/linux/devfreq.h:331:27: note: in expansion of macro 'cpu_data'
>   331 |  struct devfreq_cpu_data *cpu_data[NR_CPUS];
>       |                           ^~~~~~~~
> In file included from include/linux/devfreq_cooling.h:13,
>                  from drivers/devfreq/devfreq.c:14:
> include/linux/devfreq.h:332:1: warning: no semicolon at end of
> struct or union

OK. I'll rename.

>
>> +	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>> +	unsigned long dev_min, dev_max;
>> +	unsigned long freq = 0;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		cpu_data = p_data->cpu_data[cpu];
>> +		if (!cpu_data || cpu_data->first_cpu != cpu)
>> +			continue;
>> +
>> +		/* Get target freq via required opps */
>> +		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>> +		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>> +					cpu_data->opp_table,
>> +					devfreq->opp_table, cpu_cur);
>> +		if (freq) {
>> +			*target_freq = max(freq, *target_freq);
>> +			continue;
>> +		}
>> +
>> +		/* Use Interpolation if required opps is not available */
>> +		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>> +
>> +		cpu_min = cpu_data->min_freq;
>> +		cpu_max = cpu_data->max_freq;
>> +		cpu_cur = cpu_data->cur_freq;
>> +
>> +		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
>> +		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>> +
>> +		*target_freq = max(freq, *target_freq);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>> 					unsigned long *freq)
>> {
>> 	struct devfreq_passive_data *p_data
>> @@ -99,6 +172,172 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>> 	return 0;
>> }
>>
>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>> +					unsigned long *freq)
>> +{
>> +	struct devfreq_passive_data *p_data =
>> +		(struct devfreq_passive_data *)devfreq->data;
>> +	int ret;
>> +
>> +	if (!p_data)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * If the devfreq device with passive governor has the specific
method >> + * to determine the next frequency, should use the get_target_freq() >> + * of struct devfreq_passive_data. >> + */ >> + if (p_data->get_target_freq) >> + return p_data->get_target_freq(devfreq, freq); >> + >> + switch (p_data->parent_type) { >> + case DEVFREQ_PARENT_DEV: >> + ret = get_target_freq_with_devfreq(devfreq, freq); >> + break; >> + case CPUFREQ_PARENT_DEV: >> + ret = get_target_freq_with_cpufreq(devfreq, freq); >> + break; >> + default: >> + ret = -EINVAL; >> + dev_err(&devfreq->dev, "Invalid parent type\n"); >> + break; >> + } >> + >> + return ret; >> +} >> + >> +static int cpufreq_passive_notifier_call(struct notifier_block *nb, >> + unsigned long event, void *ptr) >> +{ >> + struct devfreq_passive_data *data = >> + container_of(nb, struct devfreq_passive_data, nb); >> + struct devfreq *devfreq = (struct devfreq *)data->this; >> + struct devfreq_cpu_data *cpu_data; >> + struct cpufreq_freqs *freqs = ptr; >> + unsigned int cur_freq; >> + int ret; >> + >> + if (event != CPUFREQ_POSTCHANGE || !freqs || >> + !data->cpu_data[freqs->policy->cpu]) >> + return 0; >> + >> + cpu_data = data->cpu_data[freqs->policy->cpu]; >> + if (cpu_data->cur_freq == freqs->new) >> + return 0; >> + >> + cur_freq = cpu_data->cur_freq; >> + cpu_data->cur_freq = freqs->new; >> + >> + mutex_lock(&devfreq->lock); >> + ret = devfreq_update_target(devfreq, freqs->new); >> + mutex_unlock(&devfreq->lock); >> + if (ret) { >> + cpu_data->cur_freq = cur_freq; >> + dev_err(&devfreq->dev, "failed to update the frequency.\n"); >> + return ret; >> + } >> + >> + return 0; >> +} >> + >> +static int cpufreq_passive_register_notifier(struct devfreq *devfreq) >> +{ >> + struct devfreq_passive_data *p_data >> + = (struct devfreq_passive_data *)devfreq->data; >> + struct device *dev = devfreq->dev.parent; >> + struct opp_table *opp_table = NULL; >> + struct devfreq_cpu_data *cpu_data; >> + struct cpufreq_policy *policy; >> + struct device *cpu_dev; >> + unsigned int cpu; >> + int ret; >> 
+ >> + get_online_cpus(); >> + >> + p_data->nb.notifier_call = cpufreq_passive_notifier_call; >> + ret = cpufreq_register_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER); >> + if (ret) { >> + dev_err(dev, "failed to register cpufreq notifier\n"); >> + p_data->nb.notifier_call = NULL; >> + goto out; >> + } >> + >> + for_each_online_cpu(cpu) { >> + if (p_data->cpu_data[cpu]) >> + continue; >> + >> + policy = cpufreq_cpu_get(cpu); >> + if (policy) { >> + cpu_data = kzalloc(sizeof(*cpu_data), GFP_KERNEL); >> + if (!cpu_data) { >> + ret = -ENOMEM; >> + goto out; >> + } >> + >> + cpu_dev = get_cpu_device(cpu); >> + if (!cpu_dev) { >> + dev_err(dev, "failed to get cpu device\n"); >> + ret = -ENODEV; >> + goto out; >> + } >> + >> + opp_table = dev_pm_opp_get_opp_table(cpu_dev); >> + if (IS_ERR(opp_table)) { >> + ret = PTR_ERR(opp_table); >> + goto out; >> + } >> + >> + cpu_data->dev = cpu_dev; >> + cpu_data->opp_table = opp_table; >> + cpu_data->first_cpu = cpumask_first(policy->related_cpus); >> + cpu_data->cur_freq = policy->cur; >> + cpu_data->min_freq = policy->cpuinfo.min_freq; >> + cpu_data->max_freq = policy->cpuinfo.max_freq; >> + >> + p_data->cpu_data[cpu] = cpu_data; >> + cpufreq_cpu_put(policy); >> + } else { >> + ret = -EPROBE_DEFER; >> + goto out; >> + } >> + } >> +out: >> + put_online_cpus(); >> + if (ret) >> + return ret; >> + >> + mutex_lock(&devfreq->lock); >> + ret = devfreq_update_target(devfreq, 0L); >> + mutex_unlock(&devfreq->lock); >> + if (ret) >> + dev_err(dev, "failed to update the frequency\n"); >> + >> + return ret; >> +} >> + >> +static int cpufreq_passive_unregister_notifier(struct devfreq *devfreq) >> +{ >> + struct devfreq_passive_data *p_data >> + = (struct devfreq_passive_data *)devfreq->data; >> + struct devfreq_cpu_data *cpu_data; >> + int cpu; >> + >> + if (p_data->nb.notifier_call) >> + cpufreq_unregister_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER); >> + >> + for_each_possible_cpu(cpu) { >> + cpu_data = p_data->cpu_data[cpu]; 
>> + if (cpu_data) { >> + if (cpu_data->opp_table) >> + dev_pm_opp_put_opp_table(cpu_data->opp_table); >> + kfree(cpu_data); >> + cpu_data = NULL; >> + } >> + } >> + >> + return 0; >> +} >> + >> static int devfreq_passive_notifier_call(struct notifier_block *nb, >> unsigned long event, void *ptr) >> { >> @@ -140,7 +379,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq, >> struct notifier_block *nb = &p_data->nb; >> int ret = 0; >> >> - if (!parent) >> + if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent) >> return -EPROBE_DEFER; >> >> switch (event) { >> @@ -148,13 +387,24 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq, >> if (!p_data->this) >> p_data->this = devfreq; >> >> - nb->notifier_call = devfreq_passive_notifier_call; >> - ret = devfreq_register_notifier(parent, nb, >> - DEVFREQ_TRANSITION_NOTIFIER); >> + if (p_data->parent_type == DEVFREQ_PARENT_DEV) { >> + nb->notifier_call = devfreq_passive_notifier_call; >> + ret = devfreq_register_notifier(parent, nb, >> + DEVFREQ_TRANSITION_NOTIFIER); >> + } else if (p_data->parent_type == CPUFREQ_PARENT_DEV) { >> + ret = cpufreq_passive_register_notifier(devfreq); >> + } else { >> + ret = -EINVAL; >> + } >> break; >> case DEVFREQ_GOV_STOP: >> - WARN_ON(devfreq_unregister_notifier(parent, nb, >> - DEVFREQ_TRANSITION_NOTIFIER)); >> + if (p_data->parent_type == DEVFREQ_PARENT_DEV) >> + WARN_ON(devfreq_unregister_notifier(parent, nb, >> + DEVFREQ_TRANSITION_NOTIFIER)); >> + else if (p_data->parent_type == CPUFREQ_PARENT_DEV) >> + WARN_ON(cpufreq_passive_unregister_notifier(devfreq)); >> + else >> + ret = -EINVAL; >> break; >> default: >> break; >> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h >> index 142474b4af96..cfa0ef54841e 100644 >> --- a/include/linux/devfreq.h >> +++ b/include/linux/devfreq.h >> @@ -38,6 +38,7 @@ enum devfreq_timer { >> >> struct devfreq; >> struct devfreq_governor; >> +struct devfreq_cpu_data; >> struct thermal_cooling_device; >> >> 
/** >> @@ -288,6 +289,11 @@ struct devfreq_simple_ondemand_data { >> #endif >> >> #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE) >> +enum devfreq_parent_dev_type { >> + DEVFREQ_PARENT_DEV, >> + CPUFREQ_PARENT_DEV, >> +}; >> + >> /** >> * struct devfreq_passive_data - ``void *data`` fed to struct devfreq >> * and devfreq_add_device >> @@ -299,8 +305,10 @@ struct devfreq_simple_ondemand_data { >> * using governors except for passive governor. >> * If the devfreq device has the specific method to decide >> * the next frequency, should use this callback. >> - * @this: the devfreq instance of own device. >> - * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list >> + + * @parent_type parent type of the device >> + + * @this: the devfreq instance of own device. >> + + * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list >> + + * @cpu_data: the state min/max/current frequency of all online cpu's >> * >> * The devfreq_passive_data have to set the devfreq instance of parent >> * device with governors except for the passive governor. But, don't need to >> @@ -314,9 +322,13 @@ struct devfreq_passive_data { >> /* Optional callback to decide the next frequency of passvice device */ >> int (*get_target_freq)(struct devfreq *this, unsigned long *freq); >> >> + /* Should set the type of parent device */ >> + enum devfreq_parent_dev_type parent_type; >> + >> /* For passive governor's internal use. Don't need to set them */ >> struct devfreq *this; >> struct notifier_block nb; >> + struct devfreq_cpu_data *cpu_data[NR_CPUS]; >> }; >> #endif >> >> -- >> 2.17.1 >> > >
On Thu, Jun 17, 2021 at 03:05:45PM +0900, Chanwoo Choi wrote: > From: Saravana Kannan <skannan@codeaurora.org> > > Many CPU architectures have caches that can scale independent of the > CPUs. Frequency scaling of the caches is necessary to make sure that the > cache is not a performance bottleneck that leads to poor performance and > power. The same idea applies for RAM/DDR. > > To achieve this, this patch adds support for cpu based scaling to the > passive governor. This is accomplished by taking the current frequency > of each CPU frequency domain and then adjust the frequency of the cache > (or any devfreq device) based on the frequency of the CPUs. It listens > to CPU frequency transition notifiers to keep itself up to date on the > current CPU frequency. > > To decide the frequency of the device, the governor does one of the > following: > * Derives the optimal devfreq device opp from required-opps property of > the parent cpu opp_table. > > * Scales the device frequency in proportion to the CPU frequency. So, if > the CPUs are running at their max frequency, the device runs at its > max frequency. If the CPUs are running at their min frequency, the > device runs at its min frequency. It is interpolated for frequencies > in between. 
> > Signed-off-by: Saravana Kannan <skannan@codeaurora.org> > [Sibi: Integrated cpu-freqmap governor into passive_governor] > Signed-off-by: Sibi Sankar <sibis@codeaurora.org> > [Chanwoo: Fix conflict with latest code and clean code up] > Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> > --- > drivers/devfreq/governor.h | 22 +++ > drivers/devfreq/governor_passive.c | 264 ++++++++++++++++++++++++++++- > include/linux/devfreq.h | 16 +- > 3 files changed, 293 insertions(+), 9 deletions(-) > > diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h > index 9a9495f94ac6..3c36c92c89a9 100644 > --- a/drivers/devfreq/governor.h > +++ b/drivers/devfreq/governor.h > @@ -47,6 +47,28 @@ > #define DEVFREQ_GOV_ATTR_POLLING_INTERVAL BIT(0) > #define DEVFREQ_GOV_ATTR_TIMER BIT(1) > > +/** > + * struct devfreq_cpu_data - Hold the per-cpu data > + * @dev: reference to cpu device. > + * @first_cpu: the cpumask of the first cpu of a policy. > + * @opp_table: reference to cpu opp table. > + * @cur_freq: the current frequency of the cpu. > + * @min_freq: the min frequency of the cpu. > + * @max_freq: the max frequency of the cpu. > + * > + * This structure stores the required cpu_data of a cpu. > + * This is auto-populated by the governor. > + */ > +struct devfreq_cpu_data { > + struct device *dev; > + unsigned int first_cpu; > + > + struct opp_table *opp_table; > + unsigned int cur_freq; > + unsigned int min_freq; > + unsigned int max_freq; > +}; > + > /** > * struct devfreq_governor - Devfreq policy governor > * @node: list node - contains registered devfreq governors > diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c > index fc09324a03e0..07e864509b7e 100644 > --- a/drivers/devfreq/governor_passive.c > +++ b/drivers/devfreq/governor_passive.c > > ... 
>
> +static int cpufreq_passive_register_notifier(struct devfreq *devfreq)
> +{
> +	struct devfreq_passive_data *p_data
> +		= (struct devfreq_passive_data *)devfreq->data;
> +	struct device *dev = devfreq->dev.parent;
> +	struct opp_table *opp_table = NULL;
> +	struct devfreq_cpu_data *cpu_data;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	unsigned int cpu;
> +	int ret;
> +
> +	get_online_cpus();
> +
> +	p_data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret) {
> +		dev_err(dev, "failed to register cpufreq notifier\n");
> +		p_data->nb.notifier_call = NULL;
> +		goto out;
> +	}
> +
> +	for_each_online_cpu(cpu) {

Is this really needed for each CPU? Wouldn't it be enough to create a
'cpu_data' for each 'policy CPU'?

In any case should this be for_each_possible_cpu() as in _unregister_notifier()
to also support CPUs that may be offline when the notifier is registered?

> +		if (p_data->cpu_data[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (policy) {
> +			cpu_data = kzalloc(sizeof(*cpu_data), GFP_KERNEL);
> +			if (!cpu_data) {
> +				ret = -ENOMEM;
> +				goto out;
> +			}
> +
> +			cpu_dev = get_cpu_device(cpu);
> +			if (!cpu_dev) {
> +				dev_err(dev, "failed to get cpu device\n");
> +				ret = -ENODEV;
> +				goto out;

Memory for 'cpu_data' is not freed in this path. Also applies to CPUs from
possible prior iterations.

> +			}
> +
> +			opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> +			if (IS_ERR(opp_table)) {
> +				ret = PTR_ERR(opp_table);
> +				goto out;

Ditto and cpufreq_cpu_put() is missing too.

> +			}
> +
> +			cpu_data->dev = cpu_dev;
> +			cpu_data->opp_table = opp_table;
> +			cpu_data->first_cpu = cpumask_first(policy->related_cpus);
> +			cpu_data->cur_freq = policy->cur;
> +			cpu_data->min_freq = policy->cpuinfo.min_freq;
> +			cpu_data->max_freq = policy->cpuinfo.max_freq;
> +
> +			p_data->cpu_data[cpu] = cpu_data;
> +			cpufreq_cpu_put(policy);
> +		} else {
> +			ret = -EPROBE_DEFER;
> +			goto out;

Resources from possible prior iterations aren't freed.

> +		}
> +	}
> +out:
> +	put_online_cpus();
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&devfreq->lock);
> +	ret = devfreq_update_target(devfreq, 0L);
> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "failed to update the frequency\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister_notifier(struct devfreq *devfreq)
> +{
> +	struct devfreq_passive_data *p_data
> +		= (struct devfreq_passive_data *)devfreq->data;
> +	struct devfreq_cpu_data *cpu_data;
> +	int cpu;
> +
> +	if (p_data->nb.notifier_call)
> +		cpufreq_unregister_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_data = p_data->cpu_data[cpu];
> +		if (cpu_data) {
> +			if (cpu_data->opp_table)
> +				dev_pm_opp_put_opp_table(cpu_data->opp_table);
> +			kfree(cpu_data);
> +			cpu_data = NULL;

Assignment to NULL is not needed.

> + } > + } > + > + return 0; > +} > + > static int devfreq_passive_notifier_call(struct notifier_block *nb, > unsigned long event, void *ptr) > { > @@ -140,7 +379,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq, > struct notifier_block *nb = &p_data->nb; > int ret = 0; > > - if (!parent) > + if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent) > return -EPROBE_DEFER; > > switch (event) { > @@ -148,13 +387,24 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq, > if (!p_data->this) > p_data->this = devfreq; > > - nb->notifier_call = devfreq_passive_notifier_call; > - ret = devfreq_register_notifier(parent, nb, > - DEVFREQ_TRANSITION_NOTIFIER); > + if (p_data->parent_type == DEVFREQ_PARENT_DEV) { > + nb->notifier_call = devfreq_passive_notifier_call; > + ret = devfreq_register_notifier(parent, nb, > + DEVFREQ_TRANSITION_NOTIFIER); > + } else if (p_data->parent_type == CPUFREQ_PARENT_DEV) { > + ret = cpufreq_passive_register_notifier(devfreq); > + } else { > + ret = -EINVAL; > + } > break; > case DEVFREQ_GOV_STOP: > - WARN_ON(devfreq_unregister_notifier(parent, nb, > - DEVFREQ_TRANSITION_NOTIFIER)); > + if (p_data->parent_type == DEVFREQ_PARENT_DEV) > + WARN_ON(devfreq_unregister_notifier(parent, nb, > + DEVFREQ_TRANSITION_NOTIFIER)); > + else if (p_data->parent_type == CPUFREQ_PARENT_DEV) > + WARN_ON(cpufreq_passive_unregister_notifier(devfreq)); > + else > + ret = -EINVAL; > break; > default: > break; > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h > index 142474b4af96..cfa0ef54841e 100644 > --- a/include/linux/devfreq.h > +++ b/include/linux/devfreq.h > @@ -38,6 +38,7 @@ enum devfreq_timer { > > struct devfreq; > struct devfreq_governor; > +struct devfreq_cpu_data; > struct thermal_cooling_device; > > /** > @@ -288,6 +289,11 @@ struct devfreq_simple_ondemand_data { > #endif > > #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE) > +enum devfreq_parent_dev_type { > + DEVFREQ_PARENT_DEV, > + CPUFREQ_PARENT_DEV, > 
+};
> +
>  /**
>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>   * and devfreq_add_device
> @@ -299,8 +305,10 @@ struct devfreq_simple_ondemand_data {
>   * using governors except for passive governor.
>   * If the devfreq device has the specific method to decide
>   * the next frequency, should use this callback.
> - * @this: the devfreq instance of own device.
> - * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + + * @parent_type parent type of the device
> + + * @this: the devfreq instance of own device.
> + + * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + + * @cpu_data: the state min/max/current frequency of all online cpu's
>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> @@ -314,9 +322,13 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>
> +	/* Should set the type of parent device */
> +	enum devfreq_parent_dev_type parent_type;
> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_data *cpu_data[NR_CPUS];

Could memory usage be a concern on systems with a really high number of
CPUs (e.g. 8k for x86 with MAXSMP)? One could argue that such systems likely
have significant amount of RAM too and a chunk of memory in the order of
100k wouldn't make a big impact.

I'm assuming that 'cpu_data' is only needed for 'policy CPUs'.
On Thu, Jun 17, 2021 at 03:05:45PM +0900, Chanwoo Choi wrote: > From: Saravana Kannan <skannan@codeaurora.org> > > Many CPU architectures have caches that can scale independent of the > CPUs. Frequency scaling of the caches is necessary to make sure that the > cache is not a performance bottleneck that leads to poor performance and > power. The same idea applies for RAM/DDR. > > To achieve this, this patch adds support for cpu based scaling to the > passive governor. This is accomplished by taking the current frequency > of each CPU frequency domain and then adjust the frequency of the cache > (or any devfreq device) based on the frequency of the CPUs. It listens > to CPU frequency transition notifiers to keep itself up to date on the > current CPU frequency. > > To decide the frequency of the device, the governor does one of the > following: > * Derives the optimal devfreq device opp from required-opps property of > the parent cpu opp_table. > > * Scales the device frequency in proportion to the CPU frequency. So, if > the CPUs are running at their max frequency, the device runs at its > max frequency. If the CPUs are running at their min frequency, the > device runs at its min frequency. It is interpolated for frequencies > in between. 
> > Signed-off-by: Saravana Kannan <skannan@codeaurora.org> > [Sibi: Integrated cpu-freqmap governor into passive_governor] > Signed-off-by: Sibi Sankar <sibis@codeaurora.org> > [Chanwoo: Fix conflict with latest code and clean code up] > Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> > --- > drivers/devfreq/governor.h | 22 +++ > drivers/devfreq/governor_passive.c | 264 ++++++++++++++++++++++++++++- > include/linux/devfreq.h | 16 +- > 3 files changed, 293 insertions(+), 9 deletions(-) > > diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h > index 9a9495f94ac6..3c36c92c89a9 100644 > --- a/drivers/devfreq/governor.h > +++ b/drivers/devfreq/governor.h > @@ -47,6 +47,28 @@ > #define DEVFREQ_GOV_ATTR_POLLING_INTERVAL BIT(0) > #define DEVFREQ_GOV_ATTR_TIMER BIT(1) > > +/** > + * struct devfreq_cpu_data - Hold the per-cpu data > + * @dev: reference to cpu device. > + * @first_cpu: the cpumask of the first cpu of a policy. > + * @opp_table: reference to cpu opp table. > + * @cur_freq: the current frequency of the cpu. > + * @min_freq: the min frequency of the cpu. > + * @max_freq: the max frequency of the cpu. > + * > + * This structure stores the required cpu_data of a cpu. > + * This is auto-populated by the governor. 
> + */
> +struct devfreq_cpu_data {
> +	struct device *dev;
> +	unsigned int first_cpu;
> +
> +	struct opp_table *opp_table;
> +	unsigned int cur_freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +};
> +
>  /**
>   * struct devfreq_governor - Devfreq policy governor
>   * @node: list node - contains registered devfreq governors
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index fc09324a03e0..07e864509b7e 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -8,11 +8,84 @@
>   */
>
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
> +#include <linux/slab.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
>  #include "governor.h"
>
> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +#define HZ_PER_KHZ 1000
> +
> +static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
> +						struct opp_table *p_opp_table,
> +						struct opp_table *opp_table,
> +						unsigned long freq)
> +{

s/get_taget_freq_by_required_opp/get_target_freq_by_required_opp/
diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h index 9a9495f94ac6..3c36c92c89a9 100644 --- a/drivers/devfreq/governor.h +++ b/drivers/devfreq/governor.h @@ -47,6 +47,28 @@ #define DEVFREQ_GOV_ATTR_POLLING_INTERVAL BIT(0) #define DEVFREQ_GOV_ATTR_TIMER BIT(1) +/** + * struct devfreq_cpu_data - Hold the per-cpu data + * @dev: reference to cpu device. + * @first_cpu: the cpumask of the first cpu of a policy. + * @opp_table: reference to cpu opp table. + * @cur_freq: the current frequency of the cpu. + * @min_freq: the min frequency of the cpu. + * @max_freq: the max frequency of the cpu. + * + * This structure stores the required cpu_data of a cpu. + * This is auto-populated by the governor. + */ +struct devfreq_cpu_data { + struct device *dev; + unsigned int first_cpu; + + struct opp_table *opp_table; + unsigned int cur_freq; + unsigned int min_freq; + unsigned int max_freq; +}; + /** * struct devfreq_governor - Devfreq policy governor * @node: list node - contains registered devfreq governors diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c index fc09324a03e0..07e864509b7e 100644 --- a/drivers/devfreq/governor_passive.c +++ b/drivers/devfreq/governor_passive.c @@ -8,11 +8,84 @@ */ #include <linux/module.h> +#include <linux/cpu.h> +#include <linux/cpufreq.h> +#include <linux/cpumask.h> +#include <linux/slab.h> #include <linux/device.h> #include <linux/devfreq.h> #include "governor.h" -static int devfreq_passive_get_target_freq(struct devfreq *devfreq, +#define HZ_PER_KHZ 1000 + +static unsigned long get_taget_freq_by_required_opp(struct device *p_dev, + struct opp_table *p_opp_table, + struct opp_table *opp_table, + unsigned long freq) +{ + struct dev_pm_opp *opp = NULL, *p_opp = NULL; + + if (!p_dev || !p_opp_table || !opp_table || !freq) + return 0; + + p_opp = devfreq_recommended_opp(p_dev, &freq, 0); + if (IS_ERR(p_opp)) + return 0; + + opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp); + 
dev_pm_opp_put(p_opp); + + if (IS_ERR(opp)) + return 0; + + freq = dev_pm_opp_get_freq(opp); + dev_pm_opp_put(opp); + + return freq; +} + +static int get_target_freq_with_cpufreq(struct devfreq *devfreq, + unsigned long *target_freq) +{ + struct devfreq_passive_data *p_data = + (struct devfreq_passive_data *)devfreq->data; + struct devfreq_cpu_data *cpu_data; + unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent; + unsigned long dev_min, dev_max; + unsigned long freq = 0; + + for_each_online_cpu(cpu) { + cpu_data = p_data->cpu_data[cpu]; + if (!cpu_data || cpu_data->first_cpu != cpu) + continue; + + /* Get target freq via required opps */ + cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ; + freq = get_taget_freq_by_required_opp(cpu_data->dev, + cpu_data->opp_table, + devfreq->opp_table, cpu_cur); + if (freq) { + *target_freq = max(freq, *target_freq); + continue; + } + + /* Use Interpolation if required opps is not available */ + devfreq_get_freq_range(devfreq, &dev_min, &dev_max); + + cpu_min = cpu_data->min_freq; + cpu_max = cpu_data->max_freq; + cpu_cur = cpu_data->cur_freq; + + cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min); + freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100); + + *target_freq = max(freq, *target_freq); + } + + return 0; +} + +static int get_target_freq_with_devfreq(struct devfreq *devfreq, unsigned long *freq) { struct devfreq_passive_data *p_data @@ -99,6 +172,172 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq, return 0; } +static int devfreq_passive_get_target_freq(struct devfreq *devfreq, + unsigned long *freq) +{ + struct devfreq_passive_data *p_data = + (struct devfreq_passive_data *)devfreq->data; + int ret; + + if (!p_data) + return -EINVAL; + + /* + * If the devfreq device with passive governor has the specific method + * to determine the next frequency, should use the get_target_freq() + * of struct devfreq_passive_data. 
+	 */
+	if (p_data->get_target_freq)
+		return p_data->get_target_freq(devfreq, freq);
+
+	switch (p_data->parent_type) {
+	case DEVFREQ_PARENT_DEV:
+		ret = get_target_freq_with_devfreq(devfreq, freq);
+		break;
+	case CPUFREQ_PARENT_DEV:
+		ret = get_target_freq_with_cpufreq(devfreq, freq);
+		break;
+	default:
+		ret = -EINVAL;
+		dev_err(&devfreq->dev, "Invalid parent type\n");
+		break;
+	}
+
+	return ret;
+}
+
+static int cpufreq_passive_notifier_call(struct notifier_block *nb,
+					 unsigned long event, void *ptr)
+{
+	struct devfreq_passive_data *data =
+			container_of(nb, struct devfreq_passive_data, nb);
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	struct devfreq_cpu_data *cpu_data;
+	struct cpufreq_freqs *freqs = ptr;
+	unsigned int cur_freq;
+	int ret;
+
+	if (event != CPUFREQ_POSTCHANGE || !freqs ||
+		!data->cpu_data[freqs->policy->cpu])
+		return 0;
+
+	cpu_data = data->cpu_data[freqs->policy->cpu];
+	if (cpu_data->cur_freq == freqs->new)
+		return 0;
+
+	cur_freq = cpu_data->cur_freq;
+	cpu_data->cur_freq = freqs->new;
+
+	mutex_lock(&devfreq->lock);
+	ret = devfreq_update_target(devfreq, freqs->new);
+	mutex_unlock(&devfreq->lock);
+	if (ret) {
+		cpu_data->cur_freq = cur_freq;
+		dev_err(&devfreq->dev, "failed to update the frequency.\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static int cpufreq_passive_register_notifier(struct devfreq *devfreq)
+{
+	struct devfreq_passive_data *p_data
+			= (struct devfreq_passive_data *)devfreq->data;
+	struct device *dev = devfreq->dev.parent;
+	struct opp_table *opp_table = NULL;
+	struct devfreq_cpu_data *cpu_data;
+	struct cpufreq_policy *policy;
+	struct device *cpu_dev;
+	unsigned int cpu;
+	int ret;
+
+	get_online_cpus();
+
+	p_data->nb.notifier_call = cpufreq_passive_notifier_call;
+	ret = cpufreq_register_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER);
+	if (ret) {
+		dev_err(dev, "failed to register cpufreq notifier\n");
+		p_data->nb.notifier_call = NULL;
+		goto out;
+	}
+
+	for_each_online_cpu(cpu) {
+		if (p_data->cpu_data[cpu])
+			continue;
+
+		policy = cpufreq_cpu_get(cpu);
+		if (policy) {
+			cpu_data = kzalloc(sizeof(*cpu_data), GFP_KERNEL);
+			if (!cpu_data) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			cpu_dev = get_cpu_device(cpu);
+			if (!cpu_dev) {
+				dev_err(dev, "failed to get cpu device\n");
+				ret = -ENODEV;
+				goto out;
+			}
+
+			opp_table = dev_pm_opp_get_opp_table(cpu_dev);
+			if (IS_ERR(opp_table)) {
+				ret = PTR_ERR(opp_table);
+				goto out;
+			}
+
+			cpu_data->dev = cpu_dev;
+			cpu_data->opp_table = opp_table;
+			cpu_data->first_cpu = cpumask_first(policy->related_cpus);
+			cpu_data->cur_freq = policy->cur;
+			cpu_data->min_freq = policy->cpuinfo.min_freq;
+			cpu_data->max_freq = policy->cpuinfo.max_freq;
+
+			p_data->cpu_data[cpu] = cpu_data;
+			cpufreq_cpu_put(policy);
+		} else {
+			ret = -EPROBE_DEFER;
+			goto out;
+		}
+	}
+out:
+	put_online_cpus();
+	if (ret)
+		return ret;
+
+	mutex_lock(&devfreq->lock);
+	ret = devfreq_update_target(devfreq, 0L);
+	mutex_unlock(&devfreq->lock);
+	if (ret)
+		dev_err(dev, "failed to update the frequency\n");
+
+	return ret;
+}
+
+static int cpufreq_passive_unregister_notifier(struct devfreq *devfreq)
+{
+	struct devfreq_passive_data *p_data
+			= (struct devfreq_passive_data *)devfreq->data;
+	struct devfreq_cpu_data *cpu_data;
+	int cpu;
+
+	if (p_data->nb.notifier_call)
+		cpufreq_unregister_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER);
+
+	for_each_possible_cpu(cpu) {
+		cpu_data = p_data->cpu_data[cpu];
+		if (cpu_data) {
+			if (cpu_data->opp_table)
+				dev_pm_opp_put_opp_table(cpu_data->opp_table);
+			kfree(cpu_data);
+			cpu_data = NULL;
+		}
+	}
+
+	return 0;
+}
+
 static int devfreq_passive_notifier_call(struct notifier_block *nb,
 				unsigned long event, void *ptr)
 {
@@ -140,7 +379,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
 	struct notifier_block *nb = &p_data->nb;
 	int ret = 0;
 
-	if (!parent)
+	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
 		return -EPROBE_DEFER;
 
 	switch (event) {
@@ -148,13 +387,24 @@
 static int devfreq_passive_event_handler(struct devfreq *devfreq,
 		if (!p_data->this)
 			p_data->this = devfreq;
 
-		nb->notifier_call = devfreq_passive_notifier_call;
-		ret = devfreq_register_notifier(parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER);
+		if (p_data->parent_type == DEVFREQ_PARENT_DEV) {
+			nb->notifier_call = devfreq_passive_notifier_call;
+			ret = devfreq_register_notifier(parent, nb,
+						DEVFREQ_TRANSITION_NOTIFIER);
+		} else if (p_data->parent_type == CPUFREQ_PARENT_DEV) {
+			ret = cpufreq_passive_register_notifier(devfreq);
+		} else {
+			ret = -EINVAL;
+		}
 		break;
 	case DEVFREQ_GOV_STOP:
-		WARN_ON(devfreq_unregister_notifier(parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER));
+		if (p_data->parent_type == DEVFREQ_PARENT_DEV)
+			WARN_ON(devfreq_unregister_notifier(parent, nb,
+						DEVFREQ_TRANSITION_NOTIFIER));
+		else if (p_data->parent_type == CPUFREQ_PARENT_DEV)
+			WARN_ON(cpufreq_passive_unregister_notifier(devfreq));
+		else
+			ret = -EINVAL;
 		break;
 	default:
 		break;
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 142474b4af96..cfa0ef54841e 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -38,6 +38,7 @@ enum devfreq_timer {
 
 struct devfreq;
 struct devfreq_governor;
+struct devfreq_cpu_data;
 struct thermal_cooling_device;
 
 /**
@@ -288,6 +289,11 @@ struct devfreq_simple_ondemand_data {
 #endif
 
 #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
+enum devfreq_parent_dev_type {
+	DEVFREQ_PARENT_DEV,
+	CPUFREQ_PARENT_DEV,
+};
+
 /**
  * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
  *	and devfreq_add_device
@@ -299,8 +305,10 @@ struct devfreq_simple_ondemand_data {
  *			using governors except for passive governor.
  *			If the devfreq device has the specific method to decide
  *			the next frequency, should use this callback.
- * @this:	the devfreq instance of own device.
- * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
+ + * @parent_type	parent type of the device
+ + * @this:	the devfreq instance of own device.
+ + * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
+ + * @cpu_data:	the state min/max/current frequency of all online cpu's
  *
  * The devfreq_passive_data have to set the devfreq instance of parent
  * device with governors except for the passive governor. But, don't need to
@@ -314,9 +322,13 @@ struct devfreq_passive_data {
 	/* Optional callback to decide the next frequency of passvice device */
 	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
 
+	/* Should set the type of parent device */
+	enum devfreq_parent_dev_type parent_type;
+
 	/* For passive governor's internal use. Don't need to set them */
 	struct devfreq *this;
 	struct notifier_block nb;
+	struct devfreq_cpu_data *cpu_data[NR_CPUS];
 };
 #endif
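
For readers who want to try the interpolation fallback from get_target_freq_with_cpufreq() outside the kernel, here is a rough userspace sketch. It is not part of the patch. The names mult_frac_ul(), interpolate_dev_freq(), struct cpu_policy_sample and max_interpolated_target() are hypothetical stand-ins, and mult_frac_ul() only mimics the kernel's mult_frac() macro. The guard against cpu_max == cpu_min and the clamping of cpu_cur are assumptions added for the sketch; the patch as posted divides unconditionally.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's mult_frac() macro:
 * computes x * numer / denom while avoiding overflow of x * numer. */
static unsigned long mult_frac_ul(unsigned long x, unsigned long numer,
				  unsigned long denom)
{
	unsigned long quot = x / denom;
	unsigned long rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* Map a CPU frequency onto the device range [dev_min, dev_max] using the
 * same integer-percent arithmetic as the patch. */
static unsigned long interpolate_dev_freq(unsigned long cpu_cur,
					  unsigned long cpu_min,
					  unsigned long cpu_max,
					  unsigned long dev_min,
					  unsigned long dev_max)
{
	unsigned long cpu_percent;

	/* Assumption added for this sketch: avoid dividing by zero when the
	 * policy has a single frequency. */
	if (cpu_max <= cpu_min)
		return dev_min;

	/* Assumption added for this sketch: clamp so that the unsigned
	 * subtraction below cannot wrap. */
	if (cpu_cur < cpu_min)
		cpu_cur = cpu_min;
	if (cpu_cur > cpu_max)
		cpu_cur = cpu_max;

	cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);

	return dev_min + mult_frac_ul(dev_max - dev_min, cpu_percent, 100);
}

/* Hypothetical per-policy snapshot, loosely modelled on the cur_freq,
 * min_freq and max_freq fields of struct devfreq_cpu_data. */
struct cpu_policy_sample {
	unsigned long cur;
	unsigned long min;
	unsigned long max;
};

/* Take the highest interpolated frequency over all policies, like the
 * max(freq, *target_freq) accumulation in the patch. */
static unsigned long max_interpolated_target(const struct cpu_policy_sample *p,
					     size_t n,
					     unsigned long dev_min,
					     unsigned long dev_max)
{
	unsigned long target = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long freq = interpolate_dev_freq(p[i].cur, p[i].min,
							  p[i].max, dev_min,
							  dev_max);
		if (freq > target)
			target = freq;
	}

	return target;
}
```

Because cpu_percent is a truncated integer, the device frequency moves in steps of 1% of its range. The result is also a raw value rather than a valid OPP, so the devfreq core presumably still has to round it to a supported frequency.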