Message ID | 55cc3fbf9c553d669629e8ba98eef2b0ef33aaad.1385374822.git.viresh.kumar@linaro.org |
---|---|
State | New |
Headers | show |
On Monday, November 25, 2013 07:41:41 PM Viresh Kumar wrote: > > This patch adds cpufreq callbacks to dpm_{suspend|resume}() for handling > suspend/resume of cpufreq governors. This is required for early suspend and > late resume of governors and cpufreq core. > > There are multiple problems that are fixed by this patch: > - Nishanth Menon (TI) found an interesting problem on his platform, OMAP. His board > wasn't working well with suspend/resume as calls for removing non-boot CPUs > was turning out into a call to drivers ->target() which then tries to play > with regulators. But regulators and their I2C bus were already suspended and > this resulted in a failure. Many platforms have such problems, samsung, > tegra, etc.. They solved it with driver specific PM notifiers where they > used to disable their driver's ->target() routine. > > - Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found another issue where > > tunables configuration for clusters/sockets with non-boot CPUs was getting > lost after suspend/resume, as we were notifying governors with > CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that policy and so > deallocating memory for tunables. This is also fixed with this patch as don't > allow any operation on Governors during suspend/resume now. > > Reported-by: Lan Tianyu <tianyu.lan@intel.com> > Reported-by: Nishanth Menon <nm@ti.com> > Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > --- > drivers/base/power/main.c | 5 +++++ > > drivers/cpufreq/cpufreq.c | 50 > +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/cpufreq.h > | 3 +++ > 3 files changed, 58 insertions(+) > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index 1b41fca..c9fbb9d 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -29,6 +29,7 @@ > > #include <linux/async.h> > #include <linux/suspend.h> > #include <trace/events/power.h> > > +#include <linux/cpufreq.h> > > #include <linux/cpuidle.h> > #include <linux/timer.h> > > @@ -789,6 +790,8 @@ void dpm_resume(pm_message_t state) > > mutex_unlock(&dpm_list_mtx); > async_synchronize_full(); > dpm_show_time(starttime, state, NULL); > > + > + cpufreq_resume(); > > } > > /** > > @@ -1259,6 +1262,8 @@ int dpm_suspend(pm_message_t state) > > might_sleep(); > > + cpufreq_suspend(); > + > > mutex_lock(&dpm_list_mtx); > pm_transition = state; > async_error = 0; Shouldn't it do cpufreq_resume() on errors? > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 02d534d..b6c7821 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -26,6 +26,7 @@ > > #include <linux/module.h> > #include <linux/mutex.h> > #include <linux/slab.h> > > +#include <linux/suspend.h> > > #include <linux/syscore_ops.h> > #include <linux/tick.h> > #include <trace/events/power.h> > > @@ -47,6 +48,9 @@ static LIST_HEAD(cpufreq_policy_list); > > static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor); > #endif > > +/* Flag to suspend/resume CPUFreq governors */ > +static bool cpufreq_suspended; > + > > static inline bool has_target(void) > { > > return cpufreq_driver->target_index || cpufreq_driver->target; > > @@ -1462,6 +1466,48 @@ static struct subsys_interface cpufreq_interface = { > > .remove_dev = cpufreq_remove_dev, > > }; > > +/* > + * Callbacks for suspending/resuming governors as some platforms can't > change + * frequency after this point in suspend cycle. Because some of the > devices + * (like: i2c, regulators, etc) they use for changing frequency > are suspended + * quickly after this point. > + */ > +void cpufreq_suspend(void) > +{ > + struct cpufreq_policy *policy; > + > + if (!has_target()) > + return; > + > + pr_debug("%s: Suspending Governors\n", __func__); > + > + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) > + if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) > + pr_err("%s: Failed to stop governor for policy: %p\n", > + __func__, policy); This appears to be racy. Is it really racy, or just seemingly? > + > + cpufreq_suspended = true; > +} > + > +void cpufreq_resume(void) > +{ > + struct cpufreq_policy *policy; > + > + if (!has_target()) > + return; > + > + pr_debug("%s: Resuming Governors\n", __func__); > + > + cpufreq_suspended = false; > + > + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) > + if (__cpufreq_governor(policy, CPUFREQ_GOV_START) || > + __cpufreq_governor(policy, > + CPUFREQ_GOV_LIMITS)) > + pr_err("%s: Failed to start governor for policy: %p\n", > + __func__, policy); > +} > + > > /** > > * cpufreq_bp_suspend - Prepare the boot CPU for system suspend. > * > > @@ -1764,6 +1810,10 @@ static int __cpufreq_governor(struct cpufreq_policy > *policy,> > struct cpufreq_governor *gov = NULL; > > #endif > > + /* Don't start any governor operations if we are entering suspend */ > + if (cpufreq_suspended) > + return 0; > + > > if (policy->governor->max_transition_latency && > > policy->cpuinfo.transition_latency > > policy->governor->max_transition_latency) { > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index dc196bb..6d93f91 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -255,6 +255,9 @@ struct cpufreq_driver { > > int cpufreq_register_driver(struct cpufreq_driver *driver_data); > int cpufreq_unregister_driver(struct cpufreq_driver *driver_data); > > +void cpufreq_suspend(void); > +void cpufreq_resume(void); > + > > const char *cpufreq_get_current_driver(void); > > static inline void cpufreq_verify_within_limits(struct cpufreq_policy > *policy, Thanks!
On 26 November 2013 04:59, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> @@ -1259,6 +1262,8 @@ int dpm_suspend(pm_message_t state) >> >> might_sleep(); >> >> + cpufreq_suspend(); >> + >> >> mutex_lock(&dpm_list_mtx); >> pm_transition = state; >> async_error = 0; > > Shouldn't it do cpufreq_resume() on errors? Yes and this is already done I believe. In case dpm_suspend() fails, dpm_resume() gets called. Isn't it? >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> +void cpufreq_suspend(void) >> +{ >> + struct cpufreq_policy *policy; >> + >> + if (!has_target()) >> + return; >> + >> + pr_debug("%s: Suspending Governors\n", __func__); >> + >> + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) >> + if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) >> + pr_err("%s: Failed to stop governor for policy: %p\n", >> + __func__, policy); > > This appears to be racy. Is it really racy, or just seemingly? Why does it look racy to you? Userspace should be frozen by now, policy_list should be stable as well as nobody would touch it.
On Tuesday, November 26, 2013 07:56:19 AM Viresh Kumar wrote: > On 26 November 2013 04:59, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > >> @@ -1259,6 +1262,8 @@ int dpm_suspend(pm_message_t state) > >> > >> might_sleep(); > >> > >> + cpufreq_suspend(); > >> + > >> > >> mutex_lock(&dpm_list_mtx); > >> pm_transition = state; > >> async_error = 0; > > > > Shouldn't it do cpufreq_resume() on errors? > > Yes and this is already done I believe. In case dpm_suspend() fails, > dpm_resume() gets called. Isn't it? OK > >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > >> +void cpufreq_suspend(void) > >> +{ > >> + struct cpufreq_policy *policy; > >> + > >> + if (!has_target()) > >> + return; > >> + > >> + pr_debug("%s: Suspending Governors\n", __func__); > >> + > >> + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) > >> + if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) > >> + pr_err("%s: Failed to stop governor for policy: %p\n", > >> + __func__, policy); > > > > This appears to be racy. Is it really racy, or just seemingly? > > Why does it look racy to you? Userspace should be frozen by now, > policy_list should be stable as well as nobody would touch it. You're stopping governors while they may be in use in principle. Do we have suitable synchronization in place for that? Rafael
On Tuesday, November 26, 2013 09:23:15 PM Rafael J. Wysocki wrote: > On Tuesday, November 26, 2013 07:56:19 AM Viresh Kumar wrote: > > On 26 November 2013 04:59, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > >> @@ -1259,6 +1262,8 @@ int dpm_suspend(pm_message_t state) > > >> > > >> might_sleep(); > > >> > > >> + cpufreq_suspend(); > > >> + > > >> > > >> mutex_lock(&dpm_list_mtx); > > >> pm_transition = state; > > >> async_error = 0; > > > > > > Shouldn't it do cpufreq_resume() on errors? > > > > Yes and this is already done I believe. In case dpm_suspend() fails, > > dpm_resume() gets called. Isn't it? > > OK > > > >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > >> +void cpufreq_suspend(void) > > >> +{ > > >> + struct cpufreq_policy *policy; > > >> + > > >> + if (!has_target()) > > >> + return; > > >> + > > >> + pr_debug("%s: Suspending Governors\n", __func__); > > >> + > > >> + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) > > >> + if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) > > >> + pr_err("%s: Failed to stop governor for policy: %p\n", > > >> + __func__, policy); > > > > > > This appears to be racy. Is it really racy, or just seemingly? > > > > Why does it look racy to you? Userspace should be frozen by now, > > policy_list should be stable as well as nobody would touch it. > > You're stopping governors while they may be in use in principle. Do we have > suitable synchronization in place for that? Anyway, if you did what I asked you to do and put the cpufreq suspend/resume into dpm_suspend/resume_noirq(), I'd probably take this for 3.13. However, since you've decided to put those things somewhere else thus making the change much more intrusive, I can only queue it up for 3.14. This means I'm going to take the Tianyu's patch as a stop gap for 3.13. Thanks!
On 27 November 2013 07:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > Anyway, if you did what I asked you to do and put the cpufreq suspend/resume > into dpm_suspend/resume_noirq(), I'd probably take this for 3.13. However, > since you've decided to put those things somewhere else thus making the > change much more intrusive, I can only queue it up for 3.14. > > This means I'm going to take the Tianyu's patch as a stop gap for 3.13. There were design issues with that patch actually, as I pointed out earlier (handling EXIT part in core and INIT in governors).. And so in case we need to get something for v3.13, I will send a short version of this series with callbacks from suspend_noirq. Get that one instead. -- viresh
On 27 November 2013 01:53, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Tuesday, November 26, 2013 07:56:19 AM Viresh Kumar wrote: >> On 26 November 2013 04:59, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> > This appears to be racy. Is it really racy, or just seemingly? >> >> Why does it look racy to you? Userspace should be frozen by now, >> policy_list should be stable as well as nobody would touch it. > > You're stopping governors while they may be in use in principle. Do we have > suitable synchronization in place for that? At what point exactly in suspend cycle do we suspend timers and workqueues. I thought userspace would be frozen by now and so would be the governors..
On 2013年11月27日 11:07, Viresh Kumar wrote: > On 27 November 2013 07:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> Anyway, if you did what I asked you to do and put the cpufreq suspend/resume >> into dpm_suspend/resume_noirq(), I'd probably take this for 3.13. However, >> since you've decided to put those things somewhere else thus making the >> change much more intrusive, I can only queue it up for 3.14. >> >> This means I'm going to take the Tianyu's patch as a stop gap for 3.13. > Hi Viresh: First, I agree the new solution you are working on. :) But actually I don't totally agree my origin patch have design issue. Because I think governor should have the ability to check whether it has been EXIT when doing INIT and it should return error code at that point. The design is to make governor code stronger to deal with the case that governor is reinitialized before EXIT. Just from my view. Sorry for noise. > There were design issues with that patch actually, as I pointed out earlier > (handling EXIT part in core and INIT in governors).. And so in case we > need to get something for v3.13, I will send a short version of this series > with callbacks from suspend_noirq. > > Get that one instead. > > -- > viresh >
On 27 November 2013 12:38, Lan Tianyu <tianyu.lan@intel.com> wrote: > Hi Viresh: Hey Lan, > First, I agree the new solution you are working on. :) Thanks :) > But actually I don't totally agree my origin patch have design issue. > Because I think governor should have the ability to check whether it has > been EXIT when doing INIT and it should return error code at that point. > The design is to make governor code stronger to deal with the case that > governor is reinitialized before EXIT. Just from my view. > Sorry for noise. Ahh, these are useful discussions. Everyone have their own thoughts and its upto all of us to get meaningful stuff out of it.. I agree to whatever you wrote above but this isn't exactly what's being done in your patch. I was more concerned about this stuff: On 22 November 2013 13:08, Lan Tianyu <tianyu.lan@intel.com> wrote: > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > + if (has_target() && !frozen) { > ret = __cpufreq_governor(policy, > CPUFREQ_GOV_POLICY_EXIT); > diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c > @@ -204,9 +204,20 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, > > switch (event) { > case CPUFREQ_GOV_POLICY_INIT: > + /* > + * In order to keep governor data across suspend/resume, > + * Governor doesn't exit when suspend and will be > + * reinitialized when resume. Here check policy governor > + * data to determine whether the governor has been exited. > + * If not, return EALREADY. > + */ > if (have_governor_per_policy()) { > - WARN_ON(dbs_data); > + if (dbs_data) > + return -EALREADY; > } else if (dbs_data) { > + if (policy->governor_data == dbs_data) > + return -EALREADY; > + > dbs_data->usage_count++; > policy->governor_data = dbs_data; > return 0; Here the cpufreq core has skipped the call to governor's EXIT, and so it shouldn't pass on the following INIT call to them.. That's a bit wrong. These two calls work in pairs and are exactly opposite to each other. And so if some decision has to be taken then either that should be done completely at governor level or core level. Doing stuff partly in governor and partly in core is like giving invitation to new bugs/problems :) Nothing personal otherwise. Recently there were patches sent by people, you, Nishanth, etc, which I have just overridden with my versions.. It wasn't about getting my count higher :) but getting the solution at right places instead of solving them at wrong locations.. I am already having tough time upstreaming patches for cpufreq consolidation, as the number of patches is huge. It takes time for people to absorb/test them. Though Rafael has taken almost all of them in v3.13 finally, but I understand its difficult for him as well and he did his job wonderfully :) And so I don't really want to get any new stuff in which will surely get consolidated later. Lets do it now, we had enough of it :) Even, related to your patch, I was already thinking of getting rid of "frozen" variable and parameter to functions, as we already know status from a global variable now, cpufreq_suspended. And so we don't actually need to pass any additional parameters to many routines, which have something like 'frozen' currently. -- viresh
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 1b41fca..c9fbb9d 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -29,6 +29,7 @@ #include <linux/async.h> #include <linux/suspend.h> #include <trace/events/power.h> +#include <linux/cpufreq.h> #include <linux/cpuidle.h> #include <linux/timer.h> @@ -789,6 +790,8 @@ void dpm_resume(pm_message_t state) mutex_unlock(&dpm_list_mtx); async_synchronize_full(); dpm_show_time(starttime, state, NULL); + + cpufreq_resume(); } /** @@ -1259,6 +1262,8 @@ int dpm_suspend(pm_message_t state) might_sleep(); + cpufreq_suspend(); + mutex_lock(&dpm_list_mtx); pm_transition = state; async_error = 0; diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 02d534d..b6c7821 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -26,6 +26,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/slab.h> +#include <linux/suspend.h> #include <linux/syscore_ops.h> #include <linux/tick.h> #include <trace/events/power.h> @@ -47,6 +48,9 @@ static LIST_HEAD(cpufreq_policy_list); static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor); #endif +/* Flag to suspend/resume CPUFreq governors */ +static bool cpufreq_suspended; + static inline bool has_target(void) { return cpufreq_driver->target_index || cpufreq_driver->target; @@ -1462,6 +1466,48 @@ static struct subsys_interface cpufreq_interface = { .remove_dev = cpufreq_remove_dev, }; +/* + * Callbacks for suspending/resuming governors as some platforms can't change + * frequency after this point in suspend cycle. Because some of the devices + * (like: i2c, regulators, etc) they use for changing frequency are suspended + * quickly after this point. + */ +void cpufreq_suspend(void) +{ + struct cpufreq_policy *policy; + + if (!has_target()) + return; + + pr_debug("%s: Suspending Governors\n", __func__); + + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) + if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP)) + pr_err("%s: Failed to stop governor for policy: %p\n", + __func__, policy); + + cpufreq_suspended = true; +} + +void cpufreq_resume(void) +{ + struct cpufreq_policy *policy; + + if (!has_target()) + return; + + pr_debug("%s: Resuming Governors\n", __func__); + + cpufreq_suspended = false; + + list_for_each_entry(policy, &cpufreq_policy_list, policy_list) + if (__cpufreq_governor(policy, CPUFREQ_GOV_START) || + __cpufreq_governor(policy, + CPUFREQ_GOV_LIMITS)) + pr_err("%s: Failed to start governor for policy: %p\n", + __func__, policy); +} + /** * cpufreq_bp_suspend - Prepare the boot CPU for system suspend. * @@ -1764,6 +1810,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, struct cpufreq_governor *gov = NULL; #endif + /* Don't start any governor operations if we are entering suspend */ + if (cpufreq_suspended) + return 0; + if (policy->governor->max_transition_latency && policy->cpuinfo.transition_latency > policy->governor->max_transition_latency) { diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index dc196bb..6d93f91 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -255,6 +255,9 @@ struct cpufreq_driver { int cpufreq_register_driver(struct cpufreq_driver *driver_data); int cpufreq_unregister_driver(struct cpufreq_driver *driver_data); +void cpufreq_suspend(void); +void cpufreq_resume(void); + const char *cpufreq_get_current_driver(void); static inline void cpufreq_verify_within_limits(struct cpufreq_policy *policy,
This patch adds cpufreq callbacks to dpm_{suspend|resume}() for handling suspend/resume of cpufreq governors. This is required for early suspend and late resume of governors and cpufreq core. There are multiple problems that are fixed by this patch: - Nishanth Menon (TI) found an interesting problem on his platform, OMAP. His board wasn't working well with suspend/resume as calls for removing non-boot CPUs was turning out into a call to drivers ->target() which then tries to play with regulators. But regulators and their I2C bus were already suspended and this resulted in a failure. Many platforms have such problems, samsung, tegra, etc.. They solved it with driver specific PM notifiers where they used to disable their driver's ->target() routine. - Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found another issue where tunables configuration for clusters/sockets with non-boot CPUs was getting lost after suspend/resume, as we were notifying governors with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that policy and so deallocating memory for tunables. This is also fixed with this patch as don't allow any operation on Governors during suspend/resume now. Reported-by: Lan Tianyu <tianyu.lan@intel.com> Reported-by: Nishanth Menon <nm@ti.com> Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- drivers/base/power/main.c | 5 +++++ drivers/cpufreq/cpufreq.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/cpufreq.h | 3 +++ 3 files changed, 58 insertions(+)