Message ID | 25738681139c04272b6d2ebeff244c6d36c893f7.1400133090.git.viresh.kumar@linaro.org |
---|---|
State | New |
Viresh,

On Wed, May 14, 2014 at 10:56 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> [...]
> Drivers must send 'POST' notification for 'stable' freq and 'PRE' for 'target'
> freq. If they can't switch to target frequency, they don't need to send any
> notification.

This will have the side effect of sending twice as many notifications. ...however, it does allow people registering for CPUFREQ notifications to be more generic...

Thinking about it, I think you're right that this is the way to go. The majority of the CPUFREQ registrants that I see really ought to be moved to common clock notifications (they are dealing with the fact that a peripheral clock gets scaled as a side effect of CPUFREQ). What's left is only a very small number of cases that would most cleanly be handled by just seeing the extra notification.

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Doug/Stephen,
>
> If this doesn't look too ugly, then I would need patches from you to fix your
> platforms as I am not well aware of clk hierarchy of your platforms.

It probably makes sense to wait until Thomas Abraham's patch lands, since he's redoing exynos cpufreq to use cpufreq-cpu0. ...and maybe Thomas would be willing to write this patch?

> drivers/cpufreq/cpufreq.c | 13 +++++++++++--
> include/linux/cpufreq.h   | 18 ++++++++++++++++++
> 2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index a05c921..8d1cb4f 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1874,11 +1874,17 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
>
>         if (notify) {
>                 freqs.old = policy->cur;
> -               freqs.new = freq_table[index].frequency;
> +               /* Switch to some safe intermediate freq */
> +               if (cpufreq_driver->safe_freq)

What do you think about calling this get_safe_freq()? It took me a little while to realize that this function didn't perform the transition to the safe frequency; it just returned it.

...the comment adds extra confusion, since it makes it sound like the switch happens right here.

> +                       freqs.new = cpufreq_driver->safe_freq(policy,
> +                                                             index);
> +               else
> +                       freqs.new = freq_table[index].frequency;
>                 freqs.flags = 0;
>
>                 pr_debug("%s: cpu: %d, oldfreq: %u, new freq: %u\n",
> -                        __func__, policy->cpu, freqs.old, freqs.new);
> +                        __func__, policy->cpu, freqs.old,
> +                        freq_table[index].frequency);
>
>                 cpufreq_freq_transition_begin(policy, &freqs);
>         }
> @@ -1887,6 +1893,9 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
>         if (retval)
>                 pr_err("%s: Failed to change cpu frequency: %d\n",
>                        __func__, retval);
> +       else
> +               /* Send POST notification for the target frequency */
> +               freqs.new = freq_table[index].frequency;

Don't you need to set freqs.old to the safe_freq?

-Doug

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On 05/14/2014 11:56 PM, Viresh Kumar wrote:
> [...]
> Drivers must send 'POST' notification for 'stable' freq and 'PRE' for 'target'
> freq. If they can't switch to target frequency, they don't need to send any
> notification.

This seems rather complex. Can't either the driver or the cpufreq core be responsible for all of the notifications? Otherwise, the logic gets rather complex and spread between the core and the driver.

Perhaps the core should make separate calls into the driver to switch to the temporary frequency and the final frequency, so it can manage all the notifications. Probably best to use a separate function pointer for the temporary change so the driver can easily know what it's doing.
Hi,

On Thu, May 15, 2014 at 12:17 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/14/2014 11:56 PM, Viresh Kumar wrote:
>> [...]
>
> This seems rather complex. Can't either the driver or the cpufreq core
> be responsible for all of the notifications? Otherwise, the logic gets
> rather complex, and spread between the core and the driver.
>
> Perhaps the core should make separate calls into the driver to switch to
> the temporary frequency and the final frequency, so it can manage all
> the notifications. Probably best to use a separate function pointer for
> the temporary change so the driver can easily know what it's doing.

In the discussion about the exynos cpufreq redesign (atop cpufreq-cpu0), it turns out that they've come up with a pretty reasonable solution that also happens to solve our problem. They use an extra divider to make sure that the temporary PLL gets divided down so that it's low enough.

It might mean that, going between 300 MHz and 500 MHz, you will transition through 400 MHz, but I'm quite OK with not sending out a notification for that.

If something like that could work for tegra, then maybe we can drop this whole thing and it will all just fix itself. ;)

-Doug

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
On 05/15/2014 02:39 PM, Doug Anderson wrote:
> [...]
> In the discussion about the exynos cpufreq redesign (atop
> cpufreq-cpu0), it turns out that they've come up with a pretty
> reasonable solution that also happens to solve our problem. They
> utilize an extra divider to make sure that the temporary PLL gets
> divided down so that it's low enough.
>
> It might mean that going between 300 MHz and 500 MHz that you will
> transition through 400 MHz, but I'm quite OK with not sending out a
> notification for that.
>
> If something like that could work for tegra, then maybe we can drop
> this whole thing and it will all just fix itself. ;)

At least in the case of Tegra20 cpufreq, I don't think that will be possible, at least not without changing the temporary clock source we use (pll_p). The PLL that's used temporarily is also the root of all the peripheral clocks, and hence can't be changed. We also only characterize that PLL at the one specific frequency it was designed to run at.

That said, it looks like the CPU clock may support pll_p_out3 and pll_p_out4 as sources in addition to pll_p. I'm not sure if anything else uses those divided pll_p outputs. Peter, perhaps you can comment? Also, since pll_p itself runs at exactly 216MHz, pll_p_out3 and pll_p_out4 can never go higher than that, so we couldn't use this trick for transitions between two fast clock rates. Note that Tegra20 does reparenting for any CPU clock rate change, not just when changing to/from certain slow rates. None of the other potential CPU clock parents seem any better.

It's possible that later Tegra SoCs have more freedom here, but I didn't check.
Stephen,

On Thu, May 15, 2014 at 1:51 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/15/2014 02:39 PM, Doug Anderson wrote:
>> [...]
>> If something like that could work for tegra, then maybe we can drop
>> this whole thing and it will all just fix itself. ;)
>
> At least in the case of Tegra20 cpufreq, I don't think that will be
> possible at least without changing the temporary clock source we use
> (pll_p). The PLL that's use temporarily is also the root of all the
> peripheral clocks, and hence can't be changed. We also only characterize
> that PLL at the one specific frequency it was designed to run at.

It's interesting: in the exynos case they didn't change the PLL itself, but found an extra divider that I wasn't actually aware existed. It was located after the mux and before the cpu.

> That said, it looks like the CPU clock may support pll_p_out3 and 4 as
> sources in addition to pll_p. I'm not sure if anything else uses those
> divided pll_p outputs. Peter, perhaps you can comment? Also, since pll_p
> itself runs at exactly 216MHz, pll_p_out3 and 4 can never go higher than
> that, so we couldn't use this trick for transitions between two fast
> clock rates. Note that Tegra20 does reparenting for any CPU clock rate
> change, not just when changing to/from certain slow rates. None of the
> other potential CPU clock parents seem any better.

On exynos it won't necessarily transition to a frequency that's between the start and end. ...but at least with the trick mentioned you can be sure that it's never _faster_ than the faster of the start and end frequencies.

The last time I read through the code, exynos always transitioned to a temp PLL, though perhaps certain transitions could be optimized to avoid it (if you're making a transition that doesn't need to relock).

...example transitions:

1.6 => 800 (temp) => 1.7
600 => 800 (temp) => 800
600 => 400 (temp) => 700
200 => 200 (temp) => 300

-Doug
On 15 May 2014 23:43, Doug Anderson <dianders@chromium.org> wrote:
> This will have the side effect of sending twice as many notifications.
> ...however it does allow for people registering for CPUFREQ
> notifications to be more generic...

That's not a side effect of this approach but of the way platforms handle the transition, and there is no way we can skip it. If we did, we would leave room for the race to happen, and udelay would misbehave.

> Thinking about it, I think you're right that this is the way to go.

Correct :)

> It probably makes sense to wait until Thomas Abraham's patch lands,
> since he's redoing exynos cpufreq to use cpufreq-cpu0. ...and maybe
> Thomas would be willing to write this patch?

But cpufreq-cpu0 doesn't handle this intermediate-frequency concept; how will you work around that? Anyway, we can get this working for Tegra if Stephen agrees.

> What do you think about calling this get_safe_freq(). It took me a
> little while before I realized that this function didn't perform the
> transition to the safe frequency--it just returned it.
>
> ...the comment adds extra confusion since it makes it sound like the
> switch happens right here.

Agree.

>> +       /* Send POST notification for the target frequency */
>> +       freqs.new = freq_table[index].frequency;
>
> Don't you need to set freqs.old to the safe_freq?

Agree..
On 16 May 2014 00:47, Stephen Warren <swarren@wwwdotorg.org> wrote:
> This seems rather complex. Can't either the driver or the cpufreq core
> be responsible for all of the notifications? Otherwise, the logic gets
> rather complex, and spread between the core and the driver.

I do agree about that, and that's why I added that 'ugly' statement.

> Perhaps the core should make separate calls into the driver to switch to
> the temporary frequency and the final frequency, so it can manage all
> the notifications. Probably best to use a separate function pointer for
> the temporary change so the driver can easily know what it's doing.

Hmm, that sounds like a much better approach. Let me try to code it.
On Thu, May 15, 2014 at 10:58:26PM +0200, Doug Anderson wrote:
> Stephen,
>
> On Thu, May 15, 2014 at 1:51 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > [...]
> > At least in the case of Tegra20 cpufreq, I don't think that will be
> > possible at least without changing the temporary clock source we use
> > (pll_p). The PLL that's use temporarily is also the root of all the
> > peripheral clocks, and hence can't be changed. We also only characterize
> > that PLL at the one specific frequency it was designed to run at.
>
> It's interesting, in the exynos case they didn't change the PLL itself
> but found an extra divider that I wasn't actually aware existed. It
> was located after the mux and before the cpu.

We do have a divider between the mux and clocksource as well, but we never use it, except from Tegra114 onwards for hardware-controlled thermal throttling (more as an emergency measure in case software fails to react in time).

Tegra also has a microsecond counter which could be used for udelay() on parts which don't have the arch counter (Tegra20 and Tegra30). For Tegra114 and Tegra124 we use the arch counter, so there shouldn't be a problem.

Cheers,

Peter.
```diff
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a05c921..8d1cb4f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1874,11 +1874,17 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 
 	if (notify) {
 		freqs.old = policy->cur;
-		freqs.new = freq_table[index].frequency;
+		/* Switch to some safe intermediate freq */
+		if (cpufreq_driver->safe_freq)
+			freqs.new = cpufreq_driver->safe_freq(policy,
+							      index);
+		else
+			freqs.new = freq_table[index].frequency;
 		freqs.flags = 0;
 
 		pr_debug("%s: cpu: %d, oldfreq: %u, new freq: %u\n",
-			 __func__, policy->cpu, freqs.old, freqs.new);
+			 __func__, policy->cpu, freqs.old,
+			 freq_table[index].frequency);
 
 		cpufreq_freq_transition_begin(policy, &freqs);
 	}
@@ -1887,6 +1893,9 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 	if (retval)
 		pr_err("%s: Failed to change cpu frequency: %d\n",
 		       __func__, retval);
+	else
+		/* Send POST notification for the target frequency */
+		freqs.new = freq_table[index].frequency;
 
 	if (notify)
 		cpufreq_freq_transition_end(policy, &freqs, retval);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3f45889..b5ba275 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -226,6 +226,24 @@ struct cpufreq_driver {
 				  unsigned int relation);
 	int	(*target_index) (struct cpufreq_policy *policy,
 				 unsigned int index);
+	/*
+	 * Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION
+	 * unset.
+	 *
+	 * safe_freq() should return a stable intermediate frequency a platform
+	 * might want to switch to, before jumping to the frequency
+	 * corresponding to 'index'. Core will send the 'PRE' notification for
+	 * this 'stable' frequency and 'POST' for the 'target' frequency. Though
+	 * if ->target_index() fails, it will handle POST for 'stable' frequency
+	 * only.
+	 *
+	 * Drivers must send 'POST' notification for 'stable' freq and 'PRE' for
+	 * 'target' freq. If they can't switch to target frequency, they don't
+	 * need to send any notification.
+	 *
+	 */
+	unsigned int	(*safe_freq)(struct cpufreq_policy *policy,
+				     unsigned int index);
 
 	/* should be defined, if possible */
 	unsigned int	(*get)	(unsigned int cpu);
```
Douglas Anderson recently pointed out an interesting problem due to which his udelay() was expiring earlier than it should: https://lkml.org/lkml/2014/5/13/766

While transitioning between frequencies, a few platforms may temporarily switch to a stable frequency while waiting for the main PLL to stabilize.

For example: when we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated for somewhere between 200MHz and 300MHz, and so udelay() behaves badly.

To fix this in a generic way, let's introduce another callback, safe_freq(), for cpufreq drivers.

safe_freq() should return a stable intermediate frequency that a platform might want to switch to before jumping to the frequency corresponding to 'index'. The core will send the 'PRE' notification for this 'stable' frequency and 'POST' for the 'target' frequency. If ->target_index() fails, though, it will handle POST for the 'stable' frequency only.

Drivers must send the 'POST' notification for the 'stable' frequency and 'PRE' for the 'target' frequency. If they can't switch to the target frequency, they don't need to send any notification.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Doug/Stephen,

If this doesn't look too ugly, then I would need patches from you to fix your platforms, as I am not well aware of the clk hierarchy of your platforms.

 drivers/cpufreq/cpufreq.c | 13 +++++++++++--
 include/linux/cpufreq.h   | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)