
[RFC,V2] cpufreq: make sure frequency transitions are serialized

Message ID 2efc621827cbd96a05a3d34075154974b4816ecd.1394782795.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar March 14, 2014, 7:43 a.m. UTC
Whenever we change the frequency of a CPU, we call the PRECHANGE and
POSTCHANGE notifiers. These must be serialized, i.e. PRECHANGE or POSTCHANGE
must not be sent twice in a row. The following examples show why this is
important:

Scenario 1:
-----------
One thread reads the value of cpuinfo_cur_freq, which calls
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition().

At the same time, the ondemand governor tries to change the frequency of the
CPU and so sends notifications via ->target().

The notifiers are not serialized, so suppose this is what happens:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A

Now the last POSTCHANGE notification was sent for freq A, but the hardware is
currently running at freq B :)

Where would we break then? In adjust_jiffies() in cpufreq.c and in
cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts jiffies).
Everything based on loops_per_jiffy is broken.
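To see what is at stake, this is the shape of the rescaling that adjust_jiffies() performs (a simplified sketch, not the exact kernel code):

```c
/* Simplified from cpufreq's loops_per_jiffy rescaling: the delay
 * calibration is scaled by the ratio of new to old frequency. If the
 * final POSTCHANGE carried the wrong frequency (A instead of B), this
 * ratio -- and hence udelay() timing -- is wrong from then on. */
static unsigned long scale_lpj(unsigned long lpj, unsigned int old_khz,
                               unsigned int new_khz)
{
	return (unsigned long)(((unsigned long long)lpj * new_khz) / old_khz);
}
```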

Scenario 2:
-----------
A governor is changing the frequency and has called __cpufreq_driver_target().
At the same time, we change scaling_{min|max}_freq from sysfs, which eventually
ends up sending the governor a CPUFREQ_GOV_LIMITS event, which also calls
__cpufreq_driver_target(). Hence, concurrent calls to ->target().

Some platforms do something like this in their ->target() (e.g. cpufreq-cpu0,
omap, exynos, etc.):

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

Now suppose two concurrent calls to ->target(), X and Y, where X is trying to
increase the frequency and Y is trying to decrease it, and this is the
sequence that follows due to the race:

X.A: voltage increased for larger freq
Y.A: nothing happened here
Y.B: freq decreased
Y.C: voltage decreased
X.B: freq increased
X.C: nothing happened

We ended up setting a frequency that is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable, and the system
may not be usable anymore.
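That interleaving can be replayed step by step (a userspace sketch; the frequencies and voltages are hypothetical example values, not from any real platform):

```c
/* Simulated hardware state for the race above (illustrative only). */
struct opp_state {
	int freq_khz;
	int volt_mv;
};

/* Replay the racy interleaving from the text: X raises to 1 GHz (which
 * needs 1200 mV here), Y lowers to 300 MHz (which needs only 900 mV).
 * Each caller follows the driver pattern "raise voltage before a freq
 * increase, lower it after a freq decrease", but the steps interleave. */
static struct opp_state replay_race(void)
{
	struct opp_state s = { 600000, 1000 };	/* initial operating point */

	s.volt_mv = 1200;	/* X.A: voltage raised for the higher freq */
				/* Y.A: nothing (Y is lowering)            */
	s.freq_khz = 300000;	/* Y.B: freq decreased                     */
	s.volt_mv = 900;	/* Y.C: voltage decreased                  */
	s.freq_khz = 1000000;	/* X.B: freq increased                     */
				/* X.C: nothing (X was raising)            */

	return s;	/* 1 GHz at 900 mV: an unsupported combination */
}
```

The returned state is 1 GHz at 900 mV, a combination the hypothetical voltage table never allows.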

This patch adds protection in cpufreq_notify_transition() to serialize
transitions. It WARN()s if a POSTCHANGE notification is sent while we are not
in the middle of a transition, and a PRECHANGE notification is forced to wait
while the current transition is in progress.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---

This was discussed earlier here:
https://lkml.org/lkml/2013/9/25/402

There, Rafael asked for a better fix, as he called the V1 fixes "quick and
dirty". This is another approach, much simpler than the previous one. Please
see if this looks fine. There is a TODO note in there, as I wanted some
suggestions on how exactly we should wait for a transition to finish.

 drivers/cpufreq/cpufreq.c | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/cpufreq.h   |  2 ++
 2 files changed, 39 insertions(+), 2 deletions(-)

Comments

Srivatsa S. Bhat March 18, 2014, 12:50 p.m. UTC | #1
On 03/14/2014 01:13 PM, Viresh Kumar wrote:
> Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
> POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
> shouldn't be called twice continuously. Following examples show why this is
> important:
> 
[...] 
> This was discussed earlier here:
> https://lkml.org/lkml/2013/9/25/402
> 
> Where Rafael asked for better fix, as he called the V1 fixes: "quick and dirty".
> This is another approach, much simpler than previous one. Please see if this
> looks fine. There is a TODO note in there as I wanted some suggestions on how
> exactly should we wait for a transition to get over.
> 
>  drivers/cpufreq/cpufreq.c | 39 +++++++++++++++++++++++++++++++++++++--
>  include/linux/cpufreq.h   |  2 ++
>  2 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2677ff1..66bdfff 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -324,6 +324,13 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
>  	}
>  }
> 
> +static void notify_transition_for_each_cpu(struct cpufreq_policy *policy,
> +		struct cpufreq_freqs *freqs, unsigned int state)
> +{
> +	for_each_cpu(freqs->cpu, policy->cpus)
> +		__cpufreq_notify_transition(policy, freqs, state);
> +}
> +
>  /**
>   * cpufreq_notify_transition - call notifier chain and adjust_jiffies
>   * on frequency transition.
> @@ -335,8 +342,35 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
>  void cpufreq_notify_transition(struct cpufreq_policy *policy,
>  		struct cpufreq_freqs *freqs, unsigned int state)
>  {
> -	for_each_cpu(freqs->cpu, policy->cpus)
> -		__cpufreq_notify_transition(policy, freqs, state);
> +	if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))

Wait a min, when is this condition ever true? I mean, what else can
'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?

> +		return notify_transition_for_each_cpu(policy, freqs, state);
> +
> +	/* Serialize pre-post notifications */
> +	mutex_lock(&policy->transition_lock);

Nope, this is definitely not the way to go, IMHO. We should enforce that
the *callers* serialize the transitions, something like this:

	cpufreq_transition_lock();

	cpufreq_notify_transition(CPUFREQ_PRECHANGE);

	//Perform the frequency change

	cpufreq_notify_transition(CPUFREQ_POSTCHANGE);

	cpufreq_transition_unlock();

That's it!

[ We can either introduce a new "transition" lock or perhaps even reuse
the cpufreq_driver_lock if it fits... but the point is, the _caller_ has
to perform the locking; trying to be smart inside cpufreq_notify_transition()
is a recipe for headache :-( ]

Is there any problem with this approach due to which you didn't take
this route?
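The calling convention being suggested can be sketched in userspace (pthread mutex standing in for whatever kernel lock is chosen; the cpufreq_transition_{lock,unlock}() names are Srivatsa's proposal, everything else here is illustrative):

```c
#include <pthread.h>

/* Userspace stand-in for the proposed transition lock. */
static pthread_mutex_t transition_lock = PTHREAD_MUTEX_INITIALIZER;

static void cpufreq_transition_lock(void)
{
	pthread_mutex_lock(&transition_lock);
}

static void cpufreq_transition_unlock(void)
{
	pthread_mutex_unlock(&transition_lock);
}

static unsigned int cur_freq = 600000;	/* kHz */

/* With the lock held by the caller across the whole sequence, the
 * PRECHANGE and POSTCHANGE for one transition can never interleave
 * with another transition's notifications. */
static void do_transition(unsigned int new_freq,
			  void (*notify)(const char *state, unsigned int freq))
{
	cpufreq_transition_lock();
	if (notify)
		notify("PRECHANGE", new_freq);
	cur_freq = new_freq;	/* perform the frequency change */
	if (notify)
		notify("POSTCHANGE", new_freq);
	cpufreq_transition_unlock();
}
```

The point of the pattern is that the lock/unlock pair lives in the caller, so cpufreq_notify_transition() itself stays dumb.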

> +	if (unlikely(WARN_ON(!policy->transition_ongoing &&
> +				(state == CPUFREQ_POSTCHANGE)))) {
> +		mutex_unlock(&policy->transition_lock);
> +		return;
> +	}
> +
> +	if (state == CPUFREQ_PRECHANGE) {
> +		while (policy->transition_ongoing) {
> +			mutex_unlock(&policy->transition_lock);
> +			/* TODO: Can we do something better here? */
> +			cpu_relax();
> +			mutex_lock(&policy->transition_lock);

If the caller takes care of the synchronization, we can avoid
these sorts of acrobatics ;-)
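For the TODO in the patch, one alternative to the unlock/cpu_relax()/lock spin is to sleep until the ongoing transition finishes. A userspace sketch with pthread condition variables (in the kernel, a wait queue with wait_event()/wake_up() would play this role; all names here are illustrative, not from the patch):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t transition_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  transition_done = PTHREAD_COND_INITIALIZER;
static bool transition_ongoing;

/* PRECHANGE path: wait (sleeping, not spinning) until the previous
 * transition has finished, then claim the transition for ourselves. */
void transition_begin(void)
{
	pthread_mutex_lock(&transition_lock);
	while (transition_ongoing)
		pthread_cond_wait(&transition_done, &transition_lock);
	transition_ongoing = true;
	pthread_mutex_unlock(&transition_lock);
}

/* POSTCHANGE path: mark the transition finished and wake any waiters. */
void transition_end(void)
{
	pthread_mutex_lock(&transition_lock);
	transition_ongoing = false;
	pthread_cond_broadcast(&transition_done);
	pthread_mutex_unlock(&transition_lock);
}
```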

Regards,
Srivatsa S. Bhat

> +		}
> +
> +		policy->transition_ongoing = true;
> +		mutex_unlock(&policy->transition_lock);
> +	}
> +
> +	notify_transition_for_each_cpu(policy, freqs, state);
> +
> +	if (state == CPUFREQ_POSTCHANGE) {
> +		policy->transition_ongoing = false;
> +		mutex_unlock(&policy->transition_lock);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
> 
> @@ -983,6 +1017,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
> 
>  	INIT_LIST_HEAD(&policy->policy_list);
>  	init_rwsem(&policy->rwsem);
> +	mutex_init(&policy->transition_lock);
> 
>  	return policy;
> 
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 31c431e..e5cebce 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -104,6 +104,8 @@ struct cpufreq_policy {
>  	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>  	 */
>  	struct rw_semaphore	rwsem;
> +	bool			transition_ongoing; /* Tracks transition status */
> +	struct mutex		transition_lock;
>  };
> 
>  /* Only for ACPI */
> 


Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..66bdfff 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -324,6 +324,13 @@  static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 	}
 }
 
+static void notify_transition_for_each_cpu(struct cpufreq_policy *policy,
+		struct cpufreq_freqs *freqs, unsigned int state)
+{
+	for_each_cpu(freqs->cpu, policy->cpus)
+		__cpufreq_notify_transition(policy, freqs, state);
+}
+
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
  * on frequency transition.
@@ -335,8 +342,35 @@  static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
-	for_each_cpu(freqs->cpu, policy->cpus)
-		__cpufreq_notify_transition(policy, freqs, state);
+	if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
+		return notify_transition_for_each_cpu(policy, freqs, state);
+
+	/* Serialize pre-post notifications */
+	mutex_lock(&policy->transition_lock);
+	if (unlikely(WARN_ON(!policy->transition_ongoing &&
+				(state == CPUFREQ_POSTCHANGE)))) {
+		mutex_unlock(&policy->transition_lock);
+		return;
+	}
+
+	if (state == CPUFREQ_PRECHANGE) {
+		while (policy->transition_ongoing) {
+			mutex_unlock(&policy->transition_lock);
+			/* TODO: Can we do something better here? */
+			cpu_relax();
+			mutex_lock(&policy->transition_lock);
+		}
+
+		policy->transition_ongoing = true;
+		mutex_unlock(&policy->transition_lock);
+	}
+
+	notify_transition_for_each_cpu(policy, freqs, state);
+
+	if (state == CPUFREQ_POSTCHANGE) {
+		policy->transition_ongoing = false;
+		mutex_unlock(&policy->transition_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
 
@@ -983,6 +1017,7 @@  static struct cpufreq_policy *cpufreq_policy_alloc(void)
 
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
+	mutex_init(&policy->transition_lock);
 
 	return policy;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..e5cebce 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,8 @@  struct cpufreq_policy {
 	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
+	bool			transition_ongoing; /* Tracks transition status */
+	struct mutex		transition_lock;
 };
 
 /* Only for ACPI */