
[RFC,0/9] cpufreq: schedutil: Allow remote wakeups

Message ID cover.1489058244.git.viresh.kumar@linaro.org

Message

Viresh Kumar March 9, 2017, 11:45 a.m. UTC
Hi,

This is based on the work done by Steve Muckle [1] before he left Linaro,
and most of the patches are still under his authorship. I have made a
couple of improvements (detailed in the individual patches) and removed
the late callback support [2], as I wasn't sure of the value it adds. We
can include it separately if others feel it is required. This series is
based on pm/linux-next with patches [3] and [4] applied on top of it.

With the Android UI and benchmarks, the latency of the cpufreq response
to certain scheduling events becomes critical. Currently, callbacks into
schedutil are only made from the scheduler if the target CPU of the event
is the same as the current CPU. This means there are situations where a
target CPU may not run schedutil for some time.
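
For reference, the existing hook machinery looks roughly like this
(simplified from kernel/sched/sched.h on this kernel; details may vary).
The callback pointer is looked up with this_cpu_ptr() and the scheduler
only invokes it for the local runqueue, so an event targeting a remote
CPU never reaches that CPU's schedutil hook:

static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
{
	struct update_util_data *data;

	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
	if (data)
		data->func(data, rq_clock(rq), flags);
}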

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that new tasks should receive maximum demand
initially, this should result in CPU0 increasing frequency immediately.
Because of the above-mentioned limitation, though, this does not occur.
This is verified using ftrace with the sample [5] application.
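
For illustration, the scenario boils down to something like the program
below (a rough sketch, not the actual sample [5]). Whether CPU0 ramps up
immediately can then be checked from the power:cpu_frequency tracepoint
(e.g. trace-cmd record -e power:cpu_frequency):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");
}

int main(void)
{
	if (fork() == 0) {		/* first task: busy on CPU0 */
		pin_to_cpu(0);
		for (;;)
			;
	}

	pin_to_cpu(1);			/* parent runs on CPU1 */
	sleep(1);

	if (fork() == 0) {		/* new task, spawned from CPU1... */
		pin_to_cpu(0);		/* ...but pinned to the busy CPU0 */
		for (;;)
			;
	}

	sleep(5);			/* children keep spinning; kill by hand */
	return 0;
}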

This patchset updates the scheduler to call the cpufreq callbacks for
remote CPUs as well, and updates the schedutil governor to deal with it.
An additional flag is added to cpufreq policies to avoid sending IPIs to
remote CPUs for frequency updates, on platforms where any CPU can change
the frequency of any other CPU.
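
Conceptually, the governor-side decision looks like this (a sketch only,
reusing the dvfs_possible_from_any_cpu flag added by the last patch; the
helper name is made up and the actual series code may differ):

/* Can this CPU evaluate and set the frequency for the hook's CPU? */
static bool sugov_can_update_remotely(struct sugov_policy *sg_policy, int cpu)
{
	if (cpu == smp_processor_id())
		return true;	/* local update, nothing special needed */

	/* Any CPU may perform DVFS for this policy, so no IPI is needed. */
	return sg_policy->policy->dvfs_possible_from_any_cpu;
}

When this returns false, the update is forwarded to the target CPU with
irq_work_queue_on() instead.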

This series is tested with a couple of use cases (Android: hackbench,
recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM HiKey
board (64-bit octa-core, single policy). Only galleryfling showed minor
improvements, while the others didn't show much deviation.

The reason is that this patchset only targets a corner case, in which all
of the following must be true to improve performance, and that doesn't
happen too often with these tests:

- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to
  higher OPPs.
- And, without this patchset, the target CPU doesn't call into schedutil
  until the next tick.

--
viresh

[1] https://git.linaro.org/people/steve.muckle/kernel.git/log/?h=pmwg-integration
[2] https://git.linaro.org/people/steve.muckle/kernel.git/commit/?h=pmwg-integration&id=8f2ba60cde7e8ce9f9e5994bf8887371d7d6569c
[3] https://marc.info/?l=linux-kernel&m=148766093718487&w=2
[4] https://marc.info/?l=linux-kernel&m=148903231720432&w=2
[5] http://pastebin.com/7LkMSRxE

Steve Muckle (8):
  sched: cpufreq: add cpu to update_util_data
  irq_work: add irq_work_queue_on for !CONFIG_SMP
  sched: cpufreq: extend irq work to support fast switches
  sched: cpufreq: remove smp_processor_id() in remote paths
  sched: cpufreq: detect, process remote callbacks
  cpufreq: governor: support scheduler cpufreq callbacks on remote CPUs
  intel_pstate: ignore scheduler cpufreq callbacks on remote CPUs
  sched: cpufreq: enable remote sched cpufreq callbacks

Viresh Kumar (1):
  cpufreq: Add dvfs_possible_from_any_cpu policy flag

 drivers/cpufreq/cpufreq-dt.c       |  1 +
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/intel_pstate.c     |  3 ++
 include/linux/cpufreq.h            |  9 +++++
 include/linux/irq_work.h           |  7 ++++
 include/linux/sched/cpufreq.h      |  1 +
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 80 +++++++++++++++++++++++++++++---------
 kernel/sched/fair.c                |  6 ++-
 kernel/sched/sched.h               |  3 +-
 10 files changed, 90 insertions(+), 23 deletions(-)

-- 
2.7.1.410.g6faf27b

Comments

Rafael J. Wysocki March 15, 2017, 11:45 a.m. UTC | #1
On Thursday, March 09, 2017 05:15:10 PM Viresh Kumar wrote:
> Hi,
>
> This is based on the work done by Steve Muckle [1] before he left Linaro,
> and most of the patches are still under his authorship.
> [...]

From the first quick look, patches [1-3/9] seem to be split out somewhat
artificially.

Any chance to fold them into the patches where the new stuff is actually used?

I'll be looking at the rest of the patchset shortly.

Thanks,
Rafael
Viresh Kumar March 16, 2017, 3:09 a.m. UTC | #2
On 15-03-17, 12:45, Rafael J. Wysocki wrote:
> From the first quick look, patches [1-3/9] seem to be split out somewhat
> artificially.
>
> Any chance to fold them into the patches where the new stuff is actually used?
>
> I'll be looking at the rest of the patchset shortly.

I thought it would be better to keep them separate, but I can merge them into
the others in the next version if you want.

-- 
viresh
Rafael J. Wysocki March 16, 2017, 10:04 a.m. UTC | #3
On Thu, Mar 16, 2017 at 4:09 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 15-03-17, 12:45, Rafael J. Wysocki wrote:
>> From the first quick look, patches [1-3/9] seem to be split out somewhat
>> artificially.
>>
>> Any chance to fold them into the patches where the new stuff is actually used?
>>
>> I'll be looking at the rest of the patchset shortly.
>
> I thought it would be better to keep them separate, but I can merge them into
> the others in the next version if you want.

Yes, it is kind of more convenient to see how they are used right away.

Otherwise I need to look at two patches at the same time, which is a
pain on some machine form factors. :-)

Thanks,
Rafael
Rafael J. Wysocki March 29, 2017, 9:28 p.m. UTC | #4
On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
> From: Steve Muckle <smuckle.linux@gmail.com>
>
> Upcoming support for remote callbacks from the scheduler into schedutil
> requires that the CPU identified in the hook structure be used to
> indicate the CPU being operated on, rather than relying on
> smp_processor_id().
>
> Note that policy->cpu is passed to trace_cpu_frequency() in fast switch
> path, as it is safe to use any CPU which is managed by the current
> policy.

This should be commented on in the code too, IMO.
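
Something along these lines, perhaps (illustrative only):

	policy->cur = next_freq;

	/*
	 * The update may have been triggered from a remote CPU, so
	 * smp_processor_id() is meaningless here.  policy->cpu is fine:
	 * all CPUs of this policy run at the same frequency.
	 */
	trace_cpu_frequency(next_freq, policy->cpu);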

>
> Signed-off-by: Steve Muckle <smuckle.linux@gmail.com>
> [ vk: updated commit log, minor code cleanups and use policy->cpu for
>       traces ]
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index a418544c51b1..b168c31f1c8f 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -96,7 +96,7 @@ static void sugov_fast_switch(struct cpufreq_policy *policy,
>  		return;
>
>  	policy->cur = next_freq;
> -	trace_cpu_frequency(next_freq, smp_processor_id());
> +	trace_cpu_frequency(next_freq, policy->cpu);
>  }
>
>  static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> @@ -106,7 +106,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>
>  	if (policy->fast_switch_enabled) {
>  		if (sg_policy->next_freq == next_freq) {
> -			trace_cpu_frequency(policy->cur, smp_processor_id());
> +			trace_cpu_frequency(policy->cur, policy->cpu);
>  			return;
>  		}
>  		sg_policy->next_freq = next_freq;
> @@ -157,12 +157,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>  	return cpufreq_driver_resolve_freq(policy, freq);
>  }
>
> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
>  {
> -	struct rq *rq = this_rq();
> +	struct rq *rq = cpu_rq(cpu);
>  	unsigned long cfs_max;
>
> -	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +	cfs_max = arch_scale_cpu_capacity(NULL, cpu);
>
>  	*util = min(rq->cfs.avg.util_avg, cfs_max);
>  	*max = cfs_max;
> @@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	if (flags & SCHED_CPUFREQ_RT_DL) {
>  		next_f = policy->cpuinfo.max_freq;
>  	} else {
> -		sugov_get_util(&util, &max);
> +		sugov_get_util(&util, &max, hook->cpu);

Why is this not racy?

>  		sugov_iowait_boost(sg_cpu, &util, &max);
>  		next_f = get_next_freq(sg_policy, util, max);
>  	}
> @@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	unsigned long util, max;
>  	unsigned int next_f;
>
> -	sugov_get_util(&util, &max);
> +	sugov_get_util(&util, &max, hook->cpu);
>

And here?

>  	raw_spin_lock(&sg_policy->update_lock);

Thanks,
Rafael
Viresh Kumar April 11, 2017, 10:35 a.m. UTC | #5
On 29-03-17, 23:28, Rafael J. Wysocki wrote:
> On Thursday, March 09, 2017 05:15:15 PM Viresh Kumar wrote:
> > @@ -216,7 +216,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  	if (flags & SCHED_CPUFREQ_RT_DL) {
> >  		next_f = policy->cpuinfo.max_freq;
> >  	} else {
> > -		sugov_get_util(&util, &max);
> > +		sugov_get_util(&util, &max, hook->cpu);
>
> Why is this not racy?

Why would reading the utilization values be racy? The only dynamic value here
is "util_avg", and I am not sure that reading it is racy.

That said, this whole routine has races, which I ignored, as we may end up
updating the frequency simultaneously from two threads.

> >  		sugov_iowait_boost(sg_cpu, &util, &max);
> >  		next_f = get_next_freq(sg_policy, util, max);
> >  	}
> > @@ -272,7 +272,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	unsigned long util, max;
> >  	unsigned int next_f;
> >
> > -	sugov_get_util(&util, &max);
> > +	sugov_get_util(&util, &max, hook->cpu);
> >
>
> And here?
>
> >  	raw_spin_lock(&sg_policy->update_lock);

The lock prevents that race here, though.

So, if we are going to use this series, we can use the same update_lock for
single-CPU policies as well.
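
Roughly like this, perhaps (a simplified sketch, not the actual change;
the RT/DL and iowait-boost handling is omitted):

static void sugov_update_single(struct update_util_data *hook, u64 time,
				unsigned int flags)
{
	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
	unsigned long util, max;
	unsigned int next_f;

	/* Serialize local and remote updates, as the shared path does. */
	raw_spin_lock(&sg_policy->update_lock);

	if (sugov_should_update_freq(sg_policy, time)) {
		sugov_get_util(&util, &max, hook->cpu);
		next_f = get_next_freq(sg_policy, util, max);
		sugov_update_commit(sg_policy, time, next_f);
	}

	raw_spin_unlock(&sg_policy->update_lock);
}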

-- 
viresh