[V3,0/3] sched: cpufreq: Allow remote callbacks

Message ID cover.1499927699.git.viresh.kumar@linaro.org

Message

Viresh Kumar July 13, 2017, 6:44 a.m. UTC
Hi,

With Android UI and benchmarks, the latency of the cpufreq response to
certain scheduling events can become critical. Currently, callbacks
into schedutil are only made from the scheduler if the target CPU of the
event is the same as the current CPU. This means there are certain
situations where a target CPU may not run schedutil for some time.
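
For reference, this is roughly the guard responsible (a sketch of the
then-current helper in kernel/sched/sched.h; details may vary by tree):

static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags)
{
	/* Drop the update unless we are running on the target CPU. */
	if (cpu_of(rq) == smp_processor_id())
		cpufreq_update_util(rq, flags);
}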

One testcase that shows this behavior is where a task starts running on
CPU0, and then a new task is spawned on CPU0 by a task on CPU1. If the
system is configured such that new tasks should receive maximum demand
initially, this should result in CPU0 increasing its frequency
immediately. Because of the above-mentioned limitation, though, this
does not occur. This was verified using ftrace with the sample
application [1].

The ideal solution might be to always allow remote callbacks, but that
has its own challenges:

o No locking is required today for the single-CPU-per-policy case, and
  adding any kind of locking there just to support remote callbacks
  isn't really a good idea.

o If the local CPU isn't part of the same cpufreq policy as the target
  CPU, then we wouldn't be able to do fast switching at all and would
  have to use some kind of bottom half to schedule work on the target
  CPU to do the real switching. That may be overkill as well.


And so this series only allows remote callbacks for target CPUs that
share the cpufreq policy with the local CPU.
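
In code terms, the remote-callback test this series adds boils down to
checking whether the local CPU belongs to the target policy; a minimal
sketch (the helper name and placement here are illustrative):

static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy)
{
	/* Allow a remote callback only if the local CPU shares the
	 * cpufreq policy with the target CPU. */
	return cpumask_test_cpu(smp_processor_id(), policy->cpus);
}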

This series was tested with a couple of use cases (Android: hackbench,
recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM Hikey
board (64-bit octa-core, single policy). Only galleryfling showed minor
improvements, while the others didn't show much deviation.

The reason is that this patchset only targets a corner case, where all
of the following must be true to see an improvement, and that doesn't
happen too often in these tests:

- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to
  higher OPPs.
- And the target CPU doesn't call into schedutil until the next tick.

V2->V3:
- Rearranged/merged patches as suggested by Rafael (looks much better
  now)
- Also handle the new hook added to the intel_pstate driver.
- The final code remains the same as V2, except for the above hook.

V1->V2:
- Don't support remote callbacks for unshared cpufreq policies.
- Don't support remote callbacks where local CPU isn't part of the
  target CPU's cpufreq policy.
- Dropped dvfs_possible_from_any_cpu flag.

--
viresh

[1] http://pastebin.com/7LkMSRxE

Viresh Kumar (3):
  sched: cpufreq: Allow remote cpufreq callbacks
  cpufreq: schedutil: Process remote callback for shared policies
  cpufreq: governor: Process remote callback for shared policies

 drivers/cpufreq/cpufreq_governor.c |  4 ++++
 drivers/cpufreq/intel_pstate.c     |  8 ++++++++
 include/linux/sched/cpufreq.h      |  1 +
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 19 ++++++++++++++-----
 kernel/sched/deadline.c            |  2 +-
 kernel/sched/fair.c                |  8 +++++---
 kernel/sched/rt.c                  |  2 +-
 kernel/sched/sched.h               | 10 ++--------
 9 files changed, 37 insertions(+), 18 deletions(-)

-- 
2.13.0.71.gd7076ec9c9cb

Comments

Rafael J. Wysocki July 13, 2017, 3:17 p.m. UTC | #1
On Thu, Jul 13, 2017 at 8:44 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Hi,
>
> With Android UI and benchmarks, the latency of the cpufreq response to
> certain scheduling events can become critical. Currently, callbacks
> into schedutil are only made from the scheduler if the target CPU of the
> event is the same as the current CPU. This means there are certain
> situations where a target CPU may not run schedutil for some time.
>
> One testcase that shows this behavior is where a task starts running on
> CPU0, and then a new task is spawned on CPU0 by a task on CPU1. If the
> system is configured such that new tasks should receive maximum demand
> initially, this should result in CPU0 increasing its frequency
> immediately. Because of the above-mentioned limitation, though, this
> does not occur. This was verified using ftrace with the sample
> application [1].
>
> The ideal solution might be to always allow remote callbacks, but that
> has its own challenges:
>
> o No locking is required today for the single-CPU-per-policy case, and
>   adding any kind of locking there just to support remote callbacks
>   isn't really a good idea.
>
> o If the local CPU isn't part of the same cpufreq policy as the target
>   CPU, then we wouldn't be able to do fast switching at all and would
>   have to use some kind of bottom half to schedule work on the target
>   CPU to do the real switching. That may be overkill as well.
>
> And so this series only allows remote callbacks for target CPUs that
> share the cpufreq policy with the local CPU.
>
> This series was tested with a couple of use cases (Android: hackbench,
> recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM Hikey
> board (64-bit octa-core, single policy). Only galleryfling showed minor
> improvements, while the others didn't show much deviation.
>
> The reason is that this patchset only targets a corner case, where all
> of the following must be true to see an improvement, and that doesn't
> happen too often in these tests:
>
> - Task is migrated to another CPU.
> - The task has maximum demand initially, and should take the CPU to
>   higher OPPs.
> - And the target CPU doesn't call into schedutil until the next tick.

I don't have any problems with this series at this point, so you can add

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to the patches.

I can't apply them without ACKs from Peter or Ingo, though.

Thanks,
Rafael
Viresh Kumar July 26, 2017, 6:29 a.m. UTC | #2
On 24-07-17, 15:47, Peter Zijlstra wrote:
> I said nothing about the shared locking. That is indeed required. All I
> said is that those two tests you add could be left out.

I was right, I didn't understand your comment at all :(

> > > That would then continue to process the iowait and other accounting
> > > stuff, but stall the moment we call into the actual driver, which will
> > > then drop the request on the floor as per the first few hunks.
> >
> > I am not sure I understood your comment completely though.
>
> Since we call cpufreq_update_util(@rq, ...) with @rq->lock held, all
> such calls are in fact serialized for that cpu.

Yes, they are serialized but ..

> Therefore the cpu !=
> current_cpu tests you add are pointless.

.. I didn't understand why you said so. This check isn't there to take
care of serialization, but of remote callbacks.
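
For context, the check being discussed sits at the top of schedutil's
rate-limit helper, before any frequency request is made; roughly (a
sketch, not the exact hunk):

static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
{
	s64 delta_ns;

	/* Reject callbacks from CPUs outside the target policy; the
	 * driver can't service those requests. */
	if (!cpufreq_can_do_remote_dvfs(sg_policy->policy))
		return false;

	delta_ns = time - sg_policy->last_freq_update_time;
	return delta_ns >= sg_policy->freq_update_delay_ns;
}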

> Only once we get to the actual cpufreq driver (intel_pstate and others)
> do we run into the fact that we might not be able to service the request
> remotely.

We never check for remote callbacks in drivers.

> But since you also add a test there, that is sufficient.

No.

The diff for intel_pstate that you saw in this patch was for the case
where intel_pstate works directly with the scheduler (i.e. no schedutil
governor). The routine that gets called with schedutil is
intel_cpufreq_target(), which doesn't check for remoteness at all.
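
For reference, the intel_pstate hunk in question adds an early return
in the driver's own utilization-update hook, roughly like this sketch:

static void intel_pstate_update_util(struct update_util_data *data, u64 time,
				     unsigned int flags)
{
	struct cpudata *cpu = container_of(data, struct cpudata, update_util);

	/* intel_pstate programs the local CPU's registers directly,
	 * so drop requests arriving from a remote CPU. */
	if (smp_processor_id() != cpu->cpu)
		return;

	/* ... original update logic continues here ... */
}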

-- 
viresh
Viresh Kumar July 27, 2017, 3:30 a.m. UTC | #3
On 26-07-17, 14:00, Saravana Kannan wrote:
> No, the alternative is to pass it on to the cpufreq driver and let it
> decide what it wants to do. That's the whole point of having a cpufreq
> driver -- so that the generic code doesn't need to care about HW specific
> details. Which is the point I was making in an earlier email to Viresh's
> patch -- we shouldn't be doing any CPU check for the callbacks at the
> scheduler or even governor level.
>
> That would simplify this whole thing by deleting a bunch of code. And having
> much simpler checks in those drivers that actually have to deal with their
> HW specific details.

So what you are saying is that we go and update (almost) every cpufreq
driver we have today and make their ->target() callbacks return early
if they don't support switching frequencies remotely? Is that really
simplifying anything?

The core already has most of the data required, and I believe that we
need to handle this in the governor's code, as is done in this series.

To solve the problem that you have been reporting (updates from any
CPU), we need something like what I suggested earlier (link below), and
I will come back to it after this series goes in. I don't want to
complicate things here unnecessarily.

https://marc.info/?l=linux-kernel&m=148906012827786&w=2

-- 
viresh
Viresh Kumar July 28, 2017, 6 a.m. UTC | #4
On 27-07-17, 12:55, Saravana Kannan wrote:
> Yes. Simplifying isn't always about the number of lines of code. It's also
> about abstraction. Having generic scheduler code care about HW details
> doesn't seem nice.

I can argue that the policy->cpus field is also hardware specific,
isn't it? And we are already using that in the schedutil governor.
What's wrong with having another field (in a generic way) in the same
structure that tells us more about the hardware?
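
For instance, schedutil's shared-policy path already iterates over
policy->cpus; roughly (a sketch based on the then-current
sugov_next_freq_shared()):

static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
{
	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
	struct cpufreq_policy *policy = sg_policy->policy;
	unsigned long util = 0, max = 1;
	unsigned int j;

	/* Aggregate the utilization of every CPU in the policy. */
	for_each_cpu(j, policy->cpus) {
		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
		unsigned long j_util = j_sg_cpu->util, j_max = j_sg_cpu->max;

		if (j_util * max > j_max * util) {
			util = j_util;
			max = j_max;
		}
	}

	return get_next_freq(sg_policy, util, max);
}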

And then, schedutil isn't really the scheduler but a cpufreq governor,
just like ondemand/conservative, which are also called from the same
scheduler path.

> It'll literally be one simple check (cpu == smp_processor_id()) or (cpu
> "in" policy->cpus).
>
> Also, this is only for drivers that currently support fast switching. How
> many of those do you have?

Why? Why shouldn't we do that for the other drivers? I think it should
be done for all of them.

> > The core already has most of the data required, and I believe that we
> > need to handle this in the governor's code, as is done in this series.
>
> Clearly, it doesn't. You are just making assumptions about HW.

So is it wrong to assume that any CPU from a policy can change the
frequency on behalf of all the CPUs of the same policy? That is the
basis of how the cpufreq core is designed.

-- 
viresh