[RFC,02/16] sched: Introduce CONFIG_SCHED_ENERGY

Message ID 1400869003-27769-3-git-send-email-morten.rasmussen@arm.com
State New

Commit Message

Morten Rasmussen May 23, 2014, 6:16 p.m. UTC
The Energy-aware scheduler implementation is guarded by
CONFIG_SCHED_ENERGY.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 arch/arm/Kconfig |    5 +++++
 1 file changed, 5 insertions(+)

Comments

Henrik Austad June 8, 2014, 6:03 a.m. UTC | #1
On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> The Energy-aware scheduler implementation is guarded by
> CONFIG_SCHED_ENERGY.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  arch/arm/Kconfig |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index ab438cb..bfc3a85 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig

Is this going to be duplicated for each architecture enabling this? Why
not make a kernel/Kconfig.energy and link to that from those
architectures using it?

> @@ -1926,6 +1926,11 @@ config XEN
>  	help
>  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
>  
> +config SCHED_ENERGY
> +	bool "Energy-aware scheduling (EXPERIMENTAL)"
> +	help
> +	  Highly experimental energy aware task scheduling.
> +

how about adding *slightly* more info here? :) (yes, yes, I know it's an RFC)

"""
Highly experimental energy aware task scheduling.

This will allow the kernel to keep track of energy required for
different capacity levels for a given CPU. That way, the scheduler can
make more informed decisions as to where a newly woken task should be
placed. Heterogeneous platforms will benefit the most from this option.

Enabling this will add a significant overhead for a task-switch.

If unsure, say N here.
"""

>  endmenu
>  
>  menu "Boot options"
> -- 
> 1.7.9.5
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen June 9, 2014, 10:20 a.m. UTC | #2
On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > The Energy-aware scheduler implementation is guarded by
> > CONFIG_SCHED_ENERGY.
> > 
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  arch/arm/Kconfig |    5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index ab438cb..bfc3a85 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> 
> Is this going to be duplicated for each architecture enabling this? Why
> not make a kernel/Kconfig.energy and link to that from those
> architectures using it?

kernel/Kconfig.energy is better I think.

> 
> > @@ -1926,6 +1926,11 @@ config XEN
> >  	help
> >  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
> >  
> > +config SCHED_ENERGY
> > +	bool "Energy-aware scheduling (EXPERIMENTAL)"
> > +	help
> > +	  Highly experimental energy aware task scheduling.
> > +
> 
> how about adding *slightly* more info here? :) (yes, yes, I know it's an RFC)

Fair point.

> 
> """
> Highly experimental energy aware task scheduling.
> 
> This will allow the kernel to keep track of energy required for
> different capacity levels for a given CPU. That way, the scheduler can
> make more informed decisions as to where a newly woken task should be
> placed. Heterogeneous platforms will benefit the most from this option.

Platforms with hierarchical power domains (for example, those able to
power off groups of cpus and their caches) should see some benefit as
well.

> Enabling this will add a significant overhead for a task-switch.

The overhead is at task wakeup; task switch (as in task preemption)
should not be affected.

Thanks for the text. I will roll it into v2.

Morten
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra June 10, 2014, 9:39 a.m. UTC | #3
On Mon, Jun 09, 2014 at 11:20:27AM +0100, Morten Rasmussen wrote:
> On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> > On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > > The Energy-aware scheduler implementation is guarded by
> > > CONFIG_SCHED_ENERGY.
> > > 
> > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > > ---
> > >  arch/arm/Kconfig |    5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > index ab438cb..bfc3a85 100644
> > > --- a/arch/arm/Kconfig
> > > +++ b/arch/arm/Kconfig
> > 
> > Is this going to be duplicated for each architecture enabling this? Why
> > not make a kernel/Kconfig.energy and link to that from those
> > architectures using it?
> 
> kernel/Kconfig.energy is better I think.

Well, strictly speaking I'd prefer to not have more sched CONFIG knobs.

Do we really need to have this CONFIG guarded?
Morten Rasmussen June 10, 2014, 10:06 a.m. UTC | #4
On Tue, Jun 10, 2014 at 10:39:43AM +0100, Peter Zijlstra wrote:
> On Mon, Jun 09, 2014 at 11:20:27AM +0100, Morten Rasmussen wrote:
> > On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> > > On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > > > The Energy-aware scheduler implementation is guarded by
> > > > CONFIG_SCHED_ENERGY.
> > > > 
> > > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > > > ---
> > > >  arch/arm/Kconfig |    5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > > index ab438cb..bfc3a85 100644
> > > > --- a/arch/arm/Kconfig
> > > > +++ b/arch/arm/Kconfig
> > > 
> > > Is this going to be duplicated for each architecture enabling this? Why
> > > not make a kernel/Kconfig.energy and link to that from those
> > > architectures using it?
> > 
> > kernel/Kconfig.energy is better I think.
> 
> Well, strictly speaking I'd prefer to not have more sched CONFIG knobs.
> 
> Do we really need to have this CONFIG guarded?

How would you like to disable the energy stuff for users for whom
latency is everything?

I mean, we are adding some extra load/utilization tracking. While I
think we should do everything possible to minimize the overhead, I think
it is unrealistic to assume that it will be zero. Is some extra 'if
(energy_enabled)' acceptable?

I'm open for other suggestions.
Peter Zijlstra June 10, 2014, 10:23 a.m. UTC | #5
On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> How would you like to disable the energy stuff for users for whom
> latency is everything?
> 
> I mean, we are adding some extra load/utilization tracking. While I
> think we should do everything possible to minimize the overhead, I think
> it is unrealistic to assume that it will be zero. Is some extra 'if
> (energy_enabled)' acceptable?
> 
> I'm open for other suggestions.

We have the jump-label stuff to do self modifying code ;-) The only
thing we need to be careful with is data-layout.

So I'm _hoping_ we can do all this without more CONFIG knobs, because
{PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
build and run test, not to mention that distro builds will have no other
option than to enable everything anyhow.
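
The trade-off discussed above (a plain runtime check versus a compile-time CONFIG guard) can be sketched in userspace C. This is only an analogy: the names `energy_enabled`, `set_energy_enabled`, `wake_task` and `energy_updates` are hypothetical, and the `__builtin_expect` hint (a GCC/Clang builtin) merely biases code layout. It does not patch the branch at runtime the way the kernel's jump labels do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime switch; stands in for a kernel static key. */
static bool energy_enabled;

/* Hypothetical counter recording how often the energy path ran. */
static unsigned long energy_updates;

/* Tell the compiler the condition is normally false, so the disabled
 * case is laid out as the straight-line path. */
#define rarely(x) __builtin_expect(!!(x), 0)

/* Flip the switch at runtime instead of rebuilding with another CONFIG. */
static void set_energy_enabled(bool on)
{
	energy_enabled = on;
}

/* Stand-in for a wakeup path: the extra accounting only runs when the
 * switch is on; task placement itself is left unchanged here. */
static int wake_task(int prev_cpu)
{
	assert(prev_cpu >= 0);
	if (rarely(energy_enabled)) {
		energy_updates++;	/* placeholder for energy bookkeeping */
	}
	return prev_cpu;
}
```

With the switch off, `wake_task()` pays for one predictable branch; a real static key would reduce even that to a patched no-op, at the cost of self-modifying code.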
Henrik Austad June 10, 2014, 11:17 a.m. UTC | #6
On Tue, Jun 10, 2014 at 12:23:53PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > How would you like to disable the energy stuff for users for whom
> > latency is everything?
> > 
> > I mean, we are adding some extra load/utilization tracking. While I
> > think we should do everything possible to minimize the overhead, I think
> > it is unrealistic to assume that it will be zero. Is some extra 'if
> > (energy_enabled)' acceptable?
> > 
> > I'm open for other suggestions.
> 
> We have the jump-label stuff to do self modifying code ;-) The only
> thing we need to be careful with is data-layout.

Isn't this asking for trouble?

I do get the point of not introducing more make-ifdeffery, but I'm not
so sure the alternative is much better. Do we really want to spend time
tracing down bugs introduced via a self-modifying process in something
as central as the scheduler?

> So I'm _hoping_ we can do all this without more CONFIG knobs, because
> {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> build and run test, not to mention that distro builds will have no other
> option than to enable everything anyhow.

True, but if that is the argument, how is adding this as a dynamic thing
any better? You still end up with a test-matrix of the same size.

Building a kernel isn't _that_ much work and it would make the
test-scripts that much simpler to maintain if we don't have to rely
on some dynamic tweaking of the core.

Just sayin'
Morten Rasmussen June 10, 2014, 11:24 a.m. UTC | #7
On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > How would you like to disable the energy stuff for users for whom
> > latency is everything?
> > 
> > I mean, we are adding some extra load/utilization tracking. While I
> > think we should do everything possible to minimize the overhead, I think
> > it is unrealistic to assume that it will be zero. Is some extra 'if
> > (energy_enabled)' acceptable?
> > 
> > I'm open for other suggestions.
> 
> We have the jump-label stuff to do self modifying code ;-) The only
> thing we need to be careful with is data-layout.

Thanks. I can see that it is already used for various bits in
kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt
related to data-layout caveats. Is there some other
documentation/patches I should read before messing everything up? ;-)

> So I'm _hoping_ we can do all this without more CONFIG knobs, because
> {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> build and run test, not to mention that distro builds will have no other
> option than to enable everything anyhow.

Fair enough.
Peter Zijlstra June 10, 2014, 12:19 p.m. UTC | #8
On Tue, Jun 10, 2014 at 01:17:32PM +0200, Henrik Austad wrote:
> On Tue, Jun 10, 2014 at 12:23:53PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > How would you like to disable the energy stuff for users for whom
> > > latency is everything?
> > > 
> > > I mean, we are adding some extra load/utilization tracking. While I
> > > think we should do everything possible to minimize the overhead, I think
> > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > (energy_enabled)' acceptable?
> > > 
> > > I'm open for other suggestions.
> > 
> > We have the jump-label stuff to do self modifying code ;-) The only
> > thing we need to be careful with is data-layout.
> 
> Isn't this asking for trouble?
> 
> I do get the point of not introducing more make-ifdeffery, but I'm not
> so sure the alternative is much better. Do we really want to spend time
> tracing down bugs introduced via a self-modifying process in something
> as central as the scheduler?

It's already chock full of that stuff ;-)

> > So I'm _hoping_ we can do all this without more CONFIG knobs, because
> > {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> > build and run test, not to mention that distro builds will have no other
> > option than to enable everything anyhow.
> 
> True, but if that is the argument, how is adding this as a dynamic thing
> any better? You still end up with a test-matrix of the same size.

Test-matrix yes, sadly so and there's nothing we can really do about
that, so that sucks.

But it does reduce the amount of code each test has to cover;
everything that is not uber critical fast path we can do
unconditionally. So all the sched_domain wankery gets tested on every
boot / hotplug event, which is tons better than only when that
particular option is built in.

So while the total test matrix does suck rocks, the actual code that
needs testing per option can be significantly reduced.

> Building a kernel isn't _that_ much work and it would make the
> test-scripts that much simpler to maintain if we don't have to rely
> on some dynamic tweaking of the core.

It's exponential: given that I now already have to build
PREEMPT*SMP*CGROUP^3*NUMA^2 = 2^7 = 128 kernels to cover all options,
adding one more option means I'll have to build another 128 kernels.
Building 128 kernels does take a lot of time, no matter how far you
strip that .config and no matter that I can build a kernel in <50 seconds.
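
The arithmetic above can be checked with a few lines of C. This is a minimal sketch that assumes every option is an independent on/off switch and reads the exponents in PREEMPT*SMP*CGROUP^3*NUMA^2 as 1, 1, 3 and 2 binary options; the names `sched_option_bits` and `builds_needed` are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Binary options in the existing matrix: PREEMPT, SMP, three for
 * CGROUP and two for NUMA. */
static const unsigned int sched_option_bits = 1 + 1 + 3 + 2;

/* Kernels needed to cover every on/off combination of n_options
 * independent options: 2^n_options. */
static unsigned long builds_needed(unsigned int n_options)
{
	/* A shift by the full width of the type would be undefined. */
	assert(n_options < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << n_options;
}
```

`builds_needed(sched_option_bits)` is the current matrix, and `builds_needed(sched_option_bits + 1)` is the matrix with one more CONFIG knob: each added option doubles the count.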
Peter Zijlstra June 10, 2014, 12:24 p.m. UTC | #9
On Tue, Jun 10, 2014 at 12:24:03PM +0100, Morten Rasmussen wrote:
> On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > How would you like to disable the energy stuff for users for whom
> > > latency is everything?
> > > 
> > > I mean, we are adding some extra load/utilization tracking. While I
> > > think we should do everything possible to minimize the overhead, I think
> > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > (energy_enabled)' acceptable?
> > > 
> > > I'm open for other suggestions.
> > 
> > We have the jump-label stuff to do self modifying code ;-) The only
> > thing we need to be careful with is data-layout.
> 
> Thanks. I can see that it is already used for various bits in
> kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt
> related to data-layout caveats. Is there some other
> documentation/patches I should read before messing everything up? ;-)

So the data-layout was mostly referring to things like making sure that
struct sched_avg doesn't end up straddling a cacheline somewhere by
accident.

The most expensive part of the per-task accounting nonsense is the
amount of memory we need to touch to do so, the actual instructions come
second, unless of course we go put tons of divisions in there :-)

BTW, are cachelines 64 bytes for you ARM people too?
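
The data-layout concern above can be illustrated with a small layout check. This is a sketch under stated assumptions: a 64-byte cacheline, a cacheline-aligned enclosing object, and toy structs (`toy_avg`, `toy_task`, `toy_task_aligned`) that are hypothetical stand-ins, not the kernel's real `struct sched_avg` or `struct task_struct`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u	/* assumed line size */

/* True if an object of `size` bytes placed `offset` bytes into a
 * cacheline-aligned container touches more than one cacheline. */
static bool straddles_cacheline(size_t offset, size_t size)
{
	if (size == 0)
		return false;
	assert(size <= SIZE_MAX - offset);	/* offset + size must not wrap */
	return offset / CACHELINE_BYTES !=
	       (offset + size - 1) / CACHELINE_BYTES;
}

/* Hypothetical 16-byte accounting record. */
struct toy_avg {
	uint64_t sum;
	uint32_t period;
	uint32_t contrib;
};

/* Careless layout: the record starts at byte 56 and ends at byte 71. */
struct toy_task {
	char pad[56];
	struct toy_avg avg;
};

/* Same fields, but the record is forced onto its own line (C11). */
struct toy_task_aligned {
	char pad[56];
	_Alignas(CACHELINE_BYTES) struct toy_avg avg;
};
```

In the first layout every update of `avg` touches two cachelines; in the aligned layout it touches one, at the price of eight bytes of padding.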
Morten Rasmussen June 10, 2014, 2:41 p.m. UTC | #10
On Tue, Jun 10, 2014 at 01:24:35PM +0100, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 12:24:03PM +0100, Morten Rasmussen wrote:
> > On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> > > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > > How would you like to disable the energy stuff for users for whom
> > > > latency is everything?
> > > > 
> > > > I mean, we are adding some extra load/utilization tracking. While I
> > > > think we should do everything possible to minimize the overhead, I think
> > > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > > (energy_enabled)' acceptable?
> > > > 
> > > > I'm open for other suggestions.
> > > 
> > > We have the jump-label stuff to do self modifying code ;-) The only
> > > thing we need to be careful with is data-layout.
> > 
> > Thanks. I can see that it is already used for various bits in
> > kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt
> > related to data-layout caveats. Is there some other
> > documentation/patches I should read before messing everything up? ;-)
> 
> So the data-layout was mostly referring to things like making sure that
> struct sched_avg doesn't end up straddling a cacheline somewhere by
> accident.
> 
> The most expensive part of the per-task accounting nonsense is the
> amount of memory we need to touch to do so, the actual instructions come
> second, unless of course we go put tons of divisions in there :-)

Makes sense.

> BTW, are cachelines 64 bytes for you ARM people too?

Mostly yes, but as with a lot of other things on ARM it is
implementation defined. The cacheline sizes are probeable at runtime,
but for things where we don't know I think 64 bytes is the current
assumption.

Catalin or Will would be able to provide a more detailed answer.
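
The "probe at runtime, otherwise assume 64 bytes" approach described above has a rough userspace analogue. This sketch uses glibc's `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`, which is a glibc extension and may report 0 or -1 when the size is unknown; it is not how the kernel itself discovers the line size. The helper names and the fallback constant are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define FALLBACK_CACHELINE_BYTES 64L	/* assumed when probing fails */

/* Ask the C library for the L1 data cacheline size and fall back to
 * 64 bytes when the value is missing or not reported. */
static long dcache_line_size(void)
{
	long size = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
	size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
	return size > 0 ? size : FALLBACK_CACHELINE_BYTES;
}

/* Round n up to a whole number of cachelines, e.g. to size a buffer so
 * that two objects never share a line. */
static size_t round_up_to_line(size_t n, long line)
{
	assert(line > 0);
	return (n + (size_t)line - 1) / (size_t)line * (size_t)line;
}
```

A caller would typically probe once at startup and reuse the result, e.g. `round_up_to_line(sizeof(record), dcache_line_size())`.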
Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ab438cb..bfc3a85 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1926,6 +1926,11 @@  config XEN
 	help
 	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
 
+config SCHED_ENERGY
+	bool "Energy-aware scheduling (EXPERIMENTAL)"
+	help
+	  Highly experimental energy aware task scheduling.
+
 endmenu
 
 menu "Boot options"