Message ID | 1400869003-27769-3-git-send-email-morten.rasmussen@arm.com |
---|---|
State | New |
On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> The Energy-aware scheduler implementation is guarded by
> CONFIG_SCHED_ENERGY.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  arch/arm/Kconfig | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index ab438cb..bfc3a85 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig

Is this going to be duplicated for each architecture enabling this? Why not make a kernel/Kconfig.energy and link to that from those architectures using it?

> @@ -1926,6 +1926,11 @@ config XEN
>  	help
>  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
>
> +config SCHED_ENERGY
> +	bool "Energy-aware scheduling (EXPERIMENTAL)"
> +	help
> +	  Highly experimental energy aware task scheduling.
> +

How about adding *slightly* more info here? :) (yes, yes, I know it's an RFC)

"""
Highly experimental energy aware task scheduling.

This will allow the kernel to keep track of energy required for different capacity levels for a given CPU. That way, the scheduler can make more informed decisions as to where a newly woken task should be placed. Heterogeneous platforms will benefit the most from this option.

Enabling this will add a significant overhead for a task-switch.

If unsure, say N here.
"""

>  endmenu
>
>  menu "Boot options"
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > The Energy-aware scheduler implementation is guarded by
> > CONFIG_SCHED_ENERGY.
> >
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  arch/arm/Kconfig | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index ab438cb..bfc3a85 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
>
> Is this going to be duplicated for each architecture enabling this? Why
> not make a kernel/Kconfig.energy and link to that from those
> architectures using it?

kernel/Kconfig.energy is better I think.

> > @@ -1926,6 +1926,11 @@ config XEN
> >  	help
> >  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
> >
> > +config SCHED_ENERGY
> > +	bool "Energy-aware scheduling (EXPERIMENTAL)"
> > +	help
> > +	  Highly experimental energy aware task scheduling.
> > +
>
> How about adding *slightly* more info here? :) (yes, yes, I know it's an RFC)

Fair point.

> """
> Highly experimental energy aware task scheduling.
>
> This will allow the kernel to keep track of energy required for
> different capacity levels for a given CPU. That way, the scheduler can
> make more informed decisions as to where a newly woken task should be
> placed. Heterogeneous platforms will benefit the most from this option.

Platforms with hierarchical power domains (for example, having the ability to power off groups of cpus and their caches) should see some benefit as well.

> Enabling this will add a significant overhead for a task-switch.

The overhead is at task wakeup; task switch (as in task preemption) should not be affected.

Thanks for the text. I will roll it into v2.

Morten

On Mon, Jun 09, 2014 at 11:20:27AM +0100, Morten Rasmussen wrote:
> On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> > On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > > The Energy-aware scheduler implementation is guarded by
> > > CONFIG_SCHED_ENERGY.
> > >
> > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > > ---
> > >  arch/arm/Kconfig | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > index ab438cb..bfc3a85 100644
> > > --- a/arch/arm/Kconfig
> > > +++ b/arch/arm/Kconfig
> >
> > Is this going to be duplicated for each architecture enabling this? Why
> > not make a kernel/Kconfig.energy and link to that from those
> > architectures using it?
>
> kernel/Kconfig.energy is better I think.

Well, strictly speaking I'd prefer to not have more sched CONFIG knobs.

Do we really need to have this CONFIG guarded?

On Tue, Jun 10, 2014 at 10:39:43AM +0100, Peter Zijlstra wrote:
> On Mon, Jun 09, 2014 at 11:20:27AM +0100, Morten Rasmussen wrote:
> > On Sun, Jun 08, 2014 at 07:03:16AM +0100, Henrik Austad wrote:
> > > On Fri, May 23, 2014 at 07:16:29PM +0100, Morten Rasmussen wrote:
> > > > The Energy-aware scheduler implementation is guarded by
> > > > CONFIG_SCHED_ENERGY.
> > > >
> > > > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > > > ---
> > > >  arch/arm/Kconfig | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > > index ab438cb..bfc3a85 100644
> > > > --- a/arch/arm/Kconfig
> > > > +++ b/arch/arm/Kconfig
> > >
> > > Is this going to be duplicated for each architecture enabling this? Why
> > > not make a kernel/Kconfig.energy and link to that from those
> > > architectures using it?
> >
> > kernel/Kconfig.energy is better I think.
>
> Well, strictly speaking I'd prefer to not have more sched CONFIG knobs.
>
> Do we really need to have this CONFIG guarded?

How would you like to disable the energy stuff for users for whom latency is everything?

I mean, we are adding some extra load/utilization tracking. While I think we should do everything possible to minimize the overhead, I think it is unrealistic to assume that it will be zero. Is some extra 'if (energy_enabled)' acceptable?

I'm open for other suggestions.

On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> How would you like to disable the energy stuff for users for whom
> latency is everything?
>
> I mean, we are adding some extra load/utilization tracking. While I
> think we should do everything possible to minimize the overhead, I think
> it is unrealistic to assume that it will be zero. Is some extra 'if
> (energy_enabled)' acceptable?
>
> I'm open for other suggestions.

We have the jump-label stuff to do self modifying code ;-) The only thing we need to be careful with is data-layout.

So I'm _hoping_ we can do all this without more CONFIG knobs, because {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to build and run test, not to mention that distro builds will have no other option than to enable everything anyhow.
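
The trade-off discussed in this mail can be illustrated with a minimal userspace sketch. This is not kernel code and does not use the jump-label API; it only shows a runtime flag replacing a compile-time `#ifdef`, so that one binary covers both configurations. The kernel's jump labels go further by patching the branch in place, which is not reproduced here. All names (`energy_updates`, `account_energy`, `task_wakeup`) are hypothetical, and `energy_enabled` is borrowed from the example condition quoted above. `__builtin_expect` is a GCC/Clang builtin.

```c
#include <stdbool.h>

/* Hypothetical runtime switch. In the kernel this role would be played
 * by a static key (jump label), not by a plain variable. */
static bool energy_enabled = false;

/* Hypothetical counter standing in for the extra energy accounting. */
static unsigned long energy_updates = 0;

static void account_energy(void)
{
	energy_updates++;
}

/* Wakeup path guarded by a runtime check instead of an #ifdef.
 * The branch is hinted as unlikely so the disabled case stays cheap. */
static void task_wakeup(void)
{
	if (__builtin_expect(energy_enabled, 0))
		account_energy();
	/* ... the rest of the wakeup path would run unconditionally ... */
}
```

With the flag left at `false`, calling `task_wakeup()` leaves `energy_updates` untouched; setting `energy_enabled = true` makes each call bump the counter by one.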

On Tue, Jun 10, 2014 at 12:23:53PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > How would you like to disable the energy stuff for users for whom
> > latency is everything?
> >
> > I mean, we are adding some extra load/utilization tracking. While I
> > think we should do everything possible to minimize the overhead, I think
> > it is unrealistic to assume that it will be zero. Is some extra 'if
> > (energy_enabled)' acceptable?
> >
> > I'm open for other suggestions.
>
> We have the jump-label stuff to do self modifying code ;-) The only
> thing we need to be careful with is data-layout.

Isn't this asking for trouble?

I do get the point of not introducing more make-ifdeffery, but I'm not so sure the alternative is much better. Do we really want to spend time tracing down bugs introduced via a self-modifying process in something as central as the scheduler?

> So I'm _hoping_ we can do all this without more CONFIG knobs, because
> {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> build and run test, not to mention that distro builds will have no other
> option than to enable everything anyhow.

True, but if that is the argument, how is adding this as a dynamic thing any better, you still end up with a test-matrix of the same size?

Building a kernel isn't _that_ much work and it would make the test-scripts all that much simpler to maintain if we don't have to rely on some dynamic tweaking of the core.

Just sayin'

On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > How would you like to disable the energy stuff for users for whom
> > latency is everything?
> >
> > I mean, we are adding some extra load/utilization tracking. While I
> > think we should do everything possible to minimize the overhead, I think
> > it is unrealistic to assume that it will be zero. Is some extra 'if
> > (energy_enabled)' acceptable?
> >
> > I'm open for other suggestions.
>
> We have the jump-label stuff to do self modifying code ;-) The only
> thing we need to be careful with is data-layout.

Thanks. I can see that it is already used for various bits in kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt related to data-layout caveats. Is there some other documentation/patches I should read before messing everything up? ;-)

> So I'm _hoping_ we can do all this without more CONFIG knobs, because
> {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> build and run test, not to mention that distro builds will have no other
> option than to enable everything anyhow.

Fair enough.

On Tue, Jun 10, 2014 at 01:17:32PM +0200, Henrik Austad wrote:
> On Tue, Jun 10, 2014 at 12:23:53PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > How would you like to disable the energy stuff for users for whom
> > > latency is everything?
> > >
> > > I mean, we are adding some extra load/utilization tracking. While I
> > > think we should do everything possible to minimize the overhead, I think
> > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > (energy_enabled)' acceptable?
> > >
> > > I'm open for other suggestions.
> >
> > We have the jump-label stuff to do self modifying code ;-) The only
> > thing we need to be careful with is data-layout.
>
> Isn't this asking for trouble?
>
> I do get the point of not introducing more make-ifdeffery, but I'm not
> so sure the alternative is much better. Do we really want to spend time
> tracing down bugs introduced via a self-modifying process in something
> as central as the scheduler?

It's already chock full of that stuff ;-)

> > So I'm _hoping_ we can do all this without more CONFIG knobs, because
> > {PREEMPT*SMP*CGROUP^3*NUMA^2} is already entirely annoying to
> > build and run test, not to mention that distro builds will have no other
> > option than to enable everything anyhow.
>
> True, but if that is the argument, how is adding this as a dynamic thing
> any better, you still end up with a test-matrix of the same size?

Test-matrix yes, sadly so and there's nothing we can really do about that, so that sucks.

But it does reduce the coverage of the tests; everything that is not uber critical fast path we can do unconditionally. So all the sched_domain wankery gets tested on every boot / hotplug event, which is tons better than only when that particular option is built in.

So while the total test matrix does suck rocks, the actual code that needs testing per option can be significantly reduced.

> Building a kernel isn't _that_ much work and it would make the
> test-scripts all that much simpler to maintain if we don't have to rely
> on some dynamic tweaking of the core.

It's exponential: given that I now already have to build PREEMPT*SMP*CGROUP^3*NUMA^2 = 2^7 = 128 kernels to cover all options, adding one more option means I'll have to build another 128 kernels.

Building 128 kernels does take a lot of time, no matter how far you strip that .config and no matter that I can build a kernel in <50 seconds.
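
The arithmetic in this mail can be checked with a small sketch. It assumes every option is an independent on/off switch, which is how the expression PREEMPT*SMP*CGROUP^3*NUMA^2 reads (1 + 1 + 3 + 2 = 7 binary options). The function and array names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Number of kernels to build when each of n_options independent on/off
 * CONFIG options must be covered: 2^n_options.
 * Assumes n_options is smaller than the bit width of unsigned long. */
static unsigned long config_matrix_size(unsigned int n_options)
{
	return 1UL << n_options;
}

/* Binary option counts read off the expression in the mail:
 * PREEMPT (1), SMP (1), CGROUP^3 (3), NUMA^2 (2). */
static const unsigned int sched_option_bits[] = { 1, 1, 3, 2 };

/* Sum of the exponents, i.e. the total number of binary options. */
static unsigned int total_option_bits(void)
{
	unsigned int sum = 0;
	size_t n = sizeof(sched_option_bits) / sizeof(sched_option_bits[0]);

	for (size_t i = 0; i < n; i++)
		sum += sched_option_bits[i];
	return sum;
}
```

`config_matrix_size(total_option_bits())` gives the 128 builds mentioned above, and `config_matrix_size(8) - config_matrix_size(7)` gives the additional 128 builds that one more CONFIG option would cost.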

On Tue, Jun 10, 2014 at 12:24:03PM +0100, Morten Rasmussen wrote:
> On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > How would you like to disable the energy stuff for users for whom
> > > latency is everything?
> > >
> > > I mean, we are adding some extra load/utilization tracking. While I
> > > think we should do everything possible to minimize the overhead, I think
> > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > (energy_enabled)' acceptable?
> > >
> > > I'm open for other suggestions.
> >
> > We have the jump-label stuff to do self modifying code ;-) The only
> > thing we need to be careful with is data-layout.
>
> Thanks. I can see that it is already used for various bits in
> kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt
> related to data-layout caveats. Is there some other
> documentation/patches I should read before messing everything up? ;-)

So the data-layout was mostly referring to things like making sure that struct sched_avg doesn't end up straddling a cacheline somewhere by accident.

The most expensive part of the per-task accounting nonsense is the amount of memory we need to touch to do so, the actual instructions come second, unless of course we go put tons of divisions in there :-)

BTW, are cachelines 64 bytes for you ARM people too?
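
The data-layout concern can be made concrete with a short C11 sketch. It assumes 64-byte cachelines and a cacheline-aligned containing object. The structs are hypothetical stand-ins: the real struct sched_avg has different fields, and the 56 bytes of padding are chosen only to force the problem.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_BYTES 64 /* assumed size, not probed */

/* True if an object of `size` bytes at byte `offset` (relative to a
 * cacheline-aligned base) touches more than one cacheline. */
static bool straddles_cacheline(size_t offset, size_t size)
{
	if (size == 0)
		return false;
	return offset / CACHELINE_BYTES !=
	       (offset + size - 1) / CACHELINE_BYTES;
}

/* Hypothetical stand-in for a per-task accounting struct (16 bytes). */
struct fake_avg {
	unsigned long long last_update;
	unsigned int runnable_sum;
	unsigned int runnable_period;
};

/* 56 bytes of other state push `avg` across the 64-byte boundary. */
struct fake_entity_bad {
	char other_state[56];
	struct fake_avg avg;
};

/* Same fields, with `avg` forced to start on a cacheline boundary. */
struct fake_entity_good {
	char other_state[56];
	_Alignas(CACHELINE_BYTES) struct fake_avg avg;
};
```

Calling `straddles_cacheline(offsetof(struct fake_entity_bad, avg), sizeof(struct fake_avg))` reports a straddle, while the same check on `struct fake_entity_good` does not. The aligned variant costs padding, so updating `avg` touches one cacheline instead of two at the price of a larger struct.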

On Tue, Jun 10, 2014 at 01:24:35PM +0100, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 12:24:03PM +0100, Morten Rasmussen wrote:
> > On Tue, Jun 10, 2014 at 11:23:53AM +0100, Peter Zijlstra wrote:
> > > On Tue, Jun 10, 2014 at 11:06:41AM +0100, Morten Rasmussen wrote:
> > > > How would you like to disable the energy stuff for users for whom
> > > > latency is everything?
> > > >
> > > > I mean, we are adding some extra load/utilization tracking. While I
> > > > think we should do everything possible to minimize the overhead, I think
> > > > it is unrealistic to assume that it will be zero. Is some extra 'if
> > > > (energy_enabled)' acceptable?
> > > >
> > > > I'm open for other suggestions.
> > >
> > > We have the jump-label stuff to do self modifying code ;-) The only
> > > thing we need to be careful with is data-layout.
> >
> > Thanks. I can see that it is already used for various bits in
> > kernel/sched/*. I didn't catch anything in Documentation/static-keys.txt
> > related to data-layout caveats. Is there some other
> > documentation/patches I should read before messing everything up? ;-)
>
> So the data-layout was mostly referring to things like making sure that
> struct sched_avg doesn't end up straddling a cacheline somewhere by
> accident.
>
> The most expensive part of the per-task accounting nonsense is the
> amount of memory we need to touch to do so, the actual instructions come
> second, unless of course we go put tons of divisions in there :-)

Makes sense.

> BTW, are cachelines 64 bytes for you ARM people too?

Mostly yes, but as with a lot of other things on ARM it is implementation defined. The cacheline sizes are probeable at runtime, but for things where we don't know I think 64 bytes is the current assumption. Catalin or Will would be able to provide a more detailed answer.
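
The "probe at runtime, otherwise assume 64 bytes" approach can be sketched from userspace as follows. This is only an analogy and not how the kernel reads the value on ARM. It relies on `_SC_LEVEL1_DCACHE_LINESIZE`, a glibc extension to `sysconf()` that may be missing or may report 0 on some systems, hence the fallback. The function name and the fallback macro are hypothetical.

```c
#include <unistd.h>

/* Assumed default when the size cannot be probed. */
#define FALLBACK_CACHELINE_BYTES 64L

/* Returns the L1 data cacheline size in bytes if the C library can
 * report it, and the 64-byte assumption otherwise. */
static long probe_cacheline_bytes(void)
{
	long size = -1;

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
	/* glibc extension; may return 0 or -1 when the value is unknown */
	size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
	return size > 0 ? size : FALLBACK_CACHELINE_BYTES;
}
```

A caller would use the result in place of a hard-coded 64 when sizing or aligning buffers at runtime. Compile-time layout decisions such as struct alignment still need a fixed assumption.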

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ab438cb..bfc3a85 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1926,6 +1926,11 @@ config XEN
 	help
 	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
 
+config SCHED_ENERGY
+	bool "Energy-aware scheduling (EXPERIMENTAL)"
+	help
+	  Highly experimental energy aware task scheduling.
+
 endmenu
 
 menu "Boot options"