Message ID: 1404144343-18720-9-git-send-email-vincent.guittot@linaro.org
State: New
On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
> +		/*
> +		 * If the CPUs share their cache and the src_cpu's capacity is
> +		 * reduced because of other sched_class or IRQs, we trig an
> +		 * active balance to move the task
> +		 */
> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
> +					sd->imbalance_pct)))
>  			return 1;

Why is this tied to shared caches?
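For context, the arithmetic behind that check: with capacity_orig = 1024 (SCHED_CAPACITY_SCALE) and imbalance_pct = 117 (the default for SD_SHARE_PKG_RESOURCES levels in kernels of this era, assumed here purely for illustration), the condition fires once the usable capacity drops below roughly 85% of the original. A minimal standalone sketch, not code from the patch:

#include <stdio.h>

int main(void)
{
	unsigned long capacity_orig = 1024;	/* SCHED_CAPACITY_SCALE */
	unsigned long imbalance_pct = 117;	/* assumed MC-level default */
	unsigned long capacity = 870;		/* ~15% eaten by IRQs/RT */

	/* Same comparison as the patch: active balance when this holds */
	if (capacity_orig * 100 > capacity * imbalance_pct)
		printf("active balance: capacity %lu below threshold %lu\n",
		       capacity, capacity_orig * 100 / imbalance_pct);
	return 0;
}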
On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
> +					sd->imbalance_pct)))

> +		if ((rq->cfs.h_nr_running >= 1)
> +			&& ((rq->cpu_capacity * sd->imbalance_pct) <
> +				(rq->cpu_capacity_orig * 100))) {

That's just weird indentation, and operators should be at the end, not at
the beginning, of a line break.
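For comparison, here is the second condition reflowed to the style Peter asks for, wrapped in a hypothetical helper (the function name is illustrative and not part of the patch):

/* Hypothetical helper, for illustration only: the same test with
 * operators at the end of the broken lines. */
static inline bool cpu_capacity_reduced(struct rq *rq, struct sched_domain *sd)
{
	return rq->cfs.h_nr_running >= 1 &&
	       (rq->cpu_capacity * sd->imbalance_pct) <
	       (rq->cpu_capacity_orig * 100);
}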
On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:

You 'forgot' to update the comment that goes with nohz_kick_needed().

> @@ -7233,9 +7253,10 @@ static inline int nohz_kick_needed(struct rq *rq)
>  	struct sched_domain *sd;
>  	struct sched_group_capacity *sgc;
>  	int nr_busy, cpu = rq->cpu;
> +	bool kick = false;
>
>  	if (unlikely(rq->idle_balance))
> +		return false;
>
>  	/*
>  	 * We may be recently in ticked or tickless idle mode. At the first
> @@ -7249,38 +7270,41 @@ static inline int nohz_kick_needed(struct rq *rq)
>  	 * balancing.
>  	 */
>  	if (likely(!atomic_read(&nohz.nr_cpus)))
> +		return false;
>
>  	if (time_before(now, nohz.next_balance))
> +		return false;
>
>  	if (rq->nr_running >= 2)
> +		return true;
>
>  	rcu_read_lock();
>  	sd = rcu_dereference(per_cpu(sd_busy, cpu));
>  	if (sd) {
>  		sgc = sd->groups->sgc;
>  		nr_busy = atomic_read(&sgc->nr_busy_cpus);
>
> +		if (nr_busy > 1) {
> +			kick = true;
> +			goto unlock;
> +		}
> +
> +		if ((rq->cfs.h_nr_running >= 1)
> +			&& ((rq->cpu_capacity * sd->imbalance_pct) <
> +				(rq->cpu_capacity_orig * 100))) {
> +			kick = true;
> +			goto unlock;
> +		}

Again, why only for shared caches?

>  	}
>
>  	sd = rcu_dereference(per_cpu(sd_asym, cpu));
>  	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
>  				  sched_domain_span(sd)) < cpu))
> +		kick = true;
>
> +unlock:
>  	rcu_read_unlock();
> +	return kick;
>  }
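One way the heuristic comment above nohz_kick_needed() could be extended to cover the new test (suggested wording only, not from the patch):

/*
 * Kick the idle load balancer when:
 *   - this rq has more than one runnable task, or
 *   - this rq has at least one CFS task and its capacity is noticeably
 *     reduced by other sched classes or IRQ activity, or
 *   - this cpu's sched group has more than one busy cpu, or
 *   - for SD_ASYM_PACKING, a lower-numbered cpu in the domain span is idle.
 */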
On 10 July 2014 13:24, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
>> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
>> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
>> +					sd->imbalance_pct)))
>
>> +		if ((rq->cfs.h_nr_running >= 1)
>> +			&& ((rq->cpu_capacity * sd->imbalance_pct) <
>> +				(rq->cpu_capacity_orig * 100))) {
>
> That's just weird indentation, and operators should be at the end, not
> at the beginning, of a line break.

OK, I'm going to fix it.
On 10 July 2014 13:18, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
>> +		/*
>> +		 * If the CPUs share their cache and the src_cpu's capacity is
>> +		 * reduced because of other sched_class or IRQs, we trig an
>> +		 * active balance to move the task
>> +		 */
>> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
>> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
>> +					sd->imbalance_pct)))
>>  			return 1;
>
> Why is this tied to shared caches?

It's just to limit the policy change to a level where it can bring a
benefit without causing a performance regression. I'm not sure we can do
that at every level without risk.
On Thu, Jul 10, 2014 at 04:03:51PM +0200, Vincent Guittot wrote:
> On 10 July 2014 13:18, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
> >> +		/*
> >> +		 * If the CPUs share their cache and the src_cpu's capacity is
> >> +		 * reduced because of other sched_class or IRQs, we trig an
> >> +		 * active balance to move the task
> >> +		 */
> >> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
> >> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
> >> +					sd->imbalance_pct)))
> >> 			return 1;
> >
> > Why is this tied to shared caches?
>
> It's just to limit the policy change to a level where it can bring a
> benefit without causing a performance regression. I'm not sure we can
> do that at every level without risk.

Similar to the other change; both details _should_ have been in the
changelogs, etc.

In any case, it feels rather arbitrary to me. What about machines with no
cache sharing at all (traditional SMP systems)? This thing you're trying
to do still seems to make sense there.
On 11 July 2014 16:51, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 10, 2014 at 04:03:51PM +0200, Vincent Guittot wrote:
>> On 10 July 2014 13:18, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Jun 30, 2014 at 06:05:39PM +0200, Vincent Guittot wrote:
>> >> +		/*
>> >> +		 * If the CPUs share their cache and the src_cpu's capacity is
>> >> +		 * reduced because of other sched_class or IRQs, we trig an
>> >> +		 * active balance to move the task
>> >> +		 */
>> >> +		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
>> >> +			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
>> >> +					sd->imbalance_pct)))
>> >> 			return 1;
>> >
>> > Why is this tied to shared caches?
>>
>> It's just to limit the policy change to a level where it can bring a
>> benefit without causing a performance regression. I'm not sure we can
>> do that at every level without risk.
>
> Similar to the other change; both details _should_ have been in the
> changelogs, etc.

I'm going to add details in v4.

> In any case, it feels rather arbitrary to me. What about machines with
> no cache sharing at all (traditional SMP systems)? This thing you're
> trying to do still seems to make sense there.

OK, I thought that traditional SMP systems had this flag set at the core
level. I mean that ARM platforms have the flag for CPUs in the same
cluster (which includes current ARM SMP systems), and the Core i7 of my
laptop has the flag at the core level.
On Fri, Jul 11, 2014 at 05:17:44PM +0200, Vincent Guittot wrote:
> > In any case, it feels rather arbitrary to me. What about machines
> > with no cache sharing at all (traditional SMP systems)? This thing
> > you're trying to do still seems to make sense there.
>
> OK, I thought that traditional SMP systems had this flag set at the
> core level.

Yeah, with one core, so it's effectively disabled.

> I mean that ARM platforms have the flag for CPUs in the same cluster
> (which includes current ARM SMP systems), and the Core i7 of my laptop
> has the flag at the core level.

So I can see 'small' parts reducing shared caches in order to improve
idle performance.

The point being that LLC seems a somewhat arbitrary measure for this.

Can we try and see what happens if you remove the limit? It's always best
to try the simpler things first and only make it more complex if we have
to.
On 14 July 2014 15:51, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jul 11, 2014 at 05:17:44PM +0200, Vincent Guittot wrote:
>> > In any case, it feels rather arbitrary to me. What about machines
>> > with no cache sharing at all (traditional SMP systems)? This thing
>> > you're trying to do still seems to make sense there.
>>
>> OK, I thought that traditional SMP systems had this flag set at the
>> core level.
>
> Yeah, with one core, so it's effectively disabled.
>
>> I mean that ARM platforms have the flag for CPUs in the same cluster
>> (which includes current ARM SMP systems), and the Core i7 of my
>> laptop has the flag at the core level.
>
> So I can see 'small' parts reducing shared caches in order to improve
> idle performance.
>
> The point being that LLC seems a somewhat arbitrary measure for this.
>
> Can we try and see what happens if you remove the limit? It's always
> best to try the simpler things first and only make it more complex if
> we have to.

OK, I will remove the condition in the next version.
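Concretely, removing that limit would reduce the new test in need_active_balance() to something like the following sketch of the agreed direction (the actual v4 patch may differ):

	/*
	 * The src_cpu's capacity is reduced because of other
	 * sched_class or IRQs: trigger an active balance to move the
	 * task, regardless of cache sharing.
	 */
	if ((capacity_orig_of(src_cpu) * 100) >
	    (capacity_of(src_cpu) * sd->imbalance_pct))
		return 1;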
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a23c938..742ad88 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5944,6 +5944,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		return true;
 	}
 
+	/*
+	 * The group capacity is reduced probably because of activity from other
+	 * sched class or interrupts which use part of the available capacity
+	 */
+	if ((sg->sgc->capacity_orig * 100) > (sgs->group_capacity *
+				env->sd->imbalance_pct))
+		return true;
+
 	return false;
 }
 
@@ -6421,13 +6429,24 @@ static int need_active_balance(struct lb_env *env)
 	struct sched_domain *sd = env->sd;
 
 	if (env->idle == CPU_NEWLY_IDLE) {
+		int src_cpu = env->src_cpu;
+
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
 		 * higher numbered CPUs in order to pack all tasks in the
 		 * lowest numbered CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) && src_cpu > env->dst_cpu)
+			return 1;
+
+		/*
+		 * If the CPUs share their cache and the src_cpu's capacity is
+		 * reduced because of other sched_class or IRQs, we trig an
+		 * active balance to move the task
+		 */
+		if ((sd->flags & SD_SHARE_PKG_RESOURCES)
+			&& ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
+					sd->imbalance_pct)))
 			return 1;
 	}
 
@@ -6529,6 +6548,8 @@ redo:
 
 	schedstat_add(sd, lb_imbalance[idle], env.imbalance);
 
+	env.src_cpu = busiest->cpu;
+
 	ld_moved = 0;
 	if (busiest->nr_running > 1) {
 		/*
@@ -6538,7 +6559,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu   = busiest->cpu;
 		env.src_rq    = busiest;
 		env.loop_max  = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
@@ -7233,9 +7253,10 @@ static inline int nohz_kick_needed(struct rq *rq)
 	struct sched_domain *sd;
 	struct sched_group_capacity *sgc;
 	int nr_busy, cpu = rq->cpu;
+	bool kick = false;
 
 	if (unlikely(rq->idle_balance))
-		return 0;
+		return false;
 
 	/*
 	 * We may be recently in ticked or tickless idle mode. At the first
@@ -7249,38 +7270,41 @@ static inline int nohz_kick_needed(struct rq *rq)
 	 * balancing.
 	 */
 	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return 0;
+		return false;
 
 	if (time_before(now, nohz.next_balance))
-		return 0;
+		return false;
 
 	if (rq->nr_running >= 2)
-		goto need_kick;
+		return true;
 
 	rcu_read_lock();
 	sd = rcu_dereference(per_cpu(sd_busy, cpu));
-
 	if (sd) {
 		sgc = sd->groups->sgc;
 		nr_busy = atomic_read(&sgc->nr_busy_cpus);
 
-		if (nr_busy > 1)
-			goto need_kick_unlock;
+		if (nr_busy > 1) {
+			kick = true;
+			goto unlock;
+		}
+
+		if ((rq->cfs.h_nr_running >= 1)
+			&& ((rq->cpu_capacity * sd->imbalance_pct) <
+				(rq->cpu_capacity_orig * 100))) {
+			kick = true;
+			goto unlock;
+		}
 	}
 
 	sd = rcu_dereference(per_cpu(sd_asym, cpu));
-
 	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
 				  sched_domain_span(sd)) < cpu))
-		goto need_kick_unlock;
+		kick = true;
 
+unlock:
 	rcu_read_unlock();
-	return 0;
-
-need_kick_unlock:
-	rcu_read_unlock();
-need_kick:
-	return 1;
+	return kick;
 }
 #else
 static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
If the CPU is used for handling a lot of IRQs, trigger a load balance to
check whether it's worth moving its tasks to another CPU that has more
available capacity.

As a side note, this will not generate more spurious ILBs, because we
already trigger an ILB if there is more than one busy CPU. If this CPU is
the only one that has a task, we will trigger the ILB once to migrate the
task.

The nohz_kick_needed() function has been cleaned up a bit while adding
the new test.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 58 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 17 deletions(-)