Message ID | 1571405198-27570-2-git-send-email-vincent.guittot@linaro.org |
---|---|
State | Accepted |
Commit | 490ba971d8b498ba3a47999ab94c6a0d1830ad41 |
Headers | show |
Series | sched/fair: rework the CFS load balance | expand |
On Fri, Oct 18, 2019 at 03:26:28PM +0200, Vincent Guittot wrote: > Clean up asym packing to follow the default load balance behavior: > - classify the group by creating a group_asym_packing field. > - calculate the imbalance in calculate_imbalance() instead of bypassing it. > > We don't need to test twice same conditions anymore to detect asym packing > and we consolidate the calculation of imbalance in calculate_imbalance(). > > There is no functional changes. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > Acked-by: Rik van Riel <riel@surriel.com> > --- > kernel/sched/fair.c | 63 ++++++++++++++--------------------------------------- > 1 file changed, 16 insertions(+), 47 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 1f0a5e1..617145c 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7675,6 +7675,7 @@ struct sg_lb_stats { > unsigned int group_weight; > enum group_type group_type; > int group_no_capacity; > + unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ > unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ > #ifdef CONFIG_NUMA_BALANCING > unsigned int nr_numa_running; > @@ -8129,9 +8130,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, > * ASYM_PACKING needs to move all the work to the highest > * prority CPUs in the group, therefore mark all groups > * of lower priority than ourself as busy. > + * > + * This is primarily intended to used at the sibling level. Some > + * cores like POWER7 prefer to use lower numbered SMT threads. In the > + * case of POWER7, it can move to lower SMT modes only when higher > + * threads are idle. When in lower SMT modes, the threads will > + * perform better since they share less core resources. Hence when we > + * have idle threads, we want them to be the higher ones. > */ > if (sgs->sum_nr_running && > sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { > + sgs->group_asym_packing = 1; > if (!sds->busiest) > return true; > (I did not read any of the earlier implementations of this series, maybe this was discussed already in which case, sorry for the noise) Are you *sure* this is not a functional change? Asym packing is a twisty maze of headaches and I'm not familiar enough with it to be 100% certain without spending a lot of time on this. Even spotting how Power7 ends up using asym packing with lower-numbered SMT threads is a bit of a challenge. Specifically, it relies on the scheduler domain SD_ASYM_PACKING flag for SMT domains to use the weak implementation of arch_asym_cpu_priority which by defaults favours the lower-numbered CPU. The check_asym_packing implementation checks that flag but I can't see the equiavlent type of check here. update_sd_pick_busiest could be called for domains that span NUMA or basically any domain that does not specify SD_ASYM_PACKING and end up favouring a lower-numbered CPU (or whatever arch_asym_cpu_priority returns in the case of x86 which has a different idea for favoured CPUs). sched_asym_prefer appears to be a function that is very easy to use incorrectly. Should it take env and check the SD flags first? -- Mel Gorman SUSE Labs
On Wed, 30 Oct 2019 at 15:51, Mel Gorman <mgorman@techsingularity.net> wrote: > > On Fri, Oct 18, 2019 at 03:26:28PM +0200, Vincent Guittot wrote: > > Clean up asym packing to follow the default load balance behavior: > > - classify the group by creating a group_asym_packing field. > > - calculate the imbalance in calculate_imbalance() instead of bypassing it. > > > > We don't need to test twice same conditions anymore to detect asym packing > > and we consolidate the calculation of imbalance in calculate_imbalance(). > > > > There is no functional changes. > > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > > Acked-by: Rik van Riel <riel@surriel.com> > > --- > > kernel/sched/fair.c | 63 ++++++++++++++--------------------------------------- > > 1 file changed, 16 insertions(+), 47 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 1f0a5e1..617145c 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -7675,6 +7675,7 @@ struct sg_lb_stats { > > unsigned int group_weight; > > enum group_type group_type; > > int group_no_capacity; > > + unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ > > unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ > > #ifdef CONFIG_NUMA_BALANCING > > unsigned int nr_numa_running; > > @@ -8129,9 +8130,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, > > * ASYM_PACKING needs to move all the work to the highest > > * prority CPUs in the group, therefore mark all groups > > * of lower priority than ourself as busy. > > + * > > + * This is primarily intended to used at the sibling level. Some > > + * cores like POWER7 prefer to use lower numbered SMT threads. In the > > + * case of POWER7, it can move to lower SMT modes only when higher > > + * threads are idle. When in lower SMT modes, the threads will > > + * perform better since they share less core resources. Hence when we > > + * have idle threads, we want them to be the higher ones. > > */ > > if (sgs->sum_nr_running && > > sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { > > + sgs->group_asym_packing = 1; > > if (!sds->busiest) > > return true; > > > > (I did not read any of the earlier implementations of this series, maybe > this was discussed already in which case, sorry for the noise) > > Are you *sure* this is not a functional change? > > Asym packing is a twisty maze of headaches and I'm not familiar enough > with it to be 100% certain without spending a lot of time on this. Even > spotting how Power7 ends up using asym packing with lower-numbered SMT > threads is a bit of a challenge. Specifically, it relies on the scheduler > domain SD_ASYM_PACKING flag for SMT domains to use the weak implementation > of arch_asym_cpu_priority which by defaults favours the lower-numbered CPU. > > The check_asym_packing implementation checks that flag but I can't see > the equiavlent type of check here. update_sd_pick_busiest could be called The checks of SD_ASYM_PACKING and CPU_NOT_IDLE are already above in the function but out of the patch. In fact this part of update_sd_pick_busiest is already dedicated to asym_packing. What I'm doing is that instead of checking asym_packing in update_sd_pick_busiest and then rechecking the same thing in find_busiest_group, I save the check result and reuse it Also patch 04 moves further this code > for domains that span NUMA or basically any domain that does not specify > SD_ASYM_PACKING and end up favouring a lower-numbered CPU (or whatever > arch_asym_cpu_priority returns in the case of x86 which has a different > idea for favoured CPUs). > > sched_asym_prefer appears to be a function that is very easy to use > incorrectly. Should it take env and check the SD flags first? > > -- > Mel Gorman > SUSE Labs
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1f0a5e1..617145c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7675,6 +7675,7 @@ struct sg_lb_stats { unsigned int group_weight; enum group_type group_type; int group_no_capacity; + unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -8129,9 +8130,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, * ASYM_PACKING needs to move all the work to the highest * prority CPUs in the group, therefore mark all groups * of lower priority than ourself as busy. + * + * This is primarily intended to used at the sibling level. Some + * cores like POWER7 prefer to use lower numbered SMT threads. In the + * case of POWER7, it can move to lower SMT modes only when higher + * threads are idle. When in lower SMT modes, the threads will + * perform better since they share less core resources. Hence when we + * have idle threads, we want them to be the higher ones. */ if (sgs->sum_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { + sgs->group_asym_packing = 1; if (!sds->busiest) return true; @@ -8273,51 +8282,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * check_asym_packing - Check to see if the group is packed into the - * sched domain. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - * - * This packing function is run on idle threads. It checks to see if - * the busiest CPU in this domain (core in the P7 case) has a higher - * CPU number than the packing function is being run on. Here we are - * assuming lower CPU number will be equivalent to lower a SMT thread - * number. - * - * Return: 1 when packing is required and a task should be moved to - * this CPU. The amount of the imbalance is returned in env->imbalance. - * - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain which is to be packed - */ -static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) -{ - int busiest_cpu; - - if (!(env->sd->flags & SD_ASYM_PACKING)) - return 0; - - if (env->idle == CPU_NOT_IDLE) - return 0; - - if (!sds->busiest) - return 0; - - busiest_cpu = sds->busiest->asym_prefer_cpu; - if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) - return 0; - - env->imbalance = sds->busiest_stat.group_load; - - return 1; -} - -/** * fix_small_imbalance - Calculate the minor imbalance that exists * amongst the groups of a sched_domain, during * load balancing. @@ -8401,6 +8365,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s local = &sds->local_stat; busiest = &sds->busiest_stat; + if (busiest->group_asym_packing) { + env->imbalance = busiest->group_load; + return; + } + if (busiest->group_type == group_imbalanced) { /* * In the group_imb case we cannot rely on group-wide averages @@ -8505,8 +8474,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest = &sds.busiest_stat; /* ASYM feature bypasses nice load balance check */ - if (check_asym_packing(env, &sds)) - return sds.busiest; + if (busiest->group_asym_packing) + goto force_balance; /* There is no busy sibling group to pull tasks from */ if (!sds.busiest || busiest->sum_nr_running == 0)