Message ID: 1410826030-26301-1-git-send-email-morten.rasmussen@arm.com
Morten Rasmussen <morten.rasmussen@arm.com> writes:

> Task group load-contributions are not updated when tasks belonging to
> task groups are migrated by active load-balancing. If no other task
> belonging to the same task group is already queued at the destination
> cpu the group sched_entity will be enqueued with load_avg_contrib=0.
> Hence, weighted_cpuload() won't reflect the newly added load.
>
> The load may remain invisible until the next tick, when the sched_entity
> load_avg_contrib and task group contributions are reevaluated.
>
> The enqueue loop
>
> for_each_entity(se) {
> 	enqueue_entity(cfs_rq,se)
> 		...
> 		enqueue_entity_load_avg(cfs_rq,se)
> 			...
> 			update_entity_load_avg(se)
> 				...
> 				__update_entity_load_avg_contrib(se)
> 				...
> 			...
> 			update_cfs_rq_blocked_load(cfs_rq)
> 				...
> 				__update_cfs_rq_tg_load_contrib(cfs_rq)
> 				...
> }
>
> currently skips __update_entity_load_avg_contrib() and
> __update_cfs_rq_tg_load_contrib() for group entities for active
> load-balance migrations. The former updates the sched_entity
> load_avg_contrib, and the latter updates the task group contribution
> which is needed by the former. They must both be called to ensure that
> load doesn't temporarily disappear.
>
> cc: Paul Turner <pjt@google.com>
> cc: Ben Segall <bsegall@google.com>
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  kernel/sched/fair.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index be9e97b..2b6e2eb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2521,7 +2521,8 @@ static inline void update_entity_load_avg(struct sched_entity *se,
>  	else
>  		now = cfs_rq_clock_task(group_cfs_rq(se));
>
> -	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
> +	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq) &&
> +	    entity_is_task(se))
>  		return;
>
>  	contrib_delta = __update_entity_load_avg_contrib(se);
> @@ -2609,6 +2610,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>  		cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
>  		/* we force update consideration on load-balancer moves */
>  		update_cfs_rq_blocked_load(cfs_rq, !wakeup);
> +
> +		/* We force update group contributions on load-balancer moves */
> +		if (wakeup && !entity_is_task(se))
> +			__update_cfs_rq_tg_load_contrib(cfs_rq, 0);
>  	}
>
>  /*

It should probably be clearer that what this actually means is that we
always update tg_load_contrib for any group-of-group (which is weird),
so that we update the entire tree on load-balancer moves (which is
sensible and what we care about, because it determines each
se.load_avg_contrib).

Wouldn't we need this on dequeue as well?

I believe the reason for both of these ratelimits was performance on
sleep/wake (where blocked_load_avg means tg_load_contrib should not have
changed), it would be good to know how much this hurts. Given the
fast-exit in __update_cfs_rq_tg_load_contrib, it seems like it might be
fine, but the group-entity __update_entity_load_avg_contrib is less free
when div_u64 is expensive, and is currently only called 1/ms.

It seems like it would be a bit of a mess to force an update of the
entire path in migrate, but it would certainly dodge the performance
concerns if they wind up being an issue. (I suppose we could have
ENQUEUE/DEQUEUE_MIGRATE or something...)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
On Tue, Sep 16, 2014 at 06:49:14PM +0100, bsegall@google.com wrote:
> Morten Rasmussen <morten.rasmussen@arm.com> writes:
>
> > Task group load-contributions are not updated when tasks belonging to
> > task groups are migrated by active load-balancing. If no other task
> > belonging to the same task group is already queued at the destination
> > cpu the group sched_entity will be enqueued with load_avg_contrib=0.
> > Hence, weighted_cpuload() won't reflect the newly added load.
> >
> > [...]
>
> It should probably be clearer that what this actually means is that we
> always update tg_load_contrib for any group-of-group (which is weird),

Would it not only update groups entities if they aren't already on a
runqueue? I'm clearly missing something.

> so that we update the entire tree on load-balancer moves (which is
> sensible and what we care about, because it determines each
> se.load_avg_contrib).
>
> Wouldn't we need this on dequeue as well?

Yes. The problem is there too I think. It is not as visible though. If
the last task of a particular group is migrated away from a cpu, the
group entities will be dequeued and we should be fine. However, if there
are other tasks left in the group or any parent group the load of that
group will be off, I think. I need to verify that.

> I believe the reason for both of these ratelimits was performance on
> sleep/wake (where blocked_load_avg means tg_load_contrib should not have
> changed), it would be good to know how much this hurts. Given the
> fast-exit in __update_cfs_rq_tg_load_contrib, it seems like it might be
> fine, but the group-entity __update_entity_load_avg_contrib is less free
> when div_u64 is expensive, and is currently only called 1/ms.

Right. Is there a suitable group scheduling benchmark I can use to test
this?

> It seems like it would be a bit of a mess to force an update of the
> entire path in migrate, but it would certainly dodge the performance
> concerns if they wind up being an issue. (I suppose we could have
> ENQUEUE/DEQUEUE_MIGRATE or something...)

Yes, I had the same thought. We might have to do something like that
anyway to fix the dequeue path. It seems less straightforward to fix,
but I need to take a closer look.

Morten
Morten Rasmussen <morten.rasmussen@arm.com> writes:

> On Tue, Sep 16, 2014 at 06:49:14PM +0100, bsegall@google.com wrote:
>> Morten Rasmussen <morten.rasmussen@arm.com> writes:
>>
>> > Task group load-contributions are not updated when tasks belonging to
>> > task groups are migrated by active load-balancing. If no other task
>> > belonging to the same task group is already queued at the destination
>> > cpu the group sched_entity will be enqueued with load_avg_contrib=0.
>> > Hence, weighted_cpuload() won't reflect the newly added load.
>> >
>> > [...]
>>
>> It should probably be clearer that what this actually means is that we
>> always update tg_load_contrib for any group-of-group (which is weird),
>
> Would it not only update groups entities if they aren't already on a
> runqueue? I'm clearly missing something.

No, that's correct, but it will do it on normal sleep/wake too, not just
load-balancer moves.

>> so that we update the entire tree on load-balancer moves (which is
>> sensible and what we care about, because it determines each
>> se.load_avg_contrib).
>>
>> Wouldn't we need this on dequeue as well?
>
> Yes. The problem is there too I think. It is not as visible though. If
> the last task of a particular group is migrated away from a cpu, the
> group entities will be dequeued and we should be fine. However, if there
> are other tasks left in the group or any parent group the load of that
> group will be off, I think. I need to verify that.

Yeah, the case where it comes up would be different, and possibly less
bad.

>> I believe the reason for both of these ratelimits was performance on
>> sleep/wake (where blocked_load_avg means tg_load_contrib should not have
>> changed), it would be good to know how much this hurts. Given the
>> fast-exit in __update_cfs_rq_tg_load_contrib, it seems like it might be
>> fine, but the group-entity __update_entity_load_avg_contrib is less free
>> when div_u64 is expensive, and is currently only called 1/ms.
>
> Right. Is there a suitable group scheduling benchmark I can use to test
> this?

The most severe would probably be pipe test where you moved the threads
to be in separate cgroups. A more realistic one would probably just be
looping around nanosleep(100ns) on a bunch of cpus or something like
that.

>> It seems like it would be a bit of a mess to force an update of the
>> entire path in migrate, but it would certainly dodge the performance
>> concerns if they wind up being an issue. (I suppose we could have
>> ENQUEUE/DEQUEUE_MIGRATE or something...)
>
> Yes, I had the same thought. We might have to do something like that
> anyway to fix the dequeue path. It seems less straightforward to fix,
> but I need to take a closer look.
>
> Morten
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index be9e97b..2b6e2eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2521,7 +2521,8 @@ static inline void update_entity_load_avg(struct sched_entity *se,
 	else
 		now = cfs_rq_clock_task(group_cfs_rq(se));
 
-	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
+	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq) &&
+	    entity_is_task(se))
 		return;
 
 	contrib_delta = __update_entity_load_avg_contrib(se);
@@ -2609,6 +2610,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 		cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
 		/* we force update consideration on load-balancer moves */
 		update_cfs_rq_blocked_load(cfs_rq, !wakeup);
+
+		/* We force update group contributions on load-balancer moves */
+		if (wakeup && !entity_is_task(se))
+			__update_cfs_rq_tg_load_contrib(cfs_rq, 0);
 	}
 
 /*
Task group load-contributions are not updated when tasks belonging to
task groups are migrated by active load-balancing. If no other task
belonging to the same task group is already queued at the destination
cpu the group sched_entity will be enqueued with load_avg_contrib=0.
Hence, weighted_cpuload() won't reflect the newly added load.

The load may remain invisible until the next tick, when the sched_entity
load_avg_contrib and task group contributions are reevaluated.

The enqueue loop

for_each_entity(se) {
	enqueue_entity(cfs_rq,se)
		...
		enqueue_entity_load_avg(cfs_rq,se)
			...
			update_entity_load_avg(se)
				...
				__update_entity_load_avg_contrib(se)
				...
			...
			update_cfs_rq_blocked_load(cfs_rq)
				...
				__update_cfs_rq_tg_load_contrib(cfs_rq)
				...
}

currently skips __update_entity_load_avg_contrib() and
__update_cfs_rq_tg_load_contrib() for group entities for active
load-balance migrations. The former updates the sched_entity
load_avg_contrib, and the latter updates the task group contribution
which is needed by the former. They must both be called to ensure that
load doesn't temporarily disappear.

cc: Paul Turner <pjt@google.com>
cc: Ben Segall <bsegall@google.com>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)