| Field | Value |
|---|---|
| Message ID | 1458606068-7476-1-git-send-email-smuckle@linaro.org |
| State | Accepted |
| Commit | 21e96f88776deead303ecd30a17d1d7c2a1776e3 |
On 03/28/2016 11:30 AM, Dietmar Eggemann wrote:
> On 03/28/2016 06:34 PM, Steve Muckle wrote:
>> Hi Dietmar,
>>
>> On 03/28/2016 05:02 AM, Dietmar Eggemann wrote:
>>> Hi Steve,
>>>
>>> these patches fall into the bucket of 'optimization of updating the
>>> value only if the root cfs_rq util has changed' as discussed in '[PATCH
>>> 5/8] sched/cpufreq: pass sched class into cpufreq_update_util' of Mike
>>> T's current series '[PATCH 0/8] schedutil enhancements', right?
>>
>> I would say just the second patch is an optimization. The first and
>> third patches cover additional paths in CFS where the hook should be
>> called but currently is not, which I think is a correctness issue.
>
> Not disagreeing here, but I don't know if this level of accuracy is
> really needed. I mean, we currently miss updates in
> enqueue_task_fair()->enqueue_entity()->enqueue_entity_load_avg() and
> idle_balance()/rebalance_domains()->update_blocked_averages(), but there
> are plenty of call sites of update_load_avg(se, ...) with
> '&rq_of(cfs_rq_of(se))->cfs == cfs_rq_of(se)'.
>
> The question for me is: does schedutil work better with this new, more
> accurate signal? IMO, not receiving a bunch of consecutive
> cpufreq_update_util() calls with the same 'util' value is probably a
> good thing, unless we see the interaction with the RT/DL classes as
> mentioned by Sai. Here an agreement on the design for the 'capacity vote
> aggregation from CFS/RT/DL' would help to clarify.

Without covering all the paths where CFS utilization changes, it's
possible to have to wait up to a tick to act on some changes, since the
tick is the only guaranteed regularly-occurring instance of the hook.
That's an unacceptable amount of latency IMO...

thanks,
Steve
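To make the worst case Steve describes concrete, here is a small user-space model (not part of the thread; the HZ value and event times are arbitrary assumptions): when the periodic tick is the only guaranteed invocation of the hook, a utilization change that lands just after a tick is not acted on for almost a full tick period.

```c
/*
 * Illustrative user-space model only (not kernel code): if the cpufreq
 * hook is only guaranteed to run from the periodic tick, a utilization
 * change waits until the next tick boundary before it can be acted on.
 * HZ and the event times below are arbitrary assumptions.
 */
#include <stdio.h>

#define HZ	250			/* assumed CONFIG_HZ */
#define TICK_US	(1000000 / HZ)		/* tick period in microseconds */

/* First tick boundary at or after time t (in microseconds). */
static unsigned long next_tick(unsigned long t)
{
	return ((t + TICK_US - 1) / TICK_US) * TICK_US;
}

int main(void)
{
	/* Times at which root cfs_rq utilization changes (made up). */
	unsigned long events[] = { 100, 4100, 7999, 12001 };

	for (unsigned int i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
		unsigned long seen = next_tick(events[i]);

		printf("util change @ %5lu us, tick acts @ %5lu us, latency %4lu us\n",
		       events[i], seen, seen - events[i]);
	}
	return 0;
}
```

With HZ=250 the worst-case deferral is just under 4 ms, which is the kind of latency the paragraph above objects to.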
On 30 March 2016 at 21:35, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 28, 2016 at 12:38:26PM -0700, Steve Muckle wrote:
>> Without covering all the paths where CFS utilization changes it's
>> possible to have to wait up to a tick to act on some changes, since the
>> tick is the only guaranteed regularly-occurring instance of the hook.
>> That's an unacceptable amount of latency IMO...
>
> Note that even with your patches that might still be the case. Remote
> wakeups might not happen on the destination CPU at all, so it might not
> be until the next tick (which always happens locally) that we'll
> 'observe' the utilization change brought with the wakeups.
>
> We could force all the remote wakeups to IPI the destination CPU, but
> that comes at a significant performance cost.

Isn't a reschedule IPI already sent in this case?
On 31 March 2016 at 14:34, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Mar 31, 2016 at 02:14:50PM +0200, Vincent Guittot wrote:
>> In fact, I looked for the sequence where the utilization of a rq is not
>> updated until the next tick, but I can't find it.
>
> No, util is always updated, however..
>
>> If the CPUs don't share cache, the task is added to the wake list, an
>> IPI is sent, and the utilization is updated.
>
> Here we run:
>
>   ttwu_do_activate()
>     ttwu_activate()
>       activate_task()
>         enqueue_task()
>           p->sched_class->enqueue_task() := enqueue_task_fair()
>             update_load_avg()
>               update_cfs_rq_load_avg()
>                 cfs_rq_util_change()
>
> on the local CPU, and we can indeed call out to have the frequency
> changed.
>
>> Otherwise, we directly enqueue the task on the rq and the utilization
>> is updated.
>
> But here we run it on a remote cpu, so we cannot call out and the
> frequency remains the same.
>
> So if a remote wakeup on the same LLC domain happens, utilization will
> increase but we will not observe it until the next tick.

Ok, I forgot that we have the condition cpu == smp_processor_id() in
cfs_rq_util_change().
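A minimal user-space sketch of the guard Vincent refers to (illustrative only; the freq_request() helper and the CPU numbers are invented here, not kernel API): the callback is issued only when the update runs on the CPU that owns the root cfs_rq, so a same-LLC remote wakeup updates utilization but leaves the frequency request to that CPU's next tick.

```c
/*
 * Illustrative user-space sketch only (not kernel code). It models the
 * guard 'cpu == smp_processor_id() && &rq->cfs == cfs_rq' from the patch:
 * only an update running on the CPU that owns the root cfs_rq may call
 * out to cpufreq. freq_request() and the CPU numbers are invented here.
 */
#include <stdio.h>

static int current_cpu;	/* CPU the enqueue/update code happens to run on */

static void freq_request(int cpu, unsigned long util)
{
	printf("cpu%d: cpufreq callback, util=%lu\n", cpu, util);
}

/* Models the hook site: call out only for a local root-cfs_rq update. */
static void util_change(int rq_cpu, unsigned long util)
{
	if (rq_cpu == current_cpu)
		freq_request(rq_cpu, util);	/* local wakeup: acted on now */
	else
		printf("cpu%d: util=%lu written by cpu%d, deferred to cpu%d's next tick\n",
		       rq_cpu, util, current_cpu, rq_cpu);
}

int main(void)
{
	current_cpu = 0;
	util_change(0, 300);	/* local wakeup */
	util_change(1, 700);	/* same-LLC remote wakeup, enqueued directly by cpu0 */
	return 0;
}
```

Forcing an IPI so the enqueue always runs on the destination CPU would close that window, which is the performance trade-off Peter points out above.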
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 46d64e4ccfde..d418deb04049 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,7 +2825,9 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
 	struct sched_avg *sa = &cfs_rq->avg;
+	struct rq *rq = rq_of(cfs_rq);
 	int decayed, removed = 0;
+	int cpu = cpu_of(rq);
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2840,7 +2842,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
 	}
 
-	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
+	decayed = __update_load_avg(now, cpu, sa,
 		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
 
 #ifndef CONFIG_64BIT
@@ -2848,28 +2850,6 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-	return decayed || removed;
-}
-
-/* Update task and its cfs_rq load average */
-static inline void update_load_avg(struct sched_entity *se, int update_tg)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	u64 now = cfs_rq_clock_task(cfs_rq);
-	struct rq *rq = rq_of(cfs_rq);
-	int cpu = cpu_of(rq);
-
-	/*
-	 * Track task load average for carrying it to new CPU after migrated, and
-	 * track group sched_entity load average for task_h_load calc in migration
-	 */
-	__update_load_avg(now, cpu, &se->avg,
-			  se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
-
-	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
-		update_tg_load_avg(cfs_rq, 0);
-
 	if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
 		unsigned long max = rq->cpu_capacity_orig;
 
@@ -2890,8 +2870,30 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 		 * See cpu_util().
 		 */
 		cpufreq_update_util(rq_clock(rq),
-				    min(cfs_rq->avg.util_avg, max), max);
+				    min(sa->util_avg, max), max);
 	}
+
+	return decayed || removed;
+}
+
+/* Update task and its cfs_rq load average */
+static inline void update_load_avg(struct sched_entity *se, int update_tg)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 now = cfs_rq_clock_task(cfs_rq);
+	struct rq *rq = rq_of(cfs_rq);
+	int cpu = cpu_of(rq);
+
+	/*
+	 * Track task load average for carrying it to new CPU after migrated, and
+	 * track group sched_entity load average for task_h_load calc in migration
+	 */
+	__update_load_avg(now, cpu, &se->avg,
+			  se->on_rq * scale_load_down(se->load.weight),
+			  cfs_rq->curr == se, NULL);
+
+	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
+		update_tg_load_avg(cfs_rq, 0);
 }
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
The cpufreq hook should be called whenever the root cfs_rq utilization
changes so update_cfs_rq_load_avg() is a better place for it. The current
location is not invoked in the enqueue_entity() or
update_blocked_averages() paths.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/fair.c | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

-- 
2.4.10
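As a closing illustration (not part of the patch or the thread; the *_sim() names are made up), a toy model of why the move widens coverage: enqueue_entity() and update_blocked_averages() reach update_cfs_rq_load_avg() without going through update_load_avg(), so hooking the shared helper catches all three paths.

```c
/*
 * Illustrative user-space sketch only (not kernel code); the *_sim()
 * functions are made-up stand-ins for the kernel paths named in the
 * commit message. It shows why hooking the shared helper widens
 * coverage: every path that updates the root cfs_rq reaches the hook.
 */
#include <stdio.h>

static void cpufreq_hook(const char *path)
{
	printf("cpufreq hook reached via %s\n", path);
}

/* New placement: the shared helper invokes the hook for every caller. */
static void update_cfs_rq_load_avg_sim(const char *path)
{
	/* ...decay/removed-load accounting would live here... */
	cpufreq_hook(path);
}

static void update_load_avg_sim(void)
{
	update_cfs_rq_load_avg_sim("update_load_avg()");
}

static void enqueue_entity_sim(void)
{
	update_cfs_rq_load_avg_sim("enqueue_entity()");
}

static void update_blocked_averages_sim(void)
{
	update_cfs_rq_load_avg_sim("update_blocked_averages()");
}

int main(void)
{
	update_load_avg_sim();		/* covered before and after the patch */
	enqueue_entity_sim();		/* covered only after the patch */
	update_blocked_averages_sim();	/* covered only after the patch */
	return 0;
}
```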