Message ID | 1498831118-22672-1-git-send-email-vincent.guittot@linaro.org |
---|---|
State | New |
Headers | show |
On 30 June 2017 at 15:58, Vincent Guittot <vincent.guittot@linaro.org> wrote: > The running state is a subset of runnable state which means that running > can't be set if runnable (weight) is cleared. There are corner cases > where the current sched_entity has been already dequeued but cfs_rq->curr > has not been updated yet and still points to the dequeued sched_entity. > If ___update_load_avg is called at that time, weight will be 0 and running > will be set which is not possible. > > This case happens during pick_next_task_fair() when a cfs_rq becomes idles. > The current sched_entity has been dequeued so se->on_rq is cleared and > cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be > cleared when picking the idle thread). Because the cfs_rq becomes idle, > idle_balance() is called and ends up to call update_blocked_averages() > with these wrong running and runnable states. > > Add a test in ___update_load_avg to correct the running state in this case.* > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> The v1 is just wrong. I have sent a v2 with correct patch sorry for the disturb Vincent > --- > kernel/sched/fair.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 008c514..5fdcb42 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa, > sa->last_update_time += delta << 10; > > /* > + * running is a subset of runnable (weight) so running can't be set if > + * runnable is clear. But there are some corner cases where the current > + * se has been already dequeued but cfs_rq->curr still points to it. > + * This means that weight will be 0 but not running for a sched_entity > + * but also for a cfs_rq if the latter becomes idle. As an example, > + * this happens during idle_balance() which calls > + * update_blocked_averages() > + */ > + if (weight) > + running = 1; > + > + /* > + * Scale time to reflect the amount a computation effectively done > + * during the time slot at current capacity > + */ > + delta = scale_time(delta, cpu, sa, weight, running); > + > + /* > * Now we know we crossed measurement unit boundaries. The *_avg > * accrues by two steps: > * > -- > 2.7.4 >
Hi Vincent, [auto build test ERROR on tip/sched/core] [also build test ERROR on v4.12-rc7 next-20170630] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Vincent-Guittot/sched-pelt-fix-false-running-accounting-in-PELT/20170702-002907 config: i386-randconfig-x078-07012120 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): kernel/sched/fair.c: In function '___update_load_avg': >> kernel/sched/fair.c:2986:10: error: implicit declaration of function 'scale_time' [-Werror=implicit-function-declaration] delta = scale_time(delta, cpu, sa, weight, running); ^~~~~~~~~~ cc1: some warnings being treated as errors vim +/scale_time +2986 kernel/sched/fair.c 2980 running = 1; 2981 2982 /* 2983 * Scale time to reflect the amount a computation effectively done 2984 * during the time slot at current capacity 2985 */ > 2986 delta = scale_time(delta, cpu, sa, weight, running); 2987 2988 /* 2989 * Now we know we crossed measurement unit boundaries. The *_avg --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 008c514..5fdcb42 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa, sa->last_update_time += delta << 10; /* + * running is a subset of runnable (weight) so running can't be set if + * runnable is clear. But there are some corner cases where the current + * se has been already dequeued but cfs_rq->curr still points to it. + * This means that weight will be 0 but not running for a sched_entity + * but also for a cfs_rq if the latter becomes idle. As an example, + * this happens during idle_balance() which calls + * update_blocked_averages() + */ + if (weight) + running = 1; + + /* + * Scale time to reflect the amount a computation effectively done + * during the time slot at current capacity + */ + delta = scale_time(delta, cpu, sa, weight, running); + + /* * Now we know we crossed measurement unit boundaries. The *_avg * accrues by two steps: *
The running state is a subset of runnable state which means that running can't be set if runnable (weight) is cleared. There are corner cases where the current sched_entity has been already dequeued but cfs_rq->curr has not been updated yet and still points to the dequeued sched_entity. If ___update_load_avg is called at that time, weight will be 0 and running will be set which is not possible. This case happens during pick_next_task_fair() when a cfs_rq becomes idles. The current sched_entity has been dequeued so se->on_rq is cleared and cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be cleared when picking the idle thread). Because the cfs_rq becomes idle, idle_balance() is called and ends up to call update_blocked_averages() with these wrong running and runnable states. Add a test in ___update_load_avg to correct the running state in this case. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) -- 2.7.4