Message ID: 20161018115651.GA20956@linaro.org
State:      New
On 10/18/2016 04:56 AM, Vincent Guittot wrote:
> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> So aside from funny BIOSes, this should also show up when creating
>>>> cgroups when you have offlined a few CPUs, which is far more common I'd
>>>> think.
>>>
>>> The problem is also that the load of the tg->se[cpu] that represents
>>> the tg->cfs_rq[cpu] is initialized to 1024 in:
>>> alloc_fair_sched_group
>>>      for_each_possible_cpu(i) {
>>>          init_entity_runnable_average(se);
>>>              sa->load_avg = scale_load_down(se->load.weight);
>>>
>>> Initializing sa->load_avg to 1024 for a newly created task makes
>>> sense, as we don't know yet what its real load will be, but I'm not sure
>>> that we have to do the same for an se that represents a task group. This
>>> load should be initialized to 0, and it will increase when tasks are
>>> moved/attached into the task group.
>>
>> Yes, I think that makes sense, not sure how horrible that is with the
>
> That should not be that bad, because this initial value is only useful for
> the few dozen ms that follow the creation of the task group.
>
>> current state of things, but after your propagate patch, which
>> reinstates the interactivity hack, it should work for sure.
>
> The patch below fixes the issue on my platform:
>
> Dietmar, Omer, can you confirm that this fixes the problem on your platforms too?

I just noticed this thread after posting https://lkml.org/lkml/2016/10/18/719...
I noticed this bug a while ago and have had the patch above for at least a
week, but unfortunately didn't have time to post it.
I think Omer had the same problem I was trying to fix, and I believe the
patch I posted should address it.

Vincent, your version fixes my test case as well.

This is sched_stat from the same test case I had in my changelog.
Note that dd-2030, which is in the root cgroup, had the same runtime as
dd-2033, which is in a child cgroup.

dd (2030, #threads: 1)
-------------------------------------------------------------------
se.exec_start                :       275700.024137
se.vruntime                  :        10589.114654
se.sum_exec_runtime          :         1576.837993
se.nr_migrations             :                   0
nr_switches                  :                 159
nr_voluntary_switches        :                   0
nr_involuntary_switches      :                 159
se.load.weight               :             1048576
se.avg.load_sum              :            48840575
se.avg.util_sum              :            19741820
se.avg.load_avg              :                1022
se.avg.util_avg              :                 413
se.avg.last_update_time      :        275700024137
policy                       :                   0
prio                         :                 120
clock-delta                  :                  34

dd (2033, #threads: 1)
-------------------------------------------------------------------
se.exec_start                :       275710.037178
se.vruntime                  :         2383.802868
se.sum_exec_runtime          :         1576.547591
se.nr_migrations             :                   0
nr_switches                  :                 162
nr_voluntary_switches        :                   0
nr_involuntary_switches      :                 162
se.load.weight               :             1048576
se.avg.load_sum              :            48316646
se.avg.util_sum              :            21235249
se.avg.load_avg              :                1011
se.avg.util_avg              :                 444
se.avg.last_update_time      :        275710037178
policy                       :                   0
prio                         :                 120
clock-delta                  :                  36

Thanks,
Joonwoo

> ---
>  kernel/sched/fair.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8b03fb5..89776ac 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	 * will definitely be update (after enqueue).
>  	 */
>  	sa->period_contrib = 1023;
> -	sa->load_avg = scale_load_down(se->load.weight);
> +	/*
> +	 * Tasks are initialized with full load to be seen as heavy tasks until
> +	 * they get a chance to stabilize to their real load level.
> +	 * Group entities are initialized with zero load to reflect the fact that
> +	 * nothing has been attached yet to the task group.
> +	 */
> +	if (entity_is_task(se))
> +		sa->load_avg = scale_load_down(se->load.weight);
>  	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  	/*
>  	 * At this point, util_avg won't be used in select_task_rq_fair anyway
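The failure mode Vincent describes above can be illustrated with a small
standalone model. This is an editorial sketch, not kernel code: the CPU counts,
the fake per-CPU loop and the printed numbers are simplified assumptions; only
the idea of a per-possible-CPU initial group load of 1024 comes from the thread.

/*
 * Userspace model of why initializing every possible CPU's group entity
 * to 1024 inflates tg_load_avg when some CPUs never come online.
 */
#include <stdio.h>

#define NR_POSSIBLE_CPUS 8
#define NR_ONLINE_CPUS   4      /* e.g. a broken BIOS or hot-unplugged CPUs */
#define NICE_0_LOAD      1024

int main(void)
{
    long tg_load_avg = 0;
    int cpu;

    /* alloc_fair_sched_group(): a group se is set up for every possible CPU */
    for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
        tg_load_avg += NICE_0_LOAD;     /* old init: load_avg = 1024 */

    /*
     * Only the online CPUs ever run and decay their group entities, so the
     * initial 1024 of each offline CPU never goes away.
     */
    printf("tg_load_avg right after group creation: %ld\n", tg_load_avg);
    printf("ghost contribution from offline cpus  : %ld\n",
           (long)(NR_POSSIBLE_CPUS - NR_ONLINE_CPUS) * NICE_0_LOAD);
    return 0;
}

With 8 possible but only 4 online CPUs, the 4 offline CPUs keep contributing
4 * 1024 = 4096 of load that never decays, which is exactly the kind of ghost
contribution discussed in this thread.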
On 18 October 2016 at 23:58, Joonwoo Park <joonwoop@codeaurora.org> wrote:
> On 10/18/2016 04:56 AM, Vincent Guittot wrote:
>> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :

[...]

>> The patch below fixes the issue on my platform:
>>
>> Dietmar, Omer, can you confirm that this fixes the problem on your
>> platforms too?
>
> I just noticed this thread after posting
> https://lkml.org/lkml/2016/10/18/719...
> I noticed this bug a while ago and have had the patch above for at least a
> week, but unfortunately didn't have time to post it.
> I think Omer had the same problem I was trying to fix, and I believe the
> patch I posted should address it.
>
> Vincent, your version fixes my test case as well.

Thanks for testing. Can I consider this as a Tested-by?

> This is sched_stat from the same test case I had in my changelog.
> Note that dd-2030, which is in the root cgroup, had the same runtime as
> dd-2033, which is in a child cgroup.

[...]
On 18/10/16 12:56, Vincent Guittot wrote:
> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:

[...]

> The patch below fixes the issue on my platform:
>
> Dietmar, Omer, can you confirm that this fixes the problem on your platforms too?

It fixes this broken-BIOS issue on my T430 (cpu_possible_mask >
cpu_online_mask). I ran the original test with the cpu hogs (stress -c 4).
Launch time of applications becomes normal again.

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

But this test only makes sure that we don't see any ghost contribution
(from non-existing cpus) any more.

We should study the tg->se[i]->avg.load_avg for the hierarchy of tg's
(with the highest tg having a task enqueued) a little bit more, with and
without your v5 'sched: reflect sched_entity move into task_group's load'.

[...]
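Dietmar's T430 case hinges on cpu_possible_mask being larger than
cpu_online_mask. A quick way to check whether a machine is in that situation
is to compare the two masks the kernel exports through sysfs; the snippet
below is an illustrative helper written for this write-up, not something from
the thread.

/* Compare the kernel's possible and online CPU masks via sysfs. */
#include <stdio.h>
#include <string.h>

static void read_mask(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    buf[0] = '\0';
    if (f && fgets(buf, len, f))
        buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
    if (f)
        fclose(f);
}

int main(void)
{
    char possible[256], online[256];

    read_mask("/sys/devices/system/cpu/possible", possible, sizeof(possible));
    read_mask("/sys/devices/system/cpu/online", online, sizeof(online));

    printf("possible: %s\nonline:   %s\n", possible, online);
    if (strcmp(possible, online))
        printf("masks differ: group entities exist for CPUs that never run\n");
    return 0;
}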
On 19 October 2016 at 11:46, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 18/10/16 12:56, Vincent Guittot wrote:
>> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:

[...]

> It fixes this broken-BIOS issue on my T430 (cpu_possible_mask >
> cpu_online_mask). I ran the original test with the cpu hogs (stress -c 4).
> Launch time of applications becomes normal again.
>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Thanks

> But this test only makes sure that we don't see any ghost contribution
> (from non-existing cpus) any more.
>
> We should study the tg->se[i]->avg.load_avg for the hierarchy of tg's
> (with the highest tg having a task enqueued) a little bit more, with and
> without your v5 'sched: reflect sched_entity move into task_group's load'.

Can you elaborate?

Vincent

[...]
On Tue, Oct 18, 2016 at 01:56:51PM +0200, Vincent Guittot wrote:

[...]

Vince, could you post a proper version of this patch with changelog and
tags so that we can get it merged into Linus' tree and stable for 4.8?
On 19 October 2016 at 13:33, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Oct 18, 2016 at 01:56:51PM +0200, Vincent Guittot wrote:

[...]

> Vince, could you post a proper version of this patch with changelog and
> tags so that we can get it merged into Linus' tree and stable for 4.8?

Yes, I just have to finish the changelog and then I'll send it.
On 10/18/2016 07:56 AM, Vincent Guittot wrote:
> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:

[...]

> The patch below fixes the issue on my platform:
>
> Dietmar, Omer, can you confirm that this fixes the problem on your platforms too?

[...]

Omer also reports that this patch fixes the bug for him as well. Thanks for
the great work, Vincent!
On 19 October 2016 at 16:49, Joseph Salisbury
<joseph.salisbury@canonical.com> wrote:
> On 10/18/2016 07:56 AM, Vincent Guittot wrote:

[...]

> Omer also reports that this patch fixes the bug for him as well. Thanks
> for the great work, Vincent!

Thanks
On 19/10/16 12:25, Vincent Guittot wrote:
> On 19 October 2016 at 11:46, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 18/10/16 12:56, Vincent Guittot wrote:
>>> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>>>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:

[...]

>> But this test only makes sure that we don't see any ghost contribution
>> (from non-existing cpus) any more.
>>
>> We should study the tg->se[i]->avg.load_avg for the hierarchy of tg's
>> (with the highest tg having a task enqueued) a little bit more, with and
>> without your v5 'sched: reflect sched_entity move into task_group's load'.
>
> Can you elaborate?

I'll try :-)

I thought I would see some different behaviour because of the fact that
the tg se's are initialized differently [1024 versus 0].

But I can't spot any difference. The test case is running a sysbench
thread affine to cpu1 in tg_root/tg_1/tg_11/tg_111 on tip/sched/core on
an ARM64 Juno (6 logical cpus).

The moment the sysbench task is put into tg_111,
tg_111->se[1]->avg.load_avg gets updated to 0 anyway because of the
huge time difference between creating this tg and attaching a task to
it. So the tg->se[2]->avg.load_avg signals for tg_111, tg_11 and tg_1
look exactly the same w/o and w/ your patch.

But your patch helps in this (very synthetic) test case as well. W/o
your patch I see remaining tg->load_avg for tg_1 and tg_11 after the
test case has finished, because the tg's were used exclusively on cpu1.

# cat /proc/sched_debug

cfs_rq[1]:/tg_1
  .tg_load_avg_contrib   : 0
  .tg_load_avg           : 5120 (5 (unused cpus) * 1024 * 1)
cfs_rq[1]:/tg_1/tg_11/tg_111
  .tg_load_avg_contrib   : 0
  .tg_load_avg           : 0
cfs_rq[1]:/tg_1/tg_11
  .tg_load_avg_contrib   : 0
  .tg_load_avg           : 5120

With your patch applied, all the .tg_load_avg are 0.
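The 5120 in the sched_debug output above follows directly from the old
initialization. The tiny sketch below is an editorial illustration that just
spells out the arithmetic, with the CPU counts taken from Dietmar's Juno setup;
it is not kernel code.

/* Illustration only: why tg_load_avg stays at 5120 for tg_1 and tg_11. */
#include <stdio.h>

int main(void)
{
    int possible_cpus = 6;        /* ARM64 Juno: 6 logical CPUs */
    int used_cpus = 1;            /* the test only ever ran on cpu1 */
    int initial_group_load = 1024;

    /*
     * Old behaviour: every per-CPU group entity starts at 1024.  Only the
     * CPU that actually runs tasks in the group ever updates (and decays)
     * its contribution, so the other five keep their initial value.
     */
    printf("leftover tg_load_avg = %d\n",
           (possible_cpus - used_cpus) * initial_group_load);   /* 5120 */
    return 0;
}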
On Wed, Oct 19, 2016 at 04:33:03PM +0100, Dietmar Eggemann wrote:
> On 19/10/16 12:25, Vincent Guittot wrote:
>> On 19 October 2016 at 11:46, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 18/10/16 12:56, Vincent Guittot wrote:

[...]

>>> But this test only makes sure that we don't see any ghost contribution
>>> (from non-existing cpus) any more.
>>>
>>> We should study the tg->se[i]->avg.load_avg for the hierarchy of tg's
>>> (with the highest tg having a task enqueued) a little bit more, with and
>>> without your v5 'sched: reflect sched_entity move into task_group's load'.
>>
>> Can you elaborate?
>
> I'll try :-)
>
> I thought I would see some different behaviour because of the fact that
> the tg se's are initialized differently [1024 versus 0].

This is the exact thing I was also worried about, and that's the reason I
tried to fix this in a different way. However, I didn't find any behaviour
difference once a task attached to the child cfs_rq, which is the point we
really care about.

I found this bug while making the patch at https://lkml.org/lkml/2016/10/18/841
which will fail with a wrong task_group load_avg. I tested Vincent's patch and
the above together and confirmed it's still good, though I know Ingo already
sent out the pull request. Anyway:

Tested-by: Joonwoo Park <joonwoop@codeaurora.org>

Thanks,
Joonwoo

> But I can't spot any difference. The test case is running a sysbench
> thread affine to cpu1 in tg_root/tg_1/tg_11/tg_111 on tip/sched/core on
> an ARM64 Juno (6 logical cpus).

[...]
On 19 October 2016 at 17:33, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 19/10/16 12:25, Vincent Guittot wrote:
>> On 19 October 2016 at 11:46, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 18/10/16 12:56, Vincent Guittot wrote:

[...]

> I'll try :-)
>
> I thought I would see some different behaviour because of the fact that
> the tg se's are initialized differently [1024 versus 0].

This difference should be noticeable (if it is noticeable at all) only during
the few hundred ms after the creation of the task group, until the load_avg
has reached its real value.

> But I can't spot any difference. The test case is running a sysbench
> thread affine to cpu1 in tg_root/tg_1/tg_11/tg_111 on tip/sched/core on
> an ARM64 Juno (6 logical cpus).

[...]

> With your patch applied, all the .tg_load_avg are 0.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8b03fb5..89776ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity *se)
 	 * will definitely be update (after enqueue).
 	 */
 	sa->period_contrib = 1023;
-	sa->load_avg = scale_load_down(se->load.weight);
+	/*
+	 * Tasks are initialized with full load to be seen as heavy tasks until
+	 * they get a chance to stabilize to their real load level.
+	 * Group entities are initialized with zero load to reflect the fact that
+	 * nothing has been attached yet to the task group.
+	 */
+	if (entity_is_task(se))
+		sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 	/*
 	 * At this point, util_avg won't be used in select_task_rq_fair anyway
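For completeness, here is a standalone, simplified model of what the guarded
assignment changes. It is an editorial sketch, not the kernel implementation:
scale_load_down() is treated as an identity, the structs are stripped down,
and the LOAD_AVG_MAX value is hard-coded only so the example compiles.

#include <stdio.h>
#include <stdbool.h>

#define LOAD_AVG_MAX 47742   /* stand-in for the kernel constant; exact value not important here */
#define NICE_0_LOAD  1024

struct sched_avg { unsigned long load_avg, load_sum; };
struct sched_entity { unsigned long weight; bool is_task; struct sched_avg avg; };

/* Simplified model of init_entity_runnable_average() after the fix. */
static void init_entity_runnable_average(struct sched_entity *se)
{
    struct sched_avg *sa = &se->avg;

    /* Tasks start "heavy"; group entities start empty until something attaches. */
    if (se->is_task)
        sa->load_avg = se->weight;      /* scale_load_down() omitted */
    sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
}

int main(void)
{
    struct sched_entity task  = { .weight = NICE_0_LOAD, .is_task = true  };
    struct sched_entity group = { .weight = NICE_0_LOAD, .is_task = false };

    init_entity_runnable_average(&task);
    init_entity_runnable_average(&group);

    printf("task  se: load_avg=%lu\n", task.avg.load_avg);   /* 1024 */
    printf("group se: load_avg=%lu\n", group.avg.load_avg);  /* 0 */
    return 0;
}

A newly created task is still treated as a full-weight load until PELT settles,
while a freshly created task group contributes nothing until a task is actually
attached to it, which is the behaviour the thread converged on.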