
[4/4] sched,fair: Fix PELT integrity for new tasks

Message ID: 20160620092339.GA4526@vingu-laptop
State: New

Commit Message

Vincent Guittot June 20, 2016, 9:23 a.m. UTC
On Friday 17 Jun 2016 at 18:18:31 (+0200), Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 06:02:39PM +0200, Peter Zijlstra wrote:
> > So yes, ho-humm, how to go about doing that bestest. Lemme have a play.
>
> This is what I came up with, not entirely pretty, but I suppose it'll
> have to do.
>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -724,6 +724,7 @@ void post_init_entity_util_avg(struct sc
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
>  	struct sched_avg *sa = &se->avg;
>  	long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
> +	u64 now = cfs_rq_clock_task(cfs_rq);
>  
>  	if (cap > 0) {
>  		if (cfs_rq->avg.util_avg != 0) {
> @@ -738,7 +739,20 @@ void post_init_entity_util_avg(struct sc
>  		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>  	}
>  
> -	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
> +	if (entity_is_task(se)) {

Why only for tasks?

> +		struct task_struct *p = task_of(se);
> +		if (p->sched_class != &fair_sched_class) {
> +			/*
> +			 * For !fair tasks do attach_entity_load_avg()
> +			 * followed by detach_entity_load_avg() as per
> +			 * switched_from_fair().
> +			 */
> +			se->avg.last_update_time = now;
> +			return;
> +		}
> +	}
> +
> +	update_cfs_rq_load_avg(now, cfs_rq, false);
>  	attach_entity_load_avg(cfs_rq, se);

Don't we have to do a complete attach with attach_task_cfs_rq() instead
of just the load_avg, to also set the depth?

What about something like the patch below?

---
--
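
For reference, the "complete attach" mentioned above is attach_task_cfs_rq(), which does more than attach_entity_load_avg(): it also resets se->depth from the parent chain. A rough sketch of its shape at the time, reconstructed from the helpers quoted in this thread rather than copied verbatim from the tree:

static void attach_task_cfs_rq(struct task_struct *p)
{
	struct sched_entity *se = &p->se;
	struct cfs_rq *cfs_rq = cfs_rq_of(se);

#ifdef CONFIG_FAIR_GROUP_SCHED
	/*
	 * Only the fair class maintains depth, so it may be stale here;
	 * recompute it from the parent entity.
	 */
	se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif

	/* Synchronize the task's PELT signal with its cfs_rq. */
	attach_entity_load_avg(cfs_rq, se);

	/* (the in-tree version also fixes up vruntime on attach) */
}

This depth handling is exactly what a bare attach_entity_load_avg() call would skip.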

Comments

Vincent Guittot June 20, 2016, 10:07 a.m. UTC | #1
On 20 June 2016 at 11:52, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 20, 2016 at 11:23:39AM +0200, Vincent Guittot wrote:
>
>> > @@ -738,7 +739,20 @@ void post_init_entity_util_avg(struct sc
>> >             sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>> >     }
>> >
>> > -   update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
>> > +   if (entity_is_task(se)) {
>>
>> Why only for tasks?
>
> Only tasks can be of another class, the cgroup se's must be fair.

OK

>
>> > +           struct task_struct *p = task_of(se);
>> > +           if (p->sched_class != &fair_sched_class) {
>> > +                   /*
>> > +                    * For !fair tasks do attach_entity_load_avg()
>> > +                    * followed by detach_entity_load_avg() as per
>> > +                    * switched_from_fair().
>> > +                    */
>> > +                   se->avg.last_update_time = now;
>> > +                   return;
>> > +           }
>> > +   }
>> > +
>> > +   update_cfs_rq_load_avg(now, cfs_rq, false);
>> >     attach_entity_load_avg(cfs_rq, se);
>>
>> Don't we have to do a complete attach with attach_task_cfs_rq() instead
>> of just the load_avg, to also set the depth?
>
> init_tg_cfs_entity() sets depth for cgroup entities,
> attach_task_cfs_rq() reset depth on switched_to().
>
> So that should be fine.

But what about a new fair task? Its depth will not be set.

>
>> ---
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -723,6 +723,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
>>       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>       struct sched_avg *sa = &se->avg;
>>       long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
>> +     u64 now = cfs_rq_clock_task(cfs_rq);
>>
>>       if (cap > 0) {
>>               if (cfs_rq->avg.util_avg != 0) {
>> @@ -737,8 +738,18 @@ void post_init_entity_util_avg(struct sched_entity *se)
>>               sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>>       }
>>
>> -     update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
>> -     attach_entity_load_avg(cfs_rq, se);
>
> There is no p here...

Yes, my condition is just wrong.

>
>> +     if (p->sched_class == &fair_sched_class) {
>> +             /* fair entity must be attached to cfs_rq */
>> +             attach_task_cfs_rq(se);
>> +     } else {
>> +             /*
>> +              * For !fair tasks do attach_entity_load_avg()
>> +              * followed by detach_entity_load_avg() as per
>> +              * switched_from_fair().
>> +              */
>> +             se->avg.last_update_time = now;
>> +     }
>> +
>>  }
>
> And you're now not attaching new cgroup thingies.
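
To make Peter's point about classes concrete: group entities own a runqueue of their own and are always handled by the fair class, so only task entities can belong to another class. That is what the entity_is_task() check keys off; assuming the usual fair.c definition, it is simply:

#ifdef CONFIG_FAIR_GROUP_SCHED
/* An entity is a task iff it does not own a runqueue of its own. */
#define entity_is_task(se)	(!se->my_q)
#else
#define entity_is_task(se)	1
#endif

Vincent's variant tests p->sched_class directly, which is meaningless for a group entity and, as Peter notes, also skips the attach for new cgroup entities entirely.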
Vincent Guittot June 21, 2016, 12:36 p.m. UTC | #2
On 21 June 2016 at 13:43, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 20, 2016 at 11:23:39AM +0200, Vincent Guittot wrote:
>
>> Don't we have to do a complete attach with attach_task_cfs_rq() instead
>> of just the load_avg, to also set the depth?
>
> Hmm, yes, your sched_set_group() change seems to have munged this.
>

I think it was previously done by attach_task_cfs_rq() during activate_task().
Now the attach is done in post_init_entity_util_avg(). Can't we just
set the depth in post_init_entity_util_avg()?

> Previously we'd call task_move_group_fair() which would indeed setup
> depth.
>
> I've changed it thus (all smaller edits just didn't look right):
>
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7743,7 +7743,7 @@ void sched_offline_group(struct task_gro
>   * group pointers. The task will be attached to the runqueue during its wake
>   * up.
>   */
> -static void sched_set_group(struct task_struct *tsk, bool move)
> +static void sched_change_group(struct task_struct *tsk, int type)
>  {
>         struct task_group *tg;
>
> @@ -7758,8 +7758,8 @@ static void sched_set_group(struct task_
>         tsk->sched_task_group = tg;
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       if (move && tsk->sched_class->task_move_group)
> -               tsk->sched_class->task_move_group(tsk);
> +       if (tsk->sched_class->task_change_group)
> +               tsk->sched_class->task_change_group(tsk, type);
>         else
>  #endif
>                 set_task_rq(tsk, task_cpu(tsk));
> @@ -7788,7 +7788,7 @@ void sched_move_task(struct task_struct
>         if (unlikely(running))
>                 put_prev_task(rq, tsk);
>
> -       sched_set_group(tsk, true);
> +       sched_change_group(tsk, TASK_MOVE_GROUP);
>
>         if (unlikely(running))
>                 tsk->sched_class->set_curr_task(rq);
> @@ -8227,7 +8227,7 @@ static void cpu_cgroup_fork(struct task_
>
>         rq = task_rq_lock(task, &rf);
>
> -       sched_set_group(task, false);
> +       sched_change_group(task, TASK_SET_GROUP);
>
>         task_rq_unlock(rq, task, &rf);
>  }
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8443,6 +8443,14 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> +static void task_set_group_fair(struct task_struct *p)
> +{
> +       struct sched_entity *se = &p->se;
> +
> +       set_task_rq(p, task_cpu(p));
> +       se->depth = se->parent ? se->parent->depth + 1 : 0;
> +}
> +
>  static void task_move_group_fair(struct task_struct *p)
>  {
>         detach_task_cfs_rq(p);
> @@ -8455,6 +8463,19 @@ static void task_move_group_fair(struct
>         attach_task_cfs_rq(p);
>  }
>
> +static void task_change_group_fair(struct task_struct *p, int type)
> +{
> +       switch (type) {
> +       case TASK_SET_GROUP:
> +               task_set_group_fair(p);
> +               break;
> +
> +       case TASK_MOVE_GROUP:
> +               task_move_group_fair(p);
> +               break;
> +       }
> +}
> +
>  void free_fair_sched_group(struct task_group *tg)
>  {
>         int i;
> @@ -8683,7 +8704,7 @@ const struct sched_class fair_sched_clas
>         .update_curr            = update_curr_fair,
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       .task_move_group        = task_move_group_fair,
> +       .task_change_group      = task_change_group_fair,
>  #endif
>  };
>
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1247,7 +1247,10 @@ struct sched_class {
>         void (*update_curr) (struct rq *rq);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       void (*task_move_group) (struct task_struct *p);
> +#define TASK_SET_GROUP  0
> +#define TASK_MOVE_GROUP        1
> +
> +       void (*task_change_group) (struct task_struct *p, int type);
>  #endif
>  };
>
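
To spell out how the quoted patch addresses the new-task depth question, here is the resulting fork-time ordering as a comment-style sketch (call names taken from the hunks above; locking and error paths omitted, and the wake_up_new_task() step is a reading of the surrounding fork path rather than something shown in this thread):

/*
 * Fork-time ordering with the TASK_SET_GROUP hook:
 *
 *   cpu_cgroup_fork()
 *     sched_change_group(task, TASK_SET_GROUP)
 *       task_change_group_fair(task, TASK_SET_GROUP)
 *         task_set_group_fair(task)
 *           set_task_rq(task, task_cpu(task));
 *           se->depth = se->parent ? se->parent->depth + 1 : 0;
 *
 *   wake_up_new_task()
 *     post_init_entity_util_avg(&task->se)
 *       attach_entity_load_avg()   <-- now runs with depth already set
 */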
Vincent Guittot June 21, 2016, 12:56 p.m. UTC | #3
On 21 June 2016 at 14:47, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jun 21, 2016 at 02:36:46PM +0200, Vincent Guittot wrote:
>> On 21 June 2016 at 13:43, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Jun 20, 2016 at 11:23:39AM +0200, Vincent Guittot wrote:
>> >
>> >> Don't we have to do a complete attach with attach_task_cfs_rq() instead
>> >> of just the load_avg, to also set the depth?
>> >
>> > Hmm, yes, your sched_set_group() change seems to have munged this.
>> >
>>
>> I think it was previously done by attach_task_cfs_rq() during activate_task().
>> Now the attach is done in post_init_entity_util_avg(). Can't we just
>> set the depth in post_init_entity_util_avg()?
>
> Yes, I actually had that patch for a little while.
>
> But since its cgroup specific, I felt it should be in the cgroup code,
> hence the current patch.

That's a fair point.
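
For completeness, the alternative Vincent floated, setting the depth directly in post_init_entity_util_avg(), would have looked roughly like the fragment below. This is a sketch of the variant that was not taken; the depth handling stayed in the cgroup path (task_set_group_fair()) for the reason Peter gives above.

	/* Hypothetical: reset depth at attach time instead of in the cgroup code. */
#ifdef CONFIG_FAIR_GROUP_SCHED
	/* A new task inherits its parent se's depth. */
	se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif
	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
	attach_entity_load_avg(cfs_rq, se);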

Patch

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -723,6 +723,7 @@  void post_init_entity_util_avg(struct sched_entity *se)
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 	struct sched_avg *sa = &se->avg;
 	long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
+	u64 now = cfs_rq_clock_task(cfs_rq);
 
 	if (cap > 0) {
 		if (cfs_rq->avg.util_avg != 0) {
@@ -737,8 +738,18 @@  void post_init_entity_util_avg(struct sched_entity *se)
 		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
 	}
 
-	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
-	attach_entity_load_avg(cfs_rq, se);
+	if (p->sched_class == &fair_sched_class) {
+		/* fair entity must be attached to cfs_rq */
+		attach_task_cfs_rq(se);
+	} else {
+		/*
+		 * For !fair tasks do attach_entity_load_avg()
+		 * followed by detach_entity_load_avg() as per
+		 * switched_from_fair().
+		 */
+		se->avg.last_update_time = now;
+	}
+
 }
 
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
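
Note that, as discussed in the comments, this hunk references p without declaring it and no longer attaches new cgroup entities. The shape the discussion converged on, reproduced from Peter's version quoted in the commit message (shown uncut for convenience; not necessarily the hunk that was finally merged), is:

	if (entity_is_task(se)) {
		struct task_struct *p = task_of(se);

		if (p->sched_class != &fair_sched_class) {
			/*
			 * For !fair tasks do attach_entity_load_avg()
			 * followed by detach_entity_load_avg() as per
			 * switched_from_fair().
			 */
			se->avg.last_update_time = now;
			return;
		}
	}

	update_cfs_rq_load_avg(now, cfs_rq, false);
	attach_entity_load_avg(cfs_rq, se);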