diff mbox

[v3] sched/deadline: fix earliest_dl.next logic

Message ID 20151127112647.GR20439@e106622-lin
State New
Headers show

Commit Message

Juri Lelli Nov. 27, 2015, 11:26 a.m. UTC
Hi,

[+Luca, as he has been testing this patch an he has probably findings to
share]

On 19/11/15 18:11, Wanpeng li wrote:
> earliest_dl.next should cache deadline of the earliest ready task that

> is also enqueued in the pushable rbtree, as pull algorithm uses this

> information to find candidates for migration: if the earliest_dl.next

> deadline of source rq is earlier than the earliest_dl.curr deadline of

> destination rq, the task from the source rq can be pulled.

> 

> However, current implementation only guarantees that earliest_dl.next is

> the deadline of the next ready task instead of the next pushable task;

> which will result in potentially holding both rqs' lock and find nothing

> to migrate because of affinity constraints. In addition, current logic

> doesn't update the next candidate for pushing in pick_next_task_dl(),

> even if the running task is never eligible.

> 

> This patch fixes both problems by updating earliest_dl.next when

> pushable dl task is enqueued/dequeued, similar to what we already do for

> RT.

> 

> Signed-off-by: Wanpeng li <wanpeng.li@hotmail.com>

> ---

> v2 -> v3:

>  * reset dl_rq->earliest_dl.next to 0 if !next_pushable

> v1 -> v2:

>  * fix potential NULL pointer dereference

> 

>  kernel/sched/deadline.c | 63 ++++++++++---------------------------------------

>  1 file changed, 12 insertions(+), 51 deletions(-)

> 

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c

> index 142df26..547d102 100644

> --- a/kernel/sched/deadline.c

> +++ b/kernel/sched/deadline.c

> @@ -87,6 +87,8 @@ void init_dl_rq(struct dl_rq *dl_rq)

>  

>  #ifdef CONFIG_SMP

>  

> +static struct task_struct *pick_next_pushable_dl_task(struct rq *rq);

> +

>  static inline int dl_overloaded(struct rq *rq)

>  {

>  	return atomic_read(&rq->rd->dlo_count);

> @@ -181,11 +183,15 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)

>  

>  	rb_link_node(&p->pushable_dl_tasks, parent, link);

>  	rb_insert_color(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);

> +

> +	if (dl_time_before(p->dl.deadline, dl_rq->earliest_dl.next))

> +		dl_rq->earliest_dl.next = p->dl.deadline;


This seems to be a bug, as earliest_dl.next is initialized to 0 and
dl_time_before() will say that p has later deadline than
earliest_dl.next even if p is actually the first pushable task.

>  }

>  

>  static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)

>  {

>  	struct dl_rq *dl_rq = &rq->dl;

> +	struct task_struct *next_pushable;

>  

>  	if (RB_EMPTY_NODE(&p->pushable_dl_tasks))

>  		return;

> @@ -199,6 +205,12 @@ static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)

>  

>  	rb_erase(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);

>  	RB_CLEAR_NODE(&p->pushable_dl_tasks);

> +

> +	next_pushable = pick_next_pushable_dl_task(rq);

> +	if (next_pushable)

> +		dl_rq->earliest_dl.next = next_pushable->dl.deadline;

> +	else

> +		dl_rq->earliest_dl.next = 0;


As already said, this is useless (sorry for suggesting it in the first
instance).

What follows might fix these two issue. However, Luca is telling me that
he is seeing some other issue with this patch on his testing box. Maybe
he can directly comment on this.

Thanks,

- Juri

---
 kernel/sched/deadline.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Juri Lelli Dec. 1, 2015, 11:30 a.m. UTC | #1
On 30/11/15 10:20, Wanpeng Li wrote:
> 2015-11-27 20:14 GMT+08:00 Luca Abeni <luca.abeni@unitn.it>:

> > Hi all,

> >

> > I ran some quick tests on this patch (because I was working on something

> > related), and it seems to me that it triggers a bug. Here are some

> > information:


[snip]

> >

> > Here is my understanding of the crash:

> > - schedule() invokes pick_next_task_dl() which wants to do a context switch

> > (by selecting

> >   for execution a new task "p" which is different from "prev")

> > - pick_next_task_dl() invokes put_prev, which puts the "prev" task in the

> > pushable tasks

> >   queue (WARNING! "prev" is still the "current" task in this rq, because the

> > scheduler is

> >   still running... I think this is the source of the issue)

> > - then, pick_next_task_dl() invokes dequeue_pushable_dl_task() on p, to

> > remove the selected

> >   task from the pushable tasks queue...

> > - ...But after your patch dequeue_pushable_dl_task() invokes

> > pick_next_pushable_dl_task().

> >   Which sees that the next pushable task is the "current" task (see above).

> > This happens

> >   becuase "prev" has already been inserted in the pushable tasks queue, and

> > can be the

> >   next pushable task... But "current" has not been updated yet.

> > - The BUG_ON() at line 1443 of deadline.c is just "BUG_ON(task_current(rq,

> > p))"

> 

> Thanks for your report and great analyse, Luca, you are right.

> 

> >

> > Summing up, I think pick_next_pushable_dl_task() cannot be called from

> > dequeue_pushable_dl_task() (at least, not without removing or modifying that

> > BUG_ON()).

> 

> Juri, how about remove this BUG_ON() just like rt class?

> 


It seems that we actually check the very same conditions for RT, as we
do for DL. The difference is that RT doesn't call pick_next_pushable
_task(). I think we can do the same, just checking and eventually using
the updated leftmost in dequeue_pushable_dl_task().

Thanks,

- Juri
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 547d102..d6de660 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -178,14 +178,13 @@  static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 		}
 	}
 
-	if (leftmost)
+	if (leftmost) {
 		dl_rq->pushable_dl_tasks_leftmost = &p->pushable_dl_tasks;
+		dl_rq->earliest_dl.next = p->dl.deadline;
+	}
 
 	rb_link_node(&p->pushable_dl_tasks, parent, link);
 	rb_insert_color(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);
-
-	if (dl_time_before(p->dl.deadline, dl_rq->earliest_dl.next))
-		dl_rq->earliest_dl.next = p->dl.deadline;
 }
 
 static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
@@ -209,8 +208,6 @@  static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 	next_pushable = pick_next_pushable_dl_task(rq);
 	if (next_pushable)
 		dl_rq->earliest_dl.next = next_pushable->dl.deadline;
-	else
-		dl_rq->earliest_dl.next = 0;
 }
 
 static inline int has_pushable_dl_tasks(struct rq *rq)