diff mbox

[PATCHv2,6/8] linux-generic: queue: correct handling of reorder completion

Message ID 1441293400-28777-7-git-send-email-bill.fischofer@linaro.org
State Accepted
Commit 9c3fcc0e5c11cf8117672f63a346b80a66d72103
Headers show

Commit Message

Bill Fischofer Sept. 3, 2015, 3:16 p.m. UTC
Fix a race condition that could cause events to be stuck in an ordered
queue's reorder queue.

Signed-off-by: Bill Fischofer <bill.fischofer@linaro.org>
---
 .../linux-generic/include/odp_queue_internal.h     | 36 ++++++++++++----
 platform/linux-generic/odp_queue.c                 | 48 +++++++++-------------
 2 files changed, 48 insertions(+), 36 deletions(-)

Comments

Nicolas Morey-Chaisemartin Sept. 3, 2015, 4:05 p.m. UTC | #1
On 09/03/2015 05:16 PM, Bill Fischofer wrote:
> @@ -910,23 +897,28 @@ int release_order(queue_entry_t *origin_qe, uint64_t order,
>  		order_release(origin_qe, 1);
>  
>  		/* Check if this release allows us to unblock waiters.
> -		 * Note that we can only process complete waiters since
> -		 * if the sustain bit is set for a buffer this means that
> -		 * some other thread is working on it and will be
> -		 * responsible for resolving order when it is complete.
> +		 * At the point of this call, the reorder list may contain
> +		 * zero or more placeholders that need to be freed, followed
> +		 * by zero or one complete reorder buffer chain.
>  		 */

I'm struggling to see why this is true.
From my understanding, the reorder list is sorted solely on order.
Place holders are inserted when a thread releases a scheduler while it's not its turn yet (thread order > order_out).

So let's assume 5 threads pulling each 1 buffer ordered 0 to 4.
For some reason, T0 is busy and not releasing/Sending anything so order_out stays stuck at 0

T1 frees the buffer, releases the scheduler => a place holder is inserted (ReorderList = P1)
T2 sends the buffer => Reorder list = P1 B2
    Scheduler release clears the sustain bit here
T3 frees the buffer and releases
T4 sends the buffer

In the end the reorder list should look like P1 B2 P3 B4

Or I am missing something here.
Bill Fischofer Sept. 3, 2015, 4:19 p.m. UTC | #2
On Thu, Sep 3, 2015 at 11:05 AM, Nicolas Morey-Chaisemartin <
nmorey@kalray.eu> wrote:

>
>
> On 09/03/2015 05:16 PM, Bill Fischofer wrote:
> > @@ -910,23 +897,28 @@ int release_order(queue_entry_t *origin_qe,
> uint64_t order,
> >               order_release(origin_qe, 1);
> >
> >               /* Check if this release allows us to unblock waiters.
> > -              * Note that we can only process complete waiters since
> > -              * if the sustain bit is set for a buffer this means that
> > -              * some other thread is working on it and will be
> > -              * responsible for resolving order when it is complete.
> > +              * At the point of this call, the reorder list may contain
> > +              * zero or more placeholders that need to be freed,
> followed
> > +              * by zero or one complete reorder buffer chain.
> >                */
>
>
Note that this comment was simplified in v2 of the patch


> I'm struggling to see why this is true.
> From my understanding, the reorder list is sorted solely on order.
> Place holders are inserted when a thread releases a scheduler while it's
> not its turn yet (thread order > order_out).
>
> So let's assume 5 threads pulling each 1 buffer ordered 0 to 4.
> For some reason, T0 is busy and not releasing/Sending anything so
> order_out stays stuck at 0
>
> T1 frees the buffer, releases the scheduler => a place holder is inserted
> (ReorderList = P1)
> T2 sends the buffer => Reorder list = P1 B2
>     Scheduler release clears the sustain bit here
> T3 frees the buffer and releases
> T4 sends the buffer
>
> In the end the reorder list should look like P1 B2 P3 B4
>
> Or I am missing something here.
>

That's correct.  The corner case we're avoiding here is let's say T2 has
sent one or more buffers but hasn't released order yet.  So the Reorder
list may look like:

P1, B2a, B2b, P3, B4

Now suppose T0 finishes and so it resolves its order.  As part of that
order processing we'll bump order_out and see that we can now also remove
P1, which bumps the order again.  It's now the case that T3 is in order,
but the two buffers on the list have their sustain bits set, which means
that T3 is still running and could issue more enqueus.  If we tried to
process B2a and B2b here, we'd be in a race condition between two threads
trying to process the same order, and this would likely result in order
inversion.

So the rule is you can only process buffers on the list if the buffer(s)
for that order end with a buffer that has sustain=0, because that means
that the thread that held that order has released it's context so there can
be no race condition.

Make sense?
Nicolas Morey-Chaisemartin Sept. 3, 2015, 5:28 p.m. UTC | #3
> From: "Bill Fischofer" <bill.fischofer@linaro.org>
> To: "Nicolas Morey-Chaisemartin" <nmorey@kalray.eu>
> Cc: "LNG ODP Mailman List" <lng-odp@lists.linaro.org>, "Maxim Uvarov"
> <maxim.uvarov@linaro.org>
> Sent: Thursday, 3 September, 2015 6:19:52 PM
> Subject: Re: [lng-odp] [PATCHv2 6/8] linux-generic: queue: correct handling
> of reorder completion

> On Thu, Sep 3, 2015 at 11:05 AM, Nicolas Morey-Chaisemartin <
> nmorey@kalray.eu > wrote:

> > On 09/03/2015 05:16 PM, Bill Fischofer wrote:
> 
> > > @@ -910,23 +897,28 @@ int release_order(queue_entry_t *origin_qe,
> > > uint64_t
> > > order,
> 
> > > order_release(origin_qe, 1);
> 
> > >
> 
> > > /* Check if this release allows us to unblock waiters.
> 
> > > - * Note that we can only process complete waiters since
> 
> > > - * if the sustain bit is set for a buffer this means that
> 
> > > - * some other thread is working on it and will be
> 
> > > - * responsible for resolving order when it is complete.
> 
> > > + * At the point of this call, the reorder list may contain
> 
> > > + * zero or more placeholders that need to be freed, followed
> 
> > > + * by zero or one complete reorder buffer chain.
> 
> > > */
> 

> Note that this comment was simplified in v2 of the patch

> > I'm struggling to see why this is true.
> 
> > From my understanding, the reorder list is sorted solely on order.
> 
> > Place holders are inserted when a thread releases a scheduler while it's
> > not
> > its turn yet (thread order > order_out).
> 

> > So let's assume 5 threads pulling each 1 buffer ordered 0 to 4.
> 
> > For some reason, T0 is busy and not releasing/Sending anything so order_out
> > stays stuck at 0
> 

> > T1 frees the buffer, releases the scheduler => a place holder is inserted
> > (ReorderList = P1)
> 
> > T2 sends the buffer => Reorder list = P1 B2
> 
> > Scheduler release clears the sustain bit here
> 
> > T3 frees the buffer and releases
> 
> > T4 sends the buffer
> 

> > In the end the reorder list should look like P1 B2 P3 B4
> 

> > Or I am missing something here.
> 

> That's correct. The corner case we're avoiding here is let's say T2 has sent
> one or more buffers but hasn't released order yet. So the Reorder list may
> look like:

> P1, B2a, B2b, P3, B4

> Now suppose T0 finishes and so it resolves its order. As part of that order
> processing we'll bump order_out and see that we can now also remove P1,
> which bumps the order again. It's now the case that T3 is in order, but the
> two buffers on the list have their sustain bits set, which means that T3 is
> still running and could issue more enqueus. If we tried to process B2a and
> B2b here, we'd be in a race condition between two threads trying to process
> the same order, and this would likely result in order inversion.

> So the rule is you can only process buffers on the list if the buffer(s) for
> that order end with a buffer that has sustain=0, because that means that the
> thread that held that order has released it's context so there can be no
> race condition.

> Make sense?

It does. I'm not at work anymore so I can't check the code but from what I saw release_order is only called when the scheduler is released. And from what I see in this patch, we never pull more than 1 EVent (or maybe a list of event if they have the same order) at once. 
So in my previous case, when T0 releases the scheduler, we free P1, send B2 and then stop (or maybe also 3 P3, not sure about that one) but B4 stays in the reorder queue although it could be sent ! 
Assuming that no other packet comes and all the queues are empty, how is B4 sent ? 

Nicolas
Bill Fischofer Sept. 3, 2015, 5:31 p.m. UTC | #4
The send of B2 will finish up by repeating the order resolution logic to
see if further progress can be made.  So it propagates down the list until
we can't resolve any more.  At that point its up to the next guy to take
over when they start resolving their own order.

On Thu, Sep 3, 2015 at 12:28 PM, Nicolas Morey Chaisemartin <
nmorey@kalray.eu> wrote:

>
>
> *From: *"Bill Fischofer" <bill.fischofer@linaro.org>
> *To: *"Nicolas Morey-Chaisemartin" <nmorey@kalray.eu>
> *Cc: *"LNG ODP Mailman List" <lng-odp@lists.linaro.org>, "Maxim Uvarov" <
> maxim.uvarov@linaro.org>
> *Sent: *Thursday, 3 September, 2015 6:19:52 PM
> *Subject: *Re: [lng-odp] [PATCHv2 6/8] linux-generic: queue: correct
> handling of reorder completion
>
>
>
> On Thu, Sep 3, 2015 at 11:05 AM, Nicolas Morey-Chaisemartin <
> nmorey@kalray.eu> wrote:
>
>>
>>
>> On 09/03/2015 05:16 PM, Bill Fischofer wrote:
>> > @@ -910,23 +897,28 @@ int release_order(queue_entry_t *origin_qe,
>> uint64_t order,
>> >               order_release(origin_qe, 1);
>> >
>> >               /* Check if this release allows us to unblock waiters.
>> > -              * Note that we can only process complete waiters since
>> > -              * if the sustain bit is set for a buffer this means that
>> > -              * some other thread is working on it and will be
>> > -              * responsible for resolving order when it is complete.
>> > +              * At the point of this call, the reorder list may contain
>> > +              * zero or more placeholders that need to be freed,
>> followed
>> > +              * by zero or one complete reorder buffer chain.
>> >                */
>>
>>
> Note that this comment was simplified in v2 of the patch
>
>
>> I'm struggling to see why this is true.
>> From my understanding, the reorder list is sorted solely on order.
>> Place holders are inserted when a thread releases a scheduler while it's
>> not its turn yet (thread order > order_out).
>>
>> So let's assume 5 threads pulling each 1 buffer ordered 0 to 4.
>> For some reason, T0 is busy and not releasing/Sending anything so
>> order_out stays stuck at 0
>>
>> T1 frees the buffer, releases the scheduler => a place holder is inserted
>> (ReorderList = P1)
>> T2 sends the buffer => Reorder list = P1 B2
>>     Scheduler release clears the sustain bit here
>> T3 frees the buffer and releases
>> T4 sends the buffer
>>
>> In the end the reorder list should look like P1 B2 P3 B4
>>
>> Or I am missing something here.
>>
>
> That's correct.  The corner case we're avoiding here is let's say T2 has
> sent one or more buffers but hasn't released order yet.  So the Reorder
> list may look like:
>
> P1, B2a, B2b, P3, B4
>
> Now suppose T0 finishes and so it resolves its order.  As part of that
> order processing we'll bump order_out and see that we can now also remove
> P1, which bumps the order again.  It's now the case that T3 is in order,
> but the two buffers on the list have their sustain bits set, which means
> that T3 is still running and could issue more enqueus.  If we tried to
> process B2a and B2b here, we'd be in a race condition between two threads
> trying to process the same order, and this would likely result in order
> inversion.
>
> So the rule is you can only process buffers on the list if the buffer(s)
> for that order end with a buffer that has sustain=0, because that means
> that the thread that held that order has released it's context so there can
> be no race condition.
>
> Make sense?
>
> It does. I'm not at work anymore so I can't check the code but from what I
> saw release_order is only called when the scheduler is released. And from
> what I see in this patch, we never pull more than 1 EVent (or maybe a list
> of event if they have the same order) at once.
> So in my previous case,  when T0 releases the scheduler, we free P1, send
> B2 and then stop (or maybe also 3 P3, not sure about that one) but B4 stays
> in the reorder queue although it could be sent !
> Assuming that no other packet comes and all the queues are empty, how is
> B4 sent ?
>
> Nicolas
>
>
diff mbox

Patch

diff --git a/platform/linux-generic/include/odp_queue_internal.h b/platform/linux-generic/include/odp_queue_internal.h
index f285ea3..48576bc 100644
--- a/platform/linux-generic/include/odp_queue_internal.h
+++ b/platform/linux-generic/include/odp_queue_internal.h
@@ -284,18 +284,38 @@  static inline int reorder_deq(queue_entry_t *queue,
 	return deq_count;
 }
 
-static inline int reorder_complete(odp_buffer_hdr_t *reorder_buf)
+static inline void reorder_complete(queue_entry_t *origin_qe,
+				    odp_buffer_hdr_t **reorder_buf_return,
+				    odp_buffer_hdr_t **placeholder_buf,
+				    int placeholder_append)
 {
-	odp_buffer_hdr_t *next_buf = reorder_buf->next;
-	uint64_t order = reorder_buf->order;
+	odp_buffer_hdr_t *reorder_buf = origin_qe->s.reorder_head;
+	odp_buffer_hdr_t *next_buf;
+
+	*reorder_buf_return = NULL;
+	if (!placeholder_append)
+		*placeholder_buf = NULL;
 
-	while (reorder_buf->flags.sustain &&
-	       next_buf && next_buf->order == order) {
-		reorder_buf = next_buf;
+	while (reorder_buf &&
+	       reorder_buf->order <= origin_qe->s.order_out) {
 		next_buf = reorder_buf->next;
-	}
 
-	return !reorder_buf->flags.sustain;
+		if (!reorder_buf->target_qe) {
+			origin_qe->s.reorder_head = next_buf;
+			reorder_buf->next         = *placeholder_buf;
+			*placeholder_buf          = reorder_buf;
+
+			reorder_buf = next_buf;
+			order_release(origin_qe, 1);
+		} else if (reorder_buf->flags.sustain) {
+			reorder_buf = next_buf;
+		} else {
+			*reorder_buf_return = origin_qe->s.reorder_head;
+			origin_qe->s.reorder_head =
+				origin_qe->s.reorder_head->next;
+			break;
+		}
+	}
 }
 
 static inline void get_queue_order(queue_entry_t **origin_qe, uint64_t *order,
diff --git a/platform/linux-generic/odp_queue.c b/platform/linux-generic/odp_queue.c
index 87483b9..15abd93 100644
--- a/platform/linux-generic/odp_queue.c
+++ b/platform/linux-generic/odp_queue.c
@@ -458,18 +458,10 @@  int queue_enq(queue_entry_t *queue, odp_buffer_hdr_t *buf_hdr, int sustain)
 		order_release(origin_qe, release_count + placeholder_count);
 
 		/* Now handle any unblocked complete buffers destined for
-		 * other queues.  Note that these must be complete because
-		 * otherwise another thread is working on it and is
-		 * responsible for resolving order when it is complete.
+		 * other queues, appending placeholder bufs as needed.
 		 */
 		UNLOCK(&queue->s.lock);
-
-		if (reorder_buf &&
-		    reorder_buf->order <= origin_qe->s.order_out &&
-		    reorder_complete(reorder_buf))
-			origin_qe->s.reorder_head = reorder_buf->next;
-		else
-			reorder_buf = NULL;
+		reorder_complete(origin_qe, &reorder_buf, &placeholder_buf, 1);
 		UNLOCK(&origin_qe->s.lock);
 
 		if (reorder_buf)
@@ -827,12 +819,7 @@  int queue_pktout_enq(queue_entry_t *queue, odp_buffer_hdr_t *buf_hdr,
 	order_release(origin_qe, release_count + placeholder_count);
 
 	/* Now handle sends to other queues that are ready to go */
-	if (reorder_buf &&
-	    reorder_buf->order <= origin_qe->s.order_out &&
-	    reorder_complete(reorder_buf))
-		origin_qe->s.reorder_head = reorder_buf->next;
-	else
-		reorder_buf = NULL;
+	reorder_complete(origin_qe, &reorder_buf, &placeholder_buf, 1);
 
 	/* We're fully done with the origin_qe at last */
 	UNLOCK(&origin_qe->s.lock);
@@ -910,23 +897,28 @@  int release_order(queue_entry_t *origin_qe, uint64_t order,
 		order_release(origin_qe, 1);
 
 		/* Check if this release allows us to unblock waiters.
-		 * Note that we can only process complete waiters since
-		 * if the sustain bit is set for a buffer this means that
-		 * some other thread is working on it and will be
-		 * responsible for resolving order when it is complete.
+		 * At the point of this call, the reorder list may contain
+		 * zero or more placeholders that need to be freed, followed
+		 * by zero or one complete reorder buffer chain.
 		 */
-		reorder_buf = origin_qe->s.reorder_head;
-
-		if (reorder_buf &&
-		    reorder_buf->order <= origin_qe->s.order_out &&
-		    reorder_complete(reorder_buf))
-			origin_qe->s.reorder_head = reorder_buf->next;
-		else
-			reorder_buf = NULL;
+		reorder_complete(origin_qe, &reorder_buf,
+				 &placeholder_buf_hdr, 0);
 
+		/* Now safe to unlock */
 		UNLOCK(&origin_qe->s.lock);
+
+		/* If reorder_buf has a target, do the enq now */
 		if (reorder_buf)
 			queue_enq_internal(reorder_buf);
+
+		while (placeholder_buf_hdr) {
+			odp_buffer_hdr_t *placeholder_next =
+				placeholder_buf_hdr->next;
+
+			odp_buffer_free(placeholder_buf_hdr->handle.handle);
+			placeholder_buf_hdr = placeholder_next;
+		}
+
 		return 0;
 	}