diff mbox series

[06/16] mmc: core: replace waitqueue with worker

Message ID 20170209153403.9730-7-linus.walleij@linaro.org
State New
Headers show
Series multiqueue for MMC/SD third try | expand

Commit Message

Linus Walleij Feb. 9, 2017, 3:33 p.m. UTC
The waitqueue in the host context is there to signal back from
mmc_request_done() through mmc_wait_data_done() that the hardware
is done with a command, and when the wait is over, the core
will typically submit the next asynchronous request that is pending
just waiting for the hardware to be available.

This is in the way for letting the mmc_request_done() trigger the
report up to the block layer that a block request is finished.

Re-jig this as a first step, remvoving the waitqueue and introducing
a work that will run after a completed asynchronous request,
finalizing that request, including retransmissions, and eventually
reporting back with a completion and a status code to the
asynchronous issue method.

This had the upside that we can remove the MMC_BLK_NEW_REQUEST
status code and the "new_request" state in the request queue
that is only there to make the state machine spin out
the first time we send a request.

Introduce a workqueue in the host for handling just this, and
then a work and completion in the asynchronous request to deal
with this mechanism.

This is a central change that let us do many other changes since
we have broken the submit and complete code paths in two, and we
can potentially remove the NULL flushing of the asynchronous
pipeline and report block requests as finished directly from
the worker.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

---
 drivers/mmc/core/block.c |  7 ++--
 drivers/mmc/core/core.c  | 84 +++++++++++++++++++++++-------------------------
 drivers/mmc/core/queue.c |  6 ----
 drivers/mmc/core/queue.h |  1 -
 include/linux/mmc/core.h |  3 +-
 include/linux/mmc/host.h |  7 ++--
 6 files changed, 51 insertions(+), 57 deletions(-)

-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Adrian Hunter Feb. 22, 2017, 1:29 p.m. UTC | #1
On 09/02/17 17:33, Linus Walleij wrote:
> The waitqueue in the host context is there to signal back from

> mmc_request_done() through mmc_wait_data_done() that the hardware

> is done with a command, and when the wait is over, the core

> will typically submit the next asynchronous request that is pending

> just waiting for the hardware to be available.

> 

> This is in the way for letting the mmc_request_done() trigger the

> report up to the block layer that a block request is finished.

> 

> Re-jig this as a first step, remvoving the waitqueue and introducing

> a work that will run after a completed asynchronous request,

> finalizing that request, including retransmissions, and eventually

> reporting back with a completion and a status code to the

> asynchronous issue method.

> 

> This had the upside that we can remove the MMC_BLK_NEW_REQUEST

> status code and the "new_request" state in the request queue

> that is only there to make the state machine spin out

> the first time we send a request.

> 

> Introduce a workqueue in the host for handling just this, and

> then a work and completion in the asynchronous request to deal

> with this mechanism.

> 

> This is a central change that let us do many other changes since

> we have broken the submit and complete code paths in two, and we

> can potentially remove the NULL flushing of the asynchronous

> pipeline and report block requests as finished directly from

> the worker.


This needs more thought.  The completion should go straight to the mmc block
driver from the ->done() callback.  And from there straight back to the
block layer if recovery is not needed.  We want to stop using
mmc_start_areq() altogether because we never want to wait - we always want
to issue (if possible) and return.

The core API to use is __mmc_start_req() but the block driver should
populate mrq->done with its own handler. i.e. change __mmc_start_req()

-	mrq->done = mmc_wait_done;
+	if (!mrq->done)
+		mrq->done = mmc_wait_done;

mrq->done() would complete the request (e.g. via blk_complete_request()) if
it has no errors (and doesn't need polling), and wake up the queue thread to
finish up everything else and start the next request.

For the blk-mq port, the queue thread should also be retained, partly
because it solves some synchronization problems, but mostly because, at this
stage, we anyway don't have solutions for all the different ways the driver
can block.
(as listed here https://marc.info/?l=linux-mmc&m=148336571720463&w=2 )

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartlomiej Zolnierkiewicz Feb. 28, 2017, 4:10 p.m. UTC | #2
On Thursday, February 09, 2017 04:33:53 PM Linus Walleij wrote:
> The waitqueue in the host context is there to signal back from

> mmc_request_done() through mmc_wait_data_done() that the hardware

> is done with a command, and when the wait is over, the core

> will typically submit the next asynchronous request that is pending

> just waiting for the hardware to be available.

> 

> This is in the way for letting the mmc_request_done() trigger the

> report up to the block layer that a block request is finished.

> 

> Re-jig this as a first step, remvoving the waitqueue and introducing

> a work that will run after a completed asynchronous request,

> finalizing that request, including retransmissions, and eventually

> reporting back with a completion and a status code to the

> asynchronous issue method.

> 

> This had the upside that we can remove the MMC_BLK_NEW_REQUEST

> status code and the "new_request" state in the request queue

> that is only there to make the state machine spin out

> the first time we send a request.

> 

> Introduce a workqueue in the host for handling just this, and

> then a work and completion in the asynchronous request to deal

> with this mechanism.

> 

> This is a central change that let us do many other changes since

> we have broken the submit and complete code paths in two, and we

> can potentially remove the NULL flushing of the asynchronous

> pipeline and report block requests as finished directly from

> the worker.

> 

> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>


Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>


Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij March 9, 2017, 10:49 p.m. UTC | #3
On Wed, Feb 22, 2017 at 2:29 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 09/02/17 17:33, Linus Walleij wrote:

>> The waitqueue in the host context is there to signal back from

>> mmc_request_done() through mmc_wait_data_done() that the hardware

>> is done with a command, and when the wait is over, the core

>> will typically submit the next asynchronous request that is pending

>> just waiting for the hardware to be available.

>>

>> This is in the way for letting the mmc_request_done() trigger the

>> report up to the block layer that a block request is finished.

>>

>> Re-jig this as a first step, remvoving the waitqueue and introducing

>> a work that will run after a completed asynchronous request,

>> finalizing that request, including retransmissions, and eventually

>> reporting back with a completion and a status code to the

>> asynchronous issue method.

>>

>> This had the upside that we can remove the MMC_BLK_NEW_REQUEST

>> status code and the "new_request" state in the request queue

>> that is only there to make the state machine spin out

>> the first time we send a request.

>>

>> Introduce a workqueue in the host for handling just this, and

>> then a work and completion in the asynchronous request to deal

>> with this mechanism.

>>

>> This is a central change that let us do many other changes since

>> we have broken the submit and complete code paths in two, and we

>> can potentially remove the NULL flushing of the asynchronous

>> pipeline and report block requests as finished directly from

>> the worker.

>

> This needs more thought.  The completion should go straight to the mmc block

> driver from the ->done() callback.  And from there straight back to the

> block layer if recovery is not needed.  We want to stop using

> mmc_start_areq() altogether because we never want to wait - we always want

> to issue (if possible) and return.


I don't quite follow this. Isn't what you request exactly what
patch 15/16 "mmc: queue: issue requests in massive parallel"
is doing?

The whole patch series leads up to that.

> The core API to use is __mmc_start_req() but the block driver should

> populate mrq->done with its own handler. i.e. change __mmc_start_req()

>

> -       mrq->done = mmc_wait_done;

> +       if (!mrq->done)

> +               mrq->done = mmc_wait_done;

>

> mrq->done() would complete the request (e.g. via blk_complete_request()) if

> it has no errors (and doesn't need polling), and wake up the queue thread to

> finish up everything else and start the next request.


I think this is what it does at the end of the patch series, patch 15/16.
I have to split it somehow...

> For the blk-mq port, the queue thread should also be retained, partly

> because it solves some synchronization problems, but mostly because, at this

> stage, we anyway don't have solutions for all the different ways the driver

> can block.

> (as listed here https://marc.info/?l=linux-mmc&m=148336571720463&w=2 )


Essentially I take out that thread and replace it with this one worker
introduced in this very patch. I agree the driver can block in many ways
and that is why I need to have it running in process context, and this
is what the worker introduced here provides.

Maybe I'm getting something backwards, sorry then :/

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adrian Hunter March 10, 2017, 2:21 p.m. UTC | #4
On 10/03/17 00:49, Linus Walleij wrote:
> On Wed, Feb 22, 2017 at 2:29 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

>> On 09/02/17 17:33, Linus Walleij wrote:

>>> The waitqueue in the host context is there to signal back from

>>> mmc_request_done() through mmc_wait_data_done() that the hardware

>>> is done with a command, and when the wait is over, the core

>>> will typically submit the next asynchronous request that is pending

>>> just waiting for the hardware to be available.

>>>

>>> This is in the way for letting the mmc_request_done() trigger the

>>> report up to the block layer that a block request is finished.

>>>

>>> Re-jig this as a first step, remvoving the waitqueue and introducing

>>> a work that will run after a completed asynchronous request,

>>> finalizing that request, including retransmissions, and eventually

>>> reporting back with a completion and a status code to the

>>> asynchronous issue method.

>>>

>>> This had the upside that we can remove the MMC_BLK_NEW_REQUEST

>>> status code and the "new_request" state in the request queue

>>> that is only there to make the state machine spin out

>>> the first time we send a request.

>>>

>>> Introduce a workqueue in the host for handling just this, and

>>> then a work and completion in the asynchronous request to deal

>>> with this mechanism.

>>>

>>> This is a central change that let us do many other changes since

>>> we have broken the submit and complete code paths in two, and we

>>> can potentially remove the NULL flushing of the asynchronous

>>> pipeline and report block requests as finished directly from

>>> the worker.

>>

>> This needs more thought.  The completion should go straight to the mmc block

>> driver from the ->done() callback.  And from there straight back to the

>> block layer if recovery is not needed.  We want to stop using

>> mmc_start_areq() altogether because we never want to wait - we always want

>> to issue (if possible) and return.

> 

> I don't quite follow this. Isn't what you request exactly what

> patch 15/16 "mmc: queue: issue requests in massive parallel"

> is doing?


There is the latency for the worker that runs mmc_finalize_areq() and then
another latency to wake up the worker that is running mmc_start_areq().
That is 2 wake-ups instead of 1.

As a side note, ideally we would be able to issue the next request from the
interrupt or soft interrupt context of the completion (i.e. 0 wake-ups
between requests), but we would probably have to look at the host API to
support that.

> 

> The whole patch series leads up to that.

> 

>> The core API to use is __mmc_start_req() but the block driver should

>> populate mrq->done with its own handler. i.e. change __mmc_start_req()

>>

>> -       mrq->done = mmc_wait_done;

>> +       if (!mrq->done)

>> +               mrq->done = mmc_wait_done;

>>

>> mrq->done() would complete the request (e.g. via blk_complete_request()) if

>> it has no errors (and doesn't need polling), and wake up the queue thread to

>> finish up everything else and start the next request.

> 

> I think this is what it does at the end of the patch series, patch 15/16.

> I have to split it somehow...

> 

>> For the blk-mq port, the queue thread should also be retained, partly

>> because it solves some synchronization problems, but mostly because, at this

>> stage, we anyway don't have solutions for all the different ways the driver

>> can block.

>> (as listed here https://marc.info/?l=linux-mmc&m=148336571720463&w=2 )

> 

> Essentially I take out that thread and replace it with this one worker

> introduced in this very patch. I agree the driver can block in many ways

> and that is why I need to have it running in process context, and this

> is what the worker introduced here provides.


The last time I looked at the blk-mq I/O scheduler code, it pulled up to
qdepth requests from the I/O scheduler and left them on a local list while
running ->queue_rq().  That means blocking in ->queue_rq() leaves some
number of requests in limbo (not issued but also not in the I/O scheduler)
for that time.

Maybe blk-mq should offer a pull interface to I/O scheduler users?

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe March 10, 2017, 10:05 p.m. UTC | #5
On 03/10/2017 07:21 AM, Adrian Hunter wrote:
>> Essentially I take out that thread and replace it with this one worker

>> introduced in this very patch. I agree the driver can block in many ways

>> and that is why I need to have it running in process context, and this

>> is what the worker introduced here provides.

> 

> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

> qdepth requests from the I/O scheduler and left them on a local list while

> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

> number of requests in limbo (not issued but also not in the I/O scheduler)

> for that time.


Look again, if we're not handling the requeued dispatches, we pull one
at the time from the scheduler.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adrian Hunter March 13, 2017, 9:25 a.m. UTC | #6
On 11/03/17 00:05, Jens Axboe wrote:
> On 03/10/2017 07:21 AM, Adrian Hunter wrote:

>>> Essentially I take out that thread and replace it with this one worker

>>> introduced in this very patch. I agree the driver can block in many ways

>>> and that is why I need to have it running in process context, and this

>>> is what the worker introduced here provides.

>>

>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

>> qdepth requests from the I/O scheduler and left them on a local list while

>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

>> number of requests in limbo (not issued but also not in the I/O scheduler)

>> for that time.

> 

> Look again, if we're not handling the requeued dispatches, we pull one

> at the time from the scheduler.

> 


That's good :-)

Now the next thing ;-)

It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of
issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING
in which case we are never allowed to sleep in ->queue_rq().  Is that true?

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe March 13, 2017, 2:19 p.m. UTC | #7
On 03/13/2017 03:25 AM, Adrian Hunter wrote:
> On 11/03/17 00:05, Jens Axboe wrote:

>> On 03/10/2017 07:21 AM, Adrian Hunter wrote:

>>>> Essentially I take out that thread and replace it with this one worker

>>>> introduced in this very patch. I agree the driver can block in many ways

>>>> and that is why I need to have it running in process context, and this

>>>> is what the worker introduced here provides.

>>>

>>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

>>> qdepth requests from the I/O scheduler and left them on a local list while

>>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

>>> number of requests in limbo (not issued but also not in the I/O scheduler)

>>> for that time.

>>

>> Look again, if we're not handling the requeued dispatches, we pull one

>> at the time from the scheduler.

>>

> 

> That's good :-)

> 

> Now the next thing ;-)

> 

> It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of

> issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING

> in which case we are never allowed to sleep in ->queue_rq().  Is that true?


Only one of those statements is true - if you don't set BLK_MQ_F_BLOCKING,
then you may never block in your ->queue_rq() function. But if you do set it,
it does not preclude immediate issue of sync requests.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adrian Hunter March 14, 2017, 12:59 p.m. UTC | #8
On 13/03/17 16:19, Jens Axboe wrote:
> On 03/13/2017 03:25 AM, Adrian Hunter wrote:

>> On 11/03/17 00:05, Jens Axboe wrote:

>>> On 03/10/2017 07:21 AM, Adrian Hunter wrote:

>>>>> Essentially I take out that thread and replace it with this one worker

>>>>> introduced in this very patch. I agree the driver can block in many ways

>>>>> and that is why I need to have it running in process context, and this

>>>>> is what the worker introduced here provides.

>>>>

>>>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

>>>> qdepth requests from the I/O scheduler and left them on a local list while

>>>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

>>>> number of requests in limbo (not issued but also not in the I/O scheduler)

>>>> for that time.

>>>

>>> Look again, if we're not handling the requeued dispatches, we pull one

>>> at the time from the scheduler.

>>>

>>

>> That's good :-)

>>

>> Now the next thing ;-)

>>

>> It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of

>> issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING

>> in which case we are never allowed to sleep in ->queue_rq().  Is that true?

> 

> Only one of those statements is true - if you don't set BLK_MQ_F_BLOCKING,

> then you may never block in your ->queue_rq() function. But if you do set it,

> it does not preclude immediate issue of sync requests.


I meant it gets put to the workqueue rather than issued in the context of
the submitter.

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe March 14, 2017, 2:36 p.m. UTC | #9
On 03/14/2017 06:59 AM, Adrian Hunter wrote:
> On 13/03/17 16:19, Jens Axboe wrote:

>> On 03/13/2017 03:25 AM, Adrian Hunter wrote:

>>> On 11/03/17 00:05, Jens Axboe wrote:

>>>> On 03/10/2017 07:21 AM, Adrian Hunter wrote:

>>>>>> Essentially I take out that thread and replace it with this one worker

>>>>>> introduced in this very patch. I agree the driver can block in many ways

>>>>>> and that is why I need to have it running in process context, and this

>>>>>> is what the worker introduced here provides.

>>>>>

>>>>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

>>>>> qdepth requests from the I/O scheduler and left them on a local list while

>>>>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

>>>>> number of requests in limbo (not issued but also not in the I/O scheduler)

>>>>> for that time.

>>>>

>>>> Look again, if we're not handling the requeued dispatches, we pull one

>>>> at the time from the scheduler.

>>>>

>>>

>>> That's good :-)

>>>

>>> Now the next thing ;-)

>>>

>>> It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of

>>> issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING

>>> in which case we are never allowed to sleep in ->queue_rq().  Is that true?

>>

>> Only one of those statements is true - if you don't set BLK_MQ_F_BLOCKING,

>> then you may never block in your ->queue_rq() function. But if you do set it,

>> it does not preclude immediate issue of sync requests.

> 

> I meant it gets put to the workqueue rather than issued in the context of

> the submitter.


There's one case that doesn't look like it was converted properly, but
that's a mistake. The general insert-and-run cases run inline if we can,
but the direct-issue needs a fixup, see below.



-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.htmldiff --git a/block/blk-mq.c b/block/blk-mq.c
index 159187a28d66..4196d6bee92d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1434,7 +1434,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
+				      bool can_block)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1475,7 +1476,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, true, true, false);
+	blk_mq_sched_insert_request(rq, false, true, false, can_block);
 }
 
 /*
@@ -1569,11 +1570,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
 			rcu_read_lock();
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, false);
 			rcu_read_unlock();
 		} else {
 			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, true);
 			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
 		}
 		goto done;

Christoph Hellwig March 14, 2017, 2:43 p.m. UTC | #10
On Tue, Mar 14, 2017 at 08:36:26AM -0600, Jens Axboe wrote:
> There's one case that doesn't look like it was converted properly, but

> that's a mistake. The general insert-and-run cases run inline if we can,

> but the direct-issue needs a fixup, see below.


Note that blk_mq_try_issue_directly is only called for the multiple
hardware queue case, so MMC would not hit it.
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe March 14, 2017, 2:52 p.m. UTC | #11
On 03/14/2017 08:43 AM, Christoph Hellwig wrote:
> On Tue, Mar 14, 2017 at 08:36:26AM -0600, Jens Axboe wrote:

>> There's one case that doesn't look like it was converted properly, but

>> that's a mistake. The general insert-and-run cases run inline if we can,

>> but the direct-issue needs a fixup, see below.

> 

> Note that blk_mq_try_issue_directly is only called for the multiple

> hardware queue case, so MMC would not hit it.


Right, which is why I said that the general case works fine, it's
only the explicit issue-direct that currently does not. Not by
design, just an error.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij March 28, 2017, 7:46 a.m. UTC | #12
On Fri, Mar 10, 2017 at 3:21 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 10/03/17 00:49, Linus Walleij wrote:

>> On Wed, Feb 22, 2017 at 2:29 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

>>> On 09/02/17 17:33, Linus Walleij wrote:


>>>> This is a central change that let us do many other changes since

>>>> we have broken the submit and complete code paths in two, and we

>>>> can potentially remove the NULL flushing of the asynchronous

>>>> pipeline and report block requests as finished directly from

>>>> the worker.

>>>

>>> This needs more thought.  The completion should go straight to the mmc block

>>> driver from the ->done() callback.  And from there straight back to the

>>> block layer if recovery is not needed.  We want to stop using

>>> mmc_start_areq() altogether because we never want to wait - we always want

>>> to issue (if possible) and return.

>>

>> I don't quite follow this. Isn't what you request exactly what

>> patch 15/16 "mmc: queue: issue requests in massive parallel"

>> is doing?

>

> There is the latency for the worker that runs mmc_finalize_areq() and then

> another latency to wake up the worker that is running mmc_start_areq().

> That is 2 wake-ups instead of 1.


That is correct (though the measured effect is small).

However when we switch to MQ it must happen like this due to its asynchronous
nature of issuing requests to us.

Then we have MQ's submission thread coming in from one en and our worker
to manage retries and errors on the other side. We obviously cannot do
the retries and resends in MQs context as it blocks subsequent requests.

> As a side note, ideally we would be able to issue the next request from the

> interrupt or soft interrupt context of the completion (i.e. 0 wake-ups

> between requests), but we would probably have to look at the host API to

> support that.


I looked at that and couldn't find a good way to get to that point.

Mainly because of mmc_start_bkops() that may
arbitrarily fire after every command and start new requests to the
card, and that of course require a process context to happen. Then there
is the retune thing that I do not fully understand how it schedules, but
it might be fine since I'm under the impression that it is done at the
start of the next request if need be. Maybe both can be overcome by
quick checks in IRQ context and then this can be done. (I'm not smart enough
to see that yet, sorry.)

However since we activate the blocking context in MQ I don't know
if it can even deal with having requests completed in interrupt context
so that the next thing that happens after completing the request and
returning from the interrupt is that the block layer thread gets scheduled
(unless something more important is going on), I guess it is possible?
It looks like it could be really efficient.

>>> For the blk-mq port, the queue thread should also be retained, partly

>>> because it solves some synchronization problems, but mostly because, at this

>>> stage, we anyway don't have solutions for all the different ways the driver

>>> can block.

>>> (as listed here https://marc.info/?l=linux-mmc&m=148336571720463&w=2 )

>>

>> Essentially I take out that thread and replace it with this one worker

>> introduced in this very patch. I agree the driver can block in many ways

>> and that is why I need to have it running in process context, and this

>> is what the worker introduced here provides.

>

> The last time I looked at the blk-mq I/O scheduler code, it pulled up to

> qdepth requests from the I/O scheduler and left them on a local list while

> running ->queue_rq().  That means blocking in ->queue_rq() leaves some

> number of requests in limbo (not issued but also not in the I/O scheduler)

> for that time.


I think Jens provided a patch for this bug (don't see the patch
upstream though).

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij March 28, 2017, 7:47 a.m. UTC | #13
On Tue, Mar 14, 2017 at 3:36 PM, Jens Axboe <axboe@kernel.dk> wrote:

> There's one case that doesn't look like it was converted properly, but

> that's a mistake. The general insert-and-run cases run inline if we can,

> but the direct-issue needs a fixup, see below.

>

>

> diff --git a/block/blk-mq.c b/block/blk-mq.c

> index 159187a28d66..4196d6bee92d 100644

> --- a/block/blk-mq.c

> +++ b/block/blk-mq.c

> @@ -1434,7 +1434,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)

>         return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);

>  }

>

> -static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)

> +static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,

> +                                     bool can_block)

>  {

>         struct request_queue *q = rq->q;

>         struct blk_mq_queue_data bd = {

> @@ -1475,7 +1476,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)

>         }

>

>  insert:

> -       blk_mq_sched_insert_request(rq, false, true, true, false);

> +       blk_mq_sched_insert_request(rq, false, true, false, can_block);

>  }

>

>  /*

> @@ -1569,11 +1570,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)

>

>                 if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {

>                         rcu_read_lock();

> -                       blk_mq_try_issue_directly(old_rq, &cookie);

> +                       blk_mq_try_issue_directly(old_rq, &cookie, false);

>                         rcu_read_unlock();

>                 } else {

>                         srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);

> -                       blk_mq_try_issue_directly(old_rq, &cookie);

> +                       blk_mq_try_issue_directly(old_rq, &cookie, true);

>                         srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);

>                 }

>                 goto done;


Jens do you have this in your patch queue or is it something you want
us to test and
submit back to you?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox series

Patch

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c49c90dba839..c459d80c66bf 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1562,6 +1562,8 @@  static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 
 	mqrq->areq.mrq = &brq->mrq;
 	mqrq->areq.err_check = mmc_blk_err_check;
+	mqrq->areq.host = card->host;
+	kthread_init_work(&mqrq->areq.finalization_work, mmc_finalize_areq);
 
 	mmc_queue_bounce_pre(mqrq);
 }
@@ -1672,8 +1674,6 @@  static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 			 * and there is nothing more to do until it is
 			 * complete.
 			 */
-			if (status == MMC_BLK_NEW_REQUEST)
-				mq->new_request = true;
 			return;
 		}
 
@@ -1811,7 +1811,6 @@  void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
 		goto out;
 	}
 
-	mq->new_request = false;
 	if (req && req_op(req) == REQ_OP_DISCARD) {
 		/* complete ongoing async transfer before issuing discard */
 		if (card->host->areq)
@@ -1832,7 +1831,7 @@  void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
 	}
 
 out:
-	if ((!req && !mq->new_request) || req_is_special)
+	if (!req || req_is_special)
 		/*
 		 * Release host when there are no more requests
 		 * and after special request(discard, flush) is done.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 663799240635..8ecf61e51662 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -415,10 +415,13 @@  EXPORT_SYMBOL(mmc_start_bkops);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-	struct mmc_context_info *context_info = &mrq->host->context_info;
+	struct mmc_host *host = mrq->host;
+	struct mmc_context_info *context_info = &host->context_info;
+	struct mmc_async_req *areq = host->areq;
 
 	context_info->is_done_rcv = true;
-	wake_up_interruptible(&context_info->wait);
+	/* Schedule a work to deal with finalizing this request */
+	kthread_queue_work(&host->req_done_worker, &areq->finalization_work);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
@@ -592,43 +595,34 @@  static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
  *	will return MMC_BLK_SUCCESS if no request was
  *	going on.
  */
-static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
+void mmc_finalize_areq(struct kthread_work *work)
 {
+	struct mmc_async_req *areq =
+		container_of(work, struct mmc_async_req, finalization_work);
+	struct mmc_host *host = areq->host;
 	struct mmc_context_info *context_info = &host->context_info;
-	enum mmc_blk_status status;
-
-	if (!host->areq)
-		return MMC_BLK_SUCCESS;
-
-	while (1) {
-		wait_event_interruptible(context_info->wait,
-				(context_info->is_done_rcv ||
-				 context_info->is_new_req));
+	enum mmc_blk_status status = MMC_BLK_SUCCESS;
 
-		if (context_info->is_done_rcv) {
-			struct mmc_command *cmd;
+	if (context_info->is_done_rcv) {
+		struct mmc_command *cmd;
 
-			context_info->is_done_rcv = false;
-			cmd = host->areq->mrq->cmd;
+		context_info->is_done_rcv = false;
+		cmd = areq->mrq->cmd;
 
-			if (!cmd->error || !cmd->retries ||
-			    mmc_card_removed(host->card)) {
-				status = host->areq->err_check(host->card,
-							       host->areq);
-				break; /* return status */
-			} else {
-				mmc_retune_recheck(host);
-				pr_info("%s: req failed (CMD%u): %d, retrying...\n",
-					mmc_hostname(host),
-					cmd->opcode, cmd->error);
-				cmd->retries--;
-				cmd->error = 0;
-				__mmc_start_request(host, host->areq->mrq);
-				continue; /* wait for done/new event again */
-			}
+		if (!cmd->error || !cmd->retries ||
+		    mmc_card_removed(host->card)) {
+			status = areq->err_check(host->card,
+						 areq);
+		} else {
+			mmc_retune_recheck(host);
+			pr_info("%s: req failed (CMD%u): %d, retrying...\n",
+				mmc_hostname(host),
+				cmd->opcode, cmd->error);
+			cmd->retries--;
+			cmd->error = 0;
+			__mmc_start_request(host, areq->mrq);
+			return; /* wait for done/new event again */
 		}
-
-		return MMC_BLK_NEW_REQUEST;
 	}
 
 	mmc_retune_release(host);
@@ -644,10 +638,12 @@  static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
 	}
 
 	/* Successfully postprocess the old request at this point */
-	mmc_post_req(host, host->areq->mrq, 0);
+	mmc_post_req(host, areq->mrq, 0);
 
-	return status;
+	areq->finalization_status = status;
+	complete(&areq->complete);
 }
+EXPORT_SYMBOL(mmc_finalize_areq);
 
 /**
  *	mmc_start_areq - start an asynchronous request
@@ -677,18 +673,21 @@  struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
 	if (areq)
 		mmc_pre_req(host, areq->mrq);
 
-	/* Finalize previous request */
-	status = mmc_finalize_areq(host);
+	/* Finalize previous request, if there is one */
+	if (previous) {
+		wait_for_completion(&previous->complete);
+		status = previous->finalization_status;
+	} else {
+		status = MMC_BLK_SUCCESS;
+	}
 	if (ret_stat)
 		*ret_stat = status;
 
-	/* The previous request is still going on... */
-	if (status == MMC_BLK_NEW_REQUEST)
-		return NULL;
-
 	/* Fine so far, start the new request! */
-	if (status == MMC_BLK_SUCCESS && areq)
+	if (status == MMC_BLK_SUCCESS && areq) {
+		init_completion(&areq->complete);
 		start_err = __mmc_start_data_req(host, areq->mrq);
+	}
 
 	/* Cancel a prepared request if it was not started. */
 	if ((status != MMC_BLK_SUCCESS || start_err) && areq)
@@ -2996,7 +2995,6 @@  void mmc_init_context_info(struct mmc_host *host)
 	host->context_info.is_new_req = false;
 	host->context_info.is_done_rcv = false;
 	host->context_info.is_waiting_last_req = false;
-	init_waitqueue_head(&host->context_info.wait);
 }
 
 static int __init mmc_init(void)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 5cb369c2664b..73250ed8f093 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -86,11 +86,6 @@  static int mmc_queue_thread(void *d)
 			set_current_state(TASK_RUNNING);
 			mmc_blk_issue_rq(mq, req);
 			cond_resched();
-			if (mq->new_request) {
-				mq->new_request = false;
-				continue; /* fetch again */
-			}
-
 			/*
 			 * Current request becomes previous request
 			 * and vice versa.
@@ -143,7 +138,6 @@  static void mmc_request_fn(struct request_queue *q)
 
 	if (cntx->is_waiting_last_req) {
 		cntx->is_new_req = true;
-		wake_up_interruptible(&cntx->wait);
 	}
 
 	if (mq->asleep)
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index e298f100101b..39d8e710287e 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -40,7 +40,6 @@  struct mmc_queue {
 	struct mmc_card		*card;
 	struct task_struct	*thread;
 	struct semaphore	thread_sem;
-	bool			new_request;
 	bool			suspended;
 	bool			asleep;
 	struct mmc_blk_data	*blkdata;
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index a0c63ea28796..5db0fb722c37 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -9,6 +9,7 @@ 
 #define LINUX_MMC_CORE_H
 
 #include <linux/completion.h>
+#include <linux/kthread.h>
 #include <linux/types.h>
 
 struct mmc_data;
@@ -23,7 +24,6 @@  enum mmc_blk_status {
 	MMC_BLK_DATA_ERR,
 	MMC_BLK_ECC_ERR,
 	MMC_BLK_NOMEDIUM,
-	MMC_BLK_NEW_REQUEST,
 };
 
 struct mmc_command {
@@ -158,6 +158,7 @@  struct mmc_request {
 struct mmc_card;
 struct mmc_async_req;
 
+void mmc_finalize_areq(struct kthread_work *work);
 struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
 				struct mmc_async_req *areq,
 				enum mmc_blk_status *ret_stat);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index b04f8cd51c82..c5f61f2f2310 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -14,6 +14,7 @@ 
 #include <linux/device.h>
 #include <linux/fault-inject.h>
 #include <linux/kthread.h>
+#include <linux/completion.h>
 
 #include <linux/mmc/core.h>
 #include <linux/mmc/card.h>
@@ -169,6 +170,10 @@  struct mmc_async_req {
 	 * Returns 0 if success otherwise non zero.
 	 */
 	enum mmc_blk_status (*err_check)(struct mmc_card *, struct mmc_async_req *);
+	struct kthread_work finalization_work;
+	enum mmc_blk_status finalization_status;
+	struct completion complete;
+	struct mmc_host *host;
 };
 
 /**
@@ -192,13 +197,11 @@  struct mmc_slot {
  * @is_done_rcv		wake up reason was done request
  * @is_new_req		wake up reason was new request
  * @is_waiting_last_req	mmc context waiting for single running request
- * @wait		wait queue
  */
 struct mmc_context_info {
 	bool			is_done_rcv;
 	bool			is_new_req;
 	bool			is_waiting_last_req;
-	wait_queue_head_t	wait;
 };
 
 struct regulator;