Message ID: 20220922182805.96173-2-axboe@kernel.dk
State: Superseded
Series: Enable alloc caching and batched freeing for passthrough
On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
> The filesystem IO path can take advantage of allocating batches of
> requests, if the underlying submitter tells the block layer about it
> through the blk_plug. For passthrough IO, the exported API is the
> blk_mq_alloc_request() helper, and that one does not allow for
> request caching.
>
> Wire up request caching for blk_mq_alloc_request(), which is generally
> done without having a bio available upfront.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 71 insertions(+), 9 deletions(-)
>
I think we need this patch to ensure correct behaviour for passthrough:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d66163..840541c1ab40 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
         WARN_ON(!blk_rq_is_passthrough(rq));
 
         blk_account_io_start(rq);
-        if (current->plug)
+        if (blk_mq_plug(rq->bio))
                 blk_add_rq_to_plug(current->plug, rq);
         else
                 blk_mq_sched_insert_request(rq, at_head, true, false);

As the passthrough path can now support request caching via blk_mq_alloc_request(),
and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
devices:

static inline struct blk_plug *blk_mq_plug( struct bio *bio)
{
        /* Zoned block device write operation case: do not plug the BIO */
        if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
                return NULL;
..

Am I missing something?
On 2022-09-23 16:52, Pankaj Raghav wrote:
> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>> The filesystem IO path can take advantage of allocating batches of
>> requests, if the underlying submitter tells the block layer about it
>> through the blk_plug. For passthrough IO, the exported API is the
>> blk_mq_alloc_request() helper, and that one does not allow for
>> request caching.
>>
>> Wire up request caching for blk_mq_alloc_request(), which is generally
>> done without having a bio available upfront.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>
> I think we need this patch to ensure correct behaviour for passthrough:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c11949d66163..840541c1ab40 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>          WARN_ON(!blk_rq_is_passthrough(rq));
>
>          blk_account_io_start(rq);
> -        if (current->plug)
> +        if (blk_mq_plug(rq->bio))
>                  blk_add_rq_to_plug(current->plug, rq);
>          else
>                  blk_mq_sched_insert_request(rq, at_head, true, false);
>
> As the passthrough path can now support request caching via blk_mq_alloc_request(),
> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
> devices:
>
> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
> {
>         /* Zoned block device write operation case: do not plug the BIO */
>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>                 return NULL;
> ..

Thinking more about it, even this will not fix it because op is
REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.

@Damien Should the condition in blk_mq_plug() be changed to:

static inline struct blk_plug *blk_mq_plug( struct bio *bio)
{
        /* Zoned block device write operation case: do not plug the BIO */
        if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
                return NULL;

>
> Am I missing something?
On 9/23/22 9:13 AM, Pankaj Raghav wrote:
> On 2022-09-23 16:52, Pankaj Raghav wrote:
>> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>>> The filesystem IO path can take advantage of allocating batches of
>>> requests, if the underlying submitter tells the block layer about it
>>> through the blk_plug. For passthrough IO, the exported API is the
>>> blk_mq_alloc_request() helper, and that one does not allow for
>>> request caching.
>>>
>>> Wire up request caching for blk_mq_alloc_request(), which is generally
>>> done without having a bio available upfront.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> ---
>>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>>
>> I think we need this patch to ensure correct behaviour for passthrough:
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index c11949d66163..840541c1ab40 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>>          WARN_ON(!blk_rq_is_passthrough(rq));
>>
>>          blk_account_io_start(rq);
>> -        if (current->plug)
>> +        if (blk_mq_plug(rq->bio))
>>                  blk_add_rq_to_plug(current->plug, rq);
>>          else
>>                  blk_mq_sched_insert_request(rq, at_head, true, false);
>>
>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>> devices:
>>
>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>> {
>>         /* Zoned block device write operation case: do not plug the BIO */
>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>                 return NULL;
>> ..
>
> Thinking more about it, even this will not fix it because op is
> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>
> @Damien Should the condition in blk_mq_plug() be changed to:
>
> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
> {
>         /* Zoned block device write operation case: do not plug the BIO */
>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>                 return NULL;

That looks reasonable to me. It'll prevent plug optimizations even
for passthrough on zoned devices, but that's probably fine.
On 9/24/22 05:54, Jens Axboe wrote:
> On 9/23/22 9:13 AM, Pankaj Raghav wrote:
>> On 2022-09-23 16:52, Pankaj Raghav wrote:
>>> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>>>> The filesystem IO path can take advantage of allocating batches of
>>>> requests, if the underlying submitter tells the block layer about it
>>>> through the blk_plug. For passthrough IO, the exported API is the
>>>> blk_mq_alloc_request() helper, and that one does not allow for
>>>> request caching.
>>>>
>>>> Wire up request caching for blk_mq_alloc_request(), which is generally
>>>> done without having a bio available upfront.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>>>
>>> I think we need this patch to ensure correct behaviour for passthrough:
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index c11949d66163..840541c1ab40 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>>>          WARN_ON(!blk_rq_is_passthrough(rq));
>>>
>>>          blk_account_io_start(rq);
>>> -        if (current->plug)
>>> +        if (blk_mq_plug(rq->bio))
>>>                  blk_add_rq_to_plug(current->plug, rq);
>>>          else
>>>                  blk_mq_sched_insert_request(rq, at_head, true, false);
>>>
>>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>>> devices:
>>>
>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>> {
>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>                 return NULL;
>>> ..
>>
>> Thinking more about it, even this will not fix it because op is
>> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>>
>> @Damien Should the condition in blk_mq_plug() be changed to:
>>
>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>> {
>>         /* Zoned block device write operation case: do not plug the BIO */
>>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>                 return NULL;
>
> That looks reasonable to me. It'll prevent plug optimizations even
> for passthrough on zoned devices, but that's probably fine.

Could do:

        if (blk_op_is_passthrough(bio_op(bio)) ||
            (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))))
                return NULL;

Which I think is way cleaner. No ?
Unless you want to preserve plugging with passthrough commands on regular
(not zoned) drives ?
On 9/23/22 6:59 PM, Damien Le Moal wrote:
> On 9/24/22 05:54, Jens Axboe wrote:
>> On 9/23/22 9:13 AM, Pankaj Raghav wrote:
>>> On 2022-09-23 16:52, Pankaj Raghav wrote:
>>>> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>>>>> The filesystem IO path can take advantage of allocating batches of
>>>>> requests, if the underlying submitter tells the block layer about it
>>>>> through the blk_plug. For passthrough IO, the exported API is the
>>>>> blk_mq_alloc_request() helper, and that one does not allow for
>>>>> request caching.
>>>>>
>>>>> Wire up request caching for blk_mq_alloc_request(), which is generally
>>>>> done without having a bio available upfront.
>>>>>
>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>> ---
>>>>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>>>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>>>>
>>>> I think we need this patch to ensure correct behaviour for passthrough:
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index c11949d66163..840541c1ab40 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>>>>          WARN_ON(!blk_rq_is_passthrough(rq));
>>>>
>>>>          blk_account_io_start(rq);
>>>> -        if (current->plug)
>>>> +        if (blk_mq_plug(rq->bio))
>>>>                  blk_add_rq_to_plug(current->plug, rq);
>>>>          else
>>>>                  blk_mq_sched_insert_request(rq, at_head, true, false);
>>>>
>>>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>>>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>>>> devices:
>>>>
>>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>>> {
>>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>>                 return NULL;
>>>> ..
>>>
>>> Thinking more about it, even this will not fix it because op is
>>> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>>>
>>> @Damien Should the condition in blk_mq_plug() be changed to:
>>>
>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>> {
>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>>                 return NULL;
>>
>> That looks reasonable to me. It'll prevent plug optimizations even
>> for passthrough on zoned devices, but that's probably fine.
>
> Could do:
>
>         if (blk_op_is_passthrough(bio_op(bio)) ||
>             (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))))
>                 return NULL;
>
> Which I think is way cleaner. No ?
> Unless you want to preserve plugging with passthrough commands on regular
> (not zoned) drives ?

We most certainly do, without plugging this whole patchset is not
functional. Nor is batched dispatch, for example.
On 9/24/22 10:01, Jens Axboe wrote:
> On 9/23/22 6:59 PM, Damien Le Moal wrote:
>> On 9/24/22 05:54, Jens Axboe wrote:
>>> On 9/23/22 9:13 AM, Pankaj Raghav wrote:
>>>> On 2022-09-23 16:52, Pankaj Raghav wrote:
>>>>> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>>>>>> The filesystem IO path can take advantage of allocating batches of
>>>>>> requests, if the underlying submitter tells the block layer about it
>>>>>> through the blk_plug. For passthrough IO, the exported API is the
>>>>>> blk_mq_alloc_request() helper, and that one does not allow for
>>>>>> request caching.
>>>>>>
>>>>>> Wire up request caching for blk_mq_alloc_request(), which is generally
>>>>>> done without having a bio available upfront.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>> ---
>>>>>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>>>>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>>>>>
>>>>> I think we need this patch to ensure correct behaviour for passthrough:
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index c11949d66163..840541c1ab40 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>>>>>          WARN_ON(!blk_rq_is_passthrough(rq));
>>>>>
>>>>>          blk_account_io_start(rq);
>>>>> -        if (current->plug)
>>>>> +        if (blk_mq_plug(rq->bio))
>>>>>                  blk_add_rq_to_plug(current->plug, rq);
>>>>>          else
>>>>>                  blk_mq_sched_insert_request(rq, at_head, true, false);
>>>>>
>>>>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>>>>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>>>>> devices:
>>>>>
>>>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>>>> {
>>>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>>>                 return NULL;
>>>>> ..
>>>>
>>>> Thinking more about it, even this will not fix it because op is
>>>> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>>>>
>>>> @Damien Should the condition in blk_mq_plug() be changed to:
>>>>
>>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>>> {
>>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>>>                 return NULL;
>>>
>>> That looks reasonable to me. It'll prevent plug optimizations even
>>> for passthrough on zoned devices, but that's probably fine.
>>
>> Could do:
>>
>>         if (blk_op_is_passthrough(bio_op(bio)) ||
>>             (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))))
>>                 return NULL;
>>
>> Which I think is way cleaner. No ?
>> Unless you want to preserve plugging with passthrough commands on regular
>> (not zoned) drives ?
>
> We most certainly do, without plugging this whole patchset is not
> functional. Nor is batched dispatch, for example.

OK. Then the change to !op_is_read() is fine then.
>>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>>> devices:
>>>
>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>> {
>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>                 return NULL;
>>> ..
>>
>> Thinking more about it, even this will not fix it because op is
>> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>>
>> @Damien Should the condition in blk_mq_plug() be changed to:
>>
>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>> {
>>         /* Zoned block device write operation case: do not plug the BIO */
>>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>                 return NULL;
>
> That looks reasonable to me. It'll prevent plug optimizations even
> for passthrough on zoned devices, but that's probably fine.
>

Do you want me to send a separate patch for this change, or will you fold it
into the existing series?

--
Pankaj
On 9/24/22 5:56 AM, Pankaj Raghav wrote:
>>>> As the passthrough path can now support request caching via blk_mq_alloc_request(),
>>>> and it uses blk_execute_rq_nowait(), bad things can happen at least for zoned
>>>> devices:
>>>>
>>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>>> {
>>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>>         if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>>                 return NULL;
>>>> ..
>>>
>>> Thinking more about it, even this will not fix it because op is
>>> REQ_OP_DRV_OUT if it is a NVMe write for passthrough requests.
>>>
>>> @Damien Should the condition in blk_mq_plug() be changed to:
>>>
>>> static inline struct blk_plug *blk_mq_plug( struct bio *bio)
>>> {
>>>         /* Zoned block device write operation case: do not plug the BIO */
>>>         if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>>                 return NULL;
>>
>> That looks reasonable to me. It'll prevent plug optimizations even
>> for passthrough on zoned devices, but that's probably fine.
>>
>
> Do you want me to send a separate patch for this change, or will you fold it
> into the existing series?

Probably cleaner as a separate patch, would be great if you could send one.
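For reference, a minimal sketch of what that follow-up could look like, assuming the single-argument blk_mq_plug() quoted above and that the helper otherwise falls back to current->plug; the op_is_read()-style test from the thread is written as an explicit comparison against REQ_OP_READ so nothing beyond the operations already named here is assumed. This is only an illustration, not the patch that was eventually sent:

/*
 * Sketch only: on zoned devices, plug nothing but regular reads, so that
 * both regular writes and passthrough (REQ_OP_DRV_OUT) submissions bypass
 * the plug; everything else keeps using the per-task plug.
 */
static inline struct blk_plug *blk_mq_plug(struct bio *bio)
{
        /* Zoned block device: only regular reads may be plugged */
        if (bdev_is_zoned(bio->bi_bdev) && bio_op(bio) != REQ_OP_READ)
                return NULL;

        /*
         * Regular devices, or reads on zoned devices: use the context
         * plug, which may be NULL if blk_start_plug() was not executed.
         */
        return current->plug;
}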
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d66163..d3a9f8b9c7ee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -510,25 +510,87 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
                                        alloc_time_ns);
 }
 
-struct request *blk_mq_alloc_request(struct request_queue *q, blk_opf_t opf,
-                blk_mq_req_flags_t flags)
+static struct request *blk_mq_rq_cache_fill(struct request_queue *q,
+                                            struct blk_plug *plug,
+                                            blk_opf_t opf,
+                                            blk_mq_req_flags_t flags)
 {
         struct blk_mq_alloc_data data = {
                 .q              = q,
                 .flags          = flags,
                 .cmd_flags      = opf,
-                .nr_tags        = 1,
+                .nr_tags        = plug->nr_ios,
+                .cached_rq      = &plug->cached_rq,
         };
         struct request *rq;
-        int ret;
 
-        ret = blk_queue_enter(q, flags);
-        if (ret)
-                return ERR_PTR(ret);
+        if (blk_queue_enter(q, flags))
+                return NULL;
+
+        plug->nr_ios = 1;
 
         rq = __blk_mq_alloc_requests(&data);
-        if (!rq)
-                goto out_queue_exit;
+        if (unlikely(!rq))
+                blk_queue_exit(q);
+        return rq;
+}
+
+static struct request *blk_mq_alloc_cached_request(struct request_queue *q,
+                                                   blk_opf_t opf,
+                                                   blk_mq_req_flags_t flags)
+{
+        struct blk_plug *plug = current->plug;
+        struct request *rq;
+
+        if (!plug)
+                return NULL;
+        if (rq_list_empty(plug->cached_rq)) {
+                if (plug->nr_ios == 1)
+                        return NULL;
+                rq = blk_mq_rq_cache_fill(q, plug, opf, flags);
+                if (rq)
+                        goto got_it;
+                return NULL;
+        }
+        rq = rq_list_peek(&plug->cached_rq);
+        if (!rq || rq->q != q)
+                return NULL;
+
+        if (blk_mq_get_hctx_type(opf) != rq->mq_hctx->type)
+                return NULL;
+        if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
+                return NULL;
+
+        plug->cached_rq = rq_list_next(rq);
+got_it:
+        rq->cmd_flags = opf;
+        INIT_LIST_HEAD(&rq->queuelist);
+        return rq;
+}
+
+struct request *blk_mq_alloc_request(struct request_queue *q, blk_opf_t opf,
+                blk_mq_req_flags_t flags)
+{
+        struct request *rq;
+
+        rq = blk_mq_alloc_cached_request(q, opf, flags);
+        if (!rq) {
+                struct blk_mq_alloc_data data = {
+                        .q              = q,
+                        .flags          = flags,
+                        .cmd_flags      = opf,
+                        .nr_tags        = 1,
+                };
+                int ret;
+
+                ret = blk_queue_enter(q, flags);
+                if (ret)
+                        return ERR_PTR(ret);
+
+                rq = __blk_mq_alloc_requests(&data);
+                if (!rq)
+                        goto out_queue_exit;
+        }
         rq->__data_len = 0;
         rq->__sector = (sector_t) -1;
         rq->bio = rq->biotail = NULL;
The filesystem IO path can take advantage of allocating batches of
requests, if the underlying submitter tells the block layer about it
through the blk_plug. For passthrough IO, the exported API is the
blk_mq_alloc_request() helper, and that one does not allow for
request caching.

Wire up request caching for blk_mq_alloc_request(), which is generally
done without having a bio available upfront.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 9 deletions(-)
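To illustrate how a passthrough submitter is expected to consume this cache (an illustrative sketch, not code from the series): the caller sizes a plug for the batch with blk_start_plug_nr_ios(), after which the first blk_mq_alloc_request() call fills plug->cached_rq and the remaining allocations are served from it. prepare_drv_cmd() is a hypothetical placeholder for driver-specific setup, and the two-argument blk_execute_rq_nowait() matches the tree this series applies to.

/*
 * Sketch only: batch passthrough submission over the plug-backed request
 * cache added by this patch. prepare_drv_cmd() is a hypothetical driver
 * helper, not an existing kernel API.
 */
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Hypothetical driver-specific setup: fill in the passthrough command/data */
static void prepare_drv_cmd(struct request *rq)
{
}

static void submit_passthrough_batch(struct request_queue *q, unsigned short nr)
{
        struct blk_plug plug;
        unsigned short i;

        /* Size the plug for the batch so the request cache gets pre-filled */
        blk_start_plug_nr_ios(&plug, nr);

        for (i = 0; i < nr; i++) {
                struct request *rq;

                /* First call fills plug->cached_rq, later calls hit the cache */
                rq = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, 0);
                if (IS_ERR(rq))
                        break;

                prepare_drv_cmd(rq);
                /* rq->end_io could be set here for a completion callback */
                blk_execute_rq_nowait(rq, false);
        }

        /* Flushing the plug dispatches the batch and frees unused cached rqs */
        blk_finish_plug(&plug);
}

Note that the cached path in blk_mq_alloc_cached_request() only hands out a request when the hctx type and flush status match the cached entry; otherwise it returns NULL and blk_mq_alloc_request() falls back to the regular allocation path, as the diff above shows.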