Message ID | 20230809202355.1171455-1-bvanassche@acm.org |
---|---|
Headers | show |
Series | Improve performance for zoned UFS devices | expand |
On 8/10/23 05:23, Bart Van Assche wrote: > Writes in sequential write required zones must happen at the write > pointer. Even if the submitter of the write commands (e.g. a filesystem) > submits writes for sequential write required zones in order, the block > layer or the storage controller may reorder these write commands. > > The zone locking mechanism in the mq-deadline I/O scheduler serializes > write commands for sequential zones. Some but not all storage controllers > require this serialization. Introduce a new request queue limit member > variable to allow block drivers to indicate that they preserve the order > of write commands and thus do not require serialization of writes per > zone. > > Cc: Damien Le Moal <dlemoal@kernel.org> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Ming Lei <ming.lei@redhat.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > block/blk-settings.c | 6 ++++++ > include/linux/blkdev.h | 1 + > 2 files changed, 7 insertions(+) > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 0046b447268f..b75c97971860 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim) > lim->alignment_offset = 0; > lim->io_opt = 0; > lim->misaligned = 0; > + lim->use_zone_write_lock = true; > lim->zoned = BLK_ZONED_NONE; Given that the default for zoned is BLK_ZONED_NONE, having use_zone_write_lock default to true is strange. It would be better to set the default to false and have disk_set_zoned() set it to true if needed, with an additional argument to specify if it should be the case or not. E.g., for SMR drives, sd.c would call something like: disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock); sd.c would default to sdp->use_zone_write_lock == true and UFS driver can set it to false. That would be cleaner I think. > lim->zone_write_granularity = 0; > lim->dma_alignment = 511; > @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, > b->max_secure_erase_sectors); > t->zone_write_granularity = max(t->zone_write_granularity, > b->zone_write_granularity); > + /* > + * Whether or not the zone write lock should be used depends on the > + * bottom driver only. > + */ > + t->use_zone_write_lock = b->use_zone_write_lock; Given that DM bio targets do not have a scheduler and do not have a zone lock bitmap allocated, I do not think this is necessary at all. This can remain to false, thus in sync with the fact that there is no IO scheduler. > t->zoned = max(t->zoned, b->zoned); > return ret; > } > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 2f5371b8482c..deffa1f13444 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -316,6 +316,7 @@ struct queue_limits { > unsigned char misaligned; > unsigned char discard_misaligned; > unsigned char raid_partial_stripes_expensive; > + bool use_zone_write_lock; > enum blk_zoned_model zoned; > > /*
On 8/9/23 18:33, Damien Le Moal wrote: > On 8/10/23 05:23, Bart Van Assche wrote: >> Writes in sequential write required zones must happen at the write >> pointer. Even if the submitter of the write commands (e.g. a filesystem) >> submits writes for sequential write required zones in order, the block >> layer or the storage controller may reorder these write commands. >> >> The zone locking mechanism in the mq-deadline I/O scheduler serializes >> write commands for sequential zones. Some but not all storage controllers >> require this serialization. Introduce a new request queue limit member >> variable to allow block drivers to indicate that they preserve the order >> of write commands and thus do not require serialization of writes per >> zone. >> >> Cc: Damien Le Moal <dlemoal@kernel.org> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: Ming Lei <ming.lei@redhat.com> >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >> --- >> block/blk-settings.c | 6 ++++++ >> include/linux/blkdev.h | 1 + >> 2 files changed, 7 insertions(+) >> >> diff --git a/block/blk-settings.c b/block/blk-settings.c >> index 0046b447268f..b75c97971860 100644 >> --- a/block/blk-settings.c >> +++ b/block/blk-settings.c >> @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim) >> lim->alignment_offset = 0; >> lim->io_opt = 0; >> lim->misaligned = 0; >> + lim->use_zone_write_lock = true; >> lim->zoned = BLK_ZONED_NONE; > > Given that the default for zoned is BLK_ZONED_NONE, having use_zone_write_lock > default to true is strange. It would be better to set the default to false and > have disk_set_zoned() set it to true if needed, with an additional argument to > specify if it should be the case or not. E.g., for SMR drives, sd.c would call > something like: > > disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock); > > sd.c would default to sdp->use_zone_write_lock == true and UFS driver can set > it to false. That would be cleaner I think. Hi Damien, Thanks for the detailed feedback. My concerns about the above proposal are as follows: * The information about whether or not the zone write lock should be used comes from the block driver, e.g. a SCSI LLD. * sdp, the SCSI disk pointer, is owned by the ULD. * An ULD may be attached and detached multiple times during the lifetime of a logical unit without the LLD being informed about this. So how to set sdp->use_zone_write_lock without introducing a new callback or member variable in a data structure owned by the LLD? Hence my preference to store use_zone_write_lock in a data structure that has the same lifetime as the logical unit and not in any data structure controlled by the ULD. >> lim->zone_write_granularity = 0; >> lim->dma_alignment = 511; >> @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, >> b->max_secure_erase_sectors); >> t->zone_write_granularity = max(t->zone_write_granularity, >> b->zone_write_granularity); >> + /* >> + * Whether or not the zone write lock should be used depends on the >> + * bottom driver only. >> + */ >> + t->use_zone_write_lock = b->use_zone_write_lock; > > Given that DM bio targets do not have a scheduler and do not have a zone lock > bitmap allocated, I do not think this is necessary at all. This can remain to > false, thus in sync with the fact that there is no IO scheduler. How about the request-based dm drivers (dm-mpath and dm-target)? Isn't the dm-mpath driver request based because that allows the I/O scheduler to be configured on top of the dm-mpath driver? From https://lwn.net/Articles/274292/: "The basic idea to resolve the issue is to move multipathing layer down below the I/O scheduler, and it was proposed from Mike Christie as the block layer (request-based) multipath". Thanks, Bart.
Just as last time: comparing the lbas really has no business in the SCSI core.
On 8/10/23 17:39, Damien Le Moal wrote: > On 8/10/23 23:02, Bart Van Assche wrote: >> My concerns about the above proposal are as follows: >> * The information about whether or not the zone write lock should be used comes >> from the block driver, e.g. a SCSI LLD. > > Yes. > >> * sdp, the SCSI disk pointer, is owned by the ULD. >> * An ULD may be attached and detached multiple times during the lifetime of a >> logical unit without the LLD being informed about this. So how to set >> sdp->use_zone_write_lock without introducing a new callback or member variable >> in a data structure owned by the LLD? > > That would be set during device scan and device revalidate. And if the value > changes, then disk_set_zoned() should be called again to update the queue limit. > That is already what is done for the zoned limit indicating the type of the > drive. My point is that the zoned limit should dictate if use_zone_write_lock > can be true. The default should be be: > > q->limits.use_zone_write_lock = q->limits.zoned != BLK_ZONED_NONE. > > And as proposed, if the UFS driver wants to disable zone write locking, all it > needs to do is call "disk_set_zoned(disk, BLK_ZONED_HM, false)". I did not try > to actually code that, and the scsi disk driver may be in the way and that may > need to be done there, hence the suggestion of having a use_zone_write_lock flag > in the scsi device structure so that the UFS driver can set it as needed as well > (and detect changes when revalidating). That should work, but I may be missing > something. Hi Damien, Adding a use_zone_write_lock argument to disk_set_zoned() seems wrong to me. The information about whether or not to use the zone write lock comes from the LLD. The zone model is retrieved by the ULD. Since both pieces of information come from different drivers, both properties should be modified independently. Moving the use_zone_write_lock member variable into a data structure owned by the ULD seems wrong to me because that member variable is set by the LLD. Reviewers are allowed to request changes for well-designed and working code but should be able to explain why they request these changes. Can you please explain why you care about the value of q->limits.use_zone_write_lock for the BLK_ZONED_NONE case? If dd_use_zone_write_locking() would be renamed and would be moved into include/linux/blkdev.h and if reads of q->limits.use_zone_write_lock would be changed into blk_use_zone_write_locking() calls then the value of q->limits.use_zone_write_lock for the BLK_ZONED_NONE case wouldn't matter at all. Thanks, Bart.
On 8/12/23 00:41, Bart Van Assche wrote: > On 8/10/23 17:39, Damien Le Moal wrote: >> On 8/10/23 23:02, Bart Van Assche wrote: >>> My concerns about the above proposal are as follows: >>> * The information about whether or not the zone write lock should be used comes >>> from the block driver, e.g. a SCSI LLD. >> >> Yes. >> >>> * sdp, the SCSI disk pointer, is owned by the ULD. >>> * An ULD may be attached and detached multiple times during the lifetime of a >>> logical unit without the LLD being informed about this. So how to set >>> sdp->use_zone_write_lock without introducing a new callback or member variable >>> in a data structure owned by the LLD? >> >> That would be set during device scan and device revalidate. And if the value >> changes, then disk_set_zoned() should be called again to update the queue limit. >> That is already what is done for the zoned limit indicating the type of the >> drive. My point is that the zoned limit should dictate if use_zone_write_lock >> can be true. The default should be be: >> >> q->limits.use_zone_write_lock = q->limits.zoned != BLK_ZONED_NONE. >> >> And as proposed, if the UFS driver wants to disable zone write locking, all it >> needs to do is call "disk_set_zoned(disk, BLK_ZONED_HM, false)". I did not try >> to actually code that, and the scsi disk driver may be in the way and that may >> need to be done there, hence the suggestion of having a use_zone_write_lock flag >> in the scsi device structure so that the UFS driver can set it as needed as well >> (and detect changes when revalidating). That should work, but I may be missing >> something. > > Hi Damien, > > Adding a use_zone_write_lock argument to disk_set_zoned() seems wrong to > me. The information about whether or not to use the zone write lock > comes from the LLD. The zone model is retrieved by the ULD. Since both > pieces of information come from different drivers, both properties > should be modified independently. OK. But I still think that disk_set_zoned() is the place where the default for use_zone_write_lock should be set. And we need a clean way for the LLD to change use_zone_write_lock. > > Moving the use_zone_write_lock member variable into a data structure > owned by the ULD seems wrong to me because that member variable is set > by the LLD. > > Reviewers are allowed to request changes for well-designed and working > code but should be able to explain why they request these changes. Can > you please explain why you care about the value of > q->limits.use_zone_write_lock for the BLK_ZONED_NONE case? If Because use_zone_write_lock should ever be true for !BLK_ZONED_NONE case. Allowing use_zone_write_lock to be true even with zoned == BLK_ZONED_NONE forces to always check that the device is zoned to ensure that the value of use_zone_write_lock is valid. This is awckward and uselessly complicate things. > dd_use_zone_write_locking() would be renamed and would be moved into > include/linux/blkdev.h and if reads of q->limits.use_zone_write_lock > would be changed into blk_use_zone_write_locking() calls then the value > of q->limits.use_zone_write_lock for the BLK_ZONED_NONE case wouldn't > matter at all. Sure, that would work. But again, why the need to check both model and actual value of use_zone_write_lock ? I do not think it is that hard to keep use_zone_write_lock to false for the BLK_ZONED_NONE case. Then blk_use_zone_write_locking() is reduced to returning the value of use_zone_write_lock. > > Thanks, > > Bart.