Message ID: 20240202073104.2418230-1-dlemoal@kernel.org
Series: Zone write plugging
On 2/2/24 12:37 AM, Damien Le Moal wrote:
> On 2/2/24 16:30, Damien Le Moal wrote:
>> The patch series introduces zone write plugging (ZWP) as the new
>> mechanism to control the ordering of writes to zoned block devices.
>> ZWP replaces zone write locking (ZWL), which is implemented only by
>> mq-deadline today. ZWP also allows emulating zone append operations
>> using regular writes for zoned devices that do not natively support
>> this operation (e.g. SMR HDDs). This patch series removes the scsi
>> disk driver and device mapper zone append emulation in favor of ZWP
>> emulation.
>>
>> Unlike ZWL, which operates on requests, ZWP operates on BIOs. A zone
>> write plug is simply a BIO list that is atomically manipulated using
>> a spinlock and a kblockd submission work item. A write BIO to a zone
>> is "plugged" to delay its execution if a write BIO for the same zone
>> was already issued, that is, if a write request for the same zone is
>> being executed. The next plugged BIO is unplugged and issued once the
>> write request completes.
>>
>> This mechanism allows the following:
>> - Zone write ordering is untangled from the block IO schedulers.
>>   This allows removing the restriction on using only mq-deadline for
>>   zoned block devices. Any block IO scheduler, including "none", can
>>   be used.
>> - Zone write plugging operates on BIOs instead of requests. Plugged
>>   BIOs waiting for execution thus do not hold scheduling tags and so
>>   do not prevent other BIOs from being submitted to the device
>>   (reads or writes to other zones). Depending on the workload, this
>>   can significantly improve device usage and performance.
>> - Both blk-mq (request) based zoned devices and BIO-based devices
>>   (e.g. device mapper) can use ZWP. It is mandatory for the former
>>   but optional for the latter: BIO-based drivers can use zone write
>>   plugging to implement write ordering guarantees, or implement
>>   their own if needed.
>> - The code is less invasive in the block layer and in device
>>   drivers. The ZWP implementation is mostly limited to blk-zoned.c,
>>   with some small changes in blk-mq.c, blk-merge.c and bio.c.
>>
>> Performance evaluation results are shown below.
>>
>> The series is organized as follows:
>
> I forgot to mention that the patches are against Jens' block/for-next
> branch with the addition of Christoph's "clean up blk_mq_submit_bio"
> patches [1] and my patch "null_blk: Always split BIOs to respect
> queue limits" [2].

I figured that was the case, I'll get both of these properly set up in
a for-6.9/block branch, just wanted -rc3 to get cut first. JFYI that
they are coming tomorrow.
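The per-zone plug described in the cover letter above (a BIO list
behind a spinlock, at most one write in flight per zone) can be modeled
in a short, self-contained user-space C sketch. All names here
(zone_wplug, zwp_submit_bio, zwp_complete_write) are hypothetical
stand-ins, not the actual blk-zoned.c interfaces, and the real
implementation defers unplugging to a kblockd work item instead of
issuing inline:

/* Minimal user-space model of per-zone write plugging.
 * Hypothetical names; illustrative only. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct bio {                    /* stand-in for struct bio */
        int sector;
        struct bio *next;
};

struct zone_wplug {
        pthread_mutex_t lock;   /* models the plug spinlock */
        bool write_in_flight;   /* a write for this zone is executing */
        struct bio *plugged;    /* FIFO list of delayed write BIOs */
        struct bio **tail;
};

static void issue(struct bio *bio)
{
        /* In the kernel, this would submit the BIO to the device. */
        printf("issuing write at sector %d\n", bio->sector);
}

/* Submission path: issue immediately if the zone is idle, otherwise
 * plug the BIO behind the in-flight write. */
static void zwp_submit_bio(struct zone_wplug *zwp, struct bio *bio)
{
        pthread_mutex_lock(&zwp->lock);
        if (zwp->write_in_flight) {
                bio->next = NULL;
                *zwp->tail = bio;
                zwp->tail = &bio->next;
                pthread_mutex_unlock(&zwp->lock);
                return;         /* delayed: holds no tag, no resources */
        }
        zwp->write_in_flight = true;
        pthread_mutex_unlock(&zwp->lock);
        issue(bio);
}

/* Completion path: unplug and issue the next delayed BIO, if any.
 * The kernel defers this to a kblockd work item. */
static void zwp_complete_write(struct zone_wplug *zwp)
{
        struct bio *next;

        pthread_mutex_lock(&zwp->lock);
        next = zwp->plugged;
        if (next) {
                zwp->plugged = next->next;
                if (!zwp->plugged)
                        zwp->tail = &zwp->plugged;
        } else {
                zwp->write_in_flight = false;
        }
        pthread_mutex_unlock(&zwp->lock);
        if (next)
                issue(next);
}

int main(void)
{
        struct zone_wplug zwp = { .lock = PTHREAD_MUTEX_INITIALIZER };
        struct bio b1 = { .sector = 0 };
        struct bio b2 = { .sector = 8 };
        struct bio b3 = { .sector = 16 };

        zwp.tail = &zwp.plugged;
        zwp_submit_bio(&zwp, &b1);      /* zone idle: issued at once */
        zwp_submit_bio(&zwp, &b2);      /* plugged behind b1 */
        zwp_submit_bio(&zwp, &b3);      /* plugged behind b2 */
        zwp_complete_write(&zwp);       /* b1 done: b2 issued */
        zwp_complete_write(&zwp);       /* b2 done: b3 issued */
        zwp_complete_write(&zwp);       /* b3 done: zone idle again */
        return 0;
}

Running this issues the three writes strictly in submission order with
at most one in flight, which is the per-zone ordering guarantee ZWP is
meant to provide.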
On 2/1/24 23:30, Damien Le Moal wrote:
> The patch series introduces zone write plugging (ZWP) as the new
> mechanism to control the ordering of writes to zoned block devices.
> ZWP replaces zone write locking (ZWL), which is implemented only by
> mq-deadline today. ZWP also allows emulating zone append operations
> using regular writes for zoned devices that do not natively support
> this operation (e.g. SMR HDDs). This patch series removes the scsi
> disk driver and device mapper zone append emulation in favor of ZWP
> emulation.

How are SCSI unit attention conditions handled?

Thanks,

Bart.
On 2/1/24 23:30, Damien Le Moal wrote:
> - Zone write plugging operates on BIOs instead of requests. Plugged
>   BIOs waiting for execution thus do not hold scheduling tags and so
>   do not prevent other BIOs from being submitted to the device
>   (reads or writes to other zones). Depending on the workload, this
>   can significantly improve device usage and performance.

Deep queues may introduce performance problems. In Android we had to
restrict the number of pending writes to the device queue depth
because otherwise read latency is too high (e.g. to start the camera
app).

I'm not convinced that queuing zoned write BIOs is a better approach
than queuing zoned write requests.

Are there numbers available about the performance differences
(bandwidth and latency) between plugging zoned write BIOs and plugging
zoned write requests?

Thanks,

Bart.
On 2/6/24 02:21, Bart Van Assche wrote:
> On 2/1/24 23:30, Damien Le Moal wrote:
>> The patch series introduces zone write plugging (ZWP) as the new
>> mechanism to control the ordering of writes to zoned block devices.
>> ZWP replaces zone write locking (ZWL), which is implemented only by
>> mq-deadline today. ZWP also allows emulating zone append operations
>> using regular writes for zoned devices that do not natively support
>> this operation (e.g. SMR HDDs). This patch series removes the scsi
>> disk driver and device mapper zone append emulation in favor of ZWP
>> emulation.
>
> How are SCSI unit attention conditions handled?

How does that have anything to do with this series? Whatever SCSI sd
is doing with unit attention conditions remains the same. I did not
touch that.
On 2/6/24 03:18, Bart Van Assche wrote:
> On 2/1/24 23:30, Damien Le Moal wrote:
>> - Zone write plugging operates on BIOs instead of requests. Plugged
>>   BIOs waiting for execution thus do not hold scheduling tags and so
>>   do not prevent other BIOs from being submitted to the device
>>   (reads or writes to other zones). Depending on the workload, this
>>   can significantly improve device usage and performance.
>
> Deep queues may introduce performance problems. In Android we had to
> restrict the number of pending writes to the device queue depth
> because otherwise read latency is too high (e.g. to start the camera
> app).

With zone write plugging, BIOs are delayed well above the scheduler
and the device. BIOs that are plugged/delayed by ZWP do not hold tags,
not even a scheduler tag, which allows reads (which are never plugged)
to proceed. That is unlike zone write locking, which can hold on to
all scheduler tags, thus preventing reads from proceeding.

> I'm not convinced that queuing zoned write BIOs is a better approach
> than queuing zoned write requests.

Well, I do not see why not. The above point on its own is actually to
me a good enough argument. And various tests with btrfs showed that
even with a slow HDD I can see better overall throughput with ZWP
compared to zone write locking. And for fast solid state zoned devices
(NVMe/UFS), you do not even need an IO scheduler anymore.

> Are there numbers available about the performance differences
> (bandwidth and latency) between plugging zoned write BIOs and
> plugging zoned write requests?

Finish reading the cover letter. It has lots of measurements with rc2,
Jens' block/for-next and ZWP...

I actually reran all these perf tests over the weekend, but this time
did 10 runs and took the average for comparison. Overall, I confirmed
the results shown in the cover letter: performance with ZWP is
generally on par with zone write locking or better, with one
exception: small sequential writes at high queue depth. There seems to
be an issue with regular plugging (current->plug) which results in
lost merge opportunities, causing this performance regression. I am
digging into that to understand what is happening.
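To make the tag argument above concrete, here is a hedged,
self-contained user-space model with a deliberately tiny tag pool. All
names (NR_TAGS, alloc_tag, submit_write) are invented for illustration;
the point is only the ordering: the zone plug check happens before any
tag allocation, so a delayed write consumes nothing that a read could
be waiting on:

#include <stdbool.h>
#include <stdio.h>

#define NR_TAGS 2               /* tiny tag pool to make the point */

static int free_tags = NR_TAGS;
static bool zone_busy;          /* one zone, one in-flight write */
static int plugged_writes;      /* BIOs delayed by ZWP */

static bool alloc_tag(const char *what)
{
        if (!free_tags) {
                printf("%s: no tag, submitter blocks\n", what);
                return false;
        }
        free_tags--;
        printf("%s: got tag (%d left)\n", what, free_tags);
        return true;
}

/* ZWP order: plug check first, tag allocation only when issuing. */
static void submit_write(void)
{
        if (zone_busy) {
                plugged_writes++;       /* delayed: no tag consumed */
                printf("write: plugged (%d waiting), tags untouched\n",
                       plugged_writes);
                return;
        }
        if (alloc_tag("write"))
                zone_busy = true;
}

int main(void)
{
        submit_write();         /* issued, takes a tag */
        submit_write();         /* plugged, takes no tag */
        submit_write();         /* plugged, takes no tag */
        alloc_tag("read");      /* a read still finds a free tag */
        return 0;
}

With request-based zone write locking, each queued write would already
hold a (scheduler) tag at this point and the final read could block,
which is the read starvation scenario discussed in this thread.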
On 2/5/24 15:42, Damien Le Moal wrote:
> On 2/6/24 02:21, Bart Van Assche wrote:
>> On 2/1/24 23:30, Damien Le Moal wrote:
>>> The patch series introduces zone write plugging (ZWP) as the new
>>> mechanism to control the ordering of writes to zoned block devices.
>>> ZWP replaces zone write locking (ZWL), which is implemented only by
>>> mq-deadline today. ZWP also allows emulating zone append operations
>>> using regular writes for zoned devices that do not natively support
>>> this operation (e.g. SMR HDDs). This patch series removes the scsi
>>> disk driver and device mapper zone append emulation in favor of ZWP
>>> emulation.
>>
>> How are SCSI unit attention conditions handled?
>
> How does that have anything to do with this series? Whatever SCSI sd
> is doing with unit attention conditions remains the same. I did not
> touch that.

I wrote my question before I realized that this patch series restricts
the number of outstanding writes to one per zone. Hence, there is no
risk of unaligned write pointer errors caused by writes being
reordered after a unit attention condition. My question can be
ignored :-)

Thanks,

Bart.
On 2/6/24 10:25, Bart Van Assche wrote:
> On 2/5/24 16:07, Damien Le Moal wrote:
>> On 2/6/24 03:18, Bart Van Assche wrote:
>>> Are there numbers available about the performance differences
>>> (bandwidth and latency) between plugging zoned write BIOs and
>>> plugging zoned write requests?
>>
>> Finish reading the cover letter. It has lots of measurements with
>> rc2, Jens' block/for-next and ZWP...
>
> Hmm ... as far as I know nobody ever implemented zoned write plugging
> for requests in the block layer core, so these numbers can't be in
> the cover letter.

No, I have not implemented zone write plugging for requests, as I
believe it would lead to very similar results as zone write locking,
that is, a potential problem with efficiently using a device under a
mixed read/write workload: having too many plugged writes can lead to
read starvation (read submission blocking on request allocation when
nr_requests is reached).

> Has the bio plugging approach perhaps been chosen because it works
> better for bio-based device mapper drivers?

Not that it "works better", but rather that doing the plugging at the
BIO level allows reusing the exact same code for zone append emulation
and write ordering (if a DM driver wants the block layer to handle
that). We had zone append emulation implemented for DM (for dm-crypt)
using BIOs and in the scsi sd driver using requests. ZWP unifies all
this and will trivially allow enabling that emulation for other device
types as well (e.g. NVMe ZNS drives that do not have native zone
append support).
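To illustrate the emulation Damien refers to, here is a hedged
user-space sketch of the idea only: under the per-zone plug lock, a
zone append BIO is turned into a regular write at the cached zone
write pointer, and the chosen sector is what gets reported back on
completion. The names and layout are illustrative assumptions, not the
actual kernel code:

/* User-space sketch of zone append emulation via regular writes.
 * Hypothetical names; not the blk-zoned.c implementation. */
#include <stdio.h>

enum req_op { REQ_OP_WRITE, REQ_OP_ZONE_APPEND };

struct bio {
        enum req_op op;
        long long sector;       /* set by the emulation for appends */
        unsigned int nr_sectors;
};

struct zone_wplug {
        long long wp;           /* cached zone write pointer */
};

/* Called with the plug lock held, when the BIO is about to be issued.
 * Because at most one write per zone is in flight, the cached write
 * pointer cannot be stale here. */
static void zwp_prepare_zone_append(struct zone_wplug *zwp,
                                    struct bio *bio)
{
        if (bio->op != REQ_OP_ZONE_APPEND)
                return;
        bio->op = REQ_OP_WRITE;       /* issue as a regular write... */
        bio->sector = zwp->wp;        /* ...at the current write pointer */
        zwp->wp += bio->nr_sectors;   /* next append lands after this */
}

int main(void)
{
        struct zone_wplug zwp = { .wp = 4096 };
        struct bio a = { .op = REQ_OP_ZONE_APPEND, .nr_sectors = 8 };
        struct bio b = { .op = REQ_OP_ZONE_APPEND, .nr_sectors = 16 };

        zwp_prepare_zone_append(&zwp, &a);
        zwp_prepare_zone_append(&zwp, &b);
        /* On completion, bio->sector is what zone append reports. */
        printf("append a written at %lld, b at %lld\n",
               a.sector, b.sector);
        return 0;
}

Since this logic lives at the BIO level, the same code can serve
BIO-based DM targets and request-based drivers alike, which is the
unification Damien describes above.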
On 2/3/24 21:11, Jens Axboe wrote:
>> I forgot to mention that the patches are against Jens' block/for-next
>> branch with the addition of Christoph's "clean up blk_mq_submit_bio"
>> patches [1] and my patch "null_blk: Always split BIOs to respect
>> queue limits" [2].
>
> I figured that was the case, I'll get both of these properly set up
> in a for-6.9/block branch, just wanted -rc3 to get cut first. JFYI
> that they are coming tomorrow.

Jens,

I saw the updated rc3-based for-next branch. Thanks for that. But it
seems that you removed the mq-deadline insert optimization? Is that on
purpose, or did I mess something up?