Message ID: 20220214080002.18381-1-nj.shetty@samsung.com
Series: Add Copy offload support
On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> The patch series covers the points discussed in the November 2021
> virtual call [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the initial agreed requirements in this patchset.
> The patchset borrows Mikulas's token-based approach for the
> two-bdev implementation.
>
> Overall, the series supports:
>
> 1. Driver
>    - NVMe Copy command (single NS), including support in nvme-target
>      (for block and file backends)
>
> 2. Block layer
>    - Block-generic copy (REQ_COPY flag), with an interface
>      accommodating two block devices, and a multi-source/destination
>      interface
>    - Emulation, when offload is natively absent
>    - dm-linear support (for cases not requiring a split)
>
> 3. User interface
>    - a new ioctl
>
> 4. In-kernel user
>    - dm-kcopyd

The biggest missing piece - and arguably the single most useful piece
of this functionality for users - is hooking this up to the
copy_file_range() syscall so that user file copies can be offloaded
to the hardware efficiently.

This seems like it would be relatively easy to do with an fs/iomap
iter loop that maps src + dst file ranges and issues block copy
offload commands on the extents. We already do similar "read from
source, write to destination" operations in iomap, so it's not a huge
stretch to extend the iomap interfaces to provide a copy offload
mechanism using this infrastructure.

Also, hooking this up to copy_file_range() will get you immediate
data integrity testing right down to the hardware via fsx in fstests
- it uses copy_file_range() as one of its operations, and it will
find all the off-by-one failures in both the Linux IO stack
implementation and the hardware itself.

And, in reality, I wouldn't trust a block copy offload mechanism
until it is integrated with filesystems and the page cache, and has
solid end-to-end data integrity testing available to shake out all
the bugs that will inevitably exist in this stack....

Cheers,

Dave.
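For concreteness, the user-facing side of Dave's suggestion is the
existing copy_file_range() syscall; nothing in the sketch below is
specific to this patch series. A minimal userspace copy loop looks
like this - if the filesystem wires copy_file_range() through to
block copy offload, a loop like this completes without the copied
data ever passing through host memory:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            int fd_in, fd_out;
            struct stat st;
            off_t len;
            ssize_t ret;

            if (argc != 3) {
                    fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                    return 1;
            }
            fd_in = open(argv[1], O_RDONLY);
            if (fd_in < 0 || fstat(fd_in, &st) < 0) {
                    perror(argv[1]);
                    return 1;
            }
            fd_out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd_out < 0) {
                    perror(argv[2]);
                    return 1;
            }

            /*
             * The kernel chooses the copy strategy: a splice-based
             * copy through the page cache today, or - once offload is
             * hooked up - a device-side copy with no host data
             * movement.
             */
            for (len = st.st_size; len > 0; len -= ret) {
                    ret = copy_file_range(fd_in, NULL, fd_out, NULL,
                                          len, 0);
                    if (ret <= 0) {
                            perror("copy_file_range");
                            return 1;
                    }
            }
            return 0;
    }

The glibc wrapper for copy_file_range() has been available since glibc
2.27, and this is the same operation that fsx in fstests exercises for
the end-to-end integrity testing Dave describes.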
On Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote:
> On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> > The patch series covers the points discussed in the November 2021
> > virtual call [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
[...]
> The biggest missing piece - and arguably the single most useful piece
> of this functionality for users - is hooking this up to the
> copy_file_range() syscall so that user file copies can be offloaded
> to the hardware efficiently.
[...]
> And, in reality, I wouldn't trust a block copy offload mechanism
> until it is integrated with filesystems and the page cache, and has
> solid end-to-end data integrity testing available to shake out all
> the bugs that will inevitably exist in this stack....
>
> Cheers,
>
> Dave.

We had planned copy_file_range() (CFR) for the next phase of the copy
offload patch series, thinking that we would get to CFR once
everything else was robust. But if CFR is needed to make things
robust, we will start looking into it now.

--
Nitesh Shetty
On Thu, Feb 17, 2022 at 06:32:15PM +0530, Nitesh Shetty wrote:
> On Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote:
> > On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> >
> > The biggest missing piece - and arguably the single most useful
> > piece of this functionality for users - is hooking this up to the
> > copy_file_range() syscall so that user file copies can be
> > offloaded to the hardware efficiently.
[...]
> > And, in reality, I wouldn't trust a block copy offload mechanism
> > until it is integrated with filesystems and the page cache, and
> > has solid end-to-end data integrity testing available to shake out
> > all the bugs that will inevitably exist in this stack....
>
> We had planned copy_file_range() (CFR) for the next phase of the
> copy offload patch series, thinking that we would get to CFR once
> everything else was robust. But if CFR is needed to make things
> robust, we will start looking into it now.

How do you make it robust when there is no locking/serialisation to
prevent overlapping concurrent IO while the copy offload is in
progress? Or to guarantee that you don't have overlapping concurrent
copy offloads running at the same time? You've basically created a
block dev ioctl interface that looks impossible to use safely.

It doesn't appear to be coherent with the blockdev page cache, nor
does it appear to have any documented data integrity semantics,
either. e.g. how does this interact with the guarantees that
fsync_bdev() and/or sync_blockdev() are supposed to provide?

IOWs, if you don't have either CFR or some other strictly bound
kernel user with well-defined access, synchronisation and integrity
semantics, how can anyone actually robustly test these ioctls to
verify they work correctly in all the situations in which they might
be called?

Cheers,

Dave.
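To make the serialisation hazard concrete, here is a hypothetical
userspace sketch. The BLKCOPY request number and struct copy_range
below are invented placeholders for illustration, not the ABI this
series actually proposes; the point is only that nothing in a raw
block-device ioctl interface orders the copy against other IO to the
same ranges:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/ioctl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Invented placeholders - not the ABI proposed by this series. */
    struct copy_range {
            unsigned long long src_offset;
            unsigned long long dst_offset;
            unsigned long long len;
    };
    #define BLKCOPY _IOW(0x12, 142, struct copy_range)

    static int bdev_fd;

    static void *copier(void *arg)
    {
            struct copy_range cr = {
                    .src_offset = 0,
                    .dst_offset = 1 << 20,
                    .len = 1 << 20,
            };

            /* Offloaded copy of the device's first 1MiB. */
            if (ioctl(bdev_fd, BLKCOPY, &cr) < 0)
                    perror("BLKCOPY");
            return NULL;
    }

    static void *writer(void *arg)
    {
            char buf[4096];

            memset(buf, 0xa5, sizeof(buf));
            /*
             * Buffered write into the copy's source range while the
             * copy may be in flight. Nothing serialises the two, and
             * per the coherence concern above, a device-side copy
             * need not observe data sitting in the blockdev page
             * cache.
             */
            if (pwrite(bdev_fd, buf, sizeof(buf), 0) < 0)
                    perror("pwrite");
            return NULL;
    }

    int main(void)
    {
            pthread_t t1, t2;

            bdev_fd = open("/dev/nvme0n1", O_RDWR); /* placeholder dev */
            if (bdev_fd < 0) {
                    perror("open");
                    return 1;
            }
            pthread_create(&t1, NULL, copier, NULL);
            pthread_create(&t2, NULL, writer, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
    }

Whether the destination range ends up holding the old or the new
bytes from offset 0 - and whether the buffered write is visible to
the device-side copy at all before a sync_blockdev() - is left
undefined by an interface like this, which is exactly the robustness
gap being pointed at.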