Message ID | 20230605121732.28468-1-nj.shetty@samsung.com |
---|---|
Headers | show |
Series | Implement copy offload support | expand |
> break; > case REQ_OP_READ: > - ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read); > + if (unlikely(req->cmd_flags & REQ_COPY)) > + nvme_setup_copy_read(ns, req); > + else > + ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read); > break; > case REQ_OP_WRITE: > - ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write); > + if (unlikely(req->cmd_flags & REQ_COPY)) > + ret = nvme_setup_copy_write(ns, req, cmd); > + else > + ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write); Yikes. Overloading REQ_OP_READ and REQ_OP_WRITE with something entirely different brings us back the horrors of the block layer 15 years ago. Don't do that. Please add separate REQ_COPY_IN/OUT (or maybe SEND/RECEIVE or whatever) methods. > + /* setting copy limits */ > + if (blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, q)) I don't understand this comment. > +struct nvme_copy_token { > + char *subsys; > + struct nvme_ns *ns; > + sector_t src_sector; > + sector_t sectors; > +}; Why do we need a subsys token? Inter-namespace copy is pretty crazy, and not really anything we should aim for. But this whole token design is pretty odd anyway. The only thing we'd need is a sequence number / idr / etc to find an input and output side match up, as long as we stick to the proper namespace scope. > + if (unlikely((req->cmd_flags & REQ_COPY) && > + (req_op(req) == REQ_OP_READ))) { > + blk_mq_start_request(req); > + return BLK_STS_OK; > + } This really needs to be hiden inside of nvme_setup_cmd. And given that other drivers might need similar handling the best way is probably to have a new magic BLK_STS_* value for request started but we're not actually sending it to hardware.
On Tue, Jun 06, 2023 at 05:05:35PM +0530, Nitesh Shetty wrote: > Downside will be duplicating checks which are present for read, write in > block layer, device-mapper and zoned devices. > But we can do this, shouldn't be an issue. Yes. Please never overload operations, this is just causing problems everywhere, and that why I split the operations from the flag a few years ago. > The idea behind subsys is to prevent copy across different subsystem. > For example, copy across nvme subsystem and the scsi subsystem. [1] > At present, we don't support inter-namespace(copy across NVMe namespace), > but after community feedback for previous series we left scope for it. Never leave scope for something that isn't actually added. That just creates a giant maintainance nightmare. Cross-device copies are giant nightmare in general, and in the case of NVMe completely unusable as currently done in the working group. Messing up something that is entirely reasonable (local copy) for something like that is a sure way to never get this series in.
Christoph, > Yikes. Overloading REQ_OP_READ and REQ_OP_WRITE with something > entirely different brings us back the horrors of the block layer 15 > years ago. Don't do that. Please add separate REQ_COPY_IN/OUT (or > maybe SEND/RECEIVE or whatever) methods. I agree, I used REQ_COPY_IN and REQ_COPY_OUT in my original series. >> + /* setting copy limits */ >> + if (blk_queue_flag_test_and_set(QUEUE_FLAG_COPY, q)) > > I don't understand this comment. > >> +struct nvme_copy_token { >> + char *subsys; >> + struct nvme_ns *ns; >> + sector_t src_sector; >> + sector_t sectors; >> +}; > > Why do we need a subsys token? Inter-namespace copy is pretty crazy, > and not really anything we should aim for. But this whole token design > is pretty odd anyway. The only thing we'd need is a sequence number / > idr / etc to find an input and output side match up, as long as we > stick to the proper namespace scope. Yeah, I don't think we need to carry this in a token. Doing the sanity check up front in blkdev_copy_offload() should be fine. For NVMe it's not currently possible to copy across and for SCSI we'd just make sure the copy scope is the same for the two block devices before we even issue the operations.
On Thu, Jun 08, 2023 at 05:38:17PM +0530, Nitesh Shetty wrote: > Sure, we can do away with subsys and realign more on single namespace copy. > We are planning to use token to store source info, such as src sector, > len and namespace. Something like below, > > struct nvme_copy_token { > struct nvme_ns *ns; // to make sure we are copying within same namespace > /* store source info during *IN operation, will be used by *OUT operation */ > sector_t src_sector; > sector_t sectors; > }; > Do you have any better way to handle this in mind ? In general every time we tried to come up with a request payload that is not just data passed to the device it has been a nightmare. So my gut feeling would be that bi_sector and bi_iter.bi_size are the ranges, with multiple bios being allowed to form the input data, similar to how we implement discard merging. The interesting part is how we'd match up these bios. One idea would be that since copy by definition doesn't need integrity data we just add a copy_id that unions it, and use a simple per-gendisk copy I/D allocator, but I'm not entirely sure how well that interacts stacking drivers.
On 23/06/08 09:24PM, Christoph Hellwig wrote: >On Thu, Jun 08, 2023 at 05:38:17PM +0530, Nitesh Shetty wrote: >> Sure, we can do away with subsys and realign more on single namespace copy. >> We are planning to use token to store source info, such as src sector, >> len and namespace. Something like below, >> >> struct nvme_copy_token { >> struct nvme_ns *ns; // to make sure we are copying within same namespace >> /* store source info during *IN operation, will be used by *OUT operation */ >> sector_t src_sector; >> sector_t sectors; >> }; >> Do you have any better way to handle this in mind ? > >In general every time we tried to come up with a request payload that is >not just data passed to the device it has been a nightmare. > >So my gut feeling would be that bi_sector and bi_iter.bi_size are the >ranges, with multiple bios being allowed to form the input data, similar >to how we implement discard merging. > >The interesting part is how we'd match up these bios. One idea would >be that since copy by definition doesn't need integrity data we just >add a copy_id that unions it, and use a simple per-gendisk copy I/D >allocator, but I'm not entirely sure how well that interacts stacking >drivers. V13[1] implements that route. Please see if that matches with what you had in mind? [1] https://lore.kernel.org/linux-nvme/20230627183629.26571-1-nj.shetty@samsung.com/ Thank you, Nitesh Shetty