Message ID | 20221026231945.6609-13-michael.christie@oracle.com |
---|---|
State | New |
Headers | show |
Series | Use block pr_ops in LIO | expand |
On Wed, Oct 26, 2022 at 06:19:38PM -0500, Mike Christie wrote: > To handle both cases and keep userspace compatibility, this patch adds a > blk_status_t arg to the pr_ops callouts. The lower levels will convert > their device specific error to the blk_status_t then the upper levels > can easily check that code without knowing the device type. Adding the > extra return value will then allow us to not break userspace which expects > a negative -Exyz error code if the command fails before it's sent to the > device or a device/driver specific value if the error is > 0. I really hate this double error return. What -E* statuses that matter can be returned without a BLK_STS_* equivalent that we couldn't convert to and from?
On 10/30/22 3:20 AM, Christoph Hellwig wrote: > On Wed, Oct 26, 2022 at 06:19:38PM -0500, Mike Christie wrote: >> To handle both cases and keep userspace compatibility, this patch adds a >> blk_status_t arg to the pr_ops callouts. The lower levels will convert >> their device specific error to the blk_status_t then the upper levels >> can easily check that code without knowing the device type. Adding the >> extra return value will then allow us to not break userspace which expects >> a negative -Exyz error code if the command fails before it's sent to the >> device or a device/driver specific value if the error is > 0. > > I really hate this double error return. What -E* statuses that matter > can be returned without a BLK_STS_* equivalent that we couldn't convert > to and from? Hey, I didn't fully understand the question. The specific issue I'm hitting is that if a scsi/nvme device returns SAM_STAT_RESERVATION_CONFLICT/ NVME_SC_RESERVATION_CONFLICT then in lio I need to be able to detect that and return SAM_STAT_RESERVATION_CONFLICT. So I only care about -EBADE/BLK_STS_NEXUS right now. So are you asking about -EBADE? Do you mean I could have sd_pr_out_command return -EBADE when it gets a SAM_STAT_RESERVATION_CONFLICT (nvme would do the equivalent). Then in lio do: ret = ops->pr_register() if (ret == -EBADE) return SAM_STAT_RESERVATION_CONFLICT; The problem I hit is that in the ioctl code I then have to do: @@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev, if (reg.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + if (ret == -EBADE) { + if (bdev_is_nvme(bdev)) + ret = NVME_SC_RESERVATION_CONFLICT; + else if (bdev_is_scsi(bdev) + ret = SAM_STAT_RESERVATION_CONFLICT; + } + return ret; } or I could convert the scsi/nvme or code to always use BLK_STS errors. In LIO I can easily check for BLK_STS_NEXUS like with the -EBADE example. In the ioctl code then for common errors I can go from BLK_STS using the blk_status_to_errno helper. However, for some scsi/nvme specific errors we would want to do: @@ -269,7 +270,36 @@ static int blkdev_pr_register(struct block_device *bdev, if (reg.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + switch (ret) { + /* there could be nvme/scsi helper functions for this which would + * be the reverse of nvme_error_status/ */ + case BLK_STS_NEXUS: + if (bdev_is_nvme(bdev)) + ret = NVME_SC_RESERVATION_CONFLICT; + else if (bdev_is_scsi(bdev) + ret = SAM_STAT_RESERVATION_CONFLICT; + break; + case BLK_STS_TRANSPORT: + if (bdev_is_nvme(bdev)) + ret = NVME_SC_HOST_PATH_ERROR; + else if (bdev_is_scsi(bdev) + ret = DID_TRANSPORT_FAILFAST or DID_TRANSPORT_MARGINAL; + break; + case BLK_STS_NOTSUPP: + if (bdev_is_nvme(bdev)) + ret = NVME_SC_BAD_ATTRIBUTES or + NVME_SC_ONCS_NOT_SUPPORTED or + NVME_SC_INVALID_OPCODE or + NVME_SC_INVALID_FIELD or + NVME_SC_INVALID_NS + else if (bdev_is_scsi(bdev) + ret = We don't have a way to support this in SCSI yet because it would be in the sense/asc/ascq. + break; + default: + ret = blk_status_to_errno(ret); + } + return ret; }
On Sun, Oct 30, 2022 at 06:05:35PM -0500, Mike Christie wrote: > The problem I hit is that in the ioctl code I then have to do: > > @@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev, > > if (reg.flags & ~PR_FL_IGNORE_KEY) > return -EOPNOTSUPP; > - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); > + ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); > + if (ret == -EBADE) { > + if (bdev_is_nvme(bdev)) > + ret = NVME_SC_RESERVATION_CONFLICT; > + else if (bdev_is_scsi(bdev) > + ret = SAM_STAT_RESERVATION_CONFLICT; > + } > + return ret; Eww. We should have never leaked protocol specific values out to userspace. This is an original bug I introduceѕ, so all blame is on me. I suspect the right way to fix is is to keep the numeric value of SAM_STAT_RESERVATION_CONFLICT and give it a new constant exposed in the uapi header, assuming that SCSI is the thing people actually used the PR API for, and nvme was just an nice little add-on. Now if an errno value or blk_status_t is returned from the method should not matter for this fix, but in the long run I think the blk_status_t would be cleaner than the int used for errno, and that will also prevent us from returning accidental non-intended values. Btw, I also thing we should rename BLK_STS_NEXUS to BLK_STS_RESERVATION_CONFLICT (assuming s390 is ok with that), as that has much better documentary value. > + case BLK_STS_TRANSPORT: > + if (bdev_is_nvme(bdev)) > + ret = NVME_SC_HOST_PATH_ERROR; > + else if (bdev_is_scsi(bdev) > + ret = DID_TRANSPORT_FAILFAST or DID_TRANSPORT_MARGINAL; > + break; And we'll need an uapi value for this as well. > + case BLK_STS_NOTSUPP: and maybe this unless we can just get away with the negative errno value.
On 11/1/22 5:15 AM, Christoph Hellwig wrote: > On Sun, Oct 30, 2022 at 06:05:35PM -0500, Mike Christie wrote: >> The problem I hit is that in the ioctl code I then have to do: >> >> @@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev, >> >> if (reg.flags & ~PR_FL_IGNORE_KEY) >> return -EOPNOTSUPP; >> - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); >> + ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); >> + if (ret == -EBADE) { >> + if (bdev_is_nvme(bdev)) >> + ret = NVME_SC_RESERVATION_CONFLICT; >> + else if (bdev_is_scsi(bdev) >> + ret = SAM_STAT_RESERVATION_CONFLICT; >> + } >> + return ret; > > Eww. We should have never leaked protocol specific values out to > userspace. This is an original bug I introduceѕ, so all blame is on me. > > I suspect the right way to fix is is to keep the numeric value of > SAM_STAT_RESERVATION_CONFLICT and give it a new constant exposed in > the uapi header, assuming that SCSI is the thing people actually > used the PR API for, and nvme was just an nice little add-on. > Do you mean just doing this: diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c index 727123c611e6..6ac70514170d 100644 --- a/drivers/nvme/host/pr.c +++ b/drivers/nvme/host/pr.c @@ -72,12 +72,17 @@ static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c, static int nvme_send_pr_command(struct block_device *bdev, struct nvme_command *c, void *data, unsigned int data_len) { + int ret; + if (IS_ENABLED(CONFIG_NVME_MULTIPATH) && bdev->bd_disk->fops == &nvme_ns_head_ops) - return nvme_send_ns_head_pr_command(bdev, c, data, data_len); + ret = nvme_send_ns_head_pr_command(bdev, c, data, data_len); else - return nvme_send_ns_pr_command(bdev->bd_disk->private_data, c, - data, data_len); + ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, c, + data, data_len); + if (ret == NVME_SC_RESERVATION_CONFLICT) + ret = PR_STS_RESERVATION_CONFLICT; + return ret; } static int nvme_pr_command(struct block_device *bdev, u32 cdw10, diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c0a614efcfce..c7621d8ffa51 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1840,7 +1840,8 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa, scsi_sense_valid(&sshdr)) { sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result); scsi_print_sense_hdr(sdev, NULL, &sshdr); - } + } else if (__get_status_byte(result) == SAM_STAT_RESERVATION_CONFLICT) + result = PR_STS_RESERVATION_CONFLICT; return result; } diff --git a/include/uapi/linux/pr.h b/include/uapi/linux/pr.h index ccc78cbf1221..7a637f9e5b49 100644 --- a/include/uapi/linux/pr.h +++ b/include/uapi/linux/pr.h @@ -13,6 +13,15 @@ enum pr_type { PR_EXCLUSIVE_ACCESS_ALL_REGS = 6, }; +enum pr_status { + PR_STS_SUCCESS = 0x0, + /* + * The error codes are based on SCSI, because it was the primary user + * and had userspace users. + */ + PR_STS_RESERVATION_CONFLICT = 0x18, +}; + struct pr_reservation { __u64 key; __u32 type; --------------------------------------------------------------------------- Or we could add a new flag and make it nicer for the user in the future, but also uglier for the drivers but a little less uglier than adding the blk_sts field to the calls: diff --git a/block/ioctl.c b/block/ioctl.c index 60121e89052b..cae58f57ea13 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -269,7 +269,8 @@ static int blkdev_pr_register(struct block_device *bdev, if (reg.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags, + reg.flags & PR_FL_PR_STAT); } static int blkdev_pr_reserve(struct block_device *bdev, @@ -287,7 +288,8 @@ static int blkdev_pr_reserve(struct block_device *bdev, if (rsv.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags); + return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags, + rsv.flags & PR_FL_PR_STAT) } static int blkdev_pr_release(struct block_device *bdev, @@ -305,7 +307,8 @@ static int blkdev_pr_release(struct block_device *bdev, if (rsv.flags) return -EOPNOTSUPP; - return ops->pr_release(bdev, rsv.key, rsv.type); + return ops->pr_release(bdev, rsv.key, rsv.type, + rsv.flags & PR_FL_PR_STAT); } static int blkdev_pr_preempt(struct block_device *bdev, @@ -323,7 +326,8 @@ static int blkdev_pr_preempt(struct block_device *bdev, if (p.flags) return -EOPNOTSUPP; - return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort); + return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort, + p.flags & PR_FL_PR_STAT) } static int blkdev_pr_clear(struct block_device *bdev, @@ -341,7 +345,7 @@ static int blkdev_pr_clear(struct block_device *bdev, if (c.flags) return -EOPNOTSUPP; - return ops->pr_clear(bdev, c.key); + return ops->pr_clear(bdev, c.key, c.flags & PR_FL_PR_STAT); } static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode, diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c index 727123c611e6..7c317b646cde 100644 --- a/drivers/nvme/host/pr.c +++ b/drivers/nvme/host/pr.c @@ -70,18 +70,44 @@ static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c, } static int nvme_send_pr_command(struct block_device *bdev, - struct nvme_command *c, void *data, unsigned int data_len) + struct nvme_command *c, void *data, unsigned int data_len, + bool use_pr_sts) { + int ret; + if (IS_ENABLED(CONFIG_NVME_MULTIPATH) && bdev->bd_disk->fops == &nvme_ns_head_ops) - return nvme_send_ns_head_pr_command(bdev, c, data, data_len); + ret = nvme_send_ns_head_pr_command(bdev, c, data, data_len); else - return nvme_send_ns_pr_command(bdev->bd_disk->private_data, c, - data, data_len); + ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, c, + data, data_len); + if (!ret || ret < 0 || (ret > 0 && !use_pr_sts)) + return ret; + + switch (ret) { + case NVME_SC_RESERVATION_CONFLICT: + ret = PR_STS_RESERVATION_CONFLICT; + break; + case NVME_SC_HOST_PATH_ERROR: + ret = PR_STS_PATH_FAILURE; + break + case NVME_SC_BAD_ATTRIBUTES: + case NVME_SC_ONCS_NOT_SUPPORTED: + case NVME_SC_INVALID_OPCODE: + case NVME_SC_INVALID_FIELD: + case NVME_SC_INVALID_NS: + ret = PR_STS_INVALID_OP; + break; + default: + ret = PR_STS_IOERR; + break; + } + + return ret; } static int nvme_pr_command(struct block_device *bdev, u32 cdw10, - u64 key, u64 sa_key, u8 op) + u64 key, u64 sa_key, u8 op, bool use_pr_sts) { struct nvme_command c = { }; u8 data[16] = { 0, }; @@ -92,11 +118,11 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10, c.common.opcode = op; c.common.cdw10 = cpu_to_le32(cdw10); - return nvme_send_pr_command(bdev, &c, data, sizeof(data)); + return nvme_send_pr_command(bdev, &c, data, sizeof(data), use_pr_sts); } static int nvme_pr_register(struct block_device *bdev, u64 old, - u64 new, unsigned flags) + u64 new, unsigned flags, bool use_pr_sts) { u32 cdw10; @@ -106,11 +132,12 @@ static int nvme_pr_register(struct block_device *bdev, u64 old, cdw10 = old ? 2 : 0; cdw10 |= (flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0; cdw10 |= (1 << 30) | (1 << 31); /* PTPL=1 */ - return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register); + return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register, + use_pr_sts); } static int nvme_pr_reserve(struct block_device *bdev, u64 key, - enum pr_type type, unsigned flags) + enum pr_type type, unsigned flags, bool use_pr_sts) { u32 cdw10; @@ -119,29 +146,34 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key, cdw10 = nvme_pr_type_from_blk(type) << 8; cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0); - return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire); + return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire, + use_pr_sts); } static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new, - enum pr_type type, bool abort) + enum pr_type type, bool abort, bool use_pr_sts) { u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1); - return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire); + return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire, + use_pr_sts); } -static int nvme_pr_clear(struct block_device *bdev, u64 key) +static int nvme_pr_clear(struct block_device *bdev, u64 key, bool use_pr_sts) { u32 cdw10 = 1 | (key ? 0 : 1 << 3); - return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release); + return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release, + use_pr_sts); } -static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +static int nvme_pr_release(struct block_device *bdev, u64 key, + enum pr_type type, bool use_pr_sts) { u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3); - return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release); + return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release, + use_pr_sts); } static int nvme_pr_resv_report(struct block_device *bdev, void *data, diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c0a614efcfce..46393bdd7427 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1797,7 +1797,8 @@ static int sd_pr_read_reservation(struct block_device *bdev, } static int sd_pr_out_command(struct block_device *bdev, u8 sa, - u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags) + u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags, + bool use_pr_sts) { struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk); struct scsi_device *sdev = sdkp->device; @@ -1842,44 +1843,74 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa, scsi_print_sense_hdr(sdev, NULL, &sshdr); } + if (!result || result < 0 || (result > 0 && !use_pr_sts)) + return result; + + result = PR_STS_IOERR; + + switch host_byte(result) { + case DID_TRANSPORT_FAILFAST: + case DID_TRANSPORT_MARGINAL: + case DID_TRANSPORT_DISRUPTED: + result = PR_STS_PATH_FAILURE; + goto done; + } + + switch (__get_status_byte(result)) { + case SAM_STAT_RESERVATION_CONFLICT: + result = PR_STS_RESERVATION_CONFLICT; + goto done; + case SAM_STAT_CHECK_CONDITION: + if (!scsi_sense_valid(&sshdr)) + goto done; + + if (sshdr.sense_key == ILLEGAL_REQUEST && + (sshdr.asc == 0x26 || sshdr.asc == 0x24)) { + result = PR_STS_INVALID_OP; + goto done; + } + } + +done: return result; } static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, - u32 flags) + u32 flags, bool use_pr_sts) { if (flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; return sd_pr_out_command(bdev, (flags & PR_FL_IGNORE_KEY) ? 0x06 : 0x00, - old_key, new_key, 0, - (1 << 0) /* APTPL */); + old_key, new_key, 0, (1 << 0) /* APTPL */, + use_pr_sts); } static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, - u32 flags) + u32 flags, bool use_pr_sts) { if (flags) return -EOPNOTSUPP; return sd_pr_out_command(bdev, 0x01, key, 0, - block_pr_type_to_scsi(type), 0); + block_pr_type_to_scsi(type), 0, use_pr_sts) } -static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type, + bool use_pr_sts) { return sd_pr_out_command(bdev, 0x02, key, 0, - block_pr_type_to_scsi(type), 0); + block_pr_type_to_scsi(type), 0, use_pr_sts); } static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, - enum pr_type type, bool abort) + enum pr_type type, bool abort, bool use_pr_sts) { return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key, - block_pr_type_to_scsi(type), 0); + block_pr_type_to_scsi(type), 0, use_pr_sts); } -static int sd_pr_clear(struct block_device *bdev, u64 key) +static int sd_pr_clear(struct block_device *bdev, u64 key, bool use_pr_sts) { - return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0); + return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0, use_pr_sts); } static const struct pr_ops sd_pr_ops = { diff --git a/include/linux/pr.h b/include/linux/pr.h index 3003daec28a5..51e03e73a87f 100644 --- a/include/linux/pr.h +++ b/include/linux/pr.h @@ -18,14 +18,14 @@ struct pr_held_reservation { struct pr_ops { int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key, - u32 flags); + u32 flags, bool use_pr_sts); int (*pr_reserve)(struct block_device *bdev, u64 key, - enum pr_type type, u32 flags); + enum pr_type type, u32 flags, bool use_pr_sts); int (*pr_release)(struct block_device *bdev, u64 key, - enum pr_type type); + enum pr_type type, bool use_pr_sts); int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key, - enum pr_type type, bool abort); - int (*pr_clear)(struct block_device *bdev, u64 key); + enum pr_type type, bool abort, bool use_pr_sts); + int (*pr_clear)(struct block_device *bdev, u64 key, bool use_pr_sts); /* * pr_read_keys - Read the registered keys and return them in the * pr_keys->keys array. The keys array will have been allocated at the diff --git a/include/uapi/linux/pr.h b/include/uapi/linux/pr.h index ccc78cbf1221..ac8f7fc404ff 100644 --- a/include/uapi/linux/pr.h +++ b/include/uapi/linux/pr.h @@ -13,6 +13,14 @@ enum pr_type { PR_EXCLUSIVE_ACCESS_ALL_REGS = 6, }; +enum pr_status { + PR_STS_SUCCESS, + PR_STS_IOERR, + PR_STS_RESERVATION_CONFLICT, + PR_STS_PATH_FAILURE, + PR_STS_INVALID_OP, +}; + struct pr_reservation { __u64 key; __u32 type; @@ -40,6 +48,7 @@ struct pr_clear { }; #define PR_FL_IGNORE_KEY (1 << 0) /* ignore existing key */ +#define PR_FL_PR_STAT (1 << 1) /* Convert device errors to pr_status */ #define IOC_PR_REGISTER _IOW('p', 200, struct pr_registration) #define IOC_PR_RESERVE _IOW('p', 201, struct pr_reservation) Bart, if this is what you were suggesting I misunderstood you. I thought you wanted to only pass the new code in the kernel so that's why I was saying we still have the problem of converting the error back when passing it back to userspace.
On Sat, Nov 05, 2022 at 01:36:18PM -0500, Mike Christie wrote:
> Do you mean just doing this:
That would be the minimal fix. We'd then still need to enumerate
the allowed positive return values and check noting else is returned.
I don't like the opt in in the other version. The SCSI return values are
the defactor API, and we need to switch NVMe to align with it ASAP
instead of keeping the broken old version around.
diff --git a/block/ioctl.c b/block/ioctl.c index 60121e89052b..72338c56e235 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -269,7 +269,8 @@ static int blkdev_pr_register(struct block_device *bdev, if (reg.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags); + return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags, + NULL); } static int blkdev_pr_reserve(struct block_device *bdev, @@ -287,7 +288,7 @@ static int blkdev_pr_reserve(struct block_device *bdev, if (rsv.flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; - return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags); + return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags, NULL); } static int blkdev_pr_release(struct block_device *bdev, @@ -305,7 +306,7 @@ static int blkdev_pr_release(struct block_device *bdev, if (rsv.flags) return -EOPNOTSUPP; - return ops->pr_release(bdev, rsv.key, rsv.type); + return ops->pr_release(bdev, rsv.key, rsv.type, NULL); } static int blkdev_pr_preempt(struct block_device *bdev, @@ -323,7 +324,7 @@ static int blkdev_pr_preempt(struct block_device *bdev, if (p.flags) return -EOPNOTSUPP; - return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort); + return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort, NULL); } static int blkdev_pr_clear(struct block_device *bdev, @@ -341,7 +342,7 @@ static int blkdev_pr_clear(struct block_device *bdev, if (c.flags) return -EOPNOTSUPP; - return ops->pr_clear(bdev, c.key); + return ops->pr_clear(bdev, c.key, NULL); } static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode, diff --git a/drivers/md/dm.c b/drivers/md/dm.c index f7f806890c92..7e77c836a61a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -3079,7 +3079,8 @@ struct dm_pr { bool abort; bool fail_early; int ret; - enum pr_type type; + enum pr_type type; + blk_status_t *blk_stat; }; static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn, @@ -3130,7 +3131,8 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev, return -1; } - ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags); + ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags, + pr->blk_stat); if (!ret) return 0; @@ -3144,7 +3146,7 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev, } static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, - u32 flags) + u32 flags, blk_status_t *blk_stat) { struct dm_pr pr = { .old_key = old_key, @@ -3152,6 +3154,7 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, .flags = flags, .fail_early = true, .ret = 0, + .blk_stat = blk_stat, }; int ret; @@ -3189,7 +3192,8 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev, return -1; } - pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags); + pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags, + pr->blk_stat); if (!pr->ret) return -1; @@ -3197,7 +3201,7 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev, } static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, - u32 flags) + u32 flags, blk_status_t *blk_stat) { struct dm_pr pr = { .old_key = key, @@ -3205,6 +3209,7 @@ static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, .type = type, .fail_early = false, .ret = 0, + .blk_stat = blk_stat, }; int ret; @@ -3232,19 +3237,22 @@ static int __dm_pr_release(struct dm_target *ti, struct dm_dev *dev, return -1; } - pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type); + pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type, + pr->blk_stat); if (pr->ret) return -1; return 0; } -static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type, + blk_status_t *blk_stat) { struct dm_pr pr = { .old_key = key, .type = type, .fail_early = false, + .blk_stat = blk_stat, }; int ret; @@ -3267,7 +3275,7 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev, } pr->ret = ops->pr_preempt(dev->bdev, pr->old_key, pr->new_key, pr->type, - pr->abort); + pr->abort, pr->blk_stat); if (!pr->ret) return -1; @@ -3275,13 +3283,14 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev, } static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, - enum pr_type type, bool abort) + enum pr_type type, bool abort, blk_status_t *blk_stat) { struct dm_pr pr = { .new_key = new_key, .old_key = old_key, .type = type, .fail_early = false, + .blk_stat = blk_stat, }; int ret; @@ -3292,7 +3301,8 @@ static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, return pr.ret; } -static int dm_pr_clear(struct block_device *bdev, u64 key) +static int dm_pr_clear(struct block_device *bdev, u64 key, + blk_status_t *blk_stat) { struct mapped_device *md = bdev->bd_disk->private_data; const struct pr_ops *ops; @@ -3304,7 +3314,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key) ops = bdev->bd_disk->fops->pr_ops; if (ops && ops->pr_clear) - r = ops->pr_clear(bdev, key); + r = ops->pr_clear(bdev, key, blk_stat); else r = -EOPNOTSUPP; out: @@ -3313,7 +3323,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key) } static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys, - u32 keys_len) + u32 keys_len, blk_status_t *blk_stat) { struct mapped_device *md = bdev->bd_disk->private_data; const struct pr_ops *ops; @@ -3325,7 +3335,7 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys, ops = bdev->bd_disk->fops->pr_ops; if (ops && ops->pr_read_keys) - r = ops->pr_read_keys(bdev, keys, keys_len); + r = ops->pr_read_keys(bdev, keys, keys_len, blk_stat); else r = -EOPNOTSUPP; out: @@ -3334,7 +3344,8 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys, } static int dm_pr_read_reservation(struct block_device *bdev, - struct pr_held_reservation *rsv) + struct pr_held_reservation *rsv, + blk_status_t *blk_stat) { struct mapped_device *md = bdev->bd_disk->private_data; const struct pr_ops *ops; @@ -3346,7 +3357,7 @@ static int dm_pr_read_reservation(struct block_device *bdev, ops = bdev->bd_disk->fops->pr_ops; if (ops && ops->pr_read_reservation) - r = ops->pr_read_reservation(bdev, rsv); + r = ops->pr_read_reservation(bdev, rsv, blk_stat); else r = -EOPNOTSUPP; out: diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c index 87a0aaf811f4..854a26ce55fe 100644 --- a/drivers/nvme/host/pr.c +++ b/drivers/nvme/host/pr.c @@ -86,7 +86,7 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10, } static int nvme_pr_register(struct block_device *bdev, u64 old, - u64 new, unsigned flags) + u64 new, unsigned flags, blk_status_t *blk_stat) { u32 cdw10; @@ -100,7 +100,7 @@ static int nvme_pr_register(struct block_device *bdev, u64 old, } static int nvme_pr_reserve(struct block_device *bdev, u64 key, - enum pr_type type, unsigned flags) + enum pr_type type, unsigned flags, blk_status_t *blk_stat) { u32 cdw10; @@ -113,21 +113,23 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key, } static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new, - enum pr_type type, bool abort) + enum pr_type type, bool abort, blk_status_t *blk_stat) { u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1); return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire); } -static int nvme_pr_clear(struct block_device *bdev, u64 key) +static int nvme_pr_clear(struct block_device *bdev, u64 key, + blk_status_t *blk_stat) { u32 cdw10 = 1 | (key ? 0 : 1 << 3); return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release); } -static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type, + blk_status_t *blk_stat) { u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3); @@ -163,7 +165,7 @@ static int nvme_pr_resv_report(struct block_device *bdev, u8 *data, } static int nvme_pr_read_keys(struct block_device *bdev, - struct pr_keys *keys_info, u32 keys_len) + struct pr_keys *keys_info, u32 keys_len, blk_status_t *blk_stat) { struct nvme_reservation_status *status; u32 data_len, num_ret_keys; @@ -207,7 +209,7 @@ static int nvme_pr_read_keys(struct block_device *bdev, } static int nvme_pr_read_reservation(struct block_device *bdev, - struct pr_held_reservation *resv) + struct pr_held_reservation *resv, blk_status_t *blk_stat) { struct nvme_reservation_status tmp_status, *status; int ret, i, num_regs; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 86b602399000..8a39f25e4470 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1736,7 +1736,7 @@ static int sd_pr_in_command(struct block_device *bdev, u8 sa, } static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info, - u32 keys_len) + u32 keys_len, blk_status_t *blk_stat) { int result, i, data_offset, num_copy_keys; int data_len = keys_len + 8; @@ -1767,7 +1767,8 @@ static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info, } static int sd_pr_read_reservation(struct block_device *bdev, - struct pr_held_reservation *rsv) + struct pr_held_reservation *rsv, + blk_status_t *blk_stat) { struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk); struct scsi_device *sdev = sdkp->device; @@ -1797,8 +1798,8 @@ static int sd_pr_read_reservation(struct block_device *bdev, return 0; } -static int sd_pr_out_command(struct block_device *bdev, u8 sa, - u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags) +static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key, + u64 sa_key, enum scsi_pr_type type, u8 flags) { struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk); struct scsi_device *sdev = sdkp->device; @@ -1847,7 +1848,7 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa, } static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, - u32 flags) + u32 flags, blk_status_t *blk_stat) { if (flags & ~PR_FL_IGNORE_KEY) return -EOPNOTSUPP; @@ -1857,7 +1858,7 @@ static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, } static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, - u32 flags) + u32 flags, blk_status_t *blk_stat) { if (flags) return -EOPNOTSUPP; @@ -1865,20 +1866,22 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, block_pr_type_to_scsi(type), 0); } -static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type, + blk_status_t *blk_stat) { return sd_pr_out_command(bdev, 0x02, key, 0, block_pr_type_to_scsi(type), 0); } static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, - enum pr_type type, bool abort) + enum pr_type type, bool abort, blk_status_t *blk_stat) { return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key, block_pr_type_to_scsi(type), 0); } -static int sd_pr_clear(struct block_device *bdev, u64 key) +static int sd_pr_clear(struct block_device *bdev, u64 key, + blk_status_t *blk_stat) { return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0); } diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index fea5f8821da5..516436446f5d 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -29,7 +29,7 @@ bl_free_device(struct pnfs_block_dev *dev) int error; error = ops->pr_register(dev->bdev, dev->pr_key, 0, - false); + false, NULL); if (error) pr_err("failed to unregister PR key.\n"); } @@ -362,7 +362,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d, goto out_blkdev_put; } - error = ops->pr_register(d->bdev, 0, d->pr_key, true); + error = ops->pr_register(d->bdev, 0, d->pr_key, true, NULL); if (error) { pr_err("pNFS: failed to register key for block device %s.", d->bdev->bd_disk->disk_name); diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c index b6d01d51a746..a302ea026f72 100644 --- a/fs/nfsd/blocklayout.c +++ b/fs/nfsd/blocklayout.c @@ -277,7 +277,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb, goto out_free_dev; } - ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true); + ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true, NULL); if (ret) { pr_err("pNFS: failed to register key for device %s.\n", sb->s_id); @@ -285,7 +285,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb, } ret = ops->pr_reserve(sb->s_bdev, NFSD_MDS_PR_KEY, - PR_EXCLUSIVE_ACCESS_REG_ONLY, 0); + PR_EXCLUSIVE_ACCESS_REG_ONLY, 0, NULL); if (ret) { pr_err("pNFS: failed to reserve device %s.\n", sb->s_id); @@ -331,7 +331,7 @@ nfsd4_scsi_fence_client(struct nfs4_layout_stateid *ls) struct block_device *bdev = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_bdev; bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY, - nfsd4_scsi_pr_key(clp), 0, true); + nfsd4_scsi_pr_key(clp), 0, true, NULL); } const struct nfsd4_layout_ops scsi_layout_ops = { diff --git a/include/linux/pr.h b/include/linux/pr.h index 55c9ab7a202b..954544200a57 100644 --- a/include/linux/pr.h +++ b/include/linux/pr.h @@ -18,14 +18,15 @@ struct pr_held_reservation { struct pr_ops { int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key, - u32 flags); + u32 flags, blk_status_t *blk_stat); int (*pr_reserve)(struct block_device *bdev, u64 key, - enum pr_type type, u32 flags); + enum pr_type type, u32 flags, blk_status_t *blk_stat); int (*pr_release)(struct block_device *bdev, u64 key, - enum pr_type type); + enum pr_type type, blk_status_t *blk_stat); int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key, - enum pr_type type, bool abort); - int (*pr_clear)(struct block_device *bdev, u64 key); + enum pr_type type, bool abort, blk_status_t *blk_stat); + int (*pr_clear)(struct block_device *bdev, u64 key, + blk_status_t *blk_stat); /* * pr_read_keys - Read the registered keys and return them in the * pr_keys->keys array. The keys array will have been allocated at the @@ -35,9 +36,11 @@ struct pr_ops { * contains, so the caller can retry with a larger array. */ int (*pr_read_keys)(struct block_device *bdev, - struct pr_keys *keys_info, u32 keys_len); + struct pr_keys *keys_info, u32 keys_len, + blk_status_t *blk_stat); int (*pr_read_reservation)(struct block_device *bdev, - struct pr_held_reservation *rsv); + struct pr_held_reservation *rsv, + blk_status_t *blk_stat); }; #endif /* LINUX_PR_H */
LIO needs to be able to know if a failure was the result of a reservation conflict and then be able to convert from the lower level's definition of that error to SCSI so it can be returned to the initiator. Windows clustering and test tools like libiscsi require this. dm-multipath would also like to be able to distiguish between path failures and reservation conflict so they can optimize their error handlers for their pr_ops. To do this they currently have to know the lower level device type and how to convert between that driver's error code and SCSI. Just knowing the device type is difficult because we can have layers like dm-multipath between us and dm-multipath only knows the layer below it is a block device. To handle both cases and keep userspace compatibility, this patch adds a blk_status_t arg to the pr_ops callouts. The lower levels will convert their device specific error to the blk_status_t then the upper levels can easily check that code without knowing the device type. Adding the extra return value will then allow us to not break userspace which expects a negative -Exyz error code if the command fails before it's sent to the device or a device/driver specific value if the error is > 0. This patch just wires in the blk_status_t to the pr_ops callouts. The next patches will then have the drivers pass up a blk_status_t. Signed-off-by: Mike Christie <michael.christie@oracle.com> --- block/ioctl.c | 11 ++++++----- drivers/md/dm.c | 41 +++++++++++++++++++++++++--------------- drivers/nvme/host/pr.c | 16 +++++++++------- drivers/scsi/sd.c | 21 +++++++++++--------- fs/nfs/blocklayout/dev.c | 4 ++-- fs/nfsd/blocklayout.c | 6 +++--- include/linux/pr.h | 17 ++++++++++------- 7 files changed, 68 insertions(+), 48 deletions(-)