Message ID | 20231114013750.76609-1-michael.christie@oracle.com |
---|---|
Headers | show |
Series | scsi: Allow scsi_execute users to request retries | expand |
On 14/11/2023 01:37, Mike Christie wrote: > For passthrough we don't retry any error we get a check condition for. nit: For passthrough we don't retry any error which we get a check condition for. > This results in a lot of callers driving their own retries for all UAs, > specific UAs, NOT_READY, specific sense values or any type of failure. > > This adds the core code to allow passthrough users to specify what errors > they want scsi-ml to retry for them. We can then convert users to drop > a lot of their sense parsing and retry handling. > > Signed-off-by: Mike Christie <michael.christie@oracle.com> Looks ok to me, so feel free to pick up: Reviewed-by: John Garry <john.g.garry@oracle.com> > --- > drivers/scsi/scsi_lib.c | 92 ++++++++++++++++++++++++++++++++++++++ > include/scsi/scsi_device.h | 48 ++++++++++++++++++++ > 2 files changed, 140 insertions(+) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index cf3864f72093..dee43c6f7ad0 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -184,6 +184,92 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason) > __scsi_queue_insert(cmd, reason, true); > } > > +void scsi_reset_failures(struct scsi_failures *failures) nit: maybe scsi_reset_failures_retries as the name, to be more specific > +{ > + struct scsi_failure *failure; > + > + failures->total_retries = 0; > + > + for (failure = failures->failure_definitions; failure->result; > + failure++) > + failure->retries = 0; > +} > +EXPORT_SYMBOL_GPL(scsi_reset_failures); > + > +/** > + * scsi_check_passthrough - Determine if passthrough scsi_cmnd needs a retry. > + * @scmd: scsi_cmnd to check. > + * @failures: scsi_failures struct that lists failures to check for. > + * > + * Returns -EAGAIN if the caller should retry else 0. > + */ > +static int scsi_check_passthrough(struct scsi_cmnd *scmd, > + struct scsi_failures *failures) > +{ > + struct scsi_failure *failure; > + struct scsi_sense_hdr sshdr; > + enum sam_status status; > + > + if (!failures) > + return 0; > + > + for (failure = failures->failure_definitions; failure->result; > + failure++) { > + if (failure->result == SCMD_FAILURE_RESULT_ANY) > + goto maybe_retry; > + > + if (host_byte(scmd->result) && > + host_byte(scmd->result) == host_byte(failure->result)) > + goto maybe_retry; > + > + status = status_byte(scmd->result); > + if (!status) > + continue; > + > + if (failure->result == SCMD_FAILURE_STAT_ANY && > + !scsi_status_is_good(scmd->result)) > + goto maybe_retry; > + > + if (status != status_byte(failure->result)) > + continue; > + > + if (status_byte(failure->result) != SAM_STAT_CHECK_CONDITION || > + failure->sense == SCMD_FAILURE_SENSE_ANY) > + goto maybe_retry; > + > + if (!scsi_command_normalize_sense(scmd, &sshdr)) > + return 0; > + > + if (failure->sense != sshdr.sense_key) > + continue; > + > + if (failure->asc == SCMD_FAILURE_ASC_ANY) > + goto maybe_retry; > + > + if (failure->asc != sshdr.asc) > + continue; > + > + if (failure->ascq == SCMD_FAILURE_ASCQ_ANY || > + failure->ascq == sshdr.ascq) > + goto maybe_retry; > + } > + > + return 0; > + > +maybe_retry: > + if (failure->allowed) { > + if (failure->allowed == SCMD_FAILURE_NO_LIMIT || > + ++failure->retries <= failure->allowed) > + return -EAGAIN; > + } else { > + if (failures->total_allowed == SCMD_FAILURE_NO_LIMIT || > + ++failures->total_retries <= failures->total_allowed) > + return -EAGAIN; > + } > + > + return 0; > +} > + > /** > * scsi_execute_cmd - insert request and wait for the result > * @sdev: scsi_device > @@ -214,6 +300,7 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, > args->sense_len != SCSI_SENSE_BUFFERSIZE)) > return -EINVAL; > > +retry: > req = scsi_alloc_request(sdev->request_queue, opf, args->req_flags); > if (IS_ERR(req)) > return PTR_ERR(req); > @@ -237,6 +324,11 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, > */ > blk_execute_rq(req, true); > > + if (scsi_check_passthrough(scmd, args->failures) == -EAGAIN) { > + blk_mq_free_request(req); > + goto retry; > + } > + > /* > * Some devices (USB mass-storage in particular) may transfer > * garbage data together with a residue indicating that the data > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 10480eb582b2..c92d6d9e644e 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -483,6 +483,52 @@ extern int scsi_is_sdev_device(const struct device *); > extern int scsi_is_target_device(const struct device *); > extern void scsi_sanitize_inquiry_string(unsigned char *s, int len); > > +/* > + * scsi_execute_cmd users can set scsi_failure.result to have > + * scsi_check_passthrough fail/retry a command. scsi_failure.result can be a > + * specific host byte or message code, or SCMD_FAILURE_RESULT_ANY can be used > + * to match any host or message code. > + */ > +#define SCMD_FAILURE_RESULT_ANY 0x7fffffff > +/* > + * Set scsi_failure.result to SCMD_FAILURE_STAT_ANY to fail/retry any failure > + * scsi_status_is_good returns false for. > + */ > +#define SCMD_FAILURE_STAT_ANY 0xff > +/* > + * The following can be set to the scsi_failure sense, asc and ascq fields to > + * match on any sense, ASC, or ASCQ value. > + */ > +#define SCMD_FAILURE_SENSE_ANY 0xff > +#define SCMD_FAILURE_ASC_ANY 0xff > +#define SCMD_FAILURE_ASCQ_ANY 0xff > +/* Always retry a matching failure. */ > +#define SCMD_FAILURE_NO_LIMIT -1 > + > +struct scsi_failure { > + int result; > + u8 sense; > + u8 asc; > + u8 ascq; > + > + /* > + * Number of attempts allowed for this failure. It does not count > + * for the total_allowed. > + */ > + s8 allowed; > + s8 retries; > +}; > + > +struct scsi_failures { > + /* > + * If the failure does not have a specific limit in the scsi_failure > + * then this limit is followed. > + */ > + int total_allowed; > + int total_retries; > + struct scsi_failure *failure_definitions; > +}; > + > /* Optional arguments to scsi_execute_cmd */ > struct scsi_exec_args { > unsigned char *sense; /* sense buffer */ > @@ -491,12 +537,14 @@ struct scsi_exec_args { > blk_mq_req_flags_t req_flags; /* BLK_MQ_REQ flags */ > int scmd_flags; /* SCMD flags */ > int *resid; /* residual length */ > + struct scsi_failures *failures; /* failures to retry */ > }; > > int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd, > blk_opf_t opf, void *buffer, unsigned int bufflen, > int timeout, int retries, > const struct scsi_exec_args *args); > +void scsi_reset_failures(struct scsi_failures *failures); > > extern void sdev_disable_disk_events(struct scsi_device *sdev); > extern void sdev_enable_disk_events(struct scsi_device *sdev);
On 14/11/2023 01:37, Mike Christie wrote: > The following patches were made over Martin's 6.7 staging branch > which contains a sd change that this patch requires. Hi Mike, It would be nice if we could apply to 6.8 staging - was the sd.c change not in v6.7-rc1? Cheers, John
On 14/11/2023 01:37, Mike Christie wrote: Feel free to pick up: Reviewed-by: John Garry <john.g.garry@oracle.com> > + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer, > + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments? Thanks, John > + &exec_args); > > - } while (the_result && retries); > + if (the_result > 0) { > + if (media_not_present(sdkp, &sshdr)) > + return -ENODEV; > + > + sense_valid = scsi_sense_valid(&sshdr); > + if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST && > + (sshdr.asc == 0x20 || sshdr.asc == 0x24) && > + sshdr.ascq == 0x00) { > + /* > + * Invalid Command Operation Code or Invalid Field in > + * CDB, just retry silently with RC10 > + */ > + return -EINVAL; > + } > + }
On 11/16/23 5:39 AM, John Garry wrote: > On 14/11/2023 01:37, Mike Christie wrote: > > Feel free to pick up: > Reviewed-by: John Garry <john.g.garry@oracle.com> > >> + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer, >> + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, > > BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments? Will add some comments. Martin W, if you are going to ask about this again, the answer is that in a future patchset, we can kill the reties passed directly into scsi_execute_cmd. We could make scsi_decide_disposition's failures an array of scsi_failure structs that gets checked in scsi_decide_disposition and scsi_check_passthrough. We would need to add a function callout to the scsi_failure struct for some of the more complicated failure handling. That could then be used for some of the other scsi_execute_cmd cases as well. For the normal/fast path case though we would need something to avoid function pointers.
On Thu, 2023-11-16 at 11:15 -0600, Mike Christie wrote: > On 11/16/23 5:39 AM, John Garry wrote: > > On 14/11/2023 01:37, Mike Christie wrote: > > > > Feel free to pick up: > > Reviewed-by: John Garry <john.g.garry@oracle.com> > > > > > + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, > > > buffer, > > > + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, > > > > BTW, some might find it a little confusing that we have > > scsi_execute_cmd() retries arg as well as being able to pass > > exec_args.failure, and exec_args.failure may allow us to retry. > > Could this be improved, even in terms of arg or struct member > > naming/comments? > > Will add some comments. > > Martin W, if you are going to ask about this again, the answer is > that > in a future patchset, we can kill the reties passed directly into > scsi_execute_cmd. > Did I ask about this? ... I dimly remember ;-) > We could make scsi_decide_disposition's failures an array of > scsi_failure structs that gets checked in scsi_decide_disposition > and scsi_check_passthrough. We would need to add a function callout > to the scsi_failure struct for some of the more complicated failure > handling. That could then be used for some of the other > scsi_execute_cmd > cases as well. For the normal/fast path case though we would need > something to avoid function pointers. I am not sure if I like this idea. scsi_decide_disposition() is challenging to read already. I am not convinced that converting it into a loop over "struct scsi_failure" would make it easier to understand. I guess we'd need to see how it would look like. To my understanding, the two retry counts are currently orthogonal, the one in scsi_execute_cmd() representing the standard mid-layer "maybe_retry" case in scsi_decide_disposition, and the the other one representing the additional "high level" passthrough retry. We need to be careful when merging them. Wrt the callout, I'd actually thought about that when looking at your previous set, but I didn't want to interfere too much. I thought that using callouts might simplify the scsi_failure handling, and might actually make the new code easier to read. I can't judge possible effects on performance. Cheers, Martin