Message ID | 20221018202958.1902564-4-bvanassche@acm.org |
---|---|
State | New |
Headers | show |
Series | Fix a deadlock in the UFS driver | expand |
On 10/18/22 3:29 PM, Bart Van Assche wrote: > The current behavior for SCSI commands submitted while error recovery > is ongoing is to retry command submission after error recovery has > finished. See also the scsi_host_in_recovery() check in > scsi_host_queue_ready(). Add support for failing SCSI commands while > host recovery is in progress. This functionality will be used to fix a > deadlock in the UFS driver. > > Cc: Christoph Hellwig <hch@lst.de> > Cc: Ming Lei <ming.lei@redhat.com> > Cc: John Garry <john.garry@huawei.com> > Cc: Mike Christie <michael.christie@oracle.com> > Cc: Hannes Reinecke <hare@suse.de> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_lib.c | 8 +++++--- > include/scsi/scsi_cmnd.h | 3 ++- > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index fa96d3cfdfa3..ec890865abae 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1341,9 +1341,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q, > struct scsi_device *sdev, > struct scsi_cmnd *cmd) > { > - if (scsi_host_in_recovery(shost)) > - return 0; > - > if (atomic_read(&shost->host_blocked) > 0) { > if (scsi_host_busy(shost) > 0) > goto starved; > @@ -1732,6 +1729,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > ret = BLK_STS_RESOURCE; > if (!scsi_target_queue_ready(shost, sdev)) > goto out_put_budget; > + if (unlikely(scsi_host_in_recovery(shost))) { > + if (cmd->flags & SCMD_FAIL_IF_RECOVERING) > + ret = BLK_STS_OFFLINE; > + goto out_dec_target_busy; > + } Hey, Will we always hit this check? For example, if we have hit the device's queue depth limit so scsi_mq_get_budget -> scsi_dev_queue_ready is returning -1, will we not even call scsi_queue_rq? So because we are in recovery and no commands are completing, we will be stuck waiting for a token to be put back on the sdev->budget_map. Do you need a similar check in scsi_dev_queue_ready or should the check go in there only or can we hit a race for the latter?
On 10/19/22 12:52, Mike Christie wrote: > Will we always hit this check? For example, if we have hit the > device's queue depth limit so > > scsi_mq_get_budget -> scsi_dev_queue_ready > > is returning -1, will we not even call scsi_queue_rq? So because > we are in recovery and no commands are completing, we will be > stuck waiting for a token to be put back on the sdev->budget_map. > > Do you need a similar check in scsi_dev_queue_ready or should > the check go in there only or can we hit a race for the latter? Hi Mike, A later patch in this series uses the SCMD_FAIL_IF_RECOVERING flag in the suspend path of the UFS driver. The UFS suspend code waits for the SCSI error handler to finish before the UFS device is suspended. Does this answer your question? Bart.
On 10/19/22 4:11 PM, Bart Van Assche wrote: > On 10/19/22 12:52, Mike Christie wrote: >> Will we always hit this check? For example, if we have hit the >> device's queue depth limit so >> >> scsi_mq_get_budget -> scsi_dev_queue_ready >> >> is returning -1, will we not even call scsi_queue_rq? So because >> we are in recovery and no commands are completing, we will be >> stuck waiting for a token to be put back on the sdev->budget_map. >> >> Do you need a similar check in scsi_dev_queue_ready or should >> the check go in there only or can we hit a race for the latter? > > Hi Mike, > > A later patch in this series uses the SCMD_FAIL_IF_RECOVERING flag in the suspend path of the UFS driver. The UFS suspend code waits for the SCSI error handler to finish before the UFS device is suspended. Does this answer your question? The reply for patch 9 about no other commands should be running at this time answers my question, so we should be ok. Reviewed-by: Mike Christie <michael.christie@oracle.com>
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index fa96d3cfdfa3..ec890865abae 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1341,9 +1341,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q, struct scsi_device *sdev, struct scsi_cmnd *cmd) { - if (scsi_host_in_recovery(shost)) - return 0; - if (atomic_read(&shost->host_blocked) > 0) { if (scsi_host_busy(shost) > 0) goto starved; @@ -1732,6 +1729,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, ret = BLK_STS_RESOURCE; if (!scsi_target_queue_ready(shost, sdev)) goto out_put_budget; + if (unlikely(scsi_host_in_recovery(shost))) { + if (cmd->flags & SCMD_FAIL_IF_RECOVERING) + ret = BLK_STS_OFFLINE; + goto out_dec_target_busy; + } if (!scsi_host_queue_ready(q, shost, sdev, cmd)) goto out_dec_target_busy; diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 7d3622db38ed..c2cb5f69635c 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -52,8 +52,9 @@ struct scsi_pointer { #define SCMD_TAGGED (1 << 0) #define SCMD_INITIALIZED (1 << 1) #define SCMD_LAST (1 << 2) +#define SCMD_FAIL_IF_RECOVERING (1 << 4) /* flags preserved across unprep / reprep */ -#define SCMD_PRESERVED_FLAGS (SCMD_INITIALIZED) +#define SCMD_PRESERVED_FLAGS (SCMD_INITIALIZED | SCMD_FAIL_IF_RECOVERING) /* for scmd->state */ #define SCMD_STATE_COMPLETE 0
The current behavior for SCSI commands submitted while error recovery is ongoing is to retry command submission after error recovery has finished. See also the scsi_host_in_recovery() check in scsi_host_queue_ready(). Add support for failing SCSI commands while host recovery is in progress. This functionality will be used to fix a deadlock in the UFS driver. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/scsi_lib.c | 8 +++++--- include/scsi/scsi_cmnd.h | 3 ++- 2 files changed, 7 insertions(+), 4 deletions(-)