Message ID | 20230714213419.95492-14-michael.christie@oracle.com |
---|---|
State | New |
Headers | show |
Series | scsi: Allow scsi_execute users to control retries | expand |
On 14/07/2023 22:33, Mike Christie wrote: > If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we > shouldn't access the sshdr. If it returns 0, then the cmd executed > successfully, so there is no need to check the sshdr. This has us access > the sshdr when get a return value > 0. > > Signed-off-by: Mike Christie<michael.christie@oracle.com> > Reviewed-by: Christoph Hellwig<hch@lst.de> > --- > drivers/scsi/device_handler/scsi_dh_rdac.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c > index c5538645057a..cdefaa9f614e 100644 > --- a/drivers/scsi/device_handler/scsi_dh_rdac.c > +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c > @@ -541,6 +541,7 @@ static void send_mode_select(struct work_struct *work) > const struct scsi_exec_args exec_args = { > .sshdr = &sshdr, > }; > + int rc; > > spin_lock(&ctlr->ms_lock); > list_splice_init(&ctlr->ms_head, &list); > @@ -558,14 +559,18 @@ static void send_mode_select(struct work_struct *work) > (char *) h->ctlr->array_name, h->ctlr->index, > (retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying"); > > - if (scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, > - RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args)) { > + rc = scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, > + RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args); > + if (rc < 0) { > + err = SCSI_DH_IO; > + } else if (rc > 0) { > err = mode_select_handle_sense(sdev, &sshdr); > if (err == SCSI_DH_RETRY && retry_cnt--) > goto retry; > if (err == SCSI_DH_IMM_RETRY) > goto retry; > } > + > if (err == SCSI_DH_OK) { As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? Thanks, John
On 7/18/23 8:25 AM, John Garry wrote: > On 14/07/2023 22:33, Mike Christie wrote: >> If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we >> shouldn't access the sshdr. If it returns 0, then the cmd executed >> successfully, so there is no need to check the sshdr. This has us access >> the sshdr when get a return value > 0. >> >> Signed-off-by: Mike Christie<michael.christie@oracle.com> >> Reviewed-by: Christoph Hellwig<hch@lst.de> >> --- >> drivers/scsi/device_handler/scsi_dh_rdac.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c >> index c5538645057a..cdefaa9f614e 100644 >> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c >> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c >> @@ -541,6 +541,7 @@ static void send_mode_select(struct work_struct *work) >> const struct scsi_exec_args exec_args = { >> .sshdr = &sshdr, >> }; >> + int rc; >> spin_lock(&ctlr->ms_lock); >> list_splice_init(&ctlr->ms_head, &list); >> @@ -558,14 +559,18 @@ static void send_mode_select(struct work_struct *work) >> (char *) h->ctlr->array_name, h->ctlr->index, >> (retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying"); >> - if (scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, >> - RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args)) { >> + rc = scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, >> + RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args); >> + if (rc < 0) { >> + err = SCSI_DH_IO; >> + } else if (rc > 0) { >> err = mode_select_handle_sense(sdev, &sshdr); >> if (err == SCSI_DH_RETRY && retry_cnt--) >> goto retry; >> if (err == SCSI_DH_IMM_RETRY) >> goto retry; >> } >> + >> if (err == SCSI_DH_OK) { > As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? I guess this has been working by accident. When that happens we end up returning one of the retryable error codes to dm-multipath. It will then recall us, and before we re-run this function we will run check_ownership and see that the last call worked.
On 18/07/2023 16:31, Mike Christie wrote: Hi Mike, >>> + >>> if (err == SCSI_DH_OK) { >> As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? > I guess this has been working by accident. > > When that happens we end up returning one of the retryable error codes > to dm-multipath. It will then recall us, and before we re-run this > function we will run check_ownership and see that the last call worked. > I'd suggest that we fix this up as a prep patch, if you don't mind. Thanks, John
On 7/19/23 9:44 AM, John Garry wrote: > On 18/07/2023 16:31, Mike Christie wrote: > > Hi Mike, > >>>> + >>>> if (err == SCSI_DH_OK) { >>> As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? >> I guess this has been working by accident. >> >> When that happens we end up returning one of the retryable error codes >> to dm-multipath. It will then recall us, and before we re-run this >> function we will run check_ownership and see that the last call worked. >> > > I'd suggest that we fix this up as a prep patch, if you don't mind. > Do you mean just change the description of this patch to reflect it fixes the second bug? It already is a prep patch. The second rdac patch is built over it. stable can take the sshdr fix patches without the API change ones if they want. I just put the sshdr one next to the API change one, so reviewers wouldn't have to jump back and forth between the front and back of the patchset. Do you mean move all the sshd hdr patches to a separate patchset?
On 22/07/2023 18:10, Mike Christie wrote: >>>> As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? >>> I guess this has been working by accident. >>> >>> When that happens we end up returning one of the retryable error codes >>> to dm-multipath. It will then recall us, and before we re-run this >>> function we will run check_ownership and see that the last call worked. >>> Hi Mike, >> I'd suggest that we fix this up as a prep patch, if you don't mind. >> > Do you mean just change the description of this patch to reflect it fixes the > second bug? It already is a prep patch. The second rdac patch is built over it. AFAICS, this patch does not fix the bug where @err may hold a stale value. However this broken code goes away later in the series. > stable can take the sshdr fix patches without the API change ones if they want. > > I just put the sshdr one next to the API change one, so reviewers wouldn't > have to jump back and forth between the front and back of the patchset. > > Do you mean move all the sshd hdr patches to a separate patchset? No, I was just suggesting that we fix the broken code also in a separate patch in this series, like: diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c index c5538645057a..5d338f12e68b 100644 --- a/drivers/scsi/device_handler/scsi_dh_rdac.c +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c @@ -530,7 +530,7 @@ static void send_mode_select(struct work_struct *work) container_of(work, struct rdac_controller, ms_work); struct scsi_device *sdev = ctlr->ms_sdev; struct rdac_dh_data *h = sdev->handler_data; - int err = SCSI_DH_OK, retry_cnt = RDAC_RETRY_COUNT; + int err, retry_cnt = RDAC_RETRY_COUNT; struct rdac_queue_data *tmp, *qdata; LIST_HEAD(list); unsigned char cdb[MAX_COMMAND_SIZE]; @@ -549,6 +549,7 @@ static void send_mode_select(struct work_struct *work) spin_unlock(&ctlr->ms_lock); retry: + err = SCSI_DH_OK; memset(cdb, 0, sizeof(cdb)); data_size = rdac_failover_get(ctlr, &list, cdb); Thanks, John
On 7/25/23 02:59, John Garry wrote: > On 22/07/2023 18:10, Mike Christie wrote: >>>>> As I see, err is initially set to SCSI_DH_OK when declared. Then if we need to retry and 2nd call to scsi_execute_cmd() passes, such that rc == 0, then err still holds the old value. This seems same as pre-existing behaviour - is this proper? >>>> I guess this has been working by accident. >>>> >>>> When that happens we end up returning one of the retryable error codes >>>> to dm-multipath. It will then recall us, and before we re-run this >>>> function we will run check_ownership and see that the last call worked. >>>> > > Hi Mike, > >>> I'd suggest that we fix this up as a prep patch, if you don't mind. >>> >> Do you mean just change the description of this patch to reflect it fixes the >> second bug? It already is a prep patch. The second rdac patch is built over it. > > AFAICS, this patch does not fix the bug where @err may hold a stale value. However this broken code goes away later in the series. > Ah shoot, you're right. I should have stayed on vacation one more day :) >> stable can take the sshdr fix patches without the API change ones if they want. >> >> I just put the sshdr one next to the API change one, so reviewers wouldn't >> have to jump back and forth between the front and back of the patchset. >> >> Do you mean move all the sshd hdr patches to a separate patchset? > No, I was just suggesting that we fix the broken code also in a separate patch in this series, like: > Got it. I'll add that patch to the series.
diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c index c5538645057a..cdefaa9f614e 100644 --- a/drivers/scsi/device_handler/scsi_dh_rdac.c +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c @@ -541,6 +541,7 @@ static void send_mode_select(struct work_struct *work) const struct scsi_exec_args exec_args = { .sshdr = &sshdr, }; + int rc; spin_lock(&ctlr->ms_lock); list_splice_init(&ctlr->ms_head, &list); @@ -558,14 +559,18 @@ static void send_mode_select(struct work_struct *work) (char *) h->ctlr->array_name, h->ctlr->index, (retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying"); - if (scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, - RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args)) { + rc = scsi_execute_cmd(sdev, cdb, opf, &h->ctlr->mode_select, data_size, + RDAC_TIMEOUT * HZ, RDAC_RETRIES, &exec_args); + if (rc < 0) { + err = SCSI_DH_IO; + } else if (rc > 0) { err = mode_select_handle_sense(sdev, &sshdr); if (err == SCSI_DH_RETRY && retry_cnt--) goto retry; if (err == SCSI_DH_IMM_RETRY) goto retry; } + if (err == SCSI_DH_OK) { h->state = RDAC_STATE_ACTIVE; RDAC_LOG(RDAC_LOG_FAILOVER, sdev, "array %s, ctlr %d, "