Message ID | 20221109074754.24075-1-hare@suse.de |
---|---|
State | New |
Headers | show |
Series | scsi_error: do not queue pointless abort workqueue functions | expand |
On 09/11/2022 07:47, Hannes Reinecke wrote: > If a host template doesn't implement the .eh_abort_handler() > there is no point in queueing the abort workqueue function; > all it does is invoking SCSI EH anyway. > So return 'FAILED' from scsi_abort_command() if the .eh_abort_handler() > is not implemented and save us from having to wait for the > abort workqueue function to complete. Do we ever use shost->tmf_work_q in this case? Doesn't seem much point in allocating it, apart from keeping the code simpler > > Cc: Niklas Cassel <Niklas.Cassel@wdc.com> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Cc: John Garry <john.garry@oracle.com> That's someone else :) > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/scsi_error.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index be2a70c5ac6d..e9f9c8f52c59 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd) > return FAILED; > } > > + if (!shost->hostt->eh_abort_handler) { nit: no need for {}, but maybe better put comment above the check if removing it. However maybe it's also a bit obvious comment. > + /* No abort handler, fail command directly */ > + return FAILED; > + } > + > spin_lock_irqsave(shost->host_lock, flags); > if (shost->eh_deadline != -1 && !shost->last_reset) > shost->last_reset = jiffies;
On Wed, Nov 09, 2022 at 08:47:54AM +0100, Hannes Reinecke wrote: > If a host template doesn't implement the .eh_abort_handler() > there is no point in queueing the abort workqueue function; > all it does is invoking SCSI EH anyway. > So return 'FAILED' from scsi_abort_command() if the .eh_abort_handler() > is not implemented and save us from having to wait for the > abort workqueue function to complete. > > Cc: Niklas Cassel <Niklas.Cassel@wdc.com> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Cc: John Garry <john.garry@oracle.com> > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/scsi_error.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index be2a70c5ac6d..e9f9c8f52c59 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd) > return FAILED; > } > > + if (!shost->hostt->eh_abort_handler) { > + /* No abort handler, fail command directly */ > + return FAILED; > + } > + Hello Hannes, is there any reason why you didn't put this before the preceding if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) { if statement? > spin_lock_irqsave(shost->host_lock, flags); > if (shost->eh_deadline != -1 && !shost->last_reset) > shost->last_reset = jiffies; > -- > 2.35.3 > After some additional testing with this patch, I did notice that it does introduce a behavioural change from libata perspective: Before this patch, for libata, scmd_eh_abort_handler() would get called, and we would come into this statement: rtn = scsi_try_to_abort_cmd(shost->hostt, scmd); if (rtn != SUCCESS) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, "cmd abort %s\n", (rtn == FAST_IO_FAIL) ? "not send" : "failed")); goto out; } Which jumps to: out: spin_lock_irqsave(shost->host_lock, flags); list_del_init(&scmd->eh_entry); spin_unlock_irqrestore(shost->host_lock, flags); scsi_eh_scmd_add(scmd); So scsi_eh_scmd_add() would be called. After this patch, scsi_abort_command() will return FAILED instead of SUCCESS, which means that scsi_timeout() instead enters this if statement: if (scsi_abort_command(scmd) != SUCCESS) { set_host_byte(scmd, DID_TIME_OUT); scsi_eh_scmd_add(scmd); } Which means that scmds reaching libata .eh_strategy_handler() now has host_byte DID_TIME_OUT set, while before this patch, that was not the case. I guess we could simply clear the host_byte in libata's .eh_strategy_handler() (and that is actually what we do). I just want to understand how it is meant to work. Looking back at the code to when libata first started to use blk_abort_request()/scsi_req_abort_cmd(), DID_TIME_OUT was only set if scsi_eh_scmd_add() failed: https://github.com/torvalds/linux/blob/7b70fc039824bc7303e4007a5f758f832de56611/drivers/scsi/scsi_error.c#L181 Martin then moved the set_host_byte(scmd, DID_TIME_OUT) to be done regardless if scsi_eh_scmd_add() failed or not in: 18a4d0a22ed6 ("[SCSI] Handle disk devices which can not process medium access commands") without really explaining why. Then in your commit: 2171b6d08bf8 ("scsi: make scsi_eh_scmd_add() always succeed") you changed scsi_times_out() to: + if (host->hostt->no_async_abort || + scsi_abort_command(scmd) != SUCCESS) { + set_host_byte(scmd, DID_TIME_OUT); + scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD); + } And it also changed scmd_eh_abort_handler(): - if (!scsi_eh_scmd_add(scmd, 0)) { - SCSI_LOG_ERROR_RECOVERY(3, - scmd_printk(KERN_WARNING, scmd, - "terminate aborted command\n")); - set_host_byte(scmd, DID_TIME_OUT); - scsi_finish_command(scmd); - } + scsi_eh_scmd_add(scmd, 0); So for libata, which did not set host->hostt->no_async_abort, scsi_abort_command() would return SUCCESS, so we would not go into that if statement. Instead we would have the current behavior where scmd_eh_abort_handler() fails, and does a goto out; to add the scsi_eh_scmd_add() (without setting DID_TIME_OUT). Should perhaps scmd_eh_abort_handler() perhaps set DID_TIME_OUT unconditionally, to match the code before the change? To me, it is not really clear how the SCSI code is meant to behave. I think if the timeout has actually triggered, because the timer expired, it makes sense that scsi_timeout() sets DID_TIME_OUT. But if e.g. libata called blk_abort_request() to abort the command before the timer actually expired, I'm not sure. For ata part, it does not really matter, because currently, libata always overwrites the scmd->result anyway. However, there might be other LLDD where this change actually do matter. (For the curious, libata's own way of detecting if the command actually was a timeout from scsi_timeout() or if it was an aborted command works like this: If QCFLAG_FAILED is not set in .eh_strategy_handler, then libata EH does not own the QC, so it was scsi_timeout() that won the race, without libata ever aborting the command. It then sets qc->err_mask = AC_ERR_TIMEOUT). So libata currently never looks at host_byte(), it always overwrites it, and it instead uses its own way of detecting that a timeout occured, if so, it freezes the port (disables IRQs) and resets the controller, and increases smcd->allowed, such that the command is retried.) Kind regards, Niklas
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index be2a70c5ac6d..e9f9c8f52c59 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd) return FAILED; } + if (!shost->hostt->eh_abort_handler) { + /* No abort handler, fail command directly */ + return FAILED; + } + spin_lock_irqsave(shost->host_lock, flags); if (shost->eh_deadline != -1 && !shost->last_reset) shost->last_reset = jiffies;
If a host template doesn't implement the .eh_abort_handler() there is no point in queueing the abort workqueue function; all it does is invoking SCSI EH anyway. So return 'FAILED' from scsi_abort_command() if the .eh_abort_handler() is not implemented and save us from having to wait for the abort workqueue function to complete. Cc: Niklas Cassel <Niklas.Cassel@wdc.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: John Garry <john.garry@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/scsi/scsi_error.c | 5 +++++ 1 file changed, 5 insertions(+)