Message ID | 20231016121542.111501-6-hare@suse.de |
---|---|
State | New |
Headers | show |
Series | scsi: EH rework, main part | expand |
On 16.10.23 14:16, Hannes Reinecke wrote: > -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) > +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q, > + int host_byte) > { > + if (host_byte) > + set_host_byte(scmd, host_byte); I think the 'if' is not needed here.
On 10/16/23 15:59, Johannes Thumshirn wrote: > On 16.10.23 14:16, Hannes Reinecke wrote: >> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) >> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q, >> + int host_byte) >> { >> + if (host_byte) >> + set_host_byte(scmd, host_byte); > > I think the 'if' is not needed here. > Actually, I prefer to keep it as passing in 'DID_OK' would _not_ modify the status; leaving out the 'if' would always overwrite it. Cheers, Hannes
On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote: > When SCSI EH completes we should be setting the host byte to > DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform > the caller that some EH processing has happened. I have a hard time following this commit log. Yes, we probably should. But so far we haven't, so why is this suddenly a problem? > -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) > +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q, > + int host_byte) > { > + if (host_byte) > + set_host_byte(scmd, host_byte); > list_move_tail(&scmd->eh_entry, done_q); What is the point of passing in the host_byte vs just setting it in the caller? In fat I'm not even quite sure what the point of the existing helper is, as moving the command to the passed in queue doesn't provide much of a useful abstraction.
On 10/17/23 09:25, Christoph Hellwig wrote: > On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote: >> When SCSI EH completes we should be setting the host byte to >> DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform >> the caller that some EH processing has happened. > > I have a hard time following this commit log. Yes, we probably > should. But so far we haven't, so why is this suddenly a problem? > >> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) >> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q, >> + int host_byte) >> { >> + if (host_byte) >> + set_host_byte(scmd, host_byte); >> list_move_tail(&scmd->eh_entry, done_q); > > What is the point of passing in the host_byte vs just setting it in > the caller? > Hmm. Sure, could do. > In fat I'm not even quite sure what the point of the existing helper > is, as moving the command to the passed in queue doesn't provide > much of a useful abstraction. > Share your sentiments. One could open-code it, but if I move the host_byte setting into the caller this function won't be touched at all. And it's time to clean up the entire list splice-and-dice game in SCSI EH once this rework is in; my plan is to move failed commands onto a per-entity list, and merge them onto the next higher entity list once an escalation fails. So maybe shelf it for now, and just open-code the host_byte setting in the caller. Cheers, Hannes
On Tue, Oct 17, 2023 at 10:14:19AM +0200, Hannes Reinecke wrote: >> What is the point of passing in the host_byte vs just setting it in >> the caller? >> > Hmm. Sure, could do. > >> In fat I'm not even quite sure what the point of the existing helper >> is, as moving the command to the passed in queue doesn't provide >> much of a useful abstraction. >> > Share your sentiments. > One could open-code it, but if I move the host_byte setting into the > caller this function won't be touched at all. Yes, I can rant about existing code, not just new code :) > And it's time to clean up the entire list splice-and-dice game in > SCSI EH once this rework is in; my plan is to move failed commands > onto a per-entity list, and merge them onto the next higher entity > list once an escalation fails. > > So maybe shelf it for now, and just open-code the host_byte setting > in the caller. Sounds good.
On 10/16/23 7:15 AM, Hannes Reinecke wrote: > @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, > if (rtn == SUCCESS) > list_move_tail(&scmd->eh_entry, &check_list); > else if (rtn == FAST_IO_FAIL) > - scsi_eh_finish_cmd(scmd, done_q); > + __scsi_eh_finish_cmd(scmd, done_q, > + DID_TRANSPORT_DISRUPTED); > else > /* push back on work queue for further processing */ > list_move(&scmd->eh_entry, work_q); > @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost, > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > if (channel == scmd_channel(scmd)) { > if (rtn == FAST_IO_FAIL) > - scsi_eh_finish_cmd(scmd, > - done_q); > + __scsi_eh_finish_cmd(scmd, > + done_q, > + DID_TRANSPORT_DISRUPTED); > else > list_move_tail(&scmd->eh_entry, > &check_list); > @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost, > if (rtn == SUCCESS) { > list_splice_init(work_q, &check_list); > } else if (rtn == FAST_IO_FAIL) { > - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > - scsi_eh_finish_cmd(scmd, done_q); > - } > + list_for_each_entry_safe(scmd, next, work_q, eh_entry) > + __scsi_eh_finish_cmd(scmd, done_q, > + DID_TRANSPORT_DISRUPTED); > } else { > SCSI_LOG_ERROR_RECOVERY(3, > shost_printk(KERN_INFO, shost, For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST because when drivers return that, they normally have hit their fast io fail timer or have hit a hard transport issue like the port is offline. For example for FC drivers they do: err = fc_block_rport(rport); if (err) return err; where fc_block_rport does: if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT) return FAST_IO_FAIL; and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST when FC_RPORT_FAST_FAIL_TIMEDOUT is set. So using DID_TRANSPORT_FAILFAST would align the return values for that state. One question I had is why you added those checks for target and host reset but not scsi_eh_bus_device_reset because drivers will do something similar to above where they call fc_block_rport for that callout as well.
On 10/19/23 22:30, Mike Christie wrote: > On 10/16/23 7:15 AM, Hannes Reinecke wrote: >> @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, >> if (rtn == SUCCESS) >> list_move_tail(&scmd->eh_entry, &check_list); >> else if (rtn == FAST_IO_FAIL) >> - scsi_eh_finish_cmd(scmd, done_q); >> + __scsi_eh_finish_cmd(scmd, done_q, >> + DID_TRANSPORT_DISRUPTED); >> else >> /* push back on work queue for further processing */ >> list_move(&scmd->eh_entry, work_q); >> @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost, >> list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >> if (channel == scmd_channel(scmd)) { >> if (rtn == FAST_IO_FAIL) >> - scsi_eh_finish_cmd(scmd, >> - done_q); >> + __scsi_eh_finish_cmd(scmd, >> + done_q, >> + DID_TRANSPORT_DISRUPTED); >> else >> list_move_tail(&scmd->eh_entry, >> &check_list); >> @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost, >> if (rtn == SUCCESS) { >> list_splice_init(work_q, &check_list); >> } else if (rtn == FAST_IO_FAIL) { >> - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >> - scsi_eh_finish_cmd(scmd, done_q); >> - } >> + list_for_each_entry_safe(scmd, next, work_q, eh_entry) >> + __scsi_eh_finish_cmd(scmd, done_q, >> + DID_TRANSPORT_DISRUPTED); >> } else { >> SCSI_LOG_ERROR_RECOVERY(3, >> shost_printk(KERN_INFO, shost, > > For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST because when > drivers return that, they normally have hit their fast io fail timer or have > hit a hard transport issue like the port is offline. For example for FC drivers > they do: > Ah, yes, you are right. Will be modifying that. > err = fc_block_rport(rport); > if (err) > return err; > > where fc_block_rport does: > > if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT) > return FAST_IO_FAIL; > > and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST > when FC_RPORT_FAST_FAIL_TIMEDOUT is set. > > So using DID_TRANSPORT_FAILFAST would align the return values for that > state. > > One question I had is why you added those checks for target and host > reset but not scsi_eh_bus_device_reset because drivers will do something > similar to above where they call fc_block_rport for that callout as > well. > That's done in one of the later patches where I convert the loop in scsi_eh_bus_device_reset(). Cheers, Hannes
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 826bc7f4d59f..cdbad217d013 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1262,9 +1262,10 @@ scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn) } /** - * scsi_eh_finish_cmd - Handle a cmd that eh is finished with. + * __scsi_eh_finish_cmd - Handle a cmd that eh is finished with. * @scmd: Original SCSI cmd that eh has finished. * @done_q: Queue for processed commands. + * @host_byte: Host byte of the command status to be set * * Notes: * We don't want to use the normal command completion while we are are @@ -1273,10 +1274,18 @@ scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn) * keep a list of pending commands for final completion, and once we * are ready to leave error handling we handle completion for real. */ -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q, + int host_byte) { + if (host_byte) + set_host_byte(scmd, host_byte); list_move_tail(&scmd->eh_entry, done_q); } + +void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) +{ + __scsi_eh_finish_cmd(scmd, done_q, DID_OK); +} EXPORT_SYMBOL(scsi_eh_finish_cmd); /** @@ -1451,7 +1460,8 @@ static int scsi_eh_test_devices(struct list_head *cmd_list, if (finish_cmds && (try_stu || scsi_eh_action(scmd, SUCCESS) == SUCCESS)) - scsi_eh_finish_cmd(scmd, done_q); + __scsi_eh_finish_cmd(scmd, done_q, + DID_RESET); else list_move_tail(&scmd->eh_entry, work_q); } @@ -1600,8 +1610,9 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost, work_q, eh_entry) { if (scmd->device == sdev && scsi_eh_action(scmd, rtn) != FAILED) - scsi_eh_finish_cmd(scmd, - done_q); + __scsi_eh_finish_cmd(scmd, + done_q, + DID_RESET); } } } else { @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost, if (rtn == SUCCESS) list_move_tail(&scmd->eh_entry, &check_list); else if (rtn == FAST_IO_FAIL) - scsi_eh_finish_cmd(scmd, done_q); + __scsi_eh_finish_cmd(scmd, done_q, + DID_TRANSPORT_DISRUPTED); else /* push back on work queue for further processing */ list_move(&scmd->eh_entry, work_q); @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost, list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (channel == scmd_channel(scmd)) { if (rtn == FAST_IO_FAIL) - scsi_eh_finish_cmd(scmd, - done_q); + __scsi_eh_finish_cmd(scmd, + done_q, + DID_TRANSPORT_DISRUPTED); else list_move_tail(&scmd->eh_entry, &check_list); @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost, if (rtn == SUCCESS) { list_splice_init(work_q, &check_list); } else if (rtn == FAST_IO_FAIL) { - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { - scsi_eh_finish_cmd(scmd, done_q); - } + list_for_each_entry_safe(scmd, next, work_q, eh_entry) + __scsi_eh_finish_cmd(scmd, done_q, + DID_TRANSPORT_DISRUPTED); } else { SCSI_LOG_ERROR_RECOVERY(3, shost_printk(KERN_INFO, shost,
When SCSI EH completes we should be setting the host byte to DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform the caller that some EH processing has happened. Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/scsi/scsi_error.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-)