[5/9] scsi: set host byte after EH completed

Message ID	20231016121542.111501-6-hare@suse.de
State	New
Headers	show Return-Path: <linux-scsi-owner@vger.kernel.org> From: Hannes Reinecke <hare@suse.de> To: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de>, James Bottomley <james.bottomley@hansenpartnership.com>, linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de> Subject: [PATCH 5/9] scsi: set host byte after EH completed Date: Mon, 16 Oct 2023 14:15:38 +0200 Message-Id: <20231016121542.111501-6-hare@suse.de> In-Reply-To: <20231016121542.111501-1-hare@suse.de> References: <20231016121542.111501-1-hare@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	scsi: EH rework, main part \| expand [PATCHv6,0/9] scsi: EH rework, main part [1/9] scsi: Use Scsi_Host as argument for eh_host_reset_handler [2/9] scsi: Use Scsi_Host and channel number as argument for eh_bus_reset_handler() [3/9] scsi: Use scsi_target as argument for eh_target_reset_handler() [4/9] scsi: Use scsi_device as argument to eh_device_reset_handler() [5/9] scsi: set host byte after EH completed [6/9] scsi_error: iterate over list of failed commands in scsi_eh_bus_reset() [7/9] scsi: Do not allocate scsi command in scsi_ioctl_reset() [8/9] scsi_error: iterate over list of failed commands in scsi_eh_bus_device_reset() [9/9] scsi: remove SUBMITTED_BY_SCSI_RESET_IOCTL

Hannes Reinecke Oct. 16, 2023, 12:15 p.m. UTC

When SCSI EH completes we should be setting the host byte to
DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
the caller that some EH processing has happened.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

Johannes Thumshirn Oct. 16, 2023, 1:59 p.m. UTC | #1

On 16.10.23 14:16, Hannes Reinecke wrote:
> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
> +			int host_byte)
>   {
> +	if (host_byte)
> +		set_host_byte(scmd, host_byte);

I think the 'if' is not needed here.

Hannes Reinecke Oct. 17, 2023, 7:13 a.m. UTC | #2

On 10/16/23 15:59, Johannes Thumshirn wrote:
> On 16.10.23 14:16, Hannes Reinecke wrote:
>> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
>> +			int host_byte)
>>    {
>> +	if (host_byte)
>> +		set_host_byte(scmd, host_byte);
> 
> I think the 'if' is not needed here.
> 
Actually, I prefer to keep it as passing in 'DID_OK' would _not_ modify
the status; leaving out the 'if' would always overwrite it.

Cheers,

Hannes

Christoph Hellwig Oct. 17, 2023, 7:25 a.m. UTC | #3

On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote:
> When SCSI EH completes we should be setting the host byte to
> DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
> the caller that some EH processing has happened.

I have a hard time following this commit log.  Yes, we probably
should.  But so far we haven't, so why is this suddenly a problem?

> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
> +			int host_byte)
>  {
> +	if (host_byte)
> +		set_host_byte(scmd, host_byte);
>  	list_move_tail(&scmd->eh_entry, done_q);

What is the point of passing in the host_byte vs just setting it in
the caller?

In fat I'm not even quite sure what the point of the existing helper
is, as moving the command to the passed in queue doesn't provide
much of a useful abstraction.

Hannes Reinecke Oct. 17, 2023, 8:14 a.m. UTC | #4

On 10/17/23 09:25, Christoph Hellwig wrote:
> On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote:
>> When SCSI EH completes we should be setting the host byte to
>> DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
>> the caller that some EH processing has happened.
> 
> I have a hard time following this commit log.  Yes, we probably
> should.  But so far we haven't, so why is this suddenly a problem?
> 
>> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
>> +			int host_byte)
>>   {
>> +	if (host_byte)
>> +		set_host_byte(scmd, host_byte);
>>   	list_move_tail(&scmd->eh_entry, done_q);
> 
> What is the point of passing in the host_byte vs just setting it in
> the caller?
> 
Hmm. Sure, could do.

> In fat I'm not even quite sure what the point of the existing helper
> is, as moving the command to the passed in queue doesn't provide
> much of a useful abstraction.
> 
Share your sentiments.
One could open-code it, but if I move the host_byte setting into the
caller this function won't be touched at all.
And it's time to clean up the entire list splice-and-dice game in
SCSI EH once this rework is in; my plan is to move failed commands
onto a per-entity list, and merge them onto the next higher entity
list once an escalation fails.

So maybe shelf it for now, and just open-code the host_byte setting
in the caller.

Cheers,

Hannes

Christoph Hellwig Oct. 18, 2023, 5:46 a.m. UTC | #5

On Tue, Oct 17, 2023 at 10:14:19AM +0200, Hannes Reinecke wrote:
>> What is the point of passing in the host_byte vs just setting it in
>> the caller?
>>
> Hmm. Sure, could do.
>
>> In fat I'm not even quite sure what the point of the existing helper
>> is, as moving the command to the passed in queue doesn't provide
>> much of a useful abstraction.
>>
> Share your sentiments.
> One could open-code it, but if I move the host_byte setting into the
> caller this function won't be touched at all.

Yes, I can rant about existing code, not just new code :)

> And it's time to clean up the entire list splice-and-dice game in
> SCSI EH once this rework is in; my plan is to move failed commands
> onto a per-entity list, and merge them onto the next higher entity
> list once an escalation fails.
>
> So maybe shelf it for now, and just open-code the host_byte setting
> in the caller.

Sounds good.

Mike Christie Oct. 19, 2023, 8:30 p.m. UTC | #6

On 10/16/23 7:15 AM, Hannes Reinecke wrote:
> @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>  			if (rtn == SUCCESS)
>  				list_move_tail(&scmd->eh_entry, &check_list);
>  			else if (rtn == FAST_IO_FAIL)
> -				scsi_eh_finish_cmd(scmd, done_q);
> +				__scsi_eh_finish_cmd(scmd, done_q,
> +						     DID_TRANSPORT_DISRUPTED);
>  			else
>  				/* push back on work queue for further processing */
>  				list_move(&scmd->eh_entry, work_q);
> @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>  			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>  				if (channel == scmd_channel(scmd)) {
>  					if (rtn == FAST_IO_FAIL)
> -						scsi_eh_finish_cmd(scmd,
> -								   done_q);
> +						__scsi_eh_finish_cmd(scmd,
> +								     done_q,
> +								     DID_TRANSPORT_DISRUPTED);
>  					else
>  						list_move_tail(&scmd->eh_entry,
>  							       &check_list);
> @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost,
>  		if (rtn == SUCCESS) {
>  			list_splice_init(work_q, &check_list);
>  		} else if (rtn == FAST_IO_FAIL) {
> -			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -					scsi_eh_finish_cmd(scmd, done_q);
> -			}
> +			list_for_each_entry_safe(scmd, next, work_q, eh_entry)
> +				__scsi_eh_finish_cmd(scmd, done_q,
> +						     DID_TRANSPORT_DISRUPTED);
>  		} else {
>  			SCSI_LOG_ERROR_RECOVERY(3,
>  				shost_printk(KERN_INFO, shost,

For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST  because when
drivers return that, they normally have hit their fast io fail timer or have
hit a hard transport issue like the port is offline. For example for FC drivers
they do:

err = fc_block_rport(rport);
if (err)
	return err;
 
where fc_block_rport does:

if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
	return FAST_IO_FAIL;

and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST
when FC_RPORT_FAST_FAIL_TIMEDOUT is set.

So using DID_TRANSPORT_FAILFAST would align the return values for that
state.

One question I had is why you added those checks for target and host
reset but not scsi_eh_bus_device_reset because drivers will do something
similar to above where they call fc_block_rport for that callout as
well.

Hannes Reinecke Oct. 20, 2023, 5:55 a.m. UTC | #7

On 10/19/23 22:30, Mike Christie wrote:
> On 10/16/23 7:15 AM, Hannes Reinecke wrote:
>> @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>>   			if (rtn == SUCCESS)
>>   				list_move_tail(&scmd->eh_entry, &check_list);
>>   			else if (rtn == FAST_IO_FAIL)
>> -				scsi_eh_finish_cmd(scmd, done_q);
>> +				__scsi_eh_finish_cmd(scmd, done_q,
>> +						     DID_TRANSPORT_DISRUPTED);
>>   			else
>>   				/* push back on work queue for further processing */
>>   				list_move(&scmd->eh_entry, work_q);
>> @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>>   			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>   				if (channel == scmd_channel(scmd)) {
>>   					if (rtn == FAST_IO_FAIL)
>> -						scsi_eh_finish_cmd(scmd,
>> -								   done_q);
>> +						__scsi_eh_finish_cmd(scmd,
>> +								     done_q,
>> +								     DID_TRANSPORT_DISRUPTED);
>>   					else
>>   						list_move_tail(&scmd->eh_entry,
>>   							       &check_list);
>> @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost,
>>   		if (rtn == SUCCESS) {
>>   			list_splice_init(work_q, &check_list);
>>   		} else if (rtn == FAST_IO_FAIL) {
>> -			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>> -					scsi_eh_finish_cmd(scmd, done_q);
>> -			}
>> +			list_for_each_entry_safe(scmd, next, work_q, eh_entry)
>> +				__scsi_eh_finish_cmd(scmd, done_q,
>> +						     DID_TRANSPORT_DISRUPTED);
>>   		} else {
>>   			SCSI_LOG_ERROR_RECOVERY(3,
>>   				shost_printk(KERN_INFO, shost,
> 
> For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST  because when
> drivers return that, they normally have hit their fast io fail timer or have
> hit a hard transport issue like the port is offline. For example for FC drivers
> they do:
> 
Ah, yes, you are right. Will be modifying that.

> err = fc_block_rport(rport);
> if (err)
> 	return err;
>   
> where fc_block_rport does:
> 
> if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
> 	return FAST_IO_FAIL;
> 
> and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST
> when FC_RPORT_FAST_FAIL_TIMEDOUT is set.
> 
> So using DID_TRANSPORT_FAILFAST would align the return values for that
> state.
> 
> One question I had is why you added those checks for target and host
> reset but not scsi_eh_bus_device_reset because drivers will do something
> similar to above where they call fc_block_rport for that callout as
> well.
> 
That's done in one of the later patches where I convert the loop in
scsi_eh_bus_device_reset().

Cheers,

Hannes

[5/9] scsi: set host byte after EH completed

Commit Message

Comments

Patch