diff mbox series

[v4] scsi: sd: retry command SYNC CACHE if format in progress

Message ID 20240817015019.3467765-1-liyihang9@huawei.com
State New
Headers show
Series [v4] scsi: sd: retry command SYNC CACHE if format in progress | expand

Commit Message

Yihang Li Aug. 17, 2024, 1:50 a.m. UTC
If formatting a suspended disk (such as formatting with different DIF
type), the disk will be resuming first, and then the format command will
submit to the disk through SG_IO ioctl.

When the disk is processing the format command, the system does not submit
other commands to the disk. Therefore, the system attempts to suspend the
disk again and sends the SYNC CACHE command. However, the SYNC CACHE
command will fail because the disk is in the formatting process, which
will cause the runtime_status of the disk to error and it is difficult
for user to recover it. Error info like:

[  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4

To solve the issue, retry the command until format command is finished.

Cc: stable@vger.kernel.org
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
Changes since v3:
- Add Cc tag for kernel stable.

Changes since v2:
- Add Reviewed-by for Bart.

Changes since v1:
- Updated and added error information to the patch description.

---
 drivers/scsi/sd.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Damien Le Moal Aug. 18, 2024, 11:55 p.m. UTC | #1
On 8/17/24 10:50, Yihang Li wrote:
> If formatting a suspended disk (such as formatting with different DIF
> type), the disk will be resuming first, and then the format command will
> submit to the disk through SG_IO ioctl.
> 
> When the disk is processing the format command, the system does not submit
> other commands to the disk. Therefore, the system attempts to suspend the
> disk again and sends the SYNC CACHE command. However, the SYNC CACHE

Why would the system try to suspend the disk with a request in flight ? Sounds
like there is a bug with PM reference counting, no ?

> command will fail because the disk is in the formatting process, which
> will cause the runtime_status of the disk to error and it is difficult
> for user to recover it. Error info like:
> 
> [  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
> [  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
> [  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
> [  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
> 
> To solve the issue, retry the command until format command is finished.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Yihang Li <liyihang9@huawei.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
> Changes since v3:
> - Add Cc tag for kernel stable.
> 
> Changes since v2:
> - Add Reviewed-by for Bart.
> 
> Changes since v1:
> - Updated and added error information to the patch description.
> 
> ---
>  drivers/scsi/sd.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index adeaa8ab9951..5cd88a8eea73 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>  			    (sshdr.asc == 0x74 && sshdr.ascq == 0x71))	/* drive is password locked */
>  				/* this is no error here */
>  				return 0;
> +
> +			/* retry if format in progress */
> +			if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
> +				return -EBUSY;
> +
>  			/*
>  			 * This drive doesn't support sync and there's not much
>  			 * we can do because this is called during shutdown
Yihang Li Aug. 19, 2024, 4:07 a.m. UTC | #2
On 2024/8/19 7:55, Damien Le Moal wrote:
> On 8/17/24 10:50, Yihang Li wrote:
>> If formatting a suspended disk (such as formatting with different DIF
>> type), the disk will be resuming first, and then the format command will
>> submit to the disk through SG_IO ioctl.
>>
>> When the disk is processing the format command, the system does not submit
>> other commands to the disk. Therefore, the system attempts to suspend the
>> disk again and sends the SYNC CACHE command. However, the SYNC CACHE
> 
> Why would the system try to suspend the disk with a request in flight ? Sounds
> like there is a bug with PM reference counting, no ?

According to my understand and test, the format command request is finished,
so it is not in flight for the kernel. And the command need a few time to processing
in the disk while no other commands are being sent.

> 
>> command will fail because the disk is in the formatting process, which
>> will cause the runtime_status of the disk to error and it is difficult
>> for user to recover it. Error info like:
>>
>> [  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
>> [  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
>> [  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
>> [  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
>>
>> To solve the issue, retry the command until format command is finished.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Yihang Li <liyihang9@huawei.com>
>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>> Changes since v3:
>> - Add Cc tag for kernel stable.
>>
>> Changes since v2:
>> - Add Reviewed-by for Bart.
>>
>> Changes since v1:
>> - Updated and added error information to the patch description.
>>
>> ---
>>  drivers/scsi/sd.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index adeaa8ab9951..5cd88a8eea73 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>>  			    (sshdr.asc == 0x74 && sshdr.ascq == 0x71))	/* drive is password locked */
>>  				/* this is no error here */
>>  				return 0;
>> +
>> +			/* retry if format in progress */
>> +			if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
>> +				return -EBUSY;
>> +
>>  			/*
>>  			 * This drive doesn't support sync and there's not much
>>  			 * we can do because this is called during shutdown
>
Damien Le Moal Aug. 19, 2024, 4:17 a.m. UTC | #3
On 8/19/24 13:07, Yihang Li wrote:
> 
> 
> On 2024/8/19 7:55, Damien Le Moal wrote:
>> On 8/17/24 10:50, Yihang Li wrote:
>>> If formatting a suspended disk (such as formatting with different DIF
>>> type), the disk will be resuming first, and then the format command will
>>> submit to the disk through SG_IO ioctl.
>>>
>>> When the disk is processing the format command, the system does not submit
>>> other commands to the disk. Therefore, the system attempts to suspend the
>>> disk again and sends the SYNC CACHE command. However, the SYNC CACHE
>>
>> Why would the system try to suspend the disk with a request in flight ? Sounds
>> like there is a bug with PM reference counting, no ?
> 
> According to my understand and test, the format command request is finished,
> so it is not in flight for the kernel. And the command need a few time to processing
> in the disk while no other commands are being sent.

OK, fine. But I think that retrying SYNC CACHE if the drive is formatting makes
absolutely no sense at all because there is nothing to flush in that case.
So what about simply ignoring the error ? I.e. something like this:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 699f4f9674d9..1da267b8cd8a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1824,12 +1824,14 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
                                /* this is no error here */
                                return 0;
                        /*
-                        * This drive doesn't support sync and there's not much
-                        * we can do because this is called during shutdown
-                        * or suspend so just return success so those operations
-                        * can proceed.
+                        * If a format is in progress (asc = LOGICAL UNIT NOT
+                        * READY, ascq = FORMAT IN PROGRESS) or if the drive
+                        * does not support sync, there is not much we can do
+                        * because this is called during shutdown or suspend. So
+                        * just return success so those operations can proceed.
                         */
-                       if (sshdr.sense_key == ILLEGAL_REQUEST)
+                       if ((sshdr.asc == 0x04 && sshdr.ascq == 0x04) ||
+                           sshdr.sense_key == ILLEGAL_REQUEST)
                                return 0;
                }

> 
>>
>>> command will fail because the disk is in the formatting process, which
>>> will cause the runtime_status of the disk to error and it is difficult
>>> for user to recover it. Error info like:
>>>
>>> [  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
>>> [  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
>>> [  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
>>> [  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
>>>
>>> To solve the issue, retry the command until format command is finished.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Yihang Li <liyihang9@huawei.com>
>>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>> Changes since v3:
>>> - Add Cc tag for kernel stable.
>>>
>>> Changes since v2:
>>> - Add Reviewed-by for Bart.
>>>
>>> Changes since v1:
>>> - Updated and added error information to the patch description.
>>>
>>> ---
>>>  drivers/scsi/sd.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index adeaa8ab9951..5cd88a8eea73 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>>>  			    (sshdr.asc == 0x74 && sshdr.ascq == 0x71))	/* drive is password locked */
>>>  				/* this is no error here */
>>>  				return 0;
>>> +
>>> +			/* retry if format in progress */
>>> +			if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
>>> +				return -EBUSY;
>>> +
>>>  			/*
>>>  			 * This drive doesn't support sync and there's not much
>>>  			 * we can do because this is called during shutdown
>>
Yihang Li Aug. 19, 2024, 6:44 a.m. UTC | #4
On 2024/8/19 12:17, Damien Le Moal wrote:
> On 8/19/24 13:07, Yihang Li wrote:
>>
>>
>> On 2024/8/19 7:55, Damien Le Moal wrote:
>>> On 8/17/24 10:50, Yihang Li wrote:
>>>> If formatting a suspended disk (such as formatting with different DIF
>>>> type), the disk will be resuming first, and then the format command will
>>>> submit to the disk through SG_IO ioctl.
>>>>
>>>> When the disk is processing the format command, the system does not submit
>>>> other commands to the disk. Therefore, the system attempts to suspend the
>>>> disk again and sends the SYNC CACHE command. However, the SYNC CACHE
>>>
>>> Why would the system try to suspend the disk with a request in flight ? Sounds
>>> like there is a bug with PM reference counting, no ?
>>
>> According to my understand and test, the format command request is finished,
>> so it is not in flight for the kernel. And the command need a few time to processing
>> in the disk while no other commands are being sent.
> 
> OK, fine. But I think that retrying SYNC CACHE if the drive is formatting makes
> absolutely no sense at all because there is nothing to flush in that case.
> So what about simply ignoring the error ? I.e. something like this:
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 699f4f9674d9..1da267b8cd8a 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1824,12 +1824,14 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>                                 /* this is no error here */
>                                 return 0;
>                         /*
> -                        * This drive doesn't support sync and there's not much
> -                        * we can do because this is called during shutdown
> -                        * or suspend so just return success so those operations
> -                        * can proceed.
> +                        * If a format is in progress (asc = LOGICAL UNIT NOT
> +                        * READY, ascq = FORMAT IN PROGRESS) or if the drive
> +                        * does not support sync, there is not much we can do
> +                        * because this is called during shutdown or suspend. So
> +                        * just return success so those operations can proceed.
>                          */
> -                       if (sshdr.sense_key == ILLEGAL_REQUEST)
> +                       if ((sshdr.asc == 0x04 && sshdr.ascq == 0x04) ||
> +                           sshdr.sense_key == ILLEGAL_REQUEST)
>                                 return 0;
>                 }
> 

Thanks for your suggestion, it seems like good.
I will send a new version based on this later.

Thanks,

Yihang.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..5cd88a8eea73 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,6 +1823,11 @@  static int sd_sync_cache(struct scsi_disk *sdkp)
 			    (sshdr.asc == 0x74 && sshdr.ascq == 0x71))	/* drive is password locked */
 				/* this is no error here */
 				return 0;
+
+			/* retry if format in progress */
+			if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
+				return -EBUSY;
+
 			/*
 			 * This drive doesn't support sync and there's not much
 			 * we can do because this is called during shutdown