diff mbox series

scsi: sd: retry command SYNC CACHE if format in progress

Message ID 20240808021719.4167352-1-liyihang9@huawei.com
State Superseded
Headers show
Series scsi: sd: retry command SYNC CACHE if format in progress | expand

Commit Message

Yihang Li Aug. 8, 2024, 2:17 a.m. UTC
If formatting a suspended disk (such as formatting with different DIF
type), the SYNC CACHE command will fail because the disk is in the
formatting process, which will cause the runtime_status of the disk to
error and it is difficult for user to recover it.

To solve the issue, retry the command until format command is finished.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
---
 drivers/scsi/sd.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Bart Van Assche Aug. 8, 2024, 7:09 p.m. UTC | #1
On 8/7/24 7:17 PM, Yihang Li wrote:
> If formatting a suspended disk (such as formatting with different DIF
> type), the SYNC CACHE command will fail because the disk is in the
> formatting process, which will cause the runtime_status of the disk to
> error and it is difficult for user to recover it.
> 
> To solve the issue, retry the command until format command is finished.

How is the format command submitted to the SCSI disk? Is that command
perhaps submitted as a SCSI pass-through command (SG_IO ioctl)?

Should the sd driver perhaps be unbound while the format command is in
progress?

Thanks,

Bart.
Yihang Li Aug. 9, 2024, 3:44 a.m. UTC | #2
On 2024/8/9 3:09, Bart Van Assche wrote:
> On 8/7/24 7:17 PM, Yihang Li wrote:
>> If formatting a suspended disk (such as formatting with different DIF
>> type), the SYNC CACHE command will fail because the disk is in the
>> formatting process, which will cause the runtime_status of the disk to
>> error and it is difficult for user to recover it.
>>
>> To solve the issue, retry the command until format command is finished.
> 
> How is the format command submitted to the SCSI disk? Is that command
> perhaps submitted as a SCSI pass-through command (SG_IO ioctl)?
> 

When formatting a suspended disk, the disk will be resuming first,
and then the format command will submit to the disk through SG_IO ioctl.

When the disk is processing the formatting command, the system does not
submit other commands to the disk. Therefore, the system attempts to suspend
the disk again and sends the SYNC CACHE command. However, the SYNC CACHE
command fails because the disk is being formatted.

Error info like:

[  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4


> Should the sd driver perhaps be unbound while the format command is in
> progress?
> 

I do not have any suggestions for this yet. I don't know how to unbound driver
when executing the format command and bound driver after the command is executed.

If you have any suggestions, please let me know.

Thanks,

Yihang.
Bart Van Assche Aug. 9, 2024, 7:14 p.m. UTC | #3
On 8/8/24 8:44 PM, Yihang Li wrote:
> On 2024/8/9 3:09, Bart Van Assche wrote:
>> On 8/7/24 7:17 PM, Yihang Li wrote:
>>> If formatting a suspended disk (such as formatting with different DIF
>>> type), the SYNC CACHE command will fail because the disk is in the
>>> formatting process, which will cause the runtime_status of the disk to
>>> error and it is difficult for user to recover it.
>>>
>>> To solve the issue, retry the command until format command is finished.
>>
>> How is the format command submitted to the SCSI disk? Is that command
>> perhaps submitted as a SCSI pass-through command (SG_IO ioctl)?
>>
> 
> When formatting a suspended disk, the disk will be resuming first,
> and then the format command will submit to the disk through SG_IO ioctl.
> 
> When the disk is processing the formatting command, the system does not
> submit other commands to the disk. Therefore, the system attempts to suspend
> the disk again and sends the SYNC CACHE command. However, the SYNC CACHE
> command fails because the disk is being formatted.
> 
> Error info like:
> 
> [  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
> [  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
> [  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
> [  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4

Please consider integrating this information in the patch description.

>> Should the sd driver perhaps be unbound while the format command is in
>> progress?
>>
> 
> I do not have any suggestions for this yet. I don't know how to unbound driver
> when executing the format command and bound driver after the command is executed.
> 
> If you have any suggestions, please let me know.

It seems like the PCI core supports binding and unbinding through sysfs
but the SCSI core not. So it's probably easier to add support for
ASC/ASCQ 04h / 04h rather than to add bind/unbind support to the SCSI
core.

Thanks,

Bart.
Yihang Li Aug. 10, 2024, 1:46 a.m. UTC | #4
On 2024/8/10 3:14, Bart Van Assche wrote:
> On 8/8/24 8:44 PM, Yihang Li wrote:
>> On 2024/8/9 3:09, Bart Van Assche wrote:
>>> On 8/7/24 7:17 PM, Yihang Li wrote:
>>>> If formatting a suspended disk (such as formatting with different DIF
>>>> type), the SYNC CACHE command will fail because the disk is in the
>>>> formatting process, which will cause the runtime_status of the disk to
>>>> error and it is difficult for user to recover it.
>>>>
>>>> To solve the issue, retry the command until format command is finished.
>>>
>>> How is the format command submitted to the SCSI disk? Is that command
>>> perhaps submitted as a SCSI pass-through command (SG_IO ioctl)?
>>>
>>
>> When formatting a suspended disk, the disk will be resuming first,
>> and then the format command will submit to the disk through SG_IO ioctl.
>>
>> When the disk is processing the formatting command, the system does not
>> submit other commands to the disk. Therefore, the system attempts to suspend
>> the disk again and sends the SYNC CACHE command. However, the SYNC CACHE
>> command fails because the disk is being formatted.
>>
>> Error info like:
>>
>> [  669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
>> [  670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
>> [  670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
>> [  670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
> 
> Please consider integrating this information in the patch description.

Ok, I will send a new version later.

> 
>>> Should the sd driver perhaps be unbound while the format command is in
>>> progress?
>>>
>>
>> I do not have any suggestions for this yet. I don't know how to unbound driver
>> when executing the format command and bound driver after the command is executed.
>>
>> If you have any suggestions, please let me know.
> 
> It seems like the PCI core supports binding and unbinding through sysfs
> but the SCSI core not. So it's probably easier to add support for
> ASC/ASCQ 04h / 04h rather than to add bind/unbind support to the SCSI
> core.

Thanks,

Yihang.
Martin K. Petersen Aug. 13, 2024, 1:33 a.m. UTC | #5
> It seems like the PCI core supports binding and unbinding through
> sysfs but the SCSI core not. So it's probably easier to add support
> for ASC/ASCQ 04h / 04h rather than to add bind/unbind support to the
> SCSI core.

I am not a fan of trying to cover all these revalidation corner cases.
Having the ULD attached during a format operation makes zero sense.

NVMe handles formatting and a few other commands differently based on
the Commands Supported and Effects Log Page. The SCSI protocol doesn't
have the same elaborate reporting scheme but that doesn't mean we
couldn't handle disruptive operations similar to what the NVMe driver is
doing.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..5cd88a8eea73 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,6 +1823,11 @@  static int sd_sync_cache(struct scsi_disk *sdkp)
 			    (sshdr.asc == 0x74 && sshdr.ascq == 0x71))	/* drive is password locked */
 				/* this is no error here */
 				return 0;
+
+			/* retry if format in progress */
+			if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
+				return -EBUSY;
+
 			/*
 			 * This drive doesn't support sync and there's not much
 			 * we can do because this is called during shutdown