
[v3] sd: Retry START STOP UNIT commands

Message ID 20240904210304.2947789-1-bvanassche@acm.org
State New

Commit Message

Bart Van Assche Sept. 4, 2024, 9:03 p.m. UTC
During system resume, sd_start_stop_device() submits a START STOP UNIT
command to the SCSI device that is being resumed. That command is not
retried in case of a unit attention and hence may fail. An example:

[16575.983359] sd 0:0:0:3: [sdd] Starting disk
[16575.983693] sd 0:0:0:3: [sdd] Start/Stop Unit failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[16575.983712] sd 0:0:0:3: [sdd] Sense Key : 0x6
[16575.983730] sd 0:0:0:3: [sdd] ASC=0x29 ASCQ=0x0
[16575.983738] sd 0:0:0:3: PM: dpm_run_callback(): scsi_bus_resume+0x0/0xa0 returns -5
[16575.983783] sd 0:0:0:3: PM: failed to resume async: error -5

Make the SCSI core retry the START STOP UNIT command if the device
reports that it has been powered on or that it has been reset.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---

Changes compared to v2:
 - Dropped the SCMD_RETRY_PASSTHROUGH flag and switched to the SCSI failure
   mechanism instead.

Changes compared to v1:
 - Renamed SCMD_RETRY_PASST_ON_UA to SCMD_RETRY_PASSTHROUGH.

drivers/scsi/sd.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
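
The retry rule described in the commit message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: struct
retry_rule, struct retry_budget and should_retry() are hypothetical names made
up for illustration, and the code is not the kernel's scsi_failure
implementation. It only shows the idea of matching a sense key / ASC / ASCQ
triple against a short table and retrying while a total budget of three
attempts lasts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key 0x6 (UNIT ATTENTION), as seen in the log in the commit message. */
#define SENSE_UNIT_ATTENTION 0x06

/* Hypothetical, simplified stand-in for one retryable sense triple. */
struct retry_rule {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* The three ASC 0x29 conditions that the patch lists as retryable. */
static const struct retry_rule rules[] = {
	{ SENSE_UNIT_ATTENTION, 0x29, 0x00 }, /* power on, reset, or bus device reset */
	{ SENSE_UNIT_ATTENTION, 0x29, 0x01 }, /* power on */
	{ SENSE_UNIT_ATTENTION, 0x29, 0x02 }, /* SCSI bus reset */
};

/* Hypothetical retry budget shared by all rules (models total_allowed = 3). */
struct retry_budget {
	int total_allowed;
	int used;
};

/*
 * Return true if the reported sense data matches a rule and the budget is
 * not exhausted yet. A successful match consumes one retry from the budget.
 */
static bool should_retry(struct retry_budget *b, uint8_t key, uint8_t asc,
			 uint8_t ascq)
{
	assert(b != NULL);

	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (rules[i].sense_key != key || rules[i].asc != asc ||
		    rules[i].ascq != ascq)
			continue;
		if (b->used >= b->total_allowed)
			return false; /* matched, but no retries left */
		b->used++;
		return true;
	}
	return false; /* not a condition this table covers */
}
```

With a budget of three, the failure from the log above (sense key 0x6, ASC 0x29,
ASCQ 0x0) would be retried, while an unrelated sense triple would be reported to
the caller right away.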

Comments

Damien Le Moal Sept. 5, 2024, 12:35 a.m. UTC | #1
On 2024/09/05 6:03, Bart Van Assche wrote:
> During system resume, sd_start_stop_device() submits a START STOP UNIT
> command to the SCSI device that is being resumed. That command is not
> retried in case of a unit attention and hence may fail. An example:
> 
> [16575.983359] sd 0:0:0:3: [sdd] Starting disk
> [16575.983693] sd 0:0:0:3: [sdd] Start/Stop Unit failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
> [16575.983712] sd 0:0:0:3: [sdd] Sense Key : 0x6
> [16575.983730] sd 0:0:0:3: [sdd] ASC=0x29 ASCQ=0x0
> [16575.983738] sd 0:0:0:3: PM: dpm_run_callback(): scsi_bus_resume+0x0/0xa0 returns -5
> [16575.983783] sd 0:0:0:3: PM: failed to resume async: error -5
> 
> Make the SCSI core retry the START STOP UNIT command if the device
> reports that it has been powered on or that it has been reset.
> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Mike Christie <michael.christie@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Martin K. Petersen Sept. 13, 2024, 12:41 a.m. UTC | #2
Bart,

> During system resume, sd_start_stop_device() submits a START STOP UNIT
> command to the SCSI device that is being resumed. That command is not
> retried in case of a unit attention and hence may fail. An example:

Thanks for making this change! Applied to 6.12/scsi-staging.

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9db86943d04c..9f09060ab401 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -4093,9 +4093,38 @@  static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 {
 	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			/* Power on, reset, or bus device reset occurred */
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = 0,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			/* Power on occurred */
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = 1,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			/* SCSI bus reset */
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			.ascq = 2,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.total_allowed = 3,
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
 		.req_flags = BLK_MQ_REQ_PM,
+		.failures = &failures,
 	};
 	struct scsi_device *sdp = sdkp->device;
 	int res;