diff mbox series

[v6,09/23] scsi: sd: Do not issue commands to suspended disks on shutdown

Message ID 20230923002932.1082348-10-dlemoal@kernel.org
State New
Headers show
Series [v6,01/23] ata: libata-core: Fix ata_port_request_pm() locking | expand

Commit Message

Damien Le Moal Sept. 23, 2023, 12:29 a.m. UTC
If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. This in turn results in a call to sd_shutdown() which
will issue a synchronize cache command and a start stop unit command to
spindown the disk. sd_shutdown() issues the commands only if the device
is not already runtime suspended but does not check the power state for
system-wide suspend/resume. That is, the commands may be issued with the
device in a suspended state, which causes PM resume to hang, forcing a
reset of the machine to recover.

Fix this by tracking the suspended state of a disk using the sicsi_disk
suspended flag and by not calling sd_shutdown() in sd_remove() if the
disk is not running.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/scsi/sd.c | 17 +++++++++++++----
 drivers/scsi/sd.h |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

Comments

Bart Van Assche Sept. 25, 2023, 8:22 p.m. UTC | #1
On 9/22/23 17:29, Damien Le Moal wrote:
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 5eea762f84d1..4d42392fae07 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,6 +150,7 @@ struct scsi_disk {
>   	unsigned	urswrz : 1;
>   	unsigned	security : 1;
>   	unsigned	ignore_medium_access_errors : 1;
> +	unsigned	suspended : 1;
>   };
>   #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)

If the 'suspended' member is retained, please do not use a bitfield for the
but use type 'bool' instead. Updates of instances of type 'bool' are atomic
while there is no guarantee in the C standard that bitfield updates will be
atomic. Bitfield updates are typically translated into a combination of &,
| and ~ operations.

Additionally, I'm not convinced that we need the new 'suspended' member.
The Linux kernel runtime PM subsystem serializes I/O and system-wide power
operations. No I/O happens during system-wide suspend or resume operations
and no system-wide suspend or resume callbacks are invoked while I/O is
ongoing. The only exception is I/O that is initiated as the result of error
handling by suspend or resume callbacks, e.g. the SCSI commands submitted
by sd_shutdown(). Even if sd_shutdown() is called indirectly by a suspend
or resume callback, I don't think that it can happen that a suspend or
resume operation is ongoing for the device sd_shutdown() operates on. If
scsi_remove_host() is called from inside a resume callback, resuming of the
devices affected by sd_shutdown() will only be attempted after the host
adapter resume callback has finished.

Thanks,

Bart.
Damien Le Moal Sept. 26, 2023, 6 a.m. UTC | #2
On 2023/09/25 22:22, Bart Van Assche wrote:
> On 9/22/23 17:29, Damien Le Moal wrote:
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 5eea762f84d1..4d42392fae07 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -150,6 +150,7 @@ struct scsi_disk {
>>   	unsigned	urswrz : 1;
>>   	unsigned	security : 1;
>>   	unsigned	ignore_medium_access_errors : 1;
>> +	unsigned	suspended : 1;
>>   };
>>   #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
> 
> If the 'suspended' member is retained, please do not use a bitfield for the
> but use type 'bool' instead. Updates of instances of type 'bool' are atomic
> while there is no guarantee in the C standard that bitfield updates will be
> atomic. Bitfield updates are typically translated into a combination of &,
> | and ~ operations.

Sure, I can make it a bool.

> Additionally, I'm not convinced that we need the new 'suspended' member.
> The Linux kernel runtime PM subsystem serializes I/O and system-wide power
> operations. No I/O happens during system-wide suspend or resume operations
> and no system-wide suspend or resume callbacks are invoked while I/O is
> ongoing. The only exception is I/O that is initiated as the result of error
> handling by suspend or resume callbacks, e.g. the SCSI commands submitted
> by sd_shutdown(). Even if sd_shutdown() is called indirectly by a suspend
> or resume callback, I don't think that it can happen that a suspend or
> resume operation is ongoing for the device sd_shutdown() operates on. If

Yes, but that is not what this patch addresses.

> scsi_remove_host() is called from inside a resume callback, resuming of the
> devices affected by sd_shutdown() will only be attempted after the host
> adapter resume callback has finished.

No it will not because the commands issued in sd_shutdown() are synchronous, so
the adapter resume will wait for these to complete. But they will never complete
as the adapter itself is not fully resumed, AND the disk may not be in a state
that allows commands to be executed. Deadlock.

It is easy to recreate this issue if you have a pm8001 adapter: remove that fix
patch I sent to correctly re-allocate IRQs on resume and do a suspend-resume
cycle: on resume, the adapter fails to allocate IRQs and gives up, calling
scsi_remove_host(). The system end being stuck in resume context with no forward
progress ever made.

It seems that you are suggesting that we should use some information from the
scsi_device->power structure to detect the suspended state... But as mentioned
before, these are PM internal and should not be touched without the device lock
held. So the little "suspended" falg simplifies things a lot.

> 
> Thanks,
> 
> Bart.
Bart Van Assche Sept. 26, 2023, 2:51 p.m. UTC | #3
On 9/25/23 23:00, Damien Le Moal wrote:
> It seems that you are suggesting that we should use some information 
> from the scsi_device->power structure to detect the suspended 
> state...

Yes, that's indeed what I'm suggesting.

> But as mentioned before, these are PM internal and should not be 
> touched without the device lock held. So the little "suspended" flag 
> simplifies things a lot.

Hmm ... I think there is plenty of code in the Linux kernel that reads
variables that can be modified by another thread without using locking.
Hasn't the READ_ONCE() macro been introduced for this purpose? Anyway, I
don't have a strong opinion about whether to read directly from the
scsi_device->power data structure or whether to introduce the new
'suspended' member.

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a1ef4eef904f..bff8663be7e0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3741,7 +3741,8 @@  static int sd_remove(struct device *dev)
 
 	device_del(&sdkp->disk_dev);
 	del_gendisk(sdkp->disk);
-	sd_shutdown(dev);
+	if (!sdkp->suspended)
+		sd_shutdown(dev);
 
 	put_disk(sdkp->disk);
 	return 0;
@@ -3872,6 +3873,9 @@  static int sd_suspend_common(struct device *dev, bool runtime)
 			ret = 0;
 	}
 
+	if (!ret)
+		sdkp->suspended = 1;
+
 	return ret;
 }
 
@@ -3891,21 +3895,26 @@  static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev, bool runtime)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;
 
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
 
-	if (!sd_do_start_stop(sdkp->device, runtime))
+	if (!sd_do_start_stop(sdkp->device, runtime)) {
+		sdkp->suspended = 0;
 		return 0;
+	}
 
 	if (!sdkp->device->no_start_on_resume) {
 		sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
 		ret = sd_start_stop_device(sdkp, 1);
 	}
 
-	if (!ret)
+	if (!ret) {
 		opal_unlock_from_suspend(sdkp->opal_dev);
+		sdkp->suspended = 0;
+	}
+
 	return ret;
 }
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5eea762f84d1..4d42392fae07 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -150,6 +150,7 @@  struct scsi_disk {
 	unsigned	urswrz : 1;
 	unsigned	security : 1;
 	unsigned	ignore_medium_access_errors : 1;
+	unsigned	suspended : 1;
 };
 #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)