scsi: sd: Revert "Rework asynchronous resume support"

Message ID	20220816172638.538734-1-bvanassche@acm.org
State	New
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Bart Van Assche <bvanassche@acm.org> To: "Martin K . Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org, Bart Van Assche <bvanassche@acm.org>, Damien Le Moal <damien.lemoal@opensource.wdc.com>, Hannes Reinecke <hare@suse.de>, Geert Uytterhoeven <geert@linux-m68k.org>, gzhqyz@gmail.com, "James E.J. Bottomley" <jejb@linux.ibm.com> Subject: [PATCH] scsi: sd: Revert "Rework asynchronous resume support" Date: Tue, 16 Aug 2022 10:26:38 -0700 Message-Id: <20220816172638.538734-1-bvanassche@acm.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	scsi: sd: Revert "Rework asynchronous resume support" \| expand scsi: sd: Revert "Rework asynchronous resume support"

Bart Van Assche Aug. 16, 2022, 5:26 p.m. UTC

Although patch "Rework asynchronous resume support" eliminates the delay
for some ATA disks after resume, it causes resume of ATA disks to fail
on other setups. See also:
* "Resume process hangs for 5-6 seconds starting sometime in 5.16"
  (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
* Geert's regression report
  (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).

This is what I understand about this issue:
* During resume, ata_port_pm_resume() starts the SCSI error handler.
  This changes the SCSI host state into SHOST_RECOVERY and causes
  scsi_queue_rq() to return BLK_STS_RESOURCE.
* sd_resume() calls sd_start_stop_device() for ATA devices. That
  function in turn calls sd_submit_start() which tries to submit a START
  STOP UNIT command. That command can only be submitted after the SCSI
  error handler has changed the SCSI host state back to SHOST_RUNNING.
* The SCSI error handler runs on its own thread and calls
  schedule_work(&(ap->scsi_rescan_task)). That causes
  ata_scsi_dev_rescan() to be called from the context of a kernel
  workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
  maybe because all available tags have been allocated by
  sd_submit_start() calls (this is a guess).

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: gzhqyz@gmail.com
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: gzhqyz@gmail.com
Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 84 ++++++++++-------------------------------------
 drivers/scsi/sd.h |  5 ---
 2 files changed, 18 insertions(+), 71 deletions(-)

John Garry Aug. 16, 2022, 6 p.m. UTC | #1

On 16/08/2022 18:26, Bart Van Assche wrote:
> Although patch "Rework asynchronous resume support" eliminates the delay
> for some ATA disks after resume, it causes resume of ATA disks to fail
> on other setups. See also:
> * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
>    (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
> * Geert's regression report
>    (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).
> 
> This is what I understand about this issue:
> * During resume, ata_port_pm_resume() starts the SCSI error handler.
>    This changes the SCSI host state into SHOST_RECOVERY and causes
>    scsi_queue_rq() to return BLK_STS_RESOURCE.
> * sd_resume() calls sd_start_stop_device() for ATA devices. That
>    function in turn calls sd_submit_start() which tries to submit a START
>    STOP UNIT command. That command can only be submitted after the SCSI
>    error handler has changed the SCSI host state back to SHOST_RUNNING.
> * The SCSI error handler runs on its own thread and calls
>    schedule_work(&(ap->scsi_rescan_task)). That causes
>    ata_scsi_dev_rescan() to be called from the context of a kernel
>    workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
>    maybe because all available tags have been allocated by
>    sd_submit_start() calls (this is a guess).
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: gzhqyz@gmail.com
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Reported-by: gzhqyz@gmail.com
> Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)


JFYI, Just now I see that 88f1669019bd also causes me issues for my SATA 
disk:

root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:2/enable
[58.271646] sas: ex 500e004aaaaaaa1f phy02 change count has changed
[58.305876] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[58.305898] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: 
hostbyte=0x04 driverbyte=DRIVER_OK
[58.305901] sd 0:0:1:0: [sdb] Stopping disk
[58.305911] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: 
hostbyte=0x04 driverbyte=DRIVER_OK

root@(none)$ echo 1 > sys/class/sas_phy/phy-0:0:2/enable
[95.405836] INFO: task kworker/u128:0:8 blocked for more than 30 seconds.
[95.412618] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166
[95.419311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[95.427130] task:kworker/u128:0  state:D stack: 0 pid: 8 ppid:  2 
flags:0x00000008
[95.435475] Workqueue: HISI0162:01_event_q sas_port_event_worker
[95.441481] Call trace:
[95.443918]  __switch_to+0x11c/0x1a8
[95.447491]  __schedule+0x264/0x718
[95.450972]  schedule+0x50/0xc8
[95.454106]  schedule_timeout+0x19c/0x250
[95.458108]  wait_for_completion+0x84/0x140
[95.462283]  flush_workqueue+0xfc/0x3d0
[95.466114]  sas_porte_broadcast_rcvd+0x5c/0x68
[95.470638]  sas_port_event_worker+0x2c/0x50
[95.474899]  process_one_work+0x1e4/0x348
[95.478901]  worker_thread+0x48/0x418
[95.482555]  kthread+0xf4/0x110
[95.485687]  ret_from_fork+0x10/0x20
[95.489274] INFO: task kworker/u128:1:462 blocked for more than 30 seconds.
[95.496226] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166
[95.502918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[95.510736] task:kworker/u128:1  state:D stack: 0 pid:  462 ppid:  2 
flags:0x00000008
[95.519080] Workqueue: HISI0162:01_disco_q sas_revalidate_domain
[95.525080] Call trace:
[95.527517]  __switch_to+0x11c/0x1a8
[95.531085]  __schedule+0x264/0x718
[95.534565]  schedule+0x50/0xc8
[95.537694]  blk_mq_freeze_queue_wait+0x7c/0xb0
[95.542218]  blk_mq_freeze_queue+0x1c/0x28
[95.546307]  disk_release+0x40/0xf0
[95.549784]  device_release+0x30/0x90
[95.553441]  kobject_put+0x90/0x108
[95.556923]  put_device+0x10/0x20
[95.560231]  put_disk+0x18/0x28
[95.563363]  sd_remove+0x44/0x58
[95.566585]  device_remove+0x6c/0x78
[95.570153]  device_release_driver_internal+0xd8/0x180
[95.575284]  device_release_driver+0x14/0x20
[95.579545]  bus_remove_device+0xc8/0x108
[95.583547]  device_del+0x16c/0x3a0
[95.587028]  __scsi_remove_device+0xf0/0x130
[95.591290]  scsi_remove_device+0x28/0x40
[95.595291]  scsi_remove_target+0x1b8/0x220
[95.599466]  sas_rphy_remove+0x5c/0x60
[95.603208]  sas_rphy_delete+0x14/0x28
[95.606949]  sas_destruct_devices+0x54/0x88
[95.611124]  sas_revalidate_domain+0x60/0x148
[95.615472]  process_one_work+0x1e4/0x348
[95.619474]  worker_thread+0x48/0x418
[95.623129]  kthread+0xf4/0x110
[95.626261]  ret_from_fork+0x10/0x20
[95.629830] INFO: task sh:541 blocked for more than 30 seconds.
[95.635740] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166
[95.642431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[95.650250] task:sh  state:D stack: 0 pid:  541 ppid:  1 flags:0x00000000
[95.658594] Call trace:
[95.661028]  __switch_to+0x11c/0x1a8
[95.664597]  __schedule+0x264/0x718
[95.668076]  schedule+0x50/0xc8
[95.671210]  schedule_timeout+0x19c/0x250
[95.675211]  wait_for_completion+0x84/0x140
[95.679387]  flush_workqueue+0xfc/0x3d0
[95.683214]  drain_workqueue+0xb0/0x158
[95.687043]  __sas_drain_work+0x3c/0x98
[95.690870]  sas_drain_work+0x58/0x60
[95.694524]  queue_phy_enable+0x84/0xc8
[95.698352]  do_sas_phy_enable.isra.7+0x58/0xb0
[95.702875]  store_sas_phy_enable+0x48/0x78
[95.707051]  dev_attr_store+0x14/0x28
[95.710706]  sysfs_kf_write+0x48/0x58
[95.714362]  kernfs_fop_write_iter+0x118/0x1a0
[95.718798]  new_sync_write+0xdc/0x168
[95.722543]  vfs_write+0x1b8/0x388
[95.725938]  ksys_write+0x64/0xf0
[95.729241]  __arm64_sys_write+0x14/0x20
[95.733157]  invoke_syscall+0x40/0xf8
[95.736814]  el0_svc_common.constprop.3+0x6c/0xf8
[95.741511]  do_el0_svc+0x28/0xc8
[95.744818]  el0_svc+0x28/0x80
[95.747865]  el0t_64_sync_handler+0x94/0xb8
[95.752039]  el0t_64_sync+0x178/0x17c


BTW, I see 88f1669019bd is queued for stable...


> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/sd.c | 84 ++++++++++-------------------------------------
>   drivers/scsi/sd.h |  5 ---
>   2 files changed, 18 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 8f79fa6318fe..eb76ba055021 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>   static void sd_config_write_same(struct scsi_disk *);
>   static int  sd_revalidate_disk(struct gendisk *);
>   static void sd_unlock_native_capacity(struct gendisk *disk);
> -static void sd_start_done_work(struct work_struct *work);
>   static int  sd_probe(struct device *);
>   static int  sd_remove(struct device *);
>   static void sd_shutdown(struct device *);
> @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev)
>   	sdkp->max_retries = SD_MAX_RETRIES;
>   	atomic_set(&sdkp->openers, 0);
>   	atomic_set(&sdkp->device->ioerr_cnt, 0);
> -	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>   
>   	if (!sdp->request_queue->rq_timeout) {
>   		if (sdp->type != TYPE_MOD)
> @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev)
>   	kfree(sdkp);
>   }
>   
> -/* Process sense data after a START command finished. */
> -static void sd_start_done_work(struct work_struct *work)
> -{
> -	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> -					      start_done_work);
> -	struct scsi_sense_hdr sshdr;
> -	int res = sdkp->start_result;
> -
> -	if (res == 0)
> -		return;
> -
> -	sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -
> -	if (res < 0)
> -		return;
> -
> -	if (scsi_normalize_sense(sdkp->start_sense_buffer,
> -				 sdkp->start_sense_len, &sshdr))
> -		sd_print_sense_hdr(sdkp, &sshdr);
> -}
> -
> -/* A START command finished. May be called from interrupt context. */
> -static void sd_start_done(struct request *req, blk_status_t status)
> -{
> -	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> -	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> -
> -	sdkp->start_result = scmd->result;
> -	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> -	sdkp->start_sense_len = scmd->sense_len;
> -	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer,
> -	       ARRAY_SIZE(sdkp->start_sense_buffer));
> -	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> -}
> -
> -/* Submit a START command asynchronously. */
> -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> -{
> -	struct scsi_device *sdev = sdkp->device;
> -	struct request_queue *q = sdev->request_queue;
> -	struct request *req;
> -	struct scsi_cmnd *scmd;
> -
> -	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	scmd = blk_mq_rq_to_pdu(req);
> -	scmd->cmd_len = cmd_len;
> -	memcpy(scmd->cmnd, cmd, cmd_len);
> -	scmd->allowed = sdkp->max_retries;
> -	req->timeout = SD_TIMEOUT;
> -	req->rq_flags |= RQF_PM | RQF_QUIET;
> -	req->end_io = sd_start_done;
> -	blk_execute_rq_nowait(req, /*at_head=*/true);
> -
> -	return 0;
> -}
> -
>   static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   {
>   	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> +	struct scsi_sense_hdr sshdr;
>   	struct scsi_device *sdp = sdkp->device;
> +	int res;
>   
>   	if (start)
>   		cmd[4] |= 1;	/* START */
> @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   	if (!scsi_device_online(sdp))
>   		return -ENODEV;
>   
> -	/* Wait until processing of sense data has finished. */
> -	flush_work(&sdkp->start_done_work);
> +	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> +			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> +	if (res) {
> +		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> +		if (res > 0 && scsi_sense_valid(&sshdr)) {
> +			sd_print_sense_hdr(sdkp, &sshdr);
> +			/* 0x3a is medium not present */
> +			if (sshdr.asc == 0x3a)
> +				res = 0;
> +		}
> +	}
>   
> -	return sd_submit_start(sdkp, cmd, sizeof(cmd));
> +	/* SCSI error codes must not go to the generic layer */
> +	if (res)
> +		return -EIO;
> +
> +	return 0;
>   }
>   
>   /*
> @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev)
>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>   		sd_start_stop_device(sdkp, 0);
>   	}
> -
> -	flush_work(&sdkp->start_done_work);
>   }
>   
>   static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index b89187761d61..5eea762f84d1 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,11 +150,6 @@ struct scsi_disk {
>   	unsigned	urswrz : 1;
>   	unsigned	security : 1;
>   	unsigned	ignore_medium_access_errors : 1;
> -
> -	int		start_result;
> -	u32		start_sense_len;
> -	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> -	struct work_struct start_done_work;
>   };
>   #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>   
> .

Bart Van Assche Aug. 16, 2022, 6:06 p.m. UTC | #2

On 8/16/22 11:00, John Garry wrote:
> JFYI, Just now I see that 88f1669019bd also causes me issues for my SATA 
> disk: [ ... ]
Hi John,

Does reverting commit 88f1669019bd help on your setup?

Thanks,

Bart.

John Garry Aug. 17, 2022, 8:20 a.m. UTC | #3

On 16/08/2022 19:06, Bart Van Assche wrote:
> On 8/16/22 11:00, John Garry wrote:
>> JFYI, Just now I see that 88f1669019bd also causes me issues for my 
>> SATA disk: [ ... ]
> Hi John,
> 
> Does reverting commit 88f1669019bd help on your setup?

Yes,

Tested-by: John Garry <john.garry@huawei.com>

Vlastimil Babka Aug. 17, 2022, 8:06 p.m. UTC | #4

On 8/16/22 19:26, Bart Van Assche wrote:
> Although patch "Rework asynchronous resume support" eliminates the delay
> for some ATA disks after resume, it causes resume of ATA disks to fail
> on other setups. See also:
> * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
>   (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
> * Geert's regression report
>   (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).
> 
> This is what I understand about this issue:
> * During resume, ata_port_pm_resume() starts the SCSI error handler.
>   This changes the SCSI host state into SHOST_RECOVERY and causes
>   scsi_queue_rq() to return BLK_STS_RESOURCE.
> * sd_resume() calls sd_start_stop_device() for ATA devices. That
>   function in turn calls sd_submit_start() which tries to submit a START
>   STOP UNIT command. That command can only be submitted after the SCSI
>   error handler has changed the SCSI host state back to SHOST_RUNNING.
> * The SCSI error handler runs on its own thread and calls
>   schedule_work(&(ap->scsi_rescan_task)). That causes
>   ata_scsi_dev_rescan() to be called from the context of a kernel
>   workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
>   maybe because all available tags have been allocated by
>   sd_submit_start() calls (this is a guess).
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: gzhqyz@gmail.com
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Reported-by: gzhqyz@gmail.com
> Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  drivers/scsi/sd.c | 84 ++++++++++-------------------------------------
>  drivers/scsi/sd.h |  5 ---
>  2 files changed, 18 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 8f79fa6318fe..eb76ba055021 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>  static void sd_config_write_same(struct scsi_disk *);
>  static int  sd_revalidate_disk(struct gendisk *);
>  static void sd_unlock_native_capacity(struct gendisk *disk);
> -static void sd_start_done_work(struct work_struct *work);
>  static int  sd_probe(struct device *);
>  static int  sd_remove(struct device *);
>  static void sd_shutdown(struct device *);
> @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev)
>  	sdkp->max_retries = SD_MAX_RETRIES;
>  	atomic_set(&sdkp->openers, 0);
>  	atomic_set(&sdkp->device->ioerr_cnt, 0);
> -	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>  
>  	if (!sdp->request_queue->rq_timeout) {
>  		if (sdp->type != TYPE_MOD)
> @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev)
>  	kfree(sdkp);
>  }
>  
> -/* Process sense data after a START command finished. */
> -static void sd_start_done_work(struct work_struct *work)
> -{
> -	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> -					      start_done_work);
> -	struct scsi_sense_hdr sshdr;
> -	int res = sdkp->start_result;
> -
> -	if (res == 0)
> -		return;
> -
> -	sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -
> -	if (res < 0)
> -		return;
> -
> -	if (scsi_normalize_sense(sdkp->start_sense_buffer,
> -				 sdkp->start_sense_len, &sshdr))
> -		sd_print_sense_hdr(sdkp, &sshdr);
> -}
> -
> -/* A START command finished. May be called from interrupt context. */
> -static void sd_start_done(struct request *req, blk_status_t status)
> -{
> -	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> -	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> -
> -	sdkp->start_result = scmd->result;
> -	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> -	sdkp->start_sense_len = scmd->sense_len;
> -	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer,
> -	       ARRAY_SIZE(sdkp->start_sense_buffer));
> -	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> -}
> -
> -/* Submit a START command asynchronously. */
> -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> -{
> -	struct scsi_device *sdev = sdkp->device;
> -	struct request_queue *q = sdev->request_queue;
> -	struct request *req;
> -	struct scsi_cmnd *scmd;
> -
> -	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	scmd = blk_mq_rq_to_pdu(req);
> -	scmd->cmd_len = cmd_len;
> -	memcpy(scmd->cmnd, cmd, cmd_len);
> -	scmd->allowed = sdkp->max_retries;
> -	req->timeout = SD_TIMEOUT;
> -	req->rq_flags |= RQF_PM | RQF_QUIET;
> -	req->end_io = sd_start_done;
> -	blk_execute_rq_nowait(req, /*at_head=*/true);
> -
> -	return 0;
> -}
> -
>  static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  {
>  	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> +	struct scsi_sense_hdr sshdr;
>  	struct scsi_device *sdp = sdkp->device;
> +	int res;
>  
>  	if (start)
>  		cmd[4] |= 1;	/* START */
> @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  	if (!scsi_device_online(sdp))
>  		return -ENODEV;
>  
> -	/* Wait until processing of sense data has finished. */
> -	flush_work(&sdkp->start_done_work);
> +	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> +			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> +	if (res) {
> +		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> +		if (res > 0 && scsi_sense_valid(&sshdr)) {
> +			sd_print_sense_hdr(sdkp, &sshdr);
> +			/* 0x3a is medium not present */
> +			if (sshdr.asc == 0x3a)
> +				res = 0;
> +		}
> +	}
>  
> -	return sd_submit_start(sdkp, cmd, sizeof(cmd));
> +	/* SCSI error codes must not go to the generic layer */
> +	if (res)
> +		return -EIO;
> +
> +	return 0;
>  }
>  
>  /*
> @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev)
>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>  		sd_start_stop_device(sdkp, 0);
>  	}
> -
> -	flush_work(&sdkp->start_done_work);
>  }
>  
>  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index b89187761d61..5eea762f84d1 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,11 +150,6 @@ struct scsi_disk {
>  	unsigned	urswrz : 1;
>  	unsigned	security : 1;
>  	unsigned	ignore_medium_access_errors : 1;
> -
> -	int		start_result;
> -	u32		start_sense_len;
> -	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> -	struct work_struct start_done_work;
>  };
>  #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>

Hans de Goede Aug. 20, 2022, 3:37 p.m. UTC | #5

Hi,

On 8/16/22 19:26, Bart Van Assche wrote:
> Although patch "Rework asynchronous resume support" eliminates the delay
> for some ATA disks after resume, it causes resume of ATA disks to fail
> on other setups. See also:
> * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
>   (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
> * Geert's regression report
>   (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).
> 
> This is what I understand about this issue:
> * During resume, ata_port_pm_resume() starts the SCSI error handler.
>   This changes the SCSI host state into SHOST_RECOVERY and causes
>   scsi_queue_rq() to return BLK_STS_RESOURCE.
> * sd_resume() calls sd_start_stop_device() for ATA devices. That
>   function in turn calls sd_submit_start() which tries to submit a START
>   STOP UNIT command. That command can only be submitted after the SCSI
>   error handler has changed the SCSI host state back to SHOST_RUNNING.
> * The SCSI error handler runs on its own thread and calls
>   schedule_work(&(ap->scsi_rescan_task)). That causes
>   ata_scsi_dev_rescan() to be called from the context of a kernel
>   workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
>   maybe because all available tags have been allocated by
>   sd_submit_start() calls (this is a guess).
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: gzhqyz@gmail.com
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Reported-by: gzhqyz@gmail.com
> Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

As reported here I've been seeing tasks block/hang on IO
to a sata disk on a system with / on a NVME (which keeps
the system alive except for the SATA disk acccessing tasks):

https://lore.kernel.org/regressions/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/

I'm running 6.0-rc1 with this patch added now and so far
I've not seen the problem re-occur.

I was also seeing 6.0 suspend/resume issues on 2 laptops with
sata disks (rather then NVME) which I did not yet get around
to collecting logs from / reporting. I'm happy to report that
those suspend/resume issues are also fixed by this:

Tested-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans




> ---
>  drivers/scsi/sd.c | 84 ++++++++++-------------------------------------
>  drivers/scsi/sd.h |  5 ---
>  2 files changed, 18 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 8f79fa6318fe..eb76ba055021 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>  static void sd_config_write_same(struct scsi_disk *);
>  static int  sd_revalidate_disk(struct gendisk *);
>  static void sd_unlock_native_capacity(struct gendisk *disk);
> -static void sd_start_done_work(struct work_struct *work);
>  static int  sd_probe(struct device *);
>  static int  sd_remove(struct device *);
>  static void sd_shutdown(struct device *);
> @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev)
>  	sdkp->max_retries = SD_MAX_RETRIES;
>  	atomic_set(&sdkp->openers, 0);
>  	atomic_set(&sdkp->device->ioerr_cnt, 0);
> -	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>  
>  	if (!sdp->request_queue->rq_timeout) {
>  		if (sdp->type != TYPE_MOD)
> @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev)
>  	kfree(sdkp);
>  }
>  
> -/* Process sense data after a START command finished. */
> -static void sd_start_done_work(struct work_struct *work)
> -{
> -	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> -					      start_done_work);
> -	struct scsi_sense_hdr sshdr;
> -	int res = sdkp->start_result;
> -
> -	if (res == 0)
> -		return;
> -
> -	sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -
> -	if (res < 0)
> -		return;
> -
> -	if (scsi_normalize_sense(sdkp->start_sense_buffer,
> -				 sdkp->start_sense_len, &sshdr))
> -		sd_print_sense_hdr(sdkp, &sshdr);
> -}
> -
> -/* A START command finished. May be called from interrupt context. */
> -static void sd_start_done(struct request *req, blk_status_t status)
> -{
> -	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> -	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> -
> -	sdkp->start_result = scmd->result;
> -	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> -	sdkp->start_sense_len = scmd->sense_len;
> -	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer,
> -	       ARRAY_SIZE(sdkp->start_sense_buffer));
> -	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> -}
> -
> -/* Submit a START command asynchronously. */
> -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> -{
> -	struct scsi_device *sdev = sdkp->device;
> -	struct request_queue *q = sdev->request_queue;
> -	struct request *req;
> -	struct scsi_cmnd *scmd;
> -
> -	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	scmd = blk_mq_rq_to_pdu(req);
> -	scmd->cmd_len = cmd_len;
> -	memcpy(scmd->cmnd, cmd, cmd_len);
> -	scmd->allowed = sdkp->max_retries;
> -	req->timeout = SD_TIMEOUT;
> -	req->rq_flags |= RQF_PM | RQF_QUIET;
> -	req->end_io = sd_start_done;
> -	blk_execute_rq_nowait(req, /*at_head=*/true);
> -
> -	return 0;
> -}
> -
>  static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  {
>  	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> +	struct scsi_sense_hdr sshdr;
>  	struct scsi_device *sdp = sdkp->device;
> +	int res;
>  
>  	if (start)
>  		cmd[4] |= 1;	/* START */
> @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  	if (!scsi_device_online(sdp))
>  		return -ENODEV;
>  
> -	/* Wait until processing of sense data has finished. */
> -	flush_work(&sdkp->start_done_work);
> +	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> +			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> +	if (res) {
> +		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> +		if (res > 0 && scsi_sense_valid(&sshdr)) {
> +			sd_print_sense_hdr(sdkp, &sshdr);
> +			/* 0x3a is medium not present */
> +			if (sshdr.asc == 0x3a)
> +				res = 0;
> +		}
> +	}
>  
> -	return sd_submit_start(sdkp, cmd, sizeof(cmd));
> +	/* SCSI error codes must not go to the generic layer */
> +	if (res)
> +		return -EIO;
> +
> +	return 0;
>  }
>  
>  /*
> @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev)
>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>  		sd_start_stop_device(sdkp, 0);
>  	}
> -
> -	flush_work(&sdkp->start_done_work);
>  }
>  
>  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index b89187761d61..5eea762f84d1 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,11 +150,6 @@ struct scsi_disk {
>  	unsigned	urswrz : 1;
>  	unsigned	security : 1;
>  	unsigned	ignore_medium_access_errors : 1;
> -
> -	int		start_result;
> -	u32		start_sense_len;
> -	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> -	struct work_struct start_done_work;
>  };
>  #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>  
>

Geert Uytterhoeven Aug. 21, 2022, 9:16 a.m. UTC | #6

Hi Hans,

On Sat, Aug 20, 2022 at 5:37 PM Hans de Goede <hdegoede@redhat.com> wrote:
> On 8/16/22 19:26, Bart Van Assche wrote:
> > Although patch "Rework asynchronous resume support" eliminates the delay
> > for some ATA disks after resume, it causes resume of ATA disks to fail
> > on other setups. See also:
> > * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
> >   (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
> > * Geert's regression report
> >   (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).
> >
> > This is what I understand about this issue:
> > * During resume, ata_port_pm_resume() starts the SCSI error handler.
> >   This changes the SCSI host state into SHOST_RECOVERY and causes
> >   scsi_queue_rq() to return BLK_STS_RESOURCE.
> > * sd_resume() calls sd_start_stop_device() for ATA devices. That
> >   function in turn calls sd_submit_start() which tries to submit a START
> >   STOP UNIT command. That command can only be submitted after the SCSI
> >   error handler has changed the SCSI host state back to SHOST_RUNNING.
> > * The SCSI error handler runs on its own thread and calls
> >   schedule_work(&(ap->scsi_rescan_task)). That causes
> >   ata_scsi_dev_rescan() to be called from the context of a kernel
> >   workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
> >   maybe because all available tags have been allocated by
> >   sd_submit_start() calls (this is a guess).
> >
> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: gzhqyz@gmail.com
> > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Reported-by: gzhqyz@gmail.com
> > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>
> As reported here I've been seeing tasks block/hang on IO
> to a sata disk on a system with / on a NVME (which keeps
> the system alive except for the SATA disk acccessing tasks):
>
> https://lore.kernel.org/regressions/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/
>
> I'm running 6.0-rc1 with this patch added now and so far
> I've not seen the problem re-occur.
>
> I was also seeing 6.0 suspend/resume issues on 2 laptops with
> sata disks (rather then NVME) which I did not yet get around
> to collecting logs from / reporting. I'm happy to report that
> those suspend/resume issues are also fixed by this:

It looks like there is a (different) regression in v6.1-rc1 related
to s2idle and s2ram, which is not fixed by this patch.  In fact it
also happens on boards where SATA is not used, it is just less likely
to happen on the non-SATA boards.
I still have to bisect it, which may take some time, as the issue is
not 100% reproducible.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche Aug. 22, 2022, 2:52 a.m. UTC | #7

On 8/21/22 02:16, Geert Uytterhoeven wrote:
> It looks like there is a (different) regression in v6.1-rc1 related
> to s2idle and s2ram, which is not fixed by this patch.  In fact it
> also happens on boards where SATA is not used, it is just less likely
> to happen on the non-SATA boards.
> I still have to bisect it, which may take some time, as the issue is
> not 100% reproducible.

Hi Geert,

What kind of regression are you encountering? A crash, a hang or 
something else? Are any call traces available?

I posted another revert earlier today but that revert is for code paths 
not related to suspend/resume functionality. See also 
https://lore.kernel.org/linux-scsi/20220821220502.13685-1-bvanassche@acm.org/

Thanks,

Bart.

Vlastimil Babka Aug. 22, 2022, 8:53 p.m. UTC | #8

On 8/17/22 22:06, Vlastimil Babka wrote:
> On 8/16/22 19:26, Bart Van Assche wrote:
>> Although patch "Rework asynchronous resume support" eliminates the delay
>> for some ATA disks after resume, it causes resume of ATA disks to fail
>> on other setups. See also:
>> * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
>>   (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
>> * Geert's regression report
>>   (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).
>>
>> This is what I understand about this issue:
>> * During resume, ata_port_pm_resume() starts the SCSI error handler.
>>   This changes the SCSI host state into SHOST_RECOVERY and causes
>>   scsi_queue_rq() to return BLK_STS_RESOURCE.
>> * sd_resume() calls sd_start_stop_device() for ATA devices. That
>>   function in turn calls sd_submit_start() which tries to submit a START
>>   STOP UNIT command. That command can only be submitted after the SCSI
>>   error handler has changed the SCSI host state back to SHOST_RUNNING.
>> * The SCSI error handler runs on its own thread and calls
>>   schedule_work(&(ap->scsi_rescan_task)). That causes
>>   ata_scsi_dev_rescan() to be called from the context of a kernel
>>   workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
>>   maybe because all available tags have been allocated by
>>   sd_submit_start() calls (this is a guess).
>>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: gzhqyz@gmail.com
>> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
>> Reported-by: gzhqyz@gmail.com
>> Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 
> Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>

What's the hold up on this? Didn't make it to rc2 and the number of
people who independently spent time bisecting this seems to be only
increasing.

Thanks, Vlastimil

> 
>> ---
>>  drivers/scsi/sd.c | 84 ++++++++++-------------------------------------
>>  drivers/scsi/sd.h |  5 ---
>>  2 files changed, 18 insertions(+), 71 deletions(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 8f79fa6318fe..eb76ba055021 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
>>  static void sd_config_write_same(struct scsi_disk *);
>>  static int  sd_revalidate_disk(struct gendisk *);
>>  static void sd_unlock_native_capacity(struct gendisk *disk);
>> -static void sd_start_done_work(struct work_struct *work);
>>  static int  sd_probe(struct device *);
>>  static int  sd_remove(struct device *);
>>  static void sd_shutdown(struct device *);
>> @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev)
>>  	sdkp->max_retries = SD_MAX_RETRIES;
>>  	atomic_set(&sdkp->openers, 0);
>>  	atomic_set(&sdkp->device->ioerr_cnt, 0);
>> -	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>>  
>>  	if (!sdp->request_queue->rq_timeout) {
>>  		if (sdp->type != TYPE_MOD)
>> @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev)
>>  	kfree(sdkp);
>>  }
>>  
>> -/* Process sense data after a START command finished. */
>> -static void sd_start_done_work(struct work_struct *work)
>> -{
>> -	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
>> -					      start_done_work);
>> -	struct scsi_sense_hdr sshdr;
>> -	int res = sdkp->start_result;
>> -
>> -	if (res == 0)
>> -		return;
>> -
>> -	sd_print_result(sdkp, "Start/Stop Unit failed", res);
>> -
>> -	if (res < 0)
>> -		return;
>> -
>> -	if (scsi_normalize_sense(sdkp->start_sense_buffer,
>> -				 sdkp->start_sense_len, &sshdr))
>> -		sd_print_sense_hdr(sdkp, &sshdr);
>> -}
>> -
>> -/* A START command finished. May be called from interrupt context. */
>> -static void sd_start_done(struct request *req, blk_status_t status)
>> -{
>> -	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
>> -	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
>> -
>> -	sdkp->start_result = scmd->result;
>> -	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
>> -	sdkp->start_sense_len = scmd->sense_len;
>> -	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer,
>> -	       ARRAY_SIZE(sdkp->start_sense_buffer));
>> -	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
>> -}
>> -
>> -/* Submit a START command asynchronously. */
>> -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
>> -{
>> -	struct scsi_device *sdev = sdkp->device;
>> -	struct request_queue *q = sdev->request_queue;
>> -	struct request *req;
>> -	struct scsi_cmnd *scmd;
>> -
>> -	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
>> -	if (IS_ERR(req))
>> -		return PTR_ERR(req);
>> -
>> -	scmd = blk_mq_rq_to_pdu(req);
>> -	scmd->cmd_len = cmd_len;
>> -	memcpy(scmd->cmnd, cmd, cmd_len);
>> -	scmd->allowed = sdkp->max_retries;
>> -	req->timeout = SD_TIMEOUT;
>> -	req->rq_flags |= RQF_PM | RQF_QUIET;
>> -	req->end_io = sd_start_done;
>> -	blk_execute_rq_nowait(req, /*at_head=*/true);
>> -
>> -	return 0;
>> -}
>> -
>>  static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>>  {
>>  	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
>> +	struct scsi_sense_hdr sshdr;
>>  	struct scsi_device *sdp = sdkp->device;
>> +	int res;
>>  
>>  	if (start)
>>  		cmd[4] |= 1;	/* START */
>> @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>>  	if (!scsi_device_online(sdp))
>>  		return -ENODEV;
>>  
>> -	/* Wait until processing of sense data has finished. */
>> -	flush_work(&sdkp->start_done_work);
>> +	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
>> +			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
>> +	if (res) {
>> +		sd_print_result(sdkp, "Start/Stop Unit failed", res);
>> +		if (res > 0 && scsi_sense_valid(&sshdr)) {
>> +			sd_print_sense_hdr(sdkp, &sshdr);
>> +			/* 0x3a is medium not present */
>> +			if (sshdr.asc == 0x3a)
>> +				res = 0;
>> +		}
>> +	}
>>  
>> -	return sd_submit_start(sdkp, cmd, sizeof(cmd));
>> +	/* SCSI error codes must not go to the generic layer */
>> +	if (res)
>> +		return -EIO;
>> +
>> +	return 0;
>>  }
>>  
>>  /*
>> @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev)
>>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>>  		sd_start_stop_device(sdkp, 0);
>>  	}
>> -
>> -	flush_work(&sdkp->start_done_work);
>>  }
>>  
>>  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index b89187761d61..5eea762f84d1 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -150,11 +150,6 @@ struct scsi_disk {
>>  	unsigned	urswrz : 1;
>>  	unsigned	security : 1;
>>  	unsigned	ignore_medium_access_errors : 1;
>> -
>> -	int		start_result;
>> -	u32		start_sense_len;
>> -	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
>> -	struct work_struct start_done_work;
>>  };
>>  #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>>  
>

Geert Uytterhoeven Aug. 23, 2022, 6:41 a.m. UTC | #9

Hi Bart,

On Mon, Aug 22, 2022 at 4:52 AM Bart Van Assche <bvanassche@acm.org> wrote:
> On 8/21/22 02:16, Geert Uytterhoeven wrote:
> > It looks like there is a (different) regression in v6.1-rc1 related
> > to s2idle and s2ram, which is not fixed by this patch.  In fact it
> > also happens on boards where SATA is not used, it is just less likely
> > to happen on the non-SATA boards.
> > I still have to bisect it, which may take some time, as the issue is
> > not 100% reproducible.
>
> What kind of regression are you encountering? A crash, a hang or
> something else? Are any call traces available?

A lock-up (magic sysrq does not work) during s2idle.
I tried bisecting it yesterday, but failed.
On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer
I get to v5.19, the less likely it is to happen. Apparently 100
successful s2idle cycles was not enough to declare a kernel good...

    Freezing ...
    Filesystems sync: 0.001 seconds
    Freezing user space processes ... (elapsed 0.001 seconds) done.
    OOM killer disabled.
    Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
    sd 0:0:0:0: [sda] Synchronizing SCSI cache
    sd 0:0:0:0: [sda] Stopping disk

---> hangs here if it happens

    ravb e6800000.ethernet eth0: Link is Down
    sd 0:0:0:0: [sda] Starting disk
    Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186)
    ata1: link resume succeeded after 1 retries
    ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata1.00: configured for UDMA/133
    OOM killer enabled.
    Restarting tasks ... done.
    random: crng reseeded on system resumption
    PM: suspend exit
    ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

> I posted another revert earlier today but that revert is for code paths
> not related to suspend/resume functionality. See also
> https://lore.kernel.org/linux-scsi/20220821220502.13685-1-bvanassche@acm.org/

Unlikely to be the case, but I'll give it a try later...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche Aug. 23, 2022, 6:10 p.m. UTC | #10

On 8/22/22 23:41, Geert Uytterhoeven wrote:
> A lock-up (magic sysrq does not work) during s2idle.
> I tried bisecting it yesterday, but failed.
> On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer
> I get to v5.19, the less likely it is to happen. Apparently 100
> successful s2idle cycles was not enough to declare a kernel good...
> 
>      Freezing ...
>      Filesystems sync: 0.001 seconds
>      Freezing user space processes ... (elapsed 0.001 seconds) done.
>      OOM killer disabled.
>      Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>      sd 0:0:0:0: [sda] Synchronizing SCSI cache
>      sd 0:0:0:0: [sda] Stopping disk
> 
> ---> hangs here if it happens
> 
>      ravb e6800000.ethernet eth0: Link is Down
>      sd 0:0:0:0: [sda] Starting disk
>      Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
> PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186)
>      ata1: link resume succeeded after 1 retries
>      ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>      ata1.00: configured for UDMA/133
>      OOM killer enabled.
>      Restarting tasks ... done.
>      random: crng reseeded on system resumption
>      PM: suspend exit
>      ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

Hi Geert,

I'm not sure that is enough information to find the root cause. How 
about enabling the tp_printk boot option and to enable tracing for 
suspend/resume operations, e.g. as follows?

cd /sys/kernel/tracing &&
echo 256 > /sys/kernel/tracing/buffer_size_kb &&
echo nop > current_tracer &&
echo > trace &&
echo 1 > events/power/device_pm_callback_start/enable &&
echo 1 > events/power/device_pm_callback_end/enable &&
echo 1 > events/power/suspend_resume/enable &&
echo 1 > tracing_on

Thanks,

Bart.

Geert Uytterhoeven Aug. 26, 2022, 7:54 a.m. UTC | #11

Hi Bart,

On Tue, Aug 23, 2022 at 8:10 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 8/22/22 23:41, Geert Uytterhoeven wrote:
> > A lock-up (magic sysrq does not work) during s2idle.
> > I tried bisecting it yesterday, but failed.
> > On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer
> > I get to v5.19, the less likely it is to happen. Apparently 100
> > successful s2idle cycles was not enough to declare a kernel good...
> >
> >      Freezing ...
> >      Filesystems sync: 0.001 seconds
> >      Freezing user space processes ... (elapsed 0.001 seconds) done.
> >      OOM killer disabled.
> >      Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >      sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >      sd 0:0:0:0: [sda] Stopping disk
> >
> > ---> hangs here if it happens
> >
> >      ravb e6800000.ethernet eth0: Link is Down
> >      sd 0:0:0:0: [sda] Starting disk
> >      Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
> > PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186)
> >      ata1: link resume succeeded after 1 retries
> >      ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >      ata1.00: configured for UDMA/133
> >      OOM killer enabled.
> >      Restarting tasks ... done.
> >      random: crng reseeded on system resumption
> >      PM: suspend exit
> >      ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> I'm not sure that is enough information to find the root cause. How

Sorry for not making it clear I didn't expect this to be enough
information.

> about enabling the tp_printk boot option and to enable tracing for
> suspend/resume operations, e.g. as follows?
>
> cd /sys/kernel/tracing &&
> echo 256 > /sys/kernel/tracing/buffer_size_kb &&
> echo nop > current_tracer &&
> echo > trace &&
> echo 1 > events/power/device_pm_callback_start/enable &&
> echo 1 > events/power/device_pm_callback_end/enable &&
> echo 1 > events/power/suspend_resume/enable &&
> echo 1 > tracing_on

Thanks, that generates lots of output (362 KiB/cycle)!
Unfortunately it also has an impact on the probability of lock-ups.
Combined with 'scsi: sd: Revert "Rework asynchronous resume support"',
s2idle now works almost always.

I did manage to trigger the lock-up once with tracing enabled:

     device_pm_callback_end: gpio_rcar e6055400.gpio, err=0
     device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc,
noirq power domain [suspend]
     device_pm_callback_end: gpio_rcar e6055800.gpio, err=0
     device_pm_callback_start: renesas-cpg-mssr
e6150000.clock-controller, parent: soc, noirq driver [suspend]
     device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
     device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc,
noirq driver [suspend]
     device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
     suspend_resume: dpm_suspend_noirq[2] end
     suspend_resume: machine_suspend[1] begin
     suspend_resume: timekeeping_freeze[5] begin

---> hang

     suspend_resume: timekeeping_freeze[0] end
     suspend_resume: machine_suspend[1] end
     suspend_resume: dpm_resume_noirq[16] begin
     device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc,
noirq driver [resume]
     device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
     device_pm_callback_start: renesas-cpg-mssr
e6150000.clock-controller, parent: soc, noirq driver [resume]
     device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
     device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc,
noirq power domain [resume]

Oops, timers...

At least it's not related to SCSI ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

scsi: sd: Revert "Rework asynchronous resume support"

Commit Message

Comments

Patch