Message ID | 20220816172638.538734-1-bvanassche@acm.org |
---|---|
State | New |
Headers | show |
Series | scsi: sd: Revert "Rework asynchronous resume support" | expand |
On 16/08/2022 18:26, Bart Van Assche wrote: > Although patch "Rework asynchronous resume support" eliminates the delay > for some ATA disks after resume, it causes resume of ATA disks to fail > on other setups. See also: > * "Resume process hangs for 5-6 seconds starting sometime in 5.16" > (https://bugzilla.kernel.org/show_bug.cgi?id=215880). > * Geert's regression report > (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). > > This is what I understand about this issue: > * During resume, ata_port_pm_resume() starts the SCSI error handler. > This changes the SCSI host state into SHOST_RECOVERY and causes > scsi_queue_rq() to return BLK_STS_RESOURCE. > * sd_resume() calls sd_start_stop_device() for ATA devices. That > function in turn calls sd_submit_start() which tries to submit a START > STOP UNIT command. That command can only be submitted after the SCSI > error handler has changed the SCSI host state back to SHOST_RUNNING. > * The SCSI error handler runs on its own thread and calls > schedule_work(&(ap->scsi_rescan_task)). That causes > ata_scsi_dev_rescan() to be called from the context of a kernel > workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - > maybe because all available tags have been allocated by > sd_submit_start() calls (this is a guess). > > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: gzhqyz@gmail.com > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> > Reported-by: gzhqyz@gmail.com > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) JFYI, Just now I see that 88f1669019bd also causes me issues for my SATA disk: root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:2/enable [58.271646] sas: ex 500e004aaaaaaa1f phy02 change count has changed [58.305876] sd 0:0:1:0: [sdb] Synchronizing SCSI cache [58.305898] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK [58.305901] sd 0:0:1:0: [sdb] Stopping disk [58.305911] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK root@(none)$ echo 1 > sys/class/sas_phy/phy-0:0:2/enable [95.405836] INFO: task kworker/u128:0:8 blocked for more than 30 seconds. [95.412618] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166 [95.419311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [95.427130] task:kworker/u128:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000008 [95.435475] Workqueue: HISI0162:01_event_q sas_port_event_worker [95.441481] Call trace: [95.443918] __switch_to+0x11c/0x1a8 [95.447491] __schedule+0x264/0x718 [95.450972] schedule+0x50/0xc8 [95.454106] schedule_timeout+0x19c/0x250 [95.458108] wait_for_completion+0x84/0x140 [95.462283] flush_workqueue+0xfc/0x3d0 [95.466114] sas_porte_broadcast_rcvd+0x5c/0x68 [95.470638] sas_port_event_worker+0x2c/0x50 [95.474899] process_one_work+0x1e4/0x348 [95.478901] worker_thread+0x48/0x418 [95.482555] kthread+0xf4/0x110 [95.485687] ret_from_fork+0x10/0x20 [95.489274] INFO: task kworker/u128:1:462 blocked for more than 30 seconds. [95.496226] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166 [95.502918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [95.510736] task:kworker/u128:1 state:D stack: 0 pid: 462 ppid: 2 flags:0x00000008 [95.519080] Workqueue: HISI0162:01_disco_q sas_revalidate_domain [95.525080] Call trace: [95.527517] __switch_to+0x11c/0x1a8 [95.531085] __schedule+0x264/0x718 [95.534565] schedule+0x50/0xc8 [95.537694] blk_mq_freeze_queue_wait+0x7c/0xb0 [95.542218] blk_mq_freeze_queue+0x1c/0x28 [95.546307] disk_release+0x40/0xf0 [95.549784] device_release+0x30/0x90 [95.553441] kobject_put+0x90/0x108 [95.556923] put_device+0x10/0x20 [95.560231] put_disk+0x18/0x28 [95.563363] sd_remove+0x44/0x58 [95.566585] device_remove+0x6c/0x78 [95.570153] device_release_driver_internal+0xd8/0x180 [95.575284] device_release_driver+0x14/0x20 [95.579545] bus_remove_device+0xc8/0x108 [95.583547] device_del+0x16c/0x3a0 [95.587028] __scsi_remove_device+0xf0/0x130 [95.591290] scsi_remove_device+0x28/0x40 [95.595291] scsi_remove_target+0x1b8/0x220 [95.599466] sas_rphy_remove+0x5c/0x60 [95.603208] sas_rphy_delete+0x14/0x28 [95.606949] sas_destruct_devices+0x54/0x88 [95.611124] sas_revalidate_domain+0x60/0x148 [95.615472] process_one_work+0x1e4/0x348 [95.619474] worker_thread+0x48/0x418 [95.623129] kthread+0xf4/0x110 [95.626261] ret_from_fork+0x10/0x20 [95.629830] INFO: task sh:541 blocked for more than 30 seconds. [95.635740] Not tainted 5.19.0-rc1-00128-g88f1669019bd-dirty #1166 [95.642431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [95.650250] task:sh state:D stack: 0 pid: 541 ppid: 1 flags:0x00000000 [95.658594] Call trace: [95.661028] __switch_to+0x11c/0x1a8 [95.664597] __schedule+0x264/0x718 [95.668076] schedule+0x50/0xc8 [95.671210] schedule_timeout+0x19c/0x250 [95.675211] wait_for_completion+0x84/0x140 [95.679387] flush_workqueue+0xfc/0x3d0 [95.683214] drain_workqueue+0xb0/0x158 [95.687043] __sas_drain_work+0x3c/0x98 [95.690870] sas_drain_work+0x58/0x60 [95.694524] queue_phy_enable+0x84/0xc8 [95.698352] do_sas_phy_enable.isra.7+0x58/0xb0 [95.702875] store_sas_phy_enable+0x48/0x78 [95.707051] dev_attr_store+0x14/0x28 [95.710706] sysfs_kf_write+0x48/0x58 [95.714362] kernfs_fop_write_iter+0x118/0x1a0 [95.718798] new_sync_write+0xdc/0x168 [95.722543] vfs_write+0x1b8/0x388 [95.725938] ksys_write+0x64/0xf0 [95.729241] __arm64_sys_write+0x14/0x20 [95.733157] invoke_syscall+0x40/0xf8 [95.736814] el0_svc_common.constprop.3+0x6c/0xf8 [95.741511] do_el0_svc+0x28/0xc8 [95.744818] el0_svc+0x28/0x80 [95.747865] el0t_64_sync_handler+0x94/0xb8 [95.752039] el0t_64_sync+0x178/0x17c BTW, I see 88f1669019bd is queued for stable... > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/sd.c | 84 ++++++++++------------------------------------- > drivers/scsi/sd.h | 5 --- > 2 files changed, 18 insertions(+), 71 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 8f79fa6318fe..eb76ba055021 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int); > static void sd_config_write_same(struct scsi_disk *); > static int sd_revalidate_disk(struct gendisk *); > static void sd_unlock_native_capacity(struct gendisk *disk); > -static void sd_start_done_work(struct work_struct *work); > static int sd_probe(struct device *); > static int sd_remove(struct device *); > static void sd_shutdown(struct device *); > @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev) > sdkp->max_retries = SD_MAX_RETRIES; > atomic_set(&sdkp->openers, 0); > atomic_set(&sdkp->device->ioerr_cnt, 0); > - INIT_WORK(&sdkp->start_done_work, sd_start_done_work); > > if (!sdp->request_queue->rq_timeout) { > if (sdp->type != TYPE_MOD) > @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev) > kfree(sdkp); > } > > -/* Process sense data after a START command finished. */ > -static void sd_start_done_work(struct work_struct *work) > -{ > - struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), > - start_done_work); > - struct scsi_sense_hdr sshdr; > - int res = sdkp->start_result; > - > - if (res == 0) > - return; > - > - sd_print_result(sdkp, "Start/Stop Unit failed", res); > - > - if (res < 0) > - return; > - > - if (scsi_normalize_sense(sdkp->start_sense_buffer, > - sdkp->start_sense_len, &sshdr)) > - sd_print_sense_hdr(sdkp, &sshdr); > -} > - > -/* A START command finished. May be called from interrupt context. */ > -static void sd_start_done(struct request *req, blk_status_t status) > -{ > - const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > - struct scsi_disk *sdkp = scsi_disk(req->q->disk); > - > - sdkp->start_result = scmd->result; > - WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE); > - sdkp->start_sense_len = scmd->sense_len; > - memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, > - ARRAY_SIZE(sdkp->start_sense_buffer)); > - WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work)); > -} > - > -/* Submit a START command asynchronously. */ > -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len) > -{ > - struct scsi_device *sdev = sdkp->device; > - struct request_queue *q = sdev->request_queue; > - struct request *req; > - struct scsi_cmnd *scmd; > - > - req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM); > - if (IS_ERR(req)) > - return PTR_ERR(req); > - > - scmd = blk_mq_rq_to_pdu(req); > - scmd->cmd_len = cmd_len; > - memcpy(scmd->cmnd, cmd, cmd_len); > - scmd->allowed = sdkp->max_retries; > - req->timeout = SD_TIMEOUT; > - req->rq_flags |= RQF_PM | RQF_QUIET; > - req->end_io = sd_start_done; > - blk_execute_rq_nowait(req, /*at_head=*/true); > - > - return 0; > -} > - > static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > { > unsigned char cmd[6] = { START_STOP }; /* START_VALID */ > + struct scsi_sense_hdr sshdr; > struct scsi_device *sdp = sdkp->device; > + int res; > > if (start) > cmd[4] |= 1; /* START */ > @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > if (!scsi_device_online(sdp)) > return -ENODEV; > > - /* Wait until processing of sense data has finished. */ > - flush_work(&sdkp->start_done_work); > + res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr, > + SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); > + if (res) { > + sd_print_result(sdkp, "Start/Stop Unit failed", res); > + if (res > 0 && scsi_sense_valid(&sshdr)) { > + sd_print_sense_hdr(sdkp, &sshdr); > + /* 0x3a is medium not present */ > + if (sshdr.asc == 0x3a) > + res = 0; > + } > + } > > - return sd_submit_start(sdkp, cmd, sizeof(cmd)); > + /* SCSI error codes must not go to the generic layer */ > + if (res) > + return -EIO; > + > + return 0; > } > > /* > @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev) > sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); > sd_start_stop_device(sdkp, 0); > } > - > - flush_work(&sdkp->start_done_work); > } > > static int sd_suspend_common(struct device *dev, bool ignore_stop_errors) > diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h > index b89187761d61..5eea762f84d1 100644 > --- a/drivers/scsi/sd.h > +++ b/drivers/scsi/sd.h > @@ -150,11 +150,6 @@ struct scsi_disk { > unsigned urswrz : 1; > unsigned security : 1; > unsigned ignore_medium_access_errors : 1; > - > - int start_result; > - u32 start_sense_len; > - u8 start_sense_buffer[SCSI_SENSE_BUFFERSIZE]; > - struct work_struct start_done_work; > }; > #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) > > .
On 8/16/22 11:00, John Garry wrote: > JFYI, Just now I see that 88f1669019bd also causes me issues for my SATA > disk: [ ... ] Hi John, Does reverting commit 88f1669019bd help on your setup? Thanks, Bart.
On 16/08/2022 19:06, Bart Van Assche wrote: > On 8/16/22 11:00, John Garry wrote: >> JFYI, Just now I see that 88f1669019bd also causes me issues for my >> SATA disk: [ ... ] > Hi John, > > Does reverting commit 88f1669019bd help on your setup? Yes, Tested-by: John Garry <john.garry@huawei.com>
On 8/16/22 19:26, Bart Van Assche wrote: > Although patch "Rework asynchronous resume support" eliminates the delay > for some ATA disks after resume, it causes resume of ATA disks to fail > on other setups. See also: > * "Resume process hangs for 5-6 seconds starting sometime in 5.16" > (https://bugzilla.kernel.org/show_bug.cgi?id=215880). > * Geert's regression report > (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). > > This is what I understand about this issue: > * During resume, ata_port_pm_resume() starts the SCSI error handler. > This changes the SCSI host state into SHOST_RECOVERY and causes > scsi_queue_rq() to return BLK_STS_RESOURCE. > * sd_resume() calls sd_start_stop_device() for ATA devices. That > function in turn calls sd_submit_start() which tries to submit a START > STOP UNIT command. That command can only be submitted after the SCSI > error handler has changed the SCSI host state back to SHOST_RUNNING. > * The SCSI error handler runs on its own thread and calls > schedule_work(&(ap->scsi_rescan_task)). That causes > ata_scsi_dev_rescan() to be called from the context of a kernel > workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - > maybe because all available tags have been allocated by > sd_submit_start() calls (this is a guess). > > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: gzhqyz@gmail.com > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> > Reported-by: gzhqyz@gmail.com > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) > Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> > --- > drivers/scsi/sd.c | 84 ++++++++++------------------------------------- > drivers/scsi/sd.h | 5 --- > 2 files changed, 18 insertions(+), 71 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 8f79fa6318fe..eb76ba055021 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int); > static void sd_config_write_same(struct scsi_disk *); > static int sd_revalidate_disk(struct gendisk *); > static void sd_unlock_native_capacity(struct gendisk *disk); > -static void sd_start_done_work(struct work_struct *work); > static int sd_probe(struct device *); > static int sd_remove(struct device *); > static void sd_shutdown(struct device *); > @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev) > sdkp->max_retries = SD_MAX_RETRIES; > atomic_set(&sdkp->openers, 0); > atomic_set(&sdkp->device->ioerr_cnt, 0); > - INIT_WORK(&sdkp->start_done_work, sd_start_done_work); > > if (!sdp->request_queue->rq_timeout) { > if (sdp->type != TYPE_MOD) > @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev) > kfree(sdkp); > } > > -/* Process sense data after a START command finished. */ > -static void sd_start_done_work(struct work_struct *work) > -{ > - struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), > - start_done_work); > - struct scsi_sense_hdr sshdr; > - int res = sdkp->start_result; > - > - if (res == 0) > - return; > - > - sd_print_result(sdkp, "Start/Stop Unit failed", res); > - > - if (res < 0) > - return; > - > - if (scsi_normalize_sense(sdkp->start_sense_buffer, > - sdkp->start_sense_len, &sshdr)) > - sd_print_sense_hdr(sdkp, &sshdr); > -} > - > -/* A START command finished. May be called from interrupt context. */ > -static void sd_start_done(struct request *req, blk_status_t status) > -{ > - const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > - struct scsi_disk *sdkp = scsi_disk(req->q->disk); > - > - sdkp->start_result = scmd->result; > - WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE); > - sdkp->start_sense_len = scmd->sense_len; > - memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, > - ARRAY_SIZE(sdkp->start_sense_buffer)); > - WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work)); > -} > - > -/* Submit a START command asynchronously. */ > -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len) > -{ > - struct scsi_device *sdev = sdkp->device; > - struct request_queue *q = sdev->request_queue; > - struct request *req; > - struct scsi_cmnd *scmd; > - > - req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM); > - if (IS_ERR(req)) > - return PTR_ERR(req); > - > - scmd = blk_mq_rq_to_pdu(req); > - scmd->cmd_len = cmd_len; > - memcpy(scmd->cmnd, cmd, cmd_len); > - scmd->allowed = sdkp->max_retries; > - req->timeout = SD_TIMEOUT; > - req->rq_flags |= RQF_PM | RQF_QUIET; > - req->end_io = sd_start_done; > - blk_execute_rq_nowait(req, /*at_head=*/true); > - > - return 0; > -} > - > static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > { > unsigned char cmd[6] = { START_STOP }; /* START_VALID */ > + struct scsi_sense_hdr sshdr; > struct scsi_device *sdp = sdkp->device; > + int res; > > if (start) > cmd[4] |= 1; /* START */ > @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > if (!scsi_device_online(sdp)) > return -ENODEV; > > - /* Wait until processing of sense data has finished. */ > - flush_work(&sdkp->start_done_work); > + res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr, > + SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); > + if (res) { > + sd_print_result(sdkp, "Start/Stop Unit failed", res); > + if (res > 0 && scsi_sense_valid(&sshdr)) { > + sd_print_sense_hdr(sdkp, &sshdr); > + /* 0x3a is medium not present */ > + if (sshdr.asc == 0x3a) > + res = 0; > + } > + } > > - return sd_submit_start(sdkp, cmd, sizeof(cmd)); > + /* SCSI error codes must not go to the generic layer */ > + if (res) > + return -EIO; > + > + return 0; > } > > /* > @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev) > sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); > sd_start_stop_device(sdkp, 0); > } > - > - flush_work(&sdkp->start_done_work); > } > > static int sd_suspend_common(struct device *dev, bool ignore_stop_errors) > diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h > index b89187761d61..5eea762f84d1 100644 > --- a/drivers/scsi/sd.h > +++ b/drivers/scsi/sd.h > @@ -150,11 +150,6 @@ struct scsi_disk { > unsigned urswrz : 1; > unsigned security : 1; > unsigned ignore_medium_access_errors : 1; > - > - int start_result; > - u32 start_sense_len; > - u8 start_sense_buffer[SCSI_SENSE_BUFFERSIZE]; > - struct work_struct start_done_work; > }; > #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) >
Hi, On 8/16/22 19:26, Bart Van Assche wrote: > Although patch "Rework asynchronous resume support" eliminates the delay > for some ATA disks after resume, it causes resume of ATA disks to fail > on other setups. See also: > * "Resume process hangs for 5-6 seconds starting sometime in 5.16" > (https://bugzilla.kernel.org/show_bug.cgi?id=215880). > * Geert's regression report > (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). > > This is what I understand about this issue: > * During resume, ata_port_pm_resume() starts the SCSI error handler. > This changes the SCSI host state into SHOST_RECOVERY and causes > scsi_queue_rq() to return BLK_STS_RESOURCE. > * sd_resume() calls sd_start_stop_device() for ATA devices. That > function in turn calls sd_submit_start() which tries to submit a START > STOP UNIT command. That command can only be submitted after the SCSI > error handler has changed the SCSI host state back to SHOST_RUNNING. > * The SCSI error handler runs on its own thread and calls > schedule_work(&(ap->scsi_rescan_task)). That causes > ata_scsi_dev_rescan() to be called from the context of a kernel > workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - > maybe because all available tags have been allocated by > sd_submit_start() calls (this is a guess). > > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: gzhqyz@gmail.com > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> > Reported-by: gzhqyz@gmail.com > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) > Signed-off-by: Bart Van Assche <bvanassche@acm.org> As reported here I've been seeing tasks block/hang on IO to a sata disk on a system with / on a NVME (which keeps the system alive except for the SATA disk acccessing tasks): https://lore.kernel.org/regressions/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/ I'm running 6.0-rc1 with this patch added now and so far I've not seen the problem re-occur. I was also seeing 6.0 suspend/resume issues on 2 laptops with sata disks (rather then NVME) which I did not yet get around to collecting logs from / reporting. I'm happy to report that those suspend/resume issues are also fixed by this: Tested-by: Hans de Goede <hdegoede@redhat.com> Regards, Hans > --- > drivers/scsi/sd.c | 84 ++++++++++------------------------------------- > drivers/scsi/sd.h | 5 --- > 2 files changed, 18 insertions(+), 71 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 8f79fa6318fe..eb76ba055021 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int); > static void sd_config_write_same(struct scsi_disk *); > static int sd_revalidate_disk(struct gendisk *); > static void sd_unlock_native_capacity(struct gendisk *disk); > -static void sd_start_done_work(struct work_struct *work); > static int sd_probe(struct device *); > static int sd_remove(struct device *); > static void sd_shutdown(struct device *); > @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev) > sdkp->max_retries = SD_MAX_RETRIES; > atomic_set(&sdkp->openers, 0); > atomic_set(&sdkp->device->ioerr_cnt, 0); > - INIT_WORK(&sdkp->start_done_work, sd_start_done_work); > > if (!sdp->request_queue->rq_timeout) { > if (sdp->type != TYPE_MOD) > @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev) > kfree(sdkp); > } > > -/* Process sense data after a START command finished. */ > -static void sd_start_done_work(struct work_struct *work) > -{ > - struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), > - start_done_work); > - struct scsi_sense_hdr sshdr; > - int res = sdkp->start_result; > - > - if (res == 0) > - return; > - > - sd_print_result(sdkp, "Start/Stop Unit failed", res); > - > - if (res < 0) > - return; > - > - if (scsi_normalize_sense(sdkp->start_sense_buffer, > - sdkp->start_sense_len, &sshdr)) > - sd_print_sense_hdr(sdkp, &sshdr); > -} > - > -/* A START command finished. May be called from interrupt context. */ > -static void sd_start_done(struct request *req, blk_status_t status) > -{ > - const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); > - struct scsi_disk *sdkp = scsi_disk(req->q->disk); > - > - sdkp->start_result = scmd->result; > - WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE); > - sdkp->start_sense_len = scmd->sense_len; > - memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, > - ARRAY_SIZE(sdkp->start_sense_buffer)); > - WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work)); > -} > - > -/* Submit a START command asynchronously. */ > -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len) > -{ > - struct scsi_device *sdev = sdkp->device; > - struct request_queue *q = sdev->request_queue; > - struct request *req; > - struct scsi_cmnd *scmd; > - > - req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM); > - if (IS_ERR(req)) > - return PTR_ERR(req); > - > - scmd = blk_mq_rq_to_pdu(req); > - scmd->cmd_len = cmd_len; > - memcpy(scmd->cmnd, cmd, cmd_len); > - scmd->allowed = sdkp->max_retries; > - req->timeout = SD_TIMEOUT; > - req->rq_flags |= RQF_PM | RQF_QUIET; > - req->end_io = sd_start_done; > - blk_execute_rq_nowait(req, /*at_head=*/true); > - > - return 0; > -} > - > static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > { > unsigned char cmd[6] = { START_STOP }; /* START_VALID */ > + struct scsi_sense_hdr sshdr; > struct scsi_device *sdp = sdkp->device; > + int res; > > if (start) > cmd[4] |= 1; /* START */ > @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) > if (!scsi_device_online(sdp)) > return -ENODEV; > > - /* Wait until processing of sense data has finished. */ > - flush_work(&sdkp->start_done_work); > + res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr, > + SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); > + if (res) { > + sd_print_result(sdkp, "Start/Stop Unit failed", res); > + if (res > 0 && scsi_sense_valid(&sshdr)) { > + sd_print_sense_hdr(sdkp, &sshdr); > + /* 0x3a is medium not present */ > + if (sshdr.asc == 0x3a) > + res = 0; > + } > + } > > - return sd_submit_start(sdkp, cmd, sizeof(cmd)); > + /* SCSI error codes must not go to the generic layer */ > + if (res) > + return -EIO; > + > + return 0; > } > > /* > @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev) > sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); > sd_start_stop_device(sdkp, 0); > } > - > - flush_work(&sdkp->start_done_work); > } > > static int sd_suspend_common(struct device *dev, bool ignore_stop_errors) > diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h > index b89187761d61..5eea762f84d1 100644 > --- a/drivers/scsi/sd.h > +++ b/drivers/scsi/sd.h > @@ -150,11 +150,6 @@ struct scsi_disk { > unsigned urswrz : 1; > unsigned security : 1; > unsigned ignore_medium_access_errors : 1; > - > - int start_result; > - u32 start_sense_len; > - u8 start_sense_buffer[SCSI_SENSE_BUFFERSIZE]; > - struct work_struct start_done_work; > }; > #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) > >
Hi Hans, On Sat, Aug 20, 2022 at 5:37 PM Hans de Goede <hdegoede@redhat.com> wrote: > On 8/16/22 19:26, Bart Van Assche wrote: > > Although patch "Rework asynchronous resume support" eliminates the delay > > for some ATA disks after resume, it causes resume of ATA disks to fail > > on other setups. See also: > > * "Resume process hangs for 5-6 seconds starting sometime in 5.16" > > (https://bugzilla.kernel.org/show_bug.cgi?id=215880). > > * Geert's regression report > > (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). > > > > This is what I understand about this issue: > > * During resume, ata_port_pm_resume() starts the SCSI error handler. > > This changes the SCSI host state into SHOST_RECOVERY and causes > > scsi_queue_rq() to return BLK_STS_RESOURCE. > > * sd_resume() calls sd_start_stop_device() for ATA devices. That > > function in turn calls sd_submit_start() which tries to submit a START > > STOP UNIT command. That command can only be submitted after the SCSI > > error handler has changed the SCSI host state back to SHOST_RUNNING. > > * The SCSI error handler runs on its own thread and calls > > schedule_work(&(ap->scsi_rescan_task)). That causes > > ata_scsi_dev_rescan() to be called from the context of a kernel > > workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - > > maybe because all available tags have been allocated by > > sd_submit_start() calls (this is a guess). > > > > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > > Cc: Hannes Reinecke <hare@suse.de> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > > Cc: gzhqyz@gmail.com > > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> > > Reported-by: gzhqyz@gmail.com > > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) > > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > > As reported here I've been seeing tasks block/hang on IO > to a sata disk on a system with / on a NVME (which keeps > the system alive except for the SATA disk acccessing tasks): > > https://lore.kernel.org/regressions/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/ > > I'm running 6.0-rc1 with this patch added now and so far > I've not seen the problem re-occur. > > I was also seeing 6.0 suspend/resume issues on 2 laptops with > sata disks (rather then NVME) which I did not yet get around > to collecting logs from / reporting. I'm happy to report that > those suspend/resume issues are also fixed by this: It looks like there is a (different) regression in v6.1-rc1 related to s2idle and s2ram, which is not fixed by this patch. In fact it also happens on boards where SATA is not used, it is just less likely to happen on the non-SATA boards. I still have to bisect it, which may take some time, as the issue is not 100% reproducible. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 8/21/22 02:16, Geert Uytterhoeven wrote: > It looks like there is a (different) regression in v6.1-rc1 related > to s2idle and s2ram, which is not fixed by this patch. In fact it > also happens on boards where SATA is not used, it is just less likely > to happen on the non-SATA boards. > I still have to bisect it, which may take some time, as the issue is > not 100% reproducible. Hi Geert, What kind of regression are you encountering? A crash, a hang or something else? Are any call traces available? I posted another revert earlier today but that revert is for code paths not related to suspend/resume functionality. See also https://lore.kernel.org/linux-scsi/20220821220502.13685-1-bvanassche@acm.org/ Thanks, Bart.
On 8/17/22 22:06, Vlastimil Babka wrote: > On 8/16/22 19:26, Bart Van Assche wrote: >> Although patch "Rework asynchronous resume support" eliminates the delay >> for some ATA disks after resume, it causes resume of ATA disks to fail >> on other setups. See also: >> * "Resume process hangs for 5-6 seconds starting sometime in 5.16" >> (https://bugzilla.kernel.org/show_bug.cgi?id=215880). >> * Geert's regression report >> (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). >> >> This is what I understand about this issue: >> * During resume, ata_port_pm_resume() starts the SCSI error handler. >> This changes the SCSI host state into SHOST_RECOVERY and causes >> scsi_queue_rq() to return BLK_STS_RESOURCE. >> * sd_resume() calls sd_start_stop_device() for ATA devices. That >> function in turn calls sd_submit_start() which tries to submit a START >> STOP UNIT command. That command can only be submitted after the SCSI >> error handler has changed the SCSI host state back to SHOST_RUNNING. >> * The SCSI error handler runs on its own thread and calls >> schedule_work(&(ap->scsi_rescan_task)). That causes >> ata_scsi_dev_rescan() to be called from the context of a kernel >> workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - >> maybe because all available tags have been allocated by >> sd_submit_start() calls (this is a guess). >> >> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> >> Cc: Hannes Reinecke <hare@suse.de> >> Cc: Geert Uytterhoeven <geert@linux-m68k.org> >> Cc: gzhqyz@gmail.com >> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> >> Reported-by: gzhqyz@gmail.com >> Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> > > Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> What's the hold up on this? Didn't make it to rc2 and the number of people who independently spent time bisecting this seems to be only increasing. Thanks, Vlastimil > >> --- >> drivers/scsi/sd.c | 84 ++++++++++------------------------------------- >> drivers/scsi/sd.h | 5 --- >> 2 files changed, 18 insertions(+), 71 deletions(-) >> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index 8f79fa6318fe..eb76ba055021 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int); >> static void sd_config_write_same(struct scsi_disk *); >> static int sd_revalidate_disk(struct gendisk *); >> static void sd_unlock_native_capacity(struct gendisk *disk); >> -static void sd_start_done_work(struct work_struct *work); >> static int sd_probe(struct device *); >> static int sd_remove(struct device *); >> static void sd_shutdown(struct device *); >> @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev) >> sdkp->max_retries = SD_MAX_RETRIES; >> atomic_set(&sdkp->openers, 0); >> atomic_set(&sdkp->device->ioerr_cnt, 0); >> - INIT_WORK(&sdkp->start_done_work, sd_start_done_work); >> >> if (!sdp->request_queue->rq_timeout) { >> if (sdp->type != TYPE_MOD) >> @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev) >> kfree(sdkp); >> } >> >> -/* Process sense data after a START command finished. */ >> -static void sd_start_done_work(struct work_struct *work) >> -{ >> - struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), >> - start_done_work); >> - struct scsi_sense_hdr sshdr; >> - int res = sdkp->start_result; >> - >> - if (res == 0) >> - return; >> - >> - sd_print_result(sdkp, "Start/Stop Unit failed", res); >> - >> - if (res < 0) >> - return; >> - >> - if (scsi_normalize_sense(sdkp->start_sense_buffer, >> - sdkp->start_sense_len, &sshdr)) >> - sd_print_sense_hdr(sdkp, &sshdr); >> -} >> - >> -/* A START command finished. May be called from interrupt context. */ >> -static void sd_start_done(struct request *req, blk_status_t status) >> -{ >> - const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); >> - struct scsi_disk *sdkp = scsi_disk(req->q->disk); >> - >> - sdkp->start_result = scmd->result; >> - WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE); >> - sdkp->start_sense_len = scmd->sense_len; >> - memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, >> - ARRAY_SIZE(sdkp->start_sense_buffer)); >> - WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work)); >> -} >> - >> -/* Submit a START command asynchronously. */ >> -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len) >> -{ >> - struct scsi_device *sdev = sdkp->device; >> - struct request_queue *q = sdev->request_queue; >> - struct request *req; >> - struct scsi_cmnd *scmd; >> - >> - req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM); >> - if (IS_ERR(req)) >> - return PTR_ERR(req); >> - >> - scmd = blk_mq_rq_to_pdu(req); >> - scmd->cmd_len = cmd_len; >> - memcpy(scmd->cmnd, cmd, cmd_len); >> - scmd->allowed = sdkp->max_retries; >> - req->timeout = SD_TIMEOUT; >> - req->rq_flags |= RQF_PM | RQF_QUIET; >> - req->end_io = sd_start_done; >> - blk_execute_rq_nowait(req, /*at_head=*/true); >> - >> - return 0; >> -} >> - >> static int sd_start_stop_device(struct scsi_disk *sdkp, int start) >> { >> unsigned char cmd[6] = { START_STOP }; /* START_VALID */ >> + struct scsi_sense_hdr sshdr; >> struct scsi_device *sdp = sdkp->device; >> + int res; >> >> if (start) >> cmd[4] |= 1; /* START */ >> @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) >> if (!scsi_device_online(sdp)) >> return -ENODEV; >> >> - /* Wait until processing of sense data has finished. */ >> - flush_work(&sdkp->start_done_work); >> + res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr, >> + SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); >> + if (res) { >> + sd_print_result(sdkp, "Start/Stop Unit failed", res); >> + if (res > 0 && scsi_sense_valid(&sshdr)) { >> + sd_print_sense_hdr(sdkp, &sshdr); >> + /* 0x3a is medium not present */ >> + if (sshdr.asc == 0x3a) >> + res = 0; >> + } >> + } >> >> - return sd_submit_start(sdkp, cmd, sizeof(cmd)); >> + /* SCSI error codes must not go to the generic layer */ >> + if (res) >> + return -EIO; >> + >> + return 0; >> } >> >> /* >> @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev) >> sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); >> sd_start_stop_device(sdkp, 0); >> } >> - >> - flush_work(&sdkp->start_done_work); >> } >> >> static int sd_suspend_common(struct device *dev, bool ignore_stop_errors) >> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h >> index b89187761d61..5eea762f84d1 100644 >> --- a/drivers/scsi/sd.h >> +++ b/drivers/scsi/sd.h >> @@ -150,11 +150,6 @@ struct scsi_disk { >> unsigned urswrz : 1; >> unsigned security : 1; >> unsigned ignore_medium_access_errors : 1; >> - >> - int start_result; >> - u32 start_sense_len; >> - u8 start_sense_buffer[SCSI_SENSE_BUFFERSIZE]; >> - struct work_struct start_done_work; >> }; >> #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) >> >
Hi Bart, On Mon, Aug 22, 2022 at 4:52 AM Bart Van Assche <bvanassche@acm.org> wrote: > On 8/21/22 02:16, Geert Uytterhoeven wrote: > > It looks like there is a (different) regression in v6.1-rc1 related > > to s2idle and s2ram, which is not fixed by this patch. In fact it > > also happens on boards where SATA is not used, it is just less likely > > to happen on the non-SATA boards. > > I still have to bisect it, which may take some time, as the issue is > > not 100% reproducible. > > What kind of regression are you encountering? A crash, a hang or > something else? Are any call traces available? A lock-up (magic sysrq does not work) during s2idle. I tried bisecting it yesterday, but failed. On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer I get to v5.19, the less likely it is to happen. Apparently 100 successful s2idle cycles was not enough to declare a kernel good... Freezing ... Filesystems sync: 0.001 seconds Freezing user space processes ... (elapsed 0.001 seconds) done. OOM killer disabled. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. sd 0:0:0:0: [sda] Synchronizing SCSI cache sd 0:0:0:0: [sda] Stopping disk ---> hangs here if it happens ravb e6800000.ethernet eth0: Link is Down sd 0:0:0:0: [sda] Starting disk Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186) ata1: link resume succeeded after 1 retries ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/133 OOM killer enabled. Restarting tasks ... done. random: crng reseeded on system resumption PM: suspend exit ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off > I posted another revert earlier today but that revert is for code paths > not related to suspend/resume functionality. See also > https://lore.kernel.org/linux-scsi/20220821220502.13685-1-bvanassche@acm.org/ Unlikely to be the case, but I'll give it a try later... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 8/22/22 23:41, Geert Uytterhoeven wrote: > A lock-up (magic sysrq does not work) during s2idle. > I tried bisecting it yesterday, but failed. > On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer > I get to v5.19, the less likely it is to happen. Apparently 100 > successful s2idle cycles was not enough to declare a kernel good... > > Freezing ... > Filesystems sync: 0.001 seconds > Freezing user space processes ... (elapsed 0.001 seconds) done. > OOM killer disabled. > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > sd 0:0:0:0: [sda] Synchronizing SCSI cache > sd 0:0:0:0: [sda] Stopping disk > > ---> hangs here if it happens > > ravb e6800000.ethernet eth0: Link is Down > sd 0:0:0:0: [sda] Starting disk > Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached > PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186) > ata1: link resume succeeded after 1 retries > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata1.00: configured for UDMA/133 > OOM killer enabled. > Restarting tasks ... done. > random: crng reseeded on system resumption > PM: suspend exit > ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off Hi Geert, I'm not sure that is enough information to find the root cause. How about enabling the tp_printk boot option and to enable tracing for suspend/resume operations, e.g. as follows? cd /sys/kernel/tracing && echo 256 > /sys/kernel/tracing/buffer_size_kb && echo nop > current_tracer && echo > trace && echo 1 > events/power/device_pm_callback_start/enable && echo 1 > events/power/device_pm_callback_end/enable && echo 1 > events/power/suspend_resume/enable && echo 1 > tracing_on Thanks, Bart.
Hi Bart, On Tue, Aug 23, 2022 at 8:10 PM Bart Van Assche <bvanassche@acm.org> wrote: > On 8/22/22 23:41, Geert Uytterhoeven wrote: > > A lock-up (magic sysrq does not work) during s2idle. > > I tried bisecting it yesterday, but failed. > > On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer > > I get to v5.19, the less likely it is to happen. Apparently 100 > > successful s2idle cycles was not enough to declare a kernel good... > > > > Freezing ... > > Filesystems sync: 0.001 seconds > > Freezing user space processes ... (elapsed 0.001 seconds) done. > > OOM killer disabled. > > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > sd 0:0:0:0: [sda] Synchronizing SCSI cache > > sd 0:0:0:0: [sda] Stopping disk > > > > ---> hangs here if it happens > > > > ravb e6800000.ethernet eth0: Link is Down > > sd 0:0:0:0: [sda] Starting disk > > Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached > > PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186) > > ata1: link resume succeeded after 1 retries > > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > ata1.00: configured for UDMA/133 > > OOM killer enabled. > > Restarting tasks ... done. > > random: crng reseeded on system resumption > > PM: suspend exit > > ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off > > I'm not sure that is enough information to find the root cause. How Sorry for not making it clear I didn't expect this to be enough information. > about enabling the tp_printk boot option and to enable tracing for > suspend/resume operations, e.g. as follows? > > cd /sys/kernel/tracing && > echo 256 > /sys/kernel/tracing/buffer_size_kb && > echo nop > current_tracer && > echo > trace && > echo 1 > events/power/device_pm_callback_start/enable && > echo 1 > events/power/device_pm_callback_end/enable && > echo 1 > events/power/suspend_resume/enable && > echo 1 > tracing_on Thanks, that generates lots of output (362 KiB/cycle)! Unfortunately it also has an impact on the probability of lock-ups. Combined with 'scsi: sd: Revert "Rework asynchronous resume support"', s2idle now works almost always. I did manage to trigger the lock-up once with tracing enabled: device_pm_callback_end: gpio_rcar e6055400.gpio, err=0 device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [suspend] device_pm_callback_end: gpio_rcar e6055800.gpio, err=0 device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [suspend] device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0 device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [suspend] device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0 suspend_resume: dpm_suspend_noirq[2] end suspend_resume: machine_suspend[1] begin suspend_resume: timekeeping_freeze[5] begin ---> hang suspend_resume: timekeeping_freeze[0] end suspend_resume: machine_suspend[1] end suspend_resume: dpm_resume_noirq[16] begin device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [resume] device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0 device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [resume] device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0 device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [resume] Oops, timers... At least it's not related to SCSI ;-) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 8f79fa6318fe..eb76ba055021 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -103,7 +103,6 @@ static void sd_config_discard(struct scsi_disk *, unsigned int); static void sd_config_write_same(struct scsi_disk *); static int sd_revalidate_disk(struct gendisk *); static void sd_unlock_native_capacity(struct gendisk *disk); -static void sd_start_done_work(struct work_struct *work); static int sd_probe(struct device *); static int sd_remove(struct device *); static void sd_shutdown(struct device *); @@ -3471,7 +3470,6 @@ static int sd_probe(struct device *dev) sdkp->max_retries = SD_MAX_RETRIES; atomic_set(&sdkp->openers, 0); atomic_set(&sdkp->device->ioerr_cnt, 0); - INIT_WORK(&sdkp->start_done_work, sd_start_done_work); if (!sdp->request_queue->rq_timeout) { if (sdp->type != TYPE_MOD) @@ -3594,69 +3592,12 @@ static void scsi_disk_release(struct device *dev) kfree(sdkp); } -/* Process sense data after a START command finished. */ -static void sd_start_done_work(struct work_struct *work) -{ - struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), - start_done_work); - struct scsi_sense_hdr sshdr; - int res = sdkp->start_result; - - if (res == 0) - return; - - sd_print_result(sdkp, "Start/Stop Unit failed", res); - - if (res < 0) - return; - - if (scsi_normalize_sense(sdkp->start_sense_buffer, - sdkp->start_sense_len, &sshdr)) - sd_print_sense_hdr(sdkp, &sshdr); -} - -/* A START command finished. May be called from interrupt context. */ -static void sd_start_done(struct request *req, blk_status_t status) -{ - const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req); - struct scsi_disk *sdkp = scsi_disk(req->q->disk); - - sdkp->start_result = scmd->result; - WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE); - sdkp->start_sense_len = scmd->sense_len; - memcpy(sdkp->start_sense_buffer, scmd->sense_buffer, - ARRAY_SIZE(sdkp->start_sense_buffer)); - WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work)); -} - -/* Submit a START command asynchronously. */ -static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len) -{ - struct scsi_device *sdev = sdkp->device; - struct request_queue *q = sdev->request_queue; - struct request *req; - struct scsi_cmnd *scmd; - - req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM); - if (IS_ERR(req)) - return PTR_ERR(req); - - scmd = blk_mq_rq_to_pdu(req); - scmd->cmd_len = cmd_len; - memcpy(scmd->cmnd, cmd, cmd_len); - scmd->allowed = sdkp->max_retries; - req->timeout = SD_TIMEOUT; - req->rq_flags |= RQF_PM | RQF_QUIET; - req->end_io = sd_start_done; - blk_execute_rq_nowait(req, /*at_head=*/true); - - return 0; -} - static int sd_start_stop_device(struct scsi_disk *sdkp, int start) { unsigned char cmd[6] = { START_STOP }; /* START_VALID */ + struct scsi_sense_hdr sshdr; struct scsi_device *sdp = sdkp->device; + int res; if (start) cmd[4] |= 1; /* START */ @@ -3667,10 +3608,23 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start) if (!scsi_device_online(sdp)) return -ENODEV; - /* Wait until processing of sense data has finished. */ - flush_work(&sdkp->start_done_work); + res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr, + SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL); + if (res) { + sd_print_result(sdkp, "Start/Stop Unit failed", res); + if (res > 0 && scsi_sense_valid(&sshdr)) { + sd_print_sense_hdr(sdkp, &sshdr); + /* 0x3a is medium not present */ + if (sshdr.asc == 0x3a) + res = 0; + } + } - return sd_submit_start(sdkp, cmd, sizeof(cmd)); + /* SCSI error codes must not go to the generic layer */ + if (res) + return -EIO; + + return 0; } /* @@ -3697,8 +3651,6 @@ static void sd_shutdown(struct device *dev) sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); sd_start_stop_device(sdkp, 0); } - - flush_work(&sdkp->start_done_work); } static int sd_suspend_common(struct device *dev, bool ignore_stop_errors) diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index b89187761d61..5eea762f84d1 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -150,11 +150,6 @@ struct scsi_disk { unsigned urswrz : 1; unsigned security : 1; unsigned ignore_medium_access_errors : 1; - - int start_result; - u32 start_sense_len; - u8 start_sense_buffer[SCSI_SENSE_BUFFERSIZE]; - struct work_struct start_done_work; }; #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
Although patch "Rework asynchronous resume support" eliminates the delay for some ATA disks after resume, it causes resume of ATA disks to fail on other setups. See also: * "Resume process hangs for 5-6 seconds starting sometime in 5.16" (https://bugzilla.kernel.org/show_bug.cgi?id=215880). * Geert's regression report (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). This is what I understand about this issue: * During resume, ata_port_pm_resume() starts the SCSI error handler. This changes the SCSI host state into SHOST_RECOVERY and causes scsi_queue_rq() to return BLK_STS_RESOURCE. * sd_resume() calls sd_start_stop_device() for ATA devices. That function in turn calls sd_submit_start() which tries to submit a START STOP UNIT command. That command can only be submitted after the SCSI error handler has changed the SCSI host state back to SHOST_RUNNING. * The SCSI error handler runs on its own thread and calls schedule_work(&(ap->scsi_rescan_task)). That causes ata_scsi_dev_rescan() to be called from the context of a kernel workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - maybe because all available tags have been allocated by sd_submit_start() calls (this is a guess). Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: gzhqyz@gmail.com Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: gzhqyz@gmail.com Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68) Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/sd.c | 84 ++++++++++------------------------------------- drivers/scsi/sd.h | 5 --- 2 files changed, 18 insertions(+), 71 deletions(-)