Message ID | 20240829093913.6282-2-sh8267.baek@samsung.com |
---|---|
State | New |
Headers | show |
Series | Set SDEV_OFFLINE when ufs shutdown. | expand |
> There is a history of dead lock as reboot is performed at the beginning of > booting. SDEV_QUIESCE was set for all lu's scsi_devices by ufs shutdown, > and at that time the audio driver was waiting on blk_mq_submit_bio holding > a mutex_lock while reading the fw binary. After that, a deadlock issue > occurred while audio driver shutdown was waiting for mutex_unlock of > blk_mq_submit_bio. To solve this, set SDEV_OFFLINE for all lus except wlun, > so that any i/o that comes down after a ufs shutdown will return an error. > > [ 31.907781]I[0: swapper/0: 0] 1 130705007 1651079834 > 11289729804 0 D( 2) 3 ffffff882e208000 * init > [device_shutdown] > [ 31.907793]I[0: swapper/0: 0] Mutex: 0xffffff8849a2b8b0: > owner[0xffffff882e28cb00 kworker/6:0 :49] > [ 31.907806]I[0: swapper/0: 0] Call trace: > [ 31.907810]I[0: swapper/0: 0] __switch_to+0x174/0x338 > [ 31.907819]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc > [ 31.907826]I[0: swapper/0: 0] schedule+0x7c/0xe8 > [ 31.907834]I[0: swapper/0: 0] schedule_preempt_disabled+0x24/0x40 > [ 31.907842]I[0: swapper/0: 0] __mutex_lock+0x408/0xdac > [ 31.907849]I[0: swapper/0: 0] __mutex_lock_slowpath+0x14/0x24 > [ 31.907858]I[0: swapper/0: 0] mutex_lock+0x40/0xec > [ 31.907866]I[0: swapper/0: 0] device_shutdown+0x108/0x280 > [ 31.907875]I[0: swapper/0: 0] kernel_restart+0x4c/0x11c > [ 31.907883]I[0: swapper/0: 0] __arm64_sys_reboot+0x15c/0x280 > [ 31.907890]I[0: swapper/0: 0] invoke_syscall+0x70/0x158 > [ 31.907899]I[0: swapper/0: 0] el0_svc_common+0xb4/0xf4 > [ 31.907909]I[0: swapper/0: 0] do_el0_svc+0x2c/0xb0 > [ 31.907918]I[0: swapper/0: 0] el0_svc+0x34/0xe0 > [ 31.907928]I[0: swapper/0: 0] el0t_64_sync_handler+0x68/0xb4 > [ 31.907937]I[0: swapper/0: 0] el0t_64_sync+0x1a0/0x1a4 > > [ 31.908774]I[0: swapper/0: 0] 49 0 11960702 > 11236868007 0 D( 2) 6 ffffff882e28cb00 * kworker/6:0 > [__bio_queue_enter] > [ 31.908783]I[0: swapper/0: 0] Call trace: > [ 31.908788]I[0: swapper/0: 0] __switch_to+0x174/0x338 > [ 31.908796]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc > [ 31.908803]I[0: swapper/0: 0] schedule+0x7c/0xe8 > [ 31.908811]I[0: swapper/0: 0] __bio_queue_enter+0xb8/0x178 > [ 31.908818]I[0: swapper/0: 0] blk_mq_submit_bio+0x194/0x67c > [ 31.908827]I[0: swapper/0: 0] __submit_bio+0xb8/0x19c > > Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun") > Cc: stable@vger.kernel.org > Signed-off-by: Seunghwan Baek <sh8267.baek@samsung.com> > --- > drivers/ufs/core/ufshcd.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index a6f818cdef0e..4ac1492787c2 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -10215,7 +10215,9 @@ static void ufshcd_wl_shutdown(struct device *dev) > shost_for_each_device(sdev, hba->host) { > if (sdev == hba->ufs_device_wlun) > continue; > - scsi_device_quiesce(sdev); > + mutex_lock(&sdev->state_mutex); > + scsi_device_set_state(sdev, SDEV_OFFLINE); > + mutex_unlock(&sdev->state_mutex); > } > __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM); > > -- > 2.17.1 > Dear all. Could you please review this patch? It's been almost a month. If you have any opinions about this patch, share and comment it. Thanks. BRs.
On 9/23/24 12:54 AM, Seunghwan Baek wrote: > Could you please review this patch? It's been almost a month. > If you have any opinions about this patch, share and comment it. Thanks for the reminder. I'm not sure why this patch got overlooked but I will take a look. Bart.
On 8/29/24 2:39 AM, Seunghwan Baek wrote: > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index a6f818cdef0e..4ac1492787c2 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -10215,7 +10215,9 @@ static void ufshcd_wl_shutdown(struct device *dev) > shost_for_each_device(sdev, hba->host) { > if (sdev == hba->ufs_device_wlun) > continue; > - scsi_device_quiesce(sdev); > + mutex_lock(&sdev->state_mutex); > + scsi_device_set_state(sdev, SDEV_OFFLINE); > + mutex_unlock(&sdev->state_mutex); > } > __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM); Why to keep one scsi_device_quiesce() call and convert the other call? Please consider something like this change: diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index e808350c6774..914770dff18f 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -10134,11 +10134,10 @@ static void ufshcd_wl_shutdown(struct device *dev) /* Turn on everything while shutting down */ ufshcd_rpm_get_sync(hba); - scsi_device_quiesce(sdev); shost_for_each_device(sdev, hba->host) { - if (sdev == hba->ufs_device_wlun) - continue; - scsi_device_quiesce(sdev); + mutex_lock(&sdev->state_mutex); + scsi_device_set_state(sdev, SDEV_OFFLINE); + mutex_unlock(&sdev->state_mutex); } __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM); Thanks, Bart.
> On 8/29/24 2:39 AM, Seunghwan Baek wrote: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index a6f818cdef0e..4ac1492787c2 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -10215,7 +10215,9 @@ static void ufshcd_wl_shutdown(struct device > *dev) > > shost_for_each_device(sdev, hba->host) { > > if (sdev == hba->ufs_device_wlun) > > continue; > > - scsi_device_quiesce(sdev); > > + mutex_lock(&sdev->state_mutex); > > + scsi_device_set_state(sdev, SDEV_OFFLINE); > > + mutex_unlock(&sdev->state_mutex); > > } > > __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM); > > Why to keep one scsi_device_quiesce() call and convert the other call? > Please consider something like this change: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index > e808350c6774..914770dff18f 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -10134,11 +10134,10 @@ static void ufshcd_wl_shutdown(struct device > *dev) > > /* Turn on everything while shutting down */ > ufshcd_rpm_get_sync(hba); > - scsi_device_quiesce(sdev); > shost_for_each_device(sdev, hba->host) { > - if (sdev == hba->ufs_device_wlun) > - continue; > - scsi_device_quiesce(sdev); > + mutex_lock(&sdev->state_mutex); > + scsi_device_set_state(sdev, SDEV_OFFLINE); > + mutex_unlock(&sdev->state_mutex); > } > __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM); > > Thanks, > > Bart. That's because SSU (Start Stop Unit) command must be sent during shutdown process. If SDEV_OFFLINE is set for wlun, SSU command cannot be sent because it is rejected by the scsi layer. Therefore, we consider to set SDEV_QUIESCE for wlun, and set SDEV_OFFLINE for other lus. static int ufshcd_execute_start_stop(struct scsi_device *sdev, enum ufs_dev_pwr_mode pwr_mode, struct scsi_sense_hdr *sshdr) { const unsigned char cdb[6] = { START_STOP, 0, 0, 0, pwr_mode << 4, 0 }; const struct scsi_exec_args args = { .sshdr = sshdr, .req_flags = BLK_MQ_REQ_PM, <<< set REQ_PM flag .scmd_flags = SCMD_FAIL_IF_RECOVERING, }; return scsi_execute_cmd(sdev, cdb, REQ_OP_DRV_IN, /*buffer=*/NULL, /*bufflen=*/0, /*timeout=*/10 * HZ, /*retries=*/0, &args); } static blk_status_t scsi_device_state_check(struct scsi_device *sdev, struct request *req) { case SDEV_OFFLINE: case SDEV_TRANSPORT_OFFLINE: <<< Refuse all commands /* * If the device is offline we refuse to process any * commands. The device must be brought online * before trying any recovery commands. */ if (!sdev->offline_already) { sdev->offline_already = true; sdev_printk(KERN_ERR, sdev, "rejecting I/O to offline device\n"); } return BLK_STS_IOERR; case SDEV_QUIESCE: <<< Refuse all commands except REQ_PM flag /* * If the device is blocked we only accept power management * commands. */ if (req && WARN_ON_ONCE(!(req->rq_flags & RQF_PM))) return BLK_STS_RESOURCE; return BLK_STS_OK;
On 9/23/24 7:17 PM, Seunghwan Baek wrote:> That's because SSU (Start Stop Unit) command must be sent during > shutdown process. If SDEV_OFFLINE is set for wlun, SSU command cannot > be sent because it is rejected by the scsi layer. Therefore, we > consider to set SDEV_QUIESCE for wlun, and set SDEV_OFFLINE for other > lus. Right. Since ufshcd_wl_shutdown() is expected to stop all DMA related to the UFS host, shouldn't there be a scsi_device_quiesce(sdev) call after the __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) call? Thanks, Bart.
> On 9/23/24 7:17 PM, Seunghwan Baek wrote:> That's because SSU (Start Stop > Unit) command must be sent during > > shutdown process. If SDEV_OFFLINE is set for wlun, SSU command cannot > > be sent because it is rejected by the scsi layer. Therefore, we > > consider to set SDEV_QUIESCE for wlun, and set SDEV_OFFLINE for other > > lus. > Right. Since ufshcd_wl_shutdown() is expected to stop all DMA related to > the UFS host, shouldn't there be a scsi_device_quiesce(sdev) call after > the __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) call? > > Thanks, > > Bart. Yes. __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) should be called after scsi_device_quiesce(sdev). Generally, the SSU command is the last command before UFS power off. Therefore, if __ufshcd_wl_suspend is performed before scsi_device_quiesce, other commands may be performed after the SSU command and UFS may not guarantee the operation of the SSU command, which may cause other problems. This order must be guaranteed. And with SDEV_QUIESCE, deadlock issue cannot be avoided due to requeue. We need to return the i/o error with SDEV_OFFLINE to avoid the mentioned deadlock problem.
> > On 9/23/24 7:17 PM, Seunghwan Baek wrote:> That's because SSU (Start > > Stop > > Unit) command must be sent during > > > shutdown process. If SDEV_OFFLINE is set for wlun, SSU command > > > cannot be sent because it is rejected by the scsi layer. Therefore, > > > we consider to set SDEV_QUIESCE for wlun, and set SDEV_OFFLINE for > > > other lus. > > Right. Since ufshcd_wl_shutdown() is expected to stop all DMA related > > to the UFS host, shouldn't there be a scsi_device_quiesce(sdev) call > > after the __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) call? > > > > Thanks, > > > > Bart. > > Yes. __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) should be called after > scsi_device_quiesce(sdev). Generally, the SSU command is the last command > before UFS power off. Therefore, if __ufshcd_wl_suspend is performed > before scsi_device_quiesce, other commands may be performed after the SSU > command and UFS may not guarantee the operation of the SSU command, which > may cause other problems. This order must be guaranteed. > > And with SDEV_QUIESCE, deadlock issue cannot be avoided due to requeue. > We need to return the i/o error with SDEV_OFFLINE to avoid the mentioned > deadlock problem. (+ more CC added.) Dear All. Could you please update for this patch? If you have any opinions about this patch, share and comment it. Thanks. BRs.
On 8/29/24 2:39 AM, Seunghwan Baek wrote: > There is a history of dead lock as reboot is performed at the beginning of > booting. SDEV_QUIESCE was set for all lu's scsi_devices by ufs shutdown, > and at that time the audio driver was waiting on blk_mq_submit_bio holding > a mutex_lock while reading the fw binary. After that, a deadlock issue > occurred while audio driver shutdown was waiting for mutex_unlock of > blk_mq_submit_bio. To solve this, set SDEV_OFFLINE for all lus except wlun, > so that any i/o that comes down after a ufs shutdown will return an error. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> > > On 9/23/24 7:17 PM, Seunghwan Baek wrote:> That's because SSU (Start > > > Stop > > > Unit) command must be sent during > > > > shutdown process. If SDEV_OFFLINE is set for wlun, SSU command > > > > cannot be sent because it is rejected by the scsi layer. > > > > Therefore, we consider to set SDEV_QUIESCE for wlun, and set > > > > SDEV_OFFLINE for other lus. > > > Right. Since ufshcd_wl_shutdown() is expected to stop all DMA > > > related to the UFS host, shouldn't there be a > > > scsi_device_quiesce(sdev) call after the __ufshcd_wl_suspend(hba, > UFS_SHUTDOWN_PM) call? > > > > > > Thanks, > > > > > > Bart. > > > > Yes. __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM) should be called after > > scsi_device_quiesce(sdev). Generally, the SSU command is the last > > command before UFS power off. Therefore, if __ufshcd_wl_suspend is > > performed before scsi_device_quiesce, other commands may be performed > > after the SSU command and UFS may not guarantee the operation of the > > SSU command, which may cause other problems. This order must be > guaranteed. > > > > And with SDEV_QUIESCE, deadlock issue cannot be avoided due to requeue. > > We need to return the i/o error with SDEV_OFFLINE to avoid the > > mentioned deadlock problem. > > (+ more CC added.) > Dear All. > > Could you please update for this patch? > If you have any opinions about this patch, share and comment it. > > Thanks. > BRs. Looks good to me. Thanks. Kiwoong Kim.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index a6f818cdef0e..4ac1492787c2 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -10215,7 +10215,9 @@ static void ufshcd_wl_shutdown(struct device *dev) shost_for_each_device(sdev, hba->host) { if (sdev == hba->ufs_device_wlun) continue; - scsi_device_quiesce(sdev); + mutex_lock(&sdev->state_mutex); + scsi_device_set_state(sdev, SDEV_OFFLINE); + mutex_unlock(&sdev->state_mutex); } __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
There is a history of dead lock as reboot is performed at the beginning of booting. SDEV_QUIESCE was set for all lu's scsi_devices by ufs shutdown, and at that time the audio driver was waiting on blk_mq_submit_bio holding a mutex_lock while reading the fw binary. After that, a deadlock issue occurred while audio driver shutdown was waiting for mutex_unlock of blk_mq_submit_bio. To solve this, set SDEV_OFFLINE for all lus except wlun, so that any i/o that comes down after a ufs shutdown will return an error. [ 31.907781]I[0: swapper/0: 0] 1 130705007 1651079834 11289729804 0 D( 2) 3 ffffff882e208000 * init [device_shutdown] [ 31.907793]I[0: swapper/0: 0] Mutex: 0xffffff8849a2b8b0: owner[0xffffff882e28cb00 kworker/6:0 :49] [ 31.907806]I[0: swapper/0: 0] Call trace: [ 31.907810]I[0: swapper/0: 0] __switch_to+0x174/0x338 [ 31.907819]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc [ 31.907826]I[0: swapper/0: 0] schedule+0x7c/0xe8 [ 31.907834]I[0: swapper/0: 0] schedule_preempt_disabled+0x24/0x40 [ 31.907842]I[0: swapper/0: 0] __mutex_lock+0x408/0xdac [ 31.907849]I[0: swapper/0: 0] __mutex_lock_slowpath+0x14/0x24 [ 31.907858]I[0: swapper/0: 0] mutex_lock+0x40/0xec [ 31.907866]I[0: swapper/0: 0] device_shutdown+0x108/0x280 [ 31.907875]I[0: swapper/0: 0] kernel_restart+0x4c/0x11c [ 31.907883]I[0: swapper/0: 0] __arm64_sys_reboot+0x15c/0x280 [ 31.907890]I[0: swapper/0: 0] invoke_syscall+0x70/0x158 [ 31.907899]I[0: swapper/0: 0] el0_svc_common+0xb4/0xf4 [ 31.907909]I[0: swapper/0: 0] do_el0_svc+0x2c/0xb0 [ 31.907918]I[0: swapper/0: 0] el0_svc+0x34/0xe0 [ 31.907928]I[0: swapper/0: 0] el0t_64_sync_handler+0x68/0xb4 [ 31.907937]I[0: swapper/0: 0] el0t_64_sync+0x1a0/0x1a4 [ 31.908774]I[0: swapper/0: 0] 49 0 11960702 11236868007 0 D( 2) 6 ffffff882e28cb00 * kworker/6:0 [__bio_queue_enter] [ 31.908783]I[0: swapper/0: 0] Call trace: [ 31.908788]I[0: swapper/0: 0] __switch_to+0x174/0x338 [ 31.908796]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc [ 31.908803]I[0: swapper/0: 0] schedule+0x7c/0xe8 [ 31.908811]I[0: swapper/0: 0] __bio_queue_enter+0xb8/0x178 [ 31.908818]I[0: swapper/0: 0] blk_mq_submit_bio+0x194/0x67c [ 31.908827]I[0: swapper/0: 0] __submit_bio+0xb8/0x19c Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun") Cc: stable@vger.kernel.org Signed-off-by: Seunghwan Baek <sh8267.baek@samsung.com> --- drivers/ufs/core/ufshcd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)