Message ID: 20221208072520.26210-1-peter.wang@mediatek.com
State:      New
Series:     [v6] ufs: core: wlun suspend SSU/enter hibern8 fail recovery
Peter,

> When SSU/enter hibern8 fails in the wlun suspend flow, trigger the
> error handler and return busy to break the suspend. If not, the wlun
> runtime PM status becomes error and the consumer will get stuck in
> runtime suspend status.

Applied to 6.2/scsi-staging, thanks!
> Applied to 6.2/scsi-staging, thanks!
There is an interesting side effect of the patch in this iteration
(which I am not sure was present in the past iteration I tried):
If the device auto suspends while running purge - the controller is
seemingly reset and thus the purge is aborted (with no patch at all
it hangs).
That might be ok behaviour though - it will just make it an explicit
requirement to disable runtime suspend during the management
operation.
localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
bPurgeStatus := 0x00
[ 25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for
power mode: 2, result 2
[ 25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
[ 25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional
sense information
[ 25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend
failed: -16
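The "explicit requirement to disable runtime suspend during the management operation" can be met from user space with the generic runtime-PM sysfs knob. A minimal sketch, assuming a hypothetical sysfs device path; `purge_with_rpm_pinned` is a name invented for this example:

```shell
# Sketch only: pin the device active via the standard runtime-PM
# control file while a management operation (e.g. purge) runs, then
# restore autosuspend. "$1" is the device's sysfs directory.
purge_with_rpm_pinned() {
  dev="$1"; shift
  echo on > "$dev/power/control"     # forbid runtime suspend
  "$@"                               # run the management operation
  rc=$?
  echo auto > "$dev/power/control"   # re-enable autosuspend
  return $rc
}

# Hypothetical usage (device path is an assumption):
# purge_with_rpm_pinned /sys/bus/scsi/devices/0:0:0:49488 \
#     ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
```

This sidesteps the question of what the kernel should do, at the cost of requiring every purge initiator to remember the wrapping.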
On Wed, 2022-12-21 at 08:00 +1100, Daniil Lunev wrote:
> > Applied to 6.2/scsi-staging, thanks!
>
> There is an interesting side effect of the patch in this iteration
> (which I am not sure was present in the past iteration I tried):
> If the device auto suspends while running purge - the controller is
> seemingly reset and thus the purge is aborted (with no patch at all
> it hangs).
> That might be ok behaviour though - it will just make it an explicit
> requirement to disable runtime suspend during the management
> operation.

Hi Daniil,

I am not sure if this is the same reason the SSU (sleep) fails. But
without this patch, when a purge is ongoing, system IO will hang, which
is no better.

I have another idea about rpm and purge, to disable runtime suspend
while a purge operation is ongoing:

1. Disable rpm when fPurgeEnable is set, poll until bPurgeStatus
   becomes 0, then re-enable rpm. But polling bPurgeStatus will extend
   the rpm timer anyway, so we don't really need to disable rpm, right?

2. Check bPurgeStatus on entering runtime suspend, and return -EBUSY if
   bPurgeStatus is not 0 to break the suspend. This is the correct
   design to tell the rpm framework that the driver is busy with the
   purge and that suspend is inappropriate. But that would be similar
   to the current flow, which returns -EBUSY when the SSU fails?

So, with the current design, if the purge initiator does not want to
see rpm -EBUSY, he should poll bPurgeStatus.

What do you think?

Thanks.
BR,
Peter
On Wed, Dec 21, 2022 at 4:59 PM Peter Wang (王信友)
<peter.wang@mediatek.com> wrote:
> But without this patch, when a purge is ongoing, system IO will hang,
> which is no better.

Yes, that is why I am just pointing this out as a matter of fact, not
as a bug. It is arguable whether resetting the controller in the
deadlock situation is the proper thing to do, but it might be the next
best thing, so I don't argue that either.

> So, with the current design, if the purge initiator does not want to
> see rpm -EBUSY, he should poll bPurgeStatus.
> What do you think?

I am actually not sure management operations extend the timeout - they
go through the bsg interface, and I am not sure it properly resets the
timeouts on all possible nexus interfaces; I need to check that. But
even if it does, there are two problems:

* If you make the kernel poll that parameter, the application level
  will miss the completion code (since after querying completion once,
  it will return Not Started afterwards).
* Application polling is race prone. We set runtime suspend to 100ms,
  so depending on scheduling quirks it may miss the event.

--Daniil
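For completeness, the user-space polling Peter suggests would look roughly like the sketch below. It only illustrates the shape of the loop; as Daniil notes, it is race prone against a short autosuspend timeout. `wait_for_purge` is a name invented here, and the `ufs-utils` pipeline in the usage comment is an assumption modeled on the commands earlier in the thread:

```shell
# Sketch: poll a status-reporting command until it prints 0x00
# (purge complete) or the retry budget runs out.
wait_for_purge() {
  query="$1"
  tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    status=$(sh -c "$query")
    [ "$status" = "0x00" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Hypothetical usage, extracting the value from the attr dump shown above:
# wait_for_purge "ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | \
#     sed -n 's/.*bPurgeStatus := //p'"
```

Note Daniil's first objection applies directly: if the kernel also queries bPurgeStatus, a loop like this may read Not Started after the kernel has already consumed the completion value.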
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index b1f59a5fe632..31ed3fdb5266 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6070,6 +6070,14 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 	}
 }
 
+static void ufshcd_force_error_recovery(struct ufs_hba *hba)
+{
+	spin_lock_irq(hba->host->host_lock);
+	hba->force_reset = true;
+	ufshcd_schedule_eh_work(hba);
+	spin_unlock_irq(hba->host->host_lock);
+}
+
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
 {
 	down_write(&hba->clk_scaling_lock);
@@ -9049,6 +9057,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
 	if (!hba->dev_info.b_rpm_dev_flush_capable) {
 		ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
+		if (ret && pm_op != UFS_SHUTDOWN_PM) {
+			/*
+			 * If return err in suspend flow, IO will hang.
+			 * Trigger error handler and break suspend for
+			 * error recovery.
+			 */
+			ufshcd_force_error_recovery(hba);
+			ret = -EBUSY;
+		}
 		if (ret)
 			goto enable_scaling;
 	}
@@ -9060,6 +9077,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	 */
 	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
 	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
+	if (ret && pm_op != UFS_SHUTDOWN_PM) {
+		/*
+		 * If return err in suspend flow, IO will hang.
+		 * Trigger error handler and break suspend for
+		 * error recovery.
+		 */
+		ufshcd_force_error_recovery(hba);
+		ret = -EBUSY;
+	}
 	if (ret)
 		goto set_dev_active;