[v14,0/2] Enable power management for ufs wlun

Message ID cover.1617143113.git.asutoshd@codeaurora.org

Message

Asutosh Das (asd) March 30, 2021, 10:31 p.m. UTC
This patch attempts to fix a deadlock in ufs while sending SSU.
Recently, blk_queue_enter() added a check to not process requests if the
queue is suspended. That leads to a resume of the associated device which
is suspended. In ufs, that device is ufs device wlun and its parent is
ufs_hba. This resume tries to resume ufs device wlun which in turn tries
to resume ufs_hba, which is already in the process of suspending, thus
causing a deadlock.

This patch takes care of:
* Suspending the ufs device lun only after all other luns are suspended
* Sending SSU during ufs device wlun suspend
* Clearing uac for rpmb and ufs device wlun
* Not sending commands to the device during host suspend
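
The ordering in the first item above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
code in the patch: toy_dev, toy_init(), toy_suspend() and toy_resume() are
hypothetical names, not kernel APIs, and the parent/child and
supplier/consumer relationships are collapsed into a single "supplier"
pointer for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device; NOT the kernel's struct device. */
struct toy_dev {
	const char *name;
	struct toy_dev *supplier;	/* device this one depends on, or NULL */
	int active_consumers;		/* dependents that are currently resumed */
	bool suspended;
};

/* Devices start out active, so each one pins its supplier. */
static void toy_init(struct toy_dev *dev, const char *name,
		     struct toy_dev *supplier)
{
	assert(dev != NULL);
	dev->name = name;
	dev->supplier = supplier;
	dev->active_consumers = 0;
	dev->suspended = false;
	if (supplier)
		supplier->active_consumers++;
}

/* Returns 0 on success, -1 if an active consumer still needs this device. */
static int toy_suspend(struct toy_dev *dev)
{
	if (dev->suspended)
		return 0;
	if (dev->active_consumers > 0)
		return -1;
	/* In this model, the device wlun would send SSU at this point. */
	dev->suspended = true;
	if (dev->supplier)
		dev->supplier->active_consumers--;
	return 0;
}

/* Resuming a device first resumes everything it depends on. */
static void toy_resume(struct toy_dev *dev)
{
	if (!dev->suspended)
		return;
	if (dev->supplier) {
		toy_resume(dev->supplier);
		dev->supplier->active_consumers++;
	}
	dev->suspended = false;
}
```

With a chain of ufs_hba, then the device wlun, then two ordinary luns,
toy_suspend() on the device wlun fails until both luns are suspended, and
the hba can only suspend after the device wlun has. Calling toy_resume() on
one lun brings back the hba first, then the device wlun, then the lun.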

v13 -> v14:
- Addressed Adrian's comments
  * Rebased it on top of scsi-next
  * Added scsi_autopm_[get/put] to ufs_debugfs_[get/put]_user_access()
  * Resume the device in ufshcd_remove()
  * Unregister ufs_rpmb_wlun before ufs_dev_wlun
  * hba->shutting_down moved to ufshcd_wl_shutdown()

v12 -> v13:
- Addressed Adrian's comments
  * Paired pm_runtime_get_noresume() with pm_runtime_put()
  * no rpm_autosuspend for ufs device wlun
  * Moved runtime-pm init functionality to ufshcd_wl_probe()
- Addressed Bart's comments
  * Expanded abbrevs in commit message

v11 -> v12:
- Addressed Adrian's comments
  * Fixed ahit for Mediatek driver
  * Fixed error handling in ufshcd_core_init()
  * Tested this patch and the issue is still seen.

v10 -> v11:
- Fixed supplier suspending before consumer race
- Addressed Adrian's comments
  * Added proper resume/suspend cb to ufshcd_auto_hibern8_update()
  * Cosmetic changes to ufshcd-pci.c
  * Cleaned up ufshcd_system_suspend()
  * Added ufshcd_debugfs_eh_exit to ufshcd_core_init()

v9 -> v10:
- Addressed Adrian's comments
  * Moved suspend/resume vops to __ufshcd_wl_[suspend/resume]()
  * Added correct resume in ufs_bsg

v8 -> v9:
- Addressed Adrian's comments
  * Moved link transition to __ufshcd_wl_[suspend/resume]()
  * Fixed the other minor comments

v7 -> v8:
- Addressed Adrian's comments
  * Removed separate autosuspend delay for ufs-device lun
  * Fixed the ee handler getting scheduled during pm
  * Always runtime resume in suspend_prepare()
  * Added CONFIG_PM_SLEEP where needed
  
v6 -> v7:
  * Resume the ufs device before shutting it down

v5 -> v6:
- Addressed Adrian's comments
  * Added complete() cb
  * Added suspend_prepare() and complete() to all drivers
  * Moved suspend_prepare() and complete() to ufshcd
  * .poweroff() uses ufshcd_wl_poweroff()
  * Removed several forward declarations
  * Moved scsi_register_driver() to ufshcd_core_init()

v4 -> v5:
- Addressed Adrian's comments
  * Used the rpmb driver contributed by Adrian
  * Runtime-resume the ufs device during suspend to honor spm-lvl
  * Unregister the scsi_driver in ufshcd_remove()
  * Currently shutdown() puts the ufs device to power-down mode
    so, just removed ufshcd_pci_poweroff()
  * Quiesce the scsi device during shutdown instead of remove

v3 RFC -> v4:
- Addressed Bart's comments
  * Except that I didn't get any checkpatch failures
- Addressed Avri's comments
- Addressed Adrian's comments
  * Added a check for deepsleep power mode
  * Removed a couple of forward declarations
  * Didn't separate the scsi drivers because in rpmb case it just sends uac
    in resume and it seemed pretty neat to me.
- Added sysfs changes to resume the devices before accessing

Asutosh Das (2):
  scsi: ufs: Enable power management for wlun
  ufs: sysfs: Resume the proper scsi device

 drivers/scsi/ufs/cdns-pltfrm.c     |   2 +
 drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
 drivers/scsi/ufs/ufs-debugfs.c     |   6 +-
 drivers/scsi/ufs/ufs-debugfs.h     |   2 +-
 drivers/scsi/ufs/ufs-exynos.c      |   2 +
 drivers/scsi/ufs/ufs-hisi.c        |   2 +
 drivers/scsi/ufs/ufs-mediatek.c    |  12 +-
 drivers/scsi/ufs/ufs-qcom.c        |   2 +
 drivers/scsi/ufs/ufs-sysfs.c       |  30 +-
 drivers/scsi/ufs/ufs_bsg.c         |   6 +-
 drivers/scsi/ufs/ufshcd-pci.c      |  36 +--
 drivers/scsi/ufs/ufshcd.c          | 636 ++++++++++++++++++++++++++-----------
 drivers/scsi/ufs/ufshcd.h          |   6 +
 include/trace/events/ufs.h         |  20 ++
 14 files changed, 521 insertions(+), 243 deletions(-)

Comments

Adrian Hunter March 31, 2021, 6:19 p.m. UTC | #1
On 31/03/21 1:31 am, Asutosh Das wrote:
> During runtime-suspend of ufs host, the scsi devices are
> already suspended and so are the queues associated with them.
> But the ufs host sends SSU (START_STOP_UNIT) to wlun
> during its runtime-suspend.
> During the process blk_queue_enter checks if the queue is not in
> suspended state. If so, it waits for the queue to resume, and never
> comes out of it.
> The commit
> (d55d15a33: scsi: block: Do not accept any requests while suspended)
> adds the check if the queue is in suspended state in blk_queue_enter().
> 
> Call trace:
>  __switch_to+0x174/0x2c4
>  __schedule+0x478/0x764
>  schedule+0x9c/0xe0
>  blk_queue_enter+0x158/0x228
>  blk_mq_alloc_request+0x40/0xa4
>  blk_get_request+0x2c/0x70
>  __scsi_execute+0x60/0x1c4
>  ufshcd_set_dev_pwr_mode+0x124/0x1e4
>  ufshcd_suspend+0x208/0x83c
>  ufshcd_runtime_suspend+0x40/0x154
>  ufshcd_pltfrm_runtime_suspend+0x14/0x20
>  pm_generic_runtime_suspend+0x28/0x3c
>  __rpm_callback+0x80/0x2a4
>  rpm_suspend+0x308/0x614
>  rpm_idle+0x158/0x228
>  pm_runtime_work+0x84/0xac
>  process_one_work+0x1f0/0x470
>  worker_thread+0x26c/0x4c8
>  kthread+0x13c/0x320
>  ret_from_fork+0x10/0x18
> 
> Fix this by registering ufs device wlun as a scsi driver and
> registering it for block runtime-pm. Also make this as a
> supplier for all other luns. That way, this device wlun
> suspends after all the consumers and resumes after
> hba resumes.
> 
> Co-developed-by: Can Guo <cang@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> ---

Looks good but still doesn't seem to be based on the latest tree.

Also came across the issue below:

<SNIP>

> +#ifdef CONFIG_PM_SLEEP
> +static int ufshcd_wl_poweroff(struct device *dev)
> +{
> +	ufshcd_wl_shutdown(dev);

This turned out to be wrong.  This is a PM op and SCSI has already
quiesced the sdev's.  All that is needed is:

	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);

> +	return 0;
> +}
> +#endif

Asutosh Das (asd) April 1, 2021, 1:40 a.m. UTC | #2
On 3/31/2021 11:19 AM, Adrian Hunter wrote:
> On 31/03/21 1:31 am, Asutosh Das wrote:
>> During runtime-suspend of ufs host, the scsi devices are
>> already suspended and so are the queues associated with them.
>> But the ufs host sends SSU (START_STOP_UNIT) to wlun
>> during its runtime-suspend.
>> During the process blk_queue_enter checks if the queue is not in
>> suspended state. If so, it waits for the queue to resume, and never
>> comes out of it.
>> The commit
>> (d55d15a33: scsi: block: Do not accept any requests while suspended)
>> adds the check if the queue is in suspended state in blk_queue_enter().
>>
>> Call trace:
>>   __switch_to+0x174/0x2c4
>>   __schedule+0x478/0x764
>>   schedule+0x9c/0xe0
>>   blk_queue_enter+0x158/0x228
>>   blk_mq_alloc_request+0x40/0xa4
>>   blk_get_request+0x2c/0x70
>>   __scsi_execute+0x60/0x1c4
>>   ufshcd_set_dev_pwr_mode+0x124/0x1e4
>>   ufshcd_suspend+0x208/0x83c
>>   ufshcd_runtime_suspend+0x40/0x154
>>   ufshcd_pltfrm_runtime_suspend+0x14/0x20
>>   pm_generic_runtime_suspend+0x28/0x3c
>>   __rpm_callback+0x80/0x2a4
>>   rpm_suspend+0x308/0x614
>>   rpm_idle+0x158/0x228
>>   pm_runtime_work+0x84/0xac
>>   process_one_work+0x1f0/0x470
>>   worker_thread+0x26c/0x4c8
>>   kthread+0x13c/0x320
>>   ret_from_fork+0x10/0x18
>>
>> Fix this by registering ufs device wlun as a scsi driver and
>> registering it for block runtime-pm. Also make this as a
>> supplier for all other luns. That way, this device wlun
>> suspends after all the consumers and resumes after
>> hba resumes.
>>
>> Co-developed-by: Can Guo <cang@codeaurora.org>
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> ---
> 
Hi Adrian
Thanks for the comments.
> Looks good but still doesn't seem to be based on the latest tree.
> 
Umm, it's based on the below:
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
Branch: refs/heads/for-next

The top most change is e27f3c8 on 27th March'21.
Which tree are you referring to that'd be latest?

> Also came across the issue below:
> 
> <SNIP>
> 
>> +#ifdef CONFIG_PM_SLEEP
>> +static int ufshcd_wl_poweroff(struct device *dev)
>> +{
>> +	ufshcd_wl_shutdown(dev);
> 
> This turned out to be wrong.  This is a PM op and SCSI has already
> quiesced the sdev's.  All that is needed is:

Ok. I'll fix it in the next version.

> 
> 	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
> 
>> +	return 0;
>> +}
>> +#endif
Adrian Hunter April 1, 2021, 7:45 a.m. UTC | #3
On 1/04/21 4:40 am, Asutosh Das (asd) wrote:
> On 3/31/2021 11:19 AM, Adrian Hunter wrote:
>> On 31/03/21 1:31 am, Asutosh Das wrote:
>>> During runtime-suspend of ufs host, the scsi devices are
>>> already suspended and so are the queues associated with them.
>>> But the ufs host sends SSU (START_STOP_UNIT) to wlun
>>> during its runtime-suspend.
>>> During the process blk_queue_enter checks if the queue is not in
>>> suspended state. If so, it waits for the queue to resume, and never
>>> comes out of it.
>>> The commit
>>> (d55d15a33: scsi: block: Do not accept any requests while suspended)
>>> adds the check if the queue is in suspended state in blk_queue_enter().
>>>
>>> Call trace:
>>>   __switch_to+0x174/0x2c4
>>>   __schedule+0x478/0x764
>>>   schedule+0x9c/0xe0
>>>   blk_queue_enter+0x158/0x228
>>>   blk_mq_alloc_request+0x40/0xa4
>>>   blk_get_request+0x2c/0x70
>>>   __scsi_execute+0x60/0x1c4
>>>   ufshcd_set_dev_pwr_mode+0x124/0x1e4
>>>   ufshcd_suspend+0x208/0x83c
>>>   ufshcd_runtime_suspend+0x40/0x154
>>>   ufshcd_pltfrm_runtime_suspend+0x14/0x20
>>>   pm_generic_runtime_suspend+0x28/0x3c
>>>   __rpm_callback+0x80/0x2a4
>>>   rpm_suspend+0x308/0x614
>>>   rpm_idle+0x158/0x228
>>>   pm_runtime_work+0x84/0xac
>>>   process_one_work+0x1f0/0x470
>>>   worker_thread+0x26c/0x4c8
>>>   kthread+0x13c/0x320
>>>   ret_from_fork+0x10/0x18
>>>
>>> Fix this by registering ufs device wlun as a scsi driver and
>>> registering it for block runtime-pm. Also make this as a
>>> supplier for all other luns. That way, this device wlun
>>> suspends after all the consumers and resumes after
>>> hba resumes.
>>>
>>> Co-developed-by: Can Guo <cang@codeaurora.org>
>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>>> ---
>>
> Hi Adrian
> Thanks for the comments.
>> Looks good but still doesn't seem to be based on the latest tree.
>>
> Umm, it's based on the below:
> git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> Branch: refs/heads/for-next
> 
> The top most change is e27f3c8 on 27th March'21.
> Which tree are you referring to that'd be latest?

Dunno, but that seems to be missing:

  commit aa53f580e67b49ec5f4d9bd1de81eb9eb0dc079f
  Author: Can Guo <cang@codeaurora.org>
  Date:   Tue Feb 23 21:36:47 2021 -0800

    scsi: ufs: Minor adjustments to error handling

which is in v5.12-rc3

> 
>> Also came across the issue below:
>>
>> <SNIP>
>>
>>> +#ifdef CONFIG_PM_SLEEP
>>> +static int ufshcd_wl_poweroff(struct device *dev)
>>> +{
>>> +    ufshcd_wl_shutdown(dev);
>>
>> This turned out to be wrong.  This is a PM op and SCSI has already
>> quiesced the sdev's.  All that is needed is:
>
> Ok. I'll fix it in the next version.
>
>>
>>     __ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
>>
>>> +    return 0;
>>> +}
>>> +#endif
> 
>