mbox series

[v17,0/2] Enable power management for ufs wlun

Message ID cover.1617893198.git.asutoshd@codeaurora.org
Headers show
Series Enable power management for ufs wlun | expand

Message

Asutosh Das (asd) April 8, 2021, 2:49 p.m. UTC
This patch attempts to fix a deadlock in ufs while sending SSU.
Recently, blk_queue_enter() added a check to not process requests if the
queue is suspended. That leads to a resume of the associated device which
is suspended. In ufs, that device is ufs device wlun and it's parent is
ufs_hba. This resume tries to resume ufs device wlun which in turn tries
to resume ufs_hba, which is already in the process of suspending, thus
causing a deadlock.

This patch takes care of:
* Suspending the ufs device lun only after all other luns are suspended
* Sending SSU during ufs device wlun suspend
* Clearing uac for rpmb and ufs device wlun
* Not sending commands to the device during host suspend

v16 -> v17:
- Addressed Adrian's & Daejun's comments

v15 -> v16:
- Brought back the missing changes
  * Added scsi_autopm_[get/put] to ufs_debugfs[get/put]_user_access()
  * Fix ufshcd_wl_poweroff()

v14 -> v15:
- Rebased on 5.13/scsi-staging

v13 -> v14:
- Addressed Adrian's comments
  * Rebased it on top of scsi-next
  * Added scsi_autopm_[get/put] to ufs_debugfs[get/put]_user_access()
  * Resume the device in ufshcd_remove()
  * Unregister ufs_rpmb_wlun before ufs_dev_wlun
  * hba->shutting_down moved to ufshcd_wl_shutdown()

v12 -> v13:
- Addressed Adrian's comments
  * Paired pm_runtime_get_noresume() with pm_runtime_put()
  * no rpm_autosuspend for ufs device wlun
  * Moved runtime-pm init functionality to ufshcd_wl_probe()
- Addressed Bart's comments
  * Expanded abbrevs in commit message

v11 -> v12:
- Addressed Adrian's comments
  * Fixed ahit for Mediatek driver
  * Fixed error handling in ufshcd_core_init()
  * Tested this patch and the issue is still seen.

v10 -> v11:
- Fixed supplier suspending before consumer race
- Addressed Adrian's comments
  * Added proper resume/suspend cb to ufshcd_auto_hibern8_update()
  * Cosmetic changes to ufshcd-pci.c
  * Cleaned up ufshcd_system_suspend()
  * Added ufshcd_debugfs_eh_exit to ufshcd_core_init()

v9 -> v10:
- Addressed Adrian's comments
  * Moved suspend/resume vops to __ufshcd_wl_[suspend/resume]()
  * Added correct resume in ufs_bsg

v8 -> v9:
- Addressed Adrian's comments
  * Moved link transition to __ufshcd_wl_[suspend/resume]()
  * Fixed the other minor comments

v7 -> v8:
- Addressed Adrian's comments
  * Removed separate autosuspend delay for ufs-device lun
  * Fixed the ee handler getting scheduled during pm
  * Always runtime resume in suspend_prepare()
  * Added CONFIG_PM_SLEEP where needed
  
v6 -> v7:
  * Resume the ufs device before shutting it down

v5 -> v6:
- Addressed Adrian's comments
  * Added complete() cb
  * Added suspend_prepare() and complete() to all drivers
  * Moved suspend_prepare() and complete() to ufshcd
  * .poweroff() uses ufhcd_wl_poweroff()
  * Removed several forward declarations
  * Moved scsi_register_driver() to ufshcd_core_init()

v4 -> v5:
- Addressed Adrian's comments
  * Used the rpmb driver contributed by Adrian
  * Runtime-resume the ufs device during suspend to honor spm-lvl
  * Unregister the scsi_driver in ufshcd_remove()
  * Currently shutdown() puts the ufs device to power-down mode
    so, just removed ufshcd_pci_poweroff()
  * Quiesce the scsi device during shutdown instead of remove

v3 RFC -> v4:
- Addressed Bart's comments
  * Except that I didn't get any checkpatch failures
- Addressed Avri's comments
- Addressed Adrian's comments
  * Added a check for deepsleep power mode
  * Removed a couple of forward declarations
  * Didn't separate the scsi drivers because in rpmb case it just sends uac
    in resume and it seemed pretty neat to me.
- Added sysfs changes to resume the devices before accessing


Asutosh Das (2):
  scsi: ufs: Enable power management for wlun
  ufs: sysfs: Resume the proper scsi device

 drivers/scsi/ufs/cdns-pltfrm.c     |   2 +
 drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
 drivers/scsi/ufs/ufs-debugfs.c     |   6 +-
 drivers/scsi/ufs/ufs-debugfs.h     |   2 +-
 drivers/scsi/ufs/ufs-exynos.c      |   2 +
 drivers/scsi/ufs/ufs-hisi.c        |   2 +
 drivers/scsi/ufs/ufs-mediatek.c    |  12 +-
 drivers/scsi/ufs/ufs-qcom.c        |   2 +
 drivers/scsi/ufs/ufs-sysfs.c       |  30 +-
 drivers/scsi/ufs/ufs_bsg.c         |   6 +-
 drivers/scsi/ufs/ufshcd-pci.c      |  36 +--
 drivers/scsi/ufs/ufshcd.c          | 642 ++++++++++++++++++++++++++-----------
 drivers/scsi/ufs/ufshcd.h          |   6 +
 include/trace/events/ufs.h         |  20 ++
 14 files changed, 526 insertions(+), 244 deletions(-)

Comments

Adrian Hunter April 9, 2021, 10:07 a.m. UTC | #1
On 9/04/21 5:27 am, Daejun Park wrote:
> Hi Asutosh Das,

> 

>> During runtime-suspend of ufs host, the scsi devices are

>> already suspended and so are the queues associated with them.

>> But the ufs host sends SSU (START_STOP_UNIT) to wlun

>> during its runtime-suspend.

>> During the process blk_queue_enter checks if the queue is not in

>> suspended state. If so, it waits for the queue to resume, and never

>> comes out of it.

>> The commit

>> (d55d15a33: scsi: block: Do not accept any requests while suspended)

>> adds the check if the queue is in suspended state in blk_queue_enter().

>>

>> Call trace:

>> __switch_to+0x174/0x2c4

>> __schedule+0x478/0x764

>> schedule+0x9c/0xe0

>> blk_queue_enter+0x158/0x228

>> blk_mq_alloc_request+0x40/0xa4

>> blk_get_request+0x2c/0x70

>> __scsi_execute+0x60/0x1c4

>> ufshcd_set_dev_pwr_mode+0x124/0x1e4

>> ufshcd_suspend+0x208/0x83c

>> ufshcd_runtime_suspend+0x40/0x154

>> ufshcd_pltfrm_runtime_suspend+0x14/0x20

>> pm_generic_runtime_suspend+0x28/0x3c

>> __rpm_callback+0x80/0x2a4

>> rpm_suspend+0x308/0x614

>> rpm_idle+0x158/0x228

>> pm_runtime_work+0x84/0xac

>> process_one_work+0x1f0/0x470

>> worker_thread+0x26c/0x4c8

>> kthread+0x13c/0x320

>> ret_from_fork+0x10/0x18

>>

>> Fix this by registering ufs device wlun as a scsi driver and

>> registering it for block runtime-pm. Also make this as a

>> supplier for all other luns. That way, this device wlun

>> suspends after all the consumers and resumes after

>> hba resumes.

>>

>> Co-developed-by: Can Guo <cang@codeaurora.org>

>> Signed-off-by: Can Guo <cang@codeaurora.org>

>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>

>> ---

>> drivers/scsi/ufs/cdns-pltfrm.c     |   2 +

>> drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +

>> drivers/scsi/ufs/ufs-debugfs.c     |   6 +-

>> drivers/scsi/ufs/ufs-debugfs.h     |   2 +-

>> drivers/scsi/ufs/ufs-exynos.c      |   2 +

>> drivers/scsi/ufs/ufs-hisi.c        |   2 +

>> drivers/scsi/ufs/ufs-mediatek.c    |  12 +-

>> drivers/scsi/ufs/ufs-qcom.c        |   2 +

>> drivers/scsi/ufs/ufs_bsg.c         |   6 +-

>> drivers/scsi/ufs/ufshcd-pci.c      |  36 +--

>> drivers/scsi/ufs/ufshcd.c          | 642 ++++++++++++++++++++++++++-----------

>> drivers/scsi/ufs/ufshcd.h          |   6 +

>> include/trace/events/ufs.h         |  20 ++

>> 13 files changed, 509 insertions(+), 231 deletions(-)

> 

> In this patch, you changed pm_runtime_{get, put}_sync to scsi_autopm_{get, put}_device.

> But, scsi_autopm_get_device() calls pm_runtime_put_sync() in case of error

> of pm_runtime_get_sync(). So, pm_runtime_put_sync() can be called twice if

> scsi_autopm_get_device has error.


Also it might be tidy to make wrappers e.g.

static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)
{
    return pm_runtime_get_sync(&hba->sdev_ufs_device->sdev_gendev);
}
   
static inline int ufshcd_rpm_put(struct ufs_hba *hba)
{
    return pm_runtime_put(&hba->sdev_ufs_device->sdev_gendev);
}

static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba)
{
    return pm_runtime_put_sync(&hba->sdev_ufs_device->sdev_gendev);
} 

And also consider matching: e.g.

	pm_runtime_put(hba->dev)	to	ufshcd_rpm_put(hba)
	pm_runtime_put_sync(hba->dev)	to	ufshcd_rpm_put_sync(hba)
Asutosh Das (asd) April 9, 2021, 5:15 p.m. UTC | #2
On 4/9/2021 3:07 AM, Adrian Hunter wrote:
> On 9/04/21 5:27 am, Daejun Park wrote:

>> Hi Asutosh Das,

>>

>>> During runtime-suspend of ufs host, the scsi devices are

>>> already suspended and so are the queues associated with them.

>>> But the ufs host sends SSU (START_STOP_UNIT) to wlun

>>> during its runtime-suspend.

>>> During the process blk_queue_enter checks if the queue is not in

>>> suspended state. If so, it waits for the queue to resume, and never

>>> comes out of it.

>>> The commit

>>> (d55d15a33: scsi: block: Do not accept any requests while suspended)

>>> adds the check if the queue is in suspended state in blk_queue_enter().

>>>

>>> Call trace:

>>> __switch_to+0x174/0x2c4

>>> __schedule+0x478/0x764

>>> schedule+0x9c/0xe0

>>> blk_queue_enter+0x158/0x228

>>> blk_mq_alloc_request+0x40/0xa4

>>> blk_get_request+0x2c/0x70

>>> __scsi_execute+0x60/0x1c4

>>> ufshcd_set_dev_pwr_mode+0x124/0x1e4

>>> ufshcd_suspend+0x208/0x83c

>>> ufshcd_runtime_suspend+0x40/0x154

>>> ufshcd_pltfrm_runtime_suspend+0x14/0x20

>>> pm_generic_runtime_suspend+0x28/0x3c

>>> __rpm_callback+0x80/0x2a4

>>> rpm_suspend+0x308/0x614

>>> rpm_idle+0x158/0x228

>>> pm_runtime_work+0x84/0xac

>>> process_one_work+0x1f0/0x470

>>> worker_thread+0x26c/0x4c8

>>> kthread+0x13c/0x320

>>> ret_from_fork+0x10/0x18

>>>

>>> Fix this by registering ufs device wlun as a scsi driver and

>>> registering it for block runtime-pm. Also make this as a

>>> supplier for all other luns. That way, this device wlun

>>> suspends after all the consumers and resumes after

>>> hba resumes.

>>>

>>> Co-developed-by: Can Guo <cang@codeaurora.org>

>>> Signed-off-by: Can Guo <cang@codeaurora.org>

>>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>

>>> ---

>>> drivers/scsi/ufs/cdns-pltfrm.c     |   2 +

>>> drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +

>>> drivers/scsi/ufs/ufs-debugfs.c     |   6 +-

>>> drivers/scsi/ufs/ufs-debugfs.h     |   2 +-

>>> drivers/scsi/ufs/ufs-exynos.c      |   2 +

>>> drivers/scsi/ufs/ufs-hisi.c        |   2 +

>>> drivers/scsi/ufs/ufs-mediatek.c    |  12 +-

>>> drivers/scsi/ufs/ufs-qcom.c        |   2 +

>>> drivers/scsi/ufs/ufs_bsg.c         |   6 +-

>>> drivers/scsi/ufs/ufshcd-pci.c      |  36 +--

>>> drivers/scsi/ufs/ufshcd.c          | 642 ++++++++++++++++++++++++++-----------

>>> drivers/scsi/ufs/ufshcd.h          |   6 +

>>> include/trace/events/ufs.h         |  20 ++

>>> 13 files changed, 509 insertions(+), 231 deletions(-)

>>

>> In this patch, you changed pm_runtime_{get, put}_sync to scsi_autopm_{get, put}_device.

>> But, scsi_autopm_get_device() calls pm_runtime_put_sync() in case of error

>> of pm_runtime_get_sync(). So, pm_runtime_put_sync() can be called twice if

>> scsi_autopm_get_device has error.

> 

> Also it might be tidy to make wrappers e.g.

> 

> static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)

> {

>      return pm_runtime_get_sync(&hba->sdev_ufs_device->sdev_gendev);

> }

>     

> static inline int ufshcd_rpm_put(struct ufs_hba *hba)

> {

>      return pm_runtime_put(&hba->sdev_ufs_device->sdev_gendev);

> }

> 

> static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba)

> {

>      return pm_runtime_put_sync(&hba->sdev_ufs_device->sdev_gendev);

> }

> 

> And also consider matching: e.g.

> 

> 	pm_runtime_put(hba->dev)	to	ufshcd_rpm_put(hba)

> 	pm_runtime_put_sync(hba->dev)	to	ufshcd_rpm_put_sync(hba)

> 

> 

> 


Ok, I'll push the changes shortly.

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project
Adrian Hunter April 9, 2021, 5:30 p.m. UTC | #3
On 9/04/21 8:15 pm, Asutosh Das (asd) wrote:
> On 4/9/2021 3:07 AM, Adrian Hunter wrote:
>> On 9/04/21 5:27 am, Daejun Park wrote:
>>> Hi Asutosh Das,
>>>
>>>> During runtime-suspend of ufs host, the scsi devices are
>>>> already suspended and so are the queues associated with them.
>>>> But the ufs host sends SSU (START_STOP_UNIT) to wlun
>>>> during its runtime-suspend.
>>>> During the process blk_queue_enter checks if the queue is not in
>>>> suspended state. If so, it waits for the queue to resume, and never
>>>> comes out of it.
>>>> The commit
>>>> (d55d15a33: scsi: block: Do not accept any requests while suspended)
>>>> adds the check if the queue is in suspended state in blk_queue_enter().
>>>>
>>>> Call trace:
>>>> __switch_to+0x174/0x2c4
>>>> __schedule+0x478/0x764
>>>> schedule+0x9c/0xe0
>>>> blk_queue_enter+0x158/0x228
>>>> blk_mq_alloc_request+0x40/0xa4
>>>> blk_get_request+0x2c/0x70
>>>> __scsi_execute+0x60/0x1c4
>>>> ufshcd_set_dev_pwr_mode+0x124/0x1e4
>>>> ufshcd_suspend+0x208/0x83c
>>>> ufshcd_runtime_suspend+0x40/0x154
>>>> ufshcd_pltfrm_runtime_suspend+0x14/0x20
>>>> pm_generic_runtime_suspend+0x28/0x3c
>>>> __rpm_callback+0x80/0x2a4
>>>> rpm_suspend+0x308/0x614
>>>> rpm_idle+0x158/0x228
>>>> pm_runtime_work+0x84/0xac
>>>> process_one_work+0x1f0/0x470
>>>> worker_thread+0x26c/0x4c8
>>>> kthread+0x13c/0x320
>>>> ret_from_fork+0x10/0x18
>>>>
>>>> Fix this by registering ufs device wlun as a scsi driver and
>>>> registering it for block runtime-pm. Also make this as a
>>>> supplier for all other luns. That way, this device wlun
>>>> suspends after all the consumers and resumes after
>>>> hba resumes.
>>>>
>>>> Co-developed-by: Can Guo <cang@codeaurora.org>
>>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>>>> ---
>>>> drivers/scsi/ufs/cdns-pltfrm.c     |   2 +
>>>> drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
>>>> drivers/scsi/ufs/ufs-debugfs.c     |   6 +-
>>>> drivers/scsi/ufs/ufs-debugfs.h     |   2 +-
>>>> drivers/scsi/ufs/ufs-exynos.c      |   2 +
>>>> drivers/scsi/ufs/ufs-hisi.c        |   2 +
>>>> drivers/scsi/ufs/ufs-mediatek.c    |  12 +-
>>>> drivers/scsi/ufs/ufs-qcom.c        |   2 +
>>>> drivers/scsi/ufs/ufs_bsg.c         |   6 +-
>>>> drivers/scsi/ufs/ufshcd-pci.c      |  36 +--
>>>> drivers/scsi/ufs/ufshcd.c          | 642 ++++++++++++++++++++++++++-----------
>>>> drivers/scsi/ufs/ufshcd.h          |   6 +
>>>> include/trace/events/ufs.h         |  20 ++
>>>> 13 files changed, 509 insertions(+), 231 deletions(-)
>>>
>>> In this patch, you changed pm_runtime_{get, put}_sync to scsi_autopm_{get, put}_device.
>>> But, scsi_autopm_get_device() calls pm_runtime_put_sync() in case of error
>>> of pm_runtime_get_sync(). So, pm_runtime_put_sync() can be called twice if
>>> scsi_autopm_get_device has error.
>>
>> Also it might be tidy to make wrappers e.g.
>>
>> static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)
>> {
>>      return pm_runtime_get_sync(&hba->sdev_ufs_device->sdev_gendev);
>> }
>>     static inline int ufshcd_rpm_put(struct ufs_hba *hba)
>> {
>>      return pm_runtime_put(&hba->sdev_ufs_device->sdev_gendev);
>> }
>>
>> static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba)
>> {
>>      return pm_runtime_put_sync(&hba->sdev_ufs_device->sdev_gendev);
>> }
>>
>> And also consider matching: e.g.
>>
>>     pm_runtime_put(hba->dev)    to    ufshcd_rpm_put(hba)
>>     pm_runtime_put_sync(hba->dev)    to    ufshcd_rpm_put_sync(hba)
>>
>>
>>
> 
> Ok, I'll push the changes shortly.
> 

I think I will have some more comments, so you could wait if you want.