Message ID | 20230601155652.1157611-1-kai.heng.feng@canonical.com |
---|---|
State | Superseded |
Headers | show |
Series | [v5] scsi: core: Wait until device is fully resumed before doing rescan | expand |
On 6/1/23 08:56, Kai-Heng Feng wrote: > During system resuming process, the resuming order is from top to down. > Namely, the ATA host is resumed before disks connected to it. > > When an EH is scheduled while ATA host is resumed and disk device is > still suspended, the device_lock hold by scsi_rescan_device() is never > released so the dpm_resume() of the disk is blocked forerver, therefore > the system can never be resumed back. > > That's because scsi_attach_vpd() is expecting the disk device is in > operational state, as it doesn't work on suspended device. > > To avoid such deadlock, wait until the scsi device is fully resumed, > before continuing the rescan process. Why doesn't scsi_attach_vpd() support runtime power management? Calling scsi_attach_vpd() should result in a call of sdev_runtime_resume(), isn't it? Thanks, Bart.
On Fri, Jun 2, 2023 at 3:48 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 6/1/23 08:56, Kai-Heng Feng wrote: > > During system resuming process, the resuming order is from top to down. > > Namely, the ATA host is resumed before disks connected to it. > > > > When an EH is scheduled while ATA host is resumed and disk device is > > still suspended, the device_lock hold by scsi_rescan_device() is never > > released so the dpm_resume() of the disk is blocked forerver, therefore > > the system can never be resumed back. > > > > That's because scsi_attach_vpd() is expecting the disk device is in > > operational state, as it doesn't work on suspended device. > > > > To avoid such deadlock, wait until the scsi device is fully resumed, > > before continuing the rescan process. > > Why doesn't scsi_attach_vpd() support runtime power management? Calling > scsi_attach_vpd() should result in a call of sdev_runtime_resume(), It's system-wide resume in this context, so it's dpm_resume() waiting for the lock to be released by scsi_rescan_device(). Kai-Heng > isn't it? > > Thanks, > > Bart. >
Hi Kai-Heng, kernel test robot noticed the following build errors: [auto build test ERROR on mkp-scsi/for-next] [also build test ERROR on jejb-scsi/for-next linus/master v6.4-rc4 next-20230601] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Kai-Heng-Feng/scsi-core-Wait-until-device-is-fully-resumed-before-doing-rescan/20230601-235821 base: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next patch link: https://lore.kernel.org/r/20230601155652.1157611-1-kai.heng.feng%40canonical.com patch subject: [PATCH v5] scsi: core: Wait until device is fully resumed before doing rescan config: parisc-defconfig (https://download.01.org/0day-ci/archive/20230602/202306021251.amkU7A6P-lkp@intel.com/config) compiler: hppa-linux-gcc (GCC) 12.3.0 reproduce (this is a W=1 build): mkdir -p ~/bin wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/fe2b65dd8204442bd73db8a6e40d8307e11fcd04 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Kai-Heng-Feng/scsi-core-Wait-until-device-is-fully-resumed-before-doing-rescan/20230601-235821 git checkout fe2b65dd8204442bd73db8a6e40d8307e11fcd04 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=parisc olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash drivers/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202306021251.amkU7A6P-lkp@intel.com/ All errors (new ones prefixed by >>): drivers/scsi/scsi_scan.c: In function 'scsi_rescan_device': >> drivers/scsi/scsi_scan.c:1627:48: error: 'struct dev_pm_info' has no member named 'completion' 1627 | wait_for_completion(&dev->power.completion); | ^ vim +1627 drivers/scsi/scsi_scan.c 1621 1622 void scsi_rescan_device(struct device *dev) 1623 { 1624 struct scsi_device *sdev = to_scsi_device(dev); 1625 1626 if (dev->power.is_suspended) > 1627 wait_for_completion(&dev->power.completion); 1628 1629 device_lock(dev); 1630 1631 scsi_attach_vpd(sdev); 1632 scsi_cdl_check(sdev); 1633 1634 if (sdev->handler && sdev->handler->rescan) 1635 sdev->handler->rescan(sdev); 1636 1637 if (dev->driver && try_module_get(dev->driver->owner)) { 1638 struct scsi_driver *drv = to_scsi_driver(dev->driver); 1639 1640 if (drv->rescan) 1641 drv->rescan(dev); 1642 module_put(dev->driver->owner); 1643 } 1644 device_unlock(dev); 1645 } 1646 EXPORT_SYMBOL(scsi_rescan_device); 1647
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index d217be323cc6..a59aada98ac5 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1621,6 +1621,9 @@ void scsi_rescan_device(struct device *dev) { struct scsi_device *sdev = to_scsi_device(dev); + if (dev->power.is_suspended) + wait_for_completion(&dev->power.completion); + device_lock(dev); scsi_attach_vpd(sdev);
During system resuming process, the resuming order is from top to down. Namely, the ATA host is resumed before disks connected to it. When an EH is scheduled while ATA host is resumed and disk device is still suspended, the device_lock hold by scsi_rescan_device() is never released so the dpm_resume() of the disk is blocked forerver, therefore the system can never be resumed back. That's because scsi_attach_vpd() is expecting the disk device is in operational state, as it doesn't work on suspended device. To avoid such deadlock, wait until the scsi device is fully resumed, before continuing the rescan process. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- v5: - Use a different approach. Wait until the disk device is resumed. v4: - No change. v3: - New patch to resolve undefined pm_suspend_target_state. v2: - Schedule rescan task at the end of system resume phase. - Wording. drivers/scsi/scsi_scan.c | 3 +++ 1 file changed, 3 insertions(+)