Message ID | 5448054.DvuYhMxLoT@kreacher |
---|---|
Headers | show |
Series | PM: runtime: Update device status before letting suppliers suspend (another take) | expand |
On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Because the PM-runtime status of the device is not updated in > __rpm_callback(), attempts to suspend the suppliers of the given > device triggered by the rpm_put_suppliers() call in there may fail. > > To fix this (1) modify __rpm_callback() to avoid attempting to > actually suspend the suppliers, but only decrease their PM-runtime > usage counters and (2) make rpm_suspend() try to suspend the suppliers > after changing the device's PM-runtime status, in analogy with the > handling of the device's parent. > > Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/ > Fixes: 21d5c57b3726 ("PM / runtime: Use device links") > Reported-by: elaine.zhang <zhangqing@rock-chips.com> > Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Just a minor nitpick, see below. In any case: Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> > --- > drivers/base/power/runtime.c | 45 +++++++++++++++++++++++++++++++++++++------ > 1 file changed, 39 insertions(+), 6 deletions(-) > > Index: linux-pm/drivers/base/power/runtime.c > =================================================================== > --- linux-pm.orig/drivers/base/power/runtime.c > +++ linux-pm/drivers/base/power/runtime.c > @@ -305,7 +305,7 @@ static int rpm_get_suppliers(struct devi > return 0; > } > > -static void rpm_put_suppliers(struct device *dev) > +static void __rpm_put_suppliers(struct device *dev, bool try_to_suspend) > { > struct device_link *link; > > @@ -313,10 +313,30 @@ static void rpm_put_suppliers(struct dev > device_links_read_lock_held()) { > > while (refcount_dec_not_one(&link->rpm_active)) > - pm_runtime_put(link->supplier); > + pm_runtime_put_noidle(link->supplier); > + > + if (try_to_suspend) > + pm_request_idle(link->supplier); > } > } > > +static void rpm_put_suppliers(struct device *dev) > +{ > + __rpm_put_suppliers(dev, true); > +} > + > +static void rpm_try_to_suspend_suppliers(struct device *dev) Maybe "rpm_suspend_suppliers" is sufficient for the name of the function, but I have no strong opinion. > +{ > + struct device_link *link; > + int idx = device_links_read_lock(); > + > + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, > + device_links_read_lock_held()) > + pm_request_idle(link->supplier); > + > + device_links_read_unlock(idx); > +} > + [...] Kind regards Uffe
On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> > > Revert commit 44cc89f76464 ("PM: runtime: Update device status > before letting suppliers suspend") that introduced a race condition > into __rpm_callback() which allowed a concurrent rpm_resume() to > run and resume the device prematurely after its status had been > changed to RPM_SUSPENDED by __rpm_callback(). Huh, the code path is not entirely easy to follow. :-) Did you find this by code inspection or did you get an error report? > > Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend") > Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/ > Reported by: Adrian Hunter <adrian.hunter@intel.com> > Cc: 4.10+ <stable@vger.kernel.org> # 4.10+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Kind regards Uffe > --- > drivers/base/power/runtime.c | 62 +++++++++++++++--------------------- > 1 file changed, 25 insertions(+), 37 deletions(-) > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > index 18b82427d0cb..a46a7e30881b 100644 > --- a/drivers/base/power/runtime.c > +++ b/drivers/base/power/runtime.c > @@ -325,22 +325,22 @@ static void rpm_put_suppliers(struct device *dev) > static int __rpm_callback(int (*cb)(struct device *), struct device *dev) > __releases(&dev->power.lock) __acquires(&dev->power.lock) > { > - bool use_links = dev->power.links_count > 0; > - bool get = false; > int retval, idx; > - bool put; > + bool use_links = dev->power.links_count > 0; > > if (dev->power.irq_safe) { > spin_unlock(&dev->power.lock); > - } else if (!use_links) { > - spin_unlock_irq(&dev->power.lock); > } else { > - get = dev->power.runtime_status == RPM_RESUMING; > - > spin_unlock_irq(&dev->power.lock); > > - /* Resume suppliers if necessary. */ > - if (get) { > + /* > + * Resume suppliers if necessary. > + * > + * The device's runtime PM status cannot change until this > + * routine returns, so it is safe to read the status outside of > + * the lock. > + */ > + if (use_links && dev->power.runtime_status == RPM_RESUMING) { > idx = device_links_read_lock(); > > retval = rpm_get_suppliers(dev); > @@ -355,36 +355,24 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev) > > if (dev->power.irq_safe) { > spin_lock(&dev->power.lock); > - return retval; > - } > - > - spin_lock_irq(&dev->power.lock); > - > - if (!use_links) > - return retval; > - > - /* > - * If the device is suspending and the callback has returned success, > - * drop the usage counters of the suppliers that have been reference > - * counted on its resume. > - * > - * Do that if the resume fails too. > - */ > - put = dev->power.runtime_status == RPM_SUSPENDING && !retval; > - if (put) > - __update_runtime_status(dev, RPM_SUSPENDED); > - else > - put = get && retval; > - > - if (put) { > - spin_unlock_irq(&dev->power.lock); > - > - idx = device_links_read_lock(); > + } else { > + /* > + * If the device is suspending and the callback has returned > + * success, drop the usage counters of the suppliers that have > + * been reference counted on its resume. > + * > + * Do that if resume fails too. > + */ > + if (use_links > + && ((dev->power.runtime_status == RPM_SUSPENDING && !retval) > + || (dev->power.runtime_status == RPM_RESUMING && retval))) { > + idx = device_links_read_lock(); > > -fail: > - rpm_put_suppliers(dev); > + fail: > + rpm_put_suppliers(dev); > > - device_links_read_unlock(idx); > + device_links_read_unlock(idx); > + } > > spin_lock_irq(&dev->power.lock); > } > -- > 2.26.2 > > > >
On Fri, Mar 19, 2021 at 2:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> > > > > Revert commit 44cc89f76464 ("PM: runtime: Update device status > > before letting suppliers suspend") that introduced a race condition > > into __rpm_callback() which allowed a concurrent rpm_resume() to > > run and resume the device prematurely after its status had been > > changed to RPM_SUSPENDED by __rpm_callback(). > > Huh, the code path is not entirely easy to follow. :-) > > Did you find this by code inspection or did you get an error report? There was a bug report that caused me to look at the code once again. > > Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend") > > Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/ > > Reported by: Adrian Hunter <adrian.hunter@intel.com> > > Cc: 4.10+ <stable@vger.kernel.org> # 4.10+ > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Thanks!
On Fri, Mar 19, 2021 at 2:30 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Because the PM-runtime status of the device is not updated in > > __rpm_callback(), attempts to suspend the suppliers of the given > > device triggered by the rpm_put_suppliers() call in there may fail. > > > > To fix this (1) modify __rpm_callback() to avoid attempting to > > actually suspend the suppliers, but only decrease their PM-runtime > > usage counters and (2) make rpm_suspend() try to suspend the suppliers > > after changing the device's PM-runtime status, in analogy with the > > handling of the device's parent. > > > > Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/ > > Fixes: 21d5c57b3726 ("PM / runtime: Use device links") > > Reported-by: elaine.zhang <zhangqing@rock-chips.com> > > Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Just a minor nitpick, see below. In any case: > > Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Thanks! > > > --- > > drivers/base/power/runtime.c | 45 +++++++++++++++++++++++++++++++++++++------ > > 1 file changed, 39 insertions(+), 6 deletions(-) > > > > Index: linux-pm/drivers/base/power/runtime.c > > =================================================================== > > --- linux-pm.orig/drivers/base/power/runtime.c > > +++ linux-pm/drivers/base/power/runtime.c > > @@ -305,7 +305,7 @@ static int rpm_get_suppliers(struct devi > > return 0; > > } > > > > -static void rpm_put_suppliers(struct device *dev) > > +static void __rpm_put_suppliers(struct device *dev, bool try_to_suspend) > > { > > struct device_link *link; > > > > @@ -313,10 +313,30 @@ static void rpm_put_suppliers(struct dev > > device_links_read_lock_held()) { > > > > while (refcount_dec_not_one(&link->rpm_active)) > > - pm_runtime_put(link->supplier); > > + pm_runtime_put_noidle(link->supplier); > > + > > + if (try_to_suspend) > > + pm_request_idle(link->supplier); > > } > > } > > > > +static void rpm_put_suppliers(struct device *dev) > > +{ > > + __rpm_put_suppliers(dev, true); > > +} > > + > > +static void rpm_try_to_suspend_suppliers(struct device *dev) > > Maybe "rpm_suspend_suppliers" is sufficient for the name of the > function, but I have no strong opinion. OK In addition to this, spin_unlock_irq()/spin_lock_irq() need to be used around the call to it in rpm_suspend(), so I'll send a v2. I guess that the R-by still applies, though. :-) > > +{ > > + struct device_link *link; > > + int idx = device_links_read_lock(); > > + > > + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node, > > + device_links_read_lock_held()) > > + pm_request_idle(link->supplier); > > + > > + device_links_read_unlock(idx); > > +} > > +