Message ID | 1654158277-12921-3-git-send-email-quic_kriskura@quicinc.com |
---|---|
State | Superseded |
Headers | show |
Series | USB DWC3 host wake up support from system suspend | expand |
On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote: > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote: > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote: > > > Hi Krishna, > > > > > > with this version I see xHCI errors on my SC7180 based system, like > > > these: > > > > > > [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit > > > > > > [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout > > > > > > After resume a downstream hub isn't enumerated again. > > > > > > So far I didn't see those with v13, but I aso saw the first error with > > > v16. > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2 > > device is plugged in. Initially I used a wakeup capable USB3 to > > Ethernet adapter to trigger the wakeup case, however older versions > > of this series that use usb_wakeup_enabled_descendants() to check > > for wakeup capable devices didn't actually check for vUSB > 2 > > devices. > > > > So the case were the controller/PHYs is powered down works, but > > the controller is unhappy when the runtime PM path is used during > > system suspend. > > The issue isn't seen on all systems using dwc3-qcom and the problem starts > during probe(). The expected probe sequence is something like this: > > dwc3_qcom_probe > dwc3_qcom_of_register_core > dwc3_probe > > if (device_can_wakeup(&qcom->dwc3->dev)) > ... > > The important part is that device_can_wakeup() is called after dwc3_probe() > has completed. That's what I see on a QC SC7280 system, where wakeup is > generally working with these patches. > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false. > With that the controller/driver ends up in an unhappy state after system > suspend. > > Probing is deferred on SC7180 because device_links_check_suppliers() finds > that '88e3000.phy' isn't ready yet. It seems device links could be used to make sure the dwc3 core is present: Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer’s ->probe callback while the supplier hasn’t probed yet: Had the driver core known about the device link earlier, it wouldn’t have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence. https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage You could add something like this to dwc3_qcom_of_register_core(): device_link_add(dev, &qcom->dwc3->dev, DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER); if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND) ret = -EPROBE_DEFER;
On 6/14/2022 11:23 PM, Matthias Kaehlcke wrote: > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote: >> On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote: >>> On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote: >>>> Hi Krishna, >>>> >>>> with this version I see xHCI errors on my SC7180 based system, like >>>> these: >>>> >>>> [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit >>>> >>>> [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout >>>> >>>> After resume a downstream hub isn't enumerated again. >>>> >>>> So far I didn't see those with v13, but I aso saw the first error with >>>> v16. >>> It also happens with v13, but only when a wakeup capable vUSB <= 2 >>> device is plugged in. Initially I used a wakeup capable USB3 to >>> Ethernet adapter to trigger the wakeup case, however older versions >>> of this series that use usb_wakeup_enabled_descendants() to check >>> for wakeup capable devices didn't actually check for vUSB > 2 >>> devices. >>> >>> So the case were the controller/PHYs is powered down works, but >>> the controller is unhappy when the runtime PM path is used during >>> system suspend. >> The issue isn't seen on all systems using dwc3-qcom and the problem starts >> during probe(). The expected probe sequence is something like this: >> >> dwc3_qcom_probe >> dwc3_qcom_of_register_core >> dwc3_probe >> >> if (device_can_wakeup(&qcom->dwc3->dev)) >> ... >> >> The important part is that device_can_wakeup() is called after dwc3_probe() >> has completed. That's what I see on a QC SC7280 system, where wakeup is >> generally working with these patches. >> >> However on a QC SC7180 system dwc3_probe() is deferred and only executed after >> dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false. >> With that the controller/driver ends up in an unhappy state after system >> suspend. >> >> Probing is deferred on SC7180 because device_links_check_suppliers() finds >> that '88e3000.phy' isn't ready yet. > It seems device links could be used to make sure the dwc3 core is present: > > Another example for an inconsistent state would be a device link that > represents a driver presence dependency, yet is added from the consumer’s > ->probe callback while the supplier hasn’t probed yet: Had the driver core > known about the device link earlier, it wouldn’t have probed the consumer > in the first place. The onus is thus on the consumer to check presence of > the supplier after adding the link, and defer probing on non-presence. > > https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage > > > You could add something like this to dwc3_qcom_of_register_core(): > > > device_link_add(dev, &qcom->dwc3->dev, > DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER); > > if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND) > ret = -EPROBE_DEFER; > > > From the doc it isn't clear how the consumer is supposed to check presence > of the supplier, the above check of the link status is also used in > drivers/cpufreq/mediatek-cpufreq.c , but not elsewhere outside of the > driver framework. Hi Mathias, Thanks for the input. I will try the above snippet and confirm if probe call happens in sync with of_platform_populate in dwc3_qcom_of_register_core Regards, Krishna,
On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote: > Hi Matthias/Krishna, > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote: > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote: > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote: > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote: > > > > > Hi Krishna, > > > > > > > > > > with this version I see xHCI errors on my SC7180 based system, like > > > > > these: > > > > > > > > > > [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit > > > > > > > > > > [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout > > > > > > > > > > After resume a downstream hub isn't enumerated again. > > > > > > > > > > So far I didn't see those with v13, but I aso saw the first error with > > > > > v16. > > > > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2 > > > > device is plugged in. Initially I used a wakeup capable USB3 to > > > > Ethernet adapter to trigger the wakeup case, however older versions > > > > of this series that use usb_wakeup_enabled_descendants() to check > > > > for wakeup capable devices didn't actually check for vUSB > 2 > > > > devices. > > > > > > > > So the case were the controller/PHYs is powered down works, but > > > > the controller is unhappy when the runtime PM path is used during > > > > system suspend. > > > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts > > > during probe(). The expected probe sequence is something like this: > > > > > > dwc3_qcom_probe > > > dwc3_qcom_of_register_core > > > dwc3_probe > > > > > > if (device_can_wakeup(&qcom->dwc3->dev)) > > > ... > > > > > > The important part is that device_can_wakeup() is called after dwc3_probe() > > > has completed. That's what I see on a QC SC7280 system, where wakeup is > > > generally working with these patches. > > > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false. > > > With that the controller/driver ends up in an unhappy state after system > > > suspend. > > > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds > > > that '88e3000.phy' isn't ready yet. > > > > It seems device links could be used to make sure the dwc3 core is present: > > > > Another example for an inconsistent state would be a device link that > > represents a driver presence dependency, yet is added from the consumer’s > > ->probe callback while the supplier hasn’t probed yet: Had the driver core > > known about the device link earlier, it wouldn’t have probed the consumer > > in the first place. The onus is thus on the consumer to check presence of > > the supplier after adding the link, and defer probing on non-presence. > > > > https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage > > > > > > You could add something like this to dwc3_qcom_of_register_core(): > > > > > > device_link_add(dev, &qcom->dwc3->dev, > > DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER); > > > > if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND) > > ret = -EPROBE_DEFER; > > > > > I am not very sure how the device_link_add() API works. we are the parent and > creating a depdency on child probe. That does not sound correct to me. The functional dependency is effectively there, the driver already assumes that the dwc3 core was probed when of_platform_populate() returns. The device link itself doesn't create the dependency on the probe(), the check of the link status below does. Another option would be to add a link to the PHYs to the dwc3-qcom node in the device tree, but I don't think that would be a better solution (and I expect Rob would oppose this). I'm open to other solutions, so far the device link is the cleanest that came to my mind. I think the root issue is the driver architecture, with two interdependent drivers for the same IP block, instead of a single framework driver with a common part (dwc3 core) and vendor specific hooks/data. > Any ways, I have another question. > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we > goto depopulate label which calls of_platform_depopulate() which destroy the > child devices that are populated. how does that ensure that child probe is > completed by the time, our probe is called again. The child device it self is > gone. Is this working because when our probe is called next time, the child > probe depenencies are resolved? Good point! It doesn't really ensure that the child is probed (actually it won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it could happen that dwc3_qcom_probe() is deferred multiple times, but eventually the PHYs should be ready and dwc3_probe() be invoked through of_platform_populate().
On Mon, Jun 20, 2022 at 02:24:15PM +0530, Pavan Kondeti wrote: > +Felipe, Bjorn > > On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote: > > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote: > > > Hi Matthias/Krishna, > > > > > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote: > > > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote: > > > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote: > > > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote: > > > > > > > Hi Krishna, > > > > > > > > > > > > > > with this version I see xHCI errors on my SC7180 based system, like > > > > > > > these: > > > > > > > > > > > > > > [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit > > > > > > > > > > > > > > [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout > > > > > > > > > > > > > > After resume a downstream hub isn't enumerated again. > > > > > > > > > > > > > > So far I didn't see those with v13, but I aso saw the first error with > > > > > > > v16. > > > > > > > > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2 > > > > > > device is plugged in. Initially I used a wakeup capable USB3 to > > > > > > Ethernet adapter to trigger the wakeup case, however older versions > > > > > > of this series that use usb_wakeup_enabled_descendants() to check > > > > > > for wakeup capable devices didn't actually check for vUSB > 2 > > > > > > devices. > > > > > > > > > > > > So the case were the controller/PHYs is powered down works, but > > > > > > the controller is unhappy when the runtime PM path is used during > > > > > > system suspend. > > > > > > > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts > > > > > during probe(). The expected probe sequence is something like this: > > > > > > > > > > dwc3_qcom_probe > > > > > dwc3_qcom_of_register_core > > > > > dwc3_probe > > > > > > > > > > if (device_can_wakeup(&qcom->dwc3->dev)) > > > > > ... > > > > > > > > > > The important part is that device_can_wakeup() is called after dwc3_probe() > > > > > has completed. That's what I see on a QC SC7280 system, where wakeup is > > > > > generally working with these patches. > > > > > > > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after > > > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false. > > > > > With that the controller/driver ends up in an unhappy state after system > > > > > suspend. > > > > > > > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds > > > > > that '88e3000.phy' isn't ready yet. > > > > > > > > It seems device links could be used to make sure the dwc3 core is present: > > > > > > > > Another example for an inconsistent state would be a device link that > > > > represents a driver presence dependency, yet is added from the consumer’s > > > > ->probe callback while the supplier hasn’t probed yet: Had the driver core > > > > known about the device link earlier, it wouldn’t have probed the consumer > > > > in the first place. The onus is thus on the consumer to check presence of > > > > the supplier after adding the link, and defer probing on non-presence. > > > > > > > > https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage > > > > > > > > > > > > You could add something like this to dwc3_qcom_of_register_core(): > > > > > > > > > > > > device_link_add(dev, &qcom->dwc3->dev, > > > > DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER); > > > > > > > > if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND) > > > > ret = -EPROBE_DEFER; > > > > > > > > > > > I am not very sure how the device_link_add() API works. we are the parent and > > > creating a depdency on child probe. That does not sound correct to me. > > > > The functional dependency is effectively there, the driver already assumes that > > the dwc3 core was probed when of_platform_populate() returns. > > > > The device link itself doesn't create the dependency on the probe(), the check > > of the link status below does. > > > > Another option would be to add a link to the PHYs to the dwc3-qcom node in > > the device tree, but I don't think that would be a better solution (and I > > expect Rob would oppose this). > > > > I'm open to other solutions, so far the device link is the cleanest that came > > to my mind. > > > > I think the root issue is the driver architecture, with two interdependent > > drivers for the same IP block, instead of a single framework driver with a > > common part (dwc3 core) and vendor specific hooks/data. > > > > > Any ways, I have another question. > > > > > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we > > > goto depopulate label which calls of_platform_depopulate() which destroy the > > > child devices that are populated. how does that ensure that child probe is > > > completed by the time, our probe is called again. The child device it self is > > > gone. Is this working because when our probe is called next time, the child > > > probe depenencies are resolved? > > > > Good point! It doesn't really ensure that the child is probed (actually it > > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it > > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually > > the PHYs should be ready and dwc3_probe() be invoked through > > of_platform_populate(). > > This is a generic problem i.e if a parent can only proceed after the child > devices are bounded (i.e probed successfully), how to ensure this behavior > from the parent's probe? Since we can't block the parent probe (async probe is > not the default behavior), we have to identify the condition that the children > are deferring probe, so that parent also can do that. > > Can we add a API in drivers core to tell if a device probe is deferred or > not? This can be done by testing list_empty(&dev->p->deferred_probe) under > deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this > API return value. That could be an option. > Another alternative would be explicitly checking if the child device suppliers > are ready or not before adding child device. That would require decoupling > of_platform_populate() to creating devices and adding devices. It might require a new API since there are plenty of users of of_platform_populate() that rely on the current behavior. > Note that this problem is not just limited to suppliers not ready. if the > dwc3-qcom is made asynchronous probe, then its child also probed > asynchronously and there is no guarantee that child would be probed by the > time of_platform_populate() is returned. The bus notifier might come handy > in this case. The parent can register for this notifier and waiting for > the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND > notifications. This would also work in our case, if we move to > of_platform_populate() outside the probe(). If I understand correctly the outcome would be a probe() in two stages. The first does as much as it can do without the dwc3 core and leaves the device in a state where it isn't really functional, and the second stage does the rest when BUS_NOTIFY_BOUND_DRIVER is received for the dwc3 core device. A concern could be the need for additional conditions in some code paths to deal with the half-initialized device. Why would of_platform_populate() be moved outside of probe()? To avoid the half-initialized device probe() could block until BUS_NOTIFY_BOUND_DRIVER is received. Probably that should be done with a timeout to avoid blocking forever in case of a problem with probing the dwc3 core.
On Thu, Jun 23, 2022 at 11:38:06AM -0700, Matthias Kaehlcke wrote: > On Mon, Jun 20, 2022 at 02:24:15PM +0530, Pavan Kondeti wrote: > > +Felipe, Bjorn > > > > On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote: > > > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote: > > > > Hi Matthias/Krishna, > > > > > > > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote: > > > > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote: > > > > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote: > > > > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote: > > > > > > > > Hi Krishna, > > > > > > > > > > > > > > > > with this version I see xHCI errors on my SC7180 based system, like > > > > > > > > these: > > > > > > > > > > > > > > > > [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit > > > > > > > > > > > > > > > > [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout > > > > > > > > > > > > > > > > After resume a downstream hub isn't enumerated again. > > > > > > > > > > > > > > > > So far I didn't see those with v13, but I aso saw the first error with > > > > > > > > v16. > > > > > > > > > > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2 > > > > > > > device is plugged in. Initially I used a wakeup capable USB3 to > > > > > > > Ethernet adapter to trigger the wakeup case, however older versions > > > > > > > of this series that use usb_wakeup_enabled_descendants() to check > > > > > > > for wakeup capable devices didn't actually check for vUSB > 2 > > > > > > > devices. > > > > > > > > > > > > > > So the case were the controller/PHYs is powered down works, but > > > > > > > the controller is unhappy when the runtime PM path is used during > > > > > > > system suspend. > > > > > > > > > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts > > > > > > during probe(). The expected probe sequence is something like this: > > > > > > > > > > > > dwc3_qcom_probe > > > > > > dwc3_qcom_of_register_core > > > > > > dwc3_probe > > > > > > > > > > > > if (device_can_wakeup(&qcom->dwc3->dev)) > > > > > > ... > > > > > > > > > > > > The important part is that device_can_wakeup() is called after dwc3_probe() > > > > > > has completed. That's what I see on a QC SC7280 system, where wakeup is > > > > > > generally working with these patches. > > > > > > > > > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after > > > > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false. > > > > > > With that the controller/driver ends up in an unhappy state after system > > > > > > suspend. > > > > > > > > > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds > > > > > > that '88e3000.phy' isn't ready yet. > > > > > > > > > > It seems device links could be used to make sure the dwc3 core is present: > > > > > > > > > > Another example for an inconsistent state would be a device link that > > > > > represents a driver presence dependency, yet is added from the consumer’s > > > > > ->probe callback while the supplier hasn’t probed yet: Had the driver core > > > > > known about the device link earlier, it wouldn’t have probed the consumer > > > > > in the first place. The onus is thus on the consumer to check presence of > > > > > the supplier after adding the link, and defer probing on non-presence. > > > > > > > > > > https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage > > > > > > > > > > > > > > > You could add something like this to dwc3_qcom_of_register_core(): > > > > > > > > > > > > > > > device_link_add(dev, &qcom->dwc3->dev, > > > > > DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER); > > > > > > > > > > if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND) > > > > > ret = -EPROBE_DEFER; > > > > > > > > > > > > > > I am not very sure how the device_link_add() API works. we are the parent and > > > > creating a depdency on child probe. That does not sound correct to me. > > > > > > The functional dependency is effectively there, the driver already assumes that > > > the dwc3 core was probed when of_platform_populate() returns. > > > > > > The device link itself doesn't create the dependency on the probe(), the check > > > of the link status below does. > > > > > > Another option would be to add a link to the PHYs to the dwc3-qcom node in > > > the device tree, but I don't think that would be a better solution (and I > > > expect Rob would oppose this). > > > > > > I'm open to other solutions, so far the device link is the cleanest that came > > > to my mind. > > > > > > I think the root issue is the driver architecture, with two interdependent > > > drivers for the same IP block, instead of a single framework driver with a > > > common part (dwc3 core) and vendor specific hooks/data. > > > > > > > Any ways, I have another question. > > > > > > > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we > > > > goto depopulate label which calls of_platform_depopulate() which destroy the > > > > child devices that are populated. how does that ensure that child probe is > > > > completed by the time, our probe is called again. The child device it self is > > > > gone. Is this working because when our probe is called next time, the child > > > > probe depenencies are resolved? > > > > > > Good point! It doesn't really ensure that the child is probed (actually it > > > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it > > > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually > > > the PHYs should be ready and dwc3_probe() be invoked through > > > of_platform_populate(). > > > > This is a generic problem i.e if a parent can only proceed after the child > > devices are bounded (i.e probed successfully), how to ensure this behavior > > from the parent's probe? Since we can't block the parent probe (async probe is > > not the default behavior), we have to identify the condition that the children > > are deferring probe, so that parent also can do that. > > > > Can we add a API in drivers core to tell if a device probe is deferred or > > not? This can be done by testing list_empty(&dev->p->deferred_probe) under > > deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this > > API return value. > > That could be an option. > > > Another alternative would be explicitly checking if the child device suppliers > > are ready or not before adding child device. That would require decoupling > > of_platform_populate() to creating devices and adding devices. > > It might require a new API since there are plenty of users of > of_platform_populate() that rely on the current behavior. Agree. A new API is needed. we also have to consider multiple children scenario, where one child is ready but others are not etc. > > > Note that this problem is not just limited to suppliers not ready. if the > > dwc3-qcom is made asynchronous probe, then its child also probed > > asynchronously and there is no guarantee that child would be probed by the > > time of_platform_populate() is returned. The bus notifier might come handy > > in this case. The parent can register for this notifier and waiting for > > the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND > > notifications. This would also work in our case, if we move to > > of_platform_populate() outside the probe(). > > If I understand correctly the outcome would be a probe() in two stages. The > first does as much as it can do without the dwc3 core and leaves the device > in a state where it isn't really functional, and the second stage does the > rest when BUS_NOTIFY_BOUND_DRIVER is received for the dwc3 core device. > > A concern could be the need for additional conditions in some code paths to > deal with the half-initialized device. > > Why would of_platform_populate() be moved outside of probe()? > > To avoid the half-initialized device probe() could block until > BUS_NOTIFY_BOUND_DRIVER is received. Probably that should be done with a > timeout to avoid blocking forever in case of a problem with probing the > dwc3 core. Right, we have to split the probe() into two parts. The second stage which runs asynchronously should wait for the dwc3 core probe to happen. if at all dwc3 core probe is falied (in which case also we get a notification) for any reason other than EPROBE_DEFER, we have to undo the first stage. Thansk, Pavan
Quoting Pavan Kondeti (2022-06-27 22:31:48) > On Mon, Jun 27, 2022 at 01:02:49PM -0700, Stephen Boyd wrote: > > Quoting Pavan Kondeti (2022-06-20 01:54:15) > > > > > > Would like to hear other people thoughts on this. > > > > > > > I'm not following very closely but it sounds like a problem that may be > > solved by using the component driver code (see > > include/linux/component.h). That would let you move anything that needs > > to be done once the child devices probe to the aggregate driver 'bind' > > function (see struct component_master_ops::bind). > > Thanks Stephen for letting us know about the component device framework. > > IIUC, > > - dwc3-qcom (parent of the dwc3 core) registers as a component master by > calling component_master_add_with_match() before calling > of_platform_populate(). The match callback could be as simple as comparing > the device against our child device. > > - The dwc3 core (child) at the end of its probe can add as a component by calling > component_add(). > > - The above triggers the component_master_ops::bind callback implemented in > dwc3-qcom driver which signals that we are good to go. > > - The dwc-qcom can call component_bind_all() to finish the formality i.e > telling the dwc3 core that we are good to go. > > Is my understanding correct? This is what we are looking for i.e a way for > the child device(s) to signal the parent when the former is bounded. Sounds about right to me. > > Also what happens when the child device probe fails for any reason. i.e > component_add() would never be called so the master driver i.e dwc3-qcom would > wait indefinitely. May be it needs to implement a timeout or runtime suspend > etc should take care of keeping the resoures in suspend state. When the child fails probe, it should return -EPROBE_DEFER if probe needs to be deferred. Then the driver will attempt probe at a later time. If probe fails without defer then it will never work and dwc3-qcom will wait indefinitely. Not much we can do in that situation. dwc3-qcom should wait for dwc3 core to call component_add() and then do whatever needs to be done once the dwc3 core is registered in the dwc3-qcom bind callback. Honestly this may all be a little overkill if there's only two drivers here, dwc3-qcom and dwc3 core. It could probably just be some callback from dwc3 core at the end of probe that calls some function in dwc3-qcom.
On Thu, Jun 30, 2022 at 11:43:01PM +0530, Krishna Kurapati PSSNV wrote: > > On 6/30/2022 3:45 AM, Stephen Boyd wrote: > > Quoting Pavan Kondeti (2022-06-27 22:31:48) > > > On Mon, Jun 27, 2022 at 01:02:49PM -0700, Stephen Boyd wrote: > > > > Quoting Pavan Kondeti (2022-06-20 01:54:15) > > > > > Would like to hear other people thoughts on this. > > > > > > > > > I'm not following very closely but it sounds like a problem that may be > > > > solved by using the component driver code (see > > > > include/linux/component.h). That would let you move anything that needs > > > > to be done once the child devices probe to the aggregate driver 'bind' > > > > function (see struct component_master_ops::bind). > > > Thanks Stephen for letting us know about the component device framework. > > > > > > IIUC, > > > > > > - dwc3-qcom (parent of the dwc3 core) registers as a component master by > > > calling component_master_add_with_match() before calling > > > of_platform_populate(). The match callback could be as simple as comparing > > > the device against our child device. > > > > > > - The dwc3 core (child) at the end of its probe can add as a component by calling > > > component_add(). > > > > > > - The above triggers the component_master_ops::bind callback implemented in > > > dwc3-qcom driver which signals that we are good to go. > > > > > > - The dwc-qcom can call component_bind_all() to finish the formality i.e > > > telling the dwc3 core that we are good to go. > > > > > > Is my understanding correct? This is what we are looking for i.e a way for > > > the child device(s) to signal the parent when the former is bounded. > > Sounds about right to me. > > > > > Also what happens when the child device probe fails for any reason. i.e > > > component_add() would never be called so the master driver i.e dwc3-qcom would > > > wait indefinitely. May be it needs to implement a timeout or runtime suspend > > > etc should take care of keeping the resoures in suspend state. > > When the child fails probe, it should return -EPROBE_DEFER if probe > > needs to be deferred. Then the driver will attempt probe at a later > > time. If probe fails without defer then it will never work and dwc3-qcom > > will wait indefinitely. Not much we can do in that situation. > Hi Stephen, > > Thanks for the idea. But doesn't adding dwc3 as a component to an agg > driver meanthat this change needs to be done on all glue drivers, as > component_bind_all( ) from master componentis supposed to let the dwc3 > core know that we are good to go ? Ideally all glue drivers would add component support, however I don't think it is strictly necessary. Currently the dwc3 core already assumes that everything is in place when it is probed. The core could have empty bind() and unbind() callbacks, with that things in the core would remain essentially as they are and the core doesn't depend on the glue driver to call component_bind_all(). > > dwc3-qcom should wait for dwc3 core to call component_add() and then do > > whatever needs to be done once the dwc3 core is registered in the > > dwc3-qcom bind callback. Honestly this may all be a little overkill if > > there's only two drivers here, dwc3-qcom and dwc3 core. It could > > probably just be some callback from dwc3 core at the end of probe that > > calls some function in dwc3-qcom. > Since the issue we are facing is that the ssphy device links are not ready > causing the dwc3 probe not being invoked, can we add an API as Pavan > suggested > to check if deferred_probe listfor dwc3 device is empty or not andbased on > that we can choose to defer our qcomprobe ? In this case, we don't need to > touch the dwc3 core driver and would be making changesonly in qcom glue > driver. As mentioned above, it shouldn't be necessary to add component support to all the glue drivers. An API to check for deferred probing is an option, however there is a possible race condition: When the dwc3-qcom driver checks for a deferred probe the core could still be probing, in that situation the glue would proceed before the core driver is ready. That could be avoided with the component based approach.
On Thu, Jun 30, 2022 at 06:10:50PM -0700, Matthias Kaehlcke wrote: > > > dwc3-qcom should wait for dwc3 core to call component_add() and then do > > > whatever needs to be done once the dwc3 core is registered in the > > > dwc3-qcom bind callback. Honestly this may all be a little overkill if > > > there's only two drivers here, dwc3-qcom and dwc3 core. It could > > > probably just be some callback from dwc3 core at the end of probe that > > > calls some function in dwc3-qcom. > > Since the issue we are facing is that the ssphy device links are not ready > > causing the dwc3 probe not being invoked, can we add an API as Pavan > > suggested > > to check if deferred_probe listfor dwc3 device is empty or not andbased on > > that we can choose to defer our qcomprobe ? In this case, we don't need to > > touch the dwc3 core driver and would be making changesonly in qcom glue > > driver. > > As mentioned above, it shouldn't be necessary to add component support to > all the glue drivers. An API to check for deferred probing is an option, > however there is a possible race condition: When the dwc3-qcom driver checks > for a deferred probe the core could still be probing, in that situation the > glue would proceed before the core driver is ready. That could be avoided > with the component based approach. The race can happen only if asynchronous probe is enabled, otherwise the child's probe happens synchronously in of_platform_populate() OTOH, would the below condition suffice for our needs here? if our device is not bounded to a driver, we check the state of initcalls and return either error or -EPROBE_DEFER diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c index 7b6eff5..519a503 100644 --- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -722,6 +722,9 @@ static int dwc3_qcom_of_register_core(struct platform_device *pdev) dev_err(dev, "failed to get dwc3 platform device\n"); } + if (!qcom->dwc3->dev.driver) + return driver_deferred_probe_check_state(&qcom->dwc3->dev); + node_put: of_node_put(dwc3_np); Thanks, Pavan
On Fri, Jul 01, 2022 at 03:45:26PM +0530, Pavan Kondeti wrote: > On Thu, Jun 30, 2022 at 06:10:50PM -0700, Matthias Kaehlcke wrote: > > > > dwc3-qcom should wait for dwc3 core to call component_add() and then do > > > > whatever needs to be done once the dwc3 core is registered in the > > > > dwc3-qcom bind callback. Honestly this may all be a little overkill if > > > > there's only two drivers here, dwc3-qcom and dwc3 core. It could > > > > probably just be some callback from dwc3 core at the end of probe that > > > > calls some function in dwc3-qcom. > > > Since the issue we are facing is that the ssphy device links are not ready > > > causing the dwc3 probe not being invoked, can we add an API as Pavan > > > suggested > > > to check if deferred_probe listfor dwc3 device is empty or not andbased on > > > that we can choose to defer our qcomprobe ? In this case, we don't need to > > > touch the dwc3 core driver and would be making changesonly in qcom glue > > > driver. > > > > As mentioned above, it shouldn't be necessary to add component support to > > all the glue drivers. An API to check for deferred probing is an option, > > however there is a possible race condition: When the dwc3-qcom driver checks > > for a deferred probe the core could still be probing, in that situation the > > glue would proceed before the core driver is ready. That could be avoided > > with the component based approach. > > The race can happen only if asynchronous probe is enabled, otherwise the > child's probe happens synchronously in of_platform_populate() I was thinking about the case where the dwc3-qcom probe is initially deferred, then the deferred probe starts shortly after (asynchronously) and now the dwc3-qcom driver does its check. Probably it's not very likely to happen ... > OTOH, would the below condition suffice for our needs here? if our device > is not bounded to a driver, we check the state of initcalls and return > either error or -EPROBE_DEFER > > diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c > index 7b6eff5..519a503 100644 > --- a/drivers/usb/dwc3/dwc3-qcom.c > +++ b/drivers/usb/dwc3/dwc3-qcom.c > @@ -722,6 +722,9 @@ static int dwc3_qcom_of_register_core(struct platform_device *pdev) > dev_err(dev, "failed to get dwc3 platform device\n"); > } > > + if (!qcom->dwc3->dev.driver) > + return driver_deferred_probe_check_state(&qcom->dwc3->dev); > + > node_put: > of_node_put(dwc3_np); I like the simplicity of it, no need for new APIs. The components based approach would be slightly safer, but in practice I think this should be good enough.
On Fri, Jul 08, 2022 at 04:37:19PM +0530, Krishna Kurapati PSSNV wrote: > On 7/6/2022 12:28 PM, Krishna Kurapati PSSNV wrote: > > On 7/1/2022 9:22 PM, Matthias Kaehlcke wrote: > > On Fri, Jul 01, 2022 at 03:45:26PM +0530, Pavan Kondeti wrote: > > On Thu, Jun 30, 2022 at 06:10:50PM -0700, Matthias Kaehlcke wrote: > > dwc3-qcom should wait for dwc3 core to call component_add() and then do > whatever needs to be done once the dwc3 core is registered in the > dwc3-qcom bind callback. Honestly this may all be a little overkill if > there's only two drivers here, dwc3-qcom and dwc3 core. It could > probably just be some callback from dwc3 core at the end of probe that > calls some function in dwc3-qcom. > > Since the issue we are facing is that the ssphy device links are not ready > causing the dwc3 probe not being invoked, can we add an API as Pavan > suggested > to check if deferred_probe listfor dwc3 device is empty or not andbased on > that we can choose to defer our qcomprobe ? In this case, we don't need to > touch the dwc3 core driver and would be making changesonly in qcom glue > driver. > > As mentioned above, it shouldn't be necessary to add component support to > all the glue drivers. An API to check for deferred probing is an option, > however there is a possible race condition: When the dwc3-qcom driver checks > for a deferred probe the core could still be probing, in that situation the > glue would proceed before the core driver is ready. That could be avoided > with the component based approach. > > The race can happen only if asynchronous probe is enabled, otherwise the > child's probe happens synchronously in of_platform_populate() > > I was thinking about the case where the dwc3-qcom probe is initially deferred, > then the deferred probe starts shortly after (asynchronously) and now the > dwc3-qcom driver does its check. Probably it's not very likely to happen ... > > > OTOH, would the below condition suffice for our needs here? if our device > is not bounded to a driver, we check the state of initcalls and return > either error or -EPROBE_DEFER > > diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c > index 7b6eff5..519a503 100644 > --- a/drivers/usb/dwc3/dwc3-qcom.c > +++ b/drivers/usb/dwc3/dwc3-qcom.c > @@ -722,6 +722,9 @@ static int dwc3_qcom_of_register_core(struct platform_device > *pdev) > dev_err(dev, "failed to get dwc3 platform device\n"); > } > > + if (!qcom->dwc3->dev.driver) > + return driver_deferred_probe_check_state(&qcom->dwc3->dev); > + > node_put: > of_node_put(dwc3_np); > > I like the simplicity of it, no need for new APIs. > > The components based approach would be slightly safer, but in practice I > think this should be good enough. > > Hi Pavan, Mathias, > I have tested the suggested code and verified that it works on > sc7180. I see that the API has been removed recently in the following > patch :\ > commit 9cbffc7a59561be950ecc675d19a3d2b45202b2b > Author: Saravana Kannan [1]<saravanak@google.com> > Date: Wed Jun 1 00:07:05 2022 -0700 > driver core: Delete driver_deferred_probe_check_state() > Can we make a patch and add it back to the kernel for this purpose ? > Hi Saravana, > Can you help suggest if we can revert your patch or make a new one to > add back the function. The cover letter [1] of the 'deferred_probe_timeout logic clean up' series [2] has more context: A lot of the deferred_probe_timeout logic is redundant with fw_devlink=on. Also, enabling deferred_probe_timeout by default breaks a few cases. This series tries to delete the redundant logic, simplify the frameworks that use driver_deferred_probe_check_state(), enable deferred_probe_timeout=10 by default, and fixes the nfsroot failure case. The overall idea of this series is to replace the global behavior of driver_deferred_probe_check_state() where all devices give up waiting on supplier at the same time with a more granular behavior: 1. Devices with all their suppliers successfully probed by late_initcall probe as usual and avoid unnecessary deferred probe attempts. 2. At or after late_initcall, in cases where boot would break because of fw_devlink=on being strict about the ordering, we a. Temporarily relax the enforcement to probe any unprobed devices that can probe successfully in the current state of the system. For example, when we boot with a NFS rootfs and no network device has probed. b. Go back to enforcing the ordering for any devices that haven't probed. 3. After deferred probe timeout expires, we permanently give up waiting on supplier devices without drivers. At this point, whatever devices can probe without some of their optional suppliers end up probing. In the case where module support is disabled, it's fairly straightforward and all device probes are completed before the initcalls are done. [1] https://lore.kernel.org/all/20220601070707.3946847-1-saravanak@google.com/ [2] https://patchwork.kernel.org/project/linux-pm/list/?series=646471&archive=both&state=* Does anything speak against returning -EPROBE_DEFER directly from dwc3-qcom's probe()? Now with deferred_probe_timeout > 0 there should be at least no endless probing.
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index e027c04..b99d3c2 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1787,6 +1787,7 @@ static int dwc3_probe(struct platform_device *pdev) platform_set_drvdata(pdev, dwc); dwc3_cache_hwparams(dwc); + device_init_wakeup(&pdev->dev, of_property_read_bool(dev->of_node, "wakeup-source")); spin_lock_init(&dwc->lock); mutex_init(&dwc->mutex); @@ -1948,7 +1949,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg) dwc3_core_exit(dwc); break; case DWC3_GCTL_PRTCAP_HOST: - if (!PMSG_IS_AUTO(msg)) { + if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) { dwc3_core_exit(dwc); break; } @@ -2009,7 +2010,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg) spin_unlock_irqrestore(&dwc->lock, flags); break; case DWC3_GCTL_PRTCAP_HOST: - if (!PMSG_IS_AUTO(msg)) { + if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) { ret = dwc3_core_init_for_resume(dwc); if (ret) return ret; @@ -2086,8 +2087,6 @@ static int dwc3_runtime_suspend(struct device *dev) if (ret) return ret; - device_init_wakeup(dev, true); - return 0; } @@ -2096,8 +2095,6 @@ static int dwc3_runtime_resume(struct device *dev) struct dwc3 *dwc = dev_get_drvdata(dev); int ret; - device_init_wakeup(dev, false); - ret = dwc3_resume_common(dwc, PMSG_AUTO_RESUME); if (ret) return ret;