Message ID | 20230618120949.14868-1-quic_kriskura@quicinc.com |
---|---|
State | New |
Headers | show |
Series | [v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup | expand |
On 6/19/2023 8:36 PM, Johan Hovold wrote: > On Mon, Jun 19, 2023 at 06:20:43PM +0530, Krishna Kurapati PSSNV wrote: >> On 6/19/2023 12:36 PM, Johan Hovold wrote: >>> On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote: > >>>> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>>> ret = pm_runtime_get_sync(dwc->dev); >>>> if (!ret || ret < 0) { >>>> pm_runtime_put(dwc->dev); >>>> - return 0; >>>> + if (ret < 0) >>>> + pm_runtime_set_suspended(dwc->dev); >>> >>> This bit is broken and is also not mentioned or explained in the commit >>> message. What are you trying to achieve here? >>> >>> You cannot set the state like this after runtime PM is enabled and the >>> above call will always fail. > >> The reason why I an returning ret is because, when the first get_sync >> fails because of core_init failure and we return 0 instead of ret, the >> UDC thinks that controller has started successfully but we never set the >> run stop bit. > > That bit is clear. > >> So when we plug out the cable, the disconnect event won't >> be generated and we never send on systems like android the user space >> will never clear the UDC upon disconnect. Its a sort of mismatch between >> controller and udc. > > Ok, but the controller is an error state after the resume failure. And > here you rely on user space to retry gadget activation in order to > eventually detect the disconnect event? > >> Also once the first get_sync fails, the dwc->dev->power.runtime_error >> flag is set and successive calls to get_sync always return -EINVAL. In >> this situation even if UDC/configfs retry pullup, resume_common will >> never be called and we never actually start the controller or resume >> dwc->dev. >> >> By calling set_suspended, I am trying to clear the runtime_error flag so >> that the next retry to pullup will call resume_common and retry >> core_init and set run_stop. > > Ok, thanks, that's the bit I was missing in the commit message. > > First, I perhaps mistakingly thought pm_runtime_set_suspended() may only > be called with PM runtime disabled, but it appears it may indeed be > valid to call also after an error but with the caveat that the device > must then actually be in the suspended state. > > The documentation and implementation is inconsistent here as the kernel > doc for pm_runtime_set_suspended() clearly states: > > It is not valid to call this function for devices with runtime > PM enabled. > > and it also looks like we'd end up with an active-child counter > imbalance if anyone actually tries to do so. > > But either way, it also seems like the controller is not guaranteed to > be suspended here as pm_runtime_get_sync() may also fail after a > previous errors that have left the controller in the active state? > > Also, what kind of errors would cause core_init and resume to fail here? > Hi Johan, As per the comment just above the get_sync during pullup, the resume_common path is used to resume the controller and start peripheral mode incase the dwc3 was in suspended state. So if we are entering gadget_resume we are in suspended state the first time it is called. Regarding the errors that might leave controller in active state, I have faced issue in core init, and in that function there is a cleanup happening in case something fails. So controller was actually not in active state after cleanup was done. In resume common, if core_init_for_resume is failing, we cleanup everything initialized up until that point. The scenario you mentioned would be applicable in case gadget_resume fails. We are not having a return value check and I am not sure what would be the side effect of not having that check there. Either ways, since it was failing for core init, I went ahead and made this patch. As per the reason for failure in core init, the following is what was happening at customer's end: 1. Cable plug-in 2. get_sync calls resume common which inturn calls core_init 3. core soft reset fails in core init, we cleanup and return -110 4. After applying this patch, the -110 was propagated to UDC properly 5. We got a second call to pullup via connect_control and this time reset was successful. The behavior was similar to [1]. There as well, on all Gen-2 targets, after the retry happens I see soft reset is passing but failing for first attempt. [1]: https://lore.kernel.org/all/20230510075252.31023-2-quic_kriskura@quicinc.com/ Regards, Krishna, > If this is something that you see during normal operation then this > seems to suggest that something is wrong with the runtime pm > implementation. > > Note that virtually all drivers treat resume failures as fatal errors > and do not implement any recovery from that. > > In fact, the only other example of this kind of usage that I could find > is also for a Qualcomm driver... > > Johan
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 578804dc29ca..27cb671e18e3 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) ret = pm_runtime_get_sync(dwc->dev); if (!ret || ret < 0) { pm_runtime_put(dwc->dev); - return 0; + if (ret < 0) + pm_runtime_set_suspended(dwc->dev); + return ret; } if (dwc->pullups_connected == is_on) {