Message ID | 20220713003523.29309-4-quic_wcheng@quicinc.com |
---|---|
State | New |
Headers | show |
Series | Fix controller halt and endxfer timeout issues | expand |
Hi Thinh, On 7/14/2022 10:38 AM, Thinh Nguyen wrote: > On 7/12/2022, Wesley Cheng wrote: >> Local interrupts are currently being disabled as part of aquiring the >> spin lock before issuing the endxfer command. Leave interrupts enabled, so >> that EP0 events can continue to be processed. Also, ensure that there are >> no pending interrupts before attempting to handle any soft >> connect/disconnect. >> >> Fixes: 861c010a2ee1 ("usb: dwc3: gadget: Refactor pullup()") >> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> >> --- >> drivers/usb/dwc3/gadget.c | 21 ++++++++++++--------- >> 1 file changed, 12 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index a455f8d4631d..ee85b773e3fe 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -1674,6 +1674,7 @@ static int __dwc3_gadget_get_frame(struct dwc3 *dwc) >> static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool interrupt) >> { >> struct dwc3_gadget_ep_cmd_params params; >> + struct dwc3 *dwc = dep->dwc; >> u32 cmd; >> int ret; >> >> @@ -1682,7 +1683,9 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int >> cmd |= interrupt ? DWC3_DEPCMD_CMDIOC : 0; >> cmd |= DWC3_DEPCMD_PARAM(dep->resource_index); >> memset(¶ms, 0, sizeof(params)); >> + spin_unlock(&dwc->lock); >> ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms); >> + spin_lock(&dwc->lock); >> WARN_ON_ONCE(ret); >> dep->resource_index = 0; >> >> @@ -2029,12 +2032,11 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, >> struct dwc3_ep *dep = to_dwc3_ep(ep); >> struct dwc3 *dwc = dep->dwc; >> >> - unsigned long flags; >> int ret = 0; >> >> trace_dwc3_ep_dequeue(req); >> >> - spin_lock_irqsave(&dwc->lock, flags); >> + spin_lock(&dwc->lock); >> >> list_for_each_entry(r, &dep->cancelled_list, list) { >> if (r == req) >> @@ -2073,7 +2075,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, >> request, ep->name); >> ret = -EINVAL; >> out: >> - spin_unlock_irqrestore(&dwc->lock, flags); >> + spin_unlock(&dwc->lock); >> >> return ret; >> } >> @@ -2489,9 +2491,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc); >> >> static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> { >> - unsigned long flags; >> - >> - spin_lock_irqsave(&dwc->lock, flags); >> + spin_lock(&dwc->lock); >> dwc->connected = false; >> >> /* >> @@ -2506,10 +2506,10 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> >> reinit_completion(&dwc->ep0_in_setup); >> >> - spin_unlock_irqrestore(&dwc->lock, flags); >> + spin_unlock(&dwc->lock); >> ret = wait_for_completion_timeout(&dwc->ep0_in_setup, >> msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT)); >> - spin_lock_irqsave(&dwc->lock, flags); >> + spin_lock(&dwc->lock); >> if (ret == 0) >> dev_warn(dwc->dev, "timed out waiting for SETUP phase\n"); >> } >> @@ -2523,7 +2523,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> */ >> dwc3_stop_active_transfers(dwc); >> __dwc3_gadget_stop(dwc); >> - spin_unlock_irqrestore(&dwc->lock, flags); >> + spin_unlock(&dwc->lock); >> >> /* >> * Note: if the GEVNTCOUNT indicates events in the event buffer, the >> @@ -2569,6 +2569,8 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >> return 0; >> } >> >> + synchronize_irq(dwc->irq_gadget); >> + >> if (!is_on) { >> ret = dwc3_gadget_soft_disconnect(dwc); >> } else { >> @@ -3729,6 +3731,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, >> */ >> >> __dwc3_stop_active_transfer(dep, force, interrupt); >> + >> } >> >> static void dwc3_clear_stall_all_ep(struct dwc3 *dwc) > > Hi Greg, > > Please don't pick up this patch yet. We're still in discussion with > this. I have some concern with unlocking/locking when sending End > Transfer command. For example, this patch may cause issues with > DWC3_EP_END_TRANSFER_PENDING checks. > > Hi Wesley, > > Did you try out my suggestion yet? > Just providing a quick update. So with your suggestion, I was able to consistently reproduce the controller halt issue after a day or so of testing. However, when I took a further look, I believe the problem is due to the DWC3 event handler: static void dwc3_endpoint_interrupt(struct dwc3 *dwc, const struct dwc3_event_depevt *event) { ... if (!(dep->flags & DWC3_EP_ENABLED)) { if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) return; /* Handle only EPCMDCMPLT when EP disabled */ if (event->endpoint_event != DWC3_DEPEVT_EPCMDCMPLT) return; } The soft disconnect routine reached to the run/stop polling point, and I could see that DWC3_EP_DELAYED_STOP was set, and we got a xfercomplete event for the STATUS phase. However, since we exit early in the event handler (due to __dwc3_gadget_stop() being called and disabling EP0), the STATUS complete is never handled, and we do not issue the endxfer command. I don't think I saw this issue with my change, as we allowed the STATUS phase handling to happen BEFORE gadget stop was called (since I released the lock in the stop active transfers API). However, I think even with my approach, we'd eventually run into a possibility of this issue, as we aren't truly handling EP0 events while polling for the halted status due to the above. It was just reducing the chances. The scenario of this issue is coming because the host took a long time to complete the STATUS phase, so we ran into a "timed out waiting for SETUP phase," which allowed us to call the run/stop routine while we were not yet in the SETUP phase. I'm currently running a change to add a EP num check to this IF condition: if ((epnum > 1) && !(dep->flags & DWC3_EP_ENABLED)) { if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) return; /* Handle only EPCMDCMPLT when EP disabled */ if (event->endpoint_event != DWC3_DEPEVT_EPCMDCMPLT) return; } Thanks Wesley Cheng
Hi Wesley, On 7/20/2022, Wesley Cheng wrote: > Hi Thinh, > > On 7/14/2022 10:38 AM, Thinh Nguyen wrote: >> On 7/12/2022, Wesley Cheng wrote: >>> Local interrupts are currently being disabled as part of aquiring the >>> spin lock before issuing the endxfer command. Leave interrupts >>> enabled, so >>> that EP0 events can continue to be processed. Also, ensure that >>> there are >>> no pending interrupts before attempting to handle any soft >>> connect/disconnect. >>> >>> Fixes: 861c010a2ee1 ("usb: dwc3: gadget: Refactor pullup()") >>> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> >>> --- >>> drivers/usb/dwc3/gadget.c | 21 ++++++++++++--------- >>> 1 file changed, 12 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>> index a455f8d4631d..ee85b773e3fe 100644 >>> --- a/drivers/usb/dwc3/gadget.c >>> +++ b/drivers/usb/dwc3/gadget.c >>> @@ -1674,6 +1674,7 @@ static int __dwc3_gadget_get_frame(struct dwc3 >>> *dwc) >>> static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool >>> force, bool interrupt) >>> { >>> struct dwc3_gadget_ep_cmd_params params; >>> + struct dwc3 *dwc = dep->dwc; >>> u32 cmd; >>> int ret; >>> @@ -1682,7 +1683,9 @@ static int >>> __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int >>> cmd |= interrupt ? DWC3_DEPCMD_CMDIOC : 0; >>> cmd |= DWC3_DEPCMD_PARAM(dep->resource_index); >>> memset(¶ms, 0, sizeof(params)); >>> + spin_unlock(&dwc->lock); >>> ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms); >>> + spin_lock(&dwc->lock); >>> WARN_ON_ONCE(ret); >>> dep->resource_index = 0; >>> @@ -2029,12 +2032,11 @@ static int dwc3_gadget_ep_dequeue(struct >>> usb_ep *ep, >>> struct dwc3_ep *dep = to_dwc3_ep(ep); >>> struct dwc3 *dwc = dep->dwc; >>> - unsigned long flags; >>> int ret = 0; >>> trace_dwc3_ep_dequeue(req); >>> - spin_lock_irqsave(&dwc->lock, flags); >>> + spin_lock(&dwc->lock); >>> list_for_each_entry(r, &dep->cancelled_list, list) { >>> if (r == req) >>> @@ -2073,7 +2075,7 @@ static int dwc3_gadget_ep_dequeue(struct >>> usb_ep *ep, >>> request, ep->name); >>> ret = -EINVAL; >>> out: >>> - spin_unlock_irqrestore(&dwc->lock, flags); >>> + spin_unlock(&dwc->lock); >>> return ret; >>> } >>> @@ -2489,9 +2491,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc); >>> static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>> { >>> - unsigned long flags; >>> - >>> - spin_lock_irqsave(&dwc->lock, flags); >>> + spin_lock(&dwc->lock); >>> dwc->connected = false; >>> /* >>> @@ -2506,10 +2506,10 @@ static int >>> dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>> reinit_completion(&dwc->ep0_in_setup); >>> - spin_unlock_irqrestore(&dwc->lock, flags); >>> + spin_unlock(&dwc->lock); >>> ret = wait_for_completion_timeout(&dwc->ep0_in_setup, >>> msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT)); >>> - spin_lock_irqsave(&dwc->lock, flags); >>> + spin_lock(&dwc->lock); >>> if (ret == 0) >>> dev_warn(dwc->dev, "timed out waiting for SETUP >>> phase\n"); >>> } >>> @@ -2523,7 +2523,7 @@ static int dwc3_gadget_soft_disconnect(struct >>> dwc3 *dwc) >>> */ >>> dwc3_stop_active_transfers(dwc); >>> __dwc3_gadget_stop(dwc); >>> - spin_unlock_irqrestore(&dwc->lock, flags); >>> + spin_unlock(&dwc->lock); >>> /* >>> * Note: if the GEVNTCOUNT indicates events in the event >>> buffer, the >>> @@ -2569,6 +2569,8 @@ static int dwc3_gadget_pullup(struct >>> usb_gadget *g, int is_on) >>> return 0; >>> } >>> + synchronize_irq(dwc->irq_gadget); >>> + >>> if (!is_on) { >>> ret = dwc3_gadget_soft_disconnect(dwc); >>> } else { >>> @@ -3729,6 +3731,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep >>> *dep, bool force, >>> */ >>> __dwc3_stop_active_transfer(dep, force, interrupt); >>> + >>> } >>> static void dwc3_clear_stall_all_ep(struct dwc3 *dwc) >> >> Hi Greg, >> >> Please don't pick up this patch yet. We're still in discussion with >> this. I have some concern with unlocking/locking when sending End >> Transfer command. For example, this patch may cause issues with >> DWC3_EP_END_TRANSFER_PENDING checks. >> >> Hi Wesley, >> >> Did you try out my suggestion yet? >> > > Just providing a quick update. > > So with your suggestion, I was able to consistently reproduce the > controller halt issue after a day or so of testing. However, when I > took a further look, I believe the problem is due to the DWC3 event > handler: > > static void dwc3_endpoint_interrupt(struct dwc3 *dwc, > const struct dwc3_event_depevt *event) > { > ... > if (!(dep->flags & DWC3_EP_ENABLED)) { > if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) > return; > > /* Handle only EPCMDCMPLT when EP disabled */ > if (event->endpoint_event != DWC3_DEPEVT_EPCMDCMPLT) > return; > } > > The soft disconnect routine reached to the run/stop polling point, and > I could see that DWC3_EP_DELAYED_STOP was set, and we got a > xfercomplete event for the STATUS phase. However, since we exit early > in the event handler (due to __dwc3_gadget_stop() being called and > disabling EP0), the STATUS complete is never handled, and we do not > issue the endxfer command. > > I don't think I saw this issue with my change, as we allowed the > STATUS phase handling to happen BEFORE gadget stop was called (since I > released the lock in the stop active transfers API). > > However, I think even with my approach, we'd eventually run into a > possibility of this issue, as we aren't truly handling EP0 events > while polling for the halted status due to the above. It was just > reducing the chances. The scenario of this issue is coming because > the host took a long time to complete the STATUS phase, so we ran into > a "timed out waiting for SETUP phase," which allowed us to call the > run/stop routine while we were not yet in the SETUP phase. > > I'm currently running a change to add a EP num check to this IF > condition: > > if ((epnum > 1) && !(dep->flags & DWC3_EP_ENABLED)) { > if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) > return; > > /* Handle only EPCMDCMPLT when EP disabled */ > if (event->endpoint_event != DWC3_DEPEVT_EPCMDCMPLT) > return; > } > Ah... good catch! Thanks for all the testings. BR, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index a455f8d4631d..ee85b773e3fe 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1674,6 +1674,7 @@ static int __dwc3_gadget_get_frame(struct dwc3 *dwc) static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool interrupt) { struct dwc3_gadget_ep_cmd_params params; + struct dwc3 *dwc = dep->dwc; u32 cmd; int ret; @@ -1682,7 +1683,9 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int cmd |= interrupt ? DWC3_DEPCMD_CMDIOC : 0; cmd |= DWC3_DEPCMD_PARAM(dep->resource_index); memset(¶ms, 0, sizeof(params)); + spin_unlock(&dwc->lock); ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms); + spin_lock(&dwc->lock); WARN_ON_ONCE(ret); dep->resource_index = 0; @@ -2029,12 +2032,11 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, struct dwc3_ep *dep = to_dwc3_ep(ep); struct dwc3 *dwc = dep->dwc; - unsigned long flags; int ret = 0; trace_dwc3_ep_dequeue(req); - spin_lock_irqsave(&dwc->lock, flags); + spin_lock(&dwc->lock); list_for_each_entry(r, &dep->cancelled_list, list) { if (r == req) @@ -2073,7 +2075,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, request, ep->name); ret = -EINVAL; out: - spin_unlock_irqrestore(&dwc->lock, flags); + spin_unlock(&dwc->lock); return ret; } @@ -2489,9 +2491,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc); static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) { - unsigned long flags; - - spin_lock_irqsave(&dwc->lock, flags); + spin_lock(&dwc->lock); dwc->connected = false; /* @@ -2506,10 +2506,10 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) reinit_completion(&dwc->ep0_in_setup); - spin_unlock_irqrestore(&dwc->lock, flags); + spin_unlock(&dwc->lock); ret = wait_for_completion_timeout(&dwc->ep0_in_setup, msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT)); - spin_lock_irqsave(&dwc->lock, flags); + spin_lock(&dwc->lock); if (ret == 0) dev_warn(dwc->dev, "timed out waiting for SETUP phase\n"); } @@ -2523,7 +2523,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) */ dwc3_stop_active_transfers(dwc); __dwc3_gadget_stop(dwc); - spin_unlock_irqrestore(&dwc->lock, flags); + spin_unlock(&dwc->lock); /* * Note: if the GEVNTCOUNT indicates events in the event buffer, the @@ -2569,6 +2569,8 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) return 0; } + synchronize_irq(dwc->irq_gadget); + if (!is_on) { ret = dwc3_gadget_soft_disconnect(dwc); } else { @@ -3729,6 +3731,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, */ __dwc3_stop_active_transfer(dep, force, interrupt); + } static void dwc3_clear_stall_all_ep(struct dwc3 *dwc)
Local interrupts are currently being disabled as part of aquiring the spin lock before issuing the endxfer command. Leave interrupts enabled, so that EP0 events can continue to be processed. Also, ensure that there are no pending interrupts before attempting to handle any soft connect/disconnect. Fixes: 861c010a2ee1 ("usb: dwc3: gadget: Refactor pullup()") Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> --- drivers/usb/dwc3/gadget.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)