Message ID | 20140924191814.GM17997@saruman |
---|---|
State | New |
Headers | show |
Hi, On Wed, Sep 24, 2014 at 02:18:14PM -0500, Felipe Balbi wrote: > On Wed, Sep 24, 2014 at 01:08:15PM -0500, Felipe Balbi wrote: > > On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote: > > > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > > > > > I'll capture usbmon and send here shortly. > > > > > > > > here it is... Interesting part starts at line 73 (114 on this email) > > > > where the data transport received EPIPE (due to Stall). This time > > > > however, I was eventually able to talk to the device and managed to > > > > issue quite a few writes to it. > > > > > > Here's where the unexpected stuff begins: > > > > > > > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 06000000 c0000000 8000061a 003f00c0 00000000 00000000 000000 > > > > ed2541c0 1237463431 C Bo:003:01 0 31 > > > > > ec1a8540 1237463873 S Bi:003:01 -115 192 < > > > > ec1a8540 1237464053 C Bi:003:01 -32 0 > > > > ed2541c0 1237464158 S Co:003:00 s 02 01 0000 0081 0000 0 > > > > ed2541c0 1237464359 C Co:003:00 0 0 > > > > ed2541c0 1237468607 S Bi:003:01 -115 13 < > > > > ed2541c0 1237468802 C Bi:003:01 -75 0 > > > > > > This is the first MODE SENSE command. The gadget should send as much > > > data as it can before halting the bulk-IN endpoint. Instead, the > > > endpoint was halted first. > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > data that _should_ have been sent previously. The host was expecting > > > to receive the CSW at this point, so there was an overflow error. > > > That's what caused the host to perform a reset. > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > * transfer requests are still queued, or if the controller hardware > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > damn old bugs :-) I'll fix that up and Cc stable. > > alright fixed. Below you can see a combined diff which I'll still split > into patches so they can be applied properly. OTOH, there's really no way to split it. It's all needed to fix a single bug. Do you want me to add Reported-by: Alan Stern ?
On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > * transfer requests are still queued, or if the controller hardware > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > damn old bugs :-) I'll fix that up and Cc stable. > > alright fixed. Below you can see a combined diff which I'll still split > into patches so they can be applied properly. And this eliminates the problems you saw with g_mass_storage? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote: > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > * transfer requests are still queued, or if the controller hardware > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > alright fixed. Below you can see a combined diff which I'll still split > > into patches so they can be applied properly. > > And this eliminates the problems you saw with g_mass_storage? yup, working with and without stall=0, with and without debugging on. On all three systems I tested before ;-)
On Wed, 24 Sep 2014, Felipe Balbi wrote: > > alright fixed. Below you can see a combined diff which I'll still split > > into patches so they can be applied properly. > > OTOH, there's really no way to split it. It's all needed to fix a single > bug. Do you want me to add Reported-by: Alan Stern ? What we really need is a "Diagnosed-by:" tag. :-) I'll settle for Suggested-by:. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 24, 2014 at 03:48:29PM -0400, Alan Stern wrote: > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > alright fixed. Below you can see a combined diff which I'll still split > > > into patches so they can be applied properly. > > > > OTOH, there's really no way to split it. It's all needed to fix a single > > bug. Do you want me to add Reported-by: Alan Stern ? > > What we really need is a "Diagnosed-by:" tag. :-) heh, that's right. > I'll settle for Suggested-by:. alright, I'll add that :-)
On Wed, Sep 24, 2014 at 02:40:12PM -0500, Felipe Balbi wrote: > On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote: > > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > * transfer requests are still queued, or if the controller hardware > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > into patches so they can be applied properly. > > > > And this eliminates the problems you saw with g_mass_storage? > > yup, working with and without stall=0, with and without debugging on. On > all three systems I tested before ;-) there is still one detail which I just caught and not sure if it's something we should care. When I run my msc.sh/msc.c tests [1], after each test I see a new "sdX: unknown partition table". This doesn't happen with my USB stick. I'll fire up my sniffer again and see if I find anything peculiar. [1] https://gitorious.org/usb/usb-tools/source/7eb7ef21de6cd124e0e0d0e7df9ddfff0e2f548e:msc.sh
On Wed, Sep 24, 2014 at 03:06:15PM -0500, Felipe Balbi wrote: > On Wed, Sep 24, 2014 at 02:40:12PM -0500, Felipe Balbi wrote: > > On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote: > > > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > into patches so they can be applied properly. > > > > > > And this eliminates the problems you saw with g_mass_storage? > > > > yup, working with and without stall=0, with and without debugging on. On > > all three systems I tested before ;-) > > there is still one detail which I just caught and not sure if it's > something we should care. When I run my msc.sh/msc.c tests [1], after > each test I see a new "sdX: unknown partition table". This doesn't > happen with my USB stick. it happens with the USB stick after I destroy the partition table. So I guess that's normal.
> From: linux-usb-owner@vger.kernel.org [mailto:linux-usb-owner@vger.kernel.org] On Behalf Of Felipe Balbi > Sent: Wednesday, September 24, 2014 12:18 PM > > On Wed, Sep 24, 2014 at 01:08:15PM -0500, Felipe Balbi wrote: > > On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote: > > > On Wed, 24 Sep 2014, Felipe Balbi wrote: > > > > > > > > I'll capture usbmon and send here shortly. > > > > > > > > here it is... Interesting part starts at line 73 (114 on this email) > > > > where the data transport received EPIPE (due to Stall). This time > > > > however, I was eventually able to talk to the device and managed to > > > > issue quite a few writes to it. > > > > > > Here's where the unexpected stuff begins: > > > > > > > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 06000000 c0000000 8000061a 003f00c0 00000000 > 00000000 000000 > > > > ed2541c0 1237463431 C Bo:003:01 0 31 > > > > > ec1a8540 1237463873 S Bi:003:01 -115 192 < > > > > ec1a8540 1237464053 C Bi:003:01 -32 0 > > > > ed2541c0 1237464158 S Co:003:00 s 02 01 0000 0081 0000 0 > > > > ed2541c0 1237464359 C Co:003:00 0 0 > > > > ed2541c0 1237468607 S Bi:003:01 -115 13 < > > > > ed2541c0 1237468802 C Bi:003:01 -75 0 > > > > > > This is the first MODE SENSE command. The gadget should send as much > > > data as it can before halting the bulk-IN endpoint. Instead, the > > > endpoint was halted first. > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > data that _should_ have been sent previously. The host was expecting > > > to receive the CSW at this point, so there was an overflow error. > > > That's what caused the host to perform a reset. > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > * transfer requests are still queued, or if the controller hardware > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > damn old bugs :-) I'll fix that up and Cc stable. > > alright fixed. Below you can see a combined diff which I'll still split > into patches so they can be applied properly. [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] Ick. That shouldn't be necessary. The Synopsys vendor driver works fine with the mass-storage gadget (with stall=1) without doing anything like that. When did the dwc3 driver start showing this problem? Wouldn't it be better to find the change which caused this? Or has dwc3 always had this problem, but you never tested with stall=1 before so didn't see it?
Hi, On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > data that _should_ have been sent previously. The host was expecting > > > > to receive the CSW at this point, so there was an overflow error. > > > > That's what caused the host to perform a reset. > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > * transfer requests are still queued, or if the controller hardware > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > alright fixed. Below you can see a combined diff which I'll still split > > into patches so they can be applied properly. > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > Ick. That shouldn't be necessary. The Synopsys vendor driver works you need to make sure that both there are no pending transfers and FIFO is empty in case of IN endpoints. Databook itself says (sorry, forgot the section and no access to the docs right now) that to figure out if the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any better ideas ? > fine with the mass-storage gadget (with stall=1) without doing anything > like that. > > When did the dwc3 driver start showing this problem? Wouldn't it be > better to find the change which caused this? Or has dwc3 always had it's probably been like that since ever :-) I just figured that my scripts always had stall=0 and ended up never testing the other way. > this problem, but you never tested with stall=1 before so didn't see > it? yup. Note that it's not enough to checked for TRB completion because there could still be data in the FIFO which the host hasn't moved yet, unless it's 100% guaranteed that the core won't fire XferComplete until every single bit of data has been moved by the host. But the way things are, I'd expect core to be firing transfer complete as soon as all data has reached the FIFO (in case of TX). cheers
Hi again, On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote: > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > data that _should_ have been sent previously. The host was expecting > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > That's what caused the host to perform a reset. > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > * transfer requests are still queued, or if the controller hardware > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > into patches so they can be applied properly. > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > you need to make sure that both there are no pending transfers and FIFO > is empty in case of IN endpoints. Databook itself says (sorry, forgot > the section and no access to the docs right now) that to figure out if > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > better ideas ? oh, and btw... I only noticed the failure with Linux. I mentioned that it, though flakey during start, worked very stably against Mac OS X. Perhaps Windows is also more forgiving ? Can you share a little more details on what you guys did to get it to work without making sure FIFO is empty ? I can't really understand how you'd cover all cases otherwise...
> From: Felipe Balbi [mailto:balbi@ti.com] > Sent: Wednesday, September 24, 2014 7:41 PM > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > data that _should_ have been sent previously. The host was expecting > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > That's what caused the host to perform a reset. > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > * transfer requests are still queued, or if the controller hardware > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > into patches so they can be applied properly. > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > you need to make sure that both there are no pending transfers and FIFO > is empty in case of IN endpoints. Databook itself says (sorry, forgot > the section and no access to the docs right now) that to figure out if > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > better ideas ? I guess I'm just afraid this may just be papering over the real cause. > > fine with the mass-storage gadget (with stall=1) without doing anything > > like that. > > > > When did the dwc3 driver start showing this problem? Wouldn't it be > > better to find the change which caused this? Or has dwc3 always had > > it's probably been like that since ever :-) I just figured that my > scripts always had stall=0 and ended up never testing the other way. > > > this problem, but you never tested with stall=1 before so didn't see > > it? > > yup. Note that it's not enough to checked for TRB completion because > there could still be data in the FIFO which the host hasn't moved yet, > unless it's 100% guaranteed that the core won't fire XferComplete until > every single bit of data has been moved by the host. That is supposed to be 100% guaranteed. The IN XferInProgress and XferComplete events are only delivered after the host has ACKed the packet. > But the way things are, I'd expect core to be firing transfer complete > as soon as all data has reached the FIFO (in case of TX). Nope. That's why I don't understand how this can happen for IN. AFAIK, a STALL is only sent in response to something the host sent in the CBW. At that point, there shouldn't be any IN transfers active. BTW, about a year ago, I fixed a similar issue in our vendor driver. But I don't remember what the cause was, I will search the Git log to see if I can find the commit that fixed it.
> From: Felipe Balbi [mailto:balbi@ti.com] > > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote: > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > into patches so they can be applied properly. > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > you need to make sure that both there are no pending transfers and FIFO > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > the section and no access to the docs right now) that to figure out if > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > better ideas ? > > oh, and btw... I only noticed the failure with Linux. I mentioned that > it, though flakey during start, worked very stably against Mac OS X. > Perhaps Windows is also more forgiving ? Yeah, I see you said it only fails with 3.17-rc on an embedded platform you have. Can you say which platform that is? Do you know whose xHCI controller it uses? > Can you share a little more details on what you guys did to get it to > work without making sure FIFO is empty ? I can't really understand how > you'd cover all cases otherwise... See my other email, I will try to find the fix for the similar issue that we saw.
On Thu, 25 Sep 2014, Paul Zimmerman wrote: > That's why I don't understand how this can happen for IN. AFAIK, a STALL > is only sent in response to something the host sent in the CBW. At that > point, there shouldn't be any IN transfers active. The gadget may send a partial response to the CBW. The end of the response is marked with a STALL. The mass-storage gadget driver submits the partial response and then calls usb_ep_set_halt() without waiting for the IN data to be delivered. It relies on the UDC driver returning -EAGAIN if any data is still pending. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alan Stern [mailto:stern@rowland.harvard.edu] > Sent: Thursday, September 25, 2014 11:08 AM > > On Thu, 25 Sep 2014, Paul Zimmerman wrote: > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > > is only sent in response to something the host sent in the CBW. At that > > point, there shouldn't be any IN transfers active. > > The gadget may send a partial response to the CBW. The end of the > response is marked with a STALL. The mass-storage gadget driver > submits the partial response and then calls usb_ep_set_halt() without > waiting for the IN data to be delivered. It relies on the UDC driver > returning -EAGAIN if any data is still pending. I guess you're referring to the code under the DATA_DIR_TO_HOST case in finish_reply(). Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge() functions check the request queue for the endpoint, and if it is not empty, they return -EAGAIN. I see your patch for the dwc3 driver does add that, in addition to the FIFO empty check. Does it still work OK if you remove the FIFO empty check portion?
> From: Paul Zimmerman > > > From: Felipe Balbi [mailto:balbi@ti.com] > > > > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote: > > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > > into patches so they can be applied properly. > > > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > > > you need to make sure that both there are no pending transfers and FIFO > > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > > the section and no access to the docs right now) that to figure out if > > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > > better ideas ? > > > > oh, and btw... I only noticed the failure with Linux. I mentioned that > > it, though flakey during start, worked very stably against Mac OS X. > > Perhaps Windows is also more forgiving ? > > Yeah, I see you said it only fails with 3.17-rc on an embedded platform > you have. Can you say which platform that is? Do you know whose xHCI > controller it uses? > > > Can you share a little more details on what you guys did to get it to > > work without making sure FIFO is empty ? I can't really understand how > > you'd cover all cases otherwise... > > See my other email, I will try to find the fix for the similar issue > that we saw. OK, it looks like that issue is not related to this one. We were accidentally setting the Stream Capable bit in the endpoint configuration for all bulk EPs, not just stream-enabled ones. That caused several strange behaviors, including the Set Stall command not working.
On Thu, Sep 25, 2014 at 05:54:10PM +0000, Paul Zimmerman wrote: > > From: Felipe Balbi [mailto:balbi@ti.com] > > > > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote: > > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > > into patches so they can be applied properly. > > > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > > > you need to make sure that both there are no pending transfers and FIFO > > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > > the section and no access to the docs right now) that to figure out if > > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > > better ideas ? > > > > oh, and btw... I only noticed the failure with Linux. I mentioned that > > it, though flakey during start, worked very stably against Mac OS X. > > Perhaps Windows is also more forgiving ? > > Yeah, I see you said it only fails with 3.17-rc on an embedded platform > you have. Can you say which platform that is? Do you know whose xHCI AM473x with DWC3 2.40a (2 instances of it). > controller it uses? yours :-) > > Can you share a little more details on what you guys did to get it to > > work without making sure FIFO is empty ? I can't really understand how > > you'd cover all cases otherwise... > > See my other email, I will try to find the fix for the similar issue > that we saw. alright, tks
On Thu, Sep 25, 2014 at 07:43:35PM +0000, Paul Zimmerman wrote: > > From: Paul Zimmerman > > > > > From: Felipe Balbi [mailto:balbi@ti.com] > > > > > > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote: > > > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > > > into patches so they can be applied properly. > > > > > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > > > > > you need to make sure that both there are no pending transfers and FIFO > > > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > > > the section and no access to the docs right now) that to figure out if > > > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > > > better ideas ? > > > > > > oh, and btw... I only noticed the failure with Linux. I mentioned that > > > it, though flakey during start, worked very stably against Mac OS X. > > > Perhaps Windows is also more forgiving ? > > > > Yeah, I see you said it only fails with 3.17-rc on an embedded platform > > you have. Can you say which platform that is? Do you know whose xHCI > > controller it uses? > > > > > Can you share a little more details on what you guys did to get it to > > > work without making sure FIFO is empty ? I can't really understand how > > > you'd cover all cases otherwise... > > > > See my other email, I will try to find the fix for the similar issue > > that we saw. > > OK, it looks like that issue is not related to this one. We were > accidentally setting the Stream Capable bit in the endpoint > configuration for all bulk EPs, not just stream-enabled ones. That > caused several strange behaviors, including the Set Stall command > not working. ok, this it not about Set Stall not working, however, it's about making sure we don't stall at the wrong time :-)
Hi, On Thu, Sep 25, 2014 at 05:50:59PM +0000, Paul Zimmerman wrote: > > From: Felipe Balbi [mailto:balbi@ti.com] > > Sent: Wednesday, September 24, 2014 7:41 PM > > > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > into patches so they can be applied properly. > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > you need to make sure that both there are no pending transfers and FIFO > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > the section and no access to the docs right now) that to figure out if > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > better ideas ? > > I guess I'm just afraid this may just be papering over the real cause. Well, the API documentation, as pointed out by Alan, calls for: 380 * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any 381 * transfer requests are still queued, or if the controller hardware 382 * (usually a FIFO) still holds bytes that the host hasn't collected. There's no other way to ensuring FIFO is empty with this core. > > > fine with the mass-storage gadget (with stall=1) without doing anything > > > like that. > > > > > > When did the dwc3 driver start showing this problem? Wouldn't it be > > > better to find the change which caused this? Or has dwc3 always had > > > > it's probably been like that since ever :-) I just figured that my > > scripts always had stall=0 and ended up never testing the other way. > > > > > this problem, but you never tested with stall=1 before so didn't see > > > it? > > > > yup. Note that it's not enough to checked for TRB completion because > > there could still be data in the FIFO which the host hasn't moved yet, > > unless it's 100% guaranteed that the core won't fire XferComplete until > > every single bit of data has been moved by the host. > > That is supposed to be 100% guaranteed. The IN XferInProgress and > XferComplete events are only delivered after the host has ACKed the > packet. > > > But the way things are, I'd expect core to be firing transfer complete > > as soon as all data has reached the FIFO (in case of TX). > > Nope. > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > is only sent in response to something the host sent in the CBW. At that > point, there shouldn't be any IN transfers active. you could race with the HW, right ? You just issued Start Transfer for that IN transfer and control returns to the gadget driver which can, at that very moment, issue a Set Stall. SW *must* make sure that there's nothing pending yet. We haven't received XferComplete for that IN we just started. > BTW, about a year ago, I fixed a similar issue in our vendor driver. > But I don't remember what the cause was, I will search the Git log to > see if I can find the commit that fixed it. ok, thanks.
Hi, On Thu, Sep 25, 2014 at 07:08:12PM +0000, Paul Zimmerman wrote: > > From: Alan Stern [mailto:stern@rowland.harvard.edu] > > Sent: Thursday, September 25, 2014 11:08 AM > > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote: > > > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > > > is only sent in response to something the host sent in the CBW. At that > > > point, there shouldn't be any IN transfers active. > > > > The gadget may send a partial response to the CBW. The end of the > > response is marked with a STALL. The mass-storage gadget driver > > submits the partial response and then calls usb_ep_set_halt() without > > waiting for the IN data to be delivered. It relies on the UDC driver > > returning -EAGAIN if any data is still pending. > > I guess you're referring to the code under the DATA_DIR_TO_HOST case > in finish_reply(). > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge() > functions check the request queue for the endpoint, and if it is not > empty, they return -EAGAIN. is that really enough ? The request is deleted (rather, moved to another list) as soon as StartTransfer is issued. I can certainly check both lists (and already do) but I fear that we might still race with the HW because there's no documentation to guarantee that I won't :-) > I see your patch for the dwc3 driver does add that, in addition to the > FIFO empty check. Does it still work OK if you remove the FIFO empty > check portion? It probably does, but I can't be sure there won't be a corner case where the list is empty (Xfercomplete was issued and request given back) but host hasn't moved all the data. Synopsys databook doesn't guarantee that will never be the case. Can you confirm, without a shadow of a doubt, that XferComplete will only be issued after host has moved every single bit of data ? If that can be updated to the databook, then sure, I can remove the FIFO space check; otherwise we might leave a corner case that will take us a few days to debug again because it will be very difficult to trigger :-) cheers
> From: Felipe Balbi [mailto:balbi@ti.com] > > On Thu, Sep 25, 2014 at 07:08:12PM +0000, Paul Zimmerman wrote: > > > From: Alan Stern [mailto:stern@rowland.harvard.edu] > > > Sent: Thursday, September 25, 2014 11:08 AM > > > > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote: > > > > > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > > > > is only sent in response to something the host sent in the CBW. At that > > > > point, there shouldn't be any IN transfers active. > > > > > > The gadget may send a partial response to the CBW. The end of the > > > response is marked with a STALL. The mass-storage gadget driver > > > submits the partial response and then calls usb_ep_set_halt() without > > > waiting for the IN data to be delivered. It relies on the UDC driver > > > returning -EAGAIN if any data is still pending. > > > > I guess you're referring to the code under the DATA_DIR_TO_HOST case > > in finish_reply(). > > > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge() > > functions check the request queue for the endpoint, and if it is not > > empty, they return -EAGAIN. > > is that really enough ? The request is deleted (rather, moved to another > list) as soon as StartTransfer is issued. I can certainly check both > lists (and already do) but I fear that we might still race with the HW > because there's no documentation to guarantee that I won't :-) > > > I see your patch for the dwc3 driver does add that, in addition to the > > FIFO empty check. Does it still work OK if you remove the FIFO empty > > check portion? > > It probably does, but I can't be sure there won't be a corner case where > the list is empty (Xfercomplete was issued and request given back) but > host hasn't moved all the data. Synopsys databook doesn't guarantee that > will never be the case. > > Can you confirm, without a shadow of a doubt, that XferComplete will > only be issued after host has moved every single bit of data ? If that > can be updated to the databook, then sure, I can remove the FIFO space > check; otherwise we might leave a corner case that will take us a few > days to debug again because it will be very difficult to trigger :-) From the 2.70a databook, section 8.2.2: "While processing TRBs, two conditions may cause the core to write out an event and raise an interrupt line: - TRB Complete: - For OUT endpoints, a packet is received which reduces the remaining byte count in the TRB buffer to zero. - For IN endpoints, an acknowledgement is received for a transmitted packet which reduces the remaining byte count in the TRB buffer to zero. - Short Packet Received: - For OUT endpoints only. While writing to a TRB buffer, the endpoint receives a packet that is smaller than the endpoint's MaxPacketSize. " So for an IN, the completion event doesn't come until the TRB has been ACKed, which means all the data has been sent. But, I'm not saying you *have* to remove the FIFO space check. I think we now know the problem was caused by the missing -EAGAIN. So having the FIFO check there shouldn't hurt anything, it's not masking some other problem. And you may need it in the future for one of those other cases the databook talks about.
> From the 2.70a databook, section 8.2.2:
Sorry, it's databook section 8.2.3, not 8.2.2.
Hi, On Thu, Sep 25, 2014 at 09:57:59PM +0000, Paul Zimmerman wrote: > > From: Felipe Balbi [mailto:balbi@ti.com] > > > > On Thu, Sep 25, 2014 at 07:08:12PM +0000, Paul Zimmerman wrote: > > > > From: Alan Stern [mailto:stern@rowland.harvard.edu] > > > > Sent: Thursday, September 25, 2014 11:08 AM > > > > > > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote: > > > > > > > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > > > > > is only sent in response to something the host sent in the CBW. At that > > > > > point, there shouldn't be any IN transfers active. > > > > > > > > The gadget may send a partial response to the CBW. The end of the > > > > response is marked with a STALL. The mass-storage gadget driver > > > > submits the partial response and then calls usb_ep_set_halt() without > > > > waiting for the IN data to be delivered. It relies on the UDC driver > > > > returning -EAGAIN if any data is still pending. > > > > > > I guess you're referring to the code under the DATA_DIR_TO_HOST case > > > in finish_reply(). > > > > > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge() > > > functions check the request queue for the endpoint, and if it is not > > > empty, they return -EAGAIN. > > > > is that really enough ? The request is deleted (rather, moved to another > > list) as soon as StartTransfer is issued. I can certainly check both > > lists (and already do) but I fear that we might still race with the HW > > because there's no documentation to guarantee that I won't :-) > > > > > I see your patch for the dwc3 driver does add that, in addition to the > > > FIFO empty check. Does it still work OK if you remove the FIFO empty > > > check portion? > > > > It probably does, but I can't be sure there won't be a corner case where > > the list is empty (Xfercomplete was issued and request given back) but > > host hasn't moved all the data. Synopsys databook doesn't guarantee that > > will never be the case. > > > > Can you confirm, without a shadow of a doubt, that XferComplete will > > only be issued after host has moved every single bit of data ? If that > > can be updated to the databook, then sure, I can remove the FIFO space > > check; otherwise we might leave a corner case that will take us a few > > days to debug again because it will be very difficult to trigger :-) > > From the 2.70a databook, section 8.2.2: > > "While processing TRBs, two conditions may cause the core to write out an > event and raise an interrupt line: > - TRB Complete: > - For OUT endpoints, a packet is received which reduces the > remaining byte count in the TRB buffer to zero. > - For IN endpoints, an acknowledgement is received for a > transmitted packet which reduces the remaining byte count > in the TRB buffer to zero. > - Short Packet Received: > - For OUT endpoints only. While writing to a TRB buffer, > the endpoint receives a packet that is smaller than the > endpoint's MaxPacketSize. > " > > So for an IN, the completion event doesn't come until the TRB has been > ACKed, which means all the data has been sent. a very good point indeed. I completely missed that. Well, the patch can become a lot smaller, which makes my life easier when backporting :-) I'll test that out tomorrow, for sure :-) > But, I'm not saying you *have* to remove the FIFO space check. I think > we now know the problem was caused by the missing -EAGAIN. So having sure, that's certainly what was missing. > the FIFO check there shouldn't hurt anything, it's not masking some it's also completely useless. A few register accesses for no reason :-) > other problem. And you may need it in the future for one of those > other cases the databook talks about. I'll keep that in the back of my head. Perhaps I can repurpose these FIFO space accessors to implement usb_ep_fifo_status(). I'll consider that. cheers
On Thu, Sep 25, 2014 at 06:12:51PM -0500, Felipe Balbi wrote: > Hi, > > On Thu, Sep 25, 2014 at 09:57:59PM +0000, Paul Zimmerman wrote: > > > From: Felipe Balbi [mailto:balbi@ti.com] > > > > > > On Thu, Sep 25, 2014 at 07:08:12PM +0000, Paul Zimmerman wrote: > > > > > From: Alan Stern [mailto:stern@rowland.harvard.edu] > > > > > Sent: Thursday, September 25, 2014 11:08 AM > > > > > > > > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote: > > > > > > > > > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > > > > > > is only sent in response to something the host sent in the CBW. At that > > > > > > point, there shouldn't be any IN transfers active. > > > > > > > > > > The gadget may send a partial response to the CBW. The end of the > > > > > response is marked with a STALL. The mass-storage gadget driver > > > > > submits the partial response and then calls usb_ep_set_halt() without > > > > > waiting for the IN data to be delivered. It relies on the UDC driver > > > > > returning -EAGAIN if any data is still pending. > > > > > > > > I guess you're referring to the code under the DATA_DIR_TO_HOST case > > > > in finish_reply(). > > > > > > > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge() > > > > functions check the request queue for the endpoint, and if it is not > > > > empty, they return -EAGAIN. > > > > > > is that really enough ? The request is deleted (rather, moved to another > > > list) as soon as StartTransfer is issued. I can certainly check both > > > lists (and already do) but I fear that we might still race with the HW > > > because there's no documentation to guarantee that I won't :-) > > > > > > > I see your patch for the dwc3 driver does add that, in addition to the > > > > FIFO empty check. Does it still work OK if you remove the FIFO empty > > > > check portion? > > > > > > It probably does, but I can't be sure there won't be a corner case where > > > the list is empty (Xfercomplete was issued and request given back) but > > > host hasn't moved all the data. Synopsys databook doesn't guarantee that > > > will never be the case. > > > > > > Can you confirm, without a shadow of a doubt, that XferComplete will > > > only be issued after host has moved every single bit of data ? If that > > > can be updated to the databook, then sure, I can remove the FIFO space > > > check; otherwise we might leave a corner case that will take us a few > > > days to debug again because it will be very difficult to trigger :-) > > > > From the 2.70a databook, section 8.2.2: > > > > "While processing TRBs, two conditions may cause the core to write out an > > event and raise an interrupt line: > > - TRB Complete: > > - For OUT endpoints, a packet is received which reduces the > > remaining byte count in the TRB buffer to zero. > > - For IN endpoints, an acknowledgement is received for a > > transmitted packet which reduces the remaining byte count > > in the TRB buffer to zero. > > - Short Packet Received: > > - For OUT endpoints only. While writing to a TRB buffer, > > the endpoint receives a packet that is smaller than the > > endpoint's MaxPacketSize. > > " > > > > So for an IN, the completion event doesn't come until the TRB has been > > ACKed, which means all the data has been sent. > > a very good point indeed. I completely missed that. Well, the patch can > become a lot smaller, which makes my life easier when backporting :-) > > I'll test that out tomorrow, for sure :-) > > > But, I'm not saying you *have* to remove the FIFO space check. I think > > we now know the problem was caused by the missing -EAGAIN. So having > > sure, that's certainly what was missing. > > > the FIFO check there shouldn't hurt anything, it's not masking some > > it's also completely useless. A few register accesses for no reason :-) > > > other problem. And you may need it in the future for one of those > > other cases the databook talks about. > > I'll keep that in the back of my head. Perhaps I can repurpose these > FIFO space accessors to implement usb_ep_fifo_status(). I'll consider > that. alright, removed all that and it still works fine. Even passes Halt Endpoint test on usb20cv.
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index 66f6256..834f524 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -210,6 +210,14 @@ #define DWC3_GHWPARAMS4_HIBER_SCRATCHBUFS(n) (((n) & (0x0f << 13)) >> 13) #define DWC3_MAX_HIBER_SCRATCHBUFS 15 +/* Global FIFO Space Register */ +#define DWC3_GDBGFIFOSPACE_TXFIFO (0 << 5) +#define DWC3_GDBGFIFOSPACE_RXFIFO (1 << 5) +#define DWC3_GDBGFIFOSPACE_TXREQ_Q (2 << 5) +#define DWC3_GDBGFIFOSPACE_RXREQ_Q (3 << 5) + +#define DWC3_GDBGFIFOSPACE_SPACE_AVAIL(num) (((num) & 0xffff0000) >> 16) + /* Device Configuration Register */ #define DWC3_DCFG_DEVADDR(addr) ((addr) << 3) #define DWC3_DCFG_DEVADDR_MASK DWC3_DCFG_DEVADDR(0x7f) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index de53361..5e89913 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -145,6 +145,75 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state) } /** + * dwc3_gadget_ep_fifo_space - returns currently available space on FIFO + * @dwc: pointer to our context struct + * @dep: the endpoint to fetch FIFO space + * + * This function will return the currently available FIFO space + */ +static int dwc3_gadget_ep_fifo_space(struct dwc3 *dwc, struct dwc3_ep *dep) +{ + u32 current_fifo; + u32 reg; + + if (dep->direction) + reg = DWC3_GDBGFIFOSPACE_TXFIFO; + else + reg = DWC3_GDBGFIFOSPACE_RXFIFO; + + /* remove direction bit */ + reg |= dep->number >> 1; + + dwc3_writel(dwc->regs, DWC3_GDBGFIFOSPACE, reg); + reg = dwc3_readl(dwc->regs, DWC3_GDBGFIFOSPACE); + current_fifo = DWC3_GDBGFIFOSPACE_SPACE_AVAIL(reg); + + return current_fifo; +} + +/** + * dwc3_gadget_ep_fifo_size - returns allocated FIFO size + * @dwc: pointer to our context struct + * @dep: TX endpoint to fetch allocated FIFO size + * + * This function will read the correct TX FIFO register and + * return the FIFO size + */ +static int dwc3_gadget_ep_fifo_size(struct dwc3 *dwc, struct dwc3_ep *dep) +{ + if (WARN_ON(!dep->direction)) + return 0; + + return dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(dep->number)) & 0xffff; +} + +/** + * dwc3_gadget_ep_fifo_empty - returns true when FIFO is empty + * @dwc: pointer to our context struct + * @dep: endpoint to request FIFO space + * + * This function will return a TRUE when FIFO is empty and FALSE + * otherwise. + */ +static int dwc3_gadget_ep_fifo_empty(struct dwc3 *dwc, struct dwc3_ep *dep) +{ + u32 allocated_fifo; + u32 current_fifo; + + /* should not be called for RX endpoints */ + if (WARN_ON(!dep->direction)) + return false; + + current_fifo = dwc3_gadget_ep_fifo_space(dwc, dep); + allocated_fifo = dwc3_gadget_ep_fifo_size(dwc, dep); + + dev_vdbg(dwc->dev, "%s: FIFO space %u/%u\n", dep->name, + current_fifo, allocated_fifo); + + return current_fifo == allocated_fifo; +} + +/** * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case * @dwc: pointer to our context structure * @@ -1216,6 +1285,20 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value) memset(¶ms, 0x00, sizeof(params)); if (value) { + if (dep->flags & DWC3_EP_BUSY || + (!list_empty(&dep->req_queued) || + !list_empty(&dep->request_list))) { + dev_dbg(dwc->dev, "%s: pending request, cannot halt\n", + dep->name); + return -EAGAIN; + } + + if (dep->direction && !dwc3_gadget_ep_fifo_empty(dwc, dep)) { + dev_dbg(dwc->dev, "%s: FIFO not empty, cannot halt\n", + dep->name); + return -EAGAIN; + } + ret = dwc3_send_gadget_ep_cmd(dwc, dep->number, DWC3_DEPCMD_SETSTALL, ¶ms); if (ret)