Message ID | 20200322130031.10455-2-lukma@denx.de |
---|---|
State | New |
Series | usb: Improve robustness of ehci-hcd controller operation |
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
>
> Signed-off-by: Lukasz Majewski <lukma at denx.de>

This patch lacks any and all explanation why this is being reverted. The
patch you are reverting here explains why it was added and what real
issues it was fixing, so instead of reverting it, if there is an issue
with that patch, you should identify the issue and fix it.

Hi Marek,

> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> > This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> >
> > Signed-off-by: Lukasz Majewski <lukma at denx.de>
>
> This patch lacks any and all explanation why this is being reverted.
> The patch you are reverting here explains why it was added and what
> real issues it was fixing, so instead of reverting it, if there is an
> issue with that patch, you should identify the issue and fix it.

Marek, have you received the cover letter for this patch series?

In the cover letter I've written the rationale for reverting this
patch.

In short - qhtoken has value of 0x0, when the token variable shows
errors. As a result the error handling is broken.
Could you comment on those arguments?

Moreover, I've explicitly stated that this is a Request For
Testing like patch series with a detailed report of testing procedure
(for my use case) for the USB in U-Boot (as Tom has tested the patch
with some ETH dongles).

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de

On 3/23/20 7:57 AM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
>>>
>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>
>>
>> This patch lacks any and all explanation why this is being reverted.
>> The patch you are reverting here explains why it was added and what
>> real issues it was fixing, so instead of reverting it, if there is an
>> issue with that patch, you should identify the issue and fix it.
>
> Marek, have you received the cover letter for this patch series?
>
> In the cover letter I've written the rationale for reverting this
> patch.

That should have been explained in this patch description.

> In short - qhtoken has value of 0x0, when the token variable shows
> errors. As a result the error handling is broken.
> Could you comment on those arguments?

Maybe you are referencing/reading the wrong token ?

You should probably figure out why this doesn't work first and then add
fixes on top.

> Moreover, I've explicitly stated that this is a Request For
> Testing like patch series with a detailed report of testing procedure
> (for my use case) for the USB in U-Boot (as Tom has tested the patch
> with some ETH dongles).

I was still unable to replicate the ethernet device failure.

Hi Marek,

> On 3/23/20 7:57 AM, Lukasz Majewski wrote:
> > Hi Marek,
>
> Hi,
>
> >> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> >>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> >>>
> >>> Signed-off-by: Lukasz Majewski <lukma at denx.de>
> >>
> >> This patch lacks any and all explanation why this is being
> >> reverted. The patch you are reverting here explains why it was
> >> added and what real issues it was fixing, so instead of reverting
> >> it, if there is an issue with that patch, you should identify the
> >> issue and fix it.
> >
> > Marek, have you received the cover letter for this patch series?
> >
> > In the cover letter I've written the rationale for reverting this
> > patch.
>
> That should have been explained in this patch description.
>
> > In short - qhtoken has value of 0x0, when the token variable shows
> > errors. As a result the error handling is broken.
> > Could you comment on those arguments?
>
> Maybe you are referencing/reading the wrong token ?

I'm printing the token which is used afterwards for reacting on possible
errors.

>
> You should probably figure out why this doesn't work first and then
> add fixes on top.

Haven't you seen such problem during code development on your setup
when developing this patch?

>
> > Moreover, I've explicitly stated that this is a Request For
> > Testing like patch series with a detailed report of testing
> > procedure (for my use case) for the USB in U-Boot (as Tom has
> > tested the patch with some ETH dongles).
>
> I was still unable to replicate the ethernet device failure.
>

Which boards and SoCs did you use for your test setup?

For me the issue is visible on i.MX53 and i.MX6Q.

Best regards,

Lukasz Majewski

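For reference, the token word argued about above can be picked apart as follows. This is a minimal sketch, assuming the qTD token layout from the EHCI specification (status in bits 7:0, bytes remaining in bits 30:16, data toggle in bit 31). The macro, type and function names are hypothetical and are not the ones used in ehci-hcd.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status byte of a qTD token (bits 7:0), per the EHCI qTD layout. */
#define TOK_ACTIVE	0x80u	/* controller still owns the qTD */
#define TOK_HALTED	0x40u	/* queue halted (STALL, babble, error count exhausted) */
#define TOK_DATBUFERR	0x20u	/* data buffer overrun/underrun */
#define TOK_BABBLE	0x10u	/* babble detected */
#define TOK_XACTERR	0x08u	/* transaction error (timeout, CRC, bad PID) */
#define TOK_SPLITXSTATE	0x02u	/* split transaction state */
#define TOK_PERR	0x01u	/* ping state / ERR */

/* Hypothetical helper type: the fields an error path typically inspects. */
struct token_view {
	uint8_t status;		/* bits 7:0 */
	uint16_t bytes_left;	/* bits 30:16, bytes not yet transferred */
	bool data_toggle;	/* bit 31 */
};

/* Split a CPU-endian token word into its fields. */
static struct token_view token_decode(uint32_t token)
{
	struct token_view v;

	v.status = (uint8_t)(token & 0xffu);
	v.bytes_left = (uint16_t)((token >> 16) & 0x7fffu);
	v.data_toggle = (token >> 31) != 0;
	return v;
}

/*
 * True when no status bit other than SPLITXSTATE/PERR is set, i.e. the
 * value an error switch of the kind shown in the patch would treat as
 * a clean completion.
 */
static bool token_looks_successful(uint32_t token)
{
	return (token & 0xffu & ~(TOK_SPLITXSTATE | TOK_PERR)) == 0;
}
```

On this reading, a token of 0x0 decodes as "inactive, no error, zero bytes left". If qhtoken really stays at 0x0 while the qTD token carries error bits, the two would disagree about whether the transfer failed, which would be consistent with the broken error handling reported above.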
On 3/23/20 1:41 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/23/20 7:57 AM, Lukasz Majewski wrote:
>>> Hi Marek,
>>
>> Hi,
>>
>>>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
>>>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
>>>>>
>>>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>
>>>>
>>>> This patch lacks any and all explanation why this is being
>>>> reverted. The patch you are reverting here explains why it was
>>>> added and what real issues it was fixing, so instead of reverting
>>>> it, if there is an issue with that patch, you should identify the
>>>> issue and fix it.
>>>
>>> Marek, have you received the cover letter for this patch series?
>>>
>>> In the cover letter I've written the rationale for reverting this
>>> patch.
>>
>> That should have been explained in this patch description.
>>
>>> In short - qhtoken has value of 0x0, when the token variable shows
>>> errors. As a result the error handling is broken.
>>> Could you comment on those arguments?
>>
>> Maybe you are referencing/reading the wrong token ?
>
> I'm printing the token which is used afterwards for reacting on possible
> errors.
>
>>
>> You should probably figure out why this doesn't work first and then
>> add fixes on top.
>
> Haven't you seen such problem during code development on your setup
> when developing this patch?

During the development of the patch, I don't remember, sorry. I most
certainly saw various failure modes, however those should not be present
mainline.

I tested this patch with the problematic USB sticks on R-Car Gen3 and
with SMSC95xx USB ethernet adapter last weekend and I didn't see a
problem.

Hi Marek,

> On 3/23/20 1:41 PM, Lukasz Majewski wrote:
> > Hi Marek,
>
> Hi,
>
> >> On 3/23/20 7:57 AM, Lukasz Majewski wrote:
> >>> Hi Marek,
> >>
> >> Hi,
> >>
> >>>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> >>>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> >>>>>
> >>>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>
> >>>>
> >>>> This patch lacks any and all explanation why this is being
> >>>> reverted. The patch you are reverting here explains why it was
> >>>> added and what real issues it was fixing, so instead of reverting
> >>>> it, if there is an issue with that patch, you should identify the
> >>>> issue and fix it.
> >>>
> >>> Marek, have you received the cover letter for this patch series?
> >>>
> >>> In the cover letter I've written the rationale for reverting this
> >>> patch.
> >>
> >> That should have been explained in this patch description.
> >>
> >>> In short - qhtoken has value of 0x0, when the token variable shows
> >>> errors. As a result the error handling is broken.
> >>> Could you comment on those arguments?
> >>
> >> Maybe you are referencing/reading the wrong token ?
> >
> > I'm printing the token which is used afterwards for reacting on
> > possible errors.
> >
> >>
> >> You should probably figure out why this doesn't work first and then
> >> add fixes on top.
> >
> > Haven't you seen such problem during code development on your setup
> > when developing this patch?
>
> During the development of the patch, I don't remember, sorry. I most
> certainly saw various failure modes, however those should not be
> present mainline.

The issue is that the qhtoken is not updated at all.

Maybe you remember - is Linux using async setup by default (as
introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?

>
> I tested this patch with the problematic USB sticks on R-Car Gen3 and
> with SMSC95xx USB ethernet adapter last weekend and I didn't see a
> problem.

Ok.

For i.MX6Q:
The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
iMX6Q to fail after a few minutes of testing. In general, on i.MX6Q
the USB is NOT robust at all.

For i.MX53:
With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
breaks after a few minutes.

With this patch series applied it works for 2 days now without any
issue.

Best regards,

Lukasz Majewski

On 3/24/20 8:06 AM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

[...]

>>>> You should probably figure out why this doesn't work first and then
>>>> add fixes on top.
>>>
>>> Haven't you seen such problem during code development on your setup
>>> when developing this patch?
>>
>> During the development of the patch, I don't remember, sorry. I most
>> certainly saw various failure modes, however those should not be
>> present mainline.
>
> The issue is that the qhtoken is not updated at all.
>
> Maybe you remember - is Linux using async setup by default (as
> introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?

If I recall correctly, it is using async schedule for bulk transfers.
But the code is available, so you can double-check that.

>> I tested this patch with the problematic USB sticks on R-Car Gen3 and
>> with SMSC95xx USB ethernet adapter last weekend and I didn't see a
>> problem.
>
> Ok.
>
> For i.MX6Q:
> The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
> iMX6Q to fail after a few minutes of testing. In general, on i.MX6Q
> the USB is NOT robust at all.
>
> For i.MX53:
> With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
> breaks after a few minutes.

So on CI HDRC , there is some difference in behavior. That is what you
need to find and fix then.

> With this patch series applied it works for 2 days now without any
> issue.

Except performance is totally degraded and there is still no clear
explanation _why_ any of these patches are needed and/or whether doing
write to a block device with these patches may cause data corruption.

Hi Marek,

> On 3/24/20 8:06 AM, Lukasz Majewski wrote:
> > Hi Marek,
>
> Hi,
>
> [...]
>
> >>>> You should probably figure out why this doesn't work first and
> >>>> then add fixes on top.
> >>>
> >>> Haven't you seen such problem during code development on your
> >>> setup when developing this patch?
> >>
> >> During the development of the patch, I don't remember, sorry. I
> >> most certainly saw various failure modes, however those should not
> >> be present mainline.
> >
> > The issue is that the qhtoken is not updated at all.
> >
> > Maybe you remember - is Linux using async setup by default (as
> > introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
>
> If I recall correctly, it is using async schedule for bulk transfers.
> But the code is available, so you can double-check that.
>
> >> I tested this patch with the problematic USB sticks on R-Car Gen3
> >> and with SMSC95xx USB ethernet adapter last weekend and I didn't
> >> see a problem.
> >
> > Ok.
> >
> > For i.MX6Q:
> > The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
> > iMX6Q to fail after a few minutes of testing. In general, on i.MX6Q
> > the USB is NOT robust at all.
> >
> > For i.MX53:
> > With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
> > breaks after a few minutes.
>
> So on CI HDRC , there is some difference in behavior. That is what you
> need to find and fix then.

The conclusion is that some boards/implementations are broken.

>
> > With this patch series applied it works for 2 days now without any
> > issue.
>
> Except performance is totally degraded

So we do have _very_ fast USB which breaks after a few minutes of
constant testing (with procedure stated earlier).

> and there is still no clear
> explanation _why_ any of these patches are needed

Haven't I explicitly explained in previous mails why XACTERR error shall
be handled? Didn't the original thread do so too? Wasn't the cover-letter
verbose enough?

> and/or whether doing
> write to a block device with these patches may cause data corruption.

So I will ask differently - what _may_ happen when the "TD -
token=XXXX" error shows up and the board hangs? Wouldn't we risk some
unwanted storage corruption?

Best regards,

Lukasz Majewski

On 3/24/20 7:11 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/24/20 8:06 AM, Lukasz Majewski wrote:
>>> Hi Marek,
>>
>> Hi,
>>
>> [...]
>>
>>>>>> You should probably figure out why this doesn't work first and
>>>>>> then add fixes on top.
>>>>>
>>>>> Haven't you seen such problem during code development on your
>>>>> setup when developing this patch?
>>>>
>>>> During the development of the patch, I don't remember, sorry. I
>>>> most certainly saw various failure modes, however those should not
>>>> be present mainline.
>>>
>>> The issue is that the qhtoken is not updated at all.
>>>
>>> Maybe you remember - is Linux using async setup by default (as
>>> introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?
>>
>> If I recall correctly, it is using async schedule for bulk transfers.
>> But the code is available, so you can double-check that.
>>
>>>> I tested this patch with the problematic USB sticks on R-Car Gen3
>>>> and with SMSC95xx USB ethernet adapter last weekend and I didn't
>>>> see a problem.
>>>
>>> Ok.
>>>
>>> For i.MX6Q:
>>> The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
>>> iMX6Q to fail after a few minutes of testing. In general, on i.MX6Q
>>> the USB is NOT robust at all.
>>>
>>> For i.MX53:
>>> With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
>>> breaks after a few minutes.
>>
>> So on CI HDRC , there is some difference in behavior. That is what you
>> need to find and fix then.
>
> The conclusion is that some boards/implementations are broken.

At least systems with CI HDRC.

>>> With this patch series applied it works for 2 days now without any
>>> issue.
>>
>> Except performance is totally degraded
>
> So we do have _very_ fast USB which breaks after a few minutes of
> constant testing (with procedure stated earlier).

No, we have a controller which manifests some problem and that problem
needs to be identified and fixed. Whether it's in the stack or in the
controller driver is to be seen.

>> and there is still no clear
>> explanation _why_ any of these patches are needed
>
> Haven't I explicitly explained in previous mails why XACTERR error shall
> be handled? Didn't the original thread do so too? Wasn't the cover-letter
> verbose enough?

Regarding XACTERR, I agree it should be handled somehow. However, I
don't think handling XACTERR is what fixes your problems with the USB
sticks, is it ? Also, it is still unclear whether handling XACTERR
exactly the same as STALL is the right thing to do. Is it ? I think
it's not.

>> and/or whether doing
>> write to a block device with these patches may cause data corruption.
>
> So I will ask differently - what _may_ happen when the "TD -
> token=XXXX" error shows up and the board hangs? Wouldn't we risk some
> unwanted storage corruption?

The behavior is undefined.

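The XACTERR versus STALL question can be made concrete with a small classifier. This is only a sketch, assuming the EHCI qTD status bit positions. The enum and function names are hypothetical, and the ordering of the checks is one possible policy, not something the thread or the driver settles.

```c
#include <stdint.h>

/* qTD token status bits (bits 7:0), per the EHCI qTD layout. */
#define TOK_ACTIVE	0x80u
#define TOK_HALTED	0x40u
#define TOK_DATBUFERR	0x20u
#define TOK_BABBLE	0x10u
#define TOK_XACTERR	0x08u
#define TOK_SPLITXSTATE	0x02u
#define TOK_PERR	0x01u

/* Hypothetical result classes; a real driver maps these to its own codes. */
enum xfer_result {
	XFER_PENDING,		/* controller has not finished the qTD */
	XFER_OK,		/* completed without error bits */
	XFER_STALL,		/* halted with no other error bit set */
	XFER_XACTERR,		/* transaction error reported */
	XFER_OTHER_ERROR	/* babble, buffer error or anything else */
};

/* Classify a CPU-endian token word, keeping XACTERR apart from STALL. */
static enum xfer_result token_classify(uint32_t token)
{
	uint32_t status = token & 0xffu;

	if (status & TOK_ACTIVE)
		return XFER_PENDING;

	/* Ignore the two informational bits, as the switch in the patch does. */
	status &= ~(TOK_SPLITXSTATE | TOK_PERR);
	if (status == 0)
		return XFER_OK;
	if (status & TOK_XACTERR)
		return XFER_XACTERR;
	if (status & (TOK_BABBLE | TOK_DATBUFERR))
		return XFER_OTHER_ERROR;
	if (status & TOK_HALTED)
		return XFER_STALL;
	return XFER_OTHER_ERROR;
}
```

Checking XACTERR before HALTED matters because a transaction error that exhausts the error counter also sets the halted bit, so testing only for HALTED would report both cases as a stall. What the caller should do for each class (retry, reset, clear the stall) is the part the discussion above leaves open.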
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 1cc02052f5..0a77111f80 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -309,7 +309,7 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	volatile struct qTD *vtd;
 	unsigned long ts;
 	uint32_t *tdp;
-	uint32_t endpt, maxpacket, token, usbsts, qhtoken;
+	uint32_t endpt, maxpacket, token, usbsts;
 	uint32_t c, toggle;
 	uint32_t cmd;
 	int timeout;
@@ -553,21 +553,22 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	flush_dcache_range((unsigned long)qtd,
 			   ALIGN_END_ADDR(struct qTD, qtd, qtd_count));
 
+	/* Set async. queue head pointer. */
+	ehci_writel(&ctrl->hcor->or_asynclistaddr, virt_to_phys(&ctrl->qh_list));
+
 	usbsts = ehci_readl(&ctrl->hcor->or_usbsts);
 	ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f));
 
 	/* Enable async. schedule. */
 	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
-	if (!(cmd & CMD_ASE)) {
-		cmd |= CMD_ASE;
-		ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+	cmd |= CMD_ASE;
+	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
 
-		ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS,
-				100 * 1000);
-		if (ret < 0) {
-			printf("EHCI fail timeout STS_ASS set\n");
-			goto fail;
-		}
+	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS,
+			100 * 1000);
+	if (ret < 0) {
+		printf("EHCI fail timeout STS_ASS set\n");
+		goto fail;
 	}
 
 	/* Wait for TDs to be processed. */
@@ -588,11 +589,6 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 			break;
 		WATCHDOG_RESET();
 	} while (get_timer(ts) < timeout);
-	qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token);
-
-	ctrl->qh_list.qh_link = cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | QH_LINK_TYPE_QH);
-	flush_dcache_range((unsigned long)&ctrl->qh_list,
-		ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
 
 	/*
 	 * Invalidate the memory area occupied by buffer
@@ -611,12 +607,25 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)
 		printf("EHCI timed out on TD - token=%#x\n", token);
 
-	if (!(QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_ACTIVE)) {
-		debug("TOKEN=%#x\n", qhtoken);
-		switch (QT_TOKEN_GET_STATUS(qhtoken) &
+	/* Disable async schedule. */
+	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
+	cmd &= ~CMD_ASE;
+	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+
+	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, 0,
+			100 * 1000);
+	if (ret < 0) {
+		printf("EHCI fail timeout STS_ASS reset\n");
+		goto fail;
+	}
+
+	token = hc32_to_cpu(qh->qh_overlay.qt_token);
+	if (!(QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)) {
+		debug("TOKEN=%#x\n", token);
+		switch (QT_TOKEN_GET_STATUS(token) &
 			~(QT_TOKEN_STATUS_SPLITXSTATE | QT_TOKEN_STATUS_PERR)) {
 		case 0:
-			toggle = QT_TOKEN_GET_DT(qhtoken);
+			toggle = QT_TOKEN_GET_DT(token);
 			usb_settoggle(dev, usb_pipeendpoint(pipe),
 				       usb_pipeout(pipe), toggle);
 			dev->status = 0;
@@ -634,11 +643,11 @@ ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 			break;
 		default:
 			dev->status = USB_ST_CRC_ERR;
-			if (QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_HALTED)
+			if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_HALTED)
 				dev->status |= USB_ST_STALLED;
 			break;
 		}
-		dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(qhtoken);
+		dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(token);
 	} else {
 		dev->act_len = 0;
 #ifndef CONFIG_USB_EHCI_FARADAY
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.

Signed-off-by: Lukasz Majewski <lukma at denx.de>
---
 drivers/usb/host/ehci-hcd.c | 51 ++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 21 deletions(-)
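
The diff waits on STS_ASS twice, once for the bit to become set after enabling the async schedule and once for it to clear after disabling it. The pattern behind those waits can be sketched as below. This is a self-contained illustration, not the driver's handshake(): the function and type names are hypothetical, the register is a software stand-in, and the timeout is a read count, whereas the driver passes 100 * 1000, presumably a time budget.

```c
#include <stdint.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until (reg & mask) == done, giving up after max_polls reads.
 * Returns 0 on success and -1 on timeout.
 */
static int poll_until(reg_read_fn read_reg, void *ctx,
		      uint32_t mask, uint32_t done, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if ((read_reg(ctx) & mask) == done)
			return 0;
	}
	return -1;
}

/* Stand-in for hardware: switches to next_value after a number of reads. */
struct fake_reg {
	uint32_t value;
	uint32_t next_value;
	unsigned int reads_until_flip;
};

/* Accessor for struct fake_reg, matching reg_read_fn. */
static uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_flip == 0)
		r->value = r->next_value;
	else
		r->reads_until_flip--;
	return r->value;
}
```

Waiting for a status bit to set corresponds to passing the same value as mask and done, and waiting for it to clear corresponds to passing done as 0, which mirrors the two handshake() calls in the patch. A bit that never changes makes poll_until() return -1, the case the patch reports as "EHCI fail timeout STS_ASS set" or "reset".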