diff mbox series

[RFT,v1,1/5] Revert "usb: ehci-hcd: Keep async schedule running"

Message ID 20200322130031.10455-2-lukma@denx.de
State New
Headers show
Series usb: Improve robustness of ehci-hcd controller operation | expand

Commit Message

Lukasz Majewski March 22, 2020, 1 p.m. UTC
This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.

Signed-off-by: Lukasz Majewski <lukma at denx.de>
---

 drivers/usb/host/ehci-hcd.c | 51 ++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 21 deletions(-)

Comments

Marek Vasut March 22, 2020, 1:18 p.m. UTC | #1
On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> 
> Signed-off-by: Lukasz Majewski <lukma at denx.de>

This patch lacks any and all explanation why this is being reverted. The
patch you are reverting here explains why it was added and what real
issues it was fixing, so instead of reverting it, if there is an issue
with that patch, you should identify the issue and fix it.
Lukasz Majewski March 23, 2020, 6:57 a.m. UTC | #2
Hi Marek,

> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
> > This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> > 
> > Signed-off-by: Lukasz Majewski <lukma at denx.de>  
> 
> This patch lacks any and all explanation why this is being reverted.
> The patch you are reverting here explains why it was added and what
> real issues it was fixing, so instead of reverting it, if there is an
> issue with that patch, you should identify the issue and fix it.

Marek, have you received the cover letter for this patch series?

In the cover letter I've written the rationale for reverting this
patch. 

In short - qhtoken has value of 0x0, when the token variable shows
errors. As a result the error handling is broken.
Could you comment on those arguments?

Moreover, I've explicitly stated that this is a Request For
Testing like patch series with a detailed report of testing procedure
(for my use case) for the USB in U-Boot (as Tom has tested the patch
with some ETH dongles). 



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200323/02cc5973/attachment.sig>
Marek Vasut March 23, 2020, 11:46 a.m. UTC | #3
On 3/23/20 7:57 AM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:
>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
>>>
>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>  
>>
>> This patch lacks any and all explanation why this is being reverted.
>> The patch you are reverting here explains why it was added and what
>> real issues it was fixing, so instead of reverting it, if there is an
>> issue with that patch, you should identify the issue and fix it.
> 
> Marek, have you received the cover letter for this patch series?
> 
> In the cover letter I've written the rationale for reverting this
> patch. 

That should have been explained in this patch description.

> In short - qhtoken has value of 0x0, when the token variable shows
> errors. As a result the error handling is broken.
> Could you comment on those arguments?

Maybe you are referencing/reading the wrong token ?

You should probably figure out why this doesn't work first and then add
fixes on top.

> Moreover, I've explicitly stated that this is a Request For
> Testing like patch series with a detailed report of testing procedure
> (for my use case) for the USB in U-Boot (as Tom has tested the patch
> with some ETH dongles). 

I was still unable to replicate the ethernet device failure.
Lukasz Majewski March 23, 2020, 12:41 p.m. UTC | #4
Hi Marek,

> On 3/23/20 7:57 AM, Lukasz Majewski wrote:
> > Hi Marek,  
> 
> Hi,
> 
> >> On 3/22/20 2:00 PM, Lukasz Majewski wrote:  
> >>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> >>>
> >>> Signed-off-by: Lukasz Majewski <lukma at denx.de>    
> >>
> >> This patch lacks any and all explanation why this is being
> >> reverted. The patch you are reverting here explains why it was
> >> added and what real issues it was fixing, so instead of reverting
> >> it, if there is an issue with that patch, you should identify the
> >> issue and fix it.  
> > 
> > Marek, have you received the cover letter for this patch series?
> > 
> > In the cover letter I've written the rationale for reverting this
> > patch.   
> 
> That should have been explained in this patch description.
> 
> > In short - qhtoken has value of 0x0, when the token variable shows
> > errors. As a result the error handling is broken.
> > Could you comment on those arguments?  
> 
> Maybe you are referencing/reading the wrong token ?

I'm printing the token which is used afterwards for reacting on possible
errors.

> 
> You should probably figure out why this doesn't work first and then
> add fixes on top.

Haven't you seen such problem during code development on your setup
when developing this patch? 

> 
> > Moreover, I've explicitly stated that this is a Request For
> > Testing like patch series with a detailed report of testing
> > procedure (for my use case) for the USB in U-Boot (as Tom has
> > tested the patch with some ETH dongles).   
> 
> I was still unable to replicate the ethernet device failure.
> 

Which boards and SoCs do you used for your test setup?

For me the issue is visible on i.MX53 and i.MX6Q.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200323/3a45d9e7/attachment.sig>
Marek Vasut March 24, 2020, 12:58 a.m. UTC | #5
On 3/23/20 1:41 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/23/20 7:57 AM, Lukasz Majewski wrote:
>>> Hi Marek,  
>>
>> Hi,
>>
>>>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:  
>>>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
>>>>>
>>>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>    
>>>>
>>>> This patch lacks any and all explanation why this is being
>>>> reverted. The patch you are reverting here explains why it was
>>>> added and what real issues it was fixing, so instead of reverting
>>>> it, if there is an issue with that patch, you should identify the
>>>> issue and fix it.  
>>>
>>> Marek, have you received the cover letter for this patch series?
>>>
>>> In the cover letter I've written the rationale for reverting this
>>> patch.   
>>
>> That should have been explained in this patch description.
>>
>>> In short - qhtoken has value of 0x0, when the token variable shows
>>> errors. As a result the error handling is broken.
>>> Could you comment on those arguments?  
>>
>> Maybe you are referencing/reading the wrong token ?
> 
> I'm printing the token which is used afterwards for reacting on possible
> errors.
> 
>>
>> You should probably figure out why this doesn't work first and then
>> add fixes on top.
> 
> Haven't you seen such problem during code development on your setup
> when developing this patch? 

During the development of the patch, I don't remember, sorry. I most
certainly saw various failure modes, however those should not be present
mainline.

I tested this patch with the problematic USB sticks on R-Car Gen3 and
with SMSC95xx USB ethernet adapter last weekend and I didn't see a problem.
Lukasz Majewski March 24, 2020, 7:06 a.m. UTC | #6
Hi Marek,

> On 3/23/20 1:41 PM, Lukasz Majewski wrote:
> > Hi Marek,  
> 
> Hi,
> 
> >> On 3/23/20 7:57 AM, Lukasz Majewski wrote:  
> >>> Hi Marek,    
> >>
> >> Hi,
> >>  
> >>>> On 3/22/20 2:00 PM, Lukasz Majewski wrote:    
> >>>>> This reverts commit 02b0e1a36c5bc20174299312556ec4e266872bd6.
> >>>>>
> >>>>> Signed-off-by: Lukasz Majewski <lukma at denx.de>      
> >>>>
> >>>> This patch lacks any and all explanation why this is being
> >>>> reverted. The patch you are reverting here explains why it was
> >>>> added and what real issues it was fixing, so instead of reverting
> >>>> it, if there is an issue with that patch, you should identify the
> >>>> issue and fix it.    
> >>>
> >>> Marek, have you received the cover letter for this patch series?
> >>>
> >>> In the cover letter I've written the rationale for reverting this
> >>> patch.     
> >>
> >> That should have been explained in this patch description.
> >>  
> >>> In short - qhtoken has value of 0x0, when the token variable shows
> >>> errors. As a result the error handling is broken.
> >>> Could you comment on those arguments?    
> >>
> >> Maybe you are referencing/reading the wrong token ?  
> > 
> > I'm printing the token which is used afterwards for reacting on
> > possible errors.
> >   
> >>
> >> You should probably figure out why this doesn't work first and then
> >> add fixes on top.  
> > 
> > Haven't you seen such problem during code development on your setup
> > when developing this patch?   
> 
> During the development of the patch, I don't remember, sorry. I most
> certainly saw various failure modes, however those should not be
> present mainline.

The issue is that the qhtoken is not updated at all. 

Maybe you remember - is Linux using async setup by default (as
introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?

> 
> I tested this patch with the problematic USB sticks on R-Car Gen3 and
> with SMSC95xx USB ethernet adapter last weekend and I didn't see a
> problem.

Ok.

For i.MX6Q:
The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
iMX6Q to fail after a few minutes of testing. General in i.MX6Q the
usb is NOT robust at all.



For i.MX53:
With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
breaks after a few minutes.

With this patch series applied it works for 2 days now without any
issue.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200324/717bba31/attachment.sig>
Marek Vasut March 24, 2020, 3:07 p.m. UTC | #7
On 3/24/20 8:06 AM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

[...]

>>>> You should probably figure out why this doesn't work first and then
>>>> add fixes on top.  
>>>
>>> Haven't you seen such problem during code development on your setup
>>> when developing this patch?   
>>
>> During the development of the patch, I don't remember, sorry. I most
>> certainly saw various failure modes, however those should not be
>> present mainline.
> 
> The issue is that the qhtoken is not updated at all. 
> 
> Maybe you remember - is Linux using async setup by default (as
> introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?

If I recall correctly, it is using async schedule for bulk transfers.
But the code is available, so you can double-check that.

>> I tested this patch with the problematic USB sticks on R-Car Gen3 and
>> with SMSC95xx USB ethernet adapter last weekend and I didn't see a
>> problem.
> 
> Ok.
> 
> For i.MX6Q:
> The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
> iMX6Q to fail after a few minutes of testing. General in i.MX6Q the
> usb is NOT robust at all.
> 
> For i.MX53:
> With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
> breaks after a few minutes.

So on CI HDRC , there is some difference in behavior. That is what you
need to find and fix then.

> With this patch series applied it works for 2 days now without any
> issue.

Except performance is totally degraded and there is still no clear
explanation _why_ any of these patches are needed and/or whether doing
write to a block device with these patches may cause data corruption.
Lukasz Majewski March 24, 2020, 6:11 p.m. UTC | #8
Hi Marek,

> On 3/24/20 8:06 AM, Lukasz Majewski wrote:
> > Hi Marek,  
> 
> Hi,
> 
> [...]
> 
> >>>> You should probably figure out why this doesn't work first and
> >>>> then add fixes on top.    
> >>>
> >>> Haven't you seen such problem during code development on your
> >>> setup when developing this patch?     
> >>
> >> During the development of the patch, I don't remember, sorry. I
> >> most certainly saw various failure modes, however those should not
> >> be present mainline.  
> > 
> > The issue is that the qhtoken is not updated at all. 
> > 
> > Maybe you remember - is Linux using async setup by default (as
> > introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?  
> 
> If I recall correctly, it is using async schedule for bulk transfers.
> But the code is available, so you can double-check that.
> 
> >> I tested this patch with the problematic USB sticks on R-Car Gen3
> >> and with SMSC95xx USB ethernet adapter last weekend and I didn't
> >> see a problem.  
> > 
> > Ok.
> > 
> > For i.MX6Q:
> > The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
> > iMX6Q to fail after a few minutes of testing. General in i.MX6Q the
> > usb is NOT robust at all.
> > 
> > For i.MX53:
> > With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
> > breaks after a few minutes.  
> 
> So on CI HDRC , there is some difference in behavior. That is what you
> need to find and fix then.

The conclusion is that some boards/implementations are broken.

> 
> > With this patch series applied it works for 2 days now without any
> > issue.  
> 
> Except performance is totally degraded

So we do have _very_ fast USB which breaks after a few minutes of
constant testing (with procedure stated earlier).

> and there is still no clear
> explanation _why_ any of these patches are needed

Haven't I explicitly explained in previous mails why XACTARR error shall
be handled? Nor the original thread did it? Wasn't the cover-letter
verbose enough?

> and/or whether doing
> write to a block device with these patches may cause data corruption.

So I will ask differently - what _may_ happen when the "TD -
token=XXXX" error shows up and the board hangs? Wouldn't we risk some
unwanted storage corruption?



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20200324/a0b2a941/attachment.sig>
Marek Vasut March 24, 2020, 6:33 p.m. UTC | #9
On 3/24/20 7:11 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

>> On 3/24/20 8:06 AM, Lukasz Majewski wrote:
>>> Hi Marek,  
>>
>> Hi,
>>
>> [...]
>>
>>>>>> You should probably figure out why this doesn't work first and
>>>>>> then add fixes on top.    
>>>>>
>>>>> Haven't you seen such problem during code development on your
>>>>> setup when developing this patch?     
>>>>
>>>> During the development of the patch, I don't remember, sorry. I
>>>> most certainly saw various failure modes, however those should not
>>>> be present mainline.  
>>>
>>> The issue is that the qhtoken is not updated at all. 
>>>
>>> Maybe you remember - is Linux using async setup by default (as
>>> introduced in SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6) ?  
>>
>> If I recall correctly, it is using async schedule for bulk transfers.
>> But the code is available, so you can double-check that.
>>
>>>> I tested this patch with the problematic USB sticks on R-Car Gen3
>>>> and with SMSC95xx USB ethernet adapter last weekend and I didn't
>>>> see a problem.  
>>>
>>> Ok.
>>>
>>> For i.MX6Q:
>>> The SHA1: 02b0e1a36c5bc20174299312556ec4e266872bd6 patch causes the
>>> iMX6Q to fail after a few minutes of testing. General in i.MX6Q the
>>> usb is NOT robust at all.
>>>
>>> For i.MX53:
>>> With patch 02b0e1a36c5bc20174299312556ec4e266872bd6 applied it also
>>> breaks after a few minutes.  
>>
>> So on CI HDRC , there is some difference in behavior. That is what you
>> need to find and fix then.
> 
> The conclusion is that some boards/implementations are broken.

At least systems with CI HDRC.

>>> With this patch series applied it works for 2 days now without any
>>> issue.  
>>
>> Except performance is totally degraded
> 
> So we do have _very_ fast USB which breaks after a few minutes of
> constant testing (with procedure stated earlier).

No, we have a controller which manifests some problem and that problem
needs to be identified and fixed. Whether it's in the stack or in the
controller driver is to be seen.

>> and there is still no clear
>> explanation _why_ any of these patches are needed
> 
> Haven't I explicitly explained in previous mails why XACTARR error shall
> be handled? Nor the original thread did it? Wasn't the cover-letter
> verbose enough?

Regarding XACTERR, I agree it should be handled somehow.

However, I don't think handling XACTERR is what fixes your problems with
the USB sticks, is it ?

Also, it is still unclear whether handling XACTERR exactly the same as
STALL is the right thing to do. Is it ? I think it's not.

>> and/or whether doing
>> write to a block device with these patches may cause data corruption.
> 
> So I will ask differently - what _may_ happen when the "TD -
> token=XXXX" error shows up and the board hangs? Wouldn't we risk some
> unwanted storage corruption?

The behavior is undefined.
diff mbox series

Patch

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 1cc02052f5..0a77111f80 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -309,7 +309,7 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	volatile struct qTD *vtd;
 	unsigned long ts;
 	uint32_t *tdp;
-	uint32_t endpt, maxpacket, token, usbsts, qhtoken;
+	uint32_t endpt, maxpacket, token, usbsts;
 	uint32_t c, toggle;
 	uint32_t cmd;
 	int timeout;
@@ -553,21 +553,22 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	flush_dcache_range((unsigned long)qtd,
 			   ALIGN_END_ADDR(struct qTD, qtd, qtd_count));
 
+	/* Set async. queue head pointer. */
+	ehci_writel(&ctrl->hcor->or_asynclistaddr, virt_to_phys(&ctrl->qh_list));
+
 	usbsts = ehci_readl(&ctrl->hcor->or_usbsts);
 	ehci_writel(&ctrl->hcor->or_usbsts, (usbsts & 0x3f));
 
 	/* Enable async. schedule. */
 	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
-	if (!(cmd & CMD_ASE)) {
-		cmd |= CMD_ASE;
-		ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+	cmd |= CMD_ASE;
+	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
 
-		ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS,
-				100 * 1000);
-		if (ret < 0) {
-			printf("EHCI fail timeout STS_ASS set\n");
-			goto fail;
-		}
+	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, STS_ASS,
+			100 * 1000);
+	if (ret < 0) {
+		printf("EHCI fail timeout STS_ASS set\n");
+		goto fail;
 	}
 
 	/* Wait for TDs to be processed. */
@@ -588,11 +589,6 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 			break;
 		WATCHDOG_RESET();
 	} while (get_timer(ts) < timeout);
-	qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token);
-
-	ctrl->qh_list.qh_link = cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | QH_LINK_TYPE_QH);
-	flush_dcache_range((unsigned long)&ctrl->qh_list,
-		ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
 
 	/*
 	 * Invalidate the memory area occupied by buffer
@@ -611,12 +607,25 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)
 		printf("EHCI timed out on TD - token=%#x\n", token);
 
-	if (!(QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_ACTIVE)) {
-		debug("TOKEN=%#x\n", qhtoken);
-		switch (QT_TOKEN_GET_STATUS(qhtoken) &
+	/* Disable async schedule. */
+	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
+	cmd &= ~CMD_ASE;
+	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+
+	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, 0,
+			100 * 1000);
+	if (ret < 0) {
+		printf("EHCI fail timeout STS_ASS reset\n");
+		goto fail;
+	}
+
+	token = hc32_to_cpu(qh->qh_overlay.qt_token);
+	if (!(QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE)) {
+		debug("TOKEN=%#x\n", token);
+		switch (QT_TOKEN_GET_STATUS(token) &
 			~(QT_TOKEN_STATUS_SPLITXSTATE | QT_TOKEN_STATUS_PERR)) {
 		case 0:
-			toggle = QT_TOKEN_GET_DT(qhtoken);
+			toggle = QT_TOKEN_GET_DT(token);
 			usb_settoggle(dev, usb_pipeendpoint(pipe),
 				       usb_pipeout(pipe), toggle);
 			dev->status = 0;
@@ -634,11 +643,11 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 			break;
 		default:
 			dev->status = USB_ST_CRC_ERR;
-			if (QT_TOKEN_GET_STATUS(qhtoken) & QT_TOKEN_STATUS_HALTED)
+			if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_HALTED)
 				dev->status |= USB_ST_STALLED;
 			break;
 		}
-		dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(qhtoken);
+		dev->act_len = length - QT_TOKEN_GET_TOTALBYTES(token);
 	} else {
 		dev->act_len = 0;
 #ifndef CONFIG_USB_EHCI_FARADAY