Message ID | 20230630223638.2250-1-linuxtardis@gmail.com |
---|---|
State | New |
Headers | show |
Series | usb: dwc3: Disable AutoRetry controller feature for dwc_usb3 v2.00a | expand |
Hi, On Sat, Jul 01, 2023, Jakub Vanek wrote: > AutoRetry has been found to cause issues in this controller revision. > This feature allows the controller to send non-terminating/burst retry > ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs > (Retry=1 and Nump=0) to devices when a transaction error occurs. > > Unfortunately, some USB devices do not retry transfers when > the controller sends them a non-terminating retry ACK. After the transfer > times out, the xHCI driver tries to resume normal operation of the > controller by sending a Stop Endpoint command to it. However, this > revision of the controller fails to respond to that command in this > state and the xHCI driver therefore gives up. This is reported via dmesg: > > [sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN > [sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00 > xhci-hcd: xHCI host not responding to stop endpoint command > xhci-hcd: xHCI host controller not responding, assume dead > xhci-hcd: HC died; cleaning up > > This problem has been observed on Odroid HC2. This board has > an integrated JMS578 USB3-to-SATA bridge. The above problem could be > triggered by starting a read-heavy workload on an attached SSD. > After a while, the host controller would die and the SSD would disappear > from the system. > > Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in > the controller") stopped the issue from occurring on Odroid HC2. > As this problem hasn't been reported on other devices, disable > AutoRetry just for the dwc_usb3 revision v2.00a that the board SoC > (Exynos 5422) uses. > > Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller") > Link: https://urldefense.com/v3/__https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuAJ1zxnmw$ > Link: https://urldefense.com/v3/__https://forum.odroid.com/viewtopic.php?t=42630__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuCzOGIVPA$ > Cc: stable@vger.kernel.org > Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com> > Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> > Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> > Signed-off-by: Jakub Vanek <linuxtardis@gmail.com> > --- > drivers/usb/dwc3/core.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > index d68958e151a7..d742fc882d7e 100644 > --- a/drivers/usb/dwc3/core.c > +++ b/drivers/usb/dwc3/core.c > @@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 *dwc) > * Host mode on seeing transaction errors(CRC errors or internal > * overrun scenerios) on IN transfers to reply to the device > * with a non-terminating retry ACK (i.e, an ACK transcation > - * packet with Retry=1 & Nump != 0) > + * packet with Retry=1 & Nump != 0). > + * > + * Do not enable this for DWC_usb3 v2.00a. This controller > + * revision stops responding to Stop Endpoint commands if > + * a USB device does not retry after a non-terminating retry > + * ACK is sent to it. > */ > - reg |= DWC3_GUCTL_HSTINAUTORETRY; > + if (!DWC3_VER_IS(DWC3, 200A)) > + reg |= DWC3_GUCTL_HSTINAUTORETRY; > > dwc3_writel(dwc->regs, DWC3_GUCTL, reg); > } > -- > 2.25.1 > I brought up this inquiry internally. Please wait as I need to review this further. The handling for this may be different depending on the team's feedback. Thanks, Thinh
On Wed, 2023-07-05 at 22:47 +0000, Thinh Nguyen wrote: > Hi, > > On Sat, Jul 01, 2023, Jakub Vanek wrote: > > AutoRetry has been found to cause issues in this controller revision. > > This feature allows the controller to send non-terminating/burst retry > > ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs > > (Retry=1 and Nump=0) to devices when a transaction error occurs. > > > > Unfortunately, some USB devices do not retry transfers when > > the controller sends them a non-terminating retry ACK. After the transfer > > times out, the xHCI driver tries to resume normal operation of the > > controller by sending a Stop Endpoint command to it. However, this > > revision of the controller fails to respond to that command in this > > state and the xHCI driver therefore gives up. This is reported via dmesg: > > > > [sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN > > [sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00 > > xhci-hcd: xHCI host not responding to stop endpoint command > > xhci-hcd: xHCI host controller not responding, assume dead > > xhci-hcd: HC died; cleaning up > > > > This problem has been observed on Odroid HC2. This board has > > an integrated JMS578 USB3-to-SATA bridge. The above problem could be > > triggered by starting a read-heavy workload on an attached SSD. > > After a while, the host controller would die and the SSD would disappear > > from the system. > > > > Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in > > the controller") stopped the issue from occurring on Odroid HC2. > > As this problem hasn't been reported on other devices, disable > > AutoRetry just for the dwc_usb3 revision v2.00a that the board SoC > > (Exynos 5422) uses. > > > > Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller") > > Link: https://urldefense.com/v3/__https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuAJ1zxnmw$ > > Link: https://urldefense.com/v3/__https://forum.odroid.com/viewtopic.php?t=42630__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuCzOGIVPA$ > > Cc: stable@vger.kernel.org > > Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com> > > Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> > > Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> > > Signed-off-by: Jakub Vanek <linuxtardis@gmail.com> > > --- > > drivers/usb/dwc3/core.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > index d68958e151a7..d742fc882d7e 100644 > > --- a/drivers/usb/dwc3/core.c > > +++ b/drivers/usb/dwc3/core.c > > @@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 *dwc) > > * Host mode on seeing transaction errors(CRC errors or internal > > * overrun scenerios) on IN transfers to reply to the device > > * with a non-terminating retry ACK (i.e, an ACK transcation > > - * packet with Retry=1 & Nump != 0) > > + * packet with Retry=1 & Nump != 0). > > + * > > + * Do not enable this for DWC_usb3 v2.00a. This controller > > + * revision stops responding to Stop Endpoint commands if > > + * a USB device does not retry after a non-terminating retry > > + * ACK is sent to it. > > */ > > - reg |= DWC3_GUCTL_HSTINAUTORETRY; > > + if (!DWC3_VER_IS(DWC3, 200A)) > > + reg |= DWC3_GUCTL_HSTINAUTORETRY; > > > > dwc3_writel(dwc->regs, DWC3_GUCTL, reg); > > } > > -- > > 2.25.1 > > > > I brought up this inquiry internally. Please wait as I need to review > this further. The handling for this may be different depending on the > team's feedback. > OK; feel free to contact me if I could be of any help. > Thanks, > Thinh Thank you, Jakub
Hi Jakub, On Thu, Jul 06, 2023, Jakub Vaněk wrote: > On Wed, 2023-07-05 at 22:47 +0000, Thinh Nguyen wrote: > > Hi, > > > > On Sat, Jul 01, 2023, Jakub Vanek wrote: > > > AutoRetry has been found to cause issues in this controller revision. > > > This feature allows the controller to send non-terminating/burst retry > > > ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs > > > (Retry=1 and Nump=0) to devices when a transaction error occurs. > > > > > > Unfortunately, some USB devices do not retry transfers when > > > the controller sends them a non-terminating retry ACK. After the transfer > > > times out, the xHCI driver tries to resume normal operation of the > > > controller by sending a Stop Endpoint command to it. However, this > > > revision of the controller fails to respond to that command in this > > > state and the xHCI driver therefore gives up. This is reported via dmesg: > > > > > > [sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN > > > [sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00 > > > xhci-hcd: xHCI host not responding to stop endpoint command > > > xhci-hcd: xHCI host controller not responding, assume dead > > > xhci-hcd: HC died; cleaning up > > > > > > This problem has been observed on Odroid HC2. This board has > > > an integrated JMS578 USB3-to-SATA bridge. The above problem could be > > > triggered by starting a read-heavy workload on an attached SSD. > > > After a while, the host controller would die and the SSD would disappear > > > from the system. > > > > > > Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in > > > the controller") stopped the issue from occurring on Odroid HC2. > > > As this problem hasn't been reported on other devices, disable > > > AutoRetry just for the dwc_usb3 revision v2.00a that the board SoC > > > (Exynos 5422) uses. > > > > > > Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller") > > > Link: https://urldefense.com/v3/__https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuAJ1zxnmw$ > > > Link: https://urldefense.com/v3/__https://forum.odroid.com/viewtopic.php?t=42630__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuCzOGIVPA$ > > > Cc: stable@vger.kernel.org > > > Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com> > > > Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> > > > Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> > > > Signed-off-by: Jakub Vanek <linuxtardis@gmail.com> > > > --- > > > drivers/usb/dwc3/core.c | 10 ++++++++-- > > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > index d68958e151a7..d742fc882d7e 100644 > > > --- a/drivers/usb/dwc3/core.c > > > +++ b/drivers/usb/dwc3/core.c > > > @@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 *dwc) > > > * Host mode on seeing transaction errors(CRC errors or internal > > > * overrun scenerios) on IN transfers to reply to the device > > > * with a non-terminating retry ACK (i.e, an ACK transcation > > > - * packet with Retry=1 & Nump != 0) > > > + * packet with Retry=1 & Nump != 0). > > > + * > > > + * Do not enable this for DWC_usb3 v2.00a. This controller > > > + * revision stops responding to Stop Endpoint commands if > > > + * a USB device does not retry after a non-terminating retry > > > + * ACK is sent to it. > > > */ > > > - reg |= DWC3_GUCTL_HSTINAUTORETRY; > > > + if (!DWC3_VER_IS(DWC3, 200A)) > > > + reg |= DWC3_GUCTL_HSTINAUTORETRY; > > > > > > dwc3_writel(dwc->regs, DWC3_GUCTL, reg); > > > } > > > -- > > > 2.25.1 > > > > > > > I brought up this inquiry internally. Please wait as I need to review > > this further. The handling for this may be different depending on the > > team's feedback. > > > > OK; feel free to contact me if I could be of any help. > Sorry for the delay. After discussion, this behavior is the same across different controller versions. The consensus is to disable this feature altoghether. There will not be any noticeable performance impact. Would you mind revise the patch to disable this feature for all? Many thanks, Thinh
Hi Thinh, On Tue, 2023-07-11 at 21:48 +0000, Thinh Nguyen wrote: > Hi Jakub, > > On Thu, Jul 06, 2023, Jakub Vaněk wrote: > > On Wed, 2023-07-05 at 22:47 +0000, Thinh Nguyen wrote: > > > Hi, > > > > > > On Sat, Jul 01, 2023, Jakub Vanek wrote: > > > > AutoRetry has been found to cause issues in this controller > > > > revision. > > > > This feature allows the controller to send non- > > > > terminating/burst retry > > > > ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs > > > > (Retry=1 and Nump=0) to devices when a transaction error > > > > occurs. > > > > > > > > Unfortunately, some USB devices do not retry transfers when > > > > the controller sends them a non-terminating retry ACK. After > > > > the transfer > > > > times out, the xHCI driver tries to resume normal operation of > > > > the > > > > controller by sending a Stop Endpoint command to it. However, > > > > this > > > > revision of the controller fails to respond to that command in > > > > this > > > > state and the xHCI driver therefore gives up. This is reported > > > > via dmesg: > > > > > > > > [sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN > > > > [sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00 > > > > xhci-hcd: xHCI host not responding to stop endpoint command > > > > xhci-hcd: xHCI host controller not responding, assume dead > > > > xhci-hcd: HC died; cleaning up > > > > > > > > This problem has been observed on Odroid HC2. This board has > > > > an integrated JMS578 USB3-to-SATA bridge. The above problem > > > > could be > > > > triggered by starting a read-heavy workload on an attached SSD. > > > > After a while, the host controller would die and the SSD would > > > > disappear > > > > from the system. > > > > > > > > Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry > > > > feature in > > > > the controller") stopped the issue from occurring on Odroid > > > > HC2. > > > > As this problem hasn't been reported on other devices, disable > > > > AutoRetry just for the dwc_usb3 revision v2.00a that the board > > > > SoC > > > > (Exynos 5422) uses. > > > > > > > > Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature > > > > in the controller") > > > > Link: > > > > https://urldefense.com/v3/__https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuAJ1zxnmw$ > > > > > > > > Link: > > > > https://urldefense.com/v3/__https://forum.odroid.com/viewtopic.php?t=42630__;!!A4F2R9G_pg!YWgF6oqfuVY6xstZmroukjlrwAFEYEQE8uj_iUu4fd10mnJxEPG345k75dMLLdNFP8rT1leok-aPNkz_FuCzOGIVPA$ > > > > > > > > Cc: stable@vger.kernel.org > > > > Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com> > > > > Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> > > > > Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> > > > > Signed-off-by: Jakub Vanek <linuxtardis@gmail.com> > > > > --- > > > > drivers/usb/dwc3/core.c | 10 ++++++++-- > > > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > index d68958e151a7..d742fc882d7e 100644 > > > > --- a/drivers/usb/dwc3/core.c > > > > +++ b/drivers/usb/dwc3/core.c > > > > @@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 > > > > *dwc) > > > > * Host mode on seeing transaction errors(CRC > > > > errors or internal > > > > * overrun scenerios) on IN transfers to reply > > > > to the device > > > > * with a non-terminating retry ACK (i.e, an > > > > ACK transcation > > > > - * packet with Retry=1 & Nump != 0) > > > > + * packet with Retry=1 & Nump != 0). > > > > + * > > > > + * Do not enable this for DWC_usb3 v2.00a. This > > > > controller > > > > + * revision stops responding to Stop Endpoint > > > > commands if > > > > + * a USB device does not retry after a non- > > > > terminating retry > > > > + * ACK is sent to it. > > > > */ > > > > - reg |= DWC3_GUCTL_HSTINAUTORETRY; > > > > + if (!DWC3_VER_IS(DWC3, 200A)) > > > > + reg |= DWC3_GUCTL_HSTINAUTORETRY; > > > > > > > > dwc3_writel(dwc->regs, DWC3_GUCTL, reg); > > > > } > > > > -- > > > > 2.25.1 > > > > > > > > > > I brought up this inquiry internally. Please wait as I need to > > > review > > > this further. The handling for this may be different depending on > > > the > > > team's feedback. > > > > > > > OK; feel free to contact me if I could be of any help. > > > > Sorry for the delay. After discussion, this behavior is the same > across > different controller versions. The consensus is to disable this > feature > altoghether. There will not be any noticeable performance impact. > > Would you mind revise the patch to disable this feature for all? Not at all, I have sent a new revision of the patch: https://lore.kernel.org/linux-usb/20230712224037.24948-1-linuxtardis@gmail.com/ Reverting the commit altogether seemed preferable to me to updating the revision-specific patch. However, if the other way would be better, I can send an updated revision. > Many thanks, > Thinh Thank you, Jakub
On Thu, Jul 13, 2023, Jakub Vaněk wrote: > Hi Thinh, > > On Tue, 2023-07-11 at 21:48 +0000, Thinh Nguyen wrote: > > Hi Jakub, > > > > On Thu, Jul 06, 2023, Jakub Vaněk wrote: > > > On Wed, 2023-07-05 at 22:47 +0000, Thinh Nguyen wrote: > > > > I brought up this inquiry internally. Please wait as I need to > > > > review > > > > this further. The handling for this may be different depending on > > > > the > > > > team's feedback. > > > > > > > > > > OK; feel free to contact me if I could be of any help. > > > > > > > Sorry for the delay. After discussion, this behavior is the same > > across > > different controller versions. The consensus is to disable this > > feature > > altoghether. There will not be any noticeable performance impact. > > > > Would you mind revise the patch to disable this feature for all? > > Not at all, I have sent a new revision of the patch: > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20230712224037.24948-1-linuxtardis@gmail.com/__;!!A4F2R9G_pg!euvmvJVr_AMCpyd59tiav4Co4u8p0TKzsLq0W7yKoyIlN_K7WFv2GP57wXmsddsJoxsEq6_Bhx8yp5gG8vAqebKhkw$ > > Reverting the commit altogether seemed preferable to me to updating the > revision-specific patch. However, if the other way would be better, I > can send an updated revision. > Reverting it is fine also. We'll keep the default setting from CoreConsultant. Thanks, Thinh
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index d68958e151a7..d742fc882d7e 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 *dwc) * Host mode on seeing transaction errors(CRC errors or internal * overrun scenerios) on IN transfers to reply to the device * with a non-terminating retry ACK (i.e, an ACK transcation - * packet with Retry=1 & Nump != 0) + * packet with Retry=1 & Nump != 0). + * + * Do not enable this for DWC_usb3 v2.00a. This controller + * revision stops responding to Stop Endpoint commands if + * a USB device does not retry after a non-terminating retry + * ACK is sent to it. */ - reg |= DWC3_GUCTL_HSTINAUTORETRY; + if (!DWC3_VER_IS(DWC3, 200A)) + reg |= DWC3_GUCTL_HSTINAUTORETRY; dwc3_writel(dwc->regs, DWC3_GUCTL, reg); }
AutoRetry has been found to cause issues in this controller revision. This feature allows the controller to send non-terminating/burst retry ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs (Retry=1 and Nump=0) to devices when a transaction error occurs. Unfortunately, some USB devices do not retry transfers when the controller sends them a non-terminating retry ACK. After the transfer times out, the xHCI driver tries to resume normal operation of the controller by sending a Stop Endpoint command to it. However, this revision of the controller fails to respond to that command in this state and the xHCI driver therefore gives up. This is reported via dmesg: [sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN [sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00 xhci-hcd: xHCI host not responding to stop endpoint command xhci-hcd: xHCI host controller not responding, assume dead xhci-hcd: HC died; cleaning up This problem has been observed on Odroid HC2. This board has an integrated JMS578 USB3-to-SATA bridge. The above problem could be triggered by starting a read-heavy workload on an attached SSD. After a while, the host controller would die and the SSD would disappear from the system. Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller") stopped the issue from occurring on Odroid HC2. As this problem hasn't been reported on other devices, disable AutoRetry just for the dwc_usb3 revision v2.00a that the board SoC (Exynos 5422) uses. Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller") Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gmail.com/ Link: https://forum.odroid.com/viewtopic.php?t=42630 Cc: stable@vger.kernel.org Cc: Mauro Ribeiro <mauro.ribeiro@hardkernel.com> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Suggested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Jakub Vanek <linuxtardis@gmail.com> --- drivers/usb/dwc3/core.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)