usb, xhci, rt2800usb: do not perform Soft Retry

Message ID 20210122104342.12451-1-stf_xl@wp.pl
State New

Commit Message

Stanislaw Gruszka Jan. 22, 2021, 10:43 a.m. UTC
From: Stanislaw Gruszka <stf_xl@wp.pl>

Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction
errors"), rt2800usb devices are unable to operate on some systems. It looks
like, due to firmware or hardware limitations, these devices require full
recovery from USB Transaction Errors.

To avoid the problem, add a URB transfer flag that restores the
pre-f8f80be501aa xhci behaviour when set. For now, set it only in the
rt2800usb driver.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202541
Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors")
Reported-and-tested-by: Bernhard <bernhard.gebetsberger@gmx.at>
Bisected-by: Bernhard <bernhard.gebetsberger@gmx.at>
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
---
 drivers/net/wireless/ralink/rt2x00/rt2x00usb.c | 3 +++
 drivers/usb/core/urb.c                         | 2 +-
 drivers/usb/host/xhci-ring.c                   | 3 ++-
 include/linux/usb.h                            | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

Comments

Andreas Hartmann Jan. 22, 2021, 1:17 p.m. UTC | #1
On 22.01.21 at 12:56 Greg KH wrote:
> On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote:
>> From: Stanislaw Gruszka <stf_xl@wp.pl>
>>
>> Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction
>> errors") on some systems rt2800usb devices are unable to operate. Looks
>> that due to firmware or hardware limitations of those devices, they
>> require full recovery from USB Transaction Errors.
>>
>> To avoid the problem add URB transfer flag, that restore pre f8f80be501aa
>> xhci behaviour when the flag is set. For now only add it only to rt2800usb
>> driver.
> 
> This feels like a really heavy hammer, to add a xhci flag for a single
> broken device.
> 
> Are you sure this is really needed?  What does this device do on other
> operating systems, do they have such a quirk for their host controller
> driver?
> 
> Or is this due to the specific host controller device hardware?  Should
> this be a xhci quirk for a specific pci device instead?

Well, the rt2800usb USB implementation has had a lot of room for optimization
from the very beginning (current throughput is 2 MiB/s vs. 13 MiB/s with the
original driver, for example). That's why I still use a self-patched version
of the original driver (rt5572sta; it's bound to cfg80211 meanwhile), which
doesn't have those problems at all. From my point of view, the goal should be
to solve the real cause of the problem. The original driver works much better
(at least here) and doesn't show this problem at all!

But anyway, from my point of view there is a basic problem with xhci_hcd,
which just seems not to be completely backward compatible with existing USB 2
drivers (see https://marc.info/?l=linux-usb&m=161130327411612&w=2) if the
device is plugged into a USB 3.x interface.

Thanks
Andreas
Mathias Nyman Jan. 22, 2021, 3:22 p.m. UTC | #2
On 22.1.2021 15.17, Andreas Hartmann wrote:
> 
> On 22.01.21 at 12:56 Greg KH wrote:
>> On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote:
>>> From: Stanislaw Gruszka <stf_xl@wp.pl>
>>>
>>> Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction
>>> errors") on some systems rt2800usb devices are unable to operate. Looks
>>> that due to firmware or hardware limitations of those devices, they
>>> require full recovery from USB Transaction Errors.
>>>
>>> To avoid the problem add URB transfer flag, that restore pre f8f80be501aa
>>> xhci behaviour when the flag is set. For now only add it only to rt2800usb
>>> driver.
>>
>> This feels like a really heavy hammer, to add a xhci flag for a single
>> broken device.
>>
>> Are you sure this is really needed?  What does this device do on other
>> operating systems, do they have such a quirk for their host controller
>> driver?
>>
>> Or is this due to the specific host controller device hardware?  Should
>> this be a xhci quirk for a specific pci device instead?
> 
> Well, rt2800usb USB implementation does have a lot of potential for
> optimization since the very beginning (current throughput comparison
> 2 MiB/s vs 13 MiB/s with the original driver e.g.). That's why I'm
> using until today a self patched version (it's bound to cfg80211
> meanwhile) of the original driver (rt5572sta), which doesn't have those
> problems at all. From my point of view, the goal should be to solve the
> real reason for the problem. The original driver works much better
> (leastwise here) and doesn't show this problem at all!

Ok, so it could be a rt2800 driver issue, or it's just hitting some
unlucky sequence that triggers this.

>
> But anyway, there is from my point of view a basic problem with xhci_hcd,
> which just seems not to be completely backward compatible to existing USB 2
> drivers (see https://marc.info/?l=linux-usb&m=161130327411612&w=2) if the
> device is plugged to an USB 3.x interface.

This looks like a different issue, let's keep it in its own thread.

The xHCI usb host controller handles both USB 2 and USB 3 speeds.
If the USB port is connected to an xHC controller then the xhci driver will
be used. If the port is connected to an EHCI then the ehci driver is used.
EHCI does not support USB3 speeds.

It's very possible that something that worked behind an EHCI host has issues
when connected to an xHCI host.

-Mathias
Stanislaw Gruszka Jan. 23, 2021, 10:14 a.m. UTC | #3
On Fri, Jan 22, 2021 at 05:00:21PM +0200, Mathias Nyman wrote:
> >> Or is this due to the specific host controller device hardware?  Should
> >> this be a xhci quirk for a specific pci device instead?
> 
> Exactly, this should be checked.
> Stanislaw, weren't there a few users already that saw this issue?

There are 30+ users cc-ed in the bugzilla report. However, I think some
of them are not affected by the issue originally reported by Bernhard.
They just saw the "WARN Set TR Deq Ptr cmd failed due to incorrect slot
or ep state" message, caused by a different problem, and added a comment
to this particular bug report.

> Do we know what xHCI controllers they were using?

From what I can tell, the issue was reported mostly on ASMedia and AMD
controllers. We can ask for the exact vendor and device IDs and
just add an xhci->quirks flag.

However, I'm not entirely sure that xHCI hardware misbehaviour
is the actual root cause. I think it is equally probable that the
connected device does not handle Soft Retry correctly. In that
case, disabling Soft Retry per device would actually be the
"lightest hammer", since other devices connected to the
xHCI host could still benefit from faster recovery.

Is there a way to debug/identify which side, host or device
hardware, misbehaves when Soft Retry is performed?

Stanislaw
Stanislaw Gruszka Feb. 3, 2021, 9:02 a.m. UTC | #4
On Sat, Jan 23, 2021 at 11:14:19AM +0100, Stanislaw Gruszka wrote:
> On Fri, Jan 22, 2021 at 05:00:21PM +0200, Mathias Nyman wrote:
> > >> Or is this due to the specific host controller device hardware?  Should
> > >> this be a xhci quirk for a specific pci device instead?
> > 
> > Exactly, this should be checked.
> > Stanislaw, weren't there a few users already that saw this issue?
> 
> There are 30+ users cc-ed in the bugzilla report. However I think some
> of them are not affected by issue originally reported by Bernhard.
> They just saw "WARN Set TR Deq Ptr cmd failed due to incorrect slot
> or ep state" message caused by different problem and added comment
> to this particular bug report.
> 
> > Do we know what xHCI controllers they were using?
> 
> What I can tell issue was reported mostly on ASMedia and AMD
> controllers. We can ask for exact vendor and device IDs and
> just add xhci->quirks flag.
> 
> However I'm not entirely sure that xHCI hardware misbehave
> is actual root cause. I think equally probable is that
> connected device do not handle soft retry correctly. In that
> case disabling Soft Retry per device would be actually
> "lightest hammer" since other devices connected to the
> xHCI host could benefit from faster recovery.
> 
> Is there way to debug/identify which side: host or device
> hardware misbehave when Soft Retry is performed ?

So I think we do not have such a possibility. If an xhci quirk is
something that can be accepted, I'll proceed with such a patch.
I'm going to ask the bug reporters for their xHCI controller PCI IDs ...

Stanislaw
Patch

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
index e4473a551241..f1d82b3e6bba 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
@@ -214,6 +214,7 @@  void rt2x00usb_register_read_async(struct rt2x00_dev *rt2x00dev,
 	usb_fill_control_urb(urb, usb_dev, usb_rcvctrlpipe(usb_dev, 0),
 			     (unsigned char *)(&rd->cr), &rd->reg, sizeof(rd->reg),
 			     rt2x00usb_register_read_async_cb, rd);
+	urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK;
 	usb_anchor_urb(urb, rt2x00dev->anchor);
 	if (usb_submit_urb(urb, GFP_ATOMIC) < 0) {
 		usb_unanchor_urb(urb);
@@ -323,6 +324,7 @@  static bool rt2x00usb_kick_tx_entry(struct queue_entry *entry, void *data)
 			  usb_sndbulkpipe(usb_dev, entry->queue->usb_endpoint),
 			  entry->skb->data, length,
 			  rt2x00usb_interrupt_txdone, entry);
+	entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK;
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
@@ -409,6 +411,7 @@  static bool rt2x00usb_kick_rx_entry(struct queue_entry *entry, void *data)
 			  usb_rcvbulkpipe(usb_dev, entry->queue->usb_endpoint),
 			  entry->skb->data, entry->skb->len,
 			  rt2x00usb_interrupt_rxdone, entry);
+	entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK;
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 357b149b20d3..140bac59dc32 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -495,7 +495,7 @@  int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
 
 	/* Check against a simple/standard policy */
 	allowed = (URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT | URB_DIR_MASK |
-			URB_FREE_BUFFER);
+		   URB_SOFT_RETRY_NOT_OK | URB_FREE_BUFFER);
 	switch (xfertype) {
 	case USB_ENDPOINT_XFER_BULK:
 	case USB_ENDPOINT_XFER_INT:
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5677b81c0915..6712e1a7735c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2302,7 +2302,8 @@  static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td,
 		remaining	= 0;
 		break;
 	case COMP_USB_TRANSACTION_ERROR:
-		if ((ep_ring->err_count++ > MAX_SOFT_RETRY) ||
+		if (td->urb->transfer_flags & URB_SOFT_RETRY_NOT_OK ||
+		    (ep_ring->err_count++ > MAX_SOFT_RETRY) ||
 		    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT)
 			break;
 		*status = 0;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 7d72c4e0713c..dcdac2f03263 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1329,6 +1329,7 @@  extern int usb_disabled(void);
 #define URB_ISO_ASAP		0x0002	/* iso-only; use the first unexpired
 					 * slot in the schedule */
 #define URB_NO_TRANSFER_DMA_MAP	0x0004	/* urb->transfer_dma valid on submit */
+#define URB_SOFT_RETRY_NOT_OK	0x0008	/* Avoid XHCI Soft Retry */
 #define URB_ZERO_PACKET		0x0040	/* Finish bulk OUT with short packet */
 #define URB_NO_INTERRUPT	0x0080	/* HINT: no non-error interrupt
 					 * needed */
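
As a reading aid for the xhci-ring.c hunk above, here is a minimal userspace
sketch of the patched condition, next to the per-host quirk alternative raised
in the comments. It is a model, not kernel code: struct retry_ctx, the two
function names and HYPOTHETICAL_HOST_NO_SOFT_RETRY are made up for
illustration, and MAX_SOFT_RETRY is assumed to be 3. Only
URB_SOFT_RETRY_NOT_OK and its value come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the include/linux/usb.h hunk of the patch. */
#define URB_SOFT_RETRY_NOT_OK 0x0008u

/* Assumptions for this sketch, not kernel definitions. */
#define MAX_SOFT_RETRY 3
#define HYPOTHETICAL_HOST_NO_SOFT_RETRY (1ull << 0)

/* Hypothetical container for the state the kernel condition looks at. */
struct retry_ctx {
	uint32_t urb_flags;     /* models td->urb->transfer_flags */
	unsigned int err_count; /* models ep_ring->err_count */
	bool behind_tt;         /* models slot_ctx->tt_info & TT_SLOT */
	uint64_t host_quirks;   /* models an xhci->quirks style bitmask */
};

/*
 * Per-URB variant, same shape as the patched condition.
 * Returns true when a Soft Retry would be attempted, false when the
 * transaction error falls through to full recovery.
 */
static bool soft_retry_per_urb(struct retry_ctx *c)
{
	if ((c->urb_flags & URB_SOFT_RETRY_NOT_OK) ||
	    (c->err_count++ > MAX_SOFT_RETRY) ||
	    c->behind_tt)
		return false;
	return true;
}

/*
 * Per-host variant: what an xhci->quirks based check might look like.
 * The quirk bit is hypothetical and applies to every device on the host.
 */
static bool soft_retry_per_host(struct retry_ctx *c)
{
	if ((c->host_quirks & HYPOTHETICAL_HOST_NO_SOFT_RETRY) ||
	    (c->err_count++ > MAX_SOFT_RETRY) ||
	    c->behind_tt)
		return false;
	return true;
}
```

Because the flag test comes first and || short-circuits, an URB carrying
URB_SOFT_RETRY_NOT_OK never increments err_count in this model. The per-host
variant would instead turn Soft Retry off for every device behind that
controller, which is the trade-off discussed in the thread.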