mbox series

[0/3] USB: core: Don't overwrite device descriptor during reinitialization

Message ID 6eadec91-990a-4fbd-8883-8366c4a4d8e4@rowland.harvard.edu
Headers show
Series USB: core: Don't overwrite device descriptor during reinitialization | expand

Message

Alan Stern Aug. 4, 2023, 7:09 p.m. UTC
An outstanding syzbot bug report has been traced to a race between the
routine that reads in the device descriptor for a device being
reinitialized and the routine that writes the descriptors to a sysfs
attribute file.  The problem is that reinitializing a device, like
initializing it for the first time, stores the device descriptor
directly in the usb_device structure, where it may be accessed
concurrently as part of sending the descriptors to the sysfs reader.

This three-part series fixes the problem:

	The first patch unites the code paths responsible for first
	reading the device descriptor in hub.c's old scheme and new
	scheme, so that neither of them will call
	usb_get_device_descriptor().

	The second patch changes usb_get_device_descriptor(), making
	it return the descriptor in a dynamically allocated buffer
	rather than storing it directly in the device structure.

	The third patch changes hub_port_init(), adding a new argument
	that specifies a buffer in which to store the device
	descriptor for devices being reinitialized.

As a result of these changes, the copy of the device descriptor stored
in the usb_device structure will never be overwritten once it has been
initialized.  This eliminates the data race causing the bug identified
by syzbot.

It would be nice at some point to make a similar change to the code
that reads the device's BOS descriptor; reinitialization should not
overwrite its existing data either.  This series doesn't attempt to do
that, but it would be a good thing to do.

Alan Stern

Comments

Thinh Nguyen Aug. 10, 2023, 12:28 a.m. UTC | #1
Hi Alan,

On Fri, Aug 04, 2023, Alan Stern wrote:
> An outstanding syzbot bug report has been traced to a race between the
> routine that reads in the device descriptor for a device being
> reinitialized and the routine that writes the descriptors to a sysfs
> attribute file.  The problem is that reinitializing a device, like
> initializing it for the first time, stores the device descriptor
> directly in the usb_device structure, where it may be accessed
> concurrently as part of sending the descriptors to the sysfs reader.
> 
> This three-part series fixes the problem:
> 
> 	The first patch unites the code paths responsible for first
> 	reading the device descriptor in hub.c's old scheme and new
> 	scheme, so that neither of them will call
> 	usb_get_device_descriptor().
> 
> 	The second patch changes usb_get_device_descriptor(), making
> 	it return the descriptor in a dynamically allocated buffer
> 	rather than storing it directly in the device structure.
> 
> 	The third patch changes hub_port_init(), adding a new argument
> 	that specifies a buffer in which to store the device
> 	descriptor for devices being reinitialized.
> 
> As a result of these changes, the copy of the device descriptor stored
> in the usb_device structure will never be overwritten once it has been
> initialized.  This eliminates the data race causing the bug identified
> by syzbot.
> 
> It would be nice at some point to make a similar change to the code
> that reads the device's BOS descriptor; reinitialization should not
> overwrite its existing data either.  This series doesn't attempt to do
> that, but it would be a good thing to do.
> 

Testing from Greg's usb-next branch, this series causes problem with
device enumeration:

[   31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
[   31.663107] usb 2-1: device descriptor read/8, error -71
[   31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
[   31.965122] usb 2-1: device descriptor read/8, error -71
[   32.080991] usb usb2-port1: attempt power cycle
[   34.826893] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 5 using xhci_hcd
[   34.839241] usb 2-1: device descriptor read/8, error -71
[   35.129908] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
[   35.142264] usb 2-1: device descriptor read/8, error -71
[   35.257155] usb usb2-port1: attempt power cycle
[   37.258922] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[   38.115053] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 8 using xhci_hcd
[   38.127407] usb 2-1: device descriptor read/8, error -71
[   38.417066] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[   38.429428] usb 2-1: device descriptor read/8, error -71
[   38.545315] usb usb2-port1: attempt power cycle
[   43.347312] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
[   43.359659] usb 2-2: device descriptor read/8, error -71
[   43.649323] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 12 using xhci_hcd
[   43.661681] usb 2-2: device descriptor read/8, error -71
[   43.777566] usb usb2-port2: attempt power cycle

I was testing with our host along with other common vendor hosts with a
few 3.x devices. Reverting this series resolves the issue. I didn't do
extensive tests for various speeds or debug it. I just want to notify
this first perhaps you can figure out the issue right away.

I can look into it and provide more info later this week. If you want to
print any debug, let me know and I can provide later this week.

Thanks,
Thinh
Alan Stern Aug. 10, 2023, 1:47 a.m. UTC | #2
On Thu, Aug 10, 2023 at 12:28:27AM +0000, Thinh Nguyen wrote:
> Hi Alan,
> 
> On Fri, Aug 04, 2023, Alan Stern wrote:
> > An outstanding syzbot bug report has been traced to a race between the
> > routine that reads in the device descriptor for a device being
> > reinitialized and the routine that writes the descriptors to a sysfs
> > attribute file.  The problem is that reinitializing a device, like
> > initializing it for the first time, stores the device descriptor
> > directly in the usb_device structure, where it may be accessed
> > concurrently as part of sending the descriptors to the sysfs reader.
> > 
> > This three-part series fixes the problem:
> > 
> > 	The first patch unites the code paths responsible for first
> > 	reading the device descriptor in hub.c's old scheme and new
> > 	scheme, so that neither of them will call
> > 	usb_get_device_descriptor().
> > 
> > 	The second patch changes usb_get_device_descriptor(), making
> > 	it return the descriptor in a dynamically allocated buffer
> > 	rather than storing it directly in the device structure.
> > 
> > 	The third patch changes hub_port_init(), adding a new argument
> > 	that specifies a buffer in which to store the device
> > 	descriptor for devices being reinitialized.
> > 
> > As a result of these changes, the copy of the device descriptor stored
> > in the usb_device structure will never be overwritten once it has been
> > initialized.  This eliminates the data race causing the bug identified
> > by syzbot.
> > 
> > It would be nice at some point to make a similar change to the code
> > that reads the device's BOS descriptor; reinitialization should not
> > overwrite its existing data either.  This series doesn't attempt to do
> > that, but it would be a good thing to do.
> > 
> 
> Testing from Greg's usb-next branch, this series causes problem with
> device enumeration:
> 
> [   31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> [   31.663107] usb 2-1: device descriptor read/8, error -71
> [   31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
> [   31.965122] usb 2-1: device descriptor read/8, error -71
> [   32.080991] usb usb2-port1: attempt power cycle
> [   34.826893] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 5 using xhci_hcd
> [   34.839241] usb 2-1: device descriptor read/8, error -71
> [   35.129908] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
> [   35.142264] usb 2-1: device descriptor read/8, error -71
> [   35.257155] usb usb2-port1: attempt power cycle
> [   37.258922] usb 1-1: new high-speed USB device number 5 using xhci_hcd
> [   38.115053] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 8 using xhci_hcd
> [   38.127407] usb 2-1: device descriptor read/8, error -71
> [   38.417066] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
> [   38.429428] usb 2-1: device descriptor read/8, error -71
> [   38.545315] usb usb2-port1: attempt power cycle
> [   43.347312] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
> [   43.359659] usb 2-2: device descriptor read/8, error -71
> [   43.649323] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 12 using xhci_hcd
> [   43.661681] usb 2-2: device descriptor read/8, error -71
> [   43.777566] usb usb2-port2: attempt power cycle
> 
> I was testing with our host along with other common vendor hosts with a
> few 3.x devices. Reverting this series resolves the issue. I didn't do
> extensive tests for various speeds or debug it. I just want to notify
> this first perhaps you can figure out the issue right away.
> 
> I can look into it and provide more info later this week. If you want to
> print any debug, let me know and I can provide later this week.

Thanks for the notification.  The problem is almost certainly caused by 
the first patch in the series, which changes the way that the initial 
read/8 thing is done.  However, I have no idea what part of the patch 
could be causing these errors.  I would appreciate anything you can find 
out.

You should concentrate your initial investigation on the new 
get_bMaxPacketSize0() routine.  In particular, does the -EPROTO error 
value arise as the return value from the usb_control_msg() call, or does 
it happen because of the values stored in buf?  In the first case, how 
does this call differ from the one that used to be made by 
usb_get_device_descriptor()?  And in the second case, what are the 
values of rc, buf->bMaxPacketSize0, and buf->bDescriptorType?

Alan Stern
Alan Stern Aug. 10, 2023, 4:34 p.m. UTC | #3
On Wed, Aug 09, 2023 at 09:47:14PM -0400, Alan Stern wrote:
> On Thu, Aug 10, 2023 at 12:28:27AM +0000, Thinh Nguyen wrote:

> > Testing from Greg's usb-next branch, this series causes problem with
> > device enumeration:
> > 
> > [   31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> > [   31.663107] usb 2-1: device descriptor read/8, error -71
> > [   31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
> > [   31.965122] usb 2-1: device descriptor read/8, error -71
> > [   32.080991] usb usb2-port1: attempt power cycle
> > [   34.826893] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 5 using xhci_hcd
> > [   34.839241] usb 2-1: device descriptor read/8, error -71
> > [   35.129908] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
> > [   35.142264] usb 2-1: device descriptor read/8, error -71
> > [   35.257155] usb usb2-port1: attempt power cycle
> > [   37.258922] usb 1-1: new high-speed USB device number 5 using xhci_hcd
> > [   38.115053] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 8 using xhci_hcd
> > [   38.127407] usb 2-1: device descriptor read/8, error -71
> > [   38.417066] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
> > [   38.429428] usb 2-1: device descriptor read/8, error -71
> > [   38.545315] usb usb2-port1: attempt power cycle
> > [   43.347312] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
> > [   43.359659] usb 2-2: device descriptor read/8, error -71
> > [   43.649323] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 12 using xhci_hcd
> > [   43.661681] usb 2-2: device descriptor read/8, error -71
> > [   43.777566] usb usb2-port2: attempt power cycle
> > 
> > I was testing with our host along with other common vendor hosts with a
> > few 3.x devices. Reverting this series resolves the issue. I didn't do
> > extensive tests for various speeds or debug it. I just want to notify
> > this first perhaps you can figure out the issue right away.
> > 
> > I can look into it and provide more info later this week. If you want to
> > print any debug, let me know and I can provide later this week.
> 
> Thanks for the notification.  The problem is almost certainly caused by 
> the first patch in the series, which changes the way that the initial 
> read/8 thing is done.  However, I have no idea what part of the patch 
> could be causing these errors.  I would appreciate anything you can find 
> out.
> 
> You should concentrate your initial investigation on the new 
> get_bMaxPacketSize0() routine.  In particular, does the -EPROTO error 
> value arise as the return value from the usb_control_msg() call, or does 
> it happen because of the values stored in buf?  In the first case, how 
> does this call differ from the one that used to be made by 
> usb_get_device_descriptor()?  And in the second case, what are the 
> values of rc, buf->bMaxPacketSize0, and buf->bDescriptorType?

Never mind; I found the problem.  I had forgotten that at SuperSpeed or 
faster, the device descriptor uses a logarithmic encoding for 
bMaxPacketSize0.

The patch below should fix things up.  Let me know how it goes.

Alan Stern



 drivers/usb/core/hub.c |   36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

Index: usb-devel/drivers/usb/core/hub.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hub.c
+++ usb-devel/drivers/usb/core/hub.c
@@ -4705,7 +4705,7 @@ static int get_bMaxPacketSize0(struct us
 				buf, size,
 				initial_descriptor_timeout);
 		switch (buf->bMaxPacketSize0) {
-		case 8: case 16: case 32: case 64: case 255:
+		case 8: case 16: case 32: case 64: case 9:
 			if (buf->bDescriptorType == USB_DT_DEVICE) {
 				rc = buf->bMaxPacketSize0;
 				break;
@@ -4992,23 +4992,35 @@ hub_port_init(struct usb_hub *hub, struc
 	if (retval)
 		goto fail;
 
-	if (maxp0 == 0xff || udev->speed >= USB_SPEED_SUPER)
-		i = 512;
-	else
-		i = maxp0;
-	if (usb_endpoint_maxp(&udev->ep0.desc) != i) {
-		if (udev->speed == USB_SPEED_LOW ||
-				!(i == 8 || i == 16 || i == 32 || i == 64)) {
-			dev_err(&udev->dev, "Invalid ep0 maxpacket: %d\n", i);
-			retval = -EMSGSIZE;
-			goto fail;
-		}
+	/*
+	 * Check the ep0 maxpacket guess and correct it if necessary.
+	 * maxp0 is the value stored in the device descriptor;
+	 * i is the value it encodes (logarithmic for SuperSpeed or greater).
+	 */
+	i = maxp0;
+	if (udev->speed >= USB_SPEED_SUPER) {
+		if (maxp0 <= 16)
+			i = 1 << maxp0;
+		else
+			i = 0;		/* Invalid */
+	}
+	if (usb_endpoint_maxp(&udev->ep0.desc) == i) {
+		;	/* Initial ep0 maxpacket guess is right */
+	} else if ((udev->speed == USB_SPEED_FULL ||
+				udev->speed == USB_SPEED_HIGH) &&
+			(i == 8 || i == 16 || i == 32 || i == 64)) {
+		/* Initial guess is wrong; use the descriptor's value */
 		if (udev->speed == USB_SPEED_FULL)
 			dev_dbg(&udev->dev, "ep0 maxpacket = %d\n", i);
 		else
 			dev_warn(&udev->dev, "Using ep0 maxpacket: %d\n", i);
 		udev->ep0.desc.wMaxPacketSize = cpu_to_le16(i);
 		usb_ep0_reinit(udev);
+	} else {
+		/* Initial guess is wrong and descriptor's value is invalid */
+		dev_err(&udev->dev, "Invalid ep0 maxpacket: %d\n", maxp0);
+		retval = -EMSGSIZE;
+		goto fail;
 	}
 
 	descr = usb_get_device_descriptor(udev);
Thinh Nguyen Aug. 10, 2023, 10:39 p.m. UTC | #4
On Thu, Aug 10, 2023, Alan Stern wrote:
> On Wed, Aug 09, 2023 at 09:47:14PM -0400, Alan Stern wrote:
> > On Thu, Aug 10, 2023 at 12:28:27AM +0000, Thinh Nguyen wrote:
> 
> > > Testing from Greg's usb-next branch, this series causes problem with
> > > device enumeration:
> > > 
> > > [   31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> > > [   31.663107] usb 2-1: device descriptor read/8, error -71
> > > [   31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
> > > [   31.965122] usb 2-1: device descriptor read/8, error -71
> > > [   32.080991] usb usb2-port1: attempt power cycle
> > > [   34.826893] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 5 using xhci_hcd
> > > [   34.839241] usb 2-1: device descriptor read/8, error -71
> > > [   35.129908] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
> > > [   35.142264] usb 2-1: device descriptor read/8, error -71
> > > [   35.257155] usb usb2-port1: attempt power cycle
> > > [   37.258922] usb 1-1: new high-speed USB device number 5 using xhci_hcd
> > > [   38.115053] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 8 using xhci_hcd
> > > [   38.127407] usb 2-1: device descriptor read/8, error -71
> > > [   38.417066] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
> > > [   38.429428] usb 2-1: device descriptor read/8, error -71
> > > [   38.545315] usb usb2-port1: attempt power cycle
> > > [   43.347312] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
> > > [   43.359659] usb 2-2: device descriptor read/8, error -71
> > > [   43.649323] usb 2-2: new SuperSpeed Plus Gen 2x1 USB device number 12 using xhci_hcd
> > > [   43.661681] usb 2-2: device descriptor read/8, error -71
> > > [   43.777566] usb usb2-port2: attempt power cycle
> > > 
> > > I was testing with our host along with other common vendor hosts with a
> > > few 3.x devices. Reverting this series resolves the issue. I didn't do
> > > extensive tests for various speeds or debug it. I just want to notify
> > > this first perhaps you can figure out the issue right away.
> > > 
> > > I can look into it and provide more info later this week. If you want to
> > > print any debug, let me know and I can provide later this week.
> > 
> > Thanks for the notification.  The problem is almost certainly caused by 
> > the first patch in the series, which changes the way that the initial 
> > read/8 thing is done.  However, I have no idea what part of the patch 
> > could be causing these errors.  I would appreciate anything you can find 
> > out.
> > 
> > You should concentrate your initial investigation on the new 
> > get_bMaxPacketSize0() routine.  In particular, does the -EPROTO error 
> > value arise as the return value from the usb_control_msg() call, or does 
> > it happen because of the values stored in buf?  In the first case, how 
> > does this call differ from the one that used to be made by 
> > usb_get_device_descriptor()?  And in the second case, what are the 
> > values of rc, buf->bMaxPacketSize0, and buf->bDescriptorType?
> 
> Never mind; I found the problem.  I had forgotten that at SuperSpeed or 
> faster, the device descriptor uses a logarithmic encoding for 
> bMaxPacketSize0.
> 
> The patch below should fix things up.  Let me know how it goes.
> 

Quick test for Gen 1 and 2 devices work fine now. Highspeed also works
as expected before. I didn't test Fullspeed with various MPS, but I
don't expect any problem looking at the change.

Thanks for the fix,
Thinh
Alan Stern Aug. 11, 2023, 1:52 a.m. UTC | #5
On Thu, Aug 10, 2023 at 10:39:36PM +0000, Thinh Nguyen wrote:
> On Thu, Aug 10, 2023, Alan Stern wrote:
> > Never mind; I found the problem.  I had forgotten that at SuperSpeed or 
> > faster, the device descriptor uses a logarithmic encoding for 
> > bMaxPacketSize0.
> > 
> > The patch below should fix things up.  Let me know how it goes.
> > 
> 
> Quick test for Gen 1 and 2 devices work fine now. Highspeed also works
> as expected before. I didn't test Fullspeed with various MPS, but I
> don't expect any problem looking at the change.
> 
> Thanks for the fix,

And thanks for testing it.  Is it okay to put your 
"Reported-and-tested-by:" tag on the submitted patch?

Alan Stern
Thinh Nguyen Aug. 11, 2023, 5:05 p.m. UTC | #6
On Thu, Aug 10, 2023, Alan Stern wrote:
> On Thu, Aug 10, 2023 at 10:39:36PM +0000, Thinh Nguyen wrote:
> > On Thu, Aug 10, 2023, Alan Stern wrote:
> > > Never mind; I found the problem.  I had forgotten that at SuperSpeed or 
> > > faster, the device descriptor uses a logarithmic encoding for 
> > > bMaxPacketSize0.
> > > 
> > > The patch below should fix things up.  Let me know how it goes.
> > > 
> > 
> > Quick test for Gen 1 and 2 devices work fine now. Highspeed also works
> > as expected before. I didn't test Fullspeed with various MPS, but I
> > don't expect any problem looking at the change.
> > 
> > Thanks for the fix,
> 
> And thanks for testing it.  Is it okay to put your 
> "Reported-and-tested-by:" tag on the submitted patch?
> 

Yes.

BR,
Thinh
Greg Kroah-Hartman Aug. 12, 2023, 8:05 a.m. UTC | #7
On Fri, Aug 11, 2023 at 01:38:46PM -0400, Alan Stern wrote:
> Commit 85d07c556216 ("USB: core: Unite old scheme and new scheme
> descriptor reads") altered the way USB devices are enumerated
> following detection, and in the process it messed up the
> initialization of SuperSpeed (or faster) devices:
> 
> [   31.650759] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> [   31.663107] usb 2-1: device descriptor read/8, error -71
> [   31.952697] usb 2-1: new SuperSpeed Plus Gen 2x1 USB device number 3 using xhci_hcd
> [   31.965122] usb 2-1: device descriptor read/8, error -71
> [   32.080991] usb usb2-port1: attempt power cycle
> ...
> 
> The problem was caused by the commit forgetting that in SuperSpeed or
> faster devices, the device descriptor uses a logarithmic encoding of
> the bMaxPacketSize0 value.  (For some reason I thought the 255 case in
> the switch statement was meant for these devices, but it isn't -- it
> was meant for Wireless USB and is no longer needed.)
> 
> We can fix the oversight by testing for buf->bMaxPacketSize0 = 9
> (meaning 512, the actual maxpacket size for ep0 on all SuperSpeed
> devices) and straightening out the logic that checks and adjusts our
> initial guesses of the maxpacket value.
> 
> Reported-and-tested-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> Closes: https://lore.kernel.org/linux-usb/20230810002257.nadxmfmrobkaxgnz@synopsys.com/
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> Fixes: 85d07c556216 ("USB: core: Unite old scheme and new scheme descriptor reads")
> 
> ---
> 
>  drivers/usb/core/hub.c |   36 ++++++++++++++++++++++++------------
>  1 file changed, 24 insertions(+), 12 deletions(-)

Process nit, when you add a patch to an existing patch series like this,
when I run `b4` to download and apply the patch, it sucks in the
original patch series, not this add-on patch (which then of course fails
as the original patch series was already applied.)

So, if it's easy enough, can you just send add-on patches for stuff that
has been applied, as a new thread?  That way the tools handle it
automatically and I don't have to hand edit/apply the patch?

thanks,

greg k-h
Alan Stern Aug. 12, 2023, 3:28 p.m. UTC | #8
On Sat, Aug 12, 2023 at 10:05:27AM +0200, Greg KH wrote:
> Process nit, when you add a patch to an existing patch series like this,
> when I run `b4` to download and apply the patch, it sucks in the
> original patch series, not this add-on patch (which then of course fails
> as the original patch series was already applied.)
> 
> So, if it's easy enough, can you just send add-on patches for stuff that
> has been applied, as a new thread?  That way the tools handle it
> automatically and I don't have to hand edit/apply the patch?

Sure.  I don't use b4, so I wasn't aware of this behavior.

If I ever forget in the future, feel free to yell at me.  :-)

Alan Stern
Jan Čermák March 5, 2024, 8:20 a.m. UTC | #9
Hi Alan, Greg, everyone,

On 04. 08. 23 21:09, Alan Stern wrote:
> An outstanding syzbot bug report has been traced to a race between the
> routine that reads in the device descriptor for a device being
> reinitialized and the routine that writes the descriptors to a sysfs
> attribute file.  The problem is that reinitializing a device, like
> initializing it for the first time, stores the device descriptor
> directly in the usb_device structure, where it may be accessed
> concurrently as part of sending the descriptors to the sysfs reader.

I have a suspicion that some of these patches (three from the original 
series, plus the "Fix oversight..." one) introduced a regression we see 
with some USB devices in Home Assistant OS (but in mainstream distro as 
well, see below). In particular it's Z-Wave.me UZB stick (0658:0200), 
however roughly at the time of introduction of these patches, we started 
to see a few more reports of issues with USB devices (in general radios 
for IoT protocols), so I can't rule out it's source of more regressions. 
For this particular device, we have most detailed tracing of the issue, 
confirming it also manifests on mainstream distribution (Debian) which 
included these patches in its kernel. Most issue reports come from RPi 3 
but we also got them on amd64, and both on HAOS and Debian.

I'm a layman in terms of the USB stack, so I might be wrong about some 
assumptions, but anyway, the device seemed to always misbehave due to 
poor HW (?) implementation - every time it's plugged into an USB slot, 
the following messages appear:

[ 1134.073005] usb 1-1.4: new full-speed USB device number 12 using dwc_otg
[ 1134.153006] usb 1-1.4: device descriptor read/64, error -32
[ 1134.341003] usb 1-1.4: device descriptor read/64, error -32
[ 1134.529004] usb 1-1.4: new full-speed USB device number 13 using dwc_otg
[ 1134.609063] usb 1-1.4: device descriptor read/64, error -32
[ 1134.797005] usb 1-1.4: device descriptor read/64, error -32
[ 1134.905181] usb 1-1-port4: attempt power cycle

However, kernel versions prior to 6.1.52, or 6.1.73 with these patches 
reverted, were able to recover:

[ 1135.717049] usb 1-1.4: new full-speed USB device number 14 using dwc_otg
[ 1135.741234] usb 1-1.4: New USB device found, idVendor=0658, 
idProduct=0200, bcdDevice= 0.00
[ 1135.741275] usb 1-1.4: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[ 1135.743959] cdc_acm 1-1.4:1.0: ttyACM0: USB ACM device

Without these patches reverted, 6.1.73 goes 2 another rounds of device 
descriptor read errors, and ends with:

[ 263.705865] usb 1-1-port4: unable to enumerate USB device

Also it should be noted that it seems that this only happens on USB 2 
ports, on USB 3/SS ports, the descriptor read errors are "protocol 
error" instead of "broken pipe", and the driver recovers (realizing 
this, I am now finally able to reproduce the issue in my environment):

[ 38.244292] usb 2-3: new full-speed USB device number 3 using xhci_hcd
[ 38.372319] usb 2-3: device descriptor read/64, error -71
[ 38.608317] usb 2-3: device descriptor read/64, error -71
[ 38.844287] usb 2-3: new full-speed USB device number 4 using xhci_hcd
[ 38.972317] usb 2-3: device descriptor read/64, error -71
[ 39.208325] usb 2-3: device descriptor read/64, error -71
[ 39.316405] usb usb2-port3: attempt power cycle
[ 39.936295] usb 2-3: new full-speed USB device number 5 using xhci_hcd
[ 39.957228] usb 2-3: New USB device found, idVendor=0658, 
idProduct=0200, bcdDevice= 0.00
[ 39.957241] usb 2-3: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[ 39.999591] cdc_acm 2-3:1.0: ttyACM0: USB ACM device
[ 39.999639] usbcore: registered new interface driver cdc_acm
[ 39.999641] cdc_acm: USB Abstract Control Model driver for USB modems 
and ISDN adapters

This is the gist of the problem, more detailed findings can be found in 
reports by @FredrikFornstad in the GH issue [1], who managed to 
reproduce and pinpoint the likely source of the problem.

Let me know if you need any more details, or if there's something more 
to try, I'll be happy to help with getting this resolved.

Thanks,
Jan


[1] 
https://github.com/home-assistant/operating-system/issues/2995#issuecomment-1973507518
Alan Stern March 6, 2024, 9:08 p.m. UTC | #10
On Tue, Mar 05, 2024 at 09:20:22AM +0100, Jan Čermák wrote:
> Hi Alan, Greg, everyone,
> 
> On 04. 08. 23 21:09, Alan Stern wrote:
> > An outstanding syzbot bug report has been traced to a race between the
> > routine that reads in the device descriptor for a device being
> > reinitialized and the routine that writes the descriptors to a sysfs
> > attribute file.  The problem is that reinitializing a device, like
> > initializing it for the first time, stores the device descriptor
> > directly in the usb_device structure, where it may be accessed
> > concurrently as part of sending the descriptors to the sysfs reader.
> 
> I have a suspicion that some of these patches (three from the original
> series, plus the "Fix oversight..." one) introduced a regression we see with
> some USB devices in Home Assistant OS (but in mainstream distro as well, see
> below). In particular it's Z-Wave.me UZB stick (0658:0200), however roughly
> at the time of introduction of these patches, we started to see a few more
> reports of issues with USB devices (in general radios for IoT protocols), so
> I can't rule out it's source of more regressions. For this particular
> device, we have most detailed tracing of the issue, confirming it also
> manifests on mainstream distribution (Debian) which included these patches
> in its kernel. Most issue reports come from RPi 3 but we also got them on
> amd64, and both on HAOS and Debian.
> 
> I'm a layman in terms of the USB stack, so I might be wrong about some
> assumptions, but anyway, the device seemed to always misbehave due to poor
> HW (?) implementation - every time it's plugged into an USB slot, the
> following messages appear:
> 
> [ 1134.073005] usb 1-1.4: new full-speed USB device number 12 using dwc_otg
> [ 1134.153006] usb 1-1.4: device descriptor read/64, error -32
> [ 1134.341003] usb 1-1.4: device descriptor read/64, error -32
> [ 1134.529004] usb 1-1.4: new full-speed USB device number 13 using dwc_otg
> [ 1134.609063] usb 1-1.4: device descriptor read/64, error -32
> [ 1134.797005] usb 1-1.4: device descriptor read/64, error -32
> [ 1134.905181] usb 1-1-port4: attempt power cycle
> 
> However, kernel versions prior to 6.1.52, or 6.1.73 with these patches
> reverted, were able to recover:
> 
> [ 1135.717049] usb 1-1.4: new full-speed USB device number 14 using dwc_otg
> [ 1135.741234] usb 1-1.4: New USB device found, idVendor=0658,
> idProduct=0200, bcdDevice= 0.00
> [ 1135.741275] usb 1-1.4: New USB device strings: Mfr=0, Product=0,
> SerialNumber=0
> [ 1135.743959] cdc_acm 1-1.4:1.0: ttyACM0: USB ACM device
> 
> Without these patches reverted, 6.1.73 goes 2 another rounds of device
> descriptor read errors, and ends with:
> 
> [ 263.705865] usb 1-1-port4: unable to enumerate USB device
> 
> Also it should be noted that it seems that this only happens on USB 2 ports,
> on USB 3/SS ports, the descriptor read errors are "protocol error" instead
> of "broken pipe", and the driver recovers (realizing this, I am now finally
> able to reproduce the issue in my environment):
> 
> [ 38.244292] usb 2-3: new full-speed USB device number 3 using xhci_hcd
> [ 38.372319] usb 2-3: device descriptor read/64, error -71
> [ 38.608317] usb 2-3: device descriptor read/64, error -71
> [ 38.844287] usb 2-3: new full-speed USB device number 4 using xhci_hcd
> [ 38.972317] usb 2-3: device descriptor read/64, error -71
> [ 39.208325] usb 2-3: device descriptor read/64, error -71
> [ 39.316405] usb usb2-port3: attempt power cycle
> [ 39.936295] usb 2-3: new full-speed USB device number 5 using xhci_hcd
> [ 39.957228] usb 2-3: New USB device found, idVendor=0658, idProduct=0200,
> bcdDevice= 0.00
> [ 39.957241] usb 2-3: New USB device strings: Mfr=0, Product=0,
> SerialNumber=0
> [ 39.999591] cdc_acm 2-3:1.0: ttyACM0: USB ACM device
> [ 39.999639] usbcore: registered new interface driver cdc_acm
> [ 39.999641] cdc_acm: USB Abstract Control Model driver for USB modems and
> ISDN adapters
> 
> This is the gist of the problem, more detailed findings can be found in
> reports by @FredrikFornstad in the GH issue [1], who managed to reproduce
> and pinpoint the likely source of the problem.
> 
> Let me know if you need any more details, or if there's something more to
> try, I'll be happy to help with getting this resolved.

Can you provide two usbmon traces, one showing the problem with those 
patches present and the other (on the same system but with the patches 
reverted) showing the recovery?  Comparison of the two should indicate 
what's happening differently.

Alan Stern

PS: I would avoid the Raspberry Pi for this sort of testing, because 
it uses a nonstandard USB host controller driver (dwc-otg) by default.
Alan Stern March 7, 2024, 7:34 p.m. UTC | #11
On Thu, Mar 07, 2024 at 05:17:20PM +0100, Jan Čermák wrote:
> Hi Alan,
> 
> On 06. 03. 24 22:08, Alan Stern wrote:
> > Can you provide two usbmon traces, one showing the problem with those
> > patches present and the other (on the same system but with the patches
> > reverted) showing the recovery?  Comparison of the two should indicate
> > what's happening differently.
> 
> I reproduced the issue on my old ThinkPad X220 with 6.6.20 kernel, you can
> find the usbmon captures below.

This is quite strange; the two traces are the same up to this point:

ffff9f9a29b3f300 298538675 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff9f9a29b3f300 298538893 C Co:1:002:0 0 0
ffff9f9a012cae40 298581342 C Ii:1:002:1 0:2048 1 = 04
ffff9f9a012cae40 298581372 S Ii:1:002:1 -115:2048 1 <
ffff9f9a29b3f300 298742112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff9f9a29b3f300 298742459 C Ci:1:002:0 0 4 = 03011000

------------------------------------------------------------------

ffff8fc4ee367240 368298641 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff8fc4ee367240 368298823 C Co:1:002:0 0 0
ffff8fc4c0c5ac00 368343025 C Ii:1:002:1 0:2048 1 = 04
ffff8fc4c0c5ac00 368343056 S Ii:1:002:1 -115:2048 1 <
ffff8fc4ee367240 368502095 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff8fc4ee367240 368502372 C Ci:1:002:0 0 4 = 01011100

The difference is in the last line: 03011000 vs. 01011100.  This means
that in the "working" scenario the device disconnected itself from
the USB bus for no apparent reason and then reconnected, whereas in
the "nonworking" scenario it didn't.  The computer did nothing
different before then, so I have no idea why the device's behavior
changed.  It's a mystery.

Another thing the traces showed is that the device doesn't like the
"new" initialization scheme; it wants the "old" one.  You can test
this by setting the old_scheme_first module parameter for usbcore
before plugging in the device:

	echo 1 >/sys/module/usbcore/parameters/old_scheme_first

Also, you can try the patch below (without the module parameter set).
I suspect it should be applied in any case, but it would be nice to
know if it makes any difference in your case.

Alan Stern



Index: usb-devel/drivers/usb/core/hub.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hub.c
+++ usb-devel/drivers/usb/core/hub.c
@@ -5481,6 +5481,7 @@ loop:
 			msleep(2 * hub_power_on_good_delay(hub));
 			usb_hub_set_port_power(hdev, hub, port1, true);
 			msleep(hub_power_on_good_delay(hub));
+			hub_port_debounce_be_stable(hub, port1);
 		}
 	}
 	if (hub->hdev->parent ||
Jan Čermák March 11, 2024, 9:58 a.m. UTC | #12
On 07. 03. 24 20:34, Alan Stern wrote:
> Another thing the traces showed is that the device doesn't like the
> "new" initialization scheme; it wants the "old" one.  You can test
> this by setting the old_scheme_first module parameter for usbcore
> before plugging in the device:
> 
> 	echo 1 >/sys/module/usbcore/parameters/old_scheme_first
> 

You are right, this indeed works and probing is successful after the 
power-cycling attempt.

> Also, you can try the patch below (without the module parameter set).
> I suspect it should be applied in any case, but it would be nice to
> know if it makes any difference in your case.
> 
> Index: usb-devel/drivers/usb/core/hub.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/core/hub.c
> +++ usb-devel/drivers/usb/core/hub.c
> @@ -5481,6 +5481,7 @@ loop:
>   			msleep(2 * hub_power_on_good_delay(hub));
>   			usb_hub_set_port_power(hdev, hub, port1, true);
>   			msleep(hub_power_on_good_delay(hub));
> +			hub_port_debounce_be_stable(hub, port1);
>   		}
>   	}
>   	if (hub->hdev->parent ||

Unfortunately, this doesn't fix it. It changes the log output a bit but 
still does not enumerate:

[  199.295695] usb 1-1.2: new full-speed USB device number 25 using ehci-pci
[  199.360684] usb 1-1.2: device descriptor read/64, error -32
[  199.534678] usb 1-1.2: device descriptor read/64, error -32
[  199.708671] usb 1-1.2: new full-speed USB device number 26 using ehci-pci
[  199.773666] usb 1-1.2: device descriptor read/64, error -32
[  199.941602] usb 1-1.2: device descriptor read/64, error -32
[  200.044871] usb 1-1-port2: attempt power cycle
[  200.528631] usb 1-1.2: new full-speed USB device number 27 using ehci-pci
[  200.940603] usb 1-1.2: device not accepting address 27, error -32
[  201.208592] usb 1-1.2: new full-speed USB device number 28 using ehci-pci
[  201.223783] usb 1-1.2: device descriptor read/8, error -32
[  201.345772] usb 1-1.2: device descriptor read/8, error -32
[  201.452899] usb 1-1-port2: unable to enumerate USB device

I'm attaching usbmon traces with this patch.

Regards,
Jan

####################################################


ffff94ff414f8a80 199261119 C Ii:1:002:1 0:2048 1 = 04
ffff94ff414f8a80 199261158 S Ii:1:002:1 -115:2048 1 <
ffff94ff882bd840 199261199 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199261378 C Ci:1:002:0 0 4 = 01010100
ffff94ff882bd840 199261383 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff94ff882bd840 199261635 C Co:1:002:0 0 0
ffff94ff882bd840 199261661 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199261902 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 199288109 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199288463 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 199315108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199315374 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 199342113 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199342504 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 199369107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199369447 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 199369498 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd840 199369903 C Co:1:002:0 0 0
ffff94ff882bd840 199381106 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199381375 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd840 199381410 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd840 199381633 C Co:1:002:0 0 0
ffff94ff882bd840 199433148 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199433339 C Ci:1:000:0 -32 0
ffff94ff882bd840 199433360 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199433971 C Ci:1:000:0 -32 0
ffff94ff882bd840 199433988 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199434342 C Ci:1:000:0 -32 0
ffff94ff882bd840 199434363 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd840 199434609 C Co:1:002:0 0 0
ffff94ff882bd840 199446107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199446423 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd840 199446448 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd840 199446679 C Co:1:002:0 0 0
ffff94ff882bd840 199606111 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199606348 C Ci:1:000:0 -32 0
ffff94ff882bd840 199606372 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199607003 C Ci:1:000:0 -32 0
ffff94ff882bd840 199607090 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd840 199607593 C Ci:1:000:0 -32 0
ffff94ff882bd840 199607623 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd840 199608123 C Co:1:002:0 0 0
ffff94ff882bd840 199620108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 199620506 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd840 199620531 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd840 199620770 C Co:1:002:0 0 0
ffff94ff882bd840 199782114 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff94ff882bd840 199782380 C Co:1:002:0 0 0
ffff94ff882bd900 199782433 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd900 199782639 C Co:1:002:0 0 0
ffff94ff882bd900 199794109 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 199794511 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd900 199794536 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd900 199794769 C Co:1:002:0 0 0
ffff94ff882bd900 199846142 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 199846554 C Ci:1:000:0 -32 0
ffff94ff882bd900 199846574 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 199846828 C Ci:1:000:0 -32 0
ffff94ff882bd900 199846842 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 199847170 C Ci:1:000:0 -32 0
ffff94ff882bd900 199847236 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd900 199847490 C Co:1:002:0 0 0
ffff94ff882bd900 199859108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 199859470 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd900 199859496 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd900 199859726 C Co:1:002:0 0 0
ffff94ff882bd900 200014111 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 200014492 C Ci:1:000:0 -32 0
ffff94ff882bd900 200014516 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 200014828 C Ci:1:000:0 -32 0
ffff94ff882bd900 200014854 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff94ff882bd900 200015180 C Ci:1:000:0 -32 0
ffff94ff882bd900 200015251 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd900 200015502 C Co:1:002:0 0 0
ffff94ff882bd900 200027108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 200027318 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd900 200027345 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd900 200027579 C Co:1:002:0 0 0
ffff94ff882bd900 200182109 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff94ff882bd900 200182296 C Co:1:002:0 0 0
ffff94ff882bd840 200182354 S Co:1:002:0 s 23 01 0008 0002 0000 0
ffff94ff882bd840 200182551 C Co:1:002:0 0 0
ffff94ff882bd840 200390109 S Co:1:002:0 s 23 03 0008 0002 0000 0
ffff94ff882bd840 200390345 C Co:1:002:0 0 0
ffff94ff882bd840 200494114 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200494318 C Ci:1:002:0 0 4 = 01010100
ffff94ff882bd840 200494345 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff94ff882bd840 200494571 C Co:1:002:0 0 0
ffff94ff882bd840 200521107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200521316 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 200548107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200548426 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 200575100 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200575338 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 200602100 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200602260 C Ci:1:002:0 0 4 = 01010000
ffff94ff882bd840 200602311 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd840 200602714 C Co:1:002:0 0 0
ffff94ff882bd840 200614101 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd840 200614389 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd840 200614416 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd840 200614842 C Co:1:002:0 0 0
ffff94ff882bd840 200666141 S Co:1:000:0 s 00 05 001b 0000 0000 0
ffff94ff882bd840 200666529 C Co:1:000:0 -32 0
ffff94ff414f8a80 200797181 C Ii:1:002:1 0:2048 1 = 04
ffff94ff414f8a80 200797212 S Ii:1:002:1 -115:2048 1 <
ffff94ff882bd840 200870110 S Co:1:000:0 s 00 05 001b 0000 0000 0
ffff94ff882bd840 200870423 C Co:1:000:0 -32 0
ffff94ff414f8a80 201053174 C Ii:1:002:1 0:2048 1 = 04
ffff94ff414f8a80 201053190 S Ii:1:002:1 -115:2048 1 <
ffff94ff882bd840 201079556 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff94ff882bd840 201079768 C Co:1:002:0 0 0
ffff94ff882bd900 201079814 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd900 201080045 C Co:1:002:0 0 0
ffff94ff882bd900 201092106 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 201092293 C Ci:1:002:0 0 4 = 03011100
ffff94ff882bd900 201092321 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff94ff882bd900 201092559 C Co:1:002:0 0 0
ffff94ff882bd900 201092596 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff94ff882bd900 201092819 C Co:1:002:0 0 0
ffff94ff882bd900 201294110 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 201294319 C Ci:1:002:0 0 4 = 03011000
ffff94ff882bd900 201294346 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff94ff882bd900 201294575 C Co:1:002:0 0 0
ffff94ff882bd900 201346134 S Co:1:000:0 s 00 05 001c 0000 0000 0
ffff94ff882bd900 201346293 C Co:1:000:0 0 0
ffff94ff882bd840 201360112 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201360521 C Ci:1:028:0 -32 0
ffff94ff882bd840 201360558 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201360922 C Ci:1:028:0 -32 0
ffff94ff882bd840 201360938 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201361276 C Ci:1:028:0 -32 0
ffff94ff882bd840 201482107 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201482304 C Ci:1:028:0 -32 0
ffff94ff882bd840 201482330 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201482934 C Ci:1:028:0 -32 0
ffff94ff882bd840 201482954 S Ci:1:028:0 s 80 06 0100 0000 0008 8 <
ffff94ff882bd840 201483278 C Ci:1:028:0 -32 0
ffff94ff882bd840 201590112 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff94ff882bd840 201590387 C Co:1:002:0 0 0
ffff94ff882bd900 201591835 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff94ff882bd900 201592045 C Co:1:002:0 0 0
ffff94ff882bd900 201592073 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff94ff882bd900 201592497 C Ci:1:002:0 0 4 = 01010000
Alan Stern March 11, 2024, 2:43 p.m. UTC | #13
On Mon, Mar 11, 2024 at 10:58:18AM +0100, Jan Čermák wrote:
> On 07. 03. 24 20:34, Alan Stern wrote:
> > Another thing the traces showed is that the device doesn't like the
> > "new" initialization scheme; it wants the "old" one.  You can test
> > this by setting the old_scheme_first module parameter for usbcore
> > before plugging in the device:
> > 
> > 	echo 1 >/sys/module/usbcore/parameters/old_scheme_first
> > 
> 
> You are right, this indeed works and probing is successful after the
> power-cycling attempt.

Well, at least this means you do have a way of using the device, even
if it is rather awkward.  It might even work on the Raspberry Pi machine.

> > Also, you can try the patch below (without the module parameter set).
> > I suspect it should be applied in any case, but it would be nice to
> > know if it makes any difference in your case.
> > 
> > Index: usb-devel/drivers/usb/core/hub.c
> > ===================================================================
> > --- usb-devel.orig/drivers/usb/core/hub.c
> > +++ usb-devel/drivers/usb/core/hub.c
> > @@ -5481,6 +5481,7 @@ loop:
> >   			msleep(2 * hub_power_on_good_delay(hub));
> >   			usb_hub_set_port_power(hdev, hub, port1, true);
> >   			msleep(hub_power_on_good_delay(hub));
> > +			hub_port_debounce_be_stable(hub, port1);
> >   		}
> >   	}
> >   	if (hub->hdev->parent ||
> 
> Unfortunately, this doesn't fix it. It changes the log output a bit but
> still does not enumerate:
> 
> [  199.295695] usb 1-1.2: new full-speed USB device number 25 using ehci-pci
> [  199.360684] usb 1-1.2: device descriptor read/64, error -32
> [  199.534678] usb 1-1.2: device descriptor read/64, error -32
> [  199.708671] usb 1-1.2: new full-speed USB device number 26 using ehci-pci
> [  199.773666] usb 1-1.2: device descriptor read/64, error -32
> [  199.941602] usb 1-1.2: device descriptor read/64, error -32
> [  200.044871] usb 1-1-port2: attempt power cycle
> [  200.528631] usb 1-1.2: new full-speed USB device number 27 using ehci-pci
> [  200.940603] usb 1-1.2: device not accepting address 27, error -32
> [  201.208592] usb 1-1.2: new full-speed USB device number 28 using ehci-pci
> [  201.223783] usb 1-1.2: device descriptor read/8, error -32
> [  201.345772] usb 1-1.2: device descriptor read/8, error -32
> [  201.452899] usb 1-1-port2: unable to enumerate USB device
> 
> I'm attaching usbmon traces with this patch.

The device is so non-responsive, I'm amazed it ever works at all.
Judging by the usbmon traces, it doesn't look as if it would work on a
Windows system.

Actually, if you have access to a computer running Windows or Mac OSX
and you can try out the device on that computer, it would be good to
get the equivalent of a usbmon trace (something like a Wireshark
capture log would do).  If those systems manage to do something that
Linux doesn't, we ought to know what it is.

Alan Stern
Jan Čermák March 12, 2024, 8:57 a.m. UTC | #14
Hi Alan

On 11. 03. 24 15:43, Alan Stern wrote:
> Well, at least this means you do have a way of using the device, even
> if it is rather awkward.  It might even work on the Raspberry Pi machine.

Still, I'm looking for a more permanent and robust solution. If it were 
only a single device I'm using myself, I could come up with a 
workaround. However, this is one of the very few available Z-Wave USB 
interfaces and there are more users affected. So far we went with 
reverting the patches, but that's surely not the way we want to go forward.

> The device is so non-responsive, I'm amazed it ever works at all.
> Judging by the usbmon traces, it doesn't look as if it would work on a
> Windows system.
> 
> Actually, if you have access to a computer running Windows or Mac OSX
> and you can try out the device on that computer, it would be good to
> get the equivalent of a usbmon trace (something like a Wireshark
> capture log would do).  If those systems manage to do something that
> Linux doesn't, we ought to know what it is.

Fredrik (one of the original reporters) is following this conversation, 
here [1] are logs from his machine with some details in the ticket [2]. 
He also wonders why the initialization doesn't work only on USB2 ports 
but works on USB3 if the initialization code is shared between those two.

Also, if needed, I can get more logs from the X220 I was using for my 
usbmon traces - I believe I have an HDD with Windows 7 install lying 
somewhere.

Regards,
Jan


[1] 
https://github.com/home-assistant/operating-system/files/14563813/usbPcap1_USB2-port.txt
[2] 
https://github.com/home-assistant/operating-system/issues/2995#issuecomment-1989329739
Alan Stern March 12, 2024, 8:47 p.m. UTC | #15
On Tue, Mar 12, 2024 at 09:57:06AM +0100, Jan Čermák wrote:
> Hi Alan
> 
> On 11. 03. 24 15:43, Alan Stern wrote:
> > Well, at least this means you do have a way of using the device, even
> > if it is rather awkward.  It might even work on the Raspberry Pi machine.
> 
> Still, I'm looking for a more permanent and robust solution. If it were only
> a single device I'm using myself, I could come up with a workaround.
> However, this is one of the very few available Z-Wave USB interfaces and
> there are more users affected. So far we went with reverting the patches,
> but that's surely not the way we want to go forward.

I agree, this should be fixed if we can.  I'm still puzzled about how 
commits you located could cause any difference in the device's behavior, 
given that they don't seem to cause any change's to the computer's 
behavior.

> > The device is so non-responsive, I'm amazed it ever works at all.
> > Judging by the usbmon traces, it doesn't look as if it would work on a
> > Windows system.
> > 
> > Actually, if you have access to a computer running Windows or Mac OSX
> > and you can try out the device on that computer, it would be good to
> > get the equivalent of a usbmon trace (something like a Wireshark
> > capture log would do).  If those systems manage to do something that
> > Linux doesn't, we ought to know what it is.
> 
> Fredrik (one of the original reporters) is following this conversation, here
> [1] are logs from his machine with some details in the ticket [2]. He also
> wonders why the initialization doesn't work only on USB2 ports but works on
> USB3 if the initialization code is shared between those two.

Good.  I've been quite busy recently, but I'll look at the logs when 
there is time.

The USB-2 and USB-3 drivers do share some initialization code, but not 
all of it.  The first couple of steps are quite different: With a USB-2 
controller they are handled by the kernel driver, but with a USB-3 
controller they are handled directly by the controller's hardware with 
no software influence.

Alan Stern

> Also, if needed, I can get more logs from the X220 I was using for my usbmon
> traces - I believe I have an HDD with Windows 7 install lying somewhere.
> 
> Regards,
> Jan
> 
> 
> [1] https://github.com/home-assistant/operating-system/files/14563813/usbPcap1_USB2-port.txt
> [2] https://github.com/home-assistant/operating-system/issues/2995#issuecomment-1989329739
Alan Stern March 16, 2024, 8:35 p.m. UTC | #16
On Tue, Mar 12, 2024 at 09:57:06AM +0100, Jan Čermák wrote:
> Hi Alan
> 
> On 11. 03. 24 15:43, Alan Stern wrote:
> > Well, at least this means you do have a way of using the device, even
> > if it is rather awkward.  It might even work on the Raspberry Pi machine.
> 
> Still, I'm looking for a more permanent and robust solution. If it were only
> a single device I'm using myself, I could come up with a workaround.
> However, this is one of the very few available Z-Wave USB interfaces and
> there are more users affected. So far we went with reverting the patches,
> but that's surely not the way we want to go forward.
> 
> > The device is so non-responsive, I'm amazed it ever works at all.
> > Judging by the usbmon traces, it doesn't look as if it would work on a
> > Windows system.
> > 
> > Actually, if you have access to a computer running Windows or Mac OSX
> > and you can try out the device on that computer, it would be good to
> > get the equivalent of a usbmon trace (something like a Wireshark
> > capture log would do).  If those systems manage to do something that
> > Linux doesn't, we ought to know what it is.
> 
> Fredrik (one of the original reporters) is following this conversation, here
> [1] are logs from his machine with some details in the ticket [2]. He also
> wonders why the initialization doesn't work only on USB2 ports but works on
> USB3 if the initialization code is shared between those two.

It's very difficult to compare the Windows log with the usbmon trace.  
However, something Frederik mentioned about the number of resets led me 
to check more closely.  There are some extra unnecessary resets in the 
usbmon trace, and a reset that should be there is missing.

Below is a patch meant to get the number of resets back to what it 
should be.  I'd appreciate it if you can test this, and report the 
kernel log output along with the usbmon output for the normal case and 
also with the "old_scheme_first" parameter set.

I'm not very hopeful that this will solve the problem, but there's a 
good chance it will help point us in the right direction by removing 
extraneous complications.

Alan Stern



 drivers/usb/core/hub.c |   27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

Index: usb-devel/drivers/usb/core/hub.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hub.c
+++ usb-devel/drivers/usb/core/hub.c
@@ -4961,6 +4961,18 @@ hub_port_init(struct usb_hub *hub, struc
 			break;
 		}
 
+		if (retries > 0) {
+			retval = hub_port_reset(hub, port1, udev, delay, false);
+			if (retval < 0)		/* error or disconnect */
+				goto fail;
+			if (oldspeed != udev->speed) {
+				dev_dbg(&udev->dev,
+					"device reset changed speed!\n");
+				retval = -ENODEV;
+				goto fail;
+			}
+		}
+
 		if (do_new_scheme) {
 			retval = hub_enable_device(udev);
 			if (retval < 0) {
@@ -4972,6 +4984,13 @@ hub_port_init(struct usb_hub *hub, struc
 
 			maxp0 = get_bMaxPacketSize0(udev, buf,
 					GET_DESCRIPTOR_BUFSIZE, retries == 0);
+			if (maxp0 < 0) {
+				if (maxp0 != -ENODEV)
+					dev_err(&udev->dev, "device descriptor read/64, error %d\n",
+							maxp0);
+				retval = maxp0;
+				continue;
+			}
 			if (maxp0 > 0 && !initial &&
 					maxp0 != udev->descriptor.bMaxPacketSize0) {
 				dev_err(&udev->dev, "device reset changed ep0 maxpacket size!\n");
@@ -4988,13 +5007,6 @@ hub_port_init(struct usb_hub *hub, struc
 				retval = -ENODEV;
 				goto fail;
 			}
-			if (maxp0 < 0) {
-				if (maxp0 != -ENODEV)
-					dev_err(&udev->dev, "device descriptor read/64, error %d\n",
-							maxp0);
-				retval = maxp0;
-				continue;
-			}
 		}
 
 		for (operations = 0; operations < SET_ADDRESS_TRIES; ++operations) {
@@ -5533,6 +5545,7 @@ loop:
 			msleep(2 * hub_power_on_good_delay(hub));
 			usb_hub_set_port_power(hdev, hub, port1, true);
 			msleep(hub_power_on_good_delay(hub));
+			hub_port_debounce_be_stable(hub, port1);
 		}
 	}
 	if (hub->hdev->parent ||
Jan Čermák March 19, 2024, 11:54 a.m. UTC | #17
Hello Alan,

On 16. 03. 24 21:35, Alan Stern wrote:
> Below is a patch meant to get the number of resets back to what it
> should be.  I'd appreciate it if you can test this, and report the
> kernel log output along with the usbmon output for the normal case and
> also with the "old_scheme_first" parameter set.
> 
> I'm not very hopeful that this will solve the problem, but there's a
> good chance it will help point us in the right direction by removing
> extraneous complications.

unfortunately you were right, the problem is still unresolved with your 
patch. I hope the traces will provide some new insights then.

Regards
Jan


###############################
# Patched - new scheme (6.6.21)
###############################

[  149.638997] usb 1-1.2: new full-speed USB device number 8 using ehci-pci
[  149.640602] usb 1-1.2: device descriptor read/64, error -32
[  149.811493] usb 1-1.2: device descriptor read/64, error -32
[  149.986051] usb 1-1.2: new full-speed USB device number 9 using ehci-pci
[  149.987370] usb 1-1.2: device descriptor read/64, error -32
[  150.163337] usb 1-1.2: device descriptor read/64, error -32
[  150.266422] usb 1-1-port2: attempt power cycle
[  150.750045] usb 1-1.2: new full-speed USB device number 10 using ehci-pci
[  151.162030] usb 1-1.2: device not accepting address 10, error -32
[  151.430029] usb 1-1.2: new full-speed USB device number 11 using ehci-pci
[  151.445492] usb 1-1.2: device descriptor read/8, error -32
[  151.633637] usb 1-1.2: device descriptor read/8, error -32
[  151.739227] usb 1-1-port2: unable to enumerate USB device

ffff8881003dc9c0 149518477 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 149518517 S Ii:1:002:1 -115:2048 1 <
ffff888139ff8480 149518558 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149518935 C Ci:1:002:0 0 4 = 01010100
ffff888139ff8480 149518951 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888139ff8480 149519199 C Co:1:002:0 0 0
ffff888139ff8480 149519229 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149519467 C Ci:1:002:0 0 4 = 01010000
ffff888139ff8480 149546106 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149546393 C Ci:1:002:0 0 4 = 01010000
ffff888139ff8480 149573113 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149573442 C Ci:1:002:0 0 4 = 01010000
ffff888139ff8480 149600103 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149600309 C Ci:1:002:0 0 4 = 01010000
ffff888139ff8480 149627105 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149627441 C Ci:1:002:0 0 4 = 01010000
ffff888139ff8480 149627495 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888139ff8480 149627699 C Co:1:002:0 0 0
ffff888139ff8480 149639104 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888139ff8480 149639364 C Ci:1:002:0 0 4 = 03011000
ffff888139ff8480 149639401 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888139ff8480 149639619 C Co:1:002:0 0 0
ffff888139ff8480 149691082 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888139ff8480 149691572 C Ci:1:000:0 -32 0
ffff8881419950c0 149691656 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff8881419950c0 149691905 C Ci:1:000:0 -32 0
ffff8881419950c0 149691917 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff8881419950c0 149692565 C Ci:1:000:0 -32 0
ffff8881419950c0 149798110 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff8881419950c0 149798431 C Co:1:002:0 0 0
ffff8881419950c0 149810115 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff8881419950c0 149810401 C Ci:1:002:0 0 4 = 03011000
ffff8881419950c0 149810427 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff8881419950c0 149810864 C Co:1:002:0 0 0
ffff8881419950c0 149862123 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff8881419950c0 149862618 C Ci:1:000:0 -32 0
ffff8881419950c0 149862653 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff8881419950c0 149863135 C Ci:1:000:0 -32 0
ffff8881419950c0 149863165 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff8881419950c0 149863534 C Ci:1:000:0 -32 0
ffff8881419950c0 149974112 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff8881419950c0 149974320 C Co:1:002:0 0 0
ffff888141995000 149974393 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995000 149974575 C Co:1:002:0 0 0
ffff888141995000 149986107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 149986437 C Ci:1:002:0 0 4 = 03011000
ffff888141995000 149986471 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888141995000 149986696 C Co:1:002:0 0 0
ffff888141995000 150038142 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150038648 C Ci:1:000:0 -32 0
ffff888141995000 150038672 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150039032 C Ci:1:000:0 -32 0
ffff888141995000 150039046 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150039358 C Ci:1:000:0 -32 0
ffff888141995000 150150113 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995000 150150417 C Co:1:002:0 0 0
ffff888141995000 150162111 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 150162387 C Ci:1:002:0 0 4 = 03011000
ffff888141995000 150162418 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888141995000 150162648 C Co:1:002:0 0 0
ffff888141995000 150214119 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150214598 C Ci:1:000:0 -32 0
ffff888141995000 150214632 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150214912 C Ci:1:000:0 -32 0
ffff888141995000 150214930 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888141995000 150215314 C Ci:1:000:0 -32 0
ffff888141995000 150318106 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888141995000 150318446 C Co:1:002:0 0 0
ffff8881419950c0 150318517 S Co:1:002:0 s 23 01 0008 0002 0000 0
ffff8881419950c0 150318704 C Co:1:002:0 0 0
ffff8881419950c0 150526102 S Co:1:002:0 s 23 03 0008 0002 0000 0
ffff8881419950c0 150526430 C Co:1:002:0 0 0
ffff8881003dc9c0 150542516 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 150542548 S Ii:1:002:1 -115:2048 1 <
ffff8881419950c0 150630112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff8881419950c0 150630417 C Ci:1:002:0 0 4 = 01010100
ffff8881419950c0 150630450 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff8881419950c0 150630685 C Co:1:002:0 0 0
ffff8881419950c0 150657105 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff8881419950c0 150657423 C Ci:1:002:0 0 4 = 01010000
ffff888141995000 150684024 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 150684290 C Ci:1:002:0 0 4 = 01010000
ffff888141995000 150711108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 150711365 C Ci:1:002:0 0 4 = 01010000
ffff888141995000 150738112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 150738437 C Ci:1:002:0 0 4 = 01010000
ffff888141995000 150738494 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995000 150738692 C Co:1:002:0 0 0
ffff888141995000 150750108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 150750364 C Ci:1:002:0 0 4 = 03011000
ffff888141995000 150750399 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888141995000 150750621 C Co:1:002:0 0 0
ffff888141995000 150802147 S Co:1:000:0 s 00 05 000a 0000 0000 0
ffff888141995000 150802527 C Co:1:000:0 -32 0
ffff888141995000 151006106 S Co:1:000:0 s 00 05 000a 0000 0000 0
ffff888141995000 151006560 C Co:1:000:0 -32 0
ffff8881003dc9c0 151054453 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 151054467 S Ii:1:002:1 -115:2048 1 <
ffff888141995000 151215694 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888141995000 151216068 C Co:1:002:0 0 0
ffff888141995180 151216115 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995180 151216315 C Co:1:002:0 0 0
ffff888141995180 151228102 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995180 151228380 C Ci:1:002:0 0 4 = 03011100
ffff888141995180 151228417 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888141995180 151228638 C Co:1:002:0 0 0
ffff888141995180 151228664 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995180 151228899 C Co:1:002:0 0 0
ffff8881003dc9c0 151310482 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 151310497 S Ii:1:002:1 -115:2048 1 <
ffff888141995180 151430108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995180 151430419 C Ci:1:002:0 0 4 = 03011000
ffff888141995180 151430455 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888141995180 151430693 C Co:1:002:0 0 0
ffff888141995180 151482140 S Co:1:000:0 s 00 05 000b 0000 0000 0
ffff888141995180 151482405 C Co:1:000:0 0 0
ffff888141995000 151496106 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995000 151496622 C Ci:1:011:0 -32 0
ffff888141995000 151496658 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995000 151497138 C Ci:1:011:0 -32 0
ffff888141995000 151497169 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995000 151497486 C Ci:1:011:0 -32 0
ffff888141995000 151606108 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888141995000 151606374 C Co:1:002:0 0 0
ffff888141995000 151618105 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 151618344 C Ci:1:002:0 0 4 = 03011000
ffff888141995000 151618379 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888141995000 151618599 C Co:1:002:0 0 0
ffff888141995000 151670136 S Co:1:000:0 s 00 05 000b 0000 0000 0
ffff888141995000 151670488 C Co:1:000:0 0 0
ffff888141995180 151684039 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995180 151684454 C Ci:1:011:0 -32 0
ffff888141995180 151684532 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995180 151684785 C Ci:1:011:0 -32 0
ffff888141995180 151684834 S Ci:1:011:0 s 80 06 0100 0000 0008 8 <
ffff888141995180 151685618 C Ci:1:011:0 -32 0
ffff888141995180 151791035 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888141995180 151791290 C Co:1:002:0 0 0
ffff888141995000 151792277 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888141995000 151792534 C Co:1:002:0 0 0
ffff888141995000 151792564 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888141995000 151792796 C Ci:1:002:0 0 4 = 01010000


###############################
# Patched - old scheme (6.6.21)
###############################

[  272.774542] usb 1-1.2: new full-speed USB device number 12 using ehci-pci
[  273.185546] usb 1-1.2: device not accepting address 12, error -32
[  273.251550] usb 1-1.2: new full-speed USB device number 13 using ehci-pci
[  273.665491] usb 1-1.2: device not accepting address 13, error -32
[  273.667236] usb 1-1-port2: attempt power cycle
[  274.203558] usb 1-1.2: new full-speed USB device number 14 using ehci-pci
[  274.283841] usb 1-1.2: New USB device found, idVendor=0658, 
idProduct=0200, bcdDevice= 0.00
[  274.283867] usb 1-1.2: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[  274.309310] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
[  274.309355] usbcore: registered new interface driver cdc_acm
[  274.309357] cdc_acm: USB Abstract Control Model driver for USB modems 
and ISDN adapters

ffff8881003dc9c0 272654268 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 272654316 S Ii:1:002:1 -115:2048 1 <
ffff888145ae1780 272654483 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272654726 C Ci:1:002:0 0 4 = 01010100
ffff888145ae1780 272654739 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888145ae1780 272654993 C Co:1:002:0 0 0
ffff888145ae1780 272655020 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272655255 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 272682096 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272682378 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 272709128 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272709311 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 272736109 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272736445 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 272763140 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272763338 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 272763393 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888145ae1780 272763595 C Co:1:002:0 0 0
ffff888145ae1780 272775074 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 272775275 C Ci:1:002:0 0 4 = 03011000
ffff888145ae1780 272775314 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888145ae1780 272775729 C Co:1:002:0 0 0
ffff888145ae1780 272827142 S Co:1:000:0 s 00 05 000c 0000 0000 0
ffff888145ae1780 272827417 C Co:1:000:0 -32 0
ffff888145ae1780 273030123 S Co:1:000:0 s 00 05 000c 0000 0000 0
ffff888145ae1780 273030664 C Co:1:000:0 -32 0
ffff888145ae1780 273239645 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888145ae1780 273239905 C Co:1:002:0 0 0
ffff888145ae1d80 273239949 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888145ae1d80 273240163 C Co:1:002:0 0 0
ffff888145ae1d80 273252126 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1d80 273252423 C Ci:1:002:0 0 4 = 03011000
ffff888145ae1d80 273252447 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888145ae1d80 273252680 C Co:1:002:0 0 0
ffff888145ae1d80 273304146 S Co:1:000:0 s 00 05 000d 0000 0000 0
ffff888145ae1d80 273304402 C Co:1:000:0 -32 0
ffff888145ae1d80 273510101 S Co:1:000:0 s 00 05 000d 0000 0000 0
ffff888145ae1d80 273510377 C Co:1:000:0 -32 0
ffff888145ae1d80 273719545 S Co:1:002:0 s 23 01 0001 0002 0000 0
ffff888145ae1d80 273719772 C Co:1:002:0 0 0
ffff888145ae1780 273719812 S Co:1:002:0 s 23 01 0008 0002 0000 0
ffff888145ae1780 273720047 C Co:1:002:0 0 0
ffff888145ae1780 273926112 S Co:1:002:0 s 23 03 0008 0002 0000 0
ffff888145ae1780 273926406 C Co:1:002:0 0 0
ffff8881003dc9c0 273934050 C Ii:1:002:1 0:2048 1 = 04
ffff8881003dc9c0 273934081 S Ii:1:002:1 -115:2048 1 <
ffff888145ae1780 274030115 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274030418 C Ci:1:002:0 0 4 = 01010100
ffff888145ae1780 274030442 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888145ae1780 274030677 C Co:1:002:0 0 0
ffff888145ae1780 274057089 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274057416 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 274084090 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274084322 C Ci:1:002:0 0 4 = 01010100
ffff888145ae1780 274084353 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888145ae1780 274084578 C Co:1:002:0 0 0
ffff888145ae1780 274111090 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274111317 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 274138114 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274138418 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 274165108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274165350 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 274192122 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274192263 C Ci:1:002:0 0 4 = 01010000
ffff888145ae1780 274192320 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888145ae1780 274192521 C Co:1:002:0 0 0
ffff888145ae1780 274204052 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274204382 C Ci:1:002:0 0 4 = 03011000
ffff888145ae1780 274204405 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888145ae1780 274204652 C Co:1:002:0 0 0
ffff888145ae1780 274256143 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff888145ae1780 274256517 C Ci:1:000:0 0 8 = 12010002 02000008
ffff888145ae1780 274256546 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff888145ae1780 274256743 C Co:1:002:0 0 0
ffff888145ae1780 274268088 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1780 274268362 C Ci:1:002:0 0 4 = 03011000
ffff888145ae1780 274268388 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff888145ae1780 274268618 C Co:1:002:0 0 0
ffff888145ae1780 274320112 S Co:1:000:0 s 00 05 000e 0000 0000 0
ffff888145ae1780 274320449 C Co:1:000:0 0 0
ffff888145ae1d80 274334110 S Ci:1:014:0 s 80 06 0100 0000 0012 18 <
ffff888145ae1d80 274334545 C Ci:1:014:0 0 18 = 12010002 02000008 
58060002 00000000 0001
ffff888145ae1d80 274334577 S Ci:1:014:0 s 80 06 0600 0000 000a 10 <
ffff888145ae1d80 274334813 C Ci:1:014:0 -32 0
ffff888145ae1d80 274334843 S Ci:1:014:0 s 80 06 0600 0000 000a 10 <
ffff888145ae1d80 274335150 C Ci:1:014:0 -32 0
ffff888145ae1d80 274335168 S Ci:1:014:0 s 80 06 0600 0000 000a 10 <
ffff888145ae1d80 274335581 C Ci:1:014:0 -32 0
ffff888145ae1d80 274335610 S Ci:1:014:0 s 80 06 0200 0000 0009 9 <
ffff888145ae1d80 274335914 C Ci:1:014:0 0 9 = 09024300 02010080 32
ffff888145ae1d80 274335932 S Ci:1:014:0 s 80 06 0200 0000 0043 67 <
ffff888145ae1d80 274336373 C Ci:1:014:0 0 67 = 09024300 02010080 
32090400 00010202 01000524 00100105 24010001 04240200
ffff888145ae1900 274336803 S Co:1:014:0 s 00 09 0001 0000 0000 0
ffff888145ae1900 274337062 C Co:1:014:0 0 0
ffff888145ae1900 274340120 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145ae1900 274340243 C Ci:1:002:0 0 4 = 03010000
ffff888121658a80 274361461 S Co:1:014:0 s 21 20 0000 0000 0007 7 = 
80250000 000008
ffff888121658a80 274361751 C Co:1:014:0 0 7 >
Alan Stern March 19, 2024, 4:03 p.m. UTC | #18
On Tue, Mar 19, 2024 at 12:54:37PM +0100, Jan Čermák wrote:
> Hello Alan,
> 
> On 16. 03. 24 21:35, Alan Stern wrote:
> > Below is a patch meant to get the number of resets back to what it
> > should be.  I'd appreciate it if you can test this, and report the
> > kernel log output along with the usbmon output for the normal case and
> > also with the "old_scheme_first" parameter set.
> > 
> > I'm not very hopeful that this will solve the problem, but there's a
> > good chance it will help point us in the right direction by removing
> > extraneous complications.
> 
> unfortunately you were right, the problem is still unresolved with your
> patch. I hope the traces will provide some new insights then.

I get the strong impression that this device just takes a long time to
initialize when it is plugged in.  A lot longer than the current
debounce time of 150 ms -- more like 2000 ms!  The usbmon traces show
the device disconnecting and reconnecting about 1500 ms after it is
first plugged in, and then it starts working about 300-400 ms later.

Try doing this: Keep the patch applied, but make the following changes
in addition.  In drivers/usb/core/hub.c, around line 128 the code says:

#define HUB_DEBOUNCE_TIMEOUT	2000
#define HUB_DEBOUNCE_STEP	  25
#define HUB_DEBOUNCE_STABLE	 100

Change the HUB_DEBOUNCE_TIMEOUT value to 4500, the HUB_DEBOUNCE_STEP
value to 250 and the HUB_DEBOUNCE_STABLE value to 2000.  That just
might give the device enough time to settle down and start working
before the computer tries using it.

This is not something we would want to do for ordinary kernels; it
would cause new USB devices to be ignored for more than 2 seconds
after they are plugged in, which would annoy many people.  But if my
theory is right, it may be what your device needs.

Alan Stern
Jan Čermák March 27, 2024, 1:24 p.m. UTC | #19
Hi Alan,

On 19. 03. 24 17:03, Alan Stern wrote:
> Change the HUB_DEBOUNCE_TIMEOUT value to 4500, the HUB_DEBOUNCE_STEP
> value to 250 and the HUB_DEBOUNCE_STABLE value to 2000.  That just
> might give the device enough time to settle down and start working
> before the computer tries using it.

sorry for the delay, I only managed to test it today. You are right, 
with the timeouts adjusted, it enumerates fine after a while, without 
any descriptor read errors or anything like that:

[  210.957371] usb 1-1.2: new full-speed USB device number 5 using ehci-pci
[  211.037728] usb 1-1.2: New USB device found, idVendor=0658, 
idProduct=0200, bcdDevice= 0.00
[  211.037747] usb 1-1.2: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[  211.039764] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device

If it's worth anything, usbmon trace is attached below. Anyway, do you 
have any ideas what could be done to make it work without doing any 
detrimental changes? I was thinking I'll try to reach out to the vendor 
at this point - they should be aware their device might stop working 
with recent kernels, and they could explain the quirky behavior, or 
implement any changes on the firmware side (if it's even possible).

Regards,
Jan



ffff888101391300 207454583 C Ii:1:002:1 0:2048 1 = 04
ffff888101391300 207454617 S Ii:1:002:1 -115:2048 1 <
ffff88814d461180 207454651 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 207454845 C Ci:1:002:0 0 4 = 01010100
ffff88814d461180 207454860 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff88814d461180 207455110 C Co:1:002:0 0 0
ffff88814d461180 207455119 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 207455368 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 207710110 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 207710378 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 207966112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 207966378 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 208222112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 208222401 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 208478109 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 208478561 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 208734110 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 208734342 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 208990111 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 208990297 C Ci:1:002:0 0 4 = 01010100
ffff88814d461180 208990317 S Co:1:002:0 s 23 01 0010 0002 0000 0
ffff888101391300 208990557 C Ii:1:002:1 0:2048 1 = 04
ffff888101391300 208990574 S Ii:1:002:1 -115:2048 1 <
ffff88814d461180 208990583 C Co:1:002:0 0 0
ffff88814d461180 209246112 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 209246405 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 209502108 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 209502393 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 209758115 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 209758358 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 210014113 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 210014318 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 210270107 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 210270327 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 210526114 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 210526337 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 210782106 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 210782340 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 211038111 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 211038365 C Ci:1:002:0 0 4 = 01010000
ffff88814d461180 211038419 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff88814d461180 211038621 C Co:1:002:0 0 0
ffff88814d461180 211050109 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 211050473 C Ci:1:002:0 0 4 = 03011000
ffff88814d461180 211050508 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff88814d461180 211050732 C Co:1:002:0 0 0
ffff88814d461180 211102145 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
ffff88814d461180 211102383 C Ci:1:000:0 0 8 = 12010002 02000008
ffff88814d461180 211102410 S Co:1:002:0 s 23 03 0004 0002 0000 0
ffff88814d461180 211102640 C Co:1:002:0 0 0
ffff88814d461180 211114111 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff88814d461180 211114411 C Ci:1:002:0 0 4 = 03011000
ffff88814d461180 211114446 S Co:1:002:0 s 23 01 0014 0002 0000 0
ffff88814d461180 211114667 C Co:1:002:0 0 0
ffff88814d461180 211166117 S Co:1:000:0 s 00 05 0005 0000 0000 0
ffff88814d461180 211166509 C Co:1:000:0 0 0
ffff88814d461240 211180097 S Ci:1:005:0 s 80 06 0100 0000 0012 18 <
ffff88814d461240 211180460 C Ci:1:005:0 0 18 = 12010002 02000008 
58060002 00000000 0001
ffff88814d461240 211180500 S Ci:1:005:0 s 80 06 0600 0000 000a 10 <
ffff88814d461240 211180722 C Ci:1:005:0 -32 0
ffff88814d461240 211180989 S Ci:1:005:0 s 80 06 0600 0000 000a 10 <
ffff88814d461240 211181294 C Ci:1:005:0 -32 0
ffff88814d461240 211181324 S Ci:1:005:0 s 80 06 0600 0000 000a 10 <
ffff88814d461240 211181647 C Ci:1:005:0 -32 0
ffff88814d461240 211181669 S Ci:1:005:0 s 80 06 0200 0000 0009 9 <
ffff88814d461240 211181972 C Ci:1:005:0 0 9 = 09024300 02010080 32
ffff88814d461240 211181999 S Ci:1:005:0 s 80 06 0200 0000 0043 67 <
ffff88814d461240 211182437 C Ci:1:005:0 0 67 = 09024300 02010080 
32090400 00010202 01000524 00100105 24010001 04240200
ffff88814d461300 211182850 S Co:1:005:0 s 00 09 0001 0000 0000 0
ffff88814d461300 211183117 C Co:1:005:0 0 0
ffff888145f1cb40 211184128 S Co:1:005:0 s 21 20 0000 0000 0007 7 = 
80250000 000008
ffff888145f1cb40 211184335 C Co:1:005:0 0 7 >
ffff888145f1cb40 211184995 S Ci:1:002:0 s a3 00 0000 0002 0004 4 <
ffff888145f1cb40 211185187 C Ci:1:002:0 0 4 = 03010000
Alan Stern March 27, 2024, 2:21 p.m. UTC | #20
On Wed, Mar 27, 2024 at 02:24:34PM +0100, Jan Čermák wrote:
> Hi Alan,
> 
> On 19. 03. 24 17:03, Alan Stern wrote:
> > Change the HUB_DEBOUNCE_TIMEOUT value to 4500, the HUB_DEBOUNCE_STEP
> > value to 250 and the HUB_DEBOUNCE_STABLE value to 2000.  That just
> > might give the device enough time to settle down and start working
> > before the computer tries using it.
> 
> sorry for the delay, I only managed to test it today. You are right, with
> the timeouts adjusted, it enumerates fine after a while, without any
> descriptor read errors or anything like that:
> 
> [  210.957371] usb 1-1.2: new full-speed USB device number 5 using ehci-pci
> [  211.037728] usb 1-1.2: New USB device found, idVendor=0658,
> idProduct=0200, bcdDevice= 0.00
> [  211.037747] usb 1-1.2: New USB device strings: Mfr=0, Product=0,
> SerialNumber=0
> [  211.039764] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device

Great!

> If it's worth anything, usbmon trace is attached below. Anyway, do you have
> any ideas what could be done to make it work without doing any detrimental
> changes? I was thinking I'll try to reach out to the vendor at this point -
> they should be aware their device might stop working with recent kernels,
> and they could explain the quirky behavior, or implement any changes on the
> firmware side (if it's even possible).

The ideal solution would be if the vendor updates the firmware to
prevent the device from turning on its pull-up (thereby telling the
host computer that it is connected to the bus) until it is ready to
operate.  There's no good reason to have that > 1-second period during
which the device claims to be connected but does not work.

Another possible solution, a lot less attractive, would be to change
the initialization code in the hub driver so that if it sees the
device disconnect itself from the bus, it restarts the entire
procedure from the beginning.  You'd end up getting a bunch of error
messages during the initial non-working period, just as you do now,
but afterwards the device should be detected and initialized okay.

Alan Stern
Alexander Dahl March 28, 2024, 3:44 p.m. UTC | #21
Hei hei,

following this discussion with some kind of curiosity, because I own
such a device and depent on it, but my firmware version does not seem
to be affected.  Remarks below.

Am Wed, Mar 27, 2024 at 10:21:19AM -0400 schrieb Alan Stern:
> On Wed, Mar 27, 2024 at 02:24:34PM +0100, Jan Čermák wrote:
> > Hi Alan,
> > 
> > On 19. 03. 24 17:03, Alan Stern wrote:
> > > Change the HUB_DEBOUNCE_TIMEOUT value to 4500, the HUB_DEBOUNCE_STEP
> > > value to 250 and the HUB_DEBOUNCE_STABLE value to 2000.  That just
> > > might give the device enough time to settle down and start working
> > > before the computer tries using it.
> > 
> > sorry for the delay, I only managed to test it today. You are right, with
> > the timeouts adjusted, it enumerates fine after a while, without any
> > descriptor read errors or anything like that:
> > 
> > [  210.957371] usb 1-1.2: new full-speed USB device number 5 using ehci-pci
> > [  211.037728] usb 1-1.2: New USB device found, idVendor=0658,
> > idProduct=0200, bcdDevice= 0.00
> > [  211.037747] usb 1-1.2: New USB device strings: Mfr=0, Product=0,
> > SerialNumber=0
> > [  211.039764] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
> 
> Great!
> 
> > If it's worth anything, usbmon trace is attached below. Anyway, do you have
> > any ideas what could be done to make it work without doing any detrimental
> > changes? I was thinking I'll try to reach out to the vendor at this point -
> > they should be aware their device might stop working with recent kernels,
> > and they could explain the quirky behavior, or implement any changes on the
> > firmware side (if it's even possible).
> 
> The ideal solution would be if the vendor updates the firmware to
> prevent the device from turning on its pull-up (thereby telling the
> host computer that it is connected to the bus) until it is ready to
> operate.  There's no good reason to have that > 1-second period during
> which the device claims to be connected but does not work.

As pointed out in the GitHub ticket already:  Firmware update from a
users point of view is difficult to impossible.  There's no easy "take
the latest firmware" and update and you're done.  It is not clear
which tools are necessary, and even worse there are only certain
combinations of upgrade paths.  For example upgrading to x.z is only
possible from x.x but not from x.y, if I understood it correctly.  And
you don't know if you will brick the device or not.  And I'm speaking
as an embedded developer.  The ordinary home user is probably not even
going to try it.

> Another possible solution, a lot less attractive, would be to change
> the initialization code in the hub driver so that if it sees the
> device disconnect itself from the bus, it restarts the entire
> procedure from the beginning.  You'd end up getting a bunch of error
> messages during the initial non-working period, just as you do now,
> but afterwards the device should be detected and initialized okay.

Is it possible to hook in with some kind of quirk if this device ID is
seen on the bus (and wait longer just for this device), or can you
only access that after successful init?

Greets
Alex
Alan Stern March 28, 2024, 4:16 p.m. UTC | #22
On Thu, Mar 28, 2024 at 04:44:26PM +0100, Alexander Dahl wrote:
> Hei hei,
> 
> following this discussion with some kind of curiosity, because I own
> such a device and depent on it, but my firmware version does not seem
> to be affected.  Remarks below.

> > The ideal solution would be if the vendor updates the firmware to
> > prevent the device from turning on its pull-up (thereby telling the
> > host computer that it is connected to the bus) until it is ready to
> > operate.  There's no good reason to have that > 1-second period during
> > which the device claims to be connected but does not work.
> 
> As pointed out in the GitHub ticket already:  Firmware update from a
> users point of view is difficult to impossible.  There's no easy "take
> the latest firmware" and update and you're done.  It is not clear
> which tools are necessary, and even worse there are only certain
> combinations of upgrade paths.  For example upgrading to x.z is only
> possible from x.x but not from x.y, if I understood it correctly.  And
> you don't know if you will brick the device or not.  And I'm speaking
> as an embedded developer.  The ordinary home user is probably not even
> going to try it.

Oh, okay.  Too bad.  Although in many cases, manufacturers tend not to 
be eager to change their devices' firmwares just to suit Linux users...

> > Another possible solution, a lot less attractive, would be to change
> > the initialization code in the hub driver so that if it sees the
> > device disconnect itself from the bus, it restarts the entire
> > procedure from the beginning.  You'd end up getting a bunch of error
> > messages during the initial non-working period, just as you do now,
> > but afterwards the device should be detected and initialized okay.
> 
> Is it possible to hook in with some kind of quirk if this device ID is
> seen on the bus (and wait longer just for this device), or can you
> only access that after successful init?

The latter, unfortunately.  Before initialization there's no way to 
communicate with the device.

Alan Stern