[RFC] usb: gadget: u_ether: prevent deadlock under RT

Message ID	20250219181556.1020029-1-ralph.siemsen@linaro.org
State	New
Headers	show Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F1861FDE27 for <linux-usb@vger.kernel.org>; Wed, 19 Feb 2025 18:16:32 +0000 (UTC) From: Ralph Siemsen <ralph.siemsen@linaro.org> To: linux-rt-devel@lists.linux.dev, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, Pawel Laszczak <pawell@cadence.com>, Frank Li <Frank.Li@nxp.com>, Ferry Toth <ftoth@exalondelft.nl> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Thomas Gleixner <tglx@linutronix.de>, Steven Rostedt <rostedt@goodmis.org>, Ralph Siemsen <ralph.siemsen@linaro.org> Subject: [RFC PATCH] usb: gadget: u_ether: prevent deadlock under RT Date: Wed, 19 Feb 2025 13:15:52 -0500 Message-ID: <20250219181556.1020029-1-ralph.siemsen@linaro.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[RFC] usb: gadget: u_ether: prevent deadlock under RT \| expand [RFC] usb: gadget: u_ether: prevent deadlock under RT

Message ID

20250219181556.1020029-1-ralph.siemsen@linaro.org

State

New

Headers

From: Ralph Siemsen <ralph.siemsen@linaro.org>
To: linux-rt-devel@lists.linux.dev,
	linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Pawel Laszczak <pawell@cadence.com>,
	Frank Li <Frank.Li@nxp.com>,
	Ferry Toth <ftoth@exalondelft.nl>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ralph Siemsen <ralph.siemsen@linaro.org>
Subject: [RFC PATCH] usb: gadget: u_ether: prevent deadlock under RT
Date: Wed, 19 Feb 2025 13:15:52 -0500
Message-ID: <20250219181556.1020029-1-ralph.siemsen@linaro.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[RFC] usb: gadget: u_ether: prevent deadlock under RT | expand

Commit Message

Ralph Siemsen Feb. 19, 2025, 6:15 p.m. UTC

A deadlock can be readily triggered when using NCM gadget with the
Cadence cdns3 USB driver, under heavy traffic from "iperf3 --bidir".
It has been observed under 6.1, 6.6 and 6.12 kernel versions, but
only on PREEMPT_RT. Once deadlocked, even magicsysrq has no effect.
However, there is periodic output from the rcu stall detector:

[   71.105941] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   71.105966] rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-1): P125/4:b..l
[   71.105992] rcu:     (detected by 1, t=5252 jiffies, g=-515, q=7 ncpus=2)
[   71.106003] task:irq/507-s-f4000 state:D stack:0     pid:125   tgid:125   ppid:2      flags:0x00000008
[   71.106018] Call trace:
[   71.106022]  __switch_to+0xf4/0x158
[   71.106046]  __schedule+0x2b4/0x920
[   71.106055]  schedule_rtlock+0x24/0x50
[   71.106064]  rtlock_slowlock_locked+0x348/0xcb8
[   71.106077]  rt_spin_lock+0x88/0xb8
[   71.106086]  eth_start_xmit+0x30/0x1490 [u_ether]        /*****/
[   71.106112]  ncm_tx_timeout+0x2c/0x50 [usb_f_ncm]
[   71.106131]  __hrtimer_run_queues+0x180/0x378
[   71.106143]  hrtimer_run_softirq+0x90/0x100
[   71.106151]  handle_softirqs.isra.0+0x14c/0x360
[   71.106165]  __local_bh_enable_ip+0x104/0x118
[   71.106177]  __netdev_alloc_skb+0x1e0/0x210
[   71.106192]  ncm_unwrap_ntb+0x1ec/0x528 [usb_f_ncm]
[   71.106206]  rx_complete+0x120/0x288 [u_ether]           /*****/
[   71.106221]  usb_gadget_giveback_request+0x34/0xf8
[   71.106236]  cdns3_gadget_giveback+0xe4/0x2d0 [cdns3]
[   71.106286]  cdns3_transfer_completed+0x3b0/0x630 [cdns3]
[   71.106320]  cdns3_device_thread_irq_handler+0x8b8/0xd18 [cdns3]
[   71.106353]  irq_thread_fn+0x34/0xb8
[   71.106364]  irq_thread+0x180/0x2f0
[   71.106374]  kthread+0x104/0x118
[   71.106384]  ret_from_fork+0x10/0x20

The deadlock occurs because eth_start_xmit() and rx_complete() both
acquire the same spinlock in the same instance of struct eth_dev.
The nested call occurs because rx_complete() calls __netdev_alloc_skb()
which performs a brief local_bh_disable/enable() sequence.

Prevent the deadlock by disabling softirq in rx_complete(), with the
same scope as the spinlock. With this fix, no deadlocks are observed
over many hours of continuous testing at USB 2.0 speed (480 Mbit/s).

Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org>
---
Discussion items:

0) This is somewhat similar to the recent discussion
https://lore.kernel.org/linux-rt-devel/20250212123302.0f620f23@gandalf.local.home/
and my solution is to "sprinkle local_bh_disable() around",
which is clearly not optimal. Hence the RFC on this patch.

1) There are several more places using the same spinlock, for example
the gether_suspend() and gether_resume() functions. I'm not using power
management, but I wonder if there might be more deadlocks lurking?

2) By keeping softirq disabled for a longer duration, this patch could
potentially increase the RT latency. I tried to quantify this using
cyclictest, but I cannot get a baseline for comparison, since it
deadlocks almost immediately when the USB is active. Other suggestions?

3) The fix is in generic u_ether.c code, to match the scope of the
spinlock. Alternatively, it could be done in cdns3 specific code,
such as in cdns3_device_thread_irq_handler(). I'm not sure if the
same problem exists in other USB drivers?
---
 drivers/usb/gadget/function/u_ether.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Ralph Siemsen March 7, 2025, 7:51 p.m. UTC | #1

Hi Sebastian,

Thanks for your reply!

On Wed, Feb 26, 2025 at 3:29 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> Based on the backtrace the problem is within the cdns3. The driver
> acquires at the beginning of its threaded routine
>         spin_lock_irqsave(&priv_dev->lock, flags);
>
> and then before returning the URB it does
>                  spin_unlock(&priv_dev->lock);
>                  usb_gadget_giveback_request()
>
> so the lock is dropped but the interrupts are still disabled. This makes
> me wonder why using threaded interrupts at all since interrupts are
> disabled for the whole routine but then others do the same.

I also wondered why threaded interrupts are being used, but I don't
know the reason.

> If you look at dwc3_thread_interrupt() they have local_bh_disable()/
> enable() before acquiring the lock and this is what I would suggest
> doing here, too. The NCM is probably not the only one affected but
> everything doing network that may since it may recourse into softirq.

Thanks for the suggestion. I had also thought of doing this. I also
noticed that the "cdnsp" driver has a similar fix in commit
58f2fcb3a845 ("usb: cdnsp: Fix deadlock issue during using NCM
gadget"), which was also backported to stable branches. So I will
prepare a v2 patch to do the same for "cdns3" driver.

Regards
Ralph

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 09e2838917e2..dc511c01b741 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -233,6 +233,7 @@  static void rx_complete(struct usb_ep *ep, struct usb_request *req)
 		if (dev->unwrap) {
 			unsigned long	flags;
 
+			local_bh_disable();
 			spin_lock_irqsave(&dev->lock, flags);
 			if (dev->port_usb) {
 				status = dev->unwrap(dev->port_usb,
@@ -243,6 +244,7 @@  static void rx_complete(struct usb_ep *ep, struct usb_request *req)
 				status = -ENOTCONN;
 			}
 			spin_unlock_irqrestore(&dev->lock, flags);
+			local_bh_enable();
 		} else {
 			skb_queue_tail(&dev->rx_frames, skb);
 		}

[RFC] usb: gadget: u_ether: prevent deadlock under RT

Commit Message

Comments

Patch