diff mbox series

usb: gadget: u_serial: Add null pointer checks after RX/TX submission

Message ID 20240116141801.396398-1-khtsai@google.com
State New
Headers show
Series usb: gadget: u_serial: Add null pointer checks after RX/TX submission | expand

Commit Message

Kuen-Han Tsai Jan. 16, 2024, 2:16 p.m. UTC
Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
fully fix the potential null pointer dereference issue. While
gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
and gs_start_tx() release the lock during endpoint request submission.
This creates a window where gs_close() could set port->port_tty to NULL,
leading to a dereference when the lock is reacquired.

This patch adds a null pointer check for port->port_tty after RX/TX
submission, and removes the initial null pointer check in gs_start_io()
since the caller must hold port_lock and guarantee non-null values for
port_usb and port_tty.

Fixes: ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in gs_start_io")
Cc: stable@vger.kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
---
Explanation:
    CPU1:                            CPU2:
    gserial_connect() // lock
                                     gs_close() // await lock
    gs_start_rx()     // unlock
    usb_ep_queue()
                                     gs_close() // lock, reset port_tty and unlock
    gs_start_rx()     // lock
    tty_wakeup()      // dereference

Stack traces:
[   51.494375][  T278] ttyGS1: shutdown
[   51.494817][  T269] android_work: sent uevent USB_STATE=DISCONNECTED
[   52.115792][ T1508] usb: [dm_bind] generic ttyGS1: super speed IN/ep1in OUT/ep1out
[   52.516288][ T1026] android_work: sent uevent USB_STATE=CONNECTED
[   52.551667][ T1533] gserial_connect: start ttyGS1
[   52.565634][ T1533] [khtsai] enter gs_start_io, ttyGS1, port->port.tty=0000000046bd4060
[   52.565671][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[   52.591552][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[   52.619901][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[   52.638659][ T1325] [khtsai] gs_close, lock port ttyGS1
[   52.656842][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) ...
[   52.683005][ T1325] [khtsai] gs_close, clear ttyGS1
[   52.683007][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) done!
[   52.708643][ T1325] [khtsai] gs_close, unlock port ttyGS1
[   52.747592][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[   52.747616][ T1533] [khtsai] gs_start_io, ttyGS1, going to call tty_wakeup(), port->port.tty=0000000000000000
[   52.747629][ T1533] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001f8
---
 drivers/usb/gadget/function/u_serial.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

--
2.43.0.275.g3460e3d667-goog

Comments

Greg Kroah-Hartman Jan. 28, 2024, 1:29 a.m. UTC | #1
On Thu, Jan 18, 2024 at 10:27:54AM +0100, Jiri Slaby wrote:
> On 16. 01. 24, 15:16, Kuen-Han Tsai wrote:
> > Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
> > gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
> > fully fix the potential null pointer dereference issue. While
> > gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
> > and gs_start_tx() release the lock during endpoint request submission.
> > This creates a window where gs_close() could set port->port_tty to NULL,
> > leading to a dereference when the lock is reacquired.
> > 
> > This patch adds a null pointer check for port->port_tty after RX/TX
> > submission, and removes the initial null pointer check in gs_start_io()
> > since the caller must hold port_lock and guarantee non-null values for
> > port_usb and port_tty.
> 
> Or you switch to tty_port refcounting and need not fiddling with this at all
> ;).

I agree, Kuen-Han, why not do that instead?

thanks,

greg k-h
Kuen-Han Tsai Jan. 28, 2024, 2 p.m. UTC | #2
Hi Greg & Jiri,

>> Or you switch to tty_port refcounting and need not fiddling with this at all
>> ;).
> I agree, Kuen-Han, why not do that instead?

Thanks for the feedback! I agree that switching to tty_port
refcounting is the right approach.

I'm currently digging into tty_port.c to understand the best way to
implement this change. Could you confirm if I'm on the right track by
using tty_kref_get() and tty_kref_put() to address race conditions?
Additionally, do I need to refactor other functions in u_serial.c that
interact with the TTY without refcounting?

Regards,
Kuen-Han
Kuen-Han Tsai March 8, 2024, 11:47 a.m. UTC | #3
Hi Greg & Jiri,

On Sun, Jan 28, 2024 at 9:29 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Jan 18, 2024 at 10:27:54AM +0100, Jiri Slaby wrote:
> > On 16. 01. 24, 15:16, Kuen-Han Tsai wrote:
> > > Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
> > > gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
> > > fully fix the potential null pointer dereference issue. While
> > > gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
> > > and gs_start_tx() release the lock during endpoint request submission.
> > > This creates a window where gs_close() could set port->port_tty to NULL,
> > > leading to a dereference when the lock is reacquired.
> > >
> > > This patch adds a null pointer check for port->port_tty after RX/TX
> > > submission, and removes the initial null pointer check in gs_start_io()
> > > since the caller must hold port_lock and guarantee non-null values for
> > > port_usb and port_tty.
> >
> > Or you switch to tty_port refcounting and need not fiddling with this at all
> > ;).
>
> I agree, Kuen-Han, why not do that instead?

The u_serial driver has already maintained the usage count of a TTY
structure for open and close. While the driver tracks the usage count
via open/close, it doesn't fully eliminate race conditions. Below are
two potential scenarios:

Case 1 (Observed):
1. gs_open() sets usage count to 1.
2. gserial_connect(), gs_start_io(), and gs_start_rx() execute in
sequence (lock held).
3. Lock released, usb_ep_queue() called.
4. In parallel, gs_close() executes, sees count of 1, clears TTY, releases lock.
5. Original thread resumes in gs_start_rx(), potentially leading to
kernel panic on an invalid TTY.

---

Case 2: Hypothesis. Similar to Case 1, but the race occurs between
gs_open() and gs_close(), also potentially causing a kernel panic.
1. gserial_connect() enables usb endpoints.
2. gs_open(), gs_start_io(), and gs_start_rx() execute in sequence (lock held).
3. Lock released, usb_ep_queue() called.
4. In parallel, gs_close() executes, sees count of 1, clears TTY, releases lock.
5. Original thread resumes in gs_start_rx(), potentially leading to
kernel panic on an invalid TTY.

---

Since both gserial_connect() and gs_open() initiate gs_start_io(),
there's a brief window where gs_start_rx() releases a spinlock for USB
submission. If gs_close() executes during this window, it could
acquire the lock and clear the TTY structure prematurely. This happens
because the lock is released and the usage count remains 1, making it
appear like a valid final reference, even though gs_start_io() is
still in progress.

My only solution so far is to recheck the TTY structure after
gs_start_rx() or gs_start_tx(). I would greatly appreciate your
insights on how to address this race condition effectively.

Regards,
Kuen-Han
Kuen-Han Tsai March 28, 2024, 7:54 a.m. UTC | #4
Hi Greg and Jiri

On Fri, Mar 8, 2024 at 7:47 PM Kuen-Han Tsai <khtsai@google.com> wrote:
> On Sun, Jan 28, 2024 at 9:29 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Jan 18, 2024 at 10:27:54AM +0100, Jiri Slaby wrote:
> > > On 16. 01. 24, 15:16, Kuen-Han Tsai wrote:
> > > > Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
> > > > gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
> > > > fully fix the potential null pointer dereference issue. While
> > > > gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
> > > > and gs_start_tx() release the lock during endpoint request submission.
> > > > This creates a window where gs_close() could set port->port_tty to NULL,
> > > > leading to a dereference when the lock is reacquired.
> > > >
> > > > This patch adds a null pointer check for port->port_tty after RX/TX
> > > > submission, and removes the initial null pointer check in gs_start_io()
> > > > since the caller must hold port_lock and guarantee non-null values for
> > > > port_usb and port_tty.
> > >
> > > Or you switch to tty_port refcounting and need not fiddling with this at all
> > > ;).
> >
> > I agree, Kuen-Han, why not do that instead?
>
> The u_serial driver has already maintained the usage count of a TTY
> structure for open and close. While the driver tracks the usage count
> via open/close, it doesn't fully eliminate race conditions. Below are
> two potential scenarios:
>
> Case 1 (Observed):
> 1. gs_open() sets usage count to 1.
> 2. gserial_connect(), gs_start_io(), and gs_start_rx() execute in
> sequence (lock held).
> 3. Lock released, usb_ep_queue() called.
> 4. In parallel, gs_close() executes, sees count of 1, clears TTY, releases lock.
> 5. Original thread resumes in gs_start_rx(), potentially leading to
> kernel panic on an invalid TTY.

The same issue happens in the f_acm function.

[  369.926837][ T9731] Unable to handle kernel NULL pointer
dereference at virtual address 00000000000001f8
[  369.930098][ T9731] Call trace:
[  369.930108][ T9731]  tty_wakeup+0x28/0x160
[  369.930124][ T9731]  gs_start_io+0x128/0x214
[  369.930136][ T9731]  gserial_connect+0xb8/0x158
[  369.930150][ T9731]  acm_set_alt+0xf8/0x118
[  369.930162][ T9731]  set_config+0x258/0x3c0
[  369.930179][ T9731]  composite_setup+0x484/0x984
[  369.930193][ T9731]  android_setup+0x13c/0x24c
[  369.930206][ T9731]  dwc3_ep0_interrupt+0x8c4/0x122c
[  369.930224][ T9731]  dwc3_thread_interrupt+0xa4/0x1918
[  369.930238][ T9731]  irq_thread_fn+0x44/0xa4
[  369.930260][ T9731]  irq_thread+0x2a8/0x588
[  369.930272][ T9731]  kthread+0x1d0/0x23c
[  369.930289][ T9731]  ret_from_fork+0x10/0x20
[  369.930314][ T9731] Code: d5384108 aa0003f3 f9431d08 f81f83a8 (f940fc08)

> Since both gserial_connect() and gs_open() initiate gs_start_io(),
> there's a brief window where gs_start_rx() releases a spinlock for USB
> submission. If gs_close() executes during this window, it could
> acquire the lock and clear the TTY structure prematurely. This happens
> because the lock is released and the usage count remains 1, making it
> appear like a valid final reference, even though gs_start_io() is
> still in progress.

The f_acm function invokes gserial_connect while configuring the
altsetting. A similar issue arises when the TTY file node is already
opened but closed when the gs_start_io function is executing. The
current code includes a spinlock and the usage_count is 1; however,
the problem persists, suggesting that the existing spinlock and
usage_count are ineffective in preventing the issue. A straightforward
solution is to double-check the TTY status before calling the
tty_wakeup function.

> My only solution so far is to recheck the TTY structure after
> gs_start_rx() or gs_start_tx(). I would greatly appreciate your
> insights on how to address this race condition effectively.

I'd like to get your perspective on this issue. Do you have any ideas
on how we can move this solution forward?

Regards,
Kuen-Han
Jiri Slaby (SUSE) March 28, 2024, 9:02 a.m. UTC | #5
On 08. 03. 24, 12:47, Kuen-Han Tsai wrote:
> Hi Greg & Jiri,
> 
> On Sun, Jan 28, 2024 at 9:29 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Thu, Jan 18, 2024 at 10:27:54AM +0100, Jiri Slaby wrote:
>>> On 16. 01. 24, 15:16, Kuen-Han Tsai wrote:
>>>> Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
>>>> gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
>>>> fully fix the potential null pointer dereference issue. While
>>>> gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
>>>> and gs_start_tx() release the lock during endpoint request submission.
>>>> This creates a window where gs_close() could set port->port_tty to NULL,
>>>> leading to a dereference when the lock is reacquired.
>>>>
>>>> This patch adds a null pointer check for port->port_tty after RX/TX
>>>> submission, and removes the initial null pointer check in gs_start_io()
>>>> since the caller must hold port_lock and guarantee non-null values for
>>>> port_usb and port_tty.
>>>
>>> Or you switch to tty_port refcounting and need not fiddling with this at all
>>> ;).
>>
>> I agree, Kuen-Han, why not do that instead?
> 
> The u_serial driver has already maintained the usage count of a TTY
> structure for open and close. While the driver tracks the usage count
> via open/close, it doesn't fully eliminate race conditions. Below are
> two potential scenarios:
> 
> Case 1 (Observed):
> 1. gs_open() sets usage count to 1.
> 2. gserial_connect(), gs_start_io(), and gs_start_rx() execute in
> sequence (lock held).
> 3. Lock released, usb_ep_queue() called.
> 4. In parallel, gs_close() executes, sees count of 1, clears TTY, releases lock.
> 5. Original thread resumes in gs_start_rx(), potentially leading to
> kernel panic on an invalid TTY.

If it used refcounting -- tty_port_tty_get(), how comes?

thanks,
Kuen-Han Tsai July 15, 2024, 3:33 p.m. UTC | #6
Hi Jiri,

Sorry for the late reply, I've finally been able to revisit this issue.

On Thu, Mar 28, 2024 at 5:02 PM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 08. 03. 24, 12:47, Kuen-Han Tsai wrote:
> > Hi Greg & Jiri,
> >
> > On Sun, Jan 28, 2024 at 9:29 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Thu, Jan 18, 2024 at 10:27:54AM +0100, Jiri Slaby wrote:
> >>> On 16. 01. 24, 15:16, Kuen-Han Tsai wrote:
> >>>> Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
> >>>> gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
> >>>> fully fix the potential null pointer dereference issue. While
> >>>> gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
> >>>> and gs_start_tx() release the lock during endpoint request submission.
> >>>> This creates a window where gs_close() could set port->port_tty to NULL,
> >>>> leading to a dereference when the lock is reacquired.
> >>>>
> >>>> This patch adds a null pointer check for port->port_tty after RX/TX
> >>>> submission, and removes the initial null pointer check in gs_start_io()
> >>>> since the caller must hold port_lock and guarantee non-null values for
> >>>> port_usb and port_tty.
> >>>
> >>> Or you switch to tty_port refcounting and need not fiddling with this at all
> >>> ;).
> >>
> >> I agree, Kuen-Han, why not do that instead?
> >
> > The u_serial driver has already maintained the usage count of a TTY
> > structure for open and close. While the driver tracks the usage count
> > via open/close, it doesn't fully eliminate race conditions. Below are
> > two potential scenarios:
> >
> > Case 1 (Observed):
> > 1. gs_open() sets usage count to 1.
> > 2. gserial_connect(), gs_start_io(), and gs_start_rx() execute in
> > sequence (lock held).
> > 3. Lock released, usb_ep_queue() called.
> > 4. In parallel, gs_close() executes, sees count of 1, clears TTY, releases lock.
> > 5. Original thread resumes in gs_start_rx(), potentially leading to
> > kernel panic on an invalid TTY.
>
> If it used refcounting -- tty_port_tty_get(), how comes?

If gs_start_rx() uses refcounting, the race problem will still persist
because gs_close() currently decides to reset port->port.tty to NULL
only when port->port.count reaches one (i.e. last open), without
checking if the associated TTY instance is still in use. I am
uncertain if you are suggesting that I should modify gs_close() by
considering the refcount of a tty instance? I'm still unsure how to
fix this race problem by using refcounting. I'd greatly appreciate it
if you could provide more detailed guidance on how to resolve this
issue, as I'm not very experienced with TTY drivers.

Also, I noticed a typo in my earlier emails, including the commit
message. It should be port->port.tty instead of port->port_tty.

Regards,
Kuen-Han
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index a92eb6d90976..2f1890c8f473 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -539,20 +539,16 @@  static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
 static int gs_start_io(struct gs_port *port)
 {
 	struct list_head	*head = &port->read_pool;
-	struct usb_ep		*ep;
+	struct usb_ep		*ep = port->port_usb->out;
 	int			status;
 	unsigned		started;

-	if (!port->port_usb || !port->port.tty)
-		return -EIO;
-
 	/* Allocate RX and TX I/O buffers.  We can't easily do this much
 	 * earlier (with GFP_KERNEL) because the requests are coupled to
 	 * endpoints, as are the packet sizes we'll be using.  Different
 	 * configurations may use different endpoints with a given port;
 	 * and high speed vs full speed changes packet sizes too.
 	 */
-	ep = port->port_usb->out;
 	status = gs_alloc_requests(ep, head, gs_read_complete,
 		&port->read_allocated);
 	if (status)
@@ -569,12 +565,22 @@  static int gs_start_io(struct gs_port *port)
 	port->n_read = 0;
 	started = gs_start_rx(port);

+	/*
+	 * The TTY may be set to NULL by gs_close() after gs_start_rx() or
+	 * gs_start_tx() release locks for endpoint request submission.
+	 */
+	if (!port->port.tty)
+		goto out;
+
 	if (started) {
 		gs_start_tx(port);
 		/* Unblock any pending writes into our circular buffer, in case
 		 * we didn't in gs_start_tx() */
+		if (!port->port.tty)
+			goto out;
 		tty_wakeup(port->port.tty);
 	} else {
+out:
 		gs_free_requests(ep, head, &port->read_allocated);
 		gs_free_requests(port->port_usb->in, &port->write_pool,
 			&port->write_allocated);