diff mbox series

[v6] usb: gadget: u_serial: Add null pointer check in gs_read_complete & gs_write_complete

Message ID TYUPR06MB62171A7BF25AB6963CBA07FED28F2@TYUPR06MB6217.apcprd06.prod.outlook.com
State Superseded
Headers show
Series [v6] usb: gadget: u_serial: Add null pointer check in gs_read_complete & gs_write_complete | expand

Commit Message

胡连勤 Aug. 22, 2024, 2:34 a.m. UTC
From: Lianqin Hu <hulianqin@vivo.com>

Considering that in some extreme cases, when the unbind operation
is being executed, gserial_disconnect has already cleared gser->ioport,
and the controller has not stopped & pullup 0, sys.usb.config is reset
and the bind operation will be re-executed, calling gs_read_complete,
which will result in accessing gser->iport, resulting in a null pointer
dereference, add a null pointer check to prevent this situation.

Added a static spinlock to prevent gser->ioport from becoming
null after the newly added check.

Unable to handle kernel NULL pointer dereference at
virtual address 00000000000001a8
pc : gs_read_complete+0x58/0x240
lr : usb_gadget_giveback_request+0x40/0x160
sp : ffffffc00f1539c0
x29: ffffffc00f1539c0 x28: ffffff8002a30000 x27: 0000000000000000
x26: ffffff8002a30000 x25: 0000000000000000 x24: ffffff8002a30000
x23: ffffff8002ff9a70 x22: ffffff898e7a7b00 x21: ffffff803c9af9d8
x20: ffffff898e7a7b00 x19: 00000000000001a8 x18: ffffffc0099fd098
x17: 0000000000001000 x16: 0000000080000000 x15: 0000000ac1200000
x14: 0000000000000003 x13: 000000000000d5e8 x12: 0000000355c314ac
x11: 0000000000000015 x10: 0000000000000012 x9 : 0000000000000008
x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffff887cd12000
x5 : 0000000000000002 x4 : ffffffc00f9b07f0 x3 : ffffffc00f1538d0
x2 : 0000000000000001 x1 : 0000000000000000 x0 : 00000000000001a8
Call trace:
gs_read_complete+0x58/0x240
usb_gadget_giveback_request+0x40/0x160
dwc3_remove_requests+0x170/0x484
dwc3_ep0_out_start+0xb0/0x1d4
__dwc3_gadget_start+0x25c/0x720
kretprobe_trampoline.cfi_jt+0x0/0x8
kretprobe_trampoline.cfi_jt+0x0/0x8
udc_bind_to_driver+0x1d8/0x300
usb_gadget_probe_driver+0xa8/0x1dc
gadget_dev_desc_UDC_store+0x13c/0x188
configfs_write_iter+0x160/0x1f4
vfs_write+0x2d0/0x40c
ksys_write+0x7c/0xf0
__arm64_sys_write+0x20/0x30
invoke_syscall+0x60/0x150
el0_svc_common+0x8c/0xf8
do_el0_svc+0x28/0xa0
el0_svc+0x24/0x84
el0t_64_sync_handler+0x88/0xec
el0t_64_sync+0x1b4/0x1b8
Code: aa1f03e1 aa1303e0 52800022 2a0103e8 (88e87e62)
---[ end trace 938847327a739172 ]---
Kernel panic - not syncing: Oops: Fatal exception

Fixes: c1dca562be8a ("usb gadget: split out serial core")
Cc: stable@vger.kernel.org
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
---
v6:
  - Update the commit text
  - Add the Fixes tag
  - CC stable kernel
  - Add serial_port_lock protection when checking port pointer
  - Optimize code comments
  - Delete log printing
---
 drivers/usb/gadget/function/u_serial.c | 33 ++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 5 deletions(-)

Comments

Michael Nazzareno Trimarchi Aug. 23, 2024, 6:58 a.m. UTC | #1
Hi

On Fri, Aug 23, 2024 at 8:40 AM 胡连勤 <hulianqin@vivo.com> wrote:
>
> Hello linux community expert:
>
> >> Fixes: c1dca562be8a ("usb gadget: split out serial core")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
> >> ---
> >> v6:
> >>   - Update the commit text
> >>   - Add the Fixes tag
> >>   - CC stable kernel
> >>   - Add serial_port_lock protection when checking port pointer
> >>   - Optimize code comments
> >>   - Delete log printing
>
> >You need to list ALL of the versions here, I seem to have missed v4 and
> >v5 somewhere so I don't know what changed there.
>
>  V4: Add cc stable kernel     >> Cc: stable@vger.kernel.org
>  V5: Add the Fixes tag       >> Fixes: c1dca562be8a ("usb gadget: split out serial core")
> >You can also add the Fixes tag and CC stable kernel, so that it can be
> >backported to older kernels (such as 5.15) also.
>    ---------  The above two lines are from Prashanth K's comment
>
> >>  static void gs_read_complete(struct usb_ep *ep, struct usb_request
> >> *req)  {
> >> -    struct gs_port  *port = ep->driver_data;
> >> +    struct gs_port  *port;
> >> +    unsigned long  flags;
> >> +
> >> +    spin_lock_irqsave(&serial_port_lock, flags);
> >> +    port = ep->driver_data;
> >> +
> >> +    /* When port is NULL, return to avoid panic. */
>
> >This comment is not needed, it's obvious that you check before dereference.
>  OK, I will delete this comment in the new patch.
>
> >BUT you can mention that you are trying to check with the race somewhere else, right?  Please do that, and document here where that race is at that you are doing this extra locking for.
>  I don't fully understand what you mean. Are you asking which logic is in competition with this one, causing this port to be null?
>


> Considering that in some extreme cases, when the unbind operation
> being executed, gserial_disconnect has already cleared gser->ioport,
> and the controller has not stopped & pullup 0, sys.usb.config is reset

Here few people know what sys.usb.config doing, you should describe properly
what is doing. What I can imagine that you unbind and bind to a new gadget
changing the sys.usb.config. Is that right?

> and the bind operation will be re-executed, calling gs_read_complete,
> which will result in accessing gser->iport, resulting in a null pointer
> dereference, add a null pointer check to prevent this situation.

My only question why unbind should not wait for pending urb to be completed,
before getting in the race?

>
> >> +    if (!port) {
> >> +            spin_unlock_irqrestore(&serial_port_lock, flags);
> >> +            return;
> >> +    }
> >>
> >> -    /* Queue all received data until the tty layer is ready for it. */
> >>      spin_lock(&port->port_lock);
> >> +    spin_unlock(&serial_port_lock);
>
> >nested spinlocks, why?  Did you run this with lockdep enabled to verify you aren't hitting a different bug now?
>  Because there is a competition relationship between this function and the gserial_disconnect function,
>  the gserial_disconnect function first obtains serial_port_lock and then obtains port->port_lock.
>  The purpose of nesting is to ensure that when gs_read_complete is executed, it can be successfully executed after obtaining serial_port_lock.
>  gserial_disconnect(..)
>  {
>         struct gs_port  *port = gser->ioport;
>         ...
>         spin_lock_irqsave(&serial_port_lock, flags);
>         spin_lock(&port->port_lock);
>         ...
>         gser->ioport = NULL;   ---> port = NULL;
>         ...
>         spin_unlock(&port->port_lock);
>         spin_unlock_irqrestore(&serial_port_lock, flags);
>  }
>
> After enabling the lockdep function (CONFIG_DEBUG_LOCK_ALLOC=y), there is no lockdep-related warning information.
>
> >And why is one irqsave and one not?  That feels odd, it might be right, but you need to document here why the difference.
>  After the gs_read_complete function is executed, spin_unlock_irqrestore is used to restore the previous state,

胡连勤 this is not a common locking pattern that is the reason that
should be properly described.

> -       /* Queue all received data until the tty layer is ready for it. */
>         spin_lock(&port->port_lock);
> +       spin_unlock(&serial_port_lock);
> +
> +       /* Queue all received data until the tty layer is ready for it. */
>         list_add_tail(&req->list, &port->read_queue);
>         schedule_delayed_work(&port->push, 0);
> -       spin_unlock(&port->port_lock);
> +       spin_unlock_irqrestore(&port->port_lock, flags);   ---> Here we use spin_unlock_irqrestore to restore the state
>  }
>
> Thanks

Thank you
Prashanth K Aug. 23, 2024, 7:34 a.m. UTC | #2
On 23-08-24 12:28 pm, Michael Nazzareno Trimarchi wrote:
> Hi
> 
> On Fri, Aug 23, 2024 at 8:40 AM 胡连勤 <hulianqin@vivo.com> wrote:
>>
>> Hello linux community expert:
>>
>>>> Fixes: c1dca562be8a ("usb gadget: split out serial core")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
>>>> ---
>>>> v6:
>>>>   - Update the commit text
>>>>   - Add the Fixes tag
>>>>   - CC stable kernel
>>>>   - Add serial_port_lock protection when checking port pointer
>>>>   - Optimize code comments
>>>>   - Delete log printing
>>
>>> You need to list ALL of the versions here, I seem to have missed v4 and
>>> v5 somewhere so I don't know what changed there.
>>

[...]
>>> nested spinlocks, why?  Did you run this with lockdep enabled to verify you aren't hitting a different bug now?
>>  Because there is a competition relationship between this function and the gserial_disconnect function,
>>  the gserial_disconnect function first obtains serial_port_lock and then obtains port->port_lock.
>>  The purpose of nesting is to ensure that when gs_read_complete is executed, it can be successfully executed after obtaining serial_port_lock.
>>  gserial_disconnect(..)
>>  {
>>         struct gs_port  *port = gser->ioport;
>>         ...
>>         spin_lock_irqsave(&serial_port_lock, flags);
>>         spin_lock(&port->port_lock);
>>         ...
>>         gser->ioport = NULL;   ---> port = NULL;
>>         ...
>>         spin_unlock(&port->port_lock);
>>         spin_unlock_irqrestore(&serial_port_lock, flags);
>>  }
>>
>> After enabling the lockdep function (CONFIG_DEBUG_LOCK_ALLOC=y), there is no lockdep-related warning information.
>>
>>> And why is one irqsave and one not?  That feels odd, it might be right, but you need to document here why the difference.
>>  After the gs_read_complete function is executed, spin_unlock_irqrestore is used to restore the previous state,
> 
> 胡连勤 this is not a common locking pattern that is the reason that
> should be properly described.
This pattern was already used on gser_suspend/resume callbacks, this was
done because the lock was stored under port (and port itself was
becoming null), hence we added a static spinlock to mitigate it.
> 
>> -       /* Queue all received data until the tty layer is ready for it. */
>>         spin_lock(&port->port_lock);
>> +       spin_unlock(&serial_port_lock);
>> +
>> +       /* Queue all received data until the tty layer is ready for it. */
>>         list_add_tail(&req->list, &port->read_queue);
>>         schedule_delayed_work(&port->push, 0);
>> -       spin_unlock(&port->port_lock);
>> +       spin_unlock_irqrestore(&port->port_lock, flags);   ---> Here we use spin_unlock_irqrestore to restore the state
>>  }
>>
>> Thanks
> 
> Thank you
Prashanth K Aug. 23, 2024, 9:25 a.m. UTC | #3
On 23-08-24 01:27 pm, Michael Nazzareno Trimarchi wrote:
> Hi Prashanth
> 
> On Fri, Aug 23, 2024 at 9:34 AM Prashanth K <quic_prashk@quicinc.com> wrote:
>>
>>
>>
>> On 23-08-24 12:28 pm, Michael Nazzareno Trimarchi wrote:
>>> Hi
>>>
>>> On Fri, Aug 23, 2024 at 8:40 AM 胡连勤 <hulianqin@vivo.com> wrote:
>>>>
>>>> Hello linux community expert:
>>>>
>>>>>> Fixes: c1dca562be8a ("usb gadget: split out serial core")
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
>>>>>> ---
>>>>>> v6:
>>>>>>   - Update the commit text
>>>>>>   - Add the Fixes tag
>>>>>>   - CC stable kernel
>>>>>>   - Add serial_port_lock protection when checking port pointer
>>>>>>   - Optimize code comments
>>>>>>   - Delete log printing
>>>>
>>>>> You need to list ALL of the versions here, I seem to have missed v4 and
>>>>> v5 somewhere so I don't know what changed there.
>>>>
>>
>> [...]
>>>>> nested spinlocks, why?  Did you run this with lockdep enabled to verify you aren't hitting a different bug now?
>>>>  Because there is a competition relationship between this function and the gserial_disconnect function,
>>>>  the gserial_disconnect function first obtains serial_port_lock and then obtains port->port_lock.
>>>>  The purpose of nesting is to ensure that when gs_read_complete is executed, it can be successfully executed after obtaining serial_port_lock.
>>>>  gserial_disconnect(..)
>>>>  {
>>>>         struct gs_port  *port = gser->ioport;
>>>>         ...
>>>>         spin_lock_irqsave(&serial_port_lock, flags);
>>>>         spin_lock(&port->port_lock);
>>>>         ...
>>>>         gser->ioport = NULL;   ---> port = NULL;
>>>>         ...
>>>>         spin_unlock(&port->port_lock);
>>>>         spin_unlock_irqrestore(&serial_port_lock, flags);
>>>>  }
>>>>
>>>> After enabling the lockdep function (CONFIG_DEBUG_LOCK_ALLOC=y), there is no lockdep-related warning information.
>>>>
>>>>> And why is one irqsave and one not?  That feels odd, it might be right, but you need to document here why the difference.
>>>>  After the gs_read_complete function is executed, spin_unlock_irqrestore is used to restore the previous state,
>>>
>>> 胡连勤 this is not a common locking pattern that is the reason that
>>> should be properly described.
>> This pattern was already used on gser_suspend/resume callbacks, this was
>> done because the lock was stored under port (and port itself was
>> becoming null), hence we added a static spinlock to mitigate it.
>>>
> I see now, but why don't disable the endpoint before disconnecting?
> 
> /* disable endpoints, aborting down any active I/O */
> usb_ep_disable(gser->out);
> usb_ep_disable(gser->in);
> 
> Michael
> 
Not sure about this case, I think generally we need stop IO before
disabling EP, otherwise TX/RX functions may queue requests while EP is
getting disabled, thats why i think port is removed before ep_disable.
Moreover these callbacks (complete/suspend/resume etc) comes from UDC
and can be async, so better to use locks to prevent these kind of races.

Regards,
Prashanth K
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index b394105e55d6..e43d8065f7ec 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -452,20 +452,43 @@  static void gs_rx_push(struct work_struct *work)
 
 static void gs_read_complete(struct usb_ep *ep, struct usb_request *req)
 {
-	struct gs_port	*port = ep->driver_data;
+	struct gs_port	*port;
+	unsigned long  flags;
+
+	spin_lock_irqsave(&serial_port_lock, flags);
+	port = ep->driver_data;
+
+	/* When port is NULL, return to avoid panic. */
+	if (!port) {
+		spin_unlock_irqrestore(&serial_port_lock, flags);
+		return;
+	}
 
-	/* Queue all received data until the tty layer is ready for it. */
 	spin_lock(&port->port_lock);
+	spin_unlock(&serial_port_lock);
+
+	/* Queue all received data until the tty layer is ready for it. */
 	list_add_tail(&req->list, &port->read_queue);
 	schedule_delayed_work(&port->push, 0);
-	spin_unlock(&port->port_lock);
+	spin_unlock_irqrestore(&port->port_lock, flags);
 }
 
 static void gs_write_complete(struct usb_ep *ep, struct usb_request *req)
 {
-	struct gs_port	*port = ep->driver_data;
+	struct gs_port	*port;
+	unsigned long  flags;
+
+	spin_lock_irqsave(&serial_port_lock, flags);
+	port = ep->driver_data;
+
+	/* When port is NULL, return to avoid panic. */
+	if (!port) {
+		spin_unlock_irqrestore(&serial_port_lock, flags);
+		return;
+	}
 
 	spin_lock(&port->port_lock);
+	spin_unlock(&serial_port_lock);
 	list_add(&req->list, &port->write_pool);
 	port->write_started--;
 
@@ -486,7 +509,7 @@  static void gs_write_complete(struct usb_ep *ep, struct usb_request *req)
 		break;
 	}
 
-	spin_unlock(&port->port_lock);
+	spin_unlock_irqrestore(&port->port_lock, flags);
 }
 
 static void gs_free_requests(struct usb_ep *ep, struct list_head *head,