diff mbox series

[v2] Bluetooth: fix use-after-free in device_for_each_child()

Message ID 20241101114410.234311-1-dmantipov@yandex.ru
State New
Headers show
Series [v2] Bluetooth: fix use-after-free in device_for_each_child() | expand

Commit Message

Dmitry Antipov Nov. 1, 2024, 11:44 a.m. UTC
Syzbot has reported the following KASAN splat:

BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0
Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980

CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x100/0x190
 ? device_for_each_child+0x18f/0x1a0
 print_report+0x13a/0x4cb
 ? __virt_addr_valid+0x5e/0x590
 ? __phys_addr+0xc6/0x150
 ? device_for_each_child+0x18f/0x1a0
 kasan_report+0xda/0x110
 ? device_for_each_child+0x18f/0x1a0
 ? __pfx_dev_memalloc_noio+0x10/0x10
 device_for_each_child+0x18f/0x1a0
 ? __pfx_device_for_each_child+0x10/0x10
 pm_runtime_set_memalloc_noio+0xf2/0x180
 netdev_unregister_kobject+0x1ed/0x270
 unregister_netdevice_many_notify+0x123c/0x1d80
 ? __mutex_trylock_common+0xde/0x250
 ? __pfx_unregister_netdevice_many_notify+0x10/0x10
 ? trace_contention_end+0xe6/0x140
 ? __mutex_lock+0x4e7/0x8f0
 ? __pfx_lock_acquire.part.0+0x10/0x10
 ? rcu_is_watching+0x12/0xc0
 ? unregister_netdev+0x12/0x30
 unregister_netdevice_queue+0x30d/0x3f0
 ? __pfx_unregister_netdevice_queue+0x10/0x10
 ? __pfx_down_write+0x10/0x10
 unregister_netdev+0x1c/0x30
 bnep_session+0x1fb3/0x2ab0
 ? __pfx_bnep_session+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_woken_wake_function+0x10/0x10
 ? __kthread_parkme+0x132/0x200
 ? __pfx_bnep_session+0x10/0x10
 ? kthread+0x13a/0x370
 ? __pfx_bnep_session+0x10/0x10
 kthread+0x2b7/0x370
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x48/0x80
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 4974:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0xaa/0xb0
 __kmalloc_noprof+0x1d1/0x440
 hci_alloc_dev_priv+0x1d/0x2820
 __vhci_create_device+0xef/0x7d0
 vhci_write+0x2c7/0x480
 vfs_write+0x6a0/0xfc0
 ksys_write+0x12f/0x260
 do_syscall_64+0xc7/0x250
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 4979:
 kasan_save_stack+0x30/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x4f/0x70
 kfree+0x141/0x490
 hci_release_dev+0x4d9/0x600
 bt_host_release+0x6a/0xb0
 device_release+0xa4/0x240
 kobject_put+0x1ec/0x5a0
 put_device+0x1f/0x30
 vhci_release+0x81/0xf0
 __fput+0x3f6/0xb30
 task_work_run+0x151/0x250
 do_exit+0xa79/0x2c30
 do_group_exit+0xd5/0x2a0
 get_signal+0x1fcd/0x2210
 arch_do_signal_or_restart+0x93/0x780
 syscall_exit_to_user_mode+0x140/0x290
 do_syscall_64+0xd4/0x250
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when
an underlying (kobject) reference counter is greater than 1. This
means that reparenting (happened when the device is actually freed)
is delayed and, during that delay, parent controller device (hciX)
may be deleted. Since the latter may create a dangling pointer to
freed parent, avoid that scenario by reparenting to NULL explicitly.

Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f
Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
v2: reparent per-connection 'struct device' explicitly
---
 net/bluetooth/hci_sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Luiz Augusto von Dentz Nov. 4, 2024, 2:06 p.m. UTC | #1
Hi Dmitry,

On Sat, Nov 2, 2024 at 5:51 AM Dmitry Antipov <dmantipov@yandex.ru> wrote:
>
> On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote:
>
> > Looks like this doesn't solve the problem, in fact I think you are
> > getting it backwards, you are trying to reparent the parent dev not
> > the child and I assume by destroying the parent device there should be
> > some way to reset the parent which seems to be the intent the
> > following code in hci_conn_del_sysfs:
> >
> >      while (1) {
> >          struct device *dev;
> >
> >          dev = device_find_child(&conn->dev, NULL, __match_tty);
> >          if (!dev)
> >              break;
> >          device_move(dev, NULL, DPM_ORDER_DEV_LAST);
> >          put_device(dev);
> >      }
> >
> > But note that it only does that after matching tty, but I guess we
> > want to do it regardless otherwise we may have the child objects still
> > access it, that said we should probably use device_for_each_child
> > though if that is safe to do calls to device_move under its callback.
>
> I'm not sure that I've got your point. IIUC the problem is that a controller
> (parent) instance may be freed _after_ the child (connection) has passed
> 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev'
> from the devices hierarchy, thus leaving the dangling 'conn->dev.parent'
> pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly.
> And handling children of 'conn->dev' (i.e. the grandchilren of the controller)
> is out of this problem's scope at all.
>
> And nothing to say about syzbot's innards but manual testing shows that the
> following thing:
>
> void hci_conn_del_sysfs(struct hci_conn *conn)
> {
>          struct hci_dev *hdev = conn->hdev;
>
>          bt_dev_dbg(hdev, "conn %p", conn);
>
>          if (!device_is_registered(&conn->dev)) {
>                  /* If device_add() has *not* succeeded, use *only* put_device()
>                   * to drop the reference count.
>                   */
>                  put_device(&conn->dev);
>                  return;
>          }
>
>          while (1) {
>                  struct device *dev;
>
>                  dev = device_find_any_child(&conn->dev);
>                  if (!dev)
>                          break;
>                  printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n",
>                         __FILE__, __LINE__, dev, dev_name(dev), dev->parent,
>                         (dev->parent ? dev_name(dev->parent) : "<none>"));
>                  device_move(dev, NULL, DPM_ORDER_DEV_LAST);
>                  put_device(dev);
>          }
>
>          device_unregister(&conn->dev);
> }
>
> occasionally triggers the following crash:
>
> net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200)
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI
> KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f]
> CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
> RIP: 0010:klist_put+0x4d/0x1d0
> Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49
> RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000
> RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058
> RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c
> R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805
> FS:  00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0
> Call Trace:
>   <TASK>
>   ? die_addr.cold+0x8/0xd
>   ? exc_general_protection+0x147/0x240
>   ? asm_exc_general_protection+0x26/0x30
>   ? klist_remove+0x155/0x2b0
>   ? klist_put+0x15/0x1d0
>   ? klist_put+0x4d/0x1d0
>   klist_remove+0x15a/0x2b0
>   ? __pfx_klist_remove+0x10/0x10
>   device_move+0x12d/0x10b0
>   hci_conn_del_sysfs.cold+0xcf/0x14a
>   hci_conn_del+0x467/0xd60
>   hci_conn_hash_flush+0x18f/0x270
>   hci_dev_close_sync+0x549/0x1260
>   hci_dev_do_close+0x2e/0x90
>   hci_unregister_dev+0x213/0x630
>   vhci_release+0x79/0xf0
>   ? __pfx_vhci_release+0x10/0x10
>   __fput+0x3f6/0xb30
>   task_work_run+0x151/0x250
>   ? __pfx_task_work_run+0x10/0x10
>   do_exit+0xa79/0x2c30
>   ? do_raw_spin_lock+0x12a/0x2b0
>   ? __pfx_do_exit+0x10/0x10
>   do_group_exit+0xd5/0x2a0
>   __x64_sys_exit_group+0x3e/0x50
>   x64_sys_call+0x14af/0x14b0
>   do_syscall_64+0xc7/0x250
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Weird, this was not reproduced by syzbot when I asked it to test, how
are you reproducing this? This looks like to be a problem with klist
being corrupted somehow, anyway if we can't unparent the objects we
need a way to unregister the child without waiting for the bnep thread
to clean it up.

> Dmitry
Luiz Augusto von Dentz Nov. 4, 2024, 4:04 p.m. UTC | #2
Hi,

On Mon, Nov 4, 2024 at 9:58 AM Dmitry Antipov <dmantipov@yandex.ru> wrote:
>
> On 11/4/24 5:06 PM, Luiz Augusto von Dentz wrote:
>
> > Weird, this was not reproduced by syzbot when I asked it to test, how
> > are you reproducing this?
>
> This is not regular, looks like a race, and I'm not sure how many runs
> are required to reproduce. Anyway we can't blame syzbot just because
> it was unable to reproduce some weird thing.

But how you reproducing, or you just running syzbot test over and over
locally? Anyway I will look into how we can actually unregister the
likes of bnep net_dev in a more direct manner rather than doing via
bnep kthread.

> Dmitry
>
diff mbox series

Patch

diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
index 367e32fe30eb..80ac537fa500 100644
--- a/net/bluetooth/hci_sysfs.c
+++ b/net/bluetooth/hci_sysfs.c
@@ -73,6 +73,8 @@  void hci_conn_del_sysfs(struct hci_conn *conn)
 		return;
 	}
 
+	device_move(&conn->dev, NULL, DPM_ORDER_DEV_LAST);
+
 	while (1) {
 		struct device *dev;