Message ID | 20210514071452.25220-1-kai.heng.feng@canonical.com |
---|---|
State | New |
Headers | show |
Series | [v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled | expand |
Hi Kai-Heng, Marcel, Marcel Holtmann <marcel@holtmann.org> writes: > Hi Kai-Heng, > >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it >> stops working: >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> [ 545.660350] usb 3-10: device descriptor read/64, error -110 >> [ 561.283530] usb 3-10: device descriptor read/64, error -110 >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> >> Or kernel panics because other workqueues already freed skb: >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 >> [ 2048.663775] #PF: supervisor read access in kernel mode >> [ 2048.663779] #PF: error_code(0x0000) - not-present page >> [ 2048.663782] PGD 0 P4D 0 >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 >> 07 83 f8 01 >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 >> [ 2048.663856] PKRU: 55555554 >> [ 2048.663859] Call Trace: >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80 >> [ 2048.663873] kfree_skb+0x2f/0xb0 >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] >> [ 2048.663954] ? __cond_resched+0x1a/0x50 >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] >> [ 2048.664007] rfkill_set_block+0x98/0x170 >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0 >> [ 2048.664022] vfs_write+0xc7/0x260 >> [ 2048.664030] ksys_write+0xb1/0xe0 >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 >> [ 2048.664042] __x64_sys_write+0x1a/0x20 >> [ 2048.664048] do_syscall_64+0x40/0xb0 >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27 >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff >> >> So move the shutdown callback to a place where workqueues are either >> flushed or cancelled to resolve the issue. >> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> >> --- >> v2: >> - Rebased on bluetooth-next. >> >> net/bluetooth/hci_core.c | 16 ++++++++-------- >> 1 file changed, 8 insertions(+), 8 deletions(-) > > patch has been applied to bluetooth-next tree. This patch seems to introduce a regression in the btmtksdio driver. With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin: i500-pumpkin login: root root@i500-pumpkin:~# uname -a Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux root@i500-pumpkin:~# hciconfig hci0 up Can't init device hci0: Connection timed out (110) root@i500-pumpkin:~# hciconfig hci0 down root@i500-pumpkin:~# hciconfig hci0 up Can't init device hci0: Input/output error (5) Reverting it fixes the above issue. Any suggestion on how to fix this without touching hci_core ? Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips. Thanks, Mattijs Korpershoek > > Regards > > Marcel
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, Marcel, >> >> Marcel Holtmann <marcel@holtmann.org> writes: >> >> > Hi Kai-Heng, >> > >> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it >> >> stops working: >> >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed >> >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on >> >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> >> [ 545.660350] usb 3-10: device descriptor read/64, error -110 >> >> [ 561.283530] usb 3-10: device descriptor read/64, error -110 >> >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 >> >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) >> >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) >> >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> >> >> >> Or kernel panics because other workqueues already freed skb: >> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 >> >> [ 2048.663775] #PF: supervisor read access in kernel mode >> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page >> >> [ 2048.663782] PGD 0 P4D 0 >> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI >> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 >> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 >> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 >> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 >> >> 07 83 f8 01 >> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 >> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 >> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 >> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 >> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 >> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 >> >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 >> >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 >> >> [ 2048.663856] PKRU: 55555554 >> >> [ 2048.663859] Call Trace: >> >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80 >> >> [ 2048.663873] kfree_skb+0x2f/0xb0 >> >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] >> >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] >> >> [ 2048.663954] ? __cond_resched+0x1a/0x50 >> >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] >> >> [ 2048.664007] rfkill_set_block+0x98/0x170 >> >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0 >> >> [ 2048.664022] vfs_write+0xc7/0x260 >> >> [ 2048.664030] ksys_write+0xb1/0xe0 >> >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 >> >> [ 2048.664042] __x64_sys_write+0x1a/0x20 >> >> [ 2048.664048] do_syscall_64+0x40/0xb0 >> >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27 >> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 >> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 >> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 >> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 >> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 >> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 >> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff >> >> >> >> So move the shutdown callback to a place where workqueues are either >> >> flushed or cancelled to resolve the issue. >> >> >> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> >> >> --- >> >> v2: >> >> - Rebased on bluetooth-next. >> >> >> >> net/bluetooth/hci_core.c | 16 ++++++++-------- >> >> 1 file changed, 8 insertions(+), 8 deletions(-) >> > >> > patch has been applied to bluetooth-next tree. >> >> This patch seems to introduce a regression in the btmtksdio driver. >> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin: >> >> i500-pumpkin login: root >> root@i500-pumpkin:~# uname -a >> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Connection timed out (110) >> root@i500-pumpkin:~# hciconfig hci0 down >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Input/output error (5) >> >> Reverting it fixes the above issue. >> Any suggestion on how to fix this without touching hci_core ? >> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips. > > Can you please attach dmesg? Also, full ftrace log on btmtksdio can > also be helpful. Thank you for your help. Sorry I did not post the logs previously. dmesg: https://pastebin.com/tpWDNyQr ftrace on btmtksdio: https://pastebin.com/jmhvmwUw Mattijs > > Kai-Heng > >> >> Thanks, >> Mattijs Korpershoek >> >> >> > >> > Regards >> > >> > Marcel
Hi Mattijs, On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, [snipped] > Thank you for your help. Sorry I did not post the logs previously. > > dmesg: https://pastebin.com/tpWDNyQr > ftrace on btmtksdio: https://pastebin.com/jmhvmwUw Seems like btmtksdio needs shudown() to be called before flush(). Since the order was there for a very long time, changing the calling order indeed can break what driver expects. Can you please test the following patch: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d..a61e610a400c 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) aosp_do_close(hdev); msft_do_close(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (hdev->flush) hdev->flush(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); Kai-Heng > > Mattijs > > > > Kai-Heng > > > >> > >> Thanks, > >> Mattijs Korpershoek > >> > >> > >> > > >> > Regards > >> > > >> > Marcel
On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote: > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > > stops working: > > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > [ 545.660350] usb 3-10: device descriptor read/64, error -110 > > [ 561.283530] usb 3-10: device descriptor read/64, error -110 > > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > > Or kernel panics because other workqueues already freed skb: > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > > [ 2048.663775] #PF: supervisor read access in kernel mode > > [ 2048.663779] #PF: error_code(0x0000) - not-present page > > [ 2048.663782] PGD 0 P4D 0 > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > > 07 83 f8 01 > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > > [ 2048.663856] PKRU: 55555554 > > [ 2048.663859] Call Trace: > > [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > > [ 2048.663873] kfree_skb+0x2f/0xb0 > > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > > [ 2048.663954] ? __cond_resched+0x1a/0x50 > > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > > [ 2048.664007] rfkill_set_block+0x98/0x170 > > [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > > [ 2048.664022] vfs_write+0xc7/0x260 > > [ 2048.664030] ksys_write+0xb1/0xe0 > > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > > [ 2048.664042] __x64_sys_write+0x1a/0x20 > > [ 2048.664048] do_syscall_64+0x40/0xb0 > > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > > > > So move the shutdown callback to a place where workqueues are either > > flushed or cancelled to resolve the issue. > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > v2: > > - Rebased on bluetooth-next. > > > > net/bluetooth/hci_core.c | 16 ++++++++-------- > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index 7baf93eda936..6eedf334f943 100644 > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > > BT_DBG("%s %p", hdev->name, hdev); > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - test_bit(HCI_UP, &hdev->flags)) { > > - /* Execute vendor specific shutdown routine */ > > - if (hdev->shutdown) > > - hdev->shutdown(hdev); > > - } > > - > > cancel_delayed_work(&hdev->power_off); > > cancel_delayed_work(&hdev->ncmd_timer); > > > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > clear_bit(HCI_INIT, &hdev->flags); > > } > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > + test_bit(HCI_UP, &hdev->flags)) { > > + /* Execute vendor specific shutdown routine */ > > + if (hdev->shutdown) > > + hdev->shutdown(hdev); > > + } > > + > > /* flush cmd work */ > > flush_work(&hdev->cmd_work); > > > > -- > > 2.30.2 > > btusb_shutdown_intel_new > __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT); > __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout); > hci_req_run_skb(&req, hci_req_sync_complete); > > hci_req_sync_complete > if (skb) > hdev->req_skb = skb_get(skb); > > Given the skb_get in hci_req_sync_complete makes it safe to free skb on > driver side, I doubt this patch is the correct fix as it is. Some workqueues are still active. The shutdown() should be called at least after hci_request_cancel_all(). Kai-Heng
On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote: > > On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote: > >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > >> > >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on > >> driver side, I doubt this patch is the correct fix as it is. > > > >Some workqueues are still active. > >The shutdown() should be called at least after hci_request_cancel_all(). > > What is muddy then is how active workqueues prevent skb_get from protecting > kfree_skb. Can you spot what workqueue it is? I managed to reproduce the issue with another kernel splat: ------------[ cut here ]------------ kernel BUG at mm/slub.c:321! invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16 Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver. 01.03.04 01/07/2021 Workqueue: hci0 discov_update [bluetooth] RIP: 0010:__slab_free+0x20c/0x3a0 Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24 1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b 49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24 RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246 RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600 RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500 RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80 R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600 FS: 0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: ? psi_task_switch+0xc3/0x1e0 ? __switch_to_asm+0x36/0x70 ? skb_free_head+0x67/0x80 kmem_cache_free+0x370/0x3d0 ? kfree_skbmem+0x4e/0x90 kfree_skbmem+0x4e/0x90 kfree_skb+0x47/0xb0 __hci_req_sync+0x134/0x2a0 [bluetooth] ? wait_woken+0x70/0x70 discov_update+0x2ae/0x310 [bluetooth] process_one_work+0x21d/0x3c0 worker_thread+0x53/0x420 ? process_one_work+0x3c0/0x3c0 kthread+0x127/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x30 Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_sof_intel_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_soc_core snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq i915 snd_seq_device snd_timer hp_wmi intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp i2c_algo_bit coretemp joydev kvm_intel ttm mei_hdcp intel_rapl_msr platform_profile wmi_bmof kvm uvcvideo crct10dif_pclmul btusb videobuf2_vmalloc videobuf2_memops drm_kms_helper btrtl videobuf2_v4l2 crc32_pclmul btbcm ghash_clmulni_intel input_leds videobuf2_common btintel snd videodev syscopyarea sysfillrect sysimgblt aesni_intel mc serio_raw fb_sys_fops bluetooth crypto_simd cec cryptd intel_cstate ecdh_generic efi_pstore ecc rc_core hid_multitouch processor_thermal_device_pci_legacy mei_me intel_soc_dts_iosf processor_thermal_device mei soundcore processor_thermal_rfim ee1004 processor_thermal_mbox processor_thermal_rapl intel_pmt_telemetry intel_rapl_common intel_pmt_class ucsi_acpi typec_ucsi typec wmi soc_button_array int3403_thermal int340x_thermal_zone video int3400_thermal acpi_thermal_rel acpi_pad mac_hid intel_hid sparse_keymap sch_fq_codel msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4 hid_generic nvme nvme_core intel_lpss_pci e1000e intel_lpss i2c_i801 idma64 i2c_smbus xhci_pci xhci_pci_renesas vmd intel_pmt i2c_hid_acpi i2c_hid hid pinctrl_tigerlake ---[ end trace c09445d4697039ed ]--- So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update) and can prevent the race from happening. And the kernel splat is just one symptom of the issue, most of the time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in dmesg. Kai-Heng
On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, > > Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > > > Hi Mattijs, > > > > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek > > <mkorpershoek@baylibre.com> wrote: > >> > >> Hi Kai-Heng, > > > > [snipped] > > > >> Thank you for your help. Sorry I did not post the logs previously. > >> > >> dmesg: https://pastebin.com/tpWDNyQr > >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw > > > > Seems like btmtksdio needs shudown() to be called before flush(). > > Since the order was there for a very long time, changing the calling > > order indeed can break what driver expects. > > Can you please test the following patch: > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index 2560ed2f144d..a61e610a400c 100644 > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > aosp_do_close(hdev); > > msft_do_close(hdev); > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > + test_bit(HCI_UP, &hdev->flags)) { > > + /* Execute vendor specific shutdown routine */ > > + if (hdev->shutdown) > > + hdev->shutdown(hdev); > > + } > > + > > if (hdev->flush) > > hdev->flush(hdev); > > > > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > clear_bit(HCI_INIT, &hdev->flags); > > } > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - test_bit(HCI_UP, &hdev->flags)) { > > - /* Execute vendor specific shutdown routine */ > > - if (hdev->shutdown) > > - hdev->shutdown(hdev); > > - } > > - > > /* flush cmd work */ > > flush_work(&hdev->cmd_work); > > Thanks for the patch and your help. > I've tried it, but it seems that it does not improve for me. > I'm still observing: > > i500-pumpkin login: root > root@i500-pumpkin:~# hciconfig hci0 up > Can't init device hci0: Connection timed out (110) > > Logs for this session: > dmesg: https://pastebin.com/iAFk5Tzi > ftrace: https://pastebin.com/kEMWSYrE Thanks for the testing! What about moving the shutdown() part right after hci_req_sync_lock() so tx/rx can still work: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d4..be3113fb7d4b0 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev) hci_request_cancel_all(hdev); hci_req_sync_lock(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (!test_and_clear_bit(HCI_UP, &hdev->flags)) { cancel_delayed_work_sync(&hdev->cmd_timer); hci_req_sync_unlock(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); > > > > > > Kai-Heng > > > >> > >> Mattijs > >> > > >> > Kai-Heng > >> > > >> >> > >> >> Thanks, > >> >> Mattijs Korpershoek > >> >> > >> >> > >> >> > > >> >> > Regards > >> >> > > >> >> > Marcel
On Tue, Aug 3, 2021 at 2:45 PM Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > > > > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote: > > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > > > stops working: > > > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > > > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > > > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > > [ 545.660350] usb 3-10: device descriptor read/64, error -110 > > > [ 561.283530] usb 3-10: device descriptor read/64, error -110 > > > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > > > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > > > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > > > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > > > > Or kernel panics because other workqueues already freed skb: > > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > > > [ 2048.663775] #PF: supervisor read access in kernel mode > > > [ 2048.663779] #PF: error_code(0x0000) - not-present page > > > [ 2048.663782] PGD 0 P4D 0 > > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > > > 07 83 f8 01 > > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > > > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > > > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > > > [ 2048.663856] PKRU: 55555554 > > > [ 2048.663859] Call Trace: > > > [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > > > [ 2048.663873] kfree_skb+0x2f/0xb0 > > > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > > > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > > > [ 2048.663954] ? __cond_resched+0x1a/0x50 > > > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > > > [ 2048.664007] rfkill_set_block+0x98/0x170 > > > [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > > > [ 2048.664022] vfs_write+0xc7/0x260 > > > [ 2048.664030] ksys_write+0xb1/0xe0 > > > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > > > [ 2048.664042] __x64_sys_write+0x1a/0x20 > > > [ 2048.664048] do_syscall_64+0x40/0xb0 > > > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > > > > > > So move the shutdown callback to a place where workqueues are either > > > flushed or cancelled to resolve the issue. > > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > > --- > > > v2: > > > - Rebased on bluetooth-next. > > > > > > net/bluetooth/hci_core.c | 16 ++++++++-------- > > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > > index 7baf93eda936..6eedf334f943 100644 > > > --- a/net/bluetooth/hci_core.c > > > +++ b/net/bluetooth/hci_core.c > > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > > > > BT_DBG("%s %p", hdev->name, hdev); > > > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > - test_bit(HCI_UP, &hdev->flags)) { > > > - /* Execute vendor specific shutdown routine */ > > > - if (hdev->shutdown) > > > - hdev->shutdown(hdev); > > > - } > > > - > > > cancel_delayed_work(&hdev->power_off); > > > cancel_delayed_work(&hdev->ncmd_timer); > > > > > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > clear_bit(HCI_INIT, &hdev->flags); > > > } > > > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > + test_bit(HCI_UP, &hdev->flags)) { > > > + /* Execute vendor specific shutdown routine */ > > > + if (hdev->shutdown) > > > + hdev->shutdown(hdev); > > > + } > > > + > > > /* flush cmd work */ > > > flush_work(&hdev->cmd_work); > > > > > > -- > > > 2.30.2 > > > > btusb_shutdown_intel_new > > __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT); > > __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout); > > hci_req_run_skb(&req, hci_req_sync_complete); > > > > hci_req_sync_complete > > if (skb) > > hdev->req_skb = skb_get(skb); > > > > Given the skb_get in hci_req_sync_complete makes it safe to free skb on > > driver side, I doubt this patch is the correct fix as it is. > > Some workqueues are still active. > The shutdown() should be called at least after hci_request_cancel_all(). > Hello, The original patch 60789afc02f592b8d91217b60930e7a76271ae07 ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") is causing the tx fail when downloading fw on mediatek mt8183 device using QCA bluetooth: [ 225.205061] Bluetooth: qca_download_firmware() hci0: QCA Downloading qca/rampatch_00440302.bin [ 227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout ... follows by a lot of: [ 223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) [ 223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) After applying the fixup to allow tx, the issue is solved. > Kai-Heng
Hi Kai-Heng, Thanks for your patch, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, >> >> Kai-Heng Feng <kai.heng.feng@canonical.com> writes: >> >> > Hi Mattijs, >> > >> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek >> > <mkorpershoek@baylibre.com> wrote: >> >> >> >> Hi Kai-Heng, >> > >> > [snipped] >> > >> >> Thank you for your help. Sorry I did not post the logs previously. >> >> >> >> dmesg: https://pastebin.com/tpWDNyQr >> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw >> > >> > Seems like btmtksdio needs shudown() to be called before flush(). >> > Since the order was there for a very long time, changing the calling >> > order indeed can break what driver expects. >> > Can you please test the following patch: >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> > index 2560ed2f144d..a61e610a400c 100644 >> > --- a/net/bluetooth/hci_core.c >> > +++ b/net/bluetooth/hci_core.c >> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) >> > aosp_do_close(hdev); >> > msft_do_close(hdev); >> > >> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && >> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && >> > + test_bit(HCI_UP, &hdev->flags)) { >> > + /* Execute vendor specific shutdown routine */ >> > + if (hdev->shutdown) >> > + hdev->shutdown(hdev); >> > + } >> > + >> > if (hdev->flush) >> > hdev->flush(hdev); >> > >> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) >> > clear_bit(HCI_INIT, &hdev->flags); >> > } >> > >> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && >> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && >> > - test_bit(HCI_UP, &hdev->flags)) { >> > - /* Execute vendor specific shutdown routine */ >> > - if (hdev->shutdown) >> > - hdev->shutdown(hdev); >> > - } >> > - >> > /* flush cmd work */ >> > flush_work(&hdev->cmd_work); >> >> Thanks for the patch and your help. >> I've tried it, but it seems that it does not improve for me. >> I'm still observing: >> >> i500-pumpkin login: root >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Connection timed out (110) >> >> Logs for this session: >> dmesg: https://pastebin.com/iAFk5Tzi >> ftrace: https://pastebin.com/kEMWSYrE > > Thanks for the testing! > What about moving the shutdown() part right after hci_req_sync_lock() > so tx/rx can still work: > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index 2560ed2f144d4..be3113fb7d4b0 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > hci_request_cancel_all(hdev); > hci_req_sync_lock(hdev); > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > + test_bit(HCI_UP, &hdev->flags)) { > + /* Execute vendor specific shutdown routine */ > + if (hdev->shutdown) > + hdev->shutdown(hdev); > + } > + > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) { > cancel_delayed_work_sync(&hdev->cmd_timer); > hci_req_sync_unlock(hdev); > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > clear_bit(HCI_INIT, &hdev->flags); > } > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > - test_bit(HCI_UP, &hdev->flags)) { > - /* Execute vendor specific shutdown routine */ > - if (hdev->shutdown) > - hdev->shutdown(hdev); > - } > - > /* flush cmd work */ > flush_work(&hdev->cmd_work); I confirm this diff works for me: root@i500-pumpkin:~# hciconfig hci0 up root@i500-pumpkin:~# hciconfig hci0 down root@i500-pumpkin:~# hciconfig hci0 up root@i500-pumpkin:~# hciconfig hci0 hci0: Type: Primary Bus: SDIO BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 UP RUNNING RX bytes:11268 acl:0 sco:0 events:829 errors:0 TX bytes:182569 acl:0 sco:0 commands:829 errors:0 root@i500-pumpkin:~# hcitool scan Scanning ... <redacted> Pixel 3 XL Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> > > > > > >> >> >> > >> > Kai-Heng >> > >> >> >> >> Mattijs >> >> > >> >> > Kai-Heng >> >> > >> >> >> >> >> >> Thanks, >> >> >> Mattijs Korpershoek >> >> >> >> >> >> >> >> >> > >> >> >> > Regards >> >> >> > >> >> >> > Marcel
Hi Mattijs, On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, > > Thanks for your patch, > > Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > [snipped] > I confirm this diff works for me: > > root@i500-pumpkin:~# hciconfig hci0 up > root@i500-pumpkin:~# hciconfig hci0 down > root@i500-pumpkin:~# hciconfig hci0 up > root@i500-pumpkin:~# hciconfig hci0 > hci0: Type: Primary Bus: SDIO > BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 > UP RUNNING > RX bytes:11268 acl:0 sco:0 events:829 errors:0 > TX bytes:182569 acl:0 sco:0 commands:829 errors:0 > > root@i500-pumpkin:~# hcitool scan > Scanning ... > <redacted> Pixel 3 XL > > Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> I found that btmtksdio_flush() only cancels the work instead of doing flush_work(). That probably explains why putting ->shutdown right before ->flush doesn't work. So can you please test the following again: diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c index 9872ef18f9fea..b33c05ad2150b 100644 --- a/drivers/bluetooth/btmtksdio.c +++ b/drivers/bluetooth/btmtksdio.c @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev) { struct btmtksdio_dev *bdev = hci_get_drvdata(hdev); - skb_queue_purge(&bdev->txq); + flush_work(&bdev->tx_work); - cancel_work_sync(&bdev->tx_work); + skb_queue_purge(&bdev->txq); return 0; } diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d4..a61e610a400cb 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) aosp_do_close(hdev); msft_do_close(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (hdev->flush) hdev->flush(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); Kai-Heng
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > Hi Mattijs, > >> [snipped] > > Thanks for your testing. I think I finally got it: > btmtksdio_shutdown() > -> mtk_hci_wmt_sync() > -> __hci_cmd_send() > then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in > btmtksdio_recv_event(): > btmtksdio_recv_event() > -> hci_recv_frame() > -> queue_work(hdev->workqueue, &hdev->rx_work); > > That means it has to be done before the following drain_workqueue() call. > Can you please see if moving the ->shutdown() part right before > drain_workqueue() can fix the issue? I've tested the following patch: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d..131e69a9a66a 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1757,6 +1757,14 @@ int hci_dev_do_close(struct hci_dev *hdev) cancel_delayed_work_sync(&adv_instance->rpa_expired_cb); } + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + /* Avoid potential lockdep warnings from the *_flush() calls by * ensuring the workqueue is empty up front. */ @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); It does not seem to fix the issue. Adding the bits in btmtksdio_flush() does not change the result of the above patch. Here are the logs. These are just with the above patch (not with the btmtksdio_flush() changes. dmesg: https://pastebin.com/FZZBkqGC btmtksdio_ftrace: https://pastebin.com/JQ0UWenY Mattijs > > Kai-Heng > >> >> Mattijs >> >> > >> > Kai-Heng
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 7baf93eda936..6eedf334f943 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) BT_DBG("%s %p", hdev->name, hdev); - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - cancel_delayed_work(&hdev->power_off); cancel_delayed_work(&hdev->ncmd_timer); @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + /* flush cmd work */ flush_work(&hdev->cmd_work);
Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it stops working: [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd [ 545.660350] usb 3-10: device descriptor read/64, error -110 [ 561.283530] usb 3-10: device descriptor read/64, error -110 [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) Or kernel panics because other workqueues already freed skb: [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2048.663775] #PF: supervisor read access in kernel mode [ 2048.663779] #PF: error_code(0x0000) - not-present page [ 2048.663782] PGD 0 P4D 0 [ 2048.663787] Oops: 0000 [#1] SMP NOPTI [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 07 83 f8 01 [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 [ 2048.663856] PKRU: 55555554 [ 2048.663859] Call Trace: [ 2048.663865] ? skb_release_head_state+0x5e/0x80 [ 2048.663873] kfree_skb+0x2f/0xb0 [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] [ 2048.663954] ? __cond_resched+0x1a/0x50 [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] [ 2048.664007] rfkill_set_block+0x98/0x170 [ 2048.664016] rfkill_fop_write+0x136/0x1e0 [ 2048.664022] vfs_write+0xc7/0x260 [ 2048.664030] ksys_write+0xb1/0xe0 [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 [ 2048.664042] __x64_sys_write+0x1a/0x20 [ 2048.664048] do_syscall_64+0x40/0xb0 [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2048.664060] RIP: 0033:0x7fe02ac23c27 [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff So move the shutdown callback to a place where workqueues are either flushed or cancelled to resolve the issue. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- v2: - Rebased on bluetooth-next. net/bluetooth/hci_core.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)