Message ID | 20220517103436.15867-1-johan+linaro@kernel.org |
---|---|
State | New |
Series | [RFC] ath11k: fix netdev open race |
On Tue, May 17, 2022 at 12:34:36PM +0200, Johan Hovold wrote:
> Make sure to allocate resources needed before registering the device.
>
> This specifically avoids having a racing open() trigger a BUG_ON() in
> mod_timer() when ath11k_mac_op_start() is called before the
> mon_reap_timer has been set up.
>
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---

For completeness:

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Johan
Johan Hovold <johan@kernel.org> writes:

> On Tue, May 17, 2022 at 12:34:36PM +0200, Johan Hovold wrote:
>> Make sure to allocate resources needed before registering the device.
>>
>> This specifically avoids having a racing open() trigger a BUG_ON() in
>> mod_timer() when ath11k_mac_op_start() is called before the
>> mon_reap_timer has been set up.
>>
>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>> Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
>> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
>> ---
>
> For completeness:
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Thanks, added in the pending branch.

You submitted this as RFC but do you mind if I apply this anyway? The
patch looks good and passes my tests. But I do wonder why I haven't seen
the crash...
On Mon, May 23, 2022 at 10:06:37PM +0300, Kalle Valo wrote:
> Johan Hovold <johan@kernel.org> writes:
>
> > On Tue, May 17, 2022 at 12:34:36PM +0200, Johan Hovold wrote:
> >> Make sure to allocate resources needed before registering the device.
> >>
> >> This specifically avoids having a racing open() trigger a BUG_ON() in
> >> mod_timer() when ath11k_mac_op_start() is called before the
> >> mon_reap_timer has been set up.
> >>
> >> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> >> Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
> >> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> >> ---
> >
> > For completeness:
> >
> > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>
> Thanks, added in the pending branch.
>
> You submitted this as RFC but do you mind if I apply this anyway? The
> patch looks good and passes my tests. But I do wonder why I haven't seen
> the crash...

If it looks good to you then please do apply it.

I was just worried that there may be some subtle reason for why
ath11k_dp_pdev_alloc() was called after netdev registration in the first
place and that it might need to be split up so that for example
ath11k_dp_rx_pdev_mon_attach() isn't called until after registration.

I did not see this issue with next-20220310, but I hit it on every probe
with next-20220511. Perhaps some timing changed in between.

Here's the backtrace for completeness in case someone else starts hitting
this and searches the archives:

[ 51.346947] kernel BUG at kernel/time/timer.c:990!
[ 51.346958] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
...
[ 51.578225] Call trace:
[ 51.583293] __mod_timer+0x298/0x390
[ 51.589518] mod_timer+0x14/0x20
[ 51.595368] ath11k_mac_op_start+0x41c/0x4a0 [ath11k]
[ 51.603165] drv_start+0x38/0x60 [mac80211]
[ 51.610110] ieee80211_do_open+0x29c/0x7d0 [mac80211]
[ 51.617945] ieee80211_open+0x60/0xb0 [mac80211]
[ 51.625311] __dev_open+0x100/0x1c0
[ 51.631420] __dev_change_flags+0x194/0x210
[ 51.638214] dev_change_flags+0x24/0x70
[ 51.644646] do_setlink+0x228/0xdb0
[ 51.650723] __rtnl_newlink+0x460/0x830
[ 51.657162] rtnl_newlink+0x4c/0x80
[ 51.663229] rtnetlink_rcv_msg+0x124/0x390
[ 51.669917] netlink_rcv_skb+0x58/0x130
[ 51.676314] rtnetlink_rcv+0x18/0x30
[ 51.682460] netlink_unicast+0x250/0x310
[ 51.688960] netlink_sendmsg+0x19c/0x3e0
[ 51.695458] ____sys_sendmsg+0x220/0x290
[ 51.701938] ___sys_sendmsg+0x7c/0xc0
[ 51.708148] __sys_sendmsg+0x68/0xd0
[ 51.714254] __arm64_sys_sendmsg+0x28/0x40
[ 51.720900] invoke_syscall+0x48/0x120

Johan
Johan Hovold <johan@kernel.org> writes:

> On Mon, May 23, 2022 at 10:06:37PM +0300, Kalle Valo wrote:
>> Johan Hovold <johan@kernel.org> writes:
>>
>> > On Tue, May 17, 2022 at 12:34:36PM +0200, Johan Hovold wrote:
>> >> Make sure to allocate resources needed before registering the device.
>> >>
>> >> This specifically avoids having a racing open() trigger a BUG_ON() in
>> >> mod_timer() when ath11k_mac_op_start() is called before the
>> >> mon_reap_timer has been set up.
>> >>
>> >> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>> >> Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
>> >> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
>> >> ---
>> >
>> > For completeness:
>> >
>> > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>
>> Thanks, added in the pending branch.
>>
>> You submitted this as RFC but do you mind if I apply this anyway? The
>> patch looks good and passes my tests. But I do wonder why I haven't seen
>> the crash...
>
> If it looks good to you then please do apply it.
>
> I was just worried that there may be some subtle reason for why
> ath11k_dp_pdev_alloc() was called after netdev registration in the first
> place and that it might need to be split up so that for example
> ath11k_dp_rx_pdev_mon_attach() isn't called until after registration.

At least I'm not aware of anything like that? Any comments from others?

> I did not see this issue with next-20220310, but I hit it on every probe
> with next-20220511. Perhaps some timing changed in between.
>
> Here's the backtrace for completeness in case someone else starts hitting
> this and searches the archives:
>
> [ 51.346947] kernel BUG at kernel/time/timer.c:990!
> [ 51.346958] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> ...
> [ 51.578225] Call trace:
> [ 51.583293] __mod_timer+0x298/0x390
> [ 51.589518] mod_timer+0x14/0x20
> [ 51.595368] ath11k_mac_op_start+0x41c/0x4a0 [ath11k]
> [ 51.603165] drv_start+0x38/0x60 [mac80211]
> [ 51.610110] ieee80211_do_open+0x29c/0x7d0 [mac80211]
> [ 51.617945] ieee80211_open+0x60/0xb0 [mac80211]
> [ 51.625311] __dev_open+0x100/0x1c0
> [ 51.631420] __dev_change_flags+0x194/0x210
> [ 51.638214] dev_change_flags+0x24/0x70
> [ 51.644646] do_setlink+0x228/0xdb0
> [ 51.650723] __rtnl_newlink+0x460/0x830
> [ 51.657162] rtnl_newlink+0x4c/0x80
> [ 51.663229] rtnetlink_rcv_msg+0x124/0x390
> [ 51.669917] netlink_rcv_skb+0x58/0x130
> [ 51.676314] rtnetlink_rcv+0x18/0x30
> [ 51.682460] netlink_unicast+0x250/0x310
> [ 51.688960] netlink_sendmsg+0x19c/0x3e0
> [ 51.695458] ____sys_sendmsg+0x220/0x290
> [ 51.701938] ___sys_sendmsg+0x7c/0xc0
> [ 51.708148] __sys_sendmsg+0x68/0xd0
> [ 51.714254] __arm64_sys_sendmsg+0x28/0x40
> [ 51.720900] invoke_syscall+0x48/0x120

Thanks, this is good info and I added this to the commit log.
Johan Hovold <johan+linaro@kernel.org> wrote:

> Make sure to allocate resources needed before registering the device.
>
> This specifically avoids having a racing open() trigger a BUG_ON() in
> mod_timer() when ath11k_mac_op_start() is called before the
> mon_reap_timer has been set up.
>
> I did not see this issue with next-20220310, but I hit it on every probe
> with next-20220511. Perhaps some timing changed in between.
>
> Here's the backtrace:
>
> [ 51.346947] kernel BUG at kernel/time/timer.c:990!
> [ 51.346958] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> ...
> [ 51.578225] Call trace:
> [ 51.583293] __mod_timer+0x298/0x390
> [ 51.589518] mod_timer+0x14/0x20
> [ 51.595368] ath11k_mac_op_start+0x41c/0x4a0 [ath11k]
> [ 51.603165] drv_start+0x38/0x60 [mac80211]
> [ 51.610110] ieee80211_do_open+0x29c/0x7d0 [mac80211]
> [ 51.617945] ieee80211_open+0x60/0xb0 [mac80211]
> [ 51.625311] __dev_open+0x100/0x1c0
> [ 51.631420] __dev_change_flags+0x194/0x210
> [ 51.638214] dev_change_flags+0x24/0x70
> [ 51.644646] do_setlink+0x228/0xdb0
> [ 51.650723] __rtnl_newlink+0x460/0x830
> [ 51.657162] rtnl_newlink+0x4c/0x80
> [ 51.663229] rtnetlink_rcv_msg+0x124/0x390
> [ 51.669917] netlink_rcv_skb+0x58/0x130
> [ 51.676314] rtnetlink_rcv+0x18/0x30
> [ 51.682460] netlink_unicast+0x250/0x310
> [ 51.688960] netlink_sendmsg+0x19c/0x3e0
> [ 51.695458] ____sys_sendmsg+0x220/0x290
> [ 51.701938] ___sys_sendmsg+0x7c/0xc0
> [ 51.708148] __sys_sendmsg+0x68/0xd0
> [ 51.714254] __arm64_sys_sendmsg+0x28/0x40
> [ 51.720900] invoke_syscall+0x48/0x120
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>

Patch applied to ath-next branch of ath.git, thanks.

d4ba1ff87b17 ath11k: fix netdev open race
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index ea073be60c12..e090dfbfa4e2 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -1218,23 +1218,23 @@ static int ath11k_core_pdev_create(struct ath11k_base *ab)
 		return ret;
 	}
 
-	ret = ath11k_mac_register(ab);
+	ret = ath11k_dp_pdev_alloc(ab);
 	if (ret) {
-		ath11k_err(ab, "failed register the radio with mac80211: %d\n", ret);
+		ath11k_err(ab, "failed to attach DP pdev: %d\n", ret);
 		goto err_pdev_debug;
 	}
 
-	ret = ath11k_dp_pdev_alloc(ab);
+	ret = ath11k_mac_register(ab);
 	if (ret) {
-		ath11k_err(ab, "failed to attach DP pdev: %d\n", ret);
-		goto err_mac_unregister;
+		ath11k_err(ab, "failed register the radio with mac80211: %d\n", ret);
+		goto err_dp_pdev_free;
 	}
 
 	ret = ath11k_thermal_register(ab);
 	if (ret) {
 		ath11k_err(ab, "could not register thermal device: %d\n",
 			   ret);
-		goto err_dp_pdev_free;
+		goto err_mac_unregister;
 	}
 
 	ret = ath11k_spectral_init(ab);
@@ -1247,10 +1247,10 @@ static int ath11k_core_pdev_create(struct ath11k_base *ab)
 
 err_thermal_unregister:
 	ath11k_thermal_unregister(ab);
-err_dp_pdev_free:
-	ath11k_dp_pdev_free(ab);
 err_mac_unregister:
 	ath11k_mac_unregister(ab);
+err_dp_pdev_free:
+	ath11k_dp_pdev_free(ab);
 err_pdev_debug:
 	ath11k_debugfs_pdev_destroy(ab);
 
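The reordering in the diff above follows the usual kernel pattern of unwinding in the reverse order of setup: once the DP allocation moves ahead of registration, the error labels have to swap as well. The following is a minimal user-space sketch of that pattern, not driver code. `struct fake_dev`, `do_step()`, `undo_step()` and `fake_pdev_create()` are hypothetical names made up for illustration, and the spectral step is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the setup steps; not the real ath11k API. */
enum step { STEP_DEBUGFS, STEP_DP_PDEV, STEP_MAC, STEP_THERMAL, STEP_COUNT };

struct fake_dev {
	bool done[STEP_COUNT];     /* which steps are currently set up */
	int fail_at;               /* step forced to fail, or -1 for none */
	bool dp_ready_at_register; /* was DP state ready when registering? */
};

static int do_step(struct fake_dev *d, enum step s)
{
	if (d->fail_at == (int)s)
		return -1;
	d->done[s] = true;
	return 0;
}

static void undo_step(struct fake_dev *d, enum step s)
{
	/* Only steps that completed may be torn down. */
	assert(d->done[s]);
	d->done[s] = false;
}

/* Same shape as the patched function: allocate first, register second. */
int fake_pdev_create(struct fake_dev *d)
{
	int ret;

	ret = do_step(d, STEP_DEBUGFS);
	if (ret)
		return ret;

	ret = do_step(d, STEP_DP_PDEV);
	if (ret)
		goto err_pdev_debug;

	/* Registration is the point where user space can start calling in. */
	d->dp_ready_at_register = d->done[STEP_DP_PDEV];
	ret = do_step(d, STEP_MAC);
	if (ret)
		goto err_dp_pdev_free;

	ret = do_step(d, STEP_THERMAL);
	if (ret)
		goto err_mac_unregister;

	return 0;

	/* Unwind in reverse order of setup. */
err_mac_unregister:
	undo_step(d, STEP_MAC);
err_dp_pdev_free:
	undo_step(d, STEP_DP_PDEV);
err_pdev_debug:
	undo_step(d, STEP_DEBUGFS);
	return ret;
}
```

Calling `fake_pdev_create()` with `fail_at` set to `STEP_THERMAL` should leave every entry of `done[]` false again, which is the property the swapped labels are meant to preserve.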
Make sure to allocate resources needed before registering the device.

This specifically avoids having a racing open() trigger a BUG_ON() in
mod_timer() when ath11k_mac_op_start() is called before the
mon_reap_timer has been set up.

Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Fixes: 840c36fa727a ("ath11k: dp: stop rx pktlog before suspend")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---

I started hitting a BUG_ON() during ath11k probe due to a timer which
hasn't been initialised. Turns out the netdev is registered before
having been fully set up:

[ 421.232410] ath11k_core_pdev_create
[ 421.233854] ath11k_dp_pdev_alloc
[ 421.233863] ath11k_dp_rx_pdev_srng_alloc
[ 421.259161] ath11k_mac_config_mon_status_default - NULL reap timer function
[ 421.259165] ath11k_pci 0006:01:00.0: failed to configure monitor status ring with default rx_filter: (-22)
[ 421.373066] ath11k_dp_rx_pdev_srng_alloc - reap timer setup

Sending as an RFC as I'm not familiar with the code. It looks like
ath11k_dp_pdev_alloc() may need to be split into an alloc and an attach
function.

Johan

 drivers/net/wireless/ath/ath11k/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
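The race described above can be modelled in user space, assuming (as the BUG_ON() and the "NULL reap timer function" debug line suggest) that arming a timer whose callback was never set is what trips the check. This is only a sketch: `struct fake_timer`, `fake_mod_timer()`, `fake_open()` and the other names are hypothetical, and the model returns an error code where the kernel would BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a timer; not the kernel's timer_list. */
struct fake_timer {
	void (*function)(struct fake_timer *);
	unsigned long expires;
};

struct fake_radio {
	struct fake_timer reap_timer;
	bool registered; /* visible to user space once set */
};

/* Returns -1 where the real code would hit the BUG_ON(). */
int fake_mod_timer(struct fake_timer *t, unsigned long expires)
{
	assert(t != NULL);
	if (!t->function)
		return -1;
	t->expires = expires;
	return 0;
}

static void fake_reap(struct fake_timer *t)
{
	(void)t;
}

/* Stands in for the allocation step that sets up the reap timer. */
void fake_dp_alloc(struct fake_radio *r)
{
	r->reap_timer.function = fake_reap;
}

/* Stands in for registration; open() becomes possible afterwards. */
void fake_register(struct fake_radio *r)
{
	r->registered = true;
}

/* Stands in for the start callback, which arms the reap timer. */
int fake_open(struct fake_radio *r)
{
	if (!r->registered)
		return -2; /* device not reachable yet, nothing can race */
	return fake_mod_timer(&r->reap_timer, 100);
}
```

With the old ordering (register, then allocate), a `fake_open()` that slips in between the two calls returns -1. With the patched ordering, `fake_open()` either cannot run yet (-2) or finds the timer set up (0).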