Message ID | 20230714080658.3140-1-quic_bqiang@quicinc.com |
---|---|
State | New |
Headers | show |
Series | wifi: ath12k: Use pdev_id rather than mac_id to get pdev | expand |
Baochen Qiang <quic_bqiang@quicinc.com> wrote: > We are seeing kernel crash in below test scenario: > 1. make DUT connect to an WPA3 encrypted 11ax AP in Ch44 HE80 > 2. use "wpa_cli -i <inf> disconnect" to disconnect > 3. wait for DUT to automatically reconnect > > Kernel crashes while waiting, below shows the crash stack: > [ 755.120868] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 755.120871] #PF: supervisor read access in kernel mode > [ 755.120872] #PF: error_code(0x0000) - not-present page > [ 755.120873] PGD 0 P4D 0 > [ 755.120875] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 755.120876] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 5.19.0-rc1+ #3 > [ 755.120878] Hardware name: Intel(R) Client Systems NUC11PHi7/NUC11PHBi7, BIOS PHTGL579.0063.2021.0707.1057 07/07/2021 > [ 755.120879] RIP: 0010:ath12k_dp_process_rx_err+0x2b6/0x14a0 [ath12k] > [ 755.120890] Code: 01 c0 48 c1 e0 05 48 8b 9c 07 b8 b2 00 00 48 c7 c0 61 ff 0e c1 48 85 db 53 48 0f 44 c6 48 c7 c6 80 9d 0f c1 50 e8 1a 25 00 00 <4c> 8b 3b 4d 8b 76 14 41 59 41 5a 41 8b 87 78 43 01 00 4d 85 f6 89 > [ 755.120891] RSP: 0018:ffff9a93402c8d10 EFLAGS: 00010282 > [ 755.120892] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000303 > [ 755.120893] RDX: 0000000000000000 RSI: ffffffff93b7cbe9 RDI: 00000000ffffffff > [ 755.120894] RBP: ffff9a93402c8e50 R08: ffffffff93e65360 R09: ffffffff942e044d > [ 755.120894] R10: 0000000000000000 R11: 0000000000000063 R12: ffff8dbec5420000 > [ 755.120895] R13: ffff8dbec5420000 R14: ffff8dbdefe9a0a0 R15: ffff8dbec5420000 > [ 755.120896] FS: 0000000000000000(0000) GS:ffff8dc2705c0000(0000) knlGS:0000000000000000 > [ 755.120897] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 755.120898] CR2: 0000000000000000 CR3: 0000000107be4005 CR4: 0000000000770ee0 > [ 755.120898] PKRU: 55555554 > [ 755.120899] Call Trace: > [ 755.120900] <IRQ> > [ 755.120903] ? ath12k_pci_write32+0x2e/0x80 [ath12k] > [ 755.120910] ath12k_dp_service_srng+0x214/0x2e0 [ath12k] > [ 755.120917] ath12k_pci_ext_grp_napi_poll+0x26/0x80 [ath12k] > [ 755.120923] __napi_poll+0x2b/0x1c0 > [ 755.120925] net_rx_action+0x2a1/0x2f0 > [ 755.120927] __do_softirq+0xfa/0x2e9 > [ 755.120929] irq_exit_rcu+0xb9/0xd0 > [ 755.120932] common_interrupt+0xc1/0xe0 > [ 755.120934] </IRQ> > [ 755.120934] <TASK> > [ 755.120935] asm_common_interrupt+0x2c/0x40 > [ 755.120936] RIP: 0010:cpuidle_enter_state+0xdd/0x3a0 > [ 755.120938] Code: 00 31 ff e8 45 e2 74 ff 80 7d d7 00 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 a0 02 00 00 31 ff e8 69 79 7b ff fb 0f 1f 44 00 00 <45> 85 ff 0f 88 6d 01 00 00 49 63 d7 4c 2b 6d c8 48 8d 04 52 48 8d > [ 755.120939] RSP: 0018:ffff9a934018be50 EFLAGS: 00000246 > [ 755.120940] RAX: ffff8dc2705c0000 RBX: 0000000000000002 RCX: 000000000000001f > [ 755.120941] RDX: 000000afd0b532d3 RSI: ffffffff93b7cbe9 RDI: ffffffff93b8b66e > [ 755.120942] RBP: ffff9a934018be88 R08: 0000000000000002 R09: 0000000000030500 > [ 755.120942] R10: ffff9a934018be18 R11: 0000000000000741 R12: ffffba933fdc0600 > [ 755.120943] R13: 000000afd0b532d3 R14: ffffffff93fcbc60 R15: 0000000000000002 > [ 755.120945] cpuidle_enter+0x2e/0x40 > [ 755.120946] call_cpuidle+0x23/0x40 > [ 755.120948] do_idle+0x1ff/0x260 > [ 755.120950] cpu_startup_entry+0x1d/0x20 > [ 755.120951] start_secondary+0x10d/0x130 > [ 755.120953] secondary_startup_64_no_verify+0xd3/0xdb > [ 755.120956] </TASK> > [ 755.120956] Modules linked in: michael_mic rfcomm cmac algif_hash algif_skcipher af_alg bnep qrtr_mhi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm_intel qrtr snd_hda_codec_hdmi kvm irqbypass ath12k snd_hda_intel snd_seq_midi crct10dif_pclmul mhi ghash_clmulni_intel snd_intel_dspcfg snd_seq_midi_event aesni_intel qmi_helpers i915 snd_rawmidi crypto_simd snd_hda_codec cryptd cec intel_cstate snd_hda_core mac80211 rc_core nouveau snd_seq snd_hwdep btusb drm_buddy drm_ttm_helper nls_iso8859_1 snd_pcm ttm btrtl snd_seq_device wmi_bmof mxm_wmi input_leds cfg80211 joydev btbcm drm_display_helper snd_timer btintel mei_me libarc4 drm_kms_helper bluetooth i2c_algo_bit snd fb_sys_fops syscopyarea mei sysfillrect ecdh_generic soundcore sysimgblt ecc acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp ramoops parport reed_solomon drm efi_pstore ip_tables x_tables au tofs4 > [ 755.120992] hid_generic usbhid hid ax88179_178a usbnet mii nvme nvme_core rtsx_pci_sdmmc crc32_pclmul i2c_i801 intel_lpss_pci i2c_smbus intel_lpss rtsx_pci idma64 virt_dma vmd wmi video > [ 755.121002] CR2: 0000000000000000 > > The crash is because, for WCN7850, only ab->pdev[0] is initialized, while mac_id here is > misused to retrieve pdev and it is not zero, leading to a NULL pointer access. > > Fix this issue by getting pdev_id first and then use it to retrieve pdev. > > Also fix some other code snippets which have the same issue. > > Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Patch applied to ath-next branch of ath.git, thanks. 7ee027abd453 wifi: ath12k: Use pdev_id rather than mac_id to get pdev
diff --git a/drivers/net/wireless/ath/ath12k/dp_rx.c b/drivers/net/wireless/ath/ath12k/dp_rx.c index ffd9a2018610..67f8c140840f 100644 --- a/drivers/net/wireless/ath/ath12k/dp_rx.c +++ b/drivers/net/wireless/ath/ath12k/dp_rx.c @@ -2539,7 +2539,7 @@ static void ath12k_dp_rx_process_received_packets(struct ath12k_base *ab, struct ath12k_skb_rxcb *rxcb; struct sk_buff *msdu; struct ath12k *ar; - u8 mac_id; + u8 mac_id, pdev_id; int ret; if (skb_queue_empty(msdu_list)) @@ -2550,8 +2550,9 @@ static void ath12k_dp_rx_process_received_packets(struct ath12k_base *ab, while ((msdu = __skb_dequeue(msdu_list))) { rxcb = ATH12K_SKB_RXCB(msdu); mac_id = rxcb->mac_id; - ar = ab->pdevs[mac_id].ar; - if (!rcu_dereference(ab->pdevs_active[mac_id])) { + pdev_id = ath12k_hw_mac_id_to_pdev_id(ab->hw_params, mac_id); + ar = ab->pdevs[pdev_id].ar; + if (!rcu_dereference(ab->pdevs_active[pdev_id])) { dev_kfree_skb_any(msdu); continue; } @@ -3385,6 +3386,7 @@ int ath12k_dp_rx_process_err(struct ath12k_base *ab, struct napi_struct *napi, dma_addr_t paddr; bool is_frag; bool drop = false; + int pdev_id; tot_n_bufs_reaped = 0; quota = budget; @@ -3440,7 +3442,8 @@ int ath12k_dp_rx_process_err(struct ath12k_base *ab, struct napi_struct *napi, mac_id = le32_get_bits(reo_desc->info0, HAL_REO_DEST_RING_INFO0_SRC_LINK_ID); - ar = ab->pdevs[mac_id].ar; + pdev_id = ath12k_hw_mac_id_to_pdev_id(ab->hw_params, mac_id); + ar = ab->pdevs[pdev_id].ar; if (!ath12k_dp_process_rx_err_buf(ar, reo_desc, drop, msdu_cookies[i])) diff --git a/drivers/net/wireless/ath/ath12k/dp_tx.c b/drivers/net/wireless/ath/ath12k/dp_tx.c index d3c7c76d6b75..d661fe586651 100644 --- a/drivers/net/wireless/ath/ath12k/dp_tx.c +++ b/drivers/net/wireless/ath/ath12k/dp_tx.c @@ -347,6 +347,7 @@ static void ath12k_dp_tx_free_txbuf(struct ath12k_base *ab, { struct ath12k *ar; struct ath12k_skb_cb *skb_cb; + u8 pdev_id = ath12k_hw_mac_id_to_pdev_id(ab->hw_params, mac_id); skb_cb = ATH12K_SKB_CB(msdu); @@ -357,7 +358,7 @@ static void ath12k_dp_tx_free_txbuf(struct ath12k_base *ab, dev_kfree_skb_any(msdu); - ar = ab->pdevs[mac_id].ar; + ar = ab->pdevs[pdev_id].ar; if (atomic_dec_and_test(&ar->dp.num_tx_pending)) wake_up(&ar->dp.tx_empty_waitq); } @@ -536,7 +537,7 @@ void ath12k_dp_tx_completion_handler(struct ath12k_base *ab, int ring_id) struct hal_tx_status ts = { 0 }; struct dp_tx_ring *tx_ring = &dp->tx_ring[ring_id]; struct hal_wbm_release_ring *desc; - u8 mac_id; + u8 mac_id, pdev_id; u64 desc_va; spin_lock_bh(&status_ring->lock); @@ -605,7 +606,8 @@ void ath12k_dp_tx_completion_handler(struct ath12k_base *ab, int ring_id) continue; } - ar = ab->pdevs[mac_id].ar; + pdev_id = ath12k_hw_mac_id_to_pdev_id(ab->hw_params, mac_id); + ar = ab->pdevs[pdev_id].ar; if (atomic_dec_and_test(&ar->dp.num_tx_pending)) wake_up(&ar->dp.tx_empty_waitq);
We are seeing kernel crash in below test scenario: 1. make DUT connect to an WPA3 encrypted 11ax AP in Ch44 HE80 2. use "wpa_cli -i <inf> disconnect" to disconnect 3. wait for DUT to automatically reconnect Kernel crashes while waiting, below shows the crash stack: [ 755.120868] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 755.120871] #PF: supervisor read access in kernel mode [ 755.120872] #PF: error_code(0x0000) - not-present page [ 755.120873] PGD 0 P4D 0 [ 755.120875] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 755.120876] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 5.19.0-rc1+ #3 [ 755.120878] Hardware name: Intel(R) Client Systems NUC11PHi7/NUC11PHBi7, BIOS PHTGL579.0063.2021.0707.1057 07/07/2021 [ 755.120879] RIP: 0010:ath12k_dp_process_rx_err+0x2b6/0x14a0 [ath12k] [ 755.120890] Code: 01 c0 48 c1 e0 05 48 8b 9c 07 b8 b2 00 00 48 c7 c0 61 ff 0e c1 48 85 db 53 48 0f 44 c6 48 c7 c6 80 9d 0f c1 50 e8 1a 25 00 00 <4c> 8b 3b 4d 8b 76 14 41 59 41 5a 41 8b 87 78 43 01 00 4d 85 f6 89 [ 755.120891] RSP: 0018:ffff9a93402c8d10 EFLAGS: 00010282 [ 755.120892] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000303 [ 755.120893] RDX: 0000000000000000 RSI: ffffffff93b7cbe9 RDI: 00000000ffffffff [ 755.120894] RBP: ffff9a93402c8e50 R08: ffffffff93e65360 R09: ffffffff942e044d [ 755.120894] R10: 0000000000000000 R11: 0000000000000063 R12: ffff8dbec5420000 [ 755.120895] R13: ffff8dbec5420000 R14: ffff8dbdefe9a0a0 R15: ffff8dbec5420000 [ 755.120896] FS: 0000000000000000(0000) GS:ffff8dc2705c0000(0000) knlGS:0000000000000000 [ 755.120897] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 755.120898] CR2: 0000000000000000 CR3: 0000000107be4005 CR4: 0000000000770ee0 [ 755.120898] PKRU: 55555554 [ 755.120899] Call Trace: [ 755.120900] <IRQ> [ 755.120903] ? ath12k_pci_write32+0x2e/0x80 [ath12k] [ 755.120910] ath12k_dp_service_srng+0x214/0x2e0 [ath12k] [ 755.120917] ath12k_pci_ext_grp_napi_poll+0x26/0x80 [ath12k] [ 755.120923] __napi_poll+0x2b/0x1c0 [ 755.120925] net_rx_action+0x2a1/0x2f0 [ 755.120927] __do_softirq+0xfa/0x2e9 [ 755.120929] irq_exit_rcu+0xb9/0xd0 [ 755.120932] common_interrupt+0xc1/0xe0 [ 755.120934] </IRQ> [ 755.120934] <TASK> [ 755.120935] asm_common_interrupt+0x2c/0x40 [ 755.120936] RIP: 0010:cpuidle_enter_state+0xdd/0x3a0 [ 755.120938] Code: 00 31 ff e8 45 e2 74 ff 80 7d d7 00 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 a0 02 00 00 31 ff e8 69 79 7b ff fb 0f 1f 44 00 00 <45> 85 ff 0f 88 6d 01 00 00 49 63 d7 4c 2b 6d c8 48 8d 04 52 48 8d [ 755.120939] RSP: 0018:ffff9a934018be50 EFLAGS: 00000246 [ 755.120940] RAX: ffff8dc2705c0000 RBX: 0000000000000002 RCX: 000000000000001f [ 755.120941] RDX: 000000afd0b532d3 RSI: ffffffff93b7cbe9 RDI: ffffffff93b8b66e [ 755.120942] RBP: ffff9a934018be88 R08: 0000000000000002 R09: 0000000000030500 [ 755.120942] R10: ffff9a934018be18 R11: 0000000000000741 R12: ffffba933fdc0600 [ 755.120943] R13: 000000afd0b532d3 R14: ffffffff93fcbc60 R15: 0000000000000002 [ 755.120945] cpuidle_enter+0x2e/0x40 [ 755.120946] call_cpuidle+0x23/0x40 [ 755.120948] do_idle+0x1ff/0x260 [ 755.120950] cpu_startup_entry+0x1d/0x20 [ 755.120951] start_secondary+0x10d/0x130 [ 755.120953] secondary_startup_64_no_verify+0xd3/0xdb [ 755.120956] </TASK> [ 755.120956] Modules linked in: michael_mic rfcomm cmac algif_hash algif_skcipher af_alg bnep qrtr_mhi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm_intel qrtr snd_hda_codec_hdmi kvm irqbypass ath12k snd_hda_intel snd_seq_midi crct10dif_pclmul mhi ghash_clmulni_intel snd_intel_dspcfg snd_seq_midi_event aesni_intel qmi_helpers i915 snd_rawmidi crypto_simd snd_hda_codec cryptd cec intel_cstate snd_hda_core mac80211 rc_core nouveau snd_seq snd_hwdep btusb drm_buddy drm_ttm_helper nls_iso8859_1 snd_pcm ttm btrtl snd_seq_device wmi_bmof mxm_wmi input_leds cfg80211 joydev btbcm drm_display_helper snd_timer btintel mei_me libarc4 drm_kms_helper bluetooth i2c_algo_bit snd fb_sys_fops syscopyarea mei sysfillrect ecdh_generic soundcore sysimgblt ecc acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp ramoops parport reed_solomon drm efi_pstore ip_tables x_tables autofs4 [ 755.120992] hid_generic usbhid hid ax88179_178a usbnet mii nvme nvme_core rtsx_pci_sdmmc crc32_pclmul i2c_i801 intel_lpss_pci i2c_smbus intel_lpss rtsx_pci idma64 virt_dma vmd wmi video [ 755.121002] CR2: 0000000000000000 The crash is because, for WCN7850, only ab->pdev[0] is initialized, while mac_id here is misused to retrieve pdev and it is not zero, leading to a NULL pointer access. Fix this issue by getting pdev_id first and then use it to retrieve pdev. Also fix some other code snippets which have the same issue. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> --- drivers/net/wireless/ath/ath12k/dp_rx.c | 11 +++++++---- drivers/net/wireless/ath/ath12k/dp_tx.c | 8 +++++--- 2 files changed, 12 insertions(+), 7 deletions(-) base-commit: b21fe5be53eb873c02e7479372726c8aeed171e3