Message ID | 20240329015630.3019-1-quic_bqiang@quicinc.com |
---|---|
State | Superseded |
Headers | show |
Series | wifi: ath12k: fix flush failure in recovery scenarios | expand |
On 3/28/2024 6:56 PM, Baochen Qiang wrote: > Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc() > call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to > ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in > recovery scenarios: > > [ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered > [ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 > [ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 > [ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 > > This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(), > dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in > recovery scenarios. > > So a possible fix would be to reset that counter at some proper point, > just like the old design. But considering that the counter tracks number > of packets pending to be freed or returned to mac80211, forcefully reset > it might make it hard to expose some real issues. For example if somehow > ath12k fails to free/return some TX packets, we don't know that because > no warnings any more. > > That is to say we should not reset that counter during recovery (which is > already done due to above commit), instead should decrease it each time > a packet is freed/returned. Currently almost each related function has > this logic implemented, except ath12k_dp_cc_cleanup(). So add the same > there to fix this issue. > > Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Do we have QCN9274 test results?
On 4/2/2024 1:01 AM, Jeff Johnson wrote: > On 3/28/2024 6:56 PM, Baochen Qiang wrote: >> Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc() >> call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to >> ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in >> recovery scenarios: >> >> [ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered >> [ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 >> [ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 >> [ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 >> >> This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(), >> dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in >> recovery scenarios. >> >> So a possible fix would be to reset that counter at some proper point, >> just like the old design. But considering that the counter tracks number >> of packets pending to be freed or returned to mac80211, forcefully reset >> it might make it hard to expose some real issues. For example if somehow >> ath12k fails to free/return some TX packets, we don't know that because >> no warnings any more. >> >> That is to say we should not reset that counter during recovery (which is >> already done due to above commit), instead should decrease it each time >> a packet is freed/returned. Currently almost each related function has >> this logic implemented, except ath12k_dp_cc_cleanup(). So add the same >> there to fix this issue. >> >> Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 >> >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > > Do we have QCN9274 test results? I was thinking this also applies to QCN9274 and should not cause any regression because it is quite a straightforward change. Anyway, more tests more convincing. Hi Vasanth, could you help with SSR test on this patch? Considering that debugfs patches and firmware-assert patch are not included in ath.git, you may need to find a way to simulate firmware crash to trigger SSR process. Please let me know if you have any issues in that. >
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c index 1006eef8ff0c..ddfe1eb82d6b 100644 --- a/drivers/net/wireless/ath/ath12k/dp.c +++ b/drivers/net/wireless/ath/ath12k/dp.c @@ -1150,7 +1150,9 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab) struct ath12k_rx_desc_info *desc_info; struct ath12k_tx_desc_info *tx_desc_info, *tmp1; struct ath12k_dp *dp = &ab->dp; + struct ath12k_skb_cb *skb_cb; struct sk_buff *skb; + struct ath12k *ar; int i, j; u32 pool_id, tx_spt_page; @@ -1201,6 +1203,11 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab) if (!skb) continue; + skb_cb = ATH12K_SKB_CB(skb); + ar = skb_cb->ar; + if (atomic_dec_and_test(&ar->dp.num_tx_pending)) + wake_up(&ar->dp.tx_empty_waitq); + dma_unmap_single(ab->dev, ATH12K_SKB_CB(skb)->paddr, skb->len, DMA_TO_DEVICE); dev_kfree_skb_any(skb);
Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc() call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in recovery scenarios: [ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered [ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 [ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 [ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0 This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(), dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in recovery scenarios. So a possible fix would be to reset that counter at some proper point, just like the old design. But considering that the counter tracks number of packets pending to be freed or returned to mac80211, forcefully reset it might make it hard to expose some real issues. For example if somehow ath12k fails to free/return some TX packets, we don't know that because no warnings any more. That is to say we should not reset that counter during recovery (which is already done due to above commit), instead should decrease it each time a packet is freed/returned. Currently almost each related function has this logic implemented, except ath12k_dp_cc_cleanup(). So add the same there to fix this issue. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> --- drivers/net/wireless/ath/ath12k/dp.c | 7 +++++++ 1 file changed, 7 insertions(+) base-commit: fe7e1b830cf3c0272aa4eaf367c4c7b29c169c3d