Message ID | 20200625201857.almm27xgzburyxxu@wololo.home.arpa |
---|---|
State | New |
Series | rtw88: fix skb_under_panic in tx path |
(Swapping out Yen-Hsuan's new email)

Necromancing an old thread, since it's still relevant, and I had it
sitting in my inbox to deal with. Now I have something useful to say!

On Mon, Jun 29, 2020 at 11:50 AM Tony Chuang <yhchuang@realtek.com> wrote:
> > On 2020-06-25 13:18, Nick Owens wrote:
> > > fixes the following panic on my thinkpad A485
> > >
> > > Oops#1 Part3
> > > <0>[ 3743.881656] skbuff: skb_under_panic: text:000000005f69fd98
> > > len:208 put:48 head:000000009e2719e8 data:00000000bd3795e0 tail:0xc2
> > > end:0x2c0 dev:wlp2s0

...

> > skb->head and skb->data here are really far (0.5GB) apart. Maybe
> > skb->data actually got corrupted earlier?

For the record, I've reproduced similar issues myself, and the problem
occurs when
(a) the initial SKB starts with minimal headroom (I find that many SKBs
    come into mac80211 with plenty of headroom)
(b) the SKB participates in AMSDU aggregation

I've spotted a specific bug, which I'll point out below. But I remain
confused why in many cases the SKB ends up looking so corrupted.

...

> > If it is a headroom issue, you can actually express the needed headroom
> > needed by the driver in hw->extra_tx_headroom during init and avoid the
> > pskb_expand_head() here.
>
> Looks like a headroom issue, but the driver already assigned headroom.
>         max_tx_headroom = rtwdev->chip->tx_pkt_desc_sz;
>         hw->extra_tx_headroom = max_tx_headroom;
>
> Then I am not sure why this happens. Nick, can you help to dump_stack()
> so we can see where is the skb from?

That's not so easy, because of the layers and queueing involved, but per
the above, I've blamed mac80211's AMSDU aggregation. Specifically:

ieee80211_amsdu_aggregate()
 -> ieee80211_amsdu_prepare_head(), where we pad out / expand the SKB to
    fit some additional AMSDU headers (and later append additional data).

But the padding function (ieee80211_amsdu_realloc_pad()) accounts only
for the 802.11 protocol headroom, and not for the driver-specific
headroom. So it chooses not to expand the headroom, and instead eats into
rtw88's space.

For such SKBs, they end up in the driver without sufficient headroom --
thus, Nick's bug report.

NB: the seemingly-obvious fix (changing the headroom checks in
ieee80211_amsdu_realloc_pad()) does not seem to work, as I hit other bugs
along the way.

Unfortunately, I haven't had the time to fix this all myself properly,
nor have I convinced Realtek to fix this themselves. So in the meantime,
Chrome OS is running with this:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/260a7d4939c323aebe80efc73610682ad2cb187a%5E%21/#F0

It's a similar idea.

We should of course fix the mac80211 bug, but I wonder if we also deserve
some patch similar to either the Chromium patch or Nick's somewhere
(perhaps with a loud warning, etc.), because it's much more user friendly
(in the face of similar future bugs) to do some suboptimal memcpy()'s,
etc., than to crash their systems.

Brian
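To make the headroom accounting described above concrete, here is a small userspace sketch. It is not kernel code: `struct toy_skb` and every `toy_*` function are hypothetical stand-ins invented for illustration, and the only number taken from the thread is the 48-byte push (the `put:48` in the quoted oops). The 14-byte header size and the 56-byte starting headroom are made-up values chosen so the failure shows up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an SKB: offsets only, no real packet memory.
 * The buffer starts at offset 0, so the headroom equals `data`. */
struct toy_skb {
	size_t data;	/* offset of the first used byte */
	size_t len;	/* number of used bytes */
};

static struct toy_skb toy_skb_init(size_t headroom, size_t len)
{
	struct toy_skb skb = { .data = headroom, .len = len };
	return skb;
}

static size_t toy_headroom(const struct toy_skb *skb)
{
	return skb->data;
}

/* Models skb_push(): returns false where the kernel would hit
 * skb_under_panic() because the push would move data before head. */
static bool toy_push(struct toy_skb *skb, size_t n)
{
	if (toy_headroom(skb) < n)
		return false;
	skb->data -= n;
	skb->len += n;
	return true;
}

/* Aggregation step that only checks its own header fits, which is the
 * behaviour the email attributes to the mac80211 padding function. */
static bool toy_amsdu_prepare(struct toy_skb *skb, size_t amsdu_hdr)
{
	return toy_push(skb, amsdu_hdr);
}

/* Variant that refuses to eat into the driver's reserved space.
 * A real implementation would reallocate instead of returning false. */
static bool toy_amsdu_prepare_reserving(struct toy_skb *skb,
					size_t amsdu_hdr, size_t driver_room)
{
	if (toy_headroom(skb) < amsdu_hdr + driver_room)
		return false;
	return toy_push(skb, amsdu_hdr);
}
```

Starting from 56 bytes of headroom, `toy_amsdu_prepare(&skb, 14)` succeeds and leaves 42 bytes, so a later `toy_push(&skb, 48)` from the driver fails. The reserving variant rejects the same buffer up front, which is the point at which a reallocation would be needed.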
diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index d735f3127fe8..21b3b268cb25 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -741,6 +741,12 @@ static int rtw_pci_tx_write_data(struct rtw_dev *rtwdev,
 	else if (!avail_desc(ring->r.wp, ring->r.rp, ring->r.len))
 		return -ENOSPC;
 
+	if (skb_headroom(skb) < chip->tx_pkt_desc_sz &&
+	    pskb_expand_head(skb, chip->tx_pkt_desc_sz - skb_headroom(skb), 0, GFP_ATOMIC)) {
+		dev_err(rtwdev->dev, "no headroom available");
+		return -ENOMEM;
+	}
+
 	pkt_desc = skb_push(skb, chip->tx_pkt_desc_sz);
 	memset(pkt_desc, 0, tx_pkt_desc_sz);
 	pkt_info->qsel = rtw_pci_get_tx_qsel(skb, queue);
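
The check-then-expand pattern in the patch can be tried outside the kernel with a plain heap buffer. The sketch below is a userspace analogue only: `struct toy_buf`, `toy_expand_head()` and `toy_push_safe()` are hypothetical names, and the reallocation is a simple malloc-and-copy, which is roughly the "suboptimal memcpy()" cost the email mentions, not what `pskb_expand_head()` does internally.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet buffer backed by a heap allocation. */
struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	size_t data_off;	/* offset of the first payload byte */
	size_t len;		/* payload length in bytes */
	size_t size;		/* total allocation size */
};

static size_t toy_headroom(const struct toy_buf *b)
{
	return b->data_off;
}

/* Grow the headroom by `extra` bytes by allocating a larger buffer and
 * copying the payload across. Returns 0 on success, -1 on failure. */
static int toy_expand_head(struct toy_buf *b, size_t extra)
{
	unsigned char *n = malloc(b->size + extra);

	if (!n)
		return -1;
	memcpy(n + b->data_off + extra, b->head + b->data_off, b->len);
	free(b->head);
	b->head = n;
	b->data_off += extra;
	b->size += extra;
	return 0;
}

/* Prepend `n` zeroed bytes, expanding the headroom first if it is too
 * small. Returns a pointer to the new start of data, or NULL if the
 * expansion failed (the buffer is left unchanged in that case). */
static unsigned char *toy_push_safe(struct toy_buf *b, size_t n)
{
	if (toy_headroom(b) < n &&
	    toy_expand_head(b, n - toy_headroom(b)) != 0)
		return NULL;
	b->data_off -= n;
	b->len += n;
	memset(b->head + b->data_off, 0, n);
	return b->head + b->data_off;
}
```

With 10 bytes of headroom and a 48-byte push, the buffer is grown by 38 bytes before the push, so the slow path costs one allocation and one copy of the payload instead of an underflow.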