wifi: rtw88: sdio: Honor the host max_req_size in the RX path

Message ID 20230709195712.603200-1-martin.blumenstingl@googlemail.com
State New

Commit Message

Martin Blumenstingl July 9, 2023, 7:57 p.m. UTC
Lukas reports skb_over_panic errors on his Banana Pi BPI-CM4 which comes
with an Amlogic A311D (G12B) SoC and a RTL8822CS SDIO wifi/Bluetooth
combo card. The error he observed is identical to what has been fixed
in commit e967229ead0e ("wifi: rtw88: sdio: Check the HISR RX_REQUEST
bit in rtw_sdio_rx_isr()") but that commit didn't fix Lukas' problem.

Lukas found that disabling or limiting RX aggregation fixes the problem
for him. The discussion that followed raised a few key points which
have an impact on this problem:
- The Amlogic A311D (G12B) SoC has a hardware bug in the SDIO controller
  which prevents DMA transfers. Instead all transfers need to go through
  the controller SRAM which limits transfers to 1536 bytes
- rtw88 chips don't split incoming (RX) packets, so if a big packet is
  received this is forwarded to the host in its original form
- rtw88 chips can do RX aggregation, meaning multiple incoming
  packets can be pulled by the host from the card with one MMC/SDIO
  transfer. This depends on settings in the REG_RXDMA_AGG_PG_TH
  register (BIT_RXDMA_AGG_PG_TH limits the number of packets that will
  be aggregated, BIT_DMA_AGG_TO_V1 configures a timeout for aggregation
  and BIT_EN_PRE_CALC makes the chip honor the limits more effectively)

Use multiple consecutive reads in rtw_sdio_read_port() to limit the
number of bytes which are copied by the host from the card in one
MMC/SDIO transfer. This allows receiving a buffer that's larger than
the host's max_req_size (number of bytes which can be transferred in
one MMC/SDIO transfer). As a result of this the skb_over_panic error
is gone as the rtw88 driver is now able to receive more than 1536 bytes
from the card (either because the incoming packet is larger than that
or because multiple packets have been aggregated).

Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
Reported-by: Lukas F. Hartmann <lukas@mntre.com>
Closes: https://lore.kernel.org/linux-wireless/CAFBinCBaXtebixKbjkWKW_WXc5k=NdGNaGUjVE8NCPNxOhsb2g@mail.gmail.com/
Suggested-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
---
 drivers/net/wireless/realtek/rtw88/sdio.c | 24 +++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)
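
The chunking described in the commit message can be illustrated outside
the kernel. The following is a minimal userspace sketch, not the driver
code: struct mock_card, mock_transfer(), chunked_read() and
SKETCH_MAX_REQ_SIZE are hypothetical names standing in for the card's RX
FIFO, sdio_memcpy_fromio(), the patched read loop and host->max_req_size.
It only shows that a buffer larger than the per-transfer limit can be
filled by consecutive reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for host->max_req_size (1536 bytes on the A311D
 * according to the commit message).
 */
#define SKETCH_MAX_REQ_SIZE 1536

/* Hypothetical mock of the card's RX FIFO: reads consume data in order. */
struct mock_card {
	const unsigned char *fifo;	/* data waiting in the card */
	size_t fifo_len;		/* number of bytes in fifo */
	size_t pos;			/* next unread byte */
	unsigned int transfers;		/* bus transfers issued so far */
};

/* Mock of one bus transfer; rejects requests above the host limit. */
static int mock_transfer(struct mock_card *card, unsigned char *dst, size_t len)
{
	if (len > SKETCH_MAX_REQ_SIZE)
		return -1;

	assert(card->pos + len <= card->fifo_len);
	memcpy(dst, card->fifo + card->pos, len);
	card->pos += len;
	card->transfers++;

	return 0;
}

/* Read count bytes using transfers of at most SKETCH_MAX_REQ_SIZE bytes. */
static int chunked_read(struct mock_card *card, unsigned char *buf, size_t count)
{
	int ret = 0;	/* a zero-length request returns success */

	while (count > 0) {
		size_t bytes = count < SKETCH_MAX_REQ_SIZE ?
			       count : SKETCH_MAX_REQ_SIZE;

		ret = mock_transfer(card, buf, bytes);
		if (ret)
			break;

		count -= bytes;
		buf += bytes;
	}

	return ret;
}
```

With a 4000-byte request and a 1536-byte limit, this sketch issues three
transfers (1536 + 1536 + 928 bytes).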

Comments

Ping-Ke Shih July 10, 2023, 12:36 a.m. UTC | #1
> -----Original Message-----
> From: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Sent: Monday, July 10, 2023 3:57 AM
> To: linux-wireless@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; jernej.skrabec@gmail.com; Ping-Ke Shih <pkshih@realtek.com>;
> ulf.hansson@linaro.org; kvalo@kernel.org; tony0620emma@gmail.com; Martin Blumenstingl
> <martin.blumenstingl@googlemail.com>; Lukas F . Hartmann <lukas@mntre.com>
> Subject: [PATCH] wifi: rtw88: sdio: Honor the host max_req_size in the RX path
> 
> Lukas reports skb_over_panic errors on his Banana Pi BPI-CM4 which comes
> with an Amlogic A311D (G12B) SoC and a RTL8822CS SDIO wifi/Bluetooth
> combo card. The error he observed is identical to what has been fixed
> in commit e967229ead0e ("wifi: rtw88: sdio: Check the HISR RX_REQUEST
> bit in rtw_sdio_rx_isr()") but that commit didn't fix Lukas' problem.
> 
> Lukas found that disabling or limiting RX aggregation fixes the problem
> for him. The discussion that followed raised a few key points which
> have an impact on this problem:
> - The Amlogic A311D (G12B) SoC has a hardware bug in the SDIO controller
>   which prevents DMA transfers. Instead all transfers need to go through
>   the controller SRAM which limits transfers to 1536 bytes
> - rtw88 chips don't split incoming (RX) packets, so if a big packet is
>   received this is forwarded to the host in its original form
> - rtw88 chips can do RX aggregation, meaning multiple incoming
>   packets can be pulled by the host from the card with one MMC/SDIO
>   transfer. This depends on settings in the REG_RXDMA_AGG_PG_TH
>   register (BIT_RXDMA_AGG_PG_TH limits the number of packets that will
>   be aggregated, BIT_DMA_AGG_TO_V1 configures a timeout for aggregation
>   and BIT_EN_PRE_CALC makes the chip honor the limits more effectively)
> 
> Use multiple consecutive reads in rtw_sdio_read_port() to limit the
> number of bytes which are copied by the host from the card in one
> MMC/SDIO transfer. This allows receiving a buffer that's larger than
> the host's max_req_size (number of bytes which can be transferred in
> one MMC/SDIO transfer). As a result of this the skb_over_panic error
> is gone as the rtw88 driver is now able to receive more than 1536 bytes
> from the card (either because the incoming packet is larger than that
> or because multiple packets have been aggregated).

I assume your conclusion is correct for all platforms, so I am adding my Reviewed-by.
However, I think it would be better if Lukas could test this patch on his
platform and give a Tested-by tag before this patch gets merged.

> 
> Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
> Reported-by: Lukas F. Hartmann <lukas@mntre.com>
> Closes: https://lore.kernel.org/linux-wireless/CAFBinCBaXtebixKbjkWKW_WXc5k=NdGNaGUjVE8NCPNxOhsb2g@mail.gmail.com/
> Suggested-by: Ping-Ke Shih <pkshih@realtek.com>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>

[...]
Ping-Ke Shih July 14, 2023, 12:34 a.m. UTC | #2
> -----Original Message-----
> From: Lukas F. Hartmann <lukas@mntre.com>
> Sent: Thursday, July 13, 2023 8:49 PM
> To: Ping-Ke Shih <pkshih@realtek.com>; Martin Blumenstingl <martin.blumenstingl@googlemail.com>;
> linux-wireless@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; jernej.skrabec@gmail.com; ulf.hansson@linaro.org; kvalo@kernel.org;
> tony0620emma@gmail.com
> Subject: RE: [PATCH] wifi: rtw88: sdio: Honor the host max_req_size in the RX path
> 
> Hi,
> 
> Ping-Ke Shih <pkshih@realtek.com> writes:
> 
> > I assume your conclusion is correct for all platforms, so I am adding my Reviewed-by.
> > However, I think it would be better if Lukas could test this patch on his
> > platform and give a Tested-by tag before this patch gets merged.
> 
> I have been testing this now more rigorously on my own laptop with
> Kernel 6.4.1 (from Debian experimental) and this patch applied. I first
> had issues with rtw_power_mode_change (and "firmware failed to leave lps
> state"), so I turned off power_save using iw. This made everything
> quiet, but unfortunately after about 1 hour of usage I get
> skb_over_panic again and I believe some memory corruption happens in the
> kernel, as I can do dmesg only once and then another dmesg will hang forever.
> (After WARNING: CPU: 4 PID: 0 at kernel/context_tracking.c:128
> ct_kernel_exit.constprop.0+0xa0/0xa8)
> 
> Here are the errors that lead up to this:
> http://dump.mntmn.com/rtw88-failure-1h-dmesg.txt

Hi Martin,

The dmesg shows that
"rtw_8822cs mmc2:0001:1: Failed to read 1536 byte(s) from SDIO port 0x000000d1"

Shouldn't we return an error code (with proper error handling) instead of
just breaking the loop? Because the 'buf' content isn't usable.

I wonder whether the approach of this patch is still not enough for Lukas' platform.

Ping-Ke
Martin Blumenstingl July 26, 2023, 5:37 p.m. UTC | #3
Hello Ping-Ke,

On Fri, Jul 14, 2023 at 2:34 AM Ping-Ke Shih <pkshih@realtek.com> wrote:
[...]
> > Here are the errors that lead up to this:
> > http://dump.mntmn.com/rtw88-failure-1h-dmesg.txt
>
> Hi Martin,
>
> The dmesg shows that
> "rtw_8822cs mmc2:0001:1: Failed to read 1536 byte(s) from SDIO port 0x000000d1"
>
> Shouldn't we return an error code (with proper error handling) instead of
> just breaking the loop? Because the 'buf' content isn't usable.
In my opinion we are properly breaking the loop:
"ret" will be non-zero so the error code is returned from
rtw_sdio_read_port() to the caller.
The (only) caller is rtw_sdio_rxfifo_recv() which sees the non-zero
return code and aborts processing.
What do you think?

> I wonder whether the approach of this patch is still not enough for Lukas' platform.
On IRC Lukas wrote:
  funny, i can reproduce skb_panic when opening this page in chromium
https://embedded.avnet.com/product/msc-sm2s-ryz/
and:
  still getting spurious skb_panics, even after disabling rx aggregation.

I haven't had the time to look into this any further yet.
Unfortunately I don't have any hardware to reproduce this problem
either, which results in this long ping-pong.
Lukas, could you please add two more prints:
- in the rtw_warn with "Failed to read %zu byte(s) from SDIO port":
please also print the ret variable (with %d) - I'm curious what the
reported error is (it could be some CRC error which would mean ret is
-EILSEQ)
- add something like the following at the end of rtw_sdio_read_port()
(right before "return ret"):

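/* Note: the read loop in this patch decrements count and advances buf,
 * so their initial values have to be saved first and used here instead.
 */
if (!ret && count > 1000) {
    printk(KERN_INFO "rtw_sdio_read_port() with %zu bytes:\n", count);
    print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1, buf, count, false);
}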

(note: I only compile-tested this)
The very last instance of this (potentially spammy) output will contain
the full buffer that's causing the problem.


Best regards,
Martin
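
The error path discussed above (a failed chunk breaks the loop, the
non-zero ret reaches the caller, and the caller stops processing) can be
sketched in userspace. This is an illustration under assumptions, not the
driver code: struct mock_bus, mock_transfer(), sketch_read_port() and
sketch_rxfifo_recv() are hypothetical stand-ins, and -EILSEQ is only the
example error value mentioned in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for host->max_req_size. */
#define SKETCH_MAX_REQ_SIZE 1536

/* Hypothetical mock bus that fails one chosen transfer. */
struct mock_bus {
	int fail_at;	/* index of the transfer that fails, -1 for none */
	int transfers;	/* transfers attempted so far */
};

/* Mock of one bus transfer; returns -EILSEQ for the chosen index. */
static int mock_transfer(struct mock_bus *bus, size_t len)
{
	assert(len > 0 && len <= SKETCH_MAX_REQ_SIZE);

	if (bus->transfers++ == bus->fail_at)
		return -EILSEQ;	/* e.g. a CRC error */

	return 0;
}

/* Same loop shape as the patch: stop at the first failed chunk. */
static int sketch_read_port(struct mock_bus *bus, size_t count)
{
	int ret = 0;

	while (count > 0) {
		size_t bytes = count < SKETCH_MAX_REQ_SIZE ?
			       count : SKETCH_MAX_REQ_SIZE;

		ret = mock_transfer(bus, bytes);
		if (ret)
			break;

		count -= bytes;
	}

	return ret;
}

/* Caller side: returns 1 if the buffer would be processed, 0 if it is
 * dropped because the read reported an error.
 */
static int sketch_rxfifo_recv(struct mock_bus *bus, size_t rx_len)
{
	if (sketch_read_port(bus, rx_len))
		return 0;

	return 1;
}
```

In this sketch a failure in the second chunk of a 4000-byte read stops
after two attempted transfers, and the caller never touches the partially
filled buffer. It does not model what state the card is left in after the
aborted read.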
Ping-Ke Shih July 27, 2023, 1:44 a.m. UTC | #4
> -----Original Message-----
> From: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Sent: Thursday, July 27, 2023 1:38 AM
> To: Ping-Ke Shih <pkshih@realtek.com>
> Cc: Lukas F. Hartmann <lukas@mntre.com>; linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> jernej.skrabec@gmail.com; ulf.hansson@linaro.org; kvalo@kernel.org; tony0620emma@gmail.com
> Subject: Re: [PATCH] wifi: rtw88: sdio: Honor the host max_req_size in the RX path
> 
> Hello Ping-Ke,
> 
> On Fri, Jul 14, 2023 at 2:34 AM Ping-Ke Shih <pkshih@realtek.com> wrote:
> [...]
> > > Here are the errors that lead up to this:
> > > http://dump.mntmn.com/rtw88-failure-1h-dmesg.txt
> >
> > Hi Martin,
> >
> > The dmesg shows that
> > "rtw_8822cs mmc2:0001:1: Failed to read 1536 byte(s) from SDIO port 0x000000d1"
> >
> > Shouldn't we return an error code (with proper error handling) instead of
> > just breaking the loop? Because the 'buf' content isn't usable.
> In my opinion we are properly breaking the loop:
> "ret" will be non-zero so the error code is returned from
> rtw_sdio_read_port() to the caller.
> The (only) caller is rtw_sdio_rxfifo_recv() which sees the non-zero
> return code and aborts processing.
> What do you think?

You are correct. 

I checked the kernel log again. It may be trying to read twice for a large packet.

The first read is 1536 bytes, but it failed:
  [ 4002.096664] rtw_8822cs mmc2:0001:1: Failed to read 1536 byte(s) from SDIO port 0x000000d1

The second read is for fewer bytes and it succeeds, but the skb->data content is incorrect. Then:
  [ 4002.100140] rtw_8822cs mmc2:0001:1: unused phy status page (3)
  [ 4002.105065] rtw_8822cs mmc2:0001:1: unused phy status page (2)
  [ 4002.110862] ------------[ cut here ]------------
  [ 4002.110868] Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0

So, showing the total size (the 'count' argument) might help to find the cause
or a workaround.

Ping-Ke
Kalle Valo Aug. 1, 2023, 2:11 p.m. UTC | #5
Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:

> Lukas reports skb_over_panic errors on his Banana Pi BPI-CM4 which comes
> with an Amlogic A311D (G12B) SoC and a RTL8822CS SDIO wifi/Bluetooth
> combo card. The error he observed is identical to what has been fixed
> in commit e967229ead0e ("wifi: rtw88: sdio: Check the HISR RX_REQUEST
> bit in rtw_sdio_rx_isr()") but that commit didn't fix Lukas' problem.
> 
> Lukas found that disabling or limiting RX aggregation fixes the problem
> for him. The discussion that followed raised a few key points which
> have an impact on this problem:
> - The Amlogic A311D (G12B) SoC has a hardware bug in the SDIO controller
>   which prevents DMA transfers. Instead all transfers need to go through
>   the controller SRAM which limits transfers to 1536 bytes
> - rtw88 chips don't split incoming (RX) packets, so if a big packet is
>   received this is forwarded to the host in its original form
> - rtw88 chips can do RX aggregation, meaning multiple incoming
>   packets can be pulled by the host from the card with one MMC/SDIO
>   transfer. This depends on settings in the REG_RXDMA_AGG_PG_TH
>   register (BIT_RXDMA_AGG_PG_TH limits the number of packets that will
>   be aggregated, BIT_DMA_AGG_TO_V1 configures a timeout for aggregation
>   and BIT_EN_PRE_CALC makes the chip honor the limits more effectively)
> 
> Use multiple consecutive reads in rtw_sdio_read_port() to limit the
> number of bytes which are copied by the host from the card in one
> MMC/SDIO transfer. This allows receiving a buffer that's larger than
> the host's max_req_size (number of bytes which can be transferred in
> one MMC/SDIO transfer). As a result of this the skb_over_panic error
> is gone as the rtw88 driver is now able to receive more than 1536 bytes
> from the card (either because the incoming packet is larger than that
> or because multiple packets have been aggregated).
> 
> Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
> Reported-by: Lukas F. Hartmann <lukas@mntre.com>
> Closes: https://lore.kernel.org/linux-wireless/CAFBinCBaXtebixKbjkWKW_WXc5k=NdGNaGUjVE8NCPNxOhsb2g@mail.gmail.com/
> Suggested-by: Ping-Ke Shih <pkshih@realtek.com>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>

Ping, should I take or drop the patch? It wasn't quite clear to me.
Ping-Ke Shih Aug. 2, 2023, 12:27 a.m. UTC | #6
> -----Original Message-----
> From: Kalle Valo <kvalo@kernel.org>
> Sent: Tuesday, August 1, 2023 10:11 PM
> To: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org; jernej.skrabec@gmail.com; Ping-Ke Shih
> <pkshih@realtek.com>; ulf.hansson@linaro.org; tony0620emma@gmail.com; Martin Blumenstingl
> <martin.blumenstingl@googlemail.com>; Lukas F . Hartmann <lukas@mntre.com>
> Subject: Re: [PATCH] wifi: rtw88: sdio: Honor the host max_req_size in the RX path
> 
> Ping, should I take or drop the patch? It wasn't quite clear to me.

Please drop this patch, because it still does not fix the problem on Lukas' platform.
I gave my Reviewed-by too early. Sorry for that.

Ping-Ke

Patch

diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c
index 2c1fb2dabd40..b19262ec5d8c 100644
--- a/drivers/net/wireless/realtek/rtw88/sdio.c
+++ b/drivers/net/wireless/realtek/rtw88/sdio.c
@@ -500,19 +500,31 @@  static u32 rtw_sdio_get_tx_addr(struct rtw_dev *rtwdev, size_t size,
 static int rtw_sdio_read_port(struct rtw_dev *rtwdev, u8 *buf, size_t count)
 {
 	struct rtw_sdio *rtwsdio = (struct rtw_sdio *)rtwdev->priv;
+	struct mmc_host *host = rtwsdio->sdio_func->card->host;
 	bool bus_claim = rtw_sdio_bus_claim_needed(rtwsdio);
 	u32 rxaddr = rtwsdio->rx_addr++;
+	size_t bytes;
 	int ret;
 
 	if (bus_claim)
 		sdio_claim_host(rtwsdio->sdio_func);
 
-	ret = sdio_memcpy_fromio(rtwsdio->sdio_func, buf,
-				 RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr), count);
-	if (ret)
-		rtw_warn(rtwdev,
-			 "Failed to read %zu byte(s) from SDIO port 0x%08x",
-			 count, rxaddr);
+	while (count > 0) {
+		bytes = min_t(size_t, host->max_req_size, count);
+
+		ret = sdio_memcpy_fromio(rtwsdio->sdio_func, buf,
+					 RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr),
+					 bytes);
+		if (ret) {
+			rtw_warn(rtwdev,
+				 "Failed to read %zu byte(s) from SDIO port 0x%08x",
+				 bytes, rxaddr);
+			break;
+		}
+
+		count -= bytes;
+		buf += bytes;
+	}
 
 	if (bus_claim)
 		sdio_release_host(rtwsdio->sdio_func);