Message ID | 20210421045246.215779-1-yoshihiro.shimoda.uh@renesas.com |
---|---|
State | New |
Headers | show |
Series | net: renesas: ravb: Fix a stuck issue when a lot of frames are received | expand |
Hello! On 4/21/21 9:00 PM, patchwork-bot+netdevbpf@kernel.org wrote: [...] >> When a lot of frames were received in the short term, the driver >> caused a stuck of receiving until a new frame was received. For example, >> the following command from other device could cause this issue. >> >> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> >> >> The previous code always cleared the interrupt flag of RX but checks >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call >> ravb_rx() in the next time until a new RX frame was received if >> ravb_rx() returned true. To fix the issue, always calls ravb_rx() >> regardless the interrupt flags condition. >> >> [...] > > Here is the summary with links: > - net: renesas: ravb: Fix a stuck issue when a lot of frames are received > https://git.kernel.org/netdev/net/c/5718458b092b > > You are awesome, thank you! WTF is this rush?! :-/ I was going to review this patch (it didn't look well to me from th 1s glance)... [...] MBR, Sergei
From: Sergei Shtylyov <sergei.shtylyov@gmail.com> Date: Wed, 21 Apr 2021 21:14:37 +0300 > > WTF is this rush?! :-/ > I was going to review this patch (it didn't look well to me from th 1s glance)... > Timely reviews are really important. If I've inspired you to review more quickly in the future, then good. :) Just responding "I will review this don't apply yet." as quickly as you can could avoid this in the future. Thanks.
Hello! Sorry for the late reply -- the patch got into my spam folder, and I haven't seen it until this evening. Blame GMail for that. On 4/21/21 9:20 PM, David Miller wrote: >> WTF is this rush?! :-/ >> I was going to review this patch (it didn't look well to me from th 1s glance)... > Timely reviews are really important. If I've inspired you to review more quickly in the future, > then good. :) The the patch hit my mailbox in the morning, and I prepared to post comments once I log in to the Linux laptop. When I was going to start writing comments, I received your mail saying that the patch was queued -- that's all within one day... :-/ > Just responding "I will review this don't apply yet." as quickly as you can could avoid > this in the future. OK, I'll try to remember... > Thanks. MBR, Sergei
On 5/5/21 10:00 PM, Sergei Shtylyov wrote: > Sorry for the late reply -- the patch got into my spam folder, and I haven't seen it Not the patch, your reply got into spam. :-) > until this evening. Blame GMail for that. [...] >> Thanks. > > MBR, Sergei
Hello! On 4/21/21 7:52 AM, Yoshihiro Shimoda wrote: Posting a review of the already commited (over my head) patch. It would have been appropriate if the patch looked OK but it's not. :-/ > When a lot of frames were received in the short term, the driver > caused a stuck of receiving until a new frame was received. For example, > the following command from other device could cause this issue. > > $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> -l is essential here, right? Have you tried testing sh_eth sriver like that, BTW? > The previous code always cleared the interrupt flag of RX but checks > the interrupt flags in ravb_poll(). So, ravb_poll() could not call > ravb_rx() in the next time until a new RX frame was received if > ravb_rx() returned true. To fix the issue, always calls ravb_rx() > regardless the interrupt flags condition. That bacially defeats the purpose of IIUC... > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> > --- > drivers/net/ethernet/renesas/ravb_main.c | 35 ++++++++---------------- > 1 file changed, 12 insertions(+), 23 deletions(-) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index eb0c03bdb12d..cad57d58d764 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -911,31 +911,20 @@ static int ravb_poll(struct napi_struct *napi, int budget) > int q = napi - priv->napi; > int mask = BIT(q); > int quota = budget; > - u32 ris0, tis; > > - for (;;) { > - tis = ravb_read(ndev, TIS); > - ris0 = ravb_read(ndev, RIS0); > - if (!((ris0 & mask) || (tis & mask))) > - break; > + /* Processing RX Descriptor Ring */ > + /* Clear RX interrupt */ I think these 2 coments should've been collapsed... > + ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); > + if (ravb_rx(ndev, "a, q)) > + goto out; > > - /* Processing RX Descriptor Ring */ > - if (ris0 & mask) { > - /* Clear RX interrupt */ > - ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); > - if (ravb_rx(ndev, "a, q)) > - goto out; This jumps over the TX NAPI code, not good... Seems like another bug. > - } > - /* Processing TX Descriptor Ring */ > - if (tis & mask) { > - spin_lock_irqsave(&priv->lock, flags); > - /* Clear TX interrupt */ > - ravb_write(ndev, ~(mask | TIS_RESERVED), TIS); > - ravb_tx_free(ndev, q, true); > - netif_wake_subqueue(ndev, q); > - spin_unlock_irqrestore(&priv->lock, flags); > - } > - } > + /* Processing RX Descriptor Ring */ TX! > + spin_lock_irqsave(&priv->lock, flags); > + /* Clear TX interrupt */ > + ravb_write(ndev, ~(mask | TIS_RESERVED), TIS); > + ravb_tx_free(ndev, q, true); > + netif_wake_subqueue(ndev, q); > + spin_unlock_irqrestore(&priv->lock, flags); > > napi_complete(napi); > MBR, Sergei
On 08.05.2021 23:47, Sergei Shtylyov wrote: > Posting a review of the already commited (over my head) patch. It would have > been appropriate if the patch looked OK but it's not. :-/ > >> When a lot of frames were received in the short term, the driver >> caused a stuck of receiving until a new frame was received. For example, >> the following command from other device could cause this issue. >> >> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> > > -l is essential here, right? > Have you tried testing sh_eth sriver like that, BTW? It's driver! :-) >> The previous code always cleared the interrupt flag of RX but checks >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call >> ravb_rx() in the next time until a new RX frame was received if >> ravb_rx() returned true. To fix the issue, always calls ravb_rx() >> regardless the interrupt flags condition. > > That bacially defeats the purpose of IIUC... ^ NAPI, I was sure I typed NAPI here, yet it got lost in the edits. :-) >> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") >> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> [...] MBR, Sergei
Hi Sergei, > From: Sergei Shtylyov, Sent: Sunday, May 9, 2021 7:22 PM > > On 08.05.2021 23:47, Sergei Shtylyov wrote: > > > Posting a review of the already commited (over my head) patch. It would have > > been appropriate if the patch looked OK but it's not. :-/ > > > >> When a lot of frames were received in the short term, the driver > >> caused a stuck of receiving until a new frame was received. For example, > >> the following command from other device could cause this issue. > >> > >> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> > > > > -l is essential here, right? Yes. > > Have you tried testing sh_eth sriver like that, BTW? > > It's driver! :-) I have not tried testing sh_eth driver yet. I'll test it after I got an actual board. > >> The previous code always cleared the interrupt flag of RX but checks > >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call > >> ravb_rx() in the next time until a new RX frame was received if > >> ravb_rx() returned true. To fix the issue, always calls ravb_rx() > >> regardless the interrupt flags condition. > > > > That bacially defeats the purpose of IIUC... > ^ NAPI, > > I was sure I typed NAPI here, yet it got lost in the edits. :-) I could not understand "that" (calling ravb_rx() regardless the interrupt flags condition) defeats the purpose of NAPI. According to an article on the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation". In poll(), the interrupts are already disabled, and ravb_rx() will check the descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC. [1] https://wiki.linuxfoundation.org/networking/napi Best regards, Yoshihiro Shimoda
Hi again, > From: Yoshihiro Shimoda, Sent: Monday, May 10, 2021 7:30 PM > > > From: Sergei Shtylyov, Sent: Sunday, May 9, 2021 7:22 PM > > > > On 08.05.2021 23:47, Sergei Shtylyov wrote: > > > > > Posting a review of the already commited (over my head) patch. It would have > > > been appropriate if the patch looked OK but it's not. :-/ > > > > > >> When a lot of frames were received in the short term, the driver > > >> caused a stuck of receiving until a new frame was received. For example, > > >> the following command from other device could cause this issue. > > >> > > >> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> > > > > > > -l is essential here, right? > > Yes. > > > > Have you tried testing sh_eth sriver like that, BTW? > > > > It's driver! :-) > > I have not tried testing sh_eth driver yet. I'll test it after I got an actual board. I could reproduce a similar issue on R-Car Gen2 with sh_eth driver if the RX ring size is 1024 like the ravb driver. Also, I confirmed a similar patch for sh_eth driver could fix the issue. So, I'll send a patch later. Best regards, Yoshihiro Shimoda > > >> The previous code always cleared the interrupt flag of RX but checks > > >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call > > >> ravb_rx() in the next time until a new RX frame was received if > > >> ravb_rx() returned true. To fix the issue, always calls ravb_rx() > > >> regardless the interrupt flags condition. > > > > > > That bacially defeats the purpose of IIUC... > > ^ NAPI, > > > > I was sure I typed NAPI here, yet it got lost in the edits. :-) > > I could not understand "that" (calling ravb_rx() regardless the interrupt > flags condition) defeats the purpose of NAPI. According to an article on > the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation". > In poll(), the interrupts are already disabled, and ravb_rx() will check the > descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC. > > [1] > https://wiki.linuxfoundation.org/networking/napi > > Best regards, > Yoshihiro Shimoda
Hello! On 5/10/21 1:29 PM, Yoshihiro Shimoda wrote: >>> Posting a review of the already commited (over my head) patch. It would have >>> been appropriate if the patch looked OK but it's not. :-/ >>> >>>> When a lot of frames were received in the short term, the driver >>>> caused a stuck of receiving until a new frame was received. For example, >>>> the following command from other device could cause this issue. >>>> >>>> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> >>> >>> -l is essential here, right? > > Yes. > >>> Have you tried testing sh_eth sriver like that, BTW? >> >> It's driver! :-) > > I have not tried testing sh_eth driver yet. I'll test it after I got an actual board. Now you've got it, let's not rush forth with the fix this time. >>>> The previous code always cleared the interrupt flag of RX but checks >>>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call >>>> ravb_rx() in the next time until a new RX frame was received if >>>> ravb_rx() returned true. To fix the issue, always calls ravb_rx() >>>> regardless the interrupt flags condition. >>> >>> That bacially defeats the purpose of IIUC... >> ^ NAPI, >> >> I was sure I typed NAPI here, yet it got lost in the edits. :-) > > I could not understand "that" (calling ravb_rx() regardless the interrupt > flags condition) defeats the purpose of NAPI. According to an article on > the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation". Thank you for the pointer, BTW! Would have helped me with enabling NAPI in sh_eth (and ravb) drivers... > In poll(), the interrupts are already disabled, and ravb_rx() will check the > descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC. I think we'll still have the short race window, described in section 5.1 of this doc. So perhaps what we should do is changing the order of the code in the poll() method, not eliminating the loops totally. Thoughts? > [1] > https://wiki.linuxfoundation.org/networking/napi > > Best regards, > Yoshihiro Shimoda MBR, Sergei
Hello! > From: Sergei Shtylyov, Sent: Tuesday, May 18, 2021 4:36 AM > > On 5/10/21 1:29 PM, Yoshihiro Shimoda wrote: > > >>> Posting a review of the already commited (over my head) patch. It would have > >>> been appropriate if the patch looked OK but it's not. :-/ > >>> > >>>> When a lot of frames were received in the short term, the driver > >>>> caused a stuck of receiving until a new frame was received. For example, > >>>> the following command from other device could cause this issue. > >>>> > >>>> $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> > >>> > >>> -l is essential here, right? > > > > Yes. > > > >>> Have you tried testing sh_eth sriver like that, BTW? > >> > >> It's driver! :-) > > > > I have not tried testing sh_eth driver yet. I'll test it after I got an actual board. > > Now you've got it, let's not rush forth with the fix this time. I sent a report yesterday: https://patchwork.kernel.org/project/linux-renesas-soc/patch/20210421045246.215779-1-yoshihiro.shimoda.uh@renesas.com/#24181167 > >>>> The previous code always cleared the interrupt flag of RX but checks > >>>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call > >>>> ravb_rx() in the next time until a new RX frame was received if > >>>> ravb_rx() returned true. To fix the issue, always calls ravb_rx() > >>>> regardless the interrupt flags condition. > >>> > >>> That bacially defeats the purpose of IIUC... > >> ^ NAPI, > >> > >> I was sure I typed NAPI here, yet it got lost in the edits. :-) > > > > I could not understand "that" (calling ravb_rx() regardless the interrupt > > flags condition) defeats the purpose of NAPI. According to an article on > > the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation". > > Thank you for the pointer, BTW! Would have helped me with enabling NAPI in sh_eth > (and ravb) drivers... > > > In poll(), the interrupts are already disabled, and ravb_rx() will check the > > descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC. > > I think we'll still have the short race window, described in section 5.1 > of this doc. So perhaps what we should do is changing the order of the code in > the poll() method, not eliminating the loops totally. Thoughts? The ravb hardware acts as "non-level sensitive IRQs". However, fortunately, the hardware can set an interrupt flag even if the interrupt is masked. So, I don't think this patch have any race window. Best regards, Yoshihiro Shimoda > > [1] > > > https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.linuxfoundation.org%2Fnetworking%2Fnapi&d > ata=04%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C0102c1f2995947bcca1608d9196af978%7C53d82571da1947e49cb4625a166a4a > 2a%7C0%7C0%7C637568769530134169%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6 > Mn0%3D%7C1000&sdata=47kgAmI3d%2Fz%2BHunT0a8bzHRRQk1VdnxRETSExLkTrdI%3D&reserved=0 > > > > Best regards, > > Yoshihiro Shimoda > > MBR, Sergei
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index eb0c03bdb12d..cad57d58d764 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -911,31 +911,20 @@ static int ravb_poll(struct napi_struct *napi, int budget) int q = napi - priv->napi; int mask = BIT(q); int quota = budget; - u32 ris0, tis; - for (;;) { - tis = ravb_read(ndev, TIS); - ris0 = ravb_read(ndev, RIS0); - if (!((ris0 & mask) || (tis & mask))) - break; + /* Processing RX Descriptor Ring */ + /* Clear RX interrupt */ + ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); + if (ravb_rx(ndev, "a, q)) + goto out; - /* Processing RX Descriptor Ring */ - if (ris0 & mask) { - /* Clear RX interrupt */ - ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); - if (ravb_rx(ndev, "a, q)) - goto out; - } - /* Processing TX Descriptor Ring */ - if (tis & mask) { - spin_lock_irqsave(&priv->lock, flags); - /* Clear TX interrupt */ - ravb_write(ndev, ~(mask | TIS_RESERVED), TIS); - ravb_tx_free(ndev, q, true); - netif_wake_subqueue(ndev, q); - spin_unlock_irqrestore(&priv->lock, flags); - } - } + /* Processing RX Descriptor Ring */ + spin_lock_irqsave(&priv->lock, flags); + /* Clear TX interrupt */ + ravb_write(ndev, ~(mask | TIS_RESERVED), TIS); + ravb_tx_free(ndev, q, true); + netif_wake_subqueue(ndev, q); + spin_unlock_irqrestore(&priv->lock, flags); napi_complete(napi);
When a lot of frames were received in the short term, the driver caused a stuck of receiving until a new frame was received. For example, the following command from other device could cause this issue. $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress> The previous code always cleared the interrupt flag of RX but checks the interrupt flags in ravb_poll(). So, ravb_poll() could not call ravb_rx() in the next time until a new RX frame was received if ravb_rx() returned true. To fix the issue, always calls ravb_rx() regardless the interrupt flags condition. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> --- drivers/net/ethernet/renesas/ravb_main.c | 35 ++++++++---------------- 1 file changed, 12 insertions(+), 23 deletions(-)