diff mbox series

net: renesas: ravb: Fix a stuck issue when a lot of frames are received

Message ID 20210421045246.215779-1-yoshihiro.shimoda.uh@renesas.com
State New
Headers show
Series net: renesas: ravb: Fix a stuck issue when a lot of frames are received | expand

Commit Message

Yoshihiro Shimoda April 21, 2021, 4:52 a.m. UTC
When a lot of frames were received in the short term, the driver
caused a stuck of receiving until a new frame was received. For example,
the following command from other device could cause this issue.

    $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

The previous code always cleared the interrupt flag of RX but checks
the interrupt flags in ravb_poll(). So, ravb_poll() could not call
ravb_rx() in the next time until a new RX frame was received if
ravb_rx() returned true. To fix the issue, always calls ravb_rx()
regardless the interrupt flags condition.

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 35 ++++++++----------------
 1 file changed, 12 insertions(+), 23 deletions(-)

Comments

Sergei Shtylyov April 21, 2021, 6:14 p.m. UTC | #1
Hello!

On 4/21/21 9:00 PM, patchwork-bot+netdevbpf@kernel.org wrote:

[...]
>> When a lot of frames were received in the short term, the driver
>> caused a stuck of receiving until a new frame was received. For example,
>> the following command from other device could cause this issue.
>>
>>     $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>
>>
>> The previous code always cleared the interrupt flag of RX but checks
>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call
>> ravb_rx() in the next time until a new RX frame was received if
>> ravb_rx() returned true. To fix the issue, always calls ravb_rx()
>> regardless the interrupt flags condition.
>>
>> [...]
> 
> Here is the summary with links:
>   - net: renesas: ravb: Fix a stuck issue when a lot of frames are received
>     https://git.kernel.org/netdev/net/c/5718458b092b
> 
> You are awesome, thank you!

   WTF is this rush?! :-/
   I was going to review this patch (it didn't look well to me from th 1s glance)...

[...]

MBR, Sergei
David Miller April 21, 2021, 6:20 p.m. UTC | #2
From: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Date: Wed, 21 Apr 2021 21:14:37 +0300

> 
>    WTF is this rush?! :-/
>    I was going to review this patch (it didn't look well to me from th 1s glance)...
> 

Timely reviews are really important.  If I've inspired you to review more quickly in the future,
then good. :)

Just responding "I will review this don't apply yet." as quickly as you can could avoid
this in the future.

Thanks.
Sergei Shtylyov May 5, 2021, 7 p.m. UTC | #3
Hello!

   Sorry for the late reply -- the patch got into my spam folder, and I haven't seen it
until this evening. Blame GMail for that.

On 4/21/21 9:20 PM, David Miller wrote:

>>    WTF is this rush?! :-/

>>    I was going to review this patch (it didn't look well to me from th 1s glance)...


> Timely reviews are really important.  If I've inspired you to review more quickly in the future,

> then good. :)


   The the patch hit my mailbox in the morning, and I prepared to post comments once I log in
to the Linux laptop. When I was going to start writing comments, I received your mail saying
that the patch was queued -- that's all within one day... :-/

> Just responding "I will review this don't apply yet." as quickly as you can could avoid

> this in the future.


   OK, I'll try to remember...

> Thanks.


MBR, Sergei
Sergei Shtylyov May 5, 2021, 8:09 p.m. UTC | #4
On 5/5/21 10:00 PM, Sergei Shtylyov wrote:

>    Sorry for the late reply -- the patch got into my spam folder, and I haven't seen it


   Not the patch, your reply got into spam. :-)

> until this evening. Blame GMail for that.


[...]

>> Thanks.

> 

> MBR, Sergei
Sergei Shtylyov May 8, 2021, 8:47 p.m. UTC | #5
Hello!

On 4/21/21 7:52 AM, Yoshihiro Shimoda wrote:

   Posting a review of the already commited (over my head) patch. It would have
been appropriate if the patch looked OK but it's not. :-/

> When a lot of frames were received in the short term, the driver

> caused a stuck of receiving until a new frame was received. For example,

> the following command from other device could cause this issue.

> 

>     $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>


   -l is essential here, right?
   Have you tried testing sh_eth sriver like that, BTW?

> The previous code always cleared the interrupt flag of RX but checks

> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

> ravb_rx() in the next time until a new RX frame was received if

> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

> regardless the interrupt flags condition.


   That bacially defeats the purpose of IIUC...

> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")

> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

> ---

>  drivers/net/ethernet/renesas/ravb_main.c | 35 ++++++++----------------

>  1 file changed, 12 insertions(+), 23 deletions(-)

> 

> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c

> index eb0c03bdb12d..cad57d58d764 100644

> --- a/drivers/net/ethernet/renesas/ravb_main.c

> +++ b/drivers/net/ethernet/renesas/ravb_main.c

> @@ -911,31 +911,20 @@ static int ravb_poll(struct napi_struct *napi, int budget)

>  	int q = napi - priv->napi;

>  	int mask = BIT(q);

>  	int quota = budget;

> -	u32 ris0, tis;

>  

> -	for (;;) {

> -		tis = ravb_read(ndev, TIS);

> -		ris0 = ravb_read(ndev, RIS0);

> -		if (!((ris0 & mask) || (tis & mask)))

> -			break;

> +	/* Processing RX Descriptor Ring */

> +	/* Clear RX interrupt */


   I think these 2 coments should've been collapsed...
 
> +	ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);

> +	if (ravb_rx(ndev, &quota, q))

> +		goto out;

>  

> -		/* Processing RX Descriptor Ring */

> -		if (ris0 & mask) {

> -			/* Clear RX interrupt */

> -			ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);

> -			if (ravb_rx(ndev, &quota, q))

> -				goto out;


   This jumps over the TX NAPI code, not good... Seems like another bug.

> -		}

> -		/* Processing TX Descriptor Ring */

> -		if (tis & mask) {

> -			spin_lock_irqsave(&priv->lock, flags);

> -			/* Clear TX interrupt */

> -			ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);

> -			ravb_tx_free(ndev, q, true);

> -			netif_wake_subqueue(ndev, q);

> -			spin_unlock_irqrestore(&priv->lock, flags);

> -		}

> -	}

> +	/* Processing RX Descriptor Ring */


   TX!

> +	spin_lock_irqsave(&priv->lock, flags);

> +	/* Clear TX interrupt */

> +	ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);

> +	ravb_tx_free(ndev, q, true);

> +	netif_wake_subqueue(ndev, q);

> +	spin_unlock_irqrestore(&priv->lock, flags);

>  

>  	napi_complete(napi);

>  


MBR, Sergei
Sergei Shtylyov May 9, 2021, 10:21 a.m. UTC | #6
On 08.05.2021 23:47, Sergei Shtylyov wrote:

>     Posting a review of the already commited (over my head) patch. It would have

> been appropriate if the patch looked OK but it's not. :-/

> 

>> When a lot of frames were received in the short term, the driver

>> caused a stuck of receiving until a new frame was received. For example,

>> the following command from other device could cause this issue.

>>

>>      $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

> 

>     -l is essential here, right?

>     Have you tried testing sh_eth sriver like that, BTW?


    It's driver! :-)

>> The previous code always cleared the interrupt flag of RX but checks

>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

>> ravb_rx() in the next time until a new RX frame was received if

>> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

>> regardless the interrupt flags condition.

> 

>     That bacially defeats the purpose of IIUC...

                                           ^ NAPI,

    I was sure I typed NAPI here, yet it got lost in the edits. :-)

>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")

>> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

[...]

MBR, Sergei
Yoshihiro Shimoda May 10, 2021, 10:29 a.m. UTC | #7
Hi Sergei,

> From: Sergei Shtylyov, Sent: Sunday, May 9, 2021 7:22 PM

> 

> On 08.05.2021 23:47, Sergei Shtylyov wrote:

> 

> >     Posting a review of the already commited (over my head) patch. It would have

> > been appropriate if the patch looked OK but it's not. :-/

> >

> >> When a lot of frames were received in the short term, the driver

> >> caused a stuck of receiving until a new frame was received. For example,

> >> the following command from other device could cause this issue.

> >>

> >>      $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

> >

> >     -l is essential here, right?


Yes.

> >     Have you tried testing sh_eth sriver like that, BTW?

> 

>     It's driver! :-)


I have not tried testing sh_eth driver yet. I'll test it after I got an actual board.

> >> The previous code always cleared the interrupt flag of RX but checks

> >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

> >> ravb_rx() in the next time until a new RX frame was received if

> >> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

> >> regardless the interrupt flags condition.

> >

> >     That bacially defeats the purpose of IIUC...

>                                            ^ NAPI,

> 

>     I was sure I typed NAPI here, yet it got lost in the edits. :-)


I could not understand "that" (calling ravb_rx() regardless the interrupt
flags condition) defeats the purpose of NAPI. According to an article on
the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation".
In poll(), the interrupts are already disabled, and ravb_rx() will check the
descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC.

[1]
https://wiki.linuxfoundation.org/networking/napi

Best regards,
Yoshihiro Shimoda
Yoshihiro Shimoda May 17, 2021, 8:23 a.m. UTC | #8
Hi again,

> From: Yoshihiro Shimoda, Sent: Monday, May 10, 2021 7:30 PM

> 

> > From: Sergei Shtylyov, Sent: Sunday, May 9, 2021 7:22 PM

> >

> > On 08.05.2021 23:47, Sergei Shtylyov wrote:

> >

> > >     Posting a review of the already commited (over my head) patch. It would have

> > > been appropriate if the patch looked OK but it's not. :-/

> > >

> > >> When a lot of frames were received in the short term, the driver

> > >> caused a stuck of receiving until a new frame was received. For example,

> > >> the following command from other device could cause this issue.

> > >>

> > >>      $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

> > >

> > >     -l is essential here, right?

> 

> Yes.

> 

> > >     Have you tried testing sh_eth sriver like that, BTW?

> >

> >     It's driver! :-)

> 

> I have not tried testing sh_eth driver yet. I'll test it after I got an actual board.


I could reproduce a similar issue on R-Car Gen2 with sh_eth driver if the RX ring size is 1024 like
the ravb driver. Also, I confirmed a similar patch for sh_eth driver could fix the issue.
So, I'll send a patch later.

Best regards,
Yoshihiro Shimoda

> > >> The previous code always cleared the interrupt flag of RX but checks

> > >> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

> > >> ravb_rx() in the next time until a new RX frame was received if

> > >> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

> > >> regardless the interrupt flags condition.

> > >

> > >     That bacially defeats the purpose of IIUC...

> >                                            ^ NAPI,

> >

> >     I was sure I typed NAPI here, yet it got lost in the edits. :-)

> 

> I could not understand "that" (calling ravb_rx() regardless the interrupt

> flags condition) defeats the purpose of NAPI. According to an article on

> the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation".

> In poll(), the interrupts are already disabled, and ravb_rx() will check the

> descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC.

> 

> [1]

> https://wiki.linuxfoundation.org/networking/napi

> 

> Best regards,

> Yoshihiro Shimoda
Sergei Shtylyov May 17, 2021, 7:35 p.m. UTC | #9
Hello!

On 5/10/21 1:29 PM, Yoshihiro Shimoda wrote:

>>>     Posting a review of the already commited (over my head) patch. It would have

>>> been appropriate if the patch looked OK but it's not. :-/

>>>

>>>> When a lot of frames were received in the short term, the driver

>>>> caused a stuck of receiving until a new frame was received. For example,

>>>> the following command from other device could cause this issue.

>>>>

>>>>      $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

>>>

>>>     -l is essential here, right?

> 

> Yes.

> 

>>>     Have you tried testing sh_eth sriver like that, BTW?

>>

>>     It's driver! :-)

> 

> I have not tried testing sh_eth driver yet. I'll test it after I got an actual board.


   Now you've got it, let's not rush forth with the fix this time.

>>>> The previous code always cleared the interrupt flag of RX but checks

>>>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

>>>> ravb_rx() in the next time until a new RX frame was received if

>>>> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

>>>> regardless the interrupt flags condition.

>>>

>>>     That bacially defeats the purpose of IIUC...

>>                                            ^ NAPI,

>>

>>     I was sure I typed NAPI here, yet it got lost in the edits. :-)

> 

> I could not understand "that" (calling ravb_rx() regardless the interrupt

> flags condition) defeats the purpose of NAPI. According to an article on

> the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation".


   Thank you for the pointer, BTW! Would have helped me with enabling NAPI in sh_eth
(and ravb) drivers...

> In poll(), the interrupts are already disabled, and ravb_rx() will check the

> descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC.


   I think we'll still have the short race window, described in section 5.1
of this doc. So perhaps what we should do is changing the order of the code in
the poll() method, not eliminating the loops totally. Thoughts?
 
> [1]

> https://wiki.linuxfoundation.org/networking/napi

> 

> Best regards,

> Yoshihiro Shimoda


MBR, Sergei
Yoshihiro Shimoda May 18, 2021, 2:20 a.m. UTC | #10
Hello!

> From: Sergei Shtylyov, Sent: Tuesday, May 18, 2021 4:36 AM

> 

> On 5/10/21 1:29 PM, Yoshihiro Shimoda wrote:

> 

> >>>     Posting a review of the already commited (over my head) patch. It would have

> >>> been appropriate if the patch looked OK but it's not. :-/

> >>>

> >>>> When a lot of frames were received in the short term, the driver

> >>>> caused a stuck of receiving until a new frame was received. For example,

> >>>> the following command from other device could cause this issue.

> >>>>

> >>>>      $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

> >>>

> >>>     -l is essential here, right?

> >

> > Yes.

> >

> >>>     Have you tried testing sh_eth sriver like that, BTW?

> >>

> >>     It's driver! :-)

> >

> > I have not tried testing sh_eth driver yet. I'll test it after I got an actual board.

> 

>    Now you've got it, let's not rush forth with the fix this time.


I sent a report yesterday:
https://patchwork.kernel.org/project/linux-renesas-soc/patch/20210421045246.215779-1-yoshihiro.shimoda.uh@renesas.com/#24181167

> >>>> The previous code always cleared the interrupt flag of RX but checks

> >>>> the interrupt flags in ravb_poll(). So, ravb_poll() could not call

> >>>> ravb_rx() in the next time until a new RX frame was received if

> >>>> ravb_rx() returned true. To fix the issue, always calls ravb_rx()

> >>>> regardless the interrupt flags condition.

> >>>

> >>>     That bacially defeats the purpose of IIUC...

> >>                                            ^ NAPI,

> >>

> >>     I was sure I typed NAPI here, yet it got lost in the edits. :-)

> >

> > I could not understand "that" (calling ravb_rx() regardless the interrupt

> > flags condition) defeats the purpose of NAPI. According to an article on

> > the Linux Foundation wiki [1], one of the purpose of NAPI is "Interrupt mitigation".

> 

>    Thank you for the pointer, BTW! Would have helped me with enabling NAPI in sh_eth

> (and ravb) drivers...

> 

> > In poll(), the interrupts are already disabled, and ravb_rx() will check the

> > descriptor's status. So, this patch keeps the "Interrupt mitigation" IIUC.

> 

>    I think we'll still have the short race window, described in section 5.1

> of this doc. So perhaps what we should do is changing the order of the code in

> the poll() method, not eliminating the loops totally. Thoughts?


The ravb hardware acts as "non-level sensitive IRQs". However, fortunately,
the hardware can set an interrupt flag even if the interrupt is masked.
So, I don't think this patch have any race window.

Best regards,
Yoshihiro Shimoda

> > [1]

> >

> https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.linuxfoundation.org%2Fnetworking%2Fnapi&amp;d

> ata=04%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C0102c1f2995947bcca1608d9196af978%7C53d82571da1947e49cb4625a166a4a

> 2a%7C0%7C0%7C637568769530134169%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6

> Mn0%3D%7C1000&amp;sdata=47kgAmI3d%2Fz%2BHunT0a8bzHRRQk1VdnxRETSExLkTrdI%3D&amp;reserved=0

> >

> > Best regards,

> > Yoshihiro Shimoda

> 

> MBR, Sergei
diff mbox series

Patch

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index eb0c03bdb12d..cad57d58d764 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -911,31 +911,20 @@  static int ravb_poll(struct napi_struct *napi, int budget)
 	int q = napi - priv->napi;
 	int mask = BIT(q);
 	int quota = budget;
-	u32 ris0, tis;
 
-	for (;;) {
-		tis = ravb_read(ndev, TIS);
-		ris0 = ravb_read(ndev, RIS0);
-		if (!((ris0 & mask) || (tis & mask)))
-			break;
+	/* Processing RX Descriptor Ring */
+	/* Clear RX interrupt */
+	ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);
+	if (ravb_rx(ndev, &quota, q))
+		goto out;
 
-		/* Processing RX Descriptor Ring */
-		if (ris0 & mask) {
-			/* Clear RX interrupt */
-			ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);
-			if (ravb_rx(ndev, &quota, q))
-				goto out;
-		}
-		/* Processing TX Descriptor Ring */
-		if (tis & mask) {
-			spin_lock_irqsave(&priv->lock, flags);
-			/* Clear TX interrupt */
-			ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);
-			ravb_tx_free(ndev, q, true);
-			netif_wake_subqueue(ndev, q);
-			spin_unlock_irqrestore(&priv->lock, flags);
-		}
-	}
+	/* Processing RX Descriptor Ring */
+	spin_lock_irqsave(&priv->lock, flags);
+	/* Clear TX interrupt */
+	ravb_write(ndev, ~(mask | TIS_RESERVED), TIS);
+	ravb_tx_free(ndev, q, true);
+	netif_wake_subqueue(ndev, q);
+	spin_unlock_irqrestore(&priv->lock, flags);
 
 	napi_complete(napi);