Message ID | 20220303181027.168204-1-robert.hancock@calian.com |
---|---|
State | Accepted |
Commit | 0bf476fc3624e3a72af4ba7340d430a91c18cd67 |
Headers | show |
Series | [net,v2] net: macb: Fix lost RX packet wakeup race in NAPI receive | expand |
On 03.03.2022 20:10, Robert Hancock wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > There is an oddity in the way the RSR register flags propagate to the > ISR register (and the actual interrupt output) on this hardware: it > appears that RSR register bits only result in ISR being asserted if the > interrupt was actually enabled at the time, so enabling interrupts with > RSR bits already set doesn't trigger an interrupt to be raised. There > was already a partial fix for this race in the macb_poll function where > it checked for RSR bits being set and re-triggered NAPI receive. > However, there was a still a race window between checking RSR and > actually enabling interrupts, where a lost wakeup could happen. It's > necessary to check again after enabling interrupts to see if RSR was set > just prior to the interrupt being enabled, and re-trigger receive in that > case. > > This issue was noticed in a point-to-point UDP request-response protocol > which periodically saw timeouts or abnormally high response times due to > received packets not being processed in a timely fashion. In many > applications, more packets arriving, including TCP retransmissions, would > cause the original packet to be processed, thus masking the issue. > > Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done") > Cc: stable@vger.kernel.org > Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com> > Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com> > Signed-off-by: Robert Hancock <robert.hancock@calian.com> Tested on SAMA7G5: Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> > --- > > Changes since v1: > -removed unrelated cleanup > -added notes on observed frequency of branches to comments > > drivers/net/ethernet/cadence/macb_main.c | 25 +++++++++++++++++++++++- > 1 file changed, 24 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c > index 98498a76ae16..d13f06cf0308 100644 > --- a/drivers/net/ethernet/cadence/macb_main.c > +++ b/drivers/net/ethernet/cadence/macb_main.c > @@ -1573,7 +1573,14 @@ static int macb_poll(struct napi_struct *napi, int budget) > if (work_done < budget) { > napi_complete_done(napi, work_done); > > - /* Packets received while interrupts were disabled */ > + /* RSR bits only seem to propagate to raise interrupts when > + * interrupts are enabled at the time, so if bits are already > + * set due to packets received while interrupts were disabled, > + * they will not cause another interrupt to be generated when > + * interrupts are re-enabled. > + * Check for this case here. This has been seen to happen > + * around 30% of the time under heavy network load. > + */ > status = macb_readl(bp, RSR); > if (status) { > if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) > @@ -1581,6 +1588,22 @@ static int macb_poll(struct napi_struct *napi, int budget) > napi_reschedule(napi); > } else { > queue_writel(queue, IER, bp->rx_intr_mask); > + > + /* In rare cases, packets could have been received in > + * the window between the check above and re-enabling > + * interrupts. Therefore, a double-check is required > + * to avoid losing a wakeup. This can potentially race > + * with the interrupt handler doing the same actions > + * if an interrupt is raised just after enabling them, > + * but this should be harmless. > + */ > + status = macb_readl(bp, RSR); > + if (unlikely(status)) { > + queue_writel(queue, IDR, bp->rx_intr_mask); > + if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) > + queue_writel(queue, ISR, MACB_BIT(RCOMP)); > + napi_schedule(napi); > + } > } > } > > -- > 2.31.1 >
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 98498a76ae16..d13f06cf0308 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -1573,7 +1573,14 @@ static int macb_poll(struct napi_struct *napi, int budget) if (work_done < budget) { napi_complete_done(napi, work_done); - /* Packets received while interrupts were disabled */ + /* RSR bits only seem to propagate to raise interrupts when + * interrupts are enabled at the time, so if bits are already + * set due to packets received while interrupts were disabled, + * they will not cause another interrupt to be generated when + * interrupts are re-enabled. + * Check for this case here. This has been seen to happen + * around 30% of the time under heavy network load. + */ status = macb_readl(bp, RSR); if (status) { if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) @@ -1581,6 +1588,22 @@ static int macb_poll(struct napi_struct *napi, int budget) napi_reschedule(napi); } else { queue_writel(queue, IER, bp->rx_intr_mask); + + /* In rare cases, packets could have been received in + * the window between the check above and re-enabling + * interrupts. Therefore, a double-check is required + * to avoid losing a wakeup. This can potentially race + * with the interrupt handler doing the same actions + * if an interrupt is raised just after enabling them, + * but this should be harmless. + */ + status = macb_readl(bp, RSR); + if (unlikely(status)) { + queue_writel(queue, IDR, bp->rx_intr_mask); + if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) + queue_writel(queue, ISR, MACB_BIT(RCOMP)); + napi_schedule(napi); + } } }