diff mbox series

net: stmmac: Don't call _irqoff() with hardirqs enabled

Message ID 20201008162749.860521-1-john@metanate.com
State New
Headers show
Series net: stmmac: Don't call _irqoff() with hardirqs enabled | expand

Commit Message

John Keeping Oct. 8, 2020, 4:27 p.m. UTC
With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
enabled so we cannot call __napi_schedule_irqoff().  Under lockdep it
leads to:

	------------[ cut here ]------------
	WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
	IRQs not disabled as expected
	Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
	CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
	Hardware name: Rockchip (Device Tree)
	[<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
	[<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
	[<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
	[<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
	[<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
	[<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
	[<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
	[<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
	[<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
	[<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
	Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
	5fa0:                                     00000000 00000000 00000000 00000000
	5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
	irq event stamp: 48
	hardirqs last  enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
	hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
	softirqs last  enabled at (0): [<c011e770>] copy_process+0x550/0x1654
	softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
	---[ end trace 0000000000000002 ]---

Use __napi_schedule() instead which will save & restore the interrupt
state.

Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
Signed-off-by: John Keeping <john@metanate.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Vladimir Oltean Oct. 8, 2020, 11:46 p.m. UTC | #1
On Thu, Oct 08, 2020 at 05:27:49PM +0100, John Keeping wrote:
> With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> enabled so we cannot call __napi_schedule_irqoff().  Under lockdep it
> leads to:
> 
> 	------------[ cut here ]------------
> 	WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> 	IRQs not disabled as expected
> 	Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> 	CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> 	Hardware name: Rockchip (Device Tree)
> 	[<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> 	[<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> 	[<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> 	[<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> 	[<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> 	[<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> 	[<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> 	[<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> 	[<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> 	[<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> 	Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> 	5fa0:                                     00000000 00000000 00000000 00000000
> 	5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 	5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> 	irq event stamp: 48
> 	hardirqs last  enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> 	hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> 	softirqs last  enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> 	softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> 	---[ end trace 0000000000000002 ]---
> 
> Use __napi_schedule() instead which will save & restore the interrupt
> state.
> 
> Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
> Signed-off-by: John Keeping <john@metanate.com>
> ---

Don't get me wrong, this is so cool that the new lockdep warning is really
helping out finding real bugs, but the patch that adds that warning
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=cdabce2e3dff7e4bcef73473987618569d178af3)
isn't in 5.4.69-rt39, is it?
John Keeping Oct. 9, 2020, 9:59 a.m. UTC | #2
On Fri, 9 Oct 2020 02:46:09 +0300
Vladimir Oltean <olteanv@gmail.com> wrote:

> On Thu, Oct 08, 2020 at 05:27:49PM +0100, John Keeping wrote:
> > With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> > enabled so we cannot call __napi_schedule_irqoff().  Under lockdep it
> > leads to:
> > 
> > 	------------[ cut here ]------------
> > 	WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> > 	IRQs not disabled as expected
> > 	Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> > 	CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> > 	Hardware name: Rockchip (Device Tree)
> > 	[<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> > 	[<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> > 	[<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> > 	[<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> > 	[<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> > 	[<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> > 	[<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> > 	[<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> > 	[<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> > 	[<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> > 	Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> > 	5fa0:                                     00000000 00000000 00000000 00000000
> > 	5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 	5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > 	irq event stamp: 48
> > 	hardirqs last  enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> > 	hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> > 	softirqs last  enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> > 	softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> > 	---[ end trace 0000000000000002 ]---
> > 
> > Use __napi_schedule() instead which will save & restore the interrupt
> > state.
> > 
> > Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
> > Signed-off-by: John Keeping <john@metanate.com>
> > ---  
> 
> Don't get me wrong, this is so cool that the new lockdep warning is really
> helping out finding real bugs, but the patch that adds that warning
> (https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=cdabce2e3dff7e4bcef73473987618569d178af3)
> isn't in 5.4.69-rt39, is it?

No, it's not, although I would have saved several days debugging if it
was!  I backported the lockdep warning to prove that it caught this
issue.

The evidence it is possible to see on vanilla 5.4.x is:

	$ trace-cmd report -l
	irq/43-e-280     0....2    74.017658: softirq_raise:        vec=3 [action=NET_RX]

Note the missing "d" where this should be "0d...2" to indicate hardirqs
disabled.


Regards,
John
Vladimir Oltean Oct. 9, 2020, 10:12 a.m. UTC | #3
On Fri, Oct 09, 2020 at 10:59:45AM +0100, John Keeping wrote:
> No, it's not, although I would have saved several days debugging if it
> was!  I backported the lockdep warning to prove that it caught this
> issue.
>
> The evidence it is possible to see on vanilla 5.4.x is:
>
> 	$ trace-cmd report -l
> 	irq/43-e-280     0....2    74.017658: softirq_raise:        vec=3 [action=NET_RX]
>
> Note the missing "d" where this should be "0d...2" to indicate hardirqs
> disabled.

Cool, makes sense.
Heiner Kallweit Oct. 9, 2020, 2:54 p.m. UTC | #4
On 08.10.2020 18:27, John Keeping wrote:
> With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> enabled so we cannot call __napi_schedule_irqoff().  Under lockdep it
> leads to:
> 
> 	------------[ cut here ]------------
> 	WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> 	IRQs not disabled as expected
> 	Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> 	CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> 	Hardware name: Rockchip (Device Tree)
> 	[<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> 	[<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> 	[<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> 	[<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> 	[<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> 	[<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> 	[<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> 	[<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> 	[<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> 	[<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> 	Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> 	5fa0:                                     00000000 00000000 00000000 00000000
> 	5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 	5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> 	irq event stamp: 48
> 	hardirqs last  enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> 	hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> 	softirqs last  enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> 	softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> 	---[ end trace 0000000000000002 ]---
> 
> Use __napi_schedule() instead which will save & restore the interrupt
> state.
> 
I'm thinking about a __napi_schedule version that disables hard irq's
conditionally, based on variable force_irqthreads, exported by the irq
subsystem. This would allow to behave correctly with threadirqs set,
whilst not loosing the _irqoff benefit with threadirqs unset.
Let me come up with a proposal.
Jakub Kicinski Oct. 9, 2020, 3:58 p.m. UTC | #5
On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
> I'm thinking about a __napi_schedule version that disables hard irq's
> conditionally, based on variable force_irqthreads, exported by the irq
> subsystem. This would allow to behave correctly with threadirqs set,
> whilst not loosing the _irqoff benefit with threadirqs unset.
> Let me come up with a proposal.

I think you'd need to make napi_schedule_irqoff() behave like that,
right?  Are there any uses of napi_schedule_irqoff() that are disabling
irqs and not just running from an irq handler?
Heiner Kallweit Oct. 9, 2020, 4:06 p.m. UTC | #6
On 09.10.2020 17:58, Jakub Kicinski wrote:
> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>> I'm thinking about a __napi_schedule version that disables hard irq's
>> conditionally, based on variable force_irqthreads, exported by the irq
>> subsystem. This would allow to behave correctly with threadirqs set,
>> whilst not loosing the _irqoff benefit with threadirqs unset.
>> Let me come up with a proposal.
> 
> I think you'd need to make napi_schedule_irqoff() behave like that,
> right?  Are there any uses of napi_schedule_irqoff() that are disabling
> irqs and not just running from an irq handler?
> 
Right, the best approach depends on the answer to the latter question.
I didn't check this yet, therefore I described the least intrusive approach.
Heiner Kallweit Oct. 10, 2020, 1:08 p.m. UTC | #7
On 09.10.2020 18:06, Heiner Kallweit wrote:
> On 09.10.2020 17:58, Jakub Kicinski wrote:
>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>> conditionally, based on variable force_irqthreads, exported by the irq
>>> subsystem. This would allow to behave correctly with threadirqs set,
>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>> Let me come up with a proposal.
>>
>> I think you'd need to make napi_schedule_irqoff() behave like that,
>> right?  Are there any uses of napi_schedule_irqoff() that are disabling
>> irqs and not just running from an irq handler?
>>
> Right, the best approach depends on the answer to the latter question.
> I didn't check this yet, therefore I described the least intrusive approach.
> 

With some help from coccinelle I identified the following functions that
call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
from an irq handler (at least not at the first glance).

dpaa2_caam_fqdan_cb
qede_simd_fp_handler
mlx4_en_rx_irq
mlx4_en_tx_irq
qeth_qdio_poll
netvsc_channel_cb
napi_watchdog
Jakub Kicinski Oct. 10, 2020, 3:22 p.m. UTC | #8
On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
> On 09.10.2020 18:06, Heiner Kallweit wrote:
> > On 09.10.2020 17:58, Jakub Kicinski wrote:  
> >> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:  
> >>> I'm thinking about a __napi_schedule version that disables hard irq's
> >>> conditionally, based on variable force_irqthreads, exported by the irq
> >>> subsystem. This would allow to behave correctly with threadirqs set,
> >>> whilst not loosing the _irqoff benefit with threadirqs unset.
> >>> Let me come up with a proposal.  
> >>
> >> I think you'd need to make napi_schedule_irqoff() behave like that,
> >> right?  Are there any uses of napi_schedule_irqoff() that are disabling
> >> irqs and not just running from an irq handler?
> >>  
> > Right, the best approach depends on the answer to the latter question.
> > I didn't check this yet, therefore I described the least intrusive approach.
> >   
> 
> With some help from coccinelle I identified the following functions that
> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
> from an irq handler (at least not at the first glance).
> 
> dpaa2_caam_fqdan_cb
> qede_simd_fp_handler
> mlx4_en_rx_irq
> mlx4_en_tx_irq

Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
AFAICT:

static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
  mlx4_eq_int()
    mlx4_cq_completion
      cq->comp()

> qeth_qdio_poll
> netvsc_channel_cb
> napi_watchdog

This one runs from a hrtimer, which I believe will be a hard irq
context on anything but RT. I could be wrong.
Heiner Kallweit Oct. 11, 2020, 9:24 a.m. UTC | #9
On 10.10.2020 17:22, Jakub Kicinski wrote:
> On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
>> On 09.10.2020 18:06, Heiner Kallweit wrote:
>>> On 09.10.2020 17:58, Jakub Kicinski wrote:  
>>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:  
>>>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>>>> conditionally, based on variable force_irqthreads, exported by the irq
>>>>> subsystem. This would allow to behave correctly with threadirqs set,
>>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>>>> Let me come up with a proposal.  
>>>>
>>>> I think you'd need to make napi_schedule_irqoff() behave like that,
>>>> right?  Are there any uses of napi_schedule_irqoff() that are disabling
>>>> irqs and not just running from an irq handler?
>>>>  
>>> Right, the best approach depends on the answer to the latter question.
>>> I didn't check this yet, therefore I described the least intrusive approach.
>>>   
>>
>> With some help from coccinelle I identified the following functions that
>> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
>> from an irq handler (at least not at the first glance).
>>
>> dpaa2_caam_fqdan_cb
>> qede_simd_fp_handler
>> mlx4_en_rx_irq
>> mlx4_en_tx_irq
> 
> Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> AFAICT:
> 
> static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
>   mlx4_eq_int()
>     mlx4_cq_completion
>       cq->comp()
> 
>> qeth_qdio_poll
>> netvsc_channel_cb
>> napi_watchdog
> 
> This one runs from a hrtimer, which I believe will be a hard irq
> context on anything but RT. I could be wrong.
> 

A similar discussion can be found e.g. here:
https://lore.kernel.org/netdev/20191126222013.1904785-1-bigeasy@linutronix.de/
However I don't see any actual outcome.
Heiner Kallweit Oct. 11, 2020, 1:42 p.m. UTC | #10
On 10.10.2020 17:22, Jakub Kicinski wrote:
> On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
>> On 09.10.2020 18:06, Heiner Kallweit wrote:
>>> On 09.10.2020 17:58, Jakub Kicinski wrote:  
>>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:  
>>>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>>>> conditionally, based on variable force_irqthreads, exported by the irq
>>>>> subsystem. This would allow to behave correctly with threadirqs set,
>>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>>>> Let me come up with a proposal.  
>>>>
>>>> I think you'd need to make napi_schedule_irqoff() behave like that,
>>>> right?  Are there any uses of napi_schedule_irqoff() that are disabling
>>>> irqs and not just running from an irq handler?
>>>>  
>>> Right, the best approach depends on the answer to the latter question.
>>> I didn't check this yet, therefore I described the least intrusive approach.
>>>   
>>
>> With some help from coccinelle I identified the following functions that
>> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
>> from an irq handler (at least not at the first glance).
>>
>> dpaa2_caam_fqdan_cb
>> qede_simd_fp_handler
>> mlx4_en_rx_irq
>> mlx4_en_tx_irq
> 
> Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> AFAICT:
> 
> static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
>   mlx4_eq_int()
>     mlx4_cq_completion
>       cq->comp()
> 
>> qeth_qdio_poll
>> netvsc_channel_cb
>> napi_watchdog
> 
> This one runs from a hrtimer, which I believe will be a hard irq
> context on anything but RT. I could be wrong.
> 

Typically forced irq threading will not be enabled, therefore going
back to use napi_schedule() in drivers in most cases will cause
losing the benefit of the irqoff version. Something like the following
should be better. Only small drawback I see is that in case of forced
irq threading hrtimers will still run in hardirq context and we lose
the benefit of the irqoff version in napi_watchdog().

diff --git a/net/core/dev.c b/net/core/dev.c
index a146bac84..7d18560b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
  */
 void __napi_schedule_irqoff(struct napi_struct *n)
 {
-	____napi_schedule(this_cpu_ptr(&softnet_data), n);
+	/* hard irqs may not be masked in case of forced irq threading */
+	if (force_irqthreads)
+		__napi_schedule(n);
+	else
+		____napi_schedule(this_cpu_ptr(&softnet_data), n);
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
Jakub Kicinski Oct. 11, 2020, 3:56 p.m. UTC | #11
On Sun, 11 Oct 2020 15:42:24 +0200 Heiner Kallweit wrote:
> On 10.10.2020 17:22, Jakub Kicinski wrote:
> > On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:  
> >> On 09.10.2020 18:06, Heiner Kallweit wrote:  
> >>> On 09.10.2020 17:58, Jakub Kicinski wrote:    
> >>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:    
> >>>>> I'm thinking about a __napi_schedule version that disables hard irq's
> >>>>> conditionally, based on variable force_irqthreads, exported by the irq
> >>>>> subsystem. This would allow to behave correctly with threadirqs set,
> >>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
> >>>>> Let me come up with a proposal.    
> >>>>
> >>>> I think you'd need to make napi_schedule_irqoff() behave like that,
> >>>> right?  Are there any uses of napi_schedule_irqoff() that are disabling
> >>>> irqs and not just running from an irq handler?
> >>>>    
> >>> Right, the best approach depends on the answer to the latter question.
> >>> I didn't check this yet, therefore I described the least intrusive approach.
> >>>     
> >>
> >> With some help from coccinelle I identified the following functions that
> >> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
> >> from an irq handler (at least not at the first glance).
> >>
> >> dpaa2_caam_fqdan_cb
> >> qede_simd_fp_handler
> >> mlx4_en_rx_irq
> >> mlx4_en_tx_irq  
> > 
> > Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> > AFAICT:
> > 
> > static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> > static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
> >   mlx4_eq_int()
> >     mlx4_cq_completion
> >       cq->comp()
> >   
> >> qeth_qdio_poll
> >> netvsc_channel_cb
> >> napi_watchdog  
> > 
> > This one runs from a hrtimer, which I believe will be a hard irq
> > context on anything but RT. I could be wrong.
> >   
> 
> Typically forced irq threading will not be enabled, therefore going
> back to use napi_schedule() in drivers in most cases will cause
> losing the benefit of the irqoff version. Something like the following
> should be better. Only small drawback I see is that in case of forced
> irq threading hrtimers will still run in hardirq context and we lose
> the benefit of the irqoff version in napi_watchdog().
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a146bac84..7d18560b2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
>   */
>  void __napi_schedule_irqoff(struct napi_struct *n)
>  {
> -	____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +	/* hard irqs may not be masked in case of forced irq threading */
> +	if (force_irqthreads)
> +		__napi_schedule(n);
> +	else
> +		____napi_schedule(this_cpu_ptr(&softnet_data), n);
>  }
>  EXPORT_SYMBOL(__napi_schedule_irqoff);

Does

	if (force_irqthreads)
		local_irq_save(flags);
	____napi_schedule(this_cpu_ptr(&softnet_data), n);
	if (force_irqthreads)
		local_irq_restore(flags);

not produce more concise assembly?
Jakub Kicinski Oct. 11, 2020, 4:06 p.m. UTC | #12
On Sun, 11 Oct 2020 11:24:41 +0200 Heiner Kallweit wrote:
> >> qeth_qdio_poll
> >> netvsc_channel_cb
> >> napi_watchdog  
> > 
> > This one runs from a hrtimer, which I believe will be a hard irq
> > context on anything but RT. I could be wrong.
> >   
> 
> A similar discussion can be found e.g. here:
> https://lore.kernel.org/netdev/20191126222013.1904785-1-bigeasy@linutronix.de/
> However I don't see any actual outcome.

Interesting, hopefully Eric will chime in. I think the hrtimer issue
was solved. But I'm not actually seeing a lockdep_assert_irqs_disabled()
in __raise_softirq_irqoff() in net, so IDK what that's for?

In any case if NAPI thinks it has irqs off while they're not, and
interacts with other parts of the kernel we may be in for a game of
whack-a-mole. 

Perhaps a way around touching force_irqthreads directly in net/ would 
be some form of a helper like "theaded_local_irq_save" or such that'd
disable IRQs only if force_irqthreads == 1? Is that cheating? :)
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 220626a8d499..c331b829f60a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2145,7 +2145,7 @@  static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 			spin_lock_irqsave(&ch->lock, flags);
 			stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 1, 0);
 			spin_unlock_irqrestore(&ch->lock, flags);
-			__napi_schedule_irqoff(&ch->rx_napi);
+			__napi_schedule(&ch->rx_napi);
 		}
 	}
 
@@ -2154,7 +2154,7 @@  static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 			spin_lock_irqsave(&ch->lock, flags);
 			stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 0, 1);
 			spin_unlock_irqrestore(&ch->lock, flags);
-			__napi_schedule_irqoff(&ch->tx_napi);
+			__napi_schedule(&ch->tx_napi);
 		}
 	}