Message ID | 20210512214324.hiaiw3e2tzmsygcz@linutronix.de |
---|---|
State | New |
Headers | show |
Series | [net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT | expand |
On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote: > __napi_schedule_irqoff() is an optimized version of __napi_schedule() > which can be used where it is known that interrupts are disabled, > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer > callbacks. > > On PREEMPT_RT enabled kernels this assumptions is not true. Force- > threaded interrupt handlers and spinlocks are not disabling interrupts > and the NAPI hrtimer callback is forced into softirq context which runs > with interrupts enabled as well. > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole > game so make __napi_schedule_irqoff() invoke __napi_schedule() for > PREEMPT_RT kernels. > > The callers of ____napi_schedule() in the networking core have been > audited and are correct on PREEMPT_RT kernels as well. > > Reported-by: Juri Lelli <juri.lelli@redhat.com> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > --- > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and > an inline provided which invokes __napi_schedule(). > > This was not chosen as it creates #ifdeffery all over the place and with > the proposed solution the code reflects the documentation consistently > and in one obvious place. Blame me for that decision. No matter which variant we end up with, this needs to go into all stable RT kernels ASAP. Thanks, tglx
On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > No matter which variant we end up with, this needs to go into all stable > RT kernels ASAP. Is this in rt-devel already? I'll start pulling in whatever is in there. -- Steve
Hi, On 12/05/21 23:43, Sebastian Andrzej Siewior wrote: > __napi_schedule_irqoff() is an optimized version of __napi_schedule() > which can be used where it is known that interrupts are disabled, > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer > callbacks. > > On PREEMPT_RT enabled kernels this assumptions is not true. Force- > threaded interrupt handlers and spinlocks are not disabling interrupts > and the NAPI hrtimer callback is forced into softirq context which runs > with interrupts enabled as well. > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole > game so make __napi_schedule_irqoff() invoke __napi_schedule() for > PREEMPT_RT kernels. > > The callers of ____napi_schedule() in the networking core have been > audited and are correct on PREEMPT_RT kernels as well. > > Reported-by: Juri Lelli <juri.lelli@redhat.com> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and > an inline provided which invokes __napi_schedule(). > > This was not chosen as it creates #ifdeffery all over the place and with > the proposed solution the code reflects the documentation consistently > and in one obvious place. > > net/core/dev.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 222b1d322c969..febb23708184e 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6501,11 +6501,18 @@ EXPORT_SYMBOL(napi_schedule_prep); > * __napi_schedule_irqoff - schedule for receive > * @n: entry to schedule > * > - * Variant of __napi_schedule() assuming hard irqs are masked > + * Variant of __napi_schedule() assuming hard irqs are masked. > + * > + * On PREEMPT_RT enabled kernels this maps to __napi_schedule() > + * because the interrupt disabled assumption might not be true > + * due to force-threaded interrupts and spinlock substitution. > */ > void __napi_schedule_irqoff(struct napi_struct *n) > { > - ____napi_schedule(this_cpu_ptr(&softnet_data), n); > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > + ____napi_schedule(this_cpu_ptr(&softnet_data), n); > + else > + __napi_schedule(n); > } > EXPORT_SYMBOL(__napi_schedule_irqoff); Thanks for the patch! Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Best, Juri
On Wed, 12 May 2021 20:50:46 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 13 May 2021 00:28:02 +0200 > Thomas Gleixner <tglx@linutronix.de> wrote: > > > No matter which variant we end up with, this needs to go into all stable > > RT kernels ASAP. > > Is this in rt-devel already? > > I'll start pulling in whatever is in there. I don't see this in the rt-devel tree. The stable-rt releases always pull from there (following the stable vs mainline relationship). Is there going to be a new rt-devel release? -- Steve
Hello: This patch was applied to netdev/net-next.git (refs/heads/master): On Wed, 12 May 2021 23:43:24 +0200 you wrote: > __napi_schedule_irqoff() is an optimized version of __napi_schedule() > which can be used where it is known that interrupts are disabled, > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer > callbacks. > > On PREEMPT_RT enabled kernels this assumptions is not true. Force- > threaded interrupt handlers and spinlocks are not disabling interrupts > and the NAPI hrtimer callback is forced into softirq context which runs > with interrupts enabled as well. > > [...] Here is the summary with links: - [net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT https://git.kernel.org/netdev/net-next/c/8380c81d5c4f You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
On Thu, May 13 2021 at 12:43, Steven Rostedt wrote: > On Wed, 12 May 2021 20:50:46 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > >> On Thu, 13 May 2021 00:28:02 +0200 >> Thomas Gleixner <tglx@linutronix.de> wrote: >> >> > No matter which variant we end up with, this needs to go into all stable >> > RT kernels ASAP. >> >> Is this in rt-devel already? >> >> I'll start pulling in whatever is in there. > > I don't see this in the rt-devel tree. The stable-rt releases always pull > from there (following the stable vs mainline relationship). > > Is there going to be a new rt-devel release? Once we have time to work on it. The patch got applied to net-next, so please pick it up from there. Thanks, tglx
On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote: > On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote: > > __napi_schedule_irqoff() is an optimized version of __napi_schedule() > > which can be used where it is known that interrupts are disabled, > > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer > > callbacks. > > > > On PREEMPT_RT enabled kernels this assumptions is not true. Force- > > threaded interrupt handlers and spinlocks are not disabling interrupts > > and the NAPI hrtimer callback is forced into softirq context which runs > > with interrupts enabled as well. > > > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole > > game so make __napi_schedule_irqoff() invoke __napi_schedule() for > > PREEMPT_RT kernels. > > > > The callers of ____napi_schedule() in the networking core have been > > audited and are correct on PREEMPT_RT kernels as well. > > > > Reported-by: Juri Lelli <juri.lelli@redhat.com> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > --- > > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and > > an inline provided which invokes __napi_schedule(). > > > > This was not chosen as it creates #ifdeffery all over the place and with > > the proposed solution the code reflects the documentation consistently > > and in one obvious place. > > Blame me for that decision. > > No matter which variant we end up with, this needs to go into all stable > RT kernels ASAP. Mumble mumble. I thought we concluded that drivers used on RT can be fixed, we've already done it for a couple drivers (by which I mean two). If all the IRQ handler is doing is scheduling NAPI (which it is for modern NICs) - IRQF_NO_THREAD seems like the right option. Is there any driver you care about that we can convert to using IRQF_NO_THREAD so we can have new drivers to "do the right thing" while the old ones depend on this workaround for now? Another thing while I have your attention - ____napi_schedule() does __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread. On non-RT we get occasional NOHZ warnings when drivers schedule napi from process context, but on RT this is even more of a problem, right? ksoftirqd won't run until something else actually wakes it up?
On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote: > > On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote: > > > __napi_schedule_irqoff() is an optimized version of __napi_schedule() > > > which can be used where it is known that interrupts are disabled, > > > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer > > > callbacks. > > > > > > On PREEMPT_RT enabled kernels this assumptions is not true. Force- > > > threaded interrupt handlers and spinlocks are not disabling interrupts > > > and the NAPI hrtimer callback is forced into softirq context which runs > > > with interrupts enabled as well. > > > > > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole > > > game so make __napi_schedule_irqoff() invoke __napi_schedule() for > > > PREEMPT_RT kernels. > > > > > > The callers of ____napi_schedule() in the networking core have been > > > audited and are correct on PREEMPT_RT kernels as well. > > > > > > Reported-by: Juri Lelli <juri.lelli@redhat.com> > > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > > --- > > > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and > > > an inline provided which invokes __napi_schedule(). > > > > > > This was not chosen as it creates #ifdeffery all over the place and with > > > the proposed solution the code reflects the documentation consistently > > > and in one obvious place. > > > > Blame me for that decision. > > > > No matter which variant we end up with, this needs to go into all stable > > RT kernels ASAP. > > Mumble mumble. I thought we concluded that drivers used on RT can be > fixed, we've already done it for a couple drivers (by which I mean two). > If all the IRQ handler is doing is scheduling NAPI (which it is for > modern NICs) - IRQF_NO_THREAD seems like the right option. > > Is there any driver you care about that we can convert to using > IRQF_NO_THREAD so we can have new drivers to "do the right thing" > while the old ones depend on this workaround for now? > > > Another thing while I have your attention - ____napi_schedule() does > __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread. > On non-RT we get occasional NOHZ warnings when drivers schedule napi > from process context, but on RT this is even more of a problem, right? > ksoftirqd won't run until something else actually wakes it up? By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see that message about once a week with 4.19. Presumably any failure of ____napi_schedule() to wake ksoftirqd could only cause problems for the NET_RX softirq, so if the pending softirq is different, the cause lies elsewhere. -- Alison Chaiken Aurora Innovation
On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote: > On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote: >> > --- >> > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and >> > an inline provided which invokes __napi_schedule(). >> > >> > This was not chosen as it creates #ifdeffery all over the place and with >> > the proposed solution the code reflects the documentation consistently >> > and in one obvious place. >> >> Blame me for that decision. >> >> No matter which variant we end up with, this needs to go into all stable >> RT kernels ASAP. > > Mumble mumble. I thought we concluded that drivers used on RT can be > fixed, we've already done it for a couple drivers (by which I mean two). > If all the IRQ handler is doing is scheduling NAPI (which it is for > modern NICs) - IRQF_NO_THREAD seems like the right option. Yes. That works, but there are a bunch which do more than that IIRC. > Is there any driver you care about that we can convert to using > IRQF_NO_THREAD so we can have new drivers to "do the right thing" > while the old ones depend on this workaround for now? The start of this thread was about i40e_msix_clean_rings() which probably falls under the IRQF_NO_THREAD category, but I'm sure that there are others. So I chose the safe way for RT for now. > Another thing while I have your attention - ____napi_schedule() does > __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread. > On non-RT we get occasional NOHZ warnings when drivers schedule napi > from process context, but on RT this is even more of a problem, right? > ksoftirqd won't run until something else actually wakes it up? Correct. I sent a patch for the r8152 usb network driver today which suffers from that problem. :) As I said there, we want a (debug/lockdep) check in __napi_schedule() whether soft interrupts are disabled, but let me have a look whether that check might make more sense directly in __raise_softirq_irqoff(). Thanks, tglx
On Fri, 14 May 2021 22:16:10 +0200 Thomas Gleixner wrote: > On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote: > > On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote: > >> Blame me for that decision. > >> > >> No matter which variant we end up with, this needs to go into all stable > >> RT kernels ASAP. > > > > Mumble mumble. I thought we concluded that drivers used on RT can be > > fixed, we've already done it for a couple drivers (by which I mean two). > > If all the IRQ handler is doing is scheduling NAPI (which it is for > > modern NICs) - IRQF_NO_THREAD seems like the right option. > > Yes. That works, but there are a bunch which do more than that IIRC. > > > Is there any driver you care about that we can convert to using > > IRQF_NO_THREAD so we can have new drivers to "do the right thing" > > while the old ones depend on this workaround for now? > > The start of this thread was about i40e_msix_clean_rings() which > probably falls under the IRQF_NO_THREAD category, but I'm sure that > there are others. So I chose the safe way for RT for now. Sounds reasonable. I'll send a patch with a new helper and convert an example driver I'm sure falls into the "napi_schedule(); return;" category. I just want to make sure "the right thing to do" is accessible for people writing new drivers.
On Fri, May 14 2021 at 12:44, Alison Chaiken wrote: > On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> Another thing while I have your attention - ____napi_schedule() does >> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread. >> On non-RT we get occasional NOHZ warnings when drivers schedule napi >> from process context, but on RT this is even more of a problem, right? >> ksoftirqd won't run until something else actually wakes it up? > > By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see > that message about once a week with 4.19. Presumably any failure of > ____napi_schedule() to wake ksoftirqd could only cause problems for the > NET_RX softirq, so if the pending softirq is different, the cause lies > elsewhere. If you read the above carefully you might notice that this _IS_ about ____napi_schedule() being invoked from task context which raises NET_RX and then results in pending 08! which is NET_RX. Thanks, tglx
diff --git a/net/core/dev.c b/net/core/dev.c index 222b1d322c969..febb23708184e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6501,11 +6501,18 @@ EXPORT_SYMBOL(napi_schedule_prep); * __napi_schedule_irqoff - schedule for receive * @n: entry to schedule * - * Variant of __napi_schedule() assuming hard irqs are masked + * Variant of __napi_schedule() assuming hard irqs are masked. + * + * On PREEMPT_RT enabled kernels this maps to __napi_schedule() + * because the interrupt disabled assumption might not be true + * due to force-threaded interrupts and spinlock substitution. */ void __napi_schedule_irqoff(struct napi_struct *n) { - ____napi_schedule(this_cpu_ptr(&softnet_data), n); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + ____napi_schedule(this_cpu_ptr(&softnet_data), n); + else + __napi_schedule(n); } EXPORT_SYMBOL(__napi_schedule_irqoff);
__napi_schedule_irqoff() is an optimized version of __napi_schedule() which can be used where it is known that interrupts are disabled, e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer callbacks. On PREEMPT_RT enabled kernels this assumptions is not true. Force- threaded interrupt handlers and spinlocks are not disabling interrupts and the NAPI hrtimer callback is forced into softirq context which runs with interrupts enabled as well. Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole game so make __napi_schedule_irqoff() invoke __napi_schedule() for PREEMPT_RT kernels. The callers of ____napi_schedule() in the networking core have been audited and are correct on PREEMPT_RT kernels as well. Reported-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and an inline provided which invokes __napi_schedule(). This was not chosen as it creates #ifdeffery all over the place and with the proposed solution the code reflects the documentation consistently and in one obvious place. net/core/dev.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)