Message ID | c3caf60bfa78e5fdbdf483096b7174da65d1813a.1652168866.git.lukas@wunner.de |
---|---|
State | Accepted |
Commit | 792ea6a074ae7ea5ab6f1b8b31f76bb0297de66c |
Headers | show |
Series | genirq: Deduplicate WARN_ON_ONCE() in generic_handle_domain_irq() | expand |
On 2022-05-10 09:56:05 [+0200], Lukas Wunner wrote: > An example for irqchips where the warning is false positive are > USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c: They are not false positives becauseā¦ > USB gadgets are incapable of directly signaling an interrupt because > they cannot initiate a bus transaction by themselves. All communication > on the bus is initiated by the host controller, which polls a gadget's > Interrupt Endpoint in regular intervals. If an interrupt is pending, > that information is passed up the stack in softirq context, from which > a hardirq is synthesized via generic_handle_domain_irq(). they tell you that the context is wrong. From looking at gpio-dln2 this is called from USB URB's callback which is softirq. In the end dln2_gpio_event() is invoked while dln2_dev::event_cb_lock is acquired. That lock is acquired by disabling interrupts which is what gets the locking right for generic_handle_domain_irq(). If that lock lifted to spin_lock_bh() (because it is always in urb's calback context and all HCDs complete in one context unlike now) then this breaks. And PREEMPT_RT is broken already. Therefore, last week, I've been promoting generic_handle_domain_irq_safe() https://lkml.kernel.org/r/YnkfWFzvusFFktSt@linutronix.de and sadly I missed dln2. Please let me know if you have more users similar to dln2. I will add those to my list once upstream buys that interface. Sebastian
On Mon, May 16 2022 at 08:29, Sebastian Andrzej Siewior wrote: > On 2022-05-10 09:56:05 [+0200], Lukas Wunner wrote: >> An example for irqchips where the warning is false positive are >> USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c: > > They are not false positives becauseā¦ > >> USB gadgets are incapable of directly signaling an interrupt because >> they cannot initiate a bus transaction by themselves. All communication >> on the bus is initiated by the host controller, which polls a gadget's >> Interrupt Endpoint in regular intervals. If an interrupt is pending, >> that information is passed up the stack in softirq context, from which >> a hardirq is synthesized via generic_handle_domain_irq(). > > they tell you that the context is wrong. Why? These handlers can be called from any context, really. Yes. They need to be called with interrupts disabled, but that's it. The warning is checking hard interrupt context unconditionally. > From looking at gpio-dln2 this is called from USB URB's callback which > is softirq. In the end dln2_gpio_event() is invoked while > dln2_dev::event_cb_lock is acquired. That lock is acquired by > disabling interrupts which is what gets the locking right for > generic_handle_domain_irq(). If that lock lifted to spin_lock_bh() > (because it is always in urb's calback context and all HCDs complete > in one context unlike now) then this breaks. Yes, but that's a different problem. > And PREEMPT_RT is broken already. Therefore, last week, I've been > promoting generic_handle_domain_irq_safe() > https://lkml.kernel.org/r/YnkfWFzvusFFktSt@linutronix.de Well, that's just a wrapper which adds the local_irq_save(), so it's not any different from having the local_irq_save() at the callsite, unless I'm missing something. Thanks, tglx
On 2022-05-16 08:53:29 [+0200], Thomas Gleixner wrote: > > they tell you that the context is wrong. > > Why? These handlers can be called from any context, really. Yes. They > need to be called with interrupts disabled, but that's it. The warning > is checking hard interrupt context unconditionally. correct. If the context is wrong, the interrupts are usually not disabled. > > From looking at gpio-dln2 this is called from USB URB's callback which > > is softirq. In the end dln2_gpio_event() is invoked while > > dln2_dev::event_cb_lock is acquired. That lock is acquired by > > disabling interrupts which is what gets the locking right for > > generic_handle_domain_irq(). If that lock lifted to spin_lock_bh() > > (because it is always in urb's calback context and all HCDs complete > > in one context unlike now) then this breaks. > > Yes, but that's a different problem. > > > And PREEMPT_RT is broken already. Therefore, last week, I've been > > promoting generic_handle_domain_irq_safe() > > https://lkml.kernel.org/r/YnkfWFzvusFFktSt@linutronix.de > > Well, that's just a wrapper which adds the local_irq_save(), so it's not > any different from having the local_irq_save() at the callsite, unless > I'm missing something. I haven't seen a local_irq_save() at the callsite. If it is, then it is not any different. > Thanks, > > tglx Sebastian
On Mon, May 16, 2022 at 08:29:50AM +0200, Sebastian Andrzej Siewior wrote: > From looking at gpio-dln2 > this is called from USB URB's callback which is softirq. In the end > dln2_gpio_event() is invoked while dln2_dev::event_cb_lock is acquired. > That lock is acquired by disabling interrupts which is what gets the > locking right for generic_handle_domain_irq(). If that lock lifted to > spin_lock_bh() (because it is always in urb's calback context and all > HCDs complete in one context unlike now) then this breaks. I think you want WARN_ON(!irqs_disabled) to catch that, not WARN_ON_ONCE(!in_hardirq()). > and sadly I missed dln2. Please let me know if you have more users > similar to dln2. I will add those to my list once upstream buys that > interface. There's another USB GPIO irqchip queued up for 5.19 now: https://git.kernel.org/netdev/net-next/c/1ce8b37241ed And there's one more in drivers/hid/hid-cp2112.c, but that calls handle_nested_irq(), which disables hardirqs. Thanks, Lukas
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 939d21c..da3e495 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -699,7 +699,6 @@ int generic_handle_irq_safe(unsigned int irq) */ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq) { - WARN_ON_ONCE(!in_hardirq()); return handle_irq_desc(irq_resolve_mapping(domain, hwirq)); } EXPORT_SYMBOL_GPL(generic_handle_domain_irq);
Since commit 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()"), generic_handle_domain_irq() warns if called outside hardirq context, even though the function calls down to handle_irq_desc(), which warns about the same. Moreover the newly added warning is false positive if the interrupt originates from any other irqchip than x86 APIC or arm GIC/GICv3. Those are the only ones for which handle_enforce_irqctx() returns true. Per commit c16816acd086 ("genirq: Add protection against unsafe usage of generic_handle_irq()"): "In general calling generic_handle_irq() with interrupts disabled from non interrupt context is harmless. [But for] interrupt controllers like the x86 trainwrecks this is outright dangerous as it might corrupt state if an interrupt affinity change is pending." An example for irqchips where the warning is false positive are USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c: USB gadgets are incapable of directly signaling an interrupt because they cannot initiate a bus transaction by themselves. All communication on the bus is initiated by the host controller, which polls a gadget's Interrupt Endpoint in regular intervals. If an interrupt is pending, that information is passed up the stack in softirq context, from which a hardirq is synthesized via generic_handle_domain_irq(). Deduplicate the warning to eliminate false positives and speed up IRQ handling. Fixes: 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v5.16+ Cc: Mark Rutland <mark.rutland@arm.com> --- kernel/irq/irqdesc.c | 1 - 1 file changed, 1 deletion(-)