Message ID | 24a844c3-c2e0-c735-ccb7-83736218b548@gmail.com |
---|---|
State | New |
Headers | show |
Series | mmc: host: meson-gx-mmc: fix possible deadlock condition for preempt_rt | expand |
Hi Brad, Brad Harper <bjharper@gmail.com> writes: > Force threaded interrupts for meson_mmc_irq to prevent possible deadlock > condition > during mmc operations when using preempt_rt with 5.9.0-rc3-rt3 patches > on arm64. > > Using meson-gx-mmc with an emmc device on Hardkernel Odroid N2+ > configured with > preempt_rt resulted in the soc becoming unresponsive. With lock > checking enabled > the below inconsistent lock state was observed during boot. > > After some discussions with tglx in IRC #linux-rt the attached patch was > suggested > to remove IRQF_ONESHOT from request_threaded_irq. > This has been tested and confirmed by me to resolve both the > unresponsive soc and > the inconsistent lock state warning when using 5.9.0-rc3-rt3 on arm64 > Odroid N2+. > > Further review and testing is required to ensure there are no adverse > impacts or > concerns and that is the correct method to resolve the problem. I will > continue > to test on various amlogic devices with both standard mainline low > latency kernel > and preempt_rt kernel with -rt patches. This looks right to me, thanks for sending a fix. For broader testing, I can add this to my testing branch so it gets booted on a bunch more platform in KernelCI also. However... [...] > Signed-off-by: Brad Harper <bjharper@gmail.com> > --- > drivers/mmc/host/meson-gx-mmc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/mmc/host/meson-gx-mmc.c > b/drivers/mmc/host/meson-gx-mmc.c > index 08a3b1c05..130ac134d 100644 > --- a/drivers/mmc/host/meson-gx-mmc.c > +++ b/drivers/mmc/host/meson-gx-mmc.c > @@ -1139,7 +1139,7 @@ static int meson_mmc_probe(struct platform_device > *pdev) > host->regs + SD_EMMC_IRQ_EN); > > ret = request_threaded_irq(host->irq, meson_mmc_irq, > - meson_mmc_irq_thread, IRQF_ONESHOT, > + meson_mmc_irq_thread, 0, > dev_name(&pdev->dev), host); > if (ret) > goto err_init_clk; This patch has been mangled by your mailer, so it doesn't apply cleanly. If you're using the gmail web UI, this is a common problem. I strongly recommend using git-send-email to send directly via gmail SMTP. The git-send-email docs[1] give some examples on how to set this up. Kevin [1] https://git-scm.com/docs/git-send-email#_examples
On 2020-09-25 11:11:42 [+0200], Jerome Brunet wrote: > I'm not sure about this. > As you have explained on IRC, I understand that IRQF_ONESHOT is causing > trouble with RT as the hard IRQ part of the thread will not be migrated > to a thread. That was certainly not the intent when putting this flag. That is my understanding as well. > This seems pretty unsafe to me. Maybe we could improve the driver so it > copes with this case gracefully. ATM, I don't think it would. Running the primary handler in hardirq context is bad, because it invokes meson_mmc_request_done() at the very end. And here: - mmc_complete_cmd() -> complete_all() There is a lockdep_assert_RT_in_threaded_ctx() which should trigger. - led_trigger_event() -> led_trigger_event() This should trigger a might_sleep() warning somewhere. So removing IRQF_ONESHOT is okay but it should additionally disable the IRQ source in meson_mmc_irq() and re-enable back in meson_mmc_irq_thread(). Otherwise the IRQ remains asserted and may fire multiple times before the thread has a chance to run. Sebastian
On 2020-09-25 16:14:09 [+0200], Jerome Brunet wrote: > Looks like we need to do manually what IRQF_ONESHOT was doing for us :( IRQF_ONESHOT disables the IRQ at the irqchip level. You must ensure that the device keeps quite. Usually you mast the interrupt source at the device lee. > This brings a few questions: > > * The consideration you described is not mentioned near the description > of IRQF_ONESHOT. Maybe it should so other drivers with same intent > don't end up in the same pitfall ? From request_threaded_irq() -> | * If you want to set up a threaded irq handler for your device | * then you need to supply @handler and @thread_fn. @handler is | * still called in hard interrupt context and has to check | * whether the interrupt originates from the device. If yes it | * needs to disable the interrupt on the device and return | * IRQ_WAKE_THREAD which will wake up the handler thread and run | * @thread_fn. This split handler design is necessary to support | * shared interrupts. Just the line that saying what needs to be done before returning IRQ_WAKE_THREAD. > * Why doesn't RT move the IRQ with this flag ? Seems completly unrelated > to RT (maybe it is the same documentation problem) It is unrelated to RT. Mostly. You end up with the same problem booting with `threadirqs'. RT has the additional restrictions that you may not acquire any sleeping locks in hardirq context. This you can see with addinional lockdep magic. > * Can't we have flag doing the irq disable in the same way while still > allowing to RT to do its magic ? seems better than open coding it in > the driver ? Puh. That should be forwarded the IRQ department. So we have IRQF_NO_THREAD to avoid force threading. This is documented as such. Then we have IRQF_TIMER and IRQF_PERCPU which are also not force threaded and it is not documented as such. However it is used for the timer-IRQ, IPI, perf and such - things you obviously don't want to thread and need to run in hard-IRQ context. What you have ist a primary and secondary and IRQF_ONESHOT and don't want the primary handler to be force-threaded. I can't answer why we don't. However, drivers usually disable the source themself if they providing both handler. Sebastian
diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c index 08a3b1c05..130ac134d 100644 --- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -1139,7 +1139,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->regs + SD_EMMC_IRQ_EN); ret = request_threaded_irq(host->irq, meson_mmc_irq, - meson_mmc_irq_thread, IRQF_ONESHOT, + meson_mmc_irq_thread, 0, dev_name(&pdev->dev), host); if (ret)
Force threaded interrupts for meson_mmc_irq to prevent possible deadlock condition during mmc operations when using preempt_rt with 5.9.0-rc3-rt3 patches on arm64. Using meson-gx-mmc with an emmc device on Hardkernel Odroid N2+ configured with preempt_rt resulted in the soc becoming unresponsive. With lock checking enabled the below inconsistent lock state was observed during boot. After some discussions with tglx in IRC #linux-rt the attached patch was suggested to remove IRQF_ONESHOT from request_threaded_irq. This has been tested and confirmed by me to resolve both the unresponsive soc and the inconsistent lock state warning when using 5.9.0-rc3-rt3 on arm64 Odroid N2+. Further review and testing is required to ensure there are no adverse impacts or concerns and that is the correct method to resolve the problem. I will continue to test on various amlogic devices with both standard mainline low latency kernel and preempt_rt kernel with -rt patches. [ 7.858446] ================================ [ 7.858448] WARNING: inconsistent lock state [ 7.858450] 5.9.0-rc3-rt3+ #33 Not tainted [ 7.858453] -------------------------------- [ 7.858456] inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage. [ 7.858459] swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 7.858465] ffff80001219f4d8 (&trig->leddev_list_lock){+?.+}-{0:0}, at: led_trigger_set+0x104/0x270 [ 7.858482] {IN-HARDIRQ-R} state was registered at: [ 7.858484] lock_acquire+0xec/0x468 [ 7.858491] rt_read_lock+0xb0/0x108 [ 7.858497] led_trigger_event+0x34/0x88 [ 7.858501] mmc_request_done+0x3f0/0x450 [ 7.858505] meson_mmc_irq+0x284/0x378 [ 7.858511] __handle_irq_event_percpu+0xcc/0x4a8 [ 7.858515] handle_irq_event_percpu+0x60/0xb0 [ 7.858519] handle_irq_event+0x50/0x108 [ 7.858522] handle_fasteoi_irq+0xd0/0x180 [ 7.858527] generic_handle_irq+0x38/0x50 [ 7.858530] __handle_domain_irq+0x6c/0xc8 [ 7.858533] gic_handle_irq+0x5c/0xb8 [ 7.858537] el1_irq+0xbc/0x180 [ 7.858540] arch_cpu_idle+0x28/0x38 [ 7.858544] default_idle_call+0x90/0x3f0 [ 7.858547] do_idle+0x250/0x268 [ 7.858551] cpu_startup_entry+0x2c/0x78 [ 7.858554] rest_init+0x1b0/0x284 [ 7.858559] arch_call_rest_init+0x18/0x24 [ 7.858565] start_kernel+0x550/0x588 [ 7.858569] irq event stamp: 1925495 [ 7.858571] hardirqs last enabled at (1925495): [<ffff8000111e3ba4>] _raw_spin_unlock_irqrestore+0xa4/0xb0 [ 7.858576] hardirqs last disabled at (1925494): [<ffff8000111e3c58>] _raw_spin_lock_irqsave+0xa8/0xb8 [ 7.858580] softirqs last enabled at (1857856): [<ffff80001024705c>] bdi_register_va+0x114/0x368 [ 7.858586] softirqs last disabled at (1857849): [<ffff80001024705c>] bdi_register_va+0x114/0x368 [ 7.858590] other info that might help us debug this: [ 7.858592] Possible unsafe locking scenario: [ 7.858594] CPU0 [ 7.858595] ---- [ 7.858597] lock(&trig->leddev_list_lock); [ 7.858600] <Interrupt> [ 7.858602] lock(&trig->leddev_list_lock); [ 7.858604] *** DEADLOCK *** [ 7.858606] 3 locks held by swapper/0/1: [ 7.858609] #0: ffff80001219eb30 (leds_list_lock){++++}-{0:0}, at: led_trigger_register+0xf4/0x1c0 [ 7.858619] #1: ffff0000b0696a70 (&led_cdev->trigger_lock){+.+.}-{0:0}, at: led_trigger_register+0x134/0x1c0 [ 7.858629] #2: ffff800011fb83d0 (rcu_read_lock){....}-{1:2}, at: rt_write_lock+0x8/0x108 [ 7.858637] stack backtrace: [ 7.858640] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc3-rt3+ #33 [ 7.858643] Hardware name: Hardkernel ODROID-N2Plus (DT) [ 7.858646] Call trace: [ 7.858647] dump_backtrace+0x0/0x1e8 [ 7.858650] show_stack+0x20/0x30 [ 7.858653] dump_stack+0xf0/0x164 [ 7.858659] print_usage_bug+0x2b4/0x2c0 [ 7.858662] mark_lock+0x2e8/0x360 [ 7.858665] __lock_acquire+0x238/0x1858 [ 7.858669] lock_acquire+0xec/0x468 [ 7.858672] rt_write_lock+0xb0/0x108 [ 7.858675] led_trigger_set+0x104/0x270 [ 7.858678] led_trigger_register+0x180/0x1c0 [ 7.858681] heartbeat_trig_init+0x28/0x5c [ 7.858686] do_one_initcall+0x90/0x4bc [ 7.858690] kernel_init_freeable+0x2cc/0x338 [ 7.858694] kernel_init+0x1c/0x11c [ 7.858697] ret_from_fork+0x10/0x34 Signed-off-by: Brad Harper <bjharper@gmail.com> --- drivers/mmc/host/meson-gx-mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) goto err_init_clk; -- 2.20.1