Message ID | 20210520094224.14099-1-pawell@gli-login.cadence.com |
---|---|
State | New |
Headers | show |
Series | usb: cdnsp: Fix deadlock issue in cdnsp_thread_irq_handler | expand |
On 21-05-20 11:42:24, Pawel Laszczak wrote: > From: Pawel Laszczak <pawell@cadence.com> > > Patch fixes the critical issue caused by deadlock which has been detected > during testing NCM class. > > The root cause of issue is spin_lock/spin_unlock instruction instead > spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler > function. Pawel, would you please explain more about why the deadlock occurs with current code, and why this patch could fix it? Peter > > Cc: stable@vger.kernel.org > Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") > Signed-off-by: Pawel Laszczak <pawell@cadence.com> > --- > drivers/usb/cdns3/cdnsp-ring.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c > index 5f0513c96c04..68972746e363 100644 > --- a/drivers/usb/cdns3/cdnsp-ring.c > +++ b/drivers/usb/cdns3/cdnsp-ring.c > @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) > { > struct cdnsp_device *pdev = (struct cdnsp_device *)data; > union cdnsp_trb *event_ring_deq; > + unsigned long flags; > int counter = 0; > > - spin_lock(&pdev->lock); > + spin_lock_irqsave(&pdev->lock, flags); > > if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { > cdnsp_died(pdev); > - spin_unlock(&pdev->lock); > + spin_unlock_irqrestore(&pdev->lock, flags); > return IRQ_HANDLED; > } > > @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) > > cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1); > > - spin_unlock(&pdev->lock); > + spin_unlock_irqrestore(&pdev->lock, flags); > > return IRQ_HANDLED; > } > -- > 2.25.1 > -- Thanks, Peter Chen
> > >On 21-05-20 11:42:24, Pawel Laszczak wrote: >> From: Pawel Laszczak <pawell@cadence.com> >> >> Patch fixes the critical issue caused by deadlock which has been detected >> during testing NCM class. >> >> The root cause of issue is spin_lock/spin_unlock instruction instead >> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler >> function. > >Pawel, would you please explain more about why the deadlock occurs with >current code, and why this patch could fix it? > I'm going to add such description to commit message: Lack of spin_lock_irqsave causes that during handling threaded interrupt inside spin_lock/spin_unlock section the ethernet subsystem invokes eth_start_xmit function from interrupt context which in turn starts queueing free requests in cdnsp driver. Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock with spin_lock_irqsave/spin_lock_irqrestor causes disableing interrupts and blocks queuing requests by ethernet subsystem until cdnsp_thread_irq_handler ends.. I hope this description is sufficient. Thanks, Pawel >> >> Cc: stable@vger.kernel.org >> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") >> Signed-off-by: Pawel Laszczak <pawell@cadence.com> >> --- >> drivers/usb/cdns3/cdnsp-ring.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c >> index 5f0513c96c04..68972746e363 100644 >> --- a/drivers/usb/cdns3/cdnsp-ring.c >> +++ b/drivers/usb/cdns3/cdnsp-ring.c >> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) >> { >> struct cdnsp_device *pdev = (struct cdnsp_device *)data; >> union cdnsp_trb *event_ring_deq; >> + unsigned long flags; >> int counter = 0; >> >> - spin_lock(&pdev->lock); >> + spin_lock_irqsave(&pdev->lock, flags); >> >> if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { >> cdnsp_died(pdev); >> - spin_unlock(&pdev->lock); >> + spin_unlock_irqrestore(&pdev->lock, flags); >> return IRQ_HANDLED; >> } >> >> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) >> >> cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1); >> >> - spin_unlock(&pdev->lock); >> + spin_unlock_irqrestore(&pdev->lock, flags); >> >> return IRQ_HANDLED; >> } >> -- >> 2.25.1 >> > >-- > >Thanks, >Peter Chen
On 21-05-24 10:56:23, Pawel Laszczak wrote: > > > > > >On 21-05-20 11:42:24, Pawel Laszczak wrote: > >> From: Pawel Laszczak <pawell@cadence.com> > >> > >> Patch fixes the critical issue caused by deadlock which has been detected > >> during testing NCM class. > >> > >> The root cause of issue is spin_lock/spin_unlock instruction instead > >> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler > >> function. > > > >Pawel, would you please explain more about why the deadlock occurs with > >current code, and why this patch could fix it? > > > > I'm going to add such description to commit message: > > Lack of spin_lock_irqsave causes that during handling threaded > interrupt inside spin_lock/spin_unlock section the ethernet > subsystem invokes eth_start_xmit function from interrupt context > which in turn starts queueing free requests in cdnsp driver. > Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue > on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock > with spin_lock_irqsave/spin_lock_irqrestor causes disableing %s/disableing/disabling > interrupts and blocks queuing requests by ethernet subsystem until > cdnsp_thread_irq_handler ends.. > > I hope this description is sufficient. A call stack graph may be better, like [1] [1]: https://www.spinics.net/lists/linux-usb/msg211931.html Peter > > Thanks, > Pawel > > >> > >> Cc: stable@vger.kernel.org > >> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") > >> Signed-off-by: Pawel Laszczak <pawell@cadence.com> > >> --- > >> drivers/usb/cdns3/cdnsp-ring.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c > >> index 5f0513c96c04..68972746e363 100644 > >> --- a/drivers/usb/cdns3/cdnsp-ring.c > >> +++ b/drivers/usb/cdns3/cdnsp-ring.c > >> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) > >> { > >> struct cdnsp_device *pdev = (struct cdnsp_device *)data; > >> union cdnsp_trb *event_ring_deq; > >> + unsigned long flags; > >> int counter = 0; > >> > >> - spin_lock(&pdev->lock); > >> + spin_lock_irqsave(&pdev->lock, flags); > >> > >> if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { > >> cdnsp_died(pdev); > >> - spin_unlock(&pdev->lock); > >> + spin_unlock_irqrestore(&pdev->lock, flags); > >> return IRQ_HANDLED; > >> } > >> > >> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) > >> > >> cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1); > >> > >> - spin_unlock(&pdev->lock); > >> + spin_unlock_irqrestore(&pdev->lock, flags); > >> > >> return IRQ_HANDLED; > >> } > >> -- > >> 2.25.1 > >> > > > >-- > > > >Thanks, > >Peter Chen >
> >On 21-05-24 10:56:23, Pawel Laszczak wrote: >> > >> > >> >On 21-05-20 11:42:24, Pawel Laszczak wrote: >> >> From: Pawel Laszczak <pawell@cadence.com> >> >> >> >> Patch fixes the critical issue caused by deadlock which has been detected >> >> during testing NCM class. >> >> >> >> The root cause of issue is spin_lock/spin_unlock instruction instead >> >> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler >> >> function. >> > >> >Pawel, would you please explain more about why the deadlock occurs with >> >current code, and why this patch could fix it? >> > >> >> I'm going to add such description to commit message: >> >> Lack of spin_lock_irqsave causes that during handling threaded >> interrupt inside spin_lock/spin_unlock section the ethernet >> subsystem invokes eth_start_xmit function from interrupt context >> which in turn starts queueing free requests in cdnsp driver. >> Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue >> on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock >> with spin_lock_irqsave/spin_lock_irqrestor causes disableing > >%s/disableing/disabling > >> interrupts and blocks queuing requests by ethernet subsystem until >> cdnsp_thread_irq_handler ends.. >> >> I hope this description is sufficient. > >A call stack graph may be better, like [1] > >[1]: https://urldefense.com/v3/__https://www.spinics.net/lists/linux- >usb/msg211931.html__;!!EHscmS1ygiU1lA!WCxrsHL4OWKd3iPdyymp-dQlVCoiHM9QzjIZPUC6-Dm6cnpU5CRLLOHrgiYSRA$ After putting online extra cpus, I've manage to catch call trace showing the deadlock issue: [ 223.051673] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000000015 ns for CPU#02 do_sync_core+0x0/0x30(0x0). [ 223.051690] smp: csd: CSD lock (#1) unresponsive. [ 223.051693] Sending NMI from CPU 0 to CPUs 2: [ 223.052690] NMI backtrace for cpu 2 [ 223.052692] CPU: 2 PID: 3053 Comm: irq/16-0000:01: Tainted: G OE 5.11.0-rc1+ #5 [ 223.052693] Hardware name: ASUS All Series/Q87T, BIOS 0908 07/22/2014 [ 223.052693] RIP: 0010:native_queued_spin_lock_slowpath+0x61/0x1d0 [ 223.052695] Code: 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 b8 00 02 00 00 81 [ 223.052696] RSP: 0018:ffffbc494011cde0 EFLAGS: 00000002 [ 223.052698] RAX: 0000000000000101 RBX: ffff9ee8116b4a68 RCX: 0000000000000000 [ 223.052699] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ee8116b4658 [ 223.052700] RBP: ffffbc494011cde0 R08: 0000000000000001 R09: 0000000000000000 [ 223.052701] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 [ 223.052702] R13: ffff9ee8116b4670 R14: 0000000000000246 R15: ffff9ee8116b4658 [ 223.052702] FS: 0000000000000000(0000) GS:ffff9ee89b400000(0000) knlGS:0000000000000000 [ 223.052703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 223.052704] CR2: 00007f7bcc41a830 CR3: 000000007a612003 CR4: 00000000001706e0 [ 223.052705] Call Trace: [ 223.052706] <IRQ> [ 223.052706] do_raw_spin_lock+0xc0/0xd0 [ 223.052707] _raw_spin_lock_irqsave+0x95/0xa0 [ 223.052708] cdnsp_gadget_ep_queue.cold+0x88/0x107 [cdnsp_udc_pci] [ 223.052708] usb_ep_queue+0x35/0x110 [ 223.052709] eth_start_xmit+0x220/0x3d0 [u_ether] [ 223.052710] ncm_tx_timeout+0x34/0x40 [usb_f_ncm] [ 223.052711] ? ncm_free_inst+0x50/0x50 [usb_f_ncm] [ 223.052711] __hrtimer_run_queues+0xac/0x440 [ 223.052712] hrtimer_run_softirq+0x8c/0xb0 [ 223.052713] __do_softirq+0xcf/0x428 [ 223.052713] asm_call_irq_on_stack+0x12/0x20 [ 223.052714] </IRQ> [ 223.052715] do_softirq_own_stack+0x61/0x70 [ 223.052715] irq_exit_rcu+0xc1/0xd0 [ 223.052716] sysvec_apic_timer_interrupt+0x52/0xb0 [ 223.052717] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 223.052718] RIP: 0010:do_raw_spin_trylock+0x18/0x40 [ 223.052719] Code: ff ff 66 90 eb 86 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 8b 07 48 89 e5 85 c0 75 29 ba 01 00 00 00 f0 0f b1 17 <75> 1e 65 8b 05 2f 77 ee 70 89 47 08 5d 65 48 8b 04 25 40 7e 01 00 [ 223.052720] RSP: 0018:ffffbc494138bda8 EFLAGS: 00000246 [ 223.052721] RAX: 0000000000000000 RBX: ffff9ee8116b4658 RCX: 0000000000000000 [ 223.052722] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ee8116b4658 [ 223.052723] RBP: ffffbc494138bda8 R08: 0000000000000001 R09: 0000000000000000 [ 223.052724] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 [ 223.052725] R13: ffff9ee8116b4670 R14: ffff9ee7b5c73d80 R15: ffff9ee8116b4000 [ 223.052726] _raw_spin_lock+0x3d/0x70 [ 223.052726] ? cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] [ 223.052727] cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] [ 223.052728] ? cdnsp_remove_request+0x1f0/0x1f0 [cdnsp_udc_pci] [ 223.052729] ? cdnsp_thread_irq_handler+0x5/0xa0 [cdnsp_udc_pci] [ 223.052730] ? irq_thread+0xa0/0x1c0 [ 223.052730] irq_thread_fn+0x28/0x60 [ 223.052731] irq_thread+0x105/0x1c0 [ 223.052732] ? __kthread_parkme+0x42/0x90 [ 223.052732] ? irq_forced_thread_fn+0x90/0x90 [ 223.052733] ? wake_threads_waitq+0x30/0x30 [ 223.052734] ? irq_thread_check_affinity+0xe0/0xe0 [ 223.052735] kthread+0x12a/0x160 [ 223.052735] ? kthread_park+0x90/0x90 [ 223.052736] ret_from_fork+0x22/0x30 Pawel > >Peter > >> >> Thanks, >> Pawel >> >> >> >> >> Cc: stable@vger.kernel.org >> >> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") >> >> Signed-off-by: Pawel Laszczak <pawell@cadence.com> >> >> --- >> >> drivers/usb/cdns3/cdnsp-ring.c | 7 ++++--- >> >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c >> >> index 5f0513c96c04..68972746e363 100644 >> >> --- a/drivers/usb/cdns3/cdnsp-ring.c >> >> +++ b/drivers/usb/cdns3/cdnsp-ring.c >> >> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) >> >> { >> >> struct cdnsp_device *pdev = (struct cdnsp_device *)data; >> >> union cdnsp_trb *event_ring_deq; >> >> + unsigned long flags; >> >> int counter = 0; >> >> >> >> - spin_lock(&pdev->lock); >> >> + spin_lock_irqsave(&pdev->lock, flags); >> >> >> >> if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { >> >> cdnsp_died(pdev); >> >> - spin_unlock(&pdev->lock); >> >> + spin_unlock_irqrestore(&pdev->lock, flags); >> >> return IRQ_HANDLED; >> >> } >> >> >> >> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) >> >> >> >> cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1); >> >> >> >> - spin_unlock(&pdev->lock); >> >> + spin_unlock_irqrestore(&pdev->lock, flags); >> >> >> >> return IRQ_HANDLED; >> >> } >> >> -- >> >> 2.25.1 >> >> >> > >> >-- >> > >> >Thanks, >> >Peter Chen >> > >-- > >Thanks, >Peter Chen
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c index 5f0513c96c04..68972746e363 100644 --- a/drivers/usb/cdns3/cdnsp-ring.c +++ b/drivers/usb/cdns3/cdnsp-ring.c @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) { struct cdnsp_device *pdev = (struct cdnsp_device *)data; union cdnsp_trb *event_ring_deq; + unsigned long flags; int counter = 0; - spin_lock(&pdev->lock); + spin_lock_irqsave(&pdev->lock, flags); if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { cdnsp_died(pdev); - spin_unlock(&pdev->lock); + spin_unlock_irqrestore(&pdev->lock, flags); return IRQ_HANDLED; } @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1); - spin_unlock(&pdev->lock); + spin_unlock_irqrestore(&pdev->lock, flags); return IRQ_HANDLED; }