Message ID | 20190628182413.33225-1-john.stultz@linaro.org |
---|---|
Headers | show |
Series | Fix scheduling while atomic in dwc3_gadget_ep_dequeue | expand |
On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote: >With recent changes in AOSP, adb is using asynchronous io, which >causes the following crash usually on a reboot: > >[ 184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104 >[ 184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a >[ 184.316034] Preemption disabled at: >[ 184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398 >[ 184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S 4.19.43-00669-g8e4970572c43-dirty #356 >[ 184.334963] Hardware name: HiKey960 (DT) >[ 184.338892] Call trace: >[ 184.341352] dump_backtrace+0x0/0x158 >[ 184.345025] show_stack+0x14/0x20 >[ 184.348355] dump_stack+0x80/0xa4 >[ 184.351685] __schedule_bug+0x6c/0xc0 >[ 184.355363] __schedule+0x64c/0x978 >[ 184.358863] schedule+0x2c/0x90 >[ 184.362053] dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3] >[ 184.367210] usb_ep_dequeue+0x24/0xf8 >[ 184.370884] ffs_aio_cancel+0x3c/0x80 >[ 184.374561] free_ioctx_users+0x40/0x148 >[ 184.378500] percpu_ref_switch_to_atomic_rcu+0x180/0x1c0 >[ 184.383830] rcu_process_callbacks+0x24c/0x5d8 >[ 184.388283] __do_softirq+0x13c/0x398 >[ 184.391959] run_ksoftirqd+0x3c/0x48 >[ 184.395549] smpboot_thread_fn+0x220/0x288 >[ 184.399660] kthread+0x12c/0x130 >[ 184.402901] ret_from_fork+0x10/0x1c > > >This happens as usb_ep_dequeue can be called in interrupt >context, and dwc3_gadget_ep_dequeue() then calls >wait_event_lock_irq() which can sleep. > >Upstream kernels are not affected due to the change >fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which >removes the wait_even_lock_irq code. Unfortunately that change >has a number of dependencies, which I'm submitting here. > >Also, to match upstream, in this series I've reverted one >change that was backported to -stable, to replace it with the >cherry-picked upstream commit (as the dependencies are now >there) > >This issue also affects 4.14,4.9 and I believe 4.4 kernels, >however I don't know how to best backport this functionality >that far back. Help from the maintainers would be very much >appreciated! > > >New in v2: >* Reordered the patchset to put the revert patch first, which > avoids any bisection build issues. (Thanks to Jack Pham for > the suggestion!) > > >Feedback and comments would be welcome! I've queued it up for 4.19. Is it the case that for older kernels the dependency list is too long? -- Thanks, Sasha
Hi, John Stultz wrote: > On Fri, Jun 28, 2019 at 3:58 PM Sasha Levin <sashal@kernel.org> wrote: >> On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote: >>> With recent changes in AOSP, adb is using asynchronous io, which >>> causes the following crash usually on a reboot: >>> >>> [ 184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104 >>> [ 184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a >>> [ 184.316034] Preemption disabled at: >>> [ 184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398 >>> [ 184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S 4.19.43-00669-g8e4970572c43-dirty #356 >>> [ 184.334963] Hardware name: HiKey960 (DT) >>> [ 184.338892] Call trace: >>> [ 184.341352] dump_backtrace+0x0/0x158 >>> [ 184.345025] show_stack+0x14/0x20 >>> [ 184.348355] dump_stack+0x80/0xa4 >>> [ 184.351685] __schedule_bug+0x6c/0xc0 >>> [ 184.355363] __schedule+0x64c/0x978 >>> [ 184.358863] schedule+0x2c/0x90 >>> [ 184.362053] dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3] >>> [ 184.367210] usb_ep_dequeue+0x24/0xf8 >>> [ 184.370884] ffs_aio_cancel+0x3c/0x80 >>> [ 184.374561] free_ioctx_users+0x40/0x148 >>> [ 184.378500] percpu_ref_switch_to_atomic_rcu+0x180/0x1c0 >>> [ 184.383830] rcu_process_callbacks+0x24c/0x5d8 >>> [ 184.388283] __do_softirq+0x13c/0x398 >>> [ 184.391959] run_ksoftirqd+0x3c/0x48 >>> [ 184.395549] smpboot_thread_fn+0x220/0x288 >>> [ 184.399660] kthread+0x12c/0x130 >>> [ 184.402901] ret_from_fork+0x10/0x1c >>> >>> >>> This happens as usb_ep_dequeue can be called in interrupt >>> context, and dwc3_gadget_ep_dequeue() then calls >>> wait_event_lock_irq() which can sleep. >>> >>> Upstream kernels are not affected due to the change >>> fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which >>> removes the wait_even_lock_irq code. Unfortunately that change >>> has a number of dependencies, which I'm submitting here. >>> >>> Also, to match upstream, in this series I've reverted one >>> change that was backported to -stable, to replace it with the >>> cherry-picked upstream commit (as the dependencies are now >>> there) >>> >>> This issue also affects 4.14,4.9 and I believe 4.4 kernels, >>> however I don't know how to best backport this functionality >>> that far back. Help from the maintainers would be very much >>> appreciated! >>> >>> >>> New in v2: >>> * Reordered the patchset to put the revert patch first, which >>> avoids any bisection build issues. (Thanks to Jack Pham for >>> the suggestion!) >>> >>> >>> Feedback and comments would be welcome! >> I've queued it up for 4.19. >> >> Is it the case that for older kernels the dependency list is too long? > Yea. It gets ugly and I'm not enough of an expert on the driver to > feel comfortable knowing if I'm doing the right thing reworking this > stack onto an even older tree. > > But I do see crashes on reboot w/ 4.14 and 4.9 (I and suspect 4.4 as > well), so I'll need to figure out something eventually. > > If you're backporting this series, then you also need to apply these fixes for this series: This fixes a race issue: c5353b225df9 ("usb: dwc3: gadget: don't enable interrupt when disabling endpoint") This fixes incorrect TRB skip: c7152763f02e ("usb: dwc3: Reset num_trbs after skipping") BR, Thinh