Message ID | 20240729214149.752663-1-edmund.raile@protonmail.com |
---|---|
Series | ALSA: firewire-lib: restore process context workqueue to prevent deadlock |
Hi,

On Mon, Jul 29, 2024 at 09:42:02PM +0000, Edmund Raile wrote:
> This patchset serves to prevent an AB/BA deadlock:
>
> thread 0:
>     * (lock A) acquire substream lock by
>       snd_pcm_stream_lock_irq() in
>       snd_pcm_status64()
>     * (lock B) wait for tasklet to finish by calling
>       tasklet_unlock_spin_wait() in
>       tasklet_disable_in_atomic() in
>       ohci_flush_iso_completions() of ohci.c
>
> thread 1:
>     * (lock B) enter tasklet
>     * (lock A) attempt to acquire substream lock,
>       waiting for it to be released:
>       snd_pcm_stream_lock_irqsave() in
>       snd_pcm_period_elapsed() in
>       update_pcm_pointers() in
>       process_ctx_payloads() in
>       process_rx_packets() of amdtp-stream.c
>
> ? tasklet_unlock_spin_wait
> </NMI>
> <TASK>
> ohci_flush_iso_completions firewire_ohci
> amdtp_domain_stream_pcm_pointer snd_firewire_lib
> snd_pcm_update_hw_ptr0 snd_pcm
> snd_pcm_status64 snd_pcm
>
> ? native_queued_spin_lock_slowpath
> </NMI>
> <IRQ>
> _raw_spin_lock_irqsave
> snd_pcm_period_elapsed snd_pcm
> process_rx_packets snd_firewire_lib
> irq_target_callback snd_firewire_lib
> handle_it_packet firewire_ohci
> context_tasklet firewire_ohci
>
> The issue has been reported as a regression of kernel 5.14:
> Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
> Fireface 800")
>
> Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
> in process context") removed the process context workqueue from
> amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
> its overhead.
> Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
> update") belongs to the same patch series and removed
> the now-unused workqueue entirely.
>
> Though being observed on RME Fireface 800, this issue would affect all
> Firewire audio interfaces using ohci amdtp + pcm streaming.
>
> ALSA streaming, especially under intensive CPU load will reveal this issue
> the soonest due to issuing more hardIRQs, with time to occurrence ranging
> from 2 seconds to 30 minutes after starting playback.
>
> to reproduce the issue:
> direct ALSA playback to the device:
>   mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence: 2s to 30m
> Likelihood increased by:
>   - high CPU load
>     stress --cpu $(nproc)
>   - switching between applications via workspaces
>     tested with i915 in Xfce
> PulseAudio / PipeWire conceal the issue as they run PCM substream
> without period wakeup mode, issuing fewer hardIRQs.
>
> Cc: stable@vger.kernel.org
> Backport note:
> Also applies to and fixes on (tested):
> 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
>
> Edmund Raile (3):
>   Revert "ALSA: firewire-lib: obsolete workqueue for period update"
>   Revert "ALSA: firewire-lib: operate for period elapse event in process
>     context"
>   ALSA: firewire-lib: amdtp-stream work queue inline description
>
>  sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
>  sound/firewire/amdtp-stream.h |  1 +
>  2 files changed, 25 insertions(+), 14 deletions(-)

In my opinion, a patch that only changes a code comment is not suitable to
apply to stable and longterm kernels as a fix. It is acceptable for a revert
commit to carry slight changes that bring the code and comments up to date
with the current status, so I would like you to amend the second and third
patches so that the patchset consists of two revert commits.

I'm sorry to make so many requests of you, but we are both involved in a
software development community, and it has some implicit and explicit
conventions that we should follow.


Thanks

Takashi Sakamoto
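
For readers following the thread, the lock ordering described in the cover letter can be sketched in userspace. This is only an illustrative model, not the kernel code or the patch itself: `substream_lock` stands in for lock A, and `report_period_elapsed()`, `packet_callback()`, `run_pending_work()` and `pointer_query()` are hypothetical names invented for this sketch. It uses `pthread_mutex_trylock()` to detect the conflict instead of hanging, and assumes the deferred path is drained only after lock A has been released.

```c
#include <pthread.h>
#include <stdbool.h>

/* Lock A in the cover letter: stands in for the PCM substream lock. */
static pthread_mutex_t substream_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a queued work item (not the kernel workqueue API). */
static bool period_work_pending;
static unsigned int periods_reported;

/* Models the period-elapsed notification, which needs lock A.
 * Returns false where the real code would spin waiting for the lock. */
static bool report_period_elapsed(void)
{
    if (pthread_mutex_trylock(&substream_lock) != 0)
        return false;
    periods_reported++;
    pthread_mutex_unlock(&substream_lock);
    return true;
}

/* Models the tasklet body ("lock B"). With defer set, it only marks work
 * as pending and returns, so it never needs lock A itself. */
static bool packet_callback(bool defer)
{
    if (defer) {
        period_work_pending = true;
        return true;
    }
    return report_period_elapsed();
}

/* Models the process-context worker that runs after lock A is free. */
static void run_pending_work(void)
{
    if (period_work_pending && report_period_elapsed())
        period_work_pending = false;
}

/* Models thread 0: take lock A, then require the callback to finish.
 * Returns whether the callback could complete while lock A was held. */
static bool pointer_query(bool defer)
{
    bool completed;

    pthread_mutex_lock(&substream_lock);
    completed = packet_callback(defer);
    pthread_mutex_unlock(&substream_lock);

    run_pending_work();
    return completed;
}
```

In this model, `pointer_query(false)` reports that the callback could not finish while lock A was held, which corresponds to the AB/BA wait in the traces above, while `pointer_query(true)` lets the callback finish and delivers the notification afterwards from the deferred path.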