Message ID | 20230320193731.GA36840@zipoli.concurrent-rt.com |
---|---|
State | New |
Series | [5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c |
On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> In the transition from 5.10.158-rt77 to 5.10.162-rt78,
> the initialization of task_struct::wake_q_sleeper.next
> was dropped. Restore it.
>
> This appears to be only a problem in 5.10. 5.15 does not
> have wake_q_sleeper; 4.19 does have it but its initialization
> there is still present.
>
> The 5.10.162-rt78 patch that damaged fork.c is:
>
>   0170-locking-rtmutex-add-sleeping-lock-implementation.patch
>
> I do not have a simple test that brings out this problem.
> My test consists of a shell script and eight binaries,
> all of which were written in Ada. strace shows that it
> does a few thousand forks in rapid succession. One of the
> forks stalls out, after which no fork after that returns.
> Eventually the 122 second stallout occurs and a large
> number of threads are shown to be waiting for tasklist
> lock, either in do_exit or in copy_process. The kernel
> .config has rt and many debug features enabled, lockdep
> included.

Joe, thank you for investigating that problem and for writing a patch.

Earlier today Steffen Dirkwinkel sent a similar patch:

  https://lore.kernel.org/all/20230320080347.32434-1-linux@steffen.cc/

Would you mind giving your ACK to his patch? I have that patch queued for
my next build already.

Thank you,
Luis

> Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
>
> Index: b/kernel/fork.c
> ===================================================================
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -960,6 +960,7 @@ static struct task_struct *dup_task_stru
>  	tsk->splice_pipe = NULL;
>  	tsk->task_frag.page = NULL;
>  	tsk->wake_q.next = NULL;
> +	tsk->wake_q_sleeper.next = NULL;
>  	tsk->pf_io_worker = NULL;
>
>  	account_kernel_stack(tsk, 1);
---end quoted text---
On Mon, Mar 20, 2023 at 05:00:13PM -0300, Luis Claudio R. Goncalves wrote:
> On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> > In the transition from 5.10.158-rt77 to 5.10.162-rt78,
> > the initialization of task_struct::wake_q_sleeper.next
> > was dropped. Restore it.
> >
> > This appears to be only a problem in 5.10. 5.15 does not
> > have wake_q_sleeper; 4.19 does have it but its initialization
> > there is still present.
> >
> > The 5.10.162-rt78 patch that damaged fork.c is:
> >
> >   0170-locking-rtmutex-add-sleeping-lock-implementation.patch
> >
> > I do not have a simple test that brings out this problem.
> > My test consists of a shell script and eight binaries,
> > all of which were written in Ada. strace shows that it
> > does a few thousand forks in rapid succession. One of the
> > forks stalls out, after which no fork after that returns.
> > Eventually the 122 second stallout occurs and a large
> > number of threads are shown to be waiting for tasklist
> > lock, either in do_exit or in copy_process. The kernel
> > .config has rt and many debug features enabled, lockdep
> > included.
>
> Joe, thank you for investigating that problem and for writing a patch.
>
> Earlier today Steffen Dirkwinkel sent a similar patch:
>
>   https://lore.kernel.org/all/20230320080347.32434-1-linux@steffen.cc/
>
> Would you mind giving your ACK to his patch? I have that patch queued for
> my next build already.

Acked-by: Joe Korty <joe.korty@concurrent-rt.com>
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -960,6 +960,7 @@ static struct task_struct *dup_task_stru
 	tsk->splice_pipe = NULL;
 	tsk->task_frag.page = NULL;
 	tsk->wake_q.next = NULL;
+	tsk->wake_q_sleeper.next = NULL;
 	tsk->pf_io_worker = NULL;
 
 	account_kernel_stack(tsk, 1);
In the transition from 5.10.158-rt77 to 5.10.162-rt78,
the initialization of task_struct::wake_q_sleeper.next
was dropped. Restore it.

This appears to be only a problem in 5.10. 5.15 does not
have wake_q_sleeper; 4.19 does have it but its initialization
there is still present.

The 5.10.162-rt78 patch that damaged fork.c is:

  0170-locking-rtmutex-add-sleeping-lock-implementation.patch

I do not have a simple test that brings out this problem.
My test consists of a shell script and eight binaries,
all of which were written in Ada. strace shows that it
does a few thousand forks in rapid succession. One of the
forks stalls out, after which no fork after that returns.
Eventually the 122 second stallout occurs and a large
number of threads are shown to be waiting for tasklist
lock, either in do_exit or in copy_process. The kernel
.config has rt and many debug features enabled, lockdep
included.

Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
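
The thread does not spell out why a stale wake_q_sleeper.next matters, so the following is only a user-space sketch of one plausible mechanism, not kernel code. It assumes that a forked task starts as a byte copy of its parent and that the wake-queue add routine treats a non-NULL next pointer as "already queued" and skips the task. The types and the names struct task, dup_task, and fork_then_wake are hypothetical simplifications, and the kernel's atomic compare-and-exchange is replaced here by a plain pointer test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's wake queue types (assumption). */
struct wake_q_node { struct wake_q_node *next; };
struct wake_q_head { struct wake_q_node *first; struct wake_q_node **lastp; };

/* Non-NULL sentinel marking the end of a queue. */
#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

/* Hypothetical miniature task; only the field under discussion is kept. */
struct task { int pid; struct wake_q_node wake_q_sleeper; };

static void wake_q_init(struct wake_q_head *head)
{
    head->first = WAKE_Q_TAIL;
    head->lastp = &head->first;
}

/* Queue a task for wakeup. A non-NULL next is read as "already queued",
 * so the task is skipped and false is returned. */
static bool wake_q_add_sleeper(struct wake_q_head *head, struct task *t)
{
    struct wake_q_node *node = &t->wake_q_sleeper;

    if (node->next != NULL)
        return false;
    node->next = WAKE_Q_TAIL;
    *head->lastp = node;
    head->lastp = &node->next;
    return true;
}

/* Copy the parent wholesale, then optionally reset the node as the patch does. */
static struct task dup_task(const struct task *parent, int pid, bool reset_node)
{
    struct task child = *parent;

    child.pid = pid;
    if (reset_node)
        child.wake_q_sleeper.next = NULL;
    return child;
}

/* Fork while the parent sits on a wake queue, then try to queue the child.
 * Returns whether the child could be queued for wakeup. */
static bool fork_then_wake(bool reset_node)
{
    struct wake_q_head parent_q, child_q;
    struct task parent = { .pid = 1, .wake_q_sleeper = { NULL } };
    struct task child;
    bool queued;

    wake_q_init(&parent_q);
    queued = wake_q_add_sleeper(&parent_q, &parent);
    assert(queued);
    (void)queued;

    child = dup_task(&parent, 2, reset_node);
    wake_q_init(&child_q);
    return wake_q_add_sleeper(&child_q, &child);
}
```

In this sketch, fork_then_wake(false) returns false because the child inherits the parent's non-NULL next and is never queued, while fork_then_wake(true) returns true. If the real failure follows the same pattern, a skipped wakeup of this kind would be consistent with the stalled forks described above, but that link is an assumption rather than something the thread confirms.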