[00/16] ptrace: cleanups and calling do_cldstop with only siglock

Message ID 871qwq5ucx.fsf_-_@email.froward.int.ebiederm.org

Message

Eric W. Biederman May 18, 2022, 10:49 p.m. UTC
For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
ptrace_freeze_traced has completed successfully.  Which fundamentally
means the lock dance of dropping siglock and grabbing tasklist_lock does
not work on PREEMPT_RT.  So I have worked through what is necessary so
that tasklist_lock does not need to be grabbed in ptrace_stop after
siglock is dropped.
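
The difference between the two shapes can be sketched in userspace C. This
is only an illustration under stated assumptions: mock_lock, mock_acquire,
mock_release, stop_result and both stop_* functions are hypothetical
stand-ins, not kernel code, and the real ptrace_stop does considerably more.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel locks; not kernel code. */
struct mock_lock { int held; };
static struct mock_lock siglock, tasklist_lock;

static void mock_acquire(struct mock_lock *l) { assert(!l->held); l->held = 1; }
static void mock_release(struct mock_lock *l) { assert(l->held);  l->held = 0; }

struct stop_result {
	int parent_notified;
	int locks_after_siglock_dropped; /* acquisitions after siglock is released */
};

/* Old shape: drop siglock, then grab tasklist_lock to notify the parent. */
static struct stop_result stop_with_lock_dance(void)
{
	struct stop_result r = { 0, 0 };

	mock_acquire(&siglock);
	/* ... mark the task as stopped ... */
	mock_release(&siglock);

	/* From here a tracer could freeze the task; taking a lock now is the
	 * step that does not work on PREEMPT_RT. */
	mock_acquire(&tasklist_lock);
	r.locks_after_siglock_dropped++;
	r.parent_notified = 1;
	mock_release(&tasklist_lock);
	return r;
}

/* New shape: notify while siglock is still held; nothing is taken afterwards. */
static struct stop_result stop_with_siglock_only(void)
{
	struct stop_result r = { 0, 0 };

	mock_acquire(&siglock);
	r.parent_notified = 1;
	/* ... mark the task as stopped ... */
	mock_release(&siglock);
	return r;
}
```

Both functions notify the parent; only the first one acquires a lock after
siglock has been released.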

I have explored several alternate ways of getting there and along the
way I found a lot of small bug fixes/cleanups that don't necessarily
contribute to the final result but that are worthwhile on their own.  So
I have included those changes in this set of changes just so they don't
get lost.

In addition I had a conversation with Thomas Gleixner recently that
emphasized for me the need to reduce the hold times of tasklist_lock,
and that made me realize that in principle it is possible.
https://lkml.kernel.org/r/87mtfmhap2.fsf@email.froward.int.ebiederm.org

Which is a long way of saying that not taking tasklist_lock in
ptrace_stop is good not just for PREEMPT_RT but also for improving the
scalability of the kernel in general.

After this set of changes only cgroup_enter_frozen should remain a
stumbling block for PREEMPT_RT in the ptrace_stop path.

Eric W. Biederman (16):
      signal/alpha: Remove unused definition of TASK_REAL_PARENT
      signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
      kdb: Use real_parent when displaying a list of processes
      powerpc/xmon:  Use real_parent when displaying a list of processes
      ptrace: Remove dead code from __ptrace_detach
      ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
      signal: Wake up the designated parent
      ptrace: Only populate last_siginfo from ptrace
      ptrace: In ptrace_setsiginfo deal with invalid si_signo
      ptrace: In ptrace_signal look at what the debugger did with siginfo
      ptrace: Use si_signo as the signal number to resume with
      ptrace: Stop protecting ptrace_set_signr with tasklist_lock
      ptrace: Document why ptrace_setoptions does not need a lock
      signal: Protect parent child relationships by childs siglock
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      signal: Always call do_notify_parent_cldstop with siglock held

 arch/alpha/kernel/asm-offsets.c |   1 -
 arch/ia64/kernel/asm-offsets.c  |   1 -
 arch/powerpc/xmon/xmon.c        |   2 +-
 kernel/debug/kdb/kdb_main.c     |   2 +-
 kernel/exit.c                   |  23 +++-
 kernel/fork.c                   |  12 +-
 kernel/ptrace.c                 | 132 ++++++++----------
 kernel/signal.c                 | 296 ++++++++++++++++++++++++++--------------
 8 files changed, 279 insertions(+), 190 deletions(-)

Eric

Comments

Sebastian Andrzej Siewior May 19, 2022, 6:19 a.m. UTC | #1
On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> Is there a git branch somewhere I can pull to test this? It doesn't apply
> cleanly to Linus's tip.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

> - Kyle

Sebastian
Eric W. Biederman May 19, 2022, 6:05 p.m. UTC | #2
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed,
when I am working on the more substantial issues.

Eric
Doug Anderson May 19, 2022, 8:52 p.m. UTC | #3
Hi,

On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger is shown as the
> parent process.
>
> This is silly, and I expect it never comes up in practice.  As there
> is very little point in using gdb and kdb simultaneously.  Update the
> code to use real_parent so that it is clear kdb does not want to
> display a debugger as the parent of a process.

So I would tend to defer to Daniel, but I'm not convinced that the
behavior you describe for kdb today _is_ actually silly.

If I was in kdb and I was listing processes, I might actually want to
see that a process's parent was set to gdb. Presumably that would tell
me extra information that might be relevant to my debug session.

Personally, I'd rather add an extra piece of information into the list
showing the real parent if it's not the same as the parent. Then
you're not throwing away information.
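
A minimal sketch of that idea in plain C, assuming a simplified task
record. The struct mock_task type and the format_row helper are
hypothetical and are not the kdb code; they only show a listing that keeps
the parent column and appends the real parent when the two differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's task_struct. */
struct mock_task {
	int pid;
	const struct mock_task *parent;      /* the tracer while being traced */
	const struct mock_task *real_parent; /* the process that created the task */
};

/* Format one listing row; add real_parent only when it is not the parent. */
static int format_row(char *buf, size_t len, const struct mock_task *t)
{
	if (t->parent != t->real_parent)
		return snprintf(buf, len, "pid=%d parent=%d real_parent=%d",
				t->pid, t->parent->pid, t->real_parent->pid);
	return snprintf(buf, len, "pid=%d parent=%d", t->pid, t->parent->pid);
}
```

An untraced task produces the short form, so the extra column only shows
up where a debugger is attached.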

-Doug
Eric W. Biederman May 19, 2022, 11:48 p.m. UTC | #4
Doug Anderson <dianders@chromium.org> writes:

> Hi,
>
> On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> kdb has a bug that when using the ps command to display a list of
>> processes, if a process is being debugged the debugger is shown as the
>> parent process.
>>
>> This is silly, and I expect it never comes up in practice.  As there
>> is very little point in using gdb and kdb simultaneously.  Update the
>> code to use real_parent so that it is clear kdb does not want to
>> display a debugger as the parent of a process.
>
> So I would tend to defer to Daniel, but I'm not convinced that the
> behavior you describe for kdb today _is_ actually silly.
>
> If I was in kdb and I was listing processes, I might actually want to
> see that a process's parent was set to gdb. Presumably that would tell
> me extra information that might be relevant to my debug session.
>
> Personally, I'd rather add an extra piece of information into the list
> showing the real parent if it's not the same as the parent. Then
> you're not throwing away information.

The name of the field is confusing for anyone who isn't intimate with
the implementation details.  The function getppid returns
tsk->real_parent->tgid.
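
The userspace side of this can be checked with a small C helper. This is a
sketch only: child_sees_real_parent is a hypothetical name, and the example
covers the untraced case, without attaching a debugger to the child.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and let it compare getppid() with the pid of the process
 * that forked it. Returns 1 on a match, 0 on a mismatch, -1 on error.
 */
static int child_sees_real_parent(void)
{
	pid_t expected = getpid();
	pid_t child = fork();
	int status;

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(getppid() == expected ? 0 : 1); /* child: report via exit code */
	if (waitpid(child, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the parent is still alive and waiting, the child is not reparented
and the helper is expected to return 1.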

If kdb wants information of what the tracer is that is fine, but I
recommend putting that information in another field.

Given that the original description says give the information that ps
gives my sense is that kdb is currently wrong.  Especially as it does
not give you the actual parentage anywhere.

I can certainly be convinced, but I do want some clarity.  It looks very
attractive to rename task->parent to task->ptracer and leave the field
NULL when there is no tracer.

Eric
Kyle Huey May 20, 2022, 5:24 a.m. UTC | #5
On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> cleanly to Linus's tip.
> >
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>
> Yes that is the branch this all applies to.
>
> This is my second round of cleanups this cycle for this code.
> I just keep finding little things that deserve to be changed,
> when I am working on the more substantial issues.
>
> Eric

When running the rr test suite, I see hangs like this

[  812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
[  812.151529] Modules linked in: snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash
algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp
snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal
snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp
snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be
btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel
rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211
btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate
hp_wmi cfg80211 serio_raw snd platform_profile
ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev
input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel
ipmi_devintf ipmi_msghandler msr vhost_vsock
vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb
tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables
autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
[  812.151570]  libcrc32c hid_generic usbhid hid i915 drm_buddy
i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul
drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect
sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169
psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci
xhci_pci_renesas wmi video
[  812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G     I  L    5.18.0-rc1+ #2
[  812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
[  812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
[  812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
[  812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
[  812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
[  812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
[  812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
[  812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
[  812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
[  812.151598] FS:  00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
[  812.151599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
[  812.151601] Call Trace:
[  812.151602]  <TASK>
[  812.151604]  do_signal_stop+0x228/0x260
[  812.151606]  get_signal+0x43a/0x8e0
[  812.151608]  arch_do_signal_or_restart+0x37/0x7d0
[  812.151610]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151612]  ? __perf_event_task_sched_in+0x81/0x230
[  812.151616]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151617]  exit_to_user_mode_prepare+0x130/0x1a0
[  812.151620]  syscall_exit_to_user_mode+0x26/0x40
[  812.151621]  ret_from_fork+0x15/0x30
[  812.151623] RIP: 0033:0x7f612dfcd125
[  812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
[  812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
[  812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
[  812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
[  812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
[  812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
[  812.151632]  </TASK>

- Kyle
Sebastian Andrzej Siewior May 20, 2022, 7:33 a.m. UTC | #6
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> 
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully.  Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT.  So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you had added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. Both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
|  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
|  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x45/0x5a
|  unlock_parents_siglocks+0xb6/0xc0
|  ptrace_stop+0xb9/0x390
|  get_signal+0x51c/0x8d0
|  arch_do_signal_or_restart+0x31/0x750
|  exit_to_user_mode_prepare+0x157/0x220
|  irqentry_exit_to_user_mode+0x5/0x50
|  asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian
Sebastian Andrzej Siewior May 20, 2022, 9:19 a.m. UTC | #7
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian
Eric W. Biederman May 20, 2022, 7:32 p.m. UTC | #8
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>> 
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully.  Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT.  So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up; I somehow assumed
> that you had added a few patches on top. Might have been yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> |  <TASK>
> |  dump_stack_lvl+0x45/0x5a
> |  unlock_parents_siglocks+0xb6/0xc0
> |  ptrace_stop+0xb9/0x390
> |  get_signal+0x51c/0x8d0
> |  arch_do_signal_or_restart+0x31/0x750
> |  exit_to_user_mode_prepare+0x157/0x220
> |  irqentry_exit_to_user_mode+0x5/0x50
> |  asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd.  I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this.  I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric
Peter Zijlstra May 20, 2022, 7:58 p.m. UTC | #9
On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >> 
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully.  Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT.  So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up; I somehow assumed
> > that you had added a few patches on top. Might have been yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > |  <TASK>
> > |  dump_stack_lvl+0x45/0x5a
> > |  unlock_parents_siglocks+0xb6/0xc0
> > |  ptrace_stop+0xb9/0x390
> > |  get_signal+0x51c/0x8d0
> > |  arch_do_signal_or_restart+0x31/0x750
> > |  exit_to_user_mode_prepare+0x157/0x220
> > |  irqentry_exit_to_user_mode+0x5/0x50
> > |  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
> 
> How odd.  I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this.  I guess not.
> 
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

	rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.
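
The circular dependency can be made concrete with a userspace mock. This is
a sketch under loud assumptions: the mock_* types and macros below only
imitate the shape of the kernel's lockdep_is_held() and
rcu_dereference_protected(); they are not the kernel definitions, and
parent_of is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel definitions. */
struct mock_lock { bool held; };
#define mock_lockdep_is_held(l) ((l)->held)
/* Return p, asserting that the protecting condition holds. */
#define mock_rcu_dereference_protected(p, cond) (assert(cond), (p))

struct mock_sighand { struct mock_lock siglock; };
struct mock_task {
	struct mock_task *parent;
	struct mock_sighand *sighand;
};

/*
 * The lock that protects t->parent lives behind t->parent itself, so the
 * pointer has to be read once without a check (the role that
 * rcu_dereference_raw() would play) before the condition can be evaluated.
 */
static struct mock_task *parent_of(struct mock_task *t)
{
	struct mock_task *p = t->parent; /* unchecked read */

	return mock_rcu_dereference_protected(t->parent,
			mock_lockdep_is_held(&p->sighand->siglock));
}
```

With the parent's siglock marked as held the helper returns the parent;
with it clear, the assertion fires, which is the analogue of the lockdep
warning above.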
Doug Anderson May 20, 2022, 11:01 p.m. UTC | #10
Hi,

On Thu, May 19, 2022 at 4:49 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Doug Anderson <dianders@chromium.org> writes:
>
> > Hi,
> >
> > On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> kdb has a bug that when using the ps command to display a list of
> >> processes, if a process is being debugged the debugger is shown as the
> >> parent process.
> >>
> >> This is silly, and I expect it never comes up in practice.  As there
> >> is very little point in using gdb and kdb simultaneously.  Update the
> >> code to use real_parent so that it is clear kdb does not want to
> >> display a debugger as the parent of a process.
> >
> > So I would tend to defer to Daniel, but I'm not convinced that the
> > behavior you describe for kdb today _is_ actually silly.
> >
> > If I was in kdb and I was listing processes, I might actually want to
> > see that a process's parent was set to gdb. Presumably that would tell
> > me extra information that might be relevant to my debug session.
> >
> > Personally, I'd rather add an extra piece of information into the list
> > showing the real parent if it's not the same as the parent. Then
> > you're not throwing away information.
>
> The name of the field is confusing for anyone who isn't intimate with
> the implementation details.  The function getppid returns
> tsk->real_parent->tgid.
>
> If kdb wants information of what the tracer is that is fine, but I
> recommend putting that information in another field.
>
> Given that the original description says give the information that ps
> gives my sense is that kdb is currently wrong.  Especially as it does
> not give you the actual parentage anywhere.
>
> I can certainly be convinced, but I do want some clarity.  It looks very
> attractive to rename task->parent to task->ptracer and leave the field
> NULL when there is no tracer.

Fair enough. You can consider my objection rescinded.

Presumably, though, you're hoping for an Ack for your patch and you
plan to take it with the rest of the series. That's going to need to
come from Daniel anyway as he is the actual maintainer. I'm just the
peanut gallery. ;-)

-Doug
Eric W. Biederman June 6, 2022, 4:12 p.m. UTC | #11
Kyle Huey <khuey@pernos.co> writes:

> On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>>
>> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> >> cleanly to Linus's tip.
>> >
>> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>>
>> Yes that is the branch this all applies to.
>>
>> This is my second round of cleanups this cycle for this code.
>> I just keep finding little things that deserve to be changed,
>> when I am working on the more substantial issues.
>>
>> Eric
>
> When running the rr test suite, I see hangs like this

Thanks.  I will dig into this.

Is there an easy way I can run the rr test suite to see if I can
reproduce this myself?

Thanks,
Eric

Kyle Huey June 9, 2022, 7:59 p.m. UTC | #12
On Mon, Jun 6, 2022 at 9:12 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kyle Huey <khuey@pernos.co> writes:
>
> > On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >>
> >> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> >>
> >> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> >> cleanly to Linus's tip.
> >> >
> >> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
> >>
> >> Yes that is the branch this all applies to.
> >>
> >> This is my second round of cleanups this cycle for this code.
> >> I just keep finding little things that deserve to be changed,
> >> when I am working on the more substantial issues.
> >>
> >> Eric
> >
> > When running the rr test suite, I see hangs like this
>
> Thanks.  I will dig into this.
>
> Is there an easy way I can run the rr test suite to see if I can
> reproduce this myself?

It should be straightforward:
1. git clone https://github.com/rr-debugger/rr.git
2. mkdir obj-rr && cd obj-rr
3. cmake ../rr
4. make -jN
5. make check

If you have trouble with it feel free to email me off list.

- Kyle
