Message ID | 871qwq5ucx.fsf_-_@email.froward.int.ebiederm.org |
---|---|
Series | ptrace: cleanups and calling do_cldstop with only siglock |
On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> Is there a git branch somewhere I can pull to test this? It doesn't apply
> cleanly to Linus's tip.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

> - Kyle

Sebastian
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes, that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed
when I am working on the more substantial issues.

Eric
Hi,

On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger is displayed
> as the parent process.
>
> This is silly, and I expect it never comes up in practice, as there
> is very little point in using gdb and kdb simultaneously. Update the
> code to use real_parent so that it is clear kdb does not want to
> display a debugger as the parent of a process.

So I would tend to defer to Daniel, but I'm not convinced that the
behavior you describe for kdb today _is_ actually silly.

If I was in kdb and I was listing processes, I might actually want to
see that a process's parent was set to gdb. Presumably that would tell
me extra information that might be relevant to my debug session.

Personally, I'd rather add an extra piece of information into the list
showing the real parent if it's not the same as the parent. Then
you're not throwing away information.

-Doug
Doug Anderson <dianders@chromium.org> writes:

> Hi,
>
> On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> kdb has a bug that when using the ps command to display a list of
>> processes, if a process is being debugged the debugger is displayed
>> as the parent process.
>>
>> This is silly, and I expect it never comes up in practice, as there
>> is very little point in using gdb and kdb simultaneously. Update the
>> code to use real_parent so that it is clear kdb does not want to
>> display a debugger as the parent of a process.
>
> So I would tend to defer to Daniel, but I'm not convinced that the
> behavior you describe for kdb today _is_ actually silly.
>
> If I was in kdb and I was listing processes, I might actually want to
> see that a process's parent was set to gdb. Presumably that would tell
> me extra information that might be relevant to my debug session.
>
> Personally, I'd rather add an extra piece of information into the list
> showing the real parent if it's not the same as the parent. Then
> you're not throwing away information.

The name of the field is confusing for anyone who isn't intimate with
the implementation details. The function getppid returns
tsk->real_parent->tgid.

If kdb wants information about what the tracer is, that is fine, but I
recommend putting that information in another field.

Given that the original description says to give the information that ps
gives, my sense is that kdb is currently wrong. Especially as it does
not give you the actual parentage anywhere.

I can certainly be convinced, but I do want some clarity. It looks very
attractive to rename task->parent to task->ptracer and leave the field
NULL when there is no tracer.

Eric
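The distinction being argued over can be sketched with a toy userspace model. This is not kernel code: `struct toy_task`, `toy_getppid()` and `toy_tracer_tgid()` are hypothetical names invented for illustration, and only the two fields named in the thread are modelled. It assumes, as the discussion implies, that `parent` points at the tracer while a task is traced, and it also tolerates the proposed convention of a NULL tracer field.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the two task_struct fields discussed in the thread.
 * Hypothetical type, NOT the kernel's task_struct. */
struct toy_task {
    int tgid;
    struct toy_task *real_parent; /* the process that created this task */
    struct toy_task *parent;      /* the tracer while traced; otherwise
                                     real_parent (or NULL in the proposal) */
};

/* What getppid reports according to the thread: real_parent->tgid. */
int toy_getppid(const struct toy_task *t)
{
    assert(t != NULL && t->real_parent != NULL);
    return t->real_parent->tgid;
}

/* A separate "tracer" column: the tracer's tgid, or 0 when untraced. */
int toy_tracer_tgid(const struct toy_task *t)
{
    assert(t != NULL);
    if (t->parent == NULL || t->parent == t->real_parent)
        return 0;
    return t->parent->tgid;
}
```

With a shell (tgid 100) that started both gdb (tgid 200) and a child (tgid 300) that gdb later attached to, `toy_getppid()` on the child gives 100 and `toy_tracer_tgid()` gives 200, so neither piece of information is thrown away.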
On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> cleanly to Linus's tip.
> >
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>
> Yes that is the branch this all applies to.
>
> This is my second round of cleanups this cycle for this code.
> I just keep finding little things that deserve to be changed,
> when I am working on the more substantial issues.
>
> Eric

When running the rr test suite, I see hangs like this:

[ 812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
[ 812.151529] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211 btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
[ 812.151570] libcrc32c hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169 psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
[ 812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G I L 5.18.0-rc1+ #2
[ 812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
[ 812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
[ 812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
[ 812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
[ 812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
[ 812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
[ 812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
[ 812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
[ 812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
[ 812.151598] FS: 00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
[ 812.151599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
[ 812.151601] Call Trace:
[ 812.151602] <TASK>
[ 812.151604] do_signal_stop+0x228/0x260
[ 812.151606] get_signal+0x43a/0x8e0
[ 812.151608] arch_do_signal_or_restart+0x37/0x7d0
[ 812.151610] ? __this_cpu_preempt_check+0x13/0x20
[ 812.151612] ? __perf_event_task_sched_in+0x81/0x230
[ 812.151616] ? __this_cpu_preempt_check+0x13/0x20
[ 812.151617] exit_to_user_mode_prepare+0x130/0x1a0
[ 812.151620] syscall_exit_to_user_mode+0x26/0x40
[ 812.151621] ret_from_fork+0x15/0x30
[ 812.151623] RIP: 0033:0x7f612dfcd125
[ 812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
[ 812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
[ 812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
[ 812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
[ 812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
[ 812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
[ 812.151632] </TASK>

- Kyle
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully. Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT. So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. Both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
| #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
| #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
| <TASK>
| dump_stack_lvl+0x45/0x5a
| unlock_parents_siglocks+0xb6/0xc0
| ptrace_stop+0xb9/0x390
| get_signal+0x51c/0x8d0
| arch_do_signal_or_restart+0x31/0x750
| exit_to_user_mode_prepare+0x157/0x220
| irqentry_exit_to_user_mode+0x5/0x50
| asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand, which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>>
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully. Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT. So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up; I somehow assumed
> that you added a few patches on top. Might have been yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> | #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> | #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> | <TASK>
> | dump_stack_lvl+0x45/0x5a
> | unlock_parents_siglocks+0xb6/0xc0
> | ptrace_stop+0xb9/0x390
> | get_signal+0x51c/0x8d0
> | arch_do_signal_or_restart+0x31/0x750
> | exit_to_user_mode_prepare+0x157/0x220
> | irqentry_exit_to_user_mode+0x5/0x50
> | asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd. I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this. I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric
On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >>
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully. Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT. So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up; I somehow assumed
> > that you added a few patches on top. Might have been yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > | #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > | #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > | <TASK>
> > | dump_stack_lvl+0x45/0x5a
> > | unlock_parents_siglocks+0xb6/0xc0
> > | ptrace_stop+0xb9/0x390
> > | get_signal+0x51c/0x8d0
> > | arch_do_signal_or_restart+0x31/0x750
> > | exit_to_user_mode_prepare+0x157/0x220
> > | irqentry_exit_to_user_mode+0x5/0x50
> > | asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
>
> How odd. I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this. I guess not.
>
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

	rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.
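The circular dependency described above (the lock that protects `task->parent` can only be found by first reading `task->parent`) can be illustrated with a small userspace model. This is a sketch under loud assumptions: `toy_lock`, `toy_task`, `toy_deref_protected()`, `toy_deref_raw()` and `toy_parent_checked()` are hypothetical stand-ins, the lock is placed directly in the task rather than behind `sighand`, and a plain `assert()` plays the role that lockdep plays in the kernel. It is not how the series resolves the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only remembers whether it is held, standing in for the
 * state lockdep tracks. Hypothetical type, not kernel code. */
struct toy_lock {
    bool held;
};

struct toy_task {
    struct toy_lock siglock;  /* simplification: no separate sighand */
    struct toy_task *parent;
};

/* Model of rcu_dereference_protected(p, cond): yield p, but trip an
 * assertion if the protecting condition does not hold. */
#define toy_deref_protected(p, cond) (assert(cond), (p))

/* Model of rcu_dereference_raw(p): yield p with no checking at all. */
#define toy_deref_raw(p) (p)

/* The awkward case: the condition mentions the very pointer being
 * protected, so one unchecked read is needed to locate the lock. */
struct toy_task *toy_parent_checked(struct toy_task *t)
{
    struct toy_task *p = toy_deref_raw(t->parent); /* unchecked read */

    if (p == NULL)
        return NULL;
    /* Checked read: the parent's siglock must be held by the caller. */
    return toy_deref_protected(t->parent, p->siglock.held);
}
```

In this model the checked read succeeds only when the parent's `siglock.held` is true; the initial raw read is exactly the step that a comment would have to justify if `rcu_dereference_raw()` were used in the real code.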
Hi,

On Thu, May 19, 2022 at 4:49 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Doug Anderson <dianders@chromium.org> writes:
>
> > Hi,
> >
> > On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> kdb has a bug that when using the ps command to display a list of
> >> processes, if a process is being debugged the debugger is displayed
> >> as the parent process.
> >>
> >> This is silly, and I expect it never comes up in practice, as there
> >> is very little point in using gdb and kdb simultaneously. Update the
> >> code to use real_parent so that it is clear kdb does not want to
> >> display a debugger as the parent of a process.
> >
> > So I would tend to defer to Daniel, but I'm not convinced that the
> > behavior you describe for kdb today _is_ actually silly.
> >
> > If I was in kdb and I was listing processes, I might actually want to
> > see that a process's parent was set to gdb. Presumably that would tell
> > me extra information that might be relevant to my debug session.
> >
> > Personally, I'd rather add an extra piece of information into the list
> > showing the real parent if it's not the same as the parent. Then
> > you're not throwing away information.
>
> The name of the field is confusing for anyone who isn't intimate with
> the implementation details. The function getppid returns
> tsk->real_parent->tgid.
>
> If kdb wants information about what the tracer is, that is fine, but I
> recommend putting that information in another field.
>
> Given that the original description says to give the information that ps
> gives, my sense is that kdb is currently wrong. Especially as it does
> not give you the actual parentage anywhere.
>
> I can certainly be convinced, but I do want some clarity. It looks very
> attractive to rename task->parent to task->ptracer and leave the field
> NULL when there is no tracer.

Fair enough. You can consider my objection rescinded.

Presumably, though, you're hoping for an Ack for your patch and you
plan to take it with the rest of the series. That's going to need to
come from Daniel anyway as he is the actual maintainer. I'm just the
peanut gallery. ;-)

-Doug
Kyle Huey <khuey@pernos.co> writes:

> On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>>
>> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> >> cleanly to Linus's tip.
>> >
>> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>>
>> Yes that is the branch this all applies to.
>>
>> This is my second round of cleanups this cycle for this code.
>> I just keep finding little things that deserve to be changed,
>> when I am working on the more substantial issues.
>>
>> Eric
>
> When running the rr test suite, I see hangs like this

Thanks. I will dig into this.

Is there an easy way I can run the rr test suite to see if I can
reproduce this myself?

Thanks,
Eric

> [ 812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
> [...]
> [ 812.151632] </TASK>
>
> - Kyle
On Mon, Jun 6, 2022 at 9:12 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kyle Huey <khuey@pernos.co> writes:
>
> > On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >>
> >> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> >>
> >> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> >> cleanly to Linus's tip.
> >> >
> >> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
> >>
> >> Yes that is the branch this all applies to.
> >>
> >> This is my second round of cleanups this cycle for this code.
> >> I just keep finding little things that deserve to be changed,
> >> when I am working on the more substantial issues.
> >>
> >> Eric
> >
> > When running the rr test suite, I see hangs like this
>
> Thanks. I will dig into this.
>
> Is there an easy way I can run the rr test suite to see if I can
> reproduce this myself?

It should be straightforward:

1. git clone https://github.com/rr-debugger/rr.git
2. mkdir obj-rr && cd obj-rr
3. cmake ../rr
4. make -jN
5. make check

If you have trouble with it feel free to email me off list.

- Kyle

> Thanks,
> Eric
>
> > [ 812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
> > [...]
> > [ 812.151632] </TASK>
> >
> > - Kyle