mbox series

[v3,0/11] ptrace: cleaning up ptrace_stop

Message ID 87k0b0apne.fsf_-_@email.froward.int.ebiederm.org
Headers show
Series ptrace: cleaning up ptrace_stop | expand

Message

Eric W. Biederman May 4, 2022, 10:39 p.m. UTC
The states TASK_STOPPED and TASK_TRACE are special in they can not
handle spurious wake-ups.  This plus actively depending upon and
changing the value of tsk->__state causes problems for PREEMPT_RT and
Peter's freezer rewrite.

There are a lot of details we have to get right to sort out the
technical challenges and this is my parred back version of the changes
that contains just those problems I see good solutions to that I believe
are ready.

A couple of issues have been pointed but I think this parred back set of
changes is still on the right track.  The biggest change in v3 is that
instead of trying to prevent sending a spurious SIGTRAP when the tracer
dies with the tracee in ptrace_report_syscall, I have modified the code
to just stop trying.  While I still have taken TASK_WAKEKILL out of
TASK_TRACED I have implemented simpler logic in signal_wake_up.  Further
I have followed Oleg's advice and exit early from ptrace_stop if a fatal
signal is pending.

This set of changes should support Peter's freezer rewrite, and with the
addition of changing wait_task_inactive(TASK_TRACED) to be
wait_task_inactive(0) in ptrace_check_attach I don't think there are any
races or issues to be concerned about from the ptrace side.

More work is needed to support PREEMPT_RT, but these changes get things
closer.

I believe this set of changes will provide a firm foundation for solving
the PREEMPT_RT and freezer challenges.

With fewer lines added and more lines removed this set of changes looks
like it is moving in a good direction.

Eric W. Biederman (10):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      ptrace: Remove arch_ptrace_attach
      signal: Use lockdep_assert_held instead of assert_spin_locked
      ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
      ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
      ptrace: Don't change __state
      ptrace: Always take siglock in ptrace_resume

Peter Zijlstra (1):
      sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

 arch/ia64/include/asm/ptrace.h    |   4 --
 arch/ia64/kernel/ptrace.c         |  57 ----------------
 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +--
 arch/um/kernel/signal.c           |   4 +-
 arch/x86/kernel/step.c            |   3 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched.h             |  10 ++-
 include/linux/sched/jobctl.h      |   8 +++
 include/linux/sched/signal.h      |  20 ++++--
 include/linux/signal.h            |   3 +-
 kernel/ptrace.c                   |  87 ++++++++----------------
 kernel/sched/core.c               |   5 +-
 kernel/signal.c                   | 135 +++++++++++++++++---------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 20 files changed, 138 insertions(+), 237 deletions(-)

Eric

Comments

Sebastian Andrzej Siewior May 5, 2022, 12:50 p.m. UTC | #1
On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new flag is
implement ?

> set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in
> jobctl_unfreeze_task (when ptrace_stop remains asleep).

Sebastian
Oleg Nesterov May 5, 2022, 2:57 p.m. UTC | #2
On 05/04, Eric W. Biederman wrote:
>
> -static int ptrace_stop(int exit_code, int why, int clear_code,
> -			unsigned long message, kernel_siginfo_t *info)
> +static int ptrace_stop(int exit_code, int why, unsigned long message,
> +		       kernel_siginfo_t *info)
>  	__releases(&current->sighand->siglock)
>  	__acquires(&current->sighand->siglock)
>  {
> @@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  
>  	spin_unlock_irq(&current->sighand->siglock);
>  	read_lock(&tasklist_lock);
> -	if (likely(current->ptrace)) {
> -		/*
> -		 * Notify parents of the stop.
> -		 *
> -		 * While ptraced, there are two parents - the ptracer and
> -		 * the real_parent of the group_leader.  The ptracer should
> -		 * know about every stop while the real parent is only
> -		 * interested in the completion of group stop.  The states
> -		 * for the two don't interact with each other.  Notify
> -		 * separately unless they're gonna be duplicates.
> -		 */
> +	/*
> +	 * Notify parents of the stop.
> +	 *
> +	 * While ptraced, there are two parents - the ptracer and
> +	 * the real_parent of the group_leader.  The ptracer should
> +	 * know about every stop while the real parent is only
> +	 * interested in the completion of group stop.  The states
> +	 * for the two don't interact with each other.  Notify
> +	 * separately unless they're gonna be duplicates.
> +	 */
> +	if (current->ptrace)
>  		do_notify_parent_cldstop(current, true, why);
> -		if (gstop_done && ptrace_reparented(current))
> -			do_notify_parent_cldstop(current, false, why);
> +	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
> +		do_notify_parent_cldstop(current, false, why);
>  
> -		/*
> -		 * Don't want to allow preemption here, because
> -		 * sys_ptrace() needs this task to be inactive.
> -		 *
> -		 * XXX: implement read_unlock_no_resched().
> -		 */
> -		preempt_disable();
> -		read_unlock(&tasklist_lock);
> -		cgroup_enter_frozen();
> -		preempt_enable_no_resched();
> -		freezable_schedule();
> -		cgroup_leave_frozen(true);
> -	} else {
> -		/*
> -		 * By the time we got the lock, our tracer went away.
> -		 * Don't drop the lock yet, another tracer may come.
> -		 *
> -		 * If @gstop_done, the ptracer went away between group stop
> -		 * completion and here.  During detach, it would have set
> -		 * JOBCTL_STOP_PENDING on us and we'll re-enter
> -		 * TASK_STOPPED in do_signal_stop() on return, so notifying
> -		 * the real parent of the group stop completion is enough.
> -		 */
> -		if (gstop_done)
> -			do_notify_parent_cldstop(current, false, why);
> -
> -		/* tasklist protects us from ptrace_freeze_traced() */
> -		__set_current_state(TASK_RUNNING);
> -		read_code = false;
> -		if (clear_code)
> -			exit_code = 0;
> -		read_unlock(&tasklist_lock);
> -	}
> +	/*
> +	 * Don't want to allow preemption here, because
> +	 * sys_ptrace() needs this task to be inactive.
> +	 *
> +	 * XXX: implement read_unlock_no_resched().
> +	 */
> +	preempt_disable();
> +	read_unlock(&tasklist_lock);
> +	cgroup_enter_frozen();
> +	preempt_enable_no_resched();
> +	freezable_schedule();

I must have missed something.

So the tracee calls ptrace_notify() but debugger goes away before the
ptrace_notify() takes siglock. After that the no longer traced task
will sleep in TASK_TRACED ?

Looks like ptrace_stop() needs to check current->ptrace before it does
set_special_state(TASK_TRACED) with siglock held? Then we can rely on
ptrace_unlink() which will wake the tracee up even if debugger exits.

No?

Oleg.
Oleg Nesterov May 5, 2022, 3:01 p.m. UTC | #3
On 05/04, Eric W. Biederman wrote:
>
> With the removal of the incomplete detection of the tracer going away
> in ptrace_stop, ptrace_stop always sleeps in schedule after
> ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
> warn if wait_task_inactive fails.

Oh. Again, I don't understand the changelog. If we forget about RT,
ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
may_ptrace_stop() has gone.

IOW. Lets forget about RT

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>
> -	if (!ret && !ignore_state) {
> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
> -			/*
> -			 * This can only happen if may_ptrace_stop() fails and
> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
> -			 * so we should not worry about leaking __TASK_TRACED.
> -			 */
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> -			ret = -ESRCH;
> -		}
> -	}
> +	if (!ret && !ignore_state &&
> +	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> +		ret = -ESRCH;
>
>  	return ret;
>  }

Why do you think this change would be wrong without any other changes?

Oleg.
Eric W. Biederman May 5, 2022, 4:48 p.m. UTC | #4
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>> 
>> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
>> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new flag is
> implement ?

Yes.  Thank you.

Eric
Eric W. Biederman May 5, 2022, 4:59 p.m. UTC | #5
Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> -static int ptrace_stop(int exit_code, int why, int clear_code,
>> -			unsigned long message, kernel_siginfo_t *info)
>> +static int ptrace_stop(int exit_code, int why, unsigned long message,
>> +		       kernel_siginfo_t *info)
>>  	__releases(&current->sighand->siglock)
>>  	__acquires(&current->sighand->siglock)
>>  {
>> @@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>  
>>  	spin_unlock_irq(&current->sighand->siglock);
>>  	read_lock(&tasklist_lock);
>> -	if (likely(current->ptrace)) {
>> -		/*
>> -		 * Notify parents of the stop.
>> -		 *
>> -		 * While ptraced, there are two parents - the ptracer and
>> -		 * the real_parent of the group_leader.  The ptracer should
>> -		 * know about every stop while the real parent is only
>> -		 * interested in the completion of group stop.  The states
>> -		 * for the two don't interact with each other.  Notify
>> -		 * separately unless they're gonna be duplicates.
>> -		 */
>> +	/*
>> +	 * Notify parents of the stop.
>> +	 *
>> +	 * While ptraced, there are two parents - the ptracer and
>> +	 * the real_parent of the group_leader.  The ptracer should
>> +	 * know about every stop while the real parent is only
>> +	 * interested in the completion of group stop.  The states
>> +	 * for the two don't interact with each other.  Notify
>> +	 * separately unless they're gonna be duplicates.
>> +	 */
>> +	if (current->ptrace)
>>  		do_notify_parent_cldstop(current, true, why);
>> -		if (gstop_done && ptrace_reparented(current))
>> -			do_notify_parent_cldstop(current, false, why);
>> +	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
>> +		do_notify_parent_cldstop(current, false, why);
>>  
>> -		/*
>> -		 * Don't want to allow preemption here, because
>> -		 * sys_ptrace() needs this task to be inactive.
>> -		 *
>> -		 * XXX: implement read_unlock_no_resched().
>> -		 */
>> -		preempt_disable();
>> -		read_unlock(&tasklist_lock);
>> -		cgroup_enter_frozen();
>> -		preempt_enable_no_resched();
>> -		freezable_schedule();
>> -		cgroup_leave_frozen(true);
>> -	} else {
>> -		/*
>> -		 * By the time we got the lock, our tracer went away.
>> -		 * Don't drop the lock yet, another tracer may come.
>> -		 *
>> -		 * If @gstop_done, the ptracer went away between group stop
>> -		 * completion and here.  During detach, it would have set
>> -		 * JOBCTL_STOP_PENDING on us and we'll re-enter
>> -		 * TASK_STOPPED in do_signal_stop() on return, so notifying
>> -		 * the real parent of the group stop completion is enough.
>> -		 */
>> -		if (gstop_done)
>> -			do_notify_parent_cldstop(current, false, why);
>> -
>> -		/* tasklist protects us from ptrace_freeze_traced() */
>> -		__set_current_state(TASK_RUNNING);
>> -		read_code = false;
>> -		if (clear_code)
>> -			exit_code = 0;
>> -		read_unlock(&tasklist_lock);
>> -	}
>> +	/*
>> +	 * Don't want to allow preemption here, because
>> +	 * sys_ptrace() needs this task to be inactive.
>> +	 *
>> +	 * XXX: implement read_unlock_no_resched().
>> +	 */
>> +	preempt_disable();
>> +	read_unlock(&tasklist_lock);
>> +	cgroup_enter_frozen();
>> +	preempt_enable_no_resched();
>> +	freezable_schedule();
>
> I must have missed something.
>
> So the tracee calls ptrace_notify() but debugger goes away before the
> ptrace_notify() takes siglock. After that the no longer traced task
> will sleep in TASK_TRACED ?
>
> Looks like ptrace_stop() needs to check current->ptrace before it does
> set_special_state(TASK_TRACED) with siglock held? Then we can rely on
> ptrace_unlink() which will wake the tracee up even if debugger exits.
>
> No?

Hmm.  If the debugger goes away when siglock is dropped and reaquired at
the top of ptrace_stop, that would appear to set the debugged process up
to sleep indefinitely.

I was thinking of the SIGKILL case which is handled.

Thank you for catching that.

Eric
Eric W. Biederman May 5, 2022, 5:21 p.m. UTC | #6
Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> With the removal of the incomplete detection of the tracer going away
>> in ptrace_stop, ptrace_stop always sleeps in schedule after
>> ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
>> warn if wait_task_inactive fails.
>
> Oh. Again, I don't understand the changelog. If we forget about RT,
> ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
> may_ptrace_stop() has gone.
>
> IOW. Lets forget about RT
>
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>
>> -	if (!ret && !ignore_state) {
>> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
>> -			/*
>> -			 * This can only happen if may_ptrace_stop() fails and
>> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
>> -			 * so we should not worry about leaking __TASK_TRACED.
>> -			 */
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> -			ret = -ESRCH;
>> -		}
>> -	}
>> +	if (!ret && !ignore_state &&
>> +	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
>> +		ret = -ESRCH;
>>
>>  	return ret;
>>  }
>
> Why do you think this change would be wrong without any other changes?

For purposes of this analysis ptrace_detach and ptrace_exit (when the
tracer exits) can't happen.  So the bug you spotted in ptrace_stop does
not apply.

I was thinking that the test against !current->ptrace that replaced
the old may_ptrace_stop could trigger a failure here.  If the
ptrace_freeze_traced happens before that test that branch clearly can
not happen.

*Looks twice* Both ptrace_check_attach and ptrace_stop taking a
read_lock on tasklist_lock does not protect against concurrency by each
other, but the write_lock on tasklist_lock in ptrace_attach does
protect against a ptrace_attach coming in after the test and before
__set_current_state(TASK_RUNNING).

So yes. I should really split that part out into it's own patch.
And yes that WARN_ON_ONCE can trigger on PREEMPT_RT but that is just
because PREMPT_RT is currently broken with respect to ptrace.  Which
makes a WARN_ON_ONCE appropriate.

I will see how much of this analysis I can put in the changelog.

Thank you,
Eric
Oleg Nesterov May 5, 2022, 5:27 p.m. UTC | #7
On 05/05, Eric W. Biederman wrote:
>
> And yes that WARN_ON_ONCE can trigger on PREEMPT_RT but that is just
> because PREMPT_RT is currently broken with respect to ptrace.  Which
> makes a WARN_ON_ONCE appropriate.

Yes agreed. In this case WARN_ON_ONCE() can help a user to understand
that a failure was caused by the kernel problem which we need to fix
anyway.

Oleg.
Eric W. Biederman May 5, 2022, 5:53 p.m. UTC | #8
Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> -static int ptrace_stop(int exit_code, int why, int clear_code,
>> -			unsigned long message, kernel_siginfo_t *info)
>> +static int ptrace_stop(int exit_code, int why, unsigned long message,
>> +		       kernel_siginfo_t *info)
>
> Forgot to mention... but in general I like this change.
>
> In particular, I like the fact it kills the ugly "int clear_code" arg
> which looks as if it solves the problems with the exiting tracer, but
> actually it doesn't. And we do not really care, imo.

Further either this change is necessary or we need to take siglock in
the !current->ptrace path in "ptrace: Don't change __state" so that
JOBCTL_TRACED can be cleared.

So I vote for deleting code, and making ptrace_stop easier to reason
about.

Eric
Oleg Nesterov May 5, 2022, 6:10 p.m. UTC | #9
On 05/05, Eric W. Biederman wrote:
>
> So I vote for deleting code, and making ptrace_stop easier to reason
> about.

Yes, yes, agreed.

Oleg.