[v2,0/4] Freezer rewrite

Message ID 20210624092156.332208049@infradead.org

Peter Zijlstra June 24, 2021, 9:21 a.m. UTC
Hi all,

Now with a completely different approach to freezing the special states.

Patches go on top of tip/master, as they depend on the
task_struct::state rename.

Comments

Rafael J. Wysocki June 30, 2021, 5:05 p.m. UTC | #1
On Thu, Jun 24, 2021 at 11:28 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Hi all,
>
> Now with a completely different approach to freezing the special states.
>
> Patches go on top of tip/master, as they depend on the
> task_struct::state rename.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

for the entire series from the power management side.

Peter Zijlstra July 6, 2021, 1:12 p.m. UTC | #2
On Thu, Jun 24, 2021 at 11:21:56AM +0200, Peter Zijlstra wrote:
> Hi all,
>
> Now with a completely different approach to freezing the special states.

Oleg, could you please have a look at this?

Oleg Nesterov July 7, 2021, 2:14 p.m. UTC | #3
sorry for delay...

I am still trying to understand this series, just one note for now.

On 06/24, Peter Zijlstra wrote:
>
> +static bool __freeze_task(struct task_struct *p)
> +{
> +	unsigned long flags;
> +	unsigned int state;
> +	bool frozen = false;
> +
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	state = READ_ONCE(p->__state);
> +	if (state & (TASK_FREEZABLE|__TASK_STOPPED|__TASK_TRACED)) {
> +		/*
> +		 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE,
> +		 * since they can suffer spurious wakeups.
> +		 */
> +		if (state & TASK_FREEZABLE)
> +			WARN_ON_ONCE(!(state & TASK_NORMAL));
> +
> +#ifdef CONFIG_LOCKDEP
> +		/*
> +		 * It's dangerous to freeze with locks held; there be dragons there.
> +		 */
> +		if (!(state & __TASK_FREEZABLE_UNSAFE))
> +			WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> +#endif
> +
> +		if (state & (__TASK_STOPPED|__TASK_TRACED))
> +			WRITE_ONCE(p->__state, TASK_FROZEN|__TASK_FROZEN_SPECIAL);

Well, this doesn't look right.

Firstly, this can race with ptrace_freeze_traced() which can set
p->__state = __TASK_TRACED and clear TASK_FROZEN. Or with
__set_current_state(TASK_RUNNING) in ptrace_stop().


But the main problem is that you can't simply remove __TASK_TRACED,
this can confuse the debugger, any ptrace() request will fail as if
the tracee was killed.


Another problem. Suppose that p->parent sleeps in do_wait(). p calls
ptrace_stop(), sets __TASK_TRACED, and wakes the parent up.

__freeze_task() clears __TASK_TRACED.

The parent calls wait_task_stopped(p) but it fails because
task_is_traced() returns false. The parent sleeps again, and forever
because __thaw_special() won't notify it.


Or. Suppose that __freeze_task() removes __TASK_STOPPED. The new
debugger comes, the tracee should switch from STOPPED to TRACED. But
this won't happen because task_is_stopped() in ptrace_() will return
false and task_set_jobctl_pending/signal_wake_up_state won't be called.
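
The do_wait() scenario above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `model_*` names and the `MODEL_*` bit values are hypothetical stand-ins chosen for illustration, and the model ignores locking and the races mentioned earlier. It shows only that overwriting the state word drops the TRACED bit that the waiting parent checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this model only; not the kernel's definitions. */
#define MODEL_TASK_STOPPED        0x0004u
#define MODEL_TASK_TRACED         0x0008u
#define MODEL_TASK_FROZEN         0x1000u
#define MODEL_TASK_FROZEN_SPECIAL 0x2000u

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct model_task {
	unsigned int state;
};

/* Mirrors the idea of task_is_traced(): test the TRACED bit in the state. */
bool model_task_is_traced(const struct model_task *p)
{
	assert(p);
	return (p->state & MODEL_TASK_TRACED) != 0;
}

/*
 * Models the last two quoted lines of __freeze_task(): a stopped or traced
 * task has its state overwritten, so the original special bit is gone.
 */
bool model_freeze_special(struct model_task *p)
{
	assert(p);
	if (!(p->state & (MODEL_TASK_STOPPED | MODEL_TASK_TRACED)))
		return false;
	p->state = MODEL_TASK_FROZEN | MODEL_TASK_FROZEN_SPECIAL;
	return true;
}

/* Models the parent's check: the stop is reported only while still traced. */
bool model_wait_task_stopped(const struct model_task *p)
{
	return model_task_is_traced(p);
}
```

In this model, a task that starts with `MODEL_TASK_TRACED` set is visible to `model_wait_task_stopped()` before `model_freeze_special()` runs and invisible afterwards, which is the window in which the parent would go back to sleep.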

Oleg.
Peter Zijlstra Aug. 5, 2021, 11:50 a.m. UTC | #4
On Wed, Jul 07, 2021 at 04:14:12PM +0200, Oleg Nesterov wrote:
> sorry for delay...

And me.. :/

> I am still trying to understand this series, just one note for now.

The main motivation is to ensure tasks don't wake up early on resume.
The current code has a problem between clearing pm_freezing and calling
__thaw_task(): a task can get spuriously woken there.

(Will is doing unspeakable things that suffer there.)

I'm trying to fix that by making frozen a special wait state, but that
then gets me complications vs the existing special states.

I also don't want to change the wakeup path, as you suggested earlier,
because that's adding code (albeit fairly trivial) to every single wakeup
for the benefit of these exceptional cases, which I feel is just wrong
(tempting as it might be).

> On 06/24, Peter Zijlstra wrote:
> >
> > +static bool __freeze_task(struct task_struct *p)
> > +{
> > +	unsigned long flags;
> > +	unsigned int state;
> > +	bool frozen = false;
> > +
> > +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > +	state = READ_ONCE(p->__state);
> > +	if (state & (TASK_FREEZABLE|__TASK_STOPPED|__TASK_TRACED)) {
> > +		/*
> > +		 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE,
> > +		 * since they can suffer spurious wakeups.
> > +		 */
> > +		if (state & TASK_FREEZABLE)
> > +			WARN_ON_ONCE(!(state & TASK_NORMAL));
> > +
> > +#ifdef CONFIG_LOCKDEP
> > +		/*
> > +		 * It's dangerous to freeze with locks held; there be dragons there.
> > +		 */
> > +		if (!(state & __TASK_FREEZABLE_UNSAFE))
> > +			WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > +#endif
> > +
> > +		if (state & (__TASK_STOPPED|__TASK_TRACED))
> > +			WRITE_ONCE(p->__state, TASK_FROZEN|__TASK_FROZEN_SPECIAL);
>
> Well, this doesn't look right.

> But the main problem is that you can't simply remove __TASK_TRACED,
> this can confuse the debugger, any ptrace() request will fail as if
> the tracee was killed.

Urgh.. indeed. I missed the obvious *again* :/ Other, not-yet-frozen,
tasks will observe this 'intermediate' state and misbehave. And similar
on wakeup I suppose, if we wake the ptracer before the tracee it again
can observe this state.

I suppose we could cure that, have stopped/trace users use a special
accessor for task::__state... not pretty. Let me see if I can come up
with anything else.
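
One way to picture the "special accessor" idea is the user-space sketch below. It is purely illustrative and not what the series implements: the `saved_state` field, the `model_*` helpers and the `MODEL_*` bit values are all hypothetical, and locking is ignored. The assumption is that freezing remembers the original special state, and that stopped/traced users read the state through an accessor that looks through the frozen state.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this model only; not the kernel's definitions. */
#define MODEL_TASK_STOPPED        0x0004u
#define MODEL_TASK_TRACED         0x0008u
#define MODEL_TASK_FROZEN         0x1000u
#define MODEL_TASK_FROZEN_SPECIAL 0x2000u

/* Hypothetical task with an extra slot remembering the pre-freeze state. */
struct model_task {
	unsigned int state;
	unsigned int saved_state;
};

/* Freeze a stopped/traced task, keeping its original state on the side. */
bool model_freeze_special(struct model_task *p)
{
	assert(p);
	if (!(p->state & (MODEL_TASK_STOPPED | MODEL_TASK_TRACED)))
		return false;
	p->saved_state = p->state;
	p->state = MODEL_TASK_FROZEN | MODEL_TASK_FROZEN_SPECIAL;
	return true;
}

/* The accessor: while frozen-special, report the state saved at freeze time. */
unsigned int model_special_state(const struct model_task *p)
{
	assert(p);
	if (p->state & MODEL_TASK_FROZEN_SPECIAL)
		return p->saved_state;
	return p->state;
}

/* Stopped/traced users go through the accessor instead of the raw state. */
bool model_task_is_traced(const struct model_task *p)
{
	return (model_special_state(p) & MODEL_TASK_TRACED) != 0;
}

/* Thaw restores the saved state, so no intermediate state is ever visible. */
void model_thaw_special(struct model_task *p)
{
	assert(p);
	if (p->state & MODEL_TASK_FROZEN_SPECIAL) {
		p->state = p->saved_state;
		p->saved_state = 0;
	}
}
```

With this shape, `model_task_is_traced()` keeps returning true across freeze and thaw for a traced task. The cost is the one noted above: every reader of the stopped/traced state has to be converted to the accessor.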