mbox series

[0/6] sched: Cleanup task_struct::state

Message ID 20210602131225.336600299@infradead.org
Headers show
Series sched: Cleanup task_struct::state | expand

Message

Peter Zijlstra June 2, 2021, 1:12 p.m. UTC
Hi!

The task_struct::state variable is a bit odd in a number of ways:

 - it's declared 'volatile' (against current practises);
 - it's 'unsigned long' which is a weird size;
 - it's type is inconsistent when used for function arguments.

These patches clean that up by making it consistently 'unsigned int', and
replace (almost) all accesses with READ_ONCE()/WRITE_ONCE(). In order to not
miss any, the variable is renamed, ensuring a missed conversion results in a
compile error.

The first few patches fix a number of pre-existing errors and introduce a few
helpers to make the final conversion less painful.

Comments

Mathieu Desnoyers June 2, 2021, 2:01 p.m. UTC | #1
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()	READ_ONCE(current->state)

Why use a macro rather than a static inline here ?

Thanks,

Mathieu
Mathieu Desnoyers June 2, 2021, 2:06 p.m. UTC | #2
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
> 	psinfo->pr_pgrp = task_pgrp_vnr(p);
> 	psinfo->pr_sid = task_session_vnr(p);
> 
> -	i = p->state ? ffz(~p->state) + 1 : 0;
> +	state = READ_ONCE(p->__state);
> +	i = state ? ffz(~state) + 1 : 0;
> 	psinfo->pr_state = i;
> 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
> 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
> 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> 					 TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) !=
> 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED |
> __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) &
> (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
> 	do {							\
> 		WARN_ON_ONCE(is_special_task_state(state_value));\
> 		current->task_state_change = _THIS_IP_;		\
> -		current->state = (state_value);			\
> +		WRITE_ONCE(current->__state, (state_value));	\
> 	} while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
READ_ONCE() and WRITE_ONCE() all over the kernel ?

Thanks,

Mathieu
Peter Zijlstra June 2, 2021, 2:12 p.m. UTC | #3
On Wed, Jun 02, 2021 at 10:01:29AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > Remove yet another few p->state accesses.
> 
> [...]
> 
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -212,6 +212,8 @@ struct task_group;
> > 
> > #endif
> > 
> > +#define get_current_state()	READ_ONCE(current->state)
> 
> Why use a macro rather than a static inline here ?

Mostly to be consistent, all that state stuff is macros. I suppose we
could try and make them inlines at the end or so -- if the header maze
allows.
Peter Zijlstra June 2, 2021, 2:20 p.m. UTC | #4
On Wed, Jun 02, 2021 at 10:06:58AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> > @@ -134,14 +134,14 @@ struct task_group;
> > 	do {							\
> > 		WARN_ON_ONCE(is_special_task_state(state_value));\
> > 		current->task_state_change = _THIS_IP_;		\
> > -		current->state = (state_value);			\
> > +		WRITE_ONCE(current->__state, (state_value));	\
> > 	} while (0)
> 
> Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle
> READ_ONCE() and WRITE_ONCE() all over the kernel ?

set_task_state() is fundamentally unsound, there's very few sites that
_set_ state on anything other than current, and those sites are super
tricky, eg. ptrace.

Having get_task_state() would seem to suggest it's actually a sane thing
to do, it's not really. Inspecting remote state is full of races, and
some of that really wants cleaning up, but that's for another day.
Will Deacon June 2, 2021, 3:02 p.m. UTC | #5
On Wed, Jun 02, 2021 at 03:12:29PM +0200, Peter Zijlstra wrote:
> Remove yet another few p->state accesses.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c        |    2 +-
>  include/linux/sched.h |    2 ++
>  kernel/freezer.c      |    2 +-
>  kernel/sched/core.c   |    6 +++---
>  4 files changed, 7 insertions(+), 5 deletions(-)

I think you can include kernel/kcsan/report.c here too.

With that:

Acked-by: Will Deacon <will@kernel.org>

Will
Will Deacon June 2, 2021, 3:06 p.m. UTC | #6
On Wed, Jun 02, 2021 at 03:12:30PM +0200, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/time/timer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1879,7 +1879,7 @@ signed long __sched schedule_timeout(sig
>  			printk(KERN_ERR "schedule_timeout: wrong timeout "
>  				"value %lx\n", timeout);
>  			dump_stack();
> -			current->state = TASK_RUNNING;
> +			__set_current_state(TASK_RUNNING);

Acked-by: Will Deacon <will@kernel.org>

Will
Peter Zijlstra June 2, 2021, 4:46 p.m. UTC | #7
On Wed, Jun 02, 2021 at 03:59:21PM +0100, Will Deacon wrote:
> On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> > Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> > task_is_running(p).
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/process.c |    4 ++--
> >  block/blk-mq.c            |    2 +-
> >  include/linux/sched.h     |    2 ++
> >  kernel/locking/lockdep.c  |    2 +-
> >  kernel/rcu/tree_plugin.h  |    2 +-
> >  kernel/sched/core.c       |    6 +++---
> >  kernel/sched/stats.h      |    2 +-
> >  kernel/signal.c           |    2 +-
> >  kernel/softirq.c          |    3 +--
> >  mm/compaction.c           |    2 +-
> >  10 files changed, 14 insertions(+), 13 deletions(-)
> > 
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
> >  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
> >  	int count = 0;
> >  
> > -	if (p == current || p->state == TASK_RUNNING)
> > +	if (p == current || task_is_running(p))
> 
> Looks like this one in get_wchan() has been cargo-culted across most of
> arch/ so they'll need fixing up before you rename the struct member.

Yeah, this was x86_64 allmodconfig driven, I've already got a bunch of
robot mail telling me other archs need help, I'll fix it iup.

> There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

I'm tempted to let the bpf people sort their own gunk. This is not an
ABI. I so don't care breaking every script out there.
Davidlohr Bueso June 2, 2021, 7:54 p.m. UTC | #8
On Wed, 02 Jun 2021, Peter Zijlstra wrote:

-ENOCHANGELONG

But yeah, I thought we had gotten rid of all these.

>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Peter Zijlstra June 3, 2021, 6:39 a.m. UTC | #9
On Wed, Jun 02, 2021 at 12:54:58PM -0700, Davidlohr Bueso wrote:
> On Wed, 02 Jun 2021, Peter Zijlstra wrote:
> 
> -ENOCHANGELONG

I completely failed to come up with something useful, still do. Subject
says it all.

> But yeah, I thought we had gotten rid of all these.

I too was surprised to find it :-)
Daniel Thompson June 7, 2021, 10:45 a.m. UTC | #10
On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  ...
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  ...
> --- a/kernel/debug/kdb/kdb_support.c
> +++ b/kernel/debug/kdb/kdb_support.c
> @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons
>   */
>  char kdb_task_state_char (const struct task_struct *p)
>  {
> -	int cpu;
> -	char state;
> +	unsigned int p_state;
>  	unsigned long tmp;
> +	char state;
> +	int cpu;
>  
>  	if (!p ||
>  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
>  		return 'E';
>  
>  	cpu = kdb_process_cpu(p);
> -	state = (p->state == 0) ? 'R' :
> -		(p->state < 0) ? 'U' :
> -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :
> -		(p->state & TASK_STOPPED) ? 'T' :
> -		(p->state & TASK_TRACED) ? 'C' :
> +	p_state = READ_ONCE(p->__state);
> +	state = (p_state == 0) ? 'R' :
> +		(p_state < 0) ? 'U' :

Looks like the U here stands for Unreachable since this patch makes it
more obvious that this clause is (and previously was) exactly that!

Dropping the U state would be good since I guess this will show up as a
"new" warning in some tools. However it was a preexisting problem so with
or without this cleaned up:
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.

> +		(p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
> +		(p_state & TASK_STOPPED) ? 'T' :
> +		(p_state & TASK_TRACED) ? 'C' :
>  		(p->exit_state & EXIT_ZOMBIE) ? 'Z' :
>  		(p->exit_state & EXIT_DEAD) ? 'E' :
> -		(p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
> +		(p_state & TASK_INTERRUPTIBLE) ? 'S' : '?';
>  	if (is_idle_task(p)) {
>  		/* Idle task.  Is it really idle, apart from the kdb
>  		 * interrupt? */
Peter Zijlstra June 7, 2021, 11:10 a.m. UTC | #11
On Mon, Jun 07, 2021 at 11:45:00AM +0100, Daniel Thompson wrote:
> On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:

> > Change the type and name of task_struct::state. Drop the volatile and

> > shrink it to an 'unsigned int'. Rename it in order to find all uses

> > such that we can use READ_ONCE/WRITE_ONCE as appropriate.

> > 

> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> > ---

> >  ...

> >  kernel/debug/kdb/kdb_support.c |   18 +++++++------

> >  ...

> > --- a/kernel/debug/kdb/kdb_support.c

> > +++ b/kernel/debug/kdb/kdb_support.c

> > @@ -609,23 +609,25 @@ unsigned long kdb_task_state_string(cons

> >   */

> >  char kdb_task_state_char (const struct task_struct *p)

> >  {

> > -	int cpu;

> > -	char state;

> > +	unsigned int p_state;

> >  	unsigned long tmp;

> > +	char state;

> > +	int cpu;

> >  

> >  	if (!p ||

> >  	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))

> >  		return 'E';

> >  

> >  	cpu = kdb_process_cpu(p);

> > -	state = (p->state == 0) ? 'R' :

> > -		(p->state < 0) ? 'U' :

> > -		(p->state & TASK_UNINTERRUPTIBLE) ? 'D' :

> > -		(p->state & TASK_STOPPED) ? 'T' :

> > -		(p->state & TASK_TRACED) ? 'C' :

> > +	p_state = READ_ONCE(p->__state);

> > +	state = (p_state == 0) ? 'R' :

> > +		(p_state < 0) ? 'U' :

> 

> Looks like the U here stands for Unreachable since this patch makes it

> more obvious that this clause is (and previously was) exactly that!

> 

> Dropping the U state would be good since I guess this will show up as a

> "new" warning in some tools. However it was a preexisting problem so with

> or without this cleaned up:

> Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Thanks!

Note that there's a second instance of this exact code in
arch/powerpc/xmon/xmon.c, with the same 'U' issue.

I'll repost this soon, as it seems I've fixed all robot failout (fingers
crossed).