mbox series

[0/6] sched: support schedstats for RT sched class

Message ID 20201201115416.26515-1-laoar.shao@gmail.com
Headers show
Series sched: support schedstats for RT sched class | expand

Message

Yafang Shao Dec. 1, 2020, 11:54 a.m. UTC
We want to measure the latency of RT tasks in our production
environment with schedstats facility, but currently schedstats is only
supported for fair sched class. This patch enable it for RT sched class
as well.

- Structure

Before we make schedstats available for RT sched class, we should make
struct sched_statistics independent of fair sched class first.


The struct sched_statistics is the schedular statistics of a task_struct
or a task_group. So we can move it into struct task_struct and
struct task_group to achieve the goal.

Below is the detailed explaination of the change in the structs.

The struct before this patch:

struct task_struct {            |-> struct sched_entity {
    ...                         |       ...
    struct sched_entity *se; ---|       struct sched_statistics statistics;
    struct sched_rt_entity *rt;         ...
    ...                                 ...
};                                  };

struct task_group {             |--> se[0]->statistics : schedstats of CPU0
    ...                         |
 #ifdef CONFIG_FAIR_GROUP_SCHED |
    struct sched_entity **se; --|--> se[1]->statistics : schedstats of CPU1
                                |
 #endif                         |
                                |--> se[N]->statistics : schedstats of CPUn

 #ifdef CONFIG_FAIR_GROUP_SCHED
    struct sched_rt_entity  **rt_se; (N/A)
 #endif
    ...
};

The '**se' in task_group is allocated in the fair sched class, which is
hard to be reused by other sched class.

The struct after this patch:
struct task_struct {
    ...
    struct sched_statistics statistics;
    ...
    struct sched_entity *se;
    struct sched_rt_entity *rt;
    ...
};

struct task_group {                    |---> stats[0] : of CPU0
    ...                                |
    struct sched_statistics **stats; --|---> stats[1] : of CPU1
    ...                                |
                                       |---> stats[n] : of CPUn
 #ifdef CONFIG_FAIR_GROUP_SCHED
    struct sched_entity **se;
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
    struct sched_rt_entity  **rt_se;
 #endif
    ...
};

After the patch it is clearly that both of se or rt_se can easily get the
sched_statistics by a task_struct or a task_group.

- Function Interface

The original prototype of the schedstats helpers are

update_stats_wait_*(struct cfs_rq *cfs_rq, struct sched_entity *se)

The cfs_rq in these helpers is used to get the rq_clock, and the se is
used to get the struct sched_statistics and the struct task_struct. In
order to make these helpers available by all sched classes, we can pass
the rq, sched_statistics and task_struct directly.

Then the new helpers are

update_stats_wait_*(struct rq *rq, struct task_struct *p,
                    struct sched_statistics *stats)

which are independent of fair sched class.

To avoid vmlinux growing too large or introducing ovehead when
!schedstat_enabled(), some new helpers after schedstat_enabled() are also
introduced, Suggested by Mel. These helpers are in sched/stats.c,

__update_stats_wait_*(struct rq *rq, struct task_struct *p,
                      struct sched_statistics *stats)

- Implementation

After we make the struct sched_statistics and the helpers of it
independent of fair sched class, we can easily use the schedstats
facility for RT sched class.

The schedstat usage in RT sched class is similar with fair sched class,
for example,
                fair                            RT
enqueue         update_stats_enqueue_fair       update_stats_enqueue_rt
dequeue         update_stats_dequeue_fair       update_stats_dequeue_rt
put_prev_task   update_stats_wait_start         update_stats_wait_start
set_next_task   update_stats_wait_end           update_stats_wait_end

- Usage
 
The user can get the schedstats information in the same way in fair sched
class. For example,
		fair				RT
task show	/proc/[pid]/sched               /proc/[pid]/sched
group show	cpu.stat in cgroup		cpu.stat in cgroup

The sched:sched_stat_{wait, sleep, iowait, blocked, runtime} tracepoints
can be used to trace RT tasks as well.

Detailed examples of the output of schedstats for RT tasks are in patch #6.


Changes since RFC:
- improvement of schedstats helpers, per Mel.
- make struct schedstats independent of fair sched class

Yafang Shao (6):
  sched: don't include stats.h in sched.h
  sched, fair: use __schedstat_set() in set_next_entity()
  sched: make struct sched_statistics independent of fair sched class
  sched: make schedstats helpers independent of fair sched class
  sched, rt: support sched_stat_runtime tracepoint for RT sched class
  sched, rt: support schedstats for RT sched class

 include/linux/sched.h    |   3 +-
 kernel/sched/core.c      |  25 +++--
 kernel/sched/deadline.c  |   5 +-
 kernel/sched/debug.c     |  82 +++++++--------
 kernel/sched/fair.c      | 209 +++++++++++++++------------------------
 kernel/sched/idle.c      |   1 +
 kernel/sched/rt.c        | 142 +++++++++++++++++++++++++-
 kernel/sched/sched.h     |   9 +-
 kernel/sched/stats.c     | 105 ++++++++++++++++++++
 kernel/sched/stats.h     |  80 +++++++++++++++
 kernel/sched/stop_task.c |   5 +-
 11 files changed, 476 insertions(+), 190 deletions(-)

Comments

Mel Gorman Dec. 1, 2020, 12:41 p.m. UTC | #1
On Tue, Dec 01, 2020 at 07:54:13PM +0800, Yafang Shao wrote:
> If we want to use schedstats facility, we should move out of
> struct sched_statistics from the struct sched_entity or add it into other
> sctructs of sched entity as well. Obviously the latter one is bad because
> that requires more spaces. So we should move it into a common struct which
> can be used by all sched classes.
> 
> The struct sched_statistics is the schedular statistics of a task_struct
> or a task_group. So we can move it into struct task_struct and
> struct task_group to achieve the goal.
> 
> Below is the detailed explaination of the change in the structs.
> 
> - Before this patch
> 
> struct task_struct {            |-> struct sched_entity {
>     ...                         |       ...
>     struct sched_entity *se; ---|       struct sched_statistics statistics;
>     struct sched_rt_entity *rt;         ...
>     ...                                 ...
> };                                  };
> 
> struct task_group {             |--> se[0]->statistics : schedstats of CPU0
>     ...                         |
>  #ifdef CONFIG_FAIR_GROUP_SCHED |
>     struct sched_entity **se; --|--> se[1]->statistics : schedstats of CPU1
>                                 |
>  #endif                         |
>                                 |--> se[N]->statistics : schedstats of CPUn
> 
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>     struct sched_rt_entity  **rt_se; (N/A)
>  #endif
>     ...
> };
> 
> The '**se' in task_group is allocated in the fair sched class, which is
> hard to be reused by other sched class.
> 
> - After this patch
> 
> struct task_struct {
>     ...
>     struct sched_statistics statistics;
>     ...
>     struct sched_entity *se;
>     struct sched_rt_entity *rt;
>     ...
> };
> 
> struct task_group {                    |---> stats[0] : of CPU0
>     ...                                |
>     struct sched_statistics **stats; --|---> stats[1] : of CPU1
>     ...                                |
>                                        |---> stats[n] : of CPUn
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>     struct sched_entity **se;
>  #endif
>  #ifdef CONFIG_RT_GROUP_SCHED
>     struct sched_rt_entity  **rt_se;
>  #endif
>     ...
> };
> 
> After the patch it is clearly that both of se or rt_se can easily get the
> sched_statistics by a task_struct or a task_group.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

I didn't see anything wrong as such, it's mostly a mechanical
conversion. The one slight caveat is the potential change in cache
location for the statistics but it's not necessarily negative. The stats
potentially move to a different cache line but it's less obvious whether
that even matters given the location is very similar.

There is increased overhead now when schedstats are *enabled* because
_schedstat_from_sched_entity() has to be called but it appears that it is
protected by a schedstat_enabled() check. So ok, schedstats when enabled
are now a bit more expensive but they were expensive in the first place
so does it matter?

I'd have been happied if there was a comparison with schedstats enabled
just in case the overhead is too high but it could also do with a second
set of eyeballs.

It's somewhat tentative but

Acked-by: Mel Gorman <mgorman@suse.de>
Yafang Shao Dec. 2, 2020, 2:06 a.m. UTC | #2
On Tue, Dec 1, 2020 at 8:41 PM Mel Gorman <mgorman@suse.de> wrote:
>

> On Tue, Dec 01, 2020 at 07:54:13PM +0800, Yafang Shao wrote:

> > If we want to use schedstats facility, we should move out of

> > struct sched_statistics from the struct sched_entity or add it into other

> > sctructs of sched entity as well. Obviously the latter one is bad because

> > that requires more spaces. So we should move it into a common struct which

> > can be used by all sched classes.

> >

> > The struct sched_statistics is the schedular statistics of a task_struct

> > or a task_group. So we can move it into struct task_struct and

> > struct task_group to achieve the goal.

> >

> > Below is the detailed explaination of the change in the structs.

> >

> > - Before this patch

> >

> > struct task_struct {            |-> struct sched_entity {

> >     ...                         |       ...

> >     struct sched_entity *se; ---|       struct sched_statistics statistics;

> >     struct sched_rt_entity *rt;         ...

> >     ...                                 ...

> > };                                  };

> >

> > struct task_group {             |--> se[0]->statistics : schedstats of CPU0

> >     ...                         |

> >  #ifdef CONFIG_FAIR_GROUP_SCHED |

> >     struct sched_entity **se; --|--> se[1]->statistics : schedstats of CPU1

> >                                 |

> >  #endif                         |

> >                                 |--> se[N]->statistics : schedstats of CPUn

> >

> >  #ifdef CONFIG_FAIR_GROUP_SCHED

> >     struct sched_rt_entity  **rt_se; (N/A)

> >  #endif

> >     ...

> > };

> >

> > The '**se' in task_group is allocated in the fair sched class, which is

> > hard to be reused by other sched class.

> >

> > - After this patch

> >

> > struct task_struct {

> >     ...

> >     struct sched_statistics statistics;

> >     ...

> >     struct sched_entity *se;

> >     struct sched_rt_entity *rt;

> >     ...

> > };

> >

> > struct task_group {                    |---> stats[0] : of CPU0

> >     ...                                |

> >     struct sched_statistics **stats; --|---> stats[1] : of CPU1

> >     ...                                |

> >                                        |---> stats[n] : of CPUn

> >  #ifdef CONFIG_FAIR_GROUP_SCHED

> >     struct sched_entity **se;

> >  #endif

> >  #ifdef CONFIG_RT_GROUP_SCHED

> >     struct sched_rt_entity  **rt_se;

> >  #endif

> >     ...

> > };

> >

> > After the patch it is clearly that both of se or rt_se can easily get the

> > sched_statistics by a task_struct or a task_group.

> >

> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

>

> I didn't see anything wrong as such, it's mostly a mechanical

> conversion. The one slight caveat is the potential change in cache

> location for the statistics but it's not necessarily negative. The stats

> potentially move to a different cache line but it's less obvious whether

> that even matters given the location is very similar.

>

> There is increased overhead now when schedstats are *enabled* because

> _schedstat_from_sched_entity() has to be called but it appears that it is

> protected by a schedstat_enabled() check. So ok, schedstats when enabled

> are now a bit more expensive but they were expensive in the first place

> so does it matter?

>

> I'd have been happied if there was a comparison with schedstats enabled

> just in case the overhead is too high but it could also do with a second

> set of eyeballs.

>


Sure, I will do the comparison. Thanks for the review again.


> It's somewhat tentative but

>

> Acked-by: Mel Gorman <mgorman@suse.de>

>

> --

> Mel Gorman

> SUSE Labs




-- 
Thanks
Yafang