Message ID: 1530200714-4504-10-git-send-email-vincent.guittot@linaro.org
State: New
Series: track CPU utilization
* Vincent Guittot <vincent.guittot@linaro.org> wrote:

> The utilization of the CPU by rt, dl and interrupts are now tracked with
> PELT so we can use these metrics instead of rt_avg to evaluate the remaining
> capacity available for cfs class.
>
> scale_rt_capacity() behavior has been changed and now returns the remaining
> capacity available for cfs instead of a scaling factor because rt, dl and
> interrupt provide now absolute utilization value.
>
> The same formula as schedutil is used:
> irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
> but the implementation is different because it doesn't return the same value
> and doesn't benefit of the same optimization
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/deadline.c |  2 --
>  kernel/sched/fair.c     | 41 +++++++++++++++++++----------------------
>  kernel/sched/pelt.c     |  2 +-
>  kernel/sched/rt.c       |  2 --
>  4 files changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d2758e3..ce0dcbf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7550,39 +7550,36 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
>  static unsigned long scale_rt_capacity(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> -	u64 total, used, age_stamp, avg;
> -	s64 delta;
> -
> -	/*
> -	 * Since we're reading these variables without serialization make sure
> -	 * we read them once before doing sanity checks on them.
> -	 */
> -	age_stamp = READ_ONCE(rq->age_stamp);
> -	avg = READ_ONCE(rq->rt_avg);
> -	delta = __rq_clock_broken(rq) - age_stamp;
> +	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
> +	unsigned long used, irq, free;
>
> -	if (unlikely(delta < 0))
> -		delta = 0;
> +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> +	irq = READ_ONCE(rq->avg_irq.util_avg);
>
> -	total = sched_avg_period() + delta;
> +	if (unlikely(irq >= max))
> +		return 1;
> +#endif

Note that 'irq' is unused outside that macro block, resulting in a new warning on
defconfig builds:

  CC      kernel/sched/fair.o
  kernel/sched/fair.c: In function ‘scale_rt_capacity’:
  kernel/sched/fair.c:7553:22: warning: unused variable ‘irq’ [-Wunused-variable]
    unsigned long used, irq, free;
                        ^~~

I have applied the delta fix below for simplicity, but what we really want is a
cleanup of that function to eliminate the #ifdefs. One solution would be to factor
out the 'irq' utilization value into a helper inline, and double check that if the
configs are off the compiler does the right thing and eliminates this identity
transformation for the irq==0 case:

	free *= (max - irq);
	free /= max;

If the compiler refuses to optimize this away (due to the zero and overflow
cases), try to find something more clever?

Thanks,

	Ingo

 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3221db0511a..d5f7d521e448 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7550,7 +7550,10 @@ static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
-	unsigned long used, irq, free;
+	unsigned long used, free;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	unsigned long irq;
+#endif
 
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
 	irq = READ_ONCE(rq->avg_irq.util_avg);
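For illustration, here is one shape such a helper inline could take. This is a sketch only: the names cpu_util_irq() and scale_irq_capacity(), and placing them in kernel/sched/sched.h, are assumptions rather than anything posted in this thread. The config dependency moves into the helpers, so scale_rt_capacity() itself needs no #ifdefs and the no-irq case becomes an explicit identity at compile time instead of relying on the optimizer:

/* Sketch for kernel/sched/sched.h -- helper names are illustrative. */
#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
static inline unsigned long cpu_util_irq(struct rq *rq)
{
	return READ_ONCE(rq->avg_irq.util_avg);
}

static inline
unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned long max)
{
	/* Scale by the fraction of capacity not consumed by interrupts. */
	util *= (max - irq);
	util /= max;

	return util;
}
#else
static inline unsigned long cpu_util_irq(struct rq *rq)
{
	return 0;
}

static inline
unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned long max)
{
	/* No irq PELT signal: the scaling is a no-op by construction. */
	return util;
}
#endif

With helpers of this kind, scale_rt_capacity() could read irq = cpu_util_irq(rq) unconditionally, bail out early when irq >= max, and finish with return scale_irq_capacity(max - used, irq, max).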
On Mon, 2018-07-16 at 00:15 +0200, Ingo Molnar wrote:
> * Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> > The utilization of the CPU by rt, dl and interrupts are now tracked with
> > PELT so we can use these metrics instead of rt_avg to evaluate the remaining
> > capacity available for cfs class.
> >
> > scale_rt_capacity() behavior has been changed and now returns the remaining
> > capacity available for cfs instead of a scaling factor because rt, dl and
> > interrupt provide now absolute utilization value.
> >
> > The same formula as schedutil is used:
> > irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
> > but the implementation is different because it doesn't return the same value
> > and doesn't benefit of the same optimization
[]
> I have applied the delta fix below for simplicity, but what we really want is a
> cleanup of that function to eliminate the #ifdefs. One solution would be to factor
> out the 'irq' utilization value into a helper inline, and double check that if the
> configs are off the compiler does the right thing and eliminates this identity
> transformation for the irq==0 case:
[]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
[]
> @@ -7550,7 +7550,10 @@ static unsigned long scale_rt_capacity(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
> -	unsigned long used, irq, free;
> +	unsigned long used, free;
> +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> +	unsigned long irq;
> +#endif
>
>  #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)

Perhaps combine these two #if defined blocks into a single block

> 	irq = READ_ONCE(rq->avg_irq.util_avg);
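For reference, a sketch of what the combined block could look like in the delta-fixed function (an assumption about the eventual shape, not code from this thread; it relies on the conditional declaration still preceding the first statement, so -Wdeclaration-after-statement stays quiet):

static unsigned long scale_rt_capacity(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
	unsigned long used, free;

#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
	/* Declare and read 'irq' in one conditional block. */
	unsigned long irq = READ_ONCE(rq->avg_irq.util_avg);

	if (unlikely(irq >= max))
		return 1;
#endif

	used = READ_ONCE(rq->avg_rt.util_avg);
	used += READ_ONCE(rq->avg_dl.util_avg);

	if (unlikely(used >= max))
		return 1;

	free = max - used;
#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
	free *= (max - irq);
	free /= max;
#endif
	return free;
}

This removes the duplicated guard at the top but still leaves #ifdefs in the function body, which is why the helper-inline cleanup Ingo suggests is likely the longer-term fix.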
Hi Ingo,

On Mon, 16 Jul 2018 at 00:15, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> > @@ -7550,39 +7550,36 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
> > static unsigned long scale_rt_capacity(int cpu)
> > {
> > 	struct rq *rq = cpu_rq(cpu);
> > -	u64 total, used, age_stamp, avg;
> > -	s64 delta;
> > -
> > -	/*
> > -	 * Since we're reading these variables without serialization make sure
> > -	 * we read them once before doing sanity checks on them.
> > -	 */
> > -	age_stamp = READ_ONCE(rq->age_stamp);
> > -	avg = READ_ONCE(rq->rt_avg);
> > -	delta = __rq_clock_broken(rq) - age_stamp;
> > +	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
> > +	unsigned long used, irq, free;
> >
> > -	if (unlikely(delta < 0))
> > -		delta = 0;
> > +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> > +	irq = READ_ONCE(rq->avg_irq.util_avg);
> >
> > -	total = sched_avg_period() + delta;
> > +	if (unlikely(irq >= max))
> > +		return 1;
> > +#endif
>
> Note that 'irq' is unused outside that macro block, resulting in a new warning on
> defconfig builds:
>
>   CC      kernel/sched/fair.o
>   kernel/sched/fair.c: In function ‘scale_rt_capacity’:
>   kernel/sched/fair.c:7553:22: warning: unused variable ‘irq’ [-Wunused-variable]
>     unsigned long used, irq, free;
>                         ^~~
>
> I have applied the delta fix below for simplicity, but what we really want is a
> cleanup of that function to eliminate the #ifdefs. One solution would be to factor
> out the 'irq' utilization value into a helper inline, and double check that if the
> configs are off the compiler does the right thing and eliminates this identity
> transformation for the irq==0 case:
>
> 	free *= (max - irq);
> 	free /= max;
>
> If the compiler refuses to optimize this away (due to the zero and overflow
> cases), try to find something more clever?

Thanks for the fix.
I'm off for now and will look at your proposal above once back

Regards,
Vincent

>
> Thanks,
>
> 	Ingo
>
>  kernel/sched/fair.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e3221db0511a..d5f7d521e448 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7550,7 +7550,10 @@ static unsigned long scale_rt_capacity(int cpu)
> {
> 	struct rq *rq = cpu_rq(cpu);
> 	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
> -	unsigned long used, irq, free;
> +	unsigned long used, free;
> +#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> +	unsigned long irq;
> +#endif
>
> #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> 	irq = READ_ONCE(rq->avg_irq.util_avg);
* Vincent Guittot <vincent.guittot@linaro.org> wrote:

> > If the compiler refuses to optimize this away (due to the zero and overflow
> > cases), try to find something more clever?
>
> Thanks for the fix.
> I'm off for now and will look at your proposal above once back

Sounds good, there's no rush, we've still got time until ~rc7.

Thanks,

	Ingo
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f4de2698..68b8a9f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1180,8 +1180,6 @@ static void update_curr_dl(struct rq *rq)
 	curr->se.exec_start = now;
 	cgroup_account_cputime(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
-
 	if (dl_entity_is_special(dl_se))
 		return;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d2758e3..ce0dcbf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7550,39 +7550,36 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
 static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	u64 total, used, age_stamp, avg;
-	s64 delta;
-
-	/*
-	 * Since we're reading these variables without serialization make sure
-	 * we read them once before doing sanity checks on them.
-	 */
-	age_stamp = READ_ONCE(rq->age_stamp);
-	avg = READ_ONCE(rq->rt_avg);
-	delta = __rq_clock_broken(rq) - age_stamp;
+	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long used, irq, free;
 
-	if (unlikely(delta < 0))
-		delta = 0;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	irq = READ_ONCE(rq->avg_irq.util_avg);
 
-	total = sched_avg_period() + delta;
+	if (unlikely(irq >= max))
+		return 1;
+#endif
 
-	used = div_u64(avg, total);
+	used = READ_ONCE(rq->avg_rt.util_avg);
+	used += READ_ONCE(rq->avg_dl.util_avg);
 
-	if (likely(used < SCHED_CAPACITY_SCALE))
-		return SCHED_CAPACITY_SCALE - used;
+	if (unlikely(used >= max))
+		return 1;
 
-	return 1;
+	free = max - used;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	free *= (max - irq);
+	free /= max;
+#endif
+	return free;
 }
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
+	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
 
-	cpu_rq(cpu)->cpu_capacity_orig = capacity;
-
-	capacity *= scale_rt_capacity(cpu);
-	capacity >>= SCHED_CAPACITY_SHIFT;
+	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(sd, cpu);
 
 	if (!capacity)
 		capacity = 1;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index ead6d8b..35475c0 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -237,7 +237,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runna
 	 */
 	sa->load_avg = div_u64(load * sa->load_sum, divider);
 	sa->runnable_load_avg = div_u64(runnable * sa->runnable_load_sum, divider);
-	sa->util_avg = sa->util_sum / divider;
+	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }
 
 /*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0e3e57a..2a881bd 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -970,8 +970,6 @@ static void update_curr_rt(struct rq *rq)
 	curr->se.exec_start = now;
 	cgroup_account_cputime(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
-
 	if (!rt_bandwidth_enabled())
 		return;
The utilization of the CPU by rt, dl and interrupts are now tracked with
PELT so we can use these metrics instead of rt_avg to evaluate the remaining
capacity available for cfs class.

scale_rt_capacity() behavior has been changed and now returns the remaining
capacity available for cfs instead of a scaling factor because rt, dl and
interrupt provide now absolute utilization value.

The same formula as schedutil is used:
irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
but the implementation is different because it doesn't return the same value
and doesn't benefit of the same optimization

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/deadline.c |  2 --
 kernel/sched/fair.c     | 41 +++++++++++++++++++----------------------
 kernel/sched/pelt.c     |  2 +-
 kernel/sched/rt.c       |  2 --
 4 files changed, 20 insertions(+), 27 deletions(-)

-- 
2.7.4
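As a closing illustration of the arithmetic the patch implements, here is a tiny standalone sketch with made-up numbers (max capacity 1024, rt utilization 128, dl utilization 96, irq utilization 256); none of these values come from the patch itself:

#include <stdio.h>

int main(void)
{
	unsigned long max  = 1024;		/* arch_scale_cpu_capacity() */
	unsigned long rt   = 128;		/* rq->avg_rt.util_avg  (example value) */
	unsigned long dl   = 96;		/* rq->avg_dl.util_avg  (example value) */
	unsigned long irq  = 256;		/* rq->avg_irq.util_avg (example value) */
	unsigned long used = rt + dl;		/* 224 */
	unsigned long free = max - used;	/* 800 */

	/* Scale the remainder by the fraction of time not taken by interrupts. */
	free *= (max - irq);
	free /= max;				/* 800 * 768 / 1024 = 600 */

	printf("capacity left for CFS: %lu / %lu\n", free, max);
	return 0;
}

For comparison, aggregating the same inputs the schedutil way (ignoring cfs) gives irq + (max - irq)/max * (rt + dl) = 256 + 768/1024 * 224 = 424, and 1024 - 424 = 600, so the two formulations agree on the capacity left for cfs.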