Message ID | 20160505111308.GA22310@e105550-lin.cambridge.arm.com |
---|---|
State | New |
Headers | show |
Hi Morten, On 5 May 2016 at 13:13, Morten Rasmussen <morten.rasmussen@arm.com> wrote: > On Wed, May 04, 2016 at 04:02:44AM +0800, Yuyang Du wrote: >> In sched average update, a period is about 1ms, so a 32-bit unsigned >> integer can approximately hold a maximum of 49 (=2^32/1000/3600/24) >> days. >> >> For usual cases, 32bit is big enough and 64bit is needless. But if >> a task sleeps longer than it, there can be two outcomes: >> >> Consider a task sleeps whatever m milliseconds, then n = (u32)m. > > Maybe it crystal clear that you mean: For any m > UINT_MAX? > >> >> 1. If n >= 32*64, then the task's sched avgs will be surely decayed >> to 0. In this case, it really doesn't matter that the 32bit is not >> big enough to hold m. In other words, a task sleeps 2 secs or sleeps >> 50 days are the same from sched average point of view. >> >> 2. If m < 32*64, the chance to be here is very very low, but if so, > > Should that be: n < 32*64 ? > > Talking about 32*64, I don't get why we don't use LOAD_AVG_MAX_N. I had > a patch ready to post for that: > > From 5055e5f82c8d207880035c2ec4ecf1ac1e7f9e91 Mon Sep 17 00:00:00 2001 > From: Morten Rasmussen <morten.rasmussen@arm.com> > Date: Mon, 11 Apr 2016 15:41:37 +0100 > Subject: [PATCH] sched/fair: Fix decay to zero period in decay_load() > > In __compute_runnable_contrib() we are happy with returning LOAD_AVG_MAX > when the update period n >= LOAD_AVG_MAX_N (=345), so we should be happy > with returning zero for n >= LOAD_AVG_MAX_N when decaying in > decay_load() as well instead of only returning zero for n > > LOAD_AVG_PERIOD * 63 (=2016). > > As both conditions are unlikely() the impact is quite likely very small, > but at least it makes the rounding errors for load accumulation and > decay symmetrical. > > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 56b7d4b..42515b6 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2527,7 +2527,7 @@ static __always_inline u64 decay_load(u64 val, u64 n) > > if (!n) > return val; > - else if (unlikely(n > LOAD_AVG_PERIOD * 63)) > + else if (unlikely(n > LOAD_AVG_MAX_N)) I had the same question about this 63 and IIUC, the 63 comes from 64bits-1. The load can be that high that the 64bits are used. We shift right every LOAD_AVG_PERIOD so we will be sure that the value will be 0 after shifting all 64 bits > return 0; > > /* after bounds checking we can collapse to 32-bit */ > -- > 1.9.1 > > >> the task's sched avgs MAY NOT be decayed to 0, depending on how >> big its sums are, and the chance to 0 is still good as load_sum >> is way less than ~0ULL and util_sum way less than ~0U. > > I don't get the last bit about load_sum < ~0ULL and util_sum < ~0U. > Whether you get to zero depends on the sums (as you say) and the actual > value of 'n'. It is true that you might get to zero even if n < > LOAD_AVG_MAX_N if the sums are small. > >> Nevertheless, what really maters is what happens in the worst-case >> scenario, which is when (u32)m = 0? So in that case, it would be like >> after so long a sleep, we treat the task as it never slept, and has the >> same sched averages as before it slept. That is actually not bad or >> nothing to worry about, and think of it as the same as how we treat >> a new born task. > > There is subtle but important difference between not decaying a task > correctly and adding new task: The sleeping task is already accounted > for in the cfs_rq.avg.{load,util}_sum. The sleeping task's contribution > to cfs_rq.avg has been decayed correctly in the mean time which means > that by not guaranteeing a decay of the se.avg at wake-up you introduce > a discrepancy between the task's owen view of its contribution (se.avg) > and the cfs_rq view (cfs_rq.avg). That might lead to trouble if the task > is migrated at wake-up as we remove se.avg amount of contribution from > the previous cpu's cfs_rq.avg. In other words, you remove too much from > the cfs_rq.avg. > > The discrepancy will decay and disappear after LOAD_AVG_MAX_N ms, which > might be acceptable, but it is not a totally harmless change IMHO.
On Fri, May 06, 2016 at 02:19:32AM +0800, Yuyang Du wrote: > Hi Morten, > > On Thu, May 05, 2016 at 12:13:10PM +0100, Morten Rasmussen wrote: > > On Wed, May 04, 2016 at 04:02:44AM +0800, Yuyang Du wrote: > > > In sched average update, a period is about 1ms, so a 32-bit unsigned > > > integer can approximately hold a maximum of 49 (=2^32/1000/3600/24) > > > days. > > > > > > For usual cases, 32bit is big enough and 64bit is needless. But if > > > a task sleeps longer than it, there can be two outcomes: > > > > > > Consider a task sleeps whatever m milliseconds, then n = (u32)m. > > > > Maybe it crystal clear that you mean: For any m > UINT_MAX? > > En, literally whatever. Your second case below is not correct for m < UINT_MAX. You are only talking about cases where n != m which is m > UINT_MAX. > > > > 1. If n >= 32*64, then the task's sched avgs will be surely decayed > > > to 0. In this case, it really doesn't matter that the 32bit is not > > > big enough to hold m. In other words, a task sleeps 2 secs or sleeps > > > 50 days are the same from sched average point of view. > > > > > > 2. If m < 32*64, the chance to be here is very very low, but if so, > > > > Should that be: n < 32*64 ? > > > > Talking about 32*64, I don't get why we don't use LOAD_AVG_MAX_N. I had > > a patch ready to post for that: > > > > >From 5055e5f82c8d207880035c2ec4ecf1ac1e7f9e91 Mon Sep 17 00:00:00 2001 > > From: Morten Rasmussen <morten.rasmussen@arm.com> > > Date: Mon, 11 Apr 2016 15:41:37 +0100 > > Subject: [PATCH] sched/fair: Fix decay to zero period in decay_load() > > > > In __compute_runnable_contrib() we are happy with returning LOAD_AVG_MAX > > when the update period n >= LOAD_AVG_MAX_N (=345), so we should be happy > > with returning zero for n >= LOAD_AVG_MAX_N when decaying in > > decay_load() as well instead of only returning zero for n > > > LOAD_AVG_PERIOD * 63 (=2016). > > So basically, you want to add another rule in addition to the exponential > decay rule. No, not at all. I want to make the 'rules' symmetrical for accumulation and decay exactly like the patch does. > > > > the task's sched avgs MAY NOT be decayed to 0, depending on how > > > big its sums are, and the chance to 0 is still good as load_sum > > > is way less than ~0ULL and util_sum way less than ~0U. > > > > I don't get the last bit about load_sum < ~0ULL and util_sum < ~0U. > > Whether you get to zero depends on the sums (as you say) and the actual > > value of 'n'. It is true that you might get to zero even if n < > > LOAD_AVG_MAX_N if the sums are small. > > Frankly, util Ben brought it up, I didn't think a task sleeping so long > is even possible. But I do admit it may happen. > > However, I will say this. A task sleeping so long is already very rare, > and among all those long sleep cases, the chance that after waking up the > avgs will not be decayed to zero is much less than 0.5 in a million > (=32*64/2^32=1/2^21), assuming the sleeping time is uniformly distributed. You can't just ignore cases because they have a low probability. Going by that logic we could remove a lot of synchronization overhead in the kernel. My concern is whether we introduce any assumptions that might hit us later when everybody has forgotten about them. This one would be extremely hard to debug later. > > > Nevertheless, what really maters is what happens in the worst-case > > > scenario, which is when (u32)m = 0? So in that case, it would be like > > > after so long a sleep, we treat the task as it never slept, and has the > > > same sched averages as before it slept. That is actually not bad or > > > nothing to worry about, and think of it as the same as how we treat > > > a new born task. > > > > There is subtle but important difference between not decaying a task > > correctly and adding new task: The sleeping task is already accounted > > for in the cfs_rq.avg.{load,util}_sum. The sleeping task's contribution > > to cfs_rq.avg has been decayed correctly in the mean time which means > > that by not guaranteeing a decay of the se.avg at wake-up you introduce > > a discrepancy between the task's owen view of its contribution (se.avg) > > and the cfs_rq view (cfs_rq.avg). That might lead to trouble if the task > > is migrated at wake-up as we remove se.avg amount of contribution from > > the previous cpu's cfs_rq.avg. In other words, you remove too much from > > the cfs_rq.avg. > > > > The discrepancy will decay and disappear after LOAD_AVG_MAX_N ms, which > > might be acceptable, but it is not a totally harmless change IMHO. > > That is just an explanation, :) nevermind, or I really don't think that is > a deal. I don't really follow. I analysed the implications of the overflow that you are introducing. In my opinion this is what you should have done before proposing this patch. I think it is essential to understand what assumptions we make and introducing new ones should be carefully considered. I think it is a big 'deal' if you are not more careful when you are submitting patches and just ignore feedback. We spent a lot of time reviewing them. Morten
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 56b7d4b..42515b6 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2527,7 +2527,7 @@ static __always_inline u64 decay_load(u64 val, u64 n) if (!n) return val; - else if (unlikely(n > LOAD_AVG_PERIOD * 63)) + else if (unlikely(n > LOAD_AVG_MAX_N)) return 0; /* after bounds checking we can collapse to 32-bit */