Date: Fri, 21 Feb 2014 17:36:41 +0100
From: Juri Lelli
To: Peter Zijlstra
Cc: Kirill Tkhai, "linux-kernel@vger.kernel.org", Steven Rostedt, Ingo Molnar
Subject: Re: [RFC] sched/deadline: Prevent rt_time growth to infinity
Message-Id: <20140221173641.a060b3d6c0993c21e77f29c2@gmail.com>
In-Reply-To: <20140221103715.GP9987@twins.programming.kicks-ass.net>
References: <230991392848160@web13m.yandex.ru> <20140221103715.GP9987@twins.programming.kicks-ass.net>

On Fri, 21 Feb 2014 11:37:15 +0100
Peter Zijlstra wrote:

> On Thu, Feb 20, 2014 at 02:16:00AM +0400, Kirill Tkhai wrote:
> > Since deadline tasks share rt bandwidth, we must take care of setting
> > the bandwidth timer. Otherwise rt_time may grow to infinity
> > in update_curr_dl(), if there are no other available RT tasks
> > on the top-level bandwidth.
> > 
> > I'm going to solve the problem as below. It is almost untested,
> > because I skipped almost all of the recent patches which have to be
> > applied from lkml.
> > 
> > Please tell me if I missed anything in the idea. Maybe it is better to
> > put start_top_rt_bandwidth() into set_curr_task_dl()?
> 
> How about we only increment rt_time when there's an RT bandwidth timer
> active?
> 
> 
> ---
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -568,6 +568,12 @@ static inline struct rt_bandwidth *sched
> 
>  #endif /* CONFIG_RT_GROUP_SCHED */
> 
> +bool sched_rt_bandwidth_active(struct rt_rq *rt_rq)
> +{
> +	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> +	return hrtimer_active(&rt_b->rt_period_timer);
> +}
> +
>  #ifdef CONFIG_SMP
>  /*
>   * We ran out of runtime, see if we can borrow some from our neighbours.
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -587,6 +587,8 @@ int dl_runtime_exceeded(struct rq *rq, s
>  	return 1;
>  }
> 
> +extern bool sched_rt_bandwidth_active(struct rt_rq *rt_rq);
> +
>  /*
>   * Update the current task's runtime statistics (provided it is still
>   * a -deadline task and has not been removed from the dl_rq).
> @@ -650,11 +652,13 @@ static void update_curr_dl(struct rq *rq
>  		struct rt_rq *rt_rq = &rq->rt;
> 
>  		raw_spin_lock(&rt_rq->rt_runtime_lock);
> -		rt_rq->rt_time += delta_exec;
>  		/*
>  		 * We'll let actual RT tasks worry about the overflow here, we
> -		 * have our own CBS to keep us inline -- see above.
> +		 * have our own CBS to keep us inline; only account when RT
> +		 * bandwidth is relevant.
>  		 */
> +		if (sched_rt_bandwidth_active(rt_rq))
> +			rt_rq->rt_time += delta_exec;
>  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>  	}
>  }

So, I ran some tests with the above and I'd like to share with you what
I've found.

Here you can find a trace-cmd trace that should be fed to kernelshark to
understand what follows (or feel free to reproduce the same scenario :)):

http://retis.sssup.it/~jlelli/traces/trace_rt_time.dat

Here you have a DL task (4/10) and a while(1) RT task, both running inside
an rt_bw of 0.5. The RT task is activated 500ms after the DL task.
Since I filtered the trace on sched_rt_period_timer(), you can search for
the time instants when rt_bw is replenished. It is evident that the first
time after the rt timer is activated again (search for
start_bandwidth_timer), we can steal some bw from FAIR tasks (if any).
This is because at this time we reset the rt_bw budget and start charging
both DL and RT execution to rt_time, throttling RT tasks when
rt_time > runtime; but, since DL tasks actually execute inside their own
server, they don't care about rt_bw. The good news is that the steady
state is ok: by keeping track of overruns we are able to stop stealing bw
from the other tasks.

My thoughts:
 - Peter's patch is an easy fix to Kirill's problem (RT tasks were
   throttled too early);
 - something to add to this solution could be to pre-calculate the bw of
   ready DL tasks and subtract it from rt_bw at replenishment time, but it
   sounds quite awkward and pessimistic, and I'm not sure it is gonna work;
 - we are stealing bw from best-effort tasks, and only at the beginning of
   the transition; is it really a problem? I mean, if you want guarantees,
   make your tasks DL! :)
 - in the long run we are gonna have RT tasks scheduled inside CBS
   servers, and all this will be properly fixed up.

Comments?
BTW, rt timer activation/deactivation should probably be fixed for
!RT_GROUP_SCHED with something like this:

---
 kernel/sched/rt.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6161de8..274f992 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -86,12 +86,12 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
 	raw_spin_lock_init(&rt_rq->rt_runtime_lock);
 }
 
-#ifdef CONFIG_RT_GROUP_SCHED
 static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
 {
 	hrtimer_cancel(&rt_b->rt_period_timer);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
 #define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
 
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
@@ -1017,8 +1017,12 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	start_rt_bandwidth(&def_rt_bandwidth);
 }
 
-static inline
-void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
+static void
+dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
+{
+	if (!rt_rq->rt_nr_running)
+		destroy_rt_bandwidth(&def_rt_bandwidth);
+}
 
 #endif /* CONFIG_RT_GROUP_SCHED */