From patchwork Thu Apr 4 14:15:59 2013
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 15933
From: Vincent Guittot <vincent.guittot@linaro.org>
To: linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org,
	peterz@infradead.org, mingo@kernel.org, pjt@google.com,
	rostedt@goodmis.org, fweisbec@gmail.com, efault@gmx.de
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v4] sched: fix wrong rq's runnable_avg update with rt tasks
Date: Thu, 4 Apr 2013 16:15:59 +0200
Message-Id: <1365084959-28374-1-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.7.9.5

The current update of the rq's load can be erroneous when RT tasks are
involved.

The load of an rq that becomes idle is updated only if its avg_idle is
less than sysctl_sched_migration_cost. If RT tasks and short idle
durations alternate, the runnable_avg will not be updated correctly and
the time will be accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle
function, so the elapsed time will be accounted as run time in the rq's
load, whatever the average idle time is. The function
update_rq_runnable_avg is removed from idle_balance.

When an RT task is scheduled on an idle CPU, the rq's load is not
updated when the rq exits idle state, because the CFS functions are not
called. idle_balance, which is called just before entering the idle
function, then updates the rq's load on the assumption that the elapsed
time since the last update was running time only. As a consequence, the
rq's load of a CPU that only runs a periodic RT task stays close to
LOAD_AVG_MAX, whatever the actual running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle
function, so the elapsed time will be accounted as idle time in the
rq's load.

Changes since V3:
- Remove dependency on CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for
  the idle class
- Remove update_rq_runnable_avg from idle_balance

Changes since V2:
- Remove useless definition for UP platforms
- Rebased on top of Steven Rostedt's patches:
  https://lkml.org/lkml/2013/2/12/558

Changes since V1:
- Move code out of the schedule function and create a pre_schedule
  callback for the idle class instead

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
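As a note for reviewers, the convergence described above can be
reproduced outside the kernel. The standalone sketch below is only a
floating-point approximation of the kernel's fixed-point PELT-style
average (decay factor y with y^32 = 1/2, 1024us periods; file name,
variable names and the 10% duty cycle are illustrative, not the
kernel's code). It shows that when idle periods are wrongly folded in
as running time, the average converges near LOAD_AVG_MAX even for a
task that runs one period in ten:

  /* sketch.c - approximate model, not the kernel's implementation */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
  	const double y = pow(0.5, 1.0 / 32.0);	/* decay per 1024us period: y^32 = 1/2 */
  	double sum_correct = 0.0, sum_buggy = 0.0;
  	int p;

  	for (p = 0; p <= 2000; p++) {
  		int running = (p % 10) == 0;	/* periodic task: runs 1 period in 10 */

  		/* correct accounting: idle periods contribute nothing */
  		sum_correct = sum_correct * y + (running ? 1024 : 0);
  		/* buggy accounting: idle time folded in as running time */
  		sum_buggy = sum_buggy * y + 1024;
  	}
  	/* the buggy sum approaches 1024 / (1 - y), close to LOAD_AVG_MAX (47742) */
  	printf("correct: %5.0f   buggy: %5.0f   1024/(1-y): %5.0f\n",
  	       sum_correct, sum_buggy, 1024.0 / (1.0 - y));
  	return 0;
  }

Built with "gcc sketch.c -lm", this prints about 5258 for the correct
accounting versus about 47788 for the buggy path: a 10% duty cycle
becomes indistinguishable from a fully loaded CPU.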
 kernel/sched/fair.c      | 23 +++++++++++++++++++++--
 kernel/sched/idle_task.c | 10 ++++++++++
 kernel/sched/sched.h     | 12 ++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fcdbff..1851ca8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
 }
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 0);
+}
+
 #else
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
-	update_rq_runnable_avg(this_rq, 1);
-
 	/*
 	 * Drop the rq->lock, but keep preempt disabled.
 	 */
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 66b5220..0775261 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -14,8 +14,17 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
 
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+	/* Update rq's load with elapsed idle time */
+	idle_exit(rq);
+}
+
 static void post_schedule_idle(struct rq *rq)
 {
+	/* Update rq's load with elapsed running time */
+	idle_enter(rq);
+
 	idle_balance(smp_processor_id(), rq);
 }
 #endif /* CONFIG_SMP */
@@ -95,6 +104,7 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.pre_schedule		= pre_schedule_idle,
 	.post_schedule		= post_schedule_idle,
 #endif
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fc88644..ff4b029 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -878,6 +878,18 @@ extern const struct sched_class idle_sched_class;
 
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
+
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter(struct rq *this_rq);
+extern void idle_exit(struct rq *this_rq);
+#else
+static inline void idle_enter(struct rq *this_rq) {}
+static inline void idle_exit(struct rq *this_rq) {}
+#endif
+
 #else	/* CONFIG_SMP */
 
 static inline void idle_balance(int cpu, struct rq *rq)
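For completeness, the hook ordering the two new functions rely on can
be summarized in a compilable sketch. The types below are stand-ins for
the kernel's, and schedule_hooks_sketch is a made-up name, not the real
__schedule() path (which dispatches these hooks with locking and
flag checks omitted here); only the call order around a context switch
matters:

  /*
   * Sketch of the hook ordering the patch depends on. Stand-in types;
   * in the kernel, pre_schedule runs for the outgoing task's class
   * before the switch and post_schedule for the incoming one's after it.
   */
  struct rq;			/* opaque here */
  struct task_struct;

  struct sched_class {
  	void (*pre_schedule)(struct rq *rq, struct task_struct *prev);
  	void (*post_schedule)(struct rq *rq);
  };

  struct task_struct {
  	const struct sched_class *sched_class;
  };

  void schedule_hooks_sketch(struct rq *rq, struct task_struct *prev,
  			   struct task_struct *next)
  {
  	/*
  	 * Leaving idle: prev is the idle task, so pre_schedule_idle()
  	 * runs idle_exit() and the elapsed time is accounted as idle
  	 * time in the rq's load.
  	 */
  	if (prev->sched_class->pre_schedule)
  		prev->sched_class->pre_schedule(rq, prev);

  	/* ... pick_next_task() and context_switch() happen here ... */

  	/*
  	 * Entering idle: next is the idle task, so post_schedule_idle()
  	 * runs idle_enter() (elapsed time accounted as run time) and
  	 * only then tries idle_balance().
  	 */
  	if (next->sched_class->post_schedule)
  		next->sched_class->post_schedule(rq);
  }

Because every transition out of and into idle is bracketed by
idle_exit()/idle_enter(), no elapsed interval is misattributed,
whichever scheduling class runs next.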