From patchwork Tue Apr 9 15:00:01 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 16000 Return-Path: X-Original-To: linaro@staging.patches.linaro.org Delivered-To: linaro@staging.patches.linaro.org Received: from mail-gg0-f197.google.com (mail-gg0-f197.google.com [209.85.161.197]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 76727277CA for ; Tue, 9 Apr 2013 15:00:59 +0000 (UTC) Received: by mail-gg0-f197.google.com with SMTP id k5sf9436313ggn.0 for ; Tue, 09 Apr 2013 08:00:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:x-beenthere:x-received:received-spf :x-received:x-forwarded-to:x-forwarded-for:delivered-to:x-received :received-spf:x-received:from:to:cc:subject:date:message-id:x-mailer :x-gm-message-state:x-original-sender :x-original-authentication-results:precedence:mailing-list:list-id :x-google-group-id:list-post:list-help:list-archive:list-unsubscribe; bh=ZbuGkRdhgfHRndrdrUzIRzXu0u2VRvhKSamT0gawWQ4=; b=Caa4tLAou0WFdkP7LFuZIlSwTGDRq4KIZzt19KeMFF185nUzFnAlofwGfa9pOaXLOn JSFnAir8Vg0f44mFGuWIMZPaTHzn+Aze5rfHhvOlK2eRCeph8CEh9swUwyouJNcmTlD7 fETQqGf+1TQlgtG9sScwrw4qfzl3X+m8GrXVdAxqe1UyKwbrYC8Gkvm1RzpNkJ14zKIK UEHPpg3AW85xFkbrnCgMfU9s4SPxrBFNlvSxIytSildla3D3Fhb/H+4J7luxJq9MgbkW AyBmhmB0tAJ+9TyZBkxuGHpiFSqrBQeubUYBdGC/xaLan8Mqq5ZXyN2nokhczstNP6Sc gPcw== X-Received: by 10.58.155.231 with SMTP id vz7mr10489089veb.21.1365519642290; Tue, 09 Apr 2013 08:00:42 -0700 (PDT) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.49.18.162 with SMTP id x2ls3787827qed.37.gmail; Tue, 09 Apr 2013 08:00:42 -0700 (PDT) X-Received: by 10.52.67.164 with SMTP id o4mr16416112vdt.42.1365519641990; Tue, 09 Apr 2013 08:00:41 -0700 (PDT) Received: from mail-ve0-f181.google.com (mail-ve0-f181.google.com [209.85.128.181]) by mx.google.com with ESMTPS id sc6si6862269vdc.86.2013.04.09.08.00.41 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 09 Apr 2013 08:00:41 -0700 (PDT) Received-SPF: neutral (google.com: 209.85.128.181 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.128.181; Received: by mail-ve0-f181.google.com with SMTP id pa12so6556010veb.12 for ; Tue, 09 Apr 2013 08:00:41 -0700 (PDT) X-Received: by 10.52.25.72 with SMTP id a8mr2964021vdg.47.1365519641664; Tue, 09 Apr 2013 08:00:41 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patches@linaro.org Received: by 10.58.85.136 with SMTP id h8csp65442vez; Tue, 9 Apr 2013 08:00:40 -0700 (PDT) X-Received: by 10.180.91.106 with SMTP id cd10mr20474810wib.6.1365519640341; Tue, 09 Apr 2013 08:00:40 -0700 (PDT) Received: from mail-wg0-f53.google.com (mail-wg0-f53.google.com [74.125.82.53]) by mx.google.com with ESMTPS id j17si6474755wiv.119.2013.04.09.08.00.39 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 09 Apr 2013 08:00:40 -0700 (PDT) Received-SPF: neutral (google.com: 74.125.82.53 is neither permitted nor denied by best guess record for domain of vincent.guittot@linaro.org) client-ip=74.125.82.53; Received: by mail-wg0-f53.google.com with SMTP id c11so7267300wgh.32 for ; Tue, 09 Apr 2013 08:00:39 -0700 (PDT) X-Received: by 10.194.122.131 with SMTP id ls3mr3939911wjb.55.1365519636922; Tue, 09 Apr 2013 08:00:36 -0700 (PDT) Received: from localhost.localdomain (LPuteaux-156-14-44-212.w82-127.abo.wanadoo.fr. [82.127.83.212]) by mx.google.com with ESMTPS id fv2sm29384850wib.6.2013.04.09.08.00.35 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 09 Apr 2013 08:00:36 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org, peterz@infradead.org, mingo@kernel.org, pjt@google.com, rostedt@goodmis.org, fweisbec@gmail.com, efault@gmx.de Cc: Vincent Guittot Subject: [PATCH v5] sched: fix wrong rq's runnable_avg update with rt tasks Date: Tue, 9 Apr 2013 17:00:01 +0200 Message-Id: <1365519601-29799-1-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 1.7.9.5 X-Gm-Message-State: ALoCoQkKotlzcdUItIRNYwIGr7Ahy00ED/pHJVHJwSQ7m3fqa3VrXxu0DVkzV8HmavrD7ox7Wphp X-Original-Sender: vincent.guittot@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.128.181 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org List-ID: X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , The current update of the rq's load can be erroneous when RT tasks are involved The update of the load of a rq that becomes idle, is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle duration alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up. A new idle_enter function is called when the next task is the idle function so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance. When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exit idle state because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Changes since V4: - Rebase on v3.9-rc6 instead of Steven Rostedt's patches - Create the post_schedule_idle function that was previously created by Steven's patches Changes since V3: - Remove dependancy with CONFIG_FAIR_GROUP_SCHED - Add a new idle_enter function and create a post_schedule callback for idle class - Remove the update_runnable_avg from idle_balance Changes since V2: - remove useless definition for UP platform - rebased on top of Steven Rostedt's patches : https://lkml.org/lkml/2013/2/12/558 Changes since V1: - move code out of schedule function and create a pre_schedule callback for idle class instead. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 23 +++++++++++++++++++++-- kernel/sched/idle_task.c | 16 ++++++++++++++++ kernel/sched/sched.h | 12 ++++++++++++ 3 files changed, 49 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7a33e59..653edd8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed running time before entering + * idle. if the last scheduled task is not a CFS task, idle_enter will + * be the only way to update the runnable statistic. + */ +void idle_enter(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 1); +} + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. + */ +void idle_exit(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} + #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} @@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq) if (this_rq->avg_idle < sysctl_sched_migration_cost) return; - update_rq_runnable_avg(this_rq, 1); - /* * Drop the rq->lock, but keep IRQ/preempt disabled. */ diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index b6baf37..cef61fa 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } + +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + idle_exit(rq); +} + +static void post_schedule_idle(struct rq *rq) +{ + idle_enter(rq); +} #endif /* CONFIG_SMP */ /* * Idle tasks are unconditionally rescheduled: @@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl static struct task_struct *pick_next_task_idle(struct rq *rq) { schedstat_inc(rq, sched_goidle); +#ifdef CONFIG_SMP + /* Trigger the post schedule to do an idle_enter for CFS */ + rq->post_schedule = 1; +#endif return rq->idle; } @@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, + .post_schedule = post_schedule_idle, #endif .set_curr_task = set_curr_task_idle, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index cc03cfd..2b826f2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +/* + * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg + * becomes useful in lb + */ +#if defined(CONFIG_FAIR_GROUP_SCHED) +extern void idle_enter(struct rq *this_rq); +extern void idle_exit(struct rq *this_rq); +#else +static inline void idle_enter(struct rq *this_rq) {} +static inline void idle_exit(struct rq *this_rq) {} +#endif + #else /* CONFIG_SMP */ static inline void idle_balance(int cpu, struct rq *rq)