From patchwork Fri Oct 14 13:41:07 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Morten Rasmussen <morten.rasmussen@arm.com>
X-Patchwork-Id: 77668
Delivered-To: patch@linaro.org
Received: by 10.140.97.247 with SMTP id m110csp300888qge;
 Fri, 14 Oct 2016 06:43:34 -0700 (PDT)
X-Received: by 10.66.135.50 with SMTP id pp18mr15194843pab.18.1476452614321; 
 Fri, 14 Oct 2016 06:43:34 -0700 (PDT)
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id
 q24si18639381pfi.151.2016.10.14.06.43.33; 
 Fri, 14 Oct 2016 06:43:34 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1756741AbcJNNnb (ORCPT <rfc822;julien.grall@linaro.org>
 + 27 others); Fri, 14 Oct 2016 09:43:31 -0400
Received: from foss.arm.com ([217.140.101.70]:37646 "EHLO foss.arm.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1755694AbcJNNmK (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
 Fri, 14 Oct 2016 09:42:10 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A7D53105A;
 Fri, 14 Oct 2016 06:42:09 -0700 (PDT)
Received: from e105550-lin.cambridge.arm.com (e105550-lin.cambridge.arm.com
 [10.1.211.30])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id
 F2EF43F32C; Fri, 14 Oct 2016 06:42:07 -0700 (PDT)
From: Morten Rasmussen <morten.rasmussen@arm.com>
To: peterz@infradead.org, mingo@redhat.com
Cc: dietmar.eggemann@arm.com, yuyang.du@intel.com,
 vincent.guittot@linaro.org, mgalbraith@suse.de,
 sgurrappadi@nvidia.com, freedom.tan@mediatek.com,
 keita.kobayashi.ym@renesas.com, linux-kernel@vger.kernel.org,
 Morten Rasmussen <morten.rasmussen@arm.com>
Subject: [PATCH v5 1/6] sched/fair: Compute task/cpu utilization at wake-up
 correctly
Date: Fri, 14 Oct 2016 14:41:07 +0100
Message-Id: <1476452472-24740-2-git-send-email-morten.rasmussen@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1476452472-24740-1-git-send-email-morten.rasmussen@arm.com>
References: <1476452472-24740-1-git-send-email-morten.rasmussen@arm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

At task wake-up load-tracking isn't updated until the task is enqueued.
The task's own view of its utilization contribution may therefore not be
aligned with its contribution to the cfs_rq load-tracking which may have
been updated in the meantime. Basically, the task's own utilization
hasn't yet accounted for the sleep decay, while the cfs_rq may have
(partially). Estimating the cfs_rq utilization in case the task is
migrated at wake-up as task_rq(p)->cfs.avg.util_avg - p->se.avg.util_avg
is therefore incorrect as the two load-tracking signals aren't time
synchronized (different last update).

To solve this problem, this patch synchronizes the task utilization with
its previous rq before the task utilization is used in the wake-up path.
Currently the update/synchronization is done _after_ the task has been
placed by select_task_rq_fair(). The synchronization is done without
having to take the rq lock using the existing mechanism used in
remove_entity_load_avg().

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 543b2f291152..f4abdbdeab50 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3205,13 +3205,25 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
 #endif
 
 /*
+ * Synchronize entity load avg of dequeued entity without locking
+ * the previous rq.
+ */
+void sync_entity_load_avg(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 last_update_time;
+
+	last_update_time = cfs_rq_last_update_time(cfs_rq);
+	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+}
+
+/*
  * Task first catches up with cfs_rq, and then subtract
  * itself from the cfs_rq (task must be off the queue now).
  */
 void remove_entity_load_avg(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	u64 last_update_time;
 
 	/*
 	 * tasks cannot exit without having gone through wake_up_new_task() ->
@@ -3223,9 +3235,7 @@ void remove_entity_load_avg(struct sched_entity *se)
 	 * calls this.
 	 */
 
-	last_update_time = cfs_rq_last_update_time(cfs_rq);
-
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	sync_entity_load_avg(se);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
 	atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
 }
@@ -5579,6 +5589,24 @@ static inline int task_util(struct task_struct *p)
 }
 
 /*
+ * cpu_util_wake: Compute cpu utilization with any contributions from
+ * the waking task p removed.
+ */
+static int cpu_util_wake(int cpu, struct task_struct *p)
+{
+	unsigned long util, capacity;
+
+	/* Task has no contribution or is new */
+	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+		return cpu_util(cpu);
+
+	capacity = capacity_orig_of(cpu);
+	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+
+	return (util >= capacity) ? capacity : util;
+}
+
+/*
  * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
  * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
  *
@@ -5596,6 +5624,9 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	if (max_cap - min_cap < max_cap >> 3)
 		return 0;
 
+	/* Bring task utilization in sync with prev_cpu */
+	sync_entity_load_avg(&p->se);
+
 	return min_cap * 1024 < task_util(p) * capacity_margin;
 }