From patchwork Tue Nov 24 13:49:30 2015
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 57253
From: Vincent Guittot
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, yuyang.du@intel.com, Morten.Rasmussen@arm.com
Cc: linaro-kernel@lists.linaro.org, dietmar.eggemann@arm.com, pjt@google.com, bsegall@google.com, Vincent Guittot
Subject: [PATCH] sched/fair: update scale invariance of pelt
Date: Tue, 24 Nov 2015 14:49:30 +0100
Message-Id: <1448372970-8764-1-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.9.1

The current implementation of load tracking invariance scales the load
tracking value with the current frequency and uarch performance (the
latter only for utilization) of the CPU.
One main result of the current formula is that the figures are capped by
the current capacity of the CPU. This limitation is the main reason for
not including the uarch invariance (arch_scale_cpu_capacity) in the
calculation of load_avg: capping the load can generate erroneous system
load statistics, as described in this example [1].

Instead of scaling the complete value of the PELT algorithm, we should
only scale the running time by the current capacity of the CPU. Scaling
only the running time seems more correct, because the non-running time
of a task (sleeping or waiting for a runqueue) is the same whatever the
current frequency and compute capacity of the CPU.

One main advantage of this change is that the load of a task can reach
its max value whatever the current frequency and uarch of the CPU on
which it runs. It will just take more time at a lower frequency than at
max frequency, or on a "little" CPU compared to a "big" one. The load
and the utilization stay invariant across the system, so we can still
compare them between CPUs, but with a wider range of values. With this
change, we don't have to test whether a CPU is overloaded in order to
choose between one metric (util) or another (load), as all metrics are
always valid.

The table below gives some examples of the duration needed to reach
typical utilization values, according to the capacity of the CPU, with
the current implementation and with this patch:

Util (%)     max capacity   half capacity (mainline)   half capacity (w/ patch)
972 (95%)    138ms          not reachable              276ms
486 (47.5%)  30ms           138ms                      60ms
256 (25%)    13ms           32ms                       26ms

We can see that at half capacity we need twice the duration of max
capacity with this patch, whereas the duration grows non-linearly with
the current implementation.
[1] https://lkml.org/lkml/2014/12/18/128

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 824aa9f..f2a18e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2560,10 +2560,9 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
-	u64 delta, scaled_delta, periods;
+	u64 delta, periods;
 	u32 contrib;
-	unsigned int delta_w, scaled_delta_w, decayed = 0;
-	unsigned long scale_freq, scale_cpu;
+	unsigned int delta_w, decayed = 0;
 
 	delta = now - sa->last_update_time;
 	/*
@@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
-	scale_freq = arch_scale_freq_capacity(NULL, cpu);
-	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+	if (running) {
+		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
+		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
+	}
 
 	/* delta_w is the amount already accumulated against our next period */
 	delta_w = sa->period_contrib;
@@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		 * period and accrue it.
 		 */
 		delta_w = 1024 - delta_w;
-		scaled_delta_w = cap_scale(delta_w, scale_freq);
 		if (weight) {
-			sa->load_sum += weight * scaled_delta_w;
+			sa->load_sum += weight * delta_w;
 			if (cfs_rq) {
 				cfs_rq->runnable_load_sum +=
-						weight * scaled_delta_w;
+						weight * delta_w;
 			}
 		}
 		if (running)
-			sa->util_sum += scaled_delta_w * scale_cpu;
+			sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
 
 		delta -= delta_w;
@@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
 		contrib = __compute_runnable_contrib(periods);
-		contrib = cap_scale(contrib, scale_freq);
 		if (weight) {
 			sa->load_sum += weight * contrib;
 			if (cfs_rq)
 				cfs_rq->runnable_load_sum += weight * contrib;
 		}
 		if (running)
-			sa->util_sum += contrib * scale_cpu;
+			sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
 	}
 
 	/* Remainder of delta accrued against u_0` */
-	scaled_delta = cap_scale(delta, scale_freq);
 	if (weight) {
-		sa->load_sum += weight * scaled_delta;
+		sa->load_sum += weight * delta;
 		if (cfs_rq)
-			cfs_rq->runnable_load_sum += weight * scaled_delta;
+			cfs_rq->runnable_load_sum += weight * delta;
 	}
 	if (running)
-		sa->util_sum += scaled_delta * scale_cpu;
+		sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
 
 	sa->period_contrib += delta;