From patchwork Mon Jun 19 08:51:24 2017
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 105805
Delivered-To: patch@linaro.org
From: Vincent Guittot
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, rjw@rjwysocki.net
Cc: juri.lelli@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, viresh.kumar@linaro.org, Vincent Guittot
Subject: [PATCH v2 1/2] sched/rt: add utilization tracking
Date: Mon, 19 Jun 2017 10:51:24 +0200
Message-Id: <1497862285-10875-2-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1497862285-10875-1-git-send-email-vincent.guittot@linaro.org>
References: <1497862285-10875-1-git-send-email-vincent.guittot@linaro.org>

The schedutil governor relies on the cfs_rq's util_avg to choose the OPP
when cfs tasks are running. When the CPU is overloaded by cfs and rt
tasks, cfs tasks are preempted by rt tasks, and util_avg then reflects
only the remaining capacity actually used by cfs tasks, not what the cfs
tasks want to use. In such a case, schedutil can select a lower OPP while
cfs tasks run even though the CPU is overloaded. In order to have a more
accurate view of the utilization of the CPU, we also track the
utilization that is used by rt tasks. DL tasks are not taken into account
as they have their own utilization tracking mechanism. We don't use
rt_avg, which doesn't have the same dynamics as PELT and which can
include IRQ time.
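The mismatch described above can be illustrated with a toy calculation. This is a hypothetical sketch, not part of the patch: freq_util() and the numbers are made up, and CAPACITY stands in for the kernel's SCHED_CAPACITY_SCALE of 1024. It shows how a governor that sums cfs and rt utilization (clamped to CPU capacity) sees the demand that cfs utilization alone hides:

```c
#include <assert.h>

/* Hypothetical illustration, not kernel code: a governor that looks only
 * at cfs utilization under-estimates demand when rt tasks preempt cfs
 * tasks. Adding the rt contribution, clamped to the CPU capacity,
 * restores a usable estimate. SCHED_CAPACITY_SCALE is 1024 in the kernel. */
#define CAPACITY 1024UL

static unsigned long freq_util(unsigned long cfs_util, unsigned long rt_util)
{
	unsigned long util = cfs_util + rt_util;

	/* Clamp to the CPU's capacity. */
	return util < CAPACITY ? util : CAPACITY;
}
```

With cfs_util = 300 and rt_util = 200, a cfs-only estimate would request an OPP for 300/1024 of capacity, while the combined estimate correctly asks for 500/1024.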
Signed-off-by: Vincent Guittot
---
Change since v1:
- rebase on tip/sched/core

There were several comments on v1:
- As raised by Peter for v1, if IRQ time is taken into account in
  rt_avg, it will not be accounted in rq->clock_task. This means that
  cfs utilization is not affected by some extra contributions or decays
  because of IRQ.
- Regarding the sync of rt and cfs utilization, both cfs and rt use the
  same rq->clock_task. We then have the same issue as cfs regarding the
  blocked value: the utilization of idle cfs/rt rqs is not updated
  regularly but only when a load_balance is triggered (more precisely, a
  call to update_blocked_averages). I'd like to fix this issue for both
  cfs and rt with a dedicated patch that will ensure that utilization
  (and load) are updated regularly even for idle CPUs.
- One last open question is the location of the rt utilization function
  in the fair.c file. PELT-related functions should probably move into a
  dedicated pelt.c file. This would also help to address one comment
  about having a place to update metrics of NOHZ idle CPUs.

 kernel/sched/fair.c  | 21 +++++++++++++++++++++
 kernel/sched/rt.c    |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 33 insertions(+)

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 396bca9..c52c523 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2950,6 +2950,17 @@ __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
 			cfs_rq->curr != NULL, cfs_rq);
 }
 
+int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running)
+{
+	int ret;
+
+	ret = ___update_load_avg(now, cpu, &rt_rq->avg, 0, running, NULL);
+
+
+	return ret;
+}
+
+
 /*
  * Signed add and clamp on underflow.
  *
@@ -3478,6 +3489,11 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 	return 0;
 }
 
+int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running)
+{
+	return 0;
+}
+
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 
@@ -7019,6 +7035,10 @@ static void update_blocked_averages(int cpu)
 		if (cfs_rq_is_decayed(cfs_rq))
 			list_del_leaf_cfs_rq(cfs_rq);
 	}
+
+	update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
+
+
 	rq_unlock_irqrestore(rq, &rf);
 }
 
@@ -7078,6 +7098,7 @@ static inline void update_blocked_averages(int cpu)
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true);
+	update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
 	rq_unlock_irqrestore(rq, &rf);
 }
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 581d5c7..09293fa 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1534,6 +1534,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 	return p;
 }
 
+extern int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running);
+
 static struct task_struct *
 pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
@@ -1579,6 +1581,10 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	queue_push_tasks(rq);
 
+	if (p)
+		update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), rt_rq,
+			rq->curr->sched_class == &rt_sched_class);
+
 	return p;
 }
 
@@ -1586,6 +1592,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
 	update_curr_rt(rq);
 
+	update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->rt, 1);
+
 	/*
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
@@ -2368,6 +2376,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	struct sched_rt_entity *rt_se = &p->rt;
 
 	update_curr_rt(rq);
+	update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->rt, 1);
 
 	watchdog(rq, p);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f2ef759a..0af5f40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -508,6 +508,9 @@ struct rt_rq {
 	unsigned long rt_nr_total;
 	int overloaded;
 	struct plist_head pushable_tasks;
+
+	struct sched_avg avg;
+
 #ifdef HAVE_RT_PUSH_IPI
 	int push_flags;
 	int push_cpu;