From patchwork Tue Aug 20 16:34:57 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820846
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 01/16] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom
Date: Tue, 20 Aug 2024 17:34:57 +0100
Message-Id: <20240820163512.1096301-2-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

We provide headroom for the utilization to grow until the next decision
point at which the next frequency is picked. Give the function a better
name and some documentation; it is not really mapping anything.

Also move it to cpufreq_schedutil.c. The function relies on the util
signal being updated appropriately to provide headroom to grow into,
which ties it to schedutil and the scheduler; it is not something that
can be shared with other governors.

Acked-by: Viresh Kumar
Acked-by: Rafael J. Wysocki
Reviewed-by: Vincent Guittot
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 include/linux/sched/cpufreq.h    |  5 -----
 kernel/sched/cpufreq_schedutil.c | 20 +++++++++++++++++++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index bdd31ab93bc5..d01755d3142f 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -28,11 +28,6 @@ static inline unsigned long map_util_freq(unsigned long util,
 {
 	return freq * util / cap;
 }
-
-static inline unsigned long map_util_perf(unsigned long util)
-{
-	return util + (util >> 2);
-}
 #endif /* CONFIG_CPU_FREQ */
 
 #endif /* _LINUX_SCHED_CPUFREQ_H */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index eece6244f9d2..575df3599813 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,12 +178,30 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
+/*
+ * DVFS decisions are made at discrete points. If the CPU stays busy, util
+ * will continue to grow, which means the CPU could need to run at a higher
+ * frequency before the next decision point is reached. IOW, we can't follow
+ * util as it grows immediately; there's a delay before we issue a request to
+ * go to a higher frequency. The headroom caters for this delay so that the
+ * system continues to run at an adequate performance point.
+ *
+ * This function provides enough headroom to sustain adequate performance
+ * assuming the CPU continues to be busy.
+ *
+ * At the moment it is a constant multiplication by 1.25.
+ */
+static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util)
+{
+	return util + (util >> 2);
+}
+
 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long min,
 				       unsigned long max)
 {
 	/* Add dvfs headroom to actual utilization */
-	actual = map_util_perf(actual);
+	actual = sugov_apply_dvfs_headroom(actual);
 
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;

From patchwork Tue Aug 20 16:34:58 2024
X-Patchwork-Id: 821200
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 02/16] sched/pelt: Add a new function to approximate the future util_avg value
Date: Tue, 20 Aug 2024 17:34:58 +0100
Message-Id: <20240820163512.1096301-3-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Given a util_avg value, the new function returns the future value after a
given runtime delta has elapsed. This will be useful in later patches to
help replace some magic margins with more deterministic behavior.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/pelt.c  | 22 +++++++++++++++++++++-
 kernel/sched/sched.h |  1 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index fa52906a4478..2ce83e880bd5 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -466,4 +466,24 @@ int update_irq_load_avg(struct rq *rq, u64 running)
 
 	return ret;
 }
-#endif
+#endif /* CONFIG_HAVE_SCHED_AVG_IRQ */
+
+/*
+ * Approximate the new util_avg value assuming an entity has continued to run
+ * for @delta us.
+ */
+unsigned long approximate_util_avg(unsigned long util, u64 delta)
+{
+	struct sched_avg sa = {
+		.util_sum = util * PELT_MIN_DIVIDER,
+		.util_avg = util,
+	};
+
+	if (unlikely(!delta))
+		return util;
+
+	accumulate_sum(delta, &sa, 1, 0, 1);
+	___update_load_avg(&sa, 0);
+
+	return sa.util_avg;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4c36cc680361..294c6769e330 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3064,6 +3064,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long min,
 				       unsigned long max);
 
+unsigned long approximate_util_avg(unsigned long util, u64 delta);
 
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the

From patchwork Tue Aug 20 16:34:59 2024
X-Patchwork-Id: 821199
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 03/16] sched/pelt: Add a new function to approximate runtime to reach given util
Date: Tue, 20 Aug 2024 17:34:59 +0100
Message-Id: <20240820163512.1096301-4-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

This is basically the ramp-up time from 0 to a given value. It will be
used later to implement a new tunable to control the response time of
schedutil.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/pelt.c  | 21 +++++++++++++++++++++
 kernel/sched/sched.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 2ce83e880bd5..06cb881ba582 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -487,3 +487,24 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 
 	return sa.util_avg;
 }
+
+/*
+ * Approximate the amount of runtime, in ms, required to reach @util.
+ */
+u64 approximate_runtime(unsigned long util)
+{
+	struct sched_avg sa = {};
+	u64 delta = 1024; /* one period = 1024us = ~1ms */
+	u64 runtime = 0;
+
+	if (unlikely(!util))
+		return runtime;
+
+	while (sa.util_avg < util) {
+		accumulate_sum(delta, &sa, 1, 0, 1);
+		___update_load_avg(&sa, 0);
+		runtime++;
+	}
+
+	return runtime;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 294c6769e330..47f158b2cdc2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3065,6 +3065,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long max);
 
 unsigned long approximate_util_avg(unsigned long util, u64 delta);
+u64 approximate_runtime(unsigned long util);
 
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the

From patchwork Tue Aug 20 16:35:00 2024
X-Patchwork-Id: 820844
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 04/16] sched/fair: Remove magic hardcoded margin in fits_capacity()
Date: Tue, 20 Aug 2024 17:35:00 +0100
Message-Id: <20240820163512.1096301-5-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Replace the hardcoded margin value in fits_capacity() with better dynamic
logic.
The 80% margin is a magic value that has served its purpose so far, but it
no longer fits the variety of systems that exist today. On an over-powered
system in particular, this 80% means we leave a lot of capacity unused
before we decide to upmigrate on an HMP system. On many systems the little
cores are under-powered, and the ability to migrate away from them faster
is desired.

Redefine misfit migration in terms of the utilization threshold at which
the task would become misfit at the next load balance event, assuming it
becomes an always-running task. To calculate this threshold, we use the new
approximate_util_avg() function based on arch_scale_cpu_capacity(): the
task will be misfit if it continues to run for TICK_USEC, which is our
worst-case delay before misfit migration kicks in.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/core.c  |  1 +
 kernel/sched/fair.c  | 40 ++++++++++++++++++++++++++++++++--------
 kernel/sched/sched.h |  1 +
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6d35c48239be..402ee4947ef0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8266,6 +8266,7 @@ void __init sched_init(void)
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = SCHED_CAPACITY_SCALE;
+		rq->fits_capacity_threshold = SCHED_CAPACITY_SCALE;
 		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584ec06d..e5e986af18dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -95,11 +95,15 @@ int __weak arch_asym_cpu_priority(int cpu)
 }
 
 /*
- * The margin used when comparing utilization with CPU capacity.
- *
- * (default: ~20%)
+ * fits_capacity() must ensure that a task will not be 'stuck' on a CPU with
+ * lower capacity for too long. The threshold is the util value at which, if
+ * a task becomes always busy, it could miss a misfit migration load balance
+ * event.
+ * So we consider the task misfit before it reaches this point.
+ */
-#define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)
+static inline bool fits_capacity(unsigned long util, int cpu)
+{
+	return util < cpu_rq(cpu)->fits_capacity_threshold;
+}
 
 /*
  * The margin used when comparing CPU capacities.
@@ -4978,14 +4982,13 @@ static inline int util_fits_cpu(unsigned long util,
 				unsigned long uclamp_max,
 				int cpu)
 {
-	unsigned long capacity = capacity_of(cpu);
 	unsigned long capacity_orig;
 	bool fits, uclamp_max_fits;
 
 	/*
 	 * Check if the real util fits without any uclamp boost/cap applied.
 	 */
-	fits = fits_capacity(util, capacity);
+	fits = fits_capacity(util, cpu);
 
 	if (!uclamp_is_used())
 		return fits;
@@ -9592,12 +9595,33 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
 	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
+	struct rq *rq = cpu_rq(cpu);
+	u64 limit;
 
 	if (!capacity)
 		capacity = 1;
 
-	cpu_rq(cpu)->cpu_capacity = capacity;
-	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+	rq->cpu_capacity = capacity;
+	trace_sched_cpu_capacity_tp(rq);
+
+	/*
+	 * Calculate the util at which a task must be considered a misfit.
+	 *
+	 * We must ensure that a task experiences the same ramp-up time to
+	 * reach the max performance point of the system regardless of the
+	 * CPU it is running on (due to invariance, time will stretch and the
+	 * task will take longer to achieve the same util value compared to
+	 * a task running on a big CPU), and that a delay in misfit
+	 * migration, which depends on TICK, doesn't end up hurting it, as it
+	 * can happen after we would have crossed this threshold.
+	 *
+	 * To ensure that invariance is taken into account, we don't scale
+	 * time and use it as-is; approximate_util_avg() will then give us
+	 * our threshold.
+	 */
+	limit = approximate_runtime(arch_scale_cpu_capacity(cpu)) * USEC_PER_MSEC;
+	limit -= TICK_USEC; /* sd->balance_interval is more accurate */
+	rq->fits_capacity_threshold = approximate_util_avg(0, limit);
 
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47f158b2cdc2..ab4672675b84 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1093,6 +1093,7 @@ struct rq {
 	struct sched_domain __rcu	*sd;
 
 	unsigned long		cpu_capacity;
+	unsigned long		fits_capacity_threshold;
 
 	struct balance_callback *balance_callback;

From patchwork Tue Aug 20 16:35:01 2024
X-Patchwork-Id: 821198
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 05/16] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom()
Date: Tue, 20 Aug 2024 17:35:01 +0100
Message-Id: <20240820163512.1096301-6-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Replace the 1.25 headroom in sugov_apply_dvfs_headroom() with better
dynamic logic.
Instead of the magic 1.25 headroom, use the new approximate_util_avg() to
provide headroom based on dvfs_update_delay, the period at which the cpufreq
governor will send DVFS updates to the hardware, or min(curr.se.slice,
TICK_USEC), the maximum delay for the util signal to change and prompt
a cpufreq update; whichever is higher.

Add a new percpu dvfs_update_delay that can be cheaply accessed whenever
sugov_apply_dvfs_headroom() is called. Cpufreq governors that rely on util to
drive their DVFS logic/algorithm are expected to populate these percpu
variables. schedutil is the only such governor at the moment.

The behavior of schedutil will change: some systems will experience faster
DVFS rampup (because of a higher TICK or rate_limit_us), others will
experience slower rampup. The impact on performance should not be visible
were it not for the black-hole effect of utilization invariance, a problem
that will be addressed in later patches. Later patches will also address how
to provide better control over how fast or slow the system should respond,
to allow userspace to select its power/perf/thermal trade-off.
Signed-off-by: Qais Yousef
---
 kernel/sched/core.c              |  1 +
 kernel/sched/cpufreq_schedutil.c | 36 ++++++++++++++++++++++++++------
 kernel/sched/sched.h             |  9 ++++++++
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 402ee4947ef0..7099e40cc8bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -118,6 +118,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);

 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);

 #ifdef CONFIG_SCHED_DEBUG
 /*
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 575df3599813..303b0ab227e7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -187,13 +187,28 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * to run at adequate performance point.
  *
  * This function provides enough headroom to provide adequate performance
- * assuming the CPU continues to be busy.
+ * assuming the CPU continues to be busy. This headroom is based on the
+ * dvfs_update_delay of the cpufreq governor or min(curr.se.slice, TICK_US),
+ * whichever is higher.
  *
- * At the moment it is a constant multiplication with 1.25.
+ * XXX: Should we provide headroom when the util is decaying?
  */
-static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util)
+static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
-	return util + (util >> 2);
+	struct rq *rq = cpu_rq(cpu);
+	u64 delay;
+
+	/*
+	 * What is the possible worst case scenario for updating util_avg, ctx
+	 * switch or TICK?
+	 */
+	if (rq->cfs.h_nr_running > 1)
+		delay = min(rq->curr->se.slice/1000, TICK_USEC);
+	else
+		delay = TICK_USEC;
+	delay = max(delay, per_cpu(dvfs_update_delay, cpu));
+
+	return approximate_util_avg(util, delay);
 }

 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
@@ -201,7 +216,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				 unsigned long max)
 {
 	/* Add dvfs headroom to actual utilization */
-	actual = sugov_apply_dvfs_headroom(actual);
+	actual = sugov_apply_dvfs_headroom(actual, cpu);
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;
@@ -579,15 +594,21 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count
 	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
 	struct sugov_policy *sg_policy;
 	unsigned int rate_limit_us;
+	int cpu;

 	if (kstrtouint(buf, 10, &rate_limit_us))
 		return -EINVAL;

 	tunables->rate_limit_us = rate_limit_us;

-	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook)
+	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+		sg_policy->freq_update_delay_ns = rate_limit_us * NSEC_PER_USEC;
+		for_each_cpu(cpu, sg_policy->policy->cpus)
+			per_cpu(dvfs_update_delay, cpu) = rate_limit_us;
+	}
+
 	return count;
 }
@@ -868,6 +889,9 @@ static int sugov_start(struct cpufreq_policy *policy)
 		memset(sg_cpu, 0, sizeof(*sg_cpu));
 		sg_cpu->cpu = cpu;
 		sg_cpu->sg_policy = sg_policy;
+
+		per_cpu(dvfs_update_delay, cpu) = sg_policy->tunables->rate_limit_us;
+
 		cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util, uu);
 	}
 	return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ab4672675b84..c2d9fba6ea7a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3068,6 +3068,15 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 unsigned long approximate_util_avg(unsigned long util, u64 delta);
 u64 approximate_runtime(unsigned long util);

+/*
+ * Any governor that relies on util signal to drive DVFS, must populate these
+ * percpu dvfs_update_delay variables.
+ *
+ * It should describe the rate/delay at which the governor sends DVFS freq
+ * update to the hardware in us.
+ */
+DECLARE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);
+
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the
  * CPU original capacity and the runtime/deadline ratio of the task.

From patchwork Tue Aug 20 16:35:02 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820843
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate response time
Date: Tue, 20 Aug 2024 17:35:02 +0100
Message-Id: <20240820163512.1096301-7-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The new tunable, response_time_ms, allows us to speed up or slow down the
response time of the policy to meet the perf, power and thermal
characteristics desired by the user/sysadmin. There's no single universal
trade-off that we can apply for all systems, even if they use the same SoC.
The form factor of the system, the dominant use case, and, in the case of
battery-powered systems, the size of the battery and the presence or absence
of active cooling can all play a big role in what would be best to use.
The new tunable provides sensible defaults, but still gives the user/sysadmin
the power to control the response time if they wish to. This tunable is
applied before we apply the DVFS headroom.

The default behavior of applying 1.25 headroom can easily be re-instated now.
But we continue to keep the minimum required headroom to overcome the
hardware's limitation in how fast it can change DVFS. Any additional headroom
to speed things up must be applied by userspace to match their expectation
for best perf/watt, as it dictates a type of policy that will be better for
some systems but worse for others.

There's a whitespace clean up included in sugov_start().

Signed-off-by: Qais Yousef
---
 Documentation/admin-guide/pm/cpufreq.rst |  17 +++-
 drivers/cpufreq/cpufreq.c                |   4 +-
 include/linux/cpufreq.h                  |   3 +
 kernel/sched/cpufreq_schedutil.c         | 115 ++++++++++++++++++++++-
 4 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 6adb7988e0eb..fa0d602a920e 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
 to go up to the allowed maximum immediately and then draw back to the value
 returned by the above formula over time.

-This governor exposes only one tunable:
+This governor exposes two tunables:

 ``rate_limit_us``
 	Minimum time (in microseconds) that has to pass between two consecutive
@@ -427,6 +427,21 @@ This governor exposes only one tunable:
 	The purpose of this tunable is to reduce the scheduler context overhead
 	of the governor which might be excessive without it.

+``response_time_ms``
+	Amount of time (in milliseconds) required to ramp the policy from
+	lowest to highest frequency. Can be decreased to speed up the
+	responsiveness of the system, or increased to slow the system down in
+	the hope of saving power. The best perf/watt will depend on the system
+	characteristics and the dominant workload you expect to run. For
+	userspace that has smart context on the type of workload running (like
+	in Android), one can tune this to suit the demand of that workload.
+
+	Note that when slowing the response down, you can end up effectively
+	chopping off the top frequencies for that policy as the util is capped
+	to 1024. On HMP systems this chopping effect will only occur on the
+	biggest core whose capacity is 1024. Don't rely on this behavior as
+	this is a limitation that can hopefully be improved in the future.
+
 This governor generally is regarded as a replacement for the older
 `ondemand`_ and `conservative`_ governors (described below), as it is simpler
 and more tightly integrated with the CPU scheduler, its overhead in terms of
 CPU context
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a45aac17c20f..5dc44c3694fe 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -533,8 +533,8 @@ void cpufreq_disable_fast_switch(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);

-static unsigned int __resolve_freq(struct cpufreq_policy *policy,
-				   unsigned int target_freq, unsigned int relation)
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+			    unsigned int target_freq, unsigned int relation)
 {
 	unsigned int idx;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 20f7e98ee8af..c14ffdcd8933 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -622,6 +622,9 @@ int cpufreq_driver_target(struct cpufreq_policy *policy,
 int __cpufreq_driver_target(struct cpufreq_policy *policy,
 			    unsigned int target_freq,
 			    unsigned int relation);
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+			    unsigned int target_freq,
+			    unsigned int relation);
 unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
 					 unsigned int target_freq);
 unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 303b0ab227e7..94e35b7c972d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -8,9 +8,12 @@

 #define IOWAIT_BOOST_MIN	(SCHED_CAPACITY_SCALE / 8)

+DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
+
 struct sugov_tunables {
 	struct gov_attr_set	attr_set;
 	unsigned int		rate_limit_us;
+	unsigned int		response_time_ms;
 };

 struct sugov_policy {
@@ -22,6 +25,7 @@ struct sugov_policy {
 	raw_spinlock_t		update_lock;
 	u64			last_freq_update_time;
 	s64			freq_update_delay_ns;
+	unsigned int		freq_response_time_ms;
 	unsigned int		next_freq;
 	unsigned int		cached_raw_freq;

@@ -59,6 +63,70 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

 /************************ Governor internals ***********************/

+static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
+{
+	int cpu = cpumask_first(sg_policy->policy->cpus);
+	unsigned long cap = arch_scale_cpu_capacity(cpu);
+	unsigned int max_freq, sec_max_freq;
+
+	max_freq = sg_policy->policy->cpuinfo.max_freq;
+	sec_max_freq = __resolve_freq(sg_policy->policy,
+				      max_freq - 1,
+				      CPUFREQ_RELATION_H);
+
+	/*
+	 * We will request max_freq as soon as util crosses the capacity at
+	 * second highest frequency. So effectively our response time is the
+	 * util at which we cross the cap@2nd_highest_freq.
+	 */
+	cap = sec_max_freq * cap / max_freq;
+
+	return approximate_runtime(cap + 1);
+}
+
+static inline void sugov_update_response_time_mult(struct sugov_policy *sg_policy)
+{
+	unsigned long mult;
+	int cpu;
+
+	if (unlikely(!sg_policy->freq_response_time_ms))
+		sg_policy->freq_response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+
+	mult = sg_policy->freq_response_time_ms * SCHED_CAPACITY_SCALE;
+	mult /= sg_policy->tunables->response_time_ms;
+
+	if (SCHED_WARN_ON(!mult))
+		mult = SCHED_CAPACITY_SCALE;
+
+	for_each_cpu(cpu, sg_policy->policy->cpus)
+		per_cpu(response_time_mult, cpu) = mult;
+}
+
+/*
+ * Shrink or expand how long it takes to reach the maximum performance of the
+ * policy.
+ *
+ * sg_policy->freq_response_time_ms is a constant value defined by PELT
+ * HALFLIFE and the capacity of the policy (assuming HMP systems).
+ *
+ * sg_policy->tunables->response_time_ms is a user defined response time. By
+ * setting it lower than sg_policy->freq_response_time_ms, the system will
+ * respond faster to changes in util, which will result in reaching maximum
+ * performance point quicker. By setting it higher, it'll slow down the amount
+ * of time required to reach the maximum OPP.
+ *
+ * This should be applied when selecting the frequency.
+ */
+static inline unsigned long
+sugov_apply_response_time(unsigned long util, int cpu)
+{
+	unsigned long mult;
+
+	mult = per_cpu(response_time_mult, cpu) * util;
+
+	return mult >> SCHED_CAPACITY_SHIFT;
+}
+
 static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 {
 	s64 delta_ns;
@@ -215,7 +283,10 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				 unsigned long min,
 				 unsigned long max)
 {
-	/* Add dvfs headroom to actual utilization */
+	/*
+	 * Speed up/slow down response time first then apply DVFS headroom.
+	 */
+	actual = sugov_apply_response_time(actual, cpu);
 	actual = sugov_apply_dvfs_headroom(actual, cpu);
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;
@@ -614,8 +685,42 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count

 static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);

+static ssize_t response_time_ms_show(struct gov_attr_set *attr_set, char *buf)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+
+	return sprintf(buf, "%u\n", tunables->response_time_ms);
+}
+
+static ssize_t
+response_time_ms_store(struct gov_attr_set *attr_set, const char *buf, size_t count)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+	struct sugov_policy *sg_policy;
+	unsigned int response_time_ms;
+
+	if (kstrtouint(buf, 10, &response_time_ms))
+		return -EINVAL;
+
+	/* XXX need special handling for high values? */
+
+	tunables->response_time_ms = response_time_ms;
+
+	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+		if (sg_policy->tunables == tunables) {
+			sugov_update_response_time_mult(sg_policy);
+			break;
+		}
+	}
+
+	return count;
+}
+
+static struct governor_attr response_time_ms = __ATTR_RW(response_time_ms);
+
 static struct attribute *sugov_attrs[] = {
 	&rate_limit_us.attr,
+	&response_time_ms.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(sugov);
@@ -803,11 +908,13 @@ static int sugov_init(struct cpufreq_policy *policy)
 		goto stop_kthread;
 	}

-	tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
-
 	policy->governor_data = sg_policy;
 	sg_policy->tunables = tunables;

+	tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
+	tunables->response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+	sugov_update_response_time_mult(sg_policy);
+
 	ret = kobject_init_and_add(&tunables->attr_set.kobj, &sugov_tunables_ktype,
 				   get_governor_parent_kobj(policy), "%s",
 				   schedutil_gov.name);
@@ -867,7 +974,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 	void (*uu)(struct update_util_data *data, u64 time, unsigned int flags);
 	unsigned int cpu;

-	sg_policy->freq_update_delay_ns	= sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
+	sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
 	sg_policy->last_freq_update_time	= 0;
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;

From patchwork Tue Aug 20 16:35:03 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 821197
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 07/16] sched/pelt: Introduce PELT multiplier boot time parameter
Date: Tue, 20 Aug 2024 17:35:03 +0100
Message-Id: <20240820163512.1096301-8-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The param is set as read-only and can only be changed at boot time via:

	kernel.sched_pelt_multiplier=[1, 2, 4]

PELT has a big impact on the overall system response and reactiveness to
change. A smaller PELT half-life means it'll require less time to reach the
maximum performance point of the system when the system becomes fully busy,
and an equally shorter time to go back to the lowest performance point when
the system goes back to idle. This faster reaction impacts both DVFS response
and migration time between clusters in HMP systems.
Smaller PELT half-lives (higher multiplier) are expected to give better
performance at the cost of more power. Under-powered systems can particularly
benefit from a faster response time. Powerful systems can still benefit from
a faster response time if they want to be tuned more towards perf, and power
is not their major concern.

This, combined with response_time_ms from schedutil, should give the user and
sysadmin a deterministic way to control the triangle of power, perf and
thermals for their system. The default response_time_ms will halve as the
PELT half-life halves.

Update approximate_{util_avg, runtime}() to take into account the PELT
HALFLIFE multiplier.

Signed-off-by: Vincent Guittot
[qyousef: Commit message and boot param]
Signed-off-by: Qais Yousef
---
 kernel/sched/pelt.c | 62 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 06cb881ba582..536575757420 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -24,6 +24,9 @@
  * Author: Vincent Guittot
  */

+static __read_mostly unsigned int sched_pelt_lshift;
+static unsigned int sched_pelt_multiplier = 1;
+
 /*
  * Approximate:
  *   val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
@@ -180,6 +183,7 @@ static __always_inline int
 ___update_load_sum(u64 now, struct sched_avg *sa,
 		  unsigned long load, unsigned long runnable, int running)
 {
+	int time_shift;
 	u64 delta;

 	delta = now - sa->last_update_time;
@@ -195,12 +199,17 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
 	/*
 	 * Use 1024ns as the unit of measurement since it's a reasonable
 	 * approximation of 1us and fast to compute.
+	 * On top of this, we can change the half-time period from the default
+	 * 32ms to a shorter value. This is equivalent to left shifting the
+	 * time.
+	 * Merge both right and left shifts in one single right shift.
 	 */
-	delta >>= 10;
+	time_shift = 10 - sched_pelt_lshift;
+	delta >>= time_shift;
 	if (!delta)
 		return 0;

-	sa->last_update_time += delta << 10;
+	sa->last_update_time += delta << time_shift;

 	/*
 	 * running is a subset of runnable (weight) so running can't be set if
@@ -468,6 +477,51 @@ int update_irq_load_avg(struct rq *rq, u64 running)
 }
 #endif /* CONFIG_HAVE_SCHED_AVG_IRQ */

+static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+
+	ret = param_set_int(val, kp);
+	if (ret)
+		goto error;
+
+	switch (sched_pelt_multiplier) {
+	case 1:
+		fallthrough;
+	case 2:
+		fallthrough;
+	case 4:
+		WRITE_ONCE(sched_pelt_lshift,
+			   sched_pelt_multiplier >> 1);
+		break;
+	default:
+		ret = -EINVAL;
+		goto error;
+	}
+
+	return 0;
+
+error:
+	sched_pelt_multiplier = 1;
+	return ret;
+}
+
+static const struct kernel_param_ops sched_pelt_multiplier_ops = {
+	.set = set_sched_pelt_multiplier,
+	.get = param_get_int,
+};
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+/* XXX: should we use sched as prefix? */
+#define MODULE_PARAM_PREFIX "kernel."
+module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
+MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
+MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALFLIFE - roughly 200ms to go from 0 to max performance point (default).");
+MODULE_PARM_DESC(sched_pelt_multiplier, "                2 16ms PELT HALFLIFE - roughly 100ms to go from 0 to max performance point.");
+MODULE_PARM_DESC(sched_pelt_multiplier, "                4  8ms PELT HALFLIFE - roughly  50ms to go from 0 to max performance point.");
+
 /*
  * Approximate the new util_avg value assuming an entity has continued to run
  * for @delta us.
@@ -482,7 +536,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 	if (unlikely(!delta))
 		return util;

-	accumulate_sum(delta, &sa, 1, 0, 1);
+	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
 	___update_load_avg(&sa, 0);

 	return sa.util_avg;
@@ -494,7 +548,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 u64 approximate_runtime(unsigned long util)
 {
 	struct sched_avg sa = {};
-	u64 delta = 1024; // period = 1024 = ~1ms
+	u64 delta = 1024 << sched_pelt_lshift; // period = 1024 = ~1ms
 	u64 runtime = 0;

 	if (unlikely(!util))

From patchwork Tue Aug 20 16:35:04 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820842
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 08/16] sched/fair: Extend util_est to improve rampup time
Date: Tue, 20 Aug 2024 17:35:04 +0100
Message-Id: <20240820163512.1096301-9-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Utilization invariance can cause big delays. When tasks are running,
accumulate the non-invariant version of utilization to help tasks settle
down to their new util_avg values faster.

Keep track of delta_exec while runnable across activations to help update
util_est for a long running task accurately.

util_est should still behave the same at enqueue/dequeue.
Before this patch, a busy task ramping up would experience the following
transitions, running on an M1 Mac Mini:

[ASCII plot: rampup-6338 util_avg running — climbs from 0 to ~986 across the 1.700s-2.000s window]
[Histogram: rampup-6338 util_avg running residency (ms) — ~8 ms spent per util level below ~100, tapering to 1-3 ms per level above ~250]
[Bar chart: sum time running on CPU (ms) — CPU0: 90.39, CPU4: 1156.93]
[ASCII plot: 6338 rampup CPU0.0 Frequency — steps up from 0.60 to 2.06]
[ASCII plot: 6338 rampup CPU4.0 Frequency — steps up from 1.50 to 3.20]
[Histogram: 6338 rampup CPU0.0 Frequency residency (ms) — 37.3 ms at the lowest 0.6 step]
[Histogram: 6338 rampup CPU4.0 Frequency residency (ms) — 85.3 ms at the top 3.204 step]

After the patch, the response is improved: the task ramps up to higher
frequencies faster and migrates off the little CPU quicker:

[ASCII plot: rampup-2234 util_avg running — climbs from 0 to ~984 with visibly less time at low util]
[Histogram: rampup-2234 util_avg running residency (ms) — at most 2-3 ms per util level past the initial ramp]
[ASCII plot: 2234 rampup CPU1.0 Frequency — steps up from 0.60 to 2.06]
[ASCII plot: 2234 rampup CPU4.0 Frequency — steps up from 1.50 to 3.10]
[Bar chart: sum time running on CPU (ms) — CPU1: 32.53, CPU4: 540.3]
[Histogram: 2234 rampup CPU1.0 Frequency residency (ms) — 12.1 ms at the lowest 0.6 step]
[Histogram: 2234 rampup CPU4.0 Frequency residency (ms) — most time at the top steps: 47.0 ms at 2.988 and 53.4 ms at 3.096]

Signed-off-by: Qais Yousef
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 43 +++++++++++++++++++++++++++++++------------
 3 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 90691d99027e..8db8f4085d84 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,7 @@ struct sched_entity {
 	unsigned int			on_rq;
 
 	u64				exec_start;
+	u64				delta_exec;
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
 	u64				vruntime;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7099e40cc8bd..e2b4b87ec2b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4318,6 +4318,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.on_rq			= 0;
 	p->se.exec_start		= 0;
+	p->se.delta_exec		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e5e986af18dc..a6421e4032c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1118,6 +1118,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 	curr->exec_start = now;
 	curr->sum_exec_runtime += delta_exec;
+	curr->delta_exec = delta_exec;
 
 	if (schedstat_enabled()) {
 		struct sched_statistics *stats;
@@ -1126,7 +1127,6 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 		__schedstat_set(stats->exec_max,
 				max(delta_exec, stats->exec_max));
 	}
-
 	return delta_exec;
 }
 
@@ -4890,16 +4890,20 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	if (!sched_feat(UTIL_EST))
 		return;
 
-	/*
-	 * Skip update of task's estimated utilization when the task has not
-	 * yet completed an activation, e.g. being migrated.
-	 */
-	if (!task_sleep)
-		return;
-
 	/* Get current estimate of utilization */
 	ewma = READ_ONCE(p->se.avg.util_est);
 
+	/*
+	 * If a task is running, update util_est ignoring utilization
+	 * invariance so that if the task suddenly becomes busy we will rampup
+	 * quickly to settle down to our new util_avg.
+	 */
+	if (!task_sleep) {
+		ewma &= ~UTIL_AVG_UNCHANGED;
+		ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
+		goto done;
+	}
+
 	/*
 	 * If the PELT values haven't changed since enqueue time,
 	 * skip the util_est update.
@@ -4968,6 +4972,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+static inline void util_est_update_running(struct cfs_rq *cfs_rq,
+					   struct task_struct *p)
+{
+	util_est_dequeue(cfs_rq, p);
+	util_est_update(cfs_rq, p, false);
+	util_est_enqueue(cfs_rq, p);
+}
+
 static inline unsigned long get_actual_cpu_capacity(int cpu)
 {
 	unsigned long capacity = arch_scale_cpu_capacity(cpu);
@@ -5164,13 +5176,13 @@ static inline int sched_balance_newidle(struct rq *rq, struct rq_flags *rf)
 
 static inline void
 util_est_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
 static inline void
 util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
 static inline void
-util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p,
-		bool task_sleep) {}
+util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep) {}
+static inline void
+util_est_update_running(struct cfs_rq *cfs_rq, struct task_struct *p) {}
+
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
 
 #endif /* CONFIG_SMP */
@@ -6906,6 +6918,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		rq->next_balance = jiffies;
 dequeue_throttle:
+	if (task_sleep)
+		p->se.delta_exec = 0;
 	util_est_update(&rq->cfs, p, task_sleep);
 	hrtick_update(rq);
 }
@@ -8546,6 +8560,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		set_next_entity(cfs_rq, se);
 	}
 
+	if (prev->on_rq)
+		util_est_update_running(&rq->cfs, prev);
+
 	goto done;
 simple:
 #endif
@@ -12710,6 +12727,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		entity_tick(cfs_rq, se, queued);
 	}
 
+	util_est_update_running(&rq->cfs, curr);
+
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);

From patchwork Tue Aug 20 16:35:05 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 821196
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 09/16] sched/fair: util_est: Take into account periodic tasks
Date: Tue, 20 Aug 2024 17:35:05 +0100
Message-Id: <20240820163512.1096301-10-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The new faster rampup is great for performance, but terrible for power. We
want the faster rampup to apply only to tasks that are transitioning from
one periodic/steady state to another periodic/steady state.
But if they are stably periodic, then the faster rampup doesn't make sense:
util_avg describes their computational demand accurately, and we can rely on
it to make accurate decisions while preserving the power savings that come
from being exact with the resources we give to the task (i.e. smaller DVFS
headroom).

We detect periodic tasks based on util_avg across util_est_update() calls.
If it is rising, the task is going through a transition. We rely on util_avg
being stable for periodic tasks, with very little variation around one
stable point.

Signed-off-by: Qais Yousef
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   |  2 ++
 kernel/sched/fair.c   | 17 ++++++++++++++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db8f4085d84..2e8c5a9ffa76 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -829,6 +829,8 @@ struct task_struct {
 	struct uclamp_se		uclamp[UCLAMP_CNT];
 #endif
 
+	unsigned long			util_avg_dequeued;
+
 	struct sched_statistics		stats;
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e2b4b87ec2b7..c91e6a62c7ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4331,6 +4331,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.cfs_rq			= NULL;
 #endif
 
+	p->util_avg_dequeued		= 0;
+
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
 	memset(&p->stats, 0, sizeof(p->stats));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a6421e4032c0..0c10e2afb52d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4832,6 +4832,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return READ_ONCE(p->se.avg.util_avg);
 }
 
+static inline unsigned long task_util_dequeued(struct task_struct *p)
+{
+	return READ_ONCE(p->util_avg_dequeued);
+}
+
 static inline unsigned long task_runnable(struct task_struct *p)
 {
 	return READ_ONCE(p->se.avg.runnable_avg);
@@ -4899,9 +4904,12 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	 * quickly to settle down to our new util_avg.
 	 */
 	if (!task_sleep) {
-		ewma &= ~UTIL_AVG_UNCHANGED;
-		ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
-		goto done;
+		if (task_util(p) > task_util_dequeued(p)) {
+			ewma &= ~UTIL_AVG_UNCHANGED;
+			ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
+			goto done;
+		}
+		return;
 	}
 
 	/*
@@ -4914,6 +4922,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	/* Get utilization at dequeue */
 	dequeued = task_util(p);
 
+	if (!task_on_rq_migrating(p))
+		p->util_avg_dequeued = dequeued;
+
 	/*
 	 * Reset EWMA on utilization increases, the moving average is used only
 	 * to smooth utilization decreases.

From patchwork Tue Aug 20 16:35:06 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820841
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 10/16] sched/qos: Add a new sched-qos interface
Date: Tue, 20 Aug 2024 17:35:06 +0100
Message-Id: <20240820163512.1096301-11-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The need to describe the conflicting demands of various workloads has never
been higher.
Both hardware and software have moved rapidly in the past decade, and system
usage is more diverse. The number of workloads expected to run on the same
machine, whether in the Mobile or Server markets, has created a big dilemma
about how to better manage those requirements.

The problem is that we lack mechanisms to allow these workloads to describe
what they need, and then allow the kernel to do its best to manage those
demands transparently, based on the hardware it is running on and the
current system state.

Examples of conflicting requirements that come up frequently:

1. Improve wake up latency for SCHED_OTHER. Many tasks end up using
   SCHED_FIFO/SCHED_RR to compensate for this shortcoming. RT tasks lack
   power management and fairness, and can be hard and error prone to use
   correctly and portably.

2. Prefer spreading vs prefer packing on wake up for a group of tasks.
   Geekbench-like workloads would benefit from parallelising on different
   CPUs. hackbench-like workloads can benefit from waking up on the same
   CPU, or a CPU that is closer in the cache hierarchy.

3. Nice values for SCHED_OTHER are system wide and require privileges. Many
   workloads would like a way to set a relative nice value so they can
   preempt each other, but neither impact nor be impacted by tasks belonging
   to different workloads on the system.

4. Provide a way to tag some tasks as 'background' to keep them out of the
   way. SCHED_IDLE is too strong for some of these tasks, yet they can be
   computationally heavy. Example tasks are garbage collectors. Their work
   is both important and not important.

5. Provide a way to improve DVFS/upmigration rampup time for specific tasks
   that are bursty in nature and highly interactive.

Whether any of these use cases warrants an additional QoS hint is something
to be discussed individually. But the main point is to introduce an
interface that is extendable to cater for these requirements and potentially
more.
rampup_multiplier, to improve DVFS/upmigration for bursty tasks, will be the
first user in a later patch.

It is desired to have apps (and benchmarks!) directly use this interface for
optimal perf/watt. But in the absence of such support, it should be possible
to write a userspace daemon to monitor workloads and apply these QoS hints
on apps' behalf, based on analysis done by anyone interested in improving
the performance of those workloads.

Signed-off-by: Qais Yousef
---
 Documentation/scheduler/index.rst                   |  1 +
 Documentation/scheduler/sched-qos.rst               | 44 ++++++++++++++++++
 include/uapi/linux/sched.h                          |  4 ++
 include/uapi/linux/sched/types.h                    | 46 +++++++++++++++++++
 kernel/sched/syscalls.c                             |  3 ++
 .../trace/beauty/include/uapi/linux/sched.h         |  4 ++
 6 files changed, 102 insertions(+)
 create mode 100644 Documentation/scheduler/sched-qos.rst

diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 43bd8a145b7a..f49b8b021d97 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -21,6 +21,7 @@ Scheduler
     sched-rt-group
     sched-stats
     sched-debug
+    sched-qos
 
     text_files
 
diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
new file mode 100644
index 000000000000..0911261cb124
--- /dev/null
+++ b/Documentation/scheduler/sched-qos.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Scheduler QoS
+=============
+
+1. Introduction
+===============
+
+Different workloads have different scheduling requirements to operate
+optimally. The same applies to tasks within the same workload.
+
+To enable smarter usage of system resources and to cater for the conflicting
+demands of various tasks, Scheduler QoS provides a mechanism to provide more
+information about those demands so that the scheduler can do best-effort to
+honour them.
+
+  @sched_qos_type	what QoS hint to apply
+  @sched_qos_value	value of the QoS hint
+  @sched_qos_cookie	magic cookie to tag a group of tasks for which the QoS
+			applies. If 0, the hint will apply globally system
+			wide. If not 0, the hint will be relative to tasks
+			that have the same cookie value only.
+
+QoS hints are set once and are not inherited by children by design. The
+rationale is that each task has its individual characteristics and it is
+encouraged to describe each of these separately. Also, since system
+resources are finite, there's a limit to what can be done to honour these
+requests before reaching a tipping point where there are too many requests
+for a particular QoS that it is impossible to service all of them at once,
+and some will start to lose out. For example, if 10 tasks require better
+wake up latencies on a 4 CPU SMP system, then if they all wake up at once,
+only 4 can perceive the hint as honoured and the rest will have to wait.
+Inheritance can easily lead these 10 to become 100 or 1000, and then the QoS
+hint will rapidly lose its meaning and effectiveness. The chances of 10
+tasks waking up at the same time are lower than for 100, and lower still
+than for 1000.
+
+To set multiple QoS hints, a syscall is required for each. This is a
+trade-off to reduce the churn of extending the interface, as the hope is for
+this to evolve as workloads and hardware get more sophisticated and the need
+for extension arises; when this happens, it should be simpler to add the
+kernel extension and allow userspace to use it readily by setting the newly
+added flag, without having to update the whole of sched_attr.
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 3bac0a8ceab2..67ef99f64ddc 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -102,6 +102,9 @@ struct clone_args { __aligned_u64 set_tid_size; __aligned_u64 cgroup; }; + +enum sched_qos_type { +}; #endif #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ @@ -132,6 +135,7 @@ struct clone_args { #define SCHED_FLAG_KEEP_PARAMS 0x10 #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20 #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40 +#define SCHED_FLAG_QOS 0x80 #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \ SCHED_FLAG_KEEP_PARAMS) diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h index 90662385689b..55e4b1e79ed2 100644 --- a/include/uapi/linux/sched/types.h +++ b/include/uapi/linux/sched/types.h @@ -94,6 +94,48 @@ * scheduled on a CPU with no more capacity than the specified value. * * A task utilization boundary can be reset by setting the attribute to -1. + * + * Scheduler QoS + * ============= + * + * Different workloads have different scheduling requirements to operate + * optimally. The same applies to tasks within the same workload. + * + * To enable smarter usage of system resources and to cater for the conflicting + * demands of various tasks, Scheduler QoS provides a mechanism to supply more + * information about those demands so that the scheduler can make a best effort + * to honour them. + * + * @sched_qos_type what QoS hint to apply + * @sched_qos_value value of the QoS hint + * @sched_qos_cookie magic cookie to tag a group of tasks for which the QoS + * applies. If 0, the hint applies globally, system + * wide. If not 0, the hint is relative only to tasks that + * have the same cookie value. + * + * QoS hints are set once and are not inherited by children by design. The + * rationale is that each task has its individual characteristics and it is + * encouraged to describe each of these separately.
Also, since system resources + * are finite, there is a limit to what can be done to honour these requests + * before reaching a tipping point where there are too many requests for + * a particular QoS to service all of them at once, and some will start to + * lose out. For example, if 10 tasks require better wake-up latency on + * a 4-CPU SMP system and they all wake up at once, only 4 can perceive the + * hint as honoured and the rest will have to wait. Inheritance can easily + * turn these 10 tasks into 100 or 1000, at which point the QoS hint rapidly + * loses its meaning and effectiveness. The chances of 10 tasks waking up + * at the same time are lower than those of 100, and lower still than those + * of 1000. + * + * To set multiple QoS hints, a syscall is required for each. This is a + * trade-off to reduce churn when extending the interface: the hope is for + * this interface to evolve as workloads and hardware get more sophisticated, + * and the need for extensions will arise. When that happens, it should be + * simpler to add the kernel extension and let userspace readily use it by + * setting the newly added flag, without having to update the whole of + * sched_attr. + * + * Details about the available QoS hints can be found in: + * Documentation/scheduler/sched-qos.rst */ struct sched_attr { __u32 size; @@ -116,6 +158,10 @@ struct sched_attr { __u32 sched_util_min; __u32 sched_util_max; + __u32 sched_qos_type; + __s64 sched_qos_value; + __u32 sched_qos_cookie; + }; #endif /* _UAPI_LINUX_SCHED_TYPES_H */ diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c index ae1b42775ef9..a7d4dfdfed43 100644 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -668,6 +668,9 @@ int __sched_setscheduler(struct task_struct *p, return retval; } + if (attr->sched_flags & SCHED_FLAG_QOS) + return -EOPNOTSUPP; + /* * SCHED_DEADLINE bandwidth accounting relies on stable cpusets * information.
diff --git a/tools/perf/trace/beauty/include/uapi/linux/sched.h b/tools/perf/trace/beauty/include/uapi/linux/sched.h index 3bac0a8ceab2..67ef99f64ddc 100644 --- a/tools/perf/trace/beauty/include/uapi/linux/sched.h +++ b/tools/perf/trace/beauty/include/uapi/linux/sched.h @@ -102,6 +102,9 @@ struct clone_args { __aligned_u64 set_tid_size; __aligned_u64 cgroup; }; + +enum sched_qos_type { +}; #endif #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ @@ -132,6 +135,7 @@ struct clone_args { #define SCHED_FLAG_KEEP_PARAMS 0x10 #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20 #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40 +#define SCHED_FLAG_QOS 0x80 #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \ SCHED_FLAG_KEEP_PARAMS) From patchwork Tue Aug 20 16:35:07 2024 X-Patchwork-Submitter: Qais Yousef X-Patchwork-Id: 821195
From: Qais Yousef To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , "Rafael J. Wysocki" , Viresh Kumar Cc: Juri Lelli , Steven Rostedt , Dietmar Eggemann , John Stultz , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef Subject: [RFC PATCH 11/16] sched/qos: Add rampup multiplier QoS Date: Tue, 20 Aug 2024 17:35:07 +0100 Message-Id: <20240820163512.1096301-12-qyousef@layalina.io> In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io> References: <20240820163512.1096301-1-qyousef@layalina.io> Bursty tasks are hard to predict. To use resources efficiently, the system would like to be as precise as possible. But this poses a challenge for bursty tasks that need to get access to more resources quickly. The new SCHED_QOS_RAMPUP_MULTIPLIER allows userspace to do that.
As the name implies, it only helps them transition to a higher performance state when they get _busier_. That is, perfectly periodic tasks are by definition not going through a transition and will run at a constant performance level. It is tasks that need to transition from one periodic state to another periodic state at a higher level that this rampup_multiplier will help with. It also slows down the ewma decay of util_est, which should help those bursty tasks keep their faster rampup. This should work complementarily with uclamp. uclamp tells the system about min and max perf requirements, which can be applied immediately. rampup_multiplier is about the task's reactiveness to change, specifically a change to a higher performance level. The task might not necessarily need a min perf requirement, but it can have sudden bursts of change that require a higher perf level, and it needs the system to provide this faster. TODO: update the sched_qos docs Signed-off-by: Qais Yousef --- include/linux/sched.h | 7 ++++ include/uapi/linux/sched.h | 2 ++ kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++++++ kernel/sched/fair.c | 6 ++-- kernel/sched/syscalls.c | 38 ++++++++++++++++++++-- 5 files changed, 115 insertions(+), 4 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 2e8c5a9ffa76..a30ee43a25fb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -404,6 +404,11 @@ struct sched_info { #endif /* CONFIG_SCHED_INFO */ }; +struct sched_qos { + DECLARE_BITMAP(user_defined, SCHED_QOS_MAX); + unsigned int rampup_multiplier; +}; + /* * Integer metrics need fixed point arithmetic, e.g., sched/fair * has a few: load, load_avg, util_avg, freq, and capacity.
@@ -882,6 +887,8 @@ struct task_struct { struct sched_info sched_info; + struct sched_qos sched_qos; + struct list_head tasks; #ifdef CONFIG_SMP struct plist_node pushable_tasks; diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 67ef99f64ddc..0baba91ba5b8 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -104,6 +104,8 @@ struct clone_args { }; enum sched_qos_type { + SCHED_QOS_RAMPUP_MULTIPLIER, + SCHED_QOS_MAX, }; #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c91e6a62c7ab..54faa845cb29 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -152,6 +152,8 @@ __read_mostly int sysctl_resched_latency_warn_once = 1; */ const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK; +unsigned int sysctl_sched_qos_default_rampup_multiplier = 1; + __read_mostly int scheduler_running; #ifdef CONFIG_SCHED_CORE @@ -4488,6 +4490,47 @@ static int sysctl_schedstats(struct ctl_table *table, int write, void *buffer, #endif /* CONFIG_SCHEDSTATS */ #ifdef CONFIG_SYSCTL +static void sched_qos_sync_sysctl(void) +{ + struct task_struct *g, *p; + + guard(rcu)(); + for_each_process_thread(g, p) { + struct rq_flags rf; + struct rq *rq; + + rq = task_rq_lock(p, &rf); + if (!test_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined)) + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier; + task_rq_unlock(rq, p, &rf); + } +} + +static int sysctl_sched_qos_handler(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + unsigned int old_rampup_mult; + int result; + + old_rampup_mult = sysctl_sched_qos_default_rampup_multiplier; + + result = proc_dointvec(table, write, buffer, lenp, ppos); + if (result) + goto undo; + if (!write) + return 0; + + if (old_rampup_mult != sysctl_sched_qos_default_rampup_multiplier) { + sched_qos_sync_sysctl(); + } + + return 0; + +undo: + sysctl_sched_qos_default_rampup_multiplier = old_rampup_mult; + return result; 
+} + static struct ctl_table sched_core_sysctls[] = { #ifdef CONFIG_SCHEDSTATS { @@ -4534,6 +4577,13 @@ static struct ctl_table sched_core_sysctls[] = { .extra2 = SYSCTL_FOUR, }, #endif /* CONFIG_NUMA_BALANCING */ + { + .procname = "sched_qos_default_rampup_multiplier", + .data = &sysctl_sched_qos_default_rampup_multiplier, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = sysctl_sched_qos_handler, + }, }; static int __init sched_core_sysctl_init(void) { @@ -4543,6 +4593,21 @@ static int __init sched_core_sysctl_init(void) late_initcall(sched_core_sysctl_init); #endif /* CONFIG_SYSCTL */ +static void sched_qos_fork(struct task_struct *p) +{ + /* + * We always force reset sched_qos on fork. These QoS hints are treated + * as finite resources to help improve quality of life. Inheriting them + * by default can easily lead to a situation where the QoS hints become + * meaningless because all tasks in the system have them. + * + * Every task must request the QoS explicitly if it needs it. No + * accidental inheritance is allowed, to keep the default behavior sane. + */ + bitmap_zero(p->sched_qos.user_defined, SCHED_QOS_MAX); + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier; +} + /* * fork()/clone()-time setup: */ @@ -4562,6 +4627,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) p->prio = current->normal_prio; uclamp_fork(p); + sched_qos_fork(p); /* * Revert to default priority/policy on fork if requested.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0c10e2afb52d..3d9794db58e1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4906,7 +4906,7 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, if (!task_sleep) { if (task_util(p) > task_util_dequeued(p)) { ewma &= ~UTIL_AVG_UNCHANGED; - ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000); + ewma = approximate_util_avg(ewma, (p->se.delta_exec / 1000) * p->sched_qos.rampup_multiplier); goto done; } return; @@ -4974,6 +4974,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT) */ ewma <<= UTIL_EST_WEIGHT_SHIFT; + if (p->sched_qos.rampup_multiplier) + last_ewma_diff /= p->sched_qos.rampup_multiplier; ewma -= last_ewma_diff; ewma >>= UTIL_EST_WEIGHT_SHIFT; done: @@ -9643,7 +9645,7 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu) * on TICK doesn't end up hurting it as it can happen after we would * have crossed this threshold. * - * To ensure that invaraince is taken into account, we don't scale time + * To ensure that invariance is taken into account, we don't scale time * and use it as-is, approximate_util_avg() will then let us know * our threshold.
*/ diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c index a7d4dfdfed43..dc7d7bcaae7b 100644 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -543,6 +543,35 @@ static void __setscheduler_uclamp(struct task_struct *p, const struct sched_attr *attr) { } #endif +static inline int sched_qos_validate(struct task_struct *p, + const struct sched_attr *attr) +{ + switch (attr->sched_qos_type) { + case SCHED_QOS_RAMPUP_MULTIPLIER: + if (attr->sched_qos_cookie) + return -EINVAL; + if (attr->sched_qos_value < 0) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return 0; +} + +static void __setscheduler_sched_qos(struct task_struct *p, + const struct sched_attr *attr) +{ + switch (attr->sched_qos_type) { + case SCHED_QOS_RAMPUP_MULTIPLIER: + set_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined); + p->sched_qos.rampup_multiplier = attr->sched_qos_value; + break; + default: + break; + } +} + /* * Allow unprivileged RT tasks to decrease priority. * Only issue a capable test if needed and only once to avoid an audit @@ -668,8 +697,11 @@ int __sched_setscheduler(struct task_struct *p, return retval; } - if (attr->sched_flags & SCHED_FLAG_QOS) - return -EOPNOTSUPP; + if (attr->sched_flags & SCHED_FLAG_QOS) { + retval = sched_qos_validate(p, attr); + if (retval) + return retval; + } /* * SCHED_DEADLINE bandwidth accounting relies on stable cpusets @@ -799,7 +831,9 @@ int __sched_setscheduler(struct task_struct *p, __setscheduler_params(p, attr); __setscheduler_prio(p, newprio); } + __setscheduler_uclamp(p, attr); + __setscheduler_sched_qos(p, attr); if (queued) { /* From patchwork Tue Aug 20 16:35:08 2024 X-Patchwork-Submitter: Qais Yousef X-Patchwork-Id: 820840
From: Qais Yousef To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , "Rafael J.
Wysocki" , Viresh Kumar Cc: Juri Lelli , Steven Rostedt , Dietmar Eggemann , John Stultz , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef Subject: [RFC PATCH 12/16] sched/pelt: Add new waiting_avg to record when runnable && !running Date: Tue, 20 Aug 2024 17:35:08 +0100 Message-Id: <20240820163512.1096301-13-qyousef@layalina.io> In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io> References: <20240820163512.1096301-1-qyousef@layalina.io> This info will be useful to understand how long tasks end up waiting behind other tasks. This info is recorded for tasks only, and added to/subtracted from the root cfs_rq on __update_load_avg_se(). It also helps to decouple util_avg, which indicates a task's computational demand, from the fact that the CPU might need to run faster to reduce the waiting time. This has been a point of confusion in the past while discussing uclamp and util_avg: not keeping the frequency high means tasks will take longer to run and cause delays. Isolating this source of delay into its own signal is a better way to take it into account when making decisions, independently of the task's/CPU's computational demand. It is not used yet, but will be used later to help drive DVFS headroom. It could become a helpful metric for managing waiting latencies in general, for example in load balance. TODO: waiting_avg should use rq_clock_task() as it doesn't care about invariance. Waiting time should reflect the actual wait in real time, as this is the measure of latency that users care about.
Signed-off-by: Qais Yousef --- include/linux/sched.h | 2 ++ kernel/sched/debug.c | 5 +++++ kernel/sched/fair.c | 32 +++++++++++++++++++++++++++++- kernel/sched/pelt.c | 45 ++++++++++++++++++++++++++++++------------- 4 files changed, 70 insertions(+), 14 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a30ee43a25fb..f332ce5e226f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -477,10 +477,12 @@ struct sched_avg { u64 last_update_time; u64 load_sum; u64 runnable_sum; + u64 waiting_sum; u32 util_sum; u32 period_contrib; unsigned long load_avg; unsigned long runnable_avg; + unsigned long waiting_avg; unsigned long util_avg; unsigned int util_est; } ____cacheline_aligned; diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index c1eb9a1afd13..5fa2662a4a50 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -528,6 +528,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group P(se->avg.load_avg); P(se->avg.util_avg); P(se->avg.runnable_avg); + P(se->avg.waiting_avg); #endif #undef PN_SCHEDSTAT @@ -683,6 +684,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->avg.load_avg); SEQ_printf(m, " .%-30s: %lu\n", "runnable_avg", cfs_rq->avg.runnable_avg); + SEQ_printf(m, " .%-30s: %lu\n", "waiting_avg", + cfs_rq->avg.waiting_avg); SEQ_printf(m, " .%-30s: %lu\n", "util_avg", cfs_rq->avg.util_avg); SEQ_printf(m, " .%-30s: %u\n", "util_est", @@ -1071,9 +1074,11 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, #ifdef CONFIG_SMP P(se.avg.load_sum); P(se.avg.runnable_sum); + P(se.avg.waiting_sum); P(se.avg.util_sum); P(se.avg.load_avg); P(se.avg.runnable_avg); + P(se.avg.waiting_avg); P(se.avg.util_avg); P(se.avg.last_update_time); PM(se.avg.util_est, ~UTIL_AVG_UNCHANGED); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3d9794db58e1..a8dbba0b755e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4726,6 
+4726,22 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s trace_pelt_cfs_tp(cfs_rq); } +static inline void add_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + unsigned long waiting_avg; + waiting_avg = READ_ONCE(cfs_rq->avg.waiting_avg); + waiting_avg += READ_ONCE(se->avg.waiting_avg); + WRITE_ONCE(cfs_rq->avg.waiting_avg, waiting_avg); +} + +static inline void sub_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + unsigned long waiting_avg; + waiting_avg = READ_ONCE(cfs_rq->avg.waiting_avg); + waiting_avg -= min(waiting_avg, READ_ONCE(se->avg.waiting_avg)); + WRITE_ONCE(cfs_rq->avg.waiting_avg, waiting_avg); +} + /* * Optional action to be done while updating the load average */ @@ -4744,8 +4760,15 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s * Track task load average for carrying it to new CPU after migrated, and * track group sched_entity load average for task_h_load calculation in migration */ - if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) + if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) { + bool update_rq_waiting_avg = entity_is_task(se) && se_runnable(se); + + if (update_rq_waiting_avg) + sub_waiting_avg(&rq_of(cfs_rq)->cfs, se); __update_load_avg_se(now, cfs_rq, se); + if (update_rq_waiting_avg) + add_waiting_avg(&rq_of(cfs_rq)->cfs, se); + } decayed = update_cfs_rq_load_avg(now, cfs_rq); decayed |= propagate_entity_load_avg(se); @@ -5182,6 +5205,11 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} +static inline void +add_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} +static inline void +sub_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} + static inline int sched_balance_newidle(struct rq *rq, struct rq_flags *rf) { return 0; @@ -6786,6 +6814,7 @@ enqueue_task_fair(struct rq *rq, 
					   struct task_struct *p, int flags)
	 * estimated utilization, before we update schedutil.
	 */
	util_est_enqueue(&rq->cfs, p);
+	add_waiting_avg(&rq->cfs, se);
 
	/*
	 * If in_iowait is set, the code below may not trigger any cpufreq
@@ -6874,6 +6903,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
	bool was_sched_idle = sched_idle_rq(rq);
 
	util_est_dequeue(&rq->cfs, p);
+	sub_waiting_avg(&rq->cfs, se);
 
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 536575757420..f0974abf8566 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -103,7 +103,8 @@ static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
  */
 static __always_inline u32
 accumulate_sum(u64 delta, struct sched_avg *sa,
-	       unsigned long load, unsigned long runnable, int running)
+	       unsigned long load, unsigned long runnable, int running,
+	       bool is_task)
 {
	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
	u64 periods;
@@ -118,6 +119,7 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
		sa->load_sum = decay_load(sa->load_sum, periods);
		sa->runnable_sum = decay_load(sa->runnable_sum, periods);
+		sa->waiting_sum = decay_load((u64)(sa->waiting_sum), periods);
		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
 
		/*
@@ -147,6 +149,8 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
		sa->runnable_sum += runnable * contrib << SCHED_CAPACITY_SHIFT;
	if (running)
		sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
+	if (is_task && runnable && !running)
+		sa->waiting_sum += contrib << SCHED_CAPACITY_SHIFT;
 
	return periods;
 }
@@ -181,7 +185,8 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
  */
 static __always_inline int
 ___update_load_sum(u64 now, struct sched_avg *sa,
-		   unsigned long load, unsigned long runnable, int running)
+		   unsigned long load, unsigned long runnable, int running,
+		   bool is_task)
 {
	int time_shift;
	u64 delta;
@@ -232,7 +237,7 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
	 * Step 1: accumulate *_sum since last_update_time. If we haven't
	 * crossed period boundaries, finish.
	 */
-	if (!accumulate_sum(delta, sa, load, runnable, running))
+	if (!accumulate_sum(delta, sa, load, runnable, running, is_task))
		return 0;
 
	return 1;
@@ -272,6 +277,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
	 */
	sa->load_avg = div_u64(load * sa->load_sum, divider);
	sa->runnable_avg = div_u64(sa->runnable_sum, divider);
+	sa->waiting_avg = div_u64(sa->waiting_sum, divider);
	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }
 
@@ -303,7 +309,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
 
 int __update_load_avg_blocked_se(u64 now, struct sched_entity *se)
 {
-	if (___update_load_sum(now, &se->avg, 0, 0, 0)) {
+	if (___update_load_sum(now, &se->avg, 0, 0, 0, false)) {
		___update_load_avg(&se->avg, se_weight(se));
		trace_pelt_se_tp(se);
		return 1;
@@ -314,10 +320,17 @@ int __update_load_avg_blocked_se(u64 now, struct sched_entity *se)
 
 int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	bool is_task = entity_is_task(se);
+
+	if (is_task)
+		rq_of(cfs_rq)->cfs.avg.waiting_avg -= se->avg.waiting_avg;
+
	if (___update_load_sum(now, &se->avg, !!se->on_rq, se_runnable(se),
-			       cfs_rq->curr == se)) {
+			       cfs_rq->curr == se, is_task)) {
 
		___update_load_avg(&se->avg, se_weight(se));
+		if (is_task)
+			rq_of(cfs_rq)->cfs.avg.waiting_avg += se->avg.waiting_avg;
		cfs_se_util_change(&se->avg);
		trace_pelt_se_tp(se);
		return 1;
@@ -331,7 +344,8 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq)
	if (___update_load_sum(now, &cfs_rq->avg,
				scale_load_down(cfs_rq->load.weight),
				cfs_rq->h_nr_running,
-				cfs_rq->curr != NULL)) {
+				cfs_rq->curr != NULL,
+				false)) {
 
		___update_load_avg(&cfs_rq->avg, 1);
		trace_pelt_cfs_tp(cfs_rq);
@@ -357,7 +371,8 @@ int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
	if (___update_load_sum(now, &rq->avg_rt,
				running,
				running,
-				running)) {
+				running,
+				false)) {
 
		___update_load_avg(&rq->avg_rt, 1);
		trace_pelt_rt_tp(rq);
@@ -383,7 +398,8 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
	if (___update_load_sum(now, &rq->avg_dl,
				running,
				running,
-				running)) {
+				running,
+				false)) {
 
		___update_load_avg(&rq->avg_dl, 1);
		trace_pelt_dl_tp(rq);
@@ -414,7 +430,8 @@ int update_hw_load_avg(u64 now, struct rq *rq, u64 capacity)
	if (___update_load_sum(now, &rq->avg_hw,
				capacity,
				capacity,
-				capacity)) {
+				capacity,
+				false)) {
		___update_load_avg(&rq->avg_hw, 1);
		trace_pelt_hw_tp(rq);
		return 1;
@@ -462,11 +479,13 @@ int update_irq_load_avg(struct rq *rq, u64 running)
	ret = ___update_load_sum(rq->clock - running, &rq->avg_irq,
				0,
				0,
-				0);
+				0,
+				false);
	ret += ___update_load_sum(rq->clock, &rq->avg_irq,
				1,
				1,
-				1);
+				1,
+				false);
 
	if (ret) {
		___update_load_avg(&rq->avg_irq, 1);
@@ -536,7 +555,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
	if (unlikely(!delta))
		return util;
 
-	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
+	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1, false);
	___update_load_avg(&sa, 0);
 
	return sa.util_avg;
@@ -555,7 +574,7 @@ u64 approximate_runtime(unsigned long util)
		return runtime;
 
	while (sa.util_avg < util) {
-		accumulate_sum(delta, &sa, 1, 0, 1);
+		accumulate_sum(delta, &sa, 1, 0, 1, false);
		___update_load_avg(&sa, 0);
		runtime++;
	}
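The accumulation rule added above can be illustrated with a small user-space toy model. This is not kernel code: `toy_avg` and `toy_accumulate` are illustrative stand-ins, and the 978/1000 decay is only a rough approximation of PELT's y (the kernel uses fixed-point tables where y^32 = 1/2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sched_avg carrying only the fields this patch touches. */
struct toy_avg {
	uint64_t util_sum;
	uint64_t waiting_sum;
};

/*
 * One full ~1ms PELT period: decay the old sums, then accumulate this
 * period's contribution. waiting_sum only grows for a task that is
 * runnable but not running -- the exact condition the patch adds.
 */
static void toy_accumulate(struct toy_avg *sa, bool is_task,
			   bool runnable, bool running)
{
	const uint32_t contrib = 1024;	/* one full period's worth */

	/* Geometric decay of the history, approximating PELT's y. */
	sa->util_sum = sa->util_sum * 978 / 1000;
	sa->waiting_sum = sa->waiting_sum * 978 / 1000;

	if (running)
		sa->util_sum += contrib;
	if (is_task && runnable && !running)
		sa->waiting_sum += contrib;
}
```

A period spent waiting feeds waiting_sum only; once the task gets the CPU, waiting_sum merely decays while util_sum grows, so a persistently large waiting_avg indicates sustained runnable-but-not-running time.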
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 13/16] sched/schedutil: Take into account waiting_avg in apply_dvfs_headroom
Date: Tue, 20 Aug 2024 17:35:09 +0100
Message-Id: <20240820163512.1096301-14-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

We now have three sources of delays:

1. How often we send cpufreq updates
2. How often we update util_avg
3. How long tasks wait in RUNNABLE before becoming RUNNING

The headroom should cater for all these types of delay to ensure the
system is running at an adequate performance point. We want to pick the
maximum headroom required by any of these sources of delay.

TODO: the signal should use task clock, not PELT time, as this should be
real-time based and we don't care about invariance.

Signed-off-by: Qais Yousef
---
 kernel/sched/cpufreq_schedutil.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 94e35b7c972d..318b09bc4ab1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -259,10 +259,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * dvfs_update_delay of the cpufreq governor or min(curr.se.slice, TICK_US),
  * whichever is higher.
  *
+ * Also take into account how long tasks have been waiting in runnable but
+ * !running state. If it is high, it means we need higher DVFS headroom to
+ * reduce it.
+ *
  * XXX: Should we provide headroom when the util is decaying?
  */
 static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
+	unsigned long update_headroom, waiting_headroom;
	struct rq *rq = cpu_rq(cpu);
	u64 delay;
 
@@ -276,7 +281,10 @@ static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int c
		delay = TICK_USEC;
	delay = max(delay, per_cpu(dvfs_update_delay, cpu));
 
-	return approximate_util_avg(util, delay);
+	update_headroom = approximate_util_avg(util, delay);
+	waiting_headroom = util + READ_ONCE(rq->cfs.avg.waiting_avg);
+
+	return max(update_headroom, waiting_headroom);
 }
 
 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,

From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 14/16] sched/schedutil: Ignore dvfs headroom when util is decaying
Date: Tue, 20 Aug 2024 17:35:10 +0100
Message-Id: <20240820163512.1096301-15-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

A decaying util means we're either idling or doing less work than
before, and are already running at a higher performance point than
required. No need to apply any dvfs headroom in this case.
Signed-off-by: Qais Yousef
---
 kernel/sched/cpufreq_schedutil.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 318b09bc4ab1..4a1a8b353d51 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -9,6 +9,7 @@
 #define IOWAIT_BOOST_MIN	(SCHED_CAPACITY_SCALE / 8)
 
 DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
+DEFINE_PER_CPU(unsigned long, last_update_util);
 
 struct sugov_tunables {
	struct gov_attr_set	attr_set;
@@ -262,15 +263,19 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * Also take into account how long tasks have been waiting in runnable but
  * !running state. If it is high, it means we need higher DVFS headroom to
  * reduce it.
- *
- * XXX: Should we provide headroom when the util is decaying?
  */
 static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
-	unsigned long update_headroom, waiting_headroom;
+	unsigned long update_headroom, waiting_headroom, prev_util;
	struct rq *rq = cpu_rq(cpu);
	u64 delay;
 
+	prev_util = per_cpu(last_update_util, cpu);
+	per_cpu(last_update_util, cpu) = util;
+
+	if (util < prev_util)
+		return util;
+
	/*
	 * What is the possible worst case scenario for updating util_avg, ctx
	 * switch or TICK?
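Taken together, patches 13 and 14 give schedutil a headroom policy that can be sketched in plain C as follows. This is a simplified model, not the kernel implementation: `toy_dvfs_headroom` is a hypothetical name, `approximate_util_avg()` is stood in for by a flat 25% boost, and the per-CPU bookkeeping is passed in explicitly.

```c
#include <assert.h>

/*
 * Sketch of the combined policy: if util is decaying we are already
 * running fast enough, so return it as-is (patch 14); otherwise grow it
 * to the maximum of the update-delay headroom and the waiting-time
 * headroom (patch 13).
 */
static unsigned long toy_dvfs_headroom(unsigned long util,
				       unsigned long prev_util,
				       unsigned long waiting_avg)
{
	unsigned long update_headroom, waiting_headroom;

	/* Decaying signal: no headroom at all. */
	if (util < prev_util)
		return util;

	/* Stand-in for approximate_util_avg(util, delay). */
	update_headroom = util + util / 4;
	/* Compensate for time spent runnable but not running. */
	waiting_headroom = util + waiting_avg;

	return update_headroom > waiting_headroom ?
	       update_headroom : waiting_headroom;
}
```

Whichever delay source hurts most sets the operating point; a CPU with lots of runnable-waiting time gets boosted beyond the plain update-delay headroom.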
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 15/16] sched/fair: Enable disabling util_est via rampup_multiplier
Date: Tue, 20 Aug 2024 17:35:11 +0100
Message-Id: <20240820163512.1096301-16-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

util_est is a great feature that enables busy tasks with long sleep
times to maintain their perf level. But it can also be expensive in
terms of power for tasks that have no such perf requirement and just
happened to be busy in their last activation.

If a task sets its rampup_multiplier to 0, it indicates that it is happy
to glide along with the system's default response and doesn't require
extra responsiveness. We can use that to further imply that the task is
happy to let its util decay over long sleeps too, and disable util_est
for it.

XXX: This could be overloading this QoS. We could add a separate, more
explicit QoS to disable util_est for tasks that don't care.
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8dbba0b755e..ad72db5a266c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4918,6 +4918,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
	if (!sched_feat(UTIL_EST))
		return;
 
+	/*
+	 * rampup_multiplier = 0 indicates util_est is disabled.
+	 */
+	if (!p->sched_qos.rampup_multiplier) {
+		ewma = 0;
+		goto done;
+	}
+
	/* Get current estimate of utilization */
	ewma = READ_ONCE(p->se.avg.util_est);

From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 16/16] sched/fair: Don't mess with util_avg post init
Date: Tue, 20 Aug 2024 17:35:12 +0100
Message-Id: <20240820163512.1096301-17-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The extrapolation logic for util_avg of newly forked tasks tries to
crystal ball the task's demand. This worked well when the system had no
other means to help these tasks. But now we have util_est, which will
ramp up faster, and uclamp_min to ensure a good starting point if they
really care.
Since we really can't crystal ball the behavior, giving the same
starting value to all forked tasks is more consistent, and it helps
preserve system resources for the tasks that will compete to get them if
they truly care. So set the initial util_avg to 0 when the util_est
feature is enabled.

This should not impact workloads that need best single threaded
performance (like geekbench), given the previous improvements introduced
to help with faster rampup to reach the max perf point more coherently
and consistently across systems.

Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad72db5a266c..45be77d1112f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1031,6 +1031,19 @@ void init_entity_runnable_average(struct sched_entity *se)
 }
 
 /*
+ * When util_est is used, tasks can ramp up much faster by default. With
+ * the rampup_multiplier, tasks can ask for faster rampup after fork. And with
+ * uclamp, they can ensure a min perf requirement. Given all these factors, we
+ * keep util_avg at 0 as we can't crystal ball the task demand after fork.
+ * Userspace has enough ways to ensure good perf for tasks after fork. Keeping
+ * util_avg at 0 is a good way to ensure a uniform start for all tasks. And
+ * it is good to preserve precious resources. Truly busy forked tasks can
+ * compete for the resources without the need for an initial 'cheat' to ramp
+ * them up automagically.
+ *
+ * When util_est is not present, the extrapolation logic below will still
+ * apply.
+ *
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
  *
@@ -1080,6 +1093,12 @@ void post_init_entity_util_avg(struct task_struct *p)
		return;
	}
 
+	/*
+	 * Tasks can ramp up faster with util_est, so don't mess with util_avg.
+	 */
+	if (sched_feat(UTIL_EST))
+		return;
+
	if (cap > 0) {
		if (cfs_rq->avg.util_avg != 0) {
			sa->util_avg  = cfs_rq->avg.util_avg * se_weight(se);
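How the rampup_multiplier QoS from patch 15 shapes util_est can be sketched with a small user-space model. This is a simplification, not kernel code: `toy_task` and `toy_util_est_update` are hypothetical names, and the 1/4 EWMA weight mirrors util_est's weighting only approximately.

```c
#include <assert.h>

/*
 * Toy model: rampup_multiplier == 0 pins the util_est ewma at 0, so the
 * task's util is governed purely by the naturally decaying util_avg,
 * which patch 16 also starts at 0 after fork.
 */
struct toy_task {
	unsigned int rampup_multiplier;	/* 0 => util_est disabled */
	long util_est_ewma;
};

static void toy_util_est_update(struct toy_task *p, long dequeued)
{
	if (!p->rampup_multiplier) {
		/* Mirrors the patch's "ewma = 0; goto done;" path. */
		p->util_est_ewma = 0;
		return;
	}
	/* Move the estimate 1/4 of the way toward the dequeued util. */
	p->util_est_ewma += (dequeued - p->util_est_ewma) / 4;
}
```

A task that opts out thus loses its remembered estimate immediately, while a normal task converges toward its recent utilization at each dequeue.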