From patchwork Tue Aug 20 16:34:57 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820846
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 01/16] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom
Date: Tue, 20 Aug 2024 17:34:57 +0100
Message-Id: <20240820163512.1096301-2-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

We provide headroom for the utilization to grow until the next decision
point at which the next frequency is picked. Give the function a better
name and some documentation; it is not really mapping anything.

Also move it to cpufreq_schedutil.c. The function relies on the util
signal being updated appropriately to provide headroom to grow into,
which ties it to schedutil and the scheduler; it is not something that
can be shared with other governors.

Acked-by: Viresh Kumar
Acked-by: Rafael J. Wysocki
Reviewed-by: Vincent Guittot
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 include/linux/sched/cpufreq.h    |  5 -----
 kernel/sched/cpufreq_schedutil.c | 20 +++++++++++++++++++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index bdd31ab93bc5..d01755d3142f 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -28,11 +28,6 @@ static inline unsigned long map_util_freq(unsigned long util,
 {
 	return freq * util / cap;
 }
-
-static inline unsigned long map_util_perf(unsigned long util)
-{
-	return util + (util >> 2);
-}
 #endif /* CONFIG_CPU_FREQ */
 
 #endif /* _LINUX_SCHED_CPUFREQ_H */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index eece6244f9d2..575df3599813 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,12 +178,30 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
+/*
+ * DVFS decisions are made at discrete points. If the CPU stays busy, util
+ * will continue to grow, which means the CPU could need to run at a higher
+ * frequency before the next decision point is reached. IOW, we can't follow
+ * util as it grows immediately; there's a delay before we issue a request to
+ * go to a higher frequency. The headroom caters for this delay so that the
+ * system continues to run at an adequate performance point.
+ *
+ * This function provides enough headroom to sustain adequate performance
+ * assuming the CPU continues to be busy.
+ *
+ * At the moment it is a constant multiplication by 1.25.
+ */
+static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util)
+{
+	return util + (util >> 2);
+}
+
 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long min,
 				       unsigned long max)
 {
 	/* Add dvfs headroom to actual utilization */
-	actual = map_util_perf(actual);
+	actual = sugov_apply_dvfs_headroom(actual);
 
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;

From patchwork Tue Aug 20 16:34:58 2024
X-Patchwork-Id: 821200
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 02/16] sched/pelt: Add a new function to approximate the future util_avg value
Date: Tue, 20 Aug 2024 17:34:58 +0100
Message-Id: <20240820163512.1096301-3-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Given a util_avg value, the new function returns the future value after a
given runtime delta has elapsed. This will be useful in later patches to
help replace some magic margins with more deterministic behavior.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/pelt.c  | 22 +++++++++++++++++++++-
 kernel/sched/sched.h |  1 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index fa52906a4478..2ce83e880bd5 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -466,4 +466,24 @@ int update_irq_load_avg(struct rq *rq, u64 running)
 
 	return ret;
 }
-#endif
+#endif /* CONFIG_HAVE_SCHED_AVG_IRQ */
+
+/*
+ * Approximate the new util_avg value assuming an entity has continued to run
+ * for @delta us.
+ */
+unsigned long approximate_util_avg(unsigned long util, u64 delta)
+{
+	struct sched_avg sa = {
+		.util_sum = util * PELT_MIN_DIVIDER,
+		.util_avg = util,
+	};
+
+	if (unlikely(!delta))
+		return util;
+
+	accumulate_sum(delta, &sa, 1, 0, 1);
+	___update_load_avg(&sa, 0);
+
+	return sa.util_avg;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4c36cc680361..294c6769e330 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3064,6 +3064,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long min,
 				       unsigned long max);
 
+unsigned long approximate_util_avg(unsigned long util, u64 delta);
 
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the

From patchwork Tue Aug 20 16:34:59 2024
X-Patchwork-Id: 821199
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 03/16] sched/pelt: Add a new function to approximate runtime to reach given util
Date: Tue, 20 Aug 2024 17:34:59 +0100
Message-Id: <20240820163512.1096301-4-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

This is basically the ramp-up time from 0 to a given value. It will be
used later to implement a new tunable to control the response time of
schedutil.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/pelt.c  | 21 +++++++++++++++++++++
 kernel/sched/sched.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 2ce83e880bd5..06cb881ba582 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -487,3 +487,24 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 
 	return sa.util_avg;
 }
+
+/*
+ * Approximate the amount of runtime, in ms, required to reach @util.
+ */
+u64 approximate_runtime(unsigned long util)
+{
+	struct sched_avg sa = {};
+	u64 delta = 1024; /* one period = 1024us = ~1ms */
+	u64 runtime = 0;
+
+	if (unlikely(!util))
+		return runtime;
+
+	while (sa.util_avg < util) {
+		accumulate_sum(delta, &sa, 1, 0, 1);
+		___update_load_avg(&sa, 0);
+		runtime++;
+	}
+
+	return runtime;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 294c6769e330..47f158b2cdc2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3065,6 +3065,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				       unsigned long max);
 
 unsigned long approximate_util_avg(unsigned long util, u64 delta);
+u64 approximate_runtime(unsigned long util);
 
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the

From patchwork Tue Aug 20 16:35:00 2024
X-Patchwork-Id: 820844
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 04/16] sched/fair: Remove magic hardcoded margin in fits_capacity()
Date: Tue, 20 Aug 2024 17:35:00 +0100
Message-Id: <20240820163512.1096301-5-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Replace the hardcoded margin value in fits_capacity() with better dynamic
logic.
The 80% margin is a magic value that has served its purpose so far, but it
no longer fits the variety of systems that exist today. On an over-powered
system in particular, this 80% means we leave a lot of capacity unused
before we decide to upmigrate on an HMP system. On many systems the little
cores are under-powered, and the ability to migrate away from them faster
is desired.

Redefine misfit migration in terms of the utilization threshold at which
the task would become misfit at the next load balance event, assuming it
becomes an always-running task. To calculate this threshold, we use the new
approximate_util_avg() function based on arch_scale_cpu_capacity(): the
task will be misfit if it continues to run for TICK_USEC, which is our
worst-case delay before misfit migration kicks in.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/core.c  |  1 +
 kernel/sched/fair.c  | 40 ++++++++++++++++++++++++++++++++--------
 kernel/sched/sched.h |  1 +
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6d35c48239be..402ee4947ef0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8266,6 +8266,7 @@ void __init sched_init(void)
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = SCHED_CAPACITY_SCALE;
+		rq->fits_capacity_threshold = SCHED_CAPACITY_SCALE;
 		rq->balance_callback = &balance_push_callback;
 		rq->active_balance = 0;
 		rq->next_balance = jiffies;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584ec06d..e5e986af18dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -95,11 +95,15 @@ int __weak arch_asym_cpu_priority(int cpu)
 }
 
 /*
- * The margin used when comparing utilization with CPU capacity.
- *
- * (default: ~20%)
+ * fits_capacity() must ensure that a task will not be 'stuck' on a CPU with
+ * lower capacity for too long. The threshold is the util value at which, if
+ * a task becomes always busy, it could miss a misfit migration load balance
+ * event.
+ * So we consider the task misfit before it reaches this point.
+ */
-#define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)
+static inline bool fits_capacity(unsigned long util, int cpu)
+{
+	return util < cpu_rq(cpu)->fits_capacity_threshold;
+}
 
 /*
  * The margin used when comparing CPU capacities.
@@ -4978,14 +4982,13 @@ static inline int util_fits_cpu(unsigned long util,
 				unsigned long uclamp_max,
 				int cpu)
 {
-	unsigned long capacity = capacity_of(cpu);
 	unsigned long capacity_orig;
 	bool fits, uclamp_max_fits;
 
 	/*
 	 * Check if the real util fits without any uclamp boost/cap applied.
 	 */
-	fits = fits_capacity(util, capacity);
+	fits = fits_capacity(util, cpu);
 
 	if (!uclamp_is_used())
 		return fits;
@@ -9592,12 +9595,33 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
 	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
+	struct rq *rq = cpu_rq(cpu);
+	u64 limit;
 
 	if (!capacity)
 		capacity = 1;
 
-	cpu_rq(cpu)->cpu_capacity = capacity;
-	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+	rq->cpu_capacity = capacity;
+	trace_sched_cpu_capacity_tp(rq);
+
+	/*
+	 * Calculate the util at which a task must be considered a misfit.
+	 *
+	 * We must ensure that a task experiences the same ramp-up time to
+	 * reach the max performance point of the system regardless of the
+	 * CPU it is running on (due to invariance, time will stretch and the
+	 * task will take longer to achieve the same util value compared to
+	 * a task running on a big CPU), and that a delay in misfit
+	 * migration, which depends on TICK, doesn't end up hurting it, as it
+	 * can happen after we would have crossed this threshold.
+	 *
+	 * To ensure that invariance is taken into account, we don't scale
+	 * time and use it as-is; approximate_util_avg() will then give us
+	 * our threshold.
+	 */
+	limit = approximate_runtime(arch_scale_cpu_capacity(cpu)) * USEC_PER_MSEC;
+	limit -= TICK_USEC; /* sd->balance_interval is more accurate */
+	rq->fits_capacity_threshold = approximate_util_avg(0, limit);
 
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47f158b2cdc2..ab4672675b84 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1093,6 +1093,7 @@ struct rq {
 	struct sched_domain __rcu	*sd;
 
 	unsigned long		cpu_capacity;
+	unsigned long		fits_capacity_threshold;
 
 	struct balance_callback *balance_callback;

From patchwork Tue Aug 20 16:35:01 2024
X-Patchwork-Id: 821198
From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 05/16] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom()
Date: Tue, 20 Aug 2024 17:35:01 +0100
Message-Id: <20240820163512.1096301-6-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Replace the 1.25 headroom in sugov_apply_dvfs_headroom() with better
dynamic logic.
Instead of the magic 1.25 headroom, use the new approximate_util_avg() to
provide headroom based on dvfs_update_delay, the period at which the cpufreq
governor will send DVFS updates to the hardware, or min(curr.se.slice,
TICK_USEC), the maximum delay for the util signal to change and prompt
a cpufreq update; whichever is higher.

Add a new percpu dvfs_update_delay that can be cheaply accessed whenever
sugov_apply_dvfs_headroom() is called. Cpufreq governors that rely on util to
drive their DVFS logic/algorithm are expected to populate these percpu
variables. schedutil is the only such governor at the moment.

The behavior of schedutil will change: some systems will experience faster
DVFS rampup (because of a higher TICK or rate_limit_us), others will
experience slower rampup. The impact on performance should not be visible
were it not for the black-hole effect of utilization invariance, a problem
that will be addressed in later patches. Later patches will also address how
to provide better control over how fast or slow the system should respond,
to allow userspace to select its power/perf/thermal trade-off.
Signed-off-by: Qais Yousef
---
 kernel/sched/core.c              |  1 +
 kernel/sched/cpufreq_schedutil.c | 36 ++++++++++++++++++++++++++------
 kernel/sched/sched.h             |  9 ++++++++
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 402ee4947ef0..7099e40cc8bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -118,6 +118,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);

 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);

 #ifdef CONFIG_SCHED_DEBUG
 /*
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 575df3599813..303b0ab227e7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -187,13 +187,28 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * to run at adequate performance point.
  *
  * This function provides enough headroom to provide adequate performance
- * assuming the CPU continues to be busy.
+ * assuming the CPU continues to be busy. This headroom is based on the
+ * dvfs_update_delay of the cpufreq governor or min(curr.se.slice, TICK_US),
+ * whichever is higher.
  *
- * At the moment it is a constant multiplication with 1.25.
+ * XXX: Should we provide headroom when the util is decaying?
  */
-static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util)
+static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
-	return util + (util >> 2);
+	struct rq *rq = cpu_rq(cpu);
+	u64 delay;
+
+	/*
+	 * What is the possible worst case scenario for updating util_avg, ctx
+	 * switch or TICK?
+	 */
+	if (rq->cfs.h_nr_running > 1)
+		delay = min(rq->curr->se.slice/1000, TICK_USEC);
+	else
+		delay = TICK_USEC;
+	delay = max(delay, per_cpu(dvfs_update_delay, cpu));
+
+	return approximate_util_avg(util, delay);
 }

 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
@@ -201,7 +216,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				 unsigned long max)
 {
 	/* Add dvfs headroom to actual utilization */
-	actual = sugov_apply_dvfs_headroom(actual);
+	actual = sugov_apply_dvfs_headroom(actual, cpu);
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;
@@ -579,15 +594,21 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count
 	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
 	struct sugov_policy *sg_policy;
 	unsigned int rate_limit_us;
+	int cpu;

 	if (kstrtouint(buf, 10, &rate_limit_us))
 		return -EINVAL;

 	tunables->rate_limit_us = rate_limit_us;

-	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook)
+	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+		sg_policy->freq_update_delay_ns = rate_limit_us * NSEC_PER_USEC;
+		for_each_cpu(cpu, sg_policy->policy->cpus)
+			per_cpu(dvfs_update_delay, cpu) = rate_limit_us;
+	}
+
 	return count;
 }
@@ -868,6 +889,9 @@ static int sugov_start(struct cpufreq_policy *policy)
 		memset(sg_cpu, 0, sizeof(*sg_cpu));
 		sg_cpu->cpu = cpu;
 		sg_cpu->sg_policy = sg_policy;
+
+		per_cpu(dvfs_update_delay, cpu) = sg_policy->tunables->rate_limit_us;
+
 		cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util, uu);
 	}
 	return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ab4672675b84..c2d9fba6ea7a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3068,6 +3068,15 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 unsigned long approximate_util_avg(unsigned long util, u64 delta);
 u64 approximate_runtime(unsigned long util);

+/*
+ * Any governor that relies on util signal to drive DVFS, must populate these
+ * percpu dvfs_update_delay variables.
+ *
+ * It should describe the rate/delay at which the governor sends DVFS freq
+ * update to the hardware in us.
+ */
+DECLARE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);
+
 /*
  * Verify the fitness of task @p to run on @cpu taking into account the
  * CPU original capacity and the runtime/deadline ratio of the task.

From patchwork Tue Aug 20 16:35:02 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820843
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate response time
Date: Tue, 20 Aug 2024 17:35:02 +0100
Message-Id: <20240820163512.1096301-7-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The new tunable, response_time_ms, allows us to speed up or slow down the
response time of the policy to meet the perf, power and thermal
characteristics desired by the user/sysadmin. There's no single universal
trade-off that we can apply for all systems, even if they use the same SoC.
The form factor of the system, the dominant use case, and, in the case of
battery-powered systems, the size of the battery and the presence or absence
of active cooling can all play a big role in what would be best to use.
The new tunable provides sensible defaults, but still gives the user/sysadmin
the power to control the response time if they wish to. This tunable is
applied before we apply the DVFS headroom.

The default behavior of applying 1.25 headroom can easily be re-instated now.
But we continue to keep the minimum required headroom to overcome the
hardware's limitation in how fast it can change DVFS. Any additional headroom
to speed things up must be applied by userspace to match their expectation
for best perf/watt, as it dictates a type of policy that will be better for
some systems but worse for others.

There's a whitespace clean up included in sugov_start().

Signed-off-by: Qais Yousef
---
 Documentation/admin-guide/pm/cpufreq.rst |  17 +++-
 drivers/cpufreq/cpufreq.c                |   4 +-
 include/linux/cpufreq.h                  |   3 +
 kernel/sched/cpufreq_schedutil.c         | 115 ++++++++++++++++++++++-
 4 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 6adb7988e0eb..fa0d602a920e 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
 to go up to the allowed maximum immediately and then draw back to the value
 returned by the above formula over time.

-This governor exposes only one tunable:
+This governor exposes two tunables:

 ``rate_limit_us``
 	Minimum time (in microseconds) that has to pass between two consecutive
@@ -427,6 +427,21 @@ This governor exposes only one tunable:
 	The purpose of this tunable is to reduce the scheduler context overhead
 	of the governor which might be excessive without it.

+``response_time_ms``
+	Amount of time (in milliseconds) required to ramp the policy from
+	lowest to highest frequency. Can be decreased to speed up the
+	responsiveness of the system, or increased to slow the system down in
+	the hope of saving power. The best perf/watt will depend on the system
+	characteristics and the dominant workload you expect to run. For
+	userspace that has smart context on the type of workload running (like
+	in Android), one can tune this to suit the demand of that workload.
+
+	Note that when slowing the response down, you can end up effectively
+	chopping off the top frequencies for that policy as the util is capped
+	to 1024. On HMP systems this chopping effect will only occur on the
+	biggest core whose capacity is 1024. Don't rely on this behavior as
+	this is a limitation that can hopefully be improved in the future.
+
 This governor generally is regarded as a replacement for the older
 `ondemand`_ and `conservative`_ governors (described below), as it is simpler
 and more tightly integrated with the CPU scheduler, its overhead in terms of
 CPU context
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a45aac17c20f..5dc44c3694fe 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -533,8 +533,8 @@ void cpufreq_disable_fast_switch(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);

-static unsigned int __resolve_freq(struct cpufreq_policy *policy,
-				   unsigned int target_freq, unsigned int relation)
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+			    unsigned int target_freq, unsigned int relation)
 {
 	unsigned int idx;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 20f7e98ee8af..c14ffdcd8933 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -622,6 +622,9 @@ int cpufreq_driver_target(struct cpufreq_policy *policy,
 int __cpufreq_driver_target(struct cpufreq_policy *policy,
 			    unsigned int target_freq,
 			    unsigned int relation);
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+			    unsigned int target_freq,
+			    unsigned int relation);
 unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
 					 unsigned int target_freq);
 unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 303b0ab227e7..94e35b7c972d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -8,9 +8,12 @@

 #define IOWAIT_BOOST_MIN	(SCHED_CAPACITY_SCALE / 8)

+DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
+
 struct sugov_tunables {
 	struct gov_attr_set	attr_set;
 	unsigned int		rate_limit_us;
+	unsigned int		response_time_ms;
 };

 struct sugov_policy {
@@ -22,6 +25,7 @@ struct sugov_policy {
 	raw_spinlock_t		update_lock;
 	u64			last_freq_update_time;
 	s64			freq_update_delay_ns;
+	unsigned int		freq_response_time_ms;
 	unsigned int		next_freq;
 	unsigned int		cached_raw_freq;

@@ -59,6 +63,70 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

 /************************ Governor internals ***********************/

+static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
+{
+	int cpu = cpumask_first(sg_policy->policy->cpus);
+	unsigned long cap = arch_scale_cpu_capacity(cpu);
+	unsigned int max_freq, sec_max_freq;
+
+	max_freq = sg_policy->policy->cpuinfo.max_freq;
+	sec_max_freq = __resolve_freq(sg_policy->policy,
+				      max_freq - 1,
+				      CPUFREQ_RELATION_H);
+
+	/*
+	 * We will request max_freq as soon as util crosses the capacity at
+	 * second highest frequency. So effectively our response time is the
+	 * util at which we cross the cap@2nd_highest_freq.
+	 */
+	cap = sec_max_freq * cap / max_freq;
+
+	return approximate_runtime(cap + 1);
+}
+
+static inline void sugov_update_response_time_mult(struct sugov_policy *sg_policy)
+{
+	unsigned long mult;
+	int cpu;
+
+	if (unlikely(!sg_policy->freq_response_time_ms))
+		sg_policy->freq_response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+
+	mult = sg_policy->freq_response_time_ms * SCHED_CAPACITY_SCALE;
+	mult /= sg_policy->tunables->response_time_ms;
+
+	if (SCHED_WARN_ON(!mult))
+		mult = SCHED_CAPACITY_SCALE;
+
+	for_each_cpu(cpu, sg_policy->policy->cpus)
+		per_cpu(response_time_mult, cpu) = mult;
+}
+
+/*
+ * Shrink or expand how long it takes to reach the maximum performance of the
+ * policy.
+ *
+ * sg_policy->freq_response_time_ms is a constant value defined by PELT
+ * HALFLIFE and the capacity of the policy (assuming HMP systems).
+ *
+ * sg_policy->tunables->response_time_ms is a user defined response time. By
+ * setting it lower than sg_policy->freq_response_time_ms, the system will
+ * respond faster to changes in util, which will result in reaching maximum
+ * performance point quicker. By setting it higher, it'll slow down the amount
+ * of time required to reach the maximum OPP.
+ *
+ * This should be applied when selecting the frequency.
+ */
+static inline unsigned long
+sugov_apply_response_time(unsigned long util, int cpu)
+{
+	unsigned long mult;
+
+	mult = per_cpu(response_time_mult, cpu) * util;
+
+	return mult >> SCHED_CAPACITY_SHIFT;
+}
+
 static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 {
 	s64 delta_ns;
@@ -215,7 +283,10 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
 				 unsigned long min,
 				 unsigned long max)
 {
-	/* Add dvfs headroom to actual utilization */
+	/*
+	 * Speed up/slow down response time first then apply DVFS headroom.
+	 */
+	actual = sugov_apply_response_time(actual, cpu);
 	actual = sugov_apply_dvfs_headroom(actual, cpu);
 	/* Actually we don't need to target the max performance */
 	if (actual < max)
 		max = actual;
@@ -614,8 +685,42 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count

 static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);

+static ssize_t response_time_ms_show(struct gov_attr_set *attr_set, char *buf)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+
+	return sprintf(buf, "%u\n", tunables->response_time_ms);
+}
+
+static ssize_t
+response_time_ms_store(struct gov_attr_set *attr_set, const char *buf, size_t count)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+	struct sugov_policy *sg_policy;
+	unsigned int response_time_ms;
+
+	if (kstrtouint(buf, 10, &response_time_ms))
+		return -EINVAL;
+
+	/* XXX need special handling for high values? */
+
+	tunables->response_time_ms = response_time_ms;
+
+	list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+		if (sg_policy->tunables == tunables) {
+			sugov_update_response_time_mult(sg_policy);
+			break;
+		}
+	}
+
+	return count;
+}
+
+static struct governor_attr response_time_ms = __ATTR_RW(response_time_ms);
+
 static struct attribute *sugov_attrs[] = {
 	&rate_limit_us.attr,
+	&response_time_ms.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(sugov);
@@ -803,11 +908,13 @@ static int sugov_init(struct cpufreq_policy *policy)
 		goto stop_kthread;
 	}

-	tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
-
 	policy->governor_data = sg_policy;
 	sg_policy->tunables = tunables;

+	tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
+	tunables->response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+	sugov_update_response_time_mult(sg_policy);
+
 	ret = kobject_init_and_add(&tunables->attr_set.kobj, &sugov_tunables_ktype,
 				   get_governor_parent_kobj(policy), "%s",
 				   schedutil_gov.name);
@@ -867,7 +974,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 	void (*uu)(struct update_util_data *data, u64 time, unsigned int flags);
 	unsigned int cpu;

-	sg_policy->freq_update_delay_ns	= sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
+	sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
 	sg_policy->last_freq_update_time	= 0;
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;

From patchwork Tue Aug 20 16:35:03 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 821197
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 07/16] sched/pelt: Introduce PELT multiplier boot time parameter
Date: Tue, 20 Aug 2024 17:35:03 +0100
Message-Id: <20240820163512.1096301-8-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The param is set as read-only and can only be changed at boot time via:

	kernel.sched_pelt_multiplier=[1, 2, 4]

PELT has a big impact on the overall system response and reactiveness to
change. A smaller PELT half-life means it'll require less time to reach the
maximum performance point of the system when the system becomes fully busy,
and an equally shorter time to go back to the lowest performance point when
the system goes back to idle. This faster reaction impacts both DVFS response
and migration time between clusters in HMP systems.
Smaller PELT half-lives (higher multiplier) are expected to give better
performance at the cost of more power. Under-powered systems can particularly
benefit from a faster response time. Powerful systems can still benefit from
a faster response time if they want to be tuned more towards perf, and power
is not their major concern.

This, combined with response_time_ms from schedutil, should give the user and
sysadmin a deterministic way to control the triangle of power, perf and
thermals for their system. The default response_time_ms will halve as the
PELT half-life halves.

Update approximate_{util_avg, runtime}() to take into account the PELT
HALFLIFE multiplier.

Signed-off-by: Vincent Guittot
[qyousef: Commit message and boot param]
Signed-off-by: Qais Yousef
---
 kernel/sched/pelt.c | 62 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 06cb881ba582..536575757420 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -24,6 +24,9 @@
  * Author: Vincent Guittot
  */

+static __read_mostly unsigned int sched_pelt_lshift;
+static unsigned int sched_pelt_multiplier = 1;
+
 /*
  * Approximate:
  *   val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
@@ -180,6 +183,7 @@ static __always_inline int
 ___update_load_sum(u64 now, struct sched_avg *sa,
 		  unsigned long load, unsigned long runnable, int running)
 {
+	int time_shift;
 	u64 delta;

 	delta = now - sa->last_update_time;
@@ -195,12 +199,17 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
 	/*
 	 * Use 1024ns as the unit of measurement since it's a reasonable
 	 * approximation of 1us and fast to compute.
+	 * On top of this, we can change the half-time period from the default
+	 * 32ms to a shorter value. This is equivalent to left shifting the
+	 * time.
+	 * Merge both right and left shifts in one single right shift.
 	 */
-	delta >>= 10;
+	time_shift = 10 - sched_pelt_lshift;
+	delta >>= time_shift;
 	if (!delta)
 		return 0;

-	sa->last_update_time += delta << 10;
+	sa->last_update_time += delta << time_shift;

 	/*
 	 * running is a subset of runnable (weight) so running can't be set if
@@ -468,6 +477,51 @@ int update_irq_load_avg(struct rq *rq, u64 running)
 }
 #endif /* CONFIG_HAVE_SCHED_AVG_IRQ */

+static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+
+	ret = param_set_int(val, kp);
+	if (ret)
+		goto error;
+
+	switch (sched_pelt_multiplier) {
+	case 1:
+		fallthrough;
+	case 2:
+		fallthrough;
+	case 4:
+		WRITE_ONCE(sched_pelt_lshift,
+			   sched_pelt_multiplier >> 1);
+		break;
+	default:
+		ret = -EINVAL;
+		goto error;
+	}
+
+	return 0;
+
+error:
+	sched_pelt_multiplier = 1;
+	return ret;
+}
+
+static const struct kernel_param_ops sched_pelt_multiplier_ops = {
+	.set = set_sched_pelt_multiplier,
+	.get = param_get_int,
+};
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+/* XXX: should we use sched as prefix? */
+#define MODULE_PARAM_PREFIX "kernel."
+module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
+MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
+MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALFLIFE - roughly 200ms to go from 0 to max performance point (default).");
+MODULE_PARM_DESC(sched_pelt_multiplier, "                2 16ms PELT HALFLIFE - roughly 100ms to go from 0 to max performance point.");
+MODULE_PARM_DESC(sched_pelt_multiplier, "                4  8ms PELT HALFLIFE - roughly  50ms to go from 0 to max performance point.");
+
 /*
  * Approximate the new util_avg value assuming an entity has continued to run
  * for @delta us.
@@ -482,7 +536,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 	if (unlikely(!delta))
 		return util;

-	accumulate_sum(delta, &sa, 1, 0, 1);
+	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
 	___update_load_avg(&sa, 0);

 	return sa.util_avg;
@@ -494,7 +548,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
 u64 approximate_runtime(unsigned long util)
 {
 	struct sched_avg sa = {};
-	u64 delta = 1024; // period = 1024 = ~1ms
+	u64 delta = 1024 << sched_pelt_lshift; // period = 1024 = ~1ms
 	u64 runtime = 0;

 	if (unlikely(!util))

From patchwork Tue Aug 20 16:35:04 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820842
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 08/16] sched/fair: Extend util_est to improve rampup time
Date: Tue, 20 Aug 2024 17:35:04 +0100
Message-Id: <20240820163512.1096301-9-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

Utilization invariance can cause big delays. When tasks are running,
accumulate the non-invariant version of utilization to help tasks settle
down to their new util_avg values faster.

Keep track of delta_exec while runnable across activations to help update
util_est for a long running task accurately.

util_est should still behave the same at enqueue/dequeue.
Before this patch, a busy task ramping up would experience the following
transitions, running on an M1 Mac Mini:

[ASCII plot: rampup-6338 util_avg running — climbs from 0 to ~986 across the 1.700s-2.000s window]
[Histogram: rampup-6338 util_avg running residency (ms) — ~8 ms spent per util level below ~100, tapering to 1-3 ms per level above ~250]
[Bar chart: sum time running on CPU (ms) — CPU0: 90.39, CPU4: 1156.93]
[ASCII plot: 6338 rampup CPU0.0 Frequency — steps up from 0.60 to 2.06]
[ASCII plot: 6338 rampup CPU4.0 Frequency — steps up from 1.50 to 3.20]
[Histogram: 6338 rampup CPU0.0 Frequency residency (ms) — 37.3 ms at the lowest 0.6 step]
[Histogram: 6338 rampup CPU4.0 Frequency residency (ms) — 85.3 ms at the top 3.204 step]

After the patch, the response is improved: the task ramps up to higher
frequencies faster and migrates off the little CPU quicker:

[ASCII plot: rampup-2234 util_avg running — climbs from 0 to ~984 with visibly less time at low util]
[Histogram: rampup-2234 util_avg running residency (ms) — at most 2-3 ms per util level past the initial ramp]
[ASCII plot: 2234 rampup CPU1.0 Frequency — steps up from 0.60 to 2.06]
[ASCII plot: 2234 rampup CPU4.0 Frequency — steps up from 1.50 to 3.10]
[Bar chart: sum time running on CPU (ms) — CPU1: 32.53, CPU4: 540.3]
[Histogram: 2234 rampup CPU1.0 Frequency residency (ms) — 12.1 ms at the lowest 0.6 step]
[Histogram: 2234 rampup CPU4.0 Frequency residency (ms) — most time at the top steps: 47.0 ms at 2.988 and 53.4 ms at 3.096]

Signed-off-by: Qais Yousef
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 43 +++++++++++++++++++++++++++++++------------
 3 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 90691d99027e..8db8f4085d84 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,7 @@ struct sched_entity {
 	unsigned int			on_rq;
 
 	u64				exec_start;
+	u64				delta_exec;
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
 	u64				vruntime;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7099e40cc8bd..e2b4b87ec2b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4318,6 +4318,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.on_rq			= 0;
 	p->se.exec_start		= 0;
+	p->se.delta_exec		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e5e986af18dc..a6421e4032c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1118,6 +1118,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 	curr->exec_start = now;
 	curr->sum_exec_runtime += delta_exec;
+	curr->delta_exec = delta_exec;
 
 	if (schedstat_enabled()) {
 		struct sched_statistics *stats;
@@ -1126,7 +1127,6 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 		__schedstat_set(stats->exec_max,
 				max(delta_exec, stats->exec_max));
 	}
-
 	return delta_exec;
 }
 
@@ -4890,16 +4890,20 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	if (!sched_feat(UTIL_EST))
 		return;
 
-	/*
-	 * Skip update of task's estimated utilization when the task has not
-	 * yet completed an activation, e.g. being migrated.
-	 */
-	if (!task_sleep)
-		return;
-
 	/* Get current estimate of utilization */
 	ewma = READ_ONCE(p->se.avg.util_est);
 
+	/*
+	 * If a task is running, update util_est ignoring utilization
+	 * invariance so that if the task suddenly becomes busy we will rampup
+	 * quickly to settle down to our new util_avg.
+	 */
+	if (!task_sleep) {
+		ewma &= ~UTIL_AVG_UNCHANGED;
+		ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
+		goto done;
+	}
+
 	/*
 	 * If the PELT values haven't changed since enqueue time,
 	 * skip the util_est update.
@@ -4968,6 +4972,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+static inline void util_est_update_running(struct cfs_rq *cfs_rq,
+					   struct task_struct *p)
+{
+	util_est_dequeue(cfs_rq, p);
+	util_est_update(cfs_rq, p, false);
+	util_est_enqueue(cfs_rq, p);
+}
+
 static inline unsigned long get_actual_cpu_capacity(int cpu)
 {
 	unsigned long capacity = arch_scale_cpu_capacity(cpu);
@@ -5164,13 +5176,13 @@ static inline int sched_balance_newidle(struct rq *rq, struct rq_flags *rf)
 
 static inline void
 util_est_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
 static inline void
 util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
 static inline void
-util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p,
-		bool task_sleep) {}
+util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep) {}
+static inline void
+util_est_update_running(struct cfs_rq *cfs_rq, struct task_struct *p) {}
+
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
 
 #endif /* CONFIG_SMP */
@@ -6906,6 +6918,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		rq->next_balance = jiffies;
 dequeue_throttle:
+	if (task_sleep)
+		p->se.delta_exec = 0;
 	util_est_update(&rq->cfs, p, task_sleep);
 	hrtick_update(rq);
 }
@@ -8546,6 +8560,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		set_next_entity(cfs_rq, se);
 	}
 
+	if (prev->on_rq)
+		util_est_update_running(&rq->cfs, prev);
+
 	goto done;
 simple:
 #endif
@@ -12710,6 +12727,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		entity_tick(cfs_rq, se, queued);
 	}
 
+	util_est_update_running(&rq->cfs, curr);
+
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);

From patchwork Tue Aug 20 16:35:05 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 821196
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 09/16] sched/fair: util_est: Take into account periodic tasks
Date: Tue, 20 Aug 2024 17:35:05 +0100
Message-Id: <20240820163512.1096301-10-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The new faster rampup is great for performance, but terrible for power. We
want the faster rampup to apply only to tasks that are transitioning from
one periodic/steady state to another periodic/steady state.
But if they are stably periodic, then the faster rampup doesn't make sense:
util_avg describes their computational demand accurately, and we can rely on
it to make accurate decisions while preserving the power savings that come
from being exact with the resources we give to the task (i.e. smaller DVFS
headroom).

We detect periodic tasks based on util_avg across util_est_update() calls.
If it is rising, the task is going through a transition. We rely on util_avg
being stable for periodic tasks, with very little variation around one
stable point.

Signed-off-by: Qais Yousef
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   |  2 ++
 kernel/sched/fair.c   | 17 ++++++++++++++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db8f4085d84..2e8c5a9ffa76 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -829,6 +829,8 @@ struct task_struct {
 	struct uclamp_se		uclamp[UCLAMP_CNT];
 #endif
 
+	unsigned long			util_avg_dequeued;
+
 	struct sched_statistics		stats;
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e2b4b87ec2b7..c91e6a62c7ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4331,6 +4331,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.cfs_rq			= NULL;
 #endif
 
+	p->util_avg_dequeued		= 0;
+
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
 	memset(&p->stats, 0, sizeof(p->stats));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a6421e4032c0..0c10e2afb52d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4832,6 +4832,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return READ_ONCE(p->se.avg.util_avg);
 }
 
+static inline unsigned long task_util_dequeued(struct task_struct *p)
+{
+	return READ_ONCE(p->util_avg_dequeued);
+}
+
 static inline unsigned long task_runnable(struct task_struct *p)
 {
 	return READ_ONCE(p->se.avg.runnable_avg);
@@ -4899,9 +4904,12 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	 * quickly to settle down to our new util_avg.
 	 */
 	if (!task_sleep) {
-		ewma &= ~UTIL_AVG_UNCHANGED;
-		ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
-		goto done;
+		if (task_util(p) > task_util_dequeued(p)) {
+			ewma &= ~UTIL_AVG_UNCHANGED;
+			ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
+			goto done;
+		}
+		return;
 	}
 
 	/*
@@ -4914,6 +4922,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	/* Get utilization at dequeue */
 	dequeued = task_util(p);
 
+	if (!task_on_rq_migrating(p))
+		p->util_avg_dequeued = dequeued;
+
 	/*
 	 * Reset EWMA on utilization increases, the moving average is used only
 	 * to smooth utilization decreases.

From patchwork Tue Aug 20 16:35:06 2024
X-Patchwork-Submitter: Qais Yousef
X-Patchwork-Id: 820841
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 10/16] sched/qos: Add a new sched-qos interface
Date: Tue, 20 Aug 2024 17:35:06 +0100
Message-Id: <20240820163512.1096301-11-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The need to describe the conflicting demands of various workloads has never
been higher.
Both hardware and software have moved rapidly in the past decade, and system
usage is more diverse. The number of workloads expected to run on the same
machine, whether in the Mobile or Server markets, has created a big dilemma
about how to better manage those requirements.

The problem is that we lack mechanisms to allow these workloads to describe
what they need, and then allow the kernel to do its best to manage those
demands transparently, based on the hardware it is running on and the
current system state.

Examples of conflicting requirements that come up frequently:

1. Improve wake up latency for SCHED_OTHER. Many tasks end up using
   SCHED_FIFO/SCHED_RR to compensate for this shortcoming. RT tasks lack
   power management and fairness, and can be hard and error prone to use
   correctly and portably.

2. Prefer spreading vs prefer packing on wake up for a group of tasks.
   Geekbench-like workloads would benefit from parallelising on different
   CPUs. hackbench-like workloads can benefit from waking up on the same
   CPU, or a CPU that is closer in the cache hierarchy.

3. Nice values for SCHED_OTHER are system wide and require privileges. Many
   workloads would like a way to set a relative nice value so they can
   preempt each other, but neither impact nor be impacted by tasks belonging
   to different workloads on the system.

4. Provide a way to tag some tasks as 'background' to keep them out of the
   way. SCHED_IDLE is too strong for some of these tasks, yet they can be
   computationally heavy. Example tasks are garbage collectors. Their work
   is both important and not important.

5. Provide a way to improve DVFS/upmigration rampup time for specific tasks
   that are bursty in nature and highly interactive.

Whether any of these use cases warrants an additional QoS hint is something
to be discussed individually. But the main point is to introduce an
interface that is extendable to cater for these requirements and potentially
more.
rampup_multiplier, to improve DVFS/upmigration for bursty tasks, will be the
first user in a later patch.

It is desired to have apps (and benchmarks!) directly use this interface for
optimal perf/watt. But in the absence of such support, it should be possible
to write a userspace daemon to monitor workloads and apply these QoS hints
on apps' behalf, based on analysis done by anyone interested in improving
the performance of those workloads.

Signed-off-by: Qais Yousef
---
 Documentation/scheduler/index.rst                   |  1 +
 Documentation/scheduler/sched-qos.rst               | 44 ++++++++++++++++++
 include/uapi/linux/sched.h                          |  4 ++
 include/uapi/linux/sched/types.h                    | 46 +++++++++++++++++++
 kernel/sched/syscalls.c                             |  3 ++
 .../trace/beauty/include/uapi/linux/sched.h         |  4 ++
 6 files changed, 102 insertions(+)
 create mode 100644 Documentation/scheduler/sched-qos.rst

diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 43bd8a145b7a..f49b8b021d97 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -21,6 +21,7 @@ Scheduler
     sched-rt-group
     sched-stats
     sched-debug
+    sched-qos
 
     text_files
 
diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
new file mode 100644
index 000000000000..0911261cb124
--- /dev/null
+++ b/Documentation/scheduler/sched-qos.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Scheduler QoS
+=============
+
+1. Introduction
+===============
+
+Different workloads have different scheduling requirements to operate
+optimally. The same applies to tasks within the same workload.
+
+To enable smarter usage of system resources and to cater for the conflicting
+demands of various tasks, Scheduler QoS provides a mechanism to provide more
+information about those demands so that the scheduler can do best-effort to
+honour them.
+
+  @sched_qos_type	what QoS hint to apply
+  @sched_qos_value	value of the QoS hint
+  @sched_qos_cookie	magic cookie to tag a group of tasks for which the QoS
+			applies. If 0, the hint will apply globally system
+			wide. If not 0, the hint will be relative to tasks
+			that have the same cookie value only.
+
+QoS hints are set once and are not inherited by children by design. The
+rationale is that each task has its individual characteristics and it is
+encouraged to describe each of these separately. Also, since system
+resources are finite, there's a limit to what can be done to honour these
+requests before reaching a tipping point where there are too many requests
+for a particular QoS that it is impossible to service all of them at once,
+and some will start to lose out. For example, if 10 tasks require better
+wake up latencies on a 4 CPU SMP system, then if they all wake up at once,
+only 4 can perceive the hint as honoured and the rest will have to wait.
+Inheritance can easily lead these 10 to become 100 or 1000, and then the QoS
+hint will rapidly lose its meaning and effectiveness. The chances of 10
+tasks waking up at the same time are lower than for 100, and lower still
+than for 1000.
+
+To set multiple QoS hints, a syscall is required for each. This is a
+trade-off to reduce the churn of extending the interface, as the hope is for
+this to evolve as workloads and hardware get more sophisticated and the need
+for extension arises; when this happens, it should be simpler to add the
+kernel extension and allow userspace to use it readily by setting the newly
+added flag, without having to update the whole of sched_attr.
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 3bac0a8ceab2..67ef99f64ddc 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -102,6 +102,9 @@ struct clone_args { __aligned_u64 set_tid_size; __aligned_u64 cgroup; }; + +enum sched_qos_type { +}; #endif #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ @@ -132,6 +135,7 @@ struct clone_args { #define SCHED_FLAG_KEEP_PARAMS 0x10 #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20 #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40 +#define SCHED_FLAG_QOS 0x80 #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \ SCHED_FLAG_KEEP_PARAMS) diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h index 90662385689b..55e4b1e79ed2 100644 --- a/include/uapi/linux/sched/types.h +++ b/include/uapi/linux/sched/types.h @@ -94,6 +94,48 @@ * scheduled on a CPU with no more capacity than the specified value. * * A task utilization boundary can be reset by setting the attribute to -1. + * + * Scheduler QoS + * ============= + * + * Different workloads have different scheduling requirements to operate + * optimally. The same applies to tasks within the same workload. + * + * To enable smarter usage of system resources and to cater for the conflicting + * demands of various tasks, Scheduler QoS provides a mechanism to supply more + * information about those demands so that the scheduler can make a best effort + * to honour them. + * + * @sched_qos_type what QoS hint to apply + * @sched_qos_value value of the QoS hint + * @sched_qos_cookie magic cookie to tag a group of tasks for which the QoS + * applies. If 0, the hint applies globally, system + * wide. If not 0, the hint is relative only to tasks that + * have the same cookie value. + * + * QoS hints are set once and are not inherited by children by design. The + * rationale is that each task has its individual characteristics and it is + * encouraged to describe each of these separately.
Also, since system resources + * are finite, there is a limit to what can be done to honour these requests + * before reaching a tipping point where there are too many requests for + * a particular QoS to service all of them at once, and some will start to + * lose out. For example, if 10 tasks require better wake-up latency on + * a 4-CPU SMP system and they all wake up at once, only 4 can perceive the + * hint as honoured and the rest will have to wait. Inheritance can easily + * turn these 10 tasks into 100 or 1000, at which point the QoS hint rapidly + * loses its meaning and effectiveness. The chances of 10 tasks waking up + * at the same time are lower than those of 100, and lower still than those + * of 1000. + * + * To set multiple QoS hints, a syscall is required for each. This is a + * trade-off to reduce churn when extending the interface: the hope is for + * this interface to evolve as workloads and hardware get more sophisticated, + * and the need for extensions will arise. When that happens, it should be + * simpler to add the kernel extension and let userspace readily use it by + * setting the newly added flag, without having to update the whole of + * sched_attr. + * + * Details about the available QoS hints can be found in: + * Documentation/scheduler/sched-qos.rst */ struct sched_attr { __u32 size; @@ -116,6 +158,10 @@ struct sched_attr { __u32 sched_util_min; __u32 sched_util_max; + __u32 sched_qos_type; + __s64 sched_qos_value; + __u32 sched_qos_cookie; + }; #endif /* _UAPI_LINUX_SCHED_TYPES_H */ diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c index ae1b42775ef9..a7d4dfdfed43 100644 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -668,6 +668,9 @@ int __sched_setscheduler(struct task_struct *p, return retval; } + if (attr->sched_flags & SCHED_FLAG_QOS) + return -EOPNOTSUPP; + /* * SCHED_DEADLINE bandwidth accounting relies on stable cpusets * information.
diff --git a/tools/perf/trace/beauty/include/uapi/linux/sched.h b/tools/perf/trace/beauty/include/uapi/linux/sched.h index 3bac0a8ceab2..67ef99f64ddc 100644 --- a/tools/perf/trace/beauty/include/uapi/linux/sched.h +++ b/tools/perf/trace/beauty/include/uapi/linux/sched.h @@ -102,6 +102,9 @@ struct clone_args { __aligned_u64 set_tid_size; __aligned_u64 cgroup; }; + +enum sched_qos_type { +}; #endif #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ @@ -132,6 +135,7 @@ struct clone_args { #define SCHED_FLAG_KEEP_PARAMS 0x10 #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20 #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40 +#define SCHED_FLAG_QOS 0x80 #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \ SCHED_FLAG_KEEP_PARAMS) From patchwork Tue Aug 20 16:35:07 2024 X-Patchwork-Submitter: Qais Yousef X-Patchwork-Id: 821195
From: Qais Yousef To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , "Rafael J. Wysocki" , Viresh Kumar Cc: Juri Lelli , Steven Rostedt , Dietmar Eggemann , John Stultz , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef Subject: [RFC PATCH 11/16] sched/qos: Add rampup multiplier QoS Date: Tue, 20 Aug 2024 17:35:07 +0100 Message-Id: <20240820163512.1096301-12-qyousef@layalina.io> In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io> References: <20240820163512.1096301-1-qyousef@layalina.io> Bursty tasks are hard to predict. To use resources efficiently, the system would like to be as precise as possible. But this poses a challenge for bursty tasks that need to get access to more resources quickly. The new SCHED_QOS_RAMPUP_MULTIPLIER allows userspace to do that.
As the name implies, it only helps them transition to a higher performance state when they get _busier_. That is, perfectly periodic tasks are by definition not going through a transition and will run at a constant performance level. It is tasks that need to transition from one periodic state to another periodic state at a higher level that this rampup_multiplier will help with. It also slows down the ewma decay of util_est, which should help those bursty tasks keep their faster rampup. This should work complementarily with uclamp. uclamp tells the system about min and max perf requirements, which can be applied immediately. rampup_multiplier is about the task's reactiveness to change, specifically a change to a higher performance level. The task might not necessarily need a min perf requirement, but it can have sudden bursts of change that require a higher perf level, and it needs the system to provide this faster. TODO: update the sched_qos docs Signed-off-by: Qais Yousef --- include/linux/sched.h | 7 ++++ include/uapi/linux/sched.h | 2 ++ kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++++++ kernel/sched/fair.c | 6 ++-- kernel/sched/syscalls.c | 38 ++++++++++++++++++++-- 5 files changed, 115 insertions(+), 4 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 2e8c5a9ffa76..a30ee43a25fb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -404,6 +404,11 @@ struct sched_info { #endif /* CONFIG_SCHED_INFO */ }; +struct sched_qos { + DECLARE_BITMAP(user_defined, SCHED_QOS_MAX); + unsigned int rampup_multiplier; +}; + /* * Integer metrics need fixed point arithmetic, e.g., sched/fair * has a few: load, load_avg, util_avg, freq, and capacity.
@@ -882,6 +887,8 @@ struct task_struct { struct sched_info sched_info; + struct sched_qos sched_qos; + struct list_head tasks; #ifdef CONFIG_SMP struct plist_node pushable_tasks; diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 67ef99f64ddc..0baba91ba5b8 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -104,6 +104,8 @@ struct clone_args { }; enum sched_qos_type { + SCHED_QOS_RAMPUP_MULTIPLIER, + SCHED_QOS_MAX, }; #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c91e6a62c7ab..54faa845cb29 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -152,6 +152,8 @@ __read_mostly int sysctl_resched_latency_warn_once = 1; */ const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK; +unsigned int sysctl_sched_qos_default_rampup_multiplier = 1; + __read_mostly int scheduler_running; #ifdef CONFIG_SCHED_CORE @@ -4488,6 +4490,47 @@ static int sysctl_schedstats(struct ctl_table *table, int write, void *buffer, #endif /* CONFIG_SCHEDSTATS */ #ifdef CONFIG_SYSCTL +static void sched_qos_sync_sysctl(void) +{ + struct task_struct *g, *p; + + guard(rcu)(); + for_each_process_thread(g, p) { + struct rq_flags rf; + struct rq *rq; + + rq = task_rq_lock(p, &rf); + if (!test_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined)) + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier; + task_rq_unlock(rq, p, &rf); + } +} + +static int sysctl_sched_qos_handler(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + unsigned int old_rampup_mult; + int result; + + old_rampup_mult = sysctl_sched_qos_default_rampup_multiplier; + + result = proc_dointvec(table, write, buffer, lenp, ppos); + if (result) + goto undo; + if (!write) + return 0; + + if (old_rampup_mult != sysctl_sched_qos_default_rampup_multiplier) { + sched_qos_sync_sysctl(); + } + + return 0; + +undo: + sysctl_sched_qos_default_rampup_multiplier = old_rampup_mult; + return result; 
+} + static struct ctl_table sched_core_sysctls[] = { #ifdef CONFIG_SCHEDSTATS { @@ -4534,6 +4577,13 @@ static struct ctl_table sched_core_sysctls[] = { .extra2 = SYSCTL_FOUR, }, #endif /* CONFIG_NUMA_BALANCING */ + { + .procname = "sched_qos_default_rampup_multiplier", + .data = &sysctl_sched_qos_default_rampup_multiplier, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = sysctl_sched_qos_handler, + }, }; static int __init sched_core_sysctl_init(void) { @@ -4543,6 +4593,21 @@ static int __init sched_core_sysctl_init(void) late_initcall(sched_core_sysctl_init); #endif /* CONFIG_SYSCTL */ +static void sched_qos_fork(struct task_struct *p) +{ + /* + * We always force reset sched_qos on fork. These QoS hints are treated + * as finite resources to help improve quality of life. Inheriting them + * by default can easily lead to a situation where the QoS hints become + * meaningless because all tasks in the system have them. + * + * Every task must request the QoS explicitly if it needs it. No + * accidental inheritance is allowed, to keep the default behavior sane. + */ + bitmap_zero(p->sched_qos.user_defined, SCHED_QOS_MAX); + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier; +} + /* * fork()/clone()-time setup: */ @@ -4562,6 +4627,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) p->prio = current->normal_prio; uclamp_fork(p); + sched_qos_fork(p); /* * Revert to default priority/policy on fork if requested.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0c10e2afb52d..3d9794db58e1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4906,7 +4906,7 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, if (!task_sleep) { if (task_util(p) > task_util_dequeued(p)) { ewma &= ~UTIL_AVG_UNCHANGED; - ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000); + ewma = approximate_util_avg(ewma, (p->se.delta_exec / 1000) * p->sched_qos.rampup_multiplier); goto done; } return; @@ -4974,6 +4974,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT) */ ewma <<= UTIL_EST_WEIGHT_SHIFT; + if (p->sched_qos.rampup_multiplier) + last_ewma_diff /= p->sched_qos.rampup_multiplier; ewma -= last_ewma_diff; ewma >>= UTIL_EST_WEIGHT_SHIFT; done: @@ -9643,7 +9645,7 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu) * on TICK doesn't end up hurting it as it can happen after we would * have crossed this threshold. * - * To ensure that invaraince is taken into account, we don't scale time + * To ensure that invariance is taken into account, we don't scale time * and use it as-is, approximate_util_avg() will then let us know * our threshold.
*/ diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c index a7d4dfdfed43..dc7d7bcaae7b 100644 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -543,6 +543,35 @@ static void __setscheduler_uclamp(struct task_struct *p, const struct sched_attr *attr) { } #endif +static inline int sched_qos_validate(struct task_struct *p, + const struct sched_attr *attr) +{ + switch (attr->sched_qos_type) { + case SCHED_QOS_RAMPUP_MULTIPLIER: + if (attr->sched_qos_cookie) + return -EINVAL; + if (attr->sched_qos_value < 0) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return 0; +} + +static void __setscheduler_sched_qos(struct task_struct *p, + const struct sched_attr *attr) +{ + switch (attr->sched_qos_type) { + case SCHED_QOS_RAMPUP_MULTIPLIER: + set_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined); + p->sched_qos.rampup_multiplier = attr->sched_qos_value; + break; + default: + break; + } +} + /* * Allow unprivileged RT tasks to decrease priority. * Only issue a capable test if needed and only once to avoid an audit @@ -668,8 +697,11 @@ int __sched_setscheduler(struct task_struct *p, return retval; } - if (attr->sched_flags & SCHED_FLAG_QOS) - return -EOPNOTSUPP; + if (attr->sched_flags & SCHED_FLAG_QOS) { + retval = sched_qos_validate(p, attr); + if (retval) + return retval; + } /* * SCHED_DEADLINE bandwidth accounting relies on stable cpusets @@ -799,7 +831,9 @@ int __sched_setscheduler(struct task_struct *p, __setscheduler_params(p, attr); __setscheduler_prio(p, newprio); } + __setscheduler_uclamp(p, attr); + __setscheduler_sched_qos(p, attr); if (queued) { /* From patchwork Tue Aug 20 16:35:08 2024 X-Patchwork-Submitter: Qais Yousef X-Patchwork-Id: 820840
From: Qais Yousef To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , "Rafael J.
Wysocki" , Viresh Kumar Cc: Juri Lelli , Steven Rostedt , Dietmar Eggemann , John Stultz , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef Subject: [RFC PATCH 12/16] sched/pelt: Add new waiting_avg to record when runnable && !running Date: Tue, 20 Aug 2024 17:35:08 +0100 Message-Id: <20240820163512.1096301-13-qyousef@layalina.io> In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io> References: <20240820163512.1096301-1-qyousef@layalina.io> This info will be useful to understand how long tasks end up waiting behind other tasks. This info is recorded for tasks only, and added to/subtracted from the root cfs_rq on __update_load_avg_se(). It also helps to decouple util_avg, which indicates a task's computational demand, from the fact that the CPU might need to run faster to reduce the waiting time. This has been a point of confusion in the past while discussing uclamp and util_avg: not keeping the frequency high means tasks will take longer to run and cause delays. Isolating this source of delay into its own signal is a better way to take it into account when making decisions, independently of the task's/CPU's computational demand. It is not used yet, but will be used later to help drive DVFS headroom. It could become a helpful metric for managing waiting latencies in general, for example in load balance. TODO: waiting_avg should use rq_clock_task() as it doesn't care about invariance. Waiting time should reflect the actual wait in real time, as this is the measure of latency that users care about.
Signed-off-by: Qais Yousef --- include/linux/sched.h | 2 ++ kernel/sched/debug.c | 5 +++++ kernel/sched/fair.c | 32 +++++++++++++++++++++++++++++- kernel/sched/pelt.c | 45 ++++++++++++++++++++++++++++++------------- 4 files changed, 70 insertions(+), 14 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a30ee43a25fb..f332ce5e226f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -477,10 +477,12 @@ struct sched_avg { u64 last_update_time; u64 load_sum; u64 runnable_sum; + u64 waiting_sum; u32 util_sum; u32 period_contrib; unsigned long load_avg; unsigned long runnable_avg; + unsigned long waiting_avg; unsigned long util_avg; unsigned int util_est; } ____cacheline_aligned; diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index c1eb9a1afd13..5fa2662a4a50 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -528,6 +528,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group P(se->avg.load_avg); P(se->avg.util_avg); P(se->avg.runnable_avg); + P(se->avg.waiting_avg); #endif #undef PN_SCHEDSTAT @@ -683,6 +684,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->avg.load_avg); SEQ_printf(m, " .%-30s: %lu\n", "runnable_avg", cfs_rq->avg.runnable_avg); + SEQ_printf(m, " .%-30s: %lu\n", "waiting_avg", + cfs_rq->avg.waiting_avg); SEQ_printf(m, " .%-30s: %lu\n", "util_avg", cfs_rq->avg.util_avg); SEQ_printf(m, " .%-30s: %u\n", "util_est", @@ -1071,9 +1074,11 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, #ifdef CONFIG_SMP P(se.avg.load_sum); P(se.avg.runnable_sum); + P(se.avg.waiting_sum); P(se.avg.util_sum); P(se.avg.load_avg); P(se.avg.runnable_avg); + P(se.avg.waiting_avg); P(se.avg.util_avg); P(se.avg.last_update_time); PM(se.avg.util_est, ~UTIL_AVG_UNCHANGED); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3d9794db58e1..a8dbba0b755e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4726,6 
+4726,22 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s trace_pelt_cfs_tp(cfs_rq); } +static inline void add_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + unsigned long waiting_avg; + waiting_avg = READ_ONCE(cfs_rq->avg.waiting_avg); + waiting_avg += READ_ONCE(se->avg.waiting_avg); + WRITE_ONCE(cfs_rq->avg.waiting_avg, waiting_avg); +} + +static inline void sub_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + unsigned long waiting_avg; + waiting_avg = READ_ONCE(cfs_rq->avg.waiting_avg); + waiting_avg -= min(waiting_avg, READ_ONCE(se->avg.waiting_avg)); + WRITE_ONCE(cfs_rq->avg.waiting_avg, waiting_avg); +} + /* * Optional action to be done while updating the load average */ @@ -4744,8 +4760,15 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s * Track task load average for carrying it to new CPU after migrated, and * track group sched_entity load average for task_h_load calculation in migration */ - if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) + if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) { + bool update_rq_waiting_avg = entity_is_task(se) && se_runnable(se); + + if (update_rq_waiting_avg) + sub_waiting_avg(&rq_of(cfs_rq)->cfs, se); __update_load_avg_se(now, cfs_rq, se); + if (update_rq_waiting_avg) + add_waiting_avg(&rq_of(cfs_rq)->cfs, se); + } decayed = update_cfs_rq_load_avg(now, cfs_rq); decayed |= propagate_entity_load_avg(se); @@ -5182,6 +5205,11 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} +static inline void +add_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} +static inline void +sub_waiting_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {} + static inline int sched_balance_newidle(struct rq *rq, struct rq_flags *rf) { return 0; @@ -6786,6 +6814,7 @@ enqueue_task_fair(struct rq *rq, 
					   struct task_struct *p, int flags)
	 * estimated utilization, before we update schedutil.
	 */
	util_est_enqueue(&rq->cfs, p);
+	add_waiting_avg(&rq->cfs, se);
 
	/*
	 * If in_iowait is set, the code below may not trigger any cpufreq
@@ -6874,6 +6903,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
	bool was_sched_idle = sched_idle_rq(rq);
 
	util_est_dequeue(&rq->cfs, p);
+	sub_waiting_avg(&rq->cfs, se);
 
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 536575757420..f0974abf8566 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -103,7 +103,8 @@ static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
  */
 static __always_inline u32
 accumulate_sum(u64 delta, struct sched_avg *sa,
-	       unsigned long load, unsigned long runnable, int running)
+	       unsigned long load, unsigned long runnable, int running,
+	       bool is_task)
 {
	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
	u64 periods;
@@ -118,6 +119,7 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
		sa->load_sum = decay_load(sa->load_sum, periods);
		sa->runnable_sum = decay_load(sa->runnable_sum, periods);
+		sa->waiting_sum = decay_load((u64)(sa->waiting_sum), periods);
		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
 
		/*
@@ -147,6 +149,8 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
		sa->runnable_sum += runnable * contrib << SCHED_CAPACITY_SHIFT;
	if (running)
		sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
+	if (is_task && runnable && !running)
+		sa->waiting_sum += contrib << SCHED_CAPACITY_SHIFT;
 
	return periods;
 }
@@ -181,7 +185,8 @@ accumulate_sum(u64 delta, struct sched_avg *sa,
  */
 static __always_inline int
 ___update_load_sum(u64 now, struct sched_avg *sa,
-		   unsigned long load, unsigned long runnable, int running)
+		   unsigned long load, unsigned long runnable, int running,
+		   bool is_task)
 {
	int time_shift;
	u64 delta;
@@ -232,7 +237,7 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
	 * Step 1: accumulate *_sum since last_update_time. If we haven't
	 * crossed period boundaries, finish.
	 */
-	if (!accumulate_sum(delta, sa, load, runnable, running))
+	if (!accumulate_sum(delta, sa, load, runnable, running, is_task))
		return 0;
 
	return 1;
@@ -272,6 +277,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
	 */
	sa->load_avg = div_u64(load * sa->load_sum, divider);
	sa->runnable_avg = div_u64(sa->runnable_sum, divider);
+	sa->waiting_avg = div_u64(sa->waiting_sum, divider);
	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }
 
@@ -303,7 +309,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
 
 int __update_load_avg_blocked_se(u64 now, struct sched_entity *se)
 {
-	if (___update_load_sum(now, &se->avg, 0, 0, 0)) {
+	if (___update_load_sum(now, &se->avg, 0, 0, 0, false)) {
		___update_load_avg(&se->avg, se_weight(se));
		trace_pelt_se_tp(se);
		return 1;
@@ -314,10 +320,17 @@ int __update_load_avg_blocked_se(u64 now, struct sched_entity *se)
 
 int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	bool is_task = entity_is_task(se);
+
+	if (is_task)
+		rq_of(cfs_rq)->cfs.avg.waiting_avg -= se->avg.waiting_avg;
+
	if (___update_load_sum(now, &se->avg, !!se->on_rq, se_runnable(se),
-			       cfs_rq->curr == se)) {
+			       cfs_rq->curr == se, is_task)) {
 
		___update_load_avg(&se->avg, se_weight(se));
+		if (is_task)
+			rq_of(cfs_rq)->cfs.avg.waiting_avg += se->avg.waiting_avg;
		cfs_se_util_change(&se->avg);
		trace_pelt_se_tp(se);
		return 1;
@@ -331,7 +344,8 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq)
	if (___update_load_sum(now, &cfs_rq->avg,
				scale_load_down(cfs_rq->load.weight),
				cfs_rq->h_nr_running,
-				cfs_rq->curr != NULL)) {
+				cfs_rq->curr != NULL,
+				false)) {
 
		___update_load_avg(&cfs_rq->avg, 1);
		trace_pelt_cfs_tp(cfs_rq);
@@ -357,7 +371,8 @@ int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
	if (___update_load_sum(now, &rq->avg_rt,
				running,
				running,
-				running)) {
+				running,
+				false)) {
 
		___update_load_avg(&rq->avg_rt, 1);
		trace_pelt_rt_tp(rq);
@@ -383,7 +398,8 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
	if (___update_load_sum(now, &rq->avg_dl,
				running,
				running,
-				running)) {
+				running,
+				false)) {
 
		___update_load_avg(&rq->avg_dl, 1);
		trace_pelt_dl_tp(rq);
@@ -414,7 +430,8 @@ int update_hw_load_avg(u64 now, struct rq *rq, u64 capacity)
	if (___update_load_sum(now, &rq->avg_hw,
				capacity,
				capacity,
-				capacity)) {
+				capacity,
+				false)) {
		___update_load_avg(&rq->avg_hw, 1);
		trace_pelt_hw_tp(rq);
		return 1;
@@ -462,11 +479,13 @@ int update_irq_load_avg(struct rq *rq, u64 running)
	ret = ___update_load_sum(rq->clock - running, &rq->avg_irq,
				0,
				0,
-				0);
+				0,
+				false);
	ret += ___update_load_sum(rq->clock, &rq->avg_irq,
				1,
				1,
-				1);
+				1,
+				false);
 
	if (ret) {
		___update_load_avg(&rq->avg_irq, 1);
@@ -536,7 +555,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
	if (unlikely(!delta))
		return util;
 
-	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
+	accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1, false);
	___update_load_avg(&sa, 0);
 
	return sa.util_avg;
@@ -555,7 +574,7 @@ u64 approximate_runtime(unsigned long util)
		return runtime;
 
	while (sa.util_avg < util) {
-		accumulate_sum(delta, &sa, 1, 0, 1);
+		accumulate_sum(delta, &sa, 1, 0, 1, false);
		___update_load_avg(&sa, 0);
		runtime++;
	}
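The accumulation rule added above can be illustrated with a small user-space toy model. This is not kernel code: `toy_avg` and `toy_accumulate` are illustrative stand-ins, and the 978/1000 decay is only a rough approximation of PELT's y (the kernel uses fixed-point tables where y^32 = 1/2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sched_avg carrying only the fields this patch touches. */
struct toy_avg {
	uint64_t util_sum;
	uint64_t waiting_sum;
};

/*
 * One full ~1ms PELT period: decay the old sums, then accumulate this
 * period's contribution. waiting_sum only grows for a task that is
 * runnable but not running -- the exact condition the patch adds.
 */
static void toy_accumulate(struct toy_avg *sa, bool is_task,
			   bool runnable, bool running)
{
	const uint32_t contrib = 1024;	/* one full period's worth */

	/* Geometric decay of the history, approximating PELT's y. */
	sa->util_sum = sa->util_sum * 978 / 1000;
	sa->waiting_sum = sa->waiting_sum * 978 / 1000;

	if (running)
		sa->util_sum += contrib;
	if (is_task && runnable && !running)
		sa->waiting_sum += contrib;
}
```

A period spent waiting feeds waiting_sum only; once the task gets the CPU, waiting_sum merely decays while util_sum grows, so a persistently large waiting_avg indicates sustained runnable-but-not-running time.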
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 13/16] sched/schedutil: Take into account waiting_avg in apply_dvfs_headroom
Date: Tue, 20 Aug 2024 17:35:09 +0100
Message-Id: <20240820163512.1096301-14-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

We now have three sources of delays:

1. How often we send cpufreq updates
2. How often we update util_avg
3. How long tasks wait in RUNNABLE before becoming RUNNING

The headroom should cater for all these types of delay to ensure the
system is running at an adequate performance point. We want to pick the
maximum headroom required by any of these sources of delay.

TODO: the signal should use task clock, not PELT time, as this should be
real-time based and we don't care about invariance.

Signed-off-by: Qais Yousef
---
 kernel/sched/cpufreq_schedutil.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 94e35b7c972d..318b09bc4ab1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -259,10 +259,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * dvfs_update_delay of the cpufreq governor or min(curr.se.slice, TICK_US),
  * whichever is higher.
  *
+ * Also take into account how long tasks have been waiting in runnable but
+ * !running state. If it is high, it means we need higher DVFS headroom to
+ * reduce it.
+ *
  * XXX: Should we provide headroom when the util is decaying?
  */
 static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
+	unsigned long update_headroom, waiting_headroom;
	struct rq *rq = cpu_rq(cpu);
	u64 delay;
 
@@ -276,7 +281,10 @@ static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int c
		delay = TICK_USEC;
	delay = max(delay, per_cpu(dvfs_update_delay, cpu));
 
-	return approximate_util_avg(util, delay);
+	update_headroom = approximate_util_avg(util, delay);
+	waiting_headroom = util + READ_ONCE(rq->cfs.avg.waiting_avg);
+
+	return max(update_headroom, waiting_headroom);
 }
 
 unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,

From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 14/16] sched/schedutil: Ignore dvfs headroom when util is decaying
Date: Tue, 20 Aug 2024 17:35:10 +0100
Message-Id: <20240820163512.1096301-15-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

A decaying util means we're either idling or doing less work than
before, and are already running at a higher performance point than
required. No need to apply any dvfs headroom in this case.
Signed-off-by: Qais Yousef
---
 kernel/sched/cpufreq_schedutil.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 318b09bc4ab1..4a1a8b353d51 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -9,6 +9,7 @@
 #define IOWAIT_BOOST_MIN	(SCHED_CAPACITY_SCALE / 8)
 
 DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
+DEFINE_PER_CPU(unsigned long, last_update_util);
 
 struct sugov_tunables {
	struct gov_attr_set	attr_set;
@@ -262,15 +263,19 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
  * Also take into account how long tasks have been waiting in runnable but
  * !running state. If it is high, it means we need higher DVFS headroom to
  * reduce it.
- *
- * XXX: Should we provide headroom when the util is decaying?
  */
 static inline unsigned long sugov_apply_dvfs_headroom(unsigned long util, int cpu)
 {
-	unsigned long update_headroom, waiting_headroom;
+	unsigned long update_headroom, waiting_headroom, prev_util;
	struct rq *rq = cpu_rq(cpu);
	u64 delay;
 
+	prev_util = per_cpu(last_update_util, cpu);
+	per_cpu(last_update_util, cpu) = util;
+
+	if (util < prev_util)
+		return util;
+
	/*
	 * What is the possible worst case scenario for updating util_avg, ctx
	 * switch or TICK?
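Taken together, patches 13 and 14 give schedutil a headroom policy that can be sketched in plain C as follows. This is a simplified model, not the kernel implementation: `toy_dvfs_headroom` is a hypothetical name, `approximate_util_avg()` is stood in for by a flat 25% boost, and the per-CPU bookkeeping is passed in explicitly.

```c
#include <assert.h>

/*
 * Sketch of the combined policy: if util is decaying we are already
 * running fast enough, so return it as-is (patch 14); otherwise grow it
 * to the maximum of the update-delay headroom and the waiting-time
 * headroom (patch 13).
 */
static unsigned long toy_dvfs_headroom(unsigned long util,
				       unsigned long prev_util,
				       unsigned long waiting_avg)
{
	unsigned long update_headroom, waiting_headroom;

	/* Decaying signal: no headroom at all. */
	if (util < prev_util)
		return util;

	/* Stand-in for approximate_util_avg(util, delay). */
	update_headroom = util + util / 4;
	/* Compensate for time spent runnable but not running. */
	waiting_headroom = util + waiting_avg;

	return update_headroom > waiting_headroom ?
	       update_headroom : waiting_headroom;
}
```

Whichever delay source hurts most sets the operating point; a CPU with lots of runnable-waiting time gets boosted beyond the plain update-delay headroom.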
From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 15/16] sched/fair: Enable disabling util_est via rampup_multiplier
Date: Tue, 20 Aug 2024 17:35:11 +0100
Message-Id: <20240820163512.1096301-16-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

util_est is a great feature that enables busy tasks with long sleep
times to maintain their perf level. But it can also be expensive in
terms of power for tasks that have no such perf requirement and just
happened to be busy in their last activation.

If a task sets its rampup_multiplier to 0, it indicates that it is happy
to glide along with the system's default response and doesn't require
extra responsiveness. We can use that to further imply that the task is
happy to let its util decay over long sleeps too, and disable util_est
for it.

XXX: This could be overloading this QoS. We could add a separate, more
explicit QoS to disable util_est for tasks that don't care.
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8dbba0b755e..ad72db5a266c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4918,6 +4918,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
	if (!sched_feat(UTIL_EST))
		return;
 
+	/*
+	 * rampup_multiplier = 0 indicates util_est is disabled.
+	 */
+	if (!p->sched_qos.rampup_multiplier) {
+		ewma = 0;
+		goto done;
+	}
+
	/* Get current estimate of utilization */
	ewma = READ_ONCE(p->se.avg.util_est);

From: Qais Yousef
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, "Rafael J. Wysocki", Viresh Kumar
Cc: Juri Lelli, Steven Rostedt, Dietmar Eggemann, John Stultz, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Qais Yousef
Subject: [RFC PATCH 16/16] sched/fair: Don't mess with util_avg post init
Date: Tue, 20 Aug 2024 17:35:12 +0100
Message-Id: <20240820163512.1096301-17-qyousef@layalina.io>
In-Reply-To: <20240820163512.1096301-1-qyousef@layalina.io>
References: <20240820163512.1096301-1-qyousef@layalina.io>

The extrapolation logic for util_avg of newly forked tasks tries to
crystal ball the task's demand. This worked well when the system had no
other means to help these tasks. But now we have util_est, which will
ramp up faster, and uclamp_min to ensure a good starting point if they
really care.
Since we really can't crystal ball the behavior, giving the same
starting value to all forked tasks is more consistent, and it helps
preserve system resources for the tasks that will compete to get them if
they truly care. So set the initial util_avg to 0 when the util_est
feature is enabled.

This should not impact workloads that need best single threaded
performance (like geekbench), given the previous improvements introduced
to help with faster rampup to reach the max perf point more coherently
and consistently across systems.

Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad72db5a266c..45be77d1112f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1031,6 +1031,19 @@ void init_entity_runnable_average(struct sched_entity *se)
 }
 
 /*
+ * When util_est is used, tasks can ramp up much faster by default. With
+ * the rampup_multiplier, tasks can ask for faster rampup after fork. And with
+ * uclamp, they can ensure a min perf requirement. Given all these factors, we
+ * keep util_avg at 0 as we can't crystal ball the task demand after fork.
+ * Userspace has enough ways to ensure good perf for tasks after fork. Keeping
+ * util_avg at 0 is a good way to ensure a uniform start for all tasks. And
+ * it is good to preserve precious resources. Truly busy forked tasks can
+ * compete for the resources without the need for an initial 'cheat' to ramp
+ * them up automagically.
+ *
+ * When util_est is not present, the extrapolation logic below will still
+ * apply.
+ *
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
  *
@@ -1080,6 +1093,12 @@ void post_init_entity_util_avg(struct task_struct *p)
		return;
	}
 
+	/*
+	 * Tasks can ramp up faster with util_est, so don't mess with util_avg.
+	 */
+	if (sched_feat(UTIL_EST))
+		return;
+
	if (cap > 0) {
		if (cfs_rq->avg.util_avg != 0) {
			sa->util_avg  = cfs_rq->avg.util_avg * se_weight(se);
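How the rampup_multiplier QoS from patch 15 shapes util_est can be sketched with a small user-space model. This is a simplification, not kernel code: `toy_task` and `toy_util_est_update` are hypothetical names, and the 1/4 EWMA weight mirrors util_est's weighting only approximately.

```c
#include <assert.h>

/*
 * Toy model: rampup_multiplier == 0 pins the util_est ewma at 0, so the
 * task's util is governed purely by the naturally decaying util_avg,
 * which patch 16 also starts at 0 after fork.
 */
struct toy_task {
	unsigned int rampup_multiplier;	/* 0 => util_est disabled */
	long util_est_ewma;
};

static void toy_util_est_update(struct toy_task *p, long dequeued)
{
	if (!p->rampup_multiplier) {
		/* Mirrors the patch's "ewma = 0; goto done;" path. */
		p->util_est_ewma = 0;
		return;
	}
	/* Move the estimate 1/4 of the way toward the dequeued util. */
	p->util_est_ewma += (dequeued - p->util_est_ewma) / 4;
}
```

A task that opts out thus loses its remembered estimate immediately, while a normal task converges toward its recent utilization at each dequeue.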