From patchwork Tue Jun 13 04:23:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692456 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC0E7C77B7A for ; Tue, 13 Jun 2023 04:21:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238415AbjFMEVj (ORCPT ); Tue, 13 Jun 2023 00:21:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45252 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233202AbjFMEVi (ORCPT ); Tue, 13 Jun 2023 00:21:38 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 496E710DC; Mon, 12 Jun 2023 21:21:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630097; x=1718166097; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=FPTg/Kh6Bpx/mtnSi0xLQfXNHVVz6z4CysuQe9QkeN4=; b=jY0wpup8aNnyc/CKvnYznW4W5RtdAmMn4Wu2cxgS3PpedtNNpbxUV8UW 82aZ32WtwlwZ5jOCixt4YfhEszb2epwcFhk7F+qyNed1T9tLRAStZvWyL nnuRwQB4ccbFBe95oqbQo7a/ks0ZKPDowHsHtV6Ze2klBenAfYO1oeoDV 9QduMMpmrJGjVOvjaf/lOzPAVRJfXPyphM9Myq1cHvqITCGgwocceUlIa gq+pkdSefIr4EXvx5O8lr/ZIV5CPvyjwEcszTCWM64c3OJTNeWNmyOtrt aeohUcbF7tR9VadVUvT1rQHwx3Dj53Is4leYfF9dIDhow5NjnBDrOVebt Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222031" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222031" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854926" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; 
d="scan'208";a="661854926" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:34 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 01/24] sched/task_struct: Introduce IPC classes of tasks Date: Mon, 12 Jun 2023 21:23:59 -0700 Message-Id: <20230613042422.5344-2-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org On hybrid processors, the architecture differences between the types of CPUs result in different instructions-per-cycle (IPC) rates for each type of CPU. IPCs may vary further by the type of instructions being executed. Instructions can be grouped into classes of similar IPCs. Tasks can be classified into groups based on the type of instructions they execute. Add a new member task_struct::ipcc to associate a particular task to an IPC class that depends on the instructions it executes. The scheduler may use the IPC class of a task and data about the performance among CPUs of a given IPC class to improve throughput. It may, for instance, place certain classes of tasks on CPUs of higher performance. The methods to determine the classification of a task and its relative IPC score are specific to each CPU architecture. 
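To make the idea concrete, here is a minimal userspace sketch (not kernel code) of how a per-task IPC class and per-class scores could drive a placement preference. Everything prefixed mock_/MOCK_ is hypothetical, and the score values are made up for illustration; only IPC_CLASS_UNCLASSIFIED and the unsigned short type of the class come from this patch.

```c
#include <assert.h>

#define IPC_CLASS_UNCLASSIFIED	0	/* same value the patch defines */
#define MOCK_NR_CLASSES		3	/* assumption: class 0 plus two hardware classes */

/* Assumption: a hybrid system with two CPU types. */
enum mock_cpu_type { MOCK_CPU_BIG, MOCK_CPU_SMALL, MOCK_NR_CPU_TYPES };

/* Userspace stand-in for the new task_struct::ipcc member. */
struct mock_task {
	unsigned short ipcc;
};

/* Made-up scores: rows are IPC classes, columns are CPU types. */
static const unsigned long mock_score[MOCK_NR_CLASSES][MOCK_NR_CPU_TYPES] = {
	[IPC_CLASS_UNCLASSIFIED] = { 1024, 1024 },	/* nothing known yet: no preference */
	[1]			 = { 1024,  900 },	/* runs almost as well on a small CPU */
	[2]			 = { 1024,  400 },	/* much slower on a small CPU */
};

/* Score gained by running the task on a big CPU instead of a small one. */
static unsigned long mock_big_cpu_gain(const struct mock_task *p)
{
	assert(p->ipcc < MOCK_NR_CLASSES);
	return mock_score[p->ipcc][MOCK_CPU_BIG] -
	       mock_score[p->ipcc][MOCK_CPU_SMALL];
}

/* Of two tasks competing for one big CPU, pick the one that benefits more. */
static const struct mock_task *
mock_pick_for_big_cpu(const struct mock_task *a, const struct mock_task *b)
{
	return mock_big_cpu_gain(b) > mock_big_cpu_gain(a) ? b : a;
}
```

With these made-up numbers, a class 2 task would be preferred over a class 1 task for the big CPU, and an unclassified task expresses no preference at all.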
Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * Changed the type of task_struct::ipcc to unsigned short. A subsequent patch uses bit fields to use 9 bits, along with other auxiliary members. Changes since v1: * Renamed task_struct::class as task_struct::ipcc. (Joel) * Use task_struct::ipcc = 0 for unclassified tasks. (PeterZ) * Renamed CONFIG_SCHED_TASK_CLASSES as CONFIG_IPC_CLASSES. (PeterZ, Joel) --- include/linux/sched.h | 10 ++++++++++ init/Kconfig | 12 ++++++++++++ 2 files changed, 22 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 1292d38d66cc..9fdee040f450 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -129,6 +129,8 @@ struct user_event_mm; __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \ TASK_PARKED) +#define IPC_CLASS_UNCLASSIFIED 0 + #define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING) #define task_is_traced(task) ((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0) @@ -1534,6 +1536,14 @@ struct task_struct { struct user_event_mm *user_event_mm; #endif +#ifdef CONFIG_IPC_CLASSES + /* + * A hardware-defined classification of task that reflects but is + * not identical to the number of instructions per cycle. + */ + unsigned short ipcc; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/init/Kconfig b/init/Kconfig index 32c24950c4ce..ea3371ccb530 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -839,6 +839,18 @@ config UCLAMP_BUCKETS_COUNT If in doubt, use the default value. 
+config IPC_CLASSES + bool "IPC classes of tasks" + depends on SMP + help + If selected, each task is assigned a classification value that + reflects the type of instructions that the task executes. This + classification reflects but is not equal to the number of + instructions retired per cycle. + + The scheduler uses the classification value to improve the placement + of tasks. + endmenu # From patchwork Tue Jun 13 04:24:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2005CC88CBC for ; Tue, 13 Jun 2023 04:21:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238886AbjFMEVk (ORCPT ); Tue, 13 Jun 2023 00:21:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239046AbjFMEVj (ORCPT ); Tue, 13 Jun 2023 00:21:39 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 632B510DA; Mon, 12 Jun 2023 21:21:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630098; x=1718166098; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=f70oxXzoYQGDme9oRHr+6EBjtL+JoDXEljhGKTzB1PQ=; b=Inp6q5rZdf0BFS+6rBMTb2BCUnj/nbrH5uiwTyGR8kS9/ZmGb9E6wJav O8FZ9foj9UBc++WUQbvZL785/hyl2FPNd+ZIGgsTgLJVz6VddmukMrA7H jwShHf6j4KCSQ7ijB1UEHweH7O9cRBciGQzO1L3NP419iaTjNZLURVB3D x1k840CNUJGx4YtGA2ZsdHSOY0O39Gosd87RoMR++rzWoJMQt4BhsQSfr x6mwSAChNDTREi3lfwfGsmShANr/leM4K9W+e2BBOO1mM3muFwuqK230x /d8BOf0/CRHLisGRHjg1FwT4Uitzeuw23NRC5eUntU+SVanhVr438Yd9k g==; X-IronPort-AV: 
E=McAfee;i="6600,9927,10739"; a="358222044" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222044" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854932" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854932" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:36 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 02/24] sched: Add interfaces for IPC classes Date: Mon, 12 Jun 2023 21:24:00 -0700 Message-Id: <20230613042422.5344-3-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Add the interfaces that architectures shall implement to convey the data to support IPC classes. arch_update_ipcc() updates the IPC classification of the current task as given by hardware. arch_get_ipcc_score() provides a performance score for a given IPC class when placed on a specific CPU. Higher scores indicate higher performance. 
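As an illustration only, the two hooks could be mocked in userspace as shown below. This is a sketch under stated assumptions, not an architecture implementation: the mock_ names, the "hardware" register array and the score table are all hypothetical. The error convention (a negative errno cast to unsigned long) is an assumption modelled on the kernel's IS_ERR_VALUE() idiom.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_NR_CLASSES	3	/* assumption: number of classes, including class 0 */
#define MOCK_NR_CPUS	2	/* assumption: CPU 0 is "big", CPU 1 is "small" */
#define MOCK_MAX_ERRNO	4095UL	/* mirrors the kernel's MAX_ERRNO */

struct mock_task {
	unsigned short ipcc;
};

/* Stand-in for a per-CPU hardware register holding the latest classification. */
static unsigned short mock_hw_class[MOCK_NR_CPUS];

/* Made-up scores: rows are IPC classes, columns are CPUs. */
static const unsigned long mock_scores[MOCK_NR_CLASSES][MOCK_NR_CPUS] = {
	{ 1024, 1024 },
	{ 1024,  900 },
	{ 1024,  400 },
};

/*
 * Mock of arch_update_ipcc(): copy what "hardware" reports into the task.
 * The real hook takes only @curr; the CPU is passed explicitly here because
 * this sketch has no notion of "the current CPU".
 */
static void mock_arch_update_ipcc(struct mock_task *curr, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MOCK_NR_CPUS);
	curr->ipcc = mock_hw_class[this_cpu];
}

/* Mock of arch_get_ipcc_score(): a table lookup, or -EINVAL for bad input. */
static unsigned long mock_arch_get_ipcc_score(unsigned short ipcc, int cpu)
{
	if (ipcc >= MOCK_NR_CLASSES || cpu < 0 || cpu >= MOCK_NR_CPUS)
		return (unsigned long)-EINVAL;
	return mock_scores[ipcc][cpu];
}

/* Same idea as the kernel's IS_ERR_VALUE(): the top 4095 values encode errors. */
static int mock_score_is_err(unsigned long score)
{
	return score >= (unsigned long)-MOCK_MAX_ERRNO;
}
```

A caller would first refresh the class of the running task and then look up its score on a candidate CPU, skipping the task if the lookup reports an error.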
When a driver or equivalent enablement code has configured the necessary
hardware to support IPC classes, it should call sched_enable_ipc_classes()
to notify the scheduler that it can start using IPC classes data.

The number of classes and the score of each class of task are determined
by hardware.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * None

Changes since v2:
 * Clarified the properties of the IPC score: abstract, and linear. It
   can be normalized when needed. (Ionela)
 * Selected a better default IPC score. (Ionela)
 * Removed arch_has_ipc_classes(). It is not suitable for hardware that
   is not ready to support IPC classes after boot. (Lukasz)
 * Added a new sched_enable_ipc_classes() interface that drivers or
   enablement code can call when ready to support IPC classes. (Lukasz)

Changes since v1:
 * Shortened the names of the IPCC interfaces (PeterZ):
     sched_task_classes_enabled >> sched_ipcc_enabled
     arch_has_task_classes >> arch_has_ipc_classes
     arch_update_task_class >> arch_update_ipcc
     arch_get_task_class_score >> arch_get_ipcc_score
 * Removed smt_siblings_idle argument from arch_update_ipcc().
   (PeterZ)
---
 include/linux/sched/topology.h |  6 ++++
 kernel/sched/sched.h           | 66 ++++++++++++++++++++++++++++++++++
 kernel/sched/topology.c        |  9 +++++
 3 files changed, 81 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 816df6cc444e..5b084d3c9ad1 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -280,4 +280,10 @@ static inline int task_node(const struct task_struct *p)
 	return cpu_to_node(task_cpu(p));
 }
 
+#ifdef CONFIG_IPC_CLASSES
+extern void sched_enable_ipc_classes(void);
+#else
+static inline void sched_enable_ipc_classes(void) { }
+#endif
+
 #endif /* _LINUX_SCHED_TOPOLOGY_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 556496c77dc2..03fb53e9f340 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2544,6 +2544,72 @@ void arch_scale_freq_tick(void)
 }
 #endif
 
+#ifdef CONFIG_IPC_CLASSES
+DECLARE_STATIC_KEY_FALSE(sched_ipcc);
+
+static inline bool sched_ipcc_enabled(void)
+{
+	return static_branch_unlikely(&sched_ipcc);
+}
+
+#ifndef arch_update_ipcc
+/**
+ * arch_update_ipcc() - Update the IPC class of the current task
+ * @curr:		The current task
+ *
+ * Request that the IPC classification of @curr is updated.
+ *
+ * Returns: none
+ */
+static __always_inline
+void arch_update_ipcc(struct task_struct *curr)
+{
+}
+#endif
+
+#ifndef arch_get_ipcc_score
+
+#define SCHED_IPCC_SCORE_SCALE (1L << SCHED_FIXEDPOINT_SHIFT)
+/**
+ * arch_get_ipcc_score() - Get the IPC score of a class of task
+ * @ipcc:	The IPC class
+ * @cpu:	A CPU number
+ *
+ * The IPC performance score reflects (but is not identical to) the number
+ * of instructions retired per cycle for a given IPC class. It is a linear and
+ * abstract metric. Higher scores reflect better performance.
+ *
+ * The IPC score can be normalized with respect to the class, i, with the
+ * highest IPC score on the CPU, c, with highest performance:
+ *
+ *               IPC(i, c)
+ *  ------------------------------------ * SCHED_IPCC_SCORE_SCALE
+ *         max(IPC(i, c) : (i, c))
+ *
+ * Scheduling schemes that want to use the IPC score along with other
+ * normalized metrics for scheduling (e.g., CPU capacity) may need to normalize
+ * it.
+ *
+ * Other scheduling schemes (e.g., asym_packing) do not need normalization.
+ *
+ * Returns the performance score of an IPC class, @ipcc, when running on @cpu.
+ * An error value is returned when either @ipcc or @cpu is invalid.
+ */
+static __always_inline
+unsigned long arch_get_ipcc_score(unsigned short ipcc, int cpu)
+{
+	return SCHED_IPCC_SCORE_SCALE;
+}
+#endif
+#else /* CONFIG_IPC_CLASSES */
+
+#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL)
+#define arch_update_ipcc(curr)
+
+static inline bool sched_ipcc_enabled(void) { return false; }
+
+#endif /* CONFIG_IPC_CLASSES */
+
 #ifndef arch_scale_freq_capacity
 /**
  * arch_scale_freq_capacity - get the frequency scale factor of a given CPU.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index ca4472281c28..c3a633a4b474 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -672,6 +672,15 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); +#ifdef CONFIG_IPC_CLASSES +DEFINE_STATIC_KEY_FALSE(sched_ipcc); + +void sched_enable_ipc_classes(void) +{ + static_branch_enable_cpuslocked(&sched_ipcc); +} +#endif + static void update_top_cache_domain(int cpu) { struct sched_domain_shared *sds = NULL; From patchwork Tue Jun 13 04:24:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E498C77B7A for ; Tue, 13 Jun 2023 04:21:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239079AbjFMEVl (ORCPT ); Tue, 13 Jun 2023 00:21:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45264 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239066AbjFMEVk (ORCPT ); Tue, 13 Jun 2023 00:21:40 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CAC410DC; Mon, 12 Jun 2023 21:21:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630099; x=1718166099; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=bqBqPC3JJdHtcPQBfVoOvpXz2vMkY7r1MvvgzoInsII=; b=A/TB27KftATPY2hCv2jeSmz1tYqhwZXYDuHgGmM+c4xRE8/bZjIsOTWW BC5QhNSGNn+QspndTykh8oHCFsg32de3UAeaj69JbHD5ZjCHPhCyDHdN9 
O3gPqQRMMxm7ufsw5XsOulGH06Mvr8I4/Ef8HpsXc9cd2E9z44dO35pMX aerE+6TghmHa8hf0Hia29nqXTCMl53RZ3yT0BPIzTreuNkNE8yKRQ/SAn 63cnuzhmCxWjU5ePqhmBLvOKXNv8nKXfGnym+B/hOSbW7YIcDPjpNB3Ke HWIHE2lyouQTJcxlVrK9c6Frkh4mcQ3lxGzjPCd8kAHRWnYfx+Du81B5H g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222056" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222056" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:38 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854936" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854936" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:37 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 03/24] sched/core: Initialize the IPC class of a new task Date: Mon, 12 Jun 2023 21:24:01 -0700 Message-Id: <20230613042422.5344-4-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org New tasks shall start life as unclassified. They will be classified by hardware when they run. 
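The lifecycle described above can be sketched in userspace as follows. This is a hedged illustration, not the kernel implementation: the mock_ names are hypothetical, and updating the class only on a tick that interrupted user-mode execution, and only once IPC classes are enabled, is modelled on how the rest of this series describes the update.

```c
#include <assert.h>
#include <stdbool.h>

#define IPC_CLASS_UNCLASSIFIED	0	/* same value the series defines */

struct mock_task {
	unsigned short ipcc;
};

/* Stand-ins for the sched_ipcc static key and for the class hardware reports. */
static bool mock_ipcc_enabled;
static unsigned short mock_hw_class;

/* Mirrors the __sched_fork() change: every new task starts unclassified. */
static void mock_sched_fork(struct mock_task *p)
{
	p->ipcc = IPC_CLASS_UNCLASSIFIED;
}

/* Assumed update point: a tick that interrupted user-mode execution. */
static void mock_scheduler_tick(struct mock_task *curr, bool user_tick)
{
	if (mock_ipcc_enabled && user_tick)
		curr->ipcc = mock_hw_class;
}
```

Until the first qualifying tick, the task keeps IPC_CLASS_UNCLASSIFIED, so any consumer of the class has to cope with unclassified tasks.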
Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * None --- kernel/sched/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d9a04faa48a1..77b769c23186 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4500,6 +4500,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) p->se.prev_sum_exec_runtime = 0; p->se.nr_migrations = 0; p->se.vruntime = 0; +#ifdef CONFIG_IPC_CLASSES + p->ipcc = IPC_CLASS_UNCLASSIFIED; +#endif INIT_LIST_HEAD(&p->se.group_node); #ifdef CONFIG_FAIR_GROUP_SCHED From patchwork Tue Jun 13 04:24:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B150C88CBA for ; Tue, 13 Jun 2023 04:21:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239182AbjFMEVv (ORCPT ); Tue, 13 Jun 2023 00:21:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239086AbjFMEVl (ORCPT ); Tue, 13 Jun 2023 00:21:41 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FD8210DA; Mon, 12 Jun 2023 21:21:40 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630100; x=1718166100; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=M3Us+Di7YVDVts2uy+SPo0Lo0L5HiLvQUCEEirPWj5g=; b=Hsgoigar71fWJq+9QayOUU3GDLpjRns7PPUPdk3V/Za72SlaUtL9kjas O/H0f6cD1XHHHIyK162Id5sr3y3vwspqtABx7X93u16l2v5Qh6pchmd5h puxa2hyLHt6LsBbv0kZD/ZT6tOvTdqE6FsLajtwUdKMyEzInTuaiim/CD LSPAR4hJZ126sti7qEJ8pHMXp/kmuqcyhHxOhKtP5R4pDIYzvcpvhQe+7 CtFbQOxSyZE2Khsl6zhKl0P0Z7IHtuWeHjkStm+hlAr9nn+KnNvQWUIe/ JXzjcJrPvwnzANi5k9daIlKhgW+kyec7o9lTyJ5U3To42jHQ2/s3UhfM7 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222070" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222070" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854940" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854940" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:38 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . 
Chen" , Zhao Liu Subject: [PATCH v4 04/24] sched/core: Add user_tick as argument to scheduler_tick() Date: Mon, 12 Jun 2023 21:24:02 -0700 Message-Id: <20230613042422.5344-5-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Differentiate between user and kernel ticks so that the scheduler updates the IPC class of the current task during the former. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * Corrected error in the changeset description: the IPC class of the current task is updated at user tick. (Dietmar) Changes since v1: * None --- include/linux/sched.h | 2 +- kernel/sched/core.c | 2 +- kernel/time/timer.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 9fdee040f450..0e1c38ad09c2 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -295,7 +295,7 @@ enum { TASK_COMM_LEN = 16, }; -extern void scheduler_tick(void); +extern void scheduler_tick(bool user_tick); #define MAX_SCHEDULE_TIMEOUT LONG_MAX diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 77b769c23186..6ac53f01d989 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5639,7 +5639,7 @@ static inline u64 cpu_resched_latency(struct rq *rq) { return 0; } * This function gets called by the timer code, with HZ frequency. * We call it with interrupts disabled. 
*/ -void scheduler_tick(void) +void scheduler_tick(bool user_tick) { int cpu = smp_processor_id(); struct rq *rq = cpu_rq(cpu); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 63a8ce7177dd..e15e24105891 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -2073,7 +2073,7 @@ void update_process_times(int user_tick) if (in_irq()) irq_work_tick(); #endif - scheduler_tick(); + scheduler_tick(user_tick); if (IS_ENABLED(CONFIG_POSIX_TIMERS)) run_posix_cpu_timers(); } From patchwork Tue Jun 13 04:24:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692454 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7471FC77B7A for ; Tue, 13 Jun 2023 04:21:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239215AbjFMEVx (ORCPT ); Tue, 13 Jun 2023 00:21:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239104AbjFMEVn (ORCPT ); Tue, 13 Jun 2023 00:21:43 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC1F510DC; Mon, 12 Jun 2023 21:21:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630101; x=1718166101; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=QU+wKINCGsj5LuFLg6kYBTKe+UQFDp2wCImpIqFvbB4=; b=f02zPALJIwzedBIRhSsrfRqitbua2VMWgHiFwOHwrcENt4V9ygcGotgM j9MLI+tPYwqk/jhjwNFeZAjlnSdSF8e2x0vfUKO2NbjNXg89voi13uUQx k1i8kkynO1p0AbH2qnGwTRsdS75ROJscCU8MJY+FBEFD9OJrWZhK7FjF0 eWH+HEHC5ZtzjX3Wd5uTw/PuPwS9KbfiujaY4+3k646VQ5NQmhId0AUqB 
eG5XP73s91K2UdN1w9PbQJylflFBONK5DYDlxA16nMPphcapt7puMIS21 G6LETGSxHH4hJLT1+Hp3SYAewgg7O5T2O6pbjyYgdGL3OlZscDqpsDDbW w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222083" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222083" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854947" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854947" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:40 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 05/24] sched/core: Update the IPC class of the current task Date: Mon, 12 Jun 2023 21:24:03 -0700 Message-Id: <20230613042422.5344-6-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org When supported, hardware monitors the instruction stream to classify the current task. Hence, at userspace tick, we are ready to read the most recent classification result for the current task. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. 
Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Removed argument smt_siblings_idle from call to arch_update_ipcc().
 * Used the new IPCC interface names.
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6ac53f01d989..876396b1d077 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5651,6 +5651,9 @@ void scheduler_tick(bool user_tick)
 	if (housekeeping_cpu(cpu, HK_TYPE_TICK))
 		arch_scale_freq_tick();
 
+	if (sched_ipcc_enabled() && user_tick)
+		arch_update_ipcc(curr);
+
 	sched_clock_tick();
 
 	rq_lock(rq, &rf);

From patchwork Tue Jun 13 04:24:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692931
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 61E58C88CB9
	for ; Tue, 13 Jun 2023 04:21:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S239244AbjFMEVy (ORCPT ); Tue, 13 Jun 2023 00:21:54 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S233202AbjFMEVo (ORCPT );
	Tue, 13 Jun 2023 00:21:44 -0400
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 326F210DF;
	Mon, 12 Jun 2023 21:21:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630103; x=1718166103; h=from:to:cc:subject:date:message-id:in-reply-to:
From: Ricardo Neri
Subject: [PATCH v4 06/24] sched/fair: Collect load-balancing stats for IPC classes
Date: Mon, 12 Jun 2023 21:24:04 -0700
Message-Id: <20230613042422.5344-7-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>

When selecting the busiest scheduling group between two otherwise identical
groups of types asym_packing or fully_busy, IPC classes can be used to break
the tie.

Compute the IPC class performance score for a scheduling group. It is
defined as the sum of the IPC scores of the tasks at the back of each
runqueue in the group. Load balancing starts by pulling tasks from the back
of the runqueue first, making this tiebreaker more useful.

Also, track the IPC class with the lowest score in the scheduling group. A
task of this class will be pulled when the destination CPU has lower
priority than the fully_busy busiest group.

These two metrics will be used during idle load balancing to compute the
current and the potential IPC class score of a scheduling group in a
subsequent changeset.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * Do not compute the IPCC stats using the current tasks of runqueues.
   Instead, use the tasks at the back of the queue. These are the tasks
   that will be pulled first during load balance. (Vincent)

Changes since v2:
 * Also excluded deadline and realtime tasks from IPCC stats. (Dietmar)
 * Also excluded tasks that cannot run on the destination CPU from the
   IPCC stats.
 * Folded struct sg_lb_ipcc_stats into struct sg_lb_stats. (Dietmar)
 * Reworded description sg_lb_stats::min_ipcc. (Ionela)
 * Handle errors of arch_get_ipcc_score(). (Ionela)

Changes since v1:
 * Implemented cleanups and reworks from PeterZ. Thanks!
 * Used the new interface names.
---
 kernel/sched/fair.c | 79 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6189d1a45635..c0cab5e501b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9110,6 +9110,11 @@ struct sg_lb_stats {
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
 #endif
+#ifdef CONFIG_IPC_CLASSES
+	unsigned long min_score; /* Min(score(rq->curr->ipcc)) */
+	unsigned short min_ipcc; /* Class of the task with the minimum IPCC score in the rq */
+	unsigned long sum_score; /* Sum(score(rq->curr->ipcc)) */
+#endif
 };
 
 /*
@@ -9387,6 +9392,77 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+#ifdef CONFIG_IPC_CLASSES
+static void init_rq_ipcc_stats(struct sg_lb_stats *sgs)
+{
+	/* All IPCC stats have been set to zero in update_sg_lb_stats(). */
+	sgs->min_score = ULONG_MAX;
+}
+
+static int rq_last_task_ipcc(int dst_cpu, struct rq *rq, unsigned short *ipcc)
+{
+	struct list_head *tasks = &rq->cfs_tasks;
+	struct task_struct *p;
+	struct rq_flags rf;
+	int ret = -EINVAL;
+
+	rq_lock_irqsave(rq, &rf);
+	if (list_empty(tasks))
+		goto out;
+
+	p = list_last_entry(tasks, struct task_struct, se.group_node);
+	if (p->flags & PF_EXITING || is_idle_task(p) ||
+	    !cpumask_test_cpu(dst_cpu, p->cpus_ptr))
+		goto out;
+
+	ret = 0;
+	*ipcc = p->ipcc;
+out:
+	rq_unlock_irqrestore(rq, &rf);
+	return ret;
+}
+
+/* Called only if cpu_of(@rq) is not idle and has tasks running. */
+static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
+				    struct rq *rq)
+{
+	unsigned short ipcc;
+	unsigned long score;
+
+	if (!sched_ipcc_enabled())
+		return;
+
+	if (rq_last_task_ipcc(dst_cpu, rq, &ipcc))
+		return;
+
+	score = arch_get_ipcc_score(ipcc, cpu_of(rq));
+
+	/*
+	 * Ignore tasks with invalid scores. When finding the busiest group, we
+	 * prefer those with higher sum_score. This group will not be selected.
+	 */
+	if (IS_ERR_VALUE(score))
+		return;
+
+	sgs->sum_score += score;
+
+	if (score < sgs->min_score) {
+		sgs->min_score = score;
+		sgs->min_ipcc = ipcc;
+	}
+}
+
+#else /* CONFIG_IPC_CLASSES */
+static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
+				    struct rq *rq)
+{
+}
+
+static void init_rq_ipcc_stats(struct sg_lb_stats *sgs)
+{
+}
+#endif /* CONFIG_IPC_CLASSES */
+
 /**
  * sched_use_asym_prio - Check whether asym_packing priority must be used
  * @sd: The scheduling domain of the load balancing
@@ -9477,6 +9553,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	int i, nr_running, local_group;
 
 	memset(sgs, 0, sizeof(*sgs));
+	init_rq_ipcc_stats(sgs);
 
 	local_group = group == sds->local;
 
@@ -9526,6 +9603,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
 		}
+
+		update_sg_lb_ipcc_stats(env->dst_cpu, sgs, rq);
 	}
 
 	sgs->group_capacity = group->sgc->capacity;

From patchwork Tue Jun 13 04:24:05 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692453
From: Ricardo Neri
Subject: [PATCH v4 07/24] sched/fair: Compute IPC class scores for load balancing
Date: Mon, 12 Jun 2023 21:24:05 -0700
Message-Id: <20230613042422.5344-8-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>

When using IPCC scores to break ties between two scheduling groups, it is
necessary to consider both the current score and the score that would
result after load balancing.

Compute the combined IPC class score of a scheduling group and the local
scheduling group. Compute both the current score and the prospective score.

Collect IPCC statistics only for asym_packing and fully_busy scheduling
groups. These are the only cases that use IPCC scores.

These IPCC statistics are used during idle load balancing. The candidate
scheduling group will have one fewer busy CPU after load balancing. This
observation is important for cores with SMT support.

The IPCC score of scheduling groups composed of SMT siblings needs to
consider that the siblings share CPU resources. When computing the total
IPCC score of the scheduling group, divide the score of each sibling by the
number of busy siblings.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * None

Changes since v2:
 * Also collect IPCC stats for fully_busy sched groups.
 * Restrict use of IPCC stats to SD_ASYM_PACKING. (Ionela)
 * Handle errors of arch_get_ipcc_score(). (Ionela)

Changes since v1:
 * Implemented cleanups and reworks from PeterZ. I took all his
   suggestions, except the computation of the IPC score before and after
   load balancing. We are computing not the average score, but the *total*.
 * Check for the SD_SHARE_CPUCAPACITY to compute the throughput of the SMT
   siblings of a physical core.
 * Used the new interface names.
 * Reworded commit message for clarity.
---
 kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0cab5e501b6..a51c65c9335f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9114,6 +9114,8 @@ struct sg_lb_stats {
 	unsigned long min_score; /* Min(score(rq->curr->ipcc)) */
 	unsigned short min_ipcc; /* Class of the task with the minimum IPCC score in the rq */
 	unsigned long sum_score; /* Sum(score(rq->curr->ipcc)) */
+	long ipcc_score_after; /* Prospective IPCC score after load balancing */
+	unsigned long ipcc_score_before; /* IPCC score before load balancing */
 #endif
 };
 
@@ -9452,6 +9454,62 @@ static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 	}
 }
 
+static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
+				      struct sched_group *sg,
+				      struct lb_env *env)
+{
+	unsigned long score_on_dst_cpu, before;
+	int busy_cpus;
+	long after;
+
+	if (!sched_ipcc_enabled())
+		return;
+
+	/*
+	 * IPCC scores are only useful during idle load balancing. For now,
+	 * only asym_packing uses IPCC scores.
+	 */
+	if (!(env->sd->flags & SD_ASYM_PACKING) ||
+	    env->idle == CPU_NOT_IDLE)
+		return;
+
+	/*
+	 * IPCC scores are used to break ties only between these types of
+	 * groups.
+	 */
+	if (sgs->group_type != group_fully_busy &&
+	    sgs->group_type != group_asym_packing)
+		return;
+
+	busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	/* No busy CPUs in the group. No tasks to move. */
+	if (!busy_cpus)
+		return;
+
+	score_on_dst_cpu = arch_get_ipcc_score(sgs->min_ipcc, env->dst_cpu);
+
+	/*
+	 * Do not use IPC scores. sgs::ipcc_score_{after, before} will be zero
+	 * and not used.
+	 */
+	if (IS_ERR_VALUE(score_on_dst_cpu))
+		return;
+
+	before = sgs->sum_score;
+	after = before - sgs->min_score;
+
+	/* SMT siblings share throughput. */
+	if (busy_cpus > 1 && sg->flags & SD_SHARE_CPUCAPACITY) {
+		before /= busy_cpus;
+		/* One sibling will become idle after load balance. */
+		after /= busy_cpus - 1;
+	}
+
+	sgs->ipcc_score_after = after + score_on_dst_cpu;
+	sgs->ipcc_score_before = before;
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 				    struct rq *rq)
@@ -9461,6 +9519,13 @@ static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 static void init_rq_ipcc_stats(struct sg_lb_stats *sgs)
 {
 }
+
+static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
+				      struct sched_group *sg,
+				      struct lb_env *env)
+{
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -9620,6 +9685,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
+	if (!local_group)
+		update_sg_lb_stats_scores(sgs, group, env);
+
 	/* Computing avg_load makes sense only when group is overloaded */
 	if (sgs->group_type == group_overloaded)
 		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /

From patchwork Tue Jun 13 04:24:06 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692930
From: Ricardo Neri
Subject: [PATCH v4 08/24] sched/fair: Use IPCC stats to break ties between asym_packing sched groups
Date: Mon, 12 Jun 2023 21:24:06 -0700
Message-Id: <20230613042422.5344-9-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>

update_sd_pick_busiest() selects as busiest the candidate group passed to it
as argument if it has the same priority as the current busiest. Either group
is a good choice.

IPCC statistics reflect the class of work on a scheduling group. Use this
data to break the priority tie between the candidate and current busiest
groups. Pick as busiest the scheduling group that yields a higher IPCC score
after load balancing.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Added a comment to clarify why sched_asym_prefer() needs a tie breaker
   only in update_sd_pick_busiest(). (PeterZ)
 * Renamed functions for accuracy:
     sched_asym_class_prefer() >> sched_asym_ipcc_prefer()
     sched_asym_class_pick() >> sched_asym_ipcc_pick()
 * Reworded commit message for clarity.
---
 kernel/sched/fair.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a51c65c9335f..fb3d793fe9ad 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9510,6 +9510,60 @@ static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
 	sgs->ipcc_score_before = before;
 }
 
+/**
+ * sched_asym_ipcc_prefer - Select a sched group based on its IPCC score
+ * @a:	Load balancing statistics of a sched group
+ * @b:	Load balancing statistics of a second sched group
+ *
+ * Returns: true if @a has a higher IPCC score than @b after load balance.
+ * False otherwise.
+ */
+static bool sched_asym_ipcc_prefer(struct sg_lb_stats *a,
+				   struct sg_lb_stats *b)
+{
+	if (!sched_ipcc_enabled())
+		return false;
+
+	/* @a increases overall throughput after load balance. */
+	if (a->ipcc_score_after > b->ipcc_score_after)
+		return true;
+
+	/*
+	 * If @a and @b yield the same overall throughput, pick @a if
+	 * its current throughput is lower than that of @b.
+	 */
+	if (a->ipcc_score_after == b->ipcc_score_after)
+		return a->ipcc_score_before < b->ipcc_score_before;
+
+	return false;
+}
+
+/**
+ * sched_asym_ipcc_pick - Select a sched group based on its IPCC score
+ * @a:		A scheduling group
+ * @b:		A second scheduling group
+ * @a_stats:	Load balancing statistics of @a
+ * @b_stats:	Load balancing statistics of @b
+ *
+ * Returns: true if @a has the same priority and @a has tasks with IPC classes
+ * that yield higher overall throughput after load balance. False otherwise.
+ */
+static bool sched_asym_ipcc_pick(struct sched_group *a,
+				 struct sched_group *b,
+				 struct sg_lb_stats *a_stats,
+				 struct sg_lb_stats *b_stats)
+{
+	/*
+	 * Only use the class-specific preference selection if both sched
+	 * groups have the same priority.
+	 */
+	if (arch_asym_cpu_priority(a->asym_prefer_cpu) !=
+	    arch_asym_cpu_priority(b->asym_prefer_cpu))
+		return false;
+
+	return sched_asym_ipcc_prefer(a_stats, b_stats);
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 				    struct rq *rq)
@@ -9526,6 +9580,14 @@ static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
 {
 }
 
+static bool sched_asym_ipcc_pick(struct sched_group *a,
+				 struct sched_group *b,
+				 struct sg_lb_stats *a_stats,
+				 struct sg_lb_stats *b_stats)
+{
+	return false;
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -9759,6 +9821,16 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		/* Prefer to move from lowest priority CPU's work */
 		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
 			return false;
+
+		/*
+		 * Unlike other callers of sched_asym_prefer(), here both @sg
+		 * and @sds::busiest have tasks running. When they have equal
+		 * priority, their IPC class scores can be used to select a
+		 * better busiest.
+		 */
+		if (sched_asym_ipcc_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
+			return false;
+
 		break;
 
 	case group_misfit_task:

From patchwork Tue Jun 13 04:24:07 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692452
From: Ricardo Neri
Subject: [PATCH v4 09/24] sched/fair: Use IPCC stats to break ties between fully_busy SMT groups
Date: Mon, 12 Jun 2023 21:24:07 -0700
Message-Id: <20230613042422.5344-10-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>

IPCC statistics are used during idle load balancing. After balancing, one of
the siblings of an SMT core becomes idle. The remaining busy siblings
experience increased throughput. The IPCC statistics provide a measure of
the increased throughput. Use them to select the busiest group between
otherwise identical fully_busy scheduling groups. (The avg_load is not
computed in this case and is zero for both groups.)

IPCC scores are not needed to break ties with non-SMT fully_busy sched
groups. SMT sched groups always need more help.

Add a stub sched_asym_ipcc_prefer() to handle !CONFIG_IPC_CLASSES.
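
For illustration only, the tie-break between two fully_busy SMT groups can
be sketched in standalone C as below. This is a simplified model, not kernel
code: struct ipcc_stats, smt_scores() and ipcc_prefer() are hypothetical
stand-ins for the IPCC fields of struct sg_lb_stats,
update_sg_lb_stats_scores() and sched_asym_ipcc_prefer(), and the sketch
assumes the group is always made of SMT siblings.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IPCC fields of struct sg_lb_stats. */
struct ipcc_stats {
	unsigned long sum_score;	/* sum of the scores of the sampled tasks */
	unsigned long min_score;	/* lowest score among those tasks */
	unsigned long score_before;	/* group score before load balancing */
	long score_after;		/* prospective score after load balancing */
};

/*
 * Busy SMT siblings share the throughput of the core. After balancing,
 * the task with the lowest score leaves and one sibling becomes idle, so
 * the remaining sum is divided by one fewer CPU. The score of the moved
 * task on the destination CPU is then added.
 */
static void smt_scores(struct ipcc_stats *s, int busy_cpus,
		       unsigned long score_on_dst)
{
	unsigned long before = s->sum_score;
	long after = before - s->min_score;

	if (busy_cpus > 1) {
		before /= busy_cpus;
		after /= busy_cpus - 1;
	}

	s->score_before = before;
	s->score_after = after + score_on_dst;
}

/*
 * Prefer @a if it yields a higher score after balancing. On a tie, prefer
 * @a if its current score is the lower of the two.
 */
static bool ipcc_prefer(const struct ipcc_stats *a, const struct ipcc_stats *b)
{
	if (a->score_after > b->score_after)
		return true;

	if (a->score_after == b->score_after)
		return a->score_before < b->score_before;

	return false;
}
```

With two busy siblings per group, task scores {100, 200} in group A and
{200, 200} in group B, and a destination score of 250 for the moved task,
both groups reach the same score after balancing. Group A has the lower
score before balancing, so ipcc_prefer() picks it.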

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * None

Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
---
 kernel/sched/fair.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fb3d793fe9ad..fcec791ede4f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9580,6 +9580,12 @@ static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
 {
 }
 
+static bool sched_asym_ipcc_prefer(struct sg_lb_stats *a,
+				   struct sg_lb_stats *b)
+{
+	return false;
+}
+
 static bool sched_asym_ipcc_pick(struct sched_group *a,
 				 struct sched_group *b,
 				 struct sg_lb_stats *a_stats,
@@ -9861,10 +9867,21 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		if (sgs->avg_load == busiest->avg_load) {
 			/*
 			 * SMT sched groups need more help than non-SMT groups.
-			 * If @sg happens to also be SMT, either choice is good.
 			 */
-			if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
-				return false;
+			if (sds->busiest->flags & SD_SHARE_CPUCAPACITY) {
+				if (!(sg->flags & SD_SHARE_CPUCAPACITY))
+					return false;
+
+				/*
+				 * Between two SMT groups, use IPCC scores to pick the
+				 * one that would improve throughput the most (only
+				 * asym_packing uses IPCC scores for now).
+				 */
+				if (sched_ipcc_enabled() &&
+				    env->sd->flags & SD_ASYM_PACKING &&
+				    sched_asym_ipcc_prefer(busiest, sgs))
+					return false;
+			}
 		}
 
 		break;

From patchwork Tue Jun 13 04:24:08 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692929
From: Ricardo Neri
Subject: [PATCH v4 10/24] sched/fair: Use IPCC scores to select a busiest runqueue
Date: Mon, 12 Jun 2023 21:24:08 -0700
Message-Id: <20230613042422.5344-11-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>

Use IPCC scores to break a tie between two runqueues with the same priority
and number of running tasks: select the runqueue whose last-enqueued task
would get a higher IPC boost when migrated to the destination CPU. (These
tasks are migrated first during load balance.)

For now, restrict the use of IPCC scores to scheduling domains marked with
the SD_ASYM_PACKING flag.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * Do not compute the IPCC stats using the current tasks of runqueues.
   Instead, use the tasks at the back of the queue. These are the tasks
   that will be pulled first during load balance. (Vincent)

Changes since v2:
 * Only use IPCC scores to break ties if the sched domain uses
   asym_packing. (Ionela)
 * Handle errors of arch_get_ipcc_score(). (Ionela)

Changes since v1:
 * Fixed a bug when selecting a busiest runqueue: when comparing two
   runqueues with equal nr_running, we must compute the IPCC score delta
   of both.
 * Renamed local variables to improve the layout of the code block.
   (PeterZ)
 * Used the new interface names.
---
 kernel/sched/fair.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fcec791ede4f..da3e009eef42 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9564,6 +9564,41 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
 	return sched_asym_ipcc_prefer(a_stats, b_stats);
 }
 
+/**
+ * ipcc_score_delta - Get the IPCC score delta wrt the load balance's dst_cpu
+ * @rq:		A runqueue
+ * @env:	Load balancing environment
+ *
+ * Returns: The IPCC score delta that the last task enqueued in @rq would get
+ * if placed in the destination CPU of @env. LONG_MIN to indicate that the
+ * delta should not be used.
+ */
+static long ipcc_score_delta(struct rq *rq, struct lb_env *env)
+{
+	unsigned long score_src, score_dst;
+	unsigned short ipcc;
+
+	if (!sched_ipcc_enabled())
+		return LONG_MIN;
+
+	/* Only asym_packing uses IPCC scores at the moment. */
+	if (!(env->sd->flags & SD_ASYM_PACKING))
+		return LONG_MIN;
+
+	if (rq_last_task_ipcc(env->dst_cpu, rq, &ipcc))
+		return LONG_MIN;
+
+	score_dst = arch_get_ipcc_score(ipcc, env->dst_cpu);
+	if (IS_ERR_VALUE(score_dst))
+		return LONG_MIN;
+
+	score_src = arch_get_ipcc_score(ipcc, cpu_of(rq));
+	if (IS_ERR_VALUE(score_src))
+		return LONG_MIN;
+
+	return score_dst - score_src;
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 				    struct rq *rq)
@@ -9594,6 +9629,11 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
 	return false;
 }
 
+static long ipcc_score_delta(struct rq *rq, struct lb_env *env)
+{
+	return LONG_MIN;
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -10769,6 +10809,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 {
 	struct rq *busiest = NULL, *rq;
 	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
+	long busiest_ipcc_delta = LONG_MIN;
 	unsigned int busiest_nr = 0;
 	int i;
 
@@ -10885,6 +10926,26 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 			if (busiest_nr < nr_running) {
 				busiest_nr = nr_running;
 				busiest = rq;
+
+				/*
+				 * Remember the IPCC score of the busiest
+				 * runqueue. We may need it to break a tie with
+				 * other queues with equal nr_running.
+				 */
+				busiest_ipcc_delta = ipcc_score_delta(busiest, env);
+			/*
+			 * For ties, select @rq if doing so would give its last
+			 * queued task a bigger IPC boost when migrated to
+			 * dst_cpu.
+			 */
+			} else if (busiest_nr == nr_running) {
+				long delta = ipcc_score_delta(rq, env);
+
+				if (busiest_ipcc_delta < delta) {
+					busiest_ipcc_delta = delta;
+					busiest_nr = nr_running;
+					busiest = rq;
+				}
 			}
 			break;
 

From patchwork Tue Jun 13 04:24:09 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 692451
fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854965" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854965" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:48 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 11/24] thermal: intel: hfi: Introduce Intel Thread Director classes Date: Mon, 12 Jun 2023 21:24:09 -0700 Message-Id: <20230613042422.5344-12-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org On Intel hybrid parts, each type of CPU has specific performance and energy efficiency capabilities. The Intel Thread Director technology extends the Hardware Feedback Interface (HFI) to provide performance and energy efficiency data for advanced classes of instructions. Add support to parse per-class capabilities. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. 
Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Rafael J. Wysocki Signed-off-by: Ricardo Neri --- Changes since v3: * Added Acked-by tag from Rafael. Changes since v2: * None Changes since v1: * Removed a now obsolete comment. --- drivers/thermal/intel/intel_hfi.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index c69db6c90869..b41547912fda 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -78,7 +78,7 @@ union cpuid6_edx { * @ee_cap: Energy efficiency capability * * Capabilities of a logical processor in the HFI table. These capabilities are - * unitless. + * unitless and specific to each HFI class. */ struct hfi_cpu_data { u8 perf_cap; @@ -90,7 +90,8 @@ struct hfi_cpu_data { * @perf_updated: Hardware updated performance capabilities * @ee_updated: Hardware updated energy efficiency capabilities * - * Properties of the data in an HFI table. + * Properties of the data in an HFI table. There exists one header per each + * HFI class. 
*/ struct hfi_hdr { u8 perf_updated; @@ -128,16 +129,21 @@ struct hfi_instance { /** * struct hfi_features - Supported HFI features + * @nr_classes: Number of classes supported * @nr_table_pages: Size of the HFI table in 4KB pages * @cpu_stride: Stride size to locate the capability data of a logical * processor within the table (i.e., row stride) + * @class_stride: Stride size to locate a class within the capability + * data of a logical processor or the HFI table header * @hdr_size: Size of the table header * * Parameters and supported features that are common to all HFI instances */ struct hfi_features { + unsigned int nr_classes; size_t nr_table_pages; unsigned int cpu_stride; + unsigned int class_stride; unsigned int hdr_size; }; @@ -334,8 +340,8 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info) } /* - * The format of the HFI table depends on the number of capabilities that the - * hardware supports. Keep a data structure to navigate the table. + * The format of the HFI table depends on the number of capabilities and classes + * that the hardware supports. Keep a data structure to navigate the table. */ static void init_hfi_instance(struct hfi_instance *hfi_instance) { @@ -516,18 +522,30 @@ static __init int hfi_parse_features(void) /* The number of 4KB pages required by the table */ hfi_features.nr_table_pages = edx.split.table_pages + 1; + /* + * Capability fields of an HFI class are grouped together. Classes are + * contiguous in memory. Hence, use the number of supported features to + * locate a specific class. + */ + hfi_features.class_stride = nr_capabilities; + + /* For now, use only one class of the HFI table */ + hfi_features.nr_classes = 1; + /* * The header contains change indications for each supported feature. * The size of the table header is rounded up to be a multiple of 8 * bytes. 
*/ - hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities, 8) * 8; + hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities * + hfi_features.nr_classes, 8) * 8; /* * Data of each logical processor is also rounded up to be a multiple * of 8 bytes. */ - hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities, 8) * 8; + hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities * + hfi_features.nr_classes, 8) * 8; return 0; } From patchwork Tue Jun 13 04:24:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692928 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ABA76C88CB9 for ; Tue, 13 Jun 2023 04:23:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239678AbjFMEXM (ORCPT ); Tue, 13 Jun 2023 00:23:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239331AbjFMEWS (ORCPT ); Tue, 13 Jun 2023 00:22:18 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9CA3310DC; Mon, 12 Jun 2023 21:21:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630111; x=1718166111; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=CrOiqTVb1eZOl3V7qBg0hPpm63j88QwlS+s2uEc7SgM=; b=ZcKREqPJUeFxqdibSGJ45E9M56mzfyilHzA492b5JEC+QuBveCWj7kai 0d2xy++iH3udZbaTT6JG70TBc/bY31XZoefVGz8VfRbvkjankphHeqqG1 ztmeKuNKrR63MEx+DPfvhWJ/vFnZxyqM4K+9d3DdeM3zCNpZ2Es5bPRi8 AuxJUKL/ngxPklZMJu5e9o6nD8kHQlW9b+TdGJiwu58TWDxuxg1MsUCkt DB8kp+qJZA3UCubKLgqCu6LmVXXCWHAwDkMZj5n6cUDiuGE5WY08w5kpI VucH9uq+GsnWDhl/eNqDoEf1ZdQvlVEl1Ugb7T6Ee+9qskTnC1GOlhh9S 
w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222171" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222171" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854968" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854968" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:49 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 12/24] x86/cpufeatures: Add the Intel Thread Director feature definitions Date: Mon, 12 Jun 2023 21:24:10 -0700 Message-Id: <20230613042422.5344-13-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Intel Thread Director (ITD) provides hardware resources to classify the current task. The classification reflects the type of instructions that a task currently executes. ITD extends the Hardware Feedback Interface table to provide performance and energy efficiency capabilities for each of the supported classes of tasks. 
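As a rough illustration of how such a feature flag is encoded and gated, the following userspace sketch models the bit arithmetic. The feature number (14*32+23) and the gating on CONFIG_IPC_CLASSES come from this patch; the helper names, NR_CAP_WORDS, and the runtime "disabled" array are assumptions made for the example and are not kernel interfaces (the kernel applies its disabled masks at build time).

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the patch: capability word 14, bit 23. */
#define X86_FEATURE_ITD	(14 * 32 + 23)

/* Hypothetical size for this sketch; only word 14 matters here. */
#define NR_CAP_WORDS	16

/* Index of the 32-bit capability word that holds @feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit mask of @feature within its capability word. */
static uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/* A feature is usable if hardware reports it and it is not masked out. */
static int feature_usable(const uint32_t *caps, const uint32_t *disabled,
			  unsigned int feature)
{
	unsigned int word = feature_word(feature);
	uint32_t mask = feature_mask(feature);

	assert(word < NR_CAP_WORDS);

	if (disabled[word] & mask)
		return 0;

	return (caps[word] & mask) != 0;
}

/*
 * Model of the gating: without IPC classes configured, the ITD bit is
 * placed in the disabled mask of word 14, so the feature is never used.
 */
static int itd_usable(int hw_reports_itd, int ipc_classes_configured)
{
	uint32_t caps[NR_CAP_WORDS] = { 0 };
	uint32_t disabled[NR_CAP_WORDS] = { 0 };

	if (hw_reports_itd)
		caps[feature_word(X86_FEATURE_ITD)] |= feature_mask(X86_FEATURE_ITD);

	if (!ipc_classes_configured)
		disabled[feature_word(X86_FEATURE_ITD)] |= feature_mask(X86_FEATURE_ITD);

	return feature_usable(caps, disabled, X86_FEATURE_ITD);
}
```

In this model, itd_usable() is true only when the hardware reports ITD and IPC classes are configured, which is the behavior the disabled mask is meant to provide.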
Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * Removed dependency on CONFIG_INTEL_THREAD_DIRECTOR. Instead, depend on CONFIG_IPC_CLASSES. * Added DISABLE_ITD to the correct DISABLE_MASK: 14 instead of 13. --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/kernel/cpu/cpuid-deps.c | 1 + 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cb8ca46213be..98a84cbf4261 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -353,6 +353,7 @@ #define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. 
Preference */ #define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ #define X86_FEATURE_HFI (14*32+19) /* Hardware Feedback Interface */ +#define X86_FEATURE_ITD (14*32+23) /* Intel Thread Director */ /* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */ #define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index fafe9be7a6f4..fad78bd840cd 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -105,6 +105,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_IPC_CLASSES +# define DISABLE_ITD 0 +#else +# define DISABLE_ITD (1 << (X86_FEATURE_ITD & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -123,7 +129,7 @@ DISABLE_CALL_DEPTH_TRACKING) #define DISABLED_MASK12 (DISABLE_LAM) #define DISABLED_MASK13 0 -#define DISABLED_MASK14 0 +#define DISABLED_MASK14 (DISABLE_ITD) #define DISABLED_MASK15 0 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index f6748c8bd647..7a87b823eef3 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_ITD, X86_FEATURE_HFI }, {} }; From patchwork Tue Jun 13 04:24:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692450 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id E4856C88CB9 for ; Tue, 13 Jun 2023 04:23:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239720AbjFMEXa (ORCPT ); Tue, 13 Jun 2023 00:23:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237495AbjFMEWa (ORCPT ); Tue, 13 Jun 2023 00:22:30 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E794C19B4; Mon, 12 Jun 2023 21:21:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630112; x=1718166112; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=sAQ1uZpZ/GyeSOWenRYpSLcOx4BePCSSLjrCLgIxfbo=; b=M1+Q8CFCUKIxt2wmqtf/pdblWM21fJbRHXfb+95muJ4pZjnYvJk1ASr+ clq2H+58OTHabHu4MaFFlimWLxtHO1aOnnf4U/8IYtwx2OMq2m6AzfPgx n9pxpjqCnpPhR3K/jgGSiaEHsXwFr0tXU2rgoPEKXpmAqePXIAWjqBIbr ORSc7bnwJYb26Mf5su8wzeLbYVFYQD8GflqJ3v00U9zykIAHcChhfhGY4 XHKC7UMRmhFLEYnE2Wj8r7KL06XfCNcwQXlm5zBTwJFkL331NLXAOwnUt Rn/ue3bRW7Xy5hP7o5NHEKrWvv06ecf3bNmqg4qb7CcT37r/6Id9DUnLj A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222183" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222183" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854971" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854971" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:50 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. 
Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 13/24] x86/sched: Update the IPC class of the current task Date: Mon, 12 Jun 2023 21:24:11 -0700 Message-Id: <20230613042422.5344-14-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Intel Thread Director provides a classification value based on the type of instructions that a CPU is currently executing. Use this classification to update the IPC class of the current task. The responsibility for configuring and enabling both the Hardware Feedback Interface and Intel Thread Director lies with the HFI driver, but it should not directly handle tasks. Update the HFI driver to read the register that provides the classification result. Implement the arch_update_ipcc() interface of the scheduler under arch/x86 code to update the IPC class of individual tasks. Task classification only makes sense when used along with the HFI driver. Make the HFI driver select CONFIG_IPC_CLASSES. However, users may still select CONFIG_IPC_CLASSES without the HFI driver. Add function stubs to prevent build errors in that case. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C.
Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * Relocated the implementation of arch_update_ipcc() from drivers/thermal to arch/x86. (Rafael) * Moved the definition of MSR_IA32_HW_FEEDBACK_CHAR into this patch. (Reported-by: kernel test robot) * Select CONFIG_IPC_CLASSES when CONFIG_INTEL_HFI_THERMAL is selected to reduce the configuration burden of the user/administrator. (Srinivas) Changes since v2: * Removed the implementation of arch_has_ipc_classes(). Changes since v1: * Adjusted the classification result of Intel Thread Director to start at class 1. Class 0 for the scheduler means that the task is unclassified. * Redefined union hfi_thread_feedback_char_msr to ensure all bit-fields are packed. (PeterZ) * Removed CONFIG_INTEL_THREAD_DIRECTOR. (PeterZ) * Shortened the names of the functions that implement IPC classes. * Removed argument smt_siblings_idle from intel_hfi_update_ipcc().
(PeterZ) --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/topology.h | 11 +++++++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/sched_ipcc.c | 35 +++++++++++++++++++++++++++++ drivers/thermal/intel/Kconfig | 1 + drivers/thermal/intel/intel_hfi.c | 37 +++++++++++++++++++++++++++++++ 6 files changed, 87 insertions(+) create mode 100644 arch/x86/kernel/sched_ipcc.c diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 3aedae61af4f..0bc4ed0ff787 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1108,6 +1108,7 @@ /* Hardware Feedback Interface */ #define MSR_IA32_HW_FEEDBACK_PTR 0x17d0 #define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1 +#define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2 /* x2APIC locked status */ #define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index caf41c4869a0..7d2ed7f7bb8c 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -235,4 +235,15 @@ void init_freq_invariance_cppc(void); #define arch_init_invariance_cppc init_freq_invariance_cppc #endif +#ifdef CONFIG_INTEL_HFI_THERMAL +int intel_hfi_read_classid(u8 *classid); +#else +static inline int intel_hfi_read_classid(u8 *classid) { return -ENODEV; } +#endif + +#ifdef CONFIG_IPC_CLASSES +void intel_update_ipcc(struct task_struct *curr); +#define arch_update_ipcc intel_update_ipcc +#endif + #endif /* _ASM_X86_TOPOLOGY_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4070a01c11b7..f9b9d213ddaa 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_CFI_CLANG) += cfi.o obj-$(CONFIG_CALL_THUNKS) += callthunks.o +obj-$(CONFIG_IPC_CLASSES) += sched_ipcc.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/sched_ipcc.c b/arch/x86/kernel/sched_ipcc.c new file mode 100644 index 000000000000..685e7b3b5375 --- /dev/null +++ 
b/arch/x86/kernel/sched_ipcc.c @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Intel support for scheduler IPC classes + * + * Copyright (c) 2023, Intel Corporation. + * + * Author: Ricardo Neri + * + * On hybrid processors, the architecture differences between types of CPUs + * lead to different number of retired instructions per cycle (IPC). IPCs may + * differ further by classes of instructions. + * + * The scheduler assigns an IPC class to every task with arch_update_ipcc() + * from data that hardware provides. Implement this interface for x86. + * + * See kernel/sched/sched.h for details. + */ + +#include + +#include + +void intel_update_ipcc(struct task_struct *curr) +{ + u8 hfi_class; + + if (intel_hfi_read_classid(&hfi_class)) + return; + + /* + * 0 is a valid classification for Intel Thread Director. A scheduler + * IPCC class of 0 means that the task is unclassified. Adjust. + */ + curr->ipcc = hfi_class + 1; +} diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig index ecd7e07eece0..418db04dc876 100644 --- a/drivers/thermal/intel/Kconfig +++ b/drivers/thermal/intel/Kconfig @@ -109,6 +109,7 @@ config INTEL_HFI_THERMAL depends on CPU_SUP_INTEL depends on X86_THERMAL_VECTOR select THERMAL_NETLINK + select IPC_CLASSES help Select this option to enable the Hardware Feedback Interface. 
If selected, hardware provides guidance to the operating system on diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index b41547912fda..20ee4264dcd4 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -72,6 +72,15 @@ union cpuid6_edx { u32 full; }; +union hfi_thread_feedback_char_msr { + struct { + u64 classid : 8; + u64 __reserved : 55; + u64 valid : 1; + } split; + u64 full; +}; + /** * struct hfi_cpu_data - HFI capabilities per CPU * @perf_cap: Performance capability @@ -171,6 +180,34 @@ static struct workqueue_struct *hfi_updates_wq; #define HFI_UPDATE_INTERVAL HZ #define HFI_MAX_THERM_NOTIFY_COUNT 16 +/** + * intel_hfi_read_classid() - Read the currrent classid + * @classid: Variable to which the classid will be written. + * + * Read the classification that Intel Thread Director has produced when this + * function is called. Thread classification must be enabled before calling + * this function. + * + * Return: 0 if the produced classification is valid. Error otherwise. + */ +int intel_hfi_read_classid(u8 *classid) +{ + union hfi_thread_feedback_char_msr msr; + + /* We should not be here if ITD is not supported. 
*/ + if (!cpu_feature_enabled(X86_FEATURE_ITD)) { + pr_warn_once("task classification requested but not supported!"); + return -ENODEV; + } + + rdmsrl(MSR_IA32_HW_FEEDBACK_CHAR, msr.full); + if (!msr.split.valid) + return -EINVAL; + + *classid = msr.split.classid; + return 0; +} + static void get_hfi_caps(struct hfi_instance *hfi_instance, struct thermal_genl_cpu_caps *cpu_caps) { From patchwork Tue Jun 13 04:24:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692927 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2056CC88CBA for ; Tue, 13 Jun 2023 04:23:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239729AbjFMEXc (ORCPT ); Tue, 13 Jun 2023 00:23:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239335AbjFMEWl (ORCPT ); Tue, 13 Jun 2023 00:22:41 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A9DB1734; Mon, 12 Jun 2023 21:21:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630114; x=1718166114; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Rg6bBLEQ3AC4QDzW6yr8kT0VDnQBk2LkMuNm2FBH7m4=; b=KTPz9mb27IBpJnSggCKlaeBB0xZ5nQdVRqzBHO/LXGSgLzgvw5oY+20H 3i7YxXtGBSbGNrkgmUiwhWt7WS9J+yDThGMx5Kt2XFpvY+z19Lest6Y5i 2P1K81sF9W+WaNsEeCRRB+prch9ICqniNy4O8osHEITmrvdkDa45KfzBO Ovw6JPKfw3BpMZS5pKO8lmTEr9YCevz9s7h18qRrItZzhOqDpcIvpFzzF +dMrOYMxOhd1O1NwVOuIQI3VrXVgrQ31+L8UIK/8UBpSZYPxY9m1fGFxO kCvmogFVXA9jHKVUdkNWTtUhTRq9aSJqk6E+mhTNW71+GrPiCMrQX9g4k w==; X-IronPort-AV: 
E=McAfee;i="6600,9927,10739"; a="358222201" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222201" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854974" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854974" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:52 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 14/24] thermal: intel: hfi: Store per-CPU IPCC scores Date: Mon, 12 Jun 2023 21:24:12 -0700 Message-Id: <20230613042422.5344-15-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The scheduler reads the IPCC scores when balancing load. These reads can occur frequently and originate from many CPUs. Hardware may also occasionally update the HFI table. Controlling access with locks would cause contention. Cache the IPCC scores in separate per-CPU variables that the scheduler can use. Use a seqcount to synchronize memory accesses to these cached values. This eliminates the need for locks, as the sequence counter provides the memory ordering required to prevent the use of stale data. 
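The read/retry protocol described above can be sketched in userspace with C11 atomics. This is only an approximation of what the kernel's seqcount_t provides, under the same single-writer assumption; struct score_cache, NR_CLASSES, cache_write(), cache_read(), and cache_roundtrip() are names invented for this example and do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical class count; the patch uses hfi_features.nr_classes. */
#define NR_CLASSES	4

struct score_cache {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	atomic_int scores[NR_CLASSES];	/* cached per-class scores */
};

/* Writer side. Only one writer may run at a time. */
static void cache_write(struct score_cache *c, const int *new_scores)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);
	int i;

	/* Make the sequence odd, then order it before the data stores. */
	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	for (i = 0; i < NR_CLASSES; i++)
		atomic_store_explicit(&c->scores[i], new_scores[i],
				      memory_order_relaxed);

	/* Publish: the data stores become visible before the even value. */
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Reader side. Lockless; retries if a write overlapped the read. */
static int cache_read(struct score_cache *c, int class_idx)
{
	unsigned int begin, end;
	int score;

	assert(class_idx >= 0 && class_idx < NR_CLASSES);

	do {
		begin = atomic_load_explicit(&c->seq, memory_order_acquire);
		score = atomic_load_explicit(&c->scores[class_idx],
					     memory_order_relaxed);
		/* Order the data load before re-reading the sequence. */
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return score;
}

static struct score_cache demo_cache;

/* Write base, base + 1, ... into the cache and read one class back. */
static int cache_roundtrip(int class_idx, int base)
{
	int vals[NR_CLASSES];
	int i;

	for (i = 0; i < NR_CLASSES; i++)
		vals[i] = base + i;

	cache_write(&demo_cache, vals);

	return cache_read(&demo_cache, class_idx);
}
```

Readers never block the writer and never take a lock; the cost of a concurrent update is at most a repeated read, which fits the low update rate described here.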
The HFI delayed workqueue guarantees that only one CPU writes the cached IPCC scores. The frequency of updates is low (every CONFIG_HZ jiffies or less), and the number of writes per update is on the order of tens. Writes should not starve reads. Only cache IPCC scores in this changeset. A subsequent changeset will use these scores. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Ricardo Neri --- Changes since v3: * As Rafael requested, I reworked the memory ordering of the cached IPCC scores. I selected a seqcount because it is less expensive than a memory barrier, which is not necessary anyway. * Made alloc_hfi_ipcc_scores() return -ENOMEM on allocation failure. (Rafael) * Added a comment to describe hfi_ipcc_scores. (Rafael) Changes since v2: * Only create these per-CPU variables when Intel Thread Director is supported. Changes since v1: * Added this patch. --- drivers/thermal/intel/intel_hfi.c | 66 +++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 20ee4264dcd4..d822ed0bb5c1 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -29,9 +29,11 @@ #include #include #include +#include #include #include #include +#include #include #include #include @@ -180,6 +182,62 @@ static struct workqueue_struct *hfi_updates_wq; #define HFI_UPDATE_INTERVAL HZ #define HFI_MAX_THERM_NOTIFY_COUNT 16 +/* A cache of the HFI perf capabilities for lockless access.
*/ +static int __percpu *hfi_ipcc_scores; +/* Sequence counter for hfi_ipcc_scores */ +static seqcount_t hfi_ipcc_seqcount = SEQCNT_ZERO(hfi_ipcc_seqcount); + +static int alloc_hfi_ipcc_scores(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_ITD)) + return 0; + + hfi_ipcc_scores = __alloc_percpu(sizeof(*hfi_ipcc_scores) * + hfi_features.nr_classes, + sizeof(*hfi_ipcc_scores)); + + return hfi_ipcc_scores ? 0 : -ENOMEM; +} + +static void set_hfi_ipcc_scores(struct hfi_instance *hfi_instance) +{ + int cpu; + + if (!cpu_feature_enabled(X86_FEATURE_ITD)) + return; + + /* + * Serialize with writes to the HFI table. It also protects the write + * loop against seqcount readers running in interrupt context. + */ + raw_spin_lock_irq(&hfi_instance->table_lock); + /* + * The seqcount implies store-release semantics to order stores with + * lockless loads from the seqcount read side. It also implies a + * compiler barrier. + */ + write_seqcount_begin(&hfi_ipcc_seqcount); + for_each_cpu(cpu, hfi_instance->cpus) { + int c, *scores; + s16 index; + + index = per_cpu(hfi_cpu_info, cpu).index; + scores = per_cpu_ptr(hfi_ipcc_scores, cpu); + + for (c = 0; c < hfi_features.nr_classes; c++) { + struct hfi_cpu_data *caps; + + caps = hfi_instance->data + + index * hfi_features.cpu_stride + + c * hfi_features.class_stride; + scores[c] = caps->perf_cap; + } + } + + write_seqcount_end(&hfi_ipcc_seqcount); + raw_spin_unlock_irq(&hfi_instance->table_lock); +} + /** * intel_hfi_read_classid() - Read the currrent classid * @classid: Variable to which the classid will be written. 
@@ -275,6 +333,8 @@ static void update_capabilities(struct hfi_instance *hfi_instance) thermal_genl_cpu_capability_event(cpu_count, &cpu_caps[i]); kfree(cpu_caps); + + set_hfi_ipcc_scores(hfi_instance); out: mutex_unlock(&hfi_instance_lock); } @@ -618,8 +678,14 @@ void __init intel_hfi_init(void) if (!hfi_updates_wq) goto err_nomem; + if (alloc_hfi_ipcc_scores()) + goto err_ipcc; + return; +err_ipcc: + destroy_workqueue(hfi_updates_wq); + err_nomem: for (j = 0; j < i; ++j) { hfi_instance = &hfi_instances[j]; From patchwork Tue Jun 13 04:24:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39AF2C88CBA for ; Tue, 13 Jun 2023 04:23:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239407AbjFMEXw (ORCPT ); Tue, 13 Jun 2023 00:23:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239599AbjFMEXD (ORCPT ); Tue, 13 Jun 2023 00:23:03 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE74F10E3; Mon, 12 Jun 2023 21:21:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630117; x=1718166117; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=j+E1XdJn/u7s3/DMQyQTEanAvog6+RNqIczdbl+b8aQ=; b=J+1pZGctWkvLormUlInL4dgSQaYhGF30WYolG3/x+lC3LhMwbB6uWn20 YdGEC+532sTVZLo8PmVqvhIlbOol5oKC6pNm8ZH53dxWovgtude0FIcSV t0Q3q6Y8ZlD8OV2lZzsz9GSISd9KS5sNazMEnyvdHs74nxM89aohlKpHm YrAYY2k4pstPVPBY3teMzeGtf0qctteTelOBzFJEuBmiFoSUFgbee2jBp 
gbsEirYyBLy+BPGNLkFu9v1tkl5pRJrQn5EtDZgvfiaFkH7w5SLahScLO 8PbOOJx5EUPYhk1B62NGXQHfMzglZyO/RFreAW2W+RWK/WT5pRUkK1NrP Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222210" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222210" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854977" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854977" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:53 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 15/24] thermal: intel: hfi: Report the IPC class score of a CPU Date: Mon, 12 Jun 2023 21:24:13 -0700 Message-Id: <20230613042422.5344-16-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Implement the arch_get_ipcc_score() interface of the scheduler. Use the performance capabilities of the extended Hardware Feedback Interface table as the IPC score. Use the cached per-CPU IPCC scores. A seqcount provides lockless access and the required memory ordering to avoid stale data. 
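To make the class numbering and the error convention concrete, here is a small userspace sketch. It assumes a fixed table of scores in place of the cached per-CPU data; demo_scores, NR_HFI_CLASSES, hfi_to_sched_class(), and get_ipcc_score() are hypothetical names for this example only. IS_ERR_VALUE() and MAX_ERRNO are redefined locally to mimic the kernel macros of the same names.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins that mimic the kernel's error-value convention. */
#define MAX_ERRNO		4095
#define IS_ERR_VALUE(x)		((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Scheduler class 0 means the task has not been classified. */
#define IPC_CLASS_UNCLASSIFIED	0

/* Hypothetical class count and scores for this sketch. */
#define NR_HFI_CLASSES		4

static const int demo_scores[NR_HFI_CLASSES] = { 10, 20, 30, 40 };

/* HFI classes start at 0; scheduler IPC classes start at 1. */
static unsigned short hfi_to_sched_class(unsigned char hfi_class)
{
	return hfi_class + 1;
}

/*
 * Look up the score of scheduler class @ipcc. Errors are returned as
 * negative errno values stored in an unsigned long, so callers must
 * test the result with IS_ERR_VALUE() before using it.
 */
static unsigned long get_ipcc_score(unsigned short ipcc)
{
	if (ipcc == IPC_CLASS_UNCLASSIFIED)
		return -EINVAL;

	if (ipcc >= NR_HFI_CLASSES + 1)
		return -EINVAL;

	/* @ipcc is never 0 here; undo the +1 offset. */
	return demo_scores[ipcc - 1];
}
```

Returning the error inside the unsigned long keeps the interface to one return value, at the cost of requiring every caller to check IS_ERR_VALUE() before doing arithmetic on the score.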
Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri Acked-by: Rafael J. Wysocki --- Changes since v3: * Ordered memory loads using a seqcount. * Removed local variable hfi_class from intel_hfi_get_ipcc_score(). (Rafael). Changes since v2: * None Changes since v1: * Adjusted the returned HFI class (which starts at 0) to match the scheduler IPCC class (which starts at 1). (PeterZ) * Used the new interface names. --- arch/x86/include/asm/topology.h | 4 ++++ drivers/thermal/intel/intel_hfi.c | 40 +++++++++++++++++++++++++++++-- 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index 7d2ed7f7bb8c..8dd328cdb5cf 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -237,13 +237,17 @@ void init_freq_invariance_cppc(void); #ifdef CONFIG_INTEL_HFI_THERMAL int intel_hfi_read_classid(u8 *classid); +unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu); #else static inline int intel_hfi_read_classid(u8 *classid) { return -ENODEV; } +static inline unsigned long +intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu) { return -ENODEV; } #endif #ifdef CONFIG_IPC_CLASSES void intel_update_ipcc(struct task_struct *curr); #define arch_update_ipcc intel_update_ipcc +#define arch_get_ipcc_score intel_hfi_get_ipcc_score #endif #endif /* _ASM_X86_TOPOLOGY_H */ diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index d822ed0bb5c1..fec2c01fda1b 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -199,6 +199,42 @@ static int alloc_hfi_ipcc_scores(void) 
 	return hfi_ipcc_scores ? 0 : -ENOMEM;
 }
 
+unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu)
+{
+	int *scores, score;
+	unsigned long seq;
+
+	scores = per_cpu_ptr(hfi_ipcc_scores, cpu);
+	if (!scores)
+		return -ENODEV;
+
+	if (cpu < 0 || cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (ipcc == IPC_CLASS_UNCLASSIFIED)
+		return -EINVAL;
+
+	/*
+	 * Scheduler IPC classes start at 1. HFI classes start at 0.
+	 * See the note in intel_update_ipcc().
+	 */
+	if (ipcc >= hfi_features.nr_classes + 1)
+		return -EINVAL;
+
+	/*
+	 * The seqcount implies load-acquire semantics to order loads with
+	 * lockless stores of the write side in set_hfi_ipcc_scores(). It
+	 * also implies a compiler barrier.
+	 */
+	do {
+		seq = read_seqcount_begin(&hfi_ipcc_seqcount);
+		/* @ipcc is never 0. */
+		score = scores[ipcc - 1];
+	} while (read_seqcount_retry(&hfi_ipcc_seqcount, seq));
+
+	return score;
+}
+
 static void set_hfi_ipcc_scores(struct hfi_instance *hfi_instance)
 {
 	int cpu;
@@ -213,8 +249,8 @@ static void set_hfi_ipcc_scores(struct hfi_instance *hfi_instance)
 	raw_spin_lock_irq(&hfi_instance->table_lock);
 	/*
 	 * The seqcount implies store-release semantics to order stores with
-	 * lockless loads from the seqcount read side. It also implies a
-	 * compiler barrier.
+	 * lockless loads from the seqcount read side in
+	 * intel_hfi_get_ipcc_score(). It also implies a compiler barrier.
*/ write_seqcount_begin(&hfi_ipcc_seqcount); for_each_cpu(cpu, hfi_instance->cpus) { From patchwork Tue Jun 13 04:24:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692926 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D38D1C88CB9 for ; Tue, 13 Jun 2023 04:24:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239662AbjFMEYY (ORCPT ); Tue, 13 Jun 2023 00:24:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239417AbjFMEXK (ORCPT ); Tue, 13 Jun 2023 00:23:10 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 796341FC1; Mon, 12 Jun 2023 21:22:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630120; x=1718166120; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=GW8okMpPu8O5W0GUNzkAELrhDTCKsrn++d0PdMiQWnw=; b=XM9btBK+ABby0KiBW8RJMZSdDMT9F21wimEwyTS5xITb6pwDsY7eFH6t WabTHB5yiVraY3WiYkxeA9NgntUD+jVvL8teAf9mEEEj0Zwh5VjrpcviJ OeiNpoojSuWI5pu6gKbP5Az4H/0F17nkpQ/xSp286WD5XtjubCMU8PtQq cdLCHjJarr1w9x3caCBU4MnodGvE6zFpgG+cWzj/wtbMJMUvPZ2bqwK9c aJxGaoEUzve3MkKOW7DPh1eYsPTluWcdyrFzUeEbwYqyoSasWwxCVWqK2 PhikGJj+HS2+n5C9lcJUsvGdr3hgwPInyRFJbuRj8y0Fo8val4nxSqMp9 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222218" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222218" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: 
E=McAfee;i="6600,9927,10739"; a="661854980"
X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854980"
Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:54 -0700
From: Ricardo Neri
To: "Peter Zijlstra (Intel)", Juri Lelli, Vincent Guittot
Cc: Ricardo Neri, "Ravi V. Shankar", Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown, Mel Gorman, "Rafael J. Wysocki", Srinivas Pandruvada, Steven Rostedt, Tim Chen, Valentin Schneider, Lukasz Luba, Ionela Voinescu, Zhao Liu, "Yuan, Perry", x86@kernel.org, "Joel Fernandes (Google)", linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri, "Tim C . Chen", Zhao Liu
Subject: [PATCH v4 16/24] thermal: intel: hfi: Define a default class for unclassified tasks
Date: Mon, 12 Jun 2023 21:24:14 -0700
Message-Id: <20230613042422.5344-17-ricardo.neri-calderon@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

A task may be unclassified if it has been recently created, spends most
of its lifetime sleeping, or hardware has not provided a classification.

Most tasks will eventually be classified as the scheduler's IPC class 1
(HFI class 0). This class corresponds to the capabilities in the legacy,
classless HFI table.

IPC class 1 is a reasonable choice until hardware provides an actual
classification. Meanwhile, the scheduler will place classes of tasks
with higher IPC scores on higher-performance CPUs.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Rafael J. Wysocki
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * Added Acked-by tag from Rafael.

Changes since v2:
 * None

Changes since v1:
 * Now the default class is 1.
---
 drivers/thermal/intel/intel_hfi.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index fec2c01fda1b..e23c49da02ee 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -182,6 +182,19 @@ static struct workqueue_struct *hfi_updates_wq;
 #define HFI_UPDATE_INTERVAL	HZ
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
+/*
+ * A task may be unclassified if it has been recently created, spends most of
+ * its lifetime sleeping, or hardware has not provided a classification.
+ *
+ * Most tasks will be classified as scheduler's IPC class 1 (HFI class 0)
+ * eventually. Meanwhile, the scheduler will place classes of tasks with higher
+ * IPC scores on higher-performance CPUs.
+ *
+ * IPC class 1 is a reasonable choice. It matches the performance capability
+ * of the legacy, classless HFI table.
+ */
+#define HFI_UNCLASSIFIED_DEFAULT	1
+
 /* A cache of the HFI perf capabilities for lockless access. */
 static int __percpu *hfi_ipcc_scores;
 /* Sequence counter for hfi_ipcc_scores */
@@ -212,7 +225,7 @@ unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu)
 		return -EINVAL;
 
 	if (ipcc == IPC_CLASS_UNCLASSIFIED)
-		return -EINVAL;
+		ipcc = HFI_UNCLASSIFIED_DEFAULT;
 
 	/*
 	 * Scheduler IPC classes start at 1. HFI classes start at 0.
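
As an illustration of the class numbering that this patch relies on, the mapping from a scheduler IPC class to an HFI class index could be sketched as below. This is a standalone sketch, not the driver code: demo_ipcc_to_hfi_index() and the DEMO_ constants are hypothetical names, and the value 0 for "unclassified" is taken from the series' own statement that scheduler class 0 means the task is unclassified.

```c
#include <errno.h>

#define DEMO_IPC_CLASS_UNCLASSIFIED	0	/* scheduler: no class yet */
#define DEMO_UNCLASSIFIED_DEFAULT	1	/* scheduler class 1 is HFI class 0 */

/*
 * Map a scheduler IPC class (1-based, 0 means unclassified) to an index
 * into a 0-based array of HFI class scores.
 * Returns -EINVAL if the class is beyond the supported range.
 */
static int demo_ipcc_to_hfi_index(unsigned short ipcc, unsigned int nr_classes)
{
	/* Unclassified tasks borrow the default class instead of failing. */
	if (ipcc == DEMO_IPC_CLASS_UNCLASSIFIED)
		ipcc = DEMO_UNCLASSIFIED_DEFAULT;

	if (ipcc >= nr_classes + 1)
		return -EINVAL;

	return ipcc - 1;
}
```

With four supported classes, scheduler classes 0 and 1 both map to index 0, class 4 maps to index 3, and class 5 is rejected.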
From patchwork Tue Jun 13 04:24:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692448 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34E31C88CBA for ; Tue, 13 Jun 2023 04:24:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239417AbjFMEYb (ORCPT ); Tue, 13 Jun 2023 00:24:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239331AbjFMEXV (ORCPT ); Tue, 13 Jun 2023 00:23:21 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04BBA1FCE; Mon, 12 Jun 2023 21:22:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630121; x=1718166121; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=WzB7EIesme5f3MZe0K+4oIdjGvzQwrDPxjYwpN58ncc=; b=EOLMGIUz+iZigu//V0TNAjS38rUotldulUFi6kmtkyOftaSfBbfZiLel 1b/+3eIqkhCTBC75+NIYMdX7N+4ET6PMPPwoszeH5P/XoTgKL+OPBBhYh ssyR/Tq1H/HTA2RQduf1uIPdJWB6Zng+Fofi9SINHCpBQf0Pup3EeEm6Z U6uUSb/XUlNTB/NJLzcU1HQ1IedTv/vXp58fXmQoJyxaUQ7TqYPEmcFm5 BpLFTN4pMblYv0l6raZktbTPSY7AhHaQRBuR7mTQXEQeBI9Mm+um5GlsD O3P9GpkYVhmKxCQ7Zi0pefBJaUplpzXHKpaX2TFTYZdwTZ1TLTe/tkpv4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222229" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222229" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854984" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; 
d="scan'208";a="661854984" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:56 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 17/24] thermal: intel: hfi: Enable the Intel Thread Director Date: Mon, 12 Jun 2023 21:24:15 -0700 Message-Id: <20230613042422.5344-18-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Enable Intel Thread Director from the CPU hotplug callback: globally from CPU0 and then enable the thread-classification hardware in each logical processor individually. Also, initialize the number of classes supported. Let the scheduler know that it can start using IPC classes. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Rafael J. Wysocki # intel_hfi.c Signed-off-by: Ricardo Neri --- Changes since v3: * Dropped the definition of MSR_IA32_HW_FEEDBACK_CHAR. It is now added in patch 14 to fix a build break during bisection. (Reported-by: kernel test robot ). 
* Added Acked-by from Rafael. Changes since v2: * Use the new sched_enable_ipc_classes() interface to enable the use of IPC classes in the scheduler. Changes since v1: * None --- arch/x86/include/asm/msr-index.h | 1 + drivers/thermal/intel/intel_hfi.c | 41 +++++++++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 0bc4ed0ff787..7823b87bf383 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1108,6 +1108,7 @@ /* Hardware Feedback Interface */ #define MSR_IA32_HW_FEEDBACK_PTR 0x17d0 #define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1 +#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4 #define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2 /* x2APIC locked status */ diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index e23c49da02ee..75bf18dc7f51 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -50,6 +51,8 @@ /* Hardware Feedback Interface MSR configuration bits */ #define HW_FEEDBACK_PTR_VALID_BIT BIT(0) #define HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT BIT(0) +#define HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT BIT(1) +#define HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT BIT(0) /* CPUID detection and enumeration definitions for HFI */ @@ -74,6 +77,15 @@ union cpuid6_edx { u32 full; }; +union cpuid6_ecx { + struct { + u32 dont_care0:8; + u32 nr_classes:8; + u32 dont_care1:16; + } split; + u32 full; +}; + union hfi_thread_feedback_char_msr { struct { u64 classid : 8; @@ -541,6 +553,11 @@ void intel_hfi_online(unsigned int cpu) init_hfi_cpu_index(info); + if (cpu_feature_enabled(X86_FEATURE_ITD)) { + msr_val = HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val); + } + /* * Now check if the HFI instance of the package/die of @cpu has been * initialized (by checking its header). 
In such case, all we have to @@ -596,8 +613,22 @@ void intel_hfi_online(unsigned int cpu) */ rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT; + + if (cpu_feature_enabled(X86_FEATURE_ITD)) + msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + /* + * We have all we need to support IPC classes. Task classification is + * now working. + * + * All class scores are zero until after the first HFI update. That is + * OK. The scheduler queries these scores at every load balance. + */ + if (cpu_feature_enabled(X86_FEATURE_ITD)) + sched_enable_ipc_classes(); + unlock: mutex_unlock(&hfi_instance_lock); return; @@ -675,8 +706,14 @@ static __init int hfi_parse_features(void) */ hfi_features.class_stride = nr_capabilities; - /* For now, use only one class of the HFI table */ - hfi_features.nr_classes = 1; + if (cpu_feature_enabled(X86_FEATURE_ITD)) { + union cpuid6_ecx ecx; + + ecx.full = cpuid_ecx(CPUID_HFI_LEAF); + hfi_features.nr_classes = ecx.split.nr_classes; + } else { + hfi_features.nr_classes = 1; + } /* * The header contains change indications for each supported feature. 
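
For reference, the nr_classes field that union cpuid6_ecx describes occupies bits 15:8 of the ECX value. A minimal userspace sketch of the same decoding follows; demo_nr_classes_from_ecx() and demo_nr_classes() are hypothetical helpers, and the ECX value is passed in as a parameter rather than read with CPUID.

```c
#include <stdint.h>

/*
 * Extract bits 15:8 of a CPUID leaf 6 ECX value, the field that the patch
 * names nr_classes. Shifts are used instead of a bit-field union so that
 * the result does not depend on the compiler's bit-field layout.
 */
static unsigned int demo_nr_classes_from_ecx(uint32_t ecx)
{
	return (ecx >> 8) & 0xff;
}

/* Mirror of the fallback in the patch: without ITD, use a single class. */
static unsigned int demo_nr_classes(int has_itd, uint32_t ecx)
{
	return has_itd ? demo_nr_classes_from_ecx(ecx) : 1;
}
```

The surrounding bits are ignored, matching the dont_care0 and dont_care1 members of the union.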
From patchwork Tue Jun 13 04:24:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692925 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA154C77B7A for ; Tue, 13 Jun 2023 04:25:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239556AbjFMEZB (ORCPT ); Tue, 13 Jun 2023 00:25:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239412AbjFMEX5 (ORCPT ); Tue, 13 Jun 2023 00:23:57 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D6342122; Mon, 12 Jun 2023 21:22:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630128; x=1718166128; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=t17FqAhGdxt78i227uUsMAyQGoFX7gC6V9+9xUvlqks=; b=cw8fJ7dm2IkYXTUGRdF06uoKm7vBrYB7wLh5ROagYUOkbyGz2pLqKTuM Jxj+d2QGHWWR0KEvuO4gYvSvXzNpVwaBzJlJ8un8c+ma3lU2bLK6qawM4 XOfWk0UapCEYggK2l5UwxM9VEzGdnFnP6vZoJFHMthAso84fUQ7wMico9 +hk76Q++a2PPxAiae1FJBzZcg6y71ObVfdkQhzlni2Y0AV5cxkVJJ2hRO mvJd4x+KR0gLGwinQs1q5RBVCKyfzBxpduVOzfEiCkcAfUqEsgNIvyPWy Mqwytn/GT9x+ShT2Z6PXuWiBZp8hONg1lH1+Qo+Js5dviO/HVUdv8qiwL g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222240" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222240" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854987" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; 
d="scan'208";a="661854987" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:57 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 18/24] sched/task_struct: Add helpers for IPC classification Date: Mon, 12 Jun 2023 21:24:16 -0700 Message-Id: <20230613042422.5344-19-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The raw classification that hardware provides for a task may not be directly usable by the scheduler: the classification may change too frequently or architecture-specific implementations may need to consider additional factors. For instance, some processors with Intel Thread Director need to consider the state of the SMT siblings of a core. Provide per-task helper variables that architectures can use to postprocess the classification that hardware provides. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. 
Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * Used bit-fields to fit all the IPC class data in 4 bytes. (PeterZ) * Shortened names of the helpers. * Renamed helpers with the ipcc_ prefix. * Reworded commit message for clarity --- include/linux/sched.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 0e1c38ad09c2..719147460ca8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1541,7 +1541,17 @@ struct task_struct { * A hardware-defined classification of task that reflects but is * not identical to the number of instructions per cycle. */ - unsigned short ipcc; + unsigned int ipcc : 9; + /* + * A candidate classification that arch-specific implementations + * qualify for correctness. + */ + unsigned int ipcc_tmp : 9; + /* + * Counter to filter out transient candidate classifications + * of a task. 
+ */ + unsigned int ipcc_cntr : 14; #endif /* From patchwork Tue Jun 13 04:24:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692447 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8F0EC77B7A for ; Tue, 13 Jun 2023 04:25:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239483AbjFMEZO (ORCPT ); Tue, 13 Jun 2023 00:25:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239671AbjFMEYb (ORCPT ); Tue, 13 Jun 2023 00:24:31 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C5112683; Mon, 12 Jun 2023 21:22:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630138; x=1718166138; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=LbqLOEgq0/wctp/fT6BSArteOuKzDX/IB/thMI/XJtY=; b=bgDoqt1VsD6pssc8367BYTft6Hmvz421kZqef/PieLoYxdu6RDdRFDtx GmADbcinDPVSEdHuGOb+VKRZ6I0+7MHQ4NYCOMLvLcxoM5b6Fv12Xs/U0 H69bAUw9mSZKm0oupZrHhAqCosKIgIosrdmNF50kcfnO7cfsNJysLaxyO zq/f5Zjr2X36hOxUMxT+/ulclKNUifyMO26gP1PVF9SpQsrbGfNNWB2b1 4SdgYm4u1/AO1lvbetPp+4JQRLVeW4H/4TRwUcyUBADpw61EDZ7cppntM DNsH3Pk8kIxDjairoerntOC9VvQFyiOuU1L+2nH1AGOs0EL5Oq4ZWJOCF w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222257" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222257" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:21:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854991" 
X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854991" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:58 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 19/24] sched/core: Initialize helpers of task classification Date: Mon, 12 Jun 2023 21:24:17 -0700 Message-Id: <20230613042422.5344-20-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Just as tasks start life unclassified, initialize the classification auxiliary variables. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. 
Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * None --- kernel/sched/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 876396b1d077..9c28be596938 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4502,6 +4502,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) p->se.vruntime = 0; #ifdef CONFIG_IPC_CLASSES p->ipcc = IPC_CLASS_UNCLASSIFIED; + p->ipcc_tmp = IPC_CLASS_UNCLASSIFIED; + p->ipcc_cntr = 0; #endif INIT_LIST_HEAD(&p->se.group_node); From patchwork Tue Jun 13 04:24:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692924 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01C6AC88CBA for ; Tue, 13 Jun 2023 04:25:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239796AbjFMEZT (ORCPT ); Tue, 13 Jun 2023 00:25:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45646 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239825AbjFMEYt (ORCPT ); Tue, 13 Jun 2023 00:24:49 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E1172693; Mon, 12 Jun 2023 21:22:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630141; x=1718166141; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=40EcOZxxZvEM9rJzNTuBarR/nR+bap62+RifKT4WIh0=; 
b=EWVioQCCzCHgSWQrhh+A1sXBGbS2Nh213ZPWTe5WDsdN5wgFBXpdYe+O ewrEX2eqDfWO4xh+PdQqd8PSkthtQDW2ecnGhf/j/uj1ix+Ny1r8IYTjI zFote07seVZL+XdQOJIQgxdNbwSrNox8iQICdZdHgV1OKHe3y6/x1rUN6 SYk6X+iu7J07Yi/Yw+A6jcskjViQG/iFZVdtn0AkRzkCE9kfeA3PNPmsl XM7I0SJ2/kvq5Fz8BePXNWTrKsuv5fna3VyAUbGzr/qKaKkq3FyGS+lO7 uYOOZjut80D9E6mLJlZA6sQ6ekiziIYQdvDz+trzrwgrJw+ZLQsknpqzj Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222265" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222265" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:22:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854995" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854995" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:21:59 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 20/24] sched/fair: Introduce sched_smt_siblings_idle() Date: Mon, 12 Jun 2023 21:24:18 -0700 Message-Id: <20230613042422.5344-21-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X86 needs to know the idle state of the SMT siblings of a CPU to improve the accuracy of IPCC classification. 
X86 implements support for IPC classes in the thermal HFI driver.

Rename is_core_idle() to sched_smt_siblings_idle() and make it available
outside the scheduler code.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Len Brown
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
is_core_idle() is no longer an inline function after this patch. To rule
out performance degradation, I compared the execution time of the inline
and non-inline versions on a 4-socket Cascade Lake system using the NUMA
stressor of stress-ng.

I used this test command:

 $ stress-ng --numa 1500 -t 10m

During the test, is_core_idle() was called ~200,000 times. To measure the
execution time, I recorded the value of the TSC before and after calling
is_core_idle() and calculated the difference.

I arbitrarily removed outliers (defined as any delta larger than 5000
counts). This required removing ~40 samples.

The table below summarizes the difference in execution time. All values
are expressed in TSC counts, except for the standard deviation, which is
expressed as a percentage of the average.

                      Average  Median  Std(%)  Mode
 TSCdelta inline       668.76     626   67.24    42
 TSCdelta non-inline   677.64     624   67.67    46

The metrics show that both the inline and non-inline versions exhibit
similar performance characteristics.
---
Changes since v3:
 * None

Changes since v2:
 * Brought back this previously dropped patch.
 * Profiled inline vs non-inline is_core_idle(). I found no major
   penalty.
 * Merged is_core_idle() and sched_smt_siblings_idle() into a single
   function. (Dietmar)

Changes since v1:
 * Dropped this patch.
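
The post-processing of the TSC deltas described in the notes above (drop any delta larger than 5000 counts, then summarize) could be reproduced with a few helpers like the ones below. This is a hedged sketch, not the script that was actually used: all demo_ names are hypothetical, and only the outlier threshold comes from the notes.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_OUTLIER_LIMIT 5000ULL	/* deltas above this are discarded */

static int demo_cmp_u64(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *)a;
	unsigned long long y = *(const unsigned long long *)b;

	return (x > y) - (x < y);
}

/* Drop outliers in place. Returns the number of samples kept. */
static size_t demo_drop_outliers(unsigned long long *d, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (d[i] <= DEMO_OUTLIER_LIMIT)
			d[kept++] = d[i];

	return kept;
}

static double demo_mean(const unsigned long long *d, size_t n)
{
	double sum = 0.0;

	if (n == 0)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		sum += (double)d[i];

	return sum / (double)n;
}

/* Sorts @d as a side effect. */
static double demo_median(unsigned long long *d, size_t n)
{
	if (n == 0)
		return 0.0;
	qsort(d, n, sizeof(*d), demo_cmp_u64);
	if (n % 2)
		return (double)d[n / 2];

	return ((double)d[n / 2 - 1] + (double)d[n / 2]) / 2.0;
}
```

Filtering before computing the average matters here because a handful of very large deltas (for example, from an interrupt landing between the two TSC reads) would otherwise dominate it.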
--- include/linux/sched.h | 2 ++ kernel/sched/fair.c | 13 ++++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 719147460ca8..986344ebf2f6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2461,4 +2461,6 @@ static inline void sched_core_fork(struct task_struct *p) { } extern void sched_set_stop_task(int cpu, struct task_struct *stop); +extern bool sched_smt_siblings_idle(int cpu); + #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index da3e009eef42..a52589a6c561 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1064,7 +1064,14 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se) * Scheduling class queueing methods: */ -static inline bool is_core_idle(int cpu) +/** + * sched_smt_siblings_idle - Check whether SMT siblings of a CPU are idle + * @cpu: The CPU to check + * + * Returns true if all the SMT siblings of @cpu are idle or @cpu does not have + * SMT siblings. The idle state of @cpu is not considered. + */ +bool sched_smt_siblings_idle(int cpu) { #ifdef CONFIG_SCHED_SMT int sibling; @@ -1767,7 +1774,7 @@ static inline int numa_idle_core(int idle_core, int cpu) * Prefer cores instead of packing HT siblings * and triggering future load balancing. 
*/ - if (is_core_idle(cpu)) + if (sched_smt_siblings_idle(cpu)) idle_core = cpu; return idle_core; @@ -9652,7 +9659,7 @@ static bool sched_use_asym_prio(struct sched_domain *sd, int cpu) if (!sched_smt_active()) return true; - return sd->flags & SD_SHARE_CPUCAPACITY || is_core_idle(cpu); + return sd->flags & SD_SHARE_CPUCAPACITY || sched_smt_siblings_idle(cpu); } /** From patchwork Tue Jun 13 04:24:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692446 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78C21C77B7A for ; Tue, 13 Jun 2023 04:25:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239649AbjFMEZu (ORCPT ); Tue, 13 Jun 2023 00:25:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45934 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239412AbjFMEZF (ORCPT ); Tue, 13 Jun 2023 00:25:05 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CFD7E294F; Mon, 12 Jun 2023 21:22:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630159; x=1718166159; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=kkx8+i7/ujXbIWdrAp+zwYJqUDlBUvhBgxYJWBRnjXc=; b=LCZYlE7YCGhLrx8+gTEekjH/iUW6Fl8f6LFKw/Cf4wmSbr/rGgnR0G93 y3rIK9nc8bwPZroUcpO1SO6Fq/8rO05NrpKhBrI418PGmd5TCOG6HGPUS +mFNwnmC8ZZgYdz8CyZc5Zo8mSyHPFB0MgDnX1mIdMCK/vCPcks1aImrK eSsoJyubC8VhWD95szP6QcAwCug0FrQQl6QNfA8cs2Tio9V8oq9NnZal4 OaI/MDOblMD2p8fC+Ma+hq2Y6TOP2X2X6/V/sJRfHzRata0k035ySCxR4 fWKmGilb9Esw+WEVovZARaHNI6ITrHetEoxwub6JCgxGHLwr2oZYpuZwM A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; 
a="358222277" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222277" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:22:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661854998" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661854998" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:22:01 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 21/24] x86/sched/ipcc: Implement model-specific checks for task classification Date: Mon, 12 Jun 2023 21:24:19 -0700 Message-Id: <20230613042422.5344-22-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org In Alder Lake and Raptor Lake, the result of thread classification is more accurate when only one SMT sibling is busy. Classification results for class 2 and 3 are always reliable. Changing the classification of a task too frequently may lead to unnecessary migrations. Only update the class of a task if it is considered accurate and has been constant during four consecutive user ticks. 
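The debounce rule described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: `struct task_ipcc` is a hypothetical stand-in for the `ipcc`, `ipcc_tmp` and `ipcc_cntr` members of task_struct, and `class_after_samples()` is a hypothetical helper that feeds one sample per user tick.

```c
#include <stddef.h>
#include <stdint.h>

#define CLASS_DEBOUNCER_SKIPS 4

/* Hypothetical stand-in for the IPCC members of task_struct. */
struct task_ipcc {
	uint8_t ipcc;       /* class the scheduler uses; 0 means unclassified */
	uint8_t ipcc_tmp;   /* class seen on the previous tick */
	uint16_t ipcc_cntr; /* consecutive ticks on which ipcc_tmp was seen */
};

/* Commit a class only after it repeats for CLASS_DEBOUNCER_SKIPS ticks. */
static void debounce_and_update_class(struct task_ipcc *p, uint8_t new_ipcc)
{
	if (p->ipcc_tmp != new_ipcc)
		p->ipcc_cntr = 1;          /* class changed: restart counting */
	else if (p->ipcc_cntr + 1 < CLASS_DEBOUNCER_SKIPS)
		p->ipcc_cntr++;            /* same class, not stable enough yet */
	else
		p->ipcc = new_ipcc;        /* stable: publish the class */

	p->ipcc_tmp = new_ipcc;
}

/* Hypothetical helper: feed n samples to a fresh task, return its class. */
static uint8_t class_after_samples(const uint8_t *samples, size_t n)
{
	struct task_ipcc t = { 0 };
	size_t i;

	for (i = 0; i < n; i++)
		debounce_and_update_class(&t, samples[i]);

	return t.ipcc;
}
```

With the samples 2, 2, 2, 3, 3, 3, 3 the task stays unclassified (0) through the first six ticks, because class 2 is interrupted after three ticks. Class 3 is published on the seventh tick, its fourth consecutive appearance.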
Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Ionela Voinescu
Cc: Joel Fernandes (Google)
Cc: Len Brown
Cc: Lukasz Luba
Cc: Mel Gorman
Cc: Perry Yuan
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: Zhao Liu
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri
---
Changes since v3:
 * Relocated this code to arch/x86/kernel/sched_ipcc.c (Rafael)

Changes since v2:
 * None

Changes since v1:
 * Adjusted the result of the classification of Intel Thread Director to
   start at class 1. Class 0 for the scheduler means that the task is
   unclassified.
 * Used the new names of the IPC class members in task_struct.
 * Reworked helper functions to use sched_smt_siblings_idle() to query
   the idle state of the SMT siblings of a CPU.
---
 arch/x86/kernel/sched_ipcc.c | 60 +++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sched_ipcc.c b/arch/x86/kernel/sched_ipcc.c
index 685e7b3b5375..dd73fc8be49b 100644
--- a/arch/x86/kernel/sched_ipcc.c
+++ b/arch/x86/kernel/sched_ipcc.c
@@ -18,11 +18,67 @@
 #include
+#include
 #include
+#define CLASS_DEBOUNCER_SKIPS 4
+
+/**
+ * debounce_and_update_class() - Process and update a task's classification
+ *
+ * @p:		The task of which the classification will be updated
+ * @new_ipcc:	The new IPC classification
+ *
+ * Update the classification of @p with the new value that hardware provides.
+ * Only update the classification of @p if it has been the same during
+ * CLASS_DEBOUNCER_SKIPS consecutive ticks.
+ */
+static void debounce_and_update_class(struct task_struct *p, u8 new_ipcc)
+{
+	u16 debounce_skip;
+
+	/* The class of @p changed. Only restart the debounce counter. */
+	if (p->ipcc_tmp != new_ipcc) {
+		p->ipcc_cntr = 1;
+		goto out;
+	}
+
+	/*
+	 * The class of @p did not change. Update it if it has been the same
+	 * for CLASS_DEBOUNCER_SKIPS user ticks.
+ */ + debounce_skip = p->ipcc_cntr + 1; + if (debounce_skip < CLASS_DEBOUNCER_SKIPS) + p->ipcc_cntr++; + else + p->ipcc = new_ipcc; + +out: + p->ipcc_tmp = new_ipcc; +} + +static bool classification_is_accurate(u8 hfi_class, bool smt_siblings_idle) +{ + switch (boot_cpu_data.x86_model) { + case INTEL_FAM6_ALDERLAKE: + case INTEL_FAM6_ALDERLAKE_L: + case INTEL_FAM6_RAPTORLAKE: + case INTEL_FAM6_RAPTORLAKE_P: + case INTEL_FAM6_RAPTORLAKE_S: + if (hfi_class == 3 || hfi_class == 2 || smt_siblings_idle) + return true; + + return false; + + default: + return false; + } +} + void intel_update_ipcc(struct task_struct *curr) { u8 hfi_class; + bool idle; if (intel_hfi_read_classid(&hfi_class)) return; @@ -31,5 +87,7 @@ void intel_update_ipcc(struct task_struct *curr) * 0 is a valid classification for Intel Thread Director. A scheduler * IPCC class of 0 means that the task is unclassified. Adjust. */ - curr->ipcc = hfi_class + 1; + idle = sched_smt_siblings_idle(task_cpu(curr)); + if (classification_is_accurate(hfi_class, idle)) + debounce_and_update_class(curr, hfi_class + 1); } From patchwork Tue Jun 13 04:24:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692923 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B76A0C88CBA for ; Tue, 13 Jun 2023 04:26:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239788AbjFME0A (ORCPT ); Tue, 13 Jun 2023 00:26:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239777AbjFMEZT (ORCPT ); Tue, 13 Jun 2023 00:25:19 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AD2C2964; Mon, 12 Jun 2023 21:22:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630168; x=1718166168; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=GKEVd4WNWkMHaCoolaZhOCYmwpKRjuAmiMNVm86HFno=; b=F4zRc0WIJ+CslGd670xS5PeUJIhApADyJX5eDDrbpa1rno/RCpJn6AvF 4xmM/P5o/ExWsNKNlueX8J9eobGjmqdqVVdiTgXRiQPWfAAnK9OspJwPr dwTiBbtyoju4ddmvTR1ofxjwmYmf8JUyxgKuDK7VSv+JGrwgRoHJ/+pPf KxSxHQL8CXaPInRRhneIQYeIv1XuglNXWnJs1YnSkf/UW1Cgs6ZI7cNd0 jc+/C5jn7A6mI7wN6k90tgq610FfyR3MRoywRiDPSmkvsAsna25UKcDeL cYUmbk1YntJYDPzunL5HJWKofBNWF6lUUx5WLC1BSuwQfwCP6c+Ur5ECi Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222290" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222290" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:22:04 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661855002" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661855002" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:22:02 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . 
Chen" , Zhao Liu Subject: [PATCH v4 22/24] x86/cpufeatures: Add feature bit for HRESET Date: Mon, 12 Jun 2023 21:24:20 -0700 Message-Id: <20230613042422.5344-23-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The HRESET instruction isolates the classification of individual tasks when they run sequentially on the same logical processor. It resets the classification history that the logical processor maintains. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * Moved definition of HRESET to its correct leaf: CPUID_7_1_EAX (Zhao) Changes since v2: * None Changes since v1: * None --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/msr-index.h | 3 +++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 98a84cbf4261..5edb63af2996 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -319,6 +319,7 @@ #define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */ #define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */ #define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */ +#define X86_FEATURE_HRESET (12*32+22) /* Hardware history reset instruction */ #define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */ #define X86_FEATURE_LAM (12*32+26) /* Linear Address Masking */ diff --git 
a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 7823b87bf383..605c01539b0d 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1111,6 +1111,9 @@ #define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4 #define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2 +/* Hardware History Reset */ +#define MSR_IA32_HW_HRESET_ENABLE 0x17da + /* x2APIC locked status */ #define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD #define LEGACY_XAPIC_DISABLED BIT(0) /* From patchwork Tue Jun 13 04:24:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8E5BC77B7A for ; Tue, 13 Jun 2023 04:26:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239812AbjFME0D (ORCPT ); Tue, 13 Jun 2023 00:26:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239830AbjFMEZf (ORCPT ); Tue, 13 Jun 2023 00:25:35 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0516E170E; Mon, 12 Jun 2023 21:22:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630172; x=1718166172; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=RwlWd28vqHXWXzvO+9M6hWVcHHCuXEVs+/JBdj0y9xA=; b=nYXJYymDEFL//J7FXryfDQkNqSC76YxUMo1Z7JsnefLeH5zYy2T3Uqvn C4V1JqXDNaw8/ELmJp1xXaQinUy6TQe0POZIMYAy0Ng2afZ2QRVXp0Pdq RqzOT35DDUz96zE4s54ITqL0j90PLdEXRfDPRe1byy08jbdqos5TWoPhn Hmn0ZnrIzbzPTumIj5jWGJxPZBSaAKkeBcWB9/PUkASlfat5urlXsWyPK 
a8uc5ua8/5eGgjzhIZF/cBypmmCO5k4hTe/GK+hKQQaQRHBYdNbf7RfmM Ok11sekR+wiXuj810xXTynmqDF+tQucfJ34pHnXCBd3c2RWfRIjas4o5b w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222301" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222301" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:22:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661855006" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661855006" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:22:04 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 23/24] x86/hreset: Configure history reset Date: Mon, 12 Jun 2023 21:24:21 -0700 Message-Id: <20230613042422.5344-24-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Configure the MSR that controls the behavior of HRESET on each logical processor. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. 
Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * Marked hardware_history_features as __ro_after_init instead of __read_mostly. (PeterZ) --- arch/x86/kernel/cpu/common.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8f284e185aea..d47a442900ad 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -396,6 +396,26 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c) cr4_clear_bits(X86_CR4_UMIP); } +static u32 hardware_history_features __ro_after_init; + +static __always_inline void setup_hreset(struct cpuinfo_x86 *c) +{ + if (!cpu_feature_enabled(X86_FEATURE_HRESET)) + return; + + /* + * Use on all CPUs the hardware history features that the boot + * CPU supports. + */ + if (c == &boot_cpu_data) + hardware_history_features = cpuid_ebx(0x20); + + if (!hardware_history_features) + return; + + wrmsrl(MSR_IA32_HW_HRESET_ENABLE, hardware_history_features); +} + /* These bits should not change their value after CPU init is finished. */ static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | @@ -1834,10 +1854,11 @@ static void identify_cpu(struct cpuinfo_x86 *c) /* Disable the PN if appropriate */ squash_the_stupid_serial_number(c); - /* Set up SMEP/SMAP/UMIP */ + /* Set up SMEP/SMAP/UMIP/HRESET */ setup_smep(c); setup_smap(c); setup_umip(c); + setup_hreset(c); /* Enable FSGSBASE instructions if available. 
*/ if (cpu_has(c, X86_FEATURE_FSGSBASE)) { From patchwork Tue Jun 13 04:24:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 692922 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 99580C77B7A for ; Tue, 13 Jun 2023 04:26:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239859AbjFME0h (ORCPT ); Tue, 13 Jun 2023 00:26:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45624 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239717AbjFMEZy (ORCPT ); Tue, 13 Jun 2023 00:25:54 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8C7A173F; Mon, 12 Jun 2023 21:23:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686630189; x=1718166189; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=TE4ozxfbws0uqAWGPY8R3I8UzboW/+vJ7ix2tgBVGXY=; b=Sq5LoMznLdRAJs/Mvoj89qVywR2vHY046N5hWdn4wWw/Qe6CEApJSyuq ODlWQa8TU/V+hl4LJPcr6E1g5Cu4dogCye0r3Mp4BLjtLN8JdLHE7FYF/ QTRBwe9vwG2LYG8/3fbwjULBuZAnUp5IfvPuAKzaoO6SFvRx//cx6S8y+ UmfxvhmBAJfOZZTS46LwVN7UjWgzWcjeHFaMgrV9oLoIPDGEVMbphycXK ElNQRVCuJ1oBgax9XPe4gJPw1KkjhWuWxiVZ26lQ3L7UD9EHzrIYc9MO2 V3vgENN/s5pUuGZRtLmQ0d25pJqTiL0Vnc06X0JVzU0TBqatasaTs5pv1 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="358222313" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="358222313" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 21:22:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="661855010" X-IronPort-AV: 
E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="661855010" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga003.jf.intel.com with ESMTP; 12 Jun 2023 21:22:05 -0700 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , Zhao Liu , "Yuan, Perry" , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" , Zhao Liu Subject: [PATCH v4 24/24] x86/process: Reset hardware history in context switch Date: Mon, 12 Jun 2023 21:24:22 -0700 Message-Id: <20230613042422.5344-25-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Reset the classification history of the current task when switching to the next task. Hardware will start the classification of the next task from scratch. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Perry Yuan Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: Zhao Liu Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v3: * None Changes since v2: * None Changes since v1: * Measurements of the cost of the HRESET instruction Methodology: I created a tight loop with interrupts and preemption disabled. I recorded the value of the TSC counter before and after executing HRESET or RDTSC. 
I repeated the measurement 100,000 times. I performed the experiment using
an Alder Lake S system. I set the frequency of the CPUs at a fixed value.

The table below compares the cost of HRESET with RDTSC (expressed in the
elapsed TSC count). The cost of the two instructions is comparable.

                       PCore   ECore
   Frequency (GHz)       5.0     3.8
   HRESET (avg)         28.5    44.7
   HRESET (stdev %)      3.6     2.3
   RDTSC (avg)          25.2    35.7
   RDTSC (stdev %)       3.9     2.6

 * Used an ALTERNATIVE macro instead of static_cpu_has() to execute HRESET
   when supported. (PeterZ)
---
 arch/x86/include/asm/hreset.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c  |  7 +++++++
 arch/x86/kernel/process_32.c  |  3 +++
 arch/x86/kernel/process_64.c  |  3 +++
 4 files changed, 43 insertions(+)
 create mode 100644 arch/x86/include/asm/hreset.h

diff --git a/arch/x86/include/asm/hreset.h b/arch/x86/include/asm/hreset.h
new file mode 100644
index 000000000000..d68ca2fb8642
--- /dev/null
+++ b/arch/x86/include/asm/hreset.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HRESET_H
+
+/**
+ * HRESET - History reset. Available since binutils v2.36.
+ *
+ * Request the processor to reset the history of task classification on the
+ * current logical processor. The history components to be
+ * reset are specified in %eax. Only bits specified in CPUID(0x20).EBX
+ * and enabled in the IA32_HRESET_ENABLE MSR can be selected.
+ *
+ * The assembly code looks like:
+ *
+ *	hreset %eax
+ *
+ * The corresponding machine code looks like:
+ *
+ *	F3 0F 3A F0 ModRM Imm
+ *
+ * The value of ModRM is 0xc0 to specify %eax register addressing.
+ * The ignored immediate operand is set to 0.
+ *
+ * The instruction is documented in the Intel SDM.
+ */
+
+#define __ASM_HRESET ".byte 0xf3, 0xf, 0x3a, 0xf0, 0xc0, 0x0"
+
+void reset_hardware_history(void);
+
+#endif /* _ASM_X86_HRESET_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d47a442900ad..8cdd88dab7ed 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -398,6 +399,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
 static u32 hardware_history_features __ro_after_init;
 
+void reset_hardware_history(void)
+{
+	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
+			     : : "a" (hardware_history_features) : "memory");
+}
+
 static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 708c87b88cc1..7353bb119e79 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "process.h"
@@ -214,6 +215,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in(next_p);
 
+	reset_hardware_history();
+
 	return prev_p;
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d181c16a2f6..fcf14cba2a9d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #ifdef CONFIG_IA32_EMULATION
@@ -659,6 +660,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in(next_p);
 
+	reset_hardware_history();
+
 	return prev_p;
 }
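
The comment in hreset.h states that a ModRM byte of 0xc0 selects %eax register addressing. As a quick sanity check, the sketch below splits that byte into its standard x86 fields. It is an illustration only: `hreset_bytes`, `struct modrm` and `decode_modrm()` are hypothetical names that are not part of the patch, and the byte values are copied from `__ASM_HRESET`.

```c
#include <stdint.h>

/* Byte sequence copied from __ASM_HRESET: F3 0F 3A F0 ModRM Imm. */
static const uint8_t hreset_bytes[] = { 0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00 };

/* Hypothetical container for the three ModRM fields. */
struct modrm {
	uint8_t mod; /* bits 7:6, 3 means register-direct addressing */
	uint8_t reg; /* bits 5:3 */
	uint8_t rm;  /* bits 2:0, register number when mod is 3 */
};

/* Split a ModRM byte into its mod, reg and rm fields. */
static struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = {
		.mod = (uint8_t)(byte >> 6),
		.reg = (uint8_t)((byte >> 3) & 0x7),
		.rm  = (uint8_t)(byte & 0x7),
	};

	return m;
}
```

Decoding `hreset_bytes[4]` gives mod = 3, reg = 0 and rm = 0. Register-direct addressing with register number 0 is %eax, which agrees with the comment, and the trailing immediate byte is 0.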