From patchwork Mon Mar 6 12:34:16 2023
From: Artem Bityutskiy
To: x86@kernel.org, Linux PM Mailing List
Cc: Artem Bityutskiy
Subject: [PATCH 1/3] x86/mwait: Add support for idle via umwait
Date: Mon, 6 Mar 2023 14:34:16 +0200
Message-Id: <20230306123418.720679-2-dedekind1@gmail.com>
In-Reply-To: <20230306123418.720679-1-dedekind1@gmail.com>
References: <20230306123418.720679-1-dedekind1@gmail.com>

From: Artem Bityutskiy

On Intel platforms, C-states are requested with the 'monitor'/'mwait'
instruction pair, as implemented in 'mwait_idle_with_hints()'. This mechanism
allows entering C1 and deeper C-states.

Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (C0.x from here
on). These idle states have lower latency compared to C1 and can be requested
with either the 'tpause' or the 'umwait' instruction.

Linux already uses the 'tpause' instruction in delay functions like
'udelay()'. This patch adds support for the 'umwait' and 'umonitor'
instructions.

The 'umwait' and 'tpause' instructions are very similar: both send the CPU to
C0.x and have the same break-out rules. But unlike 'tpause', 'umwait' works
together with 'umonitor' and also exits C0.x when the monitored memory address
is modified (the same idea as with 'monitor'/'mwait').

This patch implements the 'umwait_idle()' function, which works very similarly
to the existing 'mwait_idle_with_hints()', but requests C0.x. The intention is
to use it from the 'intel_idle' driver.
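For reference, here is a minimal usage sketch (not part of the patch) of how an
idle driver could call the new helper. The function name and the 'max_cycles'
parameter are made up for illustration; the deadline calculation and the
TPAUSE_C02_STATE hint mirror the intel_idle usage in patch 3 of this series,
and the sketch assumes <asm/mwait.h> and <asm/msr.h> are available.

	/* Sketch only: request C0.2 for at most 'max_cycles' TSC cycles. */
	static void example_enter_c02(u64 max_cycles)
	{
		/* The deadline is an absolute TSC value: now + allowed residency. */
		u64 deadline = rdtsc() + max_cycles;

		/* TPAUSE_C02_STATE requests C0.2; TPAUSE_C01_STATE would request C0.1. */
		umwait_idle(deadline, TPAUSE_C02_STATE);
	}

The call returns once the monitored location ('current_thread_info()->flags'),
an interrupt, or the deadline breaks the CPU out of C0.2.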
Signed-off-by: Artem Bityutskiy
---
 arch/x86/include/asm/mwait.h | 63 ++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..a8612de3212a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -141,4 +141,67 @@ static inline void __tpause(u32 ecx, u32 edx, u32 eax)
 #endif
 }
 
+#ifdef CONFIG_X86_64
+/*
+ * Monitor a memory address at 'rcx' using the 'umonitor' instruction.
+ */
+static inline void __umonitor(const void *rcx)
+{
+	/* "umonitor %rcx" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umonitor %%rcx\n"
+		     :
+		     : "c"(rcx));
+#else
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(rcx));
+#endif
+}
+
+/*
+ * Same as '__tpause()', but uses the 'umwait' instruction. It is very
+ * similar to 'tpause', but also breaks out if the data at the address
+ * monitored with 'umonitor' is modified.
+ */
+static inline void __umwait(u32 ecx, u32 edx, u32 eax)
+{
+	/* "umwait %ecx, %edx, %eax;" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umwait %%ecx\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#else
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#endif
+}
+
+/*
+ * Enter C0.1 or C0.2 state and stay there until an event happens (an interrupt
+ * or the 'need_resched()'), or the deadline is reached. The deadline is the
+ * absolute TSC value to exit the idle state at. However, if deadline exceeds
+ * the global limit in the IA32_UMWAIT_CONTROL register, the global limit
+ * prevails, and the idle state is exited earlier than the deadline.
+ */
+static inline void umwait_idle(u64 deadline, u32 state)
+{
+	if (!current_set_polling_and_test()) {
+		u32 eax, edx;
+
+		eax = lower_32_bits(deadline);
+		edx = upper_32_bits(deadline);
+
+		__umonitor(&current_thread_info()->flags);
+		if (!need_resched())
+			__umwait(state, edx, eax);
+	}
+	current_clr_polling();
+}
+#else
+#define umwait_idle(deadline, state) \
+	WARN_ONCE(1, "umwait CPU instruction is not supported")
+#endif /* CONFIG_X86_64 */
+
 #endif /* _ASM_X86_MWAIT_H */

From patchwork Mon Mar 6 12:34:17 2023
From: Artem Bityutskiy
To: x86@kernel.org, Linux PM Mailing List
Cc: Artem Bityutskiy
Subject: [PATCH 2/3] x86/umwait: Increase tpause and umwait quanta
Date: Mon, 6 Mar 2023 14:34:17 +0200
Message-Id: <20230306123418.720679-3-dedekind1@gmail.com>
In-Reply-To: <20230306123418.720679-1-dedekind1@gmail.com>
References: <20230306123418.720679-1-dedekind1@gmail.com>

From: Artem Bityutskiy

== Background ==

The 'umwait' and 'tpause' instructions both put the CPU in a low-power state
while they wait. The amount of time they wait is influenced by an explicit
deadline value passed in a register and an implicit value written to a shared
MSR (MSR_IA32_UMWAIT_CONTROL).

Existing 'tpause' users ('udelay()') can tolerate a wide range of
MSR_IA32_UMWAIT_CONTROL values. The explicit deadline trumps the MSR for short
delays. Longer delays will see extra wakeups, but no functional issues.

== Problem ==

Extra wakeups mean extra power. That translates into worse idle power when
'umwait' gets used for idle.

== Solution ==

Increase MSR_IA32_UMWAIT_CONTROL by a factor of 100 to decrease idle power
when using 'umwait'. This makes 'tpause' rely on its explicit deadline more
often, which reduces the number of wakeups and saves power during long delays
(a rough worked example of the old and new limits follows further below).

Signed-off-by: Artem Bityutskiy
---
 arch/x86/kernel/cpu/umwait.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index ec8064c0ae03..17c23173da0f 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -14,9 +14,9 @@
 /*
  * Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
- * umwait max time is 100000 in TSC-quanta and C0.2 is enabled
+ * umwait max time is 10,000,000 in TSC-quanta and C0.2 is enabled.
  */
-static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10000000, UMWAIT_C02_ENABLE);
 
 /*
  * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by

From patchwork Mon Mar 6 12:34:18 2023
From: Artem Bityutskiy
To: x86@kernel.org, Linux PM Mailing List
Cc: Artem Bityutskiy
Subject: [PATCH 3/3] intel_idle: add C0.2 state for Sapphire Rapids Xeon
Date: Mon, 6 Mar 2023 14:34:18 +0200
Message-Id: <20230306123418.720679-4-dedekind1@gmail.com>
In-Reply-To: <20230306123418.720679-1-dedekind1@gmail.com>
References: <20230306123418.720679-1-dedekind1@gmail.com>

From: Artem Bityutskiy

Add Sapphire Rapids Xeon C0.2 idle state support. This state has a lower exit
latency compared to C1, and saves energy compared to POLL (in the range of
5-20%).

This patch also improves performance (e.g., as measured by 'hackbench'),
because the power saved by idle CPUs in C0.2 increases the power budget of the
busy CPUs and therefore improves their turbo boost.
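To put the IA32_UMWAIT_CONTROL numbers from the previous patch in perspective,
here is a rough worked example; the 2 GHz TSC frequency is an assumption made
only for illustration:

	old limit:    100,000 TSC cycles / 2 GHz ~=  50 us
	new limit: 10,000,000 TSC cycles / 2 GHz ~=   5 ms

With the old ~50 us cap, any 'umwait'-based idle period longer than that breaks
out early and has to re-enter, causing the extra wakeups described above. The
new ~5 ms cap is longer than a scheduler tick on common HZ=250 and HZ=1000
configurations, so the explicit deadline passed by 'udelay()' or
'umwait_idle()' is what normally ends the wait.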
Suggested-by: Len Brown
Suggested-by: Arjan Van De Ven
Signed-off-by: Artem Bityutskiy
---
 drivers/idle/intel_idle.c | 59 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 938c17f25d94..f7705a64d0e6 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -51,11 +51,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define INTEL_IDLE_VERSION "0.5.1"
@@ -73,6 +75,8 @@ static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
 static unsigned long auto_demotion_disable_flags;
 
+static u64 umwait_limit;
+
 static enum {
 	C1E_PROMOTION_PRESERVE,
 	C1E_PROMOTION_ENABLE,
@@ -225,6 +229,27 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
 	return 0;
 }
 
+/**
+ * intel_idle_umwait_irq - Request C0.x using the 'umwait' instruction.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Request C0.1 or C0.2 using the 'umwait' instruction with interrupts enabled.
+ */
+static __cpuidle int intel_idle_umwait_irq(struct cpuidle_device *dev,
+					   struct cpuidle_driver *drv,
+					   int index)
+{
+	u32 state = flg2MWAIT(drv->states[index].flags);
+
+	raw_local_irq_enable();
+	umwait_idle(rdtsc() + umwait_limit, state);
+	raw_local_irq_disable();
+
+	return index;
+}
+
 /*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
@@ -968,6 +993,13 @@ static struct cpuidle_state adl_n_cstates[] __initdata = {
 };
 
 static struct cpuidle_state spr_cstates[] __initdata = {
+	{
+		.name = "C0.2",
+		.desc = "UMWAIT C0.2",
+		.flags = MWAIT2flg(TPAUSE_C02_STATE) | CPUIDLE_FLAG_IRQ_ENABLE,
+		.exit_latency_ns = 100,
+		.target_residency_ns = 100,
+		.enter = &intel_idle_umwait_irq, },
 	{
 		.name = "C1",
 		.desc = "MWAIT 0x00",
@@ -1894,7 +1926,8 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 		/* Structure copy. */
 		drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
-		if ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on) {
+		if (cpuidle_state_table[cstate].enter == intel_idle &&
+		    ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on)) {
 			printk("intel_idle: forced intel_idle_irq for state %d\n", cstate);
 			drv->states[drv->state_count].enter = intel_idle_irq;
 		}
@@ -1926,6 +1959,29 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 	}
 }
 
+/**
+ * umwait_limit_init - initialize time limit value for 'umwait'.
+ *
+ * C0.1 and C0.2 (later C0.x) idle states are requested via the 'umwait'
+ * instruction. The 'umwait' instruction requires a "deadline" - the TSC
+ * counter value to break out of C0.x at (unless it breaks out earlier because
+ * of an interrupt or some other event).
+ *
+ * The deadline is specified as an absolute TSC value, and it is calculated as
+ * the current TSC value + 'umwait_limit'. This function initializes the
+ * 'umwait_limit' variable to the count of TSC cycles per tick. The motivation
+ * is:
+ *  * the tick is not disabled for shallow states like C0.x, so idle will not
+ *    last longer than a tick anyway
+ *  * limit idle time to give cpuidle a chance to re-evaluate its C-state
+ *    selection decision and possibly select a deeper C-state.
+ */
+static void __init umwait_limit_init(void)
+{
+	umwait_limit = (u64)TICK_NSEC * tsc_khz;
+	do_div(umwait_limit, MICRO);
+}
+
 /**
  * intel_idle_cpuidle_driver_init - Create the list of available idle states.
  * @drv: cpuidle driver structure to initialize.
@@ -1933,6 +1989,7 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 static void __init intel_idle_cpuidle_driver_init(struct cpuidle_driver *drv)
 {
 	cpuidle_poll_state_init(drv);
+	umwait_limit_init();
 
 	if (disabled_states_mask & BIT(0))
 		drv->states[0].flags |= CPUIDLE_FLAG_OFF;
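For illustration, here is how the 'umwait_limit' value computed by
'umwait_limit_init()' above works out numerically; the HZ and TSC frequency
values are assumptions chosen only for this example:

	TICK_NSEC (HZ=1000)   = 1,000,000 ns
	tsc_khz (2 GHz TSC)   = 2,000,000
	umwait_limit          = 1,000,000 * 2,000,000 / 1,000,000
	                      = 2,000,000 TSC cycles

That is roughly one tick (1 ms at HZ=1000) worth of TSC cycles, so a
'umwait'-based idle period is capped at about one tick even if no interrupt or
reschedule arrives earlier, matching the motivation in the kernel-doc comment
above.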