[1/3] x86/mwait: Add support for idle via umwait

Message ID 20230306123418.720679-2-dedekind1@gmail.com
State Superseded
Series Sapphire Rapids C0.x idle states support

Commit Message

Artem Bityutskiy March 6, 2023, 12:34 p.m. UTC
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

On Intel platforms, C-states are requested using the 'monitor/mwait'
instruction pair, as implemented in 'mwait_idle_with_hints()'. This
mechanism allows for entering C1 and deeper C-states.

Sapphire Rapids Xeon supports new idle states: C0.1 and C0.2 (hereafter
C0.x). These idle states have lower latency than C1 and can be requested
with either the 'tpause' or the 'umwait' instruction.

Linux already uses the 'tpause' instruction in delay functions like
'udelay()'. This patch adds support for the 'umwait' and 'umonitor'
instructions.

The 'umwait' and 'tpause' instructions are very similar: both send the CPU
to C0.x and have the same break-out rules. But unlike 'tpause', 'umwait'
works together with 'umonitor' and exits C0.x when the monitored memory
address is modified (the same idea as with 'monitor/mwait').

This patch implements the 'umwait_idle()' function, which works very
similarly to the existing 'mwait_idle_with_hints()', but requests C0.x.
The intention is to use it from the 'intel_idle' driver.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
---
 arch/x86/include/asm/mwait.h | 63 ++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

Comments

Rafael J. Wysocki March 7, 2023, 11:55 a.m. UTC | #1
On Mon, Mar 6, 2023 at 3:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 06, 2023 at 02:34:16PM +0200, Artem Bityutskiy wrote:
> > From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >
> > On Intel platforms, C-states are requested using the 'monitor/mwait'
> > instructions pair, as implemented in 'mwait_idle_with_hints()'. This
> > mechanism allows for entering C1 and deeper C-states.
> >
> > Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (later C0.x).
> > These idle states have lower latency comparing to C1, and can be requested
> > with either 'tpause' and 'umwait' instructions.
> >
> > Linux already uses the 'tpause' instruction in delay functions like
> > 'udelay()'. This patch adds 'umwait' and 'umonitor' instructions support.
> >
> > 'umwait' and 'tpause' instructions are very similar - both send the CPU to
> > C0.x and have the same break out rules. But unlike 'tpause', 'umwait' works
> > together with 'umonitor' and exits the C0.x when the monitored memory
> > address is modified (similar idea as with 'monitor/mwait').
> >
> > This patch implements the 'umwait_idle()' function, which works very
> > similarly to existing 'mwait_idle_with_hints()', but requests C0.x. The
> > intention is to use it from the 'intel_idle' driver.
>
> Still wondering wth regular mwait can't access these new idle states.

But is this a question for Artem to answer?

Peter Zijlstra March 8, 2023, 12:35 p.m. UTC | #2
On Tue, Mar 07, 2023 at 12:55:45PM +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 6, 2023 at 3:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 06, 2023 at 02:34:16PM +0200, Artem Bityutskiy wrote:
> > > From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > >
> > > On Intel platforms, C-states are requested using the 'monitor/mwait'
> > > instructions pair, as implemented in 'mwait_idle_with_hints()'. This
> > > mechanism allows for entering C1 and deeper C-states.
> > >
> > > Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (later C0.x).
> > > These idle states have lower latency comparing to C1, and can be requested
> > > with either 'tpause' and 'umwait' instructions.
> > >
> > > Linux already uses the 'tpause' instruction in delay functions like
> > > 'udelay()'. This patch adds 'umwait' and 'umonitor' instructions support.
> > >
> > > 'umwait' and 'tpause' instructions are very similar - both send the CPU to
> > > C0.x and have the same break out rules. But unlike 'tpause', 'umwait' works
> > > together with 'umonitor' and exits the C0.x when the monitored memory
> > > address is modified (similar idea as with 'monitor/mwait').
> > >
> > > This patch implements the 'umwait_idle()' function, which works very
> > > similarly to existing 'mwait_idle_with_hints()', but requests C0.x. The
> > > intention is to use it from the 'intel_idle' driver.
> >
> > Still wondering wth regular mwait can't access these new idle states.
> 
> But is this a question for Artem to answer?

Maybe, maybe not, but I did want to call out this 'design' in public. It
is really weird IMO.

Patch

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..a8612de3212a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -141,4 +141,67 @@  static inline void __tpause(u32 ecx, u32 edx, u32 eax)
 	#endif
 }
 
+#ifdef CONFIG_X86_64
+/*
+ * Monitor a memory address at 'rcx' using the 'umonitor' instruction.
+ */
+static inline void __umonitor(const void *rcx)
+{
+	/* "umonitor %rcx" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umonitor %%rcx\n"
+		     :
+		     : "c"(rcx));
+#else
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(rcx));
+#endif
+}
+
+/*
+ * Same as '__tpause()', but uses the 'umwait' instruction. It is very
+ * similar to 'tpause', but also breaks out if the data at the address
+ * monitored with 'umonitor' is modified.
+ */
+static inline void __umwait(u32 ecx, u32 edx, u32 eax)
+{
+	/* "umwait %ecx, %edx, %eax;" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umwait %%ecx\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#else
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#endif
+}
+
+/*
+ * Enter the C0.1 or C0.2 state and stay there until an event happens (an
+ * interrupt or 'need_resched()' becoming true) or the deadline is reached.
+ * The deadline is the absolute TSC value at which to exit the idle state. If
+ * it exceeds the global limit in the IA32_UMWAIT_CONTROL register, the global
+ * limit prevails and the idle state is exited earlier than the deadline.
+ */
+static inline void umwait_idle(u64 deadline, u32 state)
+{
+	if (!current_set_polling_and_test()) {
+		u32 eax, edx;
+
+		eax = lower_32_bits(deadline);
+		edx = upper_32_bits(deadline);
+
+		__umonitor(&current_thread_info()->flags);
+		if (!need_resched())
+			__umwait(state, edx, eax);
+	}
+	current_clr_polling();
+}
+#else
+#define umwait_idle(deadline, state) \
+		WARN_ONCE(1, "umwait CPU instruction is not supported")
+#endif /* CONFIG_X86_64 */
+
 #endif /* _ASM_X86_MWAIT_H */
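
For illustration only (not part of the patch): a minimal userspace sketch of
how a caller might build the register values that 'umwait_idle()' hands to
'__umwait()'. The names 'umwait_regs', 'umwait_pack()', 'lo32()' and 'hi32()'
are hypothetical and do not exist in the kernel; 'lo32()'/'hi32()' stand in
for the kernel's 'lower_32_bits()'/'upper_32_bits()'. The sketch assumes the
deadline is computed as the current TSC value plus a cycle budget, and that
only bit 0 of ECX selects the state (1 for C0.1, 0 for C0.2), as with
'tpause'.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Hypothetical container for the three 'umwait' register operands. */
struct umwait_regs {
	uint32_t ecx;	/* requested state: 1 = C0.1, 0 = C0.2 (assumed) */
	uint32_t edx;	/* upper 32 bits of the absolute TSC deadline */
	uint32_t eax;	/* lower 32 bits of the absolute TSC deadline */
};

/*
 * Hypothetical helper: turn "now + budget" into an absolute TSC deadline
 * and split it the same way umwait_idle() does before calling __umwait().
 */
static inline struct umwait_regs umwait_pack(uint64_t now_tsc,
					     uint64_t max_cycles,
					     uint32_t state)
{
	struct umwait_regs r;
	uint64_t deadline = now_tsc + max_cycles;

	/* Assumption: ECX bits other than bit 0 are reserved. */
	assert(state <= 1);

	r.ecx = state;
	r.edx = hi32(deadline);
	r.eax = lo32(deadline);
	return r;
}
```

In the kernel, 'now_tsc' would presumably come from 'rdtsc()' and the budget
from the idle governor's predicted idle time; both choices are assumptions
here, since the patch leaves the caller to the 'intel_idle' driver.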