
[v3] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards

Message ID 1619697982-28461-1-git-send-email-ego@linux.vnet.ibm.com
State New
Series [v3] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards

Commit Message

Gautham R. Shenoy April 29, 2021, 12:06 p.m. UTC
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
of the Extended CEDE states advertised by the platform.

On POWER9 LPARs, the firmware advertises a very low value of 2us for
CEDE1 exit latency on a Dedicated LPAR. The latency advertised by the
PHYP hypervisor corresponds to the latency required to wake up from the
underlying hardware idle state. However, the wakeup latency from the
LPAR perspective should also include:

1. The time taken to transition the CPU from the hypervisor into the
   LPAR after it wakes up from the platform idle state.

2. The time taken to send the IPI from the source CPU (waker) to the
   idle target CPU (wakee).

Item 1 can be measured via a timer idle test, where we queue a timer,
say for 1ms, and enter the CEDE state. When the timer fires, the timer
handler computes how much time beyond the expected 1ms has elapsed.
On a POWER9 LPAR the numbers are:

CEDE latency measured using a timer (numbers in ns)
N        Min      Median   Avg      90%ile   99%ile   Max      Stddev
400      2601     5677     5668.74  5917     6413     9299     455.01
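For a rough feel for the timer idle test from userspace, here is a minimal sketch. It is not the harness used for the numbers above. It assumes Linux with POSIX clock_nanosleep(), and the helper names ts_to_ns() and timer_overshoot_ns() are hypothetical. Unlike an in-kernel test it cannot force the CPU into CEDE, so the overshoot it reports also includes syscall and scheduler wakeup costs and is only an upper bound on the idle-exit component.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a signed nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Hypothetical helper: sleep until an absolute CLOCK_MONOTONIC deadline
 * timeout_ns from now and return how many ns past the deadline the thread
 * actually resumed, or -1 on error.
 */
static int64_t timer_overshoot_ns(int64_t timeout_ns)
{
	struct timespec now, deadline;
	int64_t deadline_ns;
	int rc;

	assert(timeout_ns > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	deadline_ns = ts_to_ns(&now) + timeout_ns;
	deadline.tv_sec = (time_t)(deadline_ns / NSEC_PER_SEC);
	deadline.tv_nsec = (long)(deadline_ns % NSEC_PER_SEC);

	/* The deadline is absolute, so restarting after a signal is safe. */
	do {
		rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				     &deadline, NULL);
	} while (rc == EINTR);
	if (rc != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	return ts_to_ns(&now) - deadline_ns;
}
```

Collecting 400 samples, as in the table, is a loop calling timer_overshoot_ns(1000000) with the target CPU otherwise idle (for example, pinned with taskset). Linux also applies a per-thread timer slack (50us by default for normal threads) to such sleeps; lowering it with prctl(PR_SET_TIMERSLACK, 1) before sampling keeps it from dominating the result.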

Items 1 and 2 combined can be measured by an IPI latency test, where we
send an IPI to an idle CPU and, in the IPI handler, compute the time
difference between when the IPI was sent and when the handler ran. We
see the following numbers on a POWER9 LPAR:

CEDE latency measured using an IPI (numbers in ns)
N        Min      Median   Avg      90%ile   99%ile   Max      Stddev
400      711      7564     7369.43  8559     9514     9698     1200.01
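The summary columns in both tables can be derived from raw per-wakeup samples (in ns) roughly as sketched below. The source does not say how the percentiles or the standard deviation were computed, so this sketch assumes the nearest-rank percentile method (with the median taken as the 50th percentile) and the population standard deviation. struct lat_summary, summarize() and the other helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Summary columns used in the latency tables (values in ns). */
struct lat_summary {
	size_t n;
	int64_t min, median, p90, p99, max;
	double avg, stddev;
};

/* qsort() comparator for int64_t that cannot overflow. */
static int cmp_i64(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a;
	int64_t y = *(const int64_t *)b;

	return (x > y) - (x < y);
}

/* Nearest-rank percentile: element at rank ceil(p * n / 100) of sorted s[]. */
static int64_t percentile_sorted(const int64_t *s, size_t n, unsigned int p)
{
	size_t rank = ((size_t)p * n + 99) / 100;

	return s[rank ? rank - 1 : 0];
}

/* Newton-Raphson square root for x >= 0, used only to avoid linking libm. */
static double sqrt_nonneg(double x)
{
	double r = x > 1.0 ? x : 1.0;	/* start at or above sqrt(x) */
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Fill *out from n samples; the input array is left unmodified. */
static int summarize(const int64_t *samples, size_t n, struct lat_summary *out)
{
	double sum = 0.0, sq = 0.0;
	int64_t *s;
	size_t i;

	assert(samples && out && n > 0);
	s = malloc(n * sizeof(*s));
	if (!s)
		return -1;
	memcpy(s, samples, n * sizeof(*s));
	qsort(s, n, sizeof(*s), cmp_i64);

	for (i = 0; i < n; i++)
		sum += (double)s[i];
	out->avg = sum / (double)n;
	for (i = 0; i < n; i++) {
		double d = (double)s[i] - out->avg;

		sq += d * d;
	}
	out->stddev = sqrt_nonneg(sq / (double)n);	/* population stddev */

	out->n = n;
	out->min = s[0];
	out->max = s[n - 1];
	out->median = percentile_sorted(s, n, 50);
	out->p90 = percentile_sorted(s, n, 90);
	out->p99 = percentile_sorted(s, n, 99);
	free(s);
	return 0;
}
```

With a different percentile definition (for example, linear interpolation between ranks), the Median, 90%ile and 99%ile columns can differ slightly for the same samples. In ordinary code, sqrt() from <math.h> would replace the Newton-Raphson helper.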

If we take the 99th percentile latency measured using the IPI test
(9514ns) as the wakeup latency, the value is about 9.5us. This is in
the ballpark of the default value of 10us.

Hence, derive the exit latency of CEDE(0) from the latency values
advertised by the platform only on POWER10 onwards. The values
advertised on POWER10 platforms are more realistic and are informed by
latency measurements. For earlier platforms, stick to the default value
of 10us. The fix was suggested by Michael Ellerman.
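The resulting policy can be modelled in userspace as in the simplified, hypothetical sketch below; it is not the kernel code. is_arch31 stands in for cpu_has_feature(CPU_FTR_ARCH_31), is_pvr_power10 for pvr_version_is(PVR_POWER10), and xcede_us[] for the Extended CEDE latencies already converted to us. It assumes, as the min_latency_us context line in the patch suggests, that the fixup keeps the smallest of the 10us default and the advertised latencies, and it omits the parse_cede_parameters() failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative CEDE(0) exit latency default, per the commit message. */
#define DEFAULT_CEDE0_EXIT_LATENCY_US 10

/* Hypothetical model of the patched fixup_cede0_latency() decision. */
static uint64_t cede0_exit_latency_us(bool is_arch31, bool is_pvr_power10,
				      const uint64_t *xcede_us, size_t nr)
{
	uint64_t min_us = DEFAULT_CEDE0_EXIT_LATENCY_US;
	size_t i;

	assert(nr == 0 || xcede_us != NULL);

	/* Pre-POWER10: ignore the firmware values, keep the default. */
	if (!is_arch31 && !is_pvr_power10)
		return min_us;

	/* POWER10, native or POWER9-compat: smallest advertised value wins. */
	for (i = 0; i < nr; i++)
		if (xcede_us[i] < min_us)
			min_us = xcede_us[i];
	return min_us;
}
```

With firmware advertising 2us, as on the POWER9 LPARs described above, the model returns the 10us default on POWER9 and 2us on POWER10, whether or not POWER10 is running in POWER9-compat mode.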

Reported-by: Enrico Joedecke <joedecke@de.ibm.com>
Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
v2-->v3: Modify the condition to preclude only the platforms prior to
         POWER10 from using the firmware-provided values.
	 
 drivers/cpuidle/cpuidle-pseries.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Patch

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a2b5c6f..694d71e 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -349,6 +349,15 @@  static void __init fixup_cede0_latency(void)
 	u64 min_latency_us;
 	int i;
 
+	/*
+	 * Use firmware provided latency values on POWER10 onwards and
+	 * also on POWER10 running in POWER9-compat mode. On platforms
+	 * prior to POWER10, we cannot rely on the firmware provided
+	 * values, so we go with the conservative default value.
+	 */
+	if (!cpu_has_feature(CPU_FTR_ARCH_31) && !pvr_version_is(PVR_POWER10))
+		return;
+
 	min_latency_us = dedicated_states[1].exit_latency; // CEDE latency
 
 	if (parse_cede_parameters())