Message ID | 1619673517-10853-1-git-send-email-ego@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v2] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards | expand |
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes: > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values > of the Extended CEDE states advertised by the platform > > On POWER9 LPARs, the firmwares advertise a very low value of 2us for > CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the > PHYP hypervisor corresponds to the latency required to wakeup from the > underlying hardware idle state. However the wakeup latency from the > LPAR perspective should include > > 1. The time taken to transition the CPU from the Hypervisor into the > LPAR post wakeup from platform idle state > > 2. Time taken to send the IPI from the source CPU (waker) to the idle > target CPU (wakee). > > 1. can be measured via timer idle test, where we queue a timer, say > for 1ms, and enter the CEDE state. When the timer fires, in the timer > handler we compute how much extra timer over the expected 1ms have we > consumed. On a a POWER9 LPAR the numbers are > > CEDE latency measured using a timer (numbers in ns) > N Min Median Avg 90%ile 99%ile Max Stddev > 400 2601 5677 5668.74 5917 6413 9299 455.01 > > 1. and 2. combined can be determined by an IPI latency test where we > send an IPI to an idle CPU and in the handler compute the time > difference between when the IPI was sent and when the handler ran. We > see the following numbers on POWER9 LPAR. > > CEDE latency measured using an IPI (numbers in ns) > N Min Median Avg 90%ile 99%ile Max Stddev > 400 711 7564 7369.43 8559 9514 9698 1200.01 > > Suppose, we consider the 99th percentile latency value measured using > the IPI to be the wakeup latency, the value would be 9.5us This is in > the ballpark of the default value of 10us. > > Hence, use the exit latency of CEDE(0) based on the latency values > advertized by platform only from POWER10 onwards. The values ^^^^^^^ > advertized on POWER10 platforms is more realistic and informed by the > latency measurements. For earlier platforms stick to the default value > of 10us. ... > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c > index a2b5c6f..7207467 100644 > --- a/drivers/cpuidle/cpuidle-pseries.c > +++ b/drivers/cpuidle/cpuidle-pseries.c > @@ -419,7 +419,8 @@ static int pseries_idle_probe(void) > cpuidle_state_table = shared_states; > max_idle_state = ARRAY_SIZE(shared_states); > } else { > - fixup_cede0_latency(); > + if (pvr_version_is(PVR_POWER10)) > + fixup_cede0_latency(); A PVR check like that tests for *only* Power10, not Power10 and onwards as you say in the change log. The other question is what should happen on a Power10 LPAR that's running in Power9 compat mode. I assume in that case we *do* want to use the firmware provided values, because they're tied to the underlying CPU, not the compat mode? In which case a check for !PVR_POWER9 would seem to achieve what we want? cheers
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index a2b5c6f..7207467 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -419,7 +419,8 @@ static int pseries_idle_probe(void) cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { - fixup_cede0_latency(); + if (pvr_version_is(PVR_POWER10)) + fixup_cede0_latency(); cpuidle_state_table = dedicated_states; max_idle_state = NR_DEDICATED_STATES; }