diff mbox series

[6.1.y] cpuidle: teo: Remove recent intercepts metric

Message ID 181bb5c2-5790-41bf-9ed8-3d3164b8697d@arm.com
State New
Headers show
Series [6.1.y] cpuidle: teo: Remove recent intercepts metric | expand

Commit Message

Christian Loehle Aug. 5, 2024, 2:58 p.m. UTC
commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.

The logic for recent intercepts didn't work, there is an underflow
of the 'recent' value that can be observed during boot already, which
teo usually doesn't recover from, making the entire logic pointless.
Furthermore the recent intercepts also were never reset, thus not
actually being very 'recent'.

Having underflowed 'recent' values lead to teo always acting as if
we were in a scenario were expected sleep length based on timers is
too high and it therefore unnecessarily selecting shallower states.

Experiments show that the remaining 'intercept' logic is enough to
quickly react to scenarios in which teo cannot rely on the timer
expected sleep length.

See also here:
https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/

Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
 1 file changed, 14 insertions(+), 65 deletions(-)

Comments

Greg KH Aug. 12, 2024, 12:42 p.m. UTC | #1
On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
> 
> The logic for recent intercepts didn't work, there is an underflow
> of the 'recent' value that can be observed during boot already, which
> teo usually doesn't recover from, making the entire logic pointless.
> Furthermore the recent intercepts also were never reset, thus not
> actually being very 'recent'.
> 
> Having underflowed 'recent' values lead to teo always acting as if
> we were in a scenario were expected sleep length based on timers is
> too high and it therefore unnecessarily selecting shallower states.
> 
> Experiments show that the remaining 'intercept' logic is enough to
> quickly react to scenarios in which teo cannot rely on the timer
> expected sleep length.
> 
> See also here:
> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
> 
> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
>  1 file changed, 14 insertions(+), 65 deletions(-)

We can't just take a 6.1.y backport without newer kernels also having
this fix.  Can you resend this as backports for all relevant kernels
please?

thanks,

greg k-h
Christian Loehle Aug. 13, 2024, 1:18 p.m. UTC | #2
On 8/12/24 13:42, Greg KH wrote:
> On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
>> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
>>
>> The logic for recent intercepts didn't work, there is an underflow
>> of the 'recent' value that can be observed during boot already, which
>> teo usually doesn't recover from, making the entire logic pointless.
>> Furthermore the recent intercepts also were never reset, thus not
>> actually being very 'recent'.
>>
>> Having underflowed 'recent' values lead to teo always acting as if
>> we were in a scenario were expected sleep length based on timers is
>> too high and it therefore unnecessarily selecting shallower states.
>>
>> Experiments show that the remaining 'intercept' logic is enough to
>> quickly react to scenarios in which teo cannot rely on the timer
>> expected sleep length.
>>
>> See also here:
>> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>>
>> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
>> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
>> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>  drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
>>  1 file changed, 14 insertions(+), 65 deletions(-)
> 
> We can't just take a 6.1.y backport without newer kernels also having
> this fix.  Can you resend this as backports for all relevant kernels
> please?

Hi Greg,
the email thread might've looked a bit strange to you but as I wrote
in a previous reply:
https://lore.kernel.org/linux-pm/20240628095955.34096-1-christian.loehle@arm.com/T/#ma5bcd00c4b0ffa1fc34e8d7fa237b8de4ee8a25c
@stable
4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
449914398083 cpuidle: teo: Remove recent intercepts metric
0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
apply as-is to
linux-6.10.y
linux-6.6.y
for linux-6.1.y only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
is relevant, I'll reply with a backport.


Ideally I would wait for an Ack from Rafael though.
Hopefully that makes more sense then.
Greg KH Aug. 14, 2024, 11:40 a.m. UTC | #3
On Tue, Aug 13, 2024 at 02:18:53PM +0100, Christian Loehle wrote:
> On 8/12/24 13:42, Greg KH wrote:
> > On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
> >> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
> >>
> >> The logic for recent intercepts didn't work, there is an underflow
> >> of the 'recent' value that can be observed during boot already, which
> >> teo usually doesn't recover from, making the entire logic pointless.
> >> Furthermore the recent intercepts also were never reset, thus not
> >> actually being very 'recent'.
> >>
> >> Having underflowed 'recent' values lead to teo always acting as if
> >> we were in a scenario were expected sleep length based on timers is
> >> too high and it therefore unnecessarily selecting shallower states.
> >>
> >> Experiments show that the remaining 'intercept' logic is enough to
> >> quickly react to scenarios in which teo cannot rely on the timer
> >> expected sleep length.
> >>
> >> See also here:
> >> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
> >>
> >> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
> >> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
> >> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >>  drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
> >>  1 file changed, 14 insertions(+), 65 deletions(-)
> > 
> > We can't just take a 6.1.y backport without newer kernels also having
> > this fix.  Can you resend this as backports for all relevant kernels
> > please?
> 
> Hi Greg,
> the email thread might've looked a bit strange to you but as I wrote
> in a previous reply:
> https://lore.kernel.org/linux-pm/20240628095955.34096-1-christian.loehle@arm.com/T/#ma5bcd00c4b0ffa1fc34e8d7fa237b8de4ee8a25c
> @stable
> 4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
> 449914398083 cpuidle: teo: Remove recent intercepts metric
> 0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
> apply as-is to
> linux-6.10.y
> linux-6.6.y
> for linux-6.1.y only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
> is relevant, I'll reply with a backport.

Please send all of these as a patch series for the relevent branches so
we know exactly what is going on...

thanks,

greg k-h
diff mbox series

Patch

diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index d9262db79cae..65475526bb11 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -54,16 +54,12 @@ 
  * shallower than the one whose bin is fallen into by the sleep length (these
  * situations are referred to as "intercepts" below).
  *
- * In addition to the metrics described above, the governor counts recent
- * intercepts (that is, intercepts that have occurred during the last
- * %NR_RECENT invocations of it for the given CPU) for each bin.
- *
  * In order to select an idle state for a CPU, the governor takes the following
  * steps (modulo the possible latency constraint that must be taken into account
  * too):
  *
  * 1. Find the deepest CPU idle state whose target residency does not exceed
- *    the current sleep length (the candidate idle state) and compute 3 sums as
+ *    the current sleep length (the candidate idle state) and compute 2 sums as
  *    follows:
  *
  *    - The sum of the "hits" and "intercepts" metrics for the candidate state
@@ -76,20 +72,15 @@ 
  *      idle long enough to avoid being intercepted if the sleep length had been
  *      equal to the current one).
  *
- *    - The sum of the numbers of recent intercepts for all of the idle states
- *      shallower than the candidate one.
- *
- * 2. If the second sum is greater than the first one or the third sum is
- *    greater than %NR_RECENT / 2, the CPU is likely to wake up early, so look
- *    for an alternative idle state to select.
+ * 2. If the second sum is greater than the first one the CPU is likely to wake
+ *    up early, so look for an alternative idle state to select.
  *
  *    - Traverse the idle states shallower than the candidate one in the
  *      descending order.
  *
- *    - For each of them compute the sum of the "intercepts" metrics and the sum
- *      of the numbers of recent intercepts over all of the idle states between
- *      it and the candidate one (including the former and excluding the
- *      latter).
+ *    - For each of them compute the sum of the "intercepts" metrics over all
+ *      of the idle states between it and the candidate one (including the
+ *      former and excluding the latter).
  *
  *    - If each of these sums that needs to be taken into account (because the
  *      check related to it has indicated that the CPU is likely to wake up
@@ -114,22 +105,14 @@ 
 #define PULSE		1024
 #define DECAY_SHIFT	3
 
-/*
- * Number of the most recent idle duration values to take into consideration for
- * the detection of recent early wakeup patterns.
- */
-#define NR_RECENT	9
-
 /**
  * struct teo_bin - Metrics used by the TEO cpuidle governor.
  * @intercepts: The "intercepts" metric.
  * @hits: The "hits" metric.
- * @recent: The number of recent "intercepts".
  */
 struct teo_bin {
 	unsigned int intercepts;
 	unsigned int hits;
-	unsigned int recent;
 };
 
 /**
@@ -137,17 +120,13 @@  struct teo_bin {
  * @time_span_ns: Time between idle state selection and post-wakeup update.
  * @sleep_length_ns: Time till the closest timer event (at the selection time).
  * @state_bins: Idle state data bins for this CPU.
- * @total: Grand total of the "intercepts" and "hits" mertics for all bins.
- * @next_recent_idx: Index of the next @recent_idx entry to update.
- * @recent_idx: Indices of bins corresponding to recent "intercepts".
+ * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
  */
 struct teo_cpu {
 	s64 time_span_ns;
 	s64 sleep_length_ns;
 	struct teo_bin state_bins[CPUIDLE_STATE_MAX];
 	unsigned int total;
-	int next_recent_idx;
-	int recent_idx[NR_RECENT];
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -216,27 +195,16 @@  static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		}
 	}
 
-	i = cpu_data->next_recent_idx++;
-	if (cpu_data->next_recent_idx >= NR_RECENT)
-		cpu_data->next_recent_idx = 0;
-
-	if (cpu_data->recent_idx[i] >= 0)
-		cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;
-
 	/*
 	 * If the measured idle duration falls into the same bin as the sleep
 	 * length, this is a "hit", so update the "hits" metric for that bin.
 	 * Otherwise, update the "intercepts" metric for the bin fallen into by
 	 * the measured idle duration.
 	 */
-	if (idx_timer == idx_duration) {
+	if (idx_timer == idx_duration)
 		cpu_data->state_bins[idx_timer].hits += PULSE;
-		cpu_data->recent_idx[i] = -1;
-	} else {
+	else
 		cpu_data->state_bins[idx_duration].intercepts += PULSE;
-		cpu_data->state_bins[idx_duration].recent++;
-		cpu_data->recent_idx[i] = idx_duration;
-	}
 
 	cpu_data->total += PULSE;
 }
@@ -289,13 +257,10 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
-	unsigned int idx_recent_sum = 0;
-	unsigned int recent_sum = 0;
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
-	bool alt_intercepts, alt_recent;
 	ktime_t delta_tick;
 	s64 duration_ns;
 	int i;
@@ -338,7 +303,6 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 */
 		intercept_sum += prev_bin->intercepts;
 		hit_sum += prev_bin->hits;
-		recent_sum += prev_bin->recent;
 
 		if (dev->states_usage[i].disable)
 			continue;
@@ -358,7 +322,6 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 		idx_intercept_sum = intercept_sum;
 		idx_hit_sum = hit_sum;
-		idx_recent_sum = recent_sum;
 	}
 
 	/* Avoid unnecessary overhead. */
@@ -373,42 +336,32 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * If the sum of the intercepts metric for all of the idle states
 	 * shallower than the current candidate one (idx) is greater than the
 	 * sum of the intercepts and hits metrics for the candidate state and
-	 * all of the deeper states, or the sum of the numbers of recent
-	 * intercepts over all of the states shallower than the candidate one
-	 * is greater than a half of the number of recent events taken into
-	 * account, the CPU is likely to wake up early, so find an alternative
-	 * idle state to select.
+	 * all of the deeper states a shallower idle state is likely to be a
+	 * better choice.
 	 */
-	alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
-	alt_recent = idx_recent_sum > NR_RECENT / 2;
-	if (alt_recent || alt_intercepts) {
+	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
 		s64 first_suitable_span_ns = duration_ns;
 		int first_suitable_idx = idx;
 
 		/*
 		 * Look for the deepest idle state whose target residency had
 		 * not exceeded the idle duration in over a half of the relevant
-		 * cases (both with respect to intercepts overall and with
-		 * respect to the recent intercepts only) in the past.
+		 * cases in the past.
 		 *
 		 * Take the possible latency constraint and duration limitation
 		 * present if the tick has been stopped already into account.
 		 */
 		intercept_sum = 0;
-		recent_sum = 0;
 
 		for (i = idx - 1; i >= 0; i--) {
 			struct teo_bin *bin = &cpu_data->state_bins[i];
 			s64 span_ns;
 
 			intercept_sum += bin->intercepts;
-			recent_sum += bin->recent;
 
 			span_ns = teo_middle_of_bin(i, drv);
 
-			if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
-			    (!alt_intercepts ||
-			     2 * intercept_sum > idx_intercept_sum)) {
+			if (2 * intercept_sum > idx_intercept_sum) {
 				if (teo_time_ok(span_ns) &&
 				    !dev->states_usage[i].disable) {
 					idx = i;
@@ -508,13 +461,9 @@  static int teo_enable_device(struct cpuidle_driver *drv,
 			     struct cpuidle_device *dev)
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
-	int i;
 
 	memset(cpu_data, 0, sizeof(*cpu_data));
 
-	for (i = 0; i < NR_RECENT; i++)
-		cpu_data->recent_idx[i] = -1;
-
 	return 0;
 }