| Message ID | 20220406220809.22555-3-lukasz.luba@arm.com |
|---|---|
| State | New |
| Series | Introduce Cpufreq Active Stats |
Hi Lukasz,

On Wed, 2022-04-06 at 23:08 +0100, Lukasz Luba wrote:
> @@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	trace_cpu_idle(index, dev->cpu);
>  	time_start = ns_to_ktime(local_clock());
>
> +	cpufreq_active_stats_cpu_idle_enter(time_start);
> +
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_enter();
> @@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>
> +	cpufreq_active_stats_cpu_idle_exit(time_end);
> +

At this point the interrupts are still disabled, and they get enabled later. So the more code you add here and the longer it executes, the longer you delay the interrupts. Therefore, you are effectively increasing IRQ latency from idle by adding more code here.

How much? I do not know; it depends on how much code you need to execute. But the amount of code in functions like this tends to increase over time.

So the risk is that we'll keep extending 'cpufreq_active_stats_cpu_idle_exit()' and (maybe unintentionally) increase idle interrupt latency.

This is not ideal.

We use the 'wult' tool (https://github.com/intel/wult) to measure C-state latency and interrupt latency on Intel platforms, and for fast C-states like Intel C1 we can see that even the current code between C-state exit and interrupts being re-enabled adds measurable overhead. I am worried about adding more stuff here.

Please consider collecting the stats after interrupts are re-enabled. You may lose some "precision" because of that, but it is probably overall better than adding to idle interrupt latency.

>  	/* The cpu is no longer idle or about to enter idle. */
>  	sched_idle_set_state(NULL);
Hi Artem,

Thanks for the comments!

On 4/26/22 13:05, Artem Bityutskiy wrote:
> Hi Lukasz,
>
> On Wed, 2022-04-06 at 23:08 +0100, Lukasz Luba wrote:
>> @@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>  	trace_cpu_idle(index, dev->cpu);
>>  	time_start = ns_to_ktime(local_clock());
>>
>> +	cpufreq_active_stats_cpu_idle_enter(time_start);
>> +
>>  	stop_critical_timings();
>>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>>  		rcu_idle_enter();
>> @@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>  	time_end = ns_to_ktime(local_clock());
>>  	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>>
>> +	cpufreq_active_stats_cpu_idle_exit(time_end);
>> +
>
> At this point the interrupts are still disabled, and they get enabled later. So
> the more code you add here and the longer it executes, the longer you delay the
> interrupts. Therefore, you are effectively increasing IRQ latency from idle by
> adding more code here.

Good point, I've added it just below the trace_cpu_idle().

> How much? I do not know; it depends on how much code you need to execute. But
> the amount of code in functions like this tends to increase over time.
>
> So the risk is that we'll keep extending 'cpufreq_active_stats_cpu_idle_exit()'
> and (maybe unintentionally) increase idle interrupt latency.
>
> This is not ideal.

I agree, I will try to find a better place to put this call.

> We use the 'wult' tool (https://github.com/intel/wult) to measure C-state
> latency and interrupt latency on Intel platforms, and for fast C-states like
> Intel C1 we can see that even the current code between C-state exit and
> interrupts being re-enabled adds measurable overhead.

Thanks for the hint and the link. I'll check that tool. I don't know if it would work with my platforms. I might create some tests for these latency measurements.

> I am worried about adding more stuff here.
>
> Please consider collecting the stats after interrupts are re-enabled. You may
> lose some "precision" because of that, but it is probably overall better than
> adding to idle interrupt latency.

Definitely. I don't need such precision, so later, when interrupts are re-enabled, is OK for me.

>>  	/* The cpu is no longer idle or about to enter idle. */
>>  	sched_idle_set_state(NULL);

This new call might be empty for your x86 kernels, since you probably set CONFIG_CPU_FREQ_STAT. I can add an additional config option, so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this new feature and the additional overhead in idle exit when e.g. CONFIG_CPU_FREQ_ACTIVE_STAT is not set.

The x86 platforms won't use the IPA governor, so it's reasonable to do it this way.

Does this sound good?

Regards,
Lukasz
On Tue, 2022-04-26 at 16:01 +0100, Lukasz Luba wrote:
>> I am worried about adding more stuff here.
>>
>> Please consider collecting the stats after interrupts are re-enabled. You may
>> lose some "precision" because of that, but it is probably overall better than
>> adding to idle interrupt latency.
>
> Definitely. I don't need such precision, so later, when interrupts are
> re-enabled, is OK for me.

Thanks. That is preferable in general: we do not do things with interrupts disabled unless there is a very good reason to.

> This new call might be empty for your x86 kernels, since you probably
> set CONFIG_CPU_FREQ_STAT. I can add an additional config option,
> so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this
> new feature and the additional overhead in idle exit when e.g.
> CONFIG_CPU_FREQ_ACTIVE_STAT is not set.
>
> The x86 platforms won't use the IPA governor, so it's reasonable to
> do it this way.
>
> Does this sound good?

I did not thoroughly read your patches, so I can't comment on the details.

Just pointing out that, in general, the idle path is to be considered a critical path, especially the part before interrupts are re-enabled. Not only on x86, but on all platforms using cpuidle. This does not mean we can't read more statistics there, but it does mean that we should be very careful about added overhead, keep it under control, etc.

Thank you!
On 4/26/22 17:29, Artem Bityutskiy wrote:
> On Tue, 2022-04-26 at 16:01 +0100, Lukasz Luba wrote:
>>> I am worried about adding more stuff here.
>>>
>>> Please consider collecting the stats after interrupts are re-enabled. You may
>>> lose some "precision" because of that, but it is probably overall better than
>>> adding to idle interrupt latency.
>>
>> Definitely. I don't need such precision, so later, when interrupts are
>> re-enabled, is OK for me.
>
> Thanks. That is preferable in general: we do not do things with interrupts
> disabled unless there is a very good reason to.
>
>> This new call might be empty for your x86 kernels, since you probably
>> set CONFIG_CPU_FREQ_STAT. I can add an additional config option,
>> so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this
>> new feature and the additional overhead in idle exit when e.g.
>> CONFIG_CPU_FREQ_ACTIVE_STAT is not set.
>>
>> The x86 platforms won't use the IPA governor, so it's reasonable to
>> do it this way.
>>
>> Does this sound good?
>
> I did not thoroughly read your patches, so I can't comment on the details.
>
> Just pointing out that, in general, the idle path is to be considered a
> critical path, especially the part before interrupts are re-enabled. Not only
> on x86, but on all platforms using cpuidle. This does not mean we can't read
> more statistics there, but it does mean that we should be very careful about
> added overhead, keep it under control, etc.

I totally agree with you. I didn't know that the interrupts were not enabled at that moment. I'll address it.

Regards,
Lukasz
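A minimal sketch of the direction agreed above: moving the exit hook past the point where cpuidle_enter_state() re-enables interrupts. The surrounding lines (sched_idle_set_state() and the coupled-state check before local_irq_enable()) are assumptions taken from a roughly contemporary kernel tree, not from the posted patch:

```c
	time_end = ns_to_ktime(local_clock());
	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

	/* The cpu is no longer idle or about to enter idle. */
	sched_idle_set_state(NULL);

	if (!cpuidle_state_is_coupled(drv, index))
		local_irq_enable();

	/*
	 * IRQs are back on from here, so time spent in the accounting
	 * below no longer adds to idle interrupt latency.
	 */
	cpufreq_active_stats_cpu_idle_exit(time_end);
```

Note that time_end is still captured early, with interrupts off; only the bookkeeping moves, so the measured interval itself is unchanged and only the ordering "precision" Artem mentions is affected.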
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..f19711b95afb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -16,6 +16,7 @@
 #include <linux/notifier.h>
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
+#include <linux/cpufreq_stats.h>
 #include <linux/cpuidle.h>
 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
@@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	trace_cpu_idle(index, dev->cpu);
 	time_start = ns_to_ktime(local_clock());
 
+	cpufreq_active_stats_cpu_idle_enter(time_start);
+
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
@@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
+	cpufreq_active_stats_cpu_idle_exit(time_end);
+
 	/* The cpu is no longer idle or about to enter idle. */
 	sched_idle_set_state(NULL);
The Cpufreq Active Stats mechanism tracks and accounts the activity of the CPU at each performance level: it accounts for the real residency, i.e. the time when the CPU was not idle, at a given performance level. This patch adds the calls which provide the CPU idle entry/exit events to the Cpufreq Active Stats.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpuidle/cpuidle.c | 5 +++++
 1 file changed, 5 insertions(+)