diff mbox series

[v2] cpufreq: Register with perf domain before

Message ID 20230118044733.29391-1-bhuwz@163.com
State New
Headers show
Series [v2] cpufreq: Register with perf domain before | expand

Commit Message

Vincent Wang Jan. 18, 2023, 4:47 a.m. UTC
From: Vincent Wang <vincentwang3@lenovo.com>

We found the following issue during kernel boot on android phone:

[    1.325272][    T1] cpu cpu0: EM: created perf domain
[    1.329317][    T1] cpu cpu4: EM: created perf domain
[    1.337597][   T76] pd_init: no EM found for CPU7
[    1.350849][    T1] cpu cpu7: EM: created perf domain

pd init for cluster2 is executed in a kworker thread and
is earlier than the perf domain creation for cluster2.

pd_init() is called from the cpufreq notification of
CPUFREQ_CREATE_POLICY in cpufreq_online(), which is earlier
than that cpufreq_driver->register_em() is called.

To avoid this issue, register with perf domain should be
earlier than the notification is sent.

Signed-off-by: Vincent Wang <vincentwang3@lenovo.com>
---
v1 -> v2:  based on Rafael's comment, adjust the order of
regitster perf domain. But I think it's no need to be in
advance to the initialization of frequency QoS.

Change the description of this patch.

 drivers/cpufreq/cpufreq.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

Comments

Viresh Kumar Jan. 18, 2023, 8:49 a.m. UTC | #1
On 18-01-23, 12:47, Vincent Wang wrote:
> From: Vincent Wang <vincentwang3@lenovo.com>
> 
> We found the following issue during kernel boot on android phone:
> 
> [    1.325272][    T1] cpu cpu0: EM: created perf domain
> [    1.329317][    T1] cpu cpu4: EM: created perf domain
> [    1.337597][   T76] pd_init: no EM found for CPU7
> [    1.350849][    T1] cpu cpu7: EM: created perf domain
> 
> pd init for cluster2 is executed in a kworker thread and
> is earlier than the perf domain creation for cluster2.

Can you please give detail of the exact code path, for mainline kernel
? I am not sure which kworker thread are you talking about here.

> pd_init() is called from the cpufreq notification of
> CPUFREQ_CREATE_POLICY in cpufreq_online(), which is earlier
> than that cpufreq_driver->register_em() is called.
Lukasz Luba Jan. 18, 2023, 11:55 a.m. UTC | #2
+Quentin and Todd

On 1/18/23 10:00, Vincent Wang3 wrote:
> Hi, Lukasz, Viresh
> 
> I found this issue on Android phone with kernel 5.15, and the governor is schedutil.
> With the pd_init issue, the rd->pd will be NULL and EAS doesn't work since rd->pd will be checked in function find_energy_efficient_cpu( ).

That's true, but that's not mainline issue. This discussion should
be moved to Android kernel bug tracking system.
I've seen probably similar issue in Android.

I have added Quentin and Todd who are more familiar with this design in
Android kernel.

> 
> I didn't notice the modification in schedutil at mainline, so I submitted this patch to fix the problem, sorry!
> 
> However, besides EAS will check rd->pd, I noticed that it's also used in the find_busiest_group() and update_cpu_capacity() functions at mainline, so I'm worried that may not be enough to circumvent it only in schedutil.

Those function indeed use the perf domain, but they don't use the
Energy Model pointer: rd->pd->em_pd, so they don't suffer.

> 
> 
> detail of the exact code path:
> in function register_cpufreq_notifier( ), the notifier_block init_cpu_capacity_notifier is registered with CPUFREQ_POLICY_NOTIFIER.
> During the initialization of a new policy, the notifier_call function init_cpu_capacity_callback( ) will be called,in which the work update_topology_flags_work will be scheduled.
> 
> for the kwork function update_topology_flags_workfn( ):
> update_topology_flags_workfn( ) -> rebuild_sched_domains( ) -> rebuild_sched_domains_locked( ) -> partition_sched_domains_locked( ) -> build_perf_domains( ) -> pd_init( )
> 
> 
> BRs
> Vincent
> 
> -----邮件原件-----
> 发件人: Lukasz Luba <lukasz.luba@arm.com>
> 发送时间: 2023年1月18日 17:24
> 收件人: Viresh Kumar <viresh.kumar@linaro.org>; Vincent Wang <bhuwz@163.com>; Vincent Wang3 <vincentwang3@lenovo.com>
> 抄送: rafael@kernel.org; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org
> 主题: [External] Re: [PATCH v2] cpufreq: Register with perf domain before
> 
> Hi Viresh, Vincent
> 
> I'm surprised seeing this thread and thanks that you Viresh have answered, so it could go through my spam/junk filters.
> (I hope when answer to that domain it would change something).
> 
> On 1/18/23 08:49, Viresh Kumar wrote:
>> On 18-01-23, 12:47, Vincent Wang wrote:
>>> From: Vincent Wang <vincentwang3@lenovo.com>
>>>
>>> We found the following issue during kernel boot on android phone:
>>>
>>> [    1.325272][    T1] cpu cpu0: EM: created perf domain
>>> [    1.329317][    T1] cpu cpu4: EM: created perf domain
>>> [    1.337597][   T76] pd_init: no EM found for CPU7
>>> [    1.350849][    T1] cpu cpu7: EM: created perf domain
>>>
>>> pd init for cluster2 is executed in a kworker thread and is earlier
>>> than the perf domain creation for cluster2.
>>
>> Can you please give detail of the exact code path, for mainline kernel
>> ? I am not sure which kworker thread are you talking about here.
> 
> Please also tell us your cpufreq governor. The schedutil governor at mainline can handle those situations and we rebuild the perf domains here [1].
> 
>>
>>> pd_init() is called from the cpufreq notification of
>>> CPUFREQ_CREATE_POLICY in cpufreq_online(), which is earlier than that
>>> cpufreq_driver->register_em() is called.
> 
> Viresh, If that's an issue for other governors, than maybe we should address that. IMO the patch just relies on side-effect in arch_topology.c update_topology_flags_workfn(). That would be a workaround and dangerous if someone would change that arch_topology.c design. Shouldn't we come up with something reliable inside the cpufreq.c if there is a real issue?
> Even now, these two mechanisms:
> 1. in the schedutil [1]
> 2. in the update_topology_flags_workfn() are a bit leaky (in terms of design holes).
> 
> Regards,
> Lukasz
> 
> [1]
> https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fkernel%2Fsched%2Fcpufreq_schedutil.c%23L858&data=05%7C01%7Cvincentwang3%40lenovo.com%7Cc69b7dca4401474bd23808daf935bfd5%7C5c7d0b28bdf8410caa934df372b16203%7C0%7C0%7C638096306507392144%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WTPpWDytIsbfdW920cGNiHhEs7udgrX0O78sAh8JV7I%3D&reserved=0
>
Viresh Kumar Jan. 19, 2023, 5:39 a.m. UTC | #3
On 18-01-23, 09:24, Lukasz Luba wrote:
> Viresh, If that's an issue for other governors, than maybe
> we should address that.

As you said in the other email, it isn't a mainline issue, I will
ignore it then :)
Lukasz Luba Jan. 19, 2023, 7:58 a.m. UTC | #4
On 1/19/23 05:39, Viresh Kumar wrote:
> On 18-01-23, 09:24, Lukasz Luba wrote:
>> Viresh, If that's an issue for other governors, than maybe
>> we should address that.
> 
> As you said in the other email, it isn't a mainline issue, I will
> ignore it then :)
> 

Fair enough. Although, I hope one day, maybe at some conference,
we could discuss those Android-kernel vs. mainline differences.
IMO the gap isn't shrinking, but I would like to shrink it.
Especially when we are serious about long term kernel
support for mobile devices, this gap should be tiny.
Therefore, I would tackle this scheduler cpufreq governors and API
for them.
Viresh Kumar Jan. 19, 2023, 8:32 a.m. UTC | #5
On 19-01-23, 07:58, Lukasz Luba wrote:
> Although, I hope one day, maybe at some conference,
> we could discuss those Android-kernel vs. mainline differences.
> IMO the gap isn't shrinking, but I would like to shrink it.
> Especially when we are serious about long term kernel
> support for mobile devices, this gap should be tiny.
> Therefore, I would tackle this scheduler cpufreq governors and API
> for them.

Sure, or if there is something worth pushing to the mainline kernel,
cpufreq specific, then I am up for it.
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7e56a42750ea..a715c8323897 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1431,6 +1431,24 @@  static int cpufreq_online(unsigned int cpu)
 			goto out_destroy_policy;
 		}
 
+		/*
+		 * Register with the energy model before
+		 * blocking_notifier_call_chain() is called for
+		 * CPUFREQ_CREATE_POLICY, which will result in rebuilding of the
+		 * sched domains in update_topology_flags_workfn().
+		 *
+		 * Register with the energy model before
+		 * sched_cpufreq_governor_change() is called, which will result
+		 * in rebuilding of the sched domains, which should only be done
+		 * once the energy model is properly initialized for the policy
+		 * first.
+		 *
+		 * Also, this should be called before the policy is registered
+		 * with cooling framework.
+		 */
+		if (cpufreq_driver->register_em)
+			cpufreq_driver->register_em(policy);
+
 		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 				CPUFREQ_CREATE_POLICY, policy);
 	}
@@ -1493,19 +1511,6 @@  static int cpufreq_online(unsigned int cpu)
 		write_lock_irqsave(&cpufreq_driver_lock, flags);
 		list_add(&policy->policy_list, &cpufreq_policy_list);
 		write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
-		/*
-		 * Register with the energy model before
-		 * sched_cpufreq_governor_change() is called, which will result
-		 * in rebuilding of the sched domains, which should only be done
-		 * once the energy model is properly initialized for the policy
-		 * first.
-		 *
-		 * Also, this should be called before the policy is registered
-		 * with cooling framework.
-		 */
-		if (cpufreq_driver->register_em)
-			cpufreq_driver->register_em(policy);
 	}
 
 	ret = cpufreq_init_policy(policy);