Message ID | 20230925081139.1305766-11-lukasz.luba@arm.com |
---|---|
State | New |
Series | Introduce runtime modifiable Energy Model |
Hi Lukasz,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on rafael-pm/thermal linus/master v6.6-rc3 next-20230926]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukasz-Luba/PM-EM-Add-missing-newline-for-the-message-log/20230925-181243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20230925081139.1305766-11-lukasz.luba%40arm.com
patch subject: [PATCH v4 10/18] PM: EM: Add RCU mechanism which safely cleans the old data
config: i386-randconfig-063-20230926 (https://download.01.org/0day-ci/archive/20230926/202309261850.jrucSbN8-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230926/202309261850.jrucSbN8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309261850.jrucSbN8-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> kernel/power/energy_model.c:125:13: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct em_perf_table *tmp @@     got struct em_perf_table [noderef] __rcu *runtime_table @@
   kernel/power/energy_model.c:125:13: sparse:     expected struct em_perf_table *tmp
   kernel/power/energy_model.c:125:13: sparse:     got struct em_perf_table [noderef] __rcu *runtime_table

vim +125 kernel/power/energy_model.c

   118
   119	static void em_perf_runtime_table_set(struct device *dev,
   120					      struct em_perf_table *runtime_table)
   121	{
   122		struct em_perf_domain *pd = dev->em_pd;
   123		struct em_perf_table *tmp;
   124
 > 125		tmp = pd->runtime_table;
   126
   127		rcu_assign_pointer(pd->runtime_table, runtime_table);
   128
   129		em_cpufreq_update_efficiencies(dev, runtime_table->state);
   130
   131		/* Don't free default table since it's used by other frameworks. */
   132		if (tmp != pd->default_table)
   133			call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);
   134	}
   135
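The sparse warning in this report comes from reading `pd->runtime_table`, which is annotated `__rcu`, with a plain assignment. One common way to keep sparse quiet on the update side is to read the pointer through `rcu_dereference_protected()`. What follows is a minimal userspace sketch of that shape, not kernel code: the `__rcu`, `rcu_dereference_protected()` and `rcu_assign_pointer()` macros are simplified substitutes for the kernel ones, the `em_pd_mutex_held` flag stands in for `lockdep_is_held(&em_pd_mutex)`, the structures are reduced to a few fields, and the helper name `em_swap_runtime_table()` is hypothetical. It also assumes that updaters hold `em_pd_mutex`, which the thread suggests but this patch does not show.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-ins so the fragment compiles outside the kernel.
 * In the kernel, __rcu is a sparse address-space annotation and
 * rcu_dereference_protected() strips it after checking the lock condition.
 */
#define __rcu
#define rcu_dereference_protected(p, cond) (assert(cond), (p))
#define rcu_assign_pointer(p, v) ((p) = (v))

/* Simplified layouts, for illustration only. */
struct em_perf_table {
	int nr_states;
};

struct em_perf_domain {
	struct em_perf_table __rcu *runtime_table;
	struct em_perf_table *default_table;
};

/* Stand-in for lockdep_is_held(&em_pd_mutex). */
static int em_pd_mutex_held;

/*
 * Hypothetical helper: publish new_table and return the previous table.
 * The old pointer is read through the accessor instead of a plain load,
 * which is what the sparse warning asks for.
 */
static struct em_perf_table *
em_swap_runtime_table(struct em_perf_domain *pd,
		      struct em_perf_table *new_table)
{
	struct em_perf_table *old;

	old = rcu_dereference_protected(pd->runtime_table, em_pd_mutex_held);
	rcu_assign_pointer(pd->runtime_table, new_table);

	return old;
}
```

The design point is that the update side states why the plain access is safe (the lock is held), so no `rcu_read_lock()` section is needed there.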
On 9/26/23 20:26, Rafael J. Wysocki wrote:
> On Mon, Sep 25, 2023 at 10:11 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> The EM is going to support runtime modifications of the power data.
>> Introduce RCU safe mechanism to clean up the old allocated EM data.
>
> "RCU-based" probably and "to clean up the old EM data safely".

Yes, thanks

>
>> It also adds a mutex for the EM structure to serialize the modifiers.
>
> This part doesn't match the code changes in the patch.

Good catch. It was left over from some older version. We use the existing
em_pd_mutex.

>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>  kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++
>>  1 file changed, 29 insertions(+)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index 5b40db38b745..2345837bfd2c 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -23,6 +23,9 @@
>>   */
>>  static DEFINE_MUTEX(em_pd_mutex);
>>
>> +static void em_cpufreq_update_efficiencies(struct device *dev,
>> +					   struct em_perf_state *table);
>> +
>>  static bool _is_cpu_device(struct device *dev)
>>  {
>>  	return (dev->bus == &cpu_subsys);
>> @@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {}
>>  static void em_debug_remove_pd(struct device *dev) {}
>>  #endif
>>
>> +static void em_destroy_rt_table_rcu(struct rcu_head *rp)
>
> Adding static functions without callers will obviously cause the
> compiler to complain, which is one of the reasons to avoid doing that.
> The other is that it is hard to say how these functions are going to
> be used without reviewing multiple patches simultaneously, which is a
> pain as far as I'm concerned.

It is used in this patch, but inside the call_rcu() as 2nd arg.
I have marked that below. The compiler didn't complain IIRC.

>
>> +{
>> +	struct em_perf_table *runtime_table;
>> +
>> +	runtime_table = container_of(rp, struct em_perf_table, rcu);
>> +	kfree(runtime_table->state);
>> +	kfree(runtime_table);
>
> If runtime_table and its state were allocated in one go, it would be
> possible to free them in one go too.
>
> For some reason, you don't seem to want to do that, but why?

We had a few internal reviews and there were voices saying that
it's better to have 2 identical tables: 'default_table' and
'runtime_table' to make sure it's visible everywhere when it's used.
That made the need to actually have also the 'state' table inside.
I don't see it as a big problem, though.

>
>> +}
>> +
>> +static void em_perf_runtime_table_set(struct device *dev,
>> +				      struct em_perf_table *runtime_table)
>> +{
>> +	struct em_perf_domain *pd = dev->em_pd;
>> +	struct em_perf_table *tmp;
>> +
>> +	tmp = pd->runtime_table;
>> +
>> +	rcu_assign_pointer(pd->runtime_table, runtime_table);
>> +
>> +	em_cpufreq_update_efficiencies(dev, runtime_table->state);
>> +
>> +	/* Don't free default table since it's used by other frameworks. */
>
> Apparently, some frameworks are only going to use the default table
> while the runtime-updatable table will be used somewhere else at the
> same time.
>
> I'm not really sure if this is a good idea.

Runtime table is only for driving the task placement in the EAS.

The thermal gov IPA won't make better decisions because it already
has the mechanism to accumulate the error that it made.

The same applies to DTPM, which works in a more 'configurable' way,
rather than a hard optimization mechanism (like EAS).

>
>> +	if (tmp != pd->default_table)
>> +		call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);

The em_destroy_rt_table_rcu() is used here ^^^^^^
On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 9/26/23 20:26, Rafael J. Wysocki wrote:
> > On Mon, Sep 25, 2023 at 10:11 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> The EM is going to support runtime modifications of the power data.
> >> Introduce RCU safe mechanism to clean up the old allocated EM data.
> >
> > "RCU-based" probably and "to clean up the old EM data safely".
>
> Yes, thanks
>
> >
> >> It also adds a mutex for the EM structure to serialize the modifiers.
> >
> > This part doesn't match the code changes in the patch.
>
> Good catch. It was left over from some older version. We use the existing
> em_pd_mutex.
>
> >
> >> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >> ---
> >>  kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++
> >>  1 file changed, 29 insertions(+)
> >>
> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> >> index 5b40db38b745..2345837bfd2c 100644
> >> --- a/kernel/power/energy_model.c
> >> +++ b/kernel/power/energy_model.c
> >> @@ -23,6 +23,9 @@
> >>   */
> >>  static DEFINE_MUTEX(em_pd_mutex);
> >>
> >> +static void em_cpufreq_update_efficiencies(struct device *dev,
> >> +					   struct em_perf_state *table);
> >> +
> >>  static bool _is_cpu_device(struct device *dev)
> >>  {
> >>  	return (dev->bus == &cpu_subsys);
> >> @@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {}
> >>  static void em_debug_remove_pd(struct device *dev) {}
> >>  #endif
> >>
> >> +static void em_destroy_rt_table_rcu(struct rcu_head *rp)
> >
> > Adding static functions without callers will obviously cause the
> > compiler to complain, which is one of the reasons to avoid doing that.
> > The other is that it is hard to say how these functions are going to
> > be used without reviewing multiple patches simultaneously, which is a
> > pain as far as I'm concerned.
>
> It is used in this patch, but inside the call_rcu() as 2nd arg.

I missed that, sorry for the noise.

> I have marked that below. The compiler didn't complain IIRC.
>
> >
> >> +{
> >> +	struct em_perf_table *runtime_table;
> >> +
> >> +	runtime_table = container_of(rp, struct em_perf_table, rcu);
> >> +	kfree(runtime_table->state);
> >> +	kfree(runtime_table);
> >
> > If runtime_table and its state were allocated in one go, it would be
> > possible to free them in one go too.
> >
> > For some reason, you don't seem to want to do that, but why?
>
> We had a few internal reviews and there were voices saying that
> it's better to have 2 identical tables: 'default_table' and
> 'runtime_table' to make sure it's visible everywhere when it's used.
> That made the need to actually have also the 'state' table inside.
> I don't see it as a big problem, though.

What I'm trying to say is that you can allocate runtime_table along
with the table pointed to by its state field in one invocation of
kzalloc() (say).

Having just one memory region to free eventually instead of two of
them would help to avoid some complexity, especially in the next
patch.

> >
> >> +}
> >> +
> >> +static void em_perf_runtime_table_set(struct device *dev,
> >> +				      struct em_perf_table *runtime_table)
> >> +{
> >> +	struct em_perf_domain *pd = dev->em_pd;
> >> +	struct em_perf_table *tmp;
> >> +
> >> +	tmp = pd->runtime_table;
> >> +
> >> +	rcu_assign_pointer(pd->runtime_table, runtime_table);
> >> +
> >> +	em_cpufreq_update_efficiencies(dev, runtime_table->state);
> >> +
> >> +	/* Don't free default table since it's used by other frameworks. */
> >
> > Apparently, some frameworks are only going to use the default table
> > while the runtime-updatable table will be used somewhere else at the
> > same time.
> >
> > I'm not really sure if this is a good idea.
>
> Runtime table is only for driving the task placement in the EAS.
>
> The thermal gov IPA won't make better decisions because it already
> has the mechanism to accumulate the error that it made.
>
> The same applies to DTPM, which works in a more 'configurable' way,
> rather than a hard optimization mechanism (like EAS).

My understanding of the above is that the other EM users don't really
care that much so they can get away with using the default table all
the time, but EAS needs more accuracy, so the table used by it needs
to be adjusted in certain situations.

Fair enough, I'm assuming that you've done some research around it.
Still, this is rather confusing.

> >
> >> +	if (tmp != pd->default_table)
> >> +		call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);
>
> The em_destroy_rt_table_rcu() is used here ^^^^^^
On 9/29/23 13:59, Rafael J. Wysocki wrote:
> On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:

[snip]

>> We had a few internal reviews and there were voices saying that
>> it's better to have 2 identical tables: 'default_table' and
>> 'runtime_table' to make sure it's visible everywhere when it's used.
>> That made the need to actually have also the 'state' table inside.
>> I don't see it as a big problem, though.
>
> What I'm trying to say is that you can allocate runtime_table along
> with the table pointed to by its state field in one invocation of
> kzalloc() (say).
>
> Having just one memory region to free eventually instead of two of
> them would help to avoid some complexity, especially in the next
> patch.

I think I know what you mean, basically:
------------------------------
struct em_perf_table {
	struct rcu_head rcu;
	struct em_perf_state state[];
};

kzalloc(sizeof(struct em_perf_table) + N * sizeof(struct em_perf_state))
------

IMO that should also be OK in the rest of places. I agree the
alloc/free code would be smaller. Let me do that then.

>
>>>
>>>> +}
>>>> +
>>>> +static void em_perf_runtime_table_set(struct device *dev,
>>>> +				      struct em_perf_table *runtime_table)
>>>> +{
>>>> +	struct em_perf_domain *pd = dev->em_pd;
>>>> +	struct em_perf_table *tmp;
>>>> +
>>>> +	tmp = pd->runtime_table;
>>>> +
>>>> +	rcu_assign_pointer(pd->runtime_table, runtime_table);
>>>> +
>>>> +	em_cpufreq_update_efficiencies(dev, runtime_table->state);
>>>> +
>>>> +	/* Don't free default table since it's used by other frameworks. */
>>>
>>> Apparently, some frameworks are only going to use the default table
>>> while the runtime-updatable table will be used somewhere else at the
>>> same time.
>>>
>>> I'm not really sure if this is a good idea.
>>
>> Runtime table is only for driving the task placement in the EAS.
>>
>> The thermal gov IPA won't make better decisions because it already
>> has the mechanism to accumulate the error that it made.
>>
>> The same applies to DTPM, which works in a more 'configurable' way,
>> rather than a hard optimization mechanism (like EAS).
>
> My understanding of the above is that the other EM users don't really
> care that much so they can get away with using the default table all
> the time, but EAS needs more accuracy, so the table used by it needs
> to be adjusted in certain situations.

Yes

>
> Fair enough, I'm assuming that you've done some research around it.
> Still, this is rather confusing.

Yes, I have presented those ~2y ago in Android Gerrit world
(got feedback from a few vendors) and in a few Linux conferences.

For now we don't plan to have this feature for the thermal
governor or something similar.

>
>>>
>>>> +	if (tmp != pd->default_table)
>>>> +		call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);
>>
>> The em_destroy_rt_table_rcu() is used here ^^^^^^
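The single-allocation idea agreed on in this exchange (one header plus a flexible array of states, freed with one call) can be tried out in userspace. This is a minimal sketch under stated assumptions, not the kernel implementation: `calloc()` stands in for `kzalloc()`, the `rcu_head` and `em_perf_state` definitions are reduced stand-ins, and the helper names `em_table_alloc()` and `em_table_free()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced stand-ins; the kernel structures carry more fields. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct em_perf_state {
	unsigned long frequency;
	unsigned long power;
};

struct em_perf_table {
	struct rcu_head rcu;
	struct em_perf_state state[];	/* flexible array member, must be last */
};

/*
 * Hypothetical helper: allocate the header and nr_states entries as one
 * zeroed region, so that a single free releases everything.
 */
static struct em_perf_table *em_table_alloc(size_t nr_states)
{
	assert(nr_states > 0);

	return calloc(1, sizeof(struct em_perf_table) +
			 nr_states * sizeof(struct em_perf_state));
}

/* Hypothetical helper: one region to free instead of two. */
static void em_table_free(struct em_perf_table *table)
{
	free(table);
}
```

With this layout, an RCU callback like the one in the patch would need only one `kfree()`. In kernel code the size is commonly computed with `struct_size()`, which also guards against overflow in the multiplication.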
Hi Rafael,

A change of direction here, regarding your comment below.

On 10/2/23 14:44, Lukasz Luba wrote:
>
>
> On 9/29/23 13:59, Rafael J. Wysocki wrote:
>> On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> [snip]
>

[snip]

>>>> Apparently, some frameworks are only going to use the default table
>>>> while the runtime-updatable table will be used somewhere else at the
>>>> same time.
>>>>
>>>> I'm not really sure if this is a good idea.
>>>
>>> Runtime table is only for driving the task placement in the EAS.
>>>
>>> The thermal gov IPA won't make better decisions because it already
>>> has the mechanism to accumulate the error that it made.
>>>
>>> The same applies to DTPM, which works in a more 'configurable' way,
>>> rather than a hard optimization mechanism (like EAS).
>>
>> My understanding of the above is that the other EM users don't really
>> care that much so they can get away with using the default table all
>> the time, but EAS needs more accuracy, so the table used by it needs
>> to be adjusted in certain situations.
>
> Yes
>
>>
>> Fair enough, I'm assuming that you've done some research around it.
>> Still, this is rather confusing.
>
> Yes, I have presented those ~2y ago in Android Gerrit world
> (got feedback from a few vendors) and in a few Linux conferences.
>
> For now we don't plan to have this feature for the thermal
> governor or something similar.
>

I have discussed with one of our partners your comment about 2 tables.
They would like to have this runtime modified EM in other places
as well: DTPM and thermal governor. So you had a good gut feeling.

In the past in our IPA (thermal gov ~2016 and kernel v4.14) we
had two callbacks:
- get_static_power() [1]
- get_dynamic_power() [2]

Later ~2017/2018 v4.16 the static power mechanism was removed
completely by this commit 84fe2cab48590e4373978e4e.
The way it was designed, implemented and used justified that
decision. We later used EM in the cpu cooling which also only
had dynamic power information.

The PID mechanism in IPA tries to compensate for that
missing information (about changed static power in time or a chip
binning) and adjusts the 'error'. How good and fast that is in all
situations - it's a different story (out of this scope).
So, IPA should not be worse with the runtime table.

The static power was on the chips and probably will be still.
You might remember my slide 13 from OSPM2024 showing two power
usage plots for the same Big CPU and 1.4GHz fixed (50% of fmax):
- w/ GPU working in the background using 1-1.5W
- w/o GPU in the background

The same workload ran on Big, but the power is ~15% higher
after ~1min.

The static power (leakage) is the issue that this patch tries
to address for EAS. Although, there is not only the leakage.
It's about the whole 'profile', which can be different than what
could be built during boot default information.

So we would want to go for one single table in EM, which
is runtime modifiable.

That is something that you might be more confident in, and we would
have less diversity (2 tables) in the kernel.

Regards,
Lukasz

[1] https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L336
[2] https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L383
On Fri, Oct 6, 2023 at 1:45 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Rafael,
>
> A change of direction here, regarding your comment below.
>
> On 10/2/23 14:44, Lukasz Luba wrote:
> >
> >
> > On 9/29/23 13:59, Rafael J. Wysocki wrote:
> >> On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >
> > [snip]
> >
>
> [snip]
>
> >>>> Apparently, some frameworks are only going to use the default table
> >>>> while the runtime-updatable table will be used somewhere else at the
> >>>> same time.
> >>>>
> >>>> I'm not really sure if this is a good idea.
> >>>
> >>> Runtime table is only for driving the task placement in the EAS.
> >>>
> >>> The thermal gov IPA won't make better decisions because it already
> >>> has the mechanism to accumulate the error that it made.
> >>>
> >>> The same applies to DTPM, which works in a more 'configurable' way,
> >>> rather than a hard optimization mechanism (like EAS).
> >>
> >> My understanding of the above is that the other EM users don't really
> >> care that much so they can get away with using the default table all
> >> the time, but EAS needs more accuracy, so the table used by it needs
> >> to be adjusted in certain situations.
> >
> > Yes
> >
> >>
> >> Fair enough, I'm assuming that you've done some research around it.
> >> Still, this is rather confusing.
> >
> > Yes, I have presented those ~2y ago in Android Gerrit world
> > (got feedback from a few vendors) and in a few Linux conferences.
> >
> > For now we don't plan to have this feature for the thermal
> > governor or something similar.
> >
>
> I have discussed with one of our partners your comment about 2 tables.
> They would like to have this runtime modified EM in other places
> as well: DTPM and thermal governor. So you had a good gut feeling.
>
> In the past in our IPA (thermal gov ~2016 and kernel v4.14) we
> had two callbacks:
> - get_static_power() [1]
> - get_dynamic_power() [2]
>
> Later ~2017/2018 v4.16 the static power mechanism was removed
> completely by this commit 84fe2cab48590e4373978e4e.
> The way it was designed, implemented and used justified that
> decision. We later used EM in the cpu cooling which also only
> had dynamic power information.
>
> The PID mechanism in IPA tries to compensate for that
> missing information (about changed static power in time or a chip
> binning) and adjusts the 'error'. How good and fast that is in all
> situations - it's a different story (out of this scope).
> So, IPA should not be worse with the runtime table.
>
> The static power was on the chips and probably will be still.
> You might remember my slide 13 from OSPM2024 showing two power
> usage plots for the same Big CPU and 1.4GHz fixed (50% of fmax):
> - w/ GPU working in the background using 1-1.5W
> - w/o GPU in the background
>
> The same workload ran on Big, but the power is ~15% higher
> after ~1min.
>
> The static power (leakage) is the issue that this patch tries
> to address for EAS. Although, there is not only the leakage.
> It's about the whole 'profile', which can be different than what
> could be built during boot default information.
>
> So we would want to go for one single table in EM, which
> is runtime modifiable.
>
> That is something that you might be more confident in, and we would
> have less diversity (2 tables) in the kernel.
>
> Regards,
> Lukasz
>
>

Indeed, we had a conversation about this with Lukasz recently. The key
idea is that there is no compelling reason to introduce diversity in
the mathematics involved. If we have confidence in the superior
accuracy of our model, it should be universally implemented. While the
governors are designed with some error tolerance, they can benefit
from enhanced accuracy in their operation.

Thanks!
-Wei

> [1]
> https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L336
> [2]
> https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L383
On Wed, Oct 11, 2023 at 6:03 PM Wei Wang <wvw@google.com> wrote:
>
> On Fri, Oct 6, 2023 at 1:45 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >
> > Hi Rafael,
> >
> > A change of direction here, regarding your comment below.
> >
> > On 10/2/23 14:44, Lukasz Luba wrote:
> > >
> > >
> > > On 9/29/23 13:59, Rafael J. Wysocki wrote:
> > >> On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
> > >
> > > [snip]
> > >
> >
> > [snip]
> >
> > >>>> Apparently, some frameworks are only going to use the default table
> > >>>> while the runtime-updatable table will be used somewhere else at the
> > >>>> same time.
> > >>>>
> > >>>> I'm not really sure if this is a good idea.
> > >>>
> > >>> Runtime table is only for driving the task placement in the EAS.
> > >>>
> > >>> The thermal gov IPA won't make better decisions because it already
> > >>> has the mechanism to accumulate the error that it made.
> > >>>
> > >>> The same applies to DTPM, which works in a more 'configurable' way,
> > >>> rather than a hard optimization mechanism (like EAS).
> > >>
> > >> My understanding of the above is that the other EM users don't really
> > >> care that much so they can get away with using the default table all
> > >> the time, but EAS needs more accuracy, so the table used by it needs
> > >> to be adjusted in certain situations.
> > >
> > > Yes
> > >
> > >>
> > >> Fair enough, I'm assuming that you've done some research around it.
> > >> Still, this is rather confusing.
> > >
> > > Yes, I have presented those ~2y ago in Android Gerrit world
> > > (got feedback from a few vendors) and in a few Linux conferences.
> > >
> > > For now we don't plan to have this feature for the thermal
> > > governor or something similar.
> > >
> >
> > I have discussed with one of our partners your comment about 2 tables.
> > They would like to have this runtime modified EM in other places
> > as well: DTPM and thermal governor. So you had a good gut feeling.
> >
> > In the past in our IPA (thermal gov ~2016 and kernel v4.14) we
> > had two callbacks:
> > - get_static_power() [1]
> > - get_dynamic_power() [2]
> >
> > Later ~2017/2018 v4.16 the static power mechanism was removed
> > completely by this commit 84fe2cab48590e4373978e4e.
> > The way it was designed, implemented and used justified that
> > decision. We later used EM in the cpu cooling which also only
> > had dynamic power information.
> >
> > The PID mechanism in IPA tries to compensate for that
> > missing information (about changed static power in time or a chip
> > binning) and adjusts the 'error'. How good and fast that is in all
> > situations - it's a different story (out of this scope).
> > So, IPA should not be worse with the runtime table.
> >
> > The static power was on the chips and probably will be still.
> > You might remember my slide 13 from OSPM2024 showing two power
> > usage plots for the same Big CPU and 1.4GHz fixed (50% of fmax):
> > - w/ GPU working in the background using 1-1.5W
> > - w/o GPU in the background
> >
> > The same workload ran on Big, but the power is ~15% higher
> > after ~1min.
> >
> > The static power (leakage) is the issue that this patch tries
> > to address for EAS. Although, there is not only the leakage.
> > It's about the whole 'profile', which can be different than what
> > could be built during boot default information.
> >
> > So we would want to go for one single table in EM, which
> > is runtime modifiable.
> >
> > That is something that you might be more confident in, and we would
> > have less diversity (2 tables) in the kernel.
> >
> > Regards,
> > Lukasz
> >
> >
>
> Indeed, we had a conversation about this with Lukasz recently. The key
> idea is that there is no compelling reason to introduce diversity in
> the mathematics involved. If we have confidence in the superior
> accuracy of our model, it should be universally implemented. While the
> governors are designed with some error tolerance, they can benefit
> from enhanced accuracy in their operation.

I agree, thanks!

> > [1]
> > https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L336
> > [2]
> > https://elixir.bootlin.com/linux/v4.14/source/drivers/thermal/cpu_cooling.c#L383
On 10/11/23 17:07, Rafael J. Wysocki wrote:
> On Wed, Oct 11, 2023 at 6:03 PM Wei Wang <wvw@google.com> wrote:
>>
>> On Fri, Oct 6, 2023 at 1:45 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>
>>> Hi Rafael,
>>>
>>> A change of direction here, regarding your comment below.
>>>
>>> On 10/2/23 14:44, Lukasz Luba wrote:
>>>>
>>>>
>>>> On 9/29/23 13:59, Rafael J. Wysocki wrote:
>>>>> On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> [snip]
>>>>
>>>
>>> [snip]
>>>
>>>>>>> Apparently, some frameworks are only going to use the default table
>>>>>>> while the runtime-updatable table will be used somewhere else at the
>>>>>>> same time.
>>>>>>>
>>>>>>> I'm not really sure if this is a good idea.
>>>>>>
>>>>>> Runtime table is only for driving the task placement in the EAS.
>>>>>>
>>>>>> The thermal gov IPA won't make better decisions because it already
>>>>>> has the mechanism to accumulate the error that it made.
>>>>>>
>>>>>> The same applies to DTPM, which works in a more 'configurable' way,
>>>>>> rather than a hard optimization mechanism (like EAS).
>>>>>
>>>>> My understanding of the above is that the other EM users don't really
>>>>> care that much so they can get away with using the default table all
>>>>> the time, but EAS needs more accuracy, so the table used by it needs
>>>>> to be adjusted in certain situations.
>>>>
>>>> Yes
>>>>
>>>>>
>>>>> Fair enough, I'm assuming that you've done some research around it.
>>>>> Still, this is rather confusing.
>>>>
>>>> Yes, I have presented those ~2y ago in Android Gerrit world
>>>> (got feedback from a few vendors) and in a few Linux conferences.
>>>>
>>>> For now we don't plan to have this feature for the thermal
>>>> governor or something similar.
>>>>
>>>
>>> I have discussed with one of our partners your comment about 2 tables.
>>> They would like to have this runtime modified EM in other places
>>> as well: DTPM and thermal governor. So you had a good gut feeling.
>>>
>>> In the past in our IPA (thermal gov ~2016 and kernel v4.14) we
>>> had two callbacks:
>>> - get_static_power() [1]
>>> - get_dynamic_power() [2]
>>>
>>> Later ~2017/2018 v4.16 the static power mechanism was removed
>>> completely by this commit 84fe2cab48590e4373978e4e.
>>> The way it was designed, implemented and used justified that
>>> decision. We later used EM in the cpu cooling which also only
>>> had dynamic power information.
>>>
>>> The PID mechanism in IPA tries to compensate for that
>>> missing information (about changed static power in time or a chip
>>> binning) and adjusts the 'error'. How good and fast that is in all
>>> situations - it's a different story (out of this scope).
>>> So, IPA should not be worse with the runtime table.
>>>
>>> The static power was on the chips and probably will be still.
>>> You might remember my slide 13 from OSPM2024 showing two power
>>> usage plots for the same Big CPU and 1.4GHz fixed (50% of fmax):
>>> - w/ GPU working in the background using 1-1.5W
>>> - w/o GPU in the background
>>>
>>> The same workload ran on Big, but the power is ~15% higher
>>> after ~1min.
>>>
>>> The static power (leakage) is the issue that this patch tries
>>> to address for EAS. Although, there is not only the leakage.
>>> It's about the whole 'profile', which can be different than what
>>> could be built during boot default information.
>>>
>>> So we would want to go for one single table in EM, which
>>> is runtime modifiable.
>>>
>>> That is something that you might be more confident in, and we would
>>> have less diversity (2 tables) in the kernel.
>>>
>>> Regards,
>>> Lukasz
>>>
>>>
>>
>> Indeed, we had a conversation about this with Lukasz recently. The key
>> idea is that there is no compelling reason to introduce diversity in
>> the mathematics involved. If we have confidence in the superior
>> accuracy of our model, it should be universally implemented. While the
>> governors are designed with some error tolerance, they can benefit
>> from enhanced accuracy in their operation.
>
> I agree, thanks!
>

Thank you Wei and Rafael. I'm working on that implementation and it
will be in v5 soon.
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 5b40db38b745..2345837bfd2c 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -23,6 +23,9 @@
  */
 static DEFINE_MUTEX(em_pd_mutex);
 
+static void em_cpufreq_update_efficiencies(struct device *dev,
+					   struct em_perf_state *table);
+
 static bool _is_cpu_device(struct device *dev)
 {
 	return (dev->bus == &cpu_subsys);
@@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {}
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
+static void em_destroy_rt_table_rcu(struct rcu_head *rp)
+{
+	struct em_perf_table *runtime_table;
+
+	runtime_table = container_of(rp, struct em_perf_table, rcu);
+	kfree(runtime_table->state);
+	kfree(runtime_table);
+}
+
+static void em_perf_runtime_table_set(struct device *dev,
+				      struct em_perf_table *runtime_table)
+{
+	struct em_perf_domain *pd = dev->em_pd;
+	struct em_perf_table *tmp;
+
+	tmp = pd->runtime_table;
+
+	rcu_assign_pointer(pd->runtime_table, runtime_table);
+
+	em_cpufreq_update_efficiencies(dev, runtime_table->state);
+
+	/* Don't free default table since it's used by other frameworks. */
+	if (tmp != pd->default_table)
+		call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);
+}
+
 static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    struct em_data_callback *cb, int nr_states,
 			    unsigned long flags)
The EM is going to support runtime modifications of the power data.
Introduce RCU safe mechanism to clean up the old allocated EM data.
It also adds a mutex for the EM structure to serialize the modifiers.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
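The cleanup scheme this patch describes (publish the new table, release the old one only after readers can no longer see it, and never release the default table) can be illustrated in userspace. This is a rough sketch under loud assumptions and is not RCU: a C11 atomic exchange replaces the separate read plus `rcu_assign_pointer()`, a pending list flushed by the hypothetical `grace_period_elapsed()` stands in for `call_rcu()` and the grace period, the structures are reduced to a couple of fields, and writers are assumed to be serialized by the caller. All helper names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct em_perf_table {
	struct em_perf_table *next_pending;	/* stand-in for struct rcu_head */
	int nr_states;
};

struct em_perf_domain {
	_Atomic(struct em_perf_table *) runtime_table;
	struct em_perf_table *default_table;
};

/* Tables waiting for the simulated grace period to end. */
static struct em_perf_table *pending;
static int freed_count;

/* Stand-in for call_rcu(): queue the table, do not free it yet. */
static void defer_free(struct em_perf_table *table)
{
	table->next_pending = pending;
	pending = table;
}

/* Stand-in for the grace period elapsing: now the old tables can go. */
static void grace_period_elapsed(void)
{
	while (pending) {
		struct em_perf_table *table = pending;

		pending = table->next_pending;
		free(table);
		freed_count++;
	}
}

/* Publish new_table; defer freeing the old one unless it is the default. */
static void runtime_table_set(struct em_perf_domain *pd,
			      struct em_perf_table *new_table)
{
	struct em_perf_table *old;

	assert(new_table != NULL);

	/* The exchange publishes the fully initialised table to readers. */
	old = atomic_exchange_explicit(&pd->runtime_table, new_table,
				       memory_order_acq_rel);

	/* The default table stays alive for its other users. */
	if (old != pd->default_table)
		defer_free(old);
}
```

Setting a first runtime table leaves the pending list empty, because the table being replaced is the default one. Only a later replacement queues a table for deferred release.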