Message ID | tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com |
---|---|
State | New |
Headers | show |
Series | [v2] PM: EM: Fix potential division-by-zero error in em_compute_costs() | expand |
Hi Yaxiong, On 4/11/25 02:28, Yaxiong Tian wrote: > From: Yaxiong Tian <tianyaxiong@kylinos.cn> > > When the device is of a non-CPU type, table[i].performance won't be > initialized in the previous em_init_performance(), resulting in division > by zero when calculating costs in em_compute_costs(). > > Since the 'cost' algorithm is only used for EAS energy efficiency > calculations and is currently not utilized by other device drivers, we > should add the _is_cpu_device(dev) check to prevent this division-by-zero > issue. > > Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") > Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > --- > kernel/power/energy_model.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index d9b7e2b38c7a..d1fa7e8787b5 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > cost, ret); > return -EINVAL; > } > - } else { > + } else if (_is_cpu_device(dev)) { > /* increase resolution of 'cost' precision */ > power_res = table[i].power * 10; > cost = power_res / table[i].performance; As the test robot pointed out, please set the 'cost' to 0 where it's declared. The rest should be fine. Regards, Lukasz
在 2025/4/14 16:08, Lukasz Luba 写道: > Hi Yaxiong, > > On 4/11/25 02:28, Yaxiong Tian wrote: >> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >> >> When the device is of a non-CPU type, table[i].performance won't be >> initialized in the previous em_init_performance(), resulting in division >> by zero when calculating costs in em_compute_costs(). >> >> Since the 'cost' algorithm is only used for EAS energy efficiency >> calculations and is currently not utilized by other device drivers, we >> should add the _is_cpu_device(dev) check to prevent this division-by-zero >> issue. >> >> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove >> division") >> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >> --- >> kernel/power/energy_model.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >> index d9b7e2b38c7a..d1fa7e8787b5 100644 >> --- a/kernel/power/energy_model.c >> +++ b/kernel/power/energy_model.c >> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, >> struct em_perf_state *table, >> cost, ret); >> return -EINVAL; >> } >> - } else { >> + } else if (_is_cpu_device(dev)) { >> /* increase resolution of 'cost' precision */ >> power_res = table[i].power * 10; >> cost = power_res / table[i].performance; > > > As the test robot pointed out, please set the 'cost' to 0 > where it's declared. > > The rest should be fine. > > Regards, > Lukasz Okay. I wasn’t sure whether the new patch should reuse the current Message-ID or create a new one, so I checked with ChatGPT and kept the original Message-ID for v2. If a new Message-ID is required, I can resend the email.
在 2025/4/14 16:08, Lukasz Luba 写道: > Hi Yaxiong, > > On 4/11/25 02:28, Yaxiong Tian wrote: >> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >> >> When the device is of a non-CPU type, table[i].performance won't be >> initialized in the previous em_init_performance(), resulting in division >> by zero when calculating costs in em_compute_costs(). >> >> Since the 'cost' algorithm is only used for EAS energy efficiency >> calculations and is currently not utilized by other device drivers, we >> should add the _is_cpu_device(dev) check to prevent this division-by-zero >> issue. >> >> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove >> division") >> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >> --- >> kernel/power/energy_model.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >> index d9b7e2b38c7a..d1fa7e8787b5 100644 >> --- a/kernel/power/energy_model.c >> +++ b/kernel/power/energy_model.c >> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, >> struct em_perf_state *table, >> cost, ret); >> return -EINVAL; >> } >> - } else { >> + } else if (_is_cpu_device(dev)) { >> /* increase resolution of 'cost' precision */ >> power_res = table[i].power * 10; >> cost = power_res / table[i].performance; > > > As the test robot pointed out, please set the 'cost' to 0 > where it's declared. > > The rest should be fine. > > Regards, > Lukasz Sorry, the V3 version with cost=0 still has issues. I noticed that if the cost is set to 0, the condition "if (table[i].cost >= prev_cost)" in the following code will always evaluate to true. This will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT. Should we change ">=" to ">"?
在 2025/4/15 09:12, Yaxiong Tian 写道: > > > 在 2025/4/14 16:08, Lukasz Luba 写道: >> Hi Yaxiong, >> >> On 4/11/25 02:28, Yaxiong Tian wrote: >>> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >>> >>> When the device is of a non-CPU type, table[i].performance won't be >>> initialized in the previous em_init_performance(), resulting in division >>> by zero when calculating costs in em_compute_costs(). >>> >>> Since the 'cost' algorithm is only used for EAS energy efficiency >>> calculations and is currently not utilized by other device drivers, we >>> should add the _is_cpu_device(dev) check to prevent this >>> division-by-zero >>> issue. >>> >>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove >>> division") >>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >>> --- >>> kernel/power/energy_model.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >>> index d9b7e2b38c7a..d1fa7e8787b5 100644 >>> --- a/kernel/power/energy_model.c >>> +++ b/kernel/power/energy_model.c >>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, >>> struct em_perf_state *table, >>> cost, ret); >>> return -EINVAL; >>> } >>> - } else { >>> + } else if (_is_cpu_device(dev)) { >>> /* increase resolution of 'cost' precision */ >>> power_res = table[i].power * 10; >>> cost = power_res / table[i].performance; >> >> >> As the test robot pointed out, please set the 'cost' to 0 >> where it's declared. >> >> The rest should be fine. >> >> Regards, >> Lukasz > > Sorry, the V3 version with cost=0 still has issues. > > I noticed that if the cost is set to 0, the condition "if (table[i].cost > >= prev_cost)" in the following code will always evaluate to true. This > will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT. > > Should we change ">=" to ">"? > Sorry Again, Setting EM_PERF_STATE_INEFFICIENT in this case is correct. Earlier, I misunderstood the definition/usage of EM_PERF_STATE_INEFFICIENT.
On Tue, Apr 15, 2025 at 4:03 AM Yaxiong Tian <iambestgod@qq.com> wrote: > > > > 在 2025/4/15 09:12, Yaxiong Tian 写道: > > > > > > 在 2025/4/14 16:08, Lukasz Luba 写道: > >> Hi Yaxiong, > >> > >> On 4/11/25 02:28, Yaxiong Tian wrote: > >>> From: Yaxiong Tian <tianyaxiong@kylinos.cn> > >>> > >>> When the device is of a non-CPU type, table[i].performance won't be > >>> initialized in the previous em_init_performance(), resulting in division > >>> by zero when calculating costs in em_compute_costs(). > >>> > >>> Since the 'cost' algorithm is only used for EAS energy efficiency > >>> calculations and is currently not utilized by other device drivers, we > >>> should add the _is_cpu_device(dev) check to prevent this > >>> division-by-zero > >>> issue. > >>> > >>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove > >>> division") > >>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > >>> --- > >>> kernel/power/energy_model.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > >>> index d9b7e2b38c7a..d1fa7e8787b5 100644 > >>> --- a/kernel/power/energy_model.c > >>> +++ b/kernel/power/energy_model.c > >>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, > >>> struct em_perf_state *table, > >>> cost, ret); > >>> return -EINVAL; > >>> } > >>> - } else { > >>> + } else if (_is_cpu_device(dev)) { > >>> /* increase resolution of 'cost' precision */ > >>> power_res = table[i].power * 10; > >>> cost = power_res / table[i].performance; > >> > >> > >> As the test robot pointed out, please set the 'cost' to 0 > >> where it's declared. > >> > >> The rest should be fine. > >> > >> Regards, > >> Lukasz > > > > Sorry, the V3 version with cost=0 still has issues. > > > > I noticed that if the cost is set to 0, the condition "if (table[i].cost > > >= prev_cost)" in the following code will always evaluate to true. This > > will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT. > > > > Should we change ">=" to ">"? > > > > Sorry Again, Setting EM_PERF_STATE_INEFFICIENT in this case is correct. > Earlier, I misunderstood the definition/usage of EM_PERF_STATE_INEFFICIENT. Well, EM_PERF_STATE_INEFFICIENT is only looked at in CPU energy models, so setting it in a non-CPU one is redundant.
On 4/17/25 02:07, Yaxiong Tian wrote: > From: Yaxiong Tian <tianyaxiong@kylinos.cn> > > When the device is of a non-CPU type, table[i].performance won't be > initialized in the previous em_init_performance(), resulting in division > by zero when calculating costs in em_compute_costs(). > > Since the 'cost' algorithm is only used for EAS energy efficiency > calculations and is currently not utilized by other device drivers, we > should add the _is_cpu_device(dev) check to prevent this division-by-zero > issue. > > Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") > Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > --- > kernel/power/energy_model.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index d9b7e2b38c7a..41606247c277 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > unsigned long prev_cost = ULONG_MAX; > int i, ret; > > + /* This is needed only for CPUs and EAS skip other devices */ > + if (!_is_cpu_device(dev)) > + return 0; > + > /* Compute the cost of each performance state. */ > for (i = nr_states - 1; i >= 0; i--) { > unsigned long power_res, cost; Please stop for a while. I have to check what happened that you faced the issue in the first place. I have been testing the GPU EMs and there was no issues... Let me debug that today.
Hi Yaxiong, On Mon, 14 Apr 2025 at 11:14, Yaxiong Tian <iambestgod@qq.com> wrote: > I wasn’t sure whether the new patch should reuse the current Message-ID > or create a new one, so I checked with ChatGPT and kept the original > Message-ID for v2. If a new Message-ID is required, I can resend the > email. Different (revisions of) patches must have different Message-IDs, else they can no longer be distinguished by our tooling. Fortunately it seems like you didn't succeed (which is good ;-), as all four revisions do have different Message-IDs: https://lore.kernel.org/all/tencent_8478BF8F2549630842D323E7394CB6F49D08@qq.com/ https://lore.kernel.org/all/tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com/ https://lore.kernel.org/all/tencent_6D2374392DB66C9D23BF6E2546638A42EC08@qq.com/ https://lore.kernel.org/all/tencent_2256A7C02F7849F1D89390E488704E826D06@qq.com/ Gr{oetje,eeting}s, Geert
在 2025/4/17 15:20, Geert Uytterhoeven 写道: > Hi Yaxiong, > > On Mon, 14 Apr 2025 at 11:14, Yaxiong Tian <iambestgod@qq.com> wrote: >> I wasn’t sure whether the new patch should reuse the current Message-ID >> or create a new one, so I checked with ChatGPT and kept the original >> Message-ID for v2. If a new Message-ID is required, I can resend the >> email. > > Different (revisions of) patches must have different Message-IDs, > else they can no longer be distinguished by our tooling. > Fortunately it seems like you didn't succeed (which is good ;-), > as all four revisions do have different Message-IDs: > > https://lore.kernel.org/all/tencent_8478BF8F2549630842D323E7394CB6F49D08@qq.com/ > https://lore.kernel.org/all/tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com/ > https://lore.kernel.org/all/tencent_6D2374392DB66C9D23BF6E2546638A42EC08@qq.com/ > https://lore.kernel.org/all/tencent_2256A7C02F7849F1D89390E488704E826D06@qq.com/ > > Gr{oetje,eeting}s, > > Geert > Thank you for your explanation. My previous description might have been a bit unclear. Actually, what I meant was whether it should be --in- reply-to=original Message-ID. I think if it's a patch, replying to the original Message-ID would be better.
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index d9b7e2b38c7a..d1fa7e8787b5 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, cost, ret); return -EINVAL; } - } else { + } else if (_is_cpu_device(dev)) { /* increase resolution of 'cost' precision */ power_res = table[i].power * 10; cost = power_res / table[i].performance;