diff mbox series

[v2] PM: EM: Fix potential division-by-zero error in em_compute_costs()

Message ID tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com
State New
Headers show
Series [v2] PM: EM: Fix potential division-by-zero error in em_compute_costs() | expand

Commit Message

Yaxiong Tian April 11, 2025, 1:28 a.m. UTC
From: Yaxiong Tian <tianyaxiong@kylinos.cn>

When the device is of a non-CPU type, table[i].performance won't be
initialized in the previous em_init_performance(), resulting in division
by zero when calculating costs in em_compute_costs().

Since the 'cost' algorithm is only used for EAS energy efficiency
calculations and is currently not utilized by other device drivers, we
should add the _is_cpu_device(dev) check to prevent this division-by-zero
issue.

Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lukasz Luba April 14, 2025, 8:08 a.m. UTC | #1
Hi Yaxiong,

On 4/11/25 02:28, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
> 
> When the device is of a non-CPU type, table[i].performance won't be
> initialized in the previous em_init_performance(), resulting in division
> by zero when calculating costs in em_compute_costs().
> 
> Since the 'cost' algorithm is only used for EAS energy efficiency
> calculations and is currently not utilized by other device drivers, we
> should add the _is_cpu_device(dev) check to prevent this division-by-zero
> issue.
> 
> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> ---
>   kernel/power/energy_model.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index d9b7e2b38c7a..d1fa7e8787b5 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>   					cost, ret);
>   				return -EINVAL;
>   			}
> -		} else {
> +		} else if (_is_cpu_device(dev)) {
>   			/* increase resolution of 'cost' precision */
>   			power_res = table[i].power * 10;
>   			cost = power_res / table[i].performance;


As the test robot pointed out, please set the 'cost' to 0
where it's declared.

The rest should be fine.

Regards,
Lukasz
Yaxiong Tian April 14, 2025, 9:08 a.m. UTC | #2
在 2025/4/14 16:08, Lukasz Luba 写道:
> Hi Yaxiong,
> 
> On 4/11/25 02:28, Yaxiong Tian wrote:
>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>
>> When the device is of a non-CPU type, table[i].performance won't be
>> initialized in the previous em_init_performance(), resulting in division
>> by zero when calculating costs in em_compute_costs().
>>
>> Since the 'cost' algorithm is only used for EAS energy efficiency
>> calculations and is currently not utilized by other device drivers, we
>> should add the _is_cpu_device(dev) check to prevent this division-by-zero
>> issue.
>>
>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove 
>> division")
>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>> ---
>>   kernel/power/energy_model.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index d9b7e2b38c7a..d1fa7e8787b5 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, 
>> struct em_perf_state *table,
>>                       cost, ret);
>>                   return -EINVAL;
>>               }
>> -        } else {
>> +        } else if (_is_cpu_device(dev)) {
>>               /* increase resolution of 'cost' precision */
>>               power_res = table[i].power * 10;
>>               cost = power_res / table[i].performance;
> 
> 
> As the test robot pointed out, please set the 'cost' to 0
> where it's declared.
> 
> The rest should be fine.
> 
> Regards,
> Lukasz

Okay.
I wasn’t sure whether the new patch should reuse the current Message-ID
or create a new one, so I checked with ChatGPT and kept the original
Message-ID for v2. If a new Message-ID is required, I can resend the
email.
Yaxiong Tian April 15, 2025, 1:12 a.m. UTC | #3
在 2025/4/14 16:08, Lukasz Luba 写道:
> Hi Yaxiong,
> 
> On 4/11/25 02:28, Yaxiong Tian wrote:
>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>
>> When the device is of a non-CPU type, table[i].performance won't be
>> initialized in the previous em_init_performance(), resulting in division
>> by zero when calculating costs in em_compute_costs().
>>
>> Since the 'cost' algorithm is only used for EAS energy efficiency
>> calculations and is currently not utilized by other device drivers, we
>> should add the _is_cpu_device(dev) check to prevent this division-by-zero
>> issue.
>>
>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove 
>> division")
>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>> ---
>>   kernel/power/energy_model.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index d9b7e2b38c7a..d1fa7e8787b5 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, 
>> struct em_perf_state *table,
>>                       cost, ret);
>>                   return -EINVAL;
>>               }
>> -        } else {
>> +        } else if (_is_cpu_device(dev)) {
>>               /* increase resolution of 'cost' precision */
>>               power_res = table[i].power * 10;
>>               cost = power_res / table[i].performance;
> 
> 
> As the test robot pointed out, please set the 'cost' to 0
> where it's declared.
> 
> The rest should be fine.
> 
> Regards,
> Lukasz

Sorry, the V3 version with cost=0 still has issues.

I noticed that if the cost is set to 0, the condition "if (table[i].cost
  >= prev_cost)" in the following code will always evaluate to true. This
  will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT.

Should we change ">=" to ">"?
Yaxiong Tian April 15, 2025, 1:58 a.m. UTC | #4
在 2025/4/15 09:12, Yaxiong Tian 写道:
> 
> 
> 在 2025/4/14 16:08, Lukasz Luba 写道:
>> Hi Yaxiong,
>>
>> On 4/11/25 02:28, Yaxiong Tian wrote:
>>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>>
>>> When the device is of a non-CPU type, table[i].performance won't be
>>> initialized in the previous em_init_performance(), resulting in division
>>> by zero when calculating costs in em_compute_costs().
>>>
>>> Since the 'cost' algorithm is only used for EAS energy efficiency
>>> calculations and is currently not utilized by other device drivers, we
>>> should add the _is_cpu_device(dev) check to prevent this 
>>> division-by-zero
>>> issue.
>>>
>>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove 
>>> division")
>>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>> ---
>>>   kernel/power/energy_model.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>>> index d9b7e2b38c7a..d1fa7e8787b5 100644
>>> --- a/kernel/power/energy_model.c
>>> +++ b/kernel/power/energy_model.c
>>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, 
>>> struct em_perf_state *table,
>>>                       cost, ret);
>>>                   return -EINVAL;
>>>               }
>>> -        } else {
>>> +        } else if (_is_cpu_device(dev)) {
>>>               /* increase resolution of 'cost' precision */
>>>               power_res = table[i].power * 10;
>>>               cost = power_res / table[i].performance;
>>
>>
>> As the test robot pointed out, please set the 'cost' to 0
>> where it's declared.
>>
>> The rest should be fine.
>>
>> Regards,
>> Lukasz
> 
> Sorry, the V3 version with cost=0 still has issues.
> 
> I noticed that if the cost is set to 0, the condition "if (table[i].cost
>   >= prev_cost)" in the following code will always evaluate to true. This
>   will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT.
> 
> Should we change ">=" to ">"?
> 

Sorry Again, Setting EM_PERF_STATE_INEFFICIENT in this case is correct.
Earlier, I misunderstood the definition/usage of EM_PERF_STATE_INEFFICIENT.
Rafael J. Wysocki April 15, 2025, 5:19 p.m. UTC | #5
On Tue, Apr 15, 2025 at 4:03 AM Yaxiong Tian <iambestgod@qq.com> wrote:
>
>
>
> 在 2025/4/15 09:12, Yaxiong Tian 写道:
> >
> >
> > 在 2025/4/14 16:08, Lukasz Luba 写道:
> >> Hi Yaxiong,
> >>
> >> On 4/11/25 02:28, Yaxiong Tian wrote:
> >>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
> >>>
> >>> When the device is of a non-CPU type, table[i].performance won't be
> >>> initialized in the previous em_init_performance(), resulting in division
> >>> by zero when calculating costs in em_compute_costs().
> >>>
> >>> Since the 'cost' algorithm is only used for EAS energy efficiency
> >>> calculations and is currently not utilized by other device drivers, we
> >>> should add the _is_cpu_device(dev) check to prevent this
> >>> division-by-zero
> >>> issue.
> >>>
> >>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove
> >>> division")
> >>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> >>> ---
> >>>   kernel/power/energy_model.c | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> >>> index d9b7e2b38c7a..d1fa7e8787b5 100644
> >>> --- a/kernel/power/energy_model.c
> >>> +++ b/kernel/power/energy_model.c
> >>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev,
> >>> struct em_perf_state *table,
> >>>                       cost, ret);
> >>>                   return -EINVAL;
> >>>               }
> >>> -        } else {
> >>> +        } else if (_is_cpu_device(dev)) {
> >>>               /* increase resolution of 'cost' precision */
> >>>               power_res = table[i].power * 10;
> >>>               cost = power_res / table[i].performance;
> >>
> >>
> >> As the test robot pointed out, please set the 'cost' to 0
> >> where it's declared.
> >>
> >> The rest should be fine.
> >>
> >> Regards,
> >> Lukasz
> >
> > Sorry, the V3 version with cost=0 still has issues.
> >
> > I noticed that if the cost is set to 0, the condition "if (table[i].cost
> >   >= prev_cost)" in the following code will always evaluate to true. This
> >   will incorrectly set the flags to EM_PERF_STATE_INEFFICIENT.
> >
> > Should we change ">=" to ">"?
> >
>
> Sorry Again, Setting EM_PERF_STATE_INEFFICIENT in this case is correct.
> Earlier, I misunderstood the definition/usage of EM_PERF_STATE_INEFFICIENT.

Well, EM_PERF_STATE_INEFFICIENT is only looked at in CPU energy
models, so setting it in a non-CPU one is redundant.
Lukasz Luba April 17, 2025, 5:57 a.m. UTC | #6
On 4/17/25 02:07, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
> 
> When the device is of a non-CPU type, table[i].performance won't be
> initialized in the previous em_init_performance(), resulting in division
> by zero when calculating costs in em_compute_costs().
> 
> Since the 'cost' algorithm is only used for EAS energy efficiency
> calculations and is currently not utilized by other device drivers, we
> should add the _is_cpu_device(dev) check to prevent this division-by-zero
> issue.
> 
> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> ---
>   kernel/power/energy_model.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index d9b7e2b38c7a..41606247c277 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>   	unsigned long prev_cost = ULONG_MAX;
>   	int i, ret;
>   
> +	/* This is needed only for CPUs and EAS skip other devices */
> +	if (!_is_cpu_device(dev))
> +		return 0;
> +
>   	/* Compute the cost of each performance state. */
>   	for (i = nr_states - 1; i >= 0; i--) {
>   		unsigned long power_res, cost;


Please stop for a while. I have to check what happened that you
faced the issue in the first place. I have been testing the GPU
EMs and there was no issues...

Let me debug that today.
Geert Uytterhoeven April 17, 2025, 7:20 a.m. UTC | #7
Hi Yaxiong,

On Mon, 14 Apr 2025 at 11:14, Yaxiong Tian <iambestgod@qq.com> wrote:
> I wasn’t sure whether the new patch should reuse the current Message-ID
> or create a new one, so I checked with ChatGPT and kept the original
> Message-ID for v2. If a new Message-ID is required, I can resend the
> email.

Different (revisions of) patches must have different Message-IDs,
else they can no longer be distinguished by our tooling.
Fortunately it seems like you didn't succeed (which is good ;-),
as all four revisions do have different Message-IDs:

https://lore.kernel.org/all/tencent_8478BF8F2549630842D323E7394CB6F49D08@qq.com/
https://lore.kernel.org/all/tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com/
https://lore.kernel.org/all/tencent_6D2374392DB66C9D23BF6E2546638A42EC08@qq.com/
https://lore.kernel.org/all/tencent_2256A7C02F7849F1D89390E488704E826D06@qq.com/

Gr{oetje,eeting}s,

                        Geert
Yaxiong Tian April 17, 2025, 7:41 a.m. UTC | #8
在 2025/4/17 15:20, Geert Uytterhoeven 写道:
> Hi Yaxiong,
> 
> On Mon, 14 Apr 2025 at 11:14, Yaxiong Tian <iambestgod@qq.com> wrote:
>> I wasn’t sure whether the new patch should reuse the current Message-ID
>> or create a new one, so I checked with ChatGPT and kept the original
>> Message-ID for v2. If a new Message-ID is required, I can resend the
>> email.
> 
> Different (revisions of) patches must have different Message-IDs,
> else they can no longer be distinguished by our tooling.
> Fortunately it seems like you didn't succeed (which is good ;-),
> as all four revisions do have different Message-IDs:
> 
> https://lore.kernel.org/all/tencent_8478BF8F2549630842D323E7394CB6F49D08@qq.com/
> https://lore.kernel.org/all/tencent_EE27C7D1D6BDB3EE57A2C467CC59A866C405@qq.com/
> https://lore.kernel.org/all/tencent_6D2374392DB66C9D23BF6E2546638A42EC08@qq.com/
> https://lore.kernel.org/all/tencent_2256A7C02F7849F1D89390E488704E826D06@qq.com/
> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
Thank you for your explanation. My previous description might have been
a bit unclear. Actually, what I meant was whether it should be --in-
reply-to=original Message-ID. I think if it's a patch, replying to the
  original Message-ID would be better.
diff mbox series

Patch

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d9b7e2b38c7a..d1fa7e8787b5 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -244,7 +244,7 @@  static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 					cost, ret);
 				return -EINVAL;
 			}
-		} else {
+		} else if (_is_cpu_device(dev)) {
 			/* increase resolution of 'cost' precision */
 			power_res = table[i].power * 10;
 			cost = power_res / table[i].performance;