
[V5,3/6] firmware: arm_scmi: Report duplicate opps as firmware bugs

Message ID 20241030125512.2884761-4-quic_sibis@quicinc.com
State New
Series [V5,1/6] firmware: arm_scmi: Ensure that the message-id supports fastchannel

Commit Message

Sibi Sankar Oct. 30, 2024, 12:55 p.m. UTC
Duplicate opps reported by buggy SCP firmware currently show up
as warnings even though the only functional impact is that the
level/index remain inaccessible. Make it less scary for the end
user by using dev_info instead, along with FW_BUG tag.

Suggested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
 drivers/firmware/arm_scmi/perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
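In the kernel, `FW_BUG` is a plain string-literal macro (`"[Firmware Bug]: "`, defined in `include/linux/printk.h`), so writing it directly before the format string lets the compiler join the two literals into one format. The sketch below is a minimal userspace approximation of the message this patch emits. It assumes `snprintf()` as a stand-in for `dev_info()` (the real call also prepends the driver and device name, e.g. `arm-scmi arm-scmi.0.auto: `), and `format_opp_msg()` is a hypothetical helper, not part of `perf.c`.

```c
#include <stddef.h>
#include <stdio.h>

/* Same literal as the kernel's FW_BUG tag. */
#define FW_BUG "[Firmware Bug]: "

/*
 * Hypothetical userspace stand-in for the patched dev_info() call.
 * FW_BUG and the format string are adjacent string literals, so the
 * compiler joins them into a single format before snprintf() runs.
 */
static int format_opp_msg(char *buf, size_t len, int perf,
			  const char *dom_name, int ret)
{
	return snprintf(buf, len,
			FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
			perf, dom_name, ret);
}
```

With `perf = 3417600`, `dom_name = "NCC"` and `ret = -16`, the buffer holds `[Firmware Bug]: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16` followed by a newline.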

Comments

Johan Hovold Nov. 1, 2024, 2:09 p.m. UTC | #1
On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:
> Duplicate opps reported by buggy SCP firmware currently show up
> as warnings even though the only functional impact is that the
> level/index remain inaccessible. Make it less scary for the end
> user by using dev_info instead, along with FW_BUG tag.
> 
> Suggested-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
>  drivers/firmware/arm_scmi/perf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
> index 32f9a9acd3e9..c7e5a34b254b 100644
> --- a/drivers/firmware/arm_scmi/perf.c
> +++ b/drivers/firmware/arm_scmi/perf.c
> @@ -387,7 +387,7 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
>  
>  	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
>  	if (ret) {
> -		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> +		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
>  			 opp->perf, dom->info.name, ret);

I was hoping you could make the error message a bit more informative as
well, for example, by saying that a duplicate opp level was ignored:

	arm-scmi arm-scmi.0.auto: [Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC

or similar (e.g. as the current message looks like an error, with errno
and all, that indeed warrants warning level).

Perhaps with such a message you could even keep the warning level to
make it stand out more (if that's desirable) without the risk of scaring
users.

Johan
Sibi Sankar Nov. 4, 2024, 1:50 p.m. UTC | #2
On 11/1/24 19:39, Johan Hovold wrote:
> On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:
>> Duplicate opps reported by buggy SCP firmware currently show up
>> as warnings even though the only functional impact is that the
>> level/index remain inaccessible. Make it less scary for the end
>> user by using dev_info instead, along with FW_BUG tag.
>>
>> Suggested-by: Johan Hovold <johan+linaro@kernel.org>
>> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
>> ---
>>   drivers/firmware/arm_scmi/perf.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
>> index 32f9a9acd3e9..c7e5a34b254b 100644
>> --- a/drivers/firmware/arm_scmi/perf.c
>> +++ b/drivers/firmware/arm_scmi/perf.c
>> @@ -387,7 +387,7 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
>>   
>>   	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
>>   	if (ret) {
>> -		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
>> +		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
>>   			 opp->perf, dom->info.name, ret);
> 
> I was hoping you could make the error message a bit more informative as
> well, for example, by saying that a duplicate opp level was ignored:
> 
> 	arm-scmi arm-scmi.0.auto: [Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC

I did think about doing something similar, but xa_insert can fail
with both -EBUSY (duplicate) and -ENOMEM, so we can't really
use the term duplicate when the insert fails. I can add the perf
level to the message, though.

-Sibi

> 
> or similar (e.g. as the current message looks like an error, with errno
> and all, that indeed warrants warning level).
> 
> Perhaps with such a message you could even keep the warning level to
> make it stand out more (if that's desirable) without the risk of scaring
> users.
> 
> Johan
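For reference, the documented `xa_insert()` contract is: 0 on success, `-EBUSY` if the index already holds an entry, and `-ENOMEM` if memory could not be allocated. A minimal sketch of that contract follows. It assumes a hypothetical fixed-size table (`struct mock_xa`, `mock_xa_insert()`) in place of a real XArray, where running out of slots stands in for an allocation failure.

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_XA_CAP 4

/*
 * Hypothetical fixed-capacity stand-in for an XArray indexed by perf level.
 * Running out of slots models an allocation failure.
 */
struct mock_xa {
	unsigned long index[MOCK_XA_CAP];
	void *entry[MOCK_XA_CAP];
	size_t used;
};

/*
 * Mirrors the documented xa_insert() contract: 0 on success, -EBUSY if
 * the index already holds an entry, -ENOMEM if memory cannot be allocated.
 */
static int mock_xa_insert(struct mock_xa *xa, unsigned long index, void *entry)
{
	for (size_t i = 0; i < xa->used; i++)
		if (xa->index[i] == index)
			return -EBUSY;	/* duplicate perf level */

	if (xa->used == MOCK_XA_CAP)
		return -ENOMEM;		/* no room left: "allocation" failed */

	xa->index[xa->used] = index;
	xa->entry[xa->used] = entry;
	xa->used++;
	return 0;
}
```

Both failure modes leave the table unchanged, which is why the errno, not the failure itself, is what tells a duplicate apart from a memory problem.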
Cristian Marussi Nov. 4, 2024, 2:07 p.m. UTC | #3
On Mon, Nov 04, 2024 at 07:20:01PM +0530, Sibi Sankar wrote:
> 
> 
> On 11/1/24 19:39, Johan Hovold wrote:
> > On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:
> > > Duplicate opps reported by buggy SCP firmware currently show up
> > > as warnings even though the only functional impact is that the
> > > level/index remain inaccessible. Make it less scary for the end
> > > user by using dev_info instead, along with FW_BUG tag.
> > > 
> > > Suggested-by: Johan Hovold <johan+linaro@kernel.org>
> > > Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> > > ---
> > >   drivers/firmware/arm_scmi/perf.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
> > > index 32f9a9acd3e9..c7e5a34b254b 100644
> > > --- a/drivers/firmware/arm_scmi/perf.c
> > > +++ b/drivers/firmware/arm_scmi/perf.c
> > > @@ -387,7 +387,7 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
> > >   	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
> > >   	if (ret) {
> > > -		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> > > +		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> > >   			 opp->perf, dom->info.name, ret);
> > 
> > I was hoping you could make the error message a bit more informative as
> > well, for example, by saying that a duplicate opp level was ignored:
> > 
> > 	arm-scmi arm-scmi.0.auto: [Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC
> 
> I did think about doing something similar, but xa_insert can fail
> with both -EBUSY (duplicate) and -ENOMEM, so we can't really
> use the term duplicate when the insert fails. I can add the perf
> level to the message, though.
> 

It is the caller of the above helpers, iter_perf_levels_process_response(),
that is in charge of checking the retval and deciding what to do:
if it is -EBUSY it just bails out returning 0 (SKIP), otherwise it returns
the error... (and anyway the warn/info has already been given)

Originally the message was generic exactly for this reason... making some
noise to have the fw guys fix it and carry on... or fail completely the other
way...

I suppose you should move the error message into the caller if you want
to attain this level of information for the user... if not, you are also
potentially making noise for nothing by saying FW_BUG on an -ENOMEM...

...anyway, on the other side, on the -ENOMEM path there is probably
really no need to say anything in any case... things are going terribly
wrong and you will notice soon enough, most probably in the form of a
total failure of the stack.

Thanks,
Cristian
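A minimal sketch of the caller-side approach described above. It assumes a hypothetical `handle_opp_insert_err()` that is invoked only when the insert has already failed: it mirrors the skip-on-`-EBUSY`, propagate-otherwise policy described for `iter_perf_levels_process_response()`, logs only the firmware-caused case, and stays silent on `-ENOMEM`. `fprintf(stderr, ...)` stands in for the kernel's `dev_*()` logging.

```c
#include <errno.h>
#include <stdio.h>

#define FW_BUG "[Firmware Bug]: "

/*
 * Hypothetical caller-side error handling, called only with ret != 0.
 * Returns 0 to skip this OPP and keep iterating, or the negative errno
 * to abort, matching "ret == -EBUSY ? 0 : ret".
 */
static int handle_opp_insert_err(int ret, unsigned int perf,
				 const char *dom_name)
{
	if (ret == -EBUSY) {
		/* Duplicate level reported by firmware: say so, then skip it. */
		fprintf(stderr, FW_BUG "Ignoring duplicate OPP %u for %s\n",
			perf, dom_name);
		return 0;
	}

	/* -ENOMEM or anything else: no extra message, just propagate. */
	return ret;
}
```

Keeping the message in the caller means the helper no longer needs to guess why the insert failed; the one place that already branches on the errno also decides what to print.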
Johan Hovold Nov. 4, 2024, 2:09 p.m. UTC | #4
On Mon, Nov 04, 2024 at 07:20:01PM +0530, Sibi Sankar wrote:
> On 11/1/24 19:39, Johan Hovold wrote:
> > On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:

> >> @@ -387,7 +387,7 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
> >>   
> >>   	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
> >>   	if (ret) {
> >> -		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> >> +		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> >>   			 opp->perf, dom->info.name, ret);
> > 
> > I was hoping you could make the error message a bit more informative as
> > well, for example, by saying that a duplicate opp level was ignored:
> > 
> > 	arm-scmi arm-scmi.0.auto: [Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC
> 
> I did think about doing something similar, but xa_insert can fail
> with both -EBUSY (duplicate) and -ENOMEM, so we can't really
> use the term duplicate when the insert fails. I can add the perf
> level to the message, though.

We generally don't log errors for memory allocation failures (e.g. as
that would already have been taken care of by the allocators, if that is
the source of the -ENOMEM).

But either way you should be able to check the errno to determine if
this is due to a duplicate entry or not.

Johan
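If the check stays in the helper instead, the errno alone is enough to pick the message. A minimal sketch, assuming a hypothetical `opp_insert_fail_msg()` that formats the wording suggested earlier in the thread only for a duplicate (`-EBUSY`) and produces nothing for other errors such as `-ENOMEM`. In the kernel, `dev_info()` or `dev_warn()` would add the `arm-scmi arm-scmi.0.auto: ` device prefix on top of this text.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define FW_BUG "[Firmware Bug]: "

/*
 * Hypothetical helper: decide from the errno alone whether an xa_insert()
 * failure deserves a firmware-bug message. For a duplicate (-EBUSY) the
 * message is written into buf and its length returned; otherwise buf is
 * left empty and 0 is returned, since an -ENOMEM is typically already
 * reported by the allocator.
 */
static int opp_insert_fail_msg(char *buf, size_t len, int err,
			       unsigned int perf, const char *dom_name)
{
	if (len > 0)
		buf[0] = '\0';

	if (err != -EBUSY)
		return 0;

	return snprintf(buf, len, FW_BUG "Ignoring duplicate OPP %u for %s\n",
			perf, dom_name);
}
```

For `err = -EBUSY`, `perf = 3417600` and `dom_name = "NCC"`, the buffer holds `[Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC` followed by a newline.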
Sudeep Holla Nov. 6, 2024, 7:07 a.m. UTC | #5
On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:
> Duplicate opps reported by buggy SCP firmware currently show up
> as warnings even though the only functional impact is that the
> level/index remain inaccessible. Make it less scary for the end
> user by using dev_info instead, along with FW_BUG tag.
> 
> Suggested-by: Johan Hovold <johan+linaro@kernel.org>

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Sudeep Holla Nov. 6, 2024, 7:20 a.m. UTC | #6
On Mon, Nov 04, 2024 at 03:09:12PM +0100, Johan Hovold wrote:
> On Mon, Nov 04, 2024 at 07:20:01PM +0530, Sibi Sankar wrote:
> > On 11/1/24 19:39, Johan Hovold wrote:
> > > On Wed, Oct 30, 2024 at 06:25:09PM +0530, Sibi Sankar wrote:
> 
> > >> @@ -387,7 +387,7 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
> > >>   
> > >>   	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
> > >>   	if (ret) {
> > >> -		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> > >> +		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> > >>   			 opp->perf, dom->info.name, ret);
> > > 
> > > I was hoping you could make the error message a bit more informative as
> > > well, for example, by saying that a duplicate opp level was ignored:
> > > 
> > > 	arm-scmi arm-scmi.0.auto: [Firmware Bug]: Ignoring duplicate OPP 3417600 for NCC
> > 
> > I did think about doing something similar, but xa_insert can fail
> > with both -EBUSY (duplicate) and -ENOMEM, so we can't really
> > use the term duplicate when the insert fails. I can add the perf
> > level to the message, though.
> 
> We generally don't log errors for memory allocation failures (e.g. as
> that would already have been taken care of by the allocators, if that is
> the source of the -ENOMEM).
> 
> But either way you should be able to check the errno to determine if
> this is due to a duplicate entry or not.

Everyone has valid reasons for their argument here, so we need to find
a safe middle ground. Would stating it as [Possible Firmware Bug] be of
any use? If there is an -ENOMEM, other error messages will be seen before
this one, so the user can ignore it until that memory issue is fixed?

Patch

diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 32f9a9acd3e9..c7e5a34b254b 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -387,7 +387,7 @@  process_response_opp(struct device *dev, struct perf_dom_info *dom,
 
 	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
 	if (ret) {
-		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
+		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
 			 opp->perf, dom->info.name, ret);
 		return ret;
 	}
@@ -409,7 +409,7 @@  process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
 
 	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
 	if (ret) {
-		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
+		dev_info(dev, FW_BUG "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
 			 opp->perf, dom->info.name, ret);
 		return ret;
 	}