
power: supply: bq27xxx_battery: Do not return ENODEV when busy

Message ID 20240913-foo-fix2-v1-1-a0f499404f3a@axis.com
State New
Series power: supply: bq27xxx_battery: Do not return ENODEV when busy

Commit Message

Jerry Lv Sept. 13, 2024, 8:45 a.m. UTC
Multiple applications may access the device gauge at the same time, so the
gauge may be busy and EBUSY will be returned. The driver will set a flag to
record the EBUSY state, and this flag will be kept until the next periodic
update. When this flag is set, bq27xxx_battery_get_property() will just
return ENODEV until the flag is updated.

Even if the gauge was busy during the last access attempt, returning
ENODEV is not ideal, and can cause confusion in the application layer.

Instead, retrying access to the gauge to update the properties works as expected.
The gauge typically recovers from busy state within a few milliseconds, and
the cached flag will not cause issues while updating the properties.

Signed-off-by: Jerry Lv <Jerry.Lv@axis.com>
---
 drivers/power/supply/bq27xxx_battery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


---
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
change-id: 20240913-foo-fix2-a0d79db86a0b

Best regards,
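The mechanism described in the commit message can be modelled in a few lines of userspace C. This is a sketch, not driver code: check_before() and check_after() are hypothetical helpers that mirror only the condition the patch touches, with is_present standing in for psp == POWER_SUPPLY_PROP_PRESENT and cache_flags for di->cache.flags.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the check in bq27xxx_battery_get_property() that the
 * patch modifies. Hypothetical names: is_present stands in for
 * psp == POWER_SUPPLY_PROP_PRESENT, cache_flags for di->cache.flags
 * (the flags register value, or a negative errno from a failed read).
 * A return of 0 means "fall through to switch (psp)".
 */

/* Before the patch: any cached read error hides every property but PRESENT. */
int check_before(bool is_present, int cache_flags)
{
	if (!is_present && cache_flags < 0)
		return -ENODEV;
	return 0;
}

/* With the patch: a cached -EBUSY no longer triggers -ENODEV. */
int check_after(bool is_present, int cache_flags)
{
	if (!is_present && cache_flags < 0 && cache_flags != -EBUSY)
		return -ENODEV;
	return 0;
}
```

Before the patch a cached -EBUSY turns every property except PRESENT into -ENODEV; after it, the value passes the check. In the patched version the negative number may still be read as the flags register by the code after the check (on Linux EBUSY is 16, so -EBUSY has every bit from bit 4 upward set), which is the concern raised in the review.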

Comments

Pali Rohár Sept. 13, 2024, 9:27 p.m. UTC | #1
On Friday 13 September 2024 16:45:37 Jerry Lv wrote:
> Multiple applications may access the device gauge at the same time, so the
> gauge may be busy and EBUSY will be returned. The driver will set a flag to
> record the EBUSY state, and this flag will be kept until the next periodic
> update. When this flag is set, bq27xxx_battery_get_property() will just
> return ENODEV until the flag is updated.

I did not find any evidence of EBUSY. Which function returns it, and to
which caller? Do you mean that bq27xxx_read() returns -EBUSY?

> Even if the gauge was busy during the last access attempt, returning
> ENODEV is not ideal, and can cause confusion in the application layer.

It would be better to either propagate the correct error or return the
old value from the cache...

> Instead, retrying access to the gauge to update the properties works as expected.
> The gauge typically recovers from busy state within a few milliseconds, and
> the cached flag will not cause issues while updating the properties.
> 
> Signed-off-by: Jerry Lv <Jerry.Lv@axis.com>
> ---
>  drivers/power/supply/bq27xxx_battery.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
> index 750fda543308..eefbb5029a3b 100644
> --- a/drivers/power/supply/bq27xxx_battery.c
> +++ b/drivers/power/supply/bq27xxx_battery.c
> @@ -2029,7 +2029,7 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
>  		bq27xxx_battery_update_unlocked(di);
>  	mutex_unlock(&di->lock);
>  
> -	if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0)
> +	if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0 && di->cache.flags != -EBUSY)
>  		return -ENODEV;

... but ignoring the error and re-using the error return value as flags
in code later in this function is a bad idea.

>  
>  	switch (psp) {
> 
> ---
> base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
> change-id: 20240913-foo-fix2-a0d79db86a0b
> 
> Best regards,
> -- 
> Jerry Lv <Jerry.Lv@axis.com>
>
Pali Rohár Sept. 14, 2024, 8:24 a.m. UTC | #2
Hello Jerry,

I think that this issue should be handled in a different way.

First, propagate the error and do not change it to -ENODEV. Changing it
is really confusing and makes debugging harder.
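A minimal sketch of this first point, again as a userspace model rather than the real bq27xxx_battery_get_property(): check_propagate() is a hypothetical helper in which is_present stands in for psp == POWER_SUPPLY_PROP_PRESENT, and it returns the cached errno itself instead of -ENODEV.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model: cache_flags is the cached flags register value, or
 * a negative errno left there by a failed read. Returns 0 to continue to
 * switch (psp), otherwise the real error.
 */
int check_propagate(bool is_present, int cache_flags)
{
	if (!is_present && cache_flags < 0)
		return cache_flags; /* e.g. -EBUSY or -EIO, not -ENODEV */
	return 0;
}
```

With this approach a userspace read of the sysfs property would typically fail with EBUSY or EIO rather than ENODEV, which tells the reader what actually went wrong.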

Second, if bq27xxx_read() returns -EBUSY, sleep a few milliseconds
and call bq27xxx_read() again.
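The second point can be sketched in userspace so that it runs: fake_gauge_read() is a hypothetical stand-in for bq27xxx_read(), and a simulated clock replaces the real sleep (in the kernel this would be something like usleep_range() or msleep()). The delay is a parameter because the thread does not fix a value.

```c
#include <errno.h>

/* Hypothetical simulation state; none of this exists in the driver. */
static unsigned int sim_now_ms;        /* simulated clock */
static unsigned int sim_busy_until_ms; /* gauge reports busy until this time */
static unsigned int sim_reads;         /* bus transactions issued */

/* Hypothetical stand-in for bq27xxx_read(): -EBUSY, or a register value. */
static int fake_gauge_read(void)
{
	sim_reads++;
	if (sim_now_ms < sim_busy_until_ms)
		return -EBUSY;
	return 0x0010; /* arbitrary register value */
}

/* Simulated sleep: just advances the clock. */
static void sim_sleep_ms(unsigned int ms)
{
	sim_now_ms += ms;
}

/* Read once; on -EBUSY, sleep delay_ms and try exactly one more time. */
int read_with_retry(unsigned int delay_ms)
{
	int ret = fake_gauge_read();

	if (ret == -EBUSY) {
		sim_sleep_ms(delay_ms);
		ret = fake_gauge_read();
	}
	return ret; /* value, or the real error (possibly still -EBUSY) */
}
```

The caller still sees -EBUSY if the gauge stays busy for longer than the delay, so this combines naturally with propagating the real error instead of -ENODEV.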

This should cover the issue you are observing and also fix the problem
introduced by your change (interpreting an error code as bogus cache
data).

Anyway, which bus is BQ27Z561-R2 using (I2C?)? And how is EBUSY
indicated or transferred over the wire?

Pali

On Saturday 14 September 2024 02:57:39 Jerry Lv wrote:
> Hi Pali,
> 
> (Sorry for the inconvenience! My previous email was rejected by a mailing list because it contained an HTML part, so I edited it and am sending it again.)
> 
> Yes, bq27xxx_read() returns -EBUSY, and bq27xxx_read() is called from many functions.
> 
> In our product, several applications may access the BQ27Z561-R2 gauge, and we often see -ENODEV returned.
> After debugging with an oscilloscope and adding some debug output, we found that the device is sometimes busy and recovers very quickly (within a few milliseconds).
> So we want to exclude the busy case before returning -ENODEV.
> 
> Best Regards,
> Jerry
Jerry Lv Sept. 23, 2024, 8:14 a.m. UTC | #3
Hi Pali,

Thanks for your excellent suggestion, I will change the code accordingly.

About your question:
> Anyway, which bus is BQ27Z561-R2 using (i2c?)? And how is EBUSY indicated or transferred over wire?

Yes, we connect the BQ27Z561 gauge over I2C. When it is busy, the logic analyser shows a NAK.


Best Regards,
Jerry Lv
Pali Rohár Sept. 23, 2024, 6:16 p.m. UTC | #4
Thank you for the detailed information about the I2C NAK. In that case,
please consider whether it would be better to add retry logic to the
bq27xxx_battery_i2c_read() function.

If the bq chipset itself commonly returns I2C NAKs during normal
operation, then this affects every I2C read done by the
bq27xxx_battery_i2c_read() function.

So this issue is not specific to reading "flags"; it affects any read.
That is why I think the retry should be handled at a lower layer.
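One way to put the retry in the lower layer, sketched in userspace under stated assumptions: i2c_read_retry(), the xfer/delay callbacks and the fake gauge are all hypothetical, and the real bq27xxx_battery_i2c_read() would perform the transfer through the I2C core. The sketch retries only on -EBUSY, as reported in this thread; which errno a NAK actually produces depends on the I2C controller driver, so the retried error code is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical hooks: one raw register read, and a sleep. */
typedef int (*xfer_fn)(void *ctx, uint8_t reg); /* value >= 0, or -errno */
typedef void (*delay_fn)(void *ctx, unsigned int ms);

/*
 * Read one register, retrying up to max_retries times while the transfer
 * reports -EBUSY (assumed here to be how a NAK surfaces), sleeping
 * delay_ms before each retry. Returns the value or the last error seen.
 */
int i2c_read_retry(void *ctx, uint8_t reg, unsigned int max_retries,
		   unsigned int delay_ms, xfer_fn xfer, delay_fn delay)
{
	int ret = xfer(ctx, reg);
	unsigned int tries;

	for (tries = 0; ret == -EBUSY && tries < max_retries; tries++) {
		delay(ctx, delay_ms);
		ret = xfer(ctx, reg);
	}
	return ret;
}

/* Hypothetical fake gauge that returns -EBUSY until busy_until_ms. */
struct fake_gauge {
	unsigned int now_ms;
	unsigned int busy_until_ms;
	unsigned int xfers; /* bus transactions issued */
};

static int fake_xfer(void *ctx, uint8_t reg)
{
	struct fake_gauge *g = ctx;

	g->xfers++;
	if (g->now_ms < g->busy_until_ms)
		return -EBUSY;
	return reg; /* echo the register address as its "value" */
}

static void fake_delay(void *ctx, unsigned int ms)
{
	struct fake_gauge *g = ctx;

	g->now_ms += ms;
}
```

Because every register read would go through the helper, the flags read and all other property reads get the same bounded retry, and max_retries caps the extra bus traffic: i2c_read_retry(&g, reg, 1, 5, fake_xfer, fake_delay) issues at most two transfers.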

Jerry Lv Sept. 24, 2024, 3:34 a.m. UTC | #5
Hi Pali,

Just as you mentioned, when the gauge is busy, the other devices
connected to the same I2C bus do not respond either. We rarely see
this in normal use, but we sometimes see it in our stress tests.

The gauge usually recovers from the busy state very quickly, and
too many retries may also affect other devices. So could we retry
just once? Do you think that is enough?

Best Regards
Jerry Lv
Pali Rohár Sept. 24, 2024, 7:02 p.m. UTC | #6
Hello, as I do not have the hardware affected by this issue, I think
you know best how to handle it. If you think that one retry is
enough for normal usage, then go ahead with it. I'm fine with it.

If we wanted to be super precise, we could measure how often the
gauge is busy and then calculate the number of retries needed to keep
the driver working under usual conditions for one or two years. But
that is overkill...
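For the record, the calculation is short if one assumes each attempt is busy independently with probability p. That assumption is optimistic, since busy periods are correlated in time (which is why the retry sleeps first), and p itself would have to be measured on the affected hardware. With n retries a read fails only if all n + 1 attempts are busy, so over R reads the expected number of failures is R * p^(n+1). min_retries() is a hypothetical helper that returns the smallest n keeping that at or below a chosen bound.

```c
/*
 * Smallest number of retries n (up to limit) such that
 * reads_total * p_busy^(n + 1) <= max_expected_failures,
 * assuming independent busy events. Returns limit if none qualifies.
 */
unsigned int min_retries(double p_busy, double reads_total,
			 double max_expected_failures, unsigned int limit)
{
	double p_fail = p_busy; /* n = 0: a single attempt */
	unsigned int n;

	for (n = 0; n < limit; n++) {
		if (reads_total * p_fail <= max_expected_failures)
			return n;
		p_fail *= p_busy; /* one more attempt must also be busy */
	}
	return limit;
}
```

For example, with a hypothetical p = 0.001 and 12,614,400 reads (one every 5 seconds for two years), no retry gives about 12,600 expected failures, one retry about 12.6 and two retries about 0.013.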


Patch

diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index 750fda543308..eefbb5029a3b 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -2029,7 +2029,7 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
 		bq27xxx_battery_update_unlocked(di);
 	mutex_unlock(&di->lock);
 
-	if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0)
+	if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0 && di->cache.flags != -EBUSY)
 		return -ENODEV;
 
 	switch (psp) {