mbox series

[0/2] media: ov5675: Fixup ov5675 reset failures

Message ID 20240711-linux-next-ov5675-v1-0-69e9b6c62c16@linaro.org
Headers show
Series media: ov5675: Fixup ov5675 reset failures | expand

Message

Bryan O'Donoghue July 11, 2024, 10:20 a.m. UTC
One long running saga for me on the Lenovo X13s is the occasional failure
to either probe or subsequently bring-up the ov5675 main RGB sensor on the
laptop.

Initially I suspected the PMIC for this part as the PMIC is using a new
interface on an I2C bus instead of an SPMI bus. In particular I thought
perhaps the I2C write to PMIC had completed but the regulator output hadn't
become stable from the perspective of the SoC. This however doesn't appear
to be the case - I can introduce a delay of milliseconds on the PMIC path
without resolving the sensor reset problem.

Secondly I thought about reset pin polarity or drive-strength but, again
playing about with both didn't yield decent results.

I also played with the duration of reset to no avail.

The error manifested as an I2C write timeout to the sensor which indicated
that the chip likely hadn't come out reset. An intermittent fault appearing
in perhaps 1/10 or 1/20 reset cycles.

Looking at the expression of the reset we see that there is a minimum time
expressed in XVCLK cycles between reset completion and first I2C
transaction to the sensor. The specification calls out the minimum delay @
8192 XVCLK cycles and the ov5675 driver meets that timing almost exactly.

A little too exactly - testing finally showed that we were too racy with
respect to the minimum quiescence between reset completion and first
command to the chip.

Fixing this error I choose to base the fix again on the number of clocks
but to also support any clock rate the chip could support by moving away
from a define to reading and using the XVCLK.

True enough only 19.2 MHz is currently supported but for the hypothetical
case where some other frequency is supported in the future, I wanted the
fix introduced in this series to still hold.

Hence this series:

1. Allows for any clock rate to be used in the valid range for the reset.
2. Elongates the post-reset period based on clock cycles which can now
vary.

Patch #2 can still be backported to stable irrespective of patch #1.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
Bryan O'Donoghue (2):
      media: ov5675: Derive delay cycles from the clock rate reported
      media: ov5675: Elongate reset to first transaction minimum gap

 drivers/media/i2c/ov5675.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)
---
base-commit: 523b23f0bee3014a7a752c9bb9f5c54f0eddae88
change-id: 20240710-linux-next-ov5675-60b0e83c73f1

Best regards,

Comments

Quentin Schulz July 11, 2024, 10:32 a.m. UTC | #1
Hi Bryan,

On 7/11/24 12:20 PM, Bryan O'Donoghue wrote:
> The ov5675 driver expresses its reset delays in terms of XVCLK cycles as
> per the ov5675 specification. XVCLK can be anything in the range of 6 MHz
> to 24 MHz inclusive.
> 
> Upstream we use 19.2 MHz however, since the delays are calculated in terms
> of clock cycles as opposed to fixed intervals it makes sense to facilitate
> any potential clock we might support.
> 
> Do so by reading the XVCLK rate and using the returned rate instead of
> operating from a static definition.
> 

We're actually running this sensor at **almost** 19.2MHz but are having 
intermittent issues, so this could probably help us too. In the end (for 
my employer), we should just add support for 24MHz as that's a frequency 
our products support but I'm lacking time right now. One day hopefully.

> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>

The change isn't really useful for upstream yet but makes sense to me, so:

Reviewed-by: Quentin Schulz <quentin.schulz@cherry.de>

Thanks!
Quentin
Dave Stevenson July 11, 2024, 11:17 a.m. UTC | #2
Hi Quentin and Bryan

On Thu, 11 Jul 2024 at 11:40, Quentin Schulz <quentin.schulz@cherry.de> wrote:
>
> Hi Bryan,
>
> On 7/11/24 12:20 PM, Bryan O'Donoghue wrote:
> > The ov5675 specification says that the gap between XSHUTDN deassert and the
> > first I2C transaction should be a minimum of 8192 XVCLK cycles.
> >
> > Right now we use a usleep_rage() that gives a sleep time of between about
> > 430 and 860 microseconds.
> >
> > On the Lenovo X13s we have observed that in about 1/20 cases the current
> > timing is too tight and we start transacting before the ov5675's reset
> > cycle completes, leading to I2C bus transaction failures.
> >
> > The reset racing is sometimes triggered at initial chip probe but, more
> > usually on a subsequent power-off/power-on cycle e.g.
> >
> > [   71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
> > [   71.451686] ov5675 24-0010: failed to set plls
> >
> > The current quiescence period we have is too tight, doubling the minimum
> > appears to fix the issue observed on X13s.
> >
> > Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and support runtime PM")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> > ---
> >   drivers/media/i2c/ov5675.c | 9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
> > index 92bd35133a5d..0498f8f3064d 100644
> > --- a/drivers/media/i2c/ov5675.c
> > +++ b/drivers/media/i2c/ov5675.c
> > @@ -1018,8 +1018,13 @@ static int ov5675_power_on(struct device *dev)
> >
> >       gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
> >
> > -     /* 8192 xvclk cycles prior to the first SCCB transation */
> > -     usleep_range(delay_us, delay_us * 2);
> > +     /* The spec calls for a minimum delay of 8192 XVCLK cycles prior to
> > +      * transacting on the I2C bus, which translates to about 430
> > +      * microseconds at 19.2 MHz.
> > +      * Testing shows the range 8192 - 16384 cycles to be unreliable.
> > +      * Grant a more liberal 2x -3x clock cycle grace time.
> > +      */
> > +     usleep_range(delay_us * 2, delay_us * 3);
> >
>
> Would it make sense to have power_off have the same logic? We do a
> usleep_range of those same values currently, so keeping them in sync
> seems to make sense to me.
>
> Also, I'm wondering if it isn't an issue with the gpio not being high
> right after gpoiod_set_value_cansleep() returns, i.e. the time it
> actually takes for the HW to reach the IO level that means "high" for
> the camera. And that this increased sleep is just a way to mitigate that?
>
> With this patch we essentially postpone the power_on by another 430ms
> making it almost a full second before we can start using the camera.
> That's quite a lot I think? We don't have a usecase right now that
> requires this to be blazing fast (and we anyway would need at the very
> least 430ms), so take this remark as what it is, a remark.

I think you've misread 430 usec as 430 msec.

I was looking at the series and trying to decide whether it's worth
going to the effort of computing the time at all when even on the
slowest 6MHz XVCLK we're sub 1.5ms for the required delay.
At the max XVLCK of 24MHz you could save 1ms. I know of very few use
cases that would suffer for a 1ms delay.

I know we all like to be precise, but it sounds like the precision
actually causes grief in this situation.

  Dave

>
> The change looks fine to me even though it feels like a band-aid patch.
>
> Cheers,
> Quentin
>
Bryan O'Donoghue July 11, 2024, 11:24 a.m. UTC | #3
On 11/07/2024 12:17, Dave Stevenson wrote:
> Hi Quentin and Bryan
> 
> On Thu, 11 Jul 2024 at 11:40, Quentin Schulz <quentin.schulz@cherry.de> wrote:
>>
>> Hi Bryan,
>>
>> On 7/11/24 12:20 PM, Bryan O'Donoghue wrote:
>>> The ov5675 specification says that the gap between XSHUTDN deassert and the
>>> first I2C transaction should be a minimum of 8192 XVCLK cycles.
>>>
>>> Right now we use a usleep_rage() that gives a sleep time of between about
>>> 430 and 860 microseconds.
>>>
>>> On the Lenovo X13s we have observed that in about 1/20 cases the current
>>> timing is too tight and we start transacting before the ov5675's reset
>>> cycle completes, leading to I2C bus transaction failures.
>>>
>>> The reset racing is sometimes triggered at initial chip probe but, more
>>> usually on a subsequent power-off/power-on cycle e.g.
>>>
>>> [   71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
>>> [   71.451686] ov5675 24-0010: failed to set plls
>>>
>>> The current quiescence period we have is too tight, doubling the minimum
>>> appears to fix the issue observed on X13s.
>>>
>>> Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and support runtime PM")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
>>> ---
>>>    drivers/media/i2c/ov5675.c | 9 +++++++--
>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
>>> index 92bd35133a5d..0498f8f3064d 100644
>>> --- a/drivers/media/i2c/ov5675.c
>>> +++ b/drivers/media/i2c/ov5675.c
>>> @@ -1018,8 +1018,13 @@ static int ov5675_power_on(struct device *dev)
>>>
>>>        gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
>>>
>>> -     /* 8192 xvclk cycles prior to the first SCCB transation */
>>> -     usleep_range(delay_us, delay_us * 2);
>>> +     /* The spec calls for a minimum delay of 8192 XVCLK cycles prior to
>>> +      * transacting on the I2C bus, which translates to about 430
>>> +      * microseconds at 19.2 MHz.
>>> +      * Testing shows the range 8192 - 16384 cycles to be unreliable.
>>> +      * Grant a more liberal 2x -3x clock cycle grace time.
>>> +      */
>>> +     usleep_range(delay_us * 2, delay_us * 3);
>>>
>>
>> Would it make sense to have power_off have the same logic? We do a
>> usleep_range of those same values currently, so keeping them in sync
>> seems to make sense to me.
>>
>> Also, I'm wondering if it isn't an issue with the gpio not being high
>> right after gpoiod_set_value_cansleep() returns, i.e. the time it
>> actually takes for the HW to reach the IO level that means "high" for
>> the camera. And that this increased sleep is just a way to mitigate that?
>>
>> With this patch we essentially postpone the power_on by another 430ms
>> making it almost a full second before we can start using the camera.
>> That's quite a lot I think? We don't have a usecase right now that
>> requires this to be blazing fast (and we anyway would need at the very
>> least 430ms), so take this remark as what it is, a remark.
> 
> I think you've misread 430 usec as 430 msec.
> 
> I was looking at the series and trying to decide whether it's worth
> going to the effort of computing the time at all when even on the
> slowest 6MHz XVCLK we're sub 1.5ms for the required delay.
> At the max XVLCK of 24MHz you could save 1ms. I know of very few use
> cases that would suffer for a 1ms delay.
> 
> I know we all like to be precise, but it sounds like the precision
> actually causes grief in this situation.

Yeah the first draft of the patch just had a post-reset delay of I 
forget - I think I just used usleep_range(2000, 2200); again but I kind 
respected the attempt to hit the specification and wanted to fix the 
original logic, which is close but no cigar ATM.

---
bod
Quentin Schulz July 11, 2024, 12:22 p.m. UTC | #4
Hi Bryan,

On 7/11/24 2:07 PM, Bryan O'Donoghue wrote:
> On 11/07/2024 12:41, Quentin Schulz wrote:
>> Hi Bryan and Dave,
>>
>> On 7/11/24 1:22 PM, Bryan O'Donoghue wrote:
>>> On 11/07/2024 11:40, Quentin Schulz wrote:
>>>> Hi Bryan,
>>>>
>>>> On 7/11/24 12:20 PM, Bryan O'Donoghue wrote:
>>>>> The ov5675 specification says that the gap between XSHUTDN deassert 
>>>>> and the
>>>>> first I2C transaction should be a minimum of 8192 XVCLK cycles.
>>>>>
>>>>> Right now we use a usleep_rage() that gives a sleep time of between 
>>>>> about
>>>>> 430 and 860 microseconds.
>>>>>
>>>>> On the Lenovo X13s we have observed that in about 1/20 cases the 
>>>>> current
>>>>> timing is too tight and we start transacting before the ov5675's reset
>>>>> cycle completes, leading to I2C bus transaction failures.
>>>>>
>>>>> The reset racing is sometimes triggered at initial chip probe but, 
>>>>> more
>>>>> usually on a subsequent power-off/power-on cycle e.g.
>>>>>
>>>>> [   71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
>>>>> [   71.451686] ov5675 24-0010: failed to set plls
>>>>>
>>>>> The current quiescence period we have is too tight, doubling the 
>>>>> minimum
>>>>> appears to fix the issue observed on X13s.
>>>>>
>>>>> Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and 
>>>>> support runtime PM")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
>>>>> ---
>>>>>   drivers/media/i2c/ov5675.c | 9 +++++++--
>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
>>>>> index 92bd35133a5d..0498f8f3064d 100644
>>>>> --- a/drivers/media/i2c/ov5675.c
>>>>> +++ b/drivers/media/i2c/ov5675.c
>>>>> @@ -1018,8 +1018,13 @@ static int ov5675_power_on(struct device *dev)
>>>>>       gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
>>>>> -    /* 8192 xvclk cycles prior to the first SCCB transation */
>>>>> -    usleep_range(delay_us, delay_us * 2);
>>>>> +    /* The spec calls for a minimum delay of 8192 XVCLK cycles 
>>>>> prior to
>>>>> +     * transacting on the I2C bus, which translates to about 430
>>>>> +     * microseconds at 19.2 MHz.
>>>>> +     * Testing shows the range 8192 - 16384 cycles to be unreliable.
>>>>> +     * Grant a more liberal 2x -3x clock cycle grace time.
>>>>> +     */
>>>>> +    usleep_range(delay_us * 2, delay_us * 3);
>>>>
>>>> Would it make sense to have power_off have the same logic? We do a 
>>>> usleep_range of those same values currently, so keeping them in sync 
>>>> seems to make sense to me.
>>>
>>> I have no evidence to suggest there's a problem on the shutdown path, 
>>> that's why I left the quiescence period as-is there.
>>>
>>>> Also, I'm wondering if it isn't an issue with the gpio not being 
>>>> high right after gpoiod_set_value_cansleep() returns, i.e. the time 
>>>> it actually takes for the HW to reach the IO level that means "high" 
>>>> for the camera. And that this increased sleep is just a way to 
>>>> mitigate that?
>>>
>>> No, that's not what I found.
>>>
>>> I tried changing
>>>
>>>          usleep_range(2000, 2200);
>>>
>>> to
>>>          usleep_range(200000, 220000);
>>>
>>> but could still elicit the I2C transaction failure. If the time it 
>>> took for the GPIO to hit logical 1 were the issue then multiplying 
>>> the reset time by 100 would certainly account for that.
>>>
>>> // BOD set the chip into reset
>>>          gpiod_set_value_cansleep(ov5675->reset_gpio, 1);
>>>
>>> // BOD apply power
>>>          ret = regulator_bulk_enable(OV5675_NUM_SUPPLIES, 
>>> ov5675->supplies);
>>>          if (ret) {
>>>                  clk_disable_unprepare(ov5675->xvclk);
>>>                  return ret;
>>>          }
>>>
>>>          /* Reset pulse should be at least 2ms and reset gpio 
>>> released only once
>>>           * regulators are stable.
>>>           */
>>>
>>> // BOD spec specifies 2 milliseconds here not a count of XVCLKs
>>>          usleep_range(2000, 2200);
>>>
>>>          gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
>>>
>>
>> I meant to say this gpiod_set_value_cansleep(), which is logical LOW 
>> and not HIGH, brain not braining today sorry. But the question remains 
>> the same.
> 
> Ah right yes I get you, you mean how can I prove the sensor has come out 
> of reset by the time we start counting for the first I2C transaction 
> delay ?
> 
> There's no way to prove that, the only thing we can do is elongate the 
> post reset delay by X, whatever X we choose.
> 

I think we could, checking the delay between the moment the GPIO reaches 
logical low and the moment we send the first I2C command and comparing 
this against 8192 * 1/19.2MHz. Not sure we need to spend the time on 
this though? There isn't really a strong need for optimizing this as 
much as we can I believe? (and worst case scenario, we can do it later on).

>> So, maybe this is all too complex for something that could be as 
>> simple as 8192 XVCLK cycles for 6MHz as Dave suggested I believe. And 
>> have some wiggle room added in case we ever support 6MHz and it has 
>> the same issue as you encountered with 19.2MHz (or whatever was that 
>> rate you were running the camera at). 1/6MHz * 8192 * 2 ~= 2.7ms if 
>> I'm not mistaken. So maybe go with that with a comment just above to 
>> justify why we are doing this with hardcoded values?
> 
> 2.7 milliseconds is alot.
> 
> Worst case XVCLK period is 1.365 milliseconds.
> 
> If your theory on the GPIO is correct, its still difficult to see how @ 
> 6MHz - which we don't yet support and probably never will, that 1.5 
> milliseconds would be insufficient.
> 
> So - I'm happy enough to throw out the first patch and give a range of 
> 1.5 to 1.6 milliseconds instead.
> 

Works for me.

Cheers,
Quentin