Message ID | 20230426194956.689756-1-reidt@ti.com |
---|---|
State | New |
Headers | show |
Series | [v2] i2c: omap: Fix standard mode false ACK readings | expand |
On 4/27/2023 1:19 AM, Reid Tonking wrote: > Using standard mode, rare false ACK responses were appearing with > i2cdetect tool. This was happening due to NACK interrupt triggering > ISR thread before register access interrupt was ready. Removing the > NACK interrupt's ability to trigger ISR thread lets register access > ready interrupt do this instead. > > Cc: <stable@vger.kernel.org> # v3.7+ > Fixes: 3b2f8f82dad7 ("i2c: omap: switch to threaded IRQ support") > Signed-off-by: Reid Tonking <reidt@ti.com> Acked-by: Vignesh Raghavendra <vigneshr@ti.com> Regards Vignesh
* Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: > On 4/27/2023 1:19 AM, Reid Tonking wrote: > > Using standard mode, rare false ACK responses were appearing with > > i2cdetect tool. This was happening due to NACK interrupt triggering > > ISR thread before register access interrupt was ready. Removing the > > NACK interrupt's ability to trigger ISR thread lets register access > > ready interrupt do this instead. So is it safe to leave NACK interrupt unhandled until we get the next interrupt, does the ARDY always trigger after hitting this? Regards, Tony
On 10:43-20230428, Tony Lindgren wrote: > * Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: > > On 4/27/2023 1:19 AM, Reid Tonking wrote: > > > Using standard mode, rare false ACK responses were appearing with > > > i2cdetect tool. This was happening due to NACK interrupt triggering > > > ISR thread before register access interrupt was ready. Removing the > > > NACK interrupt's ability to trigger ISR thread lets register access > > > ready interrupt do this instead. > > So is it safe to leave NACK interrupt unhandled until we get the next > interrupt, does the ARDY always trigger after hitting this? > > Regards, > > Tony Yep, the ARDY always gets set after a new command when register access is ready so there's no need for NACK interrupt to control this. -Reid
On Wed, Apr 26, 2023 at 02:49:56PM -0500, Reid Tonking wrote: > Using standard mode, rare false ACK responses were appearing with > i2cdetect tool. This was happening due to NACK interrupt triggering > ISR thread before register access interrupt was ready. Removing the > NACK interrupt's ability to trigger ISR thread lets register access > ready interrupt do this instead. > > Cc: <stable@vger.kernel.org> # v3.7+ > Fixes: 3b2f8f82dad7 ("i2c: omap: switch to threaded IRQ support") > Signed-off-by: Reid Tonking <reidt@ti.com> Applied to for-current, thanks!
* Reid Tonking <reidt@ti.com> [230428 18:30]: > On 10:43-20230428, Tony Lindgren wrote: > > * Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: > > > On 4/27/2023 1:19 AM, Reid Tonking wrote: > > > > Using standard mode, rare false ACK responses were appearing with > > > > i2cdetect tool. This was happening due to NACK interrupt triggering > > > > ISR thread before register access interrupt was ready. Removing the > > > > NACK interrupt's ability to trigger ISR thread lets register access > > > > ready interrupt do this instead. > > > > So is it safe to leave NACK interrupt unhandled until we get the next > > interrupt, does the ARDY always trigger after hitting this? > > > > Regards, > > > > Tony > > Yep, the ARDY always gets set after a new command when register access is ready so there's no need for NACK interrupt to control this. OK thanks, looks good to me: Reviewed-by: Tony Lindgren <tony@atomide.com>
Am Wed, 11 Sep 2024 11:40:04 +0200 schrieb "H. Nikolaus Schaller" <hns@goldelico.com>: > Hi, > > > Am 28.04.2023 um 20:30 schrieb Reid Tonking <reidt@ti.com>: > > > > On 10:43-20230428, Tony Lindgren wrote: > >> * Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: > >>> On 4/27/2023 1:19 AM, Reid Tonking wrote: > >>>> Using standard mode, rare false ACK responses were appearing with > >>>> i2cdetect tool. This was happening due to NACK interrupt > >>>> triggering ISR thread before register access interrupt was > >>>> ready. Removing the NACK interrupt's ability to trigger ISR > >>>> thread lets register access ready interrupt do this instead. > >> > >> So is it safe to leave NACK interrupt unhandled until we get the > >> next interrupt, does the ARDY always trigger after hitting this? > >> > >> Regards, > >> > >> Tony > > > > Yep, the ARDY always gets set after a new command when register > > access is ready so there's no need for NACK interrupt to control > > this. > > I have tested one GTA04A5 board where this patch breaks boot on > v4.19.283 or v6.11-rc7 (where it was inherited from some earlier -rc > series). > > The device is either stuck with no signs of activity or reports RCU > stalls after a 20 second pause. > cannot reproduce it here. I had a patch to disable 1Ghz on that device in my tree. Do you have anything strange in your tree? Regards, Andreas
Hi, > Am 13.09.2024 um 14:09 schrieb Andreas Kemnade <andreas@kemnade.info>: > > Am Wed, 11 Sep 2024 11:40:04 +0200 > schrieb "H. Nikolaus Schaller" <hns@goldelico.com>: > >> Hi, >> >>> Am 28.04.2023 um 20:30 schrieb Reid Tonking <reidt@ti.com>: >>> >>> On 10:43-20230428, Tony Lindgren wrote: >>>> * Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: >>>>> On 4/27/2023 1:19 AM, Reid Tonking wrote: >>>>>> Using standard mode, rare false ACK responses were appearing with >>>>>> i2cdetect tool. This was happening due to NACK interrupt >>>>>> triggering ISR thread before register access interrupt was >>>>>> ready. Removing the NACK interrupt's ability to trigger ISR >>>>>> thread lets register access ready interrupt do this instead. >>>> >>>> So is it safe to leave NACK interrupt unhandled until we get the >>>> next interrupt, does the ARDY always trigger after hitting this? >>>> >>>> Regards, >>>> >>>> Tony >>> >>> Yep, the ARDY always gets set after a new command when register >>> access is ready so there's no need for NACK interrupt to control >>> this. >> >> I have tested one GTA04A5 board where this patch breaks boot on >> v4.19.283 or v6.11-rc7 (where it was inherited from some earlier -rc >> series). >> >> The device is either stuck with no signs of activity or reports RCU >> stalls after a 20 second pause. >> > cannot reproduce it here. That is good for you :) > I had a patch to disable 1Ghz on that > device in my tree. Do you have anything strange in your > tree? No, and the omap3 is running with 800 MHz only. I haven't tested on another board but the bug is very reproducible and I was able to bisect it to this patch, which makes the difference. So there may be boards which happily run with the patch and some don't. Maybe a race condition with hardware. But I think the assumption that "ARDY always gets set after a new command when register access is ready so there's no need for NACK interrupt to control this" may not hold in all situations. Potentially if a new command is never ready. BR, Nikolaus
Am Fri, 13 Sep 2024 14:28:59 +0200 schrieb "H. Nikolaus Schaller" <hns@goldelico.com>: > Hi, > > > > Am 13.09.2024 um 14:09 schrieb Andreas Kemnade > > <andreas@kemnade.info>: > > > > Am Wed, 11 Sep 2024 11:40:04 +0200 > > schrieb "H. Nikolaus Schaller" <hns@goldelico.com>: > > > >> Hi, > >> > >>> Am 28.04.2023 um 20:30 schrieb Reid Tonking <reidt@ti.com>: > >>> > >>> On 10:43-20230428, Tony Lindgren wrote: > >>>> * Raghavendra, Vignesh <vigneshr@ti.com> [230427 13:18]: > >>>>> On 4/27/2023 1:19 AM, Reid Tonking wrote: > >>>>>> Using standard mode, rare false ACK responses were appearing > >>>>>> with i2cdetect tool. This was happening due to NACK interrupt > >>>>>> triggering ISR thread before register access interrupt was > >>>>>> ready. Removing the NACK interrupt's ability to trigger ISR > >>>>>> thread lets register access ready interrupt do this instead. > >>>>>> > >>>> > >>>> So is it safe to leave NACK interrupt unhandled until we get the > >>>> next interrupt, does the ARDY always trigger after hitting this? > >>>> > >>>> Regards, > >>>> > >>>> Tony > >>> > >>> Yep, the ARDY always gets set after a new command when register > >>> access is ready so there's no need for NACK interrupt to control > >>> this. > >> > >> I have tested one GTA04A5 board where this patch breaks boot on > >> v4.19.283 or v6.11-rc7 (where it was inherited from some earlier > >> -rc series). > >> > >> The device is either stuck with no signs of activity or reports RCU > >> stalls after a 20 second pause. > >> > > cannot reproduce it here. > > That is good for you :) > which does not mean that there is no problem... > > I had a patch to disable 1Ghz on that > > device in my tree. Do you have anything strange in your > > tree? > > No, and the omap3 is running with 800 MHz only. > So you have a patch disabling 1Ghz OPP in there? The error messages look like things I got when 1Ghz was enabled, so better double check. if it is letux, then there is e.g. the interrupt reversal in there. Maybe it unveils some problem which should be fixed, maybe it is harmful, it was never well reviewed... > I haven't tested on another board but the bug is very reproducible > and I was able to bisect it to this patch, which makes the difference. > the error messages, esp. regarding rcu do not look so related to this. Maybe having this patch or not triggers some other bug. Maybe we trigger some race conditions. Or i2c error checking regarding OPP setting... > So there may be boards which happily run with the patch and some > don't. Maybe a race condition with hardware. > I am not ruling out that this patch has nasty side effects but I think there is more in the game. Regards, Andreas
Hi Andreas, > Am 13.09.2024 um 15:32 schrieb Andreas Kemnade <andreas@kemnade.info>: > >>> I had a patch to disable 1Ghz on that >>> device in my tree. Do you have anything strange in your >>> tree? >> >> No, and the omap3 is running with 800 MHz only. >> > So you have a patch disabling 1Ghz OPP in there? I think the speed is binned to be 800 MHz only. So the OPP is ignored. > The error messages > look like things I got when 1Ghz was enabled, so better double check. Well, it turns out to be difficult to check since with 6.11-rc7 cpufreq-info seems to be broken... I have not yet installed the fixed 4.19.283 again. But indeed I have found a potential issue. We have a patch [1] for the gta04a5 (only) that adds &cpu0_opp_table { /* is unreliable on gta04a5 - enable by echo 1 >/sys/devices/system/cpu/cpufreq/boost */ opp1g-1000000000 { turbo-mode; }; }; so that 1 GHz must be explicitly enabled by user-space. But some time ago the 1GHz node was apparently renamed to opp-1000000000 (5821d766932cc8) and this patch was not adjusted. After fixing it I can ask again for cpufreq-info and the 1GHz OPP is not activated: root@letux:~# cpufreq-info cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009 Report errors and bugs to cpufreq@vger.kernel.org, please. analyzing CPU 0: driver: cpufreq-dt CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: 300 us. hardware limits: 300 MHz - 800 MHz available frequency steps: 300 MHz, 600 MHz, 800 MHz available cpufreq governors: conservative, ondemand, userspace, powersave, performance current policy: frequency should be within 300 MHz and 800 MHz. The governor "ondemand" may decide which speed to use within this range. current CPU frequency is 800 MHz (asserted by call to hardware). cpufreq stats: 300 MHz:37.38%, 600 MHz:10.11%, 800 MHz:52.51%, 1000 MHz:0.00% (1740) root@letux:~# Anyways, this bug was introduced some months after this i2c patch we are discussing here. So i2c broke first before the 800MHz limitation was accidentially removed. Therefore I am quite sure that the failing 4.19.283 did run at 800 MHz. And in the v4.19.282 and v4.19.283 based kernels we have simply commented out the 1GHz option (since 2018) or there is no 1GHz OPP at all. Thanks for the hint to take a second and closer look at it, but it doesn't seem to be a factor here. > if it is letux, then there is e.g. the interrupt reversal in there. > Maybe it unveils some problem which should be fixed, maybe it is > harmful, it was never well reviewed... I know what you refer to but I could not find it any more. But I may not have searched correctly. > >> I haven't tested on another board but the bug is very reproducible >> and I was able to bisect it to this patch, which makes the difference. >> > the error messages, esp. regarding rcu do not look so related to this. > Maybe having this patch or not triggers some other bug. Maybe we trigger > some race conditions. Or i2c error checking regarding OPP setting... That is what I suspect as well. I2C is used to switch the twl4030 for different OPPs... > >> So there may be boards which happily run with the patch and some >> don't. Maybe a race condition with hardware. >> > I am not ruling out that this patch has nasty side effects but I think > there is more in the game. Yes, that is why I think just reverting this patch may only hide a symptom and does not solve it. But it may as well have introduced a bug as Tony apparently was thinking of when asking. BR and thanks, Nikolaus [1]: https://git.goldelico.com/?p=letux-kernel.git;a=commit;h=e824f0c9513cf1d57eba0c9a2ce5fe264fafc8d5
diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c index 2b4e2be51318..4199f57a6bf2 100644 --- a/drivers/i2c/busses/i2c-omap.c +++ b/drivers/i2c/busses/i2c-omap.c @@ -1058,7 +1058,7 @@ omap_i2c_isr(int irq, void *dev_id) u16 stat; stat = omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG); - mask = omap_i2c_read_reg(omap, OMAP_I2C_IE_REG); + mask = omap_i2c_read_reg(omap, OMAP_I2C_IE_REG) & ~OMAP_I2C_STAT_NACK; if (stat & mask) ret = IRQ_WAKE_THREAD;
Using standard mode, rare false ACK responses were appearing with i2cdetect tool. This was happening due to NACK interrupt triggering ISR thread before register access interrupt was ready. Removing the NACK interrupt's ability to trigger ISR thread lets register access ready interrupt do this instead. Cc: <stable@vger.kernel.org> # v3.7+ Fixes: 3b2f8f82dad7 ("i2c: omap: switch to threaded IRQ support") Signed-off-by: Reid Tonking <reidt@ti.com> --- drivers/i2c/busses/i2c-omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)