Message ID | 20231207082200.16388-1-jensenhuang@friendlyarm.com |
---|---|
State | New |
Headers | show |
Series | [v2] i2c: rk3x: fix potential spinlock recursion on poll | expand |
On 2023-12-07 09:21, Jensen Huang wrote: > Possible deadlock scenario (on reboot): > rk3x_i2c_xfer_common(polling) > -> rk3x_i2c_wait_xfer_poll() > -> rk3x_i2c_irq(0, i2c); > --> spin_lock(&i2c->lock); > ... > <rk3x i2c interrupt> > -> rk3x_i2c_irq(0, i2c); > --> spin_lock(&i2c->lock); (deadlock here) > > Store the IRQ number and disable/enable it around the polling transfer. > This patch has been tested on NanoPC-T4. In case you haven't already seen the related discussion linked below, please have a look. I also added more people to the list of recipients, in an attempt to make everyone aware of the different approaches to solving this issue. https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b > Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com> > Reviewed-by: Heiko Stuebner <heiko@sntech.de> > Reviewed-by: Andi Shyti <andi.shyti@kernel.org> > --- > Changes in v2: > - Add description for member 'irq' to fix build warning > > drivers/i2c/busses/i2c-rk3x.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-rk3x.c > b/drivers/i2c/busses/i2c-rk3x.c > index a044ca0c35a1..4362db7c5789 100644 > --- a/drivers/i2c/busses/i2c-rk3x.c > +++ b/drivers/i2c/busses/i2c-rk3x.c > @@ -178,6 +178,7 @@ struct rk3x_i2c_soc_data { > * @clk: function clk for rk3399 or function & Bus clks for others > * @pclk: Bus clk for rk3399 > * @clk_rate_nb: i2c clk rate change notify > + * @irq: irq number > * @t: I2C known timing information > * @lock: spinlock for the i2c bus > * @wait: the waitqueue to wait for i2c transfer > @@ -200,6 +201,7 @@ struct rk3x_i2c { > struct clk *clk; > struct clk *pclk; > struct notifier_block clk_rate_nb; > + int irq; > > /* Settings */ > struct i2c_timings t; > @@ -1087,13 +1089,18 @@ static int rk3x_i2c_xfer_common(struct > i2c_adapter *adap, > > spin_unlock_irqrestore(&i2c->lock, flags); > > - rk3x_i2c_start(i2c); > - > if (!polling) { > + rk3x_i2c_start(i2c); > + > timeout = wait_event_timeout(i2c->wait, !i2c->busy, > msecs_to_jiffies(WAIT_TIMEOUT)); > } else { > + disable_irq(i2c->irq); > + rk3x_i2c_start(i2c); > + > timeout = rk3x_i2c_wait_xfer_poll(i2c); > + > + enable_irq(i2c->irq); > } > > spin_lock_irqsave(&i2c->lock, flags); > @@ -1310,6 +1317,8 @@ static int rk3x_i2c_probe(struct platform_device > *pdev) > return ret; > } > > + i2c->irq = irq; > + > platform_set_drvdata(pdev, i2c); > > if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {
On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote: > > On 12/7/23 17:10, Dragan Simic wrote: > > On 2023-12-07 10:25, Jensen Huang wrote: > >> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote: > >>> > >>> On 2023-12-07 09:21, Jensen Huang wrote: > >>> > Possible deadlock scenario (on reboot): > >>> > rk3x_i2c_xfer_common(polling) > >>> > -> rk3x_i2c_wait_xfer_poll() > >>> > -> rk3x_i2c_irq(0, i2c); > >>> > --> spin_lock(&i2c->lock); > >>> > ... > >>> > <rk3x i2c interrupt> > >>> > -> rk3x_i2c_irq(0, i2c); > >>> > --> spin_lock(&i2c->lock); (deadlock here) > >>> > > >>> > Store the IRQ number and disable/enable it around the polling > >>> transfer. > >>> > This patch has been tested on NanoPC-T4. > >>> > >>> In case you haven't already seen the related discussion linked below, > >>> please have a look. I also added more people to the list of recipients, > >>> in an attempt to make everyone aware of the different approaches to > >>> solving this issue. > >>> > >>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b > >> > >> Thank you for providing the information. I hadn't seen this link before. > >> After carefully looking into the related discussion, it appears that > >> Dmitry Osipenko is already working on a suitable patch. To avoid > >> duplication > >> or conflicts, my patch can be discarded. > > > > Thank you for responding so quickly. Perhaps it would be best to hear > > from Dmitry as well, before discarding anything. It's been a while > > since Dmitry wrote about working on the patch, so he might have > > abandoned it. > > This patch is okay. In general, will be better to have IRQ disabled by > default like I did in my variant, it should allow to remove the spinlock > entirely. Of course this also can be done later on in a follow up > patches. Jensen, feel free to use my variant of the patch, add my > s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll > be able to send my patch next week. Thank you for the suggestion. I've updated the patch to your variant, and as confirmed by others, reboots are functioning correctly. I measured the overhead of enable_irq/disable_irq() by calculating ktime in the updated version, and on rk3399, the minimum delta I observed was 291/875 ns. This extra cost may impact most interrupt-based transfers. Therefore, I personally lean towards the current v2 patch and handle the spinlock and irqsave/restore in a follow up patch. I'd like to hear everyone's thoughts on this. -- Best regards, Jensen
On 12/8/23 11:53, Jensen Huang wrote: > On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko > <dmitry.osipenko@collabora.com> wrote: >> >> On 12/7/23 17:10, Dragan Simic wrote: >>> On 2023-12-07 10:25, Jensen Huang wrote: >>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote: >>>>> >>>>> On 2023-12-07 09:21, Jensen Huang wrote: >>>>>> Possible deadlock scenario (on reboot): >>>>>> rk3x_i2c_xfer_common(polling) >>>>>> -> rk3x_i2c_wait_xfer_poll() >>>>>> -> rk3x_i2c_irq(0, i2c); >>>>>> --> spin_lock(&i2c->lock); >>>>>> ... >>>>>> <rk3x i2c interrupt> >>>>>> -> rk3x_i2c_irq(0, i2c); >>>>>> --> spin_lock(&i2c->lock); (deadlock here) >>>>>> >>>>>> Store the IRQ number and disable/enable it around the polling >>>>> transfer. >>>>>> This patch has been tested on NanoPC-T4. >>>>> >>>>> In case you haven't already seen the related discussion linked below, >>>>> please have a look. I also added more people to the list of recipients, >>>>> in an attempt to make everyone aware of the different approaches to >>>>> solving this issue. >>>>> >>>>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b >>>> >>>> Thank you for providing the information. I hadn't seen this link before. >>>> After carefully looking into the related discussion, it appears that >>>> Dmitry Osipenko is already working on a suitable patch. To avoid >>>> duplication >>>> or conflicts, my patch can be discarded. >>> >>> Thank you for responding so quickly. Perhaps it would be best to hear >>> from Dmitry as well, before discarding anything. It's been a while >>> since Dmitry wrote about working on the patch, so he might have >>> abandoned it. >> >> This patch is okay. In general, will be better to have IRQ disabled by >> default like I did in my variant, it should allow to remove the spinlock >> entirely. Of course this also can be done later on in a follow up >> patches. Jensen, feel free to use my variant of the patch, add my >> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll >> be able to send my patch next week. > > Thank you for the suggestion. I've updated the patch to your variant, and > as confirmed by others, reboots are functioning correctly. I measured the > overhead of enable_irq/disable_irq() by calculating ktime in the > updated version, > and on rk3399, the minimum delta I observed was 291/875 ns. This extra > cost may impact most interrupt-based transfers. Therefore, I personally lean > towards the current v2 patch and handle the spinlock and irqsave/restore in > a follow up patch. I'd like to hear everyone's thoughts on this. Please don't use ktime for perf measurements, ktime itself is a slow API and it should be 200us that ktime takes itself. Also, the 0.2us is practically nothing, especially compared to I2C transfers measured in ms. I'm fine with keeping your v2 variant for the bug fix if you prefer that. Thanks for addressing the issue :)
On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote: > Possible deadlock scenario (on reboot): > rk3x_i2c_xfer_common(polling) > -> rk3x_i2c_wait_xfer_poll() > -> rk3x_i2c_irq(0, i2c); > --> spin_lock(&i2c->lock); > ... > <rk3x i2c interrupt> > -> rk3x_i2c_irq(0, i2c); > --> spin_lock(&i2c->lock); (deadlock here) > > Store the IRQ number and disable/enable it around the polling transfer. > This patch has been tested on NanoPC-T4. > > Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com> > Reviewed-by: Heiko Stuebner <heiko@sntech.de> > Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Applied to for-current, thanks! But I'd like to see the follow-up patches somewhen which have been discussed in this thread.
On Wed, Dec 20, 2023 at 1:23 AM Wolfram Sang <wsa@kernel.org> wrote: > > On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote: > > Possible deadlock scenario (on reboot): > > rk3x_i2c_xfer_common(polling) > > -> rk3x_i2c_wait_xfer_poll() > > -> rk3x_i2c_irq(0, i2c); > > --> spin_lock(&i2c->lock); > > ... > > <rk3x i2c interrupt> > > -> rk3x_i2c_irq(0, i2c); > > --> spin_lock(&i2c->lock); (deadlock here) > > > > Store the IRQ number and disable/enable it around the polling transfer. > > This patch has been tested on NanoPC-T4. > > > > Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com> > > Reviewed-by: Heiko Stuebner <heiko@sntech.de> > > Reviewed-by: Andi Shyti <andi.shyti@kernel.org> > > Applied to for-current, thanks! > > But I'd like to see the follow-up patches somewhen which have been > discussed in this thread. I've made some attempts, such as removing irqsave/restore, but it didn't meet my expectations and requires further testing. Therefore, I may not be able to submit such a patch in the short term. However, if someone else submits a new patch, I'd be happy to test it on rk3328/rk3399. On Fri, Dec 8, 2023 at 8:17 PM Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote: > > On 12/8/23 11:53, Jensen Huang wrote: > > On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko > > <dmitry.osipenko@collabora.com> wrote: > >> > >> On 12/7/23 17:10, Dragan Simic wrote: > >>> On 2023-12-07 10:25, Jensen Huang wrote: > >>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote: > >>>>> > >>>>> On 2023-12-07 09:21, Jensen Huang wrote: > >>>>>> Possible deadlock scenario (on reboot): > >>>>>> rk3x_i2c_xfer_common(polling) > >>>>>> -> rk3x_i2c_wait_xfer_poll() > >>>>>> -> rk3x_i2c_irq(0, i2c); > >>>>>> --> spin_lock(&i2c->lock); > >>>>>> ... > >>>>>> <rk3x i2c interrupt> > >>>>>> -> rk3x_i2c_irq(0, i2c); > >>>>>> --> spin_lock(&i2c->lock); (deadlock here) > >>>>>> > >>>>>> Store the IRQ number and disable/enable it around the polling > >>>>> transfer. > >>>>>> This patch has been tested on NanoPC-T4. > >>>>> > >>>>> In case you haven't already seen the related discussion linked below, > >>>>> please have a look. I also added more people to the list of recipients, > >>>>> in an attempt to make everyone aware of the different approaches to > >>>>> solving this issue. > >>>>> > >>>>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b > >>>> > >>>> Thank you for providing the information. I hadn't seen this link before. > >>>> After carefully looking into the related discussion, it appears that > >>>> Dmitry Osipenko is already working on a suitable patch. To avoid > >>>> duplication > >>>> or conflicts, my patch can be discarded. > >>> > >>> Thank you for responding so quickly. Perhaps it would be best to hear > >>> from Dmitry as well, before discarding anything. It's been a while > >>> since Dmitry wrote about working on the patch, so he might have > >>> abandoned it. > >> > >> This patch is okay. In general, will be better to have IRQ disabled by > >> default like I did in my variant, it should allow to remove the spinlock > >> entirely. Of course this also can be done later on in a follow up > >> patches. Jensen, feel free to use my variant of the patch, add my > >> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll > >> be able to send my patch next week. > > > > Thank you for the suggestion. I've updated the patch to your variant, and > > as confirmed by others, reboots are functioning correctly. I measured the > > overhead of enable_irq/disable_irq() by calculating ktime in the > > updated version, > > and on rk3399, the minimum delta I observed was 291/875 ns. This extra > > cost may impact most interrupt-based transfers. Therefore, I personally lean > > towards the current v2 patch and handle the spinlock and irqsave/restore in > > a follow up patch. I'd like to hear everyone's thoughts on this. > > Please don't use ktime for perf measurements, ktime itself is a slow API > and it should be 200us that ktime takes itself. Also, the 0.2us is > practically nothing, especially compared to I2C transfers measured in ms. > > I'm fine with keeping your v2 variant for the bug fix if you prefer > that. Thanks for addressing the issue :) Thank you for clarifying the situation with ktime. Honestly, I also believe that the overhead of disable_irq/enable_irq should be minimal. Considering that system frequency adjustments involve I2C transfers, I conducted this 'somewhat imprecise' measurements. -- Best regards, Jensen
diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c index a044ca0c35a1..4362db7c5789 100644 --- a/drivers/i2c/busses/i2c-rk3x.c +++ b/drivers/i2c/busses/i2c-rk3x.c @@ -178,6 +178,7 @@ struct rk3x_i2c_soc_data { * @clk: function clk for rk3399 or function & Bus clks for others * @pclk: Bus clk for rk3399 * @clk_rate_nb: i2c clk rate change notify + * @irq: irq number * @t: I2C known timing information * @lock: spinlock for the i2c bus * @wait: the waitqueue to wait for i2c transfer @@ -200,6 +201,7 @@ struct rk3x_i2c { struct clk *clk; struct clk *pclk; struct notifier_block clk_rate_nb; + int irq; /* Settings */ struct i2c_timings t; @@ -1087,13 +1089,18 @@ static int rk3x_i2c_xfer_common(struct i2c_adapter *adap, spin_unlock_irqrestore(&i2c->lock, flags); - rk3x_i2c_start(i2c); - if (!polling) { + rk3x_i2c_start(i2c); + timeout = wait_event_timeout(i2c->wait, !i2c->busy, msecs_to_jiffies(WAIT_TIMEOUT)); } else { + disable_irq(i2c->irq); + rk3x_i2c_start(i2c); + timeout = rk3x_i2c_wait_xfer_poll(i2c); + + enable_irq(i2c->irq); } spin_lock_irqsave(&i2c->lock, flags); @@ -1310,6 +1317,8 @@ static int rk3x_i2c_probe(struct platform_device *pdev) return ret; } + i2c->irq = irq; + platform_set_drvdata(pdev, i2c); if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {