diff mbox series

[v2] i2c: rk3x: fix potential spinlock recursion on poll

Message ID 20231207082200.16388-1-jensenhuang@friendlyarm.com
State New
Headers show
Series [v2] i2c: rk3x: fix potential spinlock recursion on poll | expand

Commit Message

Jensen Huang Dec. 7, 2023, 8:21 a.m. UTC
Possible deadlock scenario (on reboot):
rk3x_i2c_xfer_common(polling)
    -> rk3x_i2c_wait_xfer_poll()
        -> rk3x_i2c_irq(0, i2c);
            --> spin_lock(&i2c->lock);
            ...
        <rk3x i2c interrupt>
        -> rk3x_i2c_irq(0, i2c);
            --> spin_lock(&i2c->lock); (deadlock here)

Store the IRQ number and disable/enable it around the polling transfer.
This patch has been tested on NanoPC-T4.

Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
---
Changes in v2:
 - Add description for member 'irq' to fix build warning

 drivers/i2c/busses/i2c-rk3x.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Dragan Simic Dec. 7, 2023, 8:37 a.m. UTC | #1
On 2023-12-07 09:21, Jensen Huang wrote:
> Possible deadlock scenario (on reboot):
> rk3x_i2c_xfer_common(polling)
>     -> rk3x_i2c_wait_xfer_poll()
>         -> rk3x_i2c_irq(0, i2c);
>             --> spin_lock(&i2c->lock);
>             ...
>         <rk3x i2c interrupt>
>         -> rk3x_i2c_irq(0, i2c);
>             --> spin_lock(&i2c->lock); (deadlock here)
> 
> Store the IRQ number and disable/enable it around the polling transfer.
> This patch has been tested on NanoPC-T4.

In case you haven't already seen the related discussion linked below, 
please have a look.  I also added more people to the list of recipients, 
in an attempt to make everyone aware of the different approaches to 
solving this issue.

https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b

> Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com>
> Reviewed-by: Heiko Stuebner <heiko@sntech.de>
> Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
> ---
> Changes in v2:
>  - Add description for member 'irq' to fix build warning
> 
>  drivers/i2c/busses/i2c-rk3x.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-rk3x.c 
> b/drivers/i2c/busses/i2c-rk3x.c
> index a044ca0c35a1..4362db7c5789 100644
> --- a/drivers/i2c/busses/i2c-rk3x.c
> +++ b/drivers/i2c/busses/i2c-rk3x.c
> @@ -178,6 +178,7 @@ struct rk3x_i2c_soc_data {
>   * @clk: function clk for rk3399 or function & Bus clks for others
>   * @pclk: Bus clk for rk3399
>   * @clk_rate_nb: i2c clk rate change notify
> + * @irq: irq number
>   * @t: I2C known timing information
>   * @lock: spinlock for the i2c bus
>   * @wait: the waitqueue to wait for i2c transfer
> @@ -200,6 +201,7 @@ struct rk3x_i2c {
>  	struct clk *clk;
>  	struct clk *pclk;
>  	struct notifier_block clk_rate_nb;
> +	int irq;
> 
>  	/* Settings */
>  	struct i2c_timings t;
> @@ -1087,13 +1089,18 @@ static int rk3x_i2c_xfer_common(struct
> i2c_adapter *adap,
> 
>  		spin_unlock_irqrestore(&i2c->lock, flags);
> 
> -		rk3x_i2c_start(i2c);
> -
>  		if (!polling) {
> +			rk3x_i2c_start(i2c);
> +
>  			timeout = wait_event_timeout(i2c->wait, !i2c->busy,
>  						     msecs_to_jiffies(WAIT_TIMEOUT));
>  		} else {
> +			disable_irq(i2c->irq);
> +			rk3x_i2c_start(i2c);
> +
>  			timeout = rk3x_i2c_wait_xfer_poll(i2c);
> +
> +			enable_irq(i2c->irq);
>  		}
> 
>  		spin_lock_irqsave(&i2c->lock, flags);
> @@ -1310,6 +1317,8 @@ static int rk3x_i2c_probe(struct platform_device 
> *pdev)
>  		return ret;
>  	}
> 
> +	i2c->irq = irq;
> +
>  	platform_set_drvdata(pdev, i2c);
> 
>  	if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {
Jensen Huang Dec. 8, 2023, 8:53 a.m. UTC | #2
On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
<dmitry.osipenko@collabora.com> wrote:
>
> On 12/7/23 17:10, Dragan Simic wrote:
> > On 2023-12-07 10:25, Jensen Huang wrote:
> >> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote:
> >>>
> >>> On 2023-12-07 09:21, Jensen Huang wrote:
> >>> > Possible deadlock scenario (on reboot):
> >>> > rk3x_i2c_xfer_common(polling)
> >>> >     -> rk3x_i2c_wait_xfer_poll()
> >>> >         -> rk3x_i2c_irq(0, i2c);
> >>> >             --> spin_lock(&i2c->lock);
> >>> >             ...
> >>> >         <rk3x i2c interrupt>
> >>> >         -> rk3x_i2c_irq(0, i2c);
> >>> >             --> spin_lock(&i2c->lock); (deadlock here)
> >>> >
> >>> > Store the IRQ number and disable/enable it around the polling
> >>> transfer.
> >>> > This patch has been tested on NanoPC-T4.
> >>>
> >>> In case you haven't already seen the related discussion linked below,
> >>> please have a look.  I also added more people to the list of recipients,
> >>> in an attempt to make everyone aware of the different approaches to
> >>> solving this issue.
> >>>
> >>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
> >>
> >> Thank you for providing the information. I hadn't seen this link before.
> >> After carefully looking into the related discussion, it appears that
> >> Dmitry Osipenko is already working on a suitable patch. To avoid
> >> duplication
> >> or conflicts, my patch can be discarded.
> >
> > Thank you for responding so quickly.  Perhaps it would be best to hear
> > from Dmitry as well, before discarding anything.  It's been a while
> > since Dmitry wrote about working on the patch, so he might have
> > abandoned it.
>
> This patch is okay. In general, will be better to have IRQ disabled by
> default like I did in my variant, it should allow to remove the spinlock
> entirely. Of course this also can be done later on in a follow up
> patches. Jensen, feel free to use my variant of the patch, add my
> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
> be able to send my patch next week.

Thank you for the suggestion. I've updated the patch to your variant, and
as confirmed by others, reboots are functioning correctly. I measured the
overhead of enable_irq/disable_irq() by calculating ktime in the
updated version,
and on rk3399, the minimum delta I observed was 291/875 ns. This extra
cost may impact most interrupt-based transfers. Therefore, I personally lean
towards the current v2 patch and handle the spinlock and irqsave/restore in
a follow up patch. I'd like to hear everyone's thoughts on this.

--
Best regards,
Jensen
Dmitry Osipenko Dec. 8, 2023, 12:17 p.m. UTC | #3
On 12/8/23 11:53, Jensen Huang wrote:
> On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
> <dmitry.osipenko@collabora.com> wrote:
>>
>> On 12/7/23 17:10, Dragan Simic wrote:
>>> On 2023-12-07 10:25, Jensen Huang wrote:
>>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote:
>>>>>
>>>>> On 2023-12-07 09:21, Jensen Huang wrote:
>>>>>> Possible deadlock scenario (on reboot):
>>>>>> rk3x_i2c_xfer_common(polling)
>>>>>>     -> rk3x_i2c_wait_xfer_poll()
>>>>>>         -> rk3x_i2c_irq(0, i2c);
>>>>>>             --> spin_lock(&i2c->lock);
>>>>>>             ...
>>>>>>         <rk3x i2c interrupt>
>>>>>>         -> rk3x_i2c_irq(0, i2c);
>>>>>>             --> spin_lock(&i2c->lock); (deadlock here)
>>>>>>
>>>>>> Store the IRQ number and disable/enable it around the polling
>>>>> transfer.
>>>>>> This patch has been tested on NanoPC-T4.
>>>>>
>>>>> In case you haven't already seen the related discussion linked below,
>>>>> please have a look.  I also added more people to the list of recipients,
>>>>> in an attempt to make everyone aware of the different approaches to
>>>>> solving this issue.
>>>>>
>>>>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
>>>>
>>>> Thank you for providing the information. I hadn't seen this link before.
>>>> After carefully looking into the related discussion, it appears that
>>>> Dmitry Osipenko is already working on a suitable patch. To avoid
>>>> duplication
>>>> or conflicts, my patch can be discarded.
>>>
>>> Thank you for responding so quickly.  Perhaps it would be best to hear
>>> from Dmitry as well, before discarding anything.  It's been a while
>>> since Dmitry wrote about working on the patch, so he might have
>>> abandoned it.
>>
>> This patch is okay. In general, will be better to have IRQ disabled by
>> default like I did in my variant, it should allow to remove the spinlock
>> entirely. Of course this also can be done later on in a follow up
>> patches. Jensen, feel free to use my variant of the patch, add my
>> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
>> be able to send my patch next week.
> 
> Thank you for the suggestion. I've updated the patch to your variant, and
> as confirmed by others, reboots are functioning correctly. I measured the
> overhead of enable_irq/disable_irq() by calculating ktime in the
> updated version,
> and on rk3399, the minimum delta I observed was 291/875 ns. This extra
> cost may impact most interrupt-based transfers. Therefore, I personally lean
> towards the current v2 patch and handle the spinlock and irqsave/restore in
> a follow up patch. I'd like to hear everyone's thoughts on this.

Please don't use ktime for perf measurements, ktime itself is a slow API
and it should be 200us that ktime takes itself. Also, the 0.2us is
practically nothing, especially compared to I2C transfers measured in ms.

I'm fine with keeping your v2 variant for the bug fix if you prefer
that. Thanks for addressing the issue :)
Wolfram Sang Dec. 19, 2023, 5:23 p.m. UTC | #4
On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote:
> Possible deadlock scenario (on reboot):
> rk3x_i2c_xfer_common(polling)
>     -> rk3x_i2c_wait_xfer_poll()
>         -> rk3x_i2c_irq(0, i2c);
>             --> spin_lock(&i2c->lock);
>             ...
>         <rk3x i2c interrupt>
>         -> rk3x_i2c_irq(0, i2c);
>             --> spin_lock(&i2c->lock); (deadlock here)
> 
> Store the IRQ number and disable/enable it around the polling transfer.
> This patch has been tested on NanoPC-T4.
> 
> Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com>
> Reviewed-by: Heiko Stuebner <heiko@sntech.de>
> Reviewed-by: Andi Shyti <andi.shyti@kernel.org>

Applied to for-current, thanks!

But I'd like to see the follow-up patches somewhen which have been
discussed in this thread.
Jensen Huang Dec. 22, 2023, 8:22 a.m. UTC | #5
On Wed, Dec 20, 2023 at 1:23 AM Wolfram Sang <wsa@kernel.org> wrote:
>
> On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote:
> > Possible deadlock scenario (on reboot):
> > rk3x_i2c_xfer_common(polling)
> >     -> rk3x_i2c_wait_xfer_poll()
> >         -> rk3x_i2c_irq(0, i2c);
> >             --> spin_lock(&i2c->lock);
> >             ...
> >         <rk3x i2c interrupt>
> >         -> rk3x_i2c_irq(0, i2c);
> >             --> spin_lock(&i2c->lock); (deadlock here)
> >
> > Store the IRQ number and disable/enable it around the polling transfer.
> > This patch has been tested on NanoPC-T4.
> >
> > Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com>
> > Reviewed-by: Heiko Stuebner <heiko@sntech.de>
> > Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
>
> Applied to for-current, thanks!
>
> But I'd like to see the follow-up patches somewhen which have been
> discussed in this thread.

I've made some attempts, such as removing irqsave/restore, but it
didn't meet my expectations and
requires further testing. Therefore, I may not be able to submit such
a patch in the short term. However,
if someone else submits a new patch, I'd be happy to test it on rk3328/rk3399.


On Fri, Dec 8, 2023 at 8:17 PM Dmitry Osipenko
<dmitry.osipenko@collabora.com> wrote:
>
> On 12/8/23 11:53, Jensen Huang wrote:
> > On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
> > <dmitry.osipenko@collabora.com> wrote:
> >>
> >> On 12/7/23 17:10, Dragan Simic wrote:
> >>> On 2023-12-07 10:25, Jensen Huang wrote:
> >>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <dsimic@manjaro.org> wrote:
> >>>>>
> >>>>> On 2023-12-07 09:21, Jensen Huang wrote:
> >>>>>> Possible deadlock scenario (on reboot):
> >>>>>> rk3x_i2c_xfer_common(polling)
> >>>>>>     -> rk3x_i2c_wait_xfer_poll()
> >>>>>>         -> rk3x_i2c_irq(0, i2c);
> >>>>>>             --> spin_lock(&i2c->lock);
> >>>>>>             ...
> >>>>>>         <rk3x i2c interrupt>
> >>>>>>         -> rk3x_i2c_irq(0, i2c);
> >>>>>>             --> spin_lock(&i2c->lock); (deadlock here)
> >>>>>>
> >>>>>> Store the IRQ number and disable/enable it around the polling
> >>>>> transfer.
> >>>>>> This patch has been tested on NanoPC-T4.
> >>>>>
> >>>>> In case you haven't already seen the related discussion linked below,
> >>>>> please have a look.  I also added more people to the list of recipients,
> >>>>> in an attempt to make everyone aware of the different approaches to
> >>>>> solving this issue.
> >>>>>
> >>>>> https://lore.kernel.org/all/655177f4.050a0220.d85c9.3ba0@mx.google.com/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
> >>>>
> >>>> Thank you for providing the information. I hadn't seen this link before.
> >>>> After carefully looking into the related discussion, it appears that
> >>>> Dmitry Osipenko is already working on a suitable patch. To avoid
> >>>> duplication
> >>>> or conflicts, my patch can be discarded.
> >>>
> >>> Thank you for responding so quickly.  Perhaps it would be best to hear
> >>> from Dmitry as well, before discarding anything.  It's been a while
> >>> since Dmitry wrote about working on the patch, so he might have
> >>> abandoned it.
> >>
> >> This patch is okay. In general, will be better to have IRQ disabled by
> >> default like I did in my variant, it should allow to remove the spinlock
> >> entirely. Of course this also can be done later on in a follow up
> >> patches. Jensen, feel free to use my variant of the patch, add my
> >> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
> >> be able to send my patch next week.
> >
> > Thank you for the suggestion. I've updated the patch to your variant, and
> > as confirmed by others, reboots are functioning correctly. I measured the
> > overhead of enable_irq/disable_irq() by calculating ktime in the
> > updated version,
> > and on rk3399, the minimum delta I observed was 291/875 ns. This extra
> > cost may impact most interrupt-based transfers. Therefore, I personally lean
> > towards the current v2 patch and handle the spinlock and irqsave/restore in
> > a follow up patch. I'd like to hear everyone's thoughts on this.
>
> Please don't use ktime for perf measurements, ktime itself is a slow API
> and it should be 200us that ktime takes itself. Also, the 0.2us is
> practically nothing, especially compared to I2C transfers measured in ms.
>
> I'm fine with keeping your v2 variant for the bug fix if you prefer
> that. Thanks for addressing the issue :)

Thank you for clarifying the situation with ktime. Honestly, I also
believe that the overhead
of disable_irq/enable_irq should be minimal. Considering that system
frequency adjustments
involve I2C transfers, I conducted this 'somewhat imprecise' measurements.


--
Best regards,
Jensen
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
index a044ca0c35a1..4362db7c5789 100644
--- a/drivers/i2c/busses/i2c-rk3x.c
+++ b/drivers/i2c/busses/i2c-rk3x.c
@@ -178,6 +178,7 @@  struct rk3x_i2c_soc_data {
  * @clk: function clk for rk3399 or function & Bus clks for others
  * @pclk: Bus clk for rk3399
  * @clk_rate_nb: i2c clk rate change notify
+ * @irq: irq number
  * @t: I2C known timing information
  * @lock: spinlock for the i2c bus
  * @wait: the waitqueue to wait for i2c transfer
@@ -200,6 +201,7 @@  struct rk3x_i2c {
 	struct clk *clk;
 	struct clk *pclk;
 	struct notifier_block clk_rate_nb;
+	int irq;
 
 	/* Settings */
 	struct i2c_timings t;
@@ -1087,13 +1089,18 @@  static int rk3x_i2c_xfer_common(struct i2c_adapter *adap,
 
 		spin_unlock_irqrestore(&i2c->lock, flags);
 
-		rk3x_i2c_start(i2c);
-
 		if (!polling) {
+			rk3x_i2c_start(i2c);
+
 			timeout = wait_event_timeout(i2c->wait, !i2c->busy,
 						     msecs_to_jiffies(WAIT_TIMEOUT));
 		} else {
+			disable_irq(i2c->irq);
+			rk3x_i2c_start(i2c);
+
 			timeout = rk3x_i2c_wait_xfer_poll(i2c);
+
+			enable_irq(i2c->irq);
 		}
 
 		spin_lock_irqsave(&i2c->lock, flags);
@@ -1310,6 +1317,8 @@  static int rk3x_i2c_probe(struct platform_device *pdev)
 		return ret;
 	}
 
+	i2c->irq = irq;
+
 	platform_set_drvdata(pdev, i2c);
 
 	if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {