Message ID | 20240808121447.239278-1-leitao@debian.org |
---|---|
State | New |
Headers | show |
Series | [RESEND] Do not mark ACPI devices as irq safe | expand |
On Tue, Aug 13, 2024 at 6:28 PM Dmitry Osipenko <digetx@gmail.com> wrote: > > 13.08.2024 16:32, Breno Leitao пишет: > > Hello Andy, > > > > On Fri, Aug 09, 2024 at 02:03:27PM +0300, Andy Shevchenko wrote: > >> On Fri, Aug 9, 2024 at 2:57 AM Andi Shyti <andi.shyti@kernel.org> wrote: > >>> On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote: > > > >>>> The problem arises because during __pm_runtime_resume(), the spinlock > >>>> &dev->power.lock is acquired before rpm_resume() is called. Later, > >>>> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on > >>>> mutexes, triggering the error. > >>>> > >>>> To address this issue, devices on ACPI are now marked as not IRQ-safe, > >>>> considering the dependency of acpi_subsys_runtime_resume() on mutexes. > >> > >> This is a step in the right direction > > > > Thanks > > > >> but somewhere in the replies > >> here I would like to hear about roadmap to get rid of the > >> pm_runtime_irq_safe() in all Tegra related code. > > > > Agree, that seems the right way to go, but this is a question to > > maintainers, Laxman and Dmitry. > > > > By the way, looking at lore, I found that the last email from Laxman is > > from 2022. And Dmitry seems to be using a different email!? Let me copy > > the Dmitry's other email (dmitry.osipenko@collabora.com) here. > > > >>>> + if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev)) > >>> > >>> looks good to me, can I have an ack from Andy here? > >> > >> I prefer to see something like > >> is_acpi_node() / is_acpi_device_node() / is_acpi_data_node() / > >> has_acpi_companion() > >> instead depending on the actual ACPI representation of the device. > >> > >> Otherwise no objections. > >> Please, Cc me (andy@kernel.org) for the next version. > > > > Thanks for the feedback, I agree that leveraging the functions about > > should be better. What about something as: > > > > Author: Breno Leitao <leitao@debian.org> > > Date: Thu Jun 6 06:27:07 2024 -0700 > > > > Do not mark ACPI devices as irq safe > > > > On ACPI machines, the tegra i2c module encounters an issue due to a > > mutex being called inside a spinlock. This leads to the following bug: > > > > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 > > in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010 > > preempt_count: 0, expected: 0 > > RCU nest depth: 0, expected: 0 > > irq event stamp: 0 > > > > Call trace: > > __might_sleep > > __mutex_lock_common > > mutex_lock_nested > > acpi_subsys_runtime_resume > > rpm_resume > > tegra_i2c_xfer > > > > The problem arises because during __pm_runtime_resume(), the spinlock > > &dev->power.lock is acquired before rpm_resume() is called. Later, > > rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on > > mutexes, triggering the error. > > > > To address this issue, devices on ACPI are now marked as not IRQ-safe, > > considering the dependency of acpi_subsys_runtime_resume() on mutexes. > > > > Co-developed-by: Michael van der Westhuizen <rmikey@meta.com> > > Signed-off-by: Michael van der Westhuizen <rmikey@meta.com> > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c > > index 85b31edc558d..1df5b4204142 100644 > > --- a/drivers/i2c/busses/i2c-tegra.c > > +++ b/drivers/i2c/busses/i2c-tegra.c > > @@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platform_device *pdev) > > * domain. > > * > > * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't > > - * be used for atomic transfers. > > + * be used for atomic transfers. ACPI device is not IRQ safe also. > > */ > > - if (!IS_VI(i2c_dev)) > > + if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev)) > > pm_runtime_irq_safe(i2c_dev->dev); > > > > pm_runtime_enable(i2c_dev->dev); > > > > Looks good, thanks > > Reviewed-by: Dmitry Osipenko <digetx@gmail.com> LGTM as well, feel free to add Reviewed-by: Andy Shevchenko <andy@kernel.org> to the above when sending it formally. > > but somewhere in the replies > > here I would like to hear about roadmap to get rid of the > > pm_runtime_irq_safe() in all Tegra related code. > > What is the problem with pm_runtime_irq_safe()? It's a hack. It has no reasons to stay in the kernel. It also prevents PM from working properly (in some cases, not Tegra). > There were multiple > problems with RPM for this driver in the past, it wasn't trivial to make > it work for all Tegra HW generations. Don't expect anyone would want to > invest time into doing it all over again. You may always refer to the OMAP case, which used to have 12 (IIRC, but definitely several) calls to this API and now 0. Taking the OMAP case into consideration I believe it's quite possible to get rid of this hack and retire the API completely. Yes, this may take months or even years. But I would like to have this roadmap be documented.
13.08.2024 18:52, Andy Shevchenko пишет: ... >>> but somewhere in the replies >>> here I would like to hear about roadmap to get rid of the >>> pm_runtime_irq_safe() in all Tegra related code. >> >> What is the problem with pm_runtime_irq_safe()? > > It's a hack. It has no reasons to stay in the kernel. It also prevents > PM from working properly (in some cases, not Tegra). Why is it a hack? Why it can't be made to work properly for all cases? >> There were multiple >> problems with RPM for this driver in the past, it wasn't trivial to make >> it work for all Tegra HW generations. Don't expect anyone would want to >> invest time into doing it all over again. > > You may always refer to the OMAP case, which used to have 12 (IIRC, > but definitely several) calls to this API and now 0. Taking the OMAP > case into consideration I believe it's quite possible to get rid of > this hack and retire the API completely. Yes, this may take months or > even years. But I would like to have this roadmap be documented. There should be alternative to the removed API. Otherwise drivers will have to have own hacks to work around the RPM limitation, re-invent own PM, or not do RPM at all. Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the atomic transfer, which should cause a lockup without IRQ-safe RPM, AFAICT. The OMAP example doesn't look great so far.
+Cc: Tony On Thu, Aug 15, 2024 at 5:48 AM Dmitry Osipenko <digetx@gmail.com> wrote: > 13.08.2024 18:52, Andy Shevchenko пишет: > ... > >>> but somewhere in the replies > >>> here I would like to hear about roadmap to get rid of the > >>> pm_runtime_irq_safe() in all Tegra related code. > >> > >> What is the problem with pm_runtime_irq_safe()? > > > > It's a hack. It has no reasons to stay in the kernel. It also prevents > > PM from working properly (in some cases, not Tegra). > > Why is it a hack? Why it can't be made to work properly for all cases? Because it messes up with the proper power transitions of the parent devices. Refer to the initial commit c7b61de5b7b1 ("PM / Runtime: Add synchronous runtime interface for interrupt handlers (v3)") that pretty much explains the constraints of it. Also note, it was added quite a while after the main PM machinery had been introduced. What you have to use is device links to make sure the parent (PM speaking) may not go away. FWIW, if I am not mistaken the whole reconsideration of pm_runtime_irq_safe() had been started with this [1] thread. If you want to dive more into the history of this API, run `git log -S pm_runtime_irq_safe`. It gives you also interesting facts of how it was started being used and in many cases reverted or reworked for a reason. > >> There were multiple > >> problems with RPM for this driver in the past, it wasn't trivial to make > >> it work for all Tegra HW generations. Don't expect anyone would want to > >> invest time into doing it all over again. > > > > You may always refer to the OMAP case, which used to have 12 (IIRC, > > but definitely several) calls to this API and now 0. Taking the OMAP > > case into consideration I believe it's quite possible to get rid of > > this hack and retire the API completely. Yes, this may take months or > > even years. But I would like to have this roadmap be documented. > > There should be alternative to the removed API. Otherwise drivers will > have to have own hacks to work around the RPM limitation, re-invent own > PM, or not do RPM at all. > > Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the > atomic transfer, which should cause a lockup without IRQ-safe RPM, > AFAICT. The OMAP example doesn't look great so far. Bugs may still appear, but it's not a point. I can easily find a better example with a hint why it's bad to call that API [2][3][4] and so on. [1]: https://lore.kernel.org/all/20180515183409.78046-1-andriy.shevchenko@linux.intel.com/T/#u [2]: https://lore.kernel.org/all/20191114101718.20619-1-peter.ujfalusi@ti.com/ [3]: https://lore.kernel.org/all/20180920193532.7714-1-tony@atomide.com/ [4]: https://lore.kernel.org/all/1463014396-4095-1-git-send-email-tony@atomide.com/
* Andy Shevchenko <andy.shevchenko@gmail.com> [240819 09:21]: > +Cc: Tony > > On Thu, Aug 15, 2024 at 5:48 AM Dmitry Osipenko <digetx@gmail.com> wrote: > > 13.08.2024 18:52, Andy Shevchenko пишет: > > ... > > >>> but somewhere in the replies > > >>> here I would like to hear about roadmap to get rid of the > > >>> pm_runtime_irq_safe() in all Tegra related code. > > >> > > >> What is the problem with pm_runtime_irq_safe()? > > > > > > It's a hack. It has no reasons to stay in the kernel. It also prevents > > > PM from working properly (in some cases, not Tegra). > > > > Why is it a hack? Why it can't be made to work properly for all cases? > > Because it messes up with the proper power transitions of the parent > devices. Refer to the initial commit c7b61de5b7b1 ("PM / Runtime: Add > synchronous runtime interface for interrupt handlers (v3)") that > pretty much explains the constraints of it. Also note, it was added > quite a while after the main PM machinery had been introduced. > > What you have to use is device links to make sure the parent (PM > speaking) may not go away. > FWIW, if I am not mistaken the whole reconsideration of > pm_runtime_irq_safe() had been started with this [1] thread. > > If you want to dive more into the history of this API, run `git log -S > pm_runtime_irq_safe`. It gives you also interesting facts of how it > was started being used and in many cases reverted or reworked for a > reason. Yeah we should remove pm_runtime_irq_safe() completely. Fixing the use of it in a driver afterwards is always a pain. And so far there has always been a better solution available for the use cases I've seen. > > >> There were multiple > > >> problems with RPM for this driver in the past, it wasn't trivial to make > > >> it work for all Tegra HW generations. Don't expect anyone would want to > > >> invest time into doing it all over again. > > > > > > You may always refer to the OMAP case, which used to have 12 (IIRC, > > > but definitely several) calls to this API and now 0. Taking the OMAP > > > case into consideration I believe it's quite possible to get rid of > > > this hack and retire the API completely. Yes, this may take months or > > > even years. But I would like to have this roadmap be documented. > > > > There should be alternative to the removed API. Otherwise drivers will > > have to have own hacks to work around the RPM limitation, re-invent own > > PM, or not do RPM at all. > > > > Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the > > atomic transfer, which should cause a lockup without IRQ-safe RPM, > > AFAICT. The OMAP example doesn't look great so far. > > Bugs may still appear, but it's not a point. I can easily find a > better example with a hint why it's bad to call that API [2][3][4] and > so on. Adding Kevin for the i2c-omap.c, sounds like it might depend on the autosuspend timeout for runtime PM. For issues where the controller may wake to an interrupt while runtime idle, there's pm_runtime_get_noresume(). Regards, Tony > [1]: https://lore.kernel.org/all/20180515183409.78046-1-andriy.shevchenko@linux.intel.com/T/#u > [2]: https://lore.kernel.org/all/20191114101718.20619-1-peter.ujfalusi@ti.com/ > [3]: https://lore.kernel.org/all/20180920193532.7714-1-tony@atomide.com/ > [4]: https://lore.kernel.org/all/1463014396-4095-1-git-send-email-tony@atomide.com/
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index 85b31edc558d..6d783ecc3431 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -1804,7 +1804,7 @@ static int tegra_i2c_probe(struct platform_device *pdev) * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't * be used for atomic transfers. */ - if (!IS_VI(i2c_dev)) + if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev)) pm_runtime_irq_safe(i2c_dev->dev); pm_runtime_enable(i2c_dev->dev);