diff mbox series

[5.10,079/126] drm/tegra: sor: Grab runtime PM reference across reset

Message ID 20210405085033.686284735@linuxfoundation.org
State New
Headers show
Series None | expand

Commit Message

Greg KH April 5, 2021, 8:54 a.m. UTC
From: Thierry Reding <treding@nvidia.com>

commit ac097aecfef0bb289ca53d2fe0b73fc7e1612a05 upstream.

The SOR resets are exclusively shared with the SOR power domain. This
means that exclusive access can only be granted temporarily and in order
for that to work, a rigorous sequence must be observed. To ensure that a
single consumer gets exclusive access to a reset, each consumer must
implement a rigorous protocol using the reset_control_acquire() and
reset_control_release() functions.

However, these functions alone don't provide any guarantees at the
system level. Drivers need to ensure that the only a single consumer has
access to the reset at the same time. In order for the SOR to be able to
exclusively access its reset, it must therefore ensure that the SOR
power domain is not powered off by holding on to a runtime PM reference
to that power domain across the reset assert/deassert operation.

This used to work fine by accident, but was revealed when recently more
devices started to rely on the SOR power domain.

Fixes: 11c632e1cfd3 ("drm/tegra: sor: Implement acquire/release for reset")
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/tegra/sor.c |    7 +++++++
 1 file changed, 7 insertions(+)

Comments

Pavel Machek April 5, 2021, 3:42 p.m. UTC | #1
Hi!

> However, these functions alone don't provide any guarantees at the
> system level. Drivers need to ensure that the only a single consumer has
> access to the reset at the same time. In order for the SOR to be able to
> exclusively access its reset, it must therefore ensure that the SOR
> power domain is not powered off by holding on to a runtime PM reference
> to that power domain across the reset assert/deassert operation.

Yeah, but it should not leak the PM reference in the error handling.

Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>

Best regards,
								Pavel

diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index 7b88261f57bb..b3b727b2a3c5 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -3125,21 +3125,21 @@ static int tegra_sor_init(struct host1x_client *client)
 		if (err < 0) {
 			dev_err(sor->dev, "failed to acquire SOR reset: %d\n",
 				err);
-			return err;
+			goto maybe_put;
 		}
 
 		err = reset_control_assert(sor->rst);
 		if (err < 0) {
 			dev_err(sor->dev, "failed to assert SOR reset: %d\n",
 				err);
-			return err;
+			goto maybe_put;
 		}
 	}
 
 	err = clk_prepare_enable(sor->clk);
 	if (err < 0) {
 		dev_err(sor->dev, "failed to enable clock: %d\n", err);
-		return err;
+		goto maybe_put;
 	}
 
 	usleep_range(1000, 3000);
@@ -3150,7 +3150,7 @@ static int tegra_sor_init(struct host1x_client *client)
 			dev_err(sor->dev, "failed to deassert SOR reset: %d\n",
 				err);
 			clk_disable_unprepare(sor->clk);
-			return err;
+			goto maybe_put;
 		}
 
 		reset_control_release(sor->rst);
@@ -3171,6 +3171,11 @@ static int tegra_sor_init(struct host1x_client *client)
 	}
 
 	return 0;
+
+ maybe_put:
+	if (sor->rst)
+		pm_runtime_put(sor->dev);
+	return err;
 }
 
 static int tegra_sor_exit(struct host1x_client *client)
Thierry Reding April 6, 2021, 11:50 a.m. UTC | #2
On Mon, Apr 05, 2021 at 05:42:21PM +0200, Pavel Machek wrote:
> Hi!

> 

> > However, these functions alone don't provide any guarantees at the

> > system level. Drivers need to ensure that the only a single consumer has

> > access to the reset at the same time. In order for the SOR to be able to

> > exclusively access its reset, it must therefore ensure that the SOR

> > power domain is not powered off by holding on to a runtime PM reference

> > to that power domain across the reset assert/deassert operation.

> 

> Yeah, but it should not leak the PM reference in the error handling.


True. However the only reason why the code could realistically fail
between pm_runtime_resume_and_get() and pm_runtime_put() is if we did
not take the runtime PM reference (which would cause the call to
reset_control_acquire() to fail).

So the chances of this actually leaking are practically zero, so I
didn't want to bloat this bugfix with what's essentially dead code. I
can queue up your fix below for v5.14, though, since it's obviously
more correct from a theoretical point of view.

Thierry
Pavel Machek April 6, 2021, 7:14 p.m. UTC | #3
Hi!

> > > However, these functions alone don't provide any guarantees at the

> > > system level. Drivers need to ensure that the only a single consumer has

> > > access to the reset at the same time. In order for the SOR to be able to

> > > exclusively access its reset, it must therefore ensure that the SOR

> > > power domain is not powered off by holding on to a runtime PM reference

> > > to that power domain across the reset assert/deassert operation.

> > 

> > Yeah, but it should not leak the PM reference in the error handling.

> 

> True. However the only reason why the code could realistically fail

> between pm_runtime_resume_and_get() and pm_runtime_put() is if we did

> not take the runtime PM reference (which would cause the call to

> reset_control_acquire() to fail).

> 

> So the chances of this actually leaking are practically zero, so I

> didn't want to bloat this bugfix with what's essentially dead code. I

> can queue up your fix below for v5.14, though, since it's obviously

> more correct from a theoretical point of view.


Yes, that sounds like good solution, thanks!

Best regards,
								Pavel

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
diff mbox series

Patch

--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -3115,6 +3115,12 @@  static int tegra_sor_init(struct host1x_
 	 * kernel is possible.
 	 */
 	if (sor->rst) {
+		err = pm_runtime_resume_and_get(sor->dev);
+		if (err < 0) {
+			dev_err(sor->dev, "failed to get runtime PM: %d\n", err);
+			return err;
+		}
+
 		err = reset_control_acquire(sor->rst);
 		if (err < 0) {
 			dev_err(sor->dev, "failed to acquire SOR reset: %d\n",
@@ -3148,6 +3154,7 @@  static int tegra_sor_init(struct host1x_
 		}
 
 		reset_control_release(sor->rst);
+		pm_runtime_put(sor->dev);
 	}
 
 	err = clk_prepare_enable(sor->clk_safe);