diff mbox series

[1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

Message ID 20221104064055.1.I00a0e4564a25489e85328ec41636497775627564@changeid
State Accepted
Commit d470be3c4f30b4666e43eef6bab80f543563cdb0
Headers show
Series [1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage | expand

Commit Message

Doug Anderson Nov. 4, 2022, 1:56 p.m. UTC
The pm_runtime usage in lpass-sc7280 was broken in quite a few
ways. Specifically:

1. At the end of probe it called "put" twice. This is a no-no and will
   end us up with a negative usage count. Even worse than calling
   "put" twice, it never called "get" once. Thus after bootup it could
   be seen that the runtime usage of the devices managed by this
   driver was -2.
2. In some error cases it manually called pm_runtime_disable() even
   though it had previously used devm_add_action_or_reset() to set
   this up to be called automatically. This meant that in these error
   cases we'd double-call pm_runtime_disable().
3. It forgot to call undo pm_runtime_use_autosuspend(), which can
   sometimes have subtle problems (and the docs specifically mention
   that you need to undo this function).

Overall the above seriously calls into question how this driver is
working. It seems like a combination of "it doesn't", "by luck", and
"because of the weirdness of runtime_pm". Specifically I put a
printout to the serial console every time the runtime suspend/resume
was called for the two devices created by this driver (I wrapped the
pm_clk calls). When I had serial console enabled, I found that the
calls got resumed at bootup (when the clk core probed and before our
double-put) and then never touched again. That's no good.
  [    0.829997] DOUG: my_pm_clk_resume, usage=1
  [    0.835487] DOUG: my_pm_clk_resume, usage=1

When I disabled serial console (speeding up boot), I got a different
pattern, which I guess (?) is better:
  [    0.089767] DOUG: my_pm_clk_resume, usage=1
  [    0.090507] DOUG: my_pm_clk_resume, usage=1
  [    0.151885] DOUG: my_pm_clk_suspend, usage=-2
  [    0.151914] DOUG: my_pm_clk_suspend, usage=-2
  [    1.825747] DOUG: my_pm_clk_resume, usage=-1
  [    1.825774] DOUG: my_pm_clk_resume, usage=-1
  [    1.888269] DOUG: my_pm_clk_suspend, usage=-2
  [    1.888282] DOUG: my_pm_clk_suspend, usage=-2

These different patterns have to do with the fact that the core PM
Runtime code really isn't designed to be robust to negative usage
counts and sometimes may happen to stumble upon a behavior that
happens to "work". For instance, you can see that
__pm_runtime_suspend() will treat any non-zero value (including
negative numbers) as if the device is in use.

In any case, let's fix the driver to be correct. We'll hold a
pm_runtime reference for the whole probe and then drop it (once!) at
the end. We'll get rid of manual pm_runtime_disable() calls in the
error handling. We'll also switch to devm_pm_runtime_enable(), which
magically handles undoing pm_runtime_use_autosuspend() as of commit
b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
pm_runtime_dont_use_autosuspend()").

While we're at this, let's also use devm_pm_clk_create() instead of
rolling it ourselves.

Note that the above changes make it obvious that
lpassaudio_create_pm_clks() was doing more than just creating
clocks. It was also setting up pm_runtime parameters. Let's rename it.

All of these problems were found by code inspection. I started looking
at this driver because it was involved in a deadlock that I reported a
while ago [1]. Though I bisected the deadlock to commit 1b771839de05
("clk: qcom: gdsc: enable optional power domain support"), it was
never really clear why that patch affected it other than a luck of
timing changes. I'll also note that by fixing the timing (as done in
this change) we also seem to aboid the deadlock, which is a nice
benefit.

Also note that some of the fixes here are much the same type of stuff
that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
lpassaudiocc-sc7280.c didn't exist then.

[1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org

Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/clk/qcom/lpassaudiocc-sc7280.c | 55 ++++++++++----------------
 1 file changed, 21 insertions(+), 34 deletions(-)

Comments

Dmitry Baryshkov Nov. 4, 2022, 9:10 p.m. UTC | #1
On Fri, 4 Nov 2022 at 16:57, Douglas Anderson <dianders@chromium.org> wrote:
>
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
>    end us up with a negative usage count. Even worse than calling
>    "put" twice, it never called "get" once. Thus after bootup it could
>    be seen that the runtime usage of the devices managed by this
>    driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
>    though it had previously used devm_add_action_or_reset() to set
>    this up to be called automatically. This meant that in these error
>    cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
>    sometimes have subtle problems (and the docs specifically mention
>    that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
>   [    0.829997] DOUG: my_pm_clk_resume, usage=1
>   [    0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
>   [    0.089767] DOUG: my_pm_clk_resume, usage=1
>   [    0.090507] DOUG: my_pm_clk_resume, usage=1
>   [    0.151885] DOUG: my_pm_clk_suspend, usage=-2
>   [    0.151914] DOUG: my_pm_clk_suspend, usage=-2
>   [    1.825747] DOUG: my_pm_clk_resume, usage=-1
>   [    1.825774] DOUG: my_pm_clk_resume, usage=-1
>   [    1.888269] DOUG: my_pm_clk_suspend, usage=-2
>   [    1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.

I don't remember. Most probably so.

>
> [1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Stephen Boyd Nov. 4, 2022, 9:19 p.m. UTC | #2
Quoting Douglas Anderson (2022-11-04 06:56:29)
> The sc7180 lpass clock controller's pm_runtime usage wasn't broken
> quite as spectacularly as the sc7280's pm_runtime usage, but it was
> still broken. Putting some printouts in at boot showed me this (with
> serial console enabled, which makes the prints slow and thus changes
> timing):
>   [    3.109951] DOUG: my_pm_clk_resume, usage=1
>   [    3.114767] DOUG: my_pm_clk_resume, usage=1
>   [    3.664443] DOUG: my_pm_clk_suspend, usage=0
>   [    3.897566] DOUG: my_pm_clk_suspend, usage=0
>   [    3.910137] DOUG: my_pm_clk_resume, usage=1
>   [    3.923217] DOUG: my_pm_clk_resume, usage=0
>   [    4.440116] DOUG: my_pm_clk_suspend, usage=-1
>   [    4.444982] DOUG: my_pm_clk_suspend, usage=0
>   [   14.170501] DOUG: my_pm_clk_resume, usage=1
>   [   14.176245] DOUG: my_pm_clk_resume, usage=0
>
> ...or this w/out serial console:
>   [    0.556139] DOUG: my_pm_clk_resume, usage=1
>   [    0.556279] DOUG: my_pm_clk_resume, usage=1
>   [    1.058422] DOUG: my_pm_clk_suspend, usage=-1
>   [    1.058464] DOUG: my_pm_clk_suspend, usage=0
>   [    1.186250] DOUG: my_pm_clk_resume, usage=1
>   [    1.186292] DOUG: my_pm_clk_resume, usage=0
>   [    1.731536] DOUG: my_pm_clk_suspend, usage=-1
>   [    1.731557] DOUG: my_pm_clk_suspend, usage=0
>   [   10.288910] DOUG: my_pm_clk_resume, usage=1
>   [   10.289496] DOUG: my_pm_clk_resume, usage=0
>
> It seems to be doing roughly the right sequence of calls, but just
> like with sc7280 this is more by luck than anything. Having a usage of
> -1 is just not OK.
>
> Let's fix this like we did with sc7280.

Any Fixes tag?

>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Doug Anderson Nov. 15, 2022, 12:08 a.m. UTC | #3
Bjorn,

On Fri, Nov 4, 2022 at 6:57 AM Douglas Anderson <dianders@chromium.org> wrote:
>
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
>    end us up with a negative usage count. Even worse than calling
>    "put" twice, it never called "get" once. Thus after bootup it could
>    be seen that the runtime usage of the devices managed by this
>    driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
>    though it had previously used devm_add_action_or_reset() to set
>    this up to be called automatically. This meant that in these error
>    cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
>    sometimes have subtle problems (and the docs specifically mention
>    that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
>   [    0.829997] DOUG: my_pm_clk_resume, usage=1
>   [    0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
>   [    0.089767] DOUG: my_pm_clk_resume, usage=1
>   [    0.090507] DOUG: my_pm_clk_resume, usage=1
>   [    0.151885] DOUG: my_pm_clk_suspend, usage=-2
>   [    0.151914] DOUG: my_pm_clk_suspend, usage=-2
>   [    1.825747] DOUG: my_pm_clk_resume, usage=-1
>   [    1.825774] DOUG: my_pm_clk_resume, usage=-1
>   [    1.888269] DOUG: my_pm_clk_suspend, usage=-2
>   [    1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.
>
> [1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/clk/qcom/lpassaudiocc-sc7280.c | 55 ++++++++++----------------
>  1 file changed, 21 insertions(+), 34 deletions(-)

Is anything blocking this series from landing? I noticed a few other
patches have landed since then to your Qualcomm clk branch, but I
don't see these there. I assume it'll land through your tree.

Thanks!


-Doug
Bjorn Andersson Dec. 2, 2022, 8:58 p.m. UTC | #4
On Fri, 4 Nov 2022 06:56:28 -0700, Douglas Anderson wrote:
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
> 
> 1. At the end of probe it called "put" twice. This is a no-no and will
>    end us up with a negative usage count. Even worse than calling
>    "put" twice, it never called "get" once. Thus after bootup it could
>    be seen that the runtime usage of the devices managed by this
>    driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
>    though it had previously used devm_add_action_or_reset() to set
>    this up to be called automatically. This meant that in these error
>    cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
>    sometimes have subtle problems (and the docs specifically mention
>    that you need to undo this function).
> 
> [...]

Applied, thanks!

[1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage
      commit: d470be3c4f30b4666e43eef6bab80f543563cdb0
[2/3] clk: qcom: lpass-sc7180: Fix pm_runtime usage
      commit: ff1ccf59eaffd192efe21f7de9fb0c130faf1b1b
[3/3] clk: qcom: lpass-sc7180: Avoid an extra "struct dev_pm_ops"
      commit: e3ad6c3f21ddb89e4b71361be8318da57dbe3597

Best regards,
diff mbox series

Patch

diff --git a/drivers/clk/qcom/lpassaudiocc-sc7280.c b/drivers/clk/qcom/lpassaudiocc-sc7280.c
index 063e0365f311..1339f9211a14 100644
--- a/drivers/clk/qcom/lpassaudiocc-sc7280.c
+++ b/drivers/clk/qcom/lpassaudiocc-sc7280.c
@@ -722,33 +722,17 @@  static const struct of_device_id lpass_audio_cc_sc7280_match_table[] = {
 };
 MODULE_DEVICE_TABLE(of, lpass_audio_cc_sc7280_match_table);
 
-static void lpassaudio_pm_runtime_disable(void *data)
-{
-	pm_runtime_disable(data);
-}
-
-static void lpassaudio_pm_clk_destroy(void *data)
-{
-	pm_clk_destroy(data);
-}
-
-static int lpassaudio_create_pm_clks(struct platform_device *pdev)
+static int lpass_audio_setup_runtime_pm(struct platform_device *pdev)
 {
 	int ret;
 
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_set_autosuspend_delay(&pdev->dev, 50);
-	pm_runtime_enable(&pdev->dev);
-
-	ret = devm_add_action_or_reset(&pdev->dev, lpassaudio_pm_runtime_disable, &pdev->dev);
-	if (ret)
-		return ret;
-
-	ret = pm_clk_create(&pdev->dev);
+	ret = devm_pm_runtime_enable(&pdev->dev);
 	if (ret)
 		return ret;
 
-	ret = devm_add_action_or_reset(&pdev->dev, lpassaudio_pm_clk_destroy, &pdev->dev);
+	ret = devm_pm_clk_create(&pdev->dev);
 	if (ret)
 		return ret;
 
@@ -756,7 +740,7 @@  static int lpassaudio_create_pm_clks(struct platform_device *pdev)
 	if (ret < 0)
 		dev_err(&pdev->dev, "failed to acquire iface clock\n");
 
-	return ret;
+	return pm_runtime_resume_and_get(&pdev->dev);
 }
 
 static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
@@ -765,7 +749,7 @@  static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
 	struct regmap *regmap;
 	int ret;
 
-	ret = lpassaudio_create_pm_clks(pdev);
+	ret = lpass_audio_setup_runtime_pm(pdev);
 	if (ret)
 		return ret;
 
@@ -775,8 +759,8 @@  static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
 
 	regmap = qcom_cc_map(pdev, desc);
 	if (IS_ERR(regmap)) {
-		pm_runtime_disable(&pdev->dev);
-		return PTR_ERR(regmap);
+		ret = PTR_ERR(regmap);
+		goto exit;
 	}
 
 	clk_zonda_pll_configure(&lpass_audio_cc_pll, regmap, &lpass_audio_cc_pll_config);
@@ -788,20 +772,18 @@  static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
 	ret = qcom_cc_really_probe(pdev, &lpass_audio_cc_sc7280_desc, regmap);
 	if (ret) {
 		dev_err(&pdev->dev, "Failed to register LPASS AUDIO CC clocks\n");
-		pm_runtime_disable(&pdev->dev);
-		return ret;
+		goto exit;
 	}
 
 	ret = qcom_cc_probe_by_index(pdev, 1, &lpass_audio_cc_reset_sc7280_desc);
 	if (ret) {
 		dev_err(&pdev->dev, "Failed to register LPASS AUDIO CC Resets\n");
-		pm_runtime_disable(&pdev->dev);
-		return ret;
+		goto exit;
 	}
 
 	pm_runtime_mark_last_busy(&pdev->dev);
+exit:
 	pm_runtime_put_autosuspend(&pdev->dev);
-	pm_runtime_put_sync(&pdev->dev);
 
 	return ret;
 }
@@ -839,14 +821,15 @@  static int lpass_aon_cc_sc7280_probe(struct platform_device *pdev)
 	struct regmap *regmap;
 	int ret;
 
-	ret = lpassaudio_create_pm_clks(pdev);
+	ret = lpass_audio_setup_runtime_pm(pdev);
 	if (ret)
 		return ret;
 
 	if (of_property_read_bool(pdev->dev.of_node, "qcom,adsp-pil-mode")) {
 		lpass_audio_cc_sc7280_regmap_config.name = "cc";
 		desc = &lpass_cc_sc7280_desc;
-		return qcom_cc_probe(pdev, desc);
+		ret = qcom_cc_probe(pdev, desc);
+		goto exit;
 	}
 
 	lpass_audio_cc_sc7280_regmap_config.name = "lpasscc_aon";
@@ -854,18 +837,22 @@  static int lpass_aon_cc_sc7280_probe(struct platform_device *pdev)
 	desc = &lpass_aon_cc_sc7280_desc;
 
 	regmap = qcom_cc_map(pdev, desc);
-	if (IS_ERR(regmap))
-		return PTR_ERR(regmap);
+	if (IS_ERR(regmap)) {
+		ret = PTR_ERR(regmap);
+		goto exit;
+	}
 
 	clk_lucid_pll_configure(&lpass_aon_cc_pll, regmap, &lpass_aon_cc_pll_config);
 
 	ret = qcom_cc_really_probe(pdev, &lpass_aon_cc_sc7280_desc, regmap);
-	if (ret)
+	if (ret) {
 		dev_err(&pdev->dev, "Failed to register LPASS AON CC clocks\n");
+		goto exit;
+	}
 
 	pm_runtime_mark_last_busy(&pdev->dev);
+exit:
 	pm_runtime_put_autosuspend(&pdev->dev);
-	pm_runtime_put_sync(&pdev->dev);
 
 	return ret;
 }