mbox series

[v2,0/6] Fix RK3588 GPU domain

Message ID 20240919091834.83572-1-sebastian.reichel@collabora.com
Headers show
Series Fix RK3588 GPU domain | expand

Message

Sebastian Reichel Sept. 19, 2024, 9:12 a.m. UTC
Hi,

I got a report, that the Linux kernel crashes on Rock 5B when the panthor
driver is loaded late after booting. The crash starts with the following
shortened error print:

rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0xa9fff
SError Interrupt on CPU4, code 0x00000000be000411 -- SError

This series first does some cleanups in the Rockchip power domain
driver and changes the driver, so that it no longer tries to continue
when it fails to enable a domain. This gets rid of the SError interrupt
and long backtraces. But the kernel still hangs when it fails to enable
a power domain. I have not done further analysis to check if that can
be avoided.

Last but not least this provides a fix for the GPU power domain failing
to get enabled - after some testing from my side it seems to require the
GPU voltage supply to be enabled.

I'm not really happy about the hack to get a regulator for a sub-node,
which I took over from the Mediatek driver. I discussed this with
Chen-Yu Tsai and Heiko Stübner at OSS EU and the plan is:

1. Merge Rockchip PM domain driver with this hack for now, since DRM CI
   people need it
2. Chen-Yu will work on a series, which fixes the hack in Mediatek by
   introducing a new devm_regulator_get function taking an DT node as
   additional argument
3. Rockchip PM domain later will switch to that once it has landed

Changes since PATCHv1:
 * https://lore.kernel.org/all/20240910180530.47194-1-sebastian.reichel@collabora.com/
 * Collect Reviewed-by/Acked-by/Tested-by
 * swap first and second patch to avoid introducing and directly removing a mutex_unlock
 * fix spelling of indentation
 * fix double empty line after rockchip_pd_regulator_disable()

Greetings,

-- Sebastian

Sebastian Reichel (6):
  pmdomain: rockchip: cleanup mutex handling in rockchip_pd_power
  pmdomain: rockchip: forward rockchip_do_pmu_set_power_domain errors
  pmdomain: rockchip: reduce indentation in rockchip_pd_power
  dt-bindings: power: rockchip: add regulator support
  pmdomain: rockchip: add regulator support
  arm64: dts: rockchip: Add GPU power domain regulator dependency for
    RK3588

 .../power/rockchip,power-controller.yaml      |   3 +
 .../boot/dts/rockchip/rk3588-armsom-sige7.dts |   4 +
 arch/arm64/boot/dts/rockchip/rk3588-base.dtsi |   2 +-
 .../boot/dts/rockchip/rk3588-coolpi-cm5.dtsi  |   4 +
 .../rockchip/rk3588-friendlyelec-cm3588.dtsi  |   4 +
 .../arm64/boot/dts/rockchip/rk3588-jaguar.dts |   4 +
 .../boot/dts/rockchip/rk3588-ok3588-c.dts     |   4 +
 .../boot/dts/rockchip/rk3588-rock-5-itx.dts   |   4 +
 .../boot/dts/rockchip/rk3588-rock-5b.dts      |   4 +
 .../arm64/boot/dts/rockchip/rk3588-tiger.dtsi |   4 +
 .../boot/dts/rockchip/rk3588s-coolpi-4b.dts   |   4 +
 .../dts/rockchip/rk3588s-khadas-edge2.dts     |   4 +
 .../boot/dts/rockchip/rk3588s-orangepi-5.dts  |   4 +
 drivers/pmdomain/rockchip/pm-domains.c        | 129 +++++++++++++-----
 14 files changed, 143 insertions(+), 35 deletions(-)

Comments

Chen-Yu Tsai Sept. 20, 2024, 11:03 p.m. UTC | #1
On Thu, Sep 19, 2024 at 11:27 AM Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
>
> Some power domains require extra voltages to be applied. For example
> trying to enable the GPU domain on RK3588 fails when the SoC does not
> have VDD GPU enabled.
>
> The solution to temporarily change the device's device tree node has
> been taken over from the Mediatek power domain driver.
>
> The regulator is not acquired at probe time, since that creates circular
> dependencies. The power domain driver must be probed early, since SoC
> peripherals need it. Regulators on the other hand depend on SoC
> peripherals like SPI, I2C or GPIO.
>
> Reviewed-by: Heiko Stuebner <heiko@sntech.de>
> Tested-by: Adrian Larumbe <adrian.larumbe@collabora.com> # On Rock 5B
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> ---
>  drivers/pmdomain/rockchip/pm-domains.c | 56 +++++++++++++++++++++++++-
>  1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
> index 663d390faaeb..4bc17b588419 100644
> --- a/drivers/pmdomain/rockchip/pm-domains.c
> +++ b/drivers/pmdomain/rockchip/pm-domains.c
> @@ -18,6 +18,7 @@
>  #include <linux/of_clk.h>
>  #include <linux/clk.h>
>  #include <linux/regmap.h>
> +#include <linux/regulator/consumer.h>
>  #include <linux/mfd/syscon.h>
>  #include <soc/rockchip/pm_domains.h>
>  #include <dt-bindings/power/px30-power.h>
> @@ -89,6 +90,8 @@ struct rockchip_pm_domain {
>         u32 *qos_save_regs[MAX_QOS_REGS_NUM];
>         int num_clks;
>         struct clk_bulk_data *clks;
> +       struct device_node *node;
> +       struct regulator *supply;
>  };
>
>  struct rockchip_pmu {
> @@ -571,18 +574,66 @@ static int rockchip_pd_power(struct rockchip_pm_domain *pd, bool power_on)
>         return 0;
>  }
>
> +static int rockchip_pd_regulator_disable(struct rockchip_pm_domain *pd)
> +{
> +       return pd->supply ? regulator_disable(pd->supply) : 0;
> +}
> +
> +static int rockchip_pd_regulator_enable(struct rockchip_pm_domain *pd)
> +{
> +       struct rockchip_pmu *pmu = pd->pmu;
> +       struct device_node *main_node;
> +
> +       if (!pd->supply) {
> +               /*
> +                * Find regulator in current power domain node.
> +                * devm_regulator_get() finds regulator in a node and its child
> +                * node, so set of_node to current power domain node then change
> +                * back to original node after regulator is found for current
> +                * power domain node.
> +                */
> +               main_node = pmu->dev->of_node;
> +               pmu->dev->of_node = pd->node;
> +               pd->supply = devm_regulator_get(pmu->dev, "domain");

How do you differentiate between a power domain not needing a supply,
vs a power domain that is missing its supply in DT?

You're using the normal "devm_regulator_get()" here (no "_optional),
which gives the dummy supply if no supply is specified in the DT, and
I think that wouldn't work properly.

I'm trying to work out if having "devm_of_regulator_get()" is needed or
not.

Also, this could return -EPROBE_DEFER. I guess it works with the initial
pm_domain_attach() from the driver core?


Thanks
ChenYu


> +               pmu->dev->of_node = main_node;
> +               if (IS_ERR(pd->supply)) {
> +                       pd->supply = NULL;
> +                       return 0;
> +               }
> +       }
> +
> +       return regulator_enable(pd->supply);
> +}
> +
>  static int rockchip_pd_power_on(struct generic_pm_domain *domain)
>  {
>         struct rockchip_pm_domain *pd = to_rockchip_pd(domain);
> +       int ret;
> +
> +       ret = rockchip_pd_regulator_enable(pd);
> +       if (ret) {
> +               dev_err(pd->pmu->dev, "Failed to enable supply: %d\n", ret);
> +               return ret;
> +       }
>
> -       return rockchip_pd_power(pd, true);
> +       ret = rockchip_pd_power(pd, true);
> +       if (ret)
> +               rockchip_pd_regulator_disable(pd);
> +
> +       return ret;
>  }
>
>  static int rockchip_pd_power_off(struct generic_pm_domain *domain)
>  {
>         struct rockchip_pm_domain *pd = to_rockchip_pd(domain);
> +       int ret;
>
> -       return rockchip_pd_power(pd, false);
> +       ret = rockchip_pd_power(pd, false);
> +       if (ret)
> +               return ret;
> +
> +       rockchip_pd_regulator_disable(pd);
> +       return ret;
>  }
>
>  static int rockchip_pd_attach_dev(struct generic_pm_domain *genpd,
> @@ -663,6 +714,7 @@ static int rockchip_pm_add_one_domain(struct rockchip_pmu *pmu,
>
>         pd->info = pd_info;
>         pd->pmu = pmu;
> +       pd->node = node;
>
>         pd->num_clks = of_clk_get_parent_count(node);
>         if (pd->num_clks > 0) {
> --
> 2.45.2
>
Ulf Hansson Oct. 2, 2024, 10:59 a.m. UTC | #2
On Thu, 19 Sept 2024 at 11:18, Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
>
> Hi,
>
> I got a report, that the Linux kernel crashes on Rock 5B when the panthor
> driver is loaded late after booting. The crash starts with the following
> shortened error print:
>
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0xa9fff
> SError Interrupt on CPU4, code 0x00000000be000411 -- SError
>
> This series first does some cleanups in the Rockchip power domain
> driver and changes the driver, so that it no longer tries to continue
> when it fails to enable a domain. This gets rid of the SError interrupt
> and long backtraces. But the kernel still hangs when it fails to enable
> a power domain. I have not done further analysis to check if that can
> be avoided.
>
> Last but not least this provides a fix for the GPU power domain failing
> to get enabled - after some testing from my side it seems to require the
> GPU voltage supply to be enabled.
>
> I'm not really happy about the hack to get a regulator for a sub-node,
> which I took over from the Mediatek driver. I discussed this with
> Chen-Yu Tsai and Heiko Stübner at OSS EU and the plan is:
>
> 1. Merge Rockchip PM domain driver with this hack for now, since DRM CI
>    people need it
> 2. Chen-Yu will work on a series, which fixes the hack in Mediatek by
>    introducing a new devm_regulator_get function taking an DT node as
>    additional argument
> 3. Rockchip PM domain later will switch to that once it has landed

I have just queued up 2) on my next branch.

My suggestion is to skip the intermediate step in 1) and go directly
for 3) instead, unless you think there is a problem with that, of
course?

[...]

Kind regards
Uffe