diff mbox series

[RFC] pmdomain: ti-sci: Set PD on/off state according to the HW state

Message ID 20241022-tisci-pd-boot-state-v1-1-849a6384131b@ideasonboard.com
State New
Headers show
Series [RFC] pmdomain: ti-sci: Set PD on/off state according to the HW state | expand

Commit Message

Tomi Valkeinen Oct. 22, 2024, 6:41 a.m. UTC
At the moment the driver sets the power state of all the PDs it creates
to off, regardless of the actual HW state. This has two drawbacks:

1) The kernel cannot disable unused PDs automatically for power saving,
   as it thinks they are off already

2) A more specific case (but perhaps applicable to other scenarios
   also): bootloader enabled splash-screen cannot be kept on the screen.

The issue in 2) is that the driver framework automatically enables the
device's PD before calling probe() and disables it after the probe().
This means that when the display subsystem (DSS) driver probes, but e.g.
fails due to deferred probing, the DSS PD gets turned off and the driver
cannot do anything to affect that.

Solving the 2) requires more changes to actually keep the PD on during
the boot, but a prerequisite for it is to have the correct power state
for the PD.

The downside with this patch is that it takes time to call the 'is_on'
op, and we need to call it for each PD. In my tests with AM62 SK, using
defconfig, I see an increase from ~3.5ms to ~7ms. However, the added
feature is valuable, so in my opinion it's worth it.

The performance could probably be improved with a new firmware API which
returns the power states of all the PDs.

There's also a related HW issue at play here: if the DSS IP is enabled
and active, and its PD is turned off without first disabling the DSS
display outputs, the DSS IP will hang and causes the kernel to halt if
and when the DSS driver accesses the DSS registers the next time.

With the current upstream kernel, with this patch applied, this means
that if the bootloader enables the display, and the DSS driver is
compiled as a module, the kernel will at some point disable unused PDs,
including the DSS PD. When the DSS module is later loaded, it will hang
the kernel.

The same issue is already there, even without this patch, as the DSS
driver may hit deferred probing, which causes the PD to be turned off,
and leading to kernel halt when the DSS driver is probed again. This
issue has been made quite rare with some arrangements in the DSS
driver's probe, but it's still there.

So, because of the DSS hang issues, I think this patch is still an RFC.
Hopefully we can sort out all the issues, but this patch (or similar)
will be part of the solution so I'd like to get some acks/nacks/comments
for this. Also, this change might have side effects to other devices
too, if the drivers expect the PD to be on, so testing is needed.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
---
 drivers/pmdomain/ti/ti_sci_pm_domains.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)


---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20241022-tisci-pd-boot-state-33cf02efd378

Best regards,

Comments

Kevin Hilman Oct. 30, 2024, 8:04 p.m. UTC | #1
Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> writes:

> At the moment the driver sets the power state of all the PDs it creates
> to off, regardless of the actual HW state. This has two drawbacks:
>
> 1) The kernel cannot disable unused PDs automatically for power saving,
>    as it thinks they are off already
>
> 2) A more specific case (but perhaps applicable to other scenarios
>    also): bootloader enabled splash-screen cannot be kept on the screen.
>
> The issue in 2) is that the driver framework automatically enables the
> device's PD before calling probe() and disables it after the probe().
> This means that when the display subsystem (DSS) driver probes, but e.g.
> fails due to deferred probing, the DSS PD gets turned off and the driver
> cannot do anything to affect that.
>
> Solving the 2) requires more changes to actually keep the PD on during
> the boot, but a prerequisite for it is to have the correct power state
> for the PD.
>
> The downside with this patch is that it takes time to call the 'is_on'
> op, and we need to call it for each PD. In my tests with AM62 SK, using
> defconfig, I see an increase from ~3.5ms to ~7ms. However, the added
> feature is valuable, so in my opinion it's worth it.
>
> The performance could probably be improved with a new firmware API which
> returns the power states of all the PDs.

Agreed.  I think we have to pay this performance price for correctness,
and we can optimizie it later with improvements to the SCI firmware and
a new API.

> There's also a related HW issue at play here: if the DSS IP is enabled
> and active, and its PD is turned off without first disabling the DSS
> display outputs, the DSS IP will hang and causes the kernel to halt if
> and when the DSS driver accesses the DSS registers the next time.

Ouch.

> With the current upstream kernel, with this patch applied, this means
> that if the bootloader enables the display, and the DSS driver is
> compiled as a module, the kernel will at some point disable unused PDs,
> including the DSS PD. When the DSS module is later loaded, it will hang
> the kernel.
>
> The same issue is already there, even without this patch, as the DSS
> driver may hit deferred probing, which causes the PD to be turned off,
> and leading to kernel halt when the DSS driver is probed again. This
> issue has been made quite rare with some arrangements in the DSS
> driver's probe, but it's still there.
>
> So, because of the DSS hang issues, I think this patch is still an RFC.

Like you said, I think that DSS hang is an issue independently of this
patch, so it shouldn't hold this up IMO.

> Hopefully we can sort out all the issues, but this patch (or similar)
> will be part of the solution so I'd like to get some acks/nacks/comments
> for this. Also, this change might have side effects to other devices
> too, if the drivers expect the PD to be on, so testing is needed.
>
> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

We already discussed this a bit off-list, but for the record, I agree
with the approach.

I also tested it on k3-am62a7-sk where I've been doing the other TI SCI
pmdomain work and everything still working fine.

Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>

Kevin
Tomi Valkeinen Nov. 1, 2024, 6:46 a.m. UTC | #2
Hi,

On 31/10/2024 12:25, Ulf Hansson wrote:
> On Thu, 31 Oct 2024 at 08:39, Tomi Valkeinen
> <tomi.valkeinen@ideasonboard.com> wrote:
>>
>> Hi,
>>
>> On 30/10/2024 22:04, Kevin Hilman wrote:
>>> Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> writes:
>>>
>>>> At the moment the driver sets the power state of all the PDs it creates
>>>> to off, regardless of the actual HW state. This has two drawbacks:
>>>>
>>>> 1) The kernel cannot disable unused PDs automatically for power saving,
>>>>      as it thinks they are off already
>>>>
>>>> 2) A more specific case (but perhaps applicable to other scenarios
>>>>      also): bootloader enabled splash-screen cannot be kept on the screen.
>>>>
>>>> The issue in 2) is that the driver framework automatically enables the
>>>> device's PD before calling probe() and disables it after the probe().
>>>> This means that when the display subsystem (DSS) driver probes, but e.g.
>>>> fails due to deferred probing, the DSS PD gets turned off and the driver
>>>> cannot do anything to affect that.
>>>>
>>>> Solving the 2) requires more changes to actually keep the PD on during
>>>> the boot, but a prerequisite for it is to have the correct power state
>>>> for the PD.
>>>>
>>>> The downside with this patch is that it takes time to call the 'is_on'
>>>> op, and we need to call it for each PD. In my tests with AM62 SK, using
>>>> defconfig, I see an increase from ~3.5ms to ~7ms. However, the added
>>>> feature is valuable, so in my opinion it's worth it.
>>>>
>>>> The performance could probably be improved with a new firmware API which
>>>> returns the power states of all the PDs.
>>>
>>> Agreed.  I think we have to pay this performance price for correctness,
>>> and we can optimizie it later with improvements to the SCI firmware and
>>> a new API.
>>>
>>>> There's also a related HW issue at play here: if the DSS IP is enabled
>>>> and active, and its PD is turned off without first disabling the DSS
>>>> display outputs, the DSS IP will hang and causes the kernel to halt if
>>>> and when the DSS driver accesses the DSS registers the next time.
>>>
>>> Ouch.
>>>
>>>> With the current upstream kernel, with this patch applied, this means
>>>> that if the bootloader enables the display, and the DSS driver is
>>>> compiled as a module, the kernel will at some point disable unused PDs,
>>>> including the DSS PD. When the DSS module is later loaded, it will hang
>>>> the kernel.
>>>>
>>>> The same issue is already there, even without this patch, as the DSS
>>>> driver may hit deferred probing, which causes the PD to be turned off,
>>>> and leading to kernel halt when the DSS driver is probed again. This
>>>> issue has been made quite rare with some arrangements in the DSS
>>>> driver's probe, but it's still there.
>>>>
>>>> So, because of the DSS hang issues, I think this patch is still an RFC.
>>>
>>> Like you said, I think that DSS hang is an issue independently of this
>>> patch, so it shouldn't hold this up IMO.
>>
>> In current upstream, if the bootloader has enabled the display, we most
>> likely won't hit the DSS hang issue as the PD will stay on until the DSS
>> driver has had a chance to probe, and the driver takes actions to avoid
>> the hang issue.
>>
>> With this patch applied, the PD may be turned off before the DSS driver
>> has had a chance to probe, causing the board to hang when the DSS driver
>> probes the first time.
>>
>> That's why I'm a bit hesitant to apply this. It could mean that for some
>> people their board stops booting.
>>
>> I'm not even sure what would be the perfect fix for this hang problem...
>>
>> We could have some built-in early boot code which checks if the DSS is
>> enabled, and disables it, so that the hang issue won't happen. But
>> that's not good if we try to keep the boot splash on the screen until
>> the userspace takes over.
>>
>> Alternatively we could, somehow, mark the DSS powerdomain to be handled
>> in a special way: if the PD is enabled at boot time, it will be kept
>> enabled until the DSS driver (somehow) changes the PD back to normal
>> operation (and if DSS driver is never loaded, PD will stay on).
> 
> This option is kind of what I am working on. Although, the goal is to
> keep the code generic, so ideally we should not need any changes in
> the DSS driver to make this work. Let's see.

If you need someone to do some testing, you know who to ask =).

> That said, it sounds like we should defer $subject patch until we have
> a solution for the above, right?

Yes, I would say so.

I'll collect the tags, and will resend this patch (as a non-RFC) when I 
think it's safe.

Thanks for the feedback, Kevin and Ulf.

  Tomi
diff mbox series

Patch

diff --git a/drivers/pmdomain/ti/ti_sci_pm_domains.c b/drivers/pmdomain/ti/ti_sci_pm_domains.c
index 1510d5ddae3d..14c51a395d7e 100644
--- a/drivers/pmdomain/ti/ti_sci_pm_domains.c
+++ b/drivers/pmdomain/ti/ti_sci_pm_domains.c
@@ -126,6 +126,23 @@  static bool ti_sci_pm_idx_exists(struct ti_sci_genpd_provider *pd_provider, u32
 	return false;
 }
 
+static bool ti_sci_pm_pd_is_on(struct ti_sci_genpd_provider *pd_provider,
+			       int pd_idx)
+{
+	bool is_on;
+	int ret;
+
+	if (!pd_provider->ti_sci->ops.dev_ops.is_on)
+		return false;
+
+	ret = pd_provider->ti_sci->ops.dev_ops.is_on(pd_provider->ti_sci,
+						     pd_idx, NULL, &is_on);
+	if (ret)
+		return false;
+
+	return is_on;
+}
+
 static int ti_sci_pm_domain_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -161,6 +178,8 @@  static int ti_sci_pm_domain_probe(struct platform_device *pdev)
 				break;
 
 			if (args.args_count >= 1 && args.np == dev->of_node) {
+				bool is_on;
+
 				if (args.args[0] > max_id) {
 					max_id = args.args[0];
 				} else {
@@ -189,7 +208,10 @@  static int ti_sci_pm_domain_probe(struct platform_device *pdev)
 				pd->idx = args.args[0];
 				pd->parent = pd_provider;
 
-				pm_genpd_init(&pd->pd, NULL, true);
+				is_on = ti_sci_pm_pd_is_on(pd_provider,
+							   pd->idx);
+
+				pm_genpd_init(&pd->pd, NULL, !is_on);
 
 				list_add(&pd->node, &pd_provider->pd_list);
 			}