diff mbox series

[v6,6/6] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate

Message ID 20241019172459.2241939-7-dwmw2@infradead.org
State New
Headers show
Series Add PSCI v1.3 SYSTEM_OFF2 support for hibernation | expand

Commit Message

David Woodhouse Oct. 19, 2024, 5:15 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function
which is analogous to ACPI S4 state. This will allow hosting
environments to determine that a guest is hibernated rather than just
powered off, and handle that state appropriately on subsequent launches.

Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and
poweroff") the EFI shutdown method is deliberately preferred over PSCI
or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which
*only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as
a last resort via the legacy pm_power_off function pointer.

The hibernation code already exports a system_entering_hibernation()
function which is be used by the higher-priority handler to check for
hibernation. That existing function just returns the value of a static
boolean variable from hibernate.c, which was previously only set in the
hibernation_platform_enter() code path. Set the same flag in the simpler
code path around the call to kernel_power_off() too.

An alternative way to hook SYSTEM_OFF2 into the hibernation code would
be to register a platform_hibernation_ops structure with an ->enter()
method which makes the new SYSTEM_OFF2 call. But that would have the
unwanted side-effect of making hibernation take a completely different
code path in hibernation_platform_enter(), invoking a lot of special dpm
callbacks.

Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with
fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to
indicate whether the power off is for hibernation.

But this version works and is relatively simple.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
---
 drivers/firmware/psci/psci.c | 42 ++++++++++++++++++++++++++++++++++++
 kernel/power/hibernate.c     |  5 ++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

Comments

Oliver Upton Oct. 24, 2024, 7:57 p.m. UTC | #1
On Thu, Oct 24, 2024 at 05:56:09PM +0200, David Woodhouse wrote:
> On 24 October 2024 17:44:49 CEST, Oliver Upton <oliver.upton@linux.dev> wrote:
> >IIUC, you're really wanting to 0x0 because there are hypervisors out
> >there that violate the final spec and *only* accept this value.
> >
> >That's perfectly fine, but it'd help avoid confusion if the supporting
> >comment was a bit more direct:
> >
> >	/*
> >	 * If no hibernate type is specified SYSTEM_OFF2 defaults to
> >	 * selecting HIBERNATE_OFF.
> >	 *
> >	 * There are hypervisors in the wild that violate the spec and
> >	 * reject calls that explicitly provide a hibernate type. For
> >	 * compatibility with these nonstandard implementations, pass 0
> >	 * as the type.
> >	 */
> >	 if (system_entering_hibernation())
> >		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2), 0 , 0, 0);
> 
> By the time this makes it into released versions of the guest Linux kernel, that comment won't be true any more.

Then does it even matter? What is the problem you're trying to solve
with using a particular value for the hibernate type?

Either the goal of this is to make the PSCI client code compatible with
your hypervisor today (and any other implementation based on 'F ALP1') or
we don't care and go with whatever value we want.

Even if the comment eventually becomes stale, there is a ton of value in
documenting the exact implementation decision being made.
Oliver Upton Oct. 25, 2024, 8:40 p.m. UTC | #2
On Fri, Oct 25, 2024 at 08:13:03AM +0200, David Woodhouse wrote:
> On 24 October 2024 21:57:38 CEST, Oliver Upton <oliver.upton@linux.dev> wrote:
> >On Thu, Oct 24, 2024 at 05:56:09PM +0200, David Woodhouse wrote:
> >> On 24 October 2024 17:44:49 CEST, Oliver Upton <oliver.upton@linux.dev> wrote:
> >> >IIUC, you're really wanting to 0x0 because there are hypervisors out
> >> >there that violate the final spec and *only* accept this value.
> >> >
> >> >That's perfectly fine, but it'd help avoid confusion if the supporting
> >> >comment was a bit more direct:
> >> >
> >> >	/*
> >> >	 * If no hibernate type is specified SYSTEM_OFF2 defaults to
> >> >	 * selecting HIBERNATE_OFF.
> >> >	 *
> >> >	 * There are hypervisors in the wild that violate the spec and
> >> >	 * reject calls that explicitly provide a hibernate type. For
> >> >	 * compatibility with these nonstandard implementations, pass 0
> >> >	 * as the type.
> >> >	 */
> >> >	 if (system_entering_hibernation())
> >> >		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2), 0 , 0, 0);
> >> 
> >> By the time this makes it into released versions of the guest Linux kernel, that comment won't be true any more.
> >
> >Then does it even matter? What is the problem you're trying to solve
> >with using a particular value for the hibernate type?
> >
> >Either the goal of this is to make the PSCI client code compatible with
> >your hypervisor today (and any other implementation based on 'F ALP1') or
> >we don't care and go with whatever value we want.
> >
> >Even if the comment eventually becomes stale, there is a ton of value in
> >documenting the exact implementation decision being made.
> >
> 
> Eventually it won't matter and we can go with whatever value we want. But yes, the goal is to be compatible with the hypervisor *today* until it catches up the changes to the final versions of the spec. I didn't spend much time overthinking the comment. What was it....
> 
> 	/*
> 	 * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
> 	 * and is supported by hypervisors implementing an earlier version
> 	 * of the pSCI v1.3 spec.
> 	 */
> 
> That seems to cover it just fine, I think.

No. You're leaving the work for the reader to:

 (1) Figure out what you're talking about
 (2) Go dig up an "earlier version" of the spec
 (3) Realise that it means certain hypervisors only take 0x0 as an
     argument

If you speak *directly* about the problem you're trying to address then
reviewers are less likely to get hung up on what you're trying to do.

I'm genuinely at a loss for why you would want to present this as an
"acceptable alterantive" rather than a functional requirement for this
driver to run on your hypervisor.
Lorenzo Pieralisi Oct. 31, 2024, 5:55 p.m. UTC | #3
On Sat, Oct 19, 2024 at 06:15:47PM +0100, David Woodhouse wrote:

[...]

> +#ifdef CONFIG_HIBERNATION
> +static int psci_sys_hibernate(struct sys_off_data *data)
> +{
> +	/*
> +	 * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
> +	 * and is supported by hypervisors implementing an earlier version
> +	 * of the pSCI v1.3 spec.
> +	 */

It is obvious but with this patch applied a host kernel would start executing
SYSTEM_OFF2 too if supported in firmware to hibernate, it is not a hypervisor
only code path.

Related to that: is it now always safe to override

commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff")

for hibernation ? It is not very clear to me why overriding PSCI for
poweroff was the right thing to do - tried to follow that patch history but
the question remains (it is related to UpdateCapsule() but I don't know
how that applies to the hibernation use case).

As far as type == 0 is concerned:

I don't think the issue here is really PSCI 1.3 ALP1 vs PSCI 1.3 Issue
F.b, by reading the PSCI 1.3 Issue F.b specs (note (e) page 96) it looks
like there was a (superseded) PSCI 1.3 Issue F (september 2024 -
superseded in October 2024 - just reading the specs timeline) that allowed an
implementation to return INVALID_PARAMETERS if type == 0 - there should
be no firwmare out there that followed it - it was short lived.

Lorenzo

> +	if (system_entering_hibernation())
> +		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2),
> +			       0 /*PSCI_1_3_OFF_TYPE_HIBERNATE_OFF*/, 0, 0);
> +	return NOTIFY_DONE;
> +}
> +
> +static int __init psci_hibernate_init(void)
> +{
> +	if (psci_system_off2_hibernate_supported) {
> +		/* Higher priority than EFI shutdown, but only for hibernate */
> +		register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
> +					 SYS_OFF_PRIO_FIRMWARE + 2,
> +					 psci_sys_hibernate, NULL);
> +	}
> +	return 0;
> +}
> +subsys_initcall(psci_hibernate_init);
> +#endif
> +
>  static int psci_features(u32 psci_func_id)
>  {
>  	return invoke_psci_fn(PSCI_1_0_FN_PSCI_FEATURES,
> @@ -364,6 +392,7 @@ static const struct {
>  	PSCI_ID_NATIVE(1_1, SYSTEM_RESET2),
>  	PSCI_ID(1_1, MEM_PROTECT),
>  	PSCI_ID_NATIVE(1_1, MEM_PROTECT_CHECK_RANGE),
> +	PSCI_ID_NATIVE(1_3, SYSTEM_OFF2),
>  };
>  
>  static int psci_debugfs_read(struct seq_file *s, void *data)
> @@ -525,6 +554,18 @@ static void __init psci_init_system_reset2(void)
>  		psci_system_reset2_supported = true;
>  }
>  
> +static void __init psci_init_system_off2(void)
> +{
> +	int ret;
> +
> +	ret = psci_features(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2));
> +	if (ret < 0)
> +		return;
> +
> +	if (ret & PSCI_1_3_OFF_TYPE_HIBERNATE_OFF)
> +		psci_system_off2_hibernate_supported = true;
> +}
> +
>  static void __init psci_init_system_suspend(void)
>  {
>  	int ret;
> @@ -655,6 +696,7 @@ static int __init psci_probe(void)
>  		psci_init_cpu_suspend();
>  		psci_init_system_suspend();
>  		psci_init_system_reset2();
> +		psci_init_system_off2();
>  		kvm_init_hyp_services();
>  	}
>  
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index e35829d36039..1f87aa01ba44 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -685,8 +685,11 @@ static void power_down(void)
>  		}
>  		fallthrough;
>  	case HIBERNATION_SHUTDOWN:
> -		if (kernel_can_power_off())
> +		if (kernel_can_power_off()) {
> +			entering_platform_hibernation = true;
>  			kernel_power_off();
> +			entering_platform_hibernation = false;
> +		}
>  		break;
>  	}
>  	kernel_halt();
> -- 
> 2.44.0
>
David Woodhouse Oct. 31, 2024, 6 p.m. UTC | #4
On Fri, 2024-10-25 at 13:40 -0700, Oliver Upton wrote:
> 
> No. You're leaving the work for the reader to:
> 
>  (1) Figure out what you're talking about
>  (2) Go dig up an "earlier version" of the spec
>  (3) Realise that it means certain hypervisors only take 0x0 as an
>      argument

No. There's no need to dig up an 'earlier version' of the spec. The
current F.b release says, "if the value of this parameter is 0x0, the
implementation defaults to selecting HIBERNATE_OFF". That's what makes
it an acceptable alternative.

> If you speak *directly* about the problem you're trying to address then
> reviewers are less likely to get hung up on what you're trying to do.

The "problem" this comment addresses is a reader who looks at this code
and wonders why it uses zero instead of HIBERNATE_OFF.

Firstly, that reader needs to spot the text, in the *current* version
of the spec as cited above, which makes it clear that it's a perfectly
acceptable alternative.

Secondly, that reader needs to know why we chose zero over
HIBERNATE_OFF, which is also explained fairly succinctly: because it's
compatible with hypervisors implementing an earlier version of the
spec.

> I'm genuinely at a loss for why you would want to present this as an
> "acceptable alterantive" rather than a functional requirement for this
> driver to run on your hypervisor.

Because "my" hypervisor is a live product which actually gets security
fixes and updates. If it wasn't for the silly season of "thou shalt not
ship *anything* except security fixes and stuff that's going to be
announced at a big conference at the end of the month", the change to
make it accept the new value of HIBERNATE_OFF would already have been
deployed.

And *even* with the silly season delaying non-critical hypervisor
changes, your suggested wording will *still* probably not be true by
the time the comment is included in an actual release of the Linux
kernel.

But honestly, it isn't a hill I'm prepared to die on. If you insist on
changing the comment to your 'There are hypervisors in the wile that
violate the spec...' version, by all means go ahead and do so. I'll
follow up with a patch to correct it in a few weeks when it becomes
obsolete.
Lorenzo Pieralisi Nov. 1, 2024, 5:48 p.m. UTC | #5
[+Ard, Sami, for EFI]

On Thu, Oct 31, 2024 at 06:55:43PM +0100, Lorenzo Pieralisi wrote:
> On Sat, Oct 19, 2024 at 06:15:47PM +0100, David Woodhouse wrote:
> 
> [...]
> 
> > +#ifdef CONFIG_HIBERNATION
> > +static int psci_sys_hibernate(struct sys_off_data *data)
> > +{
> > +	/*
> > +	 * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
> > +	 * and is supported by hypervisors implementing an earlier version
> > +	 * of the pSCI v1.3 spec.
> > +	 */
> 
> It is obvious but with this patch applied a host kernel would start executing
> SYSTEM_OFF2 too if supported in firmware to hibernate, it is not a hypervisor
> only code path.
> 
> Related to that: is it now always safe to override
> 
> commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff")
> 
> for hibernation ? It is not very clear to me why overriding PSCI for
> poweroff was the right thing to do - tried to follow that patch history but
> the question remains (it is related to UpdateCapsule() but I don't know
> how that applies to the hibernation use case).

RFC: It is unclear to me what happens in current mainline if we try to
hibernate with EFI runtime services enabled and a capsule update pending (we
issue EFI ResetSystem(EFI_RESET_SHUTDOWN,..) which might not be compatible
with the reset required by the pending capsule update request) what happens
in this case I don't know but at least the choice is all contained in
EFI firmware.

Then if in the same scenario now we are switching to PSCI SYSTEM_OFF2 for the
hibernate reset I suspect that what happens to the in-flight capsule
update requests strictly depends on what "reset" PSCI SYSTEM_OFF2 will
end up doing ?

I think this is just a corner case and it is unlikely it has been ever
tested (is it even possible ? Looking at EFI folks) - it would be good
to clarify it at least to make sure we understand this code path.

Thanks,
Lorenzo
Ard Biesheuvel Nov. 4, 2024, 1:54 p.m. UTC | #6
On Fri, 1 Nov 2024 at 18:49, Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> [+Ard, Sami, for EFI]
>
> On Thu, Oct 31, 2024 at 06:55:43PM +0100, Lorenzo Pieralisi wrote:
> > On Sat, Oct 19, 2024 at 06:15:47PM +0100, David Woodhouse wrote:
> >
> > [...]
> >
> > > +#ifdef CONFIG_HIBERNATION
> > > +static int psci_sys_hibernate(struct sys_off_data *data)
> > > +{
> > > +   /*
> > > +    * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
> > > +    * and is supported by hypervisors implementing an earlier version
> > > +    * of the pSCI v1.3 spec.
> > > +    */
> >
> > It is obvious but with this patch applied a host kernel would start executing
> > SYSTEM_OFF2 too if supported in firmware to hibernate, it is not a hypervisor
> > only code path.
> >
> > Related to that: is it now always safe to override
> >
> > commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff")
> >
> > for hibernation ? It is not very clear to me why overriding PSCI for
> > poweroff was the right thing to do - tried to follow that patch history but
> > the question remains (it is related to UpdateCapsule() but I don't know
> > how that applies to the hibernation use case).
>
> RFC: It is unclear to me what happens in current mainline if we try to
> hibernate with EFI runtime services enabled and a capsule update pending (we
> issue EFI ResetSystem(EFI_RESET_SHUTDOWN,..) which might not be compatible
> with the reset required by the pending capsule update request) what happens
> in this case I don't know but at least the choice is all contained in
> EFI firmware.
>
> Then if in the same scenario now we are switching to PSCI SYSTEM_OFF2 for the
> hibernate reset I suspect that what happens to the in-flight capsule
> update requests strictly depends on what "reset" PSCI SYSTEM_OFF2 will
> end up doing ?
>
> I think this is just a corner case and it is unlikely it has been ever
> tested (is it even possible ? Looking at EFI folks) - it would be good
> to clarify it at least to make sure we understand this code path.
>

I'm not aware of any OS that actually uses capsule update at runtime
(both Windows and Linux queue up the capsule and call the
UpdateCapsule() runtime service at boot time after a reboot).

So it is unlikely that this would break anything, and I'd actually be
inclined to disable capsule update at runtime altogether.

I will also note that hibernation with EFI is flaky in general, given
that EFI memory regions may move around
Lorenzo Pieralisi Nov. 4, 2024, 3:11 p.m. UTC | #7
On Mon, Nov 04, 2024 at 02:54:12PM +0100, Ard Biesheuvel wrote:
> On Fri, 1 Nov 2024 at 18:49, Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> >
> > [+Ard, Sami, for EFI]
> >
> > On Thu, Oct 31, 2024 at 06:55:43PM +0100, Lorenzo Pieralisi wrote:
> > > On Sat, Oct 19, 2024 at 06:15:47PM +0100, David Woodhouse wrote:
> > >
> > > [...]
> > >
> > > > +#ifdef CONFIG_HIBERNATION
> > > > +static int psci_sys_hibernate(struct sys_off_data *data)
> > > > +{
> > > > +   /*
> > > > +    * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
> > > > +    * and is supported by hypervisors implementing an earlier version
> > > > +    * of the pSCI v1.3 spec.
> > > > +    */
> > >
> > > It is obvious but with this patch applied a host kernel would start executing
> > > SYSTEM_OFF2 too if supported in firmware to hibernate, it is not a hypervisor
> > > only code path.
> > >
> > > Related to that: is it now always safe to override
> > >
> > > commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff")
> > >
> > > for hibernation ? It is not very clear to me why overriding PSCI for
> > > poweroff was the right thing to do - tried to follow that patch history but
> > > the question remains (it is related to UpdateCapsule() but I don't know
> > > how that applies to the hibernation use case).
> >
> > RFC: It is unclear to me what happens in current mainline if we try to
> > hibernate with EFI runtime services enabled and a capsule update pending (we
> > issue EFI ResetSystem(EFI_RESET_SHUTDOWN,..) which might not be compatible
> > with the reset required by the pending capsule update request) what happens
> > in this case I don't know but at least the choice is all contained in
> > EFI firmware.
> >
> > Then if in the same scenario now we are switching to PSCI SYSTEM_OFF2 for the
> > hibernate reset I suspect that what happens to the in-flight capsule
> > update requests strictly depends on what "reset" PSCI SYSTEM_OFF2 will
> > end up doing ?
> >
> > I think this is just a corner case and it is unlikely it has been ever
> > tested (is it even possible ? Looking at EFI folks) - it would be good
> > to clarify it at least to make sure we understand this code path.
> >
> 
> I'm not aware of any OS that actually uses capsule update at runtime
> (both Windows and Linux queue up the capsule and call the
> UpdateCapsule() runtime service at boot time after a reboot).
> 
> So it is unlikely that this would break anything, and I'd actually be
> inclined to disable capsule update at runtime altogether.
> 
> I will also note that hibernation with EFI is flaky in general, given
> that EFI memory regions may move around

Thank you for chiming in, I think we are OK (I don't think this patch
will create more issues than the ones that are already there for hibernate
anyway) - the reasoning behind the change is in the commit logs.

Lorenzo
diff mbox series

Patch

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 2328ca58bba6..8809455a61a6 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -78,6 +78,7 @@  struct psci_0_1_function_ids get_psci_0_1_function_ids(void)
 
 static u32 psci_cpu_suspend_feature;
 static bool psci_system_reset2_supported;
+static bool psci_system_off2_hibernate_supported;
 
 static inline bool psci_has_ext_power_state(void)
 {
@@ -333,6 +334,33 @@  static void psci_sys_poweroff(void)
 	invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
 }
 
+#ifdef CONFIG_HIBERNATION
+static int psci_sys_hibernate(struct sys_off_data *data)
+{
+	/*
+	 * Zero is an acceptable alternative to PSCI_1_3_OFF_TYPE_HIBERNATE_OFF
+	 * and is supported by hypervisors implementing an earlier version
+	 * of the pSCI v1.3 spec.
+	 */
+	if (system_entering_hibernation())
+		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2),
+			       0 /*PSCI_1_3_OFF_TYPE_HIBERNATE_OFF*/, 0, 0);
+	return NOTIFY_DONE;
+}
+
+static int __init psci_hibernate_init(void)
+{
+	if (psci_system_off2_hibernate_supported) {
+		/* Higher priority than EFI shutdown, but only for hibernate */
+		register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+					 SYS_OFF_PRIO_FIRMWARE + 2,
+					 psci_sys_hibernate, NULL);
+	}
+	return 0;
+}
+subsys_initcall(psci_hibernate_init);
+#endif
+
 static int psci_features(u32 psci_func_id)
 {
 	return invoke_psci_fn(PSCI_1_0_FN_PSCI_FEATURES,
@@ -364,6 +392,7 @@  static const struct {
 	PSCI_ID_NATIVE(1_1, SYSTEM_RESET2),
 	PSCI_ID(1_1, MEM_PROTECT),
 	PSCI_ID_NATIVE(1_1, MEM_PROTECT_CHECK_RANGE),
+	PSCI_ID_NATIVE(1_3, SYSTEM_OFF2),
 };
 
 static int psci_debugfs_read(struct seq_file *s, void *data)
@@ -525,6 +554,18 @@  static void __init psci_init_system_reset2(void)
 		psci_system_reset2_supported = true;
 }
 
+static void __init psci_init_system_off2(void)
+{
+	int ret;
+
+	ret = psci_features(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2));
+	if (ret < 0)
+		return;
+
+	if (ret & PSCI_1_3_OFF_TYPE_HIBERNATE_OFF)
+		psci_system_off2_hibernate_supported = true;
+}
+
 static void __init psci_init_system_suspend(void)
 {
 	int ret;
@@ -655,6 +696,7 @@  static int __init psci_probe(void)
 		psci_init_cpu_suspend();
 		psci_init_system_suspend();
 		psci_init_system_reset2();
+		psci_init_system_off2();
 		kvm_init_hyp_services();
 	}
 
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index e35829d36039..1f87aa01ba44 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -685,8 +685,11 @@  static void power_down(void)
 		}
 		fallthrough;
 	case HIBERNATION_SHUTDOWN:
-		if (kernel_can_power_off())
+		if (kernel_can_power_off()) {
+			entering_platform_hibernation = true;
 			kernel_power_off();
+			entering_platform_hibernation = false;
+		}
 		break;
 	}
 	kernel_halt();