
[2/2] x86/kexec/64: Forbid kexec when running as an SEV-ES guest

Message ID 20210506093122.28607-3-joro@8bytes.org
State Superseded
Series [1/2] kexec: Allow architecture code to opt-out at runtime

Commit Message

Joerg Roedel May 6, 2021, 9:31 a.m. UTC
From: Joerg Roedel <jroedel@suse.de>

For now, kexec is not supported when running as an SEV-ES guest. Doing
so requires additional hypervisor support and special code to hand
over the CPUs to the new kernel in a safe way.

Until this is implemented, do not support kexec in SEV-ES guests.

Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/machine_kexec_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Eric W. Biederman May 6, 2021, 5:42 p.m. UTC | #1
Joerg Roedel <joro@8bytes.org> writes:

> From: Joerg Roedel <jroedel@suse.de>
>
> For now, kexec is not supported when running as an SEV-ES guest. Doing
> so requires additional hypervisor support and special code to hand
> over the CPUs to the new kernel in a safe way.
>
> Until this is implemented, do not support kexec in SEV-ES guests.

I don't understand this.

Fundamentally kexec is about doing things more or less in spite of
what the firmware is doing.

I don't have any idea what SEV-ES is.  But the normal x86 boot doesn't
do anything special.  Is cross-CPU IPI emulation buggy?

If this is a more in-your-face hypervisor, as Xen sometimes is, I can
see perhaps needing a little bit of different work during bootup.
Perhaps handing back a cpu on system shutdown and asking for more cpus
on system boot up.

What is the actual problem you are trying to avoid?

And yes, for a temporary hack, the suggestion of putting code into
machine_kexec_prepare seems much more reasonable, so we don't have to
carry special-case infrastructure for the foreseeable future.

Eric


> Cc: stable@vger.kernel.org # v5.10+
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index c078b0d3ab0e..f902cc9cc634 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -620,3 +620,11 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>  	 */
>  	set_memory_encrypted((unsigned long)vaddr, pages);
>  }
> +
> +/*
> + * Kexec is not supported in SEV-ES guests yet
> + */
> +bool arch_kexec_supported(void)
> +{
> +	return !sev_es_active();
> +}

Joerg Roedel May 6, 2021, 8:41 p.m. UTC | #2
On Thu, May 06, 2021 at 01:59:42PM -0500, Eric W. Biederman wrote:
> Joerg Roedel <jroedel@suse.de> writes:

> Why does it need that?
> 
> Would it not make sense to instead teach kexec how to pass a CPU from
> one kernel to another?  We could use that everywhere.
> 
> Even the kexec-on-panic case should work, as even in that case we have
> to touch the CPUs as they go down.
> 
> The hardware simply worked well enough that it hasn't mattered enough
> for us to do something like that, but given that we need to do something
> anyway, it seems like it would make the most sense to do something that
> will work everywhere and does not introduce unnecessary dependencies
> on hypervisors.

Well, I guess we could implement something that even works for non-SEV-ES
guests and bare metal. The question is what benefit we get from
that. Is the SIPI sequence currently used not reliable enough?

The benefit of being able to rely on the SIPI sequence is that the
kexec'ed kernel can use the same method to bring up APs as the first
kernel did.

Btw, the same is true for SEV-ES guests. The goal is to bring the APs of
an SEV-ES guest into a state where they will use the SEV-ES method of
AP bring-up when they wake up again. This method involves a
firmware-owned page called the AP-jump-table, which contains the reset
vector for the AP in its first 4 bytes.

> > As I said above, for protocol version 1 it will stay disabled, so it is
> > not only a temporary hack.
> 
> Why does bringing up a cpu need hypervisor support?

When a CPU is taken offline under SEV-ES, it will do a special hypercall
named AP-reset-hold. The hypervisor will put the CPU into a halt state
until the next SIPI arrives. In protocol version 1 this hypercall
requires a GHCB shared page to be set up for the CPU doing the hypercall,
and upon CPU wakeup the HV will write to that shared page. The problem is
that by then the page containing the GHCB is already owned by the
new kernel and is probably no longer shared, which can cause data
corruption in the new kernel.

Version 2 of the protocol adds a purely MSR-based version of the
AP-reset-hold hypercall. It does not need a GHCB page and has to
be used to take the APs offline before doing the kexec. That is why
I plan to support kexec only when the hypervisor provides version
2 of the protocol.

Regards,

	Joerg

Patch

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index c078b0d3ab0e..f902cc9cc634 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -620,3 +620,11 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 	 */
 	set_memory_encrypted((unsigned long)vaddr, pages);
 }
+
+/*
+ * Kexec is not supported in SEV-ES guests yet
+ */
+bool arch_kexec_supported(void)
+{
+	return !sev_es_active();
+}