diff mbox series

[03/13] x86/HV: Add new hvcall guest address host visibility support

Message ID 20210728145232.285861-4-ltykernel@gmail.com
State New
Headers show
Series x86/Hyper-V: Add Hyper-V Isolation VM support | expand

Commit Message

Tianyu Lan July 28, 2021, 2:52 p.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add new hvcall guest address host visibility support to mark
memory visible to host. Call it inside set_memory_decrypted
/encrypted().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/ivm.c              | 112 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  18 +++++
 arch/x86/include/asm/mshyperv.h    |   3 +-
 arch/x86/mm/pat/set_memory.c       |   6 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 6 files changed, 139 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

Comments

Dave Hansen July 28, 2021, 3:29 p.m. UTC | #1
On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	int ret;
>  
>  	/* Nothing to do if memory encryption is not active */
> -	if (!mem_encrypt_active())
> +	if (hv_is_isolation_supported())
> +		return hv_set_mem_enc(addr, numpages, enc);
> +	else if (!mem_encrypt_active())
>  		return 0;

__set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
Hyper-V are messing around in here.

It doesn't help that these additions are totally uncommented.  Even
worse is that hv_set_mem_enc() was intentionally named "enc" when it
presumably has nothing to do with encryption.

This needs to be refactored.  The current __set_memory_enc_dec() can
become __set_memory_enc_pgtable().  It gets used for the hypervisors
that get informed about "encryption" status via page tables: SEV and TDX.

Then, rename hv_set_mem_enc() to hv_set_visible_hcall().  You'll end up
with:

int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
	if (hv_is_isolation_supported())
		return hv_set_visible_hcall(...);

	if (mem_encrypt_active() || ...)
		return __set_memory_enc_pgtable();

	/* Nothing to do */
	return 0;
}

That tells the story pretty effectively, in code.

> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
> +{
> +	return hv_set_mem_host_visibility((void *)addr,
> +			numpages * HV_HYP_PAGE_SIZE,
> +			enc ? VMBUS_PAGE_NOT_VISIBLE
> +			: VMBUS_PAGE_VISIBLE_READ_WRITE);
> +}

I know this is off in Hyper-V code, but this just makes my eyes bleed.
I'd much rather see something which is less compact but readable.

> +/* Hyper-V GPA map flags */
> +#define	VMBUS_PAGE_NOT_VISIBLE		0
> +#define	VMBUS_PAGE_VISIBLE_READ_ONLY	1
> +#define	VMBUS_PAGE_VISIBLE_READ_WRITE	3

That looks suspiciously like an enum.
Tianyu Lan July 29, 2021, 12:49 p.m. UTC | #2
Hi Dave:
      Thanks for your review.

On 7/28/2021 11:29 PM, Dave Hansen wrote:
> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>>   	int ret;
>>   
>>   	/* Nothing to do if memory encryption is not active */
>> -	if (!mem_encrypt_active())
>> +	if (hv_is_isolation_supported())
>> +		return hv_set_mem_enc(addr, numpages, enc);
>> +	else if (!mem_encrypt_active())
>>   		return 0;
> 
> __set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
> Hyper-V are messing around in here.
> 
> It doesn't help that these additions are totally uncommented.  Even
> worse is that hv_set_mem_enc() was intentionally named "enc" when it
> presumably has nothing to do with encryption.
> 
> This needs to be refactored.  The current __set_memory_enc_dec() can
> become __set_memory_enc_pgtable().  It gets used for the hypervisors
> that get informed about "encryption" status via page tables: SEV and TDX.
> 
> Then, rename hv_set_mem_enc() to hv_set_visible_hcall().  You'll end up
> with:
> 
> int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> {
> 	if (hv_is_isolation_supported())
> 		return hv_set_visible_hcall(...);
> 
> 	if (mem_encrypt_active() || ...)
> 		return __set_memory_enc_pgtable();
> 
> 	/* Nothing to do */
> 	return 0;
> }
> 
> That tells the story pretty effectively, in code.

Yes, this is good idea. Thanks for your suggestion.

> 
>> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
>> +{
>> +	return hv_set_mem_host_visibility((void *)addr,
>> +			numpages * HV_HYP_PAGE_SIZE,
>> +			enc ? VMBUS_PAGE_NOT_VISIBLE
>> +			: VMBUS_PAGE_VISIBLE_READ_WRITE);
>> +}
> 
> I know this is off in Hyper-V code, but this just makes my eyes bleed.
> I'd much rather see something which is less compact but readable.

OK. Will update.

> 
>> +/* Hyper-V GPA map flags */
>> +#define	VMBUS_PAGE_NOT_VISIBLE		0
>> +#define	VMBUS_PAGE_VISIBLE_READ_ONLY	1
>> +#define	VMBUS_PAGE_VISIBLE_READ_WRITE	3
> 
> That looks suspiciously like an enum.
>

OK. Will update.
Tianyu Lan July 29, 2021, 1:01 p.m. UTC | #3
On 7/29/2021 1:06 AM, Dave Hansen wrote:
> On 7/28/21 7:52 AM, Tianyu Lan wrote:

>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)

>>   	int ret;

>>   

>>   	/* Nothing to do if memory encryption is not active */

>> -	if (!mem_encrypt_active())

>> +	if (hv_is_isolation_supported())

>> +		return hv_set_mem_enc(addr, numpages, enc);

>> +	else if (!mem_encrypt_active())

>>   		return 0;

> 

> One more thing.  If you're going to be patching generic code, please

> start using feature checks that can get optimized away at runtime.

> hv_is_isolation_supported() doesn't look like the world's cheapest

> check.  It can't be inlined and costs at least a function call.


Yes, you are right. How about adding a static branch key for the check 
of isolation VM? This may reduce the check cost.
Tianyu Lan July 29, 2021, 3:02 p.m. UTC | #4
On 7/29/2021 10:09 PM, Dave Hansen wrote:
> On 7/29/21 6:01 AM, Tianyu Lan wrote:

>> On 7/29/2021 1:06 AM, Dave Hansen wrote:

>>> On 7/28/21 7:52 AM, Tianyu Lan wrote:

>>>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long

>>>> addr, int numpages, bool enc)

>>>>        int ret;

>>>>          /* Nothing to do if memory encryption is not active */

>>>> -    if (!mem_encrypt_active())

>>>> +    if (hv_is_isolation_supported())

>>>> +        return hv_set_mem_enc(addr, numpages, enc);

>>>> +    else if (!mem_encrypt_active())

>>>>            return 0;

>>>

>>> One more thing.  If you're going to be patching generic code, please

>>> start using feature checks that can get optimized away at runtime.

>>> hv_is_isolation_supported() doesn't look like the world's cheapest

>>> check.  It can't be inlined and costs at least a function call.

>>

>> Yes, you are right. How about adding a static branch key for the check

>> of isolation VM? This may reduce the check cost.

> 

> I don't think you need a static key.

> 

> There are basically three choices:

> 1. Use an existing X86_FEATURE bit.  I think there's already one for

>     when you are running under a hypervisor.  It's not super precise,

>     but it's better than what you have.

> 2. Define a new X86_FEATURE bit for when you are running under

>     Hyper-V.

> 3. Define a new X86_FEATURE bit specifically for Hyper-V isolation VM

>     support.  This particular feature might be a little uncommon to

>     deserve its own bit.

> 

> I'd probably just do #2.

> 


There is x86_hyper_type to identify hypervisor type and we may check 
this variable after checking X86_FEATURE_HYPERVISOR.

static inline bool hv_is_isolation_supported(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
		return 0;

         if (x86_hyper_type != X86_HYPER_MS_HYPERV)
                 return 0;

	// out of line function call:
	return __hv_is_isolation_supported();
}
Dave Hansen July 29, 2021, 4:05 p.m. UTC | #5
On 7/29/21 8:02 AM, Tianyu Lan wrote:
>>
> 
> There is x86_hyper_type to identify hypervisor type and we may check
> this variable after checking X86_FEATURE_HYPERVISOR.
> 
> static inline bool hv_is_isolation_supported(void)
> {
>     if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>         return 0;
> 
>         if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>                 return 0;
> 
>     // out of line function call:
>     return __hv_is_isolation_supported();
> }   

Looks fine.  You just might want to use this existing helper:

static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
{
        return x86_hyper_type == type;
}
Tianyu Lan July 30, 2021, 2:52 a.m. UTC | #6
On 7/30/2021 12:05 AM, Dave Hansen wrote:
> On 7/29/21 8:02 AM, Tianyu Lan wrote:

>>>

>>

>> There is x86_hyper_type to identify hypervisor type and we may check

>> this variable after checking X86_FEATURE_HYPERVISOR.

>>

>> static inline bool hv_is_isolation_supported(void)

>> {

>>      if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))

>>          return 0;

>>

>>          if (x86_hyper_type != X86_HYPER_MS_HYPERV)

>>                  return 0;

>>

>>      // out of line function call:

>>      return __hv_is_isolation_supported();

>> }

> 

> Looks fine.  You just might want to use this existing helper:

> 

> static inline bool hypervisor_is_type(enum x86_hypervisor_type type)

> {

>          return x86_hyper_type == type;

> }

> 


Yes,thanks for suggestion and will update in the next version.
Joerg Roedel Aug. 2, 2021, 12:01 p.m. UTC | #7
On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
> __set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now

> Hyper-V are messing around in here.


I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot. There is TDX and an
SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
cloud/hypervisor vendors. Hiding all these behind feature checks is not
going to make things cleaner.

Regards,

	Joerg
Tianyu Lan Aug. 2, 2021, 12:59 p.m. UTC | #8
On 8/2/2021 8:01 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:

>> __set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now

>> Hyper-V are messing around in here.

> 

> I was going to suggest a PV_OPS call where the fitting implementation

> for the guest environment can be plugged in at boot. There is TDX and an

> SEV(-SNP) case, a Hyper-V case, and likely more coming up from other

> cloud/hypervisor vendors. Hiding all these behind feature checks is not

> going to make things cleaner.

> 


Yes, that makes sense. I will do this in the next version.
Jürgen Groß Aug. 2, 2021, 1:11 p.m. UTC | #9
On 02.08.21 14:01, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:

>> __set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now

>> Hyper-V are messing around in here.

> 

> I was going to suggest a PV_OPS call where the fitting implementation

> for the guest environment can be plugged in at boot. There is TDX and an

> SEV(-SNP) case, a Hyper-V case, and likely more coming up from other

> cloud/hypervisor vendors. Hiding all these behind feature checks is not

> going to make things cleaner.


As those cases are all mutually exclusive, wouldn't a static_call() be
the appropriate solution?


Juergen
diff mbox series

Patch

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
+obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index 000000000000..24a58795abd8
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,112 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan <Tianyu.Lan@microsoft.com>
+ */
+
+#include <linux/hyperv.h>
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <linux/slab.h>
+#include <asm/io.h>
+#include <asm/mshyperv.h>
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encripted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+	struct hv_gpa_range_for_visibility **input_pcpu, *input;
+	u16 pages_processed;
+	u64 hv_status;
+	unsigned long flags;
+
+	/* no-op if partition isolation is not enabled */
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+			HV_MAX_MODIFY_GPA_REP_COUNT);
+		return -EINVAL;
+	}
+
+	local_irq_save(flags);
+	input_pcpu = (struct hv_gpa_range_for_visibility **)
+			this_cpu_ptr(hyperv_pcpu_input_arg);
+	input = *input_pcpu;
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
+		return -EINVAL;
+	}
+
+	input->partition_id = HV_PARTITION_ID_SELF;
+	input->host_visibility = visibility;
+	input->reserved0 = 0;
+	input->reserved1 = 0;
+	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+	hv_status = hv_do_rep_hypercall(
+			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+			0, input, &pages_processed);
+	local_irq_restore(flags);
+
+	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+		return 0;
+
+	return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host. This function works as wrap of hv_mark_gpa_visibility()
+ * with memory base and size.
+ */
+static int hv_set_mem_host_visibility(void *kbuffer, size_t size, u32 visibility)
+{
+	int pagecount = size >> HV_HYP_PAGE_SHIFT;
+	u64 *pfn_array;
+	int ret = 0;
+	int i, pfn;
+
+	if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+		return 0;
+
+	pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+	if (!pfn_array)
+		return -ENOMEM;
+
+	for (i = 0, pfn = 0; i < pagecount; i++) {
+		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+		pfn++;
+
+		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+			ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+			pfn = 0;
+
+			if (ret)
+				goto err_free_pfn_array;
+		}
+	}
+
+ err_free_pfn_array:
+	kfree(pfn_array);
+	return ret;
+}
+
+int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
+{
+	return hv_set_mem_host_visibility((void *)addr,
+			numpages * HV_HYP_PAGE_SIZE,
+			enc ? VMBUS_PAGE_NOT_VISIBLE
+			: VMBUS_PAGE_VISIBLE_READ_WRITE);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index f1366ce609e3..f027b5bf6076 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -276,6 +276,11 @@  enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
 #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
 
+/* Hyper-V GPA map flags */
+#define	VMBUS_PAGE_NOT_VISIBLE		0
+#define	VMBUS_PAGE_VISIBLE_READ_ONLY	1
+#define	VMBUS_PAGE_VISIBLE_READ_WRITE	3
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
@@ -578,4 +583,17 @@  enum hv_interrupt_type {
 
 #include <asm-generic/hyperv-tlfs.h>
 
+/* All input parameters should be in single page. */
+#define HV_MAX_MODIFY_GPA_REP_COUNT		\
+	((PAGE_SIZE / sizeof(u64)) - 2)
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_gpa_range_for_visibility {
+	u64 partition_id;
+	u32 host_visibility:2;
+	u32 reserved0:30;
+	u32 reserved1;
+	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
 #endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6627cfd2bfba..68dd207c2603 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -190,7 +190,8 @@  struct irq_domain *hv_create_pci_msi_domain(void);
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 		struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
+int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..ba2a22886976 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,8 @@ 
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>
 
 #include "../mm_internal.h"
 
@@ -1986,7 +1988,9 @@  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	int ret;
 
 	/* Nothing to do if memory encryption is not active */
-	if (!mem_encrypt_active())
+	if (hv_is_isolation_supported())
+		return hv_set_mem_enc(addr, numpages, enc);
+	else if (!mem_encrypt_active())
 		return 0;
 
 	/* Should not be working on unaligned addresses */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 56348a541c50..8ed6733d5146 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -158,6 +158,7 @@  struct ms_hyperv_tsc_page {
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
+#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001