mbox series

[00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

Message ID 20210728145232.285861-1-ltykernel@gmail.com
Headers show
Series x86/Hyper-V: Add Hyper-V Isolation VM support | expand

Message

Tianyu Lan July 28, 2021, 2:52 p.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>


Hyper-V provides two kinds of Isolation VMs. VBS(Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
is to add support for these Isolation VM support in Linux.

The memory of these vms are encrypted and host can't access guest
memory directly. Hyper-V provides new host visibility hvcall and
the guest needs to call new hvcall to mark memory visible to host
before sharing memory with host. For security, all network/storage
stack memory should not be shared with host and so there is bounce
buffer requests.

Vmbus channel ring buffer already plays bounce buffer role because
all data from/to host needs to copy from/to between the ring buffer
and IO stack memory. So mark vmbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contains IO stack memory address and host will access these memory.
So add allocation bounce buffer support in vmbus for these packets.

For SNP isolation VM, guest needs to access the shared memory via
extra address space which is specified by Hyper-V CPUID HYPERV_CPUID_
ISOLATION_CONFIG. The access physical address of the shared memory
should be bounce buffer memory GPA plus with shared_gpa_boundary
reported by CPUID.

Change sicne RFC V4:
       - Introduce dma map decrypted function to remap bounce buffer
       	 and provide dma map decrypted ops for platform to hook callback.	 
       - Split swiotlb and dma map decrypted change into two patches
       - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
       - Add interface set_memory_decrypted_map() to decrypt memory and
         map bounce buffer in extra address space 
       - Remove swiotlb remap function and store the remap address
         returned by set_memory_decrypted_map() in swiotlb mem data structure.
       - Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec().

Change since RFC v2:
       - Remove not UIO driver in Isolation VM patch
       - Use vmap_pfn() to replace ioremap_page_range function in
       order to avoid exposing symbol ioremap_page_range() and
       ioremap_page_range()
       - Call hv set mem host visibility hvcall in set_memory_encrypted/decrypted()
       - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
       - Fix code style


Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  87 +++++++--
 arch/x86/hyperv/ivm.c              | 296 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  18 ++
 arch/x86/include/asm/mshyperv.h    |  86 ++++++++-
 arch/x86/include/asm/sev.h         |   4 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/sev-shared.c       |  21 +-
 arch/x86/mm/pat/set_memory.c       |   6 +-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  48 ++++-
 drivers/hv/connection.c            |  71 ++++++-
 drivers/hv/hv.c                    | 129 +++++++++----
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  84 ++++++--
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       |  65 +++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 144 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  68 ++++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  53 +++++-
 include/linux/dma-map-ops.h        |   9 +
 include/linux/hyperv.h             |  16 ++
 include/linux/swiotlb.h            |   4 +
 kernel/dma/mapping.c               |  22 +++
 kernel/dma/swiotlb.c               |  11 +-
 29 files changed, 1166 insertions(+), 102 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

Comments

Joerg Roedel Aug. 2, 2021, 11:53 a.m. UTC | #1
On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
> +static int hyperv_init_ghcb(void)

> +{

> +	u64 ghcb_gpa;

> +	void *ghcb_va;

> +	void **ghcb_base;

> +

> +	if (!ms_hyperv.ghcb_base)

> +		return -EINVAL;

> +

> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);

> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);


This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux.
And it is not mapped unencrypted because the GHCB page is allocated
above the VTOM boundary?

> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)

>  {

>  	struct hv_reenlightenment_control re_ctrl;

>  	unsigned int new_cpu;

> +	unsigned long flags;

> +	void **input_arg;

> +	void *pg;

> +	void **ghcb_va = NULL;

> +

> +	local_irq_save(flags);

> +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);

> +	pg = *input_arg;


Pg is never used later on, why is it set?
Joerg Roedel Aug. 2, 2021, 12:07 p.m. UTC | #2
On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
> +	if (type == HV_GPADL_BUFFER)

> +		index = 0;

> +	else

> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;


Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.
Joerg Roedel Aug. 2, 2021, 12:28 p.m. UTC | #3
On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway.
Same in hv_ghcb_msr_read().

> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +				   struct es_em_ctxt *ctxt,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
>  {
>  	enum es_result ret;
>  
> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>  
> -	sev_es_wr_ghcb_msr(__pa(ghcb));
> +	/*
> +	 * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
> +	 * paravisor and not needs to be updated in the Linux guest.
> +	 * Otherwise, the ghcb page's PA reported by paravisor is above
> +	 * VTOM. Hyper-V use this function with NULL for ctxt point and
> +	 * skip setting ghcb page in such case.
> +	 */
> +	if (ctxt)
> +		sev_es_wr_ghcb_msr(__pa(ghcb));

No, do not make this function work with ctxt==NULL. Instead, factor out
a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and Hyper-V code.

> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);

I am curious what this will end up being good for.
Tianyu Lan Aug. 2, 2021, 12:56 p.m. UTC | #4
On 8/2/2021 8:07 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
>> +	if (type == HV_GPADL_BUFFER)
>> +		index = 0;
>> +	else
>> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
> 
> Hmm... This doesn't look very robust. Can you set fixed indexes for
> different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
> as it is implemented here you risk that index 2 gets overwritten by
> subsequent calls.

Both second and third are HV_GPADL_RING type. One is send ring and the
other is receive ring. The driver keeps the order to allocate rx and
tx buffer. You are right this is not robust and will add a mutex to keep
the order.
Joerg Roedel Aug. 2, 2021, 12:58 p.m. UTC | #5
On Wed, Jul 28, 2021 at 10:52:22AM -0400, Tianyu Lan wrote:
> +	if (hv_is_isolation_supported()) {
> +		vmbus_connection.monitor_pages_va[0]
> +			= vmbus_connection.monitor_pages[0];
> +		vmbus_connection.monitor_pages[0]
> +			= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[0])
> +			return -ENOMEM;
> +
> +		vmbus_connection.monitor_pages_va[1]
> +			= vmbus_connection.monitor_pages[1];
> +		vmbus_connection.monitor_pages[1]
> +			= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[1]) {
> +			memunmap(vmbus_connection.monitor_pages[0]);
> +			return -ENOMEM;
> +		}
> +
> +		memset(vmbus_connection.monitor_pages[0], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +		memset(vmbus_connection.monitor_pages[1], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +	}

Okay, let me see if I got this right. In Hyper-V Isolation VMs, when the
guest wants to make memory shared, it does":

	- Call to the Hypervisor the mark the pages shared. The
	  Hypervisor will do the RMP update and remap the pages at
	  (VTOM + pa)

	- The guest maps the memory again into its page-table, so that
	  the entries point to the correct GPA (which is above VTOM
	  now).

Or in other words, Hyper-V implements a hardware-independent and
configurable c-bit position, as the VTOM value is always power-of-two
aligned. Is that correct?
This would at least explain why there is no separate
allocation/dealloction of memory for the shared range.

Thanks,

	Joerg
Tianyu Lan Aug. 2, 2021, 1:18 p.m. UTC | #6
On 8/2/2021 8:28 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:

>> +void hv_ghcb_msr_write(u64 msr, u64 value)

>> +{

>> +	union hv_ghcb *hv_ghcb;

>> +	void **ghcb_base;

>> +	unsigned long flags;

>> +

>> +	if (!ms_hyperv.ghcb_base)

>> +		return;

>> +

>> +	WARN_ON(in_nmi());

>> +

>> +	local_irq_save(flags);

>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);

>> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;

>> +	if (!hv_ghcb) {

>> +		local_irq_restore(flags);

>> +		return;

>> +	}

>> +

>> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

> 

> Do you really need to zero out the whole 4k? The validation bitmap

> should be enough, there are no secrets on the page anyway.

> Same in hv_ghcb_msr_read().


OK. Thanks for suggestion. I will have a try.

> 

>> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,

>> +				   struct es_em_ctxt *ctxt,

>> +				   u64 exit_code, u64 exit_info_1,

>> +				   u64 exit_info_2)

>>   {

>>   	enum es_result ret;

>>   

>> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,

>>   	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);

>>   	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);

>>   

>> -	sev_es_wr_ghcb_msr(__pa(ghcb));

>> +	/*

>> +	 * Hyper-V runs paravisor with SEV. Ghcb page is allocated by

>> +	 * paravisor and not needs to be updated in the Linux guest.

>> +	 * Otherwise, the ghcb page's PA reported by paravisor is above

>> +	 * VTOM. Hyper-V use this function with NULL for ctxt point and

>> +	 * skip setting ghcb page in such case.

>> +	 */

>> +	if (ctxt)

>> +		sev_es_wr_ghcb_msr(__pa(ghcb));

> 

> No, do not make this function work with ctxt==NULL. Instead, factor out

> a helper function which contains what Hyper-V needs and use that in

> sev_es_ghcb_hv_call() and Hyper-V code.

> 


OK. Will update.

>> +union hv_ghcb {

>> +	struct ghcb ghcb;

>> +} __packed __aligned(PAGE_SIZE);

> 

> I am curious what this will end up being good for.

> 


Hyper-V introduces a specific hypercall request in GHCB page and use 
same union in the Linux Hyper-V code to read/write MSR and call the new 
hypercall request.
Tianyu Lan Aug. 2, 2021, 2:08 p.m. UTC | #7
On 8/2/2021 9:20 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:
>> In Isolation VM, all shared memory with host needs to mark visible
>> to host via hvcall. vmbus_establish_gpadl() has already done it for
>> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
>> mpb_desc() still need to handle. Use DMA API to map/umap these
>> memory during sending/receiving packet and Hyper-V DMA ops callback
>> will use swiotlb function to allocate bounce buffer and copy data
>> from/to bounce buffer.
> 
> I am wondering why you dont't use DMA-API unconditionally? It provides
> enough abstraction to do the right thing for isolated and legacy VMs.
> 

In VMbus, there is already a similar bounce buffer design and so there 
is no need to call DMA-API for such buffer. Calling DMA-API is to use
swiotlb bounce buffer for those buffer which hasn't been covered. This
is why need to conditionally call DMA API.