[Resend,RFC,V4,00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

Message ID 20210707154629.3977369-1-ltykernel@gmail.com

Message

Tianyu Lan July 7, 2021, 3:46 p.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call this hvcall to mark memory visible to the host
before sharing it with the host. For security, network/storage stack
memory must not be shared with the host directly, so bounce buffers
are required.
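
For illustration only (the helper name below is hypothetical and not part
of the patches): since this series wires the host-visibility hvcall into
set_memory_encrypted()/set_memory_decrypted(), sharing a guest buffer with
the host in an Isolation VM looks roughly like this:

#include <asm/set_memory.h>

/*
 * Hypothetical sketch: make a guest buffer host-visible before handing
 * its GPA to the host. In this series set_memory_decrypted() issues the
 * new host-visibility hvcall on Hyper-V Isolation VMs.
 */
static int hv_share_buffer_with_host(void *buf, int numpages)
{
	int ret;

	ret = set_memory_decrypted((unsigned long)buf, numpages);
	if (ret)
		return ret;

	/* The host may now read/write this memory, e.g. for vmbus I/O. */
	return 0;
}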

The vmbus channel ring buffer already plays the bounce buffer role,
because all data to/from the host is copied between the ring buffer and
IO stack memory. So just mark the vmbus channel ring buffer visible.

There are two exceptions: packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets contain
IO stack memory addresses and the host will access that memory directly.
So add bounce buffer allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via an
extra address space which is specified by the Hyper-V CPUID leaf HYPERV_CPUID_
ISOLATION_CONFIG. The physical address used to access the shared memory
should be the bounce buffer memory GPA plus the shared_gpa_boundary
reported by CPUID.
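
A minimal sketch of that calculation (assuming ms_hyperv.shared_gpa_boundary
holds the boundary reported by HYPERV_CPUID_ISOLATION_CONFIG; the helper
name is illustrative, not from the patches):

#include <asm/mshyperv.h>

/*
 * Illustrative only: the host-visible (shared) view of a bounce buffer
 * page sits above the shared_gpa_boundary (vTOM), so the shared GPA is
 * the page's GPA plus that boundary.
 */
static inline u64 hv_shared_gpa(u64 bounce_buffer_gpa)
{
	return bounce_buffer_gpa + ms_hyperv.shared_gpa_boundary;
}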

Changes since v3:
       - Add interface set_memory_decrypted_map() to decrypt memory and
         map the bounce buffer in the extra address space
       - Remove the swiotlb remap function and store the remap address
         returned by set_memory_decrypted_map() in the swiotlb mem data structure.
       - Introduce hv_set_mem_enc() to make the code in __set_memory_enc_dec() more readable.

Changes since v2:
       - Drop the patch that disallowed the UIO driver in Isolation VM
       - Use vmap_pfn() to replace ioremap_page_range() in order to
         avoid exporting the symbol ioremap_page_range()
       - Call the Hyper-V mark-memory-host-visible hvcall in
         set_memory_encrypted()/set_memory_decrypted()
       - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
       - Fix code style

Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  x86/Swiotlb/HV: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver
  x86/HV: Not set memory decrypted/encrypted during kexec alloc/free
    page in IVM

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  25 +--
 arch/x86/hyperv/ivm.c              | 299 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  18 ++
 arch/x86/include/asm/mshyperv.h    |  84 +++++++-
 arch/x86/include/asm/set_memory.h  |   2 +
 arch/x86/include/asm/sev-es.h      |   4 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/machine_kexec_64.c |   5 +-
 arch/x86/kernel/sev-es-shared.c    |  21 +-
 arch/x86/mm/pat/set_memory.c       |  34 +++-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  48 ++++-
 drivers/hv/connection.c            |  71 ++++++-
 drivers/hv/hv.c                    | 129 +++++++++----
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  84 ++++++--
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       |  62 ++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 144 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  68 ++++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  53 ++++-
 include/linux/hyperv.h             |  16 ++
 include/linux/swiotlb.h            |   4 +
 kernel/dma/swiotlb.c               |  11 +-
 29 files changed, 1097 insertions(+), 111 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

Comments

Tianyu Lan July 8, 2021, 1:54 p.m. UTC | #1
Hi Dave:
      Thanks for your review.

On 7/8/2021 12:14 AM, Dave Hansen wrote:
> On 7/7/21 8:46 AM, Tianyu Lan wrote:
>> @@ -598,7 +599,7 @@ void arch_kexec_unprotect_crashkres(void)
>>    */
>>   int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>>   {
>> -	if (sev_active())
>> +	if (sev_active() || hv_is_isolation_supported())
>>   		return 0;
>>   
>>   	/*
>> @@ -611,7 +612,7 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>>   
>>   void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>>   {
>> -	if (sev_active())
>> +	if (sev_active() || hv_is_isolation_supported())
>>   		return;
> 
> You might want to take a look through the "protected guest" patches.  I
> think this series is touching a few of the same locations that TDX and
> recent SEV work touch.
> 
> https://lore.kernel.org/lkml/20210618225755.662725-5-sathyanarayanan.kuppuswamy@linux.intel.com/

Thanks for the reminder. You are right. There will be a generic API to
check the "protected guest" type.
Christoph Hellwig July 20, 2021, 1:54 p.m. UTC | #2
Please split the swiotlb changes into a separate patch from the
consumer.

>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +unsigned long hv_map_memory(unsigned long addr, unsigned long size)
> +{
> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +				      sizeof(unsigned long),
> +		       GFP_KERNEL);
> +	unsigned long vaddr;
> +	int i;
> +
> +	if (!pfns)
> +		return (unsigned long)NULL;
> +
> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> +		pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) +
> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> +	vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,
> +					PAGE_KERNEL_IO);
> +	kfree(pfns);
> +
> +	return vaddr;

This seems to miss a 'select VMAP_PFN'.  But more importantly I don't
think this actually works.  Various DMA APIs do expect a struct page
backing, so how is this going to work with say dma_mmap_attrs or
dma_get_sgtable_attrs?

> +static unsigned long __map_memory(unsigned long addr, unsigned long size)
> +{
> +	if (hv_is_isolation_supported())
> +		return hv_map_memory(addr, size);
> +
> +	return addr;
> +}
> +
> +static void __unmap_memory(unsigned long addr)
> +{
> +	if (hv_is_isolation_supported())
> +		hv_unmap_memory(addr);
> +}
> +
> +unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size)
> +{
> +	if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false))
> +		return (unsigned long)NULL;
> +
> +	return __map_memory(addr, size);
> +}
> +
> +int set_memory_encrypted_unmap(unsigned long addr, unsigned long size)
> +{
> +	__unmap_memory(addr);
> +	return __set_memory_enc_dec(addr, size / PAGE_SIZE, true);
> +}

Why this obfuscation into all kinds of strange helpers?  Also I think
we want an ops vector (or alternative calls) instead of the random
if checks here.

> + * @vstart:	The virtual start address of the swiotlb memory pool. The swiotlb
> + *		memory pool may be remapped in the memory encrypted case and store

Normally we'd call this vaddr or cpu_addr.

> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> -	memset(vaddr, 0, bytes);
> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);

Please always pass kernel virtual addresses as pointers.

And I think these APIs might need better names, e.g.

arch_dma_map_decrypted and arch_dma_unmap_decrypted.

Also these will need fallback versions for non-x86 architectures that
currently use memory encryption.
Tianyu Lan July 21, 2021, 10:28 a.m. UTC | #3
Thanks for the review.

On 7/20/2021 9:54 PM, Christoph Hellwig wrote:
> 
> Please split the swiotlb changes into a separate patch from the
> consumer.

OK. Will update.

> 
>>   }
>> +
>> +/*
>> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
>> + */
>> +unsigned long hv_map_memory(unsigned long addr, unsigned long size)
>> +{
>> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
>> +				      sizeof(unsigned long),
>> +		       GFP_KERNEL);
>> +	unsigned long vaddr;
>> +	int i;
>> +
>> +	if (!pfns)
>> +		return (unsigned long)NULL;
>> +
>> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
>> +		pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) +
>> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
>> +
>> +	vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,
>> +					PAGE_KERNEL_IO);
>> +	kfree(pfns);
>> +
>> +	return vaddr;
> 
> This seems to miss a 'select VMAP_PFN'. 

VMAP_PFN has been selected in the previous patch "[RFC PATCH V4 08/13]
HV/Vmbus: Initialize VMbus ring buffer for Isolation VM".

> But more importantly I don't
> think this actually works.  Various DMA APIs do expect a struct page
> backing, so how is this going to work with say dma_mmap_attrs or
> dma_get_sgtable_attrs?

dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
address that belongs to backing memory with a struct page, and return
the bounce buffer DMA physical address, which is below
shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
protocol.

The newly mapped virtual address is only used to access the bounce
buffer in the swiotlb code and is not used anywhere else; it is stored
in mem->vstart. So the new API set_memory_decrypted_map() in this
series is only called from the swiotlb code. Other platforms may
replace set_memory_decrypted() with set_memory_decrypted_map() as
needed.

> 
>> +static unsigned long __map_memory(unsigned long addr, unsigned long size)
>> +{
>> +	if (hv_is_isolation_supported())
>> +		return hv_map_memory(addr, size);
>> +
>> +	return addr;
>> +}
>> +
>> +static void __unmap_memory(unsigned long addr)
>> +{
>> +	if (hv_is_isolation_supported())
>> +		hv_unmap_memory(addr);
>> +}
>> +
>> +unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size)
>> +{
>> +	if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false))
>> +		return (unsigned long)NULL;
>> +
>> +	return __map_memory(addr, size);
>> +}
>> +
>> +int set_memory_encrypted_unmap(unsigned long addr, unsigned long size)
>> +{
>> +	__unmap_memory(addr);
>> +	return __set_memory_enc_dec(addr, size / PAGE_SIZE, true);
>> +}
> 
> Why this obfuscation into all kinds of strange helpers?  Also I think
> we want an ops vector (or alternative calls) instead of the random
> if checks here.

Yes, agreed. Will add ops for different platforms to map/unmap memory.

> 
>> + * @vstart:	The virtual start address of the swiotlb memory pool. The swiotlb
>> + *		memory pool may be remapped in the memory encrypted case and store
> 
> Normally we'd call this vaddr or cpu_addr.

OK. Will update.

> 
>> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
>> -	memset(vaddr, 0, bytes);
>> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);
> 
> Please always pass kernel virtual addresses as pointers.
> 
> And I think these APIs might need better names, e.g.
> 
> arch_dma_map_decrypted and arch_dma_unmap_decrypted.
> 
> Also these will need fallback versions for non-x86 architectures that
> currently use memory encryption.

Sure. Will update in the next version.
Christoph Hellwig July 21, 2021, 2:33 p.m. UTC | #4
On Wed, Jul 21, 2021 at 06:28:48PM +0800, Tianyu Lan wrote:
> dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
> address that belongs to backing memory with a struct page, and return
> the bounce buffer DMA physical address, which is below
> shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
> protocol.
>
> The newly mapped virtual address is only used to access the bounce
> buffer in the swiotlb code and is not used anywhere else; it is stored
> in mem->vstart. So the new API set_memory_decrypted_map() in this
> series is only called from the swiotlb code. Other platforms may
> replace set_memory_decrypted() with set_memory_decrypted_map() as
> needed.

It seems like you are indeed not using these new helpers in
dma_direct_alloc.  How is dma_alloc_attrs/dma_alloc_coherent going to
work on these systems?
Tianyu Lan July 21, 2021, 3:11 p.m. UTC | #5
On 7/21/2021 10:33 PM, Christoph Hellwig wrote:
> On Wed, Jul 21, 2021 at 06:28:48PM +0800, Tianyu Lan wrote:
>> dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
>> address that belongs to backing memory with a struct page, and return
>> the bounce buffer DMA physical address, which is below
>> shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
>> protocol.
>>
>> The newly mapped virtual address is only used to access the bounce
>> buffer in the swiotlb code and is not used anywhere else; it is stored
>> in mem->vstart. So the new API set_memory_decrypted_map() in this
>> series is only called from the swiotlb code. Other platforms may
>> replace set_memory_decrypted() with set_memory_decrypted_map() as
>> needed.
>
> It seems like you are indeed not using these new helpers in
> dma_direct_alloc.  How is dma_alloc_attrs/dma_alloc_coherent going to
> work on these systems?

Current vmbus device drivers don't use dma_alloc_attrs()/
dma_alloc_coherent() because the vmbus driver allocates the ring buffer
for all devices. The ring buffer has already been marked decrypted and
remapped in the vmbus driver. Netvsc and storvsc just need to use
dma_map_single()/dma_map_page().
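
For illustration only (placeholder names, not code from the series): with
swiotlb force mode enabled, a plain streaming mapping in such a driver
bounces through the decrypted pool automatically:

#include <linux/dma-mapping.h>

/*
 * Illustrative sketch: map a buffer for device access. With swiotlb
 * forced on in an Isolation VM, the data is copied into the decrypted
 * bounce buffer and the returned handle refers to memory the host can
 * access. 'dev', 'data' and 'len' are placeholders.
 */
static dma_addr_t example_map_for_host(struct device *dev, void *data,
				       size_t len)
{
	dma_addr_t dma = dma_map_single(dev, data, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, dma))
		return DMA_MAPPING_ERROR;

	return dma;
}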
Tianyu Lan Aug. 13, 2021, 4:43 p.m. UTC | #6
Hi Christoph:
       I followed this suggestion to rework the latest
version (https://lkml.org/lkml/2021/8/9/805). I just removed the arch
prefix from your suggested name arch_dma_map_decrypted because the
platform may populate its map/unmap callbacks in the ops. But according
to your latest comment (https://lkml.org/lkml/2021/8/12/149), these
names are confusing vs the actual dma_map_* calls... Could you help
suggest the right function name? The new map function maps the bounce
buffer in the trusted/Isolation VM and these buffers are DMA memory.



On 7/20/2021 9:54 PM, Christoph Hellwig wrote:
>> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
>> -	memset(vaddr, 0, bytes);
>> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);
>
> Please always pass kernel virtual addresses as pointers.
>
> And I think these APIs might need better names, e.g.
>
> arch_dma_map_decrypted and arch_dma_unmap_decrypted.
>
> Also these will need fallback versions for non-x86 architectures that
> currently use memory encryption.