Message ID: 20210707154629.3977369-1-ltykernel@gmail.com
Series: x86/Hyper-V: Add Hyper-V Isolation VM support
Hi Dave: Thanks for your review.

On 7/8/2021 12:14 AM, Dave Hansen wrote:
> On 7/7/21 8:46 AM, Tianyu Lan wrote:
>> @@ -598,7 +599,7 @@ void arch_kexec_unprotect_crashkres(void)
>>   */
>>  int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>>  {
>> -	if (sev_active())
>> +	if (sev_active() || hv_is_isolation_supported())
>>  		return 0;
>>
>>  	/*
>> @@ -611,7 +612,7 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>>
>>  void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>>  {
>> -	if (sev_active())
>> +	if (sev_active() || hv_is_isolation_supported())
>>  		return;
>
> You might want to take a look through the "protected guest" patches. I
> think this series is touching a few of the same locations that TDX and
> recent SEV work touch.
>
> https://lore.kernel.org/lkml/20210618225755.662725-5-sathyanarayanan.kuppuswamy@linux.intel.com/

Thanks for the reminder. You are right. There will be a generic API to
check the "protected guest" type.
Please split the swiotlb changes into a separate patch from the
consumer.

> }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +unsigned long hv_map_memory(unsigned long addr, unsigned long size)
> +{
> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +				      sizeof(unsigned long),
> +				      GFP_KERNEL);
> +	unsigned long vaddr;
> +	int i;
> +
> +	if (!pfns)
> +		return (unsigned long)NULL;
> +
> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> +		pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) +
> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> +	vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,
> +					PAGE_KERNEL_IO);
> +	kfree(pfns);
> +
> +	return vaddr;

This seems to miss a 'select VMAP_PFN'. But more importantly I don't
think this actually works. Various DMA APIs do expect a struct page
backing, so how is this going to work with, say, dma_mmap_attrs or
dma_get_sgtable_attrs?

> +static unsigned long __map_memory(unsigned long addr, unsigned long size)
> +{
> +	if (hv_is_isolation_supported())
> +		return hv_map_memory(addr, size);
> +
> +	return addr;
> +}
> +
> +static void __unmap_memory(unsigned long addr)
> +{
> +	if (hv_is_isolation_supported())
> +		hv_unmap_memory(addr);
> +}
> +
> +unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size)
> +{
> +	if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false))
> +		return (unsigned long)NULL;
> +
> +	return __map_memory(addr, size);
> +}
> +
> +int set_memory_encrypted_unmap(unsigned long addr, unsigned long size)
> +{
> +	__unmap_memory(addr);
> +	return __set_memory_enc_dec(addr, size / PAGE_SIZE, true);
> +}

Why this obfuscation into all kinds of strange helpers? Also I think
we want an ops vector (or alternative calls) instead of the random
if checks here.

> + * @vstart: The virtual start address of the swiotlb memory pool. The swiotlb
> + *	     memory pool may be remapped in the memory encrypted case and store

Normally we'd call this vaddr or cpu_addr.

> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> -	memset(vaddr, 0, bytes);
> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);

Please always pass kernel virtual addresses as pointers.

And I think these APIs might need better names, e.g.
arch_dma_map_decrypted and arch_dma_unmap_decrypted.

Also these will need fallback versions for non-x86 architectures that
currently use memory encryption.
Thanks for the review.

On 7/20/2021 9:54 PM, Christoph Hellwig wrote:
>
> Please split the swiotlb changes into a separate patch from the
> consumer.

OK. Will update.

>
>> }
>> +
>> +/*
>> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
>> + */
>> +unsigned long hv_map_memory(unsigned long addr, unsigned long size)
>> +{
>> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
>> +				      sizeof(unsigned long),
>> +				      GFP_KERNEL);
>> +	unsigned long vaddr;
>> +	int i;
>> +
>> +	if (!pfns)
>> +		return (unsigned long)NULL;
>> +
>> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
>> +		pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) +
>> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
>> +
>> +	vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,
>> +					PAGE_KERNEL_IO);
>> +	kfree(pfns);
>> +
>> +	return vaddr;
>
> This seems to miss a 'select VMAP_PFN'.

VMAP_PFN has been selected in the previous patch "[RFC PATCH V4 08/13]
HV/Vmbus: Initialize VMbus ring buffer for Isolation VM".

> But more importantly I don't
> think this actually works. Various DMA APIs do expect a struct page
> backing, so how is this going to work with, say, dma_mmap_attrs or
> dma_get_sgtable_attrs?

dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
address that belongs to backing memory with a struct page, and return
the bounce buffer's DMA physical address, which is below
shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
protocol.

The newly mapped virtual address is only used to access the bounce
buffer in the swiotlb code and nowhere else; it is stored in
mem->vstart. So the new API set_memory_decrypted_map() in this series
is only called in the swiotlb code. Other platforms may replace
set_memory_decrypted() with set_memory_decrypted_map() as needed.
>
>> +static unsigned long __map_memory(unsigned long addr, unsigned long size)
>> +{
>> +	if (hv_is_isolation_supported())
>> +		return hv_map_memory(addr, size);
>> +
>> +	return addr;
>> +}
>> +
>> +static void __unmap_memory(unsigned long addr)
>> +{
>> +	if (hv_is_isolation_supported())
>> +		hv_unmap_memory(addr);
>> +}
>> +
>> +unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size)
>> +{
>> +	if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false))
>> +		return (unsigned long)NULL;
>> +
>> +	return __map_memory(addr, size);
>> +}
>> +
>> +int set_memory_encrypted_unmap(unsigned long addr, unsigned long size)
>> +{
>> +	__unmap_memory(addr);
>> +	return __set_memory_enc_dec(addr, size / PAGE_SIZE, true);
>> +}
>
> Why this obfuscation into all kinds of strange helpers? Also I think
> we want an ops vector (or alternative calls) instead of the random
> if checks here.

Yes, agree, and will add ops for different platforms to map/unmap
memory.

>
>> + * @vstart: The virtual start address of the swiotlb memory pool. The swiotlb
>> + *	     memory pool may be remapped in the memory encrypted case and store
>
> Normally we'd call this vaddr or cpu_addr.

OK. Will update.

>
>> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
>> -	memset(vaddr, 0, bytes);
>> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);
>
> Please always pass kernel virtual addresses as pointers.
>
> And I think these APIs might need better names, e.g.
>
> arch_dma_map_decrypted and arch_dma_unmap_decrypted.
>
> Also these will need fallback versions for non-x86 architectures that
> currently use memory encryption.

Sure. Will update in the next version.
On Wed, Jul 21, 2021 at 06:28:48PM +0800, Tianyu Lan wrote:
> dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
> address that belongs to backing memory with a struct page, and return
> the bounce buffer's DMA physical address, which is below
> shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
> protocol.
>
> The newly mapped virtual address is only used to access the bounce
> buffer in the swiotlb code and nowhere else; it is stored in
> mem->vstart. So the new API set_memory_decrypted_map() in this series
> is only called in the swiotlb code. Other platforms may replace
> set_memory_decrypted() with set_memory_decrypted_map() as needed.

It seems like you are indeed not using these new helpers in
dma_direct_alloc. How is dma_alloc_attrs/dma_alloc_coherent going to
work on these systems?
On 7/21/2021 10:33 PM, Christoph Hellwig wrote:
> On Wed, Jul 21, 2021 at 06:28:48PM +0800, Tianyu Lan wrote:
>> dma_mmap_attrs() and dma_get_sgtable_attrs() take an input virtual
>> address that belongs to backing memory with a struct page, and return
>> the bounce buffer's DMA physical address, which is below
>> shared_gpa_boundary (vTOM) and is passed to Hyper-V via the vmbus
>> protocol.
>>
>> The newly mapped virtual address is only used to access the bounce
>> buffer in the swiotlb code and nowhere else; it is stored in
>> mem->vstart. So the new API set_memory_decrypted_map() in this series
>> is only called in the swiotlb code. Other platforms may replace
>> set_memory_decrypted() with set_memory_decrypted_map() as needed.
>
> It seems like you are indeed not using these new helpers in
> dma_direct_alloc. How is dma_alloc_attrs/dma_alloc_coherent going to
> work on these systems?

Current vmbus device drivers don't use dma_alloc_attrs()/
dma_alloc_coherent() because the vmbus driver allocates the ring buffer
for all devices. The ring buffer has already been marked decrypted and
remapped in the vmbus driver. The netvsc and storvsc drivers just need
to use dma_map_single()/dma_map_page().
Hi Christoph:

I followed this suggestion of yours to rework the latest version
(https://lkml.org/lkml/2021/8/9/805). I just removed the arch prefix
from your suggested name arch_dma_map_decrypted, because the platform
may populate its map/unmap callbacks in the ops. But from your latest
comment (https://lkml.org/lkml/2021/8/12/149), these names confuse vs
the actual dma_map_* calls. Could you help to suggest the right
function names? The new map function is to map bounce buffers in the
trusted/Isolation VM, and these buffers are DMA memory.

On 7/20/2021 9:54 PM, Christoph Hellwig wrote:
>> -	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
>> -	memset(vaddr, 0, bytes);
>> +	mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, bytes);
>
> Please always pass kernel virtual addresses as pointers.
>
> And I think these APIs might need better names, e.g.
>
> arch_dma_map_decrypted and arch_dma_unmap_decrypted.
>
> Also these will need fallback versions for non-x86 architectures that
> currently use memory encryption.
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, no network/storage
stack memory should be shared with the host, so bounce buffers are
required.

The vmbus channel ring buffer already plays a bounce buffer role,
because all data from/to the host is copied between the ring buffer and
IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses that the host will access. So add bounce buffer allocation
support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

Changes since v3:
- Add interface set_memory_decrypted_map() to decrypt memory and map
  the bounce buffer in the extra address space.
- Remove the swiotlb remap function and store the remap address
  returned by set_memory_decrypted_map() in the swiotlb mem data
  structure.
- Introduce hv_set_mem_enc() to make the code more readable in
  __set_memory_enc_dec().
Changes since v2:
- Remove the UIO driver change from the Isolation VM patch.
- Use vmap_pfn() to replace ioremap_page_range() in order to avoid
  exposing the symbol ioremap_page_range().
- Call the hv set-mem-host-visibility hvcall in
  set_memory_encrypted()/set_memory_decrypted().
- Enable swiotlb force mode instead of adding Hyper-V dma map/unmap
  hooks.
- Fix code style.

Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  x86/Swiotlb/HV: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver
  x86/HV: Not set memory decrypted/encrypted during kexec alloc/free page in IVM

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  25 +--
 arch/x86/hyperv/ivm.c              | 299 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  18 ++
 arch/x86/include/asm/mshyperv.h    |  84 +++++++-
 arch/x86/include/asm/set_memory.h  |   2 +
 arch/x86/include/asm/sev-es.h      |   4 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/machine_kexec_64.c |   5 +-
 arch/x86/kernel/sev-es-shared.c    |  21 +-
 arch/x86/mm/pat/set_memory.c       |  34 +++-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  48 ++++-
 drivers/hv/connection.c            |  71 ++++++-
 drivers/hv/hv.c                    | 129 +++++++++----
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  84 ++++++--
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       |  62 ++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 144 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  68 ++++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  53 ++++-
 include/linux/hyperv.h             |  16 ++
 include/linux/swiotlb.h            |   4 +
 kernel/dma/swiotlb.c               |  11 +-
 29 files changed, 1097 insertions(+), 111 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c