Message ID: 20210728145232.285861-1-ltykernel@gmail.com
Series: x86/Hyper-V: Add Hyper-V Isolation VM support
On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
> +static int hyperv_init_ghcb(void)
> +{
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return -EINVAL;
> +
> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);

This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux. And it is not mapped unencrypted because the GHCB page is
allocated above the VTOM boundary?

> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
> +	unsigned long flags;
> +	void **input_arg;
> +	void *pg;
> +	void **ghcb_va = NULL;
> +
> +	local_irq_save(flags);
> +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> +	pg = *input_arg;

Pg is never used later on, why is it set?
On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
> +	if (type == HV_GPADL_BUFFER)
> +		index = 0;
> +	else
> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;

Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.
On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway.
Same in hv_ghcb_msr_read().

> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +				   struct es_em_ctxt *ctxt,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
>  {
>  	enum es_result ret;
>
> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>
> -	sev_es_wr_ghcb_msr(__pa(ghcb));
> +	/*
> +	 * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
> +	 * paravisor and not needs to be updated in the Linux guest.
> +	 * Otherwise, the ghcb page's PA reported by paravisor is above
> +	 * VTOM. Hyper-V use this function with NULL for ctxt point and
> +	 * skip setting ghcb page in such case.
> +	 */
> +	if (ctxt)
> +		sev_es_wr_ghcb_msr(__pa(ghcb));

No, do not make this function work with ctxt == NULL. Instead, factor
out a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and the Hyper-V code.

> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);

I am curious what this will end up being good for.
On 8/2/2021 8:07 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
>> +	if (type == HV_GPADL_BUFFER)
>> +		index = 0;
>> +	else
>> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
>
> Hmm... This doesn't look very robust. Can you set fixed indexes for
> different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
> as it is implemented here you risk that index 2 gets overwritten by
> subsequent calls.

Both the second and third entries are of HV_GPADL_RING type: one is the
send ring and the other is the receive ring. The driver keeps a fixed
order when allocating the rx and tx buffers. You are right that this is
not robust; I will add a mutex to keep the order.
On Wed, Jul 28, 2021 at 10:52:22AM -0400, Tianyu Lan wrote:
> +	if (hv_is_isolation_supported()) {
> +		vmbus_connection.monitor_pages_va[0]
> +			= vmbus_connection.monitor_pages[0];
> +		vmbus_connection.monitor_pages[0]
> +			= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[0])
> +			return -ENOMEM;
> +
> +		vmbus_connection.monitor_pages_va[1]
> +			= vmbus_connection.monitor_pages[1];
> +		vmbus_connection.monitor_pages[1]
> +			= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[1]) {
> +			memunmap(vmbus_connection.monitor_pages[0]);
> +			return -ENOMEM;
> +		}
> +
> +		memset(vmbus_connection.monitor_pages[0], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +		memset(vmbus_connection.monitor_pages[1], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +	}

Okay, let me see if I got this right. In Hyper-V Isolation VMs, when
the guest wants to make memory shared, it does:

	- Call the hypervisor to mark the pages shared. The hypervisor
	  will do the RMP update and remap the pages at (VTOM + pa).

	- The guest maps the memory again into its page-table, so that
	  the entries point to the correct GPA (which is above VTOM
	  now).

Or in other words, Hyper-V implements a hardware-independent and
configurable c-bit position, as the VTOM value is always power-of-two
aligned. Is that correct? This would at least explain why there is no
separate allocation/deallocation of memory for the shared range.

Thanks,

	Joerg
On 8/2/2021 8:28 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
>> +void hv_ghcb_msr_write(u64 msr, u64 value)
>> +{
>> +	union hv_ghcb *hv_ghcb;
>> +	void **ghcb_base;
>> +	unsigned long flags;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return;
>> +
>> +	WARN_ON(in_nmi());
>> +
>> +	local_irq_save(flags);
>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
>> +	if (!hv_ghcb) {
>> +		local_irq_restore(flags);
>> +		return;
>> +	}
>> +
>> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
>
> Do you really need to zero out the whole 4k? The validation bitmap
> should be enough, there are no secrets on the page anyway.
> Same in hv_ghcb_msr_read().

OK. Thanks for the suggestion. I will have a try.

>> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>> +				   struct es_em_ctxt *ctxt,
>> +				   u64 exit_code, u64 exit_info_1,
>> +				   u64 exit_info_2)
>>  {
>>  	enum es_result ret;
>>
>> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>>
>> -	sev_es_wr_ghcb_msr(__pa(ghcb));
>> +	/*
>> +	 * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
>> +	 * paravisor and not needs to be updated in the Linux guest.
>> +	 * Otherwise, the ghcb page's PA reported by paravisor is above
>> +	 * VTOM. Hyper-V use this function with NULL for ctxt point and
>> +	 * skip setting ghcb page in such case.
>> +	 */
>> +	if (ctxt)
>> +		sev_es_wr_ghcb_msr(__pa(ghcb));
>
> No, do not make this function work with ctxt==NULL. Instead, factor
> out a helper function which contains what Hyper-V needs and use that
> in sev_es_ghcb_hv_call() and Hyper-V code.

OK. Will update.

>> +union hv_ghcb {
>> +	struct ghcb ghcb;
>> +} __packed __aligned(PAGE_SIZE);
>
> I am curious what this will end up being good for.

Hyper-V introduces a specific hypercall request in the GHCB page, and
the Linux Hyper-V code uses the same union to read/write MSRs and to
issue the new hypercall request.
On 8/2/2021 9:20 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:
>> In Isolation VM, all shared memory with host needs to mark visible
>> to host via hvcall. vmbus_establish_gpadl() has already done it for
>> storvsc rx/tx ring buffer. The page buffer used by
>> vmbus_sendpacket_mpb_desc() still need to handle. Use DMA API to
>> map/umap these memory during sending/receiving packet and Hyper-V
>> DMA ops callback will use swiotlb function to allocate bounce
>> buffer and copy data from/to bounce buffer.
>
> I am wondering why you don't use the DMA-API unconditionally? It
> provides enough abstraction to do the right thing for isolated and
> legacy VMs.

VMbus already has a similar bounce buffer design: the ring buffer,
through which all data from/to the host is copied. So there is no need
to call the DMA-API for such buffers. The DMA-API is called in order to
use the swiotlb bounce buffer for those buffers which are not covered
by the ring buffer. This is why the DMA-API needs to be called
conditionally.
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing memory with the host. For security, network/storage stack
memory should not be shared with the host, so bounce buffers are
required.

The VMbus channel ring buffer already plays a bounce buffer role,
because all data from/to the host is copied between the ring buffer
and IO stack memory; so the ring buffer is simply marked visible.
There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses that the host will access, so bounce buffer support is added
in VMbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access
the shared memory is the bounce buffer GPA plus the
shared_gpa_boundary reported by CPUID.

Changes since RFC v4:
      - Introduce a dma-map-decrypted function to remap the bounce
        buffer and provide dma map decrypted ops for platforms to
        hook callbacks.
      - Split the swiotlb and dma-map-decrypted changes into two
        patches.
      - Replace vstart with vaddr in the swiotlb changes.

Changes since RFC v3:
      - Add interface set_memory_decrypted_map() to decrypt memory
        and map the bounce buffer in the extra address space.
      - Remove the swiotlb remap function and store the remap address
        returned by set_memory_decrypted_map() in the swiotlb mem
        data structure.
      - Introduce hv_set_mem_enc() to make the code more readable in
        __set_memory_enc_dec().
Changes since RFC v2:
      - Remove the UIO driver from the Isolation VM patch.
      - Use vmap_pfn() instead of ioremap_page_range() in order to
        avoid exposing the symbol ioremap_page_range().
      - Call the hv set-mem-host-visibility hvcall in
        set_memory_encrypted/decrypted().
      - Enable swiotlb force mode instead of adding Hyper-V dma
        map/unmap hooks.
      - Fix code style.

Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  87 +++++++--
 arch/x86/hyperv/ivm.c              | 296 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  18 ++
 arch/x86/include/asm/mshyperv.h    |  86 ++++++++-
 arch/x86/include/asm/sev.h         |   4 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/sev-shared.c       |  21 +-
 arch/x86/mm/pat/set_memory.c       |   6 +-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  48 ++++-
 drivers/hv/connection.c            |  71 ++++++-
 drivers/hv/hv.c                    | 129 +++++++++----
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  84 ++++++--
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       |  65 +++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 144 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  68 ++++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  53 +++++-
 include/linux/dma-map-ops.h        |   9 +
 include/linux/hyperv.h             |  16 ++
 include/linux/swiotlb.h            |   4 +
 kernel/dma/mapping.c               |  22 +++
 kernel/dma/swiotlb.c               |  11 +-
 29 files changed, 1166 insertions(+), 102 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c