From patchwork Sun May 11 15:28:07 2014
X-Patchwork-Submitter: Julien Grall
X-Patchwork-Id: 29949
Message-ID: <536F9707.4060807@linaro.org>
Date: Sun, 11 May 2014 16:28:07 +0100
From: Julien Grall <julien.grall@linaro.org>
To: Wei Huang <w1.huang@samsung.com>, xen-devel@lists.xen.org
Cc: keir@xen.org, ian.campbell@citrix.com, stefano.stabellini@eu.citrix.com,
 andrew.cooper3@citrix.com, tim@xen.org, jaeyong.yoo@samsung.com,
 jbeulich@suse.com, ian.jackson@eu.citrix.com, yjhyun.yoo@samsung.com
References: <1399583908-21755-1-git-send-email-w1.huang@samsung.com>
 <1399583908-21755-6-git-send-email-w1.huang@samsung.com>
In-Reply-To: <1399583908-21755-6-git-send-email-w1.huang@samsung.com>
Subject: Re: [Xen-devel] [RFC v3 5/6] xen/arm: Add log_dirty support for ARM

Hi Wei,

Thank you for the patch. It seems you didn't address most of the comments I
made on V2. I will try to repeat all of them here. Next time, please go
through the replies to the previous version before sending a new one.

On 08/05/14 22:18, Wei Huang wrote:
> This patch implements log_dirty for ARM guest VMs. This feature
> is provided via two basic blocks: dirty_bit_map and VLPT
> (virtual-linear page table)
>
> 1. VLPT provides fast accessing of 3rd PTE of guest P2M.
> When creating a mapping for VLPT, the page table mapping
> becomes the following:
>     xen's 1st PTE --> xen's 2nd PTE --> guest p2m's 2nd PTE -->
>     guest p2m's 3rd PTE
>
> With VLPT, xen can immediately locate the 3rd PTE of guest P2M
> and modify PTE attirbute during dirty page tracking. The following

attribute

> link shows the performance comparison for handling a dirty-page
> between VLPT and typical page table walking.
> http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html
>
> For more info about VLPT, please see
> http://www.technovelty.org/linux/virtual-linear-page-table.html.
>
> 2. Dirty bitmap
> The dirty bitmap is used to mark the pages which are dirty during
> migration. The info is used by Xen tools, via DOMCTL_SHADOW_OP_*,
> to figure out which guest pages need to be resent.
>
> Signed-off-by: Jaeyong Yoo
> Signed-off-by: Evgeny Fedotov
> Signed-off-by: Wei Huang
> ---
>  xen/arch/arm/domain.c |   6 +
>  xen/arch/arm/domctl.c |  31 +++-
>  xen/arch/arm/mm.c     | 298 ++++++++++++++++++++++++++++++++++++++-

As said on v2, the functions you have added are P2M-related, not Xen
memory-related. I think they should be moved to p2m.c.

[..]

> +/* Return start and end addr of guest RAM. Note this function only reports
> + * regular RAM. It does not cover other areas such as foreign mapped
> + * pages or MMIO space. */
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end)
> +{
> +    if ( start )
> +        *start = GUEST_RAM_BASE;
> +
> +    if ( end )
> +        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
> +}

As said on V1, this solution won't work. Ian plans to add multiple banks
support for the guest very soon. With this solution there is a 1GB hole
between the 2 banks, so your function will no longer work. Furthermore,
Xen should not assume that the layout of the guest will always start at
GUEST_RAM_BASE.

I think you can use max_mapped_pfn and lowest_mapped_pfn here. You may
need to modify their meaning a bit in the p2m code or introduce a new
field.

[..]

> +/* Allocate dirty bitmap resource */
> +static int bitmap_init(struct domain *d)
> +{
> +    paddr_t gma_start = 0;
> +    paddr_t gma_end = 0;
> +    int nr_bytes;
> +    int nr_pages;
> +    int i;
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +
> +    nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8;
> +    nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +    BUG_ON(nr_pages > MAX_DIRTY_BITMAP_PAGES);

AFAIU, this will crash Xen if the user tries to migrate a guest with more
than 8GB of RAM, right? If so, you should instead return an error.

[..]

> +    /* Check if reserved space is enough to cover guest physical address space.
> +     * Note that each LPAE page table entry is 64-bit (8 bytes). So we only
> +     * shift left with LPAE_SHIFT instead of PAGE_SHIFT. */
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    required = (gma_end - gma_start) >> LPAE_SHIFT;
> +    if ( required > avail )

What is the maximum amount of RAM a guest can use if we want to migrate it?

> +    {
> +        dprintk(XENLOG_ERR, "Available VLPT is small for domU guest (avail: "
> +                "%#llx, required: %#llx)\n", (unsigned long long)avail,
> +                (unsigned long long)required);

You don't need the casts here.
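Purely for illustration, a sketch of the same message without the casts,
using the PRIpaddr format specifier (this assumes avail and required are
paddr_t, which the quoted context doesn't show):

/* Sketch only: print the paddr_t values directly with PRIpaddr instead of
 * casting them to unsigned long long. */
dprintk(XENLOG_ERR,
        "Available VLPT is small for domU guest (avail: %#"PRIpaddr", "
        "required: %#"PRIpaddr")\n", avail, required);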
> +        return -ENOMEM;
> +    }
> +
> +    /* Caulculate the base of 2nd linear table base for VIRT_LIN_P2M_START */

Calculate

> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 )
> +    {

In which case can this happen?

> +        dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU");
> +        return -ENOMEM;
> +    }
> +
> +    /* Two pages are allocated to backup the related PTE content of guest
> +     * VM's 1st-level table. */
> +    second_lvl_page = alloc_domheap_pages(NULL, 1, 0);
> +    if ( second_lvl_page == NULL )
> +        return -ENOMEM;
> +    d->arch.dirty.second_lvl[0] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +    d->arch.dirty.second_lvl[1] = map_domain_page_global(
> +        page_to_mfn(second_lvl_page+1) );

map_domain_page_global can fail.

> +
> +    /* 1st level P2M of guest VM is 2 consecutive pages */
> +    first[0] = __map_domain_page(p2m->first_level);
> +    first[1] = __map_domain_page(p2m->first_level+1);
> +
> +    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +        int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES;
> +        int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES;
> +
> +        /* Update 2nd-level PTE of Xen linear table. With this, Xen linear
> +         * page table layout becomes: 1st Xen linear ==> 2nd Xen linear ==>
> +         * 2nd guest P2M (i.e. 3rd Xen linear) ==> 3rd guest P2M (i.e. Xen
> +         * linear content) for VIRT_LIN_P2M_START address space. */
> +        write_pte(&xen_second[xen_second_linear_base+i], first[l][k]);

Space around binary operator.

[..]

> +/* Restore the xen page table for vlpt mapping for domain */
> +void log_dirty_restore(struct domain *d)
> +{
> +    int i;
> +
> +    /* Nothing to do as log dirty mode is off */
> +    if ( !(d->arch.dirty.mode) )

Your VLPT implementation uses xen_second, which is shared between every
pCPU. This restricts migration to a single guest at a time, so restoring
the VLPT seems pointless. On the other hand, I didn't see anything in your
patch series which prevents this case. You will likely corrupt one (if not
both) of the guests. You have to create a per-VCPU mapping for your VLPT
solution.

> +        return;
> +
> +    dsb(sy);

I think inner-shareable (ish) is enough here.

> +
> +    for ( i = d->arch.dirty.second_lvl_start; i < d->arch.dirty.second_lvl_end;
> +          ++i )
> +    {
> +        int k = i % LPAE_ENTRIES;
> +        int l = i / LPAE_ENTRIES;
> +
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]);
> +            flush_xen_data_tlb_range_va(i << SECOND_SHIFT, 1 << SECOND_SHIFT);
> +        }
> +    }
> +
> +    dsb(sy);

Same here.

> +    isb();
> +}
> +
> +/* Turn on log dirty */
> +int log_dirty_on(struct domain *d)
> +{
> +    if ( vlpt_init(d) || bitmap_init(d) )

You have to call vlpt_cleanup if bitmap_init fails. Otherwise, you will
leave some pages mapped via map_domain_page_global (see the sketch a few
lines below).

> +        return -EINVAL;
> +
> +    return 0;
> +}
> +
> +/* Turn off log dirty */
> +void log_dirty_off(struct domain *d)
> +{
> +    bitmap_cleanup(d);
> +    vlpt_cleanup(d);
> +}
> +
> +/* Initialize log dirty fields */
> +int log_dirty_init(struct domain *d)

You don't check the return value in arch_domain_create, so the return type
should be void.
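Coming back to log_dirty_on: not an authoritative fix, just a minimal
sketch of the error path I mean, reusing vlpt_init, bitmap_init and
vlpt_cleanup as named in your patch:

/* Sketch only: unwind the VLPT mapping when the bitmap allocation fails,
 * so no globally mapped pages are leaked. */
int log_dirty_on(struct domain *d)
{
    if ( vlpt_init(d) )
        return -EINVAL;

    if ( bitmap_init(d) )
    {
        vlpt_cleanup(d);
        return -EINVAL;
    }

    return 0;
}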
> +{
> +    d->arch.dirty.count = 0;
> +    d->arch.dirty.mode = 0;
> +    spin_lock_init(&d->arch.dirty.lock);
> +
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl[0] = NULL;
> +    d->arch.dirty.second_lvl[1] = NULL;
> +
> +    memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap));
> +    d->arch.dirty.bitmap_pages = 0;
> +
> +    return 0;
> +}
> +
> +/* Log dirty tear down */
> +void log_dirty_teardown(struct domain *d)
> +{

I think nothing prevents destroying a domain with log dirty still on. You
should release all the resources you've allocated for this domain here.
Otherwise, Xen will leak memory.

> +    return;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 603c097..0808cc9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>  #include
>  #include
> @@ -208,6 +210,7 @@ static lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr,
>          break;
>
>      case p2m_ram_ro:
> +    case p2m_ram_logdirty:
>          e.p2m.xn = 0;
>          e.p2m.write = 0;
>          break;
> @@ -261,6 +264,10 @@ static int p2m_create_table(struct domain *d,
>
>      pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
>
> +    /* mark the write bit (page table's case, ro bit) as 0
> +     * so, it is writable in case of vlpt access */

mark the entry as write-only?

> +    pte.pt.ro = 0;
> +
>      write_pte(entry, pte);
>
>      return 0;
> @@ -696,6 +703,203 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>      return p >> PAGE_SHIFT;
>  }
>
> +/* Change types across all p2m entries in a domain */
> +void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{

Can't you reuse apply_p2m_changes? I'm also concerned about preemption:
this function might take a very long time to run (depending on the size of
the memory).

> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    paddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    domain_get_ram_range(d, &ram_base, NULL);
> +
> +    first_index = first_table_offset((uint64_t)ram_base);
> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON(!first);

__map_domain_page always returns a valid pointer. This BUG_ON is pointless.

> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON(!second);

Same remark as for BUG_ON(!first).

> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;

With Ian's multiple bank support, the RAM region (as returned by
domain_get_ram_range) can contain a hole. Rather than leaving the loop, you
should continue.

> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON(!third);

Same remark as for BUG_ON(!first).

> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +                lpae_walk_t third_pte = third[i3].walk;
> +
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +
> +                if ( nt == mg_ro )

I would use a switch for nt. It will be clearer and easier to extend (a
sketch follows further down, after the log_dirty_op comments).

> +                {
> +                    if ( pte.p2m.write == 1 )

We only want to trap read-write RAM regions.
Your solution may also trap MMIO, which is completely wrong. I would
replace it with if ( pte.p2m.type == p2m_ram_rw ).

> +                    {
> +                        pte.p2m.write = 0;
> +                        pte.p2m.type = p2m_ram_logdirty;
> +                    }
> +                    else
> +                    {
> +                        /* reuse avail bit as an indicator of 'actual'
> +                         * read-only */

Spurious comment?

> +                        pte.p2m.type = p2m_ram_rw;

Why do you unconditionally change the type?

> +                    }
> +                }
> +                else if ( nt == mg_rw )
> +                {
> +                    if ( pte.p2m.write == 0 &&
> +                         pte.p2m.type == p2m_ram_logdirty )

Can you add a comment to explain what this means?

> +                    {
> +                        pte.p2m.write = p2m_ram_rw;

Wrong field?

> +                    }
> +                }
> +                write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();

You want to flush the P2M on every CPU and only for the current VMID. I've
introduced a function flush_tlb_domain to do the job for you. I haven't
sent the patch yet (see [1] at the end of the mail).

> +    if ( third ) unmap_domain_page(third);
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN,
> + * clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    int peek = 1;
> +    int i;
> +    int bitmap_size;
> +    paddr_t gma_start, gma_end;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's

What prevents this hypercall from being called by a domain other than
domain 0?

> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    log_dirty_restore(d);
> +
> +    domain_get_ram_range(d, &gma_start, &gma_end);
> +    bitmap_size = (gma_end - gma_start) / 8;
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;

Hrrrm... you set peek to 0 but never use it.

> +    }

Brackets are not necessary here.

> +    else
> +    {
> +        spin_lock(&d->arch.dirty.lock);
> +
> +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )

Why not i++?

> +        {
> +            int j = 0;
> +            uint8_t *bitmap;
> +
> +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> +                                 d->arch.dirty.bitmap[i],
> +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> +                                                           PAGE_SIZE);

Where do you check that sc->dirty_bitmap has enough space to store the
guest dirty bitmap? You also forgot to update sc->pages.

> +            bitmap_size -= PAGE_SIZE;
> +
> +            /* set p2m page table read-only */
> +            bitmap = d->arch.dirty.bitmap[i];
> +            while ((j = find_next_bit((const long unsigned int *)bitmap,
> +                                      PAGE_SIZE*8, j)) < PAGE_SIZE*8)
> +            {
> +                lpae_t *vlpt;
> +                paddr_t addr = gma_start + (i << (2*PAGE_SHIFT+3)) +
> +                    (j << PAGE_SHIFT);
> +                vlpt = vlpt_get_3lvl_pte(addr);
> +                vlpt->p2m.write = 0;
> +                j++;
> +            }
> +        }
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN )
> +        {

I suspect this if is in the wrong place. I think, when sc->op is equal to
XEN_DOMCTL_SHADOW_OP_CLEAN, the buffer is NULL. Clean should also clear
d->arch.dirty.count ...

> +            for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> +            {
> +                clear_page(d->arch.dirty.bitmap[i]);
> +            }

Brackets are not necessary here.

[..]
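To pull the p2m_change_entry_type_global comments together, here is a
rough, non-authoritative sketch of the innermost loop body with a switch on
nt and the type checks suggested above (field and type names follow your
patch; the final flush assumes patch [1] below is applied):

/* Sketch only: act on the p2m type rather than the write bit, so MMIO and
 * other non-RAM mappings are left alone. */
switch ( nt )
{
case mg_ro:
    if ( pte.p2m.type == p2m_ram_rw )
    {
        pte.p2m.write = 0;
        pte.p2m.type = p2m_ram_logdirty;
        write_pte(&third[i3], pte);
    }
    break;
case mg_rw:
    if ( pte.p2m.type == p2m_ram_logdirty )
    {
        pte.p2m.write = 1;              /* write is a bit, not a type */
        pte.p2m.type = p2m_ram_rw;
        write_pte(&third[i3], pte);
    }
    break;
default:
    /* mg_clear/mg_rx handling left out of this sketch. */
    break;
}

and, at the end of the walk, flush_tlb_domain(d) rather than
flush_tlb_all_local().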
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index df4d375..b652565 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1556,6 +1556,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      struct hsr_dabt dabt = hsr.dabt;
>      int rc;
>      mmio_info_t info;
> +    int page_fault = ( dabt.write && ((dabt.dfsc & FSC_MASK) ==
> +                                      (FSC_FLT_PERM|FSC_3RD_LEVEL)) );
>
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1577,6 +1579,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      if ( rc == -EFAULT )
>          goto bad_data_abort;
>
> +    /* domU page fault handling for guest live migration. Note that

I would remove domU in the comment.

> +     * dabt.valid can be 0 here */
> +    if ( page_fault && handle_page_fault(current->domain, info.gpa) )
> +    {
> +        /* Do not modify PC as guest needs to repeat memory operation */
> +        return;
> +    }
>      /* XXX: Decode the instruction if ISS is not valid */
>      if ( !dabt.valid )
>          goto bad_data_abort;
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index ef291ff..f18fae4 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>   *   0  -   8M
>   *
>   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>   *               space
>   *
> @@ -124,13 +125,15 @@
>  #define CONFIG_SEPARATE_XENHEAP 1
>
>  #define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x02000000)
> -#define VMAP_VIRT_START  _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
> +#define VMAP_VIRT_START        _AT(vaddr_t,0x10000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START

Shouldn't it be VMAP_VIRT_START - 1? I would also directly use
_AT(vaddr_t,0x0FFFFFFF) to stay consistent with the other *_END defines.

>  #define XENHEAP_VIRT_START     _AT(vaddr_t,0x40000000)
>  #define XENHEAP_VIRT_END       _AT(vaddr_t,0x7fffffff)
>  #define DOMHEAP_VIRT_START     _AT(vaddr_t,0x80000000)
>  #define DOMHEAP_VIRT_END       _AT(vaddr_t,0xffffffff)
>
> -#define VMAP_VIRT_END    XENHEAP_VIRT_START
> +#define VMAP_VIRT_END          XENHEAP_VIRT_START

Spurious changes?

>  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>
> @@ -157,6 +160,11 @@
>
>  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>
> +/* Definition for VIRT_LIN_P2M_START and VIRT_LIN_P2M_END (64-bit)
> + * TODO: Needs evaluation. */

Can you update the layout description for ARM64 above?

> +#define VIRT_LIN_P2M_START     _AT(vaddr_t, 0x08000000)
> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> +
>  #endif
>
>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index aabeb51..ac82643 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -162,6 +162,25 @@ struct arch_domain
>      }  vuart;
>
>      unsigned int evtchn_irq;
> +
> +    /* dirty page tracing */
> +    struct {
> +        spinlock_t lock;
> +        volatile int mode;              /* 1 if dirty pages tracing enabled */
> +        volatile unsigned int count;    /* dirty pages counter */
> +
> +        /* vlpt context switch */
> +        volatile int second_lvl_start;  /* start idx of virt linear space 2nd */
> +        volatile int second_lvl_end;    /* end idx of virt linear space 2nd */

Why must these 4 fields be volatile?
> +        lpae_t *second_lvl[2];          /* copy of guest P2M 1st-lvl content */
> +
> +        /* bitmap to track dirty pages */
> +#define MAX_DIRTY_BITMAP_PAGES 64
> +        /* Because each bit represents a dirty page, the total supported guest
> +         * memory is (64 entries x 4KB/entry x 8bits/byte x 4KB) = 8GB. */
> +        uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES];    /* dirty bitmap */
> +        int bitmap_pages;               /* # of dirty bitmap pages */
> +    } dirty;
>  } __cacheline_aligned;
>
>  struct arch_vcpu
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index b8d4e7d..ab19025 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -4,6 +4,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  /* Align Xen to a 2 MiB boundary. */
> @@ -320,6 +321,7 @@ int donate_page(
>  #define domain_clamp_alloc_bitsize(d, b) (b)
>
>  unsigned long domain_get_maximum_gpfn(struct domain *d);
> +void domain_get_ram_range(struct domain *d, paddr_t *start, paddr_t *end);
>  extern struct domain *dom_xen, *dom_io, *dom_cow;
>
> @@ -341,6 +343,27 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };

Please describe this enum. Also, mg is too generic a name.

> +/************************************/
> +/*    Log-dirty support functions   */
> +/************************************/
> +int log_dirty_on(struct domain *d);
> +void log_dirty_off(struct domain *d);
> +int log_dirty_init(struct domain *d);
> +void log_dirty_teardown(struct domain *d);
> +void log_dirty_restore(struct domain *d);
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +/* access leaf PTE of a given guest address (GPA) */
> +static inline lpae_t * vlpt_get_3lvl_pte(paddr_t addr)
> +{
> +    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;
> +
> +    /* Since we slotted the guest's first p2m page table to xen's
> +     * second page table, one shift is enough for calculating the
> +     * index of guest p2m table entry */
> +    return &table[addr >> PAGE_SHIFT];
> +}

These functions should be part of p2m.h, not mm.h.

> @@ -41,6 +42,7 @@ typedef enum {
>      p2m_invalid = 0,    /* Nothing mapped here */
>      p2m_ram_rw,         /* Normal read/write guest RAM */
>      p2m_ram_ro,         /* Read-only; writes are silently dropped */
> +    p2m_ram_logdirty,   /* Read-only: special mode for log dirty */

You should add the new type at the end of the enum.

>      p2m_mmio_direct,    /* Read/write mapping of genuine MMIO area */
>      p2m_map_foreign,    /* Ram pages from foreign domain */
>      p2m_grant_map_rw,   /* Read/write grant mapping */
> @@ -49,7 +51,8 @@
>  } p2m_type_t;

Regards,

[1]

commit cebfd170dc13791df95fbb120c5894f0960e2804
Author: Julien Grall
Date:   Mon Apr 28 16:34:21 2014 +0100

    xen/arm: Introduce flush_tlb_domain

    The pattern p2m_load_VTTBR(d) -> flush_tlb -> p2m_load_VTTBR(current->domain)
    is used in a few places. Replace this usage with flush_tlb_domain, which
    takes care of this pattern.

    This will also help the readability of apply_p2m_changes, which is
    getting big.

    Signed-off-by: Julien Grall

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 603c097..61450cf 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -72,6 +72,21 @@ void p2m_restore_state(struct vcpu *n)
     isb();
 }
 
+void flush_tlb_domain(struct domain *d)
+{
+    /* Update the VTTBR if necessary with the domain d. In this case,
+     * it's only necessary to flush TLBs on every CPUs with the current VMID
+     * (our domain).
+     */
+    if ( d != current->domain )
+        p2m_load_VTTBR(d);
+
+    flush_tlb();
+
+    if ( d != current->domain )
+        p2m_load_VTTBR(current->domain);
+}
+
 static int p2m_first_level_index(paddr_t addr)
 {
     /*
@@ -450,19 +465,7 @@ static int apply_p2m_changes(struct domain *d,
     }
 
     if ( flush )
-    {
-        /* Update the VTTBR if necessary with the domain where mappings
-         * are created. In this case it's only necessary to flush TLBs
-         * on every CPUs with the current VMID (our domain).
-         */
-        if ( d != current->domain )
-            p2m_load_VTTBR(d);
-
-        flush_tlb();
-
-        if ( d != current->domain )
-            p2m_load_VTTBR(current->domain);
-    }
+        flush_tlb_domain(d);
 
     if ( op == ALLOCATE || op == INSERT )
     {
@@ -550,14 +553,10 @@ int p2m_alloc_table(struct domain *d)
     d->arch.vttbr = page_to_maddr(p2m->first_level) |
         ((uint64_t)p2m->vmid&0xff)<<48;
 
-    p2m_load_VTTBR(d);
-
     /* Make sure that all TLBs corresponding to the new VMID are flushed
      * before using it */
-    flush_tlb();
-
-    p2m_load_VTTBR(current->domain);
+    flush_tlb_domain(d);
 
     spin_unlock(&p2m->lock);
 
diff --git a/xen/include/asm-arm/flushtlb.h b/xen/include/asm-arm/flushtlb.h
index 329fbb4..5722c67 100644
--- a/xen/include/asm-arm/flushtlb.h
+++ b/xen/include/asm-arm/flushtlb.h
@@ -25,6 +25,9 @@ do {                                                    \
 /* Flush specified CPUs' TLBs */
 void flush_tlb_mask(const cpumask_t *mask);
 
+/* Flush CPU's TLBs for the specified domain */
+void flush_tlb_domain(struct domain *d);
+
 #endif /* __ASM_ARM_FLUSHTLB_H__ */
 /*
  * Local variables: