Message ID | 1390997486-3986-4-git-send-email-ian.campbell@citrix.com |
---|---|
State | New |
On Wed, 29 Jan 2014, Jan Beulich wrote:
> > +static void do_one_cacheflush(paddr_t mfn)
> > +{
> > +    void *v = map_domain_page(mfn);
> > +
> > +    flush_xen_dcache_va_range(v, PAGE_SIZE);
> > +
> > +    unmap_domain_page(v);
> > +}
>
> Sort of odd that you have to map a page in order to flush cache
> (which I very much hope is physically indexed, or else this
> operation wouldn't have the intended effect anyway). Can this
> not be done based on the machine address?

Unfortunately no. I asked for a similar change when Ian sent the RFC
because I was concerned about performance, but it turns out it is not
possible. A pity.
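The underlying constraint is that the relevant ARM data-cache maintenance
operations act on a virtual address (an "MVA"), not on a machine address,
which is why each frame has to be mapped before it can be flushed. A
minimal sketch of a per-page clean and invalidate by MVA on ARMv7,
assuming a 4KiB page and a 64-byte cache line (illustrative only, not the
Xen implementation):

/* Clean and invalidate one page of data cache to the point of
 * coherency on ARMv7. DCCIMVAC is "mcr p15, 0, <Rt>, c7, c14, 1";
 * the 64-byte line size is an assumption (real code would read CTR). */
static void clean_and_invalidate_page(void *va)
{
    const unsigned long line = 64;
    unsigned long p = (unsigned long)va;
    unsigned long end = p + 4096;

    for ( ; p < end; p += line )
        asm volatile ("mcr p15, 0, %0, c7, c14, 1" : : "r" (p) : "memory");

    asm volatile ("dsb" : : : "memory");  /* ensure completion */
}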
On Wed, 2014-01-29 at 13:00 +0000, Jan Beulich wrote:
> >>> On 29.01.14 at 13:11, Ian Campbell <ian.campbell@citrix.com> wrote:
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -48,6 +48,14 @@ int xc_domain_create(xc_interface *xch,
> >      return 0;
> >  }
> >
> > +int xc_domain_cacheflush(xc_interface *xch, uint32_t domid)
> > +{
> > +    DECLARE_DOMCTL;
> > +    domctl.cmd = XEN_DOMCTL_cacheflush;
> > +    domctl.domain = (domid_t)domid;
>
> Why can't the function parameter be domid_t right away?

It seemed that the vast majority of the current libxc functions were
using uint32_t, for whatever reason.

> > +    case XEN_DOMCTL_cacheflush:
> > +    {
> > +        long rc = p2m_cache_flush(d, &domctl->u.cacheflush.start_mfn);
> > +        if ( __copy_to_guest(u_domctl, domctl, 1) )
>
> While you certainly say so in the public header change, I think you
> recall that we pretty recently changed another hypercall to not be
> the only inconsistent one modifying the input structure in order to
> handle hypercall preemption.

That was a XENMEM though, IIRC -- is the same requirement also true of
domctls? How/where would you recommend saving the progress here?

> Further - who's responsible for initiating the resume after a
> preemption? p2m_cache_flush() returning -EAGAIN isn't being
> handled here, and also not in libxc (which would be awkward
> anyway).

I've once again fallen into the trap of thinking the common domctl code
would do it for me.

> > +static void do_one_cacheflush(paddr_t mfn)
> > +{
> > +    void *v = map_domain_page(mfn);
> > +
> > +    flush_xen_dcache_va_range(v, PAGE_SIZE);
> > +
> > +    unmap_domain_page(v);
> > +}
>
> Sort of odd that you have to map a page in order to flush cache
> (which I very much hope is physically indexed, or else this
> operation wouldn't have the intended effect anyway). Can this
> not be done based on the machine address?

Sadly not, yes it is very annoying. Yes, the cache is required to be
physically indexed from ARMv7 onwards.

> >      /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
> > -    if ( op == RELINQUISH && count >= 0x2000 )
> > +    switch ( op )
> >      {
> > -        if ( hypercall_preempt_check() )
> > +    case RELINQUISH:
> > +    case CACHEFLUSH:
> > +        if (count >= 0x2000 && hypercall_preempt_check() )
> >          {
> >              p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
> >              rc = -EAGAIN;
> >              goto out;
> >          }
> >          count = 0;
> > +        break;
> > +    case INSERT:
> > +    case ALLOCATE:
> > +    case REMOVE:
> > +        /* No preemption */
> > +        break;
> >      }
>
> Unrelated to the patch here, but don't you have a problem if you
> don't preempt _at all_ here for certain operation types? Or is a
> limit on the number of iterations being enforced elsewhere for
> those?

Good question.

The tools/guest accessible paths here are through
guest_physmap_add/remove_page. I think the only paths which are exposed
that pass a non-zero order are XENMEM_populate_physmap and
XENMEM_exchange, both of which restrict the maximum order.

I don't think those guest_physmap_* are preemptible on x86 either?

It's possible that we should nevertheless handle preemption on those
code paths as well, but I don't think it is critical right now (or at
least not critical enough to warrant a freeze exception for 4.4).

Ian.
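One conventional way to close the resume gap Jan describes is for the
hypervisor itself to create a continuation when the flush is preempted,
so neither libxc nor the common domctl code needs to know. A hedged
sketch of the handler with that wired in; hypercall_create_continuation()
is the standard Xen mechanism, but its use here is illustrative rather
than part of the posted patch:

case XEN_DOMCTL_cacheflush:
{
    long rc = p2m_cache_flush(d, &domctl->u.cacheflush.start_mfn);

    if ( __copy_to_guest(u_domctl, domctl, 1) )
        rc = -EFAULT;
    else if ( rc == -EAGAIN )
        /* Progress was written back above, so re-entering this domctl
         * resumes the flush from the updated start_mfn. */
        rc = hypercall_create_continuation(__HYPERVISOR_domctl,
                                           "h", u_domctl);
    return rc;
}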
On Wed, 2014-01-29 at 15:01 +0000, Jan Beulich wrote:
> >>> On 29.01.14 at 15:15, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Wed, 2014-01-29 at 13:00 +0000, Jan Beulich wrote:
> >> >>> On 29.01.14 at 13:11, Ian Campbell <ian.campbell@citrix.com> wrote:
> >> > +    case XEN_DOMCTL_cacheflush:
> >> > +    {
> >> > +        long rc = p2m_cache_flush(d, &domctl->u.cacheflush.start_mfn);
> >> > +        if ( __copy_to_guest(u_domctl, domctl, 1) )
> >>
> >> While you certainly say so in the public header change, I think you
> >> recall that we pretty recently changed another hypercall to not be
> >> the only inconsistent one modifying the input structure in order to
> >> handle hypercall preemption.
> >
> > That was a XENMEM though, IIRC -- is the same requirement also true of
> > domctls?
>
> Not necessarily - I was just trying to point out the issue to
> avoid needing to fix it later on.

OK, but you do think it should be fixed "transparently" rather than
made an explicit part of the API?

> > How/where would you recommend saving the progress here?
>
> Depending on the nature, a per-domain or per-vCPU field that
> gets acted upon before issuing any new, similar operation. I.e.
> something along the lines of x86's old_guest_table. It's ugly, I
> know. But with exposing domctls to semi-trusted guests in
> mind, you may use state modifiable by the caller here only if
> tampering with that state isn't going to harm the whole system
> (if the guest being started is affected in the case here that
> obviously wouldn't be a problem).

Hrm, thanks for raising this -- it made me realise that we cannot
necessarily rely on the disaggregated domain builder to even issue this
call at all. That would be OK from the point of view of not flushing
the things which the builder touched (as you say, it can only harm the
domain it is building). But it is not OK from the PoV of flushing
scrubbed data from the cache, i.e. ensuring that the scrubbed bytes
reach RAM (otherwise old data can leak).

So I think I need an approach which flushes the scrubbed pages as it
does the scrubbing (this makes a certain logical sense anyway) and has
the toolstack issue hypercalls to flush the stuff it has written. (The
first approach to this issue tried to do that, but used a system call
provided by Linux which didn't have quite the correct semantics; a
version of this hypercall taking a range should work.)

Before I get too deep into that, do you think that

struct xen_domctl_cacheflush {
    /* start_mfn is updated for progress over preemption. */
    xen_pfn_t start_mfn;
    xen_pfn_t end_mfn;
};

is acceptable, or do you want me to try and find a way to do preemption
without the write-back? The blobs written by the toolstack aren't
likely to be >1GB in size, so rejecting overly large ranges would be a
potential option, but it's not totally satisfactory.

> >> >      /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
> >> > -    if ( op == RELINQUISH && count >= 0x2000 )
> >> > +    switch ( op )
> >> >      {
> >> > -        if ( hypercall_preempt_check() )
> >> > +    case RELINQUISH:
> >> > +    case CACHEFLUSH:
> >> > +        if (count >= 0x2000 && hypercall_preempt_check() )
> >> >          {
> >> >              p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
> >> >              rc = -EAGAIN;
> >> >              goto out;
> >> >          }
> >> >          count = 0;
> >> > +        break;
> >> > +    case INSERT:
> >> > +    case ALLOCATE:
> >> > +    case REMOVE:
> >> > +        /* No preemption */
> >> > +        break;
> >> >      }
> >>
> >> Unrelated to the patch here, but don't you have a problem if you
> >> don't preempt _at all_ here for certain operation types? Or is a
> >> limit on the number of iterations being enforced elsewhere for
> >> those?
> >
> > Good question.
> >
> > The tools/guest accessible paths here are through
> > guest_physmap_add/remove_page. I think the only paths which are
> > exposed that pass a non-zero order are XENMEM_populate_physmap and
> > XENMEM_exchange, both of which restrict the maximum order.
> >
> > I don't think those guest_physmap_* are preemptible on x86 either?
>
> They aren't, but they have a strict upper limit of at most dealing
> with a 1Gb page at a time. If that's similar for ARM, I don't see
> an immediate issue.

Same on ARM (through common code using MAX_ORDER == 20).

> > It's possible that we should nevertheless handle preemption on those
> > code paths as well, but I don't think it is critical right now (or at
> > least not critical enough to warrant a freeze exception for 4.4).
>
> Indeed. Of course the 1Gb limit mentioned above, while perhaps
> acceptable to process without preemption right now, is still pretty
> high for achieving really good responsiveness, so we may need to
> do something about that going forward.

Right. I won't worry about it now though.

> Jan
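To make the start_mfn/end_mfn proposal concrete, here is a sketch of how
a toolstack-side wrapper might flush only the range it has written. The
end_mfn field, the wrapper name, and the retry-on-EAGAIN convention are
assumptions drawn from the discussion above, not code from this series:

/* Hypothetical libxc-style wrapper for the ranged flush discussed above.
 * Assumes do_domctl() returns -1 with errno == EAGAIN on preemption and
 * that the hypervisor has advanced start_mfn before returning. */
static int flush_written_range(xc_interface *xch, uint32_t domid,
                               xen_pfn_t start_mfn, xen_pfn_t end_mfn)
{
    DECLARE_DOMCTL;
    int rc;

    domctl.cmd = XEN_DOMCTL_cacheflush;
    domctl.domain = (domid_t)domid;
    domctl.u.cacheflush.start_mfn = start_mfn;
    domctl.u.cacheflush.end_mfn = end_mfn;      /* hypothetical field */

    do {
        /* start_mfn was written back on preemption, so re-issuing the
         * domctl resumes the flush where the previous call stopped. */
        rc = do_domctl(xch, &domctl);
    } while ( rc < 0 && errno == EAGAIN );

    return rc;
}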
On Thu, 2014-01-30 at 11:26 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 4/4] xen/arm: clean and invalidate
> all guest caches by VMID after domain build."):
> > On Wed, 2014-01-29 at 13:00 +0000, Jan Beulich wrote:
> > > >>> On 29.01.14 at 13:11, Ian Campbell <ian.campbell@citrix.com> wrote:
> > > > +int xc_domain_cacheflush(xc_interface *xch, uint32_t domid)
> > > > +    domctl.domain = (domid_t)domid;
> > >
> > > Why can't the function parameter be domid_t right away?
> >
> > It seemed that the vast majority of the current libxc functions were
> > using uint32_t, for whatever reason.
>
> What's the point of the cast, though?

Apparently all the cool kids in this file are doing it and I followed
suit ;-)

domid_t is a uint16_t; I kind of expected an assignment of a uint32_t
to a uint16_t to generate some sort of warning from gcc, but apparently
not...

Ian.
On Thu, 2014-01-30 at 12:24 +0000, Ian Campbell wrote:
> On Thu, 2014-01-30 at 11:26 +0000, Ian Jackson wrote:
> > Ian Campbell writes ("Re: [PATCH 4/4] xen/arm: clean and invalidate
> > all guest caches by VMID after domain build."):
> > > On Wed, 2014-01-29 at 13:00 +0000, Jan Beulich wrote:
> > > > >>> On 29.01.14 at 13:11, Ian Campbell <ian.campbell@citrix.com> wrote:
> > > > > +int xc_domain_cacheflush(xc_interface *xch, uint32_t domid)
> > > > > +    domctl.domain = (domid_t)domid;
> > > >
> > > > Why can't the function parameter be domid_t right away?
> > >
> > > It seemed that the vast majority of the current libxc functions were
> > > using uint32_t, for whatever reason.
> >
> > What's the point of the cast, though?
>
> Apparently all the cool kids in this file are doing it and I followed
> suit ;-)
>
> domid_t is a uint16_t; I kind of expected an assignment of a uint32_t
> to a uint16_t to generate some sort of warning from gcc, but apparently
> not...

Just for completeness: -Wconversion is the option to make it do this.
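A minimal standalone demonstration of the point, assuming gcc: the
implicit narrowing below compiles silently by default, and produces a
warning along the lines of "conversion from 'uint32_t' to 'domid_t' may
change value" only once -Wconversion is enabled:

/* Build with: gcc -c -Wconversion narrow.c */
#include <stdint.h>

typedef uint16_t domid_t;   /* domain IDs are 16 bits wide */

domid_t narrow(uint32_t domid)
{
    return domid;           /* silent by default; warns under -Wconversion.
                             * An explicit (domid_t) cast silences even
                             * that, which is what the libxc code does. */
}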
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index c2fdd74..e6fa4ff 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -48,6 +48,14 @@ int xc_domain_create(xc_interface *xch,
     return 0;
 }
 
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_cacheflush;
+    domctl.domain = (domid_t)domid;
+    domctl.u.cacheflush.start_mfn = 0;
+    return do_domctl(xch, &domctl);
+}
 
 int xc_domain_pause(xc_interface *xch, uint32_t domid)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..43dae5c 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -453,6 +453,7 @@ int xc_domain_create(xc_interface *xch,
                      xen_domain_handle_t handle,
                      uint32_t flags,
                      uint32_t *pdomid);
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid);
 
 
 /* Functions to produce a dump of a given domain
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a604cd8..55c86f0 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1364,7 +1364,10 @@ static void domain_create_cb(libxl__egc *egc,
     STATE_AO_GC(cdcs->dcs.ao);
 
     if (!rc)
+    {
         *cdcs->domid_out = domid;
+        xc_domain_cacheflush(CTX->xch, domid);
+    }
 
     libxl__ao_complete(egc, ao, rc);
 }
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 546e86b..9e3b37d 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -11,12 +11,24 @@
 #include <xen/sched.h>
 #include <xen/hypercall.h>
 #include <public/domctl.h>
+#include <xen/guest_access.h>
+
+extern long p2m_cache_flush(struct domain *d, xen_pfn_t *start_mfn);
 
 long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
     switch ( domctl->cmd )
     {
+    case XEN_DOMCTL_cacheflush:
+    {
+        long rc = p2m_cache_flush(d, &domctl->u.cacheflush.start_mfn);
+        if ( __copy_to_guest(u_domctl, domctl, 1) )
+            rc = -EFAULT;
+
+        return rc;
+    }
+
     default:
         return subarch_do_domctl(domctl, d, u_domctl);
     }
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index a61edeb..18bd500 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -228,15 +228,26 @@ enum p2m_operation {
     ALLOCATE,
     REMOVE,
     RELINQUISH,
+    CACHEFLUSH,
 };
 
+static void do_one_cacheflush(paddr_t mfn)
+{
+    void *v = map_domain_page(mfn);
+
+    flush_xen_dcache_va_range(v, PAGE_SIZE);
+
+    unmap_domain_page(v);
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
                      paddr_t end_gpaddr,
                      paddr_t maddr,
                      int mattr,
-                     p2m_type_t t)
+                     p2m_type_t t,
+                     xen_pfn_t *last_mfn)
 {
     int rc;
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -381,18 +392,42 @@ static int apply_p2m_changes(struct domain *d,
                 count++;
             }
             break;
+        case CACHEFLUSH:
+            {
+                if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
+                {
+                    count++;
+                    break;
+                }
+
+                count += 0x10;
+
+                do_one_cacheflush(pte.p2m.base);
+            }
+            break;
         }
 
+        if ( last_mfn )
+            *last_mfn = addr >> PAGE_SHIFT;
+
         /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
-        if ( op == RELINQUISH && count >= 0x2000 )
+        switch ( op )
         {
-            if ( hypercall_preempt_check() )
+        case RELINQUISH:
+        case CACHEFLUSH:
+            if (count >= 0x2000 && hypercall_preempt_check() )
             {
                 p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
                 rc = -EAGAIN;
                 goto out;
             }
             count = 0;
+            break;
+        case INSERT:
+        case ALLOCATE:
+        case REMOVE:
+            /* No preemption */
+            break;
         }
 
         /* Got the next page */
@@ -438,7 +473,7 @@ int p2m_populate_ram(struct domain *d,
                      paddr_t end)
 {
     return apply_p2m_changes(d, ALLOCATE, start, end,
-                             0, MATTR_MEM, p2m_ram_rw);
+                             0, MATTR_MEM, p2m_ram_rw, NULL);
 }
 
 int map_mmio_regions(struct domain *d,
@@ -447,7 +482,7 @@ int map_mmio_regions(struct domain *d,
                      paddr_t maddr)
 {
     return apply_p2m_changes(d, INSERT, start_gaddr, end_gaddr,
-                             maddr, MATTR_DEV, p2m_mmio_direct);
+                             maddr, MATTR_DEV, p2m_mmio_direct, NULL);
 }
 
 int guest_physmap_add_entry(struct domain *d,
@@ -459,7 +494,7 @@ int guest_physmap_add_entry(struct domain *d,
     return apply_p2m_changes(d, INSERT,
                              pfn_to_paddr(gpfn),
                              pfn_to_paddr(gpfn + (1 << page_order)),
-                             pfn_to_paddr(mfn), MATTR_MEM, t);
+                             pfn_to_paddr(mfn), MATTR_MEM, t, NULL);
 }
 
 void guest_physmap_remove_page(struct domain *d,
@@ -469,7 +504,7 @@ void guest_physmap_remove_page(struct domain *d,
     apply_p2m_changes(d, REMOVE,
                       pfn_to_paddr(gpfn),
                       pfn_to_paddr(gpfn + (1<<page_order)),
-                      pfn_to_paddr(mfn), MATTR_MEM, p2m_invalid);
+                      pfn_to_paddr(mfn), MATTR_MEM, p2m_invalid, NULL);
 }
 
 int p2m_alloc_table(struct domain *d)
@@ -621,7 +656,20 @@ int relinquish_p2m_mapping(struct domain *d)
                               pfn_to_paddr(p2m->lowest_mapped_gfn),
                               pfn_to_paddr(p2m->max_mapped_gfn),
                               pfn_to_paddr(INVALID_MFN),
-                              MATTR_MEM, p2m_invalid);
+                              MATTR_MEM, p2m_invalid, NULL);
+}
+
+long p2m_cache_flush(struct domain *d, xen_pfn_t *start_mfn)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    *start_mfn = MAX(*start_mfn, p2m->next_gfn_to_relinquish);
+
+    return apply_p2m_changes(d, CACHEFLUSH,
+                             pfn_to_paddr(*start_mfn),
+                             pfn_to_paddr(p2m->max_mapped_gfn),
+                             pfn_to_paddr(INVALID_MFN),
+                             MATTR_MEM, p2m_invalid, start_mfn);
 }
 
 unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 91f01fa..d7b22c3 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -885,6 +885,13 @@ struct xen_domctl_set_max_evtchn {
 typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t);
 
+struct xen_domctl_cacheflush {
+    /* Updated for progress */
+    xen_pfn_t start_mfn;
+};
+typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -954,6 +961,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_setnodeaffinity               68
 #define XEN_DOMCTL_getnodeaffinity               69
 #define XEN_DOMCTL_set_max_evtchn                70
+#define XEN_DOMCTL_cacheflush                    71
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1012,6 +1020,7 @@ struct xen_domctl {
         struct xen_domctl_set_max_evtchn    set_max_evtchn;
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
         struct xen_domctl_set_broken_page_p2m set_broken_page_p2m;
+        struct xen_domctl_cacheflush        cacheflush;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         uint8_t                             pad[128];
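For reference, the preemption arithmetic in the apply_p2m_changes() hunk
above: a skipped (invalid or non-RAM) page adds 1 to count while a
flushed page adds 0x10, and preemption is checked once count reaches
0x2000. That works out to 0x2000 4KiB pages = 32MiB when nothing is
flushed, or 0x2000/0x10 = 0x200 pages = 2MiB of actually flushed memory,
matching the "2MiB (mapped) or 32 MiB (unmapped)" comment.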
Guests are initially started with caches disabled and so we need to make
sure they see consistent data in RAM (requiring a cache clean) but also
that they do not have old stale data suddenly appear in the caches when
they enable their caches (requiring the invalidate).

We need to clean all caches in order to catch both pages dirtied by the
domain builder and those which have been scrubbed but not yet flushed.
Separating the two (flushing scrubbed pages at scrub time and only
builder-dirtied pages here) would require tracking the dirtiness state
in the guest's p2m, perhaps via a new p2m type.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: jbeulich@suse.com
Cc: keir@xen.org
Cc: ian.jackson@eu.citrix.com
---
 tools/libxc/xc_domain.c     |  8 ++++++
 tools/libxc/xenctrl.h       |  1 +
 tools/libxl/libxl_create.c  |  3 ++
 xen/arch/arm/domctl.c       | 12 ++++++++
 xen/arch/arm/p2m.c          | 64 +++++++++++++++++++++++++++++++++++++------
 xen/include/public/domctl.h |  9 ++++++
 6 files changed, 89 insertions(+), 8 deletions(-)