Message ID | 20140609110356.GD25590@arm.com |
---|---|
State | New |
Headers | show |
On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > > > > __early_set_fixmap does not do any synchronization when called to set a > > > > > fixmap entry. Add call to flush_vmap_cache(). > > Did you hit a problem or it was just for safety? This fixes an abort when accessing early UEFI configuration tables on a hardware platform. (The issue can not be triggered with the FVM Base Model, even with cache modelling enabled.) > > > > I'm confused by the commit message mentioning synchronization but > > > > the code doing a cache flush. I see that arm64 implementation of > > > > flush_cache_vmap() is just a dsb(). If it is synchronization that > > > > we need here (and it certainly looks like we do), why not just add > > > > the dsb() directly to make that clear? > > > > > > It needs this Linux-semantically for the same reason remap_page_range > > > needs it. From the ARM architectural point of view, the reason is that > > > the translation table walk is considered a separate observer from the > > > core data interface. > > > > > > But since there is a common Linux semantic for this, I preferred > > > reusing that over just throwing in a dsb(). My interpretation of > > > flush_cache_vmap() was "flush mappings from cache, so they can be > > > picked up by table walk". While we don't technically need to flush the > > > cache here, the underlying requirement is the same. > > > > But the range you are flushing is not a range seen by the table walk > > observer. I just think it is clearer to explicitly show that it is > > the pte write which we want the table walk to see rather than to > > rely on the implicit behavior of a cache flush routine. > > I think that's a valid point. flush_cache_vmap() is used to remove any > cached entries for the ioremap/vmap'ed range. That's not the aim here. > > As an optimisation, set_pte() doesn't have a dsb(). We do this on the > clearing/invalidating path via the TLB flushing routines but not on the > re-enabling path. Here we just added dsb() in the relevant functions > that were called from the generic code (flush_cache_vmap, > update_mmu_cache). > > A quick grep through the kernel shows that we have other set_pte() calls > without additional dsb() like create_mapping(), I think kvm_set_pte() as > well. > > So I'm proposing an alternative patch (which needs some benchmarking as > well to see if anything is affected, maybe application startup time). I'm happy for any fix which can be included in 3.16. But is the dsb(ishst) sufficient? We need to also prevent reads from overtaking the set_pte(). i.e.: ptr = early_ioremap(phys_addr, size); if (ptr && strcmp(ptr, "magic") == 0) ... Does it not require a dsb(ish)? > ------------------8<------------------------------- > > From b5cd0fff5cb044fa32cbaa4eebd2f00690922567 Mon Sep 17 00:00:00 2001 > From: Catalin Marinas <catalin.marinas@arm.com> > Date: Mon, 9 Jun 2014 11:55:03 +0100 > Subject: [PATCH] arm64: Use DSB after page table update > > As an optimisation, set_pte() does not contain a DSB instruction to > ensure the observability of the page table update before subsequent > memory accesses or TLB maintenance. Such barrier is placed in the > flush_tlb_*() functions and flush_cache_vmap()/update_mmu_cache(). > However, there are still places where this barrier is missed like > create_mapping(). > > This patch adds a dsb(ishst) call in set_pte() but only when the entry > being written is valid. For invalid entries, there is always a > subsequent TLB maintenance containing the DSB. > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will.deacon@arm.com> > --- > arch/arm64/include/asm/cacheflush.h | 11 +---------- > arch/arm64/include/asm/pgtable.h | 8 ++++++++ > arch/arm64/include/asm/tlbflush.h | 5 ----- > 3 files changed, 9 insertions(+), 15 deletions(-) > > diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h > index a5176cf32dad..8fdf37d83014 100644 > --- a/arch/arm64/include/asm/cacheflush.h > +++ b/arch/arm64/include/asm/cacheflush.h > @@ -138,19 +138,10 @@ static inline void __flush_icache_all(void) > #define flush_icache_page(vma,page) do { } while (0) > > /* > - * flush_cache_vmap() is used when creating mappings (eg, via vmap, > - * vmalloc, ioremap etc) in kernel space for pages. On non-VIPT > - * caches, since the direct-mappings of these pages may contain cached > - * data, we need to do a full cache flush to ensure that writebacks > - * don't corrupt data placed into these pages via the new mappings. > + * Not required on AArch64 with VIPT caches. > */ > static inline void flush_cache_vmap(unsigned long start, unsigned long end) > { > - /* > - * set_pte_at() called from vmap_pte_range() does not > - * have a DSB after cleaning the cache line. > - */ > - dsb(ish); > } > > static inline void flush_cache_vunmap(unsigned long start, unsigned long end) > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h > index aa150ed99f22..2d03af6a2225 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -136,6 +136,7 @@ extern struct page *empty_zero_page; > #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE)) > #define pte_exec(pte) (!(pte_val(pte) & PTE_UXN)) > > +#define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) > #define pte_valid_user(pte) \ > ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) > > @@ -184,6 +185,13 @@ static inline pte_t pte_mkspecial(pte_t pte) > static inline void set_pte(pte_t *ptep, pte_t pte) > { > *ptep = pte; > + > + /* > + * If !pte_valid(pte), the DSB is handled by the TLB invalidation > + * operation. > + */ > + if (pte_valid(pte)) > + dsb(ishst); > } > > extern void __sync_icache_dcache(pte_t pteval, unsigned long addr); > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h > index b9349c4513ea..b28b970123dc 100644 > --- a/arch/arm64/include/asm/tlbflush.h > +++ b/arch/arm64/include/asm/tlbflush.h > @@ -130,11 +130,6 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end > static inline void update_mmu_cache(struct vm_area_struct *vma, > unsigned long addr, pte_t *ptep) > { > - /* > - * set_pte() does not have a DSB, so make sure that the page table > - * write is visible. > - */ > - dsb(ishst); > } > > #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote: > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > A quick grep through the kernel shows that we have other set_pte() calls > > without additional dsb() like create_mapping(), I think kvm_set_pte() as > > well. > > > > So I'm proposing an alternative patch (which needs some benchmarking as > > well to see if anything is affected, maybe application startup time). > > I'm happy for any fix which can be included in 3.16. Steve Capper made a point about performance. He'll follow up. > But is the dsb(ishst) sufficient? We need to also prevent reads from > overtaking the set_pte(). i.e.: > > ptr = early_ioremap(phys_addr, size); > if (ptr && strcmp(ptr, "magic") == 0) > ... > > Does it not require a dsb(ish)? So doesn't early_ioremap() now include a dsb() after set_pte() with my patch? BTW, according to the ARM ARM (and confirmed with architects), we needs DSB+ISB even if we have just a data access (rather than instruction fetch). We have to revisit both 32 and 64-bit code for this.
On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote: > On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote: > > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > > A quick grep through the kernel shows that we have other set_pte() calls > > > without additional dsb() like create_mapping(), I think kvm_set_pte() as > > > well. > > > > > > So I'm proposing an alternative patch (which needs some benchmarking as > > > well to see if anything is affected, maybe application startup time). > > > > I'm happy for any fix which can be included in 3.16. > > Steve Capper made a point about performance. He'll follow up. Hi :-). My concern was initially about splitting THPs, as with a 64K granule, we will have 2048 calls to set_pte_at in a loop. Having thought about it, these should be relatively rare events though. There will be an impact on various aspects of the memory system (for instance fork), but this will need to be measured. Cheers,
On Mon, Jun 09, 2014 at 05:40:05PM +0100, Steve Capper wrote: > On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote: > > On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote: > > > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > > > A quick grep through the kernel shows that we have other set_pte() calls > > > > without additional dsb() like create_mapping(), I think kvm_set_pte() as > > > > well. > > > > > > > > So I'm proposing an alternative patch (which needs some benchmarking as > > > > well to see if anything is affected, maybe application startup time). > > > > > > I'm happy for any fix which can be included in 3.16. > > > > Steve Capper made a point about performance. He'll follow up. > > My concern was initially about splitting THPs, as with a 64K granule, > we will have 2048 calls to set_pte_at in a loop. Having thought about > it, these should be relatively rare events though. Are there any THP benchmarks? > There will be an impact on various aspects of the memory system (for > instance fork), but this will need to be measured. There is but I don't think we have an easy around. Grep'ing the kernel for set_pte_at() shows several places where this is called without a corresponding DSB (currently only as part of flush_cache_vmap and update_mmu_cache). For example, insert_page(), though insert_pfn() calls update_mmu_cache() and it even has a comment "why not for insert page?". So we need to run some benchmarks to assess the impact. If significant, we can look at alternatives (in addition to the above, possibly adding barriers to arch_leave_lazy_mmu_mode).
On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote: > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > So I'm proposing an alternative patch (which needs some benchmarking as > > well to see if anything is affected, maybe application startup time). I don't like the proposed patch at all -- keeping the dsb out of set_pte is worthwhile if we can manage it. That said, it would be interesting to know how often we get a subsequent page fault after mapping invalid -> valid because of the missing dsb. It could be that the cost of the benign fault is hitting us more than we think. > I'm happy for any fix which can be included in 3.16. > > But is the dsb(ishst) sufficient? We need to also prevent reads from > overtaking the set_pte(). i.e.: > > ptr = early_ioremap(phys_addr, size); > if (ptr && strcmp(ptr, "magic") == 0) > ... > > Does it not require a dsb(ish)? I don't think so. Crudely, the sequence above would look like: STR x0, [PTEP] DSB ISHST LDR x0, [MAGIC] The DSB can't complete until the STR is globally observed within the inner-shareable domain, so the LDR cannot execute until the page table update is visible to the walker. If it was a DMB, we'd have a problem. Interestingly, the asm-generic page table allocators (e.g. __pmd_alloc) *do* use dmb for ordering observability of page-table updates via smp_wmb. I'm struggling to decide whether that's broken or not. Will -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote: > On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote: > > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote: > > > A quick grep through the kernel shows that we have other set_pte() calls > > > without additional dsb() like create_mapping(), I think kvm_set_pte() as > > > well. > > > > > > So I'm proposing an alternative patch (which needs some benchmarking as > > > well to see if anything is affected, maybe application startup time). > > > > I'm happy for any fix which can be included in 3.16. > > Steve Capper made a point about performance. He'll follow up. > > > But is the dsb(ishst) sufficient? We need to also prevent reads from > > overtaking the set_pte(). i.e.: > > > > ptr = early_ioremap(phys_addr, size); > > if (ptr && strcmp(ptr, "magic") == 0) > > ... > > > > Does it not require a dsb(ish)? > > So doesn't early_ioremap() now include a dsb() after set_pte() with my > patch? > > BTW, according to the ARM ARM (and confirmed with architects), we needs > DSB+ISB even if we have just a data access (rather than instruction > fetch). We have to revisit both 32 and 64-bit code for this. The impact of this really depends on whether or not we need to sandwich DSB+ISB between every update to a set of page tables (possibly plumbing together different levels) or whether the ISB can just come at the end. If the latter case is true, we just need to add something for kernel mappings, which should be pretty low overhead, Will -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h index a5176cf32dad..8fdf37d83014 100644 --- a/arch/arm64/include/asm/cacheflush.h +++ b/arch/arm64/include/asm/cacheflush.h @@ -138,19 +138,10 @@ static inline void __flush_icache_all(void) #define flush_icache_page(vma,page) do { } while (0) /* - * flush_cache_vmap() is used when creating mappings (eg, via vmap, - * vmalloc, ioremap etc) in kernel space for pages. On non-VIPT - * caches, since the direct-mappings of these pages may contain cached - * data, we need to do a full cache flush to ensure that writebacks - * don't corrupt data placed into these pages via the new mappings. + * Not required on AArch64 with VIPT caches. */ static inline void flush_cache_vmap(unsigned long start, unsigned long end) { - /* - * set_pte_at() called from vmap_pte_range() does not - * have a DSB after cleaning the cache line. - */ - dsb(ish); } static inline void flush_cache_vunmap(unsigned long start, unsigned long end) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index aa150ed99f22..2d03af6a2225 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -136,6 +136,7 @@ extern struct page *empty_zero_page; #define pte_write(pte) (!!(pte_val(pte) & PTE_WRITE)) #define pte_exec(pte) (!(pte_val(pte) & PTE_UXN)) +#define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) #define pte_valid_user(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) @@ -184,6 +185,13 @@ static inline pte_t pte_mkspecial(pte_t pte) static inline void set_pte(pte_t *ptep, pte_t pte) { *ptep = pte; + + /* + * If !pte_valid(pte), the DSB is handled by the TLB invalidation + * operation. + */ + if (pte_valid(pte)) + dsb(ishst); } extern void __sync_icache_dcache(pte_t pteval, unsigned long addr); diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index b9349c4513ea..b28b970123dc 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -130,11 +130,6 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { - /* - * set_pte() does not have a DSB, so make sure that the page table - * write is visible. - */ - dsb(ishst); } #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)