Message ID: 5347655B.3080307@linaro.org
State: New
From: David Long <dave.long@linaro.org>
Date: Thu, 10 Apr 2014 23:45:31 -0400

> Replace memcpy and dcache flush in generic uprobes with a call to
> copy_to_user_page(), which will do a proper flushing of kernel and
> user cache. Also modify the implementation of copy_to_user_page
> to assume a NULL vma pointer means the user icache corresponding
> to this write is stale and needs to be flushed. Note that this patch
> does not fix copy_to_user_page for the sh, alpha, sparc, or mips
> architectures (which do not currently support uprobes).
>
> Signed-off-by: David A. Long <dave.long@linaro.org>

You really need to pass the proper VMA down to the call site rather
than pass NULL, that's extremely ugly and totally unnecessary.
On 04/10, David Long wrote:
>
>  static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
>  {
>  	void *kaddr = kmap_atomic(page);
> -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> +	copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len);

No, no, this is not what we want... And I do not think we should change
copy_to_user_page().

I'll write another email...

Oleg.
On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> On 04/11, Oleg Nesterov wrote:
> >
> > Can't we do _something_
> > like below?
>
> If not, I'd propose the patch below.
>
> I can be easily wrong, but it seems that arch/arm can reimplement
> arch_uprobe_flush_xol_icache() and do flush_ptrace_access()-like
> code. It needs kaddr, but this is not a problem.
>
> Btw. From arch/arm/include/asm/cacheflush.h
>
> 	#define flush_icache_user_range(vma,page,addr,len) \
> 		flush_dcache_page(page)
>
> but it has no users?

I wonder whether you've read this yet:

http://lkml.iu.edu//hypermail/linux/kernel/1404.1/00725.html

where I proposed removing flush_icache_user_range() since it's not used
on a great many architectures.

> And I am just curious, why arm's copy_to_user_page() disables preemption
> before memcpy?

flush_ptrace_access() needs to run on the CPU which ended up with the
dirty cache line(s) to cope with those which do not have hardware
broadcasting of cache maintenance operations.

This is why the hacks that you're doing are just that - they're hacks
and are all broken in some way.

So, let me re-quote what I asked in a previous thread:

| Given that we've already solved that problem, wouldn't it be a good idea
| if the tracing code would stop trying to reinvent broken solutions to
| problems we have already solved?

I fail to see what your problem is with keeping the vma around, and
using that infrastructure. If it needs optimisation for uprobes, then
let's optimise it. Let's not go inventing a whole new interface solving
the same problem.
On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> And I am just curious, why arm's copy_to_user_page() disables preemption
> before memcpy?

Without looking, I suspect it's because of the VIVT caches; they need to
get shot down on every context switch.
On 11 April 2014 07:56, Oleg Nesterov <oleg@redhat.com> wrote: > First of all: I do not pretend I really understand the problems with > icache/etc coherency and how flush_icache_range() actually works on > alpha. Help. > > For those who were not cc'ed. A probed task has a special "xol" vma, > it is used to execute the probed insn out of line. > > Initial implementation was x86 only, so we simply copied the probed > insn into this vma and everything was fine, flush_icache_range() is > nop on x86. > > Then we added flush_dcache_page() for powerpc, > > // this is just kmap() + memcpy() > copy_to_page(area->page, xol_vaddr, > &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)); > > /* > * We probably need flush_icache_user_range() but it needs vma. > * This should work on supported architectures too. > */ > flush_dcache_page(area->page); > > but this doesn't work on arm. So we need another fix. > > On 04/11, David Miller wrote: >> >> From: David Long <dave.long@linaro.org> >> Date: Thu, 10 Apr 2014 23:45:31 -0400 >> >> > Replace memcpy and dcache flush in generic uprobes with a call to >> > copy_to_user_page(), which will do a proper flushing of kernel and >> > user cache. Also modify the inmplementation of copy_to_user_page >> > to assume a NULL vma pointer means the user icache corresponding >> > to this right is stale and needs to be flushed. Note that this patch >> > does not fix copy_to_user page for the sh, alpha, sparc, or mips >> > architectures (which do not currently support uprobes). >> > >> > Signed-off-by: David A. Long <dave.long@linaro.org> >> >> You really need to pass the proper VMA down to the call site >> rather than pass NULL, that's extremely ugly and totally >> unnecesary. > > I agree that we should not change copy_to_user_page(), but I am not sure > the code above should use copy_to_user_page(). > > Because the code above really differs in my opinion from the only user of > copy_to_user_page(), __access_remote_vm(). > > 1. First of all, we do not know vma. > > OK, we can down_read(mmap_sem) and do find_vma() of course. > This is a bit unfortunate, especially because the architectures > we currently support do not need this. Question, maybe silly one but I don't know the answer, why can't we just do look up for vma once and cache results in place like xol_area (along with xol_area.vaddr) and use it all the time. IOW under what circumstances vma for xol area can disappear change so we need constant lookup for it? Comment in xol_area > /* > * We keep the vma's vm_start rather than a pointer to the vma > * itself. The probed process or a naughty kernel module could make > * the vma go away, and we must handle that reasonably gracefully. > */ > unsigned long vaddr; /* Page(s) of instruction slots */ alludes to some of those conditions, but I don't quite follow. Should not we go after "probed process" ability to unmap xol area. xol area is like vdso, signal page and other mapping injected by kernel into user process address space, mmap call should ignore those.. I wonder what would happen if process would try to unmap vdso region. > But, > > 2. The problem is, it would be very nice to remove this vma, or > at least hide it somehow from find_vma/etc. This is the special > mapping we do not want to expose to user-space. > > In fact I even have the patches which remove this vma, but they > do not work with compat tasks unfortunately. I don't think it is right route. Xol area as well as vdso, signal page, etc should be visible as regular VMAs. 
There are other aspects of the system where they needed. Like core file collection - I would like to have xol area present in my core file if traced process crashed. Like /porc/<pid>/maps - I would like to see my memory layout through this interface and I would like to see xol area there because I can see xol area addresses by some other means. > 3. Unlike __access_remote_vm() we always use current->mm, and the > memory range changed by the code above can only be used by > "current" thread. > > So (perhaps) flush_icache_user_range() can even treat this case > as "mm->mm_users) <= 1" and avoid ipi_flush_icache_page (just in > case, of course I do not know if this is actually possible). > > So can't we do something else? Lets forget about arch/arm for the moment, > suppose that we want to support uprobes on alpha. Can't we do _something_ > like below? Appeal of copy_to_user_page approach is that I don't need to know how to handle sync up of icache and dcache on that architecture, it is already done by someone else when they programmed basic ptrace breakpoint write behavior. Thanks, Victor > And perhap we can do something like this on arm? it can do kmap(page) / > page_address(page) itself. > > Oleg. > > --- x/arch/alpha/kernel/smp.c > +++ x/arch/alpha/kernel/smp.c > @@ -751,15 +751,9 @@ ipi_flush_icache_page(void *x) > flush_tlb_other(mm); > } > > -void > -flush_icache_user_range(struct vm_area_struct *vma, struct page *page, > - unsigned long addr, int len) > -{ > - struct mm_struct *mm = vma->vm_mm; > - > - if ((vma->vm_flags & VM_EXEC) == 0) > - return; > > +void __flush_icache_page_xxx(struct mm_struct *mm, struct page *page) // addr, len ? > +{ > preempt_disable(); > > if (mm == current->active_mm) { > @@ -783,3 +777,24 @@ flush_icache_user_range(struct vm_area_s > > preempt_enable(); > } > + > +void flush_icache_page_xxx(struct mm_struct *mm, struct page *page) > +{ > + struct mm_struct *mm = current->mm; > + > + down_read(&mm->mmap_sem); > + __flush_icache_page_xxx(mm, page); > + up_read(&mm->mmap_sem); > +} > + > +void > +flush_icache_user_range(struct vm_area_struct *vma, struct page *page, > + unsigned long addr, int len) > +{ > + struct mm_struct *mm = vma->vm_mm; > + > + if ((vma->vm_flags & VM_EXEC) == 0) > + return; > + > + __flush_icache_page_xxx(mm, page); > +} >
On Fri, Apr 11, 2014 at 7:56 AM, Oleg Nesterov <oleg@redhat.com> wrote: > First of all: I do not pretend I really understand the problems with > icache/etc coherency and how flush_icache_range() actually works on > alpha. Help. According to the alpha architecture rules, the instruction cache can be completely virtual, and is not only not coherent with the data cache, it's not even necessarily coherent with TLB mapping changes (ie it's purely virtual, and you need to flush it if you change instruction mappings). The virtual caches do have an address space number, so you can have multiple separate virtual address spaces. The way to flush it is with the "imb" instruction (which is not actually an instruction at all, it's a jump to PAL-code, alpha's "explicit microcode") That means that when you modify data that could be code, you do need to do an "imb" _and_ you do need to do it cross-cpu even for thread-local cases in case your thread migrates to another CPU with stale I$ data (the ASN will be the same). You can use the usual VM cpu-mask to tell which other CPU's you'd need to do it on, though. But alpha does not need page or addr/len, because "imb" is "make the whole instruction cache coherent". Your patch looks correct for alpha, afaik. Linus
On Fri, Apr 11, 2014 at 05:32:49PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> > And I am just curious, why arm's copy_to_user_page() disables preemption
> > before memcpy?
>
> Without looking, I suspect it's because of the VIVT caches; they need to
> get shot down on every context switch.

So... let's think about that for a moment... if we have a preemption
event, then that's a context switch, which means...

No, this is obviously not the reason, because such an event on a fully
VIVT system would result in the caches being flushed, meaning that we
wouldn't need to do anything if we could be predictably preempted at
that point.
On 04/11, Victor Kamensky wrote: > > On 11 April 2014 07:56, Oleg Nesterov <oleg@redhat.com> wrote: > > > > 1. First of all, we do not know vma. > > > > OK, we can down_read(mmap_sem) and do find_vma() of course. > > This is a bit unfortunate, especially because the architectures > > we currently support do not need this. > > Question, maybe silly one but I don't know the answer, why can't we just do > look up for vma once and cache results in place like xol_area (along with > xol_area.vaddr) and use it all the time. IOW under what circumstances > vma for xol area can disappear change so we need constant lookup for it? > Comment in xol_area > > > /* > > * We keep the vma's vm_start rather than a pointer to the vma > > * itself. The probed process or a naughty kernel module could make > > * the vma go away, and we must handle that reasonably gracefully. > > */ > > unsigned long vaddr; /* Page(s) of instruction slots */ > > alludes to some of those conditions, but I don't quite follow. > Should not we go after "probed process" ability to unmap xol area. > xol area is like vdso, But it is not like vdso. And (unlike vsyscall page) vdso can be unmapped too (unless it is FIX_VDSO). > mmap call should ignore > those.. This is not that simple, this means more ugly uprobe_ hooks in mm/. And I think we simply do not want/need this. I didn't write the comment above, but "reasonably gracefully" should mean "we should not allow unmap/remap/etc(xol_area) crash the kernel, the task can crashif it does this, we do not care". The same for vdso, except in this case the kernel can simply forget about this area after it does setup_additional_pages(). > > 2. The problem is, it would be very nice to remove this vma, or > > at least hide it somehow from find_vma/etc. This is the special > > mapping we do not want to expose to user-space. > > > > In fact I even have the patches which remove this vma, but they > > do not work with compat tasks unfortunately. > > I don't think it is right route. Xol area as well as vdso, signal page, etc > should be visible as regular VMAs. There are other aspects of the system > where they needed. Like core file collection - I would like to have > xol area present in my core file if traced process crashed. It must never crash in xol_area, or we have a kernel bug. (we do have such a bug which I am trying to fix right now ;) > /porc/<pid>/maps - I would like to see my memory layout through > this interface and I would like to see xol area there because I > can see xol area addresses by some other means. But it is not "your memory", to some degree. I mean, it would be nice if it was not. This should be more like vsyscall page. And indeed, we can move this into FIXMAP area. The only problem, 32bit task can't use this area in 64-bit machine. > Appeal of copy_to_user_page approach is that I don't need to know > how to handle sync up of icache and dcache on that architecture, Yes, sure, this is true. > it is > already done by someone else when they programmed basic ptrace > breakpoint write behavior. Yes, but (rightly or not) I still think that uprobes differs from ptrace. Perhaps we do not have other choice though. Oleg.
On 04/11, Oleg Nesterov wrote: > > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr, > + struct arch_uprobe *auprobe) > +{ > +#ifndef ARCH_UPROBE_XXX > + copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol)); > + /* > + * We probably need flush_icache_user_range() but it needs vma. > + * If this doesn't work define ARCH_UPROBE_XXX. > + */ > + flush_dcache_page(area->page); > +#else > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma; > + > + down_read(&mm->mmap_sem); > + vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE); > + if (vma) { > + void *kaddr = kmap_atomic(area->page); > + copy_to_user_page(vma, area->page, > + vaddr, kaddr + (vaddr & ~PAGE_MASK), > + &auprobe->ixol, sizeof(&auprobe->ixol)); > + kunmap_atomic(kaddr); > + } > + up_read(&mm->mmap_sem); > +#endif And perhaps the patch is not complete. "if (vma)" is not enough, a probed task can mmap something else at this vaddr. copy_to_user_page() should only change the contents of area->page, so memcpy should be fine. But I am not sure that flush_icache_user_range() or flush_ptrace_access() is always safe on every arch if "struct page *page" doesn't match vma. Oleg.
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Fri, 11 Apr 2014 16:30:41 +0100

> | Given that we've already solved that problem, wouldn't it be a good idea
> | if the tracing code would stop trying to reinvent broken solutions to
> | problems we have already solved?
>
> I fail to see what your problem is with keeping the vma around, and
> using that infrastructure. If it needs optimisation for uprobes, then
> let's optimise it. Let's not go inventing a whole new interface
> solving the same problem.

+1
On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote: > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr, > + struct arch_uprobe *auprobe) > +{ > +#ifndef ARCH_UPROBE_XXX > + copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol)); > + /* > + * We probably need flush_icache_user_range() but it needs vma. > + * If this doesn't work define ARCH_UPROBE_XXX. > + */ > + flush_dcache_page(area->page); > +#else > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma; > + > + down_read(&mm->mmap_sem); > + vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE); > + if (vma) { > + void *kaddr = kmap_atomic(area->page); > + copy_to_user_page(vma, area->page, > + vaddr, kaddr + (vaddr & ~PAGE_MASK), > + &auprobe->ixol, sizeof(&auprobe->ixol)); > + kunmap_atomic(kaddr); > + } > + up_read(&mm->mmap_sem); > +#endif Yeah, no, this is wrong. the fact is, the *only* possible use for the whole "vma" argument is the "can this be executable" optimization, afaik. So I really think we should just have a fixed "flush_icache_page(page,vaddr)" function. Maybe add a "len" argument, to allow architectures that have to loop over cachelines to do just a minimal loop. Then, to do the vma optimization, let's introduce a new arch_needs_icache_flush(vma, page) function, which on cache coherent architectures can just be zero (or one, since the icache flush itself will be a no-op, so it doesn't really matter), and on others it can do the "have we executed from this page", which may involve just looking at the vma.. Then the uprobe case can just do copy_to_page() flush_dcache_page() flush_icache_page() and be done with it. Hmm? Linus
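For concreteness, the uprobe slot-copy path under this proposal would reduce to roughly the sketch below. Note that it uses the *proposed* two-argument flush_icache_page(page, vaddr), not the vma-taking helper that exists in the tree today, and xol_copy_insn() is just an illustrative name:

/* Illustrative sketch of the proposal above -- not existing kernel code. */
static void xol_copy_insn(struct xol_area *area, unsigned long xol_vaddr,
                          struct arch_uprobe *auprobe)
{
        /* Plain kmap + memcpy into the xol page, as copy_to_page() does today. */
        copy_to_page(area->page, xol_vaddr,
                     &auprobe->ixol, sizeof(auprobe->ixol));

        /* Push the new instruction out of the D-cache... */
        flush_dcache_page(area->page);

        /*
         * ...and invalidate any stale I-cache lines for this page/address.
         * No vma is needed; cache-coherent architectures can make this a
         * no-op.
         */
        flush_icache_page(area->page, xol_vaddr);
}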
From: Oleg Nesterov <oleg@redhat.com> Date: Fri, 11 Apr 2014 19:38:53 +0200 > And perhaps the patch is not complete. "if (vma)" is not enough, a probed > task can mmap something else at this vaddr. > > copy_to_user_page() should only change the contents of area->page, so memcpy > should be fine. But I am not sure that flush_icache_user_range() or > flush_ptrace_access() is always safe on every arch if "struct page *page" > doesn't match vma. The architectures want the VMA for two reasons: 1) To get at the 'mm'. The 'mm' is absolutely essential so that we can look at the MM cpumask and therefore determine what cpus this address space has executed upon, and therefore what cpus need the flush broadcast to. 2) To determine if the VMA is executable, in order to avoid the I-cache flush if possible. I think you can get at the 'mm' trivially in this uprobes path, and we can just as well assume that the VMA is executable since this thing is always writing instructions. So we could create a __copy_to_user_page() that takes an 'mm' and a boolean 'executable' which uprobes could unconditionally set true, and copy_to_user_page() would then be implemented in terms of __copy_to_user_page().
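A sketch of what that split might look like follows; the signatures and the arch_flush_user_page() hook named here are hypothetical, purely to make the proposal concrete:

/* Hypothetical interface sketch -- not an existing kernel API. */
void __copy_to_user_page(struct mm_struct *mm, struct page *page,
                         unsigned long uaddr, void *dst, const void *src,
                         unsigned long len, bool executable)
{
        memcpy(dst, src, len);
        /*
         * Arch code consults mm_cpumask(mm) to decide which CPUs need a
         * broadcast flush, and skips the I-cache maintenance when
         * 'executable' is false.
         */
        arch_flush_user_page(mm, page, uaddr, dst, len, executable);
}

/* Existing callers keep the vma-based wrapper. */
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
                       unsigned long uaddr, void *dst, const void *src,
                       unsigned long len)
{
        __copy_to_user_page(vma->vm_mm, page, uaddr, dst, src, len,
                            !!(vma->vm_flags & VM_EXEC));
}

uprobes would then call __copy_to_user_page(current->mm, ..., true) directly and never need the xol vma.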
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 11 Apr 2014 10:50:31 -0700 > So I really think we should just have a fixed > "flush_icache_page(page,vaddr)" function. Maybe add a "len" argument, > to allow architectures that have to loop over cachelines to do just a > minimal loop. It's not enough, we need to have the 'mm' so we can know what cpu's this address space has executed upon, and therefore what cpus need the broadcast flush. See my other reply, we can just make a __copy_to_user_page() that takes 'mm' and a boolean 'executable' which uprobes can unconditionally pass as true.
On Fri, Apr 11, 2014 at 11:02 AM, David Miller <davem@davemloft.net> wrote: > > It's not enough, we need to have the 'mm' so we can know what cpu's this > address space has executed upon, and therefore what cpus need the broadcast > flush. Ok. But still, it shouldn't need "vma". > See my other reply, we can just make a __copy_to_user_page() that takes 'mm' > and a boolean 'executable' which uprobes can unconditionally pass as true. Sure, that doesn't look disgusting. That said, I thought at least one architecture (powerpc) did more than just check the executable bit: I think somebody actually does a page-per-page "has this been mapped executably" thing because their icache flush is *so* expensive. So that boolean "executable" bit is potentially architecture-specific. And quite frankly, using the "vma->vm_flags" sounds potentially *incorrect* to me, since it really isn't about the vma. If you change a page through a non-executable vma, you'd want to flush the icache entry for that page mapped in a totally different vma. So I really get the feeling that passing in "vma" is actively *wrong*. The vma interface really makes little to no sense. Hmm? Linus
On 11 April 2014 11:02, David Miller <davem@davemloft.net> wrote: > From: Linus Torvalds <torvalds@linux-foundation.org> > Date: Fri, 11 Apr 2014 10:50:31 -0700 > >> So I really think we should just have a fixed >> "flush_icache_page(page,vaddr)" function. Maybe add a "len" argument, >> to allow architectures that have to loop over cachelines to do just a >> minimal loop. > > It's not enough, we need to have the 'mm' so we can know what cpu's this > address space has executed upon, and therefore what cpus need the broadcast > flush. But in uprobes case xol slot where instruction write happened will be used only by current CPU. The way I read uprobes code other core when it hit the same uprobe address will use different xol slot. Xol slot size is cache line so it will not be moved around. So as long as we know for sure that while tasks performs single step on uprobe xol area instruction it won't be migrated to another core we don't need to do broadcast to any other cores. Thanks, Victor > See my other reply, we can just make a __copy_to_user_page() that takes 'mm' > and a boolean 'executable' which uprobes can unconditionally pass as true.
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 11 Apr 2014 11:11:33 -0700 > And quite frankly, using the "vma->vm_flags" sounds potentially > *incorrect* to me, since it really isn't about the vma. If you change > a page through a non-executable vma, you'd want to flush the icache > entry for that page mapped in a totally different vma. So I really get > the feeling that passing in "vma" is actively *wrong*. The vma > interface really makes little to no sense. > > Hmm? The vm_flags check is about "could it have gotten into the I-cache via this VMA". If the VMA protections change, we'd do a flush of some sort during that change.
On Fri, Apr 11, 2014 at 11:19 AM, David Miller <davem@davemloft.net> wrote: > > The vm_flags check is about "could it have gotten into the I-cache > via this VMA". .. and that's obviously complete bullshit and wrong. Which is my point. Now, it's possible that doing things right is just too much work for architectures that don't even matter, but dammit, it's still wrong. If you change a page, and it's executably mapped into some other vma, the icache is possibly stale there. The whole _point_ of our cache flushing is to make caches coherent, and anything that uses "vma" to do so is *wrong*. So your argument makes no sense. You're just re-stating that "it's wrong", but you're re-stating it in a way that makes it sounds like it could be right. The "this page has been mapped executably" approach, in contrast, is *correct*. It has a chance in hell of actually making caches coherent. Linus
On 04/11, David Miller wrote: > > From: Oleg Nesterov <oleg@redhat.com> > Date: Fri, 11 Apr 2014 19:38:53 +0200 > > > And perhaps the patch is not complete. "if (vma)" is not enough, a probed > > task can mmap something else at this vaddr. > > > > copy_to_user_page() should only change the contents of area->page, so memcpy > > should be fine. But I am not sure that flush_icache_user_range() or > > flush_ptrace_access() is always safe on every arch if "struct page *page" > > doesn't match vma. > > The architectures want the VMA for two reasons: > > 1) To get at the 'mm'. The 'mm' is absolutely essential so that we can look > at the MM cpumask and therefore determine what cpus this address space has > executed upon, and therefore what cpus need the flush broadcast to. > > 2) To determine if the VMA is executable, in order to avoid the I-cache flush > if possible. Yes, thanks, this is clear. > I think you can get at the 'mm' trivially in this uprobes path, sure, it is always current->mm. > and we can just > as well assume that the VMA is executable since this thing is always writing > instructions. yes. > So we could create a __copy_to_user_page() that takes an 'mm' and a boolean > 'executable' which uprobes could unconditionally set true, and copy_to_user_page() > would then be implemented in terms of __copy_to_user_page(). This needs a lot of per-arch changes. Plus, it seems, in general VM_EXEC is not the only thing __copy_to_user_page() should take into account... But at least we are starting to agree that we need something else ;) Oleg.
On 04/11, Victor Kamensky wrote: > > On 11 April 2014 11:02, David Miller <davem@davemloft.net> wrote: > But in uprobes case xol slot where instruction write happened will be > used only by current CPU. The way I read uprobes code other core > when it hit the same uprobe address will use different xol slot. Xol slot > size is cache line so it will not be moved around. Yes. > So as long as we > know for sure that while tasks performs single step on uprobe xol > area instruction it won't be migrated to another core we don't need to > do broadcast to any other cores. It can migrate to another CPU before it does single-step. Oleg.
On Fri, Apr 11, 2014 at 05:00:29PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 11, 2014 at 05:32:49PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 11, 2014 at 05:22:07PM +0200, Oleg Nesterov wrote:
> > > And I am just curious, why arm's copy_to_user_page() disables preemption
> > > before memcpy?
> >
> > Without looking, I suspect it's because of the VIVT caches; they need to
> > get shot down on every context switch.
>
> So... let's think about that for a moment... if we have a preemption
> event, then that's a context switch, which means...
>
> No, this is obviously not the reason, because such an event on a fully
> VIVT system would result in the caches being flushed, meaning that we
> wouldn't need to do anything if we could be predictably preempted at
> that point.

Yeah; I've since realized I was completely wrong about that. Thanks for
explaining though.
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 11 Apr 2014 11:24:58 -0700 > On Fri, Apr 11, 2014 at 11:19 AM, David Miller <davem@davemloft.net> wrote: >> >> The vm_flags check is about "could it have gotten into the I-cache >> via this VMA". > > .. and that's obviously complete bullshit and wrong. Which is my point. > > Now, it's possible that doing things right is just too much work for > architectures that don't even matter, but dammit, it's still wrong. If > you change a page, and it's executably mapped into some other vma, the > icache is possibly stale there. The whole _point_ of our cache > flushing is to make caches coherent, and anything that uses "vma" to > do so is *wrong*. > > So your argument makes no sense. You're just re-stating that "it's > wrong", but you're re-stating it in a way that makes it sounds like it > could be right. > > The "this page has been mapped executably" approach, in contrast, is > *correct*. It has a chance in hell of actually making caches coherent. You're right that using VMA as a hint during ptrace accesses is bogus. If it's writeable, shared, and executable, we won't do the right thing. Since we do most of the cache flushing stuff during normal operations at the PTE modification point, perhaps a piece of page state could be used to handle this. We already use such a thing for D-cache alias flushing.
On Fri, Apr 11, 2014 at 11:58 AM, David Miller <davem@davemloft.net> wrote:
>
> Since we do most of the cache flushing stuff during normal operations
> at the PTE modification point, perhaps a piece of page state could be
> used to handle this. We already use such a thing for D-cache alias
> flushing.

So looking at the powerpc code, I thought ppc already did this, but it
seems to do something different: it lazily does the icache flush at page
fault time if the page has been marked by dcache flush (with the
PG_arch_1 bit indicating whether the page is coherent in the I$).

But I don't see it trying to actually flush the icache of already mapped
processes when modifying the dcache. So while we *could* do that,
apparently no architecture does this. Even the one architecture that I
thought did it doesn't really try to make things globally coherent.

(My "analysis" was mainly using "git grep", so maybe I missed something).

               Linus
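For readers unfamiliar with the lazy scheme being referred to, the pattern is roughly the following (a heavily simplified sketch of the powerpc-style approach, not the actual code; PG_arch_1 is a real page flag, flush_dcache_icache_page() stands in for the arch-specific flush):

/* Sketch of lazy I-cache flushing keyed off PG_arch_1. */
void flush_dcache_page(struct page *page)
{
        /* The page data was written: remember the I-cache may now be stale. */
        clear_bit(PG_arch_1, &page->flags);
}

void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
                      pte_t *ptep)
{
        struct page *page = pte_page(*ptep);

        /* Only when the page is about to be executed is the I-cache synced. */
        if ((vma->vm_flags & VM_EXEC) &&
            !test_and_set_bit(PG_arch_1, &page->flags))
                flush_dcache_icache_page(page);   /* arch-specific helper */
}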
On 04/11, Linus Torvalds wrote: > > On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr, > > + struct arch_uprobe *auprobe) > > +{ > > +#ifndef ARCH_UPROBE_XXX > > + copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol)); > > + /* > > + * We probably need flush_icache_user_range() but it needs vma. > > + * If this doesn't work define ARCH_UPROBE_XXX. > > + */ > > + flush_dcache_page(area->page); > > +#else > > + struct mm_struct *mm = current->mm; > > + struct vm_area_struct *vma; > > + > > + down_read(&mm->mmap_sem); > > + vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE); > > + if (vma) { > > + void *kaddr = kmap_atomic(area->page); > > + copy_to_user_page(vma, area->page, > > + vaddr, kaddr + (vaddr & ~PAGE_MASK), > > + &auprobe->ixol, sizeof(&auprobe->ixol)); > > + kunmap_atomic(kaddr); > > + } > > + up_read(&mm->mmap_sem); > > +#endif > > Yeah, no, this is wrong. Yesss, agreed. > So I really think we should just have a fixed > "flush_icache_page(page,vaddr)" function. > ... > Then the uprobe case can just do > > copy_to_page() > flush_dcache_page() > flush_icache_page() And I obviously like this idea because (iiuc) it more or less matches flush_icache_page_xxx() I tried to suggest. But we need a short term solution for arm. And unless I misunderstood Russell (this is quite possible), arm needs to disable preemption around copy + flush. Russel, so what do you think we can do for arm right now? Does the patch above (and subsequent discussion) answer the "why reinvent" question ? Oleg.
On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote: > On 04/11, Linus Torvalds wrote: >> >> On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote: >> > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr, >> > + struct arch_uprobe *auprobe) >> > +{ >> > +#ifndef ARCH_UPROBE_XXX >> > + copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol)); >> > + /* >> > + * We probably need flush_icache_user_range() but it needs vma. >> > + * If this doesn't work define ARCH_UPROBE_XXX. >> > + */ >> > + flush_dcache_page(area->page); >> > +#else >> > + struct mm_struct *mm = current->mm; >> > + struct vm_area_struct *vma; >> > + >> > + down_read(&mm->mmap_sem); >> > + vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE); >> > + if (vma) { >> > + void *kaddr = kmap_atomic(area->page); >> > + copy_to_user_page(vma, area->page, >> > + vaddr, kaddr + (vaddr & ~PAGE_MASK), >> > + &auprobe->ixol, sizeof(&auprobe->ixol)); >> > + kunmap_atomic(kaddr); >> > + } >> > + up_read(&mm->mmap_sem); >> > +#endif >> >> Yeah, no, this is wrong. > > Yesss, agreed. > >> So I really think we should just have a fixed >> "flush_icache_page(page,vaddr)" function. >> ... >> Then the uprobe case can just do >> >> copy_to_page() >> flush_dcache_page() >> flush_icache_page() > > > And I obviously like this idea because (iiuc) it more or less matches > flush_icache_page_xxx() I tried to suggest. Would not page granularity to be too expensive? Note you need to do that on each probe hit and you flushing whole data and instruction page every time. IMHO it will work correctly when you flush just few dcache/icache lines that correspond to xol slot that got modified. Note copy_to_user_page takes len that describes size of area that has to be flushed. Given that we are flushing xol area page at this case; and nothing except one xol slot is any interest for current task, and if CPU can flush one dcache and icache page as quickly as it can flush range, my remark may not matter. I personally would prefer if we could have function like copy_to_user_page but without requirement to pass vma to it. Thanks, Victor > But we need a short term solution for arm. And unless I misunderstood > Russell (this is quite possible), arm needs to disable preemption around > copy + flush. > > Russel, so what do you think we can do for arm right now? Does the patch > above (and subsequent discussion) answer the "why reinvent" question ? > > Oleg. >
On 14 April 2014 13:05, Victor Kamensky <victor.kamensky@linaro.org> wrote: > On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote: >> On 04/11, Linus Torvalds wrote: >>> >>> On Fri, Apr 11, 2014 at 10:24 AM, Oleg Nesterov <oleg@redhat.com> wrote: >>> > +static void arch_uprobe_copy_ixol(struct xol_area *area, unsigned long vaddr, >>> > + struct arch_uprobe *auprobe) >>> > +{ >>> > +#ifndef ARCH_UPROBE_XXX >>> > + copy_to_page(area->page, vaddr, &auprobe->ixol, sizeof(&auprobe->ixol)); >>> > + /* >>> > + * We probably need flush_icache_user_range() but it needs vma. >>> > + * If this doesn't work define ARCH_UPROBE_XXX. >>> > + */ >>> > + flush_dcache_page(area->page); >>> > +#else >>> > + struct mm_struct *mm = current->mm; >>> > + struct vm_area_struct *vma; >>> > + >>> > + down_read(&mm->mmap_sem); >>> > + vma = find_exact_vma(mm, area->vaddr, area->vaddr + PAGE_SIZE); >>> > + if (vma) { >>> > + void *kaddr = kmap_atomic(area->page); >>> > + copy_to_user_page(vma, area->page, >>> > + vaddr, kaddr + (vaddr & ~PAGE_MASK), >>> > + &auprobe->ixol, sizeof(&auprobe->ixol)); >>> > + kunmap_atomic(kaddr); >>> > + } >>> > + up_read(&mm->mmap_sem); >>> > +#endif >>> >>> Yeah, no, this is wrong. >> >> Yesss, agreed. >> >>> So I really think we should just have a fixed >>> "flush_icache_page(page,vaddr)" function. >>> ... >>> Then the uprobe case can just do >>> >>> copy_to_page() >>> flush_dcache_page() >>> flush_icache_page() >> >> >> And I obviously like this idea because (iiuc) it more or less matches >> flush_icache_page_xxx() I tried to suggest. > > Would not page granularity to be too expensive? Note you need to do that on > each probe hit and you flushing whole data and instruction page every time. > IMHO it will work correctly when you flush just few dcache/icache lines that > correspond to xol slot that got modified. Note copy_to_user_page takes > len that describes size of area that has to be flushed. Given that we are > flushing xol area page at this case; and nothing except one xol slot is > any interest for current task, and if CPU can flush one dcache and icache > page as quickly as it can flush range, my remark may not matter. > > I personally would prefer if we could have function like copy_to_user_page > but without requirement to pass vma to it. I was trying to collect some experimental data around this discussion. I did not find anything super surprising and I am not sure how it would matter, but since I collected it already, I will just share anyway. The result covers only one architecture so they should be taken with grain of salt. Test was conducted on ARM h/w. Arndale with Exynos 5250 2 cores CPU and Pandaboard ES with OMAP 4460 were tested. The uporbes/systemtap test was arranged in the following way. SystemTap module was counting number of times functions was called. The SystemTap action is very simple, counter increment, is close to noop operation that allows to see tracing overhead. Traced user-land function had approximately 8000 instructions (unoptimized empty loop of 1000 iterations, with each interaction is 8 instructions). That function was constantly called in the loop 1 million times, and that interval was timed. SystemTap/uprobes testing was enabled and it was observed how targeted user-land execution time changed. Test scenarios and variations ----------------------------- Here is scenarios where measurements took place: vanilla - no tracing, 1 million calls of function that executes 8000 instructions Oleg's fix - Oleg's fix proposed on [1]. 
Basically it uses copy_to_user_page and does a dynamic look-up of the
xol area vma on every trace.

my arm specific fix - this one was proposed as [2]. It is close to the
discussed possible solution where we would have something similar to the
copy_to_user_page function but which does not require a vma. My code
tried to share the ARM backend of copy_to_user_page and the xol access
flush function.

Oleg's fix + forced broadcast - one of the concerns I had is the
situation where an smp function call broadcast has to happen to flush
the icache. On both of my boards that was not needed, so to simulate
such a situation I changed the ARM backend of copy_to_user_page to do
smp_call_function(flush_ptrace_access_other, ...).

The tested application had two possible dimensions:

1) number of threads that run the loop over the traced function, to see
how tracing copes with multicore; the default is only one thread, but
the test could run another loop on a second core.

2) number of mappings in the target process; the target process could
have 1000 file mappings to create a bunch of vmas. This is to test how
much the dynamic look-up of the xol area vma matters.

Results
-------

The numbers shown in the table are times in microseconds to execute the
tested function with and without tracing present. Please note that the
tracing overhead includes all tracing-related pieces, not only the cache
flush that is under discussion. Those pieces are: the arch exception
layer; the uprobes layer; the uprobes arch specific layer (before/after);
xol look-up, update and cache flush; the uprobes single stepping logic;
the systemtap module callback generated for the .stp tracing script; etc.

                                      Arndale        Pandaboard ES
vanilla                               5.0  (100%)    11.5 (100%)
Oleg's fix                            9.8  (196%)    28.1 (244%)
Oleg's fix + 1000 mappings            10.0 (200%)    28.7 (250%)
Victor's fix                          9.4  (188%)    26.4 (230%)
Oleg's fix + broadcast, 1 thread      13.7 (274%)    39.8 (346%)
Oleg's fix + broadcast, 2 threads     14.1 (282%)    41.6 (361%)

Observations
------------

x) generally uprobes tracing is a bit expensive: one trace roughly costs
around 10000 instructions/cycles

x) the way the cache is flushed matters somewhat, but because of the big
overall tracing overhead those differences may not matter

x) looking at 'Oleg's fix' vs 'Oleg's fix + 1000 mappings' shows that the
vma look-up is noticeable but the difference is marginal. No surprise
here: the rb tree search works fast.

x) the need to broadcast the icache flush has the most noticeable impact.
I am not sure how essential that is. Both tested platforms, Exynos 5250
and OMAP 4460, did not need that operation. I am not sure what CPU would
really have this issue ...

x) the fix that I did for ARM, which shares ARM code with
copy_to_user_page but does not need a vma, performs best. Essentially it
differs from 'Oleg's fix' in that there is no vma lookup at all. But I
have to admit the resulting code looks a bit ugly. I wonder whether the
gain matters enough ... maybe not.

Dynamic xol slots vs cached to uprobe xol slots
-----------------------------------------------

This section could be a bit off topic, but it introduces an interesting
data point. When I looked at the current uprobes single step out of line
code and compared it with the code that was there in the past (utrace
times), I noticed the main essential difference is how xol slots are
handled: currently, for each hit the uprobes code allocates an xol slot
and needs a dcache/icache flush. But in the past the xol slot was
attached/cached to the uprobe entry, and if there were enough xol slots
the dcache/icache flush would happen only once and later tracing would
not need it. For cases where there were not enough xol slots an lru
algorithm was used to rotate xol slots vs uprobes.
Previous uprobes mechanism was more immune to cost of modifying instruction stream, because modifying instruction stream was one time operation and after that under normal circumstances traced apps did not touch instructions at all. I am quite sure that semi-static xol slot allocation scheme had it is own set of issues. I've tried to hack cached xol slot scheme and measure time difference it brings. Arndale hack static 8.9 (188%) xol slot 8.9 microseconds gives idea about all other overheads during uprobes tracing except of xol allocation and icache/dcache flush. I.e cost of dynamically allocating xol slot, dcache/icache flush and impact of cache flush on application is around 0.5 and 1.1 microsecond as long as no cache operations broadcast is involved. I.e cost is not that big, as long as modern CPU that does not need cache flush broadcasts, dynamic xol scheme looks OK to me. Raw Results and Test Source Code -------------------------------- I don't publish my test source code and raw results here because it is quite big. Raw results were collected with target test running under perf, so it could be seen how different schemes affect cache and tlb misses. If anyone interested in source and raw data please let me know I will post it here. Thanks, Victor [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/246595.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245743.html > Thanks, > Victor > >> But we need a short term solution for arm. And unless I misunderstood >> Russell (this is quite possible), arm needs to disable preemption around >> copy + flush. >> >> Russel, so what do you think we can do for arm right now? Does the patch >> above (and subsequent discussion) answer the "why reinvent" question ? >> >> Oleg. >>
On 04/14, Victor Kamensky wrote: > > On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote: > > On 04/11, Linus Torvalds wrote: > >> > >> So I really think we should just have a fixed > >> "flush_icache_page(page,vaddr)" function. > >> ... > >> Then the uprobe case can just do > >> > >> copy_to_page() > >> flush_dcache_page() > >> flush_icache_page() > > > > > > And I obviously like this idea because (iiuc) it more or less matches > > flush_icache_page_xxx() I tried to suggest. > > Would not page granularity to be too expensive? Note you need to do that on > each probe hit and you flushing whole data and instruction page every time. > IMHO it will work correctly when you flush just few dcache/icache lines that > correspond to xol slot that got modified. Note copy_to_user_page takes > len that describes size of area that has to be flushed. Given that we are > flushing xol area page at this case; and nothing except one xol slot is > any interest for current task, and if CPU can flush one dcache and icache > page as quickly as it can flush range, my remark may not matter. We can add "vaddr, len" to the argument list. > I personally would prefer if we could have function like copy_to_user_page > but without requirement to pass vma to it. I won't argue, but you need to convince maintainers. And to remind, we need something simple/nonintrusive for arm right now. Again, I won't argue if we turn copy_to_page() + flush_dcache_page() into __weak arch_uprobe_copy_ixol(), and add the necessary hacks into arm's implementatiion. This is up to you and Russel. But. Please do not add copy_to_user_page() into copy_to_page() (as your patch did). This is certainly not what uprobe_write_opcode() wants, we do not want or need "flush" in this case. The same for __create_xol_area(). Note also that currently copy_to_user_page() is always called under mmap_sem. I do not know if arm actually needs ->mmap_sem, but if you propose it as a generic solution we should probably take this lock. Also. Even if we have copy_to_user_page_no_vma() or change copy_to_user_page() to accept vma => NULL, I am not sure this will work fine on arm when the probed application unmaps xol_area and mmaps something else at the same vaddr. I mean, in this case we do not care if the application crashes, but please verify that something really bad can't happen. Let me repeat just in case, I know nothing about arm/. I can't even understand how, say, flush_pfn_alias() works, and how it should work if 2 threads call it at the same time with the same vaddr (or CACHE_COLOUR(vaddr)). Oleg.
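For reference, the __weak default described above would essentially be the existing generic code factored out into a hook, something like the following sketch (an architecture such as arm would then override it with its own cache maintenance):

/* Sketch of the proposed __weak default for the generic uprobes code. */
void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
                                  void *src, unsigned long len)
{
        /* Generic fallback: plain copy plus a D-cache flush, as today. */
        copy_to_page(page, vaddr, src, len);
        flush_dcache_page(page);
}

xol_get_insn_slot() would then call arch_uprobe_copy_ixol(area->page, xol_vaddr, &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)) instead of open-coding the copy and flush.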
On 04/14, Victor Kamensky wrote:
>
> Oleg's fix - Oleg's fix proposed on [1]. Basically it uses
> copy_to_user_page and it does dynamic look-up of xol area vma
> every tracing

I guess I was not clear. No, I didn't really try to propose this change,
I do not like it ;) I showed this hack in reply to multiple and
persistent requests to reuse the ptrace solution we already have.

> my arm specific fix - this one was proposed on as [2].

I didn't even try to read the changes in arm/, I can't understand them
anyway. I leave this to you and Russell.

But, once again, please do not add arch_uprobe_flush_xol_access(), add
arch_uprobe_copy_ixol().

> x) fix that I did for ARM that shares ARM code with
> copy_to_user_page but does not need vma performs best.

The patch which adds copy_to_user_page(vma => NULL) into copy_to_page()?
Please see the comments in my previous email.

> When I looked at current uprobes single step out line code
> and compared it with code that was in the past (utrace
> times) I noticed main essential difference how xol slots
> are handled: Currently for each hit uprobes code allocate
> xol slot and needs dcache/icache flush. But in the past
> xol slot was attached/cached to uprobe entry

Can't comment, I am not familiar with the old implementation. But yes,
the current implementation is not perfect.

Once again, it would be nice to remove this vma. Even if this is not
possible, we can try to share this memory. We do not even need lru, we
can make it "per cpu" and avoid the broadcasts. On x86 this is simple,
we have __switch_to_xtra() which can re-copy ->ixol[] and do
flush_icache_range() if UTASK_SSTEP. Not sure this is possible on arm
and other arch'es. But let's not discuss this right now, this is a bit
off-topic currently.

Oleg.
On 15 April 2014 08:46, Oleg Nesterov <oleg@redhat.com> wrote: > On 04/14, Victor Kamensky wrote: >> >> On 14 April 2014 11:59, Oleg Nesterov <oleg@redhat.com> wrote: >> > On 04/11, Linus Torvalds wrote: >> >> >> >> So I really think we should just have a fixed >> >> "flush_icache_page(page,vaddr)" function. >> >> ... >> >> Then the uprobe case can just do >> >> >> >> copy_to_page() >> >> flush_dcache_page() >> >> flush_icache_page() >> > >> > >> > And I obviously like this idea because (iiuc) it more or less matches >> > flush_icache_page_xxx() I tried to suggest. >> >> Would not page granularity to be too expensive? Note you need to do that on >> each probe hit and you flushing whole data and instruction page every time. >> IMHO it will work correctly when you flush just few dcache/icache lines that >> correspond to xol slot that got modified. Note copy_to_user_page takes >> len that describes size of area that has to be flushed. Given that we are >> flushing xol area page at this case; and nothing except one xol slot is >> any interest for current task, and if CPU can flush one dcache and icache >> page as quickly as it can flush range, my remark may not matter. > > We can add "vaddr, len" to the argument list. > >> I personally would prefer if we could have function like copy_to_user_page >> but without requirement to pass vma to it. > > I won't argue, but you need to convince maintainers. > > > And to remind, we need something simple/nonintrusive for arm right now. > Again, I won't argue if we turn copy_to_page() + flush_dcache_page() into > __weak arch_uprobe_copy_ixol(), and add the necessary hacks into arm's > implementatiion. This is up to you and Russel. For short term arm specific solution I will follow up on [1]. Yes, I will incorporate your request to make arch_uprobe_copy_ixol() instead of arch_uprobe_flush_xol_access, will address Dave Long's comments about checkpatch and will remove special handling for broadcast situation (FLAG_UA_BROADCAST) since in further discussion it was established that task can migrate while doing uprobes xol single stepping. I don't think my patch does those things that you describe below. Anyway, I will repost new version of short term arm specific fix proposal today PST and we will see what Russell, you and all will say about it. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245952.html http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245743.html Thanks, Victor > > > But. Please do not add copy_to_user_page() into copy_to_page() (as your patch > did). This is certainly not what uprobe_write_opcode() wants, we do not want > or need "flush" in this case. The same for __create_xol_area(). > > Note also that currently copy_to_user_page() is always called under mmap_sem. > I do not know if arm actually needs ->mmap_sem, but if you propose it as a > generic solution we should probably take this lock. > > Also. Even if we have copy_to_user_page_no_vma() or change copy_to_user_page() > to accept vma => NULL, I am not sure this will work fine on arm when the probed > application unmaps xol_area and mmaps something else at the same vaddr. I mean, > in this case we do not care if the application crashes, but please verify that > something really bad can't happen. > > Let me repeat just in case, I know nothing about arm/. I can't even understand > how, say, flush_pfn_alias() works, and how it should work if 2 threads call it > at the same time with the same vaddr (or CACHE_COLOUR(vaddr)). > > Oleg. >
On Thu, Apr 10, 2014 at 11:45:31PM -0400, David Long wrote:
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 04709b6..2e976fb 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -241,7 +241,7 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
>  static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
>  {
>  	void *kaddr = kmap_atomic(page);
> -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> +	copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len);
>  	kunmap_atomic(kaddr);
>  }

Rather than changing all the architectures to be able to pass a NULL vma
to copy_to_user_page(), you can create a dummy vma on the stack with the
VM_EXEC flag and pass a pointer to it.
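To make that suggestion concrete, the change might look something like the sketch below (illustrative only; which fields the various copy_to_user_page() implementations actually dereference, and therefore which ones the dummy vma must initialise, would still need auditing per architecture):

/* Sketch of the on-stack dummy vma approach. */
static void copy_to_page(struct page *page, unsigned long vaddr,
                         const void *src, int len)
{
        struct vm_area_struct vma = {
                .vm_mm    = current->mm,        /* for mm_cpumask() users  */
                .vm_flags = VM_EXEC | VM_READ,  /* force the I-cache flush */
        };
        void *kaddr = kmap_atomic(page);

        copy_to_user_page(&vma, page, vaddr,
                          kaddr + (vaddr & ~PAGE_MASK), src, len);
        kunmap_atomic(kaddr);
}

(Oleg's earlier objection would still apply: uprobe_write_opcode() also uses copy_to_page() and does not want the flush, so in practice this would more likely go in the xol copy path than in copy_to_page() itself.)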
diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h index 6abc497..64d67e4 100644 --- a/arch/arc/include/asm/cacheflush.h +++ b/arch/arc/include/asm/cacheflush.h @@ -110,7 +110,7 @@ static inline int cache_is_vipt_aliasing(void) #define copy_to_user_page(vma, page, vaddr, dst, src, len) \ do { \ memcpy(dst, src, len); \ - if (vma->vm_flags & VM_EXEC) \ + if (!vma || vma->vm_flags & VM_EXEC) \ __sync_icache_dcache((unsigned long)(dst), vaddr, len); \ } while (0) diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 3387e60..dd19ad4 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -114,7 +114,7 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page, unsigned long uaddr, void *kaddr, unsigned long len) { if (cache_is_vivt()) { - if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) { + if (!vma || cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) { unsigned long addr = (unsigned long)kaddr; __cpuc_coherent_kern_range(addr, addr + len); } @@ -128,7 +128,7 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page, } /* VIPT non-aliasing D-cache */ - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { unsigned long addr = (unsigned long)kaddr; if (icache_is_vipt_aliasing()) flush_icache_alias(page_to_pfn(page), uaddr, len); diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index e4193e3..cde3cb4 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -38,7 +38,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page, unsigned long uaddr, void *kaddr, unsigned long len) { - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { unsigned long addr = (unsigned long)kaddr; if (icache_is_aliasing()) { __flush_dcache_area(kaddr, len); diff --git a/arch/avr32/mm/cache.c b/arch/avr32/mm/cache.c index 6a46ecd..cd3d378 100644 --- a/arch/avr32/mm/cache.c +++ b/arch/avr32/mm/cache.c @@ -156,7 +156,7 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page, unsigned long len) { memcpy(dst, src, len); - if (vma->vm_flags & VM_EXEC) + if (!vma || vma->vm_flags & VM_EXEC) flush_icache_range((unsigned long)dst, (unsigned long)dst + len); } diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h index 49e0896..9bea768 100644 --- a/arch/hexagon/include/asm/cacheflush.h +++ b/arch/hexagon/include/asm/cacheflush.h @@ -86,7 +86,7 @@ static inline void copy_to_user_page(struct vm_area_struct *vma, void *dst, void *src, int len) { memcpy(dst, src, len); - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { flush_icache_range((unsigned long) dst, (unsigned long) dst + len); } diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h index fa2c3d6..afefdeb 100644 --- a/arch/m68k/include/asm/cacheflush_mm.h +++ b/arch/m68k/include/asm/cacheflush_mm.h @@ -212,7 +212,7 @@ static inline void flush_cache_range(struct vm_area_struct *vma, static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn) { - if (vma->vm_mm == current->mm) + if (!vma || vma->vm_mm == current->mm) __flush_cache_030(); } diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h index ffea82a..9eef956 100644 --- a/arch/microblaze/include/asm/cacheflush.h +++ b/arch/microblaze/include/asm/cacheflush.h @@ -108,7 +108,7 @@ static inline void copy_to_user_page(struct vm_area_struct 
*vma, { u32 addr = virt_to_phys(dst); memcpy(dst, src, len); - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { invalidate_icache_range(addr, addr + PAGE_SIZE); flush_dcache_range(addr, addr + PAGE_SIZE); } diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c index 6b59617..e428551 100644 --- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -232,7 +232,7 @@ void copy_to_user_page(struct vm_area_struct *vma, if (cpu_has_dc_aliases) SetPageDcacheDirty(page); } - if ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc) + if ((!vma || vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc) flush_cache_page(vma, vaddr, page_to_pfn(page)); } diff --git a/arch/parisc/include/asm/tlbflush.h b/arch/parisc/include/asm/tlbflush.h index 9d086a5..7aad1b6 100644 --- a/arch/parisc/include/asm/tlbflush.h +++ b/arch/parisc/include/asm/tlbflush.h @@ -68,6 +68,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, /* For one page, it's not worth testing the split_tlb variable */ mb(); + if (!vma) { + flush_tlb_all(); + return; + } sid = vma->vm_mm->context; purge_tlb_start(flags); mtsp(sid, 1); diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c index ac87a40..ff09f05 100644 --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -278,7 +278,7 @@ __flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, { preempt_disable(); flush_dcache_page_asm(physaddr, vmaddr); - if (vma->vm_flags & VM_EXEC) + if (!vma || vma->vm_flags & VM_EXEC) flush_icache_page_asm(physaddr, vmaddr); preempt_enable(); } @@ -574,7 +574,7 @@ void flush_cache_range(struct vm_area_struct *vma, void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn) { - BUG_ON(!vma->vm_mm->context); + BUG_ON(vma && !vma->vm_mm->context); if (pfn_valid(pfn)) { flush_tlb_page(vma, vmaddr); diff --git a/arch/score/include/asm/cacheflush.h b/arch/score/include/asm/cacheflush.h index 1d545d0..63e7b4e 100644 --- a/arch/score/include/asm/cacheflush.h +++ b/arch/score/include/asm/cacheflush.h @@ -41,7 +41,7 @@ static inline void flush_icache_page(struct vm_area_struct *vma, #define copy_to_user_page(vma, page, vaddr, dst, src, len) \ do { \ memcpy(dst, src, len); \ - if ((vma->vm_flags & VM_EXEC)) \ + if (!vma || (vma->vm_flags & VM_EXEC)) \ flush_cache_page(vma, vaddr, page_to_pfn(page));\ } while (0) diff --git a/arch/score/mm/cache.c b/arch/score/mm/cache.c index f85ec1a..8464575 100644 --- a/arch/score/mm/cache.c +++ b/arch/score/mm/cache.c @@ -210,7 +210,7 @@ void flush_cache_range(struct vm_area_struct *vma, void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn) { - int exec = vma->vm_flags & VM_EXEC; + int exec = !vma || vma->vm_flags & VM_EXEC; unsigned long kaddr = 0xa0000000 | (pfn << PAGE_SHIFT); flush_dcache_range(kaddr, kaddr + PAGE_SIZE); diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c index 616966a..ba2313a 100644 --- a/arch/sh/mm/cache.c +++ b/arch/sh/mm/cache.c @@ -70,7 +70,7 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page, clear_bit(PG_dcache_clean, &page->flags); } - if (vma->vm_flags & VM_EXEC) + if (!vma || vma->vm_flags & VM_EXEC) flush_cache_page(vma, vaddr, page_to_pfn(page)); } diff --git a/arch/sparc/mm/leon_mm.c b/arch/sparc/mm/leon_mm.c index 5bed085..dca5e18 100644 --- a/arch/sparc/mm/leon_mm.c +++ b/arch/sparc/mm/leon_mm.c @@ -192,7 +192,7 @@ void leon_flush_dcache_all(void) void leon_flush_pcache_all(struct vm_area_struct *vma, unsigned long page) { - if 
(vma->vm_flags & VM_EXEC) + if (!vma || vma->vm_flags & VM_EXEC) leon_flush_icache_all(); leon_flush_dcache_all(); } diff --git a/arch/tile/include/asm/cacheflush.h b/arch/tile/include/asm/cacheflush.h index 92ee4c8..7b7022c 100644 --- a/arch/tile/include/asm/cacheflush.h +++ b/arch/tile/include/asm/cacheflush.h @@ -66,7 +66,7 @@ static inline void copy_to_user_page(struct vm_area_struct *vma, void *dst, void *src, int len) { memcpy(dst, src, len); - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { flush_icache_range((unsigned long) dst, (unsigned long) dst + len); } diff --git a/arch/unicore32/mm/flush.c b/arch/unicore32/mm/flush.c index 6d4c096..10ddab3 100644 --- a/arch/unicore32/mm/flush.c +++ b/arch/unicore32/mm/flush.c @@ -36,7 +36,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page, unsigned long uaddr, void *kaddr, unsigned long len) { /* VIPT non-aliasing D-cache */ - if (vma->vm_flags & VM_EXEC) { + if (!vma || vma->vm_flags & VM_EXEC) { unsigned long addr = (unsigned long)kaddr; __cpuc_coherent_kern_range(addr, addr + len); diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c index ba4c47f..d34c06c 100644 --- a/arch/xtensa/mm/cache.c +++ b/arch/xtensa/mm/cache.c @@ -221,10 +221,10 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page, unsigned long t = TLBTEMP_BASE_1 + (vaddr & DCACHE_ALIAS_MASK); __flush_invalidate_dcache_range((unsigned long) dst, len); - if ((vma->vm_flags & VM_EXEC) != 0) + if (!vma || (vma->vm_flags & VM_EXEC) != 0) __invalidate_icache_page_alias(t, phys); - } else if ((vma->vm_flags & VM_EXEC) != 0) { + } else if (!vma || (vma->vm_flags & VM_EXEC) != 0) { __flush_dcache_range((unsigned long)dst,len); __invalidate_icache_range((unsigned long) dst, len); } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 04709b6..2e976fb 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -241,7 +241,7 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len) { void *kaddr = kmap_atomic(page); - memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len); + copy_to_user_page(NULL, page, vaddr, kaddr + (vaddr & ~PAGE_MASK), src, len); kunmap_atomic(kaddr); } @@ -1299,11 +1299,6 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe) /* Initialize the slot */ copy_to_page(area->page, xol_vaddr, &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)); - /* - * We probably need flush_icache_user_range() but it needs vma. - * This should work on supported architectures too. - */ - flush_dcache_page(area->page); return xol_vaddr; }
Replace memcpy and dcache flush in generic uprobes with a call to
copy_to_user_page(), which will do a proper flushing of kernel and
user cache. Also modify the implementation of copy_to_user_page
to assume a NULL vma pointer means the user icache corresponding
to this write is stale and needs to be flushed. Note that this patch
does not fix copy_to_user_page for the sh, alpha, sparc, or mips
architectures (which do not currently support uprobes).

Signed-off-by: David A. Long <dave.long@linaro.org>
---
 arch/arc/include/asm/cacheflush.h        | 2 +-
 arch/arm/mm/flush.c                      | 4 ++--
 arch/arm64/mm/flush.c                    | 2 +-
 arch/avr32/mm/cache.c                    | 2 +-
 arch/hexagon/include/asm/cacheflush.h    | 2 +-
 arch/m68k/include/asm/cacheflush_mm.h    | 2 +-
 arch/microblaze/include/asm/cacheflush.h | 2 +-
 arch/mips/mm/init.c                      | 2 +-
 arch/parisc/include/asm/tlbflush.h       | 4 ++++
 arch/parisc/kernel/cache.c               | 4 ++--
 arch/score/include/asm/cacheflush.h      | 2 +-
 arch/score/mm/cache.c                    | 2 +-
 arch/sh/mm/cache.c                       | 2 +-
 arch/sparc/mm/leon_mm.c                  | 2 +-
 arch/tile/include/asm/cacheflush.h       | 2 +-
 arch/unicore32/mm/flush.c                | 2 +-
 arch/xtensa/mm/cache.c                   | 4 ++--
 kernel/events/uprobes.c                  | 7 +------
 18 files changed, 24 insertions(+), 25 deletions(-)