Message ID | 20171205093959.9537-1-mhocko@kernel.org |
---|---|
State | New |
Series | mm, oom_reaper: gather each vma to prevent leaking TLB entry |
On Tue, Dec 05, 2017 at 10:39:59AM +0100, Michal Hocko wrote:
> From: Wang Nan <wangnan0@huawei.com>
>
> commit 687cb0884a714ff484d038e9190edc874edcf146 upstream.
>
> tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
> space. In this case, tlb->fullmm is true. Some archs like arm64 don't
> flush the TLB when tlb->fullmm is true:
>
>   commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
>
> This causes TLB entries to be leaked.
>
> Will clarifies his patch:
>  "Basically, we tag each address space with an ASID (PCID on x86) which
>   is resident in the TLB. This means we can elide TLB invalidation when
>   pulling down a full mm because we won't ever assign that ASID to
>   another mm without doing TLB invalidation elsewhere (which actually
>   just nukes the whole TLB).
>
>   I think that means that we could potentially not fault on a kernel
>   uaccess, because we could hit in the TLB"
>
> There is a window between complete_signal() sending an IPI to other
> cores and all threads sharing this mm actually being kicked off those
> cores. In this window, the oom reaper may call tlb_flush_mmu_tlbonly()
> to flush the TLB and then free pages. However, due to the above
> problem, the TLB entries are not really flushed on arm64, so other
> threads can still access these pages through stale TLB entries.
> Moreover, a copy_to_user() can also write to these pages without
> generating a page fault, causing use-after-free bugs.
>
> This patch gathers each vma instead of the full vm space. In this case
> tlb->fullmm is not true. The behavior of the oom reaper then becomes
> similar to munmapping before do_exit, which should be safe for all
> archs.
>
> Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
> Fixes: aac453635549 ("mm, oom: introduce oom reaper")
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Bob Liu <liubo95@huawei.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [backported to 4.9 stable tree]
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Thanks for the backport, now queued up.

greg k-h
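For readers less familiar with the mmu_gather API, the following is a small, self-contained sketch of the behavior the commit message describes. It is illustration only, not kernel source: all identifiers (mmu_gather_model, model_tlb_flush) are invented for the example. It models how a fullmm gather lets arm64 elide the TLB invalidation, while a per-vma gather forces a ranged flush, which is exactly what the patch relies on.

```c
/*
 * Illustration only (not kernel code): models the arm64 decision
 * described in the commit message above.
 */
#include <stdbool.h>
#include <stdio.h>

struct mmu_gather_model {
	bool fullmm;              /* set by tlb_gather_mmu(&tlb, mm, 0, -1) */
	unsigned long start, end; /* range covered by the gather */
};

static void model_tlb_flush(const struct mmu_gather_model *tlb)
{
	if (tlb->fullmm) {
		/*
		 * Full-mm teardown: arm64 relies on the ASID never being
		 * reused without invalidation elsewhere, so it skips the
		 * flush here. The oom reaper would then free pages while
		 * other threads can still hit stale TLB entries.
		 */
		printf("fullmm gather: no TLB invalidation issued\n");
		return;
	}
	/* Per-vma gather: a ranged invalidation covers [start, end). */
	printf("ranged flush for [%#lx, %#lx)\n", tlb->start, tlb->end);
}

int main(void)
{
	struct mmu_gather_model full = { .fullmm = true,  .start = 0,        .end = -1UL };
	struct mmu_gather_model vma  = { .fullmm = false, .start = 0x400000, .end = 0x600000 };

	model_tlb_flush(&full); /* old oom-reaper behavior */
	model_tlb_flush(&vma);  /* behavior after this patch */
	return 0;
}
```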
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d631d251c150..4a184157cc3d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -524,7 +524,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
-	tlb_gather_mmu(&tlb, mm, 0, -1);
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
 		if (is_vm_hugetlb_page(vma))
 			continue;
@@ -546,11 +545,13 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 		 * we do not want to block exit_mmap by keeping mm ref
 		 * count elevated without a good reason.
 		 */
-		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
+		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
 			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
 					 &details);
+			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+		}
 	}
-	tlb_finish_mmu(&tlb, 0, -1);
 	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 			task_pid_nr(tsk), tsk->comm,
 			K(get_mm_counter(mm, MM_ANONPAGES)),