From patchwork Tue Oct 1 18:38:57 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 20739
From: John Stultz
To: Minchan Kim, Dhaval Giani
Subject: [PATCH 13/14] vrange: Allocate vroot dynamically
Date: Tue, 1 Oct 2013 11:38:57 -0700
Message-Id: <1380652738-8000-14-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.8.1.2
In-Reply-To: <1380652738-8000-1-git-send-email-john.stultz@linaro.org>
References: <1380652738-8000-1-git-send-email-john.stultz@linaro.org>

From: Minchan Kim

This patch allocates the vroot dynamically when the vrange syscall is
called, so if nobody calls the vrange syscall, we don't waste the memory
a vroot would occupy.

The vroot is allocated from a SLAB_DESTROY_BY_RCU cache, so we can't
guarantee a vroot's validity when we are about to access the vroot of a
different process. The rules for that case are as follows:

1. rcu_read_lock
2. check vroot == NULL
3. increment vroot's refcount
4. rcu_read_unlock
5. vrange_lock(vroot)
6. get vrange from tree
7. check vrange->owner == vroot again, because the vroot can be
   reallocated for another owner within the same RCU period.

If we're accessing the vroot from our own context, we can skip the RCU
and extra checking, since we know the vroot won't disappear from under
us while we're running.

Cc: Andrew Morton
Cc: Android Kernel Team
Cc: Robert Love
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Rik van Riel
Cc: Dmitry Adamushko
Cc: Dave Chinner
Cc: Neil Brown
Cc: Andrea Righi
Cc: Andrea Arcangeli
Cc: Aneesh Kumar K.V
Cc: Mike Hommey
Cc: Taras Glek
Cc: Dhaval Giani
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Michel Lespinasse
Cc: Rob Clark
Cc: Minchan Kim
Cc: linux-mm@kvack.org
Signed-off-by: Minchan Kim
[jstultz: Commit rewording, renamed functions, added helper functions]
Signed-off-by: John Stultz
---
 fs/inode.c                   |   4 +-
 include/linux/fs.h           |   2 +-
 include/linux/mm_types.h     |   2 +-
 include/linux/vrange_types.h |   1 +
 kernel/fork.c                |   5 +-
 mm/mmap.c                    |   2 +-
 mm/vrange.c                  | 259 +++++++++++++++++++++++++++++++++++++++++--
 7 files changed, 256 insertions(+), 19 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 5364f91..f5b8990 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -353,7 +353,6 @@ void address_space_init_once(struct address_space *mapping)
 	spin_lock_init(&mapping->private_lock);
 	mapping->i_mmap = RB_ROOT;
 	INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
-	vrange_root_init(&mapping->vroot, VRANGE_FILE, mapping);
 }
 EXPORT_SYMBOL(address_space_init_once);
 
@@ -1421,7 +1420,8 @@ static void iput_final(struct inode *inode)
 	inode_lru_list_del(inode);
 	spin_unlock(&inode->i_lock);
 
-	vrange_root_cleanup(&inode->i_mapping->vroot);
+	vrange_root_cleanup(inode->i_mapping->vroot);
+	inode->i_mapping->vroot = NULL;
 	evict(inode);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6ec2953..32ef488 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -415,7 +415,7 @@ struct address_space {
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
 #ifdef CONFIG_MMU
-	struct vrange_root	vroot;
+	struct vrange_root	*vroot;
 #endif
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5d8cdc3..ad7e2fc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -351,7 +351,7 @@ struct mm_struct {
 
 
 #ifdef CONFIG_MMU
-	struct vrange_root vroot;
+	struct vrange_root *vroot;
 #endif
 	unsigned long hiwater_rss;	/* High-watermark of RSS usage */
 	unsigned long hiwater_vm;	/* High-water virtual memory usage */
diff --git a/include/linux/vrange_types.h b/include/linux/vrange_types.h
index d7d451c..c4ef8b6 100644
--- a/include/linux/vrange_types.h
+++ b/include/linux/vrange_types.h
@@ -14,6 +14,7 @@ struct vrange_root {
 	struct mutex v_lock;		/* Protect v_rb */
 	enum vrange_type type;		/* range root type */
 	void *object;			/* pointer to mm_struct or mapping */
+	atomic_t refcount;
 };
 
 struct vrange {
diff --git a/kernel/fork.c b/kernel/fork.c
index ceb38bf..16d58ca 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -545,9 +545,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 		(current->mm->flags & MMF_INIT_MASK) : default_dump_filter;
 	mm->core_state = NULL;
 	mm->nr_ptes = 0;
+	mm->vroot = NULL;
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
 	spin_lock_init(&mm->page_table_lock);
-	vrange_root_init(&mm->vroot, VRANGE_MM, mm);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 
@@ -619,7 +619,8 @@ void mmput(struct mm_struct *mm)
 
 	if (atomic_dec_and_test(&mm->mm_users)) {
 		uprobe_clear_state(mm);
-		vrange_root_cleanup(&mm->vroot);
+		vrange_root_cleanup(mm->vroot);
+		mm->vroot = NULL;
 		exit_aio(mm);
 		ksm_exit(mm);
 		khugepaged_exit(mm); /* must run before exit_mmap */
diff --git a/mm/mmap.c b/mm/mmap.c
index ed7056f..cb2f9e0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 munmap_back:
 
 	/* zap any volatile ranges */
-	vrange_clear(&mm->vroot, addr, addr + len);
+	vrange_clear(mm->vroot, addr, addr + len);
 
 	if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) {
 		if (do_munmap(mm, addr, len))
diff --git a/mm/vrange.c b/mm/vrange.c
index be63f0a..843504e 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -16,6 +16,7 @@
 #include
 
 static struct kmem_cache *vrange_cachep;
+static struct kmem_cache *vroot_cachep;
 
 static struct vrange_list {
 	struct list_head list;
@@ -44,12 +45,169 @@ static int __init vrange_init(void)
 {
 	INIT_LIST_HEAD(&vrange_list.list);
 	mutex_init(&vrange_list.lock);
+	vroot_cachep = kmem_cache_create("vrange_root",
+				sizeof(struct vrange_root), 0,
+				SLAB_DESTROY_BY_RCU|SLAB_PANIC, NULL);
 	vrange_cachep = KMEM_CACHE(vrange, SLAB_PANIC);
 	register_shrinker(&vrange_shrinker);
 	return 0;
 }
 module_init(vrange_init);
 
+static struct vrange_root *__vroot_alloc(gfp_t flags)
+{
+	struct vrange_root *vroot = kmem_cache_alloc(vroot_cachep, flags);
+	if (!vroot)
+		return vroot;
+
+	atomic_set(&vroot->refcount, 1);
+	return vroot;
+}
+
+static inline int __vroot_get(struct vrange_root *vroot)
+{
+	if (!atomic_inc_not_zero(&vroot->refcount))
+		return 0;
+
+	return 1;
+}
+
+static inline void __vroot_put(struct vrange_root *vroot)
+{
+	if (atomic_dec_and_test(&vroot->refcount)) {
+		enum {VRANGE_MM, VRANGE_FILE} type = vroot->type;
+		if (type == VRANGE_MM) {
+			struct mm_struct *mm = vroot->object;
+			mmdrop(mm);
+		} else if (type == VRANGE_FILE) {
+			/* TODO : */
+		} else
+			BUG();
+
+		WARN_ON(!RB_EMPTY_ROOT(&vroot->v_rb));
+		kmem_cache_free(vroot_cachep, vroot);
+	}
+}
+
+static bool __vroot_init_mm(struct vrange_root *vroot, struct mm_struct *mm)
+{
+	bool ret = false;
+
+	spin_lock(&mm->page_table_lock);
+	if (!mm->vroot) {
+		mm->vroot = vroot;
+		vrange_root_init(mm->vroot, VRANGE_MM, mm);
+		atomic_inc(&mm->mm_count);
+		ret = true;
+	}
+	spin_unlock(&mm->page_table_lock);
+
+	return ret;
+}
+
+static bool __vroot_init_mapping(struct vrange_root *vroot,
+						struct address_space *mapping)
+{
+	bool ret = false;
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	if (!mapping->vroot) {
+		mapping->vroot = vroot;
+		vrange_root_init(mapping->vroot, VRANGE_FILE, mapping);
+		/* XXX - inc ref count on mapping? */
+		ret = true;
+	}
+	mutex_unlock(&mapping->i_mmap_mutex);
+
+	return ret;
+}
+
+static struct vrange_root *vroot_alloc_mm(struct mm_struct *mm)
+{
+	struct vrange_root *ret, *allocated;
+
+	ret = NULL;
+	allocated = __vroot_alloc(GFP_NOFS);
+	if (!allocated)
+		return NULL;
+
+	if (__vroot_init_mm(allocated, mm)) {
+		ret = allocated;
+		allocated = NULL;
+	}
+
+	if (allocated)
+		__vroot_put(allocated);
+
+	return ret;
+}
+
+static struct vrange_root *vroot_alloc_vma(struct vm_area_struct *vma)
+{
+	struct vrange_root *ret, *allocated;
+	bool val;
+
+	ret = NULL;
+	allocated = __vroot_alloc(GFP_NOFS);
+	if (!allocated)
+		return NULL;
+
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED))
+		val = __vroot_init_mapping(allocated, vma->vm_file->f_mapping);
+	else
+		val = __vroot_init_mm(allocated, vma->vm_mm);
+
+	if (val) {
+		ret = allocated;
+		allocated = NULL;
+	}
+
+	if (allocated)
+		__vroot_put(allocated);
+
+	return ret;
+}
+
+static struct vrange_root *vrange_get_vroot(struct vrange *vrange)
+{
+	struct vrange_root *vroot;
+	struct vrange_root *ret = NULL;
+
+	rcu_read_lock();
+	/*
+	 * Prevent compiler from re-fetching vrange->owner while others
+	 * clear vrange->owner.
+	 */
+	vroot = ACCESS_ONCE(vrange->owner);
+	if (!vroot)
+		goto out;
+
+	/*
+	 * vroot couldn't be destroyed while we're holding rcu_read_lock
+	 * so it's okay to access vroot
+	 */
+	if (!__vroot_get(vroot))
+		goto out;
+
+
+	/* If we reach here, vroot is either ours or others because
+	 * vroot could be allocated for others in same RCU period
+	 * so we should check it carefully. For free/reallocating
+	 * for others, all vranges from vroot->tree should be detached
+	 * firstly right before vroot freeing so if we check vrange->owner
+	 * isn't NULL, it means vroot is ours.
+	 */
+	smp_rmb();
+	if (!vrange->owner) {
+		__vroot_put(vroot);
+		goto out;
+	}
+	ret = vroot;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
 static struct vrange *__vrange_alloc(gfp_t flags)
 {
 	struct vrange *vrange = kmem_cache_alloc(vrange_cachep, flags);
@@ -209,6 +367,9 @@ static int vrange_remove(struct vrange_root *vroot,
 	struct interval_tree_node *node, *next;
 	bool used_new = false;
 
+	if (!vroot)
+		return 0;
+
 	if (!purged)
 		return -EINVAL;
 
@@ -279,6 +440,9 @@ void vrange_root_cleanup(struct vrange_root *vroot)
 	struct vrange *range;
 	struct rb_node *node;
 
+	if (vroot == NULL)
+		return;
+
 	vrange_lock(vroot);
 	/* We should remove node by post-order traversal */
 	while ((node = rb_first(&vroot->v_rb))) {
@@ -287,6 +451,12 @@ void vrange_root_cleanup(struct vrange_root *vroot)
 		__vrange_put(range);
 	}
 	vrange_unlock(vroot);
+	/*
+	 * Before removing vroot, we should make sure range-owner
+	 * should be NULL. See the smp_rmb of vrange_get_vroot.
+	 */
+	smp_wmb();
+	__vroot_put(vroot);
 }
 
 /*
@@ -294,6 +464,7 @@ void vrange_root_cleanup(struct vrange_root *vroot)
  * can't have copied own vrange data structure so that pages in the
  * vrange couldn't be purged. It would be better rather than failing
  * fork.
+ * The down_write of both mm->mmap_sem protects mm->vroot race.
  */
 int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
 {
@@ -301,8 +472,14 @@ int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
 	struct vrange *range, *new_range;
 	struct rb_node *next;
 
-	new = &new_mm->vroot;
-	old = &old_mm->vroot;
+	if (!old_mm->vroot)
+		return 0;
+
+	new = vroot_alloc_mm(new_mm);
+	if (!new)
+		return -ENOMEM;
+
+	old = old_mm->vroot;
 
 	vrange_lock(old);
 	next = rb_first(&old->v_rb);
@@ -323,11 +500,12 @@ int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
 
 	}
 	vrange_unlock(old);
+	return 0;
 fail:
 	vrange_unlock(old);
 	vrange_root_cleanup(new);
 
-	return 0;
+	return -ENOMEM;
 }
 
 static inline struct vrange_root *__vma_to_vroot(struct vm_area_struct *vma)
@@ -335,9 +513,27 @@ static inline struct vrange_root *__vma_to_vroot(struct vm_area_struct *vma)
 	struct vrange_root *vroot = NULL;
 
 	if (vma->vm_file && (vma->vm_flags & VM_SHARED))
-		vroot = &vma->vm_file->f_mapping->vroot;
+		vroot = vma->vm_file->f_mapping->vroot;
 	else
-		vroot = &vma->vm_mm->vroot;
+		vroot = vma->vm_mm->vroot;
+
+	return vroot;
+}
+
+static inline struct vrange_root *__vma_to_vroot_get(struct vm_area_struct *vma)
+{
+	struct vrange_root *vroot = NULL;
+
+	rcu_read_lock();
+	vroot = __vma_to_vroot(vma);
+
+	if (!vroot)
+		goto out;
+
+	if (!__vroot_get(vroot))
+		vroot = NULL;
+out:
+	rcu_read_unlock();
 
 	return vroot;
 }
@@ -383,6 +579,11 @@ static ssize_t do_vrange(struct mm_struct *mm, unsigned long start_idx,
 			tmp = end_idx;
 
 		vroot = __vma_to_vroot(vma);
+		if (!vroot)
+			vroot = vroot_alloc_vma(vma);
+		if (!vroot)
+			goto out;
+
 		vstart_idx = __vma_addr_to_index(vma, start_idx);
 		vend_idx = __vma_addr_to_index(vma, tmp);
 
@@ -495,17 +696,31 @@ out:
 bool vrange_addr_volatile(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct vrange_root *vroot;
+	struct vrange *vrange;
 	unsigned long vstart_idx, vend_idx;
 	bool ret = false;
 
-	vroot = __vma_to_vroot(vma);
+	vroot = __vma_to_vroot_get(vma);
+
+	if (!vroot)
+		return ret;
+
 	vstart_idx = __vma_addr_to_index(vma, addr);
 	vend_idx = vstart_idx + PAGE_SIZE - 1;
 
 	vrange_lock(vroot);
-	if (__vrange_find(vroot, vstart_idx, vend_idx))
-		ret = true;
+	vrange = __vrange_find(vroot, vstart_idx, vend_idx);
+	if (vrange) {
+		/*
+		 * vroot can be allocated for another process in
+		 * same period so let's check vroot's stability
+		 */
+		if (likely(vroot == vrange->owner))
+			ret = true;
+	}
 	vrange_unlock(vroot);
+	__vroot_put(vroot);
+
 	return ret;
 }
 
@@ -517,6 +732,8 @@ bool vrange_addr_purged(struct vm_area_struct *vma, unsigned long addr)
 	bool ret = false;
 
 	vroot = __vma_to_vroot(vma);
+	if (!vroot)
+		return false;
 	vstart_idx = __vma_addr_to_index(vma, addr);
 
 	vrange_lock(vroot);
@@ -550,6 +767,7 @@ static void try_to_discard_one(struct vrange_root *vroot, struct page *page,
 	pte_t pteval;
 	spinlock_t *ptl;
 
+	VM_BUG_ON(!vroot);
 	VM_BUG_ON(!PageLocked(page));
 
 	pte = page_check_address(page, mm, addr, &ptl, 0);
@@ -608,9 +826,11 @@ static int try_to_discard_anon_vpage(struct page *page)
 	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
 		vma = avc->vma;
 		mm = vma->vm_mm;
-		vroot = &mm->vroot;
-		address = vma_address(page, vma);
+		vroot = __vma_to_vroot(vma);
+		if (!vroot)
+			continue;
 
+		address = vma_address(page, vma);
 		vrange_lock(vroot);
 		if (!__vrange_find(vroot, address, address + PAGE_SIZE - 1)) {
 			vrange_unlock(vroot);
@@ -634,10 +854,14 @@ static int try_to_discard_file_vpage(struct page *page)
 	mutex_lock(&mapping->i_mmap_mutex);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long address = vma_address(page, vma);
-		struct vrange_root *vroot = &mapping->vroot;
+		struct vrange_root *vroot;
 		long vstart_idx;
 
+		vroot = __vma_to_vroot(vma);
+		if (!vroot)
+			continue;
 		vstart_idx = __vma_addr_to_index(vma, address);
+
 		vrange_lock(vroot);
 		if (!__vrange_find(vroot, vstart_idx,
 					vstart_idx + PAGE_SIZE - 1)) {
@@ -901,7 +1125,16 @@ static int discard_vrange(struct vrange *vrange)
 	int ret = 0;
 	struct vrange_root *vroot;
 	unsigned int nr_discard = 0;
-	vroot = vrange->owner;
+	vroot = vrange_get_vroot(vrange);
+	if (!vroot)
+		return 0;
+
+	/*
+	 * Race of vrange->owner could happen with __vrange_remove
+	 * but it's okay because subfunctions will check it again
+	 */
+	if (vrange->owner == NULL)
+		goto out;
 
 	if (vroot->type == VRANGE_MM) {
 		struct mm_struct *mm = vroot->object;
@@ -911,6 +1144,8 @@ static int discard_vrange(struct vrange *vrange)
 		ret = __discard_vrange_file(mapping, vrange, &nr_discard);
 	}
 
+out:
+	__vroot_put(vroot);
 	return nr_discard;
 }
 
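
Editorial illustration, not part of the patch: the cross-process access rules in the commit message (load the owner once, take a reference only if the count is not zero, then re-check the owner) can be sketched in userspace C. Everything named here (struct root, struct range, root_get, root_put, range_get_root, root_demo) is hypothetical. C11 atomics stand in for the kernel's atomic_t and ACCESS_ONCE, and rcu_read_lock, vrange_lock and the slab cache are left out, so this shows the reference-count logic only, under those assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for vrange_root and vrange. */
struct root {
	atomic_int refcount;		/* 0 means the root is being freed */
};

struct range {
	struct root *_Atomic owner;	/* set to NULL when detached */
};

/* Like atomic_inc_not_zero(): take a reference unless the count is 0. */
static bool root_get(struct root *r)
{
	int old = atomic_load(&r->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool root_put(struct root *r)
{
	return atomic_fetch_sub(&r->refcount, 1) == 1;
}

/*
 * Load the owner once, pin it, then re-check that the range still
 * points at the same root. Returns a pinned root or NULL.
 * The kernel code does this under rcu_read_lock(), omitted here.
 */
static struct root *range_get_root(struct range *rg)
{
	struct root *r = atomic_load(&rg->owner);

	if (!r || !root_get(r))
		return NULL;
	if (atomic_load(&rg->owner) != r) {
		/* A real implementation would free r on the last put. */
		(void)root_put(r);
		return NULL;
	}
	return r;
}

/* Small scenario; returns 0 when every check passes. */
static int root_demo(void)
{
	struct root r;
	struct range rg;
	struct root *pinned;
	bool last;

	atomic_init(&r.refcount, 1);	/* creator's reference */
	atomic_init(&rg.owner, &r);

	/* Attached range: the lookup pins the root (1 -> 2). */
	pinned = range_get_root(&rg);
	assert(pinned == &r);
	assert(atomic_load(&r.refcount) == 2);
	last = root_put(pinned);	/* 2 -> 1 */
	assert(!last);

	/* Detached range: the lookup fails and leaves the count alone. */
	atomic_store(&rg.owner, (struct root *)NULL);
	assert(range_get_root(&rg) == NULL);
	assert(atomic_load(&r.refcount) == 1);

	/* After the last put, no new reference can be taken. */
	last = root_put(&r);		/* 1 -> 0 */
	assert(last);
	assert(!root_get(&r));
	return 0;
}
```

Calling root_demo() from a test harness exercises the three cases. The re-check after root_get matters because a SLAB_DESTROY_BY_RCU object can be freed and handed to a new owner while a reader still holds the old pointer, which a successful increment alone cannot detect.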