From patchwork Tue Jun 11 01:12:17 2013
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 17757
From: John Stultz
To: minchan@kernel.org
Cc: dgiani@mozilla.com, John Stultz
Subject: [PATCH 11/13] vrange: Purging vrange pages without swap
Date: Mon, 10 Jun 2013 18:12:17 -0700
Message-Id: <1370913139-9320-12-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.8.1.2
In-Reply-To: <1370913139-9320-1-git-send-email-john.stultz@linaro.org>
References: <1370913139-9320-1-git-send-email-john.stultz@linaro.org>

From: Minchan Kim

At the moment, one problem with vrange is that the VM can reclaim anonymous pages only when a swap system is present. This patch adds a new hook in kswapd, called before the normal LRU pages are scanned, so that volatile pages can be purged even without swap.

I should confess that I have not spent enough time investigating the best place for the hook. It might even be better to add a new kvranged thread: there have been a few bugs in kswapd recently, and it is very sensitive to small changes, so adding a new hook there may introduce another subtle problem. It could be better to keep the vrange code in kvranged and move it into kswapd only once it has settled down; otherwise, we could simply leave it in kvranged.

The other issue is the cost of scanning virtual addresses. We have no per-VMA RSS information, so kswapd may scan an entire address range without any gain.
That can burn out the CPU. I have a plan to account RSS per VMA, at least for anonymous VMAs.

Any comments are welcome!

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/rmap.h         |   3 +
 include/linux/vrange.h       |   7 ++
 include/linux/vrange_types.h |   1 +
 mm/vmscan.c                  |  45 +++++++-
 mm/vrange.c                  | 247 +++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 291 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6432dfb..e822a30 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,6 +83,9 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
+unsigned long discard_vrange_page_list(struct zone *zone,
+				struct list_head *page_list);
+
 unsigned long vma_address(struct page *page, struct vm_area_struct *vma);
 
 static inline void get_anon_vma(struct anon_vma *anon_vma)
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index fb101c6..b6e8b99 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -31,6 +31,13 @@ static inline int vrange_type(struct vrange *vrange)
 	return vrange->owner->type;
 }
 
+static inline struct mm_struct *vrange_get_owner_mm(struct vrange *vrange)
+{
+	if (vrange_type(vrange) != VRANGE_MM)
+		return NULL;
+	return container_of(vrange->owner, struct mm_struct, vroot);
+}
+
 void vrange_init(void);
 extern int vrange_clear(struct vrange_root *vroot,
 			unsigned long start, unsigned long end);
diff --git a/include/linux/vrange_types.h b/include/linux/vrange_types.h
index 71ebc70..bacdfe8 100644
--- a/include/linux/vrange_types.h
+++ b/include/linux/vrange_types.h
@@ -15,6 +15,7 @@ struct vrange {
 	struct vrange_root *owner;
 	int purged;
 	struct list_head lru;		/* protected by lru_lock */
+	atomic_t refcount;
 };
 
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4107c07..c0f120e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -684,7 +684,7 @@ static enum page_references page_check_references(struct page *page,
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
-static unsigned long shrink_page_list(struct list_head *page_list,
+unsigned long shrink_page_list(struct list_head *page_list,
 				      struct zone *zone,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
@@ -986,6 +986,35 @@ keep:
 	return nr_reclaimed;
 }
 
+
+unsigned long discard_vrange_page_list(struct zone *zone,
+					struct list_head *page_list)
+{
+	unsigned long ret;
+	struct scan_control sc = {
+		.gfp_mask = GFP_KERNEL,
+		.priority = DEF_PRIORITY,
+		.may_unmap = 1,
+		.may_swap = 1,
+		.may_discard = 1
+	};
+
+	unsigned long dummy1, dummy2;
+	struct page *page;
+
+	list_for_each_entry(page, page_list, lru) {
+		VM_BUG_ON(!PageAnon(page));
+		ClearPageActive(page);
+	}
+
+	/* page_list have pages from multiple zones */
+	ret = shrink_page_list(page_list, NULL, &sc,
+			TTU_UNMAP|TTU_IGNORE_ACCESS,
+			&dummy1, &dummy2, false);
+	__mod_zone_page_state(zone, NR_ISOLATED_ANON, -ret);
+	return ret;
+}
+
 unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 					    struct list_head *page_list)
 {
@@ -2786,6 +2815,16 @@ loop_again:
 			if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
 			    !zone_balanced(zone, testorder,
 					   balance_gap, end_zone)) {
+
+				unsigned int nr_discard;
+				if (testorder == 0) {
+					nr_discard = discard_vrange_pages(zone,
+							SWAP_CLUSTER_MAX);
+					sc.nr_reclaimed += nr_discard;
+					if (zone_balanced(zone, testorder, 0,
+								end_zone))
+						goto zone_balanced;
+				}
 				shrink_zone(zone, &sc);
 
 				reclaim_state->reclaimed_slab = 0;
@@ -2809,7 +2848,8 @@ loop_again:
 				continue;
 			}
 
-			if (zone_balanced(zone, testorder, 0, end_zone))
+			if (zone_balanced(zone, testorder, 0, end_zone)) {
+zone_balanced:
 				/*
 				 * If a zone reaches its high watermark,
 				 * consider it to be no longer congested. It's
@@ -2818,6 +2858,7 @@ loop_again:
 				 * speculatively avoid congestion waits
 				 */
 				zone_clear_flag(zone, ZONE_CONGESTED);
+			}
 		}
 
 /*
diff --git a/mm/vrange.c b/mm/vrange.c
index c686960..e9ea728 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -13,13 +13,30 @@
 #include 
 #include 
 #include 
+#include 
+
+struct vrange_walker_private {
+	struct zone *zone;
+	struct vm_area_struct *vma;
+	struct list_head *pagelist;
+};
 
 static LIST_HEAD(lru_vrange);
 static DEFINE_SPINLOCK(lru_lock);
 static struct kmem_cache *vrange_cachep;
 
+static void vrange_ctor(void *data)
+{
+	struct vrange *vrange = data;
+	INIT_LIST_HEAD(&vrange->lru);
+}
+
+void __init vrange_init(void)
+{
+	vrange_cachep = kmem_cache_create("vrange", sizeof(struct vrange),
+					0, SLAB_PANIC, vrange_ctor);
+}
 
 void lru_add_vrange(struct vrange *vrange)
 {
@@ -61,17 +78,13 @@ void lru_move_vrange_to_head(struct mm_struct *mm, unsigned long address)
 	vrange_unlock(vroot);
 }
 
-void __init vrange_init(void)
-{
-	vrange_cachep = KMEM_CACHE(vrange, SLAB_PANIC);
-}
-
 static struct vrange *__vrange_alloc(gfp_t flags)
 {
 	struct vrange *vrange = kmem_cache_alloc(vrange_cachep, flags);
 	if (!vrange)
 		return vrange;
 	vrange->owner = NULL;
+	atomic_set(&vrange->refcount, 1);
 	INIT_LIST_HEAD(&vrange->lru);
 	return vrange;
 }
@@ -83,6 +96,13 @@ static void __vrange_free(struct vrange *range)
 	kmem_cache_free(vrange_cachep, range);
 }
 
+static void put_vrange(struct vrange *range)
+{
+	WARN_ON(atomic_read(&range->refcount) < 0);
+	if (atomic_dec_and_test(&range->refcount))
+		__vrange_free(range);
+}
+
 static void __vrange_add(struct vrange *range, struct vrange_root *vroot)
 {
 	range->owner = vroot;
@@ -136,7 +156,7 @@ static int vrange_add(struct vrange_root *vroot,
 		range = container_of(node, struct vrange, node);
 		/* old range covers new range fully */
 		if (node->start <= start_idx && node->last >= end_idx) {
-			__vrange_free(new_range);
+			put_vrange(new_range);
 			goto out;
 		}
@@ -145,7 +165,7 @@ static int vrange_add(struct vrange_root *vroot,
 		purged |= range->purged;
 		__vrange_remove(range);
-		__vrange_free(range);
+		put_vrange(range);
 		node = next;
 	}
@@ -186,7 +206,7 @@ static int vrange_remove(struct vrange_root *vroot,
 		if (start_idx <= node->start && end_idx >= node->last) {
 			/* argumented range covers the range fully */
 			__vrange_remove(range);
-			__vrange_free(range);
+			put_vrange(range);
 		} else if (node->start >= start_idx) {
 			/*
 			 * Argumented range covers over the left of the
@@ -216,7 +236,7 @@ static int vrange_remove(struct vrange_root *vroot,
 	vrange_unlock(vroot);
 
 	if (!used_new)
-		__vrange_free(new_range);
+		put_vrange(new_range);
 
 	return 0;
 }
@@ -240,7 +260,7 @@ void vrange_root_cleanup(struct vrange_root *vroot)
 		range = vrange_entry(next);
 		next = rb_next(next);
 		__vrange_remove(range);
-		__vrange_free(range);
+		put_vrange(range);
 	}
 	vrange_unlock(vroot);
 }
@@ -633,6 +653,7 @@ int discard_vpage(struct page *page)
 
 		if (page_freeze_refs(page, 1)) {
 			unlock_page(page);
+			dec_zone_page_state(page, NR_ISOLATED_ANON);
 			return 1;
 		}
 	}
@@ -659,3 +680,209 @@ bool is_purged_vrange(struct mm_struct *mm, unsigned long address)
 	return ret;
 }
+
+static void vrange_pte_entry(pte_t pteval, unsigned long address,
+		unsigned ptent_size, struct mm_walk *walk)
+{
+	struct page *page;
+	struct vrange_walker_private *vwp = walk->private;
+	struct vm_area_struct *vma = vwp->vma;
+	struct list_head *pagelist = vwp->pagelist;
+	struct zone *zone = vwp->zone;
+
+	if (pte_none(pteval))
+		return;
+
+	if (!pte_present(pteval))
+		return;
+
+	page = vm_normal_page(vma, address, pteval);
+	if (unlikely(!page))
+		return;
+
+	if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
+		return;
+
+	/* TODO : Support THP and HugeTLB */
+	if (unlikely(PageCompound(page)))
+		return;
+
+	if (zone_idx(page_zone(page)) > zone_idx(zone))
+		return;
+
+	if (isolate_lru_page(page))
+		return;
+
+	list_add(&page->lru, pagelist);
+	inc_zone_page_state(page, NR_ISOLATED_ANON);
+}
+
+static int vrange_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+		struct mm_walk *walk)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE)
+		vrange_pte_entry(*pte, addr, PAGE_SIZE, walk);
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+	return 0;
+
+}
+
+unsigned int discard_vma_pages(struct zone *zone, struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long start,
+		unsigned long end, unsigned int nr_to_discard)
+{
+	LIST_HEAD(pagelist);
+	int ret = 0;
+	struct vrange_walker_private vwp;
+	struct mm_walk vrange_walk = {
+		.pmd_entry = vrange_pte_range,
+		.mm = vma->vm_mm,
+		.private = &vwp,
+	};
+
+	vwp.pagelist = &pagelist;
+	vwp.vma = vma;
+	vwp.zone = zone;
+
+	walk_page_range(start, end, &vrange_walk);
+
+	if (!list_empty(&pagelist))
+		ret = discard_vrange_page_list(zone, &pagelist);
+
+	putback_lru_pages(&pagelist);
+	return ret;
+}
+
+unsigned int discard_vrange(struct zone *zone, struct vrange *vrange,
+		int nr_to_discard)
+{
+	struct mm_struct *mm;
+	unsigned long start = vrange->node.start;
+	unsigned long end = vrange->node.last;
+	struct vm_area_struct *vma;
+	unsigned int nr_discarded = 0;
+
+	mm = vrange_get_owner_mm(vrange);
+
+	if (!mm)
+		goto out;
+
+	if (!down_read_trylock(&mm->mmap_sem))
+		goto out;
+
+	vma = find_vma(mm, start);
+	if (!vma || (vma->vm_start > end))
+		goto out_unlock;
+
+	for (; vma; vma = vma->vm_next) {
+		if (vma->vm_start > end)
+			break;
+
+		if (vma->vm_file ||
+			(vma->vm_flags & (VM_SPECIAL | VM_LOCKED)))
+			continue;
+
+		cond_resched();
+		nr_discarded +=
+			discard_vma_pages(zone, mm, vma,
+				max_t(unsigned long, start, vma->vm_start),
+				min_t(unsigned long, end + 1, vma->vm_end),
+				nr_to_discard);
+	}
+out_unlock:
+	up_read(&mm->mmap_sem);
+out:
+	return nr_discarded;
+}
+
+/*
+ * Get next victim vrange from LRU and hold a vrange refcount
+ * and vrange->mm's refcount.
+ */
+struct vrange *get_victim_vrange(void)
+{
+	struct mm_struct *mm;
+	struct vrange *vrange = NULL;
+	struct list_head *cur, *tmp;
+
+	spin_lock(&lru_lock);
+	list_for_each_prev_safe(cur, tmp, &lru_vrange) {
+		vrange = list_entry(cur, struct vrange, lru);
+		mm = vrange_get_owner_mm(vrange);
+		/* the process is exiting so pass it */
+		if (atomic_read(&mm->mm_users) == 0) {
+			list_del_init(&vrange->lru);
+			vrange = NULL;
+			continue;
+		}
+
+		/* vrange is freeing so continue to loop */
+		if (!atomic_inc_not_zero(&vrange->refcount)) {
+			list_del_init(&vrange->lru);
+			vrange = NULL;
+			continue;
+		}
+
+		/*
+		 * we need to access mmap_sem further routine so
+		 * need to get a refcount of mm.
+		 * NOTE: We guarantee mm_count isn't zero in here because
+		 * if we found vrange from LRU list, it means we are
+		 * before exit_vrange or remove_vrange.
+		 */
+		atomic_inc(&mm->mm_count);
+
+		/* Isolate vrange */
+		list_del_init(&vrange->lru);
+		break;
+	}
+
+	spin_unlock(&lru_lock);
+	return vrange;
+}
+
+void put_victim_range(struct vrange *vrange)
+{
+	struct mm_struct *mm = vrange_get_owner_mm(vrange);
+
+	put_vrange(vrange);
+	mmdrop(mm);
+}
+
+unsigned int discard_vrange_pages(struct zone *zone, int nr_to_discard)
+{
+	struct vrange *vrange, *start_vrange;
+	unsigned int nr_discarded = 0;
+
+	start_vrange = vrange = get_victim_vrange();
+	if (start_vrange) {
+		struct mm_struct *mm = vrange_get_owner_mm(start_vrange);
+		atomic_inc(&start_vrange->refcount);
+		atomic_inc(&mm->mm_count);
+	}
+
+	while (vrange) {
+		nr_discarded += discard_vrange(zone, vrange, nr_to_discard);
+		lru_add_vrange(vrange);
+		put_victim_range(vrange);
+
+		if (nr_discarded >= nr_to_discard)
+			break;
+
+		vrange = get_victim_vrange();
+		/* break if we go round the loop */
+		if (vrange == start_vrange) {
+			lru_add_vrange(vrange);
+			put_victim_range(vrange);
+			break;
+		}
+	}
+
+	if (start_vrange)
+		put_victim_range(start_vrange);
+
+	return nr_discarded;
+}