From patchwork Tue Apr 29 21:21:21 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 29376
From: John Stultz
To: LKML
Cc: John Stultz, Andrew Morton, Android Kernel Team, Johannes Weiner,
 Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
 Dmitry Adamushko, Neil Brown, Andrea Arcangeli, Mike Hommey, Taras Glek,
 Jan Kara, KOSAKI Motohiro, Michel Lespinasse, Minchan Kim, Keith Packard,
 "linux-mm@kvack.org"
Subject: [PATCH 2/4] MADV_VOLATILE: Add MADV_VOLATILE/NONVOLATILE hooks and
 handle marking vmas
Date: Tue, 29 Apr 2014 14:21:21 -0700
Message-Id: <1398806483-19122-3-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1398806483-19122-1-git-send-email-john.stultz@linaro.org>
References: <1398806483-19122-1-git-send-email-john.stultz@linaro.org>

This patch introduces MADV_VOLATILE/NONVOLATILE flags to madvise(),
which allow ranges of memory to be marked volatile, meaning the system
may discard them.

This initial patch simply adds the flag handling to madvise() and the
vma handling: splitting and merging the vmas as needed, and marking
them with VM_VOLATILE.

No purging or discarding of volatile ranges is done at this point.

This is a simplified implementation which reuses some of the logic from
Minchan's earlier efforts, so credit to Minchan for his work.

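A minimal userspace sketch of how a caller might use the two new flags,
assuming a kernel with this series applied. The values 18 and 19 come
from this patch; the PROPOSED_ prefix and the helpers range_is_valid(),
cache_release() and cache_reclaim() are hypothetical names used only for
illustration. Later mainline kernels assign these numeric values to
unrelated advice, so the wrappers should not be called on a kernel
without the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Values proposed by this patch. They are an assumption here: they are
 * not in released kernel headers, and later mainline kernels reuse
 * 18 and 19 for unrelated advice.
 */
#define PROPOSED_MADV_VOLATILE		18
#define PROPOSED_MADV_NONVOLATILE	19

/*
 * Hypothetical helper mirroring the argument checks in the patch:
 * both ends page aligned, and the end must not wrap below the start.
 */
int range_is_valid(uintptr_t start, size_t len, size_t page_size)
{
	uintptr_t end = start + len;

	/* page_size must be a non-zero power of two */
	assert(page_size && !(page_size & (page_size - 1)));

	if (start & (page_size - 1))
		return 0;
	if (end & (page_size - 1))
		return 0;
	if (end < start)
		return 0;
	return 1;
}

/* Hypothetical wrapper: tell the kernel this buffer may be discarded. */
int cache_release(void *buf, size_t len)
{
	return madvise(buf, len, PROPOSED_MADV_VOLATILE);
}

/*
 * Hypothetical wrapper: take the buffer back before using it.
 * Per the patch: 0 means the contents are intact, 1 means some pages
 * were purged and must be regenerated, -1 means error (see errno).
 */
int cache_reclaim(void *buf, size_t len)
{
	return madvise(buf, len, PROPOSED_MADV_NONVOLATILE);
}
```

With only this patch applied nothing is purged yet, so cache_reclaim()
would be expected to report 0 for a valid anonymous range.
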
Cc: Andrew Morton
Cc: Android Kernel Team
Cc: Johannes Weiner
Cc: Robert Love
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Rik van Riel
Cc: Dmitry Adamushko
Cc: Neil Brown
Cc: Andrea Arcangeli
Cc: Mike Hommey
Cc: Taras Glek
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Michel Lespinasse
Cc: Minchan Kim
Cc: Keith Packard
Cc: linux-mm@kvack.org
Signed-off-by: John Stultz
---
 include/linux/mm.h                     |   1 +
 include/linux/mvolatile.h              |   6 ++
 include/uapi/asm-generic/mman-common.h |   5 ++
 mm/Makefile                            |   2 +-
 mm/madvise.c                           |  14 ++++
 mm/mvolatile.c                         | 147 +++++++++++++++++++++++++++++++++
 6 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/mvolatile.h
 create mode 100644 mm/mvolatile.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf9811e..ea8b687 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -117,6 +117,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_IO		0x00004000	/* Memory mapped I/O or similar */

 					/* Used by sys_madvise() */
+#define VM_VOLATILE	0x00001000	/* VMA is volatile */
 #define VM_SEQ_READ	0x00008000	/* App will access data sequentially */
 #define VM_RAND_READ	0x00010000	/* App will not benefit from clustered reads */

diff --git a/include/linux/mvolatile.h b/include/linux/mvolatile.h
new file mode 100644
index 0000000..f53396b
--- /dev/null
+++ b/include/linux/mvolatile.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_MVOLATILE_H
+#define _LINUX_MVOLATILE_H
+
+int madvise_volatile(int bhv, unsigned long start, unsigned long end);
+
+#endif /* _LINUX_MVOLATILE_H */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ddc3b36..b74d61d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -39,6 +39,7 @@
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
+
 #define MADV_HWPOISON	100		/* poison a page for testing */
 #define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */

@@ -52,6 +53,10 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */

+#define MADV_VOLATILE	18		/* Mark pages as volatile */
+#define MADV_NONVOLATILE 19		/* Mark pages non-volatile, return 1
+					   if any pages were purged */
+
 /* compatibility flags */
 #define MAP_FILE	0

diff --git a/mm/Makefile b/mm/Makefile
index b484452..9a3dc62 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -18,7 +18,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 			   mm_init.o mmu_context.o percpu.o slab_common.o \
 			   compaction.o balloon_compaction.o vmacache.o \
 			   interval_tree.o list_lru.o workingset.o \
-			   iov_iter.o $(mmu-y)
+			   mvolatile.o iov_iter.o $(mmu-y)

 obj-y += init-mm.o

diff --git a/mm/madvise.c b/mm/madvise.c
index 539eeb9..937c026 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include

 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -413,6 +414,8 @@ madvise_behavior_valid(int behavior)
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
+	case MADV_VOLATILE:
+	case MADV_NONVOLATILE:
 		return 1;

 	default:
@@ -450,9 +453,14 @@ madvise_behavior_valid(int behavior)
  *  MADV_MERGEABLE - the application recommends that KSM try to merge pages in
  *		this area with pages of identical content from other such areas.
  *  MADV_UNMERGEABLE- cancel MADV_MERGEABLE: no longer merge pages with others.
+ *  MADV_VOLATILE - Mark pages as volatile, allowing kernel to purge them under
+ *		pressure.
+ *  MADV_NONVOLATILE - Mark pages as non-volatile. Report if pages were purged.
  *
  * return values:
  *  zero    - success
+ *  1       - (MADV_NONVOLATILE only) some pages marked non-volatile were
+ *            purged.
  *  -EINVAL - start + len < 0, start is not page-aligned,
  *		"behavior" is not a valid value, or application
  *		is attempting to release locked or shared pages.
@@ -478,6 +486,12 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 #endif
 	if (!madvise_behavior_valid(behavior))
 		return error;
+	/*
+	 * MADV_VOLATILE/NONVOLATILE have subtle semantics that require
+	 * we don't use the generic per-vma manipulation below.
+	 */
+	if (behavior == MADV_VOLATILE || behavior == MADV_NONVOLATILE)
+		return madvise_volatile(behavior, start, start+len_in);

 	if (start & ~PAGE_MASK)
 		return error;
diff --git a/mm/mvolatile.c b/mm/mvolatile.c
new file mode 100644
index 0000000..edc5894
--- /dev/null
+++ b/mm/mvolatile.c
@@ -0,0 +1,147 @@
+/*
+ *	mm/mvolatile.c
+ *
+ *  Copyright (C) 2014, LG Electronics, Minchan Kim
+ *  Copyright (C) 2014 Linaro Ltd., John Stultz
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+
+/**
+ * madvise_volatile - Marks or clears VMAs in the range (start-end) as VM_VOLATILE
+ * @mode: the mode of the volatile range (volatile or non-volatile)
+ * @start: starting address of the volatile range
+ * @end: ending address of the volatile range
+ *
+ * Iterates over the VMAs in the specified range, and marks or clears
+ * them as VM_VOLATILE, splitting or merging them as needed.
+ *
+ * Returns 0 on success
+ * Returns 1 if any pages being marked were purged (MADV_NONVOLATILE only)
+ * Returns error only if no bytes were modified.
+ */
+int madvise_volatile(int mode, unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma, *prev;
+	struct mm_struct *mm = current->mm;
+	unsigned long orig_start = start;
+	int ret = 0;
+
+	/* Bit of sanity checking */
+	if ((mode != MADV_VOLATILE) && (mode != MADV_NONVOLATILE))
+		return -EINVAL;
+	if (start & ~PAGE_MASK)
+		return -EINVAL;
+	if (end & ~PAGE_MASK)
+		return -EINVAL;
+	if (end < start)
+		return -EINVAL;
+	if (start >= TASK_SIZE)
+		return -EINVAL;
+
+
+	down_write(&mm->mmap_sem);
+	/*
+	 * First, iterate over the VMAs and make sure there are
+	 * no holes (-ENOMEM) or unsupported vmas, such as file
+	 * mappings (-EINVAL).
+	 */
+	vma = find_vma(mm, start);
+	if (!vma) {
+		/* return ENOMEM if we're trying to mark unmapped pages */
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	while (vma) {
+		if (vma->vm_flags & (VM_SPECIAL|VM_LOCKED|VM_MIXEDMAP|
+					VM_HUGETLB)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/* We don't support volatility on files for now */
+		if (vma->vm_file) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/* return ENOMEM if we're trying to mark unmapped pages */
+		if (start < vma->vm_start) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		start = vma->vm_end;
+		if (start >= end)
+			break;
+		vma = vma->vm_next;
+	}
+
+	/*
+	 * Second, do VMA splitting. Note: If either of these
+	 * fails, we'll make no modifications to the vm_flags,
+	 * and will merge back together any unmodified split
+	 * vmas.
+	 */
+	start = orig_start;
+	vma = find_vma(mm, start);
+	if (start != vma->vm_start)
+		ret = split_vma(mm, vma, start, 1);
+
+	vma = find_vma(mm, end-1);
+	/* only need to split if end addr is not at the end of the vma */
+	if (!ret && (end != vma->vm_end))
+		ret = split_vma(mm, vma, end, 0);
+
+	/*
+	 * Third, if splitting was successful modify vm_flags.
+	 * We also will do any vma merging that is needed at
+	 * this point.
+	 */
+	start = orig_start;
+	vma = find_vma_prev(mm, start, &prev);
+	if (vma && start > vma->vm_start)
+		prev = vma;
+
+	while (vma) {
+		unsigned long new_flags;
+		pgoff_t pgoff;
+
+		new_flags = vma->vm_flags;
+		if (!ret) {
+			if (mode == MADV_VOLATILE)
+				new_flags |= VM_VOLATILE;
+			else /* mode == MADV_NONVOLATILE */
+				new_flags &= ~VM_VOLATILE;
+		}
+		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+		prev = vma_merge(mm, prev, start, vma->vm_end, new_flags,
+					vma->anon_vma, vma->vm_file, pgoff,
+					vma_policy(vma));
+		if (!prev)
+			prev = vma;
+		else
+			vma = prev;
+
+		vma->vm_flags = new_flags;
+
+		start = vma->vm_end;
+		if (start >= end)
+			break;
+		vma = vma->vm_next;
+	}
+out:
+	up_write(&mm->mmap_sem);
+
+	return ret;
+}
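
The three passes that madvise_volatile() makes in this patch (validate
the range, split at its boundaries, then flag and merge) can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
struct area, struct space and every function name below are hypothetical
stand-ins for the kernel's vma machinery, all errors are collapsed to -1
rather than distinct errno values, and the model treats a hole anywhere
in the range as an error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AREAS 16

/* Hypothetical stand-in for a vma: [start, end) plus one flag. */
struct area { unsigned long start, end; int is_volatile; };

/* Hypothetical address space: areas sorted by start, non-overlapping. */
struct space { struct area a[MAX_AREAS]; size_t n; };

/* Pass 1: the whole of [start, end) must be mapped, with no holes. */
static int covered(const struct space *s, unsigned long start,
		   unsigned long end)
{
	for (size_t i = 0; i < s->n && start < end; i++) {
		if (s->a[i].end <= start)
			continue;		/* area lies before the range */
		if (s->a[i].start > start)
			return 0;		/* hole inside the range */
		start = s->a[i].end;
	}
	return start >= end;
}

/* Pass 2: split the area that strictly contains addr, if any. */
static int split_at(struct space *s, unsigned long addr)
{
	for (size_t i = 0; i < s->n; i++) {
		if (addr <= s->a[i].start || addr >= s->a[i].end)
			continue;
		if (s->n == MAX_AREAS)
			return -1;		/* models a failed split */
		for (size_t j = s->n; j > i + 1; j--)
			s->a[j] = s->a[j - 1];
		s->a[i + 1] = s->a[i];
		s->a[i + 1].start = addr;
		s->a[i].end = addr;
		s->n++;
		return 0;
	}
	return 0;				/* addr already on a boundary */
}

/* Merge neighbours that touch and carry the same flag. */
static void merge_adjacent(struct space *s)
{
	size_t out = 0;

	for (size_t i = 1; i < s->n; i++) {
		if (s->a[out].end == s->a[i].start &&
		    s->a[out].is_volatile == s->a[i].is_volatile)
			s->a[out].end = s->a[i].end;
		else
			s->a[++out] = s->a[i];
	}
	if (s->n)
		s->n = out + 1;
}

/* Returns 0 on success, -1 if the range is invalid or a split fails. */
int mark_range(struct space *s, unsigned long start, unsigned long end,
	       int make_volatile)
{
	if (end < start || !covered(s, start, end))
		return -1;
	if (split_at(s, start) || split_at(s, end)) {
		merge_adjacent(s);		/* undo splits, change no flags */
		return -1;
	}
	/* Pass 3: flag every area inside the range, then merge. */
	for (size_t i = 0; i < s->n; i++)
		if (s->a[i].start >= start && s->a[i].end <= end)
			s->a[i].is_volatile = make_volatile;
	merge_adjacent(s);
	return 0;
}
```

Marking the middle of a single area leaves three areas, and clearing the
same range merges them back into one, which is the split and merge
behaviour the patch describes for VM_VOLATILE.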