From patchwork Fri May 3 18:27:07 2013
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 16700
From: John Stultz
To: Minchan Kim
Cc: John Stultz
Subject: [PATCH 03/12] vrange: Add new vrange(2) system call
Date: Fri, 3 May 2013 11:27:07 -0700
Message-Id: <1367605636-18284-4-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1367605636-18284-1-git-send-email-john.stultz@linaro.org>
References: <1367605636-18284-1-git-send-email-john.stultz@linaro.org>

From: Minchan Kim

This patch adds a new system call, sys_vrange.

NAME
    vrange - Mark or unmark a range of memory as volatile

SYNOPSIS
    int vrange(unsigned long start, size_t length, int mode,
               int behavior, int *purged);

DESCRIPTION
    Applications can use vrange(2) to advise the kernel how it should
    handle paging I/O in this VM area. The idea is to help the kernel
    discard the pages of a vrange, instead of reclaiming them, when
    memory pressure happens; the kernel does not discard any pages of
    a vrange while there is no memory pressure.

    mode:
    VRANGE_VOLATILE
        Hint to the kernel that the VM may discard the pages in this
        vrange when memory pressure happens.
    VRANGE_NONVOLATILE
        Hint to the kernel that the VM should no longer discard the
        pages in this vrange.

    behavior:
    VRANGE_MODE_SHARED
        Normally the volatility is private to the process doing the
        marking; in other words, pages are only purged if *all*
        processes that have a page mapped consider it volatile. With
        the VRANGE_MODE_SHARED flag set, file pages can be purged if
        any process has marked them volatile. This is very similar to
        the MAP_SHARED semantics of mmap(2), where modifications made
        to an mmapped file by one process are seen by other processes.
        (NOTE: This functionality is not present in this patch, but is
        introduced in a following patch.)

    If a process accesses purged memory without first making it
    non-volatile via a VRANGE_NONVOLATILE call, it can receive SIGBUS
    for any page that the kernel has discarded.

    purged:
        Pointer to an integer which will be set to 1 if
        mode == VRANGE_NONVOLATILE and any page in the affected range
        was purged. If purged returns zero during a
        mode == VRANGE_NONVOLATILE call, all of the pages in the range
        are intact.

RETURN VALUE
    On success vrange returns the number of bytes marked or unmarked.
    Similar to write(2), it may return fewer bytes than specified if it
    ran into a problem. If an error is returned, no changes were made.

ERRORS
    EINVAL  This error can occur for the following reasons:
            * The value of length is negative.
            * start is not page-aligned.
            * mode or behavior is not a valid value.

    ENOMEM  Not enough memory.

    ENOTSUPP
            behavior is not yet supported.
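EXAMPLE
    Since there is no libc wrapper for this new call, the sketch below
    invokes it through syscall(2). It assumes the x86_64 syscall number
    (314) and the VRANGE_* values added by this patch, and passes
    behavior as 0 because no behavior flags are supported yet; a real
    caller would also handle short returns, since vrange() may mark
    fewer bytes than requested.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>

    #ifndef __NR_vrange
    #define __NR_vrange        314     /* x86_64 entry added below */
    #endif
    #define VRANGE_VOLATILE    0
    #define VRANGE_NONVOLATILE 1

    int main(void)
    {
            size_t len = 16 * 4096;
            int purged = 0;
            long ret;
            char *buf;

            buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                    return 1;
            memset(buf, 0xaa, len);

            /* Unpin the range: the kernel may now discard these pages
             * instead of swapping them out under memory pressure. */
            ret = syscall(__NR_vrange, buf, len, VRANGE_VOLATILE, 0, &purged);
            if (ret < 0)
                    perror("vrange(VOLATILE)");

            /* ... later, before touching the cached data again ... */

            /* Re-pin the range and learn whether anything was discarded. */
            ret = syscall(__NR_vrange, buf, len, VRANGE_NONVOLATILE, 0, &purged);
            if (ret < 0)
                    perror("vrange(NONVOLATILE)");
            else if (purged)
                    printf("pages were purged; contents must be regenerated\n");
            else
                    printf("contents intact\n");

            munmap(buf, len);
            return 0;
    }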
Signed-off-by: Minchan Kim
[jstultz: Major rework of interface and commit message]
Signed-off-by: John Stultz
---
 arch/x86/syscalls/syscall_64.tbl       |  1 +
 include/linux/mm_types.h               |  5 +++
 include/uapi/asm-generic/mman-common.h |  3 ++
 kernel/fork.c                          |  3 ++
 mm/vrange.c                            | 75 ++++++++++++++++++++++++++++++++++
 5 files changed, 87 insertions(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..dc332bd 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311     64      process_vm_writev       sys_process_vm_writev
 312     common  kcmp                    sys_kcmp
 313     common  finit_module            sys_finit_module
+314     common  vrange                  sys_vrange

 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..2e02a6d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -351,6 +353,9 @@ struct mm_struct {
         */
+#ifdef CONFIG_MMU
+       struct vrange_root vroot;
+#endif
        unsigned long hiwater_rss;      /* High-watermark of RSS usage */
        unsigned long hiwater_vm;       /* High-water virtual memory usage */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 4164529..9be120b 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,4 +66,7 @@
 #define MAP_HUGE_SHIFT  26
 #define MAP_HUGE_MASK   0x3f

+#define VRANGE_VOLATILE         0       /* unpin pages so VM can discard them */
+#define VRANGE_NONVOLATILE      1       /* pin pages so VM can't discard them */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 1766d32..360ad65 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -70,6 +70,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -541,6 +542,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
        spin_lock_init(&mm->page_table_lock);
        mm->free_area_cache = TASK_UNMAPPED_BASE;
        mm->cached_hole_size = ~0UL;
+       vrange_root_init(&mm->vroot, VRANGE_MM);
        mm_init_aio(mm);
        mm_init_owner(mm, p);
@@ -612,6 +614,7 @@ void mmput(struct mm_struct *mm)
        if (atomic_dec_and_test(&mm->mm_users)) {
                uprobe_clear_state(mm);
+               vrange_root_cleanup(&mm->vroot);
                exit_aio(mm);
                ksm_exit(mm);
                khugepaged_exit(mm); /* must run before exit_mmap */
diff --git a/mm/vrange.c b/mm/vrange.c
index 565bca34..537e3d5 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -4,6 +4,8 @@
 #include
 #include
+#include
+#include

 static struct kmem_cache *vrange_cachep;

@@ -163,3 +165,76 @@ void vrange_root_cleanup(struct vrange_root *vroot)
        vrange_unlock(vroot);
 }
+
+static int vrange_private(struct mm_struct *mm,
+               unsigned long start, unsigned long end,
+               int mode, int *purged)
+{
+       int ret = -EINVAL;
+
+       if (mode == VRANGE_VOLATILE)
+               ret = vrange_add(&mm->vroot, start, end-1);
+       else if (mode == VRANGE_NONVOLATILE)
+               ret = vrange_remove(&mm->vroot, start, end-1, purged);
+
+       if (ret < 0)
+               return ret;
+       return end-start;
+}
+
+/*
+ * The vrange(2) system call.
+ *
+ * Applications can use vrange() to advise the kernel how it should
+ * handle paging I/O in this VM area. The idea is to help the kernel
+ * discard pages of a vrange instead of swapping them out when memory
+ * pressure happens. The information provided is advisory only, and can
+ * be safely disregarded by the kernel if the system has enough free
+ * memory.
+ *
+ * mode values:
+ *     VRANGE_VOLATILE - hint to the kernel so the VM can discard the pages
+ *     of the vrange when memory pressure happens.
+ *     VRANGE_NONVOLATILE - removes any volatile hints previously specified
+ *     in that range.
+ *
+ * behavior values (bitflags): none yet supported.
+ *
+ * purged ptr:
+ *     Returns 1 if any page in the range being marked nonvolatile has been
+ *     purged.
+ *
+ * return values:
+ *     non-negative - number of bytes marked or unmarked.
+ *     -EINVAL - len < 0, start is not page-aligned, start is greater than
+ *               TASK_SIZE, or "mode" is not a valid value.
+ *     -ENOMEM - not enough free memory in the system to complete the call.
+ *     -ENOTSUPP - feature not yet supported.
+ */
+SYSCALL_DEFINE5(vrange, unsigned long, start,
+               size_t, len, int, mode, int, behavior, int *, purged)
+{
+       unsigned long end;
+       struct mm_struct *mm = current->mm;
+       int ret = -EINVAL;
+
+       /* We don't yet support any behavior modes */
+       if (behavior)
+               return -ENOTSUPP;
+
+       if (start & ~PAGE_MASK)
+               goto out;
+
+       len &= PAGE_MASK;
+       if (!len)
+               goto out;
+
+       end = start + len;
+       if (end < start)
+               goto out;
+
+       if (start >= TASK_SIZE)
+               goto out;
+
+       ret = vrange_private(mm, start, end, mode, purged);
+
+out:
+       return ret;
+}
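
The usage pattern the description above has in mind for an application-level
cache is to drop the pin while an object is idle, then re-pin it and check
*purged before reuse. A rough sketch of such wrappers follows; the
cache_release()/cache_acquire() names are only illustrative, the
__NR_vrange/VRANGE_* definitions match the values this patch adds for x86_64,
and a real caller would also handle short returns.

#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_vrange
#define __NR_vrange        314
#endif
#define VRANGE_VOLATILE    0
#define VRANGE_NONVOLATILE 1

/* Drop our pin on a cached object: the kernel may now purge its pages
 * instead of swapping them out when memory gets tight.
 * Returns 0 on success or -errno on failure. */
static int cache_release(void *addr, size_t len)
{
        int purged = 0;

        if (syscall(__NR_vrange, addr, len, VRANGE_VOLATILE, 0, &purged) < 0)
                return -errno;
        return 0;
}

/* Re-pin a cached object before using it again. Returns 1 if any page was
 * purged (the caller must regenerate the contents), 0 if the data is
 * intact, or -errno on failure. */
static int cache_acquire(void *addr, size_t len)
{
        int purged = 0;

        if (syscall(__NR_vrange, addr, len, VRANGE_NONVOLATILE, 0, &purged) < 0)
                return -errno;
        return purged ? 1 : 0;
}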