From patchwork Tue Oct 1 18:38:49 2013
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 20730
From: John Stultz
To: Minchan Kim, Dhaval Giani
Subject: [PATCH 05/14] vrange: Add new vrange(2) system call
Date: Tue, 1 Oct 2013 11:38:49 -0700
Message-Id: <1380652738-8000-6-git-send-email-john.stultz@linaro.org>
In-Reply-To: <1380652738-8000-1-git-send-email-john.stultz@linaro.org>
References: <1380652738-8000-1-git-send-email-john.stultz@linaro.org>

From: Minchan Kim

This patch adds a new system call, sys_vrange.

NAME
    vrange - Mark or unmark a range of memory as volatile

SYNOPSIS
    int vrange(unsigned long start, size_t length, int mode,
               int *purged);

DESCRIPTION
    Applications can use vrange(2) to advise the kernel how it should
    handle paging I/O in this VM area. The idea is to help the kernel
    discard pages in a vrange instead of reclaiming them when memory
    pressure happens. This means the kernel does not discard any pages
    in a vrange if there is no memory pressure.

    mode:
        VRANGE_VOLATILE
            Hint to the kernel that the VM may discard pages in the
            vrange when memory pressure happens.

        VRANGE_NONVOLATILE
            Hint to the kernel that the VM must no longer discard
            pages in the vrange.

        If a user tries to access purged memory without first making a
        VRANGE_NONVOLATILE call, they may receive SIGBUS if the page
        was discarded by the kernel.
    purged:
        Pointer to an integer that is set to 1 if
        mode == VRANGE_NONVOLATILE and any page in the affected range
        was purged. If purged is set to zero during a
        mode == VRANGE_NONVOLATILE call, all of the pages in the range
        are intact.

RETURN VALUE
    On success, vrange returns the number of bytes marked or unmarked.
    Similar to write(), it may return fewer bytes than specified if it
    ran into a problem.

    If an error is returned, no changes were made.

ERRORS
    EINVAL  This error can occur for the following reasons:
            * length is negative or is not in page-size units.
            * start is not page-aligned.
            * mode is not a valid value.

    ENOMEM  Not enough memory.

    EFAULT  The purged pointer is invalid.

Cc: Andrew Morton
Cc: Android Kernel Team
Cc: Robert Love
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Rik van Riel
Cc: Dmitry Adamushko
Cc: Dave Chinner
Cc: Neil Brown
Cc: Andrea Righi
Cc: Andrea Arcangeli
Cc: Aneesh Kumar K.V
Cc: Mike Hommey
Cc: Taras Glek
Cc: Dhaval Giani
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Michel Lespinasse
Cc: Rob Clark
Cc: Minchan Kim
Cc: linux-mm@kvack.org
Signed-off-by: Minchan Kim
Signed-off-by: John Stultz
---
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 include/linux/syscalls.h               |   2 +
 include/uapi/asm-generic/mman-common.h |   3 +
 kernel/sys_ni.c                        |   1 +
 mm/vrange.c                            | 164 +++++++++++++++++++++++++++++++++
 5 files changed, 171 insertions(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..dc332bd 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	vrange			sys_vrange
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 84662ec..0997165 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -846,4 +846,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
 			 unsigned long idx1, unsigned long idx2);
 asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
+asmlinkage long sys_vrange(unsigned long start, size_t len, int mode,
+				int __user *purged);
 #endif
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 4164529..9be120b 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,4 +66,7 @@
 #define MAP_HUGE_SHIFT	26
 #define MAP_HUGE_MASK	0x3f
 
+#define VRANGE_VOLATILE		0	/* unpin pages so VM can discard them */
+#define VRANGE_NONVOLATILE	1	/* pin pages so VM can't discard them */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7078052..f40070e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -175,6 +175,7 @@ cond_syscall(sys_mremap);
 cond_syscall(sys_remap_file_pages);
 cond_syscall(compat_sys_move_pages);
 cond_syscall(compat_sys_migrate_pages);
+cond_syscall(sys_vrange);
 
 /* block-layer dependent */
 cond_syscall(sys_bdflush);
diff --git a/mm/vrange.c b/mm/vrange.c
index 10c736f..115ddb4 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -4,6 +4,8 @@
 
 #include
 #include
+#include
+#include
 
 static struct kmem_cache *vrange_cachep;
 
@@ -229,3 +231,165 @@ fail:
 	vrange_root_cleanup(new);
 	return 0;
 }
+
+static inline struct vrange_root *__vma_to_vroot(struct vm_area_struct *vma)
+{
+	struct vrange_root *vroot = NULL;
+
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED))
+		vroot = &vma->vm_file->f_mapping->vroot;
+	else
+		vroot = &vma->vm_mm->vroot;
+	return vroot;
+}
+
+static inline unsigned long __vma_addr_to_index(struct vm_area_struct *vma,
+						unsigned long addr)
+{
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED))
+		return (vma->vm_pgoff << PAGE_SHIFT) + addr - vma->vm_start;
+	return addr;
+}
+
+static ssize_t do_vrange(struct mm_struct *mm, unsigned long start_idx,
+				unsigned long end_idx, int mode, int *purged)
+{
+	struct vm_area_struct *vma;
+	unsigned long orig_start = start_idx;
+	ssize_t count = 0, ret = 0;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_vma(mm, start_idx);
+	for (;;) {
+		struct vrange_root *vroot;
+		unsigned long tmp, vstart_idx, vend_idx;
+
+		if (!vma)
+			goto out;
+
+		if (vma->vm_flags & (VM_SPECIAL|VM_LOCKED|VM_MIXEDMAP|
+					VM_HUGETLB))
+			goto out;
+
+		/* make sure start is at the front of the current vma */
+		if (start_idx < vma->vm_start) {
+			start_idx = vma->vm_start;
+			if (start_idx > end_idx)
+				goto out;
+		}
+
+		/* bound tmp to closer of vm_end & end */
+		tmp = vma->vm_end - 1;
+		if (end_idx < tmp)
+			tmp = end_idx;
+
+		vroot = __vma_to_vroot(vma);
+		vstart_idx = __vma_addr_to_index(vma, start_idx);
+		vend_idx = __vma_addr_to_index(vma, tmp);
+
+		/* mark or unmark */
+		if (mode == VRANGE_VOLATILE)
+			ret = vrange_add(vroot, vstart_idx, vend_idx);
+		else if (mode == VRANGE_NONVOLATILE)
+			ret = vrange_remove(vroot, vstart_idx, vend_idx,
+						purged);
+
+		if (ret)
+			goto out;
+
+		/* update count to distance covered so far */
+		count = tmp - orig_start + 1;
+
+		/* move start up to the end of the vma */
+		start_idx = vma->vm_end;
+		if (start_idx > end_idx)
+			goto out;
+		/* move to the next vma */
+		vma = vma->vm_next;
+	}
+out:
+	up_read(&mm->mmap_sem);
+
+	/* report bytes successfully marked, even if we're exiting on error */
+	if (count)
+		return count;
+
+	return ret;
+}
+
+/*
+ * The vrange(2) system call.
+ *
+ * Applications can use vrange() to advise the kernel how it should
+ * handle paging I/O in this VM area. The idea is to help the kernel
+ * discard pages of vrange instead of swapping out when memory pressure
+ * happens. The information provided is advisory only, and can be safely
+ * disregarded by the kernel if system has enough free memory.
+ *
+ * mode values:
+ *  VRANGE_VOLATILE - hint to kernel so VM can discard vrange pages when
+ *		memory pressure happens.
+ *  VRANGE_NONVOLATILE - Removes any volatile hints previously specified in
+ *		that range.
+ *
+ * purged ptr:
+ *  Returns 1 if any page in the range being marked nonvolatile has been purged.
+ *
+ * Return values:
+ *  On success vrange returns the number of bytes marked or unmarked.
+ *  Similar to write(), it may return fewer bytes than specified if
+ *  it ran into a problem.
+ *
+ *  If an error is returned, no changes were made.
+ *
+ * Errors:
+ *  -EINVAL - len < 0, start is not page-aligned, start is greater
+ *		than TASK_SIZE or "mode" is not a valid value.
+ *  -ENOMEM - Short of free memory in system for successful system call.
+ *  -EFAULT - Purged pointer is invalid.
+ *  -ENOSUP - Feature not yet supported.
+ */
+SYSCALL_DEFINE4(vrange, unsigned long, start,
+		size_t, len, int, mode, int __user *, purged)
+{
+	unsigned long end;
+	struct mm_struct *mm = current->mm;
+	ssize_t ret = -EINVAL;
+	int p = 0;
+
+	if (start & ~PAGE_MASK)
+		goto out;
+
+	len &= PAGE_MASK;
+	if (!len)
+		goto out;
+
+	end = start + len;
+	if (end < start)
+		goto out;
+
+	if (start >= TASK_SIZE)
+		goto out;
+
+	if (purged) {
+		/* Test pointer is valid before making any changes */
+		if (put_user(p, purged))
+			return -EFAULT;
+	}
+
+	ret = do_vrange(mm, start, end - 1, mode, &p);
+
+	if (purged) {
+		if (put_user(p, purged)) {
+			/*
+			 * This would be bad, since we've modified volatility
+			 * and the change in purged state would be lost.
+			 */
+			BUG();
+		}
+	}
+
+out:
+	return ret;
+}
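A userspace sketch of the argument checks that SYSCALL_DEFINE4(vrange, ...) performs before it calls do_vrange(). It shows how the EINVAL cases in the ERRORS list map onto code. The sketch assumes 4 KiB pages, a 64-bit unsigned long, and an x86_64-style TASK_SIZE of (1UL << 47) - PAGE_SIZE. The SKETCH_* constants and vrange_check_args() are hypothetical names, not kernel API. Unlike the patch as posted, the sketch also rejects an unknown mode up front. In the patch, SYSCALL_DEFINE4 does not validate mode, so an unrecognized value matches neither branch in do_vrange(). In that case ret stays 0 and the call reports the range as covered without marking it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages, 64-bit unsigned long, and an
 * x86_64-style user address limit. These names are not kernel API. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TASK_SIZE ((1UL << 47) - SKETCH_PAGE_SIZE)

/* Mode values as defined in this patch's mman-common.h hunk. */
#define VRANGE_VOLATILE    0
#define VRANGE_NONVOLATILE 1

/*
 * Mirrors the checks in SYSCALL_DEFINE4(vrange, ...). On success, returns 0
 * and stores the inclusive last byte (end - 1) in *last, the value the
 * syscall passes to do_vrange() as end_idx. Otherwise returns -EINVAL.
 */
static int vrange_check_args(unsigned long start, size_t len, int mode,
			     unsigned long *last)
{
	unsigned long end;

	assert(last != NULL);

	if (start & ~SKETCH_PAGE_MASK)	/* start must be page-aligned */
		return -EINVAL;

	len &= SKETCH_PAGE_MASK;	/* a partial trailing page is dropped */
	if (!len)
		return -EINVAL;

	end = start + len;
	if (end < start)		/* start + len wrapped around */
		return -EINVAL;

	if (start >= SKETCH_TASK_SIZE)
		return -EINVAL;

	/* Extra check: the syscall in this patch leaves mode unvalidated. */
	if (mode != VRANGE_VOLATILE && mode != VRANGE_NONVOLATILE)
		return -EINVAL;

	*last = end - 1;
	return 0;
}
```

Because the length is rounded down with the page mask rather than rejected, a request of 2 * 4096 + 5 bytes at 0x1000 is accepted and covers 0x1000..0x2fff. A length shorter than one page rounds down to zero and fails with -EINVAL.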
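do_vrange() walks the VMAs that overlap [start, end] and skips any holes between them. It stops at the first VMA it refuses (VM_SPECIAL, VM_LOCKED, VM_MIXEDMAP or VM_HUGETLB) or at the first error from vrange_add()/vrange_remove(). It then reports the bytes covered from the original start through the last chunk it marked. The userspace sketch below replays that loop over a sorted array. It assumes a hypothetical struct sketch_vma and a callback that stands in for vrange_add()/vrange_remove(). It leaves out mmap_sem locking and the shared-file index translation done by __vma_addr_to_index().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long start;	/* like vm_start */
	unsigned long end;	/* exclusive, like vm_end */
	int refused;		/* stands in for VM_SPECIAL|VM_LOCKED|... */
};

/* Hypothetical stand-in for vrange_add()/vrange_remove(); 0 on success. */
typedef int (*sketch_mark_fn)(unsigned long first, unsigned long last,
			      void *ctx);

/* Example callback: counts how many chunks were marked. */
static int sketch_count_chunk(unsigned long first, unsigned long last,
			      void *ctx)
{
	(void)first;
	(void)last;
	++*(int *)ctx;
	return 0;
}

/* Replays the do_vrange() loop over vmas[0..n), sorted by address. */
static long sketch_walk(const struct sketch_vma *vmas, size_t n,
			unsigned long start, unsigned long end,
			sketch_mark_fn mark, void *ctx)
{
	unsigned long orig_start = start;
	long count = 0;
	int ret = 0;
	size_t i = 0;

	/* Like find_vma(): the first VMA that ends above start. */
	while (i < n && vmas[i].end <= start)
		i++;

	for (; i < n; i++) {
		const struct sketch_vma *vma = &vmas[i];
		unsigned long last;

		/* The sketch requires sorted, non-overlapping VMAs. */
		assert(i == 0 || vmas[i - 1].end <= vma->start);
		if (vma->refused)
			break;

		/* Skip a hole: begin at the front of this VMA. */
		if (start < vma->start) {
			start = vma->start;
			if (start > end)
				break;
		}

		/* Clamp to whichever comes first: the VMA end or the request end. */
		last = vma->end - 1;
		if (end < last)
			last = end;

		ret = mark(start, last, ctx);
		if (ret)
			break;

		/* Distance covered so far, measured from the original start. */
		count = (long)(last - orig_start + 1);

		start = vma->end;
		if (start > end)
			break;
	}

	/* As in the patch, a nonzero count is returned in preference to ret. */
	return count ? count : ret;
}
```

The count is computed as tmp - orig_start + 1, so a hole between two marked VMAs is included in the returned byte count. Take VMAs [0x1000, 0x3000) and [0x5000, 0x6000) and a request covering 0x1000..0x5fff: the sketch marks two chunks and returns 0x5000. A request that falls entirely inside a hole marks nothing and returns 0.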