From patchwork Tue Jun 11 01:12:11 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 17751
From: John Stultz
To: minchan@kernel.org
Cc: dgiani@mozilla.com, John Stultz
Subject: [PATCH 05/13] vrange: Add new vrange(2) system call
Date: Mon, 10 Jun 2013 18:12:11 -0700
Message-Id: <1370913139-9320-6-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.8.1.2
In-Reply-To: <1370913139-9320-1-git-send-email-john.stultz@linaro.org>
References: <1370913139-9320-1-git-send-email-john.stultz@linaro.org>

This patch adds a new system call, sys_vrange.

NAME
        vrange - Mark or unmark a range of memory as volatile

SYNOPSIS
        int vrange(unsigned long start, size_t length, int mode,
                   int *purged);

DESCRIPTION
        Applications can use vrange(2) to advise the kernel how it should
        handle paging I/O in this VM area. The idea is to let the kernel
        discard the pages of a volatile range, instead of swapping them
        out, when memory pressure occurs. The kernel does not discard any
        pages of the range if there is no memory pressure.

        mode:
        VRANGE_VOLATILE
                Hint to the kernel that the VM may discard pages in the
                range when memory pressure occurs.
        VRANGE_NONVOLATILE
                Hint to the kernel that the VM must no longer discard
                pages in the range.

        If a user accesses purged memory without first calling vrange()
        with VRANGE_NONVOLATILE, the access may raise SIGBUS if the page
        was discarded by the kernel.

        purged: Pointer to an integer that is set to 1 if
        mode == VRANGE_NONVOLATILE and any page in the affected range
        was purged. If purged is set to zero by a mode ==
        VRANGE_NONVOLATILE call, all of the pages in the range are
        intact.

RETURN VALUE
        On success, vrange() returns the number of bytes marked or
        unmarked. Similar to write(), it may return fewer bytes than
        specified if it ran into a problem.

        If an error is returned, no changes were made.

ERRORS
        EINVAL This error can occur for the following reasons:
                * The value length is negative.
                * start is not page-aligned.
                * mode is not a valid value.

        ENOMEM Not enough memory.

        ENOTSUPP Behavior is not yet supported.

Signed-off-by: Minchan Kim
[jstultz: Major rework of interface and commit message]
Signed-off-by: John Stultz
---
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 include/uapi/asm-generic/mman-common.h |   3 +
 mm/vrange.c                            | 129 +++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..dc332bd 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	vrange			sys_vrange
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 4164529..9be120b 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,4 +66,7 @@
 #define MAP_HUGE_SHIFT	26
 #define MAP_HUGE_MASK	0x3f
 
+#define VRANGE_VOLATILE	0	/* unpin pages so VM can discard them */
+#define VRANGE_NONVOLATILE	1	/* pin pages so VM can't discard them */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/mm/vrange.c b/mm/vrange.c
index 5ca8853..0ab741e 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 static struct kmem_cache *vrange_cachep;
 
@@ -217,3 +218,131 @@ fail:
 	vrange_root_cleanup(new);
 	return -ENOMEM;
 }
+
+static ssize_t do_vrange(struct mm_struct *mm, unsigned long start_idx,
+				unsigned long end_idx, int mode, int *purged)
+{
+	struct vm_area_struct *vma;
+	unsigned long orig_start = start_idx;
+	ssize_t count = 0, ret = 0;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_vma(mm, start_idx);
+	for (;;) {
+		struct vrange_root *vroot;
+		unsigned long tmp, vstart_idx, vend_idx;
+
+		if (!vma)
+			goto out;
+
+		/* make sure start is at the front of the current vma */
+		if (start_idx < vma->vm_start) {
+			start_idx = vma->vm_start;
+			if (start_idx > end_idx)
+				goto out;
+		}
+
+		/* bound tmp to closer of vm_end & end */
+		tmp = vma->vm_end - 1;
+		if (end_idx < tmp)
+			tmp = end_idx;
+
+		if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+			/* Convert to file relative offsets */
+			vroot = &vma->vm_file->f_mapping->vroot;
+			vstart_idx = vma->vm_pgoff + start_idx - vma->vm_start;
+			vend_idx = vma->vm_pgoff + tmp - vma->vm_start;
+		} else {
+			vroot = &mm->vroot;
+			vstart_idx = start_idx;
+			vend_idx = tmp;
+		}
+
+		/* mark or unmark */
+		if (mode == VRANGE_VOLATILE)
+			ret = vrange_add(vroot, vstart_idx, vend_idx);
+		else if (mode == VRANGE_NONVOLATILE)
+			ret = vrange_remove(vroot, vstart_idx, vend_idx,
+						purged);
+
+		if (ret)
+			goto out;
+
+		/* update count to distance covered so far */
+		count = tmp - orig_start;
+
+		/* move start up to the end of the vma */
+		start_idx = vma->vm_end;
+		if (start_idx > end_idx)
+			goto out;
+		/* move to the next vma */
+		vma = vma->vm_next;
+	}
+out:
+	up_read(&mm->mmap_sem);
+
+	/* report bytes successfully marked, even if we're exiting on error */
+	if (count)
+		return count;
+
+	return ret;
+}
+
+/*
+ * The vrange(2) system call.
+ *
+ * Applications can use vrange() to advise the kernel how it should
+ * handle paging I/O in this VM area.  The idea is to help the kernel
+ * discard pages of vrange instead of swapping out when memory pressure
+ * happens. The information provided is advisory only, and can be safely
+ * disregarded by the kernel if the system has enough free memory.
+ *
+ * mode values:
+ *  VRANGE_VOLATILE - hint to kernel so VM can discard vrange pages when
+ *		memory pressure happens.
+ *  VRANGE_NONVOLATILE - Removes any volatile hints previously specified in
+ *		that range.
+ *
+ * purged ptr:
+ *  Returns 1 if any page in the range being marked nonvolatile has been purged.
+ *
+ * return values:
+ *  >= 0 - number of bytes marked or unmarked
+ *  -EINVAL - len is less than a page, start + len wraps, start is not
+ *		page-aligned or is >= TASK_SIZE, or "mode" is not valid.
+ *  -ENOMEM - Short of free memory in system for successful system call.
+ *  -ENOTSUPP - Feature not yet supported.
+ */
+SYSCALL_DEFINE4(vrange, unsigned long, start,
+		size_t, len, int, mode, int __user *, purged)
+{
+	unsigned long end;
+	struct mm_struct *mm = current->mm;
+	ssize_t ret = -EINVAL;
+	int p = 0;
+
+	if (start & ~PAGE_MASK)
+		goto out;
+
+	len &= PAGE_MASK;
+	if (!len)
+		goto out;
+
+	end = start + len;
+	if (end < start)
+		goto out;
+
+	if (start >= TASK_SIZE)
+		goto out;
+
+	ret = do_vrange(mm, start, end - 1, mode, &p);
+
+	if (purged) {
+		if (put_user(p, purged))
+			return -EFAULT;
+	}
+
+out:
+	return ret;
+}
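
The argument checks at the top of sys_vrange can be tried out in
userspace without building a kernel. The following is only a sketch that
mirrors those checks. The names vrange_check_args and struct vrange_span
are hypothetical and do not appear in the patch, and page_size and
task_size are caller-supplied stand-ins for the kernel's PAGE_SIZE and
TASK_SIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper type: the byte range handed on to do_vrange(). */
struct vrange_span {
	unsigned long start;	/* first byte of the range */
	unsigned long last;	/* last byte of the range, inclusive */
};

/*
 * Userspace model (an assumption, not kernel code) of the checks in
 * SYSCALL_DEFINE4(vrange, ...). Returns 0 and fills *out on success,
 * -EINVAL otherwise.
 */
static int vrange_check_args(unsigned long start, size_t len,
			     unsigned long page_size,
			     unsigned long task_size,
			     struct vrange_span *out)
{
	unsigned long page_mask, end;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	page_mask = ~(page_size - 1);

	if (start & ~page_mask)		/* start must be page-aligned */
		return -EINVAL;

	len &= page_mask;		/* round len down to whole pages */
	if (!len)
		return -EINVAL;

	end = start + len;
	if (end < start)		/* range wrapped around */
		return -EINVAL;

	if (start >= task_size)
		return -EINVAL;

	out->start = start;
	out->last = end - 1;		/* the syscall passes end - 1 */
	return 0;
}
```

With a 4096-byte page, a start of 0x1000 and a len of 0x1fff yield the
span 0x1000..0x1fff, because len is rounded down to one page before the
range is formed. An unaligned start, or a len below one page, is
rejected with -EINVAL.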