From patchwork Tue Jul 19 19:56:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 592035 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF5E9CCA47F for ; Tue, 19 Jul 2022 19:56:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238767AbiGST4o (ORCPT ); Tue, 19 Jul 2022 15:56:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239062AbiGST4h (ORCPT ); Tue, 19 Jul 2022 15:56:37 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF9315A2C9 for ; Tue, 19 Jul 2022 12:56:35 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-31e55518830so33527977b3.23 for ; Tue, 19 Jul 2022 12:56:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=2lmji2pJwQUUP3R6t1yXuXRPrPh1yDx0i1mGLg8CxXk=; b=rRPjfzCfRNd5lQiZ67VG1PsIVIYjPwFy08N1NgXwZEdmOj6JA+coqAzVWkDJIlNRY5 YYf3Y/bM+zhmKkRWAsQ2bjiJrwYdl3VnbCcCZGMRHF6ew7w4EywM1I3/QseVs1Mn1HXX tpRbi7PWHIGlDIpJweXOzNMEFyij2g2rnpJrMrBPd0i5ICMUSiInyP+w3GQ4/Jk2pCQB aC/ZpgWXZ+1KmqHJq0wUclsGmDY8ho2QpLzY95UOiGq0A0G8UHc0v+qIJPIQ9/n6efPf iewCSIjs66UnweZpc/EznMh9zB+4mPI7OoyNXC/osApBENIUmF4qtOi2XbW2DpSB5wMJ zUIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=2lmji2pJwQUUP3R6t1yXuXRPrPh1yDx0i1mGLg8CxXk=; 
b=uqCBq0acYEx6OL/BGSpN0dHXoyrIl87FMSdd6ImkWYp6w0QuaQDLXyw6a+ByCfuBUd 0NrO0vN30rbP9pL3UPr9uqFEzgHIfjgzC2zTuXAZD5mobk9xD+QfeV8gWluxKKHUc3yv UDXCuuE7J25FxTQHATzevnaBH8qkfb/xoYxXlpps3OlxK2J/JTATHUpqTBT0QC+ct8Y9 pA7qtDSVoFtKX+O7cHem+6iXckGnWpjv3UWaRNMLEaKKBegNsN89YALVVjpbl2jKvZ5i /eVl8Ls6MKd3SOQ9xvLQYZX4oyM52hNLwHrVFG8nJW5apZ6E4sUqpN48N8uCOlLAj9X2 16fA== X-Gm-Message-State: AJIora+zxp6IlvKGdq22uXoavHI2PbNaTb3zSvnS/qCydGdsvSkWFFkJ esSpvYKW6PgMxHFplFsR3BDpYYVbvaXgaRm3Dad2 X-Google-Smtp-Source: AGRyM1uCcb2pDVTpDDsVUiZd0J0+WEjtbbgh4jqcG3bGfivbKRKkUwVy3Wnb/o8tyjKhhIKWZ30sg4m4oT0+s0fm0h3m X-Received: from ajr0.svl.corp.google.com ([2620:15c:2d4:203:a065:9221:e40d:4fbe]) (user=axelrasmussen job=sendgmr) by 2002:a25:7c41:0:b0:670:7de8:1d4b with SMTP id x62-20020a257c41000000b006707de81d4bmr5955132ybc.488.1658260595015; Tue, 19 Jul 2022 12:56:35 -0700 (PDT) Date: Tue, 19 Jul 2022 12:56:24 -0700 In-Reply-To: <20220719195628.3415852-1-axelrasmussen@google.com> Message-Id: <20220719195628.3415852-2-axelrasmussen@google.com> Mime-Version: 1.0 References: <20220719195628.3415852-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.37.0.170.g444d1eabd0-goog Subject: [PATCH v4 1/5] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh From: Axel Rasmussen To: Alexander Viro , Andrew Morton , Dave Hansen , "Dmitry V . Levin" , Gleb Fotengauer-Malinovskiy , Hugh Dickins , Jan Kara , Jonathan Corbet , Mel Gorman , Mike Kravetz , Mike Rapoport , Nadav Amit , Peter Xu , Shuah Khan , Suren Baghdasaryan , Vlastimil Babka , zhangyi Cc: Axel Rasmussen , linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Shuah Khan Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org This not being included was just a simple oversight. 
There are certain features (like minor fault support) which are only enabled
on shared mappings, so without including hugetlb_shared we actually lose a
significant amount of test coverage.

Reviewed-by: Shuah Khan
Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 41fce8bea929..e70ae0f3aaf6 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,9 +121,11 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
 run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
+# Hugetlb tests require source and destination huge pages. Pass in half the
+# size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
 
 #cleanup

From patchwork Tue Jul 19 19:56:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 591767
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 842C3C43334 for ; Tue, 19 Jul 2022 19:56:55 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238780AbiGST4x (ORCPT ); Tue, 19 Jul 2022 15:56:53 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234379AbiGST4k (ORCPT ); Tue, 19 Jul 2022 15:56:40
-0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72F08599D2 for ; Tue, 19 Jul 2022 12:56:38 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-31e62bc916aso12556047b3.19 for ; Tue, 19 Jul 2022 12:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=qMVUvtohpl5u0YhYXflPGUyFWvhgp3J9BbhXpFzblMU=; b=lnfO9vGuCg2XCgU/FaH826cH1zjvOw9G/iNyzi1HOhL3DxRT5bJcHvrfvhI4t6URAW ZP8ma2CBwGO2AVLagAv1r0AO895+URwxT6XMHfplaCjyUO6lJkILUzedOiepPxy4Eahr EhH0JgOK+S8z069u+yUEl/l3mZrtwpKQA9Xw28TuBzAjavO2bWFh8TRuv1X2CVR90KpU qPBKThQYdZup0H8ibLQRBekD27EueLVSQBgOG0oQLbT0STjwrqunbK5D9oWZXtUEf2be s1WMV4AYiUbOkEVKGSa0BPiWU1Jl6E6ZbPremSWY888QlqwUWVrZ7Do8xcodXxNg2E9I fJJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=qMVUvtohpl5u0YhYXflPGUyFWvhgp3J9BbhXpFzblMU=; b=f0NK7ewgkH6h8+T46EY/SjtM3oLz4vXhYYttl1W3M54pXbtaYfi3kO4cepu9RRp1mM AIhfkUpwb6LzQEskKq3hmYkvd33FT52SR1OGOKKXZClUqYnppHFW+akd8d43VO7VU1rF GVuVKZOd8PsCZrRwsN3hkDeuVaxEHrqXTxg+p4/TMBWhzQRUBJcb0Jug32zR+3wljSiJ 7bZ0UMnxXK8DTJelBhBZUSXwUg8/SZqvX7Rwm7Fcnh5pB45OCHI62y87Rww+bOK/OADx srA+6xHB+iSITPr8Gh6alDm4O+PSRsrukQyHve5glltRSsDqbL/p8a89L9/EAiEe2x8C YjSA== X-Gm-Message-State: AJIora8OyIcqSq2afuhEIGQpGilpKGLWOjKyKK3Lb93muvpF0tr12qKA RaPiIQiz/HaZeOER2mbYBA09qW7h0IxkWmbUSs2t X-Google-Smtp-Source: AGRyM1tJ/Uc5B/XJoSXLcnF76pAX+DiA/uqYaRjrJAYUu9yZiW0d+SAGBCOg8WS/GtxAUC/6LkRBA2YKMCzMouUel+Kd X-Received: from ajr0.svl.corp.google.com ([2620:15c:2d4:203:a065:9221:e40d:4fbe]) (user=axelrasmussen job=sendgmr) by 2002:a25:6ed5:0:b0:669:8b84:bb57 with SMTP id j204-20020a256ed5000000b006698b84bb57mr32393560ybc.227.1658260597485; Tue, 19 Jul 2022 12:56:37 -0700 (PDT) 
Date: Tue, 19 Jul 2022 12:56:25 -0700
In-Reply-To: <20220719195628.3415852-1-axelrasmussen@google.com>
Message-Id: <20220719195628.3415852-3-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20220719195628.3415852-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.0.170.g444d1eabd0-goog
Subject: [PATCH v4 2/5] userfaultfd: add /dev/userfaultfd for fine grained access control
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
of time) can be exploited, or at least can make some kinds of exploits
easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so, in order for kernel faults to be handled by
userfaultfd, either the process needs CAP_SYS_PTRACE, or the
vm.unprivileged_userfaultfd sysctl must be configured so that any
unprivileged user can do it.

In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults. But, both options above are less than ideal:

- Toggling the sysctl increases attack surface by allowing any
  unprivileged user to do it.

- Granting the live migration process CAP_SYS_PTRACE gives it this
  ability, but *also* the ability to "observe and control the
  execution of another process [...], and examine and change [its]
  memory and registers" (from ptrace(2)). This isn't something we need
  or want to be able to do, so granting this permission violates the
  "principle of least privilege".

This is all a long-winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional
permissions at the same time.

To achieve this, add a /dev/userfaultfd misc device. This device
provides an alternative to the userfaultfd(2) syscall for the creation
of new userfaultfds. The idea is, any userfaultfds created this way will
be able to handle kernel faults, without the caller having any special
capabilities. Access to this mechanism is instead restricted using e.g.
standard filesystem permissions.

Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 | 69 +++++++++++++++++++++++++-------
 include/uapi/linux/userfaultfd.h |  4 ++
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e943370107d0..968f2517a281 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -413,13 +414,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
-	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
-	    ctx->flags & UFFD_USER_MODE_ONLY) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
 		goto out;
-	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2052,19 +2048,30 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static inline bool userfaultfd_syscall_allowed(int flags)
+{
+	/* Userspace-only page faults are always allowed */
+	if (flags & UFFD_USER_MODE_ONLY)
+		return true;
+
+	/*
+	 * The user is requesting a userfaultfd which can handle kernel faults.
+	 * Privileged users are always allowed to do this.
+	 */
+	if (capable(CAP_SYS_PTRACE))
+		return true;
+
+	/* Otherwise, access to kernel fault handling is sysctl controlled. */
+	return sysctl_unprivileged_userfaultfd;
+}
+
+static int new_userfaultfd(bool is_syscall, int flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd &&
-	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
-	    !capable(CAP_SYS_PTRACE)) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (is_syscall && !userfaultfd_syscall_allowed(flags))
 		return -EPERM;
-	}
 
 	BUG_ON(!current->mm);
 
@@ -2098,8 +2105,42 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	return fd;
 }
 
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	return new_userfaultfd(true, flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+	if (cmd != USERFAULTFD_IOC_NEW)
+		return -EINVAL;
+
+	return new_userfaultfd(false, flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+	.open = userfaultfd_dev_open,
+	.unlocked_ioctl = userfaultfd_dev_ioctl,
+	.compat_ioctl = userfaultfd_dev_ioctl,
+	.owner = THIS_MODULE,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "userfaultfd",
+	.fops = &userfaultfd_dev_fops
+};
+
 static int __init userfaultfd_init(void)
 {
+	WARN_ON(misc_register(&userfaultfd_misc));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..005e5e306266 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In

From patchwork Tue Jul 19 19:56:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 592034
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30035CCA47F for ; Tue, 19 Jul 2022 19:56:59 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239687AbiGST45 (ORCPT ); Tue, 19 Jul 2022 15:56:57 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239704AbiGST4o (ORCPT ); Tue, 19 Jul 2022 15:56:44 -0400
Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3D0B5B7BA for ; Tue, 19 Jul 2022 12:56:40 -0700 (PDT)
Received: by mail-yb1-xb4a.google.com with SMTP id m5-20020a2598c5000000b0066faab590c5so11547651ybo.7 for ; Tue, 19 Jul 2022 12:56:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc;
bh=kfHTJgvI1xhbc2HV5Dg5HLOm6jC5g70kHqbkIkuEdDY=; b=eATFJJjY0r2Ckp5HyJYF7nbsKpNr/n+bnVWTBX/qKJPuAEJFHTGGCBZXyp9cXnxBjg Ca1+y0xLR0qoBqtrlrNHYULBzMXCsc8caFlMWYdtEt52PUQhmCcZQT9ESXfI1teZEuWa IWUlbVDzn15ayl5FiIOdT8aMilcfI1uMjS5UpC1ymoVTyAuboD3MgfK8/r3RKHzKZmKP 8m5pTgYtqjZZkBSc+eWfj3+JIt1hCkNiSZnJ7k/w+No27VJ2bcyRAaOz1ogd6Mrde2Rr +tyNsenOxNV9jPukEvgPal2QfI5GQaafFY21IscEco8k3Vd5LNcprxRmuTztwvxV115+ 2EZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=kfHTJgvI1xhbc2HV5Dg5HLOm6jC5g70kHqbkIkuEdDY=; b=sGLTeCKevH+vclK7IJPZFC5hKMImtutp+n6kkKIWtk2IWwafvZC81jqiPNHEFwl2ua ApR5PJXw4US5G+gGczSMMxBx5nsg+Wr2ODDlcUCDYstfQ21H4s2LfxWr3zeQcLhlgQtW 0KZK4/cGnHFNxjes+92XMrC4JSiLlqZMbfMRowDsDcSjfBT13S4PQnF78coLFJU46jV6 YuXkZJriP+AA5Zy485i5TbfjtFOfaQ1RqnBTEQ6rnqLs/b25b2/FipxEO/9L8n+rX+cp Vj/3Z+dAJKJhE1jUS/jJ9tlcuPTGz7GXMJxE8lhwLZXoY2bc8MKCs9a3VhBBdBBB8yEu PWSQ== X-Gm-Message-State: AJIora8Ols2hAmbMfwrPdpgfeoJSlXmdCt5nUw0M0YXo+MjtF2QT/khD lf5CJ5MA9PE5OldIEi9hClLD8fUa4jnMHhG9Pslv X-Google-Smtp-Source: AGRyM1vh6EyxVuVXbs2vKNFncUeOGmAA4GIWCwxEtu04z5gOe8ex1hyo5UKRpWnlw9uhvTBE62Bd9M97+L2VvA7rWmqs X-Received: from ajr0.svl.corp.google.com ([2620:15c:2d4:203:a065:9221:e40d:4fbe]) (user=axelrasmussen job=sendgmr) by 2002:a25:3b11:0:b0:66e:ccf2:76dc with SMTP id i17-20020a253b11000000b0066eccf276dcmr33549674yba.247.1658260600090; Tue, 19 Jul 2022 12:56:40 -0700 (PDT) Date: Tue, 19 Jul 2022 12:56:26 -0700 In-Reply-To: <20220719195628.3415852-1-axelrasmussen@google.com> Message-Id: <20220719195628.3415852-4-axelrasmussen@google.com> Mime-Version: 1.0 References: <20220719195628.3415852-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.37.0.170.g444d1eabd0-goog Subject: [PATCH v4 3/5] userfaultfd: selftests: modify selftest to use /dev/userfaultfd From: Axel Rasmussen To: Alexander Viro , Andrew Morton , Dave Hansen , "Dmitry V . 
Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
working into the future, so the selftest needs to be able to exercise
each interface.

Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let
the user choose which to test. As with other test features, change the
behavior based on a new command line flag. Introduce the idea of "test
mods", which are generic (not specific to a test type) modifications to
the behavior of the test. This is sort of borrowed from this RFC patch
series [1], but simplified a bit.

The benefit is that in "typical" configurations this test is somewhat
slow (on the order of 30 seconds). Testing both interfaces doubles that,
which may not always be desirable, as users are likely to use one or the
other, but not both, in the "real world".

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/

Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/userfaultfd.c | 69 ++++++++++++++++++++----
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 0bdfc1955229..0a126c620bc0 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -77,6 +77,11 @@ static int bounces;
 #define TEST_SHMEM 3
 static int test_type;
 
+#define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY)
+
+/* test using /dev/userfaultfd, instead of userfaultfd(2) */
+static bool test_dev_userfaultfd;
+
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
@@ -125,6 +130,8 @@ struct uffd_stats {
 const char *examples =
     "# Run anonymous memory test on 100MiB region with 99999 bounces:\n"
     "./userfaultfd anon 100 99999\n\n"
+    "# Run the same anonymous memory test, but using /dev/userfaultfd:\n"
+    "./userfaultfd anon:dev 100 99999\n\n"
     "# Run share memory test on 1GiB region with 99 bounces:\n"
     "./userfaultfd shmem 1000 99\n\n"
     "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
@@ -141,6 +148,14 @@ static void usage(void)
 		"[hugetlbfs_file]\n\n");
 	fprintf(stderr, "Supported : anon, hugetlb, "
 		"hugetlb_shared, shmem\n\n");
+	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
+		"Supported mods:\n");
+	fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n");
+	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
+	fprintf(stderr, "\nExample test mod usage:\n");
+	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
+	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
+
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
 	exit(1);
@@ -154,12 +169,14 @@ static void usage(void)
 			ret, __LINE__);		\
 	} while (0)
 
-#define err(fmt, ...)				\
+#define errexit(exitcode, fmt, ...)		\
 	do {					\
 		_err(fmt, ##__VA_ARGS__);	\
-		exit(1);			\
+		exit(exitcode);			\
 	} while (0)
 
+#define err(fmt, ...) errexit(1, fmt, ##__VA_ARGS__)
+
 static void uffd_stats_reset(struct uffd_stats *uffd_stats,
 			     unsigned long n_cpus)
 {
@@ -383,13 +400,29 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
 	}
 }
 
+static int __userfaultfd_open_dev(void)
+{
+	int fd, _uffd = -1;
+
+	fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+	if (fd < 0)
+		return -1;
+
+	_uffd = ioctl(fd, USERFAULTFD_IOC_NEW, UFFD_FLAGS);
+	close(fd);
+	return _uffd;
+}
+
 static void userfaultfd_open(uint64_t *features)
 {
 	struct uffdio_api uffdio_api;
 
-	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	if (test_dev_userfaultfd)
+		uffd = __userfaultfd_open_dev();
+	else
+		uffd = syscall(__NR_userfaultfd, UFFD_FLAGS);
 	if (uffd < 0)
-		err("userfaultfd syscall not available in this kernel");
+		errexit(KSFT_SKIP, "creating userfaultfd failed");
 	uffd_flags = fcntl(uffd, F_GETFD, NULL);
 
 	uffdio_api.api = UFFD_API;
@@ -1584,8 +1617,6 @@ unsigned long default_huge_page_size(void)
 
 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1603,9 +1634,29 @@ static void set_test_type(const char *type)
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 		test_uffdio_minor = true;
-	} else {
-		err("Unknown test type: %s", type);
 	}
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "dev"))
+			test_dev_userfaultfd = true;
+		else if (!strcmp(token, "syscall"))
+			test_dev_userfaultfd = false;
+		else
+			err("unrecognized test mod '%s'", token);
+	}
+
+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
 
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
@@ -1653,7 +1704,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);
 
-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);
 
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /

From patchwork Tue Jul 19 19:56:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 591766
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78655C433EF for ; Tue, 19 Jul 2022 19:57:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239874AbiGST5J (ORCPT ); Tue, 19 Jul 2022 15:57:09 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238664AbiGST4u (ORCPT ); Tue, 19 Jul 2022 15:56:50 -0400
Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F8375D0FC for ; Tue, 19 Jul 2022 12:56:43 -0700 (PDT)
Received: by mail-yb1-xb4a.google.com with SMTP id
z9-20020a258689000000b0066e38ab7122so11542779ybk.9 for ; Tue, 19 Jul 2022 12:56:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=D+axFZcXM2aiH957XKoLzm0NK888e8vhfRqJcL8tQlk=; b=Q6lTmACHbxPLYKcD6TEAkGdodCXui8T8KQJjaNpTrLvbWN5YYkAgyFnxcQZ0lRY7oa 9acQO03dUwhSLLvwZgG97CGmC+M40EFkdd14zNVuiZQDK4FpyM9U525S0X5c7VpbI6Xn marrPlcC1B8OnVdKcyWLEzuxmfyKhQY6KJ04vb6BCCY7/tXnVjU1sHyYUUStNY026lvx a1JgtjHgsUNop33vm+b+5kvkCAIjGuUeNHlBhU7D03TW84KjTflKGPaQP9ECMhxEIUtS /x21ATGjiJ/WPYC9JCPHfhv5dMPm6wUIZ9DJZ6WQtg1p5yFpjTb5vnOzyYzl9H6HwAsK MVbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=D+axFZcXM2aiH957XKoLzm0NK888e8vhfRqJcL8tQlk=; b=Vogc4DcQZ1/yqYDTgjSD7lPaAp287xV4vdXjRZ2kR5bSqJ60xPPtC/USK9A2US3eUy MwWFM/dUYW196KtOCSthSCw06EzunTeyp04QL9YQLDxZtLkA633ECZayTs3JShO+sXHl JktJ9du2kbDwBv8ZqKFHxy3PX/KLp0wpJ8mG3FmxkQ6l+8h3V1eFK9ybN8LyZSZIGWky k/9ApEBhTK3LffKHY5C1jq0nnP5QgduhU5MPWqVXvtkNz64n6MSiBOXQ9QKUQoN17eJv NdQi0VEO5Jy22CEeF+KEL7PeihvhorWCL8SCjtAB/2WjQCbRWqnITY/k1/ooAN+vexuw do7A== X-Gm-Message-State: AJIora9WkWhsz8z1YCGSYAdKtiPFWUq6XxeFwRCK0xy/6dUG5lPrCEyo T1XaIcnTZ25sx3RIYZ2V4+2YRHVEYbL4hBtEoEBp X-Google-Smtp-Source: AGRyM1sQ1a9x2GIZzx4aY+TQYaC9bhDEJVn0xFYWIaHkF0vSABxhMbIcNlU/zTDwvgT/LJednu8AwjLGuGiRA1DWNMMo X-Received: from ajr0.svl.corp.google.com ([2620:15c:2d4:203:a065:9221:e40d:4fbe]) (user=axelrasmussen job=sendgmr) by 2002:a5b:202:0:b0:66f:aab4:9c95 with SMTP id z2-20020a5b0202000000b0066faab49c95mr32614656ybl.81.1658260602570; Tue, 19 Jul 2022 12:56:42 -0700 (PDT) Date: Tue, 19 Jul 2022 12:56:27 -0700 In-Reply-To: <20220719195628.3415852-1-axelrasmussen@google.com> Message-Id: <20220719195628.3415852-5-axelrasmussen@google.com> Mime-Version: 1.0 References: <20220719195628.3415852-1-axelrasmussen@google.com> X-Mailer: 
git-send-email 2.37.0.170.g444d1eabd0-goog
Subject: [PATCH v4 4/5] userfaultfd: update documentation to describe /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..a76c9dc1865b 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
 management of mremap/mprotect is that the userfaults in all their
 operations never involve heavyweight structures like vmas (in fact the
 ``userfaultfd`` runtime load never takes the mmap_lock for writing).
-
 Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two ways to create a new userfaultfd, each of which provides ways to
+restrict access to this functionality (since historically userfaultfds which
+handle kernel page faults have been a useful tool for exploiting the kernel).
+
+The first way, supported since userfaultfd was introduced, is the
+userfaultfd(2) syscall. Access to this is controlled in several ways:
+
+- Any user can always create a userfaultfd which traps userspace page faults
+  only. Such a userfaultfd can be created using the userfaultfd(2) syscall
+  with the flag UFFD_USER_MODE_ONLY.
+
+- In order to also trap kernel page faults for the address space, either
+  the process needs the CAP_SYS_PTRACE capability, or the system must have
+  vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
+  is set to 0.
+
+The second way, added to the kernel more recently, is by opening
+/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
+yields userfaultfds equivalent to those created by the userfaultfd(2) syscall.
+
+Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
+filesystem permissions (user/group/mode), which gives fine-grained access to
+userfaultfd specifically, without also granting other unrelated privileges at
+the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
+to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
+vm.unprivileged_userfaultfd is not considered.
+
+Initializing a userfaultfd
+--------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 5c9aa171a0d3..36cf21f3b7ab 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -928,6 +928,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+Another way to control permissions for userfaultfd is to use
+/dev/userfaultfd instead of userfaultfd(2). See
+Documentation/admin-guide/mm/userfaultfd.rst.
 user_reserve_kbytes
 ===================

From patchwork Tue Jul 19 19:56:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 592033
Date: Tue, 19 Jul 2022 12:56:28 -0700
In-Reply-To: <20220719195628.3415852-1-axelrasmussen@google.com>
Message-Id: <20220719195628.3415852-6-axelrasmussen@google.com>
References: <20220719195628.3415852-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.0.170.g444d1eabd0-goog
Subject: [PATCH v4 5/5] selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan,
 Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Shuah Khan
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

This new mode was recently added to the userfaultfd selftest. We want to
exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both test
cases to the script.

Reviewed-by: Shuah Khan
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index e70ae0f3aaf6..156f864030fc 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,12 +121,17 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000

 run_test ./userfaultfd anon 20 16
+run_test ./userfaultfd anon:dev 20 16
 # Hugetlb tests require source and destination huge pages. Pass in half the
 # size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb:dev "$half_ufd_size_MB" 32
 run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
 rm -f "$mnt"/uffd-test
+run_test ./userfaultfd hugetlb_shared:dev "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
+run_test ./userfaultfd shmem:dev 20 16

 #cleanup
 umount "$mnt"
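
A note for anyone running these cases by hand: the script invokes the ":dev"
variants unconditionally. Since access to /dev/userfaultfd is governed by
ordinary file permissions, a wrapper could check the device node before
selecting those variants. The following is only a sketch under that
assumption. uffd_dev_usable and uffd_modes are hypothetical helpers that are
not part of run_vmtests.sh, and the read-write check assumes the test opens
the node for both reading and writing.

```shell
#!/bin/sh
# Hypothetical helper (not in run_vmtests.sh): succeed only if the given
# path is a character device the current user can read and write.
uffd_dev_usable() {
    node="${1:-/dev/userfaultfd}"
    [ -c "$node" ] && [ -r "$node" ] && [ -w "$node" ]
}

# Hypothetical helper: print the test modes to run for one memory type.
# The plain mode is always printed; the ":dev" variant is printed only
# when the device node (second argument, optional) looks usable.
uffd_modes() {
    printf '%s\n' "$1"
    if uffd_dev_usable "${2:-/dev/userfaultfd}"; then
        printf '%s\n' "${1}:dev"
    fi
}

# Example: show which anon cases would be selected on this machine.
for mode in $(uffd_modes anon); do
    echo "would run: ./userfaultfd $mode 20 16"
done
```

Passing the device path as an optional second argument keeps the check easy
to exercise against a different node when /dev/userfaultfd is absent.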