[v2,0/6] userfaultfd: add /dev/userfaultfd for fine grained access control

Message ID	20220422212945.2227722-1-axelrasmussen@google.com
Headers	show Return-Path: <linux-kselftest-owner@kernel.org> Date: Fri, 22 Apr 2022 14:29:39 -0700 Message-Id: <20220422212945.2227722-1-axelrasmussen@google.com> Mime-Version: 1.0 Subject: [PATCH v2 0/6] userfaultfd: add /dev/userfaultfd for fine grained access control From: Axel Rasmussen <axelrasmussen@google.com> To: Alexander Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>, Charan Teja Reddy <charante@codeaurora.org>, Dave Hansen <dave.hansen@linux.intel.com>, "Dmitry V . Levin" <ldv@altlinux.org>, Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>, Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>, Jonathan Corbet <corbet@lwn.net>, Mel Gorman <mgorman@techsingularity.net>, Mike Kravetz <mike.kravetz@oracle.com>, Mike Rapoport <rppt@kernel.org>, Nadav Amit <namit@vmware.com>, Peter Xu <peterx@redhat.com>, Shuah Khan <shuah@kernel.org>, Suren Baghdasaryan <surenb@google.com>, Vlastimil Babka <vbabka@suse.cz>, zhangyi <yi.zhang@huawei.com> Cc: Axel Rasmussen <axelrasmussen@google.com>, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	userfaultfd: add /dev/userfaultfd for fine grained access control \| expand [v2,0/6] userfaultfd: add /dev/userfaultfd for fine grained access control [v2,1/6] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh [v2,2/6] userfaultfd: add /dev/userfaultfd for fine grained access control [v2,3/6] userfaultfd: selftests: modify selftest to use /dev/userfaultfd [v2,4/6] userfaultfd: update documentation to describe /dev/userfaultfd [v2,5/6] userfaultfd: selftests: make /dev/userfaultfd testing configurable [v2,6/6] selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh

Message ID

20220422212945.2227722-1-axelrasmussen@google.com

Headers

Date: Fri, 22 Apr 2022 14:29:39 -0700
Message-Id: <20220422212945.2227722-1-axelrasmussen@google.com>
Mime-Version: 1.0
Subject: [PATCH v2 0/6] userfaultfd: add /dev/userfaultfd for fine grained
 access control
From: Axel Rasmussen <axelrasmussen@google.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
        Andrew Morton <akpm@linux-foundation.org>,
        Charan Teja Reddy <charante@codeaurora.org>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        "Dmitry V . Levin" <ldv@altlinux.org>,
        Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>,
        Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>,
        Jonathan Corbet <corbet@lwn.net>,
        Mel Gorman <mgorman@techsingularity.net>,
        Mike Kravetz <mike.kravetz@oracle.com>,
        Mike Rapoport <rppt@kernel.org>, Nadav Amit <namit@vmware.com>,
        Peter Xu <peterx@redhat.com>, Shuah Khan <shuah@kernel.org>,
        Suren Baghdasaryan <surenb@google.com>,
        Vlastimil Babka <vbabka@suse.cz>, zhangyi <yi.zhang@huawei.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>,
        linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        linux-kselftest@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

Series

userfaultfd: add /dev/userfaultfd for fine grained access control | expand

Message

Axel Rasmussen April 22, 2022, 9:29 p.m. UTC

This series is based on torvalds/master, but additionally the run_vmtests.sh
changes assume my refactor [1] has been applied first.

The series is split up like so:
- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-4 add the feature, basic support for it to the selftest, and docs.
- Patches 5-6 make the selftest configurable, so you can test one or the other
  instead of always both. If we decide this is overcomplicated, we could just
  drop these two patches and take the rest of the series.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20220421224928.1848230-1-axelrasmussen@google.com/

Changelog:
v1->v2:
  - Add documentation update.
  - Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest.

Axel Rasmussen (6):
  selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
  userfaultfd: add /dev/userfaultfd for fine grained access control
  userfaultfd: selftests: modify selftest to use /dev/userfaultfd
  userfaultfd: update documentation to describe /dev/userfaultfd
  userfaultfd: selftests: make /dev/userfaultfd testing configurable
  selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh

 Documentation/admin-guide/mm/userfaultfd.rst | 38 +++++++++-
 Documentation/admin-guide/sysctl/vm.rst      |  3 +
 fs/userfaultfd.c                             | 79 ++++++++++++++++----
 include/uapi/linux/userfaultfd.h             |  4 +
 tools/testing/selftests/vm/run_vmtests.sh    | 11 ++-
 tools/testing/selftests/vm/userfaultfd.c     | 60 +++++++++++++--
 6 files changed, 170 insertions(+), 25 deletions(-)

--
2.36.0.rc2.479.g8af0fa9b8e-goog

Comments

Shuah Khan April 26, 2022, 4:16 p.m. UTC | #1

On 4/22/22 3:29 PM, Axel Rasmussen wrote:
> We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
> working into the future, so just run the test twice, using each
> interface.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>   tools/testing/selftests/vm/userfaultfd.c | 31 ++++++++++++++++++++++--
>   1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index 92a4516f8f0d..12ae742a9981 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -77,6 +77,9 @@ static int bounces;
>   #define TEST_SHMEM	3
>   static int test_type;
>   
> +/* test using /dev/userfaultfd, instead of userfaultfd(2) */
> +static bool test_dev_userfaultfd;
> +
>   /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
>   #define ALARM_INTERVAL_SECS 10
>   static volatile bool test_uffdio_copy_eexist = true;
> @@ -383,13 +386,31 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
>   	}
>   }
>   
> +static void __userfaultfd_open_dev(void)
> +{
> +	int fd;
> +
> +	uffd = -1;
> +	fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
> +	if (fd < 0)
> +		return;
> +
> +	uffd = ioctl(fd, USERFAULTFD_IOC_NEW,
> +		     O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
> +	close(fd);
> +}
> +
>   static void userfaultfd_open(uint64_t *features)
>   {
>   	struct uffdio_api uffdio_api;
>   
> -	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
> +	if (test_dev_userfaultfd)
> +		__userfaultfd_open_dev();
> +	else
> +		uffd = syscall(__NR_userfaultfd,
> +			       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
>   	if (uffd < 0)
> -		err("userfaultfd syscall not available in this kernel");
> +		err("creating userfaultfd failed");

This isn't an error as in test failure. This will be a skip because of
unmet dependencies. Also if this test requires root access, please check
for that and make that a skip as well.

>   	uffd_flags = fcntl(uffd, F_GETFD, NULL);
>   
>   	uffdio_api.api = UFFD_API;
> @@ -1698,6 +1719,12 @@ int main(int argc, char **argv)
>   	}
>   	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
>   	       nr_pages, nr_pages_per_cpu);
> +
> +	test_dev_userfaultfd = false;
> +	if (userfaultfd_stress())
> +		return 1;
> +
> +	test_dev_userfaultfd = true;
>   	return userfaultfd_stress();
>   }
>   
> 

thanks,
-- Shuah

Shuah Khan April 26, 2022, 4:56 p.m. UTC | #2

On 4/22/22 3:29 PM, Axel Rasmussen wrote:
> Instead of always testing both userfaultfd(2) and /dev/userfaultfd,
> let the user choose which to test.
> 
> As with other test features, change the behavior based on a new
> command line flag. Introduce the idea of "test mods", which are
> generic (not specific to a test type) modifications to the behavior of
> the test. This is sort of borrowed from this RFC patch series [1], but
> simplified a bit.
> 
> The benefit is, in "typical" configurations this test is somewhat slow
> (say, 30sec or something). Testing both clearly doubles it, so it may
> not always be desirable, as users are likely to use one or the other,
> but never both, in the "real world".
> 
> [1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>   tools/testing/selftests/vm/userfaultfd.c | 41 +++++++++++++++++-------
>   1 file changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index 12ae742a9981..274522704e40 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -142,8 +142,17 @@ static void usage(void)
>   {
>   	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
>   		"[hugetlbfs_file]\n\n");
> +

Remove the extra blank line here.

>   	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
>   		"hugetlb_shared, shmem\n\n");
> +

Remove the extra blank line here.

> +	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
> +		"Supported mods:\n");
> +	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
> +	fprintf(stderr, "\nExample test mod usage:\n");
> +	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
> +	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
> +
>   	fprintf(stderr, "Examples:\n\n");
>   	fprintf(stderr, "%s", examples);

Update examples above with new test cases if any.

>   	exit(1);
> @@ -1610,8 +1619,6 @@ unsigned long default_huge_page_size(void)
>   
>   static void set_test_type(const char *type)
>   {
> -	uint64_t features = UFFD_API_FEATURES;
> -
>   	if (!strcmp(type, "anon")) {
>   		test_type = TEST_ANON;
>   		uffd_test_ops = &anon_uffd_test_ops;
> @@ -1631,10 +1638,28 @@ static void set_test_type(const char *type)
>   		test_type = TEST_SHMEM;
>   		uffd_test_ops = &shmem_uffd_test_ops;
>   		test_uffdio_minor = true;
> -	} else {
> -		err("Unknown test type: %s", type);
> +	}

At this point, it might make it so much easier and maintainable if
we were to use getopt instead of parsing options.

> +}
> +
> +static void parse_test_type_arg(const char *raw_type)
> +{
> +	char *buf = strdup(raw_type);
> +	uint64_t features = UFFD_API_FEATURES;
> +
> +	while (buf) {
> +		const char *token = strsep(&buf, ":");
> +
> +		if (!test_type)
> +			set_test_type(token);
> +		else if (!strcmp(token, "dev"))
> +			test_dev_userfaultfd = true;
> +		else
> +			err("unrecognized test mod '%s'", token);
>   	}
>   
> +	if (!test_type)
> +		err("failed to parse test type argument: '%s'", raw_type);
> +
>   	if (test_type == TEST_HUGETLB)
>   		page_size = default_huge_page_size();
>   	else
> @@ -1681,7 +1706,7 @@ int main(int argc, char **argv)
>   		err("failed to arm SIGALRM");
>   	alarm(ALARM_INTERVAL_SECS);
>   
> -	set_test_type(argv[1]);
> +	parse_test_type_arg(argv[1]);
>   
>   	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
>   	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
> @@ -1719,12 +1744,6 @@ int main(int argc, char **argv)
>   	}
>   	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
>   	       nr_pages, nr_pages_per_cpu);
> -
> -	test_dev_userfaultfd = false;
> -	if (userfaultfd_stress())
> -		return 1;
> -
> -	test_dev_userfaultfd = true;
>   	return userfaultfd_stress();
>   }
>   
> 

Same comments as before on fail vs. skip conditions to watch out
for and report them correctly.

thanks,
-- Shuah

Axel Rasmussen May 19, 2022, 5:56 p.m. UTC | #3

On Tue, Apr 26, 2022 at 9:16 AM Shuah Khan <skhan@linuxfoundation.org> wrote:
>
> On 4/22/22 3:29 PM, Axel Rasmussen wrote:
> > We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
> > working into the future, so just run the test twice, using each
> > interface.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >   tools/testing/selftests/vm/userfaultfd.c | 31 ++++++++++++++++++++++--
> >   1 file changed, 29 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> > index 92a4516f8f0d..12ae742a9981 100644
> > --- a/tools/testing/selftests/vm/userfaultfd.c
> > +++ b/tools/testing/selftests/vm/userfaultfd.c
> > @@ -77,6 +77,9 @@ static int bounces;
> >   #define TEST_SHMEM  3
> >   static int test_type;
> >
> > +/* test using /dev/userfaultfd, instead of userfaultfd(2) */
> > +static bool test_dev_userfaultfd;
> > +
> >   /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
> >   #define ALARM_INTERVAL_SECS 10
> >   static volatile bool test_uffdio_copy_eexist = true;
> > @@ -383,13 +386,31 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
> >       }
> >   }
> >
> > +static void __userfaultfd_open_dev(void)
> > +{
> > +     int fd;
> > +
> > +     uffd = -1;
> > +     fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
> > +     if (fd < 0)
> > +             return;
> > +
> > +     uffd = ioctl(fd, USERFAULTFD_IOC_NEW,
> > +                  O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
> > +     close(fd);
> > +}
> > +
> >   static void userfaultfd_open(uint64_t *features)
> >   {
> >       struct uffdio_api uffdio_api;
> >
> > -     uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
> > +     if (test_dev_userfaultfd)
> > +             __userfaultfd_open_dev();
> > +     else
> > +             uffd = syscall(__NR_userfaultfd,
> > +                            O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
> >       if (uffd < 0)
> > -             err("userfaultfd syscall not available in this kernel");
> > +             err("creating userfaultfd failed");
>
> This isn't an error as in test failure. This will be a skip because of
> unmet dependencies. Also if this test requires root access, please check
> for that and make that a skip as well.

Testing with the userfaultfd syscall doesn't require any special
permissions (root or otherwise).

But testing with /dev/userfaultfd will require access to that device
node, which is root:root by default, but the system administrator may
have changed this. In general I think this will only fail due to a)
lack of kernel support or b) lack of permissions though, so always
exiting with KSFT_SKIP here seems reasonable. I'll make that change in
v3.

>
> >       uffd_flags = fcntl(uffd, F_GETFD, NULL);
> >
> >       uffdio_api.api = UFFD_API;
> > @@ -1698,6 +1719,12 @@ int main(int argc, char **argv)
> >       }
> >       printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
> >              nr_pages, nr_pages_per_cpu);
> > +
> > +     test_dev_userfaultfd = false;
> > +     if (userfaultfd_stress())
> > +             return 1;
> > +
> > +     test_dev_userfaultfd = true;
> >       return userfaultfd_stress();
> >   }
> >
> >
>
> thanks,
> -- Shuah

Axel Rasmussen May 19, 2022, 7:13 p.m. UTC | #4

On Tue, Apr 26, 2022 at 9:56 AM Shuah Khan <skhan@linuxfoundation.org> wrote:
>
> On 4/22/22 3:29 PM, Axel Rasmussen wrote:
> > Instead of always testing both userfaultfd(2) and /dev/userfaultfd,
> > let the user choose which to test.
> >
> > As with other test features, change the behavior based on a new
> > command line flag. Introduce the idea of "test mods", which are
> > generic (not specific to a test type) modifications to the behavior of
> > the test. This is sort of borrowed from this RFC patch series [1], but
> > simplified a bit.
> >
> > The benefit is, in "typical" configurations this test is somewhat slow
> > (say, 30sec or something). Testing both clearly doubles it, so it may
> > not always be desirable, as users are likely to use one or the other,
> > but never both, in the "real world".
> >
> > [1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >   tools/testing/selftests/vm/userfaultfd.c | 41 +++++++++++++++++-------
> >   1 file changed, 30 insertions(+), 11 deletions(-)
> >
> > diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> > index 12ae742a9981..274522704e40 100644
> > --- a/tools/testing/selftests/vm/userfaultfd.c
> > +++ b/tools/testing/selftests/vm/userfaultfd.c
> > @@ -142,8 +142,17 @@ static void usage(void)
> >   {
> >       fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
> >               "[hugetlbfs_file]\n\n");
> > +
>
> Remove the extra blank line here.
>
> >       fprintf(stderr, "Supported <test type>: anon, hugetlb, "
> >               "hugetlb_shared, shmem\n\n");
> > +
>
> Remove the extra blank line here.
>
> > +     fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
> > +             "Supported mods:\n");
> > +     fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
> > +     fprintf(stderr, "\nExample test mod usage:\n");
> > +     fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
> > +     fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
> > +
> >       fprintf(stderr, "Examples:\n\n");
> >       fprintf(stderr, "%s", examples);
>
> Update examples above with new test cases if any.

Will fix the above comments in v3.

>
> >       exit(1);
> > @@ -1610,8 +1619,6 @@ unsigned long default_huge_page_size(void)
> >
> >   static void set_test_type(const char *type)
> >   {
> > -     uint64_t features = UFFD_API_FEATURES;
> > -
> >       if (!strcmp(type, "anon")) {
> >               test_type = TEST_ANON;
> >               uffd_test_ops = &anon_uffd_test_ops;
> > @@ -1631,10 +1638,28 @@ static void set_test_type(const char *type)
> >               test_type = TEST_SHMEM;
> >               uffd_test_ops = &shmem_uffd_test_ops;
> >               test_uffdio_minor = true;
> > -     } else {
> > -             err("Unknown test type: %s", type);
> > +     }
>
> At this point, it might make it so much easier and maintainable if
> we were to use getopt instead of parsing options.

Agreed, I'd like that as well. But, since it's a bigger refactor that
affects all test types, I think it may be cleaner to leave it for a
follow-up series.

>
> > +}
> > +
> > +static void parse_test_type_arg(const char *raw_type)
> > +{
> > +     char *buf = strdup(raw_type);
> > +     uint64_t features = UFFD_API_FEATURES;
> > +
> > +     while (buf) {
> > +             const char *token = strsep(&buf, ":");
> > +
> > +             if (!test_type)
> > +                     set_test_type(token);
> > +             else if (!strcmp(token, "dev"))
> > +                     test_dev_userfaultfd = true;
> > +             else
> > +                     err("unrecognized test mod '%s'", token);
> >       }
> >
> > +     if (!test_type)
> > +             err("failed to parse test type argument: '%s'", raw_type);
> > +
> >       if (test_type == TEST_HUGETLB)
> >               page_size = default_huge_page_size();
> >       else
> > @@ -1681,7 +1706,7 @@ int main(int argc, char **argv)
> >               err("failed to arm SIGALRM");
> >       alarm(ALARM_INTERVAL_SECS);
> >
> > -     set_test_type(argv[1]);
> > +     parse_test_type_arg(argv[1]);
> >
> >       nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
> >       nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
> > @@ -1719,12 +1744,6 @@ int main(int argc, char **argv)
> >       }
> >       printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
> >              nr_pages, nr_pages_per_cpu);
> > -
> > -     test_dev_userfaultfd = false;
> > -     if (userfaultfd_stress())
> > -             return 1;
> > -
> > -     test_dev_userfaultfd = true;
> >       return userfaultfd_stress();
> >   }
> >
> >
>
> Same comments as before on fail vs. skip conditions to watch out
> for and report them correctly.

I think in v3 things will be correct. Basically, in the skip cases we
just exit(KSFT_SKIP) directly, instead of relying on the return value
here. I'll take a pass and double check though before sending v3.

>
> thanks,
> -- Shuah
>