[v2,00/10] KVM: selftests: exercise userfaultfd minor faults

Message ID 20210519200339.829146-1-axelrasmussen@google.com

Message

Axel Rasmussen May 19, 2021, 8:03 p.m. UTC
Base
====

These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-13-17-23
tag. This is because this series depends on:

- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
- UFFD minor fault support for shmem (in Andrew's tree) [2]

[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/
[2] https://lore.kernel.org/patchwork/cover/1420967/

Changelog
=========

v1->v2:
- Picked up Reviewed-by's.
- Change backing_src_is_shared() to check the flags, instead of the type. This
  makes it robust to adding new backing source types in the future.
- Add another commit which refactors setup_demand_paging() error handling.
- Print UFFD ioctl type once in setup_demand_paging, instead of on every page-in
  operation.
- Expand comment on why we use MFD_HUGETLB instead of MAP_HUGETLB.
- Reworded comment on addr_gpa2alias.
- Moved demand_paging_test.c timing calls outside of the if (), deduplicating
  them.
- Split trivial comment / logging fixups into a separate commit.
- Add another commit which prints a clarifying message on test skip.
- Split the commit allowing backing src_type to be modified in two.
- Split the commit adding the shmem backing type in two.
- Rebased onto v5.13-rc1-mmots-2021-05-13-17-23.
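
To illustrate the backing_src_is_shared() change above, here is a minimal
sketch of deciding "shared" from the mmap flags rather than from the type.
The enum, struct, and table below are hypothetical stand-ins, not the actual
selftest definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-ins for the selftest's backing source types. */
enum sketch_src_type {
	SKETCH_SRC_ANONYMOUS,
	SKETCH_SRC_ANONYMOUS_HUGETLB,
	SKETCH_SRC_SHMEM,
	SKETCH_NUM_SRC_TYPES,
};

struct sketch_src_alias {
	const char *name;
	int mmap_flags;	/* flags that would be passed to mmap() */
};

static const struct sketch_src_alias *sketch_src_alias(enum sketch_src_type t)
{
	static const struct sketch_src_alias aliases[SKETCH_NUM_SRC_TYPES] = {
		[SKETCH_SRC_ANONYMOUS] = {
			"anonymous", MAP_PRIVATE | MAP_ANONYMOUS,
		},
		[SKETCH_SRC_ANONYMOUS_HUGETLB] = {
			"anonymous_hugetlb",
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
		},
		[SKETCH_SRC_SHMEM] = {
			"shmem", MAP_SHARED,
		},
	};

	assert(t < SKETCH_NUM_SRC_TYPES);
	return &aliases[t];
}

/*
 * Check the flags, not the type: a type added to the table later is
 * classified correctly without touching this function.
 */
static bool sketch_src_is_shared(enum sketch_src_type t)
{
	return (sketch_src_alias(t)->mmap_flags & MAP_SHARED) != 0;
}
```

With this shape, adding a new shared type only means adding a table entry
whose flags include MAP_SHARED.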

Overview
========

Minor fault handling is a new userfaultfd feature whose goal is generally to
improve performance. In particular, it is intended for use with demand paging.
There are more details in the cover letters for this new feature (linked above),
but at a high level the idea is that we think of these three phases of live
migration of a VM:

1. Precopy, where we copy "some" pages from the source to the target, while the
   VM is still running on the source machine.
2. Blackout, where execution stops on the source, and begins on the target.
3. Postcopy, where the VM is running on the target, some pages are already up
   to date, and others are not (because they weren't copied, or were modified
   after being copied).

During postcopy, the first time the guest touches memory, we intercept a minor
fault. Userspace checks whether or not the page is already up to date. If
needed, we copy the final version of the page from the source machine. This
could be done with RDMA, for example, to do it truly in place / with no copying.
At this point, all that's left is to set up PTEs for the guest, so we issue
UFFDIO_CONTINUE. No copying or page allocation is needed.
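
As a rough illustration of why no copy is needed, the sketch below maps one
shared memory file twice and populates it through the second ("alias")
mapping only. It deliberately leaves out the userfaultfd registration, which
needs recent kernel headers and may need extra privileges. In the real flow,
the first mapping would be registered in MINOR mode, and the first access
through it would fault and be resolved with UFFDIO_CONTINUE. The function
name and the memfd label are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map one shared memory file twice, populate it through the second
 * ("alias") mapping, and check that the data is visible through the
 * first. Returns 0 on success, -1 on any failure.
 */
static int alias_populates_primary(size_t len)
{
	char *primary, *alias;
	int fd, ret = -1;

	assert(len > 0);

	/* Anonymous shmem file; the name is only a debugging label. */
	fd = memfd_create("alias-sketch", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	primary = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (primary == MAP_FAILED)
		goto out_close;
	alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (alias == MAP_FAILED)
		goto out_unmap_primary;

	/* "Precopy": fill the pages through the alias only. */
	memset(alias, 0xab, len);

	/* The same pages back both mappings, so no copy is needed. */
	if ((unsigned char)primary[0] == 0xab &&
	    (unsigned char)primary[len - 1] == 0xab)
		ret = 0;

	munmap(alias, len);
out_unmap_primary:
	munmap(primary, len);
out_close:
	close(fd);
	return ret;
}
```

Calling alias_populates_primary(4096) should return 0 on a system where
memfd_create() is available.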

Because of this use case, it's useful to exercise this as part of the demand
paging test. It lets us ensure the use case works correctly end-to-end, and also
gives us an in-tree way to profile the end-to-end flow for future performance
improvements.

Axel Rasmussen (10):
  KVM: selftests: trivial comment/logging fixes
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: print a message when skipping KVM tests
  KVM: selftests: compute correct demand paging size
  KVM: selftests: allow different backing source types
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: add shmem backing source type
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: add shared hugetlbfs backing source type

 .../selftests/kvm/demand_paging_test.c        | 175 +++++++++++-------
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 .../testing/selftests/kvm/include/test_util.h |  12 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  84 ++++++++-
 .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
 tools/testing/selftests/kvm/lib/test_util.c   |  51 +++--
 6 files changed, 238 insertions(+), 87 deletions(-)

--
2.31.1.751.gd2f1c929bd-goog

Comments

Ben Gardon May 19, 2021, 9:41 p.m. UTC | #1
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> Some trivial fixes I found while touching related code in this series,
> factored out into a separate commit for easier reviewing:
>
> - s/gor/got/ and add a newline in demand_paging_test.c
> - s/backing_src/src_type/ in a comment to be consistent with the real
>   function signature in kvm_util.c
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

Thanks for doing this!

> ---
>  tools/testing/selftests/kvm/demand_paging_test.c | 2 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c       | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 5f7a229c3af1..9398ba6ef023 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -169,7 +169,7 @@ static void *uffd_handler_thread_fn(void *arg)
>                 if (r == -1) {
>                         if (errno == EAGAIN)
>                                 continue;
> -                       pr_info("Read of uffd gor errno %d", errno);
> +                       pr_info("Read of uffd got errno %d\n", errno);
>                         return NULL;
>                 }
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index fc83f6c5902d..f05ca919cccb 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -663,8 +663,8 @@ int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, vm_vaddr_t gva, size_t len)
>   *
>   * Input Args:
>   *   vm - Virtual Machine
> - *   backing_src - Storage source for this region.
> - *                 NULL to use anonymous memory.
> + *   src_type - Storage source for this region.
> + *              NULL to use anonymous memory.
>   *   guest_paddr - Starting guest physical address
>   *   slot - KVM region slot
>   *   npages - Number of physical pages
> --
> 2.31.1.751.gd2f1c929bd-goog
>
Ben Gardon May 19, 2021, 9:49 p.m. UTC | #2
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> Previously, if this check failed, we'd just exit quietly with no output.
> This can be confusing, so print out a short message indicating why the
> test is being skipped.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index f05ca919cccb..0d6ddee429b9 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -53,8 +53,10 @@ int kvm_check_cap(long cap)
>         int kvm_fd;
>
>         kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
> -       if (kvm_fd < 0)
> +       if (kvm_fd < 0) {
> +               print_skip("KVM not available, errno: %d", errno);
>                 exit(KSFT_SKIP);
> +       }

This is a wonderful change. I believe this will only be hit if KVM is
built as a module and that module has not yet been loaded, so this
message could also suggest that the user check if the KVM / KVM
arch/vendor specific module has been loaded.
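
Something along these lines, as a rough sketch only. The helper name and the
exact wording are made up here, and the real change would go through
print_skip() rather than a buffer:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the skip message, including a hint about
 * loading the KVM modules. Returns what snprintf() returns.
 */
static int format_kvm_skip_msg(char *buf, size_t len, const char *path,
			       int err)
{
	assert(buf != NULL && len > 0 && path != NULL);

	return snprintf(buf, len,
			"%s not available, errno: %d (is the KVM module, "
			"and its arch/vendor-specific module, loaded?)",
			path, err);
}
```

The caller would format the message with the errno from the failed open()
and then exit with KSFT_SKIP, as the patch already does.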

>
>         ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);
>         TEST_ASSERT(ret != -1, "KVM_CHECK_EXTENSION IOCTL failed,\n"
> --
> 2.31.1.751.gd2f1c929bd-goog
>
Ben Gardon May 19, 2021, 9:53 p.m. UTC | #3
On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> Add an argument which lets us specify a different backing memory type
> for the test. The default is just to use anonymous, matching existing
> behavior.
>
> This is in preparation for testing UFFD minor faults. For that, we'll
> need to use a new backing memory type which is set up with MAP_SHARED.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  tools/testing/selftests/kvm/demand_paging_test.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 94cf047358d5..01890a7b0155 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -241,6 +241,7 @@ static void setup_demand_paging(struct kvm_vm *vm,
>  struct test_params {
>         bool use_uffd;
>         useconds_t uffd_delay;
> +       enum vm_mem_backing_src_type src_type;
>         bool partition_vcpu_memory_access;
>  };
>
> @@ -258,11 +259,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         int r;
>
>         vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
> -                                VM_MEM_SRC_ANONYMOUS);
> +                                p->src_type);
>
>         perf_test_args.wr_fract = 1;
>
> -       demand_paging_size = get_backing_src_pagesz(VM_MEM_SRC_ANONYMOUS);
> +       demand_paging_size = get_backing_src_pagesz(p->src_type);
>
>         guest_data_prototype = malloc(demand_paging_size);
>         TEST_ASSERT(guest_data_prototype,
> @@ -378,7 +379,7 @@ static void help(char *name)
>  {
>         puts("");
>         printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> -              "          [-b memory] [-v vcpus] [-o]\n", name);
> +              "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
>         guest_modes_help();
>         printf(" -u: use User Fault FD to handle vCPU page\n"
>                "     faults.\n");
> @@ -388,6 +389,8 @@ static void help(char *name)
>         printf(" -b: specify the size of the memory region which should be\n"
>                "     demand paged by each vCPU. e.g. 10M or 3G.\n"
>                "     Default: 1G\n");
> +       printf(" -t: The type of backing memory to use. Default: anonymous\n");
> +       backing_src_help();
>         printf(" -v: specify the number of vCPUs to run.\n");
>         printf(" -o: Overlap guest memory accesses instead of partitioning\n"
>                "     them into a separate region of memory for each vCPU.\n");
> @@ -399,13 +402,14 @@ int main(int argc, char *argv[])
>  {
>         int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
>         struct test_params p = {
> +               .src_type = VM_MEM_SRC_ANONYMOUS,
>                 .partition_vcpu_memory_access = true,
>         };
>         int opt;
>
>         guest_modes_append_default();
>
> -       while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) {
> +       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
>                 switch (opt) {
>                 case 'm':
>                         guest_modes_cmdline(optarg);
> @@ -420,6 +424,9 @@ int main(int argc, char *argv[])
>                 case 'b':
>                         guest_percpu_mem_size = parse_size(optarg);
>                         break;
> +               case 't':
> +                       p.src_type = parse_backing_src_type(optarg);
> +                       break;
>                 case 'v':
>                         nr_vcpus = atoi(optarg);
>                         TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
> --
> 2.31.1.751.gd2f1c929bd-goog
>
Ben Gardon May 19, 2021, 10:20 p.m. UTC | #4
On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> UFFD handling of MINOR faults is a new feature whose use case is to
> speed up demand paging (compared to MISSING faults). So, it's
> interesting to let this selftest exercise this new mode.
>
> Modify the demand paging test to have the option of using UFFD minor
> faults, as opposed to missing faults. Now, when turning on userfaultfd
> with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
>
> If we're in minor mode, before registering, prefault via the *alias*.
> This way, the guest will trigger minor faults, instead of missing
> faults, and we can UFFDIO_CONTINUE to resolve them.
>
> Modify the page fault handler function to use the right ioctl depending
> on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  .../selftests/kvm/demand_paging_test.c        | 112 ++++++++++++------
>  1 file changed, 79 insertions(+), 33 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 01890a7b0155..df7190261923 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data)
>         return NULL;
>  }
>
> -static int handle_uffd_page_request(int uffd, uint64_t addr)
> +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
>  {
> -       pid_t tid;
> +       pid_t tid = syscall(__NR_gettid);
>         struct timespec start;
>         struct timespec ts_diff;
> -       struct uffdio_copy copy;
>         int r;
>
> -       tid = syscall(__NR_gettid);
> +       clock_gettime(CLOCK_MONOTONIC, &start);
>
> -       copy.src = (uint64_t)guest_data_prototype;
> -       copy.dst = addr;
> -       copy.len = demand_paging_size;
> -       copy.mode = 0;
> +       if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
> +               struct uffdio_copy copy;
>
> -       clock_gettime(CLOCK_MONOTONIC, &start);
> +               copy.src = (uint64_t)guest_data_prototype;
> +               copy.dst = addr;
> +               copy.len = demand_paging_size;
> +               copy.mode = 0;
> +
> +               r = ioctl(uffd, UFFDIO_COPY, &copy);
> +               if (r == -1) {
> +                       pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
> +                               addr, tid, errno);
> +                       return r;
> +               }
> +       } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> +               struct uffdio_continue cont = {0};
> +
> +               cont.range.start = addr;
> +               cont.range.len = demand_paging_size;
>
> -       r = ioctl(uffd, UFFDIO_COPY, &copy);
> -       if (r == -1) {
> -               pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
> -                       addr, tid, errno);
> -               return r;
> +               r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
> +               if (r == -1) {
> +                       pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
> +                               addr, tid, errno);
> +                       return r;
> +               }
> +       } else {
> +               TEST_FAIL("Invalid uffd mode %d", uffd_mode);
>         }
>
>         ts_diff = timespec_elapsed(start);
>
> -       PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> +       PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid,
>                        timespec_to_ns(ts_diff));
>         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
>                        demand_paging_size, addr, tid);
> @@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
>  bool quit_uffd_thread;
>
>  struct uffd_handler_args {
> +       int uffd_mode;
>         int uffd;
>         int pipefd;
>         useconds_t delay;
> @@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg)
>                 if (delay)
>                         usleep(delay);
>                 addr =  msg.arg.pagefault.address;
> -               r = handle_uffd_page_request(uffd, addr);
> +               r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
>                 if (r < 0)
>                         return NULL;
>                 pages++;
> @@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg)
>
>  static void setup_demand_paging(struct kvm_vm *vm,
>                                 pthread_t *uffd_handler_thread, int pipefd,
> -                               useconds_t uffd_delay,
> +                               int uffd_mode, useconds_t uffd_delay,
>                                 struct uffd_handler_args *uffd_args,
> -                               void *hva, uint64_t len)
> +                               void *hva, void *alias, uint64_t len)
>  {
> +       bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR);
>         int uffd;
>         struct uffdio_api uffdio_api;
>         struct uffdio_register uffdio_register;
> +       uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
> +
> +       PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
> +                      is_minor ? "MINOR" : "MISSING",
> +                      is_minor ? "UFFDIO_CONTINUE" : "UFFDIO_COPY");
> +
> +       /* In order to get minor faults, prefault via the alias. */
> +       if (is_minor) {
> +               size_t p;
> +
> +               expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
> +
> +               TEST_ASSERT(alias != NULL, "Alias required for minor faults");
> +               for (p = 0; p < (len / demand_paging_size); ++p) {
> +                       memcpy(alias + (p * demand_paging_size),
> +                              guest_data_prototype, demand_paging_size);
> +               }
> +       }

Would it be worth timing this operation? I think we'd need to know how
long we spent prefaulting the memory to really be able to compare UFFD
modes using this test.

>
>         uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>         TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno);
> @@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm,
>
>         uffdio_register.range.start = (uint64_t)hva;
>         uffdio_register.range.len = len;
> -       uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
> +       uffdio_register.mode = uffd_mode;
>         TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1,
>                     "ioctl UFFDIO_REGISTER failed");
> -       TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
> -                   UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set");
> +       TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) ==
> +                   expected_ioctls, "missing userfaultfd ioctls");
>
> +       uffd_args->uffd_mode = uffd_mode;
>         uffd_args->uffd = uffd;
>         uffd_args->pipefd = pipefd;
>         uffd_args->delay = uffd_delay;
> @@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm,
>  }
>
>  struct test_params {
> -       bool use_uffd;
> +       int uffd_mode;
>         useconds_t uffd_delay;
>         enum vm_mem_backing_src_type src_type;
>         bool partition_vcpu_memory_access;
> @@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
>                               p->partition_vcpu_memory_access);
>
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 uffd_handler_threads =
>                         malloc(nr_vcpus * sizeof(*uffd_handler_threads));
>                 TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
> @@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                 for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
>                         vm_paddr_t vcpu_gpa;
>                         void *vcpu_hva;
> +                       void *vcpu_alias;
>                         uint64_t vcpu_mem_size;
>
>
> @@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                         PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
>                                        vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
>
> -                       /* Cache the HVA pointer of the region */
> +                       /* Cache the host addresses of the region */
>                         vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
> +                       vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
>
>                         /*
>                          * Set up user fault fd to handle demand paging
> @@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                         TEST_ASSERT(!r, "Failed to set up pipefd");
>
>                         setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
> -                                           pipefds[vcpu_id * 2], p->uffd_delay,
> -                                           &uffd_args[vcpu_id], vcpu_hva,
> +                                           pipefds[vcpu_id * 2], p->uffd_mode,
> +                                           p->uffd_delay, &uffd_args[vcpu_id],
> +                                           vcpu_hva, vcpu_alias,
>                                             vcpu_mem_size);
>                 }
>         }
> @@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>
>         pr_info("All vCPU threads joined\n");
>
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 char c;
>
>                 /* Tell the user fault fd handler threads to quit */
> @@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>
>         free(guest_data_prototype);
>         free(vcpu_threads);
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 free(uffd_handler_threads);
>                 free(uffd_args);
>                 free(pipefds);
> @@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  static void help(char *name)
>  {
>         puts("");
> -       printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"

NIT: maybe use uffd_mode or some word other than mode here to
disambiguate with -m
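
A small sketch of what I mean, with the placeholder renamed and the argument
parsing pulled into a helper. The enum and function names are hypothetical
stand-ins so the example builds without the new userfaultfd headers:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the UFFDIO_REGISTER_MODE_* values. */
enum sketch_uffd_mode {
	SKETCH_UFFD_NONE = 0,	/* userfaultfd not requested, or bad input */
	SKETCH_UFFD_MISSING,
	SKETCH_UFFD_MINOR,
};

/* "uffd_mode" instead of "mode", so it cannot be confused with -m. */
static const char sketch_usage_fmt[] =
	"usage: %s [-h] [-m mode] [-u uffd_mode] [-d uffd_delay_usec]\n";

/* Map the -u argument to a mode; anything unrecognized yields NONE. */
static enum sketch_uffd_mode sketch_parse_uffd_mode(const char *arg)
{
	assert(arg != NULL);

	if (!strcmp(arg, "MISSING"))
		return SKETCH_UFFD_MISSING;
	if (!strcmp(arg, "MINOR"))
		return SKETCH_UFFD_MINOR;
	return SKETCH_UFFD_NONE;
}
```

The matching is case-sensitive, same as the strcmp() calls in the patch.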

>                "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
>         guest_modes_help();
> -       printf(" -u: use User Fault FD to handle vCPU page\n"
> -              "     faults.\n");
> +       printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
> +              "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
>         printf(" -d: add a delay in usec to the User Fault\n"
>                "     FD handler to simulate demand paging\n"
>                "     overheads. Ignored without -u.\n");
> @@ -409,13 +448,17 @@ int main(int argc, char *argv[])
>
>         guest_modes_append_default();
>
> -       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> +       while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
>                 switch (opt) {
>                 case 'm':
>                         guest_modes_cmdline(optarg);
>                         break;
>                 case 'u':
> -                       p.use_uffd = true;
> +                       if (!strcmp("MISSING", optarg))
> +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
> +                       else if (!strcmp("MINOR", optarg))
> +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
> +                       TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
>                         break;
>                 case 'd':
>                         p.uffd_delay = strtoul(optarg, NULL, 0);
> @@ -442,6 +485,9 @@ int main(int argc, char *argv[])
>                 }
>         }
>
> +       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> +                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> +
>         for_each_guest_mode(run_test, &p);
>
>         return 0;
> --
> 2.31.1.751.gd2f1c929bd-goog
>
Axel Rasmussen May 19, 2021, 10:34 p.m. UTC | #5
On Wed, May 19, 2021 at 3:21 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
> >
> > UFFD handling of MINOR faults is a new feature whose use case is to
> > speed up demand paging (compared to MISSING faults). So, it's
> > interesting to let this selftest exercise this new mode.
> >
> > Modify the demand paging test to have the option of using UFFD minor
> > faults, as opposed to missing faults. Now, when turning on userfaultfd
> > with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
> >
> > If we're in minor mode, before registering, prefault via the *alias*.
> > This way, the guest will trigger minor faults, instead of missing
> > faults, and we can UFFDIO_CONTINUE to resolve them.
> >
> > Modify the page fault handler function to use the right ioctl depending
> > on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >  .../selftests/kvm/demand_paging_test.c        | 112 ++++++++++++------
> >  1 file changed, 79 insertions(+), 33 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> > index 01890a7b0155..df7190261923 100644
> > --- a/tools/testing/selftests/kvm/demand_paging_test.c
> > +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> > @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data)
> >         return NULL;
> >  }
> >
> > -static int handle_uffd_page_request(int uffd, uint64_t addr)
> > +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
> >  {
> > -       pid_t tid;
> > +       pid_t tid = syscall(__NR_gettid);
> >         struct timespec start;
> >         struct timespec ts_diff;
> > -       struct uffdio_copy copy;
> >         int r;
> >
> > -       tid = syscall(__NR_gettid);
> > +       clock_gettime(CLOCK_MONOTONIC, &start);
> >
> > -       copy.src = (uint64_t)guest_data_prototype;
> > -       copy.dst = addr;
> > -       copy.len = demand_paging_size;
> > -       copy.mode = 0;
> > +       if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
> > +               struct uffdio_copy copy;
> >
> > -       clock_gettime(CLOCK_MONOTONIC, &start);
> > +               copy.src = (uint64_t)guest_data_prototype;
> > +               copy.dst = addr;
> > +               copy.len = demand_paging_size;
> > +               copy.mode = 0;
> > +
> > +               r = ioctl(uffd, UFFDIO_COPY, &copy);
> > +               if (r == -1) {
> > +                       pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
> > +                               addr, tid, errno);
> > +                       return r;
> > +               }
> > +       } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> > +               struct uffdio_continue cont = {0};
> > +
> > +               cont.range.start = addr;
> > +               cont.range.len = demand_paging_size;
> >
> > -       r = ioctl(uffd, UFFDIO_COPY, &copy);
> > -       if (r == -1) {
> > -               pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
> > -                       addr, tid, errno);
> > -               return r;
> > +               r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
> > +               if (r == -1) {
> > +                       pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
> > +                               addr, tid, errno);
> > +                       return r;
> > +               }
> > +       } else {
> > +               TEST_FAIL("Invalid uffd mode %d", uffd_mode);
> >         }
> >
> >         ts_diff = timespec_elapsed(start);
> >
> > -       PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> > +       PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid,
> >                        timespec_to_ns(ts_diff));
> >         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
> >                        demand_paging_size, addr, tid);
> > @@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
> >  bool quit_uffd_thread;
> >
> >  struct uffd_handler_args {
> > +       int uffd_mode;
> >         int uffd;
> >         int pipefd;
> >         useconds_t delay;
> > @@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg)
> >                 if (delay)
> >                         usleep(delay);
> >                 addr =  msg.arg.pagefault.address;
> > -               r = handle_uffd_page_request(uffd, addr);
> > +               r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
> >                 if (r < 0)
> >                         return NULL;
> >                 pages++;
> > @@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg)
> >
> >  static void setup_demand_paging(struct kvm_vm *vm,
> >                                 pthread_t *uffd_handler_thread, int pipefd,
> > -                               useconds_t uffd_delay,
> > +                               int uffd_mode, useconds_t uffd_delay,
> >                                 struct uffd_handler_args *uffd_args,
> > -                               void *hva, uint64_t len)
> > +                               void *hva, void *alias, uint64_t len)
> >  {
> > +       bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR);
> >         int uffd;
> >         struct uffdio_api uffdio_api;
> >         struct uffdio_register uffdio_register;
> > +       uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
> > +
> > +       PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
> > +                      is_minor ? "MINOR" : "MISSING",
> > +                      is_minor ? "UFFDIO_CONTINUE" : "UFFDIO_COPY");
> > +
> > +       /* In order to get minor faults, prefault via the alias. */
> > +       if (is_minor) {
> > +               size_t p;
> > +
> > +               expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
> > +
> > +               TEST_ASSERT(alias != NULL, "Alias required for minor faults");
> > +               for (p = 0; p < (len / demand_paging_size); ++p) {
> > +                       memcpy(alias + (p * demand_paging_size),
> > +                              guest_data_prototype, demand_paging_size);
> > +               }
> > +       }
>
> Would it be worth timing this operation? I think we'd need to know how
> long we spent prefaulting the memory to really be able to compare UFFD
> modes using this test.

It's easy to time it and print out a value, so I'm happy to add it.

As for how useful it is, I'm not so sure. In general the way I think
of it is, the prefaulting would happen during the precopy phase of
live migration. During this phase, the VM is still running on the
source machine, so the VM owner doesn't notice any performance
degradation or slowness in this phase.
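
For reference, timing the prefault loop could look roughly like the sketch
below. This is a standalone approximation, not the code I'd put in the
test: the function name is made up, and the real version would use the
existing timespec helpers and the alias pointer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Copy "page" into every page_size-sized chunk of dst, and return the
 * elapsed time in nanoseconds as measured by CLOCK_MONOTONIC.
 */
static int64_t sketch_timed_prefault(char *dst, size_t len,
				     const char *page, size_t page_size)
{
	struct timespec start, end;
	size_t p;

	assert(page_size > 0 && len % page_size == 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (p = 0; p < len / page_size; ++p)
		memcpy(dst + p * page_size, page, page_size);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

The returned value could then be printed next to the existing per-test
timing output.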

>
> >
> >         uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> >         TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno);
> > @@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm,
> >
> >         uffdio_register.range.start = (uint64_t)hva;
> >         uffdio_register.range.len = len;
> > -       uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
> > +       uffdio_register.mode = uffd_mode;
> >         TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1,
> >                     "ioctl UFFDIO_REGISTER failed");
> > -       TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) ==
> > -                   UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set");
> > +       TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) ==
> > +                   expected_ioctls, "missing userfaultfd ioctls");
> >
> > +       uffd_args->uffd_mode = uffd_mode;
> >         uffd_args->uffd = uffd;
> >         uffd_args->pipefd = pipefd;
> >         uffd_args->delay = uffd_delay;
> > @@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm,
> >  }
> >
> >  struct test_params {
> > -       bool use_uffd;
> > +       int uffd_mode;
> >         useconds_t uffd_delay;
> >         enum vm_mem_backing_src_type src_type;
> >         bool partition_vcpu_memory_access;
> > @@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
> >                               p->partition_vcpu_memory_access);
> >
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 uffd_handler_threads =
> >                         malloc(nr_vcpus * sizeof(*uffd_handler_threads));
> >                 TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
> > @@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                 for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> >                         vm_paddr_t vcpu_gpa;
> >                         void *vcpu_hva;
> > +                       void *vcpu_alias;
> >                         uint64_t vcpu_mem_size;
> >
> >
> > @@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                         PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
> >                                        vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
> >
> > -                       /* Cache the HVA pointer of the region */
> > +                       /* Cache the host addresses of the region */
> >                         vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
> > +                       vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
> >
> >                         /*
> >                          * Set up user fault fd to handle demand paging
> > @@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                         TEST_ASSERT(!r, "Failed to set up pipefd");
> >
> >                         setup_demand_paging(vm, &uffd_handler_threads[vcpu_id],
> > -                                           pipefds[vcpu_id * 2], p->uffd_delay,
> > -                                           &uffd_args[vcpu_id], vcpu_hva,
> > +                                           pipefds[vcpu_id * 2], p->uffd_mode,
> > +                                           p->uffd_delay, &uffd_args[vcpu_id],
> > +                                           vcpu_hva, vcpu_alias,
> >                                             vcpu_mem_size);
> >                 }
> >         }
> > @@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >
> >         pr_info("All vCPU threads joined\n");
> >
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 char c;
> >
> >                 /* Tell the user fault fd handler threads to quit */
> > @@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >
> >         free(guest_data_prototype);
> >         free(vcpu_threads);
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 free(uffd_handler_threads);
> >                 free(uffd_args);
> >                 free(pipefds);
> > @@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >  static void help(char *name)
> >  {
> >         puts("");
> > -       printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> > +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
>
> NIT: maybe use uffd_mode or some word other than mode here to
> disambiguate with -m
>
> >                "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
> >         guest_modes_help();
> > -       printf(" -u: use User Fault FD to handle vCPU page\n"
> > -              "     faults.\n");
> > +       printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
> > +              "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
> >         printf(" -d: add a delay in usec to the User Fault\n"
> >                "     FD handler to simulate demand paging\n"
> >                "     overheads. Ignored without -u.\n");
> > @@ -409,13 +448,17 @@ int main(int argc, char *argv[])
> >
> >         guest_modes_append_default();
> >
> > -       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> > +       while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
> >                 switch (opt) {
> >                 case 'm':
> >                         guest_modes_cmdline(optarg);
> >                         break;
> >                 case 'u':
> > -                       p.use_uffd = true;
> > +                       if (!strcmp("MISSING", optarg))
> > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
> > +                       else if (!strcmp("MINOR", optarg))
> > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
> > +                       TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
> >                         break;
> >                 case 'd':
> >                         p.uffd_delay = strtoul(optarg, NULL, 0);
> > @@ -442,6 +485,9 @@ int main(int argc, char *argv[])
> >                 }
> >         }
> >
> > +       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> > +                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> > +
> >         for_each_guest_mode(run_test, &p);
> >
> >         return 0;
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >
Paolo Bonzini May 24, 2021, 1:23 p.m. UTC | #6
On 19/05/21 23:49, Ben Gardon wrote:
> On Wed, May 19, 2021 at 1:03 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>>
>> Previously, if this check failed, we'd just exit quietly with no output.
>> This can be confusing, so print out a short message indicating why the
>> test is being skipped.
>>
>> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
>
>> ---
>>   tools/testing/selftests/kvm/lib/kvm_util.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
>> index f05ca919cccb..0d6ddee429b9 100644
>> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
>> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
>> @@ -53,8 +53,10 @@ int kvm_check_cap(long cap)
>>          int kvm_fd;
>>
>>          kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
>> -       if (kvm_fd < 0)
>> +       if (kvm_fd < 0) {
>> +               print_skip("KVM not available, errno: %d", errno);
>>                  exit(KSFT_SKIP);
>> +       }
>
> This is a wonderful change. I believe this will only be hit if KVM is
> built as a module and that module has not yet been loaded, so this
> message could also suggest that the user check if the KVM / KVM
> arch/vendor specific module has been loaded.

Let's make it

                 print_skip("%s not available, is KVM loaded? (errno: %d)",
                            KVM_DEV_PATH, errno);
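
As a standalone illustration of the same pattern outside the selftest library,
the sketch below opens a device path and, on failure, formats the message
suggested above into a caller-supplied buffer. The function name
check_dev_available() and the buffer-based interface are assumptions made for
this example; the real code calls print_skip() and exits with KSFT_SKIP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if dev_path can be opened read-only.
 * Otherwise write a skip message (same wording as the suggestion above)
 * into msg and return -1.
 */
static int check_dev_available(const char *dev_path, char *msg, size_t msg_len)
{
	int fd;

	assert(msg != NULL && msg_len > 0);

	fd = open(dev_path, O_RDONLY);
	if (fd < 0) {
		snprintf(msg, msg_len,
			 "%s not available, is KVM loaded? (errno: %d)",
			 dev_path, errno);
		return -1;
	}

	close(fd);
	return 0;
}
```

Printing the path instead of a fixed "KVM" string tells the user which device
node was probed, which helps when the module simply has not been loaded.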

Paolo
Paolo Bonzini May 24, 2021, 1:36 p.m. UTC | #7
On 20/05/21 00:20, Ben Gardon wrote:
>> +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
> NIT: maybe use uffd_mode or some word other than mode here to
> disambiguate with -m
>

Changed to "[-m vm_mode] [-u uffd_mode]".

Paolo
Paolo Bonzini May 24, 2021, 1:38 p.m. UTC | #8
On 19/05/21 22:03, Axel Rasmussen wrote:
> Base
> ====
>
> These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-13-17-23
> tag. This is because this series depends on:
>
> - UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
> - UFFD minor fault support for shmem (in Andrew's tree) [2]
>
> [1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/
> [2] https://lore.kernel.org/patchwork/cover/1420967/
>
> Changelog
> =========
>
> v1->v2:
> - Picked up Reviewed-by's.
> - Change backing_src_is_shared() to check the flags, instead of the type. This
>    makes it robust to adding new backing source types in the future.
> - Add another commit which refactors setup_demand_paging() error handling.
> - Print UFFD ioctl type once in setup_demand_paging, instead of on every page-in
>    operation.
> - Expand comment on why we use MFD_HUGETLB instead of MAP_HUGETLB.
> - Reworded comment on addr_gpa2alias.
> - Moved demand_paging_test.c timing calls outside of the if (), deduplicating
>    them.
> - Split trivial comment / logging fixups into a separate commit.
> - Add another commit which prints a clarifying message on test skip.
> - Split the commit allowing backing src_type to be modified in two.
> - Split the commit adding the shmem backing type in two.
> - Rebased onto v5.13-rc1-mmots-2021-05-13-17-23.
>
> Overview
> ========
>
> Minor fault handling is a new userfaultfd feature whose goal is generally to
> improve performance. In particular, it is intended for use with demand paging.
> There are more details in the cover letters for this new feature (linked above),
> but at a high level the idea is that we think of these three phases of live
> migration of a VM:
>
> 1. Precopy, where we copy "some" pages from the source to the target, while the
>     VM is still running on the source machine.
> 2. Blackout, where execution stops on the source, and begins on the target.
> 3. Postcopy, where the VM is running on the target, some pages are already up
>     to date, and others are not (because they weren't copied, or were modified
>     after being copied).
>
> During postcopy, the first time the guest touches memory, we intercept a minor
> fault. Userspace checks whether or not the page is already up to date. If
> needed, we copy the final version of the page from the source machine. This
> could be done with RDMA for example, to do it truly in place / with no copying.
> At this point, all that's left is to set up PTEs for the guest: so we issue
> UFFDIO_CONTINUE. No copying or page allocation needed.
>
> Because of this use case, it's useful to exercise this as part of the demand
> paging test. It lets us ensure the use case works correctly end-to-end, and also
> gives us an in-tree way to profile the end-to-end flow for future performance
> improvements.
>
> Axel Rasmussen (10):
>    KVM: selftests: trivial comment/logging fixes
>    KVM: selftests: simplify setup_demand_paging error handling
>    KVM: selftests: print a message when skipping KVM tests
>    KVM: selftests: compute correct demand paging size
>    KVM: selftests: allow different backing source types
>    KVM: selftests: refactor vm_mem_backing_src_type flags
>    KVM: selftests: add shmem backing source type
>    KVM: selftests: create alias mappings when using shared memory
>    KVM: selftests: allow using UFFD minor faults for demand paging
>    KVM: selftests: add shared hugetlbfs backing source type
>
>   .../selftests/kvm/demand_paging_test.c        | 175 +++++++++++-------
>   .../testing/selftests/kvm/include/kvm_util.h  |   1 +
>   .../testing/selftests/kvm/include/test_util.h |  12 ++
>   tools/testing/selftests/kvm/lib/kvm_util.c    |  84 ++++++++-
>   .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
>   tools/testing/selftests/kvm/lib/test_util.c   |  51 +++--
>   6 files changed, 238 insertions(+), 87 deletions(-)
>
> --
> 2.31.1.751.gd2f1c929bd-goog
>

Queued, thanks (with region->fd moved to the right patch).

Paolo