[v2,0/5] KVM: rseq: Fix and a test for a KVM+rseq bug

Message ID 20210820225002.310652-1-seanjc@google.com

Sean Christopherson Aug. 20, 2021, 10:49 p.m. UTC
Patch 1 fixes a KVM+rseq bug where KVM's handling of TIF_NOTIFY_RESUME,
e.g. for task migration, clears the flag without informing rseq and leads
to stale data in userspace's rseq struct.

Patch 2 is a cleanup to try and make future bugs less likely.  It's also
a baby step towards moving and renaming tracehook_notify_resume() since
it has nothing to do with tracing.

Patch 3 is a fix/cleanup to stop overriding x86's unistd_{32,64}.h when
the include path (intentionally) omits tools' uapi headers.  KVM's
selftests do exactly that so that they can pick up the uapi headers from
the installed kernel headers, and still use various tools/ headers that
mirror kernel code, e.g. linux/types.h.  This allows the new test in
patch 4 to reference __NR_rseq without having to manually define it.
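For illustration only (this is not code from the series): once __NR_rseq
comes from the installed kernel headers, a test could wrap the syscall
roughly as sketched below.  The names rseq_area, RSEQ_SIG_EXAMPLE and
try_register_rseq() are hypothetical, and the signature value is arbitrary.
Registration can legitimately fail, e.g. if the C library has already
registered its own rseq area for the thread, so the sketch only reports the
result.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>	/* provides __NR_rseq, no manual #define needed */
#include <linux/rseq.h>		/* struct rseq, RSEQ_CPU_ID_UNINITIALIZED */

/* Hypothetical signature; any 32-bit value works for a sketch like this. */
#define RSEQ_SIG_EXAMPLE 0x53053053u

/* Hypothetical per-thread rseq area that the kernel would keep up to date. */
static __thread struct rseq rseq_area = {
	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
};

/*
 * Try to register rseq_area for the calling thread.
 * Returns 0 on success, -1 on failure with errno set by the syscall.
 */
static int try_register_rseq(void)
{
	/* 32 bytes is the original size of struct rseq. */
	return (int)syscall(__NR_rseq, &rseq_area, (uint32_t)32, 0,
			    RSEQ_SIG_EXAMPLE);
}
```

A caller would check the return value and, on success, read
rseq_area.cpu_id to learn which CPU the kernel last recorded for the thread.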

Patch 4 is a regression test for the KVM+rseq bug.

Patch 5 is a cleanup made possible by patch 3.

v2:
  - Don't touch rseq_cs when handling KVM case so that rseq_syscall() will
    still detect a naughty userspace. [Mathieu]
  - Use a sequence counter + retry in the test to ensure the process isn't
    migrated between sched_getcpu() and reading rseq.cpu_id, i.e. to
    avoid a flaky test. [Mathieu]
  - Add Mathieu's ack for patch 2.
  - Add more comments in the test.
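
The sequence counter + retry idea can be sketched in isolation as below.
This is a simplified model, not the code from patch 4: value_a and value_b
are hypothetical stand-ins for the two readings the test compares (the
result of sched_getcpu() and rseq.cpu_id), and publish_pair()/read_pair()
are made-up names.  Default (sequentially consistent) C11 atomics are used
for simplicity, which is stronger ordering than strictly required.

```c
#include <stdatomic.h>

static atomic_uint seq_cnt;	/* odd while an update is in flight */
static atomic_int value_a;	/* stand-in for the sched_getcpu() result */
static atomic_int value_b;	/* stand-in for rseq.cpu_id */

/* Writer side: bracket the update with two increments of the counter. */
static void publish_pair(int v)
{
	atomic_fetch_add(&seq_cnt, 1);	/* count becomes odd */
	atomic_store(&value_a, v);
	atomic_store(&value_b, v);
	atomic_fetch_add(&seq_cnt, 1);	/* count becomes even again */
}

/* Reader side: retry until both values were read with no update between. */
static void read_pair(int *a, int *b)
{
	unsigned int snapshot;

	do {
		/*
		 * Clearing the low bit means a snapshot taken while the
		 * count is odd can never match the re-read below, which
		 * forces a retry in that case.
		 */
		snapshot = atomic_load(&seq_cnt) & ~1u;
		*a = atomic_load(&value_a);
		*b = atomic_load(&value_b);
	} while (snapshot != atomic_load(&seq_cnt));
}
```

In the scenario described above, the writer role would be played by whatever
thread forces the migration, and the reader would compare the two values
only after the loop exits.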

v1: https://lkml.kernel.org/r/20210818001210.4073390-1-seanjc@google.com

Sean Christopherson (5):
  KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM
    guest
  entry: rseq: Call rseq_handle_notify_resume() in
    tracehook_notify_resume()
  tools: Move x86 syscall number fallbacks to .../uapi/
  KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration
    bugs
  KVM: selftests: Remove __NR_userfaultfd syscall fallback

 arch/arm/kernel/signal.c                      |   1 -
 arch/arm64/kernel/signal.c                    |   1 -
 arch/csky/kernel/signal.c                     |   4 +-
 arch/mips/kernel/signal.c                     |   4 +-
 arch/powerpc/kernel/signal.c                  |   4 +-
 arch/s390/kernel/signal.c                     |   1 -
 include/linux/tracehook.h                     |   2 +
 kernel/entry/common.c                         |   4 +-
 kernel/rseq.c                                 |  14 +-
 .../x86/include/{ => uapi}/asm/unistd_32.h    |   0
 .../x86/include/{ => uapi}/asm/unistd_64.h    |   3 -
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 tools/testing/selftests/kvm/rseq_test.c       | 154 ++++++++++++++++++
 14 files changed, 175 insertions(+), 21 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (83%)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

Comments

Mathieu Desnoyers Aug. 23, 2021, 3 p.m. UTC | #1
----- On Aug 20, 2021, at 6:49 PM, Sean Christopherson seanjc@google.com wrote:

> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions.  While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the current CPU needs to be updated if
> the task is migrated.
>
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.


Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>


>
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley <pefoley@google.com>
> Bisected-by: Doug Evans <dje@google.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> kernel/entry/kvm.c |  4 +++-
> kernel/rseq.c      | 14 +++++++++++---
> 2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> 		if (ti_work & _TIF_NEED_RESCHED)
> 			schedule();
>
> -		if (ti_work & _TIF_NOTIFY_RESUME)
> +		if (ti_work & _TIF_NOTIFY_RESUME) {
> 			tracehook_notify_resume(NULL);
> +			rseq_handle_notify_resume(NULL, NULL);
> +		}
>
> 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> 		if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..6d45ac3dae7f 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -282,9 +282,17 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>
> 	if (unlikely(t->flags & PF_EXITING))
> 		return;
> -	ret = rseq_ip_fixup(regs);
> -	if (unlikely(ret < 0))
> -		goto error;
> +
> +	/*
> +	 * regs is NULL if and only if the caller is in a syscall path.  Skip
> +	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> +	 * kill a misbehaving userspace on debug kernels.
> +	 */
> +	if (regs) {
> +		ret = rseq_ip_fixup(regs);
> +		if (unlikely(ret < 0))
> +			goto error;
> +	}
> 	if (unlikely(rseq_update_cpu_id(t)))
> 		goto error;
> 	return;
> --
> 2.33.0.rc2.250.ged5fa647cd-goog


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Ben Gardon Aug. 23, 2021, 11:46 p.m. UTC | #2
On Fri, Aug 20, 2021 at 3:50 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Revert the __NR_userfaultfd syscall fallback added for KVM selftests now
> that x86's unistd_{32,64}.h overrides are under uapi/ and thus not in
> KVM selftests' search path, i.e. now that KVM gets x86 syscall numbers
> from the installed kernel headers.
>
> No functional change intended.
>
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>


Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  tools/arch/x86/include/uapi/asm/unistd_64.h | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
> index 4205ed4158bf..cb52a3a8b8fc 100644
> --- a/tools/arch/x86/include/uapi/asm/unistd_64.h
> +++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
> @@ -1,7 +1,4 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __NR_userfaultfd
> -#define __NR_userfaultfd 282
> -#endif
>  #ifndef __NR_perf_event_open
>  # define __NR_perf_event_open 298
>  #endif
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>