
[RESEND,1/2] KVM: Prevent module exit until all VMs are freed

Message ID 20220303183328.1499189-2-dmatlack@google.com
State Superseded
Series [RESEND,1/2] KVM: Prevent module exit until all VMs are freed

Commit Message

David Matlack March 3, 2022, 6:33 p.m. UTC
Tie the lifetime of the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() do not prevent closing any of the aforementioned file
descriptors.
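
As a rough illustration of this gap, the following is a minimal userspace model in C, not kernel code. All model_* names are hypothetical. Each open fd is assumed to pin a module refcount the way file_operations "owner" does, while kvm_get_kvm() is modelled as touching only users_count. The sequence assumes /dev/kvm has already been closed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the pre-patch scheme. */
struct model_module {
	int refcnt;                /* module references held */
};

struct model_vm {
	int users_count;           /* models kvm->users_count */
};

/* Opening a VM or vCPU fd pins the module via fops->owner. */
static void model_fd_open(struct model_module *mod) { mod->refcnt++; }

/* Closing the fd drops that pin. */
static void model_fd_close(struct model_module *mod) { mod->refcnt--; }

/* kvm_get_kvm()/kvm_put_kvm() only touch the VM refcount. */
static void model_get_vm(struct model_vm *vm) { vm->users_count++; }
static void model_put_vm(struct model_vm *vm) { vm->users_count--; }

/* Unloading can proceed once nothing holds a module reference. */
static bool model_can_unload(const struct model_module *mod)
{
	return mod->refcnt == 0;
}

/*
 * Returns true if the module becomes unloadable while the VM is still
 * referenced, i.e. the situation described in the commit message.
 */
static bool model_pre_patch_gap(void)
{
	struct model_module mod = { .refcnt = 0 };
	struct model_vm vm = { .users_count = 1 };  /* the VM fd's reference */

	model_fd_open(&mod);       /* VM fd is open and pins the module */
	model_get_vm(&vm);         /* e.g. async #PF grabs a VM reference */
	model_put_vm(&vm);         /* VM fd release drops its VM reference */
	model_fd_close(&mod);      /* ...and the fd's module pin goes away */

	bool gap = model_can_unload(&mod) && vm.users_count > 0;

	model_put_vm(&vm);         /* the deferred callback would run here */
	return gap;
}
```

In this model the module refcount reaches zero while users_count is still 1, which is the window the patch closes.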

This fixes a long-standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.
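
With the change applied, the same timeline can be sketched by having VM creation take a module reference and the final VM put release it, mirroring the try_module_get() added to kvm_create_vm() and the module_put() added to kvm_destroy_vm(). This is again a hypothetical userspace model (model_* names are illustrative only), and fd-owner pins are left out to isolate the new per-VM pin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; fd-owner pins are omitted from this model. */
enum model_state { MODEL_LIVE, MODEL_GOING };

struct model_module {
	enum model_state state;
	int refcnt;
};

struct model_vm {
	int users_count;             /* models kvm->users_count */
	struct model_module *owner;  /* models kvm_chardev_ops.owner */
};

/* Models try_module_get(): no new references once the module is going. */
static bool model_try_module_get(struct model_module *mod)
{
	if (mod->state != MODEL_LIVE)
		return false;
	mod->refcnt++;
	return true;
}

/* Models module_put(). */
static void model_module_put(struct model_module *mod)
{
	mod->refcnt--;
}

/* Models kvm_create_vm() with the patch: each VM pins the module. */
static bool model_create_vm(struct model_vm *vm, struct model_module *mod)
{
	if (!model_try_module_get(mod))
		return false;            /* the patch returns -ENODEV here */
	vm->users_count = 1;         /* reference owned by the VM fd */
	vm->owner = mod;
	return true;
}

/* Models kvm_get_kvm(), e.g. as called from kvm_setup_async_pf(). */
static void model_get_vm(struct model_vm *vm)
{
	vm->users_count++;
}

/* Models kvm_put_kvm(): the last put destroys the VM and unpins. */
static void model_put_vm(struct model_vm *vm)
{
	if (--vm->users_count == 0)
		model_module_put(vm->owner);  /* as in kvm_destroy_vm() */
}
```

Under these assumptions, closing the VM fd no longer releases the module while an async work item still holds a VM reference; the pin is dropped only by whichever put happens last.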

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Suggested-by: Ben Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/kvm_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)


base-commit: b13a3befc815eae574d87e6249f973dfbb6ad6cd
prerequisite-patch-id: 38f66d60319bf0bc9bf49f91f0f9119e5441629b
prerequisite-patch-id: 51aa921d68ea649d436ea68e1b8f4aabc3805156

Comments

Sean Christopherson March 8, 2022, 11:08 p.m. UTC | #1
On Tue, Mar 08, 2022, David Matlack wrote:
> On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Mar 03, 2022, David Matlack wrote:
> > > Tie the lifetime of the KVM module to the lifetime of each VM via
> > > kvm.users_count. This way anything that grabs a reference to the VM via
> > > kvm_get_kvm() cannot accidentally outlive the KVM module.
> > >
> > > Prior to this commit, the lifetime of the KVM module was tied to the
> > > lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
> > > file descriptors by their respective file_operations "owner" field.
> > > This approach is insufficient because references grabbed via
> > > kvm_get_kvm() do not prevent closing any of the aforementioned file
> > > descriptors.
> > >
> > > This fixes a long-standing theoretical bug in KVM that at least affects
> > > async page faults. kvm_setup_async_pf() grabs a reference via
> > > kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
> > > prevents the VM file descriptor from being closed and the KVM module
> > > from being unloaded before this callback runs.
> > >
> > > Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
> >
> > And (or)
> >
> >   Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
> >
> > because the above is x86-centric; at a glance, PPC and maybe s390 have issues
> > beyond async #PF.
> >
> > > Cc: stable@vger.kernel.org
> > > Suggested-by: Ben Gardon <bgardon@google.com>
> > > [ Based on a patch from Ben implemented for Google's kernel. ]
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >  virt/kvm/kvm_main.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 35ae6d32dae5..b59f0a29dbd5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
> > >
> > >  static const struct file_operations stat_fops_per_vm;
> > >
> > > +static struct file_operations kvm_chardev_ops;
> > > +
> > >  static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
> > >                          unsigned long arg);
> > >  #ifdef CONFIG_KVM_COMPAT
> > > @@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
> > >       preempt_notifier_inc();
> > >       kvm_init_pm_notifier(kvm);
> > >
> > > +     if (!try_module_get(kvm_chardev_ops.owner)) {
> >
> > The "try" aspect is unnecessary.  Stealing from Paolo's version,
> >
> >         /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
> >         __module_get(kvm_chardev_ops.owner);
> 
> Right, I did see that and agree we're guaranteed the KVM module has a
> reference at this point. But the KVM module might be in state
> MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which
> try_module_get() checks.

Ah, can you throw that in as a comment?  Doesn't have to be much, just enough of
a breadcrumb to connect the dots and to prevent us from "optimizing" this to
__module_get() in the future.

	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */

With a comment,

Reviewed-by: Sean Christopherson <seanjc@google.com>
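
The distinction discussed in this thread can be sketched as follows: __module_get() bumps the refcount unconditionally, whereas try_module_get() refuses once the module is no longer live (e.g. MODULE_STATE_GOING). This is a hypothetical userspace model; the model_* names and the two-state enum are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's module states. */
enum model_state { MODEL_STATE_LIVE, MODEL_STATE_GOING };

struct model_module {
	enum model_state state;
	int refcnt;
};

/* Like __module_get(): bump the refcount without checking the state. */
static void model_module_get_unconditional(struct model_module *mod)
{
	mod->refcnt++;
}

/* Like try_module_get(): refuse a new reference once the module is going. */
static bool model_try_module_get(struct model_module *mod)
{
	if (mod->state == MODEL_STATE_GOING)
		return false;
	mod->refcnt++;
	return true;
}

/* Models the hunk added to kvm_create_vm(): 0 on success, else -ENODEV. */
static int model_pin_for_new_vm(struct model_module *mod)
{
	/* Use the "try" variant to play nice with e.g. "rmmod --wait". */
	if (!model_try_module_get(mod))
		return -ENODEV;
	return 0;
}
```

In this model, a module already pinned by the open /dev/kvm fd (refcnt 1) but marked as going rejects the new VM, while the unconditional variant would silently take another reference.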

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35ae6d32dae5..b59f0a29dbd5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static const struct file_operations stat_fops_per_vm;
 
+static struct file_operations kvm_chardev_ops;
+
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	preempt_notifier_inc();
 	kvm_init_pm_notifier(kvm);
 
+	if (!try_module_get(kvm_chardev_ops.owner)) {
+		r = -ENODEV;
+		goto out_err;
+	}
+
 	return kvm;
 
 out_err:
@@ -1220,6 +1227,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	module_put(kvm_chardev_ops.owner);
 }
 
 void kvm_get_kvm(struct kvm *kvm)