[v4] virt: vmgenid: introduce driver for reinitializing RNG on VM fork

Message ID 20220225124848.909093-1-Jason@zx2c4.com
State Superseded

Commit Message

Jason A. Donenfeld Feb. 25, 2022, 12:48 p.m. UTC
VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:

    If the OS is running in a VM, there is a problem that most
    hypervisors can snapshot the state of the machine and later rewind
    the VM state to the saved state. This results in the machine running
    a second time with the exact same RNG state, which leads to serious
    security problems.  To reduce the window of vulnerability, Windows
    10 on a Hyper-V VM will detect when the VM state is reset, retrieve
    a unique (not random) value from the hypervisor, and reseed the root
    RNG with that unique value.  This does not eliminate the
    vulnerability, but it greatly reduces the time during which the RNG
    system will produce the same outputs as it did during a previous
    instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

It can be used from QEMU via the `-device vmgenid,guid=auto` parameter.
After setting that, use `savevm` in the monitor to save the VM state,
then quit QEMU, start it again, and use `loadvm`. That will trigger this
driver's notify function, which hands the new UUID to the RNG. This is
described in <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.
And there are hooks for this in libvirt as well, described in
<https://libvirt.org/formatdomain.html#general-metadata>.

Note, however, that the treatment of this as a UUID is considered to be
an accidental QEMU nuance, per
<https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt>,
so this driver simply treats these bytes as an opaque 128-bit binary
blob, as per the spec. This doesn't really make a difference anyway,
considering that's how it ends up when handed to the RNG in the end.
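
As a rough userspace sketch of that flow (not the driver code itself): the
ID is compared byte for byte as a 16-byte blob, and the RNG hook runs only
when the bytes actually changed. Here `fake_add_vmfork_randomness()` and
`reseed_count` are hypothetical stand-ins for the real RNG call, and
`next_id` points at ordinary memory rather than a hypervisor mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VMGENID_SIZE = 16 };

struct vmgenid_state {
	const uint8_t *next_id;        /* stands in for the hypervisor-provided ID */
	uint8_t this_id[VMGENID_SIZE]; /* last ID the guest acted on */
};

/* Hypothetical stand-in for the RNG hook; it only counts calls. */
static unsigned int reseed_count;

static void fake_add_vmfork_randomness(const void *id, size_t len)
{
	(void)id;
	(void)len;
	reseed_count++;
}

/* Treat the ID as an opaque blob: reseed only if any byte differs. */
static void vmgenid_notify(struct vmgenid_state *state)
{
	uint8_t old_id[VMGENID_SIZE];

	assert(state && state->next_id);
	memcpy(old_id, state->this_id, sizeof(old_id));
	memcpy(state->this_id, state->next_id, sizeof(state->this_id));
	if (!memcmp(old_id, state->this_id, sizeof(old_id)))
		return;
	fake_add_vmfork_randomness(state->this_id, sizeof(state->this_id));
}
```

Calling `vmgenid_notify()` with an unchanged ID leaves `reseed_count`
alone; changing any byte behind `next_id` and calling it again bumps the
counter once.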

Cc: Adrian Catangiu <adrian@parity.io>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v3->v4:
- Add this driver to MAINTAINERS, per Ard's request.
  Note: I didn't really want to do this at first, because I was hoping the
  original Amazon team looking into this last year would step up. But it seems
  like that team has moved on, and anyway I've basically rewritten the driver
  from scratch at this point -- not a single line of the original exists --
  and so I guess I'll maintain it myself. Adding Greg to the CC for his ack on
  this.
- Don't use a static global state in case there are multiple instances.
- Use devm_memremap instead of the acpi internal functions.
- Default to being modular instead of a built-in, as apparently this is
  udev-able.

 MAINTAINERS            |   1 +
 drivers/virt/Kconfig   |   9 ++++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 112 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

Comments

Jason A. Donenfeld Feb. 25, 2022, 12:56 p.m. UTC | #1
On Fri, Feb 25, 2022 at 1:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 25, 2022 at 01:48:48PM +0100, Jason A. Donenfeld wrote:
> > +static struct acpi_driver acpi_driver = {
> > +     .name = "vmgenid",
> > +     .ids = vmgenid_ids,
> > +     .owner = THIS_MODULE,
> > +     .ops = {
> > +             .add = vmgenid_acpi_add,
> > +             .notify = vmgenid_acpi_notify,
> > +     }
> > +};
> > +
> > +static int __init vmgenid_init(void)
> > +{
> > +     return acpi_bus_register_driver(&acpi_driver);
> > +}
> > +
> > +static void __exit vmgenid_exit(void)
> > +{
> > +     acpi_bus_unregister_driver(&acpi_driver);
> > +}
> > +
> > +module_init(vmgenid_init);
> > +module_exit(vmgenid_exit);
>
> Nit, you could use module_acpi_driver() to make this even smaller if you
> want to.

Nice! Will do.

Jason
Alexander Graf Feb. 25, 2022, 1:57 p.m. UTC | #2
On 25.02.22 13:48, Jason A. Donenfeld wrote:
>
> VM Generation ID is a feature from Microsoft, described at
> <https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
> Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
> <https://aka.ms/win10rng>, as:
>
>      If the OS is running in a VM, there is a problem that most
>      hypervisors can snapshot the state of the machine and later rewind
>      the VM state to the saved state. This results in the machine running
>      a second time with the exact same RNG state, which leads to serious
>      security problems.  To reduce the window of vulnerability, Windows
>      10 on a Hyper-V VM will detect when the VM state is reset, retrieve
>      a unique (not random) value from the hypervisor, and reseed the root
>      RNG with that unique value.  This does not eliminate the
>      vulnerability, but it greatly reduces the time during which the RNG
>      system will produce the same outputs as it did during a previous
>      instantiation of the same VM state.
>
> Linux has the same issue, and given that vmgenid is supported already by
> multiple hypervisors, we can implement more or less the same solution.
> So this commit wires up the vmgenid ACPI notification to the RNG's newly
> added add_vmfork_randomness() function.
>
> It can be used from qemu via the `-device vmgenid,guid=auto` parameter.
> After setting that, use `savevm` in the monitor to save the VM state,
> then quit QEMU, start it again, and use `loadvm`. That will trigger this
> driver's notify function, which hands the new UUID to the RNG. This is
> described in <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.
> And there are hooks for this in libvirt as well, described in
> <https://libvirt.org/formatdomain.html#general-metadata>.
>
> Note, however, that the treatment of this as a UUID is considered to be
> an accidental QEMU nuance, per
> <https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt>,
> so this driver simply treats these bytes as an opaque 128-bit binary
> blob, as per the spec. This doesn't really make a difference anyway,
> considering that's how it ends up when handed to the RNG in the end.
>
> Cc: Adrian Catangiu <adrian@parity.io>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v3->v4:
> - Add this driver to MAINTAINERS, per Ard's request.
>    Note: I didn't really want to do this at first, because I was hoping the
>    original Amazon team looking into this last year would step up. But it seems
>    like that team has moved on, and anyway I've basically rewritten the driver
>    from scratch at this point -- not a single line of the original exists --
>    and so I guess I'll maintain it myself. Adding Greg to the CC for his ack on
>    this.
> - Don't use a static global state in case there are multiple instances.
> - Use devm_memremap instead of the acpi internal functions.
> - Default to being modular instead of a built-in, as apparently this is
>    udev-able.
>
>   MAINTAINERS            |   1 +
>   drivers/virt/Kconfig   |   9 ++++
>   drivers/virt/Makefile  |   1 +
>   drivers/virt/vmgenid.c | 112 +++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 123 insertions(+)
>   create mode 100644 drivers/virt/vmgenid.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 777cd6fa2b3d..a10997e15146 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16211,6 +16211,7 @@ M:      Jason A. Donenfeld <Jason@zx2c4.com>
>   T:     git https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
>   S:     Maintained
>   F:     drivers/char/random.c
> +F:     drivers/virt/vmgenid.c
>
>   RAPIDIO SUBSYSTEM
>   M:     Matt Porter <mporter@kernel.crashing.org>
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 8061e8ef449f..5596c7313f59 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -13,6 +13,15 @@ menuconfig VIRT_DRIVERS
>
>   if VIRT_DRIVERS
>
> +config VMGENID
> +       tristate "Virtual Machine Generation ID driver"
> +       default m
> +       depends on ACPI
> +       help
> +         Say Y here to use the hypervisor-provided Virtual Machine Generation ID
> +         to reseed the RNG when the VM is cloned. This is highly recommended if
> +         you intend to do any rollback / cloning / snapshotting of VMs.
> +
>   config FSL_HV_MANAGER
>          tristate "Freescale hypervisor management driver"
>          depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index 3e272ea60cd9..108d0ffcc9aa 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -4,6 +4,7 @@
>   #
>
>   obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
> +obj-$(CONFIG_VMGENID)          += vmgenid.o
>   obj-y                          += vboxguest/
>
>   obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> new file mode 100644
> index 000000000000..e3dd4afb33c6
> --- /dev/null
> +++ b/drivers/virt/vmgenid.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + *
> + * The "Virtual Machine Generation ID" is exposed via ACPI and changes when a
> + * virtual machine forks or is cloned. This driver exists for shepherding that
> + * information to random.c.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/acpi.h>
> +#include <linux/random.h>
> +
> +ACPI_MODULE_NAME("vmgenid");
> +
> +enum { VMGENID_SIZE = 16 };
> +
> +struct vmgenid_state {
> +       u8 *next_id;
> +       u8 this_id[VMGENID_SIZE];
> +};
> +
> +static int vmgenid_acpi_add(struct acpi_device *device)
> +{
> +       struct acpi_buffer parsed = { ACPI_ALLOCATE_BUFFER };
> +       struct vmgenid_state *state;
> +       union acpi_object *obj;
> +       phys_addr_t phys_addr;
> +       acpi_status status;
> +       int ret = 0;
> +
> +       state = devm_kmalloc(&device->dev, sizeof(*state), GFP_KERNEL);
> +       if (!state)
> +               return -ENOMEM;
> +
> +       status = acpi_evaluate_object(device->handle, "ADDR", NULL, &parsed);
> +       if (ACPI_FAILURE(status)) {
> +               ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
> +               return -ENODEV;
> +       }
> +       obj = parsed.pointer;
> +       if (!obj || obj->type != ACPI_TYPE_PACKAGE || obj->package.count != 2 ||
> +           obj->package.elements[0].type != ACPI_TYPE_INTEGER ||
> +           obj->package.elements[1].type != ACPI_TYPE_INTEGER) {
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       phys_addr = (obj->package.elements[0].integer.value << 0) |
> +                   (obj->package.elements[1].integer.value << 32);
> +       state->next_id = devm_memremap(&device->dev, phys_addr, VMGENID_SIZE, MEMREMAP_WB);
> +       if (!state->next_id) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       memcpy(state->this_id, state->next_id, sizeof(state->this_id));
> +       add_device_randomness(state->this_id, sizeof(state->this_id));


Please expose the vmgenid via /sysfs so that user space even remotely 
has a chance to check if it's been cloned.


> +
> +       device->driver_data = state;
> +
> +out:
> +       ACPI_FREE(parsed.pointer);
> +       return ret;
> +}
> +
> +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> +{
> +       struct vmgenid_state *state = acpi_driver_data(device);
> +       u8 old_id[VMGENID_SIZE];
> +
> +       memcpy(old_id, state->this_id, sizeof(old_id));
> +       memcpy(state->this_id, state->next_id, sizeof(state->this_id));
> +       if (!memcmp(old_id, state->this_id, sizeof(old_id)))
> +               return;
> +       add_vmfork_randomness(state->this_id, sizeof(state->this_id));
> +}
> +
> +static const struct acpi_device_id vmgenid_ids[] = {
> +       { "VMGENID", 0 },
> +       { "QEMUVGID", 0 },


According to the VMGenID spec[1], you can only rely on _CID and _DDN for 
matching. They both contain "VM_Gen_Counter". The list above contains 
_HID values which are not an official identifier for the VMGenID device.

IIRC the ACPI device match logic does match _CID in addition to _HID. 
However, it is limited to 8 characters. Let me paste an experimental 
hack I did back then to do the _CID matching instead.

[1] 
https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx


Alex

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 1682f8b454a2..452443d79d87 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -748,7 +748,7 @@ static bool __acpi_match_device(struct acpi_device *device,
          /* First, check the ACPI/PNP IDs provided by the caller. */
          if (acpi_ids) {
              for (id = acpi_ids; id->id[0] || id->cls; id++) {
-                if (id->id[0] && !strcmp((char *)id->id, hwid->id))
+                if (id->id[0] && !strncmp((char *)id->id, hwid->id, ACPI_ID_LEN - 1))
                      goto out_acpi_match;
                  if (id->cls && __acpi_match_device_cls(id, hwid))
                      goto out_acpi_match;
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
index 75a787da8aad..0bfa422cf094 100644
--- a/drivers/virt/vmgenid.c
+++ b/drivers/virt/vmgenid.c
@@ -356,7 +356,8 @@ static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
  }

  static const struct acpi_device_id vmgenid_ids[] = {
-    {"QEMUVGID", 0},
+    /* This really is VM_Gen_Counter, but we can only match 8 characters */
+    {"VM_GEN_C", 0},
      {"", 0},
  };
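
The truncated comparison in the hack above can be sketched in userspace as
follows. This is only an illustration under assumptions: the constant
`SKETCH_ACPI_ID_LEN` is a hypothetical name mirroring a 9-byte ID buffer
(8 characters plus NUL), and the hardware ID is assumed to arrive already
upper-cased, since strncmp() is case-sensitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: 8 usable characters plus the terminating NUL. */
#define SKETCH_ACPI_ID_LEN 9

/*
 * Compare at most the first 8 characters of the driver's ID against the
 * hardware ID, so a longer hardware ID still matches on its prefix.
 * An empty driver ID never matches (it terminates the ID table).
 */
static bool id_prefix_match(const char *driver_id, const char *hw_id)
{
	assert(driver_id && hw_id);
	return driver_id[0] != '\0' &&
	       strncmp(driver_id, hw_id, SKETCH_ACPI_ID_LEN - 1) == 0;
}
```

With this, `id_prefix_match("VM_GEN_C", "VM_GEN_COUNTER")` is true, while
a mixed-case "VM_Gen_Counter" does not match. Whether the ID is
case-normalized before matching is not shown in this thread, so that part
stays an assumption of the sketch.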




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Jason A. Donenfeld Feb. 25, 2022, 2:18 p.m. UTC | #3
On Fri, Feb 25, 2022 at 3:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Alex,
>
> On Fri, Feb 25, 2022 at 02:57:38PM +0100, Alexander Graf wrote:
> > > +static const struct acpi_device_id vmgenid_ids[] = {
> > > +       { "VMGENID", 0 },
> > > +       { "QEMUVGID", 0 },
> >
> >
> > According to the VMGenID spec[1], you can only rely on _CID and _DDN for
> > matching. They both contain "VM_Gen_Counter". The list above contains
> > _HID values which are not an official identifier for the VMGenID device.
> >
> > IIRC the ACPI device match logic does match _CID in addition to _HID.
> > However, it is limited to 8 characters. Let me paste an experimental
> > hack I did back then to do the _CID matching instead.
> >
> > [1]
> > https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx
> >
> >
> > Alex
> >
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 1682f8b454a2..452443d79d87 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -748,7 +748,7 @@ static bool __acpi_match_device(struct acpi_device *device,
> >           /* First, check the ACPI/PNP IDs provided by the caller. */
> >           if (acpi_ids) {
> >               for (id = acpi_ids; id->id[0] || id->cls; id++) {
> > -                if (id->id[0] && !strcmp((char *)id->id, hwid->id))
> > +                if (id->id[0] && !strncmp((char *)id->id, hwid->id, ACPI_ID_LEN - 1))
> >                       goto out_acpi_match;
> >                   if (id->cls && __acpi_match_device_cls(id, hwid))
> >                       goto out_acpi_match;
> > diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> > index 75a787da8aad..0bfa422cf094 100644
> > --- a/drivers/virt/vmgenid.c
> > +++ b/drivers/virt/vmgenid.c
> > @@ -356,7 +356,8 @@ static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> >   }
> >
> >   static const struct acpi_device_id vmgenid_ids[] = {
> > -    {"QEMUVGID", 0},
> > +    /* This really is VM_Gen_Counter, but we can only match 8 characters */
> > +    {"VM_GEN_C", 0},
> >       {"", 0},
> >   };
>
> I recall this part of the old thread. From what I understood, using
> "VMGENID" + "QEMUVGID" worked /well enough/, even if that wasn't
> technically in-spec. Ard noted that relying on _CID like that is
> technically an ACPI spec violation. So we're between one spec and
> another, basically, and doing "VMGENID" + "QEMUVGID" requires fewer
> changes and, as mentioned, appears to work fine in my testing.
>
> However, with that said, I think supporting this via "VM_Gen_Counter"
> would be a better eventual thing to do, but will require acks and
> changes from the ACPI maintainers. Do you think you could prepare your
> patch proposal above as something on-top of my tree [1]? And if you can
> convince the ACPI maintainers that that's okay, then I'll happily take
> the patch.

A closely related concern, which whatever patch you come up with will
have to handle, is MODULE_DEVICE_TABLE and udev autoloading. I don't know
whether _CID matching is something that happens in udev or what its
limits are, so that'll have to be researched and tested a bit.

Jason
Jason A. Donenfeld Feb. 25, 2022, 2:33 p.m. UTC | #4
On Fri, Feb 25, 2022 at 03:18:43PM +0100, Alexander Graf wrote:
> > I recall this part of the old thread. From what I understood, using
> > "VMGENID" + "QEMUVGID" worked /well enough/, even if that wasn't
> > technically in-spec. Ard noted that relying on _CID like that is
> > technically an ACPI spec violation. So we're between one spec and
> > another, basically, and doing "VMGENID" + "QEMUVGID" requires fewer
> > changes and, as mentioned, appears to work fine in my testing.
> >
> > However, with that said, I think supporting this via "VM_Gen_Counter"
> > would be a better eventual thing to do, but will require acks and
> > changes from the ACPI maintainers. Do you think you could prepare your
> > patch proposal above as something on-top of my tree [1]? And if you can
> > convince the ACPI maintainers that that's okay, then I'll happily take
> > the patch.
> 
> 
> Sure, let me send the ACPI patch stand alone. No need to include the 
> VMGenID change in there.

That's fine. If the ACPI people take it for 5.18, then we can count on
it being there and adjust the vmgenid driver accordingly also for 5.18.

I just booted up a Windows VM, and it looks like Hyper-V uses
"Hyper_V_Gen_Counter_V1", which is also quite long, so we can't really
HID match on that either.

Jason
Jason A. Donenfeld Feb. 25, 2022, 2:54 p.m. UTC | #5
Hi Alex,

Missed this remark before:

On Fri, Feb 25, 2022 at 02:57:38PM +0100, Alexander Graf wrote:
> Please expose the vmgenid via /sysfs so that user space even remotely 
> has a chance to check if it's been cloned.

No. Did you read the 0/2 cover letter? I'll quote it for you here:

> As a side note, this series intentionally does _not_ focus on
> notification of these events to userspace or to other kernel consumers.
> Since these VM fork detection events first need to hit the RNG, we can
> later talk about what sorts of notifications or mmap'd counters the RNG
> should be making accessible to elsewhere. But that's a different sort of
> project and ties into a lot of more complicated concerns beyond this
> more basic patchset. So hopefully we can keep the discussion rather
> focused here to this ACPI business.

What about that was unclear to you?

Anyway, it's a different thing that will have to be designed and
considered carefully, and that design doesn't have a whole lot to do
with this little driver here, except insofar as it could build on top of
it in one way or another. Yes, it's an important thing to do. No, I'm
not going to do it in this patch here. If you want to have a discussion
about that, start a different thread.

Thanks,
Jason
Ard Biesheuvel Feb. 25, 2022, 3:04 p.m. UTC | #6
On Fri, 25 Feb 2022 at 13:53, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 25, 2022 at 01:48:48PM +0100, Jason A. Donenfeld wrote:
> > +static struct acpi_driver acpi_driver = {
> > +     .name = "vmgenid",
> > +     .ids = vmgenid_ids,
> > +     .owner = THIS_MODULE,
> > +     .ops = {
> > +             .add = vmgenid_acpi_add,
> > +             .notify = vmgenid_acpi_notify,
> > +     }
> > +};
> > +
> > +static int __init vmgenid_init(void)
> > +{
> > +     return acpi_bus_register_driver(&acpi_driver);
> > +}
> > +
> > +static void __exit vmgenid_exit(void)
> > +{
> > +     acpi_bus_unregister_driver(&acpi_driver);
> > +}
> > +
> > +module_init(vmgenid_init);
> > +module_exit(vmgenid_exit);
>
> Nit, you could use module_acpi_driver() to make this even smaller if you
> want to.
>

With that suggestion adopted,

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Alexander Graf Feb. 25, 2022, 3:15 p.m. UTC | #7
On 25.02.22 15:54, Jason A. Donenfeld wrote:
> Hi Alex,
>
> Missed this remark before:
>
> On Fri, Feb 25, 2022 at 02:57:38PM +0100, Alexander Graf wrote:
>> Please expose the vmgenid via /sysfs so that user space even remotely
>> has a chance to check if it's been cloned.
> No. Did you read the 0/2 cover letter? I'll quote it for you here:
>
>> As a side note, this series intentionally does _not_ focus on
>> notification of these events to userspace or to other kernel consumers.
>> Since these VM fork detection events first need to hit the RNG, we can
>> later talk about what sorts of notifications or mmap'd counters the RNG
>> should be making accessible to elsewhere. But that's a different sort of
>> project and ties into a lot of more complicated concerns beyond this
>> more basic patchset. So hopefully we can keep the discussion rather
>> focused here to this ACPI business.
> What about that was unclear to you?
>
> Anyway, it's a different thing that will have to be designed and
> considered carefully, and that design doesn't have a whole lot to do
> with this little driver here, except insofar as it could build on top of
> it in one way or another. Yes, it's an important thing to do. No, I'm
> not going to do it in this patch here. If you want to have a discussion
> about that, start a different thread.


I'm not talking about a notification interface - we've gone to 
great lengths on that one in the previous submission. What I'm more 
interested in is *any* way for user space to read the current VM Gen ID. 
The same way I'm interested to see other device attributes of my system 
through sysfs.


Alex





Alexander Graf Feb. 25, 2022, 3:22 p.m. UTC | #8
On 25.02.22 16:16, Ard Biesheuvel wrote:
> On Fri, 25 Feb 2022 at 16:12, Alexander Graf <graf@amazon.com> wrote:
>>
>> On 25.02.22 15:33, Jason A. Donenfeld wrote:
>>> On Fri, Feb 25, 2022 at 03:18:43PM +0100, Alexander Graf wrote:
>>>>> I recall this part of the old thread. From what I understood, using
>>>>> "VMGENID" + "QEMUVGID" worked /well enough/, even if that wasn't
>>>>> technically in-spec. Ard noted that relying on _CID like that is
>>>>> technically an ACPI spec violation. So we're between one spec and
>>>>> another, basically, and doing "VMGENID" + "QEMUVGID" requires fewer
>>>>> changes and, as mentioned, appears to work fine in my testing.
>>>>>
>>>>> However, with that said, I think supporting this via "VM_Gen_Counter"
>>>>> would be a better eventual thing to do, but will require acks and
>>>>> changes from the ACPI maintainers. Do you think you could prepare your
>>>>> patch proposal above as something on-top of my tree [1]? And if you can
>>>>> convince the ACPI maintainers that that's okay, then I'll happily take
>>>>> the patch.
>>>> Sure, let me send the ACPI patch stand alone. No need to include the
>>>> VMGenID change in there.
>>> That's fine. If the ACPI people take it for 5.18, then we can count on
>>> it being there and adjust the vmgenid driver accordingly also for 5.18.
>>>
>>> I just booted up a Windows VM, and it looks like Hyper-V uses
>>> "Hyper_V_Gen_Counter_V1", which is also quite long, so we can't really
>>> HID match on that either.
>>
>> Yes, due to the same problem. I'd really prefer we sort out the ACPI
>> matching before this goes mainline. Matching on _HID is explicitly
>> discouraged in the VMGenID spec.
>>
> OK, this really sucks. Quoting the ACPI spec:
>
> """
> A _HID object evaluates to either a numeric 32-bit compressed EISA
> type ID or a string. If a string, the format must be an alphanumeric
> PNP or ACPI ID with no asterisk or other leading characters.
> A valid PNP ID must be of the form "AAA####" where A is an uppercase
> letter and # is a hex digit.
> A valid ACPI ID must be of the form "NNNN####" where N is an uppercase
> letter or a digit ('0'-'9') and # is a hex digit. This specification
> reserves the string "ACPI" for use only with devices defined herein.
> It further reserves all strings representing 4 HEX digits for
> exclusive use with PCI-assigned Vendor IDs.
> """
>
> So now we have to implement Microsoft's fork of ACPI to be able to use
> this device, even if we expose it from QEMU instead of Hyper-V? I
> strongly object to that.
>
> Instead, we can match on _HID exposed by QEMU, and cordially invite
> Microsoft to align their spec with the ACPI spec.


Doing that would be a backwards incompatible change for Hyper-V, no? I 
understand that you're upset about their spec, but that doesn't mean we 
can't find a path forward to make it all compatible.

IMHO just matching on the first 9 bytes of the _CID/_HID string is 
perfectly fine. It follows the spec, but still allows for weird 
identifiers like this one to work.

I don't understand the rush here. This had been sitting on the ML for 1 
year - and now suddenly talking the match through properly and getting 
VMGenID spec compatible matching support into the ACPI core is a 
problem? What did I miss? :)
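
The _HID format rules quoted above from the ACPI spec can be sketched as
two small checks. This is a minimal illustration, not code from any kernel
tree: the function names are hypothetical, and accepting lowercase hex
digits via isxdigit() is an assumption, since the quoted text only says
"hex digit".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the first n characters of s are all accepted by pred. */
static bool all_match(const char *s, size_t n, int (*pred)(int))
{
	for (size_t i = 0; i < n; i++)
		if (!pred((unsigned char)s[i]))
			return false;
	return true;
}

static int is_upper_or_digit(int c)
{
	return isupper(c) || isdigit(c);
}

/* PNP ID: "AAA####", three uppercase letters then four hex digits. */
static bool is_valid_pnp_id(const char *id)
{
	assert(id);
	return strlen(id) == 7 && all_match(id, 3, isupper) &&
	       all_match(id + 3, 4, isxdigit);
}

/* ACPI ID: "NNNN####", four uppercase letters or digits, then four hex digits. */
static bool is_valid_acpi_id(const char *id)
{
	assert(id);
	return strlen(id) == 8 && all_match(id, 4, is_upper_or_digit) &&
	       all_match(id + 4, 4, isxdigit);
}
```

Under these rules "PNP0A03" and "ACPI0003" pass, while "VM_Gen_Counter"
and "Hyper_V_Gen_Counter_V1" fail both checks on length alone.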


Alex




Alexander Graf Feb. 25, 2022, 3:31 p.m. UTC | #9
On 25.02.22 15:36, Greg KH wrote:
> On Fri, Feb 25, 2022 at 02:57:38PM +0100, Alexander Graf wrote:
>>> +
>>> +       phys_addr = (obj->package.elements[0].integer.value << 0) |
>>> +                   (obj->package.elements[1].integer.value << 32);
>>> +       state->next_id = devm_memremap(&device->dev, phys_addr, VMGENID_SIZE, MEMREMAP_WB);
>>> +       if (!state->next_id) {
>>> +               ret = -ENOMEM;
>>> +               goto out;
>>> +       }
>>> +
>>> +       memcpy(state->this_id, state->next_id, sizeof(state->this_id));
>>> +       add_device_randomness(state->this_id, sizeof(state->this_id));
>>
>> Please expose the vmgenid via /sysfs so that user space even remotely has a
>> chance to check if it's been cloned.
> Export it how?  And why, who would care?


You can just create a sysfs file that contains it. The same way we have 
sysfs files for UEFI config tables. Or sysfs files for the acpi device 
nodes themselves.

I personally don't care if we put this into a generic location 
(/sys/hypervisor for example) or into the existing acpi device node as 
additional file you can just read.

Who would care? Well, for starters I would care for debugging purposes 
:). Extracting the ID and validating that it's different than before is 
quite useful when you want to check if the clone rng adjustment actually 
worked.

I don't have very strong feelings on it though - unlike the _CID 
conversation.
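
For illustration only, here is roughly what such a read-only file could
print. Nothing like this exists in the patch under review; the function
name and the output format (32 lowercase hex digits plus a newline) are
assumptions made for this sketch, written as plain userspace C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { VMGENID_SIZE = 16 };

/*
 * Hypothetical formatter: writes the 16 opaque bytes as hex, followed by
 * a newline and a NUL. Returns the number of characters written
 * (excluding the NUL), or -1 if the buffer is too small.
 */
static int vmgenid_format_hex(const uint8_t id[VMGENID_SIZE], char *buf,
			      size_t buflen)
{
	size_t pos = 0;

	assert(id && buf);
	if (buflen < (size_t)(2 * VMGENID_SIZE + 2))
		return -1;
	for (size_t i = 0; i < VMGENID_SIZE; i++)
		pos += (size_t)snprintf(buf + pos, buflen - pos, "%02x", id[i]);
	buf[pos++] = '\n';
	buf[pos] = '\0';
	return (int)pos;
}
```

Reading the value before and after a restore and comparing the two
strings would be enough for the debugging use case described above.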


Alex





Jason A. Donenfeld Feb. 25, 2022, 3:36 p.m. UTC | #10
Hi again,

On Fri, Feb 25, 2022 at 04:31:02PM +0100, Alexander Graf wrote:
> >> Please expose the vmgenid via /sysfs so that user space even remotely has a
> >> chance to check if it's been cloned.
> > Export it how?  And why, who would care?
> You can just

As mentioned in <https://lore.kernel.org/lkml/Yhj1nYHXmimPsqFd@zx2c4.com/>,
propose something later for all of this. It doesn't need to happen all
at once *now*.

Thanks,
Jason
Jason A. Donenfeld Feb. 25, 2022, 3:45 p.m. UTC | #11
Hi Alex,

On Fri, Feb 25, 2022 at 04:37:43PM +0100, Alexander Graf wrote:
> I believe "VMGENID" was for the firecracker prototype that Adrian built 
> back then, yeah. Matching on _HID for this is a rat hole unfortunately, 
> so let's see what the ACPI patch gets us :).

Thanks. I'll add a comment to the code about Firecracker. And indeed
hopefully that'll all go away anyway.

Jason

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 777cd6fa2b3d..a10997e15146 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16211,6 +16211,7 @@  M:	Jason A. Donenfeld <Jason@zx2c4.com>
 T:	git https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
 S:	Maintained
 F:	drivers/char/random.c
+F:	drivers/virt/vmgenid.c
 
 RAPIDIO SUBSYSTEM
 M:	Matt Porter <mporter@kernel.crashing.org>
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..5596c7313f59 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,15 @@  menuconfig VIRT_DRIVERS
 
 if VIRT_DRIVERS
 
+config VMGENID
+	tristate "Virtual Machine Generation ID driver"
+	default m
+	depends on ACPI
+	help
+	  Say Y here to use the hypervisor-provided Virtual Machine Generation ID
+	  to reseed the RNG when the VM is cloned. This is highly recommended if
+	  you intend to do any rollback / cloning / snapshotting of VMs.
+
 config FSL_HV_MANAGER
 	tristate "Freescale hypervisor management driver"
 	depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..108d0ffcc9aa 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -4,6 +4,7 @@ 
 #
 
 obj-$(CONFIG_FSL_HV_MANAGER)	+= fsl_hypervisor.o
+obj-$(CONFIG_VMGENID)		+= vmgenid.o
 obj-y				+= vboxguest/
 
 obj-$(CONFIG_NITRO_ENCLAVES)	+= nitro_enclaves/
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
new file mode 100644
index 000000000000..e3dd4afb33c6
--- /dev/null
+++ b/drivers/virt/vmgenid.c
@@ -0,0 +1,112 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * The "Virtual Machine Generation ID" is exposed via ACPI and changes when a
+ * virtual machine forks or is cloned. This driver exists for shepherding that
+ * information to random.c.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/acpi.h>
+#include <linux/random.h>
+
+ACPI_MODULE_NAME("vmgenid");
+
+enum { VMGENID_SIZE = 16 };
+
+struct vmgenid_state {
+	u8 *next_id;
+	u8 this_id[VMGENID_SIZE];
+};
+
+static int vmgenid_acpi_add(struct acpi_device *device)
+{
+	struct acpi_buffer parsed = { ACPI_ALLOCATE_BUFFER };
+	struct vmgenid_state *state;
+	union acpi_object *obj;
+	phys_addr_t phys_addr;
+	acpi_status status;
+	int ret = 0;
+
+	state = devm_kmalloc(&device->dev, sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return -ENOMEM;
+
+	status = acpi_evaluate_object(device->handle, "ADDR", NULL, &parsed);
+	if (ACPI_FAILURE(status)) {
+		ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+		return -ENODEV;
+	}
+	obj = parsed.pointer;
+	if (!obj || obj->type != ACPI_TYPE_PACKAGE || obj->package.count != 2 ||
+	    obj->package.elements[0].type != ACPI_TYPE_INTEGER ||
+	    obj->package.elements[1].type != ACPI_TYPE_INTEGER) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	phys_addr = (obj->package.elements[0].integer.value << 0) |
+		    (obj->package.elements[1].integer.value << 32);
+	state->next_id = devm_memremap(&device->dev, phys_addr, VMGENID_SIZE, MEMREMAP_WB);
+	if (IS_ERR(state->next_id)) {
+		ret = PTR_ERR(state->next_id);
+		goto out;
+	}
+
+	memcpy(state->this_id, state->next_id, sizeof(state->this_id));
+	add_device_randomness(state->this_id, sizeof(state->this_id));
+
+	device->driver_data = state;
+
+out:
+	ACPI_FREE(parsed.pointer);
+	return ret;
+}
+
+static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
+{
+	struct vmgenid_state *state = acpi_driver_data(device);
+	u8 old_id[VMGENID_SIZE];
+
+	memcpy(old_id, state->this_id, sizeof(old_id));
+	memcpy(state->this_id, state->next_id, sizeof(state->this_id));
+	if (!memcmp(old_id, state->this_id, sizeof(old_id)))
+		return;
+	add_vmfork_randomness(state->this_id, sizeof(state->this_id));
+}
+
+static const struct acpi_device_id vmgenid_ids[] = {
+	{ "VMGENID", 0 },
+	{ "QEMUVGID", 0 },
+	{ },
+};
+
+static struct acpi_driver acpi_driver = {
+	.name = "vmgenid",
+	.ids = vmgenid_ids,
+	.owner = THIS_MODULE,
+	.ops = {
+		.add = vmgenid_acpi_add,
+		.notify = vmgenid_acpi_notify,
+	}
+};
+
+static int __init vmgenid_init(void)
+{
+	return acpi_bus_register_driver(&acpi_driver);
+}
+
+static void __exit vmgenid_exit(void)
+{
+	acpi_bus_unregister_driver(&acpi_driver);
+}
+
+module_init(vmgenid_init);
+module_exit(vmgenid_exit);
+
+MODULE_DEVICE_TABLE(acpi, vmgenid_ids);
+MODULE_DESCRIPTION("Virtual Machine Generation ID");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
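
The address computation in vmgenid_acpi_add() above combines the two
integers of the ADDR package into one physical address. A minimal
userspace sketch of just that step follows; the helper name is
hypothetical, and the assertion that each element fits in 32 bits is an
assumption of this sketch rather than something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Element 0 of the package is the low half of the address, element 1 the
 * high half. Both operands are already 64 bits wide, so shifting the high
 * half left by 32 does not overflow the type.
 */
static uint64_t vmgenid_addr_from_package(uint64_t low, uint64_t high)
{
	/* Assumption: each element carries at most 32 significant bits. */
	assert(low <= UINT32_MAX && high <= UINT32_MAX);
	return (low << 0) | (high << 32);
}
```

For example, a low half of 0xfeedf000 and a high half of 0x1 combine to
0x1feedf000.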