From patchwork Fri Aug 28 20:18:28 2015
X-Patchwork-Submitter: Laszlo Ersek
X-Patchwork-Id: 52806
From: Laszlo Ersek
To: qemu-devel@nongnu.org
Cc: Gal Hammer, Paolo Bonzini, "Michael S. Tsirkin", Igor Mammedov
Date: Fri, 28 Aug 2015 22:18:28 +0200
Message-Id: <1440793108-25061-1-git-send-email-lersek@redhat.com>
Subject: [Qemu-devel] [RFC] docs: describe QEMU's VMGenID design

Cc: Paolo Bonzini
Cc: Gal Hammer
Cc: Igor Mammedov
Cc: "Michael S. Tsirkin"
Signed-off-by: Laszlo Ersek
Acked-by: Michael S. Tsirkin
---

Notes:
    This is based on the super long private email discussion we had two months
    ago, plus on the IRL discussion between Michael and myself @ the KVM Forum
    2015.

 docs/specs/vmgenid.txt | 343 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 343 insertions(+)
 create mode 100644 docs/specs/vmgenid.txt

diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
new file mode 100644
index 0000000..d4bf132
--- /dev/null
+++ b/docs/specs/vmgenid.txt
@@ -0,0 +1,343 @@
+Virtual Machine Generation ID Device
+====================================
+
+The Microsoft specification entitled "Virtual Machine Generation ID",
+maintained at , defines an ACPI
+feature that allows the guest OSPM to recognize when it has been returned "to
+an earlier point in time", e.g. by restoring from a snapshot, or by incoming
+migration. Quoting the spec,
+
+    The virtual machine generation ID is a feature whereby the virtual machine's
+    BIOS will expose a new ID. This is a 128-bit, cryptographically random
+    integer value identifier that will be different every time the virtual
+    machine executes from a different configuration file -- such as executing
+    from a recovered snapshot, or executing after restoring from backup. [...]
+
+The document you are reading now extracts the requirements set forth by the
+VMGenID spec for hypervisors that intend to provide the feature, and describes
+QEMU's implementation. The design below targets both SeaBIOS and OVMF as
+compatible guest firmwares, without any changes to either of them.
+
+Requirements
+------------
+
+These requirements are extracted from the "How to implement virtual machine
+generation ID support in a virtualization platform" section of the
+specification, dated August 1, 2012.
+
+R1a. The generation ID shall live in an 8-byte aligned buffer.
+
+R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or device
+     MMIO range.
+
+R1c. The buffer holding the generation ID shall be kept separate from areas
+     used by the operating system.
+
+R1d. The buffer shall not be covered by an AddressRangeMemory or
+     AddressRangeACPI entry in the E820 or UEFI memory map.
+
+R1e. The generation ID shall not live in a page frame that could be mapped with
+     caching disabled. (In other words, if the generation ID lives in RAM, then
+     it shall only be mapped as cacheable.)
+
+R2 to R5. [These AML requirements are isolated well enough in the Microsoft
+          specification for us to simply refer to them here.]
+
+R6. The hypervisor shall expose a _HID (hardware identifier) object in the
+    VMGenId device's scope that is unique to the hypervisor vendor.
+
+Generation ID buffer design
+---------------------------
+
+QEMU places the generation ID buffer inside a separate fw_cfg blob that is
+exposed to the guest OS with the ACPI linker/loader.
+
+The structure of the blob is as follows. Offsets, sizes and numeric values are
+given in decimal; furthermore the latter are encoded in little endian.
+
+    Offs  Field               Size  Value
+    ----  ------------------  ----  ------------------------------------
+       0  System Description    36
+          Table Header
+       0    Signature            4  "UEFI"
+       4    Length               4  62
+       8    Revision             1  1
+       9    Checksum             1  0
+      10    OEMID                6  ACPI_BUILD_APPNAME6 ("BOCHS ")
+      16    OEM Table ID         8  "QEMUPARM"
+      24    OEM Revision         4  1
+      28    Creator ID           4  ACPI_BUILD_APPNAME4 ("BXPC")
+      32    Creator Revision     4  1
+
+      36  UEFI Table            18
+          Sub-Header
+      36    Identifier          16  417a5dff-bf4b-4abc-a839-6593bb41f452
+      52    DataOffset           2  54
+
+      54  ADDR base pointer      8  62
+    ....................................................................
+      62  OVMF SDT Header       36  zeroes
+          probe suppressor
+      98  VMGenID alignment      6  zeroes
+          padding
+     104  generation ID         16  128-bit VMGenID
+     120  fw_cfg blob         3976  zeroes
+          padding
+    4096
+
+The fw_cfg blob is divided into two parts conceptually (separated by the dotted
+line in the diagram). The first part, up to and excluding offset 62, is a
+"UEFI" ACPI Table, governed by the UEFI specification 2.5, Appendix O. The
+second part is mainly padding, but it also contains the generation ID.
+
+The "UEFI" ACPI Table -- in the first part -- is a "normal" ACPI table whose
+generic header is defined by the ACPI specification, but for which the UEFI
+spec defines the "UEFI" signature and adds two more fixed fields, "Identifier"
+and "DataOffset".
+
+- The Identifier field carries a 128-bit GUID, and enables firmware
+  implementors to install several "UEFI" tables with different internal
+  structures, enabling OSPM to tell them apart based on the (Type-)Identifier
+  GUID field.
+
+  For the purposes of QEMU's VMGenID implementation, we generated a new GUID
+  with the "uuidgen" utility. It should be different from all other
+  "Identifier" values, present and future, but otherwise no other software need
+  be aware of the concrete GUID value we generated.
+
+- The DataOffset field is just an offset into the table where the actual
+  (Identifier-specific) data starts.
+
+  For the purposes of QEMU's VMGenID implementation, we simply set it to the
+  next (QEMU-specific) field, "ADDR base pointer".
+
+Linker/loader commands
+----------------------
+
+The name of the fw_cfg blob is "etc/acpi/qemuparam". The ALLOCATE command that
+instructs the guest firmware to download this fw_cfg blob specifies an
+alignment of 4096, and the blob will have size 4096 too.
+
+An ADD_POINTER command links the "UEFI" ACPI Table at the start of the blob
+into the RSDT.
+
+Another ADD_POINTER command relocates the "ADDR base pointer" field to the
+absolute address of the "OVMF SDT Header probe suppressor" field, within the
+same blob.
+
+After this relocation, an ADD_CHECKSUM command updates the Checksum field,
+covering the entire "UEFI" ACPI Table (which extends up to and excluding offset
+62).
+
+Blob behavior under SeaBIOS
+---------------------------
+
+(Most of the complexity in the blob is ignored when the guest firmware is
+SeaBIOS.)
+
+- SeaBIOS's ACPI linker/loader client allocates the blob in normal RAM
+  (satisfying R1b).
+
+- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
+  size is also 4KB, the allocation covers a standalone page frame in full
+  (satisfying R1e).
+
+- The 128-bit VMGenID field is located at offset 104 within that page,
+  resulting in a guest-physical address divisible by 8 (satisfying R1a).
+
+- The blob is marked as Reserved in the E820 map (satisfying R1c and R1d).
+
+- The "UEFI" ACPI Table at the start of the blob is linked into the RSDT,
+  in-place.
+
+- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
+  Table with the DataTableRegion operator, because the table is located in
+  memory marked as AddressRangeReserved.
+
+- The "ADDR base pointer" field points at "OVMF SDT Header probe suppressor",
+  which is right after the "UEFI" ACPI Table inside the blob. At OSPM runtime,
+  the "ADDR" AML method reads the "ADDR base pointer" field, and adds 42, to
+  arrive at the address of the VMGenID field.
+
+    blob @ page offset 0               RSDT
+    +-----------------------+         +-----+
+    | "UEFI" ACPI Table <---------+   | ... |
+    | +-------------------+ |     |   | ... |
+    | | ...               | |     +---- ... |
+    | | ...               | |         +-----+
+    | | ADDR base pointer -----+
+    | +-------------------+ |  |
+    | probe suppressor <-------+
+    | VMGenID @ offset 104  |
+    | padding               |
+    +-----------------------+
+
+Blob behavior under OVMF
+------------------------
+
+The complexity in the blob is required by the two-pass nature of OVMF's ACPI
+linker/loader client, which in turn comes from the fact that OVMF has to
+dissect blobs into individual ACPI tables vs. "other things", tracking the
+ADD_POINTER commands, so that tables can be installed individually, with
+EFI_ACPI_TABLE_PROTOCOL.
+
+- OVMF's ACPI linker/loader client allocates the blob in normal RAM (satisfying
+  R1b).
+
+- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
+  size is also 4KB, the allocation covers a standalone page frame in full
+  (satisfying R1e).
+
+- The 128-bit VMGenID field is located at offset 104 within that page,
+  resulting in a guest-physical address divisible by 8 (satisfying R1a).
+
+- OVMF's ACPI linker/loader allocates the blob in EfiACPIMemoryNVS type memory,
+  therefore it is marked as such in the UEFI memmap (satisfying R1c and R1d).
+
+- OVMF identifies the "UEFI" ACPI Table at the start of the blob in the second
+  pass, following the ADD_POINTER command that is meant to link the table into
+  the RSDT. OVMF installs a *copy* of the "UEFI" ACPI Table with
+  EFI_ACPI_TABLE_PROTOCOL (linking the copy into both RSDT and XSDT). Given the
+  "UEFI" signature of the table, EFI_ACPI_TABLE_PROTOCOL places the copy of the
+  table in EfiACPIMemoryNVS type memory.
+
+- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
+  Table with the DataTableRegion operator, because the table is located in
+  memory marked as AddressRangeNVS.
+
+- The "ADDR base pointer" field inside the installed table points at "OVMF SDT
+  Header probe suppressor" in the original blob. Because that area is filled
+  with zeroes, OVMF's table identification heuristics unconditionally report a
+  negative when it tracks the relevant ADD_POINTER command to it in the second
+  pass. Therefore the blob is marked as "hosts something other than just ACPI
+  tables", and it is preserved permanently (in the same EfiACPIMemoryNVS type
+  memory where it was originally allocated).
+
+  At OSPM runtime, the "ADDR" AML method reads the "ADDR base pointer" field,
+  and adds 42, to arrive at the address of the VMGenID field.
+
+    blob @ page offset 0               RSDT         XSDT
+    +-----------------------------+   +-----+      +-----+
+    | "UEFI" ACPI Table (in blob) |   | ... |      | ... |
+    | +-------------------------+ |   | ... ---+   | ... ---------------+
+    | |XXXXXXXXXXXXXXXXXXXXXXXXX| |   +-----+  |   +-----+              |
+    | |XXXXXXX [unused] XXXXXXXX| |            |                        |
+    | |XXXXXXXXXXXXXXXXXXXXXXXXX| |            +------------------------+
+    | +-------------------------+ |                                     |
+    | probe suppressor <-------------+ "UEFI" ACPI Table (installed) <--+
+    | VMGenID @ offset 104        |  |  +---------------------------+
+    | padding                     |  |  | ...                       |
+    +-----------------------------+  |  | ...                       |
+                                     +--- ADDR base pointer         |
+                                        +---------------------------+
+
+ACPI device, control methods
+----------------------------
+
+Requirements R2 through R6 of the VMGenID specification are satisfied with the
+following ACPI logic, exposed by QEMU's ACPI generator in one of the SSDTs, and
+installed by both guest firmwares as such.
+
+The basic idea is that, when the appropriate guest driver calls the ADDR method
+(see R4), OSPM locates the generation ID field in the 4KB blob that lives in
+E820 Reserved (SeaBIOS) or EfiACPIMemoryNVS type (OVMF) memory. The
+guest-physical address of the field is communicated to QEMU via IO ports
+[0x512..0x519] inclusive. Then QEMU is cued through IO port 0x51A to refresh
+(and keep refreshing when appropriate) the generation ID at the passed-back
+address. Finally, the method returns the address to the guest driver too, in
+the format required by R4.
+
+    Scope(\_SB) {
+        Device (VMGI) {
+            /* satisfy R2 */
+            Name (_CID, "VM_Gen_Counter")
+
+            /* satisfy R3 */
+            Name (_DDN, "VM_Gen_Counter")
+
+            /* satisfy R6 */
+            Name (_HID, "QEMU0002")
+
+            /* the device owns this IO port range */
+            Name (_CRS, ResourceTemplate () {
+                IO (Decode16, 0x512, 0x512, 1, 9)
+            })
+
+            /* Device status: present, enabled & decoding resources, should be
+             * shown in the UI, functioning properly.
+             */
+            Name (_STA, 0xF)
+
+            /* Satisfy R4.
+             *
+             * This method is serialized because it creates named objects.
+             */
+            Method (ADDR, 0, Serialized) {
+                /* The 8-byte integer field defined as ADBP below is the
+                 * "ADDR base pointer" field in the UEFI ACPI Table.
+                 *
+                 * The DataTableRegion() operator locates that ACPI table by
+                 * scanning the RSDT/XSDT using the (SignatureString,
+                 * OemIDString, OemTableIDString) triplet as key.
+                 *
+                 * Windows XP would normally crash on the DataTableRegion()
+                 * operator, but it never calls the ADDR method, hence it never
+                 * reaches or evaluates DataTableRegion().
+                 */
+                DataTableRegion (TBLR, "UEFI", "BOCHS", "QEMUPARM")
+                Field (TBLR, AnyAcc, NoLock, Preserve) {
+                    Offset (54),
+                    ADBP, 64
+                }
+
+                /* This is the IO port range exposed in the _CRS above.
+                 *
+                 * The first two 4-byte ports are used to communicate the
+                 * 64-bit guest-physical address of the actual (relocated)
+                 * 128-bit generation ID field to QEMU, in little endian
+                 * encoding, so that QEMU can rewrite that field in guest RAM.
+                 *
+                 * A write to the last 1-byte port signals that the address has
+                 * been written fully, and QEMU is free to dereference it.
+                 */
+                OperationRegion (VMGR, SystemIO, 0x512, 9)
+                Field (VMGR, DWordAcc, NoLock, Preserve) {
+                    PTLO, 32,
+                    PTHI, 32,
+                    AccessAs (ByteAcc),
+                    DONE, 8
+                }
+
+                /* The ADBP field points to the "OVMF SDT Header probe
+                 * suppressor" area in the blob, at offset 62. In order to
+                 * arrive at the generation ID field at offset 104, we must add
+                 * 42 dynamically.
+                 *
+                 * The RESU buffer below will contain the result of the
+                 * addition. The ADFU field exposes it as an 8-byte integer
+                 * (for storing the sum), while the ADLO and ADHI fields enable
+                 * us to access the result in two separate 4-byte integers.
+                 * This exact integer width is especially important for
+                 * composing the package object that the ADDR method must
+                 * return.
+                 */
+                Name (RESU, Buffer (8) {})
+                CreateQWordField (RESU, 0, ADFU)
+                CreateDWordField (RESU, 0, ADLO)
+                CreateDWordField (RESU, 4, ADHI)
+
+                Add (ADBP, 42, ADFU)
+                Store (ADLO, PTLO)
+                Store (ADHI, PTHI)
+                Store (0, DONE)
+                Return (Package (2) { ADLO, ADHI })
+            }
+        }
+    }
+
+    /* satisfy R5 */
+    Scope (\_GPE) {
+        Method (_E04) {
+            Notify (\_SB.VMGI, 0x80)
+        }
+    }
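
The following is not part of the patch. It is a minimal Python sketch of the
blob layout, the two fix-ups applied by the linker/loader (ADD_POINTER on the
"ADDR base pointer" field, then ADD_CHECKSUM), and the arithmetic of the ADDR
method, as described in the document above. The helper names (build_blob,
relocate_and_checksum, addr_method) and the example guest-physical address are
made up for illustration. Storing the Identifier GUID in the mixed-endian
EFI_GUID byte order is an assumption too. The sketch models neither fw_cfg nor
the real firmware clients.

```python
import struct
import uuid
from typing import Tuple

BLOB_SIZE = 4096
TABLE_LEN = 62        # the "UEFI" ACPI Table covers offsets 0..61
ADDR_BASE_OFF = 54    # "ADDR base pointer" field
SUPPRESSOR_OFF = 62   # "OVMF SDT Header probe suppressor"
VMGENID_OFF = 104     # 128-bit generation ID
IDENTIFIER = uuid.UUID("417a5dff-bf4b-4abc-a839-6593bb41f452")


def build_blob(gen_id: bytes) -> bytearray:
    """Lay out the 4096-byte blob following the offset table in the document."""
    assert len(gen_id) == 16
    # 36-byte System Description Table Header, little endian, no padding.
    header = struct.pack("<4sIBB6s8sI4sI",
                         b"UEFI", TABLE_LEN, 1, 0, b"BOCHS ", b"QEMUPARM",
                         1, b"BXPC", 1)
    # 18-byte UEFI Table Sub-Header: Identifier GUID plus DataOffset (54).
    # Assumption: the GUID uses the EFI_GUID (mixed-endian) byte order.
    sub_header = IDENTIFIER.bytes_le + struct.pack("<H", ADDR_BASE_OFF)
    # Blob-relative value (62) until the ADD_POINTER command relocates it.
    addr_base = struct.pack("<Q", SUPPRESSOR_OFF)
    blob = bytearray(BLOB_SIZE)
    blob[0:TABLE_LEN] = header + sub_header + addr_base
    blob[VMGENID_OFF:VMGENID_OFF + 16] = gen_id
    return blob


def relocate_and_checksum(blob: bytearray, blob_gpa: int) -> None:
    """Mimic ADD_POINTER on "ADDR base pointer", then ADD_CHECKSUM."""
    (relative,) = struct.unpack_from("<Q", blob, ADDR_BASE_OFF)
    struct.pack_into("<Q", blob, ADDR_BASE_OFF, blob_gpa + relative)
    blob[9] = 0
    # ACPI checksum: all bytes of the table must sum to zero modulo 256.
    blob[9] = (-sum(blob[:TABLE_LEN])) & 0xFF


def addr_method(blob: bytearray) -> Tuple[int, int]:
    """Mimic the ADDR method: ADFU = ADBP + 42, returned as (ADLO, ADHI)."""
    (adbp,) = struct.unpack_from("<Q", blob, ADDR_BASE_OFF)
    adfu = (adbp + (VMGENID_OFF - SUPPRESSOR_OFF)) & 0xFFFFFFFFFFFFFFFF
    return adfu & 0xFFFFFFFF, adfu >> 32


# Hypothetical 4KB-aligned guest-physical address above 4GB, so that both
# halves of the returned package are non-zero.
blob = build_blob(uuid.uuid4().bytes)
relocate_and_checksum(blob, 0x17FFFF000)
lo, hi = addr_method(blob)
print(hex(lo), hex(hi))
```

With the blob placed at 0x17FFFF000, the two 4-byte values correspond to what
the ADDR method would store to PTLO and PTHI, and they recombine to the blob
base plus 104, which is divisible by 8 as R1a requires.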