Message ID | 1484328738-21149-1-git-send-email-ard.biesheuvel@linaro.org |
---|---|
State | New |
On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Linux for arm64 v4.10 and later will complain if the ECAM config space is
> not reserved in the ACPI namespace:
>
> acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace
>
> The rationale is that OSes that don't consume the MCFG table should still
> be able to infer that the PCI config space MMIO region is occupied.
>
> So update the ACPI table generation routine to add this reservation.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> hw/arm/virt-acpi-build.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 085a61117378..50d52f685f68 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>      Aml *dev_rp0 = aml_device("%s", "RP0");
>      aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
>      aml_append(dev, dev_rp0);
> +
> +    Aml *dev_res0 = aml_device("%s", "RES0");
> +    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
> +    crs = aml_resource_template();
> +    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
> +    aml_append(dev_res0, aml_name_decl("_CRS", crs));
> +    aml_append(dev, dev_res0);
>      aml_append(scope, dev);
>  }

This needs to be controlled via the machine class back-compat
machinery in hw/arm/virt.c so that it only happens for virt-2.9
and later.

thanks
-- PMM
On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> [...]
>
> This needs to be controlled via the machine class back-compat
> machinery in hw/arm/virt.c so that it only happens for virt-2.9
> and later.
>

Why exactly?
On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote:
>> [...]
>> This needs to be controlled via the machine class back-compat
>> machinery in hw/arm/virt.c so that it only happens for virt-2.9
>> and later.
>>
>
> Why exactly?

Because the "virt-2.8" machine has to present to the guest
exactly what "virt" did as of the QEMU 2.8 release, including
any bugs or missing things we happened to have in our ACPI
tables. This allows cross-version compatibility (including
VM migration). Drew will have a more detailed explanation
if you need it.

thanks
-- PMM
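The back-compat gating described above can be sketched in isolation. This is only an illustration of the idea, not QEMU code: the `MachineClassSketch` type, the `no_ecam_reservation` flag and `should_reserve_ecam()` are hypothetical names invented here, and the real machinery in hw/arm/virt.c may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a versioned machine class (not a QEMU type). */
typedef struct {
    const char *name;
    /* Hypothetical compat flag: set for machine versions that must keep
     * presenting the old ACPI tables, i.e. without the PNP0C02 device. */
    bool no_ecam_reservation;
} MachineClassSketch;

/* Older versioned machine: guest-visible tables stay exactly as released. */
const MachineClassSketch virt_2_8 = { "virt-2.8", true };
/* Newer versioned machine: free to add the reservation. */
const MachineClassSketch virt_2_9 = { "virt-2.9", false };

/* Returns true when the DSDT builder should emit the RES0/PNP0C02 device. */
bool should_reserve_ecam(const MachineClassSketch *mc)
{
    assert(mc != NULL);
    return !mc->no_ecam_reservation;
}
```

With such a flag, the seven added lines of the patch would sit behind a check like `should_reserve_ecam(...)`, so a guest started as "virt-2.8" sees the same DSDT regardless of which QEMU release runs it.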
On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> [...]
>> Why exactly?
>
> Because the "virt-2.8" machine has to present to the guest
> exactly what "virt" did as of the QEMU 2.8 release, including
> any bugs or missing things we happened to have in our ACPI
> tables. This allows cross-version compatibility (including
> VM migration). Drew will have a more detailed explanation
> if you need it.
>

I suspected as much.

But in this case, I am not sure if it is worth the trouble: the
generated data is only consumed at boot time by the firmware, and I
suppose migration involves freezing a VM, including whatever resident
firmware image was used to boot the OS, and so this is unlikely to
affect migration.

But I will let Drew explain ...

Thanks,
Ard.
On 01/16/17 20:31, Ard Biesheuvel wrote:
> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote:
>> [...]
>> Because the "virt-2.8" machine has to present to the guest
>> exactly what "virt" did as of the QEMU 2.8 release, including
>> any bugs or missing things we happened to have in our ACPI
>> tables. This allows cross-version compatibility (including
>> VM migration). Drew will have a more detailed explanation
>> if you need it.
>>
>
> I suspected as much.
>
> But in this case, I am not sure if it is worth the trouble: the
> generated data is only consumed at boot time by the firmware, and I
> suppose migration involves freezing a VM, including whatever resident
> firmware image was used to boot the OS, and so this is unlikely to
> affect migration.
>
> But I will let Drew explain ...

The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
Description": "The resources can *optionally* be returned in [...]
EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
to *insist* on this kind of reservation however.

PNP0C02 is "General ID for reserving resources required by PnP
motherboard registers. (Not device specific.)", according to
<http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
So what this patch does is reserve a memory area through ACPI,
practically as an unspecified "platform resource".

There's an alternative that's contained entirely in the firmware. You
can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
memory map entry (by producing an appropriate memalloc HOB in PEI, or by
calling the appropriate gDS memory space map functions in DXE). OVMF
does the former (memalloc HOB).

In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
from QEMU's DTB. If you don't dislike the idea, we could cover the range
as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
instance already sets the base address PCD, and makes sure that the
relevant code is executed only once (in whatever driver module the
library instance was built into). You could call the gDS functions
mentioned above from that spot. (The library instance is already
restricted to DXE_DRIVER and UEFI_DRIVER modules.)

Thanks!
Laszlo
On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/16/17 20:31, Ard Biesheuvel wrote:
>> [...]
>> But I will let Drew explain ...
>
> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
> Description": "The resources can *optionally* be returned in [...]
> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
> to *insist* on this kind of reservation however.
>

No, not at the UEFI level but at the ACPI level. Reservations in the
UEFI memory map describe memory not MMIO space

> PNP0C02 is "General ID for reserving resources required by PnP
> motherboard registers. (Not device specific.)", according to
> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>.
> So what this patch does is reserve a memory area through ACPI,
> practically as an unspecified "platform resource".
>

This has been discussed at great length on the linux mailing lists

https://patchwork.kernel.org/patch/9453149/

> There's an alternative that's contained entirely in the firmware. You
> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType
> memory map entry (by producing an appropriate memalloc HOB in PEI, or by
> calling the appropriate gDS memory space map functions in DXE). OVMF
> does the former (memalloc HOB).
>
> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic",
> from QEMU's DTB. If you don't dislike the idea, we could cover the range
> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib
> instance already sets the base address PCD, and makes sure that the
> relevant code is executed only once (in whatever driver module the
> library instance was built into). You could call the gDS functions
> mentioned above from that spot. (The library instance is already
> restricted to DXE_DRIVER and UEFI_DRIVER modules.)
>

In general, I think describing MMIO in the UEFI memory map is not very
useful, and counter to the spec, which mentions that the memory map
describes memory ("however it is used"), not memory *space* (unless
UEFI itself needs to access it to implement runtime services)
On 01/16/17 22:23, Ard Biesheuvel wrote:
> On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
>> [...]
>> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table
>> Description": "The resources can *optionally* be returned in [...]
>> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems
>> to *insist* on this kind of reservation however.
>>
>
> No, not at the UEFI level but at the ACPI level. Reservations in the
> UEFI memory map describe memory not MMIO space
>
>> [...]
>
> This has been discussed at great length on the linux mailing lists
>
> https://patchwork.kernel.org/patch/9453149/
>
>> [...]
>
> In general, I think describing MMIO in the UEFI memory map is not very
> useful, and counter to the spec, which mentions that the memory map
> describes memory ("however it is used"), not memory *space* (unless
> UEFI itself needs to access it to implement runtime services)
>

The UEFI memory map will reflect allocations from the GCD memory space,
for the Reserved and MMIO types. See "Figure 2. GCD Memory State
Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.

See also "9.7.1 UEFI Boot Services Dependencies" in the same,

  9.7.1.8 GetMemoryMap()

  The GetMemoryMap() implementation must include into the UEFI memory
  map all GCD map entries of types EfiGcdMemoryTypeReserved and
  EfiPersistentMemory, and all GCD map entries of type
  EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
  set.

(Note that I wrote Reserved earlier, not MMIO.)

However, you are right that *just* the UEFI memmap entry is not
sufficient, according to the PCI firmware spec. (Regardless of the fact
that in practice, just the memmap entry does keep Linux happy. Or is it
about to change?)

Namely, looking again at the spot I quoted above (and it's also quoted
in the kernel docs patch that you linked above, under ref [6]), we find

  If the operating system does not natively comprehend reserving the
  MMCFG region, the MMCFG region must be reserved by firmware. The
  address range reported in the MCFG table or by _CBA method (see
  Section 4.1.3) must be reserved by declaring a motherboard resource.
  For most systems, the motherboard resource would appear at the root
  of the ACPI namespace (under \_SB) in a node with a _HID of EISAID
  (PNP0C02), and the resources in this case should not be claimed in
  the root PCI bus’s _CRS. The resources can optionally be returned in
  Int15 E820 or EFIGetMemoryMap as reserved memory but must always be
  reported through ACPI as a motherboard resource.

Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object
in QEMU's ACPI payload improves spec conformance.

(Actually, the argument can be made for x86/Q35 as well. Adding Marcel
and MST.)

... Beyond the machine-type dependency raised by Peter (which I gather
is still being discussed), I suggest that the commit message of this
patch quote the relevant passage from the PCI fw spec in full (see
above, or in the kernel docs patch).

Thanks!
Laszlo
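The GetMemoryMap() rule quoted from the PI spec in the message above can be restated as a small predicate. This is a sketch under stated assumptions: the enum and function names are local stand-ins invented for illustration (they are not edk2 definitions), and the function covers only the one sentence quoted in the thread, not everything a real memory map contains. The value used for the runtime attribute is bit 63, as EFI_MEMORY_RUNTIME is defined in the UEFI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the GCD memory types named in the quoted passage. */
typedef enum {
    GCD_SKETCH_NON_EXISTENT,
    GCD_SKETCH_RESERVED,
    GCD_SKETCH_SYSTEM_MEMORY,
    GCD_SKETCH_MMIO,
    GCD_SKETCH_PERSISTENT
} GcdTypeSketch;

/* Same value as EFI_MEMORY_RUNTIME in the UEFI specification (bit 63). */
#define SKETCH_EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Returns true when the quoted sentence, taken on its own, requires a GCD
 * entry of this type and attribute set to show up in the UEFI memory map.
 * Types the sentence does not mention simply yield false here. */
bool quoted_rule_requires_entry(GcdTypeSketch type, uint64_t attributes)
{
    switch (type) {
    case GCD_SKETCH_RESERVED:
    case GCD_SKETCH_PERSISTENT:
        return true;
    case GCD_SKETCH_MMIO:
        /* MMIO entries are listed only when marked for runtime use. */
        return (attributes & SKETCH_EFI_MEMORY_RUNTIME) != 0;
    default:
        return false;
    }
}
```

The distinction the thread turns on is visible in the two branches: a Reserved entry is always listed, while an MMIO entry is listed only when it carries the runtime attribute.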
On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/16/17 22:23, Ard Biesheuvel wrote:
>> [...]
>> In general, I think describing MMIO in the UEFI memory map is not very
>> useful, and counter to the spec, which mentions that the memory map
>> describes memory ("however it is used"), not memory *space* (unless
>> UEFI itself needs to access it to implement runtime services)
>>
>
> The UEFI memory map will reflect allocations from the GCD memory space,
> for the Reserved and MMIO types. See "Figure 2. GCD Memory State
> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec.
>
> See also "9.7.1 UEFI Boot Services Dependencies" in the same,
>
>   9.7.1.8 GetMemoryMap()
>
>   The GetMemoryMap() implementation must include into the UEFI memory
>   map all GCD map entries of types EfiGcdMemoryTypeReserved and
>   EfiPersistentMemory, and all GCD map entries of type
>   EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute
>   set.
>
> (Note that I wrote Reserved earlier, not MMIO.)
>

What the PI spec stipulates is irrelevant: the contract between the
firmware and the OS is in the UEFI and ACPI specifications, not in the
PI spec.

> However, you are right that *just* the UEFI memmap entry is not
> sufficient, according to the PCI firmware spec. (Regardless of the fact
> that in practice, just the memmap entry does keep Linux happy. Or is it
> about to change?)
>

The kernel uses the UEFI memory map for two purposes:
- finding out where memory is, and which parts are usable (i.e., non-reserved)
- setting up page tables to allow UEFI runtime services calls, which
  may include MMIO mappings

This means that MMIO regions in the UEFI memory map are *not*
considered reservations. This is in line with the wording of the UEFI
spec, which mentions that the memory map describes memory, not MMIO
(with the exception of MMIO peripherals that the firmware needs to
access to implement the runtime services)

> Namely, looking again at the spot I quoted above (and it's also quoted
> in the kernel docs patch that you linked above, under ref [6]), we find
>
>   If the operating system does not natively comprehend reserving the
>   MMCFG region, the MMCFG region must be reserved by firmware. The
>   address range reported in the MCFG table or by _CBA method (see
>   Section 4.1.3) must be reserved by declaring a motherboard resource.
>   For most systems, the motherboard resource would appear at the root
>   of the ACPI namespace (under \_SB) in a node with a _HID of EISAID
>   (PNP0C02), and the resources in this case should not be claimed in
>   the root PCI bus’s _CRS. The resources can optionally be returned in
>   Int15 E820 or EFIGetMemoryMap as reserved memory but must always be
>   reported through ACPI as a motherboard resource.
>
> Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object
> in QEMU's ACPI payload improves spec conformance.
>

Good.

> (Actually, the argument can be made for x86/Q35 as well. Adding Marcel
> and MST.)
>
> ... Beyond the machine-type dependency raised by Peter (which I gather
> is still being discussed), I suggest that the commit message of this
> patch quote the relevant passage from the PCI fw spec in full (see
> above, or in the kernel docs patch).
>

I will expand the commit message in the next respin

Thanks (and apologies for not cc'ing you in the first place),
Ard.
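The requirement agreed on in the message above (the ECAM window must be reported through ACPI as a motherboard resource) amounts to a range-coverage check. The sketch below is a simplified illustration of that idea and not the Linux implementation: `RangeSketch`, `range_contains()` and `ecam_is_reserved()` are hypothetical names, and the window constants are derived from the `[mem 0x3f000000-0x3fffffff]` range in the warning quoted at the top of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range: [base, base + size). */
typedef struct {
    uint64_t base;
    uint64_t size;
} RangeSketch;

/* ECAM window from the quoted warning: 0x3f000000-0x3fffffff, i.e. 16 MiB. */
const RangeSketch ecam_window = { 0x3f000000ULL, 0x01000000ULL };

/* True when inner lies entirely within outer.
 * Assumes base + size does not overflow for either range. */
bool range_contains(const RangeSketch *outer, const RangeSketch *inner)
{
    return inner->base >= outer->base &&
           inner->base + inner->size <= outer->base + outer->size;
}

/* True when some reserved range (for instance the _CRS of a PNP0C02
 * device) fully covers the ECAM window. */
bool ecam_is_reserved(const RangeSketch *reserved, size_t count,
                      const RangeSketch *ecam)
{
    assert(ecam != NULL);
    for (size_t i = 0; i < count; i++) {
        if (range_contains(&reserved[i], ecam)) {
            return true;
        }
    }
    return false;
}
```

In this model, the tables generated before the patch correspond to an empty `reserved` list, for which the check fails; the RES0 device added by the patch contributes one range equal to the ECAM window, for which it succeeds.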
(my reply is no longer related to the patch, so maybe I shouldn't send it... I can't resist, sorry :)) On 01/17/17 08:47, Ard Biesheuvel wrote: > On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote: >> The UEFI memory map will reflect allocations from the GCD memory space, >> for the Reserved and MMIO types. See "Figure 2. GCD Memory State >> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec. >> >> See also "9.7.1 UEFI Boot Services Dependencies" in the same, >> >> 9.7.1.8 GetMemoryMap() >> >> The GetMemoryMap() implementation must include into the UEFI memory >> map all GCD map entries of types EfiGcdMemoryTypeReserved and >> EfiPersistentMemory, and all GCD map entries of type >> EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute >> set. >> >> (Note that I wrote Reserved earlier, not MMIO.) >> > > What the PI spec stipulates is irrelevant: the contract between the > firmware and the OS is in the UEFI and ACPI specifications, not in the > PI spec. I disagree that what the PI spec stipulates is irrelevant. For platforms that implement both PI and UEFI, the PI spec expresses additional requirements for the UEFI implementation (in PI terminology). So what it says certainly matters for the ArmVirtQemu firmware specifically. End-to-end, if we want to achieve a particular result in a UEFI OS, we can certainly work towards that end in the PEI phase (or in the DXE phase, using the DXE services) in a specific firmware that aims to conform to both PI and UEFI. Because, the effects that those low-level operations will have on the UEFI level (and consequently, on the OS) are well defined in the PI spec. > >> However, you are right that *just* the UEFI memmap entry is not >> sufficient, according to the PCI firmware spec. (Regardless of the fact >> that in practice, just the memmap entry does keep Linux happy. Or is it >> about to change?) 
>> > > The kernel uses the UEFI memory map for two purposes: > - finding out where memory is, and which parts are usable (i.e., non-reserved) > - setting up page tables to allow UEFI runtime services calls, which > may include MMIO mappings > > This means that MMIO regions in the UEFI memory map are *not* > considered reservations. [...] Yes, I understand that. Now please understand that my suggestion was never to cover the MMCONFIG area with MMIO type memory; all along I've been saying "reserved memory". (Again, this is now independent of the patch.) Thanks, Laszlo
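The rule quoted from PI 9.7.1.8 can be read as a simple filter over GCD map entries. The following is a rough sketch of that reading only, not EDK2 code: the enum and the function name `gcd_entry_in_uefi_memmap()` are hypothetical stand-ins, and only `EFI_MEMORY_RUNTIME` carries the value defined by the UEFI spec. Ordinary system memory is outside the quoted sentence and is deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bit as defined by the UEFI spec. */
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Simplified, hypothetical stand-ins for the PI GCD memory types. */
enum gcd_memory_type {
    GCD_NON_EXISTENT,
    GCD_RESERVED,
    GCD_SYSTEM_MEMORY,
    GCD_MMIO,
    GCD_PERSISTENT,
};

/*
 * Models only the quoted sentence: Reserved and Persistent entries are
 * always listed, MMIO entries only when marked EFI_MEMORY_RUNTIME.
 */
bool gcd_entry_in_uefi_memmap(enum gcd_memory_type type, uint64_t attributes)
{
    switch (type) {
    case GCD_RESERVED:
    case GCD_PERSISTENT:
        return true;
    case GCD_MMIO:
        return (attributes & EFI_MEMORY_RUNTIME) != 0;
    default:
        /* Not covered by the quoted sentence; not modeled here. */
        return false;
    }
}
```

Under this reading, a GCD Reserved entry covering the MMCONFIG area would show up in the UEFI memory map, while a plain MMIO entry without the runtime attribute would not.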
On 17 January 2017 at 08:50, Laszlo Ersek <lersek@redhat.com> wrote: > (my reply is no longer related to the patch, so maybe I shouldn't send > it... I can't resist, sorry :)) > > On 01/17/17 08:47, Ard Biesheuvel wrote: >> On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote: > >>> The UEFI memory map will reflect allocations from the GCD memory space, >>> for the Reserved and MMIO types. See "Figure 2. GCD Memory State >>> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec. >>> >>> See also "9.7.1 UEFI Boot Services Dependencies" in the same, >>> >>> 9.7.1.8 GetMemoryMap() >>> >>> The GetMemoryMap() implementation must include into the UEFI memory >>> map all GCD map entries of types EfiGcdMemoryTypeReserved and >>> EfiPersistentMemory, and all GCD map entries of type >>> EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute >>> set. >>> >>> (Note that I wrote Reserved earlier, not MMIO.) >>> >> >> What the PI spec stipulates is irrelevant: the contract between the >> firmware and the OS is in the UEFI and ACPI specifications, not in the >> PI spec. > > I disagree that what the PI spec stipulates is irrelevant. For platforms > that implement both PI and UEFI, the PI spec expresses additional > requirements for the UEFI implementation (in PI terminology). So what it > says certainly matters for the ArmVirtQemu firmware specifically. > > End-to-end, if we want to achieve a particular result in a UEFI OS, we > can certainly work towards that end in the PEI phase (or in the DXE > phase, using the DXE services) in a specific firmware that aims to > conform to both PI and UEFI. Because, the effects that those low-level > operations will have on the UEFI level (and consequently, on the OS) are > well defined in the PI spec. > PI spec should drive the implementation choices we make at the ArmVirtQemu end, and the ACPI generation is tightly coupled with that, so in that sense, I agree that the PI spec *is* relevant. 
However, the purpose of the patch (which we are no longer discussing :-)) is to ensure that QEMU + ArmVirtQemu adheres to the pertinent contracts with the OS, and PI is not one of them. >> >>> However, you are right that *just* the UEFI memmap entry is not >>> sufficient, according to the PCI firmware spec. (Regardless of the fact >>> that in practice, just the memmap entry does keep Linux happy. Or is it >>> about to change?) >>> >> >> The kernel uses the UEFI memory map for two purposes: >> - finding out where memory is, and which parts are usable (i.e., non-reserved) >> - setting up page tables to allow UEFI runtime services calls, which >> may include MMIO mappings >> >> This means that MMIO regions in the UEFI memory map are *not* >> considered reservations. [...] > > Yes, I understand that. Now please understand that my suggestion was > never to cover the MMCONFIG area with MMIO type memory; all along I've > been saying "reserved memory". > > (Again, this is now independent of the patch.) > I know the various specs are vague and slightly contradictory, but I would be opposed to using EfiReservedMemoryType to describe an MMIO region, given that the wording of the UEFI spec (which is authoritative imo) suggests that the memory map should only describe memory (unless we are dealing with MMIO regions that require a runtime mapping so that the firmware can use the device while running under the OS).
On 01/17/17 10:06, Ard Biesheuvel wrote: > On 17 January 2017 at 08:50, Laszlo Ersek <lersek@redhat.com> wrote: >> (my reply is no longer related to the patch, so maybe I shouldn't send >> it... I can't resist, sorry :)) >> >> On 01/17/17 08:47, Ard Biesheuvel wrote: >>> On 16 January 2017 at 22:35, Laszlo Ersek <lersek@redhat.com> wrote: >> >>>> The UEFI memory map will reflect allocations from the GCD memory space, >>>> for the Reserved and MMIO types. See "Figure 2. GCD Memory State >>>> Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec. >>>> >>>> See also "9.7.1 UEFI Boot Services Dependencies" in the same, >>>> >>>> 9.7.1.8 GetMemoryMap() >>>> >>>> The GetMemoryMap() implementation must include into the UEFI memory >>>> map all GCD map entries of types EfiGcdMemoryTypeReserved and >>>> EfiPersistentMemory, and all GCD map entries of type >>>> EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute >>>> set. >>>> >>>> (Note that I wrote Reserved earlier, not MMIO.) >>>> >>> >>> What the PI spec stipulates is irrelevant: the contract between the >>> firmware and the OS is in the UEFI and ACPI specifications, not in the >>> PI spec. >> >> I disagree that what the PI spec stipulates is irrelevant. For platforms >> that implement both PI and UEFI, the PI spec expresses additional >> requirements for the UEFI implementation (in PI terminology). So what it >> says certainly matters for the ArmVirtQemu firmware specifically. >> >> End-to-end, if we want to achieve a particular result in a UEFI OS, we >> can certainly work towards that end in the PEI phase (or in the DXE >> phase, using the DXE services) in a specific firmware that aims to >> conform to both PI and UEFI. Because, the effects that those low-level >> operations will have on the UEFI level (and consequently, on the OS) are >> well defined in the PI spec. 
>> > > PI spec should drive the implementation choices we make at the > ArmVirtQemu end, and the ACPI generation is tightly coupled with that, > so in that sense, I agree that the PI spec *is* relevant. > > However, the purpose of the patch (which we are no longer discussing > :-)), is to ensure that QEMU + ArmVirtQemu adheres to the pertinent > contracts with the OS, and PI is not one of them. > >>> >>>> However, you are right that *just* the UEFI memmap entry is not >>>> sufficient, according to the PCI firmware spec. (Regardless of the fact >>>> that in practice, just the memmap entry does keep Linux happy. Or is it >>>> about to change?) >>>> >>> >>> The kernel uses the UEFI memory map for two purposes: >>> - finding out where memory is, and which parts are usable (i.e., non-reserved) >>> - setting up page tables to allow UEFI runtime services calls, which >>> may include MMIO mappings >>> >>> This means that MMIO regions in the UEFI memory map are *not* >>> considered reservations. [...] >> >> Yes, I understand that. Now please understand that my suggestion was >> never to cover the MMCONFIG area with MMIO type memory; all along I've >> been saying "reserved memory". >> >> (Again, this is now independent of the patch.) >> > > I know the various specs are vague and slightly contradictory, but I > would oppose to using EfiReservedMemory to describe an MMIO region, > given that the wording of the UEFI spec (which is authoritative imo) > suggests that the memory map should only describe memory (unless we > are dealing with MMIO regions that require a runtime mapping so that > the firmware can use the device while running under the OS) > Fair enough, on both counts :)
On Mon, Jan 16, 2017 at 07:31:33PM +0000, Ard Biesheuvel wrote: > On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote: > > On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote: > >>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is > >>>> not reserved in the ACPI namespace: > >>>> > >>>> acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace > >>>> > >>>> The rationale is that OSes that don't consume the MCFG table should still > >>>> be able to infer that the PCI config space MMIO region is occupied. > >>>> > >>>> So update the ACPI table generation routine to add this reservation. > >>>> > >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > >>>> --- > >>>> hw/arm/virt-acpi-build.c | 7 +++++++ > >>>> 1 file changed, 7 insertions(+) > >>>> > >>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c > >>>> index 085a61117378..50d52f685f68 100644 > >>>> --- a/hw/arm/virt-acpi-build.c > >>>> +++ b/hw/arm/virt-acpi-build.c > >>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, > >>>> Aml *dev_rp0 = aml_device("%s", "RP0"); > >>>> aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0))); > >>>> aml_append(dev, dev_rp0); > >>>> + > >>>> + Aml *dev_res0 = aml_device("%s", "RES0"); > >>>> + aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02"))); > >>>> + crs = aml_resource_template(); > >>>> + aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE)); > >>>> + aml_append(dev_res0, aml_name_decl("_CRS", crs)); > >>>> + aml_append(dev, dev_res0); > >>>> aml_append(scope, dev); > >>>> } > >>> > >>> This needs to be controlled via the machine class back-compat > >>> machinery in hw/arm/virt.c so that 
it only happens for virt-2.9 > >>> and later. > >>> > >> > >> Why exactly? > > > > Because the "virt-2.8" machine has to present to the guest > > exactly what "virt" did as of the QEMU 2.8 release, including > > any bugs or missing things we happened to have in our ACPI > > tables. This allows cross-version compatibility (including > > VM migration). Drew will have a more detailed explanation > > if you need it. > > > > I suspected as much. > > But in this case, I am not sure if it is worth the trouble: the > generated data is only consumed at boot time by the firmware, and I > suppose migration involves freezing a VM, including whatever resident > firmware image was used to boot the OS, and so this is unlikely to > affect migration. > > But I will let Drew explain ... > In some cases the problem we're solving with the compat guards is a bit hypothetical, but, IMHO, nonetheless a good practice. While we may be sure that AAVMF and Linux will be fine with this table changing under their feet, we can't be sure there aren't other mach-virt users that have more sensitive firmwares/OSes. An ACPI- sensitive OS may notice the change on its next reboot after a migration, and then simply refuse to continue. Now, that said, I just spoke with Igor in order to learn the x86 practice. He says that the policy has been more lax than what I suggest above. Hypothetical, low-risk issues are left unguarded, and only when a bug is found during testing is it then managed. The idea is to try and reduce the amount of compat variables and conditions needed in the ACPI generation code, but, of course, at some level of risk to users expecting their versioned machine type to always appear the same. So far we've been strict with mach-virt, guarding all hypothetical issues. Perhaps this patch is a good example to get a discussion started on whether or not we should be so strict though. Thanks, drew
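To make the "compat guard" idea concrete, here is a minimal, self-contained sketch of how a per-machine-version flag could gate the new RES0 device. It is an assumption-laden illustration, not QEMU code: `struct virt_machine_class`, the `no_ecam_reservation` field and `count_pci_child_devices()` are all hypothetical, and the real AML calls are replaced by a counter so the example compiles on its own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a versioned machine class. */
struct virt_machine_class {
    const char *name;
    /* Hypothetical compat flag: set for virt-2.8 and older. */
    bool no_ecam_reservation;
};

/*
 * Counts the child devices that would be emitted under the PCI host
 * bridge node. RP0 is always present; RES0 (the PNP0C02 reservation)
 * is only added when the compat flag is clear.
 */
int count_pci_child_devices(const struct virt_machine_class *vmc)
{
    int children = 1;               /* RP0 */

    if (!vmc->no_ecam_reservation) {
        children++;                 /* RES0 */
    }
    return children;
}
```

With such a guard, a guest started as virt-2.8 keeps seeing the old namespace after a reboot on a newer QEMU, at the cost of one more flag and one more condition in the ACPI generation code.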
On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote: > In some cases the problem we're solving with the compat guards is > a bit hypothetical, but, IMHO, nonetheless a good practice. While > we may be sure that AAVMF and Linux will be fine with this table > changing under their feet, we can't be sure there aren't other > mach-virt users that have more sensitive firmwares/OSes. An ACPI- > sensitive OS may notice the change on its next reboot after a > migration, and then simply refuse to continue. There's also the case where you do a VM migration midway through UEFI booting up, I think, which might cause things to go wrong if you catch it just at the wrong moment. > Now, that said, I just spoke with Igor in order to learn the x86 > practice. He says that the policy has been more lax than what I > suggest above. Hypothetical, low-risk issues are left unguarded, > and only when a bug is found during testing is it then managed. > The idea is to try and reduce the amount of compat variables and > conditions needed in the ACPI generation code, but, of course, at > some level of risk to users expecting their versioned machine type > to always appear the same. > > So far we've been strict with mach-virt, guarding all hypothetical > issues. Perhaps this patch is a good example to get a discussion > started on whether or not we should be so strict though. That said, I don't have a very strong opinion here, beyond that we should be consistent at least with x86 practice. thanks -- PMM
On Mon, Jan 16, 2017 at 11:35:04PM +0100, Laszlo Ersek wrote: > On 01/16/17 22:23, Ard Biesheuvel wrote: > > On 16 January 2017 at 21:13, Laszlo Ersek <lersek@redhat.com> wrote: > >> On 01/16/17 20:31, Ard Biesheuvel wrote: > >>> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote: > >>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote: > >>>>>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >>>>>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is > >>>>>>> not reserved in the ACPI namespace: > >>>>>>> > >>>>>>> acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace > >>>>>>> > >>>>>>> The rationale is that OSes that don't consume the MCFG table should still > >>>>>>> be able to infer that the PCI config space MMIO region is occupied. > >>>>>>> > >>>>>>> So update the ACPI table generation routine to add this reservation. 
> >>>>>>> > >>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > >>>>>>> --- > >>>>>>> hw/arm/virt-acpi-build.c | 7 +++++++ > >>>>>>> 1 file changed, 7 insertions(+) > >>>>>>> > >>>>>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c > >>>>>>> index 085a61117378..50d52f685f68 100644 > >>>>>>> --- a/hw/arm/virt-acpi-build.c > >>>>>>> +++ b/hw/arm/virt-acpi-build.c > >>>>>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, > >>>>>>> Aml *dev_rp0 = aml_device("%s", "RP0"); > >>>>>>> aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0))); > >>>>>>> aml_append(dev, dev_rp0); > >>>>>>> + > >>>>>>> + Aml *dev_res0 = aml_device("%s", "RES0"); > >>>>>>> + aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02"))); > >>>>>>> + crs = aml_resource_template(); > >>>>>>> + aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE)); > >>>>>>> + aml_append(dev_res0, aml_name_decl("_CRS", crs)); > >>>>>>> + aml_append(dev, dev_res0); > >>>>>>> aml_append(scope, dev); > >>>>>>> } > >>>>>> > >>>>>> This needs to be controlled via the machine class back-compat > >>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9 > >>>>>> and later. > >>>>>> > >>>>> > >>>>> Why exactly? > >>>> > >>>> Because the "virt-2.8" machine has to present to the guest > >>>> exactly what "virt" did as of the QEMU 2.8 release, including > >>>> any bugs or missing things we happened to have in our ACPI > >>>> tables. This allows cross-version compatibility (including > >>>> VM migration). Drew will have a more detailed explanation > >>>> if you need it. > >>>> > >>> > >>> I suspected as much. 
> >>> > >>> But in this case, I am not sure if it is worth the trouble: the > >>> generated data is only consumed at boot time by the firmware, and I > >>> suppose migration involves freezing a VM, including whatever resident > >>> firmware image was used to boot the OS, and so this is unlikely to > >>> affect migration. > >>> > >>> But I will let Drew explain ... > >> > >> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table > >> Description": "The resources can *optionally* be returned in [...] > >> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems > >> to *insist* on this kind of reservation however. > >> > > > > No, not at the UEFI level but at the ACPI level. Reservations in the > > UEFI memory map describe memory not MMIO space > > > >> PNP0C02 is "General ID for reserving resources required by PnP > >> motherboard registers. (Not device specific.)", according to > >> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>. > >> So what this patch does is reserve a memory area through ACPI, > >> practically as an unspecified "platform resource". > >> > > > > This has been discussed at great length on the linux mailing lists > > > > https://patchwork.kernel.org/patch/9453149/ > > > >> There's an alternative that's contained entirely in the firmware. You > >> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType > >> memory map entry (by producing an appropriate memalloc HOB in PEI, or by > >> calling the appropriate gDS memory space map functions in DXE). OVMF > >> does the former (memalloc HOB). > >> > >> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic", > >> from QEMU's DTB. If you don't dislike the idea, we could cover the range > >> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". 
That lib > >> instance already sets the base address PCD, and makes sure that the > >> relevant code is executed only once (in whatever driver module the > >> library instance was built into). You could call the gDS functions > >> mentioned above from that spot. (The library instance is already > >> restricted to DXE_DRIVER and UEFI_DRIVER modules.) > >> > > > > In general, I think describing MMIO in the UEFI memory map is not very > > useful, and counter to the spec, which mentions that the memory map > > describes memory ("however it is used"), not memory *space* (unless > > UEFI itself needs to access it to implement runtime services) > > > > The UEFI memory map will reflect allocations from the GCD memory space, > for the Reserved and MMIO types. See "Figure 2. GCD Memory State > Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec. > > See also "9.7.1 UEFI Boot Services Dependencies" in the same, > > 9.7.1.8 GetMemoryMap() > > The GetMemoryMap() implementation must include into the UEFI memory > map all GCD map entries of types EfiGcdMemoryTypeReserved and > EfiPersistentMemory, and all GCD map entries of type > EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute > set. > > (Note that I wrote Reserved earlier, not MMIO.) > > However, you are right that *just* the UEFI memmap entry is not > sufficient, according to the PCI firmware spec. (Regardless of the fact > that in practice, just the memmap entry does keep Linux happy. Or is it > about to change?) > > Namely, looking again at the spot I quoted above (and it's also quoted > in the kernel docs patch that you linked above, under ref [6]), we find > > If the operating system does not natively comprehend reserving the > MMCFG region, the MMCFG region must be reserved by firmware. The > address range reported in the MCFG table or by _CBA method (see > Section 4.1.3) must be reserved by declaring a motherboard resource. 
> For most systems, the motherboard resource would appear at the root > of the ACPI namespace (under \_SB) in a node with a _HID of EISAID > (PNP0C02), and the resources in this case should not be claimed in > the root PCI bus’s _CRS. The resources can optionally be returned in > Int15 E820 or EFIGetMemoryMap as reserved memory but must always be > reported through ACPI as a motherboard resource. > > Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object > in QEMU's ACPI payload improves spec conformance. > > (Actually, the argument can be made for x86/Q35 as well. Adding Marcel > and MST.) I agree, thanks for pointing this out. Patch, anyone? > ... Beyond the machine-type dependency raised by Peter (which I gather > is still being discussed), I suggest that the commit message of this > patch quote the relevant passage from the PCI fw spec in full (see > above, or in the kernel docs patch). > > Thanks! > Laszlo
On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote: > On Mon, Jan 16, 2017 at 07:31:33PM +0000, Ard Biesheuvel wrote: >> On 16 January 2017 at 18:20, Peter Maydell <peter.maydell@linaro.org> wrote: >> > On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> >> On 16 January 2017 at 17:25, Peter Maydell <peter.maydell@linaro.org> wrote: >> >>> On 13 January 2017 at 17:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> >>>> Linux for arm64 v4.10 and later will complain if the ECAM config space is >> >>>> not reserved in the ACPI namespace: >> >>>> >> >>>> acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace >> >>>> >> >>>> The rationale is that OSes that don't consume the MCFG table should still >> >>>> be able to infer that the PCI config space MMIO region is occupied. >> >>>> >> >>>> So update the ACPI table generation routine to add this reservation. >> >>>> >> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >> >>>> --- >> >>>> hw/arm/virt-acpi-build.c | 7 +++++++ >> >>>> 1 file changed, 7 insertions(+) >> >>>> >> >>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c >> >>>> index 085a61117378..50d52f685f68 100644 >> >>>> --- a/hw/arm/virt-acpi-build.c >> >>>> +++ b/hw/arm/virt-acpi-build.c >> >>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, >> >>>> Aml *dev_rp0 = aml_device("%s", "RP0"); >> >>>> aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0))); >> >>>> aml_append(dev, dev_rp0); >> >>>> + >> >>>> + Aml *dev_res0 = aml_device("%s", "RES0"); >> >>>> + aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02"))); >> >>>> + crs = aml_resource_template(); >> >>>> + aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE)); >> >>>> + aml_append(dev_res0, aml_name_decl("_CRS", crs)); >> >>>> + aml_append(dev, dev_res0); >> >>>> aml_append(scope, dev); >> >>>> } >> >>> 
>> >>> This needs to be controlled via the machine class back-compat >> >>> machinery in hw/arm/virt.c so that it only happens for virt-2.9 >> >>> and later. >> >>> >> >> >> >> Why exactly? >> > >> > Because the "virt-2.8" machine has to present to the guest >> > exactly what "virt" did as of the QEMU 2.8 release, including >> > any bugs or missing things we happened to have in our ACPI >> > tables. This allows cross-version compatibility (including >> > VM migration). Drew will have a more detailed explanation >> > if you need it. >> > >> >> I suspected as much. >> >> But in this case, I am not sure if it is worth the trouble: the >> generated data is only consumed at boot time by the firmware, and I >> suppose migration involves freezing a VM, including whatever resident >> firmware image was used to boot the OS, and so this is unlikely to >> affect migration. >> >> But I will let Drew explain ... >> > > In some cases the problem we're solving with the compat guards is > a bit hypothetical, but, IMHO, nonetheless a good practice. While > we may be sure that AAVMF and Linux will be fine with this table > changing under their feet, we can't be sure there aren't other > mach-virt users that have more sensitive firmwares/OSes. An ACPI- > sensitive OS may notice the change on its next reboot after a > migration, and then simply refuse to continue. > > Now, that said, I just spoke with Igor in order to learn the x86 > practice. He says that the policy has been more lax than what I > suggest above. Hypothetical, low-risk issues are left unguarded, > and only when a bug is found during testing is it then managed. > The idea is to try and reduce the amount of compat variables and > conditions needed in the ACPI generation code, but, of course, at > some level of risk to users expecting their versioned machine type > to always appear the same. > > So far we've been strict with mach-virt, guarding all hypothetical > issues. 
Perhaps this patch is a good example to get a discussion > started on whether or not we should be so strict though. > Yes please. I don't mind respinning the patch, but I agree that it makes sense to consider whether minimal bug fixes like this one require this treatment in the first place.
On Tue, 17 Jan 2017 10:56:53 +0000 Peter Maydell <peter.maydell@linaro.org> wrote: > On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote: > > In some cases the problem we're solving with the compat guards is > > a bit hypothetical, but, IMHO, nonetheless a good practice. While > > we may be sure that AAVMF and Linux will be fine with this table > > changing under their feet, we can't be sure there aren't other > > mach-virt users that have more sensitive firmwares/OSes. An ACPI- > > sensitive OS may notice the change on its next reboot after a > > migration, and then simply refuse to continue. > > There's also the case where you do a VM migration midway through > UEFI booting up, I think, which might cause things to go wrong > if you catch it just at the wrong moment. ACPI blobs are migrated from the source, so the above won't happen. The only time the guest will see the new tables is on a fresh boot or a reboot. > > > Now, that said, I just spoke with Igor in order to learn the x86 > > practice. He says that the policy has been more lax than what I > > suggest above. Hypothetical, low-risk issues are left unguarded, > > and only when a bug is found during testing is it then managed. > > The idea is to try and reduce the amount of compat variables and > > conditions needed in the ACPI generation code, but, of course, at > > some level of risk to users expecting their versioned machine type > > to always appear the same. > > > > So far we've been strict with mach-virt, guarding all hypothetical > > issues. Perhaps this patch is a good example to get a discussion > > started on whether or not we should be so strict though. > > That said, I don't have a very strong opinion here, beyond that > we should be consistent at least with x86 practice. Another reason why we are trying not to use a strict approach with ACPI tables is that they are part of the firmware, and we haven't versioned firmware so far. (i.e. 
the destination host with a newer QEMU will typically have newer firmware, and a guest with an old machine type migrated to a host with a newer QEMU will run the new firmware on (re)boot) > > thanks > -- PMM
On 01/18/17 16:18, Igor Mammedov wrote: > On Tue, 17 Jan 2017 10:56:53 +0000 > Peter Maydell <peter.maydell@linaro.org> wrote: > >> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote: >>> In some cases the problem we're solving with the compat guards is >>> a bit hypothetical, but, IMHO, nonetheless a good practice. While >>> we may be sure that AAVMF and Linux will be fine with this table >>> changing under their feet, we can't be sure there aren't other >>> mach-virt users that have more sensitive firmwares/OSes. An ACPI- >>> sensitive OS may notice the change on its next reboot after a >>> migration, and then simply refuse to continue. >> >> There's also the case where you do a VM migration midway through >> UEFI booting up, I think, which might cause things to go wrong >> if you catch it just at the wrong moment. > acpi blobs are migrated from source so above won't happen. > The time guest will see new table is fresh boot or reboot. > >> >>> Now, that said, I just spoke with Igor in order to learn the x86 >>> practice. He says that the policy has been more lax than what I >>> suggest above. Hypothetical, low-risk issues are left unguarded, >>> and only when a bug is found during testing is it then managed. >>> The idea is to try and reduce the amount of compat variables and >>> conditions needed in the ACPI generation code, but, of course, at >>> some level of risk to users expecting their versioned machine type >>> to always appear the same. >>> >>> So far we've been strict with mach-virt, guarding all hypothetical >>> issues. Perhaps this patch is a good example to get a discussion >>> started on whether or not we should be so strict though. >> >> That said, I don't have a very strong opinion here, beyond that >> we should be consistent at least with x86 practice. > another reason why we are trying not to use strict approach with ACPI > tables is that it's part of firmware and we didn't version firmwares > so far. (i.e. 
dst host with newer QEMU will typically have newer > firmware and guest with old machine-type migrated to host with newer > QEMU will run new firmware on (re)boot) I haven't been aware of this argument, and I'm surprised by it, but I think it's valid. Regardless of our choice to ultimately compose the ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the firmware. So, different ACPI content after a migration + guest reboot on the target host is not much different from any other firmware-level changes encountered on the same target host, after reboot. Laszlo
On 18 January 2017 at 15:55, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/18/17 16:18, Igor Mammedov wrote:
>> On Tue, 17 Jan 2017 10:56:53 +0000
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>>> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
>>>> In some cases the problem we're solving with the compat guards is
>>>> a bit hypothetical, but, IMHO, nonetheless a good practice. While
>>>> we may be sure that AAVMF and Linux will be fine with this table
>>>> changing under their feet, we can't be sure there aren't other
>>>> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
>>>> sensitive OS may notice the change on its next reboot after a
>>>> migration, and then simply refuse to continue.
>>>
>>> There's also the case where you do a VM migration midway through
>>> UEFI booting up, I think, which might cause things to go wrong
>>> if you catch it just at the wrong moment.
>> acpi blobs are migrated from source so above won't happen.
>> The time guest will see new table is fresh boot or reboot.
>>
>>>
>>>> Now, that said, I just spoke with Igor in order to learn the x86
>>>> practice. He says that the policy has been more lax than what I
>>>> suggest above. Hypothetical, low-risk issues are left unguarded,
>>>> and only when a bug is found during testing is it then managed.
>>>> The idea is to try and reduce the amount of compat variables and
>>>> conditions needed in the ACPI generation code, but, of course, at
>>>> some level of risk to users expecting their versioned machine type
>>>> to always appear the same.
>>>>
>>>> So far we've been strict with mach-virt, guarding all hypothetical
>>>> issues. Perhaps this patch is a good example to get a discussion
>>>> started on whether or not we should be so strict though.
>>>
>>> That said, I don't have a very strong opinion here, beyond that
>>> we should be consistent at least with x86 practice.
>> another reason why we are trying not to use strict approach with ACPI
>> tables is that it's part of firmware and we didn't version firmwares
>> so far. (i.e. dst host with newer QEMU will typically have newer
>> firmware and guest with old machine-type migrated to host with newer
>> QEMU will run new firmware on (re)boot)
>
> I haven't been aware of this argument, and I'm surprised by it, but I
> think it's valid. Regardless of our choice to ultimately compose the
> ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the
> firmware. So, different ACPI content after a migration + guest reboot on
> the target host is not much different from any other firmware-level
> changes encountered on the same target host, after reboot.
>

I agree. But does that imply that this fix should be tightly coupled to
the mach-virt version, considering that the UEFI firmware you run
*inside* such a vm is not versioned either?
On 01/18/17 18:02, Ard Biesheuvel wrote:
> On 18 January 2017 at 15:55, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/18/17 16:18, Igor Mammedov wrote:
>>> On Tue, 17 Jan 2017 10:56:53 +0000
>>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>>
>>>> On 17 January 2017 at 09:49, Andrew Jones <drjones@redhat.com> wrote:
>>>>> In some cases the problem we're solving with the compat guards is
>>>>> a bit hypothetical, but, IMHO, nonetheless a good practice. While
>>>>> we may be sure that AAVMF and Linux will be fine with this table
>>>>> changing under their feet, we can't be sure there aren't other
>>>>> mach-virt users that have more sensitive firmwares/OSes. An ACPI-
>>>>> sensitive OS may notice the change on its next reboot after a
>>>>> migration, and then simply refuse to continue.
>>>>
>>>> There's also the case where you do a VM migration midway through
>>>> UEFI booting up, I think, which might cause things to go wrong
>>>> if you catch it just at the wrong moment.
>>> acpi blobs are migrated from source so above won't happen.
>>> The time guest will see new table is fresh boot or reboot.
>>>
>>>>
>>>>> Now, that said, I just spoke with Igor in order to learn the x86
>>>>> practice. He says that the policy has been more lax than what I
>>>>> suggest above. Hypothetical, low-risk issues are left unguarded,
>>>>> and only when a bug is found during testing is it then managed.
>>>>> The idea is to try and reduce the amount of compat variables and
>>>>> conditions needed in the ACPI generation code, but, of course, at
>>>>> some level of risk to users expecting their versioned machine type
>>>>> to always appear the same.
>>>>>
>>>>> So far we've been strict with mach-virt, guarding all hypothetical
>>>>> issues. Perhaps this patch is a good example to get a discussion
>>>>> started on whether or not we should be so strict though.
>>>>
>>>> That said, I don't have a very strong opinion here, beyond that
>>>> we should be consistent at least with x86 practice.
>>> another reason why we are trying not to use strict approach with ACPI
>>> tables is that it's part of firmware and we didn't version firmwares
>>> so far. (i.e. dst host with newer QEMU will typically have newer
>>> firmware and guest with old machine-type migrated to host with newer
>>> QEMU will run new firmware on (re)boot)
>>
>> I haven't been aware of this argument, and I'm surprised by it, but I
>> think it's valid. Regardless of our choice to ultimately compose the
>> ACPI tables in QEMU, guest OSes definitely consider ACPI as part of the
>> firmware. So, different ACPI content after a migration + guest reboot on
>> the target host is not much different from any other firmware-level
>> changes encountered on the same target host, after reboot.
>>
>
> I agree. But does that imply that this fix should be tightly coupled
> to the mach-virt version, considering that the UEFI firmware you run
> *inside* such a vm is not versioned either?

No, it implies the exact opposite: given that the UEFI firmware is not
versioned, and may very well differ on source host and target host, the
ACPI payload that QEMU generates (and that the guest OS considers part
of the firmware) should be permitted to differ between src and dst host
just the same.

In brief, for one data point, I'd be fine if we didn't tie this change
to machine types.

Thanks
Laszlo
On 18 January 2017 at 17:26, Laszlo Ersek <lersek@redhat.com> wrote:
> In brief, for one data point, I'd be fine if we didn't tie this change
> to machine types.

We seem to have arrived at a consensus that we don't need to
version-constrain this change, so I'm applying Ard's patch to
target-arm.next.

thanks
-- PMM
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 085a61117378..50d52f685f68 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
     Aml *dev_rp0 = aml_device("%s", "RP0");
     aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
     aml_append(dev, dev_rp0);
+
+    Aml *dev_res0 = aml_device("%s", "RES0");
+    aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
+    crs = aml_resource_template();
+    aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, AML_READ_WRITE));
+    aml_append(dev_res0, aml_name_decl("_CRS", crs));
+    aml_append(dev, dev_res0);
     aml_append(scope, dev);
 }
Linux for arm64 v4.10 and later will complain if the ECAM config space is
not reserved in the ACPI namespace:

  acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem 0x3f000000-0x3fffffff] not reserved in ACPI namespace

The rationale is that OSes that don't consume the MCFG table should still
be able to infer that the PCI config space MMIO region is occupied.

So update the ACPI table generation routine to add this reservation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 hw/arm/virt-acpi-build.c | 7 +++++++
 1 file changed, 7 insertions(+)

--
2.7.4
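
For readers checking the numbers in the warning above: the range
0x3f000000-0x3fffffff is 16 MiB, which is what a standard ECAM layout
(4 KiB per function, 8 functions per device, 32 devices per bus) needs
for 16 buses. The following is only a minimal sketch of that arithmetic
plus a simple containment test of the kind a "is the ECAM area reserved"
check needs. The helper names ecam_window_size() and reservation_covers()
are made up for illustration; they are not QEMU or Linux functions, and
the kernel's actual check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ECAM layout: each PCI function gets a 4 KiB config window. */
#define ECAM_BYTES_PER_FUNCTION   4096u
#define ECAM_FUNCTIONS_PER_DEVICE 8u
#define ECAM_DEVICES_PER_BUS      32u

/*
 * Hypothetical helper (not a QEMU/Linux API): size in bytes of an ECAM
 * window that covers nr_buses buses, i.e. 1 MiB per bus.
 */
static uint64_t ecam_window_size(uint32_t nr_buses)
{
    return (uint64_t)nr_buses * ECAM_DEVICES_PER_BUS *
           ECAM_FUNCTIONS_PER_DEVICE * ECAM_BYTES_PER_FUNCTION;
}

/*
 * Hypothetical helper (not a QEMU/Linux API): true if the reserved
 * region [res_base, res_base + res_size) fully contains the region
 * [base, base + size). Written with subtractions only, so the
 * comparison cannot overflow.
 */
static bool reservation_covers(uint64_t res_base, uint64_t res_size,
                               uint64_t base, uint64_t size)
{
    return size != 0 &&
           base >= res_base &&
           base - res_base <= res_size &&
           size <= res_size - (base - res_base);
}
```

With the values from the log line, ecam_window_size(16) gives 0x1000000,
and a reservation of that size at 0x3f000000 covers the whole ECAM area,
which is what the RES0 device's _CRS in the patch is meant to express.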