mbox series

[0/1] x86/acpi: acpi_is_processor_usable() dropping possible cpus

Message ID 20230327191026.3454-1-eric.devolder@oracle.com
Headers show
Series x86/acpi: acpi_is_processor_usable() dropping possible cpus | expand

Message

Eric DeVolder March 27, 2023, 7:10 p.m. UTC
I've been working on a patch series for cpu hotplug and kdump and
noticed recently that utilizing a QEMU guest to test cpu hotplug was
causing the following error messages:

    APIC: NR_CPUS/possible_cpus limit of 30 reached. Processor 30/0x.
    ACPI: Unable to map lapic to logical cpu number
    acpi LNXCPU:1e: Enumeration failure

where prior cpu hotplug would typically result in messages similar to:

    CPU30 has been hot-added
    smpboot: Booting Node 0 Processor 30 APIC 0x1e
    Will online and init hotplugged CPU: 30

The QEMU guest incantation was:

    qemu-system-x86_64 -smp 30,maxcpus=32 ...

A simple guest with 30 cpus and possiblly up to 32, meaning two hot
pluggable.

An investigation lead to finding the following relevant changes:

   commit aa06e20f1be6 ("x86/acpi: Don't add CPUs that are not online capable")
   commit e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")

The first patch codes the fact that ACPI 6.3 now features an Online
Capable bit in MADT.revision >= 5. This Online Capable bit explicitly
indicates that the cpu can be hotplugged.

The second patch refactors the first a bit in order to apply the same
logic for both lapic and x2apic. However, there is a subtle change in
the logic to this patch that breaks for MADT.revision prior to 5.

As it turns out, QEMU reports revision 1 in the MADT table for x86.
(Technically QEMU is at revision 4, and I'll be addressing that with
the QEMU folks soon.)

In looking at these patches and understanding the logic, I can sum it
up by stating that prior to MADT revision 5, the presence of a
lapic/x2apic structure _implicitly_ was used to determine that a cpu
might be hotpluggable, and it would count towards possible cpus. With
MADT.revision >= 5, that implicit assumption is now replaced with an
explicit indication.

The manner in which the latest patch is coded, it essentially requires
the Online Capable bit introduced at MADT.revision >= 5 be set for
present-but-offline cpus, else it excludes the cpu from the possible
cpus. Thus for MADT.revision < 5, possible cpus are being dropped.

This patch aims to restore the behavior for MADT.revision prior to 5
which facilitates again cpu hotplug for QEMU guests.

Regards,
eric
---


Eric DeVolder (1):
  x86/acpi: acpi_is_processor_usable() dropping possible cpus

 arch/x86/kernel/acpi/boot.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)