Message ID | 20240426135126.12802-1-Jonathan.Cameron@huawei.com |
---|---|
Series | ACPI/arm64: add support for virtual cpu hotplug |
On Fri, 26 Apr 2024 17:21:41 +0000 Miguel Luis <miguel.luis@oracle.com> wrote: > Hi Jonathan, > > > On 26 Apr 2024, at 16:05, Miguel Luis <miguel.luis@oracle.com> wrote: > > > > > > > >> On 26 Apr 2024, at 13:51, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > >> > >> Separate code paths, combined with a flag set in acpi_processor.c to > >> indicate a struct acpi_processor was for a hotplugged CPU ensured that > >> per CPU data was only set up the first time that a CPU was initialized. > >> This appears to be unnecessary as the paths can be combined by letting > >> the online logic also handle any CPUs online at the time of driver load. > >> > >> Motivation for this change, beyond simplification, is that ARM64 > >> virtual CPU HP uses the same code paths for hotplug and cold path in > >> acpi_processor.c so had no easy way to set the flag for hotplug only. > >> Removing this necessity will enable ARM64 vCPU HP to reuse the existing > >> code paths. > >> > >> Leave noisy pr_info() in place but update it to not state the CPU > >> was hotplugged. > > On a second thought, do we want to keep it? Can't we just assume that no > news is good news while keeping the warn right after __acpi_processor_start ? Good question - my inclination was to keep this in place for now as removing it would remove a source of information people may expect on x86 hotplug. Then maybe propose dropping it as overly noisy kernel as a follow up patch after this series is merged. Felt like a potential rat hole I didn't want to go down if I could avoid it. If any x86 experts want to shout that no one cares then I'll happily drop the print. We've carefully made it so that on arm64 we have no way to tell if this is hotplug or normal cpu bring up so we can't just print it on hotplug. Jonathan > > Miguel > > >> > >> Acked-by: Rafael J. 
Wysocki <rafael.j.wysocki@intel.com> > >> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> > >> Tested-by: Miguel Luis <miguel.luis@oracle.com> > >> Reviewed-by: Gavin Shan <gshan@redhat.com> > >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > >> > >> --- > >> v8: No change > >> --- > >> drivers/acpi/acpi_processor.c | 1 - > >> drivers/acpi/processor_driver.c | 44 ++++++++++----------------------- > >> include/acpi/processor.h | 2 +- > >> 3 files changed, 14 insertions(+), 33 deletions(-) > >> > >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > >> index 7a0dd35d62c9..7fc924aeeed0 100644 > >> --- a/drivers/acpi/acpi_processor.c > >> +++ b/drivers/acpi/acpi_processor.c > >> @@ -216,7 +216,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) > >> * gets online for the first time. > >> */ > >> pr_info("CPU%d has been hot-added\n", pr->id); > >> - pr->flags.need_hotplug_init = 1; > >> > >> out: > >> cpus_write_unlock(); > >> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c > >> index 67db60eda370..55782eac3ff1 100644 > >> --- a/drivers/acpi/processor_driver.c > >> +++ b/drivers/acpi/processor_driver.c > >> @@ -33,7 +33,6 @@ MODULE_AUTHOR("Paul Diefenbaugh"); > >> MODULE_DESCRIPTION("ACPI Processor Driver"); > >> MODULE_LICENSE("GPL"); > >> > >> -static int acpi_processor_start(struct device *dev); > >> static int acpi_processor_stop(struct device *dev); > >> > >> static const struct acpi_device_id processor_device_ids[] = { > >> @@ -47,7 +46,6 @@ static struct device_driver acpi_processor_driver = { > >> .name = "processor", > >> .bus = &cpu_subsys, > >> .acpi_match_table = processor_device_ids, > >> - .probe = acpi_processor_start, > >> .remove = acpi_processor_stop, > >> }; > >> > >> @@ -115,12 +113,10 @@ static int acpi_soft_cpu_online(unsigned int cpu) > >> * CPU got physically hotplugged and onlined for the first time: > >> * Initialize missing things. 
> >> */ > >> - if (pr->flags.need_hotplug_init) { > >> + if (!pr->flags.previously_online) { > >> int ret; > >> > >> - pr_info("Will online and init hotplugged CPU: %d\n", > >> - pr->id); > >> - pr->flags.need_hotplug_init = 0; > >> + pr_info("Will online and init CPU: %d\n", pr->id); > >> ret = __acpi_processor_start(device); > >> WARN(ret, "Failed to start CPU: %d\n", pr->id); > >> } else { > >> @@ -167,9 +163,6 @@ static int __acpi_processor_start(struct acpi_device *device) > >> if (!pr) > >> return -ENODEV; > >> > >> - if (pr->flags.need_hotplug_init) > >> - return 0; > >> - > >> result = acpi_cppc_processor_probe(pr); > >> if (result && !IS_ENABLED(CONFIG_ACPI_CPU_FREQ_PSS)) > >> dev_dbg(&device->dev, "CPPC data invalid or not present\n"); > >> @@ -185,32 +178,21 @@ static int __acpi_processor_start(struct acpi_device *device) > >> > >> status = acpi_install_notify_handler(device->handle, ACPI_DEVICE_NOTIFY, > >> acpi_processor_notify, device); > >> - if (ACPI_SUCCESS(status)) > >> - return 0; > >> + if (!ACPI_SUCCESS(status)) { > >> + result = -ENODEV; > >> + goto err_thermal_exit; > >> + } > >> + pr->flags.previously_online = 1; > >> > >> - result = -ENODEV; > >> - acpi_processor_thermal_exit(pr, device); > >> + return 0; > >> > >> +err_thermal_exit: > >> + acpi_processor_thermal_exit(pr, device); > >> err_power_exit: > >> acpi_processor_power_exit(pr); > >> return result; > >> } > >> > >> -static int acpi_processor_start(struct device *dev) > >> -{ > >> - struct acpi_device *device = ACPI_COMPANION(dev); > >> - int ret; > >> - > >> - if (!device) > >> - return -ENODEV; > >> - > >> - /* Protect against concurrent CPU hotplug operations */ > >> - cpu_hotplug_disable(); > >> - ret = __acpi_processor_start(device); > >> - cpu_hotplug_enable(); > >> - return ret; > >> -} > >> - > >> static int acpi_processor_stop(struct device *dev) > >> { > >> struct acpi_device *device = ACPI_COMPANION(dev); > >> @@ -279,9 +261,9 @@ static int __init 
acpi_processor_driver_init(void) > >> if (result < 0) > >> return result; > >> > >> - result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, > >> - "acpi/cpu-drv:online", > >> - acpi_soft_cpu_online, NULL); > >> + result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > >> + "acpi/cpu-drv:online", > >> + acpi_soft_cpu_online, NULL); > >> if (result < 0) > >> goto err; > >> hp_online = result; > >> diff --git a/include/acpi/processor.h b/include/acpi/processor.h > >> index 3f34ebb27525..e6f6074eadbf 100644 > >> --- a/include/acpi/processor.h > >> +++ b/include/acpi/processor.h > >> @@ -217,7 +217,7 @@ struct acpi_processor_flags { > >> u8 has_lpi:1; > >> u8 power_setup_done:1; > >> u8 bm_rld_set:1; > >> - u8 need_hotplug_init:1; > >> + u8 previously_online:1; > > > > Reviewed-by: Miguel Luis <miguel.luis@oracle.com> > > > > Miguel > > > >> }; > >> > >> struct acpi_processor { > >> -- > >> 2.39.2 > >> > > >
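The combined code path described in the patch above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: `model_processor`, `model_processor_start()`, `model_cpu_online()` and `model_setup_state()` are hypothetical stand-ins for `struct acpi_processor`, `__acpi_processor_start()`, `acpi_soft_cpu_online()` and the `cpuhp_setup_state()` registration, and the `inject_failure` parameter exists purely so the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for struct acpi_processor; not the kernel type. */
struct model_processor {
	bool previously_online;	/* set once full initialisation succeeded */
	int init_count;		/* times the full start path ran */
	int reeval_count;	/* times the lightweight online path ran */
};

static struct model_processor model_cpus[MODEL_NR_CPUS];

/* Models __acpi_processor_start(): only a successful run sets the flag. */
static int model_processor_start(struct model_processor *pr, bool inject_failure)
{
	if (inject_failure)
		return -1;

	pr->init_count++;
	pr->previously_online = true;
	return 0;
}

/* Models the online callback: full init on first online, cheap path after. */
static int model_cpu_online(unsigned int cpu, bool inject_failure)
{
	struct model_processor *pr;

	assert(cpu < MODEL_NR_CPUS);
	pr = &model_cpus[cpu];

	if (!pr->previously_online)
		return model_processor_start(pr, inject_failure);

	pr->reeval_count++;
	return 0;
}

/*
 * Models registering the callback the way cpuhp_setup_state() does: the
 * callback is also invoked for every CPU that is already online, which is
 * what lets the separate probe path go away.
 */
static void model_setup_state(const bool online[MODEL_NR_CPUS])
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (online[cpu])
			model_cpu_online(cpu, false);
}
```

In this model a CPU that was online at registration time and a CPU that comes online later both go through the same first-online branch, and a failed start leaves the flag clear so the next online attempt retries the full initialisation.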
On Fri, 26 Apr 2024 18:49:49 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote: > On Fri, 26 Apr 2024 17:21:41 +0000 > Miguel Luis <miguel.luis@oracle.com> wrote: > > > Hi Jonathan, > > > > > On 26 Apr 2024, at 16:05, Miguel Luis <miguel.luis@oracle.com> wrote: > > > > > > > > > > > >> On 26 Apr 2024, at 13:51, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > >> > > >> Separate code paths, combined with a flag set in acpi_processor.c to > > >> indicate a struct acpi_processor was for a hotplugged CPU ensured that > > >> per CPU data was only set up the first time that a CPU was initialized. > > >> This appears to be unnecessary as the paths can be combined by letting > > >> the online logic also handle any CPUs online at the time of driver load. > > >> > > >> Motivation for this change, beyond simplification, is that ARM64 > > >> virtual CPU HP uses the same code paths for hotplug and cold path in > > >> acpi_processor.c so had no easy way to set the flag for hotplug only. > > >> Removing this necessity will enable ARM64 vCPU HP to reuse the existing > > >> code paths. > > >> > > >> Leave noisy pr_info() in place but update it to not state the CPU > > >> was hotplugged. > > > > On a second thought, do we want to keep it? Can't we just assume that no > > news is good news while keeping the warn right after __acpi_processor_start ? > > Good question - my inclination was to keep this in place for now as removing > it would remove a source of information people may expect on x86 hotplug. > > Then maybe propose dropping it as overly noisy kernel as a follow up > patch after this series is merged. Felt like a potential rat hole I didn't > want to go down if I could avoid it. > > If any x86 experts want to shout that no one cares then I'll happily drop > the print. We've carefully made it so that on arm64 we have no way to tell > if this is hotplug or normal cpu bring up so we can't just print it on > hotplug. I'm being silly. 
This is just one of the messages shouting out hotplug happened and for that matter only occurs at online anyway which is trivially detected. There is a much more informative ACPI: CPU3: has been hot-added message for example on the actual hotplug event. Let's drop it for v9. There is also a stale comment about a flag being set that is no longer the case that I'll drop. > > Jonathan > > > > > > Miguel > > > > >> > > >> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > >> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> > > >> Tested-by: Miguel Luis <miguel.luis@oracle.com> > > >> Reviewed-by: Gavin Shan <gshan@redhat.com> > > >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > >> > > >> --- > > >> v8: No change > > >> --- > > >> drivers/acpi/acpi_processor.c | 1 - > > >> drivers/acpi/processor_driver.c | 44 ++++++++++----------------------- > > >> include/acpi/processor.h | 2 +- > > >> 3 files changed, 14 insertions(+), 33 deletions(-) > > >> > > >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > >> index 7a0dd35d62c9..7fc924aeeed0 100644 > > >> --- a/drivers/acpi/acpi_processor.c > > >> +++ b/drivers/acpi/acpi_processor.c > > >> @@ -216,7 +216,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) > > >> * gets online for the first time. 
> > >> */ > > >> pr_info("CPU%d has been hot-added\n", pr->id); > > >> - pr->flags.need_hotplug_init = 1; > > >> > > >> out: > > >> cpus_write_unlock(); > > >> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c > > >> index 67db60eda370..55782eac3ff1 100644 > > >> --- a/drivers/acpi/processor_driver.c > > >> +++ b/drivers/acpi/processor_driver.c > > >> @@ -33,7 +33,6 @@ MODULE_AUTHOR("Paul Diefenbaugh"); > > >> MODULE_DESCRIPTION("ACPI Processor Driver"); > > >> MODULE_LICENSE("GPL"); > > >> > > >> -static int acpi_processor_start(struct device *dev); > > >> static int acpi_processor_stop(struct device *dev); > > >> > > >> static const struct acpi_device_id processor_device_ids[] = { > > >> @@ -47,7 +46,6 @@ static struct device_driver acpi_processor_driver = { > > >> .name = "processor", > > >> .bus = &cpu_subsys, > > >> .acpi_match_table = processor_device_ids, > > >> - .probe = acpi_processor_start, > > >> .remove = acpi_processor_stop, > > >> }; > > >> > > >> @@ -115,12 +113,10 @@ static int acpi_soft_cpu_online(unsigned int cpu) > > >> * CPU got physically hotplugged and onlined for the first time: > > >> * Initialize missing things. 
> > >> */ > > >> - if (pr->flags.need_hotplug_init) { > > >> + if (!pr->flags.previously_online) { > > >> int ret; > > >> > > >> - pr_info("Will online and init hotplugged CPU: %d\n", > > >> - pr->id); > > >> - pr->flags.need_hotplug_init = 0; > > >> + pr_info("Will online and init CPU: %d\n", pr->id); > > >> ret = __acpi_processor_start(device); > > >> WARN(ret, "Failed to start CPU: %d\n", pr->id); > > >> } else { > > >> @@ -167,9 +163,6 @@ static int __acpi_processor_start(struct acpi_device *device) > > >> if (!pr) > > >> return -ENODEV; > > >> > > >> - if (pr->flags.need_hotplug_init) > > >> - return 0; > > >> - > > >> result = acpi_cppc_processor_probe(pr); > > >> if (result && !IS_ENABLED(CONFIG_ACPI_CPU_FREQ_PSS)) > > >> dev_dbg(&device->dev, "CPPC data invalid or not present\n"); > > >> @@ -185,32 +178,21 @@ static int __acpi_processor_start(struct acpi_device *device) > > >> > > >> status = acpi_install_notify_handler(device->handle, ACPI_DEVICE_NOTIFY, > > >> acpi_processor_notify, device); > > >> - if (ACPI_SUCCESS(status)) > > >> - return 0; > > >> + if (!ACPI_SUCCESS(status)) { > > >> + result = -ENODEV; > > >> + goto err_thermal_exit; > > >> + } > > >> + pr->flags.previously_online = 1; > > >> > > >> - result = -ENODEV; > > >> - acpi_processor_thermal_exit(pr, device); > > >> + return 0; > > >> > > >> +err_thermal_exit: > > >> + acpi_processor_thermal_exit(pr, device); > > >> err_power_exit: > > >> acpi_processor_power_exit(pr); > > >> return result; > > >> } > > >> > > >> -static int acpi_processor_start(struct device *dev) > > >> -{ > > >> - struct acpi_device *device = ACPI_COMPANION(dev); > > >> - int ret; > > >> - > > >> - if (!device) > > >> - return -ENODEV; > > >> - > > >> - /* Protect against concurrent CPU hotplug operations */ > > >> - cpu_hotplug_disable(); > > >> - ret = __acpi_processor_start(device); > > >> - cpu_hotplug_enable(); > > >> - return ret; > > >> -} > > >> - > > >> static int acpi_processor_stop(struct device *dev) > > >> { 
> > >> struct acpi_device *device = ACPI_COMPANION(dev); > > >> @@ -279,9 +261,9 @@ static int __init acpi_processor_driver_init(void) > > >> if (result < 0) > > >> return result; > > >> > > >> - result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, > > >> - "acpi/cpu-drv:online", > > >> - acpi_soft_cpu_online, NULL); > > >> + result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > > >> + "acpi/cpu-drv:online", > > >> + acpi_soft_cpu_online, NULL); > > >> if (result < 0) > > >> goto err; > > >> hp_online = result; > > >> diff --git a/include/acpi/processor.h b/include/acpi/processor.h > > >> index 3f34ebb27525..e6f6074eadbf 100644 > > >> --- a/include/acpi/processor.h > > >> +++ b/include/acpi/processor.h > > >> @@ -217,7 +217,7 @@ struct acpi_processor_flags { > > >> u8 has_lpi:1; > > >> u8 power_setup_done:1; > > >> u8 bm_rld_set:1; > > >> - u8 need_hotplug_init:1; > > >> + u8 previously_online:1; > > > > > > Reviewed-by: Miguel Luis <miguel.luis@oracle.com> > > > > > > Miguel > > > > > >> }; > > >> > > >> struct acpi_processor { > > >> -- > > >> 2.39.2 > > >> > > > > > >
> > @@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, > > (struct acpi_madt_generic_interrupt *)header; > > u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK; > > u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2; > > + int cpu = get_cpu_for_acpi_id(gicc->uid); > > void __iomem *redist_base; > > > > if (!acpi_gicc_is_usable(gicc)) > > return 0; > > > > + /* > > + * Capable but disabled CPUs can be brought online later. What about > > + * the redistributor? ACPI doesn't want to say! > > + * Virtual hotplug systems can use the MADT's "always-on" GICR entries. > > + * Otherwise, prevent such CPUs from being brought online. > > + */ > > + if (!(gicc->flags & ACPI_MADT_ENABLED)) { > > Now this makes the above acpi_gicc_is_usable() very odd. It checks for > MADT_ENABLED *or* GICC_ONLINE_CAPABLE. But we definitely don't want to > deal with the lack of MADT_ENABLED. > > So why don't we explicitly check for individual flags and get rid of > acpi_gicc_is_usable(), as its new definition doesn't tell you anything > useful? That does seem to have evolved to something rather odd. I messed around with various reorganizations of the boolean logic and ended up with same 2 conditions as here as otherwise the indent gets deep and the code becomes fiddlier to reason about (see below for result) > > > + return 0; > > + } > > + > > redist_base = ioremap(gicc->gicr_base_address, size); > > if (!redist_base) > > return -ENOMEM; > > @@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header, > > > > /* > > * If GICC is enabled and has valid gicr base address, then it means > > - * GICR base is presented via GICC > > + * GICR base is presented via GICC. The redistributor is only known to > > + * be accessible if the GICC is marked as enabled. If this bit is not > > + * set, we'd need to add the redistributor at runtime, which isn't > > + * supported. 
> > */ > > - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) > > + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address) > > acpi_data.enabled_rdists++; > > > > return 0; > > diff --git a/include/linux/acpi.h b/include/linux/acpi.h > > index 9844a3f9c4e5..fcfb7bb6789e 100644 > > --- a/include/linux/acpi.h > > +++ b/include/linux/acpi.h > > @@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt); > > > > static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) > > { > > - return gicc->flags & ACPI_MADT_ENABLED; > > + return gicc->flags & (ACPI_MADT_ENABLED | > > + ACPI_MADT_GICC_ONLINE_CAPABLE); > > } > > > > /* the following numa functions are architecture-dependent */ > > Thanks, I'll not send a formal v9 until early next week, so here is the current state if you have time to take another look before then. From a8a54cfbadccf1782b7cc04b93eb875dedbee7a9 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse@arm.com> Date: Thu, 18 Apr 2024 14:54:07 +0100 Subject: [PATCH] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA. This means that a "usable" GIC redistributor is one that is marked as either enabled, or online capable. The meaning of the acpi_gicc_is_usable() would become less clear than just checking the pair of flags at call sites. As such, drop that helper function. The test in gic_acpi_match_gicc() remains as testing just the enabled bit so the count of enabled distributors is correct. What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible. 
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Rather than modifying cpu masks that may already have been used, register a new cpuhp callback to fail this case. This must run earlier than the main gic_starting_cpu() so that this case can be rejected before the section of cpuhp that runs on the CPU that is coming up as that is not allowed to fail. This solution keeps the handling of this broken firmware corner case local to the GIC driver. As precise ordering of this callback doesn't need to be controlled as long as it is in that initial prepare phase, use CPUHP_BP_PREPARE_DYN. Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Miguel Luis <miguel.luis@oracle.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> --- v9: Thanks to Marc for quick follow up. Fix up description and drop the acpi_gicc_is_usable() check given that now doesn't actually mean they are usable. Thanks to Marc for review and suggestions! v8: Change the handling of broken rdists to fail cpuhp rather than modifying the cpu_present and cpu_possible masks. Updated commit text to reflect that. Added a sb tag for Marc given this is more or less what he put in his review comment. 
--- arch/arm64/kernel/smp.c | 3 ++- drivers/acpi/processor_core.c | 3 ++- drivers/irqchip/irq-gic-v3.c | 44 +++++++++++++++++++++++++++++++---- include/linux/acpi.h | 5 ---- 4 files changed, 44 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 4ced34f62dab..afe835c1cbe2 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -523,7 +523,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor) { u64 hwid = processor->arm_mpidr; - if (!acpi_gicc_is_usable(processor)) { + if (!(processor->flags & + (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) { pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid); return; } diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c index b203cfe28550..b04b684f3190 100644 --- a/drivers/acpi/processor_core.c +++ b/drivers/acpi/processor_core.c @@ -90,7 +90,8 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry, struct acpi_madt_generic_interrupt *gicc = container_of(entry, struct acpi_madt_generic_interrupt, header); - if (!acpi_gicc_is_usable(gicc)) + if (!(gicc->flags & + (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) return -ENODEV; /* device_declaration means Device object in DSDT, in the diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 10af15f93d4d..45272316d155 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -44,6 +44,8 @@ #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1) +static struct cpumask broken_rdists __read_mostly; + struct redist_region { void __iomem *redist_base; phys_addr_t phys_base; @@ -1293,6 +1295,18 @@ static void gic_cpu_init(void) #define MPIDR_TO_SGI_RS(mpidr) (MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT) #define MPIDR_TO_SGI_CLUSTER_ID(mpidr) ((mpidr) & ~0xFUL) +/* + * gic_starting_cpu() is called after the last point where cpuhp is allowed + * to fail. So pre check for problems earlier. 
+ */ +static int gic_check_rdist(unsigned int cpu) +{ + if (cpumask_test_cpu(cpu, &broken_rdists)) + return -EINVAL; + + return 0; +} + static int gic_starting_cpu(unsigned int cpu) { gic_cpu_init(); @@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void) }; int base_sgi; + cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN, + "irqchip/arm/gicv3:checkrdist", + gic_check_rdist, NULL); + cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING, "irqchip/arm/gicv3:starting", gic_starting_cpu, NULL); @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, (struct acpi_madt_generic_interrupt *)header; u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK; u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2; + int cpu = get_cpu_for_acpi_id(gicc->uid); void __iomem *redist_base; - if (!acpi_gicc_is_usable(gicc)) + /* Neither enabled or online capable means it doesn't exist, skip it */ + if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) return 0; + /* + * Capable but disabled CPUs can be brought online later. What about + * the redistributor? ACPI doesn't want to say! + * Virtual hotplug systems can use the MADT's "always-on" GICR entries. + * Otherwise, prevent such CPUs from being brought online. + */ + if (!(gicc->flags & ACPI_MADT_ENABLED)) { + pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu); + cpumask_set_cpu(cpu, &broken_rdists); + return 0; + } + redist_base = ioremap(gicc->gicr_base_address, size); if (!redist_base) return -ENOMEM; @@ -2413,9 +2445,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header, /* * If GICC is enabled and has valid gicr base address, then it means - * GICR base is presented via GICC + * GICR base is presented via GICC. The redistributor is only known to + * be accessible if the GICC is marked as enabled. 
If this bit is not + * set, we'd need to add the redistributor at runtime, which isn't + * supported. */ - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address) acpi_data.enabled_rdists++; return 0; @@ -2474,7 +2509,8 @@ static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *hea int maint_irq_mode; static int first_madt = true; - if (!acpi_gicc_is_usable(gicc)) + if (!(gicc->flags & + (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) return 0; maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ? diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 9844a3f9c4e5..cf5d2a6950ec 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -237,11 +237,6 @@ acpi_table_parse_cedt(enum acpi_cedt_type id, int acpi_parse_mcfg (struct acpi_table_header *header); void acpi_table_print_madt_entry (struct acpi_subtable_header *madt); -static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) -{ - return gicc->flags & ACPI_MADT_ENABLED; -} - /* the following numa functions are architecture-dependent */ void acpi_numa_slit_init (struct acpi_table_slit *slit);
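The approach in the commit message above (record CPUs whose redistributor is inaccessible at parse time, then refuse them in an early hotplug step that is still allowed to fail) can be sketched as a small userspace model. All names here are hypothetical: `model_parse_gicc()`, `model_check_rdist()` and `model_cpu_up()` only mimic the roles of `gic_acpi_parse_madt_gicc()`, `gic_check_rdist()` and the hotplug core, a plain `bool` array stands in for the `broken_rdists` cpumask, and the flag bit values are illustrative rather than the real MADT encodings.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Illustrative flag bits; the real MADT GICC flag values are not assumed. */
#define MODEL_ENABLED		(1u << 0)
#define MODEL_ONLINE_CAPABLE	(1u << 1)

static bool model_broken_rdists[MODEL_NR_CPUS];
static bool model_cpu_started[MODEL_NR_CPUS];

/* Models the boot-time GICC parse; returns true if the redistributor is mapped. */
static bool model_parse_gicc(unsigned int cpu, unsigned int flags)
{
	assert(cpu < MODEL_NR_CPUS);

	/* Neither enabled nor online capable: the entry is skipped entirely. */
	if (!(flags & (MODEL_ENABLED | MODEL_ONLINE_CAPABLE)))
		return false;

	/* Online capable but disabled: assume the redistributor is unreachable. */
	if (!(flags & MODEL_ENABLED)) {
		model_broken_rdists[cpu] = true;
		return false;
	}

	return true;
}

/* Models the prepare-stage check, which runs while failure is still allowed. */
static int model_check_rdist(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_broken_rdists[cpu] ? -EINVAL : 0;
}

/* Models bring-up: the prepare step may fail, the starting step may not. */
static int model_cpu_up(unsigned int cpu)
{
	int ret = model_check_rdist(cpu);

	if (ret)
		return ret;

	model_cpu_started[cpu] = true;
	return 0;
}
```

The point of the ordering is visible in `model_cpu_up()`: the rejection happens before anything that corresponds to the starting stage, so a CPU with a broken redistributor never reaches the step that cannot fail.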
On Fri, 26 Apr 2024 19:28:58 +0100,
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
>
> I'll not send a formal v9 until early next week, so here is the current state
> if you have time to take another look before then.

Don't bother resending this on my account -- you only sent it on
Friday and there hasn't been much response to it yet. There is still a
problem (see below), but looks otherwise OK.

[...]

> @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
>  		(struct acpi_madt_generic_interrupt *)header;
>  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
>  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> +	int cpu = get_cpu_for_acpi_id(gicc->uid);

I already commented that get_cpu_for_acpi_id() can...

>  	void __iomem *redist_base;
>
> -	if (!acpi_gicc_is_usable(gicc))
> +	/* Neither enabled or online capable means it doesn't exist, skip it */
> +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
>  		return 0;
>
> +	/*
> +	 * Capable but disabled CPUs can be brought online later. What about
> +	 * the redistributor? ACPI doesn't want to say!
> +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> +	 * Otherwise, prevent such CPUs from being brought online.
> +	 */
> +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> +		cpumask_set_cpu(cpu, &broken_rdists);

... return -EINVAL, and then be passed to cpumask_set_cpu(), with
interesting effects. It shouldn't happen, but I trust anything that
comes from firmware tables as much as I trust a campaigning
politician's promises. This should really result in the RD being
considered unusable, but without affecting any CPU (there is no valid
CPU in the first place).

Another question is what get_cpu_for_acpi_id() returns for a disabled
CPU. A valid CPU number? Or -EINVAL?

Thanks,

	M.
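The concern raised above is that a negative error value from the lookup could be used as a bit index. One way to guard the call site, shown here only as a userspace sketch with hypothetical names (`model_mark_broken_rdist()`, `model_is_broken()`, `model_broken_count()`) and a `bool` array in place of the kernel cpumask, is to validate the CPU number before touching the mask and to report the failure without affecting any CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Stand-in for the broken_rdists cpumask; not the kernel type. */
static bool model_broken_rdists[MODEL_NR_CPUS];

/*
 * Record a broken redistributor only when the lookup produced a usable CPU
 * number. A negative value (such as -EINVAL from a failed lookup) or an
 * out-of-range index is reported back and touches no mask entry.
 */
static int model_mark_broken_rdist(int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return -EINVAL;

	model_broken_rdists[cpu] = true;
	return 0;
}

/* Query helper; callers must pass a valid CPU number. */
static bool model_is_broken(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_broken_rdists[cpu];
}

/* Counts how many CPUs are currently marked as having a broken redistributor. */
static unsigned int model_broken_count(void)
{
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_broken_rdists[cpu])
			count++;

	return count;
}
```

Whether the real call site should warn, skip silently, or treat the entry as unusable in some other way is a decision for the patch author; the sketch only shows that the invalid index never reaches the mask.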
On Sun, 28 Apr 2024 12:28:03 +0100
Marc Zyngier <maz@kernel.org> wrote:

> On Fri, 26 Apr 2024 19:28:58 +0100,
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >
> > I'll not send a formal v9 until early next week, so here is the current state
> > if you have time to take another look before then.
>
> Don't bother resending this on my account -- you only sent it on
> Friday and there hasn't been much response to it yet. There is still a
> problem (see below), but looks otherwise OK.
>
> [...]
>
> > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> >  		(struct acpi_madt_generic_interrupt *)header;
> >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
>
> I already commented that get_cpu_for_acpi_id() can...

Indeed sorry - I blame Friday syndrome for me failing to address that.

> >  	void __iomem *redist_base;
> >
> > -	if (!acpi_gicc_is_usable(gicc))
> > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> >  		return 0;
> >
> > +	/*
> > +	 * Capable but disabled CPUs can be brought online later. What about
> > +	 * the redistributor? ACPI doesn't want to say!
> > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > +	 * Otherwise, prevent such CPUs from being brought online.
> > +	 */
> > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > +		cpumask_set_cpu(cpu, &broken_rdists);
>
> ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> interesting effects. It shouldn't happen, but I trust anything that
> comes from firmware tables as much as I trust a campaigning
> politician's promises. This should really result in the RD being
> considered unusable, but without affecting any CPU (there is no valid
> CPU in the first place).
>
> Another question is what get_cpu_for_acpi_id() returns for a disabled
> CPU. A valid CPU number? Or -EINVAL?

It's a match function that works by iterating over 0 to nr_cpu_ids and

	if (uid == get_acpi_id_for_cpu(cpu))

So the question becomes does get_acpi_id_for_cpu() return a valid CPU
number for a disabled CPU.

That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
That looks it up via cpu_madt_gicc[cpu] which after the proposed updated
patch is set if enabled or online capable. There are however a few other
error checks in acpi_map_gic_cpu_interface() that could lead to it
not being set (MPIDR validity checks). I suspect all of these end up being
fatal elsewhere which is why this hasn't blown up before.

If any of those cases are possible we could get a null pointer
dereference.

Easy to harden this case via the following (which will leave us with
-EINVAL). There are other call sites that might trip over this.
I'm inclined to harden them as a separate issue though so as not
to get in the way of this patch set.

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index bc9a6656fc0c..a407f9cd549e 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
 	int cpu;

 	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		if (uid == get_acpi_id_for_cpu(cpu))
+		if (acpi_cpu_get_madt_gicc(cpu) &&
+		    uid == get_acpi_id_for_cpu(cpu))
 			return cpu;

 	return -EINVAL;

I'll spin an additional patch to make that change after testing I haven't
messed it up.

At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
than just skipping setting broken_rdists. I'll also pull the declaration of
that cpu variable down into this condition so it's more obvious we only
care about it in this error path.

Jonathan

>
> Thanks,
>
> 	M.
>
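The hardening proposed in the diff above amounts to checking the per-CPU table entry for NULL before reading its UID. A self-contained userspace sketch of that lookup follows. It is an illustration under assumptions, not the arm64 implementation: `model_gicc`, `model_cpu_madt_gicc[]`, `model_get_madt_gicc()` and `model_get_cpu_for_acpi_id()` are hypothetical stand-ins for the MADT GICC structure, `cpu_madt_gicc[]`, `acpi_cpu_get_madt_gicc()` and `get_cpu_for_acpi_id()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the MADT GICC entry; only the UID is modelled. */
struct model_gicc {
	uint32_t uid;
};

/* Per-CPU table; an entry stays NULL if the CPU was never mapped. */
static struct model_gicc *model_cpu_madt_gicc[MODEL_NR_CPUS];

/* Returns the table entry for a CPU, which may be NULL. */
static struct model_gicc *model_get_madt_gicc(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_cpu_madt_gicc[cpu];
}

/*
 * Reverse lookup from ACPI UID to logical CPU number. The NULL check keeps
 * unmapped CPUs from being dereferenced; a UID with no match gives -EINVAL.
 */
static int model_get_cpu_for_acpi_id(uint32_t uid)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_gicc *gicc = model_get_madt_gicc(cpu);

		if (gicc && gicc->uid == uid)
			return cpu;
	}

	return -EINVAL;
}
```

With entries present for only some CPUs, the lookup skips the holes instead of dereferencing them, and callers still have to handle the -EINVAL result themselves.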
On Mon, 29 Apr 2024 10:21:31 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Sun, 28 Apr 2024 12:28:03 +0100
> Marc Zyngier <maz@kernel.org> wrote:
>
> > On Fri, 26 Apr 2024 19:28:58 +0100,
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > >
> > > I'll not send a formal v9 until early next week, so here is the current state
> > > if you have time to take another look before then.
> >
> > Don't bother resending this on my account -- you only sent it on
> > Friday and there hasn't been much response to it yet. There is still a
> > problem (see below), but looks otherwise OK.
> >
> > [...]
> >
> > > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> > >  		(struct acpi_madt_generic_interrupt *)header;
> > >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> > >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
> >
> > I already commented that get_cpu_for_acpi_id() can...
>
> Indeed sorry - I blame Friday syndrome for me failing to address that.
>
> > >  	void __iomem *redist_base;
> > >
> > > -	if (!acpi_gicc_is_usable(gicc))
> > > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> > >  		return 0;
> > >
> > > +	/*
> > > +	 * Capable but disabled CPUs can be brought online later. What about
> > > +	 * the redistributor? ACPI doesn't want to say!
> > > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > > +	 * Otherwise, prevent such CPUs from being brought online.
> > > +	 */
> > > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > > +		cpumask_set_cpu(cpu, &broken_rdists);
> >
> > ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> > interesting effects. It shouldn't happen, but I trust anything that
> > comes from firmware tables as much as I trust a campaigning
> > politician's promises. This should really result in the RD being
> > considered unusable, but without affecting any CPU (there is no valid
> > CPU in the first place).
> >
> > Another question is what get_cpu_for_acpi_id() returns for a disabled
> > CPU. A valid CPU number? Or -EINVAL?
>
> It's a match function that works by iterating over 0 to nr_cpu_ids and
>
> 	if (uid == get_acpi_id_for_cpu(cpu))
>
> So the question becomes does get_acpi_id_for_cpu() return a valid CPU
> number for a disabled CPU.
>
> That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
> That looks it up via cpu_madt_gicc[cpu] which after the proposed updated
> patch is set if enabled or online capable. There are however a few other
> error checks in acpi_map_gic_cpu_interface() that could lead to it
> not being set (MPIDR validity checks). I suspect all of these end up being
> fatal elsewhere which is why this hasn't blown up before.
>
> If any of those cases are possible we could get a null pointer
> dereference.
>
> Easy to harden this case via the following (which will leave us with
> -EINVAL). There are other call sites that might trip over this.
> I'm inclined to harden them as a separate issue though so as not
> to get in the way of this patch set.
>
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index bc9a6656fc0c..a407f9cd549e 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
>  	int cpu;
>
>  	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> -		if (uid == get_acpi_id_for_cpu(cpu))
> +		if (acpi_cpu_get_madt_gicc(cpu) &&
> +		    uid == get_acpi_id_for_cpu(cpu))
>  			return cpu;
>
>  	return -EINVAL;
>
> I'll spin an additional patch to make that change after testing I haven't
> messed it up.
>
> At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
> than just skipping setting broken_rdists. I'll also pull the declaration of
> that cpu variable down into this condition so it's more obvious we only
> care about it in this error path.

Just for the record, for my deliberately broken test case it seems that it
returns a valid CPU ID anyway. That's what I'd expect given
acpi_parse_and_init_cpus() doesn't check if the gicc entries are enabled
or not.

Jonathan

>
> Jonathan
>
> >
> > Thanks,
> >
> > 	M.
> >