Message ID | 20240426135126.12802-5-Jonathan.Cameron@huawei.com |
---|---|
State | Superseded |
Series | ACPI/arm64: add support for virtual cpu hotplug |
On 4/26/24 23:51, Jonathan Cameron wrote:
> Make the per_cpu(processors, cpu) entries available earlier so that
> they are available in arch_register_cpu() as ARM64 will need access
> to the acpi_handle to distinguish between acpi_processor_add()
> and earlier registration attempts (which will fail as _STA cannot
> be checked).
>
> Reorder the remove flow to clear this per_cpu() after
> arch_unregister_cpu() has completed, allowing it to be used in
> there as well.
>
> Note that on x86 for the CPU hotplug case, the pr->id prior to
> acpi_map_cpu() may be invalid. Thus the per_cpu() structures
> must be initialized after that call or after checking the ID
> is valid (not hotplug path).
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> ---
> v8: On buggy bios detection when setting per_cpu structures
>     do not carry on.
>     Fix up the clearing of per cpu structures to remove unwanted
>     side effects and ensure an error code isn't use to reference them.
> ---
>  drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++--------------
>  1 file changed, 48 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index ba0a6f0ac841..3b180e21f325 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {}
>  #endif /* CONFIG_X86 */
>
>  /* Initialization */
> +static DEFINE_PER_CPU(void *, processor_device_array);
> +
> +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
> +				       struct acpi_device *device)
> +{
> +	BUG_ON(pr->id >= nr_cpu_ids);

A blank line is needed after BUG_ON() if we want to follow the original
implementation.

> +	/*
> +	 * Buggy BIOS check.
> +	 * ACPI id of processors can be reported wrongly by the BIOS.
> +	 * Don't trust it blindly
> +	 */
> +	if (per_cpu(processor_device_array, pr->id) != NULL &&
> +	    per_cpu(processor_device_array, pr->id) != device) {
> +		dev_warn(&device->dev,
> +			"BIOS reported wrong ACPI id %d for the processor\n",
> +			pr->id);
> +		/* Give up, but do not abort the namespace scan. */

Whether the namespace scan continues depends on how the caller handles the
return value. After this patch is applied, the caller can be
acpi_processor_hotadd_init() or acpi_processor_get_info(), so I think this
specific comment needs to be moved to the callers.

Besides, it seems acpi_processor_set_per_cpu() isn't called properly and
memory can be leaked. More details are given below.

> +		return false;
> +	}
> +	/*
> +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> +	 * checks.
> +	 */
> +	per_cpu(processor_device_array, pr->id) = device;
> +	per_cpu(processors, pr->id) = pr;
> +
> +	return true;
> +}
> +
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
> -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> +static int acpi_processor_hotadd_init(struct acpi_processor *pr,
> +				      struct acpi_device *device)
>  {
>  	int ret;
>
> @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>  	if (ret)
>  		goto out;
>
> +	if (!acpi_processor_set_per_cpu(pr, device)) {
> +		acpi_unmap_cpu(pr->id);
> +		goto out;
> +	}
> +

With the 'goto out', zero is returned from acpi_processor_hotadd_init() to
acpi_processor_get_info(). The zero return value is carried over from
acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct, we need to
return an errno from acpi_processor_get_info() to acpi_processor_add() so
that cleanup can be done. For example, the cleanup under the 'err' label can
be done in acpi_processor_add(). Otherwise, we will leak memory.

>  	ret = arch_register_cpu(pr->id);
>  	if (ret) {
> +		/* Leave the processor device array in place to detect buggy bios */
> +		per_cpu(processors, pr->id) = NULL;
>  		acpi_unmap_cpu(pr->id);
>  		goto out;
>  	}
> @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>  	return ret;
>  }
>  #else
> -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
> +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
> +					     struct acpi_device *device)
>  {
>  	return -ENODEV;
>  }
> @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
>  	 * because cpuid <-> apicid mapping is persistent now.
>  	 */
>  	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> -		int ret = acpi_processor_hotadd_init(pr);
> +		int ret = acpi_processor_hotadd_init(pr, device);
>
>  		if (ret)
>  			return ret;
> +	} else {
> +		if (!acpi_processor_set_per_cpu(pr, device))
> +			return 0;
>  	}
>

For the non-hotplug case, we still need to pass the error to
acpi_processor_add() so that the cleanup under the 'err' label can be done.
Otherwise, we will leak memory.

>  	/*
> @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device)
>   * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc.
>   * Such things have to be put in and set up by the processor driver's .probe().
>   */
> -static DEFINE_PER_CPU(void *, processor_device_array);
> -
>  static int acpi_processor_add(struct acpi_device *device,
>  					const struct acpi_device_id *id)
>  {
> @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  	if (result) /* Processor is not physically present or unavailable */
>  		return 0;
>
> -	BUG_ON(pr->id >= nr_cpu_ids);
> -
> -	/*
> -	 * Buggy BIOS check.
> -	 * ACPI id of processors can be reported wrongly by the BIOS.
> -	 * Don't trust it blindly
> -	 */
> -	if (per_cpu(processor_device_array, pr->id) != NULL &&
> -	    per_cpu(processor_device_array, pr->id) != device) {
> -		dev_warn(&device->dev,
> -			"BIOS reported wrong ACPI id %d for the processor\n",
> -			pr->id);
> -		/* Give up, but do not abort the namespace scan. */
> -		goto err;
> -	}
> -	/*
> -	 * processor_device_array is not cleared on errors to allow buggy BIOS
> -	 * checks.
> -	 */
> -	per_cpu(processor_device_array, pr->id) = device;
> -	per_cpu(processors, pr->id) = pr;
> -
>  	dev = get_cpu_device(pr->id);
>  	if (!dev) {
>  		result = -ENODEV;
> @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device)
>  	device_release_driver(pr->dev);
>  	acpi_unbind_one(pr->dev);
>
> -	/* Clean up. */
> -	per_cpu(processor_device_array, pr->id) = NULL;
> -	per_cpu(processors, pr->id) = NULL;
> -
>  	cpu_maps_update_begin();
>  	cpus_write_lock();
>
> @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device)
>  	arch_unregister_cpu(pr->id);
>  	acpi_unmap_cpu(pr->id);
>
> +	/* Clean up. */
> +	per_cpu(processor_device_array, pr->id) = NULL;
> +	per_cpu(processors, pr->id) = NULL;
> +
>  	cpus_write_unlock();
>  	cpu_maps_update_done();
>

Thanks,
Gavin

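The return-value problem raised in the review above can be reproduced outside the kernel. What follows is a minimal userspace sketch, not the driver code: map_cpu(), set_per_cpu(), hotadd_init_as_posted() and hotadd_init_with_errno() are hypothetical stand-ins for acpi_map_cpu(), acpi_processor_set_per_cpu() and two variants of acpi_processor_hotadd_init(), and -EINVAL is picked only for illustration. It shows that a bare 'goto out' taken after a successful mapping call leaves ret at zero, so the caller cannot tell that the BIOS check failed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for acpi_map_cpu(): always succeeds in this sketch. */
static int map_cpu(void)
{
	return 0;
}

/* Hypothetical stand-in for acpi_processor_set_per_cpu(). */
static bool set_per_cpu(bool bios_id_ok)
{
	return bios_id_ok;
}

/* Mirrors the control flow as posted: ret still holds map_cpu()'s 0. */
int hotadd_init_as_posted(bool bios_id_ok)
{
	int ret;

	ret = map_cpu();
	if (ret)
		goto out;

	if (!set_per_cpu(bios_id_ok))
		goto out;	/* a failure, but ret is still 0 here */

out:
	return ret;
}

/* Variant that assigns an error code before leaving (value is illustrative). */
int hotadd_init_with_errno(bool bios_id_ok)
{
	int ret;

	ret = map_cpu();
	if (ret)
		goto out;

	if (!set_per_cpu(bios_id_ok)) {
		ret = -EINVAL;
		goto out;
	}

out:
	return ret;
}
```

Calling both variants with bios_id_ok set to false makes the difference visible: the first reports success to its caller, while the second hands back a negative value that a caller could use to trigger its own cleanup.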
On Tue, 30 Apr 2024 14:17:24 +1000 Gavin Shan <gshan@redhat.com> wrote: > On 4/26/24 23:51, Jonathan Cameron wrote: > > Make the per_cpu(processors, cpu) entries available earlier so that > > they are available in arch_register_cpu() as ARM64 will need access > > to the acpi_handle to distinguish between acpi_processor_add() > > and earlier registration attempts (which will fail as _STA cannot > > be checked). > > > > Reorder the remove flow to clear this per_cpu() after > > arch_unregister_cpu() has completed, allowing it to be used in > > there as well. > > > > Note that on x86 for the CPU hotplug case, the pr->id prior to > > acpi_map_cpu() may be invalid. Thus the per_cpu() structures > > must be initialized after that call or after checking the ID > > is valid (not hotplug path). > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > > --- > > v8: On buggy bios detection when setting per_cpu structures > > do not carry on. > > Fix up the clearing of per cpu structures to remove unwanted > > side effects and ensure an error code isn't use to reference them. > > --- > > drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++-------------- > > 1 file changed, 48 insertions(+), 31 deletions(-) > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > index ba0a6f0ac841..3b180e21f325 100644 > > --- a/drivers/acpi/acpi_processor.c > > +++ b/drivers/acpi/acpi_processor.c > > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {} > > #endif /* CONFIG_X86 */ > > > > /* Initialization */ > > +static DEFINE_PER_CPU(void *, processor_device_array); > > + > > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr, > > + struct acpi_device *device) > > +{ > > + BUG_ON(pr->id >= nr_cpu_ids); > > One blank line after BUG_ON() if we need to follow original implementation. Sure unintentional - I'll put that back. > > > + /* > > + * Buggy BIOS check. 
> > +	 * ACPI id of processors can be reported wrongly by the BIOS.
> > +	 * Don't trust it blindly
> > +	 */
> > +	if (per_cpu(processor_device_array, pr->id) != NULL &&
> > +	    per_cpu(processor_device_array, pr->id) != device) {
> > +		dev_warn(&device->dev,
> > +			"BIOS reported wrong ACPI id %d for the processor\n",
> > +			pr->id);
> > +		/* Give up, but do not abort the namespace scan. */
>
> It depends on how the return value is handled by the caller if the namespace
> is continued to be scanned. The caller can be acpi_processor_hotadd_init()
> and acpi_processor_get_info() after this patch is applied. So I think this
> specific comment need to be moved to the caller.

Good point. This gets messy and was an unintended change.

Previously the options were:
1) acpi_processor_get_info() failed for other reasons - this code was never called.
2) acpi_processor_get_info() succeeded without acpi_processor_hotadd_init (non hotplug).
   This code then ran and would paper over the problem, doing a bunch of cleanup under err.
3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init called.
   This code then ran and would paper over the problem, doing a bunch of cleanup under err.

We should maintain that or argue cleanly against it.

This isn't helped by the fact that I have no idea which cases we care about for that BIOS
bug handling. Do any of those BIOSes ever do hotplug? I guess we have to try and maintain
whatever protection this was offering.

Also, the original code leaks data in some paths and I have limited idea
of whether that is intentional or not. So to tidy up the issue that you've identified,
I'll need to try and make that code consistent first.

I suspect the only way to do that is going to be to duplicate the allocations we
'want' to leak to deal with the BIOS bug detection.

For example, acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map
before this series. After this series we need pr to leak because it's used for the detection
via processor_device_array.

I'll work through this, but it's going to be tricky to tell if we get it right.
Step 1 will be closing the existing leaks, and then we will have something
consistent to build on.

>
> Besides, it seems acpi_processor_set_per_cpu() isn't properly called and
> memory leakage can happen. More details are given below.
>
> > +		return false;
> > +	}
> > +	/*
> > +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > +	 * checks.
> > +	 */
> > +	per_cpu(processor_device_array, pr->id) = device;
> > +	per_cpu(processors, pr->id) = pr;
> > +
> > +	return true;
> > +}
> > +
> >  #ifdef CONFIG_ACPI_HOTPLUG_CPU
> > -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > +static int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > +				      struct acpi_device *device)
> >  {
> >  	int ret;
> >
> > @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >  	if (ret)
> >  		goto out;
> >
> > +	if (!acpi_processor_set_per_cpu(pr, device)) {
> > +		acpi_unmap_cpu(pr->id);
> > +		goto out;
> > +	}
> > +
>
> With the 'goto out', zero is returned from acpi_processor_hotadd_init() to acpi_processor_get_info().
> The zero return value is carried from acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct,
> we need return errno from acpi_processor_get_info() to acpi_processor_add() so that cleanup can be
> done. For example, the cleanup corresponding to the 'err' tag can be done in acpi_processor_add().
> Otherwise, we will have memory leakage.
> > > ret = arch_register_cpu(pr->id); > > if (ret) { > > + /* Leave the processor device array in place to detect buggy bios */ > > + per_cpu(processors, pr->id) = NULL; > > acpi_unmap_cpu(pr->id); > > goto out; > > } > > @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) > > return ret; > > } > > #else > > -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr) > > +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr, > > + struct acpi_device *device) > > { > > return -ENODEV; > > } > > @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device) > > * because cpuid <-> apicid mapping is persistent now. > > */ > > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) { > > - int ret = acpi_processor_hotadd_init(pr); > > + int ret = acpi_processor_hotadd_init(pr, device); > > > > if (ret) > > return ret; > > + } else { > > + if (!acpi_processor_set_per_cpu(pr, device)) > > + return 0; > > } > > > > For non-hotplug case, we still need pass the error to acpi_processor_add() so that > cleanup corresponding 'err' tag can be done. Otherwise, we will have memory leakage. > > > /* > > @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device) > > * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc. > > * Such things have to be put in and set up by the processor driver's .probe(). > > */ > > -static DEFINE_PER_CPU(void *, processor_device_array); > > - > > static int acpi_processor_add(struct acpi_device *device, > > const struct acpi_device_id *id) > > { > > @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device, > > if (result) /* Processor is not physically present or unavailable */ > > return 0; > > > > - BUG_ON(pr->id >= nr_cpu_ids); > > - > > - /* > > - * Buggy BIOS check. > > - * ACPI id of processors can be reported wrongly by the BIOS. 
> > - * Don't trust it blindly > > - */ > > - if (per_cpu(processor_device_array, pr->id) != NULL && > > - per_cpu(processor_device_array, pr->id) != device) { > > - dev_warn(&device->dev, > > - "BIOS reported wrong ACPI id %d for the processor\n", > > - pr->id); > > - /* Give up, but do not abort the namespace scan. */ > > - goto err; > > - } > > - /* > > - * processor_device_array is not cleared on errors to allow buggy BIOS > > - * checks. > > - */ > > - per_cpu(processor_device_array, pr->id) = device; > > - per_cpu(processors, pr->id) = pr; > > - > > dev = get_cpu_device(pr->id); > > if (!dev) { > > result = -ENODEV; > > @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device) > > device_release_driver(pr->dev); > > acpi_unbind_one(pr->dev); > > > > - /* Clean up. */ > > - per_cpu(processor_device_array, pr->id) = NULL; > > - per_cpu(processors, pr->id) = NULL; > > - > > cpu_maps_update_begin(); > > cpus_write_lock(); > > > > @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device) > > arch_unregister_cpu(pr->id); > > acpi_unmap_cpu(pr->id); > > > > + /* Clean up. */ > > + per_cpu(processor_device_array, pr->id) = NULL; > > + per_cpu(processors, pr->id) = NULL; > > + > > cpus_write_unlock(); > > cpu_maps_update_done(); > > > > Thanks, > Gavin >
On Tue, Apr 30, 2024 at 11:28 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Tue, 30 Apr 2024 14:17:24 +1000 > Gavin Shan <gshan@redhat.com> wrote: > > > On 4/26/24 23:51, Jonathan Cameron wrote: > > > Make the per_cpu(processors, cpu) entries available earlier so that > > > they are available in arch_register_cpu() as ARM64 will need access > > > to the acpi_handle to distinguish between acpi_processor_add() > > > and earlier registration attempts (which will fail as _STA cannot > > > be checked). > > > > > > Reorder the remove flow to clear this per_cpu() after > > > arch_unregister_cpu() has completed, allowing it to be used in > > > there as well. > > > > > > Note that on x86 for the CPU hotplug case, the pr->id prior to > > > acpi_map_cpu() may be invalid. Thus the per_cpu() structures > > > must be initialized after that call or after checking the ID > > > is valid (not hotplug path). > > > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > > > > --- > > > v8: On buggy bios detection when setting per_cpu structures > > > do not carry on. > > > Fix up the clearing of per cpu structures to remove unwanted > > > side effects and ensure an error code isn't use to reference them. 
> > > --- > > > drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++-------------- > > > 1 file changed, 48 insertions(+), 31 deletions(-) > > > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > > index ba0a6f0ac841..3b180e21f325 100644 > > > --- a/drivers/acpi/acpi_processor.c > > > +++ b/drivers/acpi/acpi_processor.c > > > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {} > > > #endif /* CONFIG_X86 */ > > > > > > /* Initialization */ > > > +static DEFINE_PER_CPU(void *, processor_device_array); > > > + > > > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr, > > > + struct acpi_device *device) > > > +{ > > > + BUG_ON(pr->id >= nr_cpu_ids); > > > > One blank line after BUG_ON() if we need to follow original implementation. > > Sure unintentional - I'll put that back. > > > > > > + /* > > > + * Buggy BIOS check. > > > + * ACPI id of processors can be reported wrongly by the BIOS. > > > + * Don't trust it blindly > > > + */ > > > + if (per_cpu(processor_device_array, pr->id) != NULL && > > > + per_cpu(processor_device_array, pr->id) != device) { > > > + dev_warn(&device->dev, > > > + "BIOS reported wrong ACPI id %d for the processor\n", > > > + pr->id); > > > + /* Give up, but do not abort the namespace scan. */ > > > > It depends on how the return value is handled by the caller if the namespace > > is continued to be scanned. The caller can be acpi_processor_hotadd_init() > > and acpi_processor_get_info() after this patch is applied. So I think this > > specific comment need to be moved to the caller. > > Good point. This gets messy and was an unintended change. > > Previously the options were: > 1) acpi_processor_get_info() failed for other reasons - this code was never called. > 2) acpi_processor_get_info() succeeded without acpi_processor_hotadd_init (non hotplug) > this code then ran and would paper over the problem doing a bunch of cleanup under err. 
> 3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init called.
>    This code then ran and would paper over the problem doing a bunch of cleanup under err.
>
> We should maintain that or argue cleanly against it.

The return value needs to be propagated to acpi_processor_add() so it
can decide what to do with it.

Now, acpi_processor_add() can only return 1 if the CPU has been
successfully registered and initialized, so it is regarded as
available (but it may not be online to start with).

Returning 0 from it may get messy, because acpi_default_enumeration()
will get called and it will attempt to create a platform device for
the CPU, so in all cases in which the CPU is not regarded as available
when acpi_processor_add() returns, it should return an error code (the
exact value doesn't matter for its caller so long as it is negative).

> This isn't helped the the fact I have no idea which cases we care about for that bios
> bug handling. Do any of those bios's ever do hotplug? Guess we have to try and maintain
> whatever protection this was offering.
>
> Also, the original code leaks data in some paths and I have limited idea
> of whether it is intentional or not. So to tidy the issue up that you've identified
> I'll need to try and make that code consistent first.

I agree.

> I suspect the only way to do that is going to be to duplicate the allocations we
> 'want' to leak to deal with the bios bug detection.
>
> For example acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map
> before this series. After this series we need pr to leak because it's used for the detection
> via processor_device_array.
>
> I'll work through this but it's going to be tricky to tell if we get right.
> Step 1 will be closing the existing leaks and then we will have something
> consistent to build on.

Sounds good to me.

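The return-value convention described in the reply above can be written down as a small decision function. This is a simplified userspace model based only on that description, not the ACPI scan code: enum scan_action and classify_attach_result() are hypothetical names, and the assumption that a negative value simply leaves the device alone is inferred from the message rather than taken from the kernel source.

```c
#include <errno.h>

/* Hypothetical outcomes, as seen by the code that calls an attach handler. */
enum scan_action {
	SCAN_HANDLED,		/* handler claimed the device (positive return) */
	SCAN_DEFAULT_ENUM,	/* zero: default enumeration would run next */
	SCAN_SKIPPED,		/* negative: device is not regarded as available */
};

/* Map an attach-style return value onto what the caller would do next. */
enum scan_action classify_attach_result(int ret)
{
	if (ret > 0)
		return SCAN_HANDLED;
	if (ret < 0)
		return SCAN_SKIPPED;
	return SCAN_DEFAULT_ENUM;
}

/*
 * Value a handler following the convention would return: 1 when the CPU
 * is available, a negative errno otherwise (-ENODEV is only an example,
 * since the exact negative value is said not to matter to the caller).
 */
int attach_result_for_cpu(int cpu_available)
{
	return cpu_available ? 1 : -ENODEV;
}
```

Under this model, a handler that returns 0 for an unavailable CPU lands in SCAN_DEFAULT_ENUM, which corresponds to the case the reply warns about.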
On Tue, 30 Apr 2024 10:28:38 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote: > On Tue, 30 Apr 2024 14:17:24 +1000 > Gavin Shan <gshan@redhat.com> wrote: > > > On 4/26/24 23:51, Jonathan Cameron wrote: > > > Make the per_cpu(processors, cpu) entries available earlier so that > > > they are available in arch_register_cpu() as ARM64 will need access > > > to the acpi_handle to distinguish between acpi_processor_add() > > > and earlier registration attempts (which will fail as _STA cannot > > > be checked). > > > > > > Reorder the remove flow to clear this per_cpu() after > > > arch_unregister_cpu() has completed, allowing it to be used in > > > there as well. > > > > > > Note that on x86 for the CPU hotplug case, the pr->id prior to > > > acpi_map_cpu() may be invalid. Thus the per_cpu() structures > > > must be initialized after that call or after checking the ID > > > is valid (not hotplug path). > > > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > > > > --- > > > v8: On buggy bios detection when setting per_cpu structures > > > do not carry on. > > > Fix up the clearing of per cpu structures to remove unwanted > > > side effects and ensure an error code isn't use to reference them. 
> > > --- > > > drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++-------------- > > > 1 file changed, 48 insertions(+), 31 deletions(-) > > > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > > index ba0a6f0ac841..3b180e21f325 100644 > > > --- a/drivers/acpi/acpi_processor.c > > > +++ b/drivers/acpi/acpi_processor.c > > > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {} > > > #endif /* CONFIG_X86 */ > > > > > > /* Initialization */ > > > +static DEFINE_PER_CPU(void *, processor_device_array); > > > + > > > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr, > > > + struct acpi_device *device) > > > +{ > > > + BUG_ON(pr->id >= nr_cpu_ids); > > > > One blank line after BUG_ON() if we need to follow original implementation. > > Sure unintentional - I'll put that back. > > > > > > + /* > > > + * Buggy BIOS check. > > > + * ACPI id of processors can be reported wrongly by the BIOS. > > > + * Don't trust it blindly > > > + */ > > > + if (per_cpu(processor_device_array, pr->id) != NULL && > > > + per_cpu(processor_device_array, pr->id) != device) { > > > + dev_warn(&device->dev, > > > + "BIOS reported wrong ACPI id %d for the processor\n", > > > + pr->id); > > > + /* Give up, but do not abort the namespace scan. */ > > > > It depends on how the return value is handled by the caller if the namespace > > is continued to be scanned. The caller can be acpi_processor_hotadd_init() > > and acpi_processor_get_info() after this patch is applied. So I think this > > specific comment need to be moved to the caller. > > Good point. This gets messy and was an unintended change. > > Previously the options were: > 1) acpi_processor_get_info() failed for other reasons - this code was never called. > 2) acpi_processor_get_info() succeeded without acpi_processor_hotadd_init (non hotplug) > this code then ran and would paper over the problem doing a bunch of cleanup under err. 
> 3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init called.
>    This code then ran and would paper over the problem doing a bunch of cleanup under err.
>
> We should maintain that or argue cleanly against it.
>
> This isn't helped the the fact I have no idea which cases we care about for that bios
> bug handling. Do any of those bios's ever do hotplug? Guess we have to try and maintain
> whatever protection this was offering.
>
> Also, the original code leaks data in some paths and I have limited idea
> of whether it is intentional or not. So to tidy the issue up that you've identified
> I'll need to try and make that code consistent first.
>
> I suspect the only way to do that is going to be to duplicate the allocations we
> 'want' to leak to deal with the bios bug detection.
>
> For example acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map
> before this series. After this series we need pr to leak because it's used for the detection
> via processor_device_array.
>
> I'll work through this but it's going to be tricky to tell if we get right.
> Step 1 will be closing the existing leaks and then we will have something
> consistent to build on.
>

I 'think' that fixing the original leaks makes this all much more straightforward.
That return 0 for acpi_processor_get_info() never made sense as far as I can tell.
The pr isn't used after this point.

What about something along the lines of:

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 161c95c9d60a..97cff4492304 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -392,8 +392,10 @@ static int acpi_processor_add(struct acpi_device *device,
 	device->driver_data = pr;

 	result = acpi_processor_get_info(device);
-	if (result) /* Processor is not physically present or unavailable */
-		return 0;
+	if (result) { /* Processor is not physically present or unavailable */
+		result = 0;
+		goto err_free_throttling_mask;
+	}

 	BUG_ON(pr->id >= nr_cpu_ids);

@@ -408,7 +410,7 @@ static int acpi_processor_add(struct acpi_device *device,
 			"BIOS reported wrong ACPI id %d for the processor\n",
 			pr->id);
 		/* Give up, but do not abort the namespace scan. */
-		goto err;
+		goto err_clear_driver_data;
 	}
 	/*
 	 * processor_device_array is not cleared on errors to allow buggy BIOS
@@ -420,12 +422,12 @@ static int acpi_processor_add(struct acpi_device *device,
 	dev = get_cpu_device(pr->id);
 	if (!dev) {
 		result = -ENODEV;
-		goto err;
+		goto err_clear_per_cpu;
 	}

 	result = acpi_bind_one(dev, device);
 	if (result)
-		goto err;
+		goto err_clear_per_cpu;

 	pr->dev = dev;

@@ -436,10 +438,12 @@ static int acpi_processor_add(struct acpi_device *device,
 	dev_err(dev, "Processor driver could not be attached\n");
 	acpi_unbind_one(dev);

- err:
-	free_cpumask_var(pr->throttling.shared_cpu_map);
-	device->driver_data = NULL;
+ err_clear_per_cpu:
 	per_cpu(processors, pr->id) = NULL;
+ err_clear_driver_data:
+	device->driver_data = NULL;
+ err_free_throttling_mask:
+	free_cpumask_var(pr->throttling.shared_cpu_map);
  err_free_pr:
 	kfree(pr);
 	return result;

Then the diff on this patch is simply:

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 3c49eae1e943..3b75f5aeb7ab 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -200,7 +200,6 @@ static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
 		dev_warn(&device->dev,
 			"BIOS reported wrong ACPI id %d for the processor\n",
 			pr->id);
-		/* Give up, but do not abort the namespace scan. */
 		return false;
 	}
 	/*
@@ -230,13 +229,14 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr,
 		goto out;

 	if (!acpi_processor_set_per_cpu(pr, device)) {
+		ret = -EINVAL;
 		acpi_unmap_cpu(pr->id);
 		goto out;
 	}

 	ret = arch_register_cpu(pr->id);
 	if (ret) {
-		/* Leave the processor device array in place to detect buggy bios */
+x		/* Leave the processor device array in place to detect buggy bios */
 		per_cpu(processors, pr->id) = NULL;
 		acpi_unmap_cpu(pr->id);
 		goto out;
@@ -262,7 +262,7 @@ static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
 }
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */

-static int acpi_processor_get_info(struct acpi_device *device)
+static int acpi_processor_get_info(struct acpi_device *device, bool bios_bug)
 {
 	union acpi_object object = { 0 };
 	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
@@ -361,7 +361,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
 			return ret;
 	} else {
 		if (!acpi_processor_set_per_cpu(pr, device))
-			return 0;
+			return -EINVAL;
 	}

 	/*

> >
> > Besides, it seems acpi_processor_set_per_cpu() isn't properly called and
> > memory leakage can happen. More details are given below.
> >
> > > +		return false;
> > > +	}
> > > +	/*
> > > +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > > +	 * checks.
> > > +	 */
> > > +	per_cpu(processor_device_array, pr->id) = device;
> > > +	per_cpu(processors, pr->id) = pr;
> > > +
> > > +	return true;
> > > +}
> > > +
> > >  #ifdef CONFIG_ACPI_HOTPLUG_CPU
> > > -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > > +static int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > > +				      struct acpi_device *device)
> > >  {
> > >  	int ret;
> > >
> > > @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > >  	if (ret)
> > >  		goto out;
> > >
> > > +	if (!acpi_processor_set_per_cpu(pr, device)) {
> > > +		acpi_unmap_cpu(pr->id);
> > > +		goto out;
> > > +	}
> > > +
> >
> > With the 'goto out', zero is returned from acpi_processor_hotadd_init() to acpi_processor_get_info().

Indeed a bug :(

> > The zero return value is carried from acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct,
> > we need return errno from acpi_processor_get_info() to acpi_processor_add() so that cleanup can be
> > done. For example, the cleanup corresponding to the 'err' tag can be done in acpi_processor_add().
> > Otherwise, we will have memory leakage.

The confusion here was that previously acpi_processor_add() was missing
error cleanup for acpi_processor_get_info().
With that in place I think it's all much simpler.

Thanks for your eagle eyes!

Jonathan > > > > > ret = arch_register_cpu(pr->id); > > > if (ret) { > > > + /* Leave the processor device array in place to detect buggy bios */ > > > + per_cpu(processors, pr->id) = NULL; > > > acpi_unmap_cpu(pr->id); > > > goto out; > > > } > > > @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) > > > return ret; > > > } > > > #else > > > -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr) > > > +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr, > > > + struct acpi_device *device) > > > { > > > return -ENODEV; > > > } > > > @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device) > > > * because cpuid <-> apicid mapping is persistent now. > > > */ > > > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) { > > > - int ret = acpi_processor_hotadd_init(pr); > > > + int ret = acpi_processor_hotadd_init(pr, device); > > > > > > if (ret) > > > return ret; > > > + } else { > > > + if (!acpi_processor_set_per_cpu(pr, device)) > > > + return 0; > > > } > > > > > > > For non-hotplug case, we still need pass the error to acpi_processor_add() so that > > cleanup corresponding 'err' tag can be done. Otherwise, we will have memory leakage. > > > > > /* > > > @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device) > > > * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc. > > > * Such things have to be put in and set up by the processor driver's .probe(). > > > */ > > > -static DEFINE_PER_CPU(void *, processor_device_array); > > > - > > > static int acpi_processor_add(struct acpi_device *device, > > > const struct acpi_device_id *id) > > > { > > > @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device, > > > if (result) /* Processor is not physically present or unavailable */ > > > return 0; > > > > > > - BUG_ON(pr->id >= nr_cpu_ids); > > > - > > > - /* > > > - * Buggy BIOS check. 
> > > - * ACPI id of processors can be reported wrongly by the BIOS. > > > - * Don't trust it blindly > > > - */ > > > - if (per_cpu(processor_device_array, pr->id) != NULL && > > > - per_cpu(processor_device_array, pr->id) != device) { > > > - dev_warn(&device->dev, > > > - "BIOS reported wrong ACPI id %d for the processor\n", > > > - pr->id); > > > - /* Give up, but do not abort the namespace scan. */ > > > - goto err; > > > - } > > > - /* > > > - * processor_device_array is not cleared on errors to allow buggy BIOS > > > - * checks. > > > - */ > > > - per_cpu(processor_device_array, pr->id) = device; > > > - per_cpu(processors, pr->id) = pr; > > > - > > > dev = get_cpu_device(pr->id); > > > if (!dev) { > > > result = -ENODEV; > > > @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device) > > > device_release_driver(pr->dev); > > > acpi_unbind_one(pr->dev); > > > > > > - /* Clean up. */ > > > - per_cpu(processor_device_array, pr->id) = NULL; > > > - per_cpu(processors, pr->id) = NULL; > > > - > > > cpu_maps_update_begin(); > > > cpus_write_lock(); > > > > > > @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device) > > > arch_unregister_cpu(pr->id); > > > acpi_unmap_cpu(pr->id); > > > > > > + /* Clean up. */ > > > + per_cpu(processor_device_array, pr->id) = NULL; > > > + per_cpu(processors, pr->id) = NULL; > > > + > > > cpus_write_unlock(); > > > cpu_maps_update_done(); > > > > > > > Thanks, > > Gavin > > >
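The return-value problem Gavin spotted can be reproduced outside the kernel. What follows is a minimal user-space sketch, not the kernel code: toy_map_cpu(), toy_set_per_cpu(), toy_device_slot and the two toy_hotadd_*() functions are hypothetical stand-ins for acpi_map_cpu(), acpi_processor_set_per_cpu(), processor_device_array and acpi_processor_hotadd_init(). It assumes the map step always succeeds, so ret still holds 0 when the per-CPU check fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for the per-CPU processor_device_array. */
static const void *toy_device_slot[TOY_NR_CPUS];

/* Two dummy "devices" to register; only their addresses matter. */
int toy_dev_a, toy_dev_b;

/* Toy buggy-BIOS check: once claimed, a slot only accepts the same device. */
static bool toy_set_per_cpu(int id, const void *device)
{
	if (toy_device_slot[id] != NULL && toy_device_slot[id] != device)
		return false;	/* a different device already claimed this id */
	toy_device_slot[id] = device;
	return true;
}

/* Stand-in for the map step: always succeeds in this sketch. */
static int toy_map_cpu(int id)
{
	(void)id;
	return 0;
}

/* Shape of the posted patch: bail out without touching ret. */
int toy_hotadd_as_posted(int id, const void *device)
{
	int ret;

	ret = toy_map_cpu(id);
	if (ret)
		goto out;

	if (!toy_set_per_cpu(id, device))
		goto out;	/* ret is still 0, so the caller sees success */
out:
	return ret;
}

/* Shape of the follow-up change: set an error code before leaving. */
int toy_hotadd_fixed(int id, const void *device)
{
	int ret;

	ret = toy_map_cpu(id);
	if (ret)
		goto out;

	if (!toy_set_per_cpu(id, device)) {
		ret = -EINVAL;
		goto out;
	}
out:
	return ret;
}
```

Registering toy_dev_a and then toy_dev_b under the same id makes the difference visible: the as-posted variant reports success for the conflicting call, while the fixed variant returns -EINVAL, which is what gives the caller a chance to run its cleanup.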
On Tue, Apr 30, 2024 at 12:13 PM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Tue, 30 Apr 2024 10:28:38 +0100 > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote: > > > On Tue, 30 Apr 2024 14:17:24 +1000 > > Gavin Shan <gshan@redhat.com> wrote: > > > > > On 4/26/24 23:51, Jonathan Cameron wrote: > > > > Make the per_cpu(processors, cpu) entries available earlier so that > > > > they are available in arch_register_cpu() as ARM64 will need access > > > > to the acpi_handle to distinguish between acpi_processor_add() > > > > and earlier registration attempts (which will fail as _STA cannot > > > > be checked). > > > > > > > > Reorder the remove flow to clear this per_cpu() after > > > > arch_unregister_cpu() has completed, allowing it to be used in > > > > there as well. > > > > > > > > Note that on x86 for the CPU hotplug case, the pr->id prior to > > > > acpi_map_cpu() may be invalid. Thus the per_cpu() structures > > > > must be initialized after that call or after checking the ID > > > > is valid (not hotplug path). > > > > > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > > > > > > --- > > > > v8: On buggy bios detection when setting per_cpu structures > > > > do not carry on. > > > > Fix up the clearing of per cpu structures to remove unwanted > > > > side effects and ensure an error code isn't use to reference them. 
> > > > --- > > > > drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++-------------- > > > > 1 file changed, 48 insertions(+), 31 deletions(-) > > > > > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > > > index ba0a6f0ac841..3b180e21f325 100644 > > > > --- a/drivers/acpi/acpi_processor.c > > > > +++ b/drivers/acpi/acpi_processor.c > > > > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {} > > > > #endif /* CONFIG_X86 */ > > > > > > > > /* Initialization */ > > > > +static DEFINE_PER_CPU(void *, processor_device_array); > > > > + > > > > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr, > > > > + struct acpi_device *device) > > > > +{ > > > > + BUG_ON(pr->id >= nr_cpu_ids); > > > > > > One blank line after BUG_ON() if we need to follow original implementation. > > > > Sure unintentional - I'll put that back. > > > > > > > > > + /* > > > > + * Buggy BIOS check. > > > > + * ACPI id of processors can be reported wrongly by the BIOS. > > > > + * Don't trust it blindly > > > > + */ > > > > + if (per_cpu(processor_device_array, pr->id) != NULL && > > > > + per_cpu(processor_device_array, pr->id) != device) { > > > > + dev_warn(&device->dev, > > > > + "BIOS reported wrong ACPI id %d for the processor\n", > > > > + pr->id); > > > > + /* Give up, but do not abort the namespace scan. */ > > > > > > It depends on how the return value is handled by the caller if the namespace > > > is continued to be scanned. The caller can be acpi_processor_hotadd_init() > > > and acpi_processor_get_info() after this patch is applied. So I think this > > > specific comment need to be moved to the caller. > > > > Good point. This gets messy and was an unintended change. > > > > Previously the options were: > > 1) acpi_processor_get_info() failed for other reasons - this code was never called. 
> > 2) acpi_processor_get_info() succeeded without acpi_processor_hotadd_init (non hotplug) > > this code then ran and would paper over the problem doing a bunch of cleanup under err. > > 3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init called. > > This code then ran and would paper over the problem doing a bunch of cleanup under err. > > > > We should maintain that or argue cleanly against it. > > > > This isn't helped the the fact I have no idea which cases we care about for that bios > > bug handling. Do any of those bios's ever do hotplug? Guess we have to try and maintain > > whatever protection this was offering. > > > > Also, the original code leaks data in some paths and I have limited idea > > of whether it is intentional or not. So to tidy the issue up that you've identified > > I'll need to try and make that code consistent first. > > > > I suspect the only way to do that is going to be to duplicate the allocations we > > 'want' to leak to deal with the bios bug detection. > > > > For example acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map > > before this series. After this series we need pr to leak because it's used for the detection > > via processor_device_array. > > > > I'll work through this but it's going to be tricky to tell if we get right. > > Step 1 will be closing the existing leaks and then we will have something > > consistent to build on. > > > I 'think' that fixing the original leaks makes this all much more straight forward. > That return 0 for acpi_processor_get_info() never made sense as far as I can tell. > The pr isn't used after this point. > > What about something along lines of. 
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 161c95c9d60a..97cff4492304 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -392,8 +392,10 @@ static int acpi_processor_add(struct acpi_device *device,
>  	device->driver_data = pr;
>
>  	result = acpi_processor_get_info(device);
> -	if (result) /* Processor is not physically present or unavailable */
> -		return 0;
> +	if (result) { /* Processor is not physically present or unavailable */
> +		result = 0;

As per my previous message (just sent) this should be an error code,
as returning 0 from acpi_processor_add() is generally problematic.

> +		goto err_free_throttling_mask;
> +	}
>
>  	BUG_ON(pr->id >= nr_cpu_ids);
>
On Tue, 30 Apr 2024 12:17:38 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Tue, Apr 30, 2024 at 12:13 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > I 'think' that fixing the original leaks makes this all much more straight forward.
> > That return 0 for acpi_processor_get_info() never made sense as far as I can tell.
> > The pr isn't used after this point.
> >
> > What about something along lines of.
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 161c95c9d60a..97cff4492304 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -392,8 +392,10 @@ static int acpi_processor_add(struct acpi_device *device,
> >  	device->driver_data = pr;
> >
> >  	result = acpi_processor_get_info(device);
> > -	if (result) /* Processor is not physically present or unavailable */
> > -		return 0;
> > +	if (result) { /* Processor is not physically present or unavailable */
> > +		result = 0;
>
> As per my previous message (just sent) this should be an error code,
> as returning 0 from acpi_processor_add() is generally problematic.

Ok. I'll switch to that, but as a separate precursor patch. Independent of
the memory leak fixes.

Jonathan

>
> > +		goto err_free_throttling_mask;
> > +	}
> >
> >  	BUG_ON(pr->id >= nr_cpu_ids);
> >
On Tue, Apr 30, 2024 at 12:45 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 30 Apr 2024 12:17:38 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
> > On Tue, Apr 30, 2024 at 12:13 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > I 'think' that fixing the original leaks makes this all much more straight forward.
> > > That return 0 for acpi_processor_get_info() never made sense as far as I can tell.
> > > The pr isn't used after this point.
> > >
> > > What about something along lines of.
> > >
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 161c95c9d60a..97cff4492304 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -392,8 +392,10 @@ static int acpi_processor_add(struct acpi_device *device,
> > >  	device->driver_data = pr;
> > >
> > >  	result = acpi_processor_get_info(device);
> > > -	if (result) /* Processor is not physically present or unavailable */
> > > -		return 0;
> > > +	if (result) { /* Processor is not physically present or unavailable */
> > > +		result = 0;
> >
> > As per my previous message (just sent) this should be an error code,
> > as returning 0 from acpi_processor_add() is generally problematic.
>
> Ok. I'll switch to that, but as a separate precursor patch. Independent of
> the memory leak fixes.

Sure, thank you!
On Tue, 30 Apr 2024 11:13:41 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> I 'think' that fixing the original leaks makes this all much more straight forward.
> That return 0 for acpi_processor_get_info() never made sense as far as I can tell.
> The pr isn't used after this point.
>
> What about something along lines of.
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 161c95c9d60a..97cff4492304 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -392,8 +392,10 @@ static int acpi_processor_add(struct acpi_device *device,
>  	device->driver_data = pr;
>
>  	result = acpi_processor_get_info(device);
> -	if (result) /* Processor is not physically present or unavailable */
> -		return 0;
> +	if (result) { /* Processor is not physically present or unavailable */
> +		result = 0;
> +		goto err_free_throttling_mask;

FWIW this is wrong, should be goto err_clear_driver_data (you can see it set
just at the top of this block and that never fails!)

The err_free_throttling_mask label should be unused and hence won't exist in v9.

> +	}
>
>  	BUG_ON(pr->id >= nr_cpu_ids);
>
> @@ -408,7 +410,7 @@ static int acpi_processor_add(struct acpi_device *device,
>  			"BIOS reported wrong ACPI id %d for the processor\n",
>  			pr->id);
>  		/* Give up, but do not abort the namespace scan. */
> -		goto err;
> +		goto err_clear_driver_data;
>  	}
>  	/*
>  	 * processor_device_array is not cleared on errors to allow buggy BIOS
> @@ -420,12 +422,12 @@ static int acpi_processor_add(struct acpi_device *device,
>  	dev = get_cpu_device(pr->id);
>  	if (!dev) {
>  		result = -ENODEV;
> -		goto err;
> +		goto err_clear_per_cpu;
>  	}
>
>  	result = acpi_bind_one(dev, device);
>  	if (result)
> -		goto err;
> +		goto err_clear_per_cpu;
>
>  	pr->dev = dev;
>
> @@ -436,10 +438,12 @@ static int acpi_processor_add(struct acpi_device *device,
>  	dev_err(dev, "Processor driver could not be attached\n");
>  	acpi_unbind_one(dev);
>
> - err:
> -	free_cpumask_var(pr->throttling.shared_cpu_map);
> -	device->driver_data = NULL;
> + err_clear_per_cpu:
>  	per_cpu(processors, pr->id) = NULL;
> + err_clear_driver_data:
> +	device->driver_data = NULL;
> + err_free_throttling_mask:
> +	free_cpumask_var(pr->throttling.shared_cpu_map);
>   err_free_pr:
>  	kfree(pr);
>  	return result;
>
> Then the diff on this patch is simply:
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 3c49eae1e943..3b75f5aeb7ab 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -200,7 +200,6 @@ static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
>  		dev_warn(&device->dev,
>  			"BIOS reported wrong ACPI id %d for the processor\n",
>  			pr->id);
> -		/* Give up, but do not abort the namespace scan. */
>  		return false;
>  	}
>  	/*
> @@ -230,13 +229,14 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr,
>  		goto out;
>
>  	if (!acpi_processor_set_per_cpu(pr, device)) {
> +		ret = -EINVAL;
>  		acpi_unmap_cpu(pr->id);
>  		goto out;
>  	}
>
>  	ret = arch_register_cpu(pr->id);
>  	if (ret) {
> -		/* Leave the processor device array in place to detect buggy bios */
> +x		/* Leave the processor device array in place to detect buggy bios */
>  		per_cpu(processors, pr->id) = NULL;
>  		acpi_unmap_cpu(pr->id);
>  		goto out;
> @@ -262,7 +262,7 @@ static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
>  }
>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>
> -static int acpi_processor_get_info(struct acpi_device *device)
> +static int acpi_processor_get_info(struct acpi_device *device, bool bios_bug)
>  {
>  	union acpi_object object = { 0 };
>  	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
> @@ -361,7 +361,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
>  			return ret;
>  	} else {
>  		if (!acpi_processor_set_per_cpu(pr, device))
> -			return 0;
> +			return -EINVAL;
>  	}
>
>  	/*
>
> >
> > > Besides, it seems acpi_processor_set_per_cpu() isn't properly called and
> > > memory leakage can happen. More details are given below.
> > >
> > > > +		return false;
> > > > +	}
> > > > +	/*
> > > > +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > > > +	 * checks.
> > > > + */ > > > > + per_cpu(processor_device_array, pr->id) = device; > > > > + per_cpu(processors, pr->id) = pr; > > > > + > > > > + return true; > > > > +} > > > > + > > > > #ifdef CONFIG_ACPI_HOTPLUG_CPU > > > > -static int acpi_processor_hotadd_init(struct acpi_processor *pr) > > > > +static int acpi_processor_hotadd_init(struct acpi_processor *pr, > > > > + struct acpi_device *device) > > > > { > > > > int ret; > > > > > > > > @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) > > > > if (ret) > > > > goto out; > > > > > > > > + if (!acpi_processor_set_per_cpu(pr, device)) { > > > > + acpi_unmap_cpu(pr->id); > > > > + goto out; > > > > + } > > > > + > > > > > > With the 'goto out', zero is returned from acpi_processor_hotadd_init() to acpi_processor_get_info(). > > Indeed a bug :( > > > > The zero return value is carried from acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct, > > > we need return errno from acpi_processor_get_info() to acpi_processor_add() so that cleanup can be > > > done. For example, the cleanup corresponding to the 'err' tag can be done in acpi_processor_add(). > > > Otherwise, we will have memory leakage. > > The confusion here was that previously acpi_processor_add() was missing error cleanup for > acpi_processor_get_info(). With that in place I think it's all much simpler. > > Thanks for your eagle eyes! 
> > Jonathan
> >
> > >
> > > >  	ret = arch_register_cpu(pr->id);
> > > >  	if (ret) {
> > > > +		/* Leave the processor device array in place to detect buggy bios */
> > > > +		per_cpu(processors, pr->id) = NULL;
> > > >  		acpi_unmap_cpu(pr->id);
> > > >  		goto out;
> > > >  	}
> > > > @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > > >  	return ret;
> > > >  }
> > > >  #else
> > > > -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > > > +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > > > +					     struct acpi_device *device)
> > > >  {
> > > >  	return -ENODEV;
> > > >  }
> > > > @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > > >  	 * because cpuid <-> apicid mapping is persistent now.
> > > >  	 */
> > > >  	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > > > -		int ret = acpi_processor_hotadd_init(pr);
> > > > +		int ret = acpi_processor_hotadd_init(pr, device);
> > > >
> > > >  		if (ret)
> > > >  			return ret;
> > > > +	} else {
> > > > +		if (!acpi_processor_set_per_cpu(pr, device))
> > > > +			return 0;
> > > >  	}
> > > >
> > > >
> > > For non-hotplug case, we still need pass the error to acpi_processor_add() so that
> > > cleanup corresponding 'err' tag can be done. Otherwise, we will have memory leakage.
> > >
> > > >  	/*
> > > > @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > > >   * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc.
> > > >   * Such things have to be put in and set up by the processor driver's .probe().
> > > >   */
> > > > -static DEFINE_PER_CPU(void *, processor_device_array);
> > > > -
> > > >  static int acpi_processor_add(struct acpi_device *device,
> > > >  					const struct acpi_device_id *id)
> > > >  {
> > > > @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device,
> > > >  	if (result) /* Processor is not physically present or unavailable */
> > > >  		return 0;
> > > >
> > > > -	BUG_ON(pr->id >= nr_cpu_ids);
> > > > -
> > > > -	/*
> > > > -	 * Buggy BIOS check.
> > > > -	 * ACPI id of processors can be reported wrongly by the BIOS.
> > > > -	 * Don't trust it blindly
> > > > -	 */
> > > > -	if (per_cpu(processor_device_array, pr->id) != NULL &&
> > > > -	    per_cpu(processor_device_array, pr->id) != device) {
> > > > -		dev_warn(&device->dev,
> > > > -			 "BIOS reported wrong ACPI id %d for the processor\n",
> > > > -			 pr->id);
> > > > -		/* Give up, but do not abort the namespace scan. */
> > > > -		goto err;
> > > > -	}
> > > > -	/*
> > > > -	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > > > -	 * checks.
> > > > -	 */
> > > > -	per_cpu(processor_device_array, pr->id) = device;
> > > > -	per_cpu(processors, pr->id) = pr;
> > > > -
> > > >  	dev = get_cpu_device(pr->id);
> > > >  	if (!dev) {
> > > >  		result = -ENODEV;
> > > > @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device)
> > > >  	device_release_driver(pr->dev);
> > > >  	acpi_unbind_one(pr->dev);
> > > >
> > > > -	/* Clean up. */
> > > > -	per_cpu(processor_device_array, pr->id) = NULL;
> > > > -	per_cpu(processors, pr->id) = NULL;
> > > > -
> > > >  	cpu_maps_update_begin();
> > > >  	cpus_write_lock();
> > > >
> > > > @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device)
> > > >  	arch_unregister_cpu(pr->id);
> > > >  	acpi_unmap_cpu(pr->id);
> > > >
> > > > +	/* Clean up. */
> > > > +	per_cpu(processor_device_array, pr->id) = NULL;
> > > > +	per_cpu(processors, pr->id) = NULL;
> > > > +
> > > >  	cpus_write_unlock();
> > > >  	cpu_maps_update_done();
> > > >
> > >
> > > Thanks,
> > > Gavin
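The concern raised in the review (a helper that reports success after the buggy-BIOS check fails, so the caller never runs its cleanup) can be sketched with a small user-space model. This is not kernel code: set_per_cpu(), get_info(), add(), reset_model() and unreachable_allocations() are hypothetical stand-ins, and the model assumes a caller that frees its allocation only when the helper returns nonzero. Whether the real acpi_processor_add() does so should be checked against the tree.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4

struct proc { int id; };

/* Hypothetical stand-ins for the two per-CPU variables in the patch. */
static const void *device_slot[NR_CPUS];
static struct proc *processors[NR_CPUS];
static int live_allocations;	/* objects allocated and not yet freed */

/* Test helper: frees registered objects and clears all bookkeeping. */
static void reset_model(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		free(processors[i]);
		processors[i] = NULL;
		device_slot[i] = NULL;
	}
	live_allocations = 0;
}

/* Same shape as the buggy-BIOS check: refuse an id already claimed
 * by a different device, otherwise record device and processor. */
static bool set_per_cpu(struct proc *pr, const void *device)
{
	if (pr->id < 0 || pr->id >= NR_CPUS)
		return false;
	if (device_slot[pr->id] != NULL && device_slot[pr->id] != device)
		return false;
	device_slot[pr->id] = device;
	processors[pr->id] = pr;
	return true;
}

/* propagate_error == false models "return 0" on a failed check,
 * true models returning an errno instead (-EINVAL is an assumption). */
static int get_info(struct proc *pr, const void *device, bool propagate_error)
{
	if (!set_per_cpu(pr, device))
		return propagate_error ? -EINVAL : 0;
	return 0;
}

/* Assumed caller: frees its allocation only on a nonzero result. */
static int add(int id, const void *device, bool propagate_error)
{
	struct proc *pr = malloc(sizeof(*pr));
	int ret;

	if (!pr)
		return -ENOMEM;
	live_allocations++;
	pr->id = id;

	ret = get_info(pr, device, propagate_error);
	if (ret) {
		free(pr);
		live_allocations--;
	}
	return ret;
}

/* Allocations that are still live but not reachable via processors[]. */
static int unreachable_allocations(void)
{
	int registered = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (processors[i] != NULL)
			registered++;
	return live_allocations - registered;
}
```

In this model, adding two different devices with the same id leaves one unreachable allocation when the failure is reported as 0, and none when an error code is passed back to the caller.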
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index ba0a6f0ac841..3b180e21f325 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {}
 #endif /* CONFIG_X86 */
 
 /* Initialization */
+static DEFINE_PER_CPU(void *, processor_device_array);
+
+static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
+				       struct acpi_device *device)
+{
+	BUG_ON(pr->id >= nr_cpu_ids);
+	/*
+	 * Buggy BIOS check.
+	 * ACPI id of processors can be reported wrongly by the BIOS.
+	 * Don't trust it blindly
+	 */
+	if (per_cpu(processor_device_array, pr->id) != NULL &&
+	    per_cpu(processor_device_array, pr->id) != device) {
+		dev_warn(&device->dev,
+			 "BIOS reported wrong ACPI id %d for the processor\n",
+			 pr->id);
+		/* Give up, but do not abort the namespace scan. */
+		return false;
+	}
+	/*
+	 * processor_device_array is not cleared on errors to allow buggy BIOS
+	 * checks.
+	 */
+	per_cpu(processor_device_array, pr->id) = device;
+	per_cpu(processors, pr->id) = pr;
+
+	return true;
+}
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_processor_hotadd_init(struct acpi_processor *pr)
+static int acpi_processor_hotadd_init(struct acpi_processor *pr,
+				      struct acpi_device *device)
 {
 	int ret;
 
@@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 	if (ret)
 		goto out;
 
+	if (!acpi_processor_set_per_cpu(pr, device)) {
+		acpi_unmap_cpu(pr->id);
+		goto out;
+	}
+
 	ret = arch_register_cpu(pr->id);
 	if (ret) {
+		/* Leave the processor device array in place to detect buggy bios */
+		per_cpu(processors, pr->id) = NULL;
 		acpi_unmap_cpu(pr->id);
 		goto out;
 	}
@@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 	return ret;
 }
 #else
-static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
+static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
+					     struct acpi_device *device)
 {
 	return -ENODEV;
 }
@@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 * because cpuid <-> apicid mapping is persistent now.
 	 */
 	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
-		int ret = acpi_processor_hotadd_init(pr);
+		int ret = acpi_processor_hotadd_init(pr, device);
 
 		if (ret)
 			return ret;
+	} else {
+		if (!acpi_processor_set_per_cpu(pr, device))
+			return 0;
 	}
 
 	/*
@@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device)
  * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc.
  * Such things have to be put in and set up by the processor driver's .probe().
  */
-static DEFINE_PER_CPU(void *, processor_device_array);
-
 static int acpi_processor_add(struct acpi_device *device,
 					const struct acpi_device_id *id)
 {
@@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device,
 	if (result) /* Processor is not physically present or unavailable */
 		return 0;
 
-	BUG_ON(pr->id >= nr_cpu_ids);
-
-	/*
-	 * Buggy BIOS check.
-	 * ACPI id of processors can be reported wrongly by the BIOS.
-	 * Don't trust it blindly
-	 */
-	if (per_cpu(processor_device_array, pr->id) != NULL &&
-	    per_cpu(processor_device_array, pr->id) != device) {
-		dev_warn(&device->dev,
-			 "BIOS reported wrong ACPI id %d for the processor\n",
-			 pr->id);
-		/* Give up, but do not abort the namespace scan. */
-		goto err;
-	}
-	/*
-	 * processor_device_array is not cleared on errors to allow buggy BIOS
-	 * checks.
-	 */
-	per_cpu(processor_device_array, pr->id) = device;
-	per_cpu(processors, pr->id) = pr;
-
 	dev = get_cpu_device(pr->id);
 	if (!dev) {
 		result = -ENODEV;
@@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device)
 	device_release_driver(pr->dev);
 	acpi_unbind_one(pr->dev);
 
-	/* Clean up. */
-	per_cpu(processor_device_array, pr->id) = NULL;
-	per_cpu(processors, pr->id) = NULL;
-
 	cpu_maps_update_begin();
 	cpus_write_lock();
 
@@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device)
 	arch_unregister_cpu(pr->id);
 	acpi_unmap_cpu(pr->id);
 
+	/* Clean up. */
+	per_cpu(processor_device_array, pr->id) = NULL;
+	per_cpu(processors, pr->id) = NULL;
+
 	cpus_write_unlock();
 	cpu_maps_update_done();
 
Make the per_cpu(processors, cpu) entries available earlier so that
they are available in arch_register_cpu() as ARM64 will need access
to the acpi_handle to distinguish between acpi_processor_add()
and earlier registration attempts (which will fail as _STA cannot
be checked).

Reorder the remove flow to clear this per_cpu() after
arch_unregister_cpu() has completed, allowing it to be used in
there as well.

Note that on x86 for the CPU hotplug case, the pr->id prior to
acpi_map_cpu() may be invalid. Thus the per_cpu() structures
must be initialized after that call or after checking the ID
is valid (not hotplug path).

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v8: On buggy bios detection when setting per_cpu structures
    do not carry on.
    Fix up the clearing of per cpu structures to remove unwanted
    side effects and ensure an error code isn't used to reference them.
---
 drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++--------------
 1 file changed, 48 insertions(+), 31 deletions(-)
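The ordering argument in the commit message (clear the per-CPU entry only after the arch unregister hook has run, so the hook can still look the processor up) can be illustrated with a minimal user-space sketch. It is a toy model under stated assumptions, not the kernel implementation: arch_unregister(), remove_clear_first(), remove_clear_last() and hook_saw_entry are hypothetical names, and a plain array stands in for per_cpu(processors, cpu).

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

struct proc { int id; };

/* Plain array standing in for per_cpu(processors, cpu). */
static struct proc *processors[NR_CPUS];

/* Records whether the hook could still find the processor entry. */
static bool hook_saw_entry;

/* Hypothetical stand-in for an arch hook that wants the entry. */
static void arch_unregister(int id)
{
	hook_saw_entry = processors[id] != NULL;
}

/* Old ordering: the entry is cleared before the hook runs. */
static void remove_clear_first(struct proc *pr)
{
	processors[pr->id] = NULL;
	arch_unregister(pr->id);
}

/* New ordering: the hook runs first, then the entry is cleared. */
static void remove_clear_last(struct proc *pr)
{
	arch_unregister(pr->id);
	processors[pr->id] = NULL;
}
```

Both orderings leave the slot empty afterwards; only the second lets the hook observe the entry, which is the property the reordered remove flow relies on.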