Message ID | 20240313030251.1049624-1-hamish.martin@alliedtelesis.co.nz |
---|---|
State | New |
Headers | show |
Series | gpiolib: acpi: Move storage of acpi_gpio_chip | expand |
On Thu, 2024-03-14 at 15:18 +0200, andriy.shevchenko@linux.intel.com wrote: > On Thu, Mar 14, 2024 at 01:13:31AM +0000, Hamish Martin wrote: > > On Wed, 2024-03-13 at 13:26 +0200, Andy Shevchenko wrote: > > ... > > > Removing the setting of the handle to invalid may be the right fix > > but > > I don't feel I know the code well enough to make a decision on > > that. It > > certainly doesn't resolve things immediately - I saw ref counting > > warnings output. > > Not removing, but moving to the better place? > Can you share warnings, though? > For context here is the current call chain that leads to acpi_gpiochip_remove(): acpi_gpiochip_remove+0x32/0x1a0 gpiochip_remove+0x39/0x140 devres_release_group+0xe6/0x160 i2c_device_remove+0x2d/0x80 device_release_driver_internal+0x19a/0x200 bus_remove_device+0xbf/0x100 device_del+0x157/0x490 ? __pfx_device_match_fwnode+0x10/0x10 ? srso_return_thunk+0x5/0x5f device_unregister+0xd/0x30 i2c_acpi_notify+0x10e/0x160 notifier_call_chain+0x58/0xd0 blocking_notifier_call_chain+0x3a/0x60 acpi_device_del_work_fn+0x85/0x1d0 process_one_work+0x134/0x2f0 worker_thread+0x2f0/0x410 ? __pfx_worker_thread+0x10/0x10 kthread+0xe3/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 I removed the setting of adev->handle = INVALID_ACPI_HANDLE from acpi_scan_drop_device() and shifted it to just after the call to blocking_notifier_call_chain() in acpi_device_del_work_fn(). With that it seems things progress further with the call to acpi_get_data() in acpi_gpiochip_remove() succeeding now. However, later in acpi_gpiochip_free_regions() we hit this error: pca953x i2c-PRP0001:03: Failed to remove GPIO OpRegion handler We also get these errors: ACPI Warning: Obj 00000000ba6a9600, Reference Count is already zero, cannot decrement (20230628/utdelete-422) > P.S. > I'm not an expert in ACPICA and low lever of ACPI glue layer in the > Linux > kernel, perhaps Rafael can suggest something better. > OK, thanks for your help. Hopefully Rafael can add something to the discssion. Thanks, Hamish M
Summon Hans and Sakari for the ideas and for heads up (MIPI case can be affected as well). On Fri, Mar 15, 2024 at 12:39:03AM +0000, Hamish Martin wrote: > On Thu, 2024-03-14 at 15:18 +0200, andriy.shevchenko@linux.intel.com > wrote: > > On Thu, Mar 14, 2024 at 01:13:31AM +0000, Hamish Martin wrote: > > > On Wed, 2024-03-13 at 13:26 +0200, Andy Shevchenko wrote: ... > > > Removing the setting of the handle to invalid may be the right fix > > > but > > > I don't feel I know the code well enough to make a decision on > > > that. It > > > certainly doesn't resolve things immediately - I saw ref counting > > > warnings output. > > > > Not removing, but moving to the better place? > > Can you share warnings, though? > > > For context here is the current call chain that leads to > acpi_gpiochip_remove(): > > acpi_gpiochip_remove+0x32/0x1a0 > gpiochip_remove+0x39/0x140 > devres_release_group+0xe6/0x160 > i2c_device_remove+0x2d/0x80 > device_release_driver_internal+0x19a/0x200 > bus_remove_device+0xbf/0x100 > device_del+0x157/0x490 > ? __pfx_device_match_fwnode+0x10/0x10 > ? srso_return_thunk+0x5/0x5f > device_unregister+0xd/0x30 > i2c_acpi_notify+0x10e/0x160 > notifier_call_chain+0x58/0xd0 > blocking_notifier_call_chain+0x3a/0x60 > acpi_device_del_work_fn+0x85/0x1d0 > process_one_work+0x134/0x2f0 > worker_thread+0x2f0/0x410 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe3/0x110 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x2f/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 Right, and with invalid handle it can't process further. And note, that _pointer_ is valid, the content become unavailable. > I removed the setting of adev->handle = INVALID_ACPI_HANDLE from > acpi_scan_drop_device() and shifted it to just after the call to > blocking_notifier_call_chain() in acpi_device_del_work_fn(). > With that it seems things progress further with the call to > acpi_get_data() in acpi_gpiochip_remove() succeeding now. However, > later in acpi_gpiochip_free_regions() we hit this error: > > pca953x i2c-PRP0001:03: Failed to remove GPIO OpRegion handler > > We also get these errors: > ACPI Warning: Obj 00000000ba6a9600, Reference Count is already zero, > cannot decrement > (20230628/utdelete-422) Right, because of ACPICA (not even ACPI glue layer!) the callbacks are called when namespace node (which is acpi_handle) is being removed. I spend a few hours to understand the history of the invalidation of the handle. TBH, it looks like a hack to me, but its presence seems necessary to avoid racing with the hotplug work. It's a lot of functions that run asynchronously there and the validness of some objects is questionable. Here are the commits in question: d783156ea384 ("ACPI / scan: Define non-empty device removal handler") c27b2c33b621 ("ACPI / hotplug: Introduce common hotplug function acpi_device_hotplug()") (It doesn't mean they are bad, it means that this requires more investigation.) That said, from my perspective what should be done/clarified - the scope of ACPI data (Can we use it outside of ACPICA/ACPI glue layer or not?) - clarify in the documentation the scope / life time of the ACPI objects from the ACPICA perspective (Is it already done? Where?) - remove that invalidation hack and find better solution for the race avoidance > > P.S. > > I'm not an expert in ACPICA and low lever of ACPI glue layer in the Linux > > kernel, perhaps Rafael can suggest something better. > > > OK, thanks for your help. Hopefully Rafael can add something to the > discssion. Added more people to harvest the ideas.
On Wed, Mar 13, 2024 at 4:03 AM Hamish Martin <hamish.martin@alliedtelesis.co.nz> wrote: > > A memory leak occurs in a scenario where an ACPI SSDT overlay is removed > and that is the trigger for the removal of the acpi_gpio_chip. > This occurs because we use the ACPI_HANDLE of the chip->parent as a > convenient location to tie the 'struct acpi_gpio_chip' to, using > acpi_attach_data(). > This is fine in the usual case of removal of the 'struct acpi_gpio_chip' > via a call to acpi_gpiochip_remove() because usually the ACPI data is > still valid. > But in the case of an SSDT overlay removal, the ACPI data has been > marked as invalid before the removal code is triggered (see the setting > of the acpi_device handle to INVALID_ACPI_HANDLE in > acpi_scan_drop_device()) This is not entirely accurate. The ACPI handle of the struct acpi_device going away is replaced with INVALID_ACPI_HANDLE because by the time acpi_device_del_work_fn() runs, the namespace object identified by this handle may have been deleted already. Moreover, all of the data items attached to that object are deleted by the very acpi_ns_delete_node() call that has invoked acpi_scan_drop_device(). So acpi_scan_drop_device() must invalidate the ACPI handle in the struct acpi_device, because going forward it is not associated with a valid namespace object and all of the data items attached to it via acpi_attach_data() have been deleted. > This means that by the time we execute > acpi_gpiochip_remove(), the handles are invalid and we are unable to > retrieve the 'struct acpi_gpio_chip' using acpi_get_data(). Indeed, that information has gone away already. However, acpi_gpio_chip_dh() is called when that happens, so in principle you can know when it is happening. > Unable to get our data, we hit the failure case and see the following warning > logged: Failed to retrieve ACPI GPIO chip > This means we also fail to kfree() the struct at the end of > acpi_gpiochip_remove(). > > The fix is to no longer tie the acpi_gpio_chip data to the ACPI data, > but instead hang it directly from the 'struct gpio_chip' with a new > field. This decouples the lifecycle of the acpi_gpio_chip from the ACPI > data, and ties it to the gpio_chip. This seems a much better place since > they share a common lifecycle. I can agree with this. > The general concept of attaching data to the ACPI objects may also need > revisiting in light of the issue this patch addresses. For instance > i2c_acpi_remove_space_handler() is vulnerable to a similar leak due to > using acpi_bus_get_private_data(), which just wraps acpi_attach_data(). > This may need a more widespread change than is addressed in this patch. acpi_bus_attach_private_data() is only used in 2 places beyond this, so I'm not worried too much. But yes, generally speaking, if the ACPI namespace object you attach data to can vanish from under you as a result of an overlay removal, you better not attach data to it or use a meaningful data removal handler. > Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> > --- > drivers/gpio/gpiolib-acpi.c | 57 ++++++++----------------------------- > include/linux/gpio/driver.h | 4 +++ > 2 files changed, 16 insertions(+), 45 deletions(-) > > diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c > index cd3e9657cc36..14d29902391f 100644 > --- a/drivers/gpio/gpiolib-acpi.c > +++ b/drivers/gpio/gpiolib-acpi.c > @@ -180,11 +180,6 @@ static irqreturn_t acpi_gpio_irq_handler_evt(int irq, void *data) > return IRQ_HANDLED; > } > > -static void acpi_gpio_chip_dh(acpi_handle handle, void *data) > -{ > - /* The address of this function is used as a key. */ > -} > - > bool acpi_gpio_get_irq_resource(struct acpi_resource *ares, > struct acpi_resource_gpio **agpio) > { > @@ -500,18 +495,17 @@ void acpi_gpiochip_request_interrupts(struct gpio_chip *chip) > { > struct acpi_gpio_chip *acpi_gpio; > acpi_handle handle; > - acpi_status status; > bool defer; > > if (!chip->parent || !chip->to_irq) > return; > > - handle = ACPI_HANDLE(chip->parent); > - if (!handle) > + acpi_gpio = chip->acpi_gpio; > + if (!acpi_gpio) > return; > > - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); > - if (ACPI_FAILURE(status)) > + handle = ACPI_HANDLE(chip->parent); > + if (!handle) > return; > > if (acpi_quirk_skip_gpio_event_handlers()) > @@ -545,18 +539,12 @@ void acpi_gpiochip_free_interrupts(struct gpio_chip *chip) > { > struct acpi_gpio_chip *acpi_gpio; > struct acpi_gpio_event *event, *ep; > - acpi_handle handle; > - acpi_status status; > > if (!chip->parent || !chip->to_irq) > return; > > - handle = ACPI_HANDLE(chip->parent); > - if (!handle) > - return; > - > - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); > - if (ACPI_FAILURE(status)) > + acpi_gpio = chip->acpi_gpio; > + if (!acpi_gpio) > return; > > mutex_lock(&acpi_gpio_deferred_req_irqs_lock); > @@ -1218,15 +1206,10 @@ static void acpi_gpiochip_free_regions(struct acpi_gpio_chip *achip) > struct gpio_chip *chip = achip->chip; > acpi_handle handle = ACPI_HANDLE(chip->parent); > struct acpi_gpio_connection *conn, *tmp; > - acpi_status status; > > - status = acpi_remove_address_space_handler(handle, ACPI_ADR_SPACE_GPIO, > - acpi_gpio_adr_space_handler); > - if (ACPI_FAILURE(status)) { > - dev_err(chip->parent, > - "Failed to remove GPIO OpRegion handler\n"); > - return; > - } > + /* Ignore the return status as the handle is likely no longer valid. */ > + acpi_remove_address_space_handler(handle, ACPI_ADR_SPACE_GPIO, > + acpi_gpio_adr_space_handler); > > list_for_each_entry_safe_reverse(conn, tmp, &achip->conns, node) { > gpiochip_free_own_desc(conn->desc); > @@ -1310,7 +1293,6 @@ void acpi_gpiochip_add(struct gpio_chip *chip) > { > struct acpi_gpio_chip *acpi_gpio; > struct acpi_device *adev; > - acpi_status status; > > if (!chip || !chip->parent) > return; > @@ -1327,16 +1309,10 @@ void acpi_gpiochip_add(struct gpio_chip *chip) > } > > acpi_gpio->chip = chip; > + chip->acpi_gpio = acpi_gpio; > INIT_LIST_HEAD(&acpi_gpio->events); > INIT_LIST_HEAD(&acpi_gpio->deferred_req_irqs_list_entry); > > - status = acpi_attach_data(adev->handle, acpi_gpio_chip_dh, acpi_gpio); > - if (ACPI_FAILURE(status)) { > - dev_err(chip->parent, "Failed to attach ACPI GPIO chip\n"); > - kfree(acpi_gpio); > - return; > - } > - > acpi_gpiochip_request_regions(acpi_gpio); > acpi_gpiochip_scan_gpios(acpi_gpio); > acpi_dev_clear_dependencies(adev); > @@ -1345,25 +1321,16 @@ void acpi_gpiochip_add(struct gpio_chip *chip) > void acpi_gpiochip_remove(struct gpio_chip *chip) > { > struct acpi_gpio_chip *acpi_gpio; > - acpi_handle handle; > - acpi_status status; > > if (!chip || !chip->parent) > return; > > - handle = ACPI_HANDLE(chip->parent); > - if (!handle) > + acpi_gpio = chip->acpi_gpio; > + if (!acpi_gpio) > return; > > - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); > - if (ACPI_FAILURE(status)) { > - dev_warn(chip->parent, "Failed to retrieve ACPI GPIO chip\n"); > - return; > - } > - > acpi_gpiochip_free_regions(acpi_gpio); > > - acpi_detach_data(handle, acpi_gpio_chip_dh); > kfree(acpi_gpio); > } > > diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h > index 7f75c9a51874..698b92b1764c 100644 > --- a/include/linux/gpio/driver.h > +++ b/include/linux/gpio/driver.h > @@ -529,6 +529,10 @@ struct gpio_chip { > int (*of_xlate)(struct gpio_chip *gc, > const struct of_phandle_args *gpiospec, u32 *flags); > #endif /* CONFIG_OF_GPIO */ > + > +#ifdef CONFIG_GPIO_ACPI > + struct acpi_gpio_chip *acpi_gpio; > +#endif /* CONFIG_GPIO_ACPI */ > }; > > char *gpiochip_dup_line_label(struct gpio_chip *gc, unsigned int offset); > -- > 2.43.2 > >
diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c index cd3e9657cc36..14d29902391f 100644 --- a/drivers/gpio/gpiolib-acpi.c +++ b/drivers/gpio/gpiolib-acpi.c @@ -180,11 +180,6 @@ static irqreturn_t acpi_gpio_irq_handler_evt(int irq, void *data) return IRQ_HANDLED; } -static void acpi_gpio_chip_dh(acpi_handle handle, void *data) -{ - /* The address of this function is used as a key. */ -} - bool acpi_gpio_get_irq_resource(struct acpi_resource *ares, struct acpi_resource_gpio **agpio) { @@ -500,18 +495,17 @@ void acpi_gpiochip_request_interrupts(struct gpio_chip *chip) { struct acpi_gpio_chip *acpi_gpio; acpi_handle handle; - acpi_status status; bool defer; if (!chip->parent || !chip->to_irq) return; - handle = ACPI_HANDLE(chip->parent); - if (!handle) + acpi_gpio = chip->acpi_gpio; + if (!acpi_gpio) return; - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); - if (ACPI_FAILURE(status)) + handle = ACPI_HANDLE(chip->parent); + if (!handle) return; if (acpi_quirk_skip_gpio_event_handlers()) @@ -545,18 +539,12 @@ void acpi_gpiochip_free_interrupts(struct gpio_chip *chip) { struct acpi_gpio_chip *acpi_gpio; struct acpi_gpio_event *event, *ep; - acpi_handle handle; - acpi_status status; if (!chip->parent || !chip->to_irq) return; - handle = ACPI_HANDLE(chip->parent); - if (!handle) - return; - - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); - if (ACPI_FAILURE(status)) + acpi_gpio = chip->acpi_gpio; + if (!acpi_gpio) return; mutex_lock(&acpi_gpio_deferred_req_irqs_lock); @@ -1218,15 +1206,10 @@ static void acpi_gpiochip_free_regions(struct acpi_gpio_chip *achip) struct gpio_chip *chip = achip->chip; acpi_handle handle = ACPI_HANDLE(chip->parent); struct acpi_gpio_connection *conn, *tmp; - acpi_status status; - status = acpi_remove_address_space_handler(handle, ACPI_ADR_SPACE_GPIO, - acpi_gpio_adr_space_handler); - if (ACPI_FAILURE(status)) { - dev_err(chip->parent, - "Failed to remove GPIO OpRegion handler\n"); - return; - } + /* Ignore the return status as the handle is likely no longer valid. */ + acpi_remove_address_space_handler(handle, ACPI_ADR_SPACE_GPIO, + acpi_gpio_adr_space_handler); list_for_each_entry_safe_reverse(conn, tmp, &achip->conns, node) { gpiochip_free_own_desc(conn->desc); @@ -1310,7 +1293,6 @@ void acpi_gpiochip_add(struct gpio_chip *chip) { struct acpi_gpio_chip *acpi_gpio; struct acpi_device *adev; - acpi_status status; if (!chip || !chip->parent) return; @@ -1327,16 +1309,10 @@ void acpi_gpiochip_add(struct gpio_chip *chip) } acpi_gpio->chip = chip; + chip->acpi_gpio = acpi_gpio; INIT_LIST_HEAD(&acpi_gpio->events); INIT_LIST_HEAD(&acpi_gpio->deferred_req_irqs_list_entry); - status = acpi_attach_data(adev->handle, acpi_gpio_chip_dh, acpi_gpio); - if (ACPI_FAILURE(status)) { - dev_err(chip->parent, "Failed to attach ACPI GPIO chip\n"); - kfree(acpi_gpio); - return; - } - acpi_gpiochip_request_regions(acpi_gpio); acpi_gpiochip_scan_gpios(acpi_gpio); acpi_dev_clear_dependencies(adev); @@ -1345,25 +1321,16 @@ void acpi_gpiochip_add(struct gpio_chip *chip) void acpi_gpiochip_remove(struct gpio_chip *chip) { struct acpi_gpio_chip *acpi_gpio; - acpi_handle handle; - acpi_status status; if (!chip || !chip->parent) return; - handle = ACPI_HANDLE(chip->parent); - if (!handle) + acpi_gpio = chip->acpi_gpio; + if (!acpi_gpio) return; - status = acpi_get_data(handle, acpi_gpio_chip_dh, (void **)&acpi_gpio); - if (ACPI_FAILURE(status)) { - dev_warn(chip->parent, "Failed to retrieve ACPI GPIO chip\n"); - return; - } - acpi_gpiochip_free_regions(acpi_gpio); - acpi_detach_data(handle, acpi_gpio_chip_dh); kfree(acpi_gpio); } diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 7f75c9a51874..698b92b1764c 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -529,6 +529,10 @@ struct gpio_chip { int (*of_xlate)(struct gpio_chip *gc, const struct of_phandle_args *gpiospec, u32 *flags); #endif /* CONFIG_OF_GPIO */ + +#ifdef CONFIG_GPIO_ACPI + struct acpi_gpio_chip *acpi_gpio; +#endif /* CONFIG_GPIO_ACPI */ }; char *gpiochip_dup_line_label(struct gpio_chip *gc, unsigned int offset);
A memory leak occurs in a scenario where an ACPI SSDT overlay is removed and that is the trigger for the removal of the acpi_gpio_chip. This occurs because we use the ACPI_HANDLE of the chip->parent as a convenient location to tie the 'struct acpi_gpio_chip' to, using acpi_attach_data(). This is fine in the usual case of removal of the 'struct acpi_gpio_chip' via a call to acpi_gpiochip_remove() because usually the ACPI data is still valid. But in the case of an SSDT overlay removal, the ACPI data has been marked as invalid before the removal code is triggered (see the setting of the acpi_device handle to INVALID_ACPI_HANDLE in acpi_scan_drop_device()). This means that by the time we execute acpi_gpiochip_remove(), the handles are invalid and we are unable to retrieve the 'struct acpi_gpio_chip' using acpi_get_data(). Unable to get our data, we hit the failure case and see the following warning logged: Failed to retrieve ACPI GPIO chip This means we also fail to kfree() the struct at the end of acpi_gpiochip_remove(). The fix is to no longer tie the acpi_gpio_chip data to the ACPI data, but instead hang it directly from the 'struct gpio_chip' with a new field. This decouples the lifecycle of the acpi_gpio_chip from the ACPI data, and ties it to the gpio_chip. This seems a much better place since they share a common lifecycle. The general concept of attaching data to the ACPI objects may also need revisiting in light of the issue this patch addresses. For instance i2c_acpi_remove_space_handler() is vulnerable to a similar leak due to using acpi_bus_get_private_data(), which just wraps acpi_attach_data(). This may need a more widespread change than is addressed in this patch. Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> --- drivers/gpio/gpiolib-acpi.c | 57 ++++++++----------------------------- include/linux/gpio/driver.h | 4 +++ 2 files changed, 16 insertions(+), 45 deletions(-)