mbox series

[0/1] I2C mux ACPI SSDT overlay removal issue

Message ID 20240228023925.2814638-1-hamish.martin@alliedtelesis.co.nz
Headers show
Series I2C mux ACPI SSDT overlay removal issue | expand

Message

Hamish Martin Feb. 28, 2024, 2:39 a.m. UTC
I have found an issue with ACPI overlay table removal specifically related to I2C multiplexers.

Consider an ACPI SSDT Overlay such as the following which defines a PCA9548 I2C mux "MUX0" on the existing I2C bus "\_SB.I2CA":

DefinitionBlock ("my_mux.asl", "SSDT", 1, "ATL", "RMUX", 0x00000001)
{
    External (\_SB.I2CA, DeviceObj)

    Scope (\_SB.I2CA)
    {
        // i2c mux (8-Channel)
        Device (MUX0)
        {
            Name (_HID, "PRP0001")  // Use compatible string
            Name (_CRS, ResourceTemplate ()
            {
                I2cSerialBusV2 (0x70, ControllerInitiated,
                                100000, AddressingMode7Bit,
                                "\\_SB.I2CA", 0x00,
                                ResourceConsumer, , Exclusive, )
            })
            Name (_DSD, Package ()
            {
                ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), // Device Properties UUID
                Package ()
                {
                    Package () { "name", "cex_direct_mux" },
                    Package () { "compatible", "nxp,pca9548" },
                    Package () { "i2c-mux-idle-disconnect", "" },
                }
            })

            Device (CH00) { name (_ADR, 0) }
            Device (CH01) { name (_ADR, 1) }
            Device (CH02) { name (_ADR, 2) }
            Device (CH03) { name (_ADR, 3) }
            Device (CH04) { name (_ADR, 4) }
            Device (CH05) { name (_ADR, 5) }
            Device (CH06) { name (_ADR, 6) }
            Device (CH07) { name (_ADR, 7) }
        }
    }
}

When this table is loaded (using configfs method) we see the creation of a device for the overall PCA9548 chip and 8 further devices - one i2c_adapter each for the mux channels. These are all bound to their ACPI equivalents via an eventual invocation of acpi_bind_one().

When we unload the SSDT overlay (rmdir /sys/kernel/config/acpi/table/my_mux/) we run into the problem. The ACPI devices are deleted via acpi_device_del_work_fn() and the acpi_device_del_list. 

The following warning and stack trace is output as the deletion does not go smoothly:
# rmdir /sys/kernel/config/acpi/table/my_mux/
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS:  0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
 <TASK>
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? __warn+0x7c/0x130
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? report_bug+0x171/0x1a0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 acpi_unbind_one+0x108/0x180
 device_del+0x18b/0x490
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 device_unregister+0xd/0x30
 i2c_del_adapter.part.0+0x1bf/0x250
 i2c_mux_del_adapters+0xa1/0xe0
 i2c_device_remove+0x1e/0x80
 device_release_driver_internal+0x19a/0x200
 bus_remove_device+0xbf/0x100
 device_del+0x157/0x490
 ? __pfx_device_match_fwnode+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 device_unregister+0xd/0x30
 i2c_acpi_notify+0x10f/0x140
 notifier_call_chain+0x58/0xd0
 blocking_notifier_call_chain+0x3a/0x60
 acpi_device_del_work_fn+0x85/0x1d0
 process_one_work+0x134/0x2f0
 worker_thread+0x2f0/0x410
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe3/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...

The issue as I see it is that the binding of the ACPI devices to their peer I2C adapters is not correctly cleaned up. 
Digging deeper into the issue we see that the deletion order is such that the ACPI devices matching the mux channel i2c adapters are deleted first during the SSDT overlay removal. For each of the channels we see a call to i2c_acpi_notify() with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not actually i2c_clients, nothing is done for them. 

Later on, after each of the mux channels has been dealt with, we come to delete the i2c_client representing the PCA9548 device. This is the call stack we see above, whereby the kernel cleans up the i2c_client including destruction of the mux and it's channel adapters (i2c_mux_del_adapters() -> i2c_del_adapter()). At this point we do attempt to unbind from the ACPI peers but those peers no longer exist and so we hit the kernfs errors.

My proposed fix is to augment i2c_acpi_notify() to handle i2c_adapters. But, given that the life cycle of the adapters is linked to the i2c_client, instead of deleting the i2c_adapters during the i2c_acpi_notify(), we instead just trigger unbinding of the ACPI device from the adapter device, and allow the cleanup of the adapter to continue in the way it always has.

This fix resolves my specific issue with the I2C mux removal. However, I'm not sure about whether there is a better fix for this issue so if you have suggestions please let me know. I'm not sure if there needs to be further code to ensure this new code only runs for this specific case where the i2c_adapters are part of an i2c mux. Guidance on this would be appreciated.

Hamish Martin (1):
  i2c: acpi: Unbind mux adapters before delete

 drivers/i2c/i2c-core-acpi.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Mika Westerberg Feb. 29, 2024, 9:54 a.m. UTC | #1
On Wed, Feb 28, 2024 at 03:39:25PM +1300, Hamish Martin wrote:
> Unbind any i2c_adapters matching an ACPI device being removed. This
> ensures that the linkage between the ACPI device and the i2c_adapter is
> correctly broken before the ACPI device is destroyed.

I suggest to move some of the information from the cover letter here to
make sure it is not lost. It includes good analysis of the problem so
should be part of the changelog.

> Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
> ---
>  drivers/i2c/i2c-core-acpi.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
> index d6037a328669..67fa8deccef6 100644
> --- a/drivers/i2c/i2c-core-acpi.c
> +++ b/drivers/i2c/i2c-core-acpi.c
> @@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
>  	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
>  }
>  
> +static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
> +{
> +	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
> +}
> +
>  static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
>  			   void *arg)
>  {
> @@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
>  			break;
>  
>  		client = i2c_acpi_find_client_by_adev(adev);
> -		if (!client)
> -			break;
> +		if (client) {
> +			i2c_unregister_device(client);
> +			put_device(&client->dev);
> +		}
> +
> +		adapter = i2c_acpi_find_adapter_by_adev(adev);
> +		if (adapter) {
> +			acpi_device_notify_remove(&adapter->dev);
> +			put_device(&adapter->dev);
> +		}

This looks good to me. I'm just wondering whether we have the same issue
on DT side too?

>  
> -		i2c_unregister_device(client);
> -		put_device(&client->dev);
>  		break;
>  	}
>  
> -- 
> 2.43.0