Message ID | 20230726123518.2361181-2-imammedo@redhat.com |
---|---|
State | New |
Headers | show |
Series | PCI: acpiphp: fix regression introduced by 'Reassign resources on bridge if necessary' | expand |
On Wed, Jul 26, 2023 at 2:35 PM Igor Mammedov <imammedo@redhat.com> wrote: > > Commit [1] switched acpiphp hotplug to use > pci_assign_unassigned_bridge_resources() > which depends on bridge being available, however in some cases > when acpiphp is in use, enable_slot() can get a slot without > bridge associated. > 1. legitimate case of hotplug on root bus > (likely not exiting on real hw, but widely used in virt world) > 2. broken firmware, that sends 'Bus check' events to non > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > without bridge assigned to it. > > Issue is easy to reproduce with QEMU's 'pc' machine provides > PCI hotplug on hostbridge slots. to reproduce boot kernel at > commit [1] in VM started with followin CLI and hotplug a device: > > once guest OS is fully booted at qemu prompt: > > (qemu) device_add e1000 > > it will cause NULL pointer dereference at > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > { > struct pci_bus *parent = bridge->subordinate; > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > [...] > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > [ 612.277809] enable_slot+0x21f/0x3e0 > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > [ 612.277839] process_one_work+0x1f7/0x370 > [ 612.277845] worker_thread+0x45/0x3b0 > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > [ 612.277854] kthread+0xdc/0x110 > [ 612.277860] ? __pfx_kthread+0x10/0x10 > [ 612.277866] ret_from_fork+0x28/0x40 > [ 612.277871] ? __pfx_kthread+0x10/0x10 > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > following sequence: > 1. suspend to RAM > 2. wake up with the same backtrace being observed: > 3. 2nd suspend to RAM attempt makes laptop freeze > > Fix it by using __pci_bus_assign_resources() instead of > pci_assign_unassigned_bridge_resources()as we used to do > but only in case when bus doesn't have a bridge associated > with it. > > That let us keep hotplug on root bus working like it used to be > but at the same time keeps resource reassignment usable on > root ports (and other 1st level bridges) that was fixed by [1]. > > 1) > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > Reported-by: Woody Suwalski <terraluna977@gmail.com> > Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> > --- > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index 328d1e416014..3bc4e1f3efee 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > acpiphp_native_scan_bridge(dev); > } > } else { > + LIST_HEAD(add_list); > int max, pass; > > acpiphp_rescan_slot(slot); > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (pass && dev->subordinate) { > check_hotplug_bridge(slot, dev); > pcibios_resource_survey_bus(dev->subordinate); > + if (!bus->self) > + __pci_bus_size_bridges(dev->subordinate, &add_list); > } > } > } > - pci_assign_unassigned_bridge_resources(bus->self); > + if (bus->self) > + pci_assign_unassigned_bridge_resources(bus->self); > + else > + __pci_bus_assign_resources(bus, &add_list, NULL); > } > > acpiphp_sanitize_bus(bus); > -- > 2.39.3 >
On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > Commit [1] switched acpiphp hotplug to use > pci_assign_unassigned_bridge_resources() > which depends on bridge being available, however in some cases > when acpiphp is in use, enable_slot() can get a slot without > bridge associated. > 1. legitimate case of hotplug on root bus > (likely not exiting on real hw, but widely used in virt world) > 2. broken firmware, that sends 'Bus check' events to non > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > without bridge assigned to it. > > Issue is easy to reproduce with QEMU's 'pc' machine provides > PCI hotplug on hostbridge slots. to reproduce boot kernel at > commit [1] in VM started with followin CLI and hotplug a device: > > once guest OS is fully booted at qemu prompt: > > (qemu) device_add e1000 > > it will cause NULL pointer dereference at > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > { > struct pci_bus *parent = bridge->subordinate; > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > [...] > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > [ 612.277809] enable_slot+0x21f/0x3e0 > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > [ 612.277839] process_one_work+0x1f7/0x370 > [ 612.277845] worker_thread+0x45/0x3b0 > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > [ 612.277854] kthread+0xdc/0x110 > [ 612.277860] ? __pfx_kthread+0x10/0x10 > [ 612.277866] ret_from_fork+0x28/0x40 > [ 612.277871] ? __pfx_kthread+0x10/0x10 > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > following sequence: > 1. suspend to RAM > 2. wake up with the same backtrace being observed: > 3. 2nd suspend to RAM attempt makes laptop freeze > > Fix it by using __pci_bus_assign_resources() instead of > pci_assign_unassigned_bridge_resources()as we used to do > but only in case when bus doesn't have a bridge associated > with it. > > That let us keep hotplug on root bus working like it used to be > but at the same time keeps resource reassignment usable on > root ports (and other 1st level bridges) that was fixed by [1]. > > 1) > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > Reported-by: Woody Suwalski <terraluna977@gmail.com> > Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> > --- > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index 328d1e416014..3bc4e1f3efee 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > acpiphp_native_scan_bridge(dev); > } > } else { > + LIST_HEAD(add_list); > int max, pass; > > acpiphp_rescan_slot(slot); > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (pass && dev->subordinate) { > check_hotplug_bridge(slot, dev); > pcibios_resource_survey_bus(dev->subordinate); > + if (!bus->self) > + __pci_bus_size_bridges(dev->subordinate, &add_list); > } > } > } > - pci_assign_unassigned_bridge_resources(bus->self); > + if (bus->self) > + pci_assign_unassigned_bridge_resources(bus->self); > + else > + __pci_bus_assign_resources(bus, &add_list, NULL); > } > > acpiphp_sanitize_bus(bus); > -- > 2.39.3
Thank you to both you and Woody for chasing this down! On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > Commit [1] switched acpiphp hotplug to use > pci_assign_unassigned_bridge_resources() > which depends on bridge being available, however in some cases > when acpiphp is in use, enable_slot() can get a slot without > bridge associated. > 1. legitimate case of hotplug on root bus > (likely not exiting on real hw, but widely used in virt world) > 2. broken firmware, that sends 'Bus check' events to non > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > without bridge assigned to it. Do we have evidence about the details of this non-existent root port? If we do, I think it would be interesting to include a URL to them in case there's some hole in the way we handle Bus Check events. > Issue is easy to reproduce with QEMU's 'pc' machine provides > PCI hotplug on hostbridge slots. to reproduce boot kernel at > commit [1] in VM started with followin CLI and hotplug a device: You mention CLI; did you mean to include a qemu command line here? Maybe it's the same thing mentioned in the 40613da52b13 commit log? I tried briefly to reproduce this using the 40613da52b13 command line but haven't quite got it going yet. I think it would be very useful to either include it here again or point to the 40613da52b13 commit log. > once guest OS is fully booted at qemu prompt: > > (qemu) device_add e1000 > > it will cause NULL pointer dereference at > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > { > struct pci_bus *parent = bridge->subordinate; > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > [...] > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > [ 612.277809] enable_slot+0x21f/0x3e0 > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > [ 612.277839] process_one_work+0x1f7/0x370 > [ 612.277845] worker_thread+0x45/0x3b0 > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > [ 612.277854] kthread+0xdc/0x110 > [ 612.277860] ? __pfx_kthread+0x10/0x10 > [ 612.277866] ret_from_fork+0x28/0x40 > [ 612.277871] ? __pfx_kthread+0x10/0x10 > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > following sequence: > 1. suspend to RAM > 2. wake up with the same backtrace being observed: > 3. 2nd suspend to RAM attempt makes laptop freeze > > Fix it by using __pci_bus_assign_resources() instead of > pci_assign_unassigned_bridge_resources()as we used to do > but only in case when bus doesn't have a bridge associated > with it. > > That let us keep hotplug on root bus working like it used to be > but at the same time keeps resource reassignment usable on > root ports (and other 1st level bridges) that was fixed by [1]. > > 1) > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > Reported-by: Woody Suwalski <terraluna977@gmail.com> > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index 328d1e416014..3bc4e1f3efee 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > acpiphp_native_scan_bridge(dev); > } > } else { > + LIST_HEAD(add_list); > int max, pass; > > acpiphp_rescan_slot(slot); > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (pass && dev->subordinate) { > check_hotplug_bridge(slot, dev); > pcibios_resource_survey_bus(dev->subordinate); > + if (!bus->self) > + __pci_bus_size_bridges(dev->subordinate, &add_list); > } > } > } > - pci_assign_unassigned_bridge_resources(bus->self); > + if (bus->self) > + pci_assign_unassigned_bridge_resources(bus->self); > + else > + __pci_bus_assign_resources(bus, &add_list, NULL); > } > > acpiphp_sanitize_bus(bus); > -- > 2.39.3 >
On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > Thank you to both you and Woody for chasing this down! > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > > Commit [1] switched acpiphp hotplug to use > > pci_assign_unassigned_bridge_resources() > > which depends on bridge being available, however in some cases > > when acpiphp is in use, enable_slot() can get a slot without > > bridge associated. > > 1. legitimate case of hotplug on root bus > > (likely not exiting on real hw, but widely used in virt world) > > 2. broken firmware, that sends 'Bus check' events to non > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > > without bridge assigned to it. > > Do we have evidence about the details of this non-existent root port? > If we do, I think it would be interesting to include a URL to them in > case there's some hole in the way we handle Bus Check events. it's scattered over logs Woody has provided, here are links to emails with 1: lspci output https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@gmail.com/ according to lscpi and dmesg there is only one root-port at 1c.0 which is occupied by wifi card while DSTD table has more ports described, which is fine as long as missing/disabled are not reported as present. 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/ https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@gmail.com/ here dmesg shows 1st correct port ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34 and then later on ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 which aren't recognized as bridge I don't know ACPICA code enough to guesstimate where we might miss a check that device is actually exists to do further debug over mail list within reasonable timeframe. > > Issue is easy to reproduce with QEMU's 'pc' machine provides > > PCI hotplug on hostbridge slots. to reproduce boot kernel at > > commit [1] in VM started with followin CLI and hotplug a device: > > You mention CLI; did you mean to include a qemu command line here? > Maybe it's the same thing mentioned in the 40613da52b13 commit log? > I tried briefly to reproduce this using the 40613da52b13 command line > but haven't quite got it going yet. I think it would be very useful > to either include it here again or point to the 40613da52b13 commit > log. my bad, I didn't realize that saying 'pc' machine is not sufficient. minimal CLI can be (important part '-M pc -monitor stdio', the rest is for making guest boot and run at tolerable speed): $QEMU -M pc -m 4G -monitor stdio -cpu host --enable-kvm vm_disk_image Will you amend commit message or shall I repost with changes/Acks? > > once guest OS is fully booted at qemu prompt: > > > > (qemu) device_add e1000 > > > > it will cause NULL pointer dereference at > > > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > > { > > struct pci_bus *parent = bridge->subordinate; > > > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > > [...] > > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > > [ 612.277809] enable_slot+0x21f/0x3e0 > > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > > [ 612.277839] process_one_work+0x1f7/0x370 > > [ 612.277845] worker_thread+0x45/0x3b0 > > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > > [ 612.277854] kthread+0xdc/0x110 > > [ 612.277860] ? __pfx_kthread+0x10/0x10 > > [ 612.277866] ret_from_fork+0x28/0x40 > > [ 612.277871] ? __pfx_kthread+0x10/0x10 > > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > > following sequence: > > 1. suspend to RAM > > 2. wake up with the same backtrace being observed: > > 3. 2nd suspend to RAM attempt makes laptop freeze > > > > Fix it by using __pci_bus_assign_resources() instead of > > pci_assign_unassigned_bridge_resources()as we used to do > > but only in case when bus doesn't have a bridge associated > > with it. > > > > That let us keep hotplug on root bus working like it used to be > > but at the same time keeps resource reassignment usable on > > root ports (and other 1st level bridges) that was fixed by [1]. > > > > 1) > > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > > Reported-by: Woody Suwalski <terraluna977@gmail.com> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > > index 328d1e416014..3bc4e1f3efee 100644 > > --- a/drivers/pci/hotplug/acpiphp_glue.c > > +++ b/drivers/pci/hotplug/acpiphp_glue.c > > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > acpiphp_native_scan_bridge(dev); > > } > > } else { > > + LIST_HEAD(add_list); > > int max, pass; > > > > acpiphp_rescan_slot(slot); > > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > if (pass && dev->subordinate) { > > check_hotplug_bridge(slot, dev); > > pcibios_resource_survey_bus(dev->subordinate); > > + if (!bus->self) > > + __pci_bus_size_bridges(dev->subordinate, &add_list); > > } > > } > > } > > - pci_assign_unassigned_bridge_resources(bus->self); > > + if (bus->self) > > + pci_assign_unassigned_bridge_resources(bus->self); > > + else > > + __pci_bus_assign_resources(bus, &add_list, NULL); > > } > > > > acpiphp_sanitize_bus(bus); > > -- > > 2.39.3 > > >
On Fri, Jul 28, 2023 at 11:32:16AM +0200, Igor Mammedov wrote: > On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > > > Commit [1] switched acpiphp hotplug to use > > > pci_assign_unassigned_bridge_resources() > > > which depends on bridge being available, however in some cases > > > when acpiphp is in use, enable_slot() can get a slot without > > > bridge associated. > > > 1. legitimate case of hotplug on root bus > > > (likely not exiting on real hw, but widely used in virt world) > > > 2. broken firmware, that sends 'Bus check' events to non > > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > > > without bridge assigned to it. > > > > Do we have evidence about the details of this non-existent root port? > > If we do, I think it would be interesting to include a URL to them in > > case there's some hole in the way we handle Bus Check events. > > it's scattered over logs Woody has provided, here are links to > emails with > 1: lspci output > https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@gmail.com/ > > according to lscpi and dmesg there is only one root-port at 1c.0 > which is occupied by wifi card > > while DSTD table has more ports described, which is fine as long as > missing/disabled are not reported as present. > > 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/ > https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@gmail.com/ > > here dmesg shows 1st correct port > ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34 > and then later on > ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > which aren't recognized as bridge Thanks, that does seem a little suspect. ACPI r6.5 sec 5.6.6 says that when OSPM handles a Bus Check, it should "perform a Plug and Play re-enumeration operation on the device tree starting from the point where it has been notified." PCI devices are enumerated by doing PCI config reads. It would make sense to re-enumerate a PCI hierarchy starting with a PCI device that's already known to the OS, e.g., by scanning the secondary bus of a PCI-to-PCI bridge. I think there are two problems here: 1) The platform shouldn't send a Bus Check notification to a PCI device that doesn't exist. How could the OS re-enumerate starting there? 2) Linux runs acpiphp_hotplug_notify() for Bus Checks to non-existent PCI devices when it ignore them; reasoning below. We call acpiphp_enumerate_slots() in this path, which happens before any of the PCI devices on the root bus have been enumerated: pci_register_host_bridge pcibios_add_bus(root bus) acpi_pci_add_bus acpiphp_enumerate_slots(pci_bus *bus) acpi_walk_namespace(acpiphp_add_context) acpiphp_add_context(struct acpiphp_bridge *) acpi_evaluate_integer("_ADR") acpiphp_init_context context->hp.notify = acpiphp_hotplug_notify So now we've already looked at RP03, RP07, and RP08, and set up the .notify() handler for all of them. Since we haven't scanned the bus yet, we don't know that RP03 exists and RP07 and RP08 do not. Per ACPI r6.5, sec 6, all these Device objects are "Augmented Device Descriptors": An Agumented [sic] Device Descriptor, which contains additional device information that is not provided from the Device itself, yet is needed by the Device or Bus driver in order to properly configure and use the device. This type of device is enumerated by a bus-specific enumeration mechanism, and OSPM uses the Address (_ADR) to match the ACPI Device object in the Namespace to the device discovered through bus enumeration. I think that means OSPM should discover a PCI device using the PCI bus-specific enumeration mechanism (i.e., config reads) before it even looks for a corresponding ACPI Device object, and it should only set up .notify() for PCI devices that actually exist, so the Bus Checks on RP07 and RP08 would be ignored and we wouldn't even get into the path that causes the NULL pointer dereference: acpi_device_hotplug acpiphp_hotplug_notify # from hp.notify hotplug_event bridge = context->bridge case BUS_CHECK: if (bridge) acpiphp_check_bridge else if (!SLOT_IS_GOING_AWAY) enable_slot bus = slot->bus # "bus" is a root bus pci_assign_unassigned_bridge_resources(bus->self) bridge = bus->self # "bridge" is NULL since # bus->self is NULL for root buses struct pci_bus *parent = bridge->subordinate # NULL pointer dereference Obviously none of this helps solve the current regression. Changing the .notify() setup would be a big change, it would be risky because it might affect dock support, and it still wouldn't fix your case 1 of hotplug on the root bus in a virtualized environment. > > > Issue is easy to reproduce with QEMU's 'pc' machine provides > > > PCI hotplug on hostbridge slots. to reproduce boot kernel at > > > commit [1] in VM started with followin CLI and hotplug a device: > > > > You mention CLI; did you mean to include a qemu command line here? > > Maybe it's the same thing mentioned in the 40613da52b13 commit log? > > I tried briefly to reproduce this using the 40613da52b13 command line > > but haven't quite got it going yet. I think it would be very useful > > to either include it here again or point to the 40613da52b13 commit > > log. > > my bad, I didn't realize that saying 'pc' machine is not sufficient. > > minimal CLI can be (important part '-M pc -monitor stdio', > the rest is for making guest boot and run at tolerable speed): > > $QEMU -M pc -m 4G -monitor stdio -cpu host --enable-kvm vm_disk_image > > Will you amend commit message or shall I repost with changes/Acks? I'll give it a shot and post it for your comments. > > > once guest OS is fully booted at qemu prompt: > > > > > > (qemu) device_add e1000 > > > > > > it will cause NULL pointer dereference at > > > > > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > > > { > > > struct pci_bus *parent = bridge->subordinate; This worked for me (after setting CONFIG_HOTPLUG_PCI_ACPI=y :)): $ qemu-system-x86_64 -M pc -m 512M -monitor stdio -cpu host --enable-kvm -kernel arch/x86/boot/bzImage -drive format=raw,file=ubuntu.img -append "root=/dev/sda1" (qemu) device_add e1000 (For posterity, replacing "-monitor stdio" with "-nographic -monitor telnet:localhost:7001,server,nowait,nodelay" and adding "console=ttyS0,115200n8" to the -append made it easier to see the crash details.) > > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > > > [...] > > > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > > > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > > > [ 612.277809] enable_slot+0x21f/0x3e0 > > > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > > > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > > > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > > > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > > > [ 612.277839] process_one_work+0x1f7/0x370 > > > [ 612.277845] worker_thread+0x45/0x3b0 > > > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > > > [ 612.277854] kthread+0xdc/0x110 > > > [ 612.277860] ? __pfx_kthread+0x10/0x10 > > > [ 612.277866] ret_from_fork+0x28/0x40 > > > [ 612.277871] ? __pfx_kthread+0x10/0x10 > > > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > > > > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > > > following sequence: > > > 1. suspend to RAM > > > 2. wake up with the same backtrace being observed: > > > 3. 2nd suspend to RAM attempt makes laptop freeze > > > > > > Fix it by using __pci_bus_assign_resources() instead of > > > pci_assign_unassigned_bridge_resources()as we used to do > > > but only in case when bus doesn't have a bridge associated > > > with it. > > > > > > That let us keep hotplug on root bus working like it used to be > > > but at the same time keeps resource reassignment usable on > > > root ports (and other 1st level bridges) that was fixed by [1]. > > > > > > 1) > > > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > > > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > > > Reported-by: Woody Suwalski <terraluna977@gmail.com> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > --- > > > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > > > index 328d1e416014..3bc4e1f3efee 100644 > > > --- a/drivers/pci/hotplug/acpiphp_glue.c > > > +++ b/drivers/pci/hotplug/acpiphp_glue.c > > > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > > acpiphp_native_scan_bridge(dev); > > > } > > > } else { > > > + LIST_HEAD(add_list); > > > int max, pass; > > > > > > acpiphp_rescan_slot(slot); > > > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > > if (pass && dev->subordinate) { > > > check_hotplug_bridge(slot, dev); > > > pcibios_resource_survey_bus(dev->subordinate); > > > + if (!bus->self) > > > + __pci_bus_size_bridges(dev->subordinate, &add_list); > > > } > > > } > > > } > > > - pci_assign_unassigned_bridge_resources(bus->self); > > > + if (bus->self) > > > + pci_assign_unassigned_bridge_resources(bus->self); > > > + else > > > + __pci_bus_assign_resources(bus, &add_list, NULL); I really wish we didn't have such different resource assignment paths depending on whether the device is on a root bus or deeper in the hierarchy. But we can't fix that now, so this seems like the right thing. But would you be OK with this minor mod? if (pci_is_root_bus(bus)) __pci_bus_size_bridges(dev->subordinate, &add_list); ... if (pci_is_root_bus(bus)) __pci_bus_assign_resources(bus, &add_list, NULL); else pci_assign_unassigned_bridge_resources(bus->self); For two reasons: (1) test the same condition both places, and (2) be a little more explicit about the scenario (and "bus->self == NULL" also happens for the virtual buses added for SR-IOV). > > > } > > > > > > acpiphp_sanitize_bus(bus); > > > -- > > > 2.39.3 > > > > > >
On Sat, 29 Jul 2023 16:50:09 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > On Fri, Jul 28, 2023 at 11:32:16AM +0200, Igor Mammedov wrote: > > On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > > > > Commit [1] switched acpiphp hotplug to use > > > > pci_assign_unassigned_bridge_resources() > > > > which depends on bridge being available, however in some cases > > > > when acpiphp is in use, enable_slot() can get a slot without > > > > bridge associated. > > > > 1. legitimate case of hotplug on root bus > > > > (likely not exiting on real hw, but widely used in virt world) > > > > 2. broken firmware, that sends 'Bus check' events to non > > > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > > > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > > > > without bridge assigned to it. > > > > > > Do we have evidence about the details of this non-existent root port? > > > If we do, I think it would be interesting to include a URL to them in > > > case there's some hole in the way we handle Bus Check events. > > > > it's scattered over logs Woody has provided, here are links to > > emails with > > 1: lspci output > > https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@gmail.com/ > > > > according to lscpi and dmesg there is only one root-port at 1c.0 > > which is occupied by wifi card > > > > while DSTD table has more ports described, which is fine as long as > > missing/disabled are not reported as present. > > > > 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/ > > https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@gmail.com/ > > > > here dmesg shows 1st correct port > > ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34 > > and then later on > > ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > which aren't recognized as bridge > > Thanks, that does seem a little suspect. ACPI r6.5 sec 5.6.6 says > that when OSPM handles a Bus Check, it should "perform a Plug and Play > re-enumeration operation on the device tree starting from the point > where it has been notified." > > PCI devices are enumerated by doing PCI config reads. It would make > sense to re-enumerate a PCI hierarchy starting with a PCI device > that's already known to the OS, e.g., by scanning the secondary bus of > a PCI-to-PCI bridge. > > I think there are two problems here: > > 1) The platform shouldn't send a Bus Check notification to a PCI > device that doesn't exist. How could the OS re-enumerate > starting there? in case of reported laptop, DSDT provides Device Descriptors for not existing root-ports. OSPM can't do anything with it but to pass Notify event to PCI bus-specific enumeration mechanism, and it's upto PCI subsystem to discard/ignore Notify() on this ACPI node. I think I can mock this case by hacking QEMU, that should help with finding a proper place to fix it. (here it is: https://gitlab.com/imammedo/qemu/-/commits/acpiphp_buscheck_on_missing_device?ref_type=heads I'll post hack patch as a reply to this email for posterity) > 2) Linux runs acpiphp_hotplug_notify() for Bus Checks to > non-existent PCI devices when it ignore them; reasoning below. > > We call acpiphp_enumerate_slots() in this path, which happens before > any of the PCI devices on the root bus have been enumerated: > > pci_register_host_bridge > pcibios_add_bus(root bus) > acpi_pci_add_bus > acpiphp_enumerate_slots(pci_bus *bus) > acpi_walk_namespace(acpiphp_add_context) > acpiphp_add_context(struct acpiphp_bridge *) > acpi_evaluate_integer("_ADR") > acpiphp_init_context > context->hp.notify = acpiphp_hotplug_notify > > So now we've already looked at RP03, RP07, and RP08, and set up the > .notify() handler for all of them. Since we haven't scanned the bus > yet, we don't know that RP03 exists and RP07 and RP08 do not. While ACPI doesn't forbid firmware to describe non-existing RP, the PCIe hostbridge can't hotplug extra root ports. (and QEMU follows PCIe design in this respect on 'q35' machine). However when it comes to hotplug on QEMU's 'pc' machine (hotplug on root bus), each slot has "Augmented Device Descriptors", that includes un-populated slots leading to the presence of .notify() handler on such slots. Then later on when device is hotplugged, a Notify(,1/*DeviceCheck*/) is sent to previously empty slot and from there on PCI subsystem re-enumerates either a single device or a bridge hierarchy (from the parent context). So I'd assume that we need to have .notify() handler for all slots that are described in DSDT (present and non present). > Per ACPI r6.5, sec 6, all these Device objects are "Augmented Device > Descriptors": > > An Agumented [sic] Device Descriptor, which contains additional > device information that is not provided from the Device itself, yet > is needed by the Device or Bus driver in order to properly configure > and use the device. This type of device is enumerated by a > bus-specific enumeration mechanism, and OSPM uses the Address (_ADR) > to match the ACPI Device object in the Namespace to the device > discovered through bus enumeration. > > I think that means OSPM should discover a PCI device using the PCI > bus-specific enumeration mechanism (i.e., config reads) before it even > looks for a corresponding ACPI Device object, and it should only set > up .notify() for PCI devices that actually exist, so the Bus Checks on > RP07 and RP08 would be ignored and we wouldn't even get into the path > that causes the NULL pointer dereference: > > acpi_device_hotplug > acpiphp_hotplug_notify # from hp.notify > hotplug_event > bridge = context->bridge > case BUS_CHECK: > if (bridge) > acpiphp_check_bridge > else if (!SLOT_IS_GOING_AWAY) > enable_slot > bus = slot->bus # "bus" is a root bus > pci_assign_unassigned_bridge_resources(bus->self) > bridge = bus->self # "bridge" is NULL since > # bus->self is NULL for root buses > struct pci_bus *parent = bridge->subordinate > # NULL pointer dereference > > Obviously none of this helps solve the current regression. Changing > the .notify() setup would be a big change, it would be risky because > it might affect dock support, and it still wouldn't fix your case 1 of > hotplug on the root bus in a virtualized environment. > > > > > Issue is easy to reproduce with QEMU's 'pc' machine provides > > > > PCI hotplug on hostbridge slots. to reproduce boot kernel at > > > > commit [1] in VM started with followin CLI and hotplug a device: > > > > > > You mention CLI; did you mean to include a qemu command line here? > > > Maybe it's the same thing mentioned in the 40613da52b13 commit log? > > > I tried briefly to reproduce this using the 40613da52b13 command line > > > but haven't quite got it going yet. I think it would be very useful > > > to either include it here again or point to the 40613da52b13 commit > > > log. > > > > my bad, I didn't realize that saying 'pc' machine is not sufficient. > > > > minimal CLI can be (important part '-M pc -monitor stdio', > > the rest is for making guest boot and run at tolerable speed): > > > > $QEMU -M pc -m 4G -monitor stdio -cpu host --enable-kvm vm_disk_image > > > > Will you amend commit message or shall I repost with changes/Acks? > > I'll give it a shot and post it for your comments. > > > > > once guest OS is fully booted at qemu prompt: > > > > > > > > (qemu) device_add e1000 > > > > > > > > it will cause NULL pointer dereference at > > > > > > > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) > > > > { > > > > struct pci_bus *parent = bridge->subordinate; > > This worked for me (after setting CONFIG_HOTPLUG_PCI_ACPI=y :)): > > $ qemu-system-x86_64 -M pc -m 512M -monitor stdio -cpu host --enable-kvm -kernel arch/x86/boot/bzImage -drive format=raw,file=ubuntu.img -append "root=/dev/sda1" > (qemu) device_add e1000 > > (For posterity, replacing "-monitor stdio" with "-nographic -monitor > telnet:localhost:7001,server,nowait,nodelay" and adding > "console=ttyS0,115200n8" to the -append made it easier to see the > crash details.) I've not put extra arguments, because there is a lot of ways one can configure/use monitor/serial options. But specifying full command line like yours will be useful for anyone who doesn't have any experience with QEMU CLI. > > > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 > > > > [...] > > > > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 > > > > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 > > > > [ 612.277809] enable_slot+0x21f/0x3e0 > > > > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 > > > > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 > > > > [ 612.277827] acpi_device_hotplug+0xbc/0x540 > > > > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 > > > > [ 612.277839] process_one_work+0x1f7/0x370 > > > > [ 612.277845] worker_thread+0x45/0x3b0 > > > > [ 612.277850] ? __pfx_worker_thread+0x10/0x10 > > > > [ 612.277854] kthread+0xdc/0x110 > > > > [ 612.277860] ? __pfx_kthread+0x10/0x10 > > > > [ 612.277866] ret_from_fork+0x28/0x40 > > > > [ 612.277871] ? __pfx_kthread+0x10/0x10 > > > > [ 612.277876] ret_from_fork_asm+0x1b/0x30 > > > > > > > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with > > > > following sequence: > > > > 1. suspend to RAM > > > > 2. wake up with the same backtrace being observed: > > > > 3. 2nd suspend to RAM attempt makes laptop freeze > > > > > > > > Fix it by using __pci_bus_assign_resources() instead of > > > > pci_assign_unassigned_bridge_resources()as we used to do > > > > but only in case when bus doesn't have a bridge associated > > > > with it. > > > > > > > > That let us keep hotplug on root bus working like it used to be > > > > but at the same time keeps resource reassignment usable on > > > > root ports (and other 1st level bridges) that was fixed by [1]. > > > > > > > > 1) > > > > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") > > > > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com > > > > Reported-by: Woody Suwalski <terraluna977@gmail.com> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > --- > > > > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > > > > index 328d1e416014..3bc4e1f3efee 100644 > > > > --- a/drivers/pci/hotplug/acpiphp_glue.c > > > > +++ b/drivers/pci/hotplug/acpiphp_glue.c > > > > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > > > acpiphp_native_scan_bridge(dev); > > > > } > > > > } else { > > > > + LIST_HEAD(add_list); > > > > int max, pass; > > > > > > > > acpiphp_rescan_slot(slot); > > > > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > > > > if (pass && dev->subordinate) { > > > > check_hotplug_bridge(slot, dev); > > > > pcibios_resource_survey_bus(dev->subordinate); > > > > + if (!bus->self) > > > > + __pci_bus_size_bridges(dev->subordinate, &add_list); > > > > } > > > > } > > > > } > > > > - pci_assign_unassigned_bridge_resources(bus->self); > > > > + if (bus->self) > > > > + pci_assign_unassigned_bridge_resources(bus->self); > > > > + else > > > > + __pci_bus_assign_resources(bus, &add_list, NULL); > > I really wish we didn't have such different resource assignment paths > depending on whether the device is on a root bus or deeper in the > hierarchy. But we can't fix that now, so this seems like the right > thing. I've looked at possibility of making pci_assign_unassigned_bridge_resources() work without bridge pointer, but it looks not viable as it's a bridge dedicated function which on top of rearranging resources, also disables/reprograms/enables bridge. If there are ideas how to make it better, I can pick it up and try to implement. Testing shows that pci_assign_unassigned_bridge_resources() isn't ideal since it releases all resources before reassigning and then if the later fails bridge stays in mis-configured state (attempt to recover results in failing BAR assignment to children devices). It's not issue in case of root-port -> 1 child device hotplug since root port hadn't any working device[s] behind it. But in case of hotplug into PCI bridge, that leaves pre-existing devices behind the bridge broken (SHPC and acpiphp case). > But would you be OK with this minor mod? > > if (pci_is_root_bus(bus)) > __pci_bus_size_bridges(dev->subordinate, &add_list); > > ... > > if (pci_is_root_bus(bus)) > __pci_bus_assign_resources(bus, &add_list, NULL); > else > pci_assign_unassigned_bridge_resources(bus->self); > > For two reasons: (1) test the same condition both places, and (2) be a > little more explicit about the scenario (and "bus->self == NULL" also > happens for the virtual buses added for SR-IOV). works for me, tested with: 1: hotplug on root bus with QEMU's 'pc' machine, 2: (q35 machine) igb emulation, and adding VFs once guest is booted 3: BusCheck on missing PCI device (aka simulated env of reporter) 4: (q35 machine) resource re-allocation still works on root ports when a device with large BARs is hotplugged > > > > } > > > > > > > > acpiphp_sanitize_bus(bus); > > > > -- > > > > 2.39.3 > > > > > > > > > >
On Mon, Jul 31, 2023 at 02:44:18PM +0200, Igor Mammedov wrote: > On Sat, 29 Jul 2023 16:50:09 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Jul 28, 2023 at 11:32:16AM +0200, Igor Mammedov wrote: > > > On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > > > > > Commit [1] switched acpiphp hotplug to use > > > > > pci_assign_unassigned_bridge_resources() > > > > > which depends on bridge being available, however in some cases > > > > > when acpiphp is in use, enable_slot() can get a slot without > > > > > bridge associated. > > > > > 1. legitimate case of hotplug on root bus > > > > > (likely not exiting on real hw, but widely used in virt world) > > > > > 2. broken firmware, that sends 'Bus check' events to non > > > > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > > > > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > > > > > without bridge assigned to it. > > > 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/ > > > https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@gmail.com/ > > > > > > here dmesg shows 1st correct port > > > ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34 > > > and then later on > > > ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > > ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > > which aren't recognized as bridge > > > > Thanks, that does seem a little suspect. ACPI r6.5 sec 5.6.6 says > > that when OSPM handles a Bus Check, it should "perform a Plug and Play > > re-enumeration operation on the device tree starting from the point > > where it has been notified." > > > > PCI devices are enumerated by doing PCI config reads. It would make > > sense to re-enumerate a PCI hierarchy starting with a PCI device > > that's already known to the OS, e.g., by scanning the secondary bus of > > a PCI-to-PCI bridge. > > > > I think there are two problems here: > > > > 1) The platform shouldn't send a Bus Check notification to a PCI > > device that doesn't exist. How could the OS re-enumerate > > starting there? > > in case of reported laptop, DSDT provides Device Descriptors > for not existing root-ports. > > OSPM can't do anything with it but to pass Notify event to > PCI bus-specific enumeration mechanism, and it's upto PCI subsystem > to discard/ignore Notify() on this ACPI node. That seems backwards to me, but we have a fair bit of ACPI and PCI stuff that's backwards. > > 2) Linux runs acpiphp_hotplug_notify() for Bus Checks to > > non-existent PCI devices when it ignore them; reasoning below. > > > > We call acpiphp_enumerate_slots() in this path, which happens before > > any of the PCI devices on the root bus have been enumerated: > > > > pci_register_host_bridge > > pcibios_add_bus(root bus) > > acpi_pci_add_bus > > acpiphp_enumerate_slots(pci_bus *bus) > > acpi_walk_namespace(acpiphp_add_context) > > acpiphp_add_context(struct acpiphp_bridge *) > > acpi_evaluate_integer("_ADR") > > acpiphp_init_context > > context->hp.notify = acpiphp_hotplug_notify > > > > So now we've already looked at RP03, RP07, and RP08, and set up the > > .notify() handler for all of them. Since we haven't scanned the bus > > yet, we don't know that RP03 exists and RP07 and RP08 do not. > > While ACPI doesn't forbid firmware to describe non-existing RP, > the PCIe hostbridge can't hotplug extra root ports. (and QEMU > follows PCIe design in this respect on 'q35' machine). > > However when it comes to hotplug on QEMU's 'pc' machine > (hotplug on root bus), each slot has "Augmented Device > Descriptors", that includes un-populated slots leading to > the presence of .notify() handler on such slots. > > Then later on when device is hotplugged, a Notify(,1/*DeviceCheck*/) > is sent to previously empty slot and from there on PCI subsystem > re-enumerates either a single device or a bridge hierarchy > (from the parent context). > > So I'd assume that we need to have .notify() handler for all slots > that are described in DSDT (present and non present). Just from a "beautiful design" perspective, it seems suboptimal for the ACPI device tree to have to include Device objects for all possible hot-added devices. I would expect hot-add to be handled via a Bus Check to the *parent* of a new device, so the device tree would only need to describe hardware that's present at boot. That would mean pci_root.c would have some .notify() handler, but I don't see anything there. I don't know if platforms really implement Root Port hot-add (maybe qemu?), but if they do, I could believe they might do it via Notify to an ACPI Device for a non-present hardware device. I wouldn't know whether that's the intent of the spec, or just a reaction to something that happened to work with existing OSes. > > $ qemu-system-x86_64 -M pc -m 512M -monitor stdio -cpu host --enable-kvm -kernel arch/x86/boot/bzImage -drive format=raw,file=ubuntu.img -append "root=/dev/sda1" > > (qemu) device_add e1000 > > > > (For posterity, replacing "-monitor stdio" with "-nographic -monitor > > telnet:localhost:7001,server,nowait,nodelay" and adding > > "console=ttyS0,115200n8" to the -append made it easier to see the > > crash details.) > > I've not put extra arguments, because there is a lot of ways > one can configure/use monitor/serial options. > > But specifying full command line like yours will be useful > for anyone who doesn't have any experience with QEMU CLI. Yep, that's the audience :) I want to make it as easy as reasonably possible for non-qemu experts to repro things. > > I really wish we didn't have such different resource assignment paths > > depending on whether the device is on a root bus or deeper in the > > hierarchy. But we can't fix that now, so this seems like the right > > thing. > > I've looked at possibility of making > pci_assign_unassigned_bridge_resources() > work without bridge pointer, but it looks not viable as it's > a bridge dedicated function which on top of rearranging > resources, also disables/reprograms/enables bridge. > > If there are ideas how to make it better, > I can pick it up and try to implement. > > Testing shows that pci_assign_unassigned_bridge_resources() > isn't ideal since it releases all resources before reassigning > and then if the later fails bridge stays in mis-configured > state (attempt to recover results in failing BAR assignment > to children devices). > It's not issue in case of > root-port -> 1 child device hotplug > since root port hadn't any working device[s] behind it. > But in case of hotplug into PCI bridge, that leaves > pre-existing devices behind the bridge broken (SHPC and acpiphp case). Yeah, it's a complicated mess. That's why I didn't think this would be a viable fix in the short term. Bjorn
On Mon, Jul 31, 2023 at 04:42:51PM -0500, Bjorn Helgaas wrote: > I would expect hot-add to be handled via a Bus Check to the *parent* > of a new device, so the device tree would only need to describe > hardware that's present at boot. That would mean pci_root.c would > have some .notify() handler, but I don't see anything there. That has a big performance cost though - OSPM has no way to figure out on which slot the new device is, so has to rescan the whole bus.
On Mon, 31 Jul 2023 17:54:21 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Mon, Jul 31, 2023 at 04:42:51PM -0500, Bjorn Helgaas wrote: > > I would expect hot-add to be handled via a Bus Check to the *parent* > > of a new device, so the device tree would only need to describe > > hardware that's present at boot. That would mean pci_root.c would > > have some .notify() handler, but I don't see anything there. > > That has a big performance cost though - OSPM has no way to figure out > on which slot the new device is, so has to rescan the whole bus. > Spec says following about OSPM receiving DeviceCheck ACPI6.5r 5.6.6 Device Object Notifications) " If the device has appeared, OSPM will re-enumerate from the parent. If the device has disappeared, OSPM will invalidate the state of the device. OSPM may optimize out re-enumeration. ... If the device is a bridge, OSPM _may_ re-enumerate the bridge and the child bus. " The later statement is was added somewhere after 1.0b spec. According to debug logs when I was testing that hotplug still works I saw 're-enumerate from the parent', behavior. So there is space to optimize if there would be demand for that. And 6.5 spec has 'Device Light Check', though using that would require some ugly juggling with checking supported revisions & co which were never reliable in practice. I don't know what Windows does in that case. However if one has deep hierarchy, a BusCheck shall cause expensive deep scan. While for DeviceCheck it's optional 'may', and even that may is vague enough that one can read it as if it's 'a new bridge' then scan behind it while one can ignore existing bridge if it isn't DeviceCheck target. Regardless of that we can't just switch to BusCheck exclusively without harming existing setups which can legitimately use both methods.
On Tue, Aug 01, 2023 at 11:57:51AM +0200, Igor Mammedov wrote: > On Mon, 31 Jul 2023 17:54:21 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Mon, Jul 31, 2023 at 04:42:51PM -0500, Bjorn Helgaas wrote: > > > I would expect hot-add to be handled via a Bus Check to the *parent* > > > of a new device, so the device tree would only need to describe > > > hardware that's present at boot. That would mean pci_root.c would > > > have some .notify() handler, but I don't see anything there. > > > > That has a big performance cost though - OSPM has no way to figure out > > on which slot the new device is, so has to rescan the whole bus. > > > > Spec says following about OSPM receiving DeviceCheck > ACPI6.5r 5.6.6 Device Object Notifications) " > If the device has appeared, OSPM will re-enumerate from the parent. > If the device has disappeared, OSPM will invalidate the state of the device. > OSPM may optimize out re-enumeration. > ... > If the device is a bridge, OSPM _may_ re-enumerate the bridge and the child bus. > " > The later statement is was added somewhere after 1.0b spec. > > According to debug logs when I was testing that hotplug still works > I saw 're-enumerate from the parent', behavior. > So there is space > to optimize if there would be demand for that. Yes I was talking about unplug. > And 6.5 spec > has 'Device Light Check', though using that would require some > ugly juggling with checking supported revisions & co which were > never reliable in practice. > I don't know what Windows does in that case. > > However if one has deep hierarchy, a BusCheck shall cause > expensive deep scan. While for DeviceCheck it's optional 'may', > and even that may is vague enough that one can read it as > if it's 'a new bridge' then scan behind it while one can ignore > existing bridge if it isn't DeviceCheck target. And it's very clear that it's more efficient for removal. > Regardless of that we can't just switch to BusCheck exclusively > without harming existing setups which can legitimately use both > methods.
On Mon, 31 Jul 2023 16:42:51 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: Hi Bjorn, Is there anything else for me to do to make this merged? (just checked recent branches in pci tree, and 'Revert "PCI: acpiphp: Reassign resources on bridge if necessary' is still there) > On Mon, Jul 31, 2023 at 02:44:18PM +0200, Igor Mammedov wrote: > > On Sat, 29 Jul 2023 16:50:09 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Fri, Jul 28, 2023 at 11:32:16AM +0200, Igor Mammedov wrote: > > > > On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > > > > > > Commit [1] switched acpiphp hotplug to use > > > > > > pci_assign_unassigned_bridge_resources() > > > > > > which depends on bridge being available, however in some cases > > > > > > when acpiphp is in use, enable_slot() can get a slot without > > > > > > bridge associated. > > > > > > 1. legitimate case of hotplug on root bus > > > > > > (likely not exiting on real hw, but widely used in virt world) > > > > > > 2. broken firmware, that sends 'Bus check' events to non > > > > > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > > > > > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > > > > > > without bridge assigned to it. > > > > > 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/ > > > > https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@gmail.com/ > > > > > > > > here dmesg shows 1st correct port > > > > ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34 > > > > and then later on > > > > ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > > > ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000 > > > > which aren't recognized as bridge > > > > > > Thanks, that does seem a little suspect. ACPI r6.5 sec 5.6.6 says > > > that when OSPM handles a Bus Check, it should "perform a Plug and Play > > > re-enumeration operation on the device tree starting from the point > > > where it has been notified." > > > > > > PCI devices are enumerated by doing PCI config reads. It would make > > > sense to re-enumerate a PCI hierarchy starting with a PCI device > > > that's already known to the OS, e.g., by scanning the secondary bus of > > > a PCI-to-PCI bridge. > > > > > > I think there are two problems here: > > > > > > 1) The platform shouldn't send a Bus Check notification to a PCI > > > device that doesn't exist. How could the OS re-enumerate > > > starting there? > > > > in case of reported laptop, DSDT provides Device Descriptors > > for not existing root-ports. > > > > OSPM can't do anything with it but to pass Notify event to > > PCI bus-specific enumeration mechanism, and it's upto PCI subsystem > > to discard/ignore Notify() on this ACPI node. > > That seems backwards to me, but we have a fair bit of ACPI and PCI > stuff that's backwards. > > > > 2) Linux runs acpiphp_hotplug_notify() for Bus Checks to > > > non-existent PCI devices when it ignore them; reasoning below. > > > > > > We call acpiphp_enumerate_slots() in this path, which happens before > > > any of the PCI devices on the root bus have been enumerated: > > > > > > pci_register_host_bridge > > > pcibios_add_bus(root bus) > > > acpi_pci_add_bus > > > acpiphp_enumerate_slots(pci_bus *bus) > > > acpi_walk_namespace(acpiphp_add_context) > > > acpiphp_add_context(struct acpiphp_bridge *) > > > acpi_evaluate_integer("_ADR") > > > acpiphp_init_context > > > context->hp.notify = acpiphp_hotplug_notify > > > > > > So now we've already looked at RP03, RP07, and RP08, and set up the > > > .notify() handler for all of them. Since we haven't scanned the bus > > > yet, we don't know that RP03 exists and RP07 and RP08 do not. > > > > While ACPI doesn't forbid firmware to describe non-existing RP, > > the PCIe hostbridge can't hotplug extra root ports. (and QEMU > > follows PCIe design in this respect on 'q35' machine). > > > > However when it comes to hotplug on QEMU's 'pc' machine > > (hotplug on root bus), each slot has "Augmented Device > > Descriptors", that includes un-populated slots leading to > > the presence of .notify() handler on such slots. > > > > Then later on when device is hotplugged, a Notify(,1/*DeviceCheck*/) > > is sent to previously empty slot and from there on PCI subsystem > > re-enumerates either a single device or a bridge hierarchy > > (from the parent context). > > > > So I'd assume that we need to have .notify() handler for all slots > > that are described in DSDT (present and non present). > > Just from a "beautiful design" perspective, it seems suboptimal for > the ACPI device tree to have to include Device objects for all > possible hot-added devices. > > I would expect hot-add to be handled via a Bus Check to the *parent* > of a new device, so the device tree would only need to describe > hardware that's present at boot. That would mean pci_root.c would > have some .notify() handler, but I don't see anything there. > > I don't know if platforms really implement Root Port hot-add (maybe > qemu?), but if they do, I could believe they might do it via Notify to > an ACPI Device for a non-present hardware device. I wouldn't know > whether that's the intent of the spec, or just a reaction to something > that happened to work with existing OSes. > > > > $ qemu-system-x86_64 -M pc -m 512M -monitor stdio -cpu host --enable-kvm -kernel arch/x86/boot/bzImage -drive format=raw,file=ubuntu.img -append "root=/dev/sda1" > > > (qemu) device_add e1000 > > > > > > (For posterity, replacing "-monitor stdio" with "-nographic -monitor > > > telnet:localhost:7001,server,nowait,nodelay" and adding > > > "console=ttyS0,115200n8" to the -append made it easier to see the > > > crash details.) > > > > I've not put extra arguments, because there is a lot of ways > > one can configure/use monitor/serial options. > > > > But specifying full command line like yours will be useful > > for anyone who doesn't have any experience with QEMU CLI. > > Yep, that's the audience :) I want to make it as easy as reasonably > possible for non-qemu experts to repro things. > > > > I really wish we didn't have such different resource assignment paths > > > depending on whether the device is on a root bus or deeper in the > > > hierarchy. But we can't fix that now, so this seems like the right > > > thing. > > > > I've looked at possibility of making > > pci_assign_unassigned_bridge_resources() > > work without bridge pointer, but it looks not viable as it's > > a bridge dedicated function which on top of rearranging > > resources, also disables/reprograms/enables bridge. > > > > If there are ideas how to make it better, > > I can pick it up and try to implement. > > > > Testing shows that pci_assign_unassigned_bridge_resources() > > isn't ideal since it releases all resources before reassigning > > and then if the later fails bridge stays in mis-configured > > state (attempt to recover results in failing BAR assignment > > to children devices). > > It's not issue in case of > > root-port -> 1 child device hotplug > > since root port hadn't any working device[s] behind it. > > But in case of hotplug into PCI bridge, that leaves > > pre-existing devices behind the bridge broken (SHPC and acpiphp case). > > Yeah, it's a complicated mess. That's why I didn't think this would > be a viable fix in the short term. > > Bjorn >
On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote: > Commit [1] switched acpiphp hotplug to use > pci_assign_unassigned_bridge_resources() > which depends on bridge being available, however in some cases > when acpiphp is in use, enable_slot() can get a slot without > bridge associated. acpiphp is *always* in use if we get to enable_slot(), so that doesn't really add information here. > 1. legitimate case of hotplug on root bus > (likely not exiting on real hw, but widely used in virt world) > 2. broken firmware, that sends 'Bus check' events to non > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow > endup at acpiphp:enable_slot(..., bridge = 0) and with bus > without bridge assigned to it. IIUC, the Inspiron problem happens when: - acpiphp_context->bridge is NULL, so hotplug_event() calls enable_slot() instead of acpiphp_check_bridge(), AND - acpiphp_slot->bus->self is also NULL, because enable_slot() calls pci_assign_unassigned_bridge_resources() with that NULL pointer, which dereferences "bridge->subordinate" But I can't figure out why acpiphp_context->bridge is NULL for RP07 and RP08 (which don't exist), but not for RP03 (which does). I guess all the acpiphp_contexts (RP03, RP07, RP08) must be allocated in acpiphp_add_context() by acpiphp_init_context(). Woody's lspci from [1] shows only one Root Port: 00:1c.0 Wildcat Point-LP PCI Express Root Port #3 The DSDT.DSL includes: Device (RP01) _ADR 0x001C0000 # 1c.0 Device (RP02) _ADR 0x001C0001 # 1c.1 Device (RP03) _ADR 0x001C0002 # 1c.2 Device (RP04) _ADR 0x001C0003 # 1c.3 Device (RP05) _ADR 0x001C0004 # 1c.4 Device (RP06) _ADR 0x001C0005 # 1c.5 Device (RP07) _ADR 0x001C0006 # 1c.6 Device (RP08) _ADR 0x001C0007 # 1c.7 I can see why we might need a Bus Check after resume to see if something got added while we were suspended. But I don't see why we handle RP03 differently from RP07 and RP08. Can you help me out? I'm lost in a maze of twisty passages, all alike. Bjorn [1] https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@gmail.com
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index 328d1e416014..3bc4e1f3efee 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) acpiphp_native_scan_bridge(dev); } } else { + LIST_HEAD(add_list); int max, pass; acpiphp_rescan_slot(slot); @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) if (pass && dev->subordinate) { check_hotplug_bridge(slot, dev); pcibios_resource_survey_bus(dev->subordinate); + if (!bus->self) + __pci_bus_size_bridges(dev->subordinate, &add_list); } } } - pci_assign_unassigned_bridge_resources(bus->self); + if (bus->self) + pci_assign_unassigned_bridge_resources(bus->self); + else + __pci_bus_assign_resources(bus, &add_list, NULL); } acpiphp_sanitize_bus(bus);
Commit [1] switched acpiphp hotplug to use pci_assign_unassigned_bridge_resources() which depends on bridge being available, however in some cases when acpiphp is in use, enable_slot() can get a slot without bridge associated. 1. legitimate case of hotplug on root bus (likely not exiting on real hw, but widely used in virt world) 2. broken firmware, that sends 'Bus check' events to non existing root ports (Dell Inspiron 7352/0W6WV0), which somehow endup at acpiphp:enable_slot(..., bridge = 0) and with bus without bridge assigned to it. Issue is easy to reproduce with QEMU's 'pc' machine provides PCI hotplug on hostbridge slots. to reproduce boot kernel at commit [1] in VM started with followin CLI and hotplug a device: once guest OS is fully booted at qemu prompt: (qemu) device_add e1000 it will cause NULL pointer dereference at void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge) { struct pci_bus *parent = bridge->subordinate; [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018 [...] [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260 [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0 [ 612.277809] enable_slot+0x21f/0x3e0 [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260 [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10 [ 612.277827] acpi_device_hotplug+0xbc/0x540 [ 612.277834] acpi_hotplug_work_fn+0x15/0x20 [ 612.277839] process_one_work+0x1f7/0x370 [ 612.277845] worker_thread+0x45/0x3b0 [ 612.277850] ? __pfx_worker_thread+0x10/0x10 [ 612.277854] kthread+0xdc/0x110 [ 612.277860] ? __pfx_kthread+0x10/0x10 [ 612.277866] ret_from_fork+0x28/0x40 [ 612.277871] ? __pfx_kthread+0x10/0x10 [ 612.277876] ret_from_fork_asm+0x1b/0x30 The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with following sequence: 1. suspend to RAM 2. wake up with the same backtrace being observed: 3. 2nd suspend to RAM attempt makes laptop freeze Fix it by using __pci_bus_assign_resources() instead of pci_assign_unassigned_bridge_resources()as we used to do but only in case when bus doesn't have a bridge associated with it. That let us keep hotplug on root bus working like it used to be but at the same time keeps resource reassignment usable on root ports (and other 1st level bridges) that was fixed by [1]. 1) Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary") Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com Reported-by: Woody Suwalski <terraluna977@gmail.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)