Message ID | 20231213003614.1648343-1-imammedo@redhat.com |
---|---|
Series | PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job |
On 13.12.23 at 01:36, Igor Mammedov wrote:
> When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> adding device to bus and enabling it will kick in async host scan
>
>  scsi_scan_host+0x21/0x1f0
>  virtscsi_probe+0x2dd/0x350
>  ..
>  driver_probe_device+0x19/0x80
>  ...
>  driver_probe_device+0x19/0x80
>  pci_bus_add_device+0x53/0x80
>  pci_bus_add_devices+0x2b/0x70
>  ...
>
> which will schedule a job for async scan. That however breaks
> if there is more than one SCSI host behind bridge, since
> acpiphp_check_bridge() will walk over all slots and try to
> enable each of them regardless of whether they were already
> enabled.
> As a result the bridge might be reconfigured several times
> and trigger following sequence:
>
>  [cpu 0] acpiphp_check_bridge()
>  [cpu 0] enable_slot(a)
>  [cpu 0] configure bridge
>  [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
>  [cpu 0] enable_slot(b)
>  ...
>  [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
>  ...
>  [cpu 0] configure bridge <- temporarily disables bridge
>
> and cause do_scsi_scan_host() failure.
> The same race affects SHPC (but it manages to avoid hitting the race due to
> 1sec delay when enabling slot).
> To cover case of single device hotplug (at a time) do not attempt to
> enable a slot that has already been enabled.
>
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>

Missing an F here ;)

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Thank you! Works for me:

Tested-by: Fiona Ebner <f.ebner@proxmox.com>
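The slot walk described in the quoted commit message can be sketched as a small user-space model in C. This is a toy illustration under stated assumptions, not kernel code: `struct toy_slot`, `toy_check_bridge()` and the `skip_enabled` switch are hypothetical names, and the counter only stands in for the bridge reconfiguration that enable_slot() may trigger.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hotplug slot; not the kernel's struct. */
struct toy_slot {
	bool present;	/* a device is plugged into the slot */
	bool enabled;	/* the slot was enabled by an earlier pass */
};

/*
 * Walk all slots the way the commit message describes and return how
 * many times the bridge would be (re)configured during this pass.
 * skip_enabled == false models the old behaviour (enable every
 * populated slot), skip_enabled == true models the proposed check.
 */
static int toy_check_bridge(struct toy_slot *slots, size_t n, bool skip_enabled)
{
	int reconfigs = 0;

	for (size_t i = 0; i < n; i++) {
		if (!slots[i].present) {
			/* stands in for disable_slot() */
			slots[i].enabled = false;
			continue;
		}
		if (skip_enabled && slots[i].enabled)
			continue;
		/* stands in for enable_slot() -> configure bridge */
		reconfigs++;
		slots[i].enabled = true;
	}
	return reconfigs;
}
```

With one slot already enabled and one newly plugged, the model counts two reconfigurations when every populated slot is enabled unconditionally and one when enabled slots are skipped, which is the single-device hotplug case the patch targets.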
On Wed, 13 Dec 2023 10:47:27 +0100 Fiona Ebner <f.ebner@proxmox.com> wrote:

> On 13.12.23 at 01:36, Igor Mammedov wrote:
> > [...]
> > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Reported-by: iona Ebner <f.ebner@proxmox.com>
>
> Missing an F here ;)

Sorry for the copy-paste mistake, I'll fix it up in the next submission.

> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Thank you! Works for me:
>
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> [...]
>
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 601129772b2d..6b11609927d6 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>                                 trim_stale_devices(dev);
>
>                         /* configure all functions */
> -                       enable_slot(slot, true);
> +                       if (slot->flags != SLOT_ENABLED) {
> +                               enable_slot(slot, true);
> +                       }

Shouldn't this be following the acpiphp_enable_slot() pattern, that is

        if (!(slot->flags & SLOT_ENABLED))
                enable_slot(slot, true);

Also the braces are redundant.

>                 } else {
>                         disable_slot(slot);
>                 }
> --
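The difference between the two checks discussed in this review can be shown with a short C sketch. It is only an illustration: the flag values below are assumed for the example (the real definitions live in the acpiphp driver headers and may differ), and `needs_enable_exact()` and `needs_enable_masked()` are hypothetical helper names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for illustration only. */
#define SLOT_ENABLED		0x00000001u
#define SLOT_IS_GOING_AWAY	0x00000002u

/* Check as posted in the patch: compares the whole flags word. */
static bool needs_enable_exact(uint32_t flags)
{
	return flags != SLOT_ENABLED;
}

/* Check suggested in the review: tests only the SLOT_ENABLED bit. */
static bool needs_enable_masked(uint32_t flags)
{
	return !(flags & SLOT_ENABLED);
}
```

Both helpers agree when flags is 0 or exactly SLOT_ENABLED. They disagree as soon as another bit is set alongside SLOT_ENABLED: the whole-word comparison then reports that the slot still needs enabling, while the masked test does not.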
On Wed, Dec 13, 2023 at 2:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > [...]
> >
> > @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >                                 trim_stale_devices(dev);
> >
> >                         /* configure all functions */
> > -                       enable_slot(slot, true);
> > +                       if (slot->flags != SLOT_ENABLED) {
> > +                               enable_slot(slot, true);
> > +                       }
>
> Shouldn't this be following the acpiphp_enable_slot() pattern, that is
>
>         if (!(slot->flags & SLOT_ENABLED))
>                 enable_slot(slot, true);
>
> Also the braces are redundant.

I'll fix it up on respin if Bjorn is fine with the approach in general.
The patches need a respin anyway to fix the botched whitespace.

> >                 } else {
> >                         disable_slot(slot);
> >                 }
> > --
On Wed, Dec 13, 2023 at 01:36:12AM +0100, Igor Mammedov wrote:
> Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> during hotplug.
>
> I don't like it a bit but it's something that could be done quickly
> and solves problems that were reported.

I agree, I don't like it either. Adding a 1s delay doesn't address the
real problem, and putting in a band-aid like this means the real problem
would likely never be addressed.

At this point the best option I see is to revert these:

  cc22522fd55e2 ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
  40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")

I hate the fact that reverting them would mean the root bus hotplug and
ACPI bus check notifications would become issues again. But keeping
these commits even though they add a new different problem that breaks
things for somebody else seems worse to me.

Bjorn

> Other options to discuss/possibly more invasive:
> 1. make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
>    bridge if it's not necessary.
> 2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
>    the bridge or somehow restart the job if it fails
> 3. any other ideas?
>
>
> 1st reported: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.com
>
> CC: Dongli Zhang <dongli.zhang@oracle.com>
> CC: linux-acpi@vger.kernel.org
> CC: linux-pci@vger.kernel.org
> CC: imammedo@redhat.com
> CC: mst@redhat.com
> CC: rafael@kernel.org
> CC: lenb@kernel.org
> CC: bhelgaas@google.com
> CC: mika.westerberg@linux.intel.com
> CC: boris.ostrovsky@oracle.com
> CC: joe.jin@oracle.com
> CC: stable@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Fiona Ebner <f.ebner@proxmox.com>
> CC: Thomas Lamprecht <t.lamprecht@proxmox.com>
>
> Igor Mammedov (2):
>   PCI: acpiphp: enable slot only if it hasn't been enabled already
>   PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
>     time
>
>  drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
>