Message ID | 20240613054204.5850-2-mario.limonciello@amd.com |
---|---|
State | New |
Series | Verify devices transition from D3cold to D0 |
> A device that has gone through a reset may return a value in PCI_COMMAND
> but that doesn't mean it's finished transitioning to D0. On devices that
> support power management explicitly check PCI_PM_CTRL to ensure the
> transition happened. Devicees that don't support power management will

Devices?

> continue to use PCI_COMMAND.

Would the tag “Fixes” be relevant for such a change description?

Regards,
Markus
On 6/19/2024 13:33, Markus Elfring wrote:
>> A device that has gone through a reset may return a value in PCI_COMMAND
>> but that doesn't mean it's finished transitioning to D0. On devices that
>> support power management explicitly check PCI_PM_CTRL to ensure the
>> transition happened. Devicees that don't support power management will
>
> Devices?

Yes, thanks. I'll fix that up for the next version once we have some
alignment on the functionality outlined in these patches.

>
>
>> continue to use PCI_COMMAND.
>
> Would the tag “Fixes” be relevant for such a change description?
>
> Regards,
> Markus

I did trace back the history of the wait function and it goes back to 4.6.

In my mind yes; it is a fix, but I don't think it should go that far back
automatically. I think we should prioritize getting it fixed for 6.11 or
6.12 and then can revisit how far back to do a stable backport.

For example AMD Rembrandt (where this race condition was found) isn't
enabled until 5.17 or 5.18 IIRC.

The backports would have a dependency on 08e3ed12ca861 (from 6.5-rc1) and
bae26849372b8 (from 5.5-rc1) and 821cdad5c46ca (from 4.14) and
5adecf817dd63 (from 4.6-rc1).
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59e0949fb079..41961e28a86c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1270,21 +1270,33 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	 * the read (except when CRS SV is enabled and the read was for the
 	 * Vendor ID; in that case it synthesizes 0x0001 data).
 	 *
-	 * Wait for the device to return a non-CRS completion. Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * Wait for the device to return a non-CRS completion. On devices
+	 * that support PM control read the PM control register to ensure
+	 * the device has transitioned to D0. On devices that don't support
+	 * PM control, read the command register to instead of Vendor ID so
+	 * we don't have to contend with the CRS SV value.
 	 */
 	for (;;) {
-		u32 id;
 
 		if (pci_dev_is_disconnected(dev)) {
 			pci_dbg(dev, "disconnected; not waiting\n");
 			return -ENOTTY;
 		}
 
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
-		if (!PCI_POSSIBLE_ERROR(id))
-			break;
+		if (dev->pm_cap) {
+			u16 pmcsr;
+
+			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
+			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
+				break;
+		} else {
+			u32 id;
+
+			pci_read_config_dword(dev, PCI_COMMAND, &id);
+			if (!PCI_POSSIBLE_ERROR(id))
+				break;
+		}
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
A device that has gone through a reset may return a value in PCI_COMMAND
but that doesn't mean it's finished transitioning to D0. On devices that
support power management explicitly check PCI_PM_CTRL to ensure the
transition happened. Devicees that don't support power management will
continue to use PCI_COMMAND.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/pci/pci.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
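
For readers who want to poke at the readiness condition outside the kernel, here is a rough userspace sketch of the decision the patch makes on each polling iteration. It is not the kernel code: the `sketch_`/`SKETCH_` names are made up for illustration, the constants are copied by hand on the assumption that they match PCI_PM_CTRL_STATE_MASK (0x0003) and PCI_D0 (0), and the all-ones checks stand in for PCI_POSSIBLE_ERROR(). The register values are passed in as plain integers instead of being read from config space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants, so this builds in userspace. */
#define SKETCH_PCI_PM_CTRL_STATE_MASK 0x0003u /* low two PMCSR bits: power state */
#define SKETCH_PCI_D0                 0x0000u /* power state encoding for D0 */

/* A failed config read returns all-ones data (what PCI_POSSIBLE_ERROR() tests). */
static inline bool sketch_possible_error16(uint16_t v) { return v == 0xFFFFu; }
static inline bool sketch_possible_error32(uint32_t v) { return v == 0xFFFFFFFFu; }

/*
 * has_pm_cap:    nonzero dev->pm_cap in the patch
 * pmcsr:         word read from pm_cap + PCI_PM_CTRL
 * command_dword: dword read at PCI_COMMAND
 *
 * Returns true when the polling loop would break out as "ready".
 */
static inline bool sketch_device_ready(bool has_pm_cap, uint16_t pmcsr,
                                       uint32_t command_dword)
{
    if (has_pm_cap)
        /* Readable PMCSR is not enough; the state field must say D0. */
        return !sketch_possible_error16(pmcsr) &&
               (pmcsr & SKETCH_PCI_PM_CTRL_STATE_MASK) == SKETCH_PCI_D0;

    /* No PM capability: fall back to "PCI_COMMAND read did not fail". */
    return !sketch_possible_error32(command_dword);
}
```

The case the patch cares about is `sketch_device_ready(true, 0x0003, 0x00100006)`: PCI_COMMAND already returns data, but the PMCSR state field is not yet D0, so the loop keeps waiting.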