diff mbox series

[v5,2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()

Message ID 20240823154023.360234-3-superm1@kernel.org
State Superseded
Headers show
Series Verify devices transition from D3cold to D0 | expand

Commit Message

Mario Limonciello Aug. 23, 2024, 3:40 p.m. UTC
From: Mario Limonciello <mario.limonciello@amd.com>

If a dock is plugged in at the same time as autosuspend delay then this
can cause malfunctions in the USB4 stack. This happens because the
device is still in D3cold at the time that the PCI core handed
control back to the USB4 stack.

A device that has gone through a reset may return a value in PCI_COMMAND
but that doesn't mean it's finished transitioning to D0.  For evices that
support power management explicitly check PCI_PM_CTRL on everything but
system resume to ensure the transition happened.

Devices that don't support power management and system resume will
continue to use PCI_COMMAND.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Fix misleading indentation
 * Amend commit message
---
 drivers/pci/pci.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

Comments

Bjorn Helgaas Aug. 23, 2024, 7:54 p.m. UTC | #1
[+cc Duc, Alex]

On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> If a dock is plugged in at the same time as autosuspend delay then this
> can cause malfunctions in the USB4 stack. This happens because the
> device is still in D3cold at the time that the PCI core handed
> control back to the USB4 stack.

I assume the USB device in question is in the dock that was hot-added?
This patch suggests that pci_dev_wait() has waited for a read of
PCI_COMMAND to respond with something other than ~0, but the device is
still in D3cold.  I suppose we got to pci_dev_wait() via
pci_pm_bridge_power_up_actions() calling
pci_bridge_wait_for_secondary_bus(), since I wouldn't expect a reset
in the hot-add case.

> A device that has gone through a reset may return a value in PCI_COMMAND
> but that doesn't mean it's finished transitioning to D0.  For evices that
> support power management explicitly check PCI_PM_CTRL on everything but
> system resume to ensure the transition happened.

s/evices/devices/

> Devices that don't support power management and system resume will
> continue to use PCI_COMMAND.

Is there a bugzilla or other report with more details that we can
include here?

> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Fix misleading indentation
>  * Amend commit message
> ---
>  drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1e219057a5069..f032a4aaec268 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>  	 * the read (except when CRS SV is enabled and the read was for the
>  	 * Vendor ID; in that case it synthesizes 0x0001 data).
>  	 *
> -	 * Wait for the device to return a non-CRS completion.  Read the
> -	 * Command register instead of Vendor ID so we don't have to
> -	 * contend with the CRS SV value.
> +	 * Wait for the device to return a non-CRS completion.  On devices
> +	 * that support PM control and on waits that aren't part of system
> +	 * resume read the PM control register to ensure the device has
> +	 * transitioned to D0.  On devices that don't support PM control,
> +	 * or during system resume read the command register to instead of
> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>  	 */
>  	for (;;) {
> -		u32 id;
> -
>  		if (pci_dev_is_disconnected(dev)) {
>  			pci_dbg(dev, "disconnected; not waiting\n");
>  			return -ENOTTY;
>  		}
>  
> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> -		if (!PCI_POSSIBLE_ERROR(id))
> -			break;
> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> +			u16 pmcsr;
> +
> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> +				break;
> +		} else {
> +			u32 id;
> +
> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> +			if (!PCI_POSSIBLE_ERROR(id))
> +				break;
> +		}

What is the rationale behind using PCI_PM_CTRL in some cases and
PCI_COMMAND in others?  Is there some spec language we can cite for
this?

IIUC, pci_dev_wait() waits for a device to be ready after a reset
(FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear,
DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()).
I think device readiness in all of these cases is covered by PCIe
r6.0, sec 6.6.1.

If the Root Port above the device supports Configuration RRS Software
Visibility, I think we probably should use that here instead of either
PCI_COMMAND or PCI_PM_CTRL.  SR-IOV VFs don't implement Vendor ID,
which might complicate this a little.  But it niggles in my mind that
there may be some other problem beyond that.  Maybe Alex remembers.

Anyway, if we meet the requirements of sec 6.6.1, the device should be
ready to respond to config requests, and I assume that also means
the device is in D0.

>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
> -- 
> 2.43.0
>
Bjorn Helgaas Aug. 30, 2024, 12:01 a.m. UTC | #2
On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> If a dock is plugged in at the same time as autosuspend delay then
> this can cause malfunctions in the USB4 stack. This happens because
> the device is still in D3cold at the time that the PCI core handed
> control back to the USB4 stack.
> 
> A device that has gone through a reset may return a value in
> PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
> For devices that support power management explicitly check
> PCI_PM_CTRL on everything but system resume to ensure the transition
> happened.

Still trying to understand what's going on here.

I posted a change to pci_dev_wait() to read Vendor ID, look for Config
RRS status, and wait for a successful completion (when RRS Software
Visibility is enabled) [1].

You tested that and found that it didn't help with *this* issue [2].
I assume you tested something like v6.11-rc plus the patches from [1],
i.e., without the PCI_PM_CTRL changes in this series.

  1) Initially the device is in D0

  2) We put it in D3cold (using some ACPI method) because the
  autosuspend delay expired (?)

  3) Plugging in the dock wakes up the device, so we power up the
  device (via pci_power_up(), which again uses some ACPI method), and
  it should transition to D0uninitialized

  4) The USB4 stack sees the device but thinks it's in D3cold (?)

If your testing only included [1], but did not include the
pci_power_up() change from patch 3/5 "Verify functions currently in
D3cold have entered D0", I don't think we would call pci_dev_wait(),
so I wouldn't expect [1] to make any difference.

If you *did* include both [1] and patch 3/5, the implication would be
that pci_dev_wait() successfully read the Vendor ID, meaning the
device is not in D3cold when pci_power_up() returns.

Can you clarify what you see and possibly expand/correct my timeline
above?

[1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@kernel.org/
[2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@amd.com/

> Devices that don't support power management and system resume will
> continue to use PCI_COMMAND.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Fix misleading indentation
>  * Amend commit message
> ---
>  drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1e219057a5069..f032a4aaec268 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>  	 * the read (except when CRS SV is enabled and the read was for the
>  	 * Vendor ID; in that case it synthesizes 0x0001 data).
>  	 *
> -	 * Wait for the device to return a non-CRS completion.  Read the
> -	 * Command register instead of Vendor ID so we don't have to
> -	 * contend with the CRS SV value.
> +	 * Wait for the device to return a non-CRS completion.  On devices
> +	 * that support PM control and on waits that aren't part of system
> +	 * resume read the PM control register to ensure the device has
> +	 * transitioned to D0.  On devices that don't support PM control,
> +	 * or during system resume read the command register to instead of
> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>  	 */
>  	for (;;) {
> -		u32 id;
> -
>  		if (pci_dev_is_disconnected(dev)) {
>  			pci_dbg(dev, "disconnected; not waiting\n");
>  			return -ENOTTY;
>  		}
>  
> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
> -		if (!PCI_POSSIBLE_ERROR(id))
> -			break;
> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> +			u16 pmcsr;
> +
> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> +				break;
> +		} else {
> +			u32 id;
> +
> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
> +			if (!PCI_POSSIBLE_ERROR(id))
> +				break;
> +		}
>  
>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
> -- 
> 2.43.0
>
Mario Limonciello Sept. 3, 2024, 4:29 p.m. UTC | #3
On 8/29/2024 19:01, Bjorn Helgaas wrote:
> On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> If a dock is plugged in at the same time as autosuspend delay then
>> this can cause malfunctions in the USB4 stack. This happens because
>> the device is still in D3cold at the time that the PCI core handed
>> control back to the USB4 stack.
>>
>> A device that has gone through a reset may return a value in
>> PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
>> For devices that support power management explicitly check
>> PCI_PM_CTRL on everything but system resume to ensure the transition
>> happened.
> 
> Still trying to understand what's going on here.
> 
> I posted a change to pci_dev_wait() to read Vendor ID, look for Config
> RRS status, and wait for a successful completion (when RRS Software
> Visibility is enabled) [1].
> 
> You tested that and found that it didn't help with *this* issue [2].
> I assume you tested something like v6.11-rc plus the patches from [1],
> i.e., without the PCI_PM_CTRL changes in this series.
> 
>    1) Initially the device is in D0
> 
>    2) We put it in D3cold (using some ACPI method) because the
>    autosuspend delay expired (?)
> 
>    3) Plugging in the dock wakes up the device, so we power up the
>    device (via pci_power_up(), which again uses some ACPI method), and
>    it should transition to D0uninitialized
> 
>    4) The USB4 stack sees the device but thinks it's in D3cold (?)
> 
> If your testing only included [1], but did not include the
> pci_power_up() change from patch 3/5 "Verify functions currently in
> D3cold have entered D0", I don't think we would call pci_dev_wait(),
> so I wouldn't expect [1] to make any difference.
> 
> If you *did* include both [1] and patch 3/5, the implication would be
> that pci_dev_wait() successfully read the Vendor ID, meaning the
> device is not in D3cold when pci_power_up() returns.

The testing here was from the LTS 6.6.y kernel with both [1] and patch 
3/5 from this series.

I can get testing from 6.11-rc6 with a combination of patches if you 
would like.

> 
> Can you clarify what you see and possibly expand/correct my timeline
> above?

The timeline you shared is nearly correct.  The USB4 stack *thinks* the 
device is in D0 because of the return of pci_power_up().

As by polling PCI_PM_CTRL we can see it's still in D3, so it hasn't made 
it to D0uninitialized yet.

I guess I reading between the lines you have an assumption that you 
can't read the vendor ID from D3; which doesn't appear to be the case 
from our testing.

> 
> [1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@kernel.org/
> [2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@amd.com/
> 
>> Devices that don't support power management and system resume will
>> continue to use PCI_COMMAND.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>> v4->v5:
>>   * Fix misleading indentation
>>   * Amend commit message
>> ---
>>   drivers/pci/pci.c | 28 ++++++++++++++++++++--------
>>   1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 1e219057a5069..f032a4aaec268 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
>>   	 * the read (except when CRS SV is enabled and the read was for the
>>   	 * Vendor ID; in that case it synthesizes 0x0001 data).
>>   	 *
>> -	 * Wait for the device to return a non-CRS completion.  Read the
>> -	 * Command register instead of Vendor ID so we don't have to
>> -	 * contend with the CRS SV value.
>> +	 * Wait for the device to return a non-CRS completion.  On devices
>> +	 * that support PM control and on waits that aren't part of system
>> +	 * resume read the PM control register to ensure the device has
>> +	 * transitioned to D0.  On devices that don't support PM control,
>> +	 * or during system resume read the command register to instead of
>> +	 * Vendor ID so we don't have to contend with the CRS SV value.
>>   	 */
>>   	for (;;) {
>> -		u32 id;
>> -
>>   		if (pci_dev_is_disconnected(dev)) {
>>   			pci_dbg(dev, "disconnected; not waiting\n");
>>   			return -ENOTTY;
>>   		}
>>   
>> -		pci_read_config_dword(dev, PCI_COMMAND, &id);
>> -		if (!PCI_POSSIBLE_ERROR(id))
>> -			break;
>> +		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
>> +			u16 pmcsr;
>> +
>> +			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>> +			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
>> +			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
>> +				break;
>> +		} else {
>> +			u32 id;
>> +
>> +			pci_read_config_dword(dev, PCI_COMMAND, &id);
>> +			if (!PCI_POSSIBLE_ERROR(id))
>> +				break;
>> +		}
>>   
>>   		if (delay > timeout) {
>>   			pci_warn(dev, "not ready %dms after %s; giving up\n",
>> -- 
>> 2.43.0
>>
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1e219057a5069..f032a4aaec268 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1309,21 +1309,33 @@  static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
 	 * the read (except when CRS SV is enabled and the read was for the
 	 * Vendor ID; in that case it synthesizes 0x0001 data).
 	 *
-	 * Wait for the device to return a non-CRS completion.  Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * Wait for the device to return a non-CRS completion.  On devices
+	 * that support PM control and on waits that aren't part of system
+	 * resume read the PM control register to ensure the device has
+	 * transitioned to D0.  On devices that don't support PM control,
+	 * or during system resume read the command register to instead of
+	 * Vendor ID so we don't have to contend with the CRS SV value.
 	 */
 	for (;;) {
-		u32 id;
-
 		if (pci_dev_is_disconnected(dev)) {
 			pci_dbg(dev, "disconnected; not waiting\n");
 			return -ENOTTY;
 		}
 
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
-		if (!PCI_POSSIBLE_ERROR(id))
-			break;
+		if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
+			u16 pmcsr;
+
+			pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+			if (!PCI_POSSIBLE_ERROR(pmcsr) &&
+			    (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
+				break;
+		} else {
+			u32 id;
+
+			pci_read_config_dword(dev, PCI_COMMAND, &id);
+			if (!PCI_POSSIBLE_ERROR(id))
+				break;
+		}
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",