diff mbox series

usb: dwc3: Fix spurious wakeup when port is on device mode

Message ID 20231122165931.443845-1-gpiccoli@igalia.com
State New
Headers show
Series usb: dwc3: Fix spurious wakeup when port is on device mode | expand

Commit Message

Guilherme G. Piccoli Nov. 22, 2023, 4:51 p.m. UTC
It was noticed that on plugging a low-power USB source on Steam
Deck USB-C port (handled by dwc3), if such port is on device role,
the HW somehow keep asseting the PCIe PME line and triggering a
wakeup event on S3/deep suspend (that manifests as a PME wakeup
interrupt, failing the suspend path). That doesn't happen when the USB
port is on host mode or if the USB device connected is not a low-power
type (for example, a connected keyboard doesn't reproduce that).

Even by masking all maskable ACPI GPEs, the problem still reproduces; also
by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
observed as the HW succeeding to S3/suspend but waking up right after.

By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
(as one of the latest steps on gadget common suspend), we managed to
circumvent the issue. The mode restore is already done in the common
resume function. Notice that this patch does not change the in-driver
port state on suspend, or else the resume path "thinks" the port was
in host mode prior to suspend and resume it on a wrong fashion.

Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Vivek Dasmohapatra <vivek@collabora.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


Hi folks, first of all thanks in advance for reviews/feedback on this one.

This patch is the result of a debug approach with no datasheet access.
With that said, I'm not 100% sure writing to this register is free of
undesired side-effects; one thing I'm considering is the following scenario:
imagine a device A with the USB port on device/peripheral mode, and a device B
connected to it, acting as host. What if we want device B be able to wakeup
device A? Would this patch prevent that for all cases, always?

I appreciate ideas in case this is not the best approach to fix the
problem. We could also gate this approach to the HW board as a first step,
for example.

Thanks,


Guilherme


 drivers/usb/dwc3/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Guilherme G. Piccoli Dec. 4, 2023, 11:36 a.m. UTC | #1
On 22/11/2023 13:51, Guilherme G. Piccoli wrote:
> It was noticed that on plugging a low-power USB source on Steam
> Deck USB-C port (handled by dwc3), if such port is on device role,
> the HW somehow keep asseting the PCIe PME line and triggering a
> wakeup event on S3/deep suspend (that manifests as a PME wakeup
> interrupt, failing the suspend path). That doesn't happen when the USB
> port is on host mode or if the USB device connected is not a low-power
> type (for example, a connected keyboard doesn't reproduce that).
> 
> Even by masking all maskable ACPI GPEs, the problem still reproduces; also
> by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
> observed as the HW succeeding to S3/suspend but waking up right after.
> 
> By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
> (as one of the latest steps on gadget common suspend), we managed to
> circumvent the issue. The mode restore is already done in the common
> resume function. Notice that this patch does not change the in-driver
> port state on suspend, or else the resume path "thinks" the port was
> in host mode prior to suspend and resume it on a wrong fashion.
> 
> Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
> Cc: Vivek Dasmohapatra <vivek@collabora.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> 
> Hi folks, first of all thanks in advance for reviews/feedback on this one.
> 
> This patch is the result of a debug approach with no datasheet access.
> With that said, I'm not 100% sure writing to this register is free of
> undesired side-effects; one thing I'm considering is the following scenario:
> imagine a device A with the USB port on device/peripheral mode, and a device B
> connected to it, acting as host. What if we want device B be able to wakeup
> device A? Would this patch prevent that for all cases, always?
> 
> I appreciate ideas in case this is not the best approach to fix the
> problem. We could also gate this approach to the HW board as a first step,
> for example.
> 
> Thanks,
> 
> 
> Guilherme
> 
> 
>  drivers/usb/dwc3/core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 0328c86ef806..5801d3ebbbcb 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -104,7 +104,7 @@ static int dwc3_get_dr_mode(struct dwc3 *dwc)
>  	return 0;
>  }
>  
> -void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> +static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>  {
>  	u32 reg;
>  
> @@ -112,7 +112,11 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>  	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
>  	reg |= DWC3_GCTL_PRTCAPDIR(mode);
>  	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +}
>  
> +void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> +{
> +	__dwc3_set_prtcap(dwc, mode);
>  	dwc->current_dr_role = mode;
>  }
>  
> @@ -2128,6 +2132,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
>  			break;
>  		dwc3_gadget_suspend(dwc);
>  		synchronize_irq(dwc->irq_gadget);
> +		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
>  		dwc3_core_exit(dwc);
>  		break;
>  	case DWC3_GCTL_PRTCAP_HOST:

Hi folks, friendly ping on this one.
Thanks,


Guilherme
Thinh Nguyen Dec. 5, 2023, 1:23 a.m. UTC | #2
Hi,

On Mon, Dec 04, 2023, Guilherme G. Piccoli wrote:
> On 22/11/2023 13:51, Guilherme G. Piccoli wrote:
> > It was noticed that on plugging a low-power USB source on Steam
> > Deck USB-C port (handled by dwc3), if such port is on device role,

I'm not clear of the testing sequence here. Can you clarify further? Is
this device operating as host mode but then it switches role to device
mode when no device is connected?

> > the HW somehow keep asseting the PCIe PME line and triggering a
> > wakeup event on S3/deep suspend (that manifests as a PME wakeup
> > interrupt, failing the suspend path). That doesn't happen when the USB
> > port is on host mode or if the USB device connected is not a low-power
> > type (for example, a connected keyboard doesn't reproduce that).

Is the PME continuously generating non-stop? Did you test this in USB3
speed? Does this happen for every low-power device or just this specific
low-power device?

> > 
> > Even by masking all maskable ACPI GPEs, the problem still reproduces; also

Even if you masked all the interrupts, and the events are still
generating? Did you check if the driver handled wakeup from PME and
properly restore the controller?

> > by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
> > observed as the HW succeeding to S3/suspend but waking up right after.
> > 
> > By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
> > (as one of the latest steps on gadget common suspend), we managed to
> > circumvent the issue. The mode restore is already done in the common
> > resume function. Notice that this patch does not change the in-driver
> > port state on suspend, or else the resume path "thinks" the port was
> > in host mode prior to suspend and resume it on a wrong fashion.
> > 
> > Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
> > Cc: Vivek Dasmohapatra <vivek@collabora.com>
> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> > ---
> > 
> > 
> > Hi folks, first of all thanks in advance for reviews/feedback on this one.
> > 
> > This patch is the result of a debug approach with no datasheet access.
> > With that said, I'm not 100% sure writing to this register is free of
> > undesired side-effects; one thing I'm considering is the following scenario:
> > imagine a device A with the USB port on device/peripheral mode, and a device B
> > connected to it, acting as host. What if we want device B be able to wakeup
> > device A? Would this patch prevent that for all cases, always?
> > 
> > I appreciate ideas in case this is not the best approach to fix the
> > problem. We could also gate this approach to the HW board as a first step,
> > for example.
> > 
> > Thanks,
> > 
> > 
> > Guilherme
> > 
> > 
> >  drivers/usb/dwc3/core.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> > index 0328c86ef806..5801d3ebbbcb 100644
> > --- a/drivers/usb/dwc3/core.c
> > +++ b/drivers/usb/dwc3/core.c
> > @@ -104,7 +104,7 @@ static int dwc3_get_dr_mode(struct dwc3 *dwc)
> >  	return 0;
> >  }
> >  
> > -void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> > +static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> >  {
> >  	u32 reg;
> >  
> > @@ -112,7 +112,11 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> >  	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
> >  	reg |= DWC3_GCTL_PRTCAPDIR(mode);
> >  	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> > +}
> >  
> > +void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> > +{
> > +	__dwc3_set_prtcap(dwc, mode);
> >  	dwc->current_dr_role = mode;
> >  }
> >  
> > @@ -2128,6 +2132,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
> >  			break;
> >  		dwc3_gadget_suspend(dwc);
> >  		synchronize_irq(dwc->irq_gadget);
> > +		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
> >  		dwc3_core_exit(dwc);
> >  		break;
> >  	case DWC3_GCTL_PRTCAP_HOST:
> 
> Hi folks, friendly ping on this one.
> Thanks,
> 
> 

Some platforms may need a soft reset before a change to prtcapdir. This
may break some setups. This seems to be a workaround and should not be
treated as part of a normal flow.

BR,
Thinh
Guilherme G. Piccoli Dec. 5, 2023, 2:40 p.m. UTC | #3
Hi Thinh, thanks for your response. I'll clarify inline below:

On 04/12/2023 22:23, Thinh Nguyen wrote:
> [...]
>>> It was noticed that on plugging a low-power USB source on Steam
>>> Deck USB-C port (handled by dwc3), if such port is on device role,
> 
> I'm not clear of the testing sequence here. Can you clarify further? Is
> this device operating as host mode but then it switches role to device
> mode when no device is connected?
> 

Exactly this. We have a driver that changes between host/device mode
according to ACPI notifications on port connect. But to make
tests/discussion easier and eliminate more variables, we just dropped
this driver and do it manually.

So the steps are:

(A) host mode test
1) Put the port on host mode using debugfs interface.
2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
we call this connection a "low-power source", since it seems to charge
slowly the Deck.
3) Suspend the Deck after some seconds (S3/deep) - success

(B) device mode test

1) Put the port on device mode using debugfs interface.
2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
3) Suspend the Deck after some seconds (S3/deep) - fails

3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
is pending, in this case the Steam Deck effectively doesn't enter suspend.

3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
than a second), wakes up properly.


>>> the HW somehow keep asseting the PCIe PME line and triggering a
>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
>>> interrupt, failing the suspend path). That doesn't happen when the USB
>>> port is on host mode or if the USB device connected is not a low-power
>>> type (for example, a connected keyboard doesn't reproduce that).
> 
> Is the PME continuously generating non-stop? Did you test this in USB3
> speed? Does this happen for every low-power device or just this specific
> low-power device?

It seems PME is continuously being generated, yes. I tested by
connecting to my laptop as mentioned, I guess others tested different
scenarios, not always reproduces. An example: a keyboard or a disk
connected when the USB port is on device mode doesn't reproduce. Also, I
think I didn't test "in USB3 speed" - could you detail more, not sure if
I understood that properly.


> [...] 
> Even if you masked all the interrupts, and the events are still
> generating? Did you check if the driver handled wakeup from PME and
> properly restore the controller?
> 

Ok, let me clarify a bit. From the ACPI perspective, I was able to check
from kernel that some GPEs were generated on resume when the issue
happens (and potentially even when the issue doesn't happen, in host
mode for example). So, what I did was masking all these GPEs using the
kernel sysfs interface. After masking, the issue still reproduces but
the GPEs count doesn't increase.

Regarding the PME interrupt now: if MSI is used on PME, I can see an
increase of 1 in every suspend/resume attempt (checking
/proc/interrupts). Now if MSIs are not used, guess what? There was no
increase in the interrupt at all. I didn't mask the PME interrupt on
PCIe PME driver...but even with PME driver disabled, IIRC the problem
reproduces.

"Did you check if the driver handled wakeup from PME and properly
restore the controller?" <- I think I didn't - how do you suggest me to
check that?

What I've noticed is that either the system can't suspend, or if no MSIs
are used on PCIe PME, it suspends and resumes right after, with success.
In this latter case, dwc3 works normally again, resume is successful.
Let me know if you want me to check any other path or function called, etc.


> [...]
> 
> Some platforms may need a soft reset before a change to prtcapdir. This
> may break some setups. This seems to be a workaround and should not be
> treated as part of a normal flow.

OK, do you have any other idea of a register change that is softer than
changing "prtcapdir" and could prevent the issue? Also, would that
workaround makes sense as...a quirk?

We could guard it for Deck's HW exclusively, using DMI if you think it
does make sense...or the dwc3 quirks (already have some for AMD right?)

I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
suggestions are much appreciated.
Thanks,


Guilherme
Guilherme G. Piccoli Jan. 9, 2024, 6 p.m. UTC | #4
On 05/12/2023 11:40, Guilherme G. Piccoli wrote:
> Hi Thinh, thanks for your response. I'll clarify inline below:
> 
> On 04/12/2023 22:23, Thinh Nguyen wrote:
>> [...]
>>>> It was noticed that on plugging a low-power USB source on Steam
>>>> Deck USB-C port (handled by dwc3), if such port is on device role,
>>
>> I'm not clear of the testing sequence here. Can you clarify further? Is
>> this device operating as host mode but then it switches role to device
>> mode when no device is connected?
>>
> 
> Exactly this. We have a driver that changes between host/device mode
> according to ACPI notifications on port connect. But to make
> tests/discussion easier and eliminate more variables, we just dropped
> this driver and do it manually.
> 
> So the steps are:
> 
> (A) host mode test
> 1) Put the port on host mode using debugfs interface.
> 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
> we call this connection a "low-power source", since it seems to charge
> slowly the Deck.
> 3) Suspend the Deck after some seconds (S3/deep) - success
> 
> (B) device mode test
> 
> 1) Put the port on device mode using debugfs interface.
> 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
> 3) Suspend the Deck after some seconds (S3/deep) - fails
> 
> 3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
> is pending, in this case the Steam Deck effectively doesn't enter suspend.
> 
> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> than a second), wakes up properly.
> 
> 
>>>> the HW somehow keep asseting the PCIe PME line and triggering a
>>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
>>>> interrupt, failing the suspend path). That doesn't happen when the USB
>>>> port is on host mode or if the USB device connected is not a low-power
>>>> type (for example, a connected keyboard doesn't reproduce that).
>>
>> Is the PME continuously generating non-stop? Did you test this in USB3
>> speed? Does this happen for every low-power device or just this specific
>> low-power device?
> 
> It seems PME is continuously being generated, yes. I tested by
> connecting to my laptop as mentioned, I guess others tested different
> scenarios, not always reproduces. An example: a keyboard or a disk
> connected when the USB port is on device mode doesn't reproduce. Also, I
> think I didn't test "in USB3 speed" - could you detail more, not sure if
> I understood that properly.
> 
> 
>> [...] 
>> Even if you masked all the interrupts, and the events are still
>> generating? Did you check if the driver handled wakeup from PME and
>> properly restore the controller?
>>
> 
> Ok, let me clarify a bit. From the ACPI perspective, I was able to check
> from kernel that some GPEs were generated on resume when the issue
> happens (and potentially even when the issue doesn't happen, in host
> mode for example). So, what I did was masking all these GPEs using the
> kernel sysfs interface. After masking, the issue still reproduces but
> the GPEs count doesn't increase.
> 
> Regarding the PME interrupt now: if MSI is used on PME, I can see an
> increase of 1 in every suspend/resume attempt (checking
> /proc/interrupts). Now if MSIs are not used, guess what? There was no
> increase in the interrupt at all. I didn't mask the PME interrupt on
> PCIe PME driver...but even with PME driver disabled, IIRC the problem
> reproduces.
> 
> "Did you check if the driver handled wakeup from PME and properly
> restore the controller?" <- I think I didn't - how do you suggest me to
> check that?
> 
> What I've noticed is that either the system can't suspend, or if no MSIs
> are used on PCIe PME, it suspends and resumes right after, with success.
> In this latter case, dwc3 works normally again, resume is successful.
> Let me know if you want me to check any other path or function called, etc.
> 
> 
>> [...]
>>
>> Some platforms may need a soft reset before a change to prtcapdir. This
>> may break some setups. This seems to be a workaround and should not be
>> treated as part of a normal flow.
> 
> OK, do you have any other idea of a register change that is softer than
> changing "prtcapdir" and could prevent the issue? Also, would that
> workaround makes sense as...a quirk?
> 
> We could guard it for Deck's HW exclusively, using DMI if you think it
> does make sense...or the dwc3 quirks (already have some for AMD right?)
> 
> I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
> suggestions are much appreciated.
> Thanks,
> 
> 
> Guilherme


Hi folks, just curious if you think the way forward would be indeed to
quirk it based on DMI/device ID, or if should pursue another approach
here. Suggestions are very welcome, and thanks in advance!


Guilherme
Thinh Nguyen Jan. 11, 2024, 2:01 a.m. UTC | #5
Hi,

On Tue, Jan 09, 2024, Guilherme G. Piccoli wrote:
> On 05/12/2023 11:40, Guilherme G. Piccoli wrote:
> > Hi Thinh, thanks for your response. I'll clarify inline below:
> > 
> > On 04/12/2023 22:23, Thinh Nguyen wrote:
> >> [...]
> >>>> It was noticed that on plugging a low-power USB source on Steam
> >>>> Deck USB-C port (handled by dwc3), if such port is on device role,
> >>
> >> I'm not clear of the testing sequence here. Can you clarify further? Is
> >> this device operating as host mode but then it switches role to device
> >> mode when no device is connected?
> >>
> > 
> > Exactly this. We have a driver that changes between host/device mode
> > according to ACPI notifications on port connect. But to make
> > tests/discussion easier and eliminate more variables, we just dropped
> > this driver and do it manually.
> > 
> > So the steps are:
> > 
> > (A) host mode test
> > 1) Put the port on host mode using debugfs interface.
> > 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
> > we call this connection a "low-power source", since it seems to charge
> > slowly the Deck.

I assume there was a role switch negotiation to switch to device mode
successfully here before the next step?

> > 3) Suspend the Deck after some seconds (S3/deep) - success
> > 
> > (B) device mode test
> > 
> > 1) Put the port on device mode using debugfs interface.
> > 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
> > 3) Suspend the Deck after some seconds (S3/deep) - fails
> > 
> > 3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
> > is pending, in this case the Steam Deck effectively doesn't enter suspend.
> > 
> > 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> > than a second), wakes up properly.
> > 

Your platform is DRD right? If that's the case, then it should be using
level interrupt. It should not support MSI unless it's host mode only.

> > 
> >>>> the HW somehow keep asseting the PCIe PME line and triggering a
> >>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
> >>>> interrupt, failing the suspend path). That doesn't happen when the USB
> >>>> port is on host mode or if the USB device connected is not a low-power
> >>>> type (for example, a connected keyboard doesn't reproduce that).
> >>
> >> Is the PME continuously generating non-stop? Did you test this in USB3
> >> speed? Does this happen for every low-power device or just this specific
> >> low-power device?
> > 
> > It seems PME is continuously being generated, yes. I tested by
> > connecting to my laptop as mentioned, I guess others tested different
> > scenarios, not always reproduces. An example: a keyboard or a disk
> > connected when the USB port is on device mode doesn't reproduce. Also, I
> > think I didn't test "in USB3 speed" - could you detail more, not sure if
> > I understood that properly.

I mean to ask whether this test was done while operating in SuperSpeed
or SuperSpeed Plus.

> > 
> > 
> >> [...] 
> >> Even if you masked all the interrupts, and the events are still
> >> generating? Did you check if the driver handled wakeup from PME and
> >> properly restore the controller?
> >>
> > 
> > Ok, let me clarify a bit. From the ACPI perspective, I was able to check
> > from kernel that some GPEs were generated on resume when the issue
> > happens (and potentially even when the issue doesn't happen, in host
> > mode for example). So, what I did was masking all these GPEs using the
> > kernel sysfs interface. After masking, the issue still reproduces but
> > the GPEs count doesn't increase.
> > 
> > Regarding the PME interrupt now: if MSI is used on PME, I can see an
> > increase of 1 in every suspend/resume attempt (checking
> > /proc/interrupts). Now if MSIs are not used, guess what? There was no
> > increase in the interrupt at all. I didn't mask the PME interrupt on
> > PCIe PME driver...but even with PME driver disabled, IIRC the problem
> > reproduces.
> > 
> > "Did you check if the driver handled wakeup from PME and properly
> > restore the controller?" <- I think I didn't - how do you suggest me to
> > check that?

If it's in device mode, and you mentioned PME, that means that the
device was in hibernation. I assume that you're not using the mainline
dwc3 driver if Steam Deck supports hibernation and was in hibernation
before the connection. Otherwise, PME should not be generated. If it
does, something is broken and requires a workaround (as the one you
have).

> > 
> > What I've noticed is that either the system can't suspend, or if no MSIs
> > are used on PCIe PME, it suspends and resumes right after, with success.
> > In this latter case, dwc3 works normally again, resume is successful.
> > Let me know if you want me to check any other path or function called, etc.
> > 
> > 
> >> [...]
> >>
> >> Some platforms may need a soft reset before a change to prtcapdir. This
> >> may break some setups. This seems to be a workaround and should not be
> >> treated as part of a normal flow.
> > 
> > OK, do you have any other idea of a register change that is softer than
> > changing "prtcapdir" and could prevent the issue? Also, would that
> > workaround makes sense as...a quirk?
> > 
> > We could guard it for Deck's HW exclusively, using DMI if you think it
> > does make sense...or the dwc3 quirks (already have some for AMD right?)
> > 

Yes, you can create a specific quirk for this device.

> > I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
> > suggestions are much appreciated.
> > Thanks,
> > 
> > 
> > Guilherme
> 
> 
> Hi folks, just curious if you think the way forward would be indeed to
> quirk it based on DMI/device ID, or if should pursue another approach
> here. Suggestions are very welcome, and thanks in advance!
> 

Sorry for the delay response. Just got back from break.

Thanks,
Thinh
Guilherme G. Piccoli Jan. 11, 2024, 4:13 p.m. UTC | #6
Hi Thinh, thanks for your response! My comments are inline below:


On 10/01/2024 23:01, Thinh Nguyen wrote:
> [...]
> 
> I assume there was a role switch negotiation to switch to device mode
> successfully here before the next step?

Yes, exactly. We have an out-of-tree driver that reads the port state
through some ACPI message to switch modes, but to be 100% clear:

**This OOT driver was factored out for our tests** - in other words: all
tests made were done by manually changing the port mode (via debugfs)
and waiting some seconds for that to settle. This OOT driver is not even
compiled for recent kernels (it runs in a downstream 6.1 kernel).


>>> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
>>> than a second), wakes up properly.
>>>
> 
> Your platform is DRD right? If that's the case, then it should be using
> level interrupt. It should not support MSI unless it's host mode only.
>

Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
the USB PCI device. Here is an output of lspci:

$ lspci -vknns 04:00.3
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
        Subsystem: Valve Software VanGogh USB0 [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
Len=010 <?>
        Kernel driver in use: dwc3-pci
        Kernel modules: dwc3_pci

Now, I **guess** this is expected, but there is a difference in
/proc/interrupt between device and host mode:

$ grep 26: /proc/interrupts | tr -s \  # device mode
[empty]

$ grep 26: /proc/interrupts | tr -s \  # host mode
 26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3


>>> [...] An example: a keyboard or a disk
>>> connected when the USB port is on device mode doesn't reproduce. Also, I
>>> think I didn't test "in USB3 speed" - could you detail more, not sure if
>>> I understood that properly.
> 
> I mean to ask whether this test was done while operating in SuperSpeed
> or SuperSpeed Plus.

Well, I'm not sure if I know how to answer that heh
Checking "lsusb --tree" in host mode gives me:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 10000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M

When I switch to device mode, there is no output (and I think this is
expected, right?). Or...the question was about the USB port in my
laptop, the one I'm connecting on the Deck?

Apologies for my hard time understanding this one...


>>>> [...] 
>>> "Did you check if the driver handled wakeup from PME and properly
>>> restore the controller?" <- I think I didn't - how do you suggest me to
>>> check that?
> 
> If it's in device mode, and you mentioned PME, that means that the
> device was in hibernation. I assume that you're not using the mainline
> dwc3 driver if Steam Deck supports hibernation and was in hibernation
> before the connection. Otherwise, PME should not be generated. If it
> does, something is broken and requires a workaround (as the one you
> have).

There was no hibernation (S4 state) involved, just to clarify - it's a
mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
indeed, the PME seems to be generated and prevents the mem_sleep (or it
does sleep but instantly wakes-up, which is the case with level interrupts).

I'll check both Steam Deck models (LCD and OLED) to see if both can be
quirked in the same way and provide then a simple patch doing that for
review, makes sense?


> [...] 
> Sorry for the delay response. Just got back from break.

No need for apologies at all, thanks a bunch for your comprehensive
response!

Cheers,


Guilherme
Thinh Nguyen Jan. 13, 2024, 1:33 a.m. UTC | #7
On Thu, Jan 11, 2024, Guilherme G. Piccoli wrote:
> Hi Thinh, thanks for your response! My comments are inline below:
> 
> 
> On 10/01/2024 23:01, Thinh Nguyen wrote:
> > [...]
> > 
> > I assume there was a role switch negotiation to switch to device mode
> > successfully here before the next step?
> 
> Yes, exactly. We have an out-of-tree driver that reads the port state
> through some ACPI message to switch modes, but to be 100% clear:
> 
> **This OOT driver was factored out for our tests** - in other words: all
> tests made were done by manually changing the port mode (via debugfs)
> and waiting some seconds for that to settle. This OOT driver is not even
> compiled for recent kernels (it runs in a downstream 6.1 kernel).
> 
> 
> >>> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> >>> than a second), wakes up properly.
> >>>
> > 
> > Your platform is DRD right? If that's the case, then it should be using
> > level interrupt. It should not support MSI unless it's host mode only.
> >
> 
> Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
> the USB PCI device. Here is an output of lspci:
> 
> $ lspci -vknns 04:00.3
> 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
>         Subsystem: Valve Software VanGogh USB0 [1e44:1776]
>         Flags: bus master, fast devsel, latency 0, IRQ 26
>         Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [64] Express Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
>         Capabilities: [c0] MSI-X: Enable- Count=8 Masked-

Are you showing MSI enabled? That looks like MSI is disabled.

>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>         Kernel driver in use: dwc3-pci
>         Kernel modules: dwc3_pci
> 
> Now, I **guess** this is expected, but there is a difference in
> /proc/interrupt between device and host mode:
> 
> $ grep 26: /proc/interrupts | tr -s \  # device mode
> [empty]
> 
> $ grep 26: /proc/interrupts | tr -s \  # host mode
>  26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3

Looks like it's level interrupt here. I don't see "edge" or MSI
interrupt. Is the pcie_pme share the same interrupt line as the usb
controller?

I'm not sure how the STEAM DECK is designed. Does the OOT driver manages
the power state request outside of the PCI PM for device mode and not
just reading the port state? If that's the case, the issue could be in
the OOT driver?

> 
> 
> >>> [...] An example: a keyboard or a disk
> >>> connected when the USB port is on device mode doesn't reproduce. Also, I
> >>> think I didn't test "in USB3 speed" - could you detail more, not sure if
> >>> I understood that properly.
> > 
> > I mean to ask whether this test was done while operating in SuperSpeed
> > or SuperSpeed Plus.
> 
> Well, I'm not sure if I know how to answer that heh
> Checking "lsusb --tree" in host mode gives me:
> 
> /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 10000M
> /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
> 
> When I switch to device mode, there is no output (and I think this is
> expected, right?). Or...the question was about the USB port in my
> laptop, the one I'm connecting on the Deck?
> 
> Apologies for my hard time understanding this one...
> 

No problem. I was referring to connected speed. The print above doesn't
show any device connected. PME generation can come from USB2 or USB3
PMU. I wanted to check if the problem was because of one of them is
misbehaving, but I don't think it's related to this.

> 
> >>>> [...] 
> >>> "Did you check if the driver handled wakeup from PME and properly
> >>> restore the controller?" <- I think I didn't - how do you suggest me to
> >>> check that?
> > 
> > If it's in device mode, and you mentioned PME, that means that the
> > device was in hibernation. I assume that you're not using the mainline
> > dwc3 driver if Steam Deck supports hibernation and was in hibernation
> > before the connection. Otherwise, PME should not be generated. If it
> > does, something is broken and requires a workaround (as the one you
> > have).
> 
> There was no hibernation (S4 state) involved, just to clarify - it's a
> mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
> indeed, the PME seems to be generated and prevents the mem_sleep (or it
> does sleep but instantly wakes-up, which is the case with level interrupts).

I was referring to the controller hibernation and not system
hibernation. S3 will cause the xhci driver to put the host controller to
go into hibernation and send a power state change request through PCI
PM. Usually the power state change would put the core in D3 and passes
the control to the PMU, and PME generation can happen then. Similar
behavior applies to device mode, but the power state change may be tied
to a different interface than PCI PM?

Perhaps you're missing the logic/flow to update the power state change
when in device mode. And perhaps putting the controller in host mode
passes the control to xhci allowing the driver to properly manage the
power state preventing PME from generated? It's a little difficult to
debug without more info.

Did you seek help from the vendor?

> 
> I'll check both Steam Deck models (LCD and OLED) to see if both can be
> quirked in the same way and provide then a simple patch doing that for
> review, makes sense?
> 

I assume you ruled out problems from bad cable or faulty laptop/device?

Thanks,
Thinh
Guilherme G. Piccoli Jan. 21, 2024, 5:48 p.m. UTC | #8
On 17/01/2024 21:39, Thinh Nguyen wrote:
> [...]
> That means the disconnection is initiated by the dwc3 driver. This
> should happen when you put the STEAM DECK in suspend while connected to
> the laptop. From your laptop, you should see the DECK is disconnected as
> if the cable is unplugged.
> 
> If that did not happen, can you capture the dwc3 tracepoints to see
> what's wrong?
> 
> You can follow this instruction to capture the tracepoints:
> https://www.kernel.org/doc/html/latest/driver-api/usb/dwc3.html#required-information
> 

Hi Thinh, thanks again for your patience and suggestions!

I've tried my best to determine the state of the USB port from my laptop
PoV when connecting the Deck there, but I was unsuccessful. The Deck
doesn't appear on laptop's lsusb output, and there's nothing on dmesg /
tracing (xhci), dynamic debug or some power interfaces I poked about
that. The way to go, IIUC, it's now trying to measure that directly in
the port, using a multimeter or some HW device for that. I don't have
that, but what I was able to do instead is collecting the traces you
asked, at least heh


So, I did 4 collections, all in the attached tarball.

(1) Right after booting the Steam Deck and enabling the traces [0], I've
changed the mode of dwc3 (in the debugfs) from host to device - results
on {trace,regdump}.1

(2) Plugged the USB cable connecting the Deck to my laptop - results at
{trace,regdump}.2 and as we can see, traces are empty.

(3) Attempt suspending the Deck (by running "echo mem >
/sys/power/state"), it failed as expected - then I've collected results
on {trace,regdump}.3

(4) [bonus] Collected the same results of 3 (after rebooting the Deck)
but running dwc3 with this patch/quirk - it's easy to notice in the
trace, as we can see the extras readl/writel prior to suspend. In this
case, suspend works...and results are on {trace,regdump}.4-PATCHED

Tests were done on kernel 6.7 mainline, no OOT drivers running. Hope it
helps to narrow down the issue, and again, thanks for your help here.

[0]
cd /sys/kernel/tracing/
echo 1 > events/dwc3/enable
echo :mod:dwc3 > set_ftrace_filter
echo :mod:dwc3_pci >> set_ftrace_filter
echo function > current_tracer


>  [...]
> That means to unplug the cable connected to the STEAM DECK. Put the
> STEAM DECK to suspend. Check to see if it stays suspend or it would wake
> up right away.

Oh, this case was tested before and it works fine =)


> Also, in your test, the connected host (the laptop) remained ON at all
> time while the STEAM DECK was tested for suspend right?
> 

Yes, laptop always powered ON!
Cheers,


Guilherme
Guilherme G. Piccoli Jan. 28, 2024, 3:25 p.m. UTC | #9
Hi Thinh, thanks again for the help and apologies for my delay.
Responses inline below:


On 22/01/2024 23:12, Thinh Nguyen wrote:
> [...]
> 
> The tracepoints indicated that no gadget driver was binded. So the
> controller actually never started (ie. soft-connection never happened if
> the controller doesn't start).
> 
>>
>> (2) Plugged the USB cable connecting the Deck to my laptop - results at
>> {trace,regdump}.2 and as we can see, traces are empty.
> 
> Right, because as noted above, no activity is seen.
> 

Okay, that makes sense then and explains why I see nothing related to
the Deck on my laptop! Do you know a SW way to measure that the USB port
is providing power / "energy"? It's just out of curiosity, not that we
need that in this case (knowing the lack of a gadget driver).


>> [...]
> 
> I can see the device was resumed right after.
> 
>   kworker/u32:10-1078    [002] ...1.   179.453703: dwc3_pci_suspend <-pci_pm_suspend
>   kworker/u32:10-1078    [002] ...1.   179.453704: dwc3_pci_dsm <-pci_pm_suspend
>   kworker/u32:19-1087    [005] ...1.   179.939836: dwc3_pci_resume <-dpm_run_callback
> 
>>
>> (4) [bonus] Collected the same results of 3 (after rebooting the Deck)
>> but running dwc3 with this patch/quirk - it's easy to notice in the
>> trace, as we can see the extras readl/writel prior to suspend. In this
>> case, suspend works...and results are on {trace,regdump}.4-PATCHED
> 
> Even in the patched version, device was resumed right after. The resume
> here may not trigger the system resume, but it resumed nonetheless.
> 
>    kworker/u32:6-356     [006] ...1.    69.600400: dwc3_pci_suspend <-pci_pm_suspend
>    kworker/u32:6-356     [006] ...1.    69.600401: dwc3_pci_dsm <-pci_pm_suspend
>   kworker/u32:22-1028    [001] ...1.    70.125294: dwc3_pci_resume <-dpm_run_callback
>

Yeah, in both cases the resume happened. But they differ: without the
patch, resume was automatic - the device effectively can't suspend since
it auto-resumes instantly. Whereas with the patch (scenario 4), I've
triggered the resume by pressing the power button on Steam Deck.

And ftrace timestamps unfortunately don't help with that, it's not
showing how much time the device stay suspended, so it might be
confusing and both cases could appear as the same exact scenario, even
if they are completely different (fail vs success cases).


>> [...]
> Thanks for the logs and data points. The tracepoints indicate that the
> workaround _only_ prevented the system wakeup, not the controller.
> The wakeup still happen there as you can see the controller got woken up
> immediately after request to go into suspend in both cases, patched or
> not. So the workaround you provided doesn't help keeping the controller
> in suspend.

Yeah, this is not really the case - as mentioned above, though they
appear the same in traces, without the workaround we have an immediate
auto-resume, preventing the suspend. With the patch, device suspends and
we can keep it this way for the amount of time we want, only resuming
when we press the power button (which is exactly the expected behavior).


> 
> We need to look into why that's the case, and we need to figure out
> where the source of the wake came from. Do you have a connector
> controller that can also wakeup the system?

I'm not sure about this, what do you mean by a connector controller?!


> 
> As for the workaround, if the wakeup source is the usb controller, did
> you try to disable wakeup?
> 
> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
> index 6604845c397c..e395d7518022 100644
> --- a/drivers/usb/dwc3/dwc3-pci.c
> +++ b/drivers/usb/dwc3/dwc3-pci.c
> @@ -359,7 +359,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
>  		goto err;
>  	}
>  
> -	device_init_wakeup(dev, true);
> +	device_init_wakeup(dev, false);
>  	pci_set_drvdata(pci, dwc);
>  	pm_runtime_put(dev);
>  #ifdef CONFIG_PM
> 
>  If it works, you can fine tune to only disable wakeup when in device
>  mode and enable when in host mode with device_set_wakeup_enable().
> 

Oh, great suggestion - thanks! And...it worked!

So, with your patch, I'm able to properly suspend the Deck, even with
dwc3 in device/gadget mode.

I'll try to cook a patch with this approach but restricted to this
specific USB controller and only when in gadget mode - do you think this
is the way to go?

Any other debug / logs that might be useful?
Cheers,

Guilherme
Guilherme G. Piccoli Jan. 31, 2024, 6:16 p.m. UTC | #10
On 29/01/2024 23:17, Thinh Nguyen wrote:
> [...]
>> And ftrace timestamps unfortunately don't help with that, it's not
>> showing how much time the device stay suspended, so it might be
>> confusing and both cases could appear as the same exact scenario, even
>> if they are completely different (fail vs success cases).
> 
> That's odd.. my assumption was the timestamps to be valid. Were you able
> to confirm that it's the issue with the timestamps? Perhaps check if the
> other devices in the system also wakeup at the approximately same
> time printed in the timestamps?
> 

Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
with the way kernel keeps track of clocks on suspend - despite TSC keeps
accounting on suspend (at least for all modern x86 processors IIUC), the
timestamps in both tracing buffer or dmesg do not reflect suspend time.

Is it different in your x86 systems? Or maybe in other architectures you
have experience with?

I've done a test on Steam Deck, take a look in both attachments - seems
to be aligned with my perception. Also checked dmesg on my Intel laptop
(i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
reflect the time spent in S3 suspend...


>>> [...]
>>> We need to look into why that's the case, and we need to figure out
>>> where the source of the wake came from. Do you have a connector
>>> controller that can also wakeup the system?
>>
>> I'm not sure about this, what do you mean by a connector controller?!
> 
> I mean connector controller such as Type-C Port Controller (TCPC) or
> some specific phy that can detect and report to a driver whether a
> connection event occurs. For the lack of better term, I used connector
> controller... (perhaps just connector?)

Thanks for the clarification! I don't have it at hands anyway,
unfortunately heh


> [...]
> Perhaps we can make this a generic change across different devices. See
> below.
> 
>>
>> Any other debug / logs that might be useful?
> 
> In your setup, can you check if host mode wakeup is enabled:
> 
> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
> index c5d9ac67e612..716b05ff0384 100644
> --- a/drivers/usb/host/xhci-plat.c
> +++ b/drivers/usb/host/xhci-plat.c
> @@ -472,6 +472,10 @@ static int xhci_plat_suspend(struct device *dev)
>         ret = xhci_priv_suspend_quirk(hcd);
>         if (ret)
>                 return ret;
> +
> +       dev_info(dev, "%s: device_may_wakeup: %d\n",
> +                __func__, !!device_may_wakeup(dev));
> +
>         /*
>          * xhci_suspend() needs `do_wakeup` to know whether host is allowed
>          * to do wakeup during suspend.

OK, tried with this hunk alone, and this is the result:

$ dmesg | grep "suspend\|xhci"
[227.990149] PM: suspend entry (deep)
[228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
[228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
device_may_wakeup: 0
[228.878517] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
[229.150502] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
[229.318882] PM: suspend exit


> When you go through the dwc3-pci code path, the dwc3 creates the
> platform device from the pci device. Perhaps it doesn't enable wakeup.
> Let's confirm that.
> 
> My suspicion is that the power management is mapped to the same PCI
> PMCSR for both the host and device mode. When the device is in suspend
> (D3), it gives control the the PMUs. If the PME is enabled, the PMUs
> will generate PME on connection detection if the PME is enabled. When
> you go through the xhci code path, wakeup may not be enabled.
> 
> For device mode, we can handle generically by only enabling wakeup if
> the following conditions are met:
> 	if (dwc->gadget_driver && dwc->softconnect)
> 
> Try this (totally untested):
> 
> <patch>

Thanks a bunch Thinh, with this patch applied (plus the above hunk on
xhci-plat), things work both in host or device mode - I could properly
suspend and resume after pressing the power button in the Steam Deck.

So, how should we proceed now? Could you send a final/official version
of the patch? I could test and provide my Tested-by (and proceed with
backporting to Deck's kernel). Or let me know if you want/need more tests.

Thanks again for all the help/support in this issue =)
Cheers,


Guilherme
[ 1581.136572] PM: suspend entry (deep)
[ 1581.153993] Filesystems sync: 0.017 seconds
[ 1581.158424] Freezing user space processes
[ 1581.159767] Freezing user space processes completed (elapsed 0.001 seconds)
[ 1581.159774] OOM killer disabled.
[ 1581.159776] Freezing remaining freezable tasks
[ 1581.161135] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 1581.161181] printk: Suspending console(s) (use no_console_suspend to debug)
[ 1581.161898] wlan0: deauthenticating from e0:63:da:37:36:f4 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 1581.634343] ACPI: EC: interrupt blocked
[ 1581.666006] ACPI: PM: Preparing to enter system sleep state S3
[ 1581.667021] ACPI: EC: event blocked
[ 1581.667023] ACPI: EC: EC stopped
[ 1581.667025] ACPI: PM: Saving platform NVS memory
[ 1581.667669] Disabling non-boot CPUs ...
[ 1581.670318] smpboot: CPU 1 is now offline
[ 1581.673418] smpboot: CPU 2 is now offline
[ 1581.676352] smpboot: CPU 3 is now offline
[ 1581.679245] smpboot: CPU 4 is now offline
[ 1581.681863] smpboot: CPU 5 is now offline
[ 1581.684751] smpboot: CPU 6 is now offline
[ 1581.685487] Spectre V2 : Update user space SMT mitigation: STIBP off
[ 1581.687489] smpboot: CPU 7 is now offline
[ 1581.688608] ACPI: PM: Low-level resume complete
[ 1581.688638] ACPI: EC: EC started
[ 1581.688640] ACPI: PM: Restoring platform NVS memory
[ 1581.689102] LVT offset 0 assigned for vector 0x400
[ 1581.689790] Enabling non-boot CPUs ...
[ 1581.689896] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1581.693107] ACPI: \_SB_.PLTF.C001: Found 3 idle states
[ 1581.693589] Spectre V2 : Update user space SMT mitigation: STIBP always-on
[ 1581.693642] CPU1 is up
[ 1581.693748] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 1581.697023] ACPI: \_SB_.PLTF.C002: Found 3 idle states
[ 1581.697481] CPU2 is up
[ 1581.697554] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 1581.700848] ACPI: \_SB_.PLTF.C003: Found 3 idle states
[ 1581.701327] CPU3 is up
[ 1581.701406] smpboot: Booting Node 0 Processor 4 APIC 0x4
[ 1581.704698] ACPI: \_SB_.PLTF.C004: Found 3 idle states
[ 1581.705151] CPU4 is up
[ 1581.705237] smpboot: Booting Node 0 Processor 5 APIC 0x5
[ 1581.708525] ACPI: \_SB_.PLTF.C005: Found 3 idle states
[ 1581.709044] CPU5 is up
[ 1581.709121] smpboot: Booting Node 0 Processor 6 APIC 0x6
[ 1581.712419] ACPI: \_SB_.PLTF.C006: Found 3 idle states
[ 1581.712935] CPU6 is up
[ 1581.713008] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 1581.716321] ACPI: \_SB_.PLTF.C007: Found 3 idle states
[ 1581.716896] CPU7 is up
[ 1581.718792] ACPI: PM: Waking up from system sleep state S3
[ 1581.720526] ACPI: EC: interrupt unblocked
[ 1581.727819] ACPI: EC: event unblocked
[ 1581.741323] nvme nvme0: Shutdown timeout set to 8 seconds
[ 1581.800877] nvme nvme0: 12/0/0 default/read/poll queues
[ 1582.048421] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
[ 1582.320325] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
[ 1582.503773] OOM killer enabled.
[ 1582.503778] Restarting tasks ... done.
[ 1582.504480] random: crng reseeded on system resumption
[ 1582.505927] Bluetooth: hci0: RTL: examining hci_ver=0a hci_rev=000c lmp_ver=0a lmp_subver=8822
[ 1582.507985] Bluetooth: hci0: RTL: rom_version status=0 version=3
[ 1582.508000] Bluetooth: hci0: RTL: loading rtl_bt/rtl8822cu_fw.bin
[ 1582.508847] Bluetooth: hci0: RTL: loading rtl_bt/rtl8822cu_config.bin
[ 1582.509047] Bluetooth: hci0: RTL: cfg_sz 6, total sz 35458
[ 1582.511146] PM: suspend exit
[ 1582.815948] Bluetooth: hci0: RTL: fw version 0xffb8abd3
[ 1582.921928] Bluetooth: hci0: AOSP extensions version v1.00
[ 1582.921939] Bluetooth: hci0: AOSP quality report is supported
[ 1582.922213] Bluetooth: MGMT ver 1.22
[ 1583.098188] wlan0: authenticate with e0:63:da:37:36:f4 (local address=14:d4:24:71:6d:69)
[ 1583.170967] wlan0: send auth to e0:63:da:37:36:f4 (try 1/3)
[ 1583.181110] wlan0: authenticated
[ 1583.184241] wlan0: associate with e0:63:da:37:36:f4 (try 1/3)
[ 1583.194507] wlan0: RX AssocResp from e0:63:da:37:36:f4 (capab=0x1431 status=0 aid=2)
[ 1583.194863] wlan0: associated
Guilherme G. Piccoli Jan. 31, 2024, 7:16 p.m. UTC | #11
On 31/01/2024 15:16, Guilherme G. Piccoli wrote:
> [...]
>>> And ftrace timestamps unfortunately don't help with that, it's not
>>> showing how much time the device stay suspended, so it might be
>>> confusing and both cases could appear as the same exact scenario, even
>>> if they are completely different (fail vs success cases).
>>
>> That's odd.. my assumption was the timestamps to be valid. Were you able
>> to confirm that it's the issue with the timestamps? Perhaps check if the
>> other devices in the system also wakeup at the approximately same
>> time printed in the timestamps?
>>
> 
> Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> with the way kernel keeps track of clocks on suspend - despite TSC keeps
> accounting on suspend (at least for all modern x86 processors IIUC), the
> timestamps in both tracing buffer or dmesg do not reflect suspend time.
> 
> Is it different in your x86 systems? Or maybe in other architectures you
> have experience with?
> 
> I've done a test on Steam Deck, take a look in both attachments - seems
> to be aligned with my perception. Also checked dmesg on my Intel laptop
> (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> reflect the time spent in S3 suspend...
> 

Just a heads-up: just noticed that on s2idle sleep, the timestamps seems
to reflect the suspended time - just tested here. But on S3/deep sleep,
they don't...

Cheers!
Thinh Nguyen Feb. 1, 2024, 1:23 a.m. UTC | #12
On Wed, Jan 31, 2024, Guilherme G. Piccoli wrote:
> On 29/01/2024 23:17, Thinh Nguyen wrote:
> > [...]
> >> And ftrace timestamps unfortunately don't help with that, it's not
> >> showing how much time the device stay suspended, so it might be
> >> confusing and both cases could appear as the same exact scenario, even
> >> if they are completely different (fail vs success cases).
> > 
> > That's odd.. my assumption was the timestamps to be valid. Were you able
> > to confirm that it's the issue with the timestamps? Perhaps check if the
> > other devices in the system also wakeup at the approximately same
> > time printed in the timestamps?
> > 
> 
> Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> with the way kernel keeps track of clocks on suspend - despite TSC keeps
> accounting on suspend (at least for all modern x86 processors IIUC), the
> timestamps in both tracing buffer or dmesg do not reflect suspend time.
> 
> Is it different in your x86 systems? Or maybe in other architectures you
> have experience with?

I just expected this to happen for S4 and not S3. Our test environment
is different. I guess the "local" clock is turned off for S3. Perhaps we
should change the trace_clock for one that doesn't stop on suspend. If
you're using x86 TSC clock, maybe try this next time?

 # echo x86-tsc > /sys/kernel/debug/tracing/trace_clock

> 
> I've done a test on Steam Deck, take a look in both attachments - seems
> to be aligned with my perception. Also checked dmesg on my Intel laptop
> (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> reflect the time spent in S3 suspend...

Ok. I'll keep this in mind for future debugs also. Thanks.

> 
> 
> >>> [...]
> >>> We need to look into why that's the case, and we need to figure out
> >>> where the source of the wake came from. Do you have a connector
> >>> controller that can also wakeup the system?
> >>
> >> I'm not sure about this, what do you mean by a connector controller?!
> > 
> > I mean connector controller such as Type-C Port Controller (TCPC) or
> > some specific phy that can detect and report to a driver whether a
> > connection event occurs. For the lack of better term, I used connector
> > controller... (perhaps just connector?)
> 
> Thanks for the clarification! I don't have it at hands anyway,
> unfortunately heh
> 
> 
> > [...]
> > Perhaps we can make this a generic change across different devices. See
> > below.
> > 
> >>
> >> Any other debug / logs that might be useful?
> > 
> > In your setup, can you check if host mode wakeup is enabled:
> > 
> > diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
> > index c5d9ac67e612..716b05ff0384 100644
> > --- a/drivers/usb/host/xhci-plat.c
> > +++ b/drivers/usb/host/xhci-plat.c
> > @@ -472,6 +472,10 @@ static int xhci_plat_suspend(struct device *dev)
> >         ret = xhci_priv_suspend_quirk(hcd);
> >         if (ret)
> >                 return ret;
> > +
> > +       dev_info(dev, "%s: device_may_wakeup: %d\n",
> > +                __func__, !!device_may_wakeup(dev));
> > +
> >         /*
> >          * xhci_suspend() needs `do_wakeup` to know whether host is allowed
> >          * to do wakeup during suspend.
> 
> OK, tried with this hunk alone, and this is the result:
> 
> $ dmesg | grep "suspend\|xhci"
> [227.990149] PM: suspend entry (deep)
> [228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
> [228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
> device_may_wakeup: 0

This confirms that the xhci platform device created by dwc3 doesn't
enable wakeup. This actually inline with what we thought in the
beginning. But from the discussion and tests you did, we can deduce
further what had happened now.

Can you test another scenario. Connect an HID device such as a keyboard
to the STEAM DECK. Have the DECK run as host. Put it to sleep, check to
see if the keyboard can remote wakeup the DECK. If remote wakeup doesn't
work, can you try this in addition to the previous patch?

diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 4957b9765dc5..f9361e995f5b 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -166,6 +166,9 @@ int dwc3_host_init(struct dwc3 *dwc)
 		}
 	}
 
+	if (dwc->sys_wakeup)
+		device_init_wakeup(&xhci->dev, true);
+
 	ret = platform_device_add(xhci);
 	if (ret) {
 		dev_err(dwc->dev, "failed to register xHCI device\n");

> [228.878517] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
> [229.150502] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
> [229.318882] PM: suspend exit
> 
> 
> > When you go through the dwc3-pci code path, the dwc3 creates the
> > platform device from the pci device. Perhaps it doesn't enable wakeup.
> > Let's confirm that.
> > 
> > My suspicion is that the power management is mapped to the same PCI
> > PMCSR for both the host and device mode. When the device is in suspend
> > (D3), it gives control the the PMUs. If the PME is enabled, the PMUs
> > will generate PME on connection detection if the PME is enabled. When
> > you go through the xhci code path, wakeup may not be enabled.
> > 
> > For device mode, we can handle generically by only enabling wakeup if
> > the following conditions are met:
> > 	if (dwc->gadget_driver && dwc->softconnect)
> > 
> > Try this (totally untested):
> > 
> > <patch>
> 
> Thanks a bunch Thinh, with this patch applied (plus the above hunk on
> xhci-plat), things work both in host or device mode - I could properly
> suspend and resume after pressing the power button in the Steam Deck.
> 
> So, how should we proceed now? Could you send a final/official version
> of the patch? I could test and provide my Tested-by (and proceed with
> backporting to Deck's kernel). Or let me know if you want/need more tests.
> 
> Thanks again for all the help/support in this issue =)

No problem. I'm glad we can debug this without probing into the
hardward. I'll rework the patch so that it can be submitted. Your
Tested-by will be very helpful.

Thanks,
Thinh
Thinh Nguyen Feb. 1, 2024, 1:23 a.m. UTC | #13
On Wed, Jan 31, 2024, Guilherme G. Piccoli wrote:
> On 31/01/2024 15:16, Guilherme G. Piccoli wrote:
> > [...]
> >>> And ftrace timestamps unfortunately don't help with that, it's not
> >>> showing how much time the device stay suspended, so it might be
> >>> confusing and both cases could appear as the same exact scenario, even
> >>> if they are completely different (fail vs success cases).
> >>
> >> That's odd.. my assumption was the timestamps to be valid. Were you able
> >> to confirm that it's the issue with the timestamps? Perhaps check if the
> >> other devices in the system also wakeup at the approximately same
> >> time printed in the timestamps?
> >>
> > 
> > Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> > with the way kernel keeps track of clocks on suspend - despite TSC keeps
> > accounting on suspend (at least for all modern x86 processors IIUC), the
> > timestamps in both tracing buffer or dmesg do not reflect suspend time.
> > 
> > Is it different in your x86 systems? Or maybe in other architectures you
> > have experience with?
> > 
> > I've done a test on Steam Deck, take a look in both attachments - seems
> > to be aligned with my perception. Also checked dmesg on my Intel laptop
> > (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> > reflect the time spent in S3 suspend...
> > 
> 
> Just a heads-up: just noticed that on s2idle sleep, the timestamps seems
> to reflect the suspended time - just tested here. But on S3/deep sleep,
> they don't...
> 

Ok. Thanks for the info.

Thinh
Guilherme G. Piccoli Feb. 1, 2024, 2:39 p.m. UTC | #14
On 31/01/2024 22:23, Thinh Nguyen wrote:
> [...]
> I just expected this to happen for S4 and not S3. Our test environment
> is different. I guess the "local" clock is turned off for S3. Perhaps we
> should change the trace_clock for one that doesn't stop on suspend. If
> you're using x86 TSC clock, maybe try this next time?
> 
>  # echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
>

Hi Thinh, tried it now in the first test with keyboard connected in host
mode:

            bash-1015    [002] ...1. 852256557544: dwc3_suspend
<-dpm_run_callback
  kworker/u32:19-1099    [004] ...1. 852259924040: dwc3_pci_suspend
<-pci_pm_suspend
  kworker/u32:26-1107    [002] ...1. 853901309968: dwc3_pci_resume
<-dpm_run_callback
            bash-1015    [006] ...1. 853944152544: dwc3_resume
<-dpm_run_callback
            bash-1015    [006] ...1. 853944158900:
dwc3_core_init_for_resume <-dwc3_resume_common

So...did it work?! System was in suspend mode for ~4 min, but how do I
map these TSC timestamps to real time? heh


>> [...]
>> $ dmesg | grep "suspend\|xhci"
>> [227.990149] PM: suspend entry (deep)
>> [228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
>> [228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
>> device_may_wakeup: 0
> 
> This confirms that the xhci platform device created by dwc3 doesn't
> enable wakeup. This actually inline with what we thought in the
> beginning. But from the discussion and tests you did, we can deduce
> further what had happened now.
> 
> Can you test another scenario. Connect an HID device such as a keyboard
> to the STEAM DECK. Have the DECK run as host. Put it to sleep, check to
> see if the keyboard can remote wakeup the DECK. If remote wakeup doesn't
> work, can you try this in addition to the previous patch?
>

At first, it didn't work - I couldn't wakeup the system from the keyboard.

Then I added the hunk below on top of the previous patch:

        /* Perhaps we need to enable wakeup for xhci->dev? */
        if (dwc->sys_wakeup) {
                device_init_wakeup(&xhci->dev, true);
                device_wakeup_enable(dwc->sysdev);
        }


And..that also didn't work! With or without the patch (+ above hunk),
can't wakeup from keyboard, in S3/deep.

But guess what? Keyboard *does wake* the system in s2idle, with or
without the patch!!! Interesting ...

Notice that all my tests are on top of v6.8-rc2.


> [...] 
> No problem. I'm glad we can debug this without probing into the
> hardward. I'll rework the patch so that it can be submitted. Your
> Tested-by will be very helpful.
> 

Awesome, thanks again! Please CC my email in the final patch and I'll be
glad in testing.
Cheers,


Guilherme
Thinh Nguyen Feb. 8, 2024, 11:53 p.m. UTC | #15
On Tue, Feb 06, 2024, Guilherme G. Piccoli wrote:
> >> [...]
> > Did you still see this print in S3?
> > 
> > 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 0
> > 
> > Or was it this:
> > 
> > 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1
> > 
> 
> It always shows "device_may_wakeup: 0".
> Cheers,
> 

Thanks. I have some suspicions of what happened. I'll create a patch,
but I'll need some time.

Thanks,
Thinh
Guilherme G. Piccoli Feb. 9, 2024, 1:18 a.m. UTC | #16
On 08/02/2024 20:53, Thinh Nguyen wrote:
> [...]
> 
> Thanks. I have some suspicions of what happened. I'll create a patch,
> but I'll need some time.
> 
> Thanks,
> Thinh

Thank you a bunch! Let us know when you have a candidate, I can test it
quickly in the Steam Deck =)

Cheers,


Guilherme
Guilherme G. Piccoli March 7, 2024, 7:22 p.m. UTC | #17
On 06/03/2024 21:40, Thinh Nguyen wrote:
> [...]
> 
> Can you try this? I made some adjustments to the previous conditions:
> * If operate as device mode, only allow system wakeup if there's gadget
> * driver binded.
> * Make sure to pass the wakeup config to the xhci platform device when
>   switch to host.
> 
> If it works, I'll clean this up and split this into 2 separate patches.
> 
> Thanks,
> Thinh
> 
> <patch>


Hi Thinh, thanks again for your support!

I've tested the suggested patch, and it's working fine in both device
and host modes. In both modes the system suspends and wakes up fine.

And when in host mode, with USB connected, I see the following on dmesg:
"xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"

Please CC me in the official patches if possible, I can send my
Tested-by tags there.
Cheers,


Guilherme
Thinh Nguyen March 8, 2024, 2:42 a.m. UTC | #18
On Thu, Mar 07, 2024, Guilherme G. Piccoli wrote:
> On 06/03/2024 21:40, Thinh Nguyen wrote:
> > [...]
> > 
> > Can you try this? I made some adjustments to the previous conditions:
> > * If operate as device mode, only allow system wakeup if there's gadget
> > * driver binded.
> > * Make sure to pass the wakeup config to the xhci platform device when
> >   switch to host.
> > 
> > If it works, I'll clean this up and split this into 2 separate patches.
> > 
> > Thanks,
> > Thinh
> > 
> > <patch>
> 
> 
> Hi Thinh, thanks again for your support!
> 
> I've tested the suggested patch, and it's working fine in both device
> and host modes. In both modes the system suspends and wakes up fine.
> 
> And when in host mode, with USB connected, I see the following on dmesg:
> "xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"

That's great! I assumed you tested remote wakeup from the keyboard?

> 
> Please CC me in the official patches if possible, I can send my
> Tested-by tags there.
> Cheers,
> 
> 

Thanks for testing! I sent out the patch. I think it can be done in a
single patch. Your Tested-by will be very helpful.

BR,
Thinh
Guilherme G. Piccoli March 8, 2024, 11:52 a.m. UTC | #19
On 07/03/2024 23:42, Thinh Nguyen wrote:
> [...]
>> And when in host mode, with USB connected, I see the following on dmesg:
>> "xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"
> 
> That's great! I assumed you tested remote wakeup from the keyboard?
> 

Hi Thinh, uh..I didn't test keyboard wakeup yday, but tried just now heh
It doesn't work, but I don't think it's related with this patch.

I've also tested with pure XHCI (by disabling DRD mode on BIOS) and it
doesn't wakeup too. So, I personally don't think that "diminishes" this
patch, which is a proper fix. Oh, and also I'm testing through a very
very cheap OTG adapter, so not 100% confidence.

> [...]
> Thanks for testing! I sent out the patch. I think it can be done in a
> single patch. Your Tested-by will be very helpful.
> 

Thank you Thinh, for the great support here - much appreciated!
Cheers,


Guilherme
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 0328c86ef806..5801d3ebbbcb 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -104,7 +104,7 @@  static int dwc3_get_dr_mode(struct dwc3 *dwc)
 	return 0;
 }
 
-void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
+static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
 {
 	u32 reg;
 
@@ -112,7 +112,11 @@  void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
 	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
 	reg |= DWC3_GCTL_PRTCAPDIR(mode);
 	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+}
 
+void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
+{
+	__dwc3_set_prtcap(dwc, mode);
 	dwc->current_dr_role = mode;
 }
 
@@ -2128,6 +2132,7 @@  static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
 			break;
 		dwc3_gadget_suspend(dwc);
 		synchronize_irq(dwc->irq_gadget);
+		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
 		dwc3_core_exit(dwc);
 		break;
 	case DWC3_GCTL_PRTCAP_HOST: