diff mbox series

USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up

Message ID 20240906030548.845115-1-duanchenghao@kylinos.cn
State Superseded
Headers show
Series USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up | expand

Commit Message

duanchenghao Sept. 6, 2024, 3:05 a.m. UTC
When a device is inserted into the USB port and an S4 wakeup is initiated,
after the USB-hub initialization is completed, it will automatically enter suspend mode.
Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery.
However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state,
it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed.

This patch makes two modifications in total:
1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd,
which were previously split between the top half and bottom half of the interrupt,
are now unified and executed solely in the bottom half of the interrupt.
This prevents the bottom half tasks from being frozen during the S4 process,
ensuring that the clean_bit process can proceed without interruption.

2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING.
When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit.
This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process.

Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
---
 drivers/usb/core/hcd.c | 1 -
 drivers/usb/core/hub.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

duanchenghao Sept. 10, 2024, 9:34 a.m. UTC | #1
> [Please make sure that the lines in your email message don't extend 
> beyond 76 columns or so.]
> 

OK. Later, I will modify the patch format. V2 patch will be released
later

> Lots of things here seem to be wrong.
> 
> On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> 
> There is no such thing as an S4 wakeup.  Do you mean wakeup from an
> S4 
> suspend state?

Yes, waking up from the S4 suspend state.

> 
> > after the USB-hub initialization is completed, it will
> > automatically enter suspend mode.
> 
> What will enter suspend mode?  The hub that the device was plugged
> into?
> That should not happen.  The hub initialization code should detect
> that 
> a new device was plugged in and prevent the hub from suspending.
> 

Yes, the current issue is that the hub detects a new device during the
resuming process. However, the S4 wakeup is attempting to put the hub
into suspend mode, and during the suspend process, it detects that the
HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the
return of an EBUSY status.

> > Upon detecting a device on the USB port, it will proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> 
> HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.
> 
> > During the S4 wakeup process, peripherals are put into suspend
> > mode, followed by task recovery.
> 
> What do you mean by "task recovery"?  We don't need to recover any 
> tasks.
> 

S4 wakeup restores the image that was saved before the system entered
the S4 sleep state.

    S4 waking up from hibernation
    =============================
    kernel initialization
    |   
    v   
    freeze user task and kernel thread
    |   
    v   
    load saved image
    |    
    v   
    freeze the peripheral device and controller
    (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set,
     return to EBUSY and do not perform the following restore image.)
    |
    v
    restore image(task recovery)


> What do you mean by "peripherals are put into suspend mode"?  That's
> not 
> what happens.  Peripherals are set back to full power.
> 
> > However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state,
> > it will return an EBUSY status, causing the S4 suspend to fail and
> > subsequent task recovery to not proceed.
> 
> What will return an EBUSY status?

if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.

> 
> Why do you say that S4 suspend will fail?  Aren't you talking about
> S4 
> wakeup?

After returning EBUSY, the subsequent restore image operation will not
be executed.

> 
> Can you provide a kernel log that explains these points and shows
> what 
> problem you are trying to solve?

[    9.009166][ 2] [  T403] PM: Image signature found, resuming
[    9.009167][ 2] [  T403] PM: resume from hibernation
[    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
[inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
[    9.009244][ 2] [  T403] Freezing user space processes ... (elapsed
0.001 seconds) done.
[    9.010355][ 2] [  T403] OOM killer disabled.
[    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
[    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
[    9.073334][ 2] [  T403] PM: Loading and decompressing image data
(486874 pages)...
[    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[    9.095928][ 2] [  T403] PM: Image loading progress:   0%
[    9.664803][ 2] [  T403] PM: Image loading progress:  10%
[    9.794156][ 2] [  T403] PM: Image loading progress:  20%
[    9.913001][ 2] [  T403] PM: Image loading progress:  30%
[   10.034331][ 2] [  T403] PM: Image loading progress:  40%
[   10.154070][ 2] [  T403] PM: Image loading progress:  50%
[   10.277096][ 2] [  T403] PM: Image loading progress:  60%
[   10.398860][ 2] [  T403] PM: Image loading progress:  70%
[   10.533760][ 2] [  T403] PM: Image loading progress:  80%
[   10.659874][ 2] [  T403] PM: Image loading progress:  90%
[   10.760681][ 2] [  T403] PM: Image loading progress: 100%
[   10.760693][ 2] [  T403] PM: Image loading done
[   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
(1159.22 MB/s)
[   10.761982][ 2] [  T403] PM: Image successfully loaded
[   10.761988][ 2] [  T403] printk: Suspending console(s) (use
no_console_suspend to debug)
[   10.864973][ 2] [  T403] innovpu_freeze:1782
[   10.864974][ 2] [  T403] innovpu_suspend:1759
[   11.168871][ 2] [  T189] PM: pci_pm_freeze():
hcd_pci_suspend+0x0/0x38 returns -16
[   11.168875][ 2] [  T189] PM: dpm_run_callback():
pci_pm_freeze+0x0/0x108 returns -16
[   11.168876][ 2] [  T189] PM: Device 0000:05:00.0 failed to quiesce
async: error -16
[   12.270452][ 2] [  T403] innovpu_thaw:1792
[   12.405296][ 2] [  T403] PM: Failed to load hibernation image,
recovering.
[   12.486859][ 2] [  T403] PM: Basic memory bitmaps freed
[   12.486860][ 2] [  T403] OOM killer enabled.
[   12.486861][ 2] [  T403] Restarting tasks ... 

> 
> > This patch makes two modifications in total:
> > 1. The set_bit and clean_bit operations for the
> > HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> > which were previously split between the top half and bottom half of
> > the interrupt,
> > are now unified and executed solely in the bottom half of the
> > interrupt.
> > This prevents the bottom half tasks from being frozen during the S4
> > process,
> > ensuring that the clean_bit process can proceed without
> > interruption.
> 
> The name is "clear_bit" (with an 'r'), not "clean_bit".
> 
> > 2. Add a condition to the set_bit operation for the hcd status
> > HCD_FLAG_WAKEUP_PENDING.
> > When the hcd status is HC_STATE_SUSPENDED, perform the setting of
> > the aforementioned status bit.
> > This prevents a subsequent set_bit from occurring after the
> > clean_bit if the hcd is in the resuming process.
> 
> hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after
> calling 
> hcd->driver->bus_resume().  After that point,
> usb_hcd_resume_root_hub() 
> won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?
> 
> Alan Stern
> 
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > ---
> >  drivers/usb/core/hcd.c | 1 -
> >  drivers/usb/core/hub.c | 3 +++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 1ff7d901fede..a6bd0fbd82f4 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd
> > *hcd)
> >         spin_lock_irqsave (&hcd_root_hub_lock, flags);
> >         if (hcd->rh_registered) {
> >                 pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> > -               set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> >                 queue_work(pm_wq, &hcd->wakeup_work);
> >         }
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 4b93c0bd1d4b..7f847c4afc0d 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device
> > *udev, pm_message_t msg)
> >  
> >  int usb_remote_wakeup(struct usb_device *udev)
> >  {
> > +       struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> >         int     status = 0;
> >  
> >         usb_lock_device(udev);
> >         if (udev->state == USB_STATE_SUSPENDED) {
> >                 dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> > +               if (hcd->state == HC_STATE_SUSPENDED)
> > +                       set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd-
> > > flags);
> >                 status = usb_autoresume_device(udev);
> >                 if (status == 0) {
> >                         /* Let the drivers do their thing, then...
> > */
> > -- 
> > 2.34.1
> > 
> >
duanchenghao Sept. 10, 2024, 9:36 a.m. UTC | #2
> [Please make sure that the lines in your email message don't extend 
> beyond 76 columns or so.]
> 

OK. Later, I will modify the patch format. V2 patch will be released
later

> Lots of things here seem to be wrong.
> 
> On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> 
> There is no such thing as an S4 wakeup.  Do you mean wakeup from an
> S4 
> suspend state?

Yes, waking up from the S4 suspend state.

> 
> > after the USB-hub initialization is completed, it will
> > automatically enter suspend mode.
> 
> What will enter suspend mode?  The hub that the device was plugged
> into?
> That should not happen.  The hub initialization code should detect
> that 
> a new device was plugged in and prevent the hub from suspending.
> 

Yes, the current issue is that the hub detects a new device during the
resuming process. However, the S4 wakeup is attempting to put the hub
into suspend mode, and during the suspend process, it detects that the
HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the
return of an EBUSY status.

> > Upon detecting a device on the USB port, it will proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> 
> HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.
> 
> > During the S4 wakeup process, peripherals are put into suspend
> > mode, followed by task recovery.
> 
> What do you mean by "task recovery"?  We don't need to recover any 
> tasks.
> 

S4 wakeup restores the image that was saved before the system entered
the S4 sleep state.

    S4 waking up from hibernation
    =============================
    kernel initialization
    |   
    v   
    freeze user task and kernel thread
    |   
    v   
    load saved image
    |    
    v   
    freeze the peripheral device and controller
    (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set,
     return to EBUSY and do not perform the following restore image.)
    |
    v
    restore image(task recovery)


> What do you mean by "peripherals are put into suspend mode"?  That's
> not 
> what happens.  Peripherals are set back to full power.
> 
> > However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state,
> > it will return an EBUSY status, causing the S4 suspend to fail and
> > subsequent task recovery to not proceed.
> 
> What will return an EBUSY status?

if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.

> 
> Why do you say that S4 suspend will fail?  Aren't you talking about
> S4 
> wakeup?

After returning EBUSY, the subsequent restore image operation will not
be executed.

> 
> Can you provide a kernel log that explains these points and shows
> what 
> problem you are trying to solve?

[    9.009166][ 2] [  T403] PM: Image signature found, resuming
[    9.009167][ 2] [  T403] PM: resume from hibernation
[    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
[inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
[    9.009244][ 2] [  T403] Freezing user space processes ... (elapsed
0.001 seconds) done.
[    9.010355][ 2] [  T403] OOM killer disabled.
[    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
[    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
[    9.073334][ 2] [  T403] PM: Loading and decompressing image data
(486874 pages)...
[    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[    9.095928][ 2] [  T403] PM: Image loading progress:   0%
[    9.664803][ 2] [  T403] PM: Image loading progress:  10%
[    9.794156][ 2] [  T403] PM: Image loading progress:  20%
[    9.913001][ 2] [  T403] PM: Image loading progress:  30%
[   10.034331][ 2] [  T403] PM: Image loading progress:  40%
[   10.154070][ 2] [  T403] PM: Image loading progress:  50%
[   10.277096][ 2] [  T403] PM: Image loading progress:  60%
[   10.398860][ 2] [  T403] PM: Image loading progress:  70%
[   10.533760][ 2] [  T403] PM: Image loading progress:  80%
[   10.659874][ 2] [  T403] PM: Image loading progress:  90%
[   10.760681][ 2] [  T403] PM: Image loading progress: 100%
[   10.760693][ 2] [  T403] PM: Image loading done
[   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
(1159.22 MB/s)
[   10.761982][ 2] [  T403] PM: Image successfully loaded
[   10.761988][ 2] [  T403] printk: Suspending console(s) (use
no_console_suspend to debug)
[   10.864973][ 2] [  T403] innovpu_freeze:1782
[   10.864974][ 2] [  T403] innovpu_suspend:1759
[   11.168871][ 2] [  T189] PM: pci_pm_freeze():
hcd_pci_suspend+0x0/0x38 returns -16
[   11.168875][ 2] [  T189] PM: dpm_run_callback():
pci_pm_freeze+0x0/0x108 returns -16
[   11.168876][ 2] [  T189] PM: Device 0000:05:00.0 failed to quiesce
async: error -16
[   12.270452][ 2] [  T403] innovpu_thaw:1792
[   12.405296][ 2] [  T403] PM: Failed to load hibernation image,
recovering.
[   12.486859][ 2] [  T403] PM: Basic memory bitmaps freed
[   12.486860][ 2] [  T403] OOM killer enabled.
[   12.486861][ 2] [  T403] Restarting tasks ... 

> 
> > This patch makes two modifications in total:
> > 1. The set_bit and clean_bit operations for the
> > HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> > which were previously split between the top half and bottom half of
> > the interrupt,
> > are now unified and executed solely in the bottom half of the
> > interrupt.
> > This prevents the bottom half tasks from being frozen during the S4
> > process,
> > ensuring that the clean_bit process can proceed without
> > interruption.
> 
> The name is "clear_bit" (with an 'r'), not "clean_bit".
> 
> > 2. Add a condition to the set_bit operation for the hcd status
> > HCD_FLAG_WAKEUP_PENDING.
> > When the hcd status is HC_STATE_SUSPENDED, perform the setting of
> > the aforementioned status bit.
> > This prevents a subsequent set_bit from occurring after the
> > clean_bit if the hcd is in the resuming process.
> 
> hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after
> calling 
> hcd->driver->bus_resume().  After that point,
> usb_hcd_resume_root_hub() 
> won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?
> 
> Alan Stern
> 
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > ---
> >  drivers/usb/core/hcd.c | 1 -
> >  drivers/usb/core/hub.c | 3 +++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 1ff7d901fede..a6bd0fbd82f4 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd
> > *hcd)
> >         spin_lock_irqsave (&hcd_root_hub_lock, flags);
> >         if (hcd->rh_registered) {
> >                 pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> > -               set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> >                 queue_work(pm_wq, &hcd->wakeup_work);
> >         }
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 4b93c0bd1d4b..7f847c4afc0d 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device
> > *udev, pm_message_t msg)
> >  
> >  int usb_remote_wakeup(struct usb_device *udev)
> >  {
> > +       struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> >         int     status = 0;
> >  
> >         usb_lock_device(udev);
> >         if (udev->state == USB_STATE_SUSPENDED) {
> >                 dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> > +               if (hcd->state == HC_STATE_SUSPENDED)
> > +                       set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd-
> > > flags);
> >                 status = usb_autoresume_device(udev);
> >                 if (status == 0) {
> >                         /* Let the drivers do their thing, then...
> > */
> > -- 
> > 2.34.1
> > 
> >
duanchenghao Sept. 12, 2024, 3:21 a.m. UTC | #3
在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > S4 wakeup restores the image that was saved before the system
> > entered
> > the S4 sleep state.
> > 
> >     S4 waking up from hibernation
> >     =============================
> >     kernel initialization
> >     |   
> >     v   
> >     freeze user task and kernel thread
> >     |   
> >     v   
> >     load saved image
> >     |    
> >     v   
> >     freeze the peripheral device and controller
> >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > set,
> >      return to EBUSY and do not perform the following restore
> > image.)
> 
> Why is the flag set at this point?  It should not be; the device and 
> controller should have been frozen with wakeup disabled.
> 
This is check point, not set point. When the USB goes into a suspend
state, the HCD_FLAG_WAKEUP_PENDING flag is checked, and if it is found
that the USB is in the process of resuming, then an EBUSY error is
returned.

> >     |
> >     v
> >     restore image(task recovery)
> 
> > > > However, upon detecting that the hcd is in the
> > > > HCD_FLAG_WAKEUP_PENDING state,
> > > > it will return an EBUSY status, causing the S4 suspend to fail
> > > > and
> > > > subsequent task recovery to not proceed.
> > > 
> > > What will return an EBUSY status?
> > 
> > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.
> 
> I meant: Which function will return EBUSY status?  The answer is in
> the 
> log below; hcd_pci_suspend() does this.
> 
> > > Why do you say that S4 suspend will fail?  Aren't you talking
> > > about
> > > S4 
> > > wakeup?
> > 
> > After returning EBUSY, the subsequent restore image operation will
> > not
> > be executed.
> > 
> > > 
> > > Can you provide a kernel log that explains these points and shows
> > > what 
> > > problem you are trying to solve?
> > 
> > [    9.009166][ 2] [  T403] PM: Image signature found, resuming
> > [    9.009167][ 2] [  T403] PM: resume from hibernation
> > [    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
> > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
> > [    9.009244][ 2] [  T403] Freezing user space processes ...
> > (elapsed
> > 0.001 seconds) done.
> > [    9.010355][ 2] [  T403] OOM killer disabled.
> > [    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
> > (elapsed 0.000 seconds) done.
> > [    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
> > [    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
> > [    9.073334][ 2] [  T403] PM: Loading and decompressing image
> > data
> > (486874 pages)...
> > [    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0
> > [mpidr:0x0]
> > [    9.095928][ 2] [  T403] PM: Image loading progress:   0%
> > [    9.664803][ 2] [  T403] PM: Image loading progress:  10%
> > [    9.794156][ 2] [  T403] PM: Image loading progress:  20%
> > [    9.913001][ 2] [  T403] PM: Image loading progress:  30%
> > [   10.034331][ 2] [  T403] PM: Image loading progress:  40%
> > [   10.154070][ 2] [  T403] PM: Image loading progress:  50%
> > [   10.277096][ 2] [  T403] PM: Image loading progress:  60%
> > [   10.398860][ 2] [  T403] PM: Image loading progress:  70%
> > [   10.533760][ 2] [  T403] PM: Image loading progress:  80%
> > [   10.659874][ 2] [  T403] PM: Image loading progress:  90%
> > [   10.760681][ 2] [  T403] PM: Image loading progress: 100%
> > [   10.760693][ 2] [  T403] PM: Image loading done
> > [   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
> > (1159.22 MB/s)
> > [   10.761982][ 2] [  T403] PM: Image successfully loaded
> > [   10.761988][ 2] [  T403] printk: Suspending console(s) (use
> > no_console_suspend to debug)
> > [   10.864973][ 2] [  T403] innovpu_freeze:1782
> > [   10.864974][ 2] [  T403] innovpu_suspend:1759
> > [   11.168871][ 2] [  T189] PM: pci_pm_freeze():
> > hcd_pci_suspend+0x0/0x38 returns -16
> 
> This should not be allowed to happen.  Freezing is mandatory and not 
> subject to wakeup requests.
> 
> Is your problem related to the one discussed in this email thread?
> 
> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> 
> Would the suggestion I made there -- i.e., have the xhci-hcd 
> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> hub 
> was suspended with wakeup = 0 -- fix your problem?

Skipping usb_hcd_resume_root_hub() should generally be possible, but
it's important to ensure that normal remote wakeup functionality is not
compromised. Is it HUB_SUSPEND that the hub you are referring to is in
a suspended state?

v2 patch:
https://lore.kernel.org/all/20240910105714.148976-1-duanchenghao@kylinos.cn/
> 
> Alan Stern
Alan Stern Sept. 12, 2024, 3 p.m. UTC | #4
On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > S4 wakeup restores the image that was saved before the system
> > > entered
> > > the S4 sleep state.
> > > 
> > >     S4 waking up from hibernation
> > >     =============================
> > >     kernel initialization
> > >     |   
> > >     v   
> > >     freeze user task and kernel thread
> > >     |   
> > >     v   
> > >     load saved image
> > >     |    
> > >     v   
> > >     freeze the peripheral device and controller
> > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > > set,
> > >      return to EBUSY and do not perform the following restore
> > > image.)
> > 
> > Why is the flag set at this point?  It should not be; the device and 
> > controller should have been frozen with wakeup disabled.
> > 
> This is check point, not set point.

Yes, I know that.  But when the flag was checked, why did the code find 
that it was set?  The flag should have been clear.

> > Is your problem related to the one discussed in this email thread?
> > 
> > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > 
> > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> > hub 
> > was suspended with wakeup = 0 -- fix your problem?
> 
> Skipping usb_hcd_resume_root_hub() should generally be possible, but
> it's important to ensure that normal remote wakeup functionality is not
> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
> a suspended state?

I don't understand this question.  hub_quiesce() gets called with 
HUB_SUSPEND when the hub enters a suspended state.

You are correct about the need for normal remote wakeup to work 
properly.  The interrupt handler should skip calling 
usb_hcd_resume_root_hub() for port connect or disconnect changes and for 
port overcurrent changes (when the root hub is suspended with wakeup = 
0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port 
resume events.

Alan Stern
duanchenghao Sept. 13, 2024, 1:51 a.m. UTC | #5
在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > S4 wakeup restores the image that was saved before the system
> > > > entered
> > > > the S4 sleep state.
> > > > 
> > > >     S4 waking up from hibernation
> > > >     =============================
> > > >     kernel initialization
> > > >     |   
> > > >     v   
> > > >     freeze user task and kernel thread
> > > >     |   
> > > >     v   
> > > >     load saved image
> > > >     |    
> > > >     v   
> > > >     freeze the peripheral device and controller
> > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it
> > > > is
> > > > set,
> > > >      return to EBUSY and do not perform the following restore
> > > > image.)
> > > 
> > > Why is the flag set at this point?  It should not be; the device
> > > and 
> > > controller should have been frozen with wakeup disabled.
> > > 
> > This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code
> find 
> that it was set?  The flag should have been clear.

Yes, the current issue is that during S4 testing, there is a
probabilistic scenario where clear_bit is not called after set_bit, or
clear_bit is called but does not execute after set_bit. Please refer to
the two modification points in the v2 patch for details, as both of
them can cause the current issue.

> 
> > > Is your problem related to the one discussed in this email
> > > thread?
> > > 
> > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > 
> > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > root
> > > hub 
> > > was suspended with wakeup = 0 -- fix your problem?
> > 
> > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > but
> > it's important to ensure that normal remote wakeup functionality is
> > not
> > compromised. Is it HUB_SUSPEND that the hub you are referring to is
> > in
> > a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with 
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work 
> properly.  The interrupt handler should skip calling 
> usb_hcd_resume_root_hub() for port connect or disconnect changes and
> for 
> port overcurrent changes (when the root hub is suspended with wakeup
> = 
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> port 
> resume events.

The current issue arises when rh_state is detected as RH_SUSPEND and
usb_hcd_resume_root_hub() is called to resume the root hub. However,
there is no mutual exclusion between the suspend flag, set_bit, and
clear_bit, which can lead to two scenarios:

    1. After set_bit is called, the state of the USB device is modified
by another process to !USB_STATE_SUSPEND, preventing the hub's resume
from being executed, and consequently, clear_bit is not called again.

    2. In another scenario, during the hub resume process, after
HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
been set to !RH_SUSPENDED. At this point, set_bit is executed, but
since the hub has already entered the running state, the clear_bit
associated with the resume operation is not executed.

Please review the v2 patch, where I have described both the logical
flow before the modification and the revised logical flow after the
modification.

> 
> Alan Stern
duanchenghao Sept. 13, 2024, 9:38 a.m. UTC | #6
在 2024-09-13星期五的 09:51 +0800,duanchenghao写道:
> 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > > 
> > > > >     S4 waking up from hibernation
> > > > >     =============================
> > > > >     kernel initialization
> > > > >     |   
> > > > >     v   
> > > > >     freeze user task and kernel thread
> > > > >     |   
> > > > >     v   
> > > > >     load saved image
> > > > >     |    
> > > > >     v   
> > > > >     freeze the peripheral device and controller
> > > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it
> > > > > is
> > > > > set,
> > > > >      return to EBUSY and do not perform the following restore
> > > > > image.)
> > > > 
> > > > Why is the flag set at this point?  It should not be; the
> > > > device
> > > > and 
> > > > controller should have been frozen with wakeup disabled.
> > > > 
> > > This is check point, not set point.
> > 
> > Yes, I know that.  But when the flag was checked, why did the code
> > find 
> > that it was set?  The flag should have been clear.
> 
> Yes, the current issue is that during S4 testing, there is a
> probabilistic scenario where clear_bit is not called after set_bit,
> or
> clear_bit is called but does not execute after set_bit. Please refer
> to
> the two modification points in the v2 patch for details, as both of
> them can cause the current issue.
> 
> > 
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > > 
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > 
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub 
> > > > was suspended with wakeup = 0 -- fix your problem?
> > > 
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is
> > > not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is
> > > in
> > > a suspended state?
> > 
> > I don't understand this question.  hub_quiesce() gets called with 
> > HUB_SUSPEND when the hub enters a suspended state.
> > 
> > You are correct about the need for normal remote wakeup to work 
> > properly.  The interrupt handler should skip calling 
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and
> > for 
> > port overcurrent changes (when the root hub is suspended with
> > wakeup
> > = 
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port 
> > resume events.
> 
> The current issue arises when rh_state is detected as RH_SUSPEND and
> usb_hcd_resume_root_hub() is called to resume the root hub. However,
> there is no mutual exclusion between the suspend flag, set_bit, and
> clear_bit, which can lead to two scenarios:
> 
>     1. After set_bit is called, the state of the USB device is
> modified
> by another process to !USB_STATE_SUSPEND, preventing the hub's resume
> from being executed, and consequently, clear_bit is not called again.
> 
>     2. In another scenario, during the hub resume process, after
> HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
> been set to !RH_SUSPENDED. At this point, set_bit is executed, but
> since the hub has already entered the running state, the clear_bit
> associated with the resume operation is not executed.
> 
> Please review the v2 patch, where I have described both the logical
> flow before the modification and the revised logical flow after the
> modification.
> 

In fact, issue point 2 in the patch is introduced by issue point 1, and
issue point 2 represents a further improvement. The main issue lies in
point 1, where after the execution of the top half of the interrupt is
completed, the bottom half is frozen by S4. As a result, the USB resume
is not executed during this S4 process, and clear_bit is not called as
well. This further leads to a situation where during the process of S4
putting the USB controller into suspend, the check for
HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY.

> > 
> > Alan Stern
>
Hongyu Xie Sept. 14, 2024, 2:43 a.m. UTC | #7
From: Hongyu Xie <xiehongyu1@kylinos.cn>


Hi Alan,
On 2024/9/12 23:00, Alan Stern wrote:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
>> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
>>> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
>>>> S4 wakeup restores the image that was saved before the system
>>>> entered
>>>> the S4 sleep state.
>>>>
>>>>      S4 waking up from hibernation
>>>>      =============================
>>>>      kernel initialization
>>>>      |
>>>>      v
>>>>      freeze user task and kernel thread
>>>>      |
>>>>      v
>>>>      load saved image
>>>>      |
>>>>      v
>>>>      freeze the peripheral device and controller
>>>>      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
>>>> set,
>>>>       return to EBUSY and do not perform the following restore
>>>> image.)
>>>
>>> Why is the flag set at this point?  It should not be; the device and
>>> controller should have been frozen with wakeup disabled.
>>>
>> This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code find
> that it was set?  The flag should have been clear.
Maybe duanchenghao means this,
freeze_kernel_threads
load_image_and_restore
   suspend roothub
                                      interrupt occurred
                                        usb_hcd_resume_root_hub
                                          set HCD_FLAG_WAKEUP_PENDING
                                          queue_work // freezed
   suspend pci
     return -EBUSY  because HCD_FLAG_WAKEUP_PENDING

So s4 resume failed.
> 
>>> Is your problem related to the one discussed in this email thread?
>>>
>>> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
>>>
>>> Would the suggestion I made there -- i.e., have the xhci-hcd
>>> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
>>> hub
>>> was suspended with wakeup = 0 -- fix your problem?
>>
>> Skipping usb_hcd_resume_root_hub() should generally be possible, but
>> it's important to ensure that normal remote wakeup functionality is not
>> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
>> a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work
> properly.  The interrupt handler should skip calling
> usb_hcd_resume_root_hub() for port connect or disconnect changes and for
> port overcurrent changes (when the root hub is suspended with wakeup =
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port
> resume events.
> 
> Alan Stern
> 

Hongyu Xie
diff mbox series

Patch

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 1ff7d901fede..a6bd0fbd82f4 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2389,7 +2389,6 @@  void usb_hcd_resume_root_hub (struct usb_hcd *hcd)
 	spin_lock_irqsave (&hcd_root_hub_lock, flags);
 	if (hcd->rh_registered) {
 		pm_wakeup_event(&hcd->self.root_hub->dev, 0);
-		set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		queue_work(pm_wq, &hcd->wakeup_work);
 	}
 	spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 4b93c0bd1d4b..7f847c4afc0d 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3835,11 +3835,14 @@  int usb_port_resume(struct usb_device *udev, pm_message_t msg)
 
 int usb_remote_wakeup(struct usb_device *udev)
 {
+	struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
 	int	status = 0;
 
 	usb_lock_device(udev);
 	if (udev->state == USB_STATE_SUSPENDED) {
 		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
+		if (hcd->state == HC_STATE_SUSPENDED)
+			set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		status = usb_autoresume_device(udev);
 		if (status == 0) {
 			/* Let the drivers do their thing, then... */