[v3,0/5] PM: sleep: Improvements of async suspend and resume of devices

Message ID 10629535.nUPlyArG6x@rjwysocki.net

Message

Rafael J. Wysocki March 14, 2025, 12:46 p.m. UTC
Hi Everyone,

This is a new iteration of the async suspend/resume improvements work:

https://lore.kernel.org/linux-pm/1915694.tdWV9SEqCh@rjwysocki.net/

which includes some rework and fixes of the patches in the series linked
above.  The most significant differences are splitting the second patch
into two patches and adding a change to treat consumers like children
during resume.

This new iteration is based on linux-pm.git/linux-next and on the recent
fix related to direct-complete:

https://lore.kernel.org/linux-pm/12627587.O9o76ZdvQC@rjwysocki.net/

The overall idea is still to start async processing for devices that have
at least some dependencies met, but not necessarily all of them, to avoid
overhead related to queuing too many async work items that will have to
wait for the processing of other devices before they can make progress.

Patch [1/5] does this in all resume phases, but it just takes children
into account (that is, async processing is started upfront for devices
without parents and then, after resuming each device, it is started for
the device's children).
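
To illustrate the ordering described for patch [1/5], here is a minimal
sketch in Python.  It is a toy model and not the kernel code: the function
name, the device names and the single queue are assumptions made for
illustration only, and real async work items run concurrently rather than
one after another.

```python
from collections import deque


def resume_order(parent):
    """Return the order in which resume work would be started.

    `parent` maps each (hypothetical) device name to its parent's name,
    or to None for a device without a parent.
    """
    children = {dev: [] for dev in parent}
    for dev, par in parent.items():
        if par is not None:
            children[par].append(dev)

    # Upfront, work is started only for devices without a parent.
    queue = deque(dev for dev, par in parent.items() if par is None)
    order = []
    while queue:
        dev = queue.popleft()
        order.append(dev)             # the device is "resumed" here
        queue.extend(children[dev])   # only then are its children started
    return order


# Hypothetical four-device tree, used only to exercise the sketch.
tree = {"pci0": None, "usb0": "pci0", "nvme0": "pci0", "hub0": "usb0"}
print(resume_order(tree))
```

No work is queued for a child in this model before its parent has been
resumed, which is the property the patch description is after.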

Patch [2/5] does this in the suspend phase of system suspend and only
takes parents into account (that is, async processing is started upfront
for devices without any children and then, after suspending each device,
it is started for the device's parent).
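
The suspend direction can be sketched in a similar way.  The model below
uses Python threads and events as stand-ins for async work items and
completions; all names in it are hypothetical and it only assumes what is
stated above: leaf devices are started upfront, a parent is started as soon
as one of its children has been suspended, and it then waits for the
remaining children before it makes progress.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def async_suspend(parent):
    """Suspend leaves first; each finished device starts its parent."""
    children = {dev: [] for dev in parent}
    for dev, par in parent.items():
        if par is not None:
            children[par].append(dev)

    done = {dev: threading.Event() for dev in parent}
    started, order = set(), []
    lock = threading.Lock()

    def start(dev):
        # Queue the work item at most once per device.
        with lock:
            if dev in started:
                return
            started.add(dev)
        pool.submit(suspend, dev)

    def suspend(dev):
        # The work may have been started by a single child, so it still
        # has to wait for every child before the device is suspended.
        for child in children[dev]:
            done[child].wait()
        with lock:
            order.append(dev)
        done[dev].set()
        if parent[dev] is not None:
            start(parent[dev])

    # One worker per device, so a waiting parent never starves a child.
    with ThreadPoolExecutor(max_workers=len(parent)) as pool:
        for dev in parent:
            if not children[dev]:
                start(dev)            # upfront: devices without children
        for event in done.values():
            event.wait()              # keep the pool open until all are done
    return order


# Hypothetical four-device tree, used only to exercise the sketch.
tree = {"pci0": None, "usb0": "pci0", "nvme0": "pci0", "hub0": "usb0"}
order = async_suspend(tree)
print(order)
```

The exact order may differ between runs, but every device shows up after
all of its children and the device without a parent comes last.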

Patch [3/5] extends it to the "late" and "noirq" suspend phases.

Patch [4/5] adds changes to treat suppliers like parents during suspend.
That is, async processing is started upfront for devices without any
children or consumers and then, after suspending each device, it is
started for the device's parent and suppliers.

Patch [5/5] adds changes to treat consumers like children during resume.
That is, async processing is started upfront for devices without a parent
or any suppliers and then, after resuming each device, it is started for
the device's children and consumers.
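
As a rough illustration of what patches [4/5] and [5/5] change, the sketch
below computes which devices would be started upfront in each direction
once device links are taken into account.  The function, the device names
and the way suppliers are represented are all assumptions made for this
example; they are not taken from the patches.

```python
def upfront_devices(parent, suppliers):
    """Return (resume_first, suspend_first) for a toy device set.

    `parent` maps each device to its parent (or None) and `suppliers`
    maps a consumer device to the list of devices it depends on.
    """
    # Resume: start upfront for devices without a parent or any suppliers.
    resume_first = [
        dev for dev, par in parent.items()
        if par is None and not suppliers.get(dev)
    ]

    # Suspend: start upfront for devices without any children or consumers,
    # that is, devices nothing else depends on.
    depended_on = {par for par in parent.values() if par is not None}
    for deps in suppliers.values():
        depended_on.update(deps)
    suspend_first = [dev for dev in parent if dev not in depended_on]

    return resume_first, suspend_first


# Hypothetical layout: i2c0 is a child of soc and a consumer of clk0.
parent = {"soc": None, "clk0": None, "i2c0": "soc", "sensor0": "i2c0"}
suppliers = {"i2c0": ["clk0"]}
print(upfront_devices(parent, suppliers))
```

In this layout, "soc" and "clk0" are started upfront during resume, and
only "sensor0" is started upfront during suspend, because "clk0" has a
consumer even though it has no children.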

Preliminary test results from one sample system are below.

"Baseline" is the linux-pm.git/testing branch, "Parent/child"
is that branch with patches [1-3/5] applied and "Device links"
is that branch with patches [1-5/5] applied.

"s/r" means "regular" suspend/resume, noRPM is "late" suspend
and "early" resume, and noIRQ means the "noirq" phases of
suspend and resume, respectively.  The numbers are suspend
and resume times for each phase, in milliseconds.

         Baseline       Parent/child    Device links

       Suspend Resume  Suspend Resume  Suspend Resume

s/r    427     449     298     450     294     442
noRPM  13      1       13      1       13      1
noIRQ  31      25      28      24      28      26

s/r    408     442     298     443     301     447
noRPM  13      1       13      1       13      1
noIRQ  32      25      30      25      28      25

s/r    408     444     310     450     298     439
noRPM  13      1       13      1       13      1
noIRQ  31      24      31      26      31      24

It clearly shows an improvement in the suspend path after
applying patches [1-3/5], easily attributable to patch [2/5],
and no clear difference after updating the async processing of
suppliers and consumers.
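
For reference, the "s/r" suspend times from the three runs in the table
above can be averaged as in the short Python sketch below.  Only the
numbers come from the table; the helper name is made up, and three samples
per configuration are too few to read much into the result.

```python
from statistics import mean

# "s/r" suspend times in milliseconds, copied from the table above.
suspend_ms = {
    "Baseline": [427, 408, 408],
    "Parent/child": [298, 298, 310],
    "Device links": [294, 301, 298],
}


def summarize(samples, reference="Baseline"):
    """Map each configuration to (mean time, percent reduction vs reference)."""
    base = mean(samples[reference])
    return {
        name: (mean(times), 100 * (base - mean(times)) / base)
        for name, times in samples.items()
    }


for name, (avg, gain) in summarize(suspend_ms).items():
    print(f"{name:13s} {avg:6.1f} ms  {gain:5.1f}% shorter than baseline")
```

On these samples the mean drops from about 414 ms to about 302 ms with
patches [1-3/5] and to about 298 ms with patches [1-5/5].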

Note that there are systems where resume times are shorter after
patches [1-3/5] too, but more testing is necessary.

I do realize that this code can be optimized further, but it is not
particularly clear to me that any further optimizations would make
a significant difference and the changes in this series are deep
enough to do in one go.

Thanks!

Comments

Rafael J. Wysocki March 15, 2025, 2:57 p.m. UTC | #1
On Fri, Mar 14, 2025 at 10:06 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, Mar 14, 2025 at 6:24 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > Hi Everyone,
> >
> > This is a new iteration of the async suspend/resume improvements work:
> >
> > https://lore.kernel.org/linux-pm/1915694.tdWV9SEqCh@rjwysocki.net/
> >
> > which includes some rework and fixes of the patches in the series linked
> > above.  The most significant differences are splitting the second patch
> > into two patches and adding a change to treat consumers like children
> > during resume.
> >
> > This new iteration is based on linux-pm.git/linux-next and on the recent
> > fix related to direct-complete:
> >
> > https://lore.kernel.org/linux-pm/12627587.O9o76ZdvQC@rjwysocki.net/
> >
> > The overall idea is still to start async processing for devices that have
> > at least some dependencies met, but not necessarily all of them, to avoid
> > overhead related to queuing too many async work items that will have to
> > wait for the processing of other devices before they can make progress.
> >
> > Patch [1/5] does this in all resume phases, but it just takes children
> > into account (that is, async processing is started upfront for devices
> > without parents and then, after resuming each device, it is started for
> > the device's children).
> >
> > Patch [2/5] does this in the suspend phase of system suspend and only
> > takes parents into account (that is, async processing is started upfront
> > for devices without any children and then, after suspending each device,
> > it is started for the device's parent).
> >
> > Patch [3/5] extends it to the "late" and "noirq" suspend phases.
> >
> > Patch [4/5] adds changes to treat suppliers like parents during suspend.
> > That is, async processing is started upfront for devices without any
> > children or consumers and then, after suspending each device, it is
> > started for the device's parent and suppliers.
> >
> > Patch [5/5] adds changes to treat consumers like children during resume.
> > That is, async processing is started upfront for devices without a parent
> > or any suppliers and then, after resuming each device, it is started for
> > the device's children and consumers.
> >
> > Preliminary test results from one sample system are below.
> >
> > "Baseline" is the linux-pm.git/testing branch, "Parent/child"
> > is that branch with patches [1-3/5] applied and "Device links"
> > is that branch with patches [1-5/5] applied.
> >
> > "s/r" means "regular" suspend/resume, noRPM is "late" suspend
> > and "early" resume, and noIRQ means the "noirq" phases of
> > suspend and resume, respectively.  The numbers are suspend
> > and resume times for each phase, in milliseconds.
> >
> >          Baseline       Parent/child    Device links
> >
> >        Suspend Resume  Suspend Resume  Suspend Resume
> >
> > s/r    427     449     298     450     294     442
> > noRPM  13      1       13      1       13      1
> > noIRQ  31      25      28      24      28      26
> >
> > s/r    408     442     298     443     301     447
> > noRPM  13      1       13      1       13      1
> > noIRQ  32      25      30      25      28      25
> >
> > s/r    408     444     310     450     298     439
> > noRPM  13      1       13      1       13      1
> > noIRQ  31      24      31      26      31      24
> >
> > It clearly shows an improvement in the suspend path after
> > applying patches [1-3/5], easily attributable to patch [2/5],
> > and clear difference after updating the async processing of
> > suppliers and consumers.

A "no" is missing above, it should be "and no clear difference after
updating ...".

Also, please find attached a text file with sample results from 3
different systems (including the one above), not for drawing any
conclusions (the number of samples is too low), but to illustrate what
can happen.

While both Dell XPS13 systems show a consistent improvement after
applying the first three patches, everything else is essentially a
wash (particularly on the desktop machine that seems to suspend and
resume as fast as it gets already).

> >
> > Note that there are systems where resume times are shorter after
> > patches [1-3/5] too, but more testing is necessary.
> >
> > I do realize that this code can be optimized further, but it is not
> > particularly clear to me that any further optimizations would make
> > a significant difference and the changes in this series are deep
> > enough to do in one go.
>
> Thanks for adding patches 4 and 5!

No problem.

> Let me try to test them early next week and compare your patches 1-3,
> 1-5 and my series (which does additional checks to make sure
> suppliers/consumers are done). I do about 100 suspend/resume runs for
> each kernel, so please bear with me while I get it.

Thanks and no worries, please take as much time as needed.  I will be
traveling next week, so I'll be a bit slow to respond anyway.

Since I've got a confirmation from internal testing (carried out on a
much wider range of machines and much more extensively than I can do
it myself) that patches [1-3/5] are an overall improvement, I'm planning
to queue them up during the 6.16 cycle and other improvements can be
done on top of them, including patches [4-5/5].  I also think that
adding explicit status tracking (if it turns out to make things faster
measurably with respect to this series) on top of patches [4-5/5]
would be rather straightforward.

	Baseline		Parents/children	Device links

	Suspend	Resume		Suspend	Resume		Suspend	Resume

Dell XPS13 9360

s/r	427	449		298	450		294	442
noRPM	13	1		13	1		13	1
noIRQ	31	25		28	24		28	26

s/r	408	442		298	443		301	447
noRPM	13	1		13	1		13	1
noIRQ	32	25		30	25		28	25

s/r	408	444		310	450		298	439
noRPM	13	1		13	1		13	1
noIRQ	31	24		31	26		31	24

Dell XPS13 9380

s/r	439	283		318	290		319	290
noRPM	15	2		15	1		15	2
noIRQ	198	1766		202	1743		204	1766

s/r	439	281		318	280		320	280
noRPM	15	2		15	1		15	1
noIRQ	199	1781		203	1783		205	1770

s/r	440	279		319	281		320	283
noRPM	14	2		15	1		15	1
noIRQ	197	1777		202	1765		203	1724

Coffee Lake Desktop

s/r	138	347		130	345		132	344
noRPM	15	2		20	2		15	2
noIRQ	15	25		23	25		16	26

s/r	133	345		124	343		131	346
noRPM	14	1		13	1		13	1
noIRQ	15	25		14	25		14	25

s/r	124	343		126	345		128	345
noRPM	13	1		13	1		13	1
noIRQ	14	25		14	25		14	26