Message ID | 20250331204442.1727618-1-superm1@kernel.org |
---|---|
State | New |
Series | ACPI: EC: Set ec_no_wakeup for Lenovo Go S |
On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote: > > On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote: > > On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote: > >> > >> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote: > >>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote: > >>>> > >>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote: > >>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote: > >>>>>> > >>>>>>>> Here are tags for linking to your patch development to be picked up. > >>>>>>>> > >>>>>>>> Link: > >>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63 > >>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev> > >>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> > >>>>>>> > >>>>>> > >>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with > >>>>>> them and mark this patch as superceded in patchwork so that Rafael > >>>>>> doesn't have to pull everything out of this thread manually. > >>>> > >>>> FTR I don't have permission on patchwork for linux-acpi. > >>>> > >>>> I sent out v2 though. > >>>> > >>>>>> > >>>>>>> > >>>>>>> And to avoid having this conversation again, there is another Legion > >>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go > >>>>>>> on the manhunt for the real cause of this one. But it will probably be > >>>>>>> needed and you will find that as you get TDP controls going. So if you > >>>>>>> want me to prepare that in a timely manner, because that one actually > >>>>>>> needs rewriting to be posted, now is the time to say so. > >>>>>> > >>>>>> Can you please propose what you have in mind on the mailing lists to > >>>>>> discuss? It's relatively expensive (in the unit of tech debt) to add > >>>>>> quirk infrastructure and so we need to make sure it is the right solution. 
> >>>>>> > >>>>>> Derek is working on CPU coefficient tuning in a completely separate > >>>>>> driver. If there are issues with that, I would generally prefer the > >>>>>> fixes to be in that driver. > >>>>> > >>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will > >>>>> try to make sure the quirk can be potentially added there, or in any > >>>>> driver*. > >>>> > >>>> Yes things like fPPT, sPPT, STAPM, STT limits. > >>>> > >>>>> > >>>>> The idea is to rewrite the patch series to just add a simple delay > >>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets > >>>>> placed in ->begin. We have been using that series for ~6 months now, > >>>>> and it turns out that having a delay system for every call is quite > >>>>> pointless. But there are also situations where you might have a device > >>>>> such as the Z13 Folio which looks like a USB device but listens to > >>>>> s2idle notifications through ACPI, so the hid subsystem might need to > >>>>> be able to inject a small delay there. > >>>> > >>>> So the "general" problem with injecting delays is they are typically not > >>>> scalable as they're usually empirically measured and there is no > >>>> handshake with the firmware. > >>>> > >>>> Say for example the EC has some hardcoded value of 200ms to wait for > >>>> something. IIRC the Linux timer infrastructure can be off by ~13%. So > >>>> if you put 175ms it might work sometimes. You get some reports of this, > >>>> so you extend it to 200ms. Great it works 100% of the time because the > >>>> old hardcoded value in the EC was 200ms. > >>>> > >>>> Now say a new EC firmware comes out that for $REASONS changes it to > >>>> 250ms. Your old empirically measured value stops working, spend a bunch > >>>> of cycles debugging it, measure the new one. You change it to 250ms, > >>>> but people with the old one have a problem now because the timing changed. 
> >>>> > >>>> So now you have to add infrastructure to say what version of the > >>>> firmware gets what delay. > >>>> > >>>> Then you find out there is another SKU of that model which needs a > >>>> different delay, so your complexity has ballooned. > >>>> > >>>> What if all these "delays" were FW timeouts from failing to service an > >>>> interrupt? Or what if they were a flow problem like the device expected > >>>> you to issue a STOP command before a RESET command? > >>>> > >>>> So we need to be /incredibly careful/ with delays and 100% they are the > >>>> right answer to a problem. > >>> > >>> I do get your points. In this case though we sideskirt through a lot > >>> of the points because of where the delay is placed. > >>> > >>> If the instrumentation is in-place, this delay happens before sleep > >>> after the screen of the device has turned off (due to early DPMS), the > >>> keyboard backlight has turned off (DIsplay off call), and the suspend > >>> light pulses (Sleep Entry). So it does not affect device behavior and > >>> you can be quite liberal. The user has left the device alone. > >>> > >>> If the device needs e.g., 250ms you will not put 250ms, you will put > >>> 500ms. Still unsure, you bump it to 750ms. Also, even if the > >>> manufacturer comes up with a new firmware that fixes this issue, you > >>> can keep the delay for the life of the product, because keeping it > >>> does not affect device behavior, and writing kernel patches takes time. > >>> > >>> This is how I think about it, at least. A universal delay might be > >>> needed eventually. But for now, limiting the scope to some devices and > >>> seeing how that goes should be enough. > >>> > >>> Antheas > >> > >> My take is that "universal" delays are never popular. IE hardware that > >> "previously" worked perfectly is now slower. So if there /must/ be a > >> delay it should be as narrow as possible and justified. 
> >> > >> Let me give you an example of another case I'm *actively considering* a > >> delay. > >> > >> I have an OEM's system that if you enter and exit s0i3 too quickly you > >> can trigger the over voltage protection (OVP) feature of the VR module. > >> When OVP is tripped the system is forced off immediately. This *only > >> happens* on the VR module in that vendor's systems. "Normal" Linux > >> userspace suspend/resume can't trip it. But connecting a dock "does" > >> trip it. > >> > >> If you look on a scope you can see SLP_S3# pin is toggling faster than > >> spec says it should. Naïvely you would say well the easy solution is to > >> add a delay somewhere so that SLP_S3# stays in spec. I have a patch > >> that does just that. > >> > >> diff --git a/drivers/platform/x86/amd/pmc/pmc.c > >> b/drivers/platform/x86/amd/pmc/pmc.c > >> index e6124498b195f..97387ddb281e1 100644 > >> --- a/drivers/platform/x86/amd/pmc/pmc.c > >> +++ b/drivers/platform/x86/amd/pmc/pmc.c > >> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void) > >> struct smu_metrics table; > >> int rc; > >> > >> - /* CZN: Ensure that future s0i3 entry attempts at least 10ms > >> passed */ > >> - if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, > >> &table) && > >> - table.s0i3_last_entry_status) > >> - usleep_range(10000, 20000); > >> + if (!get_metrics_table(pdev, &table) && > >> table.s0i3_last_entry_status) { > >> + switch (pdev->cpu_id) { > >> + /* CZN: Ensure that future s0i3 entry attempts at least > >> 10ms passed */ > >> + case AMD_CPU_ID_CZN: > >> + usleep_range(10000, 20000); > >> + break; > >> + /* PHX/HPT: Ensure enough time to avoid VR OVP */ > >> + case AMD_CPU_ID_PS: > >> + msleep(2500); > >> + break; > >> + default: > >> + break; > >> + } > >> + } > >> > >> This stops all the failures, but it also has an impact that any time the > >> EC SCI is raised for any reason (like plug in power adapter) the system > >> will take 2.5s to go back into s0i3. 
> >>
> >> Digging further - the intended behavior by the EC and BIOS was to wake
> >> the system when the dock is connected.
> >>
> >> That is the reason this happens is because the EC SCI is raised when the
> >> dock is connected, but the Notify() the EC sent wasn't received by any
> >> driver. I've got a patch I'll be sending out soon that adds support for
> >> the correct driver to wake up on this event.
> >>
> >> This prevents the case of the OVP and now we don't *need* to penalize
> >> everyone to wait 2.5s after EC SCI events and going back to s0i3. If I
> >> find out there are other ways to trip the problem I still have that
> >> option though.
> >
> > So you are talking about missing the AC/DC burst feature of Windows
> > here right? I do agree with you that yeah for most devices it is not
> > necessary.
>
> No; I wasn't talking about that, my point was that timing delays are a
> tempting to solution to a problem, but they're very often papering over
> something else and a hint to dive deeper.

What I gleaned from what you said is that X manufacturer has a problem
due to missing AC/DC bursts in Linux, where an AC/DC burst is simply a 5s
delay.

The intended behavior of AC/DC bursts is to fully wake up the kernel
for 5 seconds, and then sleep again. In Windows, if a power supply is
connected, userspace wakes up too, and then the Windows power manager
sleeps the system again if there is no user activity for 5 seconds.
However, this should not affect device drivers, so we may consider it
optional on the Linux side until DEs get support for it and enable it
themselves I would say.

So in effect, AC/DC bursts are Windows' solution to problems like the
one you faced.

I am not saying penalize everyone. If I do make a patch for AC/DC it
will be device specific. But after a point, if random devices start
getting issues and the quirk list starts to grow, it might become
inevitable to force it for all of them.

I do get what you are saying with delays though. We had to merge one
of the initial SOF delay patch variants for the Steam Deck which
prevents audio crashing on resume, and that was definitely a bandage.

> >
> > But Microsoft guarantees 5 seconds. We already have the original Ally
> > unit which gets stuck in prochot due to this so it would be nice to
> > fix. For the Ally X I am unsure what Asus did but it stays awake for a
> > nice three seconds after you plug/unplug the charger so it has no
> > issues.
> >
> > So if devices keep getting issues like we will have to eat it and do
> > AC/DC bursts with all of them.
> >
> > And it is the same with entering s0i3 too fast. Some devices just need
> > a tiny amount of time to do whatever it is their manufacturer
> > programmed them to do after the Modern Standby notifications. For
> > handhelds, it is to turn off the controllers because XInput. Asus put
> > the fade animation so that takes 300ms and if you do it earlier the
> > controller gets cut before it saves its state and starts to do weird
> > RGB stuff. Other manufacturers typically do not malfunction but they
> > still use the notification.
> >
> > Only MSI does not, but that controller is quirky before/after sleep
> > and they released a firmware update today saying something about
> > controller S3/S4 improvements so they probably do that too now, I need
> > to check.
> >
> > For the Go S, it sets itself to 5W after sleep entry and turns off the
> > fan. A little delay went a long way in fixing the hang there, which I
> > suspect is due to aggressive tuning. But I do not know if you guys get
> > that. We did when we did the initial testing for it and carry the
> > delay now so I cannot tell you either way. So you should max out the
> > TDP, run stress -c 16, and make the device sleep 100-200 times to make
> > sure that is not an issue.
> >
> > I do have a plan for trying to rework AC/DC bursts, but first the
> > s2idle ordering needs to be fixed and I need to rewrite the series for
> > that. The series we have for that works _fine_ so it is not a priority
> > to rework but it is not upstreamable in its current state so if you
> > need that (for the Go S) I need to know now.
> >
> > For ACDC my idea would be after the reordering is done to have a quirk
> > that makes the kernel resume, fire the sleep exit notification, loop
> > for 5 (maybe 3?) seconds inside the device suspend section prior to
> > userspace resume, and then as long as a wakeup did not arrive restart
> > the suspend sequence to sleep again. I would also combine that with
> > the little s2idle wakeup device you made, so that userspace can enable
> > wakeups for that if it wants to do resume on dock connection. But that
> > has a lot of moving parts, including moving the DPMS action to happen
> > even earlier than your patch does and making sure display on/off does
> > not fire so that the keyboard backlight does not do weird stuff.
> >
> > Antheas
>
> I think a good start for what you're talking about would be to rebase
> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> rebase, idk) and then if/when all of us on LKML are happy with it we can
> layer other concepts on top of that.

Yeah, I will try to do that. However, I have around 30 submitted
patches in the air right now, and we are about to add another 5-6 to
the list for the Claw. So it will probably be after a bunch of those
merge. In the interest of sustainability, if nothing else. So let's
put a dot on this and pick up the discussion again mid 6.16 in a month
or so.

Unless you need this series for the Go S, in which case I can try to
re-order stuff around. So, one of you should use the red light TDP
mode with an artificial load (or actual, such as a game) and see if
sleep works properly. Do that on battery.

I would do at least 100 suspends with this, as most users do 50-70
suspends per reboot. I think I did around 300 to validate the Go S
quirk.

Antheas
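
The "biggest delay wins" idea mentioned in this message (a delay field on the s2idle quirk struct, with the largest requested value applied once in ->begin) could look roughly like the standalone sketch below. It is not the actual series: `struct s2idle_delay_quirk`, its fields, and `s2idle_pick_delay_ms()` are hypothetical names made up for illustration, and the code is plain C rather than kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical quirk entry (illustrative only, not a kernel structure).
 * Each driver that needs time after the sleep-entry notification would
 * register one of these.
 */
struct s2idle_delay_quirk {
	const char *owner;      /* driver that registered the quirk */
	bool matched;           /* true if the quirk applies to this machine */
	unsigned int delay_ms;  /* delay requested before entering s2idle */
};

/*
 * Return the single delay to apply: the largest value requested by any
 * matching quirk, or 0 when no quirk matches.
 */
unsigned int s2idle_pick_delay_ms(const struct s2idle_delay_quirk *quirks,
				  size_t count)
{
	unsigned int max_ms = 0;

	for (size_t i = 0; i < count; i++) {
		if (quirks[i].matched && quirks[i].delay_ms > max_ms)
			max_ms = quirks[i].delay_ms;
	}

	return max_ms;
}
```

With matching quirks requesting 300 ms and 500 ms, the machine would wait 500 ms once rather than 800 ms in total; entries that do not match the machine are ignored.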
On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote: > On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote: >> >> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote: >>> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote: >>>> >>>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote: >>>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote: >>>>>> >>>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote: >>>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote: >>>>>>>> >>>>>>>>>> Here are tags for linking to your patch development to be picked up. >>>>>>>>>> >>>>>>>>>> Link: >>>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63 >>>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev> >>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> >>>>>>>>> >>>>>>>> >>>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with >>>>>>>> them and mark this patch as superceded in patchwork so that Rafael >>>>>>>> doesn't have to pull everything out of this thread manually. >>>>>> >>>>>> FTR I don't have permission on patchwork for linux-acpi. >>>>>> >>>>>> I sent out v2 though. >>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> And to avoid having this conversation again, there is another Legion >>>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go >>>>>>>>> on the manhunt for the real cause of this one. But it will probably be >>>>>>>>> needed and you will find that as you get TDP controls going. So if you >>>>>>>>> want me to prepare that in a timely manner, because that one actually >>>>>>>>> needs rewriting to be posted, now is the time to say so. >>>>>>>> >>>>>>>> Can you please propose what you have in mind on the mailing lists to >>>>>>>> discuss? It's relatively expensive (in the unit of tech debt) to add >>>>>>>> quirk infrastructure and so we need to make sure it is the right solution. 
>>>>>>>> >>>>>>>> Derek is working on CPU coefficient tuning in a completely separate >>>>>>>> driver. If there are issues with that, I would generally prefer the >>>>>>>> fixes to be in that driver. >>>>>>> >>>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will >>>>>>> try to make sure the quirk can be potentially added there, or in any >>>>>>> driver*. >>>>>> >>>>>> Yes things like fPPT, sPPT, STAPM, STT limits. >>>>>> >>>>>>> >>>>>>> The idea is to rewrite the patch series to just add a simple delay >>>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets >>>>>>> placed in ->begin. We have been using that series for ~6 months now, >>>>>>> and it turns out that having a delay system for every call is quite >>>>>>> pointless. But there are also situations where you might have a device >>>>>>> such as the Z13 Folio which looks like a USB device but listens to >>>>>>> s2idle notifications through ACPI, so the hid subsystem might need to >>>>>>> be able to inject a small delay there. >>>>>> >>>>>> So the "general" problem with injecting delays is they are typically not >>>>>> scalable as they're usually empirically measured and there is no >>>>>> handshake with the firmware. >>>>>> >>>>>> Say for example the EC has some hardcoded value of 200ms to wait for >>>>>> something. IIRC the Linux timer infrastructure can be off by ~13%. So >>>>>> if you put 175ms it might work sometimes. You get some reports of this, >>>>>> so you extend it to 200ms. Great it works 100% of the time because the >>>>>> old hardcoded value in the EC was 200ms. >>>>>> >>>>>> Now say a new EC firmware comes out that for $REASONS changes it to >>>>>> 250ms. Your old empirically measured value stops working, spend a bunch >>>>>> of cycles debugging it, measure the new one. You change it to 250ms, >>>>>> but people with the old one have a problem now because the timing changed. 
>>>>>> >>>>>> So now you have to add infrastructure to say what version of the >>>>>> firmware gets what delay. >>>>>> >>>>>> Then you find out there is another SKU of that model which needs a >>>>>> different delay, so your complexity has ballooned. >>>>>> >>>>>> What if all these "delays" were FW timeouts from failing to service an >>>>>> interrupt? Or what if they were a flow problem like the device expected >>>>>> you to issue a STOP command before a RESET command? >>>>>> >>>>>> So we need to be /incredibly careful/ with delays and 100% they are the >>>>>> right answer to a problem. >>>>> >>>>> I do get your points. In this case though we sideskirt through a lot >>>>> of the points because of where the delay is placed. >>>>> >>>>> If the instrumentation is in-place, this delay happens before sleep >>>>> after the screen of the device has turned off (due to early DPMS), the >>>>> keyboard backlight has turned off (DIsplay off call), and the suspend >>>>> light pulses (Sleep Entry). So it does not affect device behavior and >>>>> you can be quite liberal. The user has left the device alone. >>>>> >>>>> If the device needs e.g., 250ms you will not put 250ms, you will put >>>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the >>>>> manufacturer comes up with a new firmware that fixes this issue, you >>>>> can keep the delay for the life of the product, because keeping it >>>>> does not affect device behavior, and writing kernel patches takes time. >>>>> >>>>> This is how I think about it, at least. A universal delay might be >>>>> needed eventually. But for now, limiting the scope to some devices and >>>>> seeing how that goes should be enough. >>>>> >>>>> Antheas >>>> >>>> My take is that "universal" delays are never popular. IE hardware that >>>> "previously" worked perfectly is now slower. So if there /must/ be a >>>> delay it should be as narrow as possible and justified. 
>>>> >>>> Let me give you an example of another case I'm *actively considering* a >>>> delay. >>>> >>>> I have an OEM's system that if you enter and exit s0i3 too quickly you >>>> can trigger the over voltage protection (OVP) feature of the VR module. >>>> When OVP is tripped the system is forced off immediately. This *only >>>> happens* on the VR module in that vendor's systems. "Normal" Linux >>>> userspace suspend/resume can't trip it. But connecting a dock "does" >>>> trip it. >>>> >>>> If you look on a scope you can see SLP_S3# pin is toggling faster than >>>> spec says it should. Naïvely you would say well the easy solution is to >>>> add a delay somewhere so that SLP_S3# stays in spec. I have a patch >>>> that does just that. >>>> >>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c >>>> b/drivers/platform/x86/amd/pmc/pmc.c >>>> index e6124498b195f..97387ddb281e1 100644 >>>> --- a/drivers/platform/x86/amd/pmc/pmc.c >>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c >>>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void) >>>> struct smu_metrics table; >>>> int rc; >>>> >>>> - /* CZN: Ensure that future s0i3 entry attempts at least 10ms >>>> passed */ >>>> - if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, >>>> &table) && >>>> - table.s0i3_last_entry_status) >>>> - usleep_range(10000, 20000); >>>> + if (!get_metrics_table(pdev, &table) && >>>> table.s0i3_last_entry_status) { >>>> + switch (pdev->cpu_id) { >>>> + /* CZN: Ensure that future s0i3 entry attempts at least >>>> 10ms passed */ >>>> + case AMD_CPU_ID_CZN: >>>> + usleep_range(10000, 20000); >>>> + break; >>>> + /* PHX/HPT: Ensure enough time to avoid VR OVP */ >>>> + case AMD_CPU_ID_PS: >>>> + msleep(2500); >>>> + break; >>>> + default: >>>> + break; >>>> + } >>>> + } >>>> >>>> This stops all the failures, but it also has an impact that any time the >>>> EC SCI is raised for any reason (like plug in power adapter) the system >>>> will take 2.5s to go back into s0i3. 
>>>> >>>> Digging further - the intended behavior by the EC and BIOS was to wake >>>> the system when the dock is connected. >>>> >>>> That is the reason this happens is because the EC SCI is raised when the >>>> dock is connected, but the Notify() the EC sent wasn't received by any >>>> driver. I've got a patch I'll be sending out soon that adds support for >>>> the correct driver to wake up on this event. >>>> >>>> This prevents the case of the OVP and now we don't *need* to penalize >>>> everyone to wait 2.5s after EC SCI events and going back to s0i3. If I >>>> find out there are other ways to trip the problem I still have that >>>> option though. >>> >>> So you are talking about missing the AC/DC burst feature of Windows >>> here right? I do agree with you that yeah for most devices it is not >>> necessary. >> >> No; I wasn't talking about that, my point was that timing delays are a >> tempting to solution to a problem, but they're very often papering over >> something else and a hint to dive deeper. > > What I gleaned from what you said is that X manufacturer has a problem > due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s > delay. > > The intended behavior of AC/DC bursts is to fully wake up the kernel > for 5 seconds, and then sleep again. In windows, if a power supply is > connected, userspace wakes up too, and then the Windows power manager > sleeps the system again if there is no user activity for 5 seconds. > However, this should not affect device drivers, so we may consider it > optional on the Linux side until DEs get support for it and enable it > themselves I would say. > > So in effect, AC/DC bursts are Windows' solution to problems like the > one you faced. > > I am not saying penalize everyone. If I do make a patch for AC/DC it > will be device specific. But after a point, if random devices start > getting issues and the quirk list starts to grow, it might become > inevitable to force it for all of them. 
>
> I do get what you are saying with delays though. We had to merge one
> of the initial SOF delay patch variants for the Steam Deck which
> prevents audio crashing on resume, and that was definitely a bandage.
>

Maybe I'm failing at my search-engine-fu, could you point me at some docs
about this AC/DC burst stuff?

[ And FTR unfortunately it seems that my proposal for the alternate wake
source has negative side effects on other machines, so I'm going back to
exploring a timing-based quirk tied to SMBIOS data again too :( ]

>>>
>>> But Microsoft guarantees 5 seconds. We already have the original Ally
>>> unit which gets stuck in prochot due to this so it would be nice to
>>> fix. For the Ally X I am unsure what Asus did but it stays awake for a
>>> nice three seconds after you plug/unplug the charger so it has no
>>> issues.
>>>
>>> So if devices keep getting issues like we will have to eat it and do
>>> AC/DC bursts with all of them.
>>>
>>> And it is the same with entering s0i3 too fast. Some devices just need
>>> a tiny amount of time to do whatever it is their manufacturer
>>> programmed them to do after the Modern Standby notifications. For
>>> handhelds, it is to turn off the controllers because XInput. Asus put
>>> the fade animation so that takes 300ms and if you do it earlier the
>>> controller gets cut before it saves its state and starts to do weird
>>> RGB stuff. Other manufacturers typically do not malfunction but they
>>> still use the notification.
>>>
>>> Only MSI does not, but that controller is quirky before/after sleep
>>> and they released a firmware update today saying something about
>>> controller S3/S4 improvements so they probably do that too now, I need
>>> to check.
>>>
>>> For the Go S, it sets itself to 5W after sleep entry and turns off the
>>> fan. A little delay went a long way in fixing the hang there, which I
>>> suspect is due to aggressive tuning. But I do not know if you guys get
>>> that. We did when we did the initial testing for it and carry the
>>> delay now so I cannot tell you either way. So you should max out the
>>> TDP, run stress -c 16, and make the device sleep 100-200 times to make
>>> sure that is not an issue.
>>>
>>> I do have a plan for trying to rework AC/DC bursts, but first the
>>> s2idle ordering needs to be fixed and I need to rewrite the series for
>>> that. The series we have for that works _fine_ so it is not a priority
>>> to rework but it is not upstreamable in its current state so if you
>>> need that (for the Go S) I need to know now.
>>>
>>> For ACDC my idea would be after the reordering is done to have a quirk
>>> that makes the kernel resume, fire the sleep exit notification, loop
>>> for 5 (maybe 3?) seconds inside the device suspend section prior to
>>> userspace resume, and then as long as a wakeup did not arrive restart
>>> the suspend sequence to sleep again. I would also combine that with
>>> the little s2idle wakeup device you made, so that userspace can enable
>>> wakeups for that if it wants to do resume on dock connection. But that
>>> has a lot of moving parts, including moving the DPMS action to happen
>>> even earlier than your patch does and making sure display on/off does
>>> not fire so that the keyboard backlight does not do weird stuff.
>>>
>>> Antheas
>>
>> I think a good start for what you're talking about would be to rebase
>> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
>> rebase, idk) and then if/when all of us on LKML are happy with it we can
>> layer other concepts on top of that.
>
> Yeah, I will try to do that. However, I have around 30 submitted
> patches in the air right now, and we are about to add another 5-6 to
> the list for the Claw. So it will probably be after a bunch of those
> merge. For the interest of sustainability, if nothing else. So let's
> put a dot on this and pick up the discussion again mid 6.16 in a month
> or so.
>
> Unless you need this series for the Go S, in which case I can try to
> re-order stuff around. So, one of you should use the red light TDP
> mode with an artificial load (or actual, such as a game) and see if
> sleep works properly. Do that on battery.

Take your time and get to it when you get to it. I just want to make
sure we build a clean foundation for big changes like you have in mind.

>
> I would do at least 100 suspends with this, as most users do 50-70
> suspends per reboot. I think I did around 300 to validate the Go S
> quirk.
>
> Antheas
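
The AC/DC burst flow proposed in the quoted text (resume, stay awake for a fixed window, then restart the suspend sequence unless a wakeup arrived) comes down to a small decision. The sketch below is only an illustration under stated assumptions: the 5 s window is the figure quoted in the thread (3 s was also floated), and `ACDC_BURST_WINDOW_MS`, `enum burst_action`, and `acdc_burst_decide()` are hypothetical names, not an existing kernel interface.

```c
/* Assumed burst window, taken from the 5 s figure quoted in the thread. */
#define ACDC_BURST_WINDOW_MS 5000L

/* Hypothetical outcome of a burst, for illustration only. */
enum burst_action {
	BURST_RESUSPEND,   /* no wakeup inside the window: go back to sleep */
	BURST_STAY_AWAKE,  /* a wakeup arrived inside the window: resume fully */
};

/*
 * wakeup_at_ms: time since the power-source event at which a real wakeup
 * arrived, or a negative value if none arrived.
 * window_ms: how long the kernel stays awake before suspending again.
 */
enum burst_action acdc_burst_decide(long wakeup_at_ms, long window_ms)
{
	if (wakeup_at_ms >= 0 && wakeup_at_ms < window_ms)
		return BURST_STAY_AWAKE;

	return BURST_RESUSPEND;
}
```

A wakeup that arrives after the window has closed is treated here as a fresh wake from sleep rather than as part of the burst, which is why it still yields `BURST_RESUSPEND` for the burst itself.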
On Wed, 2 Apr 2025 at 21:19, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>
> >>>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> >>>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>>
> >>>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>>>>
> >>>>>>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>>>>>
> >>>>>>>>>> Link:
> >>>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>>>>>> them and mark this patch as superceded in patchwork so that Rafael
> >>>>>>>> doesn't have to pull everything out of this thread manually.
> >>>>>>
> >>>>>> FTR I don't have permission on patchwork for linux-acpi.
> >>>>>>
> >>>>>> I sent out v2 though.
> >>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And to avoid having this conversation again, there is another Legion
> >>>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>>>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>>>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>>>>>> want me to prepare that in a timely manner, because that one actually
> >>>>>>>>> needs rewriting to be posted, now is the time to say so.
> >>>>>>>>
> >>>>>>>> Can you please propose what you have in mind on the mailing lists to
> >>>>>>>> discuss? It's relatively expensive (in the unit of tech debt) to add
> >>>>>>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>>>>>
> >>>>>>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>>>>>> driver. If there are issues with that, I would generally prefer the
> >>>>>>>> fixes to be in that driver.
> >>>>>>>
> >>>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>>>>>> try to make sure the quirk can be potentially added there, or in any
> >>>>>>> driver*.
> >>>>>>
> >>>>>> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>>>>>
> >>>>>>>
> >>>>>>> The idea is to rewrite the patch series to just add a simple delay
> >>>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>>>>>> placed in ->begin. We have been using that series for ~6 months now,
> >>>>>>> and it turns out that having a delay system for every call is quite
> >>>>>>> pointless. But there are also situations where you might have a device
> >>>>>>> such as the Z13 Folio which looks like a USB device but listens to
> >>>>>>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>>>>>> be able to inject a small delay there.
> >>>>>>
> >>>>>> So the "general" problem with injecting delays is they are typically not
> >>>>>> scalable as they're usually empirically measured and there is no
> >>>>>> handshake with the firmware.
> >>>>>>
> >>>>>> Say for example the EC has some hardcoded value of 200ms to wait for
> >>>>>> something. IIRC the Linux timer infrastructure can be off by ~13%. So
> >>>>>> if you put 175ms it might work sometimes. You get some reports of this,
> >>>>>> so you extend it to 200ms. Great it works 100% of the time because the
> >>>>>> old hardcoded value in the EC was 200ms.
> >>>>>>
> >>>>>> Now say a new EC firmware comes out that for $REASONS changes it to
> >>>>>> 250ms. Your old empirically measured value stops working, spend a bunch
> >>>>>> of cycles debugging it, measure the new one. You change it to 250ms,
> >>>>>> but people with the old one have a problem now because the timing changed.
> >>>>>>
> >>>>>> So now you have to add infrastructure to say what version of the
> >>>>>> firmware gets what delay.
> >>>>>>
> >>>>>> Then you find out there is another SKU of that model which needs a
> >>>>>> different delay, so your complexity has ballooned.
> >>>>>>
> >>>>>> What if all these "delays" were FW timeouts from failing to service an
> >>>>>> interrupt? Or what if they were a flow problem like the device expected
> >>>>>> you to issue a STOP command before a RESET command?
> >>>>>>
> >>>>>> So we need to be /incredibly careful/ with delays and 100% they are the
> >>>>>> right answer to a problem.
> >>>>>
> >>>>> I do get your points. In this case though we sideskirt through a lot
> >>>>> of the points because of where the delay is placed.
> >>>>>
> >>>>> If the instrumentation is in-place, this delay happens before sleep
> >>>>> after the screen of the device has turned off (due to early DPMS), the
> >>>>> keyboard backlight has turned off (DIsplay off call), and the suspend
> >>>>> light pulses (Sleep Entry). So it does not affect device behavior and
> >>>>> you can be quite liberal. The user has left the device alone.
> >>>>>
> >>>>> If the device needs e.g., 250ms you will not put 250ms, you will put
> >>>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> >>>>> manufacturer comes up with a new firmware that fixes this issue, you
> >>>>> can keep the delay for the life of the product, because keeping it
> >>>>> does not affect device behavior, and writing kernel patches takes time.
> >>>>>
> >>>>> This is how I think about it, at least. A universal delay might be
> >>>>> needed eventually. But for now, limiting the scope to some devices and
> >>>>> seeing how that goes should be enough.
> >>>>>
> >>>>> Antheas
> >>>>
> >>>> My take is that "universal" delays are never popular. IE hardware that
> >>>> "previously" worked perfectly is now slower. So if there /must/ be a
> >>>> delay it should be as narrow as possible and justified.
> >>>>
> >>>> Let me give you an example of another case I'm *actively considering* a
> >>>> delay.
> >>>>
> >>>> I have an OEM's system that if you enter and exit s0i3 too quickly you
> >>>> can trigger the over voltage protection (OVP) feature of the VR module.
> >>>> When OVP is tripped the system is forced off immediately. This *only
> >>>> happens* on the VR module in that vendor's systems. "Normal" Linux
> >>>> userspace suspend/resume can't trip it. But connecting a dock "does"
> >>>> trip it.
> >>>>
> >>>> If you look on a scope you can see SLP_S3# pin is toggling faster than
> >>>> spec says it should. Naïvely you would say well the easy solution is to
> >>>> add a delay somewhere so that SLP_S3# stays in spec. I have a patch
> >>>> that does just that.
> >>>>
> >>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> index e6124498b195f..97387ddb281e1 100644
> >>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
> >>>>          struct smu_metrics table;
> >>>>          int rc;
> >>>>
> >>>> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
> >>>> passed */
> >>>> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
> >>>> &table) &&
> >>>> -           table.s0i3_last_entry_status)
> >>>> -               usleep_range(10000, 20000);
> >>>> +       if (!get_metrics_table(pdev, &table) &&
> >>>> table.s0i3_last_entry_status) {
> >>>> +               switch (pdev->cpu_id) {
> >>>> +               /* CZN: Ensure that future s0i3 entry attempts at least
> >>>> 10ms passed */
> >>>> +               case AMD_CPU_ID_CZN:
> >>>> +                       usleep_range(10000, 20000);
> >>>> +                       break;
> >>>> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> >>>> +               case AMD_CPU_ID_PS:
> >>>> +                       msleep(2500);
> >>>> +                       break;
> >>>> +               default:
> >>>> +                       break;
> >>>> +               }
> >>>> +       }
> >>>>
> >>>> This stops all the failures, but it also has an impact that any time the
> >>>> EC SCI is raised for any reason (like plug in power adapter) the system
> >>>> will take 2.5s to go back into s0i3.
> >>>>
> >>>> Digging further - the intended behavior by the EC and BIOS was to wake
> >>>> the system when the dock is connected.
> >>>>
> >>>> That is the reason this happens is because the EC SCI is raised when the
> >>>> dock is connected, but the Notify() the EC sent wasn't received by any
> >>>> driver. I've got a patch I'll be sending out soon that adds support for
> >>>> the correct driver to wake up on this event.
> >>>>
> >>>> This prevents the case of the OVP and now we don't *need* to penalize
> >>>> everyone to wait 2.5s after EC SCI events and going back to s0i3. If I
> >>>> find out there are other ways to trip the problem I still have that
> >>>> option though.
> >>>
> >>> So you are talking about missing the AC/DC burst feature of Windows
> >>> here right? I do agree with you that yeah for most devices it is not
> >>> necessary.
> >>
> >> No; I wasn't talking about that, my point was that timing delays are a
> >> tempting solution to a problem, but they're very often papering over
> >> something else and a hint to dive deeper.
> >
> > What I gleaned from what you said is that X manufacturer has a problem
> > due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
> > delay.
> >
> > The intended behavior of AC/DC bursts is to fully wake up the kernel
> > for 5 seconds, and then sleep again. In windows, if a power supply is
> > connected, userspace wakes up too, and then the Windows power manager
> > sleeps the system again if there is no user activity for 5 seconds.
> > However, this should not affect device drivers, so we may consider it
> > optional on the Linux side until DEs get support for it and enable it
> > themselves I would say.
> >
> > So in effect, AC/DC bursts are Windows' solution to problems like the
> > one you faced.
> >
> > I am not saying penalize everyone. If I do make a patch for AC/DC it
> > will be device specific. But after a point, if random devices start
> > getting issues and the quirk list starts to grow, it might become
> > inevitable to force it for all of them.
> >
> > I do get what you are saying with delays though. We had to merge one
> > of the initial SOF delay patch variants for the Steam Deck which
> > prevents audio crashing on resume, and that was definitely a bandage.
> >
>
> Maybe I'm failing at my search-engine-fu, could you point me at some
> docs about this AC/DC burst stuff?

AC/DC Burst/AC/DC Burst Suppressed are the events in Sleep Study
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy

You can see those when running a sleep study and unplugging a
connector. I think suppressed is unplugging

Then here is the description for plugging in a charger
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-1

> The Windows power manager will turn on the display when the battery subsystem has indicated
> AC power has been connected. The GPIO interrupt for power source changes must cause the
> ACPI _PSR method under the power supply device to be executed. The power subsystem must
> wake the SoC any time the power source changes, including when the system is attached or
> removed from a dock that has a battery or AC power source. After AC power is connected,
> the display will remain on for five seconds, unless there is input to the system during this five-second window.

And here for unplugging:
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-2

> The GPIO interrupt for power source changes must cause the ACPI _PSR method under
> the power supply device to be executed. The power subsystem must wake the SoC any time
> the power source changes, including when the system is attached or removed from a dock
> that has a battery or AC power source.

I guess from the description it is not clear that the device stays on
for 5 seconds when unplugging, but from empirical testing I want to
say it does. It has been a while. I left 3 devices like an hour ago on
Windows and none of them managed to sleep, so I cannot verify this at
the moment though.

Antheas

> [
> And FTR unfortunately it's seeming that my proposal for the alternate
> wake source has negative side effects to other machines, so I'm going
> back to exploring a timing based quirk tied to SMBIOS data again too :( ]
>
> >>>
> >>> But Microsoft guarantees 5 seconds. We already have the original Ally
> >>> unit which gets stuck in prochot due to this so it would be nice to
> >>> fix. For the Ally X I am unsure what Asus did but it stays awake for a
> >>> nice three seconds after you plug/unplug the charger so it has no
> >>> issues.
> >>>
> >>> So if devices keep getting issues like we will have to eat it and do
> >>> AC/DC bursts with all of them.
> >>>
> >>> And it is the same with entering s0i3 too fast. Some devices just need
> >>> a tiny amount of time to do whatever it is their manufacturer
> >>> programmed them to do after the Modern Standby notifications. For
> >>> handhelds, it is to turn off the controllers because XInput. Asus put
> >>> the fade animation so that takes 300ms and if you do it earlier the
> >>> controller gets cut before it saves its state and starts to do weird
> >>> RGB stuff. Other manufacturers typically do not malfunction but they
> >>> still use the notification.
> >>>
> >>> Only MSI does not, but that controller is quirky before/after sleep
> >>> and they released a firmware update today saying something about
> >>> controller S3/S4 improvements so they probably do that too now, I need
> >>> to check.
> >>>
> >>> For the Go S, it sets itself to 5W after sleep entry and turns off the
> >>> fan. A little delay went a long way in fixing the hang there, which I
> >>> suspect is due to aggressive tuning. But I do not know if you guys get
> >>> that. We did when we did the initial testing for it and carry the
> >>> delay now so I cannot tell you either way. So you should max out the
> >>> TDP, run stress -c 16, and make the device sleep 100-200 times to make
> >>> sure that is not an issue.
> >>>
> >>> I do have a plan for trying to rework AC/DC bursts, but first the
> >>> s2idle ordering needs to be fixed and I need to rewrite the series for
> >>> that. The series we have for that works _fine_ so it is not a priority
> >>> to rework but it is not upstreamable in its current state so if you
> >>> need that (for the Go S) I need to know now.
> >>>
> >>> For ACDC my idea would be after the reordering is done to have a quirk
> >>> that makes the kernel resume, fire the sleep exit notification, loop
> >>> for 5 (maybe 3?) seconds inside the device suspend section prior to
> >>> userspace resume, and then as long as a wakeup did not arrive restart
> >>> the suspend sequence to sleep again. I would also combine that with
> >>> the little s2idle wakeup device you made, so that userspace can enable
> >>> wakeups for that if it wants to do resume on dock connection. But that
> >>> has a lot of moving parts, including moving the DPMS action to happen
> >>> even earlier than your patch does and making sure display on/off does
> >>> not fire so that the keyboard backlight does not do weird stuff.
> >>>
> >>> Antheas
> >>
> >> I think a good start for what you're talking about would be to rebase
> >> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> >> rebase, idk) and then if/when all of us on LKML are happy with it we can
> >> layer other concepts on top of that.
> >
> > Yeah, I will try to do that. However, I have around 30 submitted
> > patches in the air right now, and we are about to add another 5-6 to
> > the list for the Claw. So it will probably be after a bunch of those
> > merge. For the interest of sustainability, if nothing else. So let's
> > put a dot on this and pick up the discussion again mid 6.16 in a month
> > or so.
> >
> > Unless you need this series for the Go S, in which case I can try to
> > re-order stuff around. So, one of you should use the red light TDP
> > mode with an artificial load (or actual, such as a game) and see if
> > sleep works properly. Do that on battery.
>
> Take your time and get to it when you get to it. I just want to make
> sure we build a clean foundation for big changes like you have in mind.
>
> >
> > I would do at least 100 suspends with this, as most users do 50-70
> > suspends per reboot. I think I did around 300 to validate the Go S
> > quirk.
> >
> > Antheas
>
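For readers following the thread, the "biggest delay wins" idea quoted above (a delay field on each s2idle quirk entry, with the largest value applied once in ->begin) could look roughly like the sketch below. This is a userspace illustration only, not the series being discussed: struct s2idle_quirk, its delay_ms field and pick_begin_delay_ms() are hypothetical names and do not exist in the kernel.

```c
#include <stddef.h>

/* Hypothetical quirk entry; not an existing kernel structure. */
struct s2idle_quirk {
	const char *name;       /* who requested the delay (illustrative only) */
	unsigned int delay_ms;  /* requested delay before suspend entry, 0 = none */
};

/*
 * Return the largest delay requested by any quirk in the list, so the
 * suspend path would wait once rather than once per quirk.
 */
unsigned int pick_begin_delay_ms(const struct s2idle_quirk *quirks, size_t n)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (quirks[i].delay_ms > max)
			max = quirks[i].delay_ms;
	}
	return max;
}
```

With no quirks registered the result is 0, so unaffected hardware would not be slowed down, which is the narrow scoping both sides of the thread ask for.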
On 4/2/2025 3:37 PM, Antheas Kapenekakis wrote:
>> Maybe I'm failing at my search-engine-fu, could you point me at some
>> docs about this AC/DC burst stuff?
>
> AC/DC Burst/AC/DC Burst Suppressed are the events in Sleep Study
> https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy
>
> You can see those when running a sleep study and unplugging a
> connector. I think suppressed is unplugging
>
> Then here is the description for plugging in a charger
> https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-1
>
>> The Windows power manager will turn on the display when the battery subsystem has indicated
>> AC power has been connected. The GPIO interrupt for power source changes must cause the
>> ACPI _PSR method under the power supply device to be executed. The power subsystem must
>> wake the SoC any time the power source changes, including when the system is attached or
>> removed from a dock that has a battery or AC power source. After AC power is connected,
>> the display will remain on for five seconds, unless there is input to the system during this five-second window.
>
> And here for unplugging:
> https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-2
>
>> The GPIO interrupt for power source changes must cause the ACPI _PSR method under
>> the power supply device to be executed. The power subsystem must wake the SoC any time
>> the power source changes, including when the system is attached or removed from a dock
>> that has a battery or AC power source.

I suppose this could actually just be another way to say that there is a
flurry of EC activity as a result of the ACPI SCI and they characterize
that activity as a "burst" associated with unplug or plug.

It would be good to find actual documentation though instead of guessing.
>
> I guess from the description it is not clear that the device stays on
> for 5 seconds when unplugging, but from empirical testing I want to
> say it does. It has been a while. I left 3 devices like an hour ago on
> Windows and none of them managed to sleep, so I cannot verify this at
> the moment though.
>

During "display off" there is other activity that goes on until reaching
resiliency on the Microsoft side, so these are not really cut-and-dried
periods of time.

If I were to hypothesize: when the screen-off wakeup occurs, the OS looks
at the amount of time that has passed and, if any tasks were scheduled for
the next wakeup, gets them done before going back down.
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 8db09d81918fb..3c5f34892734e 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
 			DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
 		},
 	},
+	/*
+	 * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
+	 * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
+	 */
+	{
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
+		}
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
+		}
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
+		}
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
+		}
+	},
 	{ },
 };
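
As a rough illustration of how a table like the one in the patch is consulted, the sketch below mimics the matching in plain userspace C. It assumes the usual DMI table semantics: an entry applies only if every one of its match slots is a substring of the corresponding machine identifier, and the walk stops at the empty terminator entry. The types and names here (struct quirk, struct match, ec_no_wakeup_wanted()) are hypothetical stand-ins, not the kernel's struct dmi_system_id or dmi_check_system().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the DMI fields used by the patch. */
enum field { BOARD_VENDOR, PRODUCT_NAME, FIELD_MAX };

struct match {
	bool used;           /* false marks an empty slot */
	enum field f;        /* which identifier to look at */
	const char *substr;  /* must occur inside that identifier */
};

struct quirk {
	struct match matches[2];
};

#define LENOVO_PRODUCT(name) \
	{ .matches = { { true, BOARD_VENDOR, "LENOVO" }, \
		       { true, PRODUCT_NAME, (name) } } }

/* Product names taken from the patch; the table layout is illustrative. */
static const struct quirk no_wakeup_table[] = {
	LENOVO_PRODUCT("83L3"),
	LENOVO_PRODUCT("83N6"),
	LENOVO_PRODUCT("83Q2"),
	LENOVO_PRODUCT("83Q3"),
	{ .matches = { { .used = false } } }, /* terminator */
};

/* An entry applies only if all of its used slots match. */
static bool entry_matches(const struct quirk *q, const char *ident[])
{
	size_t i;

	for (i = 0; i < 2; i++) {
		const struct match *m = &q->matches[i];

		if (!m->used)
			break;
		if (!ident[m->f] || !strstr(ident[m->f], m->substr))
			return false;
	}
	return true;
}

/* Walk the table up to the terminator; true means "set the quirk". */
bool ec_no_wakeup_wanted(const char *ident[])
{
	const struct quirk *q;

	for (q = no_wakeup_table; q->matches[0].used; q++) {
		if (entry_matches(q, ident))
			return true;
	}
	return false;
}
```

A caller would fill an array indexed by enum field with the machine's board vendor and product name and set its own no-wakeup flag when ec_no_wakeup_wanted() returns true. Matching on both vendor and product name keeps the quirk limited to the four listed Legion Go S product names.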