mbox series

[0/7] PM: Solution for S0ix failure caused by PCH overheating

Message ID 20220505015814.3727692-1-rui.zhang@intel.com
Headers show
Series PM: Solution for S0ix failure caused by PCH overheating | expand

Message

Zhang, Rui May 5, 2022, 1:58 a.m. UTC
On some Intel client platforms like SKL/KBL/CNL/CML, there is a
PCH thermal sensor that monitors the PCH temperature and blocks the system
from entering S0ix in case it overheats.

Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
temperature above threshold") introduces a delay loop to cool the
temperature down for this purpose.

However, in practice, we found that the time it takes to cool the PCH down
below threshold highly depends on the initial PCH temperature when the
delay starts, as well as the ambient temperature.

For example, on a Dell XPS 9360 laptop, the problem can be triggered 
1. when it is suspended with heavy workload running.
or
2. when it is moved from New Hampshire to Florida.

In these cases, the 1 second delay is not sufficient. As a result, the
system stays in a shallower power state like PCx instead of S0ix, and
drains the battery power, without user' notice.

In this patch series, we first fix the problem in patch 1/7 ~ 3/7, by
1. expand the default overall cooling delay timeout to 60 seconds.
2. make sure the temperature is below threshold rather than equal to it.
3. move the delay to .suspend_noirq phase instead, in order to
   a) do the cooling when the system is in a more quiescent state
   b) be aware of wakeup events during the long delay, because some wakeup
      events (ACPI Power button Press, USB mouse, etc) become valid only
      in .suspend_noirq phase and later.

However, this potential long delay introduces a problem to our suspend
stress automation test, because the delay makes it hard to predict how
much time it takes to suspend the system.
As we want to do as much suspend iterations as possible in limited time,
setting a 60+ seconds rtc alarm for suspend which usually takes shorter
than 1 second is far beyond overkill.

Thus, in patch 4/7 ~ 7/7, a rtc driver hook is introduced, which cancels
the armed rtc alarm in the beginning of suspend and then rearm the rtc
alarm with a short interval (say, 2 second) right before system suspended.

By running
 # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
before suspend, the system can be resumed by RTC alarm right after it is
suspended, no matter how much time the suspend really takes.

This patch series has been tested on the same Dell XPS 9360 laptop and
S0ix is 100% achieved across 1000+ s2idle iterations.

thanks,
rui

Comments

Rafael J. Wysocki May 5, 2022, 12:02 p.m. UTC | #1
On Thu, May 5, 2022 at 10:23 AM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 05.05.22 03:58, Zhang Rui wrote:
> > On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> > PCH thermal sensor that monitors the PCH temperature and blocks the system
> > from entering S0ix in case it overheats.
> >
> > Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
> > temperature above threshold") introduces a delay loop to cool the
> > temperature down for this purpose.
> >
> > However, in practice, we found that the time it takes to cool the PCH down
> > below threshold highly depends on the initial PCH temperature when the
> > delay starts, as well as the ambient temperature.
>
> >
> > This patch series has been tested on the same Dell XPS 9360 laptop and
> > S0ix is 100% achieved across 1000+ s2idle iterations.
> >
> Hi,
>
> what is the user experience if this ever triggers? At that stage the
> system will appear to be suspended to an external observer, won't it?
> So in effect you'd have a system that spontaneously wakes up, won't you?

No, you won't.

It will just go ahead and reach S0ix when it can.  It will only wake
up if there's a legitimate wakeup even in the meantime.
Alexandre Belloni May 6, 2022, 9:46 p.m. UTC | #2
Hello,

I assume I can ignore this patch as this seems to be only for testing
I'm not even sure why this is needed as this completely breaks setting
the alarm time.

On 05/05/2022 09:58:14+0800, Zhang Rui wrote:
> Automated suspend/resume testing uses the RTC for wakeup.
> A short rtcwake period is desirable, so that more suspend/resume
> cycles can be completed, while the machine is available for testing.
> 
> But if too short a wake interval is specified, the event can occur,
> while still suspending, and then no event wakes the suspended system
> until the user notices that testing has stalled, and manually intervenes.
> 
> Here we add a hook to the rtc-cmos driver to
> a) remove the alarm timer in the beginning of suspend, if there is any
> b) arm the wakeup in PM notifier callback, which is in the very late stage
>    before the system really suspends
> The remaining part of suspend is usually measured under 10 ms,
> and so arming the timer at this point could be as fast as the minimum
> time of 1-second.
> 
> But there is a 2nd race.  The RTC has 1-second granularity, and unless
> you are timing the timer with a higher resolution timer,
> there is no telling if the current time + 1 will occur immediately,
> or a full second in the future.  And so 2-seconds is the safest minimum:
> 
> Run 1,000 s2idle cycles with (max of) 2 second wakeup period:
> 
>  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
>  # sleepgraph.py -m freeze -multi 1000 0 -skiphtml -gzip
> 
> Clear the timer override, to not interfere with subsequent
> real use of the machine's suspend/resume feature:
> 
>  # echo 0 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> 
> Originally-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
> ---
>  drivers/rtc/interface.c |  1 +
>  drivers/rtc/rtc-cmos.c  | 45 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> index 9edd662c69ac..fb93aa2dc99c 100644
> --- a/drivers/rtc/interface.c
> +++ b/drivers/rtc/interface.c
> @@ -1020,6 +1020,7 @@ void rtc_timer_cancel(struct rtc_device *rtc, struct rtc_timer *timer)
>  		rtc_timer_remove(rtc, timer);
>  	mutex_unlock(&rtc->ops_lock);
>  }
> +EXPORT_SYMBOL_GPL(rtc_timer_cancel);
>  
>  /**
>   * rtc_read_offset - Read the amount of rtc offset in parts per billion
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 7c006c2b125f..9590c40fa9d8 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -32,6 +32,7 @@
>  #include <linux/init.h>
>  #include <linux/interrupt.h>
>  #include <linux/spinlock.h>
> +#include <linux/suspend.h>
>  #include <linux/platform_device.h>
>  #include <linux/log2.h>
>  #include <linux/pm.h>
> @@ -70,6 +71,9 @@ static inline int cmos_use_acpi_alarm(void)
>  }
>  #endif
>  
> +static int rtc_wake_override_sec;
> +module_param(rtc_wake_override_sec, int, 0644);
> +
>  struct cmos_rtc {
>  	struct rtc_device	*rtc;
>  	struct device		*dev;
> @@ -89,6 +93,7 @@ struct cmos_rtc {
>  	u8			century;
>  
>  	struct rtc_wkalrm	saved_wkalrm;
> +	struct notifier_block	pm_nb;
>  };
>  
>  /* both platform and pnp busses use negative numbers for invalid irqs */
> @@ -744,6 +749,42 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
>  		return IRQ_NONE;
>  }
>  
> +static int cmos_pm_notify(struct notifier_block *nb, unsigned long mode, void *_unused)
> +{
> +	struct cmos_rtc *cmos = container_of(nb, struct cmos_rtc, pm_nb);
> +	struct rtc_device       *rtc = cmos->rtc;
> +	unsigned long           now;
> +	struct rtc_wkalrm       alm;
> +
> +	if (rtc_wake_override_sec == 0)
> +		return NOTIFY_OK;
> +
> +	switch (mode) {
> +	case PM_SUSPEND_PREPARE:
> +		/*
> +		 * Cancel the timer to make sure it won't fire
> +		 * before rtc is rearmed later.
> +		 */
> +		rtc_timer_cancel(rtc, &rtc->aie_timer);
> +		break;
> +	case PM_SUSPEND_LATE:
> +		if (rtc_read_time(rtc, &alm.time))
> +			return NOTIFY_BAD;
> +
> +		now = rtc_tm_to_time64(&alm.time);
> +		memset(&alm, 0, sizeof(alm));
> +		rtc_time64_to_tm(now + rtc_wake_override_sec, &alm.time);
> +		alm.enabled = true;
> +		if (rtc_set_alarm(rtc, &alm))
> +			return NOTIFY_BAD;
> +		if (cmos->wake_on)
> +			cmos->wake_on(cmos->dev);
> +		break;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
>  #ifdef	CONFIG_PNP
>  #define	INITSECTION
>  
> @@ -937,6 +978,9 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
>  		 nvmem_cfg.size,
>  		 use_hpet_alarm() ? ", hpet irqs" : "");
>  
> +	cmos_rtc.pm_nb.notifier_call = cmos_pm_notify;
> +	register_pm_notifier(&cmos_rtc.pm_nb);
> +
>  	return 0;
>  
>  cleanup2:
> @@ -965,6 +1009,7 @@ static void cmos_do_remove(struct device *dev)
>  	struct cmos_rtc	*cmos = dev_get_drvdata(dev);
>  	struct resource *ports;
>  
> +	unregister_pm_notifier(&cmos_rtc.pm_nb);
>  	cmos_do_shutdown(cmos->irq);
>  
>  	if (is_valid_irq(cmos->irq)) {
> -- 
> 2.17.1
>
Zhang, Rui May 7, 2022, 2 a.m. UTC | #3
Hi, Alexandre,

Thanks for reviewing the patch.

On Fri, 2022-05-06 at 23:46 +0200, Alexandre Belloni wrote:
> Hello,
> 
> I assume I can ignore this patch as this seems to be only for testing

The main purpose of this patch is for automate testing.
But this doesn't mean it cannot be part of upstream code, right?

> I'm not even sure why this is needed as this completely breaks
> setting
> the alarm time.

Or overrides the alarm time, :)

People rely on the rtc alarm in the automated suspend stress test,
which suspend and resume the system for over 1000 iterations.
As I mentioned in the cover letter of this patch series, if the system
suspend time varies from under 1 second to over 60 seconds, how to
alarm the RTC before suspend?
This feature is critical in this scenario.

Plus, the current solution is transparent to people who don't known/use
this "rtc_wake_override_sec" parameter. And for people who use this,
they should know that the previous armed RTC alarm will be overrode
whenever a system suspend is triggered. I can add a message when the
parameter is set, if needed.

thanks,
rui

> 
> On 05/05/2022 09:58:14+0800, Zhang Rui wrote:
> > Automated suspend/resume testing uses the RTC for wakeup.
> > A short rtcwake period is desirable, so that more suspend/resume
> > cycles can be completed, while the machine is available for
> > testing.
> > 
> > But if too short a wake interval is specified, the event can occur,
> > while still suspending, and then no event wakes the suspended
> > system
> > until the user notices that testing has stalled, and manually
> > intervenes.
> > 
> > Here we add a hook to the rtc-cmos driver to
> > a) remove the alarm timer in the beginning of suspend, if there is
> > any
> > b) arm the wakeup in PM notifier callback, which is in the very
> > late stage
> >    before the system really suspends
> > The remaining part of suspend is usually measured under 10 ms,
> > and so arming the timer at this point could be as fast as the
> > minimum
> > time of 1-second.
> > 
> > But there is a 2nd race.  The RTC has 1-second granularity, and
> > unless
> > you are timing the timer with a higher resolution timer,
> > there is no telling if the current time + 1 will occur immediately,
> > or a full second in the future.  And so 2-seconds is the safest
> > minimum:
> > 
> > Run 1,000 s2idle cycles with (max of) 2 second wakeup period:
> > 
> >  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> >  # sleepgraph.py -m freeze -multi 1000 0 -skiphtml -gzip
> > 
> > Clear the timer override, to not interfere with subsequent
> > real use of the machine's suspend/resume feature:
> > 
> >  # echo 0 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> > 
> > Originally-by: Len Brown <len.brown@intel.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
> > ---
> >  drivers/rtc/interface.c |  1 +
> >  drivers/rtc/rtc-cmos.c  | 45
> > +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 46 insertions(+)
> > 
> > diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> > index 9edd662c69ac..fb93aa2dc99c 100644
> > --- a/drivers/rtc/interface.c
> > +++ b/drivers/rtc/interface.c
> > @@ -1020,6 +1020,7 @@ void rtc_timer_cancel(struct rtc_device *rtc,
> > struct rtc_timer *timer)
> >  		rtc_timer_remove(rtc, timer);
> >  	mutex_unlock(&rtc->ops_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(rtc_timer_cancel);
> >  
> >  /**
> >   * rtc_read_offset - Read the amount of rtc offset in parts per
> > billion
> > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > index 7c006c2b125f..9590c40fa9d8 100644
> > --- a/drivers/rtc/rtc-cmos.c
> > +++ b/drivers/rtc/rtc-cmos.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/init.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/spinlock.h>
> > +#include <linux/suspend.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/log2.h>
> >  #include <linux/pm.h>
> > @@ -70,6 +71,9 @@ static inline int cmos_use_acpi_alarm(void)
> >  }
> >  #endif
> >  
> > +static int rtc_wake_override_sec;
> > +module_param(rtc_wake_override_sec, int, 0644);
> > +
> >  struct cmos_rtc {
> >  	struct rtc_device	*rtc;
> >  	struct device		*dev;
> > @@ -89,6 +93,7 @@ struct cmos_rtc {
> >  	u8			century;
> >  
> >  	struct rtc_wkalrm	saved_wkalrm;
> > +	struct notifier_block	pm_nb;
> >  };
> >  
> >  /* both platform and pnp busses use negative numbers for invalid
> > irqs */
> > @@ -744,6 +749,42 @@ static irqreturn_t cmos_interrupt(int irq,
> > void *p)
> >  		return IRQ_NONE;
> >  }
> >  
> > +static int cmos_pm_notify(struct notifier_block *nb, unsigned long
> > mode, void *_unused)
> > +{
> > +	struct cmos_rtc *cmos = container_of(nb, struct cmos_rtc,
> > pm_nb);
> > +	struct rtc_device       *rtc = cmos->rtc;
> > +	unsigned long           now;
> > +	struct rtc_wkalrm       alm;
> > +
> > +	if (rtc_wake_override_sec == 0)
> > +		return NOTIFY_OK;
> > +
> > +	switch (mode) {
> > +	case PM_SUSPEND_PREPARE:
> > +		/*
> > +		 * Cancel the timer to make sure it won't fire
> > +		 * before rtc is rearmed later.
> > +		 */
> > +		rtc_timer_cancel(rtc, &rtc->aie_timer);
> > +		break;
> > +	case PM_SUSPEND_LATE:
> > +		if (rtc_read_time(rtc, &alm.time))
> > +			return NOTIFY_BAD;
> > +
> > +		now = rtc_tm_to_time64(&alm.time);
> > +		memset(&alm, 0, sizeof(alm));
> > +		rtc_time64_to_tm(now + rtc_wake_override_sec,
> > &alm.time);
> > +		alm.enabled = true;
> > +		if (rtc_set_alarm(rtc, &alm))
> > +			return NOTIFY_BAD;
> > +		if (cmos->wake_on)
> > +			cmos->wake_on(cmos->dev);
> > +		break;
> > +	}
> > +
> > +	return NOTIFY_OK;
> > +}
> > +
> >  #ifdef	CONFIG_PNP
> >  #define	INITSECTION
> >  
> > @@ -937,6 +978,9 @@ cmos_do_probe(struct device *dev, struct
> > resource *ports, int rtc_irq)
> >  		 nvmem_cfg.size,
> >  		 use_hpet_alarm() ? ", hpet irqs" : "");
> >  
> > +	cmos_rtc.pm_nb.notifier_call = cmos_pm_notify;
> > +	register_pm_notifier(&cmos_rtc.pm_nb);
> > +
> >  	return 0;
> >  
> >  cleanup2:
> > @@ -965,6 +1009,7 @@ static void cmos_do_remove(struct device *dev)
> >  	struct cmos_rtc	*cmos = dev_get_drvdata(dev);
> >  	struct resource *ports;
> >  
> > +	unregister_pm_notifier(&cmos_rtc.pm_nb);
> >  	cmos_do_shutdown(cmos->irq);
> >  
> >  	if (is_valid_irq(cmos->irq)) {
> > -- 
> > 2.17.1
> > 
> 
>
Alexandre Belloni May 7, 2022, 7:31 a.m. UTC | #4
On 07/05/2022 10:00:40+0800, Zhang Rui wrote:
> Hi, Alexandre,
> 
> Thanks for reviewing the patch.
> 
> On Fri, 2022-05-06 at 23:46 +0200, Alexandre Belloni wrote:
> > Hello,
> > 
> > I assume I can ignore this patch as this seems to be only for testing
> 
> The main purpose of this patch is for automate testing.
> But this doesn't mean it cannot be part of upstream code, right?
> 
> > I'm not even sure why this is needed as this completely breaks
> > setting
> > the alarm time.
> 
> Or overrides the alarm time, :)
> 
> People rely on the rtc alarm in the automated suspend stress test,
> which suspend and resume the system for over 1000 iterations.
> As I mentioned in the cover letter of this patch series, if the system
> suspend time varies from under 1 second to over 60 seconds, how to
> alarm the RTC before suspend?
> This feature is critical in this scenario.
> 
> Plus, the current solution is transparent to people who don't known/use
> this "rtc_wake_override_sec" parameter. And for people who use this,
> they should know that the previous armed RTC alarm will be overrode
> whenever a system suspend is triggered. I can add a message when the
> parameter is set, if needed.
> 

It is not transparent, if I read your patch properly, this breaks wakeup
for everyone...

> > On 05/05/2022 09:58:14+0800, Zhang Rui wrote:
> > > +static int cmos_pm_notify(struct notifier_block *nb, unsigned long
> > > mode, void *_unused)
> > > +{
> > > +	struct cmos_rtc *cmos = container_of(nb, struct cmos_rtc,
> > > pm_nb);
> > > +	struct rtc_device       *rtc = cmos->rtc;
> > > +	unsigned long           now;
> > > +	struct rtc_wkalrm       alm;
> > > +
> > > +	if (rtc_wake_override_sec == 0)
> > > +		return NOTIFY_OK;
> > > +
> > > +	switch (mode) {
> > > +	case PM_SUSPEND_PREPARE:
> > > +		/*
> > > +		 * Cancel the timer to make sure it won't fire
> > > +		 * before rtc is rearmed later.
> > > +		 */
> > > +		rtc_timer_cancel(rtc, &rtc->aie_timer);
> > > +		break;
> > > +	case PM_SUSPEND_LATE:
> > > +		if (rtc_read_time(rtc, &alm.time))
> > > +			return NOTIFY_BAD;
> > > +
> > > +		now = rtc_tm_to_time64(&alm.time);
> > > +		memset(&alm, 0, sizeof(alm));
> > > +		rtc_time64_to_tm(now + rtc_wake_override_sec,
> > > &alm.time);
> > > +		alm.enabled = true;
> > > +		if (rtc_set_alarm(rtc, &alm))
> > > +			return NOTIFY_BAD;

... because if rtc_wake_override_sec is not set, this sets the alarm to
now which is the current RTC time, ensuring the alarm will never
trigger.
Zhang, Rui May 7, 2022, 7:41 a.m. UTC | #5
On Sat, 2022-05-07 at 09:31 +0200, Alexandre Belloni wrote:
> On 07/05/2022 10:00:40+0800, Zhang Rui wrote:
> > Hi, Alexandre,
> > 
> > Thanks for reviewing the patch.
> > 
> > On Fri, 2022-05-06 at 23:46 +0200, Alexandre Belloni wrote:
> > > Hello,
> > > 
> > > I assume I can ignore this patch as this seems to be only for
> > > testing
> > 
> > The main purpose of this patch is for automate testing.
> > But this doesn't mean it cannot be part of upstream code, right?
> > 
> > > I'm not even sure why this is needed as this completely breaks
> > > setting
> > > the alarm time.
> > 
> > Or overrides the alarm time, :)
> > 
> > People rely on the rtc alarm in the automated suspend stress test,
> > which suspend and resume the system for over 1000 iterations.
> > As I mentioned in the cover letter of this patch series, if the
> > system
> > suspend time varies from under 1 second to over 60 seconds, how to
> > alarm the RTC before suspend?
> > This feature is critical in this scenario.
> > 
> > Plus, the current solution is transparent to people who don't
> > known/use
> > this "rtc_wake_override_sec" parameter. And for people who use
> > this,
> > they should know that the previous armed RTC alarm will be overrode
> > whenever a system suspend is triggered. I can add a message when
> > the
> > parameter is set, if needed.
> > 
> 
> It is not transparent, if I read your patch properly, this breaks
> wakeup
> for everyone...
> 
> > > On 05/05/2022 09:58:14+0800, Zhang Rui wrote:
> > > > +static int cmos_pm_notify(struct notifier_block *nb, unsigned
> > > > long
> > > > mode, void *_unused)
> > > > +{
> > > > +	struct cmos_rtc *cmos = container_of(nb, struct
> > > > cmos_rtc,
> > > > pm_nb);
> > > > +	struct rtc_device       *rtc = cmos->rtc;
> > > > +	unsigned long           now;
> > > > +	struct rtc_wkalrm       alm;
> > > > +
> > > > +	if (rtc_wake_override_sec == 0)
> > > > +		return NOTIFY_OK;
> > > > +
> > > > +	switch (mode) {
> > > > +	case PM_SUSPEND_PREPARE:
> > > > +		/*
> > > > +		 * Cancel the timer to make sure it won't fire
> > > > +		 * before rtc is rearmed later.
> > > > +		 */
> > > > +		rtc_timer_cancel(rtc, &rtc->aie_timer);
> > > > +		break;
> > > > +	case PM_SUSPEND_LATE:
> > > > +		if (rtc_read_time(rtc, &alm.time))
> > > > +			return NOTIFY_BAD;
> > > > +
> > > > +		now = rtc_tm_to_time64(&alm.time);
> > > > +		memset(&alm, 0, sizeof(alm));
> > > > +		rtc_time64_to_tm(now + rtc_wake_override_sec,
> > > > &alm.time);
> > > > +		alm.enabled = true;
> > > > +		if (rtc_set_alarm(rtc, &alm))
> > > > +			return NOTIFY_BAD;
> 
> ... because if rtc_wake_override_sec is not set, this sets the alarm
> to
> now which is the current RTC time, ensuring the alarm will never
> trigger.

No.
As the code below
> > > > 
> > > > 	if (rtc_wake_override_sec == 0)
> > > > +		return NOTIFY_OK;

We check for rtc_wake_override_sec at the beginning of the notifier
callback. So it takes effect only if a) rtc_wake_override_sec is set,
AND b) a suspend is triggered.

thanks,
rui
Zhang, Rui May 16, 2022, 7:50 a.m. UTC | #6
Hi, Alexandre,

May I know if this addresses your concern or not?

thanks,
rui

On Sat, 2022-05-07 at 15:41 +0800, Zhang Rui wrote:
> On Sat, 2022-05-07 at 09:31 +0200, Alexandre Belloni wrote:
> > On 07/05/2022 10:00:40+0800, Zhang Rui wrote:
> > > Hi, Alexandre,
> > > 
> > > Thanks for reviewing the patch.
> > > 
> > > On Fri, 2022-05-06 at 23:46 +0200, Alexandre Belloni wrote:
> > > > Hello,
> > > > 
> > > > I assume I can ignore this patch as this seems to be only for
> > > > testing
> > > 
> > > The main purpose of this patch is for automate testing.
> > > But this doesn't mean it cannot be part of upstream code, right?
> > > 
> > > > I'm not even sure why this is needed as this completely breaks
> > > > setting
> > > > the alarm time.
> > > 
> > > Or overrides the alarm time, :)
> > > 
> > > People rely on the rtc alarm in the automated suspend stress
> > > test,
> > > which suspend and resume the system for over 1000 iterations.
> > > As I mentioned in the cover letter of this patch series, if the
> > > system
> > > suspend time varies from under 1 second to over 60 seconds, how
> > > to
> > > alarm the RTC before suspend?
> > > This feature is critical in this scenario.
> > > 
> > > Plus, the current solution is transparent to people who don't
> > > known/use
> > > this "rtc_wake_override_sec" parameter. And for people who use
> > > this,
> > > they should know that the previous armed RTC alarm will be
> > > overrode
> > > whenever a system suspend is triggered. I can add a message when
> > > the
> > > parameter is set, if needed.
> > > 
> > 
> > It is not transparent, if I read your patch properly, this breaks
> > wakeup
> > for everyone...
> > 
> > > > On 05/05/2022 09:58:14+0800, Zhang Rui wrote:
> > > > > +static int cmos_pm_notify(struct notifier_block *nb,
> > > > > unsigned
> > > > > long
> > > > > mode, void *_unused)
> > > > > +{
> > > > > +	struct cmos_rtc *cmos = container_of(nb, struct
> > > > > cmos_rtc,
> > > > > pm_nb);
> > > > > +	struct rtc_device       *rtc = cmos->rtc;
> > > > > +	unsigned long           now;
> > > > > +	struct rtc_wkalrm       alm;
> > > > > +
> > > > > +	if (rtc_wake_override_sec == 0)
> > > > > +		return NOTIFY_OK;
> > > > > +
> > > > > +	switch (mode) {
> > > > > +	case PM_SUSPEND_PREPARE:
> > > > > +		/*
> > > > > +		 * Cancel the timer to make sure it won't fire
> > > > > +		 * before rtc is rearmed later.
> > > > > +		 */
> > > > > +		rtc_timer_cancel(rtc, &rtc->aie_timer);
> > > > > +		break;
> > > > > +	case PM_SUSPEND_LATE:
> > > > > +		if (rtc_read_time(rtc, &alm.time))
> > > > > +			return NOTIFY_BAD;
> > > > > +
> > > > > +		now = rtc_tm_to_time64(&alm.time);
> > > > > +		memset(&alm, 0, sizeof(alm));
> > > > > +		rtc_time64_to_tm(now + rtc_wake_override_sec,
> > > > > &alm.time);
> > > > > +		alm.enabled = true;
> > > > > +		if (rtc_set_alarm(rtc, &alm))
> > > > > +			return NOTIFY_BAD;
> > 
> > ... because if rtc_wake_override_sec is not set, this sets the
> > alarm
> > to
> > now which is the current RTC time, ensuring the alarm will never
> > trigger.
> 
> No.
> As the code below
> > > > > 
> > > > > 	if (rtc_wake_override_sec == 0)
> > > > > +		return NOTIFY_OK;
> 
> We check for rtc_wake_override_sec at the beginning of the notifier
> callback. So it takes effect only if a) rtc_wake_override_sec is set,
> AND b) a suspend is triggered.
> 
> thanks,
> rui
> 
>
Rafael J. Wysocki May 17, 2022, 3:11 p.m. UTC | #7
On Thu, May 5, 2022 at 3:58 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> PCH thermal sensor that monitors the PCH temperature and blocks the system
> from entering S0ix in case it overheats.
>
> Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
> temperature above threshold") introduces a delay loop to cool the
> temperature down for this purpose.
>
> However, in practice, we found that the time it takes to cool the PCH down
> below threshold highly depends on the initial PCH temperature when the
> delay starts, as well as the ambient temperature.
>
> For example, on a Dell XPS 9360 laptop, the problem can be triggered
> 1. when it is suspended with heavy workload running.
> or
> 2. when it is moved from New Hampshire to Florida.
>
> In these cases, the 1 second delay is not sufficient. As a result, the
> system stays in a shallower power state like PCx instead of S0ix, and
> drains the battery power, without user' notice.
>
> In this patch series, we first fix the problem in patch 1/7 ~ 3/7, by
> 1. expand the default overall cooling delay timeout to 60 seconds.
> 2. make sure the temperature is below threshold rather than equal to it.
> 3. move the delay to .suspend_noirq phase instead, in order to
>    a) do the cooling when the system is in a more quiescent state
>    b) be aware of wakeup events during the long delay, because some wakeup
>       events (ACPI Power button Press, USB mouse, etc) become valid only
>       in .suspend_noirq phase and later.
>
> However, this potential long delay introduces a problem to our suspend
> stress automation test, because the delay makes it hard to predict how
> much time it takes to suspend the system.
> As we want to do as much suspend iterations as possible in limited time,
> setting a 60+ seconds rtc alarm for suspend which usually takes shorter
> than 1 second is far beyond overkill.
>
> Thus, in patch 4/7 ~ 7/7, a rtc driver hook is introduced, which cancels
> the armed rtc alarm in the beginning of suspend and then rearm the rtc
> alarm with a short interval (say, 2 second) right before system suspended.
>
> By running
>  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> before suspend, the system can be resumed by RTC alarm right after it is
> suspended, no matter how much time the suspend really takes.
>
> This patch series has been tested on the same Dell XPS 9360 laptop and
> S0ix is 100% achieved across 1000+ s2idle iterations.

Overall, the first three patches in the series can go in without the
rest, so let's put them into a separate series.

Patch [4/7] doesn't depend on the first three ones, so it can go in by itself.

Patch [5/7] is to be dropped anyway as per the earlier discussion.

Patch [6/7] is only needed to apply patch [7/7] which is controversial.

I think that we can drop or defer patches [6-7/7] for now.
Alexandre Belloni May 17, 2022, 5:07 p.m. UTC | #8
On 17/05/2022 17:11:05+0200, Rafael J. Wysocki wrote:
> On Thu, May 5, 2022 at 3:58 AM Zhang Rui <rui.zhang@intel.com> wrote:
> >
> > On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> > PCH thermal sensor that monitors the PCH temperature and blocks the system
> > from entering S0ix in case it overheats.
> >
> > Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
> > temperature above threshold") introduces a delay loop to cool the
> > temperature down for this purpose.
> >
> > However, in practice, we found that the time it takes to cool the PCH down
> > below threshold highly depends on the initial PCH temperature when the
> > delay starts, as well as the ambient temperature.
> >
> > For example, on a Dell XPS 9360 laptop, the problem can be triggered
> > 1. when it is suspended with heavy workload running.
> > or
> > 2. when it is moved from New Hampshire to Florida.
> >
> > In these cases, the 1 second delay is not sufficient. As a result, the
> > system stays in a shallower power state like PCx instead of S0ix, and
> > drains the battery power, without user' notice.
> >
> > In this patch series, we first fix the problem in patch 1/7 ~ 3/7, by
> > 1. expand the default overall cooling delay timeout to 60 seconds.
> > 2. make sure the temperature is below threshold rather than equal to it.
> > 3. move the delay to .suspend_noirq phase instead, in order to
> >    a) do the cooling when the system is in a more quiescent state
> >    b) be aware of wakeup events during the long delay, because some wakeup
> >       events (ACPI Power button Press, USB mouse, etc) become valid only
> >       in .suspend_noirq phase and later.
> >
> > However, this potential long delay introduces a problem to our suspend
> > stress automation test, because the delay makes it hard to predict how
> > much time it takes to suspend the system.
> > As we want to do as much suspend iterations as possible in limited time,
> > setting a 60+ seconds rtc alarm for suspend which usually takes shorter
> > than 1 second is far beyond overkill.
> >
> > Thus, in patch 4/7 ~ 7/7, a rtc driver hook is introduced, which cancels
> > the armed rtc alarm in the beginning of suspend and then rearm the rtc
> > alarm with a short interval (say, 2 second) right before system suspended.
> >
> > By running
> >  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> > before suspend, the system can be resumed by RTC alarm right after it is
> > suspended, no matter how much time the suspend really takes.
> >
> > This patch series has been tested on the same Dell XPS 9360 laptop and
> > S0ix is 100% achieved across 1000+ s2idle iterations.
> 
> Overall, the first three patches in the series can go in without the
> rest, so let's put them into a separate series.
> 
> Patch [4/7] doesn't depend on the first three ones, so it can go in by itself.
> 
> Patch [5/7] is to be dropped anyway as per the earlier discussion.
> 
> Patch [6/7] is only needed to apply patch [7/7] which is controversial.
> 
> I think that we can drop or defer patches [6-7/7] for now.

I don't think 7/7 is really useful in the upstream kernel, I don't plan
to apply it
Zhang, Rui May 18, 2022, 2:11 p.m. UTC | #9
Hi, Rafael,

On Tue, 2022-05-17 at 17:11 +0200, Rafael J. Wysocki wrote:
> On Thu, May 5, 2022 at 3:58 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > 
> > On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> > PCH thermal sensor that monitors the PCH temperature and blocks the
> > system
> > from entering S0ix in case it overheats.
> > 
> > Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to
> > PCH
> > temperature above threshold") introduces a delay loop to cool the
> > temperature down for this purpose.
> > 
> > However, in practice, we found that the time it takes to cool the
> > PCH down
> > below threshold highly depends on the initial PCH temperature when
> > the
> > delay starts, as well as the ambient temperature.
> > 
> > For example, on a Dell XPS 9360 laptop, the problem can be
> > triggered
> > 1. when it is suspended with heavy workload running.
> > or
> > 2. when it is moved from New Hampshire to Florida.
> > 
> > In these cases, the 1 second delay is not sufficient. As a result,
> > the
> > system stays in a shallower power state like PCx instead of S0ix,
> > and
> > drains the battery power, without user' notice.
> > 
> > In this patch series, we first fix the problem in patch 1/7 ~ 3/7,
> > by
> > 1. expand the default overall cooling delay timeout to 60 seconds.
> > 2. make sure the temperature is below threshold rather than equal
> > to it.
> > 3. move the delay to .suspend_noirq phase instead, in order to
> >    a) do the cooling when the system is in a more quiescent state
> >    b) be aware of wakeup events during the long delay, because some
> > wakeup
> >       events (ACPI Power button Press, USB mouse, etc) become valid
> > only
> >       in .suspend_noirq phase and later.
> > 
> > However, this potential long delay introduces a problem to our
> > suspend
> > stress automation test, because the delay makes it hard to predict
> > how
> > much time it takes to suspend the system.
> > As we want to do as much suspend iterations as possible in limited
> > time,
> > setting a 60+ seconds rtc alarm for suspend which usually takes
> > shorter
> > than 1 second is far beyond overkill.
> > 
> > Thus, in patch 4/7 ~ 7/7, a rtc driver hook is introduced, which
> > cancels
> > the armed rtc alarm in the beginning of suspend and then rearm the
> > rtc
> > alarm with a short interval (say, 2 second) right before system
> > suspended.
> > 
> > By running
> >  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> > before suspend, the system can be resumed by RTC alarm right after
> > it is
> > suspended, no matter how much time the suspend really takes.
> > 
> > This patch series has been tested on the same Dell XPS 9360 laptop
> > and
> > S0ix is 100% achieved across 1000+ s2idle iterations.
> 
> Overall, the first three patches in the series can go in without the
> rest, so let's put them into a separate series.
> 
> Patch [4/7] doesn't depend on the first three ones, so it can go in
> by itself.
> 
> Patch [5/7] is to be dropped anyway as per the earlier discussion.
> 
> Patch [6/7] is only needed to apply patch [7/7] which is
> controversial.
> 
> I think that we can drop or defer patches [6-7/7] for now.

This all sounds reasonable to me.
I will resend them separately.

-rui