Message ID: 20230330194439.14361-5-mario.limonciello@amd.com
State:      New
Series:     Add vendor agnostic mechanism to report hardware sleep
Hi Mario,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc4 next-20230330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:            https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
patch link:     https://lore.kernel.org/r/20230330194439.14361-5-mario.limonciello%40amd.com
patch subject:  [PATCH v5 4/4] platform/x86/intel/pmc: core: Report duration of time in HW sleep state
config:         i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230331/202303311048.UQo0skP2-lkp@intel.com/config)
compiler:       gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/ab8a5cd564c9bd22860612acbd76477f7515ca7b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
        git checkout ab8a5cd564c9bd22860612acbd76477f7515ca7b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303311048.UQo0skP2-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/platform/x86/intel/pmc/core.c: In function 'pmc_core_is_s0ix_failed':
>> drivers/platform/x86/intel/pmc/core.c:1206:9: error: implicit declaration of function 'pm_set_hw_sleep_time' [-Werror=implicit-function-declaration]
    1206 |         pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
         |         ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/pm_set_hw_sleep_time +1206 drivers/platform/x86/intel/pmc/core.c

  1198
  1199	static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
  1200	{
  1201		u64 s0ix_counter;
  1202
  1203		if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
  1204			return false;
  1205
> 1206		pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
  1207
  1208		if (s0ix_counter == pmcdev->s0ix_counter)
  1209			return true;
  1210
  1211		return false;
  1212	}
  1213
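(Aside, not part of the robot report: the error means pm_set_hw_sleep_time()
has no visible declaration in this i386 randconfig. A common way to keep such
callers building in every configuration is a no-op stub next to the real
prototype; the header location and the CONFIG_ guard below are guesses for
illustration only, not taken from the series.)

    /* Illustrative pattern only - header and Kconfig symbol are assumptions */
    #include <linux/types.h>

    #ifdef CONFIG_SUSPEND
    void pm_set_hw_sleep_time(u64 t);
    #else
    static inline void pm_set_hw_sleep_time(u64 t) {}
    #endif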
On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> intel_pmc_core displays a warning when the module parameter
> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> state.
>
> Report this to the standard kernel reporting infrastructure so that
> userspace software can query after the suspend cycle is done.
>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Reword commit message
> ---
>  drivers/platform/x86/intel/pmc/core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index e2f171fac094..980af32dd48a 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
>         if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
>                 return false;
>
> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> +

Maybe check if this is really accumulating?  In case of a counter
overflow, for instance?

>         if (s0ix_counter == pmcdev->s0ix_counter)
>                 return true;
>
> --
On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
> >
> > intel_pmc_core displays a warning when the module parameter
> > `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> > state.
> >
> > Report this to the standard kernel reporting infrastructure so that
> > userspace software can query after the suspend cycle is done.
> >
> > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > ---
> > v4->v5:
> >  * Reword commit message
> > ---
> >  drivers/platform/x86/intel/pmc/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/platform/x86/intel/pmc/core.c
> > b/drivers/platform/x86/intel/pmc/core.c
> > index e2f171fac094..980af32dd48a 100644
> > --- a/drivers/platform/x86/intel/pmc/core.c
> > +++ b/drivers/platform/x86/intel/pmc/core.c
> > @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> > pmc_dev *pmcdev)
> >         if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> >                 return false;
> >
> > +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> > +
>
> Maybe check if this is really accumulating?  In case of a counter
> overflow, for instance?

Overflow is likely on some systems. The counter is only 32-bit and at our
smallest granularity of 30.5us per tick it could overflow after a day and a
half of s0ix time, though most of our systems have a higher granularity that
puts them around 6 days.

This brings up an issue that the attribute cannot be trusted if the system is
suspended for longer than the maximum hardware counter time. Should be noted
in the Documentation.

David

> >         if (s0ix_counter == pmcdev->s0ix_counter)
> >                 return true;
> >
> > --
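(For reference, a quick back-of-the-envelope check of those overflow horizons.
The 30.5us tick is quoted above; the ~122us tick is only inferred from the
"around 6 days" figure and is not a value stated in the thread.)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* Tick periods in microseconds; 122us is an inferred value */
            double ticks_us[] = { 30.5, 122.0 };

            for (int i = 0; i < 2; i++) {
                    /* Seconds of s0ix time until a 32-bit counter wraps */
                    double seconds = (double)UINT32_MAX * ticks_us[i] / 1e6;

                    printf("%.1f us/tick -> wraps after ~%.1f days\n",
                           ticks_us[i], seconds / 86400.0);
            }
            return 0;
    }

Running this prints roughly 1.5 days for the 30.5us case and about 6 days for
the coarser tick, matching the estimates above.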
On 4/3/2023 13:00, Box, David E wrote:
> On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
>> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
>> <mario.limonciello@amd.com> wrote:
>>>
>>> intel_pmc_core displays a warning when the module parameter
>>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
>>> state.
>>>
>>> Report this to the standard kernel reporting infrastructure so that
>>> userspace software can query after the suspend cycle is done.
>>>
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>> v4->v5:
>>>   * Reword commit message
>>> ---
>>>   drivers/platform/x86/intel/pmc/core.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/intel/pmc/core.c
>>> b/drivers/platform/x86/intel/pmc/core.c
>>> index e2f171fac094..980af32dd48a 100644
>>> --- a/drivers/platform/x86/intel/pmc/core.c
>>> +++ b/drivers/platform/x86/intel/pmc/core.c
>>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
>>> pmc_dev *pmcdev)
>>>          if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
>>>                  return false;
>>>
>>> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
>>> +
>>
>> Maybe check if this is really accumulating?  In case of a counter
>> overflow, for instance?
>
> Overflow is likely on some systems. The counter is only 32-bit and at our
> smallest granularity of 30.5us per tick it could overflow after a day and a half
> of s0ix time, though most of our systems have a higher granularity that puts
> them around 6 days.
>
> This brings up an issue that the attribute cannot be trusted if the system is
> suspended for longer than the maximum hardware counter time. Should be noted in
> the Documentation.

I think it would be rather confusing for userspace having to account for
this, and it's better to abstract it in the kernel.

How can you discover the granularity a system can support?
How would you know overflow actually happened?  Is there a bit somewhere
else that could tell you?

In terms of ABI, how about when we know overflow occurred and userspace
reads the sysfs file we return -EOVERFLOW instead of a potentially bad
value?
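(A minimal sketch of what that -EOVERFLOW proposal could look like at the
sysfs layer. The hw_sleep_time and hw_sleep_overflow fields are hypothetical
and do not exist in the posted series; this only illustrates the shape of the
ABI being suggested.)

    /* Hypothetical show callback for a /sys/power attribute */
    static ssize_t hw_sleep_time_show(struct kobject *kobj,
                                      struct kobj_attribute *attr, char *buf)
    {
            /* Refuse to report a value known to be wrapped/untrustworthy */
            if (suspend_stats.hw_sleep_overflow)
                    return -EOVERFLOW;

            return sysfs_emit(buf, "%llu\n", suspend_stats.hw_sleep_time);
    }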
On Mon, Apr 3, 2023 at 8:07 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
> On 4/3/2023 13:00, Box, David E wrote:
> > On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> >> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> >> <mario.limonciello@amd.com> wrote:
> >>>
> >>> intel_pmc_core displays a warning when the module parameter
> >>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> >>> state.
> >>>
> >>> Report this to the standard kernel reporting infrastructure so that
> >>> userspace software can query after the suspend cycle is done.
> >>>
> >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>> ---
> >>> v4->v5:
> >>>   * Reword commit message
> >>> ---
> >>>   drivers/platform/x86/intel/pmc/core.c | 2 ++
> >>>   1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/platform/x86/intel/pmc/core.c
> >>> b/drivers/platform/x86/intel/pmc/core.c
> >>> index e2f171fac094..980af32dd48a 100644
> >>> --- a/drivers/platform/x86/intel/pmc/core.c
> >>> +++ b/drivers/platform/x86/intel/pmc/core.c
> >>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> >>> pmc_dev *pmcdev)
> >>>          if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> >>>                  return false;
> >>>
> >>> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> >>> +
> >>
> >> Maybe check if this is really accumulating?  In case of a counter
> >> overflow, for instance?
> >
> > Overflow is likely on some systems. The counter is only 32-bit and at our
> > smallest granularity of 30.5us per tick it could overflow after a day and a half
> > of s0ix time, though most of our systems have a higher granularity that puts
> > them around 6 days.
> >
> > This brings up an issue that the attribute cannot be trusted if the system is
> > suspended for longer than the maximum hardware counter time. Should be noted in
> > the Documentation.
>
> I think it would be rather confusing for userspace having to account for
> this and it's better to abstract it in the kernel.
>
> How can you discover the granularity a system can support?
> How would you know overflow actually happened?  Is there a bit somewhere
> else that could tell you?

I'm not really sure if there is a generally usable overflow detection
for this.

> In terms of ABI how about when we know overflow occurred and userspace
> reads the sysfs file we return -EOVERFLOW instead of a potentially bad
> value?

So if the new value is greater than the old one, you don't really know
whether or not an overflow has taken place.

And so I would just document the fact that the underlying HW/firmware
counter overflows, as suggested by Dave.
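(Side note on the wraparound arithmetic: if the delta were computed in the
counter's native 32-bit width, unsigned subtraction would absorb exactly one
wrap between reads. The posted code keeps the value in a u64, so this is only
a sketch of an alternative; multiple wraps during one long suspend would still
go undetected, which is why documenting the limitation, as suggested above, is
the simpler route.)

    /* Sketch only: delta modulo 2^32 tolerates a single wrap between reads */
    static u32 s0ix_counter_delta(u32 now, u32 before)
    {
            return now - before;    /* unsigned arithmetic handles one wrap */
    }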
diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index e2f171fac094..980af32dd48a 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
 	if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
 		return false;
 
+	pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
+
 	if (s0ix_counter == pmcdev->s0ix_counter)
 		return true;
intel_pmc_core displays a warning when the module parameter
`warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
state.

Report this to the standard kernel reporting infrastructure so that
userspace software can query after the suspend cycle is done.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Reword commit message
---
 drivers/platform/x86/intel/pmc/core.c | 2 ++
 1 file changed, 2 insertions(+)