mbox series

[0/6] thermal/intel: Introduce intel-tcc lib and enhance tjmax handling

Message ID 20221108033332.27760-1-rui.zhang@intel.com
Headers show
Series thermal/intel: Introduce intel-tcc lib and enhance tjmax handling | expand

Message

Zhang, Rui Nov. 8, 2022, 3:33 a.m. UTC
Currently, all the thermal drivers that access the Intel CPU TCC (thermal
control circuitry) MSRs have two problems,
1. they have their own implementation for the same TCC functionalities,
   including getting the tjmax/temperature, getting/setting the TCC offset.
   This introduces a lot of duplicate code.
2. they get TjMax value once and always use the cached value later.
   But Tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can be changed
   at runtime for cases like the Intel SST-PP (Intel Speed Select Technology-
   Performance Profile) level change.

The intel-tcc library is introduced in patch 1/6 for cleaning up the duplicate
code among these drivers, and it also ensures the temperature is alway got
based on the updated tjmax value.

Patch 2 ~ 5 cleans up the drivers by using the new Intel TCC lib APIs.

Patch 6/6 enhances the x86_pkg_temp driver to handle dynamic tjmax when
progamming the thermal interrupt threshold.
Actually, the thermal interrupt threshold programming can also be part of the
TCC library, but I didn't do it in this version because x86_pkg_temp is the
only user for now.

thanks,
rui

----------------------------------------------------------------
Zhang Rui (6):
      thermal/intel: Introduce Intel TCC library
      thermal/x86_pkg_temp_thermal: Use Intel TCC library
      thermal/int340x/processor_thermal: Use Intel TCC library
      thermal/intel/intel_soc_dts_iosf: Use Intel TCC library
      thermal/intel/intel_tcc_cooling: Use Intel TCC library
      thermal/x86_pkg_temp_thermal: Add support for handling dynamic tjmax

 drivers/thermal/intel/Kconfig                      |   7 ++
 drivers/thermal/intel/Makefile                     |   1 +
 drivers/thermal/intel/int340x_thermal/Kconfig      |   1 +
 .../int340x_thermal/processor_thermal_device.c     | 123 ++++---------------
 drivers/thermal/intel/intel_soc_dts_iosf.c         |  33 +-----
 drivers/thermal/intel/intel_tcc.c                  | 132 +++++++++++++++++++++
 drivers/thermal/intel/intel_tcc_cooling.c          |  37 +-----
 drivers/thermal/intel/x86_pkg_temp_thermal.c       |  62 ++++------
 include/linux/intel_tcc.h                          |  18 +++
 9 files changed, 219 insertions(+), 195 deletions(-)
 create mode 100644 drivers/thermal/intel/intel_tcc.c
 create mode 100644 include/linux/intel_tcc.h

Comments

Rafael J. Wysocki Dec. 8, 2022, 1:49 p.m. UTC | #1
On Tue, Nov 8, 2022 at 4:31 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> There are several different drivers that accesses the Intel TCC
> (thermal control circuitry) MSRs, and each of them has its own
> implementation for the same functionalities, e.g. getting the current
> temperature, getting the tj_max, and getting/setting the tj_max offset.
>
> Introduce a library to unify the code for Intel CPU TCC MSR access.
>
> At the same time, ensure the temperature is got based on the updated
> tjmax value because tjmax can be changed at runtime for cases like
> the Intel SST-PP (Intel Speed Select Technology - Performance Profile)
> level change.
>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>

Nice series, overall I like it a lot, but I have some comments
regarding this particular patch (below).  Some of them are arguably
minor, but at least one thing is more serious.

> ---
>  drivers/thermal/intel/Kconfig     |   4 +
>  drivers/thermal/intel/Makefile    |   1 +
>  drivers/thermal/intel/intel_tcc.c | 131 ++++++++++++++++++++++++++++++
>  include/linux/intel_tcc.h         |  18 ++++
>  4 files changed, 154 insertions(+)
>  create mode 100644 drivers/thermal/intel/intel_tcc.c
>  create mode 100644 include/linux/intel_tcc.h
>
> diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
> index f0c845679250..6b938c040d6e 100644
> --- a/drivers/thermal/intel/Kconfig
> +++ b/drivers/thermal/intel/Kconfig
> @@ -12,6 +12,10 @@ config X86_THERMAL_VECTOR
>         def_bool y
>         depends on X86 && CPU_SUP_INTEL && X86_LOCAL_APIC
>
> +config INTEL_TCC
> +       bool
> +       depends on X86
> +
>  config X86_PKG_TEMP_THERMAL
>         tristate "X86 package temperature thermal driver"
>         depends on X86_THERMAL_VECTOR
> diff --git a/drivers/thermal/intel/Makefile b/drivers/thermal/intel/Makefile
> index 9a8d8054f316..5d8833c82ab6 100644
> --- a/drivers/thermal/intel/Makefile
> +++ b/drivers/thermal/intel/Makefile
> @@ -2,6 +2,7 @@
>  #
>  # Makefile for various Intel thermal drivers.
>
> +obj-$(CONFIG_INTEL_TCC)        += intel_tcc.o
>  obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
>  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)     += x86_pkg_temp_thermal.o
>  obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)  += intel_soc_dts_iosf.o
> diff --git a/drivers/thermal/intel/intel_tcc.c b/drivers/thermal/intel/intel_tcc.c
> new file mode 100644
> index 000000000000..74b434914975
> --- /dev/null
> +++ b/drivers/thermal/intel/intel_tcc.c
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * intel_tcc.c - Library for Intel TCC (thermal control circuitry) MSR access
> + * Copyright (c) 2022, Intel Corporation.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/intel_tcc.h>
> +#include <asm/msr.h>
> +
> +/**
> + * intel_tcc_get_tjmax() - returns the default TCC activation Temperature
> + * @cpu: cpu that the MSR should be run on.
> + * @tjmax: a valid pointer to where to store the Tjmax value
> + *
> + * Get the TjMax value, which is the default thermal throttling or TCC
> + * activation temperature in degrees C.
> + *
> + * Return: On success returns 0, an error code otherwise
> + */
> +

This extra empty line is not needed (and not desirable even).  And same below.

> +int intel_tcc_get_tjmax(int cpu, int *tjmax)
> +{
> +       u32 eax, edx;
> +       int err;
> +
> +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> +                                       &eax, &edx);

The current trend is to align the arguments after a line break with
the first one, but I would just put them all on the same line (and
below too).

> +       if (err)
> +               return err;
> +
> +       *tjmax = (eax >> 16) & 0xff;

This means that the tjmax value cannot be negative.

> +
> +       return *tjmax ? 0 : -EINVAL;

So the return value of this function could be tjmax if positive or a
negative error code otherwise.  No return pointers needed.

And why do you want to return -EINVAL (rather than any other error
code) if tjmax turns out to be 0?

> +}
> +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_tjmax, INTEL_TCC);
> +
> +/**
> + * intel_tcc_get_offset() - returns the TCC Offset value to Tjmax
> + * @cpu: cpu that the MSR should be run on.
> + * @offset: a valid pointer to where to store the offset value
> + *
> + * Get the TCC offset value to Tjmax. The effective thermal throttling or TCC
> + * activation temperature equals "Tjmax" - "TCC Offset", in degrees C.
> + *
> + * Return: On success returns 0, an error code otherwise
> + */
> +
> +int intel_tcc_get_offset(int cpu, int *offset)
> +{
> +       u32 eax, edx;
> +       int err;
> +
> +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> +                                       &eax, &edx);
> +       if (err)
> +               return err;
> +
> +       *offset = (eax >> 24) & 0x3f;

Well, offset cannot be negative here, so (again) the return value of
this function could be interpreted as the offsent (if non-negative) or
a negative error code on failure.

> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> +
> +/**
> + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> + * @cpu: cpu that the MSR should be run on.
> + * @offset: TCC offset value in degree C

I think that this cannot be negative, so maybe say "in K" instead of
"in degree C"?

And maybe it's better to pass u8 here?

> + *
> + * Set the TCC Offset value to Tjmax. The effective thermal throttling or TCC
> + * activation temperature equals "Tjmax" - "TCC Offset", in degree C.
> + *
> + * Return: On success returns 0, an error code otherwise
> + */
> +
> +int intel_tcc_set_offset(int cpu, int offset)
> +{
> +       u32 eax, edx;
> +       int err;
> +
> +       if (offset > 0x3f)
> +               return -EINVAL;
> +
> +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> +                                       &eax, &edx);
> +       if (err)
> +               return err;
> +
> +       if (eax & BIT(31))
> +               return -EPERM;

Why -EPERM?

> +
> +       eax &= ~(0x3f << 24);
> +       eax |= (offset << 24);

The parens are not needed AFAICS.

> +
> +       return wrmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, eax, edx);

So is any protection against concurrent access needed here?  Like what
if two different callers invoke this function at the same time for the
same CPU?

> +}
> +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> +
> +/**
> + * intel_tcc_get_temp() - returns the current temperature
> + * @cpu: cpu that the MSR should be run on.
> + * @pkg: true: Package Thermal Sensor. false: Core Thermal Sensor.
> + * @temp: a valid pointer to where to store the resulting temperature
> + *
> + * Get the current temperature returned by the CPU core/package level
> + * thermal sensor, in degrees C.
> + *
> + * Return: On success returns 0, an error code otherwise
> + */
> +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> +{
> +       u32 eax, edx;
> +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS : MSR_IA32_THERM_STATUS;
> +       int tjmax, err;
> +
> +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> +       if (err)
> +               return err;

Well, what if somebody change tjmax on this cpu while this function is running?

> +
> +       err = rdmsr_safe_on_cpu(cpu, msr, &eax, &edx);
> +       if (err)
> +               return err;
> +
> +       if (eax & 0x80000000) {
> +               *temp = tjmax - ((eax >> 16) & 0x7f);
> +               return 0;
> +       }
> +       return -EINVAL;
> +}
> +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_temp, INTEL_TCC);
> +
> diff --git a/include/linux/intel_tcc.h b/include/linux/intel_tcc.h
> new file mode 100644
> index 000000000000..94f8ceab5dd0
> --- /dev/null
> +++ b/include/linux/intel_tcc.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + *  header for Intel TCC (thermal control circuitry) library
> + *
> + *  Copyright (C) 2022  Intel Corporation.
> + */
> +
> +#ifndef __INTEL_TCC_H__
> +#define __INTEL_TCC_H__
> +
> +#include <linux/types.h>
> +
> +int intel_tcc_get_tjmax(int cpu, int *tjmax);
> +int intel_tcc_get_offset(int cpu, int *offset);
> +int intel_tcc_set_offset(int cpu, int offset);
> +int intel_tcc_get_temp(int cpu, bool pkg, int *temp);
> +
> +#endif /* __INTEL_TCC_H__ */
> --
Rafael J. Wysocki Dec. 8, 2022, 2:07 p.m. UTC | #2
On Thu, Dec 8, 2022 at 2:49 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Nov 8, 2022 at 4:31 AM Zhang Rui <rui.zhang@intel.com> wrote:
> >
> > There are several different drivers that accesses the Intel TCC
> > (thermal control circuitry) MSRs, and each of them has its own
> > implementation for the same functionalities, e.g. getting the current
> > temperature, getting the tj_max, and getting/setting the tj_max offset.
> >
> > Introduce a library to unify the code for Intel CPU TCC MSR access.
> >
> > At the same time, ensure the temperature is got based on the updated
> > tjmax value because tjmax can be changed at runtime for cases like
> > the Intel SST-PP (Intel Speed Select Technology - Performance Profile)
> > level change.
> >
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>
> Nice series, overall I like it a lot, but I have some comments
> regarding this particular patch (below).  Some of them are arguably
> minor, but at least one thing is more serious.
>
> > ---
> >  drivers/thermal/intel/Kconfig     |   4 +
> >  drivers/thermal/intel/Makefile    |   1 +
> >  drivers/thermal/intel/intel_tcc.c | 131 ++++++++++++++++++++++++++++++
> >  include/linux/intel_tcc.h         |  18 ++++
> >  4 files changed, 154 insertions(+)
> >  create mode 100644 drivers/thermal/intel/intel_tcc.c
> >  create mode 100644 include/linux/intel_tcc.h
> >
> > diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
> > index f0c845679250..6b938c040d6e 100644
> > --- a/drivers/thermal/intel/Kconfig
> > +++ b/drivers/thermal/intel/Kconfig
> > @@ -12,6 +12,10 @@ config X86_THERMAL_VECTOR
> >         def_bool y
> >         depends on X86 && CPU_SUP_INTEL && X86_LOCAL_APIC
> >
> > +config INTEL_TCC
> > +       bool
> > +       depends on X86
> > +
> >  config X86_PKG_TEMP_THERMAL
> >         tristate "X86 package temperature thermal driver"
> >         depends on X86_THERMAL_VECTOR
> > diff --git a/drivers/thermal/intel/Makefile b/drivers/thermal/intel/Makefile
> > index 9a8d8054f316..5d8833c82ab6 100644
> > --- a/drivers/thermal/intel/Makefile
> > +++ b/drivers/thermal/intel/Makefile
> > @@ -2,6 +2,7 @@
> >  #
> >  # Makefile for various Intel thermal drivers.
> >
> > +obj-$(CONFIG_INTEL_TCC)        += intel_tcc.o
> >  obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
> >  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)     += x86_pkg_temp_thermal.o
> >  obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)  += intel_soc_dts_iosf.o
> > diff --git a/drivers/thermal/intel/intel_tcc.c b/drivers/thermal/intel/intel_tcc.c
> > new file mode 100644
> > index 000000000000..74b434914975
> > --- /dev/null
> > +++ b/drivers/thermal/intel/intel_tcc.c
> > @@ -0,0 +1,131 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * intel_tcc.c - Library for Intel TCC (thermal control circuitry) MSR access
> > + * Copyright (c) 2022, Intel Corporation.
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/intel_tcc.h>
> > +#include <asm/msr.h>
> > +
> > +/**
> > + * intel_tcc_get_tjmax() - returns the default TCC activation Temperature
> > + * @cpu: cpu that the MSR should be run on.
> > + * @tjmax: a valid pointer to where to store the Tjmax value
> > + *
> > + * Get the TjMax value, which is the default thermal throttling or TCC
> > + * activation temperature in degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
>
> This extra empty line is not needed (and not desirable even).  And same below.
>
> > +int intel_tcc_get_tjmax(int cpu, int *tjmax)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
>
> The current trend is to align the arguments after a line break with
> the first one, but I would just put them all on the same line (and
> below too).
>
> > +       if (err)
> > +               return err;
> > +
> > +       *tjmax = (eax >> 16) & 0xff;
>
> This means that the tjmax value cannot be negative.
>
> > +
> > +       return *tjmax ? 0 : -EINVAL;
>
> So the return value of this function could be tjmax if positive or a
> negative error code otherwise.  No return pointers needed.
>
> And why do you want to return -EINVAL (rather than any other error
> code) if tjmax turns out to be 0?
>
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_tjmax, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_get_offset() - returns the TCC Offset value to Tjmax
> > + * @cpu: cpu that the MSR should be run on.
> > + * @offset: a valid pointer to where to store the offset value
> > + *
> > + * Get the TCC offset value to Tjmax. The effective thermal throttling or TCC
> > + * activation temperature equals "Tjmax" - "TCC Offset", in degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
> > +int intel_tcc_get_offset(int cpu, int *offset)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       *offset = (eax >> 24) & 0x3f;
>
> Well, offset cannot be negative here, so (again) the return value of
> this function could be interpreted as the offsent (if non-negative) or
> a negative error code on failure.
>
> > +
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> > + * @cpu: cpu that the MSR should be run on.
> > + * @offset: TCC offset value in degree C
>
> I think that this cannot be negative, so maybe say "in K" instead of
> "in degree C"?
>
> And maybe it's better to pass u8 here?
>
> > + *
> > + * Set the TCC Offset value to Tjmax. The effective thermal throttling or TCC
> > + * activation temperature equals "Tjmax" - "TCC Offset", in degree C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
> > +int intel_tcc_set_offset(int cpu, int offset)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       if (offset > 0x3f)
> > +               return -EINVAL;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       if (eax & BIT(31))
> > +               return -EPERM;
>
> Why -EPERM?
>
> > +
> > +       eax &= ~(0x3f << 24);
> > +       eax |= (offset << 24);
>
> The parens are not needed AFAICS.
>
> > +
> > +       return wrmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, eax, edx);
>
> So is any protection against concurrent access needed here?  Like what
> if two different callers invoke this function at the same time for the
> same CPU?
>
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_get_temp() - returns the current temperature
> > + * @cpu: cpu that the MSR should be run on.
> > + * @pkg: true: Package Thermal Sensor. false: Core Thermal Sensor.
> > + * @temp: a valid pointer to where to store the resulting temperature
> > + *
> > + * Get the current temperature returned by the CPU core/package level
> > + * thermal sensor, in degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> > +{
> > +       u32 eax, edx;
> > +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS : MSR_IA32_THERM_STATUS;
> > +       int tjmax, err;
> > +
> > +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> > +       if (err)
> > +               return err;
>
> Well, what if somebody change tjmax on this cpu while this function is running?

Sorry, scratch this.  The offset can be updated, not tjmax.

But can tjmax change anyway or is it just a property of the hardware?

> > +
> > +       err = rdmsr_safe_on_cpu(cpu, msr, &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       if (eax & 0x80000000) {
> > +               *temp = tjmax - ((eax >> 16) & 0x7f);
> > +               return 0;
> > +       }
> > +       return -EINVAL;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_temp, INTEL_TCC);

And one more thing, but rather for the future.

Reading from these MSRs can be costly, especially when done on a
remote CPU, so would it make sense to cache the retrieved values in
case they are read relatively often?
Zhang, Rui Dec. 8, 2022, 4:49 p.m. UTC | #3
On Thu, 2022-12-08 at 14:49 +0100, Rafael J. Wysocki wrote:
> On Tue, Nov 8, 2022 at 4:31 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > There are several different drivers that accesses the Intel TCC
> > (thermal control circuitry) MSRs, and each of them has its own
> > implementation for the same functionalities, e.g. getting the
> > current
> > temperature, getting the tj_max, and getting/setting the tj_max
> > offset.
> > 
> > Introduce a library to unify the code for Intel CPU TCC MSR access.
> > 
> > At the same time, ensure the temperature is got based on the
> > updated
> > tjmax value because tjmax can be changed at runtime for cases like
> > the Intel SST-PP (Intel Speed Select Technology - Performance
> > Profile)
> > level change.
> > 
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> 
> Nice series, overall I like it a lot, but I have some comments
> regarding this particular patch (below).  Some of them are arguably
> minor, but at least one thing is more serious.

Hi, Rafael,

Thanks for reviewing the patch set.
> 
> > ---
> >  drivers/thermal/intel/Kconfig     |   4 +
> >  drivers/thermal/intel/Makefile    |   1 +
> >  drivers/thermal/intel/intel_tcc.c | 131
> > ++++++++++++++++++++++++++++++
> >  include/linux/intel_tcc.h         |  18 ++++
> >  4 files changed, 154 insertions(+)
> >  create mode 100644 drivers/thermal/intel/intel_tcc.c
> >  create mode 100644 include/linux/intel_tcc.h
> > 
> > diff --git a/drivers/thermal/intel/Kconfig
> > b/drivers/thermal/intel/Kconfig
> > index f0c845679250..6b938c040d6e 100644
> > --- a/drivers/thermal/intel/Kconfig
> > +++ b/drivers/thermal/intel/Kconfig
> > @@ -12,6 +12,10 @@ config X86_THERMAL_VECTOR
> >         def_bool y
> >         depends on X86 && CPU_SUP_INTEL && X86_LOCAL_APIC
> > 
> > +config INTEL_TCC
> > +       bool
> > +       depends on X86
> > +
> >  config X86_PKG_TEMP_THERMAL
> >         tristate "X86 package temperature thermal driver"
> >         depends on X86_THERMAL_VECTOR
> > diff --git a/drivers/thermal/intel/Makefile
> > b/drivers/thermal/intel/Makefile
> > index 9a8d8054f316..5d8833c82ab6 100644
> > --- a/drivers/thermal/intel/Makefile
> > +++ b/drivers/thermal/intel/Makefile
> > @@ -2,6 +2,7 @@
> >  #
> >  # Makefile for various Intel thermal drivers.
> > 
> > +obj-$(CONFIG_INTEL_TCC)        += intel_tcc.o
> >  obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
> >  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)     += x86_pkg_temp_thermal.o
> >  obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)  += intel_soc_dts_iosf.o
> > diff --git a/drivers/thermal/intel/intel_tcc.c
> > b/drivers/thermal/intel/intel_tcc.c
> > new file mode 100644
> > index 000000000000..74b434914975
> > --- /dev/null
> > +++ b/drivers/thermal/intel/intel_tcc.c
> > @@ -0,0 +1,131 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * intel_tcc.c - Library for Intel TCC (thermal control circuitry)
> > MSR access
> > + * Copyright (c) 2022, Intel Corporation.
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/intel_tcc.h>
> > +#include <asm/msr.h>
> > +
> > +/**
> > + * intel_tcc_get_tjmax() - returns the default TCC activation
> > Temperature
> > + * @cpu: cpu that the MSR should be run on.
> > + * @tjmax: a valid pointer to where to store the Tjmax value
> > + *
> > + * Get the TjMax value, which is the default thermal throttling or
> > TCC
> > + * activation temperature in degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
> 
> This extra empty line is not needed (and not desirable even).  And
> same below.

Sure. will remove it.
> 
> > +int intel_tcc_get_tjmax(int cpu, int *tjmax)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
> 
> The current trend is to align the arguments after a line break with
> the first one, but I would just put them all on the same line (and
> below too).

I agree putting them in the same line looks prettier.

> 
> > +       if (err)
> > +               return err;
> > +
> > +       *tjmax = (eax >> 16) & 0xff;
> 
> This means that the tjmax value cannot be negative.
> 
> > +
> > +       return *tjmax ? 0 : -EINVAL;
> 
> So the return value of this function could be tjmax if positive or a
> negative error code otherwise.  No return pointers needed.

I tried both. And I think I chose this solution just because it makes
the following cleanup patches in this series looks prettier.

I will try your suggestion, and if there is any other reason I wrote it
in this way, I will find it out again. :p

> 
> And why do you want to return -EINVAL (rather than any other error
> code) if tjmax turns out to be 0?

Because x86_pkg_temp_thermal driver returns -EINVAL previously.
Any suggestions on this?

> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_tjmax, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_get_offset() - returns the TCC Offset value to Tjmax
> > + * @cpu: cpu that the MSR should be run on.
> > + * @offset: a valid pointer to where to store the offset value
> > + *
> > + * Get the TCC offset value to Tjmax. The effective thermal
> > throttling or TCC
> > + * activation temperature equals "Tjmax" - "TCC Offset", in
> > degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
> > +int intel_tcc_get_offset(int cpu, int *offset)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       *offset = (eax >> 24) & 0x3f;
> 
> Well, offset cannot be negative here, so (again) the return value of
> this function could be interpreted as the offsent (if non-negative)
> or
> a negative error code on failure.
> 
> > +
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> > + * @cpu: cpu that the MSR should be run on.
> > + * @offset: TCC offset value in degree C
> 
> I think that this cannot be negative, so maybe say "in K" instead of
> "in degree C"?

degree C is the unit used in the Intel SDM, so better to keep this
comment aligned.

> 
> And maybe it's better to pass u8 here?

sounds good, will do in next version.

> 
> > + *
> > + * Set the TCC Offset value to Tjmax. The effective thermal
> > throttling or TCC
> > + * activation temperature equals "Tjmax" - "TCC Offset", in degree
> > C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +
> > +int intel_tcc_set_offset(int cpu, int offset)
> > +{
> > +       u32 eax, edx;
> > +       int err;
> > +
> > +       if (offset > 0x3f)
> > +               return -EINVAL;
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > +                                       &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       if (eax & BIT(31))
> > +               return -EPERM;
> 
> Why -EPERM?

Bit 31 set means the MSR is locked, and os software cannot write it.
should I use -EACCES instead?

> 
> > +
> > +       eax &= ~(0x3f << 24);
> > +       eax |= (offset << 24);
> 
> The parens are not needed AFAICS.
> 
Agreed.

> > +
> > +       return wrmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > eax, edx);
> 
> So is any protection against concurrent access needed here?  Like
> what
> if two different callers invoke this function at the same time for
> the
> same CPU?

Given that the tcc offset is the only field that kernel code writes,
all the other bits won't change, so this is not a problem.

> 
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> > +
> > +/**
> > + * intel_tcc_get_temp() - returns the current temperature
> > + * @cpu: cpu that the MSR should be run on.
> > + * @pkg: true: Package Thermal Sensor. false: Core Thermal Sensor.
> > + * @temp: a valid pointer to where to store the resulting
> > temperature
> > + *
> > + * Get the current temperature returned by the CPU core/package
> > level
> > + * thermal sensor, in degrees C.
> > + *
> > + * Return: On success returns 0, an error code otherwise
> > + */
> > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> > +{
> > +       u32 eax, edx;
> > +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS :
> > MSR_IA32_THERM_STATUS;
> > +       int tjmax, err;
> > +
> > +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> > +       if (err)
> > +               return err;
> 
> Well, what if somebody change tjmax on this cpu while this function
> is running?

tjmax is read only. Only firmware can change its value at runtime, say
this field is updated when SST-PP level is changed.

thanks,
rui
> 
> > +
> > +       err = rdmsr_safe_on_cpu(cpu, msr, &eax, &edx);
> > +       if (err)
> > +               return err;
> > +
> > +       if (eax & 0x80000000) {
> > +               *temp = tjmax - ((eax >> 16) & 0x7f);
> > +               return 0;
> > +       }
> > +       return -EINVAL;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_temp, INTEL_TCC);
> > +
> > diff --git a/include/linux/intel_tcc.h b/include/linux/intel_tcc.h
> > new file mode 100644
> > index 000000000000..94f8ceab5dd0
> > --- /dev/null
> > +++ b/include/linux/intel_tcc.h
> > @@ -0,0 +1,18 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + *  header for Intel TCC (thermal control circuitry) library
> > + *
> > + *  Copyright (C) 2022  Intel Corporation.
> > + */
> > +
> > +#ifndef __INTEL_TCC_H__
> > +#define __INTEL_TCC_H__
> > +
> > +#include <linux/types.h>
> > +
> > +int intel_tcc_get_tjmax(int cpu, int *tjmax);
> > +int intel_tcc_get_offset(int cpu, int *offset);
> > +int intel_tcc_set_offset(int cpu, int offset);
> > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp);
> > +
> > +#endif /* __INTEL_TCC_H__ */
> > --
Rafael J. Wysocki Dec. 8, 2022, 5 p.m. UTC | #4
On Thu, Dec 8, 2022 at 5:49 PM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Thu, 2022-12-08 at 14:49 +0100, Rafael J. Wysocki wrote:
> > On Tue, Nov 8, 2022 at 4:31 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > There are several different drivers that accesses the Intel TCC
> > > (thermal control circuitry) MSRs, and each of them has its own
> > > implementation for the same functionalities, e.g. getting the
> > > current
> > > temperature, getting the tj_max, and getting/setting the tj_max
> > > offset.
> > >
> > > Introduce a library to unify the code for Intel CPU TCC MSR access.
> > >
> > > At the same time, ensure the temperature is got based on the
> > > updated
> > > tjmax value because tjmax can be changed at runtime for cases like
> > > the Intel SST-PP (Intel Speed Select Technology - Performance
> > > Profile)
> > > level change.
> > >
> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >
> > Nice series, overall I like it a lot, but I have some comments
> > regarding this particular patch (below).  Some of them are arguably
> > minor, but at least one thing is more serious.
>
> Hi, Rafael,
>
> Thanks for reviewing the patch set.
> >
> > > ---
> > >  drivers/thermal/intel/Kconfig     |   4 +
> > >  drivers/thermal/intel/Makefile    |   1 +
> > >  drivers/thermal/intel/intel_tcc.c | 131
> > > ++++++++++++++++++++++++++++++
> > >  include/linux/intel_tcc.h         |  18 ++++
> > >  4 files changed, 154 insertions(+)
> > >  create mode 100644 drivers/thermal/intel/intel_tcc.c
> > >  create mode 100644 include/linux/intel_tcc.h
> > >
> > > diff --git a/drivers/thermal/intel/Kconfig
> > > b/drivers/thermal/intel/Kconfig
> > > index f0c845679250..6b938c040d6e 100644
> > > --- a/drivers/thermal/intel/Kconfig
> > > +++ b/drivers/thermal/intel/Kconfig
> > > @@ -12,6 +12,10 @@ config X86_THERMAL_VECTOR
> > >         def_bool y
> > >         depends on X86 && CPU_SUP_INTEL && X86_LOCAL_APIC
> > >
> > > +config INTEL_TCC
> > > +       bool
> > > +       depends on X86
> > > +
> > >  config X86_PKG_TEMP_THERMAL
> > >         tristate "X86 package temperature thermal driver"
> > >         depends on X86_THERMAL_VECTOR
> > > diff --git a/drivers/thermal/intel/Makefile
> > > b/drivers/thermal/intel/Makefile
> > > index 9a8d8054f316..5d8833c82ab6 100644
> > > --- a/drivers/thermal/intel/Makefile
> > > +++ b/drivers/thermal/intel/Makefile
> > > @@ -2,6 +2,7 @@
> > >  #
> > >  # Makefile for various Intel thermal drivers.
> > >
> > > +obj-$(CONFIG_INTEL_TCC)        += intel_tcc.o
> > >  obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
> > >  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)     += x86_pkg_temp_thermal.o
> > >  obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)  += intel_soc_dts_iosf.o
> > > diff --git a/drivers/thermal/intel/intel_tcc.c
> > > b/drivers/thermal/intel/intel_tcc.c
> > > new file mode 100644
> > > index 000000000000..74b434914975
> > > --- /dev/null
> > > +++ b/drivers/thermal/intel/intel_tcc.c
> > > @@ -0,0 +1,131 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * intel_tcc.c - Library for Intel TCC (thermal control circuitry)
> > > MSR access
> > > + * Copyright (c) 2022, Intel Corporation.
> > > + */
> > > +
> > > +#include <linux/errno.h>
> > > +#include <linux/intel_tcc.h>
> > > +#include <asm/msr.h>
> > > +
> > > +/**
> > > + * intel_tcc_get_tjmax() - returns the default TCC activation
> > > Temperature
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @tjmax: a valid pointer to where to store the Tjmax value
> > > + *
> > > + * Get the TjMax value, which is the default thermal throttling or
> > > TCC
> > > + * activation temperature in degrees C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +
> >
> > This extra empty line is not needed (and not desirable even).  And
> > same below.
>
> Sure. will remove it.
> >
> > > +int intel_tcc_get_tjmax(int cpu, int *tjmax)
> > > +{
> > > +       u32 eax, edx;
> > > +       int err;
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > > +                                       &eax, &edx);
> >
> > The current trend is to align the arguments after a line break with
> > the first one, but I would just put them all on the same line (and
> > below too).
>
> I agree putting them in the same line looks prettier.
>
> >
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       *tjmax = (eax >> 16) & 0xff;
> >
> > This means that the tjmax value cannot be negative.
> >
> > > +
> > > +       return *tjmax ? 0 : -EINVAL;
> >
> > So the return value of this function could be tjmax if positive or a
> > negative error code otherwise.  No return pointers needed.
>
> I tried both. And I think I chose this solution just because it makes
> the following cleanup patches in this series looks prettier.
>
> I will try your suggestion, and if there is any other reason I wrote it
> in this way, I will find it out again. :p
>
> >
> > And why do you want to return -EINVAL (rather than any other error
> > code) if tjmax turns out to be 0?
>
> Because x86_pkg_temp_thermal driver returns -EINVAL previously.
> Any suggestions on this?

I would use -ENODATA.

> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_tjmax, INTEL_TCC);
> > > +
> > > +/**
> > > + * intel_tcc_get_offset() - returns the TCC Offset value to Tjmax
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @offset: a valid pointer to where to store the offset value
> > > + *
> > > + * Get the TCC offset value to Tjmax. The effective thermal
> > > throttling or TCC
> > > + * activation temperature equals "Tjmax" - "TCC Offset", in
> > > degrees C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +
> > > +int intel_tcc_get_offset(int cpu, int *offset)
> > > +{
> > > +       u32 eax, edx;
> > > +       int err;
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > > +                                       &eax, &edx);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       *offset = (eax >> 24) & 0x3f;
> >
> > Well, offset cannot be negative here, so (again) the return value of
> > this function could be interpreted as the offsent (if non-negative)
> > or
> > a negative error code on failure.
> >
> > > +
> > > +       return 0;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > > +
> > > +/**
> > > + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @offset: TCC offset value in degree C
> >
> > I think that this cannot be negative, so maybe say "in K" instead of
> > "in degree C"?
>
> degree C is the unit used in the Intel SDM, so better to keep this
> comment aligned.

OK

> >
> > And maybe it's better to pass u8 here?
>
> sounds good, will do in next version.
>
> >
> > > + *
> > > + * Set the TCC Offset value to Tjmax. The effective thermal
> > > throttling or TCC
> > > + * activation temperature equals "Tjmax" - "TCC Offset", in degree
> > > C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +
> > > +int intel_tcc_set_offset(int cpu, int offset)
> > > +{
> > > +       u32 eax, edx;
> > > +       int err;
> > > +
> > > +       if (offset > 0x3f)
> > > +               return -EINVAL;
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > > +                                       &eax, &edx);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (eax & BIT(31))
> > > +               return -EPERM;
> >
> > Why -EPERM?
>
> Bit 31 set means the MSR is locked, and os software cannot write it.
> should I use -EACCES instead?

No, -EPERM is fine in this case, but a short comment would be useful.

> >
> > > +
> > > +       eax &= ~(0x3f << 24);
> > > +       eax |= (offset << 24);
> >
> > The parens are not needed AFAICS.
> >
> Agreed.
>
> > > +
> > > +       return wrmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > > eax, edx);
> >
> > So is any protection against concurrent access needed here?  Like
> > what
> > if two different callers invoke this function at the same time for
> > the
> > same CPU?
>
> Given that the tcc offset is the only field that kernel code writes,
> all the other bits won't change, so this is not a problem.

OK

> >
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> > > +
> > > +/**
> > > + * intel_tcc_get_temp() - returns the current temperature
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @pkg: true: Package Thermal Sensor. false: Core Thermal Sensor.
> > > + * @temp: a valid pointer to where to store the resulting
> > > temperature
> > > + *
> > > + * Get the current temperature returned by the CPU core/package
> > > level
> > > + * thermal sensor, in degrees C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> > > +{
> > > +       u32 eax, edx;
> > > +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS :
> > > MSR_IA32_THERM_STATUS;
> > > +       int tjmax, err;
> > > +
> > > +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> > > +       if (err)
> > > +               return err;
> >
> > Well, what if somebody change tjmax on this cpu while this function
> > is running?
>
> tjmax is read only. Only firmware can change its value at runtime, say
> this field is updated when SST-PP level is changed.

Do we get any type of notification on tjmax changes, or do we need to
poll for it?
Zhang, Rui Dec. 9, 2022, 1:32 p.m. UTC | #5
> > > 
> > > > +
> > > > +       return *tjmax ? 0 : -EINVAL;
> > > 
> > > So the return value of this function could be tjmax if positive
> > > or a
> > > negative error code otherwise.  No return pointers needed.
> > 
> > I tried both. And I think I chose this solution just because it
> > makes
> > the following cleanup patches in this series looks prettier.
> > 
> > I will try your suggestion, and if there is any other reason I
> > wrote it
> > in this way, I will find it out again. :p
> > 
> > > And why do you want to return -EINVAL (rather than any other
> > > error
> > > code) if tjmax turns out to be 0?
> > 
> > Because x86_pkg_temp_thermal driver returns -EINVAL previously.
> > Any suggestions on this?
> 
> I would use -ENODATA.

Sure, will do.

> > > > +}
> > > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> > > > +
> > > > +/**
> > > > + * intel_tcc_get_temp() - returns the current temperature
> > > > + * @cpu: cpu that the MSR should be run on.
> > > > + * @pkg: true: Package Thermal Sensor. false: Core Thermal
> > > > Sensor.
> > > > + * @temp: a valid pointer to where to store the resulting
> > > > temperature
> > > > + *
> > > > + * Get the current temperature returned by the CPU
> > > > core/package
> > > > level
> > > > + * thermal sensor, in degrees C.
> > > > + *
> > > > + * Return: On success returns 0, an error code otherwise
> > > > + */
> > > > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> > > > +{
> > > > +       u32 eax, edx;
> > > > +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS :
> > > > MSR_IA32_THERM_STATUS;
> > > > +       int tjmax, err;
> > > > +
> > > > +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> > > > +       if (err)
> > > > +               return err;
> > > 
> > > Well, what if somebody change tjmax on this cpu while this
> > > function
> > > is running?
> > 
> > tjmax is read only. Only firmware can change its value at runtime,
> > say
> > this field is updated when SST-PP level is changed.
> 
> Do we get any type of notification on tjmax changes, or do we need to
> poll for it?

I'm not aware of such a notification.
Srinivas, do you know if there is one?

thanks,
rui
Zhang, Rui Dec. 9, 2022, 1:57 p.m. UTC | #6
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, msr, &eax, &edx);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (eax & 0x80000000) {
> > > +               *temp = tjmax - ((eax >> 16) & 0x7f);
> > > +               return 0;
> > > +       }
> > > +       return -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_temp, INTEL_TCC);
> 
> And one more thing, but rather for the future.
> 
> Reading from these MSRs can be costly, especially when done on a
> remote CPU, so would it make sense to cache the retrieved values in
> case they are read relatively often?like the coretemp driver already
> has similar code.

For now, this API is called from sysfs attribute callback so it is okay
to add a unified rate_limit here, say 1 second like coretemp driver.

But for the future, I'm not sure. This really depends on the
expectation of the caller.

BTW, this reminds me that I can improve the code a little bit, to avoid
reading MSR from remote CPU when possible.

thanks,
rui
Srinivas Pandruvada Dec. 9, 2022, 4:28 p.m. UTC | #7
On Fri, 2022-12-09 at 21:32 +0800, Zhang Rui wrote:
> > > 
> > > > 

[...]

> > > > Well, what if somebody change tjmax on this cpu while this
> > > > function
> > > > is running?
> > > 
> > > tjmax is read only. Only firmware can change its value at
> > > runtime,
> > > say
> > > this field is updated when SST-PP level is changed.
> > 
> > Do we get any type of notification on tjmax changes, or do we need
> > to
> > poll for it?
> 
> I'm not aware of such a notification.
> Srinivas, do you know if there is one?
There is no explicit notification. You can infer from HWP guaranteed
change interrupt.

Thanks,
Srinivas

> 
> thanks,
> rui
> 
>
Zhang, Rui Dec. 11, 2022, 7:23 a.m. UTC | #8
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       *tjmax = (eax >> 16) & 0xff;
> > 
> > This means that the tjmax value cannot be negative.
> > 
> > > +
> > > +       return *tjmax ? 0 : -EINVAL;
> > 
> > So the return value of this function could be tjmax if positive or
> > a
> > negative error code otherwise.  No return pointers needed.
> 
> I tried both. And I think I chose this solution just because it makes
> the following cleanup patches in this series looks prettier.
> 
> I will try your suggestion, and if there is any other reason I wrote
> it
> in this way, I will find it out again. :p

I see.
Say, we have to use

intel_tcc_get_temp(int cpu, int *temp)

so I keep the same format for all the intel_tcc_get_XXX APIs

intel_tcc_get_tjmax(int cpu, int *tjmax)
intel_tcc_get_offset(int cpu, int *offset)

so that the return value is decoded in the same way. This helps avoid
return value check mistakes for the callers.

surely I can remove the return pointer if you still prefer that. :p

> > > +
> > > +       return 0;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > > +
> > > +/**
> > > + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @offset: TCC offset value in degree C
> > 
> > I think that this cannot be negative, so maybe say "in K" instead
> > of
> > "in degree C"?
> 
> degree C is the unit used in the Intel SDM, so better to keep this
> comment aligned.
> 
> > And maybe it's better to pass u8 here?
> 
> sounds good, will do in next version.
> 
to keep consistent, we can use either
int intel_tcc_get_offset(int cpu)
int intel_tcc_set_offset(int cpu, int offset)
or
int intel_tcc_get_offset(int cpu, u8 *offset)
int intel_tcc_set_offset(int cpu, u8 offset)

or else callers need to declare 'offset' but with different type, when
getting and setting it, which looks strange.

thanks,
rui
> > > + *
> > > + * Set the TCC Offset value to Tjmax. The effective thermal
> > > throttling or TCC
> > > + * activation temperature equals "Tjmax" - "TCC Offset", in
> > > degree
> > > C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +
> > > +int intel_tcc_set_offset(int cpu, int offset)
> > > +{
> > > +       u32 eax, edx;
> > > +       int err;
> > > +
> > > +       if (offset > 0x3f)
> > > +               return -EINVAL;
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET,
> > > +                                       &eax, &edx);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (eax & BIT(31))
> > > +               return -EPERM;
> > 
> > Why -EPERM?
> 
> Bit 31 set means the MSR is locked, and os software cannot write it.
> should I use -EACCES instead?
> 
> > > +
> > > +       eax &= ~(0x3f << 24);
> > > +       eax |= (offset << 24);
> > 
> > The parens are not needed AFAICS.
> > 
> Agreed.
> 
> > > +
> > > +       return wrmsr_safe_on_cpu(cpu,
> > > MSR_IA32_TEMPERATURE_TARGET,
> > > eax, edx);
> > 
> > So is any protection against concurrent access needed here?  Like
> > what
> > if two different callers invoke this function at the same time for
> > the
> > same CPU?
> 
> Given that the tcc offset is the only field that kernel code writes,
> all the other bits won't change, so this is not a problem.
> 
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
> > > +
> > > +/**
> > > + * intel_tcc_get_temp() - returns the current temperature
> > > + * @cpu: cpu that the MSR should be run on.
> > > + * @pkg: true: Package Thermal Sensor. false: Core Thermal
> > > Sensor.
> > > + * @temp: a valid pointer to where to store the resulting
> > > temperature
> > > + *
> > > + * Get the current temperature returned by the CPU core/package
> > > level
> > > + * thermal sensor, in degrees C.
> > > + *
> > > + * Return: On success returns 0, an error code otherwise
> > > + */
> > > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp)
> > > +{
> > > +       u32 eax, edx;
> > > +       u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS :
> > > MSR_IA32_THERM_STATUS;
> > > +       int tjmax, err;
> > > +
> > > +       err = intel_tcc_get_tjmax(cpu, &tjmax);
> > > +       if (err)
> > > +               return err;
> > 
> > Well, what if somebody change tjmax on this cpu while this function
> > is running?
> 
> tjmax is read only. Only firmware can change its value at runtime,
> say
> this field is updated when SST-PP level is changed.
> 
> thanks,
> rui
> > > +
> > > +       err = rdmsr_safe_on_cpu(cpu, msr, &eax, &edx);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (eax & 0x80000000) {
> > > +               *temp = tjmax - ((eax >> 16) & 0x7f);
> > > +               return 0;
> > > +       }
> > > +       return -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_temp, INTEL_TCC);
> > > +
> > > diff --git a/include/linux/intel_tcc.h
> > > b/include/linux/intel_tcc.h
> > > new file mode 100644
> > > index 000000000000..94f8ceab5dd0
> > > --- /dev/null
> > > +++ b/include/linux/intel_tcc.h
> > > @@ -0,0 +1,18 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + *  header for Intel TCC (thermal control circuitry) library
> > > + *
> > > + *  Copyright (C) 2022  Intel Corporation.
> > > + */
> > > +
> > > +#ifndef __INTEL_TCC_H__
> > > +#define __INTEL_TCC_H__
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +int intel_tcc_get_tjmax(int cpu, int *tjmax);
> > > +int intel_tcc_get_offset(int cpu, int *offset);
> > > +int intel_tcc_set_offset(int cpu, int offset);
> > > +int intel_tcc_get_temp(int cpu, bool pkg, int *temp);
> > > +
> > > +#endif /* __INTEL_TCC_H__ */
> > > --
Zhang, Rui Dec. 11, 2022, 7:50 a.m. UTC | #9
On Fri, 2022-12-09 at 08:28 -0800, srinivas pandruvada wrote:
> On Fri, 2022-12-09 at 21:32 +0800, Zhang Rui wrote:
> 
> [...]
> 
> > > > > Well, what if somebody change tjmax on this cpu while this
> > > > > function
> > > > > is running?
> > > > 
> > > > tjmax is read only. Only firmware can change its value at
> > > > runtime,
> > > > say
> > > > this field is updated when SST-PP level is changed.
> > > 
> > > Do we get any type of notification on tjmax changes, or do we
> > > need
> > > to
> > > poll for it?
> > 
> > I'm not aware of such a notification.
> > Srinivas, do you know if there is one?
> There is no explicit notification. You can infer from HWP guaranteed
> change interrupt.

But
1. tjmax may or may not change when HWP guaranteed change interrupt
fires. And it does not change in most cases, right?
2. HWP guaranteed change interrupt is per CPU, so if we update tjmax in
the interrupt handler, we also need to cache the timestamp to avoid
duplicate updates from multiple CPUs in the same package/die.

Given that the TCC lib APIs may or may not be called, depends on the
usersapce behavior, do you think this is a win by adding this extra
cost?

Instead, update the tjmax value only when the API is called, and then
cache it to handle frequent read is sufficient. what do you think?

thanks,
rui

> 
> Thanks,
> Srinivas
> 
> > thanks,
> > rui
> > 
> > 
> 
>
Rafael J. Wysocki Dec. 12, 2022, 12:11 p.m. UTC | #10
On Sun, Dec 11, 2022 at 8:24 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
>
> > > > +       if (err)
> > > > +               return err;
> > > > +
> > > > +       *tjmax = (eax >> 16) & 0xff;
> > >
> > > This means that the tjmax value cannot be negative.
> > >
> > > > +
> > > > +       return *tjmax ? 0 : -EINVAL;
> > >
> > > So the return value of this function could be tjmax if positive or
> > > a
> > > negative error code otherwise.  No return pointers needed.
> >
> > I tried both. And I think I chose this solution just because it makes
> > the following cleanup patches in this series looks prettier.
> >
> > I will try your suggestion, and if there is any other reason I wrote
> > it
> > in this way, I will find it out again. :p
>
> I see.
> Say, we have to use
>
> intel_tcc_get_temp(int cpu, int *temp)

Do we?

> so I keep the same format for all the intel_tcc_get_XXX APIs
>
> intel_tcc_get_tjmax(int cpu, int *tjmax)
> intel_tcc_get_offset(int cpu, int *offset)
>
> so that the return value is decoded in the same way. This helps avoid
> return value check mistakes for the callers.
>
> surely I can remove the return pointer if you still prefer that. :p

Using a function value directly is very much preferred unless there
really are two values to return.

In these particular cases a negative return value can be clearly
interpreted as an error condition, so using the return value directly
is sufficient.

> > > > +
> > > > +       return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > > > +
> > > > +/**
> > > > + * intel_tcc_set_offset() - set the TCC offset value to Tjmax
> > > > + * @cpu: cpu that the MSR should be run on.
> > > > + * @offset: TCC offset value in degree C
> > >
> > > I think that this cannot be negative, so maybe say "in K" instead
> > > of
> > > "in degree C"?
> >
> > degree C is the unit used in the Intel SDM, so better to keep this
> > comment aligned.
> >
> > > And maybe it's better to pass u8 here?
> >
> > sounds good, will do in next version.
> >
> to keep consistent, we can use either
> int intel_tcc_get_offset(int cpu)
> int intel_tcc_set_offset(int cpu, int offset)

What about intel_tcc_set_offset(int cpu, u8 offset)?

> or
> int intel_tcc_get_offset(int cpu, u8 *offset)
> int intel_tcc_set_offset(int cpu, u8 offset)
>
> or else callers need to declare 'offset' but with different type, when
> getting and setting it, which looks strange.

Well, one can surely pass an int as a u8 argument and assign a u8
return value to an int variable.
Rafael J. Wysocki Dec. 12, 2022, 12:15 p.m. UTC | #11
On Sun, Dec 11, 2022 at 8:50 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2022-12-09 at 08:28 -0800, srinivas pandruvada wrote:
> > On Fri, 2022-12-09 at 21:32 +0800, Zhang Rui wrote:
> >
> > [...]
> >
> > > > > > Well, what if somebody change tjmax on this cpu while this
> > > > > > function
> > > > > > is running?
> > > > >
> > > > > tjmax is read only. Only firmware can change its value at
> > > > > runtime,
> > > > > say
> > > > > this field is updated when SST-PP level is changed.
> > > >
> > > > Do we get any type of notification on tjmax changes, or do we
> > > > need
> > > > to
> > > > poll for it?
> > >
> > > I'm not aware of such a notification.
> > > Srinivas, do you know if there is one?
> > There is no explicit notification. You can infer from HWP guaranteed
> > change interrupt.
>
> But
> 1. tjmax may or may not change when HWP guaranteed change interrupt
> fires. And it does not change in most cases, right?
> 2. HWP guaranteed change interrupt is per CPU, so if we update tjmax in
> the interrupt handler, we also need to cache the timestamp to avoid
> duplicate updates from multiple CPUs in the same package/die.
>
> Given that the TCC lib APIs may or may not be called, depends on the
> usersapce behavior, do you think this is a win by adding this extra
> cost?

Not really.

> Instead, update the tjmax value only when the API is called, and then
> cache it to handle frequent read is sufficient. what do you think?

Yes, that should be sufficient, but I wouldn't do it in this series anyway.
Zhang, Rui Dec. 13, 2022, 1:38 a.m. UTC | #12
On Mon, 2022-12-12 at 13:11 +0100, Rafael J. Wysocki wrote:
> On Sun, Dec 11, 2022 at 8:24 AM Zhang Rui <rui.zhang@intel.com>
> wrote:
> > 
> > > > > +       if (err)
> > > > > +               return err;
> > > > > +
> > > > > +       *tjmax = (eax >> 16) & 0xff;
> > > > 
> > > > This means that the tjmax value cannot be negative.
> > > > 
> > > > > +
> > > > > +       return *tjmax ? 0 : -EINVAL;
> > > > 
> > > > So the return value of this function could be tjmax if positive
> > > > or
> > > > a
> > > > negative error code otherwise.  No return pointers needed.
> > > 
> > > I tried both. And I think I chose this solution just because it
> > > makes
> > > the following cleanup patches in this series looks prettier.
> > > 
> > > I will try your suggestion, and if there is any other reason I
> > > wrote
> > > it
> > > in this way, I will find it out again. :p
> > 
> > I see.
> > Say, we have to use
> > 
> > intel_tcc_get_temp(int cpu, int *temp)
> 
> Do we?

temp = tjmax - digital_readout

tjmax and digital_readout are from MSR and they are both positive
values, but temp can be negative in theory.
so are you suggesting that we should treat the negative CPU temperature
as an error because this won't happen in real world?

thanks,
rui

> 
> > so I keep the same format for all the intel_tcc_get_XXX APIs
> > 
> > intel_tcc_get_tjmax(int cpu, int *tjmax)
> > intel_tcc_get_offset(int cpu, int *offset)
> > 
> > so that the return value is decoded in the same way. This helps
> > avoid
> > return value check mistakes for the callers.
> > 
> > surely I can remove the return pointer if you still prefer that. :p
> 
> Using a function value directly is very much preferred unless there
> really are two values to return.
> 
> In these particular cases a negative return value can be clearly
> interpreted as an error condition, so using the return value directly
> is sufficient.
> 
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);
> > > > > +
> > > > > +/**
> > > > > + * intel_tcc_set_offset() - set the TCC offset value to
> > > > > Tjmax
> > > > > + * @cpu: cpu that the MSR should be run on.
> > > > > + * @offset: TCC offset value in degree C
> > > > 
> > > > I think that this cannot be negative, so maybe say "in K"
> > > > instead
> > > > of
> > > > "in degree C"?
> > > 
> > > degree C is the unit used in the Intel SDM, so better to keep
> > > this
> > > comment aligned.
> > > 
> > > > And maybe it's better to pass u8 here?
> > > 
> > > sounds good, will do in next version.
> > > 
> > to keep consistent, we can use either
> > int intel_tcc_get_offset(int cpu)
> > int intel_tcc_set_offset(int cpu, int offset)
> 
> What about intel_tcc_set_offset(int cpu, u8 offset)?
> 
> > or
> > int intel_tcc_get_offset(int cpu, u8 *offset)
> > int intel_tcc_set_offset(int cpu, u8 offset)
> > 
> > or else callers need to declare 'offset' but with different type,
> > when
> > getting and setting it, which looks strange.
> 
> Well, one can surely pass an int as a u8 argument and assign a u8
> return value to an int variable.
Rafael J. Wysocki Dec. 13, 2022, 3:34 p.m. UTC | #13
On Tue, Dec 13, 2022 at 2:38 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Mon, 2022-12-12 at 13:11 +0100, Rafael J. Wysocki wrote:
> > On Sun, Dec 11, 2022 at 8:24 AM Zhang Rui <rui.zhang@intel.com>
> > wrote:
> > >
> > > > > > +       if (err)
> > > > > > +               return err;
> > > > > > +
> > > > > > +       *tjmax = (eax >> 16) & 0xff;
> > > > >
> > > > > This means that the tjmax value cannot be negative.
> > > > >
> > > > > > +
> > > > > > +       return *tjmax ? 0 : -EINVAL;
> > > > >
> > > > > So the return value of this function could be tjmax if positive
> > > > > or
> > > > > a
> > > > > negative error code otherwise.  No return pointers needed.
> > > >
> > > > I tried both. And I think I chose this solution just because it
> > > > makes
> > > > the following cleanup patches in this series looks prettier.
> > > >
> > > > I will try your suggestion, and if there is any other reason I
> > > > wrote
> > > > it
> > > > in this way, I will find it out again. :p
> > >
> > > I see.
> > > Say, we have to use
> > >
> > > intel_tcc_get_temp(int cpu, int *temp)
> >
> > Do we?
>
> temp = tjmax - digital_readout
>
> tjmax and digital_readout are from MSR and they are both positive
> values, but temp can be negative in theory.

I know, but is this ever expected to happen?

> so are you suggesting that we should treat the negative CPU temperature
> as an error because this won't happen in real world?

No.

If it's necessary to handle negative temperatures, an int is needed
and the additional return value is useful.

Now there is also the question of what the callers will do with a
negative temperature value.  Do they care?
Zhang, Rui Dec. 14, 2022, 4:15 p.m. UTC | #14
On Tue, 2022-12-13 at 16:34 +0100, Rafael J. Wysocki wrote:
> On Tue, Dec 13, 2022 at 2:38 AM Zhang Rui <rui.zhang@intel.com>
> wrote:
> > On Mon, 2022-12-12 at 13:11 +0100, Rafael J. Wysocki wrote:
> > > On Sun, Dec 11, 2022 at 8:24 AM Zhang Rui <rui.zhang@intel.com>
> > > wrote:
> > > > > > > +       if (err)
> > > > > > > +               return err;
> > > > > > > +
> > > > > > > +       *tjmax = (eax >> 16) & 0xff;
> > > > > > 
> > > > > > This means that the tjmax value cannot be negative.
> > > > > > 
> > > > > > > +
> > > > > > > +       return *tjmax ? 0 : -EINVAL;
> > > > > > 
> > > > > > So the return value of this function could be tjmax if
> > > > > > positive
> > > > > > or
> > > > > > a
> > > > > > negative error code otherwise.  No return pointers needed.
> > > > > 
> > > > > I tried both. And I think I chose this solution just because
> > > > > it
> > > > > makes
> > > > > the following cleanup patches in this series looks prettier.
> > > > > 
> > > > > I will try your suggestion, and if there is any other reason
> > > > > I
> > > > > wrote
> > > > > it
> > > > > in this way, I will find it out again. :p
> > > > 
> > > > I see.
> > > > Say, we have to use
> > > > 
> > > > intel_tcc_get_temp(int cpu, int *temp)
> > > 
> > > Do we?
> > 
> > temp = tjmax - digital_readout
> > 
> > tjmax and digital_readout are from MSR and they are both positive
> > values, but temp can be negative in theory.
> 
> I know, but is this ever expected to happen?
> 
> > so are you suggesting that we should treat the negative CPU
> > temperature
> > as an error because this won't happen in real world?
> 
> No.
> 
> If it's necessary to handle negative temperatures, an int is needed
> and the additional return value is useful.
> 
> Now there is also the question of what the callers will do with a
> negative temperature value.  Do they care?

Currently, all the callers are from sysfs attribute.
for userspace tools, at least thermald ignores negative CPU
temperature.
So I will return an error for negative temperature in next version.

thanks,
rui
Rafael J. Wysocki Dec. 14, 2022, 4:17 p.m. UTC | #15
On Wed, Dec 14, 2022 at 5:15 PM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Tue, 2022-12-13 at 16:34 +0100, Rafael J. Wysocki wrote:
> > On Tue, Dec 13, 2022 at 2:38 AM Zhang Rui <rui.zhang@intel.com>
> > wrote:
> > > On Mon, 2022-12-12 at 13:11 +0100, Rafael J. Wysocki wrote:
> > > > On Sun, Dec 11, 2022 at 8:24 AM Zhang Rui <rui.zhang@intel.com>
> > > > wrote:
> > > > > > > > +       if (err)
> > > > > > > > +               return err;
> > > > > > > > +
> > > > > > > > +       *tjmax = (eax >> 16) & 0xff;
> > > > > > >
> > > > > > > This means that the tjmax value cannot be negative.
> > > > > > >
> > > > > > > > +
> > > > > > > > +       return *tjmax ? 0 : -EINVAL;
> > > > > > >
> > > > > > > So the return value of this function could be tjmax if
> > > > > > > positive
> > > > > > > or
> > > > > > > a
> > > > > > > negative error code otherwise.  No return pointers needed.
> > > > > >
> > > > > > I tried both. And I think I chose this solution just because
> > > > > > it
> > > > > > makes
> > > > > > the following cleanup patches in this series looks prettier.
> > > > > >
> > > > > > I will try your suggestion, and if there is any other reason
> > > > > > I
> > > > > > wrote
> > > > > > it
> > > > > > in this way, I will find it out again. :p
> > > > >
> > > > > I see.
> > > > > Say, we have to use
> > > > >
> > > > > intel_tcc_get_temp(int cpu, int *temp)
> > > >
> > > > Do we?
> > >
> > > temp = tjmax - digital_readout
> > >
> > > tjmax and digital_readout are from MSR and they are both positive
> > > values, but temp can be negative in theory.
> >
> > I know, but is this ever expected to happen?
> >
> > > so are you suggesting that we should treat the negative CPU
> > > temperature
> > > as an error because this won't happen in real world?
> >
> > No.
> >
> > If it's necessary to handle negative temperatures, an int is needed
> > and the additional return value is useful.
> >
> > Now there is also the question of what the callers will do with a
> > negative temperature value.  Do they care?
>
> Currently, all the callers are from sysfs attribute.
> for userspace tools, at least thermald ignores negative CPU
> temperature.
> So I will return an error for negative temperature in next version.

Sounds good.