mbox series

[00/10] thermal: tegra: Do not register cooling device

Message ID 20230414125721.1043589-1-thierry.reding@gmail.com
Headers show
Series thermal: tegra: Do not register cooling device | expand

Message

Thierry Reding April 14, 2023, 12:57 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

Hi,

this set of patches removes the registration of the SOCTHERM internal
throttling mechanism as cooling device. Since this throttling starts
automatically once a certain temperature threshold is crossed, it
doesn't make sense to represent it as a cooling device, which are
typically "manually" activated by the thermal framework when thermal
sensors report temperature thresholds being crossed.

Instead of using the cooling device mechanism, this statically programs
the throttling mechanism when it is configured in device tree. In order
to do this, an additional device tree property is needed to replace the
information that was previously contained in trip points.

There's a few preparatory patches to make the removal a bit simpler and
also some follow up cleanups included as well.

Thierry

Thierry Reding (10):
  dt-bindings: thermal: tegra: Document throttle temperature
  thermal: tegra: Use driver-private data consistently
  thermal: tegra: Constify SoC-specific data
  thermal: tegra: Do not register cooling device
  thermal: tegra: Use unsigned int where appropriate
  thermal: tegra: Avoid over-allocation of temporary array
  thermal: tegra: Remove gratuitous error assignment
  thermal: tegra: Minor stylistic cleanups
  arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210
  ARM: tegra: Rework SOCTHERM on Tegra124

 .../thermal/nvidia,tegra124-soctherm.yaml     |   7 +
 arch/arm/boot/dts/tegra124.dtsi               |  65 +--
 arch/arm64/boot/dts/nvidia/tegra132.dtsi      |  63 +--
 arch/arm64/boot/dts/nvidia/tegra210.dtsi      |  83 +---
 drivers/thermal/tegra/soctherm.c              | 392 ++++++------------
 drivers/thermal/tegra/soctherm.h              |   1 +
 drivers/thermal/tegra/tegra124-soctherm.c     |   4 +
 drivers/thermal/tegra/tegra132-soctherm.c     |   4 +
 drivers/thermal/tegra/tegra210-soctherm.c     |   4 +
 9 files changed, 208 insertions(+), 415 deletions(-)

Comments

Daniel Lezcano April 17, 2023, 8:15 a.m. UTC | #1
On 14/04/2023 14:57, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> The "heavy throttle" cooling device that SOCTHERM uses isn't a cooling
> device in the traditional sense. It's an automatic mechanism that cannot
> be actively controlled. Do not expose it as a cooling device and instead
> of tying it to a specific trip point, hard-code the temperature at which
> the automatic throttling will begin.
> 
> While at it, clean up the trip point names to reflect the names used by
> the thermal device tree bindings.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>   arch/arm64/boot/dts/nvidia/tegra132.dtsi | 63 +++++-------------
>   arch/arm64/boot/dts/nvidia/tegra210.dtsi | 83 +++++++-----------------
>   2 files changed, 39 insertions(+), 107 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/nvidia/tegra132.dtsi b/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> index 8b78be8f4f9d..11ebf7517df1 100644
> --- a/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> +++ b/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> @@ -876,11 +876,10 @@ soctherm: thermal-sensor@700e2000 {
>   		#thermal-sensor-cells = <1>;
>   
>   		throttle-cfgs {
> -			throttle_heavy: heavy {
> +			heavy {
>   				nvidia,priority = <100>;
>   				nvidia,cpu-throt-level = <TEGRA_SOCTHERM_THROT_LEVEL_HIGH>;
> -
> -				#cooling-cells = <2>;
> +				temperature = <102000>;
>   			};
>   		};
>   	};
> @@ -1136,114 +1135,84 @@ cpu-thermal {
>   			polling-delay-passive = <1000>;
>   			polling-delay = <0>;
>   
> -			thermal-sensors =
> -				<&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>;
> +			thermal-sensors = <&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>;
>   
>   			trips {
> -				cpu_shutdown_trip {
> +				critical {
>   					temperature = <105000>;
>   					hysteresis = <1000>;
>   					type = "critical";
>   				};
>   
> -				cpu_throttle_trip: throttle-trip {
> +				hot {
>   					temperature = <102000>;
>   					hysteresis = <1000>;
>   					type = "hot";
>   				};
>   			};
> -
> -			cooling-maps {
> -				map0 {
> -					trip = <&cpu_throttle_trip>;
> -					cooling-device = <&throttle_heavy 1 1>;
> -				};
> -			};

If the hardware mitigation is 'heavy', don't you want to have DVFS 
acting before hardware throttling ?

[ ... ]
Thierry Reding April 17, 2023, 9:06 a.m. UTC | #2
On Mon, Apr 17, 2023 at 10:15:11AM +0200, Daniel Lezcano wrote:
> On 14/04/2023 14:57, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > The "heavy throttle" cooling device that SOCTHERM uses isn't a cooling
> > device in the traditional sense. It's an automatic mechanism that cannot
> > be actively controlled. Do not expose it as a cooling device and instead
> > of tying it to a specific trip point, hard-code the temperature at which
> > the automatic throttling will begin.
> > 
> > While at it, clean up the trip point names to reflect the names used by
> > the thermal device tree bindings.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >   arch/arm64/boot/dts/nvidia/tegra132.dtsi | 63 +++++-------------
> >   arch/arm64/boot/dts/nvidia/tegra210.dtsi | 83 +++++++-----------------
> >   2 files changed, 39 insertions(+), 107 deletions(-)
> > 
> > diff --git a/arch/arm64/boot/dts/nvidia/tegra132.dtsi b/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> > index 8b78be8f4f9d..11ebf7517df1 100644
> > --- a/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> > +++ b/arch/arm64/boot/dts/nvidia/tegra132.dtsi
> > @@ -876,11 +876,10 @@ soctherm: thermal-sensor@700e2000 {
> >   		#thermal-sensor-cells = <1>;
> >   		throttle-cfgs {
> > -			throttle_heavy: heavy {
> > +			heavy {
> >   				nvidia,priority = <100>;
> >   				nvidia,cpu-throt-level = <TEGRA_SOCTHERM_THROT_LEVEL_HIGH>;
> > -
> > -				#cooling-cells = <2>;
> > +				temperature = <102000>;
> >   			};
> >   		};
> >   	};
> > @@ -1136,114 +1135,84 @@ cpu-thermal {
> >   			polling-delay-passive = <1000>;
> >   			polling-delay = <0>;
> > -			thermal-sensors =
> > -				<&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>;
> > +			thermal-sensors = <&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>;
> >   			trips {
> > -				cpu_shutdown_trip {
> > +				critical {
> >   					temperature = <105000>;
> >   					hysteresis = <1000>;
> >   					type = "critical";
> >   				};
> > -				cpu_throttle_trip: throttle-trip {
> > +				hot {
> >   					temperature = <102000>;
> >   					hysteresis = <1000>;
> >   					type = "hot";
> >   				};
> >   			};
> > -
> > -			cooling-maps {
> > -				map0 {
> > -					trip = <&cpu_throttle_trip>;
> > -					cooling-device = <&throttle_heavy 1 1>;
> > -				};
> > -			};
> 
> If the hardware mitigation is 'heavy', don't you want to have DVFS acting
> before hardware throttling ?

The throttling here is in fact some form of DVFS, but yes, generally we
would likely want to have additional forms of DVFS before we reach this
state. We could add CPU cooling devices and there's also a mechanism to
throttle the DRAM frequency on certain boards.

But those are mostly orthogonal to this series. The goal here is to get
rid of the throttling mechanism as a cooling device.

Thierry
Krzysztof Kozlowski April 18, 2023, 4:13 p.m. UTC | #3
On 17/04/2023 10:59, Thierry Reding wrote:
> On Fri, Apr 14, 2023 at 11:47:56PM +0200, Krzysztof Kozlowski wrote:
>> On 14/04/2023 14:57, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Each throttling configuration needs to specify the temperature threshold
>>> at which it should start throttling. Previously this was tied to a given
>>> trip point as a cooling device and used the temperature specified for
>>> that trip point. This doesn't work well because the throttling mechanism
>>> is not a cooling device in the traditional sense.
>>>
>>> Instead, allow device trees to specify the throttle temperature in the
>>> throttle configuration directly so that the throttle doesn't need to be
>>> exposed as a cooling device.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>  .../bindings/thermal/nvidia,tegra124-soctherm.yaml         | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml b/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml
>>> index 4677ad6645a5..37dac851f486 100644
>>> --- a/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml
>>> +++ b/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml
>>
>> File does not exist in next and no dependency is mentioned, so tricky to
>> review and figure out context. Without context the comment is:
> 
> Apologies, I have a conversion series for these thermal bindings. I'll
> send those out first.
> 
>>> @@ -120,6 +120,13 @@ properties:
>>>                # high (85%, TEGRA_SOCTHERM_THROT_LEVEL_HIGH)
>>>                - 3
>>>  
>>> +          temperature:
>>> +            $ref: /schemas/types.yaml#/definitions/int32
>>
>> Use -millicelsius suffix instead:
>> https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/property-units.yaml
> 
> Okay.
> 
>>> +            minimum: -273000
>>> +            maximum: 200000
>>> +            description: The temperature threshold (in millicelsius) that,
>>> +              when crossed, will trigger the configured automatic throttling.
>>
>> Don't you want some hysteresis? Or is it already using trips binding?
>> But in that case you should skip the $ref and maximum - they come from
>> thermal-zones, don't they?
> 
> We don't use a hysteresis at the moment, but checking the register
> documentation, there's indeed "up" and "down" thresholds, so we can add
> another property for that.
> 
> This doesn't use the trips binding and in fact, one of the reasons for
> this change is because we want to make this separate from trip points.
> Trip points are usually associated with cooling devices and this
> throttling mechanism doesn't really fit that concept because it is an
> automatic mechanism that is triggered when a given temperature threshold
> is crossed, rather than a manually activated mechanism, which is what a
> cooling device would be.

OK, I just wasn't sure if the binding already includes trips, which
would mean you should use existing 'temperature' property.

In such case, I think it's better to switch to property with common unit
- millicelsius, either low/high ranges or with hysteresis.

Best regards,
Krzysztof
Daniel Lezcano June 19, 2023, 10:36 a.m. UTC | #4
Hi Thierry,

are you planning to send a new version ?


On 14/04/2023 14:57, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Hi,
> 
> this set of patches removes the registration of the SOCTHERM internal
> throttling mechanism as cooling device. Since this throttling starts
> automatically once a certain temperature threshold is crossed, it
> doesn't make sense to represent it as a cooling device, which are
> typically "manually" activated by the thermal framework when thermal
> sensors report temperature thresholds being crossed.
> 
> Instead of using the cooling device mechanism, this statically programs
> the throttling mechanism when it is configured in device tree. In order
> to do this, an additional device tree property is needed to replace the
> information that was previously contained in trip points.
> 
> There's a few preparatory patches to make the removal a bit simpler and
> also some follow up cleanups included as well.
> 
> Thierry
> 
> Thierry Reding (10):
>    dt-bindings: thermal: tegra: Document throttle temperature
>    thermal: tegra: Use driver-private data consistently
>    thermal: tegra: Constify SoC-specific data
>    thermal: tegra: Do not register cooling device
>    thermal: tegra: Use unsigned int where appropriate
>    thermal: tegra: Avoid over-allocation of temporary array
>    thermal: tegra: Remove gratuitous error assignment
>    thermal: tegra: Minor stylistic cleanups
>    arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210
>    ARM: tegra: Rework SOCTHERM on Tegra124
> 
>   .../thermal/nvidia,tegra124-soctherm.yaml     |   7 +
>   arch/arm/boot/dts/tegra124.dtsi               |  65 +--
>   arch/arm64/boot/dts/nvidia/tegra132.dtsi      |  63 +--
>   arch/arm64/boot/dts/nvidia/tegra210.dtsi      |  83 +---
>   drivers/thermal/tegra/soctherm.c              | 392 ++++++------------
>   drivers/thermal/tegra/soctherm.h              |   1 +
>   drivers/thermal/tegra/tegra124-soctherm.c     |   4 +
>   drivers/thermal/tegra/tegra132-soctherm.c     |   4 +
>   drivers/thermal/tegra/tegra210-soctherm.c     |   4 +
>   9 files changed, 208 insertions(+), 415 deletions(-)
>
Daniel Lezcano July 11, 2023, 8:25 a.m. UTC | #5
Hi Thierry,

do you have an update for this series?

Thanks

   -- Daniel

On 14/04/2023 14:57, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Hi,
> 
> this set of patches removes the registration of the SOCTHERM internal
> throttling mechanism as cooling device. Since this throttling starts
> automatically once a certain temperature threshold is crossed, it
> doesn't make sense to represent it as a cooling device, which are
> typically "manually" activated by the thermal framework when thermal
> sensors report temperature thresholds being crossed.
> 
> Instead of using the cooling device mechanism, this statically programs
> the throttling mechanism when it is configured in device tree. In order
> to do this, an additional device tree property is needed to replace the
> information that was previously contained in trip points.
> 
> There's a few preparatory patches to make the removal a bit simpler and
> also some follow up cleanups included as well.
> 
> Thierry
> 
> Thierry Reding (10):
>    dt-bindings: thermal: tegra: Document throttle temperature
>    thermal: tegra: Use driver-private data consistently
>    thermal: tegra: Constify SoC-specific data
>    thermal: tegra: Do not register cooling device
>    thermal: tegra: Use unsigned int where appropriate
>    thermal: tegra: Avoid over-allocation of temporary array
>    thermal: tegra: Remove gratuitous error assignment
>    thermal: tegra: Minor stylistic cleanups
>    arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210
>    ARM: tegra: Rework SOCTHERM on Tegra124
> 
>   .../thermal/nvidia,tegra124-soctherm.yaml     |   7 +
>   arch/arm/boot/dts/tegra124.dtsi               |  65 +--
>   arch/arm64/boot/dts/nvidia/tegra132.dtsi      |  63 +--
>   arch/arm64/boot/dts/nvidia/tegra210.dtsi      |  83 +---
>   drivers/thermal/tegra/soctherm.c              | 392 ++++++------------
>   drivers/thermal/tegra/soctherm.h              |   1 +
>   drivers/thermal/tegra/tegra124-soctherm.c     |   4 +
>   drivers/thermal/tegra/tegra132-soctherm.c     |   4 +
>   drivers/thermal/tegra/tegra210-soctherm.c     |   4 +
>   9 files changed, 208 insertions(+), 415 deletions(-)
>
Thierry Reding July 11, 2023, 3:42 p.m. UTC | #6
On Tue, Jul 11, 2023 at 10:25:16AM +0200, Daniel Lezcano wrote:
> Hi Thierry,
> 
> do you have an update for this series?

Yeah, I've been working on this on and off for a while since I ran into
some complications with this version. I need to find a block of spare
time to go over the latest version again and do some testing. Hopefully
I can get around to that within this week or next.

Thierry
Daniel Lezcano July 12, 2023, 11:01 a.m. UTC | #7
On 11/07/2023 17:42, Thierry Reding wrote:
> On Tue, Jul 11, 2023 at 10:25:16AM +0200, Daniel Lezcano wrote:
>> Hi Thierry,
>>
>> do you have an update for this series?
> 
> Yeah, I've been working on this on and off for a while since I ran into
> some complications with this version. I need to find a block of spare
> time to go over the latest version again and do some testing. Hopefully
> I can get around to that within this week or next.

That is great, thanks !