Message ID | 20250103-topic-sm8650-thermal-cpu-idle-v1-1-faa1f011ecd9@linaro.org |
---|---|
State | New |
Series | arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones |
On 8.01.2025 10:15 AM, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>> Hi,
>>>
>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>> OPPs programmed in the board firmware.
>>>>>
>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>> in an acceptable range by taking in account more parameters like the die
>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>> core.
>>>>>
>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>> calculations are a waste of system resources.
>>>>>
>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>> inevitable thermal shutdown.
>>>>>
>>>>
>>>> Are you able to hit these higher temperatures? Do you have some test
>>>> case where the idle-injection shows to be successful in blocking us from
>>>> reaching the critical temp?
>>>
>>> No, I've been able to test idle-injection and observed a noticeable effect
>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>> scaling down and let the temp go higher ?
>>>
>>
>> I don't know how to override that configuration.

I'll try to get some answers. SDM845 seems to expose a couple SCM calls for
this purpose and it's already wired up in drivers/thermal/qcom/lmh.c

>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>> the critical trip for when the hardware fails us.
>>>
>>> It's the goal here aswell
>>>
>>
>> How about simplifying the patch by removing the idle-injection step and
>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>> can prove that there's value in the extra mitigation)?
>
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.

If LMH fails, your SoC is probably cooked already, anyway :(

I'm not sure why idle injection isn't enabled by default if no other cooling
methods are found. Perhaps that could be discussed with some thermal folks..

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?

I think sticking with critical=Tjmax + critical-action = "reboot" may be the
way to go here.

We may want to give some folks a heads up, so they can wire up skin sensors
on their devices ahead of these changes landing tree-wide.

Konrad
On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
> > On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
> > > Hi,
> > >
> > > On 07/01/2025 00:39, Bjorn Andersson wrote:
> > > > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> > > > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
> > > > > hardware controlled loop using the LMH and EPSS blocks with constraints and
> > > > > OPPs programmed in the board firmware.
> > > > >
> > > > > Since the Hardware does a better job at maintaining the CPUs temperature
> > > > > in an acceptable range by taking in account more parameters like the die
> > > > > characteristics or other factory fused values, it makes no sense to try
> > > > > and reproduce a similar set of constraints with the Linux cpufreq thermal
> > > > > core.
> > > > >
> > > > > In addition, the tsens IP is responsible for monitoring the temperature
> > > > > across the SoC and the current settings will heavily trigger the tsens
> > > > > UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
> > > > > constraints which are currently defined in the DT. And since the CPUs
> > > > > are not hooked in the thermal trip points, the potential interrupts and
> > > > > calculations are a waste of system resources.
> > > > >
> > > > > Instead, set higher temperatures in the CPU trip points, and hook some CPU
> > > > > idle injector with a 100% duty cycle at the highest trip point in the case
> > > > > the hardware DCVS cannot handle the temperature surge, and try our best to
> > > > > avoid reaching the critical temperature trip point which should trigger an
> > > > > inevitable thermal shutdown.
> > > > >
> > > >
> > > > Are you able to hit these higher temperatures? Do you have some test
> > > > case where the idle-injection shows to be successful in blocking us from
> > > > reaching the critical temp?
> > >
> > > No, I've been able to test idle-injection and observed a noticeable effect
> > > but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
> > > scaling down and let the temp go higher ?
> > >
> >
> > I don't know how to override that configuration.
> >
> > > >
> > > > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> > > > the critical trip for when the hardware fails us.
> > >
> > > It's the goal here aswell
> > >
> >
> > How about simplifying the patch by removing the idle-injection step and
> > just rely on LMH/EPSS and the "critical" trip (at least until someone
> > can prove that there's value in the extra mitigation)?
>
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.
>

I think that sounds good, but afaict we don't have any indication of
this being a problem and we don't have any way to test that it actually
solves that problem.

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?
>

I think that's a simple and functional starting point - and it solves
your IRQ issue.

Regards,
Bjorn
On 09/01/2025 22:01, Bjorn Andersson wrote:
> On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote:
>> On 08/01/2025 04:11, Bjorn Andersson wrote:
>>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>>> OPPs programmed in the board firmware.
>>>>>>
>>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>>> in an acceptable range by taking in account more parameters like the die
>>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>>> core.
>>>>>>
>>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>>> calculations are a waste of system resources.
>>>>>>
>>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>>> inevitable thermal shutdown.
>>>>>>
>>>>>
>>>>> Are you able to hit these higher temperatures? Do you have some test
>>>>> case where the idle-injection shows to be successful in blocking us from
>>>>> reaching the critical temp?
>>>>
>>>> No, I've been able to test idle-injection and observed a noticeable effect
>>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>>> scaling down and let the temp go higher ?
>>>>
>>>
>>> I don't know how to override that configuration.
>>>
>>>>>
>>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>>> the critical trip for when the hardware fails us.
>>>>
>>>> It's the goal here aswell
>>>>
>>>
>>> How about simplifying the patch by removing the idle-injection step and
>>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>>> can prove that there's value in the extra mitigation)?
>>
>> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
>> fails, the only factor in control of HLOS is by stopping scheduling tasks
>> since frequency won't be able to scale anymore.
>>
>
> I think that sounds good, but afaict we don't have any indication of
> this being a problem and we don't have any way to test that it actually
> solves that problem.

Sure, let's postpone the idle injection when we can actually test it.

>
>> Anyway, I agree it can be added later on, so should I drop the 2 trip points
>> and only leave the critical one ?
>>
>
> I think that's a simple and functional starting point - and it solves
> your IRQ issue.

Ack

Thanks,
Neil

>
> Regards,
> Bjorn
On 09/01/2025 16:18, Konrad Dybcio wrote:
> On 8.01.2025 10:15 AM, Neil Armstrong wrote:
>> On 08/01/2025 04:11, Bjorn Andersson wrote:
>>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>>> OPPs programmed in the board firmware.
>>>>>>
>>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>>> in an acceptable range by taking in account more parameters like the die
>>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>>> core.
>>>>>>
>>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>>> calculations are a waste of system resources.
>>>>>>
>>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>>> inevitable thermal shutdown.
>>>>>>
>>>>>
>>>>> Are you able to hit these higher temperatures? Do you have some test
>>>>> case where the idle-injection shows to be successful in blocking us from
>>>>> reaching the critical temp?
>>>>
>>>> No, I've been able to test idle-injection and observed a noticeable effect
>>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>>> scaling down and let the temp go higher ?
>>>>
>>>
>>> I don't know how to override that configuration.
>
> I'll try to get some answers. SDM845 seems to expose a couple SCM calls for
> this purpose and it's already wired up in drivers/thermal/qcom/lmh.c

Would be great, thx

>
>>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>>> the critical trip for when the hardware fails us.
>>>>
>>>> It's the goal here aswell
>>>>
>>>
>>> How about simplifying the patch by removing the idle-injection step and
>>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>>> can prove that there's value in the extra mitigation)?
>>
>> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
>> fails, the only factor in control of HLOS is by stopping scheduling tasks
>> since frequency won't be able to scale anymore.
>
> If LMH fails, your SoC is probably cooked already, anyway :(
>
> I'm not sure why idle injection isn't enabled by default if no other cooling
> methods are found. Perhaps that could be discussed with some thermal folks..

Yeah this is good question, this should probably be the default "hot" behaviour

>
>> Anyway, I agree it can be added later on, so should I drop the 2 trip points
>> and only leave the critical one ?
>
> I think sticking with critical=Tjmax + critical-action = "reboot" may be the
> way to go here.
>
> We may want to give some folks a heads up, so they can wire up skin sensors
> on their devices ahead of these changes landing tree-wide.

Yeah it's also my goal, will respin with only critical.

Thanks,
Neil

>
> Konrad
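For readers following the thread, here is a rough sketch of what a "critical only" zone with critical-action = "reboot" might look like. It is not taken from any posted revision of this patch. The sensor phandle and index are placeholders, the 115000 value is simply the critical temperature used in v1 and is only assumed to stand in for Tjmax, and the placement of critical-action inside the zone node reflects my reading of the thermal-zones binding.

```dts
/* Hypothetical sketch only, not part of the posted patch. */
/ {
	thermal-zones {
		cpu0-thermal {
			/* Placeholder sensor reference; the real phandle and index differ per zone. */
			thermal-sensors = <&tsens0 1>;

			/* Assumption: request a reboot instead of the default shutdown. */
			critical-action = "reboot";

			trips {
				cpu0-critical {
					/* 115 degC, the v1 critical value, assumed here to approximate Tjmax. */
					temperature = <115000>;
					hysteresis = <1000>;
					type = "critical";
				};
			};
		};
	};
};
```

With no passive trips left in the zone, there are no intermediate thresholds for tsens to raise UP/LOW interrupts on, which is the IRQ concern raised in the commit message.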
diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -99,6 +99,13 @@ l3_0: l3-cache {
 					cache-unified;
 				};
 			};
+
+			cpu0_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
+
 		};
 
 		cpu1: cpu@100 {
@@ -119,6 +126,12 @@ cpu1: cpu@100 {
 			qcom,freq-domain = <&cpufreq_hw 0>;
 
 			#cooling-cells = <2>;
+
+			cpu1_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu2: cpu@200 {
@@ -146,6 +159,12 @@ l2_200: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu2_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu3: cpu@300 {
@@ -166,6 +185,12 @@ cpu3: cpu@300 {
 			qcom,freq-domain = <&cpufreq_hw 3>;
 
 			#cooling-cells = <2>;
+
+			cpu3_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu4: cpu@400 {
@@ -193,6 +218,12 @@ l2_400: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu4_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu5: cpu@500 {
@@ -220,6 +251,12 @@ l2_500: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu5_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu6: cpu@600 {
@@ -247,6 +284,12 @@ l2_600: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu6_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu7: cpu@700 {
@@ -274,6 +317,12 @@ l2_700: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu7_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu-map {
@@ -5752,23 +5801,30 @@ cpu2-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu2_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_top_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};
 
 		cpu2-bottom-thermal {
@@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu2_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_bottom_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};
 
 		cpu3-top-thermal {
@@ -5800,23 +5863,30 @@ cpu3-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu3_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_top_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};
 
 		cpu3-bottom-thermal {
@@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu3_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_bottom_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};
 
 		cpu4-top-thermal {
@@ -5848,23 +5925,30 @@ cpu4-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu4_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_top_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};
 
 		cpu4-bottom-thermal {
@@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu4_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_bottom_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};
 
 		cpu5-top-thermal {
@@ -5896,23 +5987,30 @@ cpu5-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu5_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_top_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};
 
 		cpu5-bottom-thermal {
@@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu5_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_bottom_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};
 
 		cpu6-top-thermal {
@@ -5944,23 +6049,30 @@ cpu6-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu6_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_top_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};
 
 		cpu6-bottom-thermal {
@@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu6_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_bottom_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};
 
 		aoss1-thermal {
@@ -6010,23 +6129,30 @@ cpu7-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_top_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu7-middle-thermal {
@@ -6034,23 +6160,30 @@ cpu7-middle-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_middle_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_middle_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu7-bottom-thermal {
@@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_bottom_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu0-thermal {
@@ -6082,23 +6222,30 @@ cpu0-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu0_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu0-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu0_alert1>;
+					cooling-device = <&cpu0_idle 100 100>;
+				};
+			};
 		};
 
 		cpu1-thermal {
@@ -6106,23 +6253,30 @@ cpu1-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu1_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu1-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu1_alert1>;
+					cooling-device = <&cpu1_idle 100 100>;
+				};
+			};
 		};
 
 		nsphvx0-thermal {
On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
hardware-controlled loop using the LMH and EPSS blocks, with constraints and
OPPs programmed in the board firmware.

Since the hardware does a better job at maintaining the CPU temperatures
in an acceptable range by taking into account more parameters, like the die
characteristics or other factory-fused values, it makes no sense to try
and reproduce a similar set of constraints with the Linux cpufreq thermal
core.

In addition, the tsens IP is responsible for monitoring the temperature
across the SoC, and the current settings will heavily trigger the tsens
UP/LOW interrupts if the CPU temperatures reach the hardware thermal
constraints, which are currently defined in the DT. And since the CPUs
are not hooked in the thermal trip points, the potential interrupts and
calculations are a waste of system resources.

Instead, set higher temperatures in the CPU trip points, and hook a CPU
idle injector with a 100% duty cycle at the highest trip point in case
the hardware DCVS cannot handle the temperature surge, and try our best to
avoid reaching the critical temperature trip point, which should trigger an
inevitable thermal shutdown.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
 1 file changed, 214 insertions(+), 60 deletions(-)