Message ID | 20210115094744.21156-1-rui.zhang@intel.com |
---|---|
State | Superseded |
Series | thermal/intel: introduce tcc cooling driver |
> -----Original Message-----
> From: Doug Smythies <dsmythies@telus.net>
> Sent: Sunday, January 17, 2021 5:22 AM
> To: Zhang, Rui <rui.zhang@intel.com>; Brown, Len <len.brown@intel.com>
> Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com;
> linux-pm@vger.kernel.org; 'Doug Smythies' <dsmythies@telus.net>
> Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver
> Importance: High
>
> On 2021.01.16 09:08 Doug Smythies wrote:
> > On 2021.01.15 Zhang Rui wrote:
>
> Added Len to the "To" list:
>
> Turbostat has another issue with this stuff.
> It will be more work than I want to do to submit a fix patch, so I am not,
> but see further down for my hack fix.
>
> ...
>
> > Example step function overshoot, trip point set to 55 degrees C.
> >
> > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1
> > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> > 0.07 800 45 24 1.89 0.00
> > 0.04 800 29 23 1.89 0.00
> > 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 cores
> > 67.76 4570 4476 66 120.42 0.00
> > 68.03 4567 4488 66 120.73 0.00
> > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point
> > 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power servo or the temp servo.
> > 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo.
> > 68.13 4143 4513 48 75.16 0.00
> > 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't know why.
> > 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot.
>
> It turns out that turbostat does not list the package temperature properly
> if it is started with an active TCC offset.
> It erroneously includes the offset in the temperature math.
> In the above example turbostat had also not yet been fixed for the bit mask
> issue. So the real temp above was 59 degrees C.
>
> > 68.44 4000 4502 45 67.16 0.00
> > 68.06 4000 4483 45 66.95 0.00
> > 68.02 3973 4490 44 65.20 0.00
> > 67.94 3900 4489 43 60.51 0.00
> > 67.88 3900 4501 44 60.55 0.00
> > 67.85 3900 4472 43 60.52 0.00
>
> And it settled at about 56 degrees, close to what was asked for.
>
> To proceed with my work, I did a hack fix to turbostat:
>
> doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index d7acdd4d16c4..7f0a22ab3a0d 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -4831,6 +4831,7 @@ int read_tcc_activation_temp()
>  fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx (%d C) (%d default - %d offset)\n",
>  base_cpu, msr, tcc, target_c, offset_c);
>
> + tcc = target_c;
>  return tcc;
>  }
>

Yes, this is a right fix.
I think Len already knows this breakage and he will propose some fix soon.

> So this:
>
> cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
> cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88420000 (-9 C)
>
> becomes this:
>
> cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
> cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88400000 (36 C)
>
> and this:
>
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 0.08 1079 928 -11 1.91 0.00
>
> Becomes this:
>
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 0.05 1046 846 32 1.94 0.00
>
> So now back to my overshoot example:
>
> This:
>
> > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point
>
> Was actually:
>
> > 67.98 4572 4492 80 121.00 0.00 <<< 25 degrees over trip point
>
> But let's just do it again:
>
> doug@s18:~$ cat /sys/devices/virtual/thermal/cooling_device11/cur_state
> 43 <<< so 100 - 43 = 57 degrees trip point.
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 0.25
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 0.09 800 6 36 2.01 0.00
> 0.16 800 23 36 2.00 0.00
> 0.11 800 14 36 2.15 0.00
> 66.81 4461 1160 70 101.17 0.00 <<< load applied, temp up 34 degrees in less than 0.25 seconds. Normal.
> 68.06 4581 1126 74 117.36 0.00
> 67.69 4589 1119 76 119.60 0.00
> 67.80 4589 1125 77 120.94 0.00
> 67.83 4587 1132 78 120.75 0.00
> 67.68 4591 1125 78 121.63 0.00
> 68.07 4585 1139 77 121.25 0.00
> 67.80 4588 1121 79 121.41 0.00 <<< now 20 degrees over trip point.
> 68.57 4579 1139 79 121.71 0.00
> ...
> 68.03 4220 1130 63 80.28 0.00 <<< it takes quite awhile (>7 seconds) to really throttle down.

What platform is this?

On a KBL platform I'm running right now, with the performance governor, the TCC offset set to 30 (effective TCC is 70 C), and turbostat fixed, I can observe that:
1. all CPUs run at max turbo frequency (3.9 GHz) when idle, with PkgTmp around 40 C.
2. with load applied (I use the stress tool to get 100% CPU load), PkgTmp reports 70 C and the frequency drops to around 3 GHz, IMMEDIATELY.
3. when I change the TCC offset to 60, the CPU is throttled to around 200 MHz, and the temperature is at around 50 C, IMMEDIATELY.
4. when I change the TCC offset to 20, the CPU frequency rises to the turbo range, and PkgTmp reaches 80 C, IMMEDIATELY.

So in your test, there is something I don't understand. 😊
a) it takes such a long time (7+ seconds) to throttle
b) it throttles to a frequency that is not low enough (in order to keep the system under the effective TCC temperature, the frequency can be throttled to below the turbo range, to LFM, and even below LFM in my case)

Can you please try the performance governor and 100% CPU load to see if the symptom is the same?

thanks,
rui

>
> ... Doug
>
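The temperature arithmetic behind the turbostat discrepancy discussed in this message can be sketched as follows. This is an illustrative Python sketch, not code from turbostat or from the patch: the function names are hypothetical, and the field positions (TjMax in bits 23:16 and TCC offset in bits 29:24 of MSR_IA32_TEMPERATURE_TARGET, digital readout in bits 22:16 of MSR_IA32_PACKAGE_THERM_STATUS) are assumptions chosen because they reproduce the values printed in the thread.

```python
def decode_temperature_target(msr):
    """Split a raw MSR_IA32_TEMPERATURE_TARGET value into its parts.

    Assumed layout: bits 23:16 hold the default activation temperature
    (TjMax), bits 29:24 hold the TCC offset, both in degrees C.
    """
    tj_max_c = (msr >> 16) & 0xFF
    offset_c = (msr >> 24) & 0x3F
    return tj_max_c, offset_c, tj_max_c - offset_c


def package_temp_c(therm_status, reference_c):
    """Convert a raw MSR_IA32_PACKAGE_THERM_STATUS value to degrees C.

    Assumed layout: bits 22:16 hold the digital readout, i.e. how many
    degrees the package is below the reference temperature.
    """
    readout = (therm_status >> 16) & 0x7F
    return reference_c - readout


tj_max_c, offset_c, trip_c = decode_temperature_target(0x2B64100D)

# Subtracting the readout from the effective trip point (the behaviour
# reported as broken) versus from TjMax (the behaviour after the hack fix).
broken = package_temp_c(0x88420000, trip_c)
fixed = package_temp_c(0x88400000, tj_max_c)
print(tj_max_c, offset_c, trip_c, broken, fixed)
```

With the raw values quoted in the thread, the first call reproduces "(57 C) (100 default - 43 offset)", and the two conversions reproduce the "-9 C" and "36 C" readings respectively.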
Hi, Doug, Thanks for testing this patch. > -----Original Message----- > From: Doug Smythies <dsmythies@telus.net> > Sent: Sunday, January 17, 2021 1:08 AM > To: Zhang, Rui <rui.zhang@intel.com> > Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com; linux- > pm@vger.kernel.org > Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver > Importance: High > > On 2021.01.15 Zhang Rui wrote: > > > > On Intel processors, the core frequency can be reduced below OS > > request, when the current temperature reaches the TCC (Thermal Control > > Circuit) activation temperature. > > > > The default TCC activation temperature is specified by > > MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by > specifying > > an offset in degrees C, using the TCC Offset bits in the same MSR register. > > > > This patch introduces a cooling devices driver that utilizes the TCC > > Offset feature. The bigger the current cooling state is, the lower the > > effective TCC activation temperature is, so that the processors can be > > throttled earlier before system critical overheats. > > Thank you for this useful patch. > My systems don't need thermald or any other thermal control, but it is nice > to have this extra margin to add to the critical stuff, as a backup. > I also like to use the offset to test stuff. > > I use the internal power limit servo for power limiting, and that servo works > very well indeed. Using this temperature offset as a way to servo the > thermal operating limit does work, but tends to overshoot, oscillate, hold low > excessively long (minutes). Do you have a script to test and show the drawbacks of this feature? It seems that it behaves differently on different platforms. Maybe we can evaluate this on more platforms. > It also seems to limit CPU clock frequency > reduction to the non-turbo limit, regardless of the desired maximum > temperature. 
> > I am not familiar with the thermal stuff at all, and didn't know where to find > the trip point knob. Anyway, found "cooling_devices11". > > I do not understand this: > > ~$ cat /sys/devices/virtual/thermal/cooling_device11/stats/trans_table > cat: /sys/devices/virtual/thermal/cooling_device11/stats/trans_table: File > too large This is a known issue that stats table can not handle devices with too many cooling states, say, 127 cooling states for TCC Offset cooling device. We can ignore this for now. > > Rather than enter the actual TCC offset, I would rather enter the desired trip > point, and have the driver do the math to convert it to the offset. Hmmm, a writable trip point? I need to think about this. > > Example step function overshoot, trip point set to 55 degrees C. > > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 0.07 800 45 24 1.89 0.00 > 0.04 800 29 23 1.89 0.00 > 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 > cores > 67.76 4570 4476 66 120.42 0.00 > 68.03 4567 4488 66 120.73 0.00 > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point > 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power > servo or the temp servo. > 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo. > 68.13 4143 4513 48 75.16 0.00 > 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't > know why. > 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot. > 68.44 4000 4502 45 67.16 0.00 > 68.06 4000 4483 45 66.95 0.00 > 68.02 3973 4490 44 65.20 0.00 > 67.94 3900 4489 43 60.51 0.00 > 67.88 3900 4501 44 60.55 0.00 > 67.85 3900 4472 43 60.52 0.00 > 67.96 3900 4481 43 60.59 0.00 > 68.26 3900 4501 44 60.70 0.00 > 67.93 3900 4498 43 60.58 0.00 > 68.03 3900 4476 43 60.68 0.00 > 67.83 3900 4481 44 60.54 0.00 > 35.06 3895 2412 25 32.13 0.00 < load removed. 
> 0.04 800 25 24 1.89 0.00 > 0.04 800 22 23 1.89 0.00 > 0.06 800 35 23 1.90 0.00 > 0.03 800 18 23 1.89 0.00 > 0.04 800 26 22 1.90 0.00 > 0.30 1927 44 23 1.97 0.00 > ^C0.10 800 25 23 1.91 0.00 > > Example long time to recover: > (actually, this example never recovers, unusual): > Note: 3.7 GHz is the limit. > > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 67.58 3700 134812 42 52.15 0.00 <<< the trip point was changed from 37 > to 57 degrees > 67.90 3700 134964 42 52.08 0.00 > 68.07 3700 134424 42 52.06 0.00 > 68.01 3700 134415 41 50.76 0.00 > 68.14 3700 134521 41 50.78 0.00 > 68.11 3700 134424 42 50.75 0.00 > 68.03 3700 134329 42 50.70 0.00 > 68.11 3700 134321 42 50.76 0.00 > 68.05 3700 134456 42 51.09 0.00 > 68.12 3700 134549 42 52.21 0.00 > 68.12 3700 134482 42 52.19 0.00 > 68.10 3700 134301 42 52.20 0.00 > 68.11 3700 134444 42 52.14 0.00 > 68.08 3700 134422 42 52.17 0.00 > 68.07 3700 134430 42 52.23 0.00 > 68.00 3700 134723 42 52.12 0.00 > 67.96 3711 135207 44 52.53 0.00 <<< It takes 8 minutes until the > frequency goes above 3.7 GHz > 68.05 3765 134519 42 54.34 0.00 > 68.11 3771 134461 43 54.60 0.00 > 67.83 3763 134867 43 54.26 0.00 > 67.93 3773 134577 43 54.78 0.00 <<< But it never recovers, Why not? > ... > > For unknown reason the processor seems to now think it is not heavily > loaded. From my MSR decoder: > > 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL > > From the book: > > > Autonomous Utilization-Based Frequency Control Status (R0) When set, > > frequency is reduced below the operating system request because the > > processor has detected that utilization is low. > > Which is not true. > > Anyway, > > Acked-by: Doug Smythies <dsmythies@telus.net> > thanks, rui
On 2021.01.18 01:32 Zhang, Rui wrote: > On 2021.01.17 05:22 Doug Smythies wrote: > > On 2021.01.16 09:08 Doug Smythies wrote: > > > On 2021.01.15 Zhang Rui wrote: ... > > What platform this is? My i5-9600K test server. Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz 6 CPUs and 6 cores. Kernel: 5.11-rc3 + this patch. Water cooled, with water pump always running full speed. > On a KBL platform I'm running right now, with performance governor, and tcc offset set to 30 > (Effective TCC is 70c), and also turbostat fixed, > I can observe that > 1. all cpus running at max turbo freq (3.9G) when idle, PkgTmp around 40C > 2. with load applied (I use stress tool to get 100% CPU load), the PkgTmp reports 70C and the > frequency drops to around 3G, IMMEDIATELY. > 3. when I change TCC Offset to 60, cpu is throttled to around 200MHz, and the temperature is at around > 50C, IMMEDIATELY. > 4. when I change TCC Offset to 20, cpu freq raises to turbo range, and PkgTmp reaches 80C, > IMMEDIATELY. O.K. You should be able to measure "IMMEDIATELY" and tell us what it is. > > So in your test, there is something I don't understand. 😊 > a) it take such a long time (7+ seconds) to throttle See test results below, it does seem to throttle quickly, but then the temperature creeps up. > b) it throttles to a frequency that is not low enough (in order to keep the system under effective TCC > temperature, the frequency can be throttled to below turbo range, LFM, and even below LFM in my case) c) it can take a long time to respond to an increase in allowed temperature. Likely related to some integral term build up from condition "b" above, because yours isn't clamped to 3.7 GHz, the response is more "immediate". I test both conditions, repeatedly below. > > Can you please try performance governor and 100% CPU load to see if the symptom is the same? I did 100% load on 4 of 6 CPUs on purpose: So as not to hit PKG Limit #2 from the outset; To have 2 CPUs idle, as I thought it might be more challenging. 
In terms of maximum heat generation, or maximum energy used, I studied every method I could find, including several of my own methods, settling on prime95 / torture test / max heat method. Note: all previous work was done with the intel_pstate driver, HWP enabled, powersave governor. Test 1: intel_cpufreq, HWP enabled, performance governor. Test 1.1: startup delay, requires faster sampling: MSR_IA32_TEMPERATURE_TARGET: 0x2a64100d (58 C) (100 default - 42 offset) at 58 degrees it shouldn't clamp. doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 0.25 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 0.02 4600 6 31 1.98 0.00 0.53 4600 41 31 2.54 0.00 33.29 4360 645 52 37.34 0.00 <<< PKG Limit #2 already engaged 99.03 4271 1512 59 121.84 0.00 <<< O.K. Seems additional throttling is "IMMEDIATE" 98.85 4244 1511 60 119.81 0.00 98.80 4239 1516 61 119.71 0.00 98.82 4230 1510 63 120.02 0.00 98.84 4228 1509 63 119.32 0.00 98.81 4230 1514 63 120.16 0.00 98.78 4224 1511 63 119.00 0.00 98.82 4226 1510 63 119.18 0.00 98.81 4225 1514 64 119.77 0.00 98.84 4225 1509 63 119.23 0.00 98.82 4225 1511 65 119.56 0.00 <<< But, what? Now 7 degrees over. Note: increase in waste heat for otherwise unchanged operating conditions is normal at high limits of operation. Note: I do not know the level of hysteresis, if any. This might be normal. 98.80 4227 1515 63 119.93 0.00 ... delete 14.5 seconds ... 100.25 4217 1514 63 111.25 0.00 100.26 4200 1514 62 109.29 0.00 <<< O.K. finally brings it down. 100.26 4200 1509 62 109.15 0.00 ... delete 8.75 seconds 100.26 4100 1509 60 101.64 0.00 100.26 4100 1511 60 101.61 0.00 <<< These two are important, because they 100.25 4010 1515 58 94.65 0.00 <<< reveal that we did not hit PKG Limit #1 <<< 100.0 watts <<< and we know for certain it is the temp <<< servo. 
Test 1.2: clamp and recover delay, requires slower sampling: MSR_IA32_TEMPERATURE_TARGET: 0x3f64100d (37 C) (100 default - 63 offset) $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 100.26 3700 180608 59 72.80 0.00 100.26 3700 180407 60 72.70 0.00 <<< steady state 100.26 3700 181663 59 72.65 0.00 100.26 3700 46322 59 72.66 0.00 <<< close to time offset set to 37) 100.26 3700 180508 60 72.93 0.00 100.26 3700 180396 59 74.24 0.00 100.26 3700 180330 60 74.74 0.00 100.26 3700 180359 59 74.77 0.00 100.26 3775 180327 64 79.08 0.00 <<< ~~2 minutes 30 seconds response time 100.26 3853 180369 62 84.72 0.00 100.26 3865 180571 64 85.83 0.00 100.26 3866 180383 62 85.90 0.00 Now, change to 1 second sample time and change the offset again, but this time it is not clamped already first. doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 100.26 3875 6093 62 87.49 0.00 100.26 3800 6017 62 81.03 0.00 <<< by the way, notice the oscillations 100.26 3883 6023 64 87.98 0.00 100.26 3900 6020 64 89.52 0.00 <<< Processor package power oscillates quite a lot 100.26 3801 6021 62 81.09 0.00 <<< Frequency oscillates also. 100.26 3857 6021 64 85.70 0.00 <<< but in this region, 1 pstate ~= 10 watts 100.26 3900 6018 64 89.34 0.00 ... 100.26 3852 6020 62 85.24 0.00 100.26 3800 6019 62 80.82 0.00 100.26 3885 6047 64 87.77 0.00 <<< trip point changed to 70 100.26 3963 6017 67 94.88 0.00 <<< yes, offset change response is fast 100.26 4000 6017 67 98.35 0.00 100.26 4079 6018 69 105.17 0.00 100.26 4100 6017 69 107.02 0.00 ... delete 25 seconds ... 100.24 4042 6017 67 102.16 0.00 <<< PKG Limit #1 takes over 100.23 4016 6017 67 99.84 0.00 <<< All throttling is now PKG Limit #1 100.23 4017 6024 68 99.84 0.00 100.23 4015 6026 67 99.77 0.00 Test 2: Test 2: intel_pstate, HWP enabled, powersave governor. 
Test 2.1: startup delay, requires faster sampling: MSR_IA32_TEMPERATURE_TARGET: 0x2a64100d (58 C) (100 default - 42 offset) at 58 degrees it shouldn't clamp. $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 0.1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 0.28 800 12 33 1.93 0.25 800 10 33 1.90 0.31 800 13 33 1.90 0.79 800 19 33 1.92 0.34 800 32 34 1.90 0.22 800 5 33 1.91 <<< ~ 77% of next sample is busy and 20 degrees already 61.91 4103 469 53 60.20 <<< 260 degrees per second 99.01 4264 610 56 121.94 <<< how much PKG Limit #2 and/or TCC loop, I don't know. 98.87 4251 614 61 120.74 <<< unthrottled would be 4.60 GHz 98.87 4235 609 62 119.87 98.85 4226 609 63 119.60 ... delete 18.4 seconds 100.26 4100 613 62 102.41 100.26 4100 609 61 102.28 100.25 4040 609 60 97.49 <<< Don't know between PKG Limit #1 and/or TCC loop 100.26 4000 609 59 95.02 <<< definitely TCC loop 100.26 4000 615 60 94.01 Test 2.2: clamp and recover delay, requires slower sampling: MSR_IA32_TEMPERATURE_TARGET: 0x3f64100d (37 C) (100 default - 63 offset) sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 15 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 100.26 3700 90167 59 73.97 100.26 3700 90234 58 73.96 100.26 3700 90184 58 74.07 100.26 3700 4073 58 74.09 <<< close to time offset set to 37) 100.26 3700 90222 59 74.12 100.26 3700 90169 59 74.19 100.26 3700 90294 59 73.03 100.26 3700 90164 59 72.63 100.26 3700 90174 59 72.62 100.26 3700 90163 58 72.60 100.26 3700 90208 59 72.58 100.26 3702 90164 60 72.73 <<< 2 minutes until response. 100.26 3831 90169 63 80.67 100.26 3880 90199 63 84.56 100.26 3889 90187 63 85.34 100.26 3900 90170 63 86.24 100.26 3900 90178 62 86.26 Now, change to 0.1 second sample time and change the offset again, but this time it is not clamped already first. 
$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 0.1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 100.26 3900 609 63 89.02 100.26 3900 609 63 89.10 100.26 3900 131 63 89.47 <<< it takes a finite time between here and 100.26 3900 615 63 89.31 <<< the actual change of offset to 30 ... delete 2.7 seconds... <<< but nowhere near this long. 100.26 3900 614 63 90.08 100.24 3915 609 64 90.42 <<< O.K. responding. 100.26 4000 611 65 98.06 ... delete 1.2 seconds ... 100.26 4000 609 65 98.27 100.24 4091 616 67 106.93 100.26 4100 610 68 106.74 <<< Next step. 100.26 4100 609 68 106.90 ... delete 4.4 seconds ... 100.26 4100 609 68 108.02 100.24 4107 615 69 107.42 <<< Next step. 100.26 4200 609 70 115.93 100.26 4200 609 71 115.99 100.26 4200 610 71 117.14 100.26 4200 615 70 116.00 100.26 4200 609 70 116.17 100.26 4200 609 70 116.09 100.26 4200 612 71 117.23 100.26 4200 617 70 115.96 100.26 4200 611 70 116.10 100.26 4200 609 70 116.10 100.26 4200 609 70 117.38 100.26 4200 615 70 116.09 100.26 4200 610 70 116.12 100.26 4200 609 70 116.03 100.24 4117 609 69 109.74 <<< O.K. go down again. 100.26 4100 617 69 106.86 Test 3: intel_pstate, HWP enabled, performance governor. Test 3.1: startup delay, requires faster sampling: MSR_IA32_TEMPERATURE_TARGET: 0x2a64100d (58 C) (100 default - 42 offset) at 58 degrees it shouldn't clamp. $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 0.1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 0.06 4169 15 32 2.23 0.04 4598 6 33 2.03 <<< ~275 degrees per second 20.09 4599 268 45 15.37 <<< 12 degrees in ~43.7 mSec 99.10 4282 612 55 121.10 <<< how much PKG Limit #2 and/or TCC loop, I don't know. 98.94 4263 610 59 122.18 ...delete 17.8 seconds. 63-66 degrees Example: 100.26 4300 609 66 118.93. ... 100.25 4154 610 62 106.06 <<< finally comes down again. 100.26 4100 609 62 101.90 ... delete 4.5 seconds 100.26 4100 611 61 102.06 100.26 4100 609 61 102.10 100.25 4038 615 59 98.09 <<< finally gets to temp. 
100.26 4000 610 59 93.84 <<< will oscillate here 100.26 4000 609 59 93.88 <<< between pstates 40 and 41 ... delete 1.3 seconds ... 100.26 4000 615 58 94.81 100.26 4000 611 59 93.83 100.24 4030 609 60 96.27 100.26 4100 609 60 101.99 100.26 4100 615 61 102.91 ... delete 0.8 seconds ... 100.26 4100 614 61 103.44 100.25 4091 609 61 101.53 100.26 4000 610 59 94.07 ... Test 3.2: clamp and recover delay, requires slower sampling: MSR_IA32_TEMPERATURE_TARGET: 0x3f64100d (37 C) (100 default - 63 offset) sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 15 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 100.26 3700 90181 59 74.72 100.26 3700 90383 59 74.75 100.26 3700 2847 59 74.47 <<< close to time offset set to 37) 100.26 3700 90240 59 74.83 100.26 3700 90164 59 74.83 100.26 3700 90225 59 74.85 100.26 3700 90219 59 74.90 100.26 3700 90191 59 74.86 100.26 3700 90166 59 74.86 100.26 3700 90164 59 74.80 100.26 3728 90286 60 76.19 <<< 2 minutes, because it was clamped. 100.26 3832 90162 64 82.67 100.26 3870 90177 63 85.94 Now, change to 0.1 second sample time and change the offset again, but this time it is not clamped already first. $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 0.1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 100.26 3900 609 63 88.73 100.26 3900 615 63 88.83 100.26 3900 303 63 89.85 <<< it takes a finite time between here and 100.26 3900 615 63 88.91 <<< the actual change of offset to 30 100.26 3900 610 64 89.64 ... delete 2 seconds ... 100.26 3900 611 64 88.75 100.25 3911 609 64 89.58 <<< 1st response 100.26 4000 615 65 97.82 ... delete 1.2 seconds ... 100.26 4000 615 66 98.77 100.24 4086 609 68 104.94 <<< next step 100.26 4100 609 67 106.35 ... Test 4: intel_pstate, HWP enabled, performance governor. Method of creating 100% CPU load changed to use much less Energy per thread. 
Test 4.1: startup delay, requires faster sampling: MSR_IA32_TEMPERATURE_TARGET: 0x2a64100d (45 C) (100 default - 55 offset) Multiple tests were run with 2 through 6 threads. It took between 6 and 9 seconds to begin to throttle. Example, 3 threads: $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 0.01 4498 25 32 1.95 0.01 4603 17 32 1.93 0.01 4153 31 31 1.96 36.34 4600 2581 51 35.31 <<< load for last ~ 0.727 seconds. 51.13 4600 3562 52 47.88 50.77 4600 3620 52 47.98 51.03 4600 3551 52 48.11 51.13 4600 3596 53 48.14 50.87 4600 3627 52 48.20 51.30 4600 3535 52 48.30 51.17 4550 3534 50 46.26 <<< start throttling, ~ 7 seconds 51.27 4452 3567 48 42.05 50.82 4395 3585 47 39.68 51.28 4300 3529 46 36.53 <<< plus another couple to get there. 50.98 4300 3522 47 36.28 51.15 4219 3530 45 34.72 51.08 4200 3678 46 34.33 50.74 4200 3697 46 34.17 51.16 4200 3522 46 34.40 50.99 4126 3534 46 32.44 51.22 4100 3590 44 32.41 ... Doug
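The "time until throttling starts" figure in Test 4.1 can be read off the turbostat summary rows mechanically. The following Python sketch is only an illustration, not a tool used in the thread: the function names are hypothetical, the 10% busy threshold and 10 MHz tolerance are arbitrary assumptions, and the 4600 MHz unthrottled frequency is taken from the "unthrottled would be 4.60 GHz" remark earlier in the message. The sample rows are copied from the Test 4.1 output.

```python
TEST_4_1 = """\
Busy% Bzy_MHz IRQ PkgTmp PkgWatt
0.01 4498 25 32 1.95
0.01 4603 17 32 1.93
0.01 4153 31 31 1.96
36.34 4600 2581 51 35.31 <<< load for last ~ 0.727 seconds.
51.13 4600 3562 52 47.88
50.77 4600 3620 52 47.98
51.03 4600 3551 52 48.11
51.13 4600 3596 53 48.14
50.87 4600 3627 52 48.20
51.30 4600 3535 52 48.30
51.17 4550 3534 50 46.26 <<< start throttling, ~ 7 seconds
51.27 4452 3567 48 42.05
"""


def parse_rows(text):
    """Return (busy_pct, bzy_mhz) pairs, skipping headers and notes."""
    rows = []
    for line in text.splitlines():
        try:
            busy, mhz, _irq, _temp, _watt = (float(f) for f in line.split()[:5])
        except ValueError:
            continue  # header line, or fewer than five numeric columns
        rows.append((busy, mhz))
    return rows


def seconds_to_throttle(text, interval_s, unthrottled_mhz=4600.0,
                        busy_threshold=10.0, tolerance_mhz=10.0):
    """Time from the first loaded sample to the first throttled sample."""
    rows = parse_rows(text)
    loaded = next(i for i, (busy, _) in enumerate(rows) if busy > busy_threshold)
    throttled = next(i for i, (_, mhz) in enumerate(rows)
                     if i > loaded and mhz < unthrottled_mhz - tolerance_mhz)
    return (throttled - loaded) * interval_s


print(seconds_to_throttle(TEST_4_1, interval_s=1.0))
```

The result is quantised to the sampling interval, so with 1 second samples it can only confirm the "~ 7 seconds" annotation to within about a second.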
Hi,

Just a small follow up on this one:

On 2021.01.16 09:08 Doug Smythies wrote:
> On 2021.01.15 Zhang Rui wrote:
...
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt
> 67.93 3773 134577 43 54.78
>
> For unknown reason the processor seems to now
> think it is not heavily loaded. From my MSR decoder:
>
> 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL
>
> From the book:
>
> > Autonomous Utilization-Based Frequency Control
> > Status (R0)
> > When set, frequency is reduced below the operating
> > system request because the processor has detected
> > that utilization is low.
>
> Which is not true.
>
> Anyway,
>
> Acked-by: Doug Smythies <dsmythies@telus.net>

O.K. there were 2 things wrong above:

1.) I used the wrong Intel SDM table for those bit definitions.
They should have been: RATL and RATLL.

From the proper page of the book:

> Running Average Thermal Limit Status (R0)
> When set, frequency is reduced below the operating
> system request due to Running Average Thermal Limit
> (RATL).

2.) Due to the already discussed turbostat issue, that was not
the actual temperature and so the RATL bit being set was actually
valid at that time.

I have not been able to find the time window knob for this, if there
even is one, similar to the time window knobs for the package power
limits.
I wanted to reduce the time constant, just as a test, in an attempt
to reduce the step function load potential temperature overshoot.

One additional informational follow up note:

There always seems to be a significant turn-on transient when using the
TCC offset, appearing as temperature undershoot. I am saying that
an offset of 0 seems to also act as some sort of on/off switch to the
running average.

Example 1 - start with offset 0:

$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1
Busy% Bzy_MHz IRQ PkgTmp PkgWatt
51.17 4600 3531 71 93.57
51.37 4600 3543 71 93.60
51.37 4600 3590 71 93.63 <<< offset changed from 0 to 24
50.99 3737 3566 52 43.49 <<< trip point = 76 degrees
51.20 3700 3550 51 41.14 <<< TCC offset turn on transient
51.09 3700 3559 51 41.30 <<< There was no need to throttle
51.12 3779 3515 53 43.78
50.95 4064 3553 58 55.57
51.55 4271 3522 63 65.30
51.18 4424 3534 67 76.58
51.27 4500 3532 68 84.12
51.14 4500 3529 68 84.14
51.24 4599 3522 71 93.61
51.14 4600 3523 71 93.71 <<< Eventually it does return to not throttled.

Example 2 - start with offset 1:

Busy% Bzy_MHz IRQ PkgTmp PkgWatt
51.14 4600 3554 73 94.73
51.37 4600 3544 73 94.85
51.03 4600 3560 74 94.64 <<< offset changed from 1 to 24
51.33 4600 3508 73 94.88 <<< trip point = 76 degrees
51.14 4600 3526 73 94.69 <<< No TCC offset transient
51.22 4600 3614 73 94.85
51.24 4600 3531 73 94.84
51.50 4600 3578 73 94.92
51.15 4600 3571 73 94.77
51.20 4600 3521 73 94.91
51.19 4600 3550 73 94.76
51.27 4600 3522 74 94.81
51.27 4600 3530 74 94.98

... Doug
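The MSR_CORE_PERF_LIMIT_REASONS value discussed in this message (200020, printed in hex) can be split into its set bits as sketched below. This Python sketch is illustrative only and is not the MSR decoder used in the thread: the function names are hypothetical, the split into 16 status bits with a mirrored 16-bit log field is an assumption, and only the name of bit 5 (RATL, with RATLL as its log counterpart) is taken from the thread. All other bits are printed by number rather than guessed.

```python
# Only bit 5 is named in the thread (after the correction to RATL);
# the remaining bit names are deliberately left out.
STATUS_NAMES = {5: "RATL"}


def split_limit_reasons(raw):
    """Return (status_bits, log_bits) as lists of bit positions 0..15.

    Assumes the low 16 bits are live status flags and the high 16 bits
    are the matching sticky log flags.
    """
    status = [b for b in range(16) if (raw >> b) & 1]
    log = [b for b in range(16) if (raw >> (b + 16)) & 1]
    return status, log


def describe(raw):
    """Render the set bits in the 'NAME NAMEL' style quoted in the thread."""
    status, log = split_limit_reasons(raw)
    words = [STATUS_NAMES.get(b, f"bit{b}") for b in status]
    words += [STATUS_NAMES.get(b, f"bit{b}") + "L" for b in log]
    return " ".join(words)


print(describe(0x200020))
```

For the quoted value, one status bit and its matching log bit are set, which is consistent with the corrected reading of "RATL and RATLL".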
Hi, Doug, On Tue, 2021-01-26 at 11:18 -0800, Doug Smythies wrote: > Hi, Just a small follow up on this one: > > On 2021.01.16 09:08 Doug Smythies wrote: > > On 2021.01.15 Zhang Rui wrote: > > ... > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > > 67.93 3773 134577 43 54.78 > > > > For unknown reason the processor seems to now > > think it is not heavily loaded. From my MSR decoder: > > > > 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL > > > > From the book: > > > > > Autonomous Utilization-Based Frequency Control > > > Status (R0) > > > When set, frequency is reduced below the operating > > > system request because the processor has detected > > > that utilization is low. > > > > Which is not true. > > > > Anyway, > > > > Acked-by: Doug Smythies <dsmythies@telus.net> > > O.K. there were 2 things wrong above: > > 1.) I used the wrong intel SDM table for those bit definitions. > They should have been: RATL and RATLL. > > From the proper page of the book: > > > Running Average Thermal Limit Status (R0) > > When set, frequency is reduced below the operating > > system request due to Running Average Thermal Limit > > (RATL). > > 2.) Due to the already discussed turbostat issue, that was not > the actual temperature and so the RATL bit being set was actually > valid at that time. > On my side, I got the "Thermal status bit" set. > I have not been able to find the time window knob for this, if there > even is one, similar to the time window knobs for the package power > limits. > I wanted to reduce the time constant, just as a test, in an attempt > to reduce the step function load potential temperature overshoot. > > One additional informational follow up note: > > There always seems to be a significant turn on transient to using the > TCC offset, appearing as temperature undershoot. I am saying that > an offset of 0 seems to also act as some sort of on/off switch to the > running average. 
> > Example 1 - start with offset 0: > > $ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > 51.17 4600 3531 71 93.57 > 51.37 4600 3543 71 93.60 > 51.37 4600 3590 71 93.63 <<< offset changed from 0 to > 24 > 50.99 3737 3566 52 43.49 <<< trip point = 76 degrees > 51.20 3700 3550 51 41.14 <<< TCC offset turn on > transient > 51.09 3700 3559 51 41.30 <<< There was no need to > throttle > 51.12 3779 3515 53 43.78 > 50.95 4064 3553 58 55.57 > 51.55 4271 3522 63 65.30 > 51.18 4424 3534 67 76.58 > 51.27 4500 3532 68 84.12 > 51.14 4500 3529 68 84.14 > 51.24 4599 3522 71 93.61 > 51.14 4600 3523 71 93.71 <<< Eventually it does return > to not throttled. > > Example 2 - start with offset 1: > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > 51.14 4600 3554 73 94.73 > 51.37 4600 3544 73 94.85 > 51.03 4600 3560 74 94.64 <<< offset changed from 1 to 24 > 51.33 4600 3508 73 94.88 <<< trip point = 76 degrees > 51.14 4600 3526 73 94.69 <<< No TCC offset transient > 51.22 4600 3614 73 94.85 > 51.24 4600 3531 73 94.84 > 51.50 4600 3578 73 94.92 > 51.15 4600 3571 73 94.77 > 51.20 4600 3521 73 94.91 > 51.19 4600 3550 73 94.76 > 51.27 4600 3522 74 94.81 > 51.27 4600 3530 74 94.98 > > Thanks for your test. I'd prefer this is platform specific. Because it behaves really differently from what I observed. $sudo turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 99.45 2216 10656 80 14.81 <<< start with offset=0 99.48 2234 10621 79 15.02 99.47 2233 10436 80 14.96 99.45 2236 10587 79 14.94 99.49 2216 10673 79 15.04 99.46 2226 10685 79 14.87 99.43 2233 10776 79 14.89 99.73 399 9139 66 4.51 <<< offset set to 50 99.76 212 8998 65 3.31 99.77 212 8902 64 3.27 ... 
<<< throttled for 20 seconds 99.76 212 8911 55 2.97 99.77 211 8851 55 2.95 99.76 211 8916 55 2.94 99.77 211 8844 55 3.05 99.77 211 8828 54 3.21 99.77 211 8911 54 3.05 99.74 212 8998 54 3.20 99.77 212 8802 54 2.90 99.77 211 8849 54 2.90 99.76 212 8942 53 2.98 99.76 211 9039 53 3.22 99.74 212 8977 53 2.89 99.77 211 8913 53 2.89 99.76 212 8900 53 2.89 99.77 211 8817 52 2.87 99.77 212 8923 52 2.88 99.77 212 8985 52 2.88 99.73 212 8877 52 2.86 99.58 575 9308 66 5.54 <<< offset set to 32 98.92 2460 13694 66 17.32 98.98 2298 13768 66 15.24 99.03 2244 14652 66 14.48 98.97 2198 14489 66 13.95 99.03 2148 14583 66 13.43 99.02 2107 14093 66 13.45 99.06 2060 13750 66 12.61 99.06 2036 14195 66 12.27 99.07 2007 14240 66 12.07 99.12 2888 12147 98 28.23 <<< offset cleared 99.03 3413 11503 98 37.21 98.96 3317 11698 98 34.64 99.07 3246 11410 98 32.89 98.95 3210 12107 98 32.13 98.94 3164 11790 98 31.08 99.00 3124 12106 98 30.84 99.00 3086 11876 98 29.60 98.94 3054 12482 98 29.00 98.89 3030 12629 98 28.54 99.39 2377 10764 82 17.62 <<< Didn't do anything, so it is probably thermald or something 99.49 2200 10679 81 14.44 99.52 2211 10267 80 14.66 99.49 2221 10318 80 14.71 99.45 2220 10289 81 14.74 99.43 2222 10326 81 14.76 I tried both tests, and the results are the same, in both cases, it starts throttling immediately (within a second), and no over-throttling observed. Do you have a script to do this? Say, run turbostat in background and then change tcc offset at certain timestamp? Maybe we can try exactly the same test on different machines. thanks, rui
> > Rather than enter the actual TCC offset, I would rather enter the
> > desired trip point, and have the driver do the math to convert it
> > to the offset.
>
> Hmmm, a writable trip point? I need to think about this.

I think this is a better idea, and I will export this as a writable trip
point of the x86_pkg_temp_thermal driver later. Thanks for the suggestion.

thanks,
rui
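The conversion being asked for is a single subtraction plus a range check. A minimal Python sketch follows, assuming a default activation temperature of 100 C (the "100 default" reported by turbostat in this thread) and a largest usable offset of 63 (the largest offset that appears in the thread). Both are parameters rather than facts about every platform, and the function name is hypothetical; this is not the driver's implementation.

```python
def trip_to_offset(trip_c, tj_max_c=100, max_offset=63):
    """Convert a desired trip point in degrees C to a TCC offset.

    tj_max_c and max_offset are assumptions for illustration; a real
    implementation would read them from the hardware.
    """
    offset = tj_max_c - trip_c
    if not 0 <= offset <= max_offset:
        raise ValueError(
            f"trip point {trip_c} C needs offset {offset}, "
            f"outside 0..{max_offset}"
        )
    return offset


# Trip points used in the thread and the offsets they correspond to.
for trip in (76, 58, 57, 37):
    print(trip, trip_to_offset(trip))
```

The resulting number is what would be written to the cooling device's cur_state file, as in the `cat .../cooling_device11/cur_state` example earlier in the thread, where 43 corresponds to a 57 degree trip point.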
On Thu, Jan 28, 2021 at 9:30 AM Zhang Rui <rui.zhang@intel.com> wrote:
> On Tue, 2021-01-26 at 11:18 -0800, Doug Smythies wrote:
> > On 2021.01.16 09:08 Doug Smythies wrote:
> > > On 2021.01.15 Zhang Rui wrote:
...
> > They should have been: RATL and RATLL.
> >
> > From the proper page of the book:
> >
> > > Running Average Thermal Limit Status (R0)
> > > When set, frequency is reduced below the operating
> > > system request due to Running Average Thermal Limit
> > > (RATL).
> >
> > 2.) Due to the already discussed turbostat issue, that was not
> > the actual temperature and so the RATL bit being set was actually
> > valid at that time.
>
> On my side, I got the "Thermal status bit" set.

Yes, and if I understand your comment correctly, you are referring to
IA32_THERM_STATUS (0X19C) and/or IA32_PACKAGE_THERM_STATUS (0X1B1).
I am referring to MSR_CORE_PERF_LIMIT_REASONS (0X64F).

> > I have not been able to find the time window knob for this, if there
> > even is one, similar to the time window knobs for the package power
> > limits.

I just assume there is a time window, similar to the RAPL based
power limits. But I haven't found it.

> > I wanted to reduce the time constant, just as a test, in an attempt
> > to reduce the step function load potential temperature overshoot.
...
>
> Thanks for your test.
> I'd prefer this is platform specific.
> Because it behaves really differently from what I observed.

O.K.
These oddities aside, in the end it does do the expected job.

> 99.06   2036    14195   66      12.27
> 99.07   2007    14240   66      12.07
> 99.12   2888    12147   98      28.23   <<< offset cleared
> 99.03   3413    11503   98      37.21
> 98.96   3317    11698   98      34.64

very close to critical temp.
I never knowingly allow my processor to go above 80 degrees.
Although, I admit it hit 90 degrees a couple of times during this work.
> 99.07   3246    11410   98      32.89
> 98.95   3210    12107   98      32.13
> 98.94   3164    11790   98      31.08
> 99.00   3124    12106   98      30.84
> 99.00   3086    11876   98      29.60
> 98.94   3054    12482   98      29.00
> 98.89   3030    12629   98      28.54
> 99.39   2377    10764   82      17.62   <<< Didn't do anything, so it
> is probably thermald or something

or critical temp hit.

>
> I tried both tests, and the results are the same, in both cases, it
> starts throttling immediately (within a second), and no over-throttling
> observed.
>
> Do you have a script to do this?

No, all of my tests were done manually, varying:
. placement of high loads on some cores for more heat over smaller surface area.
. balance between 100% CPU load at max heat versus 100% CPU load at less heat.
. balance between this TCC Offset throttling versus package power limits.
. using ambient (coolant temperature) as a heat removal capacity knob.
In summary: I played around until I found something interesting.

> Say, run turbostat in background and
> then change tcc offset at certain timestamp? Maybe we can try exactly
> the same test on different machines.

I had an idea, and wasted way way too much time trying to make it work.
I thought to just get turbostat to also show the offset, so then we know
for certain when it changed.
I tried virtually all combinations of:

turbostat --Summary --quiet --add /sys/devices/virtual/thermal/cooling_device11/cur_state,,,,TCC --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1
turbostat --Summary --quiet --add msr0x1a2,u32,package,raw,TCC --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1

and could never get it to work in "Summary" mode.
(note: about 95% of my use of turbostat is in "Summary" mode.)
Anyway, after too long, I did get this to work:

turbostat --quiet --cpu 0 --add /sys/devices/virtual/thermal/cooling_device11/cur_state,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 | grep "^ 0"

Example 1:

turbostat --quiet --cpu 0 --add /sys/devices/virtual/thermal/cooling_device11/cur_state,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 | grep "^0"
CPU     Busy%   Bzy_MHz IRQ     TCC             PkgTmp  PkgWatt
0       100.26  4500    1002    0x00000001      78      99.88   <<< Offset = 1
0       100.26  4501    1002    0x00000001      77      99.90   <<< steady state power limit throttle
0       100.26  4501    1004    0x00000001      77      99.92
0       100.26  4500    1002    0x0000001e      78      99.91   <<< offset changed, trip point 70
0       100.25  4502    1003    0x0000001e      77      100.03
0       100.25  4503    1002    0x0000001e      77      99.85
0       100.25  4502    1002    0x0000001e      78      99.92
0       100.26  4501    1003    0x0000001e      78      99.95
0       100.25  4503    1002    0x0000001e      77      99.88
0       100.25  4502    1002    0x0000001e      78      99.86
0       100.25  4502    1004    0x0000001e      77      99.92
0       100.25  4503    1002    0x0000001e      77      99.98
0       100.25  4502    1002    0x0000001e      77      99.88
0       100.26  4498    1004    0x0000001e      77      100.06
0       100.26  4501    1002    0x0000001e      78      99.77
0       100.26  4500    1002    0x0000001e      78      99.53
0       100.26  4430    1002    0x0000001e      72      91.19   <<< Thermal throttling. 13 Seconds
0       100.26  4400    1002    0x0000001e      72      87.55
0       100.26  4400    1002    0x0000001e      71      87.52
0       100.26  4400    1005    0x0000001e      71      87.56
0       100.26  4400    1002    0x0000001e      72      87.53

Example 2:

0       100.26  4600    1002    0x00000000      83      113.26  <<< Offset = 0
0       100.26  4600    1002    0x00000000      84      113.43
0       100.25  4599    1002    0x00000000      83      113.42  <<< No power limit throttle yet.
0       100.26  4600    1004    0x00000000      83      113.40  <<< Not steady state.
0       100.26  4600    1002    0x00000000      83      113.25
0       100.25  3797    1003    0x00000018      56      54.11   <<< Overshoot is immediate.
0       100.26  3700    1002    0x00000018      56      47.09
0       100.26  3700    1002    0x00000018      55      47.08
0       100.26  3700    1002    0x00000018      54      46.98
0       100.26  3820    1002    0x00000018      58      51.62   <<< starts to recover.
0       100.26  4016    1002    0x00000018      62      61.55
0       100.26  4177    1002    0x00000018      64      69.91
0       100.26  4275    1004    0x00000018      68      75.81
0       100.26  4300    1002    0x00000018      68      77.36
0       100.26  4371    1002    0x00000018      71      84.53
0       100.26  4400    1002    0x00000018      72      87.52
0       100.26  4400    1003    0x00000018      72      87.62

Example 3:

This test is specifically an attempt to test the TCC Offset in the
exact way I intend to use it.

trip point = 75 degrees, and never changes.
Power limit 2 is 115 watts, timing window short.
Power limit 1 is 100 watts, timing window 8 seconds.
Note: all previous work was with the timing window at 28 seconds.
Note: typically temperature < 75 at 100 watts.
The load is 4 prime95 maximum heat threads, plus 0 weaker memory
hammering threads.

The coolant had to be preheated for about an hour before this test
started, otherwise the processor would not get hot enough before
package power limit 1 took over the throttling duties.

Now, watching the TCC offset is useless for this test, so let's watch
MSR_CORE_PERF_LIMIT_REASONS instead:

turbostat --add msr0x64f,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ,RAMWatt --interval 1 | grep "^0"

(O.K., I should have changed the added column name. I filter it anyhow,
but manually added back, edited.)

CPU     Busy%   Bzy_MHz IRQ     TCC             PkgTmp  PkgWatt RAMWatt
0       0.07    1081    5       0x08200000      38      2.31    0.45    <<< Note high idle start temp.
0       0.16    824     11      0x08200000      38      2.12    0.45
0       1.74    3430    44      0x00000000      38      2.65    0.45    <<< clear last time's log bits
0       0.16    851     6       0x00000000      37      2.27    0.45
0       4.32    3313    269     0x00000000      75      47.15   0.45    <<< load applied
0       4.24    4585    458     0x08000800      78      97.16   0.45    <<< package power limit 2
0       2.80    4588    482     0x08000000      77      97.49   0.45    <<< temperature just high
0       2.87    4593    463     0x08000000      78      97.95   0.45
0       3.39    4600    465     0x08000000      78      97.68   0.45
0       2.66    4600    462     0x08000000      78      97.55   0.45
0       2.28    4584    490     0x08000000      78      97.97   0.45
0       3.29    4583    478     0x08000000      78      97.72   0.45
0       3.24    4595    465     0x08000000      77      97.52   0.45
0       2.47    4600    465     0x08000000      78      97.50   0.45
0       4.18    4570    464     0x08000000      78      97.72   0.45
0       2.51    4600    470     0x08000000      78      97.40   0.45
0       1.77    4601    482     0x08000000      78      97.33   0.45
0       3.13    4584    462     0x08000000      78      97.57   0.45
0       3.06    4600    466     0x08000000      78      97.77   0.45
0       2.86    4592    461     0x08000000      78      97.56   0.45
0       2.85    4569    486     0x08000000      78      97.99   0.45
0       2.96    4600    465     0x08000000      78      97.91   0.45
0       3.00    4585    451     0x08000000      78      97.68   0.45
0       2.06    4600    475     0x08000000      78      97.50   0.45
0       3.05    4594    462     0x08000000      78      97.78   0.45
0       3.11    4592    461     0x08000000      78      97.68   0.45
0       2.31    4546    463     0x08200020      73      93.00   0.45    <<< RATL
0       2.80    4525    454     0x08200000      78      91.29   0.45    <<< Oscillates within
0       3.32    4538    445     0x08200020      73      91.61   0.45    <<< 1 pstate
0       3.27    4557    434     0x08200000      78      93.12   0.45
0       3.26    4523    470     0x08200020      73      89.85   0.45    <<< rough estimate is
0       2.48    4586    466     0x08200020      74      95.67   0.45    <<< oscillation costs 0.4%
0       1.95    4521    468     0x08200000      76      87.93   0.45    <<< performance loss versus
0       3.28    4569    449     0x08200020      73      94.67   0.45    <<< the power limit 2 servo.
0       0.44    4546    495     0x08200000      78      91.77   0.45    <<< (very crude, hard to defend
0       1.91    4518    487     0x08200020      73      91.24   0.45    <<< data.)
0       3.25    4539    460     0x08200000      78      91.63   0.45
0       2.51    4546    469     0x08200020      74      91.12   0.45
0       3.60    4540    453     0x08200000      77      91.43   0.45
0       3.06    4542    463     0x08200020      73      91.56   0.45

... Doug
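The raw values in the TCC column of Example 3 (really MSR_CORE_PERF_LIMIT_REASONS)
are easier to follow once split into "active now" and "logged earlier" bits.
The sketch below is an illustration only. The bit positions are assumptions
inferred from the annotated rows above (0x08000800 marked as package power
limit 2, 0x08200020 marked as RATL, with each log bit sitting 16 bits above its
status bit), and the helper names are hypothetical. Check the MSR table for the
processor in question before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit positions inferred from the annotated rows above; treat them as
 * assumptions and verify them against the MSR table for the CPU in use.
 */
#define PLR_RATL	(1u << 5)	/* Running Average Thermal Limit */
#define PLR_PL2		(1u << 11)	/* package power limit 2 */
#define PLR_LOG_SHIFT	16		/* log bit = status bit + 16 */

/* Nonzero when the given limit is reducing frequency right now. */
int plr_active(uint32_t raw, uint32_t reason)
{
	return (raw & reason) != 0;
}

/* Nonzero when the given limit has been hit since its log bit was cleared. */
int plr_logged(uint32_t raw, uint32_t reason)
{
	return (raw & (reason << PLR_LOG_SHIFT)) != 0;
}

/* Worked checks against raw values copied from the table above. */
void plr_examples(void)
{
	assert(plr_active(0x08000800, PLR_PL2));   /* "package power limit 2" row */
	assert(plr_active(0x08200020, PLR_RATL));  /* "RATL" row */
	assert(!plr_active(0x08200000, PLR_RATL)); /* logged earlier, not active */
	assert(plr_logged(0x08200000, PLR_RATL));
}
```

Read this way, the alternation between 0x08200020 and 0x08200000 in the last
rows is the RATL status bit toggling from one sample to the next while both
log bits stay set.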
diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
index 8025b21f43fa..67de49cc9fb4 100644
--- a/drivers/thermal/intel/Kconfig
+++ b/drivers/thermal/intel/Kconfig
@@ -75,3 +75,11 @@ config INTEL_PCH_THERMAL
 	  Enable this to support thermal reporting on certain intel PCHs.
 	  Thermal reporting device will provide temperature reading,
 	  programmable trip points and other information.
+
+config INTEL_TCC_COOLING
+	tristate "Intel TCC offset cooling Driver"
+	depends on X86
+	help
+	  Enable this to support system cooling by adjusting the effective TCC
+	  activation temperature via the TCC Offset register, which is widely
+	  supported on modern Intel platforms.
diff --git a/drivers/thermal/intel/Makefile b/drivers/thermal/intel/Makefile
index 0d9736ced5d4..40e86973f88d 100644
--- a/drivers/thermal/intel/Makefile
+++ b/drivers/thermal/intel/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_INTEL_QUARK_DTS_THERMAL) += intel_quark_dts_thermal.o
 obj-$(CONFIG_INT340X_THERMAL) += int340x_thermal/
 obj-$(CONFIG_INTEL_BXT_PMIC_THERMAL) += intel_bxt_pmic_thermal.o
 obj-$(CONFIG_INTEL_PCH_THERMAL) += intel_pch_thermal.o
+obj-$(CONFIG_INTEL_TCC_COOLING) += intel_tcc_cooling.o
diff --git a/drivers/thermal/intel/intel_tcc_cooling.c b/drivers/thermal/intel/intel_tcc_cooling.c
new file mode 100644
index 000000000000..aa6bbb9ba898
--- /dev/null
+++ b/drivers/thermal/intel/intel_tcc_cooling.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * cooling device driver that activates the processor throttling by
+ * programming the TCC Offset register.
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/thermal.h>
+#include <asm/cpu_device_id.h>
+
+#define TCC_SHIFT 24
+#define TCC_MASK (0x3fULL<<24)
+#define TCC_PROGRAMMABLE BIT(30)
+
+static struct thermal_cooling_device *tcc_cdev;
+
+static int tcc_get_max_state(struct thermal_cooling_device *cdev, unsigned long
+			     *state)
+{
+	*state = TCC_MASK >> TCC_SHIFT;
+	return 0;
+}
+
+static int tcc_offset_update(int tcc)
+{
+	u64 val;
+	int err;
+
+	err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+	if (err)
+		return err;
+
+	val &= ~TCC_MASK;
+	val |= tcc << TCC_SHIFT;
+
+	err = wrmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, val);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int tcc_get_cur_state(struct thermal_cooling_device *cdev, unsigned long
+			     *state)
+{
+	u64 val;
+	int err;
+
+	err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+	if (err)
+		return err;
+
+	*state = (val & TCC_MASK) >> TCC_SHIFT;
+	return 0;
+}
+
+static int tcc_set_cur_state(struct thermal_cooling_device *cdev, unsigned long
+			     state)
+{
+	return tcc_offset_update(state);
+}
+
+static const struct thermal_cooling_device_ops tcc_cooling_ops = {
+	.get_max_state = tcc_get_max_state,
+	.get_cur_state = tcc_get_cur_state,
+	.set_cur_state = tcc_set_cur_state,
+};
+
+static const struct x86_cpu_id tcc_ids[] __initconst = {
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, NULL),
+	{}
+};
+
+MODULE_DEVICE_TABLE(x86cpu, tcc_ids);
+
+static int __init tcc_cooling_init(void)
+{
+	int ret;
+	u64 val;
+	const struct x86_cpu_id *id;
+
+	int err;
+
+	id = x86_match_cpu(tcc_ids);
+	if (!id)
+		return -ENODEV;
+
+	err = rdmsrl_safe(MSR_PLATFORM_INFO, &val);
+	if (err)
+		return err;
+
+	if (!(val & TCC_PROGRAMMABLE))
+		return -ENODEV;
+
+	pr_info("Programmable TCC Offset detected\n");
+
+	tcc_cdev =
+	    thermal_cooling_device_register("TCC Offset", NULL,
+					    &tcc_cooling_ops);
+	if (IS_ERR(tcc_cdev)) {
+		ret = PTR_ERR(tcc_cdev);
+		return ret;
+	}
+	return 0;
+}
+
+module_init(tcc_cooling_init)
+
+static void __exit tcc_cooling_exit(void)
+{
+	thermal_cooling_device_unregister(tcc_cdev);
+}
+
+module_exit(tcc_cooling_exit)
+
+MODULE_DESCRIPTION("TCC offset cooling device Driver");
+MODULE_AUTHOR("Zhang Rui <rui.zhang@intel.com>");
+MODULE_LICENSE("GPL v2");
On Intel processors, the core frequency can be reduced below the OS
request when the current temperature reaches the TCC (Thermal Control
Circuit) activation temperature.

The default TCC activation temperature is specified by
MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by specifying an
offset in degrees C, using the TCC Offset bits in the same MSR.

This patch introduces a cooling device driver that utilizes the TCC
Offset feature. The bigger the current cooling state is, the lower the
effective TCC activation temperature is, so that the processors can be
throttled earlier, before the system critically overheats.

This patch has been tested on a KBL mobile platform.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/intel/Kconfig             |   8 ++
 drivers/thermal/intel/Makefile            |   1 +
 drivers/thermal/intel/intel_tcc_cooling.c | 128 ++++++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 drivers/thermal/intel/intel_tcc_cooling.c
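The relationship between cooling state and effective activation temperature
described in the commit message can be tried out in plain userspace C on a
copy of the register value. This is a sketch under stated assumptions, not the
driver's code: it reuses the patch's TCC_SHIFT and TCC_MASK, assumes the
default activation temperature sits in bits 23:16 of
MSR_IA32_TEMPERATURE_TARGET, and the function names are hypothetical. It never
touches a real MSR.

```c
#include <assert.h>
#include <stdint.h>

#define TCC_SHIFT	24
#define TCC_MASK	(0x3fULL << TCC_SHIFT)	/* same field as in the patch */
#define TARGET_SHIFT	16			/* assumed: bits 23:16 */
#define TARGET_MASK	(0xffULL << TARGET_SHIFT)

/* Return a copy of msr with the TCC Offset field replaced by offset. */
uint64_t tcc_set_offset(uint64_t msr, unsigned int offset)
{
	msr &= ~TCC_MASK;
	/* Bits of offset that do not fit the 6-bit field are dropped. */
	msr |= ((uint64_t)offset << TCC_SHIFT) & TCC_MASK;
	return msr;
}

/* Effective activation temperature: default target minus the offset. */
int tcc_effective_temp(uint64_t msr)
{
	int target = (int)((msr & TARGET_MASK) >> TARGET_SHIFT);
	int offset = (int)((msr & TCC_MASK) >> TCC_SHIFT);

	return target - offset;
}

/* Worked checks with an assumed default target of 100 degrees C. */
void tcc_examples(void)
{
	uint64_t msr = 100ULL << TARGET_SHIFT;

	assert(tcc_effective_temp(msr) == 100);
	assert(tcc_effective_temp(tcc_set_offset(msr, 24)) == 76);
}
```

Silently masking an oversized offset is a choice made only to keep the sketch
short. Rejecting values above 63 would be the more defensive behaviour.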