Message ID | 20210619121927.32699-1-ericwouds@gmail.com |
---|---|
State | New |
Series | Fix mt7622.dtsi thermal cpu |
On 19/06/2021 14:19, ericwouds@gmail.com wrote:
> From: Eric Woudstra <ericwouds@gmail.com>
>
> Cpu-thermal is set to use all frequencies already at 47 degrees.
> Using the CPU at 50 for a minute, the CPU has reached 48 degrees, is
> throttled back to lowest setting, making the mt7622 terribly slow.
> Even at this low speed, the CPU does not cool down lower than 47 so
> the CPU is stuck at lowest possible frequency until it is shut down and
> stays off for 15 minutes.
>
> cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>                  <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>
> This should not be set at every cooling map. It should only be set at
> the highest cooling map. Same as in the example:
>
> https://www.kernel.org/doc/Documentation/devicetree/bindings/thermal/thermal.txt line 272
>
> But then without the fan and added a third map.
>
> Now temperature will be regulated at 87 degrees Celsius. At temperatures
> lower than 87, all frequencies can be used.

47°C is really too low a temperature, and this performance drop is expected.

I would not remove the passive mitigation, but try increasing the passive trip point to 70°C and changing the active trip point to 80°C. If it works fine, try 75°C and 85°C.

To test, the thermal killer is dhrystone (one thread per cpu).

With a 75°C passive trip point and the step-wise thermal governor, I think the mitigation will happen smoothly, giving better performance, and the fan probably won't fire.

> Also see the post:
>
> http://forum.banana-pi.org/t/bpi-r64-only-10-cpu-speed-at-already-48-degrees-celcius-speed-not-increasing-anymore/12262
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
>  arch/arm64/boot/dts/mediatek/mt7622.dtsi | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> index 890a942ec..b779c7aa6 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -170,14 +170,14 @@ cpu-crit {
>              cooling-maps {
>                  map0 {
>                      trip = <&cpu_passive>;
> -                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -                                     <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +                    cooling-device = <&cpu0 0 0>,
> +                                     <&cpu1 0 0>;
>                  };
>
>                  map1 {
>                      trip = <&cpu_active>;
> -                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -                                     <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +                    cooling-device = <&cpu0 0 0>,
> +                                     <&cpu1 0 0>;
>                  };
>
>                  map2 {
>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

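As a reading aid for the `cooling-device` cells discussed in this patch, here is an annotated sketch of a single cooling map. It is only a fragment meant to sit inside the `cpu-thermal` zone, not a complete device tree; the comments reflect the generic thermal binding (each entry is a phandle followed by a minimum and a maximum cooling state), and nothing in it is specific to the mt7622 beyond the labels already used in the thread.

```dts
/* Sketch only: one cooling map with its cells annotated. */
#include <dt-bindings/thermal/thermal.h>	/* provides THERMAL_NO_LIMIT */

cooling-maps {
	map0 {
		/* The trip point that activates this map. */
		trip = <&cpu_passive>;

		/*
		 * Each entry is <phandle min-state max-state>.
		 * Cooling state 0 means "no throttling"; higher states
		 * mean lower CPU frequencies. THERMAL_NO_LIMIT leaves the
		 * bound to the cooling device itself, so the governor may
		 * use the whole range once the trip point is crossed.
		 */
		cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
				 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
	};
};
```

Read this way, the `<&cpu0 0 0>` form in the patch pins both bounds to state 0, so that map can no longer throttle the CPU at all.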
It is only useful to set one map, at the regulated temperature, for CPU frequency throttling. This is the same as in the kernel documentation example.

Setting frequency scaling on two different temperature trip points serves no purpose, as the lowest one makes sure the higher one(s) are never reached. It can be applied at only one trip point. Multiple trip points are only useful for fan control, to make sure the fan is not noisier than it needs to be.

The CPU almost comes to a dead stop once it passes the lowest thermal map with frequency throttling.

This is why it is a bug that needs a fix, not only an adjustment.

There is no fan on the BPI-R64.

Anyway, without any throttling at all, during a kernel build of more than an hour the temperature creeps up to 85 degrees.

On Jun 21, 2021, at 8:29 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> [...]

On 23/06/2021 17:35, Eric Woudstra wrote:
> It is only useful to set 1 map with the regulated temperature for cpu
> frequency throttling. Same as in the kernel document example.
>
> It has no use to set frequency scaling on 2 different temperature
> trip points, as the lowest one makes sure the higher one(s) are never
> reached.

I looked more closely at the DT, and the definition reflects a misunderstanding of the thermal framework.

There is one trip point with the passive type and the cpu cooling device, followed by a second trip point with the active type *but* the same cpu cooling device. That is wrong.

And finally, there is the hot trip point as a third mapping, again with the same cooling device.

The hot trip point is only there to notify userspace and let it take immediate action to prevent an emergency shutdown when reaching the critical temperature.

> It can be applied only at 1 trip point. Multiple trip points
> are only useful for fan control to make sure the fan is not too
> noisy when it is not necessary to be noisy.
>
> The CPU will almost come to a dead stop when it starts to pass the
> lowest thermal map with frequency throttling.
>
> This is why it is a bug and needs a fix, not only adjustment.

Yes, you are right. It should be something like (verbatim copy):

diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index 890a942ec608..88c81d24f4ff 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -136,24 +136,18 @@ secmon_reserved: secmon@43000000 {

     thermal-zones {
         cpu_thermal: cpu-thermal {
-            polling-delay-passive = <1000>;
+            polling-delay-passive = <250>;
             polling-delay = <1000>;

             thermal-sensors = <&thermal 0>;

             trips {
                 cpu_passive: cpu-passive {
-                    temperature = <47000>;
+                    temperature = <77000>;
                     hysteresis = <2000>;
                     type = "passive";
                 };

-                cpu_active: cpu-active {
-                    temperature = <67000>;
-                    hysteresis = <2000>;
-                    type = "active";
-                };
-
                 cpu_hot: cpu-hot {
                     temperature = <87000>;
                     hysteresis = <2000>;
@@ -173,18 +167,6 @@ map0 {
                     cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
                                      <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
                 };
-
-                map1 {
-                    trip = <&cpu_active>;
-                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-                                     <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
-                };
-
-                map2 {
-                    trip = <&cpu_hot>;
-                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-                                     <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
-                };
             };
         };
     };

I chose "hot" before because 87 degrees seems OK as the point to start frequency throttling. But yes, it should be passive.

87 is still quite low if I compare this temperature with the Marvell dual-core ARM SoC in the wrt3200acm. Those even went above 100 degrees, so I feel that for an ARM processor inside a router box it is fine to use 87 degrees. But maybe someone at Mediatek can give some more details about operating temperatures.

It may be possible to leave the active map in the device tree, as some users of the bananapi might choose to install a fan, since it is one of the options.

On Jun 23, 2021, at 5:58 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> [...]

On 23/06/2021 20:43, Eric Woudstra wrote:
>
> I choose "hot" before, because 87 degrees seems ok to start frequency
> throttling. But, yes, it should be passive.
>
> 87 is still quite low if I compare this temperature with the
> wrt3200acm Marvell dual core arm soc. They even went above 100
> degrees so I feel for an arm processor inside a router box it is fine
> to use 87 degrees. But maybe someone at Mediatek can give some more
> details about operating temperatures.

Sometimes, the SoC vendor puts a high temperature in the DT just to export the thermal zone and deal with it from userspace. Putting the high temperature there allows userspace (usually a thermal engine - Android stuff) to deal with the mitigation without kernel interaction.

Having more than 100°C could be this kind of setup. Only the operating temperature from the hardware documentation will tell the safe temperature for the silicon.

IMO, 77°C is a good compromise until we get the documented temperature. 87°C sounds a bit too hot to me.

> It may be possible to leave the active map in the device tree as some
> users of the bananapi might choose to install a fan as it is one of
> the options.

The active trip only makes sense if the cooling device is a fan (or any active device), so the mapping points to a fan node, like:

https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-khadas-edge.dtsi#n192

If there is no such [pwm] fan output on the board, no active trip point should be added.

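To make the point about active trips concrete, the following is a rough sketch of what an active trip wired to a fan could look like for a board that actually has a PWM fan output. Everything fan-related here is hypothetical: the `fan0` node, the `&pwm` phandle with its channel and period, and the `cooling-levels` values are placeholders chosen for illustration, not taken from any mt7622 board file. Only the `cpu_thermal` label and the 67°C active temperature come from the thread.

```dts
/* Hypothetical board fragment: active trip mapped to a PWM fan. */
#include <dt-bindings/thermal/thermal.h>

/ {
	/* Placeholder fan node; PWM phandle, channel and period are assumptions. */
	fan0: pwm-fan {
		compatible = "pwm-fan";
		#cooling-cells = <2>;
		pwms = <&pwm 0 40000>;
		/* Duty-cycle steps exposed as cooling states 0..3 (illustrative). */
		cooling-levels = <0 102 170 255>;
	};
};

&cpu_thermal {
	trips {
		cpu_active: cpu-active {
			temperature = <67000>;	/* millidegrees Celsius */
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		/* The active trip drives the fan, not the CPU frequency. */
		map-fan {
			trip = <&cpu_active>;
			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};
```

The design point is that each trip type is paired with a matching kind of cooling device: passive trips with the CPU (frequency throttling), active trips with a device such as a fan.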
For Marvell:

https://www.google.com/url?sa=t&source=web&rct=j&url=https://wiki.kobol.io/helios4/files/som/brochure_a38x_microsom_2017-09-05.pdf

Armada38x maximum die temperature: 115 degrees Celsius. They really do get hotter than 100.

But for the mt7622 I cannot find this value.

On Jun 23, 2021, at 10:08 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> [...]

On 24/06/2021 11:59, Eric Woudstra wrote:
>
> For Marvell:
>
> https://www.google.com/url?sa=t&source=web&rct=j&url=https://wiki.kobol.io/helios4/files/som/brochure_a38x_microsom_2017-09-05.pdf
>
> Armada38x maximum die temperature 115 degrees Celsius. They really get hotter than 100.
>
> But for mt7622 I cannot find this value

Found that:

https://download.kamami.pl/p579344-MT7622A_Datasheet_for_BananaPi_Only%281%29.pdf

Chapter 3.3 - Thermal Characteristics

Given the values, I suggest:

 - Passive - 80°C

 - Hot - 90°C

 - Critical - 100°C

And passive polling set to 250ms.

It sounds like the sensor does not support interrupt mode yet, so IMO a big gap to the Tj is needed to give the polling time to detect the trip point crossing.

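For reference, here is a sketch of how the values suggested above could be written in the `cpu-thermal` zone. It is an illustration, not a submitted patch: the `hysteresis` values are simply carried over from the earlier diff, the `cpu_crit` label is assumed (the thread only shows the `cpu-crit` node name), and the single passive cooling map follows the earlier conclusion that only one map should throttle the CPU.

```dts
/* Sketch under stated assumptions: 80/90/100 degC trips, 250 ms passive polling. */
#include <dt-bindings/thermal/thermal.h>

thermal-zones {
	cpu_thermal: cpu-thermal {
		polling-delay-passive = <250>;	/* ms, used once the passive trip is crossed */
		polling-delay = <1000>;		/* ms, used otherwise */

		thermal-sensors = <&thermal 0>;

		trips {
			cpu_passive: cpu-passive {
				temperature = <80000>;	/* millidegrees Celsius */
				hysteresis = <2000>;
				type = "passive";
			};

			cpu_hot: cpu-hot {
				temperature = <90000>;
				hysteresis = <2000>;
				type = "hot";
			};

			/* Label assumed; only the node name appears in the thread. */
			cpu_crit: cpu-crit {
				temperature = <100000>;
				hysteresis = <2000>;
				type = "critical";
			};
		};

		cooling-maps {
			/* Only the passive trip throttles the CPUs. */
			map0 {
				trip = <&cpu_passive>;
				cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
			};
		};
	};
};
```

The hot and critical trips have no cooling map here: as explained earlier in the thread, the hot trip only notifies userspace, and the critical trip triggers the emergency shutdown.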
The SOC runs unthrotlled slowly to 80 degrees. This takes minutes. Polling interval 1 second or less does not matter much when looking at these temperature rise times After that in more then an hour it slowly creeps up to 85. I believe the design is so that the SOC, under normal circumstances, can run at 1.35 GHz without throttling frequency, without heatsink. It just needs a safeguard for different circumstances. Most of these SOCs can also run in industrial grade circumstances, which means up to 85 degrees ambient temperature already . If not industrial then this would be 60 degrees ambient already But only someone at Mediatek can confirm this Get BlueMail for Android On Jun 24, 2021, 12:21 PM, at 12:21 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >On 24/06/2021 11:59, Eric Woudstra wrote: >> >> For Marvell: >> >> >https://www.google.com/url?sa=t&source=web&rct=j&url=https://wiki.kobol.io/helios4/files/som/brochure_a38x_microsom_2017-09-05.pdf >> >> Armada38x maximum die temperature 115 degrees Celcius. They really >get hotter then 100. >> >> But for mt7622 I cannot find this value > >Found that: > >https://download.kamami.pl/p579344-MT7622A_Datasheet_for_BananaPi_Only%281%29.pdf > >Chapter 3.3 - Thermal Characteristics > >Given the values I suggest: > > - Passive - 80°C > > - Hot - 90°C > > - Critical - 100°C > >And passive polling set to 250ms. > >It sounds like the sensor is not supporting the interrupt mode yet, so >a >big gap is needed with the Tj IMO to give the time to detect the trip >point crossing with the polling. > >> Get BlueMail for Android >> >> On Jun 23, 2021, 10:08 PM, at 10:08 PM, Daniel Lezcano ><daniel.lezcano@linaro.org> wrote: >>> On 23/06/2021 20:43, Eric Woudstra wrote: >>>> >>>> I choose "hot" before, because 87 degrees seems ok to start >frequency >>>> throttling. But, yes, it should be passive. >>>> >>>> 87 is still quite low if I compare this temperature with the >>>> wrt3200acm Marvell dual core arm soc. 
They even went above 100 >>>> degrees so I feel for an arm processor inside a router box it is >fine >>>> to use 87 degrees But maybe someone at Mediatek can give some more >>>> details about operating temperatures. >>> >>> Sometimes, the SoC vendor puts a high temperature in the DT just to >>> export the thermal zone and deal with it from userspace. So putting >the >>> high temp allow the userspace (usually a thermal engine - Android >>> stuff) >>> to deal with the mitigation without a kernel interaction. >>> >>> Having more than 100°C could be this kind of setup. Only the >operating >>> temperature from the hardware documentation will tell the safe >>> temperature for the silicon. >>> >>> IMO, 77°C is a good compromise until getting the documented temp. >87°C >>> sounds to me a bit too hot. >>> >>>> It may be possible to leave the active map in the device tree as >some >>>> users of the bananapi might choose to install a fan as it is one of >>>> the options. >>> >>> The active trip only makes sense if the cooling device is a fan (or >any >>> active device), so the mapping points to a fan node, like: >>> >>> >https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-khadas-edge.dtsi#n192 >>> >>> If there is no such [pwm] fan output on the board, no active trip >point >>> should be added. >>> >>>> Get BlueMail for Android >>>> >>>> On Jun 23, 2021, 5:58 PM, at 5:58 PM, Daniel Lezcano >>>> <daniel.lezcano@linaro.org> wrote: >>>>> On 23/06/2021 17:35, Eric Woudstra wrote: >>>>>> It is only useful to set 1 map with the regulated temperature for >>>>>> cpu frequency throttling. Same as in the kernel document >>>>>> example. >>>>>> >>>>>> >>>>>> It has no use to set frequency scaling on 2 different >>>>>> temperature trip points, as the lowest one makes sure the higher >>>>>> one(s) are never reached. >>>>> >>>>> I looked more closely the DT and there is a misunderstanding of >>>>> the thermal framework in the definition. 
>>>>>
>>>>> There is one trip point with the passive type and the cpu cooling
>>>>> device, followed by a second trip point with the active type *but*
>>>>> the same cpu cooling device. That is wrong.
>>>>>
>>>>> And finally, there is the hot trip point as a third mapping and
>>>>> the same cooling device.
>>>>>
>>>>> The hot trip point is only there to notify userspace and let it
>>>>> take an immediate action to prevent an emergency shutdown when
>>>>> reaching the critical temperature.
>>>>>
>>>>>> It can be applied only at 1 trip point. Multiple trip points are
>>>>>> only useful for fan control to make sure the fan is not too
>>>>>> noisy when it is not necessary to be noisy.
>>>>>>
>>>>>> The CPU will almost come to a dead stop when it starts to pass
>>>>>> the lowest thermal map with frequency throttling.
>>>>>>
>>>>>> This is why it is a bug and needs a fix, not only adjustment.
>>>>>
>>>>> Yes, you are right. It should be something like (verbatim copy):
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>>>> b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>>>> index 890a942ec608..88c81d24f4ff 100644
>>>>> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>>>> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>>>> @@ -136,24 +136,18 @@ secmon_reserved: secmon@43000000 {
>>>>>
>>>>>  	thermal-zones {
>>>>>  		cpu_thermal: cpu-thermal {
>>>>> -			polling-delay-passive = <1000>;
>>>>> +			polling-delay-passive = <250>;
>>>>>  			polling-delay = <1000>;
>>>>>
>>>>>  			thermal-sensors = <&thermal 0>;
>>>>>
>>>>>  			trips {
>>>>>  				cpu_passive: cpu-passive {
>>>>> -					temperature = <47000>;
>>>>> +					temperature = <77000>;
>>>>>  					hysteresis = <2000>;
>>>>>  					type = "passive";
>>>>>  				};
>>>>>
>>>>> -				cpu_active: cpu-active {
>>>>> -					temperature = <67000>;
>>>>> -					hysteresis = <2000>;
>>>>> -					type = "active";
>>>>> -				};
>>>>> -
>>>>>  				cpu_hot: cpu-hot {
>>>>>  					temperature = <87000>;
>>>>>  					hysteresis = <2000>;
>>>>> @@ -173,18 +167,6 @@ map0 {
>>>>>  					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>>>>  							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>>>>  				};
>>>>> -
>>>>> -				map1 {
>>>>> -					trip = <&cpu_active>;
>>>>> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>>>> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>>>> -				};
>>>>> -
>>>>> -				map2 {
>>>>> -					trip = <&cpu_hot>;
>>>>> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>>>> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>>>> -				};
>>>>>  			};
>>>>>  		};
>>>>>  	};
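For readers following the quoted diff, this is roughly what the cpu-thermal zone in mt7622.dtsi would look like with it applied. It is a sketch reconstructed from the hunks above, not a tested patch. The `type = "hot"` line and the `trip` property of map0 are not visible in the quoted context and are assumed here; the cpu-crit trip, which the diff does not touch, is left out.

```dts
/* Sketch only: cpu-thermal after the proposed change, rebuilt from the quoted diff. */
thermal-zones {
	cpu_thermal: cpu-thermal {
		polling-delay-passive = <250>;	/* ms, used while passive cooling is active */
		polling-delay = <1000>;		/* ms, used otherwise */

		thermal-sensors = <&thermal 0>;

		trips {
			cpu_passive: cpu-passive {
				temperature = <77000>;	/* millidegrees Celsius */
				hysteresis = <2000>;
				type = "passive";
			};

			cpu_hot: cpu-hot {
				temperature = <87000>;
				hysteresis = <2000>;
				type = "hot";	/* assumed, not shown in the diff context */
			};

			/* cpu-crit trip: unchanged by the diff, omitted in this sketch */
		};

		cooling-maps {
			/* Only the passive trip is mapped to the CPU cooling devices. */
			map0 {
				trip = <&cpu_passive>;	/* assumed, not shown in the diff context */
				cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
			};
		};
	};
};
```

The hot trip stays in the zone so userspace is still notified, but it no longer has a cooling map of its own.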
On Fri, Jun 25, 2021 at 10:16:43AM +0200, Frank Wunderlich wrote:
> Hi,
>
> > Sent: Thursday, 24 June 2021 at 15:29
> > From: "Eric Woudstra" <ericwouds@gmail.com>
> > The SoC slowly runs up to 80 degrees unthrottled. This takes minutes. A polling interval of 1 second or less does not matter much when looking at these temperature rise times.
> >
> > After that, over more than an hour, it slowly creeps up to 85. I believe the design is such that the SoC, under normal circumstances, can run at 1.35 GHz without frequency throttling and without a heatsink. It just needs a safeguard for different circumstances.
> >
> > Most of these SoCs can also run in industrial-grade circumstances, which already means up to 85 degrees ambient temperature. If not industrial, then this would already be 60 degrees ambient.
> >
> > But only someone at Mediatek can confirm this.
>
> Maybe Matthias knows somebody?
> The get_maintainers script shows no MTK employee for the mtk_thermal driver; added Sean and Ryder as common Linux contacts...
>
> Daniel from OpenWrt has some other mt7622 boards; maybe he can test the fan approach below.

I got Linksys E8450 aka. Belkin RT3200 ( https://fcc.io/K7S-03571 ) as
well as Ubiquiti UniFi 6 LR ( https://fcc.io/SWX-U6LR ). Both got quite
massive customized heatsinks (see internal photos on FCC submission),
which results in much better heat dissipation than just having the
naked chip like on the BPi-R64.
Hence I also can't test the fan approach on boards other than the R64.
> > > On Jun 24, 2021, at 12:21 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> > > Found that:
> > >
> > > https://download.kamami.pl/p579344-MT7622A_Datasheet_for_BananaPi_Only%281%29.pdf
> > >
> > > Chapter 3.3 - Thermal Characteristics
> > >
> > > Given the values I suggest:
> > >
> > >  - Passive - 80°C
> > >
> > >  - Hot - 90°C
> > >
> > >  - Critical - 100°C
>
> maybe adding a fan (R64, don't know for other mt7622 boards) for the lower 2 trips (with adjusted temperature points) and cpu throttling for the upper 2 trips
>
> something like this (used the 70/80 trip points discussed before):
>
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -134,6 +134,13 @@
>  		};
>  	};
>
> +	fan0: pwm-fan {
> +		compatible = "pwm-fan";
> +		#cooling-cells = <2>;
> +		pwms = <&pwm 2 10000 0>;
> +		cooling-levels = <0 102 170 230>;
> +	};
> +
>  	thermal-zones {
>  		cpu_thermal: cpu-thermal {
>  			polling-delay-passive = <1000>;
> @@ -143,13 +150,13 @@
>
>  			trips {
>  				cpu_passive: cpu-passive {
> -					temperature = <47000>;
> +					temperature = <70000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>
>  				cpu_active: cpu-active {
> -					temperature = <67000>;
> +					temperature = <80000>;
>  					hysteresis = <2000>;
>  					type = "active";
>  				};
> @@ -170,14 +177,12 @@
>  			cooling-maps {
>  				map0 {
>  					trip = <&cpu_passive>;
> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>  				};
>
>  				map1 {
>  					trip = <&cpu_active>;
> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>  				};
>
>  				map2 {
> @@ -428,6 +433,7 @@
>  	pwm: pwm@11006000 {
>  		compatible = "mediatek,mt7622-pwm";
>  		reg = <0 0x11006000 0 0x1000>;
> +		#pwm-cells = <3>;
>  		interrupts = <GIC_SPI 77 IRQ_TYPE_LEVEL_LOW>;
>  		clocks = <&topckgen CLK_TOP_PWM_SEL>,
>  			 <&pericfg CLK_PERI_PWM_PD>,
>
>
> regards Frank
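The fan node in the proposal above packs several conventions into a few cells. The annotated copy below is a sketch of how those cells are usually read under the generic pwm-fan and PWM bindings. The comments are interpretation, not something stated in the thread, and whether channel index 2 is the pin the thread later calls "pwm3" is an assumption (zero-based numbering).

```dts
/* Sketch only: annotated copy of the proposed fan node. */
fan0: pwm-fan {
	compatible = "pwm-fan";

	/* Thermal consumers reference this node as <&fan0 min-state max-state>. */
	#cooling-cells = <2>;

	/*
	 * With #pwm-cells = <3> on the controller the cells are
	 * <channel period-in-ns flags>: channel 2, 10000 ns period, no flags.
	 */
	pwms = <&pwm 2 10000 0>;

	/*
	 * PWM duty values in the range 0..255, one per cooling state.
	 * Four entries give cooling states 0 (fan off) through 3.
	 */
	cooling-levels = <0 102 170 230>;
};
```

With this table, a cooling map of `<&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>` lets the governor pick any of the four states.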
> Sent: Friday, 25 June 2021 at 11:22
> From: "Daniel Golle" <daniel@makrotopia.org>

> On Fri, Jun 25, 2021 at 10:16:43AM +0200, Frank Wunderlich wrote:
> > Daniel from OpenWrt has some other mt7622 boards; maybe he can test the fan approach below.
>
> I got Linksys E8450 aka. Belkin RT3200 ( https://fcc.io/K7S-03571 ) as
> well as Ubiquiti UniFi 6 LR ( https://fcc.io/SWX-U6LR ). Both got quite
> massive customized heatsinks (see internal photos on FCC submission),
> which results in much better heat dissipation than just having the
> naked chip like on the BPi-R64.
> Hence I also can't test the fan approach on boards other than the R64.

Are both of your mt7622 boards missing the fan socket, or is it not connected to pwm3?
If so, we need to move the fan parts to mt7622-bananapi-r64.dts instead of mt7622.dtsi.

regards Frank
Hi Frank,

On 25/06/2021 10:16, Frank Wunderlich wrote:
> Hi,
>
>> Sent: Thursday, 24 June 2021 at 15:29
>> From: "Eric Woudstra" <ericwouds@gmail.com>
>> The SoC slowly runs up to 80 degrees unthrottled. This takes minutes. A polling interval of 1 second or less does not matter much when looking at these temperature rise times.
>>
>> After that, over more than an hour, it slowly creeps up to 85. I
>> believe the design is such that the SoC, under normal circumstances, can run at 1.35 GHz without frequency throttling and without a heatsink. It just needs a safeguard for different circumstances.
>>
>> Most of these SoCs can also run in industrial-grade circumstances, which already means up to 85 degrees ambient temperature. If not industrial, then this would already be 60 degrees ambient.
>>
>> But only someone at Mediatek can confirm this.
>
> Maybe Matthias knows somebody? The get_maintainers script shows no MTK
> employee for the mtk_thermal driver; added Sean and Ryder as common Linux contacts...
>
> Daniel from OpenWrt has some other mt7622 boards; maybe he can test the fan approach below.
>
>> On Jun 24, 2021, at 12:21 PM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>> Found that:
>>>
>>> https://download.kamami.pl/p579344-MT7622A_Datasheet_for_BananaPi_Only%281%29.pdf
>>>
>>> Chapter 3.3 - Thermal Characteristics
>>>
>>> Given the values I suggest:
>>>
>>>  - Passive - 80°C
>>>
>>>  - Hot - 90°C
>>>
>>>  - Critical - 100°C
>
> maybe adding a fan (R64, don't know for other mt7622 boards) for the lower
> 2 trips (with adjusted temperature points) and cpu throttling for the upper 2 trips

It depends on what you want to achieve first:

 - better / sustained performance: then the fan comes first

 - quiet device or power saving (on battery): then cpu throttling comes first

That is board specific; it should be tuned in the board-specific DT file.
Some comments below:

> something like this (used the 70/80 trip points discussed before):
>
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi

You should not add the fan in the mt7622.dtsi itself but in the board
specific file where there is a fan output on it. mt7622.dtsi is supposed
to be the SoC itself AFAICT.

For instance:

https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n39

https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n164

> @@ -134,6 +134,13 @@
>  		};
>  	};
>
> +	fan0: pwm-fan {
> +		compatible = "pwm-fan";
> +		#cooling-cells = <2>;
> +		pwms = <&pwm 2 10000 0>;
> +		cooling-levels = <0 102 170 230>;
> +	};
> +
>  	thermal-zones {
>  		cpu_thermal: cpu-thermal {
>  			polling-delay-passive = <1000>;
> @@ -143,13 +150,13 @@
>
>  			trips {
>  				cpu_passive: cpu-passive {
> -					temperature = <47000>;
> +					temperature = <70000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>
>  				cpu_active: cpu-active {
> -					temperature = <67000>;
> +					temperature = <80000>;
>  					hysteresis = <2000>;
>  					type = "active";
>  				};
> @@ -170,14 +177,12 @@
>  			cooling-maps {
>  				map0 {
>  					trip = <&cpu_passive>;
> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>  				};

fan == active trip point

This is referring to the passive trip point. So it should point to the
CPU as it is now. Note the order of mitigation is inverted regarding the
proposal description.
>
>  				map1 {
>  					trip = <&cpu_active>;
> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>  				};
>
>  				map2 {
> @@ -428,6 +433,7 @@
>  	pwm: pwm@11006000 {
>  		compatible = "mediatek,mt7622-pwm";
>  		reg = <0 0x11006000 0 0x1000>;
> +		#pwm-cells = <3>;
>  		interrupts = <GIC_SPI 77 IRQ_TYPE_LEVEL_LOW>;
>  		clocks = <&topckgen CLK_TOP_PWM_SEL>,
>  			 <&pericfg CLK_PERI_PWM_PD>,
>
>
> regards Frank
>
Hi Frank,

On Fri, Jun 25, 2021 at 11:31:59AM +0200, Frank Wunderlich wrote:
> > Sent: Friday, 25 June 2021 at 11:22
> > From: "Daniel Golle" <daniel@makrotopia.org>
>
> > On Fri, Jun 25, 2021 at 10:16:43AM +0200, Frank Wunderlich wrote:
> > > Daniel from OpenWrt has some other mt7622 boards; maybe he can test the fan approach below.
> >
> > I got Linksys E8450 aka. Belkin RT3200 ( https://fcc.io/K7S-03571 ) as
> > well as Ubiquiti UniFi 6 LR ( https://fcc.io/SWX-U6LR ). Both got quite
> > massive customized heatsinks (see internal photos on FCC submission),
> > which results in much better heat dissipation than just having the
> > naked chip like on the BPi-R64.
> > Hence I also can't test the fan approach on boards other than the R64.
>
> Are both of your mt7622 boards missing the fan socket, or is it not connected to pwm3?
> If so, we need to move the fan parts to mt7622-bananapi-r64.dts instead of mt7622.dtsi.

Neither device is intended to have a fan. The E8450 has an unknown
connector which **could** be for a fan, but I never checked whether and
how it is actually connected to the SoC. It could just as well be an
additional USB 2.0 port (as it has 4 pins).
Hence I suggest adding the fan on PWM3 for the BPi-R64 only for now.

Cheers

Daniel
I chose "hot" with the CPU because it had the best temperature. But it should really be passive only, with the CPU as the cooling device, at a much higher temperature. For me 87 degrees is fine and tested. But for mainline we had better ask Mediatek for the correct maximum temperature.

On Jun 25, 2021, at 1:03 PM, Frank Wunderlich <frank-w@public-files.de> wrote:
>Hi
>
>> Sent: Friday, 25 June 2021 at 11:57
>> From: "Daniel Lezcano" <daniel.lezcano@linaro.org>
>
>> You should not add the fan in the mt7622.dtsi itself but in the board
>> specific file where there is a fan output on it. mt7622.dtsi is supposed
>> to be the SoC itself AFAICT.
>>
>> For instance:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n39
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n164
>
>> > @@ -170,14 +177,12 @@
>> >  			cooling-maps {
>> >  				map0 {
>> >  					trip = <&cpu_passive>;
>> > -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> >  				};
>>
>> fan == active trip point
>>
>> This is referring to the passive trip point. So it should point to the
>> CPU as it is now. Note the order of mitigation is inverted regarding the
>> proposal description.
>
>but we need to disable the passive trip as cpu throttling starts
>there...the higher temperature trips are currently not reached
>
>summary
>
>moving fan and cpu_thermal override to bananapi-r64.dts
>
>passive trip: cooling-device = <&cpu0/1 0 0> as in Eric's patch
>active trip: cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>the other 2 unchanged
>
>but I suggest changing the temperature points in the mt7622 dtsi as this is
>SoC specific
>
>so basically:
>
>--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>@@ -143,13 +143,13 @@ cpu_thermal: cpu-thermal {
>
> 			trips {
> 				cpu_passive: cpu-passive {
>-					temperature = <47000>;
>+					temperature = <70000>;
> 					hysteresis = <2000>;
> 					type = "passive";
> 				};
>
> 				cpu_active: cpu-active {
>-					temperature = <67000>;
>+					temperature = <80000>;
> 					hysteresis = <2000>;
> 					type = "active";
> 				};
>@@ -170,8 +170,8 @@ cpu-crit {
> 			cooling-maps {
> 				map0 {
> 					trip = <&cpu_passive>;
>-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>+					cooling-device = <&cpu0 0 0>,
>+							 <&cpu1 0 0>;
> 				};
>
> 				map1 {
>@@ -428,6 +428,7 @@ uart3: serial@11005000 {
> 	pwm: pwm@11006000 {
> 		compatible = "mediatek,mt7622-pwm";
> 		reg = <0 0x11006000 0 0x1000>;
>+		#pwm-cells = <3>;
> 		interrupts = <GIC_SPI 77 IRQ_TYPE_LEVEL_LOW>;
> 		clocks = <&topckgen CLK_TOP_PWM_SEL>,
> 			 <&pericfg CLK_PERI_PWM_PD>,
>
>--- a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
>+++ b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
>@@ -37,6 +37,13 @@ cpu@1 {
> 		};
> 	};
>
>+	fan0: pwm-fan {
>+		compatible = "pwm-fan";
>+		#cooling-cells = <2>;
>+		pwms = <&pwm 2 10000 0>;
>+		cooling-levels = <0 102 170 230>;
>+	};
>+
> 	gpio-keys {
> 		compatible = "gpio-keys";
>
>@@ -582,6 +589,29 @@ &u3phy {
> 	status = "okay";
> };
>
>+&cpu_thermal {
>+	cooling-maps {
>+		map1 {
>+			trip = <&cpu_active>;
>+			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>+		};
>+	};
>+};
>+
> &uart0 {
> 	pinctrl-names = "default";
> 	pinctrl-0 = <&uart0_pins>;
On 25/06/2021 13:03, Frank Wunderlich wrote:
> Hi
>
>> Sent: Friday, 25 June 2021 at 11:57
>> From: "Daniel Lezcano" <daniel.lezcano@linaro.org>
>
>> You should not add the fan in the mt7622.dtsi itself but in the board
>> specific file where there is a fan output on it. mt7622.dtsi is supposed
>> to be the SoC itself AFAICT.
>>
>> For instance:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n39
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n164
>
>>> @@ -170,14 +177,12 @@
>>>  			cooling-maps {
>>>  				map0 {
>>>  					trip = <&cpu_passive>;
>>> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> +					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>>  				};
>>
>> fan == active trip point
>>
>> This is referring to the passive trip point. So it should point to the
>> CPU as it is now. Note the order of mitigation is inverted regarding the
>> proposal description.
>
> but we need to disable the passive trip as cpu throttling starts there...the higher temperature trips are currently not reached

Sorry, can you rephrase it? I'm not getting the point.

> summary
>
> moving fan and cpu_thermal override to bananapi-r64.dts
>
> passive trip: cooling-device = <&cpu0/1 0 0> as in Eric's patch
> active trip: cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> the other 2 unchanged
>
> but I suggest changing the temperature points in the mt7622 dtsi as this is SoC specific
>
> so basically:
>
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -143,13 +143,13 @@ cpu_thermal: cpu-thermal {
>
>  			trips {
>  				cpu_passive: cpu-passive {
> -					temperature = <47000>;
> +					temperature = <70000>;

Maybe increase the passive temp to 75°C.
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>
>  				cpu_active: cpu-active {
> -					temperature = <67000>;
> +					temperature = <80000>;
>  					hysteresis = <2000>;
>  					type = "active";
>  				};

Move the active trip 'cpu_active' to mt7622-bananapi-bpi-r64.dts and
set it to 70°C there, so the fan will act before the cpu throttling.

The behavior should be the following: when the temperature reaches 70°C,
the fan starts; if the temperature continues to increase, the fan speeds
up. If the temperature reaches 75°C, the fan is still rotating at full
speed but the cpu begins to be throttled.

AFAIU, it is a Cortex-A53 running at 1.35 GHz, so unless the board is in a
black metal box under the sun, I don't see how we can reach these thermal
limits.

> @@ -170,8 +170,8 @@ cpu-crit {
>  			cooling-maps {
>  				map0 {
>  					trip = <&cpu_passive>;
> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					cooling-device = <&cpu0 0 0>,
> +							 <&cpu1 0 0>;

You should keep it untouched.
>  				};
>
>  				map1 {
> @@ -428,6 +428,7 @@ uart3: serial@11005000 {
>  	pwm: pwm@11006000 {
>  		compatible = "mediatek,mt7622-pwm";
>  		reg = <0 0x11006000 0 0x1000>;
> +		#pwm-cells = <3>;
>  		interrupts = <GIC_SPI 77 IRQ_TYPE_LEVEL_LOW>;
>  		clocks = <&topckgen CLK_TOP_PWM_SEL>,
>  			 <&pericfg CLK_PERI_PWM_PD>,
>
> --- a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
> +++ b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
> @@ -37,6 +37,13 @@ cpu@1 {
>  		};
>  	};
>
> +	fan0: pwm-fan {
> +		compatible = "pwm-fan";
> +		#cooling-cells = <2>;
> +		pwms = <&pwm 2 10000 0>;
> +		cooling-levels = <0 102 170 230>;
> +	};
> +
>  	gpio-keys {
>  		compatible = "gpio-keys";
>
> @@ -582,6 +589,29 @@ &u3phy {
>  	status = "okay";
>  };
>
> +&cpu_thermal {
> +	cooling-maps {
> +		map1 {
> +			trip = <&cpu_active>;
> +			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +		};
> +	};
> +};
> +
>  &uart0 {
>  	pinctrl-names = "default";
>  	pinctrl-0 = <&uart0_pins>;
>
> Sent: Friday, 25 June 2021 at 13:47
> From: "Daniel Lezcano" <daniel.lezcano@linaro.org>

> > but we need to disable the passive trip as cpu throttling starts there...the higher temperature trips are currently not reached
>
> Sorry, can you rephrase it? I'm not getting the point.

The problem currently is that passive is at 47 degrees Celsius and throttles the cpu; the active (67°C) and hot points are never reached this way.
So at least we need to change the temperatures in the dtsi, and maybe disable cpu throttling on the passive trip. IMHO the fan will never start if it is on the active trip and the cpu is already throttled at the passive trip.

> > summary
> >
> > moving fan and cpu_thermal override to bananapi-r64.dts
> >
> > passive trip: cooling-device = <&cpu0/1 0 0> as in Eric's patch
> > active trip: cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > the other 2 unchanged
> >
> > but I suggest changing the temperature points in the mt7622 dtsi as this is SoC specific
> >
> > so basically:
> >
> > --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> > +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> > @@ -143,13 +143,13 @@ cpu_thermal: cpu-thermal {
> >
> >  			trips {
> >  				cpu_passive: cpu-passive {
> > -					temperature = <47000>;
> > +					temperature = <70000>;
>
> Maybe increase the passive temp to 75°C.
>
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> >
> >  				cpu_active: cpu-active {
> > -					temperature = <67000>;
> > +					temperature = <80000>;
> >  					hysteresis = <2000>;
> >  					type = "active";
> >  				};
>
> Move the active trip 'cpu_active' to mt7622-bananapi-bpi-r64.dts and
> set it to 70°C there, so the fan will act before the cpu throttling.
>
> The behavior should be the following: when the temperature reaches 70°C,
> the fan starts; if the temperature continues to increase, the fan speeds
> up. If the temperature reaches 75°C, the fan is still rotating at full
> speed but the cpu begins to be throttled.

Passive to 75 and active lowered to 70? Is it intended that active comes before passive?

mt7622-bananapi-bpi-r64.dts:

&cpu_thermal {
	trips {
		cpu_passive: cpu-passive {
			temperature = <75000>;
			hysteresis = <2000>;
			type = "passive";
		};

		cpu_active: cpu-active {
			temperature = <70000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		map1 {
			trip = <&cpu_active>;
			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};

> AFAIU, it is a Cortex-A53 running at 1.35 GHz, so unless the board is in a
> black metal box under the sun, I don't see how we can reach these thermal
> limits.
>
> > @@ -170,8 +170,8 @@ cpu-crit {
> >  			cooling-maps {
> >  				map0 {
> >  					trip = <&cpu_passive>;
> > -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > +					cooling-device = <&cpu0 0 0>,
> > +							 <&cpu1 0 0>;
>
> You should keep it untouched.

Then the cpu is throttled at the passive point (currently 47°C) and IMHO the fan does not start at active.

> >  				};
On 25/06/2021 14:28, Frank Wunderlich wrote:
>> Sent: Friday, 25 June 2021 at 13:47
>> From: "Daniel Lezcano" <daniel.lezcano@linaro.org>
>
>>> but we need to disable the passive trip as cpu throttling starts there...the higher temperature trips are currently not reached
>>
>> Sorry, can you rephrase it? I'm not getting the point.
>
> The problem currently is that passive is at 47 degrees Celsius and
> throttles the cpu; the active (67°C) and hot points are never reached this way.
> So at least we need to change the temperatures in the dtsi, and maybe disable
> cpu throttling on the passive trip. IMHO the fan will never start if it is on
> the active trip and the cpu is already throttled at the passive trip.

Ok, thanks for the clarification.

>>> summary
>>>
>>> moving fan and cpu_thermal override to bananapi-r64.dts
>>>
>>> passive trip: cooling-device = <&cpu0/1 0 0> as in Eric's patch
>>> active trip: cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> the other 2 unchanged
>>>
>>> but I suggest changing the temperature points in the mt7622 dtsi as this is SoC specific
>>>
>>> so basically:
>>>
>>> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>> @@ -143,13 +143,13 @@ cpu_thermal: cpu-thermal {
>>>
>>>  			trips {
>>>  				cpu_passive: cpu-passive {
>>> -					temperature = <47000>;
>>> +					temperature = <70000>;
>>
>> Maybe increase the passive temp to 75°C.
>>
>>>  					hysteresis = <2000>;
>>>  					type = "passive";
>>>  				};
>>>
>>>  				cpu_active: cpu-active {
>>> -					temperature = <67000>;
>>> +					temperature = <80000>;
>>>  					hysteresis = <2000>;
>>>  					type = "active";
>>>  				};
>>
>> Move the active trip 'cpu_active' to mt7622-bananapi-bpi-r64.dts and
>> set it to 70°C there, so the fan will act before the cpu throttling.
>>
>> The behavior should be the following: when the temperature reaches 70°C,
>> the fan starts; if the temperature continues to increase, the fan speeds
>> up. If the temperature reaches 75°C, the fan is still rotating at full
>> speed but the cpu begins to be throttled.
>
> Passive to 75 and active lowered to 70? Is it intended that active comes before passive?

Yes. So there is a default passive mitigation temp for the SoC at 75°C.
And the bpi has a setup with a fan mitigating before the cpu throttling.

> mt7622-bananapi-bpi-r64.dts:
>
> &cpu_thermal {
> 	trips {
> 		cpu_passive: cpu-passive {
> 			temperature = <75000>;
> 			hysteresis = <2000>;
> 			type = "passive";
> 		};

No need to add this trip point, it should be changed to 75°C in the SoC DT
mt7622.dtsi. This fragment of DT will be merged with the previous one.

> 		cpu_active: cpu-active {
> 			temperature = <70000>;
> 			hysteresis = <2000>;
> 			type = "active";
> 		};
> 	};
>
> 	cooling-maps {
> 		map1 {
> 			trip = <&cpu_active>;
> 			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> 		};
> 	};
> };
>
>
>> AFAIU, it is a Cortex-A53 running at 1.35 GHz, so unless the board is in a
>> black metal box under the sun, I don't see how we can reach these thermal
>> limits.
>>
>>> @@ -170,8 +170,8 @@ cpu-crit {
>>>  			cooling-maps {
>>>  				map0 {
>>>  					trip = <&cpu_passive>;
>>> -					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>> -							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> +					cooling-device = <&cpu0 0 0>,
>>> +							 <&cpu1 0 0>;
>>
>> You should keep it untouched.
>
> Then the cpu is throttled at the passive point (currently 47°C) and IMHO the fan does not start at active.
>
>>>  				};
>
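Putting the review comments of this message together, the split between SoC file and board file could look like the sketch below. It assumes the `cpu_active` trip and its map are dropped from mt7622.dtsi, that the passive trip there is raised to 75°C and keeps its CPU cooling map, and that a `fan0` pwm-fan node as proposed earlier in the thread exists in the board file. None of this is a submitted patch.

```dts
/* Sketch only, in mt7622.dtsi: SoC default, CPU throttling from 75°C. */
cpu_passive: cpu-passive {
	temperature = <75000>;	/* millidegrees Celsius */
	hysteresis = <2000>;
	type = "passive";
};

/* Sketch only, in mt7622-bananapi-bpi-r64.dts: the fan acts first, at 70°C. */
&cpu_thermal {
	trips {
		cpu_active: cpu-active {
			temperature = <70000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		map1 {
			trip = <&cpu_active>;
			/* fan0 is the assumed board-level pwm-fan node */
			cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};
```

Because the board fragment is merged into the node defined in the dtsi, the passive trip does not need to be repeated in the board file.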
diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index 890a942ec..b779c7aa6 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -170,14 +170,14 @@ cpu-crit {
 			cooling-maps {
 				map0 {
 					trip = <&cpu_passive>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map1 {
 					trip = <&cpu_active>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map2 {
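The two cells after the phandle in `cooling-device` are the minimum and maximum cooling state the map may request. The annotated fragment below is a sketch of what the patch changes in those terms. It relies on `THERMAL_NO_LIMIT` being defined as `(~0)` in `dt-bindings/thermal/thermal.h`, and on cooling state 0 of a CPU cooling device meaning "no frequency reduction", which is the usual convention rather than something spelled out in this thread.

```dts
#include <dt-bindings/thermal/thermal.h>	/* THERMAL_NO_LIMIT */

map0 {
	trip = <&cpu_passive>;

	/*
	 * Before the patch:
	 *   <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
	 * No lower or upper bound, so the governor may walk the CPU
	 * through every cooling state down to the lowest frequency.
	 *
	 * After the patch: min = max = 0, so the only state this map
	 * can request is 0 and the trip effectively stops throttling.
	 */
	cooling-device = <&cpu0 0 0>,
			 <&cpu1 0 0>;
};
```

This is why the reviewers preferred raising the trip temperature over pinning the state: with `0 0` the map stays in the tree but can no longer cool anything.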