Message ID | 20220120200153.1214-1-benl@squareup.com |
---|---|
State | New |
Headers | show |
Series | [v3] drivers: thermal: tsens: respect thermal_device_mode in threshold irq reporting | expand |
On 20/01/2022 21:01, Benjamin Li wrote: > 'echo disabled > .../thermal_zoneX/mode' will disable the thermal core's > polling mechanism to check for threshold trips. This is used sometimes to > run performance test cases. > > However, tsens supports an interrupt mechanism to receive notification of > trips, implemented in commit 634e11d5b450 ("drivers: thermal: tsens: Add > interrupt support"). > > Currently the thermal zone mode that's set by userspace is not checked > before propagating threshold trip events from IRQs. Let's fix this to > restore the abilty to disable thermal throttling at runtime. > > ==================== > > Tested on MSM8939 running 5.16.0. This platform has 8 cores; the first > four thermal zones control cpu0-3 and the last zone is for the other four > CPUs together. > > for f in /sys/class/thermal/thermal_zone*; do > echo "disabled" > $f/mode > echo $f | paste - $f/type $f/mode > done > > /sys/class/thermal/thermal_zone0 cpu0-thermal disabled > /sys/class/thermal/thermal_zone1 cpu1-thermal disabled > /sys/class/thermal/thermal_zone2 cpu2-thermal disabled > /sys/class/thermal/thermal_zone3 cpu3-thermal disabled > /sys/class/thermal/thermal_zone4 cpu4567-thermal disabled > > With mitigation thresholds at 75 degC and load running, we can now cruise > past temp=75000 without CPU throttling kicking in. > > watch -n 1 "grep '' /sys/class/thermal/*/temp > /sys/class/thermal/*/cur_state > /sys/bus/cpu/devices/cpu*/cpufreq/cpuinfo_cur_freq" > > /sys/class/thermal/thermal_zone0/temp:82000 > /sys/class/thermal/thermal_zone1/temp:84000 > /sys/class/thermal/thermal_zone2/temp:87000 > /sys/class/thermal/thermal_zone3/temp:84000 > /sys/class/thermal/thermal_zone4/temp:84000 > /sys/class/thermal/cooling_device0/cur_state:0 > /sys/class/thermal/cooling_device1/cur_state:0 > /sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_cur_freq:1113600 > /sys/bus/cpu/devices/cpu1/cpufreq/cpuinfo_cur_freq:1113600 > /sys/bus/cpu/devices/cpu2/cpufreq/cpuinfo_cur_freq:1113600 > /sys/bus/cpu/devices/cpu3/cpufreq/cpuinfo_cur_freq:1113600 > /sys/bus/cpu/devices/cpu4/cpufreq/cpuinfo_cur_freq:800000 > /sys/bus/cpu/devices/cpu5/cpufreq/cpuinfo_cur_freq:800000 > /sys/bus/cpu/devices/cpu6/cpufreq/cpuinfo_cur_freq:800000 > /sys/bus/cpu/devices/cpu7/cpufreq/cpuinfo_cur_freq:800000 > > Reported-by: Zac Crosby <zac@squareup.com> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> > Signed-off-by: Benjamin Li <benl@squareup.com> > --- > Changes in v3: > - Upgraded logging to dev_info_ratelimited and revised log message. > - Remove unrelated hunk. > > Some drivers that support thermal zone disabling implement a set_mode > operation and simply disable the sensor or the relevant IRQ(s), so they > actually don't log anything when zones are disabled. These drivers are > imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c. > > For tsens.c, implementing a change_mode would require migrating the driver > from devm_thermal_zone_of_sensor_register to thermal_zone_device_register > (or updating thermal_of.c to add a change_mode operation in thermal_zone_ > of_device_ops). > > stm_thermal.c seems to use this patch's model of not disabling IRQs when > the zone is disabled (they still perform the thermal_zone_device_update > upon IRQ, but return -EAGAIN from their get_temp). What is the concern by changing the core code to have a correct handling of the disabled / enabled state in this driver ? (and by this way give the opportunity to other drivers to fix their code) > Changes in v2: > - Reordered sentences in first part of commit message to make sense. > > drivers/thermal/qcom/tsens.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c > index 99a8d9f3e03c..dd0002829536 100644 > --- a/drivers/thermal/qcom/tsens.c > +++ b/drivers/thermal/qcom/tsens.c > @@ -509,10 +509,14 @@ static irqreturn_t tsens_irq_thread(int irq, void *data) > spin_unlock_irqrestore(&priv->ul_lock, flags); > > if (trigger) { > - dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n", > - hw_id, __func__, temp); > - thermal_zone_device_update(s->tzd, > - THERMAL_EVENT_UNSPECIFIED); > + if (s->tzd->mode == THERMAL_DEVICE_ENABLED) { > + dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n", > + hw_id, __func__, temp); > + thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED); > + } else { > + dev_info_ratelimited(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped - zone disabled, operating outside of safety limits!\n", > + hw_id, __func__, temp); > + } > } else { > dev_dbg(priv->dev, "[%u] %s: no violation: %d\n", > hw_id, __func__, temp);
On Fri, Feb 25, 2022 at 6:02 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > Some drivers that support thermal zone disabling implement a set_mode > > operation and simply disable the sensor or the relevant IRQ(s), so they > > actually don't log anything when zones are disabled. These drivers are > > imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c. > > > > For tsens.c, implementing a change_mode would require migrating the driver > > from devm_thermal_zone_of_sensor_register to thermal_zone_device_register > > (or updating thermal_of.c to add a change_mode operation in thermal_zone_ > > of_device_ops). > > > > stm_thermal.c seems to use this patch's model of not disabling IRQs when > > the zone is disabled (they still perform the thermal_zone_device_update > > upon IRQ, but return -EAGAIN from their get_temp). > > What is the concern by changing the core code to have a correct handling > of the disabled / enabled state in this driver ? (and by this way give > the opportunity to other drivers to fix their code)' It seems fine, is that the preference? Updating thermal_of.c to add a change_mode operation in thermal_zone_of_device_ops? Ben > > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog
On 25/02/2022 17:46, Benjamin Li wrote: > On Fri, Feb 25, 2022 at 6:02 AM Daniel Lezcano > <daniel.lezcano@linaro.org> wrote: >>> Some drivers that support thermal zone disabling implement a set_mode >>> operation and simply disable the sensor or the relevant IRQ(s), so they >>> actually don't log anything when zones are disabled. These drivers are >>> imx_thermal.c, intel_quark_dts_thermal.c, and int3400_thermal.c. >>> >>> For tsens.c, implementing a change_mode would require migrating the driver >>> from devm_thermal_zone_of_sensor_register to thermal_zone_device_register >>> (or updating thermal_of.c to add a change_mode operation in thermal_zone_ >>> of_device_ops). >>> >>> stm_thermal.c seems to use this patch's model of not disabling IRQs when >>> the zone is disabled (they still perform the thermal_zone_device_update >>> upon IRQ, but return -EAGAIN from their get_temp). >> >> What is the concern by changing the core code to have a correct handling >> of the disabled / enabled state in this driver ? (and by this way give >> the opportunity to other drivers to fix their code)' > > It seems fine, is that the preference? Updating thermal_of.c to add a > change_mode > operation in thermal_zone_of_device_ops? I'm not a big fan of this duplicated ops structure but preferably it would be better to put it there (except if you see a better way to do it)
diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c index 99a8d9f3e03c..dd0002829536 100644 --- a/drivers/thermal/qcom/tsens.c +++ b/drivers/thermal/qcom/tsens.c @@ -509,10 +509,14 @@ static irqreturn_t tsens_irq_thread(int irq, void *data) spin_unlock_irqrestore(&priv->ul_lock, flags); if (trigger) { - dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n", - hw_id, __func__, temp); - thermal_zone_device_update(s->tzd, - THERMAL_EVENT_UNSPECIFIED); + if (s->tzd->mode == THERMAL_DEVICE_ENABLED) { + dev_dbg(priv->dev, "[%u] %s: TZ update trigger (%d mC)\n", + hw_id, __func__, temp); + thermal_zone_device_update(s->tzd, THERMAL_EVENT_UNSPECIFIED); + } else { + dev_info_ratelimited(priv->dev, "[%u] %s: TZ update trigger (%d mC) skipped - zone disabled, operating outside of safety limits!\n", + hw_id, __func__, temp); + } } else { dev_dbg(priv->dev, "[%u] %s: no violation: %d\n", hw_id, __func__, temp);