Message ID | 20201029100335.27665-1-peter.ujfalusi@ti.com |
---|---|
State | New |
Series | thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430 |
* Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
> and it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also has a
> constant and steady stream of:
> thermal thermal_zone0: failed to read out thermal zone (-5)

Works for me and I've verified duovero still keeps hitting core ret idle:

Tested-by: Tony Lindgren <tony@atomide.com>

Regards,

Tony
Eduardo, Keerthy,

On 29/10/2020 12.51, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
>> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
>> and it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
>> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also has a
>> constant and steady stream of:
>> thermal thermal_zone0: failed to read out thermal zone (-5)
>
> Works for me and I've verified duovero still keeps hitting core ret idle:

Can you pick this one up for 5.10 so that omap4430-sdp is usable (does not
shut down randomly)? The regression was introduced in 5.10-rc1.

> Tested-by: Tony Lindgren <tony@atomide.com>
>
> Regards,
>
> Tony
>

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
index 5e596168ba73..dcac99f327b0 100644
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -20,6 +20,7 @@
 #include <linux/err.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
+#include <linux/sys_soc.h>
 #include <linux/reboot.h>
 #include <linux/of_device.h>
 #include <linux/of_platform.h>
@@ -864,6 +865,17 @@ static struct ti_bandgap *ti_bandgap_build(struct platform_device *pdev)
 	return bgp;
 }
 
+/*
+ * List of SoCs on which the CPU PM notifier can cause errors on the DTEMP
+ * readout.
+ * Enabled notifier on these machines results in erroneous, random values which
+ * could trigger unexpected thermal shutdown.
+ */
+static const struct soc_device_attribute soc_no_cpu_notifier[] = {
+	{ .machine = "OMAP4430" },
+	{ /* sentinel */ },
+};
+
 /*** Device driver call backs ***/
 
 static
@@ -1020,7 +1032,8 @@ int ti_bandgap_probe(struct platform_device *pdev)
 
 #ifdef CONFIG_PM_SLEEP
 	bgp->nb.notifier_call = bandgap_omap_cpu_notifier;
-	cpu_pm_register_notifier(&bgp->nb);
+	if (!soc_device_match(soc_no_cpu_notifier))
+		cpu_pm_register_notifier(&bgp->nb);
 #endif
 
 	return 0;
@@ -1056,7 +1069,8 @@ int ti_bandgap_remove(struct platform_device *pdev)
 	struct ti_bandgap *bgp = platform_get_drvdata(pdev);
 	int i;
 
-	cpu_pm_unregister_notifier(&bgp->nb);
+	if (!soc_device_match(soc_no_cpu_notifier))
+		cpu_pm_unregister_notifier(&bgp->nb);
 
 	/* Remove sensor interfaces */
 	for (i = 0; i < bgp->conf->sensor_count; i++) {
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled
notifier causes errors on the DTEMP readout values:

ti-soc-thermal 4a002260.bandgap: in range ADC val: 52
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: in range ADC val: 100

A raw value of 100 translates to 133 Celsius on omap4-sdp, triggering a
shutdown due to critical temperature.

When the notifier is disabled for OMAP4430, the DTEMP values are stable:

ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56

Fixes: 5093402e5b44 ("thermal: ti-soc-thermal: Enable addition power management")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
Hi,

my omap4-sdp (Blaze) was shutting down randomly due to critical temperature
with 5.10-rc1, and I have bisected it back to 5093402e5b44.

Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
and it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also has a
constant and steady stream of:
thermal thermal_zone0: failed to read out thermal zone (-5)
pointing to a similar issue.

Regards,
Peter

 drivers/thermal/ti-soc-thermal/ti-bandgap.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)