Message ID | 20230219143657.241542-8-daniel.lezcano@linaro.org |
---|---|
State | New |
Series | Self-encapsulate the thermal zone device structure |
Hi Guenter,

my script should have Cc'ed you but it didn't, so just a heads up on this patch ;)

On 19/02/2023 15:36, Daniel Lezcano wrote:
> In this function, there is a guarantee the thermal zone is registered.
>
> The sysfs hwmon unregistering will be blocked until we exit the
> function. The thermal zone is unregistered after the sysfs hwmon is
> unregistered.
>
> When we are in this function, the thermal zone is registered.
>
> We can call the thermal_zone_get_crit_temp() function safely and let
> the function use the lock which is private to the thermal core code.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/thermal/thermal_hwmon.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> index bc02095b314c..15158715b967 100644
> --- a/drivers/thermal/thermal_hwmon.c
> +++ b/drivers/thermal/thermal_hwmon.c
> @@ -77,15 +77,7 @@ temp_crit_show(struct device *dev, struct device_attribute *attr, char *buf)
>  	int temperature;
>  	int ret;
>
> -	mutex_lock(&tz->lock);
> -
> -	if (device_is_registered(&tz->device))
> -		ret = tz->ops->get_crit_temp(tz, &temperature);
> -	else
> -		ret = -ENODEV;
> -
> -	mutex_unlock(&tz->lock);
> -
> +	ret = thermal_zone_get_crit_temp(tz, &temperature);
>  	if (ret)
>  		return ret;
>
On Mon, Feb 20, 2023 at 02:34:08PM +0100, Daniel Lezcano wrote:
> Hi Guenter,
>
> my script should have Cc'ed you but it didn't, so just a heads up on this patch
> ;)
>
> On 19/02/2023 15:36, Daniel Lezcano wrote:
> > In this function, there is a guarantee the thermal zone is registered.
> >
> > The sysfs hwmon unregistering will be blocked until we exit the
> > function. The thermal zone is unregistered after the sysfs hwmon is
> > unregistered.
> >
> > When we are in this function, the thermal zone is registered.
> >
> > We can call the thermal_zone_get_crit_temp() function safely and let
> > the function use the lock which is private to the thermal core code.
> >

Hmm, if you say so. That very same call used to cause a crash in
Chromebooks, which is why I had added the locking.

Guenter
On 20/02/2023 15:11, Guenter Roeck wrote:
> On Mon, Feb 20, 2023 at 02:34:08PM +0100, Daniel Lezcano wrote:
>> Hi Guenter,
>>
>> my script should have Cc'ed you but it didn't, so just a heads up on this patch
>> ;)
>>
>> On 19/02/2023 15:36, Daniel Lezcano wrote:
>>> In this function, there is a guarantee the thermal zone is registered.
>>>
>>> The sysfs hwmon unregistering will be blocked until we exit the
>>> function. The thermal zone is unregistered after the sysfs hwmon is
>>> unregistered.
>>>
>>> When we are in this function, the thermal zone is registered.
>>>
>>> We can call the thermal_zone_get_crit_temp() function safely and let
>>> the function use the lock which is private to the thermal core code.
>>>
>
> Hmm, if you say so. That very same call used to cause a crash in
> Chromebooks, which is why I had added the locking.

Mmh, I see. I guess we can assume thermal_hwmon is part of the core code and remove this change.
On Mon, Feb 20, 2023 at 04:39:48PM +0100, Daniel Lezcano wrote:
> On 20/02/2023 15:11, Guenter Roeck wrote:
> > On Mon, Feb 20, 2023 at 02:34:08PM +0100, Daniel Lezcano wrote:
> > > Hi Guenter,
> > >
> > > my script should have Cc'ed you but it didn't, so just a heads up on this patch
> > > ;)
> > >
> > > On 19/02/2023 15:36, Daniel Lezcano wrote:
> > > > In this function, there is a guarantee the thermal zone is registered.
> > > >
> > > > The sysfs hwmon unregistering will be blocked until we exit the
> > > > function. The thermal zone is unregistered after the sysfs hwmon is
> > > > unregistered.
> > > >
> > > > When we are in this function, the thermal zone is registered.
> > > >
> > > > We can call the thermal_zone_get_crit_temp() function safely and let
> > > > the function use the lock which is private to the thermal core code.
> > > >
> >
> > Hmm, if you say so. That very same call used to cause a crash in
> > Chromebooks, which is why I had added the locking.
>
> Mmh, I see. I guess we can assume thermal_hwmon is part of the core code and
> remove this change.
>

Yes. Anyway, the sequence of events was roughly as follows.

- thermal zone device is registered
- hwmon device is registered
- userspace is triggered and starts reading device attributes
- while userspace has a hwmon attribute open, thermal device is unregistered
- hwmon device is unregistered (sysfs attribute is still open)
- hwmon device attribute function is called
- Since thermal device ops have been released after the thermal device
  was unregistered, trying to call an ops callback fails.

That doesn't normally happen, but the Intel wireless driver has the habit
of registering a thermal zone early in its probe function, only to unregister
it immediately afterwards if the probe function fails. If some userspace
activity is triggered by the hwmon device registration, the thermal and
hwmon device removal may be timed such that the hwmon device is removed
while one (or more) of its attribute files are still open. Normally that
doesn't matter, but it is fatal here since the ops callbacks are not owned
by the hwmon device but by the thermal device.

Essentially every ops callback has this problem.
thermal_zone_get_temp() had it as well, also associated with
a hwmon sysfs attribute read operation. See commit 1c6b30060777
("thermal/core: Ensure that thermal device is registered in
thermal_zone_get_temp").

If you don't want non-thermal code to access ->ops directly, the thermal
code would have to provide protected accessor functions, similar to
thermal_zone_get_temp().

Thanks,
Guenter
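For readers following the thread, the "protected accessor" idea described above can be sketched in plain userspace C. This is only a model under stated assumptions, not the kernel implementation: `struct zone`, `zone_register()`, `zone_unregister()`, `zone_get_crit_temp()` and `fake_ops` are hypothetical names invented for illustration, a pthread mutex stands in for `tz->lock`, and a boolean stands in for `device_is_registered()`. The zone structure is assumed to outlive the late read; only the ops go away.

```c
/* Userspace model only. All names here are hypothetical stand-ins. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct zone;

struct zone_ops {
	int (*get_crit_temp)(struct zone *tz, int *temp);
};

struct zone {
	pthread_mutex_t lock;       /* stands in for tz->lock */
	bool registered;            /* stands in for device_is_registered() */
	const struct zone_ops *ops; /* owned by the driver, released on unregister */
};

/* Driver callback returning an arbitrary value in millidegrees Celsius. */
int fake_crit_temp(struct zone *tz, int *temp)
{
	(void)tz;
	*temp = 95000;
	return 0;
}

const struct zone_ops fake_ops = { .get_crit_temp = fake_crit_temp };

void zone_register(struct zone *tz, const struct zone_ops *ops)
{
	pthread_mutex_init(&tz->lock, NULL);
	tz->ops = ops;
	tz->registered = true;
}

/* After this returns, the driver's ops must no longer be called. */
void zone_unregister(struct zone *tz)
{
	pthread_mutex_lock(&tz->lock);
	tz->registered = false;
	tz->ops = NULL;
	pthread_mutex_unlock(&tz->lock);
}

/*
 * Protected accessor: the registration check and the ops call happen
 * under the same lock, so a reader that still holds an open attribute
 * gets -ENODEV instead of calling through released ops.
 */
int zone_get_crit_temp(struct zone *tz, int *temp)
{
	int ret = -ENODEV;

	assert(tz != NULL && temp != NULL);

	pthread_mutex_lock(&tz->lock);
	if (tz->registered && tz->ops && tz->ops->get_crit_temp)
		ret = tz->ops->get_crit_temp(tz, temp);
	pthread_mutex_unlock(&tz->lock);

	return ret;
}
```

In this model, a read issued after `zone_unregister()` corresponds to the "hwmon device attribute function is called" step in the sequence above: it fails cleanly with `-ENODEV`. The design point is that callers outside the core never touch `->ops` or the lock themselves.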
On 20/02/2023 18:12, Guenter Roeck wrote:
> On Mon, Feb 20, 2023 at 04:39:48PM +0100, Daniel Lezcano wrote:
>> On 20/02/2023 15:11, Guenter Roeck wrote:
>>> On Mon, Feb 20, 2023 at 02:34:08PM +0100, Daniel Lezcano wrote:
>>>> Hi Guenter,
>>>>
>>>> my script should have Cc'ed you but it didn't, so just a heads up on this patch
>>>> ;)
>>>>
>>>> On 19/02/2023 15:36, Daniel Lezcano wrote:
>>>>> In this function, there is a guarantee the thermal zone is registered.
>>>>>
>>>>> The sysfs hwmon unregistering will be blocked until we exit the
>>>>> function. The thermal zone is unregistered after the sysfs hwmon is
>>>>> unregistered.
>>>>>
>>>>> When we are in this function, the thermal zone is registered.
>>>>>
>>>>> We can call the thermal_zone_get_crit_temp() function safely and let
>>>>> the function use the lock which is private to the thermal core code.
>>>>>
>>>
>>> Hmm, if you say so. That very same call used to cause a crash in
>>> Chromebooks, which is why I had added the locking.
>>
>> Mmh, I see. I guess we can assume thermal_hwmon is part of the core code and
>> remove this change.
>>
>
> Yes. Anyway, the sequence of events was roughly as follows.
>
> - thermal zone device is registered
> - hwmon device is registered
> - userspace is triggered and starts reading device attributes
> - while userspace has a hwmon attribute open, thermal device is unregistered
> - hwmon device is unregistered (sysfs attribute is still open)
> - hwmon device attribute function is called
> - Since thermal device ops have been released after the thermal device
>   was unregistered, trying to call an ops callback fails.
>
> That doesn't normally happen, but the Intel wireless driver has the habit
> of registering a thermal zone early in its probe function, only to unregister
> it immediately afterwards if the probe function fails. If some userspace
> activity is triggered by the hwmon device registration, the thermal and
> hwmon device removal may be timed such that the hwmon device is removed
> while one (or more) of its attribute files are still open. Normally that
> doesn't matter, but it is fatal here since the ops callbacks are not owned
> by the hwmon device but by the thermal device.
>
> Essentially every ops callback has this problem.
> thermal_zone_get_temp() had it as well, also associated with
> a hwmon sysfs attribute read operation. See commit 1c6b30060777
> ("thermal/core: Ensure that thermal device is registered in
> thermal_zone_get_temp").
>
> If you don't want non-thermal code to access ->ops directly, the thermal
> code would have to provide protected accessor functions, similar to
> thermal_zone_get_temp().

Hopefully we are getting rid of most of the ops soon ... :/
diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index bc02095b314c..15158715b967 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -77,15 +77,7 @@ temp_crit_show(struct device *dev, struct device_attribute *attr, char *buf)
 	int temperature;
 	int ret;
 
-	mutex_lock(&tz->lock);
-
-	if (device_is_registered(&tz->device))
-		ret = tz->ops->get_crit_temp(tz, &temperature);
-	else
-		ret = -ENODEV;
-
-	mutex_unlock(&tz->lock);
-
+	ret = thermal_zone_get_crit_temp(tz, &temperature);
 	if (ret)
 		return ret;
 
In this function, there is a guarantee the thermal zone is registered.

The sysfs hwmon unregistering will be blocked until we exit the
function. The thermal zone is unregistered after the sysfs hwmon is
unregistered.

When we are in this function, the thermal zone is registered.

We can call the thermal_zone_get_crit_temp() function safely and let
the function use the lock which is private to the thermal core code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/thermal_hwmon.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)