Message ID | 20231219082726.844508-1-wenst@chromium.org |
---|---|
State | New |
Headers | show |
Series | thermal/core: Correctly free tz->tzp in thermal zone registration error path | expand |
On Tue, Dec 19, 2023 at 11:28 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <wenst@chromium.org> wrote: > > > > After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal > > zone parameters structure"), the core now copies the thermal zone > > parameters structure, and frees it if an error happens during thermal > > zone device registration, or upon unregistration of the device. > > > > In the error path, if device_register() was called, then `tz` disappears > > before kfree(tz->tzp) happens, causing a NULL pointer deference crash. > > > > In my case, the error path was entered from the sbs power supply driver, > > which through the power supply core registers a thermal zone *without > > trip points* for the battery temperature sensor. This combined with > > setting the default thermal governor to "power allocator", which > > *requires* trip_max, causes the thermal zone registration to error out. > > > > The error path should handle the two cases, one where device_register > > has not happened and the kobj hasn't been reference counted, and vice > > versa where it has. The original commit tried to cover the first case, > > but fails for the second. Fix this by adding kfree(tz->tzp) before > > put_device() to cover the second case, and check if `tz` is still valid > > before calling kfree(tz->tzp) to avoid crashing in the second case. > > > > Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") > > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> > > --- > > This includes the minimal changes to fix the crash. I suppose some other > > things in the thermal core could be reworked: > > - Don't use "power allocator" for thermal zones without trip points > > - Move some of the thermal zone cleanup code into the release function > > > > drivers/thermal/thermal_core.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > > index 2415dc50c31d..e47826d82062 100644 > > --- a/drivers/thermal/thermal_core.c > > +++ b/drivers/thermal/thermal_core.c > > @@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t > > unregister: > > device_del(&tz->device); > > release_device: > > + /* Free tz->tzp before tz goes away. */ > > + kfree(tz->tzp); > > put_device(&tz->device); > > tz = NULL; > > remove_id: > > ida_free(&thermal_tz_ida, id); > > free_tzp: > > - kfree(tz->tzp); > > + /* If we arrived here before device_register() was called. */ > > + if (tz) > > + kfree(tz->tzp); > > free_tz: > > kfree(tz); > > return ERR_PTR(result); > > -- > > Can you please test linux-next from today? The issue addressed by > your patch should be fixed there. Sorry for the very late reply. Yes it does. Thanks. ChenYu
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 2415dc50c31d..e47826d82062 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t unregister: device_del(&tz->device); release_device: + /* Free tz->tzp before tz goes away. */ + kfree(tz->tzp); put_device(&tz->device); tz = NULL; remove_id: ida_free(&thermal_tz_ida, id); free_tzp: - kfree(tz->tzp); + /* If we arrived here before device_register() was called. */ + if (tz) + kfree(tz->tzp); free_tz: kfree(tz); return ERR_PTR(result);
After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure"), the core now copies the thermal zone parameters structure, and frees it if an error happens during thermal zone device registration, or upon unregistration of the device. In the error path, if device_register() was called, then `tz` disappears before kfree(tz->tzp) happens, causing a NULL pointer deference crash. In my case, the error path was entered from the sbs power supply driver, which through the power supply core registers a thermal zone *without trip points* for the battery temperature sensor. This combined with setting the default thermal governor to "power allocator", which *requires* trip_max, causes the thermal zone registration to error out. The error path should handle the two cases, one where device_register has not happened and the kobj hasn't been reference counted, and vice versa where it has. The original commit tried to cover the first case, but fails for the second. Fix this by adding kfree(tz->tzp) before put_device() to cover the second case, and check if `tz` is still valid before calling kfree(tz->tzp) to avoid crashing in the second case. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> --- This includes the minimal changes to fix the crash. I suppose some other things in the thermal core could be reworked: - Don't use "power allocator" for thermal zones without trip points - Move some of the thermal zone cleanup code into the release function drivers/thermal/thermal_core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)