Message ID | 20240702-power-allocator-null-trip-max-v1-1-47a60dc55414@collabora.com |
---|---|
State | Accepted |
Commit | aaa18ff54b97706b84306b6613630262706b1f6b |
Headers | show |
Series | thermal: gov_power_allocator: Return early in manage if trip_max is null | expand |
On Wed, Jul 3, 2024 at 11:03 AM AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> wrote: > > Il 02/07/24 23:24, Nícolas F. R. A. Prado ha scritto: > > Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding > > without trip points") allowed the governor to bind even when trip_max > > is null. This allows a null pointer dereference to happen in the manage > > callback. Add an early return to prevent it, since the governor is > > expected to not do anything in this case. > > > > Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") > > Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> > > --- > > This issue was noticed by KernelCI during a boot test on the > > mt8195-cherry-tomato-r2 platform with the config in [1]. The stack trace > > is attached below. > > > > [1] http://0x0.st/XaON.txt > > > > [ 4.015786] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > > [ 4.015791] Mem abort info: > > [ 4.015793] ESR = 0x0000000096000004 > > [ 4.015796] EC = 0x25: DABT (current EL), IL = 32 bits > > [ 4.015799] SET = 0, FnV = 0 > > [ 4.015802] EA = 0, S1PTW = 0 > > [ 4.015804] FSC = 0x04: level 0 translation fault > > [ 4.015807] Data abort info: > > [ 4.015809] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 > > [ 4.015811] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > [ 4.015814] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > > [ 4.015818] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109809000 > > [ 4.015821] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 > > [ 4.015835] Modules linked in: mt8195_mt6359(+) mt6577_auxadc snd_soc_mt8195_afe mtk_scp_ipi snd_sof_utils mtk_wdt(+) > > [ 4.015852] CPU: 2 PID: 13 Comm: kworker/u32:1 Not tainted 6.10.0-rc6 #1 c5d519ae8e7fec6bbe67cb8c50bfebcb89dfa54e > > [ 4.015859] Hardware name: Acer Tomato (rev2) board (DT) > > [ 4.015862] Workqueue: events_unbound deferred_probe_work_func > > [ 4.015875] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [ 4.015880] pc : power_allocator_manage+0x110/0x6a0 > > [ 4.015888] lr : __thermal_zone_device_update+0x1dc/0x400 > > [ 4.015893] sp : ffff8000800eb800 > > [ 4.015895] x29: ffff8000800eb810 x28: 0000000000000001 x27: 0000000000000001 > > [ 4.015903] x26: aaaaaaaaaaaaaaab x25: ffff07a0461c15a0 x24: ffffb58530ca67c0 > > [ 4.015911] x23: 0000000000000000 x22: ffff07a04098fcc0 x21: ffffb58532eec848 > > [ 4.015918] x20: ffff8000800eb920 x19: ffff07a0461c1000 x18: 0000000000000b4b > > [ 4.015926] x17: 5359534255530031 x16: ffffb585310352e4 x15: 0000000000000020 > > [ 4.015933] x14: 0000000000000000 x13: ffffffff00000000 x12: 0000000000000040 > > [ 4.015940] x11: 0101010101010101 x10: ffffffffffffffff x9 : ffffb58530ca8d78 > > [ 4.015948] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000001388 > > [ 4.015955] x5 : 0000000000000000 x4 : 0000000000000384 x3 : 0000000000000000 > > [ 4.015962] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 > > [ 4.015970] Call trace: > > [ 4.015972] power_allocator_manage+0x110/0x6a0 > > [ 4.015978] __thermal_zone_device_update+0x1dc/0x400 > > [ 4.015983] thermal_zone_device_set_mode+0x7c/0xa0 > > [ 4.015987] thermal_zone_device_enable+0x1c/0x28 > > [ 4.015991] thermal_of_zone_register+0x43c/0x498 > > [ 4.015996] devm_thermal_of_zone_register+0x6c/0xb8 > > [ 4.016001] gadc_thermal_probe+0x140/0x214 > > [ 4.016007] platform_probe+0x70/0xc4 > > [ 4.016012] really_probe+0x140/0x270 > > [ 4.016018] __driver_probe_device+0xfc/0x114 > > [ 4.016024] driver_probe_device+0x44/0x100 > > [ 4.016029] __device_attach_driver+0x64/0xdc > > [ 4.016035] bus_for_each_drv+0xb4/0xdc > > [ 4.016041] __device_attach+0xdc/0x16c > > [ 4.016046] device_initial_probe+0x1c/0x28 > > [ 4.016052] bus_probe_device+0x44/0xac > > [ 4.016057] deferred_probe_work_func+0xb0/0xc4 > > [ 4.016063] process_scheduled_works+0x114/0x330 > > [ 4.016070] worker_thread+0x1c0/0x20c > > [ 4.016076] kthread+0xf8/0x108 > > [ 4.016081] ret_from_fork+0x10/0x20 > > [ 4.016090] Code: d1030294 17ffffdd f94012c0 f9401ed7 (b9400000) > > [ 4.016095] ---[ end trace 0000000000000000 ]--- > > --- > > drivers/thermal/gov_power_allocator.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c > > index 45f04a25255a..1b2345a697c5 100644 > > --- a/drivers/thermal/gov_power_allocator.c > > +++ b/drivers/thermal/gov_power_allocator.c > > @@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz) > > return; > > } > > > > + if (!params->trip_max) > > + return; > > + > > I'm not sure that this is the right thing to do. > > If you do that, allocate_power() will never be called, so the entire algo doesn't > work, making binding this completely useless (as it's going to be a noop..!). > > Check what get_governor_trips() says in the documentation: > > * If there is only one trip point, then that's considered to be the > * "maximum desired temperature" trip point and the governor is always > * on. If there are no passive or active trip points, then the > * governor won't do anything. In fact, its throttle function > * won't be called at all. > > ....and it looks like you're aware of that, as you said that in the commit > description as well. IIUC, the problematic commit allowed the power allocator governor to bind to a tripless thermal zone in order to prevent failing the entire thermal zone registration. Yes, it will be a noop in this case because in the absence of any trips there will be nothing to do for it. Still, user space can check the zone temperature via sysfs. Adding a NULL pointer check before the place where the pointer in question is dereferenced is not a bad idea at all.
On 7/3/24 21:20, Rafael J. Wysocki wrote: > On Wed, Jul 3, 2024 at 11:03 AM AngeloGioacchino Del Regno > <angelogioacchino.delregno@collabora.com> wrote: >> >> Il 02/07/24 23:24, Nícolas F. R. A. Prado ha scritto: >>> Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding >>> without trip points") allowed the governor to bind even when trip_max >>> is null. This allows a null pointer dereference to happen in the manage >>> callback. Add an early return to prevent it, since the governor is >>> expected to not do anything in this case. >>> >>> Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") >>> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> >>> --- >>> This issue was noticed by KernelCI during a boot test on the >>> mt8195-cherry-tomato-r2 platform with the config in [1]. The stack trace >>> is attached below. >>> >>> [1] http://0x0.st/XaON.txt >>> >>> [ 4.015786] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 >>> [ 4.015791] Mem abort info: >>> [ 4.015793] ESR = 0x0000000096000004 >>> [ 4.015796] EC = 0x25: DABT (current EL), IL = 32 bits >>> [ 4.015799] SET = 0, FnV = 0 >>> [ 4.015802] EA = 0, S1PTW = 0 >>> [ 4.015804] FSC = 0x04: level 0 translation fault >>> [ 4.015807] Data abort info: >>> [ 4.015809] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 >>> [ 4.015811] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 >>> [ 4.015814] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >>> [ 4.015818] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109809000 >>> [ 4.015821] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 >>> [ 4.015835] Modules linked in: mt8195_mt6359(+) mt6577_auxadc snd_soc_mt8195_afe mtk_scp_ipi snd_sof_utils mtk_wdt(+) >>> [ 4.015852] CPU: 2 PID: 13 Comm: kworker/u32:1 Not tainted 6.10.0-rc6 #1 c5d519ae8e7fec6bbe67cb8c50bfebcb89dfa54e >>> [ 4.015859] Hardware name: Acer Tomato (rev2) board (DT) >>> [ 4.015862] Workqueue: events_unbound deferred_probe_work_func >>> [ 4.015875] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>> [ 4.015880] pc : power_allocator_manage+0x110/0x6a0 >>> [ 4.015888] lr : __thermal_zone_device_update+0x1dc/0x400 >>> [ 4.015893] sp : ffff8000800eb800 >>> [ 4.015895] x29: ffff8000800eb810 x28: 0000000000000001 x27: 0000000000000001 >>> [ 4.015903] x26: aaaaaaaaaaaaaaab x25: ffff07a0461c15a0 x24: ffffb58530ca67c0 >>> [ 4.015911] x23: 0000000000000000 x22: ffff07a04098fcc0 x21: ffffb58532eec848 >>> [ 4.015918] x20: ffff8000800eb920 x19: ffff07a0461c1000 x18: 0000000000000b4b >>> [ 4.015926] x17: 5359534255530031 x16: ffffb585310352e4 x15: 0000000000000020 >>> [ 4.015933] x14: 0000000000000000 x13: ffffffff00000000 x12: 0000000000000040 >>> [ 4.015940] x11: 0101010101010101 x10: ffffffffffffffff x9 : ffffb58530ca8d78 >>> [ 4.015948] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000001388 >>> [ 4.015955] x5 : 0000000000000000 x4 : 0000000000000384 x3 : 0000000000000000 >>> [ 4.015962] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 >>> [ 4.015970] Call trace: >>> [ 4.015972] power_allocator_manage+0x110/0x6a0 >>> [ 4.015978] __thermal_zone_device_update+0x1dc/0x400 >>> [ 4.015983] thermal_zone_device_set_mode+0x7c/0xa0 >>> [ 4.015987] thermal_zone_device_enable+0x1c/0x28 >>> [ 4.015991] thermal_of_zone_register+0x43c/0x498 >>> [ 4.015996] devm_thermal_of_zone_register+0x6c/0xb8 >>> [ 4.016001] gadc_thermal_probe+0x140/0x214 >>> [ 4.016007] platform_probe+0x70/0xc4 >>> [ 4.016012] really_probe+0x140/0x270 >>> [ 4.016018] __driver_probe_device+0xfc/0x114 >>> [ 4.016024] driver_probe_device+0x44/0x100 >>> [ 4.016029] __device_attach_driver+0x64/0xdc >>> [ 4.016035] bus_for_each_drv+0xb4/0xdc >>> [ 4.016041] __device_attach+0xdc/0x16c >>> [ 4.016046] device_initial_probe+0x1c/0x28 >>> [ 4.016052] bus_probe_device+0x44/0xac >>> [ 4.016057] deferred_probe_work_func+0xb0/0xc4 >>> [ 4.016063] process_scheduled_works+0x114/0x330 >>> [ 4.016070] worker_thread+0x1c0/0x20c >>> [ 4.016076] kthread+0xf8/0x108 >>> [ 4.016081] ret_from_fork+0x10/0x20 >>> [ 4.016090] Code: d1030294 17ffffdd f94012c0 f9401ed7 (b9400000) >>> [ 4.016095] ---[ end trace 0000000000000000 ]--- >>> --- >>> drivers/thermal/gov_power_allocator.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c >>> index 45f04a25255a..1b2345a697c5 100644 >>> --- a/drivers/thermal/gov_power_allocator.c >>> +++ b/drivers/thermal/gov_power_allocator.c >>> @@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz) >>> return; >>> } >>> >>> + if (!params->trip_max) >>> + return; >>> + >> >> I'm not sure that this is the right thing to do. >> >> If you do that, allocate_power() will never be called, so the entire algo doesn't >> work, making binding this completely useless (as it's going to be a noop..!). >> >> Check what get_governor_trips() says in the documentation: >> >> * If there is only one trip point, then that's considered to be the >> * "maximum desired temperature" trip point and the governor is always >> * on. If there are no passive or active trip points, then the >> * governor won't do anything. In fact, its throttle function >> * won't be called at all. >> >> ....and it looks like you're aware of that, as you said that in the commit >> description as well. > > IIUC, the problematic commit allowed the power allocator governor to > bind to a tripless thermal zone in order to prevent failing the entire > thermal zone registration. > > Yes, it will be a noop in this case because in the absence of any > trips there will be nothing to do for it. Still, user space can check > the zone temperature via sysfs. > > Adding a NULL pointer check before the place where the pointer in > question is dereferenced is not a bad idea at all. Yes, I agree. My apologies for being late. Thanks Rafael for applying the patch. Lukasz
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c index 45f04a25255a..1b2345a697c5 100644 --- a/drivers/thermal/gov_power_allocator.c +++ b/drivers/thermal/gov_power_allocator.c @@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz) return; } + if (!params->trip_max) + return; + allocate_power(tz, params->trip_max->temperature); params->update_cdevs = true; }
Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") allowed the governor to bind even when trip_max is null. This allows a null pointer dereference to happen in the manage callback. Add an early return to prevent it, since the governor is expected to not do anything in this case. Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> --- This issue was noticed by KernelCI during a boot test on the mt8195-cherry-tomato-r2 platform with the config in [1]. The stack trace is attached below. [1] http://0x0.st/XaON.txt [ 4.015786] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 4.015791] Mem abort info: [ 4.015793] ESR = 0x0000000096000004 [ 4.015796] EC = 0x25: DABT (current EL), IL = 32 bits [ 4.015799] SET = 0, FnV = 0 [ 4.015802] EA = 0, S1PTW = 0 [ 4.015804] FSC = 0x04: level 0 translation fault [ 4.015807] Data abort info: [ 4.015809] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 4.015811] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 4.015814] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 4.015818] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109809000 [ 4.015821] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 [ 4.015835] Modules linked in: mt8195_mt6359(+) mt6577_auxadc snd_soc_mt8195_afe mtk_scp_ipi snd_sof_utils mtk_wdt(+) [ 4.015852] CPU: 2 PID: 13 Comm: kworker/u32:1 Not tainted 6.10.0-rc6 #1 c5d519ae8e7fec6bbe67cb8c50bfebcb89dfa54e [ 4.015859] Hardware name: Acer Tomato (rev2) board (DT) [ 4.015862] Workqueue: events_unbound deferred_probe_work_func [ 4.015875] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.015880] pc : power_allocator_manage+0x110/0x6a0 [ 4.015888] lr : __thermal_zone_device_update+0x1dc/0x400 [ 4.015893] sp : ffff8000800eb800 [ 4.015895] x29: ffff8000800eb810 x28: 0000000000000001 x27: 0000000000000001 [ 4.015903] x26: aaaaaaaaaaaaaaab x25: ffff07a0461c15a0 x24: ffffb58530ca67c0 [ 4.015911] x23: 0000000000000000 x22: ffff07a04098fcc0 x21: ffffb58532eec848 [ 4.015918] x20: ffff8000800eb920 x19: ffff07a0461c1000 x18: 0000000000000b4b [ 4.015926] x17: 5359534255530031 x16: ffffb585310352e4 x15: 0000000000000020 [ 4.015933] x14: 0000000000000000 x13: ffffffff00000000 x12: 0000000000000040 [ 4.015940] x11: 0101010101010101 x10: ffffffffffffffff x9 : ffffb58530ca8d78 [ 4.015948] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000001388 [ 4.015955] x5 : 0000000000000000 x4 : 0000000000000384 x3 : 0000000000000000 [ 4.015962] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 4.015970] Call trace: [ 4.015972] power_allocator_manage+0x110/0x6a0 [ 4.015978] __thermal_zone_device_update+0x1dc/0x400 [ 4.015983] thermal_zone_device_set_mode+0x7c/0xa0 [ 4.015987] thermal_zone_device_enable+0x1c/0x28 [ 4.015991] thermal_of_zone_register+0x43c/0x498 [ 4.015996] devm_thermal_of_zone_register+0x6c/0xb8 [ 4.016001] gadc_thermal_probe+0x140/0x214 [ 4.016007] platform_probe+0x70/0xc4 [ 4.016012] really_probe+0x140/0x270 [ 4.016018] __driver_probe_device+0xfc/0x114 [ 4.016024] driver_probe_device+0x44/0x100 [ 4.016029] __device_attach_driver+0x64/0xdc [ 4.016035] bus_for_each_drv+0xb4/0xdc [ 4.016041] __device_attach+0xdc/0x16c [ 4.016046] device_initial_probe+0x1c/0x28 [ 4.016052] bus_probe_device+0x44/0xac [ 4.016057] deferred_probe_work_func+0xb0/0xc4 [ 4.016063] process_scheduled_works+0x114/0x330 [ 4.016070] worker_thread+0x1c0/0x20c [ 4.016076] kthread+0xf8/0x108 [ 4.016081] ret_from_fork+0x10/0x20 [ 4.016090] Code: d1030294 17ffffdd f94012c0 f9401ed7 (b9400000) [ 4.016095] ---[ end trace 0000000000000000 ]--- --- drivers/thermal/gov_power_allocator.c | 3 +++ 1 file changed, 3 insertions(+) --- base-commit: 82e4255305c554b0bb18b7ccf2db86041b4c8b6e change-id: 20240702-power-allocator-null-trip-max-bc82c8d7eaec Best regards,