[v1,0/4] thermal: gov_bang_bang: Prevent cooling devices from getting stuck in the "on" state

Message ID 1903691.tdWV9SEqCh@rjwysocki.net

Rafael J. Wysocki Aug. 13, 2024, 2:23 p.m. UTC
Hi Everyone,

After changes made in 6.10, the Bang-bang governor has an initialization problem
on systems where cooling devices start in the "on" state, but the thermal zone
temperature stays below the corresponding trip points.

Namely, the Bang-bang governor implements only a .trip_crossed() callback, which
runs only when a trip point is crossed.  If the zone temperature always stays
below the trip point, that callback is never invoked.  So if a cooling device
bound to that trip point starts in the "on" state, the governor never gets a
chance to change its state to "off".

This currently happens in the acerhdf driver, but it may well happen elsewhere,
so I think that it needs to be addressed in the thermal subsystem.

It can be addressed by adding a .manage() callback to the Bang-bang governor,
which is done in patch [3/4].  That callback will be invoked every time
__thermal_zone_device_update() runs, not just when a trip is crossed, so it
can adjust the states of the cooling devices to the thermal zone temperature.
However, after running once, it becomes pure overhead because the
states of cooling devices only need to be fixed up once (modulo some special
situations like system resume).

That is addressed in patch [4/4], which uses governor_data to record whether
or not the states of the cooling devices still need to be adjusted.

Patches [1-2/4] are preliminary.  IMV it is better to make these changes
separately, both for clarity and in case they turn out to have an unexpected
functional effect.

Overall, this series is a fix candidate for 6.11-rc because the change in
behavior addressed by it can be regarded as a regression with respect to 6.9.

Unfortunately, it affects this series:

https://lore.kernel.org/linux-pm/114901234.nniJfEyVGO@rjwysocki.net/

which will need to be reordered and rebased (slightly), but because I've dropped
one broken patch from it already, it will need to be changed anyway.

Thanks!