Message ID | 20230519032719.2581689-7-evalenti@kernel.org |
---|---|
State | New |
Series | thermal: enhancements on thermal stats |
On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <evalenti@kernel.org> wrote:
>
> From: Eduardo Valentin <eduval@amazon.com>
>
> Currently the thermal governors are not allowed to
> react on temperature error events as the thermal core
> skips the handling and logs an error on kernel buffer.
> This patch adds the opportunity to report the errors
> when they happen to governors.
>
> Now, if a governor wants to react on temperature read
> errors, they can implement the .check_error() callback.

Explaining the use case for this would help a lot.
On Tue, Jun 20, 2023 at 07:29:57PM +0200, Rafael J. Wysocki wrote:
>
> On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <evalenti@kernel.org> wrote:
> >
> > From: Eduardo Valentin <eduval@amazon.com>
> >
> > Currently the thermal governors are not allowed to
> > react on temperature error events as the thermal core
> > skips the handling and logs an error on kernel buffer.
> > This patch adds the opportunity to report the errors
> > when they happen to governors.
> >
> > Now, if a governor wants to react on temperature read
> > errors, they can implement the .check_error() callback.
>
> Explaining the use case for this would help a lot.

Yeah, I agree. I did not send the full series, and I will add the
governor changes for this in the next patch series.

The use case here is primarily when temperature reads can fail.
A common case, though not the only one, is an I2C temperature sensor.
While it can be reliable in many cases, a successful temperature read
is not always guaranteed. In fact, it is common to see a sporadic
temperature read failure followed by successful reads.

This patch series enhances the core to communicate temperature update
errors to the governor, so that the governor has the opportunity to
act upon sensor failure.
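
The governor changes are not part of this posting, so the following is only a minimal user-space sketch of how a governor might use the callback for the sporadic-failure case described above. The structures are cut-down stand-ins for the kernel ones. `demo_gov_state`, `demo_check_error()`, `demo_read_ok()`, the threshold of three failures and the choice to ignore `-EAGAIN` are all assumptions made for illustration and do not come from the series.

```c
#include <errno.h>
#include <stdbool.h>

/* Cut-down user-space stand-ins for the kernel structures. */
struct thermal_zone_device;

struct thermal_governor {
	void (*check_error)(struct thermal_zone_device *tz, int error);
};

struct thermal_zone_device {
	struct thermal_governor *governor;
	void *governor_data;	/* per-zone private data of the governor */
};

/* Hypothetical per-zone state kept by the example governor. */
struct demo_gov_state {
	unsigned int consecutive_errors;
	bool failsafe;		/* set once the sensor is considered unreliable */
};

/* Assumed policy: three failed reads in a row count as a sensor failure. */
#define DEMO_MAX_CONSECUTIVE_ERRORS 3

/* .check_error(): tolerate sporadic failures, escalate on a streak. */
static void demo_check_error(struct thermal_zone_device *tz, int error)
{
	struct demo_gov_state *st = tz->governor_data;

	/* Assumption: -EAGAIN means "try again later", so it is not counted. */
	if (error == -EAGAIN)
		return;

	if (++st->consecutive_errors >= DEMO_MAX_CONSECUTIVE_ERRORS)
		st->failsafe = true;
}

/* Hypothetical hook for a successful read: the streak is over. */
static void demo_read_ok(struct thermal_zone_device *tz)
{
	struct demo_gov_state *st = tz->governor_data;

	st->consecutive_errors = 0;
	st->failsafe = false;
}

/* The governor opts in simply by filling in the new callback. */
struct thermal_governor demo_governor = {
	.check_error = demo_check_error,
};
```

With this policy a single failed I2C read followed by a good one leaves the zone in normal operation, while a run of failures flips `failsafe`, at which point a real governor could, for example, apply a conservative cooling state.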
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 3ba970c0744f..2ff7d9c7c973 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -313,6 +313,12 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
 		       def_governor->throttle(tz, trip);
 }
 
+static void handle_error_temperature(struct thermal_zone_device *tz, int error)
+{
+	if (tz->governor && tz->governor->check_error)
+		tz->governor->check_error(tz, error);
+}
+
 void thermal_zone_device_critical(struct thermal_zone_device *tz)
 {
 	/*
@@ -380,6 +386,9 @@ static void update_temperature(struct thermal_zone_device *tz)
 			dev_warn(&tz->device,
 				 "failed to read out thermal zone (%d)\n",
 				 ret);
+		/* tell the governor its source is hosed */
+		handle_error_temperature(tz, ret);
+
 		return;
 	}
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 9dc8292f0314..82c8e09a63e0 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -199,6 +199,8 @@ struct thermal_zone_device {
  *		thermal zone.
  * @throttle:	callback called for every trip point even if temperature is
  *		below the trip point temperature
+ * @check_error:	callback called whenever temperature updates fail.
+ *		Opportunity for the governor to react on errors.
  * @governor_list:	node in thermal_governor_list (in thermal_core.c)
  */
 struct thermal_governor {
@@ -206,6 +208,7 @@ struct thermal_governor {
 	int (*bind_to_tz)(struct thermal_zone_device *tz);
 	void (*unbind_from_tz)(struct thermal_zone_device *tz);
 	int (*throttle)(struct thermal_zone_device *tz, int trip);
+	void (*check_error)(struct thermal_zone_device *tz, int error);
 	struct list_head governor_list;
 };
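
To make the control flow of the patch easier to follow, here is a rough user-space model of the patched `update_temperature()` path. It is a sketch under stated assumptions, not the kernel code: the structures keep only the fields used here, the sensor is read through `tz->ops->get_temp()` directly, and `fprintf()` stands in for `dev_warn()`. `flaky_get_temp()` and `record_error()` are hypothetical test fixtures that do not appear in the patch.

```c
#include <errno.h>
#include <stdio.h>

/* Cut-down user-space stand-ins for the kernel structures. */
struct thermal_zone_device;

struct thermal_zone_device_ops {
	int (*get_temp)(struct thermal_zone_device *tz, int *temp);
};

struct thermal_governor {
	void (*check_error)(struct thermal_zone_device *tz, int error);
};

struct thermal_zone_device {
	const struct thermal_zone_device_ops *ops;
	struct thermal_governor *governor;
	int temperature;
	int last_temperature;
};

/* Same shape as in the patch: the callback is optional. */
static void handle_error_temperature(struct thermal_zone_device *tz, int error)
{
	if (tz->governor && tz->governor->check_error)
		tz->governor->check_error(tz, error);
}

/* Simplified model of the patched read path. */
static void update_temperature(struct thermal_zone_device *tz)
{
	int temp, ret;

	ret = tz->ops->get_temp(tz, &temp);
	if (ret) {
		if (ret != -EAGAIN)
			fprintf(stderr,
				"failed to read out thermal zone (%d)\n", ret);
		/* As in the patch, every error (including -EAGAIN) is reported. */
		handle_error_temperature(tz, ret);
		return;		/* the cached temperature is left untouched */
	}

	tz->last_temperature = tz->temperature;
	tz->temperature = temp;
}

/* Hypothetical sensor: every third read fails with -EIO. */
static int flaky_reads;

static int flaky_get_temp(struct thermal_zone_device *tz, int *temp)
{
	(void)tz;
	if (++flaky_reads % 3 == 0)
		return -EIO;
	*temp = 40000 + flaky_reads * 1000;	/* millidegrees Celsius */
	return 0;
}

/* Hypothetical governor callback that only records what it was told. */
static int reported_errors;
static int last_reported_error;

static void record_error(struct thermal_zone_device *tz, int error)
{
	(void)tz;
	reported_errors++;
	last_reported_error = error;
}
```

Two properties of the patch show up in this model: a zone with no governor, or with a governor that leaves `.check_error` unset, behaves exactly as before, and a failed read never overwrites the last good temperature.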