[6/7] thermal: core: report errors to governors

Message ID 20230519032719.2581689-7-evalenti@kernel.org
State New
Series thermal: enhancements on thermal stats

Commit Message

Eduardo Valentin May 19, 2023, 3:27 a.m. UTC
From: Eduardo Valentin <eduval@amazon.com>

Currently the thermal governors cannot react to
temperature read errors, because the thermal core
skips the handling and only logs the error to the
kernel log buffer. This patch gives the core a way
to report such errors to governors when they happen.

Now, if a governor wants to react to temperature read
errors, it can implement the .check_error() callback.

Cc: "Rafael J. Wysocki" <rafael@kernel.org> (supporter:THERMAL)
Cc: Daniel Lezcano <daniel.lezcano@linaro.org> (supporter:THERMAL)
Cc: Amit Kucheria <amitk@kernel.org> (reviewer:THERMAL)
Cc: Zhang Rui <rui.zhang@intel.com> (reviewer:THERMAL)
Cc: Jonathan Corbet <corbet@lwn.net> (maintainer:DOCUMENTATION)
Cc: linux-pm@vger.kernel.org (open list:THERMAL)
Cc: linux-doc@vger.kernel.org (open list:DOCUMENTATION)
Cc: linux-kernel@vger.kernel.org (open list)

Signed-off-by: Eduardo Valentin <eduval@amazon.com>
---
 drivers/thermal/thermal_core.c | 9 +++++++++
 include/linux/thermal.h        | 3 +++
 2 files changed, 12 insertions(+)

Comments

Rafael J. Wysocki June 20, 2023, 5:29 p.m. UTC | #1
On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <evalenti@kernel.org> wrote:
>
> From: Eduardo Valentin <eduval@amazon.com>
>
> Currently the thermal governors are not allowed to
> react on temperature error events as the thermal core
> skips the handling and logs an error on kernel buffer.
> This patch adds the opportunity to report the errors
> when they happen to governors.
>
> Now, if a governor wants to react on temperature read
> errors, they can implement the .check_error() callback.

Explaining the use case for this would help a lot.

Eduardo Valentin June 21, 2023, 4:49 a.m. UTC | #2
On Tue, Jun 20, 2023 at 07:29:57PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <evalenti@kernel.org> wrote:
> >
> > From: Eduardo Valentin <eduval@amazon.com>
> >
> > Currently the thermal governors are not allowed to
> > react on temperature error events as the thermal core
> > skips the handling and logs an error on kernel buffer.
> > This patch adds the opportunity to report the errors
> > when they happen to governors.
> >
> > Now, if a governor wants to react on temperature read
> > errors, they can implement the .check_error() callback.
> 
> Explaining the use case for this would help a lot.


Yeah, I agree. I did not send the full series, and I will add
the governor changes for this in the next patch series.

The use case here is primarily when temperature reads can fail.
A common example, though not the only one, is an I2C temperature sensor.
While such a sensor can be reliable in many cases, a successful temperature
read is never guaranteed. In fact, it is common to see a sporadic
temperature read failure followed by successful reads.

This patch series enhances the core so that temperature update
errors are communicated to the governor, giving the governor the
opportunity to act upon sensor failure.
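
For illustration only, here is a minimal userspace sketch of how a governor
might tell a sporadic read failure apart from a persistent one. It is not the
governor change mentioned above, which was not posted with this series. Only
the .check_error() signature is taken from the patch; the threshold, the
state structure and the function names are hypothetical, and the
thermal_zone_device structure is a trimmed stand-in.

```c
/*
 * Userspace mock, not kernel code. Only the check_error() signature comes
 * from the patch; every other name below is a hypothetical illustration.
 */
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSECUTIVE_ERRORS 3	/* hypothetical threshold */

/* Trimmed stand-in for the kernel structure. */
struct thermal_zone_device {
	void *governor_data;	/* private governor state */
};

/* Hypothetical per-zone state a governor could keep. */
struct demo_gov_state {
	unsigned int consecutive_errors;
	bool failsafe;		/* set once the sensor is considered unreliable */
};

/* Same signature as the callback added by the patch. */
void demo_check_error(struct thermal_zone_device *tz, int error)
{
	struct demo_gov_state *st = tz->governor_data;

	(void)error;	/* a real governor might treat error codes differently */

	/* One failure is tolerated; a run of failures trips the fail-safe. */
	if (++st->consecutive_errors >= MAX_CONSECUTIVE_ERRORS)
		st->failsafe = true;	/* e.g. request maximum cooling */
}

/* Stand-in for the path taken after a successful temperature read. */
void demo_read_ok(struct thermal_zone_device *tz)
{
	struct demo_gov_state *st = tz->governor_data;

	st->consecutive_errors = 0;
	st->failsafe = false;
}
```

With this policy, a single failed read followed by a successful one leaves
the zone in normal operation, while three failures in a row switch it to the
fail-safe state until the next successful read.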
Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 3ba970c0744f..2ff7d9c7c973 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -313,6 +313,12 @@  static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
 		       def_governor->throttle(tz, trip);
 }
 
+static void handle_error_temperature(struct thermal_zone_device *tz, int error)
+{
+	if (tz->governor && tz->governor->check_error)
+		tz->governor->check_error(tz, error);
+}
+
 void thermal_zone_device_critical(struct thermal_zone_device *tz)
 {
 	/*
@@ -380,6 +386,9 @@  static void update_temperature(struct thermal_zone_device *tz)
 			dev_warn(&tz->device,
 				 "failed to read out thermal zone (%d)\n",
 				 ret);
+		/* tell the governor its source is hosed */
+		handle_error_temperature(tz, ret);
+
 		return;
 	}
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 9dc8292f0314..82c8e09a63e0 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -199,6 +199,8 @@  struct thermal_zone_device {
  *			thermal zone.
  * @throttle:	callback called for every trip point even if temperature is
  *		below the trip point temperature
+ * @check_error:	callback called whenever temperature updates fail.
+ *		Opportunity for the governor to react on errors.
  * @governor_list:	node in thermal_governor_list (in thermal_core.c)
  */
 struct thermal_governor {
@@ -206,6 +208,7 @@  struct thermal_governor {
 	int (*bind_to_tz)(struct thermal_zone_device *tz);
 	void (*unbind_from_tz)(struct thermal_zone_device *tz);
 	int (*throttle)(struct thermal_zone_device *tz, int trip);
+	void (*check_error)(struct thermal_zone_device *tz, int error);
 	struct list_head	governor_list;
 };
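
The dispatch added above can be tried out in userspace with the sketch below.
It shows that a zone without a governor, or with a governor that leaves
.check_error unset, is unaffected. Judging by the indentation in the second
hunk, the new call appears to run for every failed read, including -EAGAIN,
so a governor may want to filter transient codes itself. The structures are
trimmed stand-ins; the last_error field and the filtering callback are
assumptions made only for this example.

```c
/* Userspace mock of the core-side path; not kernel code. */
#include <errno.h>
#include <stddef.h>

struct thermal_zone_device;

/* Trimmed stand-in: only the callback added by the patch. */
struct thermal_governor {
	void (*check_error)(struct thermal_zone_device *tz, int error);
};

struct thermal_zone_device {
	struct thermal_governor *governor;
	int last_error;		/* hypothetical, lets the mock be observed */
};

/* Same logic as the helper in the patch: the callback is optional. */
void handle_error_temperature(struct thermal_zone_device *tz, int error)
{
	if (tz->governor && tz->governor->check_error)
		tz->governor->check_error(tz, error);
}

/* Hypothetical governor callback that ignores transient -EAGAIN. */
void filtering_check_error(struct thermal_zone_device *tz, int error)
{
	if (error == -EAGAIN)
		return;		/* "try again" is not treated as a sensor fault */

	tz->last_error = error;
}
```

Keeping the callback optional means existing governors need no change; only
a governor that sets .check_error sees the error code.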