mbox series

[v1,0/7] ACPI: thermal: Use trip point table to register thermal zones

Message ID 13318886.uLZWGnKmhe@kreacher
Headers show
Series ACPI: thermal: Use trip point table to register thermal zones | expand

Message

Rafael J. Wysocki July 18, 2023, 6:01 p.m. UTC
Hi Everyone,

This patch series makes the ACPI thermal driver register thermal zones
with the help of thermal_zone_device_register_with_trips(), so it
doesn't need to use the thermal zone callbacks related to trip points
any more (and they are dropped in the last patch).

The approach presented here is quite radically different from the
previous attempts, as it doesn't really rearrange the driver's
internal data structures, but adds the trip table support on top of
them.  For this purpose, it uses an additional field in struct thermal_trip
introduced in the first patch.

I have run it on my test-bed systems, but this is not too representative,
because they each have only one ACPI thermal zone with only one (critical)
trip point in it.

Thanks,
Rafael

Comments

Daniel Lezcano July 18, 2023, 10:45 p.m. UTC | #1
Hi Rafael,

On 18/07/2023 20:01, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This patch series makes the ACPI thermal driver register thermal zones
> with the help of thermal_zone_device_register_with_trips(), so it
> doesn't need to use the thermal zone callbacks related to trip points
> any more (and they are dropped in the last patch).

Yay!

> The approach presented here is quite radically different from the
> previous attempts, as it doesn't really rearrange the driver's
> internal data structures, but adds the trip table support on top of
> them.  For this purpose, it uses an additional field in struct thermal_trip
> introduced in the first patch.
> 
> I have run it on my test-bed systems, but this is not too representative,
> because they each have only one ACPI thermal zone with only one (critical)
> trip point in it.

Rui created some ACPI fake tables I was able to run them in a KVM 
machine with fake thermal zones.

I can share the setup if you are interested in
Rafael J. Wysocki July 19, 2023, 1:33 p.m. UTC | #2
On Wed, Jul 19, 2023 at 12:46 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Rafael,
>
> On 18/07/2023 20:01, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This patch series makes the ACPI thermal driver register thermal zones
> > with the help of thermal_zone_device_register_with_trips(), so it
> > doesn't need to use the thermal zone callbacks related to trip points
> > any more (and they are dropped in the last patch).
>
> Yay!
>
> > The approach presented here is quite radically different from the
> > previous attempts, as it doesn't really rearrange the driver's
> > internal data structures, but adds the trip table support on top of
> > them.  For this purpose, it uses an additional field in struct thermal_trip
> > introduced in the first patch.
> >
> > I have run it on my test-bed systems, but this is not too representative,
> > because they each have only one ACPI thermal zone with only one (critical)
> > trip point in it.
>
> Rui created some ACPI fake tables I was able to run them in a KVM
> machine with fake thermal zones.
>
> I can share the setup if you are interested in

Yes, please!
Rafael J. Wysocki July 21, 2023, 12:44 p.m. UTC | #3
Hi Everyone,

This is the second iteration of the $subject patch series and its original
description below is still applicable

On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
> 
> This patch series makes the ACPI thermal driver register thermal zones
> with the help of thermal_zone_device_register_with_trips(), so it
> doesn't need to use the thermal zone callbacks related to trip points
> any more (and they are dropped in the last patch).
> 
> The approach presented here is quite radically different from the
> previous attempts, as it doesn't really rearrange the driver's
> internal data structures, but adds the trip table support on top of
> them.  For this purpose, it uses an additional field in struct thermal_trip
> introduced in the first patch.

In the meantime I have updated the patches and tested them on a system with
a couple of ACPI thermal zones.  No differences in functionality after applying
the patches have been observed.

The update is mostly related to adding extra locking around trip point
temperature updates and some hardening of the .get_trend() callback routine
against invalid trip point indices.

Please see patch changelogs for details.

Thanks!
Daniel Lezcano July 23, 2023, 10:19 a.m. UTC | #4
Hi Rafael,

could you wait before applying I would like to review the series but I'm 
OoO ATM, coming back next week?



On 21/07/2023 14:44, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This is the second iteration of the $subject patch series and its original
> description below is still applicable
> 
> On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
>>
>> This patch series makes the ACPI thermal driver register thermal zones
>> with the help of thermal_zone_device_register_with_trips(), so it
>> doesn't need to use the thermal zone callbacks related to trip points
>> any more (and they are dropped in the last patch).
>>
>> The approach presented here is quite radically different from the
>> previous attempts, as it doesn't really rearrange the driver's
>> internal data structures, but adds the trip table support on top of
>> them.  For this purpose, it uses an additional field in struct thermal_trip
>> introduced in the first patch.
> 
> In the meantime I have updated the patches and tested them on a system with
> a couple of ACPI thermal zones.  No differences in functionality after applying
> the patches have been observed.
> 
> The update is mostly related to adding extra locking around trip point
> temperature updates and some hardening of the .get_trend() callback routine
> against invalid trip point indices.
> 
> Please see patch changelogs for details.
> 
> Thanks!
> 
> 
>
Rafael J. Wysocki July 24, 2023, 7:59 a.m. UTC | #5
Hi Daniel,

On Sun, Jul 23, 2023 at 12:19 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Rafael,
>
> could you wait before applying I would like to review the series but I'm
> OoO ATM, coming back next week?

Yes, I can wait.

Thanks!
Rafael J. Wysocki July 25, 2023, 6:08 p.m. UTC | #6
On Tuesday, July 25, 2023 2:20:57 PM CEST Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the ACPI thermal driver use thermal_zone_device_register_with_trips()
> to register its thermal zones.
> 
> For this purpose, make it create a trip point table and pass it to
> thermal_zone_device_register_with_trips() as an argument and use the
> struct thermal_trip_ref introduced previously to connect the generic
> thermal trip structures to the internal data structures representing
> trip points in the driver.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3:
>    * Fix error code path memory leak in acpi_thermal_register_thermal_zone().
>    * Notice that the critical and hot trips never change after initialization,
>      so don't add struct thermal_trip_ref to any of them.
> 
> v1 -> v2:
>    * Use thermal_zone_device_lock()/thermal_zone_device_unlock() in
>      acpi_thermal_check_fn() explicitly and call __thermal_zone_device_update()
>      from there without unlocking the thermal zone.
>    * Export __thermal_zone_device_update() to modules (so it can be called by
>      the ACPI thermal code).
> 
> ---
>  drivers/acpi/thermal.c         |  106 ++++++++++++++++++++++++++++++++++++++---
>  drivers/thermal/thermal_core.c |    1 
>  include/linux/thermal.h        |    2 
>  3 files changed, 102 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/acpi/thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/thermal.c
> +++ linux-pm/drivers/acpi/thermal.c
> @@ -107,6 +107,7 @@ struct acpi_thermal_hot {
>  };
>  
>  struct acpi_thermal_passive {
> +	struct thermal_trip_ref trip_ref;
>  	struct acpi_handle_list devices;
>  	unsigned long temperature;
>  	unsigned long tc1;
> @@ -116,6 +117,7 @@ struct acpi_thermal_passive {
>  };
>  
>  struct acpi_thermal_active {
> +	struct thermal_trip_ref trip_ref;
>  	struct acpi_handle_list devices;
>  	unsigned long temperature;
>  	bool valid;
> @@ -137,6 +139,7 @@ struct acpi_thermal {
>  	unsigned long polling_frequency;
>  	volatile u8 zombie;
>  	struct acpi_thermal_trips trips;
> +	struct thermal_trip *trip_table;
>  	struct acpi_handle_list devices;
>  	struct thermal_zone_device *thermal_zone;
>  	int kelvin_offset;	/* in millidegrees */
> @@ -190,6 +193,14 @@ static int acpi_thermal_get_polling_freq
>  	return 0;
>  }
>  
> +static void acpi_thermal_trip_update_temp(struct acpi_thermal *tz,
> +					  struct thermal_trip *trip,
> +					  long temperature)
> +{
> +	trip->temperature = deci_kelvin_to_millicelsius_with_offset(temperature,
> +								    tz->kelvin_offset);
> +}
> +
>  static void __acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
>  {
>  	acpi_status status;
> @@ -756,6 +767,7 @@ static void acpi_thermal_zone_sysfs_remo
>  
>  static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
>  {
> +	struct thermal_trip *trip;
>  	int passive_delay = 0;
>  	int trip_count = 0;
>  	int result;
> @@ -776,12 +788,54 @@ static int acpi_thermal_register_thermal
>  	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid; i++)
>  		trip_count++;
>  
> -	tz->thermal_zone = thermal_zone_device_register("acpitz", trip_count, 0,
> -							tz, &acpi_thermal_zone_ops,
> -							NULL, passive_delay,
> -							tz->polling_frequency * 100);
> -	if (IS_ERR(tz->thermal_zone))
> -		return -ENODEV;
> +	trip = kcalloc(trip_count, sizeof(*trip), GFP_KERNEL);
> +	if (!trip)
> +		return -ENOMEM;
> +
> +	tz->trip_table = trip;
> +
> +	if (tz->trips.critical.valid) {
> +		trip->type = THERMAL_TRIP_CRITICAL;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.critical.temperature);
> +		trip++;
> +	}
> +
> +	if (tz->trips.hot.valid) {
> +		trip->type = THERMAL_TRIP_HOT;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.hot.temperature);
> +		trip++;
> +	}
> +
> +	if (tz->trips.passive.valid) {
> +		trip->type = THERMAL_TRIP_PASSIVE;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.passive.temperature);
> +		trip->driver_ref = &tz->trips.passive.trip_ref;
> +		trip++;
> +	}
> +
> +	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid; i++) {
> +		trip->type = THERMAL_TRIP_ACTIVE;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.active[i].temperature);
> +		trip->driver_ref = &tz->trips.active[i].trip_ref;
> +		trip++;
> +	}
> +
> +	tz->thermal_zone = thermal_zone_device_register_with_trips("acpitz",
> +								   tz->trip_table,
> +								   trip_count,
> +								   0, tz,
> +								   &acpi_thermal_zone_ops,
> +								   NULL,
> +								   passive_delay,
> +								   tz->polling_frequency * 100);
> +	if (IS_ERR(tz->thermal_zone)) {
> +		result = PTR_ERR(tz->thermal_zone);
> +		goto free_trip_table;
> +	}
>  
>  	result = acpi_thermal_zone_sysfs_add(tz);
>  	if (result)
> @@ -809,6 +863,8 @@ remove_links:
>  	acpi_thermal_zone_sysfs_remove(tz);
>  unregister_tzd:
>  	thermal_zone_device_unregister(tz->thermal_zone);
> +free_trip_table:
> +	kfree(tz->trip_table);
>  
>  	return result;
>  }
> @@ -817,6 +873,7 @@ static void acpi_thermal_unregister_ther
>  {
>  	acpi_thermal_zone_sysfs_remove(tz);
>  	thermal_zone_device_unregister(tz->thermal_zone);
> +	kfree(tz->trip_table);
>  	tz->thermal_zone = NULL;
>  	acpi_bus_detach_private_data(tz->device->handle);
>  }
> @@ -950,6 +1007,9 @@ static void acpi_thermal_check_fn(struct
>  {
>  	struct acpi_thermal *tz = container_of(work, struct acpi_thermal,
>  					       thermal_check_work);
> +	struct thermal_trip *trip;
> +	long temperature;
> +	int i;
>  
>  	/*
>  	 * In general, it is not sufficient to check the pending bit, because
> @@ -964,7 +1024,39 @@ static void acpi_thermal_check_fn(struct
>  
>  	mutex_lock(&tz->thermal_check_lock);
>  
> -	thermal_zone_device_update(tz->thermal_zone, THERMAL_EVENT_UNSPECIFIED);
> +	thermal_zone_device_lock(tz->thermal_zone);
> +
> +	trip = tz->trips.passive.trip_ref.trip;
> +	if (trip) {
> +		/*
> +		 * This means that the passive trip was valid initially, so
> +		 * update its temperature in case it has changed or the trip
> +		 * has become invalid.
> +		 */
> +		temperature = tz->trips.passive.valid ?
> +				tz->trips.passive.temperature :
> +				THERMAL_TEMP_INVALID;
> +		acpi_thermal_trip_update_temp(tz, trip, temperature);

This is a mistake, because THERMAL_TEMP_INVALID should not be passed to
acpi_thermal_trip_update_temp() (and same below).

I'll send an update of this patch in a moment.

> +	}
> +
> +	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++) {
> +		trip = tz->trips.active[i].trip_ref.trip;
> +		if (trip) {
> +			/*
> +			 * This means that the active trip #i was valid
> +			 * initially, so update its temperature in case it has
> +			 * changed or the trip has become invalid.
> +			 */
> +			temperature = tz->trips.active[i].valid ?
> +					tz->trips.active[i].temperature :
> +					THERMAL_TEMP_INVALID;
> +			acpi_thermal_trip_update_temp(tz, trip, temperature);
> +		}
> +	}
> +
> +	__thermal_zone_device_update(tz->thermal_zone, THERMAL_EVENT_UNSPECIFIED);
> +
> +	thermal_zone_device_unlock(tz->thermal_zone);
>  
>  	refcount_inc(&tz->thermal_check_count);
>  
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -429,6 +429,7 @@ void __thermal_zone_device_update(struct
>  
>  	monitor_thermal_zone(tz);
>  }
> +EXPORT_SYMBOL_GPL(__thermal_zone_device_update);
>  
>  static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>  					enum thermal_device_mode mode)
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -334,6 +334,8 @@ int thermal_zone_bind_cooling_device(str
>  				     unsigned int);
>  int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
>  				       struct thermal_cooling_device *);
> +void __thermal_zone_device_update(struct thermal_zone_device *,
> +				  enum thermal_notify_event);
>  void thermal_zone_device_update(struct thermal_zone_device *,
>  				enum thermal_notify_event);
>  void thermal_zone_device_lock(struct thermal_zone_device *tz);
>