diff mbox series

[v2,2/7] thermal/core: Add thresholds support

Message ID 20240816081241.1925221-3-daniel.lezcano@linaro.org
State New
Headers show
Series [v2,1/7] thermal/core: Compute low and high boundaries in thermal_zone_device_update() | expand

Commit Message

Daniel Lezcano Aug. 16, 2024, 8:12 a.m. UTC
The trip points are a firmware description of the temperature limits
of a specific thermal zone where we associate an action which is done
by the kernel. The time resolution is low.

The userspace has to deal with a more complex thermal management based
on heuristics from different information coming from different
places. The logic is much more complex but based on a bigger time
resolution, usually one second based.

The purpose of the userspace is to monitor the temperatures from
different places and take actions. However, it can not be constantly
reading the temperature to detect when a temperature threshold has
been reached. This is especially bad for mobile or embedded system as
that will lead to an unacceptable number of wakeup to check the
temperature with nothing to do.

On the other side, the sensors are now most of the time interrupt
driven. That means the thermal framework will use the temperature trip
points to program the sensor to trigger an interrupt when a
temperature limit is crossed.

Unfortunately, the userspace can not benefit this feature and current
solutions found here and there, iow out-of-tree, are to add fake trip
points in the firmware and enable the writable trip points.

This is bad for different reasons, the trip points are for in-kernel
actions, the semantic of their types is used by the thermal framework
and by adding trip points in the device tree is a way to overcome the
current limitation but tampering with how the thermal framework is
supposed to work. The writable trip points is a way to adjust a
temperature limit given a specific platform if the firmware is not
accurate enough and TBH it is more a debug feature from my POV.

The thresholds mechanism is a way to have the userspace to tell
thermal framework to send a notification when a temperature limit is
crossed. There is no id, no hysteresis, just the temperature and the
direction of the limit crossing. That means we can be notified when a
threshold is crossed the way up only, or the way down only or both
ways. That allows to create hysteresis values if it is needed.

A threshold can be added, deleted or flushed. The latter means all
thresholds belonging to a thermal zone will be deleted.

When a threshold is added:

 - if the same threshold (temperature and direction) exists, an error
   is returned

 - if a threshold is specified with the same temperature but a
   different direction, the specified direction is added

 - if there is no threshold with the same temperature then it is
   created

When a threshold is deleted:

 - if the same threshold (temperature and direction) exists, it is
   deleted

 - if a threshold is specified with the same temperature but a
   different direction, the specified direction is removed

 - if there is no threshold with the same temperature, then an error
   is returned

When the threshold are flushed:

 - All thresholds related to a thermal zone are deleted

When a threshold is crossed:

 - the userspace does not need to know which threshold(s) have been
   crossed, it will be notified with the current temperature and the
   previous temperature

 - if multiple thresholds have been crossed between two updates only
   one notification will be send to the userspace, it is pointless to
   send a notification per thresholds crossed as the userspace can
   handle that easily when it has the temperature delta information

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/Kconfig              |  15 ++
 drivers/thermal/Makefile             |   3 +
 drivers/thermal/thermal_core.h       |   4 +
 drivers/thermal/thermal_thresholds.c | 241 +++++++++++++++++++++++++++
 drivers/thermal/thermal_thresholds.h |  57 +++++++
 include/linux/thermal.h              |   3 +
 6 files changed, 323 insertions(+)
 create mode 100644 drivers/thermal/thermal_thresholds.c
 create mode 100644 drivers/thermal/thermal_thresholds.h

Comments

Rafael J. Wysocki Aug. 21, 2024, 8:05 p.m. UTC | #1
On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> The trip points are a firmware description of the temperature limits
> of a specific thermal zone where we associate an action which is done
> by the kernel. The time resolution is low.
>
> The userspace has to deal with a more complex thermal management based
> on heuristics from different information coming from different
> places. The logic is much more complex but based on a bigger time
> resolution, usually one second based.
>
> The purpose of the userspace is to monitor the temperatures from
> different places and take actions. However, it can not be constantly
> reading the temperature to detect when a temperature threshold has
> been reached. This is especially bad for mobile or embedded system as
> that will lead to an unacceptable number of wakeup to check the
> temperature with nothing to do.
>
> On the other side, the sensors are now most of the time interrupt
> driven. That means the thermal framework will use the temperature trip
> points to program the sensor to trigger an interrupt when a
> temperature limit is crossed.
>
> Unfortunately, the userspace can not benefit this feature and current
> solutions found here and there, iow out-of-tree, are to add fake trip
> points in the firmware and enable the writable trip points.
>
> This is bad for different reasons, the trip points are for in-kernel
> actions, the semantic of their types is used by the thermal framework
> and by adding trip points in the device tree is a way to overcome the
> current limitation but tampering with how the thermal framework is
> supposed to work. The writable trip points is a way to adjust a
> temperature limit given a specific platform if the firmware is not
> accurate enough and TBH it is more a debug feature from my POV.
>
> The thresholds mechanism is a way to have the userspace to tell
> thermal framework to send a notification when a temperature limit is
> crossed. There is no id, no hysteresis, just the temperature and the
> direction of the limit crossing. That means we can be notified when a
> threshold is crossed the way up only, or the way down only or both
> ways. That allows to create hysteresis values if it is needed.
>
> A threshold can be added, deleted or flushed. The latter means all
> thresholds belonging to a thermal zone will be deleted.
>
> When a threshold is added:
>
>  - if the same threshold (temperature and direction) exists, an error
>    is returned
>
>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is added
>
>  - if there is no threshold with the same temperature then it is
>    created
>
> When a threshold is deleted:
>
>  - if the same threshold (temperature and direction) exists, it is
>    deleted
>
>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is removed
>
>  - if there is no threshold with the same temperature, then an error
>    is returned
>
> When the threshold are flushed:
>
>  - All thresholds related to a thermal zone are deleted
>
> When a threshold is crossed:
>
>  - the userspace does not need to know which threshold(s) have been
>    crossed, it will be notified with the current temperature and the
>    previous temperature
>
>  - if multiple thresholds have been crossed between two updates only
>    one notification will be send to the userspace, it is pointless to
>    send a notification per thresholds crossed as the userspace can
>    handle that easily when it has the temperature delta information

The above seems to be an exact copy of the first part of the cover letter.

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/thermal/Kconfig              |  15 ++
>  drivers/thermal/Makefile             |   3 +
>  drivers/thermal/thermal_core.h       |   4 +
>  drivers/thermal/thermal_thresholds.c | 241 +++++++++++++++++++++++++++
>  drivers/thermal/thermal_thresholds.h |  57 +++++++
>  include/linux/thermal.h              |   3 +
>  6 files changed, 323 insertions(+)
>  create mode 100644 drivers/thermal/thermal_thresholds.c
>  create mode 100644 drivers/thermal/thermal_thresholds.h
>
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index ed16897584b4..84f9643678d6 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -40,6 +40,21 @@ config THERMAL_DEBUGFS
>           Say Y to allow the thermal subsystem to collect diagnostic
>           information that can be accessed via debugfs.
>
> +config THERMAL_THRESHOLDS
> +       bool "Thermal thresholds notification mechanism"
> +       depends on THERMAL_NETLINK
> +       help
> +         The userspace implements thermal engines which needs to get
> +         notified when temperature thresholds are crossed the way up
> +         and down. These notification allow them to analyze the
> +         thermal situation of the platform and take decision to
> +         fulfill specific thermal profile like 'balanced',
> +         'performance' or 'power saving'. In addition, the
> +         temperature of the skin sensor is very important in this
> +         case and must be monitored as well.
> +
> +         If in doubt, say Y
> +

I'm not sure if this needs an additional user-selectable Kconfig
option.  It is not modular anyway and not so big, and distros don't
like user-selectable Kconfig options.

>  config THERMAL_EMERGENCY_POWEROFF_DELAY_MS
>         int "Emergency poweroff delay in milli-seconds"
>         default 0
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index ce7a4752ef52..3b991b1a7db4 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -7,6 +7,9 @@ obj-$(CONFIG_THERMAL)           += thermal_sys.o
>  thermal_sys-y                  += thermal_core.o thermal_sysfs.o
>  thermal_sys-y                  += thermal_trip.o thermal_helpers.o
>
> +# thermal thresholds
> +thermal_sys-$(CONFIG_THERMAL_THRESHOLDS)       += thermal_thresholds.o
> +
>  # netlink interface to manage the thermal framework
>  thermal_sys-$(CONFIG_THERMAL_NETLINK)          += thermal_netlink.o
>
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 67a09f90eb95..0742c0f03d46 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -13,6 +13,7 @@
>  #include <linux/thermal.h>
>
>  #include "thermal_netlink.h"
> +#include "thermal_thresholds.h"
>  #include "thermal_debugfs.h"
>
>  struct thermal_trip_desc {
> @@ -132,6 +133,9 @@ struct thermal_zone_device {
>         bool resuming;
>  #ifdef CONFIG_THERMAL_DEBUGFS
>         struct thermal_debugfs *debugfs;
> +#endif
> +#ifdef CONFIG_THERMAL_THRESHOLDS
> +       struct thresholds *thresholds;

Why does it need to be a pointer?

I would just use a plain struct list_head for it anyway.

Also, as stated in my reply to the cover letter, I would prefer to
clearly distinguish these thresholds from trip thresholds, so I would
call this user_thresholds.

>  #endif
>         struct thermal_trip_desc trips[] __counted_by(num_trips);
>  };
> diff --git a/drivers/thermal/thermal_thresholds.c b/drivers/thermal/thermal_thresholds.c
> new file mode 100644
> index 000000000000..0241b468cfbd
> --- /dev/null
> +++ b/drivers/thermal/thermal_thresholds.c
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2024 Linaro Limited
> + *
> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
> + *
> + * Thermal thresholds
> + */
> +#include <linux/list.h>
> +#include <linux/list_sort.h>
> +#include <linux/slab.h>
> +
> +#include "thermal_core.h"

+#include "thermal_thresholds.h"

> +
> +struct thresholds {
> +       struct list_head list;
> +};

This duplicates the definition in the header file.

Besides, why is the wrapper struct needed?

> +
> +int thermal_thresholds_init(struct thermal_zone_device *tz)
> +{
> +       struct thresholds *thresholds;
> +
> +       thresholds = kmalloc(sizeof(*thresholds), GFP_KERNEL);
> +       if (!thresholds)
> +               return -ENOMEM;
> +
> +       INIT_LIST_HEAD(&thresholds->list);
> +       tz->thresholds = thresholds;
> +
> +       return 0;
> +}

I'd rather embed "thresholds" in struct thermal_zone_device and avoid
allocating memory separately for it.  Less code, less complexity.

> +
> +void thermal_thresholds_exit(struct thermal_zone_device *tz)
> +{
> +       thermal_thresholds_flush(tz);
> +       kfree(tz->thresholds);
> +       tz->thresholds = NULL;
> +}
> +
> +static int __thermal_thresholds_cmp(void *data,
> +                                   const struct list_head *l1,
> +                                   const struct list_head *l2)
> +{
> +       struct threshold *t1 = container_of(l1, struct threshold, list);
> +       struct threshold *t2 = container_of(l2, struct threshold, list);
> +
> +       return t1->temperature - t2->temperature;
> +}
> +
> +static struct threshold *__thermal_thresholds_find(const struct thresholds *thresholds, int temperature)
> +{
> +       struct threshold *t;
> +
> +       list_for_each_entry(t, &thresholds->list, list)
> +               if (t->temperature == temperature)
> +                       return t;
> +
> +       return NULL;
> +}
> +
> +static bool __thermal_threshold_is_crossed(struct threshold *threshold, int temperature,
> +                                          int last_temperature, int direction,
> +                                          int *low, int *high)
> +{
> +       if (temperature > threshold->temperature && threshold->temperature > *low &&
> +           (THERMAL_THRESHOLD_WAY_DOWN & threshold->direction))
> +               *low = threshold->temperature;
> +
> +       if (temperature < threshold->temperature && threshold->temperature < *high &&
> +           (THERMAL_THRESHOLD_WAY_UP & threshold->direction))
> +               *high = threshold->temperature;
> +
> +       if (temperature < threshold->temperature &&
> +           last_temperature >= threshold->temperature &&
> +           (threshold->direction & direction))
> +               return true;
> +
> +       if (temperature >= threshold->temperature &&
> +           last_temperature < threshold->temperature &&
> +           (threshold->direction & direction))
> +               return true;

I would combine the checks, so something like this

if (temperature >= threshold->temperature) {
        if (threshold->temperature > *low &&
THERMAL_THRESHOLD_WAY_DOWN & threshold->direction)
                *low = threshold->temperature;

        if (last_temperature < threshold->temperature &&
threshold->direction & direction)
                return true;
} else {
        if (threshold->temperature < *high && THERMAL_THRESHOLD_WAY_UP
& threshold->direction)
              *high = threshold->temperature;

        if (last_temperature >= threshold->temperature &&
threshold->direction & direction)
                return true;
}

> +
> +       return false;
> +}
> +
> +static bool thermal_thresholds_handle_raising(struct thresholds *thresholds, int temperature,
> +                                             int last_temperature, int *low, int *high)
> +{
> +       struct threshold *t;
> +
> +       list_for_each_entry(t, &thresholds->list, list) {
> +               if (__thermal_threshold_is_crossed(t, temperature, last_temperature,
> +                                                  THERMAL_THRESHOLD_WAY_UP, low, high))
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
> +static bool thermal_thresholds_handle_dropping(struct thresholds *thresholds, int temperature,
> +                                              int last_temperature, int *low, int *high)
> +{
> +       struct threshold *t;
> +
> +       list_for_each_entry_reverse(t, &thresholds->list, list) {
> +               if (__thermal_threshold_is_crossed(t, temperature, last_temperature,
> +                                                  THERMAL_THRESHOLD_WAY_DOWN, low, high))
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
> +void thermal_thresholds_flush(struct thermal_zone_device *tz)
> +{
> +       struct thresholds *thresholds = tz->thresholds;
> +       struct threshold *entry, *tmp;
> +
> +       lockdep_assert_held(&tz->lock);
> +
> +       list_for_each_entry_safe(entry, tmp, &thresholds->list, list) {
> +               list_del(&entry->list);
> +               kfree(entry);
> +       }
> +
> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_FLUSHED);
> +}

I'd move the function above before thermal_thresholds_exit() which
uses it.  Having it here is somewhat confusing.

> +
> +int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)

This function doesn't return anything other than 0 AFAICS.  Make it void?

> +{
> +       struct thresholds *thresholds = tz->thresholds;
> +
> +       int temperature = tz->temperature;
> +       int last_temperature = tz->last_temperature;
> +       bool notify;
> +
> +       lockdep_assert_held(&tz->lock);
> +
> +       /*
> +        * We need a second update in order to detect a threshold being crossed
> +        */
> +       if (last_temperature == THERMAL_TEMP_INVALID)
> +               return 0;

So user space won't get notified when tz->temperature is above some
thresholds the first time this runs.  Fair enough, but won't they be
confused by subsequent notifications that will not cover some
thresholds?

> +
> +       /*
> +        * The temperature is stable, so obviously we can not have
> +        * crossed a threshold.
> +        */
> +       if (last_temperature == temperature)
> +               return 0;
> +
> +       /*
> +        * Since last update the temperature:
> +        * - increased : thresholds are crossed the way up
> +        * - decreased : thresholds are crossed the way down
> +        */
> +       if (temperature > last_temperature)
> +               notify = thermal_thresholds_handle_raising(thresholds, temperature,
> +                                                          last_temperature, low, high);
> +       else
> +               notify = thermal_thresholds_handle_dropping(thresholds, temperature,
> +                                                           last_temperature, low, high);
> +
> +       if (notify)
> +               pr_debug("A threshold has been crossed the way %s, with a temperature=%d, last_temperature=%d\n",
> +                        temperature > last_temperature ? "up" : "down", temperature, last_temperature);
> +
> +       return 0;
> +}
> +
> +int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
> +{
> +       struct thresholds *thresholds = tz->thresholds;

So IMO it would be cleaner to just put "thresholds" into struct
thermal_zone_device directly as a struct list_head because the above
wouldn't be needed then.

> +       struct threshold *t;
> +
> +       lockdep_assert_held(&tz->lock);
> +
> +       t = __thermal_thresholds_find(thresholds, temperature);
> +       if (t) {
> +               if (t->direction == direction)
> +                       return -EEXIST;

Why is it useful to return an error here?

> +
> +               t->direction |= direction;
> +       } else {
> +
> +               t = kmalloc(sizeof(*t), GFP_KERNEL);
> +               if (!t)
> +                       return -ENOMEM;
> +
> +               INIT_LIST_HEAD(&t->list);
> +               t->temperature = temperature;
> +               t->direction = direction;
> +               list_add(&t->list, &thresholds->list);
> +               list_sort(NULL, &thresholds->list, __thermal_thresholds_cmp);

And the above would become

+        list_add(&t->list, &tz->thresholds);
+        list_sort(NULL, &tz->thresholdst, __thermal_thresholds_cmp);

And analogously below.

> +       }
> +
> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_ADDED);
> +
> +       return 0;
> +}
> +
> +int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction)
> +{
> +       struct thresholds *thresholds = tz->thresholds;
> +       struct threshold *t;
> +
> +       lockdep_assert_held(&tz->lock);
> +
> +       t = __thermal_thresholds_find(thresholds, temperature);
> +       if (!t)
> +               return -ENOENT;
> +
> +       if (t->direction == direction) {
> +               list_del(&t->list);
> +               kfree(t);
> +       } else {
> +               t->direction &= ~direction;
> +       }
> +
> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_DELETED);
> +
> +       return 0;
> +}
> +
> +int thermal_thresholds_for_each(struct thermal_zone_device *tz,
> +                               int (*cb)(struct threshold *, void *arg), void *arg)
> +{
> +       struct thresholds *thresholds = tz->thresholds;
> +       struct threshold *entry;
> +       int ret;
> +
> +       lockdep_assert_held(&tz->lock);
> +
> +       list_for_each_entry(entry, &thresholds->list, list) {
> +               ret = cb(entry, arg);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> diff --git a/drivers/thermal/thermal_thresholds.h b/drivers/thermal/thermal_thresholds.h
> new file mode 100644
> index 000000000000..7c8ce150d6d0
> --- /dev/null
> +++ b/drivers/thermal/thermal_thresholds.h
> @@ -0,0 +1,57 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#define THERMAL_THRESHOLD_WAY_UP   BIT(0)
> +#define THERMAL_THRESHOLD_WAY_DOWN BIT(1)
> +
> +struct threshold {
> +       int temperature;
> +       int direction;
> +       struct list_head list;
> +};

IMO it would be better to put the list field at the top for better
alignment and such.  I would also call it something like list_node.

And I'd call this struct user_threshold as a whole (as per the
previous remarks about the naming).

> +
> +#ifdef CONFIG_THERMAL_THRESHOLDS
> +int thermal_thresholds_init(struct thermal_zone_device *tz);
> +void thermal_thresholds_exit(struct thermal_zone_device *tz);
> +void thermal_thresholds_flush(struct thermal_zone_device *tz);
> +int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction);
> +int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction);
> +int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high);
> +int thermal_thresholds_for_each(struct thermal_zone_device *tz,
> +                               int (*cb)(struct threshold *, void *arg), void *arg);
> +#else
> +static inline int thermal_thresholds_init(struct thermal_zone_device *tz)
> +{
> +       return 0;
> +}
> +
> +static inline void thermal_thresholds_exit(struct thermal_zone_device *tz)
> +{
> +       ;
> +}
> +
> +static inline void thermal_thresholds_flush(struct thermal_zone_device *tz)
> +{
> +       ;
> +}
> +
> +static inline int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
> +{
> +       return 0;
> +}
> +
> +static inline int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction)
> +{
> +       return 0;
> +}
> +
> +static inline int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)
> +{
> +       return 0;
> +}
> +
> +static inline int thermal_thresholds_for_each(struct thermal_zone_device *tz,
> +                                             int (*cb)(struct threshold *, void *arg), void *arg)
> +{
> +       return 0;
> +}
> +#endif
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 25fbf960b474..bf0b4a8218f6 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -55,6 +55,9 @@ enum thermal_notify_event {
>         THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
>         THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
>         THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */

There is an additional item here in the mainline (THERMAL_TZ_RESUME).

> +       THERMAL_THRESHOLD_ADDED, /* Threshold added */
> +       THERMAL_THRESHOLD_DELETED, /* Threshold deleted */
> +       THERMAL_THRESHOLD_FLUSHED, /* All thresholds deleted */

I'd add "TZ" to these names, eg. THERMAL_TZ_THRESHOLD_ADDED, or even
THERMAL_TZ_ADD_THRESHOLD in analogy with the cdev events above.

And THERMAL_TZ_FLUSH_THRESHOLDS sounds more like proper English to me. ;-)

>  };
>
>  /**
> --
Rafael J. Wysocki Aug. 22, 2024, 11:30 a.m. UTC | #2
On Wed, Aug 21, 2024 at 10:05 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:

[cut]

> > --- /dev/null
> > +++ b/drivers/thermal/thermal_thresholds.c
> > @@ -0,0 +1,241 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright 2024 Linaro Limited
> > + *
> > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
> > + *
> > + * Thermal thresholds
> > + */
> > +#include <linux/list.h>
> > +#include <linux/list_sort.h>
> > +#include <linux/slab.h>
> > +
> > +#include "thermal_core.h"
>
> +#include "thermal_thresholds.h"
>
> > +
> > +struct thresholds {
> > +       struct list_head list;
> > +};
>
> This duplicates the definition in the header file.
>
> Besides, why is the wrapper struct needed?

On second thought, it can hold a pointer to the first threshold that
was strictly above the zone temperature when
thermal_thresholds_handle() ran last time, so something like:

struct thresholds {
      struct list_head list;
      struct user_threhold *first_above;
};

and first_above == NULL would mean that the zone temperature was above
all of the threshold or at the highest one.

Then thermal_thresholds_handle() could do something like:

tr = tz->user_thresholds.first_above;

if (tr && tz->temperature >= tr->temperature) {
    do {
        if (tr->direction & THERMAL_THRESHOLD_WAY_UP)
            notify = true;

        if (tr->list_node.next != &tz->user_thresholds.list) {
            tr = list_next_entry(tr, list_node);
        else
            tr = NULL;
    } while (tr && tz->temperature >= tr->temperature);
} else {
    if (!tr)
        tr = list_last_entry(&tz->user_thresholds.list,
                     struct user_threshold, list_node);

    while (tz->temperature < tr->temperature)
        if (tr->direction & THERMAL_THRESHOLD_WAY_DOWN)
            notify = true;

        if (tr->list_node.prev != &tz->user_thresholds.list) {
            tr = list_prev_entry(tr, list_node);
        else
            break;
    }
    if (tz->temperature >= tr->temperature)
        tr = NULL;
}
tz->user_thresholds.first_above = tr;

which is a bit simpler than the code in the current patch.

> > +
> > +int thermal_thresholds_init(struct thermal_zone_device *tz)
> > +{
> > +       struct thresholds *thresholds;
> > +
> > +       thresholds = kmalloc(sizeof(*thresholds), GFP_KERNEL);
> > +       if (!thresholds)
> > +               return -ENOMEM;
> > +
> > +       INIT_LIST_HEAD(&thresholds->list);
> > +       tz->thresholds = thresholds;
> > +
> > +       return 0;
> > +}
>
> I'd rather embed "thresholds" in struct thermal_zone_device and avoid
> allocating memory separately for it.  Less code, less complexity.
>
> > +
> > +void thermal_thresholds_exit(struct thermal_zone_device *tz)
> > +{
> > +       thermal_thresholds_flush(tz);
> > +       kfree(tz->thresholds);
> > +       tz->thresholds = NULL;
> > +}
> > +
> > +static int __thermal_thresholds_cmp(void *data,
> > +                                   const struct list_head *l1,
> > +                                   const struct list_head *l2)
> > +{
> > +       struct threshold *t1 = container_of(l1, struct threshold, list);
> > +       struct threshold *t2 = container_of(l2, struct threshold, list);
> > +
> > +       return t1->temperature - t2->temperature;
> > +}
> > +
> > +static struct threshold *__thermal_thresholds_find(const struct thresholds *thresholds, int temperature)
> > +{
> > +       struct threshold *t;
> > +
> > +       list_for_each_entry(t, &thresholds->list, list)
> > +               if (t->temperature == temperature)
> > +                       return t;
> > +
> > +       return NULL;
> > +}
> > +
> > +static bool __thermal_threshold_is_crossed(struct threshold *threshold, int temperature,
> > +                                          int last_temperature, int direction,
> > +                                          int *low, int *high)
> > +{
> > +       if (temperature > threshold->temperature && threshold->temperature > *low &&
> > +           (THERMAL_THRESHOLD_WAY_DOWN & threshold->direction))
> > +               *low = threshold->temperature;
> > +
> > +       if (temperature < threshold->temperature && threshold->temperature < *high &&
> > +           (THERMAL_THRESHOLD_WAY_UP & threshold->direction))
> > +               *high = threshold->temperature;
> > +
> > +       if (temperature < threshold->temperature &&
> > +           last_temperature >= threshold->temperature &&
> > +           (threshold->direction & direction))
> > +               return true;
> > +
> > +       if (temperature >= threshold->temperature &&
> > +           last_temperature < threshold->temperature &&
> > +           (threshold->direction & direction))
> > +               return true;
>
> I would combine the checks, so something like this
>
> if (temperature >= threshold->temperature) {
>         if (threshold->temperature > *low &&
> THERMAL_THRESHOLD_WAY_DOWN & threshold->direction)
>                 *low = threshold->temperature;
>
>         if (last_temperature < threshold->temperature &&
> threshold->direction & direction)
>                 return true;
> } else {
>         if (threshold->temperature < *high && THERMAL_THRESHOLD_WAY_UP
> & threshold->direction)
>               *high = threshold->temperature;
>
>         if (last_temperature >= threshold->temperature &&
> threshold->direction & direction)
>                 return true;
> }

Also, I'm not sure why "high" and "low" are needed at all.

The current and last zone temperature could be included in the
notification just fine and user space should be able to figure out
which thresholds are affected.

> > +
> > +       return false;
> > +}
> > +
Daniel Lezcano Aug. 22, 2024, 5:20 p.m. UTC | #3
On 21/08/2024 22:05, Rafael J. Wysocki wrote:
> On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> The trip points are a firmware description of the temperature limits
>> of a specific thermal zone where we associate an action which is done
>> by the kernel. The time resolution is low.
>>
>> The userspace has to deal with a more complex thermal management based
>> on heuristics from different information coming from different
>> places. The logic is much more complex but based on a bigger time
>> resolution, usually one second based.
>>
>> The purpose of the userspace is to monitor the temperatures from
>> different places and take actions. However, it can not be constantly
>> reading the temperature to detect when a temperature threshold has
>> been reached. This is especially bad for mobile or embedded system as
>> that will lead to an unacceptable number of wakeup to check the
>> temperature with nothing to do.
>>
>> On the other side, the sensors are now most of the time interrupt
>> driven. That means the thermal framework will use the temperature trip
>> points to program the sensor to trigger an interrupt when a
>> temperature limit is crossed.
>>
>> Unfortunately, the userspace can not benefit this feature and current
>> solutions found here and there, iow out-of-tree, are to add fake trip
>> points in the firmware and enable the writable trip points.
>>
>> This is bad for different reasons, the trip points are for in-kernel
>> actions, the semantic of their types is used by the thermal framework
>> and by adding trip points in the device tree is a way to overcome the
>> current limitation but tampering with how the thermal framework is
>> supposed to work. The writable trip points is a way to adjust a
>> temperature limit given a specific platform if the firmware is not
>> accurate enough and TBH it is more a debug feature from my POV.
>>
>> The thresholds mechanism is a way to have the userspace to tell
>> thermal framework to send a notification when a temperature limit is
>> crossed. There is no id, no hysteresis, just the temperature and the
>> direction of the limit crossing. That means we can be notified when a
>> threshold is crossed the way up only, or the way down only or both
>> ways. That allows to create hysteresis values if it is needed.
>>
>> A threshold can be added, deleted or flushed. The latter means all
>> thresholds belonging to a thermal zone will be deleted.
>>
>> When a threshold is added:
>>
>>   - if the same threshold (temperature and direction) exists, an error
>>     is returned
>>
>>   - if a threshold is specified with the same temperature but a
>>     different direction, the specified direction is added
>>
>>   - if there is no threshold with the same temperature then it is
>>     created
>>
>> When a threshold is deleted:
>>
>>   - if the same threshold (temperature and direction) exists, it is
>>     deleted
>>
>>   - if a threshold is specified with the same temperature but a
>>     different direction, the specified direction is removed
>>
>>   - if there is no threshold with the same temperature, then an error
>>     is returned
>>
>> When the threshold are flushed:
>>
>>   - All thresholds related to a thermal zone are deleted
>>
>> When a threshold is crossed:
>>
>>   - the userspace does not need to know which threshold(s) have been
>>     crossed, it will be notified with the current temperature and the
>>     previous temperature
>>
>>   - if multiple thresholds have been crossed between two updates only
>>     one notification will be send to the userspace, it is pointless to
>>     send a notification per thresholds crossed as the userspace can
>>     handle that easily when it has the temperature delta information
> 
> The above seems to be an exact copy of the first part of the cover letter.

Yes, it is done on purpose as it is the commit bringing the feature. I 
thought it is convenient for the developer to read the description of 
the commit introducing the feature.

[ ... ]

>> +config THERMAL_THRESHOLDS
>> +       bool "Thermal thresholds notification mechanism"
>> +       depends on THERMAL_NETLINK
>> +       help
>> +         The userspace implements thermal engines which needs to get
>> +         notified when temperature thresholds are crossed the way up
>> +         and down. These notification allow them to analyze the
>> +         thermal situation of the platform and take decision to
>> +         fulfill specific thermal profile like 'balanced',
>> +         'performance' or 'power saving'. In addition, the
>> +         temperature of the skin sensor is very important in this
>> +         case and must be monitored as well.
>> +
>> +         If in doubt, say Y
>> +
> 
> I'm not sure if this needs an additional user-selectable Kconfig
> option.  It is not modular anyway and not so big, and distros don't
> like user-selectable Kconfig options.

Ok, I can drop the user selectable option.

[ ... ]

>> +#endif
>> +#ifdef CONFIG_THERMAL_THRESHOLDS
>> +       struct thresholds *thresholds;
> 
> Why does it need to be a pointer?
> 
> I would just use a plain struct list_head for it anyway.
> 
> Also, as stated in my reply to the cover letter, I would prefer to
> clearly distinguish these thresholds from trip thresholds, so I would
> call this user_thresholds.
> 
>>   #endif
>>          struct thermal_trip_desc trips[] __counted_by(num_trips);
>>   };
>> diff --git a/drivers/thermal/thermal_thresholds.c b/drivers/thermal/thermal_thresholds.c
>> new file mode 100644
>> index 000000000000..0241b468cfbd
>> --- /dev/null
>> +++ b/drivers/thermal/thermal_thresholds.c
>> @@ -0,0 +1,241 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright 2024 Linaro Limited
>> + *
>> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
>> + *
>> + * Thermal thresholds
>> + */
>> +#include <linux/list.h>
>> +#include <linux/list_sort.h>
>> +#include <linux/slab.h>
>> +
>> +#include "thermal_core.h"
> 
> +#include "thermal_thresholds.h"
> 
>> +
>> +struct thresholds {
>> +       struct list_head list;
>> +};
> 
> This duplicates the definition in the header file.

No actually it is plural.

struct threshold;
struct thresholds;

> Besides, why is the wrapper struct needed?

Because I think the threshold will continue to evolve. So instead of 
dealing with lists, we use a dedicated structure. In case we add new 
fields or we change the list by something else the functions prototypes 
are untouched. It is my way of coding in order to always set the scene 
for future changes.

>> +int thermal_thresholds_init(struct thermal_zone_device *tz)
>> +{
>> +       struct thresholds *thresholds;
>> +
>> +       thresholds = kmalloc(sizeof(*thresholds), GFP_KERNEL);
>> +       if (!thresholds)
>> +               return -ENOMEM;
>> +
>> +       INIT_LIST_HEAD(&thresholds->list);
>> +       tz->thresholds = thresholds;
>> +
>> +       return 0;
>> +}
> 
> I'd rather embed "thresholds" in struct thermal_zone_device and avoid
> allocating memory separately for it.  Less code, less complexity.

Ok

>> +static bool __thermal_threshold_is_crossed(struct threshold *threshold, int temperature,
>> +                                          int last_temperature, int direction,
>> +                                          int *low, int *high)
>> +{
>> +       if (temperature > threshold->temperature && threshold->temperature > *low &&
>> +           (THERMAL_THRESHOLD_WAY_DOWN & threshold->direction))
>> +               *low = threshold->temperature;
>> +
>> +       if (temperature < threshold->temperature && threshold->temperature < *high &&
>> +           (THERMAL_THRESHOLD_WAY_UP & threshold->direction))
>> +               *high = threshold->temperature;
>> +
>> +       if (temperature < threshold->temperature &&
>> +           last_temperature >= threshold->temperature &&
>> +           (threshold->direction & direction))
>> +               return true;
>> +
>> +       if (temperature >= threshold->temperature &&
>> +           last_temperature < threshold->temperature &&
>> +           (threshold->direction & direction))
>> +               return true;
> 
> I would combine the checks, so something like this
> 
> if (temperature >= threshold->temperature) {
>          if (threshold->temperature > *low &&
> THERMAL_THRESHOLD_WAY_DOWN & threshold->direction)
>                  *low = threshold->temperature;
>          if (last_temperature < threshold->temperature &&
> threshold->direction & direction)
>                  return true;
> } else {
>          if (threshold->temperature < *high && THERMAL_THRESHOLD_WAY_UP
> & threshold->direction)
>                *high = threshold->temperature;
> 
>          if (last_temperature >= threshold->temperature &&
> threshold->direction & direction)
>                  return true;
> }

Ok

[ ... ]

>> +void thermal_thresholds_flush(struct thermal_zone_device *tz)
>> +{
>> +       struct thresholds *thresholds = tz->thresholds;
>> +       struct threshold *entry, *tmp;
>> +
>> +       lockdep_assert_held(&tz->lock);
>> +
>> +       list_for_each_entry_safe(entry, tmp, &thresholds->list, list) {
>> +               list_del(&entry->list);
>> +               kfree(entry);
>> +       }
>> +
>> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_FLUSHED);
>> +}
> 
> I'd move the function above before thermal_thresholds_exit() which
> uses it.  Having it here is somewhat confusing.

Sure

>> +
>> +int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)
> 
> This function doesn't return anything other than 0 AFAICS.  Make it void?

Ok

>> +{
>> +       struct thresholds *thresholds = tz->thresholds;
>> +
>> +       int temperature = tz->temperature;
>> +       int last_temperature = tz->last_temperature;
>> +       bool notify;
>> +
>> +       lockdep_assert_held(&tz->lock);
>> +
>> +       /*
>> +        * We need a second update in order to detect a threshold being crossed
>> +        */
>> +       if (last_temperature == THERMAL_TEMP_INVALID)
>> +               return 0;
> 
> So user space won't get notified when tz->temperature is above some
> thresholds the first time this runs.  Fair enough, but won't they be
> confused by subsequent notifications that will not cover some
> thresholds?

It is unlikely to happen.

When the thermal zone is created and enabled, it updates the temperature 
of the thermal zone so tz->temperature is no longer THERMAL_TEMP_INVALID.

At this step, there is no userspace set yet.

Assuming there is zero update until we create a threshold. When this one 
is created then the thermal zone is updated, the temperature is read, 
the tz->last_temperature = tz->temperature and tz->temperature has the 
new value. Then handle_thresholds is called and finally set_trips

If the temperature is above or below a threshold then it is detected and 
notified because tz->last_temperature is not equal to THERMAL_TEMP_INVALID.

Others situations are IMO resulting from a bogus driver/sensor and 
should be handled at the sensor level, not the core code.

[ ... ]

>> +int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
>> +{
>> +       struct thresholds *thresholds = tz->thresholds;
> 
> So IMO it would be cleaner to just put "thresholds" into struct
> thermal_zone_device directly as a struct list_head because the above
> wouldn't be needed then.

I can change that to list directly but if we add anything (which can 
probably happen) then the API change as well as anything below again.

>> +       struct threshold *t;
>> +
>> +       lockdep_assert_held(&tz->lock);
>> +
>> +       t = __thermal_thresholds_find(thresholds, temperature);
>> +       if (t) {
>> +               if (t->direction == direction)
>> +                       return -EEXIST;
> 
> Why is it useful to return an error here?

I was expecting this comment :)

We have the choice between :
  * we do nothing
  * we return an error

Let's assume the userspace is misbehaving and because of an internal bug 
of a thermal engine it creates multiple times the same threshold (eg. 
index not incremented, etc ...). If the kernel reports nothing, then the 
user space will never detect this problem. If it reports the error the 
user space can choose to ignore it or follow it up.

The kernel is strict and it is up to the user space to ignore it or not.

>> +
>> +               t->direction |= direction;
>> +       } else {
>> +
>> +               t = kmalloc(sizeof(*t), GFP_KERNEL);
>> +               if (!t)
>> +                       return -ENOMEM;
>> +
>> +               INIT_LIST_HEAD(&t->list);
>> +               t->temperature = temperature;
>> +               t->direction = direction;
>> +               list_add(&t->list, &thresholds->list);
>> +               list_sort(NULL, &thresholds->list, __thermal_thresholds_cmp);
> 
> And the above would become
> 
> +        list_add(&t->list, &tz->thresholds);
> +        list_sort(NULL, &tz->thresholdst, __thermal_thresholds_cmp);
> 
> And analogously below.
> 
>> +       }
>> +
>> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_ADDED);
>> +
>> +       return 0;
>> +}

[ ... ]

>> +struct threshold {
>> +       int temperature;
>> +       int direction;
>> +       struct list_head list;
>> +};
> 
> IMO it would be better to put the list field at the top for better
> alignment and such.  I would also call it something like list_node.
> 
> And I'd call this struct user_threshold as a whole (as per the
> previous remarks about the naming).

Ok

[ ... ]

>> @@ -55,6 +55,9 @@ enum thermal_notify_event {
>>          THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
>>          THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
>>          THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
> 
> There is an additional item here in the mainline (THERMAL_TZ_RESUME).
> 
>> +       THERMAL_THRESHOLD_ADDED, /* Threshold added */
>> +       THERMAL_THRESHOLD_DELETED, /* Threshold deleted */
>> +       THERMAL_THRESHOLD_FLUSHED, /* All thresholds deleted */
> 
> I'd add "TZ" to these names, eg. THERMAL_TZ_THRESHOLD_ADDED, or even
> THERMAL_TZ_ADD_THRESHOLD in analogy with the cdev events above.
> 
> And THERMAL_TZ_FLUSH_THRESHOLDS sounds more like proper English to me. ;-)

Ok, will do the change
Rafael J. Wysocki Aug. 22, 2024, 8:09 p.m. UTC | #4
On Thu, Aug 22, 2024 at 7:20 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 21/08/2024 22:05, Rafael J. Wysocki wrote:
> > On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> The trip points are a firmware description of the temperature limits
> >> of a specific thermal zone where we associate an action which is done
> >> by the kernel. The time resolution is low.
> >>
> >> The userspace has to deal with a more complex thermal management based
> >> on heuristics from different information coming from different
> >> places. The logic is much more complex but based on a bigger time
> >> resolution, usually one second based.
> >>
> >> The purpose of the userspace is to monitor the temperatures from
> >> different places and take actions. However, it can not be constantly
> >> reading the temperature to detect when a temperature threshold has
> >> been reached. This is especially bad for mobile or embedded system as
> >> that will lead to an unacceptable number of wakeup to check the
> >> temperature with nothing to do.
> >>
> >> On the other side, the sensors are now most of the time interrupt
> >> driven. That means the thermal framework will use the temperature trip
> >> points to program the sensor to trigger an interrupt when a
> >> temperature limit is crossed.
> >>
> >> Unfortunately, the userspace can not benefit this feature and current
> >> solutions found here and there, iow out-of-tree, are to add fake trip
> >> points in the firmware and enable the writable trip points.
> >>
> >> This is bad for different reasons, the trip points are for in-kernel
> >> actions, the semantic of their types is used by the thermal framework
> >> and by adding trip points in the device tree is a way to overcome the
> >> current limitation but tampering with how the thermal framework is
> >> supposed to work. The writable trip points is a way to adjust a
> >> temperature limit given a specific platform if the firmware is not
> >> accurate enough and TBH it is more a debug feature from my POV.
> >>
> >> The thresholds mechanism is a way to have the userspace to tell
> >> thermal framework to send a notification when a temperature limit is
> >> crossed. There is no id, no hysteresis, just the temperature and the
> >> direction of the limit crossing. That means we can be notified when a
> >> threshold is crossed the way up only, or the way down only or both
> >> ways. That allows to create hysteresis values if it is needed.
> >>
> >> A threshold can be added, deleted or flushed. The latter means all
> >> thresholds belonging to a thermal zone will be deleted.
> >>
> >> When a threshold is added:
> >>
> >>   - if the same threshold (temperature and direction) exists, an error
> >>     is returned
> >>
> >>   - if a threshold is specified with the same temperature but a
> >>     different direction, the specified direction is added
> >>
> >>   - if there is no threshold with the same temperature then it is
> >>     created
> >>
> >> When a threshold is deleted:
> >>
> >>   - if the same threshold (temperature and direction) exists, it is
> >>     deleted
> >>
> >>   - if a threshold is specified with the same temperature but a
> >>     different direction, the specified direction is removed
> >>
> >>   - if there is no threshold with the same temperature, then an error
> >>     is returned
> >>
> >> When the threshold are flushed:
> >>
> >>   - All thresholds related to a thermal zone are deleted
> >>
> >> When a threshold is crossed:
> >>
> >>   - the userspace does not need to know which threshold(s) have been
> >>     crossed, it will be notified with the current temperature and the
> >>     previous temperature
> >>
> >>   - if multiple thresholds have been crossed between two updates only
> >>     one notification will be send to the userspace, it is pointless to
> >>     send a notification per thresholds crossed as the userspace can
> >>     handle that easily when it has the temperature delta information
> >
> > The above seems to be an exact copy of the first part of the cover letter.
>
> Yes, it is done on purpose as it is the commit bringing the feature. I
> thought it is convenient for the developer to read the description of
> the commit introducing the feature.

I agree, but then you could just say "refer to the changelog of the
first patch for details" in the cover letter.

> [ ... ]
>
> >> +config THERMAL_THRESHOLDS
> >> +       bool "Thermal thresholds notification mechanism"
> >> +       depends on THERMAL_NETLINK
> >> +       help
> >> +         The userspace implements thermal engines which needs to get
> >> +         notified when temperature thresholds are crossed the way up
> >> +         and down. These notification allow them to analyze the
> >> +         thermal situation of the platform and take decision to
> >> +         fulfill specific thermal profile like 'balanced',
> >> +         'performance' or 'power saving'. In addition, the
> >> +         temperature of the skin sensor is very important in this
> >> +         case and must be monitored as well.
> >> +
> >> +         If in doubt, say Y
> >> +
> >
> > I'm not sure if this needs an additional user-selectable Kconfig
> > option.  It is not modular anyway and not so big, and distros don't
> > like user-selectable Kconfig options.
>
> Ok, I can drop the user selectable option.
>
> [ ... ]
>
> >> +#endif
> >> +#ifdef CONFIG_THERMAL_THRESHOLDS
> >> +       struct thresholds *thresholds;
> >
> > Why does it need to be a pointer?
> >
> > I would just use a plain struct list_head for it anyway.
> >
> > Also, as stated in my reply to the cover letter, I would prefer to
> > clearly distinguish these thresholds from trip thresholds, so I would
> > call this user_thresholds.
> >
> >>   #endif
> >>          struct thermal_trip_desc trips[] __counted_by(num_trips);
> >>   };
> >> diff --git a/drivers/thermal/thermal_thresholds.c b/drivers/thermal/thermal_thresholds.c
> >> new file mode 100644
> >> index 000000000000..0241b468cfbd
> >> --- /dev/null
> >> +++ b/drivers/thermal/thermal_thresholds.c
> >> @@ -0,0 +1,241 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Copyright 2024 Linaro Limited
> >> + *
> >> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> + *
> >> + * Thermal thresholds
> >> + */
> >> +#include <linux/list.h>
> >> +#include <linux/list_sort.h>
> >> +#include <linux/slab.h>
> >> +
> >> +#include "thermal_core.h"
> >
> > +#include "thermal_thresholds.h"
> >
> >> +
> >> +struct thresholds {
> >> +       struct list_head list;
> >> +};
> >
> > This duplicates the definition in the header file.
>
> No actually it is plural.
>
> struct threshold;
> struct thresholds;

I saw the first one, but for some reason I thought the other was there
too, sorry.

But this means that it is missing from thermal_core.h where it is used
in the struct thermal_zone_device definition.

> > Besides, why is the wrapper struct needed?
>
> Because I think the threshold will continue to evolve.

Fair enough, like I said here:

https://lore.kernel.org/linux-pm/CAJZ5v0h=DgBSiFbdmnzSFjEJd6sdBffCODspxmM-G92FN2HGiA@mail.gmail.com/

> So instead of
> dealing with lists, we use a dedicated structure. In case we add new
> fields or we change the list by something else the functions prototypes
> are untouched. It is my way of coding in order to always set the scene
> for future changes.
>
> >> +int thermal_thresholds_init(struct thermal_zone_device *tz)
> >> +{
> >> +       struct thresholds *thresholds;
> >> +
> >> +       thresholds = kmalloc(sizeof(*thresholds), GFP_KERNEL);
> >> +       if (!thresholds)
> >> +               return -ENOMEM;
> >> +
> >> +       INIT_LIST_HEAD(&thresholds->list);
> >> +       tz->thresholds = thresholds;
> >> +
> >> +       return 0;
> >> +}
> >
> > I'd rather embed "thresholds" in struct thermal_zone_device and avoid
> > allocating memory separately for it.  Less code, less complexity.
>
> Ok
>
> >> +static bool __thermal_threshold_is_crossed(struct threshold *threshold, int temperature,
> >> +                                          int last_temperature, int direction,
> >> +                                          int *low, int *high)
> >> +{
> >> +       if (temperature > threshold->temperature && threshold->temperature > *low &&
> >> +           (THERMAL_THRESHOLD_WAY_DOWN & threshold->direction))
> >> +               *low = threshold->temperature;
> >> +
> >> +       if (temperature < threshold->temperature && threshold->temperature < *high &&
> >> +           (THERMAL_THRESHOLD_WAY_UP & threshold->direction))
> >> +               *high = threshold->temperature;
> >> +
> >> +       if (temperature < threshold->temperature &&
> >> +           last_temperature >= threshold->temperature &&
> >> +           (threshold->direction & direction))
> >> +               return true;
> >> +
> >> +       if (temperature >= threshold->temperature &&
> >> +           last_temperature < threshold->temperature &&
> >> +           (threshold->direction & direction))
> >> +               return true;
> >
> > I would combine the checks, so something like this
> >
> > if (temperature >= threshold->temperature) {
> >          if (threshold->temperature > *low &&
> > THERMAL_THRESHOLD_WAY_DOWN & threshold->direction)
> >                  *low = threshold->temperature;
> >          if (last_temperature < threshold->temperature &&
> > threshold->direction & direction)
> >                  return true;
> > } else {
> >          if (threshold->temperature < *high && THERMAL_THRESHOLD_WAY_UP
> > & threshold->direction)
> >                *high = threshold->temperature;
> >
> >          if (last_temperature >= threshold->temperature &&
> > threshold->direction & direction)
> >                  return true;
> > }
>
> Ok
>
> [ ... ]
>
> >> +void thermal_thresholds_flush(struct thermal_zone_device *tz)
> >> +{
> >> +       struct thresholds *thresholds = tz->thresholds;
> >> +       struct threshold *entry, *tmp;
> >> +
> >> +       lockdep_assert_held(&tz->lock);
> >> +
> >> +       list_for_each_entry_safe(entry, tmp, &thresholds->list, list) {
> >> +               list_del(&entry->list);
> >> +               kfree(entry);
> >> +       }
> >> +
> >> +       __thermal_zone_device_update(tz, THERMAL_THRESHOLD_FLUSHED);
> >> +}
> >
> > I'd move the function above before thermal_thresholds_exit() which
> > uses it.  Having it here is somewhat confusing.
>
> Sure
>
> >> +
> >> +int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)
> >
> > This function doesn't return anything other than 0 AFAICS.  Make it void?
>
> Ok
>
> >> +{
> >> +       struct thresholds *thresholds = tz->thresholds;
> >> +
> >> +       int temperature = tz->temperature;
> >> +       int last_temperature = tz->last_temperature;
> >> +       bool notify;
> >> +
> >> +       lockdep_assert_held(&tz->lock);
> >> +
> >> +       /*
> >> +        * We need a second update in order to detect a threshold being crossed
> >> +        */
> >> +       if (last_temperature == THERMAL_TEMP_INVALID)
> >> +               return 0;
> >
> > So user space won't get notified when tz->temperature is above some
> > thresholds the first time this runs.  Fair enough, but won't they be
> > confused by subsequent notifications that will not cover some
> > thresholds?
>
> It is unlikely to happen.
>
> When the thermal zone is created and enabled, it updates the temperature
> of the thermal zone so tz->temperature is no longer THERMAL_TEMP_INVALID.
>
> At this step, there is no userspace set yet.
>
> Assuming there is zero update until we create a threshold. When this one
> is created then the thermal zone is updated, the temperature is read,
> the tz->last_temperature = tz->temperature and tz->temperature has the
> new value. Then handle_thresholds is called and finally set_trips
>
> If the temperature is above or below a threshold then it is detected and
> notified because tz->last_temperature is not equal to THERMAL_TEMP_INVALID.
>
> Others situations are IMO resulting from a bogus driver/sensor and
> should be handled at the sensor level, not the core code.
>
> [ ... ]
>
> >> +int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
> >> +{
> >> +       struct thresholds *thresholds = tz->thresholds;
> >
> > So IMO it would be cleaner to just put "thresholds" into struct
> > thermal_zone_device directly as a struct list_head because the above
> > wouldn't be needed then.
>
> I can change that to list directly but if we add anything (which can
> probably happen) then the API change as well as anything below again.

But it can be

 #ifdef CONFIG_THERMAL_DEBUGFS
     struct thermal_debugfs *debugfs;
 #endif
+   struct thresholds thresholds;
     struct thermal_trip_desc trips[] __counted_by(num_trips);
 };

> >> +       struct threshold *t;
> >> +
> >> +       lockdep_assert_held(&tz->lock);
> >> +
> >> +       t = __thermal_thresholds_find(thresholds, temperature);
> >> +       if (t) {
> >> +               if (t->direction == direction)
> >> +                       return -EEXIST;
> >
> > Why is it useful to return an error here?
>
> I was expecting this comment :)
>
> We have the choice between :
>   * we do nothing
>   * we return an error
>
> Let's assume the userspace is misbehaving and because of an internal bug
> of a thermal engine it creates multiple times the same threshold (eg.
> index not incremented, etc ...). If the kernel reports nothing, then the
> user space will never detect this problem. If it reports the error the
> user space can choose to ignore it or follow it up.
>
> The kernel is strict and it is up to the user space to ignore it or not.

Well, I'm not sure.

IMO it will just cause user space code to be more complex in some
valid use cases, but whatever.
Daniel Lezcano Sept. 4, 2024, 8:43 a.m. UTC | #5
On 22/08/2024 13:30, Rafael J. Wysocki wrote:
> On Wed, Aug 21, 2024 at 10:05 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
>> <daniel.lezcano@linaro.org> wrote:
> 
> [cut]
> 
>>> --- /dev/null
>>> +++ b/drivers/thermal/thermal_thresholds.c
>>> @@ -0,0 +1,241 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright 2024 Linaro Limited
>>> + *
>>> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
>>> + *
>>> + * Thermal thresholds
>>> + */
>>> +#include <linux/list.h>
>>> +#include <linux/list_sort.h>
>>> +#include <linux/slab.h>
>>> +
>>> +#include "thermal_core.h"
>>
>> +#include "thermal_thresholds.h"
>>
>>> +
>>> +struct thresholds {
>>> +       struct list_head list;
>>> +};
>>
>> This duplicates the definition in the header file.
>>
>> Besides, why is the wrapper struct needed?
> 
> On second thought, it can hold a pointer to the first threshold that
> was strictly above the zone temperature when
> thermal_thresholds_handle() ran last time, so something like:
> 
> struct thresholds {
>        struct list_head list;
>        struct user_threhold *first_above;
> };
> 
> and first_above == NULL would mean that the zone temperature was above
> all of the threshold or at the highest one.
> 
> Then thermal_thresholds_handle() could do something like:
> 
> tr = tz->user_thresholds.first_above;
> 
> if (tr && tz->temperature >= tr->temperature) {
>      do {
>          if (tr->direction & THERMAL_THRESHOLD_WAY_UP)
>              notify = true;
> 
>          if (tr->list_node.next != &tz->user_thresholds.list) {
>              tr = list_next_entry(tr, list_node);
>          else
>              tr = NULL;
>      } while (tr && tz->temperature >= tr->temperature);
> } else {
>      if (!tr)
>          tr = list_last_entry(&tz->user_thresholds.list,
>                       struct user_threshold, list_node);
> 
>      while (tz->temperature < tr->temperature)
>          if (tr->direction & THERMAL_THRESHOLD_WAY_DOWN)
>              notify = true;
> 
>          if (tr->list_node.prev != &tz->user_thresholds.list) {
>              tr = list_prev_entry(tr, list_node);
>          else
>              break;
>      }
>      if (tz->temperature >= tr->temperature)
>          tr = NULL;
> }
> tz->user_thresholds.first_above = tr;
> 
> which is a bit simpler than the code in the current patch.

TBH, it is probably a matter of taste but I don't see how it is simpler, 
especially if someone wants to to understand or add something in the 
code. I would prefer to keep the initial routine which was already tested.

[ ... ]

> 
> Also, I'm not sure why "high" and "low" are needed at all.
> 
> The current and last zone temperature could be included in the
> notification just fine and user space should be able to figure out
> which thresholds are affected.

I'm not sure to get the question. high and low are needed because the 
thermal core will call set_trips(). The trips and the thresholds are 
combined to find out the high and low temperature to program the driver 
to trigger an interrupt.
diff mbox series

Patch

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index ed16897584b4..84f9643678d6 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -40,6 +40,21 @@  config THERMAL_DEBUGFS
 	  Say Y to allow the thermal subsystem to collect diagnostic
 	  information that can be accessed via debugfs.
 
+config THERMAL_THRESHOLDS
+	bool "Thermal thresholds notification mechanism"
+	depends on THERMAL_NETLINK
+	help
+	  The userspace implements thermal engines which needs to get
+	  notified when temperature thresholds are crossed the way up
+	  and down. These notification allow them to analyze the
+	  thermal situation of the platform and take decision to
+	  fulfill specific thermal profile like 'balanced',
+	  'performance' or 'power saving'. In addition, the
+	  temperature of the skin sensor is very important in this
+	  case and must be monitored as well.
+
+	  If in doubt, say Y
+
 config THERMAL_EMERGENCY_POWEROFF_DELAY_MS
 	int "Emergency poweroff delay in milli-seconds"
 	default 0
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index ce7a4752ef52..3b991b1a7db4 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -7,6 +7,9 @@  obj-$(CONFIG_THERMAL)		+= thermal_sys.o
 thermal_sys-y			+= thermal_core.o thermal_sysfs.o
 thermal_sys-y			+= thermal_trip.o thermal_helpers.o
 
+# thermal thresholds
+thermal_sys-$(CONFIG_THERMAL_THRESHOLDS)	+= thermal_thresholds.o
+
 # netlink interface to manage the thermal framework
 thermal_sys-$(CONFIG_THERMAL_NETLINK)		+= thermal_netlink.o
 
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 67a09f90eb95..0742c0f03d46 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -13,6 +13,7 @@ 
 #include <linux/thermal.h>
 
 #include "thermal_netlink.h"
+#include "thermal_thresholds.h"
 #include "thermal_debugfs.h"
 
 struct thermal_trip_desc {
@@ -132,6 +133,9 @@  struct thermal_zone_device {
 	bool resuming;
 #ifdef CONFIG_THERMAL_DEBUGFS
 	struct thermal_debugfs *debugfs;
+#endif
+#ifdef CONFIG_THERMAL_THRESHOLDS
+	struct thresholds *thresholds;
 #endif
 	struct thermal_trip_desc trips[] __counted_by(num_trips);
 };
diff --git a/drivers/thermal/thermal_thresholds.c b/drivers/thermal/thermal_thresholds.c
new file mode 100644
index 000000000000..0241b468cfbd
--- /dev/null
+++ b/drivers/thermal/thermal_thresholds.c
@@ -0,0 +1,241 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2024 Linaro Limited
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ * Thermal thresholds
+ */
+#include <linux/list.h>
+#include <linux/list_sort.h>
+#include <linux/slab.h>
+
+#include "thermal_core.h"
+
+struct thresholds {
+	struct list_head list;
+};
+
+int thermal_thresholds_init(struct thermal_zone_device *tz)
+{
+	struct thresholds *thresholds;
+
+	thresholds = kmalloc(sizeof(*thresholds), GFP_KERNEL);
+	if (!thresholds)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&thresholds->list);
+	tz->thresholds = thresholds;
+
+	return 0;
+}
+
+void thermal_thresholds_exit(struct thermal_zone_device *tz)
+{
+	thermal_thresholds_flush(tz);
+	kfree(tz->thresholds);
+	tz->thresholds = NULL;
+}
+
+static int __thermal_thresholds_cmp(void *data,
+				    const struct list_head *l1,
+				    const struct list_head *l2)
+{
+	struct threshold *t1 = container_of(l1, struct threshold, list);
+	struct threshold *t2 = container_of(l2, struct threshold, list);
+
+	return t1->temperature - t2->temperature;
+}
+
+static struct threshold *__thermal_thresholds_find(const struct thresholds *thresholds, int temperature)
+{
+	struct threshold *t;
+
+	list_for_each_entry(t, &thresholds->list, list)
+		if (t->temperature == temperature)
+			return t;
+
+	return NULL;
+}
+
+static bool __thermal_threshold_is_crossed(struct threshold *threshold, int temperature,
+					   int last_temperature, int direction,
+					   int *low, int *high)
+{
+	if (temperature > threshold->temperature && threshold->temperature > *low &&
+	    (THERMAL_THRESHOLD_WAY_DOWN & threshold->direction))
+		*low = threshold->temperature;
+
+	if (temperature < threshold->temperature && threshold->temperature < *high &&
+	    (THERMAL_THRESHOLD_WAY_UP & threshold->direction))
+		*high = threshold->temperature;
+
+	if (temperature < threshold->temperature &&
+	    last_temperature >= threshold->temperature &&
+	    (threshold->direction & direction))
+		return true;
+
+	if (temperature >= threshold->temperature &&
+	    last_temperature < threshold->temperature &&
+	    (threshold->direction & direction))
+		return true;
+
+	return false;
+}
+
+static bool thermal_thresholds_handle_raising(struct thresholds *thresholds, int temperature,
+					      int last_temperature, int *low, int *high)
+{
+	struct threshold *t;
+
+	list_for_each_entry(t, &thresholds->list, list) {
+		if (__thermal_threshold_is_crossed(t, temperature, last_temperature,
+						   THERMAL_THRESHOLD_WAY_UP, low, high))
+			return true;
+	}
+
+	return false;
+}
+
+static bool thermal_thresholds_handle_dropping(struct thresholds *thresholds, int temperature,
+					       int last_temperature, int *low, int *high)
+{
+	struct threshold *t;
+
+	list_for_each_entry_reverse(t, &thresholds->list, list) {
+		if (__thermal_threshold_is_crossed(t, temperature, last_temperature,
+						   THERMAL_THRESHOLD_WAY_DOWN, low, high))
+			return true;
+	}
+
+	return false;
+}
+
+void thermal_thresholds_flush(struct thermal_zone_device *tz)
+{
+	struct thresholds *thresholds = tz->thresholds;
+	struct threshold *entry, *tmp;
+
+	lockdep_assert_held(&tz->lock);
+
+	list_for_each_entry_safe(entry, tmp, &thresholds->list, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+
+	__thermal_zone_device_update(tz, THERMAL_THRESHOLD_FLUSHED);
+}
+
+int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)
+{
+	struct thresholds *thresholds = tz->thresholds;
+
+	int temperature = tz->temperature;
+	int last_temperature = tz->last_temperature;
+	bool notify;
+
+	lockdep_assert_held(&tz->lock);
+
+	/*
+	 * We need a second update in order to detect a threshold being crossed
+	 */
+	if (last_temperature == THERMAL_TEMP_INVALID)
+		return 0;
+
+	/*
+	 * The temperature is stable, so obviously we can not have
+	 * crossed a threshold.
+	 */
+	if (last_temperature == temperature)
+		return 0;
+
+	/*
+	 * Since last update the temperature:
+	 * - increased : thresholds are crossed the way up
+	 * - decreased : thresholds are crossed the way down
+	 */
+	if (temperature > last_temperature)
+		notify = thermal_thresholds_handle_raising(thresholds, temperature,
+							   last_temperature, low, high);
+	else
+		notify = thermal_thresholds_handle_dropping(thresholds, temperature,
+							    last_temperature, low, high);
+
+	if (notify)
+		pr_debug("A threshold has been crossed the way %s, with a temperature=%d, last_temperature=%d\n",
+			 temperature > last_temperature ? "up" : "down", temperature, last_temperature);
+
+	return 0;
+}
+
+int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
+{
+	struct thresholds *thresholds = tz->thresholds;
+	struct threshold *t;
+
+	lockdep_assert_held(&tz->lock);
+
+	t = __thermal_thresholds_find(thresholds, temperature);
+	if (t) {
+		if (t->direction == direction)
+			return -EEXIST;
+
+		t->direction |= direction;
+	} else {
+
+		t = kmalloc(sizeof(*t), GFP_KERNEL);
+		if (!t)
+			return -ENOMEM;
+
+		INIT_LIST_HEAD(&t->list);
+		t->temperature = temperature;
+		t->direction = direction;
+		list_add(&t->list, &thresholds->list);
+		list_sort(NULL, &thresholds->list, __thermal_thresholds_cmp);
+	}
+
+	__thermal_zone_device_update(tz, THERMAL_THRESHOLD_ADDED);
+
+	return 0;
+}
+
+int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction)
+{
+	struct thresholds *thresholds = tz->thresholds;
+	struct threshold *t;
+
+	lockdep_assert_held(&tz->lock);
+
+	t = __thermal_thresholds_find(thresholds, temperature);
+	if (!t)
+		return -ENOENT;
+
+	if (t->direction == direction) {
+		list_del(&t->list);
+		kfree(t);
+	} else {
+		t->direction &= ~direction;
+	}
+
+	__thermal_zone_device_update(tz, THERMAL_THRESHOLD_DELETED);
+
+	return 0;
+}
+
+int thermal_thresholds_for_each(struct thermal_zone_device *tz,
+				int (*cb)(struct threshold *, void *arg), void *arg)
+{
+	struct thresholds *thresholds = tz->thresholds;
+	struct threshold *entry;
+	int ret;
+
+	lockdep_assert_held(&tz->lock);
+
+	list_for_each_entry(entry, &thresholds->list, list) {
+		ret = cb(entry, arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
diff --git a/drivers/thermal/thermal_thresholds.h b/drivers/thermal/thermal_thresholds.h
new file mode 100644
index 000000000000..7c8ce150d6d0
--- /dev/null
+++ b/drivers/thermal/thermal_thresholds.h
@@ -0,0 +1,57 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#define THERMAL_THRESHOLD_WAY_UP   BIT(0)
+#define THERMAL_THRESHOLD_WAY_DOWN BIT(1)
+
+struct threshold {
+	int temperature;
+	int direction;
+	struct list_head list;
+};
+
+#ifdef CONFIG_THERMAL_THRESHOLDS
+int thermal_thresholds_init(struct thermal_zone_device *tz);
+void thermal_thresholds_exit(struct thermal_zone_device *tz);
+void thermal_thresholds_flush(struct thermal_zone_device *tz);
+int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction);
+int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction);
+int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high);
+int thermal_thresholds_for_each(struct thermal_zone_device *tz,
+				int (*cb)(struct threshold *, void *arg), void *arg);
+#else
+static inline int thermal_thresholds_init(struct thermal_zone_device *tz)
+{
+	return 0;
+}
+
+static inline void thermal_thresholds_exit(struct thermal_zone_device *tz)
+{
+	;
+}
+
+static inline void thermal_thresholds_flush(struct thermal_zone_device *tz)
+{
+	;
+}
+
+static inline int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction)
+{
+	return 0;
+}
+
+static inline int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction)
+{
+	return 0;
+}
+
+static inline int thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high)
+{
+	return 0;
+}
+
+static inline int thermal_thresholds_for_each(struct thermal_zone_device *tz,
+					      int (*cb)(struct threshold *, void *arg), void *arg)
+{
+	return 0;
+}
+#endif
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 25fbf960b474..bf0b4a8218f6 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -55,6 +55,9 @@  enum thermal_notify_event {
 	THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
 	THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
 	THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
+	THERMAL_THRESHOLD_ADDED, /* Threshold added */
+	THERMAL_THRESHOLD_DELETED, /* Threshold deleted */
+	THERMAL_THRESHOLD_FLUSHED, /* All thresholds deleted */
 };
 
 /**