mbox series

[v4,00/30] Rework the trip points creation

Message ID 20220921094244.606948-1-daniel.lezcano@linaro.org
Headers show
Series Rework the trip points creation | expand

Message

Daniel Lezcano Sept. 21, 2022, 9:42 a.m. UTC
This work is the pre-requisite of handling correctly when the trip
point are crossed. For that we need to rework how the trip points are
declared and assigned to a thermal zone.

Even if it appears to be a common sense to have the trip points being
ordered, this no guarantee neither documentation telling that is the
case.

One solution could have been to create an ordered array of trips built
when registering the thermal zone by calling the different get_trip*
ops. However those ops receive a thermal zone pointer which is not
known as it is in the process of creating it.

This cyclic dependency shows we have to rework how we manage the trip
points.

Actually, all the trip points definition can be common to the backend
sensor drivers and we can factor out the thermal trip structure in all
of them.

Then, as we register the thermal trips array, they will be available
in the thermal zone structure and a core function can return the trip
given its id.

The get_trip_* ops won't be needed anymore and could be removed. The
resulting code will be another step forward to a self encapsulated
generic thermal framework.

Most of the drivers can be converted more or less easily. This series
does a first round with most of the drivers. Some remain and will be
converted but with a smaller set of changes as the conversion is a bit
more complex.

Changelog:
 v4:
   - Remove extra lines on exynos changes as reported by Krzysztof Kozlowski
   - Collected tags
 v3:
   - Reorg the series to be git-bisect safe
   - Added the set_trip generic function
   - Added the get_crit_temp generic function
   - Removed more dead code in the thermal-of
   - Fixed the exynos changelog
   - Fixed the error check for the exynos drivers
   - Collected tags
 v2:
   - Added missing EXPORT_SYMBOL_GPL() for thermal_zone_get_trip()
   - Removed tab whitespace in the acerhdf driver
   - Collected tags

Cc: Raju Rangoju <rajur@chelsio.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Peter Kaestle <peter@piie.net>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amitk@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Broadcom Kernel Team <bcm-kernel-feedback-list@broadcom.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Support Opensource <support.opensource@diasemi.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ragnatech.se>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: platform-driver-x86@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-renesas-soc@vger.kernel.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-tegra@vger.kernel.org
Cc: linux-omap@vger.kernel.org

Daniel Lezcano (30):
  thermal/core: Add a generic thermal_zone_get_trip() function
  thermal/sysfs: Do not make get_trip_hyst optional
  thermal/core: Add a generic thermal_zone_set_trip() function
  thermal/core: Add a generic thermal_zone_get_crit_temp() function
  thermal/core/governors: Use thermal_zone_get_trip() instead of ops
    functions
  thermal/of: Use generic thermal_zone_get_trip() function
  thermal/of: Remove unused functions
  thermal/drivers/exynos: Use generic thermal_zone_get_trip() function
  thermal/drivers/exynos: of_thermal_get_ntrips()
  thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by
    thermal_zone_get_trip()
  thermal/drivers/tegra: Use generic thermal_zone_get_trip() function
  thermal/drivers/uniphier: Use generic thermal_zone_get_trip() function
  thermal/drivers/hisi: Use generic thermal_zone_get_trip() function
  thermal/drivers/qcom: Use generic thermal_zone_get_trip() function
  thermal/drivers/armada: Use generic thermal_zone_get_trip() function
  thermal/drivers/rcar_gen3: Use the generic function to get the number
    of trips
  thermal/of: Remove of_thermal_get_ntrips()
  thermal/of: Remove of_thermal_is_trip_valid()
  thermal/of: Remove of_thermal_set_trip_hyst()
  thermal/of: Remove of_thermal_get_crit_temp()
  thermal/drivers/st: Use generic trip points
  thermal/drivers/imx: Use generic thermal_zone_get_trip() function
  thermal/drivers/rcar: Use generic thermal_zone_get_trip() function
  thermal/drivers/broadcom: Use generic thermal_zone_get_trip() function
  thermal/drivers/da9062: Use generic thermal_zone_get_trip() function
  thermal/drivers/ti: Remove unused macros ti_thermal_get_trip_value() /
    ti_thermal_trip_is_valid()
  thermal/drivers/acerhdf: Use generic thermal_zone_get_trip() function
  thermal/drivers/cxgb4: Use generic thermal_zone_get_trip() function
  thermal/intel/int340x: Replace parameter to simplify
  thermal/drivers/intel: Use generic thermal_zone_get_trip() function

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h    |   2 -
 .../ethernet/chelsio/cxgb4/cxgb4_thermal.c    |  41 +----
 drivers/platform/x86/acerhdf.c                |  73 +++-----
 drivers/thermal/armada_thermal.c              |  39 ++---
 drivers/thermal/broadcom/bcm2835_thermal.c    |   8 +-
 drivers/thermal/da9062-thermal.c              |  52 +-----
 drivers/thermal/gov_bang_bang.c               |  23 +--
 drivers/thermal/gov_fair_share.c              |  18 +-
 drivers/thermal/gov_power_allocator.c         |  51 +++---
 drivers/thermal/gov_step_wise.c               |  22 ++-
 drivers/thermal/hisi_thermal.c                |  11 +-
 drivers/thermal/imx_thermal.c                 |  72 +++-----
 .../int340x_thermal/int340x_thermal_zone.c    |  31 ++--
 .../int340x_thermal/int340x_thermal_zone.h    |   4 +-
 .../processor_thermal_device.c                |  10 +-
 drivers/thermal/intel/x86_pkg_temp_thermal.c  | 120 +++++++------
 drivers/thermal/qcom/qcom-spmi-temp-alarm.c   |  39 ++---
 drivers/thermal/rcar_gen3_thermal.c           |   2 +-
 drivers/thermal/rcar_thermal.c                |  49 +-----
 drivers/thermal/samsung/exynos_tmu.c          |  57 +++----
 drivers/thermal/st/st_thermal.c               |  47 +-----
 drivers/thermal/tegra/soctherm.c              |  33 ++--
 drivers/thermal/tegra/tegra30-tsensor.c       |  17 +-
 drivers/thermal/thermal_core.c                | 158 +++++++++++++++---
 drivers/thermal/thermal_core.h                |  22 ---
 drivers/thermal/thermal_helpers.c             |  28 ++--
 drivers/thermal/thermal_netlink.c             |  21 +--
 drivers/thermal/thermal_of.c                  | 116 -------------
 drivers/thermal/thermal_sysfs.c               | 127 +++++---------
 drivers/thermal/ti-soc-thermal/ti-thermal.h   |  15 --
 drivers/thermal/uniphier_thermal.c            |  27 ++-
 include/linux/thermal.h                       |  10 ++
 32 files changed, 535 insertions(+), 810 deletions(-)

Comments

Krzysztof Kozlowski Sept. 21, 2022, 3:24 p.m. UTC | #1
On 21/09/2022 11:42, Daniel Lezcano wrote:
> The thermal_zone_get_trip() does the same check as
> of_thermal_is_trip_valid(). Replace the call to
> of_thermal_is_trip_valid() by thermal_zone_get_trip().
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof
Marek Szyprowski Sept. 23, 2022, 2:09 p.m. UTC | #2
Hi Daniel,

On 21.09.2022 11:42, Daniel Lezcano wrote:
> The thermal_zone_get_trip() does the same check as
> of_thermal_is_trip_valid(). Replace the call to
> of_thermal_is_trip_valid() by thermal_zone_get_trip().
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---

This patch landed in linux next-20220923 as commit 4a71bb8005ba 
("thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by 
thermal_zone_get_trip()"). Unfortunately it causes deadlock on all 
Exynos based board:

============================================
WARNING: possible recursive locking detected
6.0.0-rc1-00062-g4a71bb8005ba #12855 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
c263c394 (&tz->lock){+.+.}-{3:3}, at: 
thermal_zone_device_update.part.0+0x114/0x538

but task is already holding lock:
c263c394 (&tz->lock){+.+.}-{3:3}, at: 
thermal_zone_device_update.part.0+0x3c/0x538

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&tz->lock);
   lock(&tz->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

2 locks held by swapper/0/1:
  #0: c1d5248c (&dev->mutex){....}-{3:3}, at: __driver_attach+0xe4/0x1f0
  #1: c263c394 (&tz->lock){+.+.}-{3:3}, at: 
thermal_zone_device_update.part.0+0x3c/0x538

stack backtrace:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00062-g4a71bb8005ba 
#12855
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __lock_acquire+0x146c/0x2a7c
  __lock_acquire from lock_acquire+0x124/0x3e4
  lock_acquire from __mutex_lock+0x90/0x948
  __mutex_lock from mutex_lock_nested+0x1c/0x24
  mutex_lock_nested from thermal_zone_device_update.part.0+0x114/0x538
  thermal_zone_device_update.part.0 from 
thermal_zone_device_set_mode+0x70/0x98
  thermal_zone_device_set_mode from thermal_of_zone_register+0x424/0x69c
  thermal_of_zone_register from devm_thermal_of_zone_register+0x58/0x94
  devm_thermal_of_zone_register from exynos_tmu_probe+0x29c/0x728
  exynos_tmu_probe from platform_probe+0x5c/0xb8
  platform_probe from really_probe+0xe0/0x414
  really_probe from __driver_probe_device+0xa0/0x208
  __driver_probe_device from driver_probe_device+0x30/0xc0
  driver_probe_device from __driver_attach+0xf0/0x1f0
  __driver_attach from bus_for_each_dev+0x70/0xb0
  bus_for_each_dev from bus_add_driver+0x174/0x218
  bus_add_driver from driver_register+0x88/0x11c
  driver_register from do_one_initcall+0x64/0x380
  do_one_initcall from kernel_init_freeable+0x1c0/0x224
  kernel_init_freeable from kernel_init+0x18/0x12c
  kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xf082dfb0 to 0xf082dff8)

[deadlock]

Something is wrong in locking in the functions from the above stacktrace.


>   drivers/thermal/samsung/exynos_tmu.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c
> index 91e6860b5ec4..34b460092308 100644
> --- a/drivers/thermal/samsung/exynos_tmu.c
> +++ b/drivers/thermal/samsung/exynos_tmu.c
> @@ -554,13 +554,14 @@ static void exynos4210_tmu_control(struct platform_device *pdev, bool on)
>   {
>   	struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>   	struct thermal_zone_device *tz = data->tzd;
> +	struct thermal_trip trip;
>   	unsigned int con, interrupt_en = 0, i;
>   
>   	con = get_con_reg(data, readl(data->base + EXYNOS_TMU_REG_CONTROL));
>   
>   	if (on) {
>   		for (i = 0; i < data->ntrip; i++) {
> -			if (!of_thermal_is_trip_valid(tz, i))
> +			if (thermal_zone_get_trip(tz, i, &trip))
>   				continue;
>   
>   			interrupt_en |=
> @@ -584,13 +585,14 @@ static void exynos5433_tmu_control(struct platform_device *pdev, bool on)
>   {
>   	struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>   	struct thermal_zone_device *tz = data->tzd;
> +	struct thermal_trip trip;
>   	unsigned int con, interrupt_en = 0, pd_det_en, i;
>   
>   	con = get_con_reg(data, readl(data->base + EXYNOS_TMU_REG_CONTROL));
>   
>   	if (on) {
>   		for (i = 0; i < data->ntrip; i++) {
> -			if (!of_thermal_is_trip_valid(tz, i))
> +			if (thermal_zone_get_trip(tz, i, &trip))
>   				continue;
>   
>   			interrupt_en |=
> @@ -615,13 +617,14 @@ static void exynos7_tmu_control(struct platform_device *pdev, bool on)
>   {
>   	struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>   	struct thermal_zone_device *tz = data->tzd;
> +	struct thermal_trip trip;
>   	unsigned int con, interrupt_en = 0, i;
>   
>   	con = get_con_reg(data, readl(data->base + EXYNOS_TMU_REG_CONTROL));
>   
>   	if (on) {
>   		for (i = 0; i < data->ntrip; i++) {
> -			if (!of_thermal_is_trip_valid(tz, i))
> +			if (thermal_zone_get_trip(tz, i, &trip))
>   				continue;
>   
>   			interrupt_en |=

Best regards
Daniel Lezcano Sept. 23, 2022, 5:40 p.m. UTC | #3
Hi Marek,

thanks for reporting

On 23/09/2022 16:09, Marek Szyprowski wrote:

[ ... ]

> Exception stack(0xf082dfb0 to 0xf082dff8)
> 
> [deadlock]
> 
> Something is wrong in locking in the functions from the above stacktrace.

Are you sure this deadlock is coming from this patch? Does a revert of 
this patch solve the issue ?
Marek Szyprowski Sept. 23, 2022, 10:02 p.m. UTC | #4
Hi Daniel,

On 23.09.2022 19:40, Daniel Lezcano wrote:
>
> Hi Marek,
>
> thanks for reporting
>
> On 23/09/2022 16:09, Marek Szyprowski wrote:
>
> [ ... ]
>
>> Exception stack(0xf082dfb0 to 0xf082dff8)
>>
>> [deadlock]
>>
>> Something is wrong in locking in the functions from the above 
>> stacktrace.
>
> Are you sure this deadlock is coming from this patch? Does a revert of 
> this patch solve the issue ?
>
Ups, my fault. It looks that I've copied SHA from the wrong window while 
preparing the report. The bisection pointed to the 78ffa3e58d93 
("thermal/core: Add a generic thermal_zone_get_trip() function") commit 
and I've already found how to fix the deadlock. I will report it again 
under the proper patch.


Best regards