Message ID | 20230722122534.2279689-1-zhengxingda@iscas.ac.cn |
---|---|
State | New |
Series | [RESEND,RESEND] thermal/of: support thermal zones w/o trips subnode |
On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> From: Icenowy Zheng <uwu@icenowy.me>
>
> Although the current device tree binding of thermal zones requires the
> trips subnode, the binding in kernel v5.15 does not require it, and many
> device trees shipped with the kernel, for example,
> allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
> comply with the old binding and contain no trips subnode.
>
> Allow the code to successfully register thermal zones w/o trips subnode
> for DT binding compatibility now.
>
> Furthermore, the inconsistency between DTs and bindings should be resolved
> by either adding empty trips subnode or dropping the trips subnode
> requirement.

This makes sense to me - it allows people to see the reported
temperature even if there's no trips defined which seems more
helpful than refusing to register.

Reviewed-by: Mark Brown <broonie@kernel.org>
Hi Mark,

On 22/07/2023 22:11, Mark Brown wrote:
> On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
>> From: Icenowy Zheng <uwu@icenowy.me>
>>
>> Although the current device tree binding of thermal zones requires the
>> trips subnode, the binding in kernel v5.15 does not require it, and many
>> device trees shipped with the kernel, for example,
>> allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
>> comply with the old binding and contain no trips subnode.
>>
>> Allow the code to successfully register thermal zones w/o trips subnode
>> for DT binding compatibility now.
>>
>> Furthermore, the inconsistency between DTs and bindings should be resolved
>> by either adding empty trips subnode or dropping the trips subnode
>> requirement.
>
> This makes sense to me - it allows people to see the reported
> temperature even if there's no trips defined which seems more
> helpful than refusing to register.

The binding describes the trip points as required and that since the
beginning.

What changed is now the code reflects the required property while before it
was permissive, that was an oversight.

Just a reminder about the thermal framework goals:

1. It protects the silicon (thus critical and hot trip points)

2. It mitigates the temperature (thus cooling device bound to trip points)

3. It notifies the userspace when a trip point is crossed

So if the thermal zone is described but without any of this goal above, it
is pointless.

If the goal is to report the temperature only, then hwmon should be used
instead.

If the goal is to mitigate by userspace, then the trip point *must* be used
to prevent the userspace polling the temperature. With the trip point the
sensor will be set to fire an interrupt at the given trip temperature.

IOW, trip points are not optional
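To make the "report the temperature only" case concrete: both the hwmon sysfs interface (`tempN_input`) and the thermal zone sysfs interface (`temp`) expose the reading as integer millidegrees Celsius in a text file. The following is a minimal userspace sketch of parsing such a value. The helper name `parse_millicelsius` is made up for illustration and is not part of any kernel or library API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text content of a sysfs temperature
 * attribute (for example "42500\n") into millidegrees Celsius.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
int parse_millicelsius(const char *text, long *mcelsius)
{
	char *end;
	long val;

	assert(text != NULL && mcelsius != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == text)
		return -EINVAL;		/* no digits at all */
	if (*end != '\0' && *end != '\n')
		return -EINVAL;		/* trailing garbage */

	*mcelsius = val;
	return 0;
}
```

A caller would read the attribute file into a buffer and pass it to this helper. Which file to open depends on the system, since hwmon and thermal zone numbering is not fixed. Polling such a file in a loop is exactly the userspace behaviour the interrupt-driven trip points are meant to avoid.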
On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> On 22/07/2023 22:11, Mark Brown wrote:
>
> > This makes sense to me - it allows people to see the reported
> > temperature even if there's no trips defined which seems more
> > helpful than refusing to register.
>
> ...
>
> If the goal is to report the temperature only, then hwmon should be used
> instead.

Sure, that doesn't seem to be the case in the impacted systems though -
AFAICT the issue with these is that it's a generic SoC DT that's not
fully fleshed out, either because more data is needed for the silicon or
because the numbers need to be system specific for some reason.

> If the goal is to mitigate by userspace, then the trip point *must* be used
> to prevent the userspace polling the temperature. With the trip point the
> sensor will be set to fire an interrupt at the given trip temperature.

I'm not clear a trip point prevents userspace polling if it feels so
moved? Is it just that it makes it more likely that someone will
implement something that polls?

> IOW, trip points are not optional

I can see printing a loud warning given that the system is not fully
configured (there's a warning already, I did nearly comment on this
patch downgrading it all the way to a debug log), perhaps even
suppressing the registration of the userspace interface, but returning a
failure to the registering driver feels like it's escalating the problem
and complicating the driver code. Suppressing the registration to
userspace seemed like it was adding more complexity in the core but it
would avoid any potential confusion for userspace.

For me the main issue is the impact on devices that support multiple
thermal zones, in order to avoid having working zones stay registered
their drivers will all have to handle the possibility of some of the
zones failing to register due to missing configuration which is going to
add complexity at both registration and runtime and be easy to miss.
If the core just accepts the zones then whatever complexity there is
gets factored out into the core.
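The multi-zone concern can be sketched in userspace C. Everything here is hypothetical: the `zone_desc` type and the `core_register_strict`, `core_register_permissive` and `probe_zones` names are not kernel APIs. The sketch only models how a simple all-or-nothing probe behaves against a core that rejects trip-less zones versus one that accepts them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one zone served by a multi-zone sensor. */
struct zone_desc {
	const char *name;
	bool has_trips;
};

typedef int (*register_fn)(const struct zone_desc *zone);

/* Models a core that rejects zones without trip points. */
int core_register_strict(const struct zone_desc *zone)
{
	return zone->has_trips ? 0 : -EINVAL;
}

/* Models a core that accepts every zone (a real one might still warn). */
int core_register_permissive(const struct zone_desc *zone)
{
	(void)zone;
	return 0;
}

/*
 * A straightforward driver probe: stop at the first registration error.
 * Returns the number of zones registered, or a negative errno.
 */
int probe_zones(const struct zone_desc *zones, size_t n, register_fn reg)
{
	assert(reg != NULL);

	for (size_t i = 0; i < n; i++) {
		int ret = reg(&zones[i]);

		if (ret)
			return ret;	/* a real driver would also undo zones 0..i-1 */
	}
	return (int)n;
}
```

With a CPU zone that has trips and a GPU zone that has none, the strict model makes the whole probe fail unless the driver grows extra code to skip the failing zone and remember which zones exist at runtime. With the permissive model the same simple probe registers both zones.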
On Sunday, 2023-07-23 at 16:05 +0100, Mark Brown wrote:
> On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> > On 22/07/2023 22:11, Mark Brown wrote:
>
> > > This makes sense to me - it allows people to see the reported
> > > temperature even if there's no trips defined which seems more
> > > helpful than refusing to register.
>
> ...
>
> > If the goal is to report the temperature only, then hwmon should be
> > used instead.
>
> Sure, that doesn't seem to be the case in the impacted systems though -
> AFAICT the issue with these is that it's a generic SoC DT that's not
> fully fleshed out, either because more data is needed for the silicon or
> because the numbers need to be system specific for some reason.

Well maybe we should move all thermal sensors to hwmon framework, then
let thermal framework pull the readout from hwmon; but two frameworks
having the same functionality of reading temperature is the current
situation, and we shouldn't break things.

> > If the goal is to mitigate by userspace, then the trip point *must*
> > be used to prevent the userspace polling the temperature. With the
> > trip point the sensor will be set to fire an interrupt at the given
> > trip temperature.
>
> I'm not clear a trip point prevents userspace polling if it feels so
> moved? Is it just that it makes it more likely that someone will
> implement something that polls?
>
> > IOW, trip points are not optional

If it's declared optional in DT binding in a released kernel version,
then it's optional, at least it should be optional in practice to
support this legacy DT binding, and there are even DT files shipped
with the kernel that utilize the optionalness.

Showing a warning is okay, but bailing out is not an option, according
to my understanding of current DT maintenance model.

> I can see printing a loud warning given that the system is not fully
> configured (there's a warning already, I did nearly comment on this
> patch downgrading it all the way to a debug log), perhaps even
> suppressing the registration of the userspace interface, but returning a
> failure to the registering driver feels like it's escalating the problem
> and complicating the driver code. Suppressing the registration to
> userspace seemed like it was adding more complexity in the core but it
> would avoid any potential confusion for userspace.
>
> For me the main issue is the impact on devices that support multiple
> thermal zones, in order to avoid having working zones stay registered
> their drivers will all have to handle the possibility of some of the
> zones failing to register due to missing configuration which is going to

Well I think in the case of Allwinner SoCs, the thermal sensor is a
multi-channel one, so it's possible that some channels (e.g. the CPU
sensor) are used for thermal throttling and other channels (e.g. the
GPU one, considering Mali-400 is quite weak, and usually no DVFS
equipped) are only used for monitoring. We should allow this kind of
configuration in kernel.

Moving everything to hwmon is an option, but it's too giant a change.

> add complexity at both registration and runtime and be easy to miss.
> If the core just accepts the zones then whatever complexity there is
> gets factored out into the core.
On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
>
> Hi Mark,
>
> On 22/07/2023 22:11, Mark Brown wrote:
> > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> > > From: Icenowy Zheng <uwu@icenowy.me>
> > >
> > > Although the current device tree binding of thermal zones requires the
> > > trips subnode, the binding in kernel v5.15 does not require it, and many
> > > device trees shipped with the kernel, for example,
> > > allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
> > > comply with the old binding and contain no trips subnode.
> > >
> > > Allow the code to successfully register thermal zones w/o trips subnode
> > > for DT binding compatibility now.
> > >
> > > Furthermore, the inconsistency between DTs and bindings should be resolved
> > > by either adding empty trips subnode or dropping the trips subnode
> > > requirement.
> >
> > This makes sense to me - it allows people to see the reported
> > temperature even if there's no trips defined which seems more
> > helpful than refusing to register.
>
> The binding describes the trip points as required and that since the
> beginning.

Not really. It was made optional in the v5.15 kernel release by commit

    22fc857538c3 dt-bindings: thermal: Make trips node optional

> What changed is now the code reflects the required property while before it
> was permissive, that was an oversight.
>
> Just a reminder about the thermal framework goals:
>
> 1. It protects the silicon (thus critical and hot trip points)
>
> 2. It mitigates the temperature (thus cooling device bound to trip points)
>
> 3. It notifies the userspace when a trip point is crossed
>
> So if the thermal zone is described but without any of this goal above, it
> is pointless.
>
> If the goal is to report the temperature only, then hwmon should be used
> instead.

What about thermal sensors with multiple channels? Some of the channels
are indeed tied to important hardware blocks like the CPU cores and
should be tied into the thermal tripping. However other channels might
only be used for temperature read-out and have no such requirement.

Should we be mixing thermal and hwmon APIs in the driver?

> If the goal is to mitigate by userspace, then the trip point *must* be used
> to prevent the userspace polling the temperature. With the trip point the
> sensor will be set to fire an interrupt at the given trip temperature.
>
> IOW, trip points are not optional

for measurement points that are used for thermal throttling /
mitigation.

ChenYu
On 24 July 2023 12:25:02 GMT+08:00, Chen-Yu Tsai <wenst@chromium.org> wrote:
>On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
>>
>> Hi Mark,
>>
>> On 22/07/2023 22:11, Mark Brown wrote:
>> > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
>> > > From: Icenowy Zheng <uwu@icenowy.me>
>> > >
>> > > Although the current device tree binding of thermal zones requires the
>> > > trips subnode, the binding in kernel v5.15 does not require it, and many
>> > > device trees shipped with the kernel, for example,
>> > > allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
>> > > comply with the old binding and contain no trips subnode.
>> > >
>> > > Allow the code to successfully register thermal zones w/o trips subnode
>> > > for DT binding compatibility now.
>> > >
>> > > Furthermore, the inconsistency between DTs and bindings should be resolved
>> > > by either adding empty trips subnode or dropping the trips subnode
>> > > requirement.
>> >
>> > This makes sense to me - it allows people to see the reported
>> > temperature even if there's no trips defined which seems more
>> > helpful than refusing to register.
>>
>> The binding describes the trip points as required and that since the
>> beginning.
>
>Not really. It was made optional in the v5.15 kernel release by commit
>
>    22fc857538c3 dt-bindings: thermal: Make trips node optional

Yes, thanks for the clarification.

My understanding of DT binding tells me that this means the lack of the
trips node must be handled, before we solve the inconsistency between
current DT binding and shipped DTs. The latter problem could be
discussed, but the former problem is a MUST unless we're breaking the
compatibility promise of DT bindings (and shipped DTs).

>
>> What changed is now the code reflects the required property while before it
>> was permissive, that was an oversight.
>>
>> Just a reminder about the thermal framework goals:
>>
>> 1. It protects the silicon (thus critical and hot trip points)
>>
>> 2. It mitigates the temperature (thus cooling device bound to trip points)
>>
>> 3. It notifies the userspace when a trip point is crossed
>>
>> So if the thermal zone is described but without any of this goal above, it
>> is pointless.
>>
>> If the goal is to report the temperature only, then hwmon should be used
>> instead.
>
>What about thermal sensors with multiple channels? Some of the channels
>are indeed tied to important hardware blocks like the CPU cores and
>should be tied into the thermal tripping. However other channels might
>only be used for temperature read-out and have no such requirement.
>
>Should we be mixing thermal and hwmon APIs in the driver?
>
>> If the goal is to mitigate by userspace, then the trip point *must* be used
>> to prevent the userspace polling the temperature. With the trip point the
>> sensor will be set to fire an interrupt at the given trip temperature.
>>
>> IOW, trip points are not optional
>
>for measurement points that are used for thermal throttling /
>mitigation.
>
>ChenYu
>
On Monday, 2023-07-24 at 12:25 +0800, Chen-Yu Tsai wrote:
> On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> >
> > Hi Mark,
> >
> > On 22/07/2023 22:11, Mark Brown wrote:
> > > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> > > > From: Icenowy Zheng <uwu@icenowy.me>
> > > >
> > > > Although the current device tree binding of thermal zones
> > > > requires the trips subnode, the binding in kernel v5.15 does not
> > > > require it, and many device trees shipped with the kernel, for
> > > > example, allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi
> > > > in ARM64, still comply with the old binding and contain no trips
> > > > subnode.
> > > >
> > > > Allow the code to successfully register thermal zones w/o trips
> > > > subnode for DT binding compatibility now.
> > > >
> > > > Furthermore, the inconsistency between DTs and bindings should be
> > > > resolved by either adding empty trips subnode or dropping the
> > > > trips subnode requirement.
> > >
> > > This makes sense to me - it allows people to see the reported
> > > temperature even if there's no trips defined which seems more
> > > helpful than refusing to register.
> >
> > The binding describes the trip points as required and that since
> > the beginning.
>
> Not really. It was made optional in the v5.15 kernel release by
> commit
>
>     22fc857538c3 dt-bindings: thermal: Make trips node optional

I agree, this is why I sent this patch (and why I say 'for DT binding
compatibility now' in the commit message).

Further discussion could be performed, but this patch should be applied
regardless of the result of further discussion. DT binding
compatibility is the unbreakable law.

> > What changed is now the code reflects the required property while
> > before it was permissive, that was an oversight.
> >
> > Just a reminder about the thermal framework goals:
> >
> > 1. It protects the silicon (thus critical and hot trip points)
> >
> > 2. It mitigates the temperature (thus cooling device bound to
> > trip points)
> >
> > 3. It notifies the userspace when a trip point is crossed
> >
> > So if the thermal zone is described but without any of this goal
> > above, it is pointless.
> >
> > If the goal is to report the temperature only, then hwmon should be
> > used instead.
>
> What about thermal sensors with multiple channels? Some of the
> channels are indeed tied to important hardware blocks like the CPU
> cores and should be tied into the thermal tripping. However other
> channels might only be used for temperature read-out and have no such
> requirement.
>
> Should we be mixing thermal and hwmon APIs in the driver?

Well you have no right to decide which sensor should be used for
throttling and which not. So the only way to make the semantic correct
is to just rip every sensor driver out of thermal API to hwmon API, and
let thermal framework use hwmon's.

> > If the goal is to mitigate by userspace, then the trip point *must*
> > be used to prevent the userspace polling the temperature. With the
> > trip point the sensor will be set to fire an interrupt at the given
> > trip temperature.
> >
> > IOW, trip points are not optional
>
> for measurement points that are used for thermal throttling /
> mitigation.
>
> ChenYu
>
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 6fb14e521197..2c76df847e84 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -127,15 +127,17 @@ static struct thermal_trip *thermal_of_trips_init(struct device_node *np, int *n
 
 	trips = of_get_child_by_name(np, "trips");
 	if (!trips) {
-		pr_err("Failed to find 'trips' node\n");
-		return ERR_PTR(-EINVAL);
+		pr_debug("Failed to find 'trips' node\n");
+		*ntrips = 0;
+		return NULL;
 	}
 
 	count = of_get_child_count(trips);
 	if (!count) {
-		pr_err("No trip point defined\n");
-		ret = -EINVAL;
-		goto out_of_node_put;
+		pr_debug("No trip point defined\n");
+		of_node_put(trips);
+		*ntrips = 0;
+		return NULL;
 	}
 
 	tt = kzalloc(sizeof(*tt) * count, GFP_KERNEL);
@@ -519,7 +521,10 @@ static struct thermal_zone_device *thermal_of_zone_register(struct device_node *
 	of_ops->bind = thermal_of_bind;
 	of_ops->unbind = thermal_of_unbind;
 
-	mask = GENMASK_ULL((ntrips) - 1, 0);
+	if (ntrips)
+		mask = GENMASK_ULL((ntrips) - 1, 0);
+	else
+		mask = 0;
 
 	tz = thermal_zone_device_register_with_trips(np->name, trips, ntrips,
 						     mask, data, of_ops, tzp,
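The second hunk guards the mask computation because, once `ntrips` may be 0, the original expression would ask `GENMASK_ULL` for a high bit of -1, which as far as I can tell is outside what the macro is meant to handle. Below is a userspace sketch of the guarded computation. `trip_mask` is a hypothetical helper written for illustration, not the kernel macro, and it assumes at most 64 trip points.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the guarded mask computation in the second hunk.
 * trip_mask() is a hypothetical helper, not a kernel function; it returns
 * a mask with one bit set per trip point (bits 0 .. ntrips-1).
 */
uint64_t trip_mask(int ntrips)
{
	assert(ntrips >= 0 && ntrips <= 64);

	if (ntrips == 0)
		return 0;	/* no trips: empty mask, no shift performed */

	/* Shift count is 64 - ntrips, always in 0..63, so it is well defined. */
	return ~UINT64_C(0) >> (64 - ntrips);
}
```

The explicit zero case mirrors the `else mask = 0;` branch of the patch: without it, a zone with no trips would need a shift by the full width of the type, which C leaves undefined.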