Message ID | 4940808.31r3eYUQgx@rjwysocki.net |
---|---|
State | Superseded |
Headers | show |
Series | thermal: core: Sanitize polling delay values | expand |
On 05/07/2024 21:46, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > If polling_delay is nonzero and passive_delay is 0, the thermal zone > will use polling except when tz->passive is nonzero, which does not make > sense. > > Also if polling_delay is nonzero and passive_delay is greater than > polling_delay, the thermal zone temperature will be updated less often > when tz->passive is nonzero. This does not make sense either. > > Ensure that none of the above will happen. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > v1 -> v2: The patch actually matches the changelog > > --- > drivers/thermal/thermal_core.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: linux-pm/drivers/thermal/thermal_core.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_core.c > +++ linux-pm/drivers/thermal/thermal_core.c > @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( > td->threshold = INT_MAX; > } > > + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) > + passive_delay = polling_delay; Given this is a system misconfiguration, it would make more sense to bail out with -EINVAL. Assigning a default value in the back of the caller will never raise its attention and can make a bad configuration staying for a long time. That said, there are configurations with a passive delay set to zero but with a non zero polling delay. For instance, a thermal zone mitigated with a fan, so active trip points are set. Another example is when there is only critical trip points for a thermal zone. Actually there are multiple combinations with delays value which may look invalid but which are actually valid. For example, a setup with polling_delay > 0, passive_delay = 0, active trip points, cooling map to this active trips, passive trip points without cooling map. IMHO, it is better to do the configuration the system is asking for, even if it sounds weird > thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay); > thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);
On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 05/07/2024 21:46, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > If polling_delay is nonzero and passive_delay is 0, the thermal zone > > will use polling except when tz->passive is nonzero, which does not make > > sense. > > > > Also if polling_delay is nonzero and passive_delay is greater than > > polling_delay, the thermal zone temperature will be updated less often > > when tz->passive is nonzero. This does not make sense either. > > > > Ensure that none of the above will happen. > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > v1 -> v2: The patch actually matches the changelog > > > > --- > > drivers/thermal/thermal_core.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > Index: linux-pm/drivers/thermal/thermal_core.c > > =================================================================== > > --- linux-pm.orig/drivers/thermal/thermal_core.c > > +++ linux-pm/drivers/thermal/thermal_core.c > > @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( > > td->threshold = INT_MAX; > > } > > > > + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) > > + passive_delay = polling_delay; > > Given this is a system misconfiguration, it would make more sense to > bail out with -EINVAL. Assigning a default value in the back of the > caller will never raise its attention and can make a bad configuration > staying for a long time. This works except for the case mentioned below. I think that passive_delay > polling_delay can trigger a -EINVAL, but (polling_delay && !passive_delay) cannot do it because it is regarded as a valid case as per the below. > That said, there are configurations with a passive delay set to zero but > with a non zero polling delay. For instance, a thermal zone mitigated > with a fan, so active trip points are set. Another example is when there > is only critical trip points for a thermal zone. > > Actually there are multiple combinations with delays value which may > look invalid but which are actually valid. > > For example, a setup with polling_delay > 0, passive_delay = 0, active > trip points, cooling map to this active trips, passive trip points > without cooling map. > > IMHO, it is better to do the configuration the system is asking for, > even if it sounds weird Except that it doesn't work as expected because if passive_delay = 0, polling is paused when tz->passive is set. Thanks!
On 08/07/2024 15:38, Rafael J. Wysocki wrote: > On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 05/07/2024 21:46, Rafael J. Wysocki wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> If polling_delay is nonzero and passive_delay is 0, the thermal zone >>> will use polling except when tz->passive is nonzero, which does not make >>> sense. >>> >>> Also if polling_delay is nonzero and passive_delay is greater than >>> polling_delay, the thermal zone temperature will be updated less often >>> when tz->passive is nonzero. This does not make sense either. >>> >>> Ensure that none of the above will happen. >>> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> --- >>> >>> v1 -> v2: The patch actually matches the changelog >>> >>> --- >>> drivers/thermal/thermal_core.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> Index: linux-pm/drivers/thermal/thermal_core.c >>> =================================================================== >>> --- linux-pm.orig/drivers/thermal/thermal_core.c >>> +++ linux-pm/drivers/thermal/thermal_core.c >>> @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( >>> td->threshold = INT_MAX; >>> } >>> >>> + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) >>> + passive_delay = polling_delay; >> >> Given this is a system misconfiguration, it would make more sense to >> bail out with -EINVAL. Assigning a default value in the back of the >> caller will never raise its attention and can make a bad configuration >> staying for a long time. > > This works except for the case mentioned below. > > I think that passive_delay > polling_delay can trigger a -EINVAL, but > (polling_delay && !passive_delay) cannot do it because it is regarded > as a valid case as per the below. Right I can see ATM only this as an illogic combination: polling_delay && passive_delay && (polling_delay < passive_delay) >> That said, there are configurations with a passive delay set to zero but >> with a non zero polling delay. For instance, a thermal zone mitigated >> with a fan, so active trip points are set. Another example is when there >> is only critical trip points for a thermal zone. >> >> Actually there are multiple combinations with delays value which may >> look invalid but which are actually valid. >> >> For example, a setup with polling_delay > 0, passive_delay = 0, active >> trip points, cooling map to this active trips, passive trip points >> without cooling map. >> >> IMHO, it is better to do the configuration the system is asking for, >> even if it sounds weird > > Except that it doesn't work as expected because if passive_delay = 0, > polling is paused when tz->passive is set. Yes, but as there is no cooling map, there is no governor action, thus tz->passive is never set. So we can have a passive polling equal to zero without being illegal as no passive mitigation will happen. The passive delay is really there only if there is a passive cooling device mapped to a passive trip point. The polling delay is in charge of mitigating the active cooling device like a fan. So it is possible to mix an active trip point to mitigate with a fan and then put at a higher temperature a passive trip point with a higher sampling resolution.
On Mon, Jul 8, 2024 at 3:58 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 08/07/2024 15:38, Rafael J. Wysocki wrote: > > On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 05/07/2024 21:46, Rafael J. Wysocki wrote: > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>> > >>> If polling_delay is nonzero and passive_delay is 0, the thermal zone > >>> will use polling except when tz->passive is nonzero, which does not make > >>> sense. > >>> > >>> Also if polling_delay is nonzero and passive_delay is greater than > >>> polling_delay, the thermal zone temperature will be updated less often > >>> when tz->passive is nonzero. This does not make sense either. > >>> > >>> Ensure that none of the above will happen. > >>> > >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>> --- > >>> > >>> v1 -> v2: The patch actually matches the changelog > >>> > >>> --- > >>> drivers/thermal/thermal_core.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> Index: linux-pm/drivers/thermal/thermal_core.c > >>> =================================================================== > >>> --- linux-pm.orig/drivers/thermal/thermal_core.c > >>> +++ linux-pm/drivers/thermal/thermal_core.c > >>> @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( > >>> td->threshold = INT_MAX; > >>> } > >>> > >>> + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) > >>> + passive_delay = polling_delay; > >> > >> Given this is a system misconfiguration, it would make more sense to > >> bail out with -EINVAL. Assigning a default value in the back of the > >> caller will never raise its attention and can make a bad configuration > >> staying for a long time. > > > > This works except for the case mentioned below. > > > > I think that passive_delay > polling_delay can trigger a -EINVAL, but > > (polling_delay && !passive_delay) cannot do it because it is regarded > > as a valid case as per the below. > > Right I can see ATM only this as an illogic combination: > > polling_delay && passive_delay && > (polling_delay < passive_delay) > > >> That said, there are configurations with a passive delay set to zero but > >> with a non zero polling delay. For instance, a thermal zone mitigated > >> with a fan, so active trip points are set. Another example is when there > >> is only critical trip points for a thermal zone. > >> > >> Actually there are multiple combinations with delays value which may > >> look invalid but which are actually valid. > >> > >> For example, a setup with polling_delay > 0, passive_delay = 0, active > >> trip points, cooling map to this active trips, passive trip points > >> without cooling map. > >> > >> IMHO, it is better to do the configuration the system is asking for, > >> even if it sounds weird > > > > Except that it doesn't work as expected because if passive_delay = 0, > > polling is paused when tz->passive is set. > > Yes, but as there is no cooling map, there is no governor action, thus > tz->passive is never set. In current linux-next, it is set when a passive trip is crossed on the way up. > So we can have a passive polling equal to zero > without being illegal as no passive mitigation will happen. > > The passive delay is really there only if there is a passive cooling > device mapped to a passive trip point. Well, shouldn't user space get notified more often when passive cooling is under way? > The polling delay is in charge of mitigating the active cooling device > like a fan. So it is possible to mix an active trip point to mitigate > with a fan and then put at a higher temperature a passive trip point > with a higher sampling resolution. But it is not correct to pause polling when tz->passive is set.
On 08/07/2024 16:03, Rafael J. Wysocki wrote: > On Mon, Jul 8, 2024 at 3:58 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 08/07/2024 15:38, Rafael J. Wysocki wrote: >>> On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >>>> >>>> On 05/07/2024 21:46, Rafael J. Wysocki wrote: >>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>>> >>>>> If polling_delay is nonzero and passive_delay is 0, the thermal zone >>>>> will use polling except when tz->passive is nonzero, which does not make >>>>> sense. >>>>> >>>>> Also if polling_delay is nonzero and passive_delay is greater than >>>>> polling_delay, the thermal zone temperature will be updated less often >>>>> when tz->passive is nonzero. This does not make sense either. >>>>> >>>>> Ensure that none of the above will happen. >>>>> >>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>>> --- >>>>> >>>>> v1 -> v2: The patch actually matches the changelog >>>>> >>>>> --- >>>>> drivers/thermal/thermal_core.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> Index: linux-pm/drivers/thermal/thermal_core.c >>>>> =================================================================== >>>>> --- linux-pm.orig/drivers/thermal/thermal_core.c >>>>> +++ linux-pm/drivers/thermal/thermal_core.c >>>>> @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( >>>>> td->threshold = INT_MAX; >>>>> } >>>>> >>>>> + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) >>>>> + passive_delay = polling_delay; >>>> >>>> Given this is a system misconfiguration, it would make more sense to >>>> bail out with -EINVAL. Assigning a default value in the back of the >>>> caller will never raise its attention and can make a bad configuration >>>> staying for a long time. >>> >>> This works except for the case mentioned below. >>> >>> I think that passive_delay > polling_delay can trigger a -EINVAL, but >>> (polling_delay && !passive_delay) cannot do it because it is regarded >>> as a valid case as per the below. >> >> Right I can see ATM only this as an illogic combination: >> >> polling_delay && passive_delay && >> (polling_delay < passive_delay) >> >>>> That said, there are configurations with a passive delay set to zero but >>>> with a non zero polling delay. For instance, a thermal zone mitigated >>>> with a fan, so active trip points are set. Another example is when there >>>> is only critical trip points for a thermal zone. >>>> >>>> Actually there are multiple combinations with delays value which may >>>> look invalid but which are actually valid. >>>> >>>> For example, a setup with polling_delay > 0, passive_delay = 0, active >>>> trip points, cooling map to this active trips, passive trip points >>>> without cooling map. >>>> >>>> IMHO, it is better to do the configuration the system is asking for, >>>> even if it sounds weird >>> >>> Except that it doesn't work as expected because if passive_delay = 0, >>> polling is paused when tz->passive is set. >> >> Yes, but as there is no cooling map, there is no governor action, thus >> tz->passive is never set. > > In current linux-next, it is set when a passive trip is crossed on the way up. Ah, I see. AFAIR that was the gov_step_wise which was changing this value but based on the thermal instance. >> So we can have a passive polling equal to zero >> without being illegal as no passive mitigation will happen. >> >> The passive delay is really there only if there is a passive cooling >> device mapped to a passive trip point. > > Well, shouldn't user space get notified more often when passive > cooling is under way? (Assuming you meant "user space get notified when a passive trip point is crossed") Mmh, yes. I see the point. >> The polling delay is in charge of mitigating the active cooling device >> like a fan. So it is possible to mix an active trip point to mitigate >> with a fan and then put at a higher temperature a passive trip point >> with a higher sampling resolution. > > But it is not correct to pause polling when tz->passive is set. I'm not sure to get the comment. Just to clarify: trip A is active with a multi speed fan, polling every 1s trip B is passive with a cpufreq cooling device, polling every 100ms temp(tripA) < temp(tripB) When the trip A is crossed, the mitigation happens at <polling> rate. Assuming it fails to cool down, the fan continues to increase its speed until it reaches its max state. The temperature continues to increase and crosses the passive trip point. The fan speed stays at its maximum and the polling switches to the passive polling delay.
On Mon, Jul 8, 2024 at 4:32 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 08/07/2024 16:03, Rafael J. Wysocki wrote: > > On Mon, Jul 8, 2024 at 3:58 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 08/07/2024 15:38, Rafael J. Wysocki wrote: > >>> On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >>>> > >>>> On 05/07/2024 21:46, Rafael J. Wysocki wrote: > >>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>>>> > >>>>> If polling_delay is nonzero and passive_delay is 0, the thermal zone > >>>>> will use polling except when tz->passive is nonzero, which does not make > >>>>> sense. > >>>>> > >>>>> Also if polling_delay is nonzero and passive_delay is greater than > >>>>> polling_delay, the thermal zone temperature will be updated less often > >>>>> when tz->passive is nonzero. This does not make sense either. > >>>>> > >>>>> Ensure that none of the above will happen. > >>>>> > >>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>>>> --- > >>>>> > >>>>> v1 -> v2: The patch actually matches the changelog > >>>>> > >>>>> --- > >>>>> drivers/thermal/thermal_core.c | 3 +++ > >>>>> 1 file changed, 3 insertions(+) > >>>>> > >>>>> Index: linux-pm/drivers/thermal/thermal_core.c > >>>>> =================================================================== > >>>>> --- linux-pm.orig/drivers/thermal/thermal_core.c > >>>>> +++ linux-pm/drivers/thermal/thermal_core.c > >>>>> @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( > >>>>> td->threshold = INT_MAX; > >>>>> } > >>>>> > >>>>> + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) > >>>>> + passive_delay = polling_delay; > >>>> > >>>> Given this is a system misconfiguration, it would make more sense to > >>>> bail out with -EINVAL. Assigning a default value in the back of the > >>>> caller will never raise its attention and can make a bad configuration > >>>> staying for a long time. > >>> > >>> This works except for the case mentioned below. > >>> > >>> I think that passive_delay > polling_delay can trigger a -EINVAL, but > >>> (polling_delay && !passive_delay) cannot do it because it is regarded > >>> as a valid case as per the below. > >> > >> Right I can see ATM only this as an illogic combination: > >> > >> polling_delay && passive_delay && > >> (polling_delay < passive_delay) > >> > >>>> That said, there are configurations with a passive delay set to zero but > >>>> with a non zero polling delay. For instance, a thermal zone mitigated > >>>> with a fan, so active trip points are set. Another example is when there > >>>> is only critical trip points for a thermal zone. > >>>> > >>>> Actually there are multiple combinations with delays value which may > >>>> look invalid but which are actually valid. > >>>> > >>>> For example, a setup with polling_delay > 0, passive_delay = 0, active > >>>> trip points, cooling map to this active trips, passive trip points > >>>> without cooling map. > >>>> > >>>> IMHO, it is better to do the configuration the system is asking for, > >>>> even if it sounds weird > >>> > >>> Except that it doesn't work as expected because if passive_delay = 0, > >>> polling is paused when tz->passive is set. > >> > >> Yes, but as there is no cooling map, there is no governor action, thus > >> tz->passive is never set. > > > > In current linux-next, it is set when a passive trip is crossed on the way up. > > Ah, I see. AFAIR that was the gov_step_wise which was changing this > value but based on the thermal instance. > > >> So we can have a passive polling equal to zero > >> without being illegal as no passive mitigation will happen. > >> > >> The passive delay is really there only if there is a passive cooling > >> device mapped to a passive trip point. > > > > Well, shouldn't user space get notified more often when passive > > cooling is under way? > > (Assuming you meant "user space get notified when a passive trip point > is crossed") > > Mmh, yes. I see the point. > > > >> The polling delay is in charge of mitigating the active cooling device > >> like a fan. So it is possible to mix an active trip point to mitigate > >> with a fan and then put at a higher temperature a passive trip point > >> with a higher sampling resolution. > > > > But it is not correct to pause polling when tz->passive is set. > > I'm not sure to get the comment. > > Just to clarify: > > trip A is active with a multi speed fan, polling every 1s > > trip B is passive with a cpufreq cooling device, polling every 100ms > > temp(tripA) < temp(tripB) > > When the trip A is crossed, the mitigation happens at <polling> rate. > Assuming it fails to cool down, the fan continues to increase its speed > until it reaches its max state. > > The temperature continues to increase and crosses the passive trip > point. The fan speed stays at its maximum and the polling switches to > the passive polling delay. Yes, but if the passive polling delay happens to be zero, it will stop the polling entirely until tz->passive becomes zero again. I don't believe that this is correct.
Index: linux-pm/drivers/thermal/thermal_core.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.c +++ linux-pm/drivers/thermal/thermal_core.c @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips( td->threshold = INT_MAX; } + if (polling_delay && (passive_delay > polling_delay || !passive_delay)) + passive_delay = polling_delay; + thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay); thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);