diff mbox series

driver core: platform: Use devres group to free driver probe resources

Message ID 20250215130849.227812-1-claudiu.beznea.uj@bp.renesas.com
State New
Headers show
Series driver core: platform: Use devres group to free driver probe resources | expand

Commit Message

Claudiu Beznea Feb. 15, 2025, 1:08 p.m. UTC
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
clocks are managed through PM domains. These PM domains, registered on
behalf of the clock controller driver, are configured with
GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
clocks are enabled/disabled using runtime PM APIs. The power domains may
also have power_on/power_off support implemented. After the device PM
domain is powered off any CPU accesses to these domains leads to system
aborts.

During probe, devices are attached to the PM domain controlling their
clocks and power. Similarly, during removal, devices are detached from the
PM domain.

The detachment call stack is as follows:

device_driver_detach() ->
  device_release_driver_internal() ->
    __device_release_driver() ->
      device_remove() ->
        platform_remove() ->
	  dev_pm_domain_detach()

During driver unbind, after the device is detached from its PM domain,
the device_unbind_cleanup() function is called, which subsequently invokes
devres_release_all(). This function handles devres resource cleanup.

If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
cleanup process triggers the action or reset function for disabling runtime
PM. This function is pm_runtime_disable_action(), which leads to the
following call stack of interest when called:

pm_runtime_disable_action() ->
  pm_runtime_dont_use_autosuspend() ->
    __pm_runtime_use_autosuspend() ->
      update_autosuspend() ->
        rpm_idle()

The rpm_idle() function attempts to resume the device at runtime. However,
at the point it is called, the device is no longer part of a PM domain
(which manages clocks and power states). If the driver implements its own
runtime PM APIs for specific functionalities - such as the rzg2l_adc
driver - while also relying on the power domain subsystem for power
management, rpm_idle() will invoke the driver's runtime PM API. However,
since the device is no longer part of a PM domain at this point, the PM
domain's runtime PM APIs will not be called. This leads to system aborts on
Renesas SoCs.

Another identified case is when a subsystem performs various cleanups
using device_unbind_cleanup(), calling driver-specific APIs in the process.
A known example is the thermal subsystem, which may call driver-specific
APIs to disable the thermal device. The relevant call stack in this case
is:

device_driver_detach() ->
  device_release_driver_internal() ->
    device_unbind_cleanup() ->
      devres_release_all() ->
        devm_thermal_of_zone_release() ->
	  thermal_zone_device_disable() ->
	    thermal_zone_device_set_mode() ->
	      struct thermal_zone_device_ops::change_mode()

At the moment the driver-specific change_mode() API is called, the device
is no longer part of its PM domain. Accessing its registers without proper
power management leads to system aborts.

Open a devres group before calling the driver probe, and close it
immediately after the driver remove function is called and before
dev_pm_domain_detach(). This ensures that driver-specific devm actions or
reset functions are executed immediately after the driver remove function
completes. Additionally, it prevents driver-specific runtime PM APIs from
being called when the device is no longer part of its power domain.

Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---

Hi,

Although Ulf gave its green light for the approaches on both IIO [1],
[2] and thermal subsystems [3], Jonathan considered unacceptable the
approaches in [1], [2] as he considered it may lead to dificult to
maintain code and code opened to subtle bugs (due to the potential of
mixing devres and non-devres calls). He pointed out a similar approach
that was done for the I2C bus [4], [5].

As the discussions in [1], [2] stopped w/o a clear conclusion, this
patch tries to revive it by proposing a similar approach that was done
for the I2C bus.

Please let me know you input.

Thank you,
Claudiu

[1] https://lore.kernel.org/all/20250103140042.1619703-2-claudiu.beznea.uj@bp.renesas.com/
[2] https://lore.kernel.org/all/20250117114540.289248-2-claudiu.beznea.uj@bp.renesas.com/
[3] https://lore.kernel.org/all/20250103163805.1775705-3-claudiu.beznea.uj@bp.renesas.com/
[4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/i2c/i2c-core-base.c#L579
[5] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/i2c/i2c-core-base.c#L630

 drivers/base/platform.c         | 10 +++++++++-
 include/linux/platform_device.h |  3 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

Comments

Claudiu Beznea March 27, 2025, 4:47 p.m. UTC | #1
Hi, Rafael,

On 06.03.2025 08:11, Dmitry Torokhov wrote:
> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:
>> On Wed, 19 Feb 2025 14:45:07 +0200
>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>>
>>> Hi, Daniel, Jonathan,
>>>
>>> On 15.02.2025 15:51, Claudiu Beznea wrote:
>>>> Hi, Greg,
>>>>
>>>> On 15.02.2025 15:25, Greg KH wrote:  
>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:  
>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>>
>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
>>>>>> clocks are managed through PM domains. These PM domains, registered on
>>>>>> behalf of the clock controller driver, are configured with
>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
>>>>>> also have power_on/power_off support implemented. After the device PM
>>>>>> domain is powered off any CPU accesses to these domains leads to system
>>>>>> aborts.
>>>>>>
>>>>>> During probe, devices are attached to the PM domain controlling their
>>>>>> clocks and power. Similarly, during removal, devices are detached from the
>>>>>> PM domain.
>>>>>>
>>>>>> The detachment call stack is as follows:
>>>>>>
>>>>>> device_driver_detach() ->
>>>>>>   device_release_driver_internal() ->
>>>>>>     __device_release_driver() ->
>>>>>>       device_remove() ->
>>>>>>         platform_remove() ->
>>>>>> 	  dev_pm_domain_detach()
>>>>>>
>>>>>> During driver unbind, after the device is detached from its PM domain,
>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>
>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
>>>>>> cleanup process triggers the action or reset function for disabling runtime
>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
>>>>>> following call stack of interest when called:
>>>>>>
>>>>>> pm_runtime_disable_action() ->
>>>>>>   pm_runtime_dont_use_autosuspend() ->
>>>>>>     __pm_runtime_use_autosuspend() ->
>>>>>>       update_autosuspend() ->
>>>>>>         rpm_idle()
>>>>>>
>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
>>>>>> at the point it is called, the device is no longer part of a PM domain
>>>>>> (which manages clocks and power states). If the driver implements its own
>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
>>>>>> driver - while also relying on the power domain subsystem for power
>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
>>>>>> since the device is no longer part of a PM domain at this point, the PM
>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
>>>>>> Renesas SoCs.
>>>>>>
>>>>>> Another identified case is when a subsystem performs various cleanups
>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
>>>>>> A known example is the thermal subsystem, which may call driver-specific
>>>>>> APIs to disable the thermal device. The relevant call stack in this case
>>>>>> is:
>>>>>>
>>>>>> device_driver_detach() ->
>>>>>>   device_release_driver_internal() ->
>>>>>>     device_unbind_cleanup() ->
>>>>>>       devres_release_all() ->
>>>>>>         devm_thermal_of_zone_release() ->
>>>>>> 	  thermal_zone_device_disable() ->
>>>>>> 	    thermal_zone_device_set_mode() ->
>>>>>> 	      struct thermal_zone_device_ops::change_mode()
>>>>>>
>>>>>> At the moment the driver-specific change_mode() API is called, the device
>>>>>> is no longer part of its PM domain. Accessing its registers without proper
>>>>>> power management leads to system aborts.
>>>>>>
>>>>>> Open a devres group before calling the driver probe, and close it
>>>>>> immediately after the driver remove function is called and before
>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
>>>>>> reset functions are executed immediately after the driver remove function
>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
>>>>>> being called when the device is no longer part of its power domain.
>>>>>>
>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>> ---
>>>>>>
>>>>>> Hi,
>>
>> Hi Claudiu, Greg,
>>
>> Sorry, I missed this thread whilst travelling and only saw it because
>> of reference from the in driver solution.
>>
>>>>>>
>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
>>>>>> maintain code and code opened to subtle bugs (due to the potential of
>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
>>>>>> that was done for the I2C bus [4], [5].
>>>>>>
>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
>>>>>> patch tries to revive it by proposing a similar approach that was done
>>>>>> for the I2C bus.
>>>>>>
>>>>>> Please let me know you input.  
>>>>>
>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
>>>>> have drivers mixing them and side affects happening and lots of
>>>>> confusion.  Your change here is only going to make it even more
>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
>>>>> about iio devices NOT on the platform bus?)  
>>
>> In some cases they are already carrying the support as per the link
>> above covering all i2c drivers.  I'd like to see a generic solution and
>> I suspect pushing it to the device drivers rather than the bus code
>> will explode badly and leave us with subtle bugs where people don't
>> realise it is necessary. 
>>
>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
>> is a lot nastier looking than what we have here. I'll review that in a minute
>> to show that it need not be that bad, but none the less not pleasant.
>>
>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
>> similar)
> 
> We should not expect individual drivers handle this, because this is a
> layering violation: they need to know implementation details of the bus
> code to know if the bus is using non-devres managed resources, and
> adjust their behavior. Moving this into driver core is also not
> feasible, as not all buses need it. So IMO this should belong to
> individual bus code.
> 
> Instead of using devres group a bus may opt to use
> devm_add_action_or_reset() and other devm APIs to make sure bus'
> resource unwinding is carried in the correct order relative to freeing
> driver-owned resources.

Can you please let us know your input on the approach proposed in this
patch? Or if you would prefer devm_add_action_or_reset() as suggested by
Dmitry? Or if you consider another approach would fit better?

Currently there were issues identified with the rzg2l-adc driver (driver
based solution proposed in [1]) and with the rzg3s thermal driver (solved
by function rzg3s_thermal_probe() from [2]).

As expressed previously by Jonathan and Dimitry this is a common problem
and as the issue is due to a call in the bus driver, would be better and
simpler to handle it in the bus driver. Otherwise, individual drivers would
have to be adjusted in a similar way.

Thank you,
Claudiu

[1]
https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com/
[2]
https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/

> 
>>
>>>>
>>>> You're right, other busses will still have this problem.
>>>>   
>>>>>
>>>>> Why can't your individual driver handle this instead?  
>>
>> In my mind because it's the bus code that is doing the unexpected part by
>> making calls in the remove path that are effectively not in the same order
>> as probe because they occur between driver remove and related devres cleanup
>> for stuff registered in probe.
>>
>>>>
>>>> Initially I tried it at the driver level by using non-devres PM runtime
>>>> enable API but wasn't considered OK by all parties.
>>>>
>>>> I haven't thought about having devres_open_group()/devres_close_group() in
>>>> the driver itself but it should work.  
>>>
>>> Are you OK with having the devres_open_group()/devres_close_group() in the
>>> currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
>>> proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ?
>>
>> I guess it may be the best of a bunch of not particularly nasty solutions...
> 
> We need to update _ALL_ platform drivers using devm then, and this is
> clearly not scalable.
> 
> Thanks.
>
Jonathan Cameron March 30, 2025, 3:31 p.m. UTC | #2
On Thu, 27 Mar 2025 18:47:53 +0200
Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:

> Hi, Rafael,
> 
> On 06.03.2025 08:11, Dmitry Torokhov wrote:
> > On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:  
> >> On Wed, 19 Feb 2025 14:45:07 +0200
> >> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> >>  
> >>> Hi, Daniel, Jonathan,
> >>>
> >>> On 15.02.2025 15:51, Claudiu Beznea wrote:  
> >>>> Hi, Greg,
> >>>>
> >>>> On 15.02.2025 15:25, Greg KH wrote:    
> >>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:    
> >>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> >>>>>>
> >>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
> >>>>>> clocks are managed through PM domains. These PM domains, registered on
> >>>>>> behalf of the clock controller driver, are configured with
> >>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
> >>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
> >>>>>> also have power_on/power_off support implemented. After the device PM
> >>>>>> domain is powered off any CPU accesses to these domains leads to system
> >>>>>> aborts.
> >>>>>>
> >>>>>> During probe, devices are attached to the PM domain controlling their
> >>>>>> clocks and power. Similarly, during removal, devices are detached from the
> >>>>>> PM domain.
> >>>>>>
> >>>>>> The detachment call stack is as follows:
> >>>>>>
> >>>>>> device_driver_detach() ->
> >>>>>>   device_release_driver_internal() ->
> >>>>>>     __device_release_driver() ->
> >>>>>>       device_remove() ->
> >>>>>>         platform_remove() ->
> >>>>>> 	  dev_pm_domain_detach()
> >>>>>>
> >>>>>> During driver unbind, after the device is detached from its PM domain,
> >>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
> >>>>>> devres_release_all(). This function handles devres resource cleanup.
> >>>>>>
> >>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
> >>>>>> cleanup process triggers the action or reset function for disabling runtime
> >>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
> >>>>>> following call stack of interest when called:
> >>>>>>
> >>>>>> pm_runtime_disable_action() ->
> >>>>>>   pm_runtime_dont_use_autosuspend() ->
> >>>>>>     __pm_runtime_use_autosuspend() ->
> >>>>>>       update_autosuspend() ->
> >>>>>>         rpm_idle()
> >>>>>>
> >>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
> >>>>>> at the point it is called, the device is no longer part of a PM domain
> >>>>>> (which manages clocks and power states). If the driver implements its own
> >>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
> >>>>>> driver - while also relying on the power domain subsystem for power
> >>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
> >>>>>> since the device is no longer part of a PM domain at this point, the PM
> >>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
> >>>>>> Renesas SoCs.
> >>>>>>
> >>>>>> Another identified case is when a subsystem performs various cleanups
> >>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
> >>>>>> A known example is the thermal subsystem, which may call driver-specific
> >>>>>> APIs to disable the thermal device. The relevant call stack in this case
> >>>>>> is:
> >>>>>>
> >>>>>> device_driver_detach() ->
> >>>>>>   device_release_driver_internal() ->
> >>>>>>     device_unbind_cleanup() ->
> >>>>>>       devres_release_all() ->
> >>>>>>         devm_thermal_of_zone_release() ->
> >>>>>> 	  thermal_zone_device_disable() ->
> >>>>>> 	    thermal_zone_device_set_mode() ->
> >>>>>> 	      struct thermal_zone_device_ops::change_mode()
> >>>>>>
> >>>>>> At the moment the driver-specific change_mode() API is called, the device
> >>>>>> is no longer part of its PM domain. Accessing its registers without proper
> >>>>>> power management leads to system aborts.
> >>>>>>
> >>>>>> Open a devres group before calling the driver probe, and close it
> >>>>>> immediately after the driver remove function is called and before
> >>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
> >>>>>> reset functions are executed immediately after the driver remove function
> >>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
> >>>>>> being called when the device is no longer part of its power domain.
> >>>>>>
> >>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> >>>>>> ---
> >>>>>>
> >>>>>> Hi,  
> >>
> >> Hi Claudiu, Greg,
> >>
> >> Sorry, I missed this thread whilst travelling and only saw it because
> >> of reference from the in driver solution.
> >>  
> >>>>>>
> >>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
> >>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
> >>>>>> approaches in [1], [2] as he considered it may lead to dificult to
> >>>>>> maintain code and code opened to subtle bugs (due to the potential of
> >>>>>> mixing devres and non-devres calls). He pointed out a similar approach
> >>>>>> that was done for the I2C bus [4], [5].
> >>>>>>
> >>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
> >>>>>> patch tries to revive it by proposing a similar approach that was done
> >>>>>> for the I2C bus.
> >>>>>>
> >>>>>> Please let me know you input.    
> >>>>>
> >>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
> >>>>> have drivers mixing them and side affects happening and lots of
> >>>>> confusion.  Your change here is only going to make it even more
> >>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
> >>>>> about iio devices NOT on the platform bus?)    
> >>
> >> In some cases they are already carrying the support as per the link
> >> above covering all i2c drivers.  I'd like to see a generic solution and
> >> I suspect pushing it to the device drivers rather than the bus code
> >> will explode badly and leave us with subtle bugs where people don't
> >> realise it is necessary. 
> >>
> >> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
> >> is a lot nastier looking than what we have here. I'll review that in a minute
> >> to show that it need not be that bad, but none the less not pleasant.
> >>
> >> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
> >> similar)  
> > 
> > We should not expect individual drivers handle this, because this is a
> > layering violation: they need to know implementation details of the bus
> > code to know if the bus is using non-devres managed resources, and
> > adjust their behavior. Moving this into driver core is also not
> > feasible, as not all buses need it. So IMO this should belong to
> > individual bus code.
> > 
> > Instead of using devres group a bus may opt to use
> > devm_add_action_or_reset() and other devm APIs to make sure bus'
> > resource unwinding is carried in the correct order relative to freeing
> > driver-owned resources.  
> 
> Can you please let us know your input on the approach proposed in this
> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
> Dmitry? Or if you consider another approach would fit better?
> 
> Currently there were issues identified with the rzg2l-adc driver (driver
> based solution proposed in [1]) and with the rzg3s thermal driver (solved
> by function rzg3s_thermal_probe() from [2]).
> 
> As expressed previously by Jonathan and Dimitry this is a common problem
> and as the issue is due to a call in the bus driver, would be better and
> simpler to handle it in the bus driver. Otherwise, individual drivers would
> have to be adjusted in a similar way.
> 

Rafael,

Greg suggested we ask for your input on the right option:

https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
(that thread has the other option).

Jonathan

> Thank you,
> Claudiu
> 
> [1]
> https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com/
> [2]
> https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/
> 
> >   
> >>  
> >>>>
> >>>> You're right, other busses will still have this problem.
> >>>>     
> >>>>>
> >>>>> Why can't your individual driver handle this instead?    
> >>
> >> In my mind because it's the bus code that is doing the unexpected part by
> >> making calls in the remove path that are effectively not in the same order
> >> as probe because they occur between driver remove and related devres cleanup
> >> for stuff registered in probe.
> >>  
> >>>>
> >>>> Initially I tried it at the driver level by using non-devres PM runtime
> >>>> enable API but wasn't considered OK by all parties.
> >>>>
> >>>> I haven't thought about having devres_open_group()/devres_close_group() in
> >>>> the driver itself but it should work.    
> >>>
> >>> Are you OK with having the devres_open_group()/devres_close_group() in the
> >>> currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
> >>> proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ?  
> >>
> >> I guess it may be the best of a bunch of not particularly nasty solutions...  
> > 
> > We need to update _ALL_ platform drivers using devm then, and this is
> > clearly not scalable.
> > 
> > Thanks.
> >   
>
Claudiu Beznea April 9, 2025, 4:12 p.m. UTC | #3
Hi, Rafael,

On 30.03.2025 18:31, Jonathan Cameron wrote:
> On Thu, 27 Mar 2025 18:47:53 +0200
> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> 
>> Hi, Rafael,
>>
>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:  
>>>> On Wed, 19 Feb 2025 14:45:07 +0200
>>>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>>>>  
>>>>> Hi, Daniel, Jonathan,
>>>>>
>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:  
>>>>>> Hi, Greg,
>>>>>>
>>>>>> On 15.02.2025 15:25, Greg KH wrote:    
>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:    
>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>>>>
>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
>>>>>>>> behalf of the clock controller driver, are configured with
>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
>>>>>>>> also have power_on/power_off support implemented. After the device PM
>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
>>>>>>>> aborts.
>>>>>>>>
>>>>>>>> During probe, devices are attached to the PM domain controlling their
>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
>>>>>>>> PM domain.
>>>>>>>>
>>>>>>>> The detachment call stack is as follows:
>>>>>>>>
>>>>>>>> device_driver_detach() ->
>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>     __device_release_driver() ->
>>>>>>>>       device_remove() ->
>>>>>>>>         platform_remove() ->
>>>>>>>> 	  dev_pm_domain_detach()
>>>>>>>>
>>>>>>>> During driver unbind, after the device is detached from its PM domain,
>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>>>
>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
>>>>>>>> following call stack of interest when called:
>>>>>>>>
>>>>>>>> pm_runtime_disable_action() ->
>>>>>>>>   pm_runtime_dont_use_autosuspend() ->
>>>>>>>>     __pm_runtime_use_autosuspend() ->
>>>>>>>>       update_autosuspend() ->
>>>>>>>>         rpm_idle()
>>>>>>>>
>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
>>>>>>>> at the point it is called, the device is no longer part of a PM domain
>>>>>>>> (which manages clocks and power states). If the driver implements its own
>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
>>>>>>>> driver - while also relying on the power domain subsystem for power
>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
>>>>>>>> Renesas SoCs.
>>>>>>>>
>>>>>>>> Another identified case is when a subsystem performs various cleanups
>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
>>>>>>>> is:
>>>>>>>>
>>>>>>>> device_driver_detach() ->
>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>     device_unbind_cleanup() ->
>>>>>>>>       devres_release_all() ->
>>>>>>>>         devm_thermal_of_zone_release() ->
>>>>>>>> 	  thermal_zone_device_disable() ->
>>>>>>>> 	    thermal_zone_device_set_mode() ->
>>>>>>>> 	      struct thermal_zone_device_ops::change_mode()
>>>>>>>>
>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
>>>>>>>> power management leads to system aborts.
>>>>>>>>
>>>>>>>> Open a devres group before calling the driver probe, and close it
>>>>>>>> immediately after the driver remove function is called and before
>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
>>>>>>>> reset functions are executed immediately after the driver remove function
>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
>>>>>>>> being called when the device is no longer part of its power domain.
>>>>>>>>
>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Hi,  
>>>>
>>>> Hi Claudiu, Greg,
>>>>
>>>> Sorry, I missed this thread whilst travelling and only saw it because
>>>> of reference from the in driver solution.
>>>>  
>>>>>>>>
>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
>>>>>>>> that was done for the I2C bus [4], [5].
>>>>>>>>
>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
>>>>>>>> patch tries to revive it by proposing a similar approach that was done
>>>>>>>> for the I2C bus.
>>>>>>>>
>>>>>>>> Please let me know you input.    
>>>>>>>
>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
>>>>>>> have drivers mixing them and side affects happening and lots of
>>>>>>> confusion.  Your change here is only going to make it even more
>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
>>>>>>> about iio devices NOT on the platform bus?)    
>>>>
>>>> In some cases they are already carrying the support as per the link
>>>> above covering all i2c drivers.  I'd like to see a generic solution and
>>>> I suspect pushing it to the device drivers rather than the bus code
>>>> will explode badly and leave us with subtle bugs where people don't
>>>> realise it is necessary. 
>>>>
>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
>>>> is a lot nastier looking than what we have here. I'll review that in a minute
>>>> to show that it need not be that bad, but none the less not pleasant.
>>>>
>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
>>>> similar)  
>>>
>>> We should not expect individual drivers handle this, because this is a
>>> layering violation: they need to know implementation details of the bus
>>> code to know if the bus is using non-devres managed resources, and
>>> adjust their behavior. Moving this into driver core is also not
>>> feasible, as not all buses need it. So IMO this should belong to
>>> individual bus code.
>>>
>>> Instead of using devres group a bus may opt to use
>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
>>> resource unwinding is carried in the correct order relative to freeing
>>> driver-owned resources.  
>>
>> Can you please let us know your input on the approach proposed in this
>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
>> Dmitry? Or if you consider another approach would fit better?
>>
>> Currently there were issues identified with the rzg2l-adc driver (driver
>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
>> by function rzg3s_thermal_probe() from [2]).
>>
>> As expressed previously by Jonathan and Dimitry this is a common problem
>> and as the issue is due to a call in the bus driver, would be better and
>> simpler to handle it in the bus driver. Otherwise, individual drivers would
>> have to be adjusted in a similar way.
>>
> 
> Rafael,
> 
> Greg suggested we ask for your input on the right option:
> 
> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
> (that thread has the other option).

Can you please let us know your opinion on this?

Thank you,
Claudiu

> 
> Jonathan
> 
>> Thank you,
>> Claudiu
>>
>> [1]
>> https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com/
>> [2]
>> https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/
>>
>>>   
>>>>  
>>>>>>
>>>>>> You're right, other busses will still have this problem.
>>>>>>     
>>>>>>>
>>>>>>> Why can't your individual driver handle this instead?    
>>>>
>>>> In my mind because it's the bus code that is doing the unexpected part by
>>>> making calls in the remove path that are effectively not in the same order
>>>> as probe because they occur between driver remove and related devres cleanup
>>>> for stuff registered in probe.
>>>>  
>>>>>>
>>>>>> Initially I tried it at the driver level by using non-devres PM runtime
>>>>>> enable API but wasn't considered OK by all parties.
>>>>>>
>>>>>> I haven't thought about having devres_open_group()/devres_close_group() in
>>>>>> the driver itself but it should work.    
>>>>>
>>>>> Are you OK with having the devres_open_group()/devres_close_group() in the
>>>>> currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
>>>>> proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ?  
>>>>
>>>> I guess it may be the best of a bunch of not particularly nasty solutions...  
>>>
>>> We need to update _ALL_ platform drivers using devm then, and this is
>>> clearly not scalable.
>>>
>>> Thanks.
>>>   
>>
>
Claudiu Beznea May 9, 2025, 2:12 p.m. UTC | #4
Hi, Ulf,

Thank you for your input!

On 09.05.2025 16:07, Ulf Hansson wrote:
> On Fri, 9 May 2025 at 13:51, Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>>
>> Hi, Rafael, Ulf, PM list,
>>
>>
>> On 09.04.2025 19:12, Claudiu Beznea wrote:
>>> Hi, Rafael,
>>>
>>> On 30.03.2025 18:31, Jonathan Cameron wrote:
>>>> On Thu, 27 Mar 2025 18:47:53 +0200
>>>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>>>>
>>>>> Hi, Rafael,
>>>>>
>>>>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
>>>>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:
>>>>>>> On Wed, 19 Feb 2025 14:45:07 +0200
>>>>>>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>>>>>>>
>>>>>>>> Hi, Daniel, Jonathan,
>>>>>>>>
>>>>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:
>>>>>>>>> Hi, Greg,
>>>>>>>>>
>>>>>>>>> On 15.02.2025 15:25, Greg KH wrote:
>>>>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
>>>>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>>>>>>>
>>>>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
>>>>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
>>>>>>>>>>> behalf of the clock controller driver, are configured with
>>>>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
>>>>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
>>>>>>>>>>> also have power_on/power_off support implemented. After the device PM
>>>>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
>>>>>>>>>>> aborts.
>>>>>>>>>>>
>>>>>>>>>>> During probe, devices are attached to the PM domain controlling their
>>>>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
>>>>>>>>>>> PM domain.
>>>>>>>>>>>
>>>>>>>>>>> The detachment call stack is as follows:
>>>>>>>>>>>
>>>>>>>>>>> device_driver_detach() ->
>>>>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>>>>     __device_release_driver() ->
>>>>>>>>>>>       device_remove() ->
>>>>>>>>>>>         platform_remove() ->
>>>>>>>>>>>         dev_pm_domain_detach()
>>>>>>>>>>>
>>>>>>>>>>> During driver unbind, after the device is detached from its PM domain,
>>>>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>>>>>>
>>>>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
>>>>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
>>>>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
>>>>>>>>>>> following call stack of interest when called:
>>>>>>>>>>>
>>>>>>>>>>> pm_runtime_disable_action() ->
>>>>>>>>>>>   pm_runtime_dont_use_autosuspend() ->
>>>>>>>>>>>     __pm_runtime_use_autosuspend() ->
>>>>>>>>>>>       update_autosuspend() ->
>>>>>>>>>>>         rpm_idle()
>>>>>>>>>>>
>>>>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
>>>>>>>>>>> at the point it is called, the device is no longer part of a PM domain
>>>>>>>>>>> (which manages clocks and power states). If the driver implements its own
>>>>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
>>>>>>>>>>> driver - while also relying on the power domain subsystem for power
>>>>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
>>>>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
>>>>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
>>>>>>>>>>> Renesas SoCs.
>>>>>>>>>>>
>>>>>>>>>>> Another identified case is when a subsystem performs various cleanups
>>>>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
>>>>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
>>>>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
>>>>>>>>>>> is:
>>>>>>>>>>>
>>>>>>>>>>> device_driver_detach() ->
>>>>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>>>>     device_unbind_cleanup() ->
>>>>>>>>>>>       devres_release_all() ->
>>>>>>>>>>>         devm_thermal_of_zone_release() ->
>>>>>>>>>>>         thermal_zone_device_disable() ->
>>>>>>>>>>>           thermal_zone_device_set_mode() ->
>>>>>>>>>>>             struct thermal_zone_device_ops::change_mode()
>>>>>>>>>>>
>>>>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
>>>>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
>>>>>>>>>>> power management leads to system aborts.
>>>>>>>>>>>
>>>>>>>>>>> Open a devres group before calling the driver probe, and close it
>>>>>>>>>>> immediately after the driver remove function is called and before
>>>>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
>>>>>>>>>>> reset functions are executed immediately after the driver remove function
>>>>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
>>>>>>>>>>> being called when the device is no longer part of its power domain.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>
>>>>>>> Hi Claudiu, Greg,
>>>>>>>
>>>>>>> Sorry, I missed this thread whilst travelling and only saw it because
>>>>>>> of reference from the in driver solution.
>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
>>>>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
>>>>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
>>>>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
>>>>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
>>>>>>>>>>> that was done for the I2C bus [4], [5].
>>>>>>>>>>>
>>>>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
>>>>>>>>>>> patch tries to revive it by proposing a similar approach that was done
>>>>>>>>>>> for the I2C bus.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know you input.
>>>>>>>>>>
>>>>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
>>>>>>>>>> have drivers mixing them and side affects happening and lots of
>>>>>>>>>> confusion.  Your change here is only going to make it even more
>>>>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
>>>>>>>>>> about iio devices NOT on the platform bus?)
>>>>>>>
>>>>>>> In some cases they are already carrying the support as per the link
>>>>>>> above covering all i2c drivers.  I'd like to see a generic solution and
>>>>>>> I suspect pushing it to the device drivers rather than the bus code
>>>>>>> will explode badly and leave us with subtle bugs where people don't
>>>>>>> realise it is necessary.
>>>>>>>
>>>>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
>>>>>>> is a lot nastier looking than what we have here. I'll review that in a minute
>>>>>>> to show that it need not be that bad, but none the less not pleasant.
>>>>>>>
>>>>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
>>>>>>> similar)
>>>>>>
>>>>>> We should not expect individual drivers handle this, because this is a
>>>>>> layering violation: they need to know implementation details of the bus
>>>>>> code to know if the bus is using non-devres managed resources, and
>>>>>> adjust their behavior. Moving this into driver core is also not
>>>>>> feasible, as not all buses need it. So IMO this should belong to
>>>>>> individual bus code.
>>>>>>
>>>>>> Instead of using devres group a bus may opt to use
>>>>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
>>>>>> resource unwinding is carried in the correct order relative to freeing
>>>>>> driver-owned resources.
>>>>>
>>>>> Can you please let us know your input on the approach proposed in this
>>>>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
>>>>> Dmitry? Or if you consider another approach would fit better?
>>>>>
>>>>> Currently there were issues identified with the rzg2l-adc driver (driver
>>>>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
>>>>> by function rzg3s_thermal_probe() from [2]).
>>>>>
>>>>> As expressed previously by Jonathan and Dimitry this is a common problem
>>>>> and as the issue is due to a call in the bus driver, would be better and
>>>>> simpler to handle it in the bus driver. Otherwise, individual drivers would
>>>>> have to be adjusted in a similar way.
>>>>>
>>>>
>>>> Rafael,
>>>>
>>>> Greg suggested we ask for your input on the right option:
>>>>
>>>> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
>>>> (that thread has the other option).
>>>
>>> Can you please let us know your opinion on this?
>> Can you please let us know if you have any suggestions for this?
> 
> It's been a while since I looked at this. Although as I understand it,
> the main issue comes from using devm_pm_runtime_enable().

Yes, it comes from the usage of devm_pm_runtime_enable() in drivers and the
dev_pm_domain_detach() call in platform_remove() right after calling
driver's remove function.

On the platform I experienced issues with, the dev_pm_domain_detach() drops
the clocks from the device power domain and any subsequent PM runtime
resume calls (that may happen in the devres cleanup phase) have no effect
on enabling the clocks. If driver has functions registered (e.g. through
devm_add_action_or_reset()), or driver specific runtime PM functions that
access directly registers in the devres cleanup phase this leads to system
aborts.


> 
> As I have tried to argue before, I think devm_pm_runtime_enable()
> should *not* be used. Not here, not at all. Runtime PM isn't like any
> other resources that we fetch/release. Instead, it's a behaviour that
> you turn on and off, which needs to be managed more carefully, rather
> than relying on fetch/release ordering from devres.
> 
> That said, I would convert the driver to use pm_runtime_enable() and
> pm_runtime_disable() instead.

I've tried this approach previously but it resulted in more complicated
code and thus, Jonathan wasn't happy with it [1].

Another approach I've tried was to have devres group opened/closed in the
driver itself [2], [3] but it was postponed as this approach may have a chance.

At the moment I have 2 drivers waiting for a resolution on this [2], [3]
and I recently posted a new one [4] that uses driver specific local devres
group to avoid this issue.


Thank you,
Claudiu

[1]
https://lore.kernel.org/all/20250224120608.1769039-2-claudiu.beznea.uj@bp.renesas.com
[2]
https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com
[3]
https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/
[4]
https://lore.kernel.org/all/20250430103236.3511989-6-claudiu.beznea.uj@bp.renesas.com
Dmitry Torokhov May 19, 2025, 4:13 p.m. UTC | #5
Hi Ulf,

On Mon, May 19, 2025 at 05:02:05PM +0200, Ulf Hansson wrote:
> On Fri, 9 May 2025 at 16:12, Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> >
> > Hi, Ulf,
> >
> > Thank you for your input!
> >
> > On 09.05.2025 16:07, Ulf Hansson wrote:
> > > On Fri, 9 May 2025 at 13:51, Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> > >>
> > >> Hi, Rafael, Ulf, PM list,
> > >>
> > >>
> > >> On 09.04.2025 19:12, Claudiu Beznea wrote:
> > >>> Hi, Rafael,
> > >>>
> > >>> On 30.03.2025 18:31, Jonathan Cameron wrote:
> > >>>> On Thu, 27 Mar 2025 18:47:53 +0200
> > >>>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> > >>>>
> > >>>>> Hi, Rafael,
> > >>>>>
> > >>>>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
> > >>>>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:
> > >>>>>>> On Wed, 19 Feb 2025 14:45:07 +0200
> > >>>>>>> Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> > >>>>>>>
> > >>>>>>>> Hi, Daniel, Jonathan,
> > >>>>>>>>
> > >>>>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:
> > >>>>>>>>> Hi, Greg,
> > >>>>>>>>>
> > >>>>>>>>> On 15.02.2025 15:25, Greg KH wrote:
> > >>>>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
> > >>>>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
> > >>>>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
> > >>>>>>>>>>> behalf of the clock controller driver, are configured with
> > >>>>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
> > >>>>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
> > >>>>>>>>>>> also have power_on/power_off support implemented. After the device PM
> > >>>>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
> > >>>>>>>>>>> aborts.
> > >>>>>>>>>>>
> > >>>>>>>>>>> During probe, devices are attached to the PM domain controlling their
> > >>>>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
> > >>>>>>>>>>> PM domain.
> > >>>>>>>>>>>
> > >>>>>>>>>>> The detachment call stack is as follows:
> > >>>>>>>>>>>
> > >>>>>>>>>>> device_driver_detach() ->
> > >>>>>>>>>>>   device_release_driver_internal() ->
> > >>>>>>>>>>>     __device_release_driver() ->
> > >>>>>>>>>>>       device_remove() ->
> > >>>>>>>>>>>         platform_remove() ->
> > >>>>>>>>>>>         dev_pm_domain_detach()
> > >>>>>>>>>>>
> > >>>>>>>>>>> During driver unbind, after the device is detached from its PM domain,
> > >>>>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
> > >>>>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
> > >>>>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
> > >>>>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
> > >>>>>>>>>>> following call stack of interest when called:
> > >>>>>>>>>>>
> > >>>>>>>>>>> pm_runtime_disable_action() ->
> > >>>>>>>>>>>   pm_runtime_dont_use_autosuspend() ->
> > >>>>>>>>>>>     __pm_runtime_use_autosuspend() ->
> > >>>>>>>>>>>       update_autosuspend() ->
> > >>>>>>>>>>>         rpm_idle()
> > >>>>>>>>>>>
> > >>>>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
> > >>>>>>>>>>> at the point it is called, the device is no longer part of a PM domain
> > >>>>>>>>>>> (which manages clocks and power states). If the driver implements its own
> > >>>>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
> > >>>>>>>>>>> driver - while also relying on the power domain subsystem for power
> > >>>>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
> > >>>>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
> > >>>>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
> > >>>>>>>>>>> Renesas SoCs.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Another identified case is when a subsystem performs various cleanups
> > >>>>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
> > >>>>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
> > >>>>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
> > >>>>>>>>>>> is:
> > >>>>>>>>>>>
> > >>>>>>>>>>> device_driver_detach() ->
> > >>>>>>>>>>>   device_release_driver_internal() ->
> > >>>>>>>>>>>     device_unbind_cleanup() ->
> > >>>>>>>>>>>       devres_release_all() ->
> > >>>>>>>>>>>         devm_thermal_of_zone_release() ->
> > >>>>>>>>>>>         thermal_zone_device_disable() ->
> > >>>>>>>>>>>           thermal_zone_device_set_mode() ->
> > >>>>>>>>>>>             struct thermal_zone_device_ops::change_mode()
> > >>>>>>>>>>>
> > >>>>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
> > >>>>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
> > >>>>>>>>>>> power management leads to system aborts.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Open a devres group before calling the driver probe, and close it
> > >>>>>>>>>>> immediately after the driver remove function is called and before
> > >>>>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
> > >>>>>>>>>>> reset functions are executed immediately after the driver remove function
> > >>>>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
> > >>>>>>>>>>> being called when the device is no longer part of its power domain.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> Hi Claudiu, Greg,
> > >>>>>>>
> > >>>>>>> Sorry, I missed this thread whilst travelling and only saw it because
> > >>>>>>> of reference from the in driver solution.
> > >>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
> > >>>>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
> > >>>>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
> > >>>>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
> > >>>>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
> > >>>>>>>>>>> that was done for the I2C bus [4], [5].
> > >>>>>>>>>>>
> > >>>>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
> > >>>>>>>>>>> patch tries to revive it by proposing a similar approach that was done
> > >>>>>>>>>>> for the I2C bus.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please let me know you input.
> > >>>>>>>>>>
> > >>>>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
> > >>>>>>>>>> have drivers mixing them and side affects happening and lots of
> > >>>>>>>>>> confusion.  Your change here is only going to make it even more
> > >>>>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
> > >>>>>>>>>> about iio devices NOT on the platform bus?)
> > >>>>>>>
> > >>>>>>> In some cases they are already carrying the support as per the link
> > >>>>>>> above covering all i2c drivers.  I'd like to see a generic solution and
> > >>>>>>> I suspect pushing it to the device drivers rather than the bus code
> > >>>>>>> will explode badly and leave us with subtle bugs where people don't
> > >>>>>>> realise it is necessary.
> > >>>>>>>
> > >>>>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
> > >>>>>>> is a lot nastier looking than what we have here. I'll review that in a minute
> > >>>>>>> to show that it need not be that bad, but none the less not pleasant.
> > >>>>>>>
> > >>>>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
> > >>>>>>> similar)
> > >>>>>>
> > >>>>>> We should not expect individual drivers handle this, because this is a
> > >>>>>> layering violation: they need to know implementation details of the bus
> > >>>>>> code to know if the bus is using non-devres managed resources, and
> > >>>>>> adjust their behavior. Moving this into driver core is also not
> > >>>>>> feasible, as not all buses need it. So IMO this should belong to
> > >>>>>> individual bus code.
> > >>>>>>
> > >>>>>> Instead of using devres group a bus may opt to use
> > >>>>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
> > >>>>>> resource unwinding is carried in the correct order relative to freeing
> > >>>>>> driver-owned resources.
> > >>>>>
> > >>>>> Can you please let us know your input on the approach proposed in this
> > >>>>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
> > >>>>> Dmitry? Or if you consider another approach would fit better?
> > >>>>>
> > >>>>> Currently there were issues identified with the rzg2l-adc driver (driver
> > >>>>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
> > >>>>> by function rzg3s_thermal_probe() from [2]).
> > >>>>>
> > >>>>> As expressed previously by Jonathan and Dimitry this is a common problem
> > >>>>> and as the issue is due to a call in the bus driver, would be better and
> > >>>>> simpler to handle it in the bus driver. Otherwise, individual drivers would
> > >>>>> have to be adjusted in a similar way.
> > >>>>>
> > >>>>
> > >>>> Rafael,
> > >>>>
> > >>>> Greg suggested we ask for your input on the right option:
> > >>>>
> > >>>> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
> > >>>> (that thread has the other option).
> > >>>
> > >>> Can you please let us know your opinion on this?
> > >> Can you please let us know if you have any suggestions for this?
> > >
> > > It's been a while since I looked at this. Although as I understand it,
> > > the main issue comes from using devm_pm_runtime_enable().
> >
> > Yes, it comes from the usage of devm_pm_runtime_enable() in drivers and the
> > dev_pm_domain_detach() call in platform_remove() right after calling
> > driver's remove function.
> 
> Okay.

This is not the root of the problem though. There is nothing really
special about power domain and runtime power management. The root of the
problem is that current code violates the order of releasing resources
by mixing devm- and normal resource management together. Usually it is
individual driver's fault, but in this case it is bus code that uses the
manual release (dev_om_domain_detach) that violates the "release in
opposite order to acquisition" rule.

> 
> >
> > On the platform I experienced issues with, the dev_pm_domain_detach() drops
> > the clocks from the device power domain and any subsequent PM runtime
> > resume calls (that may happen in the devres cleanup phase) have no effect
> > on enabling the clocks. If driver has functions registered (e.g. through
> > devm_add_action_or_reset()), or driver specific runtime PM functions that
> > access directly registers in the devres cleanup phase this leads to system
> > aborts.
> 
> So if you move away from using devm_pm_runtime_enable() things would
> be easier to manage and there is no additional new devres-management
> needed.

How exactly will it improve the situation? You still need to make sure
that you are not disabling things out of the order. You simply moving
the complexity to the driver, essentially forbidding it (and any other
driver on platform bus) from using any devm APIs. 

> 
> >
> >
> > >
> > > As I have tried to argue before, I think devm_pm_runtime_enable()
> > > should *not* be used. Not here, not at all. Runtime PM isn't like any
> > > other resources that we fetch/release. Instead, it's a behaviour that
> > > you turn on and off, which needs to be managed more carefully, rather
> > > than relying on fetch/release ordering from devres.

I disagree. It is a resource that you turn on and off, same as clocks,
regulators, interrupts, etc. We manage those during lifetime of the
device, disable them when going into low power mode/suspend, reenable
them upon resume, may disable and reenable them for other reasons.

PM is not any more special here. As long as you keep the proper order of
operations it works as well.

> > >
> > > That said, I would convert the driver to use pm_runtime_enable() and
> > > pm_runtime_disable() instead.
> >
> > I've tried this approach previously but it resulted in more complicated
> > code and thus, Jonathan wasn't happy with it [1].
> 
> I understand that you have been trying to move forward to address
> people's opinions. It's not always easy to keep everybody happy. :-)
> 
> That said, I still think this is the most viable option as it's how
> the vast majority of drivers do it today. A few lines of additional
> code shouldn't really be a big problem in my opinion.

Have you tried making such change? Again, you will need to abandon use
of most other devm APIs so that you keep the order of releasing
resources. The only devm that you can still use is devm_k*alloc(), the
rest has to be converted into unmanaged.

Thanks.
Claudiu Beznea May 21, 2025, 5:41 a.m. UTC | #6
Hi, Ulf,

On 20.05.2025 15:09, Ulf Hansson wrote:
> For example, even if the order is made correctly, suppose a driver's
> ->remove() callback completes by turning off the resources for its
> device and leaves runtime PM enabled, as it relies on devres to do it
> some point later. Beyond this point, nothing would prevent userspace
> for runtime resuming/suspending the device via sysfs. 

If I'm not wrong, that can't happen? The driver_sysfs_remove() is called
before device_remove() (which calls the driver remove) is called, this
being the call path:

device_driver_detach() ->
  device_release_driver_internal() ->
    __device_release_driver() ->
      driver_sysfs_remove()
      // ...
      device_remove()

And the driver_sysfs_remove() calls in the end __kernfs_remove() which
looks to me like the place that actually drops the entries from sysfs, this
being a call path for it:

driver_sysfs_remove() ->
  sysfs_remove_link() ->
    kernfs_remove_by_name() ->
      kernfs_remove_by_name_ns() ->
        __kernfs_remove() ->

activating the following line in __kernfs_remove():

pr_debug("kernfs %s: removing\n", kernfs_rcu_name(kn));

leads to the following prints when unbinding the watchdog device from its
watchdog driver (attached to platform bus) on my board:
https://p.fr33tux.org/935252

Thank you,
Claudiu
Ulf Hansson May 21, 2025, 12:37 p.m. UTC | #7
On Wed, 21 May 2025 at 07:41, Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
>
> Hi, Ulf,
>
> On 20.05.2025 15:09, Ulf Hansson wrote:
> > For example, even if the order is made correctly, suppose a driver's
> > ->remove() callback completes by turning off the resources for its
> > device and leaves runtime PM enabled, as it relies on devres to do it
> > some point later. Beyond this point, nothing would prevent userspace
> > for runtime resuming/suspending the device via sysfs.
>
> If I'm not wrong, that can't happen? The driver_sysfs_remove() is called
> before device_remove() (which calls the driver remove) is called, this
> being the call path:
>
> device_driver_detach() ->
>   device_release_driver_internal() ->
>     __device_release_driver() ->
>       driver_sysfs_remove()
>       // ...
>       device_remove()
>
> And the driver_sysfs_remove() calls in the end __kernfs_remove() which
> looks to me like the place that actually drops the entries from sysfs, this
> being a call path for it:
>
> driver_sysfs_remove() ->
>   sysfs_remove_link() ->
>     kernfs_remove_by_name() ->
>       kernfs_remove_by_name_ns() ->
>         __kernfs_remove() ->
>
> activating the following line in __kernfs_remove():
>
> pr_debug("kernfs %s: removing\n", kernfs_rcu_name(kn));
>
> leads to the following prints when unbinding the watchdog device from its
> watchdog driver (attached to platform bus) on my board:
> https://p.fr33tux.org/935252

Indeed this is a very good point you make! I completely overlooked
this fact, thanks a lot for clarifying this!

However, my main point still stands.

In the end, there is nothing preventing rpm_suspend|resume|idle() in
drivers/base/power/runtime.c from running (don't forget runtime PM is
asynchronous too) for the device in question. This could lead to that
a ->runtime_suspend|resume|idle() callback becomes executed at any
point in time, as long as we haven't called pm_runtime_disable() for
the device.

That's why the devm_pm_runtime_enable() should be avoided as it simply
introduces a race-condition. Drivers need to be more careful and use
pm_runtime_enable|disable() explicitly to control the behaviour.

Kind regards
Uffe
Dmitry Torokhov May 21, 2025, 2:57 p.m. UTC | #8
On Wed, May 21, 2025 at 02:37:08PM +0200, Ulf Hansson wrote:
> On Wed, 21 May 2025 at 07:41, Claudiu Beznea <claudiu.beznea@tuxon.dev> wrote:
> >
> > Hi, Ulf,
> >
> > On 20.05.2025 15:09, Ulf Hansson wrote:
> > > For example, even if the order is made correctly, suppose a driver's
> > > ->remove() callback completes by turning off the resources for its
> > > device and leaves runtime PM enabled, as it relies on devres to do it
> > > some point later. Beyond this point, nothing would prevent userspace
> > > for runtime resuming/suspending the device via sysfs.
> >
> > If I'm not wrong, that can't happen? The driver_sysfs_remove() is called
> > before device_remove() (which calls the driver remove) is called, this
> > being the call path:
> >
> > device_driver_detach() ->
> >   device_release_driver_internal() ->
> >     __device_release_driver() ->
> >       driver_sysfs_remove()
> >       // ...
> >       device_remove()
> >
> > And the driver_sysfs_remove() calls in the end __kernfs_remove() which
> > looks to me like the place that actually drops the entries from sysfs, this
> > being a call path for it:
> >
> > driver_sysfs_remove() ->
> >   sysfs_remove_link() ->
> >     kernfs_remove_by_name() ->
> >       kernfs_remove_by_name_ns() ->
> >         __kernfs_remove() ->
> >
> > activating the following line in __kernfs_remove():
> >
> > pr_debug("kernfs %s: removing\n", kernfs_rcu_name(kn));
> >
> > leads to the following prints when unbinding the watchdog device from its
> > watchdog driver (attached to platform bus) on my board:
> > https://p.fr33tux.org/935252
> 
> Indeed this is a very good point you make! I completely overlooked
> this fact, thanks a lot for clarifying this!
> 
> However, my main point still stands.
> 
> In the end, there is nothing preventing rpm_suspend|resume|idle() in
> drivers/base/power/runtime.c from running (don't forget runtime PM is
> asynchronous too) for the device in question. This could lead to that
> a ->runtime_suspend|resume|idle() callback becomes executed at any
> point in time, as long as we haven't called pm_runtime_disable() for
> the device.

So exactly the same may happen if you enter driver->remove() and
something calls runtime API before pm_runtime_disable() is called.
The driver has (as they should be doing currently) be prepared for this.

> 
> That's why the devm_pm_runtime_enable() should be avoided as it simply
> introduces a race-condition. Drivers need to be more careful and use
> pm_runtime_enable|disable() explicitly to control the behaviour.

You make it sound like we are dealing with some non-deterministic
process, like garbage collector, where runtime disable done by devm
happens at some unspecified point in the future. However we are dealing
with very well defined order of operations, all happening within
__device_release_driver() call. It is the same scope as when using
manual pm_runtime_disable(). Just the order is wrong, that is it.

Thanks.
diff mbox series

Patch

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6f2a33722c52..1b64c4a44263 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1401,9 +1401,15 @@  static int platform_probe(struct device *_dev)
 		goto out;
 
 	if (drv->probe) {
+		dev->devres_group_id = devres_open_group(&dev->dev, NULL, GFP_KERNEL);
+		if (!dev->devres_group_id)
+			return -ENOMEM;
+
 		ret = drv->probe(dev);
-		if (ret)
+		if (ret) {
+			devres_close_group(&dev->dev, dev->devres_group_id);
 			dev_pm_domain_detach(_dev, true);
+		}
 	}
 
 out:
@@ -1422,6 +1428,8 @@  static void platform_remove(struct device *_dev)
 
 	if (drv->remove)
 		drv->remove(dev);
+	if (dev->devres_group_id)
+		devres_release_group(&dev->dev, dev->devres_group_id);
 	dev_pm_domain_detach(_dev, true);
 }
 
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 074754c23d33..e842ad243bef 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -40,6 +40,9 @@  struct platform_device {
 	/* MFD cell pointer */
 	struct mfd_cell *mfd_cell;
 
+	/* ID of the probe devres group. */
+	void *devres_group_id;
+
 	/* arch specific additions */
 	struct pdev_archdata	archdata;
 };