mbox series

[v10,0/8] NVIDIA Tegra power management patches for 5.16

Message ID 20210831135450.26070-1-digetx@gmail.com
Headers show
Series NVIDIA Tegra power management patches for 5.16 | expand

Message

Dmitry Osipenko Aug. 31, 2021, 1:54 p.m. UTC
This is a reduced version of the patchset which adds power management
support to NVIDIA Tegra drivers. Viresh Kumar asked to send these PD/OPP
patches separately for now to reduce the noise and finalize the review.

I implemented new dev_get_performance_state() GENPD callback as was
discussed in v8/v9. GR3D driver patch shows how it's used by consumer
drivers.

v10: - Replaced dev_pm_opp_from_clk_rate() with dev_pm_opp_get_current(),
       as was requested by Viresh Kumar.

     - Added more comments to the code and extended commit message,
       as was requested by Viresh Kumar and Ulf Hansson.

     - Renamed get_performance_state() to dev_get_performance_state(),
       as was requested by Ulf Hansson.

     - Factored out 'performance' code out of __genpd_dev_pm_attach() into
       a separate function genpd_dev_initialize_performance_state(), as was
       requested by Ulf Hansson.

     - Removed dev_suspended argument from dev_get_performance_state(),
       as was requested by Ulf Hansson. It's replaced by the usage of
       pm_runtime_status_suspended(), see genpd_dev_get_current_performance_state().

Dmitry Osipenko (8):
  opp: Add dev_pm_opp_get_current()
  opp: Allow dev_pm_opp_set_clkname() to replace released clock
  opp: Change type of dev_pm_opp_attach_genpd(names) argument
  PM: domains: Add dev_get_performance_state() callback
  soc/tegra: pmc: Implement dev_get_performance_state() callback
  soc/tegra: Add devm_tegra_core_dev_init_opp_table_simple()
  gpu: host1x: Add host1x_channel_stop()
  drm/tegra: gr3d: Support generic power domain and runtime PM

 drivers/base/power/domain.c  |  90 ++++++--
 drivers/gpu/drm/tegra/gr3d.c | 384 ++++++++++++++++++++++++++++++-----
 drivers/gpu/host1x/channel.c |   8 +
 drivers/opp/core.c           |  51 ++++-
 drivers/soc/tegra/pmc.c      | 101 +++++++++
 include/linux/host1x.h       |   1 +
 include/linux/pm_domain.h    |   2 +
 include/linux/pm_opp.h       |  14 +-
 include/soc/tegra/common.h   |  13 ++
 9 files changed, 586 insertions(+), 78 deletions(-)

Comments

Viresh Kumar Sept. 1, 2021, 4:39 a.m. UTC | #1
On 31-08-21, 16:54, Dmitry Osipenko wrote:
> Add dev_pm_opp_get_current() helper that returns OPP corresponding

> to the current clock rate of a device.

> 

> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

> ---

>  drivers/opp/core.c     | 43 +++++++++++++++++++++++++++++++++++++++---

>  include/linux/pm_opp.h |  6 ++++++

>  2 files changed, 46 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c

> index 04b4691a8aac..dde8a5cc948c 100644

> --- a/drivers/opp/core.c

> +++ b/drivers/opp/core.c

> @@ -939,7 +939,7 @@ static int _set_required_opps(struct device *dev,

>  	return ret;

>  }

>  

> -static void _find_current_opp(struct device *dev, struct opp_table *opp_table)

> +static struct dev_pm_opp *_find_current_opp(struct opp_table *opp_table)

>  {

>  	struct dev_pm_opp *opp = ERR_PTR(-ENODEV);

>  	unsigned long freq;

> @@ -949,6 +949,18 @@ static void _find_current_opp(struct device *dev, struct opp_table *opp_table)

>  		opp = _find_freq_ceil(opp_table, &freq);

>  	}

>  

> +	return opp;

> +}

> +

> +static void _find_and_set_current_opp(struct opp_table *opp_table)

> +{

> +	struct dev_pm_opp *opp;

> +

> +	if (opp_table->current_opp)

> +		return;


Why move this from caller as well ?

> +

> +	opp = _find_current_opp(opp_table);

> +

>  	/*

>  	 * Unable to find the current OPP ? Pick the first from the list since

>  	 * it is in ascending order, otherwise rest of the code will need to


If we aren't able to find current OPP based on current freq, then this
picks the first value from the list. Why shouldn't this be done in
your case as well ?

> @@ -1002,8 +1014,7 @@ static int _set_opp(struct device *dev, struct opp_table *opp_table,

>  		return _disable_opp_table(dev, opp_table);

>  

>  	/* Find the currently set OPP if we don't know already */

> -	if (unlikely(!opp_table->current_opp))

> -		_find_current_opp(dev, opp_table);

> +	_find_and_set_current_opp(opp_table);

>  

>  	old_opp = opp_table->current_opp;

>  

> @@ -2931,3 +2942,29 @@ int dev_pm_opp_sync_regulators(struct device *dev)

>  	return ret;

>  }

>  EXPORT_SYMBOL_GPL(dev_pm_opp_sync_regulators);

> +

> +/**

> + * dev_pm_opp_get_current() - Get current OPP

> + * @dev:	device for which we do this operation

> + *

> + * Get current OPP of a device based on current clock rate or by other means.

> + *

> + * Return: pointer to 'struct dev_pm_opp' on success and errorno otherwise.

> + */

> +struct dev_pm_opp *dev_pm_opp_get_current(struct device *dev)

> +{

> +	struct opp_table *opp_table;

> +	struct dev_pm_opp *opp;

> +

> +	opp_table = _find_opp_table(dev);

> +	if (IS_ERR(opp_table))

> +		return ERR_CAST(opp_table);

> +

> +	opp = _find_current_opp(opp_table);


This should not just go find the OPP based on current freq. This first
needs to check if curret_opp is set or not. If set, then just return
it, else run the _find_current_opp() function and update the
current_opp pointer as well.

-- 
viresh
Viresh Kumar Sept. 1, 2021, 4:41 a.m. UTC | #2
On 31-08-21, 16:54, Dmitry Osipenko wrote:
> Elements of the 'names' array are not changed by the code, constify them

> for consistency.

> 

> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

> ---

>  drivers/opp/core.c     | 6 +++---

>  include/linux/pm_opp.h | 8 ++++----

>  2 files changed, 7 insertions(+), 7 deletions(-)

> 

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c

> index 602e502d092e..d4e706a8b70d 100644

> --- a/drivers/opp/core.c

> +++ b/drivers/opp/core.c

> @@ -2359,12 +2359,12 @@ static void _opp_detach_genpd(struct opp_table *opp_table)

>   * "required-opps" are added in DT.

>   */

>  struct opp_table *dev_pm_opp_attach_genpd(struct device *dev,

> -		const char **names, struct device ***virt_devs)

> +		const char * const *names, struct device ***virt_devs)


I am sure there are issues around space around * here. Please run
checkpatch with --strict option for your series.

-- 
viresh
Dmitry Osipenko Sept. 1, 2021, 5:43 a.m. UTC | #3
01.09.2021 07:39, Viresh Kumar пишет:
> On 31-08-21, 16:54, Dmitry Osipenko wrote:

>> Add dev_pm_opp_get_current() helper that returns OPP corresponding

>> to the current clock rate of a device.

>>

>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

>> ---

>>  drivers/opp/core.c     | 43 +++++++++++++++++++++++++++++++++++++++---

>>  include/linux/pm_opp.h |  6 ++++++

>>  2 files changed, 46 insertions(+), 3 deletions(-)

>>

>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c

>> index 04b4691a8aac..dde8a5cc948c 100644

>> --- a/drivers/opp/core.c

>> +++ b/drivers/opp/core.c

>> @@ -939,7 +939,7 @@ static int _set_required_opps(struct device *dev,

>>  	return ret;

>>  }

>>  

>> -static void _find_current_opp(struct device *dev, struct opp_table *opp_table)

>> +static struct dev_pm_opp *_find_current_opp(struct opp_table *opp_table)

>>  {

>>  	struct dev_pm_opp *opp = ERR_PTR(-ENODEV);

>>  	unsigned long freq;

>> @@ -949,6 +949,18 @@ static void _find_current_opp(struct device *dev, struct opp_table *opp_table)

>>  		opp = _find_freq_ceil(opp_table, &freq);

>>  	}

>>  

>> +	return opp;

>> +}

>> +

>> +static void _find_and_set_current_opp(struct opp_table *opp_table)

>> +{

>> +	struct dev_pm_opp *opp;

>> +

>> +	if (opp_table->current_opp)

>> +		return;

> 

> Why move this from caller as well ?


To make code cleaner.

>> +

>> +	opp = _find_current_opp(opp_table);

>> +

>>  	/*

>>  	 * Unable to find the current OPP ? Pick the first from the list since

>>  	 * it is in ascending order, otherwise rest of the code will need to

> 

> If we aren't able to find current OPP based on current freq, then this

> picks the first value from the list. Why shouldn't this be done in

> your case as well ?


You will get OPP which corresponds to the lowest freq, while h/w runs on
unsupported high freq. This may end with a tragedy.

>> @@ -1002,8 +1014,7 @@ static int _set_opp(struct device *dev, struct opp_table *opp_table,

>>  		return _disable_opp_table(dev, opp_table);

>>  

>>  	/* Find the currently set OPP if we don't know already */

>> -	if (unlikely(!opp_table->current_opp))

>> -		_find_current_opp(dev, opp_table);

>> +	_find_and_set_current_opp(opp_table);

>>  

>>  	old_opp = opp_table->current_opp;

>>  

>> @@ -2931,3 +2942,29 @@ int dev_pm_opp_sync_regulators(struct device *dev)

>>  	return ret;

>>  }

>>  EXPORT_SYMBOL_GPL(dev_pm_opp_sync_regulators);

>> +

>> +/**

>> + * dev_pm_opp_get_current() - Get current OPP

>> + * @dev:	device for which we do this operation

>> + *

>> + * Get current OPP of a device based on current clock rate or by other means.

>> + *

>> + * Return: pointer to 'struct dev_pm_opp' on success and errorno otherwise.

>> + */

>> +struct dev_pm_opp *dev_pm_opp_get_current(struct device *dev)

>> +{

>> +	struct opp_table *opp_table;

>> +	struct dev_pm_opp *opp;

>> +

>> +	opp_table = _find_opp_table(dev);

>> +	if (IS_ERR(opp_table))

>> +		return ERR_CAST(opp_table);

>> +

>> +	opp = _find_current_opp(opp_table);

> 

> This should not just go find the OPP based on current freq. This first

> needs to check if curret_opp is set or not. If set, then just return

> it, else run the _find_current_opp() function and update the

> current_opp pointer as well.

> 


Alright, I'll change it.
Dmitry Osipenko Sept. 1, 2021, 5:44 a.m. UTC | #4
01.09.2021 07:41, Viresh Kumar пишет:
> On 31-08-21, 16:54, Dmitry Osipenko wrote:

>> Elements of the 'names' array are not changed by the code, constify them

>> for consistency.

>>

>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

>> ---

>>  drivers/opp/core.c     | 6 +++---

>>  include/linux/pm_opp.h | 8 ++++----

>>  2 files changed, 7 insertions(+), 7 deletions(-)

>>

>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c

>> index 602e502d092e..d4e706a8b70d 100644

>> --- a/drivers/opp/core.c

>> +++ b/drivers/opp/core.c

>> @@ -2359,12 +2359,12 @@ static void _opp_detach_genpd(struct opp_table *opp_table)

>>   * "required-opps" are added in DT.

>>   */

>>  struct opp_table *dev_pm_opp_attach_genpd(struct device *dev,

>> -		const char **names, struct device ***virt_devs)

>> +		const char * const *names, struct device ***virt_devs)

> 

> I am sure there are issues around space around * here. Please run

> checkpatch with --strict option for your series.

> 


It is the other way around. This fixes the checkpatch warning and that's
what checkpatch wants. You may also grep the kernel to find that this is
the only variant used in practice.
Viresh Kumar Sept. 1, 2021, 5:48 a.m. UTC | #5
On 01-09-21, 08:44, Dmitry Osipenko wrote:
> 01.09.2021 07:41, Viresh Kumar пишет:

> > On 31-08-21, 16:54, Dmitry Osipenko wrote:

> >> Elements of the 'names' array are not changed by the code, constify them

> >> for consistency.

> >>

> >> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

> >> ---

> >>  drivers/opp/core.c     | 6 +++---

> >>  include/linux/pm_opp.h | 8 ++++----

> >>  2 files changed, 7 insertions(+), 7 deletions(-)

> >>

> >> diff --git a/drivers/opp/core.c b/drivers/opp/core.c

> >> index 602e502d092e..d4e706a8b70d 100644

> >> --- a/drivers/opp/core.c

> >> +++ b/drivers/opp/core.c

> >> @@ -2359,12 +2359,12 @@ static void _opp_detach_genpd(struct opp_table *opp_table)

> >>   * "required-opps" are added in DT.

> >>   */

> >>  struct opp_table *dev_pm_opp_attach_genpd(struct device *dev,

> >> -		const char **names, struct device ***virt_devs)

> >> +		const char * const *names, struct device ***virt_devs)

> > 

> > I am sure there are issues around space around * here. Please run

> > checkpatch with --strict option for your series.

> > 

> 

> It is the other way around. This fixes the checkpatch warning and that's

> what checkpatch wants. You may also grep the kernel to find that this is

> the only variant used in practice.


Heh, you are right. I somehow thought that * never has a space right
after.

-- 
viresh
Viresh Kumar Sept. 1, 2021, 6:05 a.m. UTC | #6
On 01-09-21, 08:43, Dmitry Osipenko wrote:
> You will get OPP which corresponds to the lowest freq, while h/w runs on

> unsupported high freq. This may end with a tragedy.


Yeah, because you are setting a performance state with this, it can be
a problem.

-- 
viresh
Viresh Kumar Sept. 1, 2021, 6:10 a.m. UTC | #7
On 31-08-21, 16:54, Dmitry Osipenko wrote:
> +static int

> +tegra_pmc_pd_dev_get_performance_state(struct generic_pm_domain *genpd,

> +				       struct device *dev)

> +{

> +	struct opp_table *hw_opp_table, *clk_opp_table;

> +	struct dev_pm_opp *opp;

> +	u32 hw_version;

> +	int ret;

> +

> +	/*

> +	 * The EMC devices are a special case because we have a protection

> +	 * from non-EMC drivers getting clock handle before EMC driver is

> +	 * fully initialized.  The goal of the protection is to prevent

> +	 * devfreq driver from getting failures if it will try to change

> +	 * EMC clock rate until clock is fully initialized.  The EMC drivers

> +	 * will initialize the performance state by themselves.

> +	 *

> +	 * Display controller also is a special case because only controller

> +	 * driver could get the clock rate based on configuration of internal

> +	 * divider.

> +	 *

> +	 * Clock driver uses its own state syncing.

> +	 */

> +	if (of_device_compatible_match(dev->of_node, tegra_pd_no_perf_compats))

> +		return 0;

> +

> +	if (of_machine_is_compatible("nvidia,tegra20"))

> +		hw_version = BIT(tegra_sku_info.soc_process_id);

> +	else

> +		hw_version = BIT(tegra_sku_info.soc_speedo_id);

> +

> +	hw_opp_table = dev_pm_opp_set_supported_hw(dev, &hw_version, 1);

> +	if (IS_ERR(hw_opp_table)) {

> +		dev_err(dev, "failed to set OPP supported HW: %pe\n",

> +			hw_opp_table);

> +		return PTR_ERR(hw_opp_table);

> +	}

> +

> +	/*

> +	 * Older device-trees don't have OPPs, meanwhile drivers use

> +	 * dev_pm_opp_set_rate() and it requires OPP table to be set

> +	 * up using dev_pm_opp_set_clkname().

> +	 *

> +	 * The devm_tegra_core_dev_init_opp_table() is a common helper

> +	 * that sets up OPP table for Tegra drivers and it sets the clk

> +	 * for compatibility with older device-tress.  GR3D driver uses that

> +	 * helper, it also uses devm_pm_opp_attach_genpd() to manually attach

> +	 * power domains, which now holds the reference to OPP table that

> +	 * already has clk set up implicitly by OPP core for a newly created

> +	 * table using dev_pm_opp_of_add_table() invoked below.

> +	 *

> +	 * Hence we need to explicitly set/put the clk, otherwise

> +	 * further dev_pm_opp_set_clkname() will fail with -EBUSY.

> +	 */

> +	clk_opp_table = dev_pm_opp_set_clkname(dev, NULL);

> +	if (IS_ERR(clk_opp_table)) {

> +		dev_err(dev, "failed to set OPP clk: %pe\n", clk_opp_table);

> +		ret = PTR_ERR(clk_opp_table);

> +		goto put_hw;

> +	}

> +

> +	ret = dev_pm_opp_of_add_table(dev);

> +	if (ret) {

> +		dev_err(dev, "failed to add OPP table: %d\n", ret);

> +		goto put_clk;

> +	}

> +

> +	opp = dev_pm_opp_get_current(dev);

> +	if (IS_ERR(opp)) {

> +		dev_err(dev, "failed to get current OPP: %pe\n", opp);

> +		ret = PTR_ERR(opp);

> +	} else {

> +		ret = dev_pm_opp_get_required_pstate(opp, 0);

> +		dev_pm_opp_put(opp);

> +	}

> +

> +	dev_pm_opp_of_remove_table(dev);

> +put_clk:

> +	dev_pm_opp_put_clkname(clk_opp_table);

> +put_hw:

> +	dev_pm_opp_put_supported_hw(hw_opp_table);


So you create the OPP table and just after that you remove it ? Just
to get the current OPP and pstate ? This doesn't look okay.

Moreover, this routine must be implemented with the expectation that
the genpd core can call it anytime it wants, not just at the
beginning. And so if the table is already setup by someone else then,
then this all will just fail.

I did have a look at the comment you added above regarding
devm_tegra_core_dev_init_opp_table(), but I am not sure of the series
of events right now. For me, the OPP table should just be added once
and for ever. You shouldn't remove and add it again. That is bound to
break somewhere later.

Can you give the sequence in which the whole thing works out with
respect to the OPP core? who calls
devm_tegra_core_dev_init_opp_table() and when, and when exactly will
this routine get called, etc ?

-- 
viresh
Dmitry Osipenko Sept. 1, 2021, 6:57 a.m. UTC | #8
01.09.2021 09:10, Viresh Kumar пишет:
> On 31-08-21, 16:54, Dmitry Osipenko wrote:

>> +static int

>> +tegra_pmc_pd_dev_get_performance_state(struct generic_pm_domain *genpd,

>> +				       struct device *dev)

>> +{

>> +	struct opp_table *hw_opp_table, *clk_opp_table;

>> +	struct dev_pm_opp *opp;

>> +	u32 hw_version;

>> +	int ret;

>> +

>> +	/*

>> +	 * The EMC devices are a special case because we have a protection

>> +	 * from non-EMC drivers getting clock handle before EMC driver is

>> +	 * fully initialized.  The goal of the protection is to prevent

>> +	 * devfreq driver from getting failures if it will try to change

>> +	 * EMC clock rate until clock is fully initialized.  The EMC drivers

>> +	 * will initialize the performance state by themselves.

>> +	 *

>> +	 * Display controller also is a special case because only controller

>> +	 * driver could get the clock rate based on configuration of internal

>> +	 * divider.

>> +	 *

>> +	 * Clock driver uses its own state syncing.

>> +	 */

>> +	if (of_device_compatible_match(dev->of_node, tegra_pd_no_perf_compats))

>> +		return 0;

>> +

>> +	if (of_machine_is_compatible("nvidia,tegra20"))

>> +		hw_version = BIT(tegra_sku_info.soc_process_id);

>> +	else

>> +		hw_version = BIT(tegra_sku_info.soc_speedo_id);

>> +

>> +	hw_opp_table = dev_pm_opp_set_supported_hw(dev, &hw_version, 1);

>> +	if (IS_ERR(hw_opp_table)) {

>> +		dev_err(dev, "failed to set OPP supported HW: %pe\n",

>> +			hw_opp_table);

>> +		return PTR_ERR(hw_opp_table);

>> +	}

>> +

>> +	/*

>> +	 * Older device-trees don't have OPPs, meanwhile drivers use

>> +	 * dev_pm_opp_set_rate() and it requires OPP table to be set

>> +	 * up using dev_pm_opp_set_clkname().

>> +	 *

>> +	 * The devm_tegra_core_dev_init_opp_table() is a common helper

>> +	 * that sets up OPP table for Tegra drivers and it sets the clk

>> +	 * for compatibility with older device-tress.  GR3D driver uses that

>> +	 * helper, it also uses devm_pm_opp_attach_genpd() to manually attach

>> +	 * power domains, which now holds the reference to OPP table that

>> +	 * already has clk set up implicitly by OPP core for a newly created

>> +	 * table using dev_pm_opp_of_add_table() invoked below.

>> +	 *

>> +	 * Hence we need to explicitly set/put the clk, otherwise

>> +	 * further dev_pm_opp_set_clkname() will fail with -EBUSY.

>> +	 */

>> +	clk_opp_table = dev_pm_opp_set_clkname(dev, NULL);

>> +	if (IS_ERR(clk_opp_table)) {

>> +		dev_err(dev, "failed to set OPP clk: %pe\n", clk_opp_table);

>> +		ret = PTR_ERR(clk_opp_table);

>> +		goto put_hw;

>> +	}

>> +

>> +	ret = dev_pm_opp_of_add_table(dev);

>> +	if (ret) {

>> +		dev_err(dev, "failed to add OPP table: %d\n", ret);

>> +		goto put_clk;

>> +	}

>> +

>> +	opp = dev_pm_opp_get_current(dev);

>> +	if (IS_ERR(opp)) {

>> +		dev_err(dev, "failed to get current OPP: %pe\n", opp);

>> +		ret = PTR_ERR(opp);

>> +	} else {

>> +		ret = dev_pm_opp_get_required_pstate(opp, 0);

>> +		dev_pm_opp_put(opp);

>> +	}

>> +

>> +	dev_pm_opp_of_remove_table(dev);

>> +put_clk:

>> +	dev_pm_opp_put_clkname(clk_opp_table);

>> +put_hw:

>> +	dev_pm_opp_put_supported_hw(hw_opp_table);

> 

> So you create the OPP table and just after that you remove it ? Just

> to get the current OPP and pstate ? This doesn't look okay.

> 

> Moreover, this routine must be implemented with the expectation that

> the genpd core can call it anytime it wants, not just at the

> beginning. And so if the table is already setup by someone else then,

> then this all will just fail.


This is not doable using the current OPP API, it doesn't allow to re-use initialized OPP table. The current limitation is okay because genpd core doesn't call routine anytime.

> I did have a look at the comment you added above regarding

> devm_tegra_core_dev_init_opp_table(), but I am not sure of the series

> of events right now. For me, the OPP table should just be added once

> and for ever. You shouldn't remove and add it again. That is bound to

> break somewhere later.


I see that the comment about devm_tegra_core_dev_init_opp_table() is outdated now, will update it. There was a refcounting bug in v9 where I accidentally used devm_pm_opp_of_add_table instead of dev_, still it fails similarly, but in a different place now. 

> Can you give the sequence in which the whole thing works out with

> respect to the OPP core? who calls

> devm_tegra_core_dev_init_opp_table() and when, and when exactly will

> this routine get called, etc ?

> 


gr3d_probe(struct platform_device *pdev)
{
	gr3d_init_power(dev)
	{
		static const char * const opp_genpd_names[] = { "3d0", "3d1", NULL };

		devm_pm_opp_attach_genpd(dev, opp_genpd_names)
		{
			dev_pm_opp_attach_genpd(dev, names, virt_devs)
			{
				// takes and holds table reference
				opp_table = _add_opp_table(dev, false);

				while (*name) {
					// first attachment succeeds, 
					// second fails with "tegra-gr3d 54180000.gr3d: failed to set OPP clk: -EBUSY"
					dev_pm_domain_attach_by_name(dev, *name)
					{
						tegra_pmc_pd_dev_get_performance_state(dev)
						{
							dev_pm_opp_set_clkname(dev, NULL);
							dev_pm_opp_of_add_table(dev);
							opp = dev_pm_opp_get_current(dev);
							dev_pm_opp_of_remove_table(dev);
							dev_pm_opp_put_clkname(opp_table);
							...
						}
						// opp_table->clk = ERR_PTR(-EINVAL) after the first attachment
					}
				}
			}
		}
	}

	devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);

	return 0;
}


WARNING: CPU: 3 PID: 7 at drivers/opp/core.c:2151 dev_pm_opp_set_clkname+0x97/0xb8
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u8:0 Tainted: G        W         5.14.0-next-20210831-00177-g6850c69f8c92-dirty #9371
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
Workqueue: events_unbound deferred_probe_work_func
[<c010cc91>] (unwind_backtrace) from [<c0108d35>] (show_stack+0x11/0x14)
[<c0108d35>] (show_stack) from [<c0a6e25d>] (dump_stack_lvl+0x2b/0x34)
[<c0a6e25d>] (dump_stack_lvl) from [<c011fc7f>] (__warn+0xbb/0x100)
[<c011fc7f>] (__warn) from [<c0a6b783>] (warn_slowpath_fmt+0x4b/0x80)
[<c0a6b783>] (warn_slowpath_fmt) from [<c0742613>] (dev_pm_opp_set_clkname+0x97/0xb8)
[<c0742613>] (dev_pm_opp_set_clkname) from [<c0509815>] (tegra_pmc_pd_dev_get_performance_state+0x85/0x120)
[<c0509815>] (tegra_pmc_pd_dev_get_performance_state) from [<c05cd3db>] (__genpd_dev_pm_attach+0xe7/0x218)
[<c05cd3db>] (__genpd_dev_pm_attach) from [<c05cd5d3>] (genpd_dev_pm_attach_by_id+0x8b/0xec)
[<c05cd5d3>] (genpd_dev_pm_attach_by_id) from [<c074287f>] (dev_pm_opp_attach_genpd+0x97/0x11c)
[<c074287f>] (dev_pm_opp_attach_genpd) from [<c0742913>] (devm_pm_opp_attach_genpd+0xf/0x6c)
[<c0742913>] (devm_pm_opp_attach_genpd) from [<c05ac7a5>] (gr3d_probe+0x245/0x348)
[<c05ac7a5>] (gr3d_probe) from [<c05bc003>] (platform_probe+0x43/0x84)
[<c05bc003>] (platform_probe) from [<c05ba569>] (really_probe.part.0+0x69/0x200)
[<c05ba569>] (really_probe.part.0) from [<c05ba773>] (__driver_probe_device+0x73/0xd4)
[<c05ba773>] (__driver_probe_device) from [<c05ba809>] (driver_probe_device+0x35/0xd0)
[<c05ba809>] (driver_probe_device) from [<c05bac11>] (__device_attach_driver+0x75/0x98)
[<c05bac11>] (__device_attach_driver) from [<c05b9005>] (bus_for_each_drv+0x51/0x7c)
[<c05b9005>] (bus_for_each_drv) from [<c05ba9f7>] (__device_attach+0x8b/0x104)
[<c05ba9f7>] (__device_attach) from [<c05b9b1b>] (bus_probe_device+0x5b/0x60)
[<c05b9b1b>] (bus_probe_device) from [<c05b7707>] (device_add+0x293/0x65c)
[<c05b7707>] (device_add) from [<c07798b7>] (of_platform_device_create_pdata+0x63/0x88)
[<c07798b7>] (of_platform_device_create_pdata) from [<c07799e5>] (of_platform_bus_create+0xfd/0x26c)
[<c07799e5>] (of_platform_bus_create) from [<c0779c2d>] (of_platform_populate+0x45/0x84)
[<c0779c2d>] (of_platform_populate) from [<c0779cc5>] (devm_of_platform_populate+0x41/0x6c)
[<c0779cc5>] (devm_of_platform_populate) from [<c054aa65>] (host1x_probe+0x1e9/0x2c8)
[<c054aa65>] (host1x_probe) from [<c05bc003>] (platform_probe+0x43/0x84)
[<c05bc003>] (platform_probe) from [<c05ba569>] (really_probe.part.0+0x69/0x200)
[<c05ba569>] (really_probe.part.0) from [<c05ba773>] (__driver_probe_device+0x73/0xd4)
[<c05ba773>] (__driver_probe_device) from [<c05ba809>] (driver_probe_device+0x35/0xd0)
[<c05ba809>] (driver_probe_device) from [<c05bac11>] (__device_attach_driver+0x75/0x98)
[<c05bac11>] (__device_attach_driver) from [<c05b9005>] (bus_for_each_drv+0x51/0x7c)
[<c05b9005>] (bus_for_each_drv) from [<c05ba9f7>] (__device_attach+0x8b/0x104)
[<c05ba9f7>] (__device_attach) from [<c05b9b1b>] (bus_probe_device+0x5b/0x60)
[<c05b9b1b>] (bus_probe_device) from [<c05b9dfb>] (deferred_probe_work_func+0x57/0x78)
[<c05b9dfb>] (deferred_probe_work_func) from [<c013701f>] (process_one_work+0x147/0x3f8)
[<c013701f>] (process_one_work) from [<c0137805>] (worker_thread+0x21d/0x3f4)
[<c0137805>] (worker_thread) from [<c013c1bb>] (kthread+0x123/0x140)
[<c013c1bb>] (kthread) from [<c0100135>] (ret_from_fork+0x11/0x1c)
Exception stack(0xc1567fb0 to 0xc1567ff8)
---[ end trace f68728a0d3053b54 ]---
tegra-gr3d 54180000.gr3d: failed to set OPP clk: -EBUSY
genpd genpd:1:54180000.gr3d: failed to get performance state of 54180000.gr3d for power-domain 3d1: -16
tegra-gr3d 54180000.gr3d: Couldn't attach to pm_domain: -16
Viresh Kumar Sept. 1, 2021, 7:16 a.m. UTC | #9
On 01-09-21, 09:57, Dmitry Osipenko wrote:
> 01.09.2021 09:10, Viresh Kumar пишет:

> > So you create the OPP table and just after that you remove it ? Just

> > to get the current OPP and pstate ? This doesn't look okay.

> > 

> > Moreover, this routine must be implemented with the expectation that

> > the genpd core can call it anytime it wants, not just at the

> > beginning. And so if the table is already setup by someone else then,

> > then this all will just fail.

> 

> This is not doable using the current OPP API, it doesn't allow to

> re-use initialized OPP table.


That isn't correct, you can call dev_pm_opp_of_add_table() as many
times as you want. It will just increase the refcount and return the
next time.

Yes, you can call the APIs like set-clk-name or supported-hw, since
they are supposed to be set only once.

> The current limitation is okay because genpd core doesn't call

> routine anytime.


Yeah, but is broken by design. People can make changes to genpd core
later on to call it as many times and they aren't required to have a
look at all the users to see who abused the API and who didn't.

> > Can you give the sequence in which the whole thing works out with

> > respect to the OPP core? who calls

> > devm_tegra_core_dev_init_opp_table() and when, and when exactly will

> > this routine get called, etc ?

> > 

> 

> gr3d_probe(struct platform_device *pdev)


Thanks for this.

> {

> 	gr3d_init_power(dev)

> 	{

> 		static const char * const opp_genpd_names[] = { "3d0", "3d1", NULL };

> 

> 		devm_pm_opp_attach_genpd(dev, opp_genpd_names)

> 		{

> 			dev_pm_opp_attach_genpd(dev, names, virt_devs)

> 			{

> 				// takes and holds table reference

> 				opp_table = _add_opp_table(dev, false);

> 

> 				while (*name) {

> 					// first attachment succeeds, 

> 					// second fails with "tegra-gr3d 54180000.gr3d: failed to set OPP clk: -EBUSY"

> 					dev_pm_domain_attach_by_name(dev, *name)

> 					{

> 						tegra_pmc_pd_dev_get_performance_state(dev)

> 						{

> 							dev_pm_opp_set_clkname(dev, NULL);

> 							dev_pm_opp_of_add_table(dev);


What you end up doing here is pretty much like
devm_tegra_core_dev_init_opp_table_simple(), right ?

> 							opp = dev_pm_opp_get_current(dev);

> 							dev_pm_opp_of_remove_table(dev);

> 							dev_pm_opp_put_clkname(opp_table);


You shouldn't be required to do this at all.

> 							...

> 						}

> 						// opp_table->clk = ERR_PTR(-EINVAL) after the first attachment

> 					}

> 				}

> 			}

> 		}

> 	}

> 

> 	devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);


Can we make the call at the beginning ? before calling
gr3d_init_power() ? I mean you should only be required to initialize
the OPP table just once.

If not, then what about calling
devm_tegra_core_dev_init_opp_table_simple() from here and from
tegra_pmc_pd_dev_get_performance_state() as well ?

And update devm_tegra_core_dev_init_opp_table_simple() to call
dev_pm_opp_get_opp_table() first and return early if the OPP table is
already initialized ?

-- 
viresh
Dmitry Osipenko Sept. 1, 2021, 9:04 a.m. UTC | #10
01.09.2021 10:16, Viresh Kumar пишет:
> On 01-09-21, 09:57, Dmitry Osipenko wrote:

>> 01.09.2021 09:10, Viresh Kumar пишет:

>>> So you create the OPP table and just after that you remove it ? Just

>>> to get the current OPP and pstate ? This doesn't look okay.

>>>

>>> Moreover, this routine must be implemented with the expectation that

>>> the genpd core can call it anytime it wants, not just at the

>>> beginning. And so if the table is already setup by someone else then,

>>> then this all will just fail.

>>

>> This is not doable using the current OPP API, it doesn't allow to

>> re-use initialized OPP table.

> 

> That isn't correct, you can call dev_pm_opp_of_add_table() as many

> times as you want. It will just increase the refcount and return the

> next time.

> 

> Yes, you can call the APIs like set-clk-name or supported-hw, since

> they are supposed to be set only once.

> 

>> The current limitation is okay because genpd core doesn't call

>> routine anytime.

> 

> Yeah, but is broken by design. People can make changes to genpd core

> later on to call it as many times and they aren't required to have a

> look at all the users to see who abused the API and who didn't.

> 

>>> Can you give the sequence in which the whole thing works out with

>>> respect to the OPP core? who calls

>>> devm_tegra_core_dev_init_opp_table() and when, and when exactly will

>>> this routine get called, etc ?

>>>

>>

>> gr3d_probe(struct platform_device *pdev)

> 

> Thanks for this.

> 

>> {

>> 	gr3d_init_power(dev)

>> 	{

>> 		static const char * const opp_genpd_names[] = { "3d0", "3d1", NULL };

>>

>> 		devm_pm_opp_attach_genpd(dev, opp_genpd_names)

>> 		{

>> 			dev_pm_opp_attach_genpd(dev, names, virt_devs)

>> 			{

>> 				// takes and holds table reference

>> 				opp_table = _add_opp_table(dev, false);

>>

>> 				while (*name) {

>> 					// first attachment succeeds, 

>> 					// second fails with "tegra-gr3d 54180000.gr3d: failed to set OPP clk: -EBUSY"

>> 					dev_pm_domain_attach_by_name(dev, *name)

>> 					{

>> 						tegra_pmc_pd_dev_get_performance_state(dev)

>> 						{

>> 							dev_pm_opp_set_clkname(dev, NULL);

>> 							dev_pm_opp_of_add_table(dev);

> 

> What you end up doing here is pretty much like

> devm_tegra_core_dev_init_opp_table_simple(), right ?


Yes

>> 							opp = dev_pm_opp_get_current(dev);

>> 							dev_pm_opp_of_remove_table(dev);

>> 							dev_pm_opp_put_clkname(opp_table);

> 

> You shouldn't be required to do this at all.

> 

>> 							...

>> 						}

>> 						// opp_table->clk = ERR_PTR(-EINVAL) after the first attachment

>> 					}

>> 				}

>> 			}

>> 		}

>> 	}

>>

>> 	devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);

> 

> Can we make the call at the beginning ? before calling

> gr3d_init_power() ? I mean you should only be required to initialize

> the OPP table just once.


This breaks dev_pm_opp_set_supported_hw() which isn't allowed to be
called for a populated OPP table, set/put_hw also isn't refcounted.

> If not, then what about calling

> devm_tegra_core_dev_init_opp_table_simple() from here and from

> tegra_pmc_pd_dev_get_performance_state() as well ?

> 

> And update devm_tegra_core_dev_init_opp_table_simple() to call

> dev_pm_opp_get_opp_table() first and return early if the OPP table is

> already initialized ?

> 


This doesn't work for devm_pm_opp_register_set_opp_helper() that is used
by gr3d_probe() because set_opp_helper() should be invoked only before
table is populated and it's already populated for the case of a
single-domain h/w since domain is attached before driver is probed.

But it's a good idea to use dev_pm_opp_get_opp_table(). I got it to work
by adding dev_pm_opp_get_opp_table() to
tegra_pmc_pd_dev_get_performance_state() and moving
devm_tegra_core_dev_init_opp_table_simple() before gr3d_init_power().

Thank you for the suggestion, I'll change it in v11.