Message ID | 20201104234427.26477-1-digetx@gmail.com |
---|---|
Headers | show |
Series | Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs | expand |
On Thu, Nov 05, 2020 at 02:43:57AM +0300, Dmitry Osipenko wrote: > Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs, which reduces > power consumption and heating of the Tegra chips. Tegra SoC has multiple > hardware units which belong to a core power domain of the SoC and share > the core voltage. The voltage must be selected in accordance to a minimum > requirement of every core hardware unit. [...] Just looked briefly through the series - it looks like there is a lot of code duplication in *_init_opp_table() functions. Could this be made more generic / data-driven? Best Regards Michał Mirosław
+ Viresh On Thu, 5 Nov 2020 at 00:44, Dmitry Osipenko <digetx@gmail.com> wrote: > > Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs, which reduces > power consumption and heating of the Tegra chips. Tegra SoC has multiple > hardware units which belong to a core power domain of the SoC and share > the core voltage. The voltage must be selected in accordance to a minimum > requirement of every core hardware unit. > > The minimum core voltage requirement depends on: > > 1. Clock enable state of a hardware unit. > 2. Clock frequency. > 3. Unit's internal idling/active state. > > This series is tested on Acer A500 (T20), AC100 (T20), Nexus 7 (T30) and > Ouya (T30) devices. I also added voltage scaling to the Ventana (T20) and > Cardhu (T30) boards which are tested by NVIDIA's CI farm. Tegra30 is now up > to 5C cooler on Nexus 7 and stays cool on Ouya (instead of becoming burning > hot) while system is idling. It should be possible to improve this further > by implementing a more advanced power management features for the kernel > drivers. > > The DVFS support is opt-in for all boards, meaning that older DTBs will > continue to work like they did it before this series. It should be possible > to easily add the core voltage scaling support for Tegra114+ SoCs based on > this grounding work later on, if anyone will want to implement it. > > WARNING(!) This series is made on top of the memory interconnect patches > which are currently under review [1]. The Tegra EMC driver > and devicetree-related patches need to be applied on top of > the ICC series. > > [1] https://patchwork.ozlabs.org/project/linux-tegra/list/?series=212196 > > Dmitry Osipenko (30): > dt-bindings: host1x: Document OPP and voltage regulator properties > dt-bindings: mmc: tegra: Document OPP and voltage regulator properties > dt-bindings: pwm: tegra: Document OPP and voltage regulator properties > media: dt: bindings: tegra-vde: Document OPP and voltage regulator > properties > dt-binding: usb: ci-hdrc-usb2: Document OPP and voltage regulator > properties > dt-bindings: usb: tegra-ehci: Document OPP and voltage regulator > properties > soc/tegra: Add sync state API > soc/tegra: regulators: Support Tegra SoC device sync state API > soc/tegra: regulators: Fix lockup when voltage-spread is out of range > regulator: Allow skipping disabled regulators in > regulator_check_consumers() > drm/tegra: dc: Support OPP and SoC core voltage scaling > drm/tegra: gr2d: Correct swapped device-tree compatibles > drm/tegra: gr2d: Support OPP and SoC core voltage scaling > drm/tegra: gr3d: Support OPP and SoC core voltage scaling > drm/tegra: hdmi: Support OPP and SoC core voltage scaling > gpu: host1x: Support OPP and SoC core voltage scaling > mmc: sdhci-tegra: Support OPP and core voltage scaling > pwm: tegra: Support OPP and core voltage scaling > media: staging: tegra-vde: Support OPP and SoC core voltage scaling > usb: chipidea: tegra: Support OPP and SoC core voltage scaling > usb: host: ehci-tegra: Support OPP and SoC core voltage scaling > memory: tegra20-emc: Support Tegra SoC device state syncing > memory: tegra30-emc: Support Tegra SoC device state syncing > ARM: tegra: Add OPP tables for Tegra20 peripheral devices > ARM: tegra: Add OPP tables for Tegra30 peripheral devices > ARM: tegra: ventana: Add voltage supplies to DVFS-capable devices > ARM: tegra: paz00: Add voltage supplies to DVFS-capable devices > ARM: tegra: acer-a500: Add voltage supplies to DVFS-capable devices > ARM: tegra: cardhu-a04: Add voltage supplies to DVFS-capable devices > ARM: tegra: nexus7: Add voltage supplies to DVFS-capable devices > > .../display/tegra/nvidia,tegra20-host1x.txt | 56 +++ > .../bindings/media/nvidia,tegra-vde.txt | 12 + > .../bindings/mmc/nvidia,tegra20-sdhci.txt | 12 + > .../bindings/pwm/nvidia,tegra20-pwm.txt | 13 + > .../devicetree/bindings/usb/ci-hdrc-usb2.txt | 4 + > .../bindings/usb/nvidia,tegra20-ehci.txt | 2 + > .../boot/dts/tegra20-acer-a500-picasso.dts | 30 +- > arch/arm/boot/dts/tegra20-paz00.dts | 40 +- > .../arm/boot/dts/tegra20-peripherals-opp.dtsi | 386 ++++++++++++++++ > arch/arm/boot/dts/tegra20-ventana.dts | 65 ++- > arch/arm/boot/dts/tegra20.dtsi | 14 + > .../tegra30-asus-nexus7-grouper-common.dtsi | 23 + > arch/arm/boot/dts/tegra30-cardhu-a04.dts | 44 ++ > .../arm/boot/dts/tegra30-peripherals-opp.dtsi | 415 ++++++++++++++++++ > arch/arm/boot/dts/tegra30.dtsi | 13 + > drivers/gpu/drm/tegra/Kconfig | 1 + > drivers/gpu/drm/tegra/dc.c | 138 +++++- > drivers/gpu/drm/tegra/dc.h | 5 + > drivers/gpu/drm/tegra/gr2d.c | 140 +++++- > drivers/gpu/drm/tegra/gr3d.c | 136 ++++++ > drivers/gpu/drm/tegra/hdmi.c | 63 ++- > drivers/gpu/host1x/Kconfig | 1 + > drivers/gpu/host1x/dev.c | 87 ++++ > drivers/memory/tegra/tegra20-emc.c | 8 +- > drivers/memory/tegra/tegra30-emc.c | 8 +- > drivers/mmc/host/Kconfig | 1 + > drivers/mmc/host/sdhci-tegra.c | 70 ++- > drivers/pwm/Kconfig | 1 + > drivers/pwm/pwm-tegra.c | 84 +++- > drivers/regulator/core.c | 12 +- > .../soc/samsung/exynos-regulator-coupler.c | 2 +- > drivers/soc/tegra/common.c | 152 ++++++- > drivers/soc/tegra/regulators-tegra20.c | 25 +- > drivers/soc/tegra/regulators-tegra30.c | 30 +- > drivers/staging/media/tegra-vde/Kconfig | 1 + > drivers/staging/media/tegra-vde/vde.c | 127 ++++++ > drivers/staging/media/tegra-vde/vde.h | 1 + > drivers/usb/chipidea/Kconfig | 1 + > drivers/usb/chipidea/ci_hdrc_tegra.c | 79 ++++ > drivers/usb/host/Kconfig | 1 + > drivers/usb/host/ehci-tegra.c | 79 ++++ > include/linux/regulator/coupler.h | 6 +- > include/soc/tegra/common.h | 22 + > 43 files changed, 2360 insertions(+), 50 deletions(-) > > -- > 2.27.0 > I need some more time to review this, but just a quick check found a few potential issues... The "core-supply", that you specify as a regulator for each controller's device node, is not the way we describe power domains. Instead, it seems like you should register a power-domain provider (with the help of genpd) and implement the ->set_performance_state() callback for it. Each device node should then be hooked up to this power-domain, rather than to a "core-supply". For DT bindings, please have a look at Documentation/devicetree/bindings/power/power-domain.yaml and Documentation/devicetree/bindings/power/power_domain.txt. In regards to the "sync state" problem (preventing to change performance states until all consumers have been attached), this can then be managed by the genpd provider driver instead. Kind regards Uffe
On Thu, Nov 5, 2020 at 5:15 AM Dmitry Osipenko <digetx@gmail.com> wrote: > diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c > +static void sdhci_tegra_deinit_opp_table(void *data) > +{ > + struct device *dev = data; > + struct opp_table *opp_table; > + > + opp_table = dev_pm_opp_get_opp_table(dev); So you need to get an OPP table to put one :) You need to save the pointer returned by dev_pm_opp_set_regulators() instead. > + dev_pm_opp_of_remove_table(dev); > + dev_pm_opp_put_regulators(opp_table); > + dev_pm_opp_put_opp_table(opp_table); > +} > + > +static int devm_sdhci_tegra_init_opp_table(struct device *dev) > +{ > + struct opp_table *opp_table; > + const char *rname = "core"; > + int err; > + > + /* voltage scaling is optional */ > + if (device_property_present(dev, "core-supply")) > + opp_table = dev_pm_opp_set_regulators(dev, &rname, 1); > + else > + opp_table = dev_pm_opp_get_opp_table(dev); Nice. I didn't think that someone will end up abusing this API and so made it available for all, but someone just did that. I will fix that in the OPP core. Any idea why you are doing what you are doing here ? > + > + if (IS_ERR(opp_table)) > + return dev_err_probe(dev, PTR_ERR(opp_table), > + "failed to prepare OPP table\n"); > + > + /* > + * OPP table presence is optional and we want the set_rate() of OPP > + * API to work similarly to clk_set_rate() if table is missing in a > + * device-tree. The add_table() errors out if OPP is missing in DT. > + */ > + if (device_property_present(dev, "operating-points-v2")) { > + err = dev_pm_opp_of_add_table(dev); > + if (err) { > + dev_err(dev, "failed to add OPP table: %d\n", err); > + goto put_table; > + } > + } > + > + err = devm_add_action(dev, sdhci_tegra_deinit_opp_table, dev); > + if (err) > + goto remove_table; > + > + return 0; > + > +remove_table: > + dev_pm_opp_of_remove_table(dev); > +put_table: > + dev_pm_opp_put_regulators(opp_table); > + > + return err; > +} > + > static int sdhci_tegra_probe(struct platform_device *pdev) > { > const struct of_device_id *match; > @@ -1621,6 +1681,10 @@ static int sdhci_tegra_probe(struct platform_device *pdev) > goto err_power_req; > } > > + rc = devm_sdhci_tegra_init_opp_table(&pdev->dev); > + if (rc) > + goto err_parse_dt; > + > /* > * Tegra210 has a separate SDMMC_LEGACY_TM clock used for host > * timeout clock and SW can choose TMCLK or SDCLK for hardware > -- > 2.27.0 > > _______________________________________________ > devel mailing list > devel@linuxdriverproject.org > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
On 05-11-20, 10:45, Ulf Hansson wrote: > + Viresh Thanks Ulf. I found a bug in OPP core because you cc'd me here :) > On Thu, 5 Nov 2020 at 00:44, Dmitry Osipenko <digetx@gmail.com> wrote: > I need some more time to review this, but just a quick check found a > few potential issues... > > The "core-supply", that you specify as a regulator for each > controller's device node, is not the way we describe power domains. Maybe I misunderstood your comment here, but there are two ways of scaling the voltage of a device depending on if it is a regulator (and can be modeled as one in the kernel) or a power domain. In case of Qcom earlier (when we added the performance-state stuff), the eventual hardware was out of kernel's control and we didn't wanted (allowed) to model it as a virtual regulator just to pass the votes to the RPM. And so we did what we did. But if the hardware (where the voltage is required to be changed) is indeed a regulator and is modeled as one, then what Dmitry has done looks okay. i.e. add a supply in the device's node and microvolt property in the DT entries.
On Thu, 5 Nov 2020 at 11:06, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 05-11-20, 10:45, Ulf Hansson wrote: > > + Viresh > > Thanks Ulf. I found a bug in OPP core because you cc'd me here :) Happy to help. :-) > > > On Thu, 5 Nov 2020 at 00:44, Dmitry Osipenko <digetx@gmail.com> wrote: > > I need some more time to review this, but just a quick check found a > > few potential issues... > > > > The "core-supply", that you specify as a regulator for each > > controller's device node, is not the way we describe power domains. > > Maybe I misunderstood your comment here, but there are two ways of > scaling the voltage of a device depending on if it is a regulator (and > can be modeled as one in the kernel) or a power domain. I am not objecting about scaling the voltage through a regulator, that's fine to me. However, encoding a power domain as a regulator (even if it may seem like a regulator) isn't. Well, unless Mark Brown has changed his mind about this. In this case, it seems like the regulator supply belongs in the description of the power domain provider. > > In case of Qcom earlier (when we added the performance-state stuff), > the eventual hardware was out of kernel's control and we didn't wanted > (allowed) to model it as a virtual regulator just to pass the votes to > the RPM. And so we did what we did. > > But if the hardware (where the voltage is required to be changed) is > indeed a regulator and is modeled as one, then what Dmitry has done > looks okay. i.e. add a supply in the device's node and microvolt > property in the DT entries. I guess I haven't paid enough attention how power domain regulators are being described then. I was under the impression that the CPUfreq case was a bit specific - and we had legacy bindings to stick with. Can you point me to some other existing examples of where power domain regulators are specified as a regulator in each device's node? Kind regards Uffe
On 05-11-20, 11:34, Ulf Hansson wrote: > I am not objecting about scaling the voltage through a regulator, > that's fine to me. However, encoding a power domain as a regulator > (even if it may seem like a regulator) isn't. Well, unless Mark Brown > has changed his mind about this. > > In this case, it seems like the regulator supply belongs in the > description of the power domain provider. Okay, I wasn't sure if it is a power domain or a regulator here. Btw, how do we identify if it is a power domain or a regulator ? > > In case of Qcom earlier (when we added the performance-state stuff), > > the eventual hardware was out of kernel's control and we didn't wanted > > (allowed) to model it as a virtual regulator just to pass the votes to > > the RPM. And so we did what we did. > > > > But if the hardware (where the voltage is required to be changed) is > > indeed a regulator and is modeled as one, then what Dmitry has done > > looks okay. i.e. add a supply in the device's node and microvolt > > property in the DT entries. > > I guess I haven't paid enough attention how power domain regulators > are being described then. I was under the impression that the CPUfreq > case was a bit specific - and we had legacy bindings to stick with. > > Can you point me to some other existing examples of where power domain > regulators are specified as a regulator in each device's node? No, I thought it is a regulator here and not a power domain.
On Thu, 5 Nov 2020 at 11:40, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 05-11-20, 11:34, Ulf Hansson wrote: > > I am not objecting about scaling the voltage through a regulator, > > that's fine to me. However, encoding a power domain as a regulator > > (even if it may seem like a regulator) isn't. Well, unless Mark Brown > > has changed his mind about this. > > > > In this case, it seems like the regulator supply belongs in the > > description of the power domain provider. > > Okay, I wasn't sure if it is a power domain or a regulator here. Btw, > how do we identify if it is a power domain or a regulator ? Good question. It's not a crystal clear line in between them, I think. A power domain to me, means that some part of a silicon (a group of controllers or just a single piece, for example) needs some kind of resource (typically a power rail) to be enabled to be functional, to start with. If there are operating points involved, that's also a clear indication to me, that it's not a regular regulator. Maybe we should try to specify this more exactly in some documentation, somewhere. > > > > In case of Qcom earlier (when we added the performance-state stuff), > > > the eventual hardware was out of kernel's control and we didn't wanted > > > (allowed) to model it as a virtual regulator just to pass the votes to > > > the RPM. And so we did what we did. > > > > > > But if the hardware (where the voltage is required to be changed) is > > > indeed a regulator and is modeled as one, then what Dmitry has done > > > looks okay. i.e. add a supply in the device's node and microvolt > > > property in the DT entries. > > > > I guess I haven't paid enough attention how power domain regulators > > are being described then. I was under the impression that the CPUfreq > > case was a bit specific - and we had legacy bindings to stick with. > > > > Can you point me to some other existing examples of where power domain > > regulators are specified as a regulator in each device's node? > > No, I thought it is a regulator here and not a power domain. Okay, thanks! Kind regards Uffe
On 05-11-20, 11:56, Ulf Hansson wrote: > On Thu, 5 Nov 2020 at 11:40, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > Btw, how do we identify if it is a power domain or a regulator ? To be honest, I was a bit afraid and embarrassed to ask this question, and was hoping people to make fun of me in return :) > Good question. It's not a crystal clear line in between them, I think. And I was relieved after reading this :) > A power domain to me, means that some part of a silicon (a group of > controllers or just a single piece, for example) needs some kind of > resource (typically a power rail) to be enabled to be functional, to > start with. Isn't this what a part of regulator does as well ? i.e. enabling/disabling of the regulator or power to a group of controllers. Over that the regulator does voltage/current scaling as well, which normally the power domains don't do (though we did that in performance-state case). > If there are operating points involved, that's also a > clear indication to me, that it's not a regular regulator. Is there any example of that? I hope by OPP you meant both freq and voltage here. I am not sure if I know of a case where a power domain handles both of them. > Maybe we should try to specify this more exactly in some > documentation, somewhere. I think yes, it is very much required. And in absence of that I think, many (or most) of the platforms that also need to scale the voltage would have modeled their hardware as a regulator and not a PM domain. What I always thought was: - Module that can just enable/disable power to a block of SoC is a power domain. - Module that can enable/disable as well as scale voltage is a regulator. And so I thought that this patchset has done the right thing. This changed a bit with the qcom stuff where the IP to be configured was in control of RPM and not Linux and so we couldn't add it as a regulator. If it was controlled by Linux, it would have been a regulator in kernel for sure :)
On Thu, 5 Nov 2020 at 12:13, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 05-11-20, 11:56, Ulf Hansson wrote: > > On Thu, 5 Nov 2020 at 11:40, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > Btw, how do we identify if it is a power domain or a regulator ? > > To be honest, I was a bit afraid and embarrassed to ask this question, > and was hoping people to make fun of me in return :) > > > Good question. It's not a crystal clear line in between them, I think. > > And I was relieved after reading this :) > > > A power domain to me, means that some part of a silicon (a group of > > controllers or just a single piece, for example) needs some kind of > > resource (typically a power rail) to be enabled to be functional, to > > start with. > > Isn't this what a part of regulator does as well ? i.e. > enabling/disabling of the regulator or power to a group of > controllers. It could, but it shouldn't. > > Over that the regulator does voltage/current scaling as well, which > normally the power domains don't do (though we did that in > performance-state case). > > > If there are operating points involved, that's also a > > clear indication to me, that it's not a regular regulator. > > Is there any example of that? I hope by OPP you meant both freq and > voltage here. I am not sure if I know of a case where a power domain > handles both of them. It may be both voltage and frequency - but in some cases only voltage.
05.11.2020 04:45, Michał Mirosław пишет: > On Thu, Nov 05, 2020 at 02:43:57AM +0300, Dmitry Osipenko wrote: >> Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs, which reduces >> power consumption and heating of the Tegra chips. Tegra SoC has multiple >> hardware units which belong to a core power domain of the SoC and share >> the core voltage. The voltage must be selected in accordance to a minimum >> requirement of every core hardware unit. > [...] > > Just looked briefly through the series - it looks like there is a lot of > code duplication in *_init_opp_table() functions. Could this be made > more generic / data-driven? Indeed, it should be possible to add a common helper. I had a quick thought about doing it too, but then decided to defer for the starter since there were some differences among the needs of the drivers. I'll take a closer look for the v2, thanks!
05.11.2020 12:58, Viresh Kumar пишет: >> +static void sdhci_tegra_deinit_opp_table(void *data) >> +{ >> + struct device *dev = data; >> + struct opp_table *opp_table; >> + >> + opp_table = dev_pm_opp_get_opp_table(dev); > So you need to get an OPP table to put one :) > You need to save the pointer returned by dev_pm_opp_set_regulators() instead. This is intentional because why do we need to save the pointer if we're not using it and we know that we could get this pointer using OPP API? This is exactly the same what I did for the CPUFreq driver [1] :) [1] https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/cpufreq/tegra20-cpufreq.c#L97 >> + dev_pm_opp_of_remove_table(dev); >> + dev_pm_opp_put_regulators(opp_table); >> + dev_pm_opp_put_opp_table(opp_table); >> +} >> + >> +static int devm_sdhci_tegra_init_opp_table(struct device *dev) >> +{ >> + struct opp_table *opp_table; >> + const char *rname = "core"; >> + int err; >> + >> + /* voltage scaling is optional */ >> + if (device_property_present(dev, "core-supply")) >> + opp_table = dev_pm_opp_set_regulators(dev, &rname, 1); >> + else > >> + opp_table = dev_pm_opp_get_opp_table(dev); > Nice. I didn't think that someone will end up abusing this API and so made it > available for all, but someone just did that. I will fix that in the OPP core. The dev_pm_opp_put_regulators() handles the case where regulator is missing by acting as dev_pm_opp_get_opp_table(), but the dev_pm_opp_set_regulators() doesn't do it. Hence I don't think this is an abuse, but the OPP API drawback. > Any idea why you are doing what you are doing here ? Two reasons: 1. Voltage regulator is optional, but dev_pm_opp_set_regulators() doesn't support optional regulators. 2. We need to balance the opp_table refcount in order to use OPP API without polluting code with if(have_regulator), hence the dev_pm_opp_get_opp_table() is needed for taking the opp_table reference to have the same refcount as in the case of the dev_pm_opp_set_regulators(). I guess we could make dev_pm_opp_set_regulators(dev, count) to accept regulators count=0 and then act as dev_pm_opp_get_opp_table(dev), will it be acceptable?
05.11.2020 12:45, Ulf Hansson пишет: ... > I need some more time to review this, but just a quick check found a > few potential issues... Thank you for starting the review! I'm pretty sure it will take a couple revisions until all the questions will be resolved :) > The "core-supply", that you specify as a regulator for each > controller's device node, is not the way we describe power domains. > Instead, it seems like you should register a power-domain provider > (with the help of genpd) and implement the ->set_performance_state() > callback for it. Each device node should then be hooked up to this > power-domain, rather than to a "core-supply". For DT bindings, please > have a look at Documentation/devicetree/bindings/power/power-domain.yaml > and Documentation/devicetree/bindings/power/power_domain.txt. > > In regards to the "sync state" problem (preventing to change > performance states until all consumers have been attached), this can > then be managed by the genpd provider driver instead. I'll need to take a closer look at GENPD, thank you for the suggestion. Sounds like a software GENPD driver which manages clocks and voltages could be a good idea, but it also could be an unnecessary over-engineering. Let's see..
On 05-11-20, 17:18, Dmitry Osipenko wrote: > 05.11.2020 12:58, Viresh Kumar пишет: > >> +static void sdhci_tegra_deinit_opp_table(void *data) > >> +{ > >> + struct device *dev = data; > >> + struct opp_table *opp_table; > >> + > >> + opp_table = dev_pm_opp_get_opp_table(dev); > > So you need to get an OPP table to put one :) > > You need to save the pointer returned by dev_pm_opp_set_regulators() instead. > > This is intentional because why do we need to save the pointer if we're > not using it and we know that we could get this pointer using OPP API? Because it is highly inefficient and it doesn't follow the rules set by the OPP core. Hypothetically speaking, the OPP core is free to allocate the OPP table structure as much as it wants, and if you don't use the value returned back to you earlier (think of it as a cookie assigned to your driver), then it will eventually lead to memory leak. > This is exactly the same what I did for the CPUFreq driver [1] :) I will strongly suggest you to save the pointer here and do the same in the cpufreq driver as well. > >> +static int devm_sdhci_tegra_init_opp_table(struct device *dev) > >> +{ > >> + struct opp_table *opp_table; > >> + const char *rname = "core"; > >> + int err; > >> + > >> + /* voltage scaling is optional */ > >> + if (device_property_present(dev, "core-supply")) > >> + opp_table = dev_pm_opp_set_regulators(dev, &rname, 1); > >> + else > > > >> + opp_table = dev_pm_opp_get_opp_table(dev); To make it further clear, this will end up allocating an OPP table for you, which it shouldn't have. > > Nice. I didn't think that someone will end up abusing this API and so made it > > available for all, but someone just did that. I will fix that in the OPP core. To be fair, I allowed the cpufreq-dt driver to abuse it too, which I am going to fix shortly. > The dev_pm_opp_put_regulators() handles the case where regulator is > missing by acting as dev_pm_opp_get_opp_table(), but the > dev_pm_opp_set_regulators() doesn't do it. Hence I don't think this is > an abuse, but the OPP API drawback. I am not sure what you meant here. Normally you are required to call dev_pm_opp_put_regulators() only if you have called dev_pm_opp_set_regulators() earlier. And the refcount stays in balance. > > Any idea why you are doing what you are doing here ? > > Two reasons: > > 1. Voltage regulator is optional, but dev_pm_opp_set_regulators() > doesn't support optional regulators. > > 2. We need to balance the opp_table refcount in order to use OPP API > without polluting code with if(have_regulator), hence the > dev_pm_opp_get_opp_table() is needed for taking the opp_table reference > to have the same refcount as in the case of the dev_pm_opp_set_regulators(). I am going to send a patchset shortly after which this call to dev_pm_opp_get_opp_table() will fail, if it is called before adding the OPP table. > I guess we could make dev_pm_opp_set_regulators(dev, count) to accept > regulators count=0 and then act as dev_pm_opp_get_opp_table(dev), will > it be acceptable? Setting regulators for count as 0 doesn't sound good to me. But, I understand that you don't want to have that if (have_regulator) check, and it is a fair request. What I will instead do is, allow all dev_pm_opp_put*() API to start accepting a NULL pointer for the OPP table and fail silently. And so you won't be required to have this unwanted check. But you will be required to save the pointer returned back by dev_pm_opp_set_regulators(), which is the right thing to do anyways.
05.11.2020 18:22, Dmitry Osipenko пишет: > 05.11.2020 12:45, Ulf Hansson пишет: > ... >> I need some more time to review this, but just a quick check found a >> few potential issues... > > Thank you for starting the review! I'm pretty sure it will take a couple > revisions until all the questions will be resolved :) > >> The "core-supply", that you specify as a regulator for each >> controller's device node, is not the way we describe power domains. >> Instead, it seems like you should register a power-domain provider >> (with the help of genpd) and implement the ->set_performance_state() >> callback for it. Each device node should then be hooked up to this >> power-domain, rather than to a "core-supply". For DT bindings, please >> have a look at Documentation/devicetree/bindings/power/power-domain.yaml >> and Documentation/devicetree/bindings/power/power_domain.txt. >> >> In regards to the "sync state" problem (preventing to change >> performance states until all consumers have been attached), this can >> then be managed by the genpd provider driver instead. > > I'll need to take a closer look at GENPD, thank you for the suggestion. > > Sounds like a software GENPD driver which manages clocks and voltages > could be a good idea, but it also could be an unnecessary > over-engineering. Let's see.. > Hello Ulf and all, I took a detailed look at the GENPD and tried to implement it. Here is what was found: 1. GENPD framework doesn't aggregate performance requests from the attached devices. This means that if deviceA requests performance state 10 and then deviceB requests state 3, then framework will set domain's state to 3 instead of 10. https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L376 2. GENPD framework has a sync() callback in the genpd.domain structure, but this callback isn't allowed to be used by the GENPD implementation. The GENPD framework always overrides that callback for its own needs. Hence GENPD doesn't allow to solve the bootstrapping state-synchronization problem in a nice way. https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L2606 3. Tegra doesn't have a dedicated hardware power-controller for the core domain, instead there is only an external voltage regulator. Hence we will need to create a phony device-tree node for the virtual power domain, which is probably a wrong thing to do. === Perhaps it should be possible to create some hacks to work around bullets 2 and 3 in order to achieve what we need for DVFS on Tegra, but bullet 1 isn't solvable without changing how the GENPD core works. Altogether, the GENPD in its current form is a wrong abstraction for a system-wide DVFS in a case where multiple devices share power domain and this domain is a voltage regulator. The regulator framework is the correct abstraction in this case for today.
09.11.2020 07:43, Viresh Kumar пишет: > On 08-11-20, 15:19, Dmitry Osipenko wrote: >> I took a detailed look at the GENPD and tried to implement it. Here is >> what was found: >> >> 1. GENPD framework doesn't aggregate performance requests from the >> attached devices. This means that if deviceA requests performance state >> 10 and then deviceB requests state 3, then framework will set domain's >> state to 3 instead of 10. > > It does. Look at _genpd_reeval_performance_state(). > Thanks, I probably had a bug in the quick prototype and then overlooked that function.
On 06-11-20, 21:41, Frank Lee wrote: > On Fri, Nov 6, 2020 at 9:18 PM Dmitry Osipenko <digetx@gmail.com> wrote: > > > > 06.11.2020 09:15, Viresh Kumar пишет: > > > Setting regulators for count as 0 doesn't sound good to me. > > > > > > But, I understand that you don't want to have that if (have_regulator) > > > check, and it is a fair request. What I will instead do is, allow all > > > dev_pm_opp_put*() API to start accepting a NULL pointer for the OPP > > > table and fail silently. And so you won't be required to have this > > > unwanted check. But you will be required to save the pointer returned > > > back by dev_pm_opp_set_regulators(), which is the right thing to do > > > anyways. > > > > Perhaps even a better variant could be to add a devm versions of the OPP > > API functions, then drivers won't need to care about storing the > > opp_table pointer if it's unused by drivers. > > I think so. The consumer may not be so concerned about the status of > these OPP tables. > If the driver needs to manage the release, it needs to add a pointer > to their driver global structure. > > Maybe it's worth having these devm interfaces for opp. Sure if there are enough users of this, I am all for it. I was fine with the patches you sent, just that there were not a lot of users of it and so I pushed them back. If we find that we have more users of it now, we can surely get that back. -- viresh
09.11.2020 08:00, Viresh Kumar пишет: > On 06-11-20, 21:41, Frank Lee wrote: >> On Fri, Nov 6, 2020 at 9:18 PM Dmitry Osipenko <digetx@gmail.com> wrote: >>> >>> 06.11.2020 09:15, Viresh Kumar пишет: >>>> Setting regulators for count as 0 doesn't sound good to me. >>>> >>>> But, I understand that you don't want to have that if (have_regulator) >>>> check, and it is a fair request. What I will instead do is, allow all >>>> dev_pm_opp_put*() API to start accepting a NULL pointer for the OPP >>>> table and fail silently. And so you won't be required to have this >>>> unwanted check. But you will be required to save the pointer returned >>>> back by dev_pm_opp_set_regulators(), which is the right thing to do >>>> anyways. >>> >>> Perhaps even a better variant could be to add a devm versions of the OPP >>> API functions, then drivers won't need to care about storing the >>> opp_table pointer if it's unused by drivers. >> >> I think so. The consumer may not be so concerned about the status of >> these OPP tables. >> If the driver needs to manage the release, it needs to add a pointer >> to their driver global structure. >> >> Maybe it's worth having these devm interfaces for opp. > > Sure if there are enough users of this, I am all for it. I was fine > with the patches you sent, just that there were not a lot of users of > it and so I pushed them back. If we find that we have more users of it > now, we can surely get that back. > There was already attempt to add the devm? Could you please give me a link to the patches? I already prepared a patch which adds the devm helpers. It helps to keep code cleaner and readable.
09.11.2020 07:47, Dmitry Osipenko пишет: > 09.11.2020 07:43, Viresh Kumar пишет: >> On 08-11-20, 15:19, Dmitry Osipenko wrote: >>> I took a detailed look at the GENPD and tried to implement it. Here is >>> what was found: >>> >>> 1. GENPD framework doesn't aggregate performance requests from the >>> attached devices. This means that if deviceA requests performance state >>> 10 and then deviceB requests state 3, then framework will set domain's >>> state to 3 instead of 10. >> >> It does. Look at _genpd_reeval_performance_state(). >> > > Thanks, I probably had a bug in the quick prototype and then overlooked > that function. > If a non-hardware device-tree node is okay to have for the domain, then I can try again. What I also haven't mentioned is that GENPD adds some extra complexity to some drivers (3d, video decoder) because we will need to handle both new GENPD and legacy Tegra specific pre-genpd era domains. I'm also not exactly sure how the topology of domains should look like because Tegra has a power-controller (PMC) which manages power rail of a few hardware units. Perhaps it should be device -> PMC domain -> CORE domain but not exactly sure for now.
09.11.2020 08:10, Viresh Kumar пишет: > On 09-11-20, 08:08, Dmitry Osipenko wrote: >> 09.11.2020 08:00, Viresh Kumar пишет: >>> On 06-11-20, 21:41, Frank Lee wrote: >>>> On Fri, Nov 6, 2020 at 9:18 PM Dmitry Osipenko <digetx@gmail.com> wrote: >>>>> >>>>> 06.11.2020 09:15, Viresh Kumar пишет: >>>>>> Setting regulators for count as 0 doesn't sound good to me. >>>>>> >>>>>> But, I understand that you don't want to have that if (have_regulator) >>>>>> check, and it is a fair request. What I will instead do is, allow all >>>>>> dev_pm_opp_put*() API to start accepting a NULL pointer for the OPP >>>>>> table and fail silently. And so you won't be required to have this >>>>>> unwanted check. But you will be required to save the pointer returned >>>>>> back by dev_pm_opp_set_regulators(), which is the right thing to do >>>>>> anyways. >>>>> >>>>> Perhaps even a better variant could be to add a devm versions of the OPP >>>>> API functions, then drivers won't need to care about storing the >>>>> opp_table pointer if it's unused by drivers. >>>> >>>> I think so. The consumer may not be so concerned about the status of >>>> these OPP tables. >>>> If the driver needs to manage the release, it needs to add a pointer >>>> to their driver global structure. >>>> >>>> Maybe it's worth having these devm interfaces for opp. >>> >>> Sure if there are enough users of this, I am all for it. I was fine >>> with the patches you sent, just that there were not a lot of users of >>> it and so I pushed them back. If we find that we have more users of it >>> now, we can surely get that back. >>> >> >> There was already attempt to add the devm? Could you please give me a >> link to the patches? >> >> I already prepared a patch which adds the devm helpers. It helps to keep >> code cleaner and readable. > > https://lore.kernel.org/lkml/20201012135517.19468-1-frank@allwinnertech.com/ > Thanks, I made it in a different way by simply adding helpers to the pm_opp.h which use devm_add_action_or_reset(). This doesn't require to add new kernel symbols. static inline int devm_pm_opp_of_add_table(struct device *dev) { int err; err = dev_pm_opp_of_add_table(dev); if (err) return err; err = devm_add_action_or_reset(dev, (void*)dev_pm_opp_remove_table, dev); if (err) return err; return 0; }
09.11.2020 08:35, Viresh Kumar пишет: > On 09-11-20, 08:19, Dmitry Osipenko wrote: >> Thanks, I made it in a different way by simply adding helpers to the >> pm_opp.h which use devm_add_action_or_reset(). This doesn't require to >> add new kernel symbols. > > I will prefer to add it in core.c itself, and yes > devm_add_action_or_reset() looks better. But I am still not sure for > which helpers do we need the devm_*() variants, as this is only useful > for non-CPU devices. But if we have users that we can add right now, > why not. All current non-CPU drivers (devfreq, mmc, memory, etc) can benefit from it. For Tegra drivers we need these variants: devm_pm_opp_set_supported_hw() devm_pm_opp_set_regulators() [if we won't use GENPD] devm_pm_opp_set_clkname() devm_pm_opp_of_add_table()
On 09-11-20, 08:44, Dmitry Osipenko wrote: > 09.11.2020 08:35, Viresh Kumar пишет: > > On 09-11-20, 08:19, Dmitry Osipenko wrote: > >> Thanks, I made it in a different way by simply adding helpers to the > >> pm_opp.h which use devm_add_action_or_reset(). This doesn't require to > >> add new kernel symbols. > > > > I will prefer to add it in core.c itself, and yes > > devm_add_action_or_reset() looks better. But I am still not sure for > > which helpers do we need the devm_*() variants, as this is only useful > > for non-CPU devices. But if we have users that we can add right now, > > why not. > > All current non-CPU drivers (devfreq, mmc, memory, etc) can benefit from it. > > For Tegra drivers we need these variants: > > devm_pm_opp_set_supported_hw() > devm_pm_opp_set_regulators() [if we won't use GENPD] > devm_pm_opp_set_clkname() > devm_pm_opp_of_add_table() I tried to look earlier for the stuff already merged in and didn't find a lot of stuff where the devm_* could be used, maybe I missed some of it. Frank, would you like to refresh your series based on suggestions from Dmitry and make other drivers adapt to the new APIs ?
On Mon, Nov 9, 2020 at 1:53 PM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 09-11-20, 08:44, Dmitry Osipenko wrote: > > 09.11.2020 08:35, Viresh Kumar пишет: > > > On 09-11-20, 08:19, Dmitry Osipenko wrote: > > >> Thanks, I made it in a different way by simply adding helpers to the > > >> pm_opp.h which use devm_add_action_or_reset(). This doesn't require to > > >> add new kernel symbols. > > > > > > I will prefer to add it in core.c itself, and yes > > > devm_add_action_or_reset() looks better. But I am still not sure for > > > which helpers do we need the devm_*() variants, as this is only useful > > > for non-CPU devices. But if we have users that we can add right now, > > > why not. > > > > All current non-CPU drivers (devfreq, mmc, memory, etc) can benefit from it. > > > > For Tegra drivers we need these variants: > > > > devm_pm_opp_set_supported_hw() > > devm_pm_opp_set_regulators() [if we won't use GENPD] > > devm_pm_opp_set_clkname() > > devm_pm_opp_of_add_table() > > I tried to look earlier for the stuff already merged in and didn't > find a lot of stuff where the devm_* could be used, maybe I missed > some of it. > > Frank, would you like to refresh your series based on suggestions from > Dmitry and make other drivers adapt to the new APIs ? I am glad to do this.:) Yangtao
On Thu, 05 Nov 2020 02:43:58 +0300, Dmitry Osipenko wrote: > Document new DVFS OPP table and voltage regulator properties of the > Host1x bus and devices sitting on the bus. > > Signed-off-by: Dmitry Osipenko <digetx@gmail.com> > --- > .../display/tegra/nvidia,tegra20-host1x.txt | 56 +++++++++++++++++++ > 1 file changed, 56 insertions(+) > Reviewed-by: Rob Herring <robh@kernel.org>
On Thu, 05 Nov 2020 02:43:59 +0300, Dmitry Osipenko wrote: > Document new DVFS OPP table and voltage regulator properties of the > SDHCI controller. > > Signed-off-by: Dmitry Osipenko <digetx@gmail.com> > --- > .../devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > Reviewed-by: Rob Herring <robh@kernel.org>
On Thu, 05 Nov 2020 02:44:00 +0300, Dmitry Osipenko wrote: > Document new DVFS OPP table and voltage regulator properties of the > PWM controller. > > Signed-off-by: Dmitry Osipenko <digetx@gmail.com> > --- > .../devicetree/bindings/pwm/nvidia,tegra20-pwm.txt | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > Reviewed-by: Rob Herring <robh@kernel.org>
On Thu, Nov 05, 2020 at 02:44:08AM +0300, Dmitry Osipenko wrote: > Add OPP and SoC core voltage scaling support to the display controller > driver. This is required for enabling system-wide DVFS on older Tegra > SoCs. > > Tested-by: Peter Geis <pgwipeout@gmail.com> > Tested-by: Nicolas Chauvet <kwizart@gmail.com> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com> > --- > drivers/gpu/drm/tegra/Kconfig | 1 + > drivers/gpu/drm/tegra/dc.c | 138 +++++++++++++++++++++++++++++++++- > drivers/gpu/drm/tegra/dc.h | 5 ++ > 3 files changed, 143 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig > index 1650a448eabd..9eec4c3fbd3b 100644 > --- a/drivers/gpu/drm/tegra/Kconfig > +++ b/drivers/gpu/drm/tegra/Kconfig > @@ -12,6 +12,7 @@ config DRM_TEGRA > select INTERCONNECT > select IOMMU_IOVA > select CEC_CORE if CEC_NOTIFIER > + select PM_OPP > help > Choose this option if you have an NVIDIA Tegra SoC. > > diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c > index fd7c8828652d..babcb66a335b 100644 > --- a/drivers/gpu/drm/tegra/dc.c > +++ b/drivers/gpu/drm/tegra/dc.c > @@ -11,9 +11,13 @@ > #include <linux/interconnect.h> > #include <linux/module.h> > #include <linux/of_device.h> > +#include <linux/pm_opp.h> > #include <linux/pm_runtime.h> > +#include <linux/regulator/consumer.h> > #include <linux/reset.h> > > +#include <soc/tegra/common.h> > +#include <soc/tegra/fuse.h> > #include <soc/tegra/pmc.h> > > #include <drm/drm_atomic.h> > @@ -1699,6 +1703,55 @@ int tegra_dc_state_setup_clock(struct tegra_dc *dc, > return 0; > } > > +static void tegra_dc_update_voltage_state(struct tegra_dc *dc, > + struct tegra_dc_state *state) > +{ > + struct dev_pm_opp *opp; > + unsigned long rate; > + int err, min_uV; > + > + /* OPP usage is optional */ > + if (!dc->opp_table) > + return; > + > + /* calculate actual pixel clock rate which depends on internal divider */ > + rate = DIV_ROUND_UP(clk_get_rate(dc->clk) * 2, state->div + 2); > + > + /* find suitable OPP for the rate */ > + opp = dev_pm_opp_find_freq_ceil(dc->dev, &rate); > + > + if (opp == ERR_PTR(-ERANGE)) > + opp = dev_pm_opp_find_freq_floor(dc->dev, &rate); > + > + if (IS_ERR(opp)) { > + dev_err(dc->dev, "failed to find OPP for %lu Hz: %ld\n", > + rate, PTR_ERR(opp)); > + return; > + } > + > + min_uV = dev_pm_opp_get_voltage(opp); > + dev_pm_opp_put(opp); > + > + /* > + * Voltage scaling is optional and trying to set voltage for a dummy > + * regulator will error out. > + */ > + if (!device_property_present(dc->dev, "core-supply")) > + return; This is a potentially heavy operation, so I think we should avoid that here. How about you use devm_regulator_get_optional() in ->probe()? That returns -ENODEV if no regulator was specified, in which case you can set dc->core_reg = NULL and use that as the condition here. > + > + /* > + * Note that the minimum core voltage depends on the pixel clock > + * rate (which depends on internal clock divider of CRTC) and not on > + * the rate of the display controller clock. This is why we're not > + * using dev_pm_opp_set_rate() API and instead are managing the > + * voltage by ourselves. > + */ > + err = regulator_set_voltage(dc->core_reg, min_uV, INT_MAX); > + if (err) > + dev_err(dc->dev, "failed to set CORE voltage to %duV: %d\n", > + min_uV, err); > +} Also, I'd prefer if the flow here was more linear, such as: if (dc->core_reg) { err = regulator_set_voltage(...); ... } > + > static void tegra_dc_commit_state(struct tegra_dc *dc, > struct tegra_dc_state *state) > { > @@ -1738,6 +1791,8 @@ static void tegra_dc_commit_state(struct tegra_dc *dc, > if (err < 0) > dev_err(dc->dev, "failed to set clock %pC to %lu Hz: %d\n", > dc->clk, state->pclk, err); > + > + tegra_dc_update_voltage_state(dc, state); > } > > static void tegra_dc_stop(struct tegra_dc *dc) > @@ -2521,6 +2576,7 @@ static int tegra_dc_runtime_suspend(struct host1x_client *client) > > clk_disable_unprepare(dc->clk); > pm_runtime_put_sync(dev); > + regulator_disable(dc->core_reg); > > return 0; > } > @@ -2531,10 +2587,16 @@ static int tegra_dc_runtime_resume(struct host1x_client *client) > struct device *dev = client->dev; > int err; > > + err = regulator_enable(dc->core_reg); > + if (err < 0) { > + dev_err(dev, "failed to enable CORE regulator: %d\n", err); > + return err; > + } > + > err = pm_runtime_get_sync(dev); > if (err < 0) { > dev_err(dev, "failed to get runtime PM: %d\n", err); > - return err; > + goto disable_regulator; > } > > if (dc->soc->has_powergate) { > @@ -2564,6 +2626,9 @@ static int tegra_dc_runtime_resume(struct host1x_client *client) > clk_disable_unprepare(dc->clk); > put_rpm: > pm_runtime_put_sync(dev); > +disable_regulator: > + regulator_disable(dc->core_reg); > + > return err; > } > > @@ -2879,6 +2944,72 @@ static int tegra_dc_couple(struct tegra_dc *dc) > return 0; > } > > +static void tegra_dc_deinit_opp_table(void *data) > +{ > + struct tegra_dc *dc = data; > + > + dev_pm_opp_of_remove_table(dc->dev); > + dev_pm_opp_put_supported_hw(dc->opp_table); > + dev_pm_opp_put_regulators(dc->opp_table); > +} > + > +static int devm_tegra_dc_opp_table_init(struct tegra_dc *dc) > +{ > + struct opp_table *hw_opp_table; > + u32 hw_version; > + int err; > + > + /* voltage scaling is optional */ > + dc->core_reg = devm_regulator_get(dc->dev, "core"); > + if (IS_ERR(dc->core_reg)) > + return dev_err_probe(dc->dev, PTR_ERR(dc->core_reg), > + "failed to get CORE regulator\n"); > + > + /* legacy device-trees don't have OPP table */ > + if (!device_property_present(dc->dev, "operating-points-v2")) > + return 0; "Legacy" is a bit confusing here. For one, no device trees currently have these tables and secondly, for newer SoCs we may never need them. > + > + dc->opp_table = dev_pm_opp_get_opp_table(dc->dev); > + if (IS_ERR(dc->opp_table)) > + return dev_err_probe(dc->dev, PTR_ERR(dc->opp_table), > + "failed to prepare OPP table\n"); > + > + if (of_machine_is_compatible("nvidia,tegra20")) > + hw_version = BIT(tegra_sku_info.soc_process_id); > + else > + hw_version = BIT(tegra_sku_info.soc_speedo_id); > + > + hw_opp_table = dev_pm_opp_set_supported_hw(dc->dev, &hw_version, 1); > + err = PTR_ERR_OR_ZERO(hw_opp_table); What's the point of this? A more canonical version would be: if (IS_ERR(hw_opp_table)) { err = PTR_ERR(hw_opp_table); dev_err(dc->dev, ...); goto put_table; } That uses the same number of lines but is much easier to read, in my opinion, because it is the canonical form. > + if (err) { > + dev_err(dc->dev, "failed to set supported HW: %d\n", err); > + goto put_table; > + } > + > + err = dev_pm_opp_of_add_table(dc->dev); > + if (err) { > + dev_err(dc->dev, "failed to add OPP table: %d\n", err); > + goto put_hw; > + } > + > + err = devm_add_action(dc->dev, tegra_dc_deinit_opp_table, dc); > + if (err) > + goto remove_table; Do these functions return positive values? If not, I'd prefer if this check was more explicit (i.e. err < 0) for consistency with the rest of this code. > + > + dev_info(dc->dev, "OPP HW ver. 0x%x\n", hw_version); > + > + return 0; > + > +remove_table: > + dev_pm_opp_of_remove_table(dc->dev); > +put_hw: > + dev_pm_opp_put_supported_hw(dc->opp_table); > +put_table: > + dev_pm_opp_put_opp_table(dc->opp_table); > + > + return err; > +} > + > static int tegra_dc_probe(struct platform_device *pdev) > { > struct tegra_dc *dc; > @@ -2937,6 +3068,10 @@ static int tegra_dc_probe(struct platform_device *pdev) > tegra_powergate_power_off(dc->powergate); > } > > + err = devm_tegra_dc_opp_table_init(dc); > + if (err < 0) > + return err; > + > dc->regs = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(dc->regs)) > return PTR_ERR(dc->regs); > @@ -3007,6 +3142,7 @@ struct platform_driver tegra_dc_driver = { > .driver = { > .name = "tegra-dc", > .of_match_table = tegra_dc_of_match, > + .sync_state = tegra_soc_device_sync_state, > }, > .probe = tegra_dc_probe, > .remove = tegra_dc_remove, > diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/drm/tegra/dc.h > index ba4ed35139fb..fd774fc5c2e4 100644 > --- a/drivers/gpu/drm/tegra/dc.h > +++ b/drivers/gpu/drm/tegra/dc.h > @@ -13,6 +13,8 @@ > > #include "drm.h" > > +struct opp_table; > +struct regulator; > struct tegra_output; > > #define TEGRA_DC_LEGACY_PLANES_NUM 6 > @@ -107,6 +109,9 @@ struct tegra_dc { > struct drm_info_list *debugfs_files; > > const struct tegra_dc_soc_info *soc; > + > + struct opp_table *opp_table; > + struct regulator *core_reg; We typically use a _supply suffix on regulators to avoid confusing this with "register". Thierry
On Thu, Nov 05, 2020 at 02:44:04AM +0300, Dmitry Osipenko wrote: > Introduce sync state API that will be used by Tegra device drivers. This > new API is primarily needed for syncing state of SoC devices that are left > ON after bootloader or permanently enabled. All these devices belong to a > shared CORE voltage domain, and thus, we needed to bring all the devices > into expected state before the voltage scaling could be performed. > > All drivers of DVFS-critical devices shall sync theirs the state before > Tegra's voltage regulator coupler will be allowed to perform a system-wide > voltage scaling. > > Tested-by: Peter Geis <pgwipeout@gmail.com> > Tested-by: Nicolas Chauvet <kwizart@gmail.com> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com> > --- > drivers/soc/tegra/common.c | 152 ++++++++++++++++++++++++++++++++++++- > include/soc/tegra/common.h | 22 ++++++ > 2 files changed, 170 insertions(+), 4 deletions(-) > > diff --git a/drivers/soc/tegra/common.c b/drivers/soc/tegra/common.c > index 3dc54f59cafe..f9b2b6f57887 100644 > --- a/drivers/soc/tegra/common.c > +++ b/drivers/soc/tegra/common.c > @@ -3,13 +3,52 @@ > * Copyright (C) 2014 NVIDIA CORPORATION. All rights reserved. > */ > > +#define dev_fmt(fmt) "%s: " fmt, __func__ > +#define pr_fmt(fmt) "%s: " fmt, __func__ > + > +#include <linux/export.h> > +#include <linux/init.h> > +#include <linux/mutex.h> > #include <linux/of.h> > +#include <linux/of_device.h> > > #include <soc/tegra/common.h> > > +#define terga_soc_for_each_device(__dev) \ tegra_soc_for_each_device > + for ((__dev) = tegra_soc_devices; (__dev) && (__dev)->compatible; \ > + (__dev)++) > + > +struct tegra_soc_device { > + const char *compatible; > + const bool dvfs_critical; > + unsigned int sync_count; > +}; > + > +static DEFINE_MUTEX(tegra_soc_lock); > +static struct tegra_soc_device *tegra_soc_devices; > + > +/* > + * DVFS-critical devices are either active at a boot time or permanently > + * active, like EMC for example. System-wide DVFS should be deferred until > + * drivers of the critical devices synced theirs state. > + */ > + > +static struct tegra_soc_device tegra20_soc_devices[] = { > + { .compatible = "nvidia,tegra20-dc", .dvfs_critical = true, }, > + { .compatible = "nvidia,tegra20-emc", .dvfs_critical = true, }, > + { } > +}; > + > +static struct tegra_soc_device tegra30_soc_devices[] = { > + { .compatible = "nvidia,tegra30-dc", .dvfs_critical = true, }, > + { .compatible = "nvidia,tegra30-emc", .dvfs_critical = true, }, > + { .compatible = "nvidia,tegra30-pwm", .dvfs_critical = true, }, > + { } > +}; > + > static const struct of_device_id tegra_machine_match[] = { > - { .compatible = "nvidia,tegra20", }, > - { .compatible = "nvidia,tegra30", }, > + { .compatible = "nvidia,tegra20", .data = tegra20_soc_devices, }, > + { .compatible = "nvidia,tegra30", .data = tegra30_soc_devices, }, > { .compatible = "nvidia,tegra114", }, > { .compatible = "nvidia,tegra124", }, > { .compatible = "nvidia,tegra132", }, > @@ -17,7 +56,7 @@ static const struct of_device_id tegra_machine_match[] = { > { } > }; > > -bool soc_is_tegra(void) > +static const struct of_device_id *tegra_soc_of_match(void) > { > const struct of_device_id *match; > struct device_node *root; > @@ -29,5 +68,110 @@ bool soc_is_tegra(void) > match = of_match_node(tegra_machine_match, root); > of_node_put(root); > > - return match != NULL; > + return match; > +} > + > +bool soc_is_tegra(void) > +{ > + return tegra_soc_of_match() != NULL; > +} > + > +void tegra_soc_device_sync_state(struct device *dev) > +{ > + struct tegra_soc_device *soc_dev; > + > + mutex_lock(&tegra_soc_lock); > + terga_soc_for_each_device(soc_dev) { tegra_soc_for_each_device > + if (!of_device_is_compatible(dev->of_node, soc_dev->compatible)) > + continue; > + > + if (!soc_dev->sync_count) { > + dev_err(dev, "already synced\n"); > + break; > + } > + > + /* > + * All DVFS-capable devices should have the CORE regulator > + * phandle. Older device-trees don't have it, hence state > + * won't be synced for the older DTBs, allowing them to work > + * properly. > + */ > + if (soc_dev->dvfs_critical && > + !device_property_present(dev, "core-supply")) { > + dev_dbg(dev, "doesn't have core supply\n"); > + break; > + } > + > + soc_dev->sync_count--; > + dev_dbg(dev, "sync_count=%u\n", soc_dev->sync_count); > + break; > + } > + mutex_unlock(&tegra_soc_lock); > +} > +EXPORT_SYMBOL_GPL(tegra_soc_device_sync_state); > + > +bool tegra_soc_dvfs_state_synced(void) > +{ > + struct tegra_soc_device *soc_dev; > + bool synced_state = true; > + > + /* > + * CORE voltage scaling is limited until drivers of the critical > + * devices synced theirs state. > + */ > + mutex_lock(&tegra_soc_lock); > + terga_soc_for_each_device(soc_dev) { tegra_soc_for_each_device I wonder if you copy/pasted this or if you got really lucky to mistype this all three times. > + if (!soc_dev->sync_count || !soc_dev->dvfs_critical) > + continue; > + > + pr_debug_ratelimited("%s: sync_count=%u\n", > + soc_dev->compatible, soc_dev->sync_count); > + > + synced_state = false; > + break; > + } > + mutex_unlock(&tegra_soc_lock); > + > + return synced_state; > +} > + > +static int __init tegra_soc_devices_init(void) > +{ > + struct device_node *np, *prev_np = NULL; > + struct tegra_soc_device *soc_dev; > + const struct of_device_id *match; > + > + if (!soc_is_tegra()) > + return 0; > + > + match = tegra_soc_of_match(); > + tegra_soc_devices = (void *)match->data; > + > + /* > + * If device node is disabled in a device-tree, then we shouldn't > + * care about this device. Even if device is active during boot, > + * its clock will be disabled by CCF as unused. > + */ > + terga_soc_for_each_device(soc_dev) { > + do { > + /* > + * Devices like display controller have multiple > + * instances with the same compatible. Hence we need > + * to walk up the whole tree in order to account those > + * multiple instances. > + */ > + np = of_find_compatible_node(prev_np, NULL, > + soc_dev->compatible); > + of_node_put(prev_np); > + prev_np = np; > + > + if (of_device_is_available(np)) { > + pr_debug("added %s\n", soc_dev->compatible); > + soc_dev->sync_count++; > + } > + } while (np); Maybe use for_each_compatible_node() for that inside loop? > + } > + > + return 0; > } > +postcore_initcall_sync(tegra_soc_devices_init); This is unfortunate. I recall having this discussion multiple times and one idea that has been floating around for a while was to let a driver bind against the top-level "bus" node. That has the advantage that it both anchors the code somewhere, so we don't have to play this game of checking for the SoC with soc_is_tegra(), and it properly orders this with respect to the child devices, so we wouldn't have to make this a postcore_initcall. Might be worth looking at that again, but for now this seems okay. Thierry
On Thu, Nov 05, 2020 at 02:44:15AM +0300, Dmitry Osipenko wrote: [...] > +static void tegra_pwm_deinit_opp_table(void *data) > +{ > + struct device *dev = data; > + struct opp_table *opp_table; > + > + opp_table = dev_pm_opp_get_opp_table(dev); > + dev_pm_opp_of_remove_table(dev); > + dev_pm_opp_put_regulators(opp_table); > + dev_pm_opp_put_opp_table(opp_table); > +} > + > +static int devm_tegra_pwm_init_opp_table(struct device *dev) > +{ > + struct opp_table *opp_table; > + const char *rname = "core"; > + int err; > + > + /* voltage scaling is optional */ > + if (device_property_present(dev, "core-supply")) > + opp_table = dev_pm_opp_set_regulators(dev, &rname, 1); > + else > + opp_table = dev_pm_opp_get_opp_table(dev); > + > + if (IS_ERR(opp_table)) > + return dev_err_probe(dev, PTR_ERR(opp_table), > + "failed to prepare OPP table\n"); > + > + /* > + * OPP table presence is optional and we want the set_rate() of OPP > + * API to work similarly to clk_set_rate() if table is missing in a > + * device-tree. The add_table() errors out if OPP is missing in DT. > + */ > + if (device_property_present(dev, "operating-points-v2")) { > + err = dev_pm_opp_of_add_table(dev); > + if (err) { > + dev_err(dev, "failed to add OPP table: %d\n", err); > + goto put_table; > + } > + } > + > + err = devm_add_action(dev, tegra_pwm_deinit_opp_table, dev); > + if (err) > + goto remove_table; > + > + return 0; > + > +remove_table: > + dev_pm_opp_of_remove_table(dev); > +put_table: > + dev_pm_opp_put_regulators(opp_table); > + > + return err; > +} These two functions seem to be heavily boilerplate across all these drivers. Have you considered splitting these out into separate helpers? Thierry
10.11.2020 23:50, Thierry Reding пишет: > On Thu, Nov 05, 2020 at 02:44:15AM +0300, Dmitry Osipenko wrote: > [...] >> +static void tegra_pwm_deinit_opp_table(void *data) >> +{ >> + struct device *dev = data; >> + struct opp_table *opp_table; >> + >> + opp_table = dev_pm_opp_get_opp_table(dev); >> + dev_pm_opp_of_remove_table(dev); >> + dev_pm_opp_put_regulators(opp_table); >> + dev_pm_opp_put_opp_table(opp_table); >> +} >> + >> +static int devm_tegra_pwm_init_opp_table(struct device *dev) >> +{ >> + struct opp_table *opp_table; >> + const char *rname = "core"; >> + int err; >> + >> + /* voltage scaling is optional */ >> + if (device_property_present(dev, "core-supply")) >> + opp_table = dev_pm_opp_set_regulators(dev, &rname, 1); >> + else >> + opp_table = dev_pm_opp_get_opp_table(dev); >> + >> + if (IS_ERR(opp_table)) >> + return dev_err_probe(dev, PTR_ERR(opp_table), >> + "failed to prepare OPP table\n"); >> + >> + /* >> + * OPP table presence is optional and we want the set_rate() of OPP >> + * API to work similarly to clk_set_rate() if table is missing in a >> + * device-tree. The add_table() errors out if OPP is missing in DT. >> + */ >> + if (device_property_present(dev, "operating-points-v2")) { >> + err = dev_pm_opp_of_add_table(dev); >> + if (err) { >> + dev_err(dev, "failed to add OPP table: %d\n", err); >> + goto put_table; >> + } >> + } >> + >> + err = devm_add_action(dev, tegra_pwm_deinit_opp_table, dev); >> + if (err) >> + goto remove_table; >> + >> + return 0; >> + >> +remove_table: >> + dev_pm_opp_of_remove_table(dev); >> +put_table: >> + dev_pm_opp_put_regulators(opp_table); >> + >> + return err; >> +} > > These two functions seem to be heavily boilerplate across all these > drivers. Have you considered splitting these out into separate helpers? The helper is already prepared for v2.
10.11.2020 23:47, Thierry Reding пишет: ... > tegra_soc_for_each_device > > I wonder if you copy/pasted this or if you got really lucky to mistype > this all three times. Copied of course :) I added a special spell checking rule for this typo, but it does help reliably. ... >> + terga_soc_for_each_device(soc_dev) { >> + do { >> + /* >> + * Devices like display controller have multiple >> + * instances with the same compatible. Hence we need >> + * to walk up the whole tree in order to account those >> + * multiple instances. >> + */ >> + np = of_find_compatible_node(prev_np, NULL, >> + soc_dev->compatible); >> + of_node_put(prev_np); >> + prev_np = np; >> + >> + if (of_device_is_available(np)) { >> + pr_debug("added %s\n", soc_dev->compatible); >> + soc_dev->sync_count++; >> + } >> + } while (np); > > Maybe use for_each_compatible_node() for that inside loop? Good point! I think there is actually an of_node_put() bug in current variant, which for_each_compatible_node() would safe from. >> + } >> + >> + return 0; >> } >> +postcore_initcall_sync(tegra_soc_devices_init); > > This is unfortunate. I recall having this discussion multiple times and > one idea that has been floating around for a while was to let a driver > bind against the top-level "bus" node. That has the advantage that it > both anchors the code somewhere, so we don't have to play this game of > checking for the SoC with soc_is_tegra(), and it properly orders this > with respect to the child devices, so we wouldn't have to make this a > postcore_initcall. > > Might be worth looking at that again, but for now this seems okay. Thanks
10.11.2020 23:32, Mark Brown пишет: > On Tue, Nov 10, 2020 at 09:29:45PM +0100, Thierry Reding wrote: >> On Thu, Nov 05, 2020 at 02:44:08AM +0300, Dmitry Osipenko wrote: > >>> + /* >>> + * Voltage scaling is optional and trying to set voltage for a dummy >>> + * regulator will error out. >>> + */ >>> + if (!device_property_present(dc->dev, "core-supply")) >>> + return; > >> This is a potentially heavy operation, so I think we should avoid that >> here. How about you use devm_regulator_get_optional() in ->probe()? That >> returns -ENODEV if no regulator was specified, in which case you can set >> dc->core_reg = NULL and use that as the condition here. > > Or enumerate the configurable voltages after getting the regulator and > handle that appropriately which would be more robust in case there's > missing or unusual constraints. > I already changed that code to use regulator_get_optional() for v2. Regarding the enumerating supported voltage.. I think this should be done by the OPP core, but regulator core doesn't work well if regulator_get() is invoked more than one time for the same device, at least there is a loud debugfs warning about an already existing directory for a regulator. It's easy to check whether the debug directory exists before creating it, like thermal framework does it for example, but then there were some other more difficult issues.. I don't recall what they were right now. Perhaps will be easier to simply get a error from regulator_set_voltage() for now because it shouldn't ever happen in practice, unless device-tree has wrong constraints.
11.11.2020 00:22, Dmitry Osipenko пишет: > I added a special spell checking rule for this typo, but it does help > reliably. does *not*
10.11.2020 23:29, Thierry Reding пишет: >> + /* legacy device-trees don't have OPP table */ >> + if (!device_property_present(dc->dev, "operating-points-v2")) >> + return 0; > "Legacy" is a bit confusing here. For one, no device trees currently > have these tables and secondly, for newer SoCs we may never need them. > I had the same thought and already improved such comments a day ago.
On Tue, Nov 10, 2020 at 09:29:45PM +0100, Thierry Reding wrote: > > + err = dev_pm_opp_of_add_table(dc->dev); > > + if (err) { > > + dev_err(dc->dev, "failed to add OPP table: %d\n", err); > > + goto put_hw; > > + } > > + > > + err = devm_add_action(dc->dev, tegra_dc_deinit_opp_table, dc); > > + if (err) > > + goto remove_table; > > Do these functions return positive values? If not, I'd prefer if this > check was more explicit (i.e. err < 0) for consistency with the rest of > this code. > Isn't it the other way around? It's only when the check is explicitly for "if (ret < 0)" that we have to wonder about positives. If the codes says "if (ret)" then we know that it doesn't return positive values and every non-zero is an error. In the kernel "if (ret)" is way more popular than "if (ret < 0)": $ git grep 'if (\(ret\|rc\|err\))' | wc -l 92927 $ git grep 'if (\(ret\|rc\|err\) < 0)' | wc -l 36577 And some of those are places where "ret" can be positive so we are forced to use the "if (ret < 0)" format. Checking for "if (ret)" is easier from a static analysis perspective. If it's one style is used consistently then they're the same but when there is a mismatch the "if (ret < 0) " will trigger a false positive and the "if (ret) " will not. int var; ret = frob(&var); if (ret < 0) return ret; Smatch thinks positive returns are not handled so it complains that "var can be uninitialized". regards, dan carpenter
On Sun, 8 Nov 2020 at 13:19, Dmitry Osipenko <digetx@gmail.com> wrote: > > 05.11.2020 18:22, Dmitry Osipenko пишет: > > 05.11.2020 12:45, Ulf Hansson пишет: > > ... > >> I need some more time to review this, but just a quick check found a > >> few potential issues... > > > > Thank you for starting the review! I'm pretty sure it will take a couple > > revisions until all the questions will be resolved :) > > > >> The "core-supply", that you specify as a regulator for each > >> controller's device node, is not the way we describe power domains. > >> Instead, it seems like you should register a power-domain provider > >> (with the help of genpd) and implement the ->set_performance_state() > >> callback for it. Each device node should then be hooked up to this > >> power-domain, rather than to a "core-supply". For DT bindings, please > >> have a look at Documentation/devicetree/bindings/power/power-domain.yaml > >> and Documentation/devicetree/bindings/power/power_domain.txt. > >> > >> In regards to the "sync state" problem (preventing to change > >> performance states until all consumers have been attached), this can > >> then be managed by the genpd provider driver instead. > > > > I'll need to take a closer look at GENPD, thank you for the suggestion. > > > > Sounds like a software GENPD driver which manages clocks and voltages > > could be a good idea, but it also could be an unnecessary > > over-engineering. Let's see.. > > > > Hello Ulf and all, > > I took a detailed look at the GENPD and tried to implement it. Here is > what was found: > > 1. GENPD framework doesn't aggregate performance requests from the > attached devices. This means that if deviceA requests performance state > 10 and then deviceB requests state 3, then framework will set domain's > state to 3 instead of 10. > > https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L376 As Viresh also stated, genpd does aggregate the votes. It even performs aggregation hierarchy (a genpd is allowed to have parent(s) to model a topology). > > 2. GENPD framework has a sync() callback in the genpd.domain structure, > but this callback isn't allowed to be used by the GENPD implementation. > The GENPD framework always overrides that callback for its own needs. > Hence GENPD doesn't allow to solve the bootstrapping > state-synchronization problem in a nice way. > > https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L2606 That ->sync() callback isn't the callback you are looking for, it's a PM domain specific callback - and has other purposes. To solve the problem you refer to, your genpd provider driver (a platform driver) should assign its ->sync_state() callback. The ->sync_state() callback will be invoked, when all consumer devices have been attached (and probed) to their corresponding provider. You may have a look at drivers/cpuidle/cpuidle-psci-domain.c, to see an example of how this works. If there is anything unclear, just tell me and I will try to help. > > 3. Tegra doesn't have a dedicated hardware power-controller for the core > domain, instead there is only an external voltage regulator. Hence we > will need to create a phony device-tree node for the virtual power > domain, which is probably a wrong thing to do. No, this is absolutely the correct thing to do. This isn't a virtual power domain, it's a real power domain. You only happen to model the control of it as a regulator, as it fits nicely with that for *this* SoC. Don't get me wrong, that's fine as long as the supply is specified only in the power-domain provider node. On another SoC, you might have a different FW interface for the power domain provider that doesn't fit well with the regulator. When that happens, all you need to do is to implement a new power domain provider and potentially re-define the power domain topology. More importantly, you don't need to re-invent yet another slew of device specific bindings - for each SoC. > > === > > Perhaps it should be possible to create some hacks to work around > bullets 2 and 3 in order to achieve what we need for DVFS on Tegra, but > bullet 1 isn't solvable without changing how the GENPD core works. > > Altogether, the GENPD in its current form is a wrong abstraction for a > system-wide DVFS in a case where multiple devices share power domain and > this domain is a voltage regulator. The regulator framework is the > correct abstraction in this case for today. Well, I admit it's a bit complex. But it solves the problem in a nicely abstracted way that should work for everybody, at least in my opinion. Although, let's not exclude that there are pieces missing in genpd or the opp layer, as this DVFS feature is rather new - but then we should just extend/fix it. Kind regards Uffe
On Wed, Nov 11, 2020 at 12:23:41AM +0300, Dmitry Osipenko wrote: > 10.11.2020 23:32, Mark Brown пишет: > >>> + if (!device_property_present(dc->dev, "core-supply")) > >>> + return; > >> This is a potentially heavy operation, so I think we should avoid that > >> here. How about you use devm_regulator_get_optional() in ->probe()? That > >> returns -ENODEV if no regulator was specified, in which case you can set > >> dc->core_reg = NULL and use that as the condition here. > > Or enumerate the configurable voltages after getting the regulator and > > handle that appropriately which would be more robust in case there's > > missing or unusual constraints. > I already changed that code to use regulator_get_optional() for v2. That doesn't look entirely appropriate given that the core does most likely require some kind of power to operate. > Regarding the enumerating supported voltage.. I think this should be > done by the OPP core, but regulator core doesn't work well if > regulator_get() is invoked more than one time for the same device, at > least there is a loud debugfs warning about an already existing I don't understand why this would be an issue - if nothing else the core could just offer an interface to trigger the check. > directory for a regulator. It's easy to check whether the debug > directory exists before creating it, like thermal framework does it for > example, but then there were some other more difficult issues.. I don't > recall what they were right now. Perhaps will be easier to simply get a > error from regulator_set_voltage() for now because it shouldn't ever > happen in practice, unless device-tree has wrong constraints. The constraints might not be wrong, there might be some board which has a constraint somewhere for
11.11.2020 14:55, Mark Brown пишет: > On Wed, Nov 11, 2020 at 12:23:41AM +0300, Dmitry Osipenko wrote: >> 10.11.2020 23:32, Mark Brown пишет: > >>>>> + if (!device_property_present(dc->dev, "core-supply")) >>>>> + return; > >>>> This is a potentially heavy operation, so I think we should avoid that >>>> here. How about you use devm_regulator_get_optional() in ->probe()? That >>>> returns -ENODEV if no regulator was specified, in which case you can set >>>> dc->core_reg = NULL and use that as the condition here. > >>> Or enumerate the configurable voltages after getting the regulator and >>> handle that appropriately which would be more robust in case there's >>> missing or unusual constraints. > >> I already changed that code to use regulator_get_optional() for v2. > > That doesn't look entirely appropriate given that the core does most > likely require some kind of power to operate. We will need to do this because older DTBs won't have that regulator and we want to keep them working. Also, some device-trees won't have that regulator anyways because board schematics isn't available, and thus, we can't fix them. >> Regarding the enumerating supported voltage.. I think this should be >> done by the OPP core, but regulator core doesn't work well if >> regulator_get() is invoked more than one time for the same device, at >> least there is a loud debugfs warning about an already existing > > I don't understand why this would be an issue - if nothing else the core > could just offer an interface to trigger the check. It's not an issue, I just described what happens when device driver tries to get a regulator twice. There was an issue once that check is added to the regulator core code. But perhaps not worth to discuss it for now because I don't remember details. >> directory for a regulator. It's easy to check whether the debug >> directory exists before creating it, like thermal framework does it for >> example, but then there were some other more difficult issues.. I don't >> recall what they were right now. Perhaps will be easier to simply get a >> error from regulator_set_voltage() for now because it shouldn't ever >> happen in practice, unless device-tree has wrong constraints. > > The constraints might not be wrong, there might be some board which has > a constraint somewhere for > In this case board's DT shouldn't specify unsupportable OPPs.
11.11.2020 14:38, Ulf Hansson пишет: > On Sun, 8 Nov 2020 at 13:19, Dmitry Osipenko <digetx@gmail.com> wrote: >> >> 05.11.2020 18:22, Dmitry Osipenko пишет: >>> 05.11.2020 12:45, Ulf Hansson пишет: >>> ... >>>> I need some more time to review this, but just a quick check found a >>>> few potential issues... >>> >>> Thank you for starting the review! I'm pretty sure it will take a couple >>> revisions until all the questions will be resolved :) >>> >>>> The "core-supply", that you specify as a regulator for each >>>> controller's device node, is not the way we describe power domains. >>>> Instead, it seems like you should register a power-domain provider >>>> (with the help of genpd) and implement the ->set_performance_state() >>>> callback for it. Each device node should then be hooked up to this >>>> power-domain, rather than to a "core-supply". For DT bindings, please >>>> have a look at Documentation/devicetree/bindings/power/power-domain.yaml >>>> and Documentation/devicetree/bindings/power/power_domain.txt. >>>> >>>> In regards to the "sync state" problem (preventing to change >>>> performance states until all consumers have been attached), this can >>>> then be managed by the genpd provider driver instead. >>> >>> I'll need to take a closer look at GENPD, thank you for the suggestion. >>> >>> Sounds like a software GENPD driver which manages clocks and voltages >>> could be a good idea, but it also could be an unnecessary >>> over-engineering. Let's see.. >>> >> >> Hello Ulf and all, >> >> I took a detailed look at the GENPD and tried to implement it. Here is >> what was found: >> >> 1. GENPD framework doesn't aggregate performance requests from the >> attached devices. This means that if deviceA requests performance state >> 10 and then deviceB requests state 3, then framework will set domain's >> state to 3 instead of 10. >> >> https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L376 > > As Viresh also stated, genpd does aggregate the votes. It even > performs aggregation hierarchy (a genpd is allowed to have parent(s) > to model a topology). Yes, I already found and fixed the bug which confused me previously and it's working well now. >> 2. GENPD framework has a sync() callback in the genpd.domain structure, >> but this callback isn't allowed to be used by the GENPD implementation. >> The GENPD framework always overrides that callback for its own needs. >> Hence GENPD doesn't allow to solve the bootstrapping >> state-synchronization problem in a nice way. >> >> https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/base/power/domain.c#L2606 > > That ->sync() callback isn't the callback you are looking for, it's a > PM domain specific callback - and has other purposes. > > To solve the problem you refer to, your genpd provider driver (a > platform driver) should assign its ->sync_state() callback. The > ->sync_state() callback will be invoked, when all consumer devices > have been attached (and probed) to their corresponding provider. > > You may have a look at drivers/cpuidle/cpuidle-psci-domain.c, to see > an example of how this works. If there is anything unclear, just tell > me and I will try to help. Indeed, thank you for the clarification. This variant works well. >> 3. Tegra doesn't have a dedicated hardware power-controller for the core >> domain, instead there is only an external voltage regulator. Hence we >> will need to create a phony device-tree node for the virtual power >> domain, which is probably a wrong thing to do. > > No, this is absolutely the correct thing to do. > > This isn't a virtual power domain, it's a real power domain. You only > happen to model the control of it as a regulator, as it fits nicely > with that for *this* SoC. Don't get me wrong, that's fine as long as > the supply is specified only in the power-domain provider node. > > On another SoC, you might have a different FW interface for the power > domain provider that doesn't fit well with the regulator. When that > happens, all you need to do is to implement a new power domain > provider and potentially re-define the power domain topology. More > importantly, you don't need to re-invent yet another slew of device > specific bindings - for each SoC. > >> >> === >> >> Perhaps it should be possible to create some hacks to work around >> bullets 2 and 3 in order to achieve what we need for DVFS on Tegra, but >> bullet 1 isn't solvable without changing how the GENPD core works. >> >> Altogether, the GENPD in its current form is a wrong abstraction for a >> system-wide DVFS in a case where multiple devices share power domain and >> this domain is a voltage regulator. The regulator framework is the >> correct abstraction in this case for today. > > Well, I admit it's a bit complex. But it solves the problem in a > nicely abstracted way that should work for everybody, at least in my > opinion. The OPP framework supports both voltage regulator and power domain, hiding the implementation details from drivers. This means that OPP API usage will be the same regardless of what approach (regulator or power domain) is used for a particular SoC. > Although, let's not exclude that there are pieces missing in genpd or > the opp layer, as this DVFS feature is rather new - but then we should > just extend/fix it. Will be nice to have a per-device GENPD performance stats. Thierry, could you please let me know what do you think about replacing regulator with the power domain? Do you think it's a worthwhile change? The difference in comparison to using voltage regulator directly is minimal, basically the core-supply phandle is replaced is replaced with a power-domain phandle in a device tree. The only thing which makes me feel a bit uncomfortable is that there is no real hardware node for the power domain node in a device-tree.
12.11.2020 23:01, Mark Brown пишет: >> But it's not allowed to change voltage of a dummy regulator, is it >> intentional? > Of course not, we can't know if the requested new voltage is valid - the > driver would have to have explict support for handling situations where > it's not possible to change the voltage (which it can detect through > enumerating the values it wants to set at startup). > > [Requesting the same supply multiple times] But how driver is supposed to recognize that it's a dummy or buggy regulator if it rejects all voltages?
On Fri, Nov 13, 2020 at 06:55:27PM +0300, Dmitry Osipenko wrote: > 13.11.2020 17:29, Mark Brown пишет: > > It's not clear if it matters - it's more a policy decision on the part > > of the driver about what it thinks safe error handling is. If it's not > If regulator_get() returns a dummy regulator, then this means that > regulator isn't specified in a device-tree. But then the only way for a > consumer driver to check whether regulator is dummy, is to check > presence of the supply property in a device-tree. My point here is that the driver shouldn't be checking for a dummy regulator, the driver should be checking the features that are provided to it by the regulator and handling those. It doesn't matter if this is a dummy regulator or an actual regulator with limited features, the effect is the same and the handling should be the same. If the driver is doing anything to handle dummy regulators explicitly as dummy regulators it is doing it wrong. > We want to emit error messages when something goes wrong, for example > when regulator voltage fails to change. It's fine that voltage changes > are failing for a dummy regulator, but then consumer driver shouldn't > recognize it as a error condition. If you're fine with that you should also be fine with any other regulator for which you failed to enumerate any voltages which you can set. > The regulator_get_optional() provides a more consistent and > straightforward way for consumer drivers to check presence of a physical > voltage regulator in comparison to dealing with a regulator_get(). The > dummy regulator is nice to use when there is no need to change > regulator's voltage, which doesn't work for a dummy regulator. To repeat you should *only* be using regulator_get_optional() in the case where the supply may be physically absent which is not the case here.
On Sun, Nov 15, 2020 at 08:42:10PM +0300, Dmitry Osipenko wrote: > 13.11.2020 20:28, Mark Brown пишет: > >> What should we do? > > As I keep saying the consumer driver should be enumerating the voltages > > it can set, if it can't find any and wants to continue then it can just > > skip setting voltages later on. If only some are unavailable then it > > probably wants to eliminate those specific OPPs instead. > I'm seeing a dummy regulator as a helper for consumer drivers which > removes burden of handling an absent (optional) regulator. Is this a > correct understanding of a dummy regulator? > Older DTBs don't have a regulator and we want to keep them working. This > is equal to a physically absent regulator and in this case it's an > optional regulator, IMO. No, you are failing to understand the purpose of this code. To reiterate unless the device supports operating with the supply physically absent then the driver should not be attempting to use regulator_get_optional(). That exists specifically for the case where the supply may be absent, nothing else. The dummy regulator is there precisely for the case where the system does not describe supplies that we know are required for the device to function, it fixes up that omission so we don't need to open code handling of this in every single consumer driver. Regulators that are present but not described by the firmware are a clearly different case to regulators that are not physically there, hardware with actually optional regulators will generally require some configuration for this case.
16.11.2020 16:33, Mark Brown пишет: > On Sun, Nov 15, 2020 at 08:42:10PM +0300, Dmitry Osipenko wrote: >> 13.11.2020 20:28, Mark Brown пишет: > >>>> What should we do? > >>> As I keep saying the consumer driver should be enumerating the voltages >>> it can set, if it can't find any and wants to continue then it can just >>> skip setting voltages later on. If only some are unavailable then it >>> probably wants to eliminate those specific OPPs instead. > >> I'm seeing a dummy regulator as a helper for consumer drivers which >> removes burden of handling an absent (optional) regulator. Is this a >> correct understanding of a dummy regulator? > >> Older DTBs don't have a regulator and we want to keep them working. This >> is equal to a physically absent regulator and in this case it's an >> optional regulator, IMO. > > No, you are failing to understand the purpose of this code. To > reiterate unless the device supports operating with the supply > physically absent then the driver should not be attempting to use > regulator_get_optional(). That exists specifically for the case where > the supply may be absent, nothing else. The dummy regulator is there > precisely for the case where the system does not describe supplies that > we know are required for the device to function, it fixes up that > omission so we don't need to open code handling of this in every single > consumer driver. The original intention of regulator_get_optional() is clear to me, but nothing really stops drivers from slightly re-purposing this API, IMO. Drivers should be free to assume that if regulator isn't defined by firmware, then it's physically absent if this doesn't break anything. Of course in some cases it's unsafe to make such assumptions. I think it's a bit unpractical to artificially limit API usage without a good reason, i.e. if nothing breaks underneath of a driver. > Regulators that are present but not described by the firmware are a > clearly different case to regulators that are not physically there, > hardware with actually optional regulators will generally require some > configuration for this case. > I have good news. After spending some more time on trying out different things, I found that my previous assumption about the fixed-regulator was wrong, it actually accepts voltage changes, i.e. regulator consumer doesn't get a error on a voltage-change. This is exactly what is needed for the OPP core to work properly. This means that there is no need to add special quirks to work around absent regulators, we will just add a fixed regulator to the DTs which don't specify a real regulator. The OPP core will perform voltage checking and filter out unsupported OPPs. The older DTBs will continue to work as well.
On Thu, Nov 19, 2020 at 05:22:43PM +0300, Dmitry Osipenko wrote: > 16.11.2020 16:33, Mark Brown пишет: > > No, you are failing to understand the purpose of this code. To > > reiterate unless the device supports operating with the supply > > physically absent then the driver should not be attempting to use > > regulator_get_optional(). That exists specifically for the case where > The original intention of regulator_get_optional() is clear to me, but > nothing really stops drivers from slightly re-purposing this API, IMO. > Drivers should be free to assume that if regulator isn't defined by > firmware, then it's physically absent if this doesn't break anything. Of > course in some cases it's unsafe to make such assumptions. I think it's > a bit unpractical to artificially limit API usage without a good reason, > i.e. if nothing breaks underneath of a driver. If the supply can be physically absent without breaking anything then this is the intended use case for optional regulators. This is a *very* uncommon. > > Regulators that are present but not described by the firmware are a > > clearly different case to regulators that are not physically there, > > hardware with actually optional regulators will generally require some > > configuration for this case. > I have good news. After spending some more time on trying out different > things, I found that my previous assumption about the fixed-regulator > was wrong, it actually accepts voltage changes, i.e. regulator consumer > doesn't get a error on a voltage-change. This is exactly what is needed > for the OPP core to work properly. To be clear when you set a voltage range you will get the minimum voltage that can be supported within that range on the system given all the other constraints the system has. For fixed voltage regulators or regulators constraints to not change voltage this means that if whatever voltage they are fixed at is in the range requested then the API will report success.
01.12.2020 16:57, Mark Brown пишет: > On Thu, 5 Nov 2020 02:43:57 +0300, Dmitry Osipenko wrote: >> Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs, which reduces >> power consumption and heating of the Tegra chips. Tegra SoC has multiple >> hardware units which belong to a core power domain of the SoC and share >> the core voltage. The voltage must be selected in accordance to a minimum >> requirement of every core hardware unit. >> >> The minimum core voltage requirement depends on: >> >> [...] > > Applied to > > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next > > Thanks! > > [1/1] regulator: Allow skipping disabled regulators in regulator_check_consumers() > (no commit info) > > All being well this means that it will be integrated into the linux-next > tree (usually sometime in the next 24 hours) and sent to Linus during > the next merge window (or sooner if it is a bug fix), however if > problems are discovered then the patch may be dropped or reverted. > > You may get further e-mails resulting from automated or manual testing > and review of the tree, please engage with people reporting problems and > send followup patches addressing any issues that are reported if needed. > > If any updates are required or you are submitting further changes they > should be sent as incremental updates against current git, existing > patches will not be replaced. > > Please add any relevant lists and maintainers to the CCs when replying > to this mail. Hello Mark, Could you please hold on this patch? It won't be needed in a v2, which will use power domains. Also, I'm not sure whether the "sound" tree is suitable for any of the patches in this series.
On Tue, Dec 01, 2020 at 05:17:20PM +0300, Dmitry Osipenko wrote: > 01.12.2020 16:57, Mark Brown пишет: > > [1/1] regulator: Allow skipping disabled regulators in regulator_check_consumers() > > (no commit info) > Could you please hold on this patch? It won't be needed in a v2, which > will use power domains. > Also, I'm not sure whether the "sound" tree is suitable for any of the > patches in this series. It didn't actually get applied (note the "no commit info") - it looks like b4's matching code got confused and decided to generate mails for anything that I've ever downloaded and not posted.
01.12.2020 17:34, Mark Brown пишет: > On Tue, Dec 01, 2020 at 05:17:20PM +0300, Dmitry Osipenko wrote: >> 01.12.2020 16:57, Mark Brown пишет: > >>> [1/1] regulator: Allow skipping disabled regulators in regulator_check_consumers() >>> (no commit info) > >> Could you please hold on this patch? It won't be needed in a v2, which >> will use power domains. > >> Also, I'm not sure whether the "sound" tree is suitable for any of the >> patches in this series. > > It didn't actually get applied (note the "no commit info") - it looks > like b4's matching code got confused and decided to generate mails for > anything that I've ever downloaded and not posted. > Alright, thank you for the clarification.
On Mon, 9 Nov 2020 at 16:51, Frank Lee <tiny.windzz@gmail.com> wrote: > On Mon, Nov 9, 2020 at 1:53 PM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > devm_pm_opp_set_supported_hw() > > > devm_pm_opp_set_regulators() [if we won't use GENPD] > > > devm_pm_opp_set_clkname() > > > devm_pm_opp_of_add_table() > > > > I tried to look earlier for the stuff already merged in and didn't > > find a lot of stuff where the devm_* could be used, maybe I missed > > some of it. > > > > Frank, would you like to refresh your series based on suggestions from > > Dmitry and make other drivers adapt to the new APIs ? > > I am glad to do this.:) Frank, Dmitry has submitted a series with a patch that does stuff like this since you never resent your patches. http://lore.kernel.org/lkml/20201217180638.22748-14-digetx@gmail.com Since you were the first one to get to this, I would still like to give you a chance to get these patches merged under your authorship, otherwise I would be going to pick patches from Dmitry. -- viresh