[0/2] clk: qcom: fix disp_cc_mdss_mdp_clk_src issues on sdm845

Message ID 20211208022210.1300773-1-dmitry.baryshkov@linaro.org

Message

Dmitry Baryshkov Dec. 8, 2021, 2:22 a.m. UTC
This is an alternative to the approach that Bjorn proposed in
https://lore.kernel.org/linux-arm-msm/20211203035436.3505743-1-bjorn.andersson@linaro.org/

The disp_cc_mdss_mdp_clk_src clock can become stuck during the boot
process for reasons other than the clocks being disabled in the
clk_disable_unused() phase. For example, other drivers can toggle the
parent of the clock during boot, disabling it for some reason.

So instead of enforcing clock parking during clk_disable_unused(),
park the clocks during the driver probe. This can break the splash
screen display; however, losing the splash screen for a few seconds is
considered the lesser evil compared to possibly losing the display
altogether (because the RCG gets stuck).

----------------------------------------------------------------
Dmitry Baryshkov (2):
      clk: qcom: add API to safely park RCG2 sources
      clk: qcom: dispcc-sdm845: park disp_cc_mdss_mdp_clk_src

 drivers/clk/qcom/clk-rcg.h       |  2 ++
 drivers/clk/qcom/clk-rcg2.c      | 34 ++++++++++++++++++++++++++++++++++
 drivers/clk/qcom/dispcc-sdm845.c |  3 +++
 3 files changed, 39 insertions(+)

Comments

Stephen Boyd Dec. 9, 2021, 8:37 a.m. UTC | #1
Quoting Dmitry Baryshkov (2021-12-07 18:22:09)
> Some of RCG2 clocks can become stuck during the boot process, when
> device drivers are enabling and disabling the RCG2's parent clocks.
> To prevernt such outcome of driver probe sequences, add API to park

s/prevernt/prevent/

> clocks to the safe clock source (typically TCXO).
> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

I'd prefer this approach vs. adding a new clk flag. The clk framework
doesn't handle handoff properly today so we shouldn't try to bandage
that in the core.

> diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
> index e1b1b426fae4..230b04a7427c 100644
> --- a/drivers/clk/qcom/clk-rcg2.c
> +++ b/drivers/clk/qcom/clk-rcg2.c
> @@ -1036,6 +1036,40 @@ static void clk_rcg2_shared_disable(struct clk_hw *hw)
>         regmap_write(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG, cfg);
>  }
>  
> +int clk_rcg2_park_safely(struct regmap *regmap, u32 offset, unsigned int safe_src)

Please add kernel doc as it's an exported symbol.

> +{
> +       unsigned int val, ret, count;
> +
> +       ret = regmap_read(regmap, offset + CFG_REG, &val);
> +       if (ret)
> +               return ret;
> +
> +       /* assume safe source is 0 */

Are we assuming safe source is 0 here? It looks like we pass it in now?

> +       if ((val & CFG_SRC_SEL_MASK) == (safe_src << CFG_SRC_SEL_SHIFT))
> +               return 0;
> +
> +       regmap_write(regmap, offset + CFG_REG, safe_src << CFG_SRC_SEL_SHIFT);
> +
> +       ret = regmap_update_bits(regmap, offset + CMD_REG,
> +                                CMD_UPDATE, CMD_UPDATE);
> +       if (ret)
> +               return ret;
> +
> +       /* Wait for update to take effect */
> +       for (count = 500; count > 0; count--) {
> +               ret = regmap_read(regmap, offset + CMD_REG, &val);
> +               if (ret)
> +                       return ret;
> +               if (!(val & CMD_UPDATE))
> +                       return 0;
> +               udelay(1);
> +       }
> +
> +       WARN(1, "the rcg didn't update its configuration.");

Add a newline?

> +       return -EBUSY;
> +}
> +EXPORT_SYMBOL_GPL(clk_rcg2_park_safely);
> +
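
[Editorial sketch] The review comments above (kernel-doc for the exported
symbol, the stale "assume safe source is 0" comment, the missing newline in
the warning) can be illustrated with a user-space model of the same parking
sequence. This is not the v2 patch. `fake_rcg`, `fake_read()` and
`fake_park()` are hypothetical stand-ins for the regmap accessors so the
logic can be compiled outside the kernel; the register offsets and bit
definitions are assumed to match those in clk-rcg2.c. The sketch also uses a
signed `int` for the return code, which is an editorial choice and not
something the reviewers asked for.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the definitions in drivers/clk/qcom/clk-rcg2.c. */
#define CMD_REG			0x0
#define CMD_UPDATE		(1u << 0)
#define CFG_REG			0x4
#define CFG_SRC_SEL_SHIFT	8
#define CFG_SRC_SEL_MASK	(0x7u << CFG_SRC_SEL_SHIFT)

/*
 * Hypothetical model of one RCG: two registers plus the number of CMD_REG
 * polls the "hardware" needs before it latches an update. A negative value
 * means the update never latches, i.e. the RCG is stuck.
 */
struct fake_rcg {
	uint32_t cmd;
	uint32_t cfg;
	int polls_until_latched;
};

static uint32_t fake_read(struct fake_rcg *rcg, uint32_t reg)
{
	if (reg == CMD_REG && (rcg->cmd & CMD_UPDATE) &&
	    rcg->polls_until_latched >= 0) {
		if (rcg->polls_until_latched == 0)
			rcg->cmd &= ~CMD_UPDATE;
		else
			rcg->polls_until_latched--;
	}
	return reg == CMD_REG ? rcg->cmd : rcg->cfg;
}

/**
 * fake_park() - switch a modelled RCG to its safe source
 * @rcg: modelled RCG registers
 * @safe_src: source select value of the safe parent (typically TCXO)
 *
 * Return: 0 if the RCG is parked (or already was), -EBUSY if the update
 * did not latch within 500 polls.
 */
static int fake_park(struct fake_rcg *rcg, unsigned int safe_src)
{
	uint32_t val;
	int count;

	assert(rcg);
	assert(safe_src <= 0x7);

	val = fake_read(rcg, CFG_REG);
	if ((val & CFG_SRC_SEL_MASK) == (safe_src << CFG_SRC_SEL_SHIFT))
		return 0;

	rcg->cfg = safe_src << CFG_SRC_SEL_SHIFT;
	rcg->cmd |= CMD_UPDATE;

	/* Wait for the update to take effect (the patch adds udelay(1)). */
	for (count = 500; count > 0; count--) {
		if (!(fake_read(rcg, CMD_REG) & CMD_UPDATE))
			return 0;
	}

	fprintf(stderr, "the rcg didn't update its configuration.\n");
	return -EBUSY;
}
```

In this model a caller would initialise a `struct fake_rcg` with the
bootloader's source selection and call `fake_park(&rcg, 0)`; a stuck RCG
shows up as -EBUSY after the poll budget is exhausted.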
Bjorn Andersson Dec. 9, 2021, 6:36 p.m. UTC | #2
On Thu 09 Dec 00:37 PST 2021, Stephen Boyd wrote:

> Quoting Dmitry Baryshkov (2021-12-07 18:22:09)
> > Some of RCG2 clocks can become stuck during the boot process, when
> > device drivers are enabling and disabling the RCG2's parent clocks.
> > To prevernt such outcome of driver probe sequences, add API to park
> 
> s/prevernt/prevent/
> 
> > clocks to the safe clock source (typically TCXO).
> > 
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> 
> I'd prefer this approach vs. adding a new clk flag. The clk framework
> doesn't handle handoff properly today so we shouldn't try to bandage
> that in the core.
> 

I'm not against putting this responsibility in the drivers, but I don't
think we can blindly park all the RCGs that may or may not be used.

Note that we should do this for all RCGs that downstream are marked as
enable_safe_config (upstream should be using clk_rcg2_shared_ops)
and disabling some of those probe time won't be appreciated by the
hardware.


If you don't like the flag passed to clk_disable_unused (which is like a
very reasonable objection to have), we need to make progress towards a
proper solution that replaces clk_disable_unused().

> > diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
> > index e1b1b426fae4..230b04a7427c 100644
> > --- a/drivers/clk/qcom/clk-rcg2.c
> > +++ b/drivers/clk/qcom/clk-rcg2.c
> > @@ -1036,6 +1036,40 @@ static void clk_rcg2_shared_disable(struct clk_hw *hw)
> >         regmap_write(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG, cfg);
> >  }
> >  
> > +int clk_rcg2_park_safely(struct regmap *regmap, u32 offset, unsigned int safe_src)

This seems to just duplicate clk_rcg2_shared_disable()?

Regards,
Bjorn

> 
> Please add kernel doc as it's an exported symbol.
> 
> > +{
> > +       unsigned int val, ret, count;
> > +
> > +       ret = regmap_read(regmap, offset + CFG_REG, &val);
> > +       if (ret)
> > +               return ret;
> > +
> > +       /* assume safe source is 0 */
> 
> Are we assuming safe source is 0 here? It looks like we pass it in now?
> 
> > +       if ((val & CFG_SRC_SEL_MASK) == (safe_src << CFG_SRC_SEL_SHIFT))
> > +               return 0;
> > +
> > +       regmap_write(regmap, offset + CFG_REG, safe_src << CFG_SRC_SEL_SHIFT);
> > +
> > +       ret = regmap_update_bits(regmap, offset + CMD_REG,
> > +                                CMD_UPDATE, CMD_UPDATE);
> > +       if (ret)
> > +               return ret;
> > +
> > +       /* Wait for update to take effect */
> > +       for (count = 500; count > 0; count--) {
> > +               ret = regmap_read(regmap, offset + CMD_REG, &val);
> > +               if (ret)
> > +                       return ret;
> > +               if (!(val & CMD_UPDATE))
> > +                       return 0;
> > +               udelay(1);
> > +       }
> > +
> > +       WARN(1, "the rcg didn't update its configuration.");
> 
> Add a newline?
> 
> > +       return -EBUSY;
> > +}
> > +EXPORT_SYMBOL_GPL(clk_rcg2_park_safely);
> > +
Dmitry Baryshkov Dec. 15, 2021, 9:14 p.m. UTC | #3
On 09/12/2021 21:36, Bjorn Andersson wrote:
> On Thu 09 Dec 00:37 PST 2021, Stephen Boyd wrote:
> 
>> Quoting Dmitry Baryshkov (2021-12-07 18:22:09)
>>> Some of RCG2 clocks can become stuck during the boot process, when
>>> device drivers are enabling and disabling the RCG2's parent clocks.
>>> To prevernt such outcome of driver probe sequences, add API to park
>>
>> s/prevernt/prevent/
>>
>>> clocks to the safe clock source (typically TCXO).
>>>
>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>
>> I'd prefer this approach vs. adding a new clk flag. The clk framework
>> doesn't handle handoff properly today so we shouldn't try to bandage
>> that in the core.
>>
> 
> I'm not against putting this responsibility in the drivers, but I don't
> think we can blindly park all the RCGs that may or may not be used.
> 
> Note that we should do this for all RCGs that downstream are marked as
> enable_safe_config (upstream should be using clk_rcg2_shared_ops)
> and disabling some of those probe time won't be appreciated by the
> hardware.

Only for hardware as crazy as the displays. And maybe gmu_clk_src. I
don't think we expect venus or camcc to be really clocking when the
kernel boots.

> 
> 
> If you don't like the flag passed to clk_disable_unused (which is like a
> very reasonable objection to have), we need to make progress towards a
> proper solution that replaces clk_disable_unused().

The issue is that by the time clk_disable_unused() runs it can be too
late, for example because the msm driver, when built into the kernel,
has already tried to play with PLLs/GDSCs and thus made the RCG stuck.
This is what I was observing on RB3 if the msm driver is built in and
the splash screen is enabled.

> 
>>> diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
>>> index e1b1b426fae4..230b04a7427c 100644
>>> --- a/drivers/clk/qcom/clk-rcg2.c
>>> +++ b/drivers/clk/qcom/clk-rcg2.c
>>> @@ -1036,6 +1036,40 @@ static void clk_rcg2_shared_disable(struct clk_hw *hw)
>>>          regmap_write(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG, cfg);
>>>   }
>>>   
>>> +int clk_rcg2_park_safely(struct regmap *regmap, u32 offset, unsigned int safe_src)
> 
> This seems to just duplicate clk_rcg2_shared_disable()?

A light version of it. It does not do force_on/_off. It also cannot
rely on clkr->regmap or the clock name being set. Initially I used
clk_rcg2_shared_disable plus several patches to stop it from crashing
if it is used on a non-registered clock. Then I just decided to write
a special helper.

> 
> Regards,
> Bjorn
> 
>>
>> Please add kernel doc as it's an exported symbol.

Ack

>>
>>> +{
>>> +       unsigned int val, ret, count;
>>> +
>>> +       ret = regmap_read(regmap, offset + CFG_REG, &val);
>>> +       if (ret)
>>> +               return ret;
>>> +
>>> +       /* assume safe source is 0 */
>>
>> Are we assuming safe source is 0 here? It looks like we pass it in now?

Leftover, will remove if/when posting v2.

>>
>>> +       if ((val & CFG_SRC_SEL_MASK) == (safe_src << CFG_SRC_SEL_SHIFT))
>>> +               return 0;
>>> +
>>> +       regmap_write(regmap, offset + CFG_REG, safe_src << CFG_SRC_SEL_SHIFT);
>>> +
>>> +       ret = regmap_update_bits(regmap, offset + CMD_REG,
>>> +                                CMD_UPDATE, CMD_UPDATE);
>>> +       if (ret)
>>> +               return ret;
>>> +
>>> +       /* Wait for update to take effect */
>>> +       for (count = 500; count > 0; count--) {
>>> +               ret = regmap_read(regmap, offset + CMD_REG, &val);
>>> +               if (ret)
>>> +                       return ret;
>>> +               if (!(val & CMD_UPDATE))
>>> +                       return 0;
>>> +               udelay(1);
>>> +       }
>>> +
>>> +       WARN(1, "the rcg didn't update its configuration.");
>>
>> Add a newline?

Ack.

>>
>>> +       return -EBUSY;
>>> +}
>>> +EXPORT_SYMBOL_GPL(clk_rcg2_park_safely);
>>> +
Bjorn Andersson Dec. 16, 2021, 4:24 a.m. UTC | #4
On Wed 15 Dec 13:14 PST 2021, Dmitry Baryshkov wrote:

> On 09/12/2021 21:36, Bjorn Andersson wrote:
> > On Thu 09 Dec 00:37 PST 2021, Stephen Boyd wrote:
> > 
> > > Quoting Dmitry Baryshkov (2021-12-07 18:22:09)
> > > > Some of RCG2 clocks can become stuck during the boot process, when
> > > > device drivers are enabling and disabling the RCG2's parent clocks.
> > > > To prevernt such outcome of driver probe sequences, add API to park
> > > 
> > > s/prevernt/prevent/
> > > 
> > > > clocks to the safe clock source (typically TCXO).
> > > > 
> > > > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > 
> > > I'd prefer this approach vs. adding a new clk flag. The clk framework
> > > doesn't handle handoff properly today so we shouldn't try to bandage
> > > that in the core.
> > > 
> > 
> > I'm not against putting this responsibility in the drivers, but I don't
> > think we can blindly park all the RCGs that may or may not be used.
> > 
> > Note that we should do this for all RCGs that downstream are marked as
> > enable_safe_config (upstream should be using clk_rcg2_shared_ops)
> > and disabling some of those probe time won't be appreciated by the
> > hardware.
> 
> Only for hardware as crazy as the displays. And maybe gmu_clk_src. I don't
> think we expect venus or camcc to be really clocking when the kernel boots.
> 

SM8350 GCC has 44 clocks marked as such downstream.

> > 
> > 
> > If you don't like the flag passed to clk_disable_unused (which is like a
> > very reasonable objection to have), we need to make progress towards a
> > proper solution that replaces clk_disable_unused().
> 
> The issue is that by the time clk_disable_unused() runs it can be too late,
> for example because the msm driver, when built into the kernel, has already
> tried to play with PLLs/GDSCs and thus made the RCG stuck.

Makes sense, so this logic will have to consider both the hardware state
(or make assumptions thereof) and the clock votes in the kernel.

> This is what I was observing
> on RB3 if the msm driver is built in and the splash screen is enabled.
> 

Which clock was this?

We should be able to assume that the bootloader will hand us a clock
tree that's functionally configured, so reparenting etc of the RCGs
should not cause issues because the old parent will be ticking and we
will explicitly start the new parent.

One case that might not be handled though is the externally sourced
clocks, where if you reconfigure the DSI phy without first parking the
RCG you might lock up the RCG. So I think that whenever we mess with
those clocks we need to make sure that the downstream RCGs are not
ticking off them.
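
[Editorial sketch] The ordering concern for externally sourced clocks can be
modelled in a few lines of user-space C. Everything here is hypothetical and
heavily simplified: `struct model`, `rcg_set_src()` and the rule that a
source switch only succeeds when both the old and the new source are ticking
are assumptions made for illustration, not a description of the real
hardware or of any existing kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum src { SRC_TCXO = 0, SRC_DSI_PHY = 1 };

struct model {
	enum src rcg_src;	/* source currently selected by the RCG */
	bool phy_ticking;	/* is the external PHY clock running? */
	bool rcg_stuck;		/* set once a switch was attempted without a clock */
};

/* Assumption: TCXO always ticks, the PHY clock only while configured. */
static bool src_ticking(const struct model *m, enum src s)
{
	return s == SRC_TCXO || m->phy_ticking;
}

/* Assumption: switching needs both the old and the new source ticking. */
static int rcg_set_src(struct model *m, enum src new_src)
{
	if (m->rcg_stuck || !src_ticking(m, m->rcg_src) ||
	    !src_ticking(m, new_src)) {
		m->rcg_stuck = true;
		return -EBUSY;
	}
	m->rcg_src = new_src;
	return 0;
}

static void phy_reconfigure_begin(struct model *m) { m->phy_ticking = false; }
static void phy_reconfigure_end(struct model *m) { m->phy_ticking = true; }

/* Park the RCG on TCXO before touching the PHY, then switch back. */
static int reconfigure_phy_safely(struct model *m)
{
	int ret = rcg_set_src(m, SRC_TCXO);

	if (ret)
		return ret;

	phy_reconfigure_begin(m);
	phy_reconfigure_end(m);

	return rcg_set_src(m, SRC_DSI_PHY);
}
```

In this model, parking first keeps the RCG on a ticking source for the whole
PHY reconfiguration, while stopping the PHY clock with the RCG still sourced
from it leaves the next source switch with no clock to complete on.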

> > 
> > > > diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
> > > > index e1b1b426fae4..230b04a7427c 100644
> > > > --- a/drivers/clk/qcom/clk-rcg2.c
> > > > +++ b/drivers/clk/qcom/clk-rcg2.c
> > > > @@ -1036,6 +1036,40 @@ static void clk_rcg2_shared_disable(struct clk_hw *hw)
> > > >          regmap_write(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG, cfg);
> > > >   }
> > > > +int clk_rcg2_park_safely(struct regmap *regmap, u32 offset, unsigned int safe_src)
> > 
> > This seems to just duplicate clk_rcg2_shared_disable()?
> 
> A light version of it. It does not do force_on/_off. It also cannot
> rely on clkr->regmap or the clock name being set. Initially I used
> clk_rcg2_shared_disable plus several patches to stop it from crashing if it
> is used on a non-registered clock. Then I just decided to write a special
> helper.
> 

Okay, makes sense then. But I don't think we want to shoot down clocks
at clock probe time.

Regards,
Bjorn

> > 
> > Regards,
> > Bjorn
> > 
> > > 
> > > Please add kernel doc as it's an exported symbol.
> 
> Ack
> 
> > > 
> > > > +{
> > > > +       unsigned int val, ret, count;
> > > > +
> > > > +       ret = regmap_read(regmap, offset + CFG_REG, &val);
> > > > +       if (ret)
> > > > +               return ret;
> > > > +
> > > > +       /* assume safe source is 0 */
> > > 
> > > Are we assuming safe source is 0 here? It looks like we pass it in now?
> 
> Leftover, will remove if/when posting v2.
> 
> > > 
> > > > +       if ((val & CFG_SRC_SEL_MASK) == (safe_src << CFG_SRC_SEL_SHIFT))
> > > > +               return 0;
> > > > +
> > > > +       regmap_write(regmap, offset + CFG_REG, safe_src << CFG_SRC_SEL_SHIFT);
> > > > +
> > > > +       ret = regmap_update_bits(regmap, offset + CMD_REG,
> > > > +                                CMD_UPDATE, CMD_UPDATE);
> > > > +       if (ret)
> > > > +               return ret;
> > > > +
> > > > +       /* Wait for update to take effect */
> > > > +       for (count = 500; count > 0; count--) {
> > > > +               ret = regmap_read(regmap, offset + CMD_REG, &val);
> > > > +               if (ret)
> > > > +                       return ret;
> > > > +               if (!(val & CMD_UPDATE))
> > > > +                       return 0;
> > > > +               udelay(1);
> > > > +       }
> > > > +
> > > > +       WARN(1, "the rcg didn't update its configuration.");
> > > 
> > > Add a newline?
> 
> Ack.
> 
> > > 
> > > > +       return -EBUSY;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(clk_rcg2_park_safely);
> > > > +
> 
> 
> -- 
> With best wishes
> Dmitry