diff mbox series

arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up

Message ID 20230602161246.1855448-1-amit.pundir@linaro.org
State New
Headers show
Series arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up | expand

Commit Message

Amit Pundir June 2, 2023, 4:12 p.m. UTC
Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
list to workaround a boot regression uncovered by the upstream
commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").

Without this fix DB845c fail to boot at times because one of the
lvs1 or lvs2 regulators fail to turn ON in time.

Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
---
 arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++-----------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Doug Anderson June 6, 2023, 11:34 p.m. UTC | #1
Hi,

On Fri, Jun 2, 2023 at 9:12 AM Amit Pundir <amit.pundir@linaro.org> wrote:
>
> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> list to workaround a boot regression uncovered by the upstream
> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
>
> Without this fix DB845c fail to boot at times because one of the
> lvs1 or lvs2 regulators fail to turn ON in time.
>
> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
> Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++-----------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> index e14fe9bbb386..df2fde9063dc 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> @@ -301,6 +301,18 @@ regulators-0 {
>                 vdd-l26-supply = <&vreg_s3a_1p35>;
>                 vin-lvs-1-2-supply = <&vreg_s4a_1p8>;
>
> +               vreg_lvs1a_1p8: lvs1 {
> +                       regulator-min-microvolt = <1800000>;
> +                       regulator-max-microvolt = <1800000>;
> +                       regulator-always-on;
> +               };
> +
> +               vreg_lvs2a_1p8: lvs2 {
> +                       regulator-min-microvolt = <1800000>;
> +                       regulator-max-microvolt = <1800000>;
> +                       regulator-always-on;
> +               };
> +
>                 vreg_s3a_1p35: smps3 {
>                         regulator-min-microvolt = <1352000>;
>                         regulator-max-microvolt = <1352000>;
> @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 {
>                         regulator-max-microvolt = <1200000>;
>                         regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
>                 };
> -
> -               vreg_lvs1a_1p8: lvs1 {
> -                       regulator-min-microvolt = <1800000>;
> -                       regulator-max-microvolt = <1800000>;
> -                       regulator-always-on;
> -               };
> -
> -               vreg_lvs2a_1p8: lvs2 {
> -                       regulator-min-microvolt = <1800000>;
> -                       regulator-max-microvolt = <1800000>;
> -                       regulator-always-on;
> -               };

This is a hack, but it at least feels less bad than reverting the
async probe patch. I'll leave it to Bjorn to decide if he's OK with
it. Personally, it feels like this would deserve a comment in the dts
to document that these regulators need to be listed first.

Ideally, we could still work towards a root cause. I added a few more
ideas to help with root causing in reply to the original thread about
this.

https://lore.kernel.org/r/CAD=FV=UKyjRNZG-ED2meUAR9aXdco+AbUTHiKixTzjCkaJbjTg@mail.gmail.com/

-Doug
Linux regression tracking (Thorsten Leemhuis) June 14, 2023, 6:18 p.m. UTC | #2
On 02.06.23 18:12, Amit Pundir wrote:
> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> list to workaround a boot regression uncovered by the upstream
> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
> 
> Without this fix DB845c fail to boot at times because one of the
> lvs1 or lvs2 regulators fail to turn ON in time.

/me waves friendly

FWIW, as it's not obvious: this...

> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/

...is a report about a regression. One that we could still solve before
6.4 is out. One I'll likely will point Linus to, unless a fix comes into
sight.

When I noticed the reluctant replies to this patch I earlier today asked
in the thread with the report what the plan forward was:
https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/

Dough there replied:

```
Of the two proposals made (the revert vs. the reordering of the dts),
the reordering of the dts seems better. It only affects the one buggy
board (rather than preventing us to move to async probe for everyone)
and it also has a chance of actually fixing something (changing the
order that regulators probe in rpmh-regulator might legitimately work
around the problem). That being said, just like the revert the dts
reordering is still just papering over the problem and is fragile /
not guaranteed to work forever.
```

Papering over obviously is not good, but has anyone a better idea to fix
this? Or is "not fixing" for some reason an viable option here?

Ciao, Thorsten

> Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++-----------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> index e14fe9bbb386..df2fde9063dc 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
> @@ -301,6 +301,18 @@ regulators-0 {
>  		vdd-l26-supply = <&vreg_s3a_1p35>;
>  		vin-lvs-1-2-supply = <&vreg_s4a_1p8>;
>  
> +		vreg_lvs1a_1p8: lvs1 {
> +			regulator-min-microvolt = <1800000>;
> +			regulator-max-microvolt = <1800000>;
> +			regulator-always-on;
> +		};
> +
> +		vreg_lvs2a_1p8: lvs2 {
> +			regulator-min-microvolt = <1800000>;
> +			regulator-max-microvolt = <1800000>;
> +			regulator-always-on;
> +		};
> +
>  		vreg_s3a_1p35: smps3 {
>  			regulator-min-microvolt = <1352000>;
>  			regulator-max-microvolt = <1352000>;
> @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 {
>  			regulator-max-microvolt = <1200000>;
>  			regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
>  		};
> -
> -		vreg_lvs1a_1p8: lvs1 {
> -			regulator-min-microvolt = <1800000>;
> -			regulator-max-microvolt = <1800000>;
> -			regulator-always-on;
> -		};
> -
> -		vreg_lvs2a_1p8: lvs2 {
> -			regulator-min-microvolt = <1800000>;
> -			regulator-max-microvolt = <1800000>;
> -			regulator-always-on;
> -		};
>  	};
>  
>  	regulators-1 {
Krzysztof Kozlowski June 14, 2023, 6:47 p.m. UTC | #3
On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 02.06.23 18:12, Amit Pundir wrote:
>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
>> list to workaround a boot regression uncovered by the upstream
>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
>>
>> Without this fix DB845c fail to boot at times because one of the
>> lvs1 or lvs2 regulators fail to turn ON in time.
> 
> /me waves friendly
> 
> FWIW, as it's not obvious: this...
> 
>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
> 
> ...is a report about a regression. One that we could still solve before
> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into
> sight.
> 
> When I noticed the reluctant replies to this patch I earlier today asked
> in the thread with the report what the plan forward was:
> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/
> 
> Dough there replied:
> 
> ```
> Of the two proposals made (the revert vs. the reordering of the dts),
> the reordering of the dts seems better. It only affects the one buggy
> board (rather than preventing us to move to async probe for everyone)
> and it also has a chance of actually fixing something (changing the
> order that regulators probe in rpmh-regulator might legitimately work
> around the problem). That being said, just like the revert the dts
> reordering is still just papering over the problem and is fragile /
> not guaranteed to work forever.
> ```
> 
> Papering over obviously is not good, but has anyone a better idea to fix
> this? Or is "not fixing" for some reason an viable option here?
> 

I understand there is a regression, although kernel is not mainline
(hash df7443a96851 is unknown) and the only solutions were papering the
problem. Reverting commit is a temporary workaround. Moving nodes in DTS
is not acceptable because it hides actual problem and only solves this
one particular observed problem, while actual issue is still there. It
would be nice to be able to reproduce it on real mainline with normal
operating system (not AOSP) - with ramdiks/without/whatever. So far no
one did it, right?

Best regards,
Krzysztof
Amit Pundir June 14, 2023, 7:08 p.m. UTC | #4
On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 02.06.23 18:12, Amit Pundir wrote:
> >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> >> list to workaround a boot regression uncovered by the upstream
> >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
> >>
> >> Without this fix DB845c fail to boot at times because one of the
> >> lvs1 or lvs2 regulators fail to turn ON in time.
> >
> > /me waves friendly
> >
> > FWIW, as it's not obvious: this...
> >
> >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
> >
> > ...is a report about a regression. One that we could still solve before
> > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into
> > sight.
> >
> > When I noticed the reluctant replies to this patch I earlier today asked
> > in the thread with the report what the plan forward was:
> > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/
> >
> > Dough there replied:
> >
> > ```
> > Of the two proposals made (the revert vs. the reordering of the dts),
> > the reordering of the dts seems better. It only affects the one buggy
> > board (rather than preventing us to move to async probe for everyone)
> > and it also has a chance of actually fixing something (changing the
> > order that regulators probe in rpmh-regulator might legitimately work
> > around the problem). That being said, just like the revert the dts
> > reordering is still just papering over the problem and is fragile /
> > not guaranteed to work forever.
> > ```
> >
> > Papering over obviously is not good, but has anyone a better idea to fix
> > this? Or is "not fixing" for some reason an viable option here?
> >
>
> I understand there is a regression, although kernel is not mainline
> (hash df7443a96851 is unknown) and the only solutions were papering the
> problem. Reverting commit is a temporary workaround. Moving nodes in DTS
> is not acceptable because it hides actual problem and only solves this
> one particular observed problem, while actual issue is still there. It
> would be nice to be able to reproduce it on real mainline with normal
> operating system (not AOSP) - with ramdiks/without/whatever. So far no
> one did it, right?

No, I did not try non-AOSP system yet. I'll try it tomorrow, if that
helps. With mainline hash.

Regards,
Amit Pundir
Amit Pundir June 19, 2023, 7:06 a.m. UTC | #5
On Sat, 17 Jun 2023 at 12:51, Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> On 16/06/2023 19:09, Amit Pundir wrote:
> > Hi,
> >
> > On Fri, 16 Jun 2023 at 13:57, Krzysztof Kozlowski
> > <krzysztof.kozlowski@linaro.org> wrote:
> >>
> >>
> >> So you have interconnect as module - this is not a supported setup. It
> >> might work with if all the modules are loaded very early or might not.
> >> Pinctrl is another driver which should be built-in.
> >>
> >> With your defconfig I see regular issue - console and system dies
> >> because of lack of interconnects, most likely. I don't see your WARNs -
> >> I just see usual hang.
> >>
> >> See:
> >> https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@linaro.org/
> >>
> >> If you want them to really be modules, then you need to fix all the
> >> dependencies (SOFTDEP?), probe ordering glitches. It's not a problem of
> >> DTS. Just because something can be built as module, does not mean it
> >> will work. We don't test it, we don't work with them as modules.
> >
> > I do somewhat agree with most of your arguments but not this one. If a
> > driver doesn't work as a module then it shouldn't be allowed to build
> > as a module.
>
> Of course you are right. That's why I am pushing against blindly adding
> "tristate" by everyone working on GKI. Because such folks like to make
> them tristate, but not actually test it or work on issues later.
>
> That's exactly the case from Google and Samsung patches here:
> https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/
> and in previous submissions.
>
> > I took a quick look at the history of the interconnect
> > driver and it is tristate from the beginning. And not converted to a
> > modular build later-on like some of the other drivers to support GKI.
>
> OK, maybe it was never actually tested. Or maybe some versions were
> working on boards where debug serial does not have interconnect, but new
> chips just followed the pattern without testing?
> >
> >>
> >> It's kind of the same as here:
> >> https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/
> >>
> >> I understand that we might have here regression, if these were working
> >> as modules, but I don't think we ever really committed to it. We can as
> >> well make it non-module to solve the regression.
> >
> > Sure. But since v6.4 is around the corner, can we merge this
> > workaround for now, while a proper fix is being worked upon.
>
> DTS workaround? No. I don't agree. Once it is merged it will not be fixed.
>
> I am perfectly fine though with making the interconnect or even rpmh
> regulator bool instead of tristate.

As Doug also mentioned in one of his earlier emails, this workaround
is only limited to one particular board. If I try to change the common
interconnect and/or rpmh driver then it will need ack from other stake
holders as well and I'll most likely get more pushback from that side.

Regards,
Amit Pundir
Linux regression tracking (Thorsten Leemhuis) June 22, 2023, 7:47 a.m. UTC | #6
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

As Linus will likely release 6.4 on this or the following Sunday a quick
status inquiry so I can brief him appropriately: is there any hope the
regression this patch tried to fix will be resolved any time soon?
Doesn't look like it from below message and this thread, but maybe I
missed something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 20.06.23 17:59, Bjorn Andersson wrote:
> On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote:
>> Hi,
>>
>> On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski
>> <krzysztof.kozlowski@linaro.org> wrote:
>>>
>>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 02.06.23 18:12, Amit Pundir wrote:
>>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
>>>>> list to workaround a boot regression uncovered by the upstream
>>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
>>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
>>>>>
>>>>> Without this fix DB845c fail to boot at times because one of the
>>>>> lvs1 or lvs2 regulators fail to turn ON in time.
>>>>
>>>> /me waves friendly
>>>>
>>>> FWIW, as it's not obvious: this...
>>>>
>>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
>>>>
>>>> ...is a report about a regression. One that we could still solve before
>>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into
>>>> sight.
>>>>
>>>> When I noticed the reluctant replies to this patch I earlier today asked
>>>> in the thread with the report what the plan forward was:
>>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/
>>>>
>>>> Dough there replied:
>>>>
>>>> ```
>>>> Of the two proposals made (the revert vs. the reordering of the dts),
>>>> the reordering of the dts seems better. It only affects the one buggy
>>>> board (rather than preventing us to move to async probe for everyone)
>>>> and it also has a chance of actually fixing something (changing the
>>>> order that regulators probe in rpmh-regulator might legitimately work
>>>> around the problem). That being said, just like the revert the dts
>>>> reordering is still just papering over the problem and is fragile /
>>>> not guaranteed to work forever.
>>>> ```
>>>>
>>>> Papering over obviously is not good, but has anyone a better idea to fix
>>>> this? Or is "not fixing" for some reason an viable option here?
>>>>
>>>
>>> I understand there is a regression, although kernel is not mainline
>>> (hash df7443a96851 is unknown) and the only solutions were papering the
>>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS
>>> is not acceptable because it hides actual problem and only solves this
>>> one particular observed problem, while actual issue is still there. It
>>> would be nice to be able to reproduce it on real mainline with normal
>>> operating system (not AOSP) - with ramdiks/without/whatever. So far no
>>> one did it, right?
>>
>> The worry I have about the revert here is that it will never be able
>> to be undone and that doesn't seem great long term. I'm all for a
>> temporary revert to fix a problem while the root cause is understood,
>> but in this case I have a hard time believing that we'll make more
>> progress towards a root cause once the revert lands. All the
>> investigation we've done so far seems to indicate that the revert only
>> fixes the problem by luck...
>>
>> I completely agree that moving the nodes in the DTS is a hack and just
>> hides the problem. However, it also at least limits the workaround to
>> the one board showing the problem and doesn't mean we're stuck with
>> synchronous probe for rpmh-regulator for all eternity because nobody
>> can understand this timing issue on db845c.
>>
> 
> I agree that we shouldn't hide this by reverting the regulator change.
> 
> 
> And as has been stated a few times already, the symptom indicates that
> we have a misconfigured system.
> 
> Before accepting a patch just shuffling the bricks, I'd like to see some
> more analysis of what happens wrt the rpmh right before the timeout.
> Perhaps the landing team can assist here?
> 
> Regards,
> Bjorn
> 
>
Amit Pundir June 22, 2023, 11:48 a.m. UTC | #7
On Thu, 22 Jun 2023 at 13:17, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> As Linus will likely release 6.4 on this or the following Sunday a quick
> status inquiry so I can brief him appropriately: is there any hope the
> regression this patch tried to fix will be resolved any time soon?

We are most likely to miss v6.4. I'm trying to reproduce the crash
with tracing enabled, to share some more debug information.

Regards,
Amit Pundir

> Doesn't look like it from below message and this thread, but maybe I
> missed something.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 20.06.23 17:59, Bjorn Andersson wrote:
> > On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote:
> >> Hi,
> >>
> >> On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski
> >> <krzysztof.kozlowski@linaro.org> wrote:
> >>>
> >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>>> On 02.06.23 18:12, Amit Pundir wrote:
> >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> >>>>> list to workaround a boot regression uncovered by the upstream
> >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
> >>>>>
> >>>>> Without this fix DB845c fail to boot at times because one of the
> >>>>> lvs1 or lvs2 regulators fail to turn ON in time.
> >>>>
> >>>> /me waves friendly
> >>>>
> >>>> FWIW, as it's not obvious: this...
> >>>>
> >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
> >>>>
> >>>> ...is a report about a regression. One that we could still solve before
> >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into
> >>>> sight.
> >>>>
> >>>> When I noticed the reluctant replies to this patch I earlier today asked
> >>>> in the thread with the report what the plan forward was:
> >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/
> >>>>
> >>>> Dough there replied:
> >>>>
> >>>> ```
> >>>> Of the two proposals made (the revert vs. the reordering of the dts),
> >>>> the reordering of the dts seems better. It only affects the one buggy
> >>>> board (rather than preventing us to move to async probe for everyone)
> >>>> and it also has a chance of actually fixing something (changing the
> >>>> order that regulators probe in rpmh-regulator might legitimately work
> >>>> around the problem). That being said, just like the revert the dts
> >>>> reordering is still just papering over the problem and is fragile /
> >>>> not guaranteed to work forever.
> >>>> ```
> >>>>
> >>>> Papering over obviously is not good, but has anyone a better idea to fix
> >>>> this? Or is "not fixing" for some reason an viable option here?
> >>>>
> >>>
> >>> I understand there is a regression, although kernel is not mainline
> >>> (hash df7443a96851 is unknown) and the only solutions were papering the
> >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS
> >>> is not acceptable because it hides actual problem and only solves this
> >>> one particular observed problem, while actual issue is still there. It
> >>> would be nice to be able to reproduce it on real mainline with normal
> >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no
> >>> one did it, right?
> >>
> >> The worry I have about the revert here is that it will never be able
> >> to be undone and that doesn't seem great long term. I'm all for a
> >> temporary revert to fix a problem while the root cause is understood,
> >> but in this case I have a hard time believing that we'll make more
> >> progress towards a root cause once the revert lands. All the
> >> investigation we've done so far seems to indicate that the revert only
> >> fixes the problem by luck...
> >>
> >> I completely agree that moving the nodes in the DTS is a hack and just
> >> hides the problem. However, it also at least limits the workaround to
> >> the one board showing the problem and doesn't mean we're stuck with
> >> synchronous probe for rpmh-regulator for all eternity because nobody
> >> can understand this timing issue on db845c.
> >>
> >
> > I agree that we shouldn't hide this by reverting the regulator change.
> >
> >
> > And as has been stated a few times already, the symptom indicates that
> > we have a misconfigured system.
> >
> > Before accepting a patch just shuffling the bricks, I'd like to see some
> > more analysis of what happens wrt the rpmh right before the timeout.
> > Perhaps the landing team can assist here?
> >
> > Regards,
> > Bjorn
> >
> >
Amit Pundir July 7, 2023, 5:08 a.m. UTC | #8
On Thu, 22 Jun 2023 at 17:18, Amit Pundir <amit.pundir@linaro.org> wrote:
>
> On Thu, 22 Jun 2023 at 13:17, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
> >
> > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > for once, to make this easily accessible to everyone.
> >
> > As Linus will likely release 6.4 on this or the following Sunday a quick
> > status inquiry so I can brief him appropriately: is there any hope the
> > regression this patch tried to fix will be resolved any time soon?
>
> We are most likely to miss v6.4. I'm trying to reproduce the crash
> with tracing enabled, to share some more debug information.

FWIW, I couldn't reproduce this bug on v6.5 merge window commit
d528014517f2 (Revert ".gitignore: ignore *.cover and *.mbx")
on 100+ boot tests last night.
For the time being I'm keeping an eye on it and will trigger the boot
tests occasionally in the v6.5 development cycle.

>
> Regards,
> Amit Pundir
>
diff mbox series

Patch

diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
index e14fe9bbb386..df2fde9063dc 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
@@ -301,6 +301,18 @@  regulators-0 {
 		vdd-l26-supply = <&vreg_s3a_1p35>;
 		vin-lvs-1-2-supply = <&vreg_s4a_1p8>;
 
+		vreg_lvs1a_1p8: lvs1 {
+			regulator-min-microvolt = <1800000>;
+			regulator-max-microvolt = <1800000>;
+			regulator-always-on;
+		};
+
+		vreg_lvs2a_1p8: lvs2 {
+			regulator-min-microvolt = <1800000>;
+			regulator-max-microvolt = <1800000>;
+			regulator-always-on;
+		};
+
 		vreg_s3a_1p35: smps3 {
 			regulator-min-microvolt = <1352000>;
 			regulator-max-microvolt = <1352000>;
@@ -381,18 +393,6 @@  vreg_l26a_1p2: ldo26 {
 			regulator-max-microvolt = <1200000>;
 			regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
 		};
-
-		vreg_lvs1a_1p8: lvs1 {
-			regulator-min-microvolt = <1800000>;
-			regulator-max-microvolt = <1800000>;
-			regulator-always-on;
-		};
-
-		vreg_lvs2a_1p8: lvs2 {
-			regulator-min-microvolt = <1800000>;
-			regulator-max-microvolt = <1800000>;
-			regulator-always-on;
-		};
 	};
 
 	regulators-1 {