diff mbox series

[v4,1/5] dt-bindings: PCI: brcmstb: brcm,{enable-l1ss,completion-timeout-us} props

Message ID 20230428223500.23337-2-jim2101024@gmail.com
State New
Headers show
Series PCI: brcmstb: Configure appropriate HW CLKREQ# mode | expand

Commit Message

Jim Quinlan April 28, 2023, 10:34 p.m. UTC
This commit introduces two new properties:

brcm,enable-l1ss (bool):

  The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
  requires the driver probe() to deliberately place the HW one of three
  CLKREQ# modes:

  (a) CLKREQ# driven by the RC unconditionally
  (b) CLKREQ# driven by the EP for ASPM L0s, L1
  (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).

  The HW+driver can tell the difference between downstream devices that
  need (a) and (b), but does not know when to configure (c).  All devices
  should work fine when the driver chooses (a) or (b), but (c) may be
  desired to realize the extra power savings that L1SS offers.  So we
  introduce the boolean "brcm,enable-l1ss" property to inform the driver
  that (c) is desired.  Setting this property only makes sense when the
  downstream device is L1SS-capable and the OS is configured to activate
  this mode (e.g. policy==superpowersave).

  This property is already present in the Raspian version of Linux, but the
  upstream driver implementaion that follows adds more details and discerns
  between (a) and (b).

brcm,completion-timeout-us (u32):

  Our HW will cause a CPU abort on any PCI transaction completion abort
  error.  It makes sense then to increase the timeout value for this type
  of error in hopes that the response is merely delayed.  Further,
  L1SS-capable devices may have a long L1SS exit time and may require a
  custom timeout value: we've been asked by our customers to make this
  configurable for just this reason.

Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 .../devicetree/bindings/pci/brcm,stb-pcie.yaml   | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Bjorn Helgaas April 30, 2023, 7:10 p.m. UTC | #1
On Fri, Apr 28, 2023 at 06:34:55PM -0400, Jim Quinlan wrote:
> This commit introduces two new properties:

Doing two things makes this a candidate for splitting into two
patches, as you've already done for the driver support.  They seem
incidentally related but not indivisible.

> brcm,enable-l1ss (bool):
> 
>   The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
>   requires the driver probe() to deliberately place the HW one of three
>   CLKREQ# modes:
> 
>   (a) CLKREQ# driven by the RC unconditionally
>   (b) CLKREQ# driven by the EP for ASPM L0s, L1
>   (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
> 
>   The HW+driver can tell the difference between downstream devices that
>   need (a) and (b), but does not know when to configure (c).  All devices
>   should work fine when the driver chooses (a) or (b), but (c) may be
>   desired to realize the extra power savings that L1SS offers.  So we
>   introduce the boolean "brcm,enable-l1ss" property to inform the driver
>   that (c) is desired.  Setting this property only makes sense when the
>   downstream device is L1SS-capable and the OS is configured to activate
>   this mode (e.g. policy==superpowersave).

Is this related to the existing generic "supports-clkreq" property?  I
guess not, because supports-clkreq looks like a description of CLKREQ
signal routing, while brcm,enable-l1ss looks like a description of
what kind of downstream device is present?

What bad things would happen if the driver always configured (c)?

Other platforms don't require this, and having to edit the DT based on
what PCIe device is plugged in seems wrong.  If brcmstb does need it,
that suggests a hardware defect.  If we need this to work around a
defect, that's OK, but we should acknowledge the defect so we can stop
using this for future hardware that doesn't need it.

Maybe the name should be more specific to CLKREQ#, since this doesn't
actually *enable* L1SS; apparently it's just one of the pieces needed
to enable L1SS?

>   This property is already present in the Raspian version of Linux, but the
>   upstream driver implementaion that follows adds more details and discerns
>   between (a) and (b).

s/implementaion/implementation/

> brcm,completion-timeout-us (u32):
> 
>   Our HW will cause a CPU abort on any PCI transaction completion abort
>   error.  It makes sense then to increase the timeout value for this type
>   of error in hopes that the response is merely delayed.  Further,
>   L1SS-capable devices may have a long L1SS exit time and may require a
>   custom timeout value: we've been asked by our customers to make this
>   configurable for just this reason.

I asked before whether this should be made generic and not
brcm-specific, since completion timeouts are generic PCIe things.  I
didn't see any discussion, but Rob reviewed this so I guess it's OK
as-is.

Is there something unique about brcm that requires this?  I think it's
common for PCIe Completion Timeouts to cause CPU aborts.

Surely other drivers need to configure the completion timeout, but
pcie-rcar-host.c and pcie-rcar-ep.c are the only ones I could find.
Maybe the brcmstb power-up values are just too small?  Does the
correct value need to be in DT, or could it just be built into the
driver?

This sounds like something dependent on the downstream device
connected, which again sounds hard for users to deal with.  How would
they know what to use here?

> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> ---
>  .../devicetree/bindings/pci/brcm,stb-pcie.yaml   | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> index 7e15aae7d69e..239cc95545bd 100644
> --- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> +++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> @@ -64,6 +64,22 @@ properties:
>  
>    aspm-no-l0s: true
>  
> +  brcm,enable-l1ss:
> +    description: Indicates that PCIe L1SS power savings
> +      are desired, the downstream device is L1SS-capable, and the
> +      OS has been configured to enable this mode.  For boards
> +      using a mini-card connector, this mode may not meet the
> +      TCRLon maximum time of 400ns, as specified in 3.2.5.2.5
> +      of the PCI Express Mini CEM 2.0 specification.
> +    type: boolean
> +
> +  brcm,completion-timeout-us:
> +    description: Number of microseconds before PCI transaction
> +      completion timeout abort is signalled.
> +    minimum: 16
> +    default: 1000000
> +    maximum: 19884107
> +
>    brcm,scb-sizes:
>      description: u64 giving the 64bit PCIe memory
>        viewport size of a memory controller.  There may be up to
> -- 
> 2.17.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Jim Quinlan May 3, 2023, 2:38 p.m. UTC | #2
On Sun, Apr 30, 2023 at 3:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Apr 28, 2023 at 06:34:55PM -0400, Jim Quinlan wrote:
> > This commit introduces two new properties:
>
> Doing two things makes this a candidate for splitting into two
> patches, as you've already done for the driver support.  They seem
> incidentally related but not indivisible.
>
> > brcm,enable-l1ss (bool):
> >
> >   The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> >   requires the driver probe() to deliberately place the HW one of three
> >   CLKREQ# modes:
> >
> >   (a) CLKREQ# driven by the RC unconditionally
> >   (b) CLKREQ# driven by the EP for ASPM L0s, L1
> >   (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
> >
> >   The HW+driver can tell the difference between downstream devices that
> >   need (a) and (b), but does not know when to configure (c).  All devices
> >   should work fine when the driver chooses (a) or (b), but (c) may be
> >   desired to realize the extra power savings that L1SS offers.  So we
> >   introduce the boolean "brcm,enable-l1ss" property to inform the driver
> >   that (c) is desired.  Setting this property only makes sense when the
> >   downstream device is L1SS-capable and the OS is configured to activate
> >   this mode (e.g. policy==superpowersave).
>
> Is this related to the existing generic "supports-clkreq" property?  I
> guess not, because supports-clkreq looks like a description of CLKREQ
> signal routing, while brcm,enable-l1ss looks like a description of
> what kind of downstream device is present?

It is related, I thought about using it,  but not helpful for our needs.  Both
cases (b) and (c)  assume "supports-clkreq", and our HW needs to know
the difference between them.  Further, we have a register that tells
us if the endpoint device has requested a CLKREQ#, so we already
have this information.

As an aside, I would think that the "supports-clkreq" property should be in
the port-driver or endpoint node.

>
> What bad things would happen if the driver always configured (c)?
Well, our driver has traditionally only supported (b) and our existing
boards have
been designed with this in mind.  I would not want to switch modes w'o
the user/customer/engineer opting-in to do so.
Further, the PCIe HW engineer
told me defaulting to (c) was a bad idea and was "asking for trouble".
Note that the commit's comment has that warning about L1SS mode
not meeting this 400ns spec, and I suspect that many of our existing designs
have bumped into that.

But to answer your question, I haven't found a scenario that did not work
by setting mode (c).  That doesn't mean they are not out there.

>
> Other platforms don't require this, and having to edit the DT based on
> what PCIe device is plugged in seems wrong.  If brcmstb does need it,
> that suggests a hardware defect.  If we need this to work around a
> defect, that's OK, but we should acknowledge the defect so we can stop
> using this for future hardware that doesn't need it.

All devices should work w/o the user having to change the DT.  Only if they
desire L1SS must they add the "brcm,enable-l1ss" property.

Now there is this case where Cyril has found a regression, but recent
investigation
into this indicates that this particular failure was due to the RPi
CM4 using a "beta" eeprom
version -- after updating, it works fine.

>
> Maybe the name should be more specific to CLKREQ#, since this doesn't
> actually *enable* L1SS; apparently it's just one of the pieces needed
> to enable L1SS?

The other pieces are:  (a) policy == POWERSUPERSAVE and (b) an L1SS-capable
device, which seem unrelated and are out of the scope of the driver.

The RPi Raspian folks have been using "brcm,enable-l1ss"  for a while now and
I would prefer to keep that name for compatibility.

>
> >   This property is already present in the Raspian version of Linux, but the
> >   upstream driver implementaion that follows adds more details and discerns
> >   between (a) and (b).
>
> s/implementaion/implementation/
>
> > brcm,completion-timeout-us (u32):
> >
> >   Our HW will cause a CPU abort on any PCI transaction completion abort
> >   error.  It makes sense then to increase the timeout value for this type
> >   of error in hopes that the response is merely delayed.  Further,
> >   L1SS-capable devices may have a long L1SS exit time and may require a
> >   custom timeout value: we've been asked by our customers to make this
> >   configurable for just this reason.
>
> I asked before whether this should be made generic and not
> brcm-specific, since completion timeouts are generic PCIe things.  I
> didn't see any discussion, but Rob reviewed this so I guess it's OK
> as-is.
I am going to drop it, thanks for questioning its purpose and
I apologize for  the noise.

Regards,
Jim Quinlan
Broadcom STB
>
> Is there something unique about brcm that requires this?  I think it's
> common for PCIe Completion Timeouts to cause CPU aborts.
>
> Surely other drivers need to configure the completion timeout, but
> pcie-rcar-host.c and pcie-rcar-ep.c are the only ones I could find.
> Maybe the brcmstb power-up values are just too small?  Does the
> correct value need to be in DT, or could it just be built into the
> driver?
>
> This sounds like something dependent on the downstream device
> connected, which again sounds hard for users to deal with.  How would
> they know what to use here?
>
> > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > Reviewed-by: Rob Herring <robh@kernel.org>
> > ---
> >  .../devicetree/bindings/pci/brcm,stb-pcie.yaml   | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > index 7e15aae7d69e..239cc95545bd 100644
> > --- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > +++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > @@ -64,6 +64,22 @@ properties:
> >
> >    aspm-no-l0s: true
> >
> > +  brcm,enable-l1ss:
> > +    description: Indicates that PCIe L1SS power savings
> > +      are desired, the downstream device is L1SS-capable, and the
> > +      OS has been configured to enable this mode.  For boards
> > +      using a mini-card connector, this mode may not meet the
> > +      TCRLon maximum time of 400ns, as specified in 3.2.5.2.5
> > +      of the PCI Express Mini CEM 2.0 specification.
> > +    type: boolean
> > +
> > +  brcm,completion-timeout-us:
> > +    description: Number of microseconds before PCI transaction
> > +      completion timeout abort is signalled.
> > +    minimum: 16
> > +    default: 1000000
> > +    maximum: 19884107
> > +
> >    brcm,scb-sizes:
> >      description: u64 giving the 64bit PCIe memory
> >        viewport size of a memory controller.  There may be up to
> > --
> > 2.17.1
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
index 7e15aae7d69e..239cc95545bd 100644
--- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
+++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
@@ -64,6 +64,22 @@  properties:
 
   aspm-no-l0s: true
 
+  brcm,enable-l1ss:
+    description: Indicates that PCIe L1SS power savings
+      are desired, the downstream device is L1SS-capable, and the
+      OS has been configured to enable this mode.  For boards
+      using a mini-card connector, this mode may not meet the
+      TCRLon maximum time of 400ns, as specified in 3.2.5.2.5
+      of the PCI Express Mini CEM 2.0 specification.
+    type: boolean
+
+  brcm,completion-timeout-us:
+    description: Number of microseconds before PCI transaction
+      completion timeout abort is signalled.
+    minimum: 16
+    default: 1000000
+    maximum: 19884107
+
   brcm,scb-sizes:
     description: u64 giving the 64bit PCIe memory
       viewport size of a memory controller.  There may be up to