diff mbox series

rockchip: Add delay after link-training

Message ID 20200601203041.58631-1-kurt@intricatesoftware.com
State New
Headers show
Series rockchip: Add delay after link-training | expand

Commit Message

Kurt Miller June 1, 2020, 8:30 p.m. UTC
On at least the RockPro64, many cards will trip a
synchronous abort when first accessing PCIe config space
during bus scanning. A delay after link training allows
some of these cards to function.

Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
---
On the RockPro64, some pci cards trip a synchronous abort when scanning the
pci bus. For example with HighPoint Rocket Raid 640L which is based on
Marvell 88SE9230 I see this:

=> pci
"Synchronous Abort" handler, esr 0x96000210
elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
elr: 00000000f4568034 lr : 00000000f4567fd0
x0 : 0000000000100000 x1 : 00000000f8000000
x2 : 0000000000000000 x3 : 0000000000100000
x4 : 00000000f2559290 x5 : 0000000000000000
x6 : 0000000000000001 x7 : 00000000f2559860
x8 : 0000000000000030 x9 : 0000000000000008
x10: 0000000000000010 x11: 00000000f251fd1c
x12: 0000000000001421 x13: 0000000000001468
x14: 00000000f251fd4c x15: 00000000ffffffff
x16: 0000000000060001 x17: 000000000000001f
x18: 00000000f2532dc0 x19: 00000000f251fcd0
x20: 0000000000000001 x21: 0000000000000000
x22: 0000000000010000 x23: 00000000f45d4000
x24: 0000000000000000 x25: 00000000f45bc000
x26: 0000000000000000 x27: 0000000000000000
x28: 00000000f2541440 x29: 00000000f251fc20

Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
Resetting CPU ...

Adding a delay after link training works-around the problem. I added this
delay to the OpenBSD rkpcie driver as well:

https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87

HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield SAS9211-4i
needs a 1 second delay, so I arbitrarily decided on 2 seconds.

The delay work-around was originally discovered by nuumio:
https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee

 drivers/pci/pcie_rockchip.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jagan Teki June 1, 2020, 8:46 p.m. UTC | #1
On Tue, Jun 2, 2020 at 2:00 AM Kurt Miller <kurt at intricatesoftware.com> wrote:
>
> On at least the RockPro64, many cards will trip a
> synchronous abort when first accessing PCIe config space
> during bus scanning. A delay after link training allows
> some of these cards to function.

Can you check does the SoC has external PCIe pwr-pin GPIO?

I did see unstable SSD behavior on rock960 but fixed with this.
https://github.com/radxa/u-boot/blob/stable-4.4-rockpi4/board/rockchip/evb_rk3399/evb-rk3399.c#L168
Kurt Miller June 1, 2020, 9:18 p.m. UTC | #2
On Tue, 2020-06-02 at 02:16 +0530, Jagan Teki wrote:
> On Tue, Jun 2, 2020 at 2:00 AM Kurt Miller <kurt at intricatesoftware.com> wrote:
> > 
> > 
> > On at least the RockPro64, many cards will trip a
> > synchronous abort when first accessing PCIe config space
> > during bus scanning. A delay after link training allows
> > some of these cards to function.
> Can you check does the SoC has external PCIe pwr-pin GPIO?
> 
> I did see unstable SSD behavior on rock960 but fixed with this.
> https://github.com/radxa/u-boot/blob/stable-4.4-rockpi4/board/rockchip/evb_rk3399/evb-rk3399.c#L168

The schematic has:

GPIO1_D0/TCPD_VBUS_SOURCE2_d ---L26---->>PCIE_PWR

and?arch/arm/dts/rk3399-rockpro64.dtsi has:

&pinctrl {
? ? ? ? pcie {
? ? ? ? ? ? ? ? pcie_pwr_en: pcie-pwr-en {
????????????????????????rockchip,pins = <1 RK_PD0 RK_FUNC_GPIO &pcfg_pull_none>;
????????????????};
????????};
};

Does that answer your question? I'm rather new at this so
I may need more guidance if I miss understood your question.

Thanks,
-Kurt
Kever Yang June 2, 2020, 1:59 a.m. UTC | #3
Hi Kurt,

On 2020/6/2 ??4:30, Kurt Miller wrote:
> On at least the RockPro64, many cards will trip a
> synchronous abort when first accessing PCIe config space
> during bus scanning. A delay after link training allows
> some of these cards to function.
>
> Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
> ---
> On the RockPro64, some pci cards trip a synchronous abort when scanning the
> pci bus. For example with HighPoint Rocket Raid 640L which is based on
> Marvell 88SE9230 I see this:
>
> => pci
> "Synchronous Abort" handler, esr 0x96000210
> elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
> elr: 00000000f4568034 lr : 00000000f4567fd0
> x0 : 0000000000100000 x1 : 00000000f8000000
> x2 : 0000000000000000 x3 : 0000000000100000
> x4 : 00000000f2559290 x5 : 0000000000000000
> x6 : 0000000000000001 x7 : 00000000f2559860
> x8 : 0000000000000030 x9 : 0000000000000008
> x10: 0000000000000010 x11: 00000000f251fd1c
> x12: 0000000000001421 x13: 0000000000001468
> x14: 00000000f251fd4c x15: 00000000ffffffff
> x16: 0000000000060001 x17: 000000000000001f
> x18: 00000000f2532dc0 x19: 00000000f251fcd0
> x20: 0000000000000001 x21: 0000000000000000
> x22: 0000000000010000 x23: 00000000f45d4000
> x24: 0000000000000000 x25: 00000000f45bc000
> x26: 0000000000000000 x27: 0000000000000000
> x28: 00000000f2541440 x29: 00000000f251fc20
>
> Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
> Resetting CPU ...
>
> Adding a delay after link training works-around the problem. I added this
> delay to the OpenBSD rkpcie driver as well:
>
> https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87
>
> HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield SAS9211-4i
> needs a 1 second delay, so I arbitrarily decided on 2 seconds.
>
> The delay work-around was originally discovered by nuumio:
> https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee
>
>   drivers/pci/pcie_rockchip.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
> index 0edc2464a8..51cfbf6b18 100644
> --- a/drivers/pci/pcie_rockchip.c
> +++ b/drivers/pci/pcie_rockchip.c
> @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice *dev)
>   		goto err_power_off_phy;
>   	}
>   
> +	/*
> +	 * XXX: On at least the RockPro64, many cards will trip a
> +	 * synchronous abort when first accessing PCIe config space
> +	 * during bus scanning. A delay after link training allows
> +	 * some of these cards to function.
> +	 */
> +	mdelay(2000);

I don't understand what kind of delay for init needs 2 Seconds, root 
cause will preferred.


Thanks,

- Kever

> +
>   	/* Initialize Root Complex registers. */
>   	writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base + PCIE_LM_VENDOR_ID);
>   	writel(PCI_CLASS_BRIDGE_PCI << 16,
Shawn Lin June 2, 2020, 2:23 a.m. UTC | #4
? 2020/6/2 9:59, Kever Yang ??:
> Hi Kurt,
> 
> On 2020/6/2 ??4:30, Kurt Miller wrote:
>> On at least the RockPro64, many cards will trip a
>> synchronous abort when first accessing PCIe config space
>> during bus scanning. A delay after link training allows
>> some of these cards to function.
>>
>> Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
>> ---
>> On the RockPro64, some pci cards trip a synchronous abort when 
>> scanning the
>> pci bus. For example with HighPoint Rocket Raid 640L which is based on
>> Marvell 88SE9230 I see this:
>>
>> => pci
>> "Synchronous Abort" handler, esr 0x96000210
>> elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
>> elr: 00000000f4568034 lr : 00000000f4567fd0
>> x0 : 0000000000100000 x1 : 00000000f8000000
>> x2 : 0000000000000000 x3 : 0000000000100000
>> x4 : 00000000f2559290 x5 : 0000000000000000
>> x6 : 0000000000000001 x7 : 00000000f2559860
>> x8 : 0000000000000030 x9 : 0000000000000008
>> x10: 0000000000000010 x11: 00000000f251fd1c
>> x12: 0000000000001421 x13: 0000000000001468
>> x14: 00000000f251fd4c x15: 00000000ffffffff
>> x16: 0000000000060001 x17: 000000000000001f
>> x18: 00000000f2532dc0 x19: 00000000f251fcd0
>> x20: 0000000000000001 x21: 0000000000000000
>> x22: 0000000000010000 x23: 00000000f45d4000
>> x24: 0000000000000000 x25: 00000000f45bc000
>> x26: 0000000000000000 x27: 0000000000000000
>> x28: 00000000f2541440 x29: 00000000f251fc20
>>
>> Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
>> Resetting CPU ...
>>
>> Adding a delay after link training works-around the problem. I added this
>> delay to the OpenBSD rkpcie driver as well:
>>
>> https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87 
>>
>>
>> HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield 
>> SAS9211-4i
>> needs a 1 second delay, so I arbitrarily decided on 2 seconds.
>>
>> The delay work-around was originally discovered by nuumio:
>> https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee 
>>
>>
>> ? drivers/pci/pcie_rockchip.c | 8 ++++++++
>> ? 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
>> index 0edc2464a8..51cfbf6b18 100644
>> --- a/drivers/pci/pcie_rockchip.c
>> +++ b/drivers/pci/pcie_rockchip.c
>> @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice 
>> *dev)
>> ????????? goto err_power_off_phy;
>> ????? }
>> +??? /*
>> +???? * XXX: On at least the RockPro64, many cards will trip a
>> +???? * synchronous abort when first accessing PCIe config space
>> +???? * during bus scanning. A delay after link training allows
>> +???? * some of these cards to function.
>> +???? */
>> +??? mdelay(2000);
> 
> I don't understand what kind of delay for init needs 2 Seconds, root 
> cause will preferred.

Strictly speaking, how long should we need for this had been provided,
see this:

https://patchwork.kernel.org/patch/11561977/

If you need more delay, I  highly suspect you should check if
the power supply is stable before enabling training.

> 
> 
> Thanks,
> 
> - Kever
> 
>> +
>> ????? /* Initialize Root Complex registers. */
>> ????? writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base + 
>> PCIE_LM_VENDOR_ID);
>> ????? writel(PCI_CLASS_BRIDGE_PCI << 16,
> 
> 
>
Kurt Miller June 2, 2020, 3:12 p.m. UTC | #5
On Tue, 2020-06-02 at 10:23 +0800, Shawn Lin wrote:
> 
> ? 2020/6/2 9:59, Kever Yang ??:
> > 
> > Hi Kurt,
> > 
> > On 2020/6/2 ??4:30, Kurt Miller wrote:
> > > 
> > > On at least the RockPro64, many cards will trip a
> > > synchronous abort when first accessing PCIe config space
> > > during bus scanning. A delay after link training allows
> > > some of these cards to function.
> > > 
> > > Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
> > > ---
> > > On the RockPro64, some pci cards trip a synchronous abort when?
> > > scanning the
> > > pci bus. For example with HighPoint Rocket Raid 640L which is based on
> > > Marvell 88SE9230 I see this:
> > > 
> > > => pci
> > > "Synchronous Abort" handler, esr 0x96000210
> > > elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
> > > elr: 00000000f4568034 lr : 00000000f4567fd0
> > > x0 : 0000000000100000 x1 : 00000000f8000000
> > > x2 : 0000000000000000 x3 : 0000000000100000
> > > x4 : 00000000f2559290 x5 : 0000000000000000
> > > x6 : 0000000000000001 x7 : 00000000f2559860
> > > x8 : 0000000000000030 x9 : 0000000000000008
> > > x10: 0000000000000010 x11: 00000000f251fd1c
> > > x12: 0000000000001421 x13: 0000000000001468
> > > x14: 00000000f251fd4c x15: 00000000ffffffff
> > > x16: 0000000000060001 x17: 000000000000001f
> > > x18: 00000000f2532dc0 x19: 00000000f251fcd0
> > > x20: 0000000000000001 x21: 0000000000000000
> > > x22: 0000000000010000 x23: 00000000f45d4000
> > > x24: 0000000000000000 x25: 00000000f45bc000
> > > x26: 0000000000000000 x27: 0000000000000000
> > > x28: 00000000f2541440 x29: 00000000f251fc20
> > > 
> > > Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
> > > Resetting CPU ...
> > > 
> > > Adding a delay after link training works-around the problem. I added this
> > > delay to the OpenBSD rkpcie driver as well:
> > > 
> > > https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87?
> > > 
> > > 
> > > HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield?
> > > SAS9211-4i
> > > needs a 1 second delay, so I arbitrarily decided on 2 seconds.
> > > 
> > > The delay work-around was originally discovered by nuumio:
> > > https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee?
> > > 
> > > 
> > > ? drivers/pci/pcie_rockchip.c | 8 ++++++++
> > > ? 1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
> > > index 0edc2464a8..51cfbf6b18 100644
> > > --- a/drivers/pci/pcie_rockchip.c
> > > +++ b/drivers/pci/pcie_rockchip.c
> > > @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice?
> > > *dev)
> > > ????????? goto err_power_off_phy;
> > > ????? }
> > > +??? /*
> > > +???? * XXX: On at least the RockPro64, many cards will trip a
> > > +???? * synchronous abort when first accessing PCIe config space
> > > +???? * during bus scanning. A delay after link training allows
> > > +???? * some of these cards to function.
> > > +???? */
> > > +??? mdelay(2000);
> > I don't understand what kind of delay for init needs 2 Seconds, root?
> > cause will preferred.

Hi Kever,

While working on this issue for the OpenBSD PCIe driver I was not
able to determine the root cause. I tested the following adapters:

ROCKPro64 2 Port SATA
StarTech PEXSAT32 2 Port SATA
Samsung 970 Evo NVMe w/m.2 adapter
IO CREST SI-PEX40148 2 Port SATA
IO CREST SI-PEX40057 4 port?
HighPoint Rocket Raid 640L
Crossfield SAS9211-4i
Del PERC H200
Dell PERC 6/i
Intel Gigabit VT Quad Port Server

All of the above adapters successfully link trained, however
three of them would panic upon the first read of the PCI config
space.?nuumio's work-around of placing a delay after link-training
allows these cards to work.

HighPoint Rocket Raid 640L ~1.75 sec delay
Crossfield SAS9211-4i ~1 sec delay
Dell PERC H200 ~1 sec delay

In attempt to determine if there was a way to detect how long
to wait, I compared every status and debug register documented
in the rk3399 TRM part 2 for the PCI controller. I compared the
values pre-delay and post-delay. I was not able to find a value
that would indicate it was safe to proceed.

> Strictly speaking, how long should we need for this had been provided,
> see this:
> 
> https://patchwork.kernel.org/patch/11561977/
> 
> If you need more delay, I??highly suspect you should check if
> the power supply is stable before enabling training.

Hi Shawn,

Thank you for pointing out the need for the 100ms delay before
beginning link-training. I believe this is to correct link
training failures, while the delay after link training is
to work-around post link training reading of PCI config
space on certain cards.

In my testing I discussed above, I suspected that power
was a likely cause. The HighPoint Rocket Raid 640L is
a good test card because it documents its 3.3v power at
0.7 watts:

https://highpoint-tech.com/USA_new/series_r600-specifications.htm

How can I check if the power supply is stable?

Regards,
-Kurt

> > 
> > 
> > 
> > Thanks,
> > 
> > - Kever
> > 
> > > 
> > > +
> > > ????? /* Initialize Root Complex registers. */
> > > ????? writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base +?
> > > PCIE_LM_VENDOR_ID);
> > > ????? writel(PCI_CLASS_BRIDGE_PCI << 16,
> > 
> > 
>
Peter Geis June 3, 2020, 9:17 p.m. UTC | #6
On Tue, Jun 2, 2020 at 11:12 AM Kurt Miller <kurt at intricatesoftware.com> wrote:
>
> On Tue, 2020-06-02 at 10:23 +0800, Shawn Lin wrote:
> >
> > ? 2020/6/2 9:59, Kever Yang ??:
> > >
> > > Hi Kurt,
> > >
> > > On 2020/6/2 ??4:30, Kurt Miller wrote:
> > > >
> > > > On at least the RockPro64, many cards will trip a
> > > > synchronous abort when first accessing PCIe config space
> > > > during bus scanning. A delay after link training allows
> > > > some of these cards to function.
> > > >
> > > > Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
> > > > ---
> > > > On the RockPro64, some pci cards trip a synchronous abort when
> > > > scanning the
> > > > pci bus. For example with HighPoint Rocket Raid 640L which is based on
> > > > Marvell 88SE9230 I see this:
> > > >
> > > > => pci
> > > > "Synchronous Abort" handler, esr 0x96000210
> > > > elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
> > > > elr: 00000000f4568034 lr : 00000000f4567fd0
> > > > x0 : 0000000000100000 x1 : 00000000f8000000
> > > > x2 : 0000000000000000 x3 : 0000000000100000
> > > > x4 : 00000000f2559290 x5 : 0000000000000000
> > > > x6 : 0000000000000001 x7 : 00000000f2559860
> > > > x8 : 0000000000000030 x9 : 0000000000000008
> > > > x10: 0000000000000010 x11: 00000000f251fd1c
> > > > x12: 0000000000001421 x13: 0000000000001468
> > > > x14: 00000000f251fd4c x15: 00000000ffffffff
> > > > x16: 0000000000060001 x17: 000000000000001f
> > > > x18: 00000000f2532dc0 x19: 00000000f251fcd0
> > > > x20: 0000000000000001 x21: 0000000000000000
> > > > x22: 0000000000010000 x23: 00000000f45d4000
> > > > x24: 0000000000000000 x25: 00000000f45bc000
> > > > x26: 0000000000000000 x27: 0000000000000000
> > > > x28: 00000000f2541440 x29: 00000000f251fc20
> > > >
> > > > Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
> > > > Resetting CPU ...
> > > >
> > > > Adding a delay after link training works-around the problem. I added this
> > > > delay to the OpenBSD rkpcie driver as well:
> > > >
> > > > https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87
> > > >
> > > >
> > > > HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield
> > > > SAS9211-4i
> > > > needs a 1 second delay, so I arbitrarily decided on 2 seconds.
> > > >
> > > > The delay work-around was originally discovered by nuumio:
> > > > https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee
> > > >
> > > >
> > > >   drivers/pci/pcie_rockchip.c | 8 ++++++++
> > > >   1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
> > > > index 0edc2464a8..51cfbf6b18 100644
> > > > --- a/drivers/pci/pcie_rockchip.c
> > > > +++ b/drivers/pci/pcie_rockchip.c
> > > > @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice
> > > > *dev)
> > > >           goto err_power_off_phy;
> > > >       }
> > > > +    /*
> > > > +     * XXX: On at least the RockPro64, many cards will trip a
> > > > +     * synchronous abort when first accessing PCIe config space
> > > > +     * during bus scanning. A delay after link training allows
> > > > +     * some of these cards to function.
> > > > +     */
> > > > +    mdelay(2000);
> > > I don't understand what kind of delay for init needs 2 Seconds, root
> > > cause will preferred.
>
> Hi Kever,
>
> While working on this issue for the OpenBSD PCIe driver I was not
> able to determine the root cause. I tested the following adapters:
>
> ROCKPro64 2 Port SATA
> StarTech PEXSAT32 2 Port SATA
> Samsung 970 Evo NVMe w/m.2 adapter
> IO CREST SI-PEX40148 2 Port SATA
> IO CREST SI-PEX40057 4 port
> HighPoint Rocket Raid 640L
> Crossfield SAS9211-4i
> Del PERC H200
> Dell PERC 6/i
> Intel Gigabit VT Quad Port Server
>
> All of the above adapters successfully link trained, however
> three of them would panic upon the first read of the PCI config
> space. nuumio's work-around of placing a delay after link-training
> allows these cards to work.
>
> HighPoint Rocket Raid 640L ~1.75 sec delay
> Crossfield SAS9211-4i ~1 sec delay
> Dell PERC H200 ~1 sec delay
>
> In attempt to determine if there was a way to detect how long
> to wait, I compared every status and debug register documented
> in the rk3399 TRM part 2 for the PCI controller. I compared the
> values pre-delay and post-delay. I was not able to find a value
> that would indicate it was safe to proceed.
>
> > Strictly speaking, how long should we need for this had been provided,
> > see this:
> >
> > https://patchwork.kernel.org/patch/11561977/
> >
> > If you need more delay, I  highly suspect you should check if
> > the power supply is stable before enabling training.
>
> Hi Shawn,
>
> Thank you for pointing out the need for the 100ms delay before
> beginning link-training. I believe this is to correct link
> training failures, while the delay after link training is
> to work-around post link training reading of PCI config
> space on certain cards.

There are similar issues on the Linux driver with various cards
randomly throwing an abort.
If it is power, a 2 second wait to allow cards to stabilize after
power on might be wise for the Linux driver as well.

I imagine some cards take longer to complete power on reset than
others, and attempting to read them immediately after powering them up
could be the issue here.

>
> In my testing I discussed above, I suspected that power
> was a likely cause. The HighPoint Rocket Raid 640L is
> a good test card because it documents its 3.3v power at
> 0.7 watts:
>
> https://highpoint-tech.com/USA_new/series_r600-specifications.htm
>
> How can I check if the power supply is stable?
>
> Regards,
> -Kurt
>
> > >
> > >
> > >
> > > Thanks,
> > >
> > > - Kever
> > >
> > > >
> > > > +
> > > >       /* Initialize Root Complex registers. */
> > > >       writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base +
> > > > PCIE_LM_VENDOR_ID);
> > > >       writel(PCI_CLASS_BRIDGE_PCI << 16,
> > >
> > >
> >
>
Kever Yang June 27, 2020, 12:57 p.m. UTC | #7
Hi Kurt,


On 2020/6/4 ??5:17, Peter Geis wrote:

> On Tue, Jun 2, 2020 at 11:12 AM Kurt Miller <kurt at intricatesoftware.com> wrote:
>> On Tue, 2020-06-02 at 10:23 +0800, Shawn Lin wrote:
>>> ? 2020/6/2 9:59, Kever Yang ??:
>>>> Hi Kurt,
>>>>
>>>> On 2020/6/2 ??4:30, Kurt Miller wrote:
>>>>> On at least the RockPro64, many cards will trip a
>>>>> synchronous abort when first accessing PCIe config space
>>>>> during bus scanning. A delay after link training allows
>>>>> some of these cards to function.
>>>>>
>>>>> Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
>>>>> ---
>>>>> On the RockPro64, some pci cards trip a synchronous abort when
>>>>> scanning the
>>>>> pci bus. For example with HighPoint Rocket Raid 640L which is based on
>>>>> Marvell 88SE9230 I see this:
>>>>>
>>>>> => pci
>>>>> "Synchronous Abort" handler, esr 0x96000210
>>>>> elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
>>>>> elr: 00000000f4568034 lr : 00000000f4567fd0
>>>>> x0 : 0000000000100000 x1 : 00000000f8000000
>>>>> x2 : 0000000000000000 x3 : 0000000000100000
>>>>> x4 : 00000000f2559290 x5 : 0000000000000000
>>>>> x6 : 0000000000000001 x7 : 00000000f2559860
>>>>> x8 : 0000000000000030 x9 : 0000000000000008
>>>>> x10: 0000000000000010 x11: 00000000f251fd1c
>>>>> x12: 0000000000001421 x13: 0000000000001468
>>>>> x14: 00000000f251fd4c x15: 00000000ffffffff
>>>>> x16: 0000000000060001 x17: 000000000000001f
>>>>> x18: 00000000f2532dc0 x19: 00000000f251fcd0
>>>>> x20: 0000000000000001 x21: 0000000000000000
>>>>> x22: 0000000000010000 x23: 00000000f45d4000
>>>>> x24: 0000000000000000 x25: 00000000f45bc000
>>>>> x26: 0000000000000000 x27: 0000000000000000
>>>>> x28: 00000000f2541440 x29: 00000000f251fc20
>>>>>
>>>>> Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
>>>>> Resetting CPU ...
>>>>>
>>>>> Adding a delay after link training works-around the problem. I added this
>>>>> delay to the OpenBSD rkpcie driver as well:
>>>>>
>>>>> https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87
>>>>>
>>>>>
>>>>> HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield
>>>>> SAS9211-4i
>>>>> needs a 1 second delay, so I arbitrarily decided on 2 seconds.
>>>>>
>>>>> The delay work-around was originally discovered by nuumio:
>>>>> https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee
>>>>>
>>>>>
>>>>>    drivers/pci/pcie_rockchip.c | 8 ++++++++
>>>>>    1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
>>>>> index 0edc2464a8..51cfbf6b18 100644
>>>>> --- a/drivers/pci/pcie_rockchip.c
>>>>> +++ b/drivers/pci/pcie_rockchip.c
>>>>> @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice
>>>>> *dev)
>>>>>            goto err_power_off_phy;
>>>>>        }
>>>>> +    /*
>>>>> +     * XXX: On at least the RockPro64, many cards will trip a
>>>>> +     * synchronous abort when first accessing PCIe config space
>>>>> +     * during bus scanning. A delay after link training allows
>>>>> +     * some of these cards to function.
>>>>> +     */
>>>>> +    mdelay(2000);
>>>> I don't understand what kind of delay for init needs 2 Seconds, root
>>>> cause will preferred.
>> Hi Kever,
>>
>> While working on this issue for the OpenBSD PCIe driver I was not
>> able to determine the root cause. I tested the following adapters:
>>
>> ROCKPro64 2 Port SATA
>> StarTech PEXSAT32 2 Port SATA
>> Samsung 970 Evo NVMe w/m.2 adapter
>> IO CREST SI-PEX40148 2 Port SATA
>> IO CREST SI-PEX40057 4 port
>> HighPoint Rocket Raid 640L
>> Crossfield SAS9211-4i
>> Del PERC H200
>> Dell PERC 6/i
>> Intel Gigabit VT Quad Port Server
>>
>> All of the above adapters successfully link trained, however
>> three of them would panic upon the first read of the PCI config
>> space. nuumio's work-around of placing a delay after link-training
>> allows these cards to work.
>>
>> HighPoint Rocket Raid 640L ~1.75 sec delay
>> Crossfield SAS9211-4i ~1 sec delay
>> Dell PERC H200 ~1 sec delay
>>
>> In attempt to determine if there was a way to detect how long
>> to wait, I compared every status and debug register documented
>> in the rk3399 TRM part 2 for the PCI controller. I compared the
>> values pre-delay and post-delay. I was not able to find a value
>> that would indicate it was safe to proceed.
>>
>>> Strictly speaking, how long should we need for this had been provided,
>>> see this:
>>>
>>> https://patchwork.kernel.org/patch/11561977/

I can accept the same fix like kernel which is 100ms, but 2 Second is 
really too much for most cases.


Thanks,

- Kever

>>>
>>> If you need more delay, I  highly suspect you should check if
>>> the power supply is stable before enabling training.
>> Hi Shawn,
>>
>> Thank you for pointing out the need for the 100ms delay before
>> beginning link-training. I believe this is to correct link
>> training failures, while the delay after link training is
>> to work-around post link training reading of PCI config
>> space on certain cards.
> There are similar issues on the Linux driver with various cards
> randomly throwing an abort.
> If it is power, a 2 second wait to allow cards to stabilize after
> power on might be wise for the Linux driver as well.
>
> I imagine some cards take longer to complete power on reset than
> others, and attempting to read them immediately after powering them up
> could be the issue here.
>
>> In my testing I discussed above, I suspected that power
>> was a likely cause. The HighPoint Rocket Raid 640L is
>> a good test card because it documents its 3.3v power at
>> 0.7 watts:
>>
>> https://highpoint-tech.com/USA_new/series_r600-specifications.htm
>>
>> How can I check if the power supply is stable?
>>
>> Regards,
>> -Kurt
>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> - Kever
>>>>
>>>>> +
>>>>>        /* Initialize Root Complex registers. */
>>>>>        writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base +
>>>>> PCIE_LM_VENDOR_ID);
>>>>>        writel(PCI_CLASS_BRIDGE_PCI << 16,
>>>>
Kurt Miller June 30, 2020, 5:57 p.m. UTC | #8
On Sat, 2020-06-27 at 20:57 +0800, Kever Yang wrote:
> Hi Kurt,
> 
> 
> On 2020/6/4 ??5:17, Peter Geis wrote:
> 
> > 
> > On Tue, Jun 2, 2020 at 11:12 AM Kurt Miller <kurt at intricatesoftware.com> wrote:
> > > 
> > > On Tue, 2020-06-02 at 10:23 +0800, Shawn Lin wrote:
> > > > 
> > > > ? 2020/6/2 9:59, Kever Yang ??:
> > > > > 
> > > > > Hi Kurt,
> > > > > 
> > > > > On 2020/6/2 ??4:30, Kurt Miller wrote:
> > > > > > 
> > > > > > On at least the RockPro64, many cards will trip a
> > > > > > synchronous abort when first accessing PCIe config space
> > > > > > during bus scanning. A delay after link training allows
> > > > > > some of these cards to function.
> > > > > > 
> > > > > > Signed-off-by: Kurt Miller <kurt at intricatesoftware.com>
> > > > > > ---
> > > > > > On the RockPro64, some pci cards trip a synchronous abort when
> > > > > > scanning the
> > > > > > pci bus. For example with HighPoint Rocket Raid 640L which is based on
> > > > > > Marvell 88SE9230 I see this:
> > > > > > 
> > > > > > => pci
> > > > > > "Synchronous Abort"?handler, esr 0x96000210
> > > > > > elr: 000000000022d034 lr : 000000000022cfd0 (reloc)
> > > > > > elr: 00000000f4568034 lr : 00000000f4567fd0
> > > > > > x0 : 0000000000100000 x1 : 00000000f8000000
> > > > > > x2 : 0000000000000000 x3 : 0000000000100000
> > > > > > x4 : 00000000f2559290 x5 : 0000000000000000
> > > > > > x6 : 0000000000000001 x7 : 00000000f2559860
> > > > > > x8 : 0000000000000030 x9 : 0000000000000008
> > > > > > x10: 0000000000000010 x11: 00000000f251fd1c
> > > > > > x12: 0000000000001421 x13: 0000000000001468
> > > > > > x14: 00000000f251fd4c x15: 00000000ffffffff
> > > > > > x16: 0000000000060001 x17: 000000000000001f
> > > > > > x18: 00000000f2532dc0 x19: 00000000f251fcd0
> > > > > > x20: 0000000000000001 x21: 0000000000000000
> > > > > > x22: 0000000000010000 x23: 00000000f45d4000
> > > > > > x24: 0000000000000000 x25: 00000000f45bc000
> > > > > > x26: 0000000000000000 x27: 0000000000000000
> > > > > > x28: 00000000f2541440 x29: 00000000f251fc20
> > > > > > 
> > > > > > Code: 540000c1 350000a5 93407c00 f9400081 (b8616800)
> > > > > > Resetting CPU ...
> > > > > > 
> > > > > > Adding a delay after link training works-around the problem. I added this
> > > > > > delay to the OpenBSD rkpcie driver as well:
> > > > > > 
> > > > > > https://github.com/openbsd/src/commit/9857dee3520d8ca5bec68538f4b0708d7e64fc87
> > > > > > 
> > > > > > 
> > > > > > HighPoint Rocket Raid 640L needs a 1.75 sec delay and Crossfield
> > > > > > SAS9211-4i
> > > > > > needs a 1 second delay, so I arbitrarily decided on 2 seconds.
> > > > > > 
> > > > > > The delay work-around was originally discovered by nuumio:
> > > > > > https://github.com/nuumio/linux-kernel/commit/5a65b17686002dc84d461bffa324a2cb68e67aee
> > > > > > 
> > > > > > 
> > > > > > ???drivers/pci/pcie_rockchip.c | 8 ++++++++
> > > > > > ???1 file changed, 8 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
> > > > > > index 0edc2464a8..51cfbf6b18 100644
> > > > > > --- a/drivers/pci/pcie_rockchip.c
> > > > > > +++ b/drivers/pci/pcie_rockchip.c
> > > > > > @@ -288,6 +288,14 @@ static int rockchip_pcie_init_port(struct udevice
> > > > > > *dev)
> > > > > > ???????????goto err_power_off_phy;
> > > > > > ???????}
> > > > > > +????/*
> > > > > > +?????* XXX: On at least the RockPro64, many cards will trip a
> > > > > > +?????* synchronous abort when first accessing PCIe config space
> > > > > > +?????* during bus scanning. A delay after link training allows
> > > > > > +?????* some of these cards to function.
> > > > > > +?????*/
> > > > > > +????mdelay(2000);
> > > > > I don't understand what kind of delay for init needs 2 Seconds, root
> > > > > cause will preferred.
> > > Hi Kever,
> > > 
> > > While working on this issue for the OpenBSD PCIe driver I was not
> > > able to determine the root cause. I tested the following adapters:
> > > 
> > > ROCKPro64 2 Port SATA
> > > StarTech PEXSAT32 2 Port SATA
> > > Samsung 970 Evo NVMe w/m.2 adapter
> > > IO CREST SI-PEX40148 2 Port SATA
> > > IO CREST SI-PEX40057 4 port
> > > HighPoint Rocket Raid 640L
> > > Crossfield SAS9211-4i
> > > Del PERC H200
> > > Dell PERC 6/i
> > > Intel Gigabit VT Quad Port Server
> > > 
> > > All of the above adapters successfully link trained, however
> > > three of them would panic upon the first read of the PCI config
> > > space. nuumio's work-around of placing a delay after link-training
> > > allows these cards to work.
> > > 
> > > HighPoint Rocket Raid 640L ~1.75 sec delay
> > > Crossfield SAS9211-4i ~1 sec delay
> > > Dell PERC H200 ~1 sec delay
> > > 
> > > In attempt to determine if there was a way to detect how long
> > > to wait, I compared every status and debug register documented
> > > in the rk3399 TRM part 2 for the PCI controller. I compared the
> > > values pre-delay and post-delay. I was not able to find a value
> > > that would indicate it was safe to proceed.
> > > 
> > > > 
> > > > Strictly speaking, how long should we need for this had been provided,
> > > > see this:
> > > > 
> > > > https://patchwork.kernel.org/patch/11561977/
> I can accept the same fix like kernel which is 100ms, but 2 Second is?
> really too much for most cases.
> 

Hi Kever,

Thank you for following up on this. The 100ms delay before link-training
appears to be solving a different issue where some cards fail to link
train. From the comment in the referenced patch "Otherwise we do see some
failures for link training."

The delay I am proposing is after successful link-training to avoid the
synchronous abort that happens when scanning the pci bus and attempting
read the pci config space for some cards.

I performed a test where I added the 100ms delay before link-training
and it did not correct the synchronous abort on the?HighPoint Rocket
Raid 640L.

Shawn Lin suggested I check if the power supply is stable before
enabling training. How would I go about doing this??

Regards,
-Kurt

> 
> Thanks,
> 
> - Kever
> 
> > 
> > > 
> > > > 
> > > > 
> > > > If you need more delay, I??highly suspect you should check if
> > > > the power supply is stable before enabling training.
> > > Hi Shawn,
> > > 
> > > Thank you for pointing out the need for the 100ms delay before
> > > beginning link-training. I believe this is to correct link
> > > training failures, while the delay after link training is
> > > to work-around post link training reading of PCI config
> > > space on certain cards.
> > There are similar issues on the Linux driver with various cards
> > randomly throwing an abort.
> > If it is power, a 2 second wait to allow cards to stabilize after
> > power on might be wise for the Linux driver as well.
> > 
> > I imagine some cards take longer to complete power on reset than
> > others, and attempting to read them immediately after powering them up
> > could be the issue here.
> > 
> > > 
> > > In my testing I discussed above, I suspected that power
> > > was a likely cause. The HighPoint Rocket Raid 640L is
> > > a good test card because it documents its 3.3v power at
> > > 0.7 watts:
> > > 
> > > https://highpoint-tech.com/USA_new/series_r600-specifications.htm
> > > 
> > > How can I check if the power supply is stable?
> > > 
> > > Regards,
> > > -Kurt
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > - Kever
> > > > > 
> > > > > > 
> > > > > > +
> > > > > > ???????/* Initialize Root Complex registers. */
> > > > > > ???????writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base +
> > > > > > PCIE_LM_VENDOR_ID);
> > > > > > ???????writel(PCI_CLASS_BRIDGE_PCI << 16,
>
diff mbox series

Patch

diff --git a/drivers/pci/pcie_rockchip.c b/drivers/pci/pcie_rockchip.c
index 0edc2464a8..51cfbf6b18 100644
--- a/drivers/pci/pcie_rockchip.c
+++ b/drivers/pci/pcie_rockchip.c
@@ -288,6 +288,14 @@  static int rockchip_pcie_init_port(struct udevice *dev)
 		goto err_power_off_phy;
 	}
 
+	/*
+	 * XXX: On at least the RockPro64, many cards will trip a
+	 * synchronous abort when first accessing PCIe config space
+	 * during bus scanning. A delay after link training allows
+	 * some of these cards to function.
+	 */
+	mdelay(2000);
+
 	/* Initialize Root Complex registers. */
 	writel(PCIE_LM_VENDOR_ROCKCHIP, priv->apb_base + PCIE_LM_VENDOR_ID);
 	writel(PCI_CLASS_BRIDGE_PCI << 16,