[v7,2/3] PCI: Enable PCIe Relaxed Ordering if supported

Message ID 1499955692-26556-3-git-send-email-dingtianhong@huawei.com
State Superseded
Series Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Commit Message

Ding Tianhong July 13, 2017, 2:21 p.m. UTC
Bit 4 of the PCIe Device Control register indicates whether the device
is permitted to use relaxed ordering. Relaxed ordering is not safe on
some platforms that can only use strong write ordering, so devices are
allowed (but not required) to set the relaxed ordering enable bit by
default.

If a PCIe device does not have the relaxed ordering enable bit set by
default, we should not touch its PCIe configuration. Otherwise we check
whether any device above it is marked with the
PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag. If one is, relaxed ordering is
not supported there and we clear the relaxed ordering enable bit in the
device's configuration space. If the device above us doesn't exist or
isn't a PCIe device, we skip the update, because we are probably
running in a guest machine.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

---
 drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
 drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

-- 
1.8.3.1

Comments

Sinan Kaya July 13, 2017, 9:09 p.m. UTC | #1
On 7/13/2017 10:21 AM, Ding Tianhong wrote:
> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the relaxed ordering enable bit is not set, do nothing. */
> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
> +	}
> +}


I couldn't find anywhere where you actually enable the relaxed ordering
like the subject suggests.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Ding Tianhong July 14, 2017, 1:26 a.m. UTC | #2
On 2017/7/14 5:09, Sinan Kaya wrote:
> On 7/13/2017 10:21 AM, Ding Tianhong wrote:
>> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +	/* We should not alter the relaxed ordering bit for the VF */
>> +	if (dev->is_virtfn)
>> +		return;
>> +
>> +	/* If the relaxed ordering enable bit is not set, do nothing. */
>> +	if (!pcie_relaxed_ordering_supported(dev))
>> +		return;
>> +
>> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
>> +		pcie_clear_relaxed_ordering(dev);
>> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
>> +	}
>> +}
> 
> I couldn't find anywhere where you actually enable the relaxed ordering
> like the subject suggests.
> 

There is no code to enable the PCIe Relaxed Ordering bit in the configuration space;
it is only enabled by default according to the PCIe specification. What we
do is identify platforms with a problematic RC and clear the Relaxed Ordering bit
to tell the PCIe EP not to send any TLPs with the Relaxed Ordering Attribute to the Root
Complex.
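
A minimal, self-contained sketch of the clear-only behaviour described above. It is a userspace model, not the posted patch: `struct toy_dev`, `FLAG_NO_RELAXED_ORDERING`, and the `toy_*` functions are hypothetical stand-ins for `struct pci_dev`, `PCI_DEV_FLAGS_NO_RELAXED_ORDERING`, and the helpers in the series. The only value taken as fact is that Enable Relaxed Ordering is bit 4 (0x0010) of Device Control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering. */
#define DEVCTL_RELAX_EN 0x0010u

/* Hypothetical stand-in for PCI_DEV_FLAGS_NO_RELAXED_ORDERING. */
#define FLAG_NO_RELAXED_ORDERING 0x1u

/* Toy device model for illustration only; this is NOT struct pci_dev. */
struct toy_dev {
	struct toy_dev *parent;  /* upstream bridge or root port, NULL at the top */
	unsigned int flags;      /* quirk flags set for problematic platforms */
	uint16_t devctl;         /* simulated Device Control register */
	bool is_virtfn;          /* true for an SR-IOV Virtual Function */
};

/* True if the device itself, or anything upstream of it, is flagged. */
bool toy_should_disable_ro(const struct toy_dev *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->flags & FLAG_NO_RELAXED_ORDERING)
			return true;
	return false;
}

/*
 * Clear-only policy: the enable bit is never set here, only cleared.
 * Returns true if the bit was cleared by this call.
 */
bool toy_configure_ro(struct toy_dev *dev)
{
	assert(dev != NULL);

	if (dev->is_virtfn)                     /* leave VFs alone */
		return false;
	if (!(dev->devctl & DEVCTL_RELAX_EN))   /* not enabled: nothing to do */
		return false;
	if (!toy_should_disable_ro(dev))        /* hierarchy is fine with RO */
		return false;

	dev->devctl &= (uint16_t)~DEVCTL_RELAX_EN;
	return true;
}
```

Calling `toy_configure_ro()` twice on the same device clears the bit on the first call and is a no-op on the second, which mirrors the point that nothing in this path ever turns Relaxed Ordering on.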

Thanks
Ding
Sinan Kaya July 14, 2017, 1:54 p.m. UTC | #3
On 7/13/2017 9:26 PM, Ding Tianhong wrote:
> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space;
> it is only enabled by default according to the PCIe specification. What we
> do is identify platforms with a problematic RC and clear the Relaxed Ordering bit
> to tell the PCIe EP not to send any TLPs with the Relaxed Ordering Attribute to the Root
> Complex.


Maybe, you should change the patch commit as 
"Disable PCIe Relaxed Ordering if not supported"...

-- 
Sinan Kaya
Ding Tianhong July 22, 2017, 4:19 a.m. UTC | #4
Hi Sinan, Bjorn:

On 2017/7/14 21:54, Sinan Kaya wrote:
> On 7/13/2017 9:26 PM, Ding Tianhong wrote:

>> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,

>> it is only be enable by default according to the PCIe Standard Specification, what we

>> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit

>> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root

>> Complex.

> 

> Maybe, you should change the patch commit as 

> "Disable PCIe Relaxed Ordering if not supported"...


I agree to use the new commit title you suggested, thanks. :)

@Bjorn, do you want me to send a new patchset with the new commit title
and the Reviewed-by from Casey on patch 3, or could you pick this
up and modify it on your own? Thanks.

Ding

>
Alex Williamson July 24, 2017, 3:05 p.m. UTC | #5
On Sat, 22 Jul 2017 12:19:38 +0800
Ding Tianhong <dingtianhong@huawei.com> wrote:

> Hi Sinan, Bjorn:

> 

> On 2017/7/14 21:54, Sinan Kaya wrote:

> > On 7/13/2017 9:26 PM, Ding Tianhong wrote:  

> >> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,

> >> it is only be enable by default according to the PCIe Standard Specification, what we

> >> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit

> >> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root

> >> Complex.  

> > 

> > Maybe, you should change the patch commit as 

> > "Disable PCIe Relaxed Ordering if not supported"...  

> 

> I agree that to use the new commit title as your suggested, thanks. :)

> 

> @Bjorn do you want me to spawn a new patchset with the new commit title

> and the Reviewed-by from Casey on the patch 3, or maybe you could pick this

> up and modify it own ? thanks.


Hi Ding,

Bjorn is currently on holiday so it might be a good idea to respin the
series with any updates so nothing is lost.  Thanks,

Alex
Casey Leedom July 26, 2017, 7:05 p.m. UTC | #6
| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, July 26, 2017 11:44 AM
| 
| On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote:
| |
| |     I think that the patch will need to be extended to modify
| |     drivers/pci/iov.c:sriov_enable() to explicitly turn off
| |     Relaxed Ordering Enable if the Root Complex is marked
| |     for no RO TLPs.
| 
| I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings?

Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O
Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),
governing the PCIe Capability Device Control register.  It states that the VF
version of that register shall follow the setting of the corresponding PF.

So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same
way we did for the cxgb4 driver, but that's not critical since the Relaxed
Ordering Enable supersedes the internal chip's desire to use the Relaxed
Ordering Attribute.
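
A small sketch of the two points made above, as a toy model rather than driver code: a VF's Relaxed Ordering Enable follows its PF's Device Control setting, and that enable gates whatever the chip internally asks for. `struct toy_pf`, `struct toy_vf`, and both functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering. */
#define TOY_DEVCTL_RELAX_EN 0x0010u

/* Toy Physical Function: only the simulated Device Control register. */
struct toy_pf {
	uint16_t devctl;
};

/* Toy Virtual Function: has no enable bit of its own in this model. */
struct toy_vf {
	const struct toy_pf *pf;
};

/* The VF's effective enable is whatever its PF currently has. */
bool toy_vf_ro_enabled(const struct toy_vf *vf)
{
	assert(vf != NULL && vf->pf != NULL);
	return (vf->pf->devctl & TOY_DEVCTL_RELAX_EN) != 0;
}

/*
 * A TLP carries the Relaxed Ordering attribute only if the chip asks
 * for it AND the enable permits it; the enable always wins.
 */
bool toy_tlp_has_ro(const struct toy_vf *vf, bool chip_requests_ro)
{
	return chip_requests_ro && toy_vf_ro_enabled(vf);
}
```

This is why the cxgb4vf change is described as not critical: once the PF's enable is cleared, `toy_tlp_has_ro()` is false regardless of what the driver requests.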

Ding, send me a note if you'd like me to work that up for you.

| Also I thought most of the VF configuration space is read only.

Yes, but not all of it.  And when a VF is exported to a Virtual Machine,
then the Hypervisor captures and interprets all accesses to the VF's
PCIe Configuration Space from the VM.

Thanks again for reminding me of the subtle aspect of the SR-IOV
specification that I forgot.

Casey
Ding Tianhong July 27, 2017, 1:01 a.m. UTC | #7
On 2017/7/27 3:05, Casey Leedom wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>

> | Sent: Wednesday, July 26, 2017 11:44 AM

> | 

> | On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote:

> | |

> | |     I think that the patch will need to be extended to modify

> | |     drivers/pci.c/iov.c:sriov_enable() to explicitly turn off

> | |     Relaxed Ordering Enable if the Root Complex is marked

> |     for no RO TLPs.

> | 

> | I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings?

> 

> Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O

> Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),

> governing the PCIe Capability Device Control register.  It states that the VF

> version of that register shall follow the setting of the corresponding PF.

> 

> So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same

> way we did for the cxgb4 driver, but that's not critical since the Relaxed

> Ordering Enable supersedes the internal chip's desire to use the Relaxed

> Ordering Attribute.

> 

> Ding, send me a note if you'd like me to work that up for you.

> 


OK, you could send me the change and I will include it in the v8 series.
Will you base it on patch 3/3 or build an independent patch?

Ding

> | Also I thought most of the VF configuration space is read only.

> 

> Yes, but not all of it.  And when a VF is exported to a Virtual Machine,

> then the Hypervisor captures and interprets all accesses to the VF's

> PCIe Configuration Space from the VM.

> 

> Thanks again for reminding me of the subtle aspect of the SR_IOV

> specification that I forgot.

> 

> Casey

> .

>
Ding Tianhong July 27, 2017, 1:08 a.m. UTC | #8
On 2017/7/27 2:26, Casey Leedom wrote:
>   By the way Ding, two issues:

> 

>  1. Did we ever get any acknowledgement from either Intel or AMD

>     on this patch?  I know that we can't ensure that, but it sure would

>     be nice since the PCI Quirks that we're putting in affect their

>     products.

> 


Nobody from Intel or AMD has acked this yet, which is what I am worried about. Should I
ping someone again?

Thanks
Ding
> 

> Casey

> .

>
Casey Leedom July 27, 2017, 5:44 p.m. UTC | #9
| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, July 26, 2017 6:01 PM
|
| On 2017/7/27 3:05, Casey Leedom wrote:
| >
| > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up
| > for you.
|
| Ok, you could send the change log and I could put it in the v8 version
| together, will you base on the patch 3/3 or build a independence patch?

Whichever you'd prefer.  It would basically mirror the same exact code that
you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability
Device Control[Relaxed Ordering Enable], setting a new flag in
adapter->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().
But since the VF's PF will already have disabled the PF's Relaxed Ordering
Enable, the VF will also have its Relaxed Ordering Enable disabled and any
effort by the internal chip to send TLPs with the Relaxed Ordering Attribute
will be gated by the PCIe logic.  So it's not critical that this be in the
first patch.  Your call.  Let me know if you'd like me to send that to you.


| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, July 26, 2017 6:08 PM
|
| On 2017/7/27 2:26, Casey Leedom wrote:
| >
| >  1. Did we ever get any acknowledgement from either Intel or AMD
| >     on this patch?  I know that we can't ensure that, but it sure would
| >     be nice since the PCI Quirks that we're putting in affect their
| >     products.
|
| Still no Intel and AMD guys has ack this, this is what I am worried about,
| should I ping some man again ?

By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note
yesterday with a link to the official Intel performance tuning documentation
which covers this issue:

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

In section 3.9.1 we have:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the software should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

Unfortunately that's a pretty thin section.  But it does expand the set of
Intel Root Complexes which our Linux PCI Quirk will need to cover.  So
you should add those to the next (and hopefully final) spin of your patch.
And, it also verifies the need to handle the use of Relaxed Ordering more
subtly than simply turning it off since the NVMe peer-to-peer example I
keep bringing up would fall into the "need to use Relaxed Ordering" case ...
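
To make the quoted guidance concrete, here is a hedged sketch of a lookup over the Table 3-6 root-port device ID ranges plus the per-destination advice from section 3.9.1. It is illustrative only and is not the quirk code from the series; the `toy_*` names and the two enums are hypothetical, and 0x8086 is the Intel PCI vendor ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VENDOR_INTEL 0x8086u

/* Where a device-initiated access is headed. */
enum toy_dest {
	TOY_DEST_COHERENT_MEMORY,  /* system memory behind the Root Complex */
	TOY_DEST_MMIO_P2P          /* MMIO region of a peer device */
};

enum toy_ro_advice {
	TOY_RO_NO_ADVICE,  /* root port not listed: the table says nothing */
	TOY_RO_SET,        /* set the RO bit in the TLP header */
	TOY_RO_CLEAR       /* clear the RO bit in the TLP header */
};

/* Root-port device IDs from Table 3-6 as quoted above. */
bool toy_is_listed_intel_rp(uint16_t vendor, uint16_t device)
{
	if (vendor != TOY_VENDOR_INTEL)
		return false;
	return (device >= 0x6F01 && device <= 0x6F0E) ||  /* Broadwell Xeon */
	       (device >= 0x2F01 && device <= 0x2F0E);    /* Haswell Xeon */
}

/* Section 3.9.1 advice: RO towards MMIO (P2P), no RO towards memory. */
enum toy_ro_advice toy_ro_advice_for(uint16_t vendor, uint16_t device,
				     enum toy_dest dest)
{
	assert(dest == TOY_DEST_COHERENT_MEMORY || dest == TOY_DEST_MMIO_P2P);

	if (!toy_is_listed_intel_rp(vendor, device))
		return TOY_RO_NO_ADVICE;
	return dest == TOY_DEST_MMIO_P2P ? TOY_RO_SET : TOY_RO_CLEAR;
}
```

The per-destination split is exactly what a single enable bit in Device Control cannot express, which is the limitation the NVMe peer-to-peer example runs into.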

It would have been nice to know why this is happening and if any future
processor would fix this.  After all, Relaxed Ordering is just supposed to
be a hint.  At worst, a receiving device could just ignore the attribute
entirely.  Obviously someone made an effort to implement it but ... it
didn't go the way they wanted.

And, it also would have been nice to know if there was any hidden register
in these Intel Root Complexes which can completely turn off the effort to
pay attention to the Relaxed Ordering Attribute.  We've spent an enormous
amount of effort on this issue here on the Linux PCI email list struggling
mightily to come up with a way to determine when it's
safe/recommended/not-recommended/unsafe to use Relaxed Ordering when
directing TLPs towards the Root Complex.  And some architectures require RO
for decent performance so we can't just "turn it off" unilaterally.

Casey
Alexander Duyck July 27, 2017, 5:49 p.m. UTC | #10
On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>

>

> On 2017/7/27 2:26, Casey Leedom wrote:

>>   By the way Ding, two issues:

>>

>>  1. Did we ever get any acknowledgement from either Intel or AMD

>>     on this patch?  I know that we can't ensure that, but it sure would

>>     be nice since the PCI Quirks that we're putting in affect their

>>     products.

>>

>

> Still no Intel and AMD guys has ack this, this is what I am worried about, should I

> ping some man again ?

>

> Thanks

> Ding



I probably wouldn't worry about it too much. If anything all this
patch is doing is disabling relaxed ordering on the platforms we know
have issues based on what Casey originally had. If nothing else we can
follow up once the patches are in the kernel and if somebody has an
issue then.

You can include my acked-by, but it is mostly related to how this
interacts with NICs, and not so much about the PCI chipsets
themselves.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Raj, Ashok July 27, 2017, 6:42 p.m. UTC | #11
Hi Casey

> | Still no Intel and AMD guys has ack this, this is what I am worried about,

> | should I ping some man again ?



I can ack the patch set for Intel specific changes. Now that the doc is made
public :-).

Can you/Ding resend the patch series? I do have the most recent v7, but some
of the commit messages weren't easy to read. Seems like this patch has
gotten bigger than originally intended, but seems to be for the overall
good :-).

Sorry for staying silent up until now.

Cheers,
Ashok
Ding Tianhong July 28, 2017, 2:48 a.m. UTC | #12
On 2017/7/28 1:44, Casey Leedom wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>

> | Sent: Wednesday, July 26, 2017 6:01 PM

> |

> | On 2017/7/27 3:05, Casey Leedom wrote:

> | >

> | > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up

> | > for you.

> |

> | Ok, you could send the change log and I could put it in the v8 version

> | together, will you base on the patch 3/3 or build a independence patch?

> 

> Which ever you'd prefer.  It would basically mirror the same exact code that

> you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability

> Device Control[Relaxed Ordering Enable], setting a new flag in

> adapter->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().

> But since the VF's PF will already have disabled the PF's Relaxed Ordering

> Enable, the VF will also have its Relaxed Ordering Enable disabled and any

> effort by the internal chip to send TLPs with the Relaxed Ordering Attribute

> will be gated by the PCIe logic.  So it's not critical that this be in the

> first patch.  Your call.  Let me know if you'd like me to send that to you.

> 


Good, please send it to me. I will put it together and send v8 this week;
I think Bjorn will be back next week. :)

> 

> | From: Ding Tianhong <dingtianhong@huawei.com>

> | Sent: Wednesday, July 26, 2017 6:08 PM

> |

> | On 2017/7/27 2:26, Casey Leedom wrote:

> | >

> | >  1. Did we ever get any acknowledgement from either Intel or AMD

> | >     on this patch?  I know that we can't ensure that, but it sure would

> | >     be nice since the PCI Quirks that we're putting in affect their

> | >     products.

> |

> | Still no Intel and AMD guys has ack this, this is what I am worried about,

> | should I ping some man again ?

> 

> By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note

> yesterday with a link to the official Intel performance tuning documentation

> which covers this issue:

> 

> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

> 

> In section 3.9.1 we have:

> 

>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory

>           and Toward MMIO Regions (P2P)

> 

>     In order to maximize performance for PCIe devices in the processors

>     listed in Table 3-6 below, the software should determine whether the

>     accesses are toward coherent memory (system memory) or toward MMIO

>     regions (P2P access to other devices). If the access is toward MMIO

>     region, then software can command HW to set the RO bit in the TLP

>     header, as this would allow hardware to achieve maximum throughput for

>     these types of accesses. For accesses toward coherent memory, software

>     can command HW to clear the RO bit in the TLP header (no RO), as this

>     would allow hardware to achieve maximum throughput for these types of

>     accesses.

> 

>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing

>                PCIe Performance

> 

>     Processor                            CPU RP Device IDs

> 

>     Intel Xeon processors based on       6F01H-6F0EH

>     Broadwell microarchitecture

> 

>     Intel Xeon processors based on       2F01H-2F0EH

>     Haswell microarchitecture

> 

> Unfortunately that's a pretty thin section.  But it does expand the set of

> Intel Root Complexes for which our Linux PCI Quirk will need to cover.  So

> you should add those to the next (and hopefully final) spin of your patch.

> And, it also verifies the need to handle the use of Relaxed Ordering more

> subtly than simply turning it off since the NVMe peer-to-peer example I

> keep bringing up would fall into the "need to use Relaxed Ordering" case ...

> 

> It would have been nice to know why this is happening and if any future

> processor would fix this.  After all, Relaxed Ordering, is just supposed to

> be a hint.  At worst, a receiving device could just ignore the attribute

> entirely.  Obviously someone made an effort to implement it but ... it

> didn't go the way they wanted.

> 

> And, it also would have been nice to know if there was any hidden register

> in these Intel Root Complexes which can completely turn off the effort to

> pay attention to the Relaxed Ordering Attribute.  We've spent an enormous

> amount of effort on this issue here on the Linux PCI email list struggling

> mightily to come up with a way to determine when it's

> safe/recommended/not-recommended/unsafe to use Relaxed Ordering when

> directing TLPs towards the Root Complex.  And some architectures require RO

> for decent performance so we can't just "turn it off" unilaterally.

> 


I am glad to hear that more people are focusing on this problem. It would be great
if they could join our discussion and give us more suggestions. :)

Thanks
Ding

> Casey

> 

> .

>
Ding Tianhong July 28, 2017, 2:57 a.m. UTC | #13
On 2017/7/28 2:42, Raj, Ashok wrote:
> Hi Casey

> 

>> | Still no Intel and AMD guys has ack this, this is what I am worried about,

>> | should I ping some man again ?

> 

> 

> I can ack the patch set for Intel specific changes. Now that the doc is made

> public :-).

> 


Good, Thanks. :)

> Can you/Ding resend the patch series, i do have the most recent v7, some

> of the commit message wasn't easy to read. Seems like this patch has

> gotten bigger than originally intended, but seems to be for the overall

> good :-).

> 


OK, I will send the v8 patch set, which will update the patch title and add
Casey's new modification for his VF driver, thanks.

Ding

> Sorry for staying silent up until now.

> 

> Cheers,

> Ashok

> 

> .

>
Ding Tianhong July 28, 2017, 3 a.m. UTC | #14
On 2017/7/28 1:49, Alexander Duyck wrote:
> On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:

>>

>>

>> On 2017/7/27 2:26, Casey Leedom wrote:

>>>   By the way Ding, two issues:

>>>

>>>  1. Did we ever get any acknowledgement from either Intel or AMD

>>>     on this patch?  I know that we can't ensure that, but it sure would

>>>     be nice since the PCI Quirks that we're putting in affect their

>>>     products.

>>>

>>

>> Still no Intel and AMD guys has ack this, this is what I am worried about, should I

>> ping some man again ?

>>

>> Thanks

>> Ding

> 

> 

> I probably wouldn't worry about it too much. If anything all this

> patch is doing is disabling relaxed ordering on the platforms we know

> have issues based on what Casey originally had. If nothing else we can

> follow up once the patches are in the kernel and if somebody has an

> issue then.

> 

> You can include my acked-by, but it is mostly related to how this

> interacts with NICs, and not so much about the PCI chipsets

> themselves.

> 

> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

> 


Thanks, Alex. :)

> .

>
Casey Leedom Aug. 2, 2017, 5:53 p.m. UTC | #15
Okay, here you go.  As you can tell, it's almost a trivial copy of the
cxgb4 patch.
 
  By the way, I realized that we have yet another hole which is likely not
to be fixable.  If we're dealing with a problematic Root Complex, and we
instantiate Virtual Functions and attach them to a Virtual Machine along
with an NVMe device which can deal with Relaxed Ordering TLPs, the VF driver
in the VM will be able to tell that it shouldn't attempt to send RO TLPs to
the RC because it will see the state of its own PCIe Capability Device
Control[Relaxed Ordering Enable] (a copy of the setting in the VF's
corresponding PF), but it won't be able to change that and send non-RO TLPs
to the RC, and RO TLPs to the NVMe device.  Oh well.

  I sure wish that the Intel guys would pop up with a hidden register change
for these problematic Intel RCs that perform poorly with RO TLPs.  Their
silence has been frustrating.

Casey

----------

cxgb4vf Ethernet driver now queries PCIe configuration space to determine if
it can send TLPs to it with the Relaxed Ordering Attribute set.

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
 	USING_MSI          = (1UL << 1),
 	USING_MSIX         = (1UL << 2),
 	QUEUES_BOUND       = (1UL << 3),
+	ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..59e7639 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	 */
 	adapter->name = pci_name(pdev);
 	adapter->msg_enable = DFLT_MSG_ENABLE;
+
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * sent without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * PCIe devices under such Root Complexes will have the Relaxed
+	 * Ordering bit cleared in their configuration space, so we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	err = adap_init0(adapter);
 	if (err)
 		goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 	struct port_info *pi = netdev_priv(dev);
 	struct fw_iq_cmd cmd, rpl;
 	int ret, iqandst, flsz = 0;
+	int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/*
 	 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 			cpu_to_be32(
 				FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
 				FW_IQ_CMD_FL0PACKEN_F |
+				FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+				FW_IQ_CMD_FL0DATARO_V(relaxed) |
 				FW_IQ_CMD_FL0PADEN_F);
 
 		/* In T6, for egress queue type FL there is internal overhead

Raj, Ashok Aug. 3, 2017, 8:31 a.m. UTC | #16
Hi Casey

On Wed, Aug 02, 2017 at 05:53:52PM +0000, Casey Leedom wrote:
>   Okay, here you go.  As you can tell, it's almost a trivial copy of the

> cxgb4 patch.

>  

>   By the way, I realized that we have yet another hole which is likely not

> to be fixable.  If we're dealing with a problematic Root Complex, and we

> instantiate Virtual Functions and attach them to a Virtual Machine along

> with an NVMe device which can deal with Relaxed Ordering TLPs, the VF driver

> in the VM will be able to tell that it shouldn't attempt to send RO TLPs to

> the RC because it will see the state of its own PCIe Capability Device

> Control[Relaxed Ordering Enable] (a copy of the setting in the VF's

> corresponding PF), but it won't be able to change that and send non-RO TLPs

> to the RC, and RO TLPs to the NVMe device.  Oh well.


I don't understand this completely.. So your driver would know not to send RO
TLP's to root complex. But you want to send RO to the NVMe device? This is the
peer-2-peer case correct?

The issue in the current patchset is that we decide to turn off the device
capability for all devices in the hierarchy, so one would expect that
the NVMe also would have RO turned off, I suppose.

The other approach is to not turn off the device capability, but let the
driver do the right thing, i.e. for transactions towards system memory vs.
peer-2-peer. But we wanted to take a big hammer approach because on
some platforms there can be data corruption and we can't trust guest
drivers to do the right thing. This isn't something we can fix in this
current version.

One possible approach is to provide a strict flag, where we use this heavy 
hammer approach only on platforms that have a serious implication, and the 
other is we let the driver do the right thing depending on the platform.

Worst case if the driver doesn't do the right thing, you would see perf issues
but nothing bad would happen. It would allow you to select when to turn on 
RO and when to turn it off.
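
One way the strict-flag idea above could be expressed, as a sketch only. Nothing like this exists in the posted series: the issue classes, the action enum, and both functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-platform classification of Relaxed Ordering trouble. */
enum toy_ro_issue {
	TOY_RO_FINE,            /* no known problem */
	TOY_RO_PERF_ONLY,       /* RO towards memory is slow but harmless */
	TOY_RO_DATA_CORRUPTION  /* RO can corrupt data: the "strict" case */
};

enum toy_ro_action {
	TOY_RO_LEAVE_ALONE,      /* do not touch the enable bit */
	TOY_RO_ADVISE_DRIVER,    /* keep the bit, let the driver choose */
	TOY_RO_CLEAR_ENABLE_BIT  /* big hammer: clear it in config space */
};

/* Use the big hammer only where getting it wrong is serious. */
enum toy_ro_action toy_ro_policy(enum toy_ro_issue issue)
{
	switch (issue) {
	case TOY_RO_DATA_CORRUPTION:
		return TOY_RO_CLEAR_ENABLE_BIT;
	case TOY_RO_PERF_ONLY:
		return TOY_RO_ADVISE_DRIVER;
	case TOY_RO_FINE:
		return TOY_RO_LEAVE_ALONE;
	}
	assert(0 && "unknown issue class");
	return TOY_RO_CLEAR_ENABLE_BIT;  /* fail safe if asserts are off */
}

/* What a well-behaved driver would then do for a given destination. */
bool toy_driver_may_use_ro(enum toy_ro_action action, bool towards_root_complex)
{
	if (action == TOY_RO_CLEAR_ENABLE_BIT)
		return false;                  /* gated in hardware anyway */
	if (action == TOY_RO_ADVISE_DRIVER)
		return !towards_root_complex;  /* RO for peer-to-peer only */
	return true;
}
```

Under this split a misbehaving driver on a `TOY_RO_PERF_ONLY` platform costs throughput and nothing worse, while the data-corruption case never depends on driver or guest cooperation.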

Cheers,
Ashok
Raj, Ashok Aug. 3, 2017, 9:13 a.m. UTC | #17
Hi Ding

patch looks good, except would reword the patch description for clarity

here is my crack at it, feel free to use.

On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote:
> The PCIe Device Control Register use the bit 4 to indicate that

> whether the device is permitted to enable relaxed ordering or not.

> But relaxed ordering is not safe for some platform which could only

> use strong write ordering, so devices are allowed (but not required)

> to enable relaxed ordering bit by default.

> 

> If a PCIe device didn't enable the relaxed ordering attribute default,

> we should not do anything in the PCIe configuration, otherwise we

> should check if any of the devices above us do not support relaxed

> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on

> the result if we get a return that indicate that the relaxed ordering

> is not supported we should update our device to disable relaxed ordering

> in configuration space. If the device above us doesn't exist or isn't

> the PCIe device, we shouldn't do anything and skip updating relaxed ordering

> because we are probably running in a guest machine.


Bit 4 of the PCIe Device Control register indicates whether the device is
permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or,
due to erratum, can cause data corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off
relaxed ordering by clearing the capability for this device.

> 

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

> ---

>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++

>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++

>  include/linux/pci.h |  2 ++

>  3 files changed, 68 insertions(+)

> 

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c

> index d88edf5..7a6b32f 100644

> --- a/drivers/pci/pci.c

> +++ b/drivers/pci/pci.c

> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)

>  EXPORT_SYMBOL(pcie_set_mps);

>  

>  /**

> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit

> + * @dev: PCI device to query

> + *

> + * If possible clear relaxed ordering

> + */

> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)

> +{

> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,

> +					  PCI_EXP_DEVCTL_RELAX_EN);

> +}

> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

> +

> +/**

> + * pcie_relaxed_ordering_supported - Probe for PCIe relaxed ordering support

> + * @dev: PCI device to query

> + *

> + * Returns true if the device supports the relaxed ordering attribute.

> + */

> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)

> +{

> +	u16 v;

> +

> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);

> +

> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);

> +}

> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

> +

> +/**

>   * pcie_get_minimum_link - determine minimum link settings of a PCI device

>   * @dev: PCI device to query

>   * @speed: storage for minimum speed

> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c

> index c31310d..48df012 100644

> --- a/drivers/pci/probe.c

> +++ b/drivers/pci/probe.c

> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)

>  					 PCI_EXP_DEVCTL_EXT_TAG);

>  }

>  

> +/**

> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device

> + * should disable the relaxed ordering attribute.

> + * @dev: PCI device

> + *

> + * Return true if any of the PCI devices above us do not support

> + * relaxed ordering.

> + */

> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)

> +{

> +	while (dev) {

> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)

> +			return true;

> +

> +		dev = dev->bus->self;

> +	}

> +

> +	return false;

> +}

> +

> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)

> +{

> +	/* We should not alter the relaxed ordering bit for the VF */

> +	if (dev->is_virtfn)

> +		return;

> +

> +	/* If the relaxed ordering enable bit is not set, do nothing. */

> +	if (!pcie_relaxed_ordering_supported(dev))

> +		return;

> +

> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {

> +		pcie_clear_relaxed_ordering(dev);

> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");

> +	}

> +}

> +

>  static void pci_configure_device(struct pci_dev *dev)

>  {

>  	struct hotplug_params hpp;

> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)

>  

>  	pci_configure_mps(dev);

>  	pci_configure_extended_tags(dev);

> +	pci_configure_relaxed_ordering(dev);

>  

>  	memset(&hpp, 0, sizeof(hpp));

>  	ret = pci_get_hp_params(dev, &hpp);

> diff --git a/include/linux/pci.h b/include/linux/pci.h

> index 412ec1c..3aa23a2 100644

> --- a/include/linux/pci.h

> +++ b/include/linux/pci.h

> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,

>  void pci_pme_wakeup_bus(struct pci_bus *bus);

>  void pci_d3cold_enable(struct pci_dev *dev);

>  void pci_d3cold_disable(struct pci_dev *dev);

> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);

> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);

>  

>  /* PCI Virtual Channel */

>  int pci_save_vc_state(struct pci_dev *dev);

> -- 

> 1.8.3.1

> 

>
Ding Tianhong Aug. 3, 2017, 10:22 a.m. UTC | #18
On 2017/8/3 17:13, Raj, Ashok wrote:
> Hi Ding
> 
> patch looks good, except would reword the patch description for clarity
> 
> here is my crack at it, feel free to use.
> 
> On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote:
>> The PCIe Device Control Register use the bit 4 to indicate that
>> whether the device is permitted to enable relaxed ordering or not.
>> But relaxed ordering is not safe for some platform which could only
>> use strong write ordering, so devices are allowed (but not required)
>> to enable relaxed ordering bit by default.
>>
>> If a PCIe device didn't enable the relaxed ordering attribute default,
>> we should not do anything in the PCIe configuration, otherwise we
>> should check if any of the devices above us do not support relaxed
>> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
>> the result if we get a return that indicate that the relaxed ordering
>> is not supported we should update our device to disable relaxed ordering
>> in configuration space. If the device above us doesn't exist or isn't
>> the PCIe device, we shouldn't do anything and skip updating relaxed ordering
>> because we are probably running in a guest machine.
> 
> Bit 4 of the PCIe Device Control register indicates whether the device
> is permitted to use relaxed ordering.
> On some platforms, using relaxed ordering can cause performance issues or,
> due to errata, data corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. In such cases the patch turns off
> relaxed ordering by clearing the capability for this device.
> 


Good, thanks for the commit, I will send v8 and update the patch description.

Ding

>>

>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

>> ---

>>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++

>>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++

>>  include/linux/pci.h |  2 ++

>>  3 files changed, 68 insertions(+)

>>

>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c

>> index d88edf5..7a6b32f 100644

>> --- a/drivers/pci/pci.c

>> +++ b/drivers/pci/pci.c

>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)

>>  EXPORT_SYMBOL(pcie_set_mps);

>>  

>>  /**

>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit

>> + * @dev: PCI device to query

>> + *

>> + * If possible clear relaxed ordering

>> + */

>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)

>> +{

>> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,

>> +					  PCI_EXP_DEVCTL_RELAX_EN);

>> +}

>> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

>> +

>> +/**

>> + * pcie_relaxed_ordering_supported - Probe for PCIe relaxed ordering support

>> + * @dev: PCI device to query

>> + *

>> + * Returns true if the device supports the relaxed ordering attribute.

>> + */

>> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)

>> +{

>> +	u16 v;

>> +

>> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);

>> +

>> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);

>> +}

>> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

>> +

>> +/**

>>   * pcie_get_minimum_link - determine minimum link settings of a PCI device

>>   * @dev: PCI device to query

>>   * @speed: storage for minimum speed

>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c

>> index c31310d..48df012 100644

>> --- a/drivers/pci/probe.c

>> +++ b/drivers/pci/probe.c

>> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)

>>  					 PCI_EXP_DEVCTL_EXT_TAG);

>>  }

>>  

>> +/**

>> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device

>> + * should disable the relaxed ordering attribute.

>> + * @dev: PCI device

>> + *

>> + * Return true if any of the PCI devices above us do not support

>> + * relaxed ordering.

>> + */

>> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)

>> +{

>> +	while (dev) {

>> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)

>> +			return true;

>> +

>> +		dev = dev->bus->self;

>> +	}

>> +

>> +	return false;

>> +}

>> +

>> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)

>> +{

>> +	/* We should not alter the relaxed ordering bit for the VF */

>> +	if (dev->is_virtfn)

>> +		return;

>> +

>> +	/* If the relaxed ordering enable bit is not set, do nothing. */

>> +	if (!pcie_relaxed_ordering_supported(dev))

>> +		return;

>> +

>> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {

>> +		pcie_clear_relaxed_ordering(dev);

>> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");

>> +	}

>> +}

>> +

>>  static void pci_configure_device(struct pci_dev *dev)

>>  {

>>  	struct hotplug_params hpp;

>> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)

>>  

>>  	pci_configure_mps(dev);

>>  	pci_configure_extended_tags(dev);

>> +	pci_configure_relaxed_ordering(dev);

>>  

>>  	memset(&hpp, 0, sizeof(hpp));

>>  	ret = pci_get_hp_params(dev, &hpp);

>> diff --git a/include/linux/pci.h b/include/linux/pci.h

>> index 412ec1c..3aa23a2 100644

>> --- a/include/linux/pci.h

>> +++ b/include/linux/pci.h

>> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,

>>  void pci_pme_wakeup_bus(struct pci_bus *bus);

>>  void pci_d3cold_enable(struct pci_dev *dev);

>>  void pci_d3cold_disable(struct pci_dev *dev);

>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);

>> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);

>>  

>>  /* PCI Virtual Channel */

>>  int pci_save_vc_state(struct pci_dev *dev);

>> -- 

>> 1.8.3.1

>>

>>

> 

> .

>
Raj, Ashok Aug. 4, 2017, 8:21 p.m. UTC | #19
On Fri, Aug 04, 2017 at 08:20:37PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Thursday, August 3, 2017 1:31 AM
> |
> | I don't understand this completely.. So your driver would know not to send
> | RO TLP's to root complex. But you want to send RO to the NVMe device? This
> | is the peer-2-peer case correct?
> 
> Yes, this is the "heavy hammer" issue which you alluded to later.  There are
> applications where a device will want to send TLPs to a Root Complex without
> Relaxed Ordering set, but will want to use it when sending TLPs to a Peer
> device (say, an NVMe storage device).  The current approach doesn't make
> that easy ... and in fact, I still don't know how to code a solution for this
> with the proposed APIs.  This means that we may be trading off one
> performance problem for another and that Relaxed Ordering may be doomed for
> use under Linux for the foreseeable future.
> 
> As I've noted a number of times, it would be great if the Intel Hardware
> Engineers who attempted to implement the Relaxed Ordering semantics in the
> current generation of Root Complexes had left the ability to turn off the
> logic which is obviously not working.  If there was a way to disable the
> logic via an undocumented register, then we could have the Linux PCI Quirk
> do that.  Since Relaxed Ordering is just a hint, it's completely legitimate
> to completely ignore it.


I suppose you are looking for the existence of a chicken bit to instruct the
port to ignore RO traffic. Then all we would do is turn the chicken bit on,
and p2p traffic would still be permitted to use RO since we wouldn't turn
off the capability as currently proposed.

Let me look into that and keep you posted.

Cheers,
Ashok
> 

> Casey
Casey Leedom Aug. 4, 2017, 8:48 p.m. UTC | #20
| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Friday, August 4, 2017 1:21 PM
|
| On Fri, Aug 04, 2017 at 08:20:37PM +0000, Casey Leedom wrote:
| > ...
| > As I've noted a number of times, it would be great if the Intel Hardware
| > Engineers who attempted to implement the Relaxed Ordering semantics in the
| > current generation of Root Complexes had left the ability to turn off the
| > logic which is obviously not working.  If there was a way to disable the
| > logic via an undocumented register, then we could have the Linux PCI Quirk
| > do that.  Since Relaxed Ordering is just a hint, it's completely legitimate
| > to completely ignore it.
|
| I suppose you are looking for the existence of a chicken bit to instruct the
| port to ignore RO traffic. Then all we would do is turn the chicken bit on,
| and p2p traffic would still be permitted to use RO since we wouldn't turn
| off the capability as currently proposed.
|
| Let me look into that and keep you posted.

Huh, I'd never heard it called a "chicken bit" before, but yes, that's what
I'm talking about.

Whenever our Hardware Designers implement new functionality in our hardware,
they almost always put in A. several "knobs" which can control fundamental
parameters of the new Hardware Feature, and B.  a mechanism of completely
disabling it if necessary.  This stems from the incredibly long Design ->
Deployment cycle for Hardware (as opposed to the edit->compile->run cycle for s!

It's obvious that handling Relaxed Ordering is a new Hardware Feature for
Intel's Root Complexes since previous versions simply ignored it (because
that's legal[1]).  If I was a Hardware Engineer tasked with implementing
Relaxed Ordering semantics for a device, I would certainly have also
implemented a switch to turn it off in case there were unintended
consequences (performance in this case).

And if there is such a mechanism to simply disable processing of Relaxed
Ordering semantics in the Root Complex, that would be a far easier "fix" for
this problem ... and leave the code in place to continue sending Relaxed
Ordering TLPs for a future Root Complex implementation which got it right ...
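
To illustrate the difference between the two approaches, here is a small
sketch. It is only a model: struct ro_policy, the root_port_ignores_ro
field and ro_effective() are invented names, no such Root Port control is
documented anywhere in this thread, and the model assumes the Peer device
honors Relaxed Ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a TLP from the endpoint is headed. */
enum tlp_dest { DEST_ROOT_COMPLEX, DEST_PEER };

struct ro_policy {
	/* Enable Relaxed Ordering bit in the endpoint's Device Control. */
	bool endpoint_ro_enabled;
	/* Hypothetical "chicken bit" in the Root Port (an assumption). */
	bool root_port_ignores_ro;
};

/*
 * Return true if a TLP that the driver would like to send with Relaxed
 * Ordering ends up being handled with relaxed ordering at its destination.
 */
bool ro_effective(const struct ro_policy *p, enum tlp_dest dest)
{
	assert(p);

	if (!p->endpoint_ro_enabled)
		return false;	/* device may not set the attribute at all */
	if (dest == DEST_ROOT_COMPLEX && p->root_port_ignores_ro)
		return false;	/* attribute set, but ordered strictly */
	return true;
}
```

Clearing the endpoint's enable bit (the currently proposed approach) gives
false for both destinations. Leaving it set and turning on the hypothetical
Root Port switch gives false for the Root Complex but true for the Peer,
which is the behaviour I'm after.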

Casey

[1] One can't ~quite~ just ignore the Relaxed Ordering Attribute on an
    incoming Transaction Layer Packet Request: PCIe completion rules (see
    section 2.2.9 of the PCIe 3.0 specification) require that the Relaxed
    Ordering and No Snoop Attributes in a Request TLP be reflected back
    verbatim in any corresponding Response TLP.  (The rules for ID-Based
    Ordering are more complex.)
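
    As a rough sketch of that reflection rule (the macro names and
    completion_attr() are illustrative, not from any real header; the
    encoding assumes Attr[1] is Relaxed Ordering, Attr[0] is No Snoop and
    Attr[2] is ID-Based Ordering, and the ID-Based Ordering bit is simply
    left to the completer here):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding of the 3-bit TLP Attr field for this sketch. */
#define TLP_ATTR_NO_SNOOP	0x1u	/* Attr[0] */
#define TLP_ATTR_RELAXED_ORDER	0x2u	/* Attr[1] */
#define TLP_ATTR_IDO		0x4u	/* Attr[2], not reflected here */
#define TLP_ATTR_MASK		0x7u
#define TLP_ATTR_REFLECTED	(TLP_ATTR_NO_SNOOP | TLP_ATTR_RELAXED_ORDER)

/*
 * Build the Attr field of a completion: Relaxed Ordering and No Snoop are
 * copied verbatim from the request, the remaining bit comes from the
 * completer's own choice.
 */
uint8_t completion_attr(uint8_t request_attr, uint8_t completer_attr)
{
	assert((request_attr & ~TLP_ATTR_MASK) == 0);
	assert((completer_attr & ~TLP_ATTR_MASK) == 0);

	return (uint8_t)((request_attr & TLP_ATTR_REFLECTED) |
			 (completer_attr & TLP_ATTR_IDO));
}
```

    So even a completer that treats every request as strictly ordered
    still has to echo the requester's Relaxed Ordering bit in the
    completion header.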
David Laight Aug. 7, 2017, 9:04 a.m. UTC | #21
From: Casey Leedom
> Sent: 04 August 2017 21:49
...
> Whenever our Hardware Designers implement new functionality in our hardware,
> they almost always put in A. several "knobs" which can control fundamental
> parameters of the new Hardware Feature, and B.  a mechanism of completely
> disabling it if necessary.  This stems from the incredibly long Design ->
> Deployment cycle for Hardware (as opposed to the edit->compile->run cycle for s!


Indeed, I'd also expect there to be an undocumented flag to turn
it on (broken) in earlier parts to allow testing.

	David
Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d88edf5..7a6b32f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@  int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+					  PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relaxed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device supports the relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+	u16 v;
+
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@  static void pci_configure_extended_tags(struct pci_dev *dev)
 					 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+	while (dev) {
+		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+			return true;
+
+		dev = dev->bus->self;
+	}
+
+	return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+	/* We should not alter the relaxed ordering bit for the VF */
+	if (dev->is_virtfn)
+		return;
+
+	/* If the relaxed ordering enable bit is not set, do nothing. */
+	if (!pcie_relaxed_ordering_supported(dev))
+		return;
+
+	if (pci_dev_should_disable_relaxed_ordering(dev)) {
+		pcie_clear_relaxed_ordering(dev);
+		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@  static void pci_configure_device(struct pci_dev *dev)
 
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev);
+	pci_configure_relaxed_ordering(dev);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@  int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);