Message ID | 1491031554-19516-2-git-send-email-dingtianhong@huawei.com |
---|---|
State | New |
Series | ixgbe: enable Relaxed Order for ARM64 |
From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 1 Apr 2017 15:25:51 +0800

> Till now only the Intel ixgbe could support enable
> Relaxed ordering in the drivers for special architecture,
> but the ARCH_WANT_RELAX_ORDER is looks like a general name
> for all arch, so rename to a specific name for intel
> card looks more appropriate.
>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

This is not a driver specific facility.

Any driver can test this symbol and act accordingly.

Just because IXGBE is the first and only user, doesn't mean
the facility is driver specific.

Thank you.
On 2017/4/2 2:26, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Sat, 1 Apr 2017 15:25:51 +0800
>
>> Till now only the Intel ixgbe could support enable
>> Relaxed ordering in the drivers for special architecture,
>> but the ARCH_WANT_RELAX_ORDER is looks like a general name
>> for all arch, so rename to a specific name for intel
>> card looks more appropriate.
>>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>
> This is not a driver specific facility.
>
> Any driver can test this symbol and act accordingly.
>
> Just because IXGBE is the first and only user, doesn't mean
> the facility is driver specific.
>

Understood, but the name ARCH_WANT_RELAX_ORDER is really too generic and simple.
It is misleading, because it looks like hack code for some architecture.
What do you think of ETHERNET_ALLOW_RELAXED_ORDER in drivers/net/ethernet/*?
It would only affect Ethernet drivers, and would not be limited to ixgbe.

Thanks
Ding

> Thank you.
On 02/04/2017 07:49, Ding Tianhong wrote:
>
>
> On 2017/4/2 2:26, David Miller wrote:
>> From: Ding Tianhong <dingtianhong@huawei.com>
>> Date: Sat, 1 Apr 2017 15:25:51 +0800
>>
>>> Till now only the Intel ixgbe could support enable
>>> Relaxed ordering in the drivers for special architecture,
>>> but the ARCH_WANT_RELAX_ORDER is looks like a general name
>>> for all arch, so rename to a specific name for intel
>>> card looks more appropriate.
>>>
>>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>>
>> This is not a driver specific facility.
>>
>> Any driver can test this symbol and act accordingly.
>>
>> Just because IXGBE is the first and only user, doesn't mean
>> the facility is driver specific.
>>
>
> Understand clearly,but the ARCH_WANT_RELAX_ORDER is really too generic and simple,
> cause much misleading to indicate that it looks like the hack code for some architecture.
> what do you think of the ETHERNET_ALLOW_RELAXED_ORDER in the drivers/net/ethernet/*,
> it will only affect ethernet and not only for Ixgbe.
>

Hi Ding,

I think the actual original config ARCH_WANT_RELAX_ORDER is quite dubious, and it does not really tell us which feature(s) of the architecture supports this (if indeed it is arch specific).

According to the original commit, 1a8b6d76dc5b net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering, this is specific to SPARC only:
"Currently it only supports one special cpu architecture(SPARC) in 82599 driver to enable RO feature, this is not very common for other cpu architecture which really needs RO feature".

This sounds wooly.

So I think that we need to know which specific architecture, memory model, or PCI host/port/EP features, or combination of them, allows this so called relaxed ordering.

And a config option is probably not even the proper check.

John

> Thanks
> Ding
On 2017/4/5 21:05, John Garry wrote:
> On 02/04/2017 07:49, Ding Tianhong wrote:
>> Understand clearly,but the ARCH_WANT_RELAX_ORDER is really too generic and simple,
>> cause much misleading to indicate that it looks like the hack code for some architecture.
>> what do you think of the ETHERNET_ALLOW_RELAXED_ORDER in the drivers/net/ethernet/*,
>> it will only affect ethernet and not only for Ixgbe.
>>

Hi John:

> Hi Ding,
>
> I think the actual original config ARCH_WANT_RELAX_ORDER is quite dubious, and it does not really tell us which feature(s) of the architecture supports this (if indeed it is arch specific).
>

Agree.

> According to the original commit, 1a8b6d76dc5b net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering, this is specific to SPARC only:
> "Currently it only supports one special cpu architecture(SPARC) in 82599 driver to enable RO feature, this is not very common for other cpu architecture which really needs RO feature".
>

Relaxed Ordering is a general setting for a PCIe controller, the alternative to Strict Ordering (SO). If a driver supports it, the architecture can choose to enable it, although of course not every architecture supports the feature.

> This sounds wooly.
>
> So I think that we need to know which specific architecture, memory model, or PCI host/port/EP features, or combination of them, allows this so called relaxed ordering.
>

This depends on the PCIe design in the chip. I cannot know whether other architectures have issues when RO is enabled. If the chip fully supports the PCIe 3.0 standard and has no defects, it should support both RO and SO.

Thanks
Ding

> And a config option is probably not even the proper check.
>
> John
Hi David

> -----Original Message-----
> Subject: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
> Date: Sat, 1 Apr 2017 11:26:03 -0700
> From: David Miller <davem@davemloft.net>
> To: dingtianhong@huawei.com
> CC: catalin.marinas@arm.com, will.deacon@arm.com, mark.rutland@arm.com,
> robin.murphy@arm.com, jeffrey.t.kirsher@intel.com,
> alexander.duyck@gmail.com, linux-arm-kernel@lists.infradead.org,
> netdev@vger.kernel.org
>
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Sat, 1 Apr 2017 15:25:51 +0800
>
> > Till now only the Intel ixgbe could support enable
> > Relaxed ordering in the drivers for special architecture,
> > but the ARCH_WANT_RELAX_ORDER is looks like a general name
> > for all arch, so rename to a specific name for intel
> > card looks more appropriate.
> >
> > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>
> This is not a driver specific facility.
>
> Any driver can test this symbol and act accordingly.
>
> Just because IXGBE is the first and only user, doesn't mean
> the facility is driver specific.

Please correct me if I am wrong, but my understanding is that the standard way to enable/disable relaxed ordering is to set/clear bit 4 of the Device Control Register (PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */).

I have looked at all the drivers that either enable or disable relaxed ordering, and none of them seems to need a symbol to decide whether to enable it or not. It also seems to me that enabling/disabling relaxed ordering is never bound to the host architecture.

So why should this be (or why is it expected to be) a generic symbol? Wouldn't it be more correct to have this as a driver specific symbol now and move it to a generic one later, once we have other drivers requiring it?

Many thanks
Gab

> Thank you.
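The set/clear operation on bit 4 described in the message above can be sketched as follows. This is a minimal Python model of the register arithmetic only, not kernel code. The helper names `set_relaxed_ordering` and `relaxed_ordering_enabled` are hypothetical, and the 16-bit mask assumes the Device Control register is 16 bits wide, as in the PCIe specification. Only the constant name and its value (`PCI_EXP_DEVCTL_RELAX_EN`, 0x0010) are taken from the message.

```python
# Bit 4 of the PCIe Device Control register: "Enable relaxed ordering".
PCI_EXP_DEVCTL_RELAX_EN = 0x0010

# Device Control is modeled here as a 16-bit register value.
DEVCTL_MASK = 0xFFFF


def set_relaxed_ordering(devctl, enable):
    """Return a new Device Control value with bit 4 set or cleared.

    All other bits of `devctl` are left untouched.
    """
    if enable:
        return (devctl | PCI_EXP_DEVCTL_RELAX_EN) & DEVCTL_MASK
    return devctl & ~PCI_EXP_DEVCTL_RELAX_EN & DEVCTL_MASK


def relaxed_ordering_enabled(devctl):
    """Return True if the Enable Relaxed Ordering bit is set in `devctl`."""
    return bool(devctl & PCI_EXP_DEVCTL_RELAX_EN)
```

In this model a driver or the PCI core would read the register, pass the value through `set_relaxed_ordering`, and write the result back. Nothing in the operation depends on a build-time symbol, which is the point the message makes.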
From: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Date: Thu, 13 Apr 2017 09:10:32 +0000

> Wouldn't it be more correct to have this as a driver specific symbol
> now and move it to a generic one later once we have other drivers
> requiring it?

No, it would not.
From: Gabriele Paoloni
> Sent: 13 April 2017 10:11
> > > Till now only the Intel ixgbe could support enable
> > > Relaxed ordering in the drivers for special architecture,
> > > but the ARCH_WANT_RELAX_ORDER is looks like a general name
> > > for all arch, so rename to a specific name for intel
> > > card looks more appropriate.
> > >
> > > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> >
> > This is not a driver specific facility.
> >
> > Any driver can test this symbol and act accordingly.
> >
> > Just because IXGBE is the first and only user, doesn't mean
> > the facility is driver specific.
>
>
> Please correct me if I am wrong but my understanding is that the standard
> way to enable/disable relaxed ordering is to set/clear bit 4 of the Device
> Control Register (PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed
> ordering */).
> Now I have looked up for all drivers either enabling or disabling relaxed
> ordering and none of them seems to need a symbol to decide whether to
> enable it or not.
> Also it seems to me enabling/disabling relaxed ordering is never bound to the
> host architecture.
>
> So why this should be (or it is expected to be) a generic symbol?
> Wouldn't it be more correct to have this as a driver specific symbol now and
> move it to a generic one later once we have other drivers requiring it?

'Relaxed ordering' is a bit in the TLP header.
A device (like the ixgbe hardware) can set it for some transactions and
still have the transactions 'ordered enough' for the driver to work.
(If all transactions have 'relaxed ordering' set then I suspect it is
almost impossible to write a working driver.)
The host side could (probably does) have a bit to enable 'relaxed ordering',
it could also enforce stronger ordering than required by the PCIe spec
(eg keeping reads in order).

Clearly, on some sparc64 systems, ixgbe needs to use 'relaxed ordering'.
To me this requires two separate bits be enabled:
1) to the ixgbe driver to generate the 'relaxed' TLP.
2) to the host to actually act on them.
If the ixgbe driver works when both are enabled why have options to
disable either (except for bug-finding)?

	David
Hi David

Many thanks for your reply

> -----Original Message-----
> From: David Laight [mailto:David.Laight@ACULAB.COM]
> Sent: 18 April 2017 14:26
> To: Gabriele Paoloni; davem@davemloft.net
> Cc: Catalin Marinas; Will Deacon; Mark Rutland; Robin Murphy;
> jeffrey.t.kirsher@intel.com; alexander.duyck@gmail.com;
> linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org; Dingtianhong;
> Linuxarm
> Subject: RE: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
>
> 'Relaxed ordering' is a bit in the TLP header.
> A device (like the ixgbe hardware) can set it for some transactions and
> still have the transactions 'ordered enough' for the driver to work.
> (If all transactions have 'relaxed ordering' set then I suspect it is
> almost impossible to write a working driver.)
> The host side could (probably does) have a bit to enable 'relaxed ordering',
> it could also enforce stronger ordering than required by the PCIe spec
> (eg keeping reads in order).

My understanding is that the host is always allowed to set the RO attribute in a TLP (as long as it complies with the rules specified in Section 2.4.1 of the PCIe specification), and the target function should be able to cope with it.

On the device side, the device is allowed to set the RO attribute in a TLP only if bit 4 of the "Device Control Register" is set.

> Clearly, on some sparc64 systems, ixgbe needs to use 'relaxed ordering'.
> To me this requires two separate bits be enabled:
> 1) to the ixgbe driver to generate the 'relaxed' TLP.
> 2) to the host to actually act on them.

My understanding is that, for performance reasons, we should enable relaxed ordering when possible, and I think this is up to the host (i.e. the host should somehow know when it is capable of handling RO TLPs and will therefore try to enable it in the driver).

> If the ixgbe driver works when both are enabled why have options to
> disable either (except for bug-finding)?

I think that the ixgbe driver disables RO by default because there are issues with "some chipsets", according to commit 3d5c520727ce "ixgbe: move disabling of relaxed ordering in start_hw()". What this means is a bit obscure to me, and it does not seem to be related to the host architecture.

Also, looking at where and why the other drivers set/clear the "Enable Relaxed Ordering" bit, it seems that this is currently not tied to the host architecture or to any global symbol; instead it seems to depend purely on the PCIe device chipset itself.

>
> David
Hi Amir

> From: Amir Ancel [mailto:amira@mellanox.com]
> Sent: 18 April 2017 21:18
> To: David Laight; Gabriele Paoloni; davem@davemloft.net
> Cc: Catalin Marinas; Will Deacon; Mark Rutland; Robin Murphy;
> jeffrey.t.kirsher@intel.com; alexander.duyck@gmail.com;
> linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org; Dingtianhong;
> Linuxarm
> Subject: Re: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
>
> Hi,
> mlx5 driver is planned to have RO support this year.
> I believe drivers should be able to query whether the arch support it

I guess that when you say "query" you mean having a config symbol that is set according to the host architecture, right?

As already said, I have looked around a bit, and other drivers do not seem to enable/disable RO for their EP on the basis of the host architecture. So why should mlx5 do it according to the host?

Also, my understanding is that some architectures (ARM64, for example) can have different PCI host controller implementations depending on the vendor, so maybe it is not appropriate there to have a Kconfig symbol selected by the architecture.

Thanks
Gab

> or not and enable it in the network adapter accordingly.
>
> -Amir
On Wed, Apr 19, 2017 at 02:46:19PM +0000, Gabriele Paoloni wrote:
> > From: Amir Ancel [mailto:amira@mellanox.com]
> > Sent: 18 April 2017 21:18
> > To: David Laight; Gabriele Paoloni; davem@davemloft.net
> > Cc: Catalin Marinas; Will Deacon; Mark Rutland; Robin Murphy;
> > jeffrey.t.kirsher@intel.com; alexander.duyck@gmail.com;
> > linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org; Dingtianhong;
> > Linuxarm
> > Subject: Re: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> > ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
> >
> > Hi,
> > mlx5 driver is planned to have RO support this year.
> > I believe drivers should be able to query whether the arch support it
>
> I guess that here when you say query you mean having a config symbol
> that is set accordingly to the host architecture, right?
>
> As already said I have looked around a bit and other drivers do not seem
> to enable/disable RO for their EP on the basis of the host architecture.
> So why should mlx5 do it according to the host?
>
> Also my understating is that some architectures (like ARM64 for example)
> can have different PCI host controller implementations depending on the
> vendor...therefore maybe it is not appropriate there to have a Kconfig
> symbol selected by the architecture...

Indeed. We're not able to determine whether or not RO is supported at
compile time, so we'd have to detect this dynamically if we want to support
it for arm64 with a single kernel Image. That means either passing something
through firmware, having the PCI host controller opt-in or something coarse
like a command-line option.

Will
Hi Amir:

It is really good to hear that mlx5 will support RO mode this year. If so, would you agree to enabling it dynamically through ethtool -s xxx? We tried that several months ago, but at that time only one driver would have used it, so the maintainer was against it. If mlx5 is going to support RO, we could try to restart this approach. What do you think? :)

Thanks
Ding

On 2017/4/19 4:17, Amir Ancel wrote:
> Hi,
>
> mlx5 driver is planned to have RO support this year.
>
> I believe drivers should be able to query whether the arch support it or not and enable it in the network adapter accordingly.
>
> -Amir
On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> Hi Amir:
>
> It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
> we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, it mlx5 would support RO,
> we could try to restart this solution, what do you think about it. :)
>
> Thanks
> Ding

Hi Ding,

Enabling relaxed ordering really doesn't have any place in ethtool. It
is a PCIe attribute that you are essentially wanting to enable.

It might be worthwhile to take a look at updating the PCIe code path
to handle this. Really what we should probably do is guarantee that
the architectures that need relaxed ordering are setting it in the
PCIe Device Control register and that the ones that don't are clearing
the bit. It's possible that this is already occurring, but I don't
know what the state of handling those bits is in the kernel. Once we
can guarantee that, we could use it to have the drivers determine their
behavior in regards to relaxed ordering. For example, in the case of
igb/ixgbe we could probably change the behavior so that it will by
default try to use relaxed ordering, but if it is not enabled in the PCIe
Device Control register the hardware should not request to use it. It
would simplify things in the drivers and allow each architecture
to control things as needed in its PCIe code.

- Alex
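The split of responsibilities proposed above (platform PCIe code sets or clears the bit, and the driver only honors it) could look roughly like this. It is a hedged Python sketch rather than kernel code: the function names `arch_program_devctl` and `driver_requests_ro` and the `arch_wants_ro` and `driver_default` parameters are hypothetical, and the constant is the `PCI_EXP_DEVCTL_RELAX_EN` value quoted earlier in the thread.

```python
# Device Control bit 4 ("Enable relaxed ordering"), value as quoted in the thread.
PCI_EXP_DEVCTL_RELAX_EN = 0x0010


def arch_program_devctl(devctl, arch_wants_ro):
    """Platform/PCIe side (hypothetical): set or clear the enable bit.

    Returns the new 16-bit Device Control value for one device.
    """
    if arch_wants_ro:
        return (devctl | PCI_EXP_DEVCTL_RELAX_EN) & 0xFFFF
    return devctl & ~PCI_EXP_DEVCTL_RELAX_EN & 0xFFFF


def driver_requests_ro(devctl, driver_default=True):
    """Driver side (hypothetical): try RO by default, but only if bit 4 is set.

    The driver never changes the bit itself; it only reads it.
    """
    return driver_default and bool(devctl & PCI_EXP_DEVCTL_RELAX_EN)
```

With this arrangement the driver needs no Kconfig symbol. Whether relaxed ordering is used is decided per device at run time from the register value that the platform code left behind.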
[+cc Casey]

On Wed, Apr 26, 2017 at 09:18:33AM -0700, Alexander Duyck wrote:
> On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> > Hi Amir:
> >
> > It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
> > we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, it mlx5 would support RO,
> > we could try to restart this solution, what do you think about it. :)
> >
> > Thanks
> > Ding
>
> Hi Ding,
>
> Enabing relaxed ordering really doesn't have any place in ethtool. It
> is a PCIe attribute that you are essentially wanting to enable.
>
> It might be worth while to take a look at updating the PCIe code path
> to handle this. Really what we should probably do is guarantee that
> the architectures that need relaxed ordering are setting it in the
> PCIe Device Control register and that the ones that don't are clearing
> the bit. It's possible that this is already occurring, but I don't
> know the state of handling those bits is in the kernel. Once we can
> guarantee that we could use that to have the drivers determine their
> behavior in regards to relaxed ordering. For example in the case of
> igb/ixgbe we could probably change the behavior so that it will bey
> default try to use relaxed ordering but if it is not enabled in PCIe
> Device Control register the hardware should not request to use it. It
> would simplify things in the drivers and allow for each architecture
> to control things as needed in their PCIe code.

I thought Relaxed Ordering was an optimization. Are there cases where
it is actually required for correct behavior?

The PCI core doesn't currently do anything with Relaxed Ordering.
Some drivers enable/disable it directly. I think it would probably be
better if the core provided an interface for this. One reason is
because I think Casey has identified some systems where Relaxed
Ordering doesn't work correctly, and I'd rather deal with them once in
the core than in every driver.

Are you hinting that the PCI core or arch code could actually *enable*
Relaxed Ordering without the driver doing anything? Is it safe to do
that? Is there such a thing as a device that is capable of using RO,
but where the driver must be aware of it being enabled, so it programs
the device appropriately?

Bjorn
Thanks for adding me to the Cc list Bjorn. Hopefully my message will make it out to the netdev and linux-pci lists -- I'm not currently subscribed to them. I've explicitly marked this message to be sent in plain text but modern email agents suck with respect to this. (sigh) I have officially become a curmudgeon.

So, officially, Relaxed Ordering should be a Semantic Noop as far as PCIe transfers are concerned, as long as you don't care what order the PCIe Transaction Layer Packets are processed in by the target PCIe Fabric End Point.

Basically, if you have some number of back-to-back PCIe TLPs between two Fabric End Points {A} -> {B} which have the Relaxed Ordering Attribute set, the End Point {B} receiving these RO TLPs may process them in any order it likes. When a TLP without Relaxed Ordering is sent {A} -> {B}, all preceding TLPs with Relaxed Ordering set must be processed by {B} prior to processing the TLP without Relaxed Ordering set. In this sense, a TLP without Relaxed Ordering set is something akin to a "memory barrier". All of this is covered in Section 2.4.1 of the PCIe 3.0 Specification (PCI Express(r) Base Specification Revision 3.0 November 10, 2010).

The advantage of using Relaxed Ordering (which is heavily used when sending data to Graphics Cards as I understand it), is that the PCIe Endpoint can potentially optimize the processing order of RO TLPs with things like a local multi-channel Memory Controller in order to achieve the highest transfer bandwidth possible.

However, we have discovered at least two PCIe 3.0 Root Complex implementations which have problems with TLPs directed at them with the Relaxed Ordering Attribute set and I'm in the process of working up a Linux Kernel PCI "Quirk" to allow those PCIe End Points to be marked as "not being advisable to send RO TLPs to". These problems range from "mere" Performance Problems to outright Data Corruption. I'm working with the vendors of these ... "problematic" Root Complex implementations and hope to have this patch submitted to the linux-pci list by tomorrow.

By the way, it's important to note that just because, say, a Root Complex has problems with RO TLPs directed at it, that doesn't mean that you want to avoid all use of Relaxed Ordering within the PCIe Fabric. For instance, with the vendor whose Root Complex has a Performance Problem with RO TLPs directed at it, it's perfectly reasonable -- and desired -- to use Relaxed Ordering in Peer-to-Peer traffic. Say for instance, with an NVMe <-> Ethernet application.

Casey
| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Thursday, April 27, 2017 10:19 AM
|
| Are you hinting that the PCI core or arch code could actually *enable*
| Relaxed Ordering without the driver doing anything? Is it safe to do that?
| Is there such a thing as a device that is capable of using RO, but where the
| driver must be aware of it being enabled, so it programs the device
| appropriately?

I forgot to reply to this portion of Bjorn's email.

The PCI Configuration Space PCIe Capability Device Control[Enable Relaxed Ordering] bit governs enabling the _ability_ for the PCIe Device to send TLPs with the Relaxed Ordering Attribute set. It does not _cause_ RO to be set on TLPs. Doing that would almost certainly cause Data Corruption Bugs since you only want a subset of TLPs to have RO set. For instance, we typically use RO for Ingress Packet Data delivery but non-RO for messages notifying the Host that an Ingress Packet has been delivered. This ensures that the "Ingress Packet Delivered" non-RO TLP is processed _after_ any preceding RO TLPs delivering the actual Ingress Packet Data.

In the above scenario, if one were to turn off Enable Relaxed Ordering via the PCIe Capability, then the on-chip PCIe engine would simply never send a TLP with the Relaxed Ordering Attribute set, regardless of any other chip programming.

And finally, just to be absolutely clear, using Relaxed Ordering isn't an "Architecture Thing". It's a PCIe Fabric End Point Thing. Many End Points simply ignore the Relaxed Ordering Attribute (except to reflect it back in Response TLPs). In this sense, Relaxed Ordering simply provides potentially useful optimization information to the PCIe End Point.

Casey
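The ingress scenario described above can be modeled in a few lines. This is a simplified Python sketch under stated assumptions, not a statement of the full PCIe ordering table in Section 2.4.1. It encodes only two things taken from these messages: the Device Control enable bit gates whether the device may set RO at all, and a TLP without RO must not be processed ahead of any TLP sent before it, while a TLP with RO may be. The `Tlp` class and the functions `ingress_tlps` and `order_is_allowed` are hypothetical names, and TLP names are assumed to be unique.

```python
from dataclasses import dataclass

# Device Control bit 4 ("Enable relaxed ordering"), value as quoted in the thread.
PCI_EXP_DEVCTL_RELAX_EN = 0x0010


@dataclass(frozen=True)
class Tlp:
    name: str      # unique label for this packet
    relaxed: bool  # True if the Relaxed Ordering attribute is set


def ingress_tlps(devctl, chunk_count):
    """Build the TLP sequence a device sends for one ingress packet.

    Data chunks ask for RO, but only get it when the enable bit is set.
    The final "delivered" notification is always sent without RO.
    """
    ro_allowed = bool(devctl & PCI_EXP_DEVCTL_RELAX_EN)
    tlps = [Tlp(f"data{i}", ro_allowed) for i in range(chunk_count)]
    tlps.append(Tlp("delivered", False))
    return tlps


def order_is_allowed(sent, processed):
    """Return True if `processed` is a permitted processing order for `sent`.

    Simplified rule: a TLP without RO may not be processed before any TLP
    that was sent earlier; a TLP with RO carries no such restriction.
    """
    if sorted(t.name for t in sent) != sorted(t.name for t in processed):
        return False  # not a permutation of the same packets
    position = {t.name: i for i, t in enumerate(processed)}
    for i, tlp in enumerate(sent):
        if tlp.relaxed:
            continue
        for earlier in sent[:i]:
            if position[earlier.name] > position[tlp.name]:
                return False
    return True
```

In this model, with the enable bit set the two data TLPs may be swapped but the notification can never overtake them. With the bit clear every TLP is sent without RO, so only the original order is accepted.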
On Thursday, 27.04.2017 at 12:19 -0500, Bjorn Helgaas wrote:
> [+cc Casey]
>
> On Wed, Apr 26, 2017 at 09:18:33AM -0700, Alexander Duyck wrote:
> > On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> > > Hi Amir:
> > >
> > > It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
> > > we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, if mlx5 would support RO,
> > > we could try to restart this solution, what do you think about it. :)
> > >
> > > Thanks
> > > Ding
> >
> > Hi Ding,
> >
> > Enabling relaxed ordering really doesn't have any place in ethtool. It
> > is a PCIe attribute that you are essentially wanting to enable.
> >
> > It might be worth while to take a look at updating the PCIe code path
> > to handle this. Really what we should probably do is guarantee that
> > the architectures that need relaxed ordering are setting it in the
> > PCIe Device Control register and that the ones that don't are clearing
> > the bit. It's possible that this is already occurring, but I don't
> > know what the state of handling those bits is in the kernel. Once we can
> > guarantee that we could use that to have the drivers determine their
> > behavior in regards to relaxed ordering. For example in the case of
> > igb/ixgbe we could probably change the behavior so that it will by
> > default try to use relaxed ordering but if it is not enabled in PCIe
> > Device Control register the hardware should not request to use it. It
> > would simplify things in the drivers and allow for each architecture
> > to control things as needed in their PCIe code.
>
> I thought Relaxed Ordering was an optimization. Are there cases where
> it is actually required for correct behavior?

Yes, at least the Tegra 2 TRM claims that RO needs to be enabled on the device side for correct operation with the following language:

"Tegra 2 requires relaxed ordering for responses to downstream requests (responses can pass writes). It is possible in some circumstances for PCIe transfers from an external bus masters (i.e. upstream transfers) to become blocked by a downstream read or non-posted write. The responses to these downstream requests are blocked by upstream posted writes only when PCIe strict ordering is imposed. It is therefore necessary to never impose strict ordering that would block a response to a downstream NPW/read request and always set the relaxed ordering bit to 1. Only devices that are capable of relaxed ordering may be used with Tegra 2 devices."

Regards,
Lucas
Hi Casey

Many thanks for the detailed explanation

> -----Original Message-----
> From: Casey Leedom [mailto:leedom@chelsio.com]
> Sent: 27 April 2017 21:35
> To: Bjorn Helgaas; Alexander Duyck
> Cc: Dingtianhong; Mark Rutland; Amir Ancel; Gabriele Paoloni;
> linux-pci@vger.kernel.org; Catalin Marinas; Will Deacon; Linuxarm;
> David Laight; jeffrey.t.kirsher@intel.com; netdev@vger.kernel.org;
> Robin Murphy; davem@davemloft.net; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
>
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Thursday, April 27, 2017 10:19 AM
> |
> | Are you hinting that the PCI core or arch code could actually *enable*
> | Relaxed Ordering without the driver doing anything? Is it safe to do that?
> | Is there such a thing as a device that is capable of using RO, but where the
> | driver must be aware of it being enabled, so it programs the device
> | appropriately?
>
> I forgot to reply to this portion of Bjorn's email.
>
> The PCI Configuration Space PCI Capability Device Control[Enable Relaxed
> Ordering] bit governs enabling the _ability_ for the PCIe Device to send
> TLPs with the Relaxed Ordering Attribute set. It does not _cause_ RO to be
> set on TLPs. Doing that would almost certainly cause Data Corruption Bugs
> since you only want a subset of TLPs to have RO set.
>
> For instance, we typically use RO for Ingress Packet Data delivery but
> non-RO for messages notifying the Host that an Ingress Packet has been
> delivered. This ensures that the "Ingress Packet Delivered" non-RO TLP is
> processed _after_ any preceding RO TLPs delivering the actual Ingress Packet
> Data.
>
> In the above scenario, if one were to turn off Enable Relaxed Ordering via
> the PCIe Capability, then the on-chip PCIe engine would simply never send a
> TLP with the Relaxed Ordering Attribute set, regardless of any other chip
> programming.
>
> And finally, just to be absolutely clear, using Relaxed Ordering isn't an
> "Architecture Thing". It's a PCIe Fabric End Point Thing. Many End Points
> simply ignore the Relaxed Ordering Attribute (except to reflect it back in
> Response TLPs). In this sense, Relaxed Ordering simply provides
> potentially useful optimization information to the PCIe End Point.

I think your view matches what I found out about the current usage of the "Enable Relaxed Ordering" bit in Linux mainline: looking at where and why the other drivers set/clear the "Enable Relaxed Ordering" bit, they do not look for any global symbol, nor do they look at the host architecture.

So with respect to this specific ixgbe driver, I guess the main question is why RO was disabled by default by Intel for this EP (commit 3d5c520727ce mentions issues with "some chipsets"), and then why it is safe to enable it again on SPARC?

Thanks
Gab

>
> Casey
| From: Lucas Stach <l.stach@pengutronix.de>
| Sent: Friday, April 28, 2017 1:51 AM
|
| On Thursday, 27.04.2017 at 12:19 -0500, Bjorn Helgaas wrote:
| >
| >
| > I thought Relaxed Ordering was an optimization. Are there cases where
| > it is actually required for correct behavior?
|
| Yes, at least the Tegra 2 TRM claims that RO needs to be enabled on the
| device side for correct operation with the following language:
|
| "Tegra 2 requires relaxed ordering for responses to downstream requests
| (responses can pass writes). It is possible in some circumstances for PCIe
| transfers from an external bus masters (i.e. upstream transfers) to become
| blocked by a downstream read or non-posted write. The responses to these
| downstream requests are blocked by upstream posted writes only when PCIe
| strict ordering is imposed. It is therefore necessary to never impose strict
| ordering that would block a response to a downstream NPW/read request and
| always set the relaxed ordering bit to 1. Only devices that are capable of
| relaxed ordering may be used with Tegra 2 devices."

(woof) Reading through the above paragraph is difficult because the author seems to shift language and terminology mid sentence and isn't following standard PCI terminology conventions. The Root Complex is "Upstream", a non-Root Complex Node in the PCIe Fabric is "Downstream", Requests that a Downstream Device (End Point) sends to the Root Complex are called "Upstream Requests", and Responses that the Root Complex sends to a Device are called "Downstream Responses" (or, even more pedantically, "Responses sent Downstream for an earlier Upstream Request"). Because a Root Complex is Upstream, but the Requests it sends go Downstream, and Downstream Devices send their Requests Upstream, it's very important that we use exceedingly precise language.

So, it ~sounds like~ the nVidia Tegra 2 document is talking about the need for Downstream Devices to echo the Relaxed Ordering Attribute in their Responses directed Upstream to Requests sent Downstream from the Root Complex. Moreover, there's code in drivers/pci/host/pci-tegra.c: tegra_pcie_relax_enable() which appears to set the PCIe Capability Device Control[Enable Relaxed Ordering] bit on all PCIe Fabric Nodes.

If my reading of the intent of the nVidia document is correct -- and that's a Big If because of the extremely imprecise language used -- that means that tegra_pcie_relax_enable() is completely bogus. The PCIe 3.0 Specification states that Responses MUST reflect the Relaxed Ordering and No Snoop Attributes of the Requests to which they are responding. Section 2.2.9 of PCI Express(r) Base Specification Revision 3.0 November 10, 2010:

"Completion headers must supply the same values for the Attribute as were supplied in the header of the corresponding Request, except as explicitly allowed when IDO is used."

And, specifically, the PCIe Capability Device Control[Enable Relaxed Ordering] bit _only_ affects the ability of that Device to originate Transaction Layer Packet Requests with the Relaxed Ordering Attribute set.

Thus, tegra_pcie_relax_enable() setting those bits on all the Downstream Devices (and intervening Bridges) does not _cause_ those Devices to generate Requests with Relaxed Ordering set. And, if the Devices are PCIe 3.0 compliant, it also doesn't affect the Responses that they send back Upstream to the Root Complex.

I apologize for the incredibly detailed nature of these responses, but it's very easy for people new to PCIe to get these things wrong and/or misinterpret the PCIe Specifications.

Casey
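The attribute-reflection rule quoted from Section 2.2.9 can be illustrated with a simplified C model. This is a sketch under assumptions, not a description of any real hardware or kernel interface: `struct tlp_attr`, `struct tlp_request`, `struct tlp_completion` and `make_completion()` are hypothetical names, and the IDO exception mentioned in the specification text is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the TLP header attributes. */
struct tlp_attr {
	bool relaxed_ordering;
	bool no_snoop;
};

struct tlp_request {
	struct tlp_attr attr;
};

struct tlp_completion {
	struct tlp_attr attr;
};

/*
 * Build the completion for a request. The attributes are copied from
 * the request header. The completer's own Enable Relaxed Ordering
 * setting is passed in only to show that it plays no part here: that
 * bit affects requests the device originates, not its completions.
 * (IDO is ignored in this sketch.)
 */
static struct tlp_completion make_completion(const struct tlp_request *req,
					     bool completer_relax_en)
{
	struct tlp_completion cpl;

	assert(req != NULL);
	(void)completer_relax_en;	/* intentionally unused */

	cpl.attr = req->attr;
	return cpl;
}
```

In this model a request sent with RO set yields a completion with RO set even when the completer's enable bit is clear, and a request sent without RO yields a completion without RO even when the bit is set. That is the reason given above for why setting the bit on Downstream Devices would not change their Responses.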
diff --git a/arch/Kconfig b/arch/Kconfig
index cd211a1..bc0ab44 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -844,7 +844,7 @@ config STRICT_MODULE_RWX
 	  and non-text memory will be made non-executable. This provides
 	  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
+config IXGBE_ALLOW_RELAXED_ORDER
 	bool
 
 source "kernel/gcov/Kconfig"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 68ac5c7..f56bcf4 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,7 @@ config SPARC
 	select CPU_NO_EFFICIENT_FFS
 	select HAVE_ARCH_HARDENED_USERCOPY
 	select PROVE_LOCKING_SMALL if PROVE_LOCKING
-	select ARCH_WANT_RELAX_ORDER
+	select IXGBE_ALLOW_RELAXED_ORDER
 
 config SPARC32
 	def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index c38d50c..563ea15 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 	}
 	IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_IXGBE_ALLOW_RELAXED_ORDER
 	/* Disable relaxed ordering */
 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
 		u32 regval;
Till now only the Intel ixgbe could support enable
Relaxed ordering in the drivers for special architecture,
but the ARCH_WANT_RELAX_ORDER is looks like a general name
for all arch, so rename to a specific name for intel
card looks more appropriate.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 arch/Kconfig                                    | 2 +-
 arch/sparc/Kconfig                              | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

--
1.9.0