Message ID: 1603442745-13085-2-git-send-email-yilun.xu@intel.com
State: New
Series: Add the netdev support for Intel PAC N3000 FPGA
Hi Xu

Before i look at the other patches, i want to understand the
architecture properly.

> +=======================================================================
> +DFL device driver for Ether Group private feature on Intel(R) PAC N3000
> +=======================================================================
> +
> +This is the driver for Ether Group private feature on Intel(R)
> +PAC (Programmable Acceleration Card) N3000.

I assume this is just one implementation. The FPGA could be placed on
other boards. So some of the limitations you talk about with the BMC
are artificial, and the overall architecture of the drivers is more
generic?

> +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload
> +networking application acceleration. A simple diagram below to for the board:
> +
> +         +----------------------------------------+
> +         |                 FPGA                   |
> ++----+   +-------+   +-----------+  +----------+  +-----------+   +----------+
> +|QSFP|---|retimer|---|Line Side  |--|User logic|--|Host Side  |---|XL710     |
> ++----+   +-------+   |Ether Group|  |          |  |Ether Group|   |Ethernet  |
> +         |           |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|   |Controller|
> +         |           +-----------+  |offloading|  +-----------+   +----------+
> +         |                          +----------+          |
> +         |                                                |
> +         +----------------------------------------+

Is XL710 required? I assume any MAC with the correct MII interface
will work?

Do you really mean PHY? I actually expect it is PCS?

> +The DFL Ether Group driver registers netdev for each line side link. Users
> +could use standard commands (ethtool, ip, ifconfig) for configuration and
> +link state/statistics reading. For host side links, they are always connected
> +to the host ethernet controller, so they should always have same features as
> +the host ethernet controller. There is no need to register netdevs for them.

So lets say the XL710 is eth0. The line side netif is eth1. Where do i
put the IP address? What interface do i add to quagga OSPF?

> +The driver just enables these links on probe.
> +
> +The retimer chips are managed by onboard BMC (Board Management Controller)
> +firmware, host driver is not capable to access them directly.

What about the QSFP socket? Can the host get access to the I2C bus?
The pins for TX enable, etc. ethtool -m?

> +Speed/Duplex
> +------------
> +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.

So that means, if i pop out the SFP and put in a different one which
supports a different speed, it is expected to be broken until the FPGA
is reloaded?

	Andrew
On 10/23/20 1:45 AM, Xu Yilun wrote: > This patch adds the document for DFL Ether Group driver. > > Signed-off-by: Xu Yilun <yilun.xu@intel.com> > --- > .../networking/device_drivers/ethernet/index.rst | 1 + > .../ethernet/intel/dfl-eth-group.rst | 102 +++++++++++++++++++++ > 2 files changed, 103 insertions(+) > create mode 100644 Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst > > diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst > index cbb75a18..eb7c443 100644 > --- a/Documentation/networking/device_drivers/ethernet/index.rst > +++ b/Documentation/networking/device_drivers/ethernet/index.rst > @@ -26,6 +26,7 @@ Contents: > freescale/gianfar > google/gve > huawei/hinic > + intel/dfl-eth-group > intel/e100 > intel/e1000 > intel/e1000e > diff --git a/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst b/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst > new file mode 100644 > index 0000000..525807e > --- /dev/null > +++ b/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst > @@ -0,0 +1,102 @@ > +.. SPDX-License-Identifier: GPL-2.0+ > + > +======================================================================= > +DFL device driver for Ether Group private feature on Intel(R) PAC N3000 > +======================================================================= > + > +This is the driver for Ether Group private feature on Intel(R) > +PAC (Programmable Acceleration Card) N3000. > + > +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload > +networking application acceleration. 
A simple diagram below to for the board:
> +
> +         +----------------------------------------+
> +         |                 FPGA                   |
> ++----+   +-------+   +-----------+  +----------+  +-----------+   +----------+
> +|QSFP|---|retimer|---|Line Side  |--|User logic|--|Host Side  |---|XL710     |
> ++----+   +-------+   |Ether Group|  |          |  |Ether Group|   |Ethernet  |
> +         |           |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|   |Controller|
> +         |           +-----------+  |offloading|  +-----------+   +----------+
> +         |                          +----------+          |
> +         |                                                |
> +         +----------------------------------------+
> +
> +The FPGA is composed of FPGA Interface Module (FIM) and Accelerated Function
> +Unit (AFU). The FIM implements the basic functionalities for FPGA access,
> +management and reprograming, while the AFU is the FPGA reprogramable region for
> +users.
> +
> +The Line Side & Host Side Ether Groups are soft IP blocks embedded in FIM. They

The Line Side and Host Side Ether Groups are soft IP blocks embedded in the FIM.

> +are internally wire connected to AFU and communicate with AFU with MAC packets.

are internally connected to the AFU and communicate with the AFU using MAC packets

> +The user logic is developed by the FPGA users and re-programmed to AFU,

The user logic is application dependent, supplied by the FPGA developer and used
to reprogram the AFU.

> +providing the user defined wire connections between line side & host side data

between Line Side and Host Side

> +interfaces, as well as the MAC layer offloading.
> +
> +There are 2 types of interfaces for the Ether Groups:
> +
> +1. The data interfaces connects the Ether Groups and the AFU, host has no

The data interface which connects

> +ability to control the data stream . So the FPGA is like a pipe between the
> +host ethernet controller and the retimer chip.
> +
> +2. The management interfaces connects the Ether Groups to the host, so host

The management interface which connects

> +could access the Ether Group registers for configuration and statistics
> +reading.
> +
> +The Intel(R) PAC N3000 could be programmed to various configurations (with

N3000 can be

> +different link numbers and speeds, e.g. 8x10G, 4x25G ...). It is done by

This is done

> +programing different variants of the Ether Group IP blocks, and doing
> +corresponding configuration to the retimer chips.

programming different variants of the Ether Group IP blocks and retimer
configuration.

> +
> +The DFL Ether Group driver registers netdev for each line side link. Users

registers a netdev

> +could use standard commands (ethtool, ip, ifconfig) for configuration and
> +link state/statistics reading. For host side links, they are always connected
> +to the host ethernet controller, so they should always have same features as
> +the host ethernet controller. There is no need to register netdevs for them.
> +The driver just enables these links on probe.
> +
> +The retimer chips are managed by onboard BMC (Board Management Controller)
> +firmware, host driver is not capable to access them directly. So it is mostly

firmware. The host driver

So it behaves like

> +like an external fixed PHY. However the link states detected by the retimer
> +chips can not be propagated to the Ether Groups for hardware limitation, in

Limitations should get their own section, this is going off on a tangent.

> +order to manage the link state, a PHY driver (intel-m10-bmc-retimer) is
> +introduced to query the BMC for the retimer's link state. The Ether Group
> +driver would connect to the PHY devices and get the link states. The
> +intel-m10-bmc-retimer driver creates a pseudo MDIO bus for each board, so
> +that the Ether Group driver could find the PHY devices by their pseudo PHY
> +addresses.
> +
> +
> +2. Features supported
> +=====================
> +
> +Data Path
> +---------
> +Since the driver can't control the data stream, the Ether Group driver
> +doesn't implement the valid tx/rx functions. Any transmit attempt on these
> +links from host will be dropped, and no data could be received to host from

links from the host will be dropped.

(you can assume a dropped link will not have data and shorten the sentence)

> +these links. Users should operate on the netdev of host ethernet controller
> +for networking data traffic.
> +
> +Speed/Duplex
> +------------
> +The Ether Group doesn't support auto-negotiation. The link speed is fixed to

does not

> +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
> +
> +Statistics
> +----------
> +The Ether Group IP has the statistics counters for ethernet traffic and errors.
> +The user can obtain these MAC-level statistics using "ethtool -S" option.
> +
> +MTU
> +---
> +The Ether Group IP is capable of detecting oversized packets. It will not drop
> +the packet but pass it up and increment the tx/rx oversize counters. The MTU

but will pass it and

> +could be changed via ip or ifconfig commands.
> +
> +Flow Control
> +------------
> +Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
> +transmitting pause frames. Receiving pause request from outside to Ether Group

pausing tx frames. Receiving a pause

Tom

> +MAC is not supported. The flow control auto-negotiation is not supported. The
> +user can enable or disable Tx Flow Control using "ethtool -A eth? tx <on|off>"
Hi Andrew Thanks for your fast response, see comments inline. On Fri, Oct 23, 2020 at 05:37:31PM +0200, Andrew Lunn wrote: > Hi Xu > > Before i look at the other patches, i want to understand the > architecture properly. I have a doc to describe the architecture: https://www.intel.com/content/www/us/en/programmable/documentation/xgz1560360700260.html The "Figure 1" is a more detailed figure for the arch. It should be helpful. > > > +======================================================================= > > +DFL device driver for Ether Group private feature on Intel(R) PAC N3000 > > +======================================================================= > > + > > +This is the driver for Ether Group private feature on Intel(R) > > +PAC (Programmable Acceleration Card) N3000. > > I assume this is just one implementation. The FPGA could be placed on > other boards. So some of the limitations you talk about with the BMC > artificial, and the overall architecture of the drivers is more > generic? I could see if the retimer management is changed, e.g. access the retimer through a host controlled MDIO, maybe I need a more generic way to find the MDIO bus. Do you have other suggestions? > > > +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload > > +networking application acceleration. A simple diagram below to for the board: > > + > > + +----------------------------------------+ > > + | FPGA | > > ++----+ +-------+ +-----------+ +----------+ +-----------+ +----------+ > > +|QSFP|---|retimer|---|Line Side |--|User logic|--|Host Side |---|XL710 | > > ++----+ +-------+ |Ether Group| | | |Ether Group| |Ethernet | > > + |(PHY + MAC)| |wiring & | |(MAC + PHY)| |Controller| > > + +-----------+ |offloading| +-----------+ +----------+ > > + | +----------+ | > > + | | > > + +----------------------------------------+ > > Is XL710 required? I assume any MAC with the correct MII interface > will work? 
The XL710 is required for this implementation, in which we have the Host
Side Ether Group facing the host. The Host Side Ether Group actually
contains the same IP blocks as the Line Side. It contains the compact MAC
& PHY functionalities for the 25G/40G case. The 25G MAC-PHY soft IP SPEC
can be found at:

https://www.intel.com/content/www/us/en/programmable/documentation/ewo1447742896786.html

So raw serial data is output from the Host Side FPGA, and the XL710 can
handle this.

> Do you really mean PHY? I actually expect it is PCS?

For this implementation, yes. I guess if you program another IP block on
the FPGA host side, e.g. a PCS interface, and replace the XL710 with
another MAC, it may also work. But I think there should be other drivers
to handle this. I may check with our hardware designers if there is some
concern about not using MII for the connection between FPGA & host.

The FPGA user is mainly concerned about the user logic part. The Ether
Groups in the FIU and the board components are not expected to be
re-designed by the user. So I think I should still focus on the driver
for this implementation.

> > +The DFL Ether Group driver registers netdev for each line side link. Users
> > +could use standard commands (ethtool, ip, ifconfig) for configuration and
> > +link state/statistics reading. For host side links, they are always connected
> > +to the host ethernet controller, so they should always have same features as
> > +the host ethernet controller. There is no need to register netdevs for them.
>
> So lets say the XL710 is eth0. The line side netif is eth1. Where do i
> put the IP address? What interface do i add to quagga OSPF?

The IP address should be put on eth0. eth0 should always be used for the
tools.

The line/host side Ether Group is not the terminal of the network data
stream. Eth1 will not participate in the network data exchange to the
host.

The main purposes for eth1 are:

1. For users to monitor the network statistics on the Line Side; by
comparing the statistics between eth0 & eth1, users could get some
knowledge of how the user logic is functioning.

2. Get the link state of the front panel. The XL710 is now connected to
the Host Side of the FPGA and its link state would always be on. So to
check the link state of the front panel, we need to query eth1.

> > +The driver just enables these links on probe.
> > +
> > +The retimer chips are managed by onboard BMC (Board Management Controller)
> > +firmware, host driver is not capable to access them directly.
>
> What about the QSFP socket? Can the host get access to the I2C bus?
> The pins for TX enable, etc. ethtool -m?

No, the QSFP/I2C are also managed by the BMC firmware, and the host
doesn't have an interface to talk to the BMC firmware about the QSFP.

> > +Speed/Duplex
> > +------------
> > +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> > +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
>
> So that means, if i pop out the SFP and put in a different one which
> supports a different speed, it is expected to be broken until the FPGA
> is reloaded?

It is expected to be broken.

Now the line side is expected to be configured to 4x10G, 4x25G, 2x25G,
1x25G. The host side is expected to be 4x10G or 2x40G for the XL710.

So a 4 channel SFP is expected to be inserted into the front panel. And
we should use a 4x25G SFP, which is compatible with a 4x10G connection.

Thanks,
Yilun

> Andrew
> > > +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload > > > +networking application acceleration. A simple diagram below to for the board: > > > + > > > + +----------------------------------------+ > > > + | FPGA | > > > ++----+ +-------+ +-----------+ +----------+ +-----------+ +----------+ > > > +|QSFP|---|retimer|---|Line Side |--|User logic|--|Host Side |---|XL710 | > > > ++----+ +-------+ |Ether Group| | | |Ether Group| |Ethernet | > > > + |(PHY + MAC)| |wiring & | |(MAC + PHY)| |Controller| > > > + +-----------+ |offloading| +-----------+ +----------+ > > > + | +----------+ | > > > + | | > > > + +----------------------------------------+ > > > > Is XL710 required? I assume any MAC with the correct MII interface > > will work? > > The XL710 is required for this implementation, in which we have the Host > Side Ether Group facing the host. The Host Side Ether Group actually > contains the same IP blocks as Line Side. It contains the compacted MAC & > PHY functionalities for 25G/40G case. The 25G MAC-PHY soft IP SPEC can > be found at: > > https://www.intel.com/content/www/us/en/programmable/documentation/ewo1447742896786.html > > So raw serial data is output from Host Side FPGA, and XL710 is good to > handle this. What i have seen working with Marvell Ethernet switches, is that Marvell normally recommends connecting them to the Ethernet interfaces of Marvell SoCs. But the switch just needs a compatible MII interface, and lots of boards make use of non-Marvell MAC chips. Freescale FEC is very popular. What i'm trying to say is that ideally we need a collection of generic drivers for the different major components on the board, and a board driver which glues it all together. That then allows somebody to build other boards, or integrate the FPGA directly into an embedded system directly connected to a SoC, etc. > > Do you really mean PHY? I actually expect it is PCS? > > For this implementation, yes. Yes, you have a PHY? 
Or Yes, it is PCS? To me, the phylib maintainer, having a PHY means you have a base-T interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive architecture when you should be able to just connect SERDES interfaces together. > > > +The DFL Ether Group driver registers netdev for each line side link. Users > > > +could use standard commands (ethtool, ip, ifconfig) for configuration and > > > +link state/statistics reading. For host side links, they are always connected > > > +to the host ethernet controller, so they should always have same features as > > > +the host ethernet controller. There is no need to register netdevs for them. > > > > So lets say the XL710 is eth0. The line side netif is eth1. Where do i > > put the IP address? What interface do i add to quagga OSPF? > > The IP address should be put in eth0. eth0 should always be used for the > tools. That was what i was afraid of :-) > > The line/host side Ether Group is not the terminal of the network data stream. > Eth1 will not paticipate in the network data exchange to host. > > The main purposes for eth1 are: > 1. For users to monitor the network statistics on Line Side, and by comparing the > statistics between eth0 & eth1, users could get some knowledge of how the User > logic is taking function. > > 2. Get the link state of the front panel. The XL710 is now connected to > Host Side of the FPGA and the its link state would be always on. So to > check the link state of the front panel, we need to query eth1. This is very non-intuitive. We try to avoid this in the kernel and the API to userspace. Ethernet switches are always modelled as accelerators for what the Linux network stack can already do. You configure an Ethernet switch port in just the same way configure any other netdev. You add an IP address to the switch port, you get the Ethernet statistics from the switch port, routing protocols use the switch port. You design needs to be the same. All configuration needs to happen via eth1. 
Please look at the DSA architecture. What you have here is very
similar to a two port DSA switch. In DSA terminology, we would call
eth0 the master interface. It needs to be up, but otherwise the user
does not configure it. eth1 is the slave interface. It is the user
facing interface of the switch. All configuration happens on this
interface. Linux can also send/receive packets on this netdev. The
slave TX function forwards the frame to the master interface netdev,
via a DSA tagger. Frames which eth0 receives are passed through the
tagger and then passed to the slave interface.

All the infrastructure you need is already in place. Please use it.
I'm not saying you need to write a DSA driver, but you should make use
of the same ideas and low level hooks in the network stack which DSA
uses.

> > What about the QSFP socket? Can the host get access to the I2C bus?
> > The pins for TX enable, etc. ethtool -m?
>
> No, the QSFP/I2C are also managed by the BMC firmware, and the host
> doesn't have an interface to talk to the BMC firmware about the QSFP.

So can i even tell what SFP is in the socket?

> > > +Speed/Duplex
> > > +------------
> > > +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> > > +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
> >
> > So that means, if i pop out the SFP and put in a different one which
> > supports a different speed, it is expected to be broken until the FPGA
> > is reloaded?
>
> It is expected to be broken.

And since i have no access to the SFP information, i have no idea what
is actually broken? How should i configure the various layers?

> Now the line side is expected to be configured to 4x10G, 4x25G, 2x25G,
> 1x25G. The host side is expected to be 4x10G or 2x40G for the XL710.
>
> So a 4 channel SFP is expected to be inserted into the front panel. And
> we should use a 4x25G SFP, which is compatible with a 4x10G connection.

So if you had exported the SFP to linux, phylink could have handled some
of this for you. Probably with some extensions to phylink, but Russell
King would probably have helped you. phylink has a good idea how to
decode the SFP EEPROM and figure out the link mode. It has interfaces
to configure PCS blocks, so it could probably deal with the line side
and host side PCS. And it would have been easy to send a udev
notification that the SFP has changed; maybe user space needs to
download a different FPGA bit file? So the user would not see a broken
interface, the hardware could be reconfigured on the fly.

This is one problem i have with this driver. It is based around this
somewhat broken reference design. phylib, along with the hacks you
have, are enough for this reference design. But really you want to
make use of phylink in order to support less limited designs which
will follow. Or you need to push a lot more into the BMC, and don't
use phylib at all.

	Andrew
On Mon, Oct 26, 2020 at 02:00:01PM +0100, Andrew Lunn wrote: > > > > +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload > > > > +networking application acceleration. A simple diagram below to for the board: > > > > + > > > > + +----------------------------------------+ > > > > + | FPGA | > > > > ++----+ +-------+ +-----------+ +----------+ +-----------+ +----------+ > > > > +|QSFP|---|retimer|---|Line Side |--|User logic|--|Host Side |---|XL710 | > > > > ++----+ +-------+ |Ether Group| | | |Ether Group| |Ethernet | > > > > + |(PHY + MAC)| |wiring & | |(MAC + PHY)| |Controller| > > > > + +-----------+ |offloading| +-----------+ +----------+ > > > > + | +----------+ | > > > > + | | > > > > + +----------------------------------------+ > > > > > > Is XL710 required? I assume any MAC with the correct MII interface > > > will work? > > > > The XL710 is required for this implementation, in which we have the Host > > Side Ether Group facing the host. The Host Side Ether Group actually > > contains the same IP blocks as Line Side. It contains the compacted MAC & > > PHY functionalities for 25G/40G case. The 25G MAC-PHY soft IP SPEC can > > be found at: > > > > https://www.intel.com/content/www/us/en/programmable/documentation/ewo1447742896786.html > > > > So raw serial data is output from Host Side FPGA, and XL710 is good to > > handle this. > > What i have seen working with Marvell Ethernet switches, is that > Marvell normally recommends connecting them to the Ethernet interfaces > of Marvell SoCs. But the switch just needs a compatible MII interface, > and lots of boards make use of non-Marvell MAC chips. Freescale FEC is > very popular. > > What i'm trying to say is that ideally we need a collection of generic > drivers for the different major components on the board, and a board > driver which glues it all together. 
That then allows somebody to build > other boards, or integrate the FPGA directly into an embedded system > directly connected to a SoC, etc. > > > > Do you really mean PHY? I actually expect it is PCS? > > > > For this implementation, yes. > > Yes, you have a PHY? Or Yes, it is PCS? Sorry, I mean I have a PHY. > > To me, the phylib maintainer, having a PHY means you have a base-T > interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive > architecture when you should be able to just connect SERDES interfaces > together. I see your concerns about the SERDES interface between FPGA & XL710. Considering the DSA, we just enable the cpu facing ports, seems the SERDES interface connection doesn't impact the software. It's just too expensive. > > > > > +The DFL Ether Group driver registers netdev for each line side link. Users > > > > +could use standard commands (ethtool, ip, ifconfig) for configuration and > > > > +link state/statistics reading. For host side links, they are always connected > > > > +to the host ethernet controller, so they should always have same features as > > > > +the host ethernet controller. There is no need to register netdevs for them. > > > > > > So lets say the XL710 is eth0. The line side netif is eth1. Where do i > > > put the IP address? What interface do i add to quagga OSPF? > > > > The IP address should be put in eth0. eth0 should always be used for the > > tools. > > That was what i was afraid of :-) > > > > > The line/host side Ether Group is not the terminal of the network data stream. > > Eth1 will not paticipate in the network data exchange to host. > > > > The main purposes for eth1 are: > > 1. For users to monitor the network statistics on Line Side, and by comparing the > > statistics between eth0 & eth1, users could get some knowledge of how the User > > logic is taking function. > > > > 2. Get the link state of the front panel. 
The XL710 is now connected to > > Host Side of the FPGA and the its link state would be always on. So to > > check the link state of the front panel, we need to query eth1. > > This is very non-intuitive. We try to avoid this in the kernel and the > API to userspace. Ethernet switches are always modelled as > accelerators for what the Linux network stack can already do. You > configure an Ethernet switch port in just the same way configure any > other netdev. You add an IP address to the switch port, you get the > Ethernet statistics from the switch port, routing protocols use the > switch port. > > You design needs to be the same. All configuration needs to happen via > eth1. > > Please look at the DSA architecture. What you have here is very > similar to a two port DSA switch. In DSA terminology, we would call > eth0 the master interface. It needs to be up, but otherwise the user > does not configure it. eth1 is the slave interface. It is the user > facing interface of the switch. All configuration happens on this > interface. Linux can also send/receive packets on this netdev. The > slave TX function forwards the frame to the master interface netdev, > via a DSA tagger. Frames which eth0 receive are passed through the > tagger and then passed to the slave interface. > > All the infrastructure you need is already in place. Please use > it. I'm not saying you need to write a DSA driver, but you should make > use of the same ideas and low level hooks in the network stack which > DSA uses. I did some investigation about the DSA, and actually I wrote a experimental DSA driver. It works and almost meets my need, I can make configuration, run pktgen on slave inf. A main concern for dsa is the wiring from slave inf to master inf depends on the user logic. If FPGA users want to make their own user logic, they may need a new driver. 
But our original design for the FPGA is: kernel drivers support the
fundamental parts - the FPGA FIU (where the Ether Group is) & other
peripherals on board - and userspace gets direct I/O access for the user
logic. Then FPGA users don't have to write & compile a driver for their
user logic changes.

It seems that's not the case for netdev. The user logic is a part of the
whole functionality of the netdev; we cannot split part of the hardware
component to userspace and keep the rest in the kernel. I really need to
reconsider this.

> > > What about the QSFP socket? Can the host get access to the I2C bus?
> > > The pins for TX enable, etc. ethtool -m?
> >
> > No, the QSFP/I2C are also managed by the BMC firmware, and the host
> > doesn't have an interface to talk to the BMC firmware about the QSFP.
>
> So can i even tell what SFP is in the socket?

No.

> > > > +Speed/Duplex
> > > > +------------
> > > > +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> > > > +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
> > >
> > > So that means, if i pop out the SFP and put in a different one which
> > > supports a different speed, it is expected to be broken until the FPGA
> > > is reloaded?
> >
> > It is expected to be broken.
>
> And since i have no access to the SFP information, i have no idea what
> is actually broken? How should i configure the various layers?

With this hardware implementation, I'm afraid the host can not know what
is broken. It can just see that the speed of the slave inf never changes,
and the link state is "No" on the slave inf.

Is it like the fixed phy or fixed link mode? Is it possible to just see
it as fixed and configure the layers?

> > Now the line side is expected to be configured to 4x10G, 4x25G, 2x25G,
> > 1x25G. The host side is expected to be 4x10G or 2x40G for the XL710.
> >
> > So a 4 channel SFP is expected to be inserted into the front panel. And
> > we should use a 4x25G SFP, which is compatible with a 4x10G connection.
> So if you had exported the SFP to linux, phylink could have handled some
> of this for you. Probably with some extensions to phylink, but Russell
> King would probably have helped you. phylink has a good idea how to
> decode the SFP EEPROM and figure out the link mode. It has interfaces
> to configure PCS blocks, so it could probably deal with the line side
> and host side PCS. And it would have been easy to send a udev
> notification that the SFP has changed; maybe user space needs to
> download a different FPGA bit file? So the user would not see a broken
> interface, the hardware could be reconfigured on the fly.
>
> This is one problem i have with this driver. It is based around this
> somewhat broken reference design. phylib, along with the hacks you
> have, are enough for this reference design. But really you want to
> make use of phylink in order to support less limited designs which
> will follow. Or you need to push a lot more into the BMC, and don't
> use phylib at all.

Mm.. seems the hardware should be changed: either let the host directly
access the QSFP, or re-design the BMC to provide more info for the QSFP.

Is it possible that we don't change the hardware, and support the
components (QSFP, retimer) in fixed-link mode? I know this makes the
driver specific to the board, but the boards are being used by customers
and I'm trying to make them supported without hardware changes...

Thanks for your very detailed explanation and guide.

Yilun

> Andrew
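For reference, the fixed-link mode mentioned above is usually described
with the standard devicetree fixed-link binding; a sketch of what such a
fragment could look like follows. The node name and phy-mode are
assumptions for this board, and note that the classic fixed-link binding
enumerates speeds of 10/100/1000/2500/10000, so a 25000 value may need
binding extensions:

```dts
ethernet-port@0 {
	/* ... */
	phy-mode = "25gbase-r";		/* assumed; depends on the design */
	fixed-link {
		speed = <25000>;	/* may require binding support */
		full-duplex;
	};
};
```

With such a description, phylink treats the link as always up at the
fixed speed, which matches the "see it as fixed and configure the layers"
idea in the reply, but it cannot reflect an incompatible module being
inserted.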
On Tue, 27 Oct 2020 01:38:04 +0800 Xu Yilun wrote: > > > The line/host side Ether Group is not the terminal of the network data stream. > > > Eth1 will not paticipate in the network data exchange to host. > > > > > > The main purposes for eth1 are: > > > 1. For users to monitor the network statistics on Line Side, and by comparing the > > > statistics between eth0 & eth1, users could get some knowledge of how the User > > > logic is taking function. > > > > > > 2. Get the link state of the front panel. The XL710 is now connected to > > > Host Side of the FPGA and the its link state would be always on. So to > > > check the link state of the front panel, we need to query eth1. > > > > This is very non-intuitive. We try to avoid this in the kernel and the > > API to userspace. Ethernet switches are always modelled as > > accelerators for what the Linux network stack can already do. You > > configure an Ethernet switch port in just the same way configure any > > other netdev. You add an IP address to the switch port, you get the > > Ethernet statistics from the switch port, routing protocols use the > > switch port. > > > > You design needs to be the same. All configuration needs to happen via > > eth1. > > > > Please look at the DSA architecture. What you have here is very > > similar to a two port DSA switch. In DSA terminology, we would call > > eth0 the master interface. It needs to be up, but otherwise the user > > does not configure it. eth1 is the slave interface. It is the user > > facing interface of the switch. All configuration happens on this > > interface. Linux can also send/receive packets on this netdev. The > > slave TX function forwards the frame to the master interface netdev, > > via a DSA tagger. Frames which eth0 receive are passed through the > > tagger and then passed to the slave interface. > > > > All the infrastructure you need is already in place. Please use > > it. 
> > I'm not saying you need to write a DSA driver, but you should make
> > use of the same ideas and low level hooks in the network stack which
> > DSA uses.
>
> I did some investigation about DSA, and I actually wrote an
> experimental DSA driver. It works and almost meets my needs; I can do
> configuration and run pktgen on the slave interface.
>
> A main concern for DSA is that the wiring from the slave interface to the
> master interface depends on the user logic. If FPGA users want to make
> their own user logic, they may need a new driver. But our original design
> for the FPGA is that kernel drivers support the fundamental parts - the
> FPGA FIU (where the Ether Group is) & other peripherals on board - with
> userspace direct I/O access for the User logic. Then FPGA users don't
> have to write & compile a driver for each user logic change.
> It seems that is not the case for netdev. The user logic is part of the
> whole functionality of the netdev; we cannot split part of the hardware
> component to userspace and keep the rest in kernel. I really need to
> reconsider this.

This is obviously on purpose. Your design as it stands will not fly
upstream, sorry.

From netdev perspective the user should not care how many hardware
blocks are in the pipeline, and on which piece of silicon. You have
a 2 port (modulo port splitting) card, there should be 2 netdevs, and
the link config and forwarding should be configured through those.

Please let folks at Intel know that we don't like the "SDK in user
space with reuse [/abuse] of parts of netdev infra" architecture.
This is the second of these we have seen in a short time. The kernel
is not a library for your SDK to use.
> > > > > Do you really mean PHY? I actually expect it is PCS?
> > > >
> > > > For this implementation, yes.
> > >
> > > Yes, you have a PHY? Or yes, it is PCS?
> >
> > Sorry, I mean I have a PHY.
> >
> > > To me, the phylib maintainer, having a PHY means you have a base-T
> > > interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive
> > > architecture when you should be able to just connect SERDES interfaces
> > > together.

You really have 25Gbase-T, 40Gbase-T? Between the FPGA & the XL710?
What copper PHYs are you using?

> I see your concerns about the SERDES interface between the FPGA & XL710.

I have no concerns about direct SERDES connections. That is the normal
way of doing this. It keeps things a lot simpler, since you don't have to
worry about driving the PHYs.

> I did some investigation about DSA, and I actually wrote an
> experimental DSA driver. It works and almost meets my needs; I can do
> configuration and run pktgen on the slave interface.

Cool. As I said, I don't know if this actually needs to be a DSA
driver. It might just need to borrow some ideas from DSA.

> Mm.. it seems the hardware should be changed: either let the host
> directly access the QSFP, or re-design the BMC to provide more info
> for the QSFP.

At a minimum, you need to support ethtool -m. It could be a firmware
call to the BMC, or you could expose the i2c bus somehow. There are
plenty of MAC drivers which implement ethtool -m without using phylink.

But I think you need to take a step back first, and look at the bigger
picture. What is Intel's goal? Are they just going to sell complete
cards? Or do they also want to sell the FPGA as a component anybody can
put onto their own board?

If there are only ever going to be complete cards, then you can go the
firmware direction, push a lot of functionality into the BMC, and have
the card driver make firmware calls to control the SFP, retimer,
etc. You can then throw away your mdio and phy driver hacks.
If however, the FPGA is going to be available as a component, can you
also assume there is a BMC? Running Intel firmware? Can the customer
also modify this firmware for their own needs? I think that is going
to be difficult. So you need to push as much as possible towards
Linux, and let Linux drive all the hardware: the SFP, retimer, FPGA,
etc.

	Andrew
On Mon, Oct 26, 2020 at 11:35:52AM -0700, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 01:38:04 +0800 Xu Yilun wrote:
> > > > The line/host side Ether Group is not the terminal of the network data stream.
> > > > Eth1 will not participate in the network data exchange to the host.
> > > >
> > > > The main purposes for eth1 are:
> > > > 1. For users to monitor the network statistics on the Line Side; by comparing the
> > > > statistics between eth0 & eth1, users could get some knowledge of how the User
> > > > logic is functioning.
> > > >
> > > > 2. Get the link state of the front panel. The XL710 is now connected to
> > > > the Host Side of the FPGA and its link state would always be on. So to
> > > > check the link state of the front panel, we need to query eth1.
> > >
> > > This is very non-intuitive. We try to avoid this in the kernel and the
> > > API to userspace. Ethernet switches are always modelled as
> > > accelerators for what the Linux network stack can already do. You
> > > configure an Ethernet switch port in just the same way you configure any
> > > other netdev. You add an IP address to the switch port, you get the
> > > Ethernet statistics from the switch port, routing protocols use the
> > > switch port.
> > >
> > > Your design needs to be the same. All configuration needs to happen via
> > > eth1.
> > >
> > > Please look at the DSA architecture. What you have here is very
> > > similar to a two port DSA switch. In DSA terminology, we would call
> > > eth0 the master interface. It needs to be up, but otherwise the user
> > > does not configure it. eth1 is the slave interface. It is the user
> > > facing interface of the switch. All configuration happens on this
> > > interface. Linux can also send/receive packets on this netdev. The
> > > slave TX function forwards the frame to the master interface netdev,
> > > via a DSA tagger. Frames which eth0 receives are passed through the
> > > tagger and then passed to the slave interface.
> > >
> > > All the infrastructure you need is already in place. Please use
> > > it. I'm not saying you need to write a DSA driver, but you should make
> > > use of the same ideas and low level hooks in the network stack which
> > > DSA uses.
> >
> > I did some investigation about DSA, and I actually wrote an
> > experimental DSA driver. It works and almost meets my needs; I can do
> > configuration and run pktgen on the slave interface.
> >
> > A main concern for DSA is that the wiring from the slave interface to the
> > master interface depends on the user logic. If FPGA users want to make
> > their own user logic, they may need a new driver. But our original design
> > for the FPGA is that kernel drivers support the fundamental parts - the
> > FPGA FIU (where the Ether Group is) & other peripherals on board - with
> > userspace direct I/O access for the User logic. Then FPGA users don't
> > have to write & compile a driver for each user logic change.
> > It seems that is not the case for netdev. The user logic is part of the
> > whole functionality of the netdev; we cannot split part of the hardware
> > component to userspace and keep the rest in kernel. I really need to
> > reconsider this.
>
> This is obviously on purpose. Your design as it stands will not fly
> upstream, sorry.
>
> From netdev perspective the user should not care how many hardware
> blocks are in the pipeline, and on which piece of silicon. You have
> a 2 port (modulo port splitting) card, there should be 2 netdevs, and
> the link config and forwarding should be configured through those.
>
> Please let folks at Intel know that we don't like the "SDK in user
> space with reuse [/abuse] of parts of netdev infra" architecture.
> This is the second of these we have seen in a short time. The kernel
> is not a library for your SDK to use.

I get your point. I'll share the information internally and reconsider
the design.

Thanks,
Yilun
On Mon, Oct 26, 2020 at 08:14:00PM +0100, Andrew Lunn wrote:
> > > > > > Do you really mean PHY? I actually expect it is PCS?
> > > > >
> > > > > For this implementation, yes.
> > > >
> > > > Yes, you have a PHY? Or yes, it is PCS?
> > >
> > > Sorry, I mean I have a PHY.
> > >
> > > > To me, the phylib maintainer, having a PHY means you have a base-T
> > > > interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive
> > > > architecture when you should be able to just connect SERDES interfaces
> > > > together.
>
> You really have 25Gbase-T, 40Gbase-T? Between the FPGA & the XL710?
> What copper PHYs are you using?

Sorry for the confusion. I'll check with our board designer and reply later.

> > I see your concerns about the SERDES interface between the FPGA & XL710.
>
> I have no concerns about direct SERDES connections. That is the normal
> way of doing this. It keeps things a lot simpler, since you don't have to
> worry about driving the PHYs.
>
> > I did some investigation about DSA, and I actually wrote an
> > experimental DSA driver. It works and almost meets my needs; I can do
> > configuration and run pktgen on the slave interface.
>
> Cool. As I said, I don't know if this actually needs to be a DSA
> driver. It might just need to borrow some ideas from DSA.
>
> > Mm.. it seems the hardware should be changed: either let the host
> > directly access the QSFP, or re-design the BMC to provide more info
> > for the QSFP.
>
> At a minimum, you need to support ethtool -m. It could be a firmware
> call to the BMC, or you could expose the i2c bus somehow. There are
> plenty of MAC drivers which implement ethtool -m without using phylink.
>
> But I think you need to take a step back first, and look at the bigger
> picture. What is Intel's goal? Are they just going to sell complete
> cards? Or do they also want to sell the FPGA as a component anybody can
> put onto their own board?
>
> If there are only ever going to be complete cards, then you can go the
> firmware direction, push a lot of functionality into the BMC, and have
> the card driver make firmware calls to control the SFP, retimer,
> etc. You can then throw away your mdio and phy driver hacks.
>
> If however, the FPGA is going to be available as a component, can you
> also assume there is a BMC? Running Intel firmware? Can the customer
> also modify this firmware for their own needs? I think that is going
> to be difficult. So you need to push as much as possible towards
> Linux, and let Linux drive all the hardware: the SFP, retimer, FPGA,
> etc.

This is very helpful. I'll share it with our team and reconsider
the design.

Thanks,
Yilun

>
>	Andrew
>
Hi Andrew:

On Mon, Oct 26, 2020 at 08:14:00PM +0100, Andrew Lunn wrote:
> > > > > > Do you really mean PHY? I actually expect it is PCS?
> > > > >
> > > > > For this implementation, yes.
> > > >
> > > > Yes, you have a PHY? Or yes, it is PCS?
> > >
> > > Sorry, I mean I have a PHY.
> > >
> > > > To me, the phylib maintainer, having a PHY means you have a base-T
> > > > interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive
> > > > architecture when you should be able to just connect SERDES interfaces
> > > > together.
>
> You really have 25Gbase-T, 40Gbase-T? Between the FPGA & the XL710?
> What copper PHYs are you using?
>
> > I see your concerns about the SERDES interface between the FPGA & XL710.
>
> I have no concerns about direct SERDES connections. That is the normal
> way of doing this. It keeps things a lot simpler, since you don't have to
> worry about driving the PHYs.

I did some investigation and now I have some details.
The term 'PHY' described in the Ether Group spec is actually the PCS + PMA;
a figure below shows one configuration:

  +------------------------+            +-----------------+
  | Host Side Ether Group  |            |      XL710      |
  |                        |            |                 |
  | +--------------------+ |            |                 |
  | |    40G Ether IP    | |            |                 |
  | |                    | |            |                 |
  | |       +---------+  | |   XLAUI    |                 |
  | | MAC - |PCS - PMA|  | |------------| PMA - PCS - MAC |
  | |       +---------+  | |            |                 |
  +-+--------------------+-+            +-----------------+

Thanks,
Yilun
> I did some investigation and now I have some details.
> The term 'PHY' described in the Ether Group spec is actually the PCS + PMA;
> a figure below shows one configuration:
>
>   +------------------------+            +-----------------+
>   | Host Side Ether Group  |            |      XL710      |
>   |                        |            |                 |
>   | +--------------------+ |            |                 |
>   | |    40G Ether IP    | |            |                 |
>   | |                    | |            |                 |
>   | |       +---------+  | |   XLAUI    |                 |
>   | | MAC - |PCS - PMA|  | |------------| PMA - PCS - MAC |
>   | |       +---------+  | |            |                 |
>   +-+--------------------+-+            +-----------------+

Thanks, that makes a lot more sense.

	Andrew
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index cbb75a18..eb7c443 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -26,6 +26,7 @@ Contents:
    freescale/gianfar
    google/gve
    huawei/hinic
+   intel/dfl-eth-group
    intel/e100
    intel/e1000
    intel/e1000e
diff --git a/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst b/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst
new file mode 100644
index 0000000..525807e
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst
@@ -0,0 +1,102 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=======================================================================
+DFL device driver for Ether Group private feature on Intel(R) PAC N3000
+=======================================================================
+
+This is the driver for the Ether Group private feature on the Intel(R)
+PAC (Programmable Acceleration Card) N3000.
+
+The Intel(R) PAC N3000 is an FPGA based SmartNIC platform for multi-workload
+networking application acceleration. A simple diagram of the board is shown
+below::
+
+                  +------------------------------------------------+
+                  |                      FPGA                      |
+  +----+   +-------+  +-----------+  +----------+  +-----------+  +----------+
+  |QSFP|---|retimer|--|Line Side  |--|User logic|--|Host Side  |--|XL710     |
+  +----+   +-------+  |Ether Group|  |          |  |Ether Group|  |Ethernet  |
+                      |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|  |Controller|
+                      +-----------+  |offloading|  +-----------+  +----------+
+                  |                  +----------+                 |
+                  |                                                |
+                  +------------------------------------------------+
+
+The FPGA is composed of the FPGA Interface Module (FIM) and the Accelerated
+Function Unit (AFU). The FIM implements the basic functionalities for FPGA
+access, management and reprogramming, while the AFU is the FPGA
+reprogrammable region for users.
+
+The Line Side & Host Side Ether Groups are soft IP blocks embedded in the FIM.
+They are internally wired to the AFU and communicate with it via MAC packets.
+The user logic is developed by FPGA users and programmed into the AFU,
+providing the user defined wire connections between the line side & host side
+data interfaces, as well as the MAC layer offloading.
+
+There are 2 types of interfaces for the Ether Groups:
+
+1. The data interfaces connect the Ether Groups and the AFU; the host has no
+ability to control the data stream. So the FPGA is like a pipe between the
+host ethernet controller and the retimer chip.
+
+2. The management interfaces connect the Ether Groups to the host, so the
+host can access the Ether Group registers for configuration and statistics
+reading.
+
+The Intel(R) PAC N3000 can be programmed with various configurations (with
+different link numbers and speeds, e.g. 8x10G, 4x25G ...). This is done by
+programming different variants of the Ether Group IP blocks and applying the
+corresponding configuration to the retimer chips.
+
+The DFL Ether Group driver registers a netdev for each line side link. Users
+can use standard commands (ethtool, ip, ifconfig) for configuration and
+link state/statistics reading. The host side links are always connected to
+the host ethernet controller, so they always have the same features as the
+host ethernet controller and there is no need to register netdevs for them.
+The driver just enables these links on probe.
+
+The retimer chips are managed by the onboard BMC (Board Management Controller)
+firmware; the host driver cannot access them directly. So each retimer is
+mostly like an external fixed PHY. However, due to a hardware limitation, the
+link states detected by the retimer chips cannot be propagated to the Ether
+Groups. In order to manage the link state, a PHY driver (intel-m10-bmc-retimer)
+is introduced to query the BMC for the retimer's link state.
+The Ether Group driver connects to the PHY devices and gets the link states.
+The intel-m10-bmc-retimer driver creates a pseudo MDIO bus for each board, so
+that the Ether Group driver can find the PHY devices by their pseudo PHY
+addresses.
+
+
+Features supported
+==================
+
+Data Path
+---------
+Since the driver can't control the data stream, the Ether Group driver
+doesn't implement tx/rx functions. Any transmit attempt on these links from
+the host will be dropped, and no data can be received by the host from these
+links. Users should operate on the netdev of the host ethernet controller
+for networking data traffic.
+
+Speed/Duplex
+------------
+The Ether Group doesn't support auto-negotiation. The link speed is fixed to
+10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
+
+Statistics
+----------
+The Ether Group IP has statistics counters for ethernet traffic and errors.
+The user can obtain these MAC-level statistics using the "ethtool -S" option.
+
+MTU
+---
+The Ether Group IP is capable of detecting oversized packets. It will not
+drop such packets but passes them up and increments the tx/rx oversize
+counters. The MTU can be changed via the ip or ifconfig commands.
+
+Flow Control
+------------
+Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
+transmitting pause frames. Receiving pause requests from outside to the Ether
+Group MAC is not supported. Flow control auto-negotiation is not supported.
+The user can enable or disable Tx Flow Control using
+"ethtool -A eth? tx <on|off>".
This patch adds the document for the DFL Ether Group driver.

Signed-off-by: Xu Yilun <yilun.xu@intel.com>
---
 .../networking/device_drivers/ethernet/index.rst   |   1 +
 .../ethernet/intel/dfl-eth-group.rst               | 102 +++++++++++++++++++++
 2 files changed, 103 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/ethernet/intel/dfl-eth-group.rst