Message ID | 1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org |
---|---|
Headers | show |
Series | RFC CPSW switchdev mode | expand |
Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: >Hello, > >This is adding a new mode on the cpsw driver based around switchdev. >In order to enable this you need to enable CONFIG_NET_SWITCHDEV, >CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV >and add to udev config: > >SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \ > ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}" >Since the phys_switch_id is based on cpsw version, users with different >version will need to do 'ip -d link show dev sw0p0 | grep switchid' and >replace with the correct value. > >This patch creates 3 ports, sw0p0, sw0p1 and sw0p2. >sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices >while sw0p0 is the switch 'cpu facing port'. Any reason you need cpu port? We don't need it in mlxsw and also in dsa. What is this device? Could you give me some pointer to description?
On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote: > Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: > Any reason you need cpu port? We don't need it in mlxsw and also in dsa. Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. The reason is that TI wants this configured differently from customer facing ports. Apparently there are existing customers already using the "feature". So OR'ing and adding the cpu port on every operation (add/del vlans add ucast/mcast entries etc) was less favoured. > > What is this device? Could you give me some pointer to description? This is the switch used on TI's AM5728 and BBB boards. I am pretty sure there are other platforms i am not aware of. http://www.ti.com/lit/ug/spruhz6j/spruhz6j.pdf is the techincal reference manual. Section 24.11.5.4 "Initialization and Configuration of CPSW" is the switch part. Thanks, Ilias
On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote: > On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote: > > Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: > > Any reason you need cpu port? We don't need it in mlxsw and also in dsa. > Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. > The reason is that TI wants this configured differently from customer facing > ports. Apparently there are existing customers already using the "feature". > So OR'ing and adding the cpu port on every operation (add/del vlans add > ucast/mcast entries etc) was less favoured. Hi Ilias Nice to see this device moving away from its custom model and towards the switchdev model. Did you consider making a clean break from the existing code and write a new driver. Let the existing customers using the existing driver. Have the new switchdev driver fully conform to switchdev. I don't like having this 'cpu' interface. As you say, it breaks the switchhdev model. If we need to extend the switchdev model to support some use case, lets do that. Please can you fully describe the use cases, so we can discuss how to implement them cleanly within the switchdev model. Thanks Andrew
On 24.5.2018 14:54, Andrew Lunn wrote: > On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote: >> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote: >>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: >>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa. >> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. >> The reason is that TI wants this configured differently from customer facing >> ports. Apparently there are existing customers already using the "feature". >> So OR'ing and adding the cpu port on every operation (add/del vlans add >> ucast/mcast entries etc) was less favoured. > > Hi Ilias > > Nice to see this device moving away from its custom model and towards > the switchdev model. +1 > Did you consider making a clean break from the existing code and write > a new driver. Let the existing customers using the existing > driver. Have the new switchdev driver fully conform to switchdev. I would also prefer fresh new driver. The existing one can be marked as 'bugfix-only' and later pertinently deprecated/removed. > > I don't like having this 'cpu' interface. As you say, it breaks the > switchhdev model. If we need to extend the switchdev model to support > some use case, lets do that. Please can you fully describe the use > cases, so we can discuss how to implement them cleanly within the > switchdev model. +1 Ivan
On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote: > On 24.5.2018 14:54, Andrew Lunn wrote: > > On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote: > >> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote: > >>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: > >>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa. > >> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. > >> The reason is that TI wants this configured differently from customer facing > >> ports. Apparently there are existing customers already using the "feature". > >> So OR'ing and adding the cpu port on every operation (add/del vlans add > >> ucast/mcast entries etc) was less favoured. > > > > Hi Ilias > > > > Nice to see this device moving away from its custom model and towards > > the switchdev model. > +1 Thanks. To be honest it opens up so many posibilities for common configuration from userspace across vendors that doing something new without it doesn't make any sense (at least to me). > > > Did you consider making a clean break from the existing code and write > > a new driver. Let the existing customers using the existing > > driver. Have the new switchdev driver fully conform to switchdev. > > I would also prefer fresh new driver. The existing one can be marked as > 'bugfix-only' and later pertinently deprecated/removed. Yes, but given the driver and the platforms it's used at, we ended up patching the existing driver. I am not opposed to the idea, but Grygorii is more suited to reply on that. > > > > I don't like having this 'cpu' interface. As you say, it breaks the > > switchhdev model. If we need to extend the switchdev model to support > > some use case, lets do that. Please can you fully describe the use > > cases, so we can discuss how to implement them cleanly within the > > switchdev model. > +1 There's configuration needs from customers adding or not adding a VLAN to the CPU port. In my configuration examples for instance, if the cpu port is not added to the bridge, you cannot get an ip address on it. Similar cases exist for customers on adding MDBs as far as i know. So they want the "customer facing ports" to have the MDBs present but not the cpu port. In some cases (where the CPE/device that has the switch) participates in the traffic they want the cpu port to have the samne MDBs installed. This is just two simple cases that come in mind, again Grygorii is more suited to answer and explain existing/more complex use cases better than me. Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on the other hand you can access it's configuration using the same userspace tools and the same commands you do for the "normal" ports. Extending switchdev might be the proper solution here. Ilias
> There's configuration needs from customers adding or not adding a VLAN to the > CPU port. In my configuration examples for instance, if the cpu port is not > added to the bridge, you cannot get an ip address on it. If you cannot get an IP address, it is plain broken. The whole idea is that switch port interfaces are just linux interfaces. A linux interface which cannot get an IP address is broken. > Similar cases exist for customers on adding MDBs as far as i know. So they want > the "customer facing ports" to have the MDBs present but not the cpu port. That i can understand. And it should actually work now with switchdev. It performs IGMP snooping, and if there is nothing joining the group on the CPU, it won't add an MDB entry to forward traffic to the CPU. > Adding a cpu port that cannot transmit or receive traffic is a bit "weird" And how is it supposed to send BPDUs? STP is going to be broken.... Andrew
On Thu, May 24, 2018 at 04:54:41PM +0200, Andrew Lunn wrote: > If you cannot get an IP address, it is plain broken. The whole idea is > that switch port interfaces are just linux interfaces. A linux > interface which cannot get an IP address is broken. The switch interfaces can get ip addresses just like every linux interface. The cpu port can't (sw0p0) > > > Similar cases exist for customers on adding MDBs as far as i know. So they want > > the "customer facing ports" to have the MDBs present but not the cpu port. > > That i can understand. And it should actually work now with > switchdev. It performs IGMP snooping, and if there is nothing joining > the group on the CPU, it won't add an MDB entry to forward traffic to > the CPU. Yes, but this should be configurable (i.e the customer can deny adding the MDB on the cpu port) > > > Adding a cpu port that cannot transmit or receive traffic is a bit "weird" > > And how is it supposed to send BPDUs? STP is going to be broken.... Not sure about this, i'll have to check Regards Ilias
> > That i can understand. And it should actually work now with > > switchdev. It performs IGMP snooping, and if there is nothing joining > > the group on the CPU, it won't add an MDB entry to forward traffic to > > the CPU. > Yes, but this should be configurable (i.e the customer can deny adding the MDB > on the cpu port) O.K, back to the basic idea. Switch ports are just normal Linux interfaces. How would you configure this with two e1000e put in a bridge? I want multicast to be bridged between the two e1000e, but the host stack should not see the packets. Andrew
On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote: > O.K, back to the basic idea. Switch ports are just normal Linux > interfaces. > > How would you configure this with two e1000e put in a bridge? I want > multicast to be bridged between the two e1000e, but the host stack > should not see the packets. I am not sure i am following. I might be missing something. In your case you got two ethernet pci/pcie interfaces bridged through software. You can filter those if needed. In the case we are trying to cover, you got a hardware that offers that capability. Since not all switches are pcie based shouldn't we be able to allow this ? Regards Ilias
On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote: > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote: > > O.K, back to the basic idea. Switch ports are just normal Linux > > interfaces. > > > > How would you configure this with two e1000e put in a bridge? I want > > multicast to be bridged between the two e1000e, but the host stack > > should not see the packets. > I am not sure i am following. I might be missing something. In your case you > got two ethernet pci/pcie interfaces bridged through software. You can filter > those if needed. In the case we are trying to cover, you got a hardware that > offers that capability. Since not all switches are pcie based shouldn't we be > able to allow this ? switchdev is about offloading what Linux can do to hardware to accelerate it. The switch is a block of accelerator hardware, like a GPU is for accelerating graphics. Linux can render OpenGL, but it is better to hand it over to the GPU accelerator. Same applies here. The Linux bridge can bridge multicast. Using the switchdev API, you can push that down to the accelerator, and let it do it. So you need to think about, how do you make the Linux bridge not pass multicast traffic to the host stack. Then how do you extend the switchdev API so you can push this down to the accelerator. To really get switchdev, you often need to pivot your point of view a bit. People often think, switchdev is about writing drivers for switches. Its not, its about how you offload networking which Linux can do down to a switch. And if the switch cannot accelerate it, you leave Linux to do it. When you get in the details, i think you will find the switchdev API actually already has what you need for this use case. What you need to figure out is how you make the Linux bridge not pass multicast to the host. Well, actually, not pass multicast it has not asked for. Then accelerate it. Andrew
On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote: > On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote: > > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote: > > > O.K, back to the basic idea. Switch ports are just normal Linux > > > interfaces. > > > > > > How would you configure this with two e1000e put in a bridge? I want > > > multicast to be bridged between the two e1000e, but the host stack > > > should not see the packets. > > I am not sure i am following. I might be missing something. In your case you > > got two ethernet pci/pcie interfaces bridged through software. You can filter > > those if needed. In the case we are trying to cover, you got a hardware that > > offers that capability. Since not all switches are pcie based shouldn't we be > > able to allow this ? > > switchdev is about offloading what Linux can do to hardware to > accelerate it. The switch is a block of accelerator hardware, like a > GPU is for accelerating graphics. Linux can render OpenGL, but it is > better to hand it over to the GPU accelerator. > > Same applies here. The Linux bridge can bridge multicast. Using the > switchdev API, you can push that down to the accelerator, and let it > do it. > > So you need to think about, how do you make the Linux bridge not pass > multicast traffic to the host stack. Then how do you extend the > switchdev API so you can push this down to the accelerator. > > To really get switchdev, you often need to pivot your point of view a > bit. People often think, switchdev is about writing drivers for > switches. Its not, its about how you offload networking which Linux > can do down to a switch. And if the switch cannot accelerate it, you > leave Linux to do it. > > When you get in the details, i think you will find the switchdev API > actually already has what you need for this use case. What you need to > figure out is how you make the Linux bridge not pass multicast to the > host. Well, actually, not pass multicast it has not asked for. Then > accelerate it. > Understood, if we missed back anything on handling multicast for the cpu port we'll go back and fix it (i am assuming snooping is the answer here). Multicasting is only one part of the equation though. What about the need for vlans/FDBs on that port though? Ilias
On Fri, May 25, 2018 at 09:29:02AM +0300, Ilias Apalodimas wrote: > On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote: > > On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote: > > > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote: > > > > O.K, back to the basic idea. Switch ports are just normal Linux > > > > interfaces. > > > > > > > > How would you configure this with two e1000e put in a bridge? I want > > > > multicast to be bridged between the two e1000e, but the host stack > > > > should not see the packets. > > > I am not sure i am following. I might be missing something. In your case you > > > got two ethernet pci/pcie interfaces bridged through software. You can filter > > > those if needed. In the case we are trying to cover, you got a hardware that > > > offers that capability. Since not all switches are pcie based shouldn't we be > > > able to allow this ? > > > > switchdev is about offloading what Linux can do to hardware to > > accelerate it. The switch is a block of accelerator hardware, like a > > GPU is for accelerating graphics. Linux can render OpenGL, but it is > > better to hand it over to the GPU accelerator. > > > > Same applies here. The Linux bridge can bridge multicast. Using the > > switchdev API, you can push that down to the accelerator, and let it > > do it. > > > > So you need to think about, how do you make the Linux bridge not pass > > multicast traffic to the host stack. Then how do you extend the > > switchdev API so you can push this down to the accelerator. > > > > To really get switchdev, you often need to pivot your point of view a > > bit. People often think, switchdev is about writing drivers for > > switches. Its not, its about how you offload networking which Linux > > can do down to a switch. And if the switch cannot accelerate it, you > > leave Linux to do it. > > > > When you get in the details, i think you will find the switchdev API > > actually already has what you need for this use case. What you need to > > figure out is how you make the Linux bridge not pass multicast to the > > host. Well, actually, not pass multicast it has not asked for. Then > > accelerate it. > > > Understood, if we missed back anything on handling multicast for > the cpu port we'll go back and fix it (i am assuming snooping is the answer > here). Multicasting is only one part of the equation though. What about the > need for vlans/FDBs on that port though? > I just noticed this: https://www.spinics.net/lists/netdev/msg504760.html I tried doing the "bridge vlan add vid 2 dev br0 self" in my initial attempts but didn't get a notification to program the CPU port(with the sef argument). This obviously solves our vlan configuration issue. So it's only static FBDs left. Regards Ilias
> So it's only static FBDs left.
Please describe the use case. Why would i want to put static FDB
addresses on a CPU port? What in the linux network stack needs this?
Andrew
> Understood, if we missed back anything on handling multicast for > the cpu port we'll go back and fix it (i am assuming snooping is the answer > here). It is part of the answer. You should also look at .ndo_set_rx_mode. When the switch ports are not in a bridge, this call i used to pass a list of MAC addresses which the network stack would like to receiver. You probably want to turn that list into MBD and FDBs. Andrew
Sorry for the late response i had some time to take another look and do some extra testing > switchdev is about offloading what Linux can do to hardware to > accelerate it. The switch is a block of accelerator hardware, like a > GPU is for accelerating graphics. Linux can render OpenGL, but it is > better to hand it over to the GPU accelerator. > > Same applies here. The Linux bridge can bridge multicast. Using the > switchdev API, you can push that down to the accelerator, and let it > do it. > > So you need to think about, how do you make the Linux bridge not pass > multicast traffic to the host stack. Then how do you extend the > switchdev API so you can push this down to the accelerator. > > To really get switchdev, you often need to pivot your point of view a > bit. People often think, switchdev is about writing drivers for > switches. Its not, its about how you offload networking which Linux > can do down to a switch. And if the switch cannot accelerate it, you > leave Linux to do it. > > When you get in the details, i think you will find the switchdev API > actually already has what you need for this use case. What you need to > figure out is how you make the Linux bridge not pass multicast to the > host. Well, actually, not pass multicast it has not asked for. Then > accelerate it. The current driver is already working like that. The difference between the modes of operation is this: By registering the 'cpu port' we choose if the linux host is going to see the br_ip4_multicast_igmp3_report or br_multicast_ipv4_rcv (by configuring the vlan it participates) and trigger switchdev to add the MDBs If the cpu port is member of that VLAN then the dynamic entry shows on 'bridge mdb show' command i.e dev br0 port sw0p1 grp 239.1.1.1 temp offload vid 100 If not the user is able to add it manually. Anyway i got the main points of the RFC, if Petr's patch get accepted i might be able to respin this without registering a CPU port. Regards Ilias
Hi Ilias, On 05/24/2018 01:56 AM, Ilias Apalodimas wrote: > Hello, > > This is adding a new mode on the cpsw driver based around switchdev. > In order to enable this you need to enable CONFIG_NET_SWITCHDEV, > CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV > and add to udev config: > > SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \ > ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}" > Since the phys_switch_id is based on cpsw version, users with different > version will need to do 'ip -d link show dev sw0p0 | grep switchid' and > replace with the correct value. > > This patch creates 3 ports, sw0p0, sw0p1 and sw0p2. > sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices > while sw0p0 is the switch 'cpu facing port'. > sw0p0 will be unable to receive and transmit traffic and it's not 100% within > switchdev scope but, it's used to configure switch cpu port individually as > this is needed for various switch features and configuration scenarios. First, sorry for delayed reply it was very tough week (new SoC). Second, Thanks a lot for your great work. I'm still testing it with different use cases and trying to consolidate my reply for all questions. All, thanks for your comments. -- regards, -grygorii
On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote: > Hi Ilias, > Second, Thanks a lot for your great work. I'm still testing it with different > use cases and trying to consolidate my reply for all questions. > > All, thanks for your comments. Hi Grygorii Something i've said to Ilias already. I would recommend you don't try to cover all your uses cases with the first version. Keep it simple and clean, don't do anything controversial and get it merged. Then add more features one by one. We can then discuss any odd ball features while being able to look at the complete system, driver, switchdev and the network stack. Andrew
Hi All, Sry, for delayed reply. On 05/24/2018 09:08 AM, Ilias Apalodimas wrote: > On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote: >> On 24.5.2018 14:54, Andrew Lunn wrote: >>> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote: >>>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote: >>>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote: >>>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa. >>>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. >>>> The reason is that TI wants this configured differently from customer facing >>>> ports. Apparently there are existing customers already using the "feature". >>>> So OR'ing and adding the cpu port on every operation (add/del vlans add >>>> ucast/mcast entries etc) was less favoured. >>> >>> Hi Ilias >>> >>> Nice to see this device moving away from its custom model and towards >>> the switchdev model. >> +1 > Thanks. To be honest it opens up so many posibilities for common configuration > from userspace across vendors that doing something new without it doesn't make > any sense (at least to me). > >> >>> Did you consider making a clean break from the existing code and write >>> a new driver. Let the existing customers using the existing >>> driver. Have the new switchdev driver fully conform to switchdev. >> >> I would also prefer fresh new driver. The existing one can be marked as >> 'bugfix-only' and later pertinently deprecated/removed. > Yes, but given the driver and the platforms it's used at, we ended up patching > the existing driver. I am not opposed to the idea, but Grygorii is more suited > to reply on that. Correct, we considered two options - start from scratch or hack existing driver to get working prototype as fast as possible. Hacking of existing driver was just faster way to go and i agree that new driver might be better approach for the future. > >>> >>> I don't like having this 'cpu' interface. As you say, it breaks the >>> switchhdev model. If we need to extend the switchdev model to support >>> some use case, lets do that. Please can you fully describe the use >>> cases, so we can discuss how to implement them cleanly within the >>> switchdev model. >> +1 > There's configuration needs from customers adding or not adding a VLAN to the > CPU port. In my configuration examples for instance, if the cpu port is not > added to the bridge, you cannot get an ip address on it. > Similar cases exist for customers on adding MDBs as far as i know. So they want > the "customer facing ports" to have the MDBs present but not the cpu port. > In some cases (where the CPE/device that has the switch) participates in the > traffic they want the cpu port to have the samne MDBs installed. > This is just two simple cases that come in mind, again Grygorii is more suited > to answer and explain existing/more complex use cases better than me. > > Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on > the other hand you can access it's configuration using the same userspace tools > and the same commands you do for the "normal" ports. Extending switchdev might > be the proper solution here. I'd try to provide more information about this switch and why we end up adding eth0 port. Please, be patient to me as this is new area for me and I might not be right in some conclusions or can missing smth. *Before this patch set*: Current CPSW driver in switch mode can be described as below +----------------------------------------+ |Linux Host 0 | | | | TI tool (ioctl)/netdev | | + + | +-----------------+----------------------+ | | | Control MMIO | | | Data DMA | | | +------------+-----+-----+-----------------+ | | | | CPDMA | CPSW | | v | | | | | | +-----------+ | | | | P0 | | | | | | | | | +-----+-----+ | | | | FDB: static MAC Port=0 | | +-----+-----+ | | +------> ALE +-----+ | | +------+ | | | | | +-----------+ | | | +---+----+ +----+---+ | | | | | | | | | P1 | | P2 | | +------------------------------------------+ | | | | | PHY | | PHY | +---+----+ +----+---+ | | +---+----+ +---+----+ | | | | | Host 1| | Host 2 | +--------+ +--------+ Note. This is embedded world and in many cases network configuration is static, everything which is not allowed - drop (means no such things like IGMP or even ARP). The core Part of CPSW is ALE module which perform switching ops according to ALE table (fdb/mdb/vlan). From ALE point of view *all* port absolutely equal. And Linux Host 0 in most of customer use cases is the no much different from Hosts 1 or 2, so allowed net traffic to Linux Host 0 have to be very carefully configured as Apps running on it must continue working even if network failed or overloaded (packet storms). So, as per my understanding, P0 meaning here is absolutely not the same as CPU port in DSA. It just another switch port with bad luck to be connected directly to CPU. Current CPSW driver offloads basic switch configuration in HW and additional configuration can be done using TI custom tool (kernel updated to accept additional IOCTL). Current driver creates one netdev ETH0 and all net traffic enter/exit through this device - it possible to distinguish which external port p1/2 received packet or send packet directly to external port p1/2, but current CPSW doesn't do this. Static Unicast FDB entries created in ALE table for Port 0 to specify which unicast traffic need to be accepted by Linux Host 0. By default only registered multicast or broadcast packets forwarded to Linux Host 0. *After this patch set*: goal keep things working the same as max as possible and get rid of TI custom tool. +-------------------------------------------------------+ | Linux Host 0 | | +------------------+ | | | br0 | | | | | | | +---+--------------+ | | | |data | | +-------+--------------+---------------+ | | | | ^ ^ | | | | | | ++-----+ +-+---+-+ +----+-+-+ | | | |sw0p0 | | sw0p1 | |sw0p2 | | | | +------+ +----^--+ +-----^--+ | | | | | | | | +-----------+-------------+ | | | | | +----+v+------------------------------------------------+ | | | Control MMIO | | | Data DMA | | | +------------+-----+-----+-----------------------------+ | | | | CPDMA | CPSW | | v | | | | | | +-----------+ | | | | P0 | | | | | | | | | +-----+-----+ | | | | FDB: static MACP1 port=0 | | | +-----+-----+ FDB: static MACP2 port=0 | | +------> ALE +-----+ | | +------+ | | | | | +-----------+ | | | +---+----+ +----+---+ | | | | | | | | | P1 | | P2 | | +------------------------------------------------------+ | | | | | PHY | | PHY | +--------+ +--------+ In this implementation switch egress traffic to Linux Host 0 always split between sw0p1 and sw0p2 depending on which external port P1/P2 packet was received, sw0p0 doesn't produce any traffic. On switch ingress from Linux Host 0 packets are always sent directly to external ports P1/P2, with assumption that Linux Bridge knows where packet should go, and ALE bypassed in this case. sw0p0 doesn't allows to send any traffic and serves for configuration purposes only. Below I've described some tested use cases (not include full static configuration), but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables adds MQPRIO and CBS Qdisc and targets AVB network features. It required to offload MQPRIO and CBS parameters on all ports including P0. In case of P0, CPDMA TX channels shapers need to be configured, and in case of sw0p1/sw0p2 internal FIFOS. sw0p0 also expected to be used to configure CPDMA interface in general - number of tx/rx channels, rates, ring sizes. In addition there is set of global CPSW parameters (not related to P1/P2, like MAC Authorization Mode, OUI Deny Mode, crc ) which I've thought can be added to sw0p0 (using ethtool -priv-flags). Additional headache is PTP: we have on PHC, but both external interfaces P1/P2 can timestamp packets. [1] https://lkml.org/lkml/2018/5/18/1134 Below use cases were tried with this approach tried with current LKML, so possible changes to Net/Bridge framework not considered: 1) boot, ping no vlan # ip link add name br0 type bridge # echo 0 > /sys/class/net/br0/bridge/default_pvid # ip link set dev eth2 master br0 # ip link set dev eth0 master br0 # ip link set dev eth1 master br0 # ifconfig br0 192.168.1.2 *Note*: I've had to disable default_pvid as otherwise linux Bridge adds and offloads default vlan 1, but default configuration for CPSW driver is vid 0. + CPSW specific - it can't untag packets for P0. Another option I've found: # ip link set dev br0 type bridge vlan_filtering 1. but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0 2) add vlans. Host 1 - not vlan capable, Host 2/ Linux Host 0 can handle vlans # bridge vlan add dev sw0p1 vid 100 pvid untagged master # bridge vlan add dev sw0p2 vid 100 master # bridge vlan add dev sw0p0 vid 100 master Any combination expected to work. *Note*. Ilias reused IFF_ALLMULTI and IFF_MULTICAST on each interface to configure registered and unregistered multicast ports masks for each ALE VLAN entries. So, # ifconfig eth0 -multicast # ifconfig eth0 --allmulti # bridge vlan add dev sw0p0 vid 100 master expected to stop forwarding any multicast traffic to Linux Host 0 3) MDB Allow mcast 239.1.1.1 between P1/P2 # bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent # bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent Allow mcast 239.1.1.2 between P0/P2 # bridge mdb add dev br0 port sw0p0 grp 239.1.1.2 permanent # bridge mdb add dev br0 port sw0p2 grp 239.1.1.2 permanent *Note !!!!!*. I've found no proper way to add L2 mcast addresses as 01-80-C2-00-00-00 or 01-1B-19-00-00-00. probably I missing smth. 4) stp I did some updates (added port stp state offload/stp mcats addr configuration - will post diff on Monday) and tried stp (kernel STP) by connecting P1/P2 ports to the same switch: # ip link add name br0 type bridge # echo 0 > /sys/class/net/br0/bridge/default_pvid # ip link set dev sw0p2 master br0 # ip link set dev sw0p0 master br0 # ip link set dev sw0p1 master br0 # brctl show # ifconfig br0 192.168.1.2 # brctl stp br0 on # brctl showstp br0 sw0p1 (3) port id 8003 state blocking sw0p2 (1) port id 8001 state forwarding Don't know howto: 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST address = blocked MAC 2) add multicast MAC address with Supervisory Packet flag set. Such packets will bypass most of checks inside ALE and will be forwarded in all port's states except "disabled". 3) add "unknown vlan configuration" : ALE provides possibility to configure default behavior for tagged packets with "unknown vlan" by configuring - Unknown VLAN Force Untagged Egress ports Mask. - Unknown VLAN Registered Multicast Flood Ports Mask - Unknown VLAN Multicast Flood ports Mask - Unknown VLAN Member ports List 4) The way to detect "brctl stp br0 on/off" If you are here. Thanks for reading this and your patience. -- regards, -grygorii
On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote: Hi Grygorii I'm just picking out one thing here... there is lots more good stuff here. > Additional headache is PTP: we have on PHC, but both external interfaces P1/P2 > can timestamp packets. This should not be a problem. The Marvell switches have one PHC, but each port can time stamp packets using this counter. Each port has its own receive and transmit time stamp registers. So i don't think this will cause you problems. Andrew
> *After this patch set*: goal keep things working the same as max as > possible and get rid of TI custom tool. We are happy to keep things the same, if they fit with the switchdev model. Anything in your customer TI tool/model which does not fit the switchdev model you won't be able to keep, except if we agree to extend the model. I can say now, sw0p0 is going to cause problems. I really do suggest you drop it for the moment in order to get a minimal driver accepted. sw0p0 does not fit the switchdev model. > Below I've described some tested use cases (not include full static configuration), > but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables > adds MQPRIO and CBS Qdisc and targets AVB network features. It required to > offload MQPRIO and CBS parameters on all ports including P0. In case of P0, > CPDMA TX channels shapers need to be configured, and in case > of sw0p1/sw0p2 internal FIFOS. > sw0p0 also expected to be used to configure CPDMA interface in general - > number of tx/rx channels, rates, ring sizes. Can this be derives from the configuration on sw0p1 and sw0p2? sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx channels? > In addition there is set of global CPSW parameters (not related to P1/P2, like > MAC Authorization Mode, OUI Deny Mode, crc ) which I've > thought can be added to sw0p0 (using ethtool -priv-flags). You should describe these features, and then we can figure out how best to model them. devlink might be an option if they are switch global. Andrew
> 1) boot, ping no vlan > > # ip link add name br0 type bridge > # echo 0 > /sys/class/net/br0/bridge/default_pvid > # ip link set dev eth2 master br0 > # ip link set dev eth0 master br0 > # ip link set dev eth1 master br0 > # ifconfig br0 192.168.1.2 > > *Note*: I've had to disable default_pvid as otherwise linux Bridge adds > and offloads default vlan 1, but default configuration for CPSW driver is vid 0. > + CPSW specific - it can't untag packets for P0. > Another option I've found: > # ip link set dev br0 type bridge vlan_filtering 1. > but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0 There are three different configurations here you need to worry about, with respect to vlans: # CONFIG_VLAN_8021Q is not set So you don't have any vlan support in the kernel. CONFIG_VLAN_8021Q=y, vlan_filtering = 0 So you have vlans, but filtering is off CONFIG_VLAN_8021Q=y, vlan_filtering = 1 So you have vlans, and filtering is on. Even with vlan_filtering off, the bridge still does a little with vlans. And you need all three to work correctly. Andrew
Hi Grygorii > Don't know howto: > 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST > address = blocked MAC > 2) add multicast MAC address with Supervisory Packet flag set. > Such packets will bypass most of checks inside ALE and will be forwarded in all port's > states except "disabled". > 3) add "unknown vlan configuration" : ALE provides possibility to configure > default behavior for tagged packets with "unknown vlan" by configuring > - Unknown VLAN Force Untagged Egress ports Mask. > - Unknown VLAN Registered Multicast Flood Ports Mask > - Unknown VLAN Multicast Flood ports Mask > - Unknown VLAN Member ports List > 4) The way to detect "brctl stp br0 on/off" You are probably looking at this from the wrong direction. Yes, the switch can do these things. But the real question is, why would the network stack want to do this? As i've said before, you are accelerating the network stack by offloading things to the hardware. Does the software bridge support FDB with a blocked flag? I don't think it does. So you first need to extend the software bridge with this concept. Then you can offload it to the hardware to accelerate it. Does the network stack need for forward specific multicast MAC addresses between bridge ports independent of the state? If there is no need for it, you don't need to accelerate it. Andrew
On 06/02/2018 07:08 PM, Andrew Lunn wrote: > On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote: > > Hi Grygorii > > I'm just picking out one thing here... there is lots more good stuff here. > >> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2 >> can timestamp packets. > > This should not be a problem. The Marvell switches have one PHC, but > each port can time stamp packets using this counter. Each port has its > own receive and transmit time stamp registers. So i don't think this > will cause you problems. I hope you are right - question is always in number of available options and which one to select - and, most important, explain it to the end user :( For example: phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(), so which intf should return phc_index? Still not tested, so jut hope ... -- regards, -grygorii
> I hope you are right - question is always in number of available options > and which one to select - and, most important, explain it to the end user :( The end customer being ptp4linux? At least for Marvell switches, it is happy about everything except that the switch is a bit slow, so we need to modify some of the time outs in the configuration file. > For example: > phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(), > so which intf should return phc_index? It is not a 1:1 relationship. See: https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61 All interfaces return the same index. In fact, for a switch, having a PHC per port would be odd. That would mean you need to sync the PHCs in order to act as a boundary clock. Andrew
On 06/02/2018 07:37 PM, Andrew Lunn wrote: >> 1) boot, ping no vlan >> >> # ip link add name br0 type bridge >> # echo 0 > /sys/class/net/br0/bridge/default_pvid >> # ip link set dev eth2 master br0 >> # ip link set dev eth0 master br0 >> # ip link set dev eth1 master br0 >> # ifconfig br0 192.168.1.2 >> >> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds >> and offloads default vlan 1, but default configuration for CPSW driver is vid 0. >> + CPSW specific - it can't untag packets for P0. >> Another option I've found: >> # ip link set dev br0 type bridge vlan_filtering 1. >> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0 > > There are three different configurations here you need to worry about, > with respect to vlans: > > # CONFIG_VLAN_8021Q is not set > > So you don't have any vlan support in the kernel. > > CONFIG_VLAN_8021Q=y, vlan_filtering = 0 > > So you have vlans, but filtering is off > > CONFIG_VLAN_8021Q=y, vlan_filtering = 1 > > So you have vlans, and filtering is on. > > Even with vlan_filtering off, the bridge still does a little with > vlans. > > And you need all three to work correctly. > Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge when vlan_filtering == 0? -- regards, -grygorii
> Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge > when vlan_filtering == 0? I have no idea! I just made sure the Marvell driver works with this. You might want to do a git blame and find out who added it, and it might say why. Andrew
On 06/05/2018 04:28 PM, Andrew Lunn wrote: >> I hope you are right - question is always in number of available options >> and which one to select - and, most important, explain it to the end user :( > > The end customer being ptp4linux? At least for Marvell switches, it is > happy about everything except that the switch is a bit slow, so we > need to modify some of the time outs in the configuration file. > >> For example: >> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(), >> so which intf should return phc_index? > > It is not a 1:1 relationship. See: > > https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61 > > All interfaces return the same index. > > In fact, for a switch, having a PHC per port would be odd. That would > mean you need to sync the PHCs in order to act as a boundary clock. PHC only one, but hw timestamping blocks are per port. -- regards, -grygorii
> PHC only one, but hw timestamping blocks are per port.
Yes, same as the Marvell. Per port, there are two receive time stamps
and one transmit time stamp.
Andrew
Hi Andrew, Thanks a lot for you comments. On 06/02/2018 07:49 PM, Andrew Lunn wrote: > Hi Grygorii > >> Don't know howto: >> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST >> address = blocked MAC >> 2) add multicast MAC address with Supervisory Packet flag set. >> Such packets will bypass most of checks inside ALE and will be forwarded in all port's >> states except "disabled". >> 3) add "unknown vlan configuration" : ALE provides possibility to configure >> default behavior for tagged packets with "unknown vlan" by configuring >> - Unknown VLAN Force Untagged Egress ports Mask. >> - Unknown VLAN Registered Multicast Flood Ports Mask >> - Unknown VLAN Multicast Flood ports Mask >> - Unknown VLAN Member ports List >> 4) The way to detect "brctl stp br0 on/off" > > You are probably looking at this from the wrong direction. Yes, the > switch can do these things. But the real question is, why would the > network stack want to do this? As i've said before, you are > accelerating the network stack by offloading things to the hardware. Right. Thanks. > > Does the software bridge support FDB with a blocked flag? I don't > think it does. So you first need to extend the software bridge with > this concept. Then you can offload it to the hardware to accelerate > it. Sry, for possible misunderstanding: in "Don't know howto" i've listed things I was not able to discover from code or documentation with hope that expert opinion will help to confirm if this this a real/possible gap or I/we've just missed smth. And if this is a real gap - get "green" or "red" flag for future work (which need to be planned somehow). So, my understanding for (1) "blocked FDB entry support" is reasonable extension for bridge/switchdev ("green"). > > Does the network stack need for forward specific multicast MAC > addresses between bridge ports independent of the state? If there is > no need for it, you don't need to accelerate it. Assume this is about item 2 - this question is related to STP packets. CPSW/ALE will drop STP packets if receiving port is in blocking/learning states unless corresponding mcast entry exist in ALE entry with (Supervisory Packet) flag set (Actually ALE mcast entry has two fields (TRM): Supervisory Packet (SUPER): When set, this field indicates that the packet with a matching multicast destination address is a supervisory packet. Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port on a destination address lookup in order for the multicast packet to be forwarded to the transmit port(s). A transmit port must be in the Forwarding state in order to forward the packet.) Question 4 was asked with assumption that if (2) not supported and "red" flag - then option (4) can be used as w/a (again if "green" flag) and STP mcast address can be added in ALE on event "stp on". ** "unknown vlan configuration" This is about following use case. Non static network configuration when CPSW based device knows what traffic it can accept (Host port 0), but it has no knowledge (or limited) about network segments attached to Port 1 and Port 2. For example: Host 0 can accept only vlan 100 traffic coming from Port 1. ALE entry: vid =100, port_mask 0x3 But there can be vlan 50 created in attached network segments. Unknown VLAN Force Untagged Egress ports Mask = 0x0 Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2) Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2) Unknown VLAN Member ports List = 0x6 (P1|P2) with above configuration packets with "unknown vlan" (no ALE entry) will still be forwarded between port 1 and 2, but not Port 0. So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev if not exist yet? will any other hw known benefit from it? -- regards, -grygorii
On 06/02/2018 09:08 AM, Andrew Lunn wrote: > On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote: >> Hi Ilias, > > >> Second, Thanks a lot for your great work. I'm still testing it with different >> use cases and trying to consolidate my reply for all questions. >> >> All, thanks for your comments. > > Hi Grygorii > > Something i've said to Ilias already. I would recommend you don't try > to cover all your uses cases with the first version. Keep it simple > and clean, don't do anything controversial and get it merged. Then add > more features one by one. We can then discuss any odd ball features > while being able to look at the complete system, driver, switchdev and > the network stack. > yes. It definitely no problem from my side, except basic customer use-cases simply not working without sw0p0, at least with current LKML :( And I just have to look a little bit in the future as selected approach expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch) where we going to have more features, like TSN, EST and packet Policing and Classification. And I very, very appreciated for your (and all others) time and comments. Thank you. -- regards, -grygorii
On 06/02/2018 07:26 PM, Andrew Lunn wrote: >> *After this patch set*: goal keep things working the same as max as >> possible and get rid of TI custom tool. > > We are happy to keep things the same, if they fit with the switchdev > model. Anything in your customer TI tool/model which does not fit the > switchdev model you won't be able to keep, except if we agree to > extend the model. Right. That's the main goal of RFC to identify those gaps. > > I can say now, sw0p0 is going to cause problems. I really do suggest > you drop it for the moment in order to get a minimal driver > accepted. sw0p0 does not fit the switchdev model. Honestly, this is not the first patchset and we started without sw0p0, but then.... (with current LKML) - default vlan offloading breaks traffic reception to P0 (Ilias saying it's fixed in next - good) - adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed) - mcast - no way to manually add static record and include or exclude P0. :( above are basic functionality required. > >> Below I've described some tested use cases (not include full static configuration), >> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables >> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to >> offload MQPRIO and CBS parameters on all ports including P0. In case of P0, >> CPDMA TX channels shapers need to be configured, and in case >> of sw0p1/sw0p2 internal FIFOS. >> sw0p0 also expected to be used to configure CPDMA interface in general - >> number of tx/rx channels, rates, ring sizes. > > Can this be derives from the configuration on sw0p1 and sw0p2? > sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx > channels? This not exactly what is required, sry I probably will need just repeat what Ivan described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow. Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense if we will not be able to fit Ivan's work in new CPSW driver model ;..( and do AVB bridge. > >> In addition there is set of global CPSW parameters (not related to P1/P2, like >> MAC Authorization Mode, OUI Deny Mode, crc ) which I've >> thought can be added to sw0p0 (using ethtool -priv-flags). > > You should describe these features, and then we can figure out how > best to model them. devlink might be an option if they are switch > global. Ok. that can be postponed. -- regards, -grygorii
> So, my understanding for (1) "blocked FDB entry support" is reasonable > extension for bridge/switchdev ("green"). You might have to justify it, but yes. > > Does the network stack need for forward specific multicast MAC > > addresses between bridge ports independent of the state? If there is > > no need for it, you don't need to accelerate it. > > Assume this is about item 2 - this question is related to STP packets. > CPSW/ALE will drop STP packets if receiving port is in blocking/learning states > unless corresponding mcast entry exist in ALE entry with (Supervisory Packet) flag set > (Actually ALE mcast entry has two fields (TRM): > Supervisory Packet (SUPER): When set, this field indicates that the packet > with a matching multicast destination address is a supervisory packet. > Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port > on a destination address lookup in order for the multicast packet to be forwarded to > the transmit port(s). A transmit port must be in the Forwarding state in > order to forward the packet.) So i this case, i would expect your driver to just add these entries to the ALE. No need for configuration for above. Just do it as soon as a port is made a member of a bridge. And remove it when the port leaves the bridge. DSA devices are smarter. They all have hardware which looks for BPDU and forwards them to the host independent of the state of the port. They also tend to have hardware looking for IGMP packets, and will forward them to the host, even if the multicast address is not being forwarded to the host. > ** "unknown vlan configuration" > > This is about following use case. Non static network configuration when > CPSW based device knows what traffic it can accept (Host port 0), but > it has no knowledge (or limited) about network segments attached to Port 1 and Port 2. > > For example: Host 0 can accept only vlan 100 traffic coming from Port 1. > ALE entry: vid =100, port_mask 0x3 > > But there can be vlan 50 created in attached network segments. > Unknown VLAN Force Untagged Egress ports Mask = 0x0 > Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2) > Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2) > Unknown VLAN Member ports List = 0x6 (P1|P2) > > with above configuration packets with "unknown vlan" (no ALE entry) will > still be forwarded between port 1 and 2, but not Port 0. > > So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev > if not exist yet? will any other hw known benefit from it? You need to think about the case of two e1000e. How do you configure this setup to do what you want? It probably is already possible. Andrew
On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote: > > > On 06/02/2018 07:26 PM, Andrew Lunn wrote: > >> *After this patch set*: goal keep things working the same as max as > >> possible and get rid of TI custom tool. > > > > We are happy to keep things the same, if they fit with the switchdev > > model. Anything in your customer TI tool/model which does not fit the > > switchdev model you won't be able to keep, except if we agree to > > extend the model. > > Right. That's the main goal of RFC to identify those gaps. > > > > > I can say now, sw0p0 is going to cause problems. I really do suggest > > you drop it for the moment in order to get a minimal driver > > accepted. sw0p0 does not fit the switchdev model. > > Honestly, this is not the first patchset and we started without sw0p0, > but then.... (with current LKML) > - default vlan offloading breaks traffic reception to P0 > (Ilias saying it's fixed in next - good) > - adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed) > - mcast - no way to manually add static record and include or exclude P0. > > > :( above are basic functionality required. For a DSA driver, this is way more than basic. A basic DSA driver just provides interfaces, and does everything in software. No offload at all. Generally, FDB offload is next, then MDB, and then VLAN, each as separate patch sets. > Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense > if we will not be able to fit Ivan's work in new CPSW driver model ;..( > and do AVB bridge. AVB bridge should fit the switchdev model. You can offload TC via switchdev e.g. the b53 has mirred, mellonex has flower and a lot more. Andrew
> And I just have to look a little bit in the future as selected approach > expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch) > where we going to have more features, like TSN, EST and packet Policing and Classification. You should probably took a closer look at the Mellonex driver. It has a lot of TC offload, which is what policing and classification is. TSN is being worked on in general, and i think the i210 is taking the lead. So you probably want to keep an eye on that, and join the discussion. Andrew
Hi Andrew, >On Wed, Jun 06, 2018 at 01:53:56AM +0200, Andrew Lunn wrote: > > And I just have to look a little bit in the future as selected approach > > expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch) > > where we going to have more features, like TSN, EST and packet Policing and Classification. > > You should probably took a closer look at the Mellonex driver. It has > a lot of TC offload, which is what policing and classification is. > I did take a close look to both Mellanox and rocker drivers before issuing this RFC and we seem to be close on the approach. What Grygorii is reffering to, is that for CBS to work properly on CPSW there *must* be a way to configure the CPU port individually. > TSN is being worked on in general, and i think the i210 is taking the > lead. So you probably want to keep an eye on that, and join the > discussion. > TSN is not just "one thing". It's a few IEEE standards bundled up to provide the needed functionality. i210 is only implementing CBS at the moment and there's an RFC out there to support what they call "Time based scheduling". I am already having discussions with Jesus on their current work. The idea behind using switchdev is that due to it's design, it's a very convenient way of utilizing netlink and iproute2/tc to configure any kind of future shapers. Since net_device_ops now has a callback to configure hardware schedulers being able to expose netdevs as hardware ports and configure them individually is great. As you can understand you'll end up with TSN capable switches and NICs being configured with the same commands from a userspace point of view. I am not sure this is the proper way to go, but at least for me, sounds like a viable solution. Thanks Ilias
On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote: > > >On 06/02/2018 07:26 PM, Andrew Lunn wrote: >>> *After this patch set*: goal keep things working the same as max as >>> possible and get rid of TI custom tool. >> >> We are happy to keep things the same, if they fit with the switchdev >> model. Anything in your customer TI tool/model which does not fit the >> switchdev model you won't be able to keep, except if we agree to >> extend the model. > >Right. That's the main goal of RFC to identify those gaps. > >> >> I can say now, sw0p0 is going to cause problems. I really do suggest >> you drop it for the moment in order to get a minimal driver >> accepted. sw0p0 does not fit the switchdev model. > >Honestly, this is not the first patchset and we started without sw0p0, >but then.... (with current LKML) >- default vlan offloading breaks traffic reception to P0 > (Ilias saying it's fixed in next - good) >- adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed) >- mcast - no way to manually add static record and include or exclude P0. > > >:( above are basic functionality required. > >> >>> Below I've described some tested use cases (not include full static configuration), >>> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables >>> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to >>> offload MQPRIO and CBS parameters on all ports including P0. In case of P0, >>> CPDMA TX channels shapers need to be configured, and in case >>> of sw0p1/sw0p2 internal FIFOS. >>> sw0p0 also expected to be used to configure CPDMA interface in general - >>> number of tx/rx channels, rates, ring sizes. >> >> Can this be derives from the configuration on sw0p1 and sw0p2? >> sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx >> channels? > >This not exactly what is required, sry I probably will need just repeat what Ivan >described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow. > >Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense >if we will not be able to fit Ivan's work in new CPSW driver model ;..( >and do AVB bridge. Not sure I tracked everything, but current link example only for dual-emac mode, where i guess no switchdev and thus we have only 2 ports having phys and no need in some common configuration. Only common resources tx queues that are easily shared between two ports. In case of switchdev the same, no need in port 0, at least for cpsw implementation (not sure about others), tx queues cpdma shapers are configured separately and can be done with any netdev presenting ports, thus p0 can be avoided in this case also. For instance, I've created short description, based on current switchdev RFC, showing simple configuration commands for shapers configuration from host side and CBS for every port, w/o p0 usage: https://git.linaro.org/people/ivan.khoronzhuk/tsn_conf.git/tree/README_switchdev Will add it once switchdev decision is stabilized and applied. Basically nothing changed with dual-emac configuration except different rates set for cpdma shapers from host side. Probably, p0 is needed for other configurations, or for compatibility reasons with old versions, but definitely should be created list of all cases, and on my opinion, any common resource for both ports can be configured with any netdev presenting ports if it doesn't conflict with ports configuration itself. > >> >>> In addition there is set of global CPSW parameters (not related to P1/P2, like >>> MAC Authorization Mode, OUI Deny Mode, crc ) which I've >>> thought can be added to sw0p0 (using ethtool -priv-flags). >> >> You should describe these features, and then we can figure out how >> best to model them. devlink might be an option if they are switch >> global. > >Ok. that can be postponed. > >-- >regards, >-grygorii -- Regards, Ivan Khoronzhuk