Message ID: 1528974690-31600-1-git-send-email-ilias.apalodimas@linaro.org
Series: Add switchdev on TI-CPSW
Hi Ilias

Thanks for removing the CPU port. That helps a lot moving forward.

> - Multicast testing client-port1 (tagged on vlan 100) server-port1
>   switch-config is provided by TI (https://git.ti.com/switch-config)
>   and is used to verify correct switch configuration.
>   1. switch-config output
>      - type: vlan, vid = 100, untag_force = 0x4, reg_mcast = 0x6,
>        unreg_mcast = 0x0, member_list = 0x6
>   Server running on sw0p2: iperf -s -u -B 239.1.1.1 -i 1
>   Client running on sw0p1: iperf -c 239.1.1.1 -u -b 990m -f m -i 5 -t 3600
>   No IGMP reaches the CPU port to add MDBs (since the CPU does not receive
>   unregistered multicast, as programmed).

Is this something you can work around? Is there a TCAM you can program
to detect IGMP and pass the packets to the CPU? Without receiving
IGMP, multicast is pretty broken.

If i understand right, a multicast listener running on the CPU should
work, since you can add an MDB to receive multicast traffic from the
two ports. Multicast traffic sent from the CPU also works. What does
not work is IGMP snooping of traffic between the two switch ports. You
have no access to the IGMP frames, so cannot snoop. So unless you can
work around that with a TCAM, i think you just have to blindly pass
all multicast between the two ports.

> Setting IFF_MULTICAST on/off (on eth0/eth1/br0) will affect the registered
> multicast masks programmed in the switch (for port1, port2 and the CPU port
> respectively).
>
> This must occur before adding VLANs on the interfaces. If you change the
> flag after the VLAN configuration you need to re-issue the VLAN config
> commands.

This you should fix. You should be able to get the stack to tell you
about all the configured VLANs, so you can re-program the switch.

> - NFS:
>   The only way for NFS to work is by chrooting to a minimal environment when
>   switch configuration that will affect connectivity is needed.

You might want to look at the commit history for DSA. Florian added a
patch which makes NFS root work with DSA. It might give you clues as
to what you need to add to make it just work.

	Andrew
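To make the MDB path above concrete: when the bridge's IGMP snooping (or a
local listener on the CPU) learns a group, the bridge pushes an MDB object
down through switchdev and the driver mirrors it into the hardware table.
Below is a minimal sketch against the switchdev API of that era (circa
v4.17); cpsw_port_obj_add and cpsw_hw_add_mcast are assumed names for
illustration, not the actual CPSW code:

#include <net/switchdev.h>

/* Hooked up via the netdev's switchdev_ops; the bridge calls this for
 * each offloadable object, including snooped MDB entries. */
static int cpsw_port_obj_add(struct net_device *dev,
			     const struct switchdev_obj *obj,
			     struct switchdev_trans *trans)
{
	const struct switchdev_obj_port_mdb *mdb;

	if (switchdev_trans_ph_prepare(trans))
		return 0;	/* validate only; commit phase does the work */

	switch (obj->id) {
	case SWITCHDEV_OBJ_ID_PORT_MDB:
		mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
		/* Mirror the bridge's MDB entry into the hardware table
		 * so this group is forwarded in hardware from now on.
		 * cpsw_hw_add_mcast() is a hypothetical helper. */
		return cpsw_hw_add_mcast(dev, mdb->addr, mdb->vid);
	default:
		return -EOPNOTSUPP;
	}
}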
On Mon, Jun 18, 2018 at 05:04:24PM +0200, Andrew Lunn wrote:
> Hi Ilias
>
> Thanks for removing the CPU port. That helps a lot moving forward.
>
> > - Multicast testing client-port1 (tagged on vlan 100) server-port1
> >   switch-config is provided by TI (https://git.ti.com/switch-config)
> >   and is used to verify correct switch configuration.
> >   1. switch-config output
> >      - type: vlan, vid = 100, untag_force = 0x4, reg_mcast = 0x6,
> >        unreg_mcast = 0x0, member_list = 0x6
> >   Server running on sw0p2: iperf -s -u -B 239.1.1.1 -i 1
> >   Client running on sw0p1: iperf -c 239.1.1.1 -u -b 990m -f m -i 5 -t 3600
> >   No IGMP reaches the CPU port to add MDBs (since the CPU does not receive
> >   unregistered multicast, as programmed).
>
> Is this something you can work around? Is there a TCAM you can program
> to detect IGMP and pass the packets to the CPU? Without receiving
> IGMP, multicast is pretty broken.

Yes, it's described in example 2 (I'll explain in detail further down).

> If i understand right, a multicast listener running on the CPU should
> work, since you can add an MDB to receive multicast traffic from the
> two ports. Multicast traffic sent from the CPU also works. What does
> not work is IGMP snooping of traffic between the two switch ports. You
> have no access to the IGMP frames, so cannot snoop. So unless you can
> work around that with a TCAM, i think you just have to blindly pass
> all multicast between the two ports.

Yes, if the CPU port is added on the VLAN then unregistered multicast
traffic (and thus IGMP joins) will reach the CPU port and everything will
work as expected. I think we should not consider this a "problem" as long
as it's described properly in the Documentation. This switch is expected to
support this.

What you describe is essentially what happens in "example 2." Enabling
unregistered multicast traffic to be directed to the CPU will cover all use
cases that require no user intervention for everything to work. MDBs will
automatically be added to/removed from the hardware and traffic will be
offloaded. With the current code the user has that possibility, so it's up
to him to decide what mode of configuration he prefers.

> > Setting IFF_MULTICAST on/off (on eth0/eth1/br0) will affect the registered
> > multicast masks programmed in the switch (for port1, port2 and the CPU port
> > respectively).
> >
> > This must occur before adding VLANs on the interfaces. If you change the
> > flag after the VLAN configuration you need to re-issue the VLAN config
> > commands.
>
> This you should fix. You should be able to get the stack to tell you
> about all the configured VLANs, so you can re-program the switch.

I was planning to fix these via bridge link commands, which would get
propagated to the driver for port1/port2 and the CPU port. I just didn't
find anything relevant to multicast in the bridge commands apart from
flooding (which is used properly). I think that the proper way to do this
is to follow the logic that was introduced by VLANs, i.e.:

bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port

and apply it to multicast/flooding etc. This requires iproute2 changes
first, though.

> > - NFS:
> >   The only way for NFS to work is by chrooting to a minimal environment when
> >   switch configuration that will affect connectivity is needed.
>
> You might want to look at the commit history for DSA. Florian added a
> patch which makes NFS root work with DSA. It might give you clues as
> to what you need to add to make it just work.

I'll have a look. Chrooting is rarely needed in our case anyway. It's only
needed when "loss of connectivity" is bound to take place.

>
> 	Andrew

Thanks
Ilias
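The VLAN flavour of the same propagation is worth showing, since it is what
makes "re-program the switch from the stack's view" possible without new
iproute2 plumbing: a "bridge vlan add" arrives in the driver as a switchdev
port-VLAN object carrying the vid range and flags. A sketch under the same
assumptions (vid_begin/vid_end as in kernels of that era; cpsw_hw_add_vlan
is an assumed helper):

#include <linux/if_bridge.h>
#include <net/switchdev.h>

static int cpsw_port_vlan_add(struct net_device *dev,
			      const struct switchdev_obj_port_vlan *vlan)
{
	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
	u16 vid;
	int err;

	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
		/* Program the VLAN and record it in a driver-private
		 * list so it can be replayed later, e.g. when
		 * IFF_MULTICAST is toggled on the port. */
		err = cpsw_hw_add_vlan(dev, vid, untagged, pvid);
		if (err)
			return err;
	}
	return 0;
}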
> Yes, if the CPU port is added on the VLAN then unregistered multicast
> traffic (and thus IGMP joins) will reach the CPU port and everything will
> work as expected. I think we should not consider this a "problem" as long
> as it's described properly in the Documentation. This switch is expected
> to support this.

Back to the two e1000e. What would you expect to happen with them?
Either IGMP snooping needs to work, or you don't do snooping at all.

> What you describe is essentially what happens in "example 2." Enabling
> unregistered multicast traffic to be directed to the CPU will cover all
> use cases that require no user intervention for everything to work. MDBs
> will automatically be added to/removed from the hardware and traffic will
> be offloaded. With the current code the user has that possibility, so
> it's up to him to decide what mode of configuration he prefers.

So by default, it just needs to work. You can give the user the option
to shoot themselves in the foot, but they need to actively pull the
trigger to blow their own foot off.

> > > Setting IFF_MULTICAST on/off (on eth0/eth1/br0) will affect the
> > > registered multicast masks programmed in the switch (for port1, port2
> > > and the CPU port respectively).
> > >
> > > This must occur before adding VLANs on the interfaces. If you change
> > > the flag after the VLAN configuration you need to re-issue the VLAN
> > > config commands.
> >
> > This you should fix. You should be able to get the stack to tell you
> > about all the configured VLANs, so you can re-program the switch.
> I was planning to fix these via bridge link commands, which would get
> propagated to the driver for port1/port2 and the CPU port. I just didn't
> find anything relevant to multicast in the bridge commands apart from
> flooding (which is used properly). I think that the proper way to do this
> is to follow the logic that was introduced by VLANs, i.e.:
> bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port
> and apply it to multicast/flooding etc. This requires iproute2 changes
> first, though.

No, i think you can do this in the driver. The driver just needs to
ask the stack to tell it about all the VLANs again. Or you walk the
VLAN tables in the hardware and do the programming based on that. I
don't see why user space should be involved at all. What would you
expect with two e1000e's?

	Andrew
On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > Yes, if the CPU port is added on the VLAN then unregistered multicast
> > traffic (and thus IGMP joins) will reach the CPU port and everything
> > will work as expected. I think we should not consider this a "problem"
> > as long as it's described properly in the Documentation. This switch is
> > expected to support this.
>
> Back to the two e1000e. What would you expect to happen with them?
> Either IGMP snooping needs to work, or you don't do snooping at all.

That's a different use case; you don't have a CPU port on an e1000e and it
"just works". You can't do anything on the card to drop the packet. If you
want a comparable example, imagine something like "I filter and drop IGMP
messages on one of the two e1000e's on the bridge but I expect IGMP to
work". It's totally different hardware here, where this is an option, and
for TI use cases a valid option.

> > What you describe is essentially what happens in "example 2." Enabling
> > unregistered multicast traffic to be directed to the CPU will cover all
> > use cases that require no user intervention for everything to work.
> > MDBs will automatically be added to/removed from the hardware and
> > traffic will be offloaded. With the current code the user has that
> > possibility, so it's up to him to decide what mode of configuration he
> > prefers.
>
> So by default, it just needs to work. You can give the user the option
> to shoot themselves in the foot, but they need to actively pull the
> trigger to blow their own foot off.

Yes, it does by default. I don't consider it "foot shooting" though. If we
stop thinking about switches connected to user environments and think of
industrial ones, my understanding is that this is a common scenario that
needs to be supported.

> > > > Setting IFF_MULTICAST on/off (on eth0/eth1/br0) will affect the
> > > > registered multicast masks programmed in the switch (for port1,
> > > > port2 and the CPU port respectively).
> > > >
> > > > This must occur before adding VLANs on the interfaces. If you
> > > > change the flag after the VLAN configuration you need to re-issue
> > > > the VLAN config commands.
> > >
> > > This you should fix. You should be able to get the stack to tell you
> > > about all the configured VLANs, so you can re-program the switch.
> > I was planning to fix these via bridge link commands, which would get
> > propagated to the driver for port1/port2 and the CPU port. I just
> > didn't find anything relevant to multicast in the bridge commands apart
> > from flooding (which is used properly). I think that the proper way to
> > do this is to follow the logic that was introduced by VLANs, i.e.:
> > bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port
> > and apply it to multicast/flooding etc. This requires iproute2 changes
> > first, though.
>
> No, i think you can do this in the driver. The driver just needs to
> ask the stack to tell it about all the VLANs again. Or you walk the
> VLAN tables in the hardware and do the programming based on that. I
> don't see why user space should be involved at all. What would you
> expect with two e1000e's?

We are pretty much describing the same thing; I just thought having a
bridge command for it was more appropriate. After the user removes the
multicast flag on an interface I'll just walk the VLANs and adjust
accordingly. It's doable and I'll change it for the patch.

Thanks
Ilias
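What Ilias agrees to here could look roughly like the following: the core
calls ndo_change_rx_flags() when IFF_MULTICAST is toggled on an interface,
and the driver walks the VLANs it recorded at configuration time and
re-writes each registered-multicast port mask. struct cpsw_priv, struct
cpsw_vlan and cpsw_hw_set_vlan_mcast are assumed names, not the actual
driver internals:

#include <linux/netdevice.h>

struct cpsw_vlan {
	struct list_head node;
	u16 vid;
};

struct cpsw_priv {
	struct list_head vlans;	/* recorded at VLAN add time */
};

static void cpsw_ndo_change_rx_flags(struct net_device *dev, int change)
{
	struct cpsw_priv *priv = netdev_priv(dev);
	struct cpsw_vlan *vlan;

	if (!(change & IFF_MULTICAST))
		return;

	/* Replay every known VLAN with this port included in, or excluded
	 * from, its registered-multicast member mask, so the user never
	 * has to re-issue the VLAN configuration commands. */
	list_for_each_entry(vlan, &priv->vlans, node)
		cpsw_hw_set_vlan_mcast(priv, vlan->vid,
				       !!(dev->flags & IFF_MULTICAST));
}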
On Mon, Jun 18, 2018 at 07:46:02PM +0300, Ilias Apalodimas wrote:
> On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > > Yes, if the CPU port is added on the VLAN then unregistered multicast
> > > traffic (and thus IGMP joins) will reach the CPU port and everything
> > > will work as expected. I think we should not consider this a
> > > "problem" as long as it's described properly in the Documentation.
> > > This switch is expected to support this.
> >
> > Back to the two e1000e. What would you expect to happen with them?
> > Either IGMP snooping needs to work, or you don't do snooping at all.
> That's a different use case

I disagree. That is the exact same use case. I add ports to a bridge
and i expect the bridge to either do IGMP snooping, or just forward
all multicast. That is the user's expectation. That is how the Linux
network stack works. If the hardware has limitations you want to try
to hide them from the user.

> > So by default, it just needs to work. You can give the user the option
> > to shoot themselves in the foot, but they need to actively pull the
> > trigger to blow their own foot off.

> Yes, it does by default. I don't consider it "foot shooting" though.
> If we stop thinking about switches connected to user environments

I never think about switches. I think about a block of acceleration
hardware, which i try to offload Linux networking to. And if the
hardware cannot accelerate Linux network functions properly, i don't
try to offload it. That way it always operates in the same way, and
the user expectations are clear.

	Andrew
On Mon, Jun 18, 2018 at 07:30:25PM +0200, Andrew Lunn wrote:
> On Mon, Jun 18, 2018 at 07:46:02PM +0300, Ilias Apalodimas wrote:
> > On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > > > Yes, if the CPU port is added on the VLAN then unregistered
> > > > multicast traffic (and thus IGMP joins) will reach the CPU port and
> > > > everything will work as expected. I think we should not consider
> > > > this a "problem" as long as it's described properly in the
> > > > Documentation. This switch is expected to support this.
> > >
> > > Back to the two e1000e. What would you expect to happen with them?
> > > Either IGMP snooping needs to work, or you don't do snooping at all.
> > That's a different use case
>
> I disagree. That is the exact same use case. I add ports to a bridge
> and i expect the bridge to either do IGMP snooping, or just forward
> all multicast. That is the user's expectation. That is how the Linux
> network stack works. If the hardware has limitations you want to try
> to hide them from the user.

Why is this a limitation? Isn't it proper functionality? If you configure
the CPU port as "passthrough" (which again is the default) then everything
works just like the e1000e. The user doesn't have to do anything at all;
MDBs are added/deleted based on proper timers/joins etc. If the user
chooses to use the CPU port as a kind of internal L2 filter, that's up to
him, but having hardware do the filtering for you isn't something I'd call
a limitation.

I am not sure what the disagreement is here, though. The hardware operates
as you want by default. As added functionality the user can, if he chooses
to, add the MDBs manually instead of having some piece of code do that for
him. This was clearly described in the first RFC, since it was the only
reason to add a CPU port. Is there a problem with the user controlling
these capabilities of the hardware?

> > > So by default, it just needs to work. You can give the user the
> > > option to shoot themselves in the foot, but they need to actively
> > > pull the trigger to blow their own foot off.
>
> > Yes, it does by default. I don't consider it "foot shooting" though.
> > If we stop thinking about switches connected to user environments
>
> I never think about switches. I think about a block of acceleration
> hardware, which i try to offload Linux networking to. And if the
> hardware cannot accelerate Linux network functions properly, i don't
> try to offload it. That way it always operates in the same way, and
> the user expectations are clear.
>
> 	Andrew

The acceleration block is working properly here. The user has the ability
to instruct the acceleration block not to accelerate all the traffic, but
only the specific cases he chooses. Isn't that a valid use case, since the
hardware supports it?

Regards
Ilias
On 06/18/2018 12:49 PM, Ilias Apalodimas wrote:
> On Mon, Jun 18, 2018 at 07:30:25PM +0200, Andrew Lunn wrote:
>> On Mon, Jun 18, 2018 at 07:46:02PM +0300, Ilias Apalodimas wrote:
>>> On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
>>>>> Yes, if the CPU port is added on the VLAN then unregistered multicast
>>>>> traffic (and thus IGMP joins) will reach the CPU port and everything
>>>>> will work as expected. I think we should not consider this a "problem"
>>>>> as long as it's described properly in the Documentation. This switch
>>>>> is expected to support this.
>>>>
>>>> Back to the two e1000e. What would you expect to happen with them?
>>>> Either IGMP snooping needs to work, or you don't do snooping at all.
>>> That's a different use case
>>
>> I disagree. That is the exact same use case. I add ports to a bridge
>> and i expect the bridge to either do IGMP snooping, or just forward
>> all multicast. That is the user's expectation. That is how the Linux
>> network stack works. If the hardware has limitations you want to try
>> to hide them from the user.
> Why is this a limitation? Isn't it proper functionality? If you configure
> the CPU port as "passthrough" (which again is the default) then everything
> works just like the e1000e. The user doesn't have to do anything at all;
> MDBs are added/deleted based on proper timers/joins etc. If the user
> chooses to use the CPU port as a kind of internal L2 filter, that's up to
> him, but having hardware do the filtering for you isn't something I'd
> call a limitation.
>
> I am not sure what the disagreement is here, though. The hardware operates
> as you want by default. As added functionality the user can, if he
> chooses to, add the MDBs manually instead of having some piece of code do
> that for him. This was clearly described in the first RFC, since it was
> the only reason to add a CPU port. Is there a problem with the user
> controlling these capabilities of the hardware?
>>>> So by default, it just needs to work. You can give the user the option
>>>> to shoot themselves in the foot, but they need to actively pull the
>>>> trigger to blow their own foot off.
>>
>>> Yes, it does by default. I don't consider it "foot shooting" though.
>>> If we stop thinking about switches connected to user environments
>>
>> I never think about switches. I think about a block of acceleration
>> hardware, which i try to offload Linux networking to. And if the
>> hardware cannot accelerate Linux network functions properly, i don't
>> try to offload it. That way it always operates in the same way, and
>> the user expectations are clear.

Yeah. Sorry, but user expectations depend on the application area. For
industrial and automotive applications it's not only about acceleration -
it's about filtering (hm, hw filtering is acceleration also :), which is a
strict requirement. Yes, CPSW will help the Linux networking stack
accelerate packet forwarding, but it's expected to do this only for
*allowed* traffic.

One of the main points is to prevent DoS attacks where the Linux Host would
not be able to perform its required operations because it is stuck
processing some network (mcast) traffic. For example, remote
control/monitoring stations where one port is connected to the private
sensor/control network and another is the WAN/trunk port.

Plus, the Linux Host is running an industrial application which
monitors/controls some hw by itself - activity on the WAN port should have
minimal effect on the ability of the Linux Host industrial application to
perform its target function, and packet flooding into the private network
must be prevented.

As a result, after boot, such systems are expected to work following the
rule: "everything is prohibited which is not explicitly allowed". And
that's exactly what CPSW allows us to achieve by filling the ALE
VLAN/FDB/MDB tables manually/statically, and in this way prevent IGMP/...
flood attacks.

Hope the above helps in understanding our use cases and why we are
"annoyingly" asking about the possibility to do manual/static configuration
and not rely on IGMP or other great, but not applicable for all use cases,
networking technologies.

> The acceleration block is working properly here. The user has the ability
> to instruct the acceleration block not to accelerate all the traffic, but
> only the specific cases he chooses. Isn't that a valid use case, since
> the hardware supports it?

--
regards,
-grygorii
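As a sketch of the locked-down setup described above, seeded statically at
boot: VLAN membership plus a single explicitly allowed multicast group,
with the unregistered-multicast mask left empty so nothing unlisted reaches
the CPU or the private network. The helper signatures approximate the
cpsw_ale API in drivers/net/ethernet/ti of that era and are illustrative
rather than verbatim:

#include "cpsw_ale.h"	/* driver-local ALE helpers */

/* Port masks: BIT(0) = host/CPU port, BIT(1)/BIT(2) = external ports. */
static void cpsw_seed_locked_down_config(struct cpsw_ale *ale)
{
	/* 239.1.1.1 maps to multicast MAC 01:00:5e:01:01:01 */
	static u8 grp[ETH_ALEN] = { 0x01, 0x00, 0x5e, 0x01, 0x01, 0x01 };

	/* VLAN 100: external ports are members; registered multicast is
	 * limited to those ports and unregistered multicast is dropped
	 * (empty mask), so no flood ever reaches the CPU. */
	cpsw_ale_add_vlan(ale, 100, BIT(1) | BIT(2), 0,
			  BIT(1) | BIT(2), 0);

	/* Explicitly allow exactly one group between the two external
	 * ports; everything not listed is dropped in hardware. */
	cpsw_ale_add_mcast(ale, grp, BIT(1) | BIT(2),
			   ALE_VLAN, 100, ALE_MCAST_FWD_2);
}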