Message ID | 20210728175327.1150120-2-dqfext@gmail.com |
---|---|
State | New |
Series | [RFC,net-next,1/2] net: dsa: tag_mtk: skip address learning on transmit to standalone ports |
On Thu, Jul 29, 2021 at 01:53:25AM +0800, DENG Qingfang wrote:
> Consider the following bridge configuration, where bond0 is not
> offloaded:
>
>          +-- br0 --+
>         / /   |     \
>        / /    |      \
>       /  |    |     bond0
>      /   |    |     /   \
>    swp0 swp1 swp2 swp3 swp4
>      .        .       .
>      .        .       .
>      A        B       C
>
> Address learning is enabled on offloaded ports (swp0~2) and the CPU
> port, so when client A sends a packet to C, the following will happen:
>
> 1. The switch learns that client A can be reached at swp0.
> 2. The switch probably already knows that client C can be reached at the
>    CPU port, so it forwards the packet to the CPU.
> 3. The bridge core knows client C can be reached at bond0, so it
>    forwards the packet back to the switch.
> 4. The switch learns that client A can be reached at the CPU port.
> 5. The switch forwards the packet to either swp3 or swp4, according to
>    the packet's tag.
>
> That makes client A's MAC address flap between swp0 and the CPU port. If
> client B sends a packet to A, it is possible that the packet is
> forwarded to the CPU. With offload_fwd_mark = 1, the bridge core won't
> forward it back to the switch, resulting in packet loss.
>
> To avoid that, skip address learning on the CPU port when the destination
> port is standalone, which can be done by setting the SA_DIS bit of the
> MTK tag, if bridge_dev of the destination port is not set.
>
> Signed-off-by: DENG Qingfang <dqfext@gmail.com>
> ---
>  net/dsa/tag_mtk.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
> index cc3ba864ad5b..8c361812e21b 100644
> --- a/net/dsa/tag_mtk.c
> +++ b/net/dsa/tag_mtk.c
> @@ -15,8 +15,7 @@
>  #define MTK_HDR_XMIT_TAGGED_TPID_8100	1
>  #define MTK_HDR_XMIT_TAGGED_TPID_88A8	2
>  #define MTK_HDR_RECV_SOURCE_PORT_MASK	GENMASK(2, 0)
> -#define MTK_HDR_XMIT_DP_BIT_MASK	GENMASK(5, 0)
> -#define MTK_HDR_XMIT_SA_DIS		BIT(6)
> +#define MTK_HDR_XMIT_SA_DIS_SHIFT	6
>
>  static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
>  				    struct net_device *dev)
> @@ -50,7 +49,8 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
>  	 * whether that's a combined special tag with 802.1Q header.
>  	 */
>  	mtk_tag[0] = xmit_tpid;
> -	mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;

Why stop AND-ing with MTK_HDR_XMIT_DP_BIT_MASK if you were doing that
before? If it's not needed (probably isn't), it would be nice to split
that up.

> +	mtk_tag[1] = BIT(dp->index) |
> +		     (!dp->bridge_dev << MTK_HDR_XMIT_SA_DIS_SHIFT);
>
>  	/* Tag control information is kept for 802.1Q */
>  	if (xmit_tpid == MTK_HDR_XMIT_UNTAGGED) {
> --
> 2.25.1
>

Otherwise this is as correct as can be without implementing TX
forwarding offload for the bridge (which you've explained why it doesn't
map 1:1 with what your hw can do). But just because a port is under a bridge
doesn't mean that the only packets it sends belong to that bridge. Think
AF_PACKET sockets, PTP etc. The bridge also has a no_linklocal_learn
option that maybe should be taken into consideration for drivers that
can do something meaningful about it. Anyway, food for thought.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
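The masking question raised in the review can be made concrete with a small userspace sketch. It is not the code in tag_mtk.c: the helper name mtk_xmit_tag_byte() is hypothetical, and the two constants are written out as plain hex values equivalent to the GENMASK(5, 0) and BIT(6) definitions shown in the diff. The sketch keeps the AND with the destination-port mask and sets SA_DIS only for a standalone (non-bridged) port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values equivalent to the definitions in the diff above. */
#define MTK_HDR_XMIT_DP_BIT_MASK 0x3fu /* GENMASK(5, 0) */
#define MTK_HDR_XMIT_SA_DIS      0x40u /* BIT(6) */

/*
 * Hypothetical helper (not part of tag_mtk.c): build the second byte of
 * the transmit tag for a destination port. "bridged" stands in for
 * dp->bridge_dev being non-NULL.
 */
static uint8_t mtk_xmit_tag_byte(unsigned int port_index, bool bridged)
{
	uint8_t tag;

	assert(port_index < 8);

	/* One-hot destination port, confined to bits 5..0. */
	tag = (uint8_t)((1u << port_index) & MTK_HDR_XMIT_DP_BIT_MASK);

	/* Standalone port: ask the switch not to learn the source MAC. */
	if (!bridged)
		tag |= MTK_HDR_XMIT_SA_DIS;

	return tag;
}
```

For port indexes 0 to 5 the mask changes nothing, which matches the "probably isn't needed" remark. Its only effect in this sketch is that an out-of-range index such as 6 yields an empty destination mask instead of silently setting the SA_DIS bit.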
On Wed, Jul 28, 2021 at 09:37:05PM +0300, Vladimir Oltean wrote:
> Otherwise this is as correct as can be without implementing TX
> forwarding offload for the bridge (which you've explained why it doesn't
> map 1:1 with what your hw can do). But just because a port is under a bridge
> doesn't mean that the only packets it sends belong to that bridge. Think
> AF_PACKET sockets, PTP etc. The bridge also has a no_linklocal_learn
> option that maybe should be taken into consideration for drivers that
> can do something meaningful about it. Anyway, food for thought.

Considering that you also have the option of setting
ds->assisted_learning_on_cpu_port = true and this will have less false
positives, what are the reasons why you did not choose that approach?
On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> Considering that you also have the option of setting
> ds->assisted_learning_on_cpu_port = true and this will have less false
> positives, what are the reasons why you did not choose that approach?

You're right. Hardware learning on CPU port does have some limitations.

I have been testing a multi CPU ports patch, and assisted learning has
to be used, because FDB entries should be installed like multicast
ones, which point to all CPU ports.
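As a rough illustration of an FDB entry that "points to all CPU ports", the destination can be thought of as a port bitmask with one bit per CPU port, the same shape a multicast entry would have. The sketch below is an assumption-laden model, not driver code: example_cpu_port_mask() is a hypothetical helper, and the idea that the hardware takes an 8-bit port mask is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: OR together one bit per CPU port so that a single
 * static FDB entry can list every CPU port as a destination.
 */
static uint8_t example_cpu_port_mask(const unsigned int *cpu_ports, size_t n)
{
	uint8_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cpu_ports[i] < 8); /* assumed 8-bit port mask */
		mask |= (uint8_t)(1u << cpu_ports[i]);
	}

	return mask;
}
```

With two CPU ports numbered 5 and 6 (example numbers only), the resulting mask is 0x60.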
On Sat, Jul 31, 2021 at 01:32:03AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > Considering that you also have the option of setting
> > ds->assisted_learning_on_cpu_port = true and this will have less false
> > positives, what are the reasons why you did not choose that approach?
>
> You're right. Hardware learning on CPU port does have some limitations.
>
> I have been testing a multi CPU ports patch, and assisted learning has
> to be used, because FDB entries should be installed like multicast
> ones, which point to all CPU ports.

Ah, mt7530 is one of the switches which has multiple CPU ports, I had
forgotten that. In that case, then static FDB entries are pretty much
the only way to go indeed.

I am going to send a patch series soon to convert sja1105 to assisted
learning too. It doesn't support multiple CPU ports, and it does have
hardware learning on the CPU port, but it can be arranged in cross-chip
topologies where each switch has its own CPU port, so from DSA's
perspective, it is as though we are dealing with a multi-CPU port switch
(the DSA tree does have multiple CPUs, in fact). I have been obsessively
testing this configuration for the past few weeks and I think the
assisted learning functionality works fairly well by now.
On Fri, Jul 30, 2021 at 08:39:02PM +0300, Vladimir Oltean wrote:
> On Sat, Jul 31, 2021 at 01:32:03AM +0800, DENG Qingfang wrote:
> > On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > > Considering that you also have the option of setting
> > > ds->assisted_learning_on_cpu_port = true and this will have less false
> > > positives, what are the reasons why you did not choose that approach?
> >
> > You're right. Hardware learning on CPU port does have some limitations.
> >
> > I have been testing a multi CPU ports patch, and assisted learning has
> > to be used, because FDB entries should be installed like multicast
> > ones, which point to all CPU ports.
>
> Ah, mt7530 is one of the switches which has multiple CPU ports, I had
> forgotten that. In that case, then static FDB entries are pretty much
> the only way to go indeed.

I forget which ones are the modes in which the multi-CPU feature on
mt7530 is supposed to be used: static assignment of user ports to CPU
ports, or LAG between the CPU ports, or a mix of both?
On Fri, Jul 30, 2021 at 08:41:35PM +0300, Vladimir Oltean wrote:
> On Fri, Jul 30, 2021 at 08:39:02PM +0300, Vladimir Oltean wrote:
> > Ah, mt7530 is one of the switches which has multiple CPU ports, I had
> > forgotten that. In that case, then static FDB entries are pretty much
> > the only way to go indeed.
>
> I forget which ones are the modes in which the multi-CPU feature on
> mt7530 is supposed to be used: static assignment of user ports to CPU
> ports, or LAG between the CPU ports, or a mix of both?

MT7530 only supports static assignment, by changing the port matrix.

MT7531 also supports hardware LAG, but I don't think it's ideal because
its CPU ports have different speeds (one 1Gbps RGMII and the other
2.5Gbps HSGMII).
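Static assignment through the port matrix can be pictured as a per-port bitmask of permitted egress ports. The sketch below only models that idea under stated assumptions: example_port_matrix() is a hypothetical helper, the 8-bit mask width is assumed, and nothing here reflects the actual register layout of the MT7530.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a port matrix entry: a user port may forward to
 * the other members of its bridge and to the one CPU port it is
 * statically assigned to. A standalone port passes bridge_members == 0.
 */
static uint8_t example_port_matrix(unsigned int port, uint8_t bridge_members,
				   unsigned int cpu_port)
{
	assert(port < 8 && cpu_port < 8); /* assumed 8-bit port mask */

	/* Bridge peers without the port itself, plus the assigned CPU port. */
	return (uint8_t)((bridge_members & ~(1u << port)) | (1u << cpu_port));
}
```

In this model, moving a user port from one CPU port to another is a matter of recomputing its matrix with a different cpu_port argument.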
On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> Considering that you also have the option of setting
> ds->assisted_learning_on_cpu_port = true and this will have less false
> positives, what are the reasons why you did not choose that approach?

After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
(which it does not use now) unless I turn on VLAN filtering. Is that
normal?
On Sat, Jul 31, 2021 at 03:00:20AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > Considering that you also have the option of setting
> > ds->assisted_learning_on_cpu_port = true and this will have less false
> > positives, what are the reasons why you did not choose that approach?
>
> After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> (which it does not use now) unless I turn on VLAN filtering. Is that
> normal?

They are called with the VID from the learned packet.
If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
Generally, VID 0 is always used for VLAN-unaware bridging. You can
privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
mode.
On Fri, Jul 30, 2021 at 10:07:06PM +0300, Vladimir Oltean wrote:
> > After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> > (which it does not use now) unless I turn on VLAN filtering. Is that
> > normal?
>
> They are called with the VID from the learned packet.
> If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
> Generally, VID 0 is always used for VLAN-unaware bridging. You can
> privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
> mode.

Now the issue is PVID is always set to the bridge's vlan_default_pvid,
regardless of VLAN awareness.
On Sat, Jul 31, 2021 at 03:25:55AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 10:07:06PM +0300, Vladimir Oltean wrote:
> > > After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> > > (which it does not use now) unless I turn on VLAN filtering. Is that
> > > normal?
> >
> > They are called with the VID from the learned packet.
> > If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
> > Generally, VID 0 is always used for VLAN-unaware bridging. You can
> > privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
> > mode.
>
> Now the issue is PVID is always set to the bridge's vlan_default_pvid,
> regardless of VLAN awareness.

Then change that, sja1105 and ocelot/felix are good examples of how to
set a pvid in VLAN-unaware mode that is independent of what the bridge
asks for.
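The two suggestions in this exchange fit together: if a driver translates VID 0 privately, the port's pvid in VLAN-unaware mode has to be that same private VLAN so that learned and installed entries land in one place. The sketch below shows the idea only. It is not how sja1105 or ocelot/felix implement it, both helper names are hypothetical, and EXAMPLE_UNAWARE_VID is an arbitrary placeholder value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary placeholder for the VLAN a driver reserves for VLAN-unaware use. */
#define EXAMPLE_UNAWARE_VID 4000u

/*
 * Hypothetical helper: map the VID passed to .port_fdb_{add,del} onto the
 * VID programmed into hardware. VID 0 means "VLAN-unaware bridging".
 */
static uint16_t example_fdb_hw_vid(uint16_t vid)
{
	assert(vid < 4096);

	return vid == 0 ? EXAMPLE_UNAWARE_VID : vid;
}

/*
 * Hypothetical helper: choose the port pvid. The bridge's pvid is honoured
 * only while VLAN filtering is on; otherwise the private VLAN is used.
 */
static uint16_t example_port_pvid(bool vlan_filtering, uint16_t bridge_pvid)
{
	return vlan_filtering ? bridge_pvid : EXAMPLE_UNAWARE_VID;
}
```

With VLAN filtering off, example_port_pvid() and example_fdb_hw_vid(0) return the same VLAN, which is the property the driver needs regardless of the bridge's vlan_default_pvid.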
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index cc3ba864ad5b..8c361812e21b 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -15,8 +15,7 @@
 #define MTK_HDR_XMIT_TAGGED_TPID_8100	1
 #define MTK_HDR_XMIT_TAGGED_TPID_88A8	2
 #define MTK_HDR_RECV_SOURCE_PORT_MASK	GENMASK(2, 0)
-#define MTK_HDR_XMIT_DP_BIT_MASK	GENMASK(5, 0)
-#define MTK_HDR_XMIT_SA_DIS		BIT(6)
+#define MTK_HDR_XMIT_SA_DIS_SHIFT	6
 
 static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 				    struct net_device *dev)
@@ -50,7 +49,8 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 	 * whether that's a combined special tag with 802.1Q header.
 	 */
 	mtk_tag[0] = xmit_tpid;
-	mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;
+	mtk_tag[1] = BIT(dp->index) |
+		     (!dp->bridge_dev << MTK_HDR_XMIT_SA_DIS_SHIFT);
 
 	/* Tag control information is kept for 802.1Q */
 	if (xmit_tpid == MTK_HDR_XMIT_UNTAGGED) {
Consider the following bridge configuration, where bond0 is not
offloaded:

         +-- br0 --+
        / /   |     \
       / /    |      \
      /  |    |     bond0
     /   |    |     /   \
   swp0 swp1 swp2 swp3 swp4
     .        .       .
     .        .       .
     A        B       C

Address learning is enabled on offloaded ports (swp0~2) and the CPU
port, so when client A sends a packet to C, the following will happen:

1. The switch learns that client A can be reached at swp0.
2. The switch probably already knows that client C can be reached at the
   CPU port, so it forwards the packet to the CPU.
3. The bridge core knows client C can be reached at bond0, so it
   forwards the packet back to the switch.
4. The switch learns that client A can be reached at the CPU port.
5. The switch forwards the packet to either swp3 or swp4, according to
   the packet's tag.

That makes client A's MAC address flap between swp0 and the CPU port. If
client B sends a packet to A, it is possible that the packet is
forwarded to the CPU. With offload_fwd_mark = 1, the bridge core won't
forward it back to the switch, resulting in packet loss.

To avoid that, skip address learning on the CPU port when the destination
port is standalone, which can be done by setting the SA_DIS bit of the
MTK tag, if bridge_dev of the destination port is not set.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
---
 net/dsa/tag_mtk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
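The flapping described in steps 1 and 4 can be replayed with a toy one-entry FDB. This is a simplified model for illustration, not switch or kernel code: the helper names and port numbers are hypothetical, and the model assumes the switch learns a source MAC on every ingress unless the frame asks it not to.

```c
#include <assert.h>
#include <stdbool.h>

/* Port numbers are illustrative assumptions, not taken from the patch. */
enum { EXAMPLE_SWP0 = 0, EXAMPLE_CPU_PORT = 6, EXAMPLE_NO_PORT = -1 };

/* One-entry FDB model: the port on which client A's MAC was last learned. */
struct example_fdb_entry {
	int port;
};

/* The switch learns the source MAC on ingress unless learning is suppressed. */
static void example_learn(struct example_fdb_entry *e, int ingress_port,
			  bool sa_dis)
{
	if (!sa_dis)
		e->port = ingress_port;
}

/*
 * Replay steps 1 and 4 for one packet from A to C and return the port the
 * switch associates with A afterwards.
 */
static int example_port_of_a(bool sa_dis_on_xmit)
{
	struct example_fdb_entry a = { .port = EXAMPLE_NO_PORT };

	/* Step 1: the packet enters on swp0, so A is learned there. */
	example_learn(&a, EXAMPLE_SWP0, false);
	assert(a.port == EXAMPLE_SWP0);

	/* Step 4: the bridge sends it back in through the CPU port. */
	example_learn(&a, EXAMPLE_CPU_PORT, sa_dis_on_xmit);

	return a.port;
}
```

Without SA_DIS the entry for A ends up on the CPU port, so a later packet from B to A would be sent to the CPU. With SA_DIS set on the retransmitted frame the entry stays on swp0.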