mbox series

[RFC,v2,net-next,00/16] Better support for sandwiched LAGs with bridge and DSA

Message ID 20210318231829.3892920-1-olteanv@gmail.com
Headers show
Series Better support for sandwiched LAGs with bridge and DSA | expand

Message

Vladimir Oltean March 18, 2021, 11:18 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

This series has two objectives:
- To make LAG uppers on top of DSA ports work regardless of which order
  we link interfaces to their masters (first make the port join the LAG,
  then the LAG join the bridge, or the other way around).
- To make DSA ports support non-offloaded LAG interfaces properly.

There was a design decision to be made in patches 2-4 on whether we
should adopt the "push" model, where the driver just calls:

  switchdev_bridge_port_offloaded(brport_dev,
                                  &atomic_notifier_block,
                                  &blocking_notifier_block,
                                  extack);

and the bridge just replays the entire collection of switchdev port
attributes and objects that it has, in some predefined order and with
some predefined error handling logic;


or the "pull" model, where the driver, apart from calling:

  switchdev_bridge_port_offloaded(brport_dev, extack);

has the task of "dumpster diving" (as Tobias puts it) through the bridge
attributes and objects by itself, by calling:

  - br_vlan_replay
  - br_fdb_replay
  - br_mdb_replay
  - br_vlan_enabled
  - br_port_flag_is_set
  - br_port_get_stp_state
  - br_multicast_router
  - br_get_ageing_time

(not necessarily all of them, and not necessarily in this order, and
with driver-defined error handling).

Even though I'm not in love myself with the "pull" model, I chose it
because there is a fundamental trick with replaying switchdev events
like this:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0 <- this will replay the objects once for
                                 the bond0 bridge port, and the swp0
                                 switchdev port will process them
ip link set swp1 master bond0 <- this will replay the objects again for
                                 the bond0 bridge port, and the swp1
                                 switchdev port will see them, but swp0
                                 will see them for the second time now

Basically I believe that it is implementation defined whether the driver
wants to error out on switchdev objects seen twice on a port, and the
bridge should not enforce a certain model for that. For example, for FDB
entries added to a bonding interface, the underling switchdev driver
might have an abstraction for just that: an FDB entry pointing towards a
logical (as opposed to physical) port. So when the second port joins the
bridge, it doesn't realy need to replay FDB entries, since there is
already at least one hardware port which has been receiving those
events, and the FDB entries don't need to be added a second time to the
same logical port.
In the other corner, we have the drivers that handle switchdev port
attributes on a LAG as individual switchdev port attributes on physical
ports (example: VLAN filtering). In fact, the switchdev_handle_port_attr_set
helper facilitates this: it is a fan-out from a single orig_dev towards
multiple lowers that pass the check_cb().
But that's the point: switchdev_handle_port_attr_set is just a helper
which the driver _opts_ to use. The bridge can't enforce the "push"
model, because that would assume that all drivers handle port attributes
in the same way, which is probably false.

For this reason, I preferred to go with the "pull" mode for this patch
set. Just to see how bad it is for other switchdev drivers to copy-paste
this logic, I added the pull support to ocelot too, and I think it's
pretty manageable.

This patch set is RFC because it is minimally tested, and I would like
to get some feedback/agreement regarding the design decisions taken,
before I spend any more time on this.

There are also some things I probably broke, but I couldn't figure any
better. For example, I can't seem to figure out if mlxsw does the right
thing when joining a bonding interface that is already a bridge port.
I think it probably doesn't, so in that case, the placement I found for
the switchdev_bridge_port_offload() probably needs some adjustment when
there exists a LAG upper.

If possible, I would like the maintainers of the switchdev drivers to
tell me if this change introduces any regressions to how packets are
flooded (actually not flooded) in software by the bridge between two
ports belonging to the same ASIC ID.

I should mention that this patch series is written on top of Tobias'
series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210318192540.895062-1-tobias@waldekranz.com/
which should get applied soon.

Vladimir Oltean (16):
  net: dsa: call dsa_port_bridge_join when joining a LAG that is already
    in a bridge
  net: dsa: pass extack to dsa_port_{bridge,lag}_join
  net: dsa: inherit the actual bridge port flags at join time
  net: dsa: sync up with bridge port's STP state when joining
  net: dsa: sync up VLAN filtering state when joining the bridge
  net: dsa: sync multicast router state when joining the bridge
  net: dsa: sync ageing time when joining the bridge
  net: dsa: replay port and host-joined mdb entries when joining the
    bridge
  net: dsa: replay port and local fdb entries when joining the bridge
  net: dsa: replay VLANs installed on port when joining the bridge
  net: ocelot: support multiple bridges
  net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged
    LAG
  net: ocelot: replay switchdev events when joining bridge
  net: dsa: don't set skb->offload_fwd_mark when not offloading the
    bridge
  net: dsa: return -EOPNOTSUPP when driver does not implement
    .port_lag_join
  net: bridge: switchdev: let drivers inform which bridge ports are
    offloaded

 drivers/net/dsa/ocelot/felix.c                |   4 +-
 .../ethernet/freescale/dpaa2/dpaa2-switch.c   |   4 +-
 .../marvell/prestera/prestera_switchdev.c     |   7 +
 .../mellanox/mlxsw/spectrum_switchdev.c       |   4 +-
 drivers/net/ethernet/mscc/ocelot.c            |  90 ++++----
 drivers/net/ethernet/mscc/ocelot_net.c        | 210 +++++++++++++++---
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |   8 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |   7 +-
 drivers/net/ethernet/ti/cpsw_new.c            |   6 +-
 include/linux/if_bridge.h                     |  56 +++++
 include/net/switchdev.h                       |   1 +
 include/soc/mscc/ocelot.h                     |  13 +-
 net/bridge/br_fdb.c                           |  52 +++++
 net/bridge/br_if.c                            |  11 +-
 net/bridge/br_mdb.c                           |  84 +++++++
 net/bridge/br_private.h                       |   8 +-
 net/bridge/br_stp.c                           |  27 +++
 net/bridge/br_switchdev.c                     |  94 +++++++-
 net/bridge/br_vlan.c                          |  71 ++++++
 net/dsa/dsa_priv.h                            |  23 +-
 net/dsa/port.c                                | 201 +++++++++++++----
 net/dsa/slave.c                               |  11 +-
 net/dsa/switch.c                              |   4 +-
 net/dsa/tag_brcm.c                            |   2 +-
 net/dsa/tag_dsa.c                             |  15 +-
 net/dsa/tag_hellcreek.c                       |   2 +-
 net/dsa/tag_ksz.c                             |   2 +-
 net/dsa/tag_lan9303.c                         |   3 +-
 net/dsa/tag_mtk.c                             |   2 +-
 net/dsa/tag_ocelot.c                          |   2 +-
 net/dsa/tag_ocelot_8021q.c                    |   2 +-
 net/dsa/tag_rtl4_a.c                          |   2 +-
 net/dsa/tag_sja1105.c                         |   4 +-
 net/dsa/tag_xrs700x.c                         |   2 +-
 34 files changed, 845 insertions(+), 189 deletions(-)

Comments

Florian Fainelli March 19, 2021, 10:12 p.m. UTC | #1
On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Make sure that the multicast router setting of the bridge is picked up
> correctly by DSA when joining, regardless of whether there are
> sandwiched interfaces or not. The SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port
> attribute is only emitted from br_mc_router_state_change.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Florian Fainelli March 19, 2021, 10:13 p.m. UTC | #2
On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
> 
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>    -> __set_ageing_time
> 
> therefore not at bridge port creation time, so:
> (a) drivers had to hardcode the initial value for the address ageing time,
>     because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
>     ageing time before enslaving the port to the bridge
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  6 ++++++
>  net/bridge/br_stp.c       | 13 +++++++++++++
>  net/dsa/port.c            | 10 ++++++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 920d3a02cc68..ebd16495459c 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>  void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
> +clock_t br_get_ageing_time(struct net_device *br_dev);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
>  {
>  	return BR_STATE_DISABLED;
>  }
> +
> +static inline clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	return 0;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index 86b5e05d3f21..3dafb6143cff 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
>  	return 0;
>  }
>  
> +clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	struct net_bridge *br;
> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return 0;
> +
> +	br = netdev_priv(br_dev);
> +
> +	return jiffies_to_clock_t(br->ageing_time);

Don't you want an ASSERT_RTNL() in this function as well?
Vladimir Oltean March 20, 2021, 10:09 a.m. UTC | #3
On Fri, Mar 19, 2021 at 03:13:03PM -0700, Florian Fainelli wrote:
> > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> > index 86b5e05d3f21..3dafb6143cff 100644
> > --- a/net/bridge/br_stp.c
> > +++ b/net/bridge/br_stp.c
> > @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
> >  	return 0;
> >  }
> >  
> > +clock_t br_get_ageing_time(struct net_device *br_dev)
> > +{
> > +	struct net_bridge *br;
> > +
> > +	if (!netif_is_bridge_master(br_dev))
> > +		return 0;
> > +
> > +	br = netdev_priv(br_dev);
> > +
> > +	return jiffies_to_clock_t(br->ageing_time);
> 
> Don't you want an ASSERT_RTNL() in this function as well?

Hmm, I'm not sure. I don't think I'm accessing anything that is under
the protection of the rtnl_mutex. If anything, the ageing time is
protected by the "bridge lock", but I don't think there's much of an
issue if I read an unsigned int while not holding it.