Message ID | 20210106231728.1363126-1-olteanv@gmail.com |
---|---|
Series | Get rid of the switchdev transactional model |
On Thu Jan 07 2021, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The call path of a switchdev VLAN addition to the bridge looks something
> like this today:
>
>         nbp_vlan_init
>         |  __br_vlan_set_default_pvid
>         |  |                        |
>         |  |     br_afspec          |
>         |  |         |              |
>         |  |         v              |
>         |  | br_process_vlan_info   |
>         |  |         |              |
>         |  |         v              |
>         |  |    br_vlan_info        |
>         |  |      /      \          /
>         |  |     /        \        /
>         |  |    /          \      /
>         |  |   /            \    /
>         v  v  v              v  v
>       nbp_vlan_add        br_vlan_add ------+
>        |                    ^ ^  |          |
>        |                   /  |  |          |
>        |                  /  /  /           |
>         \ br_vlan_get_master/  /            v
>          \        ^        /  /   br_vlan_add_existing
>           \       |       /  /              |
>            \      |      /  /              /
>             \     |     /  /              /
>              \    |    /  /              /
>               \   |   /  /              /
>                v  |  |  v              /
>                __vlan_add             /
>                  /  |                /
>                 /   |               /
>                v    |              /
>     __vlan_vid_add  |             /
>                \    |            /
>                 v   v           v
>           br_switchdev_port_vlan_add
>
> The ranges UAPI was introduced to the bridge in commit bdced7ef7838
> ("bridge: support for multiple vlans and vlan ranges in setlink and
> dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
> have always been passed one by one, through struct bridge_vlan_info
> tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
>
> Then Scott Feldman introduced the switchdev_port_bridge_setlink function
> in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink").
> That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
> full use of the range. But switchdev_port_bridge_setlink was called like
> this:
>
> br_setlink
> -> br_afspec
> -> switchdev_port_bridge_setlink
>
> Basically, the switchdev and the bridge code were not tightly integrated.
> Then commit 41c498b9359e ("bridge: restore br_setlink back to original")
> came, and switchdev drivers were required to implement
> .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
>
> In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op
> first in __vlan_vid_add/del") finally made switchdev penetrate the
> br_vlan_info() barrier and start to develop the call path we have today.
> But remember, br_vlan_info() still receives VLANs one by one.
>
> Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
> 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from
> switchdev") so that drivers would not implement .ndo_bridge_setlink any
> longer. The switchdev_port_bridge_setlink also got deleted.
> This refactoring removed the parallel bridge_setlink implementation from
> switchdev, and left the only switchdev VLAN objects to be the ones
> offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
> (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about
> bridge VLANs")).
>
> That is to say, today the switchdev VLAN object ranges are not used in
> the kernel. Refactoring the above call path is a bit complicated, when
> the bridge VLAN call path is already a bit complicated.
>
> Let's go off and finish the job of commit 29ab586c3d83 by deleting the
> bogus iteration through the VLAN ranges from the drivers. Some aspects
> of this feature never made too much sense in the first place. For
> example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
> flag supposed to mean, when a port can obviously have a single pvid?
> This particular configuration _is_ denied as of commit 6623c60dc28e
> ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
> perspective, the driver still has to play pretend, and only offload the
> vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
> modify the flags of another, completely unrelated, switchdev VLAN
> object! (a VLAN that is PVID will invalidate the PVID flag from whatever
> other VLAN had previously been offloaded with switchdev and had that
> flag. Yet switchdev never notifies about that change, drivers are
> supposed to guess).
>
> Nonetheless, having a VLAN range in the API makes error handling look
> scarier than it really is - unwinding on errors and all of that.
> When in reality, no one really calls this API with more than one VLAN.
> It is all unnecessary complexity.
>
> And despite appearing pretentious (two-phase transactional model and
> all), the switchdev API is really sloppy because the VLAN addition and
> removal operations are not paired with one another (you can add a VLAN
> 100 times and delete it just once). The bridge notifies through
> switchdev of a VLAN addition not only when the flags of an existing VLAN
> change, but also when nothing changes. There are switchdev drivers out
> there who don't like adding a VLAN that has already been added, and
> those checks don't really belong at driver level. But the fact that the
> API contains ranges is yet another factor that prevents this from being
> addressed in the future.
>
> Of the existing switchdev pieces of hardware, it appears that only
> Mellanox Spectrum supports offloading more than one VLAN at a time,
> through mlxsw_sp_port_vlan_set. I have kept that code internal to the
> driver, because there is some more bookkeeping that makes use of it, but
> I deleted it from the switchdev API. But since the switchdev support for
> ranges has already been de facto deleted by a Mellanox employee and
> nobody noticed for 4 years, I'm going to assume it's not a biggie.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw

[snip]

> --- a/drivers/net/dsa/hirschmann/hellcreek.c
> +++ b/drivers/net/dsa/hirschmann/hellcreek.c
> @@ -353,9 +353,8 @@ static int hellcreek_vlan_prepare(struct dsa_switch *ds, int port,
>  		if (!dsa_is_user_port(ds, i))
>  			continue;
>
> -		for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
> -			if (vid == restricted_vid)
> -				return -EBUSY;
> +		if (vlan->vid == restricted_vid)
> +			return -EBUSY;

`u16 vid' is not used anymore:

drivers/net/dsa/hirschmann/hellcreek.c: In function ‘hellcreek_vlan_prepare’:
drivers/net/dsa/hirschmann/hellcreek.c:359:7: warning: unused variable ‘vid’ [-Wunused-variable]
   u16 vid;
       ^~~

Thanks,
Kurt

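For readers following along outside the tree, the hunk under review can be modelled in plain userspace C. This is a minimal sketch, not the hellcreek driver: `vlan_range`, `vlan_single`, `check_range` and `check_single` are hypothetical stand-ins for the old and new switchdev VLAN object and for the body of the prepare check. It shows why the loop variable becomes dead once the object carries a single `vid`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the old VLAN object, which carried a range. */
struct vlan_range {
	uint16_t vid_begin;
	uint16_t vid_end;
};

/* Hypothetical stand-in for the new VLAN object, which carries one VID. */
struct vlan_single {
	uint16_t vid;
};

/* Old shape: walk the range and refuse it if it covers the restricted VID. */
int check_range(const struct vlan_range *vlan, uint16_t restricted_vid)
{
	unsigned int vid; /* wider than 16 bits so vid_end == 0xffff cannot wrap */

	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
		if (vid == restricted_vid)
			return -EBUSY;

	return 0;
}

/* New shape: one comparison, so no local loop variable is needed at all. */
int check_single(const struct vlan_single *vlan, uint16_t restricted_vid)
{
	if (vlan->vid == restricted_vid)
		return -EBUSY;

	return 0;
}
```

With the single-VID shape, a leftover `vid` local is the only remnant of the loop, which is what the compiler warning above points at.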
On Thu, Jan 07, 2021 at 08:17:14AM +0100, Kurt Kanzenbach wrote:
> [snip]
>
> > --- a/drivers/net/dsa/hirschmann/hellcreek.c
> > +++ b/drivers/net/dsa/hirschmann/hellcreek.c
> > @@ -353,9 +353,8 @@ static int hellcreek_vlan_prepare(struct dsa_switch *ds, int port,
> >  		if (!dsa_is_user_port(ds, i))
> >  			continue;
> >
> > -		for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
> > -			if (vid == restricted_vid)
> > -				return -EBUSY;
> > +		if (vlan->vid == restricted_vid)
> > +			return -EBUSY;
>
> `u16 vid' is not used anymore:
>
> drivers/net/dsa/hirschmann/hellcreek.c: In function ‘hellcreek_vlan_prepare’:
> drivers/net/dsa/hirschmann/hellcreek.c:359:7: warning: unused variable ‘vid’ [-Wunused-variable]
>    u16 vid;
>        ^~~

Thanks, I noticed now. I also noticed I did not update dsa_loop. Sorry.
https://patchwork.hopto.org/static/nipa/410259/12002471/build_32bit/stderr

+Petr

On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
>  static int mlxsw_sp_port_obj_add(struct net_device *dev,
>  				 const struct switchdev_obj *obj,
> -				 struct switchdev_trans *trans,
>  				 struct netlink_ext_ack *extack)
>  {
>  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
>  	const struct switchdev_obj_port_vlan *vlan;
> +	struct switchdev_trans trans;
>  	int err = 0;
>
>  	switch (obj->id) {
>  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
>  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
> +

Got the regression results. The call to mlxsw_sp_span_respin() should be
placed here because it needs to be triggered regardless of the return
value of mlxsw_sp_port_vlans_add().

I'm looking into another failure, which might not be related to these
patches. I will also have results with a debug kernel tomorrow (takes
almost a day to complete). Will let you know.

> +		trans.ph_prepare = true;
> +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
>  					      extack);
> +		if (err)
> +			break;
>
> -		if (switchdev_trans_ph_prepare(trans)) {
> -			/* The event is emitted before the changes are actually
> -			 * applied to the bridge. Therefore schedule the respin
> -			 * call for later, so that the respin logic sees the
> -			 * updated bridge state.
> -			 */
> -			mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> -		}
> +		/* The event is emitted before the changes are actually
> +		 * applied to the bridge. Therefore schedule the respin
> +		 * call for later, so that the respin logic sees the
> +		 * updated bridge state.
> +		 */
> +		mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> +
> +		trans.ph_prepare = false;
> +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
> +					      extack);
>  		break;
>  	case SWITCHDEV_OBJ_ID_PORT_MDB:
>  		err = mlxsw_sp_port_mdb_add(mlxsw_sp_port,
> -					    SWITCHDEV_OBJ_PORT_MDB(obj),
> -					    trans);
> +					    SWITCHDEV_OBJ_PORT_MDB(obj));
>  		break;
>  	default:
>  		err = -EOPNOTSUPP;

Forgot to actually add Petr

On Thu, Jan 07, 2021 at 12:38:39PM +0200, Ido Schimmel wrote:
> +Petr
>
> On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
> >  static int mlxsw_sp_port_obj_add(struct net_device *dev,
> >  				 const struct switchdev_obj *obj,
> > -				 struct switchdev_trans *trans,
> >  				 struct netlink_ext_ack *extack)
> >  {
> >  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
> >  	const struct switchdev_obj_port_vlan *vlan;
> > +	struct switchdev_trans trans;
> >  	int err = 0;
> >
> >  	switch (obj->id) {
> >  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> >  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> > -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
> > +
>
> Got the regression results. The call to mlxsw_sp_span_respin() should be
> placed here because it needs to be triggered regardless of the return
> value of mlxsw_sp_port_vlans_add().
>
> I'm looking into another failure, which might not be related to these
> patches. I will also have results with a debug kernel tomorrow (takes
> almost a day to complete). Will let you know.
>
> > +		trans.ph_prepare = true;
> > +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
> >  					      extack);
> > +		if (err)
> > +			break;
> >
> > -		if (switchdev_trans_ph_prepare(trans)) {
> > -			/* The event is emitted before the changes are actually
> > -			 * applied to the bridge. Therefore schedule the respin
> > -			 * call for later, so that the respin logic sees the
> > -			 * updated bridge state.
> > -			 */
> > -			mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> > -		}
> > +		/* The event is emitted before the changes are actually
> > +		 * applied to the bridge. Therefore schedule the respin
> > +		 * call for later, so that the respin logic sees the
> > +		 * updated bridge state.
> > +		 */
> > +		mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> > +
> > +		trans.ph_prepare = false;
> > +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
> > +					      extack);
> >  		break;
> >  	case SWITCHDEV_OBJ_ID_PORT_MDB:
> >  		err = mlxsw_sp_port_mdb_add(mlxsw_sp_port,
> > -					    SWITCHDEV_OBJ_PORT_MDB(obj),
> > -					    trans);
> > +					    SWITCHDEV_OBJ_PORT_MDB(obj));
> >  		break;
> >  	default:
> >  		err = -EOPNOTSUPP;

On Thu, Jan 07, 2021 at 12:40:51PM +0200, Ido Schimmel wrote:
> Forgot to actually add Petr
>
> On Thu, Jan 07, 2021 at 12:38:39PM +0200, Ido Schimmel wrote:
> > +Petr
> >
> > On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
> > >  static int mlxsw_sp_port_obj_add(struct net_device *dev,
> > >  				 const struct switchdev_obj *obj,
> > > -				 struct switchdev_trans *trans,
> > >  				 struct netlink_ext_ack *extack)
> > >  {
> > >  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
> > >  	const struct switchdev_obj_port_vlan *vlan;
> > > +	struct switchdev_trans trans;
> > >  	int err = 0;
> > >
> > >  	switch (obj->id) {
> > >  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> > >  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> > > -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
> > > +
> >
> > Got the regression results. The call to mlxsw_sp_span_respin() should be
> > placed here because it needs to be triggered regardless of the return
> > value of mlxsw_sp_port_vlans_add().
> >
> > I'm looking into another failure, which might not be related to these
> > patches. I will also have results with a debug kernel tomorrow (takes
> > almost a day to complete). Will let you know.

Please ignore the comment about the additional failure. Not related to
these patches. Thanks

> >
> > > +		trans.ph_prepare = true;
> > > +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
> > >  					      extack);
> > > +		if (err)
> > > +			break;
> > >
> > > -		if (switchdev_trans_ph_prepare(trans)) {
> > > -			/* The event is emitted before the changes are actually
> > > -			 * applied to the bridge. Therefore schedule the respin
> > > -			 * call for later, so that the respin logic sees the
> > > -			 * updated bridge state.
> > > -			 */
> > > -			mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> > > -		}
> > > +		/* The event is emitted before the changes are actually
> > > +		 * applied to the bridge. Therefore schedule the respin
> > > +		 * call for later, so that the respin logic sees the
> > > +		 * updated bridge state.
> > > +		 */
> > > +		mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
> > > +
> > > +		trans.ph_prepare = false;
> > > +		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, &trans,
> > > +					      extack);
> > >  		break;
> > >  	case SWITCHDEV_OBJ_ID_PORT_MDB:
> > >  		err = mlxsw_sp_port_mdb_add(mlxsw_sp_port,
> > > -					    SWITCHDEV_OBJ_PORT_MDB(obj),
> > > -					    trans);
> > > +					    SWITCHDEV_OBJ_PORT_MDB(obj));
> > >  		break;
> > >  	default:
> > >  		err = -EOPNOTSUPP;

Ido Schimmel <idosch@idosch.org> writes:

> +Petr
>
> On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
>>  static int mlxsw_sp_port_obj_add(struct net_device *dev,
>>  				 const struct switchdev_obj *obj,
>> -				 struct switchdev_trans *trans,
>>  				 struct netlink_ext_ack *extack)
>>  {
>>  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
>>  	const struct switchdev_obj_port_vlan *vlan;
>> +	struct switchdev_trans trans;
>>  	int err = 0;
>>
>>  	switch (obj->id) {
>>  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
>>  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
>> -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
>> +
>
> Got the regression results. The call to mlxsw_sp_span_respin() should be
> placed here because it needs to be triggered regardless of the return
> value of mlxsw_sp_port_vlans_add().

Agreed, the new code differs in that respin is not called on error path.

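The behavioural difference discussed above can be reproduced with a small userspace model. This is a sketch under stated assumptions: `vlans_add_stub`, `respin_stub` and both `obj_add_*` functions are hypothetical stand-ins and not mlxsw code. The stub fails on demand, and a counter records whether the respin was scheduled.

```c
#include <errno.h>
#include <stdbool.h>

struct model {
	bool add_fails;   /* make the VLAN add stub fail */
	int respin_calls; /* how many times the respin was scheduled */
};

/* Hypothetical stand-in for mlxsw_sp_port_vlans_add(). */
int vlans_add_stub(const struct model *m)
{
	return m->add_fails ? -EOPNOTSUPP : 0;
}

/* Hypothetical stand-in for mlxsw_sp_span_respin(). */
void respin_stub(struct model *m)
{
	m->respin_calls++;
}

/* Shape of the hunk as posted: a failing first call skips the respin. */
int obj_add_as_posted(struct model *m)
{
	int err = vlans_add_stub(m); /* "prepare" call */

	if (err)
		return err;

	respin_stub(m);
	return vlans_add_stub(m);    /* "commit" call */
}

/* Shape asked for in review: schedule the respin before looking at errors. */
int obj_add_as_requested(struct model *m)
{
	int err;

	respin_stub(m);

	err = vlans_add_stub(m);     /* "prepare" call */
	if (err)
		return err;

	return vlans_add_stub(m);    /* "commit" call */
}
```

When the stub fails, `obj_add_as_posted()` leaves `respin_calls` at zero while `obj_add_as_requested()` bumps it to one; on success both variants schedule exactly one respin.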
On Thu, Jan 07, 2021 at 12:38:35PM +0200, Ido Schimmel wrote:
> +Petr
>
> On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
> >  static int mlxsw_sp_port_obj_add(struct net_device *dev,
> >  				 const struct switchdev_obj *obj,
> > -				 struct switchdev_trans *trans,
> >  				 struct netlink_ext_ack *extack)
> >  {
> >  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
> >  	const struct switchdev_obj_port_vlan *vlan;
> > +	struct switchdev_trans trans;
> >  	int err = 0;
> >
> >  	switch (obj->id) {
> >  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> >  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> > -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
> > +
>
> Got the regression results. The call to mlxsw_sp_span_respin() should be
> placed here because it needs to be triggered regardless of the return
> value of mlxsw_sp_port_vlans_add().

So before, mlxsw_sp_span_respin() was called right in between the
prepare phase and the commit phase, regardless of the error value of
mlxsw_sp_port_vlans_add. How does that work, I assume that
mlxsw_sp_span_respin_work gets to run after the commit phase because it
serializes using rtnl_lock()? Then why did it matter enough to schedule
it between the prepare and commit phase in the first place?
And what is there to do in mlxsw_sp_span_respin_work when
mlxsw_sp_port_vlans_add returns -EOPNOTSUPP, -EBUSY, -EINVAL, -EEXIST or
-ENOMEM?

On Thu, Jan 07, 2021 at 01:18:22PM +0200, Vladimir Oltean wrote:
> On Thu, Jan 07, 2021 at 12:38:35PM +0200, Ido Schimmel wrote:
> > +Petr
> >
> > On Thu, Jan 07, 2021 at 01:17:20AM +0200, Vladimir Oltean wrote:
> > >  static int mlxsw_sp_port_obj_add(struct net_device *dev,
> > >  				 const struct switchdev_obj *obj,
> > > -				 struct switchdev_trans *trans,
> > >  				 struct netlink_ext_ack *extack)
> > >  {
> > >  	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
> > >  	const struct switchdev_obj_port_vlan *vlan;
> > > +	struct switchdev_trans trans;
> > >  	int err = 0;
> > >
> > >  	switch (obj->id) {
> > >  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> > >  		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> > > -		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans,
> > > +
> >
> > Got the regression results. The call to mlxsw_sp_span_respin() should be
> > placed here because it needs to be triggered regardless of the return
> > value of mlxsw_sp_port_vlans_add().
>
> So before, mlxsw_sp_span_respin() was called right in between the
> prepare phase and the commit phase, regardless of the error value of
> mlxsw_sp_port_vlans_add. How does that work, I assume that
> mlxsw_sp_span_respin_work gets to run after the commit phase because it
> serializes using rtnl_lock()? Then why did it matter enough to schedule
> it between the prepare and commit phase in the first place?
> And what is there to do in mlxsw_sp_span_respin_work when
> mlxsw_sp_port_vlans_add returns -EOPNOTSUPP, -EBUSY, -EINVAL, -EEXIST or
> -ENOMEM?

The bridge driver will ignore -EOPNOTSUPP and actually add the VLAN on
the bridge device. See commit 9c86ce2c1ae3 ("net: bridge: Notify about
bridge VLANs") and commit ea4721751977 ("mlxsw: spectrum_switchdev:
Ignore bridge VLAN events")

Since the VLAN was successfully added on the bridge device,
mlxsw_sp_span_respin_work() should be able to resolve the egress port
for a packet that is mirrored to a gre tap and passes through the bridge
device.

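The point about -EOPNOTSUPP is a caller-side policy: a driver that declines to offload does not stop the VLAN from being installed in software. A minimal sketch of that policy follows, assuming a hypothetical `bridge_model` with a per-VID table and a hypothetical `bridge_vlan_add_model()`. The real logic lives in the bridge code referenced by the commits above and is more involved.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_N_VID 4096 /* size of the 12-bit VLAN ID space */

/* Hypothetical software state of the bridge: which VIDs are installed. */
struct bridge_model {
	bool vlan_present[MODEL_N_VID];
};

/*
 * offload_err is what the (hypothetical) driver notifier returned.
 * -EOPNOTSUPP means "I do not offload this" and is tolerated by the
 * caller; any other error aborts the addition.
 */
int bridge_vlan_add_model(struct bridge_model *br, uint16_t vid,
			  int offload_err)
{
	if (vid >= MODEL_N_VID)
		return -EINVAL;

	if (offload_err && offload_err != -EOPNOTSUPP)
		return offload_err;

	br->vlan_present[vid] = true;
	return 0;
}
```

In this model a later consumer of bridge state, such as the respin work described above, would see the VLAN as present even though the driver returned -EOPNOTSUPP.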
From: Vladimir Oltean <vladimir.oltean@nxp.com>

Changes in v3:
- Resolved a build warning in mv88e6xxx and tested that it actually works
  properly, which resulted in an extra patch (02/11).
- Addressed Ido's minor feedback in commit 10/11 relating to a comment.

Changes in v2:
- Got rid of the vid_begin -> vid_end range too from the switchdev API.
- Actually propagating errors from DSA MDB and VLAN notifiers.

This series comes after the late realization that the prepare/commit
separation imposed by switchdev does not help literally anybody:
https://patchwork.kernel.org/project/netdevbpf/patch/20201212203901.351331-1-vladimir.oltean@nxp.com/

We should kill it before it inflicts even more damage to the error
handling logic in drivers.

Also remove the unused VLAN ranges feature from the switchdev VLAN
objects, which simplifies all drivers by quite a bit.

Vladimir Oltean (11):
  net: switchdev: remove vid_begin -> vid_end range from VLAN objects
  net: dsa: mv88e6xxx: deny vid 0 on the CPU port and DSA links too
  net: switchdev: remove the transaction structure from port object
    notifiers
  net: switchdev: delete switchdev_port_obj_add_now
  net: switchdev: remove the transaction structure from port attributes
  net: dsa: remove the transactional logic from ageing time notifiers
  net: dsa: remove the transactional logic from MDB entries
  net: dsa: remove the transactional logic from VLAN objects
  net: dsa: remove obsolete comments about switchdev transactions
  mlxsw: spectrum_switchdev: remove transactional logic for VLAN objects
  net: switchdev: delete the transaction object

 drivers/net/dsa/b53/b53_common.c              |  96 ++++------
 drivers/net/dsa/b53/b53_priv.h                |  15 +-
 drivers/net/dsa/bcm_sf2.c                     |   2 -
 drivers/net/dsa/bcm_sf2_cfp.c                 |  10 +-
 drivers/net/dsa/dsa_loop.c                    |  68 +++----
 drivers/net/dsa/hirschmann/hellcreek.c        |  39 ++--
 drivers/net/dsa/lan9303-core.c                |  12 +-
 drivers/net/dsa/lantiq_gswip.c                | 100 ++++-------
 drivers/net/dsa/microchip/ksz8795.c           |  76 ++++----
 drivers/net/dsa/microchip/ksz9477.c           |  96 +++++-----
 drivers/net/dsa/microchip/ksz_common.c        |  25 +--
 drivers/net/dsa/microchip/ksz_common.h        |   8 +-
 drivers/net/dsa/mt7530.c                      |  52 ++----
 drivers/net/dsa/mv88e6xxx/chip.c              | 155 ++++++++--------
 drivers/net/dsa/ocelot/felix.c                |  69 ++------
 drivers/net/dsa/qca8k.c                       |  37 ++--
 drivers/net/dsa/realtek-smi-core.h            |   9 +-
 drivers/net/dsa/rtl8366.c                     | 152 +++++++---------
 drivers/net/dsa/rtl8366rb.c                   |   1 -
 drivers/net/dsa/sja1105/sja1105.h             |   3 +-
 drivers/net/dsa/sja1105/sja1105_devlink.c     |   9 +-
 drivers/net/dsa/sja1105/sja1105_main.c        |  97 ++++------
 .../marvell/prestera/prestera_switchdev.c     |  62 ++-----
 .../mellanox/mlxsw/spectrum_switchdev.c       | 167 +++++-------------
 drivers/net/ethernet/mscc/ocelot.c            |  32 ++--
 drivers/net/ethernet/mscc/ocelot_net.c        |  69 ++------
 drivers/net/ethernet/rocker/rocker.h          |   6 +-
 drivers/net/ethernet/rocker/rocker_main.c     |  61 ++-----
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |  43 ++---
 drivers/net/ethernet/ti/cpsw_switchdev.c      |  70 ++------
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c       | 115 +++++-------
 include/net/dsa.h                             |  11 +-
 include/net/switchdev.h                       |  27 +--
 include/soc/mscc/ocelot.h                     |   3 +-
 net/bridge/br_switchdev.c                     |   6 +-
 net/dsa/dsa_priv.h                            |  27 +--
 net/dsa/port.c                                | 103 +++++------
 net/dsa/slave.c                               |  79 +++------
 net/dsa/switch.c                              |  89 ++--------
 net/switchdev/switchdev.c                     | 101 ++---------
 40 files changed, 714 insertions(+), 1488 deletions(-)
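
To make the prepare/commit separation concrete, here is a minimal userspace model of a driver callback before and after the series. It is a sketch, not kernel code: `trans_model`, `port_model`, `caller_two_phase` and both `vlan_add_*` functions are hypothetical, and the only "hardware resource" is a fixed-size table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VLANS 4

/* Hypothetical stand-in for the transaction object the series deletes. */
struct trans_model {
	bool ph_prepare;
};

/* Hypothetical port state: a small table of installed VIDs. */
struct port_model {
	uint16_t vids[MAX_VLANS];
	int count;
};

/* Two-phase shape: called once with ph_prepare set, then once without. */
int vlan_add_two_phase(struct port_model *p, uint16_t vid,
		       const struct trans_model *trans)
{
	if (trans->ph_prepare) {
		/* Prepare: only check that the commit could succeed. */
		return p->count < MAX_VLANS ? 0 : -ENOSPC;
	}

	/* Commit: expected not to fail, so there is no error to report. */
	p->vids[p->count++] = vid;
	return 0;
}

/* What the caller of the two-phase shape has to do for every object. */
int caller_two_phase(struct port_model *p, uint16_t vid)
{
	struct trans_model trans = { .ph_prepare = true };
	int err = vlan_add_two_phase(p, vid, &trans);

	if (err)
		return err;

	trans.ph_prepare = false;
	return vlan_add_two_phase(p, vid, &trans);
}

/* Single-phase shape: validate and apply in one call, returning any error. */
int vlan_add_single_phase(struct port_model *p, uint16_t vid)
{
	if (p->count >= MAX_VLANS)
		return -ENOSPC;

	p->vids[p->count++] = vid;
	return 0;
}
```

In this toy model both shapes end in the same state and report the same error once the table is full; the single-phase variant simply does so with one call and one place to handle failure.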