
[RFC,net-next,6/8] net: dsa: flush switchdev workqueue when leaving the bridge

Message ID 20210824114049.3814660-7-vladimir.oltean@nxp.com
State New
Headers show
Series Drop rtnl_lock from DSA .port_fdb_{add,del}

Commit Message

Vladimir Oltean Aug. 24, 2021, 11:40 a.m. UTC
DSA is preparing to offer switch drivers an API through which they can
associate each FDB entry with a struct net_device *bridge_dev. This can
be used to perform FDB isolation (the FDB lookup performed on the
ingress of a standalone or bridged port should not find an FDB entry
that is present in the FDB of another bridge).

In preparation of that work, DSA needs to ensure that by the time we
call the switch .port_fdb_add and .port_fdb_del methods, the
dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
port.

Currently this is true for .port_fdb_add, but not guaranteed to be true
for .port_fdb_del. This is because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
API requires drivers that need sleepable context to handle those events
to schedule the deferred work themselves. DSA does this through the
dsa_owq.
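
As a rough illustration (modeled on net/dsa/slave.c; names and details
are approximate, not the exact upstream implementation), the atomic
notifier handler defers the hardware access like this:

```c
/* Hedged sketch: SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE arrives in atomic
 * context, so the FDB hardware operation is packaged into a work item
 * and queued on the dsa_owq for sleepable execution later.
 */
static int dsa_slave_switchdev_event(struct notifier_block *unused,
				     unsigned long event, void *ptr)
{
	struct dsa_switchdev_event_work *switchdev_work;

	/* Atomic context: no sleeping, no hardware access here. */
	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
	if (!switchdev_work)
		return NOTIFY_BAD;

	INIT_WORK(&switchdev_work->work, dsa_slave_switchdev_event_work);
	switchdev_work->event = event;
	/* ... copy the MAC address and VID out of ptr ... */

	dsa_schedule_work(&switchdev_work->work);

	return NOTIFY_OK;
}
```

The deferral is what opens the race: nothing orders the queued work
item against del_nbp() unlinking the bridge.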

It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
DSA schedules its deferred work, but del_nbp() finishes unlinking the
bridge as a master from the port before DSA's deferred work is run.

Fundamentally, the port must not be unlinked from the bridge until all
FDB deletion deferred work items have been flushed. The bridge must wait
for the completion of these hardware accesses.

I have tried to address this issue centrally in switchdev by making
SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
level, which would offer implicit synchronization with del_nbp:

https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/

but it seems that any attempt to modify switchdev's behavior and make
the events blocking there would introduce undesirable side effects in
other switchdev consumers.

The most undesirable behavior seems to be that
switchdev_deferred_process_work() takes the rtnl_mutex itself, which
would be worse than having the rtnl_mutex taken individually by
drivers, which is what we have now.

So to offer the needed guarantee to DSA switch drivers, I have come up
with a compromise solution that does not require switchdev rework:
we already have a hook at the last moment in time when the bridge is
still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
the dsa_owq manually from there, which makes all FDB deletions
synchronous.

Major problem: the NETDEV_PRECHANGEUPPER event runs with rtnl_mutex held,
so flushing dsa_owq would deadlock if dsa_slave_switchdev_event_work
took the rtnl_mutex too.

So not only would it be desirable to drop the rtnl_lock from DSA, it is
actually mandatory to do so.
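
The deadlock can be pictured as follows (illustrative only):

```c
/* Why flushing from NETDEV_PRECHANGEUPPER would deadlock if the work
 * item still took rtnl_lock:
 *
 *   CPU A (PRECHANGEUPPER, rtnl held)    CPU B (dsa_owq worker)
 *   -----------------------------------  -----------------------------
 *   flush_workqueue(dsa_owq)             rtnl_lock()  <- blocks on A
 *     waits for the worker to finish...  ...never runs the FDB deletion
 *
 * A waits on B, B waits on A's rtnl_mutex: classic AB-BA deadlock.
 * Hence the work item must not take rtnl_lock once the flush exists.
 */
```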

This change requires ACKs from driver maintainers, since we expose
switches to a method which is now unlocked and can trigger concurrency
issues in the access to hardware.

I've eyeballed the existing drivers, and have needed to patch sja1105
and felix/ocelot. I am also looking at the b53 driver where the ARL ops
are unlocked. The other drivers do seem to have a mutex of sorts, but I
am fairly skeptical that its serialization features have really been put
to the test (knowing that the rtnl_mutex serialized accesses already).
So any regression test from drivers that implement:
- .port_fdb_add
- .port_fdb_del
- .port_fdb_dump
- .port_mdb_add
- .port_mdb_del
- .port_fast_age

is appreciated.
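
For drivers that currently rely on the rtnl_mutex for serialization,
the expected shape of the fix is a driver-private lock around the
hardware table accesses. A hedged sketch, using a hypothetical "foo"
driver (the real sja1105 and felix/ocelot changes are elsewhere in
this series):

```c
/* Hypothetical driver: serialize FDB/ARL hardware access with a
 * private mutex now that .port_fdb_{add,del} run without rtnl_lock.
 */
struct foo_priv {
	struct mutex fdb_lock;	/* serializes FDB table hardware access */
};

static int foo_port_fdb_add(struct dsa_switch *ds, int port,
			    const unsigned char *addr, u16 vid)
{
	struct foo_priv *priv = ds->priv;
	int err;

	mutex_lock(&priv->fdb_lock);	/* may sleep; no rtnl held here */
	err = foo_hw_fdb_write(priv, port, addr, vid); /* hypothetical */
	mutex_unlock(&priv->fdb_lock);

	return err;
}
```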

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa.c      | 5 +++++
 net/dsa/dsa_priv.h | 2 ++
 net/dsa/port.c     | 2 ++
 3 files changed, 9 insertions(+)

Patch

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1dc45e40f961..8e7207c85d61 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -345,6 +345,11 @@  bool dsa_schedule_work(struct work_struct *work)
 	return queue_work(dsa_owq, work);
 }
 
+void dsa_flush_work(void)
+{
+	flush_workqueue(dsa_owq);
+}
+
 int dsa_devlink_param_get(struct devlink *dl, u32 id,
 			  struct devlink_param_gset_ctx *ctx)
 {
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 33ab7d7af9eb..1dc28ad4b8a8 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -170,6 +170,8 @@  void dsa_tag_driver_put(const struct dsa_device_ops *ops);
 const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
 
 bool dsa_schedule_work(struct work_struct *work);
+void dsa_flush_work(void);
+
 const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 616330a16d31..65ce114b9fc8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -380,6 +380,8 @@  void dsa_port_pre_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	switchdev_bridge_port_unoffload(brport_dev, dp,
 					&dsa_slave_switchdev_notifier,
 					&dsa_slave_switchdev_blocking_notifier);
+
+	dsa_flush_work();
 }
 
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)