Message ID | 20241107133004.7469-6-shaw.leon@gmail.com |
---|---|
State | New |
Headers | show |
Series | net: Improve netns handling in RTNL and ip_tunnel | expand |
On Thu, Nov 7, 2024 at 9:37 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Nov 7, 2024 at 2:30 PM Xiao Liang <shaw.leon@gmail.com> wrote: > > > > If set to true, create device in target netns (when different from > > link-netns) without netns change, and use current netns as link-netns > > if not specified explicitly. > > > > Sorry, module parameters are not going to fly. Sorry, I didn't know that. > Instead, add new rtnetlink attributes ? It is to control driver behavior at rtnl_ops registration time. I think rtnetlink attributes are too late for that, maybe? Can't think of a way other than module parameters or register separate ops. Any suggestions?
On Thu, 7 Nov 2024 22:11:24 +0800 Xiao Liang wrote: > > Instead, add new rtnetlink attributes ? > > It is to control driver behavior at rtnl_ops registration time. I > think rtnetlink > attributes are too late for that, maybe? Can't think of a way other than > module parameters or register separate ops. Any suggestions? Step back from the implementation you have a little, forget that there is a boolean in rtnl_link_ops. User makes a request to spawn an interface, surely a flag inside that request can dictate how the netns attrs are interpreted.
On Thu, Nov 7, 2024 at 11:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 7 Nov 2024 22:11:24 +0800 Xiao Liang wrote: > > > Instead, add new rtnetlink attributes ? > > > > It is to control driver behavior at rtnl_ops registration time. I > > think rtnetlink > > attributes are too late for that, maybe? Can't think of a way other than > > module parameters or register separate ops. Any suggestions? > > Step back from the implementation you have a little, forget that there > is a boolean in rtnl_link_ops. User makes a request to spawn an > interface, surely a flag inside that request can dictate how the netns > attrs are interpreted. IMO, this is about driver capability, not about user requests. As you've pointed out earlier, probably no one would actually want the old behavior whenever the driver supports the new one. I added the module parameter just for compatibility, because ip_tunnels was not implemented to support src_net properly. Yes it's possible to add an extra flag in user request, but I don't think it's a good approach. BTW, I didn't find what's going on with module parameters, is there any documentation? Thanks.
On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote: > > > It is to control driver behavior at rtnl_ops registration time. I > > > think rtnetlink > > > attributes are too late for that, maybe? Can't think of a way other than > > > module parameters or register separate ops. Any suggestions? > > > > Step back from the implementation you have a little, forget that there > > is a boolean in rtnl_link_ops. User makes a request to spawn an > > interface, surely a flag inside that request can dictate how the netns > > attrs are interpreted. > > IMO, this is about driver capability, not about user requests. The bit is a driver capability, that's fine. But the question was how to achieve backward compatibility. A flag in user request shifts the responsibility of ensuring all services are compatible to whoever spawns the interfaces. Which will probably be some network management daemon. > As you've pointed out earlier, probably no one would actually want > the old behavior whenever the driver supports the new one. > I added the module parameter just for compatibility, because ip_tunnels > was not implemented to support src_net properly. And I maintain that it's very unlikely anyone cares about old behavior. So maybe as a starting point we can have neither the flag nor the module param? We can add them later if someone screams. > Yes it's possible to add an extra flag in user request, but I don't > think it's a good approach. There are two maintainers with opposing intuition so more data may be needed to convince.. > BTW, I didn't find what's going on with module parameters, is there > any documentation? Not sure if there is documentation, but module params are quite painful to work with. Main reason is that they are global and not namespace aware. Plus developers usually default to making them read only, which means they practically speaking have to be configured at boot.
On Fri, Nov 8, 2024 at 12:04 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote: > > IMO, this is about driver capability, not about user requests. > > The bit is a driver capability, that's fine. But the question was how > to achieve backward compatibility. A flag in user request shifts the > responsibility of ensuring all services are compatible to whoever > spawns the interfaces. Which will probably be some network management > daemon. OK. So I think we can change the driver capability indicator in rtnl_ops to a tristate field, say, "linkns_support". If it is - not supported, then keep the old behavior - supported (vlan, macvlan, etc.), then change to the new behavior - compat-mode (ip_tunnel), default to old behavior and can be changed via an IFLA flag. Is this reasonable? > > BTW, I didn't find what's going on with module parameters, is there > > any documentation? > > Not sure if there is documentation, but module params are quite painful > to work with. Main reason is that they are global and not namespace > aware. Plus developers usually default to making them read only, which > means they practically speaking have to be configured at boot. Understood, thanks.
On Fri, 8 Nov 2024 14:44:37 +0800 Xiao Liang wrote: > > On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote: > > > IMO, this is about driver capability, not about user requests. > > > > The bit is a driver capability, that's fine. But the question was how > > to achieve backward compatibility. A flag in user request shifts the > > responsibility of ensuring all services are compatible to whoever > > spawns the interfaces. Which will probably be some network management > > daemon. > > OK. So I think we can change the driver capability indicator in rtnl_ops > to a tristate field, say, "linkns_support". > If it is > - not supported, then keep the old behavior > - supported (vlan, macvlan, etc.), then change to the new behavior > - compat-mode (ip_tunnel), default to old behavior and can be changed > via an IFLA flag. > Is this reasonable? Let's start with annotating the drivers which need the old behavior. It seems like something that was done as a workaround for old drivers, maybe there isn't that many of them and we can convert them all in one series.
On Sun, Nov 10, 2024 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 8 Nov 2024 14:44:37 +0800 Xiao Liang wrote: > > > On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote: > > > > IMO, this is about driver capability, not about user requests. > > > > > > The bit is a driver capability, that's fine. But the question was how > > > to achieve backward compatibility. A flag in user request shifts the > > > responsibility of ensuring all services are compatible to whoever > > > spawns the interfaces. Which will probably be some network management > > > daemon. > > > > OK. So I think we can change the driver capability indicator in rtnl_ops > > to a tristate field, say, "linkns_support". > > If it is > > - not supported, then keep the old behavior > > - supported (vlan, macvlan, etc.), then change to the new behavior > > - compat-mode (ip_tunnel), default to old behavior and can be changed > > via an IFLA flag. > > Is this reasonable? > > Let's start with annotating the drivers which need the old behavior. > It seems like something that was done as a workaround for old drivers, > maybe there isn't that many of them and we can convert them all in one > series. I'd like to clarify a bit here. Link netns is closely coupled with source netns, as it's passed to drivers as source netns. Without introducing a flag, after applying the logic in this patchset, drivers won't be able to distinguish the two: 1) ip -n ns1 link add netns ns2 ... 2) ip link add netns ns2 link-netns ns1 ... There's no problem for drivers that already handle source netns. But it changes the semantics of 1) for ip tunnels silently. The effective link-netns is ns2 before, and will be changed to ns1 afterwards, which will almost certainly affect some users. Is this acceptable? On the other hand, do we need to deal with out-of-tree drivers?
On Mon, 11 Nov 2024 16:15:41 +0800 Xiao Liang wrote: > > Let's start with annotating the drivers which need the old behavior. > > It seems like something that was done as a workaround for old drivers, > > maybe there isn't that many of them and we can convert them all in one > > series. > > I'd like to clarify a bit here. > Link netns is closely coupled with source netns, as it's passed to drivers > as source netns. Without introducing a flag, after applying the logic in > this patchset, drivers won't be able to distinguish the two: > 1) ip -n ns1 link add netns ns2 ... > 2) ip link add netns ns2 link-netns ns1 ... True, but the question is how many drivers actually care about the net parameter. Ideally we would pass both netns to the drivers, refactor the ->newlink callback to take a parameter struct and add both netns as members. Passing just one or the other will always be confusing. > There's no problem for drivers that already handle source netns. > But it changes the semantics of 1) for ip tunnels silently. The effective > link-netns is ns2 before, and will be changed to ns1 afterwards, which will > almost certainly affect some users. Is this acceptable? No, changing the behavior for the commands you provided is not acceptable. At the same time we shouldn't be adding technical debt of supporting both converted and unconverted drivers upstream. > On the other hand, do we need to deal with out-of-tree drivers? Nope, but again, changing the prototype of newlink would also make it hard to get wrong for OOT modules.
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index f1f31ebfc793..6ef7e98e4620 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -107,6 +107,11 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); +static bool netns_atomic; +module_param(netns_atomic, bool, 0444); +MODULE_PARM_DESC(netns_atomic, + "Create tunnel in target net namespace directly and use current net namespace as link-netns by default"); + static struct rtnl_link_ops ipgre_link_ops __read_mostly; static const struct header_ops ipgre_header_ops; @@ -1393,6 +1398,7 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { + struct net *link_net = netns_atomic ? src_net : dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1404,13 +1410,14 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, err = ipgre_netlink_parms(dev, data, tb, &p, &fwmark); if (err < 0) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark); } static int erspan_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { + struct net *link_net = netns_atomic ? src_net : dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1422,7 +1429,7 @@ static int erspan_newlink(struct net *src_net, struct net_device *dev, err = erspan_netlink_parms(dev, data, tb, &p, &fwmark); if (err) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark); } static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[], @@ -1777,6 +1784,10 @@ static int __init ipgre_init(void) pr_info("GRE over IPv4 tunneling driver\n"); + ipgre_link_ops.netns_atomic = netns_atomic; + ipgre_tap_ops.netns_atomic = netns_atomic; + erspan_link_ops.netns_atomic = netns_atomic; + err = register_pernet_device(&ipgre_net_ops); if (err < 0) return err;
If set to true, create device in target netns (when different from link-netns) without netns change, and use current netns as link-netns if not specified explicitly. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> --- net/ipv4/ip_gre.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)