diff mbox series

[net-next,v2,5/8] net: ip_gre: Add netns_atomic module parameter

Message ID 20241107133004.7469-6-shaw.leon@gmail.com
State New
Headers show
Series net: Improve netns handling in RTNL and ip_tunnel | expand

Commit Message

Xiao Liang Nov. 7, 2024, 1:30 p.m. UTC
If set to true, create device in target netns (when different from
link-netns) without netns change, and use current netns as link-netns
if not specified explicitly.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
 net/ipv4/ip_gre.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Xiao Liang Nov. 7, 2024, 2:11 p.m. UTC | #1
On Thu, Nov 7, 2024 at 9:37 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 2:30 PM Xiao Liang <shaw.leon@gmail.com> wrote:
> >
> > If set to true, create device in target netns (when different from
> > link-netns) without netns change, and use current netns as link-netns
> > if not specified explicitly.
> >
>
> Sorry, module parameters are not going to fly.

Sorry, I didn't know that.

> Instead, add new rtnetlink attributes ?

It is to control driver behavior at rtnl_ops registration time. I
think rtnetlink
attributes are too late for that, maybe? Can't think of a way other than
module parameters or register separate ops. Any suggestions?
Jakub Kicinski Nov. 7, 2024, 3:59 p.m. UTC | #2
On Thu, 7 Nov 2024 22:11:24 +0800 Xiao Liang wrote:
> > Instead, add new rtnetlink attributes ?  
> 
> It is to control driver behavior at rtnl_ops registration time. I
> think rtnetlink
> attributes are too late for that, maybe? Can't think of a way other than
> module parameters or register separate ops. Any suggestions?

Step back from the implementation you have a little, forget that there
is a boolean in rtnl_link_ops. User makes a request to spawn an
interface, surely a flag inside that request can dictate how the netns
attrs are interpreted.
Xiao Liang Nov. 7, 2024, 4:53 p.m. UTC | #3
On Thu, Nov 7, 2024 at 11:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 7 Nov 2024 22:11:24 +0800 Xiao Liang wrote:
> > > Instead, add new rtnetlink attributes ?
> >
> > It is to control driver behavior at rtnl_ops registration time. I
> > think rtnetlink
> > attributes are too late for that, maybe? Can't think of a way other than
> > module parameters or register separate ops. Any suggestions?
>
> Step back from the implementation you have a little, forget that there
> is a boolean in rtnl_link_ops. User makes a request to spawn an
> interface, surely a flag inside that request can dictate how the netns
> attrs are interpreted.

IMO, this is about driver capability, not about user requests.
As you've pointed out earlier, probably no one would actually want
the old behavior whenever the driver supports the new one.
I added the module parameter just for compatibility, because ip_tunnels
was not implemented to support src_net properly. Yes it's possible to
add an extra flag in user request, but I don't think it's a good approach.

BTW, I didn't find what's going on with module parameters, is there
any documentation?

Thanks.
Jakub Kicinski Nov. 8, 2024, 4:04 a.m. UTC | #4
On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote:
> > > It is to control driver behavior at rtnl_ops registration time. I
> > > think rtnetlink
> > > attributes are too late for that, maybe? Can't think of a way other than
> > > module parameters or register separate ops. Any suggestions?  
> >
> > Step back from the implementation you have a little, forget that there
> > is a boolean in rtnl_link_ops. User makes a request to spawn an
> > interface, surely a flag inside that request can dictate how the netns
> > attrs are interpreted.  
> 
> IMO, this is about driver capability, not about user requests.

The bit is a driver capability, that's fine. But the question was how
to achieve backward compatibility. A flag in user request shifts the
responsibility of ensuring all services are compatible to whoever
spawns the interfaces. Which will probably be some network management
daemon.

> As you've pointed out earlier, probably no one would actually want
> the old behavior whenever the driver supports the new one.
> I added the module parameter just for compatibility, because ip_tunnels
> was not implemented to support src_net properly.

And I maintain that it's very unlikely anyone cares about old behavior.
So maybe as a starting point we can have neither the flag nor the
module param? We can add them later if someone screams.

> Yes it's possible to add an extra flag in user request, but I don't
> think it's a good approach.

There are two maintainers with opposing intuition so more data may be
needed to convince..

> BTW, I didn't find what's going on with module parameters, is there
> any documentation?

Not sure if there is documentation, but module params are quite painful
to work with. Main reason is that they are global and not namespace
aware. Plus developers usually default to making them read only, which
means they practically speaking have to be configured at boot.
Xiao Liang Nov. 8, 2024, 6:44 a.m. UTC | #5
On Fri, Nov 8, 2024 at 12:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote:
> > IMO, this is about driver capability, not about user requests.
>
> The bit is a driver capability, that's fine. But the question was how
> to achieve backward compatibility. A flag in user request shifts the
> responsibility of ensuring all services are compatible to whoever
> spawns the interfaces. Which will probably be some network management
> daemon.

OK. So I think we can change the driver capability indicator in rtnl_ops
to a tristate field, say, "linkns_support".
If it is
  - not supported, then keep the old behavior
  - supported (vlan, macvlan, etc.), then change to the new behavior
  - compat-mode (ip_tunnel), default to old behavior and can be changed
    via an IFLA flag.
Is this reasonable?

> > BTW, I didn't find what's going on with module parameters, is there
> > any documentation?
>
> Not sure if there is documentation, but module params are quite painful
> to work with. Main reason is that they are global and not namespace
> aware. Plus developers usually default to making them read only, which
> means they practically speaking have to be configured at boot.

Understood, thanks.
Jakub Kicinski Nov. 10, 2024, 12:59 a.m. UTC | #6
On Fri, 8 Nov 2024 14:44:37 +0800 Xiao Liang wrote:
> > On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote:  
> > > IMO, this is about driver capability, not about user requests.  
> >
> > The bit is a driver capability, that's fine. But the question was how
> > to achieve backward compatibility. A flag in user request shifts the
> > responsibility of ensuring all services are compatible to whoever
> > spawns the interfaces. Which will probably be some network management
> > daemon.  
> 
> OK. So I think we can change the driver capability indicator in rtnl_ops
> to a tristate field, say, "linkns_support".
> If it is
>   - not supported, then keep the old behavior
>   - supported (vlan, macvlan, etc.), then change to the new behavior
>   - compat-mode (ip_tunnel), default to old behavior and can be changed
>     via an IFLA flag.
> Is this reasonable?

Let's start with annotating the drivers which need the old behavior.
It seems like something that was done as a workaround for old drivers,
maybe there isn't that many of them and we can convert them all in one
series.
Xiao Liang Nov. 11, 2024, 8:15 a.m. UTC | #7
On Sun, Nov 10, 2024 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 8 Nov 2024 14:44:37 +0800 Xiao Liang wrote:
> > > On Fri, 8 Nov 2024 00:53:55 +0800 Xiao Liang wrote:
> > > > IMO, this is about driver capability, not about user requests.
> > >
> > > The bit is a driver capability, that's fine. But the question was how
> > > to achieve backward compatibility. A flag in user request shifts the
> > > responsibility of ensuring all services are compatible to whoever
> > > spawns the interfaces. Which will probably be some network management
> > > daemon.
> >
> > OK. So I think we can change the driver capability indicator in rtnl_ops
> > to a tristate field, say, "linkns_support".
> > If it is
> >   - not supported, then keep the old behavior
> >   - supported (vlan, macvlan, etc.), then change to the new behavior
> >   - compat-mode (ip_tunnel), default to old behavior and can be changed
> >     via an IFLA flag.
> > Is this reasonable?
>
> Let's start with annotating the drivers which need the old behavior.
> It seems like something that was done as a workaround for old drivers,
> maybe there isn't that many of them and we can convert them all in one
> series.

I'd like to clarify a bit here.
Link netns is closely coupled with source netns, as it's passed to drivers
as source netns. Without introducing a flag, after applying the logic in
this patchset, drivers won't be able to distinguish the two:
  1) ip -n ns1 link add netns ns2 ...
  2) ip link add netns ns2 link-netns ns1 ...

There's no problem for drivers that already handle source netns.
But it changes the semantics of 1) for ip tunnels silently. The effective
link-netns is ns2 before, and will be changed to ns1 afterwards, which will
almost certainly affect some users. Is this acceptable? On the other hand,
do we need to deal with out-of-tree drivers?
Jakub Kicinski Nov. 11, 2024, 11:42 p.m. UTC | #8
On Mon, 11 Nov 2024 16:15:41 +0800 Xiao Liang wrote:
> > Let's start with annotating the drivers which need the old behavior.
> > It seems like something that was done as a workaround for old drivers,
> > maybe there isn't that many of them and we can convert them all in one
> > series.  
> 
> I'd like to clarify a bit here.
> Link netns is closely coupled with source netns, as it's passed to drivers
> as source netns. Without introducing a flag, after applying the logic in
> this patchset, drivers won't be able to distinguish the two:
>   1) ip -n ns1 link add netns ns2 ...
>   2) ip link add netns ns2 link-netns ns1 ...

True, but the question is how many drivers actually care about the net
parameter. Ideally we would pass both netns to the drivers, refactor
the ->newlink callback to take a parameter struct and add both netns
as members. Passing just one or the other will always be confusing.

> There's no problem for drivers that already handle source netns.
> But it changes the semantics of 1) for ip tunnels silently. The effective
> link-netns is ns2 before, and will be changed to ns1 afterwards, which will
> almost certainly affect some users. Is this acceptable?

No, changing the behavior for the commands you provided is not
acceptable. At the same time we shouldn't be adding technical debt 
of supporting both converted and unconverted drivers upstream.

> On the other hand, do we need to deal with out-of-tree drivers?

Nope, but again, changing the prototype of newlink would also make it
hard to get wrong for OOT modules.
diff mbox series

Patch

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f1f31ebfc793..6ef7e98e4620 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -107,6 +107,11 @@  static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+static bool netns_atomic;
+module_param(netns_atomic, bool, 0444);
+MODULE_PARM_DESC(netns_atomic,
+		 "Create tunnel in target net namespace directly and use current net namespace as link-netns by default");
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static const struct header_ops ipgre_header_ops;
 
@@ -1393,6 +1398,7 @@  static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[],
 			 struct netlink_ext_ack *extack)
 {
+	struct net *link_net = netns_atomic ? src_net : dev_net(dev);
 	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = 0;
 	int err;
@@ -1404,13 +1410,14 @@  static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 	err = ipgre_netlink_parms(dev, data, tb, &p, &fwmark);
 	if (err < 0)
 		return err;
-	return ip_tunnel_newlink(dev, tb, &p, fwmark);
+	return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark);
 }
 
 static int erspan_newlink(struct net *src_net, struct net_device *dev,
 			  struct nlattr *tb[], struct nlattr *data[],
 			  struct netlink_ext_ack *extack)
 {
+	struct net *link_net = netns_atomic ? src_net : dev_net(dev);
 	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = 0;
 	int err;
@@ -1422,7 +1429,7 @@  static int erspan_newlink(struct net *src_net, struct net_device *dev,
 	err = erspan_netlink_parms(dev, data, tb, &p, &fwmark);
 	if (err)
 		return err;
-	return ip_tunnel_newlink(dev, tb, &p, fwmark);
+	return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark);
 }
 
 static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
@@ -1777,6 +1784,10 @@  static int __init ipgre_init(void)
 
 	pr_info("GRE over IPv4 tunneling driver\n");
 
+	ipgre_link_ops.netns_atomic = netns_atomic;
+	ipgre_tap_ops.netns_atomic = netns_atomic;
+	erspan_link_ops.netns_atomic = netns_atomic;
+
 	err = register_pernet_device(&ipgre_net_ops);
 	if (err < 0)
 		return err;