Message ID | 1604448694-19351-1-git-send-email-yihung.wei@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net] openvswitch: Fix upcall OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS | expand |
On 11/3/2020 4:11 PM, Yi-Hung Wei wrote: > TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when > a packet is coming from a geneve tunnel no matter the size of geneve > option is zero or not. On the other hand, TUNNEL_VXLAN_OPT or > TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available. > Currently, ovs kernel module only generates > OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero. > As a result, for a geneve packet without tun_metadata, when the packet > is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT > in the tun_flags on struct sw_flow_key, and that will further cause > megaflow matching issue. > > This patch changes the way that we deal with the upcall netlink message > generation to make sure the geneve tun_flags is set consistently > as the packet is firstly received from the geneve tunnel in order to > avoid megaflow matching issue demonstrated by the following flows. > This issue is only observed on ovs kernel datapath. > > Consider the following two flows, and the two cases. > * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata > * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0 > > Case 1) > Send flow2 first, and then send flow1. When both flows are running, > both the following two flows are hit, which is expected. > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > Case 2) > Send flow1 first, then send flow2. When both flows are running, > only the following flow is hit. > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > > Example flows) > > table=0, arp, actions=NORMAL > table=0, in_port=gnv1, icmp, action=ct(table=1) > table=0, in_port=gnv0, icmp action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1) > table=1, in_port=gnv1, icmp, action=output:gnv0 > table=1, in_port=gnv0, icmp action=ct(table=2) > table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> > --- > net/openvswitch/flow_netlink.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c > index 9d3e50c4d29f..b03ec6a1a1fa 100644 > --- a/net/openvswitch/flow_netlink.c > +++ b/net/openvswitch/flow_netlink.c > @@ -912,13 +912,13 @@ static int __ip_tun_to_nlattr(struct sk_buff *skb, > if ((output->tun_flags & TUNNEL_OAM) && > nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM)) > return -EMSGSIZE; > - if (swkey_tun_opts_len) { > - if (output->tun_flags & TUNNEL_GENEVE_OPT && > - nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, > + if (output->tun_flags & TUNNEL_GENEVE_OPT) { > + if (nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, > swkey_tun_opts_len, tun_opts)) > return -EMSGSIZE; > - else if (output->tun_flags & TUNNEL_VXLAN_OPT && > - vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len)) > + } else if (swkey_tun_opts_len) { > + if (output->tun_flags & TUNNEL_VXLAN_OPT && > + vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len)) > return -EMSGSIZE; > else if (output->tun_flags & TUNNEL_ERSPAN_OPT && > nla_put(skb, OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS, > Applies and builds cleanly. Passes OVS kernel test suite geneve tunnel tests. 17: datapath - ping over geneve tunnel ok 18: datapath - flow resume with geneve tun_metadata ok 19: datapath - ping over geneve6 tunnel ok Code looks good to me. Acked-by: Greg Rose <gvrose8192@gmail.com>
On Tue, 3 Nov 2020 16:11:34 -0800 Yi-Hung Wei wrote: > TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when > a packet is coming from a geneve tunnel no matter the size of geneve > option is zero or not. On the other hand, TUNNEL_VXLAN_OPT or > TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available. > Currently, ovs kernel module only generates > OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero. > As a result, for a geneve packet without tun_metadata, when the packet > is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT > in the tun_flags on struct sw_flow_key, and that will further cause > megaflow matching issue. > > This patch changes the way that we deal with the upcall netlink message > generation to make sure the geneve tun_flags is set consistently > as the packet is firstly received from the geneve tunnel in order to > avoid megaflow matching issue demonstrated by the following flows. > This issue is only observed on ovs kernel datapath. > > Consider the following two flows, and the two cases. > * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata > * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0 > > Case 1) > Send flow2 first, and then send flow1. When both flows are running, > both the following two flows are hit, which is expected. > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > Case 2) > Send flow1 first, then send flow2. When both flows are running, > only the following flow is hit. > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > > Example flows) > > table=0, arp, actions=NORMAL > table=0, in_port=gnv1, icmp, action=ct(table=1) > table=0, in_port=gnv0, icmp action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1) > table=1, in_port=gnv1, icmp, action=output:gnv0 > table=1, in_port=gnv0, icmp action=ct(table=2) > table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Wouldn't it make more sense to make GENEVE behave like the other tunnels rather than hack this logic into consumer of the flag? Please make sure that you CC authors of the patch you're fixing and the maintainers of the code you're changing. ./scripts/get_maintainer.pl is your friend.
On Sat, Nov 7, 2020 at 11:46 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 3 Nov 2020 16:11:34 -0800 Yi-Hung Wei wrote: > > TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when > > a packet is coming from a geneve tunnel no matter the size of geneve > > option is zero or not. On the other hand, TUNNEL_VXLAN_OPT or > > TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available. > > Currently, ovs kernel module only generates > > OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero. > > As a result, for a geneve packet without tun_metadata, when the packet > > is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT > > in the tun_flags on struct sw_flow_key, and that will further cause > > megaflow matching issue. > > > > This patch changes the way that we deal with the upcall netlink message > > generation to make sure the geneve tun_flags is set consistently > > as the packet is firstly received from the geneve tunnel in order to > > avoid megaflow matching issue demonstrated by the following flows. > > This issue is only observed on ovs kernel datapath. > > > > Consider the following two flows, and the two cases. > > * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata > > * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0 > > > > Case 1) > > Send flow2 first, and then send flow1. When both flows are running, > > both the following two flows are hit, which is expected. > > > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > > > Case 2) > > Send flow1 first, then send flow2. When both flows are running, > > only the following flow is hit. > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > > > > Example flows) > > > > table=0, arp, actions=NORMAL > > table=0, in_port=gnv1, icmp, action=ct(table=1) > > table=0, in_port=gnv0, icmp action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1) > > table=1, in_port=gnv1, icmp, action=output:gnv0 > > table=1, in_port=gnv0, icmp action=ct(table=2) > > table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1 > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 > > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 > > > > Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> > > Wouldn't it make more sense to make GENEVE behave like the other > tunnels rather than hack this logic into consumer of the flag? Thanks Jakub for the review. Yes, it makes sense to fix on the tunnel side. I submit another patch to fix it: https://lore.kernel.org/netdev/1605053800-74072-1-git-send-email-yihung.wei@gmail.com/T/#u > Please make sure that you CC authors of the patch you're fixing > and the maintainers of the code you're changing. > > ./scripts/get_maintainer.pl is your friend. Thanks for reminding me. Will do that from now on. -Yi-Hung
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 9d3e50c4d29f..b03ec6a1a1fa 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -912,13 +912,13 @@ static int __ip_tun_to_nlattr(struct sk_buff *skb, if ((output->tun_flags & TUNNEL_OAM) && nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM)) return -EMSGSIZE; - if (swkey_tun_opts_len) { - if (output->tun_flags & TUNNEL_GENEVE_OPT && - nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, + if (output->tun_flags & TUNNEL_GENEVE_OPT) { + if (nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, swkey_tun_opts_len, tun_opts)) return -EMSGSIZE; - else if (output->tun_flags & TUNNEL_VXLAN_OPT && - vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len)) + } else if (swkey_tun_opts_len) { + if (output->tun_flags & TUNNEL_VXLAN_OPT && + vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len)) return -EMSGSIZE; else if (output->tun_flags & TUNNEL_ERSPAN_OPT && nla_put(skb, OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,
TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when a packet is coming from a geneve tunnel no matter the size of geneve option is zero or not. On the other hand, TUNNEL_VXLAN_OPT or TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available. Currently, ovs kernel module only generates OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero. As a result, for a geneve packet without tun_metadata, when the packet is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT in the tun_flags on struct sw_flow_key, and that will further cause megaflow matching issue. This patch changes the way that we deal with the upcall netlink message generation to make sure the geneve tun_flags is set consistently as the packet is firstly received from the geneve tunnel in order to avoid megaflow matching issue demonstrated by the following flows. This issue is only observed on ovs kernel datapath. Consider the following two flows, and the two cases. * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0 Case 1) Send flow2 first, and then send flow1. When both flows are running, both the following two flows are hit, which is expected. table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 Case 2) Send flow1 first, then send flow2. When both flows are running, only the following flow is hit. table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 Example flows) table=0, arp, actions=NORMAL table=0, in_port=gnv1, icmp, action=ct(table=1) table=0, in_port=gnv0, icmp action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1) table=1, in_port=gnv1, icmp, action=output:gnv0 table=1, in_port=gnv0, icmp action=ct(table=2) table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1 table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1 table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1 Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> --- net/openvswitch/flow_netlink.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)