diff mbox series

[net] openvswitch: Fix upcall OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS

Message ID 1604448694-19351-1-git-send-email-yihung.wei@gmail.com
State New
Headers show
Series [net] openvswitch: Fix upcall OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS | expand

Commit Message

Yi-Hung Wei Nov. 4, 2020, 12:11 a.m. UTC
TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when
a packet is coming from a geneve tunnel no matter the size of geneve
option is zero or not.  On the other hand, TUNNEL_VXLAN_OPT or
TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available.
Currently, ovs kernel module only generates
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero.
As a result, for a geneve packet without tun_metadata, when the packet
is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT
in the tun_flags on struct sw_flow_key, and that will further cause
megaflow matching issue.

This patch changes the way that we deal with the upcall netlink message
generation to make sure the geneve tun_flags is set consistently
as the packet is firstly received from the geneve tunnel in order to
avoid megaflow matching issue demonstrated by the following flows.
This issue is only observed on ovs kernel datapath.

Consider the following two flows, and the two cases.
* flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata
* flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0

Case 1)
Send flow2 first, and then send flow1.  When both flows are running,
both the following two flows are hit, which is expected.

table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1
table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

Case 2)
Send flow1 first, then send flow2.  When both flows are running,
only the following flow is hit.
table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

Example flows)

table=0, arp, actions=NORMAL
table=0, in_port=gnv1, icmp, action=ct(table=1)
table=0, in_port=gnv0, icmp  action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1)
table=1, in_port=gnv1, icmp, action=output:gnv0
table=1, in_port=gnv0, icmp  action=ct(table=2)
table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1
table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1
table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
 net/openvswitch/flow_netlink.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Gregory Rose Nov. 5, 2020, 7:47 p.m. UTC | #1
On 11/3/2020 4:11 PM, Yi-Hung Wei wrote:
> TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when

> a packet is coming from a geneve tunnel no matter the size of geneve

> option is zero or not.  On the other hand, TUNNEL_VXLAN_OPT or

> TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available.

> Currently, ovs kernel module only generates

> OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero.

> As a result, for a geneve packet without tun_metadata, when the packet

> is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT

> in the tun_flags on struct sw_flow_key, and that will further cause

> megaflow matching issue.

> 

> This patch changes the way that we deal with the upcall netlink message

> generation to make sure the geneve tun_flags is set consistently

> as the packet is firstly received from the geneve tunnel in order to

> avoid megaflow matching issue demonstrated by the following flows.

> This issue is only observed on ovs kernel datapath.

> 

> Consider the following two flows, and the two cases.

> * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata

> * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0

> 

> Case 1)

> Send flow2 first, and then send flow1.  When both flows are running,

> both the following two flows are hit, which is expected.

> 

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> 

> Case 2)

> Send flow1 first, then send flow2.  When both flows are running,

> only the following flow is hit.

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> 

> Example flows)

> 

> table=0, arp, actions=NORMAL

> table=0, in_port=gnv1, icmp, action=ct(table=1)

> table=0, in_port=gnv0, icmp  action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1)

> table=1, in_port=gnv1, icmp, action=output:gnv0

> table=1, in_port=gnv0, icmp  action=ct(table=2)

> table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> 

> Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")

> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

> ---

>   net/openvswitch/flow_netlink.c | 10 +++++-----

>   1 file changed, 5 insertions(+), 5 deletions(-)

> 

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c

> index 9d3e50c4d29f..b03ec6a1a1fa 100644

> --- a/net/openvswitch/flow_netlink.c

> +++ b/net/openvswitch/flow_netlink.c

> @@ -912,13 +912,13 @@ static int __ip_tun_to_nlattr(struct sk_buff *skb,

>   	if ((output->tun_flags & TUNNEL_OAM) &&

>   	    nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM))

>   		return -EMSGSIZE;

> -	if (swkey_tun_opts_len) {

> -		if (output->tun_flags & TUNNEL_GENEVE_OPT &&

> -		    nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,

> +	if (output->tun_flags & TUNNEL_GENEVE_OPT) {

> +		if (nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,

>   			    swkey_tun_opts_len, tun_opts))

>   			return -EMSGSIZE;

> -		else if (output->tun_flags & TUNNEL_VXLAN_OPT &&

> -			 vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len))

> +	} else if (swkey_tun_opts_len) {

> +		if (output->tun_flags & TUNNEL_VXLAN_OPT &&

> +		    vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len))

>   			return -EMSGSIZE;

>   		else if (output->tun_flags & TUNNEL_ERSPAN_OPT &&

>   			 nla_put(skb, OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,

> 


Applies and builds cleanly.

Passes OVS kernel test suite geneve tunnel tests.

  17: datapath - ping over geneve tunnel              ok
  18: datapath - flow resume with geneve tun_metadata ok
  19: datapath - ping over geneve6 tunnel             ok

Code looks good to me.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Jakub Kicinski Nov. 7, 2020, 7:46 p.m. UTC | #2
On Tue,  3 Nov 2020 16:11:34 -0800 Yi-Hung Wei wrote:
> TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when

> a packet is coming from a geneve tunnel no matter the size of geneve

> option is zero or not.  On the other hand, TUNNEL_VXLAN_OPT or

> TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available.

> Currently, ovs kernel module only generates

> OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero.

> As a result, for a geneve packet without tun_metadata, when the packet

> is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT

> in the tun_flags on struct sw_flow_key, and that will further cause

> megaflow matching issue.

> 

> This patch changes the way that we deal with the upcall netlink message

> generation to make sure the geneve tun_flags is set consistently

> as the packet is firstly received from the geneve tunnel in order to

> avoid megaflow matching issue demonstrated by the following flows.

> This issue is only observed on ovs kernel datapath.

> 

> Consider the following two flows, and the two cases.

> * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata

> * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0

> 

> Case 1)

> Send flow2 first, and then send flow1.  When both flows are running,

> both the following two flows are hit, which is expected.

> 

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> 

> Case 2)

> Send flow1 first, then send flow2.  When both flows are running,

> only the following flow is hit.

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> 

> Example flows)

> 

> table=0, arp, actions=NORMAL

> table=0, in_port=gnv1, icmp, action=ct(table=1)

> table=0, in_port=gnv0, icmp  action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1)

> table=1, in_port=gnv1, icmp, action=output:gnv0

> table=1, in_port=gnv0, icmp  action=ct(table=2)

> table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> 

> Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")

> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>


Wouldn't it make more sense to make GENEVE behave like the other
tunnels rather than hack this logic into consumer of the flag?

Please make sure that you CC authors of the patch you're fixing 
and the maintainers of the code you're changing.

./scripts/get_maintainer.pl is your friend.
Yi-Hung Wei Nov. 11, 2020, 12:26 a.m. UTC | #3
On Sat, Nov 7, 2020 at 11:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

> On Tue,  3 Nov 2020 16:11:34 -0800 Yi-Hung Wei wrote:

> > TUNNEL_GENEVE_OPT is set on tun_flags in struct sw_flow_key when

> > a packet is coming from a geneve tunnel no matter the size of geneve

> > option is zero or not.  On the other hand, TUNNEL_VXLAN_OPT or

> > TUNNEL_ERSPAN_OPT is set when the VXLAN or ERSPAN option is available.

> > Currently, ovs kernel module only generates

> > OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS when the tun_opts_len is non-zero.

> > As a result, for a geneve packet without tun_metadata, when the packet

> > is reinserted from userspace after upcall, we miss the TUNNEL_GENEVE_OPT

> > in the tun_flags on struct sw_flow_key, and that will further cause

> > megaflow matching issue.

> >

> > This patch changes the way that we deal with the upcall netlink message

> > generation to make sure the geneve tun_flags is set consistently

> > as the packet is firstly received from the geneve tunnel in order to

> > avoid megaflow matching issue demonstrated by the following flows.

> > This issue is only observed on ovs kernel datapath.

> >

> > Consider the following two flows, and the two cases.

> > * flow1: icmp traffic from gnv0 to gnv1, without any tun_metadata

> > * flow2: icmp traffic form gnv0 to gnv1 with tun_metadata0

> >

> > Case 1)

> > Send flow2 first, and then send flow1.  When both flows are running,

> > both the following two flows are hit, which is expected.

> >

> > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> >

> > Case 2)

> > Send flow1 first, then send flow2.  When both flows are running,

> > only the following flow is hit.

> > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> >

> > Example flows)

> >

> > table=0, arp, actions=NORMAL

> > table=0, in_port=gnv1, icmp, action=ct(table=1)

> > table=0, in_port=gnv0, icmp  action=move:NXM_NX_TUN_METADATA0[0..7]->NXM_NX_REG9[0..7], resubmit(,1)

> > table=1, in_port=gnv1, icmp, action=output:gnv0

> > table=1, in_port=gnv0, icmp  action=ct(table=2)

> > table=2, priority=300, in_port=gnv0, icmp, ct_state=+trk+new, action=ct(commit),output:gnv1

> > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x0/0xff action=output:gnv1

> > table=2, priority=200, in_port=gnv0, icmp, ct_state=+trk+est, reg9=0x1/0xff action=output:gnv1

> >

> > Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")

> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

>

> Wouldn't it make more sense to make GENEVE behave like the other

> tunnels rather than hack this logic into consumer of the flag?


Thanks Jakub for the review.  Yes, it makes sense to fix on the tunnel
side. I submit another patch to fix it:

https://lore.kernel.org/netdev/1605053800-74072-1-git-send-email-yihung.wei@gmail.com/T/#u

> Please make sure that you CC authors of the patch you're fixing

> and the maintainers of the code you're changing.

>

> ./scripts/get_maintainer.pl is your friend.


Thanks for reminding me.  Will do that from now on.


-Yi-Hung
diff mbox series

Patch

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 9d3e50c4d29f..b03ec6a1a1fa 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -912,13 +912,13 @@  static int __ip_tun_to_nlattr(struct sk_buff *skb,
 	if ((output->tun_flags & TUNNEL_OAM) &&
 	    nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM))
 		return -EMSGSIZE;
-	if (swkey_tun_opts_len) {
-		if (output->tun_flags & TUNNEL_GENEVE_OPT &&
-		    nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,
+	if (output->tun_flags & TUNNEL_GENEVE_OPT) {
+		if (nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,
 			    swkey_tun_opts_len, tun_opts))
 			return -EMSGSIZE;
-		else if (output->tun_flags & TUNNEL_VXLAN_OPT &&
-			 vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len))
+	} else if (swkey_tun_opts_len) {
+		if (output->tun_flags & TUNNEL_VXLAN_OPT &&
+		    vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len))
 			return -EMSGSIZE;
 		else if (output->tun_flags & TUNNEL_ERSPAN_OPT &&
 			 nla_put(skb, OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,