[net,0/3] Fix PMTU for ESP-in-UDP encapsulation

Message ID 20210712005554.26948-1-vfedorenko@novek.ru
Series Fix PMTU for ESP-in-UDP encapsulation

Message

Vadim Fedorenko July 12, 2021, 12:55 a.m. UTC
Bug 213669 uncovered a regression in PMTU discovery for UDP-encapsulated
routes, as well as some incorrect usage of UDP tunnel fields. This series
fixes these problems and adds a selftest for this case.

Vadim Fedorenko (3):
  udp: check for encap using encap_enable
  udp: check encap socket in __udp_lib_err
  selftests: net: add ESP-in-UDP PMTU test

 drivers/infiniband/sw/rxe/rxe_net.c   |   1 -
 drivers/net/bareudp.c                 |   1 -
 drivers/net/geneve.c                  |   1 -
 drivers/net/vxlan.c                   |   1 -
 drivers/net/wireguard/socket.c        |   1 -
 net/ipv4/fou.c                        |   1 -
 net/ipv4/udp.c                        |  31 ++++++--
 net/ipv6/udp.c                        |  30 ++++++--
 net/sctp/protocol.c                   |   2 -
 net/tipc/udp_media.c                  |   1 -
 tools/testing/selftests/net/nettest.c |  55 +++++++++++++-
 tools/testing/selftests/net/pmtu.sh   | 104 +++++++++++++++++++++++++-
 12 files changed, 205 insertions(+), 24 deletions(-)
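The sockets affected by this regression are ordinary UDP sockets that userspace (typically an IKE daemon on port 4500) switches into ESP-in-UDP mode (RFC 3948) with the UDP_ENCAP socket option. The following is a minimal sketch, assuming Linux and glibc's <netinet/udp.h>; the helper name open_espinudp_socket() is hypothetical. As the thread below discusses, such a socket is marked as encapsulating but installs no encap_err_lookup callback, which is the case the series fixes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>   /* UDP_ENCAP, UDP_ENCAP_ESPINUDP */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a UDP socket bound to 'port' (0 = any free
 * port) and switch it to ESP-in-UDP mode. Returns the fd, or -1 if any
 * step fails (for example if the running kernel does not support
 * UDP_ENCAP_ESPINUDP).
 */
int open_espinudp_socket(uint16_t port)
{
	int type = UDP_ENCAP_ESPINUDP;
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	/* Bind first, then ask the kernel to treat arriving datagrams as ESP. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    setsockopt(fd, IPPROTO_UDP, UDP_ENCAP, &type, sizeof(type)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would use open_espinudp_socket(4500) and hand the fd to its IKE logic; ICMP "fragmentation needed" errors for ESP packets sent through such a socket are what the PMTU selftest in patch 3/3 exercises.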

Comments

Willem de Bruijn July 12, 2021, 7:59 a.m. UTC | #1
On Mon, Jul 12, 2021 at 2:56 AM Vadim Fedorenko <vfedorenko@novek.ru> wrote:
>
> Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> added checks for encapsulated sockets but it broke cases when there is
> no implementation of encap_err_lookup for encapsulation, i.e. ESP in
> UDP encapsulation. Fix it by calling encap_err_lookup only if socket
> implements this method otherwise treat it as legal socket.
>
> Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
> ---
>  net/ipv4/udp.c | 24 +++++++++++++++++++++++-
>  net/ipv6/udp.c | 22 ++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)

This duplicates __udp4_lib_err_encap and __udp6_lib_err_encap.

Can we avoid open-coding that logic multiple times?
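One way to avoid open-coding the header juggling around encap_err_lookup again is a single helper shared by the IPv4 and IPv6 paths, parameterised only by the offset of the UDP header inside the ICMP payload. The sketch below models that idea in userspace; struct model_skb, struct model_udp_sock, udp_encap_err_rejects() and demo_err_lookup() are hypothetical stand-ins, not kernel APIs, and READ_ONCE() and the real skb_* helpers are reduced to plain field accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the sk_buff and udp_sock fields used. */
struct model_skb {
	int network_offset;   /* plays the role of skb_network_offset(skb) */
	int transport_offset; /* plays the role of skb_transport_offset(skb) */
};

struct model_udp_sock {
	bool encap_enabled;
	/* NULL when the tunnel does no ICMP validation (e.g. ESP-in-UDP) */
	int (*encap_err_lookup)(struct model_udp_sock *up, struct model_skb *skb);
};

/*
 * Hypothetical shared helper. 'udp_offset' is where the UDP header starts
 * inside the ICMP payload: iph->ihl << 2 for IPv4, 'offset' for IPv6.
 * Returns true if the tunnel rejects the error, so the caller drops 'sk'.
 */
static bool udp_encap_err_rejects(struct model_udp_sock *up,
				  struct model_skb *skb, int udp_offset)
{
	int (*lookup)(struct model_udp_sock *, struct model_skb *);
	int saved_network, saved_transport;
	bool rejected;

	if (!up || !up->encap_enabled)
		return false;
	lookup = up->encap_err_lookup; /* READ_ONCE() in the kernel */
	if (!lookup)
		return false;          /* nothing to validate: keep the socket */

	saved_network = skb->network_offset;
	saved_transport = skb->transport_offset;

	skb->network_offset = 0;            /* skb_reset_network_header() */
	skb->transport_offset = udp_offset; /* skb_set_transport_header() */
	rejected = lookup(up, skb) != 0;

	/* Restore both headers for the rest of the error handling. */
	skb->transport_offset = saved_transport;
	skb->network_offset = saved_network;
	return rejected;
}

/* Example callback (hypothetical): accept only a UDP header that follows
 * a 20-byte IPv4 header without options. */
static int demo_err_lookup(struct model_udp_sock *up, struct model_skb *skb)
{
	(void)up;
	return skb->network_offset == 0 && skb->transport_offset == 20 ? 0 : -1;
}
```

Each caller would then reduce to "if (sk && udp_encap_err_rejects(...)) sk = NULL;", with only the offset argument differing between IPv4 and IPv6.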
Paolo Abeni July 12, 2021, 9:07 a.m. UTC | #2
Hello,

On Mon, 2021-07-12 at 03:55 +0300, Vadim Fedorenko wrote:
> Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> added checks for encapsulated sockets but it broke cases when there is
> no implementation of encap_err_lookup for encapsulation, i.e. ESP in
> UDP encapsulation. Fix it by calling encap_err_lookup only if socket
> implements this method otherwise treat it as legal socket.
> 
> Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
> ---
>  net/ipv4/udp.c | 24 +++++++++++++++++++++++-
>  net/ipv6/udp.c | 22 ++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index e5cb7fedfbcd..4980e0f19990 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -707,7 +707,29 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>  			       iph->saddr, uh->source, skb->dev->ifindex,
>  			       inet_sdif(skb), udptable, NULL);
> -	if (!sk || udp_sk(sk)->encap_enabled) {
> +	if (sk && udp_sk(sk)->encap_enabled) {
> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
> +
> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
> +		if (lookup) {
> +			int network_offset, transport_offset;
> +
> +			network_offset = skb_network_offset(skb);
> +			transport_offset = skb_transport_offset(skb);
> +
> +			/* Network header needs to point to the outer IPv4 header inside ICMP */
> +			skb_reset_network_header(skb);
> +
> +			/* Transport header needs to point to the UDP header */
> +			skb_set_transport_header(skb, iph->ihl << 2);
> +			if (lookup(sk, skb))
> +				sk = NULL;
> +			skb_set_transport_header(skb, transport_offset);
> +			skb_set_network_header(skb, network_offset);
> +		}
> +	}
> +
> +	if (!sk) {
>  		/* No socket for error: try tunnels before discarding */
>  		sk = ERR_PTR(-ENOENT);
>  		if (static_branch_unlikely(&udp_encap_needed_key)) {
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 798916d2e722..ed49a8589d9f 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -558,6 +558,28 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>  
>  	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
>  			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
> +	if (sk && udp_sk(sk)->encap_enabled) {
> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
> +
> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
> +		if (lookup) {
> +			int network_offset, transport_offset;
> +
> +			network_offset = skb_network_offset(skb);
> +			transport_offset = skb_transport_offset(skb);
> +
> +			/* Network header needs to point to the outer IPv6 header inside ICMP */
> +			skb_reset_network_header(skb);
> +
> +			/* Transport header needs to point to the UDP header */
> +			skb_set_transport_header(skb, offset);
> +			if (lookup(sk, skb))
> +				sk = NULL;
> +			skb_set_transport_header(skb, transport_offset);
> +			skb_set_network_header(skb, network_offset);
> +		}
> +	}

I can't follow this code. I guess that before d26796ae5894,
__udp6_lib_err() used to invoke ICMP processing on the ESP in UDP
socket, and after d26796ae5894 'sk' was cleared
by __udp4_lib_err_encap(), is that correct?

After this patch, the above chunk will not clear 'sk' for packets
targeting ESP in UDP sockets, but AFAICS we will still enter the
following conditional, preserving the current behavior - no ICMP
processing. 

Can you please clarify?

Why can't you use something like the following instead?

---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c0f9f3260051..96a3b640e4da 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
        sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
                               iph->saddr, uh->source, skb->dev->ifindex,
                               inet_sdif(skb), udptable, NULL);
-       if (!sk || udp_sk(sk)->encap_type) {
+       if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
                /* No socket for error: try tunnels before discarding */
                sk = ERR_PTR(-ENOENT);
                if (static_branch_unlikely(&udp_encap_needed_key)) {

---

Thanks!

/P
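The three variants of the condition discussed so far can be compared side by side. In this hypothetical model (struct model_udp_sock and the three predicate names are illustrative, not kernel code), a true result means sk is replaced and the error goes through __udp4_lib_err_encap(); since an ESP-in-UDP socket has no encap_err_lookup, an error that takes that path never reaches it, so its PMTU is not updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what __udp4_lib_err() sees after the lookup. */
struct model_udp_sock {
	int encap_type;            /* non-zero for any UDP encapsulation */
	bool has_encap_err_lookup; /* tunnel validates ICMP errors itself */
};

/* Before d26796ae5894: only "no socket" falls back to the tunnel path. */
static bool tunnel_path_before(const struct model_udp_sock *sk)
{
	return sk == NULL;
}

/* d26796ae5894: every encapsulating socket falls back, ESP-in-UDP included. */
static bool tunnel_path_d26796ae5894(const struct model_udp_sock *sk)
{
	return sk == NULL || sk->encap_type != 0;
}

/* Suggested: fall back only if the socket can validate the error itself. */
static bool tunnel_path_suggested(const struct model_udp_sock *sk)
{
	return sk == NULL || sk->has_encap_err_lookup;
}
```

For an ESP-in-UDP socket (encap_type set, no encap_err_lookup), only the d26796ae5894 variant diverts the error away from it.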
Vadim Fedorenko July 12, 2021, 12:45 p.m. UTC | #3
On 12.07.2021 10:07, Paolo Abeni wrote:
> Hello,
> 
> On Mon, 2021-07-12 at 03:55 +0300, Vadim Fedorenko wrote:
>> [...]
> 
> I can't follow this code. I guess that before d26796ae5894,
> __udp6_lib_err() used to invoke ICMP processing on the ESP in UDP
> socket, and after d26796ae5894 'sk' was cleared
> by __udp4_lib_err_encap(), is that correct?

Actually, it was cleared just before __udp4_lib_err_encap(), and after
that we completely lose the information about the socket found by
__udp4_lib_lookup(), because __udp4_lib_err_encap() uses a different
combination of ports (source and destination ports are swapped) and
could find a different socket.
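The effect of the swapped ports can be seen with a toy lookup table. This hypothetical model keeps only the local port of sockets bound to the wildcard address; find_by_local_port(), first_lookup() and encap_lookup() are illustrative names, not kernel functions. The first lookup keys on the source port of the embedded (originally transmitted) UDP header, while the tunnel fallback keys on its destination port, so the two can return different sockets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy socket: bound to the wildcard address, so only the
 * local port matters for the lookup. */
struct model_sock {
	const char *name;
	uint16_t local_port;
};

/* Ports of the UDP header embedded in the ICMP error, i.e. of a packet
 * this host originally sent. */
struct model_udphdr {
	uint16_t source;
	uint16_t dest;
};

static const struct model_sock *find_by_local_port(const struct model_sock *tbl,
						   size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].local_port == port)
			return &tbl[i];
	return NULL;
}

/* First lookup in __udp4_lib_err(): the sender was bound to uh->source. */
static const struct model_sock *first_lookup(const struct model_sock *tbl,
					     size_t n,
					     const struct model_udphdr *uh)
{
	return find_by_local_port(tbl, n, uh->source);
}

/* Lookup in __udp4_lib_err_encap(): ports swapped, so the local port is
 * uh->dest (tunnels listen on a fixed port but send from any port). */
static const struct model_sock *encap_lookup(const struct model_sock *tbl,
					     size_t n,
					     const struct model_udphdr *uh)
{
	return find_by_local_port(tbl, n, uh->dest);
}
```

With an ESP-in-UDP socket on 4500 sending to peer port 5000 while another local socket is bound to 5000, the first lookup returns the ESP socket and the swapped lookup returns the unrelated one; when source and destination ports are equal, both return the same socket.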

> 
> After this patch, the above chunk will not clear 'sk' for packets
> targeting ESP in UDP sockets, but AFAICS we will still enter the
> following conditional, preserving the current behavior - no ICMP
> processing.

We will not enter the following conditional in the ESP-in-UDP case,
because it no longer checks encap_type or encap_enabled; it is entered
only when there is no UDP socket, as before d26796ae5894. But we still
have to check whether the socket found by __udp4_lib_lookup() is the
right one for the received ICMP packet; that's why I added the code
calling encap_err_lookup.

I may be missing something, but d26796ae5894 doesn't actually explain
which particular situation the additional check should avoid, and no
tests were added to reproduce the problem. If you could explain it a bit
more, it would greatly help me improve the fix.

Thanks
Vadim Fedorenko July 12, 2021, 2:05 p.m. UTC | #4
On 12.07.2021 14:37, Paolo Abeni wrote:
> On Mon, 2021-07-12 at 13:45 +0100, Vadim Fedorenko wrote:
>>
>>> After this patch, the above chunk will not clear 'sk' for packets
>>> targeting ESP in UDP sockets, but AFAICS we will still enter the
>>> following conditional, preserving the current behavior - no ICMP
>>> processing.
>>
>> We will not enter following conditional for ESP in UDP case because
>> there is no more check for encap_type or encap_enabled.
> 
> I see. You have a bug in the ipv6 code-path. With your patch applied:
> 
> ---
>   	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
>                                 inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
>          if (sk && udp_sk(sk)->encap_enabled) {
> 		//...
>          }
> 
>          if (!sk || udp_sk(sk)->encap_enabled) {
> 	// can still enter here...
> ---	
> 

Oh, my bad, thanks for catching this!
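The IPv6 issue can be checked with a small truth-table model. Everything below is hypothetical (struct model_udp_sock, v1_filter(), the two predicates and the example callbacks are illustrative names): an ESP-in-UDP socket survives the newly added block, but the untouched "!sk || encap_enabled" test that follows it in the IPv6 hunk still sends it down the tunnel path, unlike the IPv4 hunk, which reduces that test to "!sk".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the socket state consulted in __udp6_lib_err(). */
struct model_udp_sock {
	bool encap_enabled;
	/* Non-zero return = tunnel rejects the ICMP error; NULL if absent. */
	int (*encap_err_lookup)(void);
};

/* Block added by v1: drop the socket only if its tunnel rejects the error. */
static const struct model_udp_sock *v1_filter(const struct model_udp_sock *sk)
{
	if (sk && sk->encap_enabled && sk->encap_err_lookup &&
	    sk->encap_err_lookup() != 0)
		return NULL;
	return sk;
}

/* IPv6 as posted: the old condition still follows the new block. */
static bool v1_ipv6_takes_tunnel_path(const struct model_udp_sock *sk)
{
	sk = v1_filter(sk);
	return !sk || sk->encap_enabled;
}

/* IPv6 matching the IPv4 hunk: only "no socket" falls back. */
static bool fixed_ipv6_takes_tunnel_path(const struct model_udp_sock *sk)
{
	sk = v1_filter(sk);
	return !sk;
}

/* Example callbacks (hypothetical). */
static int accept_error(void) { return 0; }
static int reject_error(void) { return -1; }
```

With the stale condition, an ESP-in-UDP socket (encap_enabled set, no callback) still returns true, which is exactly the path that withholds the ICMP error from it.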

>> I maybe missing something but d26796ae5894 doesn't actually explain
>> which particular situation should be avoided by this additional check
>> and no tests were added to simply reproduce the problem. If you can
>> explain it a bit more it would greatly help me to improve the fix.
> 
> Xin knows better, but AFAICS it used to cover the situation you
> explicitly tests in patch 3/3 - incoming packet with src-port == dst-
> port == tunnel port - for e.g. vxlan tunnels.
>

OK, so my assumption was the same as yours; that's good.

>>> Why can't you use something alike the following instead?
>>>
>>> ---
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index c0f9f3260051..96a3b640e4da 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>>>           sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>>>                                  iph->saddr, uh->source, skb->dev->ifindex,
>>>                                  inet_sdif(skb), udptable, NULL);
>>> -       if (!sk || udp_sk(sk)->encap_type) {
>>> +       if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
>>>                   /* No socket for error: try tunnels before discarding */
>>>                   sk = ERR_PTR(-ENOENT);
>>>                   if (static_branch_unlikely(&udp_encap_needed_key)) {
>>>
>>> ---
> 
> Could you please have a look at the above ?
> 
Sure. The main problem I see here is that the __udp4_lib_lookup() call
in __udp4_lib_err_encap() could return a different socket, because the
source and destination ports are swapped. In that case we never check
whether the originally found socket is the right one: encap_err_lookup
is never called for it, and the ICMP notification is never applied to
that socket even if it would pass the checks.
My point is that it's simpler to explicitly check the socket that was
found than to rely on the result of __udp4_lib_lookup() with different
inputs, and to leave the no-socket case as it was before d26796ae5894.

If that's OK, I will unify the check code as Willem suggested and resend as v2.
Xin Long July 16, 2021, 5:50 p.m. UTC | #5
On Mon, Jul 12, 2021 at 9:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-07-12 at 13:45 +0100, Vadim Fedorenko wrote:
> >
> > > After this patch, the above chunk will not clear 'sk' for packets
> > > targeting ESP in UDP sockets, but AFAICS we will still enter the
> > > following conditional, preserving the current behavior - no ICMP
> > > processing.
> >
> > We will not enter following conditional for ESP in UDP case because
> > there is no more check for encap_type or encap_enabled.
>
> I see. You have a bug in the ipv6 code-path. With your patch applied:
>
> ---
>         sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
>                                inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
>         if (sk && udp_sk(sk)->encap_enabled) {
>                 //...
>         }
>
>         if (!sk || udp_sk(sk)->encap_enabled) {
>         // can still enter here...
> ---
>
> > I maybe missing something but d26796ae5894 doesn't actually explain
> > which particular situation should be avoided by this additional check
> > and no tests were added to simply reproduce the problem. If you can
> > explain it a bit more it would greatly help me to improve the fix.
>
> Xin knows better, but AFAICS it used to cover the situation you
> explicitly tests in patch 3/3 - incoming packet with src-port == dst-
> port == tunnel port - for e.g. vxlan tunnels.

Thanks Paolo, and sorry for the late reply.

Right, __udp4/6_lib_err_encap() was introduced to process ICMP error
packets for UDP tunnels. But it only takes effect when the initial
src + dst port lookup finds no socket: when src == dst port, a socket
might be found (if it is bound to ANY), and then the regular UDP error
handling code is called on it instead.
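The ordering described above can be modelled as a three-way outcome. This is a simplified sketch, assuming the only bound sockets are VXLAN-style tunnel sockets on the wildcard address and ignoring any further socket-less fallback; port_is_bound(), before_d26796ae5894() and enum err_outcome are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: local ports of UDP sockets bound to the wildcard address. */
static bool port_is_bound(const uint16_t *ports, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (ports[i] == port)
			return true;
	return false;
}

enum err_outcome {
	GENERIC_UDP_HANDLING, /* error applied to the socket found first */
	TUNNEL_ERR_HANDLER,   /* tunnel socket found via swapped ports */
	NO_SOCKET             /* nothing found (simplified) */
};

/*
 * Models __udp4_lib_err() before d26796ae5894: the tunnel handler is
 * consulted only if the first lookup, keyed on the embedded source
 * port, finds nothing.
 */
static enum err_outcome before_d26796ae5894(const uint16_t *ports, size_t n,
					     uint16_t source, uint16_t dest)
{
	if (port_is_bound(ports, n, source))
		return GENERIC_UDP_HANDLING;
	if (port_is_bound(ports, n, dest))
		return TUNNEL_ERR_HANDLER;
	return NO_SOCKET;
}
```

With a tunnel socket on 4789, an error for a packet sent from port 50000 reaches the tunnel handler, while one sent from 4789 to 4789 is handled generically on the tunnel socket, which is the case d26796ae5894 set out to catch.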

>
> > > Why can't you use something alike the following instead?
> > >
> > > ---
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index c0f9f3260051..96a3b640e4da 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
> > >          sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
> > >                                 iph->saddr, uh->source, skb->dev->ifindex,
> > >                                 inet_sdif(skb), udptable, NULL);
> > > -       if (!sk || udp_sk(sk)->encap_type) {
> > > +       if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
> > >                  /* No socket for error: try tunnels before discarding */
> > >                  sk = ERR_PTR(-ENOENT);
> > >                  if (static_branch_unlikely(&udp_encap_needed_key)) {
> > >
> > > ---
>
> Could you please have a look at the above ?

If not all UDP tunnels want to do further validation of ICMP error
packets, this looks good to me.

>
> Thanks!
>
> /P
>