Message ID | 20210712005554.26948-1-vfedorenko@novek.ru |
---|---|
Series | Fix PMTU for ESP-in-UDP encapsulation |
On Mon, Jul 12, 2021 at 2:56 AM Vadim Fedorenko <vfedorenko@novek.ru> wrote:
>
> Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> added checks for encapsulated sockets but it broke cases when there is
> no implementation of encap_err_lookup for encapsulation, i.e. ESP in
> UDP encapsulation. Fix it by calling encap_err_lookup only if socket
> implements this method otherwise treat it as legal socket.
>
> Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
> ---
>  net/ipv4/udp.c | 24 +++++++++++++++++++++++-
>  net/ipv6/udp.c | 22 ++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)

This duplicates __udp4_lib_err_encap and __udp6_lib_err_encap. Can we
avoid open-coding that logic multiple times?
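The duplication concern can be pictured with a small userspace model. This is only a sketch, not kernel code: `struct model_skb`, `struct model_sock`, `model_encap_err_check()` and the two example callbacks are hypothetical names, and header positions are modeled as plain integer offsets. It assumes the IPv4 and IPv6 variants of the check differ only in the offset of the embedded UDP header, which is what the two hunks in the patch suggest.

```c
#include <stddef.h>

/* Hypothetical stand-in for the header offsets tracked by an sk_buff. */
struct model_skb {
	int network_offset;
	int transport_offset;
};

struct model_sock;
typedef int (*model_err_lookup_t)(struct model_sock *sk, struct model_skb *skb);

/* Hypothetical stand-in for a UDP socket with an optional encap callback. */
struct model_sock {
	int encap_enabled;
	model_err_lookup_t encap_err_lookup;	/* NULL models ESP in UDP */
};

/*
 * One check shared by both address families. udp_offset is the only
 * family-specific input (iph->ihl << 2 for IPv4, offset for IPv6 in
 * the patch). Returns sk when the socket should keep the error and
 * NULL when the callback rejected it.
 */
struct model_sock *model_encap_err_check(struct model_sock *sk,
					 struct model_skb *skb,
					 int udp_offset)
{
	int saved_network, saved_transport, rejected;

	/* Nothing to validate: no socket, no encap, or no callback. */
	if (!sk || !sk->encap_enabled || !sk->encap_err_lookup)
		return sk;

	saved_network = skb->network_offset;
	saved_transport = skb->transport_offset;

	/* Point at the headers embedded in the ICMP payload. */
	skb->network_offset = 0;
	skb->transport_offset = udp_offset;

	rejected = sk->encap_err_lookup(sk, skb);

	/* Restore the offsets the caller expects. */
	skb->transport_offset = saved_transport;
	skb->network_offset = saved_network;

	return rejected ? NULL : sk;
}

/* Example callbacks: one accepts every error, one rejects every error. */
int model_lookup_accept(struct model_sock *sk, struct model_skb *skb)
{
	(void)sk;
	(void)skb;
	return 0;
}

int model_lookup_reject(struct model_sock *sk, struct model_skb *skb)
{
	(void)sk;
	(void)skb;
	return -1;
}
```

In this model an IPv4 caller would pass the IP header length as `udp_offset` and an IPv6 caller would pass the offset it already computed, so the save/validate/restore sequence exists only once.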
Hello,

On Mon, 2021-07-12 at 03:55 +0300, Vadim Fedorenko wrote:
> Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> added checks for encapsulated sockets but it broke cases when there is
> no implementation of encap_err_lookup for encapsulation, i.e. ESP in
> UDP encapsulation. Fix it by calling encap_err_lookup only if socket
> implements this method otherwise treat it as legal socket.
>
> Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
> ---
>  net/ipv4/udp.c | 24 +++++++++++++++++++++++-
>  net/ipv6/udp.c | 22 ++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index e5cb7fedfbcd..4980e0f19990 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -707,7 +707,29 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>  			       iph->saddr, uh->source, skb->dev->ifindex,
>  			       inet_sdif(skb), udptable, NULL);
> -	if (!sk || udp_sk(sk)->encap_enabled) {
> +	if (sk && udp_sk(sk)->encap_enabled) {
> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
> +
> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
> +		if (lookup) {
> +			int network_offset, transport_offset;
> +
> +			network_offset = skb_network_offset(skb);
> +			transport_offset = skb_transport_offset(skb);
> +
> +			/* Network header needs to point to the outer IPv4 header inside ICMP */
> +			skb_reset_network_header(skb);
> +
> +			/* Transport header needs to point to the UDP header */
> +			skb_set_transport_header(skb, iph->ihl << 2);
> +			if (lookup(sk, skb))
> +				sk = NULL;
> +			skb_set_transport_header(skb, transport_offset);
> +			skb_set_network_header(skb, network_offset);
> +		}
> +	}
> +
> +	if (!sk) {
>  		/* No socket for error: try tunnels before discarding */
>  		sk = ERR_PTR(-ENOENT);
>  		if (static_branch_unlikely(&udp_encap_needed_key)) {
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 798916d2e722..ed49a8589d9f 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -558,6 +558,28 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>
>  	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
>  			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
> +	if (sk && udp_sk(sk)->encap_enabled) {
> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
> +
> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
> +		if (lookup) {
> +			int network_offset, transport_offset;
> +
> +			network_offset = skb_network_offset(skb);
> +			transport_offset = skb_transport_offset(skb);
> +
> +			/* Network header needs to point to the outer IPv6 header inside ICMP */
> +			skb_reset_network_header(skb);
> +
> +			/* Transport header needs to point to the UDP header */
> +			skb_set_transport_header(skb, offset);
> +			if (lookup(sk, skb))
> +				sk = NULL;
> +			skb_set_transport_header(skb, transport_offset);
> +			skb_set_network_header(skb, network_offset);
> +		}
> +	}

I can't follow this code. I guess that before d26796ae5894,
__udp6_lib_err() used to invoke ICMP processing on the ESP in UDP
socket, and after d26796ae5894 'sk' was cleared
by __udp4_lib_err_encap(), is that correct?

After this patch, the above chunk will not clear 'sk' for packets
targeting ESP in UDP sockets, but AFAICS we will still enter the
following conditional, preserving the current behavior - no ICMP
processing.

Can you please clarify?

Why can't you use something alike the following instead?
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c0f9f3260051..96a3b640e4da 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
 	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
 			       iph->saddr, uh->source, skb->dev->ifindex,
 			       inet_sdif(skb), udptable, NULL);
-	if (!sk || udp_sk(sk)->encap_type) {
+	if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
 		/* No socket for error: try tunnels before discarding */
 		sk = ERR_PTR(-ENOENT);
 		if (static_branch_unlikely(&udp_encap_needed_key)) {
---

Thanks!

/P
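The difference between the two conditions in the one-line suggestion above can be seen in a reduced userspace model. This is a sketch under stated assumptions: `struct model_udp_sock`, both predicate functions and `model_dummy_lookup()` are hypothetical, and the model keeps only the two fields the conditions read (the kernel's `READ_ONCE()` is left out because the model is single-threaded).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of a UDP socket to the two fields being tested. */
struct model_udp_sock {
	int encap_type;			/* non-zero for any encapsulation */
	int (*encap_err_lookup)(void);	/* set only by tunnels that validate errors */
};

/* Condition on the "-" line: any encapsulated socket takes the tunnel path. */
bool takes_tunnel_path_encap_type(const struct model_udp_sock *sk)
{
	return !sk || sk->encap_type != 0;
}

/* Condition on the "+" line: only sockets with a callback take the tunnel path. */
bool takes_tunnel_path_err_lookup(const struct model_udp_sock *sk)
{
	return !sk || sk->encap_err_lookup != NULL;
}

/* Placeholder callback so a tunnel socket can be modeled. */
int model_dummy_lookup(void)
{
	return 0;
}
```

With an ESP-in-UDP style socket (`encap_type` set, no `encap_err_lookup`) the first predicate is true and the second is false, so only the second condition lets the error reach the socket that was found. For a tunnel socket with a callback, and for the no-socket case, both predicates agree.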
On 12.07.2021 10:07, Paolo Abeni wrote:
> Hello,
>
> On Mon, 2021-07-12 at 03:55 +0300, Vadim Fedorenko wrote:
>> Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
>> added checks for encapsulated sockets but it broke cases when there is
>> no implementation of encap_err_lookup for encapsulation, i.e. ESP in
>> UDP encapsulation. Fix it by calling encap_err_lookup only if socket
>> implements this method otherwise treat it as legal socket.
>>
>> Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
>> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
>> ---
>>  net/ipv4/udp.c | 24 +++++++++++++++++++++++-
>>  net/ipv6/udp.c | 22 ++++++++++++++++++++++
>>  2 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index e5cb7fedfbcd..4980e0f19990 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -707,7 +707,29 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>>  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>>  			       iph->saddr, uh->source, skb->dev->ifindex,
>>  			       inet_sdif(skb), udptable, NULL);
>> -	if (!sk || udp_sk(sk)->encap_enabled) {
>> +	if (sk && udp_sk(sk)->encap_enabled) {
>> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
>> +
>> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
>> +		if (lookup) {
>> +			int network_offset, transport_offset;
>> +
>> +			network_offset = skb_network_offset(skb);
>> +			transport_offset = skb_transport_offset(skb);
>> +
>> +			/* Network header needs to point to the outer IPv4 header inside ICMP */
>> +			skb_reset_network_header(skb);
>> +
>> +			/* Transport header needs to point to the UDP header */
>> +			skb_set_transport_header(skb, iph->ihl << 2);
>> +			if (lookup(sk, skb))
>> +				sk = NULL;
>> +			skb_set_transport_header(skb, transport_offset);
>> +			skb_set_network_header(skb, network_offset);
>> +		}
>> +	}
>> +
>> +	if (!sk) {
>>  		/* No socket for error: try tunnels before discarding */
>>  		sk = ERR_PTR(-ENOENT);
>>  		if (static_branch_unlikely(&udp_encap_needed_key)) {
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 798916d2e722..ed49a8589d9f 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -558,6 +558,28 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>>
>>  	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
>>  			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
>> +	if (sk && udp_sk(sk)->encap_enabled) {
>> +		int (*lookup)(struct sock *sk, struct sk_buff *skb);
>> +
>> +		lookup = READ_ONCE(udp_sk(sk)->encap_err_lookup);
>> +		if (lookup) {
>> +			int network_offset, transport_offset;
>> +
>> +			network_offset = skb_network_offset(skb);
>> +			transport_offset = skb_transport_offset(skb);
>> +
>> +			/* Network header needs to point to the outer IPv6 header inside ICMP */
>> +			skb_reset_network_header(skb);
>> +
>> +			/* Transport header needs to point to the UDP header */
>> +			skb_set_transport_header(skb, offset);
>> +			if (lookup(sk, skb))
>> +				sk = NULL;
>> +			skb_set_transport_header(skb, transport_offset);
>> +			skb_set_network_header(skb, network_offset);
>> +		}
>> +	}
>
> I can't follow this code. I guess that before d26796ae5894,
> __udp6_lib_err() used to invoke ICMP processing on the ESP in UDP
> socket, and after d26796ae5894 'sk' was cleared
> by __udp4_lib_err_encap(), is that correct?

Actually it was cleared just before __udp4_lib_err_encap(), and after it
we totally lose the information about the socket found by
__udp4_lib_lookup(), because __udp4_lib_err_encap() uses a different
combination of ports (source and destination ports are exchanged) and
could find a different socket.

>
> After this patch, the above chunk will not clear 'sk' for packets
> targeting ESP in UDP sockets, but AFAICS we will still enter the
> following conditional, preserving the current behavior - no ICMP
> processing.

We will not enter the following conditional for the ESP in UDP case
because there is no more check for encap_type or encap_enabled. We enter
it only when there is no UDP socket, as it was before d26796ae5894. But
we still have to check whether the socket found by __udp4_lib_lookup()
is correct for the received ICMP packet, which is why I added the code
around encap_err_lookup.

I may be missing something, but d26796ae5894 doesn't actually explain
which particular situation should be avoided by this additional check,
and no tests were added to simply reproduce the problem. If you can
explain it a bit more, it would greatly help me to improve the fix.

Thanks

>
> Can you please clarify?
>
> Why can't you use something alike the following instead?
>
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index c0f9f3260051..96a3b640e4da 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>  			       iph->saddr, uh->source, skb->dev->ifindex,
>  			       inet_sdif(skb), udptable, NULL);
> -	if (!sk || udp_sk(sk)->encap_type) {
> +	if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
>  		/* No socket for error: try tunnels before discarding */
>  		sk = ERR_PTR(-ENOENT);
>  		if (static_branch_unlikely(&udp_encap_needed_key)) {
>
> ---
>
> Thanks!
>
> /P
>
On 12.07.2021 14:37, Paolo Abeni wrote:
> On Mon, 2021-07-12 at 13:45 +0100, Vadim Fedorenko wrote:
>>
>>> After this patch, the above chunk will not clear 'sk' for packets
>>> targeting ESP in UDP sockets, but AFAICS we will still enter the
>>> following conditional, preserving the current behavior - no ICMP
>>> processing.
>>
>> We will not enter following conditional for ESP in UDP case because
>> there is no more check for encap_type or encap_enabled.
>
> I see. You have a bug in the ipv6 code-path. With your patch applied:
>
> ---
> 	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
> 			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
> 	if (sk && udp_sk(sk)->encap_enabled) {
> 		//...
> 	}
>
> 	if (!sk || udp_sk(sk)->encap_enabled) {
> 		// can still enter here...
> ---
>

Oh, my bad, thanks for catching this!

>> I may be missing something but d26796ae5894 doesn't actually explain
>> which particular situation should be avoided by this additional check
>> and no tests were added to simply reproduce the problem. If you can
>> explain it a bit more it would greatly help me to improve the fix.
>
> Xin knows better, but AFAICS it used to cover the situation you
> explicitly tests in patch 3/3 - incoming packet with src-port == dst-
> port == tunnel port - for e.g. vxlan tunnels.
>

Ok, so my assumption was like yours, that's good.

>>> Why can't you use something alike the following instead?
>>>
>>> ---
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index c0f9f3260051..96a3b640e4da 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
>>>  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
>>>  			       iph->saddr, uh->source, skb->dev->ifindex,
>>>  			       inet_sdif(skb), udptable, NULL);
>>> -	if (!sk || udp_sk(sk)->encap_type) {
>>> +	if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
>>>  		/* No socket for error: try tunnels before discarding */
>>>  		sk = ERR_PTR(-ENOENT);
>>>  		if (static_branch_unlikely(&udp_encap_needed_key)) {
>>>
>>> ---
>
> Could you please have a look at the above ?
>

Sure. The main problem I see here is that udp4_lib_lookup in
udp_lib_err_encap could return a different socket because of the
different source and destination port, and in this case we will never
check the correctness of the originally found socket, i.e.
encap_err_lookup will never be called and the ICMP notification will
never be applied to that socket even if it passes the checks.

My point is that it's simpler to explicitly check the socket that was
found than to rely on the result of udp4_lib_lookup with different
inputs, and to leave the case of no socket as it was before
d26796ae5894.

If it's ok, I will unify the code for the check as Willem suggested and
resend v2.
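The concern about the second lookup returning a different socket can be sketched with a toy socket table. This is a simplified model, not the kernel lookup: `struct model_sock`, `struct model_udp_ports` and the three functions are hypothetical, sockets are assumed to be unconnected and bound to a wildcard address, and matching is done on the local port only. It assumes, as described in the thread, that the first lookup treats the embedded source port as the local port and the tunnel fallback exchanges the two ports.

```c
#include <stddef.h>

/* Hypothetical unconnected socket, identified by its local port only. */
struct model_sock {
	unsigned short local_port;
	const char *name;
};

/* Ports taken from the UDP header embedded in the ICMP error. */
struct model_udp_ports {
	unsigned short source;
	unsigned short dest;
};

/* Return the socket bound to local_port, or NULL when there is none. */
const struct model_sock *model_lookup(const struct model_sock *table,
				      size_t n, unsigned short local_port)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].local_port == local_port)
			return &table[i];
	return NULL;
}

/* First lookup: the embedded source port is our local port. */
const struct model_sock *model_first_lookup(const struct model_sock *table,
					    size_t n,
					    const struct model_udp_ports *uh)
{
	return model_lookup(table, n, uh->source);
}

/* Fallback lookup with the ports exchanged, as in the tunnel path. */
const struct model_sock *model_swapped_lookup(const struct model_sock *table,
					      size_t n,
					      const struct model_udp_ports *uh)
{
	return model_lookup(table, n, uh->dest);
}
```

With one socket on port 4500 and another on port 4789, an error for a packet sent from 4500 to 4789 resolves to the first socket in `model_first_lookup()` and to the second in `model_swapped_lookup()`. When source and destination ports are equal, both lookups return the same socket.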
On Mon, Jul 12, 2021 at 9:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-07-12 at 13:45 +0100, Vadim Fedorenko wrote:
> >
> > > After this patch, the above chunk will not clear 'sk' for packets
> > > targeting ESP in UDP sockets, but AFAICS we will still enter the
> > > following conditional, preserving the current behavior - no ICMP
> > > processing.
> >
> > We will not enter following conditional for ESP in UDP case because
> > there is no more check for encap_type or encap_enabled.
>
> I see. You have a bug in the ipv6 code-path. With your patch applied:
>
> ---
> 	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
> 			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
> 	if (sk && udp_sk(sk)->encap_enabled) {
> 		//...
> 	}
>
> 	if (!sk || udp_sk(sk)->encap_enabled) {
> 		// can still enter here...
> ---
>
> > I may be missing something but d26796ae5894 doesn't actually explain
> > which particular situation should be avoided by this additional check
> > and no tests were added to simply reproduce the problem. If you can
> > explain it a bit more it would greatly help me to improve the fix.
>
> Xin knows better, but AFAICS it used to cover the situation you
> explicitly tests in patch 3/3 - incoming packet with src-port == dst-
> port == tunnel port - for e.g. vxlan tunnels.

Thanks Paolo, and sorry for the late reply.

Right, __udp4/6_lib_err_encap() was introduced to process the ICMP
error packets for UDP tunnels. But it will only work when there's no
socket found with src + dst port, as when the src == dst port a socket
might be found (if the bind addr is ANY) and the code will be called.

>
> > > Why can't you use something alike the following instead?
> > >
> > > ---
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index c0f9f3260051..96a3b640e4da 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -707,7 +707,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
> > >  	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
> > >  			       iph->saddr, uh->source, skb->dev->ifindex,
> > >  			       inet_sdif(skb), udptable, NULL);
> > > -	if (!sk || udp_sk(sk)->encap_type) {
> > > +	if (!sk || READ_ONCE(udp_sk(sk)->encap_err_lookup)) {
> > >  		/* No socket for error: try tunnels before discarding */
> > >  		sk = ERR_PTR(-ENOENT);
> > >  		if (static_branch_unlikely(&udp_encap_needed_key)) {
> > >
> > > ---
>
> Could you please have a look at the above ?

If not all UDP tunnels want to do further validation for ICMP error
packets, this looks good to me.

>
> Thanks!
>
> /P
>