Message ID | 20201005134813.2051883-1-eric.dumazet@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net] tcp: fix receive window update in tcp_add_backlog() | expand |
On Mon, Oct 5, 2020 at 9:48 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > > We got reports from GKE customers flows being reset by netfilter > conntrack unless nf_conntrack_tcp_be_liberal is set to 1. > > Traces seemed to suggest ACK packet being dropped by the > packet capture, or more likely that ACK were received in the > wrong order. > > wscale=7, SYN and SYNACK not shown here. > > This ACK allows the sender to send 1871*128 bytes from seq 51359321 : > New right edge of the window -> 51359321+1871*128=51598809 > > 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0 > > 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 > 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 > 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 > 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 > 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 > > Receiver now allows to send 606*128=77568 from seq 51521241 : > New right edge of the window -> 51521241+606*128=51598809 > > 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0 > > 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 > > It seems the sender exceeds RWIN allowance, since 51611353 > 51598809 > > 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 > 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040 > > 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0 > > netfilter conntrack is not happy and sends RST > > 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 > 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0 > > Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. > New right edge of the window -> 51521241+859*128=51631193 > > Normally TCP stack handles OOO packets just fine, but it > turns out tcp_add_backlog() does not. It can update the window > field of the aggregated packet even if the ACK sequence > of the last received packet is too old. > > Many thanks to Alexandre Ferrieux for independently reporting the issue > and suggesting a fix. > > Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Thanks for the fix! > --- > net/ipv4/tcp_ipv4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 5084333b5ab647ca8ed296235a1ed6573693b250..592c7396272315c864372c158a7bc8850c6ddc61 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1788,12 +1788,12 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) > > __skb_pull(skb, hdrlen); > if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) { > - thtail->window = th->window; > - > TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq; > > - if (after(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq)) > + if (likely(!before(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq))) { > TCP_SKB_CB(tail)->ack_seq = TCP_SKB_CB(skb)->ack_seq; > + thtail->window = th->window; > + } > > /* We have to update both TCP_SKB_CB(tail)->tcp_flags and > * thtail->fin, so that the fast path in tcp_rcv_established() > -- > 2.28.0.806.g8561365e88-goog >
On Mon, Oct 5, 2020 at 9:48 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > > We got reports from GKE customers flows being reset by netfilter > conntrack unless nf_conntrack_tcp_be_liberal is set to 1. > > Traces seemed to suggest ACK packet being dropped by the > packet capture, or more likely that ACK were received in the > wrong order. > > wscale=7, SYN and SYNACK not shown here. > > This ACK allows the sender to send 1871*128 bytes from seq 51359321 : > New right edge of the window -> 51359321+1871*128=51598809 > > 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0 > > 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 > 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 > 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 > 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 > 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 > > Receiver now allows to send 606*128=77568 from seq 51521241 : > New right edge of the window -> 51521241+606*128=51598809 > > 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0 > > 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384 > > It seems the sender exceeds RWIN allowance, since 51611353 > 51598809 > > 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 > 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040 > > 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0 > > netfilter conntrack is not happy and sends RST > > 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 > 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0 > > Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. > New right edge of the window -> 51521241+859*128=51631193 > > Normally TCP stack handles OOO packets just fine, but it > turns out tcp_add_backlog() does not. It can update the window > field of the aggregated packet even if the ACK sequence > of the last received packet is too old. > > Many thanks to Alexandre Ferrieux for independently reporting the issue > and suggesting a fix. > > Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> > --- > net/ipv4/tcp_ipv4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Nice detective work, as always. Thanks for the fix! Acked-by: Neal Cardwell <ncardwell@google.com> neal
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 5 Oct 2020 06:48:13 -0700 > From: Eric Dumazet <edumazet@google.com> > > We got reports from GKE customers flows being reset by netfilter > conntrack unless nf_conntrack_tcp_be_liberal is set to 1. > > Traces seemed to suggest ACK packet being dropped by the > packet capture, or more likely that ACK were received in the > wrong order. > > wscale=7, SYN and SYNACK not shown here. > > This ACK allows the sender to send 1871*128 bytes from seq 51359321 : > New right edge of the window -> 51359321+1871*128=51598809 ... > Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. > New right edge of the window -> 51521241+859*128=51631193 > > Normally TCP stack handles OOO packets just fine, but it > turns out tcp_add_backlog() does not. It can update the window > field of the aggregated packet even if the ACK sequence > of the last received packet is too old. > > Many thanks to Alexandre Ferrieux for independently reporting the issue > and suggesting a fix. > > Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Applied and queued up for -stable, thank you.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 5084333b5ab647ca8ed296235a1ed6573693b250..592c7396272315c864372c158a7bc8850c6ddc61 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1788,12 +1788,12 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) __skb_pull(skb, hdrlen); if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) { - thtail->window = th->window; - TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq; - if (after(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq)) + if (likely(!before(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq))) { TCP_SKB_CB(tail)->ack_seq = TCP_SKB_CB(skb)->ack_seq; + thtail->window = th->window; + } /* We have to update both TCP_SKB_CB(tail)->tcp_flags and * thtail->fin, so that the fast path in tcp_rcv_established()