Message ID | 20210302175259.971778-1-kuba@kernel.org |
---|---|
State | New |
Headers | show |
Series | [net-next,v2] tcp: make TCP Fast Open retransmission ignore Tx status | expand |
On Tue, 2 Mar 2021 09:52:59 -0800 Jakub Kicinski wrote: > When receiver does not accept TCP Fast Open it will only ack > the SYN, and not the data. We detect this and immediately queue > the data for (re)transmission in tcp_rcv_fastopen_synack(). > > In DC networks with very low RTT and without RFS the SYN-ACK > may arrive before NIC driver reported Tx completion on > the original SYN. In which case skb_still_in_host_queue() > returns true and sender will need to wait for the retransmission > timer to fire milliseconds later. > > Work around this issue by passing negative segment count to > __tcp_retransmit_skb() as suggested by Eric. > > The condition triggers more often when Tx coalescing is configured > higher than Rx coalescing on the underlying NIC, but it does happen > even with relatively moderate and even settings (e.g. 33us). > > Note that DC machines usually run configured to always accept > TCP FastOpen data so the problem may not be very common. > > Suggested-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Neil Spring <ntspring@fb.com> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> .. and now I realized net-next is closed. I'll keep an eye on patchwork and resend as needed, sorry.
On Tue, Mar 2, 2021 at 7:00 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 2 Mar 2021 09:52:59 -0800 Jakub Kicinski wrote: > > When receiver does not accept TCP Fast Open it will only ack > > the SYN, and not the data. We detect this and immediately queue > > the data for (re)transmission in tcp_rcv_fastopen_synack(). > > > > In DC networks with very low RTT and without RFS the SYN-ACK > > may arrive before NIC driver reported Tx completion on > > the original SYN. In which case skb_still_in_host_queue() > > returns true and sender will need to wait for the retransmission > > timer to fire milliseconds later. > > > > Work around this issue by passing negative segment count to > > __tcp_retransmit_skb() as suggested by Eric. > > > > The condition triggers more often when Tx coalescing is configured > > higher than Rx coalescing on the underlying NIC, but it does happen > > even with relatively moderate and even settings (e.g. 33us). > > > > Note that DC machines usually run configured to always accept > > TCP FastOpen data so the problem may not be very common. > > > > Suggested-by: Eric Dumazet <edumazet@google.com> > > Signed-off-by: Neil Spring <ntspring@fb.com> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > > .. and now I realized net-next is closed. I'll keep an eye on patchwork > and resend as needed, sorry. Ah, just open it ;)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 69a545db80d2..fb453a4799be 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5994,8 +5994,9 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, tp->fastopen_client_fail = TFO_SYN_RETRANSMITTED; else tp->fastopen_client_fail = TFO_DATA_NOT_ACKED; + /* segs = -1 to bypass skb_still_in_host_queue() check */ skb_rbtree_walk_from(data) { - if (__tcp_retransmit_skb(sk, data, 1)) + if (__tcp_retransmit_skb(sk, data, -1)) break; } tcp_rearm_rto(sk); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index fbf140a770d8..1d1489e59697 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3155,8 +3155,12 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) sk->sk_sndbuf)) return -EAGAIN; - if (skb_still_in_host_queue(sk, skb)) - return -EBUSY; + if (segs > 0) { + if (skb_still_in_host_queue(sk, skb)) + return -EBUSY; + } else { + segs = -segs; + } if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) { if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) {