Message ID | 20210115223058.GA39267@localhost.localdomain |
---|---|
State | New |
Series | [net,v2] tcp: fix TCP_USER_TIMEOUT with zero window |
On Fri, 15 Jan 2021 14:30:58 -0800 Enke Chen wrote:
> From: Enke Chen <enchen@paloaltonetworks.com>
>
> The TCP session does not terminate with TCP_USER_TIMEOUT when data
> remain untransmitted due to zero window.
>
> The number of unanswered zero-window probes (tcp_probes_out) is
> reset to zero with incoming acks irrespective of the window size,
> as described in tcp_probe_timer():
>
>     RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
>     as long as the receiver continues to respond probes. We support
>     this by default and reset icsk_probes_out with incoming ACKs.
>
> This counter, however, is the wrong one to be used in calculating the
> duration that the window remains closed and data remain untransmitted.
> Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
> actual issue.
>
> In this patch a new timestamp is introduced for the socket in order to
> track the elapsed time for the zero-window probes that have not been
> answered with any non-zero window ack.
>
> Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT")
> Reported-by: William McCall <william.mccall@gmail.com>
> Co-developed-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
> Reviewed-by: Yuchung Cheng <ycheng@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

I take it you got all these tags off-list? I don't see them on the v1
discussion.

Applied to net, thanks!
On Mon, Jan 18, 2021 at 08:02:21PM -0800, Jakub Kicinski wrote:
> On Fri, 15 Jan 2021 14:30:58 -0800 Enke Chen wrote:
> > From: Enke Chen <enchen@paloaltonetworks.com>
> >
> > The TCP session does not terminate with TCP_USER_TIMEOUT when data
> > remain untransmitted due to zero window.
> >
> > The number of unanswered zero-window probes (tcp_probes_out) is
> > reset to zero with incoming acks irrespective of the window size,
> > as described in tcp_probe_timer():
> >
> >     RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
> >     as long as the receiver continues to respond probes. We support
> >     this by default and reset icsk_probes_out with incoming ACKs.
> >
> > This counter, however, is the wrong one to be used in calculating the
> > duration that the window remains closed and data remain untransmitted.
> > Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
> > actual issue.
> >
> > In this patch a new timestamp is introduced for the socket in order to
> > track the elapsed time for the zero-window probes that have not been
> > answered with any non-zero window ack.
> >
> > Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT")
> > Reported-by: William McCall <william.mccall@gmail.com>
> > Co-developed-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
> > Reviewed-by: Yuchung Cheng <ycheng@google.com>
> > Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> I take it you got all these tags off-list? I don't see them on the v1
> discussion.

Yes, the tags have been approved off-list by those named.

>
> Applied to net, thanks!

Thanks.

-- Enke
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7338b3865a2a..111d7771b208 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -76,6 +76,8 @@ struct inet_connection_sock_af_ops {
  * @icsk_ext_hdr_len:	   Network protocol overhead (IP/IPv6 options)
  * @icsk_ack:		   Delayed ACK control data
  * @icsk_mtup;		   MTU probing control data
+ * @icsk_probes_tstamp:    Probe timestamp (cleared by non-zero window ack)
+ * @icsk_user_timeout:	   TCP_USER_TIMEOUT value
  */
 struct inet_connection_sock {
 	/* inet_sock has to be the first member! */
@@ -129,6 +131,7 @@ struct inet_connection_sock {
 		u32		  probe_timestamp;
 	} icsk_mtup;
+	u32			  icsk_probes_tstamp;
 	u32			  icsk_user_timeout;
 	u64			  icsk_ca_priv[104 / sizeof(u64)];
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fd8b8800a2c3..6bd7ca09af03 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -851,6 +851,7 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
 	newicsk->icsk_retransmits = 0;
 	newicsk->icsk_backoff = 0;
 	newicsk->icsk_probes_out = 0;
+	newicsk->icsk_probes_tstamp = 0;
 	/* Deinitialize accept_queue to trap illegal accesses. */
 	memset(&newicsk->icsk_accept_queue, 0, sizeof(newicsk->icsk_accept_queue));
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ed42d2193c5c..32545ecf2ab1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2937,6 +2937,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_backoff = 0;
 	icsk->icsk_probes_out = 0;
+	icsk->icsk_probes_tstamp = 0;
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
 	icsk->icsk_rto_min = TCP_RTO_MIN;
 	icsk->icsk_delack_max = TCP_DELACK_MAX;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c7e16b0ed791..bafcab75f425 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3384,6 +3384,7 @@ static void tcp_ack_probe(struct sock *sk)
 		return;
 	if (!after(TCP_SKB_CB(head)->end_seq, tcp_wnd_end(tp))) {
 		icsk->icsk_backoff = 0;
+		icsk->icsk_probes_tstamp = 0;
 		inet_csk_clear_xmit_timer(sk, ICSK_TIME_PROBE0);
 		/* Socket must be waked up by subsequent tcp_data_snd_check().
 		 * This function is not for random using!
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f322e798a351..ab458697881e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4084,6 +4084,7 @@ void tcp_send_probe0(struct sock *sk)
 		/* Cancel probe timer, if it is not required. */
 		icsk->icsk_probes_out = 0;
 		icsk->icsk_backoff = 0;
+		icsk->icsk_probes_tstamp = 0;
 		return;
 	}
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 6c62b9ea1320..454732ecc8f3 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -349,6 +349,7 @@ static void tcp_probe_timer(struct sock *sk)
 	if (tp->packets_out || !skb) {
 		icsk->icsk_probes_out = 0;
+		icsk->icsk_probes_tstamp = 0;
 		return;
 	}
@@ -360,13 +361,12 @@ static void tcp_probe_timer(struct sock *sk)
 	 * corresponding system limit. We also implement similar policy when
 	 * we use RTO to probe window in tcp_retransmit_timer().
 	 */
-	if (icsk->icsk_user_timeout) {
-		u32 elapsed = tcp_model_timeout(sk, icsk->icsk_probes_out,
-						tcp_probe0_base(sk));
-
-		if (elapsed >= icsk->icsk_user_timeout)
-			goto abort;
-	}
+	if (!icsk->icsk_probes_tstamp)
+		icsk->icsk_probes_tstamp = tcp_jiffies32;
+	else if (icsk->icsk_user_timeout &&
+		 (s32)(tcp_jiffies32 - icsk->icsk_probes_tstamp) >=
+		 msecs_to_jiffies(icsk->icsk_user_timeout))
+		goto abort;
 	max_probes = sock_net(sk)->ipv4.sysctl_tcp_retries2;
 	if (sock_flag(sk, SOCK_DEAD)) {
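
For readers who want to experiment with the timestamp logic outside the kernel, the following is a minimal user-space sketch of the check added to tcp_probe_timer(). It is not kernel code: `struct probe_state`, `probe_timer_fires()` and `nonzero_window_ack()` are hypothetical names made up for this illustration, the clock is a plain 32-bit tick counter standing in for tcp_jiffies32, and the timeout is assumed to be already expressed in ticks (the patch converts milliseconds with msecs_to_jiffies()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two icsk fields used by the patch. */
struct probe_state {
	uint32_t probes_tstamp; /* 0 means no unanswered probe episode is being timed */
	uint32_t user_timeout;  /* assumed to be in ticks here; 0 disables the check */
};

/*
 * Models one expiry of the probe timer at tick `now`.
 * Returns true when the connection would be aborted.
 */
bool probe_timer_fires(struct probe_state *st, uint32_t now)
{
	if (!st->probes_tstamp) {
		/* First probe of the episode: start timing, never abort yet. */
		st->probes_tstamp = now;
		return false;
	}
	/*
	 * Unsigned subtraction followed by a signed cast keeps the elapsed
	 * time correct when the 32-bit tick counter wraps around.
	 */
	return st->user_timeout &&
	       (int32_t)(now - st->probes_tstamp) >= (int32_t)st->user_timeout;
}

/* Models an ACK that opens the window: the episode ends. */
void nonzero_window_ack(struct probe_state *st)
{
	st->probes_tstamp = 0;
}
```

With a timeout of 100 ticks and a first probe at tick 0xFFFFFFF0, this sketch keeps the connection alive 50 ticks later, after the counter has wrapped, and reports an abort at 100 ticks. Calling nonzero_window_ack() in between restarts the measurement, which is the behaviour the commit message describes.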