diff mbox series

tcp: keepalive fixes

Message ID 20210112192544.GA12209@localhost.localdomain
State New
Headers show
Series tcp: keepalive fixes | expand

Commit Message

Enke Chen Jan. 12, 2021, 7:25 p.m. UTC
From: Enke Chen <enchen@paloaltonetworks.com>

In this patch two issues with TCP keepalives are fixed:

1) TCP keepalive does not timeout when there are data waiting to be
   delivered and then the connection got broken. The TCP keepalive
   timeout is not evaluated in that condition.

   The fix is to remove the code that prevents TCP keepalive from
   being evaluated for timeout.

2) With the fix for #1, TCP keepalive can erroneously timeout after
   the 0-window probe kicks in. The 0-window probe counter is wrongly
   applied to TCP keepalives.

   The fix is to use the elapsed time instead of the 0-window probe
   counter in evaluating TCP keepalive timeout.

Cc: stable@vger.kernel.org
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
---
 net/ipv4/tcp_timer.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

Comments

Enke Chen Jan. 13, 2021, 12:42 a.m. UTC | #1
Hi, Yuchung:

I have attached the python script that reproduces the keepalive issues.
The script is a slight modification of the one written by Marek Majkowski:

https://github.com/cloudflare/cloudflare-blog/blob/master/2019-09-tcp-keepalives/test-zero.py

Please note that only the TCP keepalive is configured, and not the user timeout.

Thanks.  -- Enke

On Tue, Jan 12, 2021 at 02:48:01PM -0800, Yuchung Cheng wrote:
> On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote:

> >

> > From: Enke Chen <enchen@paloaltonetworks.com>

> >

> > In this patch two issues with TCP keepalives are fixed:

> >

> > 1) TCP keepalive does not timeout when there are data waiting to be

> >    delivered and then the connection got broken. The TCP keepalive

> >    timeout is not evaluated in that condition.

> hi enke

> Do you have an example to demonstrate this issue -- in theory when

> there is data inflight, an RTO timer should be pending (which

> considers user-timeout setting). based on the user-timeout description

> (man tcp), the user timeout should abort the socket per the specified

> time after data commences. some data would help to understand the

> issue.

> 


------
#! /usr/bin/python

import io
import os
import select
import socket
import time
import utils
import ctypes

utils.new_ns()

port = 1

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
s.bind(('127.0.0.1', port))
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024)

s.listen(16)

tcpdump = utils.tcpdump_start(port)
c = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
c.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024)
c.connect(('127.0.0.1', port))

x, _ = s.accept()

if False:
    c.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 90*1000)

if True:
    c.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)
    c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)

time.sleep(0.2)
print("[ ] c.send()")
import fcntl
TIOCOUTQ=0x5411
c.setblocking(False)
while True:
    bytes_avail = ctypes.c_int()
    fcntl.ioctl(c.fileno(), TIOCOUTQ, bytes_avail)
    if bytes_avail.value > 64*1024:
        break
    try:
        c.send(b"A" * 16384 * 4)
    except io.BlockingIOError:
        break
c.setblocking(True)
time.sleep(0.2)
utils.ss(port)
utils.check_buffer(c)

t0 = time.time()

if True:
    utils.drop_start(dport=port)
    utils.drop_start(sport=port)

poll = select.poll()
poll.register(c, select.POLLIN)
poll.poll()

utils.ss(port)


e = c.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
print("[ ] SO_ERROR = %s" % (e,))

t1 = time.time()
print("[ ] took: %f seconds" % (t1-t0,))
Enke Chen Jan. 13, 2021, 8:06 p.m. UTC | #2
Hi, Eric:

Just to clarify: the issues for tcp keepalive and TCP_USER_TIMEOUT are
separate isues, and the fixes would not conflict afaik.

Thanks.  -- Enke

On Tue, Jan 12, 2021 at 11:52:43PM +0100, Eric Dumazet wrote:
> On Tue, Jan 12, 2021 at 11:48 PM Yuchung Cheng <ycheng@google.com> wrote:

> >

> > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote:

> > >

> > > From: Enke Chen <enchen@paloaltonetworks.com>

> > >

> > > In this patch two issues with TCP keepalives are fixed:

> > >

> > > 1) TCP keepalive does not timeout when there are data waiting to be

> > >    delivered and then the connection got broken. The TCP keepalive

> > >    timeout is not evaluated in that condition.

> > hi enke

> > Do you have an example to demonstrate this issue -- in theory when

> > there is data inflight, an RTO timer should be pending (which

> > considers user-timeout setting). based on the user-timeout description

> > (man tcp), the user timeout should abort the socket per the specified

> > time after data commences. some data would help to understand the

> > issue.

> >

> 

> +1

> 

> A packetdrill test would be ideal.

> 

> Also, given that there is this ongoing issue with TCP_USER_TIMEOUT,

> lets not mix things

> or risk added work for backports to stable versions.
Enke Chen Jan. 13, 2021, 8:28 p.m. UTC | #3
On Wed, Jan 13, 2021 at 12:06:27PM -0800, Enke Chen wrote:
> Hi, Eric:

> 

> Just to clarify: the issues for tcp keepalive and TCP_USER_TIMEOUT are

> separate isues, and the fixes would not conflict afaik.

> 

> Thanks.  -- Enke


I have posted patches for both issues, and there is no conflict between
the patches.

Thanks.  -- Enke

> 

> On Tue, Jan 12, 2021 at 11:52:43PM +0100, Eric Dumazet wrote:

> > On Tue, Jan 12, 2021 at 11:48 PM Yuchung Cheng <ycheng@google.com> wrote:

> > >

> > > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote:

> > > >

> > > > From: Enke Chen <enchen@paloaltonetworks.com>

> > > >

> > > > In this patch two issues with TCP keepalives are fixed:

> > > >

> > > > 1) TCP keepalive does not timeout when there are data waiting to be

> > > >    delivered and then the connection got broken. The TCP keepalive

> > > >    timeout is not evaluated in that condition.

> > > hi enke

> > > Do you have an example to demonstrate this issue -- in theory when

> > > there is data inflight, an RTO timer should be pending (which

> > > considers user-timeout setting). based on the user-timeout description

> > > (man tcp), the user timeout should abort the socket per the specified

> > > time after data commences. some data would help to understand the

> > > issue.

> > >

> > 

> > +1

> > 

> > A packetdrill test would be ideal.

> > 

> > Also, given that there is this ongoing issue with TCP_USER_TIMEOUT,

> > lets not mix things

> > or risk added work for backports to stable versions.
Enke Chen Jan. 22, 2021, 7:45 p.m. UTC | #4
Hi, Folks:

Please ignore this patch. I will split it into separate ones as suggested
off-list by Neal Cardwell <ncardwell@google.com>.

Thanks.  -- Enke

On Tue, Jan 12, 2021 at 11:25:44AM -0800, Enke Chen wrote:
> From: Enke Chen <enchen@paloaltonetworks.com>

> 

> In this patch two issues with TCP keepalives are fixed:

> 

> 1) TCP keepalive does not timeout when there are data waiting to be

>    delivered and then the connection got broken. The TCP keepalive

>    timeout is not evaluated in that condition.

> 

>    The fix is to remove the code that prevents TCP keepalive from

>    being evaluated for timeout.

> 

> 2) With the fix for #1, TCP keepalive can erroneously timeout after

>    the 0-window probe kicks in. The 0-window probe counter is wrongly

>    applied to TCP keepalives.

> 

>    The fix is to use the elapsed time instead of the 0-window probe

>    counter in evaluating TCP keepalive timeout.

> 

> Cc: stable@vger.kernel.org

> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>

> ---

>  net/ipv4/tcp_timer.c | 15 +++------------

>  1 file changed, 3 insertions(+), 12 deletions(-)

> 

> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c

> index 6c62b9ea1320..40953aa40d53 100644

> --- a/net/ipv4/tcp_timer.c

> +++ b/net/ipv4/tcp_timer.c

> @@ -696,12 +696,6 @@ static void tcp_keepalive_timer (struct timer_list *t)

>  	    ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))

>  		goto out;

>  

> -	elapsed = keepalive_time_when(tp);

> -

> -	/* It is alive without keepalive 8) */

> -	if (tp->packets_out || !tcp_write_queue_empty(sk))

> -		goto resched;

> -

>  	elapsed = keepalive_time_elapsed(tp);

>  

>  	if (elapsed >= keepalive_time_when(tp)) {

> @@ -709,16 +703,15 @@ static void tcp_keepalive_timer (struct timer_list *t)

>  		 * to determine when to timeout instead.

>  		 */

>  		if ((icsk->icsk_user_timeout != 0 &&

> -		    elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&

> -		    icsk->icsk_probes_out > 0) ||

> +		     elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout)) ||

>  		    (icsk->icsk_user_timeout == 0 &&

> -		    icsk->icsk_probes_out >= keepalive_probes(tp))) {

> +		     (elapsed >= keepalive_time_when(tp) +

> +		      keepalive_intvl_when(tp) * keepalive_probes(tp)))) {

>  			tcp_send_active_reset(sk, GFP_ATOMIC);

>  			tcp_write_err(sk);

>  			goto out;

>  		}

>  		if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {

> -			icsk->icsk_probes_out++;

>  			elapsed = keepalive_intvl_when(tp);

>  		} else {

>  			/* If keepalive was lost due to local congestion,

> @@ -732,8 +725,6 @@ static void tcp_keepalive_timer (struct timer_list *t)

>  	}

>  

>  	sk_mem_reclaim(sk);

> -

> -resched:

>  	inet_csk_reset_keepalive_timer (sk, elapsed);

>  	goto out;

>  

> -- 

> 2.29.2

>
diff mbox series

Patch

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 6c62b9ea1320..40953aa40d53 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -696,12 +696,6 @@  static void tcp_keepalive_timer (struct timer_list *t)
 	    ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || !tcp_write_queue_empty(sk))
-		goto resched;
-
 	elapsed = keepalive_time_elapsed(tp);
 
 	if (elapsed >= keepalive_time_when(tp)) {
@@ -709,16 +703,15 @@  static void tcp_keepalive_timer (struct timer_list *t)
 		 * to determine when to timeout instead.
 		 */
 		if ((icsk->icsk_user_timeout != 0 &&
-		    elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
-		    icsk->icsk_probes_out > 0) ||
+		     elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout)) ||
 		    (icsk->icsk_user_timeout == 0 &&
-		    icsk->icsk_probes_out >= keepalive_probes(tp))) {
+		     (elapsed >= keepalive_time_when(tp) +
+		      keepalive_intvl_when(tp) * keepalive_probes(tp)))) {
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			tcp_write_err(sk);
 			goto out;
 		}
 		if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
-			icsk->icsk_probes_out++;
 			elapsed = keepalive_intvl_when(tp);
 		} else {
 			/* If keepalive was lost due to local congestion,
@@ -732,8 +725,6 @@  static void tcp_keepalive_timer (struct timer_list *t)
 	}
 
 	sk_mem_reclaim(sk);
-
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
 	goto out;