
[net-next,4/4] tcp: remove limit on initial receive window

Message ID 20210111222411.232916-5-hcaldwel@akamai.com
State New
Series Fix receive window restriction

Commit Message

Heath Caldwell Jan. 11, 2021, 10:24 p.m. UTC
Remove the 64KB limit imposed on the initial receive window.

The limit was added by commit a337531b942b ("tcp: up initial rmem to 128KB
and SYN rwin to around 64KB").

This change removes that limit so that the initial receive window can be
arbitrarily large (within existing limits and depending on the current
configuration).
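
As a rough illustration of the behavioral change, here is a minimal
userspace sketch. It is not kernel code: the function names and the
SKETCH_U16_MAX macro are hypothetical stand-ins, only the branch taken when
tcp_workaround_signed_windows is off is modelled, and the later
init_rcv_wnd clamp is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's U16_MAX. */
#define SKETCH_U16_MAX 65535u

/* Before the patch: the initial window is capped at 65535 bytes. */
static uint32_t sketch_rcv_wnd_before(uint32_t space)
{
    return space < SKETCH_U16_MAX ? space : SKETCH_U16_MAX;
}

/* After the patch: the initial window is the available space, uncapped. */
static uint32_t sketch_rcv_wnd_after(uint32_t space)
{
    return space;
}
```

With 1048576 bytes of available space, the first function yields 65535 and
the second yields 1048576. When the space is already below 64KB, both
return the same value.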

The arbitrary, internal limit can interfere with research because it
irremediably restricts the receive window at the beginning of a connection
below what would be expected when explicitly configuring the receive buffer
size.

-

Here is a scenario to illustrate how the limit might cause undesirable
behavior:

Consider an installation where all parts of a network are either controlled
or sufficiently monitored and there is a desired use case where a 1MB
object is transmitted over a newly created TCP connection in a single
initial burst.

Let MSS be 1460 bytes.

The initial cwnd would need to be at least:

                |-  1048576 bytes  -|
    cwnd_init = |  ---------------  | = 719 packets
                |   1460 bytes/pkt  |
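
The ceiling division above can be checked with a small sketch. The helper
name min_initial_cwnd is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest number of MSS-sized packets needed to carry object_bytes,
 * computed as an integer ceiling division.
 */
static uint32_t min_initial_cwnd(uint32_t object_bytes, uint32_t mss)
{
    assert(mss > 0);
    /* Widen to 64 bits so the addition cannot overflow. */
    return (uint32_t)(((uint64_t)object_bytes + mss - 1) / mss);
}
```

For the values in this scenario, min_initial_cwnd(1048576, 1460) gives 719.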

Let us say that it was determined that the network could handle bursts of
800 full-sized packets at the frequency at which the connections under
consideration would be expected to occur, so the sending host is configured
to use an initial cwnd of 800 for these connections.

In order for the receiver to be able to receive a 1MB burst, it needs to
have a sufficiently large receive buffer for the connection.  Considering
overhead, let us say that the receiver is configured to initially use a
receive buffer of 2148K for TCP connections:

    net.ipv4.tcp_rmem = 4096 2199552 6291456

Let rtt be 50 milliseconds.

If the entire object is sent in a single burst, then the theoretically
highest achievable throughput (discounting handshake and request) should
be:

                   bits   1048576 bytes   8 bits
    T_upperbound = ---- = ------------- * ------ =~ 168 Mbit/s
                   rtt       0.05 s       1 byte
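
The same arithmetic as a small sketch, assuming the whole object is
acknowledged after a whole number of rtts. The helper name
burst_throughput_mbit is hypothetical.

```c
#include <assert.h>

/*
 * Throughput in Mbit/s when object_bytes are delivered and acknowledged
 * within the given number of round trips.
 */
static double burst_throughput_mbit(double object_bytes, double rtt_seconds,
                                    unsigned rtts)
{
    assert(rtt_seconds > 0.0 && rtts > 0);
    return object_bytes * 8.0 / (rtt_seconds * rtts) / 1e6;
}
```

burst_throughput_mbit(1048576, 0.05, 1) evaluates to roughly 167.8, which
rounds to the 168 Mbit/s figure above.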

But, if flow control limits throughput because the receive window is
initially limited to 64KB and grows at a rate of quadrupling every
rtt (this rate may not be exact, but it appears optimistic compared to
observation), we should expect the highest achievable throughput to be
limited to:

    bytes_sent = 65536 * (1 + 4)^(t / rtt)

    When bytes_sent = object size = 1048576:

    1048576 = 65536 * (1 + 4)^(t / rtt)
          t = rtt * log_5(16)

                            1048576 bytes              8 bits
    T_limited = ------------------------------------ * ------
                       /    |- rtt * log_5(16) -| \    1 byte
                rtt * ( 1 + |  ---------------- |  )
                       \    |        rtt        | /

                 1048576 bytes     8 bits
              = ---------------- * ------
                0.05 s * (1 + 2)   1 byte

              =~ 55.9 Mbit/s
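
The limited case can be reproduced with a sketch that follows the same
growth model (bytes_sent multiplied by 5 per rtt, as in the formula above).
The helper names are hypothetical, and the loop is an integer equivalent of
taking the ceiling of log_5(object size / initial window).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of whole rtts of window growth needed before the cumulative bytes
 * sent, modelled as initial_window * 5^n, cover object_bytes.
 */
static unsigned growth_rtts(uint64_t initial_window, uint64_t object_bytes)
{
    unsigned n = 0;
    uint64_t sent = initial_window;

    assert(initial_window > 0);
    while (sent < object_bytes) {
        sent *= 5;
        n++;
    }
    return n;
}

/* Throughput in Mbit/s, counting one extra rtt for the final acknowledgment. */
static double limited_throughput_mbit(uint64_t initial_window,
                                      uint64_t object_bytes,
                                      double rtt_seconds)
{
    unsigned rtts = 1 + growth_rtts(initial_window, object_bytes);

    assert(rtt_seconds > 0.0);
    return (double)object_bytes * 8.0 / (rtt_seconds * rtts) / 1e6;
}
```

With an initial window of 65536 bytes, a 1048576 byte object, and an rtt of
0.05 s, growth_rtts returns 2, so delivery takes 3 rtts and the throughput
comes out at roughly 55.9 Mbit/s.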

In short: for this scenario, the 64KB limit on the initial receive window
increases the best achievable time to acknowledged delivery from 1 rtt
to (optimistically) 3 rtts, reducing the achievable throughput from
168 Mbit/s to 55.9 Mbit/s.

Here is an experimental illustration:

A time sequence chart of a packet capture taken on the sender for a
scenario similar to what is described above, where the receiver had the
64KB limit in place:

Symbols:
.:' - Data packets
_-  - Window advertised by receiver

y-axis - Relative sequence number
x-axis - Time from sending of first data packet, in seconds

3212891                                                                   _
3089318                                                                   -
2965745                                                                   -
2842172                                                                   -
2718600                                                           ________-
2595027                                                           -
2471454                                                           -
2347881                                                    --------
2224309                                                    _
2100736                                                    -
1977163                                                   --
1853590                                                   _
1730018                                                   -
1606445                                                   -
1482872                                                   -
1359300                                                   -
1235727                                                   -
1112154                                                   -
 988581                                                  _:
 865009                                   _______--------.:
 741436                                   .      :       '
 617863                                  -:
 494290                                  -:
 370718                                  .:
 247145                  --------.-------:
 123572 _________________:       '
      0 .:               '
      0.000    0.028    0.056    0.084    0.112    0.140    0.168    0.195

Note that the sender was not able to send the object in a single initial
burst and that it took around 4 rtts for the object to be fully
acknowledged.


A time sequence chart of a packet capture taken for the same scenario, but
with the limit removed:

2147035                                                                  __
2064456                                                                 _-
1981878                                                                _-
1899300                                                                -
1816721                                                               --
1734143                                                              _-
1651565                                                             _-
1568987                                                             -
1486408                                                            --
1403830                                                           _-
1321252                                                          _-
1238674                                                          -
1156095 ________________________________________________________--
1073517
 990939           :
 908360          :'
 825782         :'
 743204        .:
 660626        :
 578047       :'
 495469      :'
 412891     .:
 330313    .:
 247734    :
 165156   :'
  82578  :'
      0 .:
      0.000    0.008    0.016    0.025    0.033    0.041    0.049    0.057

Note that the sender was able to send the entire object in a single burst
and that it was fully acknowledged after a little over 1 rtt.

Signed-off-by: Heath Caldwell <hcaldwel@akamai.com>
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1d2773cd02c8..d7ab1f5f071e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -232,7 +232,7 @@  void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
 	if (sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)
 		(*rcv_wnd) = min(space, MAX_TCP_WINDOW);
 	else
-		(*rcv_wnd) = min_t(u32, space, U16_MAX);
+		(*rcv_wnd) = space;
 
 	if (init_rcv_wnd)
 		*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);