Message ID | 20201022143331.1887495-1-ncardwell.kernel@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net] tcp: fix to update snd_wl1 in bulk receiver fast path | expand |
On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote: > From: Neal Cardwell <ncardwell@google.com> > > In the header prediction fast path for a bulk data receiver, if no > data is newly acknowledged then we do not call tcp_ack() and do not > call tcp_ack_update_window(). This means that a bulk receiver that > receives large amounts of data can have the incoming sequence numbers > wrap, so that the check in tcp_may_update_window fails: > after(ack_seq, tp->snd_wl1) > > If the incoming receive windows are zero in this state, and then the > connection that was a bulk data receiver later wants to send data, > that connection can find itself persistently rejecting the window > updates in incoming ACKs. This means the connection can persistently > fail to discover that the receive window has opened, which in turn > means that the connection is unable to send anything, and the > connection's sending process can get permanently "stuck". > > The fix is to update snd_wl1 in the header prediction fast path for a > bulk data receiver, so that it keeps up and does not see wrapping > problems. > > This fix is based on a very nice and thorough analysis and diagnosis > by Apollon Oikonomopoulos (see link below). > > This is a stable candidate but there is no Fixes tag here since the > bug predates current git history. Just for fun: looks like the bug > dates back to when header prediction was added in Linux v2.1.8 in Nov > 1996. In that version tcp_rcv_established() was added, and the code > only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer: > receiver" code path it does not call tcp_ack(). This fix seems to > apply cleanly at least as far back as v3.2. In that case - can I slap: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") on it?
On Thu, Oct 22, 2020 at 12:08 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote: > > From: Neal Cardwell <ncardwell@google.com> > > > > In the header prediction fast path for a bulk data receiver, if no > > data is newly acknowledged then we do not call tcp_ack() and do not > > call tcp_ack_update_window(). This means that a bulk receiver that > > receives large amounts of data can have the incoming sequence numbers > > wrap, so that the check in tcp_may_update_window fails: > > after(ack_seq, tp->snd_wl1) > > > > If the incoming receive windows are zero in this state, and then the > > connection that was a bulk data receiver later wants to send data, > > that connection can find itself persistently rejecting the window > > updates in incoming ACKs. This means the connection can persistently > > fail to discover that the receive window has opened, which in turn > > means that the connection is unable to send anything, and the > > connection's sending process can get permanently "stuck". > > > > The fix is to update snd_wl1 in the header prediction fast path for a > > bulk data receiver, so that it keeps up and does not see wrapping > > problems. > > > > This fix is based on a very nice and thorough analysis and diagnosis > > by Apollon Oikonomopoulos (see link below). > > > > This is a stable candidate but there is no Fixes tag here since the > > bug predates current git history. Just for fun: looks like the bug > > dates back to when header prediction was added in Linux v2.1.8 in Nov > > 1996. In that version tcp_rcv_established() was added, and the code > > only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer: > > receiver" code path it does not call tcp_ack(). This fix seems to > > apply cleanly at least as far back as v3.2. > > In that case - can I slap: > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > on it? Yes, slapping that Fixes footer on it sounds fine to me. I see that it does apply cleanly to 1da177e4c3f4. Or let me know if you would prefer for me to resubmit a v2 with that footer. thanks, neal
On Thu, 22 Oct 2020 13:04:04 -0400 Neal Cardwell wrote: > > In that case - can I slap: > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > on it? > > Yes, slapping that Fixes footer on it sounds fine to me. I see > that it does apply cleanly to 1da177e4c3f4. FWIW even if it didn't - my modus operandi is to have the tag pointing at the earliest point in the git history where the bug exists. Even if the implementation was completely rewritten - we want to let the people who run old kernels know they may experience the issue. Backporters will see the patch doesn't apply and make the right call. That said I don't think the process documentation is very clear on this, so maybe someone more experienced will correct my course :) > Or let me know if you would prefer for me to resubmit a v2 with that footer. I'll add it, no worries.
On Thu, Oct 22, 2020 at 1:47 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 22 Oct 2020 13:04:04 -0400 Neal Cardwell wrote: > > > In that case - can I slap: > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > > > on it? > > > > Yes, slapping that Fixes footer on it sounds fine to me. I see > > that it does apply cleanly to 1da177e4c3f4. > > FWIW even if it didn't - my modus operandi is to have the tag pointing > at the earliest point in the git history where the bug exists. Even if > the implementation was completely rewritten - we want to let the people > who run old kernels know they may experience the issue. > > Backporters will see the patch doesn't apply and make the right call. > > That said I don't think the process documentation is very clear on > this, so maybe someone more experienced will correct my course :) > > > Or let me know if you would prefer for me to resubmit a v2 with that footer. > > I'll add it, no worries. Great. Thanks! neal
On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote: > From: Neal Cardwell <ncardwell@google.com> > > In the header prediction fast path for a bulk data receiver, if no > data is newly acknowledged then we do not call tcp_ack() and do not > call tcp_ack_update_window(). This means that a bulk receiver that > receives large amounts of data can have the incoming sequence numbers > wrap, so that the check in tcp_may_update_window fails: > after(ack_seq, tp->snd_wl1) > > If the incoming receive windows are zero in this state, and then the > connection that was a bulk data receiver later wants to send data, > that connection can find itself persistently rejecting the window > updates in incoming ACKs. This means the connection can persistently > fail to discover that the receive window has opened, which in turn > means that the connection is unable to send anything, and the > connection's sending process can get permanently "stuck". > > The fix is to update snd_wl1 in the header prediction fast path for a > bulk data receiver, so that it keeps up and does not see wrapping > problems. > > This fix is based on a very nice and thorough analysis and diagnosis > by Apollon Oikonomopoulos (see link below). > > This is a stable candidate but there is no Fixes tag here since the > bug predates current git history. Just for fun: looks like the bug > dates back to when header prediction was added in Linux v2.1.8 in Nov > 1996. In that version tcp_rcv_established() was added, and the code > only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer: > receiver" code path it does not call tcp_ack(). This fix seems to > apply cleanly at least as far back as v3.2. > > Signed-off-by: Neal Cardwell <ncardwell@google.com> > Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr> > Tested-by: Apollon Oikonomopoulos <apoikos@dmesg.gr> > Link: https://www.spinics.net/lists/netdev/msg692430.html > Acked-by: Soheil Hassas Yeganeh <soheil@google.com> > Acked-by: Yuchung Cheng <ycheng@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, thanks!
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Thu, 22 Oct 2020 10:33:31 -0400 you wrote: > From: Neal Cardwell <ncardwell@google.com> > > In the header prediction fast path for a bulk data receiver, if no > data is newly acknowledged then we do not call tcp_ack() and do not > call tcp_ack_update_window(). This means that a bulk receiver that > receives large amounts of data can have the incoming sequence numbers > wrap, so that the check in tcp_may_update_window fails: > after(ack_seq, tp->snd_wl1) > > [...] Here is the summary with links: - [net] tcp: fix to update snd_wl1 in bulk receiver fast path https://git.kernel.org/netdev/net/c/18ded910b589 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 67f10d3ec240..fc445833b5e5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5827,6 +5827,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) tcp_data_snd_check(sk); if (!inet_csk_ack_scheduled(sk)) goto no_ack; + } else { + tcp_update_wl(tp, TCP_SKB_CB(skb)->seq); } __tcp_ack_snd_check(sk, 0);