From patchwork Fri Feb 12 23:22:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 382210
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet, Arjun Roy, Wei Wang
Subject: [PATCH net-next 1/2] tcp: fix SO_RCVLOWAT related hangs under mem pressure
Date: Fri, 12 Feb 2021 15:22:13 -0800
Message-Id: <20210212232214.2869897-2-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
In-Reply-To: <20210212232214.2869897-1-eric.dumazet@gmail.com>
References: <20210212232214.2869897-1-eric.dumazet@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

While commit 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
fixed an issue with a sk_rcvbuf too small for the given sk_rcvlowat
constraint, it did not address the issue caused by memory pressure.

1) If we are under memory pressure and the socket receive queue is
   empty, the first incoming packet is allowed to be queued, after
   commit 76dfa6082032 ("tcp: allow one skb to be received per socket
   under memory pressure").

   But we do not send EPOLLIN yet if tcp_data_ready() sees that
   sk_rcvlowat is bigger than the skb length.

2) Then, when the next packet comes, it is dropped, and we directly
   call sk->sk_data_ready().

3) If the application is using poll(), tcp_poll() will then use
   tcp_stream_is_readable() and decide the socket receive queue is
   not yet filled, so nothing will happen.

Even when the sender retransmits packets, phases 2) and 3) repeat and
the flow is effectively frozen until memory pressure is off.

The fix is to make tcp_rmem_pressure() also consider
tcp_under_memory_pressure(), which takes care of global memory
pressure or memcg pressure.
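
For illustration only, the three phases above can be sketched in user
space. This is a rough model under stated assumptions, not kernel code:
struct model_sock, model_rmem_pressure(), model_readable() and
model_hang_scenario() are hypothetical names, the receive-window clause
of the real readiness test is left out, and the sizes (one 1000-byte
skb, a 64 KB SO_RCVLOWAT) are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the few socket fields involved. */
struct model_sock {
	int avail;         /* bytes queued: rcv_nxt - copied_seq */
	int rcvlowat;      /* models sk->sk_rcvlowat */
	int rcvbuf;        /* models sk->sk_rcvbuf */
	int rmem_alloc;    /* models sk->sk_rmem_alloc */
	bool mem_pressure; /* models tcp_under_memory_pressure(sk) */
};

/* Receive-buffer pressure test; 'fixed' selects the patched behaviour. */
static bool model_rmem_pressure(const struct model_sock *sk, bool fixed)
{
	int threshold;

	if (fixed && sk->mem_pressure)
		return true;

	threshold = sk->rcvbuf - (sk->rcvbuf >> 3);
	return sk->rmem_alloc > threshold;
}

/* Simplified readiness test: the receive-window clause is left out. */
static bool model_readable(const struct model_sock *sk, bool fixed)
{
	if (sk->avail <= 0)
		return false;
	return sk->avail >= sk->rcvlowat || model_rmem_pressure(sk, fixed);
}

/* One small skb queued, large SO_RCVLOWAT, memory pressure on. */
static void model_hang_scenario(void)
{
	struct model_sock sk = {
		.avail = 1000, .rcvlowat = 65536,
		.rcvbuf = 131072, .rmem_alloc = 4096,
		.mem_pressure = true,
	};

	assert(!model_readable(&sk, false)); /* before: never readable */
	assert(model_readable(&sk, true));   /* after: reported readable */
}
```

In this model the unpatched test keeps answering "not readable" however
many retransmits are dropped, because neither avail nor rmem_alloc can
grow while memory pressure lasts.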

Fixes: 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
Signed-off-by: Eric Dumazet
Reported-by: Arjun Roy
Suggested-by: Wei Wang
Reviewed-by: Wei Wang
---
 include/net/tcp.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 25bbada379c46add16fb7239733bd6571f10f680..244208f6f6c2ace87920b633e469421f557427a6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1431,8 +1431,13 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied);
  */
 static inline bool tcp_rmem_pressure(const struct sock *sk)
 {
-	int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
-	int threshold = rcvbuf - (rcvbuf >> 3);
+	int rcvbuf, threshold;
+
+	if (tcp_under_memory_pressure(sk))
+		return true;
+
+	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
+	threshold = rcvbuf - (rcvbuf >> 3);
 
 	return atomic_read(&sk->sk_rmem_alloc) > threshold;
 }

From patchwork Fri Feb 12 23:22:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 382689
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet, Arjun Roy, Wei Wang
Subject: [PATCH net-next 2/2] tcp: factorize logic into tcp_epollin_ready()
Date: Fri, 12 Feb 2021 15:22:14 -0800
Message-Id: <20210212232214.2869897-3-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
In-Reply-To: <20210212232214.2869897-1-eric.dumazet@gmail.com>
References: <20210212232214.2869897-1-eric.dumazet@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Both tcp_data_ready() and tcp_stream_is_readable() share the same logic.

Add tcp_epollin_ready() helper to avoid duplication.

Signed-off-by: Eric Dumazet
Cc: Arjun Roy
Cc: Wei Wang
---
 include/net/tcp.h    | 12 ++++++++++++
 net/ipv4/tcp.c       | 16 ++++------------
 net/ipv4/tcp_input.c | 11 ++---------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 244208f6f6c2ace87920b633e469421f557427a6..484eb2362645fd478f59b26b42457ecf4510eb14 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1442,6 +1442,18 @@ static inline bool tcp_rmem_pressure(const struct sock *sk)
 	return atomic_read(&sk->sk_rmem_alloc) > threshold;
 }
 
+static inline bool tcp_epollin_ready(const struct sock *sk, int target)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);
+
+	if (avail <= 0)
+		return false;
+
+	return (avail >= target) || tcp_rmem_pressure(sk) ||
+	       (tcp_receive_window(tp) <= inet_csk(sk)->icsk_ack.rcv_mss);
+}
+
 extern void tcp_openreq_init_rwin(struct request_sock *req,
 				  const struct sock *sk_listener,
 				  const struct dst_entry *dst);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9896ca10bb340924b779cb6a7606d57fdd5c3357..7a6b58ae408d1fb1e5536ccfed8215be123f3b57 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -481,19 +481,11 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
 	}
 }
 
-static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
-					  int target, struct sock *sk)
+static bool tcp_stream_is_readable(struct sock *sk, int target)
 {
-	int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);
+	if (tcp_epollin_ready(sk, target))
+		return true;
 
-	if (avail > 0) {
-		if (avail >= target)
-			return true;
-		if (tcp_rmem_pressure(sk))
-			return true;
-		if (tcp_receive_window(tp) <= inet_csk(sk)->icsk_ack.rcv_mss)
-			return true;
-	}
 	if (sk->sk_prot->stream_memory_read)
 		return sk->sk_prot->stream_memory_read(sk);
 	return false;
@@ -568,7 +560,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 		    tp->urg_data)
 			target++;
 
-		if (tcp_stream_is_readable(tp, target, sk))
+		if (tcp_stream_is_readable(sk, target))
 			mask |= EPOLLIN | EPOLLRDNORM;
 
 		if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a8f8f98159531e5d1c80660972148986f6acd20a..e32a7056cb7640c67ef2d6a4d9484684d2602fcd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4924,15 +4924,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 
 void tcp_data_ready(struct sock *sk)
 {
-	const struct tcp_sock *tp = tcp_sk(sk);
-	int avail = tp->rcv_nxt - tp->copied_seq;
-
-	if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) &&
-	    !sock_flag(sk, SOCK_DONE) &&
-	    tcp_receive_window(tp) > inet_csk(sk)->icsk_ack.rcv_mss)
-		return;
-
-	sk->sk_data_ready(sk);
+	if (tcp_epollin_ready(sk, sk->sk_rcvlowat))
+		sk->sk_data_ready(sk);
 }
 
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
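
As a sanity check on the refactoring in patch 2/2, the open-coded test
removed from tcp_stream_is_readable() and the new helper can be compared
in user space. The sketch below is only a model under assumptions:
struct rx_state, old_stream_readable(), epollin_ready_model() and
check_forms_agree() are hypothetical names, the kernel accessors are
replaced by plain fields, and the stream_memory_read fallback is not
modelled. The lines removed from tcp_data_ready() also tested
sock_flag(sk, SOCK_DONE), which the helper as posted does not, so the
model leaves that flag out as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs that the helper reads. */
struct rx_state {
	int avail;     /* rcv_nxt - copied_seq */
	bool pressure; /* result of tcp_rmem_pressure(sk) */
	int rcv_wnd;   /* result of tcp_receive_window(tp) */
	int rcv_mss;   /* inet_csk(sk)->icsk_ack.rcv_mss */
};

/* Shape of the open-coded test removed from tcp_stream_is_readable(). */
static bool old_stream_readable(const struct rx_state *s, int target)
{
	if (s->avail > 0) {
		if (s->avail >= target)
			return true;
		if (s->pressure)
			return true;
		if (s->rcv_wnd <= s->rcv_mss)
			return true;
	}
	return false;
}

/* Shape of the new tcp_epollin_ready() helper. */
static bool epollin_ready_model(const struct rx_state *s, int target)
{
	if (s->avail <= 0)
		return false;

	return s->avail >= target || s->pressure ||
	       s->rcv_wnd <= s->rcv_mss;
}

/* Asserts that both forms agree on every input in a small grid. */
static void check_forms_agree(void)
{
	static const int avails[] = { -1, 0, 1, 1000, 70000 };
	static const int wnds[] = { 0, 1460, 65535 };
	unsigned int a, w;
	int p;

	for (a = 0; a < sizeof(avails) / sizeof(avails[0]); a++)
		for (w = 0; w < sizeof(wnds) / sizeof(wnds[0]); w++)
			for (p = 0; p <= 1; p++) {
				struct rx_state s = {
					.avail = avails[a], .pressure = p,
					.rcv_wnd = wnds[w], .rcv_mss = 1460,
				};

				assert(old_stream_readable(&s, 65536) ==
				       epollin_ready_model(&s, 65536));
			}
}
```

Calling check_forms_agree() from a test harness exercises empty, partly
filled and over-target queues with and without pressure; the grid values
are arbitrary and can be widened.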