From patchwork Sat Jul 10 00:20:46 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472631
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Jianguo Wu, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, pabeni@redhat.com, fw@strlen.de,
    mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 1/6] mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
Date: Fri, 9 Jul 2021 17:20:46 -0700
Message-Id: <20210710002051.216010-2-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>

From: Jianguo Wu

I ran a stress test with wrk [1] and webfsd [2], with the assistance of
mptcp-tools [3]:

Server side:
  ./use_mptcp.sh webfsd -4 -R /tmp/ -p 8099
Client side:
  ./use_mptcp.sh wrk -c 200 -d 30 -t 4 http://192.168.174.129:8099/

and got the following warning message:

[ 55.552626] TCP: request_sock_subflow: Possible SYN flooding on port 8099. Sending cookies. Check SNMP counters.
[ 55.553024] ------------[ cut here ]------------
[ 55.553027] WARNING: CPU: 0 PID: 10 at net/core/flow_dissector.c:984 __skb_flow_dissect+0x280/0x1650
...
[ 55.553117] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.12.0+ #18
[ 55.553121] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 55.553124] RIP: 0010:__skb_flow_dissect+0x280/0x1650
...
[ 55.553133] RSP: 0018:ffffb79580087770 EFLAGS: 00010246
[ 55.553137] RAX: 0000000000000000 RBX: ffffffff8ddb58e0 RCX: ffffb79580087888
[ 55.553139] RDX: ffffffff8ddb58e0 RSI: ffff8f7e4652b600 RDI: 0000000000000000
[ 55.553141] RBP: ffffb79580087858 R08: 0000000000000000 R09: 0000000000000008
[ 55.553143] R10: 000000008c622965 R11: 00000000d3313a5b R12: ffff8f7e4652b600
[ 55.553146] R13: ffff8f7e465c9062 R14: 0000000000000000 R15: ffffb79580087888
[ 55.553149] FS:  0000000000000000(0000) GS:ffff8f7f75e00000(0000) knlGS:0000000000000000
[ 55.553152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.553154] CR2: 00007f73d1d19000 CR3: 0000000135e10004 CR4: 00000000003706f0
[ 55.553160] Call Trace:
[ 55.553166]  ? __sha256_final+0x67/0xd0
[ 55.553173]  ? sha256+0x7e/0xa0
[ 55.553177]  __skb_get_hash+0x57/0x210
[ 55.553182]  subflow_init_req_cookie_join_save+0xac/0xc0
[ 55.553189]  subflow_check_req+0x474/0x550
[ 55.553195]  ? ip_route_output_key_hash+0x67/0x90
[ 55.553200]  ? xfrm_lookup_route+0x1d/0xa0
[ 55.553207]  subflow_v4_route_req+0x8e/0xd0
[ 55.553212]  tcp_conn_request+0x31e/0xab0
[ 55.553218]  ? selinux_socket_sock_rcv_skb+0x116/0x210
[ 55.553224]  ? tcp_rcv_state_process+0x179/0x6d0
[ 55.553229]  tcp_rcv_state_process+0x179/0x6d0
[ 55.553235]  tcp_v4_do_rcv+0xaf/0x220
[ 55.553239]  tcp_v4_rcv+0xce4/0xd80
[ 55.553243]  ? ip_route_input_rcu+0x246/0x260
[ 55.553248]  ip_protocol_deliver_rcu+0x35/0x1b0
[ 55.553253]  ip_local_deliver_finish+0x44/0x50
[ 55.553258]  ip_local_deliver+0x6c/0x110
[ 55.553262]  ? ip_rcv_finish_core.isra.19+0x5a/0x400
[ 55.553267]  ip_rcv+0xd1/0xe0
...

After debugging, I found that in __skb_flow_dissect(), skb->dev and skb->sk
are both NULL, so net is NULL and WARN_ON_ONCE(!net) triggers. In fact, net
is always NULL in this code path: skb->dev is set to NULL in tcp_v4_rcv(),
and skb->sk is never set.

Code snippet in __skb_flow_dissect() that triggers the warning:

 975         if (skb) {
 976                 if (!net) {
 977                         if (skb->dev)
 978                                 net = dev_net(skb->dev);
 979                         else if (skb->sk)
 980                                 net = sock_net(skb->sk);
 981                 }
 982         }
 983
 984         WARN_ON_ONCE(!net);

So instead, use a hash derived from the TCP sequence number and the
transport header, which does not depend on skb->dev or skb->sk.
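As a rough illustration (a userspace toy, not part of the patch): the SYN
consumes one sequence number, so the 3rd ACK arrives with seq = SYN.seq + 1;
hashing seq for the SYN and seq - 1 for the ACK therefore keys both packets
to the same join_entries[] slot. Here toy_hash() only stands in for
jhash_3words(), and the slot count and input values are made up:

  #include <stdint.h>
  #include <stdio.h>

  #define JOIN_SLOTS 16	/* illustrative; the kernel uses COOKIE_JOIN_SLOTS */

  static uint32_t toy_hash(uint32_t seq, uint32_t net_mix, uint32_t ports, uint32_t secret)
  {
  	/* stand-in for jhash_3words(seq, net_mix, ports, secret) */
  	return (seq * 2654435761u) ^ net_mix ^ ports ^ secret;
  }

  int main(void)
  {
  	uint32_t syn_seq = 1000;                 /* seq carried in the MP_JOIN SYN       */
  	uint32_t ack_seq = syn_seq + 1;          /* the SYN consumes one sequence number */
  	uint32_t net_mix = 0x1234;               /* stands in for net_hash_mix(net)      */
  	uint32_t ports   = (8099u << 16) | 45678u; /* th->source << 16 | th->dest        */
  	uint32_t secret  = 0xabcdef01;           /* stands in for mptcp_join_hash_secret */

  	uint32_t syn_slot = toy_hash(syn_seq, net_mix, ports, secret) % JOIN_SLOTS;
  	uint32_t ack_slot = toy_hash(ack_seq - 1, net_mix, ports, secret) % JOIN_SLOTS;

  	/* both slots are always equal, so the 3rd ACK finds the saved join state */
  	printf("SYN slot %u, 3rd-ACK slot %u\n", syn_slot, ack_slot);
  	return 0;
  }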
[1] https://github.com/wg/wrk
[2] https://github.com/ourway/webfsd
[3] https://github.com/pabeni/mptcp-tools

Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Suggested-by: Paolo Abeni
Suggested-by: Florian Westphal
Signed-off-by: Jianguo Wu
Signed-off-by: Mat Martineau
---
 net/mptcp/syncookies.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/syncookies.c b/net/mptcp/syncookies.c
index abe0fd099746..37127781aee9 100644
--- a/net/mptcp/syncookies.c
+++ b/net/mptcp/syncookies.c
@@ -37,7 +37,21 @@ static spinlock_t join_entry_locks[COOKIE_JOIN_SLOTS] __cacheline_aligned_in_smp
 
 static u32 mptcp_join_entry_hash(struct sk_buff *skb, struct net *net)
 {
-	u32 i = skb_get_hash(skb) ^ net_hash_mix(net);
+	static u32 mptcp_join_hash_secret __read_mostly;
+	struct tcphdr *th = tcp_hdr(skb);
+	u32 seq, i;
+
+	net_get_random_once(&mptcp_join_hash_secret,
+			    sizeof(mptcp_join_hash_secret));
+
+	if (th->syn)
+		seq = TCP_SKB_CB(skb)->seq;
+	else
+		seq = TCP_SKB_CB(skb)->seq - 1;
+
+	i = jhash_3words(seq, net_hash_mix(net),
+			 (__force __u32)th->source << 16 | (__force __u32)th->dest,
+			 mptcp_join_hash_secret);
 
 	return i % ARRAY_SIZE(join_entries);
 }

From patchwork Sat Jul 10 00:20:47 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472633
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Jianguo Wu, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, geliangtang@gmail.com,
    mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 2/6] mptcp: remove redundant req destruct in subflow_check_req()
Date: Fri, 9 Jul 2021 17:20:47 -0700
Message-Id: <20210710002051.216010-3-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>
From: Jianguo Wu

In subflow_check_req(), if the subflow sport does not match, we put the msk,
destroy the token, and destruct the req, then return -EPERM. All of this is
already done by subflow_req_destructor() via:

  tcp_conn_request()
    |--__reqsk_free()
         |--subflow_req_destructor()

So remove this redundant code; otherwise tcp_v4_reqsk_destructor() will be
called twice and may double-free inet_rsk(req)->ireq_opt.

Fixes: 5bc56388c74f ("mptcp: add port number check for MP_JOIN")
Signed-off-by: Jianguo Wu
Signed-off-by: Mat Martineau
---
 net/mptcp/subflow.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 66d0b1893d26..b15e2017168d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -214,11 +214,6 @@ static int subflow_check_req(struct request_sock *req,
 				 ntohs(inet_sk(sk_listener)->inet_sport),
 				 ntohs(inet_sk((struct sock *)subflow_req->msk)->inet_sport));
 		if (!mptcp_pm_sport_in_anno_list(subflow_req->msk, sk_listener)) {
-			sock_put((struct sock *)subflow_req->msk);
-			mptcp_token_destroy_request(req);
-			tcp_request_sock_ops.destructor(req);
-			subflow_req->msk = NULL;
-			subflow_req->mp_join = 0;
 			SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MISMATCHPORTSYNRX);
 			return -EPERM;
 		}

From patchwork Sat Jul 10 00:20:48 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472824
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Jianguo Wu, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, pabeni@redhat.com, fw@strlen.de,
    mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 3/6] mptcp: fix syncookie process if mptcp can not_accept new subflow
Date: Fri, 9 Jul 2021 17:20:48 -0700
Message-Id: <20210710002051.216010-4-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>

From: Jianguo Wu

Lots of "TCP: tcp_fin: Impossible, sk->sk_state=7" messages appear on the
client side when stress testing with wrk and webfsd. At least two cases may
trigger this warning:

1. mptcp is using syncookies and the server receives an MP_JOIN SYN request.
   In subflow_check_req(), mptcp_can_accept_new_subflow() returns false, so
   subflow_init_req_cookie_join_save() isn't called, i.e. the data present
   in the MP_JOIN SYN request and the random nonce are not stored in the
   hash table join_entries[], but a SYN/ACK is still sent. When the 3rd ACK
   is received, mptcp_token_join_cookie_init_state() returns false and the
   3rd ACK is dropped. If the mptcp connection is then closed by the client,
   the client sends a DATA_FIN and an MPTCP FIN. The DATA_FIN doesn't carry
   MP_CAPABLE or MP_JOIN, so mptcp_subflow_init_cookie_req() returns 0 and
   passes the cookie check, and the MP_JOIN request falls back to normal
   TCP. The server sends a TCP FIN when it closes; on the client side,
   processing that TCP FIN triggers a reset along the code path:

     tcp_data_queue()->mptcp_incoming_options()
       ->check_fully_established()->mptcp_subflow_reset()

   mptcp_subflow_reset() sets the sock state to TCP_CLOSE, so tcp_fin hits
   TCP_CLOSE and prints the warning.

2. mptcp is using syncookies and the server receives the 3rd ACK. In
   mptcp_subflow_init_cookie_req(), mptcp_can_accept_new_subflow() returns
   false and subflow_req->mp_join is not set to 1, so subflow_syn_recv_sock()
   does not reset the MP_JOIN subflow but falls back to normal TCP, and the
   same thing happens when the server sends a TCP FIN on close.

For case 1, make subflow_check_req() return -EPERM so that tcp_conn_request()
drops the MP_JOIN SYN.

For case 2, let subflow_syn_recv_sock() call mptcp_can_accept_new_subflow()
and do a fatal fallback, sending a reset.
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Signed-off-by: Jianguo Wu
Signed-off-by: Mat Martineau
---
 net/mptcp/subflow.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b15e2017168d..966f777d35ce 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -225,6 +225,8 @@ static int subflow_check_req(struct request_sock *req,
 	if (unlikely(req->syncookie)) {
 		if (mptcp_can_accept_new_subflow(subflow_req->msk))
 			subflow_init_req_cookie_join_save(subflow_req, skb);
+		else
+			return -EPERM;
 	}
 
 	pr_debug("token=%u, remote_nonce=%u msk=%p", subflow_req->token,
@@ -264,9 +266,7 @@ int mptcp_subflow_init_cookie_req(struct request_sock *req,
 		if (!mptcp_token_join_cookie_init_state(subflow_req, skb))
 			return -EINVAL;
 
-		if (mptcp_can_accept_new_subflow(subflow_req->msk))
-			subflow_req->mp_join = 1;
-
+		subflow_req->mp_join = 1;
 		subflow_req->ssn_offset = TCP_SKB_CB(skb)->seq - 1;
 	}

From patchwork Sat Jul 10 00:20:49 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472632
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Jianguo Wu, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, pabeni@redhat.com,
    mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 4/6] mptcp: avoid processing packet if a subflow reset
Date: Fri, 9 Jul 2021 17:20:49 -0700
Message-Id: <20210710002051.216010-5-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>

From: Jianguo Wu

If check_fully_established() causes a subflow reset, it should not continue
to process the packet in tcp_data_queue().
Add a return value to mptcp_incoming_options(): return false if a subflow
has been reset, else return true. Then drop the packet in
tcp_data_queue()/tcp_rcv_state_process() if mptcp_incoming_options()
returns false.

Fixes: d582484726c4 ("mptcp: fix fallback for MP_JOIN subflows")
Signed-off-by: Jianguo Wu
Signed-off-by: Mat Martineau
---
 include/net/mptcp.h  |  5 +++--
 net/ipv4/tcp_input.c | 19 +++++++++++++++----
 net/mptcp/options.c  | 19 +++++++++++++------
 3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index cb580b06152f..8b5af683a818 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -105,7 +105,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
 bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       unsigned int *size, unsigned int remaining,
 			       struct mptcp_out_options *opts);
-void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
+bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
 
 void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			 struct mptcp_out_options *opts);
@@ -227,9 +227,10 @@ static inline bool mptcp_established_options(struct sock *sk,
 	return false;
 }
 
-static inline void mptcp_incoming_options(struct sock *sk,
+static inline bool mptcp_incoming_options(struct sock *sk,
 					  struct sk_buff *skb)
 {
+	return true;
 }
 
 static inline void mptcp_skb_ext_move(struct sk_buff *to,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5a8d0a378b2..149ceb5c94ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4247,6 +4247,9 @@ void tcp_reset(struct sock *sk, struct sk_buff *skb)
 {
 	trace_tcp_receive_reset(sk);
 
+	/* mptcp can't tell us to ignore reset pkts,
+	 * so just ignore the return value of mptcp_incoming_options().
+	 */
 	if (sk_is_mptcp(sk))
 		mptcp_incoming_options(sk, skb);
 
@@ -4941,8 +4944,13 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	bool fragstolen;
 	int eaten;
 
-	if (sk_is_mptcp(sk))
-		mptcp_incoming_options(sk, skb);
+	/* If a subflow has been reset, the packet should not continue
+	 * to be processed, drop the packet.
+	 */
+	if (sk_is_mptcp(sk) && !mptcp_incoming_options(sk, skb)) {
+		__kfree_skb(skb);
+		return;
+	}
 
 	if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
 		__kfree_skb(skb);
@@ -6523,8 +6531,11 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 	case TCP_CLOSING:
 	case TCP_LAST_ACK:
 		if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
-			if (sk_is_mptcp(sk))
-				mptcp_incoming_options(sk, skb);
+			/* If a subflow has been reset, the packet should not
+			 * continue to be processed, drop the packet.
+			 */
+			if (sk_is_mptcp(sk) && !mptcp_incoming_options(sk, skb))
+				goto discard;
 			break;
 		}
 		fallthrough;
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index b5850afea343..4452455aef7f 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1035,7 +1035,8 @@ static bool add_addr_hmac_valid(struct mptcp_sock *msk,
 	return hmac == mp_opt->ahmac;
 }
 
-void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
+/* Return false if a subflow has been reset, else return true */
+bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
@@ -1053,12 +1054,16 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		__mptcp_check_push(subflow->conn, sk);
 		__mptcp_data_acked(subflow->conn);
 		mptcp_data_unlock(subflow->conn);
-		return;
+		return true;
 	}
 
 	mptcp_get_options(sk, skb, &mp_opt);
+
+	/* The subflow can be in close state only if check_fully_established()
+	 * just sent a reset. If so, tell the caller to ignore the current packet.
+	 */
 	if (!check_fully_established(msk, sk, subflow, skb, &mp_opt))
-		return;
+		return sk->sk_state != TCP_CLOSE;
 
 	if (mp_opt.fastclose &&
 	    msk->local_key == mp_opt.rcvr_key) {
@@ -1100,7 +1105,7 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (!mp_opt.dss)
-		return;
+		return true;
 
 	/* we can't wait for recvmsg() to update the ack_seq, otherwise
 	 * monodirectional flows will stuck
@@ -1119,12 +1124,12 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		    schedule_work(&msk->work))
 			sock_hold(subflow->conn);
 
-		return;
+		return true;
 	}
 
 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
 	if (!mpext)
-		return;
+		return true;
 
 	memset(mpext, 0, sizeof(*mpext));
 
@@ -1153,6 +1158,8 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		if (mpext->csum_reqd)
 			mpext->csum = mp_opt.csum;
 	}
+
+	return true;
 }
 
 static void mptcp_set_rwin(const struct tcp_sock *tp)

From patchwork Sat Jul 10 00:20:50 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472823
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Jianguo Wu, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, pabeni@redhat.com, fw@strlen.de,
    mptcp@lists.linux.dev, kernel test robot, Mat Martineau
Subject: [PATCH net 5/6] selftests: mptcp: fix case multiple subflows limited by server
Date: Fri, 9 Jul 2021 17:20:50 -0700
Message-Id: <20210710002051.216010-6-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>

From: Jianguo Wu

After the patch "mptcp: fix syncookie process if mptcp can not_accept new
subflow", if the subflow is limited, the MP_JOIN SYN is dropped and no
SYN/ACK is sent in reply. So in the "multiple subflows limited by server"
case, the expected number of SYN/ACKs is 1.

Fixes: 00587187ad30 ("selftests: mptcp: add test cases for mptcp join tests with syn cookies")
Reported-by: kernel test robot
Signed-off-by: Jianguo Wu
Signed-off-by: Mat Martineau
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 9a191c1a5de8..f02f4de2f3a0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -1409,7 +1409,7 @@ syncookies_tests()
 	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
 	ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
 	run_tests $ns1 $ns2 10.0.1.1
-	chk_join_nr "subflows limited by server w cookies" 2 2 1
+	chk_join_nr "subflows limited by server w cookies" 2 1 1
 
 	# test signal address with cookies
 	reset_with_cookies

From patchwork Sat Jul 10 00:20:51 2021
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 472630
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org,
    matthieu.baerts@tessares.net, fw@strlen.de, mptcp@lists.linux.dev,
    Mat Martineau
Subject: [PATCH net 6/6] mptcp: properly account bulk freed memory
Date: Fri, 9 Jul 2021 17:20:51 -0700
Message-Id: <20210710002051.216010-7-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210710002051.216010-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

After commit 879526030c8b ("mptcp: protect the rx path with the msk socket
spinlock"), the rmem currently used by a given msk is really
sk_rmem_alloc - rmem_released.

The safety check in mptcp_data_ready() does not take the above into due
account; as a result, legitimate incoming data is kept in the subflow
receive queue for no reason, delaying or blocking MPTCP-level ack
generation.

This change addresses the issue by introducing a new helper to fetch the
rmem actually in use and using it as needed. Additionally, add a MIB
counter for the exceptional event described above - the peer is misbehaving.

Finally, introduce the required annotations where rmem_released is updated.

Fixes: 879526030c8b ("mptcp: protect the rx path with the msk socket spinlock")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/211
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/mib.c      |  1 +
 net/mptcp/mib.h      |  1 +
 net/mptcp/protocol.c | 12 +++++++-----
 net/mptcp/protocol.h | 10 +++++++++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 52ea2517e856..ff2cc0e3273d 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -44,6 +44,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("RmSubflow", MPTCP_MIB_RMSUBFLOW),
 	SNMP_MIB_ITEM("MPPrioTx", MPTCP_MIB_MPPRIOTX),
 	SNMP_MIB_ITEM("MPPrioRx", MPTCP_MIB_MPPRIORX),
+	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 	SNMP_MIB_SENTINEL
 };

diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 193466c9b549..0663cb12b448 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -37,6 +37,7 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_RMSUBFLOW,		/* Remove a subflow */
 	MPTCP_MIB_MPPRIOTX,		/* Transmit a MP_PRIO */
 	MPTCP_MIB_MPPRIORX,		/* Received a MP_PRIO */
+	MPTCP_MIB_RCVPRUNED,		/* Incoming packet dropped due to memory limit */
 	__MPTCP_MIB_MAX
 };

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7a5afa8c6866..a88924947815 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -474,7 +474,7 @@ static void mptcp_cleanup_rbuf(struct mptcp_sock *msk)
 	bool cleanup, rx_empty;
 
 	cleanup = (space > 0) && (space >= (old_space << 1));
-	rx_empty = !atomic_read(&sk->sk_rmem_alloc);
+	rx_empty = !__mptcp_rmem(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -720,8 +720,10 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk_rbuf = ssk_rbuf;
 
 	/* over limit? can't append more skbs to msk, Also, no need to wake-up*/
-	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
+	if (__mptcp_rmem(sk) > sk_rbuf) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
 		return;
+	}
 
 	/* Wake-up the reader only for in-sequence data */
 	mptcp_data_lock(sk);
@@ -1754,7 +1756,7 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
 		if (!(flags & MSG_PEEK)) {
 			/* we will bulk release the skb memory later */
 			skb->destructor = NULL;
-			msk->rmem_released += skb->truesize;
+			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
 			__skb_unlink(skb, &msk->receive_queue);
 			__kfree_skb(skb);
 		}
@@ -1873,7 +1875,7 @@ static void __mptcp_update_rmem(struct sock *sk)
 
 	atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc);
 	sk_mem_uncharge(sk, msk->rmem_released);
-	msk->rmem_released = 0;
+	WRITE_ONCE(msk->rmem_released, 0);
 }
 
 static void __mptcp_splice_receive_queue(struct sock *sk)
@@ -2380,7 +2382,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
 	msk->wmem_reserved = 0;
-	msk->rmem_released = 0;
+	WRITE_ONCE(msk->rmem_released, 0);
 	msk->tx_pending_data = 0;
 	msk->first = NULL;

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 426ed80fe72f..0f0c026c5f8b 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -296,9 +296,17 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
 	return (struct mptcp_sock *)sk;
 }
 
+/* the msk socket don't use the backlog, also account for the bulk
+ * free memory
+ */
+static inline int __mptcp_rmem(const struct sock *sk)
+{
+	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
+}
+
 static inline int __mptcp_space(const struct sock *sk)
 {
-	return tcp_space(sk) + READ_ONCE(mptcp_sk(sk)->rmem_released);
+	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - __mptcp_rmem(sk));
 }
 
 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
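To illustrate why checking sk_rmem_alloc alone can spuriously trip the limit,
here is a rough userspace sketch with made-up numbers (not taken from the
patch): skbs worth 200 KB have been moved to the msk receive queue and
bulk-freed, but their charge has not yet been returned to sk_rmem_alloc.

  #include <stdio.h>

  int main(void)
  {
  	int sk_rcvbuf     = 256 << 10;	/* msk receive buffer limit (assumed)              */
  	int sk_rmem_alloc = 300 << 10;	/* charged rmem, still including bulk-freed skbs   */
  	int rmem_released = 200 << 10;	/* bulk-freed under the msk lock, not yet uncharged */

  	int old_drop = sk_rmem_alloc > sk_rcvbuf;                   /* 1: data wrongly pruned */
  	int new_drop = (sk_rmem_alloc - rmem_released) > sk_rcvbuf; /* 0: data accepted       */

  	printf("old check drops: %d, __mptcp_rmem() check drops: %d\n", old_drop, new_drop);
  	return 0;
  }

With the stale charge included, the old check treats the msk as over its
limit and prunes legitimate data, while the fixed check sees only 100 KB
actually in use.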