From patchwork Thu Feb 3 12:32:54 2022
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 539857
From: Florian Westphal
Cc: Florian Westphal, Pablo Neira Ayuso
Subject: [PATCH 4.9.y 1/2] netfilter: nat: remove l4 protocol port rovers
Date: Thu, 3 Feb 2022 13:32:54 +0100
Message-Id: <20220203123255.18974-2-fw@strlen.de>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220203123255.18974-1-fw@strlen.de>
References: <20220203123255.18974-1-fw@strlen.de>
X-Mailing-List: stable@vger.kernel.org

commit 6ed5943f8735e2b778d92ea4d9805c0a1d89bc2b upstream.

This is a leftover from the days when single-CPU systems were common:
store the last port used to resolve a clash and use it as the starting
point when the next conflict needs to be resolved.

When there are parallel attempts to connect to the same address:port
pair, it is likely that both cores end up computing the same "available"
port, as both use the same starting port, and newly used ports won't
become visible to other cores until the conntrack entry is confirmed
later.  One of the cores then has to drop the packet at insertion time
because the chosen new tuple turns out to be in use after all.

Let's simplify this: remove the port rover and use a pseudo-random
starting point.

Note that this doesn't make netfilter default to 'fully random' mode;
the 'rover' was only used if NAT could not reuse the source port as-is.
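As an illustrative aside (not part of the patch), the following minimal
userspace C sketch shows the selection logic this change leads to: every
clash resolution starts from a fresh pseudo-random offset instead of a
shared per-protocol rover.  pick_port() and port_in_use() are
hypothetical stand-ins for the kernel's nf_nat_l4proto_unique_tuple()
and nf_nat_used_tuple(); the real code works on conntrack tuples, not
bare port numbers.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for nf_nat_used_tuple(): pretend ports
 * 10000-10002 already belong to confirmed connections. */
static bool port_in_use(uint16_t port)
{
        return port >= 10000 && port <= 10002;
}

/* Pick a source port from [min, min + range_size).  The starting
 * offset is a fresh pseudo-random draw (the kernel uses prandom_u32()),
 * not a value remembered from the previous clash. */
static uint16_t pick_port(uint16_t min, unsigned int range_size)
{
        uint32_t off = (uint32_t)rand();        /* stand-in for prandom_u32() */
        unsigned int i;

        for (i = 0; i < range_size; i++, off++) {
                uint16_t port = (uint16_t)(min + off % range_size);

                if (!port_in_use(port))
                        return port;
        }
        /* Range exhausted: return the last candidate; the kernel lets
         * the later conntrack insertion fail in this case. */
        return (uint16_t)(min + off % range_size);
}

int main(void)
{
        srand((unsigned int)time(NULL));
        printf("chosen port: %u\n", (unsigned int)pick_port(10000, 16));
        return 0;
}

Because each CPU draws its own starting offset, two CPUs resolving a
clash toward the same destination are unlikely to probe the same
candidate ports in lockstep, which is exactly the race the shared rover
made likely.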
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netfilter/nf_nat_l4proto.h | 2 +-
 net/netfilter/nf_nat_proto_common.c    | 7 ++-----
 net/netfilter/nf_nat_proto_dccp.c      | 5 +----
 net/netfilter/nf_nat_proto_sctp.c      | 5 +----
 net/netfilter/nf_nat_proto_tcp.c       | 5 +----
 net/netfilter/nf_nat_proto_udp.c       | 5 +----
 net/netfilter/nf_nat_proto_udplite.c   | 5 +----
 7 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h
index 12f4cc841b6e..630f0f5c3fa3 100644
--- a/include/net/netfilter/nf_nat_l4proto.h
+++ b/include/net/netfilter/nf_nat_l4proto.h
@@ -64,7 +64,7 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
                                  struct nf_conntrack_tuple *tuple,
                                  const struct nf_nat_range *range,
                                  enum nf_nat_manip_type maniptype,
-                                 const struct nf_conn *ct, u16 *rover);
+                                 const struct nf_conn *ct);
 
 int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[],
                                    struct nf_nat_range *range);
diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
index 7d7466dbf663..ac57e47aded2 100644
--- a/net/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -38,8 +38,7 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
                                  struct nf_conntrack_tuple *tuple,
                                  const struct nf_nat_range *range,
                                  enum nf_nat_manip_type maniptype,
-                                 const struct nf_conn *ct,
-                                 u16 *rover)
+                                 const struct nf_conn *ct)
 {
         unsigned int range_size, min, max, i;
         __be16 *portptr;
@@ -84,15 +83,13 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
         } else if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
                 off = prandom_u32();
         } else {
-                off = *rover;
+                off = prandom_u32();
         }
 
         for (i = 0; ; ++off) {
                 *portptr = htons(min + off % range_size);
                 if (++i != range_size && nf_nat_used_tuple(tuple, ct))
                         continue;
-                if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL))
-                        *rover = off;
                 return;
         }
 }
diff --git a/net/netfilter/nf_nat_proto_dccp.c b/net/netfilter/nf_nat_proto_dccp.c
index 15c47b246d0d..e7d27c083393 100644
--- a/net/netfilter/nf_nat_proto_dccp.c
+++ b/net/netfilter/nf_nat_proto_dccp.c
@@ -20,8 +20,6 @@
 #include 
 #include 
 
-static u_int16_t dccp_port_rover;
-
 static void
 dccp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                   struct nf_conntrack_tuple *tuple,
@@ -29,8 +27,7 @@ dccp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                   enum nf_nat_manip_type maniptype,
                   const struct nf_conn *ct)
 {
-        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
-                                    &dccp_port_rover);
+        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct);
 }
 
 static bool
diff --git a/net/netfilter/nf_nat_proto_sctp.c b/net/netfilter/nf_nat_proto_sctp.c
index cbc7ade1487b..b839373716e8 100644
--- a/net/netfilter/nf_nat_proto_sctp.c
+++ b/net/netfilter/nf_nat_proto_sctp.c
@@ -14,8 +14,6 @@
 #include 
 
-static u_int16_t nf_sctp_port_rover;
-
 static void
 sctp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                   struct nf_conntrack_tuple *tuple,
@@ -23,8 +21,7 @@ sctp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                   enum nf_nat_manip_type maniptype,
                   const struct nf_conn *ct)
 {
-        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
-                                    &nf_sctp_port_rover);
+        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct);
 }
 
 static bool
diff --git a/net/netfilter/nf_nat_proto_tcp.c b/net/netfilter/nf_nat_proto_tcp.c
index 4f8820fc5148..882e79c6df73 100644
--- a/net/netfilter/nf_nat_proto_tcp.c
+++ b/net/netfilter/nf_nat_proto_tcp.c
@@ -18,8 +18,6 @@
 #include 
 #include 
 
-static u16 tcp_port_rover;
-
 static void
 tcp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                  struct nf_conntrack_tuple *tuple,
@@ -27,8 +25,7 @@ tcp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                  enum nf_nat_manip_type maniptype,
                  const struct nf_conn *ct)
 {
-        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
-                                    &tcp_port_rover);
+        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct);
 }
 
 static bool
diff --git a/net/netfilter/nf_nat_proto_udp.c b/net/netfilter/nf_nat_proto_udp.c
index b1e627227b6e..ed91bdd8857c 100644
--- a/net/netfilter/nf_nat_proto_udp.c
+++ b/net/netfilter/nf_nat_proto_udp.c
@@ -17,8 +17,6 @@
 #include 
 #include 
 
-static u16 udp_port_rover;
-
 static void
 udp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                  struct nf_conntrack_tuple *tuple,
@@ -26,8 +24,7 @@ udp_unique_tuple(const struct nf_nat_l3proto *l3proto,
                  enum nf_nat_manip_type maniptype,
                  const struct nf_conn *ct)
 {
-        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
-                                    &udp_port_rover);
+        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct);
 }
 
 static bool
diff --git a/net/netfilter/nf_nat_proto_udplite.c b/net/netfilter/nf_nat_proto_udplite.c
index 58340c97bd83..8be265378de9 100644
--- a/net/netfilter/nf_nat_proto_udplite.c
+++ b/net/netfilter/nf_nat_proto_udplite.c
@@ -17,8 +17,6 @@
 #include 
 #include 
 
-static u16 udplite_port_rover;
-
 static void
 udplite_unique_tuple(const struct nf_nat_l3proto *l3proto,
                      struct nf_conntrack_tuple *tuple,
@@ -26,8 +24,7 @@ udplite_unique_tuple(const struct nf_nat_l3proto *l3proto,
                      enum nf_nat_manip_type maniptype,
                      const struct nf_conn *ct)
 {
-        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct,
-                                    &udplite_port_rover);
+        nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct);
 }
 
 static bool

From patchwork Thu Feb 3 12:41:55 2022
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 539855
From: Florian Westphal
Cc: Florian Westphal, Pablo Neira Ayuso, Vimal Agrawal
Subject: [PATCH 4.14.y 2/2] netfilter: nat: limit port clash resolution attempts
Date: Thu, 3 Feb 2022 13:41:55 +0100
Message-Id: <20220203124155.16693-3-fw@strlen.de>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220203124155.16693-1-fw@strlen.de>
References: <20220203124155.16693-1-fw@strlen.de>
X-Mailing-List: stable@vger.kernel.org

commit a504b703bb1da526a01593da0e4be2af9d9f5fa8 upstream.
When almost all or even all available ports are taken, clash resolution
can take a very long time, resulting in a soft lockup.  This can happen
when many to-be-NATed hosts connect to the same destination:port (e.g.
a proxy) and all connections pass through the same SNAT.

Pick a random offset in the acceptable range, then try an ever smaller
number of adjacent port numbers, until either the limit is reached or a
usable port is found.  This results in at most 248 attempts
(128 + 64 + 32 + 16 + 8, i.e. 4 restarts with a new search offset)
instead of 64000+.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Vimal Agrawal
---
 net/netfilter/nf_nat_proto_common.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
index ac57e47aded2..a4f709a3cbac 100644
--- a/net/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -40,9 +40,10 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
                                  enum nf_nat_manip_type maniptype,
                                  const struct nf_conn *ct)
 {
-        unsigned int range_size, min, max, i;
+        unsigned int range_size, min, max, i, attempts;
         __be16 *portptr;
-        u_int16_t off;
+        u16 off;
+        static const unsigned int max_attempts = 128;
 
         if (maniptype == NF_NAT_MANIP_SRC)
                 portptr = &tuple->src.u.all;
@@ -86,12 +87,28 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
                 off = prandom_u32();
         }
 
-        for (i = 0; ; ++off) {
+        attempts = range_size;
+        if (attempts > max_attempts)
+                attempts = max_attempts;
+
+        /* We are in softirq; doing a search of the entire range risks
+         * soft lockup when all tuples are already used.
+         *
+         * If we can't find any free port from first offset, pick a new
+         * one and try again, with ever smaller search window.
+         */
+another_round:
+        for (i = 0; i < attempts; i++, off++) {
                 *portptr = htons(min + off % range_size);
-                if (++i != range_size && nf_nat_used_tuple(tuple, ct))
-                        continue;
-                return;
+                if (!nf_nat_used_tuple(tuple, ct))
+                        return;
         }
+
+        if (attempts >= range_size || attempts < 16)
+                return;
+        attempts /= 2;
+        off = prandom_u32();
+        goto another_round;
 }
 EXPORT_SYMBOL_GPL(nf_nat_l4proto_unique_tuple);
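As an illustrative aside (not part of the patch), the following minimal
userspace C sketch mirrors the bounded search introduced above.
find_free_port() and port_in_use() are hypothetical stand-ins for the
kernel helpers; the real code keeps the last candidate in the tuple and
lets a later conntrack insertion failure drop the clashing packet.  With
a large port range the worst case is 128 + 64 + 32 + 16 + 8 = 248
probes, matching the commit message.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_ATTEMPTS 128        /* mirrors max_attempts in the patch */

static unsigned int probes;

/* Hypothetical stand-in for nf_nat_used_tuple(): here every port is
 * reported busy, to exercise the worst case of the bounded search. */
static bool port_in_use(uint16_t port)
{
        (void)port;
        probes++;
        return true;
}

/* Probe at most `attempts` adjacent ports starting at a random offset;
 * on failure restart with a new offset and half the window, giving up
 * once the window would shrink below 16. */
static bool find_free_port(uint16_t min, unsigned int range_size, uint16_t *out)
{
        unsigned int attempts = range_size < MAX_ATTEMPTS ? range_size : MAX_ATTEMPTS;
        uint32_t off = (uint32_t)rand();        /* stand-in for prandom_u32() */
        unsigned int i;

        for (;;) {
                for (i = 0; i < attempts; i++, off++) {
                        uint16_t port = (uint16_t)(min + off % range_size);

                        if (!port_in_use(port)) {
                                *out = port;
                                return true;
                        }
                }
                /* Whole range searched, or window too small for another
                 * restart: give up.  (The kernel keeps the last candidate
                 * and lets the later insertion drop the clash.) */
                if (attempts >= range_size || attempts < 16)
                        return false;
                attempts /= 2;
                off = (uint32_t)rand();         /* fresh starting point */
        }
}

int main(void)
{
        uint16_t port;

        srand((unsigned int)time(NULL));
        if (!find_free_port(1024, 64512, &port))
                printf("no free port after %u probes\n", probes);  /* prints 248 */
        return 0;
}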