From patchwork Thu Feb 3 12:41:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 540365 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25723C433EF for ; Thu, 3 Feb 2022 12:42:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239566AbiBCMmJ (ORCPT ); Thu, 3 Feb 2022 07:42:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230213AbiBCMmJ (ORCPT ); Thu, 3 Feb 2022 07:42:09 -0500 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42D59C061714; Thu, 3 Feb 2022 04:42:09 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1nFbR9-0000Xd-Pv; Thu, 03 Feb 2022 13:42:07 +0100 From: Florian Westphal To: Cc: , Florian Westphal , Pablo Neira Ayuso Subject: [PATCH 4.14.y 1/2] netfilter: nat: remove l4 protocol port rovers Date: Thu, 3 Feb 2022 13:41:54 +0100 Message-Id: <20220203124155.16693-2-fw@strlen.de> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220203124155.16693-1-fw@strlen.de> References: <20220203124155.16693-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org commit 6ed5943f8735e2b778d92ea4d9805c0a1d89bc2b upstream. This is a leftover from days where single-cpu systems were common: Store last port used to resolve a clash to use it as a starting point when the next conflict needs to be resolved. When we have parallel attempt to connect to same address:port pair, its likely that both cores end up computing the same "available" port, as both use same starting port, and newly used ports won't become visible to other cores until the conntrack gets confirmed later. One of the cores then has to drop the packet at insertion time because the chosen new tuple turns out to be in use after all. Lets simplify this: remove port rover and use a pseudo-random starting point. Note that this doesn't make netfilter default to 'fully random' mode; the 'rover' was only used if NAT could not reuse source port as-is. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_nat_l4proto.h | 2 +- net/netfilter/nf_nat_proto_common.c | 7 ++----- net/netfilter/nf_nat_proto_dccp.c | 5 +---- net/netfilter/nf_nat_proto_sctp.c | 5 +---- net/netfilter/nf_nat_proto_tcp.c | 5 +---- net/netfilter/nf_nat_proto_udp.c | 10 ++-------- 6 files changed, 8 insertions(+), 26 deletions(-) diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h index 67835ff8a2d9..103ecea6afdb 100644 --- a/include/net/netfilter/nf_nat_l4proto.h +++ b/include/net/netfilter/nf_nat_l4proto.h @@ -74,7 +74,7 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, const struct nf_nat_range *range, enum nf_nat_manip_type maniptype, - const struct nf_conn *ct, u16 *rover); + const struct nf_conn *ct); int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[], struct nf_nat_range *range); diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c index 7d7466dbf663..ac57e47aded2 100644 --- a/net/netfilter/nf_nat_proto_common.c +++ b/net/netfilter/nf_nat_proto_common.c @@ -38,8 +38,7 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, const struct nf_nat_range *range, enum nf_nat_manip_type maniptype, - const struct nf_conn *ct, - u16 *rover) + const struct nf_conn *ct) { unsigned int range_size, min, max, i; __be16 *portptr; @@ -84,15 +83,13 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto, } else if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) { off = prandom_u32(); } else { - off = *rover; + off = prandom_u32(); } for (i = 0; ; ++off) { *portptr = htons(min + off % range_size); if (++i != range_size && nf_nat_used_tuple(tuple, ct)) continue; - if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) - *rover = off; return; } } diff --git a/net/netfilter/nf_nat_proto_dccp.c b/net/netfilter/nf_nat_proto_dccp.c index 269fcd5dc34c..04c671300a14 100644 --- a/net/netfilter/nf_nat_proto_dccp.c +++ b/net/netfilter/nf_nat_proto_dccp.c @@ -18,8 +18,6 @@ #include #include -static u_int16_t dccp_port_rover; - static void dccp_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, @@ -27,8 +25,7 @@ dccp_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct, - &dccp_port_rover); + nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct); } static bool diff --git a/net/netfilter/nf_nat_proto_sctp.c b/net/netfilter/nf_nat_proto_sctp.c index c57ee3240b1d..7329c9b1dc1e 100644 --- a/net/netfilter/nf_nat_proto_sctp.c +++ b/net/netfilter/nf_nat_proto_sctp.c @@ -12,8 +12,6 @@ #include -static u_int16_t nf_sctp_port_rover; - static void sctp_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, @@ -21,8 +19,7 @@ sctp_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct, - &nf_sctp_port_rover); + nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct); } static bool diff --git a/net/netfilter/nf_nat_proto_tcp.c b/net/netfilter/nf_nat_proto_tcp.c index 4f8820fc5148..882e79c6df73 100644 --- a/net/netfilter/nf_nat_proto_tcp.c +++ b/net/netfilter/nf_nat_proto_tcp.c @@ -18,8 +18,6 @@ #include #include -static u16 tcp_port_rover; - static void tcp_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, @@ -27,8 +25,7 @@ tcp_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct, - &tcp_port_rover); + nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct); } static bool diff --git a/net/netfilter/nf_nat_proto_udp.c b/net/netfilter/nf_nat_proto_udp.c index 167ad0dd269c..f48bacd38d9d 100644 --- a/net/netfilter/nf_nat_proto_udp.c +++ b/net/netfilter/nf_nat_proto_udp.c @@ -17,8 +17,6 @@ #include #include -static u16 udp_port_rover; - static void udp_unique_tuple(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, @@ -26,8 +24,7 @@ udp_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct, - &udp_port_rover); + nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct); } static void @@ -78,8 +75,6 @@ static bool udp_manip_pkt(struct sk_buff *skb, } #ifdef CONFIG_NF_NAT_PROTO_UDPLITE -static u16 udplite_port_rover; - static bool udplite_manip_pkt(struct sk_buff *skb, const struct nf_nat_l3proto *l3proto, unsigned int iphdroff, unsigned int hdroff, @@ -103,8 +98,7 @@ udplite_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct, - &udplite_port_rover); + nf_nat_l4proto_unique_tuple(l3proto, tuple, range, maniptype, ct); } const struct nf_nat_l4proto nf_nat_l4proto_udplite = { From patchwork Thu Feb 3 12:32:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 540366 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BBC3C433FE for ; Thu, 3 Feb 2022 12:34:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1350553AbiBCMeK (ORCPT ); Thu, 3 Feb 2022 07:34:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230213AbiBCMeJ (ORCPT ); Thu, 3 Feb 2022 07:34:09 -0500 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6788DC06173B; Thu, 3 Feb 2022 04:34:09 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1nFbJP-0000UM-VX; Thu, 03 Feb 2022 13:34:08 +0100 From: Florian Westphal To: Cc: , Florian Westphal , Pablo Neira Ayuso Subject: [PATCH 4.9.y 2/2] netfilter: nat: limit port clash resolution attempts Date: Thu, 3 Feb 2022 13:32:55 +0100 Message-Id: <20220203123255.18974-3-fw@strlen.de> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220203123255.18974-1-fw@strlen.de> References: <20220203123255.18974-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org commit a504b703bb1da526a01593da0e4be2af9d9f5fa8 upstream. In case almost or all available ports are taken, clash resolution can take a very long time, resulting in soft lockup. This can happen when many to-be-natted hosts connect to same destination:port (e.g. a proxy) and all connections pass the same SNAT. Pick a random offset in the acceptable range, then try ever smaller number of adjacent port numbers, until either the limit is reached or a useable port was found. This results in at most 248 attempts (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset) instead of 64000+, Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_nat_proto_common.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c index ac57e47aded2..a4f709a3cbac 100644 --- a/net/netfilter/nf_nat_proto_common.c +++ b/net/netfilter/nf_nat_proto_common.c @@ -40,9 +40,10 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto, enum nf_nat_manip_type maniptype, const struct nf_conn *ct) { - unsigned int range_size, min, max, i; + unsigned int range_size, min, max, i, attempts; __be16 *portptr; - u_int16_t off; + u16 off; + static const unsigned int max_attempts = 128; if (maniptype == NF_NAT_MANIP_SRC) portptr = &tuple->src.u.all; @@ -86,12 +87,28 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto, off = prandom_u32(); } - for (i = 0; ; ++off) { + attempts = range_size; + if (attempts > max_attempts) + attempts = max_attempts; + + /* We are in softirq; doing a search of the entire range risks + * soft lockup when all tuples are already used. + * + * If we can't find any free port from first offset, pick a new + * one and try again, with ever smaller search window. + */ +another_round: + for (i = 0; i < attempts; i++, off++) { *portptr = htons(min + off % range_size); - if (++i != range_size && nf_nat_used_tuple(tuple, ct)) - continue; - return; + if (!nf_nat_used_tuple(tuple, ct)) + return; } + + if (attempts >= range_size || attempts < 16) + return; + attempts /= 2; + off = prandom_u32(); + goto another_round; } EXPORT_SYMBOL_GPL(nf_nat_l4proto_unique_tuple);