From patchwork Sun Mar 8 18:16:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 222852 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.9 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C795C10DCE for ; Sun, 8 Mar 2020 18:17:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4911B20663 for ; Sun, 8 Mar 2020 18:17:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.jp header.i=@amazon.co.jp header.b="rTJXFMhj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726415AbgCHSRF (ORCPT ); Sun, 8 Mar 2020 14:17:05 -0400 Received: from smtp-fw-2101.amazon.com ([72.21.196.25]:53508 "EHLO smtp-fw-2101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726279AbgCHSRF (ORCPT ); Sun, 8 Mar 2020 14:17:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1583691424; x=1615227424; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=PhAn8Cftvli3LnbLzEEJWnBuBdexWvvymecj3+nlTh0=; b=rTJXFMhjpbw7QWBY9UZPcbR7K+pjIRdYAX9WS+8p7DIYFjKpEsh3fT1Q L+qfz/BoxSBFhdbpuc26cwHm387U4DlW7M/oxPE5EAco5sWdCfT63gqJ6 tptjhhPPCYFENyzEU+nEhLmAkMcowAat6HuOXJHhJQXGqy6Gd3HGKV53k Y=; IronPort-SDR: BiDr9v+MMhYOSMya3TDRYoOChM73+ug1bQObbe6/faf/WdcOLc1BBKnG3Tyfs/E7O+GffMLaSf eA9zWNP6hSZQ== X-IronPort-AV: E=Sophos;i="5.70,530,1574121600"; d="scan'208";a="20529942" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1e-62350142.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-2101.iad2.amazon.com with ESMTP; 08 Mar 2020 18:17:03 +0000 Received: from EX13MTAUWA001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1e-62350142.us-east-1.amazon.com (Postfix) with ESMTPS id 9A863A2316; Sun, 8 Mar 2020 18:17:01 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWA001.ant.amazon.com (10.43.160.58) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Sun, 8 Mar 2020 18:17:00 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.160.100) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Sun, 8 Mar 2020 18:16:56 +0000 From: Kuniyuki Iwashima To: , , , CC: , , , Subject: [PATCH v4 net-next 3/5] tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID. Date: Mon, 9 Mar 2020 03:16:13 +0900 Message-ID: <20200308181615.90135-4-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20200308181615.90135-1-kuniyu@amazon.co.jp> References: <20200308181615.90135-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.160.100] X-ClientProxiedBy: EX13D22UWC001.ant.amazon.com (10.43.162.192) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If there is no TCP_LISTEN socket on a ephemeral port, we can bind multiple sockets having SO_REUSEADDR to the same port. Then if all sockets bound to the port have also SO_REUSEPORT enabled and have the same EUID, all of them can be listened. This is not safe. Let's say, an application has root privilege and binds sockets to an ephemeral port with both of SO_REUSEADDR and SO_REUSEPORT. When none of sockets is not listened yet, a malicious user can use sudo, exhaust ephemeral ports, and bind sockets to the same ephemeral port, so he or she can call listen and steal the port. To prevent this issue, we must not bind more than one sockets that have the same EUID and both of SO_REUSEADDR and SO_REUSEPORT. On the other hand, if the sockets have different EUIDs, the issue above does not occur. After sockets with different EUIDs are bound to the same port and one of them is listened, no more socket can be listened. This is because the condition below is evaluated true and listen() for the second socket fails. } else if (!reuseport_ok || !reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2)))) { if (inet_rcv_saddr_equal(sk, sk2, true)) break; } Therefore, on the same port, we cannot do listen() for multiple sockets with different EUIDs and any other listen syscalls fail, so the problem does not happen. In this case, we can still call connect() for other sockets that cannot be listened, so we have to succeed to call bind() in order to fully utilize 4-tuples. Summarizing the above, we should be able to bind only one socket having SO_REUSEADDR and SO_REUSEPORT per EUID. Signed-off-by: Kuniyuki Iwashima --- net/ipv4/inet_connection_sock.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index cddeab240ea6..d27ed5fe7147 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -131,7 +131,7 @@ static int inet_csk_bind_conflict(const struct sock *sk, { struct sock *sk2; bool reuse = sk->sk_reuse; - bool reuseport = !!sk->sk_reuseport && reuseport_ok; + bool reuseport = !!sk->sk_reuseport; kuid_t uid = sock_i_uid((struct sock *)sk); /* @@ -148,10 +148,16 @@ static int inet_csk_bind_conflict(const struct sock *sk, sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) { if (reuse && sk2->sk_reuse && sk2->sk_state != TCP_LISTEN) { - if (!relax && + if ((!relax || + (!reuseport_ok && + reuseport && sk2->sk_reuseport && + !rcu_access_pointer(sk->sk_reuseport_cb) && + (sk2->sk_state == TCP_TIME_WAIT || + uid_eq(uid, sock_i_uid(sk2))))) && inet_rcv_saddr_equal(sk, sk2, true)) break; - } else if (!reuseport || !sk2->sk_reuseport || + } else if (!reuseport_ok || + !reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2)))) { From patchwork Sun Mar 8 18:21:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 222851 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.9 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 231E4C10DCE for ; Sun, 8 Mar 2020 18:23:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E479320674 for ; Sun, 8 Mar 2020 18:23:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.jp header.i=@amazon.co.jp header.b="hnfDE98I" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726330AbgCHSXB (ORCPT ); Sun, 8 Mar 2020 14:23:01 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:62228 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726279AbgCHSXA (ORCPT ); Sun, 8 Mar 2020 14:23:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1583691778; x=1615227778; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=3mkSj8vWIuyo8kfcrz+ekAJnprGunmVLQ96tqiCY5kg=; b=hnfDE98I2tXrvT5lTLJfWZNbDcwuhhmzsWuRmrcjDW1x/CThQg/zveql 4m01HCswldAN491Mar3HiQMtmRHeZzTe/CCpiZvrGX+U1CTWFf5Hr/Glv RkCuE/i6TU3tsv6pC6WZsmMgaZFQK8Q8mpG1TsgAhIPDAywsMKMy0R9Qc g=; IronPort-SDR: VdBV4vV6PFb7oZuG8FMHsAOOwzD8RirkDrFgr8PfTNUra1fAB9qmGQF+OFiBPCK7sIFWy9zJuH E0YbCNBECsQA== X-IronPort-AV: E=Sophos;i="5.70,530,1574121600"; d="scan'208";a="31289646" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-2c665b5d.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 08 Mar 2020 18:22:57 +0000 Received: from EX13MTAUWA001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-2c665b5d.us-east-1.amazon.com (Postfix) with ESMTPS id 3785DA21EC; Sun, 8 Mar 2020 18:22:54 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWA001.ant.amazon.com (10.43.160.58) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Sun, 8 Mar 2020 18:22:54 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.160.100) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Sun, 8 Mar 2020 18:22:50 +0000 From: Kuniyuki Iwashima To: CC: , , , , , , Subject: [PATCH v4 net-next 5/5] selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized. Date: Mon, 9 Mar 2020 03:21:26 +0900 Message-ID: <20200308182126.92852-1-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20200308181615.90135-1-kuniyu@amazon.co.jp> References: <20200308181615.90135-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.160.100] X-ClientProxiedBy: EX13D22UWC002.ant.amazon.com (10.43.162.29) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This commit adds a test to check if we can fully utilize 4-tuples for connect() when all ephemeral ports are exhausted. The test program changes the local port range to use only one port and binds two sockets with or without SO_REUSEADDR and SO_REUSEPORT, and with the same EUID or with different EUIDs, then do listen(). We should be able to bind only one socket having both SO_REUSEADDR and SO_REUSEPORT per EUID, which restriction is to prevent unintentional listen(). Signed-off-by: Kuniyuki Iwashima --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + .../selftests/net/reuseaddr_ports_exhausted.c | 162 ++++++++++++++++++ .../net/reuseaddr_ports_exhausted.sh | 35 ++++ 4 files changed, 200 insertions(+) create mode 100644 tools/testing/selftests/net/reuseaddr_ports_exhausted.c create mode 100755 tools/testing/selftests/net/reuseaddr_ports_exhausted.sh diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index ecc52d4c034d..91f9aea853b1 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -23,3 +23,4 @@ so_txtime tcp_fastopen_backup_key nettest fin_ack_lat +reuseaddr_ports_exhausted \ No newline at end of file diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index b5694196430a..ded1aa394880 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -12,6 +12,7 @@ TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_a TEST_PROGS += test_vxlan_fdb_changelink.sh so_txtime.sh ipv6_flowlabel.sh TEST_PROGS += tcp_fastopen_backup_key.sh fcnal-test.sh l2tp.sh traceroute.sh TEST_PROGS += fin_ack_lat.sh +TEST_PROGS += reuseaddr_ports_exhausted.sh TEST_PROGS_EXTENDED := in_netns.sh TEST_GEN_FILES = socket nettest TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any @@ -22,6 +23,7 @@ TEST_GEN_FILES += tcp_fastopen_backup_key TEST_GEN_FILES += fin_ack_lat TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls +TEST_GEN_FILES += reuseaddr_ports_exhausted KSFT_KHDR_INSTALL := 1 include ../lib.mk diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c new file mode 100644 index 000000000000..7b01b7c2ec10 --- /dev/null +++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Check if we can fully utilize 4-tuples for connect(). + * + * Rules to bind sockets to the same port when all ephemeral ports are + * exhausted. + * + * 1. if there are TCP_LISTEN sockets on the port, fail to bind. + * 2. if there are sockets without SO_REUSEADDR, fail to bind. + * 3. if SO_REUSEADDR is disabled, fail to bind. + * 4. if SO_REUSEADDR is enabled and SO_REUSEPORT is disabled, + * succeed to bind. + * 5. if SO_REUSEADDR and SO_REUSEPORT are enabled and + * there is no socket having the both options and the same EUID, + * succeed to bind. + * 6. fail to bind. + * + * Author: Kuniyuki Iwashima + */ +#include +#include +#include +#include +#include +#include "../kselftest_harness.h" + +struct reuse_opts { + int reuseaddr[2]; + int reuseport[2]; +}; + +struct reuse_opts unreusable_opts[12] = { + {0, 0, 0, 0}, + {0, 0, 0, 1}, + {0, 0, 1, 0}, + {0, 0, 1, 1}, + {0, 1, 0, 0}, + {0, 1, 0, 1}, + {0, 1, 1, 0}, + {0, 1, 1, 1}, + {1, 0, 0, 0}, + {1, 0, 0, 1}, + {1, 0, 1, 0}, + {1, 0, 1, 1}, +}; + +struct reuse_opts reusable_opts[4] = { + {1, 1, 0, 0}, + {1, 1, 0, 1}, + {1, 1, 1, 0}, + {1, 1, 1, 1}, +}; + +int bind_port(struct __test_metadata *_metadata, int reuseaddr, int reuseport) +{ + struct sockaddr_in local_addr; + int len = sizeof(local_addr); + int fd, ret; + + fd = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_NE(-1, fd) TH_LOG("failed to open socket."); + + ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr, sizeof(int)); + ASSERT_EQ(0, ret) TH_LOG("failed to setsockopt: SO_REUSEADDR."); + + ret = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &reuseport, sizeof(int)); + ASSERT_EQ(0, ret) TH_LOG("failed to setsockopt: SO_REUSEPORT."); + + local_addr.sin_family = AF_INET; + local_addr.sin_addr.s_addr = inet_addr("127.0.0.1"); + local_addr.sin_port = 0; + + if (bind(fd, (struct sockaddr *)&local_addr, len) == -1) { + close(fd); + return -1; + } + + return fd; +} + +TEST(reuseaddr_ports_exhausted_unreusable) +{ + struct reuse_opts *opts; + int i, j, fd[2]; + + for (i = 0; i < 12; i++) { + opts = &unreusable_opts[i]; + + for (j = 0; j < 2; j++) + fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]); + + ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind."); + EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind."); + + for (j = 0; j < 2; j++) + if (fd[j] != -1) + close(fd[j]); + } +} + +TEST(reuseaddr_ports_exhausted_reusable_same_euid) +{ + struct reuse_opts *opts; + int i, j, fd[2]; + + for (i = 0; i < 4; i++) { + opts = &reusable_opts[i]; + + for (j = 0; j < 2; j++) + fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]); + + ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind."); + + if (opts->reuseport[0] && opts->reuseport[1]) { + EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened."); + } else { + EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations."); + } + + for (j = 0; j < 2; j++) + if (fd[j] != -1) + close(fd[j]); + } +} + +TEST(reuseaddr_ports_exhausted_reusable_different_euid) +{ + struct reuse_opts *opts; + int i, j, ret, fd[2]; + uid_t euid[2] = {10, 20}; + + for (i = 0; i < 4; i++) { + opts = &reusable_opts[i]; + + for (j = 0; j < 2; j++) { + ret = seteuid(euid[j]); + ASSERT_EQ(0, ret) TH_LOG("failed to seteuid: %d.", euid[j]); + + fd[j] = bind_port(_metadata, opts->reuseaddr[j], opts->reuseport[j]); + + ret = seteuid(0); + ASSERT_EQ(0, ret) TH_LOG("failed to seteuid: 0."); + } + + ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind."); + EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind because one socket can be bound in each euid."); + + if (fd[1] != -1) { + ret = listen(fd[0], 5); + ASSERT_EQ(0, ret) TH_LOG("failed to listen."); + + ret = listen(fd[1], 5); + EXPECT_EQ(-1, ret) TH_LOG("should fail to listen because only one uid reserves the port in TCP_LISTEN."); + } + + for (j = 0; j < 2; j++) + if (fd[j] != -1) + close(fd[j]); + } +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh b/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh new file mode 100755 index 000000000000..20e3a2913d06 --- /dev/null +++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.sh @@ -0,0 +1,35 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Run tests when all ephemeral ports are exhausted. +# +# Author: Kuniyuki Iwashima + +set +x +set -e + +readonly NETNS="ns-$(mktemp -u XXXXXX)" + +setup() { + ip netns add "${NETNS}" + ip -netns "${NETNS}" link set lo up + ip netns exec "${NETNS}" \ + sysctl -w net.ipv4.ip_local_port_range="32768 32768" \ + > /dev/null 2>&1 + ip netns exec "${NETNS}" \ + sysctl -w net.ipv4.ip_autobind_reuse=1 > /dev/null 2>&1 +} + +cleanup() { + ip netns del "${NETNS}" +} + +trap cleanup EXIT +setup + +do_test() { + ip netns exec "${NETNS}" ./reuseaddr_ports_exhausted +} + +do_test +echo "tests done"