From patchwork Mon Dec 7 13:24:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339541 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE342C197BF for ; Mon, 7 Dec 2020 13:26:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8D4E123382 for ; Mon, 7 Dec 2020 13:26:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726531AbgLGN0L (ORCPT ); Mon, 7 Dec 2020 08:26:11 -0500 Received: from smtp-fw-2101.amazon.com ([72.21.196.25]:36461 "EHLO smtp-fw-2101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726026AbgLGN0L (ORCPT ); Mon, 7 Dec 2020 08:26:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347570; x=1638883570; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=HKjIsmqmU/4GV4rzTHMtQpJccy4bICOTjf5HS8mC/Ik=; b=TKG0SRe8m9AkZMK5hQJmowRCbcgX40nS4KKyaCI8ZgdrBBNQPv0fVAX6 aTGNwLdU6HjW/PaBY/Cp3XCHAIMJRZGIE+h1r3zb+CVZiNUdoqn9oQ3JE T17kAnhb8Bxt+CilrvujV9M5lBFduiQ8rckpbtIi68CuJrMBmtPz7oSmA 4=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="67699485" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1a-715bee71.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-2101.iad2.amazon.com with ESMTP; 07 Dec 2020 13:25:37 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1a-715bee71.us-east-1.amazon.com (Postfix) with ESMTPS id C5A93A1DD7; Mon, 7 Dec 2020 13:25:34 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:25:33 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:25:29 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 bpf-next 01/13] tcp: Allow TCP_CLOSE sockets to hold the reuseport group. Date: Mon, 7 Dec 2020 22:24:44 +0900 Message-ID: <20201207132456.65472-2-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch is a preparation patch to migrate incoming connections in the later commits and adds a field (num_closed_socks) to the struct sock_reuseport to allow TCP_CLOSE sockets to access to the reuseport group. When we close a listening socket, to migrate its connections to another listener in the same reuseport group, we have to handle two kinds of child sockets. One is that a listening socket has a reference to, and the other is not. The former is the TCP_ESTABLISHED/TCP_SYN_RECV sockets, and they are in the accept queue of their listening socket. So, we can pop them out and push them into another listener's queue at close() or shutdown() syscalls. On the other hand, the latter, the TCP_NEW_SYN_RECV socket is during the three-way handshake and not in the accept queue. Thus, we cannot access such sockets at close() or shutdown() syscalls. Accordingly, we have to migrate immature sockets after their listening socket has been closed. Currently, if their listening socket has been closed, TCP_NEW_SYN_RECV sockets are freed at receiving the final ACK or retransmitting SYN+ACKs. At that time, if we could select a new listener from the same reuseport group, no connection would be aborted. However, it is impossible because reuseport_detach_sock() sets NULL to sk_reuseport_cb and forbids access to the reuseport group from closed sockets. This patch allows TCP_CLOSE sockets to hold sk_reuseport_cb while any child socket references to them. The point is that reuseport_detach_sock() is called twice from inet_unhash() and sk_destruct(). At first, it decrements num_socks and increments num_closed_socks. Later, when all migrated connections are accepted, it decrements num_closed_socks and sets NULL to sk_reuseport_cb. By this change, closed sockets can keep sk_reuseport_cb until all child requests have been freed or accepted. Consequently calling listen() after shutdown() can cause EADDRINUSE or EBUSY in reuseport_add_sock() or inet_csk_bind_conflict() which expect that such sockets should not have the reuseport group. Therefore, this patch also loosens such validation rules so that the socket can listen again if it has the same reuseport group with other listening sockets. Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- include/net/sock_reuseport.h | 5 +++-- net/core/sock_reuseport.c | 39 +++++++++++++++++++++++---------- net/ipv4/inet_connection_sock.c | 7 ++++-- 3 files changed, 35 insertions(+), 16 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 505f1e18e9bf..0e558ca7afbf 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -13,8 +13,9 @@ extern spinlock_t reuseport_lock; struct sock_reuseport { struct rcu_head rcu; - u16 max_socks; /* length of socks */ - u16 num_socks; /* elements in socks */ + u16 max_socks; /* length of socks */ + u16 num_socks; /* elements in socks */ + u16 num_closed_socks; /* closed elements in socks */ /* The last synq overflow event timestamp of this * reuse->socks[] group. */ diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index bbdd3c7b6cb5..c26f4256ff41 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -98,14 +98,15 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) return NULL; more_reuse->num_socks = reuse->num_socks; + more_reuse->num_closed_socks = reuse->num_closed_socks; more_reuse->prog = reuse->prog; more_reuse->reuseport_id = reuse->reuseport_id; more_reuse->bind_inany = reuse->bind_inany; more_reuse->has_conns = reuse->has_conns; + more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); memcpy(more_reuse->socks, reuse->socks, reuse->num_socks * sizeof(struct sock *)); - more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); for (i = 0; i < reuse->num_socks; ++i) rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, @@ -152,8 +153,10 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) reuse = rcu_dereference_protected(sk2->sk_reuseport_cb, lockdep_is_held(&reuseport_lock)); old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb, - lockdep_is_held(&reuseport_lock)); - if (old_reuse && old_reuse->num_socks != 1) { + lockdep_is_held(&reuseport_lock)); + if (old_reuse == reuse) { + reuse->num_closed_socks--; + } else if (old_reuse && old_reuse->num_socks != 1) { spin_unlock_bh(&reuseport_lock); return -EBUSY; } @@ -174,8 +177,9 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) spin_unlock_bh(&reuseport_lock); - if (old_reuse) + if (old_reuse && old_reuse != reuse) call_rcu(&old_reuse->rcu, reuseport_free_rcu); + return 0; } EXPORT_SYMBOL(reuseport_add_sock); @@ -199,17 +203,28 @@ void reuseport_detach_sock(struct sock *sk) */ bpf_sk_reuseport_detach(sk); - rcu_assign_pointer(sk->sk_reuseport_cb, NULL); - - for (i = 0; i < reuse->num_socks; i++) { - if (reuse->socks[i] == sk) { - reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; - reuse->num_socks--; - if (reuse->num_socks == 0) - call_rcu(&reuse->rcu, reuseport_free_rcu); + if (sk->sk_protocol == IPPROTO_TCP && sk->sk_state == TCP_CLOSE) { + reuse->num_closed_socks--; + rcu_assign_pointer(sk->sk_reuseport_cb, NULL); + } else { + for (i = 0; i < reuse->num_socks; i++) { + if (reuse->socks[i] != sk) + continue; break; } + + reuse->num_socks--; + reuse->socks[i] = reuse->socks[reuse->num_socks]; + + if (sk->sk_protocol == IPPROTO_TCP) + reuse->num_closed_socks++; + else + rcu_assign_pointer(sk->sk_reuseport_cb, NULL); } + + if (reuse->num_socks + reuse->num_closed_socks == 0) + call_rcu(&reuse->rcu, reuseport_free_rcu); + spin_unlock_bh(&reuseport_lock); } EXPORT_SYMBOL(reuseport_detach_sock); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index f60869acbef0..1451aa9712b0 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -138,6 +138,7 @@ static int inet_csk_bind_conflict(const struct sock *sk, bool reuse = sk->sk_reuse; bool reuseport = !!sk->sk_reuseport; kuid_t uid = sock_i_uid((struct sock *)sk); + struct sock_reuseport *reuseport_cb = rcu_access_pointer(sk->sk_reuseport_cb); /* * Unlike other sk lookup places we do not check @@ -156,14 +157,16 @@ static int inet_csk_bind_conflict(const struct sock *sk, if ((!relax || (!reuseport_ok && reuseport && sk2->sk_reuseport && - !rcu_access_pointer(sk->sk_reuseport_cb) && + (!reuseport_cb || + reuseport_cb == rcu_access_pointer(sk2->sk_reuseport_cb)) && (sk2->sk_state == TCP_TIME_WAIT || uid_eq(uid, sock_i_uid(sk2))))) && inet_rcv_saddr_equal(sk, sk2, true)) break; } else if (!reuseport_ok || !reuseport || !sk2->sk_reuseport || - rcu_access_pointer(sk->sk_reuseport_cb) || + (reuseport_cb && + reuseport_cb != rcu_access_pointer(sk2->sk_reuseport_cb)) || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2)))) { if (inet_rcv_saddr_equal(sk, sk2, true)) From patchwork Mon Dec 7 13:24:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339540 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CDF79C4361B for ; Mon, 7 Dec 2020 13:27:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8EC7A235FF for ; Mon, 7 Dec 2020 13:27:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726661AbgLGN0v (ORCPT ); Mon, 7 Dec 2020 08:26:51 -0500 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:35272 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726174AbgLGN0v (ORCPT ); Mon, 7 Dec 2020 08:26:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347610; x=1638883610; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=ZgNA79r8oboVjgxvQ5hc/k4jl9iZmqF9kCrWvhttAso=; b=T+wdSdcLjqn3Dovn/2gfHPSv46v8GPzF/jc1xvAnHB40+A24MSgKEEaj n4QLHLZtglw/6wNxgC6d8AZP9bSgALP8uBwq6F9uFEuwbUusMvKMv17Ok m7WVARaj1ELCGfC5pp2Qz413hHXA+QqyVIxn68k0YSTL1FPpWUfKj60h+ M=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="101016343" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-42f764a0.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 07 Dec 2020 13:26:08 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1e-42f764a0.us-east-1.amazon.com (Postfix) with ESMTPS id A8983B3DD3; Mon, 7 Dec 2020 13:26:05 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:26:04 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:26:00 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , , Waiman Long Subject: [PATCH v2 bpf-next 03/13] Revert "locking/spinlocks: Remove the unused spin_lock_bh_nested() API" Date: Mon, 7 Dec 2020 22:24:46 +0900 Message-ID: <20201207132456.65472-4-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This reverts commit 607904c357c61adf20b8fd18af765e501d61a385 to use spin_lock_bh_nested() in the next commit. Link: https://lore.kernel.org/netdev/9d290a57-49e1-04cd-2487-262b0d7c5844@gmail.com/ Signed-off-by: Kuniyuki Iwashima CC: Waiman Long Acked-by: Waiman Long --- include/linux/spinlock.h | 8 ++++++++ include/linux/spinlock_api_smp.h | 2 ++ include/linux/spinlock_api_up.h | 1 + kernel/locking/spinlock.c | 8 ++++++++ 4 files changed, 19 insertions(+) diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 79897841a2cc..c020b375a071 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -227,6 +227,8 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock) #ifdef CONFIG_DEBUG_LOCK_ALLOC # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock_nested(lock, subclass) +# define raw_spin_lock_bh_nested(lock, subclass) \ + _raw_spin_lock_bh_nested(lock, subclass) # define raw_spin_lock_nest_lock(lock, nest_lock) \ do { \ @@ -242,6 +244,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock) # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock(((void)(subclass), (lock))) # define raw_spin_lock_nest_lock(lock, nest_lock) _raw_spin_lock(lock) +# define raw_spin_lock_bh_nested(lock, subclass) _raw_spin_lock_bh(lock) #endif #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) @@ -369,6 +372,11 @@ do { \ raw_spin_lock_nested(spinlock_check(lock), subclass); \ } while (0) +#define spin_lock_bh_nested(lock, subclass) \ +do { \ + raw_spin_lock_bh_nested(spinlock_check(lock), subclass);\ +} while (0) + #define spin_lock_nest_lock(lock, nest_lock) \ do { \ raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \ diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h index 19a9be9d97ee..d565fb6304f2 100644 --- a/include/linux/spinlock_api_smp.h +++ b/include/linux/spinlock_api_smp.h @@ -22,6 +22,8 @@ int in_lock_functions(unsigned long addr); void __lockfunc _raw_spin_lock(raw_spinlock_t *lock) __acquires(lock); void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass) __acquires(lock); +void __lockfunc _raw_spin_lock_bh_nested(raw_spinlock_t *lock, int subclass) + __acquires(lock); void __lockfunc _raw_spin_lock_nest_lock(raw_spinlock_t *lock, struct lockdep_map *map) __acquires(lock); diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h index d0d188861ad6..d3afef9d8dbe 100644 --- a/include/linux/spinlock_api_up.h +++ b/include/linux/spinlock_api_up.h @@ -57,6 +57,7 @@ #define _raw_spin_lock(lock) __LOCK(lock) #define _raw_spin_lock_nested(lock, subclass) __LOCK(lock) +#define _raw_spin_lock_bh_nested(lock, subclass) __LOCK(lock) #define _raw_read_lock(lock) __LOCK(lock) #define _raw_write_lock(lock) __LOCK(lock) #define _raw_spin_lock_bh(lock) __LOCK_BH(lock) diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c index 0ff08380f531..48e99ed1bdd8 100644 --- a/kernel/locking/spinlock.c +++ b/kernel/locking/spinlock.c @@ -363,6 +363,14 @@ void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass) } EXPORT_SYMBOL(_raw_spin_lock_nested); +void __lockfunc _raw_spin_lock_bh_nested(raw_spinlock_t *lock, int subclass) +{ + __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET); + spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_); + LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock); +} +EXPORT_SYMBOL(_raw_spin_lock_bh_nested); + unsigned long __lockfunc _raw_spin_lock_irqsave_nested(raw_spinlock_t *lock, int subclass) { From patchwork Mon Dec 7 13:24:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339539 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 795A1C4361B for ; Mon, 7 Dec 2020 13:27:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5512723382 for ; Mon, 7 Dec 2020 13:27:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726730AbgLGN1b (ORCPT ); Mon, 7 Dec 2020 08:27:31 -0500 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:36694 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725804AbgLGN1a (ORCPT ); Mon, 7 Dec 2020 08:27:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347650; x=1638883650; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=Vonv/Q6hCNxXl/Pnwc4Bge1XzPa1igFZTKnOjMRgcwA=; b=QeEghicxIcu1iQIlOViankC8PG6bWG7rldwwU3Q/lFsPymngP529tYf6 tu4Hh3OcdH0fn7Ymd5HC3Va4IbxwCEYwImMBNmN0h3pNErGslIpScjMYA O64s91Ubkhj+QFHlMkrzTixWJi5iCGzLHFii3CgyZ4A7DwqBIdtzwXZYX A=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="69561666" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1d-37fd6b3d.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 07 Dec 2020 13:26:49 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1d-37fd6b3d.us-east-1.amazon.com (Postfix) with ESMTPS id 2DEAD28481E; Mon, 7 Dec 2020 13:26:45 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:26:45 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:26:40 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 bpf-next 05/13] tcp: Set the new listener to migrated TFO requests. Date: Mon, 7 Dec 2020 22:24:48 +0900 Message-ID: <20201207132456.65472-6-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org A TFO request socket is only freed after BOTH 3WHS has completed (or aborted) and the child socket has been accepted (or its listener has been closed). Hence, depending on the order, there can be two kinds of request sockets in the accept queue. 3WHS -> accept : TCP_ESTABLISHED accept -> 3WHS : TCP_SYN_RECV Unlike TCP_ESTABLISHED socket, accept() does not free the request socket for TCP_SYN_RECV socket. It is freed later at reqsk_fastopen_remove(). Also, it accesses request_sock.rsk_listener. So, in order to complete TFO socket migration, we have to set the current listener to it at accept() before reqsk_fastopen_remove(). Reviewed-by: Benjamin Herrenschmidt Signed-off-by: Kuniyuki Iwashima --- net/ipv4/inet_connection_sock.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 5da38a756e4c..143590858c2e 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -500,6 +500,16 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) tcp_rsk(req)->tfo_listener) { spin_lock_bh(&queue->fastopenq.lock); if (tcp_rsk(req)->tfo_listener) { + if (req->rsk_listener != sk) { + /* TFO request was migrated to another listener so + * the new listener must be used in reqsk_fastopen_remove() + * to hold requests which cause RST. + */ + sock_put(req->rsk_listener); + sock_hold(sk); + req->rsk_listener = sk; + } + /* We are still waiting for the final ACK from 3WHS * so can't free req now. Instead, we set req->sk to * NULL to signify that the child socket is taken @@ -954,7 +964,6 @@ static void inet_child_forget(struct sock *sk, struct request_sock *req, if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(req)->tfo_listener) { BUG_ON(rcu_access_pointer(tcp_sk(child)->fastopen_rsk) != req); - BUG_ON(sk != req->rsk_listener); /* Paranoid, to prevent race condition if * an inbound pkt destined for child is From patchwork Mon Dec 7 13:24:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E73DC1B0E3 for ; Mon, 7 Dec 2020 13:28:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3939A23382 for ; Mon, 7 Dec 2020 13:28:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726422AbgLGN2V (ORCPT ); Mon, 7 Dec 2020 08:28:21 -0500 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:36838 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725823AbgLGN2S (ORCPT ); Mon, 7 Dec 2020 08:28:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347698; x=1638883698; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=PB9Ez+y4FD9VGYXw++IaSHiEr4C9tHxS7oGDzn/NXxo=; b=iGnVDJpP71Ni0d7zUEtffkJc7NRFkVlT6aQqCJMuQ9P9fXGoxjfCCOg8 DPeDHBvXkmXLpPGflx55nsHkV0N6RcF/Uj02AP/6nP2zU0cTS3tQhIRhW AS37/2GCXvOeUp7FF8vaEq1/couVkbnZXo2JZIT9RBu9PE84/+0PwO+0k g=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="69561783" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1a-af6a10df.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 07 Dec 2020 13:27:37 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1a-af6a10df.us-east-1.amazon.com (Postfix) with ESMTPS id 9A40FA221B; Mon, 7 Dec 2020 13:27:34 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:27:33 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:27:29 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 bpf-next 08/13] bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT. Date: Mon, 7 Dec 2020 22:24:51 +0900 Message-ID: <20201207132456.65472-9-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, the kernel runs it for socket migration only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE. The kernel will change the behaviour depending on the returned value: - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fall back to the random selection - SK_DROP, cancel the migration Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau Signed-off-by: Kuniyuki Iwashima --- include/uapi/linux/bpf.h | 2 ++ kernel/bpf/syscall.c | 13 +++++++++++++ tools/include/uapi/linux/bpf.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7a48e0055500..c7f6848c0226 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 0cd3cc2af9c1..0737673c727c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1920,6 +1920,11 @@ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr) attr->expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE; break; + case BPF_PROG_TYPE_SK_REUSEPORT: + if (!attr->expected_attach_type) + attr->expected_attach_type = + BPF_SK_REUSEPORT_SELECT; + break; } } @@ -2003,6 +2008,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, if (expected_attach_type == BPF_SK_LOOKUP) return 0; return -EINVAL; + case BPF_PROG_TYPE_SK_REUSEPORT: + switch (expected_attach_type) { + case BPF_SK_REUSEPORT_SELECT: + case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: + return 0; + default: + return -EINVAL; + } case BPF_PROG_TYPE_EXT: if (expected_attach_type) return -EINVAL; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 7a48e0055500..c7f6848c0226 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -241,6 +241,8 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_REUSEPORT_SELECT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, __MAX_BPF_ATTACH_TYPE }; From patchwork Mon Dec 7 13:24:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 293D1C1B087 for ; Mon, 7 Dec 2020 13:28:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F31852343E for ; Mon, 7 Dec 2020 13:28:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726485AbgLGN2W (ORCPT ); Mon, 7 Dec 2020 08:28:22 -0500 Received: from smtp-fw-9103.amazon.com ([207.171.188.200]:54916 "EHLO smtp-fw-9103.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbgLGN2U (ORCPT ); Mon, 7 Dec 2020 08:28:20 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347699; x=1638883699; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=zZWQroLY9J+nGi+ykaI4WpcwCCgdqU/PMdq9gqllP8M=; b=lIbaP/P9pB0fX+tDTg8pBWongkWfx23+avN5YtWJ63RzYcqV8uHUDDeo PmqMLA7uGpOQ3e0W0nKF0FPgFdIWB2SUFqUmig269CBBrUEoAITd6Z23g bJfdmmNjmG+Qod7Oazq5gsDhm3uAijfFhYwAyfi4JNAbzwOVbvWSKeOPu w=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="901122159" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-2c665b5d.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9103.sea19.amazon.com with ESMTP; 07 Dec 2020 13:27:53 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan3.iad.amazon.com [10.40.163.38]) by email-inbound-relay-1d-2c665b5d.us-east-1.amazon.com (Postfix) with ESMTPS id 4B0C0A1E56; Mon, 7 Dec 2020 13:27:50 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:27:50 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:27:45 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 bpf-next 09/13] libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT. Date: Mon, 7 Dec 2020 22:24:52 +0900 Message-ID: <20201207132456.65472-10-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This commit introduces a new section (sk_reuseport/migrate) and sets expected_attach_type to two each section in BPF_PROG_TYPE_SK_REUSEPORT program. Signed-off-by: Kuniyuki Iwashima --- tools/lib/bpf/libbpf.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 9be88a90a4aa..ba64c891a5e7 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -8471,7 +8471,10 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec, static const struct bpf_sec_def section_defs[] = { BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), - BPF_PROG_SEC("sk_reuseport", BPF_PROG_TYPE_SK_REUSEPORT), + BPF_EAPROG_SEC("sk_reuseport/migrate", BPF_PROG_TYPE_SK_REUSEPORT, + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE), + BPF_EAPROG_SEC("sk_reuseport", BPF_PROG_TYPE_SK_REUSEPORT, + BPF_SK_REUSEPORT_SELECT), SEC_DEF("kprobe/", KPROBE, .attach_fn = attach_kprobe), BPF_PROG_SEC("uprobe/", BPF_PROG_TYPE_KPROBE), From patchwork Mon Dec 7 13:24:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 339536 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB4FFC4167B for ; Mon, 7 Dec 2020 13:29:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8BDDF23407 for ; Mon, 7 Dec 2020 13:29:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726830AbgLGN3L (ORCPT ); Mon, 7 Dec 2020 08:29:11 -0500 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:36988 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725852AbgLGN3L (ORCPT ); Mon, 7 Dec 2020 08:29:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1607347749; x=1638883749; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=7XiD9aIMDu4mgLkRDGGRXRcyA0AIujxpHFHP5MzN3Ls=; b=iXT7U4Ia/WVxfMxGp4P83ymnCf06ELcH9Pj9A9OpyYel09aJqIzevZGH Gz9+2aGgmgukohbuj6nOdQ23h9YNi02gZoCDw/9nTrVL8iCnE2GZ2ZXc0 N7ZcnBXAraNsAEObgwaV8l3B5EbC+XtywaJzZXqV247vRBE4Higfs85/1 0=; X-IronPort-AV: E=Sophos;i="5.78,399,1599523200"; d="scan'208";a="69561866" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1a-e34f1ddc.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 07 Dec 2020 13:28:29 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1a-e34f1ddc.us-east-1.amazon.com (Postfix) with ESMTPS id 51446A1E93; Mon, 7 Dec 2020 13:28:25 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:28:24 +0000 Received: from 38f9d3582de7.ant.amazon.com (10.43.161.43) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 7 Dec 2020 13:28:15 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 bpf-next 11/13] bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT. Date: Mon, 7 Dec 2020 22:24:54 +0900 Message-ID: <20201207132456.65472-12-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.17.2 (Apple Git-113) In-Reply-To: <20201207132456.65472-1-kuniyu@amazon.co.jp> References: <20201207132456.65472-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 X-Originating-IP: [10.43.161.43] X-ClientProxiedBy: EX13D37UWC002.ant.amazon.com (10.43.162.123) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org We will call sock_reuseport.prog for socket migration in the next commit, so the eBPF program has to know which listener is closing in order to select the new listener. Currently, we can get a unique ID for each listener in the userspace by calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. This patch makes the sk pointer available in sk_reuseport_md so that we can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program. Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f7zc@kafai-mbp.dhcp.thefacebook.com/ Suggested-by: Martin KaFai Lau Signed-off-by: Kuniyuki Iwashima --- include/uapi/linux/bpf.h | 8 ++++++++ net/core/filter.c | 22 ++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 8 ++++++++ 3 files changed, 38 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index cf518e83df5c..a688a7a4fe85 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1655,6 +1655,13 @@ union bpf_attr { * A 8-byte long non-decreasing number on success, or 0 if the * socket field is missing inside *skb*. * + * u64 bpf_get_socket_cookie(struct bpf_sock *sk) + * Description + * Equivalent to bpf_get_socket_cookie() helper that accepts + * *skb*, but gets socket from **struct bpf_sock** context. + * Return + * A 8-byte long non-decreasing number. + * * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts @@ -4463,6 +4470,7 @@ struct sk_reuseport_md { __u32 bind_inany; /* Is sock bound to an INANY address? */ __u32 hash; /* A hash of the packet 4 tuples */ __u8 migration; /* Migration type */ + __bpf_md_ptr(struct bpf_sock *, sk); /* Current listening socket */ }; #define BPF_TAG_SIZE 8 diff --git a/net/core/filter.c b/net/core/filter.c index 7bdf62f24044..9f7018e3f545 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4631,6 +4631,18 @@ static const struct bpf_func_proto bpf_get_socket_cookie_sock_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +BPF_CALL_1(bpf_get_socket_pointer_cookie, struct sock *, sk) +{ + return __sock_gen_cookie(sk); +} + +static const struct bpf_func_proto bpf_get_socket_pointer_cookie_proto = { + .func = bpf_get_socket_pointer_cookie, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SOCKET, +}; + BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx) { return __sock_gen_cookie(ctx->sk); @@ -9989,6 +10001,8 @@ sk_reuseport_func_proto(enum bpf_func_id func_id, return &sk_reuseport_load_bytes_proto; case BPF_FUNC_skb_load_bytes_relative: return &sk_reuseport_load_bytes_relative_proto; + case BPF_FUNC_get_socket_cookie: + return &bpf_get_socket_pointer_cookie_proto; default: return bpf_base_func_proto(func_id); } @@ -10022,6 +10036,10 @@ sk_reuseport_is_valid_access(int off, int size, return prog->expected_attach_type == BPF_SK_REUSEPORT_SELECT_OR_MIGRATE && size == sizeof(__u8); + case offsetof(struct sk_reuseport_md, sk): + info->reg_type = PTR_TO_SOCKET; + return size == sizeof(__u64); + /* Fields that allow narrowing */ case bpf_ctx_range(struct sk_reuseport_md, eth_protocol): if (size < sizeof_field(struct sk_buff, protocol)) @@ -10098,6 +10116,10 @@ static u32 sk_reuseport_convert_ctx_access(enum bpf_access_type type, case offsetof(struct sk_reuseport_md, migration): SK_REUSEPORT_LOAD_FIELD(migration); break; + + case offsetof(struct sk_reuseport_md, sk): + SK_REUSEPORT_LOAD_FIELD(sk); + break; } return insn - insn_buf; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index cf518e83df5c..a688a7a4fe85 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1655,6 +1655,13 @@ union bpf_attr { * A 8-byte long non-decreasing number on success, or 0 if the * socket field is missing inside *skb*. * + * u64 bpf_get_socket_cookie(struct bpf_sock *sk) + * Description + * Equivalent to bpf_get_socket_cookie() helper that accepts + * *skb*, but gets socket from **struct bpf_sock** context. + * Return + * A 8-byte long non-decreasing number. + * * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * Description * Equivalent to bpf_get_socket_cookie() helper that accepts @@ -4463,6 +4470,7 @@ struct sk_reuseport_md { __u32 bind_inany; /* Is sock bound to an INANY address? */ __u32 hash; /* A hash of the packet 4 tuples */ __u8 migration; /* Migration type */ + __bpf_md_ptr(struct bpf_sock *, sk); /* Current listening socket */ }; #define BPF_TAG_SIZE 8