From patchwork Thu Apr 1 22:00:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 414131
Subject: [PATCH bpf v2 1/2] bpf, sockmap: fix sk->prot unhash op reset
From: John Fastabend
To: xiyou.wangcong@gmail.com, andrii.nakryiko@gmail.com, daniel@iogearbox.net, ast@fb.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, lmb@cloudflare.com
Date: Thu, 01 Apr 2021 15:00:19 -0700
Message-ID: <161731441904.68884.15593917809745631972.stgit@john-XPS-13-9370>
In-Reply-To: <161731427139.68884.1934993103507544474.stgit@john-XPS-13-9370>
References: <161731427139.68884.1934993103507544474.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-85-g6af9
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

In '4da6a196f93b1' we fixed a potential unhash loop caused when
a TLS socket in a sockmap was removed from the sockmap. This
happened because the unhash operation on the TLS ctx continued to
point at the sockmap implementation of unhash even though the psock
had already been removed. When the psock has been removed, the sockmap
unhash handler does the following,

 void sock_map_unhash(struct sock *sk)
 {
	void (*saved_unhash)(struct sock *sk);
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (unlikely(!psock)) {
		rcu_read_unlock();
		if (sk->sk_prot->unhash)
			sk->sk_prot->unhash(sk);
		return;
	}
	[...]
 }

The unlikely() case is there to handle the case where the psock is
detached but the proto ops have not been updated yet. But, in the above
case with TLS and a removed psock, we never fixed sk_prot->unhash() and
unhash() points back to sock_map_unhash, resulting in a loop. To fix
this we added this bit of code,

 static inline void sk_psock_restore_proto(struct sock *sk,
					   struct sk_psock *psock)
 {
	sk->sk_prot->unhash = psock->saved_unhash;

This sets sk_prot->unhash back to its saved value, which is the
correct callback for a TLS socket that has been removed from the
sock_map. Unfortunately, this also overwrites the unhash pointer for
all psocks. We effectively break sockmap unhash handling for any future
socks. Omitting the unhash operation will leave stale entries in the
map if a socket transitions through unhash but does not do a close()
op.

To fix this, set unhash correctly before calling into tls_update. This
way the TLS enabled socket will point to the saved unhash() handler.
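As a rough illustration of the loop described above, here is a small
user-space sketch of the detached-psock branch of sock_map_unhash().
It is only a model under stated assumptions: every toy_* name,
LOOP_GUARD and run_detach() are hypothetical and are not kernel API,
and the guard exists only so the demo terminates.

```c
#include <assert.h>
#include <stddef.h>

#define LOOP_GUARD 3	/* demo-only limit; the kernel has no such guard */

struct toy_sock;
typedef void (*toy_unhash_fn)(struct toy_sock *sk);

struct toy_proto { toy_unhash_fn unhash; };		/* models sk->sk_prot */
struct toy_psock { toy_unhash_fn saved_unhash; };	/* models psock->saved_unhash */

struct toy_sock {
	struct toy_proto *prot;
	struct toy_psock *psock;	/* NULL once the psock is detached */
	int entries;			/* times the sockmap handler ran detached */
	int base_calls;			/* times the original handler ran */
};

struct toy_result { int entries; int base_calls; };

/* Stand-in for the original (saved) unhash handler. */
static void toy_base_unhash(struct toy_sock *sk)
{
	sk->base_calls++;
}

/* Models only the unlikely(!psock) branch quoted in the commit message. */
static void toy_sock_map_unhash(struct toy_sock *sk)
{
	if (sk->psock)
		return;	/* attached path is elided in the source; not modeled */
	if (++sk->entries > LOOP_GUARD)
		return;	/* stop the demo instead of recursing forever */
	if (sk->prot->unhash)
		sk->prot->unhash(sk);
}

/* Detach the psock, optionally restoring the saved handler first. */
struct toy_result run_detach(int restore_unhash)
{
	struct toy_proto prot = { .unhash = toy_sock_map_unhash };
	struct toy_psock psock = { .saved_unhash = toy_base_unhash };
	struct toy_sock sk = { .prot = &prot, .psock = &psock };
	struct toy_result res;

	if (restore_unhash)
		sk.prot->unhash = psock.saved_unhash;
	sk.psock = NULL;
	sk.prot->unhash(&sk);

	res.entries = sk.entries;
	res.base_calls = sk.base_calls;
	return res;
}
```

With run_detach(0) the toy handler keeps re-entering itself until the
guard trips and toy_base_unhash() is never reached; with run_detach(1)
the saved handler runs exactly once.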
Fixes: 4da6a196f93b1 ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop")
Reported-by: Cong Wang
Reported-by: Lorenz Bauer
Suggested-by: Cong Wang
Signed-off-by: John Fastabend
---
 include/linux/skmsg.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 8edbbf5f2f93..822c048934e3 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -349,8 +349,13 @@ static inline void sk_psock_update_proto(struct sock *sk,
 static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
-	sk->sk_prot->unhash = psock->saved_unhash;
 	if (inet_csk_has_ulp(sk)) {
+		/* TLS does not have an unhash proto in SW cases, but we need
+		 * to ensure we stop using the sock_map unhash routine because
+		 * the associated psock is being removed. So use the original
+		 * unhash handler.
+		 */
+		WRITE_ONCE(sk->sk_prot->unhash, psock->saved_unhash);
 		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
 	} else {
 		sk->sk_write_space = psock->saved_write_space;

From patchwork Thu Apr 1 22:00:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 414849
Subject: [PATCH bpf v2 2/2] bpf, sockmap: fix incorrect fwd_alloc accounting
From: John Fastabend
To: xiyou.wangcong@gmail.com, andrii.nakryiko@gmail.com, daniel@iogearbox.net, ast@fb.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, lmb@cloudflare.com
Date: Thu, 01 Apr 2021 15:00:40 -0700
Message-ID: <161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370>
In-Reply-To: <161731427139.68884.1934993103507544474.stgit@john-XPS-13-9370>
References: <161731427139.68884.1934993103507544474.stgit@john-XPS-13-9370>
User-Agent: StGit/0.23-85-g6af9
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Incorrect accounting of fwd_alloc can result in a warning when the
socket is torn down,

 [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
 [...]
 [18455.319543] Call Trace:
 [18455.319556]  inet_csk_destroy_sock+0xba/0x1f0
 [18455.319577]  tcp_rcv_state_process+0x1b4e/0x2380
 [18455.319593]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319617]  ? tcp_finish_connect+0x1e0/0x1e0
 [18455.319631]  ? sk_reset_timer+0x15/0x70
 [18455.319646]  ? tcp_schedule_loss_probe+0x1b2/0x240
 [18455.319663]  ? lock_release+0xb2/0x3f0
 [18455.319676]  ? __release_sock+0x8a/0x1b0
 [18455.319690]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319704]  ? lock_release+0x3f0/0x3f0
 [18455.319717]  ? __tcp_close+0x2c6/0x790
 [18455.319736]  ? tcp_v4_do_rcv+0x168/0x370
 [18455.319750]  tcp_v4_do_rcv+0x168/0x370
 [18455.319767]  __release_sock+0xbc/0x1b0
 [18455.319785]  __tcp_close+0x2ee/0x790
 [18455.319805]  tcp_close+0x20/0x80

This currently happens because in the redirect case we do
skb_set_owner_r() with the original sock. This increments the fwd_alloc
memory accounting on the original sock. Then on redirect we may push
this into the queue of the psock we are redirecting to. When the skb is
flushed from the queue we give the memory back to the original sock.
The problem is that if the original sock is destroyed/closed with skbs
on another psock's queue, then the original sock will not have a way to
reclaim the memory before being destroyed. Then the above warning will
be thrown.

  sockA                          sockB

  sk_psock_strp_read()
   sk_psock_verdict_apply()
     -- SK_REDIRECT --
     sk_psock_skb_redirect()
                                skb_queue_tail(psock_other->ingress_skb..)

  sk_close()
   sock_map_unref()
     sk_psock_put()
       sk_psock_drop()
         sk_psock_zap_ingress()

At this point we have torn down our own psock, but have the outstanding
skb in psock_other. Note that SK_PASS doesn't have this problem because
the sk_psock_drop() logic releases the skb; it's still associated with
our psock.

To resolve this, let's only account for sockets on the ingress queue
that are still associated with the current socket. In the redirect case
we will check memory limits per 6fa9201a89898, but will omit fwd_alloc
accounting until the skb is actually enqueued. When the skb is sent via
skb_send_sock_locked or received with sk_psock_skb_ingress, memory will
be claimed on psock_other.
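The accounting imbalance can be sketched with a toy user-space model.
This is a simplification under stated assumptions, not the kernel's
forward_alloc logic: the toy_* types and helpers and the two *_warns()
functions are hypothetical, and a single "charged" counter stands in
for the per-socket memory accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct toy_sock {
	int charged;		/* bytes currently charged to this socket */
};

struct toy_skb {
	int truesize;
	struct toy_sock *owner;	/* socket charged for this skb, or NULL */
};

/* Models skb_set_owner_r(): charge the skb's memory to sk. */
static void toy_set_owner_r(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->owner = sk;
	sk->charged += skb->truesize;
}

/* Models freeing the skb: give the memory back to the owner, if any. */
static void toy_skb_free(struct toy_skb *skb)
{
	if (skb->owner)
		skb->owner->charged -= skb->truesize;
	skb->owner = NULL;
}

/* Models the teardown check: nonzero means the warning would fire. */
static int toy_close_warns(const struct toy_sock *sk)
{
	return sk->charged != 0;
}

/* Old flow: sockA is charged before the verdict, then the skb is redirected. */
int old_flow_warns(void)
{
	struct toy_sock a = { 0 };
	struct toy_skb skb = { .truesize = 512 };
	int warns;

	toy_set_owner_r(&skb, &a);	/* charged to sockA up front */
	/* the skb now sits on sockB's ingress queue; sockA closes first */
	warns = toy_close_warns(&a);
	toy_skb_free(&skb);
	return warns;
}

/* New flow: no charge before redirect; the receiving socket claims it. */
int new_flow_warns(void)
{
	struct toy_sock a = { 0 }, b = { 0 };
	struct toy_skb skb = { .truesize = 512 };
	int warns;

	/* redirect verdict: queued for sockB without charging sockA */
	warns = toy_close_warns(&a);	/* sockA closes with nothing outstanding */
	toy_set_owner_r(&skb, &b);	/* sockB claims the memory on ingress */
	toy_skb_free(&skb);
	return warns || toy_close_warns(&b);
}
```

In this model old_flow_warns() reports the imbalance on sockA, while
new_flow_warns() leaves both sockets balanced at teardown.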
Reported-by: Andrii Nakryiko
Fixes: 6fa9201a89898 ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 1261512d6807..5def3a2e85be 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -488,6 +488,7 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
 	if (unlikely(!msg))
 		return -EAGAIN;
 	sk_msg_init(msg);
+	skb_set_owner_r(skb, sk);
 	return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
 }
 
@@ -790,7 +791,6 @@ static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
-		skb_set_owner_r(skb, sk);
 		sk_psock_skb_redirect(skb);
 		break;
 	case __SK_PASS:
@@ -808,10 +808,6 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		/* We skip full set_owner_r here because if we do a SK_PASS
-		 * or SK_DROP we can skip skb memory accounting and use the
-		 * TLS context.
-		 */
 		skb->sk = psock->sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
@@ -880,12 +876,13 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
-	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		skb->sk = sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out:
@@ -956,12 +953,13 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		kfree_skb(skb);
 		goto out;
 	}
-	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		skb->sk = sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
 	sk_psock_verdict_apply(psock, skb, ret);
 out: