From patchwork Fri Oct 9 04:43:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 268860 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E372CC433E7 for ; Fri, 9 Oct 2020 04:44:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A3BD222254 for ; Fri, 9 Oct 2020 04:44:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="r5YKGZkd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726845AbgJIEoF (ORCPT ); Fri, 9 Oct 2020 00:44:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbgJIEoE (ORCPT ); Fri, 9 Oct 2020 00:44:04 -0400 Received: from mail-io1-xd42.google.com (mail-io1-xd42.google.com [IPv6:2607:f8b0:4864:20::d42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9778FC0613D2; Thu, 8 Oct 2020 21:44:04 -0700 (PDT) Received: by mail-io1-xd42.google.com with SMTP id q9so8823443iow.6; Thu, 08 Oct 2020 21:44:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=lLa3hBZMn4NHFo7xwpT+WHTGtW+n4KL1Gv/kNX990nc=; 
b=r5YKGZkdIz4ByXXD71ugubSnprRvd4//oV9EGwOfL5/J+ibnL51987JpWaj6vfOeDS Fbux2X9qw6sqVqBM8//8WZqXFyofZW9aUKmGcgaFcD1bayRoQvFKSvA6hVgDjOU3y3j8 y9GOLmeNSb9FsGNSbh/TrBdhXzwkVu0vx9MDU1vf4RfbGAQeRbC3edDZuU5Wll5n1L6w eTircLqwwhoCXGcsAsR2vUWFwpeNguSx8jtmnCLz8wqGhq5FacS91uxrPETRl2cBj4mI RXA/+pNWPLA5Gw98cno93UxtlNkivr3x/TjfAVTGqkvn70ar7SD9KqXwAupBIOU5MWVM DN9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=lLa3hBZMn4NHFo7xwpT+WHTGtW+n4KL1Gv/kNX990nc=; b=smSBAMTI3n+/4P+1hvYO+oKuaj2jmDFFGu9aJhJQyPbI2/tsg5EQk/MMgJsW/SEejY oNCQchlAo9GXOk2ZVyk8PLw1TdYdBkalee2psudCkH0u9sKj3fulCxzpQGwgHu7ZcTMa JI322Rl5W1jAu5OZN5X8V6jeGaO37zf0cjj2mmNk5GqOirDVY/tESXWj9Q4KyXCAiXbR DhZ/XFhd7b6q1XQdcRPgp0iqjCg229TklFz/7HEuzdNU6/lTOhR+BqUwA3gFT8424pyW EHzH7w26H+xZnYvg1N6u4cU9cP2RKdF7m/5UeAurbUWID2lrmR/NucdGYFxnKZW+kxEY tLdg== X-Gm-Message-State: AOAM532S9xWhHjIeqeUWe6r23sBTQpLG0RFWIXzXyg2xIZCVyr1jpEHz m7NWfX4tbP3aLKYezqPBQRQcei7jR2bUHA== X-Google-Smtp-Source: ABdhPJy8+WXvEQnuhXqMI6RuLjkYA/aYoBRPhvQF34tnnYDLuR3+Ewjn0pWnmvp96y695tvcp4Fytw== X-Received: by 2002:a6b:4e0b:: with SMTP id c11mr8237298iob.68.1602218643966; Thu, 08 Oct 2020 21:44:03 -0700 (PDT) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id h14sm3673200ilc.38.2020.10.08.21.43.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 21:44:03 -0700 (PDT) Subject: [bpf-next PATCH 1/6] bpf, sockmap: skb verdict SK_PASS to self already checked rmem limits From: John Fastabend To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com Date: Thu, 08 Oct 2020 21:43:51 -0700 Message-ID: <160221863101.12042.14367865435124784102.stgit@john-Precision-5820-Tower> In-Reply-To: 
<160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

For the sk_skb case where the skb_verdict program returns SK_PASS to
continue passing the packet up the stack, the memory limits were already
checked before enqueuing in skb_queue_tail from the TCP side. So, let's
remove the extra checks here. The theory is that if the TCP stack believes
we have memory to receive the packet, then let's trust the stack and not
double check the limits.

In fact, the accounting here can cause a drop if sk_rmem_alloc has
increased after the stack accepted this packet, but before the duplicate
check here. Worse, if this happens there is no retransmit, because the TCP
stack already believes the data has been received.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4b5f7c8fecd1..040ae1d75b65 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -771,6 +771,7 @@ EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 static void sk_psock_verdict_apply(struct sk_psock *psock,
 				   struct sk_buff *skb, int verdict)
 {
+	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
 
 	switch (verdict) {
@@ -780,16 +781,12 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 		    !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 			goto out_free;
 		}
-		if (atomic_read(&sk_other->sk_rmem_alloc) <=
-		    sk_other->sk_rcvbuf) {
-			struct tcp_skb_cb *tcp = TCP_SKB_CB(skb);
 
-			tcp->bpf.flags |= BPF_F_INGRESS;
-			skb_queue_tail(&psock->ingress_skb, skb);
-			schedule_work(&psock->work);
-			break;
-		}
-		goto out_free;
+		tcp = TCP_SKB_CB(skb);
+		tcp->bpf.flags |= BPF_F_INGRESS;
+		skb_queue_tail(&psock->ingress_skb, skb);
+		schedule_work(&psock->work);
+
break; case __SK_REDIRECT: sk_psock_skb_redirect(skb); break; From patchwork Fri Oct 9 04:44:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 269326 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4D1BC433E7 for ; Fri, 9 Oct 2020 04:44:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9767622251 for ; Fri, 9 Oct 2020 04:44:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SjQ5r6tM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727964AbgJIEoX (ORCPT ); Fri, 9 Oct 2020 00:44:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbgJIEoW (ORCPT ); Fri, 9 Oct 2020 00:44:22 -0400 Received: from mail-io1-xd42.google.com (mail-io1-xd42.google.com [IPv6:2607:f8b0:4864:20::d42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A908BC0613D2; Thu, 8 Oct 2020 21:44:22 -0700 (PDT) Received: by mail-io1-xd42.google.com with SMTP id q9so8823960iow.6; Thu, 08 Oct 2020 21:44:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; 
bh=ue3SsOhZdq7UxtSZgVrpSUS1g0m/Xb3ufNzGhGzXbho=; b=SjQ5r6tMwj59/CA36tJMOZM5eHQ7ARLBS7UJYM9HX5m73cHld1YwDxW4mXpbe0d0Hc hN1xxCTmgG8O9zxu1z4WQ3YDsZlSf2LILzbmcpPhFkeFDNfG5NAQw+d/Pkg3+iT5cDVM QNFuBg4yzoZsYdJN4WB72f4aDlyC1xyrh+zMvWDLl1hkzKJBWhaH8hengXotPa8WaH5L yCvfdw1SWqbYjpKZZX9NlPMDOy7KEAsMFNXAyOa5mZqWvRPEV2lLQwes6GKawKPQbDQb B0PJYlDqKUSePKH3CUp5Dm+V66FK95tnsYcUMEirmybP9vmu6BlrKWrKFlmelRd6d0BO k21w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=ue3SsOhZdq7UxtSZgVrpSUS1g0m/Xb3ufNzGhGzXbho=; b=iS3+DD+5ZXvQoSEfDCZqXu1wfV47zBD0Z2Rd0YaMYleiHA1blrPQISNq+j74Hz7E72 S9hG8ECdeGfNZ0bWUpS05D5K0z2/lTOFk+RRl8IFSpevKLDdu1dpQAKt3ZpevgEQDJT2 yO5zVDKUN22d/JJGwUsc7O+s0MBFQVXnBaphp2FX2GyjePewM4RNkYMMxrXpQG440k1P HvS2Kf1DLHCfOnWorWI5UM7osnuIzk7OMRJ03Z/g2wvCH2W2T0WIaeMa1n0d8a7cBMoE 4F+6g3oFp1MVB53ZJFNbJOtstrCULQpZ0HNQe82mAEnY/gEH1sLK/wgNuOyVvpGwA/Wo zvyQ== X-Gm-Message-State: AOAM530xpWH6UiTM3IJdNMst5zsijIH0ScmHsvhWVyIB8oznDiWBbTR9 UhEU/uxxGyhj/Qm9NF4Ig6Y= X-Google-Smtp-Source: ABdhPJwzQnSnfNEcMSzt43/Qadw0zOOfQvoD0N+YCIExLk5rs9gPk44M/IaNLbb9IitlphuMOds9pA== X-Received: by 2002:a02:cb0c:: with SMTP id j12mr7700105jap.54.1602218662074; Thu, 08 Oct 2020 21:44:22 -0700 (PDT) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id v13sm3638619ilh.65.2020.10.08.21.44.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 21:44:21 -0700 (PDT) Subject: [bpf-next PATCH 2/6] bpf, sockmap: On receive programs try to fast track SK_PASS ingress From: John Fastabend To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com Date: Thu, 08 Oct 2020 21:44:08 -0700 Message-ID: <160221864872.12042.14533177764605980614.stgit@john-Precision-5820-Tower> In-Reply-To: 
<160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

When we receive an skb and the ingress skb verdict program returns
SK_PASS, we currently set the ingress flag and put the skb on the
workqueue so it can be turned into a sk_msg and put on the sk_msg ingress
queue. Finally, userspace is told through the data_ready hook.

Here we observe that if the workqueue is empty then we can try to convert
the skb into a sk_msg and call data_ready directly, without bouncing
through a workqueue. It's a common pattern to have a recv verdict program
for visibility that always returns SK_PASS. In this case, unless there is
an ENOMEM error or we overrun the socket, we can avoid the workqueue
completely, only using it when we fall back to error cases caused by
memory pressure.

By doing this we eliminate another case where data may be dropped if
errors occur on memory limits in the workqueue.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
Reported-by: kernel test robot
---
 net/core/skmsg.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 040ae1d75b65..dabd25313a70 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -773,6 +773,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 {
 	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
+	int err = -EIO;
 
 	switch (verdict) {
 	case __SK_PASS:
@@ -784,8 +785,20 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 
 		tcp = TCP_SKB_CB(skb);
 		tcp->bpf.flags |= BPF_F_INGRESS;
-		skb_queue_tail(&psock->ingress_skb, skb);
-		schedule_work(&psock->work);
+
+		/* If the queue is empty then we can submit directly
+		 * into the msg queue. If its not empty we have to
+		 * queue work otherwise we may get OOO data.
Otherwise, + * if sk_psock_skb_ingress errors will be handled by + * retrying later from workqueue. + */ + if (skb_queue_empty(&psock->ingress_skb)) { + err = sk_psock_skb_ingress(psock, skb); + } + if (err < 0) { + skb_queue_tail(&psock->ingress_skb, skb); + schedule_work(&psock->work); + } break; case __SK_REDIRECT: sk_psock_skb_redirect(skb); From patchwork Fri Oct 9 04:44:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 268859 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 250A4C433E7 for ; Fri, 9 Oct 2020 04:44:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D759A2223C for ; Fri, 9 Oct 2020 04:44:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="b3P2FnpJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728854AbgJIEom (ORCPT ); Fri, 9 Oct 2020 00:44:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbgJIEom (ORCPT ); Fri, 9 Oct 2020 00:44:42 -0400 Received: from mail-il1-x143.google.com (mail-il1-x143.google.com [IPv6:2607:f8b0:4864:20::143]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B942C0613D2; Thu, 8 Oct 2020 21:44:40 -0700 (PDT) Received: by 
mail-il1-x143.google.com with SMTP id b2so8061765ilr.1; Thu, 08 Oct 2020 21:44:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=4gkE+w1Qs8ezFeaPKV2wW5bemZHCEITfhAmzV/zUEPU=; b=b3P2FnpJl0G4N3eNTHIa7386f+S200GCO0E87U5tc+/SDDtl1eQrtN4/Oot3DrC8qg rnLKKziRVwMJ060N11NXOxgAta0x8Nh0GKlVZd0gyLxNKagmGJIQ7aCx5IxIXPFyE2mH Z0ZXdWJ1svMf38M3pUI313RcqbAGXWuFAjEQACEOOoKhZJ7Yf0QebFRy4iCRST8HA1bG GEuzHKxyh77iOP7PaQlDoMdqzYYNqZ8yl/n+hol4pGsa9r6HeXQSNujaZeE+owiFAcIr IH/tnRkf3Qykq3CaCz8cpT+jCYH6uiaxkBNefzspEsrs3v8QJffhi1CEbSq5ncbQ99GC xQAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=4gkE+w1Qs8ezFeaPKV2wW5bemZHCEITfhAmzV/zUEPU=; b=LYTo/Jn9O7CbK8yyHkjw8xjMKlL4/jKdwhG2d5u6ggLIM+0VoSdQh1+Nn6Nvymp/M1 Mvs/l1deLohTILRuABHM8f8afdmUXgcBSdK4ilC/Bf4LmicTx8iAEuXUPXR/pFRrF8Kk yYMZacId10wil4R8NMkbggMxRMhq/SuRu1k/2yMPVD9CmzcP3xGPqfIlcrMVo2wmCdnh xHm4nINlmLf11yqqWNFsZH8z6sRg0e41P+kufvGolPalpKLWb1O6lUZKN+J9O0wr6t3Z Zz5PFn3A8PIJM7D0uPYdxd7kcCvycbu6EE0OT4DcEstquMzNeWTkXihOLOQ4MmtcESpb 8EtA== X-Gm-Message-State: AOAM5324kna4Fa2YPqi/e8Iagf74TYmJo5eE87VMirLzMXwZ6O4pv78j 3V12SAEltFjNzPsEnl76qGs= X-Google-Smtp-Source: ABdhPJzestMsc5QVbsSnDmT0bWIdMdmlSBWT3wc6i5wwOOCnKa2pcKYMsHa3q8JYiy3V3VI0ClDgXg== X-Received: by 2002:a92:c5c2:: with SMTP id s2mr9542412ilt.177.1602218679884; Thu, 08 Oct 2020 21:44:39 -0700 (PDT) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id e15sm3608786ili.75.2020.10.08.21.44.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 21:44:39 -0700 (PDT) Subject: [bpf-next PATCH 3/6] bpf, sockmap: remove skb_set_owner_w wmem will be taken later from sendpage From: John Fastabend To: 
john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Thu, 08 Oct 2020 21:44:27 -0700
Message-ID: <160221866732.12042.16556499859895432372.stgit@john-Precision-5820-Tower>
In-Reply-To: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

The skb_set_owner_w is unnecessary here. The sendpage call will create a
fresh skb and set the owner correctly from the workqueue. It's also not
entirely harmless: it consumes cycles and also impacts resource accounting
by increasing sk_wmem_alloc. This charges the socket we are going to send
to for the skb, but we will put it on the workqueue for some time before
the send happens, so we are artificially inflating sk_wmem_alloc for this
period. Further, we don't know how many skbs will be used to send the
packet or how it will be broken up when sent over the new socket, so
charging it with one big sum is also not correct when the workqueue may
break it up if facing memory pressure.

Seeing we don't know how/when this is going to be sent, drop the early
accounting.

A later patch will do proper accounting, charged on the receive socket,
for the case where skbs get enqueued on the workqueue.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend --- net/core/skmsg.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index dabd25313a70..b60768951de2 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -728,8 +728,6 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) (ingress && atomic_read(&sk_other->sk_rmem_alloc) <= sk_other->sk_rcvbuf)) { - if (!ingress) - skb_set_owner_w(skb, sk_other); skb_queue_tail(&psock_other->ingress_skb, skb); schedule_work(&psock_other->work); } else { From patchwork Fri Oct 9 04:44:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 269325 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47C92C433E7 for ; Fri, 9 Oct 2020 04:45:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 11E4A2223C for ; Fri, 9 Oct 2020 04:45:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="slmms/52" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729366AbgJIEo7 (ORCPT ); Fri, 9 Oct 2020 00:44:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbgJIEo7 (ORCPT ); Fri, 9 Oct 2020 
00:44:59 -0400 Received: from mail-il1-x144.google.com (mail-il1-x144.google.com [IPv6:2607:f8b0:4864:20::144]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28C74C0613D2; Thu, 8 Oct 2020 21:44:59 -0700 (PDT) Received: by mail-il1-x144.google.com with SMTP id o18so8063933ill.2; Thu, 08 Oct 2020 21:44:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=QPiMN4Qec74wfDW6vodMRH/5EyDyOHe+QRu0XZ+jXD4=; b=slmms/52+r/mRLHlyLeV/N2DoLC/6H7EES0GahtqrTzgx7oWIZEHUEgVMye8eMINHN sQ/6F3FCtOn0g1STloUfs/Av6XOvqEjhPvTojYitUk25/rpKSwAZIgYLpsScHeIqR+OA sjQmx5NjtXKSSsTxsJHr0rpWRxedG4iH9HrRlnboROEtWxv3kEf3TfHxNVIFA8cIr0En 2rfbT4tA6LMHFdG18gq4pV5TD+FEyOP7lHckaT4YffDiLVPEBifPDxW/QPLVeHIId+JZ +ubw843Ln7hOUyW1W0cjmMEAf26BldHP0uK8hgNGvGd/k7PDhc34Qg3s9wMrvaGAf8f+ Vr8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=QPiMN4Qec74wfDW6vodMRH/5EyDyOHe+QRu0XZ+jXD4=; b=AI4X8zLGee8axtfx9t/jLDfm3UohRZRAeaThbFKHRr9ODTg9K7aGpZG8V+U5JrQuOJ vG7xrk819h/PqOPFY3aYyVOp/mdOrB92AV1UsdaSvDkqtwGzqewNhJHCTBxRBD46MS8D kOj7W7447c3rF7vngvE5T9/L5EXIPTa27m9z02u7cNw/1cyZkoUOK9cQ8Hmwkxj6hgJf woJHdVzzAN70OYX7wfL4XFdCKU1aSl/MMrvIPTjmIej0Zu941phS6Uc9QRFtyQBkvQKz ZIiAU9Vlu6DhC6T9Tpo4jX5mPAGvqCGZ5kA9EqTcKgkFPrf8BgGQNfsvg84KJEp6Ojsz 7/zw== X-Gm-Message-State: AOAM530rqdsXlYB+k94DXqoLLzOJCIEki0JI1c0JFUePvLMOM7FJ1km6 1R+m/Sch2RVdBmZ/iFdASeoKwWdVKAVLPg== X-Google-Smtp-Source: ABdhPJwQE5BjrzF34YMbTYF5568FcjkEfFSMBaBbQZ6HQ8zwRsfKPBMZlrp7RhKp9ARfnJ1gGXlsbQ== X-Received: by 2002:a92:91d2:: with SMTP id e79mr8891124ill.17.1602218698517; Thu, 08 Oct 2020 21:44:58 -0700 (PDT) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id v18sm3679066ilj.12.2020.10.08.21.44.50 
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 21:44:57 -0700 (PDT)
Subject: [bpf-next PATCH 4/6] bpf, sockmap: remove dropped data on errors in redirect case
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Thu, 08 Oct 2020 21:44:45 -0700
Message-ID: <160221868511.12042.12285689875540180401.stgit@john-Precision-5820-Tower>
In-Reply-To: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

In the sk_skb redirect case we didn't handle the case where we overrun the
sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress.
Because we didn't have anything implemented we simply dropped the skb.
This meant data could be dropped if socket memory accounting was in place.

This fixes the above dropped data case by moving the memory checks later
in the code where we actually do the send or recv. This pushes those
checks into the workqueue and allows us to return an EAGAIN error which in
turn allows us to try again later from the workqueue.
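
To illustrate the retry behaviour described above, here is a minimal
userspace sketch in C. It is not kernel code and not the implementation in
this patch: struct model_sock, struct model_queue, model_writeable(),
model_send() and model_backlog() are hypothetical names made up for this
example, and the writeable test is simplified to a plain comparison
against the send buffer limit. The point it shows is only that a send
which returns -EAGAIN leaves the packet queued for a later pass instead of
dropping it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QLEN 8

/* Hypothetical userspace model; these types do not exist in the kernel. */
struct model_sock {
	int wmem_alloc;	/* bytes currently charged against the send side */
	int sndbuf;	/* send buffer limit */
};

struct model_queue {
	int pkts[MODEL_QLEN];	/* sizes of packets waiting on the backlog */
	size_t head;		/* index of the oldest queued packet */
	size_t len;		/* number of queued packets */
};

/* Simplified stand-in for a writeable check: room is left below the limit. */
static bool model_writeable(const struct model_sock *sk)
{
	return sk->wmem_alloc < sk->sndbuf;
}

/* Send one packet, or return -EAGAIN without consuming it. */
static int model_send(struct model_sock *sk, int len)
{
	if (!model_writeable(sk))
		return -EAGAIN;
	sk->wmem_alloc += len;
	return 0;
}

/*
 * One backlog pass. Stops at the first -EAGAIN and leaves the remaining
 * packets queued so a later pass can retry them. Returns packets sent.
 */
static int model_backlog(struct model_sock *sk, struct model_queue *q)
{
	int sent = 0;

	assert(q->len <= MODEL_QLEN);
	while (q->len > 0) {
		if (model_send(sk, q->pkts[q->head]) == -EAGAIN)
			break;	/* keep the packet; nothing is dropped */
		q->head = (q->head + 1) % MODEL_QLEN;
		q->len--;
		sent++;
	}
	return sent;
}
```

Calling model_backlog() a second time, after the charged bytes have been
released, sends whatever the first pass left on the queue.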
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend Reported-by: kernel test robot --- net/core/skmsg.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index b60768951de2..0bc8679e8033 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -433,10 +433,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, u32 off, u32 len, bool ingress) { - if (ingress) - return sk_psock_skb_ingress(psock, skb); - else + if (!ingress) { + if (!sock_writeable(psock->sk)) + return -EAGAIN; return skb_send_sock_locked(psock->sk, skb, off, len); + } + return sk_psock_skb_ingress(psock, skb); } static void sk_psock_backlog(struct work_struct *work) @@ -712,11 +714,18 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) bool ingress; sk_other = tcp_skb_bpf_redirect_fetch(skb); + /* This error is a buggy BPF program, it returned a redirect + * return code, but then didn't set a redirect interface. + */ if (unlikely(!sk_other)) { kfree_skb(skb); return; } psock_other = sk_psock(sk_other); + /* This error indicates the socket is being torn down or had another + * error that caused the pipe to break. We can't send a packet on + * a socket that is in this state so we drop the skb. 
+ */ if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { kfree_skb(skb); @@ -724,15 +733,8 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) } ingress = tcp_skb_bpf_ingress(skb); - if ((!ingress && sock_writeable(sk_other)) || - (ingress && - atomic_read(&sk_other->sk_rmem_alloc) <= - sk_other->sk_rcvbuf)) { - skb_queue_tail(&psock_other->ingress_skb, skb); - schedule_work(&psock_other->work); - } else { - kfree_skb(skb); - } + skb_queue_tail(&psock_other->ingress_skb, skb); + schedule_work(&psock_other->work); } static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict) From patchwork Fri Oct 9 04:45:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 268858 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 199A9C433E7 for ; Fri, 9 Oct 2020 04:45:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C31CE2223C for ; Fri, 9 Oct 2020 04:45:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="M0aknxh6" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731395AbgJIEpS (ORCPT ); Fri, 9 Oct 2020 00:45:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S1729225AbgJIEpR (ORCPT ); Fri, 9 Oct 2020 00:45:17 -0400 Received: from mail-io1-xd43.google.com (mail-io1-xd43.google.com [IPv6:2607:f8b0:4864:20::d43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C14F1C0613D2; Thu, 8 Oct 2020 21:45:17 -0700 (PDT) Received: by mail-io1-xd43.google.com with SMTP id n6so8793145ioc.12; Thu, 08 Oct 2020 21:45:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=OKQN38whfOCZJQQ7SBZl1Rsoo0C1oj1GmNd56G4D9v0=; b=M0aknxh6Cv2qYdroLmGO2cGPUgfbrHLiUjHEgjvma+MW7F2tweZpyfQKe0zisJg94Z thWokAqgGhekQ2/Y0jEF7uLet8htJWJOoPDV+r49CjYTmZtopG+CD8PFmJC3EKI0H5zF Rml+3y9UNFJ/BISWFsytLDM1cNuGj+Ttu/nLOfiUWF4tPg+AVxarNfPU1fO7bs7e6KF5 SnLQjb8oGQGkfISoPag+bqpnEtQfj6RRG30woIY41t4hEBTleUaACmogpdRdpmQwiYuO PV3SID49gAF6XvZTEOGYUzMKMCNZ2VX3Wa88sYPZQy8dnhBiofeJBBh5UTPYbZ3fKWHz 9NDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=OKQN38whfOCZJQQ7SBZl1Rsoo0C1oj1GmNd56G4D9v0=; b=EkZihpKjdVoSTlk81Ph8oqMyTblyqGvsSAml3+vWlVUOucTuVC/5VHeEtUdr9mhQMq 137J/2MfnmX/hTvnSTNqgdrFgWP9eiBAM4uG8wcvJtglJNvIbHonqk24fBP2unW8W74q M4dZK4Ws5GDMGSYWQFrUIVp5UgjwTGxR9jVTzrA5hqWBtdgySBlSfKdXwpFNArRMrTw1 pUoAx7MsIi+3arp74d67pgc/suU2Tw04aXZw41V9w/f2cB7fx9nErfExboTlb5isIPIx 4+mkzcOQ9xjhCXjp/d/m82DcA8H8WCVisvEPbtLgVQGrOlcBwuf1YWZWyE0tZH/fMlzq YYCg== X-Gm-Message-State: AOAM530ATBkjZRlamvRnE5LC6PxC5DPU+aCvnwxCU0QQdO68BN8vjHXq q7fnXw928YWZw7Th+gmQsmqX+B1bBELaJQ== X-Google-Smtp-Source: ABdhPJzmFgxZsRfy4le4vZmJ554UBvN5RmCpfuBKoE8t8zmpgMj2B94jc2nctLsFm+b/zFfktnqgpA== X-Received: by 2002:a6b:8dcf:: with SMTP id p198mr8359886iod.200.1602218717083; Thu, 08 Oct 2020 21:45:17 -0700 (PDT) Received: from [127.0.1.1] ([184.63.162.180]) by 
smtp.gmail.com with ESMTPSA id b2sm3628794ila.62.2020.10.08.21.45.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 21:45:16 -0700 (PDT)
Subject: [bpf-next PATCH 5/6] bpf, sockmap: Remove skb_orphan and let normal skb_kfree do cleanup
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Thu, 08 Oct 2020 21:45:04 -0700
Message-ID: <160221870378.12042.9140246148992032681.stgit@john-Precision-5820-Tower>
In-Reply-To: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Calling skb_orphan() is unnecessary in the strp rcv handler because the
skb is from a skb_clone() in __strp_recv. So it never has a destructor or
a sk assigned. Plus it's confusing to read because it might hint to the
reader that the skb could have an sk assigned, which is not true. Even if
we did have an sk assigned it would be cleaner to simply wait for the
upcoming kfree_skb().

Additionally, move the comment about the strparser clone up so it's closer
to the logic it is describing, and add to it so that it is more complete.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 0bc8679e8033..ef68749c9104 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -686,15 +686,16 @@ static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 {
 	int ret;
 
+	/* strparser clones the skb before handing it to a upper layer,
+	 * meaning we have the same data, but sk is NULL.
We do want an
+	 * sk pointer though when we run the BPF program. So we set it
+	 * here and then NULL it to ensure we don't trigger a BUG_ON()
+	 * in skb/sk operations later if kfree_skb is called with a
+	 * valid skb->sk pointer and no destructor assigned.
+	 */
 	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
 	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning skb_orphan has been called. We NULL sk on the way out
-	 * to ensure we don't trigger a BUG_ON() in skb/sk operations
-	 * later and because we are not charging the memory of this skb
-	 * to any socket yet.
-	 */
 	skb->sk = NULL;
 	return ret;
 }
@@ -826,7 +827,6 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 	}
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		skb_orphan(skb);
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));

From patchwork Fri Oct 9 04:45:22 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 269324
Subject: [bpf-next PATCH 6/6] bpf, sockmap: Add memory accounting so skbs on ingress lists are visible
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Thu, 08 Oct 2020 21:45:22 -0700
Message-ID: <160221872234.12042.16278651489592613107.stgit@john-Precision-5820-Tower>
In-Reply-To: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
References: <160221803938.12042.6218664623397526197.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Move the skb->sk assignment out of sk_psock_bpf_run() and into the
individual callers. We can then use a proper skb_set_owner_r() call to
assign an sk to an skb. This also charges the skb's truesize against
the socket's sk_rmem_alloc counter, so the memory associated with skbs
on the workqueue is still accounted for somewhere. Finally,
skb_set_owner_r() sets up the destructor, so we can let the normal
kfree_skb() logic recover the memory. Combined with the previous patch
dropping skb_orphan(), we can now recover from memory pressure and
maintain accounting.

Note, we charge the skbs against their originating socket even if
they are being redirected into another socket.
Once the skb completes the redirect op, kfree_skb() gives the memory
back. This is important because if we charged the socket we are
redirecting to (as was done before this series), the sock_writeable()
test could fail because the skb being sent is already charged against
that socket.

The TLS case is special. Here we wait until we have decided not to
simply PASS the packet up the stack. In the case where we PASS the
packet up the stack we already have an skb which is accounted for on
the TLS socket context.

For the parser case we continue to just set/clear skb->sk. This is
because the skb being used here may be combined with other skbs or
turned into multiple skbs depending on the parser logic. For example,
the parser could request a payload length greater than skb->len so
that the strparser needs to collect multiple skbs. At any rate, the
final result will be handled in the strparser recv callback.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index ef68749c9104..cc33ee74d0f6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -684,20 +684,8 @@ EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 			    struct sk_buff *skb)
 {
-	int ret;
-
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning we have the same data, but sk is NULL. We do want an
-	 * sk pointer though when we run the BPF program. So we set it
-	 * here and then NULL it to ensure we don't trigger a BUG_ON()
-	 * in skb/sk operations later if kfree_skb is called with a
-	 * valid skb->sk pointer and no destructor assigned.
-	 */
-	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
-	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	skb->sk = NULL;
-	return ret;
+	return bpf_prog_run_pin_on_cpu(prog, skb);
 }
 
 static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
@@ -738,10 +726,11 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	schedule_work(&psock_other->work);
 }
 
-static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)
+static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
+		skb_set_owner_r(skb, sk);
 		sk_psock_skb_redirect(skb);
 		break;
 	case __SK_PASS:
@@ -759,11 +748,17 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		/* We skip full set_owner_r here because if we do a SK_PASS
+		 * or SK_DROP we can skip skb memory accounting and use the
+		 * TLS context.
+		 */
+		skb->sk = psock->sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
-	sk_psock_tls_verdict_apply(skb, ret);
+	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
 	rcu_read_unlock();
 	return ret;
 }
@@ -825,6 +820,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
+	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
 		tcp_skb_bpf_redirect_clear(skb);
@@ -849,8 +845,11 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_parser);
-	if (likely(prog))
+	if (likely(prog)) {
+		skb->sk = psock->sk;
 		ret = sk_psock_bpf_run(psock, prog, skb);
+		skb->sk = NULL;
+	}
 	rcu_read_unlock();
 	return ret;
 }
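
Reviewer note: the accounting rule described in the changelog (charge
truesize to the originating socket, let the destructor give it back on
free) can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: every model_* name below is
hypothetical, the counter is a plain long rather than the kernel's
atomic sk_rmem_alloc, and forward-allocation charging, locking, and
skb cloning are all left out. It is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the receive counter. */
struct model_sock {
	long rmem_alloc;
};

/* Hypothetical stand-in for struct sk_buff: owner, size, destructor. */
struct model_skb {
	struct model_sock *sk;
	unsigned int truesize;
	void (*destructor)(struct model_skb *skb);
};

/* Destructor: return the charged bytes to the owning socket. */
static void model_rfree(struct model_skb *skb)
{
	skb->sk->rmem_alloc -= skb->truesize;
}

/* Drop any current owner, running its destructor if one is set. */
static void model_orphan(struct model_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
	}
	skb->sk = NULL;
}

/* Assign an owner, install the destructor, and charge truesize. */
static void model_set_owner_r(struct model_skb *skb, struct model_sock *sk)
{
	model_orphan(skb);
	skb->sk = sk;
	skb->destructor = model_rfree;
	sk->rmem_alloc += skb->truesize;
}

/* Free path: the destructor uncharges the owner before the owner is cleared. */
static void model_free(struct model_skb *skb)
{
	assert(skb->destructor == NULL || skb->sk != NULL);
	model_orphan(skb);
}
```

In this model an skb that is charged to its originating socket and then
handed to another socket for redirect never touches the target's
counter, which mirrors the reason given above for not charging the
redirect target: the target's writeability check is not skewed by the
skb that is about to be sent on it.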