From patchwork Fri Oct 9 17:57:38 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 299175
Subject: [bpf-next PATCH v2 1/6] bpf, sockmap: skb verdict SK_PASS to self already checked rmem limits
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:57:38 -0700
Message-ID: <160226625788.4390.13364451138430478477.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

For the sk_skb case where the skb_verdict program returns SK_PASS to
continue passing the packet up the stack, the memory limits were already
checked on the TCP side before enqueuing in skb_queue_tail. So, let's
remove the extra checks here. The theory is that if the TCP stack believes
we have memory to receive the packet, then let's trust the stack and not
double-check the limits.

In fact, the accounting here can cause a drop if sk_rmem_alloc has
increased after the stack accepted this packet, but before the duplicate
check here. Worse, if this happens there is no retransmit, because the TCP
stack already believes the data has been received.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4b5f7c8fecd1..040ae1d75b65 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -771,6 +771,7 @@ EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 static void sk_psock_verdict_apply(struct sk_psock *psock,
 				   struct sk_buff *skb, int verdict)
 {
+	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
 
 	switch (verdict) {
@@ -780,16 +781,12 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 		    !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 			goto out_free;
 		}
-		if (atomic_read(&sk_other->sk_rmem_alloc) <=
-		    sk_other->sk_rcvbuf) {
-			struct tcp_skb_cb *tcp = TCP_SKB_CB(skb);
 
-			tcp->bpf.flags |= BPF_F_INGRESS;
-			skb_queue_tail(&psock->ingress_skb, skb);
-			schedule_work(&psock->work);
-			break;
-		}
-		goto out_free;
+		tcp = TCP_SKB_CB(skb);
+		tcp->bpf.flags |= BPF_F_INGRESS;
+		skb_queue_tail(&psock->ingress_skb, skb);
+		schedule_work(&psock->work);
+		break;
 	case __SK_REDIRECT:
 		sk_psock_skb_redirect(skb);
 		break;

From patchwork Fri Oct 9 17:57:56 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 288690
Subject: [bpf-next PATCH v2 2/6] bpf, sockmap: On receive programs try to fast track SK_PASS ingress
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:57:56 -0700
Message-ID: <160226627645.4390.11671193470778624910.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

When we receive an skb and the ingress skb verdict program returns
SK_PASS, we currently set the ingress flag and put the skb on the
workqueue so it can be turned into a sk_msg and put on the sk_msg ingress
queue. Finally, userspace is told through the data_ready hook.

Here we observe that if the workqueue is empty, we can try to convert the
skb into a sk_msg and call data_ready directly without bouncing through a
workqueue. It's a common pattern to have a recv verdict program for
visibility that always returns SK_PASS. In this case, unless there is an
ENOMEM error or we overrun the socket, we can avoid the workqueue
completely, using it only when we fall back to error cases caused by
memory pressure.

By doing this we eliminate another case where data may be dropped if
errors occur on memory limits in the workqueue.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 040ae1d75b65..455cf5fa0279 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -773,6 +773,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 {
 	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
+	int err = 0;
 
 	switch (verdict) {
 	case __SK_PASS:
@@ -784,8 +785,20 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 
 		tcp = TCP_SKB_CB(skb);
 		tcp->bpf.flags |= BPF_F_INGRESS;
-		skb_queue_tail(&psock->ingress_skb, skb);
-		schedule_work(&psock->work);
+
+		/* If the queue is empty then we can submit directly
+		 * into the msg queue. If its not empty we have to
+		 * queue work otherwise we may get OOO data. Otherwise,
+		 * if sk_psock_skb_ingress errors will be handled by
+		 * retrying later from workqueue.
+		 */
+		if (skb_queue_empty(&psock->ingress_skb)) {
+			err = sk_psock_skb_ingress(psock, skb);
+		}
+		if (err < 0) {
+			skb_queue_tail(&psock->ingress_skb, skb);
+			schedule_work(&psock->work);
+		}
 		break;
 	case __SK_REDIRECT:
 		sk_psock_skb_redirect(skb);

From patchwork Fri Oct 9 17:58:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 299174
Subject: [bpf-next PATCH v2 3/6] bpf, sockmap: remove skb_set_owner_w wmem will be taken later from sendpage
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:58:14 -0700
Message-ID: <160226629445.4390.448415070225757849.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The skb_set_owner_w call is unnecessary here. The sendpage call will
create a fresh skb and set the owner correctly from the workqueue. It's
also not entirely harmless: it consumes cycles, and it also impacts
resource accounting by increasing sk_wmem_alloc. This charges the socket
we are going to send to for the skb, but the skb sits on the workqueue for
some time before the send happens, so we are artificially inflating
sk_wmem_alloc for this period.

Further, we don't know how many skbs will be used to send the packet or
how it will be broken up when sent over the new socket, so charging it as
one big sum is also not correct when the workqueue may break it up if
facing memory pressure.

Seeing as we don't know how or when this is going to be sent, drop the
early accounting. A later patch will do proper accounting, charged on the
receive socket, for the case where skbs get enqueued on the workqueue.

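To make the accounting concern concrete, here is a rough user-space sketch, not kernel code. The struct and the model_* helpers are hypothetical stand-ins, and model_writeable() assumes the usual sock_writeable() rule of "less than half of sk_sndbuf charged". It compares charging the target socket at enqueue time with leaving it untouched until the send.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two socket fields discussed above;
 * this is not the kernel's struct sock.
 */
struct model_sock {
	size_t wmem_alloc;	/* bytes currently charged to the send side */
	size_t sndbuf;		/* send buffer limit */
};

/* Assumption: writeable while less than half of sndbuf is charged,
 * in the spirit of sock_writeable().
 */
static bool model_writeable(const struct model_sock *sk)
{
	return sk->wmem_alloc < (sk->sndbuf >> 1);
}

/* Early accounting: charge the target as soon as the skb is put on
 * the backlog, long before any data is actually sent.
 */
static void model_enqueue_charged(struct model_sock *target, size_t truesize)
{
	target->wmem_alloc += truesize;
}

/* No early accounting: the target is left alone until the send. */
static void model_enqueue_uncharged(struct model_sock *target, size_t truesize)
{
	(void)target;
	(void)truesize;
}
```

With a 4096 byte sndbuf, queueing a single 3000 byte skb through model_enqueue_charged() already makes model_writeable() return false, although nothing has been sent yet; the uncharged variant leaves the socket writeable.
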
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 455cf5fa0279..cab596c02412 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -728,8 +728,6 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	    (ingress &&
 	     atomic_read(&sk_other->sk_rmem_alloc) <=
 	     sk_other->sk_rcvbuf)) {
-		if (!ingress)
-			skb_set_owner_w(skb, sk_other);
 		skb_queue_tail(&psock_other->ingress_skb, skb);
 		schedule_work(&psock_other->work);
 	} else {

From patchwork Fri Oct 9 17:58:32 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 288689
Subject: [bpf-next PATCH v2 4/6] bpf, sockmap: remove dropped data on errors in redirect case
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:58:32 -0700
Message-ID: <160226631218.4390.10523182655030600867.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

In the sk_skb redirect case we didn't handle overrunning sk_rmem_alloc on
ingress redirect or sk_wmem_alloc on egress. Because nothing was
implemented for this, we simply dropped the skb. This meant data could be
dropped if socket memory accounting was in place.

This fixes the dropped data case by moving the memory checks later in the
code, to where we actually do the send or recv. This pushes those checks
into the workqueue and allows us to return an EAGAIN error, which in turn
allows us to try again later from the workqueue.

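The retry-instead-of-drop idea can be sketched in user space as follows. This is only an illustrative model under stated assumptions: the retry_* names, the fixed-size ring, and the "half of sndbuf" writeability rule are all hypothetical and are not taken from skmsg.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define RETRY_QLEN 8

/* Hypothetical socket model: only the send-side accounting fields. */
struct retry_sock {
	size_t wmem_alloc;	/* bytes charged to the send side */
	size_t sndbuf;		/* send buffer limit */
};

/* Hypothetical backlog: a small ring of pending payload sizes. */
struct retry_backlog {
	size_t pending[RETRY_QLEN];
	size_t head;
	size_t count;
};

/* Assumption: writeable while less than half of sndbuf is charged. */
static bool retry_writeable(const struct retry_sock *sk)
{
	return sk->wmem_alloc < (sk->sndbuf >> 1);
}

/* Enqueue unconditionally; the memory check is deferred to the send. */
static int retry_queue(struct retry_backlog *q, size_t len)
{
	if (q->count == RETRY_QLEN)
		return -ENOBUFS;
	q->pending[(q->head + q->count) % RETRY_QLEN] = len;
	q->count++;
	return 0;
}

/* The check happens where the send happens, and asks for a retry
 * instead of dropping the data.
 */
static int retry_send(struct retry_sock *sk, size_t len)
{
	if (!retry_writeable(sk))
		return -EAGAIN;
	sk->wmem_alloc += len;
	return 0;
}

/* One worker pass: drain until the queue is empty or the send asks to
 * retry. Entries that hit -EAGAIN stay queued for a later pass.
 * Returns the number of entries sent in this pass.
 */
static size_t retry_backlog_run(struct retry_backlog *q, struct retry_sock *sk)
{
	size_t sent = 0;

	while (q->count) {
		if (retry_send(sk, q->pending[q->head]) == -EAGAIN)
			break;
		q->head = (q->head + 1) % RETRY_QLEN;
		q->count--;
		sent++;
	}
	return sent;
}
```

In this model, three 1500 byte entries queued against a 4096 byte sndbuf send two on the first pass; the third stays queued and goes out on a later pass once wmem_alloc has dropped, rather than being freed at enqueue time.
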
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cab596c02412..9804ef0354a2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -433,10 +433,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
-	if (ingress)
-		return sk_psock_skb_ingress(psock, skb);
-	else
+	if (!ingress) {
+		if (!sock_writeable(psock->sk))
+			return -EAGAIN;
 		return skb_send_sock_locked(psock->sk, skb, off, len);
+	}
+	return sk_psock_skb_ingress(psock, skb);
 }
 
 static void sk_psock_backlog(struct work_struct *work)
@@ -709,30 +711,28 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
-	bool ingress;
 
 	sk_other = tcp_skb_bpf_redirect_fetch(skb);
+	/* This error is a buggy BPF program, it returned a redirect
+	 * return code, but then didn't set a redirect interface.
+	 */
 	if (unlikely(!sk_other)) {
 		kfree_skb(skb);
 		return;
 	}
 	psock_other = sk_psock(sk_other);
+	/* This error indicates the socket is being torn down or had another
+	 * error that caused the pipe to break. We can't send a packet on
+	 * a socket that is in this state so we drop the skb.
+	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
 	    !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		kfree_skb(skb);
 		return;
 	}
 
-	ingress = tcp_skb_bpf_ingress(skb);
-	if ((!ingress && sock_writeable(sk_other)) ||
-	    (ingress &&
-	     atomic_read(&sk_other->sk_rmem_alloc) <=
-	     sk_other->sk_rcvbuf)) {
-		skb_queue_tail(&psock_other->ingress_skb, skb);
-		schedule_work(&psock_other->work);
-	} else {
-		kfree_skb(skb);
-	}
+	skb_queue_tail(&psock_other->ingress_skb, skb);
+	schedule_work(&psock_other->work);
 }
 
 static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)

From patchwork Fri Oct 9 17:58:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 299173
Subject: [bpf-next PATCH v2 5/6] bpf, sockmap: Remove skb_orphan and let normal skb_kfree do cleanup
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:58:50 -0700
Message-ID: <160226632994.4390.2648269619617632786.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Calling skb_orphan() is unnecessary in the strp rcv handler because the
skb comes from a skb_clone() in __strp_recv, so it never has a destructor
or an sk assigned. Plus, it's confusing to read because it might hint to
the reader that the skb could have an sk assigned, which is not true. Even
if we did have an sk assigned, it would be cleaner to simply wait for the
upcoming kfree_skb().

Additionally, move the comment about the strparser clone up so it's closer
to the logic it describes, and add to it so that it is more complete.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 9804ef0354a2..b017c6104cdc 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -686,15 +686,16 @@ static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 {
 	int ret;
 
+	/* strparser clones the skb before handing it to a upper layer,
+	 * meaning we have the same data, but sk is NULL. We do want an
+	 * sk pointer though when we run the BPF program. So we set it
+	 * here and then NULL it to ensure we don't trigger a BUG_ON()
+	 * in skb/sk operations later if kfree_skb is called with a
+	 * valid skb->sk pointer and no destructor assigned.
+	 */
 	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
 	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning skb_orphan has been called. We NULL sk on the way out
-	 * to ensure we don't trigger a BUG_ON() in skb/sk operations
-	 * later and because we are not charging the memory of this skb
-	 * to any socket yet.
-	 */
 	skb->sk = NULL;
 	return ret;
 }
@@ -824,7 +825,6 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 	}
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		skb_orphan(skb);
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));

From patchwork Fri Oct 9 17:59:07 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 288688
Subject: [bpf-next PATCH v2 6/6] bpf, sockmap: Add memory accounting so skbs on ingress lists are visible
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 10:59:07 -0700
Message-ID: <160226634754.4390.3646137133633320563.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
References: <160226618411.4390.8167055952618723738.stgit@john-Precision-5820-Tower>
User-Agent: StGit/0.17.1-dirty
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Move the skb->sk assignment out of sk_psock_bpf_run() and into the
individual callers. Then we can use a proper skb_set_owner_r() call to
assign an sk to an skb. This improves things by also charging the truesize
against the socket's sk_rmem_alloc counter. With this done we get some
accounting in place to ensure the memory associated with skbs on the
workqueue is still being accounted for somewhere. Finally, by using
skb_set_owner_r() the destructor is set up, so we can just let the normal
kfree_skb() logic recover the memory. Combined with the previous patch
dropping skb_orphan(), we can now recover from memory pressure and
maintain accounting.

Note, we will charge the skbs against their originating socket even if
they are being redirected into another socket. Once the skb completes the
redirect op, kfree_skb() will give the memory back.
This is important because if we charged the socket we are redirecting to
(as was done before this series), the sock_writeable() test could fail
because the skb we are trying to send is already charged against that
socket.

The TLS case is also special. Here we wait to charge the skb until we
have decided not to simply PASS the packet up the stack. In the case
where we PASS the packet up the stack, we already have an skb that is
accounted for on the TLS socket context.

For the parser case we continue to just set/clear skb->sk. This is
because the skb being used here may be combined with other skbs or
turned into multiple skbs depending on the parser logic. For example,
the parser could request a payload length greater than skb->len so that
the strparser needs to collect multiple skbs. At any rate, the final
result will be handled in the strparser recv callback.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b017c6104cdc..a7f1133fa11c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -684,20 +684,8 @@ EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 			    struct sk_buff *skb)
 {
-	int ret;
-
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning we have the same data, but sk is NULL. We do want an
-	 * sk pointer though when we run the BPF program. So we set it
-	 * here and then NULL it to ensure we don't trigger a BUG_ON()
-	 * in skb/sk operations later if kfree_skb is called with a
-	 * valid skb->sk pointer and no destructor assigned.
-	 */
-	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
-	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	skb->sk = NULL;
-	return ret;
+	return bpf_prog_run_pin_on_cpu(prog, skb);
 }
 
 static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
@@ -736,10 +724,11 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	schedule_work(&psock_other->work);
 }
 
-static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)
+static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
+		skb_set_owner_r(skb, sk);
 		sk_psock_skb_redirect(skb);
 		break;
 	case __SK_PASS:
@@ -757,11 +746,17 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		/* We skip full set_owner_r here because if we do a SK_PASS
+		 * or SK_DROP we can skip skb memory accounting and use the
+		 * TLS context.
+		 */
+		skb->sk = psock->sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
-	sk_psock_tls_verdict_apply(skb, ret);
+	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
 	rcu_read_unlock();
 	return ret;
 }
@@ -823,6 +818,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
+	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
 		tcp_skb_bpf_redirect_clear(skb);
@@ -847,8 +843,11 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_parser);
-	if (likely(prog))
+	if (likely(prog)) {
+		skb->sk = psock->sk;
 		ret = sk_psock_bpf_run(psock, prog, skb);
+		skb->sk = NULL;
+	}
 	rcu_read_unlock();
 	return ret;
 }
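
For readers unfamiliar with receive-side accounting, the charge/uncharge
pattern this patch relies on can be sketched in userspace. This is only a
rough model under stated assumptions: every model_* name below is
hypothetical and invented for illustration, none of it is kernel code, and
it ignores atomics, forward-alloc accounting, and locking. It only tries to
show that assigning an owner charges truesize to that socket and installs a
destructor, so that freeing the buffer returns the charge.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's struct sock/sk_buff. */
struct model_sock {
	long rmem_alloc;	/* plays the role of sk_rmem_alloc */
};

struct model_skb {
	struct model_sock *sk;		/* owning socket, NULL if none */
	unsigned int truesize;		/* bytes charged for this buffer */
	void (*destructor)(struct model_skb *skb);
};

/* Uncharge on free, in the spirit of the kernel's sock_rfree(). */
static void model_rfree(struct model_skb *skb)
{
	assert(skb->sk != NULL);
	skb->sk->rmem_alloc -= skb->truesize;
}

/* Charge truesize to sk and install the destructor that undoes it. */
static void model_set_owner_r(struct model_skb *skb, struct model_sock *sk)
{
	assert(skb->sk == NULL);	/* this model assumes an unowned skb */
	skb->sk = sk;
	skb->destructor = model_rfree;
	sk->rmem_alloc += skb->truesize;
}

/* Allocate a zeroed model skb with the given truesize; NULL on failure. */
static struct model_skb *model_alloc_skb(unsigned int truesize)
{
	struct model_skb *skb = calloc(1, sizeof(*skb));

	if (skb)
		skb->truesize = truesize;
	return skb;
}

/* Freeing runs the destructor, which returns the charge to the owner. */
static void model_kfree_skb(struct model_skb *skb)
{
	if (!skb)
		return;
	if (skb->destructor)
		skb->destructor(skb);
	free(skb);
}
```

In this model, an skb charged to a source socket keeps that charge while it
sits on a queue headed for some other socket, and the other socket's counter
is never touched. That mirrors the commit message's point that the redirect
target is not charged, so a writeability check on the target is not skewed
by the skb that is about to be sent to it.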