Subject: [bpf-next PATCH v3 1/6] bpf, sockmap: skb verdict SK_PASS to self already checked rmem limits
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:36:16 -0700
Message-ID: <160226857664.5692.668205469388498375.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

For the sk_skb case where the skb_verdict program returns SK_PASS to continue passing the packet up the stack, the memory limits were already checked on the TCP side before enqueuing with skb_queue_tail(). So let's remove the extra checks here. The theory is that if the TCP stack believes we have memory to receive the packet, then we should trust the stack and not double-check the limits.

In fact, the accounting here can cause a drop if sk_rmem_alloc has increased after the stack accepted this packet but before the duplicate check here. Worse, if this happens there is no retransmit, because the TCP stack already believes the data has been received.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4b5f7c8fecd1..040ae1d75b65 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -771,6 +771,7 @@ EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 static void sk_psock_verdict_apply(struct sk_psock *psock,
 				   struct sk_buff *skb, int verdict)
 {
+	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
 
 	switch (verdict) {
@@ -780,16 +781,12 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 		    !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 			goto out_free;
 		}
-		if (atomic_read(&sk_other->sk_rmem_alloc) <=
-		    sk_other->sk_rcvbuf) {
-			struct tcp_skb_cb *tcp = TCP_SKB_CB(skb);
 
-			tcp->bpf.flags |= BPF_F_INGRESS;
-			skb_queue_tail(&psock->ingress_skb, skb);
-			schedule_work(&psock->work);
-			break;
-		}
-		goto out_free;
+		tcp = TCP_SKB_CB(skb);
+		tcp->bpf.flags |= BPF_F_INGRESS;
+		skb_queue_tail(&psock->ingress_skb, skb);
+		schedule_work(&psock->work);
+		break;
 	case __SK_REDIRECT:
 		sk_psock_skb_redirect(skb);
 		break;
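A minimal userspace sketch of the race described in this patch follows. It is a model, not kernel code: `struct model_sock`, `tcp_accept()`, `concurrent_charge()`, `verdict_pass_old()` and `verdict_pass_new()` are hypothetical names, socket memory is reduced to a single counter, and the "concurrent" charge is simply a second increment that lands between TCP's admission decision and the verdict.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket model: only the receive-memory counters. */
struct model_sock {
	int rmem_alloc; /* stands in for sk_rmem_alloc */
	int rcvbuf;     /* stands in for sk_rcvbuf */
};

/* TCP admits a segment if it fits and charges it; in the real stack the
 * data is now considered received, so it will not be retransmitted. */
bool tcp_accept(struct model_sock *sk, int truesize)
{
	assert(truesize > 0);
	if (sk->rmem_alloc + truesize > sk->rcvbuf)
		return false;
	sk->rmem_alloc += truesize;
	return true;
}

/* Another charge against the same socket that lands after TCP's
 * admission decision (modelled as an unconditional increment). */
void concurrent_charge(struct model_sock *sk, int truesize)
{
	sk->rmem_alloc += truesize;
}

/* Pre-patch SK_PASS: re-check the limit; false means the skb is freed. */
bool verdict_pass_old(const struct model_sock *sk)
{
	return sk->rmem_alloc <= sk->rcvbuf;
}

/* Post-patch SK_PASS: trust TCP's decision and always queue the skb. */
bool verdict_pass_new(const struct model_sock *sk)
{
	(void)sk;
	return true;
}
```

With a 4096-byte receive buffer, accepting 3000 bytes and then charging another 2000 makes the old re-check fail even though TCP has already taken the data, which is the drop-without-retransmit case the patch removes.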
Subject: [bpf-next PATCH v3 2/6] bpf, sockmap: On receive programs try to fast track SK_PASS ingress
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:36:37 -0700
Message-ID: <160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

When we receive an skb and the ingress skb verdict program returns SK_PASS, we currently set the ingress flag and put the skb on the workqueue so it can be turned into an sk_msg and placed on the sk_msg ingress queue. Finally, userspace is notified through the data_ready hook.

Here we observe that if the workqueue is empty, we can try to convert the skb into an sk_msg and call data_ready directly without bouncing through the workqueue. It's a common pattern to have a recv verdict program, used for visibility, that always returns SK_PASS. In this case, unless there is an ENOMEM error or we overrun the socket, we can avoid the workqueue completely and only fall back to it for error cases caused by memory pressure.

By doing this we eliminate another case where data may be dropped if memory-limit errors occur in the workqueue.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 040ae1d75b65..4b160d97b7f9 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -773,6 +773,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 {
 	struct tcp_skb_cb *tcp;
 	struct sock *sk_other;
+	int err = -EIO;
 
 	switch (verdict) {
 	case __SK_PASS:
@@ -784,8 +785,20 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 
 		tcp = TCP_SKB_CB(skb);
 		tcp->bpf.flags |= BPF_F_INGRESS;
-		skb_queue_tail(&psock->ingress_skb, skb);
-		schedule_work(&psock->work);
+
+		/* If the queue is empty then we can submit directly
+		 * into the msg queue. If its not empty we have to
+		 * queue work otherwise we may get OOO data. Otherwise,
+		 * if sk_psock_skb_ingress errors will be handled by
+		 * retrying later from workqueue.
+		 */
+		if (skb_queue_empty(&psock->ingress_skb)) {
+			err = sk_psock_skb_ingress(psock, skb);
+		}
+		if (err < 0) {
+			skb_queue_tail(&psock->ingress_skb, skb);
+			schedule_work(&psock->work);
+		}
 		break;
 	case __SK_REDIRECT:
 		sk_psock_skb_redirect(skb);
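The fast path added in this patch rests on one ordering rule: deliver directly only when nothing is already waiting on ingress_skb, otherwise append to the backlog. The sketch below models just that rule in userspace C. `struct model_psock`, `try_ingress()` (a stand-in for sk_psock_skb_ingress()), `verdict_pass()` and `run_backlog()` are hypothetical, and the failure mode is simplified to a byte limit that returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>

#define QLEN 16

struct pkt {
	int id;
	int size;
};

/* Hypothetical psock model: a FIFO backlog (ingress_skb plus the work
 * item) and the ids of packets already delivered to the socket. */
struct model_psock {
	struct pkt backlog[QLEN];
	int head, tail;          /* FIFO indices; the model never wraps */
	int delivered[QLEN];
	int n_delivered;
	int rmem, limit;         /* receive memory in use / allowed */
};

/* Stand-in for sk_psock_skb_ingress(): may fail under memory pressure. */
int try_ingress(struct model_psock *p, struct pkt k)
{
	if (p->rmem + k.size > p->limit)
		return -EAGAIN;
	p->rmem += k.size;
	p->delivered[p->n_delivered++] = k.id;
	return 0;
}

/* SK_PASS after the patch: the direct path is only taken when the
 * backlog is empty, so a newer packet can never overtake a queued one. */
void verdict_pass(struct model_psock *p, struct pkt k)
{
	int err = -EIO;

	if (p->head == p->tail)              /* skb_queue_empty() */
		err = try_ingress(p, k);
	if (err < 0) {
		assert(p->tail < QLEN);
		p->backlog[p->tail++] = k;   /* skb_queue_tail() + schedule_work() */
	}
}

/* Stand-in for the worker: drain in order, stop at the first failure
 * and leave the remaining packets queued for a later run. */
void run_backlog(struct model_psock *p)
{
	while (p->head != p->tail) {
		if (try_ingress(p, p->backlog[p->head]) < 0)
			return;
		p->head++;
	}
}
```

In the usage exercised by the assertions, packet 3 would fit on its own, but because packet 2 is still queued it is appended behind it, so the socket sees 1, 2, 3 rather than 1, 3, 2.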
Subject: [bpf-next PATCH v3 3/6] bpf, sockmap: remove skb_set_owner_w wmem will be taken later from sendpage
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:36:57 -0700
Message-ID: <160226861708.5692.17964237936462425136.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

The skb_set_owner_w() call is unnecessary here. The sendpage call will create a fresh skb and set the owner correctly from the workqueue. It's also not entirely harmless: it consumes cycles, and it also impacts resource accounting by increasing sk_wmem_alloc. This charges the socket we are going to send to for the skb, but we put the skb on the workqueue for some time before the send happens, so we artificially inflate sk_wmem_alloc for that period. Further, we don't know how many skbs will be used to send the packet or how it will be broken up when sent over the new socket, so charging it as one big sum is also not correct when the workqueue may break it up under memory pressure. Since we don't know how or when this is going to be sent, drop the early accounting.

A later patch will do proper accounting, charged on the receive socket, for the case where skbs get enqueued on the workqueue.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4b160d97b7f9..7389d5d7e7f8 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -728,8 +728,6 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	    (ingress &&
 	     atomic_read(&sk_other->sk_rmem_alloc) <=
 	     sk_other->sk_rcvbuf)) {
-		if (!ingress)
-			skb_set_owner_w(skb, sk_other);
 		skb_queue_tail(&psock_other->ingress_skb, skb);
 		schedule_work(&psock_other->work);
 	} else {
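The accounting side effect of the removed skb_set_owner_w() can be illustrated with a simplified model of the redirect target's send-side counters: while the skb sits on the workqueue, any writable check against the target sees the early charge. The writable test mirrors the shape of sock_writeable() (charged memory below half of the send buffer); `struct model_sock`, `redirect_queue_old()` and `redirect_queue_new()` are hypothetical, and the 2304-byte truesize used in the assertions is an arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical send-side view of the redirect target socket. */
struct model_sock {
	int wmem_alloc; /* stands in for sk_wmem_alloc */
	int sndbuf;     /* stands in for sk_sndbuf */
};

/* Same test shape as sock_writeable(): writable while less than half
 * of the send buffer is charged. */
bool model_writeable(const struct model_sock *sk)
{
	return sk->wmem_alloc < (sk->sndbuf >> 1);
}

/* Pre-patch redirect path: the skb's truesize was charged to the
 * target as soon as it was queued, long before anything was sent. */
void redirect_queue_old(struct model_sock *target, int truesize)
{
	assert(truesize > 0);
	target->wmem_alloc += truesize;
}

/* Post-patch redirect path: nothing is charged to the target at queue
 * time; the later sendpage() accounts for what it actually sends. */
void redirect_queue_new(struct model_sock *target, int truesize)
{
	(void)target;
	(void)truesize;
}
```

After the old path queues a single 2304-byte skb against a 4096-byte send buffer, the target already reports itself as not writable although nothing has been sent yet.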
Subject: [bpf-next PATCH v3 4/6] bpf, sockmap: remove dropped data on errors in redirect case
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:37:17 -0700
Message-ID: <160226863689.5692.13861422742592309285.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

In the sk_skb redirect case, we didn't handle overrunning sk_rmem_alloc on ingress redirect or sk_wmem_alloc on egress. Because nothing was implemented for this, we simply dropped the skb. This meant data could be dropped whenever socket memory accounting was in effect.

This fixes the dropped-data case above by moving the memory checks later in the code, to where we actually do the send or receive. This pushes those checks into the workqueue and allows us to return an EAGAIN error, which in turn allows us to try again later from the workqueue.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 7389d5d7e7f8..880b84baab5e 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -433,10 +433,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
-	if (ingress)
-		return sk_psock_skb_ingress(psock, skb);
-	else
+	if (!ingress) {
+		if (!sock_writeable(psock->sk))
+			return -EAGAIN;
 		return skb_send_sock_locked(psock->sk, skb, off, len);
+	}
+	return sk_psock_skb_ingress(psock, skb);
 }
 
 static void sk_psock_backlog(struct work_struct *work)
@@ -709,30 +711,28 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
-	bool ingress;
 
 	sk_other = tcp_skb_bpf_redirect_fetch(skb);
+	/* This error is a buggy BPF program, it returned a redirect
+	 * return code, but then didn't set a redirect interface.
+	 */
 	if (unlikely(!sk_other)) {
 		kfree_skb(skb);
 		return;
 	}
 	psock_other = sk_psock(sk_other);
+	/* This error indicates the socket is being torn down or had another
+	 * error that caused the pipe to break. We can't send a packet on
+	 * a socket that is in this state so we drop the skb.
+	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
 	    !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		kfree_skb(skb);
 		return;
 	}
 
-	ingress = tcp_skb_bpf_ingress(skb);
-	if ((!ingress && sock_writeable(sk_other)) ||
-	    (ingress &&
-	     atomic_read(&sk_other->sk_rmem_alloc) <=
-	     sk_other->sk_rcvbuf)) {
-		skb_queue_tail(&psock_other->ingress_skb, skb);
-		schedule_work(&psock_other->work);
-	} else {
-		kfree_skb(skb);
-	}
+	skb_queue_tail(&psock_other->ingress_skb, skb);
+	schedule_work(&psock_other->work);
 }
 
 static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)
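The retry behaviour this patch relies on can be sketched as a backlog loop that stops on -EAGAIN and leaves the skb at the head of the queue instead of freeing it. This is a userspace model under simplifying assumptions: `struct model_psock`, `handle_skb_egress()` and `backlog_run()` are hypothetical, only the egress branch is shown, and "sending" just adds to a counter rather than calling skb_send_sock_locked().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define QLEN 8

/* Hypothetical target socket: send-side counters plus bytes sent. */
struct model_sock {
	int wmem_alloc, sndbuf, sent;
};

/* Hypothetical psock: queued payload lengths (like ingress_skb). */
struct model_psock {
	int len[QLEN];
	int head, tail;
	struct model_sock *sk;
};

/* Half-of-sndbuf test, same shape as sock_writeable(). */
bool model_writeable(const struct model_sock *sk)
{
	return sk->wmem_alloc < (sk->sndbuf >> 1);
}

/* Egress half of the patched sk_psock_handle_skb(): the space check now
 * happens at send time and reports -EAGAIN instead of dropping. */
int handle_skb_egress(struct model_psock *p, int len)
{
	assert(len > 0);
	if (!model_writeable(p->sk))
		return -EAGAIN;
	p->sk->sent += len; /* stand-in for skb_send_sock_locked() */
	return len;
}

/* Simplified worker: on -EAGAIN keep the skb queued and stop; a later
 * run picks it up again. Nothing is freed on this path. */
void backlog_run(struct model_psock *p)
{
	while (p->head != p->tail) {
		int ret = handle_skb_egress(p, p->len[p->head]);

		if (ret == -EAGAIN)
			return;
		p->head++;
	}
}
```

In the assertions, the first run sends nothing and drops nothing; once the target's charged write memory falls, a later run delivers both queued payloads in order.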
Subject: [bpf-next PATCH v3 5/6] bpf, sockmap: Remove skb_orphan and let normal skb_kfree do cleanup
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:37:35 -0700
Message-ID: <160226865548.5692.9098315689984599579.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

Calling skb_orphan() is unnecessary in the strp rcv handler because the skb comes from an skb_clone() in __strp_recv(), so it never has a destructor or an sk assigned. Plus, it's confusing to read because it might hint to the reader that the skb could have an sk assigned, which is not true. Even if we did have an sk assigned, it would be cleaner to simply wait for the upcoming kfree_skb().

Additionally, move the comment about the strparser clone up so it's closer to the logic it describes, and expand it so that it is more complete.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 880b84baab5e..3e78f2a80747 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -686,15 +686,16 @@ static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 {
 	int ret;
 
+	/* strparser clones the skb before handing it to a upper layer,
+	 * meaning we have the same data, but sk is NULL. We do want an
+	 * sk pointer though when we run the BPF program. So we set it
+	 * here and then NULL it to ensure we don't trigger a BUG_ON()
+	 * in skb/sk operations later if kfree_skb is called with a
+	 * valid skb->sk pointer and no destructor assigned.
+	 */
 	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
 	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning skb_orphan has been called. We NULL sk on the way out
-	 * to ensure we don't trigger a BUG_ON() in skb/sk operations
-	 * later and because we are not charging the memory of this skb
-	 * to any socket yet.
-	 */
 	skb->sk = NULL;
 	return ret;
 }
@@ -824,7 +825,6 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 	}
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		skb_orphan(skb);
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
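The argument that skb_orphan() is a no-op here can be checked against a small model of its logic: run and clear the destructor if one exists, otherwise require that no sk is set. `struct model_skb`, `model_skb_orphan()` and `model_clone()` are hypothetical userspace stand-ins. `model_clone()` simply encodes the commit message's claim that the strparser clone carries neither an sk nor a destructor, and assert() plays the role of the kernel's BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

struct model_sock {
	int rmem_alloc;
};

struct model_skb {
	struct model_sock *sk;
	void (*destructor)(struct model_skb *skb);
};

/* Modelled on skb_orphan(): with a destructor, run it and detach the
 * owner; without one, sk must already be NULL. */
void model_skb_orphan(struct model_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;
	} else {
		assert(skb->sk == NULL);
	}
}

/* Per the commit message: the clone handed up by the strparser starts
 * with no owning socket and no destructor. */
struct model_skb model_clone(const struct model_skb *orig)
{
	struct model_skb n = *orig;

	n.sk = NULL;
	n.destructor = NULL;
	return n;
}
```

Orphaning such a clone leaves every field unchanged, which is why the call could be dropped without affecting behaviour.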
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731967AbgJISiS (ORCPT ); Fri, 9 Oct 2020 14:38:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730763AbgJISiI (ORCPT ); Fri, 9 Oct 2020 14:38:08 -0400 Received: from mail-io1-xd42.google.com (mail-io1-xd42.google.com [IPv6:2607:f8b0:4864:20::d42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EFD3C0613D2; Fri, 9 Oct 2020 11:38:08 -0700 (PDT) Received: by mail-io1-xd42.google.com with SMTP id u19so11118631ion.3; Fri, 09 Oct 2020 11:38:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=PWLTwN4CpghStBRl2aJ9sg/Q1cHv19C1w9Szy0Ser+Y=; b=nfOwx/p1otqgvR3E+N6H79n2XaOfuKt4BFtIqs8dJW/Vjcv48EAz1lQm33rCRxAHt3 C9BCIchvyj/Folr2/UIyyciPrPAXAPe0vsvYgwDvzlZWo4MDBTPtcHLnKnhCeJOGVY/H WmxsB+b/3e1gnXiJbYrgWwiH2KwqlvGigkXKvG/wN6Q9ZQ1MwW0t+1WX0UhwAyp5ZC1x Ank+lbKhrQjswp8dgoaKhTUmzYqoXdXFRuqfOWcBdsNFnbgj9ac6ungGaCjm0950D03m VLlKOiq5N7tlQxM3/z/JFy2S3sH1hPRQPs4Gm2Xu4B6p/MCx76mEeZGN7Xhv2I09yk94 +10g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=PWLTwN4CpghStBRl2aJ9sg/Q1cHv19C1w9Szy0Ser+Y=; b=D7/ynt7sAOuSDE4t54fd4klm6ktvLUGvFBCaCa/oj9kKoMI7e0NllOw1fYTHdC0yFo fRukBsGsh/vswHZr5P1IOEXfCtuEOBqt4dFfrmR5Grygel2MFbazfArChoAwMWFZ5SQ9 22vDZm6wDIW6hwZRNyLC4ggrRdAUUKlwBCQV8UVU3nqzPHofHWeR9bP4lgbdgXT2/9ZY ycG3PYOeoxhE1Z2dtcRIpiHXJlth9nNw1voM3T1FJJLepIlj/UPLU+bGFGoZNIq4wPZq g1GqGamRVvTGfdALdaQ5B8isISFbolGHQlAKgF0s+Q4BA79Ev6h9CQqT/CfaYNm/3/yk qWsw== X-Gm-Message-State: AOAM5334/Pwe7+cGlOoe/w/TnqOd4MtbeU6mqILm+wChCFWuxV2tI7P1 QvoGs/1dMXmbl91HBkGILoI= X-Google-Smtp-Source: 
Subject: [bpf-next PATCH v3 6/6] bpf, sockmap: Add memory accounting so skbs on ingress lists are visible
From: John Fastabend
To: john.fastabend@gmail.com, alexei.starovoitov@gmail.com, daniel@iogearbox.net
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, jakub@cloudflare.com, lmb@cloudflare.com
Date: Fri, 09 Oct 2020 11:37:55 -0700
Message-ID: <160226867513.5692.10579573214635925960.stgit@john-Precision-5820-Tower>
In-Reply-To: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>
References: <160226839426.5692.13107801574043388675.stgit@john-Precision-5820-Tower>

Move the skb->sk assignment out of sk_psock_bpf_run() and into the
individual callers. Then we can use a proper skb_set_owner_r() call to
assign a sk to a skb. This improves things by also charging the truesize
against the socket's sk_rmem_alloc counter. With this done we have
accounting in place to ensure the memory associated with skbs on the
workqueue is still being accounted for somewhere. Finally, by using
skb_set_owner_r() the destructor is set up, so we can just let the normal
kfree_skb() logic recover the memory. Combined with the previous patch
dropping skb_orphan(), we can now recover from memory pressure and
maintain accounting.

Note, we will charge the skbs against their originating socket even if
they are being redirected into another socket. Once the skb completes the
redirect op, kfree_skb() will give the memory back.
This is important because if we charged the socket we are redirecting to
(as was done before this series), the sock_writeable() test could fail
because the skb being sent is already charged against that socket.

The TLS case is also special. Here we wait until we have decided not to
simply PASS the packet up the stack. In the case where we PASS the packet
up the stack, we already have an skb which is accounted for on the TLS
socket context.

For the parser case we continue to just set/clear skb->sk, because the
skb being used here may be combined with other skbs or turned into
multiple skbs depending on the parser logic. For example, the parser
could request a payload length greater than skb->len, so that the
strparser needs to collect multiple skbs. In any case, the final result
will be handled in the strparser recv callback.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend
---
 net/core/skmsg.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 3e78f2a80747..881a5b290946 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -684,20 +684,8 @@ EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 static int sk_psock_bpf_run(struct sk_psock *psock, struct bpf_prog *prog,
 			    struct sk_buff *skb)
 {
-	int ret;
-
-	/* strparser clones the skb before handing it to a upper layer,
-	 * meaning we have the same data, but sk is NULL. We do want an
-	 * sk pointer though when we run the BPF program. So we set it
-	 * here and then NULL it to ensure we don't trigger a BUG_ON()
-	 * in skb/sk operations later if kfree_skb is called with a
-	 * valid skb->sk pointer and no destructor assigned.
-	 */
-	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
-	ret = bpf_prog_run_pin_on_cpu(prog, skb);
-	skb->sk = NULL;
-	return ret;
+	return bpf_prog_run_pin_on_cpu(prog, skb);
 }
 
 static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
@@ -736,10 +724,11 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	schedule_work(&psock_other->work);
 }
 
-static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)
+static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
+		skb_set_owner_r(skb, sk);
 		sk_psock_skb_redirect(skb);
 		break;
 	case __SK_PASS:
@@ -757,11 +746,17 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
+		/* We skip full set_owner_r here because if we do a SK_PASS
+		 * or SK_DROP we can skip skb memory accounting and use the
+		 * TLS context.
+		 */
+		skb->sk = psock->sk;
 		tcp_skb_bpf_redirect_clear(skb);
 		ret = sk_psock_bpf_run(psock, prog, skb);
 		ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+		skb->sk = NULL;
 	}
-	sk_psock_tls_verdict_apply(skb, ret);
+	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
 	rcu_read_unlock();
 	return ret;
 }
@@ -823,6 +818,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 		kfree_skb(skb);
 		goto out;
 	}
+	skb_set_owner_r(skb, sk);
 	prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
 		tcp_skb_bpf_redirect_clear(skb);
@@ -847,8 +843,11 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
 
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.skb_parser);
-	if (likely(prog))
+	if (likely(prog)) {
+		skb->sk = psock->sk;
 		ret = sk_psock_bpf_run(psock, prog, skb);
+		skb->sk = NULL;
+	}
 	rcu_read_unlock();
 	return ret;
 }
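The accounting scheme this commit message describes can be illustrated with a small user-space model. It is a sketch under stated assumptions, not kernel code: struct toy_sock, struct toy_skb and every toy_* function are hypothetical stand-ins invented for illustration. They mimic only the behaviour relevant here: skb_set_owner_r() charging truesize to sk_rmem_alloc and installing a destructor (sock_rfree()), kfree_skb() running that destructor, skb_orphan() refusing an skb that has an sk but no destructor, the parser path that only borrows skb->sk while the program runs, and the TLS path that charges the socket only on a redirect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, NOT the kernel's struct sock/sk_buff.
 * Only the fields the commit message talks about are modelled. */
struct toy_sock {
	int rmem_alloc;			/* models sk->sk_rmem_alloc */
};

struct toy_skb {
	struct toy_sock *sk;
	int truesize;
	void (*destructor)(struct toy_skb *skb);
};

enum toy_verdict { TOY_PASS, TOY_DROP, TOY_REDIRECT };

/* Models sock_rfree(): return the charged truesize to the owning socket. */
static void toy_rfree(struct toy_skb *skb)
{
	skb->sk->rmem_alloc -= skb->truesize;
}

/* Models skb_orphan(): run the destructor and drop the owner. Like the
 * kernel helper, an skb with an sk but no destructor is treated as a bug. */
static void toy_orphan(struct toy_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;
	} else {
		assert(skb->sk == NULL);	/* stands in for BUG_ON(skb->sk) */
	}
}

/* Models skb_set_owner_r(): orphan, then charge truesize to sk and install
 * a destructor so that freeing the skb uncharges it automatically. */
static void toy_set_owner_r(struct toy_skb *skb, struct toy_sock *sk)
{
	toy_orphan(skb);
	skb->sk = sk;
	skb->destructor = toy_rfree;
	sk->rmem_alloc += skb->truesize;
}

/* Models kfree_skb() as far as accounting goes: run the destructor, free. */
static void toy_kfree_skb(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	free(skb);
}

/* Parser-style run: let the program see an sk, then clear it again, so no
 * memory is charged and no "sk without destructor" skb escapes. */
static int toy_run_borrowed(struct toy_skb *skb, struct toy_sock *sk,
			    int (*prog)(const struct toy_skb *skb))
{
	int ret;

	skb->sk = sk;
	ret = prog(skb);
	skb->sk = NULL;
	return ret;
}

/* Hypothetical program: redirects only if it can see a socket. */
static int toy_prog(const struct toy_skb *skb)
{
	return skb->sk ? TOY_REDIRECT : TOY_DROP;
}

/* TLS-style verdict: only a redirect charges the socket; PASS/DROP leave
 * accounting to the TLS context, which is not modelled here. */
static void toy_tls_verdict_apply(struct toy_skb *skb, struct toy_sock *sk,
				  int verdict)
{
	if (verdict == TOY_REDIRECT)
		toy_set_owner_r(skb, sk);
}
```

In this model a redirected skb stays charged to the originating socket (its rmem_alloc equals the skb's truesize) until toy_kfree_skb() runs the destructor, while a PASS verdict on the TLS path charges nothing. Removing the final skb->sk = NULL from toy_run_borrowed() would make a later toy_set_owner_r() call trip the assert, mirroring the BUG_ON() mentioned in the sk_psock_bpf_run() comment.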