From patchwork Mon Nov 16 22:27:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 325109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CA85C388F9 for ; Mon, 16 Nov 2020 22:28:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AC06A22447 for ; Mon, 16 Nov 2020 22:28:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QpCfn048" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729723AbgKPW2C (ORCPT ); Mon, 16 Nov 2020 17:28:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728982AbgKPW2B (ORCPT ); Mon, 16 Nov 2020 17:28:01 -0500 Received: from mail-oi1-x242.google.com (mail-oi1-x242.google.com [IPv6:2607:f8b0:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82CA9C0613CF; Mon, 16 Nov 2020 14:28:01 -0800 (PST) Received: by mail-oi1-x242.google.com with SMTP id d9so20517770oib.3; Mon, 16 Nov 2020 14:28:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=IosZaA2J4WB04PBn1llEGI1Q5uikyxZcnchHfDV0ON0=; b=QpCfn048Rb2F2BjKhWu0wJhZm8nH8/AAlMT1F4Nt7xMppyJyhsrkI9aiKlqDAaO+OE PpGNsJ5wdUeitX2n2LbBAjXmFJvsSMsg0ohW/dI/jCuL9FiiJCJCYmLXrtc5+KWwHIx2 D+FpBPVgk5Wl8i8eD7VX7udSQGPX7fp1T9sR+SrwahreJJwEaaB/IZAKFPvNn8o+5btu 9vr3rIMzOz1/DTMy0kSIBNmfqNdikeX7/auzu2hZgNmjZR0nsze603Soz03v7mfGdHiA 5KwG+1rUDDF73nL9D/6oPvB0EL34WM0+QcEWLVdbZfQnJIV+uF+G0guh28r1z5V9R62M J0Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=IosZaA2J4WB04PBn1llEGI1Q5uikyxZcnchHfDV0ON0=; b=JLKwwsUgehMKeeLam+YUo6Y64Uwo3TYTT2H75nPirWYtLcvjDUGRq0VpaeOUYYGvGE E5fAuBIU6cx0gY2k9C0kuzEVZKnggtXkEOuQu0ETywirfVCXrh7KzqWuQS3hcuEnIa/F oPp2Ay//Flhzu8SlutDaq4IX1eKVs4c58pcoYyHc/fre/5XD2n37/qPnQm4M2oUMG+gW 1zHfkIIzM6T7WYyQ2TbvPMnbjlam/DpAxc9Xt+8+DK308fCytKa1Dbiobd15mFsLsfh3 WnIFylV0AdFMIVX+3nvzGE6o8y/LjgVViV5NvbzMMwy4cSlf6DH9WyC8ejlXpAdCux+G ylAQ== X-Gm-Message-State: AOAM530SChHaSqPnM00UDz5nBeQycsxcjU0EtZ1fspwz8nOqUCTlsYra H5YebTy+vaH8g4YtlhpFD80+HsE1EWG1Gg== X-Google-Smtp-Source: ABdhPJzrs4QSdqjppUHQQp2IsbW+JMz+NMoR2Oe+/nFhYFhQAEpMqmvTKau6XqIIGs1Tesus7ZZOsQ== X-Received: by 2002:aca:a988:: with SMTP id s130mr583643oie.172.1605565680656; Mon, 16 Nov 2020 14:28:00 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id v82sm4971383oif.27.2020.11.16.14.27.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Nov 2020 14:28:00 -0800 (PST) Subject: [bpf PATCH v3 1/6] bpf, sockmap: fix partial copy_page_to_iter so progress can still be made From: John Fastabend To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Date: Mon, 16 Nov 2020 14:27:46 -0800 Message-ID: <160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370> In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370> References: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370> User-Agent: StGit/0.23-36-gc01b MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If copy_page_to_iter() fails or even partially completes, but with fewer bytes copied than expected we currently reset sg.start and return EFAULT. This proves problematic if we already copied data into the user buffer before we return an error. Because we leave the copied data in the user buffer and fail to unwind the scatterlist so kernel side believes data has been copied and user side believes data has _not_ been received. Expected behavior should be to return number of bytes copied and then on the next read we need to return the error assuming its still there. This can happen if we have a copy length spanning multiple scatterlist elements and one or more complete before the error is hit. The error is rare enough though that my normal testing with server side programs, such as nginx, httpd, envoy, etc., I have never seen this. The only reliable way to reproduce that I've found is to stream movies over my browser for a day or so and wait for it to hang. Not very scientific, but with a few extra WARN_ON()s in the code the bug was obvious. When we review the errors from copy_page_to_iter() it seems we are hitting a page fault from copy_page_to_iter_iovec() where the code checks fault_in_pages_writeable(buf, copy) where buf is the user buffer. It also seems typical server applications don't hit this case. The other way to try and reproduce this is run the sockmap selftest tool test_sockmap with data verification enabled, but it doesn't reproduce the fault. Perhaps we can trigger this case artificially somehow from the test tools. I haven't sorted out a way to do that yet though. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/ipv4/tcp_bpf.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 37f4cb2bba5c..8e950b0bfabc 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -15,8 +15,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, { struct iov_iter *iter = &msg->msg_iter; int peek = flags & MSG_PEEK; - int i, ret, copied = 0; struct sk_msg *msg_rx; + int i, copied = 0; msg_rx = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); @@ -37,11 +37,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, page = sg_page(sge); if (copied + copy > len) copy = len - copied; - ret = copy_page_to_iter(page, sge->offset, copy, iter); - if (ret != copy) { - msg_rx->sg.start = i; - return -EFAULT; - } + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; copied += copy; if (likely(!peek)) { @@ -56,6 +54,11 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, put_page(page); } } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. + */ + if (copy != sge->length) + return copied; sk_msg_iter_var_next(i); } From patchwork Mon Nov 16 22:29:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 325108 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FB35C6379D for ; Mon, 16 Nov 2020 22:29:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 54B0D20E65 for ; Mon, 16 Nov 2020 22:29:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EtB0W5/A" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730106AbgKPW3X (ORCPT ); Mon, 16 Nov 2020 17:29:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726543AbgKPW3X (ORCPT ); Mon, 16 Nov 2020 17:29:23 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A327C0613CF; Mon, 16 Nov 2020 14:29:23 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id k19so11878094oic.12; Mon, 16 Nov 2020 14:29:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=wz0IJ/hlMcfL+ARf40dglrvKGm0TqnMJcE8E8aIInjk=; b=EtB0W5/AjEzM//nTSHunLzogOf83F88aMH7RG1i4TamCBAY2986NN9zbCfsTHv5RRI gKoOGXG+tOIr77aPTs/VicUljIJdNPPCAtAiWqSvMhHTI+nSpJ/LKC3G9wsdezx4O3v9 hVNE/2Tzi/llHqRHr5zEe/sV9wvhXuLWq+ye+R7fOGPHS/dNmbYHA/fHV7Pu/NNyjmhP uel6pVymrO2bgkZh05JS6svmDCrMlVVrHhdbkk6d3Puz+3SZsiYA1biX0jcwU7qn9kEL AMjqWI6ev5qDs/VS4L8FnM8cQT4J86X9U05mKTdUpbVLiL0sK6uSq02qIzgSvZ4OVrb2 MnuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=wz0IJ/hlMcfL+ARf40dglrvKGm0TqnMJcE8E8aIInjk=; b=eUhh2zSSBblzST/RPz23/WVRPb6u8Dpqsh1xmy2sWSjEy6EH2EFUKeQCh6dVK6W3P7 5UdqoUGvxyMu1jP3+Lzl2X1NDMW1HJI6CJCC5ZtRDUjm0g9FQjL9yYDRNY+2DYGip67M Q5UY2VGRnUVcsDur9OTgP70ofws99memQEYE4KP8X1Z8VXnRLpExIOXNZrS8SB/5qq/t qTE5C/KYD0FDFnDhS4V/mnYrBiJFgWoItBRHCLQg6tEMpA+O4wYoa9NF9V1NcrrLO19C MVz+lDWL9a7Ao3A1id2VPvzNj3vE7i2l97+y17SvQcoxL4WW0gEkyQYIr9mcrDjXF7jl ZizQ== X-Gm-Message-State: AOAM531I5s8hVFwrp3E3S3DLfGGJnLI9OguEe7FR+O6ZWKA4aa9mvLfI BhJZ0V39qYnYCPCiQ+uTl4zhYMZCw4Bcrw== X-Google-Smtp-Source: ABdhPJxONS2U765bzX7oLx2jORWygnzCVebNGLXvmvjq14UCLmSS0mobW8L8doT7SrGD2+GRp0gi3A== X-Received: by 2002:a05:6808:494:: with SMTP id z20mr581408oid.10.1605565762485; Mon, 16 Nov 2020 14:29:22 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id o135sm5229313ooo.38.2020.11.16.14.29.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Nov 2020 14:29:21 -0800 (PST) Subject: [bpf PATCH v3 5/6] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self From: John Fastabend To: jakub@cloudflare.com, ast@kernel.org, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Date: Mon, 16 Nov 2020 14:29:08 -0800 Message-ID: <160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370> In-Reply-To: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370> References: <160556562395.73229.12161576665124541961.stgit@john-XPS-13-9370> User-Agent: StGit/0.23-36-gc01b MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If the skb_verdict_prog redirects an skb knowingly to itself, fix your BPF program this is not optimal and an abuse of the API please use SK_PASS. That said there may be cases, such as socket load balancing, where picking the socket is hashed based or otherwise picks the same socket it was received on in some rare cases. If this happens we don't want to confuse userspace giving them an EAGAIN error if we can avoid it. To avoid double accounting in these cases. At the moment even if the skb has already been charged against the sockets rcvbuf and forward alloc we check it again and do set_owner_r() causing it to be orphaned and recharged. For one this is useless work, but more importantly we can have a case where the skb could be put on the ingress queue, but because we are under memory pressure we return EAGAIN. The trouble here is the skb has already been accounted for so any rcvbuf checks include the memory associated with the packet already. This rolls up and can result in unecessary EAGAIN errors in userspace read() calls. Fix by doing an unlikely check and skipping checks if skb->sk == sk. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend Reviewed-by: Jakub Sitnicki --- net/core/skmsg.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 9aed5a2c7c5b..514bc9f6f8ae 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -442,11 +442,19 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, return copied; } +static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb); + static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) { struct sock *sk = psock->sk; struct sk_msg *msg; + /* If we are receiving on the same sock skb->sk is already assigned, + * skip memory accounting and owner transition seeing it already set + * correctly. + */ + if (unlikely(skb->sk == sk)) + return sk_psock_skb_ingress_self(psock, skb); msg = sk_psock_create_ingress_msg(sk, skb); if (!msg) return -EAGAIN;