From patchwork Tue Mar 23 00:37:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408752 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0F5FC433EC for ; Tue, 23 Mar 2021 00:39:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A892E619AE for ; Tue, 23 Mar 2021 00:39:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231267AbhCWAi4 (ORCPT ); Mon, 22 Mar 2021 20:38:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230166AbhCWAiS (ORCPT ); Mon, 22 Mar 2021 20:38:18 -0400 Received: from mail-qv1-xf33.google.com (mail-qv1-xf33.google.com [IPv6:2607:f8b0:4864:20::f33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1FCEDC061574; Mon, 22 Mar 2021 17:38:18 -0700 (PDT) Received: by mail-qv1-xf33.google.com with SMTP id h3so9647173qvh.8; Mon, 22 Mar 2021 17:38:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=azpBQutchzCr3FplNtXl63QhozAq1NlwxEgz2eiaadY=; b=WTeos/u9ctlf+UF+ERbZ4ffmp0Z6ja9sDED3log40MO7KxDweIGvMlvFBWW58FzRwQ ExiqqrLVWUwbEbugB38tqWsYRqjRnLr60jEEeP0VbDfNg0AAWQmBf7s8JoHZ5bX7K08a E1+KyrEAYDtFUn/RuVdEymB1tC+0KflMTFV+H4hTEgKZiD2nYyObXIQaYXROi1TxekUj LbYClIkhO9HFULdCLq9dRR6oR/4L+lhX3g82OR8Xm1s5FRGjWUx9L+VYgZykzud/Qy/Q EMURF92OutO2xf/cTZncaY+yLOeAVoXZInlpZVNv6Om61kEtKxbck6gF1VeGsfEHgsHQ T7tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=azpBQutchzCr3FplNtXl63QhozAq1NlwxEgz2eiaadY=; b=ai0E6c09POhp4pLtRqDENzz0cmLbHninuQfXRP8hW8kcSYaP8qfqB4pshQ8tkR31xd UEO5A2cv3hLD/ANxWeNtvmkjyj5kCRJCyVDBblLQ/FNhrqqKBtMdLqpiYjH4A21qk7ok AZh0qvjZOVXqBs9o0jpwyuwDigSbttrBnORNBW46uVClR+aEx5sPkWI2eYb1Zu1yIifD mT3klmVGxXquYzrhT/xxV7tKVChxyMyEZSD3CPcAscZs1w6JeqC1hf/vRCo4kqeRvp9S Gt4HhzKP6RQZH5CL5KtZ3ohlEHEroqFIcnldX2ujV4q5+jD5CnkPgJHz7FXsTwhdVJ7z PcaQ== X-Gm-Message-State: AOAM533OHkZIA/0bxK5Yff/vBZzsciJiui6nq50sFP/RXXEMS3M+5gQW 9ZwnTY460uKfzGo8un9zSQE4ERzv8p7tmQ== X-Google-Smtp-Source: ABdhPJw0pzXwDKHdLNCw3KFOaWBhYYf1E4yAHymf5a96Qnm0/5HK3XHpWzR5BnvXbdH7ip1lBVspNA== X-Received: by 2002:a05:6214:d47:: with SMTP id 7mr2422982qvr.48.1616459897261; Mon, 22 Mar 2021 17:38:17 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:16 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Lorenz Bauer , Jakub Sitnicki Subject: [Patch bpf-next v6 01/12] skmsg: lock ingress_skb when purging Date: Mon, 22 Mar 2021 17:37:57 -0700 Message-Id: <20210323003808.16074-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Cc: John Fastabend Cc: Daniel Borkmann Cc: Lorenz Bauer Acked-by: Jakub Sitnicki Signed-off-by: Cong Wang --- net/core/skmsg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 07f54015238a..bebf84ed4e30 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -634,7 +634,7 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) { struct sk_buff *skb; - while ((skb = __skb_dequeue(&psock->ingress_skb)) != NULL) { + while ((skb = skb_dequeue(&psock->ingress_skb)) != NULL) { skb_bpf_redirect_clear(skb); kfree_skb(skb); } From patchwork Tue Mar 23 00:37:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407436 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B16A0C433C1 for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6B6AF6199F for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231305AbhCWAjA (ORCPT ); Mon, 22 Mar 2021 20:39:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230401AbhCWAiU (ORCPT ); Mon, 22 Mar 2021 20:38:20 -0400 Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C15C7C061574; Mon, 22 Mar 2021 17:38:19 -0700 (PDT) Received: by mail-qk1-x733.google.com with SMTP id v70so12713080qkb.8; Mon, 22 Mar 2021 17:38:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bkt2WplmLy1bOLMwQLt5MnCSbjUaxe7m0LNKG1XmAxE=; b=CofJadLseqZ7nx3cLMXTABG1HGsodAMCGAl3L2QBEmiWTbww6nUQf6ENyELdpaykGZ 0aAgSN0lH6x0yoxtDj+omgDNKqyergbTPcfdAJPew3kWEl5Nu6f/4uFa4yCaCgruS0oX +37Dg+yIfBPF6uSwJUPwk7kBV078Wl26QOwehzSmn6wir8PgOxyw7E7t3c5W/r5DXQQX LJZDlbiMAqRYrnin4JI2AJMULK8p//r9p+mDFZZhdUaVP/z9MPJXnGRryYYtLd+oMH4I GxiAF3K4l5ap8fmELSW7zAe/8nKKmjbEAtVU/i8tr45qUg9B5EiiU/DZlEveCe8Pvgn7 L3+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bkt2WplmLy1bOLMwQLt5MnCSbjUaxe7m0LNKG1XmAxE=; b=krNor7UKH+150mypyfEkN+M9iOSw0HvZnACq+gunQUT40qltSwyv//Yb3e1HQwLXfL 6+jrOza8qB+o+h4/yfvJk+FKTo69VPxkf9Z8u2vnoHd8eKO5FnY+W6Aed1NBq/jDW5xS jcBtCSqmAVunDYLuZmgR9hC7O56Wiqx+lIx1N8xlzfYLDlQlTzxgFq4iVzo2Oc+VEreA 0StHOZXCYwQjAnIIRH57kLI/4J9Q5gxwdIoUjKlusR2IRbMWnqnigxL6OI+iiQYlUTlo NEaV0wrrp7Ec3mFo7dIc+Lh/Lv5ImQAQTU3GYpOnuL1iKGzPC9yRItBdIdAWDBkVr48H c9Dw== X-Gm-Message-State: AOAM5317JWCstDRlRk2RAhp34OWyAk/0ol1/ZuyYUCsJbUnHCRgFAAyg cDI5HY4jeSe1ejJT+xexElYEhanVZZHj2Q== X-Google-Smtp-Source: ABdhPJwRtd6FvIEdr4zRDrnuQSQJ6UL76ZhiZ659rR2/3WLXCRms7ZkiGXd+YXQmUAjN6Vd6Nm/kVg== X-Received: by 2002:a05:620a:806:: with SMTP id s6mr2817052qks.50.1616459898758; Mon, 22 Mar 2021 17:38:18 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:18 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Lorenz Bauer , Jakub Sitnicki Subject: [Patch bpf-next v6 02/12] skmsg: introduce a spinlock to protect ingress_msg Date: Mon, 22 Mar 2021 17:37:58 -0700 Message-Id: <20210323003808.16074-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Cc: John Fastabend Cc: Daniel Borkmann Cc: Lorenz Bauer Acked-by: Jakub Sitnicki Signed-off-by: Cong Wang --- include/linux/skmsg.h | 46 +++++++++++++++++++++++++++++++++++++++++++ net/core/skmsg.c | 3 +++ net/ipv4/tcp_bpf.c | 18 ++++++----------- 3 files changed, 55 insertions(+), 12 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6c09d94be2e9..f2d45a73b2b2 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -89,6 +89,7 @@ struct sk_psock { #endif struct sk_buff_head ingress_skb; struct list_head ingress_msg; + spinlock_t ingress_lock; unsigned long state; struct list_head link; spinlock_t link_lock; @@ -284,7 +285,45 @@ static inline struct sk_psock *sk_psock(const struct sock *sk) static inline void sk_psock_queue_msg(struct sk_psock *psock, struct sk_msg *msg) { + spin_lock_bh(&psock->ingress_lock); list_add_tail(&msg->list, &psock->ingress_msg); + spin_unlock_bh(&psock->ingress_lock); +} + +static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + if (msg) + list_del(&msg->list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_peek_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_next_msg(struct sk_psock *psock, + struct sk_msg *msg) +{ + struct sk_msg *ret; + + spin_lock_bh(&psock->ingress_lock); + if (list_is_last(&msg->list, &psock->ingress_msg)) + ret = NULL; + else + ret = list_next_entry(msg, list); + spin_unlock_bh(&psock->ingress_lock); + return ret; } static inline bool sk_psock_queue_empty(const struct sk_psock *psock) @@ -292,6 +331,13 @@ static inline bool sk_psock_queue_empty(const struct sk_psock *psock) return psock ? list_empty(&psock->ingress_msg) : true; } +static inline void kfree_sk_msg(struct sk_msg *msg) +{ + if (msg->skb) + consume_skb(msg->skb); + kfree(msg); +} + static inline void sk_psock_report_error(struct sk_psock *psock, int err) { struct sock *sk = psock->sk; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bebf84ed4e30..305dddc51857 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -592,6 +592,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) INIT_WORK(&psock->work, sk_psock_backlog); INIT_LIST_HEAD(&psock->ingress_msg); + spin_lock_init(&psock->ingress_lock); skb_queue_head_init(&psock->ingress_skb); sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); @@ -638,7 +639,9 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) skb_bpf_redirect_clear(skb); kfree_skb(skb); } + spin_lock_bh(&psock->ingress_lock); __sk_psock_purge_ingress_msg(psock); + spin_unlock_bh(&psock->ingress_lock); } static void sk_psock_link_destroy(struct sk_psock *psock) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 17c322b875fd..ae980716d896 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -18,9 +18,7 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg_rx; int i, copied = 0; - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - + msg_rx = sk_psock_peek_msg(psock); while (copied != len) { struct scatterlist *sge; @@ -68,22 +66,18 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, } while (i != msg_rx->sg.end); if (unlikely(peek)) { - if (msg_rx == list_last_entry(&psock->ingress_msg, - struct sk_msg, list)) + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) break; - msg_rx = list_next_entry(msg_rx, list); continue; } msg_rx->sg.start = i; if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - list_del(&msg_rx->list); - if (msg_rx->skb) - consume_skb(msg_rx->skb); - kfree(msg_rx); + msg_rx = sk_psock_dequeue_msg(psock); + kfree_sk_msg(msg_rx); } - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); + msg_rx = sk_psock_peek_msg(psock); } return copied; From patchwork Tue Mar 23 00:37:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407438 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3342C433F4 for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8E922619B1 for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231317AbhCWAjC (ORCPT ); Mon, 22 Mar 2021 20:39:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230450AbhCWAiV (ORCPT ); Mon, 22 Mar 2021 20:38:21 -0400 Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28B0AC061574; Mon, 22 Mar 2021 17:38:21 -0700 (PDT) Received: by mail-qk1-x735.google.com with SMTP id o5so12692633qkb.0; Mon, 22 Mar 2021 17:38:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4vwBCDi/Trp5W1pFSVwRRUUYgEK05t0TKOcvxMF4nzg=; b=kvPfZuvbqa7Aywzuh7NmfKMccdrANYB2ikD54sNAqZyN8AamKirn26lvi8CDYZwxfm +xFaJfMCI08L7MT8UsFjFMpiKhQuWCJu3xFGxY5qCg11Gns0P/3asBD4IncwrL+xpqps Q1P4q8ZHpk0sku/AMpSXiLAQAUDFpvXTpsKVS7MphGWYLO14+b9+ghirLY6NCBxzyW5E uhotAZGKI0+Sb8vPpO/BrvC3x8LpTotl0wd5TVJrjDTMNo7/7jLwzua4uIZtzw5VCzDG DApxuJEuZmKx8pzksUn4EJFFnkezDq6QrkMUGeyOGG0nobYna3CO4vy4ijhnv15N3b3w Iomg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4vwBCDi/Trp5W1pFSVwRRUUYgEK05t0TKOcvxMF4nzg=; b=fRdTKNMSKHf+eZPvDkZDNon+0y8LbsGUwtThCSJAFe4TIgLtjkTmS2CRIFJm+yZzKO FzIu5Xm5+dBLT64DjJ0ptifPFSng9RR1Qy0plsHt7+otUMCywvlfl5SpYdCEWW2vvQ95 lx9tvyND2U7Pw+pKf4UGwP61BOcKwe0AcdS9Dcm9peoUkd9dz/bUOkV8MpT3OD1BzDk7 9CVpv30+YsLjb08wG5O07GIH9AskZuVEo2Bd4AebNzGqEjJ3Yr/2cli+T5UFQbzZOD5J w5CTkXJ7WeR51xKkyhYAiUK1GnVpysx2uW+IXbodUvhePSCr+cIVZ46glt9bQhcCPZBD GmeQ== X-Gm-Message-State: AOAM532qwMLz1w/ZAOIr0hjca/8t6AOpeqNytCnsjG6L4bB6hSk6aOwN 6Z3SjoVXUYvq8AzdUoCiUUkA5qOSIlIBXg== X-Google-Smtp-Source: ABdhPJxdObfW73HnZZ25yBTUmv0bIE/mIqw3eBWOWgvSftPihUOQ6bW4xRFCejep0v7ubfvzPsSiAw== X-Received: by 2002:a05:620a:40d:: with SMTP id 13mr2929553qkp.369.1616459900242; Mon, 22 Mar 2021 17:38:20 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:19 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 03/12] skmsg: introduce skb_send_sock() for sock_map Date: Mon, 22 Mar 2021 17:37:59 -0700 Message-Id: <20210323003808.16074-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 55 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 49 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0503c917d773..2fc8c3657c53 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3626,6 +3626,7 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, unsigned int flags); int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, int len); +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len); void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); unsigned int skb_zerocopy_headlen(const struct sk_buff *from); int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c421c8f80925..14010c0eec48 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2500,9 +2500,32 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, } EXPORT_SYMBOL_GPL(skb_splice_bits); -/* Send skb data on a socket. Socket must be locked. */ -int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, - int len) +static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg, + struct kvec *vec, size_t num, size_t size) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendmsg(sock, msg, vec, num, size); +} + +static int sendpage_unlocked(struct sock *sk, struct page *page, int offset, + size_t size, int flags) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendpage(sock, page, offset, size, flags); +} + +typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg, + struct kvec *vec, size_t num, size_t size); +typedef int (*sendpage_func)(struct sock *sk, struct page *page, int offset, + size_t size, int flags); +static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, + int len, sendmsg_func sendmsg, sendpage_func sendpage) { unsigned int orig_len = len; struct sk_buff *head = skb; @@ -2522,7 +2545,8 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, memset(&msg, 0, sizeof(msg)); msg.msg_flags = MSG_DONTWAIT; - ret = kernel_sendmsg_locked(sk, &msg, &kv, 1, slen); + ret = INDIRECT_CALL_2(sendmsg, kernel_sendmsg_locked, + sendmsg_unlocked, sk, &msg, &kv, 1, slen); if (ret <= 0) goto error; @@ -2553,9 +2577,11 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, slen = min_t(size_t, len, skb_frag_size(frag) - offset); while (slen) { - ret = kernel_sendpage_locked(sk, skb_frag_page(frag), - skb_frag_off(frag) + offset, - slen, MSG_DONTWAIT); + ret = INDIRECT_CALL_2(sendpage, kernel_sendpage_locked, + sendpage_unlocked, sk, + skb_frag_page(frag), + skb_frag_off(frag) + offset, + slen, MSG_DONTWAIT); if (ret <= 0) goto error; @@ -2587,8 +2613,23 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, error: return orig_len == len ? ret : orig_len - len; } + +/* Send skb data on a socket. Socket must be locked. */ +int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, + int len) +{ + return __skb_send_sock(sk, skb, offset, len, kernel_sendmsg_locked, + kernel_sendpage_locked); +} EXPORT_SYMBOL_GPL(skb_send_sock_locked); +/* Send skb data on a socket. Socket must be unlocked. */ +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len) +{ + return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked, + sendpage_unlocked); +} + /** * skb_store_bits - store bits from kernel buffer to skb * @skb: destination buffer From patchwork Tue Mar 23 00:38:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408750 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E16CEC433F7 for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AE21C619A9 for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231327AbhCWAjE (ORCPT ); Mon, 22 Mar 2021 20:39:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230453AbhCWAiX (ORCPT ); Mon, 22 Mar 2021 20:38:23 -0400 Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com [IPv6:2607:f8b0:4864:20::72b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1CBBC061574; Mon, 22 Mar 2021 17:38:22 -0700 (PDT) Received: by mail-qk1-x72b.google.com with SMTP id o5so12692691qkb.0; Mon, 22 Mar 2021 17:38:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=sUw/wwXeIFJjpvMM2bx490PBJI5xdu8qJvmZO6V0IWw=; b=mfI6O9fB1WPFFVbN0LFKUX3IadnsIwSIzzJtQi/huMuYcOgt8eHGPChBQMmxILMiaY bmFhVu3thpPSy5OGxkZYOIfW1R6ZKKBc5+94DnEH+fv1Zj+7aWrnpPsOV905tTFAIxKN x3JRKcvY35OBWPffmWdzQRxuCRp9gGMZ1n1JrGSNOFk+hzcN6xqS9n33X64Ih1LQDCJK af+sAwr5QHVyRknq0yOOSKJo79Vw0KTCdsjOxbKIsvYr84UTcIM7cPbF7shPGyuCcNYz KDg5KhtDJRQsXvdXmUGit8nqMwe84Avo8PblOUSrGnqUJRgUgmwe+n9kz6Ia8uYNgCbW DqEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sUw/wwXeIFJjpvMM2bx490PBJI5xdu8qJvmZO6V0IWw=; b=sNr7BH1LeUPseo3KuhOsp8E2+tkNPnYwIqPsZHfqRvMlyMcNWAYOa4GEqlakii87l0 cS003Nf93hRsVM0akBEjAiNoy5efwI/01LNfOBATFmb8oJPLz2oQKr83Hqu90OswbmdI KZ8jHoqqQ8lqjWPQLRcLiZk+GVjoRBveUkgeAV/umGVj+7WbHWETPOtAfASHN8Yw2mwX 7Y9lSfJlS/JTBMQWqErdtt0dNJ+I0B1kzMIH6PFIvhBnqOepf484I7MCo2tAKN6PtZ8f G4wOTZE9atDfXrFRqBPvsPqCnarNLIq4PnBFg60b83Bs6mYGoZCuubwEHsCPqC91NoNc hRCQ== X-Gm-Message-State: AOAM532iRs9s7AB4E0YgDfZlT7LFFPN8uDHu/vVE32KjOE7GjIK6AAkZ E0hyLLf4ySjWmnF1GsLbQIAL9hPr3qBjsA== X-Google-Smtp-Source: ABdhPJxpHIJNeRMd1pSx9rSiIpUCTZnHyp7jFWeI37GmiwgovWgtCsK259+1cmFgNZ7/8H2RPbA4tA== X-Received: by 2002:a37:9fce:: with SMTP id i197mr2833061qke.155.1616459901767; Mon, 22 Mar 2021 17:38:21 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:21 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 04/12] skmsg: avoid lock_sock() in sk_psock_backlog() Date: Mon, 22 Mar 2021 17:38:00 -0700 Message-Id: <20210323003808.16074-5-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 1 + net/core/skmsg.c | 51 +++++++++++++++++++++++++++---------------- net/core/sock_map.c | 1 + 3 files changed, 34 insertions(+), 19 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index f2d45a73b2b2..cf23e6e2cf54 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -347,6 +347,7 @@ static inline void sk_psock_report_error(struct sk_psock *psock, int err) } struct sk_psock *sk_psock_init(struct sock *sk, int node); +void sk_psock_stop(struct sk_psock *psock, bool wait); #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock); diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 305dddc51857..9176add87643 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -497,7 +497,7 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, if (!ingress) { if (!sock_writeable(psock->sk)) return -EAGAIN; - return skb_send_sock_locked(psock->sk, skb, off, len); + return skb_send_sock(psock->sk, skb, off, len); } return sk_psock_skb_ingress(psock, skb); } @@ -511,8 +511,6 @@ static void sk_psock_backlog(struct work_struct *work) u32 len, off; int ret; - /* Lock sock to avoid losing sk_socket during loop. */ - lock_sock(psock->sk); if (state->skb) { skb = state->skb; len = state->len; @@ -529,7 +527,7 @@ static void sk_psock_backlog(struct work_struct *work) skb_bpf_redirect_clear(skb); do { ret = -EIO; - if (likely(psock->sk->sk_socket)) + if (!sock_flag(psock->sk, SOCK_DEAD)) ret = sk_psock_handle_skb(psock, skb, off, len, ingress); if (ret <= 0) { @@ -537,13 +535,13 @@ static void sk_psock_backlog(struct work_struct *work) state->skb = skb; state->len = len; state->off = off; - goto end; + return; } /* Hard errors break pipe and stop xmit. */ sk_psock_report_error(psock, ret ? -ret : EPIPE); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); kfree_skb(skb); - goto end; + return; } off += ret; len -= ret; @@ -552,8 +550,6 @@ static void sk_psock_backlog(struct work_struct *work) if (!ingress) kfree_skb(skb); } -end: - release_sock(psock->sk); } struct sk_psock *sk_psock_init(struct sock *sk, int node) @@ -631,7 +627,7 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock) } } -static void sk_psock_zap_ingress(struct sk_psock *psock) +static void __sk_psock_zap_ingress(struct sk_psock *psock) { struct sk_buff *skb; @@ -639,9 +635,7 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) skb_bpf_redirect_clear(skb); kfree_skb(skb); } - spin_lock_bh(&psock->ingress_lock); __sk_psock_purge_ingress_msg(psock); - spin_unlock_bh(&psock->ingress_lock); } static void sk_psock_link_destroy(struct sk_psock *psock) @@ -654,6 +648,18 @@ static void sk_psock_link_destroy(struct sk_psock *psock) } } +void sk_psock_stop(struct sk_psock *psock, bool wait) +{ + spin_lock_bh(&psock->ingress_lock); + sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); + sk_psock_cork_free(psock); + __sk_psock_zap_ingress(psock); + spin_unlock_bh(&psock->ingress_lock); + + if (wait) + cancel_work_sync(&psock->work); +} + static void sk_psock_done_strp(struct sk_psock *psock); static void sk_psock_destroy_deferred(struct work_struct *gc) @@ -670,7 +676,6 @@ static void sk_psock_destroy_deferred(struct work_struct *gc) sk_psock_link_destroy(psock); sk_psock_cork_free(psock); - sk_psock_zap_ingress(psock); if (psock->sk_redir) sock_put(psock->sk_redir); @@ -688,8 +693,7 @@ static void sk_psock_destroy(struct rcu_head *rcu) void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { - sk_psock_cork_free(psock); - sk_psock_zap_ingress(psock); + sk_psock_stop(psock, false); write_lock_bh(&sk->sk_callback_lock); sk_psock_restore_proto(sk, psock); @@ -699,7 +703,6 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) else if (psock->progs.stream_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); - sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); call_rcu(&psock->rcu, sk_psock_destroy); } @@ -770,14 +773,20 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) * error that caused the pipe to break. We can't send a packet on * a socket that is in this state so we drop the skb. */ - if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || - !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { + if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) { + kfree_skb(skb); + return; + } + spin_lock_bh(&psock_other->ingress_lock); + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { + spin_unlock_bh(&psock_other->ingress_lock); kfree_skb(skb); return; } skb_queue_tail(&psock_other->ingress_skb, skb); schedule_work(&psock_other->work); + spin_unlock_bh(&psock_other->ingress_lock); } static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict) @@ -845,8 +854,12 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, err = sk_psock_skb_ingress_self(psock, skb); } if (err < 0) { - skb_queue_tail(&psock->ingress_skb, skb); - schedule_work(&psock->work); + spin_lock_bh(&psock->ingress_lock); + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { + skb_queue_tail(&psock->ingress_skb, skb); + schedule_work(&psock->work); + } + spin_unlock_bh(&psock->ingress_lock); } break; case __SK_REDIRECT: diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dd53a7771d7e..e564fdeaada1 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1540,6 +1540,7 @@ void sock_map_close(struct sock *sk, long timeout) saved_close = psock->saved_close; sock_map_remove_links(sk, psock); rcu_read_unlock(); + sk_psock_stop(psock, true); release_sock(sk); saved_close(sk, timeout); } From patchwork Tue Mar 23 00:38:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E24F5C433FB for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C76636199F for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231348AbhCWAjF (ORCPT ); Mon, 22 Mar 2021 20:39:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230455AbhCWAi1 (ORCPT ); Mon, 22 Mar 2021 20:38:27 -0400 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 60C63C061756; Mon, 22 Mar 2021 17:38:24 -0700 (PDT) Received: by mail-qv1-xf36.google.com with SMTP id dc12so8755971qvb.4; Mon, 22 Mar 2021 17:38:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xTEo6ohvKEl4pyLEeJ/PD6TxOMrhoqIZ3qvxpT6J8wU=; b=pHUoaoL5k6QdcZ84NSRgU8WiOMvK6XeHaGMOkUlAcQn0GYu8Tck5UPrBdDPdJRUmHP TX0VWFnzKVUkNE0ji/4WvkFS+cLD25LehANrX3YljWs9D8r0ZDzCIpJWk4GZ7RNisDp4 gbOo/m9bnfJICGYBeA7wd06RlsiIg3qZ3DOQYCjyCPRd1QuG/rRlKyz9WBKnFD1HPgCv O4vk9H7eD8IGOjQ81eunkh1OvFgNdu3fiLvnrmtnuq5mxKLfZAE6hVOql5FZcHon9OA5 9a4+u1Ld0bPM6njqjodhxrjrhOwe0mdAWT7oUvcEl9JT8Kz5x0GW9mBooiJ4yXChl6Xf ay5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xTEo6ohvKEl4pyLEeJ/PD6TxOMrhoqIZ3qvxpT6J8wU=; b=KMAHrTPv+NC3EL3Ko4rnPbUVunv9JkGTSH2IRVpbf/L6hG0zX6OZPHjlzMCe8Ur6QH O1hNVSm7M0PFgGn+Xp3JRkDlqRjrCLC0wlTXevsvZZb6Hfgj9kew/9W/us8riuoMy3Ua HRcsfq2743EJ1QpdNaqhR/gcKVe9pXZF7GrR+L/28Oihvub+vQ9WQIGAL4Bu+TMgtLEr tmWqvAAd3V0FFf2Cvpt1vCITcQhohuIkupV1U3F3o0DmlNghIcpEz/BfpvBZDdCo/beN UQ0xf9tE9Pk0tWMa3s4uH2Q9Y1ecQlHXuKgBpbQYXu3ugdu6i7AY5HNdNYvRfgV0WPL0 uUxQ== X-Gm-Message-State: AOAM532VKvuCsOhwja/bBlEIoo4twMAEFXmM0VDnUHw77m0BmOJVA6mA 09d6s8uLRphVpofgEkx1Umrdq8axn1LvEg== X-Google-Smtp-Source: ABdhPJxSgO3k2FjoYu9IiX3zjTbLHao4zMTHYrH/UDkl5kZehrq4HVxgI1WIFE3iMj4cxb23Y1X2Ug== X-Received: by 2002:ad4:4ae9:: with SMTP id cp9mr2351336qvb.20.1616459903486; Mon, 22 Mar 2021 17:38:23 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:23 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 05/12] skmsg: use rcu work for destroying psock Date: Mon, 22 Mar 2021 17:38:01 -0700 Message-Id: <20210323003808.16074-6-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: John Fastabend --- include/linux/skmsg.h | 5 +---- net/core/skmsg.c | 17 +++++------------ 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index cf23e6e2cf54..d75c3936ab91 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -101,10 +101,7 @@ struct sk_psock { struct proto *sk_proto; struct sk_psock_work_state work_state; struct work_struct work; - union { - struct rcu_head rcu; - struct work_struct gc; - }; + struct rcu_work rwork; }; int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len, diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 9176add87643..a9ee26aebfbd 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -662,10 +662,10 @@ void sk_psock_stop(struct sk_psock *psock, bool wait) static void sk_psock_done_strp(struct sk_psock *psock); -static void sk_psock_destroy_deferred(struct work_struct *gc) +static void sk_psock_destroy(struct work_struct *work) { - struct sk_psock *psock = container_of(gc, struct sk_psock, gc); - + struct sk_psock *psock = container_of(to_rcu_work(work), + struct sk_psock, rwork); /* No sk_callback_lock since already detached. */ sk_psock_done_strp(psock); @@ -683,14 +683,6 @@ static void sk_psock_destroy_deferred(struct work_struct *gc) kfree(psock); } -static void sk_psock_destroy(struct rcu_head *rcu) -{ - struct sk_psock *psock = container_of(rcu, struct sk_psock, rcu); - - INIT_WORK(&psock->gc, sk_psock_destroy_deferred); - schedule_work(&psock->gc); -} - void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { sk_psock_stop(psock, false); @@ -704,7 +696,8 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); - call_rcu(&psock->rcu, sk_psock_destroy); + INIT_RCU_WORK(&psock->rwork, sk_psock_destroy); + queue_rcu_work(system_wq, &psock->rwork); } EXPORT_SYMBOL_GPL(sk_psock_drop); From patchwork Tue Mar 23 00:38:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407434 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2AE78C433FC for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 075F56199F for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231365AbhCWAjJ (ORCPT ); Mon, 22 Mar 2021 20:39:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230467AbhCWAi2 (ORCPT ); Mon, 22 Mar 2021 20:38:28 -0400 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEE91C061762; Mon, 22 Mar 2021 17:38:25 -0700 (PDT) Received: by mail-qk1-x729.google.com with SMTP id y5so11166100qkl.9; Mon, 22 Mar 2021 17:38:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Gy7tDuRfeUFuz0b2V+w0UYbQ8JS3pLtgS4NvNblzE28=; b=kqK0fOJh8eqgA1jpwNpFbokKq2usP7Vys7IqtDA/hLHIOx5i9NMs6JjEgqN5KBqN5/ 9cKXvxxA15xMdhlO+FTdjrT6l0SlLqx2LQI0LiXQtPRBNgshMKMl3xqO3vmnyqhWWE1I tby/4Zfy4lpPMPeWd7tK8PSvltMGqivJIVNdRdvOKR+WF+CU6ZrgCNSCibJEqoRA2sLy 05GCvcoleAkvEzB0Mx30oF9yeJz3Rz8BLhQb10gn1Y8T7AVOnWoAzcnarhxIT2BRQzFV SHm2ue4j4CBFcYzDl1k7QYAh4UNUeFPA/dZJlVTRViZ8gzL3Qek0TuHUowIUZbdrMZoS KU+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Gy7tDuRfeUFuz0b2V+w0UYbQ8JS3pLtgS4NvNblzE28=; b=UrWZfoweD753En3mCEuRGdMDS6oqNcv1sFC4mB9B3tdpj4zghq2POJw54XrdU1tBQB +ADJfqsj30cQMfVgrWhVrEx80O7jjkAyhDrd6cbIGzpiQwsWYULXozhGVdvTN5bzdtPa EKQYfKyarEXLPtjWp6F2ig/5xOdGIXszPEeGPa1hYuMksevg8ZyUnZFD008XKbGaGNy9 8xTAkIEouKwUX0C0sY6Kv28zczwh0ePN4c0FyFI6Tc/7XPOXFJ+66mvlpDZRT5agZPNg MREoxvbrV/i1J9GiEN/3W0IWM1xyh8mwnq1sLx9VAFqLO7T9FrecraPVGwxuCVbXViq+ kVmw== X-Gm-Message-State: AOAM530m0f/rfdhy5MEmfhlrumVidF5jtx+FRzgY/wjFrrGjEgDV5+Ka 1TcvObLCYJWE2+tO2ABi4F29gz6lf6MW1w== X-Google-Smtp-Source: ABdhPJwB4aYFSoga7SRmmex+j2gBbkyzbCuXE6AGsBSCksjlBtIb8jgWL3iY/v1i8wuwzrtcRKwqeg== X-Received: by 2002:a05:620a:819:: with SMTP id s25mr3054270qks.485.1616459904979; Mon, 22 Mar 2021 17:38:24 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:24 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 06/12] sock_map: introduce BPF_SK_SKB_VERDICT Date: Mon, 22 Mar 2021 17:38:02 -0700 Message-Id: <20210323003808.16074-7-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to set stream_verdict and skb_verdict at the same time. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 2 ++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + net/core/skmsg.c | 4 +++- net/core/sock_map.c | 23 ++++++++++++++++++++++- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + tools/include/uapi/linux/bpf.h | 1 + 8 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index d75c3936ab91..bc4010b23063 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -58,6 +58,7 @@ struct sk_psock_progs { struct bpf_prog *msg_parser; struct bpf_prog *stream_parser; struct bpf_prog *stream_verdict; + struct bpf_prog *skb_verdict; }; enum sk_psock_state_bits { @@ -486,6 +487,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs) psock_set_prog(&progs->msg_parser, NULL); psock_set_prog(&progs->stream_parser, NULL); psock_set_prog(&progs->stream_verdict, NULL); + psock_set_prog(&progs->skb_verdict, NULL); } int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2d3036e292a9..309a24582b18 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c859bc46d06c..afa803a1553e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2941,6 +2941,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: + case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index a9ee26aebfbd..7e7e30e30f4b 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -692,7 +692,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) rcu_assign_sk_user_data(sk, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); - else if (psock->progs.stream_verdict) + else if (psock->progs.stream_verdict || psock->progs.skb_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); @@ -1019,6 +1019,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, } skb_set_owner_r(skb, sk); prog = READ_ONCE(psock->progs.stream_verdict); + if (!prog) + prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { skb_dst_drop(skb); skb_bpf_redirect_clear(skb); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index e564fdeaada1..c46709786a49 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -155,6 +155,8 @@ static void sock_map_del_link(struct sock *sk, strp_stop = true; if (psock->saved_data_ready && stab->progs.stream_verdict) verdict_stop = true; + if (psock->saved_data_ready && stab->progs.skb_verdict) + verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); } @@ -227,7 +229,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { - struct bpf_prog *msg_parser, *stream_parser, *stream_verdict; + struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict; struct sk_psock *psock; int ret; @@ -256,6 +258,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } } + skb_verdict = READ_ONCE(progs->skb_verdict); + if (skb_verdict) { + skb_verdict = bpf_prog_inc_not_zero(skb_verdict); + if (IS_ERR(skb_verdict)) { + ret = PTR_ERR(skb_verdict); + goto out_put_msg_parser; + } + } + psock = sock_map_psock_get_checked(sk); if (IS_ERR(psock)) { ret = PTR_ERR(psock); @@ -265,6 +276,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, if (psock) { if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) || (stream_parser && READ_ONCE(psock->progs.stream_parser)) || + (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) || (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) { sk_psock_put(sk, psock); ret = -EBUSY; @@ -296,6 +308,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); + } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); + sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); return 0; @@ -304,6 +319,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, out_drop: sk_psock_put(sk, psock); out_progs: + if (skb_verdict) + bpf_prog_put(skb_verdict); +out_put_msg_parser: if (msg_parser) bpf_prog_put(msg_parser); out_put_stream_parser: @@ -1468,6 +1486,9 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, case BPF_SK_SKB_STREAM_VERDICT: pprog = &progs->stream_verdict; break; + case BPF_SK_SKB_VERDICT: + pprog = &progs->skb_verdict; + break; default: return -EOPNOTSUPP; } diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 65303664417e..1828bba19020 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", + [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", [BPF_SK_MSG_VERDICT] = "sk_msg_verdict", [BPF_LIRC_MODE2] = "lirc_mode2", [BPF_FLOW_DISSECTOR] = "flow_dissector", diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index f2b915b20546..3f067d2d7584 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -76,6 +76,7 @@ enum dump_mode { static const char * const attach_type_strings[] = { [BPF_SK_SKB_STREAM_PARSER] = "stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict", + [BPF_SK_SKB_VERDICT] = "skb_verdict", [BPF_SK_MSG_VERDICT] = "msg_verdict", [BPF_FLOW_DISSECTOR] = "flow_dissector", [__MAX_BPF_ATTACH_TYPE] = NULL, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2d3036e292a9..309a24582b18 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; From patchwork Tue Mar 23 00:38:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 130A5C433E2 for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DFA2E619AC for ; Tue, 23 Mar 2021 00:39:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231354AbhCWAjH (ORCPT ); Mon, 22 Mar 2021 20:39:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230479AbhCWAi2 (ORCPT ); Mon, 22 Mar 2021 20:38:28 -0400 Received: from mail-qv1-xf2c.google.com (mail-qv1-xf2c.google.com [IPv6:2607:f8b0:4864:20::f2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D729C061763; Mon, 22 Mar 2021 17:38:27 -0700 (PDT) Received: by mail-qv1-xf2c.google.com with SMTP id t16so9626649qvr.12; Mon, 22 Mar 2021 17:38:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=MzWDVMO5ZiHvbDpjyRkCneiAweZ0Mrphb9zPwFMt7cc=; b=LweSBCg096Tyz0noPYojV7pUsWosBJnJgXKs7Pn/aHT6YqBi4WIUkUQZAwJw9GbxOp fVYsFoy6xcchcZIZPodrYOId9bgaL4JdpWZX0FZ3moHX4lhe2XgckG+qxGLCwI6u4AMa Jlll6dYBTSQhNo9s6Ks0OmWe0QQmSnIamBtVNAZxTGF2HLS/NM6kxjJsPwhYnIPCX3Rx Dtdi85wRov402Svjo+jSFNO0jJEXIIzZCHYILfCRcB4fjzBPFzUxYXpJWqverTj3nklV 0AiX0EaTRZkHVGYw91ULnLKIdlqz2Wnofic4M5oupEd3xt4c8YUiTtemRLEEW/otUBlN G7iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=MzWDVMO5ZiHvbDpjyRkCneiAweZ0Mrphb9zPwFMt7cc=; b=Vwlz8q5BxCjXuTVE96lKirOVx51linJ9tjE1pFVzKsA6orTdmFvbx9hBx/NptF88qi N9xjwm4pvkuyCUw0SY43m/RJDiOKxodUj46KJnwBFvvgbI8weXG5EBtmvnGY3oTjvqEL HegWxAVXGtwHuMiIKIkwWXTyqSX9yPWB7WFSPfcApBZL3lUZmLC3fO6fJtSd281iJm+o +uanbkqcAEbuNponBi78jKcZd1d4gnNvFdH7mtDhzgq/t1NtKK0kqeDLgBq1eqAErxCN dd0odggGi2elUDirmf2d2+76vYXw1LVaqCauqvICGv8vT1UXwhg7IK+trZAzau3aqGxq L4Iw== X-Gm-Message-State: AOAM531hkb7uXpOc8WeZwmrxdROsY/sMv+7FyjMEj5ugbZR69AXSbEaP fvuXfLKzk+bBpNAXr4keLZ2Jc3BmT64hlQ== X-Google-Smtp-Source: ABdhPJwTH9TurRtOO+mz03ECuwrqFK3TKFsFwsHoZvKX8LfHvCn78hra9G2DQNx/n0ALvt9G9uBSZg== X-Received: by 2002:ad4:5888:: with SMTP id dz8mr2490373qvb.12.1616459906466; Mon, 22 Mar 2021 17:38:26 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:26 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 07/12] sock: introduce sk->sk_prot->psock_update_sk_prot() Date: Mon, 22 Mar 2021 17:38:03 -0700 Message-Id: <20210323003808.16074-8-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 18 +++--------------- include/net/sock.h | 3 +++ include/net/tcp.h | 1 + include/net/udp.h | 1 + net/core/skmsg.c | 5 ----- net/core/sock_map.c | 24 ++++-------------------- net/ipv4/tcp_bpf.c | 24 +++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_bpf.c | 15 +++++++++++++-- net/ipv6/tcp_ipv6.c | 3 +++ net/ipv6/udp.c | 3 +++ 12 files changed, 58 insertions(+), 45 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index bc4010b23063..eb53f6ce74d7 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -99,6 +99,7 @@ struct sk_psock { void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); + int (*psock_update_sk_prot)(struct sock *sk, bool restore); struct proto *sk_proto; struct sk_psock_work_state work_state; struct work_struct work; @@ -394,25 +395,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) } } -static inline void sk_psock_update_proto(struct sock *sk, - struct sk_psock *psock, - struct proto *ops) -{ - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, ops); -} - static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; - if (inet_csk_has_ulp(sk)) { - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); - } else { - sk->sk_write_space = psock->saved_write_space; - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); - } + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, true); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 636810ddcd9b..eda64fbd5e3d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1184,6 +1184,9 @@ struct proto { void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); +#ifdef CONFIG_BPF_SYSCALL + int (*psock_update_sk_prot)(struct sock *sk, bool restore); +#endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS diff --git a/include/net/tcp.h b/include/net/tcp.h index 075de26f449d..2efa4e5ea23d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2203,6 +2203,7 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int tcp_bpf_update_proto(struct sock *sk, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ diff --git a/include/net/udp.h b/include/net/udp.h index d4d064c59232..df7cc1edc200 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -518,6 +518,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, #ifdef CONFIG_BPF_SYSCALL struct sk_psock; struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int udp_bpf_update_proto(struct sock *sk, bool restore); #endif #endif /* _UDP_H */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 7e7e30e30f4b..f7a512bfdb6e 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -559,11 +559,6 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (inet_csk_has_ulp(sk)) { - psock = ERR_PTR(-EINVAL); - goto out; - } - if (sk->sk_user_data) { psock = ERR_PTR(-EBUSY); goto out; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index c46709786a49..d04b98fc8104 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) { - struct proto *prot; - - switch (sk->sk_type) { - case SOCK_STREAM: - prot = tcp_bpf_get_proto(sk, psock); - break; - - case SOCK_DGRAM: - prot = udp_bpf_get_proto(sk, psock); - break; - - default: + if (!sk->sk_prot->psock_update_sk_prot) return -EINVAL; - } - - if (IS_ERR(prot)) - return PTR_ERR(prot); - - sk_psock_update_proto(sk, psock, prot); - return 0; + psock->psock_update_sk_prot = sk->sk_prot->psock_update_sk_prot; + return sk->sk_prot->psock_update_sk_prot(sk, false); } static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) @@ -570,7 +554,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk) static bool sock_map_sk_is_suitable(const struct sock *sk) { - return sk_is_tcp(sk) || sk_is_udp(sk); + return !!sk->sk_prot->psock_update_sk_prot; } static bool sock_map_sk_state_allowed(const struct sock *sk) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ae980716d896..ac8cfbaeacd2 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -595,20 +595,38 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; } -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int tcp_bpf_update_proto(struct sock *sk, bool restore) { + struct sk_psock *psock = sk_psock(sk); int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE; + if (restore) { + if (inet_csk_has_ulp(sk)) { + tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); + } else { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + } + return 0; + } + + if (inet_csk_has_ulp(sk)) + return -EINVAL; + if (sk->sk_family == AF_INET6) { if (tcp_bpf_assert_proto_ops(psock->sk_proto)) - return ERR_PTR(-EINVAL); + return -EINVAL; tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); } - return &tcp_bpf_prots[family][config]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + return 0; } +EXPORT_SYMBOL_GPL(tcp_bpf_update_proto); /* If a child got cloned from a listening socket that had tcp_bpf * protocol callbacks installed, we need to restore the callbacks to diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index daad4f99db32..dfc6d1c0e710 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2806,6 +2806,9 @@ struct proto tcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4a0478b17243..38952aaee3a1 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2849,6 +2849,9 @@ struct proto udp_prot = { .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, .get_port = udp_v4_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a94791efc1a..6001f93cd3a0 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -41,12 +41,23 @@ static int __init udp_bpf_v4_build_proto(void) } core_initcall(udp_bpf_v4_build_proto); -struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int udp_bpf_update_proto(struct sock *sk, bool restore) { int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; + struct sk_psock *psock = sk_psock(sk); + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + return 0; + } if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto); - return &udp_bpf_prots[family]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + return 0; } +EXPORT_SYMBOL_GPL(udp_bpf_update_proto); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index bd44ded7e50c..4fdc58a9e19e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2134,6 +2134,9 @@ struct proto tcpv6_prot = { .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index d25e5a9252fd..ef2c75bb4771 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1713,6 +1713,9 @@ struct proto udpv6_prot = { .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), From patchwork Tue Mar 23 00:38:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44112C433F8 for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 26B03619AC for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231374AbhCWAjM (ORCPT ); Mon, 22 Mar 2021 20:39:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230483AbhCWAi3 (ORCPT ); Mon, 22 Mar 2021 20:38:29 -0400 Received: from mail-qv1-xf2c.google.com (mail-qv1-xf2c.google.com [IPv6:2607:f8b0:4864:20::f2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA29DC061574; Mon, 22 Mar 2021 17:38:28 -0700 (PDT) Received: by mail-qv1-xf2c.google.com with SMTP id cx5so9631113qvb.10; Mon, 22 Mar 2021 17:38:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zmsF6HZla+BsIO3PjkzlLwlB0jrHAcknoqWo/eWzyt4=; b=VYNiboCQhsOBPVKdBz6NnL2Hq5xSi19uZWxGqvqvdh6iKb5k9+jef8Uwo+uXhh+Vfk E6tLGxYGADJdhg5A9zbkpqHkF7+UJ9uerVD7aSDNtLsWvCsf3SOdf05H/buG/nsYTMg0 PMAW8p2SrFbFge+0jCzJnASihl/ZN6Qkf8v+CINiUXp2SFr/e8JoTnamsfuo0H4ETOqQ foLzCBAwNKO4j+GYsBbS7ZRcyo9MhT9wbwDfCwQqR01sz+gGjDN95KSu9uBdM5x8iUIO IPX4Mjecc1z+3s26MJ3TKqJpGDGX/jnPxaiagSjvWYWH9PLlqkrPSCnt/6nze32p+J5O VjqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zmsF6HZla+BsIO3PjkzlLwlB0jrHAcknoqWo/eWzyt4=; b=FzUQt/6NDozrT5JWg6y3nCNAjLyXrHJymRNKbTDYcNC4SgiF7m4lLJEWwO2+4oBVH+ 4Kyc436007tPx+5vy5bRW1BeGUwScYtOTpEn4BAMBozqAA+46zSi8wOgvzSToecDCnUJ Q6RgMDzcbXzdIoJNrLsNx6fWjyULPg922DkPttlNZEI/U/Qa2kZmWutOFO9giK461nfp GWATH19lkxviVS+J2c5AnZfRmvBMqbPrJWXzNBxPX9dNw11+ALOJlvi/7aP/QB9VMM6K U8WuJL/DC2qFPytQZLS5NU6nRdMz/iGZIu2P9NNOkx4xfRfIcWaoYNrnKsKbN/bIbfgm 6O6g== X-Gm-Message-State: AOAM5311Ebmy/VfGbGmcLYf+qJbO0oakAP+SPNZCFHnXuw22ecaTkf01 OrVtdA1MryJf155uH/Q4kAz2aHPGxO9AsA== X-Google-Smtp-Source: ABdhPJzIeQ9nHRMtRteYhdvCnWBcK0pB4GMncQhBxK8M0GGuqzJf0JMNDpXf0Kn6LxJxeBIjLtp2Kw== X-Received: by 2002:ad4:4692:: with SMTP id bq18mr2864203qvb.0.1616459907952; Mon, 22 Mar 2021 17:38:27 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:27 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 08/12] udp: implement ->read_sock() for sockmap Date: Mon, 22 Mar 2021 17:38:04 -0700 Message-Id: <20210323003808.16074-9-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 35 +++++++++++++++++++++++++++++++++++ net/ipv6/af_inet6.c | 1 + 4 files changed, 39 insertions(+) diff --git a/include/net/udp.h b/include/net/udp.h index df7cc1edc200..347b62a753c3 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -329,6 +329,8 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 1355e6c0d567..f17870ee558b 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1070,6 +1070,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, + .read_sock = udp_read_sock, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 38952aaee3a1..a0adee3b1af4 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1782,6 +1782,41 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + int copied = 0; + + while (1) { + int offset = 0, err; + struct sk_buff *skb; + + skb = __skb_recv_udp(sk, 0, 1, &offset, &err); + if (!skb) + break; + if (offset < skb->len) { + int used; + size_t len; + + len = skb->len - offset; + used = recv_actor(desc, skb, offset, len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } else if (used <= len) { + copied += used; + offset += used; + } + } + if (!desc->count) + break; + } + + return copied; +} +EXPORT_SYMBOL(udp_read_sock); + /* * This should be easy, if there is something there we * return it, otherwise we block. diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 802f5111805a..71de739b4a9e 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -714,6 +714,7 @@ const struct proto_ops inet6_dgram_ops = { .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ + .read_sock = udp_read_sock, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, From patchwork Tue Mar 23 00:38:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408748 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D1B1C43381 for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 406D66199F for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231388AbhCWAjO (ORCPT ); Mon, 22 Mar 2021 20:39:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230511AbhCWAia (ORCPT ); Mon, 22 Mar 2021 20:38:30 -0400 Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A2FBC061574; Mon, 22 Mar 2021 17:38:30 -0700 (PDT) Received: by mail-qk1-x735.google.com with SMTP id o5so12692974qkb.0; Mon, 22 Mar 2021 17:38:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=C0VNQTgWKivh82UbEOO87R6Dv7ivRPENK8wDVTwtxhg=; b=FYJErhXTbrG69KihdZALWiQUq7y4W21A+a8wUhqyNlE4pmECNYzp7QhmZca9eLcOnf HqZZ1DHqQ9II1f1qVD3y1yK5+rpAeTuok9fn/f6xJOgsiXr55FK/ffWDPxoBFp1eLzRt WFm28+OGSVueeLdfsQu7S+C8gjPI0KW6AZBsfyl200js+wePkV3NU71JOUKOaQA1YPKg JScxy+LZ0Co9Bj4T/nmgSZfhew4gAP87rXw3Ztlmw2wBpBGRh173hGkLh8lcIzjRnQQU jnbkC+SFztNWtfXqGVppjEB2zQ7ETebkrs1SFsAzgEAVEiWqaAuGM/TPc4BYazmaHNJT flRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=C0VNQTgWKivh82UbEOO87R6Dv7ivRPENK8wDVTwtxhg=; b=Ie3Zg6Cgt+HAzVDyNMEHkyQd/Q2XhQl6quAwp7gOkPvPTLKrs0O2ykr0h9HleZaTnQ 7luaRxTgIdXMg+Bz7h2KO1jPFYhOw2/GUZ4Cg2LA10iUsbdmSI/eXVqRmwixipKLXPeo WIIFH2WflMV+1q9RfVDHu1Qp38yqGaoWJaO60U/q1i6HYIv7SlTtt4jm3SdI2NHDmQcY 3V4fN6xuESodz+HwK6Xkn578th571gAPTzewCamqNEEF8Ulhubk4DNpv9oLfUFyfLNjy 9kzqtY4ElL4iu4K5/urMUqBujerRQOXlcvBXGfd6/5n2ZcrkHigDEK2ifBs1WxH0QiEJ sdzQ== X-Gm-Message-State: AOAM533hEauHEi6aPL9tKaPT34AbBqweEL0fiAQ/bYCIRtFR1Ny96rOA btCoP6/SNzbQo+DE7nh5II3zaG8qiKQ5eQ== X-Google-Smtp-Source: ABdhPJxBsDDr4eZNePE1b15RPNrd9MDOmLB7Ipy0IDbPJmpVaxziQJi+tQjui/fJvJC5+ktB+velFA== X-Received: by 2002:a05:620a:854:: with SMTP id u20mr2997387qku.106.1616459909442; Mon, 22 Mar 2021 17:38:29 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:29 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 09/12] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Date: Mon, 22 Mar 2021 17:38:05 -0700 Message-Id: <20210323003808.16074-10-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 4 ++ include/net/tcp.h | 2 - net/core/skmsg.c | 98 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 100 +----------------------------------------- net/tls/tls_sw.c | 4 +- 5 files changed, 106 insertions(+), 102 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index eb53f6ce74d7..907127a4173a 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -124,6 +124,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err); +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags); static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes) { diff --git a/include/net/tcp.h b/include/net/tcp.h index 2efa4e5ea23d..31b1696c62ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2209,8 +2209,6 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, int flags); -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index f7a512bfdb6e..d025d4585899 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -399,6 +399,104 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, } EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int ret = 0; + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return 1; + + if (!timeo) + return ret; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = sk_wait_event(sk, &timeo, + !list_empty(&psock->ingress_msg) || + !skb_queue_empty(&sk->sk_receive_queue), &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} +EXPORT_SYMBOL_GPL(sk_msg_wait_data); + +/* Receive sk_msg from psock->ingress_msg to @msg. */ +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags) +{ + struct iov_iter *iter = &msg->msg_iter; + int peek = flags & MSG_PEEK; + struct sk_msg *msg_rx; + int i, copied = 0; + + msg_rx = sk_psock_peek_msg(psock); + while (copied != len) { + struct scatterlist *sge; + + if (unlikely(!msg_rx)) + break; + + i = msg_rx->sg.start; + do { + struct page *page; + int copy; + + sge = sk_msg_elem(msg_rx, i); + copy = sge->length; + page = sg_page(sge); + if (copied + copy > len) + copy = len - copied; + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; + + copied += copy; + if (likely(!peek)) { + sge->offset += copy; + sge->length -= copy; + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); + msg_rx->sg.size -= copy; + + if (!sge->length) { + sk_msg_iter_var_next(i); + if (!msg_rx->skb) + put_page(page); + } + } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. + */ + if (copy != sge->length) + return copied; + sk_msg_iter_var_next(i); + } + + if (copied == len) + break; + } while (i != msg_rx->sg.end); + + if (unlikely(peek)) { + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) + break; + continue; + } + + msg_rx->sg.start = i; + if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { + msg_rx = sk_psock_dequeue_msg(psock); + kfree_sk_msg(msg_rx); + } + msg_rx = sk_psock_peek_msg(psock); + } + + return copied; +} +EXPORT_SYMBOL_GPL(sk_msg_recvmsg); + static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, struct sk_buff *skb) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ac8cfbaeacd2..3d622a0d0753 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -10,80 +10,6 @@ #include #include -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags) -{ - struct iov_iter *iter = &msg->msg_iter; - int peek = flags & MSG_PEEK; - struct sk_msg *msg_rx; - int i, copied = 0; - - msg_rx = sk_psock_peek_msg(psock); - while (copied != len) { - struct scatterlist *sge; - - if (unlikely(!msg_rx)) - break; - - i = msg_rx->sg.start; - do { - struct page *page; - int copy; - - sge = sk_msg_elem(msg_rx, i); - copy = sge->length; - page = sg_page(sge); - if (copied + copy > len) - copy = len - copied; - copy = copy_page_to_iter(page, sge->offset, copy, iter); - if (!copy) - return copied ? copied : -EFAULT; - - copied += copy; - if (likely(!peek)) { - sge->offset += copy; - sge->length -= copy; - if (!msg_rx->skb) - sk_mem_uncharge(sk, copy); - msg_rx->sg.size -= copy; - - if (!sge->length) { - sk_msg_iter_var_next(i); - if (!msg_rx->skb) - put_page(page); - } - } else { - /* Lets not optimize peek case if copy_page_to_iter - * didn't copy the entire length lets just break. - */ - if (copy != sge->length) - return copied; - sk_msg_iter_var_next(i); - } - - if (copied == len) - break; - } while (i != msg_rx->sg.end); - - if (unlikely(peek)) { - msg_rx = sk_psock_next_msg(psock, msg_rx); - if (!msg_rx) - break; - continue; - } - - msg_rx->sg.start = i; - if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - msg_rx = sk_psock_dequeue_msg(psock); - kfree_sk_msg(msg_rx); - } - msg_rx = sk_psock_peek_msg(psock); - } - - return copied; -} -EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); - static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -237,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk) return !empty; } -static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, - int flags, long timeo, int *err) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - int ret = 0; - - if (sk->sk_shutdown & RCV_SHUTDOWN) - return 1; - - if (!timeo) - return ret; - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - ret = sk_wait_event(sk, &timeo, - !list_empty(&psock->ingress_msg) || - !skb_queue_empty(&sk->sk_receive_queue), &wait); - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); - return ret; -} - static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -278,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } lock_sock(sk); msg_bytes_ready: - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { int data, err = 0; long timeo; timeo = sock_rcvtimeo(sk, nonblock); - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); if (data) { if (!sk_psock_queue_empty(psock)) goto msg_bytes_ready; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 01d933ae5f16..1dcb34dfd56b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk, skb = tls_wait_data(sk, psock, flags, timeo, &err); if (!skb) { if (psock) { - int ret = __tcp_bpf_recvmsg(sk, psock, - msg, len, flags); + int ret = sk_msg_recvmsg(sk, psock, msg, len, + flags); if (ret > 0) { decrypted += ret; From patchwork Tue Mar 23 00:38:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 408747 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BB0FC433E3 for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58B88619AC for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231395AbhCWAjP (ORCPT ); Mon, 22 Mar 2021 20:39:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230512AbhCWAic (ORCPT ); Mon, 22 Mar 2021 20:38:32 -0400 Received: from mail-qk1-x734.google.com (mail-qk1-x734.google.com [IPv6:2607:f8b0:4864:20::734]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3F32C061574; Mon, 22 Mar 2021 17:38:31 -0700 (PDT) Received: by mail-qk1-x734.google.com with SMTP id 7so12707308qka.7; Mon, 22 Mar 2021 17:38:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=gCwCuwwVAIPAWFdKIHS2cH/MCoVfPSeZ2Xm4SDOlgug6/RYQvtrN1A+kJOdOgLrYMR bbzOCzJXjjnFmlbj4MfqFSu805EGPGxg3AXSsh08pNgBEEFp1WyLBt8DQR9dSKSf9IK8 OGwhq8duzfg3X13ADicjD1naoZR9Tw4r8JfPn34LnJUqyF0IzaPWeQfg6irwDeAvQdrB m2wG7ja4p9c+QqBk2/paCZgly63ieuUgNU7h4lcSX//dSasGlbihaRvs6nIVJ2RIFZA8 7R1bLoNR87N3RFcHtfax263LoHZiRJrCqKq/lsHsLEdti4Gu1CWCEkMblrcE8RkIbyqn iXTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=QF6DwdrYsOsXtpe5943lJ2yv7547SPbGePNA9yf5K8hCcLRuXbW8YCBHnhccIJuGmn GIbc6tfcoCSwGHWkNvQ7o878fQxAFItl4y0njWvCvLauRqpRwM09aE/1K4rPcu5XCAP/ DN3isbt7R4YfGrq1BzLwluhFtGpQwTfrdfX+XHedA20WVSB99a56X/pb8Ngmnp2m/t9V EW+39krGz9ODdyolbP7ZHx/CSRTWzOF5wGxziTfaD+0B5SdBK573VYaAR5WiCAk+eNtb CFlFU8oBGV8J7bzHA+Fa8zijvpyl6WEvmmsco11Dq4ZRsQQwfgKBLC+vzxtretIT8Y4E eBCw== X-Gm-Message-State: AOAM533/rD7agTILJIeLLb3o8m/zhUvHx2nDFFLgJJb1J0dPtTJDhyVT SbXl1b/e/Rnu/Io7VWAuBlbO1dYcd9hZfA== X-Google-Smtp-Source: ABdhPJyyRtG+yM9dxcT/jUVWLrzuyV3mNODLgD9sq5sMhH/i3LIDYlMtmUaSCnt8twqNVwYrkAieUA== X-Received: by 2002:a37:6312:: with SMTP id x18mr2844368qkb.193.1616459910944; Mon, 22 Mar 2021 17:38:30 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:30 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 10/12] udp: implement udp_bpf_recvmsg() for sockmap Date: Mon, 22 Mar 2021 17:38:06 -0700 Message-Id: <20210323003808.16074-11-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 6001f93cd3a0..7d5c4ebf42fe 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -4,6 +4,68 @@ #include #include #include +#include + +#include "udp_impl.h" + +static struct proto *udpv6_prot_saved __read_mostly; + +static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int noblock, int flags, int *addr_len) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags, + addr_len); +#endif + return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len); +} + +static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied, ret; + + if (unlikely(flags & MSG_ERRQUEUE)) + return inet_recv_error(sk, msg, len, addr_len); + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + + lock_sock(sk); + if (sk_psock_queue_empty(psock)) { + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + +msg_bytes_ready: + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + if (!copied) { + int data, err = 0; + long timeo; + + timeo = sock_rcvtimeo(sk, nonblock); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); + if (data) { + if (!sk_psock_queue_empty(psock)) + goto msg_bytes_ready; + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + if (err) { + ret = err; + goto out; + } + copied = -EAGAIN; + } + ret = copied; +out: + release_sock(sk); + sk_psock_put(sk, psock); + return ret; +} enum { UDP_BPF_IPV4, @@ -11,7 +73,6 @@ enum { UDP_BPF_NUM_PROTS, }; -static struct proto *udpv6_prot_saved __read_mostly; static DEFINE_SPINLOCK(udpv6_prot_lock); static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS]; @@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->unhash = sock_map_unhash; prot->close = sock_map_close; + prot->recvmsg = udp_bpf_recvmsg; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) From patchwork Tue Mar 23 00:38:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EE7CC43446 for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 787C76199F for ; Tue, 23 Mar 2021 00:39:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231405AbhCWAjQ (ORCPT ); Mon, 22 Mar 2021 20:39:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230525AbhCWAid (ORCPT ); Mon, 22 Mar 2021 20:38:33 -0400 Received: from mail-qt1-x830.google.com (mail-qt1-x830.google.com [IPv6:2607:f8b0:4864:20::830]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 538DAC061574; Mon, 22 Mar 2021 17:38:33 -0700 (PDT) Received: by mail-qt1-x830.google.com with SMTP id a11so13824933qto.2; Mon, 22 Mar 2021 17:38:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=cio+qgcDvmruwG8aqOjWedfunYy04m3r9xm+A6tj84M=; b=SoMmBYaYtkk1KN69z5WwbB8MT/noBpcPN1pTUi/j41fRbpaqzIsToALvOdUqEb//6h J9NtygfsEclltiHRx3jVHvYl78/qCzo1bZgdcJHqmYT4HdPhfJJFfP9d5BSLgRvhcBRJ rFLeTaPYmQPuvwr6qXHJQH5pz17lTq4j3kL8iHChOvXOby8te2cNyltZIjbCLgtj1ejI 59QrB8UdIylzL7jfNk8nvGEu9WF3JjNba62aEo/vCcvRyfoernXe5qdClMm1a8uRg7U3 19qIBi9/dIhYX70IHoqE2Frdv8emJ9CvoG0L3lekGFvRmSduOX4h1XOj+gyB7nTGFWvk mKfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cio+qgcDvmruwG8aqOjWedfunYy04m3r9xm+A6tj84M=; b=itzeqrQMU91q0I9iRT5EY1vCnS7w4+NzW4L1axTJR2wl/c1QYL/h66ilMWY5pRUxia rUOkmqL2IzfbB3aZqALRaVmyio7MO831HbzzYGhQh2Tw3FYjp6EAqGtG7Be1rPVchdpT LB52uL1mX0iBTA7NZ2Rpy75pUxdSc+2Rrlm4HKQNufptCALcY536UIWFtWrDTm6GmUem bq2g8rTGmIQfzXKgKrdVou08auF28YRBIUtCYV+CwzwYSj/RhnjvBwQPoi94KU6d2GM7 0HGv+js3HMbvfIQPXSmDwrdR9pnUrONRrsj1QYBLGRN9pCn0I8NdaflnB1i2zkc3jpST J0vQ== X-Gm-Message-State: AOAM530HT57DcAiPSpcXmZpPqH0Sqq+gmmT3rAZbBqc0KEFvLSIkN9PK 5cvQ3NqYpvrRzpHWvZVw6orUgrEGKx4ecA== X-Google-Smtp-Source: ABdhPJxW5WQ7Nkthq8IfYNQ0vV9grQMwn/HOus8Bkv+6anATYj3yI1SddCuuyYqME/78bmI9afI2jw== X-Received: by 2002:ac8:4809:: with SMTP id g9mr2298634qtq.295.1616459912454; Mon, 22 Mar 2021 17:38:32 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:32 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 11/12] sock_map: update sock type checks for UDP Date: Mon, 22 Mar 2021 17:38:07 -0700 Message-Id: <20210323003808.16074-12-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Now UDP supports sockmap and redirection, we can safely update the sock type checks for it accordingly. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index d04b98fc8104..9ed040c7d9e7 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -549,7 +549,10 @@ static bool sk_is_udp(const struct sock *sk) static bool sock_map_redirect_allowed(const struct sock *sk) { - return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN; + if (sk_is_tcp(sk)) + return sk->sk_state != TCP_LISTEN; + else + return sk->sk_state == TCP_ESTABLISHED; } static bool sock_map_sk_is_suitable(const struct sock *sk) From patchwork Tue Mar 23 00:38:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 407433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7EF7C433DB for ; Tue, 23 Mar 2021 00:39:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 651F7619A8 for ; Tue, 23 Mar 2021 00:39:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231411AbhCWAjS (ORCPT ); Mon, 22 Mar 2021 20:39:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231130AbhCWAif (ORCPT ); Mon, 22 Mar 2021 20:38:35 -0400 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E36B2C061574; Mon, 22 Mar 2021 17:38:34 -0700 (PDT) Received: by mail-qt1-x835.google.com with SMTP id s2so13804827qtx.10; Mon, 22 Mar 2021 17:38:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=hrNum4QvL3LamXWq9P4plZ53fQ3WQRfknjOtzqq1IhGP743lYsIihPZ0tga1D/KVbX +sLccEWajK/7rMy3pLgA49qWcfUfY8a2jo1jTe+LmD2mvSgnHo6nPUlfTSMZYd5oxdzm Y6HnAAibmwWSFGQ1NNolZxRiKYzmjqIIhl4tJLStj8Q4x/aVeXZKpu3qa77yU3sWOSp8 2BtBwLqCRsWChWBYmneLZSCF6LN+0nRiT6Sb9o8FYNJGEe2WmnUWH9+EPU/++B7NW1fC 8OgvNftY5nKzkVixPgdsHAzQfCCCYloyRGMbCryCDlrXPCAbaub9L7CXxUkoyZJx+ZfN H49A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=XXrSijfKweU1wF40Snq9uPYUz46sYn6vZhOArljb/LfkEhv1d8h88BtpF423fGP6ps RGaPgVkZId9LVA2tsyOBGGavImErtHgnSTLOUJjYieHpcZJhkigqpm4I/ocEUAKWZ5Pc JAMV6obRnC8eahznOchfgDFG/FLQnhDW227TvcA+Ohan7mhWp4yegnwR0Kg4BRPn5npW 42udfXauI8hwnvCzts++x1L3E+26r8lNA3R/z2mXJSQNmAocPGWIIdT7mSVVuG7RnqXN QJthTimDRpB5Vq7SKlArHJxaqFN+6FakpNoHkGHj96a1DepCq7eQnkTKyFKY8XbBww9o rb/g== X-Gm-Message-State: AOAM533pOMKSkzxtDYHRdDOcy2rR5wMV299yYLWofOib/h11pHSZH7cB /CWid3lxTfe51rgUtsK+svYwkVfc/CCJaw== X-Google-Smtp-Source: ABdhPJxPf9vTbaZDie5kbxaQAkg1xL9Texd791A7afJdJVxkRgn8eVLh2YrsDHx4nHVerQChTuT4ew== X-Received: by 2002:a05:622a:3cf:: with SMTP id k15mr2294520qtx.282.1616459913939; Mon, 22 Mar 2021 17:38:33 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:fda6:6522:f108:7bd8]) by smtp.gmail.com with ESMTPSA id 184sm12356403qki.97.2021.03.22.17.38.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:38:33 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v6 12/12] selftests/bpf: add a test case for udp sockmap Date: Mon, 22 Mar 2021 17:38:08 -0700 Message-Id: <20210323003808.16074-13-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210323003808.16074-1-xiyou.wangcong@gmail.com> References: <20210323003808.16074-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Cong Wang Add a test case to ensure redirection between two UDP sockets work. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- .../selftests/bpf/prog_tests/sockmap_listen.c | 140 ++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_listen.c | 22 +++ 2 files changed, 162 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index c26e6bf05e49..a549ebd3b5a6 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1563,6 +1563,142 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map, } } +static void udp_redir_to_connected(int family, int sotype, int sock_mapfd, + int verd_mapfd, enum redir_mode mode) +{ + const char *log_prefix = redir_mode_str(mode); + struct sockaddr_storage addr; + int c0, c1, p0, p1; + unsigned int pass; + socklen_t len; + int err, n; + u64 value; + u32 key; + char b; + + zero_verdict_count(verd_mapfd); + + p0 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p0 < 0) + return; + len = sizeof(addr); + err = xgetsockname(p0, sockaddr(&addr), &len); + if (err) + goto close_peer0; + + c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c0 < 0) + goto close_peer0; + err = xconnect(c0, sockaddr(&addr), len); + if (err) + goto close_cli0; + err = xgetsockname(c0, sockaddr(&addr), &len); + if (err) + goto close_cli0; + err = xconnect(p0, sockaddr(&addr), len); + if (err) + goto close_cli0; + + p1 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p1 < 0) + goto close_cli0; + err = xgetsockname(p1, sockaddr(&addr), &len); + if (err) + goto close_cli0; + + c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c1 < 0) + goto close_peer1; + err = xconnect(c1, sockaddr(&addr), len); + if (err) + goto close_cli1; + err = xgetsockname(c1, sockaddr(&addr), &len); + if (err) + goto close_cli1; + err = xconnect(p1, sockaddr(&addr), len); + if (err) + goto close_cli1; + + key = 0; + value = p0; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + key = 1; + value = p1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + n = write(c1, "a", 1); + if (n < 0) + FAIL_ERRNO("%s: write", log_prefix); + if (n == 0) + FAIL("%s: incomplete write", log_prefix); + if (n < 1) + goto close_cli1; + + key = SK_PASS; + err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); + if (err) + goto close_cli1; + if (pass != 1) + FAIL("%s: want pass count 1, have %d", log_prefix, pass); + + n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1); + if (n < 0) + FAIL_ERRNO("%s: read", log_prefix); + if (n == 0) + FAIL("%s: incomplete read", log_prefix); + +close_cli1: + xclose(c1); +close_peer1: + xclose(p1); +close_cli0: + xclose(c0); +close_peer0: + xclose(p0); +} + +static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel, + struct bpf_map *inner_map, int family, + int sotype) +{ + int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + int verdict_map = bpf_map__fd(skel->maps.verdict_map); + int sock_map = bpf_map__fd(inner_map); + int err; + + err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); + if (err) + return; + + skel->bss->test_ingress = false; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_EGRESS); + skel->bss->test_ingress = true; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_INGRESS); + + xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); +} + +static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map, + int family) +{ + const char *family_name, *map_name; + char s[MAX_TEST_NAME]; + + family_name = family_str(family); + map_name = map_type_str(map); + snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); + if (!test__start_subtest(s)) + return; + udp_skb_redir_to_connected(skel, map, family, SOCK_DGRAM); +} + static void test_reuseport(struct test_sockmap_listen *skel, struct bpf_map *map, int family, int sotype) { @@ -1626,10 +1762,14 @@ void test_sockmap_listen(void) skel->bss->test_sockmap = true; run_tests(skel, skel->maps.sock_map, AF_INET); run_tests(skel, skel->maps.sock_map, AF_INET6); + test_udp_redir(skel, skel->maps.sock_map, AF_INET); + test_udp_redir(skel, skel->maps.sock_map, AF_INET6); skel->bss->test_sockmap = false; run_tests(skel, skel->maps.sock_hash, AF_INET); run_tests(skel, skel->maps.sock_hash, AF_INET6); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET6); test_sockmap_listen__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c index fa221141e9c1..a39eba9f5201 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c @@ -29,6 +29,7 @@ struct { } verdict_map SEC(".maps"); static volatile bool test_sockmap; /* toggled by user-space */ +static volatile bool test_ingress; /* toggled by user-space */ SEC("sk_skb/stream_parser") int prog_stream_parser(struct __sk_buff *skb) @@ -55,6 +56,27 @@ int prog_stream_verdict(struct __sk_buff *skb) return verdict; } +SEC("sk_skb/skb_verdict") +int prog_skb_verdict(struct __sk_buff *skb) +{ + unsigned int *count; + __u32 zero = 0; + int verdict; + + if (test_sockmap) + verdict = bpf_sk_redirect_map(skb, &sock_map, zero, + test_ingress ? BPF_F_INGRESS : 0); + else + verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero, + test_ingress ? BPF_F_INGRESS : 0); + + count = bpf_map_lookup_elem(&verdict_map, &verdict); + if (count) + (*count)++; + + return verdict; +} + SEC("sk_msg") int prog_msg_verdict(struct sk_msg_md *msg) {