From patchwork Fri May 28 02:28:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tanner Love
X-Patchwork-Id: 449912
From: Tanner Love
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov, Tanner Love
Subject: [PATCH net-next v2 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
Date: Thu, 27 May 2021 22:28:02 -0400
Message-Id: <20210528022803.778578-3-tannerlove.kernel@gmail.com>
X-Mailer: git-send-email 2.32.0.rc0.204.g9fa02ecfa5-goog
In-Reply-To: <20210528022803.778578-1-tannerlove.kernel@gmail.com>
References: <20210528022803.778578-1-tannerlove.kernel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Tanner Love

Syzkaller bugs have resulted from loose specification of
virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
in virtio_net_hdr_to_skb to validate the vnet header and drop bad
input.

The existing behavior of accepting these vnet headers is part of the
ABI. But individual admins may want to enforce restrictions. For
example, verifying that a GSO_TCPV4 gso_type matches packet contents:
unencapsulated TCP/IPV4 packet with payload exceeding gso_size and
hdr_len at payload offset.

Introduce a new sysctl net.core.flow_dissect_vnet_hdr controlling a
static key to decide whether to perform flow dissection. When the key
is false, virtio_net_hdr_to_skb computes as before.

[1] https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0

Signed-off-by: Tanner Love
Suggested-by: Willem de Bruijn
Reported-by: kernel test robot
---
 include/linux/virtio_net.h | 27 +++++++++++++++++++++++----
 net/core/sysctl_net_core.c | 10 ++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index b465f8f3e554..a92fcf38087d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -3,6 +3,8 @@
 #define _LINUX_VIRTIO_NET_H
 
 #include
+#include
+#include
 #include
 #include
 #include
@@ -25,10 +27,13 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
 	return 0;
 }
 
+DECLARE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
+
 static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 					const struct virtio_net_hdr *hdr,
 					bool little_endian)
 {
+	struct flow_keys_basic keys;
 	unsigned int gso_type = 0;
 	unsigned int thlen = 0;
 	unsigned int p_off = 0;
@@ -78,13 +83,24 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		p_off = skb_transport_offset(skb) + thlen;
 		if (!pskb_may_pull(skb, p_off))
 			return -EINVAL;
-	} else {
+	}
+
+	/* BPF flow dissection for optional strict validation.
+	 *
+	 * Admins can define permitted packets more strictly, such as dropping
+	 * deprecated UDP_UFO packets and requiring skb->protocol to be non-zero
+	 * and matching packet headers.
+	 */
+	if (static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
+	    !__skb_flow_dissect_flow_keys_basic(NULL, skb, &keys, NULL, 0, 0, 0,
+						0, hdr))
+		return -EINVAL;
+
+	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
 		/* gso packets without NEEDS_CSUM do not set transport_offset.
 		 * probe and drop if does not match one of the above types.
 		 */
 		if (gso_type && skb->network_header) {
-			struct flow_keys_basic keys;
-
 			if (!skb->protocol) {
 				__be16 protocol = dev_parse_header_protocol(skb);
 
@@ -92,8 +108,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 				if (protocol && protocol != skb->protocol)
 					return -EINVAL;
 			}
+
 retry:
-			if (!skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
+			/* only if flow dissection not already done */
+			if (!static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
+			    !skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
 							      NULL, 0, 0, 0,
 							      0)) {
 				/* UFO does not specify ipv4 or 6: try both */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index c8496c1142c9..277eb6ba3b01 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -36,6 +36,9 @@ static int net_msg_warn;	/* Unused, but still a sysctl */
 int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
 EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
 
+DEFINE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
+EXPORT_SYMBOL(sysctl_flow_dissect_vnet_hdr_key);
+
 /* 0 - Keep current behavior:
  *     IPv4: inherit all current settings from init_net
  *     IPv6: reset all settings to default
@@ -580,6 +583,13 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &int_3600,
 	},
+	{
+		.procname	= "flow_dissect_vnet_hdr",
+		.data		= &sysctl_flow_dissect_vnet_hdr_key.key,
+		.maxlen		= sizeof(sysctl_flow_dissect_vnet_hdr_key),
+		.mode		= 0644,
+		.proc_handler	= proc_do_static_key,
+	},
 	{ }
 };
 

From patchwork Fri May 28 02:28:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tanner Love
X-Patchwork-Id: 449911
From: Tanner Love
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov, Tanner Love
Subject: [PATCH net-next v2 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation
Date: Thu, 27 May 2021 22:28:03 -0400
Message-Id: <20210528022803.778578-4-tannerlove.kernel@gmail.com>
X-Mailer: git-send-email 2.32.0.rc0.204.g9fa02ecfa5-goog
In-Reply-To: <20210528022803.778578-1-tannerlove.kernel@gmail.com>
References: <20210528022803.778578-1-tannerlove.kernel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Tanner Love

Change the BPF flow dissector program to perform various checks on the
virtio_net_hdr fields after doing flow dissection.

Amend test_flow_dissector.(c|sh) to add test cases that inject packets
with reasonable or unreasonable virtio-net headers and assert that bad
packets are dropped and good packets are not. Do this via packet
socket; the kernel executes tpacket_snd, which enters
virtio_net_hdr_to_skb, where flow dissection / vnet header validation
occurs.

Signed-off-by: Tanner Love
Reviewed-by: Willem de Bruijn
---
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 188 +++++++++++++-----
 .../selftests/bpf/test_flow_dissector.c       | 181 +++++++++++++++--
 .../selftests/bpf/test_flow_dissector.sh      |  19 ++
 3 files changed, 321 insertions(+), 67 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 95a5a0778ed7..681cfbdba1d7 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -71,15 +72,119 @@ struct {
 	__type(value, struct bpf_flow_keys);
 } last_dissection SEC(".maps");
 
-static __always_inline int export_flow_keys(struct bpf_flow_keys *keys,
-					    int ret)
+/* Drops invalid virtio-net headers */
+static __always_inline int validate_virtio_net_hdr(const struct __sk_buff *skb)
 {
+	const struct bpf_flow_keys *keys = skb->flow_keys;
+
+	/* Check gso */
+	if (skb->vhdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		if (!(skb->vhdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+			return BPF_DROP;
+
+		if (keys->is_encap)
+			return BPF_DROP;
+
+		switch (skb->vhdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+			if (keys->addr_proto != ETH_P_IP ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if (keys->addr_proto != ETH_P_IPV6 ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_UDP:
+			if (keys->ip_proto != IPPROTO_UDP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	/* Check hdr_len */
+	if (skb->vhdr_hdr_len) {
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (skb->vhdr_hdr_len != skb->flow_keys->thoff +
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (skb->vhdr_hdr_len != keys->thoff +
+					sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		}
+	}
+
+	/* Check csum */
+	if (skb->vhdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		if (keys->addr_proto != ETH_P_IP &&
+		    keys->addr_proto != ETH_P_IPV6)
+			return BPF_DROP;
+
+		if (skb->vhdr_csum_start != keys->thoff)
+			return BPF_DROP;
+
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (skb->vhdr_csum_offset !=
+					offsetof(struct tcphdr, check))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (skb->vhdr_csum_offset !=
+					offsetof(struct udphdr, check))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	return BPF_OK;
+}
+
+/* Common steps to perform regardless of where protocol parsing finishes:
+ * 1. store flow keys in map
+ * 2. if parse result is BPF_OK, parse the vnet hdr if present
+ * 3. return the parse result
+ */
+static __always_inline int parse_epilogue(struct __sk_buff *skb, int ret)
+{
+	const struct bpf_flow_keys *keys = skb->flow_keys;
 	__u32 key = (__u32)(keys->sport) << 16 | keys->dport;
 	struct bpf_flow_keys val;
 
 	memcpy(&val, keys, sizeof(val));
 	bpf_map_update_elem(&last_dissection, &key, &val, BPF_ANY);
-	return ret;
+
+	if (ret != BPF_OK)
+		return ret;
+	return validate_virtio_net_hdr(skb);
 }
 
 #define IPV6_FLOWLABEL_MASK		__bpf_constant_htonl(0x000FFFFF)
@@ -114,8 +219,6 @@ static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
 /* Dispatches on ETHERTYPE */
 static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
-
 	switch (proto) {
 	case bpf_htons(ETH_P_IP):
 		bpf_tail_call_static(skb, &jmp_table, IP);
@@ -131,12 +234,10 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
 	case bpf_htons(ETH_P_8021AD):
 		bpf_tail_call_static(skb, &jmp_table, VLAN);
 		break;
-	default:
-		/* Protocol not supported */
-		return export_flow_keys(keys, BPF_DROP);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	/* Protocol not supported */
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 SEC("flow_dissector")
@@ -162,28 +263,28 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	case IPPROTO_ICMP:
 		icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp);
 		if (!icmp)
-			return export_flow_keys(keys, BPF_DROP);
-		return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_DROP);
+		return parse_epilogue(skb, BPF_OK);
 	case IPPROTO_IPIP:
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
 	case IPPROTO_IPV6:
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
 	case IPPROTO_GRE:
 		gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
 		if (!gre)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (bpf_htons(gre->flags & GRE_VERSION))
 			/* Only inspect standard GRE packets with version 0 */
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		keys->thoff += sizeof(*gre); /* Step over GRE Flags and Proto */
 		if (GRE_IS_CSUM(gre->flags))
@@ -195,13 +296,13 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		if (gre->proto == bpf_htons(ETH_P_TEB)) {
 			eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
 							  &_eth);
 			if (!eth)
-				return export_flow_keys(keys, BPF_DROP);
+				return parse_epilogue(skb, BPF_DROP);
 
 			keys->thoff += sizeof(*eth);
 
@@ -212,37 +313,35 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	case IPPROTO_TCP:
 		tcp = bpf_flow_dissect_get_header(skb, sizeof(*tcp), &_tcp);
 		if (!tcp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (tcp->doff < 5)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->sport = tcp->source;
 		keys->dport = tcp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	case IPPROTO_UDP:
 	case IPPROTO_UDPLITE:
 		udp = bpf_flow_dissect_get_header(skb, sizeof(*udp), &_udp);
 		if (!udp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->sport = udp->source;
 		keys->dport = udp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	default:
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
-
 	switch (nexthdr) {
 	case IPPROTO_HOPOPTS:
 	case IPPROTO_DSTOPTS:
@@ -255,7 +354,7 @@ static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 		return parse_ip_proto(skb, nexthdr);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 PROG(IP)(struct __sk_buff *skb)
@@ -268,11 +367,11 @@ PROG(IP)(struct __sk_buff *skb)
 
 	iph = bpf_flow_dissect_get_header(skb, sizeof(*iph), &_iph);
 	if (!iph)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	/* IP header cannot be smaller than 20 bytes */
 	if (iph->ihl < 5)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->addr_proto = ETH_P_IP;
 	keys->ipv4_src = iph->saddr;
@@ -281,7 +380,7 @@ PROG(IP)(struct __sk_buff *skb)
 
 	keys->thoff += iph->ihl << 2;
 	if (data + keys->thoff > data_end)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
 		keys->is_frag = true;
@@ -302,7 +401,7 @@ PROG(IP)(struct __sk_buff *skb)
 	}
 
 	if (done)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 
 	return parse_ip_proto(skb, iph->protocol);
 }
@@ -314,7 +413,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->addr_proto = ETH_P_IPV6;
 	memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
@@ -324,7 +423,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 	keys->flow_label = ip6_flowlabel(ip6h);
 
 	if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -336,7 +435,7 @@ PROG(IPV6OP)(struct __sk_buff *skb)
 
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	/* hlen is in 8-octets and does not include the first 8 bytes
 	 * of the header
@@ -354,7 +453,7 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 
 	fragh = bpf_flow_dissect_get_header(skb, sizeof(*fragh), &_fragh);
 	if (!fragh)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->thoff += sizeof(*fragh);
 	keys->is_frag = true;
@@ -367,9 +466,9 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 		 * explicitly asked for.
 		 */
 		if (!(keys->flags & BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 	} else {
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	}
 
 	return parse_ipv6_proto(skb, fragh->nexthdr);
@@ -377,14 +476,13 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 
 PROG(MPLS)(struct __sk_buff *skb)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
 	struct mpls_label *mpls, _mpls;
 
 	mpls = bpf_flow_dissect_get_header(skb, sizeof(*mpls), &_mpls);
 	if (!mpls)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
-	return export_flow_keys(keys, BPF_OK);
+	return parse_epilogue(skb, BPF_OK);
 }
 
 PROG(VLAN)(struct __sk_buff *skb)
@@ -396,10 +494,10 @@ PROG(VLAN)(struct __sk_buff *skb)
 	if (keys->n_proto == bpf_htons(ETH_P_8021AD)) {
 		vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 		if (!vlan)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->nhoff += sizeof(*vlan);
 		keys->thoff += sizeof(*vlan);
@@ -407,14 +505,14 @@ PROG(VLAN)(struct __sk_buff *skb)
 
 	vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 	if (!vlan)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->nhoff += sizeof(*vlan);
 	keys->thoff += sizeof(*vlan);
 	/* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
 	if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
 	    vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->n_proto = vlan->h_vlan_encapsulated_proto;
 	return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.c b/tools/testing/selftests/bpf/test_flow_dissector.c
index 571cc076dd7d..583c13f75ead 100644
--- a/tools/testing/selftests/bpf/test_flow_dissector.c
+++ b/tools/testing/selftests/bpf/test_flow_dissector.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -65,7 +67,8 @@ struct guehdr {
 static uint8_t	cfg_dsfield_inner;
 static uint8_t	cfg_dsfield_outer;
 static uint8_t	cfg_encap_proto;
-static bool	cfg_expect_failure = false;
+static bool	cfg_expect_norx;
+static bool	cfg_expect_snd_failure;
 static int	cfg_l3_extra = AF_UNSPEC;	/* optional SIT prefix */
 static int	cfg_l3_inner = AF_UNSPEC;
 static int	cfg_l3_outer = AF_UNSPEC;
@@ -77,8 +80,14 @@ static int	cfg_port_gue = 6080;
 static bool	cfg_only_rx;
 static bool	cfg_only_tx;
 static int	cfg_src_port = 9;
+static bool	cfg_tx_pf_packet;
+static bool	cfg_use_vnet;
+static bool	cfg_vnet_use_hdr_len_bad;
+static bool	cfg_vnet_use_gso;
+static bool	cfg_vnet_use_csum_off;
+static bool	cfg_partial_udp_hdr;
 
-static char	buf[ETH_DATA_LEN];
+static char	buf[ETH_MAX_MTU];
 
 #define INIT_ADDR4(name, addr4, port)				\
 	static struct sockaddr_in name = {			\
@@ -273,8 +282,48 @@ static int l3_length(int family)
 		return sizeof(struct ipv6hdr);
 }
 
+static int build_vnet_header(void *header, int il3_len)
+{
+	struct virtio_net_hdr *vh = header;
+
+	vh->hdr_len = ETH_HLEN + il3_len + sizeof(struct udphdr);
+
+	if (cfg_partial_udp_hdr) {
+		vh->hdr_len -= (sizeof(struct udphdr) >> 1);
+		return sizeof(*vh);
+	}
+
+	/* Alteration must increase hdr_len; if not, kernel overwrites it */
+	if (cfg_vnet_use_hdr_len_bad)
+		vh->hdr_len++;
+
+	if (cfg_vnet_use_csum_off) {
+		vh->flags |= VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		vh->csum_start = ETH_HLEN + il3_len;
+		vh->csum_offset = __builtin_offsetof(struct udphdr, check);
+	}
+
+	if (cfg_vnet_use_gso) {
+		vh->gso_type = VIRTIO_NET_HDR_GSO_UDP;
+		vh->gso_size = ETH_DATA_LEN - il3_len;
+	}
+
+	return sizeof(*vh);
+}
+
+static int build_eth_header(void *header)
+{
+	struct ethhdr *eth = header;
+	uint16_t proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+
+	eth->h_proto = htons(proto);
+
+	return ETH_HLEN;
+}
+
 static int build_packet(void)
 {
+	int l2_len = 0;
 	int ol3_len = 0, ol4_len = 0, il3_len = 0, il4_len = 0;
 	int el3_len = 0;
 
@@ -294,23 +343,29 @@ static int build_packet(void)
 	il3_len = l3_length(cfg_l3_inner);
 	il4_len = sizeof(struct udphdr);
 
-	if (el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len >=
-	    sizeof(buf))
+	if (cfg_use_vnet)
+		l2_len += build_vnet_header(buf, il3_len);
+	if (cfg_tx_pf_packet)
+		l2_len += build_eth_header(buf + l2_len);
+
+	if (l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len >= sizeof(buf))
 		error(1, 0, "packet too large\n");
 
 	/*
 	 * Fill packet from inside out, to calculate correct checksums.
 	 * But create ip before udp headers, as udp uses ip for pseudo-sum.
 	 */
-	memset(buf + el3_len + ol3_len + ol4_len + il3_len + il4_len,
+	memset(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len,
 	       cfg_payload_char, cfg_payload_len);
 
 	/* add zero byte for udp csum padding */
-	buf[el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len] = 0;
+	buf[l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len] = 0;
 
 	switch (cfg_l3_inner) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv4_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  in_saddr4.sin_addr.s_addr,
 				  in_daddr4.sin_addr.s_addr,
@@ -318,7 +373,7 @@ static int build_packet(void)
 				  cfg_dsfield_inner);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv6_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  &in_saddr6, &in_daddr6,
 				  il4_len + cfg_payload_len,
@@ -326,22 +381,25 @@ static int build_packet(void)
 		break;
 	}
 
-	build_udp_header(buf + el3_len + ol3_len + ol4_len + il3_len,
+	build_udp_header(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len,
 			 cfg_payload_len, CFG_PORT_INNER, cfg_l3_inner);
 
+	if (cfg_partial_udp_hdr)
+		return l2_len + il3_len + (il4_len >> 1);
+
 	if (!cfg_encap_proto)
-		return il3_len + il4_len + cfg_payload_len;
+		return l2_len + il3_len + il4_len + cfg_payload_len;
 
 	switch (cfg_l3_outer) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len, cfg_encap_proto,
+		build_ipv4_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  out_saddr4.sin_addr.s_addr,
 				  out_daddr4.sin_addr.s_addr,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len, cfg_encap_proto,
+		build_ipv6_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  &out_saddr6, &out_daddr6,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
@@ -350,17 +408,17 @@ static int build_packet(void)
 
 	switch (cfg_encap_proto) {
 	case IPPROTO_UDP:
-		build_gue_header(buf + el3_len + ol3_len + ol4_len -
+		build_gue_header(buf + l2_len + el3_len + ol3_len + ol4_len -
 				 sizeof(struct guehdr),
 				 cfg_l3_inner == PF_INET ? IPPROTO_IPIP
 							 : IPPROTO_IPV6);
-		build_udp_header(buf + el3_len + ol3_len,
+		build_udp_header(buf + l2_len + el3_len + ol3_len,
 				 sizeof(struct guehdr) + il3_len + il4_len +
 				 cfg_payload_len,
 				 cfg_port_gue, cfg_l3_outer);
 		break;
 	case IPPROTO_GRE:
-		build_gre_header(buf + el3_len + ol3_len,
+		build_gre_header(buf + l2_len + el3_len + ol3_len,
 				 cfg_l3_inner == PF_INET ? ETH_P_IP
 							 : ETH_P_IPV6);
 		break;
@@ -368,7 +426,7 @@ static int build_packet(void)
 
 	switch (cfg_l3_extra) {
 	case PF_INET:
-		build_ipv4_header(buf,
+		build_ipv4_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  extra_saddr4.sin_addr.s_addr,
@@ -377,7 +435,7 @@ static int build_packet(void)
 				  cfg_payload_len, 0);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf,
+		build_ipv6_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  &extra_saddr6, &extra_daddr6,
@@ -386,15 +444,46 @@ static int build_packet(void)
 		break;
 	}
 
-	return el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	return l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
 	       cfg_payload_len;
 }
 
+static int setup_tx_pfpacket(void)
+{
+	struct sockaddr_ll laddr = {0};
+	const int one = 1;
+	uint16_t proto;
+	int fd;
+
+	fd = socket(PF_PACKET, SOCK_RAW, 0);
+	if (fd == -1)
+		error(1, errno, "socket tx");
+
+	if (cfg_use_vnet &&
+	    setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)))
+		error(1, errno, "setsockopt vnet");
+
+	proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+	laddr.sll_family = AF_PACKET;
+	laddr.sll_protocol = htons(proto);
+	laddr.sll_ifindex = if_nametoindex("lo");
+	if (!laddr.sll_ifindex)
+		error(1, errno, "if_nametoindex");
+
+	if (bind(fd, (void *)&laddr, sizeof(laddr)))
+		error(1, errno, "bind");
+
+	return fd;
+}
+
 /* sender transmits encapsulated over RAW or unencap'd over UDP */
 static int setup_tx(void)
 {
 	int family, fd, ret;
 
+	if (cfg_tx_pf_packet)
+		return setup_tx_pfpacket();
+
 	if (cfg_l3_extra)
 		family = cfg_l3_extra;
 	else if (cfg_l3_outer)
@@ -464,6 +553,13 @@ static int do_tx(int fd, const char *pkt, int len)
 	int ret;
 
 	ret = write(fd, pkt, len);
+
+	if (cfg_expect_snd_failure) {
+		if (ret == -1)
+			return 0;
+		error(1, 0, "expected tx to fail but it did not");
+	}
+
 	if (ret == -1)
 		error(1, errno, "send");
 	if (ret != len)
@@ -571,7 +667,7 @@ static int do_main(void)
 	 * success (== 0) only if received all packets
 	 * unless failure is expected, in which case none must arrive.
 	 */
-	if (cfg_expect_failure)
+	if (cfg_expect_norx || cfg_expect_snd_failure)
 		return rx != 0;
 	else
 		return rx != tx;
@@ -623,8 +719,12 @@ static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "d:D:e:f:Fhi:l:n:o:O:Rs:S:t:Tx:X:")) != -1) {
+	while ((c = getopt(argc, argv,
+			   "cd:D:e:Ef:FGghi:l:Ln:o:O:pRs:S:t:TUvx:X:")) != -1) {
 		switch (c) {
+		case 'c':
+			cfg_vnet_use_csum_off = true;
+			break;
 		case 'd':
 			if (cfg_l3_outer == AF_UNSPEC)
 				error(1, 0, "-d must be preceded by -o");
@@ -653,11 +753,17 @@ static void parse_opts(int argc, char **argv)
 			else
 				usage(argv[0]);
 			break;
+		case 'E':
+			cfg_expect_snd_failure = true;
+			break;
 		case 'f':
 			cfg_src_port = strtol(optarg, NULL, 0);
 			break;
 		case 'F':
-			cfg_expect_failure = true;
+			cfg_expect_norx = true;
+			break;
+		case 'g':
+			cfg_vnet_use_gso = true;
 			break;
 		case 'h':
 			usage(argv[0]);
@@ -673,6 +779,9 @@ static void parse_opts(int argc, char **argv)
 		case 'l':
 			cfg_payload_len = strtol(optarg, NULL, 0);
 			break;
+		case 'L':
+			cfg_vnet_use_hdr_len_bad = true;
+			break;
 		case 'n':
 			cfg_num_pkt = strtol(optarg, NULL, 0);
 			break;
@@ -682,6 +791,9 @@ static void parse_opts(int argc, char **argv)
 		case 'O':
 			cfg_l3_extra = parse_protocol_family(argv[0], optarg);
 			break;
+		case 'p':
+			cfg_tx_pf_packet = true;
+			break;
 		case 'R':
 			cfg_only_rx = true;
 			break;
@@ -703,6 +815,12 @@ static void parse_opts(int argc, char **argv)
 		case 'T':
 			cfg_only_tx = true;
 			break;
+		case 'U':
+			cfg_partial_udp_hdr = true;
+			break;
+		case 'v':
+			cfg_use_vnet = true;
+			break;
 		case 'x':
 			cfg_dsfield_outer = strtol(optarg, NULL, 0);
 			break;
@@ -733,7 +851,26 @@ static void parse_opts(int argc, char **argv)
 	 */
 	if (((cfg_dsfield_outer & 0x3) == 0x3) &&
 	    ((cfg_dsfield_inner & 0x3) == 0x0))
-		cfg_expect_failure = true;
+		cfg_expect_norx = true;
+
+	/* Don't wait around for packets that we expect to fail to send */
+	if (cfg_expect_snd_failure && !cfg_num_secs)
+		cfg_num_secs = 3;
+
+	if (cfg_partial_udp_hdr && cfg_encap_proto)
+		error(1, 0,
+		      "ops: can't specify partial UDP hdr (-U) and encap (-e)");
+
+	if (cfg_use_vnet && cfg_encap_proto)
+		error(1, 0, "options: can't specify encap (-e) with vnet (-v)");
+	if (cfg_use_vnet && !cfg_tx_pf_packet)
+		error(1, 0, "options: vnet (-v) requires psock for tx (-p)");
+	if (cfg_vnet_use_gso && !cfg_use_vnet)
+		error(1, 0, "options: gso (-g) requires vnet (-v)");
+	if (cfg_vnet_use_csum_off && !cfg_use_vnet)
+		error(1, 0, "options: vnet csum (-c) requires vnet (-v)");
+	if (cfg_vnet_use_hdr_len_bad && !cfg_use_vnet)
+		error(1, 0, "options: bad vnet hdrlen (-L) requires vnet (-v)");
 }
 
 static void print_opts(void)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index 174b72a64a4c..5852cf815eeb 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -51,6 +51,9 @@ if [[ -z $(ip netns identify $$) ]]; then
 		echo "Skipping root flow dissector test, bpftool not found" >&2
 	fi
 
+	orig_flow_dissect_sysctl=$(