From patchwork Wed Jun 16 20:34:46 2021
X-Patchwork-Submitter: Tanner Love
X-Patchwork-Id: 461864
From: Tanner Love
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov,
 Jakub Kicinski, "Michael S . Tsirkin", Jason Wang, Martin KaFai Lau,
 Tanner Love, Stanislav Fomichev
Subject: [PATCH net-next v7 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr
Date: Wed, 16 Jun 2021 16:34:46 -0400
Message-Id: <20210616203448.995314-2-tannerlove.kernel@gmail.com>
In-Reply-To: <20210616203448.995314-1-tannerlove.kernel@gmail.com>
References: <20210616203448.995314-1-tannerlove.kernel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Tanner Love

Amend the bpf flow dissector program type to be able to process
virtio-net headers. Do this to enable bpf flow dissector programs to
perform virtio-net header validation. The next patch in this series
will add a flow dissection hook in virtio_net_hdr_to_skb and make use
of this extended functionality. That commit message has more
background on the use case.

Add a pointer to struct virtio_net_hdr as a new member in struct
bpf_flow_keys. When the machine is big endian and the vnet hdr is
little endian, the kernel converts the vnet hdr endianness before
passing the vnet hdr pointer to the bpf program; otherwise, the kernel
just passes along the pointer to the unaltered vnet header to the bpf
program. This handles the case of a v1+ header on a big endian machine.
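The endianness handling described above can be illustrated with a small
user-space sketch. It is not the kernel code: the struct and function
names (vnet_hdr_wire, vnet_hdr_host, le16_decode, vnet_hdr_le_to_host)
are hypothetical, the field layout is assumed to mirror struct
virtio_net_hdr, and it decodes byte by byte instead of calling the
kernel's __virtio16_to_cpu(), so it gives the same result on hosts of
either byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for a little-endian (virtio v1+) vnet header.
 * 16-bit fields are kept as raw bytes so nothing depends on the host. */
struct vnet_hdr_wire {
	uint8_t flags;
	uint8_t gso_type;
	uint8_t hdr_len[2];
	uint8_t gso_size[2];
	uint8_t csum_start[2];
	uint8_t csum_offset[2];
};

/* Hypothetical host-order copy, analogous to the local header copy
 * that the patch fills in on big endian machines. */
struct vnet_hdr_host {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Decode one 16-bit little-endian value into host order. */
static uint16_t le16_decode(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Single-byte fields are copied as-is; 16-bit fields are converted. */
static void vnet_hdr_le_to_host(const struct vnet_hdr_wire *in,
				struct vnet_hdr_host *out)
{
	out->flags = in->flags;
	out->gso_type = in->gso_type;
	out->hdr_len = le16_decode(in->hdr_len);
	out->gso_size = le16_decode(in->gso_size);
	out->csum_start = le16_decode(in->csum_start);
	out->csum_offset = le16_decode(in->csum_offset);
}
```

The patch itself only makes a converted copy when the machine is big
endian and the header is little endian; in every other case the
original pointer is passed through untouched, which the sketch does not
model.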
Changes
v7:
  - Remove vhdr_is_little_endian, instead copy vhdr fields only in case
    where machine is big and vhdr is little endian
  - Rename check_flow_keys_access bpf_flow_dissector_is_valid_access
  - Move bpf_flow_dissector_is_valid_access() under CONFIG_NET
  - Make bpf_flow_dissector_is_valid_access() populate info.btf and
    info.reg_type in addition to info.btf_id
v6:
  - Move bpf_flow_dissector_btf_ids, check_flow_keys_access() to filter.c
  - Verify (off % size == 0) in check_flow_keys_access()
  - Check bpf_flow_dissector_btf_ids[0] is nonzero in
    check_flow_keys_access()
v5:
  - Use PTR_TO_BTF_ID_OR_NULL instead of defining new
    PTR_TO_VNET_HDR_OR_NULL
  - Make check_flow_keys_access() disallow writes to keys->vhdr
  - Make check_flow_keys_access() check loading keys->vhdr is in
    sizeof(__u64)
  - Use BPF_REG_AX instead of BPF_REG_TMP as scratch reg
  - Describe parameter vhdr_is_little_endian in __skb_flow_dissect
    documentation
v4:
  - Add virtio_net_hdr pointer to struct bpf_flow_keys
  - Add vhdr_is_little_endian to struct bpf_flow_keys
v2:
  - Describe parameter vhdr in __skb_flow_dissect documentation

Signed-off-by: Tanner Love
Reviewed-by: Willem de Bruijn
Reviewed-by: Petar Penkov
Reviewed-by: Stanislav Fomichev
Reported-by: kernel test robot
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 drivers/net/bonding/bond_main.c |  2 +-
 include/linux/bpf.h             |  8 +++++
 include/linux/skbuff.h          | 35 +++++++++++++++++-----
 include/uapi/linux/bpf.h        |  1 +
 kernel/bpf/verifier.c           | 33 ++++++++++----------
 net/bpf/test_run.c              |  2 +-
 net/core/filter.c               | 53 +++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c       | 36 +++++++++++++++++++---
 tools/include/uapi/linux/bpf.h  |  1 +
 9 files changed, 143 insertions(+), 28 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index eb79a9f05914..36993636d56d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3554,7 +3554,7 @@ static bool bond_flow_dissect(struct bonding
*bond, struct sk_buff *skb, case BOND_XMIT_POLICY_ENCAP34: memset(fk, 0, sizeof(*fk)); return __skb_flow_dissect(NULL, skb, &flow_keys_bonding, - fk, NULL, 0, 0, 0, 0); + fk, NULL, 0, 0, 0, 0, NULL, false); default: break; } diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9dc44ba97584..e6980da0b469 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1998,6 +1998,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, struct bpf_insn *insn_buf, struct bpf_prog *prog, u32 *target_size); +int bpf_flow_keys_is_valid_access(int off, int size, enum bpf_access_type t, + struct bpf_insn_access_aux *info); #else static inline bool bpf_sock_common_is_valid_access(int off, int size, enum bpf_access_type type, @@ -2019,6 +2021,12 @@ static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, { return 0; } +static inline int bpf_flow_keys_is_valid_access(int off, int size, + enum bpf_access_type t, + struct bpf_insn_access_aux *info) +{ + return -EINVAL; +} #endif #ifdef CONFIG_INET diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b2db9cd9a73f..4e390cd8f72a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1314,21 +1314,27 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector, unsigned int key_count); struct bpf_flow_dissector; +struct virtio_net_hdr; bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, - __be16 proto, int nhoff, int hlen, unsigned int flags); + __be16 proto, int nhoff, int hlen, unsigned int flags, + const struct virtio_net_hdr *vhdr, + bool vhdr_is_little_endian); bool __skb_flow_dissect(const struct net *net, const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, const void *data, - __be16 proto, int nhoff, int hlen, unsigned int flags); + __be16 proto, int nhoff, int hlen, unsigned int flags, + const struct virtio_net_hdr *vhdr, + bool vhdr_is_little_endian); static inline bool skb_flow_dissect(const 
struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, unsigned int flags) { return __skb_flow_dissect(NULL, skb, flow_dissector, - target_container, NULL, 0, 0, 0, flags); + target_container, NULL, 0, 0, 0, flags, NULL, + false); } static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb, @@ -1337,7 +1343,22 @@ static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb, { memset(flow, 0, sizeof(*flow)); return __skb_flow_dissect(NULL, skb, &flow_keys_dissector, - flow, NULL, 0, 0, 0, flags); + flow, NULL, 0, 0, 0, flags, NULL, false); +} + +static inline bool +__skb_flow_dissect_flow_keys_basic(const struct net *net, + const struct sk_buff *skb, + struct flow_keys_basic *flow, + const void *data, __be16 proto, + int nhoff, int hlen, unsigned int flags, + const struct virtio_net_hdr *vhdr, + bool vhdr_is_little_endian) +{ + memset(flow, 0, sizeof(*flow)); + return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow, + data, proto, nhoff, hlen, flags, vhdr, + vhdr_is_little_endian); } static inline bool @@ -1347,9 +1368,9 @@ skb_flow_dissect_flow_keys_basic(const struct net *net, const void *data, __be16 proto, int nhoff, int hlen, unsigned int flags) { - memset(flow, 0, sizeof(*flow)); - return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow, - data, proto, nhoff, hlen, flags); + return __skb_flow_dissect_flow_keys_basic(net, skb, flow, data, proto, + nhoff, hlen, flags, NULL, + false); } void skb_flow_dissect_meta(const struct sk_buff *skb, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 418b9b813d65..0524dec15c6d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6017,6 +6017,7 @@ struct bpf_flow_keys { }; __u32 flags; __be32 flow_label; + __bpf_md_ptr(const struct virtio_net_hdr *, vhdr); }; struct bpf_func_info { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 331b170d9fcc..d4876d5e8959 100644 --- a/kernel/bpf/verifier.c 
+++ b/kernel/bpf/verifier.c @@ -3372,18 +3372,6 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, return -EACCES; } -static int check_flow_keys_access(struct bpf_verifier_env *env, int off, - int size) -{ - if (size < 0 || off < 0 || - (u64)off + size > sizeof(struct bpf_flow_keys)) { - verbose(env, "invalid access to flow keys off=%d size=%d\n", - off, size); - return -EACCES; - } - return 0; -} - static int check_sock_access(struct bpf_verifier_env *env, int insn_idx, u32 regno, int off, int size, enum bpf_access_type t) @@ -4210,6 +4198,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn if (!err && t == BPF_READ && value_regno >= 0) mark_reg_unknown(env, regs, value_regno); } else if (reg->type == PTR_TO_FLOW_KEYS) { + struct bpf_insn_access_aux info = {}; + if (t == BPF_WRITE && value_regno >= 0 && is_pointer_value(env, value_regno)) { verbose(env, "R%d leaks addr into flow keys\n", @@ -4217,9 +4207,22 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn return -EACCES; } - err = check_flow_keys_access(env, off, size); - if (!err && t == BPF_READ && value_regno >= 0) - mark_reg_unknown(env, regs, value_regno); + err = bpf_flow_keys_is_valid_access(off, size, t, &info); + if (err) { + verbose(env, + "invalid access to flow keys off=%d size=%d\n", + off, size); + } else if (t == BPF_READ && value_regno >= 0) { + if (info.reg_type == PTR_TO_BTF_ID_OR_NULL) { + mark_reg_known_zero(env, regs, value_regno); + regs[value_regno].type = info.reg_type; + regs[value_regno].btf = info.btf; + regs[value_regno].btf_id = info.btf_id; + regs[value_regno].id = ++env->id_gen; + } else { + mark_reg_unknown(env, regs, value_regno); + } + } } else if (type_is_sk_pointer(reg->type)) { if (t == BPF_WRITE) { verbose(env, "R%d cannot write into %s\n", diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index aa47af349ba8..a11c5ce99ccb 100644 --- a/net/bpf/test_run.c +++ 
b/net/bpf/test_run.c @@ -797,7 +797,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, bpf_test_timer_enter(&t); do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, - size, flags); + size, flags, NULL, false); } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); bpf_test_timer_leave(&t); diff --git a/net/core/filter.c b/net/core/filter.c index 239de1306de9..c3964ef8f387 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7880,6 +7880,33 @@ static bool sock_filter_is_valid_access(int off, int size, prog->expected_attach_type); } +BTF_ID_LIST_SINGLE(bpf_flow_dissector_btf_ids, struct, virtio_net_hdr); + +int bpf_flow_keys_is_valid_access(int off, int size, enum bpf_access_type t, + struct bpf_insn_access_aux *info) +{ + if (off < 0 || + (u64)off + size > offsetofend(struct bpf_flow_keys, vhdr)) + return -EACCES; + + switch (off) { + case bpf_ctx_range_ptr(struct bpf_flow_keys, vhdr): + if (t == BPF_WRITE || off % size != 0 || size != sizeof(__u64)) + return -EACCES; + + if (!bpf_flow_dissector_btf_ids[0]) + return -EINVAL; + + info->btf = bpf_get_btf_vmlinux(); + info->reg_type = PTR_TO_BTF_ID_OR_NULL; + info->btf_id = bpf_flow_dissector_btf_ids[0]; + + break; + } + + return 0; +} + static int bpf_noop_prologue(struct bpf_insn *insn_buf, bool direct_write, const struct bpf_prog *prog) { @@ -8358,6 +8385,8 @@ static bool flow_dissector_is_valid_access(int off, int size, return false; info->reg_type = PTR_TO_FLOW_KEYS; return true; + case bpf_ctx_range(struct __sk_buff, len): + return size == size_default; default: return false; } @@ -8390,6 +8419,30 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type, si->dst_reg, si->src_reg, offsetof(struct bpf_flow_dissector, flow_keys)); break; + + case offsetof(struct __sk_buff, len): + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, skb), + si->dst_reg, si->src_reg, + offsetof(struct bpf_flow_dissector, skb)); + *insn++ = BPF_JMP_IMM(BPF_JNE, 
si->dst_reg, 0, 4); + /* bpf_flow_dissector->skb == NULL */ + /* dst_reg = bpf_flow_dissector->data_end */ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, data_end), + si->dst_reg, si->src_reg, + offsetof(struct bpf_flow_dissector, data_end)); + /* AX = bpf_flow_dissector->data */ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, data), + BPF_REG_AX, si->src_reg, + offsetof(struct bpf_flow_dissector, data)); + /* dst_reg -= bpf_flow_dissector->data */ + *insn++ = BPF_ALU64_REG(BPF_SUB, si->dst_reg, BPF_REG_AX); + *insn++ = BPF_JMP_A(1); + /* bpf_flow_dissector->skb != NULL */ + /* bpf_flow_dissector->skb->len */ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len), + si->dst_reg, si->dst_reg, + offsetof(struct sk_buff, len)); + break; } return insn - insn_buf; diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 2aadbfc5193b..609e24ba98ea 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -28,6 +28,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #include @@ -864,16 +865,38 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys, } bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, - __be16 proto, int nhoff, int hlen, unsigned int flags) + __be16 proto, int nhoff, int hlen, unsigned int flags, + const struct virtio_net_hdr *vhdr, + bool vhdr_is_little_endian) { struct bpf_flow_keys *flow_keys = ctx->flow_keys; u32 result; +/* vnet hdr is either machine endian (virtio spec < v1) or le (>= v1) */ +#if defined(__BIG_ENDIAN_BITFIELD) + struct virtio_net_hdr vnet_hdr_local; + + if (vhdr && vhdr_is_little_endian) { + vnet_hdr_local.flags = vhdr->flags; + vnet_hdr_local.gso_type = vhdr->gso_type; + vnet_hdr_local.hdr_len = __virtio16_to_cpu(false, + vhdr->hdr_len); + vnet_hdr_local.gso_size = __virtio16_to_cpu(false, + vhdr->gso_size); + vnet_hdr_local.csum_start = __virtio16_to_cpu(false, + 
vhdr->csum_start); + vnet_hdr_local.csum_offset = __virtio16_to_cpu(false, + vhdr->csum_offset); + vhdr = &vnet_hdr_local; + } +#endif + /* Pass parameters to the BPF program */ memset(flow_keys, 0, sizeof(*flow_keys)); flow_keys->n_proto = proto; flow_keys->nhoff = nhoff; flow_keys->thoff = flow_keys->nhoff; + flow_keys->vhdr = vhdr; BUILD_BUG_ON((int)BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG != (int)FLOW_DISSECTOR_F_PARSE_1ST_FRAG); @@ -904,6 +927,8 @@ bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, * @hlen: packet header length, if @data is NULL use skb_headlen(skb) * @flags: flags that control the dissection process, e.g. * FLOW_DISSECTOR_F_STOP_AT_ENCAP. + * @vhdr: virtio_net_header to include in kernel context for BPF flow dissector + * @vhdr_is_little_endian: whether virtio_net_hdr fields are little endian * * The function will try to retrieve individual keys into target specified * by flow_dissector from either the skbuff or a raw buffer specified by the @@ -915,7 +940,9 @@ bool __skb_flow_dissect(const struct net *net, const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, const void *data, - __be16 proto, int nhoff, int hlen, unsigned int flags) + __be16 proto, int nhoff, int hlen, unsigned int flags, + const struct virtio_net_hdr *vhdr, + bool vhdr_is_little_endian) { struct flow_dissector_key_control *key_control; struct flow_dissector_key_basic *key_basic; @@ -1012,7 +1039,8 @@ bool __skb_flow_dissect(const struct net *net, prog = READ_ONCE(run_array->items[0].prog); ret = bpf_flow_dissect(prog, &ctx, n_proto, nhoff, - hlen, flags); + hlen, flags, vhdr, + vhdr_is_little_endian); __skb_flow_bpf_to_target(&flow_keys, flow_dissector, target_container); rcu_read_unlock(); @@ -1610,7 +1638,7 @@ u32 __skb_get_hash_symmetric(const struct sk_buff *skb) memset(&keys, 0, sizeof(keys)); __skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric, &keys, NULL, 0, 0, 0, - 
FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL); + FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, NULL, false); return __flow_hash_from_keys(&keys, &hashrnd); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 418b9b813d65..0524dec15c6d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -6017,6 +6017,7 @@ struct bpf_flow_keys { }; __u32 flags; __be32 flow_label; + __bpf_md_ptr(const struct virtio_net_hdr *, vhdr); }; struct bpf_func_info { From patchwork Wed Jun 16 20:34:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tanner Love X-Patchwork-Id: 461863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AF53C49361 for ; Wed, 16 Jun 2021 20:35:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 16392601FE for ; Wed, 16 Jun 2021 20:35:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233570AbhFPUhI (ORCPT ); Wed, 16 Jun 2021 16:37:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233554AbhFPUhF (ORCPT ); Wed, 16 Jun 2021 16:37:05 -0400 Received: from mail-qk1-x734.google.com (mail-qk1-x734.google.com [IPv6:2607:f8b0:4864:20::734]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 885ABC061574 for ; Wed, 16 Jun 
2021 13:34:57 -0700 (PDT)
From: Tanner Love
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov,
 Jakub Kicinski, "Michael S . Tsirkin", Jason Wang, Martin KaFai Lau,
 Tanner Love
Subject: [PATCH net-next v7 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation
Date: Wed, 16 Jun 2021 16:34:48 -0400
Message-Id: <20210616203448.995314-4-tannerlove.kernel@gmail.com>
In-Reply-To: <20210616203448.995314-1-tannerlove.kernel@gmail.com>
References: <20210616203448.995314-1-tannerlove.kernel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Tanner Love

Change the BPF flow dissector program to perform various checks on the
virtio_net_hdr fields after doing flow dissection.

Amend test_flow_dissector.(c|sh) to add test cases that inject packets
with reasonable or unreasonable virtio-net headers and assert that bad
packets are dropped and good packets are not. Do this via packet
socket; the kernel executes tpacket_snd, which enters
virtio_net_hdr_to_skb, where flow dissection / vnet header validation
occurs.
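As a rough illustration of the kind of check described above, the
following user-space sketch covers only the checksum fields. It is a
simplification under stated assumptions, not the selftest code: the
types and names (ex_vnet_hdr, ex_flow, ex_csum_fields_ok, the EX_*
constants) are hypothetical, the checksum offsets 16 and 6 are the
positions of the checksum field in the TCP and UDP headers, and the
address-family check done by the BPF program is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local constants so the sketch needs no kernel headers. */
enum { EX_IPPROTO_TCP = 6, EX_IPPROTO_UDP = 17 };
#define EX_F_NEEDS_CSUM	1	/* same value as VIRTIO_NET_HDR_F_NEEDS_CSUM */
#define EX_TCP_CSUM_OFF	16	/* offset of the checksum in a TCP header */
#define EX_UDP_CSUM_OFF	6	/* offset of the checksum in a UDP header */

/* Hypothetical subset of the vnet header, already in host byte order. */
struct ex_vnet_hdr {
	uint8_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical subset of the dissected flow keys. */
struct ex_flow {
	uint8_t ip_proto;	/* transport protocol found by dissection */
	uint16_t thoff;		/* offset of the transport header */
};

/* Return true when the checksum fields agree with the dissected flow. */
static bool ex_csum_fields_ok(const struct ex_vnet_hdr *vh,
			      const struct ex_flow *fl)
{
	/* Nothing to validate when no checksum offload is requested. */
	if (!(vh->flags & EX_F_NEEDS_CSUM))
		return true;

	/* Checksumming must start at the transport header. */
	if (vh->csum_start != fl->thoff)
		return false;

	switch (fl->ip_proto) {
	case EX_IPPROTO_TCP:
		return vh->csum_offset == EX_TCP_CSUM_OFF;
	case EX_IPPROTO_UDP:
		return vh->csum_offset == EX_UDP_CSUM_OFF;
	default:
		return false;
	}
}
```

For a UDP datagram over IPv4 without IP options behind an Ethernet
header, thoff is 14 + 20 = 34, so csum_start 34 with csum_offset 6
passes, while csum_offset 16 is rejected.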
Changes
v4:
  - Read virtio_net_hdr pointer from struct bpf_flow_keys
  - Add vnet header endianness logic to BPF program

Signed-off-by: Tanner Love
Reviewed-by: Willem de Bruijn
---
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 188 +++++++++++++-----
 .../selftests/bpf/test_flow_dissector.c       | 181 +++++++++++++++--
 .../selftests/bpf/test_flow_dissector.sh      |  19 ++
 3 files changed, 321 insertions(+), 67 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 95a5a0778ed7..451f38580bca 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -71,15 +72,119 @@ struct {
 	__type(value, struct bpf_flow_keys);
 } last_dissection SEC(".maps");

-static __always_inline int export_flow_keys(struct bpf_flow_keys *keys,
-					    int ret)
+/* Drops invalid virtio-net headers */
+static __always_inline int validate_vnet_hdr(const struct bpf_flow_keys *keys,
+					     __u32 skb_len)
 {
+	const struct virtio_net_hdr *vhdr = keys->vhdr;
+
+	if (!vhdr)
+		return BPF_OK;
+
+	/* Check gso */
+	if (vhdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		if (!(vhdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+			return BPF_DROP;
+
+		if (keys->is_encap)
+			return BPF_DROP;
+
+		switch (vhdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+			if (keys->addr_proto != ETH_P_IP ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (vhdr->gso_size >= skb_len - keys->thoff -
+					      sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if (keys->addr_proto != ETH_P_IPV6 ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (vhdr->gso_size >= skb_len - keys->thoff -
+					      sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_UDP:
+			if (keys->ip_proto != IPPROTO_UDP)
+				return BPF_DROP;
+
+			if (vhdr->gso_size >= skb_len - keys->thoff -
+					      sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	/* Check hdr_len */
+	if (vhdr->hdr_len) {
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (vhdr->hdr_len != keys->thoff + sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (vhdr->hdr_len != keys->thoff + sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		}
+	}
+
+	/* Check csum */
+	if (vhdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		if (keys->addr_proto != ETH_P_IP &&
+		    keys->addr_proto != ETH_P_IPV6)
+			return BPF_DROP;
+
+		if (vhdr->csum_start != keys->thoff)
+			return BPF_DROP;
+
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (vhdr->csum_offset != offsetof(struct tcphdr, check))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (vhdr->csum_offset != offsetof(struct udphdr, check))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	return BPF_OK;
+}
+
+/* Common steps to perform regardless of where protocol parsing finishes:
+ * 1. store flow keys in map
+ * 2. if parse result is BPF_OK, parse the vnet hdr if present
+ * 3.
return the parse result + */ +static __always_inline int parse_epilogue(struct __sk_buff *skb, int ret) +{ + const struct bpf_flow_keys *keys = skb->flow_keys; __u32 key = (__u32)(keys->sport) << 16 | keys->dport; struct bpf_flow_keys val; memcpy(&val, keys, sizeof(val)); bpf_map_update_elem(&last_dissection, &key, &val, BPF_ANY); - return ret; + + if (ret != BPF_OK) + return ret; + return validate_vnet_hdr(keys, skb->len); } #define IPV6_FLOWLABEL_MASK __bpf_constant_htonl(0x000FFFFF) @@ -114,8 +219,6 @@ static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb, /* Dispatches on ETHERTYPE */ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto) { - struct bpf_flow_keys *keys = skb->flow_keys; - switch (proto) { case bpf_htons(ETH_P_IP): bpf_tail_call_static(skb, &jmp_table, IP); @@ -131,12 +234,10 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto) case bpf_htons(ETH_P_8021AD): bpf_tail_call_static(skb, &jmp_table, VLAN); break; - default: - /* Protocol not supported */ - return export_flow_keys(keys, BPF_DROP); } - return export_flow_keys(keys, BPF_DROP); + /* Protocol not supported */ + return parse_epilogue(skb, BPF_DROP); } SEC("flow_dissector") @@ -162,28 +263,28 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto) case IPPROTO_ICMP: icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp); if (!icmp) - return export_flow_keys(keys, BPF_DROP); - return export_flow_keys(keys, BPF_OK); + return parse_epilogue(skb, BPF_DROP); + return parse_epilogue(skb, BPF_OK); case IPPROTO_IPIP: keys->is_encap = true; if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP) - return export_flow_keys(keys, BPF_OK); + return parse_epilogue(skb, BPF_OK); return parse_eth_proto(skb, bpf_htons(ETH_P_IP)); case IPPROTO_IPV6: keys->is_encap = true; if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP) - return export_flow_keys(keys, BPF_OK); + return parse_epilogue(skb, 
BPF_OK);
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
 	case IPPROTO_GRE:
 		gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
 		if (!gre)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		if (bpf_htons(gre->flags & GRE_VERSION))
 			/* Only inspect standard GRE packets with version 0 */
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 		keys->thoff += sizeof(*gre); /* Step over GRE Flags and Proto */
 		if (GRE_IS_CSUM(gre->flags))
@@ -195,13 +296,13 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 		if (gre->proto == bpf_htons(ETH_P_TEB)) {
 			eth = bpf_flow_dissect_get_header(skb, sizeof(*eth), &_eth);
 			if (!eth)
-				return export_flow_keys(keys, BPF_DROP);
+				return parse_epilogue(skb, BPF_DROP);
 			keys->thoff += sizeof(*eth);
@@ -212,37 +313,35 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	case IPPROTO_TCP:
 		tcp = bpf_flow_dissect_get_header(skb, sizeof(*tcp), &_tcp);
 		if (!tcp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		if (tcp->doff < 5)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		keys->sport = tcp->source;
 		keys->dport = tcp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	case IPPROTO_UDP:
 	case IPPROTO_UDPLITE:
 		udp = bpf_flow_dissect_get_header(skb, sizeof(*udp), &_udp);
 		if (!udp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		keys->sport = udp->source;
 		keys->dport = udp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	default:
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	}
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
-
 	switch (nexthdr) {
 	case IPPROTO_HOPOPTS:
 	case IPPROTO_DSTOPTS:
@@ -255,7 +354,7 @@ static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 		return parse_ip_proto(skb, nexthdr);
 	}
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 PROG(IP)(struct __sk_buff *skb)
@@ -268,11 +367,11 @@ PROG(IP)(struct __sk_buff *skb)
 	iph = bpf_flow_dissect_get_header(skb, sizeof(*iph), &_iph);
 	if (!iph)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	/* IP header cannot be smaller than 20 bytes */
 	if (iph->ihl < 5)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	keys->addr_proto = ETH_P_IP;
 	keys->ipv4_src = iph->saddr;
@@ -281,7 +380,7 @@ PROG(IP)(struct __sk_buff *skb)
 	keys->thoff += iph->ihl << 2;
 	if (data + keys->thoff > data_end)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
 		keys->is_frag = true;
@@ -302,7 +401,7 @@ PROG(IP)(struct __sk_buff *skb)
 	}
 	if (done)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	return parse_ip_proto(skb, iph->protocol);
 }
@@ -314,7 +413,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	keys->addr_proto = ETH_P_IPV6;
 	memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
@@ -324,7 +423,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 	keys->flow_label = ip6_flowlabel(ip6h);
 	if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -336,7 +435,7 @@ PROG(IPV6OP)(struct __sk_buff *skb)
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	/* hlen is in 8-octets and does not include the first 8 bytes
 	 * of the header
@@ -354,7 +453,7 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 	fragh = bpf_flow_dissect_get_header(skb, sizeof(*fragh), &_fragh);
 	if (!fragh)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	keys->thoff += sizeof(*fragh);
 	keys->is_frag = true;
@@ -367,9 +466,9 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 		 * explicitly asked for.
 		 */
 		if (!(keys->flags & BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 	} else {
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	}
 	return parse_ipv6_proto(skb, fragh->nexthdr);
@@ -377,14 +476,13 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 PROG(MPLS)(struct __sk_buff *skb)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
 	struct mpls_label *mpls, _mpls;
 	mpls = bpf_flow_dissect_get_header(skb, sizeof(*mpls), &_mpls);
 	if (!mpls)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
-	return export_flow_keys(keys, BPF_OK);
+	return parse_epilogue(skb, BPF_OK);
 }
 PROG(VLAN)(struct __sk_buff *skb)
@@ -396,10 +494,10 @@ PROG(VLAN)(struct __sk_buff *skb)
 	if (keys->n_proto == bpf_htons(ETH_P_8021AD)) {
 		vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 		if (!vlan)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 		keys->nhoff += sizeof(*vlan);
 		keys->thoff += sizeof(*vlan);
@@ -407,14 +505,14 @@ PROG(VLAN)(struct __sk_buff *skb)
 	vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 	if (!vlan)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	keys->nhoff += sizeof(*vlan);
 	keys->thoff += sizeof(*vlan);
 	/* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
 	if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
 	    vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	keys->n_proto = vlan->h_vlan_encapsulated_proto;
 	return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.c b/tools/testing/selftests/bpf/test_flow_dissector.c
index 571cc076dd7d..aa80055a5518 100644
--- a/tools/testing/selftests/bpf/test_flow_dissector.c
+++ b/tools/testing/selftests/bpf/test_flow_dissector.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -65,7 +67,8 @@ struct guehdr {
 static uint8_t cfg_dsfield_inner;
 static uint8_t cfg_dsfield_outer;
 static uint8_t cfg_encap_proto;
-static bool cfg_expect_failure = false;
+static bool cfg_expect_norx;
+static bool cfg_expect_snd_failure;
 static int cfg_l3_extra = AF_UNSPEC;	/* optional SIT prefix */
 static int cfg_l3_inner = AF_UNSPEC;
 static int cfg_l3_outer = AF_UNSPEC;
@@ -77,8 +80,14 @@ static int cfg_port_gue = 6080;
 static bool cfg_only_rx;
 static bool cfg_only_tx;
 static int cfg_src_port = 9;
+static bool cfg_tx_pf_packet;
+static bool cfg_use_vnet;
+static bool cfg_vnet_use_hdr_len_bad;
+static bool cfg_vnet_use_gso;
+static bool cfg_vnet_use_csum_off;
+static bool cfg_partial_udp_hdr;
-static char buf[ETH_DATA_LEN];
+static char buf[ETH_MAX_MTU];
 #define INIT_ADDR4(name, addr4, port) \
 	static struct sockaddr_in name = { \
@@ -273,8 +282,48 @@ static int l3_length(int family)
 		return sizeof(struct ipv6hdr);
 }
+static int build_vnet_header(void *header, int il3_len)
+{
+	struct virtio_net_hdr *vh = header;
+
+	vh->hdr_len = ETH_HLEN + il3_len + sizeof(struct udphdr);
+
+	if (cfg_partial_udp_hdr) {
+		vh->hdr_len -= (sizeof(struct udphdr) >> 1);
+		return sizeof(*vh);
+	}
+
+	/* Alteration must increase hdr_len; if not, kernel overwrites it */
+	if (cfg_vnet_use_hdr_len_bad)
+		vh->hdr_len++;
+
+	if (cfg_vnet_use_csum_off) {
+		vh->flags |= VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		vh->csum_start = ETH_HLEN + il3_len;
+		vh->csum_offset = __builtin_offsetof(struct udphdr, check);
+	}
+
+	if (cfg_vnet_use_gso) {
+		vh->gso_type = VIRTIO_NET_HDR_GSO_UDP;
+		vh->gso_size = ETH_DATA_LEN - il3_len;
+	}
+
+	return sizeof(*vh);
+}
+
+static int build_eth_header(void *header)
+{
+	struct ethhdr *eth = header;
+	uint16_t proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+
+	eth->h_proto = htons(proto);
+
+	return ETH_HLEN;
+}
+
 static int build_packet(void)
 {
+	int l2_len = 0;
 	int ol3_len = 0, ol4_len = 0, il3_len = 0, il4_len = 0;
 	int el3_len = 0;
@@ -294,23 +343,29 @@ static int build_packet(void)
 	il3_len = l3_length(cfg_l3_inner);
 	il4_len = sizeof(struct udphdr);
-	if (el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len >=
-	    sizeof(buf))
+	if (cfg_use_vnet)
+		l2_len += build_vnet_header(buf, il3_len);
+	if (cfg_tx_pf_packet)
+		l2_len += build_eth_header(buf + l2_len);
+
+	if (l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len >= sizeof(buf))
 		error(1, 0, "packet too large\n");
 	/*
 	 * Fill packet from inside out, to calculate correct checksums.
 	 * But create ip before udp headers, as udp uses ip for pseudo-sum.
 	 */
-	memset(buf + el3_len + ol3_len + ol4_len + il3_len + il4_len,
+	memset(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len,
 	       cfg_payload_char, cfg_payload_len);
 	/* add zero byte for udp csum padding */
-	buf[el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len] = 0;
+	buf[l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len] = 0;
 	switch (cfg_l3_inner) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv4_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  in_saddr4.sin_addr.s_addr,
 				  in_daddr4.sin_addr.s_addr,
@@ -318,7 +373,7 @@ static int build_packet(void)
 				  cfg_dsfield_inner);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv6_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  &in_saddr6, &in_daddr6,
 				  il4_len + cfg_payload_len,
@@ -326,22 +381,25 @@ static int build_packet(void)
 		break;
 	}
-	build_udp_header(buf + el3_len + ol3_len + ol4_len + il3_len,
+	build_udp_header(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len,
 			 cfg_payload_len, CFG_PORT_INNER, cfg_l3_inner);
+	if (cfg_partial_udp_hdr)
+		return l2_len + il3_len + (il4_len >> 1);
+
 	if (!cfg_encap_proto)
-		return il3_len + il4_len + cfg_payload_len;
+		return l2_len + il3_len + il4_len + cfg_payload_len;
 	switch (cfg_l3_outer) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len, cfg_encap_proto,
+		build_ipv4_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  out_saddr4.sin_addr.s_addr,
 				  out_daddr4.sin_addr.s_addr,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len, cfg_encap_proto,
+		build_ipv6_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  &out_saddr6, &out_daddr6,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
@@ -350,17 +408,17 @@ static int build_packet(void)
 	switch (cfg_encap_proto) {
 	case IPPROTO_UDP:
-		build_gue_header(buf + el3_len + ol3_len + ol4_len -
+		build_gue_header(buf + l2_len + el3_len + ol3_len + ol4_len -
 				 sizeof(struct guehdr),
 				 cfg_l3_inner == PF_INET ? IPPROTO_IPIP
 							 : IPPROTO_IPV6);
-		build_udp_header(buf + el3_len + ol3_len,
+		build_udp_header(buf + l2_len + el3_len + ol3_len,
 				 sizeof(struct guehdr) + il3_len + il4_len +
 				 cfg_payload_len,
 				 cfg_port_gue, cfg_l3_outer);
 		break;
 	case IPPROTO_GRE:
-		build_gre_header(buf + el3_len + ol3_len,
+		build_gre_header(buf + l2_len + el3_len + ol3_len,
 				 cfg_l3_inner == PF_INET ? ETH_P_IP
 							 : ETH_P_IPV6);
 		break;
@@ -368,7 +426,7 @@ static int build_packet(void)
 	switch (cfg_l3_extra) {
 	case PF_INET:
-		build_ipv4_header(buf,
+		build_ipv4_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  extra_saddr4.sin_addr.s_addr,
@@ -377,7 +435,7 @@ static int build_packet(void)
 				  cfg_payload_len, 0);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf,
+		build_ipv6_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  &extra_saddr6, &extra_daddr6,
@@ -386,15 +444,46 @@ static int build_packet(void)
 		break;
 	}
-	return el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	return l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
 	       cfg_payload_len;
 }
+static int setup_tx_pfpacket(void)
+{
+	struct sockaddr_ll laddr = {0};
+	const int one = 1;
+	uint16_t proto;
+	int fd;
+
+	fd = socket(PF_PACKET, SOCK_RAW, 0);
+	if (fd == -1)
+		error(1, errno, "socket tx");
+
+	if (cfg_use_vnet &&
+	    setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)))
+		error(1, errno, "setsockopt vnet");
+
+	proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+	laddr.sll_family = AF_PACKET;
+	laddr.sll_protocol = htons(proto);
+	laddr.sll_ifindex = if_nametoindex("lo");
+	if (!laddr.sll_ifindex)
+		error(1, errno, "if_nametoindex");
+
+	if (bind(fd, (void *)&laddr, sizeof(laddr)))
+		error(1, errno, "bind");
+
+	return fd;
+}
+
 /* sender transmits encapsulated over RAW or unencap'd over UDP */
 static int setup_tx(void)
 {
 	int family, fd, ret;
+	if (cfg_tx_pf_packet)
+		return setup_tx_pfpacket();
+
 	if (cfg_l3_extra)
 		family = cfg_l3_extra;
 	else if (cfg_l3_outer)
@@ -464,6 +553,13 @@ static int do_tx(int fd, const char *pkt, int len)
 	int ret;
 	ret = write(fd, pkt, len);
+
+	if (cfg_expect_snd_failure) {
+		if (ret == -1)
+			return 0;
+		error(1, 0, "expected tx to fail but it did not");
+	}
+
 	if (ret == -1)
 		error(1, errno, "send");
 	if (ret != len)
@@ -571,7 +667,7 @@ static int do_main(void)
 	 * success (== 0) only if received all packets
 	 * unless failure is expected, in which case none must arrive.
 	 */
-	if (cfg_expect_failure)
+	if (cfg_expect_norx || cfg_expect_snd_failure)
 		return rx != 0;
 	else
 		return rx != tx;
@@ -623,8 +719,12 @@ static void parse_opts(int argc, char **argv)
 {
 	int c;
-	while ((c = getopt(argc, argv, "d:D:e:f:Fhi:l:n:o:O:Rs:S:t:Tx:X:")) != -1) {
+	while ((c = getopt(argc, argv,
+			   "cd:D:e:Ef:FGghi:l:Ln:o:O:pRs:S:t:TUvx:X:")) != -1) {
 		switch (c) {
+		case 'c':
+			cfg_vnet_use_csum_off = true;
+			break;
 		case 'd':
 			if (cfg_l3_outer == AF_UNSPEC)
 				error(1, 0, "-d must be preceded by -o");
@@ -653,11 +753,17 @@ static void parse_opts(int argc, char **argv)
 			else
 				usage(argv[0]);
 			break;
+		case 'E':
+			cfg_expect_snd_failure = true;
+			break;
 		case 'f':
 			cfg_src_port = strtol(optarg, NULL, 0);
 			break;
 		case 'F':
-			cfg_expect_failure = true;
+			cfg_expect_norx = true;
+			break;
+		case 'g':
+			cfg_vnet_use_gso = true;
 			break;
 		case 'h':
 			usage(argv[0]);
@@ -673,6 +779,9 @@ static void parse_opts(int argc, char **argv)
 		case 'l':
 			cfg_payload_len = strtol(optarg, NULL, 0);
 			break;
+		case 'L':
+			cfg_vnet_use_hdr_len_bad = true;
+			break;
 		case 'n':
 			cfg_num_pkt = strtol(optarg, NULL, 0);
 			break;
@@ -682,6 +791,9 @@ static void parse_opts(int argc, char **argv)
 		case 'O':
 			cfg_l3_extra = parse_protocol_family(argv[0], optarg);
 			break;
+		case 'p':
+			cfg_tx_pf_packet = true;
+			break;
 		case 'R':
 			cfg_only_rx = true;
 			break;
@@ -703,6 +815,12 @@ static void parse_opts(int argc, char **argv)
 		case 'T':
 			cfg_only_tx = true;
 			break;
+		case 'U':
+			cfg_partial_udp_hdr = true;
+			break;
+		case 'v':
+			cfg_use_vnet = true;
+			break;
 		case 'x':
 			cfg_dsfield_outer = strtol(optarg, NULL, 0);
 			break;
@@ -733,7 +851,26 @@ static void parse_opts(int argc, char **argv)
 	 */
 	if (((cfg_dsfield_outer & 0x3) == 0x3) &&
 	    ((cfg_dsfield_inner & 0x3) == 0x0))
-		cfg_expect_failure = true;
+		cfg_expect_norx = true;
+
+	/* Don't wait around for packets that we expect to fail to send */
+	if (cfg_expect_snd_failure && !cfg_num_secs)
+		cfg_num_secs = 3;
+
+	if (cfg_partial_udp_hdr && cfg_encap_proto)
+		error(1, 0,
+		      "ops: can't specify partial UDP hdr (-U) and encap (-e)");
+
+	if (cfg_use_vnet && cfg_encap_proto)
+		error(1, 0, "options: cannot specify encap (-e) with vnet (-v)");
+	if (cfg_use_vnet && !cfg_tx_pf_packet)
+		error(1, 0, "options: vnet (-v) requires psock for tx (-p)");
+	if (cfg_vnet_use_gso && !cfg_use_vnet)
+		error(1, 0, "options: gso (-g) requires vnet (-v)");
+	if (cfg_vnet_use_csum_off && !cfg_use_vnet)
+		error(1, 0, "options: vnet csum (-c) requires vnet (-v)");
+	if (cfg_vnet_use_hdr_len_bad && !cfg_use_vnet)
+		error(1, 0, "options: bad vnet hdrlen (-L) requires vnet (-v)");
 }
 static void print_opts(void)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index 174b72a64a4c..5852cf815eeb 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -51,6 +51,9 @@ if [[ -z $(ip netns identify $$) ]]; then
 		echo "Skipping root flow dissector test, bpftool not found" >&2
 	fi
+	orig_flow_dissect_sysctl=$(