From patchwork Wed Oct 7 16:22:40 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 288880
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-10.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 793D4C41604
	for ; Wed, 7 Oct 2020 16:22:53 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 1E62E216C4
	for ; Wed, 7 Oct 2020 16:22:53 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZuJYIL2P"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728062AbgJGQWw (ORCPT ); Wed, 7 Oct 2020 12:22:52 -0400
Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:53123 "EHLO
	us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1726702AbgJGQWw (ORCPT );
	Wed, 7 Oct 2020 12:22:52 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1602087770;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	content-transfer-encoding:content-transfer-encoding:
	in-reply-to:in-reply-to:references:references;
	bh=znkGMQbCqPPIKuCPzeEbbRwLIioe5osJVYMyGwLpydA=;
	b=ZuJYIL2PXR9qCSBCdGDFKsSF4FIv/FxluuUY3vtLPhuf8qGlT9aM+v96vwcrjaQBAIaho6
	KQYivqZUjdyUi7DNmk+C1CCgRASMc8UGt+oYjSbt/8ct4QcTxWvT+YTG/0N/NlBUVFDnKA
	Yiv+IXzh8hSZ2gMQsOBK3C+n19zUe6w=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-503-VWOgUPX5MIWlhVjSu8FNXg-1; Wed, 07 Oct 2020 12:22:47 -0400
X-MC-Unique: VWOgUPX5MIWlhVjSu8FNXg-1
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com
	[10.5.11.16])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DC4A81018F8C;
	Wed, 7 Oct 2020 16:22:44 +0000 (UTC)
Received: from firesoul.localdomain (unknown [10.40.208.18])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 5D0555C1BD;
	Wed, 7 Oct 2020 16:22:41 +0000 (UTC)
Received: from [192.168.42.3] (localhost [IPv6:::1])
	by firesoul.localdomain (Postfix) with ESMTP id 62BB630736C8B;
	Wed, 7 Oct 2020 18:22:40 +0200 (CEST)
Subject: [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
	Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
	eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:22:40 +0200
Message-ID: <160208776033.798237.4028465222836713720.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Multiple BPF-helpers that can manipulate/increase the size of the SKB use
__bpf_skb_max_len() as the max-length. This function limits the size against
the current net_device MTU (skb->dev->mtu).

When a BPF-prog grows the packet size, it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size.
Furthermore, the current MTU check in __bpf_skb_max_len uses the MTU from
the ingress/current net_device, which in case of redirects is the wrong
net_device.

Keep a sanity max limit of IP_MAX_MTU which is 64KiB. In later patches we
will enforce the MTU limitation when transmitting packets.

Signed-off-by: Jesper Dangaard Brouer
(imported from commit 37f8552786cf46588af52b77829b730dd14524d3)
---
 net/core/filter.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 05df73780dd3..fed239e77bdc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3476,8 +3476,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 
 static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 {
-	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-			  SKB_MAX_ALLOC;
+	return IP_MAX_MTU;
 }
 
 BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,

From patchwork Wed Oct 7 16:22:45 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 268484
Subject: [PATCH bpf-next V2 2/6] bpf: bpf_fib_lookup return MTU value as
	output when looked up
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
	Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
	eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:22:45 +0200
Message-ID: <160208776541.798237.663413315328442772.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform an MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The
BPF-prog doesn't know the MTU value that caused this rejection.

If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191), it
needs to know this MTU value for the ICMP packet.

This patch changes the lookup and result struct bpf_fib_lookup to contain
this MTU value as output, via a union with 'tot_len', as this is the value
used for the MTU lookup.

Signed-off-by: Jesper Dangaard Brouer
---
 include/uapi/linux/bpf.h |   11 +++++++++--
 net/core/filter.c        |   17 ++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c446394135be..50ce65e37b16 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2216,6 +2216,9 @@ union bpf_attr {
  *		* > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
  *		  packet is not forwarded or needs assist from full stack
  *
+ *		If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
+ *		was exceeded and result params->mtu contains the MTU.
+ *
  * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
  *	Description
  *		Add an entry to, or update a sockhash *map* referencing sockets.
@@ -4844,9 +4847,13 @@ struct bpf_fib_lookup {
 	__be16	sport;
 	__be16	dport;
 
-	/* total length of packet from network header - used for MTU check */
-	__u16	tot_len;
+	union {	/* used for MTU check */
+		/* input to lookup */
+		__u16	tot_len; /* total length of packet from network hdr */
+		/* output: MTU value (if requested check_mtu) */
+		__u16	mtu;
+	};
 
 	/* input: L3 device index for lookup
 	 * output: device index from FIB lookup
 	 */
diff --git a/net/core/filter.c b/net/core/filter.c
index fed239e77bdc..d84723f347c0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5185,13 +5185,14 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
 static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
 				  const struct neighbour *neigh,
-				  const struct net_device *dev)
+				  const struct net_device *dev, u32 mtu)
 {
 	memcpy(params->dmac, neigh->ha, ETH_ALEN);
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 	params->h_vlan_TCI = 0;
 	params->h_vlan_proto = 0;
 	params->ifindex = dev->ifindex;
+	params->mtu = mtu;
 
 	return 0;
 }
@@ -5275,8 +5276,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	if (check_mtu) {
 		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
-		if (params->tot_len > mtu)
+		if (params->tot_len > mtu) {
+			params->mtu = mtu; /* union with tot_len */
 			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		}
 	}
 
 	nhc = res.nhc;
@@ -5309,7 +5312,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (!neigh)
 		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return bpf_fib_set_fwd_params(params, neigh, dev);
+	return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
 }
 #endif
 
@@ -5401,8 +5404,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	if (check_mtu) {
 		mtu = ipv6_stub->ip6_mtu_from_fib6(&res, dst, src);
-		if (params->tot_len > mtu)
+		if (params->tot_len > mtu) {
+			params->mtu = mtu; /* union with tot_len */
 			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		}
 	}
 
 	if (res.nh->fib_nh_lws)
@@ -5421,7 +5426,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (!neigh)
 		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return bpf_fib_set_fwd_params(params, neigh, dev);
+	return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
 }
 #endif
 
@@ -5490,6 +5495,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 		dev = dev_get_by_index_rcu(net, params->ifindex);
 		if (!is_skb_forwardable(dev, skb))
 			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
+
+		params->mtu = dev->mtu; /* union with tot_len */
 	}
 
 	return rc;

From patchwork Wed Oct 7 16:22:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 288879
Subject: [PATCH bpf-next V2 3/6] bpf: add BPF-helper for MTU checking
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
	Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
	eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:22:50 +0200
Message-ID: <160208777050.798237.15733498595654853619.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

This BPF-helper bpf_mtu_check() works for both XDP and TC-BPF programs.

The API is designed to help the BPF-programmer who wants to do packet
context size changes, which involve other helpers. These other helpers
usually do a delta size adjustment. This helper also supports a delta
size (len_diff), which allows the BPF-programmer to reuse arguments needed
by these other helpers, and perform the MTU check prior to doing any actual
size adjustment of the packet context.

Signed-off-by: Jesper Dangaard Brouer
---
 include/uapi/linux/bpf.h |   57 ++++++++++++++++++++++
 net/core/filter.c        |  117 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 174 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 50ce65e37b16..64cdad06135e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3718,6 +3718,50 @@ union bpf_attr {
  *		never return NULL.
  *	Return
  *		A pointer pointing to the kernel percpu variable on this cpu.
+ *
+ * int bpf_mtu_check(void *ctx, u32 ifindex, u32 *mtu_result, s32 len_diff, u64 flags)
+ *	Description
+ *		Check ctx packet size against MTU of net device (based on
+ *		*ifindex*).  This helper will likely be used in combination with
+ *		helpers that adjust/change the packet size.  The argument
+ *		*len_diff* can be used for querying with a planned size
+ *		change. This allows checking the MTU prior to changing packet ctx.
+ *
+ *		The Linux kernel route table can configure MTUs on a more
+ *		specific per route level, which is not provided by this helper.
+ *		For route level MTU checks use the **bpf_fib_lookup**\ ()
+ *		helper.
+ *
+ *		*ctx* is either **struct xdp_md** for XDP programs or
+ *		**struct sk_buff** for tc cls_act programs.
+ *
+ *		The *flags* argument can be a combination of one or more of the
+ *		following values:
+ *
+ *		**BPF_MTU_CHK_RELAX**
+ *			This flag relaxes or increases the MTU with room for one
+ *			VLAN header (4 bytes) and takes into account net device
+ *			hard_header_len.  This relaxation is also used by the
+ *			kernel's own forwarding MTU checks.
+ *
+ *		**BPF_MTU_CHK_GSO**
+ *			This flag only works for *ctx* **struct sk_buff**.
+ *			If packet context contains extra packet segment buffers
+ *			(often known as frags), then those are also checked
+ *			against the MTU size.
+ *
+ *	Return
+ *		* 0 on success, and populate MTU value in *mtu_result* pointer.
+ *
+ *		* < 0 if any input argument is invalid (*mtu_result* not updated)
+ *
+ *		MTU violations return positive values, but also populate MTU
+ *		value in *mtu_result* pointer, as this can be needed for
+ *		implementing PMTU handling:
+ *
+ *		* **BPF_MTU_CHK_RET_FRAG_NEEDED**
+ *		* **BPF_MTU_CHK_RET_GSO_TOOBIG**
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3875,6 +3919,7 @@ union bpf_attr {
 	FN(redirect_neigh),		\
 	FN(bpf_per_cpu_ptr),		\
 	FN(bpf_this_cpu_ptr),		\
+	FN(mtu_check),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4889,6 +4934,18 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+/* bpf_mtu_check flags */
+enum bpf_mtu_check_flags {
+	BPF_MTU_CHK_RELAX = (1U << 0),
+	BPF_MTU_CHK_GSO   = (1U << 1),
+};
+
+enum bpf_mtu_check_ret {
+	BPF_MTU_CHK_RET_SUCCESS,      /* check and lookup successful */
+	BPF_MTU_CHK_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+	BPF_MTU_CHK_RET_GSO_TOOBIG,   /* GSO re-segmentation needed to fwd */
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index d84723f347c0..54b779e34f83 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5512,6 +5512,119 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+static int bpf_mtu_lookup(struct net *netns, u32 ifindex, u64 flags)
+{
+	struct net_device *dev;
+	int mtu;
+
+	dev = dev_get_by_index_rcu(netns, ifindex);
+	if (!dev)
+		return -ENODEV;
+
+	mtu = dev->mtu;
+
+	/* Same relax as xdp_ok_fwd_dev() and is_skb_forwardable() */
+	if (flags & BPF_MTU_CHK_RELAX)
+		mtu += dev->hard_header_len + VLAN_HLEN;
+
+	return mtu;
+}
+
+static unsigned int __bpf_len_adjust_positive(unsigned int len, int len_diff)
+{
+	int len_new = len + len_diff; /* notice len_diff can be negative */
+
+	if (len_new > 0)
+		return len_new;
+
+	return 0;
+}
+
+BPF_CALL_5(bpf_skb_mtu_check, struct sk_buff *, skb,
+	   u32, ifindex, u32 *, mtu_result, s32, len_diff, u64, flags)
+{
+	struct net *netns = dev_net(skb->dev);
+	int ret = BPF_MTU_CHK_RET_SUCCESS;
+	unsigned int len = skb->len;
+	int mtu;
+
+	if (flags & ~(BPF_MTU_CHK_RELAX|BPF_MTU_CHK_GSO))
+		return -EINVAL;
+
+	mtu = bpf_mtu_lookup(netns, ifindex, flags);
+	if (unlikely(mtu < 0))
+		return mtu; /* errno */
+
+	len = __bpf_len_adjust_positive(len, len_diff);
+	if (len > mtu) {
+		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
+		goto out;
+	}
+
+	if (flags & BPF_MTU_CHK_GSO &&
+	    skb_is_gso(skb) &&
+	    !skb_gso_validate_network_len(skb, mtu)) {
+		ret = BPF_MTU_CHK_RET_GSO_TOOBIG;
+		goto out;
+	}
+
+out:
+	if (mtu_result)
+		*mtu_result = mtu;
+
+	return ret;
+}
+
+BPF_CALL_5(bpf_xdp_mtu_check, struct xdp_buff *, xdp,
+	   u32, ifindex, u32 *, mtu_result, s32, len_diff, u64, flags)
+{
+	unsigned int len = xdp->data_end - xdp->data;
+	struct net_device *dev = xdp->rxq->dev;
+	struct net *netns = dev_net(dev);
+	int ret = BPF_MTU_CHK_RET_SUCCESS;
+	int mtu;
+
+	/* XDP variant doesn't support multi-buffer segment check (yet) */
+	if (flags & ~BPF_MTU_CHK_RELAX)
+		return -EINVAL;
+
+	mtu = bpf_mtu_lookup(netns, ifindex, flags);
+	if (unlikely(mtu < 0))
+		return mtu; /* errno */
+
+	len = __bpf_len_adjust_positive(len, len_diff);
+	if (len > mtu) {
+		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
+		goto out;
+	}
+out:
+	if (mtu_result)
+		*mtu_result = mtu;
+
+	return ret;}
+
+static const struct bpf_func_proto bpf_skb_mtu_check_proto = {
+	.func		= bpf_skb_mtu_check,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto bpf_xdp_mtu_check_proto = {
+	.func		= bpf_xdp_mtu_check,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
 static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
 {
@@ -7075,6 +7188,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_skb_fib_lookup_proto;
+	case BPF_FUNC_mtu_check:
+		return &bpf_skb_mtu_check_proto;
 	case BPF_FUNC_sk_fullsock:
 		return &bpf_sk_fullsock_proto;
 	case BPF_FUNC_sk_storage_get:
@@ -7144,6 +7259,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
+	case BPF_FUNC_mtu_check:
+		return &bpf_xdp_mtu_check_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_udp:
 		return &bpf_xdp_sk_lookup_udp_proto;

From patchwork Wed Oct 7 16:22:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 268483
Subject: [PATCH bpf-next V2 4/6] bpf: make it possible to identify BPF
	redirected SKBs
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
	Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
	eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:22:55 +0200
Message-ID: <160208777560.798237.12584544697367464358.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

This change makes it possible to identify SKBs that have been redirected
by TC-BPF (cls_act). This is needed for a number of cases.

(1) For collaborating with driver ifb net_devices.
(2) For avoiding starting generic-XDP prog on TC ingress redirect.
(3) The following MTU check patches need the ability to identify
    redirected SKBs.

Signed-off-by: Jesper Dangaard Brouer
---
 net/core/dev.c    |    2 ++
 net/sched/Kconfig |    1 +
 2 files changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9d55bf5d1a65..b433098896b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3885,6 +3885,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, false);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -4974,6 +4975,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, true);
 		skb_do_redirect(skb);
 		return NULL;
 	case TC_ACT_CONSUMED:
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..a1bbaa8fd054 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -384,6 +384,7 @@ config NET_SCH_INGRESS
 	depends on NET_CLS_ACT
 	select NET_INGRESS
 	select NET_EGRESS
+	select NET_REDIRECT
 	help
 	  Say Y here if you want to use classifiers for incoming and/or outgoing
 	  packets. This qdisc doesn't do anything else besides running classifiers,

From patchwork Wed Oct 7 16:23:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 288878
Subject: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after
	egress hook
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
	Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
	eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:23:00 +0200
Message-ID: <160208778070.798237.16265441131909465819.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The MTU should only apply to transmitted packets.

When TC-ingress redirects a packet to egress on another netdev, the
normal netstack MTU checks are skipped (and the driver level will not catch
any MTU violation; checked with ixgbe).

This patch chooses not to add an MTU check in the egress code path of
skb_do_redirect() prior to calling dev_queue_xmit(), because it is still
possible to run another BPF egress program that will shrink/consume
headers, which will make the packet comply with the netdev MTU. This use-case
might already be in production use (if the ingress MTU is larger than the
egress MTU).

Instead, do the MTU check after the sch_handle_egress() step, for the cases
that require this.

The cases need a bit of explaining. Ingress to egress redirected packets
could be detected via the skb->tc_at_ingress bit, but it is not reliable,
because sch_handle_egress() could steal the packet and redirect it
(again) to another egress netdev, which will then have
skb->tc_at_ingress cleared. There is also the case of a TC-egress prog
increasing the packet size and then redirecting it to egress. Thus, it is
more reliable to do the MTU check for any redirected packet (both ingress
and egress), which can be identified via skb_is_redirected() from the
earlier patch. Also handle the case where an egress BPF-prog increased the
size.

One advantage of this approach is that an ingress-to-egress BPF-prog can
send information via packet data.
With the MTU checks removed in the helpers, and also not done in the
skb_do_redirect() call, an ingress BPF-prog can communicate with an
egress BPF-prog via packet data, as long as the egress BPF-prog removes
this data prior to transmitting the packet.

Troubleshooting: MTU violations are recorded in the TX dropped counter,
and a kprobe on dev_queue_xmit() will show retval -EMSGSIZE.

Signed-off-by: Jesper Dangaard Brouer
---
 net/core/dev.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b433098896b2..19406013f93e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+		*ret = NET_XMIT_SUCCESS;
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;
 	struct netdev_queue *txq;
+#ifdef CONFIG_NET_CLS_ACT
+	bool mtu_check = false;
+#endif
+	bool again = false;
 	struct Qdisc *q;
 	int rc = -ENOMEM;
-	bool again = false;
 
 	skb_reset_mac_header(skb);
 
@@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
+	mtu_check = skb_is_redirected(skb);
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		unsigned int len_orig = skb->len;
+
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
+		/* BPF-prog ran and could have changed packet size beyond MTU */
+		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
+			mtu_check = true;
 	}
 # endif
+	/* MTU-check only happens on "last" net_device in a redirect sequence
+	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
+	 * either ingress or egress to another device).
+	 */
+	if (mtu_check && !is_skb_forwardable(dev, skb)) {
+		rc = -EMSGSIZE;
+		goto drop;
+	}
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
@@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	rc = -ENETDOWN;
 	rcu_read_unlock_bh();
-
+#ifdef CONFIG_NET_CLS_ACT
+drop:
+#endif
 	atomic_long_inc(&dev->tx_dropped);
 	kfree_skb_list(skb);
 	return rc;

From patchwork Wed Oct 7 16:23:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 268482
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-10.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
    DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,
    INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS
    autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    by smtp.lore.kernel.org (Postfix) with ESMTP id BFCC5C41604
    for ; Wed, 7 Oct 2020 16:23:20 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by mail.kernel.org (Postfix) with ESMTP id 65C70216C4
    for ; Wed, 7 Oct 2020 16:23:20 +0000 (UTC)
Authentication-Results: mail.kernel.org;
    dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com
    header.b="ep1L+D7k"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1728454AbgJGQXT (ORCPT ); Wed, 7 Oct 2020 12:23:19 -0400
Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:37268
    "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1728250AbgJGQXS (ORCPT );
    Wed, 7 Oct 2020 12:23:18 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
    s=mimecast20190719; t=1602087796;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
     to:to:cc:cc:mime-version:mime-version:content-type:content-type:
     content-transfer-encoding:content-transfer-encoding:
     in-reply-to:in-reply-to:references:references;
    bh=8Ro78EIE2Bfl88QCMw0YeVuJsEGrocBOtGtIYfj6Qfk=;
    b=ep1L+D7k4wVDdaWir+Q3ECxLqE3jftDhmS3fKavE4V6pn/0vEFC1V/sHsuZ2MPY+0LMuBO
     jDXkCS90bQaDL7fuHD29JgB4ESFRjLy0zFhrar9Ve0cZR1HwMf/fi/qcQDZmamY+l8QOQU
     4EkgLFKzNTIa2YikGkewtuxOW/7r//A=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
    [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
    us-mta-565-4wyycvdoMOefr3hG9xMz2A-1; Wed, 07 Oct 2020 12:23:12 -0400
X-MC-Unique: 4wyycvdoMOefr3hG9xMz2A-1
Received: from smtp.corp.redhat.com
    (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B9B6D10BBEC6;
    Wed, 7 Oct 2020 16:23:09 +0000 (UTC)
Received: from firesoul.localdomain (unknown [10.40.208.18])
    by smtp.corp.redhat.com (Postfix) with ESMTP id CEFD91001B2B;
    Wed, 7 Oct 2020 16:23:06 +0000 (UTC)
Received: from [192.168.42.3] (localhost [IPv6:::1])
    by firesoul.localdomain (Postfix) with ESMTP id D536130736C8B;
    Wed, 7 Oct 2020 18:23:05 +0200 (CEST)
Subject: [PATCH bpf-next V2 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
    Alexei Starovoitov, maze@google.com, lmb@cloudflare.com,
    shaun@tigera.io, Lorenzo Bianconi, marek@cloudflare.com,
    John Fastabend, Jakub Kicinski, eyal.birger@gmail.com
Date: Wed, 07 Oct 2020 18:23:05 +0200
Message-ID: <160208778579.798237.7257307543620328206.stgit@firesoul>
In-Reply-To: <160208770557.798237.11181325462593441941.stgit@firesoul>
References: <160208770557.798237.11181325462593441941.stgit@firesoul>
User-Agent: StGit/0.19
MIME-Version: 1.0
X-Scanned-By:
    MIMEDefang 2.84 on 10.5.11.22
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

The use-case for dropping the MTU check when a TC-BPF program redirects
a packet to ingress is described by Eyal Birger in email[0]. In summary,
it is the ability to increase the packet size (e.g. with IPv6 headers
for NAT64), redirect the packet to ingress, and let the normal netstack
fragment the packet as needed.

[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/

Signed-off-by: Jesper Dangaard Brouer
---
 include/linux/netdevice.h |  5 +++--
 net/core/dev.c            |  2 +-
 net/core/filter.c         | 12 ++++++++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 28cfa53daf72..58fb7b4869ba 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3866,10 +3866,11 @@ bool is_skb_forwardable(const struct net_device *dev,
 			const struct sk_buff *skb);
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb,
+					       const bool mtu_check)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    (mtu_check && unlikely(!is_skb_forwardable(dev, skb)))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/core/dev.c b/net/core/dev.c
index 19406013f93e..bae95ae9aa96 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2209,7 +2209,7 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, true);
 
 	if (likely(!ret)) {
 		skb->protocol = eth_type_trans(skb, dev);
diff --git a/net/core/filter.c b/net/core/filter.c
index 54b779e34f83..5516d4efe225 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2083,13 +2083,21 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev,
struct sk_buff *skb)
 {
-	return dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, false);
+
+	if (likely(!ret)) {
+		skb->protocol = eth_type_trans(skb, dev);
+		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+		ret = netif_rx(skb);
+	}
+
+	return ret;
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, false);
 
 	if (likely(!ret)) {
 		skb->dev = dev;