Message ID | 20200813195816.67222-1-Jason@zx2c4.com
---|---
State | New
Series | [net,v4] net: xdp: account for layer 3 packets in generic skb handler
On Thu, 13 Aug 2020 21:58:16 +0200 Jason A. Donenfeld wrote:
> - cls_bpf does not support the same feature set as XDP, and operates at
>   a slightly different stage in the networking stack.

Please elaborate.

> I had originally dropped this patch, but the issue kept coming up in
> user reports, so here's a v4 of it. Testing of it is still rather slim,
> but hopefully that will change in the coming days.

Here an alternative patch, untested:

diff --git a/net/core/dev.c b/net/core/dev.c
index d3d53dc601f9..7f68831bdd4f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5434,6 +5434,11 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 	struct bpf_prog *new = xdp->prog;
 	int ret = 0;
 
+	if (!dev->hard_header_len) {
+		NL_SET_ERR_MSG(xdp->extack, "generic XDP not supported on L3 devices, please use cls_bpf");
+		return -EINVAL;
+	}
+
 	if (new) {
 		u32 i;
On Thu, Aug 13, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > I had originally dropped this patch, but the issue kept coming up in
> > user reports, so here's a v4 of it. Testing of it is still rather slim,
> > but hopefully that will change in the coming days.
>
> Here an alternative patch, untested:

Funny. But come on now... Why would we want to deprive our users of
system consistency?

Doesn't it make sense to allow users to use the same code across
interfaces? You actually want them to rewrite their code to use a
totally different trigger point just because of some weird kernel
internals between interfaces?

Why not make XDP more useful and more generic across interfaces? It's
very common for systems to be receiving packets with a heavy ethernet
card from the current data center, in addition to receiving packets
from a tunnel interface connected to a remote data center, with a need
to run the same XDP program on both interfaces. Why not support that
kind of simplicity?

This is _actually_ something that's come up _repeatedly_. This is a
real world need from real users who are doing real things. Why not
help them?

It's not at the expense of any formal consistency, or performance, or
even semantic perfection. It costs very little to support these
popular use cases.

[FYI, there's one tweak I'd like to make, so I'll probably send v5
~soon.]

Jason
On Fri, 14 Aug 2020 08:56:48 +0200 Jason A. Donenfeld wrote:
> On Thu, Aug 13, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > I had originally dropped this patch, but the issue kept coming up in
> > > user reports, so here's a v4 of it. Testing of it is still rather slim,
> > > but hopefully that will change in the coming days.
> >
> > Here an alternative patch, untested:
>
> Funny. But come on now... Why would we want to deprive our users of
> system consistency?

We should try for consistency between xdp and cls_bpf instead.

> Doesn't it make sense to allow users to use the same code across
> interfaces? You actually want them to rewrite their code to use a
> totally different trigger point just because of some weird kernel
> internals between interfaces?

We're not building an abstraction over the kernel stack so that users
won't have to worry how things work. Users need to have a minimal
understanding of how specific hooks integrate with the stack and what
they are for. And therefore why cls_bpf is actually more efficient to
use in L3 tunnel case.

> Why not make XDP more useful and more generic across interfaces? It's
> very common for systems to be receiving packets with a heavy ethernet
> card from the current data center, in addition to receiving packets
> from a tunnel interface connected to a remote data center, with a need
> to run the same XDP program on both interfaces. Why not support that
> kind of simplicity?
>
> This is _actually_ something that's come up _repeatedly_. This is a
> real world need from real users who are doing real things. Why not
> help them?

I'm sure it comes up repeatedly because we don't return any errors,
so people waste time investigating why it doesn't work.

> It's not at the expense of any formal consistency, or performance, or
> even semantic perfection. It costs very little to support these
> popular use cases.
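For comparison, a minimal sketch (illustrative only, not from this
thread; the program and function names are made up) of what the
cls_bpf side of that recommendation looks like. A cls_bpf program
attaches at the TC ingress hook and sees the skb as the device
delivered it, so on an L3 tunnel there is no ethernet header to parse
and skb->protocol already carries the ethertype:

/* Illustrative cls_bpf sketch, not part of the thread. Attached at TC
 * ingress (e.g. via a clsact qdisc), it works directly on struct
 * __sk_buff; on an L3 device skb->data starts at the IP header, so no
 * (pseudo) ethernet header is involved.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("classifier")
int cls_drop_non_ipv4(struct __sk_buff *skb)
{
	if (skb->protocol != bpf_htons(ETH_P_IP))
		return TC_ACT_SHOT;	/* drop the packet */
	return TC_ACT_OK;		/* hand it to the stack */
}

char _license[] SEC("license") = "GPL";

An XDP version of the same check has to parse an ethhdr out of the
packet buffer instead, which is the duplication being argued about in
the rest of the thread.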
On 8/14/20, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri, 14 Aug 2020 08:56:48 +0200 Jason A. Donenfeld wrote:
>> On Thu, Aug 13, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> > > I had originally dropped this patch, but the issue kept coming up in
>> > > user reports, so here's a v4 of it. Testing of it is still rather
>> > > slim,
>> > > but hopefully that will change in the coming days.
>> >
>> > Here an alternative patch, untested:
>>
>> Funny. But come on now... Why would we want to deprive our users of
>> system consistency?
>
> We should try for consistency between xdp and cls_bpf instead.

And still require users to reimplement their packet processing logic
twice?

>
>> Doesn't it make sense to allow users to use the same code across
>> interfaces? You actually want them to rewrite their code to use a
>> totally different trigger point just because of some weird kernel
>> internals between interfaces?
>
> We're not building an abstraction over the kernel stack so that users
> won't have to worry how things work. Users need to have a minimal
> understanding of how specific hooks integrate with the stack and what
> they are for. And therefore why cls_bpf is actually more efficient to
> use in L3 tunnel case.

It's not like adding 7 lines of code constitutes adding an abstraction
layer. It's a pretty basic fix to make real things work for real users.
While you might argue that users should do something different, you
also can't deny that being able to hook up the same packet processing
to eth0, eth1, extrafancyeth2, and tun0 is a huge convenience.

>
>> Why not make XDP more useful and more generic across interfaces? It's
>> very common for systems to be receiving packets with a heavy ethernet
>> card from the current data center, in addition to receiving packets
>> from a tunnel interface connected to a remote data center, with a need
>> to run the same XDP program on both interfaces. Why not support that
>> kind of simplicity?
>>
>> This is _actually_ something that's come up _repeatedly_. This is a
>> real world need from real users who are doing real things. Why not
>> help them?
>
> I'm sure it comes up repeatedly because we don't return any errors,
> so people waste time investigating why it doesn't work.

What? No. It comes up repeatedly because people want to reuse their
XDP processing logic with layer 3 devices. You might be right that if
we tell them to go away, maybe they will, but on the other hand, why
not make this actually work for them? It seems pretty easy to do, and
saves everyone a lot of time. Are you worried about adding a branch to
the already-slower-and-discouraged non-hardware generic path? If so, I
wouldn't object if you wanted to put unlikely() around the branch
condition in that if statement.

Jason
From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 14 Aug 2020 08:31:53 -0700

> I'm sure it comes up repeatedly because we don't return any errors,
> so people waste time investigating why it doesn't work.

+1
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Fri, 14 Aug 2020 23:04:56 +0200

> What? No. It comes up repeatedly because people want to reuse their
> XDP processing logic with layer 3 devices.

XDP is a layer 2 packet processing technology. It assumes an L2
ethernet and/or VLAN header is going to be there.

Putting a pretend ethernet header there doesn't really change that.
On Fri, Aug 14, 2020 at 11:27 PM David Miller <davem@davemloft.net> wrote:
>
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Date: Fri, 14 Aug 2020 23:04:56 +0200
>
> > What? No. It comes up repeatedly because people want to reuse their
> > XDP processing logic with layer 3 devices.
>
> XDP is a layer 2 packet processing technology. It assumes an L2
> ethernet and/or VLAN header is going to be there.
>
> Putting a pretend ethernet header there doesn't really change that.

Actually, I wasn't aware that XDP was explicitly limited to L2-only,
as some kind of fundamental thing. A while back when this patchset
first came up, I initially posted something that gave unmodified L3
frames to XDP programs in the generic handler, but commenters pointed
out that this loses the skb->protocol changing capability, which could
be useful (e.g. some kind of custom v4->v6 modifying code), and adding
the pretend ethernet header would allow people to keep the same
programs for the L2 case as for the L3 case, which seemed *extremely*
compelling to me. Hence, this patch series has gone in the pretend
ethernet header direction.

But, anyway, as I said, I wasn't aware that XDP was explicitly limited
to L2-only, as some kind of fundamental thing. This actually surprises
me. I always thought the original motivation of XDP was that it
allowed putting a lot of steering and modification logic into the
network card hardware, for fancy new cards that support eBPF. Later,
the generic handler got added on so people could reuse those programs
in heterogeneous environments, where some cards have hardware support
and others do not. That seemed like a good idea to me. Extending that
a step further for layer 3 devices seems like a logical next step, in
the sense that if that step is invalid, surely the previous one of
adding the generic handler must be invalid too? At least that's my
impression of the historical evolution of this; I'm confident you know
much better than I do.

It makes me wonder, though, what will you say when hardware comes out
that has layer 3 semantics and a thirst for eBPF? Also deny that
hardware of XDP, because "XDP is a layer 2 packet processing
technology"? I know what you'll say now: "we don't design our
networking stack around hypothetical hardware, so why even bring it
up? I won't entertain that." But nevertheless, contemplating those
hypotheticals might be a good exercise for seeing how committed you
are to the XDP=L2-only assertion. For example, maybe there are
fundamental L2 semantics that XDP needs, and can't be emulated with my
pretend ethernet header -- like if you really are relying on the MACs
for something I'm not aware of; were that the case, it'd be compelling
to me. But if it's a bit weaker, in the form of, "we just don't want
to try anything with L3 at all because," then I admit I'm still a bit
mystified.

Nevertheless, at the risk of irritating you further, I will willingly
drop this patchset at your request, even though I don't completely
understand the entire scope of reasoning for doing so. (For posterity,
I just posted a v6, which splits out that other bug fix into something
separate for you to take, so that this one here can exist on its own.)

Jason
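To make the pretend-ethernet-header point concrete, a minimal sketch
(illustrative only, not taken from the patch; names are made up) of
the kind of program it keeps portable. The program only parses the
ethhdr, so with the patch the same object runs unchanged on an
ethernet NIC and on a layer 3 device, where h_proto is the copy of
skb->protocol:

/* Illustrative XDP sketch, not part of the thread. On a layer 3
 * device running generic XDP with the patch applied, eth below is the
 * synthetic header whose h_proto was copied from skb->protocol.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_drop_non_ipv4(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;

	/* Bounds check required by the verifier. */
	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_DROP;	/* only IPv4 makes it through */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Without the synthetic header, the same object would have to be
rebuilt to start parsing at the IP header whenever it is attached to
an L3 interface.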
diff --git a/net/core/dev.c b/net/core/dev.c
index 7df6c9617321..e6403e74a6aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4591,12 +4591,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 				     struct xdp_buff *xdp,
 				     struct bpf_prog *xdp_prog)
 {
+	bool orig_bcast, skb_is_l3 = false;
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
 	u32 metalen, act = XDP_DROP;
 	__be16 orig_eth_type;
 	struct ethhdr *eth;
-	bool orig_bcast;
 	int hlen, off;
 	u32 mac_len;
 
@@ -4630,6 +4630,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	 * header.
 	 */
 	mac_len = skb->data - skb_mac_header(skb);
+	skb_is_l3 = !mac_len && !skb->dev->hard_header_len;
+	if (skb_is_l3) {
+		eth = skb_push(skb, sizeof(struct ethhdr));
+		eth_zero_addr(eth->h_source);
+		eth_zero_addr(eth->h_dest);
+		eth->h_proto = skb->protocol;
+	}
 	hlen = skb_headlen(skb) + mac_len;
 	xdp->data = skb->data - mac_len;
 	xdp->data_meta = xdp->data;
@@ -4684,6 +4691,8 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		__skb_push(skb, mac_len);
 		break;
 	case XDP_PASS:
+		if (skb_is_l3)
+			skb_pull(skb, sizeof(struct ethhdr));
 		metalen = xdp->data - xdp->data_meta;
 		if (metalen)
 			skb_metadata_set(skb, metalen);
A user reported that packets from wireguard were possibly ignored by
XDP [1]. Another user reported that modifying packets from layer 3
interfaces results in impossible to diagnose drops.

Apparently, the generic skb xdp handler path seems to assume that
packets will always have an ethernet header, which really isn't always
the case for layer 3 packets, which are produced by multiple drivers.
This patch fixes the oversight.

If the mac_len is 0 and so is hard_header_len, then we know that the
skb is a layer 3 packet, and in that case prepend a pseudo ethhdr to
the packet whose h_proto is copied from skb->protocol, which will have
the appropriate v4 or v6 ethertype. This allows us to keep XDP
programs' assumption correct about packets always having that ethernet
header, so that existing code doesn't break, while still allowing
layer 3 devices to use the generic XDP handler.

For the XDP_PASS case, we remove the added ethernet header after the
program runs. And for the XDP_REDIRECT or XDP_TX cases, we leave it
on. This mirrors the logic for layer 2 packets, where mac_len is part
of the buffer given to xdp, and then is pushed for the XDP_REDIRECT or
XDP_TX cases, while it has already been naturally removed for the
XDP_PASS case, since it always existed in the head space. This should
preserve semantics for both cases.

Previous discussions have included the point that maybe XDP should
just be intentionally broken on layer 3 interfaces, by design, and
that layer 3 people should be using cls_bpf. However, I think there
are good grounds to reconsider this perspective:

- Complicated deployments wind up applying XDP modifications to a
  variety of different devices on a given host, some of which are
  using specialized ethernet cards and other ones using virtual layer
  3 interfaces, such as WireGuard. Being able to apply one codebase to
  each of these winds up being essential.

- cls_bpf does not support the same feature set as XDP, and operates
  at a slightly different stage in the networking stack. You may
  reply, "then add all the features you want to cls_bpf", but that
  seems to be missing the point, and would still result in there being
  two ways to do everything, which is not desirable for anyone
  actually _using_ this code.

- While XDP was originally made for hardware offloading, and while
  many look disdainfully upon the generic mode, it nevertheless
  remains a highly useful and popular way of adding bespoke packet
  transformations, and from that perspective, a difference between
  layer 2 and layer 3 packets is immaterial if the user is primarily
  concerned with transformations to layer 3 and beyond.

[1] https://lore.kernel.org/wireguard/M5WzVK5--3-2@tuta.io/

Reported-by: Thomas Ptacek <thomas@sockpuppet.org>
Reported-by: Adhipati Blambangan <adhipati@tuta.io>
Cc: David Ahern <dsahern@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I had originally dropped this patch, but the issue kept coming up in
user reports, so here's a v4 of it. Testing of it is still rather slim,
but hopefully that will change in the coming days.

Changes v3->v4:
- We now preserve the same logic for XDP_TX/XDP_REDIRECT as before.
- hard_header_len is checked in addition to mac_len.

 net/core/dev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

-- 
2.28.0
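The deployment pattern described in the first bullet point above --
one program attached in generic mode to both an ethernet device and a
wireguard device -- would look roughly like the following with libbpf.
This is a hedged sketch, not part of the patch: bpf_set_link_xdp_fd()
was the libbpf attach call current at the time, the interface names
are examples, and prog_fd is assumed to come from an already-loaded
XDP program.

/* Illustrative attach sketch, not part of the patch. XDP_FLAGS_SKB_MODE
 * forces the generic (skb) path that this patch touches.
 */
#include <net/if.h>
#include <linux/if_link.h>
#include <bpf/libbpf.h>

static int attach_generic_xdp(const char *ifname, int prog_fd)
{
	int ifindex = if_nametoindex(ifname);

	if (!ifindex)
		return -1;
	return bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_SKB_MODE);
}

/* e.g.: attach_generic_xdp("eth0", prog_fd);
 *       attach_generic_xdp("wg0", prog_fd);
 */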