Message ID | 161047350559.4003084.17398867215317668954.stgit@firesoul |
---|---|
State | Superseded |
Series | bpf: New approach for BPF MTU handling |
Jesper Dangaard Brouer wrote:
> Multiple BPF helpers that can manipulate/increase the size of the SKB use
> __bpf_skb_max_len() as the max length. This function limits the size
> against the current net_device MTU (skb->dev->mtu).
>
> When a BPF program grows the packet size, it should not be limited to the
> MTU. The MTU is a transmit limitation, and software receiving this packet
> should be allowed to increase the size. Furthermore, the current MTU check
> in __bpf_skb_max_len() uses the MTU from the ingress/current net_device,
> which in case of redirects is the wrong net_device.
>
> This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit
> is elsewhere in the system. Jesper's testing[1] showed it was not possible
> to exceed 8KiB when expanding the SKB size via BPF helper. The limiting
> factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the
> SLUB allocator (CONFIG_SLUB) when PAGE_SIZE is 4096. This define is
> in effect because this is called from softirq context; see
> __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
> that frames above 16KiB can cause NICs to reset (but not crash). Keep this
> sanity limit at this level, as the memory layer can differ based on kernel
> config.
>
> [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests
>
> V3: replace __bpf_skb_max_len() with define and use IPv6 max MTU size.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>
diff --git a/net/core/filter.c b/net/core/filter.c
index 255aeee72402..f8f198252ff2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3552,11 +3552,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	return 0;
 }
 
-static u32 __bpf_skb_max_len(const struct sk_buff *skb)
-{
-	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-			  SKB_MAX_ALLOC;
-}
+#define BPF_SKB_MAX_LEN SKB_MAX_ALLOC
 
 BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	   u32, mode, u64, flags)
@@ -3605,7 +3601,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 {
 	u32 len_cur, len_diff_abs = abs(len_diff);
 	u32 len_min = bpf_skb_net_base_len(skb);
-	u32 len_max = __bpf_skb_max_len(skb);
+	u32 len_max = BPF_SKB_MAX_LEN;
 	__be16 proto = skb->protocol;
 	bool shrink = len_diff < 0;
 	u32 off;
@@ -3688,7 +3684,7 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, unsigned int new_len)
 static inline int __bpf_skb_change_tail(struct sk_buff *skb, u32 new_len,
 					u64 flags)
 {
-	u32 max_len = __bpf_skb_max_len(skb);
+	u32 max_len = BPF_SKB_MAX_LEN;
 	u32 min_len = __bpf_skb_min_len(skb);
 	int ret;
 
@@ -3764,7 +3760,7 @@ static const struct bpf_func_proto sk_skb_change_tail_proto = {
 static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
 					u64 flags)
 {
-	u32 max_len = __bpf_skb_max_len(skb);
+	u32 max_len = BPF_SKB_MAX_LEN;
 	u32 new_len = skb->len + head_room;
 	int ret;
 
Multiple BPF helpers that can manipulate/increase the size of the SKB use
__bpf_skb_max_len() as the max length. This function limits the size
against the current net_device MTU (skb->dev->mtu).

When a BPF program grows the packet size, it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Furthermore, the current MTU check
in __bpf_skb_max_len() uses the MTU from the ingress/current net_device,
which in case of redirects is the wrong net_device.

This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit
is elsewhere in the system. Jesper's testing[1] showed it was not possible
to exceed 8KiB when expanding the SKB size via BPF helper. The limiting
factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the
SLUB allocator (CONFIG_SLUB) when PAGE_SIZE is 4096. This define is
in effect because this is called from softirq context; see
__gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
that frames above 16KiB can cause NICs to reset (but not crash). Keep this
sanity limit at this level, as the memory layer can differ based on kernel
config.

[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests

V3: replace __bpf_skb_max_len() with define and use IPv6 max MTU size.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)
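
For readers who want to see the effect of the change outside the kernel, the following is a minimal userspace sketch of the old and new upper-bound checks. It is not kernel code: `struct model_dev`, `struct model_skb`, the `model_*` functions, and `MODEL_SKB_MAX_ALLOC` are hypothetical stand-ins, and the 16 KiB value is the round figure quoted in the commit message rather than the kernel's exact `SKB_MAX_ALLOC` definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for net_device and sk_buff; only the fields
 * read by the length check are modelled. */
struct model_dev {
	uint32_t mtu;
	uint32_t hard_header_len;
};

struct model_skb {
	const struct model_dev *dev;	/* device the packet currently sits on */
	uint32_t len;			/* current packet length in bytes */
};

/* Assumption: 16 KiB, as stated in the commit message. */
#define MODEL_SKB_MAX_ALLOC	(16u * 1024u)

/* Old behaviour: the limit follows the MTU of the current device. */
static uint32_t model_max_len_old(const struct model_skb *skb)
{
	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len
			: MODEL_SKB_MAX_ALLOC;
}

/* New behaviour: a fixed sanity limit, independent of any device. */
#define MODEL_BPF_SKB_MAX_LEN	MODEL_SKB_MAX_ALLOC

static bool model_grow_allowed_old(const struct model_skb *skb, uint32_t grow)
{
	return skb->len + grow <= model_max_len_old(skb);
}

static bool model_grow_allowed_new(const struct model_skb *skb, uint32_t grow)
{
	return skb->len + grow <= MODEL_BPF_SKB_MAX_LEN;
}

/* Demo: a full-size frame on a 1500-byte-MTU device gains a 50-byte
 * encapsulation header. The old check rejects it; the new one does not. */
void model_demo(void)
{
	struct model_dev ingress = { .mtu = 1500, .hard_header_len = 14 };
	struct model_skb skb = { .dev = &ingress, .len = 1514 };

	assert(!model_grow_allowed_old(&skb, 50));
	assert(model_grow_allowed_new(&skb, 50));
}
```

In this model the old check refuses to grow a full-size frame at all, even if the packet is later redirected to a device with a larger MTU, which is the case the commit message describes. The new check only rejects growth beyond the fixed sanity limit.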