Message ID | 20210623162328.2197645-1-kuba@kernel.org |
---|---|
State | New |
Headers | show |
Series | [net-next,v3] net: ip: avoid OOM kills with large UDP sends over loopback | expand |
On 6/23/21 8:05 PM, Eric Dumazet wrote: > > > On 6/23/21 6:23 PM, Jakub Kicinski wrote: >> Dave observed number of machines hitting OOM on the UDP send >> path. The workload seems to be sending large UDP packets over >> loopback. Since loopback has MTU of 64k kernel will try to >> allocate an skb with up to 64k of head space. This has a good >> chance of failing under memory pressure. What's worse if >> the message length is <32k the allocation may trigger an >> OOM killer. >> >> This is entirely avoidable, we can use an skb with page frags. >> >> af_unix solves a similar problem by limiting the head >> length to SKB_MAX_ALLOC. This seems like a good and simple >> approach. It means that UDP messages > 16kB will now >> use fragments if underlying device supports SG, if extra >> allocator pressure causes regressions in real workloads >> we can switch to trying the large allocation first and >> falling back. >> >> Reported-by: Dave Jones <dsj@fb.com> >> Signed-off-by: Jakub Kicinski <kuba@kernel.org> >> --- > > Reviewed-by: Eric Dumazet <edumazet@google.com> > > Thanks ! > I am taking this back. IPv6 side also needs to account for sizeof(struct frag_hdr) ?
On Wed, Jun 23, 2021 at 5:07 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 23 Jun 2021 21:45:55 +0200 Jesper Dangaard Brouer wrote: > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > > index c3efc7d658f6..790dd28fd198 100644 > > > --- a/net/ipv4/ip_output.c > > > +++ b/net/ipv4/ip_output.c > > > @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk, > > > if ((flags & MSG_MORE) && > > > !(rt->dst.dev->features&NETIF_F_SG)) > > > alloclen = mtu; > > > - else if (!paged) > > > + else if (!paged && > > > + (fraglen + hh_len + 15 < SKB_MAX_ALLOC || > > > > What does the number 15 represent here? > > No idea, it's there on the allocation line, so I need to include it on > the size check. > > Looking at super old code (2.4.x) it looks like it may have gotten > copy & pasted mistakenly? The hard headers are rounded up to 16B, > and there is code which does things like: > > skb_alloc(size + dev->hard_header_len + 15); > skb_reserve(skb, (dev->hard_header_len + 15) & ~15); > > in other spots. So if I was to guess I'd say someone decided to add the > 15B "to be safe" even though hh_len already includes the round up here. The 15 seems to come from alignment indeed. Not sure when it was introduced, but until 56951b54e87a there is also this /* * Get the memory we require with some space left for alignment. */ skb = sock_alloc_send_skb(sk, fraglen+hh_len+15, 0, flags&MSG_DONTWAIT, &err);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c3efc7d658f6..790dd28fd198 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk, if ((flags & MSG_MORE) && !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; - else if (!paged) + else if (!paged && + (fraglen + hh_len + 15 < SKB_MAX_ALLOC || + !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; else { alloclen = min_t(int, fraglen, MAX_HEADER); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index ff4f9ebcf7f6..ae8dbd6cdab1 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1585,7 +1585,9 @@ static int __ip6_append_data(struct sock *sk, if ((flags & MSG_MORE) && !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; - else if (!paged) + else if (!paged && + (fraglen + hh_len < SKB_MAX_ALLOC || + !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; else { alloclen = min_t(int, fraglen, MAX_HEADER);
Dave observed number of machines hitting OOM on the UDP send path. The workload seems to be sending large UDP packets over loopback. Since loopback has MTU of 64k kernel will try to allocate an skb with up to 64k of head space. This has a good chance of failing under memory pressure. What's worse if the message length is <32k the allocation may trigger an OOM killer. This is entirely avoidable, we can use an skb with page frags. af_unix solves a similar problem by limiting the head length to SKB_MAX_ALLOC. This seems like a good and simple approach. It means that UDP messages > 16kB will now use fragments if underlying device supports SG, if extra allocator pressure causes regressions in real workloads we can switch to trying the large allocation first and falling back. Reported-by: Dave Jones <dsj@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- net/ipv4/ip_output.c | 4 +++- net/ipv6/ip6_output.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)