Message ID | 1599828221-19364-1-git-send-email-magnus.karlsson@gmail.com |
---|---|
State | New |
Headers | show |
Series | [bpf,v4] xsk: do not discard packet when NETDEV_TX_BUSY | expand |
Magnus Karlsson wrote: > From: Magnus Karlsson <magnus.karlsson@intel.com> > > In the skb Tx path, transmission of a packet is performed with > dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it > signifies that it was not possible to send the packet right now, > please try later. Unfortunately, the xsk transmit code discarded the > packet and returned EBUSY to the application. Fix this unnecessary > packet loss, by not discarding the packet in the Tx ring and return > EAGAIN. As EAGAIN is returned to the application, it can then retry > the send operation later and the packet will then likely be sent as > the driver will then likely have space/resources to send the packet. > > In summary, EAGAIN tells the application that the packet was not > discarded from the Tx ring and that it needs to call send() > again. EBUSY, on the other hand, signifies that the packet was not > sent and discarded from the Tx ring. The application needs to put the > packet on the Tx ring again if it wants it to be sent. > > Fixes: 35fcde7f8deb ("xsk: support for Tx") > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> > Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Patch seems to make sense to me, better matches the expectations/path of the stack. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Hey Magnus, On 9/11/20 2:43 PM, Magnus Karlsson wrote: > From: Magnus Karlsson <magnus.karlsson@intel.com> > > In the skb Tx path, transmission of a packet is performed with > dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it > signifies that it was not possible to send the packet right now, > please try later. Unfortunately, the xsk transmit code discarded the > packet and returned EBUSY to the application. Fix this unnecessary > packet loss, by not discarding the packet in the Tx ring and return > EAGAIN. As EAGAIN is returned to the application, it can then retry > the send operation later and the packet will then likely be sent as > the driver will then likely have space/resources to send the packet. > > In summary, EAGAIN tells the application that the packet was not > discarded from the Tx ring and that it needs to call send() > again. EBUSY, on the other hand, signifies that the packet was not > sent and discarded from the Tx ring. The application needs to put the > packet on the Tx ring again if it wants it to be sent. > > Fixes: 35fcde7f8deb ("xsk: support for Tx") > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> > Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> > --- > v3->v4: > * Free the skb without triggering the drop trace when NETDEV_TX_BUSY > * Call consume_skb instead of kfree_skb when the packet has been > sent successfully for correct tracing > * Use sock_wfree as destructor when NETDEV_TX_BUSY > v1->v3: > * Hinder dev_direct_xmit() from freeing and completing the packet to > user space by manipulating the skb->users count as suggested by > Daniel Borkmann. > --- > net/xdp/xsk.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index c323162..d32e39d 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk) > skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > skb->destructor = xsk_destruct_skb; > > + /* Hinder dev_direct_xmit from freeing the packet and > + * therefore completing it in the destructor > + */ > + refcount_inc(&skb->users); > err = dev_direct_xmit(skb, xs->queue_id); > + if (err == NETDEV_TX_BUSY) { > + /* Tell user-space to retry the send */ > + skb->destructor = sock_wfree; I see, good catch, you need this one here as otherwise you leak wmem accounting given it's also part of xsk_destruct_skb() and we do free the prior allocated skb in this case. > + /* Free skb without triggering the perf drop trace */ > + __kfree_skb(skb); As a minor nit, I would just use consume_skb(skb) here given this doesn't blindly ignore the skb_unref(). It's mostly about seeing where drops are happening so that tracepoint is set to kfree_skb() which is the more interesting one. Other than that looks good and ready to go. Thanks (& sorry for late reply)! > + err = -EAGAIN; > + goto out; > + } > + > xskq_cons_release(xs->tx); > /* Ignore NET_XMIT_CN as packet might have been sent */ > - if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) { > + if (err == NET_XMIT_DROP) { > /* SKB completed but not sent */ > + kfree_skb(skb); > err = -EBUSY; > goto out; > } > > + consume_skb(skb); > sent_frame = true; > } > >
On Tue, Sep 15, 2020 at 5:49 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > Hey Magnus, > > On 9/11/20 2:43 PM, Magnus Karlsson wrote: > > From: Magnus Karlsson <magnus.karlsson@intel.com> > > > > In the skb Tx path, transmission of a packet is performed with > > dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it > > signifies that it was not possible to send the packet right now, > > please try later. Unfortunately, the xsk transmit code discarded the > > packet and returned EBUSY to the application. Fix this unnecessary > > packet loss, by not discarding the packet in the Tx ring and return > > EAGAIN. As EAGAIN is returned to the application, it can then retry > > the send operation later and the packet will then likely be sent as > > the driver will then likely have space/resources to send the packet. > > > > In summary, EAGAIN tells the application that the packet was not > > discarded from the Tx ring and that it needs to call send() > > again. EBUSY, on the other hand, signifies that the packet was not > > sent and discarded from the Tx ring. The application needs to put the > > packet on the Tx ring again if it wants it to be sent. > > > > Fixes: 35fcde7f8deb ("xsk: support for Tx") > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> > > Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > > Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com> > > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> > > --- > > v3->v4: > > * Free the skb without triggering the drop trace when NETDEV_TX_BUSY > > * Call consume_skb instead of kfree_skb when the packet has been > > sent successfully for correct tracing > > * Use sock_wfree as destructor when NETDEV_TX_BUSY > > v1->v3: > > * Hinder dev_direct_xmit() from freeing and completing the packet to > > user space by manipulating the skb->users count as suggested by > > Daniel Borkmann. > > --- > > net/xdp/xsk.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index c323162..d32e39d 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk) > > skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > > skb->destructor = xsk_destruct_skb; > > > > + /* Hinder dev_direct_xmit from freeing the packet and > > + * therefore completing it in the destructor > > + */ > > + refcount_inc(&skb->users); > > err = dev_direct_xmit(skb, xs->queue_id); > > + if (err == NETDEV_TX_BUSY) { > > + /* Tell user-space to retry the send */ > > + skb->destructor = sock_wfree; > > I see, good catch, you need this one here as otherwise you leak wmem accounting > given it's also part of xsk_destruct_skb() and we do free the prior allocated skb > in this case. > > > + /* Free skb without triggering the perf drop trace */ > > + __kfree_skb(skb); > > As a minor nit, I would just use consume_skb(skb) here given this doesn't blindly > ignore the skb_unref(). It's mostly about seeing where drops are happening so that > tracepoint is set to kfree_skb() which is the more interesting one. Other than that > looks good and ready to go. Thanks (& sorry for late reply)! Thank you for reviewing this. I will spin a v5. > > + err = -EAGAIN; > > + goto out; > > + } > > + > > xskq_cons_release(xs->tx); > > /* Ignore NET_XMIT_CN as packet might have been sent */ > > - if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) { > > + if (err == NET_XMIT_DROP) { > > /* SKB completed but not sent */ > > + kfree_skb(skb); > > err = -EBUSY; > > goto out; > > } > > > > + consume_skb(skb); > > sent_frame = true; > > } > > > > >
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index c323162..d32e39d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk) skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; skb->destructor = xsk_destruct_skb; + /* Hinder dev_direct_xmit from freeing the packet and + * therefore completing it in the destructor + */ + refcount_inc(&skb->users); err = dev_direct_xmit(skb, xs->queue_id); + if (err == NETDEV_TX_BUSY) { + /* Tell user-space to retry the send */ + skb->destructor = sock_wfree; + /* Free skb without triggering the perf drop trace */ + __kfree_skb(skb); + err = -EAGAIN; + goto out; + } + xskq_cons_release(xs->tx); /* Ignore NET_XMIT_CN as packet might have been sent */ - if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) { + if (err == NET_XMIT_DROP) { /* SKB completed but not sent */ + kfree_skb(skb); err = -EBUSY; goto out; } + consume_skb(skb); sent_frame = true; }