
[v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit

Message ID a14a30d3c06fff24e13f836c733d80efc0bd6eb5.1611957532.git.lorenzo@kernel.org
State New
Series [v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit

Commit Message

Lorenzo Bianconi Jan. 29, 2021, 10:04 p.m. UTC
Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
in order to alloc skbs in bulk for XDP_PASS verdict.
Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
The proposed approach has been tested in the following scenario:

eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS

XDP_REDIRECT: xdp_redirect_map bpf sample
XDP_PASS: xdp_rxq_info bpf sample

traffic generator: pkt_gen sending udp traffic on a remote device

bpf-next master: ~3.64Mpps
bpf-next + skb bulking allocation: ~3.79Mpps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes since v2:
- use __GFP_ZERO flag instead of memset
- move some veth_xdp_rcv_batch() logic in veth_xdp_rcv_skb()

Changes since v1:
- drop patch 2/3, squash patch 1/3 and 3/3
- set VETH_XDP_BATCH to 16
- rework veth_xdp_rcv to use __ptr_ring_consume
---
 drivers/net/veth.c | 78 ++++++++++++++++++++++++++++++++++------------
 include/net/xdp.h  |  1 +
 net/core/xdp.c     | 11 +++++++
 3 files changed, 70 insertions(+), 20 deletions(-)

Comments

Toshiaki Makita Jan. 31, 2021, 2:16 p.m. UTC | #1
On 2021/01/30 7:04, Lorenzo Bianconi wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
>
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
>
> XDP_REDIRECT: xdp_redirect_map bpf sample
> XDP_PASS: xdp_rxq_info bpf sample
>
> traffic generator: pkt_gen sending udp traffic on a remote device
>
> bpf-next master: ~3.64Mpps
> bpf-next + skb bulking allocation: ~3.79Mpps
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>

Thanks,
Toshiaki Makita
Jesper Dangaard Brouer Feb. 2, 2021, 2:14 p.m. UTC | #2
On Fri, 29 Jan 2021 23:04:08 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
>
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
>
> XDP_REDIRECT: xdp_redirect_map bpf sample
> XDP_PASS: xdp_rxq_info bpf sample
>
> traffic generator: pkt_gen sending udp traffic on a remote device
>
> bpf-next master: ~3.64Mpps
> bpf-next + skb bulking allocation: ~3.79Mpps
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

I wanted Lorenzo to test 8 vs 16 bulking, but after much testing and
IRC dialog, we cannot find and measure any difference with enough
accuracy. Thus:

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>


> Changes since v2:
> - use __GFP_ZERO flag instead of memset
> - move some veth_xdp_rcv_batch() logic in veth_xdp_rcv_skb()
>
> Changes since v1:
> - drop patch 2/3, squash patch 1/3 and 3/3
> - set VETH_XDP_BATCH to 16
> - rework veth_xdp_rcv to use __ptr_ring_consume
> ---
>  drivers/net/veth.c | 78 ++++++++++++++++++++++++++++++++++------------
>  include/net/xdp.h  |  1 +
>  net/core/xdp.c     | 11 +++++++
>  3 files changed, 70 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 6e03b619c93c..aa1a66ad2ce5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -35,6 +35,7 @@
>  #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
>
>  #define VETH_XDP_TX_BULK_SIZE	16
> +#define VETH_XDP_BATCH		16
>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
patchwork-bot+netdevbpf@kernel.org Feb. 4, 2021, 12:10 a.m. UTC | #3
Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Fri, 29 Jan 2021 23:04:08 +0100 you wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
>
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
>
> [...]


Here is the summary with links:
  - [v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit
    https://git.kernel.org/bpf/bpf-next/c/65e6dcf73398

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Daniel Borkmann Feb. 4, 2021, 12:14 a.m. UTC | #4
On 1/29/21 11:04 PM, Lorenzo Bianconi wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
[...]
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 0d2630a35c3e..05354976c1fc 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -514,6 +514,17 @@ void xdp_warn(const char *msg, const char *func, const int line)
>   };
>   EXPORT_SYMBOL_GPL(xdp_warn);
>
> +int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
> +{
> +	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
> +				      n_skb, skbs);

Applied, but one question I was wondering about when reading the kmem_cache_alloc_bulk()
code was whether it would be safer to simply test for kmem_cache_alloc_bulk() != n_skb
given it could potentially in future also alloc less objs than requested, but I presume
if such extension would get implemented then call-sites might need to indicate 'best
effort' somehow via flag instead (to handle < n_skb case). Either way all current callers
assume for != 0 that everything went well, so lgtm.

> +	if (unlikely(!n_skb))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
> +
>   struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>   					   struct sk_buff *skb,
>   					   struct net_device *dev)
>
Jesper Dangaard Brouer Feb. 4, 2021, 9:05 a.m. UTC | #5
On Thu, 4 Feb 2021 01:14:56 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 1/29/21 11:04 PM, Lorenzo Bianconi wrote:
> > Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> > in order to alloc skbs in bulk for XDP_PASS verdict.
> > Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> > The proposed approach has been tested in the following scenario:
> [...]
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 0d2630a35c3e..05354976c1fc 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -514,6 +514,17 @@ void xdp_warn(const char *msg, const char *func, const int line)
> >   };
> >   EXPORT_SYMBOL_GPL(xdp_warn);
> >
> > +int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
> > +{
> > +	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
> > +				      n_skb, skbs);
>
> Applied, but one question I was wondering about when reading the kmem_cache_alloc_bulk()
> code was whether it would be safer to simply test for kmem_cache_alloc_bulk() != n_skb
> given it could potentially in future also alloc less objs than requested, but I presume
> if such extension would get implemented then call-sites might need to indicate 'best
> effort' somehow via flag instead (to handle < n_skb case). Either way all current callers
> assume for != 0 that everything went well, so lgtm.


It was Andrew (AKPM) who wanted the API to either return the requested
number of objects or fail. I respected the MM maintainers' request at
that point, even though I wanted the other API, as there is a small
performance advantage (not crossing a page boundary in SLUB).

At that time we discussed it on the MM list, and I see his point:
if the API can allocate fewer objs than requested, then think about how this
complicates the surrounding code. E.g. in this specific code we already
have VETH_XDP_BATCH (16) xdp_frame objects, which we need to get 16 SKB
objects for.  What should the code do if it cannot get 16 SKBs(?)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
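Jesper's all-or-nothing argument can be sketched as a userspace model. This is plain C with malloc() standing in for kmem_cache_alloc_bulk(); the helper names are illustrative, not kernel API. The point is that when the bulk call either delivers the whole batch or fails, the caller (mirroring veth_xdp_rcv_bulk_skb()) needs exactly one error path:

```c
#include <assert.h>
#include <stdlib.h>

#define BATCH 16

/* Model of the all-or-nothing contract: either all n objects are
 * provided, or the call rolls back and fails. bulk_alloc() stands in
 * for kmem_cache_alloc_bulk() and is purely illustrative. */
static int bulk_alloc(void **objs, int n)
{
	for (int i = 0; i < n; i++) {
		objs[i] = malloc(32);
		if (!objs[i]) {
			while (i--)	/* roll back: all-or-nothing */
				free(objs[i]);
			return 0;
		}
	}
	return n;
}

/* Caller mirrors veth_xdp_rcv_bulk_skb(): one branch drops the whole
 * batch, no partial-batch bookkeeping anywhere. */
static int process_batch(int n)
{
	void *skbs[BATCH];

	if (bulk_alloc(skbs, n) != n)
		return -1;		/* drop the whole xdp_frame batch */
	for (int i = 0; i < n; i++)
		free(skbs[i]);		/* stand-in for building/receiving skbs */
	return 0;
}
```

If bulk_alloc() could instead return any count in (0, n], process_batch() would have to track which frames got an skb and decide what to do with the remainder, which is exactly the complication Jesper describes.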
Daniel Borkmann Feb. 4, 2021, 3:43 p.m. UTC | #6
On 2/4/21 10:05 AM, Jesper Dangaard Brouer wrote:
[...]
> It was Andrew (AKPM) who wanted the API to either return the requested
> number of objects or fail. I respected the MM maintainers' request at
> that point, even though I wanted the other API, as there is a small
> performance advantage (not crossing a page boundary in SLUB).
>
> At that time we discussed it on the MM list, and I see his point:
> if the API can allocate fewer objs than requested, then think about how this
> complicates the surrounding code. E.g. in this specific code we already
> have VETH_XDP_BATCH (16) xdp_frame objects, which we need to get 16 SKB
> objects for.  What should the code do if it cannot get 16 SKBs(?)


Right, I mentioned the error handling complications above wrt < n_skb case. I think iff this
ever gets implemented and there's a need, it would probably be best to add a new flag like
__GFP_BULK_BEST_EFFORT to indicate that it would be okay to return x elements with x being
in (0, size], so that only those callers need to deal with this, and all others can expect
[as today] that != 0 means all #size elements were bulk alloc'ed.

Thanks,
Daniel
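The best-effort variant Daniel describes can be modeled the same way. Note the flag name below is his hypothetical (__GFP_BULK_BEST_EFFORT does not exist in the kernel), and the helpers are made up for illustration; the shortage is simulated so the sketch shows the extra bookkeeping such callers would take on:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of hypothetical best-effort bulk semantics: with
 * the (made-up) BULK_BEST_EFFORT flag, a call may return any count x
 * in (0, n]. The shortage here is simulated deterministically. */
#define BULK_BEST_EFFORT 0x1

static int bulk_alloc(void **objs, int n, int flags)
{
	/* Simulate a shortage: best-effort callers only get half. */
	int limit = (flags & BULK_BEST_EFFORT) ? n / 2 : n;
	int i;

	for (i = 0; i < limit; i++)
		objs[i] = malloc(32);
	return i;	/* may be < n in best-effort mode */
}

/* With best-effort semantics the caller must track how many frames
 * were actually covered and handle the leftovers separately. */
static int consume(int n_frames, int flags)
{
	void *objs[64];
	int got = bulk_alloc(objs, n_frames, flags);

	for (int i = 0; i < got; i++)
		free(objs[i]);	/* stand-in for building/receiving skbs */
	return n_frames - got;	/* frames left without an skb */
}
```

The nonzero return value in the best-effort case is the partial-batch state that today's all-or-nothing API lets every caller ignore, which is why gating it behind an opt-in flag keeps existing call sites simple.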
Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 6e03b619c93c..aa1a66ad2ce5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -35,6 +35,7 @@ 
 #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
 #define VETH_XDP_TX_BULK_SIZE	16
+#define VETH_XDP_BATCH		16
 
 struct veth_stats {
 	u64	rx_drops;
@@ -562,14 +563,13 @@  static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
 	return 0;
 }
 
-static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
-					struct xdp_frame *frame,
-					struct veth_xdp_tx_bq *bq,
-					struct veth_stats *stats)
+static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
+					  struct xdp_frame *frame,
+					  struct veth_xdp_tx_bq *bq,
+					  struct veth_stats *stats)
 {
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
-	struct sk_buff *skb;
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -623,13 +623,7 @@  static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
-	skb = xdp_build_skb_from_frame(frame, rq->dev);
-	if (!skb) {
-		xdp_return_frame(frame);
-		stats->rx_drops++;
-	}
-
-	return skb;
+	return frame;
 err_xdp:
 	rcu_read_unlock();
 	xdp_return_frame(frame);
@@ -637,6 +631,37 @@  static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	return NULL;
 }
 
+/* frames array contains VETH_XDP_BATCH at most */
+static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
+				  int n_xdpf, struct veth_xdp_tx_bq *bq,
+				  struct veth_stats *stats)
+{
+	void *skbs[VETH_XDP_BATCH];
+	int i;
+
+	if (xdp_alloc_skb_bulk(skbs, n_xdpf,
+			       GFP_ATOMIC | __GFP_ZERO) < 0) {
+		for (i = 0; i < n_xdpf; i++)
+			xdp_return_frame(frames[i]);
+		stats->rx_drops += n_xdpf;
+
+		return;
+	}
+
+	for (i = 0; i < n_xdpf; i++) {
+		struct sk_buff *skb = skbs[i];
+
+		skb = __xdp_build_skb_from_frame(frames[i], skb,
+						 rq->dev);
+		if (!skb) {
+			xdp_return_frame(frames[i]);
+			stats->rx_drops++;
+			continue;
+		}
+		napi_gro_receive(&rq->xdp_napi, skb);
+	}
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 					struct sk_buff *skb,
 					struct veth_xdp_tx_bq *bq,
@@ -784,32 +809,45 @@  static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 			struct veth_xdp_tx_bq *bq,
 			struct veth_stats *stats)
 {
-	int i, done = 0;
+	int i, done = 0, n_xdpf = 0;
+	void *xdpf[VETH_XDP_BATCH];
 
 	for (i = 0; i < budget; i++) {
 		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
-		struct sk_buff *skb;
 
 		if (!ptr)
 			break;
 
 		if (veth_is_xdp_frame(ptr)) {
+			/* ndo_xdp_xmit */
 			struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
 
 			stats->xdp_bytes += frame->len;
-			skb = veth_xdp_rcv_one(rq, frame, bq, stats);
+			frame = veth_xdp_rcv_one(rq, frame, bq, stats);
+			if (frame) {
+				/* XDP_PASS */
+				xdpf[n_xdpf++] = frame;
+				if (n_xdpf == VETH_XDP_BATCH) {
+					veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf,
+							      bq, stats);
+					n_xdpf = 0;
+				}
+			}
 		} else {
-			skb = ptr;
+			/* ndo_start_xmit */
+			struct sk_buff *skb = ptr;
+
 			stats->xdp_bytes += skb->len;
 			skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
+			if (skb)
+				napi_gro_receive(&rq->xdp_napi, skb);
 		}
-
-		if (skb)
-			napi_gro_receive(&rq->xdp_napi, skb);
-
 		done++;
 	}
 
+	if (n_xdpf)
+		veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf, bq, stats);
+
 	u64_stats_update_begin(&rq->stats.syncp);
 	rq->stats.vs.xdp_redirect += stats->xdp_redirect;
 	rq->stats.vs.xdp_bytes += stats->xdp_bytes;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index c4bfdc9a8b79..a5bc214a49d9 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -169,6 +169,7 @@  struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct net_device *dev);
 struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					 struct net_device *dev);
+int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
 
 static inline
 void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 0d2630a35c3e..05354976c1fc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -514,6 +514,17 @@  void xdp_warn(const char *msg, const char *func, const int line)
 };
 EXPORT_SYMBOL_GPL(xdp_warn);
 
+int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
+{
+	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
+				      n_skb, skbs);
+	if (unlikely(!n_skb))
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
+
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)