
[net-next,v3,0/8] virtio-net support xdp socket zero copy xmit

Message ID 20210331071139.15473-1-xuanzhuo@linux.alibaba.com


Xuan Zhuo March 31, 2021, 7:11 a.m. UTC
XDP socket (xsk) is an excellent kernel-bypass network transmission
framework. The zero copy feature of xsk needs to be supported by the
driver, and the performance of zero copy is very good. mlx5 and Intel
ixgbe already support this feature. This patch set allows virtio-net to
support xsk's zerocopy xmit feature.

xsk's zerocopy rx requires major changes to virtio-net, so I hope to
submit it after this patch set is accepted.

Compared with other drivers, virtio-net does not directly obtain the DMA
address, so I first obtain the xsk page and then pass the page to virtio.

When recycling sent packets, we previously had to distinguish between skb
and xdp. Now we have to distinguish between skb, xdp, and xsk.

Patches #0 and #1 make some adjustments to xsk.

---------------- Performance Testing ------------

A udp packet tool implemented on the xsk interface is compared against
sockperf (kernel udp) for performance testing:

xsk zero copy xmit in virtio-net:
CPU        PPS         MSGSIZE    vhost-cpu
7.9%       511804      64         100%
13.3%      484373      1500       100%

sockperf:
CPU        PPS         MSGSIZE    vhost-cpu
100%       375227      64         89.1%
100%       307322      1500       81.5%

Xuan Zhuo (8):
  xsk: XDP_SETUP_XSK_POOL support option check_dma
  xsk: support get page by addr
  virtio-net: xsk zero copy xmit setup
  virtio-net: xsk zero copy xmit implement wakeup and xmit
  virtio-net: xsk zero copy xmit support xsk unaligned mode
  virtio-net: xsk zero copy xmit kick by threshold
  virtio-net: poll tx call xsk zerocopy xmit
  virtio-net: free old xmit handle xsk

 drivers/net/virtio_net.c   | 449 +++++++++++++++++++++++++++++++++----
 include/linux/netdevice.h  |   1 +
 include/net/xdp_sock_drv.h |  11 +
 net/xdp/xsk_buff_pool.c    |   3 +-
 4 files changed, 415 insertions(+), 49 deletions(-)

--
2.31.0

Comments

Jason Wang April 6, 2021, 4:27 a.m. UTC | #1
On 2021/3/31 3:11 PM, Xuan Zhuo wrote:
> xsk is a high-performance packet receiving and sending technology.
>
> This patch implements the binding and unbinding operations of xsk and
> the virtio-net queue for xsk zero copy xmit.
>
> The xsk zero copy xmit depends on tx napi. So if tx napi is not opened,
> an error will be reported. And the entire operation is under the
> protection of rtnl_lock
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bb4ea9dbc16b..4e25408a2b37 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -22,6 +22,7 @@
>   #include <net/route.h>
>   #include <net/xdp.h>
>   #include <net/net_failover.h>
> +#include <net/xdp_sock_drv.h>
>
>   static int napi_weight = NAPI_POLL_WEIGHT;
>   module_param(napi_weight, int, 0444);
> @@ -133,6 +134,11 @@ struct send_queue {
>   	struct virtnet_sq_stats stats;
>
>   	struct napi_struct napi;
> +
> +	struct {
> +		/* xsk pool */
> +		struct xsk_buff_pool __rcu *pool;
> +	} xsk;
>   };
>
>   /* Internal representation of a receive virtqueue */
> @@ -2526,11 +2532,71 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   	return err;
>   }
>
> +static int virtnet_xsk_pool_enable(struct net_device *dev,
> +				   struct xsk_buff_pool *pool,
> +				   u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +	int ret = -EBUSY;


I'd rather move this under the check of xsk.pool.


> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	/* xsk zerocopy depend on the tx napi */


Need more comments to explain why tx NAPI is required here.

And what's more important, tx NAPI could be enabled/disabled via ethtool,
what if the NAPI is disabled after xsk is enabled?


> +	if (!sq->napi.weight)
> +		return -EPERM;
> +
> +	rcu_read_lock();
> +	if (rcu_dereference(sq->xsk.pool))
> +		goto end;


Under what condition can we reach here?


> +
> +	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> +	 * safe.
> +	 */
> +	rcu_assign_pointer(sq->xsk.pool, pool);
> +	ret = 0;
> +end:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> +	 * safe.
> +	 */
> +	rcu_assign_pointer(sq->xsk.pool, NULL);
> +
> +	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */


Since rtnl is held here, I guess it's better to use synchronize_net().


> +
> +	return 0;
> +}
> +
>   static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   {
>   	switch (xdp->command) {
>   	case XDP_SETUP_PROG:
>   		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
> +	case XDP_SETUP_XSK_POOL:
> +		/* virtio net not use dma before call vring api */
> +		xdp->xsk.check_dma = false;


I think it's better not open code things like this. How about introduce
new parameters in xp_assign_dev()?


> +		if (xdp->xsk.pool)
> +			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
> +						       xdp->xsk.queue_id);
> +		else
> +			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
>   	default:
>   		return -EINVAL;
>   	}


Thanks
Jason Wang April 6, 2021, 6:59 a.m. UTC | #2
On 2021/3/31 3:11 PM, Xuan Zhuo wrote:
> After testing, the performance of calling kick every time is not stable.
> And if all the packets are sent and kicked again, the performance is not
> good. So add a module parameter to specify how many packets are sent to
> call a kick.
>
> 8 is a relatively stable value with the best performance.


Please add some perf numbers here.

Thanks


>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 259fafcf6028..d7e95f55478d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -29,10 +29,12 @@ module_param(napi_weight, int, 0444);
>
>   static bool csum = true, gso = true, napi_tx = true;
>   static int xsk_budget = 32;
> +static int xsk_kick_thr = 8;
>   module_param(csum, bool, 0444);
>   module_param(gso, bool, 0444);
>   module_param(napi_tx, bool, 0644);
>   module_param(xsk_budget, int, 0644);
> +module_param(xsk_kick_thr, int, 0644);
>
>   /* FIXME: MTU in config. */
>   #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -2612,6 +2614,8 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   	struct xdp_desc desc;
>   	int err, packet = 0;
>   	int ret = -EAGAIN;
> +	int need_kick = 0;
> +	int kicks = 0;
>
>   	if (sq->xsk.last_desc.addr) {
>   		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> @@ -2619,6 +2623,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   			return -EBUSY;
>
>   		++packet;
> +		++need_kick;
>   		--budget;
>   		sq->xsk.last_desc.addr = 0;
>   	}
> @@ -2642,13 +2647,25 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   		}
>
>   		++packet;
> +		++need_kick;
> +		if (need_kick > xsk_kick_thr) {
> +			if (virtqueue_kick_prepare(sq->vq) &&
> +			    virtqueue_notify(sq->vq))
> +				++kicks;
> +
> +			need_kick = 0;
> +		}
>   	}
>
>   	if (packet) {
> -		if (virtqueue_kick_prepare(sq->vq) &&
> -		    virtqueue_notify(sq->vq)) {
> +		if (need_kick) {
> +			if (virtqueue_kick_prepare(sq->vq) &&
> +			    virtqueue_notify(sq->vq))
> +				++kicks;
> +		}
> +		if (kicks) {
>   			u64_stats_update_begin(&sq->stats.syncp);
> -			sq->stats.kicks += 1;
> +			sq->stats.kicks += kicks;
>   			u64_stats_update_end(&sq->stats.syncp);
>   		}
>
Jason Wang April 6, 2021, 7:03 a.m. UTC | #3
On 2021/3/31 3:11 PM, Xuan Zhuo wrote:
> poll tx call virtnet_xsk_run, then the data in the xsk tx queue will be
> continuously consumed by napi.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>


I think we need to squash this into patch 4, it looks more like a bug fix
to me.


> ---
>   drivers/net/virtio_net.c | 20 +++++++++++++++++---
>   1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d7e95f55478d..fac7d0020013 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -264,6 +264,9 @@ struct padded_vnet_hdr {
>   	char padding[4];
>   };
>
> +static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			   int budget, bool in_napi);
> +
>   static bool is_xdp_frame(void *ptr)
>   {
>   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> @@ -1553,7 +1556,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	struct send_queue *sq = container_of(napi, struct send_queue, napi);
>   	struct virtnet_info *vi = sq->vq->vdev->priv;
>   	unsigned int index = vq2txq(sq->vq);
> +	struct xsk_buff_pool *pool;
>   	struct netdev_queue *txq;
> +	int work = 0;
>
>   	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>   		/* We don't need to enable cb for XDP */
> @@ -1563,15 +1568,24 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>
>   	txq = netdev_get_tx_queue(vi->dev, index);
>   	__netif_tx_lock(txq, raw_smp_processor_id());
> -	free_old_xmit_skbs(sq, true);
> +	rcu_read_lock();
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (pool) {
> +		work = virtnet_xsk_run(sq, pool, budget, true);
> +		rcu_read_unlock();
> +	} else {
> +		rcu_read_unlock();
> +		free_old_xmit_skbs(sq, true);
> +	}
>   	__netif_tx_unlock(txq);
>
> -	virtqueue_napi_complete(napi, sq->vq, 0);
> +	if (work < budget)
> +		virtqueue_napi_complete(napi, sq->vq, 0);
>
>   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   		netif_tx_wake_queue(txq);
>
> -	return 0;
> +	return work;


Need a separate patch to "fix" the budget returned by poll_tx here.

Thanks


>   }
>
>   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
Magnus Karlsson April 7, 2021, 10:02 a.m. UTC | #4
On Tue, Apr 6, 2021 at 3:13 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On 2021/3/31 3:11 PM, Xuan Zhuo wrote:
> > xsk is a high-performance packet receiving and sending technology.
> >
> > This patch implements the binding and unbinding operations of xsk and
> > the virtio-net queue for xsk zero copy xmit.
> >
> > The xsk zero copy xmit depends on tx napi. So if tx napi is not opened,
> > an error will be reported. And the entire operation is under the
> > protection of rtnl_lock
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 66 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index bb4ea9dbc16b..4e25408a2b37 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -22,6 +22,7 @@
> >   #include <net/route.h>
> >   #include <net/xdp.h>
> >   #include <net/net_failover.h>
> > +#include <net/xdp_sock_drv.h>
> >
> >   static int napi_weight = NAPI_POLL_WEIGHT;
> >   module_param(napi_weight, int, 0444);
> > @@ -133,6 +134,11 @@ struct send_queue {
> >       struct virtnet_sq_stats stats;
> >
> >       struct napi_struct napi;
> > +
> > +     struct {
> > +             /* xsk pool */
> > +             struct xsk_buff_pool __rcu *pool;
> > +     } xsk;
> >   };
> >
> >   /* Internal representation of a receive virtqueue */
> > @@ -2526,11 +2532,71 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >       return err;
> >   }
> >
> > +static int virtnet_xsk_pool_enable(struct net_device *dev,
> > +                                struct xsk_buff_pool *pool,
> > +                                u16 qid)
> > +{
> > +     struct virtnet_info *vi = netdev_priv(dev);
> > +     struct send_queue *sq;
> > +     int ret = -EBUSY;
>
>
> I'd rather move this under the check of xsk.pool.
>
>
> > +
> > +     if (qid >= vi->curr_queue_pairs)
> > +             return -EINVAL;
> > +
> > +     sq = &vi->sq[qid];
> > +
> > +     /* xsk zerocopy depend on the tx napi */
>
>
> Need more comments to explain why tx NAPI is required here.
>
> And what's more important, tx NAPI could be enabled/disabled via ethtool,
> what if the NAPI is disabled after xsk is enabled?
>
>
> > +     if (!sq->napi.weight)
> > +             return -EPERM;
> > +
> > +     rcu_read_lock();
> > +     if (rcu_dereference(sq->xsk.pool))
> > +             goto end;
>
>
> Under what condition can we reach here?


There is already code in the common xsk part that tests for binding
multiple sockets to the same netdev and queue id (in an illegal way
that is). Does this code not work for you? If so, we should fix that
and not introduce a separate check down in the driver. Or maybe I
misunderstand your problem.

The code is here:

struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
                                            u16 queue_id)
{
        if (queue_id < dev->real_num_rx_queues)
                return dev->_rx[queue_id].pool;
        if (queue_id < dev->real_num_tx_queues)
                return dev->_tx[queue_id].pool;

        return NULL;
}

int xp_assign_dev(struct xsk_buff_pool *pool,
                  struct net_device *netdev, u16 queue_id, u16 flags)
{
        :
        :
        if (xsk_get_pool_from_qid(netdev, queue_id))
                return -EBUSY;


>
> > +
> > +     /* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> > +      * safe.
> > +      */
> > +     rcu_assign_pointer(sq->xsk.pool, pool);
> > +     ret = 0;
> > +end:
> > +     rcu_read_unlock();
> > +
> > +     return ret;
> > +}
> > +
> > +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
> > +{
> > +     struct virtnet_info *vi = netdev_priv(dev);
> > +     struct send_queue *sq;
> > +
> > +     if (qid >= vi->curr_queue_pairs)
> > +             return -EINVAL;
> > +
> > +     sq = &vi->sq[qid];
> > +
> > +     /* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> > +      * safe.
> > +      */
> > +     rcu_assign_pointer(sq->xsk.pool, NULL);
> > +
> > +     synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
>
>
> Since rtnl is held here, I guess it's better to use synchronize_net().
>
>
> > +
> > +     return 0;
> > +}
> > +
> >   static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >   {
> >       switch (xdp->command) {
> >       case XDP_SETUP_PROG:
> >               return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
> > +     case XDP_SETUP_XSK_POOL:
> > +             /* virtio net not use dma before call vring api */
> > +             xdp->xsk.check_dma = false;
>
>
> I think it's better not open code things like this. How about introduce
> new parameters in xp_assign_dev()?
>
>
> > +             if (xdp->xsk.pool)
> > +                     return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
> > +                                                    xdp->xsk.queue_id);
> > +             else
> > +                     return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
> >       default:
> >               return -EINVAL;
> >       }
>
>
> Thanks
>