Message ID: cover.1607349924.git.lorenzo@kernel.org
Series: mvneta: introduce XDP multi-buffer support
On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote: > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote: > > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers. > > This is a preliminary patch to enable xdp multi-buffer support. > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > I'm really not a fan of this design. Having to update every driver in > order to initialize a field that was fragmented is a pain. At a > minimum it seems like it might be time to consider introducing some > sort of initializer function for this so that you can update things in > one central place the next time you have to add a new field instead of > having to update every individual driver that supports XDP. Otherwise > this isn't going to scale going forward. Also, a good example of why this might be bothersome for us is the fact that in the meantime the dpaa driver got XDP support and this patch hasn't been updated to include the mb setting in that driver. > > > --- > > drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + > > drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 1 + > > drivers/net/ethernet/cavium/thunder/nicvf_main.c | 1 + > > drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 1 + > > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 1 + > > drivers/net/ethernet/intel/ice/ice_txrx.c | 1 + > > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 1 + > > drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 + > > drivers/net/ethernet/marvell/mvneta.c | 1 + > > drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 1 + > > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 + > > drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 + > > drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 1 + > > drivers/net/ethernet/qlogic/qede/qede_fp.c | 1 + > > drivers/net/ethernet/sfc/rx.c | 1 + > > drivers/net/ethernet/socionext/netsec.c | 1 + > > drivers/net/ethernet/ti/cpsw.c | 1 + > > drivers/net/ethernet/ti/cpsw_new.c | 1 + > > drivers/net/hyperv/netvsc_bpf.c | 1 + > > drivers/net/tun.c | 2 ++ > > drivers/net/veth.c | 1 + > > drivers/net/virtio_net.c | 2 ++ > > drivers/net/xen-netfront.c | 1 + > > net/core/dev.c | 1 + > > 24 files changed, 26 insertions(+) > >
On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: > Introduce xdp_shared_info data structure to contain info about > "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info > allowing to keep most of the frags in the same cache-line. > Introduce some xdp_shared_info helpers aligned to skb_frag* ones > is there or will be a more general purpose use to this xdp_shared_info ? other than hosting frags ? > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- > drivers/net/ethernet/marvell/mvneta.c | 62 +++++++++++++++-------- > ---- > include/net/xdp.h | 52 ++++++++++++++++++++-- > 2 files changed, 82 insertions(+), 32 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index 1e5b5c69685a..d635463609ad 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -2033,14 +2033,17 @@ int mvneta_rx_refill_queue(struct mvneta_port > *pp, struct mvneta_rx_queue *rxq) > [...] > static void > @@ -2278,7 +2281,7 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port > *pp, > struct mvneta_rx_desc *rx_desc, > struct mvneta_rx_queue *rxq, > struct xdp_buff *xdp, int *size, > - struct skb_shared_info *xdp_sinfo, > + struct xdp_shared_info *xdp_sinfo, > struct page *page) > { > struct net_device *dev = pp->dev; > @@ -2301,13 +2304,13 @@ mvneta_swbm_add_rx_fragment(struct > mvneta_port *pp, > if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) { > skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo- > >nr_frags++]; > > - skb_frag_off_set(frag, pp->rx_offset_correction); > - skb_frag_size_set(frag, data_len); > - __skb_frag_set_page(frag, page); > + xdp_set_frag_offset(frag, pp->rx_offset_correction); > + xdp_set_frag_size(frag, data_len); > + xdp_set_frag_page(frag, page); > why three separate setters ? why not just one xdp_set_frag(page, offset, size) ? 
> /* last fragment */ > if (len == *size) { > - struct skb_shared_info *sinfo; > + struct xdp_shared_info *sinfo; > > sinfo = xdp_get_shared_info_from_buff(xdp); > sinfo->nr_frags = xdp_sinfo->nr_frags; > @@ -2324,10 +2327,13 @@ static struct sk_buff * > mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue > *rxq, > struct xdp_buff *xdp, u32 desc_status) > { > - struct skb_shared_info *sinfo = > xdp_get_shared_info_from_buff(xdp); > - int i, num_frags = sinfo->nr_frags; > + struct xdp_shared_info *xdp_sinfo = > xdp_get_shared_info_from_buff(xdp); > + int i, num_frags = xdp_sinfo->nr_frags; > + skb_frag_t frag_list[MAX_SKB_FRAGS]; > struct sk_buff *skb; > > + memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * > num_frags); > + > skb = build_skb(xdp->data_hard_start, PAGE_SIZE); > if (!skb) > return ERR_PTR(-ENOMEM); > @@ -2339,12 +2345,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, > struct mvneta_rx_queue *rxq, > mvneta_rx_csum(pp, desc_status, skb); > > for (i = 0; i < num_frags; i++) { > - skb_frag_t *frag = &sinfo->frags[i]; > + struct page *page = xdp_get_frag_page(&frag_list[i]); > > skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, > - skb_frag_page(frag), > skb_frag_off(frag), > - skb_frag_size(frag), PAGE_SIZE); > - page_pool_release_page(rxq->page_pool, > skb_frag_page(frag)); > + page, > xdp_get_frag_offset(&frag_list[i]), > + xdp_get_frag_size(&frag_list[i]), > PAGE_SIZE); > + page_pool_release_page(rxq->page_pool, page); > } > > return skb; > @@ -2357,7 +2363,7 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > { > int rx_proc = 0, rx_todo, refill, size = 0; > struct net_device *dev = pp->dev; > - struct skb_shared_info sinfo; > + struct xdp_shared_info xdp_sinfo; > struct mvneta_stats ps = {}; > struct bpf_prog *xdp_prog; > u32 desc_status, frame_sz; > @@ -2368,7 +2374,7 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > xdp_buf.rxq = &rxq->xdp_rxq; > xdp_buf.mb = 0; > > - sinfo.nr_frags = 0; > + xdp_sinfo.nr_frags = 0; > > /* Get number of received packets */ > rx_todo = mvneta_rxq_busy_desc_num_get(pp, rxq); > @@ -2412,7 +2418,7 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > } > > mvneta_swbm_add_rx_fragment(pp, rx_desc, rxq, > &xdp_buf, > - &size, &sinfo, > page); > + &size, &xdp_sinfo, > page); > } /* Middle or Last descriptor */ > > if (!(rx_status & MVNETA_RXD_LAST_DESC)) > @@ -2420,7 +2426,7 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > continue; > > if (size) { > - mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, > -1); > + mvneta_xdp_put_buff(pp, rxq, &xdp_buf, > &xdp_sinfo, -1); > goto next; > } > > @@ -2432,7 +2438,7 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > if (IS_ERR(skb)) { > struct mvneta_pcpu_stats *stats = > this_cpu_ptr(pp->stats); > > - mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, > -1); > + mvneta_xdp_put_buff(pp, rxq, &xdp_buf, > &xdp_sinfo, -1); > > u64_stats_update_begin(&stats->syncp); > stats->es.skb_alloc_error++; > @@ -2449,12 +2455,12 @@ static int mvneta_rx_swbm(struct napi_struct > *napi, > napi_gro_receive(napi, skb); > next: > xdp_buf.data_hard_start = NULL; > - sinfo.nr_frags = 0; > + xdp_sinfo.nr_frags = 0; > } > rcu_read_unlock(); > > if (xdp_buf.data_hard_start) > - mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1); > + mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &xdp_sinfo, -1); > > if (ps.xdp_redirect) > xdp_do_flush_map(); > diff --git a/include/net/xdp.h b/include/net/xdp.h > index 70559720ff44..614f66d35ee8 100644 > --- a/include/net/xdp.h > +++ 
b/include/net/xdp.h > @@ -87,10 +87,54 @@ struct xdp_buff { > ((xdp)->data_hard_start + (xdp)->frame_sz - \ > SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > -static inline struct skb_shared_info * > +struct xdp_shared_info { xdp_shared_info is a bad name, we need this to have a specific purpose. xdp_frags should be the proper name, so people will think twice before adding weird bits to this so-called shared_info. > + u16 nr_frags; > + u16 data_length; /* paged area length */ > + skb_frag_t frags[MAX_SKB_FRAGS]; why MAX_SKB_FRAGS? just use a flexible array member skb_frag_t frags[]; and enforce size via the nr_frags and on the construction of the tailroom preserved buffer, which is already being done. this is a waste of unnecessary space, at least by definition of the struct, in your use case you do: memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags); And the tailroom space was already preserved for a full skb_shinfo. So I don't see why you need this array to be of a fixed MAX_SKB_FRAGS size. > +}; > + > +static inline struct xdp_shared_info * > xdp_get_shared_info_from_buff(struct xdp_buff *xdp) > { > - return (struct skb_shared_info *)xdp_data_hard_end(xdp); > + BUILD_BUG_ON(sizeof(struct xdp_shared_info) > > + sizeof(struct skb_shared_info)); > + return (struct xdp_shared_info *)xdp_data_hard_end(xdp); > +} > + Back to my first comment, do we have plans to use this tailroom buffer for other than frag_list use cases? what will be the buffer format then? should we push all new fields to the end of the xdp_shared_info struct? or deal with this tailroom buffer as a stack? my main concern is that for drivers that don't support frag list and still want to utilize the tailroom buffer for other use cases they will have to skip the first sizeof(xdp_shared_info) so they won't break the stack. > +static inline struct page *xdp_get_frag_page(const skb_frag_t *frag) > +{ > + return frag->bv_page; > +} > + > +static inline unsigned int xdp_get_frag_offset(const skb_frag_t > *frag) > +{ > + return frag->bv_offset; > +} > + > +static inline unsigned int xdp_get_frag_size(const skb_frag_t *frag) > +{ > + return frag->bv_len; > +} > + > +static inline void *xdp_get_frag_address(const skb_frag_t *frag) > +{ > + return page_address(xdp_get_frag_page(frag)) + > + xdp_get_frag_offset(frag); > +} > + > +static inline void xdp_set_frag_page(skb_frag_t *frag, struct page > *page) > +{ > + frag->bv_page = page; > +} > + > +static inline void xdp_set_frag_offset(skb_frag_t *frag, u32 offset) > +{ > + frag->bv_offset = offset; > +} > + > +static inline void xdp_set_frag_size(skb_frag_t *frag, u32 size) > +{ > + frag->bv_len = size; > } > > struct xdp_frame { > @@ -120,12 +164,12 @@ static __always_inline void > xdp_frame_bulk_init(struct xdp_frame_bulk *bq) > bq->xa = NULL; > } > > -static inline struct skb_shared_info * > +static inline struct xdp_shared_info * > xdp_get_shared_info_from_frame(struct xdp_frame *frame) > { > void *data_hard_start = frame->data - frame->headroom - > sizeof(*frame); > > - return (struct skb_shared_info *)(data_hard_start + frame- > >frame_sz - > + return (struct xdp_shared_info *)(data_hard_start + frame- > >frame_sz - > SKB_DATA_ALIGN(sizeof(struct > skb_shared_info))); > } > need a comment here why we preserve the size of skb_shared_info, yet the usable buffer is of type xdp_shared_info.
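For reference, the single setter suggested in this review might look like the sketch below. It is an illustration only: the helper name and exact signature are assumptions, not code from the posted series (note it still takes the frag pointer as its first argument; the bv_* fields match the patch above):

    /* Hypothetical combined setter, wrapping the three per-field stores */
    static inline void xdp_set_frag(skb_frag_t *frag, struct page *page,
                                    u32 offset, u32 size)
    {
            frag->bv_page = page;
            frag->bv_offset = offset;
            frag->bv_len = size;
    }

The mvneta fragment-append path would then collapse to a single xdp_set_frag(frag, page, pp->rx_offset_correction, data_len) call.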
> On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote: > > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote: > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi <lorenzo@kernel.org > > > > wrote: > > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers. > > > > This is a preliminary patch to enable xdp multi-buffer support. > > > > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > > > > > I'm really not a fan of this design. Having to update every driver > > > in > > > order to initialize a field that was fragmented is a pain. At a > > > minimum it seems like it might be time to consider introducing some > > > sort of initializer function for this so that you can update things > > > in > > > one central place the next time you have to add a new field instead > > > of > > > having to update every individual driver that supports XDP. > > > Otherwise > > > this isn't going to scale going forward. > > > > Also, a good example of why this might be bothering for us is a fact > > that > > in the meantime the dpaa driver got XDP support and this patch hasn't > > been > > updated to include mb setting in that driver. > > > something like > init_xdp_buff(hard_start, headroom, len, frame_sz, rxq); > > would work for most of the drivers. > ack, agree. I will add init_xdp_buff() in v6. Regards, Lorenzo
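A sketch of the single-call initializer being agreed on here, assuming it takes the xdp_buff as its first argument; only the name and argument list come from the mail, the body is a guess:

    static __always_inline void
    init_xdp_buff(struct xdp_buff *xdp, void *hard_start, int headroom,
                  int len, u32 frame_sz, struct xdp_rxq_info *rxq)
    {
            xdp->data_hard_start = hard_start;
            xdp->data = hard_start + headroom;
            xdp->data_end = xdp->data + len;
            xdp->data_meta = xdp->data;
            xdp->frame_sz = frame_sz;
            xdp->rxq = rxq;
            xdp->mb = 0;    /* drivers would no longer set this by hand */
    }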
> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: > > Introduce xdp_shared_info data structure to contain info about > > "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info > > allowing to keep most of the frags in the same cache-line. > > Introduce some xdp_shared_info helpers aligned to skb_frag* ones > > > > is there or will be a more general purpose use to this xdp_shared_info > ? other than hosting frags ? I do not have other use-cases at the moment other than multi-buff but in theory it is possible I guess. The reason we introduced it is to have most of the frags in the first shared_info cache-line to avoid cache-misses. > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > --- > > drivers/net/ethernet/marvell/mvneta.c | 62 +++++++++++++++-------- > > ---- > > include/net/xdp.h | 52 ++++++++++++++++++++-- > > 2 files changed, 82 insertions(+), 32 deletions(-) > > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > > b/drivers/net/ethernet/marvell/mvneta.c > > index 1e5b5c69685a..d635463609ad 100644 > > --- a/drivers/net/ethernet/marvell/mvneta.c > > +++ b/drivers/net/ethernet/marvell/mvneta.c > > @@ -2033,14 +2033,17 @@ int mvneta_rx_refill_queue(struct mvneta_port > > *pp, struct mvneta_rx_queue *rxq) > > > > [...] > > > static void > > @@ -2278,7 +2281,7 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port > > *pp, > > struct mvneta_rx_desc *rx_desc, > > struct mvneta_rx_queue *rxq, > > struct xdp_buff *xdp, int *size, > > - struct skb_shared_info *xdp_sinfo, > > + struct xdp_shared_info *xdp_sinfo, > > struct page *page) > > { > > struct net_device *dev = pp->dev; > > @@ -2301,13 +2304,13 @@ mvneta_swbm_add_rx_fragment(struct > > mvneta_port *pp, > > if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) { > > skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo- > > >nr_frags++]; > > > > - skb_frag_off_set(frag, pp->rx_offset_correction); > > - skb_frag_size_set(frag, data_len); > > - __skb_frag_set_page(frag, page); > > + xdp_set_frag_offset(frag, pp->rx_offset_correction); > > + xdp_set_frag_size(frag, data_len); > > + xdp_set_frag_page(frag, page); > > > > why three separate setters ? why not just one > xdp_set_frag(page, offset, size) ? to be aligned with skb_frags helpers, but I guess we can have a single helper, I do not have a strong opinion on it > > > /* last fragment */ > > if (len == *size) { > > - struct skb_shared_info *sinfo; > > + struct xdp_shared_info *sinfo; > > > > sinfo = xdp_get_shared_info_from_buff(xdp); > > sinfo->nr_frags = xdp_sinfo->nr_frags; > > @@ -2324,10 +2327,13 @@ static struct sk_buff * > > mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue > > *rxq, > > struct xdp_buff *xdp, u32 desc_status) > > { [...] > > > > -static inline struct skb_shared_info * > > +struct xdp_shared_info { > > xdp_shared_info is a bad name, we need this to have a specific purpose > xdp_frags should the proper name, so people will think twice before > adding weird bits to this so called shared_info. I named the struct xdp_shared_info to recall skb_shared_info but I guess xdp_frags is fine too. Agree? > > > + u16 nr_frags; > > + u16 data_length; /* paged area length */ > > + skb_frag_t frags[MAX_SKB_FRAGS]; > > why MAX_SKB_FRAGS ? just use a flexible array member > skb_frag_t frags[]; > > and enforce size via the n_frags and on the construction of the > tailroom preserved buffer, which is already being done. 
> > this is a waste of unnecessary space, at least by definition of the > struct, in your use case you do: > memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags); > And the tailroom space was already preserved for a full skb_shinfo. > So I don't see why you need this array to be of a fixed MAX_SKB_FRAGS > size. In order to avoid cache-misses, xdp_shared_info is built as a variable on the mvneta_rx_swbm() stack and it is written to the "shared_info" area only on the last fragment in mvneta_swbm_add_rx_fragment(). I used MAX_SKB_FRAGS to be aligned with the skb_shared_info struct but probably we can use even a smaller value. Another approach would be to define two different structs, e.g. struct xdp_frag_metadata { u16 nr_frags; u16 data_length; /* paged area length */ }; struct xdp_frags { skb_frag_t frags[MAX_SKB_FRAGS]; }; and then define xdp_shared_info as struct xdp_shared_info { struct xdp_frag_metadata meta; skb_frag_t frags[]; }; In this way we can probably optimize the space. What do you think? > > > +}; > > + > > +static inline struct xdp_shared_info * > > xdp_get_shared_info_from_buff(struct xdp_buff *xdp) > > { > > - return (struct skb_shared_info *)xdp_data_hard_end(xdp); > > + BUILD_BUG_ON(sizeof(struct xdp_shared_info) > > > + sizeof(struct skb_shared_info)); > > + return (struct xdp_shared_info *)xdp_data_hard_end(xdp); > > +} > > + > > Back to my first comment, do we have plans to use this tailroom buffer > > for other than frag_list use cases? what will be the buffer format > > then? should we push all new fields to the end of the xdp_shared_info > > struct? or deal with this tailroom buffer as a stack? > > my main concern is that for drivers that don't support frag list and > > still want to utilize the tailroom buffer for other use cases they will > > have to skip the first sizeof(xdp_shared_info) so they won't break the > > stack. for the moment I do not know if this area is used for other purposes. Do you think there are other use-cases for it?
> > > +static inline struct page *xdp_get_frag_page(const skb_frag_t *frag) > > +{ > > + return frag->bv_page; > > +} > > + > > +static inline unsigned int xdp_get_frag_offset(const skb_frag_t > > *frag) > > +{ > > + return frag->bv_offset; > > +} > > + > > +static inline unsigned int xdp_get_frag_size(const skb_frag_t *frag) > > +{ > > + return frag->bv_len; > > +} > > + > > +static inline void *xdp_get_frag_address(const skb_frag_t *frag) > > +{ > > + return page_address(xdp_get_frag_page(frag)) + > > + xdp_get_frag_offset(frag); > > +} > > + > > +static inline void xdp_set_frag_page(skb_frag_t *frag, struct page > > *page) > > +{ > > + frag->bv_page = page; > > +} > > + > > +static inline void xdp_set_frag_offset(skb_frag_t *frag, u32 offset) > > +{ > > + frag->bv_offset = offset; > > +} > > + > > +static inline void xdp_set_frag_size(skb_frag_t *frag, u32 size) > > +{ > > + frag->bv_len = size; > > } > > > > struct xdp_frame { > > @@ -120,12 +164,12 @@ static __always_inline void > > xdp_frame_bulk_init(struct xdp_frame_bulk *bq) > > bq->xa = NULL; > > } > > > > -static inline struct skb_shared_info * > > +static inline struct xdp_shared_info * > > xdp_get_shared_info_from_frame(struct xdp_frame *frame) > > { > > void *data_hard_start = frame->data - frame->headroom - > > sizeof(*frame); > > > > - return (struct skb_shared_info *)(data_hard_start + frame- > > >frame_sz - > > + return (struct xdp_shared_info *)(data_hard_start + frame- > > >frame_sz - > > SKB_DATA_ALIGN(sizeof(struct > > skb_shared_info))); > > } > > > > need a comment here why we preserve the size of skb_shared_info, yet > the usable buffer is of type xdp_shared_info. ack, I will add it in v6. Regards, Lorenzo >
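Rendered as plain C, the split layout Lorenzo proposes earlier in this mail would be roughly the following sketch (with the typos fixed; whether the flexible array actually saves anything is exactly what is debated below):

    /* fixed-size metadata up front */
    struct xdp_frag_metadata {
            u16 nr_frags;
            u16 data_length;        /* paged area length */
    };

    struct xdp_shared_info {
            struct xdp_frag_metadata meta;
            skb_frag_t frags[];     /* bounded by meta.nr_frags and by the
                                     * tailroom reserved in the buffer */
    };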
On Tue, 8 Dec 2020 11:31:03 +0100 Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote: > > On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote: > > > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote: > > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi <lorenzo@kernel.org > > > > > wrote: > > > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers. > > > > > This is a preliminary patch to enable xdp multi-buffer support. > > > > > > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > > > > > > > I'm really not a fan of this design. Having to update every driver in > > > > order to initialize a field that was fragmented is a pain. At a > > > > minimum it seems like it might be time to consider introducing some > > > > sort of initializer function for this so that you can update things in > > > > one central place the next time you have to add a new field instead of > > > > having to update every individual driver that supports XDP. Otherwise > > > > this isn't going to scale going forward. +1 > > > Also, a good example of why this might be bothersome for us is the fact that > > > in the meantime the dpaa driver got XDP support and this patch hasn't been > > > updated to include the mb setting in that driver. > > > > > something like > > init_xdp_buff(hard_start, headroom, len, frame_sz, rxq); > > > > would work for most of the drivers. > > > > ack, agree. I will add init_xdp_buff() in v6. I do like the idea of an initializer helper function. Remember this is fast-path code and it likely needs to be inlined. Furthermore, remember that drivers can and do optimize the number of writes they do to xdp_buff. There are a number of fields in xdp_buff that only need to be initialized once per NAPI. E.g. rxq and frame_sz (some drivers do change frame_sz per packet). Thus, you likely need two inlined helpers for init. Again, remember that the C compiler will generate an expensive operation (rep stos) for clearing a struct if it is initialized like this, where not all members are initialized (do NOT do this): struct xdp_buff xdp = { .rxq = rxq, .frame_sz = PAGE_SIZE, }; -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
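Put together, the two-phase init Jesper describes could look like the following sketch. Names and signatures are assumptions (this is not the code that eventually landed); the point is that each phase does plain member stores, so the compiler never has to zero the whole struct:

    /* phase 1: once per NAPI poll */
    static __always_inline void
    xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
    {
            xdp->frame_sz = frame_sz;
            xdp->rxq = rxq;
            xdp->mb = 0;
    }

    /* phase 2: once per packet */
    static __always_inline void
    xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
                     int headroom, int data_len)
    {
            unsigned char *data = hard_start + headroom;

            xdp->data_hard_start = hard_start;
            xdp->data = data;
            xdp->data_end = data + data_len;
            xdp->data_meta = data;
    }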
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: >> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: >> > Introduce xdp_shared_info data structure to contain info >> > about >> > "non-linear" xdp frame. xdp_shared_info will alias >> > skb_shared_info >> > allowing to keep most of the frags in the same cache-line. [...] >> >> > + u16 nr_frags; >> > + u16 data_length; /* paged area length */ >> > + skb_frag_t frags[MAX_SKB_FRAGS]; >> >> why MAX_SKB_FRAGS? just use a flexible array member >> skb_frag_t frags[]; >> >> and enforce size via the nr_frags and on the construction of the >> tailroom preserved buffer, which is already being done. >> >> this is a waste of unnecessary space, at least by definition of >> the >> struct, in your use case you do: >> memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * >> num_frags); >> And the tailroom space was already preserved for a full >> skb_shinfo. >> So I don't see why you need this array to be of a fixed >> MAX_SKB_FRAGS >> size. > > In order to avoid cache-misses, xdp_shared_info is built as a > variable > on the mvneta_rx_swbm() stack and it is written to the "shared_info" > area only on the > last fragment in mvneta_swbm_add_rx_fragment(). I used > MAX_SKB_FRAGS to be > aligned with the skb_shared_info struct but probably we can use even > a smaller value. > Another approach would be to define two different structs, e.g. > > struct xdp_frag_metadata { > u16 nr_frags; > u16 data_length; /* paged area length */ > }; > > struct xdp_frags { > skb_frag_t frags[MAX_SKB_FRAGS]; > }; > > and then define xdp_shared_info as > > struct xdp_shared_info { > struct xdp_frag_metadata meta; > skb_frag_t frags[]; > }; > > In this way we can probably optimize the space. What do you > think? We're still reserving ~sizeof(skb_shared_info) bytes at the end of the first buffer and it seems like in the mvneta code you keep updating all three fields (frags, nr_frags and data_length). Can you explain how the space is optimized by splitting the structs please? >> >> > +}; >> > + >> > +static inline struct xdp_shared_info * >> > xdp_get_shared_info_from_buff(struct xdp_buff *xdp) >> > { >> > - return (struct skb_shared_info *)xdp_data_hard_end(xdp); >> > + BUILD_BUG_ON(sizeof(struct xdp_shared_info) > >> > + sizeof(struct skb_shared_info)); >> > + return (struct xdp_shared_info *)xdp_data_hard_end(xdp); >> > +} >> > + >> >> Back to my first comment, do we have plans to use this tailroom >> buffer >> for other than frag_list use cases? what will be the buffer >> format >> then? should we push all new fields to the end of the >> xdp_shared_info >> struct? or deal with this tailroom buffer as a stack? >> my main concern is that for drivers that don't support frag >> list and >> still want to utilize the tailroom buffer for other use cases >> they will >> have to skip the first sizeof(xdp_shared_info) so they won't >> break the >> stack. > > for the moment I do not know if this area is used for other > purposes. > Do you think there are other use-cases for it? > Saeed, the stack receives skb_shared_info when the frames are passed to the stack (skb_add_rx_frag is used to add the whole information to skb's shared info), and for the XDP_REDIRECT use case, it doesn't seem like all drivers check the page's tailroom for more information anyway (ena doesn't at least). Can you please explain what you mean by "break the stack"? Thanks, Shay >> [...] > >>
On 2020-12-19 9:53 a.m., Shay Agroskin wrote: > > Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: > >> for the moment I do not know if this area is used for other purposes. >> Do you think there are other use-cases for it? Sorry to interject: Does it make sense to use this space to store arbitrary metadata or as a scratchpad? Something equivalent to skb->cb, which is lacking in XDP. cheers, jamal
Lorenzo Bianconi <lorenzo@kernel.org> writes: > Introduce the capability to map non-linear xdp buffer running > mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- > drivers/net/ethernet/marvell/mvneta.c | 94 > ++++++++++++++++----------- > 1 file changed, 56 insertions(+), 38 deletions(-) [...] > if (napi && buf->type == > MVNETA_TYPE_XDP_TX) > xdp_return_frame_rx_napi(buf->xdpf); > else > @@ -2054,45 +2054,64 @@ mvneta_xdp_put_buff(struct mvneta_port > *pp, struct mvneta_rx_queue *rxq, > > static int > mvneta_xdp_submit_frame(struct mvneta_port *pp, struct > mvneta_tx_queue *txq, > - struct xdp_frame *xdpf, bool dma_map) > + struct xdp_frame *xdpf, int *nxmit_byte, > bool dma_map) > { > - struct mvneta_tx_desc *tx_desc; > - struct mvneta_tx_buf *buf; > - dma_addr_t dma_addr; > + struct xdp_shared_info *xdp_sinfo = > xdp_get_shared_info_from_frame(xdpf); > + int i, num_frames = xdpf->mb ? xdp_sinfo->nr_frags + 1 : > 1; > + struct mvneta_tx_desc *tx_desc = NULL; > + struct page *page; > > - if (txq->count >= txq->tx_stop_threshold) > + if (txq->count + num_frames >= txq->size) > return MVNETA_XDP_DROPPED; > > - tx_desc = mvneta_txq_next_desc_get(txq); > + for (i = 0; i < num_frames; i++) { > + struct mvneta_tx_buf *buf = > &txq->buf[txq->txq_put_index]; > + skb_frag_t *frag = i ? &xdp_sinfo->frags[i - 1] : > NULL; > + int len = frag ? xdp_get_frag_size(frag) : > xdpf->len; nit: from a branch prediction point of view, maybe it would be better to write int len = i ? xdp_get_frag_size(frag) : xdpf->len; since the value of i is checked one line above. Disclaimer: I'm far from a compiler expert, and I don't know whether the compiler would know to group these two assignments together into a single branch prediction decision, but it feels like using 'i' would make this decision easier for it. Thanks, Shay [...]
Lorenzo Bianconi <lorenzo@kernel.org> writes: > Introduce __xdp_build_skb_from_frame and > xdp_build_skb_from_frame > utility routines to build the skb from xdp_frame. > Add xdp multi-buff support to cpumap > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > --- > include/net/xdp.h | 5 ++++ > kernel/bpf/cpumap.c | 45 +--------------------------- > net/core/xdp.c | 73 > +++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 79 insertions(+), 44 deletions(-) > [...] > diff --git a/net/core/xdp.c b/net/core/xdp.c > index 6c8e743ad03a..55f3e9c69427 100644 > --- a/net/core/xdp.c > +++ b/net/core/xdp.c > @@ -597,3 +597,76 @@ void xdp_warn(const char *msg, const char > *func, const int line) > WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg); > }; > EXPORT_SYMBOL_GPL(xdp_warn); > + > +struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame > *xdpf, > + struct sk_buff *skb, > + struct net_device *dev) > +{ > + unsigned int headroom = sizeof(*xdpf) + xdpf->headroom; > + void *hard_start = xdpf->data - headroom; > + skb_frag_t frag_list[MAX_SKB_FRAGS]; > + struct xdp_shared_info *xdp_sinfo; > + int i, num_frags = 0; > + > + xdp_sinfo = xdp_get_shared_info_from_frame(xdpf); > + if (unlikely(xdpf->mb)) { > + num_frags = xdp_sinfo->nr_frags; > + memcpy(frag_list, xdp_sinfo->frags, > + sizeof(skb_frag_t) * num_frags); > + } nit, can you please move the xdp_sinfo assignment inside this 'if' ? This would help to emphasize that regarding xdp_frame tailroom as xdp_shared_info struct (rather than skb_shared_info) is correct only when the mb bit is set thanks, Shay > + > + skb = build_skb_around(skb, hard_start, xdpf->frame_sz); > + if (unlikely(!skb)) > + return NULL; [...]
> > > Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: > > >> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: > >> > Introduce xdp_shared_info data structure to contain info > >> > about > >> > "non-linear" xdp frame. xdp_shared_info will alias > >> > skb_shared_info > >> > allowing to keep most of the frags in the same cache-line. > [...] > >> > >> > + u16 nr_frags; > >> > + u16 data_length; /* paged area length */ > >> > + skb_frag_t frags[MAX_SKB_FRAGS]; > >> > >> why MAX_SKB_FRAGS? just use a flexible array member > >> skb_frag_t frags[]; > >> > >> and enforce size via the nr_frags and on the construction of the > >> tailroom preserved buffer, which is already being done. > >> > >> this is a waste of unnecessary space, at least by definition of > >> the > >> struct, in your use case you do: > >> memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * > >> num_frags); > >> And the tailroom space was already preserved for a full > >> skb_shinfo. > >> So I don't see why you need this array to be of a fixed > >> MAX_SKB_FRAGS > >> size. > > > > In order to avoid cache-misses, xdp_shared_info is built as a > > variable > > on the mvneta_rx_swbm() stack and it is written to the "shared_info" > > area only on the > > last fragment in mvneta_swbm_add_rx_fragment(). I used > > MAX_SKB_FRAGS to be > > aligned with the skb_shared_info struct but probably we can use even > > a smaller value. > > Another approach would be to define two different structs, e.g. > > > > struct xdp_frag_metadata { > > u16 nr_frags; > > u16 data_length; /* paged area length */ > > }; > > > > struct xdp_frags { > > skb_frag_t frags[MAX_SKB_FRAGS]; > > }; > > > > and then define xdp_shared_info as > > > > struct xdp_shared_info { > > struct xdp_frag_metadata meta; > > skb_frag_t frags[]; > > }; > > > > In this way we can probably optimize the space. What do you > > think? > > We're still reserving ~sizeof(skb_shared_info) bytes at the end of > the first buffer and it seems like in the mvneta code you keep > updating all three fields (frags, nr_frags and data_length). > Can you explain how the space is optimized by splitting the > structs please? using the xdp_shared_info struct we will have the first 3 fragments in the same cacheline as nr_frags, while using the skb_shared_info struct only the first fragment will be in the same cacheline as nr_frags. Moreover, skb_shared_info has multiple fields unused by xdp. Regards, Lorenzo > > >> > >> > +}; > >> > + > >> > +static inline struct xdp_shared_info * > >> > xdp_get_shared_info_from_buff(struct xdp_buff *xdp) > >> > { > >> > - return (struct skb_shared_info *)xdp_data_hard_end(xdp); > >> > + BUILD_BUG_ON(sizeof(struct xdp_shared_info) > > >> > + sizeof(struct skb_shared_info)); > >> > + return (struct xdp_shared_info *)xdp_data_hard_end(xdp); > >> > +} > >> > + > >> > >> Back to my first comment, do we have plans to use this tailroom buffer > >> for other than frag_list use cases? what will be the buffer > >> format > >> then? should we push all new fields to the end of the > >> xdp_shared_info > >> struct? or deal with this tailroom buffer as a stack? > >> my main concern is that for drivers that don't support frag > >> list and > >> still want to utilize the tailroom buffer for other use cases > >> they will > >> have to skip the first sizeof(xdp_shared_info) so they won't > >> break the > >> stack. > > > > for the moment I do not know if this area is used for other > > purposes. > > Do you think there are other use-cases for it?
> > > > Saeed, the stack receives skb_shared_info when the frames are > passed to the stack (skb_add_rx_frag is used to add the whole > information to skb's shared info), and for XDP_REDIRECT use case, > it doesn't seem like all drivers check page's tailroom for more > information anyway (ena doesn't at least). > Can you please explain what do you mean by "break the stack"? > > Thanks, Shay > > >> > [...] > > > >> >
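To illustrate the cache-line argument Lorenzo makes above, a rough layout sketch follows. The offsets are assumptions (x86-64, 64-byte cache lines, 16-byte skb_frag_t), not verified against a build:

    struct xdp_shared_info {
            u16 nr_frags;                    /* offset 0 */
            u16 data_length;                 /* offset 2 */
            skb_frag_t frags[MAX_SKB_FRAGS]; /* frags[0..2] end at offset 52,
                                              * in the same 64-byte cache line
                                              * as nr_frags */
    };

    /* In skb_shared_info, roughly 48 bytes of GSO/frag_list/tstamp/dataref
     * fields sit between nr_frags and frags[], so only frags[0] can share
     * a cache line with nr_frags. */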
> > > Lorenzo Bianconi <lorenzo@kernel.org> writes: > > > Introduce __xdp_build_skb_from_frame and > > xdp_build_skb_from_frame > > utility routines to build the skb from xdp_frame. > > Add xdp multi-buff support to cpumap > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > --- > > include/net/xdp.h | 5 ++++ > > kernel/bpf/cpumap.c | 45 +--------------------------- > > net/core/xdp.c | 73 > > +++++++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 79 insertions(+), 44 deletions(-) > > > [...] > > diff --git a/net/core/xdp.c b/net/core/xdp.c > > index 6c8e743ad03a..55f3e9c69427 100644 > > --- a/net/core/xdp.c > > +++ b/net/core/xdp.c > > @@ -597,3 +597,76 @@ void xdp_warn(const char *msg, const char > > *func, const int line) > > WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg); > > }; > > EXPORT_SYMBOL_GPL(xdp_warn); > > + > > +struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame > > *xdpf, > > + struct sk_buff *skb, > > + struct net_device *dev) > > +{ > > + unsigned int headroom = sizeof(*xdpf) + xdpf->headroom; > > + void *hard_start = xdpf->data - headroom; > > + skb_frag_t frag_list[MAX_SKB_FRAGS]; > > + struct xdp_shared_info *xdp_sinfo; > > + int i, num_frags = 0; > > + > > + xdp_sinfo = xdp_get_shared_info_from_frame(xdpf); > > + if (unlikely(xdpf->mb)) { > > + num_frags = xdp_sinfo->nr_frags; > > + memcpy(frag_list, xdp_sinfo->frags, > > + sizeof(skb_frag_t) * num_frags); > > + } > > nit, can you please move the xdp_sinfo assignment inside this 'if' > ? This would help to emphasize that regarding xdp_frame tailroom > as xdp_shared_info struct (rather than skb_shared_info) is correct > only when the mb bit is set > > thanks, > Shay ack, will do in v6. Regards, Lorenzo > > > + > > + skb = build_skb_around(skb, hard_start, xdpf->frame_sz); > > + if (unlikely(!skb)) > > + return NULL; > [...] >
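The rearranged prologue being acked here might look like the following sketch (not the actual v6 code): the tailroom is only read as xdp_shared_info once the mb bit says the frame is multi-buffer:

    unsigned int headroom = sizeof(*xdpf) + xdpf->headroom;
    void *hard_start = xdpf->data - headroom;
    skb_frag_t frag_list[MAX_SKB_FRAGS];
    int i, num_frags = 0;

    if (unlikely(xdpf->mb)) {
            struct xdp_shared_info *xdp_sinfo;

            xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
            num_frags = xdp_sinfo->nr_frags;
            memcpy(frag_list, xdp_sinfo->frags,
                   sizeof(skb_frag_t) * num_frags);
    }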
On Sat, Dec 19, 2020 at 4:56 PM Shay Agroskin <shayagr@amazon.com> wrote: > > > Lorenzo Bianconi <lorenzo@kernel.org> writes: > > > Introduce the capability to map non-linear xdp buffer running > > mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> > > --- > > drivers/net/ethernet/marvell/mvneta.c | 94 > > ++++++++++++++++----------- > > 1 file changed, 56 insertions(+), 38 deletions(-) > [...] > > if (napi && buf->type == > > MVNETA_TYPE_XDP_TX) > > xdp_return_frame_rx_napi(buf->xdpf); > > else > > @@ -2054,45 +2054,64 @@ mvneta_xdp_put_buff(struct mvneta_port > > *pp, struct mvneta_rx_queue *rxq, > > > > static int > > mvneta_xdp_submit_frame(struct mvneta_port *pp, struct > > mvneta_tx_queue *txq, > > - struct xdp_frame *xdpf, bool dma_map) > > + struct xdp_frame *xdpf, int *nxmit_byte, > > bool dma_map) > > { > > - struct mvneta_tx_desc *tx_desc; > > - struct mvneta_tx_buf *buf; > > - dma_addr_t dma_addr; > > + struct xdp_shared_info *xdp_sinfo = > > xdp_get_shared_info_from_frame(xdpf); > > + int i, num_frames = xdpf->mb ? xdp_sinfo->nr_frags + 1 : > > 1; > > + struct mvneta_tx_desc *tx_desc = NULL; > > + struct page *page; > > > > - if (txq->count >= txq->tx_stop_threshold) > > + if (txq->count + num_frames >= txq->size) > > return MVNETA_XDP_DROPPED; > > > > - tx_desc = mvneta_txq_next_desc_get(txq); > > + for (i = 0; i < num_frames; i++) { > > + struct mvneta_tx_buf *buf = > > &txq->buf[txq->txq_put_index]; > > + skb_frag_t *frag = i ? &xdp_sinfo->frags[i - 1] : > > NULL; > > + int len = frag ? xdp_get_frag_size(frag) : > > xdpf->len; > > nit, from branch prediction point of view, maybe it would be > better to write > int len = i ? xdp_get_frag_size(frag) : xdpf->len; > ack, I will fix it in v6. Regards, Lorenzo > since the value of i is checked one line above > Disclaimer: I'm far from a compiler expert, and don't know whether > the compiler would know to group these two assignments together > into a single branch prediction decision, but it feels like using > 'i' would make this decision easier for it. > > Thanks, > Shay > > [...] >
On Sat, 19 Dec 2020 10:30:57 -0500 Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On 2020-12-19 9:53 a.m., Shay Agroskin wrote: > > > > Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: > > > > >> for the moment I do not know if this area is used for other purposes. > >> Do you think there are other use-cases for it? Yes, all the same use-cases as SKBs have. I wanted to keep this the same as skb_shared_info, but Lorenzo chose to take John's advice and it's going in this direction (which is fine, we can always change and adjust this later). > Sorry to interject: > Does it make sense to use this space to store arbitrary metadata or as a > scratchpad? Something equivalent to skb->cb, which is lacking in > XDP. Well, XDP has the data_meta area. But it is difficult to rely on because a lot of drivers don't implement it. And Saeed and I plan to use this area and populate it with driver info from the RX descriptor. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
On 2020-12-21 4:01 a.m., Jesper Dangaard Brouer wrote: > On Sat, 19 Dec 2020 10:30:57 -0500 >> Sorry to interject: >> Does it make sense to use this space to store arbitrary metadata or as a >> scratchpad? Something equivalent to skb->cb, which is lacking in >> XDP. > > Well, XDP has the data_meta area. But it is difficult to rely on because a > lot of drivers don't implement it. And Saeed and I plan to use this > area and populate it with driver info from the RX descriptor. > What I was thinking of is a scratchpad that I can write to within an XDP prog (not the driver); for example, in a prog array map the scratchpad is written by one program in the array and read by another later on. skb->cb allows for that. Unless you mean I can already write to some XDP data_meta area? cheers, jamal
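For comparison, writing the data_meta area from an XDP program already works roughly like the minimal sketch below (only on drivers that implement data_meta, as Jesper notes above; the struct layout and the mark value are arbitrary):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct meta {
            __u32 mark;     /* scratchpad value for a later program */
    };

    SEC("xdp")
    int xdp_write_meta(struct xdp_md *ctx)
    {
            struct meta *m;

            /* grow the metadata area in front of the packet data */
            if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*m)))
                    return XDP_ABORTED;

            m = (void *)(long)ctx->data_meta;
            /* verifier-mandated bounds check before the write */
            if ((void *)(m + 1) > (void *)(long)ctx->data)
                    return XDP_ABORTED;

            m->mark = 42;
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";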
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: >> >> >> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes: >> >> >> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: >> >> > Introduce xdp_shared_info data structure to contain info >> >> > about >> >> > "non-linear" xdp frame. xdp_shared_info will alias >> >> > skb_shared_info >> >> > allowing to keep most of the frags in the same cache-line. >> [...] >> >> >> >> > + u16 nr_frags; >> >> > + u16 data_length; /* paged area length */ >> >> > + skb_frag_t frags[MAX_SKB_FRAGS]; >> >> >> >> why MAX_SKB_FRAGS? just use a flexible array member >> >> skb_frag_t frags[]; >> >> >> >> and enforce size via the nr_frags and on the construction of >> >> the >> >> tailroom preserved buffer, which is already being done. >> >> >> >> this is a waste of unnecessary space, at least by definition >> >> of >> >> the >> >> struct, in your use case you do: >> >> memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * >> >> num_frags); >> >> And the tailroom space was already preserved for a full >> >> skb_shinfo. >> >> So I don't see why you need this array to be of a fixed >> >> MAX_SKB_FRAGS >> >> size. >> > >> > In order to avoid cache-misses, xdp_shared_info is built as a >> > variable >> > on the mvneta_rx_swbm() stack and it is written to the "shared_info" >> > area only on the >> > last fragment in mvneta_swbm_add_rx_fragment(). I used >> > MAX_SKB_FRAGS to be >> > aligned with the skb_shared_info struct but probably we can use >> > even >> > a smaller value. >> > Another approach would be to define two different structs, >> > e.g. >> > >> > struct xdp_frag_metadata { >> > u16 nr_frags; >> > u16 data_length; /* paged area length */ >> > }; >> > >> > struct xdp_frags { >> > skb_frag_t frags[MAX_SKB_FRAGS]; >> > }; >> > >> > and then define xdp_shared_info as >> > >> > struct xdp_shared_info { >> > struct xdp_frag_metadata meta; >> > skb_frag_t frags[]; >> > }; >> > >> > In this way we can probably optimize the space. What do you >> > think? >> >> We're still reserving ~sizeof(skb_shared_info) bytes at the end >> of >> the first buffer and it seems like in the mvneta code you keep >> updating all three fields (frags, nr_frags and data_length). >> Can you explain how the space is optimized by splitting the >> structs please? > > using the xdp_shared_info struct we will have the first 3 fragments > in the > same cacheline as nr_frags, while using the skb_shared_info struct > only the > first fragment will be in the same cacheline as > nr_frags. Moreover, > skb_shared_info has multiple fields unused by xdp. > > Regards, > Lorenzo > Thanks for your reply. I was actually referring to your suggestion to Saeed. Namely, defining struct xdp_shared_info { struct xdp_frag_metadata meta; skb_frag_t frags[]; }. I don't see what benefits there are to this scheme compared to the original patch. Thanks, Shay >> >> >> >> >> > +}; >> >> > + [...] >> >> Saeed, the stack receives skb_shared_info when the frames are >> passed to the stack (skb_add_rx_frag is used to add the whole >> information to skb's shared info), and for the XDP_REDIRECT use >> case, >> it doesn't seem like all drivers check the page's tailroom for more >> information anyway (ena doesn't at least). >> Can you please explain what you mean by "break the stack"? >> >> Thanks, Shay >> >> >> >> [...] >> > >> >> >>