From patchwork Thu Feb 18 20:50:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 384748
Date: Thu, 18 Feb 2021 20:50:02 +0000
To: Daniel Borkmann, Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Jonathan Lemon, Alexei Starovoitov, Björn Töpel, Jesper Dangaard Brouer,
 John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
 Yonghong Song, KP Singh, Paolo Abeni, Eric Dumazet, Xuan Zhuo, Dust Li,
 Alexander Lobakin, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v8 bpf-next 2/5] net: add priv_flags for allow tx skb without linear
Message-ID: <20210218204908.5455-3-alobakin@pm.me>
In-Reply-To: <20210218204908.5455-1-alobakin@pm.me>
References: <20210218204908.5455-1-alobakin@pm.me>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Xuan Zhuo

In some cases, we want to construct an skb directly on top of existing
memory without copying data. The page is then placed directly in the
skb, and the linear space of the skb is empty. Unfortunately, many
network cards do not support this operation. For example, the Mellanox
Technologies MT27710 Family [ConnectX-4 Lx] reports the following
error:

    mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
    00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
    WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
    00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
    00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb

So a priv_flag is added here to indicate whether the network card
supports this feature.

Signed-off-by: Xuan Zhuo
Suggested-by: Alexander Lobakin
[ alobakin: give a new flag more detailed description ]
Signed-off-by: Alexander Lobakin
Acked-by: John Fastabend
---
 include/linux/netdevice.h | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.30.1

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3b6f82c2c271..6cef47b76cc6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1518,6 +1518,8 @@ struct net_device_ops {
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
  * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running
+ * @IFF_TX_SKB_NO_LINEAR: device/driver is capable of xmitting frames with
+ *	skb_headlen(skb) == 0 (data starts from frag0)
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1551,6 +1553,7 @@ enum netdev_priv_flags {
 	IFF_FAILOVER_SLAVE		= 1<<28,
 	IFF_L3MDEV_RX_HANDLER		= 1<<29,
 	IFF_LIVE_RENAME_OK		= 1<<30,
+	IFF_TX_SKB_NO_LINEAR		= 1<<31,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1584,6 +1587,7 @@ enum netdev_priv_flags {
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
 #define IFF_L3MDEV_RX_HANDLER		IFF_L3MDEV_RX_HANDLER
 #define IFF_LIVE_RENAME_OK		IFF_LIVE_RENAME_OK
+#define IFF_TX_SKB_NO_LINEAR		IFF_TX_SKB_NO_LINEAR
 
 /**
  *	struct net_device - The DEVICE structure.

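
A caller that wants to hand over skbs with an empty linear area would
presumably test the new flag first and fall back to copying otherwise.
The userspace sketch below only models that check. struct fake_netdev
and can_xmit_nolinear() are hypothetical names used for illustration,
and the flag is redefined locally as an unsigned constant so that the
snippet builds outside the kernel tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the flag added by this patch (bit 31 of priv_flags). */
#define IFF_TX_SKB_NO_LINEAR (UINT32_C(1) << 31)

/* Hypothetical, minimal model of the relevant part of struct net_device. */
struct fake_netdev {
	uint32_t priv_flags;
};

/*
 * Return true if the device declared that it can transmit frames with
 * skb_headlen(skb) == 0. When this returns false, the caller is expected
 * to copy the payload into the linear area instead.
 */
static bool can_xmit_nolinear(const struct fake_netdev *dev)
{
	return (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) != 0;
}
```

Since the flag occupies the topmost bit of a 32-bit field, the sketch
uses an unsigned constant to keep the shift well defined in portable C.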
From patchwork Thu Feb 18 20:50:31 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 384747
Date: Thu, 18 Feb 2021 20:50:31 +0000
To: Daniel Borkmann, Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Jonathan Lemon, Alexei Starovoitov, Björn Töpel, Jesper Dangaard Brouer,
 John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
 Yonghong Song, KP Singh, Paolo Abeni, Eric Dumazet, Xuan Zhuo, Dust Li,
 Alexander Lobakin, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v8 bpf-next 4/5] xsk: respect device's headroom and tailroom on generic xmit path
Message-ID: <20210218204908.5455-5-alobakin@pm.me>
In-Reply-To: <20210218204908.5455-1-alobakin@pm.me>
References: <20210218204908.5455-1-alobakin@pm.me>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

xsk_generic_xmit() allocates a new skb and then queues it for
xmitting. The new skb is allocated with a size of exactly desc->len,
so it comes to the driver/device with no reserved headroom and/or
tailroom. Lots of drivers need some headroom (and sometimes tailroom)
to prepend (and/or append) some headers or data, e.g. CPU tags,
device-specific headers/descriptors (LSO, TLS etc.), and in case of
no available space skb_cow_head() will reallocate the skb.
Reallocations are unwanted on fast-path, especially when it comes to
XDP, so generic XSK xmit should reserve the spaces declared in
dev->needed_headroom and dev->needed_tailroom to avoid them.

Note on max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)):

Usually, output functions reserve LL_RESERVED_SPACE(dev), which
consists of dev->hard_header_len + dev->needed_headroom, aligned
by 16.
However, on XSK xmit hard header is already here in the chunk, so
hard_header_len is not needed. But it'd still be better to align
data up to cacheline, while reserving no less than driver requests
for headroom. NET_SKB_PAD here is to doubly ensure there will be
no reallocations even when the driver advertises no needed_headroom,
but in fact needs it (not so rare case).

Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Alexander Lobakin
Acked-by: Magnus Karlsson
Acked-by: John Fastabend
---
 net/xdp/xsk.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.30.1

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4faabd1ecfd1..143979ea4165 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -454,12 +454,16 @@ static int xsk_generic_xmit(struct sock *sk)
 	struct sk_buff *skb;
 	unsigned long flags;
 	int err = 0;
+	u32 hr, tr;
 
 	mutex_lock(&xs->mutex);
 
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
+	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	tr = xs->dev->needed_tailroom;
+
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
 		char *buffer;
 		u64 addr;
@@ -471,11 +475,13 @@ static int xsk_generic_xmit(struct sock *sk)
 		}
 
 		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
+		skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))
 			goto out;
 
+		skb_reserve(skb, hr);
 		skb_put(skb, len);
+
 		addr = desc.addr;
 		buffer = xsk_buff_raw_get_data(xs->pool, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
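
To make the headroom arithmetic concrete, here is a userspace sketch of
the same computation. The constants are assumptions for illustration: a
64-byte cache line, and NET_SKB_PAD taken as max(32, L1_CACHE_BYTES).
xsk_headroom() and xsk_alloc_size() are hypothetical helpers, not
kernel functions.

```c
#include <stdint.h>

/* Assumed values for illustration; both depend on the architecture. */
#define L1_CACHE_BYTES 64u
#define NET_SKB_PAD    (L1_CACHE_BYTES > 32u ? L1_CACHE_BYTES : 32u)

/* Round x up to the next multiple of the cache line size. */
#define L1_CACHE_ALIGN(x) \
	(((x) + L1_CACHE_BYTES - 1u) & ~(L1_CACHE_BYTES - 1u))

/* Hypothetical helper: headroom reserved in front of the frame data. */
static uint32_t xsk_headroom(uint32_t needed_headroom)
{
	uint32_t aligned = L1_CACHE_ALIGN(needed_headroom);

	/* Never reserve less than NET_SKB_PAD, even if the driver asks for 0. */
	return aligned > NET_SKB_PAD ? aligned : NET_SKB_PAD;
}

/* Hypothetical helper: total size requested from the skb allocator. */
static uint32_t xsk_alloc_size(uint32_t needed_headroom,
			       uint32_t needed_tailroom, uint32_t len)
{
	return xsk_headroom(needed_headroom) + len + needed_tailroom;
}
```

Under these assumed constants, a driver advertising a needed_headroom
of 80 gets 128 bytes of headroom, while one advertising 0 still gets
64 bytes from NET_SKB_PAD.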