From patchwork Thu Aug 26 14:01:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Xing X-Patchwork-Id: 503266 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88575C4320A for ; Thu, 26 Aug 2021 14:01:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 689336054E for ; Thu, 26 Aug 2021 14:01:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242813AbhHZOB7 (ORCPT ); Thu, 26 Aug 2021 10:01:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242769AbhHZOB7 (ORCPT ); Thu, 26 Aug 2021 10:01:59 -0400 Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D27EDC061757; Thu, 26 Aug 2021 07:01:11 -0700 (PDT) Received: by mail-pj1-x1036.google.com with SMTP id oc2-20020a17090b1c0200b00179e56772d6so6636502pjb.4; Thu, 26 Aug 2021 07:01:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=RO9C+XY/b2oSDBaOBWmveNN5U9ZpgXEGeZBUEdMkByc=; b=bG1hDKiMusj+DGd15o3vVNuS1h2qEjdW70gf/I2/HBTkeju2QR12BE2GMWLJuJ6BC2 e11GyuP2Phb6gljmkFJ4FLnyLRLg3HfJqNLwrmC9+++mUvDeAnlNsf0ZK6AgXtWBSnJN z9us9qIKGdRujLCfSidWYWshthUhR6wh++2CVSsyq1U/fRxUwdYIy8lzkUTcGzn85BcR esPHdyfbLEi3Xxg3ZulJOP8j8orTo2VhStEsD75hanA3RugBg9cPmUnPxLe9YE7M7P+M fzW/AogyLUnrjZBr4QLvj+ugfZpK3XMjExllx+1ghxU+TP867bHlIpFTsVFx6fBPyaHi JQ6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RO9C+XY/b2oSDBaOBWmveNN5U9ZpgXEGeZBUEdMkByc=; b=D6mA4wjijLTPvvjTS+0f8YlCsEJPMudlkoFBwzV2c6afrHy9cuZ8zkqyyxudpTLhCt rXXqm8HrdMCvOyZCzDssXrJgUrzzCVNWvZytoJG2BY3iGFUKRVvMSwXWr97BdTIFTgMA +AzCRnA4eKDvT4JvO9oh2oqry96fzmMmzIcx9mxWsGTRVgvtE4R99yCoU9y9+rH7cPEj u4wdXfAG/1yzPY/xU3OSgpismjdLY23fZ7Qk6Jk6r218vNtREG7U1FvDRNYJ7vko04aw eGVwumhBT2TP2pwdvcixANPv/lZCd5RvE1kjnP23IKxJUjDM/84h97K3hL1tDmEeIhOL noiA== X-Gm-Message-State: AOAM533pX7kvxGCq1WRgoIMNvB5Zn0FjZTVZJUomsTFD8wEy+0IxKDtq 3rQiNw6K87iS4In9YQQRVnM= X-Google-Smtp-Source: ABdhPJwnclrXIjon4AHTyVYsBZsgF+prlLHI9SYHXVLtiTbTzZIQr2Yvw8z2KQrTeB0J1km5Lg3+lA== X-Received: by 2002:a17:90a:7087:: with SMTP id g7mr4610981pjk.156.1629986471260; Thu, 26 Aug 2021 07:01:11 -0700 (PDT) Received: from localhost.localdomain ([162.219.34.246]) by smtp.gmail.com with ESMTPSA id j16sm3432234pfi.165.2021.08.26.07.01.06 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Aug 2021 07:01:10 -0700 (PDT) From: kerneljasonxing@gmail.com To: jesse.brandeburg@intel.com, anthony.l.nguyen@intel.com, davem@davemloft.net, kuba@kernel.org, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, kpsingh@kernel.org Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kerneljasonxing@gmail.com, Jason Xing , Shujin Li Subject: [PATCH v3] ixgbe: let the xdpdrv work with more than 64 cpus Date: Thu, 26 Aug 2021 22:01:01 +0800 Message-Id: <20210826140101.7944-1-kerneljasonxing@gmail.com> X-Mailer: git-send-email 2.30.1 (Apple Git-130) In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Jason Xing Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the server is equipped with more than 64 cpus online. So it turns out that the loading of xdpdrv causes the "NOMEM" failure. Actually, we can adjust the algorithm and then make it work, which has no harm at all, only if we set the maxmium number of xdp queues. Fixes: 33fdc82f08 ("ixgbe: add support for XDP_TX action") Co-developed-by: Shujin Li Signed-off-by: Shujin Li Signed-off-by: Jason Xing --- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 15 ++++- drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 9 ++- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 64 ++++++++++++++++------ .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 9 +-- 5 files changed, 73 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index a604552..5f7f181 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -82,6 +82,8 @@ #define IXGBE_2K_TOO_SMALL_WITH_PADDING \ ((NET_SKB_PAD + IXGBE_RXBUFFER_1536) > SKB_WITH_OVERHEAD(IXGBE_RXBUFFER_2K)) +DECLARE_STATIC_KEY_FALSE(ixgbe_xdp_locking_key); + static inline int ixgbe_compute_pad(int rx_buf_len) { int page_size, pad_size; @@ -351,6 +353,7 @@ struct ixgbe_ring { }; u16 rx_offset; struct xdp_rxq_info xdp_rxq; + spinlock_t tx_lock; /* used in XDP mode */ struct xsk_buff_pool *xsk_pool; u16 ring_idx; /* {rx,tx,xdp}_ring back reference idx */ u16 rx_buf_len; @@ -375,7 +378,7 @@ enum ixgbe_ring_f_enum { #define IXGBE_MAX_FCOE_INDICES 8 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1) #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1) -#define MAX_XDP_QUEUES (IXGBE_MAX_FDIR_INDICES + 1) +#define IXGBE_MAX_XDP_QS (IXGBE_MAX_FDIR_INDICES + 1) #define IXGBE_MAX_L2A_QUEUES 4 #define IXGBE_BAD_L2A_QUEUE 3 #define IXGBE_MAX_MACVLANS 63 @@ -629,7 +632,7 @@ struct ixgbe_adapter { /* XDP */ int num_xdp_queues; - struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES]; + struct ixgbe_ring *xdp_ring[IXGBE_MAX_XDP_QS]; unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled rings */ /* TX */ @@ -772,6 +775,14 @@ struct ixgbe_adapter { #endif /* CONFIG_IXGBE_IPSEC */ }; +static inline int ixgbe_determine_xdp_q_idx(int cpu) +{ + if (static_key_enabled(&ixgbe_xdp_locking_key)) + return cpu % IXGBE_MAX_XDP_QS; + else + return cpu; +} + static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter) { switch (adapter->hw.mac.type) { diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c index 0218f6c..884bf99 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c @@ -299,7 +299,10 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter) static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter) { - return adapter->xdp_prog ? nr_cpu_ids : 0; + int queues; + + queues = min_t(int, IXGBE_MAX_XDP_QS, num_online_cpus()); + return adapter->xdp_prog ? queues : 0; } #define IXGBE_RSS_64Q_MASK 0x3F @@ -947,6 +950,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter, ring->count = adapter->tx_ring_count; ring->queue_index = xdp_idx; set_ring_xdp(ring); + spin_lock_init(&ring->tx_lock); /* assign ring to adapter */ WRITE_ONCE(adapter->xdp_ring[xdp_idx], ring); @@ -1032,6 +1036,9 @@ static void ixgbe_free_q_vector(struct ixgbe_adapter *adapter, int v_idx) adapter->q_vector[v_idx] = NULL; __netif_napi_del(&q_vector->napi); + if (static_key_enabled(&ixgbe_xdp_locking_key)) + static_branch_dec(&ixgbe_xdp_locking_key); + /* * after a call to __netif_napi_del() napi may still be used and * ixgbe_get_stats64() might access the rings on this vector, diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 14aea40..a878f40 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -165,6 +165,9 @@ static int ixgbe_notify_dca(struct notifier_block *, unsigned long event, MODULE_DESCRIPTION("Intel(R) 10 Gigabit PCI Express Network Driver"); MODULE_LICENSE("GPL v2"); +DEFINE_STATIC_KEY_FALSE(ixgbe_xdp_locking_key); +EXPORT_SYMBOL(ixgbe_xdp_locking_key); + static struct workqueue_struct *ixgbe_wq; static bool ixgbe_check_cfg_remove(struct ixgbe_hw *hw, struct pci_dev *pdev); @@ -2422,13 +2425,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, xdp_do_flush_map(); if (xdp_xmit & IXGBE_XDP_TX) { - struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()]; + int index = ixgbe_determine_xdp_q_idx(smp_processor_id()); + struct ixgbe_ring *ring = adapter->xdp_ring[index]; - /* Force memory writes to complete before letting h/w - * know there are new descriptors to fetch. - */ - wmb(); - writel(ring->next_to_use, ring->tail); + ixgbe_xdp_ring_update_tail_locked(ring); } u64_stats_update_begin(&rx_ring->syncp); @@ -6320,7 +6320,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter, if (ixgbe_init_rss_key(adapter)) return -ENOMEM; - adapter->af_xdp_zc_qps = bitmap_zalloc(MAX_XDP_QUEUES, GFP_KERNEL); + adapter->af_xdp_zc_qps = bitmap_zalloc(IXGBE_MAX_XDP_QS, GFP_KERNEL); if (!adapter->af_xdp_zc_qps) return -ENOMEM; @@ -8539,21 +8539,32 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb, int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, struct xdp_frame *xdpf) { - struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()]; struct ixgbe_tx_buffer *tx_buffer; union ixgbe_adv_tx_desc *tx_desc; + struct ixgbe_ring *ring; u32 len, cmd_type; dma_addr_t dma; + int index, ret; u16 i; len = xdpf->len; - if (unlikely(!ixgbe_desc_unused(ring))) - return IXGBE_XDP_CONSUMED; + index = ixgbe_determine_xdp_q_idx(smp_processor_id()); + ring = adapter->xdp_ring[index]; + + if (static_branch_unlikely(&ixgbe_xdp_locking_key)) + spin_lock(&ring->tx_lock); + + if (unlikely(!ixgbe_desc_unused(ring))) { + ret = IXGBE_XDP_CONSUMED; + goto out; + } dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE); - if (dma_mapping_error(ring->dev, dma)) - return IXGBE_XDP_CONSUMED; + if (dma_mapping_error(ring->dev, dma)) { + ret = IXGBE_XDP_CONSUMED; + goto out; + } /* record the location of the first descriptor for this packet */ tx_buffer = &ring->tx_buffer_info[ring->next_to_use]; @@ -8590,7 +8601,11 @@ int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, tx_buffer->next_to_watch = tx_desc; ring->next_to_use = i; - return IXGBE_XDP_TX; + ret = IXGBE_XDP_TX; +out: + if (static_branch_unlikely(&ixgbe_xdp_locking_key)) + spin_unlock(&ring->tx_lock); + return ret; } netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb, @@ -10130,8 +10145,13 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) return -EINVAL; } - if (nr_cpu_ids > MAX_XDP_QUEUES) + /* if the number of cpus is much larger than the maximum of queues, + * we should stop it and then return with NOMEM like before. + */ + if (num_online_cpus() > IXGBE_MAX_XDP_QS * 2) return -ENOMEM; + else if (num_online_cpus() > IXGBE_MAX_XDP_QS) + static_branch_inc(&ixgbe_xdp_locking_key); old_prog = xchg(&adapter->xdp_prog, prog); need_reset = (!!prog != !!old_prog); @@ -10195,12 +10215,22 @@ void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring) writel(ring->next_to_use, ring->tail); } +void ixgbe_xdp_ring_update_tail_locked(struct ixgbe_ring *ring) +{ + if (static_branch_unlikely(&ixgbe_xdp_locking_key)) + spin_lock(&ring->tx_lock); + ixgbe_xdp_ring_update_tail(ring); + if (static_branch_unlikely(&ixgbe_xdp_locking_key)) + spin_unlock(&ring->tx_lock); +} + static int ixgbe_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags) { struct ixgbe_adapter *adapter = netdev_priv(dev); struct ixgbe_ring *ring; int nxmit = 0; + int index; int i; if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state))) @@ -10209,10 +10239,12 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n, if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) return -EINVAL; + index = ixgbe_determine_xdp_q_idx(smp_processor_id()); + /* During program transitions its possible adapter->xdp_prog is assigned * but ring has not been configured yet. In this case simply abort xmit. */ - ring = adapter->xdp_prog ? adapter->xdp_ring[smp_processor_id()] : NULL; + ring = adapter->xdp_prog ? adapter->xdp_ring[index] : NULL; if (unlikely(!ring)) return -ENXIO; @@ -10230,7 +10262,7 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n, } if (unlikely(flags & XDP_XMIT_FLUSH)) - ixgbe_xdp_ring_update_tail(ring); + ixgbe_xdp_ring_update_tail_locked(ring); return nxmit; } diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h index 2aeec78..f6426d9 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h @@ -23,6 +23,7 @@ void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring, void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector, struct sk_buff *skb); void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring); +void ixgbe_xdp_ring_update_tail_locked(struct ixgbe_ring *ring); void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask); void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index b1d22e4..82d00e4 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -334,13 +334,10 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, xdp_do_flush_map(); if (xdp_xmit & IXGBE_XDP_TX) { - struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()]; + int index = ixgbe_determine_xdp_q_idx(smp_processor_id()); + struct ixgbe_ring *ring = adapter->xdp_ring[index]; - /* Force memory writes to complete before letting h/w - * know there are new descriptors to fetch. - */ - wmb(); - writel(ring->next_to_use, ring->tail); + ixgbe_xdp_ring_update_tail_locked(ring); } u64_stats_update_begin(&rx_ring->syncp);