From patchwork Thu Jun 24 18:06:17 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466711
Date: Thu, 24 Jun 2021 11:06:17 -0700
Message-Id: <20210624180632.3659809-2-bcf@google.com>
In-Reply-To: <20210624180632.3659809-1-bcf@google.com>
Subject: [PATCH net-next 01/16] gve: Update GVE documentation to
describe DQO
From: Bailey Forrest
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan

DQO is a new descriptor format for our next generation virtual NIC.

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 .../device_drivers/ethernet/google/gve.rst    | 53 +++++++++++++++++--
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/google/gve.rst b/Documentation/networking/device_drivers/ethernet/google/gve.rst
index 793693cef6e3..6d73ee78f3d7 100644
--- a/Documentation/networking/device_drivers/ethernet/google/gve.rst
+++ b/Documentation/networking/device_drivers/ethernet/google/gve.rst
@@ -47,13 +47,24 @@ The driver interacts with the device in the following ways:
 - Transmit and Receive Queues
     - See description below
 
+Descriptor Formats
+------------------
+GVE supports two descriptor formats: GQI and DQO. These two formats have
+entirely different descriptors, which will be described below.
+
 Registers
 ---------
-All registers are MMIO and big endian.
+All registers are MMIO.
 
 The registers are used for initializing and configuring the device as well as
 querying device status in response to management interrupts.
 
+Endianness
+----------
+- Admin Queue messages and registers are all Big Endian.
+- GQI descriptors and datapath registers are Big Endian.
+- DQO descriptors and datapath registers are Little Endian.
+
 Admin Queue (AQ)
 ----------------
 The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
@@ -97,10 +108,10 @@ the queues associated with that interrupt.
 The handler for these irqs schedule the napi for that block to run and poll the
 queues.
 
-Traffic Queues
---------------
-gVNIC's queues are composed of a descriptor ring and a buffer and are
-assigned to a notification block.
+GQI Traffic Queues
+------------------
+GQI queues are composed of a descriptor ring and a buffer and are assigned to a
+notification block.
 
 The descriptor rings are power-of-two-sized ring buffers consisting of
 fixed-size descriptors. They advance their head pointer using a __be32
@@ -121,3 +132,35 @@ Receive
 The buffers for receive rings are put into a data ring that is the same
 length as the descriptor ring and the head and tail pointers advance over the
 rings together.
+
+DQO Traffic Queues
+------------------
+- Every TX and RX queue is assigned a notification block.
+
+- TX and RX buffer queues, which send descriptors to the device, use MMIO
+  doorbells to notify the device of new descriptors.
+
+- RX and TX completion queues, which receive descriptors from the device, use a
+  "generation bit" to know when a descriptor was populated by the device. The
+  driver initializes all bits with the "current generation". The device will
+  populate received descriptors with the "next generation" which is inverted
+  from the current generation. When the ring wraps, the current/next generation
+  are swapped.
+
+- It's the driver's responsibility to ensure that the RX and TX completion
+  queues are not overrun. This can be accomplished by limiting the number of
+  descriptors posted to HW.
+
+- TX packets have a 16-bit completion_tag and RX buffers have a 16-bit
+  buffer_id. These will be returned on the TX completion and RX queues
+  respectively to let the driver know which packet/buffer was completed.
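(Editor's illustration, not part of gve.rst or this patch.) The generation-bit scheme in the list above can be made concrete with a few lines of C. Everything here is a simplified, hypothetical model: the ring size, struct layout, and field names are invented, and the memory barriers a real driver needs are omitted.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256			/* hypothetical ring size */

struct compl_desc {
	uint16_t completion_tag;	/* which packet/buffer completed */
	uint8_t generation;		/* written by the device */
};

struct compl_ring {
	struct compl_desc desc[RING_SIZE];	/* zeroed at init */
	size_t head;			/* next slot the driver reads */
	uint8_t exp_gen;		/* generation the device writes next */
};

/* The ring starts zeroed ("current generation" 0), so the device's first
 * pass writes generation 1; a slot is valid once its bit matches exp_gen.
 */
static bool compl_ring_poll(struct compl_ring *ring, uint16_t *tag)
{
	struct compl_desc *d = &ring->desc[ring->head];

	if (d->generation != ring->exp_gen)
		return false;		/* device has not written this slot */

	*tag = d->completion_tag;
	if (++ring->head == RING_SIZE) {
		ring->head = 0;
		ring->exp_gen ^= 1;	/* wrap: current/next generation swap */
	}
	return true;
}

With exp_gen initialized to 1, the driver never has to clear descriptors behind the device: the swap on wrap is what distinguishes stale entries from freshly written ones.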
+
+Transmit
+~~~~~~~~
+A packet's buffers are DMA mapped for the device to access before transmission.
+After the packet has been successfully transmitted, the buffers are unmapped.
+
+Receive
+~~~~~~~
+The driver posts fixed-size buffers to HW on the RX buffer queue. The packet
+received on the associated RX queue may span multiple descriptors.

From patchwork Thu Jun 24 18:06:19 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466710
Received: Thu, 24 Jun 2021 11:07:36
-0700 (PDT) Date: Thu, 24 Jun 2021 11:06:19 -0700 In-Reply-To: <20210624180632.3659809-1-bcf@google.com> Message-Id: <20210624180632.3659809-4-bcf@google.com> Mime-Version: 1.0 References: <20210624180632.3659809-1-bcf@google.com> X-Mailer: git-send-email 2.32.0.93.g670b81a890-goog Subject: [PATCH net-next 03/16] gve: gve_rx_copy: Move padding to an argument From: Bailey Forrest To: Bailey Forrest , "David S . Miller" Cc: netdev@vger.kernel.org, Willem de Bruijn , Catherine Sullivan Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Future use cases will have a different padding value. Signed-off-by: Bailey Forrest Reviewed-by: Willem de Bruijn Reviewed-by: Catherine Sullivan --- drivers/net/ethernet/google/gve/gve_rx.c | 4 ++-- drivers/net/ethernet/google/gve/gve_utils.c | 5 +++-- drivers/net/ethernet/google/gve/gve_utils.h | 3 ++- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index 2cfedf4bf5d8..c51578c1e2b2 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -350,7 +350,7 @@ gve_rx_qpl(struct device *dev, struct net_device *netdev, gve_rx_flip_buff(page_info, &data_slot->qpl_offset); } } else { - skb = gve_rx_copy(netdev, napi, page_info, len); + skb = gve_rx_copy(netdev, napi, page_info, len, GVE_RX_PAD); if (skb) { u64_stats_update_begin(&rx->statss); rx->rx_copied_pkt++; @@ -392,7 +392,7 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc, if (len <= priv->rx_copybreak) { /* Just copy small packets */ - skb = gve_rx_copy(dev, napi, page_info, len); + skb = gve_rx_copy(dev, napi, page_info, len, GVE_RX_PAD); u64_stats_update_begin(&rx->statss); rx->rx_copied_pkt++; rx->rx_copybreak_pkt++; diff --git a/drivers/net/ethernet/google/gve/gve_utils.c b/drivers/net/ethernet/google/gve/gve_utils.c index 2bfff0f75519..eb3d67c8b3ac 100644 --- a/drivers/net/ethernet/google/gve/gve_utils.c +++ b/drivers/net/ethernet/google/gve/gve_utils.c @@ -45,10 +45,11 @@ void gve_rx_add_to_block(struct gve_priv *priv, int queue_idx) } struct sk_buff *gve_rx_copy(struct net_device *dev, struct napi_struct *napi, - struct gve_rx_slot_page_info *page_info, u16 len) + struct gve_rx_slot_page_info *page_info, u16 len, + u16 pad) { struct sk_buff *skb = napi_alloc_skb(napi, len); - void *va = page_info->page_address + GVE_RX_PAD + + void *va = page_info->page_address + pad + (page_info->page_offset ? 
PAGE_SIZE / 2 : 0);
 
 	if (unlikely(!skb))

diff --git a/drivers/net/ethernet/google/gve/gve_utils.h b/drivers/net/ethernet/google/gve/gve_utils.h
index 76540374a083..8fb39b990bbc 100644
--- a/drivers/net/ethernet/google/gve/gve_utils.h
+++ b/drivers/net/ethernet/google/gve/gve_utils.h
@@ -18,7 +18,8 @@ void gve_rx_remove_from_block(struct gve_priv *priv, int queue_idx);
 void gve_rx_add_to_block(struct gve_priv *priv, int queue_idx);
 
 struct sk_buff *gve_rx_copy(struct net_device *dev, struct napi_struct *napi,
-			    struct gve_rx_slot_page_info *page_info, u16 len);
+			    struct gve_rx_slot_page_info *page_info, u16 len,
+			    u16 pad);
 
 #endif /* _GVE_UTILS_H */
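Before the next patch, a quick sketch of what the new pad argument buys. gve_rx_copy() above computes the payload start inside a half-page buffer, and the pad is now supplied by the caller instead of being hard-coded to GVE_RX_PAD. A simplified stand-alone version of that address arithmetic (PAGE_SZ and the function name are stand-ins, not driver code):

#include <stdint.h>

#define PAGE_SZ 4096	/* stand-in for the kernel's PAGE_SIZE */

/* Each page holds two receive buffers; page_offset selects the half,
 * and pad skips the headroom the device wrote before the payload.
 */
static void *rx_payload_start(void *page_address, int page_offset, uint16_t pad)
{
	return (uint8_t *)page_address + pad +
	       (page_offset ? PAGE_SZ / 2 : 0);
}

A GQI caller passes GVE_RX_PAD here; the later DQO patches in this series can pass their own padding value without touching the helper.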
From patchwork Thu Jun 24 18:06:21 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466709
Date: Thu, 24 Jun 2021 11:06:21 -0700
Message-Id: <20210624180632.3659809-6-bcf@google.com>
Subject: [PATCH net-next 05/16] gve: Introduce a new model for device options
From: Bailey Forrest
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan

The current model uses an integer ID and a fixed-size struct for the
parameters of each device option.

The new model allows the device option structs to grow in size over
time. A driver may assume that changes to device option structs will
always be appended.

New device options will also generally have a `supported_features_mask`
so that the driver knows which fields within a particular device option
are enabled.

`gve_device_option.feat_mask` is changed to `required_features_mask`,
and it is a bitmask which must match the value expected by the driver.
This gives the device the ability to break backwards compatibility with
old drivers for certain features by blocking the old drivers from
trying to use the feature.

We maintain ABI compatibility with the old model for
GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which
does not support the new model.

This patch introduces some new terminology:

RDA - Raw DMA Addressing - Buffers associated with SKBs are directly
      DMA mapped and read/updated by the device.

QPL - Queue Page Lists - Driver uses bounce buffers which are DMA
      mapped with the device for read/write and data is copied from/to
      SKBs.

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 172 +++++++++++++++----
 drivers/net/ethernet/google/gve/gve_adminq.h |  50 +++++-
 2 files changed, 179 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 53864f200599..1c2a4ccaefe5 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0 OR MIT)
 /* Google virtual Ethernet (gve) driver
  *
- * Copyright (C) 2015-2019 Google, Inc.
+ * Copyright (C) 2015-2021 Google, Inc.
  */
 
 #include
@@ -18,6 +18,8 @@
 	"Expected: length=%d, feature_mask=%x.\n" \
 	"Actual: length=%d, feature_mask=%x.\n"
 
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected.
Possible older version of guest driver.\n" + static struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor, struct gve_device_option *option) @@ -33,28 +35,81 @@ struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *desc static void gve_parse_device_option(struct gve_priv *priv, struct gve_device_descriptor *device_descriptor, - struct gve_device_option *option) + struct gve_device_option *option, + struct gve_device_option_gqi_rda **dev_op_gqi_rda, + struct gve_device_option_gqi_qpl **dev_op_gqi_qpl, + struct gve_device_option_dqo_rda **dev_op_dqo_rda) { + u32 req_feat_mask = be32_to_cpu(option->required_features_mask); u16 option_length = be16_to_cpu(option->option_length); u16 option_id = be16_to_cpu(option->option_id); + /* If the length or feature mask doesn't match, continue without + * enabling the feature. + */ switch (option_id) { - case GVE_DEV_OPT_ID_RAW_ADDRESSING: - /* If the length or feature mask doesn't match, - * continue without enabling the feature. - */ - if (option_length != GVE_DEV_OPT_LEN_RAW_ADDRESSING || - option->feat_mask != cpu_to_be32(GVE_DEV_OPT_FEAT_MASK_RAW_ADDRESSING)) { - dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT, "Raw Addressing", - GVE_DEV_OPT_LEN_RAW_ADDRESSING, - cpu_to_be32(GVE_DEV_OPT_FEAT_MASK_RAW_ADDRESSING), - option_length, option->feat_mask); - priv->raw_addressing = 0; - } else { - dev_info(&priv->pdev->dev, - "Raw addressing device option enabled.\n"); - priv->raw_addressing = 1; + case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING: + if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING || + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) { + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT, + "Raw Addressing", + GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING, + GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING, + option_length, req_feat_mask); + break; + } + + dev_info(&priv->pdev->dev, + "Gqi raw addressing device option enabled.\n"); + priv->raw_addressing = 1; + break; + case GVE_DEV_OPT_ID_GQI_RDA: + if (option_length < sizeof(**dev_op_gqi_rda) || + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) { + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT, + "GQI RDA", (int)sizeof(**dev_op_gqi_rda), + GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA, + option_length, req_feat_mask); + break; + } + + if (option_length > sizeof(**dev_op_gqi_rda)) { + dev_warn(&priv->pdev->dev, + GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA"); + } + *dev_op_gqi_rda = (void *)(option + 1); + break; + case GVE_DEV_OPT_ID_GQI_QPL: + if (option_length < sizeof(**dev_op_gqi_qpl) || + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) { + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT, + "GQI QPL", (int)sizeof(**dev_op_gqi_qpl), + GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL, + option_length, req_feat_mask); + break; + } + + if (option_length > sizeof(**dev_op_gqi_qpl)) { + dev_warn(&priv->pdev->dev, + GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL"); + } + *dev_op_gqi_qpl = (void *)(option + 1); + break; + case GVE_DEV_OPT_ID_DQO_RDA: + if (option_length < sizeof(**dev_op_dqo_rda) || + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) { + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT, + "DQO RDA", (int)sizeof(**dev_op_dqo_rda), + GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA, + option_length, req_feat_mask); + break; + } + + if (option_length > sizeof(**dev_op_dqo_rda)) { + dev_warn(&priv->pdev->dev, + GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA"); } + *dev_op_dqo_rda = (void *)(option + 1); break; default: /* If we don't recognize 
the option just continue @@ -65,6 +120,39 @@ void gve_parse_device_option(struct gve_priv *priv, } } +/* Process all device options for a given describe device call. */ +static int +gve_process_device_options(struct gve_priv *priv, + struct gve_device_descriptor *descriptor, + struct gve_device_option_gqi_rda **dev_op_gqi_rda, + struct gve_device_option_gqi_qpl **dev_op_gqi_qpl, + struct gve_device_option_dqo_rda **dev_op_dqo_rda) +{ + const int num_options = be16_to_cpu(descriptor->num_device_options); + struct gve_device_option *dev_opt; + int i; + + /* The options struct directly follows the device descriptor. */ + dev_opt = (void *)(descriptor + 1); + for (i = 0; i < num_options; i++) { + struct gve_device_option *next_opt; + + next_opt = gve_get_next_option(descriptor, dev_opt); + if (!next_opt) { + dev_err(&priv->dev->dev, + "options exceed device_descriptor's total length.\n"); + return -EINVAL; + } + + gve_parse_device_option(priv, descriptor, dev_opt, + dev_op_gqi_rda, dev_op_gqi_qpl, + dev_op_dqo_rda); + dev_opt = next_opt; + } + + return 0; +} + int gve_adminq_alloc(struct device *dev, struct gve_priv *priv) { priv->adminq = dma_alloc_coherent(dev, PAGE_SIZE, @@ -514,15 +602,15 @@ int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues) int gve_adminq_describe_device(struct gve_priv *priv) { + struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL; + struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL; + struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL; struct gve_device_descriptor *descriptor; - struct gve_device_option *dev_opt; union gve_adminq_command cmd; dma_addr_t descriptor_bus; - u16 num_options; int err = 0; u8 *mac; u16 mtu; - int i; memset(&cmd, 0, sizeof(cmd)); descriptor = dma_alloc_coherent(&priv->pdev->dev, PAGE_SIZE, @@ -540,6 +628,31 @@ int gve_adminq_describe_device(struct gve_priv *priv) if (err) goto free_device_descriptor; + priv->raw_addressing = 0; + err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda, + &dev_op_gqi_qpl, &dev_op_dqo_rda); + if (err) + goto free_device_descriptor; + + /* If the GQI_RAW_ADDRESSING option is not enabled and the queue format + * is not set to GqiRda, choose the queue format in a priority order: + * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default. 
+ */ + if (priv->raw_addressing == 1) { + dev_info(&priv->pdev->dev, + "Driver is running with GQI RDA queue format.\n"); + } else if (dev_op_dqo_rda) { + dev_info(&priv->pdev->dev, + "Driver is running with DQO RDA queue format.\n"); + } else if (dev_op_gqi_rda) { + dev_info(&priv->pdev->dev, + "Driver is running with GQI RDA queue format.\n"); + priv->raw_addressing = 1; + } else { + dev_info(&priv->pdev->dev, + "Driver is running with GQI QPL queue format.\n"); + } + priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries); if (priv->tx_desc_cnt * sizeof(priv->tx->desc[0]) < PAGE_SIZE) { dev_err(&priv->pdev->dev, "Tx desc count %d too low\n", priv->tx_desc_cnt); @@ -576,26 +689,9 @@ int gve_adminq_describe_device(struct gve_priv *priv) priv->rx_desc_cnt = priv->rx_data_slot_cnt; } priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues); - dev_opt = (void *)(descriptor + 1); - - num_options = be16_to_cpu(descriptor->num_device_options); - for (i = 0; i < num_options; i++) { - struct gve_device_option *next_opt; - - next_opt = gve_get_next_option(descriptor, dev_opt); - if (!next_opt) { - dev_err(&priv->dev->dev, - "options exceed device_descriptor's total length.\n"); - err = -EINVAL; - goto free_device_descriptor; - } - - gve_parse_device_option(priv, descriptor, dev_opt); - dev_opt = next_opt; - } free_device_descriptor: - dma_free_coherent(&priv->pdev->dev, sizeof(*descriptor), descriptor, + dma_free_coherent(&priv->pdev->dev, PAGE_SIZE, descriptor, descriptor_bus); return err; } diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h index d320c2ffd87c..4b1485b11a7b 100644 --- a/drivers/net/ethernet/google/gve/gve_adminq.h +++ b/drivers/net/ethernet/google/gve/gve_adminq.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: (GPL-2.0 OR MIT) * Google virtual Ethernet (gve) driver * - * Copyright (C) 2015-2019 Google, Inc. + * Copyright (C) 2015-2021 Google, Inc. */ #ifndef _GVE_ADMINQ_H @@ -82,14 +82,54 @@ static_assert(sizeof(struct gve_device_descriptor) == 40); struct gve_device_option { __be16 option_id; __be16 option_length; - __be32 feat_mask; + __be32 required_features_mask; }; static_assert(sizeof(struct gve_device_option) == 8); -#define GVE_DEV_OPT_ID_RAW_ADDRESSING 0x1 -#define GVE_DEV_OPT_LEN_RAW_ADDRESSING 0x0 -#define GVE_DEV_OPT_FEAT_MASK_RAW_ADDRESSING 0x0 +struct gve_device_option_gqi_rda { + __be32 supported_features_mask; +}; + +static_assert(sizeof(struct gve_device_option_gqi_rda) == 4); + +struct gve_device_option_gqi_qpl { + __be32 supported_features_mask; +}; + +static_assert(sizeof(struct gve_device_option_gqi_qpl) == 4); + +struct gve_device_option_dqo_rda { + __be32 supported_features_mask; + __be16 tx_comp_ring_entries; + __be16 rx_buff_ring_entries; +}; + +static_assert(sizeof(struct gve_device_option_dqo_rda) == 8); + +/* Terminology: + * + * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA + * mapped and read/updated by the device. + * + * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with + * the device for read/write and data is copied from/to SKBs. 
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
 
 struct gve_adminq_configure_device_resources {
 	__be64 counter_array;
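To make the TLV-style walk of this patch concrete, here is a compact user-space model of how gve_process_device_options()/gve_parse_device_option() above traverse the descriptor. It is deliberately simplified: fields are host-endian (the real structs are big endian on the wire), the walk is bounded by a byte length rather than num_device_options, and a single expected feature mask stands in for the per-option GVE_DEV_OPT_REQ_FEAT_MASK_* values.

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>

struct dev_option {		/* 8 bytes, like gve_device_option */
	uint16_t option_id;
	uint16_t option_length;	/* payload bytes that follow the header */
	uint32_t required_features_mask;
};

static void walk_options(const uint8_t *buf, size_t len, uint32_t expected_mask)
{
	size_t off = 0;

	while (off + sizeof(struct dev_option) <= len) {
		struct dev_option opt;

		memcpy(&opt, buf + off, sizeof(opt));	/* avoid alignment traps */
		if (off + sizeof(opt) + opt.option_length > len) {
			fprintf(stderr, "options exceed total length\n");
			return;
		}
		if (opt.required_features_mask != expected_mask)
			/* Mask mismatch: skip the option, as the driver does,
			 * letting the device fence old drivers off a feature.
			 */
			printf("option %#x skipped\n", (unsigned)opt.option_id);
		else
			printf("option %#x, %u payload bytes\n",
			       (unsigned)opt.option_id,
			       (unsigned)opt.option_length);
		off += sizeof(opt) + opt.option_length;
	}
}

Because options may grow over time, a parser keyed on option_length like this keeps working when a newer device appends fields: the driver only warns (GVE_DEVICE_OPTION_TOO_BIG_FMT) when an option is longer than the struct it knows about.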
From patchwork Thu Jun 24 18:06:23 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466708
Date: Thu, 24 Jun 2021 11:06:23 -0700
Message-Id: <20210624180632.3659809-8-bcf@google.com>
Subject: [PATCH net-next 07/16] gve: adminq: DQO specific device descriptor logic
From: Bailey Forrest
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan

- In addition to TX and RX queues, DQO has TX completion and RX buffer
  queues.
- TX completions are received when the device has completed sending a
  packet on the wire.
- RX buffers are posted on a separate queue from the RX completions.
- DQO descriptor rings are allowed to be smaller than PAGE_SIZE.

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 drivers/net/ethernet/google/gve/gve.h        | 13 +++++
 drivers/net/ethernet/google/gve/gve_adminq.c | 57 ++++++++++++++------
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 9cb9b8f3e66e..9045b86279cb 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -194,6 +194,11 @@ struct gve_qpl_config {
 	unsigned long *qpl_id_map; /* bitmap of used qpl ids */
 };
 
+struct gve_options_dqo_rda {
+	u16 tx_comp_ring_entries; /* number of tx_comp descriptors */
+	u16 rx_buff_ring_entries; /* number of rx_buff descriptors */
+};
+
 /* GVE_QUEUE_FORMAT_UNSPECIFIED must be zero since 0 is the default value
  * when the entire configure_device_resources command is zeroed out and the
  * queue_format is not specified.
@@ -286,6 +291,8 @@ struct gve_priv {
 	/* Gvnic device link speed from hypervisor.
*/ u64 link_speed; + struct gve_options_dqo_rda options_dqo_rda; + enum gve_queue_format queue_format; }; @@ -533,6 +540,12 @@ static inline enum dma_data_direction gve_qpl_dma_dir(struct gve_priv *priv, return DMA_FROM_DEVICE; } +static inline bool gve_is_gqi(struct gve_priv *priv) +{ + return priv->queue_format == GVE_GQI_RDA_FORMAT || + priv->queue_format == GVE_GQI_QPL_FORMAT; +} + /* buffers */ int gve_alloc_page(struct gve_priv *priv, struct device *dev, struct page **page, dma_addr_t *dma, diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c index 9dfce9af60bc..9efa60ce34e0 100644 --- a/drivers/net/ethernet/google/gve/gve_adminq.c +++ b/drivers/net/ethernet/google/gve/gve_adminq.c @@ -602,6 +602,40 @@ int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues) return gve_adminq_kick_and_wait(priv); } +static int gve_set_desc_cnt(struct gve_priv *priv, + struct gve_device_descriptor *descriptor) +{ + priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries); + if (priv->tx_desc_cnt * sizeof(priv->tx->desc[0]) < PAGE_SIZE) { + dev_err(&priv->pdev->dev, "Tx desc count %d too low\n", + priv->tx_desc_cnt); + return -EINVAL; + } + priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries); + if (priv->rx_desc_cnt * sizeof(priv->rx->desc.desc_ring[0]) + < PAGE_SIZE) { + dev_err(&priv->pdev->dev, "Rx desc count %d too low\n", + priv->rx_desc_cnt); + return -EINVAL; + } + return 0; +} + +static int +gve_set_desc_cnt_dqo(struct gve_priv *priv, + const struct gve_device_descriptor *descriptor, + const struct gve_device_option_dqo_rda *dev_op_dqo_rda) +{ + priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries); + priv->options_dqo_rda.tx_comp_ring_entries = + be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries); + priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries); + priv->options_dqo_rda.rx_buff_ring_entries = + be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries); + + return 0; +} + int gve_adminq_describe_device(struct gve_priv *priv) { struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL; @@ -655,22 +689,14 @@ int gve_adminq_describe_device(struct gve_priv *priv) dev_info(&priv->pdev->dev, "Driver is running with GQI QPL queue format.\n"); } - - priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries); - if (priv->tx_desc_cnt * sizeof(priv->tx->desc[0]) < PAGE_SIZE) { - dev_err(&priv->pdev->dev, "Tx desc count %d too low\n", priv->tx_desc_cnt); - err = -EINVAL; - goto free_device_descriptor; + if (gve_is_gqi(priv)) { + err = gve_set_desc_cnt(priv, descriptor); + } else { + err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda); } - priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries); - if (priv->rx_desc_cnt * sizeof(priv->rx->desc.desc_ring[0]) - < PAGE_SIZE || - priv->rx_desc_cnt * sizeof(priv->rx->data.data_ring[0]) - < PAGE_SIZE) { - dev_err(&priv->pdev->dev, "Rx desc count %d too low\n", priv->rx_desc_cnt); - err = -EINVAL; + if (err) goto free_device_descriptor; - } + priv->max_registered_pages = be64_to_cpu(descriptor->max_registered_pages); mtu = be16_to_cpu(descriptor->mtu); @@ -686,7 +712,8 @@ int gve_adminq_describe_device(struct gve_priv *priv) dev_info(&priv->pdev->dev, "MAC addr: %pM\n", mac); priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl); priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl); - if (priv->rx_data_slot_cnt < priv->rx_desc_cnt) { + + if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) { 
 		dev_err(&priv->pdev->dev, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
 			priv->rx_data_slot_cnt);
 		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
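The GQI/DQO split between gve_set_desc_cnt() and gve_set_desc_cnt_dqo() above comes down to a page-granularity rule. A sketch of the GQI-side check (PAGE_SZ is a stand-in; the driver uses PAGE_SIZE and the real descriptor types):

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096

/* GQI rings are backed by whole pages, so a ring smaller than one page
 * is rejected; DQO rings may be smaller than PAGE_SIZE, so the DQO path
 * skips this check and instead records the separately sized completion
 * and buffer rings.
 */
static bool gqi_ring_size_ok(size_t desc_cnt, size_t desc_size)
{
	return desc_cnt * desc_size >= PAGE_SZ;
}

For example, with hypothetical 16-byte descriptors a GQI ring needs at least 4096 / 16 = 256 entries, while a DQO ring of 128 entries would be accepted.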
From patchwork Thu Jun 24 18:06:25 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466707
Date: Thu, 24 Jun 2021 11:06:25 -0700
Message-Id: <20210624180632.3659809-10-bcf@google.com>
Subject: [PATCH net-next 09/16] gve: Add dqo descriptors
From: Bailey Forrest
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan

General description of rings and descriptors:

The TX ring is used for sending TX packet buffers to the NIC. It has the
following descriptors:
- `gve_tx_pkt_desc_dqo` - Data buffer descriptor
- `gve_tx_tso_context_desc_dqo` - TSO context descriptor
- `gve_tx_general_context_desc_dqo` - Generic metadata descriptor

Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo`
which represents the logical interpretation of the metadata bytes. It's
helpful to define this structure because the metadata bytes exist in
multiple descriptor types (including `gve_tx_tso_context_desc_dqo`),
and the device requires that the same field have the same value in all
descriptors.

The TX completion ring is used to receive completions from the NIC.
Having a separate ring allows for completions to be out of order. The
completion descriptor `gve_tx_compl_desc` has several different types,
the most important being packet and descriptor completions. Descriptor
completions are used to notify the driver when descriptors sent on the
TX ring are done being consumed. The descriptor completion is only used
to signal that space is cleared in the TX ring. A packet completion will
be received when a packet transmitted on the TX queue is done being
transmitted.

In addition there are "miss" and "reinjection" completions. The device
implements a "flow-miss model". Most packets will simply receive a
packet completion. The flow-miss system may choose to process a packet
based on its contents. A TX packet which experiences a flow miss would
receive a miss completion followed by a later reinjection completion.
The miss-completion is received when the packet starts to be processed
by the flow-miss system and the reinjection completion is received when
the flow-miss system completes processing the packet and sends it on the
wire.

The RX buffer ring is used to send buffers to HW via the
`gve_rx_desc_dqo` descriptor.

Received packets are put into the RX queue by the device, which
populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors
refer to buffers posted by the buffer queue. Received buffers may be
returned out of order, such as when HW LRO is enabled.

Important concepts:
- "TX" and "RX buffer" queues, which send descriptors to the device, use
  MMIO doorbells to notify the device of new descriptors.

- "RX" and "TX completion" queues, which receive descriptors from the
  device, use a "generation bit" to know when a descriptor was populated
  by the device. The driver initializes all bits with the "current
  generation". The device will populate received descriptors with the
  "next generation" which is inverted from the current generation. When
  the ring wraps, the current/next generation are swapped.

- It's the driver's responsibility to ensure that the RX and TX
  completion queues are not overrun. This can be accomplished by
  limiting the number of descriptors posted to HW.

- TX packets have a 16-bit completion_tag and RX buffers have a 16-bit
  buffer_id. These will be returned on the TX completion and RX queues
  respectively to let the driver know which packet/buffer was completed.

Bitfields are used to describe descriptor fields.
This notation is more concise and readable than shift-and-mask. It is possible because the driver is restricted to little endian platforms. Signed-off-by: Bailey Forrest Reviewed-by: Willem de Bruijn Reviewed-by: Catherine Sullivan --- drivers/net/ethernet/google/Kconfig | 2 +- .../net/ethernet/google/gve/gve_desc_dqo.h | 256 ++++++++++++++++++ 2 files changed, 257 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/google/gve/gve_desc_dqo.h diff --git a/drivers/net/ethernet/google/Kconfig b/drivers/net/ethernet/google/Kconfig index b8f04d052fda..8641a00f8e63 100644 --- a/drivers/net/ethernet/google/Kconfig +++ b/drivers/net/ethernet/google/Kconfig @@ -17,7 +17,7 @@ if NET_VENDOR_GOOGLE config GVE tristate "Google Virtual NIC (gVNIC) support" - depends on PCI_MSI + depends on (PCI_MSI && (X86 || CPU_LITTLE_ENDIAN)) help This driver supports Google Virtual NIC (gVNIC)" diff --git a/drivers/net/ethernet/google/gve/gve_desc_dqo.h b/drivers/net/ethernet/google/gve/gve_desc_dqo.h new file mode 100644 index 000000000000..e8fe9adef7f2 --- /dev/null +++ b/drivers/net/ethernet/google/gve/gve_desc_dqo.h @@ -0,0 +1,256 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) + * Google virtual Ethernet (gve) driver + * + * Copyright (C) 2015-2021 Google, Inc. + */ + +/* GVE DQO Descriptor formats */ + +#ifndef _GVE_DESC_DQO_H_ +#define _GVE_DESC_DQO_H_ + +#include + +#define GVE_TX_MAX_HDR_SIZE_DQO 255 +#define GVE_TX_MIN_TSO_MSS_DQO 88 + +#ifndef __LITTLE_ENDIAN_BITFIELD +#error "Only little endian supported" +#endif + +/* Basic TX descriptor (DTYPE 0x0C) */ +struct gve_tx_pkt_desc_dqo { + __le64 buf_addr; + + /* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */ + u8 dtype: 5; + + /* Denotes the last descriptor of a packet. */ + u8 end_of_packet: 1; + u8 checksum_offload_enable: 1; + + /* If set, will generate a descriptor completion for this descriptor. */ + u8 report_event: 1; + u8 reserved0; + __le16 reserved1; + + /* The TX completion associated with this packet will contain this tag. + */ + __le16 compl_tag; + u16 buf_size: 14; + u16 reserved2: 2; +} __packed; +static_assert(sizeof(struct gve_tx_pkt_desc_dqo) == 16); + +#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc +#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1) + +/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */ +#define GVE_TX_MAX_DATA_DESCS 10 + +/* Min gap between tail and head to avoid cacheline overlap */ +#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4 + +/* "report_event" on TX packet descriptors may only be reported on the last + * descriptor of a TX packet, and they must be spaced apart with at least this + * value. + */ +#define GVE_TX_MIN_RE_INTERVAL 32 + +struct gve_tx_context_cmd_dtype { + u8 dtype: 5; + u8 tso: 1; + u8 reserved1: 2; + + u8 reserved2; +}; + +static_assert(sizeof(struct gve_tx_context_cmd_dtype) == 2); + +/* TX Native TSO Context DTYPE (0x05) + * + * "flex" fields allow the driver to send additional packet context to HW. + */ +struct gve_tx_tso_context_desc_dqo { + /* The L4 payload bytes that should be segmented. */ + u32 tso_total_len: 24; + u32 flex10: 8; + + /* Max segment size in TSO excluding headers. 
*/ + u16 mss: 14; + u16 reserved: 2; + + u8 header_len; /* Header length to use for TSO offload */ + u8 flex11; + struct gve_tx_context_cmd_dtype cmd_dtype; + u8 flex0; + u8 flex5; + u8 flex6; + u8 flex7; + u8 flex8; + u8 flex9; +} __packed; +static_assert(sizeof(struct gve_tx_tso_context_desc_dqo) == 16); + +#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5 + +/* General context descriptor for sending metadata. */ +struct gve_tx_general_context_desc_dqo { + u8 flex4; + u8 flex5; + u8 flex6; + u8 flex7; + u8 flex8; + u8 flex9; + u8 flex10; + u8 flex11; + struct gve_tx_context_cmd_dtype cmd_dtype; + u16 reserved; + u8 flex0; + u8 flex1; + u8 flex2; + u8 flex3; +} __packed; +static_assert(sizeof(struct gve_tx_general_context_desc_dqo) == 16); + +#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4 + +/* Logical structure of metadata which is packed into context descriptor flex + * fields. + */ +struct gve_tx_metadata_dqo { + union { + struct { + u8 version; + + /* If `skb->l4_hash` is set, this value should be + * derived from `skb->hash`. + * + * A zero value means no l4_hash was associated with the + * skb. + */ + u16 path_hash: 15; + + /* Should be set to 1 if the flow associated with the + * skb had a rehash from the TCP stack. + */ + u16 rehash_event: 1; + } __packed; + u8 bytes[12]; + }; +} __packed; +static_assert(sizeof(struct gve_tx_metadata_dqo) == 12); + +#define GVE_TX_METADATA_VERSION_DQO 0 + +/* TX completion descriptor */ +struct gve_tx_compl_desc { + /* For types 0-4 this is the TX queue ID associated with this + * completion. + */ + u16 id: 11; + + /* See: GVE_COMPL_TYPE_DQO* */ + u16 type: 3; + u16 reserved0: 1; + + /* Flipped by HW to notify the descriptor is populated. */ + u16 generation: 1; + union { + /* For descriptor completions, this is the last index fetched + * by HW + 1. + */ + __le16 tx_head; + + /* For packet completions, this is the completion tag set on the + * TX packet descriptors. + */ + __le16 completion_tag; + }; + __le32 reserved1; +} __packed; +static_assert(sizeof(struct gve_tx_compl_desc) == 8); + +#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */ +#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */ +#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */ +#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */ + +/* Descriptor to post buffers to HW on buffer queue. */ +struct gve_rx_desc_dqo { + __le16 buf_id; /* ID returned in Rx completion descriptor */ + __le16 reserved0; + __le32 reserved1; + __le64 buf_addr; /* DMA address of the buffer */ + __le64 header_buf_addr; + __le64 reserved2; +} __packed; +static_assert(sizeof(struct gve_rx_desc_dqo) == 32); + +/* Descriptor for HW to notify SW of new packets received on RX queue. */ +struct gve_rx_compl_desc_dqo { + /* Must be 1 */ + u8 rxdid: 4; + u8 reserved0: 4; + + /* Packet originated from this system rather than the network. */ + u8 loopback: 1; + /* Set when IPv6 packet contains a destination options header or routing + * header. + */ + u8 ipv6_ex_add: 1; + /* Invalid packet was received. */ + u8 rx_error: 1; + u8 reserved1: 5; + + u16 packet_type: 10; + u16 ip_hdr_err: 1; + u16 udp_len_err: 1; + u16 raw_cs_invalid: 1; + u16 reserved2: 3; + + u16 packet_len: 14; + /* Flipped by HW to notify the descriptor is populated. */ + u16 generation: 1; + /* Should be zero. 
 */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+static_assert(sizeof(struct gve_rx_compl_desc_dqo) == 32);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
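The completion types defined in gve_desc_dqo.h above imply a small dispatch loop in the TX path. A hedged sketch of how the four types relate (the handler bodies are placeholders, not driver logic; the type values are taken from the header above):

#include <stdint.h>
#include <stdio.h>

#define COMPL_TYPE_DQO_PKT         0x2	/* packet completion */
#define COMPL_TYPE_DQO_DESC        0x4	/* descriptor completion */
#define COMPL_TYPE_DQO_MISS        0x1	/* miss path completion */
#define COMPL_TYPE_DQO_REINJECTION 0x3	/* re-injection completion */

static void handle_tx_completion(uint8_t type, uint16_t value)
{
	switch (type) {
	case COMPL_TYPE_DQO_DESC:
		/* value is tx_head: ring space up to here is reclaimable. */
		printf("TX ring consumed up to %u\n", (unsigned)value);
		break;
	case COMPL_TYPE_DQO_PKT:
		/* value is the completion_tag from the TX descriptors. */
		printf("packet %u fully transmitted\n", (unsigned)value);
		break;
	case COMPL_TYPE_DQO_MISS:
		/* Flow miss: hold packet state until reinjection arrives. */
		printf("packet %u entered flow-miss processing\n", (unsigned)value);
		break;
	case COMPL_TYPE_DQO_REINJECTION:
		/* Flow-miss processing done; the packet is on the wire. */
		printf("packet %u reinjected\n", (unsigned)value);
		break;
	}
}

In the flow-miss model described in the commit message, a flow-missed packet therefore produces two completions (miss, then reinjection), while the normal case produces a single packet completion.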
From patchwork Thu Jun 24 18:06:27 2021
X-Patchwork-Submitter: Bailey Forrest
X-Patchwork-Id: 466706
Date: Thu, 24 Jun 2021 11:06:27 -0700
Message-Id: <20210624180632.3659809-12-bcf@google.com>
Subject: [PATCH net-next 11/16] gve: Update adminq commands to support DQO queues
From: Bailey Forrest
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan

DQO queue creation requires additional parameters:
- TX completion/RX buffer queue size
- TX completion/RX buffer queue address
- TX/RX queue size
- RX buffer size

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 drivers/net/ethernet/google/gve/gve.h         |  3 +
 drivers/net/ethernet/google/gve/gve_adminq.c  | 63 ++++++++++++-------
 drivers/net/ethernet/google/gve/gve_adminq.h  | 18 ++++--
 drivers/net/ethernet/google/gve/gve_ethtool.c |  9 ++-
 4 files changed, 64 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 5bfab1ac20d1..8a2a8d125090 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -548,6 +548,9 @@ struct gve_priv {
 	struct gve_options_dqo_rda options_dqo_rda;
 	struct gve_ptype_lut *ptype_lut_dqo;
 
+	/* Must be a power of two. */
+	int data_buffer_size_dqo;
+
 	enum gve_queue_format queue_format;
 };
 
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 7d8d354f67e2..cf017a499119 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -443,6 +443,7 @@ int gve_adminq_configure_device_resources(struct gve_priv *priv,
 		.irq_db_stride = cpu_to_be32(sizeof(priv->ntfy_blocks[0])),
 		.ntfy_blk_msix_base_idx =
 					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
 	};
 
 	return gve_adminq_execute_cmd(priv, &cmd);
@@ -462,28 +463,32 @@ static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_tx_ring *tx = &priv->tx[queue_index];
 	union gve_adminq_command cmd;
-	u32 qpl_id;
-	int err;
 
-	qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
-		GVE_RAW_ADDRESSING_QPL_ID : tx->tx_fifo.qpl->id;
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
 	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
 		.queue_id = cpu_to_be32(queue_index),
-		.reserved = 0,
 		.queue_resources_addr = cpu_to_be64(tx->q_resources_bus),
 		.tx_ring_addr = cpu_to_be64(tx->bus),
-		.queue_page_list_id = cpu_to_be32(qpl_id),
 		.ntfy_id = cpu_to_be32(tx->ntfy_id),
 	};
 
-	err = gve_adminq_issue_cmd(priv, &cmd);
-	if (err)
-		return err;
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : tx->tx_fifo.qpl->id;
 
-	return 0;
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(priv->tx_desc_cnt);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(tx->complq_bus_dqo);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->options_dqo_rda.tx_comp_ring_entries);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
 }
 
 int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
@@ -504,29 +509,41 @@ static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_rx_ring *rx = &priv->rx[queue_index];
 	union gve_adminq_command cmd;
-	u32 qpl_id;
-	int err;
 
-	qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
-		GVE_RAW_ADDRESSING_QPL_ID : rx->data.qpl->id;
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
 	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
 		.queue_id = cpu_to_be32(queue_index),
-		.index = cpu_to_be32(queue_index),
-		.reserved = 0,
 		.ntfy_id = cpu_to_be32(rx->ntfy_id),
 		.queue_resources_addr = cpu_to_be64(rx->q_resources_bus),
-		.rx_desc_ring_addr = cpu_to_be64(rx->desc.bus),
-		.rx_data_ring_addr = cpu_to_be64(rx->data.data_bus),
-		.queue_page_list_id = cpu_to_be32(qpl_id),
 	};
 
-	err = gve_adminq_issue_cmd(priv, &cmd);
-	if (err)
-		return err;
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rx->data.qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rx->desc.bus);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rx->data.data_bus);
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rx->dqo.complq.bus);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rx->dqo.bufq.bus);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(priv->data_buffer_size_dqo);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->options_dqo_rda.rx_buff_ring_entries);
+		cmd.create_rx_queue.enable_rsc =
+			!!(priv->dev->features & NETIF_F_LRO);
+	}
 
-	return 0;
+	return gve_adminq_issue_cmd(priv, &cmd);
 }
 
 int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index 62a7e96af715..47c3d8f313fc 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -139,9 +139,11 @@ struct gve_adminq_configure_device_resources {
 	__be32 num_irq_dbs;
 	__be32 irq_db_stride;
 	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
 };
 
-static_assert(sizeof(struct gve_adminq_configure_device_resources) == 32);
+static_assert(sizeof(struct gve_adminq_configure_device_resources) == 40);
 
 struct gve_adminq_register_page_list {
 	__be32 page_list_id;
@@ -166,9 +168,13 @@ struct gve_adminq_create_tx_queue {
 	__be64 tx_ring_addr;
 	__be32 queue_page_list_id;
 	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
 };
 
-static_assert(sizeof(struct gve_adminq_create_tx_queue) == 32);
+static_assert(sizeof(struct gve_adminq_create_tx_queue) == 48);
 
 struct gve_adminq_create_rx_queue {
 	__be32 queue_id;
@@ -179,10 +185,14 @@ struct gve_adminq_create_rx_queue {
 	__be64 rx_desc_ring_addr;
 	__be64 rx_data_ring_addr;
 	__be32 queue_page_list_id;
-	u8 padding[4];
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
 };
 
-static_assert(sizeof(struct gve_adminq_create_rx_queue) == 48);
+static_assert(sizeof(struct gve_adminq_create_rx_queue) == 56);
 
 /* Queue resources that are shared with the device */
 struct gve_queue_resources {
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 5fb05cf36b49..ccaf68562312 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0 OR MIT)
 /* Google virtual Ethernet (gve) driver
  *
- * Copyright (C) 2015-2019 Google, Inc.
+ * Copyright (C) 2015-2021 Google, Inc.
  */
 
 #include
@@ -453,11 +453,16 @@ static int gve_set_tunable(struct net_device *netdev,
 
 	switch (etuna->id) {
 	case ETHTOOL_RX_COPYBREAK:
+	{
+		u32 max_copybreak = gve_is_gqi(priv) ?
+			(PAGE_SIZE / 2) : priv->data_buffer_size_dqo;
+
 		len = *(u32 *)value;
-		if (len > PAGE_SIZE / 2)
+		if (len > max_copybreak)
 			return -EINVAL;
 		priv->rx_copybreak = len;
 		return 0;
+	}
 	default:
 		return -EOPNOTSUPP;
 	}
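A pattern worth noting in the gve_adminq.h hunks above: every adminq
command struct carries explicit padding and a static_assert pinning its
total size, so growing a command for DQO forces the padding to be
rebalanced at compile time rather than silently shifting the device ABI.
A self-contained sketch of the same idiom (toy struct names, not from
the driver):

#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

/* A made-up device command, laid out like the adminq structs above:
 * fixed-width fields plus explicit padding so the size is pinned.
 */
struct toy_create_queue_cmd {
	uint32_t queue_id;
	uint32_t ntfy_id;
	uint64_t ring_addr;
	uint16_t ring_size;
	uint8_t  enable_feature;
	uint8_t  padding[5];	/* pad 19 -> 24 bytes explicitly */
};

/* If someone adds a field and forgets to rebalance the padding, the
 * build fails here instead of corrupting the wire format.
 */
static_assert(sizeof(struct toy_create_queue_cmd) == 24,
	      "toy_create_queue_cmd must stay 24 bytes");

int main(void) { return 0; }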
From patchwork Thu Jun 24 18:06:29 2021
From: Bailey Forrest
Date: Thu, 24 Jun 2021 11:06:29 -0700
Message-Id: <20210624180632.3659809-14-bcf@google.com>
In-Reply-To: <20210624180632.3659809-1-bcf@google.com>
References: <20210624180632.3659809-1-bcf@google.com>
Subject: [PATCH net-next 13/16] gve: DQO: Add ring allocation and initialization
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan
X-Mailing-List: netdev@vger.kernel.org

Allocate the buffer and completion ring structures. Do not populate the
rings yet. That will happen in the respective rx and tx datapath
follow-on patches.

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 drivers/net/ethernet/google/gve/gve.h        |   8 +-
 drivers/net/ethernet/google/gve/gve_dqo.h    |  18 ++
 drivers/net/ethernet/google/gve/gve_main.c   |  53 ++++-
 drivers/net/ethernet/google/gve/gve_rx.c     |   2 +-
 drivers/net/ethernet/google/gve/gve_rx_dqo.c | 157 +++++++++++++++
 drivers/net/ethernet/google/gve/gve_tx.c     |   2 +-
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 193 +++++++++++++++++++
 7 files changed, 420 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index d6bf0466ae8b..30978a15e37d 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -204,6 +204,10 @@ struct gve_rx_ring {
 	struct gve_queue_resources *q_resources; /* head and tail pointer idx */
 	dma_addr_t q_resources_bus; /* dma address for the queue resources */
 	struct u64_stats_sync statss; /* sync stats for 32bit archs */
+
+	/* head and tail of skb chain for the current packet or NULL if none */
+	struct sk_buff *skb_head;
+	struct sk_buff *skb_tail;
 };
 
 /* A TX desc ring entry */
@@ -816,14 +820,14 @@ void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma,
 netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev);
 bool gve_tx_poll(struct gve_notify_block *block, int budget);
 int gve_tx_alloc_rings(struct gve_priv *priv);
-void gve_tx_free_rings(struct gve_priv *priv);
+void gve_tx_free_rings_gqi(struct gve_priv *priv);
 __be32 gve_tx_load_event_counter(struct gve_priv *priv,
 				 struct gve_tx_ring *tx);
 /* rx handling */
 void gve_rx_write_doorbell(struct gve_priv *priv, struct gve_rx_ring *rx);
 bool gve_rx_poll(struct gve_notify_block *block, int budget);
 int gve_rx_alloc_rings(struct gve_priv *priv);
-void gve_rx_free_rings(struct gve_priv *priv);
+void gve_rx_free_rings_gqi(struct gve_priv *priv);
 bool
gve_clean_rx_done(struct gve_rx_ring *rx, int budget, netdev_features_t feat); /* Reset */ diff --git a/drivers/net/ethernet/google/gve/gve_dqo.h b/drivers/net/ethernet/google/gve/gve_dqo.h index cff4e6ef7bb6..9877a33ec068 100644 --- a/drivers/net/ethernet/google/gve/gve_dqo.h +++ b/drivers/net/ethernet/google/gve/gve_dqo.h @@ -19,6 +19,24 @@ netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev); bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean); int gve_rx_poll_dqo(struct gve_notify_block *block, int budget); +int gve_tx_alloc_rings_dqo(struct gve_priv *priv); +void gve_tx_free_rings_dqo(struct gve_priv *priv); +int gve_rx_alloc_rings_dqo(struct gve_priv *priv); +void gve_rx_free_rings_dqo(struct gve_priv *priv); +int gve_clean_tx_done_dqo(struct gve_priv *priv, struct gve_tx_ring *tx, + struct napi_struct *napi); +void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx); +void gve_rx_write_doorbell_dqo(const struct gve_priv *priv, int queue_idx); + +static inline void +gve_tx_put_doorbell_dqo(const struct gve_priv *priv, + const struct gve_queue_resources *q_resources, u32 val) +{ + u64 index; + + index = be32_to_cpu(q_resources->db_index); + iowrite32(val, &priv->db_bar2[index]); +} static inline void gve_write_irq_doorbell_dqo(const struct gve_priv *priv, diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 579f867cf148..cddf19c8cf0b 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -571,13 +571,21 @@ static int gve_create_rings(struct gve_priv *priv) netif_dbg(priv, drv, priv->dev, "created %d rx queues\n", priv->rx_cfg.num_queues); - /* Rx data ring has been prefilled with packet buffers at queue - * allocation time. - * Write the doorbell to provide descriptor slots and packet buffers - * to the NIC. - */ - for (i = 0; i < priv->rx_cfg.num_queues; i++) - gve_rx_write_doorbell(priv, &priv->rx[i]); + if (gve_is_gqi(priv)) { + /* Rx data ring has been prefilled with packet buffers at queue + * allocation time. + * + * Write the doorbell to provide descriptor slots and packet + * buffers to the NIC. + */ + for (i = 0; i < priv->rx_cfg.num_queues; i++) + gve_rx_write_doorbell(priv, &priv->rx[i]); + } else { + for (i = 0; i < priv->rx_cfg.num_queues; i++) { + /* Post buffers and ring doorbell. 
*/ + gve_rx_post_buffers_dqo(&priv->rx[i]); + } + } return 0; } @@ -606,6 +614,15 @@ static void add_napi_init_sync_stats(struct gve_priv *priv, } } +static void gve_tx_free_rings(struct gve_priv *priv) +{ + if (gve_is_gqi(priv)) { + gve_tx_free_rings_gqi(priv); + } else { + gve_tx_free_rings_dqo(priv); + } +} + static int gve_alloc_rings(struct gve_priv *priv) { int err; @@ -615,9 +632,14 @@ static int gve_alloc_rings(struct gve_priv *priv) GFP_KERNEL); if (!priv->tx) return -ENOMEM; - err = gve_tx_alloc_rings(priv); + + if (gve_is_gqi(priv)) + err = gve_tx_alloc_rings(priv); + else + err = gve_tx_alloc_rings_dqo(priv); if (err) goto free_tx; + /* Setup rx rings */ priv->rx = kvzalloc(priv->rx_cfg.num_queues * sizeof(*priv->rx), GFP_KERNEL); @@ -625,7 +647,11 @@ static int gve_alloc_rings(struct gve_priv *priv) err = -ENOMEM; goto free_tx_queue; } - err = gve_rx_alloc_rings(priv); + + if (gve_is_gqi(priv)) + err = gve_rx_alloc_rings(priv); + else + err = gve_rx_alloc_rings_dqo(priv); if (err) goto free_rx; @@ -670,6 +696,14 @@ static int gve_destroy_rings(struct gve_priv *priv) return 0; } +static inline void gve_rx_free_rings(struct gve_priv *priv) +{ + if (gve_is_gqi(priv)) + gve_rx_free_rings_gqi(priv); + else + gve_rx_free_rings_dqo(priv); +} + static void gve_free_rings(struct gve_priv *priv) { int ntfy_idx; @@ -869,6 +903,7 @@ static int gve_open(struct net_device *dev) err = gve_alloc_qpls(priv); if (err) return err; + err = gve_alloc_rings(priv); if (err) goto free_qpls; diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index 15a64e40004d..bb8261368250 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -238,7 +238,7 @@ int gve_rx_alloc_rings(struct gve_priv *priv) return err; } -void gve_rx_free_rings(struct gve_priv *priv) +void gve_rx_free_rings_gqi(struct gve_priv *priv) { int i; diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c index 808e09741ecc..1073a820767d 100644 --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c @@ -16,6 +16,163 @@ #include #include +static void gve_free_page_dqo(struct gve_priv *priv, + struct gve_rx_buf_state_dqo *bs) +{ +} + +static void gve_rx_free_ring_dqo(struct gve_priv *priv, int idx) +{ + struct gve_rx_ring *rx = &priv->rx[idx]; + struct device *hdev = &priv->pdev->dev; + size_t completion_queue_slots; + size_t buffer_queue_slots; + size_t size; + int i; + + completion_queue_slots = rx->dqo.complq.mask + 1; + buffer_queue_slots = rx->dqo.bufq.mask + 1; + + gve_rx_remove_from_block(priv, idx); + + if (rx->q_resources) { + dma_free_coherent(hdev, sizeof(*rx->q_resources), + rx->q_resources, rx->q_resources_bus); + rx->q_resources = NULL; + } + + for (i = 0; i < rx->dqo.num_buf_states; i++) { + struct gve_rx_buf_state_dqo *bs = &rx->dqo.buf_states[i]; + + if (bs->page_info.page) + gve_free_page_dqo(priv, bs); + } + + if (rx->dqo.bufq.desc_ring) { + size = sizeof(rx->dqo.bufq.desc_ring[0]) * buffer_queue_slots; + dma_free_coherent(hdev, size, rx->dqo.bufq.desc_ring, + rx->dqo.bufq.bus); + rx->dqo.bufq.desc_ring = NULL; + } + + if (rx->dqo.complq.desc_ring) { + size = sizeof(rx->dqo.complq.desc_ring[0]) * + completion_queue_slots; + dma_free_coherent(hdev, size, rx->dqo.complq.desc_ring, + rx->dqo.complq.bus); + rx->dqo.complq.desc_ring = NULL; + } + + kvfree(rx->dqo.buf_states); + rx->dqo.buf_states = NULL; + + netif_dbg(priv, drv, priv->dev, 
"freed rx ring %d\n", idx); +} + +static int gve_rx_alloc_ring_dqo(struct gve_priv *priv, int idx) +{ + struct gve_rx_ring *rx = &priv->rx[idx]; + struct device *hdev = &priv->pdev->dev; + size_t size; + int i; + + const u32 buffer_queue_slots = + priv->options_dqo_rda.rx_buff_ring_entries; + const u32 completion_queue_slots = priv->rx_desc_cnt; + + netif_dbg(priv, drv, priv->dev, "allocating rx ring DQO\n"); + + memset(rx, 0, sizeof(*rx)); + rx->gve = priv; + rx->q_num = idx; + rx->dqo.bufq.mask = buffer_queue_slots - 1; + rx->dqo.complq.num_free_slots = completion_queue_slots; + rx->dqo.complq.mask = completion_queue_slots - 1; + rx->skb_head = NULL; + rx->skb_tail = NULL; + + rx->dqo.num_buf_states = min_t(s16, S16_MAX, buffer_queue_slots * 4); + rx->dqo.buf_states = kvcalloc(rx->dqo.num_buf_states, + sizeof(rx->dqo.buf_states[0]), + GFP_KERNEL); + if (!rx->dqo.buf_states) + return -ENOMEM; + + /* Set up linked list of buffer IDs */ + for (i = 0; i < rx->dqo.num_buf_states - 1; i++) + rx->dqo.buf_states[i].next = i + 1; + + rx->dqo.buf_states[rx->dqo.num_buf_states - 1].next = -1; + rx->dqo.recycled_buf_states.head = -1; + rx->dqo.recycled_buf_states.tail = -1; + rx->dqo.used_buf_states.head = -1; + rx->dqo.used_buf_states.tail = -1; + + /* Allocate RX completion queue */ + size = sizeof(rx->dqo.complq.desc_ring[0]) * + completion_queue_slots; + rx->dqo.complq.desc_ring = + dma_alloc_coherent(hdev, size, &rx->dqo.complq.bus, GFP_KERNEL); + if (!rx->dqo.complq.desc_ring) + goto err; + + /* Allocate RX buffer queue */ + size = sizeof(rx->dqo.bufq.desc_ring[0]) * buffer_queue_slots; + rx->dqo.bufq.desc_ring = + dma_alloc_coherent(hdev, size, &rx->dqo.bufq.bus, GFP_KERNEL); + if (!rx->dqo.bufq.desc_ring) + goto err; + + rx->q_resources = dma_alloc_coherent(hdev, sizeof(*rx->q_resources), + &rx->q_resources_bus, GFP_KERNEL); + if (!rx->q_resources) + goto err; + + gve_rx_add_to_block(priv, idx); + + return 0; + +err: + gve_rx_free_ring_dqo(priv, idx); + return -ENOMEM; +} + +int gve_rx_alloc_rings_dqo(struct gve_priv *priv) +{ + int err = 0; + int i; + + for (i = 0; i < priv->rx_cfg.num_queues; i++) { + err = gve_rx_alloc_ring_dqo(priv, i); + if (err) { + netif_err(priv, drv, priv->dev, + "Failed to alloc rx ring=%d: err=%d\n", + i, err); + goto err; + } + } + + return 0; + +err: + for (i--; i >= 0; i--) + gve_rx_free_ring_dqo(priv, i); + + return err; +} + +void gve_rx_free_rings_dqo(struct gve_priv *priv) +{ + int i; + + for (i = 0; i < priv->rx_cfg.num_queues; i++) + gve_rx_free_ring_dqo(priv, i); +} + +void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx) +{ +} + int gve_rx_poll_dqo(struct gve_notify_block *block, int budget) { u32 work_done = 0; diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c index 75930bb64eb9..665ac795a1ad 100644 --- a/drivers/net/ethernet/google/gve/gve_tx.c +++ b/drivers/net/ethernet/google/gve/gve_tx.c @@ -256,7 +256,7 @@ int gve_tx_alloc_rings(struct gve_priv *priv) return err; } -void gve_tx_free_rings(struct gve_priv *priv) +void gve_tx_free_rings_gqi(struct gve_priv *priv) { int i; diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c index 4b3319a1b299..bde8f90ac8bd 100644 --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c @@ -12,11 +12,204 @@ #include #include +/* gve_tx_free_desc - Cleans up all pending tx requests and buffers. 
+ */ +static void gve_tx_clean_pending_packets(struct gve_tx_ring *tx) +{ + int i; + + for (i = 0; i < tx->dqo.num_pending_packets; i++) { + struct gve_tx_pending_packet_dqo *cur_state = + &tx->dqo.pending_packets[i]; + int j; + + for (j = 0; j < cur_state->num_bufs; j++) { + struct gve_tx_dma_buf *buf = &cur_state->bufs[j]; + + if (j == 0) { + dma_unmap_single(tx->dev, + dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), + DMA_TO_DEVICE); + } else { + dma_unmap_page(tx->dev, + dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), + DMA_TO_DEVICE); + } + } + if (cur_state->skb) { + dev_consume_skb_any(cur_state->skb); + cur_state->skb = NULL; + } + } +} + +static void gve_tx_free_ring_dqo(struct gve_priv *priv, int idx) +{ + struct gve_tx_ring *tx = &priv->tx[idx]; + struct device *hdev = &priv->pdev->dev; + size_t bytes; + + gve_tx_remove_from_block(priv, idx); + + if (tx->q_resources) { + dma_free_coherent(hdev, sizeof(*tx->q_resources), + tx->q_resources, tx->q_resources_bus); + tx->q_resources = NULL; + } + + if (tx->dqo.compl_ring) { + bytes = sizeof(tx->dqo.compl_ring[0]) * + (tx->dqo.complq_mask + 1); + dma_free_coherent(hdev, bytes, tx->dqo.compl_ring, + tx->complq_bus_dqo); + tx->dqo.compl_ring = NULL; + } + + if (tx->dqo.tx_ring) { + bytes = sizeof(tx->dqo.tx_ring[0]) * (tx->mask + 1); + dma_free_coherent(hdev, bytes, tx->dqo.tx_ring, tx->bus); + tx->dqo.tx_ring = NULL; + } + + kvfree(tx->dqo.pending_packets); + tx->dqo.pending_packets = NULL; + + netif_dbg(priv, drv, priv->dev, "freed tx queue %d\n", idx); +} + +static int gve_tx_alloc_ring_dqo(struct gve_priv *priv, int idx) +{ + struct gve_tx_ring *tx = &priv->tx[idx]; + struct device *hdev = &priv->pdev->dev; + int num_pending_packets; + size_t bytes; + int i; + + memset(tx, 0, sizeof(*tx)); + tx->q_num = idx; + tx->dev = &priv->pdev->dev; + tx->netdev_txq = netdev_get_tx_queue(priv->dev, idx); + atomic_set_release(&tx->dqo_compl.hw_tx_head, 0); + + /* Queue sizes must be a power of 2 */ + tx->mask = priv->tx_desc_cnt - 1; + tx->dqo.complq_mask = priv->options_dqo_rda.tx_comp_ring_entries - 1; + + /* The max number of pending packets determines the maximum number of + * descriptors which maybe written to the completion queue. + * + * We must set the number small enough to make sure we never overrun the + * completion queue. + */ + num_pending_packets = tx->dqo.complq_mask + 1; + + /* Reserve space for descriptor completions, which will be reported at + * most every GVE_TX_MIN_RE_INTERVAL packets. + */ + num_pending_packets -= + (tx->dqo.complq_mask + 1) / GVE_TX_MIN_RE_INTERVAL; + + /* Each packet may have at most 2 buffer completions if it receives both + * a miss and reinjection completion. 
+ */
+	num_pending_packets /= 2;
+
+	tx->dqo.num_pending_packets = min_t(int, num_pending_packets, S16_MAX);
+	tx->dqo.pending_packets = kvcalloc(tx->dqo.num_pending_packets,
+					   sizeof(tx->dqo.pending_packets[0]),
+					   GFP_KERNEL);
+	if (!tx->dqo.pending_packets)
+		goto err;
+
+	/* Set up linked list of pending packets */
+	for (i = 0; i < tx->dqo.num_pending_packets - 1; i++)
+		tx->dqo.pending_packets[i].next = i + 1;
+
+	tx->dqo.pending_packets[tx->dqo.num_pending_packets - 1].next = -1;
+	atomic_set_release(&tx->dqo_compl.free_pending_packets, -1);
+	tx->dqo_compl.miss_completions.head = -1;
+	tx->dqo_compl.miss_completions.tail = -1;
+	tx->dqo_compl.timed_out_completions.head = -1;
+	tx->dqo_compl.timed_out_completions.tail = -1;
+
+	bytes = sizeof(tx->dqo.tx_ring[0]) * (tx->mask + 1);
+	tx->dqo.tx_ring = dma_alloc_coherent(hdev, bytes, &tx->bus, GFP_KERNEL);
+	if (!tx->dqo.tx_ring)
+		goto err;
+
+	bytes = sizeof(tx->dqo.compl_ring[0]) * (tx->dqo.complq_mask + 1);
+	tx->dqo.compl_ring = dma_alloc_coherent(hdev, bytes,
+						&tx->complq_bus_dqo,
+						GFP_KERNEL);
+	if (!tx->dqo.compl_ring)
+		goto err;
+
+	tx->q_resources = dma_alloc_coherent(hdev, sizeof(*tx->q_resources),
+					     &tx->q_resources_bus, GFP_KERNEL);
+	if (!tx->q_resources)
+		goto err;
+
+	gve_tx_add_to_block(priv, idx);
+
+	return 0;
+
+err:
+	gve_tx_free_ring_dqo(priv, idx);
+	return -ENOMEM;
+}
+
+int gve_tx_alloc_rings_dqo(struct gve_priv *priv)
+{
+	int err = 0;
+	int i;
+
+	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
+		err = gve_tx_alloc_ring_dqo(priv, i);
+		if (err) {
+			netif_err(priv, drv, priv->dev,
+				  "Failed to alloc tx ring=%d: err=%d\n",
+				  i, err);
+			goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	for (i--; i >= 0; i--)
+		gve_tx_free_ring_dqo(priv, i);
+
+	return err;
+}
+
+void gve_tx_free_rings_dqo(struct gve_priv *priv)
+{
+	int i;
+
+	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
+		struct gve_tx_ring *tx = &priv->tx[i];
+
+		gve_clean_tx_done_dqo(priv, tx, /*napi=*/NULL);
+		netdev_tx_reset_queue(tx->netdev_txq);
+		gve_tx_clean_pending_packets(tx);
+
+		gve_tx_free_ring_dqo(priv, i);
+	}
+}
+
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev)
 {
 	return NETDEV_TX_OK;
 }
 
+int gve_clean_tx_done_dqo(struct gve_priv *priv, struct gve_tx_ring *tx,
+			  struct napi_struct *napi)
+{
+	return 0;
+}
+
 bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean)
 {
 	return false;
 }
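To make the pending-packet sizing in gve_tx_alloc_ring_dqo() above
concrete, here is a small worked example. The completion ring size and
the GVE_TX_MIN_RE_INTERVAL value are assumed for illustration only, not
taken from the driver headers:

#include <stdio.h>

#define TOY_COMPLQ_ENTRIES	1024	/* assumed completion ring size */
#define TOY_MIN_RE_INTERVAL	32	/* stands in for GVE_TX_MIN_RE_INTERVAL */
#define TOY_S16_MAX		32767

int main(void)
{
	int num_pending = TOY_COMPLQ_ENTRIES;

	/* Reserve room for descriptor completions, reported at most
	 * once per RE interval.
	 */
	num_pending -= TOY_COMPLQ_ENTRIES / TOY_MIN_RE_INTERVAL;
	/* A packet may produce both a miss and a reinjection completion. */
	num_pending /= 2;
	if (num_pending > TOY_S16_MAX)
		num_pending = TOY_S16_MAX;

	/* (1024 - 32) / 2 = 496 completion tags for this toy sizing. */
	printf("pending packet slots: %d\n", num_pending);
	return 0;
}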
From patchwork Thu Jun 24 18:06:31 2021
From: Bailey Forrest
Date: Thu, 24 Jun 2021 11:06:31 -0700
Message-Id: <20210624180632.3659809-16-bcf@google.com>
In-Reply-To: <20210624180632.3659809-1-bcf@google.com>
References: <20210624180632.3659809-1-bcf@google.com>
Subject: [PATCH net-next 15/16] gve: DQO: Add TX path
To: Bailey Forrest, "David S . Miller"
Cc: netdev@vger.kernel.org, Willem de Bruijn, Catherine Sullivan
X-Mailing-List: netdev@vger.kernel.org

TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated. Each SKB will also have
a metadata descriptor.

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.

The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss completion is received when the packet starts to be
processed by the flow-miss system and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.
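The miss/reinjection pairing described above behaves like a small state
machine per completion tag. A user-space sketch follows, with states
mirroring the GVE_PACKET_STATE_* values used in the diff below; the
transition helper itself is illustrative, not driver code:

#include <stdio.h>

/* Mirrors the states the patch tracks per completion tag. */
enum toy_packet_state {
	TOY_STATE_UNALLOCATED,		/* tag is free */
	TOY_STATE_PENDING_DATA_COMPL,	/* sent; awaiting packet or miss compl */
	TOY_STATE_PENDING_REINJECT_COMPL, /* miss seen; awaiting reinjection */
	TOY_STATE_TIMED_OUT_COMPL,	/* gave up waiting for reinjection */
};

enum toy_compl_type { TOY_COMPL_PKT, TOY_COMPL_MISS, TOY_COMPL_REINJECT };

/* Returns the next state, or -1 for a completion that must be skipped,
 * matching the "Completion types and expected behavior" table below.
 */
static int toy_next_state(enum toy_packet_state s, enum toy_compl_type c)
{
	switch (s) {
	case TOY_STATE_PENDING_DATA_COMPL:
		if (c == TOY_COMPL_PKT)		/* normal path: done */
			return TOY_STATE_UNALLOCATED;
		if (c == TOY_COMPL_MISS)	/* flow miss: keep waiting */
			return TOY_STATE_PENDING_REINJECT_COMPL;
		return -1;	/* reinjection without a miss: skip */
	case TOY_STATE_PENDING_REINJECT_COMPL:
		if (c == TOY_COMPL_REINJECT)	/* flow-miss path: done */
			return TOY_STATE_UNALLOCATED;
		return -1;	/* packet compl after a miss: skip */
	default:
		return -1;
	}
}

int main(void)
{
	/* Normal path: a packet completion finishes the packet. */
	printf("%d\n", toy_next_state(TOY_STATE_PENDING_DATA_COMPL, TOY_COMPL_PKT));
	/* Flow-miss path: miss completion, then reinjection completion. */
	int s = toy_next_state(TOY_STATE_PENDING_DATA_COMPL, TOY_COMPL_MISS);
	printf("%d\n", toy_next_state(s, TOY_COMPL_REINJECT));
	return 0;
}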
Notable mentions:
- Buffers may be freed after receiving the miss completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets which have a pending
  reinjection completion. After a short timeout (1 second,
  GVE_REINJECT_COMPL_TIMEOUT below), the SKB and buffers are released
  and the pending_packet is moved to a second list which has a longer
  timeout (60 seconds), where the pending_packet will not be reused.
  When the longer timeout elapses, the driver may assume the reinjection
  completion will never be received and the pending_packet may be
  reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion handling exist
  in different threading contexts, they maintain their own lists of free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path.

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.

Signed-off-by: Bailey Forrest
Reviewed-by: Willem de Bruijn
Reviewed-by: Catherine Sullivan
---
 drivers/net/ethernet/google/gve/gve_dqo.h    |  12 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 819 ++++++++++++++++++-
 2 files changed, 829 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_dqo.h b/drivers/net/ethernet/google/gve/gve_dqo.h
index 3b300223ea15..836042364124 100644
--- a/drivers/net/ethernet/google/gve/gve_dqo.h
+++ b/drivers/net/ethernet/google/gve/gve_dqo.h
@@ -19,6 +19,18 @@
 #define GVE_TX_IRQ_RATELIMIT_US_DQO 50
 #define GVE_RX_IRQ_RATELIMIT_US_DQO 20
 
+/* Timeout in seconds to wait for a reinjection completion after receiving
+ * its corresponding miss completion.
+ */
+#define GVE_REINJECT_COMPL_TIMEOUT 1
+
+/* Timeout in seconds to deallocate the completion tag for a packet that was
+ * prematurely freed for not receiving a valid completion. This should be large
+ * enough to rule out the possibility of receiving the corresponding valid
+ * completion after this interval.
+ */
+#define GVE_DEALLOCATE_COMPL_TIMEOUT 60
+
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev);
 bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean);
 int gve_rx_poll_dqo(struct gve_notify_block *block, int budget);
diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index bde8f90ac8bd..a4906b9df540 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -12,6 +12,67 @@
 #include
 #include
 
+/* Returns true if a gve_tx_pending_packet_dqo object is available. */
+static bool gve_has_pending_packet(struct gve_tx_ring *tx)
+{
+	/* Check TX path's list. */
+	if (tx->dqo_tx.free_pending_packets != -1)
+		return true;
+
+	/* Check completion handler's list. */
+	if (atomic_read_acquire(&tx->dqo_compl.free_pending_packets) != -1)
+		return true;
+
+	return false;
+}
+
+static struct gve_tx_pending_packet_dqo *
+gve_alloc_pending_packet(struct gve_tx_ring *tx)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+	s16 index;
+
+	index = tx->dqo_tx.free_pending_packets;
+
+	/* No pending_packets available, try to steal the list from the
+	 * completion handler.
+ */ + if (unlikely(index == -1)) { + tx->dqo_tx.free_pending_packets = + atomic_xchg(&tx->dqo_compl.free_pending_packets, -1); + index = tx->dqo_tx.free_pending_packets; + + if (unlikely(index == -1)) + return NULL; + } + + pending_packet = &tx->dqo.pending_packets[index]; + + /* Remove pending_packet from free list */ + tx->dqo_tx.free_pending_packets = pending_packet->next; + pending_packet->state = GVE_PACKET_STATE_PENDING_DATA_COMPL; + + return pending_packet; +} + +static void +gve_free_pending_packet(struct gve_tx_ring *tx, + struct gve_tx_pending_packet_dqo *pending_packet) +{ + s16 index = pending_packet - tx->dqo.pending_packets; + + pending_packet->state = GVE_PACKET_STATE_UNALLOCATED; + while (true) { + s16 old_head = atomic_read_acquire(&tx->dqo_compl.free_pending_packets); + + pending_packet->next = old_head; + if (atomic_cmpxchg(&tx->dqo_compl.free_pending_packets, + old_head, index) == old_head) { + break; + } + } +} + /* gve_tx_free_desc - Cleans up all pending tx requests and buffers. */ static void gve_tx_clean_pending_packets(struct gve_tx_ring *tx) @@ -199,18 +260,772 @@ void gve_tx_free_rings_dqo(struct gve_priv *priv) } } +/* Returns the number of slots available in the ring */ +static inline u32 num_avail_tx_slots(const struct gve_tx_ring *tx) +{ + u32 num_used = (tx->dqo_tx.tail - tx->dqo_tx.head) & tx->mask; + + return tx->mask - num_used; +} + +/* Stops the queue if available descriptors is less than 'count'. + * Return: 0 if stop is not required. + */ +static int gve_maybe_stop_tx_dqo(struct gve_tx_ring *tx, int count) +{ + if (likely(gve_has_pending_packet(tx) && + num_avail_tx_slots(tx) >= count)) + return 0; + + /* Update cached TX head pointer */ + tx->dqo_tx.head = atomic_read_acquire(&tx->dqo_compl.hw_tx_head); + + if (likely(gve_has_pending_packet(tx) && + num_avail_tx_slots(tx) >= count)) + return 0; + + /* No space, so stop the queue */ + tx->stop_queue++; + netif_tx_stop_queue(tx->netdev_txq); + + /* Sync with restarting queue in `gve_tx_poll_dqo()` */ + mb(); + + /* After stopping queue, check if we can transmit again in order to + * avoid TOCTOU bug. 
+ */ + tx->dqo_tx.head = atomic_read_acquire(&tx->dqo_compl.hw_tx_head); + + if (likely(!gve_has_pending_packet(tx) || + num_avail_tx_slots(tx) < count)) + return -EBUSY; + + netif_tx_start_queue(tx->netdev_txq); + tx->wake_queue++; + return 0; +} + +static void gve_extract_tx_metadata_dqo(const struct sk_buff *skb, + struct gve_tx_metadata_dqo *metadata) +{ + memset(metadata, 0, sizeof(*metadata)); + metadata->version = GVE_TX_METADATA_VERSION_DQO; + + if (skb->l4_hash) { + u16 path_hash = skb->hash ^ (skb->hash >> 16); + + path_hash &= (1 << 15) - 1; + if (unlikely(path_hash == 0)) + path_hash = ~path_hash; + + metadata->path_hash = path_hash; + } +} + +static void gve_tx_fill_pkt_desc_dqo(struct gve_tx_ring *tx, u32 *desc_idx, + struct sk_buff *skb, u32 len, u64 addr, + s16 compl_tag, bool eop, bool is_gso) +{ + const bool checksum_offload_en = skb->ip_summed == CHECKSUM_PARTIAL; + + while (len > 0) { + struct gve_tx_pkt_desc_dqo *desc = + &tx->dqo.tx_ring[*desc_idx].pkt; + u32 cur_len = min_t(u32, len, GVE_TX_MAX_BUF_SIZE_DQO); + bool cur_eop = eop && cur_len == len; + + *desc = (struct gve_tx_pkt_desc_dqo){ + .buf_addr = cpu_to_le64(addr), + .dtype = GVE_TX_PKT_DESC_DTYPE_DQO, + .end_of_packet = cur_eop, + .checksum_offload_enable = checksum_offload_en, + .compl_tag = cpu_to_le16(compl_tag), + .buf_size = cur_len, + }; + + addr += cur_len; + len -= cur_len; + *desc_idx = (*desc_idx + 1) & tx->mask; + } +} + +/* Validates and prepares `skb` for TSO. + * + * Returns header length, or < 0 if invalid. + */ +static int gve_prep_tso(struct sk_buff *skb) +{ + struct tcphdr *tcp; + int header_len; + u32 paylen; + int err; + + /* Note: HW requires MSS (gso_size) to be <= 9728 and the total length + * of the TSO to be <= 262143. + * + * However, we don't validate these because: + * - Hypervisor enforces a limit of 9K MTU + * - Kernel will not produce a TSO larger than 64k + */ + + if (unlikely(skb_shinfo(skb)->gso_size < GVE_TX_MIN_TSO_MSS_DQO)) + return -1; + + /* Needed because we will modify header. */ + err = skb_cow_head(skb, 0); + if (err < 0) + return err; + + tcp = tcp_hdr(skb); + + /* Remove payload length from checksum. */ + paylen = skb->len - skb_transport_offset(skb); + + switch (skb_shinfo(skb)->gso_type) { + case SKB_GSO_TCPV4: + case SKB_GSO_TCPV6: + csum_replace_by_diff(&tcp->check, + (__force __wsum)htonl(paylen)); + + /* Compute length of segmentation header. 
*/ + header_len = skb_transport_offset(skb) + tcp_hdrlen(skb); + break; + default: + return -EINVAL; + } + + if (unlikely(header_len > GVE_TX_MAX_HDR_SIZE_DQO)) + return -EINVAL; + + return header_len; +} + +static void gve_tx_fill_tso_ctx_desc(struct gve_tx_tso_context_desc_dqo *desc, + const struct sk_buff *skb, + const struct gve_tx_metadata_dqo *metadata, + int header_len) +{ + *desc = (struct gve_tx_tso_context_desc_dqo){ + .header_len = header_len, + .cmd_dtype = { + .dtype = GVE_TX_TSO_CTX_DESC_DTYPE_DQO, + .tso = 1, + }, + .flex0 = metadata->bytes[0], + .flex5 = metadata->bytes[5], + .flex6 = metadata->bytes[6], + .flex7 = metadata->bytes[7], + .flex8 = metadata->bytes[8], + .flex9 = metadata->bytes[9], + .flex10 = metadata->bytes[10], + .flex11 = metadata->bytes[11], + }; + desc->tso_total_len = skb->len - header_len; + desc->mss = skb_shinfo(skb)->gso_size; +} + +static void +gve_tx_fill_general_ctx_desc(struct gve_tx_general_context_desc_dqo *desc, + const struct gve_tx_metadata_dqo *metadata) +{ + *desc = (struct gve_tx_general_context_desc_dqo){ + .flex0 = metadata->bytes[0], + .flex1 = metadata->bytes[1], + .flex2 = metadata->bytes[2], + .flex3 = metadata->bytes[3], + .flex4 = metadata->bytes[4], + .flex5 = metadata->bytes[5], + .flex6 = metadata->bytes[6], + .flex7 = metadata->bytes[7], + .flex8 = metadata->bytes[8], + .flex9 = metadata->bytes[9], + .flex10 = metadata->bytes[10], + .flex11 = metadata->bytes[11], + .cmd_dtype = {.dtype = GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO}, + }; +} + +/* Returns 0 on success, or < 0 on error. + * + * Before this function is called, the caller must ensure + * gve_has_pending_packet(tx) returns true. + */ +static int gve_tx_add_skb_no_copy_dqo(struct gve_tx_ring *tx, + struct sk_buff *skb) +{ + const struct skb_shared_info *shinfo = skb_shinfo(skb); + const bool is_gso = skb_is_gso(skb); + u32 desc_idx = tx->dqo_tx.tail; + + struct gve_tx_pending_packet_dqo *pending_packet; + struct gve_tx_metadata_dqo metadata; + s16 completion_tag; + int i; + + pending_packet = gve_alloc_pending_packet(tx); + pending_packet->skb = skb; + pending_packet->num_bufs = 0; + completion_tag = pending_packet - tx->dqo.pending_packets; + + gve_extract_tx_metadata_dqo(skb, &metadata); + if (is_gso) { + int header_len = gve_prep_tso(skb); + + if (unlikely(header_len < 0)) + goto err; + + gve_tx_fill_tso_ctx_desc(&tx->dqo.tx_ring[desc_idx].tso_ctx, + skb, &metadata, header_len); + desc_idx = (desc_idx + 1) & tx->mask; + } + + gve_tx_fill_general_ctx_desc(&tx->dqo.tx_ring[desc_idx].general_ctx, + &metadata); + desc_idx = (desc_idx + 1) & tx->mask; + + /* Note: HW requires that the size of a non-TSO packet be within the + * range of [17, 9728]. + * + * We don't double check because + * - We limited `netdev->min_mtu` to ETH_MIN_MTU. + * - Hypervisor won't allow MTU larger than 9216. 
+ */ + + /* Map the linear portion of skb */ + { + struct gve_tx_dma_buf *buf = + &pending_packet->bufs[pending_packet->num_bufs]; + u32 len = skb_headlen(skb); + dma_addr_t addr; + + addr = dma_map_single(tx->dev, skb->data, len, DMA_TO_DEVICE); + if (unlikely(dma_mapping_error(tx->dev, addr))) + goto err; + + dma_unmap_len_set(buf, len, len); + dma_unmap_addr_set(buf, dma, addr); + ++pending_packet->num_bufs; + + gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr, + completion_tag, + /*eop=*/shinfo->nr_frags == 0, is_gso); + } + + for (i = 0; i < shinfo->nr_frags; i++) { + struct gve_tx_dma_buf *buf = + &pending_packet->bufs[pending_packet->num_bufs]; + const skb_frag_t *frag = &shinfo->frags[i]; + bool is_eop = i == (shinfo->nr_frags - 1); + u32 len = skb_frag_size(frag); + dma_addr_t addr; + + addr = skb_frag_dma_map(tx->dev, frag, 0, len, DMA_TO_DEVICE); + if (unlikely(dma_mapping_error(tx->dev, addr))) + goto err; + + dma_unmap_len_set(buf, len, len); + dma_unmap_addr_set(buf, dma, addr); + ++pending_packet->num_bufs; + + gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr, + completion_tag, is_eop, is_gso); + } + + /* Commit the changes to our state */ + tx->dqo_tx.tail = desc_idx; + + /* Request a descriptor completion on the last descriptor of the + * packet if we are allowed to by the HW enforced interval. + */ + { + u32 last_desc_idx = (desc_idx - 1) & tx->mask; + u32 last_report_event_interval = + (last_desc_idx - tx->dqo_tx.last_re_idx) & tx->mask; + + if (unlikely(last_report_event_interval >= + GVE_TX_MIN_RE_INTERVAL)) { + tx->dqo.tx_ring[last_desc_idx].pkt.report_event = true; + tx->dqo_tx.last_re_idx = last_desc_idx; + } + } + + return 0; + +err: + for (i = 0; i < pending_packet->num_bufs; i++) { + struct gve_tx_dma_buf *buf = &pending_packet->bufs[i]; + + if (i == 0) { + dma_unmap_single(tx->dev, dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), + DMA_TO_DEVICE); + } else { + dma_unmap_page(tx->dev, dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), DMA_TO_DEVICE); + } + } + + pending_packet->skb = NULL; + pending_packet->num_bufs = 0; + gve_free_pending_packet(tx, pending_packet); + + return -1; +} + +static int gve_num_descs_per_buf(size_t size) +{ + return DIV_ROUND_UP(size, GVE_TX_MAX_BUF_SIZE_DQO); +} + +static int gve_num_buffer_descs_needed(const struct sk_buff *skb) +{ + const struct skb_shared_info *shinfo = skb_shinfo(skb); + int num_descs; + int i; + + num_descs = gve_num_descs_per_buf(skb_headlen(skb)); + + for (i = 0; i < shinfo->nr_frags; i++) { + unsigned int frag_size = skb_frag_size(&shinfo->frags[i]); + + num_descs += gve_num_descs_per_buf(frag_size); + } + + return num_descs; +} + +/* Returns true if HW is capable of sending TSO represented by `skb`. + * + * Each segment must not span more than GVE_TX_MAX_DATA_DESCS buffers. + * - The header is counted as one buffer for every single segment. + * - A buffer which is split between two segments is counted for both. + * - If a buffer contains both header and payload, it is counted as two buffers. 
+ */ +static bool gve_can_send_tso(const struct sk_buff *skb) +{ + const int header_len = skb_checksum_start_offset(skb) + tcp_hdrlen(skb); + const int max_bufs_per_seg = GVE_TX_MAX_DATA_DESCS - 1; + const struct skb_shared_info *shinfo = skb_shinfo(skb); + const int gso_size = shinfo->gso_size; + int cur_seg_num_bufs; + int cur_seg_size; + int i; + + cur_seg_size = skb_headlen(skb) - header_len; + cur_seg_num_bufs = cur_seg_size > 0; + + for (i = 0; i < shinfo->nr_frags; i++) { + if (cur_seg_size >= gso_size) { + cur_seg_size %= gso_size; + cur_seg_num_bufs = cur_seg_size > 0; + } + + if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg)) + return false; + + cur_seg_size += skb_frag_size(&shinfo->frags[i]); + } + + return true; +} + +/* Attempt to transmit specified SKB. + * + * Returns 0 if the SKB was transmitted or dropped. + * Returns -1 if there is not currently enough space to transmit the SKB. + */ +static int gve_try_tx_skb(struct gve_priv *priv, struct gve_tx_ring *tx, + struct sk_buff *skb) +{ + int num_buffer_descs; + int total_num_descs; + + if (skb_is_gso(skb)) { + /* If TSO doesn't meet HW requirements, attempt to linearize the + * packet. + */ + if (unlikely(!gve_can_send_tso(skb) && + skb_linearize(skb) < 0)) { + net_err_ratelimited("%s: Failed to transmit TSO packet\n", + priv->dev->name); + goto drop; + } + + num_buffer_descs = gve_num_buffer_descs_needed(skb); + } else { + num_buffer_descs = gve_num_buffer_descs_needed(skb); + + if (unlikely(num_buffer_descs > GVE_TX_MAX_DATA_DESCS)) { + if (unlikely(skb_linearize(skb) < 0)) + goto drop; + + num_buffer_descs = 1; + } + } + + /* Metadata + (optional TSO) + data descriptors. */ + total_num_descs = 1 + skb_is_gso(skb) + num_buffer_descs; + if (unlikely(gve_maybe_stop_tx_dqo(tx, total_num_descs + + GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP))) { + return -1; + } + + if (unlikely(gve_tx_add_skb_no_copy_dqo(tx, skb) < 0)) + goto drop; + + netdev_tx_sent_queue(tx->netdev_txq, skb->len); + skb_tx_timestamp(skb); + return 0; + +drop: + tx->dropped_pkt++; + dev_kfree_skb_any(skb); + return 0; +} + +/* Transmit a given skb and ring the doorbell. */ netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev) { + struct gve_priv *priv = netdev_priv(dev); + struct gve_tx_ring *tx; + + tx = &priv->tx[skb_get_queue_mapping(skb)]; + if (unlikely(gve_try_tx_skb(priv, tx, skb) < 0)) { + /* We need to ring the txq doorbell -- we have stopped the Tx + * queue for want of resources, but prior calls to gve_tx() + * may have added descriptors without ringing the doorbell. 
+ */ + gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail); + return NETDEV_TX_BUSY; + } + + if (!netif_xmit_stopped(tx->netdev_txq) && netdev_xmit_more()) + return NETDEV_TX_OK; + + gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail); return NETDEV_TX_OK; } +static void add_to_list(struct gve_tx_ring *tx, struct gve_index_list *list, + struct gve_tx_pending_packet_dqo *pending_packet) +{ + s16 old_tail, index; + + index = pending_packet - tx->dqo.pending_packets; + old_tail = list->tail; + list->tail = index; + if (old_tail == -1) + list->head = index; + else + tx->dqo.pending_packets[old_tail].next = index; + + pending_packet->next = -1; + pending_packet->prev = old_tail; +} + +static void remove_from_list(struct gve_tx_ring *tx, + struct gve_index_list *list, + struct gve_tx_pending_packet_dqo *pending_packet) +{ + s16 index, prev_index, next_index; + + index = pending_packet - tx->dqo.pending_packets; + prev_index = pending_packet->prev; + next_index = pending_packet->next; + + if (prev_index == -1) { + /* Node is head */ + list->head = next_index; + } else { + tx->dqo.pending_packets[prev_index].next = next_index; + } + if (next_index == -1) { + /* Node is tail */ + list->tail = prev_index; + } else { + tx->dqo.pending_packets[next_index].prev = prev_index; + } +} + +static void gve_unmap_packet(struct device *dev, + struct gve_tx_pending_packet_dqo *pending_packet) +{ + struct gve_tx_dma_buf *buf; + int i; + + /* SKB linear portion is guaranteed to be mapped */ + buf = &pending_packet->bufs[0]; + dma_unmap_single(dev, dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), DMA_TO_DEVICE); + for (i = 1; i < pending_packet->num_bufs; i++) { + buf = &pending_packet->bufs[i]; + dma_unmap_page(dev, dma_unmap_addr(buf, dma), + dma_unmap_len(buf, len), DMA_TO_DEVICE); + } + pending_packet->num_bufs = 0; +} + +/* Completion types and expected behavior: + * No Miss compl + Packet compl = Packet completed normally. + * Miss compl + Re-inject compl = Packet completed normally. + * No Miss compl + Re-inject compl = Skipped i.e. packet not completed. + * Miss compl + Packet compl = Skipped i.e. packet not completed. + */ +static void gve_handle_packet_completion(struct gve_priv *priv, + struct gve_tx_ring *tx, bool is_napi, + u16 compl_tag, u64 *bytes, u64 *pkts, + bool is_reinjection) +{ + struct gve_tx_pending_packet_dqo *pending_packet; + + if (unlikely(compl_tag >= tx->dqo.num_pending_packets)) { + net_err_ratelimited("%s: Invalid TX completion tag: %d\n", + priv->dev->name, (int)compl_tag); + return; + } + + pending_packet = &tx->dqo.pending_packets[compl_tag]; + + if (unlikely(is_reinjection)) { + if (unlikely(pending_packet->state == + GVE_PACKET_STATE_TIMED_OUT_COMPL)) { + net_err_ratelimited("%s: Re-injection completion: %d received after timeout.\n", + priv->dev->name, (int)compl_tag); + /* Packet was already completed as a result of timeout, + * so just remove from list and free pending packet. + */ + remove_from_list(tx, + &tx->dqo_compl.timed_out_completions, + pending_packet); + gve_free_pending_packet(tx, pending_packet); + return; + } + if (unlikely(pending_packet->state != + GVE_PACKET_STATE_PENDING_REINJECT_COMPL)) { + /* No outstanding miss completion but packet allocated + * implies packet receives a re-injection completion + * without a a prior miss completion. Return without + * completing the packet. 
+ */ + net_err_ratelimited("%s: Re-injection completion received without corresponding miss completion: %d\n", + priv->dev->name, (int)compl_tag); + return; + } + remove_from_list(tx, &tx->dqo_compl.miss_completions, + pending_packet); + } else { + /* Packet is allocated but not a pending data completion. */ + if (unlikely(pending_packet->state != + GVE_PACKET_STATE_PENDING_DATA_COMPL)) { + net_err_ratelimited("%s: No pending data completion: %d\n", + priv->dev->name, (int)compl_tag); + return; + } + } + gve_unmap_packet(tx->dev, pending_packet); + + *bytes += pending_packet->skb->len; + (*pkts)++; + napi_consume_skb(pending_packet->skb, is_napi); + pending_packet->skb = NULL; + gve_free_pending_packet(tx, pending_packet); +} + +static void gve_handle_miss_completion(struct gve_priv *priv, + struct gve_tx_ring *tx, u16 compl_tag, + u64 *bytes, u64 *pkts) +{ + struct gve_tx_pending_packet_dqo *pending_packet; + + if (unlikely(compl_tag >= tx->dqo.num_pending_packets)) { + net_err_ratelimited("%s: Invalid TX completion tag: %d\n", + priv->dev->name, (int)compl_tag); + return; + } + + pending_packet = &tx->dqo.pending_packets[compl_tag]; + if (unlikely(pending_packet->state != + GVE_PACKET_STATE_PENDING_DATA_COMPL)) { + net_err_ratelimited("%s: Unexpected packet state: %d for completion tag : %d\n", + priv->dev->name, (int)pending_packet->state, + (int)compl_tag); + return; + } + + pending_packet->state = GVE_PACKET_STATE_PENDING_REINJECT_COMPL; + /* jiffies can wraparound but time comparisons can handle overflows. */ + pending_packet->timeout_jiffies = + jiffies + + msecs_to_jiffies(GVE_REINJECT_COMPL_TIMEOUT * + MSEC_PER_SEC); + add_to_list(tx, &tx->dqo_compl.miss_completions, pending_packet); + + *bytes += pending_packet->skb->len; + (*pkts)++; +} + +static void remove_miss_completions(struct gve_priv *priv, + struct gve_tx_ring *tx) +{ + struct gve_tx_pending_packet_dqo *pending_packet; + s16 next_index; + + next_index = tx->dqo_compl.miss_completions.head; + while (next_index != -1) { + pending_packet = &tx->dqo.pending_packets[next_index]; + next_index = pending_packet->next; + /* Break early because packets should timeout in order. */ + if (time_is_after_jiffies(pending_packet->timeout_jiffies)) + break; + + remove_from_list(tx, &tx->dqo_compl.miss_completions, + pending_packet); + /* Unmap buffers and free skb but do not unallocate packet i.e. + * the completion tag is not freed to ensure that the driver + * can take appropriate action if a corresponding valid + * completion is received later. + */ + gve_unmap_packet(tx->dev, pending_packet); + /* This indicates the packet was dropped. */ + dev_kfree_skb_any(pending_packet->skb); + pending_packet->skb = NULL; + tx->dropped_pkt++; + net_err_ratelimited("%s: No reinjection completion was received for: %ld.\n", + priv->dev->name, + (pending_packet - tx->dqo.pending_packets)); + + pending_packet->state = GVE_PACKET_STATE_TIMED_OUT_COMPL; + pending_packet->timeout_jiffies = + jiffies + + msecs_to_jiffies(GVE_DEALLOCATE_COMPL_TIMEOUT * + MSEC_PER_SEC); + /* Maintain pending packet in another list so the packet can be + * unallocated at a later time. 
+ */ + add_to_list(tx, &tx->dqo_compl.timed_out_completions, + pending_packet); + } +} + +static void remove_timed_out_completions(struct gve_priv *priv, + struct gve_tx_ring *tx) +{ + struct gve_tx_pending_packet_dqo *pending_packet; + s16 next_index; + + next_index = tx->dqo_compl.timed_out_completions.head; + while (next_index != -1) { + pending_packet = &tx->dqo.pending_packets[next_index]; + next_index = pending_packet->next; + /* Break early because packets should timeout in order. */ + if (time_is_after_jiffies(pending_packet->timeout_jiffies)) + break; + + remove_from_list(tx, &tx->dqo_compl.timed_out_completions, + pending_packet); + gve_free_pending_packet(tx, pending_packet); + } +} + int gve_clean_tx_done_dqo(struct gve_priv *priv, struct gve_tx_ring *tx, struct napi_struct *napi) { - return 0; + u64 reinject_compl_bytes = 0; + u64 reinject_compl_pkts = 0; + int num_descs_cleaned = 0; + u64 miss_compl_bytes = 0; + u64 miss_compl_pkts = 0; + u64 pkt_compl_bytes = 0; + u64 pkt_compl_pkts = 0; + + /* Limit in order to avoid blocking for too long */ + while (!napi || pkt_compl_pkts < napi->weight) { + struct gve_tx_compl_desc *compl_desc = + &tx->dqo.compl_ring[tx->dqo_compl.head]; + u16 type; + + if (compl_desc->generation == tx->dqo_compl.cur_gen_bit) + break; + + /* Prefetch the next descriptor. */ + prefetch(&tx->dqo.compl_ring[(tx->dqo_compl.head + 1) & + tx->dqo.complq_mask]); + + /* Do not read data until we own the descriptor */ + dma_rmb(); + type = compl_desc->type; + + if (type == GVE_COMPL_TYPE_DQO_DESC) { + /* This is the last descriptor fetched by HW plus one */ + u16 tx_head = le16_to_cpu(compl_desc->tx_head); + + atomic_set_release(&tx->dqo_compl.hw_tx_head, tx_head); + } else if (type == GVE_COMPL_TYPE_DQO_PKT) { + u16 compl_tag = le16_to_cpu(compl_desc->completion_tag); + + gve_handle_packet_completion(priv, tx, !!napi, + compl_tag, + &pkt_compl_bytes, + &pkt_compl_pkts, + /*is_reinjection=*/false); + } else if (type == GVE_COMPL_TYPE_DQO_MISS) { + u16 compl_tag = le16_to_cpu(compl_desc->completion_tag); + + gve_handle_miss_completion(priv, tx, compl_tag, + &miss_compl_bytes, + &miss_compl_pkts); + } else if (type == GVE_COMPL_TYPE_DQO_REINJECTION) { + u16 compl_tag = le16_to_cpu(compl_desc->completion_tag); + + gve_handle_packet_completion(priv, tx, !!napi, + compl_tag, + &reinject_compl_bytes, + &reinject_compl_pkts, + /*is_reinjection=*/true); + } + + tx->dqo_compl.head = + (tx->dqo_compl.head + 1) & tx->dqo.complq_mask; + /* Flip the generation bit when we wrap around */ + tx->dqo_compl.cur_gen_bit ^= tx->dqo_compl.head == 0; + num_descs_cleaned++; + } + + netdev_tx_completed_queue(tx->netdev_txq, + pkt_compl_pkts + miss_compl_pkts, + pkt_compl_bytes + miss_compl_bytes); + + remove_miss_completions(priv, tx); + remove_timed_out_completions(priv, tx); + + u64_stats_update_begin(&tx->statss); + tx->bytes_done += pkt_compl_bytes + reinject_compl_bytes; + tx->pkt_done += pkt_compl_pkts + reinject_compl_pkts; + u64_stats_update_end(&tx->statss); + return num_descs_cleaned; } bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean) { - return false; + struct gve_tx_compl_desc *compl_desc; + struct gve_tx_ring *tx = block->tx; + struct gve_priv *priv = block->priv; + + if (do_clean) { + int num_descs_cleaned = gve_clean_tx_done_dqo(priv, tx, + &block->napi); + + /* Sync with queue being stopped in `gve_maybe_stop_tx_dqo()` */ + mb(); + + if (netif_tx_queue_stopped(tx->netdev_txq) && + num_descs_cleaned > 0) { + tx->wake_queue++; + 
netif_tx_wake_queue(tx->netdev_txq); + } + } + + /* Return true if we still have work. */ + compl_desc = &tx->dqo.compl_ring[tx->dqo_compl.head]; + return compl_desc->generation != tx->dqo_compl.cur_gen_bit; }
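A closing note on the completion-ring convention used by
gve_clean_tx_done_dqo() and gve_tx_poll_dqo() above: a descriptor
belongs to software only while its generation bit differs from the
ring's current generation, and the current generation flips each time
the head wraps to zero. A single-threaded user-space sketch of that
ownership protocol (toy types, not driver code; the real driver also
issues a dma_rmb() between the ownership check and reading the
descriptor payload):

#include <stdint.h>
#include <stdio.h>

#define TOY_RING_SIZE 4	/* power of two, like the DQO rings */

struct toy_compl_desc {
	uint8_t generation;	/* written by the "device" */
	uint16_t data;
};

struct toy_consumer {
	uint32_t head;
	uint8_t cur_gen_bit;
};

/* Device side: write a descriptor so its generation bit no longer
 * matches the consumer's current generation; flip parity on wrap.
 */
static void toy_hw_write(struct toy_compl_desc *ring, uint32_t *hw_idx,
			 uint8_t *hw_gen, uint16_t data)
{
	ring[*hw_idx].data = data;
	ring[*hw_idx].generation = !*hw_gen;
	*hw_idx = (*hw_idx + 1) & (TOY_RING_SIZE - 1);
	if (*hw_idx == 0)
		*hw_gen = !*hw_gen;
}

/* Consumer side: mirrors the check and wrap handling in
 * gve_clean_tx_done_dqo() above. Returns 1 if a descriptor was consumed.
 */
static int toy_poll(struct toy_compl_desc *ring, struct toy_consumer *c,
		    uint16_t *out)
{
	struct toy_compl_desc *d = &ring[c->head];

	if (d->generation == c->cur_gen_bit)
		return 0;	/* slot still owned by the device */

	*out = d->data;
	c->head = (c->head + 1) & (TOY_RING_SIZE - 1);
	c->cur_gen_bit ^= (c->head == 0);	/* flip on wrap */
	return 1;
}

int main(void)
{
	struct toy_compl_desc ring[TOY_RING_SIZE] = {0};
	struct toy_consumer c = {0};
	uint32_t hw_idx = 0;
	uint8_t hw_gen = 0;
	uint16_t v;

	for (uint16_t i = 0; i < 10; i++) {
		toy_hw_write(ring, &hw_idx, &hw_gen, i);
		if (toy_poll(ring, &c, &v))
			printf("completed %u\n", v);
	}
	return 0;
}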