mbox series

[RFC,v3,0/3] Re-introduce TX FIFO resize for larger EP bursting

Message ID 1590630363-3934-1-git-send-email-wcheng@codeaurora.org
Headers show
Series Re-introduce TX FIFO resize for larger EP bursting | expand

Message

Wesley Cheng May 28, 2020, 1:46 a.m. UTC
Changes in V3:
 - Removed "Reviewed-by" tags
 - Renamed series back to RFC
 - Modified logic to ensure that fifo_size is reset if we pass the minimum
   threshold.  Tested with binding multiple FDs requesting 6 FIFOs.

Changes in V2:
 - Modified TXFIFO resizing logic to ensure that each EP is reserved a
   FIFO.
 - Removed dev_dbg() prints and fixed typos from patches
 - Added some more description on the dt-bindings commit message

Currently, there is no functionality to allow for resizing the TXFIFOs, and
relying on the HW default setting for the TXFIFO depth.  In most cases, the
HW default is probably sufficient, but for USB compositions that contain
multiple functions that require EP bursting, the default settings
might not be enough.  Also to note, the current SW will assign an EP to a
function driver w/o checking to see if the TXFIFO size for that particular
EP is large enough. (this is a problem if there are multiple HW defined
values for the TXFIFO size)

It is mentioned in the SNPS databook that a minimum of TX FIFO depth = 3
is required for an EP that supports bursting.  Otherwise, there may be
frequent occurences of bursts ending.  For high bandwidth functions,
such as data tethering (protocols that support data aggregation), mass
storage, and media transfer protocol (over FFS), the bMaxBurst value can be
large, and a bigger TXFIFO depth may prove to be beneficial in terms of USB
throughput. (which can be associated to system access latency, etc...)  It
allows for a more consistent burst of traffic, w/o any interruptions, as
data is readily available in the FIFO.

With testing done using the mass storage function driver, the results show
that with a larger TXFIFO depth, the bandwidth increased significantly.

Test Parameters:
 - Platform: Qualcomm SM8150
 - bMaxBurst = 6
 - USB req size = 256kB
 - Num of USB reqs = 16
 - USB Speed = Super-Speed
 - Function Driver: Mass Storage (w/ ramdisk)
 - Test Application: CrystalDiskMark

Results:

TXFIFO Depth = 3 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     | 
Read      |9 loops    | 193.60
	  |           | 195.86
          |           | 184.77
          |           | 193.60
-------------------------------------------

TXFIFO Depth = 6 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     | 
Read      |9 loops    | 287.35
	  |           | 304.94
          |           | 289.64
          |           | 293.61
-------------------------------------------

Wesley Cheng (3):
  usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic
  dt-bindings: usb: dwc3: Add entry for tx-fifo-resize

 Documentation/devicetree/bindings/usb/dwc3.txt |   2 +-
 arch/arm64/boot/dts/qcom/sm8150.dtsi           |   1 +
 drivers/usb/dwc3/core.c                        |   2 +
 drivers/usb/dwc3/core.h                        |   8 ++
 drivers/usb/dwc3/ep0.c                         |  37 +++++++-
 drivers/usb/dwc3/gadget.c                      | 115 +++++++++++++++++++++++++
 6 files changed, 163 insertions(+), 2 deletions(-)

Comments

Rob Herring May 28, 2020, 2:55 p.m. UTC | #1
On Wed, 27 May 2020 18:46:03 -0700, Wesley Cheng wrote:
> Re-introduce the comment for the tx-fifo-resize setting for the DWC3
> controller.  This allows for vendors to control if they require the TX FIFO
> resizing logic on their HW, as the default FIFO size configurations may
> already be sufficient.
> 
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> ---
>  Documentation/devicetree/bindings/usb/dwc3.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: Rob Herring <robh@kernel.org>
Jack Pham May 29, 2020, 4:28 p.m. UTC | #2
Hi Wesley,

On Wed, May 27, 2020 at 06:46:01PM -0700, Wesley Cheng wrote:
> Some devices have USB compositions which may require multiple endpoints
> that support EP bursting.  HW defined TX FIFO sizes may not always be
> sufficient for these compositions.  By utilizing flexible TX FIFO
> allocation, this allows for endpoints to request the required FIFO depth to
> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
> a larger TX FIFO size results in better TX throughput.
> 
> Ensure that one TX FIFO is reserved for every IN endpoint.  This allows for
> the FIFO logic to prevent running out of FIFO space.
> 
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> ---

<snip>

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 00746c2..9b09528 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -540,6 +540,117 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
>  	return 0;
>  }
>  
> +/*
> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
> + * @dwc: pointer to our context structure
> + *
> + * This function will a best effort FIFO allocation in order
> + * to improve FIFO usage and throughput, while still allowing
> + * us to enable as many endpoints as possible.
> + *
> + * Keep in mind that this operation will be highly dependent
> + * on the configured size for RAM1 - which contains TxFifo -,
> + * the amount of endpoints enabled on coreConsultant tool, and
> + * the width of the Master Bus.
> + *
> + * In general, FIFO depths are represented with the following equation:
> + *
> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
> + *
> + * Conversions can be done to the equation to derive the number of packets that
> + * will fit to a particular FIFO size value.
> + */
> +static int dwc3_gadget_resize_tx_fifos(struct dwc3 *dwc, struct dwc3_ep *dep)

The 'dep' param should be sufficient; we can just get 'dwc' from
dep->dwc. That will make it more clear this function operates on each
endpoint that needs resizing.

> +{
> +	int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
> +	int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;
> +
> +	if (!dwc->needs_fifo_resize)
> +		return 0;
> +
> +	/* resize IN endpoints except ep0 */
> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
> +		return 0;
> +
> +	/* Don't resize already resized IN endpoint */
> +	if (dep->fifo_depth)
> +		return 0;
> +
> +	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
> +	mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
> +	/* MDWIDTH is represented in bits, we need it in bytes */
> +	mdwidth >>= 3;
> +
> +	if (((dep->endpoint.maxburst > 1) &&
> +			usb_endpoint_xfer_bulk(dep->endpoint.desc))
> +			|| usb_endpoint_xfer_isoc(dep->endpoint.desc))
> +		mult = 3;
> +
> +	if ((dep->endpoint.maxburst > 6) &&
> +			usb_endpoint_xfer_bulk(dep->endpoint.desc)
> +			&& dwc3_is_usb31(dwc))
> +		mult = 6;
> +
> +	/* FIFO size for a single buffer */
> +	fifo = (max_packet + mdwidth)/mdwidth;
> +	fifo++;
> +
> +	/* Calculate the number of remaining EPs w/o any FIFO */
> +	num_in_ep = dwc->num_eps/2;
> +	num_in_ep -= dwc->num_ep_resized;
> +	/* Ignore EP0 IN */
> +	num_in_ep--;
> +
> +	/* Reserve at least one FIFO for the number of IN EPs */
> +	min_depth = num_in_ep * (fifo+1);
> +	remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
> +
> +	/* We've already reserved 1 FIFO per EP, so check what we can fit in
> +	 * addition to it.  If there is not enough remaining space, allocate
> +	 * all the remaining space to the EP.
> +	 */
> +	fifo_size = (mult-1) * fifo;
> +	if (remaining < fifo_size) {
> +		if (remaining > 0)
> +			fifo_size = remaining;
> +		else
> +			fifo_size = 0;
> +	}
> +
> +	fifo_size += fifo;
> +	fifo_size++;
> +	dep->fifo_depth = fifo_size;
> +
> +	/* Check if TXFIFOs start at non-zero addr */
> +	tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
> +	fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
> +
> +	fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
> +	if (dwc3_is_usb31(dwc))
> +		dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +	else
> +		dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +
> +	/* Check fifo size allocation doesn't exceed available RAM size. */
> +	if (dwc->last_fifo_depth >= ram1_depth) {
> +		dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
> +				(dwc->last_fifo_depth * mdwidth), ram1_depth,
> +				dep->endpoint.name, fifo_size);

Use dev_WARN() here and eliminate the WARN_ON(1) below?

> +		if (dwc3_is_usb31(dwc))
> +			fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +		else
> +			fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +		dwc->last_fifo_depth -= fifo_size;
> +		dep->fifo_depth = 0;
> +		WARN_ON(1);
> +		return -ENOMEM;
> +	}
> +
> +	dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
> +	dwc->num_ep_resized++;
> +	return 0;
> +}
> +
>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>  {
>  	const struct usb_ss_ep_comp_descriptor *comp_desc;
> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>  	int			ret;
>  
>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
> +		ret = dwc3_gadget_resize_tx_fifos(dwc, dep);
> +		if (ret)
> +			return ret;
> +
>  		ret = dwc3_gadget_start_config(dep);
>  		if (ret)
>  			return ret;

Jack
Wesley Cheng May 30, 2020, 6:31 a.m. UTC | #3
On 5/29/2020 3:12 AM, Greg KH wrote:
> On Wed, May 27, 2020 at 06:46:00PM -0700, Wesley Cheng wrote:
>> Changes in V3:
>>  - Removed "Reviewed-by" tags
>>  - Renamed series back to RFC
>>  - Modified logic to ensure that fifo_size is reset if we pass the minimum
>>    threshold.  Tested with binding multiple FDs requesting 6 FIFOs.
>>
>> Changes in V2:
>>  - Modified TXFIFO resizing logic to ensure that each EP is reserved a
>>    FIFO.
>>  - Removed dev_dbg() prints and fixed typos from patches
>>  - Added some more description on the dt-bindings commit message
>>
>> Currently, there is no functionality to allow for resizing the TXFIFOs, and
>> relying on the HW default setting for the TXFIFO depth.  In most cases, the
>> HW default is probably sufficient, but for USB compositions that contain
>> multiple functions that require EP bursting, the default settings
>> might not be enough.  Also to note, the current SW will assign an EP to a
>> function driver w/o checking to see if the TXFIFO size for that particular
>> EP is large enough. (this is a problem if there are multiple HW defined
>> values for the TXFIFO size)
>>
>> It is mentioned in the SNPS databook that a minimum of TX FIFO depth = 3
>> is required for an EP that supports bursting.  Otherwise, there may be
>> frequent occurences of bursts ending.  For high bandwidth functions,
>> such as data tethering (protocols that support data aggregation), mass
>> storage, and media transfer protocol (over FFS), the bMaxBurst value can be
>> large, and a bigger TXFIFO depth may prove to be beneficial in terms of USB
>> throughput. (which can be associated to system access latency, etc...)  It
>> allows for a more consistent burst of traffic, w/o any interruptions, as
>> data is readily available in the FIFO.
>>
>> With testing done using the mass storage function driver, the results show
>> that with a larger TXFIFO depth, the bandwidth increased significantly.
> 
> Why is this still a "RFC" series?  That implies you don't want this
> applied...
> 

Hi Greg,

As Felipe mentioned, we need to make sure that this TX FIFO resize logic
is carefully thought out, since the behavior could be different based
off the HW configuration as shown in the past.  Eventually, I hope that
this does get applied, but I think the changes needs more detailed
reviews, as there may be potential shortfalls I did not consider due to
my limited knowledge of what happened w/ the previous logic.  That's
pretty much the reason for tagging it as a RFC, since we still need to
hash out if this is the right approach.

Thanks!