[v12,0/6] Re-introduce TX FIFO resize for larger EP bursting

Message ID 1625218655-14180-1-git-send-email-wcheng@codeaurora.org
Series: Re-introduce TX FIFO resize for larger EP bursting

Message

Wesley Cheng July 2, 2021, 9:37 a.m. UTC
Changes in V12:
 - Re-added the stub for of_add_property() and exported it so that it can be
   used by modules.
 - Minor updates to some of the APIs, including updated comments, added error
   handling, etc.

Changes in V11:
 - Added a DWC3 controller revision check to use a different calculation, based
   on Ferry's testing.
 - Removed the descriptor loop in configfs; instead, rely on the ep->claimed
   field still being valid, since ep_autoconf_reset() has not yet been called
   when check_config() runs.
 - Fixed compilation errors when CONFIG_OF is not defined.
 - Removed patch to add stubs for of_add_property()

Changes in V10:
 - Fixed compilation errors in configurations where OF is not used (unknown
   symbol of_add_property()) by adding an of_add_property() stub.
 - Fixed compilation warning for incorrect argument being passed to dwc3_mdwidth

Changes in V9:
 - Fixed incorrect patch in series.  Removed changes in DTSI, as dwc3-qcom will
   add the property by default from the kernel.

Changes in V8:
 - Rebased to usb-testing
 - Used devm_kzalloc() to add the txfifo property in dwc3-qcom
 - Removed DWC3 QCOM ACPI property for enabling the txfifo resize

Changes in V7:
 - Added a new property, tx-fifo-max-num, to limit how much FIFO space the
   resizing logic can allocate for endpoints with large burst values.  This
   can differ across platforms and ties in closely with overall system latency.
 - Added recommended checks for DWC32.
 - Added changes to set the tx-fifo-resize property from dwc3-qcom by default
   instead of modifying the current DTSI files.
 - Added comments on all APIs/variables introduced.
 - Updated the DWC3 YAML to include a better description of the tx-fifo-resize
   property and added an entry for tx-fifo-max-num.

Changes in V6:
 - Rebased patches to usb-testing.
 - Renamed to PATCH series instead of RFC.
 - Checked fs_descriptors instead of ss_descriptors to determine the
   endpoint count for a particular configuration.
 - Re-ordered patch series to fix patch dependencies.

Changes in V5:
 - Added check_config() logic, which communicates the number of EPs used in a
   particular configuration.  Based on this, the DWC3 gadget driver knows the
   maximum number of EPs used across all configurations.  This avoids
   allocating FIFO space to unused EPs and catches FIFO allocation issues at
   bind() time.
 - Changed variable declarations to one per line, in reverse Christmas tree
   order.
 - Created a helper for FIFO clearing, which is used by ep0.c.

Changes in V4:
 - Removed struct dwc3* as an argument for dwc3_gadget_resize_tx_fifos()
 - Removed WARN_ON(1) in case we run out of fifo space
 
Changes in V3:
 - Removed "Reviewed-by" tags
 - Renamed series back to RFC
 - Modified logic to ensure that fifo_size is reset if we pass the minimum
   threshold.  Tested with binding multiple FDs requesting 6 FIFOs.

Changes in V2:
 - Modified TXFIFO resizing logic to ensure that each EP is reserved a
   FIFO.
 - Removed dev_dbg() prints and fixed typos in the patches
 - Added more description to the dt-bindings commit message

Currently, there is no functionality to resize the TXFIFOs, and the driver
relies on the HW default TXFIFO depth.  In most cases, the HW default is
probably sufficient, but for USB compositions that contain multiple functions
requiring EP bursting, the default settings might not be enough.  Also note
that the current SW assigns an EP to a function driver without checking
whether the TXFIFO size for that particular EP is large enough (this is a
problem if there are multiple HW-defined values for the TXFIFO size).

The SNPS databook states that a TX FIFO depth of at least 3 is required for
an EP that supports bursting; otherwise, bursts may end frequently.  For
high-bandwidth functions, such as data tethering (protocols that support data
aggregation), mass storage, and media transfer protocol (over FFS), the
bMaxBurst value can be large, and a bigger TXFIFO depth may improve USB
throughput (which can be tied to system access latency, etc.).  A deeper FIFO
allows a more consistent burst of traffic without interruptions, as data is
readily available in the FIFO.

Testing with the mass storage function driver shows that a larger TXFIFO
depth increases bandwidth significantly: average sequential read throughput
rose from roughly 192 MB/s to roughly 294 MB/s (about 53%) in the runs below.

Test Parameters:
 - Platform: Qualcomm SM8150
 - bMaxBurst = 6
 - USB req size = 256kB
 - Num of USB reqs = 16
 - USB Speed = Super-Speed
 - Function Driver: Mass Storage (w/ ramdisk)
 - Test Application: CrystalDiskMark

Results:

TXFIFO Depth = 3 max packets

Test Case       | Data Size      | AVG tput (MB/s)
--------------------------------------------------
Sequential Read | 1 GB x 9 loops | 193.60
                |                | 195.86
                |                | 184.77
                |                | 193.60
--------------------------------------------------

TXFIFO Depth = 6 max packets

Test Case       | Data Size      | AVG tput (MB/s)
--------------------------------------------------
Sequential Read | 1 GB x 9 loops | 287.35
                |                | 304.94
                |                | 289.64
                |                | 293.61
--------------------------------------------------

Wesley Cheng (6):
  usb: gadget: udc: core: Introduce check_config to verify USB
    configuration
  usb: gadget: configfs: Check USB configuration before adding
  usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  of: Add stub for of_add_property()
  usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default
  dt-bindings: usb: dwc3: Update dwc3 TX fifo properties

 .../devicetree/bindings/usb/snps,dwc3.yaml         |  15 +-
 drivers/of/base.c                                  |   1 +
 drivers/usb/dwc3/core.c                            |   9 +
 drivers/usb/dwc3/core.h                            |  15 ++
 drivers/usb/dwc3/dwc3-qcom.c                       |  15 ++
 drivers/usb/dwc3/ep0.c                             |   2 +
 drivers/usb/dwc3/gadget.c                          | 221 +++++++++++++++++++++
 drivers/usb/gadget/configfs.c                      |   4 +
 drivers/usb/gadget/udc/core.c                      |  19 ++
 include/linux/of.h                                 |   5 +
 include/linux/usb/gadget.h                         |   4 +
 11 files changed, 308 insertions(+), 2 deletions(-)

Comments

Greg KH July 6, 2021, 6:13 p.m. UTC | #1
On Fri, Jul 02, 2021 at 02:37:32AM -0700, Wesley Cheng wrote:
> Some devices have USB compositions which may require multiple endpoints
> that support EP bursting.  HW defined TX FIFO sizes may not always be
> sufficient for these compositions.  By utilizing flexible TX FIFO
> allocation, this allows for endpoints to request the required FIFO depth to
> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
> a larger TX FIFO size results in better TX throughput.
> 
> By introducing the check_config() callback, the resizing logic can fetch
> the maximum number of endpoints used in the USB composition (can contain
> multiple configurations), which helps ensure that the resizing logic can
> fulfill the configuration(s), or return an error to the gadget layer
> otherwise during bind time.
> 
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> ---
>  drivers/usb/dwc3/core.c   |   9 ++
>  drivers/usb/dwc3/core.h   |  15 ++++
>  drivers/usb/dwc3/ep0.c    |   2 +
>  drivers/usb/dwc3/gadget.c | 221 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 247 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index e0a8e79..a7bcdb9d 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1267,6 +1267,7 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>  	u8			rx_max_burst_prd;
>  	u8			tx_thr_num_pkt_prd;
>  	u8			tx_max_burst_prd;
> +	u8			tx_fifo_resize_max_num;
>  	const char		*usb_psy_name;
>  	int			ret;
>  
> @@ -1282,6 +1283,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>  	 */
>  	hird_threshold = 12;
>  
> +	tx_fifo_resize_max_num = 6;
> +

No comment as to why 6 was picked, like the other defaults in this
function?

Why was 6 picked?


>  	dwc->maximum_speed = usb_get_maximum_speed(dev);
>  	dwc->max_ssp_rate = usb_get_maximum_ssp_rate(dev);
>  	dwc->dr_mode = usb_get_dr_mode(dev);
> @@ -1325,6 +1328,10 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>  				&tx_thr_num_pkt_prd);
>  	device_property_read_u8(dev, "snps,tx-max-burst-prd",
>  				&tx_max_burst_prd);
> +	dwc->do_fifo_resize = device_property_read_bool(dev,
> +							"tx-fifo-resize");
> +	device_property_read_u8(dev, "tx-fifo-max-num",
> +				&tx_fifo_resize_max_num);

So you overwrite the "max" with whatever is given to you?  What if
tx-fifo-resize is not enabled?

thanks,

greg k-h
Wesley Cheng July 6, 2021, 8:19 p.m. UTC | #2
On 7/6/2021 11:13 AM, Greg KH wrote:
> On Fri, Jul 02, 2021 at 02:37:32AM -0700, Wesley Cheng wrote:
>> [...]
>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>> index e0a8e79..a7bcdb9d 100644
>> --- a/drivers/usb/dwc3/core.c
>> +++ b/drivers/usb/dwc3/core.c
>> @@ -1267,6 +1267,7 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>>  	u8			rx_max_burst_prd;
>>  	u8			tx_thr_num_pkt_prd;
>>  	u8			tx_max_burst_prd;
>> +	u8			tx_fifo_resize_max_num;
>>  	const char		*usb_psy_name;
>>  	int			ret;
>>  
>> @@ -1282,6 +1283,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>>  	 */
>>  	hird_threshold = 12;
>>  
>> +	tx_fifo_resize_max_num = 6;
>> +

Hi Greg,
> 
> No comment as to why 6 was picked, like the other defaults in this
> function?
> 
> Why was 6 picked?
> 
> 
Talked with Thinh about this sometime back about why 6 was picked.  It
was just an arbitrary setting we decided on throughout our testing, as
that was what provided the best tput numbers for our system.  Hence why
it was suggested to have a separate property, so other vendors can set
this to accommodate their difference in HW latencies.

>>  	dwc->maximum_speed = usb_get_maximum_speed(dev);
>>  	dwc->max_ssp_rate = usb_get_maximum_ssp_rate(dev);
>>  	dwc->dr_mode = usb_get_dr_mode(dev);
>> @@ -1325,6 +1328,10 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>>  				&tx_thr_num_pkt_prd);
>>  	device_property_read_u8(dev, "snps,tx-max-burst-prd",
>>  				&tx_max_burst_prd);
>> +	dwc->do_fifo_resize = device_property_read_bool(dev,
>> +							"tx-fifo-resize");
>> +	device_property_read_u8(dev, "tx-fifo-max-num",
>> +				&tx_fifo_resize_max_num);
> 
> So you overwrite the "max" with whatever is given to you?  What if
> tx-fifo-resize is not enabled?
>
If tx-fifo-resize is not enabled, then there shouldn't be anything that
will reference this property.  As mentioned in the previous comment, HW
vendors may not need a FIFO size of 6 max packets for their particular
system, so they should be able to program this to their needs.

If someone programs this to a large number, the logic will allocate based on
the space left after ensuring enough space for 1 FIFO per ep.

Thanks
Wesley Cheng
Greg KH July 7, 2021, 6:36 a.m. UTC | #3
On Tue, Jul 06, 2021 at 01:19:38PM -0700, Wesley Cheng wrote:
> 
> 
> On 7/6/2021 11:13 AM, Greg KH wrote:
> > On Fri, Jul 02, 2021 at 02:37:32AM -0700, Wesley Cheng wrote:
> >> [...]
> >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> >> index e0a8e79..a7bcdb9d 100644
> >> --- a/drivers/usb/dwc3/core.c
> >> +++ b/drivers/usb/dwc3/core.c
> >> @@ -1267,6 +1267,7 @@ static void dwc3_get_properties(struct dwc3 *dwc)
> >>  	u8			rx_max_burst_prd;
> >>  	u8			tx_thr_num_pkt_prd;
> >>  	u8			tx_max_burst_prd;
> >> +	u8			tx_fifo_resize_max_num;
> >>  	const char		*usb_psy_name;
> >>  	int			ret;
> >>  
> >> @@ -1282,6 +1283,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
> >>  	 */
> >>  	hird_threshold = 12;
> >>  
> >> +	tx_fifo_resize_max_num = 6;
> >> +
> 
> Hi Greg,
> > 
> > No comment as to why 6 was picked, like the other defaults in this
> > function?
> > 
> > Why was 6 picked?
> > 
> > 
> Talked with Thinh about this sometime back about why 6 was picked.  It
> was just an arbitrary setting we decided on throughout our testing, as
> that was what provided the best tput numbers for our system.  Hence why
> it was suggested to have a separate property, so other vendors can set
> this to accommodate their difference in HW latencies.

My point is, this needs to be documented!!!

Right now it just looks like you made up a magic number here.  Look at
the other defaults in this function right above this line.  Those
comments explain what is happening, unlike your change.

Do you want to look at this code in 3 years and wonder why this number
was picked?

> >>  	dwc->maximum_speed = usb_get_maximum_speed(dev);
> >>  	dwc->max_ssp_rate = usb_get_maximum_ssp_rate(dev);
> >>  	dwc->dr_mode = usb_get_dr_mode(dev);
> >> @@ -1325,6 +1328,10 @@ static void dwc3_get_properties(struct dwc3 *dwc)
> >>  				&tx_thr_num_pkt_prd);
> >>  	device_property_read_u8(dev, "snps,tx-max-burst-prd",
> >>  				&tx_max_burst_prd);
> >> +	dwc->do_fifo_resize = device_property_read_bool(dev,
> >> +							"tx-fifo-resize");
> >> +	device_property_read_u8(dev, "tx-fifo-max-num",
> >> +				&tx_fifo_resize_max_num);
> > 
> > So you overwrite the "max" with whatever is given to you?  What if
> > tx-fifo-resize is not enabled?
> >
> If tx-fifo-resize is not enabled, then there shouldn't be anything that
> will reference this property.  As mentioned in the previous comment, HW
> vendors may not need a FIFO size of 6 max packets for their particular
> system, so they should be able to program this to their needs.


That's fine, but is that what is really happening here?  You are not
looking at the "do_fifo_resize" value before you try to read the "max"
value, why not?

> If someone programs this to a large number, the logic will allocate based
> on the space left after ensuring enough space for 1 FIFO per ep.

Where is that documented or happening?

thanks,

greg k-h