diff mbox series

uvc gadget: Making upper bound of number of usb requests allocated configurable through configfs

Message ID edad1597-48da-49d2-a089-da2487cac889@google.com
State New
Headers show
Series uvc gadget: Making upper bound of number of usb requests allocated configurable through configfs | expand

Commit Message

Jayant Chowdhary Oct. 6, 2023, 10:03 p.m. UTC
Hi Everyone,

We had been seeing the UVC gadget driver receive isoc errors while
sending packets to the usb endpoint - resulting in glitches being shown
on linux hosts. My colleague Avichal Rakesh and others had a very
enlightening discussion at
https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/ 


The conclusion that we came to was : usb requests with actual uvc frame
data were missing their scheduled uframes in the usb controller. As a
mitigation, we started sending 0 length usb requests when there was no
uvc frame buffer available to get data from. Even with this mitigation,
we are seeing glitches - albeit at a lower frequency.

After some investigation, it is seen that we’re getting isoc errors when
the worker thread serving video_pump() work items, doesn’t get scheduled
for longer periods of time - than usual - in most cases > 6ms.
This is close enough to the 8ms limit that we have when the number of usb
requests in the queue is 64 (since we have a 125us uframe period). In order
to tolerate the scheduling delays better, it helps to increase the 
number of
usb requests in the queue . In that case, we have more 0 length requests
given to the udc driver - and as a result we can wait longer for uvc
frames with valid data to get processed by video_pump(). I’m attaching a
patch which lets one configure the upper bound on the number of usb
requests allocated through configfs. Please let me know your thoughts.
I can formalize  the patch if it looks okay.

Thank you

Jayant

---
  .../ABI/testing/configfs-usb-gadget-uvc       |  2 ++
  Documentation/usb/gadget-testing.rst          | 21 ++++++++++++-------
  drivers/usb/gadget/function/f_uvc.c           |  4 +++-
  drivers/usb/gadget/function/u_uvc.h           |  1 +
  drivers/usb/gadget/function/uvc.h             |  3 +++
  drivers/usb/gadget/function/uvc_configfs.c    |  2 ++
  drivers/usb/gadget/function/uvc_queue.c       |  5 ++++-
  7 files changed, 28 insertions(+), 10 deletions(-)

      if (*nbuffers > UVC_MAX_VIDEO_BUFFERS)
@@ -54,6 +55,8 @@ static int uvc_queue_setup(struct vb2_queue *vq,

      sizes[0] = video->imagesize;

+    max_usb_requests = video->uvc->streaming_max_usb_requests;
+
      req_size = video->ep->maxpacket
           * max_t(unsigned int, video->ep->maxburst, 1)
           * (video->ep->mult);
@@ -62,7 +65,7 @@ static int uvc_queue_setup(struct vb2_queue *vq,
       * into fewer requests for smaller framesizes.
       */
      nreq = DIV_ROUND_UP(DIV_ROUND_UP(sizes[0], 2), req_size);
-    nreq = clamp(nreq, 4U, 64U);
+    nreq = clamp(nreq, 4U, max_usb_requests);
      video->uvc_num_requests = nreq;

      return 0;

Comments

Jayant Chowdhary Oct. 9, 2023, 10:34 p.m. UTC | #1
> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>> Hi Everyone,
>>
>> We had been seeing the UVC gadget driver receive isoc errors while
>> sending packets to the usb endpoint - resulting in glitches being shown
>> on linux hosts. My colleague Avichal Rakesh and others had a very
>> enlightening discussion at
>> https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/
>>
>>
>> The conclusion that we came to was : usb requests with actual uvc frame
>> data were missing their scheduled uframes in the usb controller. As a
>> mitigation, we started sending 0 length usb requests when there was no
>> uvc frame buffer available to get data from. Even with this mitigation,
>> we are seeing glitches - albeit at a lower frequency.
>>
>> After some investigation, it is seen that we’re getting isoc errors when
>> the worker thread serving video_pump() work items, doesn’t get scheduled
>> for longer periods of time - than usual - in most cases > 6ms.
>> This is close enough to the 8ms limit that we have when the number of usb
>> requests in the queue is 64 (since we have a 125us uframe period). In order
>> to tolerate the scheduling delays better, it helps to increase the number of
>> usb requests in the queue . In that case, we have more 0 length requests
>> given to the udc driver - and as a result we can wait longer for uvc
>> frames with valid data to get processed by video_pump(). I’m attaching a
>> patch which lets one configure the upper bound on the number of usb
>> requests allocated through configfs. Please let me know your thoughts.
>> I can formalize  the patch if it looks okay.
> Why do you want to limit the upper bound?  Why not just not submit so
> many requests from userspace as you control that, right?


Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
not necessarily correspond to the ISOC cadence. After the
patch at https://lkml.org/lkml/diff/2023/5/8/1115/1 was submitted, we are
maintaining back pressure on the usb controller even if we do not have uvc frame
data, by sending the controller 0 length requests (as long as usb_requests are
available). Also, even if the userspace application were to somehow produce
data to match the ISOC rate, it would  need to have information about USB
timing details - which I am not sure is available to userspace or is the right
thing to do here ?

Here, we are trying to handle the scenario in which the video_pump() worker
thread does not get scheduled in time - by increasing the number of usb requests
allocated in the queue. This would send more usb requests to the usb controller,
when video_pump() does get scheduled - even if they're 0 length. This buys
the video_pump() worker thread scheduling time -since more usb requests
are with the controller, subsequent requests sent will not be 'stale' and
dropped by the usb controller.

>
>> Thank you
>>
>> Jayant
>>
>> ---
>>  .../ABI/testing/configfs-usb-gadget-uvc       |  2 ++
>>  Documentation/usb/gadget-testing.rst          | 21 ++++++++++++-------
>>  drivers/usb/gadget/function/f_uvc.c           |  4 +++-
>>  drivers/usb/gadget/function/u_uvc.h           |  1 +
>>  drivers/usb/gadget/function/uvc.h             |  3 +++
>>  drivers/usb/gadget/function/uvc_configfs.c    |  2 ++
>>  drivers/usb/gadget/function/uvc_queue.c       |  5 ++++-
>>  7 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> index 4feb692c4c1d..9bc58440e1b7 100644
>> --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> @@ -7,6 +7,8 @@ Description:    UVC function directory
>>          streaming_maxburst    0..15 (ss only)
>>          streaming_maxpacket    1..1023 (fs), 1..3072 (hs/ss)
>>          streaming_interval    1..16
>> +         streaming_max_usb_requests 64..256
>> +
>>          function_name        string [32]
>>          ===================    =============================
>>
>> diff --git a/Documentation/usb/gadget-testing.rst
>> b/Documentation/usb/gadget-testing.rst
>> index 29072c166d23..24a62ba8e870 100644
>> --- a/Documentation/usb/gadget-testing.rst
>> +++ b/Documentation/usb/gadget-testing.rst
>> @@ -790,14 +790,19 @@ Function-specific configfs interface
>>  The function name to use when creating the function directory is "uvc".
>>  The uvc function provides these attributes in its function directory:
>>
>> -    =================== ================================================
>> -    streaming_interval  interval for polling endpoint for data transfers
>> -    streaming_maxburst  bMaxBurst for super speed companion descriptor
>> -    streaming_maxpacket maximum packet size this endpoint is capable of
>> -                sending or receiving when this configuration is
>> -                selected
>> -    function_name       name of the interface
>> -    =================== ================================================
>> +    =================== ===========================================
>> +    streaming_interval         interval for polling endpoint for data
>> +                    transfers
>> +    streaming_maxburst          bMaxBurst for super speed companion
>> +                    descriptor
>> +    streaming_maxpacket         maximum packet size this endpoint is
>> +                    capable of sending or receiving when
>> +                    this configuration is selected
>> +        streaming_max_usb_requests  upper bound for the number of usb
>> requests
>> +                        the gadget driver will allocate for
>> +                    sending to the endpoint.
>> +    function_name            name of the interface
> Note, your patch is whitespace damaged and line-wrapped, making it
> really really hard to read and impossible to apply.


My apologies and thank you for your patience. I have attached the patch again
for discussion.
(Also fixed a mailing list typo I had made in the previous e-mail)

Thanks,
Jayant

---
 .../ABI/testing/configfs-usb-gadget-uvc       |  2 ++
 Documentation/usb/gadget-testing.rst          | 21 ++++++++++++-------
 drivers/usb/gadget/function/f_uvc.c           |  4 +++-
 drivers/usb/gadget/function/u_uvc.h           |  1 +
 drivers/usb/gadget/function/uvc.h             |  3 +++
 drivers/usb/gadget/function/uvc_configfs.c    |  2 ++
 drivers/usb/gadget/function/uvc_queue.c       |  5 ++++-
 7 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
index 4feb692c4c1d..9bc58440e1b7 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
@@ -7,6 +7,8 @@ Description:	UVC function directory
 		streaming_maxburst	0..15 (ss only)
 		streaming_maxpacket	1..1023 (fs), 1..3072 (hs/ss)
 		streaming_interval	1..16
+		 streaming_max_usb_requests 64..256
+
 		function_name		string [32]
 		===================	=============================
 
diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
index 29072c166d23..24a62ba8e870 100644
--- a/Documentation/usb/gadget-testing.rst
+++ b/Documentation/usb/gadget-testing.rst
@@ -790,14 +790,19 @@ Function-specific configfs interface
 The function name to use when creating the function directory is "uvc".
 The uvc function provides these attributes in its function directory:
 
-	=================== ================================================
-	streaming_interval  interval for polling endpoint for data transfers
-	streaming_maxburst  bMaxBurst for super speed companion descriptor
-	streaming_maxpacket maximum packet size this endpoint is capable of
-			    sending or receiving when this configuration is
-			    selected
-	function_name       name of the interface
-	=================== ================================================
+	===================         ===========================================
+	streaming_interval 	    interval for polling endpoint for data
+				    transfers
+	streaming_maxburst  	    bMaxBurst for super speed companion
+				    descriptor
+	streaming_maxpacket         maximum packet size this endpoint is
+				    capable of sending or receiving when
+				    this configuration is selected
+	streaming_max_usb_requests  upper bound for the number of usb requests
+				    the gadget driver will allocate for
+				    sending to the endpoint.
+	function_name		    name of the interface
+	===================         ============================================
 
 There are also "control" and "streaming" subdirectories, each of which contain
 a number of their subdirectories. There are some sane defaults provided, but
diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index faa398109431..32a296ea37d7 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -659,7 +659,8 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f)
 	opts->streaming_interval = clamp(opts->streaming_interval, 1U, 16U);
 	opts->streaming_maxpacket = clamp(opts->streaming_maxpacket, 1U, 3072U);
 	opts->streaming_maxburst = min(opts->streaming_maxburst, 15U);
-
+	opts->streaming_max_usb_requests = clamp(opts->streaming_max_usb_requests, 64U, 256U);
+	uvc->streaming_max_usb_requests = opts->streaming_max_usb_requests;
 	/* For SS, wMaxPacketSize has to be 1024 if bMaxBurst is not 0 */
 	if (opts->streaming_maxburst &&
 	    (opts->streaming_maxpacket % 1024) != 0) {
@@ -934,6 +935,7 @@ static struct usb_function_instance *uvc_alloc_inst(void)
 
 	opts->streaming_interval = 1;
 	opts->streaming_maxpacket = 1024;
+	opts->streaming_max_usb_requests = 64;
 	snprintf(opts->function_name, sizeof(opts->function_name), "UVC Camera");
 
 	ret = uvcg_attach_configfs(opts);
diff --git a/drivers/usb/gadget/function/u_uvc.h b/drivers/usb/gadget/function/u_uvc.h
index 1ce58f61253c..075fee178418 100644
--- a/drivers/usb/gadget/function/u_uvc.h
+++ b/drivers/usb/gadget/function/u_uvc.h
@@ -24,6 +24,7 @@ struct f_uvc_opts {
 	unsigned int					streaming_interval;
 	unsigned int					streaming_maxpacket;
 	unsigned int					streaming_maxburst;
+	unsigned int					streaming_max_usb_requests;
 
 	unsigned int					control_interface;
 	unsigned int					streaming_interface;
diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
index 6751de8b63ad..943c074157fa 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -157,6 +157,9 @@ struct uvc_device {
 	/* Events */
 	unsigned int event_length;
 	unsigned int event_setup_out : 1;
+
+	/* Max number of usb requests allocated to send to the ep*/
+	unsigned int streaming_max_usb_requests;
 };
 
 static inline struct uvc_device *to_uvc(struct usb_function *f)
diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
index 9bf0e985acfa..0ad9eae4845e 100644
--- a/drivers/usb/gadget/function/uvc_configfs.c
+++ b/drivers/usb/gadget/function/uvc_configfs.c
@@ -3403,6 +3403,7 @@ UVC_ATTR(f_uvc_opts_, cname, cname)
 UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
 UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
 UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
+UVCG_OPTS_ATTR(streaming_max_usb_requests, streaming_max_usb_requests, 256);
 
 #undef UVCG_OPTS_ATTR
 
@@ -3453,6 +3454,7 @@ static struct configfs_attribute *uvc_attrs[] = {
 	&f_uvc_opts_attr_streaming_maxpacket,
 	&f_uvc_opts_attr_streaming_maxburst,
 	&f_uvc_opts_string_attr_function_name,
+	&f_uvc_opts_attr_streaming_max_usb_requests,
 	NULL,
 };
 
diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index 0aa3d7e1f3cc..b81c2d8e5ef2 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -45,6 +45,7 @@ static int uvc_queue_setup(struct vb2_queue *vq,
 	struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
 	struct uvc_video *video = container_of(queue, struct uvc_video, queue);
 	unsigned int req_size;
+	unsigned int max_usb_requests;
 	unsigned int nreq;
 
 	if (*nbuffers > UVC_MAX_VIDEO_BUFFERS)
@@ -54,6 +55,8 @@ static int uvc_queue_setup(struct vb2_queue *vq,
 
 	sizes[0] = video->imagesize;
 
+	max_usb_requests = video->uvc->streaming_max_usb_requests;
+
 	req_size = video->ep->maxpacket
 		 * max_t(unsigned int, video->ep->maxburst, 1)
 		 * (video->ep->mult);
@@ -62,7 +65,7 @@ static int uvc_queue_setup(struct vb2_queue *vq,
 	 * into fewer requests for smaller framesizes.
 	 */
 	nreq = DIV_ROUND_UP(DIV_ROUND_UP(sizes[0], 2), req_size);
-	nreq = clamp(nreq, 4U, 64U);
+	nreq = clamp(nreq, 4U, max_usb_requests);
 	video->uvc_num_requests = nreq;
 
 	return 0;
--
Jayant Chowdhary Oct. 16, 2023, 4:33 a.m. UTC | #2
Hi Thinh,


On 10/12/23 11:50, Thinh Nguyen wrote:
> Hi,
>
> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>> Hi Everyone,
>>>>
>>>> We had been seeing the UVC gadget driver receive isoc errors while
>>>> sending packets to the usb endpoint - resulting in glitches being shown
>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
>>>> enlightening discussion at
>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$ 
>>>>
>>>>
>>>> The conclusion that we came to was : usb requests with actual uvc frame
>>>> data were missing their scheduled uframes in the usb controller. As a
>>>> mitigation, we started sending 0 length usb requests when there was no
>>>> uvc frame buffer available to get data from. Even with this mitigation,
>>>> we are seeing glitches - albeit at a lower frequency.
>>>>
>>>> After some investigation, it is seen that we’re getting isoc errors when
>>>> the worker thread serving video_pump() work items, doesn’t get scheduled
>>>> for longer periods of time - than usual - in most cases > 6ms.
>>>> This is close enough to the 8ms limit that we have when the number of usb
>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
>>>> to tolerate the scheduling delays better, it helps to increase the number of
>>>> usb requests in the queue . In that case, we have more 0 length requests
>>>> given to the udc driver - and as a result we can wait longer for uvc
>>>> frames with valid data to get processed by video_pump(). I’m attaching a
>>>> patch which lets one configure the upper bound on the number of usb
>>>> requests allocated through configfs. Please let me know your thoughts.
>>>> I can formalize  the patch if it looks okay.
>>> Why do you want to limit the upper bound?  Why not just not submit so
>>> many requests from userspace as you control that, right?
>>
>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
>> not necessarily correspond to the ISOC cadence. After the
>> patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$  was submitted, we are
>> maintaining back pressure on the usb controller even if we do not have uvc frame
>> data, by sending the controller 0 length requests (as long as usb_requests are
>> available). Also, even if the userspace application were to somehow produce
>> data to match the ISOC rate, it would  need to have information about USB
>> timing details - which I am not sure is available to userspace or is the right
>> thing to do here ?
>>
>> Here, we are trying to handle the scenario in which the video_pump() worker
>> thread does not get scheduled in time - by increasing the number of usb requests
>> allocated in the queue. This would send more usb requests to the usb controller,
>> when video_pump() does get scheduled - even if they're 0 length. This buys
>> the video_pump() worker thread scheduling time -since more usb requests
>> are with the controller, subsequent requests sent will not be 'stale' and
>> dropped by the usb controller.
>>
> I believe you're testing against dwc3 controller right? I may not be as
> familiar with UVC function driver, but based on the previous
> discussions, I think the driver should be able to handle this without
> the user input.

Yes we are testing against the dwc3 controller.

>
> The frequency of the request submission should not depend on the
> video_pump() work thread since it can vary. The frequency of request
> submission should match with the request completion. We know that
> request completion rate should be fixed (1 uframe/request + when you
> don't set no_interrupt). Base on this you can do your calculation on how
> often you should set no_interrupt and how many requests you must submit.
> You don't have to wait for the video_pump() to submit 0-length requests.
>
> The only variable here is the completion handler delay or system
> latency, which should not be much and should be within your calculation.


Thanks for the suggestion. It indeed makes sense that we do not completely depend on
video_pump() for sending 0 length requests. I was concerned about
synchronization needed when we send requests to the dwc3 controller from
different threads. I see that the dwc3 controller code does internally serialize
queueing requests, can we expect this from other controllers as well ? 

This brings me to another question for Michael - I see
that we introduced a worker thread for pumping  usb requests to the usb endpoint
in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/
(I see multiple email addresses, so apologies if I used the incorrect one).

Did we introduce the worker thread to solve some specific deadlock scenarios ?
Or was it a general mitigation against racy usb request submission from v4l2 buffer
queuing, stream enable and the video complete handler firing ?

I was chatting with Avi about this, what if we submit requests to the endpoint
only at two points in the streaming lifecycle - 
1) The whole 64 (or however many usb requests are allocated) when
   uvcg_video_enable() is called - with 0 length usb_requests.
2) In the video complete handler - if a video buffer is available, we encode it
   and submit it to the endpoint. If not, we send a 0 length request.

This way we're really maintaining back pressure and sending requests as soon
as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
so hopefully not too heavy on the interrupt handler. I will work on prototyping
this meanwhile.

Thanks,
Jayant
Michael Grzeschik Oct. 18, 2023, 1:28 p.m. UTC | #3
On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
>On 10/12/23 11:50, Thinh Nguyen wrote:
>> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>>> We had been seeing the UVC gadget driver receive isoc errors while
>>>>> sending packets to the usb endpoint - resulting in glitches being shown
>>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
>>>>> enlightening discussion at
>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$
>>>>>
>>>>>
>>>>> The conclusion that we came to was : usb requests with actual uvc frame
>>>>> data were missing their scheduled uframes in the usb controller. As a
>>>>> mitigation, we started sending 0 length usb requests when there was no
>>>>> uvc frame buffer available to get data from. Even with this mitigation,
>>>>> we are seeing glitches - albeit at a lower frequency.
>>>>>
>>>>> After some investigation, it is seen that we’re getting isoc errors when
>>>>> the worker thread serving video_pump() work items, doesn’t get scheduled
>>>>> for longer periods of time - than usual - in most cases > 6ms.
>>>>> This is close enough to the 8ms limit that we have when the number of usb
>>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
>>>>> to tolerate the scheduling delays better, it helps to increase the number of
>>>>> usb requests in the queue . In that case, we have more 0 length requests
>>>>> given to the udc driver - and as a result we can wait longer for uvc
>>>>> frames with valid data to get processed by video_pump(). I’m attaching a
>>>>> patch which lets one configure the upper bound on the number of usb
>>>>> requests allocated through configfs. Please let me know your thoughts.
>>>>> I can formalize  the patch if it looks okay.
>>>> Why do you want to limit the upper bound?  Why not just not submit so
>>>> many requests from userspace as you control that, right?
>>>
>>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
>>> not necessarily correspond to the ISOC cadence. After the
>>> patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$  was submitted, we are
>>> maintaining back pressure on the usb controller even if we do not have uvc frame
>>> data, by sending the controller 0 length requests (as long as usb_requests are
>>> available). Also, even if the userspace application were to somehow produce
>>> data to match the ISOC rate, it would  need to have information about USB
>>> timing details - which I am not sure is available to userspace or is the right
>>> thing to do here ?
>>>
>>> Here, we are trying to handle the scenario in which the video_pump() worker
>>> thread does not get scheduled in time - by increasing the number of usb requests
>>> allocated in the queue. This would send more usb requests to the usb controller,
>>> when video_pump() does get scheduled - even if they're 0 length. This buys
>>> the video_pump() worker thread scheduling time -since more usb requests
>>> are with the controller, subsequent requests sent will not be 'stale' and
>>> dropped by the usb controller.
>>>
>> I believe you're testing against dwc3 controller right? I may not be as
>> familiar with UVC function driver, but based on the previous
>> discussions, I think the driver should be able to handle this without
>> the user input.
>
>Yes we are testing against the dwc3 controller.
>
>>
>> The frequency of the request submission should not depend on the
>> video_pump() work thread since it can vary. The frequency of request
>> submission should match with the request completion. We know that
>> request completion rate should be fixed (1 uframe/request + when you
>> don't set no_interrupt). Base on this you can do your calculation on how
>> often you should set no_interrupt and how many requests you must submit.
>> You don't have to wait for the video_pump() to submit 0-length requests.
>>
>> The only variable here is the completion handler delay or system
>> latency, which should not be much and should be within your calculation.
>
>
>Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>video_pump() for sending 0 length requests. I was concerned about
>synchronization needed when we send requests to the dwc3 controller from
>different threads. I see that the dwc3 controller code does internally serialize
>queueing requests, can we expect this from other controllers as well ?
>
>This brings me to another question for Michael - I see
>that we introduced a worker thread for pumping  usb requests to the usb endpoint
>in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/
>(I see multiple email addresses, so apologies if I used the incorrect one).
>
>Did we introduce the worker thread to solve some specific deadlock scenarios ?

Exactly. This was the reason why we moved to the pump worker. I actually
looked into the host side implementation of the uvc driver. There we
also queue an worker from the complete function:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646

So this sounded reasonable to me. However we faced similar issues like
you and introduced different ways to improve the latency issue.

One thing we did was improving the latency by adding WQ_HIGHPRI

https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@pengutronix.de/

Another patch here is also adding WQ_CPU_INTENSIVE.

But, after all the input from Thinh it is probably better to solve the
issue in a more reliable way.

>Or was it a general mitigation against racy usb request submission from v4l2 buffer
>queuing, stream enable and the video complete handler firing ?

I don't remember all of the issues we saw back then. But this is also an very
likely scenario.

>I was chatting with Avi about this, what if we submit requests to the endpoint
>only at two points in the streaming lifecycle -
>1) The whole 64 (or however many usb requests are allocated) when
>   uvcg_video_enable() is called - with 0 length usb_requests.
>2) In the video complete handler - if a video buffer is available, we encode it
>   and submit it to the endpoint. If not, we send a 0 length request.

It really sounds like a good idea.

>This way we're really maintaining back pressure and sending requests as soon
>as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>so hopefully not too heavy on the interrupt handler. I will work on prototyping
>this meanwhile.

Great!

Michael
Michael Grzeschik Oct. 19, 2023, 11:15 p.m. UTC | #4
On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
>On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
>>On 10/12/23 11:50, Thinh Nguyen wrote:
>>>On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>>>>On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>>>>We had been seeing the UVC gadget driver receive isoc errors while
>>>>>>sending packets to the usb endpoint - resulting in glitches being shown
>>>>>>on linux hosts. My colleague Avichal Rakesh and others had a very
>>>>>>enlightening discussion at
>>>>>>https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$
>>>>>>
>>>>>>
>>>>>>The conclusion that we came to was : usb requests with actual uvc frame
>>>>>>data were missing their scheduled uframes in the usb controller. As a
>>>>>>mitigation, we started sending 0 length usb requests when there was no
>>>>>>uvc frame buffer available to get data from. Even with this mitigation,
>>>>>>we are seeing glitches - albeit at a lower frequency.
>>>>>>
>>>>>>After some investigation, it is seen that we’re getting isoc errors when
>>>>>>the worker thread serving video_pump() work items, doesn’t get scheduled
>>>>>>for longer periods of time - than usual - in most cases > 6ms.
>>>>>>This is close enough to the 8ms limit that we have when the number of usb
>>>>>>requests in the queue is 64 (since we have a 125us uframe period). In order
>>>>>>to tolerate the scheduling delays better, it helps to increase the number of
>>>>>>usb requests in the queue . In that case, we have more 0 length requests
>>>>>>given to the udc driver - and as a result we can wait longer for uvc
>>>>>>frames with valid data to get processed by video_pump(). I’m attaching a
>>>>>>patch which lets one configure the upper bound on the number of usb
>>>>>>requests allocated through configfs. Please let me know your thoughts.
>>>>>>I can formalize  the patch if it looks okay.
>>>>>Why do you want to limit the upper bound?  Why not just not submit so
>>>>>many requests from userspace as you control that, right?
>>>>
>>>>Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
>>>>not necessarily correspond to the ISOC cadence. After the
>>>>patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$  was submitted, we are
>>>>maintaining back pressure on the usb controller even if we do not have uvc frame
>>>>data, by sending the controller 0 length requests (as long as usb_requests are
>>>>available). Also, even if the userspace application were to somehow produce
>>>>data to match the ISOC rate, it would  need to have information about USB
>>>>timing details - which I am not sure is available to userspace or is the right
>>>>thing to do here ?
>>>>
>>>>Here, we are trying to handle the scenario in which the video_pump() worker
>>>>thread does not get scheduled in time - by increasing the number of usb requests
>>>>allocated in the queue. This would send more usb requests to the usb controller,
>>>>when video_pump() does get scheduled - even if they're 0 length. This buys
>>>>the video_pump() worker thread scheduling time -since more usb requests
>>>>are with the controller, subsequent requests sent will not be 'stale' and
>>>>dropped by the usb controller.
>>>>
>>>I believe you're testing against dwc3 controller right? I may not be as
>>>familiar with UVC function driver, but based on the previous
>>>discussions, I think the driver should be able to handle this without
>>>the user input.
>>
>>Yes we are testing against the dwc3 controller.
>>
>>>
>>>The frequency of the request submission should not depend on the
>>>video_pump() work thread since it can vary. The frequency of request
>>>submission should match with the request completion. We know that
>>>request completion rate should be fixed (1 uframe/request + when you
>>>don't set no_interrupt). Base on this you can do your calculation on how
>>>often you should set no_interrupt and how many requests you must submit.
>>>You don't have to wait for the video_pump() to submit 0-length requests.
>>>
>>>The only variable here is the completion handler delay or system
>>>latency, which should not be much and should be within your calculation.
>>
>>
>>Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>video_pump() for sending 0 length requests. I was concerned about
>>synchronization needed when we send requests to the dwc3 controller from
>>different threads. I see that the dwc3 controller code does internally serialize
>>queueing requests, can we expect this from other controllers as well ?
>>
>>This brings me to another question for Michael - I see
>>that we introduced a worker thread for pumping  usb requests to the usb endpoint
>>in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/
>>(I see multiple email addresses, so apologies if I used the incorrect one).
>>
>>Did we introduce the worker thread to solve some specific deadlock scenarios ?
>
>Exactly. This was the reason why we moved to the pump worker. I actually
>looked into the host side implementation of the uvc driver. There we
>also queue an worker from the complete function:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646
>
>So this sounded reasonable to me. However we faced similar issues like
>you and introduced different ways to improve the latency issue.
>
>One thing we did was improving the latency by adding WQ_HIGHPRI
>
>https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@pengutronix.de/
>
>Another patch here is also adding WQ_CPU_INTENSIVE.
>
>But, after all the input from Thinh it is probably better to solve the
>issue in a more reliable way.
>
>>Or was it a general mitigation against racy usb request submission from v4l2 buffer
>>queuing, stream enable and the video complete handler firing ?
>
>I don't remember all of the issues we saw back then. But this is also an very
>likely scenario.
>
>>I was chatting with Avi about this, what if we submit requests to the endpoint
>>only at two points in the streaming lifecycle -
>>1) The whole 64 (or however many usb requests are allocated) when
>>  uvcg_video_enable() is called - with 0 length usb_requests.
>>2) In the video complete handler - if a video buffer is available, we encode it
>>  and submit it to the endpoint. If not, we send a 0 length request.
>
>It really sounds like a good idea.
>
>>This way we're really maintaining back pressure and sending requests as soon
>>as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>>so hopefully not too heavy on the interrupt handler. I will work on prototyping
>>this meanwhile.

[1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t

It was actually not that hard to do that.
With the patches from this thread applied [1] , the unformal changes looks like this:

#change 1

diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index f64d03136c5665..29cd23c38eb99d 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
  	if (ret < 0)
  		return ret;
  
-	if (uvc->state == UVC_STATE_STREAMING)
-		queue_work(video->async_wq, &video->pump);
+	uvcg_video_pump(video);
  
  	return ret;
  }
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 2ec51ed5e9d074..2fe800500c88a3 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
  	 */
  	if (video->is_enabled) {
  		list_add_tail(&req->list, &video->req_free);
-		queue_work(video->async_wq, &video->pump);
+		spin_unlock_irqrestore(&video->req_lock, flags);
+		uvcg_video_pump(video);
+		return;
  	} else {
  		uvc_video_free_request(ureq, ep);
  	}
@@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
   * This function fills the available USB requests (listed in req_free) with
   * video data from the queued buffers.
   */
-static void uvcg_video_pump(struct work_struct *work)
+int uvcg_video_pump(struct uvc_video *video)
  {
-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
  	struct uvc_video_queue *queue = &video->queue;
  	/* video->max_payload_size is only set when using bulk transfer */
  	bool is_bulk = video->max_payload_size;
@@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work)
  
  	while(true) {
  		if (!video->ep->enabled)
-			return;
+			return 0;
  
  		/*
  		 * Check is_enabled and retrieve the first available USB
@@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work)
  		spin_lock_irqsave(&video->req_lock, flags);
  		if (!video->is_enabled || list_empty(&video->req_free)) {
  			spin_unlock_irqrestore(&video->req_lock, flags);
-			return;
+			return -EBUSY;
  		}
  		req = list_first_entry(&video->req_free, struct usb_request,
  					list);
@@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work)
  	}
  
  	if (!req)
-		return;
+		return 0;
  
  	spin_lock_irqsave(&video->req_lock, flags);
  	if (video->is_enabled)
@@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work)
  	else
  		uvc_video_free_request(req->context, video->ep);
  	spin_unlock_irqrestore(&video->req_lock, flags);
+
+	return 0;
  }
  
  /*
@@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video)
  	}
  	spin_unlock_irqrestore(&video->req_lock, flags);
  
-	cancel_work_sync(&video->pump);
  	uvcg_queue_cancel(&video->queue, 0);
  
  	spin_lock_irqsave(&video->req_lock, flags);
@@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
  
  	video->req_int_count = 0;
  
-	queue_work(video->async_wq, &video->pump);
-
  	return ret;
  }
  
@@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
  	INIT_LIST_HEAD(&video->ureqs);
  	INIT_LIST_HEAD(&video->req_free);
  	spin_lock_init(&video->req_lock);
-	INIT_WORK(&video->pump, uvcg_video_pump);
-
-	/* Allocate a work queue for asynchronous video pump handler. */
-	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
-	if (!video->async_wq)
-		return -EINVAL;
  
  	video->uvc = uvc;
  
diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
index 03adeefa343b71..322b05da43965f 100644
--- a/drivers/usb/gadget/function/uvc_video.h
+++ b/drivers/usb/gadget/function/uvc_video.h
@@ -16,6 +16,8 @@ struct uvc_video;
  
  int uvcg_video_enable(struct uvc_video *video, int enable);
  
+int uvcg_video_pump(struct uvc_video *video);
+
  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
  
  #endif /* __UVC_VIDEO_H__ */


#change 2

Also if you would like to revert the zero request generation apply this ontop.

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 82695a2ff39aa3..2a3c87079c548d 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -418,12 +418,9 @@ uvc_video_alloc_requests(struct uvc_video *video)
  int uvcg_video_pump(struct uvc_video *video)
  {
  	struct uvc_video_queue *queue = &video->queue;
-	/* video->max_payload_size is only set when using bulk transfer */
-	bool is_bulk = video->max_payload_size;
  	struct usb_request *req = NULL;
  	struct uvc_buffer *buf;
  	unsigned long flags;
-	bool buf_done;
  	int ret;
  
  	while(true) {
@@ -450,28 +447,13 @@ int uvcg_video_pump(struct uvc_video *video)
  		 */
  		spin_lock_irqsave(&queue->irqlock, flags);
  		buf = uvcg_queue_head(queue);
-
-		if (buf != NULL) {
-			video->encode(req, video, buf);
-			buf_done = buf->state == UVC_BUF_STATE_DONE;
-		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
-			/*
-			 * No video buffer available; the queue is still connected and
-			 * we're transferring over ISOC. Queue a 0 length request to
-			 * prevent missed ISOC transfers.
-			 */
-			req->length = 0;
-			buf_done = false;
-		} else {
-			/*
-			 * Either the queue has been disconnected or no video buffer
-			 * available for bulk transfer. Either way, stop processing
-			 * further.
-			 */
+		if (buf == NULL) {
  			spin_unlock_irqrestore(&queue->irqlock, flags);
  			break;
  		}
  
+		video->encode(req, video, buf);
+
  		/*
  		 * With USB3 handling more requests at a higher speed, we can't
  		 * afford to generate an interrupt for every request. Decide to
@@ -490,7 +472,8 @@ int uvcg_video_pump(struct uvc_video *video)
  		 *   indicated by video->uvc_num_requests), as a trade-off
  		 *   between latency and interrupt load.
  		 */
-		if (list_empty(&video->req_free) || buf_done ||
+		if (list_empty(&video->req_free) ||
+		    buf->state == UVC_BUF_STATE_DONE ||
  		    !(video->req_int_count %
  		       DIV_ROUND_UP(video->uvc_num_requests, 4))) {
  			video->req_int_count = 0;
@@ -510,7 +493,8 @@ int uvcg_video_pump(struct uvc_video *video)
  
  		/* Endpoint now owns the request */
  		req = NULL;
-		video->req_int_count++;
+		if(buf->state != UVC_BUF_STATE_DONE)
+			video->req_int_count++;
  	}
  
  	if (!req)


In my case this did not change a lot with the flickering.

In fact I did see the most effective change when increasing the
fifo size in the dwc3 controller like this in drivers/usb/dwc3/gadget.c

-813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1);
+813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 3);

My system I am testing against is stressed with a high memory bandwidth
use. So it is possible that in this scenario the hardware fifo will not
get filled fast enough. Thus, changing the fifo size helps here. It is
still just a string to pull on but I think it is worth to dig a bit
deeper here.

I am not sure if you are already aware of the following discussion:

https://lore.kernel.org/all/ZPo51EUtBgH+qw44@pengutronix.de/T/

Rergards,
Michael
Jayant Chowdhary Oct. 20, 2023, 5:52 a.m. UTC | #5
Hi Michael,

On 10/19/23 16:15, Michael Grzeschik wrote:
> On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
>> On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>>>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>>>>> We had been seeing the UVC gadget driver receive isoc errors while
>>>>>>> sending packets to the usb endpoint - resulting in glitches being shown
>>>>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
>>>>>>> enlightening discussion at
>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$
>>>>>>>
>>>>>>>
>>>>>>> The conclusion that we came to was : usb requests with actual uvc frame
>>>>>>> data were missing their scheduled uframes in the usb controller. As a
>>>>>>> mitigation, we started sending 0 length usb requests when there was no
>>>>>>> uvc frame buffer available to get data from. Even with this mitigation,
>>>>>>> we are seeing glitches - albeit at a lower frequency.
>>>>>>>
>>>>>>> After some investigation, it is seen that we’re getting isoc errors when
>>>>>>> the worker thread serving video_pump() work items, doesn’t get scheduled
>>>>>>> for longer periods of time - than usual - in most cases > 6ms.
>>>>>>> This is close enough to the 8ms limit that we have when the number of usb
>>>>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
>>>>>>> to tolerate the scheduling delays better, it helps to increase the number of
>>>>>>> usb requests in the queue . In that case, we have more 0 length requests
>>>>>>> given to the udc driver - and as a result we can wait longer for uvc
>>>>>>> frames with valid data to get processed by video_pump(). I’m attaching a
>>>>>>> patch which lets one configure the upper bound on the number of usb
>>>>>>> requests allocated through configfs. Please let me know your thoughts.
>>>>>>> I can formalize  the patch if it looks okay.
>>>>>> Why do you want to limit the upper bound?  Why not just not submit so
>>>>>> many requests from userspace as you control that, right?
>>>>>
>>>>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
>>>>> not necessarily correspond to the ISOC cadence. After the
>>>>> patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$  was submitted, we are
>>>>> maintaining back pressure on the usb controller even if we do not have uvc frame
>>>>> data, by sending the controller 0 length requests (as long as usb_requests are
>>>>> available). Also, even if the userspace application were to somehow produce
>>>>> data to match the ISOC rate, it would  need to have information about USB
>>>>> timing details - which I am not sure is available to userspace or is the right
>>>>> thing to do here ?
>>>>>
>>>>> Here, we are trying to handle the scenario in which the video_pump() worker
>>>>> thread does not get scheduled in time - by increasing the number of usb requests
>>>>> allocated in the queue. This would send more usb requests to the usb controller,
>>>>> when video_pump() does get scheduled - even if they're 0 length. This buys
>>>>> the video_pump() worker thread scheduling time -since more usb requests
>>>>> are with the controller, subsequent requests sent will not be 'stale' and
>>>>> dropped by the usb controller.
>>>>>
>>>> I believe you're testing against dwc3 controller right? I may not be as
>>>> familiar with UVC function driver, but based on the previous
>>>> discussions, I think the driver should be able to handle this without
>>>> the user input.
>>>
>>> Yes we are testing against the dwc3 controller.
>>>
>>>>
>>>> The frequency of the request submission should not depend on the
>>>> video_pump() work thread since it can vary. The frequency of request
>>>> submission should match with the request completion. We know that
>>>> request completion rate should be fixed (1 uframe/request + when you
>>>> don't set no_interrupt). Base on this you can do your calculation on how
>>>> often you should set no_interrupt and how many requests you must submit.
>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>
>>>> The only variable here is the completion handler delay or system
>>>> latency, which should not be much and should be within your calculation.
>>>
>>>
>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>> video_pump() for sending 0 length requests. I was concerned about
>>> synchronization needed when we send requests to the dwc3 controller from
>>> different threads. I see that the dwc3 controller code does internally serialize
>>> queueing requests, can we expect this from other controllers as well ?
>>>
>>> This brings me to another question for Michael - I see
>>> that we introduced a worker thread for pumping  usb requests to the usb endpoint
>>> in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/
>>> (I see multiple email addresses, so apologies if I used the incorrect one).
>>>
>>> Did we introduce the worker thread to solve some specific deadlock scenarios ?
>>
>> Exactly. This was the reason why we moved to the pump worker. I actually
>> looked into the host side implementation of the uvc driver. There we
>> also queue an worker from the complete function:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646
>>
>> So this sounded reasonable to me. However we faced similar issues like
>> you and introduced different ways to improve the latency issue.
>>
>> One thing we did was improving the latency by adding WQ_HIGHPRI
>>
>> https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@pengutronix.de/
>>
>> Another patch here is also adding WQ_CPU_INTENSIVE.
>>
>> But, after all the input from Thinh it is probably better to solve the
>> issue in a more reliable way.
>>
>>> Or was it a general mitigation against racy usb request submission from v4l2 buffer
>>> queuing, stream enable and the video complete handler firing ?
>>
>> I don't remember all of the issues we saw back then. But this is also an very
>> likely scenario.
>>
>>> I was chatting with Avi about this, what if we submit requests to the endpoint
>>> only at two points in the streaming lifecycle -
>>> 1) The whole 64 (or however many usb requests are allocated) when
>>>  uvcg_video_enable() is called - with 0 length usb_requests.
>>> 2) In the video complete handler - if a video buffer is available, we encode it
>>>  and submit it to the endpoint. If not, we send a 0 length request.
>>
>> It really sounds like a good idea.
>>
>>> This way we're really maintaining back pressure and sending requests as soon
>>> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>>> so hopefully not too heavy on the interrupt handler. I will work on prototyping
>>> this meanwhile.
>
> [1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t
>
> It was actually not that hard to do that.
> With the patches from this thread applied [1] , the unformal changes looks like this:
>
> #change 1
>
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index f64d03136c5665..29cd23c38eb99d 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>      if (ret < 0)
>          return ret;
>  
> -    if (uvc->state == UVC_STATE_STREAMING)
> -        queue_work(video->async_wq, &video->pump);
> +    uvcg_video_pump(video);
>  
>      return ret;
>  }
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 2ec51ed5e9d074..2fe800500c88a3 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>       */
>      if (video->is_enabled) {
>          list_add_tail(&req->list, &video->req_free);
> -        queue_work(video->async_wq, &video->pump);
> +        spin_unlock_irqrestore(&video->req_lock, flags);
> +        uvcg_video_pump(video);
> +        return;
>      } else {
>          uvc_video_free_request(ureq, ep);
>      }
> @@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
>   * This function fills the available USB requests (listed in req_free) with
>   * video data from the queued buffers.
>   */
> -static void uvcg_video_pump(struct work_struct *work)
> +int uvcg_video_pump(struct uvc_video *video)
>  {
> -    struct uvc_video *video = container_of(work, struct uvc_video, pump);
>      struct uvc_video_queue *queue = &video->queue;
>      /* video->max_payload_size is only set when using bulk transfer */
>      bool is_bulk = video->max_payload_size;
> @@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work)
>  
>      while(true) {
>          if (!video->ep->enabled)
> -            return;
> +            return 0;
>  
>          /*
>           * Check is_enabled and retrieve the first available USB
> @@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work)
>          spin_lock_irqsave(&video->req_lock, flags);
>          if (!video->is_enabled || list_empty(&video->req_free)) {
>              spin_unlock_irqrestore(&video->req_lock, flags);
> -            return;
> +            return -EBUSY;
>          }
>          req = list_first_entry(&video->req_free, struct usb_request,
>                      list);
> @@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work)
>      }
>  
>      if (!req)
> -        return;
> +        return 0;
>  
>      spin_lock_irqsave(&video->req_lock, flags);
>      if (video->is_enabled)
> @@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work)
>      else
>          uvc_video_free_request(req->context, video->ep);
>      spin_unlock_irqrestore(&video->req_lock, flags);
> +
> +    return 0;
>  }
>  
>  /*
> @@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video)
>      }
>      spin_unlock_irqrestore(&video->req_lock, flags);
>  
> -    cancel_work_sync(&video->pump);
>      uvcg_queue_cancel(&video->queue, 0);
>  
>      spin_lock_irqsave(&video->req_lock, flags);
> @@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>  
>      video->req_int_count = 0;
>  
> -    queue_work(video->async_wq, &video->pump);
> -
>      return ret;
>  }
>  
> @@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>      INIT_LIST_HEAD(&video->ureqs);
>      INIT_LIST_HEAD(&video->req_free);
>      spin_lock_init(&video->req_lock);
> -    INIT_WORK(&video->pump, uvcg_video_pump);
> -
> -    /* Allocate a work queue for asynchronous video pump handler. */
> -    video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
> -    if (!video->async_wq)
> -        return -EINVAL;
>  
>      video->uvc = uvc;
>  
> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
> index 03adeefa343b71..322b05da43965f 100644
> --- a/drivers/usb/gadget/function/uvc_video.h
> +++ b/drivers/usb/gadget/function/uvc_video.h
> @@ -16,6 +16,8 @@ struct uvc_video;
>  
>  int uvcg_video_enable(struct uvc_video *video, int enable);
>  
> +int uvcg_video_pump(struct uvc_video *video);
> +
>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>  
>  #endif /* __UVC_VIDEO_H__ */
>
>

Thank you for this. I made some slight modifications (nothing functional)
and applied this. I'm actually seeing that the flickers completely disappear
on the device that I'm testing.
Michael Grzeschik Oct. 20, 2023, 12:49 p.m. UTC | #6
On Thu, Oct 19, 2023 at 10:52:11PM -0700, Jayant Chowdhary wrote:
>On 10/19/23 16:15, Michael Grzeschik wrote:
>> On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
>>> On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
>>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>>> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>>>>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>>>>>> We had been seeing the UVC gadget driver receive isoc errors while
>>>>>>>> sending packets to the usb endpoint - resulting in glitches being shown
>>>>>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
>>>>>>>> enlightening discussion at
>>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@google.com/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$
>>>>>>>>
>>>>>>>>
>>>>>>>> The conclusion that we came to was : usb requests with actual uvc frame
>>>>>>>> data were missing their scheduled uframes in the usb controller. As a
>>>>>>>> mitigation, we started sending 0 length usb requests when there was no
>>>>>>>> uvc frame buffer available to get data from. Even with this mitigation,
>>>>>>>> we are seeing glitches - albeit at a lower frequency.
>>>>>>>>
>>>>>>>> After some investigation, it is seen that we’re getting isoc errors when
>>>>>>>> the worker thread serving video_pump() work items, doesn’t get scheduled
>>>>>>>> for longer periods of time - than usual - in most cases > 6ms.
>>>>>>>> This is close enough to the 8ms limit that we have when the number of usb
>>>>>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
>>>>>>>> to tolerate the scheduling delays better, it helps to increase the number of
>>>>>>>> usb requests in the queue . In that case, we have more 0 length requests
>>>>>>>> given to the udc driver - and as a result we can wait longer for uvc
>>>>>>>> frames with valid data to get processed by video_pump(). I’m attaching a
>>>>>>>> patch which lets one configure the upper bound on the number of usb
>>>>>>>> requests allocated through configfs. Please let me know your thoughts.
>>>>>>>> I can formalize  the patch if it looks okay.
>>>>>>> Why do you want to limit the upper bound?  Why not just not submit so
>>>>>>> many requests from userspace as you control that, right?
>>>>>>
>>>>>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
>>>>>> not necessarily correspond to the ISOC cadence. After the
>>>>>> patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$  was submitted, we are
>>>>>> maintaining back pressure on the usb controller even if we do not have uvc frame
>>>>>> data, by sending the controller 0 length requests (as long as usb_requests are
>>>>>> available). Also, even if the userspace application were to somehow produce
>>>>>> data to match the ISOC rate, it would  need to have information about USB
>>>>>> timing details - which I am not sure is available to userspace or is the right
>>>>>> thing to do here ?
>>>>>>
>>>>>> Here, we are trying to handle the scenario in which the video_pump() worker
>>>>>> thread does not get scheduled in time - by increasing the number of usb requests
>>>>>> allocated in the queue. This would send more usb requests to the usb controller,
>>>>>> when video_pump() does get scheduled - even if they're 0 length. This buys
>>>>>> the video_pump() worker thread scheduling time -since more usb requests
>>>>>> are with the controller, subsequent requests sent will not be 'stale' and
>>>>>> dropped by the usb controller.
>>>>>>
>>>>> I believe you're testing against dwc3 controller right? I may not be as
>>>>> familiar with UVC function driver, but based on the previous
>>>>> discussions, I think the driver should be able to handle this without
>>>>> the user input.
>>>>
>>>> Yes we are testing against the dwc3 controller.
>>>>
>>>>>
>>>>> The frequency of the request submission should not depend on the
>>>>> video_pump() work thread since it can vary. The frequency of request
>>>>> submission should match with the request completion. We know that
>>>>> request completion rate should be fixed (1 uframe/request + when you
>>>>> don't set no_interrupt). Base on this you can do your calculation on how
>>>>> often you should set no_interrupt and how many requests you must submit.
>>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>>
>>>>> The only variable here is the completion handler delay or system
>>>>> latency, which should not be much and should be within your calculation.
>>>>
>>>>
>>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>>> video_pump() for sending 0 length requests. I was concerned about
>>>> synchronization needed when we send requests to the dwc3 controller from
>>>> different threads. I see that the dwc3 controller code does internally serialize
>>>> queueing requests, can we expect this from other controllers as well ?
>>>>
>>>> This brings me to another question for Michael - I see
>>>> that we introduced a worker thread for pumping  usb requests to the usb endpoint
>>>> in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/
>>>> (I see multiple email addresses, so apologies if I used the incorrect one).
>>>>
>>>> Did we introduce the worker thread to solve some specific deadlock scenarios ?
>>>
>>> Exactly. This was the reason why we moved to the pump worker. I actually
>>> looked into the host side implementation of the uvc driver. There we
>>> also queue an worker from the complete function:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646
>>>
>>> So this sounded reasonable to me. However we faced similar issues like
>>> you and introduced different ways to improve the latency issue.
>>>
>>> One thing we did was improving the latency by adding WQ_HIGHPRI
>>>
>>> https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@pengutronix.de/
>>>
>>> Another patch here is also adding WQ_CPU_INTENSIVE.
>>>
>>> But, after all the input from Thinh it is probably better to solve the
>>> issue in a more reliable way.
>>>
>>>> Or was it a general mitigation against racy usb request submission from v4l2 buffer
>>>> queuing, stream enable and the video complete handler firing ?
>>>
>>> I don't remember all of the issues we saw back then. But this is also an very
>>> likely scenario.
>>>
>>>> I was chatting with Avi about this, what if we submit requests to the endpoint
>>>> only at two points in the streaming lifecycle -
>>>> 1) The whole 64 (or however many usb requests are allocated) when
>>>>  uvcg_video_enable() is called - with 0 length usb_requests.
>>>> 2) In the video complete handler - if a video buffer is available, we encode it
>>>>  and submit it to the endpoint. If not, we send a 0 length request.
>>>
>>> It really sounds like a good idea.
>>>
>>>> This way we're really maintaining back pressure and sending requests as soon
>>>> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>>>> so hopefully not too heavy on the interrupt handler. I will work on prototyping
>>>> this meanwhile.
>>
>> [1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t
>>
>> It was actually not that hard to do that.
>> With the patches from this thread applied [1] , the unformal changes looks like this:
>>
>> #change 1
>>
>> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>> index f64d03136c5665..29cd23c38eb99d 100644
>> --- a/drivers/usb/gadget/function/uvc_v4l2.c
>> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
>> @@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>>      if (ret < 0)
>>          return ret;
>>  
>> -    if (uvc->state == UVC_STATE_STREAMING)
>> -        queue_work(video->async_wq, &video->pump);
>> +    uvcg_video_pump(video);
>>  
>>      return ret;
>>  }
>> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> index 2ec51ed5e9d074..2fe800500c88a3 100644
>> --- a/drivers/usb/gadget/function/uvc_video.c
>> +++ b/drivers/usb/gadget/function/uvc_video.c
>> @@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>>       */
>>      if (video->is_enabled) {
>>          list_add_tail(&req->list, &video->req_free);
>> -        queue_work(video->async_wq, &video->pump);
>> +        spin_unlock_irqrestore(&video->req_lock, flags);
>> +        uvcg_video_pump(video);
>> +        return;
>>      } else {
>>          uvc_video_free_request(ureq, ep);
>>      }
>> @@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
>>   * This function fills the available USB requests (listed in req_free) with
>>   * video data from the queued buffers.
>>   */
>> -static void uvcg_video_pump(struct work_struct *work)
>> +int uvcg_video_pump(struct uvc_video *video)
>>  {
>> -    struct uvc_video *video = container_of(work, struct uvc_video, pump);
>>      struct uvc_video_queue *queue = &video->queue;
>>      /* video->max_payload_size is only set when using bulk transfer */
>>      bool is_bulk = video->max_payload_size;
>> @@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work)
>>  
>>      while(true) {
>>          if (!video->ep->enabled)
>> -            return;
>> +            return 0;
>>  
>>          /*
>>           * Check is_enabled and retrieve the first available USB
>> @@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work)
>>          spin_lock_irqsave(&video->req_lock, flags);
>>          if (!video->is_enabled || list_empty(&video->req_free)) {
>>              spin_unlock_irqrestore(&video->req_lock, flags);
>> -            return;
>> +            return -EBUSY;
>>          }
>>          req = list_first_entry(&video->req_free, struct usb_request,
>>                      list);
>> @@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work)
>>      }
>>  
>>      if (!req)
>> -        return;
>> +        return 0;
>>  
>>      spin_lock_irqsave(&video->req_lock, flags);
>>      if (video->is_enabled)
>> @@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work)
>>      else
>>          uvc_video_free_request(req->context, video->ep);
>>      spin_unlock_irqrestore(&video->req_lock, flags);
>> +
>> +    return 0;
>>  }
>>  
>>  /*
>> @@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video)
>>      }
>>      spin_unlock_irqrestore(&video->req_lock, flags);
>>  
>> -    cancel_work_sync(&video->pump);
>>      uvcg_queue_cancel(&video->queue, 0);
>>  
>>      spin_lock_irqsave(&video->req_lock, flags);
>> @@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>>  
>>      video->req_int_count = 0;
>>  
>> -    queue_work(video->async_wq, &video->pump);
>> -
>>      return ret;
>>  }
>>  
>> @@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>>      INIT_LIST_HEAD(&video->ureqs);
>>      INIT_LIST_HEAD(&video->req_free);
>>      spin_lock_init(&video->req_lock);
>> -    INIT_WORK(&video->pump, uvcg_video_pump);
>> -
>> -    /* Allocate a work queue for asynchronous video pump handler. */
>> -    video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
>> -    if (!video->async_wq)
>> -        return -EINVAL;
>>  
>>      video->uvc = uvc;
>>  
>> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
>> index 03adeefa343b71..322b05da43965f 100644
>> --- a/drivers/usb/gadget/function/uvc_video.h
>> +++ b/drivers/usb/gadget/function/uvc_video.h
>> @@ -16,6 +16,8 @@ struct uvc_video;
>>  
>>  int uvcg_video_enable(struct uvc_video *video, int enable);
>>  
>> +int uvcg_video_pump(struct uvc_video *video);
>> +
>>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>>  
>>  #endif /* __UVC_VIDEO_H__ */
>>
>>
>
>Thank you for this. I made some slight modifications (nothing functional)
>and applied this. I'm actually seeing that the flickers completely disappear
>on the device that I'm testing.
>
>From around a flicker every couple of minutes to none in 20 minutes. What I did
>keep was the 0 length request submission, since the camera is naturally producing
>data at a much lower rate than what the usb controller expects. Is there a reason we would
>want to remove that code ?

There is no need to remove this. I was just currious, if this would
change anything for the tests.

>> #change 2
>>
>> Also if you would like to revert the zero request generation apply this ontop.
>>
>> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> index 82695a2ff39aa3..2a3c87079c548d 100644
>> --- a/drivers/usb/gadget/function/uvc_video.c
>> +++ b/drivers/usb/gadget/function/uvc_video.c
>> @@ -418,12 +418,9 @@ uvc_video_alloc_requests(struct uvc_video *video)
>>  int uvcg_video_pump(struct uvc_video *video)
>>  {
>>      struct uvc_video_queue *queue = &video->queue;
>> -    /* video->max_payload_size is only set when using bulk transfer */
>> -    bool is_bulk = video->max_payload_size;
>>      struct usb_request *req = NULL;
>>      struct uvc_buffer *buf;
>>      unsigned long flags;
>> -    bool buf_done;
>>      int ret;
>>  
>>      while(true) {
>> @@ -450,28 +447,13 @@ int uvcg_video_pump(struct uvc_video *video)
>>           */
>>          spin_lock_irqsave(&queue->irqlock, flags);
>>          buf = uvcg_queue_head(queue);
>> -
>> -        if (buf != NULL) {
>> -            video->encode(req, video, buf);
>> -            buf_done = buf->state == UVC_BUF_STATE_DONE;
>> -        } else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
>> -            /*
>> -             * No video buffer available; the queue is still connected and
>> -             * we're transferring over ISOC. Queue a 0 length request to
>> -             * prevent missed ISOC transfers.
>> -             */
>> -            req->length = 0;
>> -            buf_done = false;
>> -        } else {
>> -            /*
>> -             * Either the queue has been disconnected or no video buffer
>> -             * available for bulk transfer. Either way, stop processing
>> -             * further.
>> -             */
>> +        if (buf == NULL) {
>>              spin_unlock_irqrestore(&queue->irqlock, flags);
>>              break;
>>          }
>>  
>> +        video->encode(req, video, buf);
>> +
>>          /*
>>           * With USB3 handling more requests at a higher speed, we can't
>>           * afford to generate an interrupt for every request. Decide to
>> @@ -490,7 +472,8 @@ int uvcg_video_pump(struct uvc_video *video)
>>           *   indicated by video->uvc_num_requests), as a trade-off
>>           *   between latency and interrupt load.
>>           */
>> -        if (list_empty(&video->req_free) || buf_done ||
>> +        if (list_empty(&video->req_free) ||
>> +            buf->state == UVC_BUF_STATE_DONE ||
>>              !(video->req_int_count %
>>                 DIV_ROUND_UP(video->uvc_num_requests, 4))) {
>>              video->req_int_count = 0;
>> @@ -510,7 +493,8 @@ int uvcg_video_pump(struct uvc_video *video)
>>  
>>          /* Endpoint now owns the request */
>>          req = NULL;
>> -        video->req_int_count++;
>> +        if(buf->state != UVC_BUF_STATE_DONE)
>> +            video->req_int_count++;
>>      }
>>  
>>      if (!req)
>>
>>
>> In my case this did not change a lot with the flickering.
>>
>> In fact I did see the most effective change when increasing the
>> fifo size in the dwc3 controller like this in drivers/usb/dwc3/gadget.c
>>
>> -813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1);
>> +813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 3);
>>
>> My system I am testing against is stressed with a high memory bandwidth
>> use. So it is possible that in this scenario the hardware fifo will not
>> get filled fast enough. Thus, changing the fifo size helps here. It is
>> still just a string to pull on but I think it is worth to dig a bit
>> deeper here.
>>
>> I am not sure if you are already aware of the following discussion:
>>
>> https://lore.kernel.org/all/ZPo51EUtBgH+qw44@pengutronix.de/T/
>
>Thank you for this. I wasn't aware of this thread, I will give it a read!

Thanks. I would be happy about some help.

>To confirm, should I still put up a patch for removing the video_pump() worker
>thread or are you planning on doing that ?

I don't mind. If you have the capacity to send this; feel free to do
this. Otherwise I could send the cleaned version from above and credit
you with an Suggested-By.

Regards,
Michael
Thinh Nguyen Oct. 20, 2023, 11:30 p.m. UTC | #7
Sorry for the delay response.

On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
> On 10/12/23 11:50, Thinh Nguyen wrote:
> > The frequency of the request submission should not depend on the
> > video_pump() work thread since it can vary. The frequency of request
> > submission should match with the request completion. We know that
> > request completion rate should be fixed (1 uframe/request + when you
> > don't set no_interrupt). Base on this you can do your calculation on how
> > often you should set no_interrupt and how many requests you must submit.
> > You don't have to wait for the video_pump() to submit 0-length requests.
> >
> > The only variable here is the completion handler delay or system
> > latency, which should not be much and should be within your calculation.
> 
> 
> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
> video_pump() for sending 0 length requests. I was concerned about
> synchronization needed when we send requests to the dwc3 controller from
> different threads. I see that the dwc3 controller code does internally serialize
> queueing requests, can we expect this from other controllers as well ? 

While it's not explicitly documented, when the gadget driver uses
usb_ep_queue(), the order in which the gadget recieves the request
should be maintained and serialized. Because the order the transfer go
out for the same endpoint can be critical, breaking this will cause
issue.

> 
> This brings me to another question for Michael - I see
> that we introduced a worker thread for pumping  usb requests to the usb endpoint
> in https://urldefense.com/v3/__https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/__;!!A4F2R9G_pg!aAnzCopbTbZtUxBK6a6r6_QzV-b2Z2J5o5esPaartZ2jogKijmhqj6OyiKDg-BPhxq8pJHR3HS1Hf8z6YnqfWTon$ 
> (I see multiple email addresses, so apologies if I used the incorrect one).
> 
> Did we introduce the worker thread to solve some specific deadlock scenarios ?
> Or was it a general mitigation against racy usb request submission from v4l2 buffer
> queuing, stream enable and the video complete handler firing ?
> 
> I was chatting with Avi about this, what if we submit requests to the endpoint
> only at two points in the streaming lifecycle - 
> 1) The whole 64 (or however many usb requests are allocated) when
>    uvcg_video_enable() is called - with 0 length usb_requests.
> 2) In the video complete handler - if a video buffer is available, we encode it
>    and submit it to the endpoint. If not, we send a 0 length request.
> 
> This way we're really maintaining back pressure and sending requests as soon
> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
> so hopefully not too heavy on the interrupt handler. I will work on prototyping
> this meanwhile.
> 

That sounds good to me. I believe Michael already provided some test
patches and you've already done some preliminary tests for that right?

Thanks,
Thinh
Jayant Chowdhary Oct. 23, 2023, 6:13 p.m. UTC | #8
Hi Thinh, Michael,

On 10/20/23 16:30, Thinh Nguyen wrote:
> Sorry for the delay response.
>
> On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>> The frequency of the request submission should not depend on the
>>> video_pump() work thread since it can vary. The frequency of request
>>> submission should match with the request completion. We know that
>>> request completion rate should be fixed (1 uframe/request + when you
>>> don't set no_interrupt). Base on this you can do your calculation on how
>>> often you should set no_interrupt and how many requests you must submit.
>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>
>>> The only variable here is the completion handler delay or system
>>> latency, which should not be much and should be within your calculation.
>>
>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>> video_pump() for sending 0 length requests. I was concerned about
>> synchronization needed when we send requests to the dwc3 controller from
>> different threads. I see that the dwc3 controller code does internally serialize
>> queueing requests, can we expect this from other controllers as well ? 
> While it's not explicitly documented, when the gadget driver uses
> usb_ep_queue(), the order in which the gadget recieves the request
> should be maintained and serialized. Because the order the transfer go
> out for the same endpoint can be critical, breaking this will cause
> issue.
>
Thanks for clarifying this. Keeping this in mind - I made a slight modification to
your test patch - I removed the uvc_video_pump() function call from uvc_v4l2_qbuf(). We just
call it in uvcg_video_enable(). That should just queue 0 length requests till the first qbuf
is called. There-after only the complete handler running uvcg_video_complete() calls video_pump(),
which sends usb requests to the endpoint. While I do see that we hold the queue->irqlock while
getting the uvc buffer to encode and sending it to the ep, I feel like its just logically safer
for future changes if we can restrict the pumping of requests to one thread.

Does that seem okay to you ? I can formalize it if it does.  

The patch would look something like (on top of: https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t)

---
 drivers/usb/gadget/function/f_uvc.c     |  4 ----
 drivers/usb/gadget/function/uvc.h       |  3 ---
 drivers/usb/gadget/function/uvc_v4l2.c  |  3 ---
 drivers/usb/gadget/function/uvc_video.c | 19 +++++++------------
 4 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index 44c36e40e943..7d78fc8c00c5 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -907,14 +907,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
 {
 	struct usb_composite_dev *cdev = c->cdev;
 	struct uvc_device *uvc = to_uvc(f);
-	struct uvc_video *video = &uvc->video;
 	long wait_ret = 1;
 
 	uvcg_info(f, "%s()\n", __func__);
 
-	if (video->async_wq)
-		destroy_workqueue(video->async_wq);
-
 	/*
 	 * If we know we're connected via v4l2, then there should be a cleanup
 	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
index e8d4c87f1e09..b33211c92c02 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -88,9 +88,6 @@ struct uvc_video {
 	struct uvc_device *uvc;
 	struct usb_ep *ep;
 
-	struct work_struct pump;
-	struct workqueue_struct *async_wq;
-
 	/* Frame parameters */
 	u8 bpp;
 	u32 fcc;
diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index 68bb18bdef81..ef4305f79cfe 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -421,9 +421,6 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
 	if (ret < 0)
 		return ret;
 
-	if (uvc->state == UVC_STATE_STREAMING)
-		queue_work(video->async_wq, &video->pump);
-
 	return ret;
 }
 
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 54a1c36e879e..35fb6a185b6e 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -24,6 +24,8 @@
  * Video codecs
  */
 
+static void uvcg_video_pump(struct uvc_video *video);
+
 static int
 uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
 		u8 *data, int len)
@@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
 	 */
 	if (video->is_enabled) {
 		list_add_tail(&req->list, &video->req_free);
-		queue_work(video->async_wq, &video->pump);
+		spin_unlock_irqrestore(&video->req_lock, flags);
+		uvcg_video_pump(video);
+		return;
 	} else {
 		uvc_video_free_request(ureq, ep);
 	}
@@ -415,9 +419,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
  * This function fills the available USB requests (listed in req_free) with
  * video data from the queued buffers.
  */
-static void uvcg_video_pump(struct work_struct *work)
+static void uvcg_video_pump(struct uvc_video *video)
 {
-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
 	struct uvc_video_queue *queue = &video->queue;
 	struct usb_request *req = NULL;
 	struct uvc_buffer *buf;
@@ -545,7 +548,6 @@ uvcg_video_disable(struct uvc_video *video)
 	}
 	spin_unlock_irqrestore(&video->req_lock, flags);
 
-	cancel_work_sync(&video->pump);
 	uvcg_queue_cancel(&video->queue, 0);
 
 	spin_lock_irqsave(&video->req_lock, flags);
@@ -621,8 +623,7 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
 
 	video->req_int_count = 0;
 
-	queue_work(video->async_wq, &video->pump);
-
+	uvcg_video_pump(video);
 	return ret;
 }
 
@@ -635,12 +636,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 	INIT_LIST_HEAD(&video->ureqs);
 	INIT_LIST_HEAD(&video->req_free);
 	spin_lock_init(&video->req_lock);
-	INIT_WORK(&video->pump, uvcg_video_pump);
-
-	/* Allocate a work queue for asynchronous video pump handler. */
-	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
-	if (!video->async_wq)
-		return -EINVAL;
 
 	video->uvc = uvc;
 	video->fcc = V4L2_PIX_FMT_YUYV;
Michael Grzeschik Oct. 24, 2023, 12:33 p.m. UTC | #9
On Mon, Oct 23, 2023 at 11:13:03AM -0700, Jayant Chowdhary wrote:
>Hi Thinh, Michael,
>
>On 10/20/23 16:30, Thinh Nguyen wrote:
>> Sorry for the delay response.
>>
>> On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>> The frequency of the request submission should not depend on the
>>>> video_pump() work thread since it can vary. The frequency of request
>>>> submission should match with the request completion. We know that
>>>> request completion rate should be fixed (1 uframe/request + when you
>>>> don't set no_interrupt). Base on this you can do your calculation on how
>>>> often you should set no_interrupt and how many requests you must submit.
>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>
>>>> The only variable here is the completion handler delay or system
>>>> latency, which should not be much and should be within your calculation.
>>>
>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>> video_pump() for sending 0 length requests. I was concerned about
>>> synchronization needed when we send requests to the dwc3 controller from
>>> different threads. I see that the dwc3 controller code does internally serialize
>>> queueing requests, can we expect this from other controllers as well ?
>> While it's not explicitly documented, when the gadget driver uses
>> usb_ep_queue(), the order in which the gadget recieves the request
>> should be maintained and serialized. Because the order the transfer go
>> out for the same endpoint can be critical, breaking this will cause
>> issue.
>>
>Thanks for clarifying this. Keeping this in mind - I made a slight modification to
>your test patch - I removed the uvc_video_pump() function call from uvc_v4l2_qbuf(). We just
>call it in uvcg_video_enable(). That should just queue 0 length requests till the first qbuf
>is called. There-after only the complete handler running uvcg_video_complete() calls video_pump(),
>which sends usb requests to the endpoint. While I do see that we hold the queue->irqlock while
>getting the uvc buffer to encode and sending it to the ep, I feel like its just logically safer
>for future changes if we can restrict the pumping of requests to one thread.
>
>Does that seem okay to you ? I can formalize it if it does.

I tested this, and it looks good so far.

Since your changes are minimal you could send this with me as the author
and add your Suggested-by Tag. You should also add your Tested-by Tag in
that case.

Regards,
Michael

>The patch would look something like (on top of: https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t)
>
>---
> drivers/usb/gadget/function/f_uvc.c     |  4 ----
> drivers/usb/gadget/function/uvc.h       |  3 ---
> drivers/usb/gadget/function/uvc_v4l2.c  |  3 ---
> drivers/usb/gadget/function/uvc_video.c | 19 +++++++------------
> 4 files changed, 7 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
>index 44c36e40e943..7d78fc8c00c5 100644
>--- a/drivers/usb/gadget/function/f_uvc.c
>+++ b/drivers/usb/gadget/function/f_uvc.c
>@@ -907,14 +907,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
> {
> 	struct usb_composite_dev *cdev = c->cdev;
> 	struct uvc_device *uvc = to_uvc(f);
>-	struct uvc_video *video = &uvc->video;
> 	long wait_ret = 1;
>
> 	uvcg_info(f, "%s()\n", __func__);
>
>-	if (video->async_wq)
>-		destroy_workqueue(video->async_wq);
>-
> 	/*
> 	 * If we know we're connected via v4l2, then there should be a cleanup
> 	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
>diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
>index e8d4c87f1e09..b33211c92c02 100644
>--- a/drivers/usb/gadget/function/uvc.h
>+++ b/drivers/usb/gadget/function/uvc.h
>@@ -88,9 +88,6 @@ struct uvc_video {
> 	struct uvc_device *uvc;
> 	struct usb_ep *ep;
>
>-	struct work_struct pump;
>-	struct workqueue_struct *async_wq;
>-
> 	/* Frame parameters */
> 	u8 bpp;
> 	u32 fcc;
>diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>index 68bb18bdef81..ef4305f79cfe 100644
>--- a/drivers/usb/gadget/function/uvc_v4l2.c
>+++ b/drivers/usb/gadget/function/uvc_v4l2.c
>@@ -421,9 +421,6 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
> 	if (ret < 0)
> 		return ret;
>
>-	if (uvc->state == UVC_STATE_STREAMING)
>-		queue_work(video->async_wq, &video->pump);
>-
> 	return ret;
> }
>
>diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>index 54a1c36e879e..35fb6a185b6e 100644
>--- a/drivers/usb/gadget/function/uvc_video.c
>+++ b/drivers/usb/gadget/function/uvc_video.c
>@@ -24,6 +24,8 @@
>  * Video codecs
>  */
>
>+static void uvcg_video_pump(struct uvc_video *video);
>+
> static int
> uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
> 		u8 *data, int len)
>@@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
> 	 */
> 	if (video->is_enabled) {
> 		list_add_tail(&req->list, &video->req_free);
>-		queue_work(video->async_wq, &video->pump);
>+		spin_unlock_irqrestore(&video->req_lock, flags);
>+		uvcg_video_pump(video);
>+		return;
> 	} else {
> 		uvc_video_free_request(ureq, ep);
> 	}
>@@ -415,9 +419,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
>  * This function fills the available USB requests (listed in req_free) with
>  * video data from the queued buffers.
>  */
>-static void uvcg_video_pump(struct work_struct *work)
>+static void uvcg_video_pump(struct uvc_video *video)
> {
>-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
> 	struct uvc_video_queue *queue = &video->queue;
> 	struct usb_request *req = NULL;
> 	struct uvc_buffer *buf;
>@@ -545,7 +548,6 @@ uvcg_video_disable(struct uvc_video *video)
> 	}
> 	spin_unlock_irqrestore(&video->req_lock, flags);
>
>-	cancel_work_sync(&video->pump);
> 	uvcg_queue_cancel(&video->queue, 0);
>
> 	spin_lock_irqsave(&video->req_lock, flags);
>@@ -621,8 +623,7 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>
> 	video->req_int_count = 0;
>
>-	queue_work(video->async_wq, &video->pump);
>-
>+	uvcg_video_pump(video);
> 	return ret;
> }
>
>@@ -635,12 +636,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
> 	INIT_LIST_HEAD(&video->ureqs);
> 	INIT_LIST_HEAD(&video->req_free);
> 	spin_lock_init(&video->req_lock);
>-	INIT_WORK(&video->pump, uvcg_video_pump);
>-
>-	/* Allocate a work queue for asynchronous video pump handler. */
>-	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
>-	if (!video->async_wq)
>-		return -EINVAL;
>
> 	video->uvc = uvc;
> 	video->fcc = V4L2_PIX_FMT_YUYV;
>-- 
>
>>> This brings me to another question for Michael - I see
>>> that we introduced a worker thread for pumping  usb requests to the usb endpoint
>>> in https://urldefense.com/v3/__https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/__;!!A4F2R9G_pg!aAnzCopbTbZtUxBK6a6r6_QzV-b2Z2J5o5esPaartZ2jogKijmhqj6OyiKDg-BPhxq8pJHR3HS1Hf8z6YnqfWTon$
>>> (I see multiple email addresses, so apologies if I used the incorrect one).
>>>
>>> Did we introduce the worker thread to solve some specific deadlock scenarios ?
>>> Or was it a general mitigation against racy usb request submission from v4l2 buffer
>>> queuing, stream enable and the video complete handler firing ?
>>>
>>> I was chatting with Avi about this, what if we submit requests to the endpoint
>>> only at two points in the streaming lifecycle -
>>> 1) The whole 64 (or however many usb requests are allocated) when
>>>    uvcg_video_enable() is called - with 0 length usb_requests.
>>> 2) In the video complete handler - if a video buffer is available, we encode it
>>>    and submit it to the endpoint. If not, we send a 0 length request.
>>>
>>> This way we're really maintaining back pressure and sending requests as soon
>>> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>>> so hopefully not too heavy on the interrupt handler. I will work on prototyping
>>> this meanwhile.
>>>
>> That sounds good to me. I believe Michael already provided some test
>> patches and you've already done some preliminary tests for that right?
>
>Yes that is correct. I tested out the patch above as well and it seems to
>work for my setup. I haven't seen flickers in over an hour of streaming.
>
>Thanks,
>Jayant
>
>
>
Jayant Chowdhary Oct. 25, 2023, 11:09 p.m. UTC | #10
Hi Michael,

On 10/24/23 05:33, Michael Grzeschik wrote:
> On Mon, Oct 23, 2023 at 11:13:03AM -0700, Jayant Chowdhary wrote:
>> Hi Thinh, Michael,
>>
>> On 10/20/23 16:30, Thinh Nguyen wrote:
>>> Sorry for the delay response.
>>>
>>> On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
>>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>>> The frequency of the request submission should not depend on the
>>>>> video_pump() work thread since it can vary. The frequency of request
>>>>> submission should match with the request completion. We know that
>>>>> request completion rate should be fixed (1 uframe/request + when you
>>>>> don't set no_interrupt). Base on this you can do your calculation on how
>>>>> often you should set no_interrupt and how many requests you must submit.
>>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>>
>>>>> The only variable here is the completion handler delay or system
>>>>> latency, which should not be much and should be within your calculation.
>>>>
>>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>>> video_pump() for sending 0 length requests. I was concerned about
>>>> synchronization needed when we send requests to the dwc3 controller from
>>>> different threads. I see that the dwc3 controller code does internally serialize
>>>> queueing requests, can we expect this from other controllers as well ?
>>> While it's not explicitly documented, when the gadget driver uses
>>> usb_ep_queue(), the order in which the gadget recieves the request
>>> should be maintained and serialized. Because the order the transfer go
>>> out for the same endpoint can be critical, breaking this will cause
>>> issue.
>>>
>> Thanks for clarifying this. Keeping this in mind - I made a slight modification to
>> your test patch - I removed the uvc_video_pump() function call from uvc_v4l2_qbuf(). We just
>> call it in uvcg_video_enable(). That should just queue 0 length requests till the first qbuf
>> is called. There-after only the complete handler running uvcg_video_complete() calls video_pump(),
>> which sends usb requests to the endpoint. While I do see that we hold the queue->irqlock while
>> getting the uvc buffer to encode and sending it to the ep, I feel like its just logically safer
>> for future changes if we can restrict the pumping of requests to one thread.
>>
>> Does that seem okay to you ? I can formalize it if it does.
>
> I tested this, and it looks good so far.
>
> Since your changes are minimal you could send this with me as the author
> and add your Suggested-by Tag. You should also add your Tested-by Tag in
> that case.
>
I sent out https://lore.kernel.org/linux-usb/99384044-0d14-4ebe-9109-8a5557e64449@google.com/T/#u

with a Signed-off-by crediting you and suggested by with Avichal and me. It has a few changes related to

bulk end-points as well, but they're relatively minor.

Thanks
Michael Grzeschik Oct. 26, 2023, 6:55 a.m. UTC | #11
On Wed, Oct 25, 2023 at 04:09:27PM -0700, Jayant Chowdhary wrote:
>On 10/24/23 05:33, Michael Grzeschik wrote:
>> On Mon, Oct 23, 2023 at 11:13:03AM -0700, Jayant Chowdhary wrote:
>>> On 10/20/23 16:30, Thinh Nguyen wrote:
>>>> Sorry for the delay response.
>>>>
>>>> On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
>>>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>>>> The frequency of the request submission should not depend on the
>>>>>> video_pump() work thread since it can vary. The frequency of request
>>>>>> submission should match with the request completion. We know that
>>>>>> request completion rate should be fixed (1 uframe/request + when you
>>>>>> don't set no_interrupt). Base on this you can do your calculation on how
>>>>>> often you should set no_interrupt and how many requests you must submit.
>>>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>>>
>>>>>> The only variable here is the completion handler delay or system
>>>>>> latency, which should not be much and should be within your calculation.
>>>>>
>>>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>>>> video_pump() for sending 0 length requests. I was concerned about
>>>>> synchronization needed when we send requests to the dwc3 controller from
>>>>> different threads. I see that the dwc3 controller code does internally serialize
>>>>> queueing requests, can we expect this from other controllers as well ?
>>>> While it's not explicitly documented, when the gadget driver uses
>>>> usb_ep_queue(), the order in which the gadget recieves the request
>>>> should be maintained and serialized. Because the order the transfer go
>>>> out for the same endpoint can be critical, breaking this will cause
>>>> issue.
>>>>
>>> Thanks for clarifying this. Keeping this in mind - I made a slight modification to
>>> your test patch - I removed the uvc_video_pump() function call from uvc_v4l2_qbuf(). We just
>>> call it in uvcg_video_enable(). That should just queue 0 length requests till the first qbuf
>>> is called. There-after only the complete handler running uvcg_video_complete() calls video_pump(),
>>> which sends usb requests to the endpoint. While I do see that we hold the queue->irqlock while
>>> getting the uvc buffer to encode and sending it to the ep, I feel like its just logically safer
>>> for future changes if we can restrict the pumping of requests to one thread.
>>>
>>> Does that seem okay to you ? I can formalize it if it does.
>>
>> I tested this, and it looks good so far.
>>
>> Since your changes are minimal you could send this with me as the author
>> and add your Suggested-by Tag. You should also add your Tested-by Tag in
>> that case.
>>
>I sent out https://lore.kernel.org/linux-usb/99384044-0d14-4ebe-9109-8a5557e64449@google.com/T/#u
>
>with a Signed-off-by crediting you and suggested by with Avichal and me. It has a few changes related to
>
>bulk end-points as well, but they're relatively minor.

Sounds good to me, but you missed your own Signed-off-by in the patch.

Michael
Laurent Pinchart Oct. 27, 2023, 8:14 a.m. UTC | #12
On Fri, Oct 20, 2023 at 11:30:52PM +0000, Thinh Nguyen wrote:
> Sorry for the delay response.
> 
> On Sun, Oct 15, 2023, Jayant Chowdhary wrote:
> > On 10/12/23 11:50, Thinh Nguyen wrote:
> > > The frequency of the request submission should not depend on the
> > > video_pump() work thread since it can vary. The frequency of request
> > > submission should match with the request completion. We know that
> > > request completion rate should be fixed (1 uframe/request + when you
> > > don't set no_interrupt). Base on this you can do your calculation on how
> > > often you should set no_interrupt and how many requests you must submit.
> > > You don't have to wait for the video_pump() to submit 0-length requests.
> > >
> > > The only variable here is the completion handler delay or system
> > > latency, which should not be much and should be within your calculation.
> > 
> > 
> > Thanks for the suggestion. It indeed makes sense that we do not completely depend on
> > video_pump() for sending 0 length requests. I was concerned about
> > synchronization needed when we send requests to the dwc3 controller from
> > different threads. I see that the dwc3 controller code does internally serialize
> > queueing requests, can we expect this from other controllers as well ? 
> 
> While it's not explicitly documented, when the gadget driver uses
> usb_ep_queue(), the order in which the gadget recieves the request
> should be maintained and serialized. Because the order the transfer go
> out for the same endpoint can be critical, breaking this will cause
> issue.

That's right, but if usp_ep_queue() is called from multiple contexts,
there's no guarantee it can provide when it comes to the ordering. The
caller has to handle it.

> > This brings me to another question for Michael - I see
> > that we introduced a worker thread for pumping  usb requests to the usb endpoint
> > in https://urldefense.com/v3/__https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@pengutronix.de/__;!!A4F2R9G_pg!aAnzCopbTbZtUxBK6a6r6_QzV-b2Z2J5o5esPaartZ2jogKijmhqj6OyiKDg-BPhxq8pJHR3HS1Hf8z6YnqfWTon$ 
> > (I see multiple email addresses, so apologies if I used the incorrect one).
> > 
> > Did we introduce the worker thread to solve some specific deadlock scenarios ?
> > Or was it a general mitigation against racy usb request submission from v4l2 buffer
> > queuing, stream enable and the video complete handler firing ?
> > 
> > I was chatting with Avi about this, what if we submit requests to the endpoint
> > only at two points in the streaming lifecycle - 
> > 1) The whole 64 (or however many usb requests are allocated) when
> >    uvcg_video_enable() is called - with 0 length usb_requests.
> > 2) In the video complete handler - if a video buffer is available, we encode it
> >    and submit it to the endpoint. If not, we send a 0 length request.
> > 
> > This way we're really maintaining back pressure and sending requests as soon
> > as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
> > so hopefully not too heavy on the interrupt handler. I will work on prototyping
> > this meanwhile.
> 
> That sounds good to me. I believe Michael already provided some test
> patches and you've already done some preliminary tests for that right?
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc 
b/Documentation/ABI/testing/configfs-usb-gadget-uvc
index 4feb692c4c1d..9bc58440e1b7 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
@@ -7,6 +7,8 @@  Description:    UVC function directory
          streaming_maxburst    0..15 (ss only)
          streaming_maxpacket    1..1023 (fs), 1..3072 (hs/ss)
          streaming_interval    1..16
+         streaming_max_usb_requests 64..256
+
          function_name        string [32]
          ===================    =============================

diff --git a/Documentation/usb/gadget-testing.rst 
b/Documentation/usb/gadget-testing.rst
index 29072c166d23..24a62ba8e870 100644
--- a/Documentation/usb/gadget-testing.rst
+++ b/Documentation/usb/gadget-testing.rst
@@ -790,14 +790,19 @@  Function-specific configfs interface
  The function name to use when creating the function directory is "uvc".
  The uvc function provides these attributes in its function directory:

-    =================== ================================================
-    streaming_interval  interval for polling endpoint for data transfers
-    streaming_maxburst  bMaxBurst for super speed companion descriptor
-    streaming_maxpacket maximum packet size this endpoint is capable of
-                sending or receiving when this configuration is
-                selected
-    function_name       name of the interface
-    =================== ================================================
+    =================== ===========================================
+    streaming_interval         interval for polling endpoint for data
+                    transfers
+    streaming_maxburst          bMaxBurst for super speed companion
+                    descriptor
+    streaming_maxpacket         maximum packet size this endpoint is
+                    capable of sending or receiving when
+                    this configuration is selected
+        streaming_max_usb_requests  upper bound for the number of usb 
requests
+                        the gadget driver will allocate for
+                    sending to the endpoint.
+    function_name            name of the interface
+    =================== ==================================================

  There are also "control" and "streaming" subdirectories, each of which 
contain
  a number of their subdirectories. There are some sane defaults 
provided, but
diff --git a/drivers/usb/gadget/function/f_uvc.c 
b/drivers/usb/gadget/function/f_uvc.c
index faa398109431..32a296ea37d7 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -659,7 +659,8 @@  uvc_function_bind(struct usb_configuration *c, 
struct usb_function *f)
      opts->streaming_interval = clamp(opts->streaming_interval, 1U, 16U);
      opts->streaming_maxpacket = clamp(opts->streaming_maxpacket, 1U, 
3072U);
      opts->streaming_maxburst = min(opts->streaming_maxburst, 15U);
-
+    opts->streaming_max_usb_requests = 
clamp(opts->streaming_max_usb_requests, 64U, 256U);
+    uvc->streaming_max_usb_requests = opts->streaming_max_usb_requests;
      /* For SS, wMaxPacketSize has to be 1024 if bMaxBurst is not 0 */
      if (opts->streaming_maxburst &&
          (opts->streaming_maxpacket % 1024) != 0) {
@@ -934,6 +935,7 @@  static struct usb_function_instance 
*uvc_alloc_inst(void)

      opts->streaming_interval = 1;
      opts->streaming_maxpacket = 1024;
+    opts->streaming_max_usb_requests = 64;
      snprintf(opts->function_name, sizeof(opts->function_name), "UVC 
Camera");

      ret = uvcg_attach_configfs(opts);
diff --git a/drivers/usb/gadget/function/u_uvc.h 
b/drivers/usb/gadget/function/u_uvc.h
index 1ce58f61253c..075fee178418 100644
--- a/drivers/usb/gadget/function/u_uvc.h
+++ b/drivers/usb/gadget/function/u_uvc.h
@@ -24,6 +24,7 @@  struct f_uvc_opts {
      unsigned int                    streaming_interval;
      unsigned int                    streaming_maxpacket;
      unsigned int                    streaming_maxburst;
+    unsigned int                    streaming_max_usb_requests;

      unsigned int                    control_interface;
      unsigned int                    streaming_interface;
diff --git a/drivers/usb/gadget/function/uvc.h 
b/drivers/usb/gadget/function/uvc.h
index 6751de8b63ad..943c074157fa 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -157,6 +157,9 @@  struct uvc_device {
      /* Events */
      unsigned int event_length;
      unsigned int event_setup_out : 1;
+
+    /* Max number of usb requests allocated to send to the ep*/
+    unsigned int streaming_max_usb_requests;
  };

  static inline struct uvc_device *to_uvc(struct usb_function *f)
diff --git a/drivers/usb/gadget/function/uvc_configfs.c 
b/drivers/usb/gadget/function/uvc_configfs.c
index 9bf0e985acfa..0ad9eae4845e 100644
--- a/drivers/usb/gadget/function/uvc_configfs.c
+++ b/drivers/usb/gadget/function/uvc_configfs.c
@@ -3403,6 +3403,7 @@  UVC_ATTR(f_uvc_opts_, cname, cname)
  UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
  UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
  UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
+UVCG_OPTS_ATTR(streaming_max_usb_requests, streaming_max_usb_requests, 
256);

  #undef UVCG_OPTS_ATTR

@@ -3453,6 +3454,7 @@  static struct configfs_attribute *uvc_attrs[] = {
      &f_uvc_opts_attr_streaming_maxpacket,
      &f_uvc_opts_attr_streaming_maxburst,
      &f_uvc_opts_string_attr_function_name,
+    &f_uvc_opts_attr_streaming_max_usb_requests,
      NULL,
  };

diff --git a/drivers/usb/gadget/function/uvc_queue.c 
b/drivers/usb/gadget/function/uvc_queue.c
index 0aa3d7e1f3cc..b81c2d8e5ef2 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -45,6 +45,7 @@  static int uvc_queue_setup(struct vb2_queue *vq,
      struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
      struct uvc_video *video = container_of(queue, struct uvc_video, 
queue);
      unsigned int req_size;
+    unsigned int max_usb_requests;
      unsigned int nreq;