Message ID | 20211202005852.3538102-1-m.grzeschik@pengutronix.de |
---|---|
State | New |
Headers | show |
Series | usb: gadget: uvc: use pump call conditionally | expand |
On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote: > Preparing the usb request is not very expensive, when using > scatter gather. In that case we can even remove the overhead > of the extra pump worker and call the pump function directly. > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> > --- > drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- > drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ > drivers/usb/gadget/function/uvc_video.h | 2 ++ > 3 files changed, 30 insertions(+), 8 deletions(-) > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c > index a2c78690c5c288..020b4adc7840e0 100644 > --- a/drivers/usb/gadget/function/uvc_v4l2.c > +++ b/drivers/usb/gadget/function/uvc_v4l2.c > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) > if (ret < 0) > return ret; > > - if (uvc->state == UVC_STATE_STREAMING) > - schedule_work(&video->pump); > + if (uvc->state == UVC_STATE_STREAMING) { > + if (!video->queue.use_sg) > + schedule_work(&video->pump); > + else > + uvcg_video_pump(video); Wouldn't it be easier to understand this if you flip the if test around: if (video->queue.use_sg) uvcg_video_pump(video); else schedule_work(&video->pump); ? Negagive logic is never fun to read... Also, are you sure that sg really is not expensive on all systems? What did you test this on, and what was the results? thanks, greg k-h
On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote: >On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote: >> Preparing the usb request is not very expensive, when using >> scatter gather. In that case we can even remove the overhead >> of the extra pump worker and call the pump function directly. >> >> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> >> --- >> drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- >> drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ >> drivers/usb/gadget/function/uvc_video.h | 2 ++ >> 3 files changed, 30 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c >> index a2c78690c5c288..020b4adc7840e0 100644 >> --- a/drivers/usb/gadget/function/uvc_v4l2.c >> +++ b/drivers/usb/gadget/function/uvc_v4l2.c >> @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) >> if (ret < 0) >> return ret; >> >> - if (uvc->state == UVC_STATE_STREAMING) >> - schedule_work(&video->pump); >> + if (uvc->state == UVC_STATE_STREAMING) { >> + if (!video->queue.use_sg) >> + schedule_work(&video->pump); >> + else >> + uvcg_video_pump(video); > >Wouldn't it be easier to understand this if you flip the if test around: > if (video->queue.use_sg) > uvcg_video_pump(video); > else > schedule_work(&video->pump); >? > >Negagive logic is never fun to read... Yes, you are not wrong. >Also, are you sure that sg really is not expensive on all systems? What >did you test this on, and what was the results? I tested it on an zynqmp arm64 machine. I tried to test the sg case on an 32 bit IMX with chipidea, but the driver was complaining a lot about "not page aligned sg buffers". So if needed, I would first need to find a working machine to compare this with. However I would think that assigning some pointers on a scatterlist instead of doing memcpy of 1024 bytes should be less expensive. Regards, Michael
On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote: > On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote: > > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote: > > > Preparing the usb request is not very expensive, when using > > > scatter gather. In that case we can even remove the overhead > > > of the extra pump worker and call the pump function directly. > > > > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> > > > --- > > > drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- > > > drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ > > > drivers/usb/gadget/function/uvc_video.h | 2 ++ > > > 3 files changed, 30 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c > > > index a2c78690c5c288..020b4adc7840e0 100644 > > > --- a/drivers/usb/gadget/function/uvc_v4l2.c > > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c > > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) > > > if (ret < 0) > > > return ret; > > > > > > - if (uvc->state == UVC_STATE_STREAMING) > > > - schedule_work(&video->pump); > > > + if (uvc->state == UVC_STATE_STREAMING) { > > > + if (!video->queue.use_sg) > > > + schedule_work(&video->pump); > > > + else > > > + uvcg_video_pump(video); > > > > Wouldn't it be easier to understand this if you flip the if test around: > > if (video->queue.use_sg) > > uvcg_video_pump(video); > > else > > schedule_work(&video->pump); > > ? > > > > Negagive logic is never fun to read... > > Yes, you are not wrong. > > > Also, are you sure that sg really is not expensive on all systems? What > > did you test this on, and what was the results? > > I tested it on an zynqmp arm64 machine. I tried to test the sg case on > an 32 bit IMX with chipidea, but the driver was complaining a lot about > "not page aligned sg buffers". So if needed, I would first need to find > a working machine to compare this with. > > However I would think that assigning some pointers on a scatterlist > instead of doing memcpy of 1024 bytes should be less expensive. Not true on many systems, memcpy can be _very_ fast, especially for small amounts like 1024 bytes. So please, measure this to be sure. thanks, greg k-h
Hello, On Mon, Dec 06, 2021 at 08:34:17AM +0100, Greg KH wrote: > On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote: > > On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote: > > > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote: > > > > Preparing the usb request is not very expensive, when using > > > > scatter gather. In that case we can even remove the overhead > > > > of the extra pump worker and call the pump function directly. > > > > > > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> > > > > --- > > > > drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- > > > > drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ > > > > drivers/usb/gadget/function/uvc_video.h | 2 ++ > > > > 3 files changed, 30 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c > > > > index a2c78690c5c288..020b4adc7840e0 100644 > > > > --- a/drivers/usb/gadget/function/uvc_v4l2.c > > > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c > > > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) > > > > if (ret < 0) > > > > return ret; > > > > > > > > - if (uvc->state == UVC_STATE_STREAMING) > > > > - schedule_work(&video->pump); > > > > + if (uvc->state == UVC_STATE_STREAMING) { > > > > + if (!video->queue.use_sg) > > > > + schedule_work(&video->pump); > > > > + else > > > > + uvcg_video_pump(video); > > > > > > Wouldn't it be easier to understand this if you flip the if test around: > > > if (video->queue.use_sg) > > > uvcg_video_pump(video); > > > else > > > schedule_work(&video->pump); > > > ? > > > > > > Negagive logic is never fun to read... > > > > Yes, you are not wrong. > > > > > Also, are you sure that sg really is not expensive on all systems? What > > > did you test this on, and what was the results? > > > > I tested it on an zynqmp arm64 machine. I tried to test the sg case on > > an 32 bit IMX with chipidea, but the driver was complaining a lot about > > "not page aligned sg buffers". So if needed, I would first need to find > > a working machine to compare this with. > > > > However I would think that assigning some pointers on a scatterlist > > instead of doing memcpy of 1024 bytes should be less expensive. > > Not true on many systems, memcpy can be _very_ fast, especially for > small amounts like 1024 bytes. So please, measure this to be sure. We've moved memcpy()s to a work queue in the host UVC driver for high-bandwidth devices, which brough massive performance improvements (if I recall correctly, at least partly due to the parallelization of memcpy operations). I'm not sure we have measured performances in the gadget driver with the same level of accuracy and care though. Michael, do you plan to make measurements ?
On Mon, Dec 06, 2021 at 02:15:02PM +0200, Laurent Pinchart wrote: >Hello, > >On Mon, Dec 06, 2021 at 08:34:17AM +0100, Greg KH wrote: >> On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote: >> > On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote: >> > > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote: >> > > > Preparing the usb request is not very expensive, when using >> > > > scatter gather. In that case we can even remove the overhead >> > > > of the extra pump worker and call the pump function directly. >> > > > >> > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> >> > > > --- >> > > > drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- >> > > > drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ >> > > > drivers/usb/gadget/function/uvc_video.h | 2 ++ >> > > > 3 files changed, 30 insertions(+), 8 deletions(-) >> > > > >> > > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c >> > > > index a2c78690c5c288..020b4adc7840e0 100644 >> > > > --- a/drivers/usb/gadget/function/uvc_v4l2.c >> > > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c >> > > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) >> > > > if (ret < 0) >> > > > return ret; >> > > > >> > > > - if (uvc->state == UVC_STATE_STREAMING) >> > > > - schedule_work(&video->pump); >> > > > + if (uvc->state == UVC_STATE_STREAMING) { >> > > > + if (!video->queue.use_sg) >> > > > + schedule_work(&video->pump); >> > > > + else >> > > > + uvcg_video_pump(video); >> > > >> > > Wouldn't it be easier to understand this if you flip the if test around: >> > > if (video->queue.use_sg) >> > > uvcg_video_pump(video); >> > > else >> > > schedule_work(&video->pump); >> > > ? >> > > >> > > Negagive logic is never fun to read... >> > >> > Yes, you are not wrong. >> > >> > > Also, are you sure that sg really is not expensive on all systems? What >> > > did you test this on, and what was the results? >> > >> > I tested it on an zynqmp arm64 machine. I tried to test the sg case on >> > an 32 bit IMX with chipidea, but the driver was complaining a lot about >> > "not page aligned sg buffers". So if needed, I would first need to find >> > a working machine to compare this with. >> > >> > However I would think that assigning some pointers on a scatterlist >> > instead of doing memcpy of 1024 bytes should be less expensive. >> >> Not true on many systems, memcpy can be _very_ fast, especially for >> small amounts like 1024 bytes. So please, measure this to be sure. > >We've moved memcpy()s to a work queue in the host UVC driver for >high-bandwidth devices, which brough massive performance improvements >(if I recall correctly, at least partly due to the parallelization of >memcpy operations). I'm not sure we have measured performances in the >gadget driver with the same level of accuracy and care though. > >Michael, do you plan to make measurements ? I would do it, but only if I can find time and hardware to test it on. For now, my agenda says to get the other patches mainline first. So yes, but only in some later undefined time. Michael
diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c index a2c78690c5c288..020b4adc7840e0 100644 --- a/drivers/usb/gadget/function/uvc_v4l2.c +++ b/drivers/usb/gadget/function/uvc_v4l2.c @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) if (ret < 0) return ret; - if (uvc->state == UVC_STATE_STREAMING) - schedule_work(&video->pump); + if (uvc->state == UVC_STATE_STREAMING) { + if (!video->queue.use_sg) + schedule_work(&video->pump); + else + uvcg_video_pump(video); + } return ret; } diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c index 7f59a0c4740209..933c2b831fe747 100644 --- a/drivers/usb/gadget/function/uvc_video.c +++ b/drivers/usb/gadget/function/uvc_video.c @@ -268,8 +268,12 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req) list_add_tail(&req->list, &video->req_free); spin_unlock_irqrestore(&video->req_lock, flags); - if (uvc->state == UVC_STATE_STREAMING) - schedule_work(&video->pump); + if (uvc->state == UVC_STATE_STREAMING) { + if (!queue->use_sg) + schedule_work(&video->pump); + else + uvcg_video_pump(video); + } } static int @@ -359,9 +363,8 @@ uvc_video_alloc_requests(struct uvc_video *video) * This function fills the available USB requests (listed in req_free) with * video data from the queued buffers. */ -static void uvcg_video_pump(struct work_struct *work) +void uvcg_video_pump(struct uvc_video *video) { - struct uvc_video *video = container_of(work, struct uvc_video, pump); struct uvc_video_queue *queue = &video->queue; struct usb_request *req = NULL; struct uvc_buffer *buf; @@ -427,6 +430,14 @@ static void uvcg_video_pump(struct work_struct *work) return; } + +static void uvcg_video_pump_t(struct work_struct *work) +{ + struct uvc_video *video = container_of(work, struct uvc_video, pump); + + uvcg_video_pump(video); +} + /* * Enable or disable the video stream. */ @@ -469,7 +480,10 @@ int uvcg_video_enable(struct uvc_video *video, int enable) video->req_int_count = 0; - schedule_work(&video->pump); + if (!video->queue.use_sg) + schedule_work(&video->pump); + else + uvcg_video_pump(video); return ret; } @@ -481,7 +495,9 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc) { INIT_LIST_HEAD(&video->req_free); spin_lock_init(&video->req_lock); - INIT_WORK(&video->pump, uvcg_video_pump); + + if (!video->queue.use_sg) + INIT_WORK(&video->pump, uvcg_video_pump_t); video->uvc = uvc; video->fcc = V4L2_PIX_FMT_YUYV; diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h index 03adeefa343b71..d4ad61dd568974 100644 --- a/drivers/usb/gadget/function/uvc_video.h +++ b/drivers/usb/gadget/function/uvc_video.h @@ -14,6 +14,8 @@ struct uvc_video; +void uvcg_video_pump(struct uvc_video *video); + int uvcg_video_enable(struct uvc_video *video, int enable); int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
Preparing the usb request is not very expensive, when using scatter gather. In that case we can even remove the overhead of the extra pump worker and call the pump function directly. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> --- drivers/usb/gadget/function/uvc_v4l2.c | 8 +++++-- drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------ drivers/usb/gadget/function/uvc_video.h | 2 ++ 3 files changed, 30 insertions(+), 8 deletions(-)