mbox series

[RFC,0/4] Allow more than 32 vb2 buffers per queue

Message ID 20230313135916.862852-1-benjamin.gaignard@collabora.com
Headers show
Series Allow more than 32 vb2 buffers per queue | expand

Message

Benjamin Gaignard March 13, 2023, 1:59 p.m. UTC
Queues can only store up to VB2_MAX_FRAME (32) vb2 buffers.
Some use cases like VP9 dynamic resolution change may require
to have more than 32 buffers in use at the same time.
The goal of this series is to prepare queues for these use
cases by replacing bufs array by a list a vb2 buffers.

For the same VP9 use case we will need to be able to delete
buffers from the queue to limit memory usage so this series
add a bitmap to manage buffer indexes. This should permit to
avoid creating holes in vb2 index range.

I test these patches with Fluster test suite on Hantro video
decoder (VP9 and HEVC). I notice no performances issues and 
no regressions.

Despite carefully checking if removing bufs array doesn't break
the compilation of any media driver, I may have miss some so
one of the goal of this RFC is also to trig compilation robots.
 
Benjamin Gaignard (4):
  media: videobuf2: Use vb2_get_buffer() as helper everywhere
  media: videobuf2: Replace bufs array by a list
  media: videobuf2: Use bitmap to manage vb2 index
  media: videobuf2: Stop define VB2_MAX_FRAME as global

 .../media/common/videobuf2/videobuf2-core.c   | 107 +++++++++++-------
 .../media/common/videobuf2/videobuf2-v4l2.c   |  17 +--
 drivers/media/platform/amphion/vdec.c         |   1 +
 drivers/media/platform/amphion/vpu_dbg.c      |   4 +-
 .../platform/mediatek/jpeg/mtk_jpeg_core.c    |   2 +-
 .../vcodec/vdec/vdec_vp9_req_lat_if.c         |   4 +-
 drivers/media/platform/qcom/venus/hfi.h       |   2 +
 .../media/platform/verisilicon/hantro_hw.h    |   2 +
 drivers/media/test-drivers/visl/visl-dec.c    |  16 +--
 include/media/videobuf2-core.h                |  42 ++++++-
 include/media/videobuf2-v4l2.h                |   4 -
 11 files changed, 130 insertions(+), 71 deletions(-)

Comments

Hans Verkuil March 14, 2023, 8:55 a.m. UTC | #1
On 14/03/2023 00:16, David Laight wrote:
> From: Laurent Pinchart
>> Sent: 13 March 2023 18:12
>>
>> Hi Benjamin,
>>
>> Thank you for the patch.
>>
>> On Mon, Mar 13, 2023 at 02:59:14PM +0100, Benjamin Gaignard wrote:
>>> Replacing bufs array by a list allows to remove the 32 buffers
>>> limit per queue.
> 
> Is the limit actually a problem?
> Arrays of pointers have locking and caching advantages over
> linked lists.

I'm not so keen on using a list either. Adding or deleting buffers will
be an infrequency operation, so using an array of pointers and reallocing
it if needed would be perfectly fine. Buffer lookup based on the index
should be really fast, though.

Why not start with a dynamically allocated array of 32 vb2_buffer pointers?
And keep doubling the size (reallocing) whenever more buffers are needed,
up to some maximum (1024 would be a good initial value for that, I think).
This max could be even a module option.

A simple spinlock is sufficient, I think, to regulate access to the
struct vb2_buffer **bufs pointer in vb2_queue. From what I can see it is
not needed in interrupt context (i.e. vb2_buffer_done).

Regards,

	Hans

> 
> ...
>>> @@ -1239,8 +1242,12 @@ static inline void vb2_clear_last_buffer_dequeued(struct vb2_queue *q)
>>>  static inline struct vb2_buffer *vb2_get_buffer(struct vb2_queue *q,
>>>  						unsigned int index)
>>>  {
>>> -	if (index < q->num_buffers)
>>> -		return q->bufs[index];
>>> +	struct vb2_buffer *vb;
>>> +
>>> +	list_for_each_entry(vb, &q->allocated_bufs, allocated_entry)
>>> +		if (vb->index == index)
>>> +			return vb;
>>> +
>>>  	return NULL;
> 
> You really don't want to be doing that....
> 
> There are schemes for unbounded arrays.
> Scanning a linked list isn't a very good one.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Hans Verkuil March 14, 2023, 10:42 a.m. UTC | #2
Hi David,

On 3/14/23 11:11, David Laight wrote:
> From: Hans Verkuil
>> Sent: 14 March 2023 08:55
> ...
>> Why not start with a dynamically allocated array of 32 vb2_buffer pointers?
>> And keep doubling the size (reallocing) whenever more buffers are needed,
>> up to some maximum (1024 would be a good initial value for that, I think).
>> This max could be even a module option.
> 
> I don't know the typical uses (or the code at all).
> But it might be worth having a small array in the structure itself.
> Useful if there are typically always (say) less than 8 buffers.
> For larger sizes use the (IIRC) kmalloc_size() to find the actual
> size of the structure that will be allocate and set the array
> size appropriately.

The typical usage is that applications allocate N buffers with the
VIDIOC_REQBUFS ioctl, and in most cases that's all they use. The
current max is 32 buffers, so allocating that initially (usually
during probe()) will cover all current cases with a single one-time
kzalloc.

Only if the application wants to allocate more than 32 buffers will
there be a slight overhead.

Regards,

	Hans

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Laurent Pinchart March 19, 2023, 11:33 p.m. UTC | #3
Hi Hans,

On Tue, Mar 14, 2023 at 11:42:46AM +0100, Hans Verkuil wrote:
> On 3/14/23 11:11, David Laight wrote:
> > From: Hans Verkuil
> >> Sent: 14 March 2023 08:55
> > ...
> >> Why not start with a dynamically allocated array of 32 vb2_buffer pointers?
> >> And keep doubling the size (reallocing) whenever more buffers are needed,
> >> up to some maximum (1024 would be a good initial value for that, I think).
> >> This max could be even a module option.

The kernel has IDR and IDA APIs, why not use them instead of reinventing
the wheel ?

> > I don't know the typical uses (or the code at all).
> > But it might be worth having a small array in the structure itself.
> > Useful if there are typically always (say) less than 8 buffers.
> > For larger sizes use the (IIRC) kmalloc_size() to find the actual
> > size of the structure that will be allocate and set the array
> > size appropriately.
> 
> The typical usage is that applications allocate N buffers with the
> VIDIOC_REQBUFS ioctl, and in most cases that's all they use.

Note that once we get DELETE_BUF (or DELETE_BUFS) support I'd like to
encourage applications to use the new API, and deprecate REQBUFS
(dropping it isn't on my radar, as it would take forever before no
userspace uses it anymore).

> The
> current max is 32 buffers, so allocating that initially (usually
> during probe()) will cover all current cases with a single one-time
> kzalloc.

Pre-allocating for the most common usage patterns is fine with me.

> Only if the application wants to allocate more than 32 buffers will
> there be a slight overhead.
Nicolas Dufresne March 24, 2023, 3:14 p.m. UTC | #4
Le mercredi 22 mars 2023 à 17:01 +0200, Laurent Pinchart a écrit :
> On Wed, Mar 22, 2023 at 10:50:52AM -0400, Nicolas Dufresne wrote:
> > Hi Laurent,
> > 
> > Le lundi 20 mars 2023 à 01:33 +0200, Laurent Pinchart a écrit :
> > > > The typical usage is that applications allocate N buffers with the
> > > > VIDIOC_REQBUFS ioctl, and in most cases that's all they use.
> > > 
> > > Note that once we get DELETE_BUF (or DELETE_BUFS) support I'd like to
> > > encourage applications to use the new API, and deprecate REQBUFS
> > > (dropping it isn't on my radar, as it would take forever before no
> > > userspace uses it anymore).
> > 
> > I was wondering if you can extend on this. I'm worried the count semantic might
> > prevent emulating it over create_bufs() ops, but if that works, did you meant to
> > emulate it so driver no longer have to implement reqbufs() if they have
> > create_bufs() ?
> 
> For drivers it should be fairly simply, as the reqbufs and create_bufs
> ioctl handlers should just point to the corresponding videobuf2 helpers.
> 
> What I meant is that I'd like to encourage userspace to use the
> VIDIOC_CREATE_BUFS ioctl instead of VIDIOC_REQBUFS.
> 

I'm not sure what rationale I can give implementer to "encourage" them to use a
more complex API that needs to copy over the FMT (which has just been set),
specially in the initial pre-allocation case. For most, CREATE_BUFS after SMT
will look like a very redundant and counter intuitive thing. Maybe you have a
more optimistic view on the matter ? Or you have a better idea how we could give
a meaning to having a fmt there on the initial case where the allocation matches
the queue FMT ?

Nicolas
Hans Verkuil March 24, 2023, 3:18 p.m. UTC | #5
On 24/03/2023 16:14, Nicolas Dufresne wrote:
> Le mercredi 22 mars 2023 à 17:01 +0200, Laurent Pinchart a écrit :
>> On Wed, Mar 22, 2023 at 10:50:52AM -0400, Nicolas Dufresne wrote:
>>> Hi Laurent,
>>>
>>> Le lundi 20 mars 2023 à 01:33 +0200, Laurent Pinchart a écrit :
>>>>> The typical usage is that applications allocate N buffers with the
>>>>> VIDIOC_REQBUFS ioctl, and in most cases that's all they use.
>>>>
>>>> Note that once we get DELETE_BUF (or DELETE_BUFS) support I'd like to
>>>> encourage applications to use the new API, and deprecate REQBUFS
>>>> (dropping it isn't on my radar, as it would take forever before no
>>>> userspace uses it anymore).
>>>
>>> I was wondering if you can extend on this. I'm worried the count semantic might
>>> prevent emulating it over create_bufs() ops, but if that works, did you meant to
>>> emulate it so driver no longer have to implement reqbufs() if they have
>>> create_bufs() ?
>>
>> For drivers it should be fairly simply, as the reqbufs and create_bufs
>> ioctl handlers should just point to the corresponding videobuf2 helpers.
>>
>> What I meant is that I'd like to encourage userspace to use the
>> VIDIOC_CREATE_BUFS ioctl instead of VIDIOC_REQBUFS.
>>
> 
> I'm not sure what rationale I can give implementer to "encourage" them to use a
> more complex API that needs to copy over the FMT (which has just been set),
> specially in the initial pre-allocation case. For most, CREATE_BUFS after SMT
> will look like a very redundant and counter intuitive thing. Maybe you have a
> more optimistic view on the matter ? Or you have a better idea how we could give
> a meaning to having a fmt there on the initial case where the allocation matches
> the queue FMT ?

I wouldn't mind if we can make a much nicer CREATE_BUFS variant with just the
size instead of a format. That was in hindsight a really bad idea, terrible
over-engineering.

Regards,

	Hans
Nicolas Dufresne March 24, 2023, 3:34 p.m. UTC | #6
Le vendredi 24 mars 2023 à 16:18 +0100, Hans Verkuil a écrit :
> On 24/03/2023 16:14, Nicolas Dufresne wrote:
> > Le mercredi 22 mars 2023 à 17:01 +0200, Laurent Pinchart a écrit :
> > > On Wed, Mar 22, 2023 at 10:50:52AM -0400, Nicolas Dufresne wrote:
> > > > Hi Laurent,
> > > > 
> > > > Le lundi 20 mars 2023 à 01:33 +0200, Laurent Pinchart a écrit :
> > > > > > The typical usage is that applications allocate N buffers with the
> > > > > > VIDIOC_REQBUFS ioctl, and in most cases that's all they use.
> > > > > 
> > > > > Note that once we get DELETE_BUF (or DELETE_BUFS) support I'd like to
> > > > > encourage applications to use the new API, and deprecate REQBUFS
> > > > > (dropping it isn't on my radar, as it would take forever before no
> > > > > userspace uses it anymore).
> > > > 
> > > > I was wondering if you can extend on this. I'm worried the count semantic might
> > > > prevent emulating it over create_bufs() ops, but if that works, did you meant to
> > > > emulate it so driver no longer have to implement reqbufs() if they have
> > > > create_bufs() ?
> > > 
> > > For drivers it should be fairly simply, as the reqbufs and create_bufs
> > > ioctl handlers should just point to the corresponding videobuf2 helpers.
> > > 
> > > What I meant is that I'd like to encourage userspace to use the
> > > VIDIOC_CREATE_BUFS ioctl instead of VIDIOC_REQBUFS.
> > > 
> > 
> > I'm not sure what rationale I can give implementer to "encourage" them to use a
> > more complex API that needs to copy over the FMT (which has just been set),
> > specially in the initial pre-allocation case. For most, CREATE_BUFS after SMT
> > will look like a very redundant and counter intuitive thing. Maybe you have a
> > more optimistic view on the matter ? Or you have a better idea how we could give
> > a meaning to having a fmt there on the initial case where the allocation matches
> > the queue FMT ?
> 
> I wouldn't mind if we can make a much nicer CREATE_BUFS variant with just the
> size instead of a format. That was in hindsight a really bad idea, terrible
> over-engineering.

Note that all DRM allocators also includes width/height and some format related
info (or the full info). This is because the driver deals with the alignment
requirements. In some use cases (I have inter frame dynamic control in mind
here) the fmt could be a mean to feedback the alignment (like bytesperline) back
to the application where the stream is no longer homogeneous on the FMT.

That being said, If we move toward a size base allocator API, we could also just
point back to an existing HEAP (or export an new heap if none are valid). And
define the sizeimage(s) is now that information you need from the FMT to
allocate anything + which heap needs to be used for the current setup.

Nicolas

> 
> Regards,
> 
> 	Hans
Laurent Pinchart March 24, 2023, 8:21 p.m. UTC | #7
On Fri, Mar 24, 2023 at 11:34:48AM -0400, Nicolas Dufresne wrote:
> Le vendredi 24 mars 2023 à 16:18 +0100, Hans Verkuil a écrit :
> > On 24/03/2023 16:14, Nicolas Dufresne wrote:
> > > Le mercredi 22 mars 2023 à 17:01 +0200, Laurent Pinchart a écrit :
> > > > On Wed, Mar 22, 2023 at 10:50:52AM -0400, Nicolas Dufresne wrote:
> > > > > Le lundi 20 mars 2023 à 01:33 +0200, Laurent Pinchart a écrit :
> > > > > > > The typical usage is that applications allocate N buffers with the
> > > > > > > VIDIOC_REQBUFS ioctl, and in most cases that's all they use.
> > > > > > 
> > > > > > Note that once we get DELETE_BUF (or DELETE_BUFS) support I'd like to
> > > > > > encourage applications to use the new API, and deprecate REQBUFS
> > > > > > (dropping it isn't on my radar, as it would take forever before no
> > > > > > userspace uses it anymore).
> > > > > 
> > > > > I was wondering if you can extend on this. I'm worried the count semantic might
> > > > > prevent emulating it over create_bufs() ops, but if that works, did you meant to
> > > > > emulate it so driver no longer have to implement reqbufs() if they have
> > > > > create_bufs() ?
> > > > 
> > > > For drivers it should be fairly simply, as the reqbufs and create_bufs
> > > > ioctl handlers should just point to the corresponding videobuf2 helpers.
> > > > 
> > > > What I meant is that I'd like to encourage userspace to use the
> > > > VIDIOC_CREATE_BUFS ioctl instead of VIDIOC_REQBUFS.
> > > > 
> > > 
> > > I'm not sure what rationale I can give implementer to "encourage" them to use a
> > > more complex API that needs to copy over the FMT (which has just been set),
> > > specially in the initial pre-allocation case. For most, CREATE_BUFS after SMT
> > > will look like a very redundant and counter intuitive thing. Maybe you have a
> > > more optimistic view on the matter ? Or you have a better idea how we could give
> > > a meaning to having a fmt there on the initial case where the allocation matches
> > > the queue FMT ?
> > 
> > I wouldn't mind if we can make a much nicer CREATE_BUFS variant with just the
> > size instead of a format. That was in hindsight a really bad idea, terrible
> > over-engineering.
> 
> Note that all DRM allocators also includes width/height and some format related
> info (or the full info). This is because the driver deals with the alignment
> requirements. In some use cases (I have inter frame dynamic control in mind
> here) the fmt could be a mean to feedback the alignment (like bytesperline) back
> to the application where the stream is no longer homogeneous on the FMT.
> 
> That being said, If we move toward a size base allocator API, we could also just
> point back to an existing HEAP (or export an new heap if none are valid). And
> define the sizeimage(s) is now that information you need from the FMT to
> allocate anything + which heap needs to be used for the current setup.

If we could move away from allocating buffers within V4L2 to only
importing buffers allocated through the DMA heaps API, I'd be very
happy. That won't be simple though. Maybe a good candidate for
discussions during the media summit in Prague this year ?