
[v2,1/2] USB: core: add a memory pool to urb for host-controller private data

Message ID 20250513113817.11962-1-00107082@163.com
State New
Series [v2,1/2] USB: core: add a memory pool to urb for host-controller private data

Commit Message

David Wang May 13, 2025, 11:38 a.m. UTC
---
Changes:
1. Use the caller's mem_flags when a larger allocation is needed.
Thanks to Oliver Neukum <oneukum@suse.com>'s review.
---
URB objects have long lifecycles: an urb can be reused across
enqueue-dequeue loops. The private data needed by some host controllers
has a very short lifecycle: the memory is allocated at enqueue and
released at dequeue. For example, on a system with xhci, several
minutes of webcam/keyboard/mouse usage show these memory alloc counts:
  drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
  drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
The allocation frequency for host-controller private data can reach
~1k/s and above.

These frequent allocations for host-controller private data can be
avoided if the urb takes ownership of the memory; the memory then shares
the longer lifecycle of the urb object.

Add a mempool to urb for hcpriv usage; the mempool holds only one block
of memory and grows when a larger size is requested.

Signed-off-by: David Wang <00107082@163.com>
---
 drivers/usb/core/urb.c | 23 +++++++++++++++++++++++
 include/linux/usb.h    |  3 +++
 2 files changed, 26 insertions(+)

Comments

David Wang May 13, 2025, 2:41 p.m. UTC | #1
At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
>> ---
>> Changes:
>> 1. Use caller's mem_flags if a larger memory is needed.
>> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
>> ---
>> URB objects have long lifecycle, an urb can be reused between
>> enqueue-dequeue loops; The private data needed by some host controller
>> has very short lifecycle, the memory is alloced when enqueue, and
>> released when dequeue. For example, on a system with xhci, several
>> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> Memory allocation frequency for host-controller private data can reach
>> ~1k/s and above.
>> 
>> High frequent allocations for host-controller private data can be
>> avoided if urb take over the ownership of memory, the memory then shares
>> the longer lifecycle with urb objects.
>> 
>> Add a mempool to urb for hcpriv usage, the mempool only holds one block
>> of memory and grows when larger size is requested.
>> 
>> Signed-off-by: David Wang <00107082@163.com>
>
>It should be possible to do what you want without touching the USB core 
>code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
>
>Instead of having an URB keep ownership of its extra memory after it 
>completes, xhci-hcd can put the memory area onto a free list.  Then 
>memory areas on the free list can be reused with almost no overhead when 
>URBs are enqueued later on.

I have to disagree; your suggestion is not much different from requesting
memory from the system. Locks and memory-pool management are all still
needed, so why bother?

The reason I chose the URB is that the URB's lifecycle contains the private
data's lifecycle, and no two HCDs can take over the same URB at the same time.

I think the memory pool here is not actually a pool; or rather, the mempool
consists of the pool of URBs, and each URB holds only a single slot of the
mempool.


>
>This idea can become more elaborate if you maintain separate free lists 
>for different devices, or even for different endpoints, or sort the free 
>list by the size of the memory areas.  But the basic idea is always the 
>same: Don't change usbcore (including struct urb), and make getting and 
>releasing the extra memory areas have extremely low overhead.

Why would implementing a device-level memory pool have extremely low overhead?
And why is making changes to the usb core bad? The idea in this thread is
meant for all kinds of host controllers; xhci is just what I have in my
system. I think similar changes would benefit other HCs.


>
>Alan Stern
Alan Stern May 13, 2025, 3:37 p.m. UTC | #2
On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote:
> 
> 
> At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
> >> ---
> >> Changes:
> >> 1. Use caller's mem_flags if a larger memory is needed.
> >> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
> >> ---
> >> URB objects have long lifecycle, an urb can be reused between
> >> enqueue-dequeue loops; The private data needed by some host controller
> >> has very short lifecycle, the memory is alloced when enqueue, and
> >> released when dequeue. For example, on a system with xhci, several
> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
> >>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
> >>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
> >> Memory allocation frequency for host-controller private data can reach
> >> ~1k/s and above.
> >> 
> >> High frequent allocations for host-controller private data can be
> >> avoided if urb take over the ownership of memory, the memory then shares
> >> the longer lifecycle with urb objects.
> >> 
> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block
> >> of memory and grows when larger size is requested.
> >> 
> >> Signed-off-by: David Wang <00107082@163.com>
> >
> >It should be possible to do what you want without touching the USB core 
> >code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
> >
> >Instead of having an URB keep ownership of its extra memory after it 
> >completes, xhci-hcd can put the memory area onto a free list.  Then 
> >memory areas on the free list can be reused with almost no overhead when 
> >URBs are enqueued later on.
> 
> I have to disagree,  your suggestion has no much difference from requesting memory from
> system, locks and memory pool managements, all the same are needed, why bother?

There are two differences.  First, xhci-hcd already has its own lock 
that it acquires when enqueuing or dequeuing URBs, so no additional 
locks would be needed.  Second, complicated memory pool management isn't 
necessary; the management can be extremely simple.  (For example, 
Mathias suggested just reusing the most recently released memory area 
unless it is too small.)
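The most-recently-released reuse scheme Mathias describes can be sketched in plain C. The names and the single-entry cache below are assumptions for this sketch, not xhci-hcd code; in the driver it would run under the existing enqueue/dequeue lock, so no extra locking is shown:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical single-entry cache: keep the most recently released
 * memory area and hand it back on the next request unless it is
 * too small. */
static void *cached_area;
static size_t cached_size;

static void *hcd_priv_get(size_t size)
{
	if (cached_area && cached_size >= size) {
		void *area = cached_area; /* reuse: two pointer updates, no malloc */

		cached_area = NULL;
		return area;
	}
	free(cached_area);	/* cached block absent or too small */
	cached_area = NULL;
	return malloc(size);
}

static void hcd_priv_put(void *area, size_t size)
{
	free(cached_area);	/* keep only the most recent release */
	cached_area = area;
	cached_size = size;
}
```

On the hot path (release followed by a get of equal or smaller size) this touches only a pointer and a size, which is the "extremely low overhead" being claimed.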

> The reason I choose URB is that  URB life cycle contains the private data's lifecycle, 
> and no two HCD can take over the same URB as the same time.
> 
> I think the memory pool here is not actually a pool,  or I should say the mempool consists of
> pool of URBs, and each URB have only "single one" slot of mem pool in it.
> 
> 
> >
> >This idea can become more elaborate if you maintain separate free lists 
> >for different devices, or even for different endpoints, or sort the free 
> >list by the size of the memory areas.  But the basic idea is always the 
> >same: Don't change usbcore (including struct urb), and make getting and 
> >releasing the extra memory areas have extremely low overhead.
> 
> Why implements a device level memory pool would have extremely low overhead?

Because then getting or releasing memory areas from the pool could be 
carried out just by manipulating a couple of pointers.

Some fraction of the time, URBs are created on demand and destroyed upon 
completion.  Your approach would not save any time for these URBs, 
whereas my suggested approach would.  (This fraction probably isn't very 
large, although I don't know how big it is.)

> Why  making changes to usb core is bad? The idea in this thread is meant for all kinds of
> host controllers, xhci is what I have in my system; i think similar changes would benefit other
> HCs

Those other HC drivers would still require changes.  They could be 
changed to support my scheme as easily as your scheme.

If I were redesigning Linux's entire USB stack from the beginning, I 
would change it so that URBs would be dedicated to particular host 
controllers and endpoint types -- maybe even to particular endpoints.  
They would contain all the additional memory required for the HCD to use 
them, all pre-allocated, so that dynamic allocation would not be needed 
during normal use.  (The gadget subsystem behaves this way already.)

Such a drastic change isn't feasible at this point, although what you 
are suggesting is a step in that direction.  In the end it comes down 
to a time/space tradeoff, and it's very difficult to know what the best 
balance is.

I'm not saying that your approach is bad or wrong, just that there are 
other possibilities to consider.

Alan Stern
David Wang May 13, 2025, 4:35 p.m. UTC | #3
At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote:
>> 
>> 
>> At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
>> >> ---
>> >> Changes:
>> >> 1. Use caller's mem_flags if a larger memory is needed.
>> >> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
>> >> ---
>> >> URB objects have long lifecycle, an urb can be reused between
>> >> enqueue-dequeue loops; The private data needed by some host controller
>> >> has very short lifecycle, the memory is alloced when enqueue, and
>> >> released when dequeue. For example, on a system with xhci, several
>> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>> >>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>> >>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> >> Memory allocation frequency for host-controller private data can reach
>> >> ~1k/s and above.
>> >> 
>> >> High frequent allocations for host-controller private data can be
>> >> avoided if urb take over the ownership of memory, the memory then shares
>> >> the longer lifecycle with urb objects.
>> >> 
>> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block
>> >> of memory and grows when larger size is requested.
>> >> 
>> >> Signed-off-by: David Wang <00107082@163.com>
>> >
>> >It should be possible to do what you want without touching the USB core 
>> >code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
>> >
>> >Instead of having an URB keep ownership of its extra memory after it 
>> >completes, xhci-hcd can put the memory area onto a free list.  Then 
>> >memory areas on the free list can be reused with almost no overhead when 
>> >URBs are enqueued later on.
>> 
>> I have to disagree,  your suggestion has no much difference from requesting memory from
>> system, locks and memory pool managements, all the same are needed, why bother?
>
>There are two differences.  First, xhci-hcd already has its own lock 
>that it acquires when enqueuing or dequeuing URBs, so no additional 
>locks would be needed.  Second, complicated memory pool management isn't 
>necessary; the management can be extremely simple.  (For example, 
>Mathias suggested just reusing the most recently released memory area 
>unless it is too small.)

My patch also just reuses memory, in a simpler way, I would argue...

>
>> The reason I choose URB is that  URB life cycle contains the private data's lifecycle, 
>> and no two HCD can take over the same URB as the same time.
>> 
>> I think the memory pool here is not actually a pool,  or I should say the mempool consists of
>> pool of URBs, and each URB have only "single one" slot of mem pool in it.
>> 
>> 
>> >
>> >This idea can become more elaborate if you maintain separate free lists 
>> >for different devices, or even for different endpoints, or sort the free 
>> >list by the size of the memory areas.  But the basic idea is always the 
>> >same: Don't change usbcore (including struct urb), and make getting and 
>> >releasing the extra memory areas have extremely low overhead.
>> 
>> Why implements a device level memory pool would have extremely low overhead?
>
>Because then getting or releasing memory areas from the pool could be 
>carried out just by manipulating a couple of pointers.

A couple of pointer manipulations? And that would be simpler than a reusable buffer in the URB? I doubt that.
There would be lots of details to consider; the devil is in the details, and that is why we prefer the simpler solution.
I just don't understand: you seem to imply that my patch is not simple. Could you elaborate on that,
or is it just that, in your mind, making changes to the "usb core" is a big no-no?

>
>Some fraction of the time, URBs are created on demand and destroyed upon 
>completion.  Your approach would not save any time for these URBs, 
>whereas my suggested approach would.  (This fraction probably isn't very 
>large, although I don't know how big it is.)

I am aiming to avoid tons of extra allocations for the "private data", not URB allocations.
When I use my USB webcam, I observe 1k+ allocations per second for private data, while the
urb allocation frequency is already very low; I guess people have already done that optimization.
The memory profiling feature introduced in 6.12 is a very good starting point for identifying
where improvements could be made to memory behavior.

>
>> Why  making changes to usb core is bad? The idea in this thread is meant for all kinds of
>> host controllers, xhci is what I have in my system; i think similar changes would benefit other
>> HCs
>
>Those other HC drivers would still require changes.  They could be 
>changed to support my scheme as easily as your scheme.

>
>If I were redesigning Linux's entire USB stack from the beginning, I 
>would change it so that URBs would be dedicated to particular host 
>controllers and endpoint types -- maybe even to particular endpoints.  
>They would contain all the additional memory required for the HCD to use 
>them, all pre-allocated, so that dynamic allocation would not be needed 
>during normal use.  (The gadget subsystem behaves this way already.)
>
>Such a drastic change isn't feasible at this point, although what you 
>are suggesting is a step in that direction.  In the end it comes down 
>to a time/space tradeoff, and it's very difficult to know what the best 
>balance is.
Drastic change? Do you mean making changes to the USB core? Counting by lines
of change, I feel my patch is quite simple and efficient.
I also don't get your time/space tradeoff here: are you talking about my patch,
or about the solution in your mind?
This patch only needs a pointer and a size in the URB, and URB objects are allocated at a slow pace...

>
>I'm not saying that your approach is bad or wrong, just that there are 
>other possibilities to consider.
>
>Alan Stern
>
>Alan Stern


David
David Wang May 13, 2025, 6:48 p.m. UTC | #4
At 2025-05-14 02:21:15, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Wed, May 14, 2025 at 12:35:29AM +0800, David Wang wrote:
>> 
>> At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>> >> I have to disagree,  your suggestion has no much difference from requesting memory from
>> >> system, locks and memory pool managements, all the same are needed, why bother?
>> >
>> >There are two differences.  First, xhci-hcd already has its own lock 
>> >that it acquires when enqueuing or dequeuing URBs, so no additional 
>> >locks would be needed.  Second, complicated memory pool management isn't 
>> >necessary; the management can be extremely simple.  (For example, 
>> >Mathias suggested just reusing the most recently released memory area 
>> >unless it is too small.)
>> 
>> My patch also just reuse memory,  in a simpler way I would argue....
>
>I didn't say your approach wasn't simple.  I said that my approach has 
>very low overhead, a lot lower than the existing code, whereas you said 
>my approach "has no much difference" from the existing code.
>
>> >> >This idea can become more elaborate if you maintain separate free lists 
>> >> >for different devices, or even for different endpoints, or sort the free 
>> >> >list by the size of the memory areas.  But the basic idea is always the 
>> >> >same: Don't change usbcore (including struct urb), and make getting and 
>> >> >releasing the extra memory areas have extremely low overhead.
>> >> 
>> >> Why implements a device level memory pool would have extremely low overhead?
>> >
>> >Because then getting or releasing memory areas from the pool could be 
>> >carried out just by manipulating a couple of pointers.
>> 
>> A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB?
>>  I doubt that.
>
>I didn't say pointer manipulation was simpler than a reusable buffer.  I 
>said that it was very low overhead, in order to answer your question: 
>"Why implements a device level memory pool would have extremely low 
>overhead?"

Now I feel it is becoming a wording game...
>
>> There would be lots of details needs to consider,  detail is devil and that why we prefer simpler solution,
>> I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it,
>> or it is just that in my mind, make changes to "usb core" is a big no-no!
>
>Neither one.
>  
>However, you have forgotten about one thing: With your patch, each URB 
>maintains ownership of a memory area even when the URB is not in use.  
>With the existing code, that memory is freed when the URB is not in use, 
>and with my approach the memory is shared among URBs.
>
>In this way, your patch will use more memory than the existing code or 
>my approach.  The question to answer is which is better: Using more 
>memory (your patch) or using more time (the allocations and 
>deallocations in the existing code or my approach)?
>
>> >If I were redesigning Linux's entire USB stack from the beginning, I 
>> >would change it so that URBs would be dedicated to particular host 
>> >controllers and endpoint types -- maybe even to particular endpoints.  
>> >They would contain all the additional memory required for the HCD to use 
>> >them, all pre-allocated, so that dynamic allocation would not be needed 
>> >during normal use.  (The gadget subsystem behaves this way already.)
>> >
>> >Such a drastic change isn't feasible at this point, although what you 
>> >are suggesting is a step in that direction.  In the end it comes down 
>> >to a time/space tradeoff, and it's very difficult to know what the best 
>> >balance is.
>> Drastic change? Do you mean make change to USB core?
>
>No, I meant redesigning the entire USB stack.  It would require changing 
>all the USB drivers to allocate URBs differently from the way they do 
>now.  And changing every HC driver to preallocate the memory it needs to 
>perform transfers.  I think you can agree that would be a pretty drastic 
>change.
>
>>  Because counting by
>> lines of changes in this patch, I feel my patch is quite simple and efficient.
>> I also don't get your time/space tradeoff here,  are you talking about my patch?
>> or you were just talking a solution in your mind....
>
>Both.  See my comment above.
>
>> This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace...
>
>It also needs an extra memory area that is allocated the first time the 
>URB is used, and is not deallocated until the URB is destroyed.  You 
>seem to be ignoring this fact.
>
>Alan Stern

It is not an "extra" memory area; the memory is needed by the HC anyway, and the memory pool just caches it.
And about not freeing the memory until the URB is released: you seem to have forgotten that we are talking
about a "memory pool". A URB used only once could be considered a memory pool that is never reused.

If your memory pool approach would not "waste" memory, I would be rather happy to learn how.

I want to mention the purpose of this patch again:
a lot of "private data" allocations could be avoided if we use a "mempool" to cache and reuse that memory,
and using the URB as the holder is a very simple way to implement this.

To add to that: based on my memory profiling, URB usage is very efficient. I think it is a very good
candidate to hold the private data cache for HCs.


David
Greg KH May 14, 2025, 7:29 a.m. UTC | #5
On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote:
> Hi, 
> 
> Update memory footprints after hours of USB devices usage
> on my system:
> (I have webcam/mic/keyboard/mouse/harddisk connected via USB,
> a full picture of memory footprints is attached below)
> +----------------------+----------------+-------------------------------------------+-----------------------+
> | active memory(bytes) | active objects |               alloc location              | total objects created |
> +----------------------+----------------+-------------------------------------------+-----------------------+
> |        22912         |       24       | core/urb.c:1054:urb_hcpriv_mempool_zalloc |         10523         |
> |        11776         |       31       |        core/urb.c:76:usb_alloc_urb        |         11027         |
> +----------------------+----------------+-------------------------------------------+-----------------------+
> 
> The count for active URB objects remain at low level,
> its peak is about 12KB when I copied 10G file to my harddisk.
> The memory pool in this patch takes about 22KB, its peak is 23KB.
> The patch meant to reuse memory via a mempool, the memory kept in pool is indeed
> the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.)
> How balance the tradeoff is depends on how well the mempool is managed.
> This patch takes a easy approach: put faith in URB objects management and put
> a single slot of mempool in URB on demands. And the changes, by counting lines
> in this patch, are very simple.
> Base on the profiling, the number of active URB objects are kept at a very low scale,
> only several could have a very long lifecycle.
> I think URB is a good candidate for caching those memory needed for private data.
> But I could be very wrong, due simply to the lack of knowledge.
> 
> And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk
> would yield high rate of memory allocation for priviate data in xhci_urb_enqueue:
> +----------------------+----------------+-----------------------------------+-----------------------+
> | active memory(bytes) | active objects |           alloc location          | total objects created |
> +----------------------+----------------+-----------------------------------+-----------------------+
> |        22784         |       23       | host/xhci.c:1555:xhci_urb_enqueue |         894281 << grow|ing very quick
> |        10880         |       31       |    core/urb.c:75:usb_alloc_urb    |          4028         |
> +----------------------+----------------+-----------------------------------+-----------------------+
> I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue
> when I was copying 10G file, and had my webcam opened at the same time.
> 
> And again, to be honest, I did not observe any observable performance improvement from
> an enduser's point of view with this patch. The only significant improvement is memory footprint
> _numbers_.
> I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of
> "the less allocation the better".

No, this isn't necessarily true at all.  Allocations are fast, and if we
free/allocate things quickly, it's even faster.  USB is limited by the
hardware throughput, which is _very_ slow compared to memory accesses of
the allocator.

So unless you can show that we are using less CPU time, or something
else "real" that is measurable in a real way in userspace, that would
justify the extra complexity, it's going to be hard to get me to agree
that this is something that needs to be addressed at all.

Also, I'm totally confused as to what the "latest" version of this
patchset is...

thanks,

greg k-h
David Wang May 14, 2025, 8:50 a.m. UTC | #6
At 2025-05-14 15:29:42, "Greg KH" <gregkh@linuxfoundation.org> wrote:
>On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote:
>> Hi, 
>> 
>> Update memory footprints after hours of USB devices usage
>> on my system:
>> (I have webcam/mic/keyboard/mouse/harddisk connected via USB,
>> a full picture of memory footprints is attached below)
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> | active memory(bytes) | active objects |               alloc location              | total objects created |
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> |        22912         |       24       | core/urb.c:1054:urb_hcpriv_mempool_zalloc |         10523         |
>> |        11776         |       31       |        core/urb.c:76:usb_alloc_urb        |         11027         |
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> 
>> The count for active URB objects remain at low level,
>> its peak is about 12KB when I copied 10G file to my harddisk.
>> The memory pool in this patch takes about 22KB, its peak is 23KB.
>> The patch meant to reuse memory via a mempool, the memory kept in pool is indeed
>> the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.)
>> How balance the tradeoff is depends on how well the mempool is managed.
>> This patch takes a easy approach: put faith in URB objects management and put
>> a single slot of mempool in URB on demands. And the changes, by counting lines
>> in this patch, are very simple.
>> Base on the profiling, the number of active URB objects are kept at a very low scale,
>> only several could have a very long lifecycle.
>> I think URB is a good candidate for caching those memory needed for private data.
>> But I could be very wrong, due simply to the lack of knowledge.
>> 
>> And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk
>> would yield high rate of memory allocation for priviate data in xhci_urb_enqueue:
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> | active memory(bytes) | active objects |           alloc location          | total objects created |
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> |        22784         |       23       | host/xhci.c:1555:xhci_urb_enqueue |         894281 << grow|ing very quick
>> |        10880         |       31       |    core/urb.c:75:usb_alloc_urb    |          4028         |
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue
>> when I was copying 10G file, and had my webcam opened at the same time.
>> 
>> And again, to be honest, I did not observe any observable performance improvement from
>> an enduser's point of view with this patch. The only significant improvement is memory footprint
>> _numbers_.
>> I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of
>> "the less allocation the better".
>
>No, this isn't necessarily true at all.  Allocations are fast, and if we
>free/allocate things quickly, it's even faster.  USB is limited by the
>hardware throughput, which is _very_ slow compared to memory accesses of
>the allocator.
>
>So unless you can show that we are using less CPU time, or something
>else "real" that is measurable in a real way in userspace, that would
>justify the extra complexity, it's going to be hard to get me to agree
>that this is something that needs to be addressed at all.

Thanks for the feedback~!
That is very reasonable to me, and I have been pondering how
to profile USB performance, but still have no clue.

I will keep thinking about it; hopefully this 1k+/s allocation rate will show up somewhere, or
I will conclude that it really has no significant impact at all.


Thanks
David

>
>Also, I'm totally confused as to what the "latest" version of this
>patchset is...
>
Sorry, I think I messed up the mails when I added the "Reply-To" header to newer patches.

>thanks,
>
>greg k-h
Oliver Neukum May 14, 2025, 11:23 a.m. UTC | #7
On 13.05.25 13:38, David Wang wrote:
> ---

Hi,

still an issue after a second review.
I should have noticed earlier.

> --- a/drivers/usb/core/urb.c
> +++ b/drivers/usb/core/urb.c
> @@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
>   
>   	if (urb->transfer_flags & URB_FREE_BUFFER)
>   		kfree(urb->transfer_buffer);
> +	kfree(urb->hcpriv_mempool);

What if somebody uses usb_init_urb()?
  
>   	kfree(urb);
>   }
> @@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
>   
>   EXPORT_SYMBOL_GPL(usb_anchor_empty);
>   
> +/**
> + * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
> + * @urb: pointer to URB being used
> + * @size: memory size requested by current host controller
> + * @mem_flags: the type of memory to allocate
> + *
> + * Return: NULL if out of memory, otherwise memory are zeroed
> + */
> +void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
> +{
> +	if (urb->hcpriv_mempool_size < size) {
> +		kfree(urb->hcpriv_mempool);
> +		urb->hcpriv_mempool_size = size;
> +		urb->hcpriv_mempool = kmalloc(size, mem_flags);

That could use kzalloc().

	Regards
		Oliver
Oliver Neukum May 14, 2025, 12:03 p.m. UTC | #8
On 14.05.25 13:51, David Wang wrote:
  
> I am not quite sure about the concern here, do you mean somebody create a urb,
> and then usb_init_urb() here, and never use urb_destroy to release it?

Yes.

> 
> That would cause memory leak if urb_destroy is not called......But is this really possible?.

I think a few drivers under drivers/media do so.

	Regards
		Oliver
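Oliver's concern can be modeled in plain userspace C. Field names follow the patch, but the model struct, the helper, and the scenario are illustrative: drivers that initialize an embedded urb with usb_init_urb() and free the containing structure themselves never go through urb_destroy(), so memory freed only there would leak.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal model of the two fields the patch adds to struct urb. */
struct urb_model {
	void *hcpriv_mempool;
	size_t hcpriv_mempool_size;
};

/* Like usb_init_urb(): only zeroes the caller-provided structure. */
static void usb_init_urb_model(struct urb_model *urb)
{
	memset(urb, 0, sizeof(*urb));
}

/* A cleanup helper such drivers would need, since urb_destroy()
 * (where the patch frees hcpriv_mempool) is never called for
 * embedded urbs. */
static void urb_free_hcpriv_mempool(struct urb_model *urb)
{
	free(urb->hcpriv_mempool);
	urb->hcpriv_mempool = NULL;
	urb->hcpriv_mempool_size = 0;
}
```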

Patch

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 5e52a35486af..51bf25125aeb 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -23,6 +23,7 @@  static void urb_destroy(struct kref *kref)
 
 	if (urb->transfer_flags & URB_FREE_BUFFER)
 		kfree(urb->transfer_buffer);
+	kfree(urb->hcpriv_mempool);
 
 	kfree(urb);
 }
@@ -1037,3 +1038,25 @@  int usb_anchor_empty(struct usb_anchor *anchor)
 
 EXPORT_SYMBOL_GPL(usb_anchor_empty);
 
+/**
+ * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
+ * @urb: pointer to URB being used
+ * @size: memory size requested by current host controller
+ * @mem_flags: the type of memory to allocate
+ *
+ * Return: NULL if out of memory, otherwise memory are zeroed
+ */
+void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
+{
+	if (urb->hcpriv_mempool_size < size) {
+		kfree(urb->hcpriv_mempool);
+		urb->hcpriv_mempool_size = size;
+		urb->hcpriv_mempool = kmalloc(size, mem_flags);
+	}
+	if (urb->hcpriv_mempool)
+		memset(urb->hcpriv_mempool, 0, size);
+	else
+		urb->hcpriv_mempool_size = 0;
+	return urb->hcpriv_mempool;
+}
+EXPORT_SYMBOL_GPL(urb_hcpriv_mempool_zalloc);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..4400e41271bc 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,8 @@  struct urb {
 	struct kref kref;		/* reference count of the URB */
 	int unlinked;			/* unlink error code */
 	void *hcpriv;			/* private data for host controller */
+	void *hcpriv_mempool;           /* a single slot of cache for HCD's private data */
+	size_t hcpriv_mempool_size;     /* current size of the memory pool */
 	atomic_t use_count;		/* concurrent submissions counter */
 	atomic_t reject;		/* submissions will fail */
 
@@ -1790,6 +1792,7 @@  extern int usb_wait_anchor_empty_timeout(struct usb_anchor *anchor,
 extern struct urb *usb_get_from_anchor(struct usb_anchor *anchor);
 extern void usb_scuttle_anchored_urbs(struct usb_anchor *anchor);
 extern int usb_anchor_empty(struct usb_anchor *anchor);
+extern void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags);
 
 #define usb_unblock_urb	usb_unpoison_urb
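For readers outside the kernel tree, the single-slot pool in the patch above can be modeled in userspace C, with Oliver's kzalloc() suggestion applied (calloc() stands in for kzalloc(); the struct is trimmed to the two fields the patch adds, and the harness is illustrative):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct urb_model {
	void *hcpriv_mempool;		/* single cached slot for HCD data */
	size_t hcpriv_mempool_size;	/* current size of that slot */
};

/* Grow-on-demand single-slot pool; always returns zeroed memory. */
static void *urb_hcpriv_mempool_zalloc(struct urb_model *urb, size_t size)
{
	if (urb->hcpriv_mempool_size < size) {
		free(urb->hcpriv_mempool);		/* slot too small: replace it */
		urb->hcpriv_mempool_size = size;
		urb->hcpriv_mempool = calloc(1, size);	/* kzalloc() in the kernel */
	} else if (urb->hcpriv_mempool) {
		memset(urb->hcpriv_mempool, 0, size);	/* reuse: re-zero the prefix */
	}
	if (!urb->hcpriv_mempool)
		urb->hcpriv_mempool_size = 0;
	return urb->hcpriv_mempool;
}
```

On reuse the function returns the same block, so repeated enqueue/dequeue cycles on one URB cost a memset rather than an allocation, which is the behavior the profiling tables above are measuring.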