[v3,0/2] URB_FREE_COHERENT gs_usb memory leak fix

Message ID	20220610213335.3077375-1-rhett.aultman@samsara.com
Headers	show Return-Path: <linux-usb-owner@kernel.org> From: Rhett Aultman <rhett.aultman@samsara.com> To: linux-usb@vger.kernel.org, linux-can <linux-can@vger.kernel.org> Cc: --cc=Oliver Neukum <oneukum@suse.com>, Alan Stern <stern@rowland.harvard.edu>, Marc Kleine-Budde <mkl@pengutronix.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Rhett Aultman <rhett.aultman@samsara.com> Subject: [PATCH v3 0/2] URB_FREE_COHERENT gs_usb memory leak fix Date: Fri, 10 Jun 2022 17:33:33 -0400 Message-Id: <20220610213335.3077375-1-rhett.aultman@samsara.com> In-Reply-To: <20220609204714.2715188-1-rhett.aultman@samsara.com> References: <20220609204714.2715188-1-rhett.aultman@samsara.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	URB_FREE_COHERENT gs_usb memory leak fix \| expand [v3,0/2] URB_FREE_COHERENT gs_usb memory leak fix [v3,1/2] drivers: usb/core/urb: Add URB_FREE_COHERENT

Rhett Aultman June 10, 2022, 9:33 p.m. UTC

This patchset resolves a to memory leak which can occur with successive
iterations of calling gs_can_open() and gs_can_close().  The central
cause of this memory leak, which is an issue common to many of the USB
CAN drivers, is that memory allocated for RX buffers using
usb_alloc_coherent() and then kept in the URB will be properly freed
when the URB is killed.  This assumption is incorrect, as memory
allocated with usb_alloc_coherent() must be freed with
usb_free_coherent(), and there is no provision for this in the existing
URB code.

The common solution to this, found in v1 of my patches as well as in
already merged patches for other CAN USB drivers (see the patch for the
esd CAN-USB/2 driver here:
https://www.spinics.net/lists/linux-can/msg08203.html) is for the driver
itself to maintain an array of addresses of the buffers allocated with
usb_alloc_coherent() and to then iteratively call usb_free_coherent() on
them in their close function.  This solution requires a driver developer
to understand this unclear nuance, and it has historically been solved
in a piecemeal way one driver at a time (note: the gs_usb driver has had
this issue since the 3.x.x kernel series).

Rather than continue to place the burden of complexity on the drivers,
this patchset adds a new URB flag which allows the DMA buffer to be
correctly freed with the URB is killed.  This results in a much simpler
solution at the driver level and with minimal additional code in the USB
core.

** Changelog **

v2 -> v3:

  * Resolve style nits in gs_usb.c
  * Squash commits in urb.c and correct authorship

v1 -> v2:

  * Add URB_FREE_COHERENT flag to urb.c
  * Use URB_FREE_COHERENT flag rather than arrays of buffers in gs_usb.c

Rhett Aultman (1):
  can: gs_usb: fix DMA memory leak on close

Vincent Mailhol (1):
  drivers: usb/core/urb: Add URB_FREE_COHERENT

 drivers/net/can/usb/gs_usb.c | 2 +-
 drivers/usb/core/urb.c       | 5 ++++-
 include/linux/usb.h          | 3 ++-
 3 files changed, 7 insertions(+), 3 deletions(-)

Marc Kleine-Budde June 11, 2022, 3:31 p.m. UTC | #1

On 10.06.2022 17:33:35, Rhett Aultman wrote:
> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> 
> When allocating URB memory with kmalloc(), drivers can simply set the
> URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory
> will be freed in the background when killing the URB (for example with
> usb_kill_anchored_urbs()).
> 
> However, there are no equivalent mechanism when allocating DMA memory
> (with usb_alloc_coherent()).
> 
> This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will
> cause the kernel to automatically call usb_free_coherent() on the
> transfer buffer when the URB is killed, similarly to how
> URB_FREE_BUFFER triggers a call to kfree().
> 
> In order to have all the flags in numerical order, URB_DIR_IN is
> renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse
> value 0x0200.
> 
> Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com>
> Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com>
> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

FWIW:
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>

This patch probably goes upstream via USB. Once this is in net I'll take
the 2nd patch.

regards,
Marc

Vincent MAILHOL June 11, 2022, 4:06 p.m. UTC | #2

On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 10.06.2022 17:33:35, Rhett Aultman wrote:
> > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> >
> > When allocating URB memory with kmalloc(), drivers can simply set the
> > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory
> > will be freed in the background when killing the URB (for example with
> > usb_kill_anchored_urbs()).
> >
> > However, there are no equivalent mechanism when allocating DMA memory
> > (with usb_alloc_coherent()).
> >
> > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will
> > cause the kernel to automatically call usb_free_coherent() on the
> > transfer buffer when the URB is killed, similarly to how
> > URB_FREE_BUFFER triggers a call to kfree().
> >
> > In order to have all the flags in numerical order, URB_DIR_IN is
> > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse
> > value 0x0200.
> >
> > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com>
> > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com>
> > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>
> FWIW:
> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
>
> This patch probably goes upstream via USB. Once this is in net I'll take
> the 2nd patch.

Question to Greg: can this first patch also be applied to the stable
branches? Technically, this is a new feature but it will be used to
solve several memory leaks on existing drivers (the gs_usb is only one
example).

Yours sincerely,
Vincent Mailhol

Vincent MAILHOL June 21, 2022, 2:59 p.m. UTC | #3

On Tue. 21 Jun 2022 at 22:56, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Sun, Jun 12, 2022 at 01:06:37AM +0900, Vincent MAILHOL wrote:
> > On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> > > On 10.06.2022 17:33:35, Rhett Aultman wrote:
> > > > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > >
> > > > When allocating URB memory with kmalloc(), drivers can simply set the
> > > > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory
> > > > will be freed in the background when killing the URB (for example with
> > > > usb_kill_anchored_urbs()).
> > > >
> > > > However, there are no equivalent mechanism when allocating DMA memory
> > > > (with usb_alloc_coherent()).
> > > >
> > > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will
> > > > cause the kernel to automatically call usb_free_coherent() on the
> > > > transfer buffer when the URB is killed, similarly to how
> > > > URB_FREE_BUFFER triggers a call to kfree().
> > > >
> > > > In order to have all the flags in numerical order, URB_DIR_IN is
> > > > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse
> > > > value 0x0200.
> > > >
> > > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com>
> > > > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com>
> > > > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > >
> > > FWIW:
> > > Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
> > >
> > > This patch probably goes upstream via USB. Once this is in net I'll take
> > > the 2nd patch.
> >
> > Question to Greg: can this first patch also be applied to the stable
> > branches? Technically, this is a new feature but it will be used to
> > solve several memory leaks on existing drivers (the gs_usb is only one
> > example).
>
> We take in dependent patches into the stable trees all the time when
> needed, that's not an issue here.
>
> What is an issue here is that this feels odd as other USB developers
> said previously.
>
> My big objection here is what validates that the size of the transfer
> buffer here is really the size of the buffer to be freed?  Is that
> always set properly to be the length that was allocated?  That might
> just be the size of the last transfer using this buffer, but there is no
> guarantee that I can see of that says this really is the length of the
> allocated buffer, which is why usb_free_coherent() requires a size
> parameter.

Thanks for the comment.

I (probably wrongly) assumed that urb::transfer_buffer_length was the
allocated length and urb::actual_length was what was actually being
transferred. Right now, I am just confused. Seems that I need to study
a bit more and understand the real purpose of
urb::transfer_buffer_length because I still fail to understand in
which situation this can be different from the allocated length.

> If that guarantee is always right, then we should be able to drop the
> size option in usb_free_coherent(), and I don't think that's really
> possible.

I do not follow you on this comment. usb_free_coherent() does not have
a reference to the URB, right? How would it be supposed to retrieve
urb::transfer_buffer_length?


Yours sincerely,
Vincent Mailhol

Alan Stern June 21, 2022, 3:11 p.m. UTC | #4

On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> I (probably wrongly) assumed that urb::transfer_buffer_length was the
> allocated length and urb::actual_length was what was actually being
> transferred. Right now, I am just confused. Seems that I need to study
> a bit more and understand the real purpose of
> urb::transfer_buffer_length because I still fail to understand in
> which situation this can be different from the allocated length.

urb->transfer_buffer_length is the amount of data that the driver wants 
to send or expects to receive.  urb->actual_length is the amount of data 
that was actually sent or actually received.

Neither of these values has to be the same as the size of the buffer -- 
but they better not be bigger!

Alan Stern

Vincent MAILHOL June 21, 2022, 3:54 p.m. UTC | #5

On Wed. 22 Jun 2022 at 00:11, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > On Tue. 21 Jun 2022 at 22:56, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Sun, Jun 12, 2022 at 01:06:37AM +0900, Vincent MAILHOL wrote:
> > > > On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> > > > > On 10.06.2022 17:33:35, Rhett Aultman wrote:
> > > > > > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > > >
> > > > > > When allocating URB memory with kmalloc(), drivers can simply set the
> > > > > > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory
> > > > > > will be freed in the background when killing the URB (for example with
> > > > > > usb_kill_anchored_urbs()).
> > > > > >
> > > > > > However, there are no equivalent mechanism when allocating DMA memory
> > > > > > (with usb_alloc_coherent()).
> > > > > >
> > > > > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will
> > > > > > cause the kernel to automatically call usb_free_coherent() on the
> > > > > > transfer buffer when the URB is killed, similarly to how
> > > > > > URB_FREE_BUFFER triggers a call to kfree().
> > > > > >
> > > > > > In order to have all the flags in numerical order, URB_DIR_IN is
> > > > > > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse
> > > > > > value 0x0200.
> > > > > >
> > > > > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > > > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com>
> > > > > > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com>
> > > > > > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > > > >
> > > > > FWIW:
> > > > > Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
> > > > >
> > > > > This patch probably goes upstream via USB. Once this is in net I'll take
> > > > > the 2nd patch.
> > > >
> > > > Question to Greg: can this first patch also be applied to the stable
> > > > branches? Technically, this is a new feature but it will be used to
> > > > solve several memory leaks on existing drivers (the gs_usb is only one
> > > > example).
> > >
> > > We take in dependent patches into the stable trees all the time when
> > > needed, that's not an issue here.
> > >
> > > What is an issue here is that this feels odd as other USB developers
> > > said previously.
> > >
> > > My big objection here is what validates that the size of the transfer
> > > buffer here is really the size of the buffer to be freed?  Is that
> > > always set properly to be the length that was allocated?  That might
> > > just be the size of the last transfer using this buffer, but there is no
> > > guarantee that I can see of that says this really is the length of the
> > > allocated buffer, which is why usb_free_coherent() requires a size
> > > parameter.
> >
> > Thanks for the comment.
> >
> > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > allocated length and urb::actual_length was what was actually being
> > transferred. Right now, I am just confused. Seems that I need to study
> > a bit more and understand the real purpose of
> > urb::transfer_buffer_length because I still fail to understand in
> > which situation this can be different from the allocated length.
> >
> > > If that guarantee is always right, then we should be able to drop the
> > > size option in usb_free_coherent(), and I don't think that's really
> > > possible.
> >
> > I do not follow you on this comment. usb_free_coherent() does not have
> > a reference to the URB, right? How would it be supposed to retrieve
> > urb::transfer_buffer_length?
>
> Ah, good point.  Along those lines, how do you know what the `dma`
> parameter should be set to as well?

Here, just assuming that usb_alloc_coherent() was given
urb::transfer_dma as an argument. Maybe I missed some edge cases but I
never saw drivers doing it differently.

Vincent MAILHOL June 21, 2022, 4:40 p.m. UTC | #6

On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote:
> > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > allocated length and urb::actual_length was what was actually being
> > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > a bit more and understand the real purpose of
> > > > urb::transfer_buffer_length because I still fail to understand in
> > > > which situation this can be different from the allocated length.
> > >
> > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > to send or expects to receive.  urb->actual_length is the amount of data
> > > that was actually sent or actually received.
> > >
> > > Neither of these values has to be the same as the size of the buffer --
> > > but they better not be bigger!
> >
> > Thanks. Now things are a bit clearer.
> > I guess that for the outcoming URB what I proposed made no sense. For
> > incoming URB, I guess that most of the drivers want to set
> > urb::transfer_buffer once for all with the allocated size and never
> > touch it again.
>
> Not necessarily.  Some drivers may behave differently from the way you
> expect.

Yes, my point is not to generalise. Agree that there are exceptions.

> > Maybe the patch only makes sense of the incoming URB. Would it make
> > sense to keep it but with an additional check to trigger a dmesg
> > warning if this is used on an outcoming endpoint and with additional
> > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > the allocated size?
>
> Well, what really matters is that the transfer_buffer_length value has
> to be the same as the size of the buffer.  If that's true, the direction
> of the URB doesn't matter.  So yes, that requirement would definitely
> need to be documented.
>
> On the other hand, there wouldn't be any way to tell automatically if
> the requirement was violated.

ACK. That's why I said "add comment" and not "check".

> And since this function could only be
> used with some of the URBs you're interested in, does it make sense to
> add it at all?  The other URBs would still need their buffers to be
> freed manually.

The rationale is that similarly to URB_FREE_BUFFER, this would be
optional. This is why I did not propose to reuse
URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it
because I think that many drivers can benefit from it.

More than that, the real concern is that many developers forget to
free the DMA allocated memory. c.f. original message of this thread:
https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f

And the usual fix requires to create local arrays to store references
to the transfer buffer and DMA addresses.

I would like to find a solution to remove this burden from the drivers
and have an USB API to easily free the URB when, for example, killing
an anchor.


Yours sincerely,
Vincent Mailhol

Greg Kroah-Hartman June 21, 2022, 5 p.m. UTC | #7

On Wed, Jun 22, 2022 at 01:40:10AM +0900, Vincent MAILHOL wrote:
> On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote:
> > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > allocated length and urb::actual_length was what was actually being
> > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > a bit more and understand the real purpose of
> > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > which situation this can be different from the allocated length.
> > > >
> > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > that was actually sent or actually received.
> > > >
> > > > Neither of these values has to be the same as the size of the buffer --
> > > > but they better not be bigger!
> > >
> > > Thanks. Now things are a bit clearer.
> > > I guess that for the outcoming URB what I proposed made no sense. For
> > > incoming URB, I guess that most of the drivers want to set
> > > urb::transfer_buffer once for all with the allocated size and never
> > > touch it again.
> >
> > Not necessarily.  Some drivers may behave differently from the way you
> > expect.
> 
> Yes, my point is not to generalise. Agree that there are exceptions.
> 
> > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > sense to keep it but with an additional check to trigger a dmesg
> > > warning if this is used on an outcoming endpoint and with additional
> > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > the allocated size?
> >
> > Well, what really matters is that the transfer_buffer_length value has
> > to be the same as the size of the buffer.  If that's true, the direction
> > of the URB doesn't matter.  So yes, that requirement would definitely
> > need to be documented.
> >
> > On the other hand, there wouldn't be any way to tell automatically if
> > the requirement was violated.
> 
> ACK. That's why I said "add comment" and not "check".
> 
> > And since this function could only be
> > used with some of the URBs you're interested in, does it make sense to
> > add it at all?  The other URBs would still need their buffers to be
> > freed manually.
> 
> The rationale is that similarly to URB_FREE_BUFFER, this would be
> optional. This is why I did not propose to reuse
> URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it
> because I think that many drivers can benefit from it.
> 
> More than that, the real concern is that many developers forget to
> free the DMA allocated memory. c.f. original message of this thread:
> https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f
> 
> And the usual fix requires to create local arrays to store references
> to the transfer buffer and DMA addresses.

Why not just free the memory in the urb completion function that is
called when it is finished?

thanks,

greg k-h

Vincent MAILHOL June 21, 2022, 5:14 p.m. UTC | #8

On Wed. 22 Jun 2022 at 02:02, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Jun 22, 2022 at 01:40:10AM +0900, Vincent MAILHOL wrote:
> > On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote:
> > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > > allocated length and urb::actual_length was what was actually being
> > > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > > a bit more and understand the real purpose of
> > > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > > which situation this can be different from the allocated length.
> > > > >
> > > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > > that was actually sent or actually received.
> > > > >
> > > > > Neither of these values has to be the same as the size of the buffer --
> > > > > but they better not be bigger!
> > > >
> > > > Thanks. Now things are a bit clearer.
> > > > I guess that for the outcoming URB what I proposed made no sense. For
> > > > incoming URB, I guess that most of the drivers want to set
> > > > urb::transfer_buffer once for all with the allocated size and never
> > > > touch it again.
> > >
> > > Not necessarily.  Some drivers may behave differently from the way you
> > > expect.
> >
> > Yes, my point is not to generalise. Agree that there are exceptions.
> >
> > > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > > sense to keep it but with an additional check to trigger a dmesg
> > > > warning if this is used on an outcoming endpoint and with additional
> > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > > the allocated size?
> > >
> > > Well, what really matters is that the transfer_buffer_length value has
> > > to be the same as the size of the buffer.  If that's true, the direction
> > > of the URB doesn't matter.  So yes, that requirement would definitely
> > > need to be documented.
> > >
> > > On the other hand, there wouldn't be any way to tell automatically if
> > > the requirement was violated.
> >
> > ACK. That's why I said "add comment" and not "check".
> >
> > > And since this function could only be
> > > used with some of the URBs you're interested in, does it make sense to
> > > add it at all?  The other URBs would still need their buffers to be
> > > freed manually.
> >
> > The rationale is that similarly to URB_FREE_BUFFER, this would be
> > optional. This is why I did not propose to reuse
> > URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it
> > because I think that many drivers can benefit from it.
> >
> > More than that, the real concern is that many developers forget to
> > free the DMA allocated memory. c.f. original message of this thread:
> > https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f
> >
> > And the usual fix requires to create local arrays to store references
> > to the transfer buffer and DMA addresses.
>
> Why not just free the memory in the urb completion function that is
> called when it is finished?

I thought it was a bad thing to call usb_free_coherent() in the
completion function and that the best practice was to allocate all the
DMA memory when opening the device, reuse them during runtime and free
everything when closing the device.

Especially, correct me if I am wrong, but isn't there a risk to
trigger below warning if calling usb_free_coherent() in an IRQ context
(which is the case in the completion function)?
https://elixir.bootlin.com/linux/v5.18/source/kernel/dma/mapping.c#L526


Yours sincerely,
Vincent Mailhol

David Laight June 22, 2022, 10:03 a.m. UTC | #9

From: Greg Kroah-Hartman
> Sent: 22 June 2022 10:41
...
> > IIRC urb are pretty big.
> 
> What exactly do you mean by "pretty big" here?

Maybe 100-200 bytes?

> And what is wrong with
> that, I have never seen any issues with the current size of that
> structure in any benchmark or performance results.

Nothing really, the cost of allocating a sub-page structure
is pretty much independent of its size.
What I really meant is it isn't (say) 32 bytes where adding
another 4 could be a significant increase.

> All USB bottlenecks
> that I know of are either in the hardware layer, or in the protocol
> layer itself (i.e. usb-storage protocol).

There is a bufferbloat issue for usbnet on XHCI.
It would be better to fill the ring with (probably) 4k buffers
and have something sort out the USB frames from the buffers
and then get the network driver to use (IIRC) build_skb
to generate the skb from the data fragments.

I was only using 100M last time I was testing that and
didn't get performance issues.
Just problems with the USB connections 'bouncing', that
project got canned for other reasons ...

But at higher speeds and high network use it might all
be a problem.

> 
> > You'd be unlucky if adding an extra field to hold the allocated
> > size would ever need more memory.
> > So it might just be worth saving the allocated size.
> 
> Maybe, yes, then we could transition to allocating the urb and buffer at
> the same time like we do partially for iso streams in an urb.  But that
> still might be overkill for just this one driver.  I'm curious as to why
> a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for
> its buffers in the first place.

Yes, being able to do short transfers from buffer space in the urb
would save all the issues about having to allocate an extra
buffer to avoid DMA from stack.

Indeed for XHCI there is a bit that allows 8 bytes of data to
replace the pointer in the ring structure itself.
I don't remember the driver every doing that.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Greg Kroah-Hartman June 22, 2022, 12:23 p.m. UTC | #10

On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote:
> On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote:
> > > From: Vincent MAILHOL
> > > > Sent: 21 June 2022 16:56
> > > >
> > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > > allocated length and urb::actual_length was what was actually being
> > > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > > a bit more and understand the real purpose of
> > > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > > which situation this can be different from the allocated length.
> > > > >
> > > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > > that was actually sent or actually received.
> > > > >
> > > > > Neither of these values has to be the same as the size of the buffer --
> > > > > but they better not be bigger!
> > > >
> > > > Thanks. Now things are a bit clearer.
> > > > I guess that for the outcoming URB what I proposed made no sense. For
> > > > incoming URB, I guess that most of the drivers want to set
> > > > urb::transfer_buffer once for all with the allocated size and never
> > > > touch it again.
> > > >
> > > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > > sense to keep it but with an additional check to trigger a dmesg
> > > > warning if this is used on an outcoming endpoint and with additional
> > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > > the allocated size?
> > >
> > > IIRC urb are pretty big.
> >
> > What exactly do you mean by "pretty big" here?  And what is wrong with
> > that, I have never seen any issues with the current size of that
> > structure in any benchmark or performance results.  All USB bottlenecks
> > that I know of are either in the hardware layer, or in the protocol
> > layer itself (i.e. usb-storage protocol).
> >
> > > You'd be unlucky if adding an extra field to hold the allocated
> > > size would ever need more memory.
> > > So it might just be worth saving the allocated size.
> >
> > Maybe, yes, then we could transition to allocating the urb and buffer at
> > the same time like we do partially for iso streams in an urb.  But that
> > still might be overkill for just this one driver.
> 
> Well, I wouldn't have proposed the patch if it only applied to a
> single driver. If we add a urb::allocated_transfer_size as suggested
> by David, I believe that the majority of the drivers using DMA memory
> will be able to rely on that URB_FREE_COHERENT flag for the garbage
> collection.
> 
> The caveat, as you pointed before, is that the developper still needs
> to be aware of the limitations of DMA and that it should not be freed
> in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other
> functions that would lead to urb_destroy().
> 
> > I'm curious as to why
> > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for
> > its buffers in the first place.
> 
> The CAN protocol, in its latest revision, allows for transfer speed up
> to ~5Mbits. For low performance CPUs, this starts to be a significant
> load. Also, the CAN PDU being small (0 to 64 bytes), many small
> transfers occur.

And is the memcpy the actual issue here?  Even tiny cpus can do large
and small memcopy very very very fast.

> Unfortunately I did not do any benchmark myself so I won't be able to
> back my explanation with numbers.

That might be the simplest solution here :)

thanks,

greg k-h

Vincent MAILHOL June 22, 2022, 3:59 p.m. UTC | #11

On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote:
> > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote:
> > > > From: Vincent MAILHOL
> > > > > Sent: 21 June 2022 16:56
> > > > >
> > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > > > allocated length and urb::actual_length was what was actually being
> > > > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > > > a bit more and understand the real purpose of
> > > > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > > > which situation this can be different from the allocated length.
> > > > > >
> > > > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > > > that was actually sent or actually received.
> > > > > >
> > > > > > Neither of these values has to be the same as the size of the buffer --
> > > > > > but they better not be bigger!
> > > > >
> > > > > Thanks. Now things are a bit clearer.
> > > > > I guess that for the outcoming URB what I proposed made no sense. For
> > > > > incoming URB, I guess that most of the drivers want to set
> > > > > urb::transfer_buffer once for all with the allocated size and never
> > > > > touch it again.
> > > > >
> > > > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > > > sense to keep it but with an additional check to trigger a dmesg
> > > > > warning if this is used on an outcoming endpoint and with additional
> > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > > > the allocated size?
> > > >
> > > > IIRC urb are pretty big.
> > >
> > > What exactly do you mean by "pretty big" here?  And what is wrong with
> > > that, I have never seen any issues with the current size of that
> > > structure in any benchmark or performance results.  All USB bottlenecks
> > > that I know of are either in the hardware layer, or in the protocol
> > > layer itself (i.e. usb-storage protocol).
> > >
> > > > You'd be unlucky if adding an extra field to hold the allocated
> > > > size would ever need more memory.
> > > > So it might just be worth saving the allocated size.
> > >
> > > Maybe, yes, then we could transition to allocating the urb and buffer at
> > > the same time like we do partially for iso streams in an urb.  But that
> > > still might be overkill for just this one driver.
> >
> > Well, I wouldn't have proposed the patch if it only applied to a
> > single driver. If we add a urb::allocated_transfer_size as suggested
> > by David, I believe that the majority of the drivers using DMA memory
> > will be able to rely on that URB_FREE_COHERENT flag for the garbage
> > collection.
> >
> > The caveat, as you pointed before, is that the developper still needs
> > to be aware of the limitations of DMA and that it should not be freed
> > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other
> > functions that would lead to urb_destroy().
> >
> > > I'm curious as to why
> > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for
> > > its buffers in the first place.
> >
> > The CAN protocol, in its latest revision, allows for transfer speed up
> > to ~5Mbits. For low performance CPUs, this starts to be a significant
> > load. Also, the CAN PDU being small (0 to 64 bytes), many small
> > transfers occur.
>
> And is the memcpy the actual issue here?  Even tiny cpus can do large
> and small memcopy very very very fast.
>
> > Unfortunately I did not do any benchmark myself so I won't be able to
> > back my explanation with numbers.
>
> That might be the simplest solution here :)

Yes, this would give a clear answer whether or not DMA was needed in
the first place. But I do not own that gs_usb device to do the
benchmark myself (and to be honest I do not have time to dedicate for
this at the moment, maybe I will do it later on some other devices).

Has anyone from the linux-can mailing list ever done such a benchmark?
Else, is there anyone who would like to volunteer?


Yours sincerely,
Vincent Mailhol

Rhett Aultman June 22, 2022, 6:11 p.m. UTC | #12

On Thu, 23 Jun 2022, Vincent MAILHOL wrote:

> On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote:
> > > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote:
> > > > > From: Vincent MAILHOL
> > > > > > Sent: 21 June 2022 16:56
> > > > > >
> > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > > > > allocated length and urb::actual_length was what was actually being
> > > > > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > > > > a bit more and understand the real purpose of
> > > > > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > > > > which situation this can be different from the allocated length.
> > > > > > >
> > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > > > > that was actually sent or actually received.
> > > > > > >
> > > > > > > Neither of these values has to be the same as the size of the buffer --
> > > > > > > but they better not be bigger!
> > > > > >
> > > > > > Thanks. Now things are a bit clearer.
> > > > > > I guess that for the outcoming URB what I proposed made no sense. For
> > > > > > incoming URB, I guess that most of the drivers want to set
> > > > > > urb::transfer_buffer once for all with the allocated size and never
> > > > > > touch it again.
> > > > > >
> > > > > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > > > > sense to keep it but with an additional check to trigger a dmesg
> > > > > > warning if this is used on an outcoming endpoint and with additional
> > > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > > > > the allocated size?
> > > > >
> > > > > IIRC urb are pretty big.
> > > >
> > > > What exactly do you mean by "pretty big" here?  And what is wrong with
> > > > that, I have never seen any issues with the current size of that
> > > > structure in any benchmark or performance results.  All USB bottlenecks
> > > > that I know of are either in the hardware layer, or in the protocol
> > > > layer itself (i.e. usb-storage protocol).
> > > >
> > > > > You'd be unlucky if adding an extra field to hold the allocated
> > > > > size would ever need more memory.
> > > > > So it might just be worth saving the allocated size.
> > > >
> > > > Maybe, yes, then we could transition to allocating the urb and buffer at
> > > > the same time like we do partially for iso streams in an urb.  But that
> > > > still might be overkill for just this one driver.
> > >
> > > Well, I wouldn't have proposed the patch if it only applied to a
> > > single driver. If we add a urb::allocated_transfer_size as suggested
> > > by David, I believe that the majority of the drivers using DMA memory
> > > will be able to rely on that URB_FREE_COHERENT flag for the garbage
> > > collection.
> > >
> > > The caveat, as you pointed before, is that the developper still needs
> > > to be aware of the limitations of DMA and that it should not be freed
> > > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other
> > > functions that would lead to urb_destroy().
> > >
> > > > I'm curious as to why
> > > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for
> > > > its buffers in the first place.
> > >
> > > The CAN protocol, in its latest revision, allows for transfer speed up
> > > to ~5Mbits. For low performance CPUs, this starts to be a significant
> > > load. Also, the CAN PDU being small (0 to 64 bytes), many small
> > > transfers occur.
> >
> > And is the memcpy the actual issue here?  Even tiny cpus can do large
> > and small memcopy very very very fast.
> >
> > > Unfortunately I did not do any benchmark myself so I won't be able to
> > > back my explanation with numbers.
> >
> > That might be the simplest solution here :)
>
> Yes, this would give a clear answer whether or not DMA was needed in
> the first place. But I do not own that gs_usb device to do the
> benchmark myself (and to be honest I do not have time to dedicate for
> this at the moment, maybe I will do it later on some other devices).
>
> Has anyone from the linux-can mailing list ever done such a benchmark?
> Else, is there anyone who would like to volunteer?

I have access to a couple of gs_usb devices but I am afraid I have no
experience performing this sort of benchmarking and also would have to
squeeze it in as a weekend project or something similar.  That said, if
someone's willing to help step me through it, I can see if it's feasible
for me to do.

That said, the gs_usb driver is mostly following along a very well
established pattern for writing USB CAN devices.  Both the pattern
followed that created the memory leak, as well as the pattern I followed
to resolve the memory leak, were also seen in the esd2 USB CAN driver as
well, and likely others are following suit.  So, I don't know that we'd
need to keep it specific to gs_usb to gain good information here.

Best,
Rhett

Vincent MAILHOL June 26, 2022, 8:21 a.m. UTC | #13

On Thu. 23 Jun 2022 at 03:13, Rhett Aultman <rhett.aultman@samsara.com> wrote:
> On Thu, 23 Jun 2022, Vincent MAILHOL wrote:
> > On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote:
> > > > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote:
> > > > > > From: Vincent MAILHOL
> > > > > > > Sent: 21 June 2022 16:56
> > > > > > >
> > > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote:
> > > > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the
> > > > > > > > > allocated length and urb::actual_length was what was actually being
> > > > > > > > > transferred. Right now, I am just confused. Seems that I need to study
> > > > > > > > > a bit more and understand the real purpose of
> > > > > > > > > urb::transfer_buffer_length because I still fail to understand in
> > > > > > > > > which situation this can be different from the allocated length.
> > > > > > > >
> > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants
> > > > > > > > to send or expects to receive.  urb->actual_length is the amount of data
> > > > > > > > that was actually sent or actually received.
> > > > > > > >
> > > > > > > > Neither of these values has to be the same as the size of the buffer --
> > > > > > > > but they better not be bigger!
> > > > > > >
> > > > > > > Thanks. Now things are a bit clearer.
> > > > > > > I guess that for the outcoming URB what I proposed made no sense. For
> > > > > > > incoming URB, I guess that most of the drivers want to set
> > > > > > > urb::transfer_buffer once for all with the allocated size and never
> > > > > > > touch it again.
> > > > > > >
> > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make
> > > > > > > sense to keep it but with an additional check to trigger a dmesg
> > > > > > > warning if this is used on an outcoming endpoint and with additional
> > > > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be
> > > > > > > the allocated size?
> > > > > >
> > > > > > IIRC urb are pretty big.
> > > > >
> > > > > What exactly do you mean by "pretty big" here?  And what is wrong with
> > > > > that, I have never seen any issues with the current size of that
> > > > > structure in any benchmark or performance results.  All USB bottlenecks
> > > > > that I know of are either in the hardware layer, or in the protocol
> > > > > layer itself (i.e. usb-storage protocol).
> > > > >
> > > > > > You'd be unlucky if adding an extra field to hold the allocated
> > > > > > size would ever need more memory.
> > > > > > So it might just be worth saving the allocated size.
> > > > >
> > > > > Maybe, yes, then we could transition to allocating the urb and buffer at
> > > > > the same time like we do partially for iso streams in an urb.  But that
> > > > > still might be overkill for just this one driver.
> > > >
> > > > Well, I wouldn't have proposed the patch if it only applied to a
> > > > single driver. If we add a urb::allocated_transfer_size as suggested
> > > > by David, I believe that the majority of the drivers using DMA memory
> > > > will be able to rely on that URB_FREE_COHERENT flag for the garbage
> > > > collection.
> > > >
> > > > The caveat, as you pointed before, is that the developper still needs
> > > > to be aware of the limitations of DMA and that it should not be freed
> > > > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other
> > > > functions that would lead to urb_destroy().
> > > >
> > > > > I'm curious as to why
> > > > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for
> > > > > its buffers in the first place.
> > > >
> > > > The CAN protocol, in its latest revision, allows for transfer speed up
> > > > to ~5Mbits. For low performance CPUs, this starts to be a significant
> > > > load. Also, the CAN PDU being small (0 to 64 bytes), many small
> > > > transfers occur.
> > >
> > > And is the memcpy the actual issue here?  Even tiny cpus can do large
> > > and small memcopy very very very fast.
> > >
> > > > Unfortunately I did not do any benchmark myself so I won't be able to
> > > > back my explanation with numbers.
> > >
> > > That might be the simplest solution here :)
> >
> > Yes, this would give a clear answer whether or not DMA was needed in
> > the first place. But I do not own that gs_usb device to do the
> > benchmark myself (and to be honest I do not have time to dedicate for
> > this at the moment, maybe I will do it later on some other devices).
> >
> > Has anyone from the linux-can mailing list ever done such a benchmark?
> > Else, is there anyone who would like to volunteer?
>
> I have access to a couple of gs_usb devices but I am afraid I have no
> experience performing this sort of benchmarking and also would have to
> squeeze it in as a weekend project or something similar.  That said, if
> someone's willing to help step me through it, I can see if it's feasible
> for me to do.

I can throw a few hints which might be helpful.

First, you should obviously prepare two versions of the gs_usb driver:
one using usb_alloc_coherent() (the current one), the other using
kmalloc() and compare the two.

Right now, I can think of two relevant benchmarks: transmission
latency and CPU load.

For the transmission latency, I posted one on my tools:
https://lore.kernel.org/linux-can/20220626075317.746535-1-mailhol.vincent@wanadoo.fr/T/#u

For the CPU load, I suggest to put the bus on full load, for example using:
| cangen -g0 -p1 can0
(you might also want to play with other parameters such as the length using -L)
Then use an existing tool to get the CPU load figures. I don't know
for sure which tool is a good one to benchmark CPU usage in kernel
land so you will have to research that part. If anyone has a
suggestion…

> That said, the gs_usb driver is mostly following along a very well
> established pattern for writing USB CAN devices.  Both the pattern
> followed that created the memory leak, as well as the pattern I followed
> to resolve the memory leak, were also seen in the esd2 USB CAN driver as
> well, and likely others are following suit.  So, I don't know that we'd
> need to keep it specific to gs_usb to gain good information here.

Yes, I looked at the log, the very first CAN USB driver is ems_usb and
was using DMA memory from the beginning. From that point on, nearly
all the drivers copied the trend (the only exception I am aware of is
peak_usb).

I agree that the scope is wider than the gs_can (thus my proposal to
fix it at API level).


Yours sincerely,
Vincent Mailhol

[v3,0/2] URB_FREE_COHERENT gs_usb memory leak fix

Message

Comments