Message ID | 20220610213335.3077375-1-rhett.aultman@samsara.com |
---|---|
Headers | show |
Series | URB_FREE_COHERENT gs_usb memory leak fix | expand |
On 10.06.2022 17:33:35, Rhett Aultman wrote: > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > When allocating URB memory with kmalloc(), drivers can simply set the > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory > will be freed in the background when killing the URB (for example with > usb_kill_anchored_urbs()). > > However, there are no equivalent mechanism when allocating DMA memory > (with usb_alloc_coherent()). > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will > cause the kernel to automatically call usb_free_coherent() on the > transfer buffer when the URB is killed, similarly to how > URB_FREE_BUFFER triggers a call to kfree(). > > In order to have all the flags in numerical order, URB_DIR_IN is > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse > value 0x0200. > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com> > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com> > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> FWIW: Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> This patch probably goes upstream via USB. Once this is in net I'll take the 2nd patch. regards, Marc
On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > On 10.06.2022 17:33:35, Rhett Aultman wrote: > > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > When allocating URB memory with kmalloc(), drivers can simply set the > > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory > > will be freed in the background when killing the URB (for example with > > usb_kill_anchored_urbs()). > > > > However, there are no equivalent mechanism when allocating DMA memory > > (with usb_alloc_coherent()). > > > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will > > cause the kernel to automatically call usb_free_coherent() on the > > transfer buffer when the URB is killed, similarly to how > > URB_FREE_BUFFER triggers a call to kfree(). > > > > In order to have all the flags in numerical order, URB_DIR_IN is > > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse > > value 0x0200. > > > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com> > > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com> > > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > FWIW: > Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> > > This patch probably goes upstream via USB. Once this is in net I'll take > the 2nd patch. Question to Greg: can this first patch also be applied to the stable branches? Technically, this is a new feature but it will be used to solve several memory leaks on existing drivers (the gs_usb is only one example). Yours sincerely, Vincent Mailhol
On Tue. 21 Jun 2022 at 22:56, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Sun, Jun 12, 2022 at 01:06:37AM +0900, Vincent MAILHOL wrote: > > On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > > > On 10.06.2022 17:33:35, Rhett Aultman wrote: > > > > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > > > When allocating URB memory with kmalloc(), drivers can simply set the > > > > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory > > > > will be freed in the background when killing the URB (for example with > > > > usb_kill_anchored_urbs()). > > > > > > > > However, there are no equivalent mechanism when allocating DMA memory > > > > (with usb_alloc_coherent()). > > > > > > > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will > > > > cause the kernel to automatically call usb_free_coherent() on the > > > > transfer buffer when the URB is killed, similarly to how > > > > URB_FREE_BUFFER triggers a call to kfree(). > > > > > > > > In order to have all the flags in numerical order, URB_DIR_IN is > > > > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse > > > > value 0x0200. > > > > > > > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com> > > > > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com> > > > > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > FWIW: > > > Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> > > > > > > This patch probably goes upstream via USB. Once this is in net I'll take > > > the 2nd patch. > > > > Question to Greg: can this first patch also be applied to the stable > > branches? Technically, this is a new feature but it will be used to > > solve several memory leaks on existing drivers (the gs_usb is only one > > example). > > We take in dependent patches into the stable trees all the time when > needed, that's not an issue here. > > What is an issue here is that this feels odd as other USB developers > said previously. > > My big objection here is what validates that the size of the transfer > buffer here is really the size of the buffer to be freed? Is that > always set properly to be the length that was allocated? That might > just be the size of the last transfer using this buffer, but there is no > guarantee that I can see of that says this really is the length of the > allocated buffer, which is why usb_free_coherent() requires a size > parameter. Thanks for the comment. I (probably wrongly) assumed that urb::transfer_buffer_length was the allocated length and urb::actual_length was what was actually being transferred. Right now, I am just confused. Seems that I need to study a bit more and understand the real purpose of urb::transfer_buffer_length because I still fail to understand in which situation this can be different from the allocated length. > If that guarantee is always right, then we should be able to drop the > size option in usb_free_coherent(), and I don't think that's really > possible. I do not follow you on this comment. usb_free_coherent() does not have a reference to the URB, right? How would it be supposed to retrieve urb::transfer_buffer_length? Yours sincerely, Vincent Mailhol
On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > I (probably wrongly) assumed that urb::transfer_buffer_length was the > allocated length and urb::actual_length was what was actually being > transferred. Right now, I am just confused. Seems that I need to study > a bit more and understand the real purpose of > urb::transfer_buffer_length because I still fail to understand in > which situation this can be different from the allocated length. urb->transfer_buffer_length is the amount of data that the driver wants to send or expects to receive. urb->actual_length is the amount of data that was actually sent or actually received. Neither of these values has to be the same as the size of the buffer -- but they better not be bigger! Alan Stern
On Wed. 22 Jun 2022 at 00:11, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > On Tue. 21 Jun 2022 at 22:56, Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > On Sun, Jun 12, 2022 at 01:06:37AM +0900, Vincent MAILHOL wrote: > > > > On Sun. 12 juin 2022 at 00:31, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > > > > > On 10.06.2022 17:33:35, Rhett Aultman wrote: > > > > > > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > > > > > > > When allocating URB memory with kmalloc(), drivers can simply set the > > > > > > URB_FREE_BUFFER flag in urb::transfer_flags and that way, the memory > > > > > > will be freed in the background when killing the URB (for example with > > > > > > usb_kill_anchored_urbs()). > > > > > > > > > > > > However, there are no equivalent mechanism when allocating DMA memory > > > > > > (with usb_alloc_coherent()). > > > > > > > > > > > > This patch adds a new flag: URB_FREE_COHERENT. Setting this flag will > > > > > > cause the kernel to automatically call usb_free_coherent() on the > > > > > > transfer buffer when the URB is killed, similarly to how > > > > > > URB_FREE_BUFFER triggers a call to kfree(). > > > > > > > > > > > > In order to have all the flags in numerical order, URB_DIR_IN is > > > > > > renumbered from 0x0200 to 0x0400 so that URB_FREE_COHERENT can reuse > > > > > > value 0x0200. > > > > > > > > > > > > Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com> > > > > > > Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com> > > > > > > Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > > > > > > > > > FWIW: > > > > > Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> > > > > > > > > > > This patch probably goes upstream via USB. Once this is in net I'll take > > > > > the 2nd patch. > > > > > > > > Question to Greg: can this first patch also be applied to the stable > > > > branches? Technically, this is a new feature but it will be used to > > > > solve several memory leaks on existing drivers (the gs_usb is only one > > > > example). > > > > > > We take in dependent patches into the stable trees all the time when > > > needed, that's not an issue here. > > > > > > What is an issue here is that this feels odd as other USB developers > > > said previously. > > > > > > My big objection here is what validates that the size of the transfer > > > buffer here is really the size of the buffer to be freed? Is that > > > always set properly to be the length that was allocated? That might > > > just be the size of the last transfer using this buffer, but there is no > > > guarantee that I can see of that says this really is the length of the > > > allocated buffer, which is why usb_free_coherent() requires a size > > > parameter. > > > > Thanks for the comment. > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > allocated length and urb::actual_length was what was actually being > > transferred. Right now, I am just confused. Seems that I need to study > > a bit more and understand the real purpose of > > urb::transfer_buffer_length because I still fail to understand in > > which situation this can be different from the allocated length. > > > > > If that guarantee is always right, then we should be able to drop the > > > size option in usb_free_coherent(), and I don't think that's really > > > possible. > > > > I do not follow you on this comment. usb_free_coherent() does not have > > a reference to the URB, right? How would it be supposed to retrieve > > urb::transfer_buffer_length? > > Ah, good point. Along those lines, how do you know what the `dma` > parameter should be set to as well? Here, just assuming that usb_alloc_coherent() was given urb::transfer_dma as an argument. Maybe I missed some edge cases but I never saw drivers doing it differently.
On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote: > On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote: > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > allocated length and urb::actual_length was what was actually being > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > a bit more and understand the real purpose of > > > > urb::transfer_buffer_length because I still fail to understand in > > > > which situation this can be different from the allocated length. > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > to send or expects to receive. urb->actual_length is the amount of data > > > that was actually sent or actually received. > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > but they better not be bigger! > > > > Thanks. Now things are a bit clearer. > > I guess that for the outcoming URB what I proposed made no sense. For > > incoming URB, I guess that most of the drivers want to set > > urb::transfer_buffer once for all with the allocated size and never > > touch it again. > > Not necessarily. Some drivers may behave differently from the way you > expect. Yes, my point is not to generalise. Agree that there are exceptions. > > Maybe the patch only makes sense of the incoming URB. Would it make > > sense to keep it but with an additional check to trigger a dmesg > > warning if this is used on an outcoming endpoint and with additional > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > the allocated size? > > Well, what really matters is that the transfer_buffer_length value has > to be the same as the size of the buffer. If that's true, the direction > of the URB doesn't matter. So yes, that requirement would definitely > need to be documented. > > On the other hand, there wouldn't be any way to tell automatically if > the requirement was violated. ACK. That's why I said "add comment" and not "check". > And since this function could only be > used with some of the URBs you're interested in, does it make sense to > add it at all? The other URBs would still need their buffers to be > freed manually. The rationale is that similarly to URB_FREE_BUFFER, this would be optional. This is why I did not propose to reuse URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it because I think that many drivers can benefit from it. More than that, the real concern is that many developers forget to free the DMA allocated memory. c.f. original message of this thread: https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f And the usual fix requires to create local arrays to store references to the transfer buffer and DMA addresses. I would like to find a solution to remove this burden from the drivers and have an USB API to easily free the URB when, for example, killing an anchor. Yours sincerely, Vincent Mailhol
On Wed, Jun 22, 2022 at 01:40:10AM +0900, Vincent MAILHOL wrote: > On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote: > > On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote: > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > allocated length and urb::actual_length was what was actually being > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > a bit more and understand the real purpose of > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > which situation this can be different from the allocated length. > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > that was actually sent or actually received. > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > but they better not be bigger! > > > > > > Thanks. Now things are a bit clearer. > > > I guess that for the outcoming URB what I proposed made no sense. For > > > incoming URB, I guess that most of the drivers want to set > > > urb::transfer_buffer once for all with the allocated size and never > > > touch it again. > > > > Not necessarily. Some drivers may behave differently from the way you > > expect. > > Yes, my point is not to generalise. Agree that there are exceptions. > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > sense to keep it but with an additional check to trigger a dmesg > > > warning if this is used on an outcoming endpoint and with additional > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > the allocated size? > > > > Well, what really matters is that the transfer_buffer_length value has > > to be the same as the size of the buffer. If that's true, the direction > > of the URB doesn't matter. So yes, that requirement would definitely > > need to be documented. > > > > On the other hand, there wouldn't be any way to tell automatically if > > the requirement was violated. > > ACK. That's why I said "add comment" and not "check". > > > And since this function could only be > > used with some of the URBs you're interested in, does it make sense to > > add it at all? The other URBs would still need their buffers to be > > freed manually. > > The rationale is that similarly to URB_FREE_BUFFER, this would be > optional. This is why I did not propose to reuse > URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it > because I think that many drivers can benefit from it. > > More than that, the real concern is that many developers forget to > free the DMA allocated memory. c.f. original message of this thread: > https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f > > And the usual fix requires to create local arrays to store references > to the transfer buffer and DMA addresses. Why not just free the memory in the urb completion function that is called when it is finished? thanks, greg k-h
On Wed. 22 Jun 2022 at 02:02, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Jun 22, 2022 at 01:40:10AM +0900, Vincent MAILHOL wrote: > > On Wed 22 Jun 2022 at 01:15, Alan Stern <stern@rowland.harvard.edu> wrote: > > > On Wed, Jun 22, 2022 at 12:55:46AM +0900, Vincent MAILHOL wrote: > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > > allocated length and urb::actual_length was what was actually being > > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > > a bit more and understand the real purpose of > > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > > which situation this can be different from the allocated length. > > > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > > that was actually sent or actually received. > > > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > > but they better not be bigger! > > > > > > > > Thanks. Now things are a bit clearer. > > > > I guess that for the outcoming URB what I proposed made no sense. For > > > > incoming URB, I guess that most of the drivers want to set > > > > urb::transfer_buffer once for all with the allocated size and never > > > > touch it again. > > > > > > Not necessarily. Some drivers may behave differently from the way you > > > expect. > > > > Yes, my point is not to generalise. Agree that there are exceptions. > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > > sense to keep it but with an additional check to trigger a dmesg > > > > warning if this is used on an outcoming endpoint and with additional > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > > the allocated size? > > > > > > Well, what really matters is that the transfer_buffer_length value has > > > to be the same as the size of the buffer. If that's true, the direction > > > of the URB doesn't matter. So yes, that requirement would definitely > > > need to be documented. > > > > > > On the other hand, there wouldn't be any way to tell automatically if > > > the requirement was violated. > > > > ACK. That's why I said "add comment" and not "check". > > > > > And since this function could only be > > > used with some of the URBs you're interested in, does it make sense to > > > add it at all? The other URBs would still need their buffers to be > > > freed manually. > > > > The rationale is that similarly to URB_FREE_BUFFER, this would be > > optional. This is why I did not propose to reuse > > URB_NO_TRANSFER_DMA_MAP but instead add a new flag. I propose it > > because I think that many drivers can benefit from it. > > > > More than that, the real concern is that many developers forget to > > free the DMA allocated memory. c.f. original message of this thread: > > https://lore.kernel.org/linux-can/alpine.DEB.2.22.394.2206031547001.1630869@thelappy/T/#m2ef343d3ee708178b1e37be898884bafa7f49f2f > > > > And the usual fix requires to create local arrays to store references > > to the transfer buffer and DMA addresses. > > Why not just free the memory in the urb completion function that is > called when it is finished? I thought it was a bad thing to call usb_free_coherent() in the completion function and that the best practice was to allocate all the DMA memory when opening the device, reuse them during runtime and free everything when closing the device. Especially, correct me if I am wrong, but isn't there a risk to trigger below warning if calling usb_free_coherent() in an IRQ context (which is the case in the completion function)? https://elixir.bootlin.com/linux/v5.18/source/kernel/dma/mapping.c#L526 Yours sincerely, Vincent Mailhol
From: Greg Kroah-Hartman > Sent: 22 June 2022 10:41 ... > > IIRC urb are pretty big. > > What exactly do you mean by "pretty big" here? Maybe 100-200 bytes? > And what is wrong with > that, I have never seen any issues with the current size of that > structure in any benchmark or performance results. Nothing really, the cost of allocating a sub-page structure is pretty much independent of its size. What I really meant is it isn't (say) 32 bytes where adding another 4 could be a significant increase. > All USB bottlenecks > that I know of are either in the hardware layer, or in the protocol > layer itself (i.e. usb-storage protocol). There is a bufferbloat issue for usbnet on XHCI. It would be better to fill the ring with (probably) 4k buffers and have something sort out the USB frames from the buffers and then get the network driver to use (IIRC) build_skb to generate the skb from the data fragments. I was only using 100M last time I was testing that and didn't get performance issues. Just problems with the USB connections 'bouncing', that project got canned for other reasons ... But at higher speeds and high network use it might all be a problem. > > > You'd be unlucky if adding an extra field to hold the allocated > > size would ever need more memory. > > So it might just be worth saving the allocated size. > > Maybe, yes, then we could transition to allocating the urb and buffer at > the same time like we do partially for iso streams in an urb. But that > still might be overkill for just this one driver. I'm curious as to why > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for > its buffers in the first place. Yes, being able to do short transfers from buffer space in the urb would save all the issues about having to allocate an extra buffer to avoid DMA from stack. Indeed for XHCI there is a bit that allows 8 bytes of data to replace the pointer in the ring structure itself. I don't remember the driver every doing that. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote: > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote: > > > From: Vincent MAILHOL > > > > Sent: 21 June 2022 16:56 > > > > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > > allocated length and urb::actual_length was what was actually being > > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > > a bit more and understand the real purpose of > > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > > which situation this can be different from the allocated length. > > > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > > that was actually sent or actually received. > > > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > > but they better not be bigger! > > > > > > > > Thanks. Now things are a bit clearer. > > > > I guess that for the outcoming URB what I proposed made no sense. For > > > > incoming URB, I guess that most of the drivers want to set > > > > urb::transfer_buffer once for all with the allocated size and never > > > > touch it again. > > > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > > sense to keep it but with an additional check to trigger a dmesg > > > > warning if this is used on an outcoming endpoint and with additional > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > > the allocated size? > > > > > > IIRC urb are pretty big. > > > > What exactly do you mean by "pretty big" here? And what is wrong with > > that, I have never seen any issues with the current size of that > > structure in any benchmark or performance results. All USB bottlenecks > > that I know of are either in the hardware layer, or in the protocol > > layer itself (i.e. usb-storage protocol). > > > > > You'd be unlucky if adding an extra field to hold the allocated > > > size would ever need more memory. > > > So it might just be worth saving the allocated size. > > > > Maybe, yes, then we could transition to allocating the urb and buffer at > > the same time like we do partially for iso streams in an urb. But that > > still might be overkill for just this one driver. > > Well, I wouldn't have proposed the patch if it only applied to a > single driver. If we add a urb::allocated_transfer_size as suggested > by David, I believe that the majority of the drivers using DMA memory > will be able to rely on that URB_FREE_COHERENT flag for the garbage > collection. > > The caveat, as you pointed before, is that the developper still needs > to be aware of the limitations of DMA and that it should not be freed > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other > functions that would lead to urb_destroy(). > > > I'm curious as to why > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for > > its buffers in the first place. > > The CAN protocol, in its latest revision, allows for transfer speed up > to ~5Mbits. For low performance CPUs, this starts to be a significant > load. Also, the CAN PDU being small (0 to 64 bytes), many small > transfers occur. And is the memcpy the actual issue here? Even tiny cpus can do large and small memcopy very very very fast. > Unfortunately I did not do any benchmark myself so I won't be able to > back my explanation with numbers. That might be the simplest solution here :) thanks, greg k-h
On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote: > > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote: > > > > From: Vincent MAILHOL > > > > > Sent: 21 June 2022 16:56 > > > > > > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > > > allocated length and urb::actual_length was what was actually being > > > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > > > a bit more and understand the real purpose of > > > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > > > which situation this can be different from the allocated length. > > > > > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > > > that was actually sent or actually received. > > > > > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > > > but they better not be bigger! > > > > > > > > > > Thanks. Now things are a bit clearer. > > > > > I guess that for the outcoming URB what I proposed made no sense. For > > > > > incoming URB, I guess that most of the drivers want to set > > > > > urb::transfer_buffer once for all with the allocated size and never > > > > > touch it again. > > > > > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > > > sense to keep it but with an additional check to trigger a dmesg > > > > > warning if this is used on an outcoming endpoint and with additional > > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > > > the allocated size? > > > > > > > > IIRC urb are pretty big. > > > > > > What exactly do you mean by "pretty big" here? And what is wrong with > > > that, I have never seen any issues with the current size of that > > > structure in any benchmark or performance results. All USB bottlenecks > > > that I know of are either in the hardware layer, or in the protocol > > > layer itself (i.e. usb-storage protocol). > > > > > > > You'd be unlucky if adding an extra field to hold the allocated > > > > size would ever need more memory. > > > > So it might just be worth saving the allocated size. > > > > > > Maybe, yes, then we could transition to allocating the urb and buffer at > > > the same time like we do partially for iso streams in an urb. But that > > > still might be overkill for just this one driver. > > > > Well, I wouldn't have proposed the patch if it only applied to a > > single driver. If we add a urb::allocated_transfer_size as suggested > > by David, I believe that the majority of the drivers using DMA memory > > will be able to rely on that URB_FREE_COHERENT flag for the garbage > > collection. > > > > The caveat, as you pointed before, is that the developper still needs > > to be aware of the limitations of DMA and that it should not be freed > > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other > > functions that would lead to urb_destroy(). > > > > > I'm curious as to why > > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for > > > its buffers in the first place. > > > > The CAN protocol, in its latest revision, allows for transfer speed up > > to ~5Mbits. For low performance CPUs, this starts to be a significant > > load. Also, the CAN PDU being small (0 to 64 bytes), many small > > transfers occur. > > And is the memcpy the actual issue here? Even tiny cpus can do large > and small memcopy very very very fast. > > > Unfortunately I did not do any benchmark myself so I won't be able to > > back my explanation with numbers. > > That might be the simplest solution here :) Yes, this would give a clear answer whether or not DMA was needed in the first place. But I do not own that gs_usb device to do the benchmark myself (and to be honest I do not have time to dedicate for this at the moment, maybe I will do it later on some other devices). Has anyone from the linux-can mailing list ever done such a benchmark? Else, is there anyone who would like to volunteer? Yours sincerely, Vincent Mailhol
On Thu, 23 Jun 2022, Vincent MAILHOL wrote: > On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote: > > > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote: > > > > > From: Vincent MAILHOL > > > > > > Sent: 21 June 2022 16:56 > > > > > > > > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > > > > allocated length and urb::actual_length was what was actually being > > > > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > > > > a bit more and understand the real purpose of > > > > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > > > > which situation this can be different from the allocated length. > > > > > > > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > > > > that was actually sent or actually received. > > > > > > > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > > > > but they better not be bigger! > > > > > > > > > > > > Thanks. Now things are a bit clearer. > > > > > > I guess that for the outcoming URB what I proposed made no sense. For > > > > > > incoming URB, I guess that most of the drivers want to set > > > > > > urb::transfer_buffer once for all with the allocated size and never > > > > > > touch it again. > > > > > > > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > > > > sense to keep it but with an additional check to trigger a dmesg > > > > > > warning if this is used on an outcoming endpoint and with additional > > > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > > > > the allocated size? > > > > > > > > > > IIRC urb are pretty big. > > > > > > > > What exactly do you mean by "pretty big" here? And what is wrong with > > > > that, I have never seen any issues with the current size of that > > > > structure in any benchmark or performance results. All USB bottlenecks > > > > that I know of are either in the hardware layer, or in the protocol > > > > layer itself (i.e. usb-storage protocol). > > > > > > > > > You'd be unlucky if adding an extra field to hold the allocated > > > > > size would ever need more memory. > > > > > So it might just be worth saving the allocated size. > > > > > > > > Maybe, yes, then we could transition to allocating the urb and buffer at > > > > the same time like we do partially for iso streams in an urb. But that > > > > still might be overkill for just this one driver. > > > > > > Well, I wouldn't have proposed the patch if it only applied to a > > > single driver. If we add a urb::allocated_transfer_size as suggested > > > by David, I believe that the majority of the drivers using DMA memory > > > will be able to rely on that URB_FREE_COHERENT flag for the garbage > > > collection. > > > > > > The caveat, as you pointed before, is that the developper still needs > > > to be aware of the limitations of DMA and that it should not be freed > > > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other > > > functions that would lead to urb_destroy(). > > > > > > > I'm curious as to why > > > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for > > > > its buffers in the first place. > > > > > > The CAN protocol, in its latest revision, allows for transfer speed up > > > to ~5Mbits. For low performance CPUs, this starts to be a significant > > > load. Also, the CAN PDU being small (0 to 64 bytes), many small > > > transfers occur. > > > > And is the memcpy the actual issue here? Even tiny cpus can do large > > and small memcopy very very very fast. > > > > > Unfortunately I did not do any benchmark myself so I won't be able to > > > back my explanation with numbers. > > > > That might be the simplest solution here :) > > Yes, this would give a clear answer whether or not DMA was needed in > the first place. But I do not own that gs_usb device to do the > benchmark myself (and to be honest I do not have time to dedicate for > this at the moment, maybe I will do it later on some other devices). > > Has anyone from the linux-can mailing list ever done such a benchmark? > Else, is there anyone who would like to volunteer? I have access to a couple of gs_usb devices but I am afraid I have no experience performing this sort of benchmarking and also would have to squeeze it in as a weekend project or something similar. That said, if someone's willing to help step me through it, I can see if it's feasible for me to do. That said, the gs_usb driver is mostly following along a very well established pattern for writing USB CAN devices. Both the pattern followed that created the memory leak, as well as the pattern I followed to resolve the memory leak, were also seen in the esd2 USB CAN driver as well, and likely others are following suit. So, I don't know that we'd need to keep it specific to gs_usb to gain good information here. Best, Rhett
On Thu. 23 Jun 2022 at 03:13, Rhett Aultman <rhett.aultman@samsara.com> wrote: > On Thu, 23 Jun 2022, Vincent MAILHOL wrote: > > On Wed. 22 Jun 2022 at 21:24, Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > On Wed, Jun 22, 2022 at 07:34:57PM +0900, Vincent MAILHOL wrote: > > > > On Wed. 22 Jun 2022 at 18:44, Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > On Wed, Jun 22, 2022 at 09:22:12AM +0000, David Laight wrote: > > > > > > From: Vincent MAILHOL > > > > > > > Sent: 21 June 2022 16:56 > > > > > > > > > > > > > > On Wed. 22 Jun 2022 at 00:13, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > > > > On Tue, Jun 21, 2022 at 11:59:16PM +0900, Vincent MAILHOL wrote: > > > > > > > > > I (probably wrongly) assumed that urb::transfer_buffer_length was the > > > > > > > > > allocated length and urb::actual_length was what was actually being > > > > > > > > > transferred. Right now, I am just confused. Seems that I need to study > > > > > > > > > a bit more and understand the real purpose of > > > > > > > > > urb::transfer_buffer_length because I still fail to understand in > > > > > > > > > which situation this can be different from the allocated length. > > > > > > > > > > > > > > > > urb->transfer_buffer_length is the amount of data that the driver wants > > > > > > > > to send or expects to receive. urb->actual_length is the amount of data > > > > > > > > that was actually sent or actually received. > > > > > > > > > > > > > > > > Neither of these values has to be the same as the size of the buffer -- > > > > > > > > but they better not be bigger! > > > > > > > > > > > > > > Thanks. Now things are a bit clearer. > > > > > > > I guess that for the outcoming URB what I proposed made no sense. For > > > > > > > incoming URB, I guess that most of the drivers want to set > > > > > > > urb::transfer_buffer once for all with the allocated size and never > > > > > > > touch it again. > > > > > > > > > > > > > > Maybe the patch only makes sense of the incoming URB. Would it make > > > > > > > sense to keep it but with an additional check to trigger a dmesg > > > > > > > warning if this is used on an outcoming endpoint and with additional > > > > > > > comment that the URB_FREE_COHERENT requires urb::transfer_buffer to be > > > > > > > the allocated size? > > > > > > > > > > > > IIRC urb are pretty big. > > > > > > > > > > What exactly do you mean by "pretty big" here? And what is wrong with > > > > > that, I have never seen any issues with the current size of that > > > > > structure in any benchmark or performance results. All USB bottlenecks > > > > > that I know of are either in the hardware layer, or in the protocol > > > > > layer itself (i.e. usb-storage protocol). > > > > > > > > > > > You'd be unlucky if adding an extra field to hold the allocated > > > > > > size would ever need more memory. > > > > > > So it might just be worth saving the allocated size. > > > > > > > > > > Maybe, yes, then we could transition to allocating the urb and buffer at > > > > > the same time like we do partially for iso streams in an urb. But that > > > > > still might be overkill for just this one driver. > > > > > > > > Well, I wouldn't have proposed the patch if it only applied to a > > > > single driver. If we add a urb::allocated_transfer_size as suggested > > > > by David, I believe that the majority of the drivers using DMA memory > > > > will be able to rely on that URB_FREE_COHERENT flag for the garbage > > > > collection. > > > > > > > > The caveat, as you pointed before, is that the developper still needs > > > > to be aware of the limitations of DMA and that it should not be freed > > > > in an IRQ context. e.g. no call to usb_kill_anchored_urbs() or other > > > > functions that would lead to urb_destroy(). > > > > > > > > > I'm curious as to why > > > > > a slow and tiny protocol like CAN needs to use usb_alloc_coherent() for > > > > > its buffers in the first place. > > > > > > > > The CAN protocol, in its latest revision, allows for transfer speed up > > > > to ~5Mbits. For low performance CPUs, this starts to be a significant > > > > load. Also, the CAN PDU being small (0 to 64 bytes), many small > > > > transfers occur. > > > > > > And is the memcpy the actual issue here? Even tiny cpus can do large > > > and small memcopy very very very fast. > > > > > > > Unfortunately I did not do any benchmark myself so I won't be able to > > > > back my explanation with numbers. > > > > > > That might be the simplest solution here :) > > > > Yes, this would give a clear answer whether or not DMA was needed in > > the first place. But I do not own that gs_usb device to do the > > benchmark myself (and to be honest I do not have time to dedicate for > > this at the moment, maybe I will do it later on some other devices). > > > > Has anyone from the linux-can mailing list ever done such a benchmark? > > Else, is there anyone who would like to volunteer? > > I have access to a couple of gs_usb devices but I am afraid I have no > experience performing this sort of benchmarking and also would have to > squeeze it in as a weekend project or something similar. That said, if > someone's willing to help step me through it, I can see if it's feasible > for me to do. I can throw a few hints which might be helpful. First, you should obviously prepare two versions of the gs_usb driver: one using usb_alloc_coherent() (the current one), the other using kmalloc() and compare the two. Right now, I can think of two relevant benchmarks: transmission latency and CPU load. For the transmission latency, I posted one on my tools: https://lore.kernel.org/linux-can/20220626075317.746535-1-mailhol.vincent@wanadoo.fr/T/#u For the CPU load, I suggest to put the bus on full load, for example using: | cangen -g0 -p1 can0 (you might also want to play with other parameters such as the length using -L) Then use an existing tool to get the CPU load figures. I don't know for sure which tool is a good one to benchmark CPU usage in kernel land so you will have to research that part. If anyone has a suggestion… > That said, the gs_usb driver is mostly following along a very well > established pattern for writing USB CAN devices. Both the pattern > followed that created the memory leak, as well as the pattern I followed > to resolve the memory leak, were also seen in the esd2 USB CAN driver as > well, and likely others are following suit. So, I don't know that we'd > need to keep it specific to gs_usb to gain good information here. Yes, I looked at the log, the very first CAN USB driver is ems_usb and was using DMA memory from the beginning. From that point on, nearly all the drivers copied the trend (the only exception I am aware of is peak_usb). I agree that the scope is wider than the gs_can (thus my proposal to fix it at API level). Yours sincerely, Vincent Mailhol