
[2/7] RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherent

Message ID 20221113163535.884299-3-hch@lst.de
State Accepted
Commit 82c310c33ace7d25c0475e49a6051727c48a8cc6
Series [1/7] media: videobuf-dma-contig: use dma_mmap_coherent

Commit Message

Christoph Hellwig Nov. 13, 2022, 4:35 p.m. UTC
dma_alloc_coherent is an opaque allocator that only uses the GFP_ flags
for allocation context control.  Don't pass GFP_USER, which doesn't make
sense for a kernel DMA allocation, or __GFP_COMP, which makes no sense
for an allocation that can't in any way be converted to a page pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/hw/hfi1/init.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)
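The commit message's point about "allocation context control" can be pictured with a small userspace model of what a coherent DMA allocator does with the gfp_t it receives: the DMA layer picks the memory zone itself, so zone modifiers are ignored, and a compound-page request is meaningless because the result is never handed back as a page pointer. This is a sketch under stated assumptions, not the kernel's code: the M_GFP_* names, their bit values and model_dma_gfp_filter() are hypothetical. In more recent kernels, dma_alloc_attrs() in kernel/dma/mapping.c appears to behave along these lines (warning on and refusing __GFP_COMP, and masking __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM), but check the source for the kernel version you care about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for a userspace model; real gfp_t bits differ. */
#define M_GFP_DMA      (1u << 0)
#define M_GFP_HIGHMEM  (1u << 1)
#define M_GFP_DMA32    (1u << 2)
#define M_GFP_IO       (1u << 3)
#define M_GFP_FS       (1u << 4)
#define M_GFP_RECLAIM  (1u << 5)
#define M_GFP_COMP     (1u << 6)

/* Mirrors the composition GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS. */
#define M_GFP_KERNEL   (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
/* Zone modifiers: the DMA layer, not the caller, chooses the zone. */
#define M_ZONE_MASK    (M_GFP_DMA | M_GFP_DMA32 | M_GFP_HIGHMEM)

static_assert((M_GFP_KERNEL & M_ZONE_MASK) == 0,
	      "context flags and zone modifiers must not overlap");

/*
 * Hypothetical model of coherent-allocator flag handling.  Returns false
 * (allocation refused) for a compound-page request; otherwise stores the
 * flags that would actually be used, with zone modifiers stripped.
 */
static bool model_dma_gfp_filter(unsigned int gfp, unsigned int *effective)
{
	if (gfp & M_GFP_COMP)
		return false; /* the buffer can never become a page pointer */
	*effective = gfp & ~M_ZONE_MASK;
	return true;
}
```

In this model, the patched hfi1 calls correspond to passing M_GFP_KERNEL on its own, which passes through unchanged, while the old gfp_flags | __GFP_COMP combination would be refused by the model.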

Comments

Dean Luick Nov. 16, 2022, 2:40 p.m. UTC | #1
On 11/13/2022 10:35 AM, Christoph Hellwig wrote:
> dma_alloc_coherent is an opaque allocator that only uses the GFP_ flags
> for allocation context control.  Don't pass GFP_USER which doesn't make
> sense for a kernel DMA allocation or __GFP_COMP which makes no sense
> for an allocation that can't in any way be converted to a page pointer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/infiniband/hw/hfi1/init.c | 21 +++------------------
>  1 file changed, 3 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
> index 436372b314312..24c0f0d257fc9 100644
> --- a/drivers/infiniband/hw/hfi1/init.c
> +++ b/drivers/infiniband/hw/hfi1/init.c
> @@ -1761,17 +1761,11 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
>       unsigned amt;
>
>       if (!rcd->rcvhdrq) {
> -             gfp_t gfp_flags;
> -
>               amt = rcvhdrq_size(rcd);
>
> -             if (rcd->ctxt < dd->first_dyn_alloc_ctxt || rcd->is_vnic)
> -                     gfp_flags = GFP_KERNEL;
> -             else
> -                     gfp_flags = GFP_USER;
>               rcd->rcvhdrq = dma_alloc_coherent(&dd->pcidev->dev, amt,
>                                                 &rcd->rcvhdrq_dma,
> -                                               gfp_flags | __GFP_COMP);
> +                                               GFP_KERNEL);

A user context receive header queue may be mapped into user space.  Is that not the use case for GFP_USER?  The above conditional is what decides.

Why do you think GFP_USER should be removed here?

-Dean
Robin Murphy Nov. 16, 2022, 3:15 p.m. UTC | #2
On 2022-11-16 14:40, Dean Luick wrote:
> 
> A user context receive header queue may be mapped into user space.  Is that not the use case for GFP_USER?  The above conditional is what decides.
> 
> Why do you think GFP_USER should be removed here?

Coherent DMA buffers are allocated by a kernel driver or subsystem for 
the use of a device managed by that driver or subsystem, and thus they 
fundamentally belong to the kernel as proxy for the device. Any coherent 
DMA buffer may be mapped to userspace with the dma_mmap_*() interfaces, 
but they're never a "userspace allocation" in that sense.

Thanks,
Robin.
Dean Luick Nov. 16, 2022, 4:21 p.m. UTC | #3
On 11/16/2022 9:15 AM, Robin Murphy wrote:
> On 2022-11-16 14:40, Dean Luick wrote:
>> A user context receive header queue may be mapped into user space.  Is that not the use case for GFP_USER?  The above conditional is what decides.
>>
>> Why do you think GFP_USER should be removed here?
>
> Coherent DMA buffers are allocated by a kernel driver or subsystem for the use of a device managed by that driver or subsystem, and thus they fundamentally belong to the kernel as proxy for the device. Any coherent DMA buffer may be mapped to userspace with the dma_mmap_*() interfaces, but they're never a "userspace allocation" in that sense.

My (seemingly dated) understanding is that GFP_USER is for kernel allocations that may be mapped into user space.  The description of GFP_USER in gfp_types.h reinforces my understanding.  Is my understanding no longer correct?  If not, then what is the point of GFP_USER?  Is GFP_USER now mostly an artifact?  Should its description be updated?

Presently, the difference between GFP_KERNEL and GFP_USER is __GFP_HARDWALL, which enforces the cpuset allocation policy.  If __GFP_HARDWALL is not set, the allocator will fall back to the nearest memory ancestor if needed.  That fallback seems like a reasonable general policy.  I do have one concern, which may be hypothetical: if GFP_KERNEL is used and a buffer is silently pushed out of the expected cpuset, this can lead to mysterious slowdowns.

-Dean
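Dean's description of the difference between the two flag sets can be checked with a small userspace model. It assumes the compositions from include/linux/gfp_types.h (GFP_KERNEL is __GFP_RECLAIM | __GFP_IO | __GFP_FS, and GFP_USER additionally sets __GFP_HARDWALL); the M_GFP_* bit values, model_old_rcvhdrq_gfp() (a paraphrase of the conditional this patch removes, leaving out the __GFP_COMP it OR'd in) and model_hardwall() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the composition mirrors gfp_types.h. */
#define M_GFP_IO       (1u << 0)
#define M_GFP_FS       (1u << 1)
#define M_GFP_RECLAIM  (1u << 2)
#define M_GFP_HARDWALL (1u << 3)

#define M_GFP_KERNEL (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_USER   (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS | M_GFP_HARDWALL)

/* The only bit GFP_USER adds over GFP_KERNEL is the cpuset hardwall. */
static_assert((M_GFP_USER & ~M_GFP_KERNEL) == M_GFP_HARDWALL,
	      "GFP_USER adds exactly __GFP_HARDWALL");

/* Hypothetical paraphrase of the removed hfi1_create_rcvhdrq() logic. */
static unsigned int model_old_rcvhdrq_gfp(unsigned int ctxt,
					  unsigned int first_dyn_alloc_ctxt,
					  bool is_vnic)
{
	if (ctxt < first_dyn_alloc_ctxt || is_vnic)
		return M_GFP_KERNEL; /* kernel-owned and VNIC contexts */
	return M_GFP_USER;           /* user contexts, possibly mmap()ed */
}

/* Does this request confine the allocation to the task's cpuset? */
static bool model_hardwall(unsigned int gfp)
{
	return (gfp & M_GFP_HARDWALL) != 0;
}
```

After the patch, every context, including user contexts that may later be mapped into user space, gets GFP_KERNEL, so model_hardwall() is false for all of them; that is the cpuset behaviour change Dean's concern is about.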

Dean Luick Nov. 16, 2022, 5:49 p.m. UTC | #4
On 11/16/2022 9:45 AM, Christoph Hellwig wrote:
> On Wed, Nov 16, 2022 at 03:15:10PM +0000, Robin Murphy wrote:
>> Coherent DMA buffers are allocated by a kernel driver or subsystem for the
>> use of a device managed by that driver or subsystem, and thus they
>> fundamentally belong to the kernel as proxy for the device. Any coherent
>> DMA buffer may be mapped to userspace with the dma_mmap_*() interfaces, but
>> they're never a "userspace allocation" in that sense.
>
> Exactly.  I could not find a place to map the buffers to userspace,
> so if it does that without using the proper interfaces we need to fix
> that as well.  Dean, can you point me to the mmap code?

See hfi1_file_mmap(), cases RCV_HDRQ and RCV_EGRBUF, for the two items you changed in hfi1.  Both directly use remap_pfn_range(), which is probably the original approved call, but is now buried deep within dma_mmap_*().  As you say, these should be updated.  That said, the eager buffer mapping will stitch together multiple eager buffers into a single user map/vma.  I don't see how to do that with the dma_mmap_*() interface.

-Dean
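The "stitching" described above can be pictured as laying several separately allocated buffers end to end inside one VMA, with one mapping call per buffer at an increasing offset from the start of the VMA. The userspace sketch below only computes that layout; struct model_egrbuf, struct model_placement and model_stitch() are hypothetical, and in a driver each placement would correspond roughly to one remap_pfn_range() call. A single dma_mmap_coherent() call, by contrast, appears to map one buffer starting at the beginning of the VMA, which is why this pattern does not translate to it directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one separately allocated eager buffer. */
struct model_egrbuf {
	size_t len; /* bytes; assumed to be a whole number of pages */
};

/* Hypothetical record of where one buffer lands in the user VMA. */
struct model_placement {
	size_t vma_offset; /* offset from the start of the VMA */
	size_t len;
};

/*
 * Lay buffers end to end in a single mapping of map_len bytes, one
 * placement (one mapping call in a driver) per buffer.  out must have
 * room for nbufs entries.  Returns the number of placements written,
 * or 0 if the buffers do not cover map_len exactly.
 */
static size_t model_stitch(const struct model_egrbuf *bufs, size_t nbufs,
			   size_t map_len, struct model_placement *out)
{
	size_t off = 0, n = 0;

	assert(bufs != NULL && out != NULL);
	for (size_t i = 0; i < nbufs && off < map_len; i++) {
		if (bufs[i].len > map_len - off)
			return 0; /* buffer would run past the end of the VMA */
		out[n].vma_offset = off;
		out[n].len = bufs[i].len;
		off += bufs[i].len;
		n++;
	}
	return off == map_len ? n : 0;
}
```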
Dean Luick Nov. 20, 2022, 8:41 p.m. UTC | #5
On 11/16/2022 11:49 AM, Dean Luick wrote:
> On 11/16/2022 9:45 AM, Christoph Hellwig wrote:
>> On Wed, Nov 16, 2022 at 03:15:10PM +0000, Robin Murphy wrote:
>>> Coherent DMA buffers are allocated by a kernel driver or subsystem for the
>>> use of a device managed by that driver or subsystem, and thus they
>>> fundamentally belong to the kernel as proxy for the device. Any coherent
>>> DMA buffer may be mapped to userspace with the dma_mmap_*() interfaces, but
>>> they're never a "userspace allocation" in that sense.
>>
>> Exactly.  I could not find a place to map the buffers to userspace,
>> so if it does that without using the proper interfaces we need to fix
>> that as well.  Dean, can you point me to the mmap code?
>
> See hfi1_file_mmap(), cases RCV_HDRQ and RCV_EGRBUF, for the two items you changed in hfi1.  Both directly use remap_pfn_range(), which is probably the original approved call, but is now buried deep within dma_mmap_*().  As you say, these should be updated.  That said, the eager buffer mapping will stitch together multiple eager buffers into a single user map/vma.  I don't see how to do that with the dma_mmap_*() interface.

I have tested the proposed hfi1 changes.  They are fine.

Acked-by: Dean Luick <dean.luick@cornelisnetworks.com>
Tested-by: Dean Luick <dean.luick@cornelisnetworks.com>


Switching the changed cases (e.g. rcvhdrq) over to dma_mmap_*() fails.  Those failures are being looked at.  I don't think that work needs to be part of this change.

-Dean


Patch

diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index 436372b314312..24c0f0d257fc9 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1761,17 +1761,11 @@  int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
 	unsigned amt;
 
 	if (!rcd->rcvhdrq) {
-		gfp_t gfp_flags;
-
 		amt = rcvhdrq_size(rcd);
 
-		if (rcd->ctxt < dd->first_dyn_alloc_ctxt || rcd->is_vnic)
-			gfp_flags = GFP_KERNEL;
-		else
-			gfp_flags = GFP_USER;
 		rcd->rcvhdrq = dma_alloc_coherent(&dd->pcidev->dev, amt,
 						  &rcd->rcvhdrq_dma,
-						  gfp_flags | __GFP_COMP);
+						  GFP_KERNEL);
 
 		if (!rcd->rcvhdrq) {
 			dd_dev_err(dd,
@@ -1785,7 +1779,7 @@  int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
 			rcd->rcvhdrtail_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
 								    PAGE_SIZE,
 								    &rcd->rcvhdrqtailaddr_dma,
-								    gfp_flags);
+								    GFP_KERNEL);
 			if (!rcd->rcvhdrtail_kvaddr)
 				goto bail_free;
 		}
@@ -1821,19 +1815,10 @@  int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
 {
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 max_entries, egrtop, alloced_bytes = 0;
-	gfp_t gfp_flags;
 	u16 order, idx = 0;
 	int ret = 0;
 	u16 round_mtu = roundup_pow_of_two(hfi1_max_mtu);
 
-	/*
-	 * GFP_USER, but without GFP_FS, so buffer cache can be
-	 * coalesced (we hope); otherwise, even at order 4,
-	 * heavy filesystem activity makes these fail, and we can
-	 * use compound pages.
-	 */
-	gfp_flags = __GFP_RECLAIM | __GFP_IO | __GFP_COMP;
-
 	/*
 	 * The minimum size of the eager buffers is a groups of MTU-sized
 	 * buffers.
@@ -1864,7 +1849,7 @@  int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
 			dma_alloc_coherent(&dd->pcidev->dev,
 					   rcd->egrbufs.rcvtid_size,
 					   &rcd->egrbufs.buffers[idx].dma,
-					   gfp_flags);
+					   GFP_KERNEL);
 		if (rcd->egrbufs.buffers[idx].addr) {
 			rcd->egrbufs.buffers[idx].len =
 				rcd->egrbufs.rcvtid_size;
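The comment removed from hfi1_setup_eagerbufs() describes its flags as "GFP_USER, but without GFP_FS". A userspace model can show how the removed set (__GFP_RECLAIM | __GFP_IO | __GFP_COMP) relates to GFP_KERNEL and GFP_USER, assuming the gfp_types.h compositions GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS and GFP_USER = GFP_KERNEL plus __GFP_HARDWALL. The M_* bit values and model_only_in() are hypothetical.

```c
#include <assert.h>

/* Hypothetical bit values; only the compositions mirror gfp_types.h. */
#define M_GFP_IO       (1u << 0)
#define M_GFP_FS       (1u << 1)
#define M_GFP_RECLAIM  (1u << 2)
#define M_GFP_HARDWALL (1u << 3)
#define M_GFP_COMP     (1u << 4)

#define M_GFP_KERNEL (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_USER   (M_GFP_KERNEL | M_GFP_HARDWALL)

/* The flag set removed from hfi1_setup_eagerbufs(). */
#define M_OLD_EGR_FLAGS (M_GFP_RECLAIM | M_GFP_IO | M_GFP_COMP)

static_assert((M_OLD_EGR_FLAGS & M_GFP_HARDWALL) == 0,
	      "the old eager-buffer flags never set __GFP_HARDWALL");

/* Bits that are set in a but not in b. */
static unsigned int model_only_in(unsigned int a, unsigned int b)
{
	return a & ~b;
}
```

In this model the removed set is GFP_KERNEL without __GFP_FS and with __GFP_COMP added, and it never carried GFP_USER's __GFP_HARDWALL bit; the patch replaces it with plain GFP_KERNEL.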