| Message ID | 1549550196-25581-1-git-send-email-ilias.apalodimas@linaro.org |
| --- | --- |
| State | New |
| Series | [RFC] net: page_pool: Don't use page->private to store dma_addr_t |
On Thu, Feb 07, 2019 at 04:36:36PM +0200, Ilias Apalodimas wrote:
> +/* Until we can update struct-page, have a shadow struct-page, that
> + * include our use-case
> + * Used to store retrieve dma addresses from network drivers.
> + * Never access this directly, use helper functions provided
> + * page_pool_get_dma_addr()
> + */

Huh? Why not simply:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2c471a2c43fa..2495a93ad90c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -28,6 +28,10 @@ struct address_space;
 struct mem_cgroup;
 struct hmm;
 
+struct page_pool {
+	dma_addr_t dma_addr;
+};
+
 /*
  * Each physical page in the system has a struct page associated with
  * it to keep track of whatever it is we are using the page for at the
@@ -77,6 +81,7 @@ struct page {
 	 * avoid collision and false-positive PageTail().
 	 */
 	union {
+		struct page_pool pool;
 		struct {	/* Page cache and anonymous pages */
 			/**
 			 * @lru: Pageout list, eg. active_list protected by
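As a sketch of where this suggestion leads: assuming the mm_types.h hunk above were merged as-is (the union member name `pool` follows Matthew's diff and is not an existing field at the time of this thread), the RFC's helper would reduce to a plain field access, with no cast and no shadow structure:

/* Hypothetical helpers, assuming struct page gains the
 * "struct page_pool pool" union member proposed above.
 */
static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	return page->pool.dma_addr;
}

static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
{
	page->pool.dma_addr = addr;
}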
Hi Matthew,

On Thu, Feb 07, 2019 at 07:07:45AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 04:36:36PM +0200, Ilias Apalodimas wrote:
> > +/* Until we can update struct-page, have a shadow struct-page, that
> > + * include our use-case
> > + * Used to store retrieve dma addresses from network drivers.
> > + * Never access this directly, use helper functions provided
> > + * page_pool_get_dma_addr()
> > + */
>
> Huh? Why not simply:
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2c471a2c43fa..2495a93ad90c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -28,6 +28,10 @@ struct address_space;
>  struct mem_cgroup;
>  struct hmm;
>
> +struct page_pool {
> +	dma_addr_t dma_addr;
> +};
> +
>  /*
>   * Each physical page in the system has a struct page associated with
>   * it to keep track of whatever it is we are using the page for at the
> @@ -77,6 +81,7 @@ struct page {
>   * avoid collision and false-positive PageTail().
>   */
>  union {
> +	struct page_pool pool;
>  	struct {	/* Page cache and anonymous pages */
>  		/**
>  		 * @lru: Pageout list, eg. active_list protected by
>

Well updating struct page is the final goal, hence the comment. I am mostly
looking for opinions here since we are trying to store dma addresses which are
irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
bit controversial isn't it ? If we can add that, then yes the code would look
better

Thanks
/Ilias
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Thu, 7 Feb 2019 17:20:34 +0200

> Well updating struct page is the final goal, hence the comment. I am mostly
> looking for opinions here since we are trying to store dma addresses which are
> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> bit controversial isn't it ? If we can add that, then yes the code would look
> better

I fundamentally disagree.

One of the core operations performed on a page is mapping it so that a device
can use it.

Why have ancillary data structure support for this all over the place, rather
than in the common spot which is the page.

A page really is not just a 'mm' structure, it is a system structure.
On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Thu, 7 Feb 2019 17:20:34 +0200
>
> > Well updating struct page is the final goal, hence the comment. I am mostly
> > looking for opinions here since we are trying to store dma addresses which are
> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > bit controversial isn't it ? If we can add that, then yes the code would look
> > better
>
> I fundamentally disagree.
>
> One of the core operations performed on a page is mapping it so that a device
> can use it.
>
> Why have ancillary data structure support for this all over the place, rather
> than in the common spot which is the page.
>
> A page really is not just a 'mm' structure, it is a system structure.

+1

The fundamental point of computing is to do I/O.
Hi David,

On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Thu, 7 Feb 2019 17:20:34 +0200
>
> > Well updating struct page is the final goal, hence the comment. I am mostly
> > looking for opinions here since we are trying to store dma addresses which are
> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > bit controversial isn't it ? If we can add that, then yes the code would look
> > better
>
> I fundamentally disagree.
>
> One of the core operations performed on a page is mapping it so that a device
> can use it.
>
> Why have ancillary data structure support for this all over the place, rather
> than in the common spot which is the page.

You are right on that. Moreover the intention of this change is to facilitate
the page recycling patches we proposed with Jesper. In that context we do need
the dma mapping information in a common spot since we'll need to access it from
drivers, networking code etc. The struct page *is* the best place for that.

>
> A page really is not just a 'mm' structure, it is a system structure.

Well if you put it that way I completely agree (also it makes our life a *lot*
easier :))

Thanks
/Ilias
Hi Matthew,

On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > Date: Thu, 7 Feb 2019 17:20:34 +0200
> >
> > > Well updating struct page is the final goal, hence the comment. I am mostly
> > > looking for opinions here since we are trying to store dma addresses which are
> > > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > > bit controversial isn't it ? If we can add that, then yes the code would look
> > > better
> >
> > I fundamentally disagree.
> >
> > One of the core operations performed on a page is mapping it so that a device
> > can use it.
> >
> > Why have ancillary data structure support for this all over the place, rather
> > than in the common spot which is the page.
> >
> > A page really is not just a 'mm' structure, it is a system structure.
>
> +1
>
> The fundamental point of computing is to do I/O.

Ok, great that should sort it out then.
I'll use your proposal and base the patch on that.

Thanks for taking the time with this
/Ilias
On 2/7/2019 11:42 PM, Ilias Apalodimas wrote:
> Hi Matthew,
>
> On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:
>> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
>>> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>> Date: Thu, 7 Feb 2019 17:20:34 +0200
>>>
>>>> Well updating struct page is the final goal, hence the comment. I am mostly
>>>> looking for opinions here since we are trying to store dma addresses which are
>>>> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
>>>> bit controversial isn't it ? If we can add that, then yes the code would look
>>>> better
>>>
>>> I fundamentally disagree.
>>>
>>> One of the core operations performed on a page is mapping it so that a device
>>> can use it.
>>>
>>> Why have ancillary data structure support for this all over the place, rather
>>> than in the common spot which is the page.
>>>
>>> A page really is not just a 'mm' structure, it is a system structure.
>>
>> +1
>>
>> The fundamental point of computing is to do I/O.
> Ok, great that should sort it out then.
> I'll use your proposal and base the patch on that.
>
> Thanks for taking the time with this
>
> /Ilias
>

Hi,

It's great to use the struct page to store its dma mapping, but I am
worried about extensibility.
page_pool is evolving, and it would need several more per-page fields.
One of them would be pageref_bias, a planned optimization to reduce the
number of the costly atomic pageref operations (and replace existing
code in several drivers).

I would replace this dma field with a pointer to an extensible struct,
that would contain the dma mapping (and other stuff in the near future).
This pointer fits perfectly with the existing unsigned long private;
they can share the memory, for both 32- and 64-bits systems.

The only downside is one more pointer de-reference. This should be perf
tested.
However, when introducing the page refcnt bias optimization into
page_pool, I believe the perf gain would be guaranteed.

Regards,
Tariq
On Mon, Feb 11, 2019 at 08:53:19AM +0000, Tariq Toukan wrote:
> It's great to use the struct page to store its dma mapping, but I am
> worried about extensibility.
> page_pool is evolving, and it would need several more per-page fields.
> One of them would be pageref_bias, a planned optimization to reduce the
> number of the costly atomic pageref operations (and replace existing
> code in several drivers).

There's space for five words (20 or 40 bytes on 32/64 bit).
On 2/11/2019 2:12 PM, Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 08:53:19AM +0000, Tariq Toukan wrote:
>> It's great to use the struct page to store its dma mapping, but I am
>> worried about extensibility.
>> page_pool is evolving, and it would need several more per-page fields.
>> One of them would be pageref_bias, a planned optimization to reduce the
>> number of the costly atomic pageref operations (and replace existing
>> code in several drivers).
>
> There's space for five words (20 or 40 bytes on 32/64 bit).
>

OK so this is good for now, and for the near future.
Fine by me.

Regards,
Tariq
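To make the space argument concrete: the unnamed union in struct page spans five words (lru, mapping, index and private on the page-cache variant), so any page_pool per-page state has that much room before it collides with other users. A rough, hypothetical illustration follows; the struct name and the extra field are assumptions, not existing code:

#include <linux/types.h>

/* Hypothetical per-page state page_pool might keep inside struct page's
 * union: dma_addr is what this RFC needs today, the second field stands
 * in for possible future additions such as a pageref bias.
 */
struct page_pool_page_state {
	dma_addr_t dma_addr;
	unsigned long future_field;	/* placeholder, not a real field */
};

/* Matthew's constraint: the whole thing must stay within the five-word
 * union, i.e. 20 bytes on 32-bit and 40 bytes on 64-bit.
 */
_Static_assert(sizeof(struct page_pool_page_state) <= 5 * sizeof(unsigned long),
	       "page_pool per-page state must fit in struct page's union");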
On 02/11/2019 12:53 AM, Tariq Toukan wrote:
>
> Hi,
>
> It's great to use the struct page to store its dma mapping, but I am
> worried about extensibility.
> page_pool is evolving, and it would need several more per-page fields.
> One of them would be pageref_bias, a planned optimization to reduce the
> number of the costly atomic pageref operations (and replace existing
> code in several drivers).
>

But the point about pageref_bias is to place it in a different cache line than "struct page"

The major cost is having a cache line bouncing between producer and consumer.
pageref_bias means the producer only have to read the "struct page" and not dirty it
in the case the page can be recycled.

> I would replace this dma field with a pointer to an extensible struct,
> that would contain the dma mapping (and other stuff in the near future).
> This pointer fits perfectly with the existing unsigned long private;
> they can share the memory, for both 32- and 64-bits systems.
>
> The only downside is one more pointer de-reference. This should be perf
> tested.
> However, when introducing the page refcnt bias optimization into
> page_pool, I believe the perf gain would be guaranteed.

Only in some cases perhaps (when the cache line can be dirtied without performance hit)
On 2/11/2019 7:14 PM, Eric Dumazet wrote:
>
> On 02/11/2019 12:53 AM, Tariq Toukan wrote:
>>
>> Hi,
>>
>> It's great to use the struct page to store its dma mapping, but I am
>> worried about extensibility.
>> page_pool is evolving, and it would need several more per-page fields.
>> One of them would be pageref_bias, a planned optimization to reduce the
>> number of the costly atomic pageref operations (and replace existing
>> code in several drivers).
>>
>
> But the point about pageref_bias is to place it in a different cache line than "struct page"
>
> The major cost is having a cache line bouncing between producer and consumer.
>

pageref_bias is meant to be dirtied only by the page requester, i.e. the
NIC driver / page_pool.
All other components (basically, SKB release flow / put_page) should
continue working with the atomic page_refcnt, and not dirty the
pageref_bias.

However, what bothers me more is another issue.
The optimization doesn't cleanly combine with the new page_pool
direction for maintaining a queue for "available" pages, as the put_page
flow would need to read pageref_bias, asynchronously, and act accordingly.

The suggested hook in put_page (to catch the 2 -> 1 "biased refcnt"
transition) causes a problem to the traditional pageref_bias idea, as it
implies a new point in which the pageref_bias field is read
*asynchronously*. This would risk missing this critical 2 -> 1
transition! Unless pageref_bias is atomic...

> pageref_bias means the producer only have to read the "struct page" and not dirty it
> in the case the page can be recycled.
>
>> I would replace this dma field with a pointer to an extensible struct,
>> that would contain the dma mapping (and other stuff in the near future).
>> This pointer fits perfectly with the existing unsigned long private;
>> they can share the memory, for both 32- and 64-bits systems.
>>
>> The only downside is one more pointer de-reference. This should be perf
>> tested.
>> However, when introducing the page refcnt bias optimization into
>> page_pool, I believe the perf gain would be guaranteed.
>
> Only in some cases perhaps (when the cache line can be dirtied without performance hit)
>
On Tue, 12 Feb 2019 12:39:59 +0000
Tariq Toukan <tariqt@mellanox.com> wrote:

> On 2/11/2019 7:14 PM, Eric Dumazet wrote:
> >
> > On 02/11/2019 12:53 AM, Tariq Toukan wrote:
> >>
> >> Hi,
> >>
> >> It's great to use the struct page to store its dma mapping, but I am
> >> worried about extensibility.
> >> page_pool is evolving, and it would need several more per-page fields.
> >> One of them would be pageref_bias, a planned optimization to reduce the
> >> number of the costly atomic pageref operations (and replace existing
> >> code in several drivers).
> >>
> >
> > But the point about pageref_bias is to place it in a different
> > cache line than "struct page"

Yes, exactly.

> > The major cost is having a cache line bouncing between producer and
> > consumer.
>
> pageref_bias is meant to be dirtied only by the page requester, i.e. the
> NIC driver / page_pool.
> All other components (basically, SKB release flow / put_page) should
> continue working with the atomic page_refcnt, and not dirty the
> pageref_bias.
>
> However, what bothers me more is another issue.
> The optimization doesn't cleanly combine with the new page_pool
> direction for maintaining a queue for "available" pages, as the put_page
> flow would need to read pageref_bias, asynchronously, and act accordingly.
>
> The suggested hook in put_page (to catch the 2 -> 1 "biased refcnt"
> transition) causes a problem to the traditional pageref_bias idea, as it
> implies a new point in which the pageref_bias field is read
> *asynchronously*. This would risk missing this critical 2 -> 1
> transition! Unless pageref_bias is atomic...

I want to stop you here...

It seems to me that you are trying to shoehorn in a refcount
optimization into page_pool. The page_pool is optimized for the XDP
case of one-frame-per-page, where we can avoid changing the refcount,
and tradeoff memory usage for speed. It is compatible with the elevated
refcount usage, but that is not the optimization target.

If the case you are optimizing for is "packing" more frames in a page,
then the page_pool might be the wrong choice. To me it would make more
sense to create another enum xdp_mem_type, that generalize the
pageref_bias tricks also used by some drivers.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
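For reference, the direction Jesper hints at would mean extending the XDP memory-model enum in include/net/xdp.h. The existing values below reflect the kernel of that era; the commented-out entry is purely a hypothetical illustration of a pageref-bias model, not merged code:

/* XDP memory types from include/net/xdp.h (circa v5.0), with one
 * hypothetical addition sketching Jesper's suggestion.
 */
enum xdp_mem_type {
	MEM_TYPE_PAGE_SHARED = 0,	/* split-page, refcount-based model */
	MEM_TYPE_PAGE_ORDER0,		/* original XDP full-page model */
	MEM_TYPE_PAGE_POOL,		/* page_pool: one frame per page */
	MEM_TYPE_ZERO_COPY,		/* AF_XDP zero-copy */
	/* MEM_TYPE_PAGE_REF_BIAS,	   hypothetical: driver-held bias */
	MEM_TYPE_MAX,
};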
On 2/12/2019 3:49 PM, Jesper Dangaard Brouer wrote:
> On Tue, 12 Feb 2019 12:39:59 +0000
> Tariq Toukan <tariqt@mellanox.com> wrote:
>
> [...]
>
> I want to stop you here...
>
> It seems to me that you are trying to shoehorn in a refcount
> optimization into page_pool. The page_pool is optimized for the XDP
> case of one-frame-per-page, where we can avoid changing the refcount,
> and tradeoff memory usage for speed. It is compatible with the elevated
> refcount usage, but that is not the optimization target.
>
> If the case you are optimizing for is "packing" more frames in a page,
> then the page_pool might be the wrong choice. To me it would make more
> sense to create another enum xdp_mem_type, that generalize the
> pageref_bias tricks also used by some drivers.
>

Hi Jesper,

We share the same interest, I tried to combine the pageref_bias
optimization on top of the put_page hook, but turns out it doesn't fit.
That's all.

Of course, I am aware of the fact that page_pool is optimized for XDP
use cases. But, as drivers prefer a single flow for their page-allocation
management, rather than having several allocation/free methods depending
on whether XDP program is loaded or not, the performance for non-XDP flow
also matters. I know you're not ignoring this, the fact that you're adding
compatibility for the elevated refcount usage is a key step in this
direction.

Another key benefit of page_pool is providing a netdev-optimized API that
can replace the page allocation / dma mapping logic of the different
drivers, and take it into one common shared unit. This helps remove many
LOCs from drivers, significantly improves modularity, and eases the
support of new optimizations.
By improving the general non-XDP flow (packing several packets in a page)
you encourage more and more drivers to do the transition.

We all look to further improve the page-pool performance. The pageref_bias
idea does not fit, that's fine. We can still introduce an API for bulk
page-allocation, it will improve both XDP and non-XDP flows.

Regards,
Tariq
On 02/12/2019 04:39 AM, Tariq Toukan wrote:
>
> On 2/11/2019 7:14 PM, Eric Dumazet wrote:
>>
>> On 02/11/2019 12:53 AM, Tariq Toukan wrote:
>>>
>>> Hi,
>>>
>>> It's great to use the struct page to store its dma mapping, but I am
>>> worried about extensibility.
>>> page_pool is evolving, and it would need several more per-page fields.
>>> One of them would be pageref_bias, a planned optimization to reduce the
>>> number of the costly atomic pageref operations (and replace existing
>>> code in several drivers).
>>>
>>
>> But the point about pageref_bias is to place it in a different cache line than "struct page"
>>
>> The major cost is having a cache line bouncing between producer and consumer.
>>
>
> pageref_bias is meant to be dirtied only by the page requester, i.e. the
> NIC driver / page_pool.
> All other components (basically, SKB release flow / put_page) should
> continue working with the atomic page_refcnt, and not dirty the
> pageref_bias.

This is exactly my point.

You suggested to put pageref_bias in struct page, which breaks this completely.

pageref_bias is better kept in a driver structure, with appropriate prefetching
since most NIC use a ring buffer for their queues.

The dma address _can_ be put in the struct page, since the driver does not dirty it
and does not even read it when page can be recycled.
On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On 02/12/2019 04:39 AM, Tariq Toukan wrote:
> > [...]
> >
> > pageref_bias is meant to be dirtied only by the page requester, i.e. the
> > NIC driver / page_pool.
> > All other components (basically, SKB release flow / put_page) should
> > continue working with the atomic page_refcnt, and not dirty the
> > pageref_bias.
>
> This is exactly my point.
>
> You suggested to put pageref_bias in struct page, which breaks this completely.
>
> pageref_bias is better kept in a driver structure, with appropriate prefetching
> since most NIC use a ring buffer for their queues.
>
> The dma address _can_ be put in the struct page, since the driver does not dirty it
> and does not even read it when page can be recycled.

Instead of maintaining the pageref_bias in the page itself it could be
maintained in some sort of separate structure. You could just maintain
a pointer to a slot in an array somewhere. Then you can still access
it if needed, the pointer would be static for as long as it is in the
page pool, and you could invalidate the pointer prior to removing the
bias from the page.
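What Eric and Alexander describe is essentially the pagecnt_bias scheme that several Intel drivers already use: the bias lives in the driver's own ring entry, so the hot path never dirties the page's cache line. A rough sketch of that pattern follows; all structure and function names here are illustrative assumptions, not an existing kernel API, though page_ref_add(), page_ref_count() and __page_frag_cache_drain() are real helpers:

#include <linux/mm.h>
#include <linux/gfp.h>

struct rx_buffer {			/* hypothetical per-descriptor state */
	struct page *page;
	dma_addr_t dma;
	unsigned short pagecnt_bias;
};

static void rx_buffer_init(struct rx_buffer *buf, struct page *page)
{
	buf->page = page;
	/* take a large up-front refcount so the hot path never touches it */
	page_ref_add(page, USHRT_MAX - 1);
	buf->pagecnt_bias = USHRT_MAX;
}

/* called for every fragment handed to the stack: cheap, non-atomic */
static void rx_buffer_give_fragment(struct rx_buffer *buf)
{
	buf->pagecnt_bias--;
}

/* page can be recycled only if the stack holds no outstanding references */
static bool rx_buffer_can_recycle(const struct rx_buffer *buf)
{
	return page_ref_count(buf->page) - buf->pagecnt_bias == 1;
}

/* on teardown, return the unused part of the bias in one atomic operation */
static void rx_buffer_release(struct rx_buffer *buf)
{
	__page_frag_cache_drain(buf->page, buf->pagecnt_bias);
	buf->page = NULL;
}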
Hi Alexander,

On Tue, Feb 12, 2019 at 10:13:30AM -0800, Alexander Duyck wrote:
> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > [...]
> >
> > The dma address _can_ be put in the struct page, since the driver does not dirty it
> > and does not even read it when page can be recycled.
>
> Instead of maintaining the pageref_bias in the page itself it could be
> maintained in some sort of separate structure. You could just maintain
> a pointer to a slot in an array somewhere. Then you can still access
> it if needed, the pointer would be static for as long as it is in the
> page pool, and you could invalidate the pointer prior to removing the
> bias from the page.

I think that's what Tariq was suggesting in the first place.

/Ilias
On 2/12/2019 8:13 PM, Alexander Duyck wrote:
> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> [...]
>>
>> The dma address _can_ be put in the struct page, since the driver does not dirty it
>> and does not even read it when page can be recycled.
>
> Instead of maintaining the pageref_bias in the page itself it could be
> maintained in some sort of separate structure. You could just maintain
> a pointer to a slot in an array somewhere. Then you can still access
> it if needed, the pointer would be static for as long as it is in the
> page pool, and you could invalidate the pointer prior to removing the
> bias from the page.
>

Hi Alex,

That's right. But as I describe in my other reply yesterday, there is a
more serious issue with combining the pageref_bias feature with the new
page_pool capability for elevated refcount pages. It won't work on top,
and that's fine, as the idea of the new elevated refcount queue
capability is more promising and important, and it wins here.

Regards,
Tariq
On 2/12/2019 8:20 PM, Ilias Apalodimas wrote:
> Hi Alexander,
>
> On Tue, Feb 12, 2019 at 10:13:30AM -0800, Alexander Duyck wrote:
>> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> [...]
>>
>> Instead of maintaining the pageref_bias in the page itself it could be
>> maintained in some sort of separate structure. You could just maintain
>> a pointer to a slot in an array somewhere. Then you can still access
>> it if needed, the pointer would be static for as long as it is in the
>> page pool, and you could invalidate the pointer prior to removing the
>> bias from the page.
>
> I think that's what Tariq was suggesting in the first place.
>
> /Ilias
>

Correct.
But not relevant anymore, as it won't work for other reasons.

Tariq
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 694d055..618f2e5 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -98,6 +98,52 @@ struct page_pool {
 	struct ptr_ring ring;
 };
 
+/* Until we can update struct-page, have a shadow struct-page, that
+ * include our use-case
+ * Used to store retrieve dma addresses from network drivers.
+ * Never access this directly, use helper functions provided
+ * page_pool_get_dma_addr()
+ */
+struct page_shadow {
+	unsigned long flags;		/* Atomic flags, some possibly
+					 * updated asynchronously
+					 */
+	/*
+	 * Five words (20/40 bytes) are available in this union.
+	 * WARNING: bit 0 of the first word is used for PageTail(). That
+	 * means the other users of this union MUST NOT use the bit to
+	 * avoid collision and false-positive PageTail().
+	 */
+	union {
+		struct {	/* Page cache and anonymous pages */
+			/**
+			 * @lru: Pageout list, eg. active_list protected by
+			 * zone_lru_lock.  Sometimes used as a generic list
+			 * by the page owner.
+			 */
+			struct list_head lru;
+			/* See page-flags.h for PAGE_MAPPING_FLAGS */
+			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if PageSwapCache.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+		};
+		struct {	/* page_pool used by netstack */
+			/**
+			 * @dma_addr: Page_pool need to store DMA-addr, and
+			 * cannot use @private, as DMA-mappings can be 64bit
+			 * even on 32-bit Architectures.
+			 */
+			dma_addr_t dma_addr;	/* Shares area with @lru */
+		};
+	};
+};
+
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
 
 static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
@@ -141,4 +187,13 @@ static inline bool is_page_pool_compiled_in(void)
 #endif
 }
 
+static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+{
+	struct page_shadow *_page;
+
+	_page = (struct page_shadow *)page;
+
+	return _page->dma_addr;
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 43a932c..1a956a6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -111,6 +111,7 @@ noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 						 gfp_t _gfp)
 {
+	struct page_shadow *_page;
 	struct page *page;
 	gfp_t gfp = _gfp;
 	dma_addr_t dma;
@@ -136,7 +137,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
 		goto skip_dma_map;
 
-	/* Setup DMA mapping: use page->private for DMA-addr
+	/* Setup DMA mapping: use struct-page area for storing DMA-addr
 	 * This mapping is kept for lifetime of page, until leaving pool.
 	 */
 	dma = dma_map_page(pool->p.dev, page, 0,
@@ -146,7 +147,8 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 		put_page(page);
 		return NULL;
 	}
-	set_page_private(page, dma); /* page->private = dma; */
+	_page = (struct page_shadow *)page;
+	_page->dma_addr = dma;
 
 skip_dma_map:
 	/* When page just alloc'ed is should/must have refcnt 1. */
@@ -175,13 +177,21 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
 static void __page_pool_clean_page(struct page_pool *pool,
 				   struct page *page)
 {
+	struct page_shadow *_page = (struct page_shadow *)page;
+	dma_addr_t dma;
+
 	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
 		return;
 
+	dma = _page->dma_addr;
+
 	/* DMA unmap */
-	dma_unmap_page(pool->p.dev, page_private(page),
+	dma_unmap_page(pool->p.dev, dma,
 		       PAGE_SIZE << pool->p.order, pool->p.dma_dir);
-	set_page_private(page, 0);
+	_page->dma_addr = 0;
+	/* 1. Make sure we don't need to list-init page->lru.
+	 * 2. What does it mean: bit 0 of LRU first word is used for PageTail()
+	 */
 }
 
 /* Return a page to the page allocator, cleaning up our state */
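To illustrate how a driver would consume the new helper, a hedged usage sketch follows. The descriptor layout and the function name are placeholders; only page_pool_dev_alloc_pages(), the page_pool_get_dma_addr() helper added above, and the standard DMA sync API are assumed:

#include <net/page_pool.h>
#include <linux/dma-mapping.h>

struct example_rx_desc {		/* placeholder HW descriptor layout */
	__le64 addr;
};

/* Refill one RX descriptor from a page_pool created with PP_FLAG_DMA_MAP:
 * the pool has already dma_map_page()'d the page, so the driver only has
 * to read the stored address back via the new helper.
 */
static int example_rx_refill_one(struct page_pool *pool, struct device *dev,
				 struct example_rx_desc *desc)
{
	struct page *page;
	dma_addr_t dma;

	page = page_pool_dev_alloc_pages(pool);
	if (!page)
		return -ENOMEM;

	dma = page_pool_get_dma_addr(page);
	dma_sync_single_for_device(dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);

	desc->addr = cpu_to_le64(dma);
	return 0;
}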