mbox series

[RFC,v6,0/7] Generic page pool & deferred freeing for system dmabuf heap

Message ID 20210205080621.3102035-1-john.stultz@linaro.org
Headers show
Series Generic page pool & deferred freeing for system dmabuf heap | expand

Message

John Stultz Feb. 5, 2021, 8:06 a.m. UTC
This series is starting to get long, so I figured I'd add a
short cover letter for context.

The point of this series is trying to add both deferred-freeing
logic as well as a page pool to the DMA-BUF system heap.

This is desired, as the combination of deferred freeing along
with the page pool allows us to offload page-zeroing out of
the allocation hot path. This was done originally with ION
and this patch series allows the DMA-BUF system heap to match
ION's system heap allocation performance in a simple
microbenchmark [1] (ION re-added to the kernel for comparision,
running on an x86 vm image):

./dmabuf-heap-bench -i 0 1 system                     
Testing dmabuf system vs ion heaptype 0 (flags: 0x1)
---------------------------------------------
dmabuf heap: alloc 4096 bytes 5000 times in 86572223 ns          17314 ns/call
ion heap:    alloc 4096 bytes 5000 times in 97442526 ns          19488 ns/call
dmabuf heap: alloc 1048576 bytes 5000 times in 196635057 ns      39327 ns/call
ion heap:    alloc 1048576 bytes 5000 times in 357323629 ns      71464 ns/call
dmabuf heap: alloc 8388608 bytes 5000 times in 3165445534 ns     633089 ns/call
ion heap:    alloc 8388608 bytes 5000 times in 3699591271 ns     739918 ns/call
dmabuf heap: alloc 33554432 bytes 5000 times in 13327402517 ns   2665480 ns/call
ion heap:    alloc 33554432 bytes 5000 times in 15292352796 ns   3058470 ns/call

Daniel didn't like earlier attempts to re-use the network
page-pool code to achieve this, and suggested the ttm_pool be
used instead. This required pulling the fairly tightly knit
ttm_pool logic apart, but after many failed attmempts I think
I found a workable abstraction to split out shared logic.

So this series contains a new generic drm_page_pool helper
library, converts the ttm_pool to use it, and then adds the
dmabuf deferred freeing and adds support to the dmabuf system
heap to use both deferred freeing and the new drm_page_pool.

Input would be greatly appreciated. Testing as well, as I don't
have any development hardware that utilizes the ttm pool.

thanks
-john

[1] https://android.googlesource.com/platform/system/memory/libdmabufheap/+/refs/heads/master/tests/dmabuf_heap_bench.c

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org

John Stultz (7):
  drm: Add a sharable drm page-pool implementation
  drm: ttm_pool: Rename the ttm_pool_dma structure to ttm_pool_page_dat
  drm: ttm_pool: Rework ttm_pool_free_page to allow us to use it as a
    function pointer
  drm: ttm_pool: Rework ttm_pool to use drm_page_pool
  dma-buf: heaps: Add deferred-free-helper library code
  dma-buf: system_heap: Add drm pagepool support to system heap
  dma-buf: system_heap: Add deferred freeing to the system heap

 drivers/dma-buf/heaps/Kconfig                |   5 +
 drivers/dma-buf/heaps/Makefile               |   1 +
 drivers/dma-buf/heaps/deferred-free-helper.c | 145 ++++++++++
 drivers/dma-buf/heaps/deferred-free-helper.h |  55 ++++
 drivers/dma-buf/heaps/system_heap.c          |  77 ++++-
 drivers/gpu/drm/Kconfig                      |   5 +
 drivers/gpu/drm/Makefile                     |   1 +
 drivers/gpu/drm/page_pool.c                  | 220 +++++++++++++++
 drivers/gpu/drm/ttm/ttm_pool.c               | 278 ++++++-------------
 include/drm/page_pool.h                      |  54 ++++
 include/drm/ttm/ttm_pool.h                   |  23 +-
 11 files changed, 639 insertions(+), 225 deletions(-)
 create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.c
 create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.h
 create mode 100644 drivers/gpu/drm/page_pool.c
 create mode 100644 include/drm/page_pool.h

Comments

Christian König Feb. 5, 2021, 8:50 a.m. UTC | #1
Am 05.02.21 um 09:06 schrieb John Stultz:
> This patch reworks the ttm_pool logic to utilize the recently
> added drm_page_pool code.
>
> Basically all the ttm_pool_type structures are replaced with
> drm_page_pool pointers, and since the drm_page_pool logic has
> its own shrinker, we can remove all of the ttm_pool shrinker
> logic.
>
> NOTE: There is one mismatch in the interfaces I'm not totally
> happy with. The ttm_pool tracks all of its pooled pages across
> a number of different pools, and tries to keep this size under
> the specified page_pool_size value. With the drm_page_pool,
> there may other users, however there is still one global
> shrinker list of pools. So we can't easily reduce the ttm
> pool under the ttm specified size without potentially doing
> a lot of shrinking to other non-ttm pools. So either we can:
>    1) Try to split it so each user of drm_page_pools manages its
>       own pool shrinking.
>    2) Push the max value into the drm_page_pool, and have it
>       manage shrinking to fit under that global max. Then share
>       those size/max values out so the ttm_pool debug output
>       can have more context.
>
> I've taken the second path in this patch set, but wanted to call
> it out so folks could look closely.

Option 3: Move the debugging code into the drm_page_pool as well.

I strong think that will be a hard requirement from Daniel for 
upstreaming this.

Christian.

>
> Thoughts would be greatly appreciated here!
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Sandeep Patil <sspatil@google.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Cc: Ørjan Eide <orjan.eide@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Ezequiel Garcia <ezequiel@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: James Jones <jajones@nvidia.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>   drivers/gpu/drm/Kconfig        |   1 +
>   drivers/gpu/drm/ttm/ttm_pool.c | 199 +++++++--------------------------
>   include/drm/ttm/ttm_pool.h     |  23 +---
>   3 files changed, 41 insertions(+), 182 deletions(-)
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index d16bf340ed2e..d427abefabfb 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -181,6 +181,7 @@ config DRM_PAGE_POOL
>   config DRM_TTM
>   	tristate
>   	depends on DRM && MMU
> +	select DRM_PAGE_POOL
>   	help
>   	  GPU memory management subsystem for devices with multiple
>   	  GPU memory types. Will be enabled automatically if a device driver
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index eca36678f967..dbbaf55ca5df 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -37,6 +37,7 @@
>   #ifdef CONFIG_X86
>   #include <asm/set_memory.h>
>   #endif
> +#include <drm/page_pool.h>
>   #include <drm/ttm/ttm_pool.h>
>   #include <drm/ttm/ttm_bo_driver.h>
>   #include <drm/ttm/ttm_tt.h>
> @@ -63,15 +64,13 @@ module_param(page_pool_size, ulong, 0644);
>   
>   static atomic_long_t allocated_pages;
>   
> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_uncached[MAX_ORDER];
> +static struct drm_page_pool *global_write_combined[MAX_ORDER];
> +static struct drm_page_pool *global_uncached[MAX_ORDER];
>   
> -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> +static struct drm_page_pool *global_dma32_write_combined[MAX_ORDER];
> +static struct drm_page_pool *global_dma32_uncached[MAX_ORDER];
>   
>   static struct mutex shrinker_lock;
> -static struct list_head shrinker_list;
> -static struct shrinker mm_shrinker;
>   
>   /* Allocate pages of size 1 << order with the given gfp_flags */
>   static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> @@ -223,79 +222,26 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
>   		       DMA_BIDIRECTIONAL);
>   }
>   
> -/* Give pages into a specific pool_type */
> -static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
> -{
> -	spin_lock(&pt->lock);
> -	list_add(&p->lru, &pt->pages);
> -	spin_unlock(&pt->lock);
> -	atomic_long_add(1 << pt->order, &allocated_pages);
> -}
> -
> -/* Take pages from a specific pool_type, return NULL when nothing available */
> -static struct page *ttm_pool_type_take(struct ttm_pool_type *pt)
> -{
> -	struct page *p;
> -
> -	spin_lock(&pt->lock);
> -	p = list_first_entry_or_null(&pt->pages, typeof(*p), lru);
> -	if (p) {
> -		atomic_long_sub(1 << pt->order, &allocated_pages);
> -		list_del(&p->lru);
> -	}
> -	spin_unlock(&pt->lock);
> -
> -	return p;
> -}
> -
> -/* Initialize and add a pool type to the global shrinker list */
> -static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
> -			       enum ttm_caching caching, unsigned int order)
> -{
> -	pt->pool = pool;
> -	pt->caching = caching;
> -	pt->order = order;
> -	spin_lock_init(&pt->lock);
> -	INIT_LIST_HEAD(&pt->pages);
> -
> -	mutex_lock(&shrinker_lock);
> -	list_add_tail(&pt->shrinker_list, &shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> -}
> -
> -/* Remove a pool_type from the global shrinker list and free all pages */
> -static void ttm_pool_type_fini(struct ttm_pool_type *pt)
> -{
> -	struct page *p, *tmp;
> -
> -	mutex_lock(&shrinker_lock);
> -	list_del(&pt->shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> -
> -	list_for_each_entry_safe(p, tmp, &pt->pages, lru)
> -		ttm_pool_free_page(p, pt->order);
> -}
> -
>   /* Return the pool_type to use for the given caching and order */
> -static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
> +static struct drm_page_pool *ttm_pool_select_type(struct ttm_pool *pool,
>   						  enum ttm_caching caching,
>   						  unsigned int order)
>   {
>   	if (pool->use_dma_alloc)
> -		return &pool->caching[caching].orders[order];
> +		return pool->caching[caching].orders[order];
>   
>   #ifdef CONFIG_X86
>   	switch (caching) {
>   	case ttm_write_combined:
>   		if (pool->use_dma32)
> -			return &global_dma32_write_combined[order];
> +			return global_dma32_write_combined[order];
>   
> -		return &global_write_combined[order];
> +		return global_write_combined[order];
>   	case ttm_uncached:
>   		if (pool->use_dma32)
> -			return &global_dma32_uncached[order];
> +			return global_dma32_uncached[order];
>   
> -		return &global_uncached[order];
> +		return global_uncached[order];
>   	default:
>   		break;
>   	}
> @@ -304,30 +250,6 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>   	return NULL;
>   }
>   
> -/* Free pages using the global shrinker list */
> -static unsigned int ttm_pool_shrink(void)
> -{
> -	struct ttm_pool_type *pt;
> -	unsigned int num_freed;
> -	struct page *p;
> -
> -	mutex_lock(&shrinker_lock);
> -	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
> -
> -	p = ttm_pool_type_take(pt);
> -	if (p) {
> -		ttm_pool_free_page(p, pt->order);
> -		num_freed = 1 << pt->order;
> -	} else {
> -		num_freed = 0;
> -	}
> -
> -	list_move_tail(&pt->shrinker_list, &shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> -
> -	return num_freed;
> -}
> -
>   /* Return the allocation order based for a page */
>   static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
>   {
> @@ -377,10 +299,10 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	for (order = min(MAX_ORDER - 1UL, __fls(num_pages)); num_pages;
>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>   		bool apply_caching = false;
> -		struct ttm_pool_type *pt;
> +		struct drm_page_pool *pt;
>   
>   		pt = ttm_pool_select_type(pool, tt->caching, order);
> -		p = pt ? ttm_pool_type_take(pt) : NULL;
> +		p = pt ? drm_page_pool_fetch(pt) : NULL;
>   		if (p) {
>   			apply_caching = true;
>   		} else {
> @@ -462,7 +384,7 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>   	for (i = 0; i < tt->num_pages; ) {
>   		struct page *p = tt->pages[i];
>   		unsigned int order, num_pages;
> -		struct ttm_pool_type *pt;
> +		struct drm_page_pool *pt;
>   
>   		order = ttm_pool_page_order(pool, p);
>   		num_pages = 1ULL << order;
> @@ -473,15 +395,12 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>   
>   		pt = ttm_pool_select_type(pool, tt->caching, order);
>   		if (pt)
> -			ttm_pool_type_give(pt, tt->pages[i]);
> +			drm_page_pool_add(pt, tt->pages[i]);
>   		else
>   			ttm_pool_free_page(tt->pages[i], order);
>   
>   		i += num_pages;
>   	}
> -
> -	while (atomic_long_read(&allocated_pages) > page_pool_size)
> -		ttm_pool_shrink();
>   }
>   EXPORT_SYMBOL(ttm_pool_free);
>   
> @@ -508,8 +427,8 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>   
>   	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>   		for (j = 0; j < MAX_ORDER; ++j)
> -			ttm_pool_type_init(&pool->caching[i].orders[j],
> -					   pool, i, j);
> +			pool->caching[i].orders[j] = drm_page_pool_create(j,
> +							ttm_pool_free_page);
>   }
>   
>   /**
> @@ -526,33 +445,18 @@ void ttm_pool_fini(struct ttm_pool *pool)
>   
>   	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>   		for (j = 0; j < MAX_ORDER; ++j)
> -			ttm_pool_type_fini(&pool->caching[i].orders[j]);
> +			drm_page_pool_destroy(pool->caching[i].orders[j]);
>   }
>   
>   #ifdef CONFIG_DEBUG_FS
> -/* Count the number of pages available in a pool_type */
> -static unsigned int ttm_pool_type_count(struct ttm_pool_type *pt)
> -{
> -	unsigned int count = 0;
> -	struct page *p;
> -
> -	spin_lock(&pt->lock);
> -	/* Only used for debugfs, the overhead doesn't matter */
> -	list_for_each_entry(p, &pt->pages, lru)
> -		++count;
> -	spin_unlock(&pt->lock);
> -
> -	return count;
> -}
> -
>   /* Dump information about the different pool types */
> -static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
> +static void ttm_pool_debugfs_orders(struct drm_page_pool **pt,
>   				    struct seq_file *m)
>   {
>   	unsigned int i;
>   
>   	for (i = 0; i < MAX_ORDER; ++i)
> -		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
> +		seq_printf(m, " %8u", drm_page_pool_get_size(pt[i]));
>   	seq_puts(m, "\n");
>   }
>   
> @@ -602,7 +506,10 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>   	}
>   
>   	seq_printf(m, "\ntotal\t: %8lu of %8lu\n",
> -		   atomic_long_read(&allocated_pages), page_pool_size);
> +		   atomic_long_read(&allocated_pages),
> +		   drm_page_pool_get_max());
> +	seq_printf(m, "(%8lu in non-ttm pools)\n", drm_page_pool_get_total() -
> +					atomic_long_read(&allocated_pages));
>   
>   	mutex_unlock(&shrinker_lock);
>   
> @@ -612,28 +519,6 @@ EXPORT_SYMBOL(ttm_pool_debugfs);
>   
>   #endif
>   
> -/* As long as pages are available make sure to release at least one */
> -static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
> -					    struct shrink_control *sc)
> -{
> -	unsigned long num_freed = 0;
> -
> -	do
> -		num_freed += ttm_pool_shrink();
> -	while (!num_freed && atomic_long_read(&allocated_pages));
> -
> -	return num_freed;
> -}
> -
> -/* Return the number of pages available or SHRINK_EMPTY if we have none */
> -static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
> -					     struct shrink_control *sc)
> -{
> -	unsigned long num_pages = atomic_long_read(&allocated_pages);
> -
> -	return num_pages ? num_pages : SHRINK_EMPTY;
> -}
> -
>   /**
>    * ttm_pool_mgr_init - Initialize globals
>    *
> @@ -648,24 +533,21 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>   	if (!page_pool_size)
>   		page_pool_size = num_pages;
>   
> +	drm_page_pool_set_max(page_pool_size);
> +
>   	mutex_init(&shrinker_lock);
> -	INIT_LIST_HEAD(&shrinker_list);
>   
>   	for (i = 0; i < MAX_ORDER; ++i) {
> -		ttm_pool_type_init(&global_write_combined[i], NULL,
> -				   ttm_write_combined, i);
> -		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
> -
> -		ttm_pool_type_init(&global_dma32_write_combined[i], NULL,
> -				   ttm_write_combined, i);
> -		ttm_pool_type_init(&global_dma32_uncached[i], NULL,
> -				   ttm_uncached, i);
> +		global_write_combined[i] = drm_page_pool_create(i,
> +							ttm_pool_free_page);
> +		global_uncached[i] = drm_page_pool_create(i,
> +							ttm_pool_free_page);
> +		global_dma32_write_combined[i] = drm_page_pool_create(i,
> +							ttm_pool_free_page);
> +		global_dma32_uncached[i] = drm_page_pool_create(i,
> +							ttm_pool_free_page);
>   	}
> -
> -	mm_shrinker.count_objects = ttm_pool_shrinker_count;
> -	mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
> -	mm_shrinker.seeks = 1;
> -	return register_shrinker(&mm_shrinker);
> +	return 0;
>   }
>   
>   /**
> @@ -678,13 +560,10 @@ void ttm_pool_mgr_fini(void)
>   	unsigned int i;
>   
>   	for (i = 0; i < MAX_ORDER; ++i) {
> -		ttm_pool_type_fini(&global_write_combined[i]);
> -		ttm_pool_type_fini(&global_uncached[i]);
> +		drm_page_pool_destroy(global_write_combined[i]);
> +		drm_page_pool_destroy(global_uncached[i]);
>   
> -		ttm_pool_type_fini(&global_dma32_write_combined[i]);
> -		ttm_pool_type_fini(&global_dma32_uncached[i]);
> +		drm_page_pool_destroy(global_dma32_write_combined[i]);
> +		drm_page_pool_destroy(global_dma32_uncached[i]);
>   	}
> -
> -	unregister_shrinker(&mm_shrinker);
> -	WARN_ON(!list_empty(&shrinker_list));
>   }
> diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
> index 4321728bdd11..460ab232fd22 100644
> --- a/include/drm/ttm/ttm_pool.h
> +++ b/include/drm/ttm/ttm_pool.h
> @@ -36,27 +36,6 @@ struct ttm_tt;
>   struct ttm_pool;
>   struct ttm_operation_ctx;
>   
> -/**
> - * ttm_pool_type - Pool for a certain memory type
> - *
> - * @pool: the pool we belong to, might be NULL for the global ones
> - * @order: the allocation order our pages have
> - * @caching: the caching type our pages have
> - * @shrinker_list: our place on the global shrinker list
> - * @lock: protection of the page list
> - * @pages: the list of pages in the pool
> - */
> -struct ttm_pool_type {
> -	struct ttm_pool *pool;
> -	unsigned int order;
> -	enum ttm_caching caching;
> -
> -	struct list_head shrinker_list;
> -
> -	spinlock_t lock;
> -	struct list_head pages;
> -};
> -
>   /**
>    * ttm_pool - Pool for all caching and orders
>    *
> @@ -71,7 +50,7 @@ struct ttm_pool {
>   	bool use_dma32;
>   
>   	struct {
> -		struct ttm_pool_type orders[MAX_ORDER];
> +		struct drm_page_pool *orders[MAX_ORDER];
>   	} caching[TTM_NUM_CACHING_TYPES];
>   };
>
Christian König Feb. 5, 2021, 10:36 a.m. UTC | #2
Am 05.02.21 um 09:06 schrieb John Stultz:
> This series is starting to get long, so I figured I'd add a
> short cover letter for context.
>
> The point of this series is trying to add both deferred-freeing
> logic as well as a page pool to the DMA-BUF system heap.
>
> This is desired, as the combination of deferred freeing along
> with the page pool allows us to offload page-zeroing out of
> the allocation hot path. This was done originally with ION
> and this patch series allows the DMA-BUF system heap to match
> ION's system heap allocation performance in a simple
> microbenchmark [1] (ION re-added to the kernel for comparision,
> running on an x86 vm image):
>
> ./dmabuf-heap-bench -i 0 1 system
> Testing dmabuf system vs ion heaptype 0 (flags: 0x1)
> ---------------------------------------------
> dmabuf heap: alloc 4096 bytes 5000 times in 86572223 ns          17314 ns/call
> ion heap:    alloc 4096 bytes 5000 times in 97442526 ns          19488 ns/call
> dmabuf heap: alloc 1048576 bytes 5000 times in 196635057 ns      39327 ns/call
> ion heap:    alloc 1048576 bytes 5000 times in 357323629 ns      71464 ns/call
> dmabuf heap: alloc 8388608 bytes 5000 times in 3165445534 ns     633089 ns/call
> ion heap:    alloc 8388608 bytes 5000 times in 3699591271 ns     739918 ns/call
> dmabuf heap: alloc 33554432 bytes 5000 times in 13327402517 ns   2665480 ns/call
> ion heap:    alloc 33554432 bytes 5000 times in 15292352796 ns   3058470 ns/call
>
> Daniel didn't like earlier attempts to re-use the network
> page-pool code to achieve this, and suggested the ttm_pool be
> used instead. This required pulling the fairly tightly knit
> ttm_pool logic apart, but after many failed attmempts I think
> I found a workable abstraction to split out shared logic.
>
> So this series contains a new generic drm_page_pool helper
> library, converts the ttm_pool to use it, and then adds the
> dmabuf deferred freeing and adds support to the dmabuf system
> heap to use both deferred freeing and the new drm_page_pool.
>
> Input would be greatly appreciated. Testing as well, as I don't
> have any development hardware that utilizes the ttm pool.

We can easily do the testing and the general idea sounds solid to me.

I see three major things we need to clean up here.
1. The licensing, you are moving from BSD/MIT to GPL2.
2. Don't add any new overhead to the TTM pool, especially allocating a 
private object per page is a no-go.
3. What are you doing with the reclaim stuff and why?
4. Keeping the documentation would be nice to have.

Regards,
Christian.

>
> thanks
> -john
>
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fandroid.googlesource.com%2Fplatform%2Fsystem%2Fmemory%2Flibdmabufheap%2F%2B%2Frefs%2Fheads%2Fmaster%2Ftests%2Fdmabuf_heap_bench.c&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C2dc4d6cb3ee246558b9e08d8c9acef9a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481091933715561%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=oMgVsrdlwS%2BqZuuatjTiWDzMU9SiUW5eRar5xwT%2BHYQ%3D&amp;reserved=0
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Sandeep Patil <sspatil@google.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Cc: Ørjan Eide <orjan.eide@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Ezequiel Garcia <ezequiel@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: James Jones <jajones@nvidia.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
>
> John Stultz (7):
>    drm: Add a sharable drm page-pool implementation
>    drm: ttm_pool: Rename the ttm_pool_dma structure to ttm_pool_page_dat
>    drm: ttm_pool: Rework ttm_pool_free_page to allow us to use it as a
>      function pointer
>    drm: ttm_pool: Rework ttm_pool to use drm_page_pool
>    dma-buf: heaps: Add deferred-free-helper library code
>    dma-buf: system_heap: Add drm pagepool support to system heap
>    dma-buf: system_heap: Add deferred freeing to the system heap
>
>   drivers/dma-buf/heaps/Kconfig                |   5 +
>   drivers/dma-buf/heaps/Makefile               |   1 +
>   drivers/dma-buf/heaps/deferred-free-helper.c | 145 ++++++++++
>   drivers/dma-buf/heaps/deferred-free-helper.h |  55 ++++
>   drivers/dma-buf/heaps/system_heap.c          |  77 ++++-
>   drivers/gpu/drm/Kconfig                      |   5 +
>   drivers/gpu/drm/Makefile                     |   1 +
>   drivers/gpu/drm/page_pool.c                  | 220 +++++++++++++++
>   drivers/gpu/drm/ttm/ttm_pool.c               | 278 ++++++-------------
>   include/drm/page_pool.h                      |  54 ++++
>   include/drm/ttm/ttm_pool.h                   |  23 +-
>   11 files changed, 639 insertions(+), 225 deletions(-)
>   create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.c
>   create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.h
>   create mode 100644 drivers/gpu/drm/page_pool.c
>   create mode 100644 include/drm/page_pool.h
>
John Stultz Feb. 5, 2021, 7:47 p.m. UTC | #3
On Fri, Feb 5, 2021 at 12:28 AM Christian König
<christian.koenig@amd.com> wrote:
> Am 05.02.21 um 09:06 schrieb John Stultz:
> > This refactors ttm_pool_free_page(), and by adding extra entries
> > to ttm_pool_page_dat, we then use it for all allocations, which
> > allows us to simplify the arguments needed to be passed to
> > ttm_pool_free_page().
>
> This is a clear NAK since the peer page data is just a workaround for
> the DMA-API hack to grab pages from there.
>
> Adding this to all pages would increase the memory footprint drastically.

Yea, that's a good point!  Hrm... bummer. I'll have to see if there's
some other way I can get the needed context for the free from the
generic page-pool side.

Thanks so much for the review!
-john
John Stultz Feb. 5, 2021, 8:57 p.m. UTC | #4
On Fri, Feb 5, 2021 at 2:36 AM Christian König <christian.koenig@amd.com> wrote:
> Am 05.02.21 um 09:06 schrieb John Stultz:
> > Input would be greatly appreciated. Testing as well, as I don't
> > have any development hardware that utilizes the ttm pool.
>
> We can easily do the testing and the general idea sounds solid to me.
>

Thanks so much again for the feedback!

> I see three major things we need to clean up here.
> 1. The licensing, you are moving from BSD/MIT to GPL2.

Yea, this may be sticky, as it's not just code re-used from one dual
licensed file, but combination of GPL2 work, so advice here would be
appreciated.

> 2. Don't add any new overhead to the TTM pool, especially allocating a
> private object per page is a no-go.

This will need some more series rework to solve. I've got some ideas,
but we'll see if they work.

> 3. What are you doing with the reclaim stuff and why?

As I mentioned, it's a holdover from earlier code, and I'm happy to
drop it and defer to other accounting/stats discussions that are
ongoing.

> 4. Keeping the documentation would be nice to have.

True. I didn't spend much time with documentation, as I worried folks
may have disagreed with the whole approach. I'll work to improve it
for the next go around.

Thanks so much again for the review and feedback! I really appreciate
your time here.
-john
Christian König Feb. 9, 2021, 12:13 p.m. UTC | #5
Am 05.02.21 um 20:47 schrieb John Stultz:
> On Fri, Feb 5, 2021 at 12:28 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 05.02.21 um 09:06 schrieb John Stultz:
>>> This refactors ttm_pool_free_page(), and by adding extra entries
>>> to ttm_pool_page_dat, we then use it for all allocations, which
>>> allows us to simplify the arguments needed to be passed to
>>> ttm_pool_free_page().
>> This is a clear NAK since the peer page data is just a workaround for
>> the DMA-API hack to grab pages from there.
>>
>> Adding this to all pages would increase the memory footprint drastically.
> Yea, that's a good point!  Hrm... bummer. I'll have to see if there's
> some other way I can get the needed context for the free from the
> generic page-pool side.

What exactly is the problem here? As far as I can see we just have the 
lru entry (list_head) and the pool.

How the lru is cast to the page can be completely pool implementation 
specific.

Regards,
Christian.

>
> Thanks so much for the review!
> -john
Christian König Feb. 9, 2021, 12:57 p.m. UTC | #6
Am 09.02.21 um 13:11 schrieb Christian König:
> [SNIP]
>>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
>>>> +{
>>>> +     spin_lock(&pool->lock);
>>>> +     list_add_tail(&page->lru, &pool->items);
>>>> +     pool->count++;
>>>> +     atomic_long_add(1 << pool->order, &total_pages);
>>>> +     spin_unlock(&pool->lock);
>>>> +
>>>> +     mod_node_page_state(page_pgdat(page), 
>>>> NR_KERNEL_MISC_RECLAIMABLE,
>>>> +                         1 << pool->order);
>>> Hui what? What should that be good for?
>> This is a carryover from the ION page pool implementation:
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cc4eadb0a9cf6491d99ba08d8ca173457%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481548325174885%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=FUjZK5NSDMUYfU7vGeE4fDU2HCF%2FYyNBwc30aoLLPQ4%3D&amp;reserved=0 
>>
>>
>> My sense is it helps with the vmstat/meminfo accounting so folks can
>> see the cached pages are shrinkable/freeable. This maybe falls under
>> other dmabuf accounting/stats discussions, so I'm happy to remove it
>> for now, or let the drivers using the shared page pool logic handle
>> the accounting themselves?

Intentionally separated the discussion for that here.

As far as I can see this is just bluntly incorrect.

Either the page is reclaimable or it is part of our pool and freeable 
through the shrinker, but never ever both.

In the best case this just messes up the accounting, in the worst case 
it can cause memory corruption.

Christian.
Suren Baghdasaryan Feb. 9, 2021, 5:33 p.m. UTC | #7
On Tue, Feb 9, 2021 at 4:57 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 09.02.21 um 13:11 schrieb Christian König:
> > [SNIP]
> >>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
> >>>> +{
> >>>> +     spin_lock(&pool->lock);
> >>>> +     list_add_tail(&page->lru, &pool->items);
> >>>> +     pool->count++;
> >>>> +     atomic_long_add(1 << pool->order, &total_pages);
> >>>> +     spin_unlock(&pool->lock);
> >>>> +
> >>>> +     mod_node_page_state(page_pgdat(page),
> >>>> NR_KERNEL_MISC_RECLAIMABLE,
> >>>> +                         1 << pool->order);
> >>> Hui what? What should that be good for?
> >> This is a carryover from the ION page pool implementation:
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cc4eadb0a9cf6491d99ba08d8ca173457%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481548325174885%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=FUjZK5NSDMUYfU7vGeE4fDU2HCF%2FYyNBwc30aoLLPQ4%3D&amp;reserved=0
> >>
> >>
> >> My sense is it helps with the vmstat/meminfo accounting so folks can
> >> see the cached pages are shrinkable/freeable. This maybe falls under
> >> other dmabuf accounting/stats discussions, so I'm happy to remove it
> >> for now, or let the drivers using the shared page pool logic handle
> >> the accounting themselves?
>
> Intentionally separated the discussion for that here.
>
> As far as I can see this is just bluntly incorrect.
>
> Either the page is reclaimable or it is part of our pool and freeable
> through the shrinker, but never ever both.

IIRC the original motivation for counting ION pooled pages as
reclaimable was to include them into /proc/meminfo's MemAvailable
calculations. NR_KERNEL_MISC_RECLAIMABLE defined as "reclaimable
non-slab kernel pages" seems like a good place to account for them but
I might be wrong.

>
> In the best case this just messes up the accounting, in the worst case
> it can cause memory corruption.
>
> Christian.
Christian König Feb. 9, 2021, 5:46 p.m. UTC | #8
Am 09.02.21 um 18:33 schrieb Suren Baghdasaryan:
> On Tue, Feb 9, 2021 at 4:57 AM Christian König <christian.koenig@amd.com> wrote:
>> Am 09.02.21 um 13:11 schrieb Christian König:
>>> [SNIP]
>>>>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
>>>>>> +{
>>>>>> +     spin_lock(&pool->lock);
>>>>>> +     list_add_tail(&page->lru, &pool->items);
>>>>>> +     pool->count++;
>>>>>> +     atomic_long_add(1 << pool->order, &total_pages);
>>>>>> +     spin_unlock(&pool->lock);
>>>>>> +
>>>>>> +     mod_node_page_state(page_pgdat(page),
>>>>>> NR_KERNEL_MISC_RECLAIMABLE,
>>>>>> +                         1 << pool->order);
>>>>> Hui what? What should that be good for?
>>>> This is a carryover from the ION page pool implementation:
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cdccccff8edcd4d147a5b08d8cd20cff2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637484888114923580%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9%2BIBC0tezSV6Ci4S3kWfW%2BQvJm4mdunn3dF6C0kyfCw%3D&amp;reserved=0
>>>>
>>>>
>>>> My sense is it helps with the vmstat/meminfo accounting so folks can
>>>> see the cached pages are shrinkable/freeable. This maybe falls under
>>>> other dmabuf accounting/stats discussions, so I'm happy to remove it
>>>> for now, or let the drivers using the shared page pool logic handle
>>>> the accounting themselves?
>> Intentionally separated the discussion for that here.
>>
>> As far as I can see this is just bluntly incorrect.
>>
>> Either the page is reclaimable or it is part of our pool and freeable
>> through the shrinker, but never ever both.
> IIRC the original motivation for counting ION pooled pages as
> reclaimable was to include them into /proc/meminfo's MemAvailable
> calculations. NR_KERNEL_MISC_RECLAIMABLE defined as "reclaimable
> non-slab kernel pages" seems like a good place to account for them but
> I might be wrong.

Yeah, that's what I see here as well. But exactly that is utterly nonsense.

Those pages are not "free" in the sense that get_free_page could return 
them directly.

Regards,
Christian.

>
>> In the best case this just messes up the accounting, in the worst case
>> it can cause memory corruption.
>>
>> Christian.
John Stultz Feb. 9, 2021, 5:51 p.m. UTC | #9
On Tue, Feb 9, 2021 at 4:11 AM Christian König <christian.koenig@amd.com> wrote:
>
>
>
> Am 05.02.21 um 21:46 schrieb John Stultz:
> > On Fri, Feb 5, 2021 at 12:47 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 05.02.21 um 09:06 schrieb John Stultz:
> >>> diff --git a/drivers/gpu/drm/page_pool.c b/drivers/gpu/drm/page_pool.c
> >>> new file mode 100644
> >>> index 000000000000..2139f86e6ca7
> >>> --- /dev/null
> >>> +++ b/drivers/gpu/drm/page_pool.c
> >>> @@ -0,0 +1,220 @@
> >>> +// SPDX-License-Identifier: GPL-2.0
> >> Please use a BSD/MIT compatible license if you want to copy this from
> >> the TTM code.
> > Hrm. This may be problematic, as it's not just TTM code, but some of
> > the TTM logic integrated into a page-pool implementation I wrote based
> > on logic from the ION code (which was GPL-2.0 before it was dropped).
> > So I don't think I can just make it MIT.  Any extra context on the
> > need for MIT, or suggestions on how to best resolve this?
>
> Most of the DRM/TTM code is also license able under the BSD/MIT style
> license since we want to enable the BSD guys to port it over as well.
>
> What stuff do you need from the ION code that you can't just code
> yourself? As far as I have seen this is like 99% code from the TTM pool.

Yea, it evolved into being mostly logic from the TTM pool (or code
that was very similar to begin with), but it's not where it started.
My old days at IBM makes me wary of claiming it's no longer the Ship
of Theseus.

So instead I think I'll have to just throw out my patch and rewrite it
in full (so apologies in advance for any review issues I
introduce/reintroduce).

thanks
-john
Suren Baghdasaryan Feb. 9, 2021, 8:16 p.m. UTC | #10
On Tue, Feb 9, 2021 at 12:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Feb 9, 2021 at 6:46 PM Christian König <christian.koenig@amd.com> wrote:
> >
> >
> >
> > Am 09.02.21 um 18:33 schrieb Suren Baghdasaryan:
> > > On Tue, Feb 9, 2021 at 4:57 AM Christian König <christian.koenig@amd.com> wrote:
> > >> Am 09.02.21 um 13:11 schrieb Christian König:
> > >>> [SNIP]
> > >>>>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
> > >>>>>> +{
> > >>>>>> +     spin_lock(&pool->lock);
> > >>>>>> +     list_add_tail(&page->lru, &pool->items);
> > >>>>>> +     pool->count++;
> > >>>>>> +     atomic_long_add(1 << pool->order, &total_pages);
> > >>>>>> +     spin_unlock(&pool->lock);
> > >>>>>> +
> > >>>>>> +     mod_node_page_state(page_pgdat(page),
> > >>>>>> NR_KERNEL_MISC_RECLAIMABLE,
> > >>>>>> +                         1 << pool->order);
> > >>>>> Hui what? What should that be good for?
> > >>>> This is a carryover from the ION page pool implementation:
> > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cdccccff8edcd4d147a5b08d8cd20cff2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637484888114923580%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9%2BIBC0tezSV6Ci4S3kWfW%2BQvJm4mdunn3dF6C0kyfCw%3D&amp;reserved=0
> > >>>>
> > >>>>
> > >>>> My sense is it helps with the vmstat/meminfo accounting so folks can
> > >>>> see the cached pages are shrinkable/freeable. This maybe falls under
> > >>>> other dmabuf accounting/stats discussions, so I'm happy to remove it
> > >>>> for now, or let the drivers using the shared page pool logic handle
> > >>>> the accounting themselves?
> > >> Intentionally separated the discussion for that here.
> > >>
> > >> As far as I can see this is just bluntly incorrect.
> > >>
> > >> Either the page is reclaimable or it is part of our pool and freeable
> > >> through the shrinker, but never ever both.
> > > IIRC the original motivation for counting ION pooled pages as
> > > reclaimable was to include them into /proc/meminfo's MemAvailable
> > > calculations. NR_KERNEL_MISC_RECLAIMABLE defined as "reclaimable
> > > non-slab kernel pages" seems like a good place to account for them but
> > > I might be wrong.
> >
> > Yeah, that's what I see here as well. But exactly that is utterly nonsense.
> >
> > Those pages are not "free" in the sense that get_free_page could return
> > them directly.
>
> Well on Android that is kinda true, because Android has it's
> oom-killer (way back it was just a shrinker callback, not sure how it
> works now), which just shot down all the background apps. So at least
> some of that (everything used by background apps) is indeed
> reclaimable on Android.
>
> But that doesn't hold on Linux in general, so we can't really do this
> for common code.
>
> Also I had a long meeting with Suren, John and other googles
> yesterday, and the aim is now to try and support all the Android gpu
> memory accounting needs with cgroups. That should work, and it will
> allow Android to handle all the Android-ism in a clean way in upstream
> code. Or that's at least the plan.
>
> I think the only thing we identified that Android still needs on top
> is the dma-buf sysfs stuff, so that shared buffers (which on Android
> are always dma-buf, and always stay around as dma-buf fd throughout
> their lifetime) can be listed/analyzed with full detail.
>
> But aside from this the plan for all the per-process or per-heap
> account, oom-killer integration and everything else is planned to be
> done with cgroups.

Until cgroups are ready we probably will need to add a sysfs node to
report the total dmabuf pool size and I think that would cover our
current accounting need here.
As I mentioned, not including dmabuf pools into MemAvailable would
affect that stat and I'm wondering if pools should be considered as
part of MemAvailable or not. Since MemAvailable includes SReclaimable
I think it makes sense to include them but maybe there are other
considerations that I'm missing?

> Android (for now) only needs to account overall gpu
> memory since none of it is swappable on android drivers anyway, plus
> no vram, so not much needed.
>
> Cheers, Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > >> In the best case this just messes up the accounting, in the worst case
> > >> it can cause memory corruption.
> > >>
> > >> Christian.
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Feb. 10, 2021, 5:21 p.m. UTC | #11
On Wed, Feb 10, 2021 at 5:39 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Feb 10, 2021 at 5:06 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Feb 09, 2021 at 12:16:51PM -0800, Suren Baghdasaryan wrote:
> > > On Tue, Feb 9, 2021 at 12:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Tue, Feb 9, 2021 at 6:46 PM Christian König <christian.koenig@amd.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Am 09.02.21 um 18:33 schrieb Suren Baghdasaryan:
> > > > > > On Tue, Feb 9, 2021 at 4:57 AM Christian König <christian.koenig@amd.com> wrote:
> > > > > >> Am 09.02.21 um 13:11 schrieb Christian König:
> > > > > >>> [SNIP]
> > > > > >>>>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
> > > > > >>>>>> +{
> > > > > >>>>>> +     spin_lock(&pool->lock);
> > > > > >>>>>> +     list_add_tail(&page->lru, &pool->items);
> > > > > >>>>>> +     pool->count++;
> > > > > >>>>>> +     atomic_long_add(1 << pool->order, &total_pages);
> > > > > >>>>>> +     spin_unlock(&pool->lock);
> > > > > >>>>>> +
> > > > > >>>>>> +     mod_node_page_state(page_pgdat(page),
> > > > > >>>>>> NR_KERNEL_MISC_RECLAIMABLE,
> > > > > >>>>>> +                         1 << pool->order);
> > > > > >>>>> Hui what? What should that be good for?
> > > > > >>>> This is a carryover from the ION page pool implementation:
> > > > > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cdccccff8edcd4d147a5b08d8cd20cff2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637484888114923580%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9%2BIBC0tezSV6Ci4S3kWfW%2BQvJm4mdunn3dF6C0kyfCw%3D&amp;reserved=0
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> My sense is it helps with the vmstat/meminfo accounting so folks can
> > > > > >>>> see the cached pages are shrinkable/freeable. This maybe falls under
> > > > > >>>> other dmabuf accounting/stats discussions, so I'm happy to remove it
> > > > > >>>> for now, or let the drivers using the shared page pool logic handle
> > > > > >>>> the accounting themselves?
> > > > > >> Intentionally separated the discussion for that here.
> > > > > >>
> > > > > >> As far as I can see this is just bluntly incorrect.
> > > > > >>
> > > > > >> Either the page is reclaimable or it is part of our pool and freeable
> > > > > >> through the shrinker, but never ever both.
> > > > > > IIRC the original motivation for counting ION pooled pages as
> > > > > > reclaimable was to include them into /proc/meminfo's MemAvailable
> > > > > > calculations. NR_KERNEL_MISC_RECLAIMABLE defined as "reclaimable
> > > > > > non-slab kernel pages" seems like a good place to account for them but
> > > > > > I might be wrong.
> > > > >
> > > > > Yeah, that's what I see here as well. But exactly that is utterly nonsense.
> > > > >
> > > > > Those pages are not "free" in the sense that get_free_page could return
> > > > > them directly.
> > > >
> > > > Well on Android that is kinda true, because Android has it's
> > > > oom-killer (way back it was just a shrinker callback, not sure how it
> > > > works now), which just shot down all the background apps. So at least
> > > > some of that (everything used by background apps) is indeed
> > > > reclaimable on Android.
> > > >
> > > > But that doesn't hold on Linux in general, so we can't really do this
> > > > for common code.
> > > >
> > > > Also I had a long meeting with Suren, John and other googles
> > > > yesterday, and the aim is now to try and support all the Android gpu
> > > > memory accounting needs with cgroups. That should work, and it will
> > > > allow Android to handle all the Android-ism in a clean way in upstream
> > > > code. Or that's at least the plan.
> > > >
> > > > I think the only thing we identified that Android still needs on top
> > > > is the dma-buf sysfs stuff, so that shared buffers (which on Android
> > > > are always dma-buf, and always stay around as dma-buf fd throughout
> > > > their lifetime) can be listed/analyzed with full detail.
> > > >
> > > > But aside from this the plan for all the per-process or per-heap
> > > > account, oom-killer integration and everything else is planned to be
> > > > done with cgroups.
> > >
> > > Until cgroups are ready we probably will need to add a sysfs node to
> > > report the total dmabuf pool size and I think that would cover our
> > > current accounting need here.
> > > As I mentioned, not including dmabuf pools into MemAvailable would
> > > affect that stat and I'm wondering if pools should be considered as
> > > part of MemAvailable or not. Since MemAvailable includes SReclaimable
> > > I think it makes sense to include them but maybe there are other
> > > considerations that I'm missing?
> >
> > On Android, yes, on upstream, not so much. Because upstream doesn't have
> > the android low memory killer cleanup up all the apps, so effectively we
> > can't reclaim that memory, and we shouldn't report it as such.
> > -Daniel
>
> Hmm. Sorry, I fail to see why Android's low memory killer makes a
> difference here. In my mind, the pages in the pools are not used but
> kept there in case heaps need them (maybe that's the part I'm wrong?).
> These pages can be freed by the shrinker if memory pressure rises. In
> that sense I think it's very similar to reclaimable slabs which are
> already accounted as part of MemAvailable. So it seems logical to me
> to include unused pages in the pools here as well. What am I missing?

Ah yes, those page pool pages we can list. But conceptually (at least
in the internals) they're shrunk through mm shrinker callbacks, like
slab cache memory. So not exactly sure where to list that.

Since we have the same pools for gpu allocations on the ttm side and
John is looking into unifying those, maybe we could add that as a
patch on top? For some nice consistency across all gpu drivers from
android to discrete. I think if you, John and Christian from ttm side
can figure out how these page pools should be reported we'll have
something that fits? Maybe John can ping you on the other thread with
the shared pool rfc between ttm and dma-buf heaps (there's so much
going right now all over I'm a bit lost).

Cheers, Daniel

>
> >
> > >
> > > > Android (for now) only needs to account overall gpu
> > > > memory since none of it is swappable on android drivers anyway, plus
> > > > no vram, so not much needed.
> > > >
> > > > Cheers, Daniel
> > > >
> > > > >
> > > > > Regards,
> > > > > Christian.
> > > > >
> > > > > >
> > > > > >> In the best case this just messes up the accounting, in the worst case
> > > > > >> it can cause memory corruption.
> > > > > >>
> > > > > >> Christian.
> > > > >
> > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Suren Baghdasaryan Feb. 10, 2021, 7:12 p.m. UTC | #12
On Wed, Feb 10, 2021 at 10:32 AM Christian König
<christian.koenig@amd.com> wrote:
>
>
>
> Am 10.02.21 um 17:39 schrieb Suren Baghdasaryan:
> > On Wed, Feb 10, 2021 at 5:06 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> On Tue, Feb 09, 2021 at 12:16:51PM -0800, Suren Baghdasaryan wrote:
> >>> On Tue, Feb 9, 2021 at 12:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>> On Tue, Feb 9, 2021 at 6:46 PM Christian König <christian.koenig@amd.com> wrote:
> >>>>>
> >>>>>
> >>>>> Am 09.02.21 um 18:33 schrieb Suren Baghdasaryan:
> >>>>>> On Tue, Feb 9, 2021 at 4:57 AM Christian König <christian.koenig@amd.com> wrote:
> >>>>>>> Am 09.02.21 um 13:11 schrieb Christian König:
> >>>>>>>> [SNIP]
> >>>>>>>>>>> +void drm_page_pool_add(struct drm_page_pool *pool, struct page *page)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +     spin_lock(&pool->lock);
> >>>>>>>>>>> +     list_add_tail(&page->lru, &pool->items);
> >>>>>>>>>>> +     pool->count++;
> >>>>>>>>>>> +     atomic_long_add(1 << pool->order, &total_pages);
> >>>>>>>>>>> +     spin_unlock(&pool->lock);
> >>>>>>>>>>> +
> >>>>>>>>>>> +     mod_node_page_state(page_pgdat(page),
> >>>>>>>>>>> NR_KERNEL_MISC_RECLAIMABLE,
> >>>>>>>>>>> +                         1 << pool->order);
> >>>>>>>>>> Hui what? What should that be good for?
> >>>>>>>>> This is a carryover from the ION page pool implementation:
> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_page_pool.c%3Fh%3Dv5.10%23n28&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cbb7155447ee149a49f3a08d8cde2685d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637485719618339413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=IYsJoAd7SUo12V7tS3CCRqNVm569iy%2FtoXQqm2MdC1g%3D&amp;reserved=0
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> My sense is it helps with the vmstat/meminfo accounting so folks can
> >>>>>>>>> see the cached pages are shrinkable/freeable. This maybe falls under
> >>>>>>>>> other dmabuf accounting/stats discussions, so I'm happy to remove it
> >>>>>>>>> for now, or let the drivers using the shared page pool logic handle
> >>>>>>>>> the accounting themselves?
> >>>>>>> Intentionally separated the discussion for that here.
> >>>>>>>
> >>>>>>> As far as I can see this is just bluntly incorrect.
> >>>>>>>
> >>>>>>> Either the page is reclaimable or it is part of our pool and freeable
> >>>>>>> through the shrinker, but never ever both.
> >>>>>> IIRC the original motivation for counting ION pooled pages as
> >>>>>> reclaimable was to include them into /proc/meminfo's MemAvailable
> >>>>>> calculations. NR_KERNEL_MISC_RECLAIMABLE defined as "reclaimable
> >>>>>> non-slab kernel pages" seems like a good place to account for them but
> >>>>>> I might be wrong.
> >>>>> Yeah, that's what I see here as well. But exactly that is utterly nonsense.
> >>>>>
> >>>>> Those pages are not "free" in the sense that get_free_page could return
> >>>>> them directly.
> >>>> Well on Android that is kinda true, because Android has it's
> >>>> oom-killer (way back it was just a shrinker callback, not sure how it
> >>>> works now), which just shot down all the background apps. So at least
> >>>> some of that (everything used by background apps) is indeed
> >>>> reclaimable on Android.
> >>>>
> >>>> But that doesn't hold on Linux in general, so we can't really do this
> >>>> for common code.
> >>>>
> >>>> Also I had a long meeting with Suren, John and other googles
> >>>> yesterday, and the aim is now to try and support all the Android gpu
> >>>> memory accounting needs with cgroups. That should work, and it will
> >>>> allow Android to handle all the Android-ism in a clean way in upstream
> >>>> code. Or that's at least the plan.
> >>>>
> >>>> I think the only thing we identified that Android still needs on top
> >>>> is the dma-buf sysfs stuff, so that shared buffers (which on Android
> >>>> are always dma-buf, and always stay around as dma-buf fd throughout
> >>>> their lifetime) can be listed/analyzed with full detail.
> >>>>
> >>>> But aside from this the plan for all the per-process or per-heap
> >>>> account, oom-killer integration and everything else is planned to be
> >>>> done with cgroups.
> >>> Until cgroups are ready we probably will need to add a sysfs node to
> >>> report the total dmabuf pool size and I think that would cover our
> >>> current accounting need here.
> >>> As I mentioned, not including dmabuf pools into MemAvailable would
> >>> affect that stat and I'm wondering if pools should be considered as
> >>> part of MemAvailable or not. Since MemAvailable includes SReclaimable
> >>> I think it makes sense to include them but maybe there are other
> >>> considerations that I'm missing?
> >> On Android, yes, on upstream, not so much. Because upstream doesn't have
> >> the android low memory killer cleanup up all the apps, so effectively we
> >> can't reclaim that memory, and we shouldn't report it as such.
> >> -Daniel
> > Hmm. Sorry, I fail to see why Android's low memory killer makes a
> > difference here. In my mind, the pages in the pools are not used but
> > kept there in case heaps need them (maybe that's the part I'm wrong?).
> > These pages can be freed by the shrinker if memory pressure rises.
>
> And exactly that's the difference. They *can* be freed is not the same
> thing as they *are* free.

No argument there. That's why I think meminfo has two separate stats
for MemFree and MemAvailable. MemFree is self-explanatory. The
description of MemAvailable in
https://www.kernel.org/doc/Documentation/filesystems/proc.txt says "An
estimate of how much memory is available for starting new
applications, without swapping.". Since dropping unused pages from
slabs, caches and pools is less expensive than swapping, I would
assume that a well-behaved system would do that before resorting to
swapping. And if so, such memory should be included in MemAvailable
because VM will make it available before swapping. But again, that's
my interpretation. WDYT?

>
> > In that sense I think it's very similar to reclaimable slabs which are
> > already accounted as part of MemAvailable. So it seems logical to me
> > to include unused pages in the pools here as well. What am I missing?
>
> See the shrinkers are there because you need to do some action before
> you can re-use the memory.
>
> In the case of the TTM/DRM pool for example you need to change the
> caching attributes which might cause sleeping for a TLB flush to finish.

I see your point here. But how about caches/pools which can be easily
dropped? Shouldn't they be part of MemAvailable?

>
> By accounting those pages as free you mess up (for example) the handling
> which makes sure that there are enough emergency reserves. I can only
> strongly recommend to not do that.
>
> What you could do is to add a sysfs interface to expose the different
> shrinkers and the amount of pages in them to userspace. Similar to what
> /proc/slabinfo is doing.

True, we can work around this. Just want to make sure whatever we do
really makes sense.
Thanks,
Suren.

>
> Regards,
> Christian.
>
> >
> >>>> Android (for now) only needs to account overall gpu
> >>>> memory since none of it is swappable on android drivers anyway, plus
> >>>> no vram, so not much needed.
> >>>>
> >>>> Cheers, Daniel
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>>> In the best case this just messes up the accounting, in the worst case
> >>>>>>> it can cause memory corruption.
> >>>>>>>
> >>>>>>> Christian.
> >>>>
> >>>> --
> >>>> Daniel Vetter
> >>>> Software Engineer, Intel Corporation
> >>>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cbb7155447ee149a49f3a08d8cde2685d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637485719618349407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=11ENl28PIoSoUx6FhkEK9u4G6yiLc3YhsYsl1DIzsv8%3D&amp;reserved=0
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cbb7155447ee149a49f3a08d8cde2685d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637485719618349407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=11ENl28PIoSoUx6FhkEK9u4G6yiLc3YhsYsl1DIzsv8%3D&amp;reserved=0
>