mbox series

[0/3] Add dma-buf secure-heap

Message ID 20220805135330.970-1-olivier.masse@nxp.com
Headers show
Series Add dma-buf secure-heap | expand

Message

Olivier Masse Aug. 5, 2022, 1:53 p.m. UTC
Purpose of these patches is to add a new dma-buf heap: linaro,secure-heap
Linaro OPTEE OS Secure Data Path feature is relying on a reserved memory
defined at Linux Kernel level and OPTEE OS level.

Comments

Brian Starkey Aug. 5, 2022, 3:41 p.m. UTC | #1
Hi Olivier,

Thanks, I think this is looking much better.

I'd like to know how others feel about landing this heap; there's been
push-back in the past about heaps in device-tree and discussions
around how "custom" heaps should be treated (though IMO this is quite
a generic one).

On Fri, Aug 05, 2022 at 03:53:28PM +0200, Olivier Masse wrote:
> add Linaro secure heap bindings: linaro,secure-heap
> use genalloc to allocate/free buffer from buffer pool.
> buffer pool info is from dts.
> use sg_table instore the allocated memory info, the length of sg_table is 1.
> implement secure_heap_buf_ops to implement buffer share in difference device:
> 1. Userspace passes this fd to all drivers it wants this buffer
> to share with: First the filedescriptor is converted to a &dma_buf using
> dma_buf_get(). Then the buffer is attached to the device using dma_buf_attach().
> 2. Once the buffer is attached to all devices userspace can initiate DMA
> access to the shared buffer. In the kernel this is done by calling dma_buf_map_attachment()
> 3. get sg_table with dma_buf_map_attachment in difference device.
> 

I think this commit message could use a little rework. A few thoughts:

* The bindings are in a separate commit, so seems strange to mention
  here.
* "buffer pool info is from dts" --> I think you should mention that
  this uses a reserved-memory region.
* sg_table nents and genalloc seem like low-level implementation
  details, so probably not needed in the commit message
* The usage steps 1, 2, 3 aren't specific to this heap - that's how
  all dma-buf sharing works.

> Signed-off-by: Olivier Masse <olivier.masse@nxp.com>
> ---
>  drivers/dma-buf/heaps/Kconfig       |   9 +
>  drivers/dma-buf/heaps/Makefile      |   1 +
>  drivers/dma-buf/heaps/secure_heap.c | 357 ++++++++++++++++++++++++++++
>  3 files changed, 367 insertions(+)
>  create mode 100644 drivers/dma-buf/heaps/secure_heap.c
> 
> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index 3782eeeb91c0..c9070c728b9a 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
> @@ -20,3 +20,12 @@ config DMABUF_HEAPS_DSP
>            Choose this option to enable the dsp dmabuf heap. The dsp heap
>            is allocated by gen allocater. it's allocated according the dts.
>            If in doubt, say Y.
> +
> +config DMABUF_HEAPS_SECURE
> +	tristate "DMA-BUF Secure Heap"
> +	depends on DMABUF_HEAPS
> +	help
> +	  Choose this option to enable the secure dmabuf heap. The secure heap
> +	  pools are defined according to the DT. Heaps are allocated
> +	  in the pools using gen allocater.
> +	  If in doubt, say Y.
> diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
> index 29733f84c354..863ef10056a3 100644
> --- a/drivers/dma-buf/heaps/Makefile
> +++ b/drivers/dma-buf/heaps/Makefile
> @@ -2,3 +2,4 @@
>  obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_DSP)          += dsp_heap.o
> +obj-$(CONFIG_DMABUF_HEAPS_SECURE)	+= secure_heap.o
> diff --git a/drivers/dma-buf/heaps/secure_heap.c b/drivers/dma-buf/heaps/secure_heap.c
> new file mode 100644
> index 000000000000..25b3629613f3
> --- /dev/null
> +++ b/drivers/dma-buf/heaps/secure_heap.c
> @@ -0,0 +1,357 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * DMABUF secure heap exporter
> + *
> + * Copyright 2021 NXP.

It's 2022 :-)

> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-heap.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/err.h>
> +#include <linux/genalloc.h>
> +#include <linux/highmem.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +#include <linux/of_reserved_mem.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +
> +#define MAX_SECURE_HEAP 2
> +#define MAX_HEAP_NAME_LEN 32
> +
> +struct secure_heap_buffer {
> +	struct dma_heap *heap;
> +	struct list_head attachments;
> +	struct mutex lock;
> +	unsigned long len;
> +	struct sg_table sg_table;
> +	int vmap_cnt;
> +	void *vaddr;
> +};
> +
> +struct secure_heap_attachment {
> +	struct device *dev;
> +	struct sg_table *table;
> +	struct list_head list;
> +};
> +
> +struct secure_heap_info {
> +	struct gen_pool *pool;
> +};
> +
> +struct rmem_secure {
> +	phys_addr_t base;
> +	phys_addr_t size;
> +
> +	char name[MAX_HEAP_NAME_LEN];
> +};
> +
> +static struct rmem_secure secure_data[MAX_SECURE_HEAP] = {0};
> +static unsigned int secure_data_count;
> +
> +static struct sg_table *dup_sg_table(struct sg_table *table)
> +{
> +	struct sg_table *new_table;
> +	int ret, i;
> +	struct scatterlist *sg, *new_sg;
> +
> +	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
> +	if (!new_table)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
> +	if (ret) {
> +		kfree(new_table);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	new_sg = new_table->sgl;
> +	for_each_sgtable_sg(table, sg, i) {
> +		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
> +		new_sg->dma_address = sg->dma_address;
> +#ifdef CONFIG_NEED_SG_DMA_LENGTH
> +		new_sg->dma_length = sg->dma_length;
> +#endif
> +		new_sg = sg_next(new_sg);
> +	}
> +
> +	return new_table;
> +}
> +
> +static int secure_heap_attach(struct dma_buf *dmabuf,
> +			      struct dma_buf_attachment *attachment)
> +{
> +	struct secure_heap_buffer *buffer = dmabuf->priv;
> +	struct secure_heap_attachment *a;
> +	struct sg_table *table;
> +
> +	a = kzalloc(sizeof(*a), GFP_KERNEL);
> +	if (!a)
> +		return -ENOMEM;
> +
> +	table = dup_sg_table(&buffer->sg_table);
> +	if (IS_ERR(table)) {
> +		kfree(a);
> +		return -ENOMEM;

nit: You could return PTR_ERR(table), in case dup_sg_table starts
returning other errors.

> +	}
> +
> +	a->table = table;
> +	a->dev = attachment->dev;
> +	INIT_LIST_HEAD(&a->list);
> +	attachment->priv = a;
> +
> +	mutex_lock(&buffer->lock);
> +	list_add(&a->list, &buffer->attachments);
> +	mutex_unlock(&buffer->lock);
> +
> +	return 0;
> +}
> +
> +static void secure_heap_detach(struct dma_buf *dmabuf,
> +			       struct dma_buf_attachment *attachment)
> +{
> +	struct secure_heap_buffer *buffer = dmabuf->priv;
> +	struct secure_heap_attachment *a = attachment->priv;
> +
> +	mutex_lock(&buffer->lock);
> +	list_del(&a->list);
> +	mutex_unlock(&buffer->lock);
> +
> +	sg_free_table(a->table);
> +	kfree(a->table);
> +	kfree(a);
> +}
> +
> +static struct sg_table *secure_heap_map_dma_buf(struct dma_buf_attachment *attachment,
> +						enum dma_data_direction direction)
> +{
> +	struct secure_heap_attachment *a = attachment->priv;
> +
> +	return a->table;

I think you still need to implement mapping and unmapping using the
DMA APIs. For example devices might be behind IOMMUs and the buffer
will need mapping into the IOMMU.

> +}
> +
> +static void secure_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
> +				      struct sg_table *table,
> +				      enum dma_data_direction direction)
> +{
> +}
> +
> +static void secure_heap_dma_buf_release(struct dma_buf *dmabuf)
> +{
> +	struct secure_heap_buffer *buffer = dmabuf->priv;
> +	struct secure_heap_info *info;
> +	struct sg_table *table;
> +	struct scatterlist *sg;
> +	int i;
> +
> +	info = dma_heap_get_drvdata(buffer->heap);
> +
> +	table = &buffer->sg_table;
> +	for_each_sg(table->sgl, sg, table->nents, i)
> +		gen_pool_free(info->pool, sg_dma_address(sg), sg_dma_len(sg));
> +
> +	sg_free_table(table);
> +	kfree(buffer);
> +}
> +
> +static const struct dma_buf_ops secure_heap_buf_ops = {
> +	.attach = secure_heap_attach,
> +	.detach = secure_heap_detach,
> +	.map_dma_buf = secure_heap_map_dma_buf,
> +	.unmap_dma_buf = secure_heap_unmap_dma_buf,
> +	.release = secure_heap_dma_buf_release,
> +};
> +
> +static struct dma_buf *secure_heap_allocate(struct dma_heap *heap,
> +					    unsigned long len,
> +					    unsigned long fd_flags,
> +					    unsigned long heap_flags)
> +{
> +	struct secure_heap_buffer *buffer;
> +	struct secure_heap_info *info = dma_heap_get_drvdata(heap);
> +	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +	unsigned long size = roundup(len, PAGE_SIZE);
> +	struct dma_buf *dmabuf;
> +	struct sg_table *table;
> +	int ret = -ENOMEM;
> +	unsigned long phy_addr;
> +
> +	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
> +	if (!buffer)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&buffer->attachments);
> +	mutex_init(&buffer->lock);
> +	buffer->heap = heap;
> +	buffer->len = size;
> +
> +	phy_addr = gen_pool_alloc(info->pool, size);
> +	if (!phy_addr)
> +		goto free_buffer;
> +
> +	table = &buffer->sg_table;
> +	if (sg_alloc_table(table, 1, GFP_KERNEL))
> +		goto free_pool;
> +
> +	sg_set_page(table->sgl,	phys_to_page(phy_addr),	size, 0);
> +	sg_dma_address(table->sgl) = phy_addr;
> +	sg_dma_len(table->sgl) = size;
> +
> +	/* create the dmabuf */
> +	exp_info.exp_name = dma_heap_get_name(heap);
> +	exp_info.ops = &secure_heap_buf_ops;
> +	exp_info.size = buffer->len;
> +	exp_info.flags = fd_flags;
> +	exp_info.priv = buffer;
> +	dmabuf = dma_buf_export(&exp_info);
> +	if (IS_ERR(dmabuf)) {
> +		ret = PTR_ERR(dmabuf);
> +		goto free_pages;
> +	}
> +
> +	return dmabuf;
> +
> +free_pages:

Should maybe be called free_table:

> +	sg_free_table(table);
> +
> +free_pool:
> +	gen_pool_free(info->pool, phy_addr, size);
> +
> +free_buffer:
> +	mutex_destroy(&buffer->lock);
> +	kfree(buffer);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static const struct dma_heap_ops secure_heap_ops = {
> +	.allocate = secure_heap_allocate,
> +};
> +
> +static int secure_heap_add(struct rmem_secure *rmem)
> +{
> +	struct dma_heap *secure_heap;
> +	struct dma_heap_export_info exp_info;
> +	struct secure_heap_info *info = NULL;
> +	struct gen_pool *pool = NULL;
> +	int ret = -EINVAL;
> +
> +	if (rmem->base == 0 || rmem->size == 0) {
> +		pr_err("secure_data base or size is not correct\n");
> +		goto error;
> +	}
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info) {
> +		pr_err("dmabuf info allocation failed\n");
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
> +	pool = gen_pool_create(PAGE_SHIFT, -1);
> +	if (!pool) {
> +		pr_err("can't create gen pool\n");
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
> +	if (gen_pool_add(pool, rmem->base, rmem->size, -1) < 0) {
> +		pr_err("failed to add memory into pool\n");
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
> +	info->pool = pool;
> +
> +	exp_info.name = rmem->name;
> +	exp_info.ops = &secure_heap_ops;
> +	exp_info.priv = info;
> +
> +	secure_heap = dma_heap_add(&exp_info);
> +	if (IS_ERR(secure_heap)) {
> +		pr_err("dmabuf secure heap allocation failed\n");
> +		ret = PTR_ERR(secure_heap);
> +		goto error;
> +	}
> +
> +	return 0;
> +
> +error:
> +	kfree(info);
> +	if (pool)
> +		gen_pool_destroy(pool);

nit: I think your order should be reversed here, to match the opposite
order of allocation.

> +
> +	return ret;
> +}
> +
> +static int secure_heap_create(void)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < secure_data_count; i++) {
> +		ret = secure_heap_add(&secure_data[i]);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static int rmem_secure_heap_device_init(struct reserved_mem *rmem,
> +					 struct device *dev)
> +{
> +	dev_set_drvdata(dev, rmem);
> +	return 0;
> +}
> +
> +static void rmem_secure_heap_device_release(struct reserved_mem *rmem,
> +					 struct device *dev)
> +{
> +	dev_set_drvdata(dev, NULL);
> +}
> +
> +static const struct reserved_mem_ops rmem_dma_ops = {
> +	.device_init    = rmem_secure_heap_device_init,
> +	.device_release = rmem_secure_heap_device_release,
> +};

What are these reserved_mem_ops for? Setting the drvdata for a random
device seems like it could cause lots of problems.

Is there a requirement to support assigning this SDP reserved-memory
region to a specific device? If not, I think you can just drop this.
Otherwise, I think you need some other mechanism to do the
association.

> +
> +static int __init rmem_secure_heap_setup(struct reserved_mem *rmem)
> +{
> +	if (secure_data_count < MAX_SECURE_HEAP) {
> +		int name_len = 0;
> +		const char *s = rmem->name;
> +
> +		secure_data[secure_data_count].base = rmem->base;
> +		secure_data[secure_data_count].size = rmem->size;
> +
> +		while (name_len < MAX_HEAP_NAME_LEN) {
> +			if ((*s == '@') || (*s == '\0'))
> +				break;
> +			name_len++;
> +			s++;
> +		}
> +		if (name_len == MAX_HEAP_NAME_LEN)
> +			name_len--;
> +
> +		strncpy(secure_data[secure_data_count].name, rmem->name, name_len);

I think it would be good to explicitly do:

  secure_data[secure_data_count].name[name_len] = '\0'

I know it's zero-initialised, but that's done on a line far away, so
may be best to be defensive.

> +
> +		rmem->ops = &rmem_dma_ops;
> +		pr_info("Reserved memory: DMA buf secure pool %s at %pa, size %ld MiB\n",
> +			secure_data[secure_data_count].name,
> +			&rmem->base, (unsigned long)rmem->size / SZ_1M);

nit: What if rmem->size < SZ_1M, or not 1M-aligned

> +
> +		secure_data_count++;
> +		return 0;
> +	}
> +	WARN_ONCE(1, "Cannot handle more than %u secure heaps\n", MAX_SECURE_HEAP);
> +	return -EINVAL;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(secure_heap, "linaro,secure-heap", rmem_secure_heap_setup);

Is there anything linaro-specific about this? Could it be
linux,secure-heap?

Thanks,
-Brian

> +
> +module_init(secure_heap_create);
> +MODULE_LICENSE("GPL v2");
> -- 
> 2.25.0
>
Brian Starkey Aug. 12, 2022, 4:39 p.m. UTC | #2
Hi,

On Mon, Aug 08, 2022 at 02:39:53PM +0000, Olivier Masse wrote:
> Hi Brian,
> 
> On ven., 2022-08-05 at 16:41 +0100, Brian Starkey wrote:
> > Caution: EXT Email
> > 
> > Hi Olivier,
> > 
> > Thanks, I think this is looking much better.
> > 
> > I'd like to know how others feel about landing this heap; there's
> > been
> > push-back in the past about heaps in device-tree and discussions
> > around how "custom" heaps should be treated (though IMO this is quite
> > a generic one).
> > 
> > On Fri, Aug 05, 2022 at 03:53:28PM +0200, Olivier Masse wrote:
> > > add Linaro secure heap bindings: linaro,secure-heap
> > > use genalloc to allocate/free buffer from buffer pool.
> > > buffer pool info is from dts.
> > > use sg_table instore the allocated memory info, the length of
> > > sg_table is 1.
> > > implement secure_heap_buf_ops to implement buffer share in
> > > difference device:
> > > 1. Userspace passes this fd to all drivers it wants this buffer
> > > to share with: First the filedescriptor is converted to a &dma_buf
> > > using
> > > dma_buf_get(). Then the buffer is attached to the device using
> > > dma_buf_attach().
> > > 2. Once the buffer is attached to all devices userspace can
> > > initiate DMA
> > > access to the shared buffer. In the kernel this is done by calling
> > > dma_buf_map_attachment()
> > > 3. get sg_table with dma_buf_map_attachment in difference device.
> > > 
> > 
> > I think this commit message could use a little rework. A few
> > thoughts:
> > 
> > * The bindings are in a separate commit, so seems strange to mention
> >   here.
> 
> what about:
> "add Linaro secure heap compatible reserved memory: linaro,secure-heap"
> 

I'd say something like:

Add a dma-buf heap to allocate secure buffers from a reserved-memory
region.

..snip

> > > +
> > > +static struct sg_table *secure_heap_map_dma_buf(struct
> > > dma_buf_attachment *attachment,
> > > +                                             enum
> > > dma_data_direction direction)
> > > +{
> > > +     struct secure_heap_attachment *a = attachment->priv;
> > > +
> > > +     return a->table;
> > 
> > I think you still need to implement mapping and unmapping using the
> > DMA APIs. For example devices might be behind IOMMUs and the buffer
> > will need mapping into the IOMMU.
> 
> Devices that will need access to the buffer must be in secure.
> The tee driver will only need the scatter-list table to get dma address
> and len. Mapping will be done in the TEE.
> Please find tee_shm_register_fd in the following commit
> https://github.com/linaro-swg/linux/commit/41e21e5c405530590dc2dd10b2a8dbe64589840f
> 
> This patch need to be up-streamed as well.
> 

Interesting, that's not how the devices I've worked on operated.

Are you saying that you have to have a display controller driver
running in the TEE to display one of these buffers? If everything
needs to be in the TEE, then why even have these buffers allocated
by non-secure Linux at all?

I would have expected there to be HW enforcement of buffer access,
but for the display driver to be in non-secure Linux. That's how
TZMP1 works: https://static.linaro.org/connect/hkg18/presentations/hkg18-408.pdf

Looking at this SDP presentation, that also seems to be the case
there: https://static.linaro.org/connect/san19/presentations/san19-107.pdf

Based on those presentations, I think this heap should be implementing
map_dma_buf in the "normal" way, using the DMA API to map the buffer
to the device. It's up to the TEE and HW firewall to prevent access
to those mappings from non-secure devices.

My understanding is:

* The memory region should never be mapped or accessed from the host
  CPU. This is not a security requirement - the CPU will be denied
  access by whatever hardware is enforcing security - but any CPU
  accesses will fail, so there is no point in ever having a CPU
  mapping.
* The allocated buffers _should_ be mapped to devices via map_dma_buf.
  Again the HW enforcement will prevent access from devices which
  aren't permitted access, but for example a display controller
  may be allowed to read the secure buffer, composite it with other
  buffers, and display it on the screen.

Am I wrong? Even if SDP doesn't work this way, I think we should make
the heap as generic as possible so that it can work with different
secure video implementations.

> 
> > 

.. snip

> > > +
> > > +RESERVEDMEM_OF_DECLARE(secure_heap, "linaro,secure-heap",
> > > rmem_secure_heap_setup);
> > 
> > Is there anything linaro-specific about this? Could it be
> > linux,secure-heap?
> 
> for now, it's specific to Linaro OPTEE OS.
> but in a more generic way, it could be 
> linux,unmapped-heap ?

If these buffers can never be mapped, not to the CPU nor to devices,
then actually I don't see why it should be a dma-buf heap at all.

If this is just an interface to associate some identifier (in this
case an fd) with a region of physical address space, then why is it
useful to pretend that it's a dma-buf, if none of the dma-buf
operations actually do anything?

Cheers,
-Brian

> 
> > 
> > Thanks,
> > -Brian
> > 
> > > +
> > > +module_init(secure_heap_create);
> > > +MODULE_LICENSE("GPL v2");
> > > --
> > > 2.25.0
> > >
Olivier Masse Aug. 16, 2022, 11:20 a.m. UTC | #3
Hi Brian,


On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> Caution: EXT Email
> 
> Hi,
> 
> On Mon, Aug 08, 2022 at 02:39:53PM +0000, Olivier Masse wrote:
> > Hi Brian,
> > 
> > On ven., 2022-08-05 at 16:41 +0100, Brian Starkey wrote:
> > > Caution: EXT Email
> > > 
> > > Hi Olivier,
> > > 
> > > Thanks, I think this is looking much better.
> > > 
> > > I'd like to know how others feel about landing this heap; there's
> > > been
> > > push-back in the past about heaps in device-tree and discussions
> > > around how "custom" heaps should be treated (though IMO this is
> > > quite
> > > a generic one).
> > > 
> > > On Fri, Aug 05, 2022 at 03:53:28PM +0200, Olivier Masse wrote:
> > > > add Linaro secure heap bindings: linaro,secure-heap
> > > > use genalloc to allocate/free buffer from buffer pool.
> > > > buffer pool info is from dts.
> > > > use sg_table instore the allocated memory info, the length of
> > > > sg_table is 1.
> > > > implement secure_heap_buf_ops to implement buffer share in
> > > > difference device:
> > > > 1. Userspace passes this fd to all drivers it wants this buffer
> > > > to share with: First the filedescriptor is converted to a
> > > > &dma_buf
> > > > using
> > > > dma_buf_get(). Then the buffer is attached to the device using
> > > > dma_buf_attach().
> > > > 2. Once the buffer is attached to all devices userspace can
> > > > initiate DMA
> > > > access to the shared buffer. In the kernel this is done by
> > > > calling
> > > > dma_buf_map_attachment()
> > > > 3. get sg_table with dma_buf_map_attachment in difference
> > > > device.
> > > > 
> > > 
> > > I think this commit message could use a little rework. A few
> > > thoughts:
> > > 
> > > * The bindings are in a separate commit, so seems strange to
> > > mention
> > >   here.
> > 
> > what about:
> > "add Linaro secure heap compatible reserved memory: linaro,secure-
> > heap"
> > 
> 
> I'd say something like:
> 
> Add a dma-buf heap to allocate secure buffers from a reserved-memory
> region.
> 
> ..snip

ok right.

> 
> > > > +
> > > > +static struct sg_table *secure_heap_map_dma_buf(struct
> > > > dma_buf_attachment *attachment,
> > > > +                                             enum
> > > > dma_data_direction direction)
> > > > +{
> > > > +     struct secure_heap_attachment *a = attachment->priv;
> > > > +
> > > > +     return a->table;
> > > 
> > > I think you still need to implement mapping and unmapping using
> > > the
> > > DMA APIs. For example devices might be behind IOMMUs and the
> > > buffer
> > > will need mapping into the IOMMU.
> > 
> > Devices that will need access to the buffer must be in secure.
> > The tee driver will only need the scatter-list table to get dma
> > address
> > and len. Mapping will be done in the TEE.
> > Please find tee_shm_register_fd in the following commit
> > 
https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Flinaro-swg%2Flinux%2Fcommit%2F41e21e5c405530590dc2dd10b2a8dbe64589840f&amp;data=05%7C01%7Colivier.masse%40nxp.com%7C6b3d47f1e15c41a8cf7108da7c813ef6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637959191795668899%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=OKZhaNevD5dj7Wjm6zbZlij0mPA9XYyio1NAN3VjTVM%3D&amp;reserved=0
> > 
> > This patch need to be up-streamed as well.
> > 
> 
> Interesting, that's not how the devices I've worked on operated.
> 
> Are you saying that you have to have a display controller driver
> running in the TEE to display one of these buffers?

In fact the display controller is managing 3 plans : UI, PiP and
video. The video plan is protected in secure as you can see on slide
11:
https://static.linaro.org/connect/san19/presentations/san19-107.pdf

The DCSS (display controller) is able to read from the protected secure
heap and composition result is send directly to the HDMI/HDCP port.


>  If everything
> needs to be in the TEE, then why even have these buffers allocated
> by non-secure Linux at all?

The TEE is only doing decryption using the HW Crypto Accelerator
(CAAM).
The CAAM will read from a non protected encrypted buffer to write clear
content to a secure buffer allocated with DMABUF and mapped in secure
by OPTEE OS.

> 
> I would have expected there to be HW enforcement of buffer access,
> but for the display driver to be in non-secure Linux. That's how
> TZMP1 works: 
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstatic.linaro.org%2Fconnect%2Fhkg18%2Fpresentations%2Fhkg18-408.pdf&amp;data=05%7C01%7Colivier.masse%40nxp.com%7C6b3d47f1e15c41a8cf7108da7c813ef6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637959191795668899%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=XVpI93dXYu%2BGswLE8dcYboq%2FAWzSJn9j9LMlngpr238%3D&amp;reserved=0
> 
> Looking at this SDP presentation, that also seems to be the case
> there: 
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstatic.linaro.org%2Fconnect%2Fsan19%2Fpresentations%2Fsan19-107.pdf&amp;data=05%7C01%7Colivier.masse%40nxp.com%7C6b3d47f1e15c41a8cf7108da7c813ef6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637959191795668899%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=5Ec61NC1f0UQU%2F3BEURZQhEBrZ%2FuvJ1vaoSN4ChMn%2Bw%3D&amp;reserved=0
> 

Indeed, TZMP1 is similar to our implementation.

> Based on those presentations, I think this heap should be
> implementing
> map_dma_buf in the "normal" way, using the DMA API to map the buffer
> to the device. It's up to the TEE and HW firewall to prevent access
> to those mappings from non-secure devices.

In fact, our devices (VPU and DCSS) do not need any mapping, but only
the physical address of buffers which need to be contiguous.
The VPU decoder, run by the CPU, read video meta data from a non
protected buffer and send physical memory address of encoded buffer to
the VPU HW.
As well, the DCSS get physical address of contiguous decoded video
buffer to do the composition.

> 
> My understanding is:
> 
> * The memory region should never be mapped or accessed from the host
>   CPU. This is not a security requirement - the CPU will be denied
>   access by whatever hardware is enforcing security - but any CPU
>   accesses will fail, so there is no point in ever having a CPU
>   mapping.

agree with that.

> * The allocated buffers _should_ be mapped to devices via
> map_dma_buf.
>   Again the HW enforcement will prevent access from devices which
>   aren't permitted access, but for example a display controller
>   may be allowed to read the secure buffer, composite it with other
>   buffers, and display it on the screen.

yes, in could be done for a more generic implementation.

> 
> Am I wrong? Even if SDP doesn't work this way, I think we should make
> the heap as generic as possible so that it can work with different
> secure video implementations.
> 
> > 
> > > 
> 
> .. snip

alright, I get your point

> 
> > > > +
> > > > +RESERVEDMEM_OF_DECLARE(secure_heap, "linaro,secure-heap",
> > > > rmem_secure_heap_setup);
> > > 
> > > Is there anything linaro-specific about this? Could it be
> > > linux,secure-heap?
> > 
> > for now, it's specific to Linaro OPTEE OS.
> > but in a more generic way, it could be
> > linux,unmapped-heap ?
> 
> If these buffers can never be mapped, not to the CPU nor to devices,
> then actually I don't see why it should be a dma-buf heap at all.
> 
> If this is just an interface to associate some identifier (in this
> case an fd) with a region of physical address space, then why is it
> useful to pretend that it's a dma-buf, if none of the dma-buf
> operations actually do anything?

in our previous implementation, we were using unmapped ION buffer to be
able to send an opaque fd to the TEE driver which could then be mapped
in secure by OPTEE.
Transitioning from ION to DMABUF heaps, our retaining option was to
create a new heap type.


Best regards,
Olivier

> 
> Cheers,
> -Brian
> 
> > 
> > > 
> > > Thanks,
> > > -Brian
> > > 
> > > > +
> > > > +module_init(secure_heap_create);
> > > > +MODULE_LICENSE("GPL v2");
> > > > --
> > > > 2.25.0
> > > >
Nicolas Dufresne Aug. 17, 2022, 2:29 p.m. UTC | #4
Hi Folks,

Le mardi 16 août 2022 à 11:20 +0000, Olivier Masse a écrit :
> Hi Brian,
> 
> 
> On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > Caution: EXT Ema
> > 

[...]

> > 
> > Interesting, that's not how the devices I've worked on operated.
> > 
> > Are you saying that you have to have a display controller driver
> > running in the TEE to display one of these buffers?
> 
> In fact the display controller is managing 3 plans : UI, PiP and
> video. The video plan is protected in secure as you can see on slide
> 11:
> https://static.linaro.org/connect/san19/presentations/san19-107.pdf



just wanted to highlight that all the WPE/GStreamer bit in this presentation is
based on NXP Vendor Media CODEC design, which rely on their own i.MX VPU API. I
don't see any effort to extend this to a wider audience. It is not explaining
how this can work with a mainline kernel with v4l2 stateful or stateless drivers
and generic GStreamer/FFMPEG/Chromium support.

I'm raising this, since I'm worried that no one cares of solving that high level
problem from a generic point of view. In that context, any additions to the
mainline Linux kernel can only be flawed and will only serves specific vendors
and not the larger audience.

Another aspect, is that this design might be bound to a specific (NXP ?)
security design. I've learn recently that newer HW is going to use multiple
level of MMU (like virtual machines do) to protect the memory rather then
marking pages. Will all this work for that too ?

regards,
Nicolas
Cyrille Fleury Aug. 18, 2022, 5:25 a.m. UTC | #5
Hi Nicolas, all,

 The short reply:
      - For DRM, gstreamer, ffmeg, ... we don't use anymore NXP VPU proprietary API 
      - We need secure dma-buf heaps to replace secure ion heaps

  The more detailed reply to address concerns below in the thread:
      - NXP doesn't design VPU, but rely on third party VPU hardware IP we integrate in our soc.  NXP proprietary API are for legacy applications our customers did without using gstreamer or ffmpeg, but we are now relying on V4L2 API for WPE/gstreamer, chromium/ffmpeg ...
      - Even with NXP legacy BSP, there was no API impact for WPE (or chromium) due to NXP VPU API. We use WPE/gstreamer, then a gstreamer pluging relying on NXP VPU proprietary API. But now we use V4L2. So we can forget NXP VPU proprietary API, and I'm very happy with that.
      - We have moved from ion buffer to dma buff to manage secure memory management. This is why we need secure dma-buf heaps, we protect with NXP hardware as we did with ion heaps in the presentation Olivier shared.        
      - For secure video playback, the changes we need to do are in user space world (gstreamer, WPE, ...), to update our patches managing secure ion heaps by secure dma-buf heaps. But dma-buf is file descriptor based as ion heap are.
      - What will change between platforms, is how memory is protected. This is why we requested to have dtb in OPTEE for secure memory, to be able to provide a common API to secure DDR memory, and  then to rely on proprietary code in OPTEE to secure it.
      - We don't have a display controller or VPU decoder running in TEE. They remain under the full control of Linux/REE Word. If Linux/REE ask something breaking Widevine/PlayReady security rules, for example decode secure memory to non-secure memory, read from secure memory will return 0, write to secure memory will be ignored. Same with keys, certificates ...
      - i.MX8 socs have a stateless VPU and there is no VPU firmware. i.MX9 socs have a stateful VPU with firmware. In secure memory context, with secure memory, at software level, stateful VPU are even more simple to manage ->  less read/write operations performed by Linux world to parse the stream, so less patch to be done in the video framework. But for memory management, stateful/stateless, same concern: we  need  to provide support of secure dma heaps to Linux, to allocate secure memory for the VPU and the display controller, so it is just a different dma-buf heaps, so a different file descriptor.
      - i.MX9 VPU will support "Virtual Machine VPU". Till now I don't see why it will not work. I'm not an expert in VM, but from what I understood from my discussions with NXP VPU team integrating the new VPU hardware IP, from outside world, VPU is seen as multiple VPUs, with multiple register banks. So virtualized OS will continue to read/write registers as today, and at software level, secure memory is as non-secure memory, I mean at this end, it is physical DDR memory. Of course hardware shall be able to read/write it, but this is not software related, this is hardware concern. And even without VM, we target to dedicate one virtual VPU to DRM,  so one register bank, to setup dedicated security rules for DRM.
      
  I'm on vacation until end of this week. I can setup a call next week to discuss this topic if more clarifications are needed.

Regards.

-----Original Message-----
From: Olivier Masse <olivier.masse@nxp.com> 
Sent: Wednesday, August 17, 2022 4:52 PM
To: nicolas@ndufresne.ca; Cyrille Fleury <cyrille.fleury@nxp.com>; brian.starkey@arm.com
Cc: sumit.semwal@linaro.org; linux-kernel@vger.kernel.org; linaro-mm-sig@lists.linaro.org; christian.koenig@amd.com; linux-media@vger.kernel.org; nd@arm.com; Clément Faure <clement.faure@nxp.com>; dri-devel@lists.freedesktop.org; benjamin.gaignard@collabora.com
Subject: Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf heap support

+Cyrille

Hi Nicolas,

On mer., 2022-08-17 at 10:29 -0400, Nicolas Dufresne wrote:
> Caution: EXT Email
> 
> Hi Folks,
> 
> Le mardi 16 août 2022 à 11:20 +0000, Olivier Masse a écrit :
> > Hi Brian,
> > 
> > 
> > On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > > Caution: EXT Ema
> > > 
> 
> [...]
> 
> > > 
> > > Interesting, that's not how the devices I've worked on operated.
> > > 
> > > Are you saying that you have to have a display controller driver 
> > > running in the TEE to display one of these buffers?
> > 
> > In fact the display controller is managing 3 plans : UI, PiP and 
> > video. The video plan is protected in secure as you can see on slide
> > 11:
> > 
https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstatic.linaro.org%2Fconnect%2Fsan19%2Fpresentations%2Fsan19-107.pdf&amp;data=05%7C01%7Colivier.masse%40nxp.com%7Ce0e00be789a54dff8e5208da805ce2f6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637963433695707516%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=GHjEfbgqRkfHK16oyNaYJob4LRVqvoffRElKR%2F7Rtes%3D&amp;reserved=0
> 
> 
> 
> just wanted to highlight that all the WPE/GStreamer bit in this 
> presentation is based on NXP Vendor Media CODEC design, which rely on 
> their own i.MX VPU API. I don't see any effort to extend this to a 
> wider audience. It is not explaining how this can work with a mainline 
> kernel with v4l2 stateful or stateless drivers and generic 
> GStreamer/FFMPEG/Chromium support.

Maybe Cyrille can explain what it is currently done at NXP level regarding the integration of v4l2 with NXP VPU.

> 
> I'm raising this, since I'm worried that no one cares of solving that 
> high level problem from a generic point of view. In that context, any 
> additions to the mainline Linux kernel can only be flawed and will 
> only serves specific vendors and not the larger audience.
> 
> Another aspect, is that this design might be bound to a specific (NXP
> ?)
> security design. I've learn recently that newer HW is going to use 
> multiple level of MMU (like virtual machines do) to protect the memory 
> rather then marking pages. Will all this work for that too ?

our fire-walling hardware is protecting memory behind the MMU and so rely on physical memory layout.
this work is only relying on a reserved physical memory.

Regards,
Olivier

> 
> regards,
> Nicolas
Nicolas Dufresne Aug. 19, 2022, 3:13 p.m. UTC | #6
Hi,

thanks for the additional information, we are starting to have a (still partial)
overview of your team goals.

Le jeudi 18 août 2022 à 05:25 +0000, Cyrille Fleury a écrit :
> Hi Nicolas, all,
> 
>  The short reply:
>       - For DRM, gstreamer, ffmeg, ... we don't use anymore NXP VPU
> proprietary API 
>       - We need secure dma-buf heaps to replace secure ion heaps
> 
>   The more detailed reply to address concerns below in the thread:
>       - NXP doesn't design VPU, but rely on third party VPU hardware IP we
> integrate in our soc.  NXP proprietary API are for legacy applications our
> customers did without using gstreamer or ffmpeg, but we are now relying on
> V4L2 API for WPE/gstreamer, chromium/ffmpeg ...
>       - Even with NXP legacy BSP, there was no API impact for WPE (or
> chromium) due to NXP VPU API. We use WPE/gstreamer, then a gstreamer pluging
> relying on NXP VPU proprietary API. But now we use V4L2. So we can forget NXP
> VPU proprietary API, and I'm very happy with that.
>       - We have moved from ion buffer to dma buff to manage secure memory
> management. This is why we need secure dma-buf heaps, we protect with NXP
> hardware as we did with ion heaps in the presentation Olivier shared.        
>       - For secure video playback, the changes we need to do are in user space
> world (gstreamer, WPE, ...), to update our patches managing secure ion heaps
> by secure dma-buf heaps. But dma-buf is file descriptor based as ion heap are.

Do you have some links to these changes to user-space code that demonstrate the
usage of this new heap in its real context ?

>       - What will change between platforms, is how memory is protected. This
> is why we requested to have dtb in OPTEE for secure memory, to be able to
> provide a common API to secure DDR memory, and  then to rely on proprietary
> code in OPTEE to secure it.
>       - We don't have a display controller or VPU decoder running in TEE. They
> remain under the full control of Linux/REE Word. If Linux/REE ask something
> breaking Widevine/PlayReady security rules, for example decode secure memory
> to non-secure memory, read from secure memory will return 0, write to secure
> memory will be ignored. Same with keys, certificates ...

Can you explain how you would manage to make VP9 stateless decoding work ? On
IMX8MQ you have a chip that will produce a feedback binary, which contains the
probability data. The mainline driver will merge the forward probability to
prepare the probability for the next decode.

This basically means at least 1 output of the decoder needs to be non-secure
(for CPU read-back). That breaks the notion of secure memory domain, which is
global to the HW. One could think you could just ask the TEE to copy it back for
you, but to do that safely, the TEE would need to control the CODEC programming,
hence have a CODEC driver in the secure OS.

I'm not familiar with it, but may that have impact on HDMI receivers, which may
need some buffers for CPU usage (perhaps HDR metadata, EDID, etc.).

>       - i.MX8 socs have a stateless VPU and there is no VPU firmware. i.MX9
> socs have a stateful VPU with firmware. In secure memory context, with secure
> memory, at software level, stateful VPU are even more simple to manage -> 
> less read/write operations performed by Linux world to parse the stream, so
> less patch to be done in the video framework. But for memory management,
> stateful/stateless, same concern: we  need  to provide support of secure dma
> heaps to Linux, to allocate secure memory for the VPU and the display
> controller, so it is just a different dma-buf heaps, so a different file
> descriptor.

i.MX8 boards may have stateless or stateful CODEC (Hantro chips are used in
stateless fashion, while Amphion chips are driven by a stateful firmware). I
would have hoped NXP folks would know that, as this is what their users have to
deal with on day-to-day.

May I interpret this as NXP is giving up on i.MX8 memory protection (or perhaps
your team is only caring about i.MX9 ?), and this solution is on usable for
stateful (less flexible) CODECs ?

>       - i.MX9 VPU will support "Virtual Machine VPU". Till now I don't see why
> it will not work. I'm not an expert in VM, but from what I understood from my
> discussions with NXP VPU team integrating the new VPU hardware IP, from
> outside world, VPU is seen as multiple VPUs, with multiple register banks. So
> virtualized OS will continue to read/write registers as today, and at software
> level, secure memory is as non-secure memory, I mean at this end, it is
> physical DDR memory. Of course hardware shall be able to read/write it, but
> this is not software related, this is hardware concern. And even without VM,
> we target to dedicate one virtual VPU to DRM,  so one register bank, to setup
> dedicated security rules for DRM.

What you wrote here is about as much as I heard about the new security model
coming in newer chips (this is not NXP specific). I think in order to push
forward designs and APIs, it would be logical to first present about these
mechanism, now they work and how they affect drivers and user space. Its not
clear how this mechanism inforces usage of non-mappable to kernel mmu memory.
Providing Open Source kernel and userland to demonstrate and use this feature is
also very helpful for reviewers and adopters, but also a requirement in the drm
tree.

regards,
Nicolas

>       
>   I'm on vacation until end of this week. I can setup a call next week to discuss this topic if more clarifications are needed.
> 
> Regards.
> 
> -----Original Message-----
> From: Olivier Masse <olivier.masse@nxp.com> 
> Sent: Wednesday, August 17, 2022 4:52 PM
> To: nicolas@ndufresne.ca; Cyrille Fleury <cyrille.fleury@nxp.com>; brian.starkey@arm.com
> Cc: sumit.semwal@linaro.org; linux-kernel@vger.kernel.org; linaro-mm-sig@lists.linaro.org; christian.koenig@amd.com; linux-media@vger.kernel.org; nd@arm.com; Clément Faure <clement.faure@nxp.com>; dri-devel@lists.freedesktop.org; benjamin.gaignard@collabora.com
> Subject: Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf heap support
> 
> +Cyrille
> 
> Hi Nicolas,
> 
> On mer., 2022-08-17 at 10:29 -0400, Nicolas Dufresne wrote:
> > Caution: EXT Email
> > 
> > Hi Folks,
> > 
> > Le mardi 16 août 2022 à 11:20 +0000, Olivier Masse a écrit :
> > > Hi Brian,
> > > 
> > > 
> > > On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > > > Caution: EXT Ema
> > > > 
> > 
> > [...]
> > 
> > > > 
> > > > Interesting, that's not how the devices I've worked on operated.
> > > > 
> > > > Are you saying that you have to have a display controller driver 
> > > > running in the TEE to display one of these buffers?
> > > 
> > > In fact the display controller is managing 3 plans : UI, PiP and 
> > > video. The video plan is protected in secure as you can see on slide
> > > 11:
> > > 
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstatic.linaro.org%2Fconnect%2Fsan19%2Fpresentations%2Fsan19-107.pdf&amp;data=05%7C01%7Colivier.masse%40nxp.com%7Ce0e00be789a54dff8e5208da805ce2f6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637963433695707516%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=GHjEfbgqRkfHK16oyNaYJob4LRVqvoffRElKR%2F7Rtes%3D&amp;reserved=0
> > 
> > 
> > 
> > just wanted to highlight that all the WPE/GStreamer bit in this 
> > presentation is based on NXP Vendor Media CODEC design, which rely on 
> > their own i.MX VPU API. I don't see any effort to extend this to a 
> > wider audience. It is not explaining how this can work with a mainline 
> > kernel with v4l2 stateful or stateless drivers and generic 
> > GStreamer/FFMPEG/Chromium support.
> 
> Maybe Cyrille can explain what it is currently done at NXP level regarding the integration of v4l2 with NXP VPU.
> 
> > 
> > I'm raising this, since I'm worried that no one cares of solving that 
> > high level problem from a generic point of view. In that context, any 
> > additions to the mainline Linux kernel can only be flawed and will 
> > only serves specific vendors and not the larger audience.
> > 
> > Another aspect, is that this design might be bound to a specific (NXP
> > ?)
> > security design. I've learn recently that newer HW is going to use 
> > multiple level of MMU (like virtual machines do) to protect the memory 
> > rather then marking pages. Will all this work for that too ?
> 
> our fire-walling hardware is protecting memory behind the MMU and so rely on physical memory layout.
> this work is only relying on a reserved physical memory.
> 
> Regards,
> Olivier
> 
> > 
> > regards,
> > Nicolas
Cyrille Fleury Aug. 23, 2022, 3:58 p.m. UTC | #7
Hi Nicolas, all

Nicolas,  please see below in the thread for your concerns regarding video decoding and hdmi receivers with secure memory.

But i can also propose to setup calls to discuss about that. We are almost done with a proposal to add support of Linaro secure dmabuff heaps in V4L2. Today It is functional with Linux 5.15 + i.MX8MQ evk board,  I mean we can playback video using a secure dmabuf heap memory Linux can't read/write, but we still need some time to clean the code and have something ready for a code review.

So maybe 2 different calls could happen:
    1) A first one to define a generic mechanism for Linux to manage (allocate/free) secure memory. By secure memory i mean a memory Linux can't read and write with the cpu running in non-secure mode.
          - Linaro secure dmabuf heaps seems to be a reasonable approach and is available
          - Secure OS in charge of the hardware management to protect the secure memory, without any action to be done from Linux side, I mean when Linux kernel starts, a secure dmabuf heap is already protected ,seems a reasonable approach
          - then Linux Kernel needs to read device tree to know such secure heaps exist and will expose them to user space world.
          - For memory isolation/sandboxing use cases, we may need different secure heaps.  For example, one secure heap, the  video decoder is allowed to access (Secure Video Path like applications),and second secure heap Video heap decoder is not allowed to access (secure payment like applications), but this is under responsibility of secure OS to configure such memory security rules before Linux kernel starts. 

          - Using current Linux CMA, and ask dynamically the secure OS to allocate and secure memory from existing CMA heap doesn't seem to be a right approach to me. I think we should be more or less all aligned with that.
              Why is it not the right approach:
                 - when you release secure memory, you need to "memset" it to 0, because this memory can potentially  be reuse by non-secure world. It can take a long time with 4K video buffers and during that time, the Linux process calling the secure OS is stuck. Not good for real time or smooth video playback.
                 - we need direct interaction between Linux and Secure OS for each allocate/free of secure memory:
		   - different secure OS means different API to be called from Linux.
                                - it takes time for the CPU to switch from non-secure  Linux-> secure OS -> non-secure Linux. From what I have in mind, something like 1ms with a arm a53 cpu running at 1.5 Ghz, so max 1000 alloc/free per seconds, and again during that time, the Linux process calling the secure OS is stuck, waiting secure OS.
	                  - Linux and secure OS shall be in sync regarding memory allocation in CMA. Seems a very complex mechanism to maintain.
          
              ->  It is why we need dedicated dma buf heaps for secure memory, and why  Linaro secure dmabuf heap support is needed in Linux Kernel.

    2) A second call to discuss V4L2 using Linaro secure dmabuff heap, when we will be ready for the code review.

Please let me know if you agree with this proposal to setup 2 different calls, as they are 2 different topics to be addressed.

Regards.

-----Original Message-----
From: Nicolas Dufresne <nicolas@ndufresne.ca> 
Sent: Friday, August 19, 2022 5:14 PM
To: Cyrille Fleury <cyrille.fleury@nxp.com>; Olivier Masse <olivier.masse@nxp.com>; brian.starkey@arm.com
Cc: sumit.semwal@linaro.org; linux-kernel@vger.kernel.org; linaro-mm-sig@lists.linaro.org; christian.koenig@amd.com; linux-media@vger.kernel.org; nd@arm.com; Clément Faure <clement.faure@nxp.com>; dri-devel@lists.freedesktop.org; benjamin.gaignard@collabora.com
Subject: Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf heap support

Caution: EXT Email

Hi,

thanks for the additional information, we are starting to have a (still partial) overview of your team goals.

Le jeudi 18 août 2022 à 05:25 +0000, Cyrille Fleury a écrit :
> Hi Nicolas, all,
>
>  The short reply:
>       - For DRM, gstreamer, ffmeg, ... we don't use anymore NXP VPU 
> proprietary API
>       - We need secure dma-buf heaps to replace secure ion heaps
>
>   The more detailed reply to address concerns below in the thread:
>       - NXP doesn't design VPU, but rely on third party VPU hardware 
> IP we integrate in our soc.  NXP proprietary API are for legacy 
> applications our customers did without using gstreamer or ffmpeg, but 
> we are now relying on
> V4L2 API for WPE/gstreamer, chromium/ffmpeg ...
>       - Even with NXP legacy BSP, there was no API impact for WPE (or
> chromium) due to NXP VPU API. We use WPE/gstreamer, then a gstreamer 
> pluging relying on NXP VPU proprietary API. But now we use V4L2. So we 
> can forget NXP VPU proprietary API, and I'm very happy with that.
>       - We have moved from ion buffer to dma buff to manage secure 
> memory management. This is why we need secure dma-buf heaps, we 
> protect with NXP hardware as we did with ion heaps in the presentation Olivier shared.
>       - For secure video playback, the changes we need to do are in 
> user space world (gstreamer, WPE, ...), to update our patches managing 
> secure ion heaps by secure dma-buf heaps. But dma-buf is file descriptor based as ion heap are.

Do you have some links to these changes to user-space code that demonstrate the usage of this new heap in its real context ?

[Cyrille] We have a proposal for V4L2 + secure dma-buff heaps. Pull request should be available soon. We added an allocator in drivers/media/common/videobuf2/videobuf2-dma-heap.c, following what has been done in videobuf2-dma-contig.c. With this new allocator, we can playback H264 and HEVC streams with gstreamer and secure dma heaps (memory Linux can't read/write, and protected by TZASC + NXP equivalent of Arm TZMP technology (RDC/TRDC for i.MX8M family)). OPTEE is in charge of protecting memory, through a device tree and dedicated drivers in OPTEE, but OPTEE could be replaced by any other secure OS, as we don't rely on OPTEE to allocate memory.

>       - What will change between platforms, is how memory is 
> protected. This is why we requested to have dtb in OPTEE for secure 
> memory, to be able to provide a common API to secure DDR memory, and  
> then to rely on proprietary code in OPTEE to secure it.
>       - We don't have a display controller or VPU decoder running in 
> TEE. They remain under the full control of Linux/REE Word. If 
> Linux/REE ask something breaking Widevine/PlayReady security rules, 
> for example decode secure memory to non-secure memory, read from 
> secure memory will return 0, write to secure memory will be ignored. Same with keys, certificates ...

Can you explain how you would manage to make VP9 stateless decoding work ? On IMX8MQ you have a chip that will produce a feedback binary, which contains the probability data. The mainline driver will merge the forward probability to prepare the probability for the next decode.

This basically means at least 1 output of the decoder needs to be non-secure (for CPU read-back). That breaks the notion of secure memory domain, which is global to the HW. One could think you could just ask the TEE to copy it back for you, but to do that safely, the TEE would need to control the CODEC programming, hence have a CODEC driver in the secure OS.

I'm not familiar with it, but may that have impact on HDMI receivers, which may need some buffers for CPU usage (perhaps HDR metadata, EDID, etc.).

 [Cyrille] We indeed got issues with VP9 codec with i.MX 8M stateless VPU, but not with vp9 continuity counters/feedback binary. There is no really secret in those feedback information ( I mean you cannot build an image from them), so they can be expose through non-secure memory to Linux. Issue we got is related to amount of video meta data Widevine encrypt, because some meta data not supported by i.MX8 stateless VPU (Hantro G2 decoder) shall be parsed by CPU, but they are encrypted by Widevine. Widevine is not following CENC specification regarding those meta data. So we informed Widevine but they are afraid to change the encryption model they use for VP9. So to support VP9, we need to parse those VP9 meta data in OPTEE, in a dedicated Trusted Application, to detect and expose them to Linux. There is no secret in those meta data, the only risk is a bug in our parsing algorithm, and I agree this is not very good at secure video path level, but we have no other solution for VP9 codec with 8M family having Hantro stateless VPU. H264 and H265 streams don't have such issues.
 
I don't think HDMI receivers are an issue: data bitrate is just too big for a cpu, and so for the kind of information you mentioned (meta data, audio codec, audio clock, number of audio channels....) we rely on interruption and registers exposed by the HDMI controller receivers to notify changes in the hdmi flow, in order to reconfigure the hardware/software accordingly. EDID use a slow I2C bus like, but there is no sensible data there at DRM point of view. So cpu can parse it if not already managed by the HDMI receiver hardware and shared through registers.


>       - i.MX8 socs have a stateless VPU and there is no VPU firmware. 
> i.MX9 socs have a stateful VPU with firmware. In secure memory 
> context, with secure memory, at software level, stateful VPU are even 
> more simple to manage -> less read/write operations performed by Linux 
> world to parse the stream, so less patch to be done in the video 
> framework. But for memory management, stateful/stateless, same 
> concern: we  need  to provide support of secure dma heaps to Linux, to 
> allocate secure memory for the VPU and the display controller, so it 
> is just a different dma-buf heaps, so a different file descriptor.

i.MX8 boards may have stateless or stateful CODEC (Hantro chips are used in stateless fashion, while Amphion chips are driven by a stateful firmware). I would have hoped NXP folks would know that, as this is what their users have to deal with on day-to-day.

[Cyrille] Correct, I should have mentioned i.MX 8M  family (Hantro VPU only), and not i.MX 8 to avoid confusion.

May I interpret this as NXP is giving up on i.MX8 memory protection (or perhaps your team is only caring about i.MX9 ?), and this solution is on usable for stateful (less flexible) CODECs ?

[Cyrille] We target both stateless and stateful VPU. For 8, it is i.MX 8MPlus and 8MQ, for 9 it will depends what customer request for DRM. It doesn't make sense to support Secure Video Path for all socs. 

>       - i.MX9 VPU will support "Virtual Machine VPU". Till now I don't 
> see why it will not work. I'm not an expert in VM, but from what I 
> understood from my discussions with NXP VPU team integrating the new 
> VPU hardware IP, from outside world, VPU is seen as multiple VPUs, 
> with multiple register banks. So virtualized OS will continue to 
> read/write registers as today, and at software level, secure memory is 
> as non-secure memory, I mean at this end, it is physical DDR memory. 
> Of course hardware shall be able to read/write it, but this is not 
> software related, this is hardware concern. And even without VM, we 
> target to dedicate one virtual VPU to DRM,  so one register bank, to setup dedicated security rules for DRM.

What you wrote here is about as much as I heard about the new security model coming in newer chips (this is not NXP specific). I think in order to push forward designs and APIs, it would be logical to first present about these mechanism, now they work and how they affect drivers and user space. Its not clear how this mechanism inforces usage of non-mappable to kernel mmu memory.
Providing Open Source kernel and userland to demonstrate and use this feature is also very helpful for reviewers and adopters, but also a requirement in the drm tree.

regards,
Nicolas

>
>   I'm on vacation until end of this week. I can setup a call next week to discuss this topic if more clarifications are needed.
>
> Regards.
>
> -----Original Message-----
> From: Olivier Masse <olivier.masse@nxp.com>
> Sent: Wednesday, August 17, 2022 4:52 PM
> To: nicolas@ndufresne.ca; Cyrille Fleury <cyrille.fleury@nxp.com>; 
> brian.starkey@arm.com
> Cc: sumit.semwal@linaro.org; linux-kernel@vger.kernel.org; 
> linaro-mm-sig@lists.linaro.org; christian.koenig@amd.com; 
> linux-media@vger.kernel.org; nd@arm.com; Clément Faure 
> <clement.faure@nxp.com>; dri-devel@lists.freedesktop.org; 
> benjamin.gaignard@collabora.com
> Subject: Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure 
> dmabuf heap support
>
> +Cyrille
>
> Hi Nicolas,
>
> On mer., 2022-08-17 at 10:29 -0400, Nicolas Dufresne wrote:
> > Caution: EXT Email
> >
> > Hi Folks,
> >
> > Le mardi 16 août 2022 à 11:20 +0000, Olivier Masse a écrit :
> > > Hi Brian,
> > >
> > >
> > > On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > > > Caution: EXT Ema
> > > >
> >
> > [...]
> >
> > > >
> > > > Interesting, that's not how the devices I've worked on operated.
> > > >
> > > > Are you saying that you have to have a display controller driver 
> > > > running in the TEE to display one of these buffers?
> > >
> > > In fact the display controller is managing 3 plans : UI, PiP and 
> > > video. The video plan is protected in secure as you can see on 
> > > slide
> > > 11:
> > >
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> ic.linaro.org%2Fconnect%2Fsan19%2Fpresentations%2Fsan19-107.pdf&amp;da
> ta=05%7C01%7Ccyrille.fleury%40nxp.com%7C13a4dd35018b43f9f63908da81f570
> 30%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637965188416145231%7CU
> nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> WwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=cZ7BP3BJXVIBX8kCGcj%2FCNbe
> P4cB%2BaSjgGOfPMh6k4E%3D&amp;reserved=0
> >
> >
> >
> > just wanted to highlight that all the WPE/GStreamer bit in this 
> > presentation is based on NXP Vendor Media CODEC design, which rely 
> > on their own i.MX VPU API. I don't see any effort to extend this to 
> > a wider audience. It is not explaining how this can work with a 
> > mainline kernel with v4l2 stateful or stateless drivers and generic 
> > GStreamer/FFMPEG/Chromium support.
>
> Maybe Cyrille can explain what it is currently done at NXP level regarding the integration of v4l2 with NXP VPU.
>
> >
> > I'm raising this, since I'm worried that no one cares of solving 
> > that high level problem from a generic point of view. In that 
> > context, any additions to the mainline Linux kernel can only be 
> > flawed and will only serves specific vendors and not the larger audience.
> >
> > Another aspect, is that this design might be bound to a specific 
> > (NXP
> > ?)
> > security design. I've learn recently that newer HW is going to use 
> > multiple level of MMU (like virtual machines do) to protect the 
> > memory rather then marking pages. Will all this work for that too ?
>
> our fire-walling hardware is protecting memory behind the MMU and so rely on physical memory layout.
> this work is only relying on a reserved physical memory.
>
> Regards,
> Olivier
>
> >
> > regards,
> > Nicolas