mbox series

[0/2] drm/nvdla: Add driver support for NVDLA

Message ID 20220419135908.39606-1-cai.huoqing@linux.dev
Headers show
Series drm/nvdla: Add driver support for NVDLA | expand

Message

Cai Huoqing April 19, 2022, 1:58 p.m. UTC
The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
which is integrated into NVIDIA Jetson AGX Xavier,
so add driver support for this accelerator.

NVDLA introduce:
http://nvdla.org/primer.html

User mode driver:
https://github.com/caihuoq/nvdla/tree/main/sw/umd


Cai Huoqing (2):
  MAINTAINERS: Add the driver info of the NVDLA
  drm/nvdla: Add driver support for NVDLA

 MAINTAINERS                             |    7 +
 drivers/gpu/drm/Kconfig                 |    2 +
 drivers/gpu/drm/Makefile                |    1 +
 drivers/gpu/drm/nvdla/Kconfig           |    8 +
 drivers/gpu/drm/nvdla/Makefile          |   19 +
 drivers/gpu/drm/nvdla/nvdla_bdma.c      |  200 +
 drivers/gpu/drm/nvdla/nvdla_cache.c     |  215 +
 drivers/gpu/drm/nvdla/nvdla_cdp.c       |  300 ++
 drivers/gpu/drm/nvdla/nvdla_common.c    |  295 ++
 drivers/gpu/drm/nvdla/nvdla_common.h    |  835 +++
 drivers/gpu/drm/nvdla/nvdla_conv.c      |  683 +++
 drivers/gpu/drm/nvdla/nvdla_drm.c       |  695 +++
 drivers/gpu/drm/nvdla/nvdla_drm.h       |  127 +
 drivers/gpu/drm/nvdla/nvdla_engine.c    |  233 +
 drivers/gpu/drm/nvdla/nvdla_engine.h    |  272 +
 drivers/gpu/drm/nvdla/nvdla_gem.c       |  393 ++
 drivers/gpu/drm/nvdla/nvdla_ioctl.h     |   99 +
 drivers/gpu/drm/nvdla/nvdla_pdp.c       |  446 ++
 drivers/gpu/drm/nvdla/nvdla_reg.h       | 6411 +++++++++++++++++++++++
 drivers/gpu/drm/nvdla/nvdla_rubik.c     |  217 +
 drivers/gpu/drm/nvdla/nvdla_sched.h     |   52 +
 drivers/gpu/drm/nvdla/nvdla_scheduler.c | 1005 ++++
 drivers/gpu/drm/nvdla/nvdla_sdp.c       |  728 +++
 23 files changed, 13243 insertions(+)
 create mode 100644 drivers/gpu/drm/nvdla/Kconfig
 create mode 100644 drivers/gpu/drm/nvdla/Makefile
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_bdma.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_cache.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_cdp.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_conv.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_gem.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_ioctl.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_pdp.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_reg.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_rubik.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_sched.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_scheduler.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_sdp.c

Comments

Cai Huoqing April 19, 2022, 2:35 p.m. UTC | #1
On 19 4月 22 16:07:44, Christian König wrote:
> Am 19.04.22 um 15:59 schrieb Cai Huoqing:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add driver support for this accelerator.
> > 
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> > 
> 
> Well doesn't looks so bad on first glance (regarding coding style etc..)
> 
> But am I blind or isn't there any UAPI for the driver? I mean adding a DRM
> driver without any change to include/uapi/drm is really odd.
thanks for your reply, I will rename nvdla_ioctl.h which is UAPI headfile,
then put it to include/uapi/drm.

thanks,
Cai
> 
> Regards,
> Christian.
Thomas Zimmermann April 21, 2022, 8:30 a.m. UTC | #2
(Resending, as some MLs didn't like the size of the origninal mail.)

Hi,

thanks for your submission. Some general comments:

   * some functions are prefixed with dla_, others use nvdla_. It seems 
arbitrary to me. Please use nvdla_ consistently throughout the source code.

   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
suggest to rearrange the error messages to not be located in the 
innermost functions.

   * Could you please split this patch into smaller pieces? It currently 
hits size limits of some mailing lists. Maybe add the register constants 
separately.

Please find more review comments below. It's not a full review, but at 
least something to start with.

Best regards
Thomas

Am 19.04.22 um 15:59 schrieb Cai Huoqing:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
> 
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> ---
>   drivers/gpu/drm/Kconfig                 |    2 +
>   drivers/gpu/drm/Makefile                |    1 +
>   drivers/gpu/drm/nvdla/Kconfig           |    8 +
>   drivers/gpu/drm/nvdla/Makefile          |   19 +
>   drivers/gpu/drm/nvdla/nvdla_bdma.c      |  200 +
>   drivers/gpu/drm/nvdla/nvdla_cache.c     |  215 +
>   drivers/gpu/drm/nvdla/nvdla_cdp.c       |  300 ++
>   drivers/gpu/drm/nvdla/nvdla_common.c    |  295 ++
>   drivers/gpu/drm/nvdla/nvdla_common.h    |  835 +++
>   drivers/gpu/drm/nvdla/nvdla_conv.c      |  683 +++
>   drivers/gpu/drm/nvdla/nvdla_drm.c       |  695 +++
>   drivers/gpu/drm/nvdla/nvdla_drm.h       |  127 +
>   drivers/gpu/drm/nvdla/nvdla_engine.c    |  233 +
>   drivers/gpu/drm/nvdla/nvdla_engine.h    |  272 +
>   drivers/gpu/drm/nvdla/nvdla_gem.c       |  393 ++
>   drivers/gpu/drm/nvdla/nvdla_ioctl.h     |   99 +
>   drivers/gpu/drm/nvdla/nvdla_pdp.c       |  446 ++
>   drivers/gpu/drm/nvdla/nvdla_reg.h       | 6411 +++++++++++++++++++++++
>   drivers/gpu/drm/nvdla/nvdla_rubik.c     |  217 +
>   drivers/gpu/drm/nvdla/nvdla_sched.h     |   52 +
>   drivers/gpu/drm/nvdla/nvdla_scheduler.c | 1005 ++++
>   drivers/gpu/drm/nvdla/nvdla_sdp.c       |  728 +++
>   22 files changed, 13236 insertions(+)
>   create mode 100644 drivers/gpu/drm/nvdla/Kconfig
>   create mode 100644 drivers/gpu/drm/nvdla/Makefile
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_bdma.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_cache.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_cdp.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_conv.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_gem.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_ioctl.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_pdp.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_reg.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_rubik.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_sched.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_scheduler.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_sdp.c
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 5133c3f028ab..a55cff374abd 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -409,6 +409,8 @@ source "drivers/gpu/drm/solomon/Kconfig"
>   
>   source "drivers/gpu/drm/sprd/Kconfig"
>   
> +source "drivers/gpu/drm/nvdla/Kconfig"
> +
>   config DRM_HYPERV
>   	tristate "DRM Support for Hyper-V synthetic video device"
>   	depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index c2ef5f9fce54..8fa3537f308a 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -134,3 +134,4 @@ obj-y			+= gud/
>   obj-$(CONFIG_DRM_HYPERV) += hyperv/
>   obj-y			+= solomon/
>   obj-$(CONFIG_DRM_SPRD) += sprd/
> +obj-$(CONFIG_DRM_NVDLA) += nvdla/
> diff --git a/drivers/gpu/drm/nvdla/Kconfig b/drivers/gpu/drm/nvdla/Kconfig
> new file mode 100644
> index 000000000000..11c04f5da877
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/Kconfig
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config DRM_NVDLA
> +	tristate "NVDLA DRM"
> +	depends on DRM
> +	select DRM_GEM_CMA_HELPER
> +	help
> +	  Choose this option for open-source NVIDIA DLA support.
> +	  If M is selected the module will be called nvdla-drm.
> diff --git a/drivers/gpu/drm/nvdla/Makefile b/drivers/gpu/drm/nvdla/Makefile
> new file mode 100644
> index 000000000000..74f37d258f8d
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/Makefile
> @@ -0,0 +1,19 @@
> +
> +# SPDX-License-Identifier: GPL-2.0
> +nvdla-drm-y := \
> +	nvdla_drm.o \
> +	nvdla_gem.o \
> +	nvdla_scheduler.o \
> +	nvdla_engine.o \
> +	nvdla_bdma.o \
> +	nvdla_conv.o \
> +	nvdla_sdp.o \
> +	nvdla_cdp.o \
> +	nvdla_pdp.o \
> +	nvdla_rubik.o \
> +	nvdla_cache.o \
> +	nvdla_common.o \
> +	nvdla_engine_data.o \
> +	nvdla_engine_debug.o \

File names should be sorted alphabetically here.

> +
> +obj-$(CONFIG_DRM_NVDLA) += nvdla-drm.o
> diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> new file mode 100644
> index 000000000000..225613f27acf
> --- /dev/null
}


> diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> new file mode 100644
> index 000000000000..9217eee1de3b
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> @@ -0,0 +1,695 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/time.h>
> +#include <linux/uaccess.h>
> +#include <linux/types.h>
> +
> +#include "nvdla_drm.h"
> +#include "nvdla_ioctl.h"
> +#include "nvdla_engine.h"
> +
> +static struct nvdla_config nvdla_config_os_initial = {
> +	.atom_size = 32,
> +	.bdma_enable = true,
> +	.rubik_enable = true,
> +	.weight_compress_support = true,
> +};
> +
> +static struct nvdla_config nvdla_config_small = {
> +	//.atom_size = 8,
> +	.atom_size = 32,  // nv_large config
> +	.bdma_enable = false,
> +	.rubik_enable = false,
> +	.weight_compress_support = false,
> +};
> +
> +int64_t dla_get_time_us(void)
> +{
> +	return ktime_get_ns() / NSEC_PER_USEC;
> +}
> +
> +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return;
> +
> +	writel(reg, nvdla_dev->base + addr);
> +}
> +
> +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return 0;
> +
> +	return readl(nvdla_dev->base + addr);
> +}
> +
> +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> +{
> +	unsigned long flags;
> +	uint32_t mask;
> +	uint32_t reg;
> +	struct dla_processor *processor = NULL;
> +	struct dla_processor_group *group;
> +	struct dla_engine *engine;
> +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> +
> +	if (!nvdla_dev)
> +		return IRQ_NONE;
> +
> +	engine = nvdla_dev->engine_context;
> +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> +
> +	mask = glb_reg_read(engine, S_INTR_MASK);
> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +
> +	glb_reg_write(engine, S_INTR_STATUS, reg);
> +	mask = glb_reg_read(engine, S_INTR_MASK);
> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	complete(&nvdla_dev->event_notifier);
> +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int32_t dla_read_dma_address(void *driver_context, void *task_data,
> +						int16_t index, void *dst)
> +{
> +	int32_t ret = 0;
> +	struct nvdla_mem_handle *handles;
> +	dma_addr_t *phys_addr = (dma_addr_t *)(dst);
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	if (index == -1 || index > task->num_addresses)
> +		return -EINVAL;
> +
> +	handles = (struct nvdla_mem_handle *)task->address_list;
> +	ret = nvdla_gem_dma_addr(nvdla_dev->drm, task->file,
> +					handles[index].handle,
> +					phys_addr);
> +
> +	/* Add offset to IOVA address */
> +	*phys_addr = *phys_addr + handles[index].offset;
> +
> +	return ret;
> +}
> +
> +static int32_t dla_read_cpu_address(void *driver_context, void *task_data,
> +						int16_t index, void *dst)
> +{
> +	uint64_t *temp = (uint64_t *)dst;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	if (index == -1 || index > task->num_addresses)
> +		return -EINVAL;
> +
> +	*temp = (uint64_t)index;
> +	return 0;
> +}
> +
> +int32_t dla_get_dma_address(void *driver_context, void *task_data,
> +					int16_t index, void *dst_ptr,
> +					uint32_t destination)
> +{
> +	int32_t ret = 0;
> +
> +	if (destination == DESTINATION_PROCESSOR) {
> +		ret = dla_read_cpu_address(driver_context, task_data,
> +						index, dst_ptr);
> +	} else if (destination == DESTINATION_DMA) {
> +		ret = dla_read_dma_address(driver_context, task_data,
> +						index, dst_ptr);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +int32_t dla_data_write(void *driver_context, void *task_data,
> +				void *src, uint64_t dst,
> +				uint32_t size, uint64_t offset)
> +{
> +	int32_t ret;
> +	void *ptr = NULL;
> +	struct dma_buf *buf;
> +	struct iosys_map map;
> +	struct nvdla_mem_handle *handles;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	handles = task->address_list;
> +	buf = dma_buf_get(handles[dst].handle);
> +	if (IS_ERR(buf)) {
> +		pr_err("%s: Failed get dma_buf for handle=%d\n", __func__,
> +						handles[dst].handle);
> +		return -EFAULT;
> +	}
> +
> +	ret = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
> +	if (ret)
> +		goto put_dma_buf;
> +
> +	ret = dma_buf_vmap(buf, &map);
> +	ptr = ret ? NULL : map.vaddr;

Never extract the pointer's address without good reason. You don't know 
if this points to a location in I/O memory.

> +	if (!ptr) {

Simply test for ret here.

> +		pr_err("%s: Failed to vmap dma_buf for handle=%d\n", __func__,
> +						handles[dst].handle);
> +		ret = -ENOMEM;

You already got an errno code. Don't override it.

> +		goto end_cpu_access;
> +	}
> +
> +
> +	memcpy((void *)((uint8_t *)ptr + offset), src, size);

Use iosys_map_memcpy_to() here.  It does the right thing

> +
> +	dma_buf_vunmap(buf, ptr);

You have to pass map as the second argument.

> +
> +end_cpu_access:
> +	dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
> +
> +put_dma_buf:
> +	dma_buf_put(buf);
> +
> +	return ret;
> +}
> +
> +int32_t dla_data_read(void *driver_context, void *task_data,
> +				uint64_t src, void *dst,
> +				uint32_t size, uint64_t offset)
> +{
> +	int32_t ret;
> +	void *ptr = NULL;
> +	struct dma_buf *buf;
> +	struct iosys_map map;
> +	struct nvdla_mem_handle *handles;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	handles = task->address_list;
> +
> +	buf = dma_buf_get(handles[src].handle);
> +	if (IS_ERR(buf)) {
> +		pr_err("%s: Failed get dma_buf for handle=%d\n", __func__,
> +						handles[src].handle);
> +		return -EFAULT;
> +	}
> +
> +	ret = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
> +	if (ret)
> +		goto put_dma_buf;
> +
> +	ret = dma_buf_vmap(buf, &map);
> +	ptr = ret ? NULL : map.vaddr;
> +	if (!ptr) {
> +		pr_err("%s: Failed to vmap dma_buf for handle=%d\n", __func__,
> +						handles[src].handle);
> +		ret = -ENOMEM;
> +		goto end_cpu_access;
> +	}

All the same problems as in dla_data_write().

> +
> +	memcpy(dst, (void *)(((uint8_t *)ptr) + offset), size);

Use iosys_map_memcpy_from() here.

> +
> +	dma_buf_vunmap(buf, ptr);

'map' instead of 'ptr'

> +
> +end_cpu_access:
> +	dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
> +
> +put_dma_buf:
> +	dma_buf_put(buf);
> +
> +	return ret;
> +}
> +

> +
> +static int32_t nvdla_submit(struct drm_device *drm, void *arg,
> +					struct drm_file *file)
> +{
> +	int32_t err = 0;
> +	struct nvdla_task *task;
> +	struct nvdla_ioctl_submit_task local_task;
> +	struct nvdla_ioctl_submit_task __user *user_task;
> +	struct nvdla_device *nvdla_dev = dev_get_drvdata(drm->dev);
> +	struct nvdla_submit_args *args =
> +			(struct nvdla_submit_args *)arg;
> +
> +	user_task = (struct nvdla_ioctl_submit_task __user *)
> +			(uintptr_t)args->tasks;
> +	if (!user_task)
> +		return -EINVAL;
> +
> +	/* IOCTL copy descriptors */
> +	if (copy_from_user(&local_task, (void __user *)user_task,
> +			(sizeof(*user_task))))
> +		return -EFAULT;
> +
> +	task = kzalloc(sizeof(*task), GFP_KERNEL);
> +	if (task == NULL)
> +		return -EFAULT;
> +
> +	nvdla_dev->task = task;
> +	kref_init(&task->ref);
> +	task->nvdla_dev = nvdla_dev;
> +	task->file = file;
> +
> +	/* update task desc fields */
> +	err = nvdla_fill_task_desc(&local_task, task);
> +	if (err)
> +		goto free_task_desc;
> +
> +	err = nvdla_task_submit(nvdla_dev, task);
> +
> +	kfree(task->address_list);
> +
> +free_task_desc:
> +	kfree(task);
> +	return err;
> +}
> +
> +static int32_t nvdla_gem_alloc(struct nvdla_gem_object *nobj)
> +{
> +	struct drm_gem_object *dobj = &nobj->object;
> +	struct drm_device *drm = dobj->dev;
> +
> +	nobj->dma_attrs = DMA_ATTR_WRITE_COMBINE;
> +
> +	nobj->kvaddr = dma_alloc_attrs(drm->dev, dobj->size, &nobj->dma_addr,
> +						GFP_KERNEL, nobj->dma_attrs);
> +

Store an iosys-map address in nobj and initialize it with 
iosys_map_set_vaddr(); or iosys_map_set_vaddr_iomem() if you're working 
with I/O memory.

> +	if (!nobj->kvaddr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void nvdla_gem_free(struct nvdla_gem_object *nobj)
> +{
> +	struct drm_gem_object *dobj = &nobj->object;
> +	struct drm_device *drm = dobj->dev;
> +
> +	dma_free_attrs(drm->dev, dobj->size, nobj->kvaddr, nobj->dma_addr,
> +				nobj->dma_attrs);
> +}
> +
> +static void nvdla_gem_free_object(struct drm_gem_object *dobj)
> +{
> +	struct nvdla_gem_object *nobj;
> +
> +	drm_gem_free_mmap_offset(dobj);
> +
> +	nobj = to_nvdla_obj(dobj);
> +
> +	nvdla_gem_free(nobj);
> +
> +	kfree(nobj);
> +}
> +
> +static struct nvdla_gem_object *
> +nvdla_gem_create_object(struct drm_device *drm, uint32_t size)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *dobj;
> +	struct nvdla_gem_object *nobj;
> +
> +	size = round_up(size, PAGE_SIZE);
> +
> +	nobj = kzalloc(sizeof(*nobj), GFP_KERNEL);
> +	if (!nobj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dobj = &nobj->object;
> +
> +	drm_gem_private_object_init(drm, dobj, size);
> +
> +	ret = nvdla_gem_alloc(nobj);
> +	if (ret)
> +		goto free_nvdla_obj;
> +
> +	return nobj;
> +
> +free_nvdla_obj:
> +	kfree(nobj);
> +	return ERR_PTR(ret);
> +}
> +
> +static struct sg_table*
> +nvdla_drm_gem_prime_get_sg_table(struct drm_gem_object *dobj)
> +{
> +	int32_t ret;
> +	struct sg_table *sgt;
> +	struct drm_device *drm = dobj->dev;
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(dobj);
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = dma_get_sgtable_attrs(drm->dev, sgt, nobj->kvaddr,
> +				    nobj->dma_addr, dobj->size,
> +				    nobj->dma_attrs);
> +	if (ret) {
> +		DRM_ERROR("failed to allocate sgt, %d\n", ret);
> +		kfree(sgt);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return sgt;
> +}
> +
> +static int nvdla_drm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(obj);
> +
> +	map->vaddr = nobj->kvaddr;

Instead of kvaddr, store the pointer as struct iosys_map. Then simply 
copy it here, as in

    *map = nobj->map;

> +
> +	return 0;
> +}
> +
> +static void nvdla_drm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	/* Nothing to do */
> +}
> +
> +static int32_t nvdla_drm_gem_object_mmap(struct drm_gem_object *dobj,
> +					struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(dobj);
> +	struct drm_device *drm = dobj->dev;
> +
> +	vma->vm_flags &= ~VM_PFNMAP;
> +	vma->vm_pgoff = 0;

It's cleaner to do this as

    vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node)

> +
> +	ret = dma_mmap_attrs(drm->dev, vma, nobj->kvaddr, nobj->dma_addr,
> +			     dobj->size, nobj->dma_attrs);
> +	if (ret)
> +		drm_gem_vm_close(vma);
> +
> +	return ret;
> +}
> +
> +static const struct drm_gem_object_funcs nvdla_gem_object_funcs = {
> +	.free = nvdla_gem_free_object,
> +	.get_sg_table = nvdla_drm_gem_prime_get_sg_table,
> +	.vmap = nvdla_drm_gem_prime_vmap,
> +	.vunmap = nvdla_drm_gem_prime_vunmap,
> +	.mmap = nvdla_drm_gem_object_mmap,
> +};
> +
> +static struct nvdla_gem_object*
> +nvdla_gem_create_with_handle(struct drm_file *file_priv,
> +							 struct drm_device *drm, uint32_t size,
> +							 uint32_t *handle)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *dobj;
> +	struct nvdla_gem_object *nobj;
> +
> +	nobj = nvdla_gem_create_object(drm, size);
> +	if (IS_ERR(nobj))
> +		return ERR_CAST(nobj);
> +
> +	dobj = &nobj->object;
> +	dobj->funcs = &nvdla_gem_object_funcs;
> +	ret = drm_gem_handle_create(file_priv, dobj, handle);
> +	if (ret)
> +		goto free_drm_object;
> +
> +	drm_gem_object_put(dobj);
> +
> +	return nobj;
> +
> +free_drm_object:
> +	nvdla_gem_free_object(dobj);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static int32_t nvdla_gem_create(struct drm_device *drm, void *data,
> +								struct drm_file *file)
> +{
> +	struct nvdla_gem_object *nobj;
> +	struct nvdla_gem_create_args *args = data;
> +
> +	nobj = nvdla_gem_create_with_handle(file, drm, args->size,
> +					 &args->handle);
> +	if (IS_ERR(nobj))
> +		return PTR_ERR(nobj);
> +
> +	return 0;
> +}
> +
> +static int32_t nvdla_drm_gem_mmap_buf(struct drm_gem_object *obj,
> +									  struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +
> +	ret = drm_gem_mmap_obj(obj, obj->size, vma);
> +	if (ret)
> +		return ret;
> +
> +	return nvdla_drm_gem_object_mmap(obj, vma);
> +}
> +
> +static int32_t nvdla_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *obj;
> +
> +	ret = drm_gem_mmap(filp, vma);
> +	if (ret)
> +		return ret;
> +
> +	obj = vma->vm_private_data;
> +
> +	return nvdla_drm_gem_object_mmap(obj, vma);

I don't understand these two lines. This is part of what drm_gem_mmap() 
does. It shouldn't be necessary here.

> +}
> +
> +int32_t nvdla_gem_dma_addr(struct drm_device *dev, struct drm_file *file,
> +						   uint32_t fd, dma_addr_t *addr)
> +{
> +	int32_t ret;
> +	uint32_t handle;
> +	struct nvdla_gem_object *nobj;
> +	struct drm_gem_object *dobj;
> +
> +	ret = drm_gem_prime_fd_to_handle(dev, file, fd, &handle);
> +	if (ret)
> +		return ret;
> +
> +	dobj = drm_gem_object_lookup(file, handle);
> +	if (!dobj)
> +		return -EINVAL;
> +
> +	nobj = to_nvdla_obj(dobj);
> +
> +	*addr = nobj->dma_addr;
> +
> +	drm_gem_object_put(dobj);
> +
> +	return 0;
> +}
> +
> +static int32_t nvdla_gem_map_offset(struct drm_device *drm, void *data,
> +									struct drm_file *file)
> +{
> +	struct nvdla_gem_map_offset_args *args = data;
> +
> +	return drm_gem_dumb_map_offset(file, drm, args->handle,
> +								   &args->offset);
> +}
> +
> +static const struct file_operations nvdla_drm_fops = {
> +	.owner = THIS_MODULE,
> +	.open = drm_open,
> +	.release = drm_release,
> +	.unlocked_ioctl = drm_ioctl,
> +	.mmap = nvdla_drm_gem_mmap,

It should be fine to use drm_gem_mmap here. Then you should use 
DEFINE_DRM_GEM_FOPS() to define nvdla_drm_fops.

> +	.poll = drm_poll,
> +	.read = drm_read,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl = drm_compat_ioctl,
> +#endif
> +	.llseek = noop_llseek,
> +};
> +
> +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */
> +};
> +
> +static struct drm_driver nvdla_drm_driver = {
> +	.driver_features = DRIVER_GEM | DRIVER_RENDER,
> +
> +	.ioctls = nvdla_drm_ioctls,
> +	.num_ioctls = ARRAY_SIZE(nvdla_drm_ioctls),
> +	.fops = &nvdla_drm_fops,
> +	.gem_prime_mmap		= nvdla_drm_gem_mmap_buf,

Use drm_gem_prime_mmap() here.

Some context: the situation with these mmap functions has been confusing 
and inconsistent among DRM drivers. But we cleaned it up so that you 
only have to provide a minimal implementation of struct 
drm_gem_object_funcs.mmap.  All other mmap callbacks can then be filled 
with standard DRM helpers.

> +
> +	.name = "nvdla",
> +	.desc = "NVDLA driver",
> +	.date = "20171017",
> +	.major = 0,
> +	.minor = 0,
> +	.patchlevel = 0,
> +};
> +
> +int32_t nvdla_drm_probe(struct nvdla_device *nvdla_dev)
> +{
> +	int32_t err;
> +	struct drm_device *drm;
> +	struct drm_driver *driver = &nvdla_drm_driver;
> +
> +	drm = drm_dev_alloc(driver, &nvdla_dev->pdev->dev);
> +	if (IS_ERR(drm))
> +		return PTR_ERR(drm);
> +
> +	nvdla_dev->drm = drm;
> +
> +	err = drm_dev_register(drm, 0);
> +	if (err < 0)
> +		goto unref;
> +
> +	return 0;
> +
> +unref:
> +	drm_dev_put(drm);
> +	return err;
> +}
> +
> +void nvdla_drm_remove(struct nvdla_device *nvdla_dev)
> +{
> +	drm_dev_unregister(nvdla_dev->drm);
> +	drm_dev_put(nvdla_dev->drm);
> +}
Thomas Zimmermann April 21, 2022, 8:57 a.m. UTC | #3
Hi

Am 21.04.22 um 10:34 schrieb Christian König:
> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>> (Resending, as some MLs didn't like the size of the origninal mail.)
>>
>> Hi,
>>
>> thanks for your submission. Some general comments:
>>
>>   * some functions are prefixed with dla_, others use nvdla_. It seems 
>> arbitrary to me. Please use nvdla_ consistently throughout the source 
>> code.
>>
>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>> suggest to rearrange the error messages to not be located in the 
>> innermost functions.
> 
> If you plan to have multiple instances of the driver loaded at the same 
> time, using drm_dev_err(), drm_dev_warn() etc.. would be even better.

I think that's what I mean. Thanks for pointing this out.

Best regards
Thomas

> 
> BTW: I'm still absolutely not keen to enforcing drm_* log functions. So 
> if you prefer to stick with pr_err() and dev_err() we could discuss that 
> once more.
> 
> Regards,
> Christian.
> 
>>
>>   * Could you please split this patch into smaller pieces? It 
>> currently hits size limits of some mailing lists. Maybe add the 
>> register constants separately.
>>
>> Please find more review comments below. It's not a full review, but at 
>> least something to start with.
>>
>> Best regards
>> Thomas
>
Thomas Zimmermann April 21, 2022, 9:13 a.m. UTC | #4
Hi

Am 21.04.22 um 10:34 schrieb Christian König:
> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>> (Resending, as some MLs didn't like the size of the origninal mail.)
>>
>> Hi,
>>
>> thanks for your submission. Some general comments:
>>
>>   * some functions are prefixed with dla_, others use nvdla_. It seems 
>> arbitrary to me. Please use nvdla_ consistently throughout the source 
>> code.
>>
>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>> suggest to rearrange the error messages to not be located in the 
>> innermost functions.
> 
> If you plan to have multiple instances of the driver loaded at the same 
> time, using drm_dev_err(), drm_dev_warn() etc.. would be even better.

I thought that these functions exist, but looking for them now I cannot 
find them. The macros DRM_DEV_ERR(), etc are deprecated.

> 
> BTW: I'm still absolutely not keen to enforcing drm_* log functions. So 
> if you prefer to stick with pr_err() and dev_err() we could discuss that 
> once more.
> 
> Regards,
> Christian.
> 
>>
>>   * Could you please split this patch into smaller pieces? It 
>> currently hits size limits of some mailing lists. Maybe add the 
>> register constants separately.
>>
>> Please find more review comments below. It's not a full review, but at 
>> least something to start with.
>>
>> Best regards
>> Thomas
>
Kari Argillander April 21, 2022, 10:01 p.m. UTC | #5
This is just quick look up. I basically check some style issues and did
some basic static analyzing.

I have run
  - cppcheck (which found couple mistakes)
  - flawfinder (did not found anything to my eyes)
  - codespell (did find couple typo)

You can run these yourself also or check below.

Couple common things which you can ignore or not	.
- Usually in this code there is goto exit and it is just return. Maybe
   use just return straight away. No need to jump.
- Some comments start capital others not. Maybe all should start
   capital. Very small nit, but makes nice touch to the code.
- Lot of oneline comments are unneccessary three line comments.

On 19.4.2022 16.59, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
> 
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> new file mode 100644
> index 000000000000..225613f27acf
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_bdma.c

... snip

> +static int32_t
> +processor_bdma_program_slot(struct dla_engine *engine,
> +							struct dla_bdma_surface_desc *bdma_surface,
> +							struct dla_bdma_transfer_desc *transfer)
> +{
> +	int32_t ret = 0;
> +	uint64_t source_addr = 0;
> +	uint64_t destination_addr = 0;
> +	uint32_t high, low, reg;
> +	uint8_t  bdma_free_slots = 0;
> +
> +	/* make sure there're enough free slots */
> +	if (bdma_free_slots <= 0) {

This is always true right now.

> +		do {
> +			reg = bdma_reg_read(engine, STATUS);
> +			reg = (reg & MASK(BDMA_STATUS_0, FREE_SLOT)) >>
> +					SHIFT(BDMA_STATUS_0, FREE_SLOT);
> +		} while (reg == 0);
> +		bdma_free_slots = (uint8_t)reg;
> +	}
> +
> +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> +						transfer->source_address,
> +						(void *)&source_addr,
> +						DESTINATION_DMA);
> +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> +						transfer->destination_address,
> +						(void *)&destination_addr,
> +						DESTINATION_DMA);
> +
> +	ASSERT_GOTO((transfer->line_repeat <= 8192),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO((transfer->surface_repeat <= 8192),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO((transfer->line_size % 32) == 0,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->source_line >= transfer->line_size,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->destination_line >= transfer->line_size,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->source_surface >=
> +			(transfer->source_line * transfer->line_repeat),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->destination_surface >=
> +			(transfer->destination_line * transfer->line_repeat),
> +				ret, -EINVAL, exit);
> +
> +	/* config registers */
> +	high = upper_32_bits(source_addr);
> +	low = lower_32_bits(source_addr);
> +	bdma_reg_write(engine, CFG_SRC_ADDR_LOW, low);
> +	bdma_reg_write(engine, CFG_SRC_ADDR_HIGH, high);
> +	high = upper_32_bits(destination_addr);
> +	low = lower_32_bits(destination_addr);
> +	bdma_reg_write(engine, CFG_DST_ADDR_LOW, low);
> +	bdma_reg_write(engine, CFG_DST_ADDR_HIGH, high);
> +	bdma_reg_write(engine, CFG_LINE, (transfer->line_size >> 5) - 1);
> +	reg = (map_mem[bdma_surface->source_type] <<
> +				SHIFT(BDMA_CFG_CMD_0, SRC_RAM_TYPE)) |
> +		(map_mem[bdma_surface->destination_type] <<
> +				SHIFT(BDMA_CFG_CMD_0, DST_RAM_TYPE));
> +	bdma_reg_write(engine, CFG_CMD, reg);
> +	bdma_reg_write(engine, CFG_LINE_REPEAT, transfer->line_repeat - 1);
> +	bdma_reg_write(engine, CFG_SRC_LINE, transfer->source_line);
> +	bdma_reg_write(engine, CFG_DST_LINE, transfer->destination_line);
> +	bdma_reg_write(engine, CFG_SURF_REPEAT, transfer->surface_repeat - 1);
> +	bdma_reg_write(engine, CFG_SRC_SURF, transfer->source_surface);
> +	bdma_reg_write(engine, CFG_DST_SURF, transfer->destination_surface);
> +	bdma_reg_write(engine, CFG_OP, FIELD_ENUM(BDMA_CFG_OP_0, EN, ENABLE));
> +
> +exit:
> +	return ret;
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_cache.c b/drivers/gpu/drm/nvdla/nvdla_cache.c
> new file mode 100644
> index 000000000000..f8bd7b514aab
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_cache.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include "nvdla_common.h"
> +#include "nvdla_drm.h"
> +#include "nvdla_reg.h"
> +#include "nvdla_engine.h"
> +
> +#define DLA_OP_CACHE_SIZE (DLA_NUM_GROUPS * ((DLA_OP_NUM + 2) * 2))
> +
> +static struct dla_common_op_desc desc_cache[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> +static int32_t desc_refcount[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> +
> +void
> +dla_get_refcount(struct dla_common_op_desc *op_desc)
> +{
> +	int32_t i;
> +	struct dla_common_op_desc *desc = NULL;
> +
> +	if (op_desc == NULL)
> +		return;
> +
> +	if (op_desc->index == -1)
> +		return;
> +
> +	desc = &desc_cache[op_desc->op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == op_desc->index &&
> +				desc->roi_index == op_desc->roi_index) {

reverse if

		if (desc->index != op_desc->index)
			continue;
		if (desc->roi_index != op_desc->roi_index)
			continue;

> +			desc_refcount[op_desc->op_type][i]++;
> +			return;
> +		}
> +	}
> +}
> +
> +struct dla_common_op_desc *
> +dla_get_op_desc(struct dla_engine *engine,
> +				struct dla_task *task, int16_t index,
> +				uint8_t op_type, uint8_t roi_index)
> +{
> +	int32_t i;
> +	int32_t ret;
> +	uint64_t op_base;
> +	uint64_t dep_graph_addr;
> +	struct dla_common_op_desc *desc = NULL;
> +
> +	if (index == -1) {
> +		pr_debug("no desc get due to index==-1\n");
> +		goto exit;
> +	}
> +
> +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> +				engine->network->num_operations * roi_index);
> +
> +	desc = &desc_cache[op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == index && desc->roi_index == roi_index) {
> +			if (desc->op_type != op_type) {
> +				pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> +					   op_type, desc->op_type);
> +				continue;
> +			}

reverse if so this will be pretty clean

		if (desc->index != index)
			continue;
		if (desc->roi_index != roi_index)
			continue;
		if (desc->op_type != op_type) {
			pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
					op_type, desc->op_type);
			continue;
		}


> +			desc_refcount[op_type][i]++;
> +			goto exit;
> +		}
> +	}
> +
> +	desc = &desc_cache[op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == -1) {

reverse if
		if (desc->index != -1)
			continue;

> +			op_base = dep_graph_addr +
> +					(sizeof(struct dla_common_op_desc) *
> +					(uint64_t)index);
> +			ret = dla_data_read(engine->driver_context,
> +					task->task_data,
> +					task->dependency_graph_addr,
> +					(void *)(desc),
> +					sizeof(struct dla_common_op_desc),
> +					op_base);
> +			if (ret) {
> +				desc = NULL;
> +				goto exit;
> +			}
> +
> +			if (op_type != desc->op_type) {
> +				/*
> +				 * op_type of entry read from DRAM should not
> +				 * mismatch with given op_type. If they
> +				 * mismatches, then wrong entry is fetched, so
> +				 * report this issue by throwing error.
> +				 */
> +				pr_err("Fetched [op_type=%u] from DRAM doesn't match with op_type[%u]\n",
> +					   desc->op_type, op_type);
> +				desc->op_type = op_type;
> +				desc->index = -1;
> +				desc->roi_index = -1;
> +				desc = NULL;
> +				goto exit;
> +			}
> +
> +			desc->index = index;
> +			desc->roi_index = roi_index;
> +
> +			desc_refcount[op_type][i]++;
> +			goto exit;
> +		}
> +	}
> +
> +exit:
> +	return desc;
> +}
> +
> +static void
> +dla_free_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> +{
> +	uint64_t op_base;
> +	uint64_t dep_graph_addr;
> +	struct dla_task *task;
> +
> +	pr_debug("Enter: %s op desc index %u ROI %d\n", __func__,
> +				op_desc->index, op_desc->roi_index);

Possiple null pointer dereference

> +	task = engine->task;
> +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> +				engine->network->num_operations *
> +				op_desc->roi_index);
> +
> +	if (op_desc->index == -1)
> +		goto exit;

Possiple null pointer dereference

> +	if (op_desc == NULL)
> +		goto exit;

Or this is unnecessary.

> +
> +	/**
> +	 * TODO: keeping the depth value hardcoded as 0 for now,
> +	 * need to replace it once corresponding implementation is done.
> +	 */
> +	op_base = (dep_graph_addr +
> +			(sizeof(struct dla_common_op_desc) *
> +			(uint64_t)op_desc->index));
> +
> +	/**
> +	 * Flush descriptor to DRAM
> +	 */
> +	dla_data_write(engine->driver_context,
> +			task->task_data,
> +			(void *)op_desc,
> +			task->dependency_graph_addr,
> +			sizeof(struct dla_common_op_desc),
> +			op_base);
> +
> +	/**
> +	 * Release it
> +	 */
> +	op_desc->index = -1;
> +	op_desc->roi_index = -1;
> +exit:
> +	return;
> +}
> +
> +void
> +dla_put_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> +{
> +	int32_t i;
> +	struct dla_common_op_desc *desc;
> +
> +	if (op_desc == NULL)
> +		return;
> +
> +	if (op_desc->index == -1)
> +		return;
> +
> +	desc = &desc_cache[op_desc->op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == op_desc->index &&
> +				desc->roi_index == op_desc->roi_index) {

Reverse if.

		if (desc->index != op_desc->index)
			continue;
		if (desc->roi_index != op_desc->roi_index)
			continue;

> +
> +			desc_refcount[op_desc->op_type][i]--;
> +
> +			/**
> +			 * Free desc if refcount is 0
> +			 */
Pretty useless comment and totally not needed three line for this.

> +			if (desc_refcount[op_desc->op_type][i] == 0)
> +				dla_free_op_desc(engine, op_desc);
> +
> +			return;
> +		}
> +	}
> +}
> +
> +void
> +dla_init_op_cache(struct dla_engine *engine)
> +{
> +	int32_t i, j;
> +	struct dla_common_op_desc *desc = &desc_cache[0][0];
> +
> +	memset((uint8_t *)&desc_cache[0][0], 0, sizeof(desc_cache));
> +	memset((uint8_t *)&desc_refcount[0][0], 0, sizeof(desc_refcount));
> +
> +	for (i = 0; i < DLA_OP_NUM; i++) {
> +		for (j = 0; j < DLA_OP_CACHE_SIZE; j++) {
> +			desc->index = -1;
> +			desc->roi_index = -1;
> +			desc->op_type = (uint8_t)i;
> +			desc++;
> +		}
> +	}
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_common.h b/drivers/gpu/drm/nvdla/nvdla_common.h
> new file mode 100644
> index 000000000000..38cf43246890
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_common.h
> @@ -0,0 +1,835 @@

... snip

> +struct dla_conv_op_desc {
> +	/* Performance parameters */
> +
> +	/* dla_conv_mode */
> +	uint8_t conv_mode;
> +	uint8_t data_reuse;
> +	uint8_t weight_reuse;
> +	uint8_t skip_data_rls;
> +
> +	uint8_t skip_weight_rls;
> +	uint8_t reserved0;
> +	uint16_t entry_per_slice;
> +
> +	/* dla_data_format */
> +	uint8_t data_format;
> +	/* dla_pixel_mapping */
> +	uint8_t pixel_mapping;
> +	/* number of free slices before fetch */
> +	uint16_t fetch_grain;
> +
> +	uint8_t reserved_b[8];
> +
> +	/* batch_num */
> +	uint8_t batch;
> +	/* dla_weight_format */
> +	uint8_t weight_format;
> +	uint8_t data_bank;
> +	uint8_t weight_bank;
> +
> +	/* the offset in bytes of each data cube in a batch */
> +	uint32_t batch_stride;
> +
> +	uint8_t post_extension;
> +	uint8_t pixel_override;
> +	/* number of slices need to be released */
> +	uint16_t release;
> +
> +	 /* The input cube dimension for CSC */
> +	uint16_t input_width_csc;
> +	uint16_t input_height_csc;
> +
> +	uint16_t input_channel_csc;
> +	uint16_t kernel_width_csc;
> +
> +	uint16_t kernel_height_csc;
> +	uint16_t kernel_channel_csc;
> +
> +	/* The input cube dimension for CMAC */
> +	uint16_t input_width_cmac;
> +	uint16_t input_height_cmac;
> +
> +	/* actual size in bytes */
> +	uint32_t bytes_per_kernel;
> +
> +	/* Algorithm parameters */
> +
> +	int16_t mean_ry; /* mean value for red in RGB or Y in YUV */
> +	int16_t mean_gu; /* mean value for green in RGB or U in YUV */
> +
> +	int16_t mean_bv; /* mean value for blue in RGB or V in YUV */
> +	int16_t mean_ax;
> +
> +	uint8_t mean_format; /* dla_mean_format */
> +	uint8_t conv_stride_x;
> +	uint8_t conv_stride_y;
> +	uint8_t pad_x_left;
> +
> +	uint8_t pad_x_right;
> +	uint8_t pad_y_top;
> +	uint8_t pad_y_bottom;
> +	uint8_t dilation_x;
> +
> +	uint8_t dilation_y;
> +	uint8_t reserved2[2];
> +
> +	/* Precision parameters */
> +	uint8_t pra_truncate;
> +
> +	uint8_t in_precision;
> +	/* The output precision from CONV, it's the MAC processing precison */

./nvdla_common.h:428: precison ==> precision

> +	uint8_t out_precision;
> +	int16_t pad_val;
> +
> +	/* input converter parameters */
> +	struct dla_cvt_param in_cvt;
> +	/* output converter parameters, support truncate only */
> +	struct dla_cvt_param out_cvt;
> +
> +} __packed __aligned(4);
> +
> +struct dla_conv_stat_desc {
> +	uint32_t data_read_stall;
> +	uint32_t weight_read_stall;
> +	uint32_t data_read_latency;
> +	uint32_t weight_read_latency;
> +	uint32_t saturation_count;
> +	uint32_t nan_data_num;
> +	uint32_t nan_weight_num;
> +	uint32_t inf_data_num;
> +	uint32_t inf_weight_num;
> +} __packed __aligned(4);
> +
> +/**
> + * @ingroup SDP
> + * @name Activation functions
> + * @brief Activation functions supported in SDP
> + * @{
> + */
> +#define ACTIVATION_NONE		0
> +#define ACTIVATION_RELU		1
> +#define ACTIVATION_LUT		2
> +#define ACTIVATION_PRELU	3
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT size
> + * @brief LUT sizes for linear and exponentila LUT
> + * @{
> + */
> +#define LUT_LINEAR_EXP_TABLE_ENTRY_LOG2		6
> +#define LUT_LINEAR_ONLY_TABLE_ENTRY_LOG2	8
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT types
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_LINEAR_EXP_TABLE		0
> +#define LUT_LINEAR_ONLY_TABLE		1
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT methods
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_METHOD_EXPONENTIAL		0
> +#define LUT_METHOD_LINEAR		1
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_PRI_LINEAR_EXP		0
> +#define LUT_PRI_LINEAR_ONLY		1
> +/** @} */
> +
> +union dla_lut_offset {
> +	/**
> +	 * Number should be substracted on log domain before look up

./nvdla_common.h:505: substracted ==> subtracted

> +	 * exponetial table it has the same definition as hardware

./nvdla_common.h:506: exponetial ==> exponential

> +	 * thus input scaling should also take into account when
> +	 * set this field.
> +	 */
> +	int8_t exp_offset;
> +	/**
> +	 * Number of bits should be right shift before looking
> +	 * up linear table
> +	 */
> +	int8_t frac_bits;
> +	uint16_t reserved0;
> +};

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> new file mode 100644
> index 000000000000..9217eee1de3b
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> @@ -0,0 +1,695 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/time.h>
> +#include <linux/uaccess.h>
> +#include <linux/types.h>
> +
> +#include "nvdla_drm.h"
> +#include "nvdla_ioctl.h"
> +#include "nvdla_engine.h"
> +
> +static struct nvdla_config nvdla_config_os_initial = {
> +	.atom_size = 32,
> +	.bdma_enable = true,
> +	.rubik_enable = true,
> +	.weight_compress_support = true,
> +};
> +
> +static struct nvdla_config nvdla_config_small = {
> +	//.atom_size = 8,
> +	.atom_size = 32,  // nv_large config
> +	.bdma_enable = false,
> +	.rubik_enable = false,
> +	.weight_compress_support = false,
> +};
> +
> +int64_t dla_get_time_us(void)

Funtion is never used.

> +{
> +	return ktime_get_ns() / NSEC_PER_USEC;
> +}
> +
> +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return;
> +
> +	writel(reg, nvdla_dev->base + addr);
> +}
> +
> +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return 0;
> +
> +	return readl(nvdla_dev->base + addr);
> +}
> +
> +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> +{
> +	unsigned long flags;
> +	uint32_t mask;
> +	uint32_t reg;
> +	struct dla_processor *processor = NULL;
> +	struct dla_processor_group *group;
> +	struct dla_engine *engine;
> +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> +
> +	if (!nvdla_dev)
> +		return IRQ_NONE;
> +
> +	engine = nvdla_dev->engine_context;
> +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> +
> +	mask = glb_reg_read(engine, S_INTR_MASK);

Never used. It would be nice so that static analyzer will not complain
these anymore, but your choice what you want to do.

> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +
> +	glb_reg_write(engine, S_INTR_STATUS, reg);
> +	mask = glb_reg_read(engine, S_INTR_MASK);

Never used

> +	reg = glb_reg_read(engine, S_INTR_STATUS);

Never used.

> +
> +	complete(&nvdla_dev->event_notifier);
> +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> +
> +	return IRQ_HANDLED;
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_gem.c b/drivers/gpu/drm/nvdla/nvdla_gem.c
> new file mode 100644
> index 000000000000..cccf6d01a564
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_gem.c

... snip

> +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */

./nvdla_gem.c:347: destory ==> destroy

> +};

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_scheduler.c b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> new file mode 100644
> index 000000000000..b814077478c6
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_scheduler.c

... snip

> +static int
> +dla_update_dependency(struct dla_engine *engine,
> +					  struct dla_consumer *consumer,
> +					  struct dla_common_op_desc *op_desc,
> +					  uint8_t event, uint8_t roi_index)
> +{
> +	int32_t ret = 0;
> +	struct dla_processor *processor;
> +
> +	if (consumer->index == -1)
> +		goto exit;
> +
> +	/* Update dependency only if event matches */
> +	if (event != consumer->event)
> +		goto exit;
> +
> +	/**
> +	 * If consumer index is valid but op desc is NULL means
> +	 * op desc for consumer was not pre-fetched
> +	 */
> +	if (op_desc == NULL) {
> +		ret = -EINVAL;
> +		pr_err("Operation descriptor is NULL, consumer index %d",
> +				consumer->index);
> +		goto exit;
> +	}
> +
> +	pr_debug("Update dependency operation index %d ROI %d DEP_COUNT=%d\n",
> +					op_desc->index, op_desc->roi_index,
> +					op_desc->dependency_count);
> +	op_desc->dependency_count--;
> +
> +	if (op_desc->dependency_count == 0) {
> +		processor = &engine->processors[op_desc->op_type];
> +		pr_debug("enable %s in %s as depdency are resolved\n",

./nvdla_scheduler.c:455: depdency ==> dependency

> +			processor->name, __func__);
> +
> +		ret = dla_enable_operation(engine, processor, op_desc);
> +		if (ret)
> +			goto exit;
> +	}
> +exit:
> +	return ret;
> +}

... snip

> +int
> +dla_process_events(struct dla_engine *engine, uint32_t *task_complete)
> +{
> +	int32_t i;
> +	int32_t ret = 0;
> +
> +	for (i = 0; i < DLA_OP_NUM; i++) {
> +		struct dla_processor *processor;
> +
> +		processor = &engine->processors[i];
> +		ret = dla_handle_events(engine, processor);
> +		/**
> +		 * Incase engine status is non-zero, then don't

./nvdla_scheduler.c:905: Incase ==> In case

> +		 * update the engine status. We should keep its
> +		 * status for later cleaning of engine.
> +		 */
> +		if (!engine->status)
> +			engine->status = ret;
> +	}
> +
> +	if (engine->network->num_operations == engine->num_proc_hwl)
> +		*task_complete = 1;
> +
> +	return ret;
> +}

... snip

   Argillander
Cai Huoqing April 25, 2022, 2:28 p.m. UTC | #6
On 22 4月 22 01:01:14, Kari Argillander wrote:
> This is just quick look up. I basically check some style issues and did
> some basic static analyzing.
> 
> I have run
>  - cppcheck (which found couple mistakes)
>  - flawfinder (did not found anything to my eyes)
>  - codespell (did find couple typo)
> 
> You can run these yourself also or check below.
> 
> Couple common things which you can ignore or not	.
> - Usually in this code there is goto exit and it is just return. Maybe
>   use just return straight away. No need to jump.
> - Some comments start capital others not. Maybe all should start
>   capital. Very small nit, but makes nice touch to the code.
> - Lot of oneline comments are unneccessary three line comments.
> 
> On 19.4.2022 16.59, Cai Huoqing wrote:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add driver support for this accelerator.
> > 
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> > new file mode 100644
> > index 000000000000..225613f27acf
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> 
> ... snip
> 
> > +static int32_t
> > +processor_bdma_program_slot(struct dla_engine *engine,
> > +							struct dla_bdma_surface_desc *bdma_surface,
> > +							struct dla_bdma_transfer_desc *transfer)
> > +{
> > +	int32_t ret = 0;
> > +	uint64_t source_addr = 0;
> > +	uint64_t destination_addr = 0;
> > +	uint32_t high, low, reg;
> > +	uint8_t  bdma_free_slots = 0;
> > +
> > +	/* make sure there're enough free slots */
> > +	if (bdma_free_slots <= 0) {
> 
> This is always true right now.
> 
> > +		do {
> > +			reg = bdma_reg_read(engine, STATUS);
> > +			reg = (reg & MASK(BDMA_STATUS_0, FREE_SLOT)) >>
> > +					SHIFT(BDMA_STATUS_0, FREE_SLOT);
> > +		} while (reg == 0);
> > +		bdma_free_slots = (uint8_t)reg;
> > +	}
> > +
> > +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> > +						transfer->source_address,
> > +						(void *)&source_addr,
> > +						DESTINATION_DMA);
> > +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> > +						transfer->destination_address,
> > +						(void *)&destination_addr,
> > +						DESTINATION_DMA);
> > +
> > +	ASSERT_GOTO((transfer->line_repeat <= 8192),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO((transfer->surface_repeat <= 8192),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO((transfer->line_size % 32) == 0,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->source_line >= transfer->line_size,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->destination_line >= transfer->line_size,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->source_surface >=
> > +			(transfer->source_line * transfer->line_repeat),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->destination_surface >=
> > +			(transfer->destination_line * transfer->line_repeat),
> > +				ret, -EINVAL, exit);
> > +
> > +	/* config registers */
> > +	high = upper_32_bits(source_addr);
> > +	low = lower_32_bits(source_addr);
> > +	bdma_reg_write(engine, CFG_SRC_ADDR_LOW, low);
> > +	bdma_reg_write(engine, CFG_SRC_ADDR_HIGH, high);
> > +	high = upper_32_bits(destination_addr);
> > +	low = lower_32_bits(destination_addr);
> > +	bdma_reg_write(engine, CFG_DST_ADDR_LOW, low);
> > +	bdma_reg_write(engine, CFG_DST_ADDR_HIGH, high);
> > +	bdma_reg_write(engine, CFG_LINE, (transfer->line_size >> 5) - 1);
> > +	reg = (map_mem[bdma_surface->source_type] <<
> > +				SHIFT(BDMA_CFG_CMD_0, SRC_RAM_TYPE)) |
> > +		(map_mem[bdma_surface->destination_type] <<
> > +				SHIFT(BDMA_CFG_CMD_0, DST_RAM_TYPE));
> > +	bdma_reg_write(engine, CFG_CMD, reg);
> > +	bdma_reg_write(engine, CFG_LINE_REPEAT, transfer->line_repeat - 1);
> > +	bdma_reg_write(engine, CFG_SRC_LINE, transfer->source_line);
> > +	bdma_reg_write(engine, CFG_DST_LINE, transfer->destination_line);
> > +	bdma_reg_write(engine, CFG_SURF_REPEAT, transfer->surface_repeat - 1);
> > +	bdma_reg_write(engine, CFG_SRC_SURF, transfer->source_surface);
> > +	bdma_reg_write(engine, CFG_DST_SURF, transfer->destination_surface);
> > +	bdma_reg_write(engine, CFG_OP, FIELD_ENUM(BDMA_CFG_OP_0, EN, ENABLE));
> > +
> > +exit:
> > +	return ret;
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_cache.c b/drivers/gpu/drm/nvdla/nvdla_cache.c
> > new file mode 100644
> > index 000000000000..f8bd7b514aab
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_cache.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> > + * Copyright (C) 2022 Cai Huoqing
> > + */
> > +
> > +#include "nvdla_common.h"
> > +#include "nvdla_drm.h"
> > +#include "nvdla_reg.h"
> > +#include "nvdla_engine.h"
> > +
> > +#define DLA_OP_CACHE_SIZE (DLA_NUM_GROUPS * ((DLA_OP_NUM + 2) * 2))
> > +
> > +static struct dla_common_op_desc desc_cache[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> > +static int32_t desc_refcount[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> > +
> > +void
> > +dla_get_refcount(struct dla_common_op_desc *op_desc)
> > +{
> > +	int32_t i;
> > +	struct dla_common_op_desc *desc = NULL;
> > +
> > +	if (op_desc == NULL)
> > +		return;
> > +
> > +	if (op_desc->index == -1)
> > +		return;
> > +
> > +	desc = &desc_cache[op_desc->op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == op_desc->index &&
> > +				desc->roi_index == op_desc->roi_index) {
> 
> reverse if
> 
> 		if (desc->index != op_desc->index)
> 			continue;
> 		if (desc->roi_index != op_desc->roi_index)
> 			continue;
> 
> > +			desc_refcount[op_desc->op_type][i]++;
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +struct dla_common_op_desc *
> > +dla_get_op_desc(struct dla_engine *engine,
> > +				struct dla_task *task, int16_t index,
> > +				uint8_t op_type, uint8_t roi_index)
> > +{
> > +	int32_t i;
> > +	int32_t ret;
> > +	uint64_t op_base;
> > +	uint64_t dep_graph_addr;
> > +	struct dla_common_op_desc *desc = NULL;
> > +
> > +	if (index == -1) {
> > +		pr_debug("no desc get due to index==-1\n");
> > +		goto exit;
> > +	}
> > +
> > +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> > +				engine->network->num_operations * roi_index);
> > +
> > +	desc = &desc_cache[op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == index && desc->roi_index == roi_index) {
> > +			if (desc->op_type != op_type) {
> > +				pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> > +					   op_type, desc->op_type);
> > +				continue;
> > +			}
> 
> reverse if so this will be pretty clean
> 
> 		if (desc->index != index)
> 			continue;
> 		if (desc->roi_index != roi_index)
> 			continue;
> 		if (desc->op_type != op_type) {
> 			pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> 					op_type, desc->op_type);
> 			continue;
> 		}
> 
> 
> > +			desc_refcount[op_type][i]++;
> > +			goto exit;
> > +		}
> > +	}
> > +
> > +	desc = &desc_cache[op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == -1) {
> 
> reverse if
> 		if (desc->index != -1)
> 			continue;
> 
> > +			op_base = dep_graph_addr +
> > +					(sizeof(struct dla_common_op_desc) *
> > +					(uint64_t)index);
> > +			ret = dla_data_read(engine->driver_context,
> > +					task->task_data,
> > +					task->dependency_graph_addr,
> > +					(void *)(desc),
> > +					sizeof(struct dla_common_op_desc),
> > +					op_base);
> > +			if (ret) {
> > +				desc = NULL;
> > +				goto exit;
> > +			}
> > +
> > +			if (op_type != desc->op_type) {
> > +				/*
> > +				 * op_type of entry read from DRAM should not
> > +				 * mismatch with given op_type. If they
> > +				 * mismatches, then wrong entry is fetched, so
> > +				 * report this issue by throwing error.
> > +				 */
> > +				pr_err("Fetched [op_type=%u] from DRAM doesn't match with op_type[%u]\n",
> > +					   desc->op_type, op_type);
> > +				desc->op_type = op_type;
> > +				desc->index = -1;
> > +				desc->roi_index = -1;
> > +				desc = NULL;
> > +				goto exit;
> > +			}
> > +
> > +			desc->index = index;
> > +			desc->roi_index = roi_index;
> > +
> > +			desc_refcount[op_type][i]++;
> > +			goto exit;
> > +		}
> > +	}
> > +
> > +exit:
> > +	return desc;
> > +}
> > +
> > +static void
> > +dla_free_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> > +{
> > +	uint64_t op_base;
> > +	uint64_t dep_graph_addr;
> > +	struct dla_task *task;
> > +
> > +	pr_debug("Enter: %s op desc index %u ROI %d\n", __func__,
> > +				op_desc->index, op_desc->roi_index);
> 
> Possiple null pointer dereference
> 
> > +	task = engine->task;
> > +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> > +				engine->network->num_operations *
> > +				op_desc->roi_index);
> > +
> > +	if (op_desc->index == -1)
> > +		goto exit;
> 
> Possiple null pointer dereference
> 
> > +	if (op_desc == NULL)
> > +		goto exit;
> 
> Or this is unnecessary.
> 
> > +
> > +	/**
> > +	 * TODO: keeping the depth value hardcoded as 0 for now,
> > +	 * need to replace it once corresponding implementation is done.
> > +	 */
> > +	op_base = (dep_graph_addr +
> > +			(sizeof(struct dla_common_op_desc) *
> > +			(uint64_t)op_desc->index));
> > +
> > +	/**
> > +	 * Flush descriptor to DRAM
> > +	 */
> > +	dla_data_write(engine->driver_context,
> > +			task->task_data,
> > +			(void *)op_desc,
> > +			task->dependency_graph_addr,
> > +			sizeof(struct dla_common_op_desc),
> > +			op_base);
> > +
> > +	/**
> > +	 * Release it
> > +	 */
> > +	op_desc->index = -1;
> > +	op_desc->roi_index = -1;
> > +exit:
> > +	return;
> > +}
> > +
> > +void
> > +dla_put_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> > +{
> > +	int32_t i;
> > +	struct dla_common_op_desc *desc;
> > +
> > +	if (op_desc == NULL)
> > +		return;
> > +
> > +	if (op_desc->index == -1)
> > +		return;
> > +
> > +	desc = &desc_cache[op_desc->op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == op_desc->index &&
> > +				desc->roi_index == op_desc->roi_index) {
> 
> Reverse if.
> 
> 		if (desc->index != op_desc->index)
> 			continue;
> 		if (desc->roi_index != op_desc->roi_index)
> 			continue;
> 
> > +
> > +			desc_refcount[op_desc->op_type][i]--;
> > +
> > +			/**
> > +			 * Free desc if refcount is 0
> > +			 */
> Pretty useless comment and totally not needed three line for this.
> 
> > +			if (desc_refcount[op_desc->op_type][i] == 0)
> > +				dla_free_op_desc(engine, op_desc);
> > +
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +void
> > +dla_init_op_cache(struct dla_engine *engine)
> > +{
> > +	int32_t i, j;
> > +	struct dla_common_op_desc *desc = &desc_cache[0][0];
> > +
> > +	memset((uint8_t *)&desc_cache[0][0], 0, sizeof(desc_cache));
> > +	memset((uint8_t *)&desc_refcount[0][0], 0, sizeof(desc_refcount));
> > +
> > +	for (i = 0; i < DLA_OP_NUM; i++) {
> > +		for (j = 0; j < DLA_OP_CACHE_SIZE; j++) {
> > +			desc->index = -1;
> > +			desc->roi_index = -1;
> > +			desc->op_type = (uint8_t)i;
> > +			desc++;
> > +		}
> > +	}
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_common.h b/drivers/gpu/drm/nvdla/nvdla_common.h
> > new file mode 100644
> > index 000000000000..38cf43246890
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_common.h
> > @@ -0,0 +1,835 @@
> 
> ... snip
> 
> > +struct dla_conv_op_desc {
> > +	/* Performance parameters */
> > +
> > +	/* dla_conv_mode */
> > +	uint8_t conv_mode;
> > +	uint8_t data_reuse;
> > +	uint8_t weight_reuse;
> > +	uint8_t skip_data_rls;
> > +
> > +	uint8_t skip_weight_rls;
> > +	uint8_t reserved0;
> > +	uint16_t entry_per_slice;
> > +
> > +	/* dla_data_format */
> > +	uint8_t data_format;
> > +	/* dla_pixel_mapping */
> > +	uint8_t pixel_mapping;
> > +	/* number of free slices before fetch */
> > +	uint16_t fetch_grain;
> > +
> > +	uint8_t reserved_b[8];
> > +
> > +	/* batch_num */
> > +	uint8_t batch;
> > +	/* dla_weight_format */
> > +	uint8_t weight_format;
> > +	uint8_t data_bank;
> > +	uint8_t weight_bank;
> > +
> > +	/* the offset in bytes of each data cube in a batch */
> > +	uint32_t batch_stride;
> > +
> > +	uint8_t post_extension;
> > +	uint8_t pixel_override;
> > +	/* number of slices need to be released */
> > +	uint16_t release;
> > +
> > +	 /* The input cube dimension for CSC */
> > +	uint16_t input_width_csc;
> > +	uint16_t input_height_csc;
> > +
> > +	uint16_t input_channel_csc;
> > +	uint16_t kernel_width_csc;
> > +
> > +	uint16_t kernel_height_csc;
> > +	uint16_t kernel_channel_csc;
> > +
> > +	/* The input cube dimension for CMAC */
> > +	uint16_t input_width_cmac;
> > +	uint16_t input_height_cmac;
> > +
> > +	/* actual size in bytes */
> > +	uint32_t bytes_per_kernel;
> > +
> > +	/* Algorithm parameters */
> > +
> > +	int16_t mean_ry; /* mean value for red in RGB or Y in YUV */
> > +	int16_t mean_gu; /* mean value for green in RGB or U in YUV */
> > +
> > +	int16_t mean_bv; /* mean value for blue in RGB or V in YUV */
> > +	int16_t mean_ax;
> > +
> > +	uint8_t mean_format; /* dla_mean_format */
> > +	uint8_t conv_stride_x;
> > +	uint8_t conv_stride_y;
> > +	uint8_t pad_x_left;
> > +
> > +	uint8_t pad_x_right;
> > +	uint8_t pad_y_top;
> > +	uint8_t pad_y_bottom;
> > +	uint8_t dilation_x;
> > +
> > +	uint8_t dilation_y;
> > +	uint8_t reserved2[2];
> > +
> > +	/* Precision parameters */
> > +	uint8_t pra_truncate;
> > +
> > +	uint8_t in_precision;
> > +	/* The output precision from CONV, it's the MAC processing precison */
> 
> ./nvdla_common.h:428: precison ==> precision
> 
> > +	uint8_t out_precision;
> > +	int16_t pad_val;
> > +
> > +	/* input converter parameters */
> > +	struct dla_cvt_param in_cvt;
> > +	/* output converter parameters, support truncate only */
> > +	struct dla_cvt_param out_cvt;
> > +
> > +} __packed __aligned(4);
> > +
> > +struct dla_conv_stat_desc {
> > +	uint32_t data_read_stall;
> > +	uint32_t weight_read_stall;
> > +	uint32_t data_read_latency;
> > +	uint32_t weight_read_latency;
> > +	uint32_t saturation_count;
> > +	uint32_t nan_data_num;
> > +	uint32_t nan_weight_num;
> > +	uint32_t inf_data_num;
> > +	uint32_t inf_weight_num;
> > +} __packed __aligned(4);
> > +
> > +/**
> > + * @ingroup SDP
> > + * @name Activation functions
> > + * @brief Activation functions supported in SDP
> > + * @{
> > + */
> > +#define ACTIVATION_NONE		0
> > +#define ACTIVATION_RELU		1
> > +#define ACTIVATION_LUT		2
> > +#define ACTIVATION_PRELU	3
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT size
> > + * @brief LUT sizes for linear and exponentila LUT
> > + * @{
> > + */
> > +#define LUT_LINEAR_EXP_TABLE_ENTRY_LOG2		6
> > +#define LUT_LINEAR_ONLY_TABLE_ENTRY_LOG2	8
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT types
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_LINEAR_EXP_TABLE		0
> > +#define LUT_LINEAR_ONLY_TABLE		1
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT methods
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_METHOD_EXPONENTIAL		0
> > +#define LUT_METHOD_LINEAR		1
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_PRI_LINEAR_EXP		0
> > +#define LUT_PRI_LINEAR_ONLY		1
> > +/** @} */
> > +
> > +union dla_lut_offset {
> > +	/**
> > +	 * Number should be substracted on log domain before look up
> 
> ./nvdla_common.h:505: substracted ==> subtracted
> 
> > +	 * exponetial table it has the same definition as hardware
> 
> ./nvdla_common.h:506: exponetial ==> exponential
> 
> > +	 * thus input scaling should also take into account when
> > +	 * set this field.
> > +	 */
> > +	int8_t exp_offset;
> > +	/**
> > +	 * Number of bits should be right shift before looking
> > +	 * up linear table
> > +	 */
> > +	int8_t frac_bits;
> > +	uint16_t reserved0;
> > +};
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> > new file mode 100644
> > index 000000000000..9217eee1de3b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> > @@ -0,0 +1,695 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> > + * Copyright (C) 2022 Cai Huoqing
> > + */
> > +
> > +#include <linux/dma-buf.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/fs.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/of_device.h>
> > +#include <linux/of_irq.h>
> > +#include <linux/of_platform.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/printk.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/time.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/types.h>
> > +
> > +#include "nvdla_drm.h"
> > +#include "nvdla_ioctl.h"
> > +#include "nvdla_engine.h"
> > +
> > +static struct nvdla_config nvdla_config_os_initial = {
> > +	.atom_size = 32,
> > +	.bdma_enable = true,
> > +	.rubik_enable = true,
> > +	.weight_compress_support = true,
> > +};
> > +
> > +static struct nvdla_config nvdla_config_small = {
> > +	//.atom_size = 8,
> > +	.atom_size = 32,  // nv_large config
> > +	.bdma_enable = false,
> > +	.rubik_enable = false,
> > +	.weight_compress_support = false,
> > +};
> > +
> > +int64_t dla_get_time_us(void)
> 
> Funtion is never used.
> 
> > +{
> > +	return ktime_get_ns() / NSEC_PER_USEC;
> > +}
> > +
> > +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> > +{
> > +	struct nvdla_device *nvdla_dev =
> > +			(struct nvdla_device *)driver_context;
> > +
> > +	if (!nvdla_dev)
> > +		return;
> > +
> > +	writel(reg, nvdla_dev->base + addr);
> > +}
> > +
> > +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> > +{
> > +	struct nvdla_device *nvdla_dev =
> > +			(struct nvdla_device *)driver_context;
> > +
> > +	if (!nvdla_dev)
> > +		return 0;
> > +
> > +	return readl(nvdla_dev->base + addr);
> > +}
> > +
> > +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> > +{
> > +	unsigned long flags;
> > +	uint32_t mask;
> > +	uint32_t reg;
> > +	struct dla_processor *processor = NULL;
> > +	struct dla_processor_group *group;
> > +	struct dla_engine *engine;
> > +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> > +
> > +	if (!nvdla_dev)
> > +		return IRQ_NONE;
> > +
> > +	engine = nvdla_dev->engine_context;
> > +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> > +
> > +	mask = glb_reg_read(engine, S_INTR_MASK);
> 
> Never used. It would be nice so that static analyzer will not complain
> these anymore, but your choice what you want to do.
thanks for your check. this line is an read clear register to clear interrupt,
it'is ok to leave here.
for others, code style and typo. I will try to fix

Thanks,
Cai
> 
> > +	reg = glb_reg_read(engine, S_INTR_STATUS);
> > +
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_SDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_SDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_RUBIK];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_RUBIK];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_PDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_PDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_BDMA];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_BDMA];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> > +	}
> > +
> > +	glb_reg_write(engine, S_INTR_STATUS, reg);
> > +	mask = glb_reg_read(engine, S_INTR_MASK);
> 
> Never used
> 
> > +	reg = glb_reg_read(engine, S_INTR_STATUS);
> 
> Never used.
> 
> > +
> > +	complete(&nvdla_dev->event_notifier);
> > +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> > +
> > +	return IRQ_HANDLED;
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_gem.c b/drivers/gpu/drm/nvdla/nvdla_gem.c
> > new file mode 100644
> > index 000000000000..cccf6d01a564
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_gem.c
> 
> ... snip
> 
> > +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> > +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> > +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> > +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> > +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */
> 
> ./nvdla_gem.c:347: destory ==> destroy
> 
> > +};
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_scheduler.c b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> > new file mode 100644
> > index 000000000000..b814077478c6
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> 
> ... snip
> 
> > +static int
> > +dla_update_dependency(struct dla_engine *engine,
> > +					  struct dla_consumer *consumer,
> > +					  struct dla_common_op_desc *op_desc,
> > +					  uint8_t event, uint8_t roi_index)
> > +{
> > +	int32_t ret = 0;
> > +	struct dla_processor *processor;
> > +
> > +	if (consumer->index == -1)
> > +		goto exit;
> > +
> > +	/* Update dependency only if event matches */
> > +	if (event != consumer->event)
> > +		goto exit;
> > +
> > +	/**
> > +	 * If consumer index is valid but op desc is NULL means
> > +	 * op desc for consumer was not pre-fetched
> > +	 */
> > +	if (op_desc == NULL) {
> > +		ret = -EINVAL;
> > +		pr_err("Operation descriptor is NULL, consumer index %d",
> > +				consumer->index);
> > +		goto exit;
> > +	}
> > +
> > +	pr_debug("Update dependency operation index %d ROI %d DEP_COUNT=%d\n",
> > +					op_desc->index, op_desc->roi_index,
> > +					op_desc->dependency_count);
> > +	op_desc->dependency_count--;
> > +
> > +	if (op_desc->dependency_count == 0) {
> > +		processor = &engine->processors[op_desc->op_type];
> > +		pr_debug("enable %s in %s as depdency are resolved\n",
> 
> ./nvdla_scheduler.c:455: depdency ==> dependency
> 
> > +			processor->name, __func__);
> > +
> > +		ret = dla_enable_operation(engine, processor, op_desc);
> > +		if (ret)
> > +			goto exit;
> > +	}
> > +exit:
> > +	return ret;
> > +}
> 
> ... snip
> 
> > +int
> > +dla_process_events(struct dla_engine *engine, uint32_t *task_complete)
> > +{
> > +	int32_t i;
> > +	int32_t ret = 0;
> > +
> > +	for (i = 0; i < DLA_OP_NUM; i++) {
> > +		struct dla_processor *processor;
> > +
> > +		processor = &engine->processors[i];
> > +		ret = dla_handle_events(engine, processor);
> > +		/**
> > +		 * Incase engine status is non-zero, then don't
> 
> ./nvdla_scheduler.c:905: Incase ==> In case
> 
> > +		 * update the engine status. We should keep its
> > +		 * status for later cleaning of engine.
> > +		 */
> > +		if (!engine->status)
> > +			engine->status = ret;
> > +	}
> > +
> > +	if (engine->network->num_operations == engine->num_proc_hwl)
> > +		*task_complete = 1;
> > +
> > +	return ret;
> > +}
> 
> ... snip
> 
>   Argillander