mbox series

[v7,00/13] AV1 stateless decoder for RK3588

Message ID 20230503083438.85139-1-benjamin.gaignard@collabora.com
Headers show
Series AV1 stateless decoder for RK3588 | expand

Message

Benjamin Gaignard May 3, 2023, 8:34 a.m. UTC
This series implement AV1 stateless decoder for RK3588 SoC.
The hardware support 8 and 10 bits bitstreams up to 7680x4320.
AV1 feature like film grain or scaling are done by the postprocessor.
The driver can produce NV12_4L4, NV12_10LE40_4L4, NV12 and P010 pixels formats.
Even if Rockchip have named the hardware VPU981 it looks like a VC9000 but 
with a different registers mapping.

The full branch can be found here:
https://gitlab.collabora.com/linux/for-upstream/-/commits/rk3588_av1_decoder_v7

Fluster score is: 200/239 while testing AV1-TEST-VECTORS with GStreamer-AV1-V4L2SL-Gst1.0.
The failing tests are:
- the 2 tests with 2 spatial layers: few errors in luma/chroma values
- tests with resolution < hardware limit (64x64)
- 10bits film grain test: bad macroblocks while decoding, the same 8bits
  test is working fine.

Changes in v7:
- Rebased on media_tree master branch.
- Fix warnings exposed by W=1
- Fix Angelo's comments

Changes in v6:
- Rename NV12_10LE40_4L4 pixel format into NV15_4L4.
- Add defines for post-proc selection.
- Change patch order as requested by Nicolas.
- Fix frame-larger-than warning.

Changes in v5:
- Add a patch to initialize bit_depth field of V4L2_CTRL_TYPE_AV1_SEQUENCE
  ioctl.

Changes in v4:
- Squash "Save bit depth for AV1 decoder" and "Check AV1 bitstreams bit
  depth" patches.
- Double motion vectors buffer size.
- Fix the various errors reported by Hans.

Changes in v3:
- Fix arrays loops limites.
- Remove unused field.
- Reset raw pixel formats list when bit depth or film grain feature
  values change.
- Enable post-processor P010 support

Changes in v2:
- Remove useless +1 in sbs computation. 
- Describe NV12_10LE40_4L4 pixels format.
- Post-processor could generate P010.
- Fix comments done on v1.
- The last patch make sure that only post-processed formats are used when film
  grain feature is enabled.

Benjamin


Benjamin Gaignard (12):
  dt-bindings: media: rockchip-vpu: Add rk3588 vpu compatible
  media: AV1: Make sure that bit depth in correctly initialize
  media: Add NV15_4L4 pixel format
  media: verisilicon: Get bit depth for V4L2_PIX_FMT_NV15_4L4
  media: verisilicon: Add AV1 decoder mode and controls
  media: verisilicon: Check AV1 bitstreams bit depth
  media: verisilicon: Compute motion vectors size for AV1 frames
  media: verisilicon: Add AV1 entropy helpers
  media: verisilicon: Add Rockchip AV1 decoder
  media: verisilicon: Add film grain feature to AV1 driver
  media: verisilicon: Enable AV1 decoder on rk3588
  media: verisilicon: Conditionally ignore native formats

Nicolas Dufresne (1):
  v4l2-common: Add support for fractional bpp

 .../bindings/media/rockchip-vpu.yaml          |    1 +
 .../media/v4l/pixfmt-yuv-planar.rst           |   16 +
 drivers/media/platform/verisilicon/Makefile   |    3 +
 drivers/media/platform/verisilicon/hantro.h   |    8 +
 .../media/platform/verisilicon/hantro_drv.c   |   68 +-
 .../media/platform/verisilicon/hantro_hw.h    |  102 +
 .../platform/verisilicon/hantro_postproc.c    |    9 +-
 .../media/platform/verisilicon/hantro_v4l2.c  |   67 +-
 .../media/platform/verisilicon/hantro_v4l2.h  |    8 +-
 .../verisilicon/rockchip_av1_entropymode.c    | 4424 +++++++++++++++++
 .../verisilicon/rockchip_av1_entropymode.h    |  272 +
 .../verisilicon/rockchip_av1_filmgrain.c      |  401 ++
 .../verisilicon/rockchip_av1_filmgrain.h      |   36 +
 .../verisilicon/rockchip_vpu981_hw_av1_dec.c  | 2232 +++++++++
 .../verisilicon/rockchip_vpu981_regs.h        |  477 ++
 .../platform/verisilicon/rockchip_vpu_hw.c    |  134 +
 drivers/media/v4l2-core/v4l2-common.c         |  162 +-
 drivers/media/v4l2-core/v4l2-ctrls-core.c     |    5 +
 drivers/media/v4l2-core/v4l2-ioctl.c          |    1 +
 include/media/v4l2-common.h                   |    2 +
 include/uapi/linux/videodev2.h                |    1 +
 21 files changed, 8326 insertions(+), 103 deletions(-)
 create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_entropymode.c
 create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_entropymode.h
 create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_filmgrain.c
 create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_filmgrain.h
 create mode 100644 drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c
 create mode 100644 drivers/media/platform/verisilicon/rockchip_vpu981_regs.h

Comments

Nicolas Dufresne May 25, 2023, 3:18 p.m. UTC | #1
Le mercredi 03 mai 2023 à 10:34 +0200, Benjamin Gaignard a écrit :
> This series implement AV1 stateless decoder for RK3588 SoC.
> The hardware support 8 and 10 bits bitstreams up to 7680x4320.
> AV1 feature like film grain or scaling are done by the postprocessor.
> The driver can produce NV12_4L4, NV12_10LE40_4L4, NV12 and P010 pixels formats.
> Even if Rockchip have named the hardware VPU981 it looks like a VC9000 but 
> with a different registers mapping.

Just wanted to add a comment about the series. After discussion with Ezequiel,
it seems this driver is getting quite big, at some point we should probably
split it, so the newer chips have a driver free from legacy, and also free from
having encoders in the same driver. But as usual, who makes the work tend to
rule, and this one did not turned too badly, a lot of the extra work ended
related to issues with the VP9/G2 integration. G2 cores are much closer to
VC8000D/9000D then G1 and might have been the right moment to make the split,
but we kind of miss that opportunity. The difference is that G2 had highly
limited post processing support, VC8000D/9000D have the same post processor,
which is similar in functionality to the G1 post processor.

**Informative, feel free to skip the rest**

Because I was asked at least twice last week, the reason for having encoders in
that driver is that the Hantro H1 and G1 cores shares the same cache storage,
and cannot be run concurrently. The solution that was picked was to place them
both in the same driver, sharing the same m2m ctx and leaving it to the kernel
scheduler to dispatch on both encoder and decoder cores. Newer VSI cores can run
concurrently.

In newer chips generation, notably RKVDEC2 and probably VC9000D (we don't have
spec), the multicore model will not work with such a scheduling model since for
8k stream you need to use 2 cores, where you can use those two concurrently for
other streams. The scheduling will have to be fancier to ensure we don't starve
the 8K+ streams. Having a clean driver to do so will help. This AV1 core does
not have 8K support with this method of binding two cores together, so I see no
harm going forward similarly to what we did for the G2 core (HEVC/VP9). Perhaps
there is going to be challenges in possibly moving the implementation in future,
I'd be very happy to get feedback about this, so the we can help Ezequiel here
in making sure this driver does not become a giant blob of unrelated but similar
chips all under the same driver.

regards,
Nicolas

> 
> The full branch can be found here:
> https://gitlab.collabora.com/linux/for-upstream/-/commits/rk3588_av1_decoder_v7
> 
> Fluster score is: 200/239 while testing AV1-TEST-VECTORS with GStreamer-AV1-V4L2SL-Gst1.0.
> The failing tests are:
> - the 2 tests with 2 spatial layers: few errors in luma/chroma values
> - tests with resolution < hardware limit (64x64)
> - 10bits film grain test: bad macroblocks while decoding, the same 8bits
>   test is working fine.
> 
> Changes in v7:
> - Rebased on media_tree master branch.
> - Fix warnings exposed by W=1
> - Fix Angelo's comments
> 
> Changes in v6:
> - Rename NV12_10LE40_4L4 pixel format into NV15_4L4.
> - Add defines for post-proc selection.
> - Change patch order as requested by Nicolas.
> - Fix frame-larger-than warning.
> 
> Changes in v5:
> - Add a patch to initialize bit_depth field of V4L2_CTRL_TYPE_AV1_SEQUENCE
>   ioctl.
> 
> Changes in v4:
> - Squash "Save bit depth for AV1 decoder" and "Check AV1 bitstreams bit
>   depth" patches.
> - Double motion vectors buffer size.
> - Fix the various errors reported by Hans.
> 
> Changes in v3:
> - Fix arrays loops limites.
> - Remove unused field.
> - Reset raw pixel formats list when bit depth or film grain feature
>   values change.
> - Enable post-processor P010 support
> 
> Changes in v2:
> - Remove useless +1 in sbs computation. 
> - Describe NV12_10LE40_4L4 pixels format.
> - Post-processor could generate P010.
> - Fix comments done on v1.
> - The last patch make sure that only post-processed formats are used when film
>   grain feature is enabled.
> 
> Benjamin
> 
> 
> Benjamin Gaignard (12):
>   dt-bindings: media: rockchip-vpu: Add rk3588 vpu compatible
>   media: AV1: Make sure that bit depth in correctly initialize
>   media: Add NV15_4L4 pixel format
>   media: verisilicon: Get bit depth for V4L2_PIX_FMT_NV15_4L4
>   media: verisilicon: Add AV1 decoder mode and controls
>   media: verisilicon: Check AV1 bitstreams bit depth
>   media: verisilicon: Compute motion vectors size for AV1 frames
>   media: verisilicon: Add AV1 entropy helpers
>   media: verisilicon: Add Rockchip AV1 decoder
>   media: verisilicon: Add film grain feature to AV1 driver
>   media: verisilicon: Enable AV1 decoder on rk3588
>   media: verisilicon: Conditionally ignore native formats
> 
> Nicolas Dufresne (1):
>   v4l2-common: Add support for fractional bpp
> 
>  .../bindings/media/rockchip-vpu.yaml          |    1 +
>  .../media/v4l/pixfmt-yuv-planar.rst           |   16 +
>  drivers/media/platform/verisilicon/Makefile   |    3 +
>  drivers/media/platform/verisilicon/hantro.h   |    8 +
>  .../media/platform/verisilicon/hantro_drv.c   |   68 +-
>  .../media/platform/verisilicon/hantro_hw.h    |  102 +
>  .../platform/verisilicon/hantro_postproc.c    |    9 +-
>  .../media/platform/verisilicon/hantro_v4l2.c  |   67 +-
>  .../media/platform/verisilicon/hantro_v4l2.h  |    8 +-
>  .../verisilicon/rockchip_av1_entropymode.c    | 4424 +++++++++++++++++
>  .../verisilicon/rockchip_av1_entropymode.h    |  272 +
>  .../verisilicon/rockchip_av1_filmgrain.c      |  401 ++
>  .../verisilicon/rockchip_av1_filmgrain.h      |   36 +
>  .../verisilicon/rockchip_vpu981_hw_av1_dec.c  | 2232 +++++++++
>  .../verisilicon/rockchip_vpu981_regs.h        |  477 ++
>  .../platform/verisilicon/rockchip_vpu_hw.c    |  134 +
>  drivers/media/v4l2-core/v4l2-common.c         |  162 +-
>  drivers/media/v4l2-core/v4l2-ctrls-core.c     |    5 +
>  drivers/media/v4l2-core/v4l2-ioctl.c          |    1 +
>  include/media/v4l2-common.h                   |    2 +
>  include/uapi/linux/videodev2.h                |    1 +
>  21 files changed, 8326 insertions(+), 103 deletions(-)
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_entropymode.c
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_entropymode.h
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_filmgrain.c
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_av1_filmgrain.h
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c
>  create mode 100644 drivers/media/platform/verisilicon/rockchip_vpu981_regs.h
>