diff mbox series

media: cedrus: h264: Decoding fails or produces artifacts

Message ID ZhMNNazVsu8EXqMU@login.tika.stderr.nl
State New
Headers show
Series media: cedrus: h264: Decoding fails or produces artifacts | expand

Commit Message

Matthijs Kooijman April 7, 2024, 9:16 p.m. UTC
Hi,

I've been chasing a decoder bug in the cedrus h264 hardware decoder and after a
few days have been able to work around it by removing the
DMA_ATTR_NO_KERNEL_MAPPING argument when allocating the neighbour info dma
buffer (full patch at the end of the mail):

Comments

Sebastian Fricke April 10, 2024, 9:55 a.m. UTC | #1
Hey Matthijs,

On 07.04.2024 23:16, Matthijs Kooijman wrote:
>Hi,
>
>I've been chasing a decoder bug in the cedrus h264 hardware decoder and after a
>few days have been able to work around it by removing the
>DMA_ATTR_NO_KERNEL_MAPPING argument when allocating the neighbour info dma
>buffer (full patch at the end of the mail):
>
>--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>@@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
>        ctx->codec.h264.neighbor_info_buf =
>                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
>                                &ctx->codec.h264.neighbor_info_buf_dma,
>-                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
>+                               GFP_KERNEL, 0);
>
>This does not seem like a proper fix to me (or maybe it is - I do not really
>understand the problem yet), but it very effective.

thanks for the patch, I don't think we can take it as due to, as you
highlighted, it relies on unknown side effects. Perhaps if you or
someone else in the community can investigate this further and answer
some more questions we can move it forward.

Some questions to help move the discussions along:

- Does the platform have an IOMMU?
- What does the IOMMU driver do when it sees the
   DMA_ATTR_NO_KERNEL_MAPPING flag?
- On the platform in question do coherent allocations get handled with
   cached snooped mappings or uncached?
- Does changing that flag affect any cacheablility
- Does the codec block expect any particular intialized data in that
   buffer?

Greetings,
Sebastian

>
>
>The actual problem is that sometimes (about 10%-50%) the first frame
>of a h264 stream incorrectly decodes, producing all-zero data in the bottom
>part of the frame. See attached error-green.png for an example. In some even
>more rare cases, more subtle decoding errors would occur, for example a lighter
>rectangle in the bottom left dark blue band in error-blue-bar.png, or some
>misplaced pixels in the bottom right "noise" area (compare error-noise.png with
>error-blue-bar.png to see).
>
>To reproduce:
>
>	# Create a h264-encoded 320x240 single-frame stream
>	gst-launch-1.0 videotestsrc num-buffers=1 ! openh264enc ! filesink location=test.h264
>	# Decode it again using hardware decoding (repeat until seeing a broken frame)
>	gst-launch-1.0 filesrc location=test.h264 ! v4l2slh264dec ! videoconvert ! pngenc ! filesink location=test.png
>
>I've been testing on an Orange Pi PC with an allwinner H3 running
>Armbian 22.04 (based on Ubuntu Jammy), a 6.7.12 kernel (mostly with
>Armbian-patched, but also clean mainline) and gstreamer
>1.20.3. The problem also occurred on older kernel versions (tested clean
>mainline down to 6.2 and then a bit earlier, but 6.1 mainline would
>not boot for me). I have also observed the problem (or at least the same
>symptom) on the Armbian-patched 6.1 kernel, but there it was extremely unlikely
>(saw it once in 700 or so playbacks).
>
>I've seen three variants of this issue:
>
> 1. With the 320x240 test pattern, showing the green area in the frame.
>    In this case, the decoder would actually return an error IRQ status
>    (VE_H264_STATUS register would be 0x10010003 or 0x70010003, instead
>    of the normal 0x60000001, bit 0 is VE_H264_STATUS_SLICE_DECODE_INT,
>    bit 1 is VE_H264_STATUS_DECODE_ERR_INT).
>
> 2. With the 320x240 test pattern, showing more subtle decoding errors (as
>    shown in attachments).  In these cases, the IRQ status would not indicate a
>    decoding error.
>
> 3. With another test video (1920x1080) that I cannot share, there would
>    be the same green areas (occasionally also green squares splattered
>    around the frame), but *no* decoding error in the IRQ status.
>
>None of these three issues occur anymore with the workaround applied (both on
>clean 6.7.12 and on top of the Armbian patches), the gstreamer decoding then
>produces bit-for-bit identical results every time (tested hundreds of
>decodings).
>
>
>I have no clue why removing DMA_ATTR_NO_KERNEL_MAPPING from this
>allocation helps - I just tried it on a whim when I nothing else seemed
>to help. I suspect that mapping the memory into kernel space might have
>some side effect that helps, but I am not familiar enough with either
>the cedrus hardware or the DMA system to tell.
>
>I initially thought there might be an alignment issue and tried
>increasing the allocation sizes of all DMA buffers to 256K (which
>I think enforces a bigger alignment), but that did not help. The reason
>I was looking at memory allocation (and also timing differences, but no
>luck there) was that bisecting pointed me at commit fec94f8c995 ("media:
>cedrus: h264: Optimize mv col buffer allocation") which seemed to make
>the problem 2-4× likely to occur (but looking back, I think I might have
>been bisecting noise there).
>
>
>I hope you folks can make more sense of this to figure out a proper fix.
>For the short term my problem is solved (I'll just apply this patch and ship
>it), so will not be investing more time right now.
>
>If there's anything I can contribute, just let me know (I have done some tests
>with extra debug output added to the kernel, and a testscript for repeatedly
>testing the decoding that I could share, and I am happy to test any
>patches/hypothesis that you have).
>
>Gr.
>
>Matthijs
>
>diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>index dfb401df138..7cd3b6a5434 100644
>--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
>@@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
>        ctx->codec.h264.neighbor_info_buf =
>                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
>                                &ctx->codec.h264.neighbor_info_buf_dma,
>-                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
>+                               GFP_KERNEL, 0);
>        if (!ctx->codec.h264.neighbor_info_buf) {
>                ret = -ENOMEM;
>                goto err_pic_buf;
>@@ -634,7 +634,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
>        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
>                       ctx->codec.h264.neighbor_info_buf,
>                       ctx->codec.h264.neighbor_info_buf_dma,
>-                      DMA_ATTR_NO_KERNEL_MAPPING);
>+                      0);
>
> err_pic_buf:
>        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
>@@ -670,7 +670,7 @@ static void cedrus_h264_stop(struct cedrus_ctx *ctx)
>        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
>                       ctx->codec.h264.neighbor_info_buf,
>                       ctx->codec.h264.neighbor_info_buf_dma,
>-                      DMA_ATTR_NO_KERNEL_MAPPING);
>+                      0);
>        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
>                       ctx->codec.h264.pic_info_buf,
>                       ctx->codec.h264.pic_info_buf_dma,
Matthijs Kooijman April 10, 2024, 10:09 a.m. UTC | #2
Hi Sebastian,

On Wed, Apr 10, 2024 at 11:55:55AM +0200, Sebastian Fricke wrote:
> Hey Matthijs,
> 
> On 07.04.2024 23:16, Matthijs Kooijman wrote:
> > Hi,
> > 
> > I've been chasing a decoder bug in the cedrus h264 hardware decoder and after a
> > few days have been able to work around it by removing the
> > DMA_ATTR_NO_KERNEL_MAPPING argument when allocating the neighbour info dma
> > buffer (full patch at the end of the mail):
> > 
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
> >        ctx->codec.h264.neighbor_info_buf =
> >                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
> >                                &ctx->codec.h264.neighbor_info_buf_dma,
> > -                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
> > +                               GFP_KERNEL, 0);
> > 
> > This does not seem like a proper fix to me (or maybe it is - I do not really
> > understand the problem yet), but it very effective.
> 
> thanks for the patch, I don't think we can take it as due to, as you
> highlighted, it relies on unknown side effects. Perhaps if you or
> someone else in the community can investigate this further and answer
> some more questions we can move it forward.
> 
> Some questions to help move the discussions along:
> 
> - Does the platform have an IOMMU?
> - What does the IOMMU driver do when it sees the
>   DMA_ATTR_NO_KERNEL_MAPPING flag?
> - On the platform in question do coherent allocations get handled with
>   cached snooped mappings or uncached?
> - Does changing that flag affect any cacheablility
> - Does the codec block expect any particular intialized data in that
>   buffer?

I have no idea about any of these, but I agree that they are good
questions to ask.

I've CC'd linux-sunxi@lists.linux.dev in this mail, maybe someone there
can answer these questions for the Allwinner H3 platform.


> > The actual problem is that sometimes (about 10%-50%) the first frame
> > of a h264 stream incorrectly decodes, producing all-zero data in the bottom
> > part of the frame. See attached error-green.png for an example. In some even
> > more rare cases, more subtle decoding errors would occur, for example a lighter
> > rectangle in the bottom left dark blue band in error-blue-bar.png, or some
> > misplaced pixels in the bottom right "noise" area (compare error-noise.png with
> > error-blue-bar.png to see).
> > 
> > To reproduce:
> > 
> > 	# Create a h264-encoded 320x240 single-frame stream
> > 	gst-launch-1.0 videotestsrc num-buffers=1 ! openh264enc ! filesink location=test.h264
> > 	# Decode it again using hardware decoding (repeat until seeing a broken frame)
> > 	gst-launch-1.0 filesrc location=test.h264 ! v4l2slh264dec ! videoconvert ! pngenc ! filesink location=test.png
> > 
> > I've been testing on an Orange Pi PC with an allwinner H3 running
> > Armbian 22.04 (based on Ubuntu Jammy), a 6.7.12 kernel (mostly with
> > Armbian-patched, but also clean mainline) and gstreamer
> > 1.20.3. The problem also occurred on older kernel versions (tested clean
> > mainline down to 6.2 and then a bit earlier, but 6.1 mainline would
> > not boot for me). I have also observed the problem (or at least the same
> > symptom) on the Armbian-patched 6.1 kernel, but there it was extremely unlikely
> > (saw it once in 700 or so playbacks).
> > 
> > I've seen three variants of this issue:
> > 
> > 1. With the 320x240 test pattern, showing the green area in the frame.
> >    In this case, the decoder would actually return an error IRQ status
> >    (VE_H264_STATUS register would be 0x10010003 or 0x70010003, instead
> >    of the normal 0x60000001, bit 0 is VE_H264_STATUS_SLICE_DECODE_INT,
> >    bit 1 is VE_H264_STATUS_DECODE_ERR_INT).
> > 
> > 2. With the 320x240 test pattern, showing more subtle decoding errors (as
> >    shown in attachments).  In these cases, the IRQ status would not indicate a
> >    decoding error.
> > 
> > 3. With another test video (1920x1080) that I cannot share, there would
> >    be the same green areas (occasionally also green squares splattered
> >    around the frame), but *no* decoding error in the IRQ status.
> > 
> > None of these three issues occur anymore with the workaround applied (both on
> > clean 6.7.12 and on top of the Armbian patches), the gstreamer decoding then
> > produces bit-for-bit identical results every time (tested hundreds of
> > decodings).
> > 
> > 
> > I have no clue why removing DMA_ATTR_NO_KERNEL_MAPPING from this
> > allocation helps - I just tried it on a whim when I nothing else seemed
> > to help. I suspect that mapping the memory into kernel space might have
> > some side effect that helps, but I am not familiar enough with either
> > the cedrus hardware or the DMA system to tell.
> > 
> > I initially thought there might be an alignment issue and tried
> > increasing the allocation sizes of all DMA buffers to 256K (which
> > I think enforces a bigger alignment), but that did not help. The reason
> > I was looking at memory allocation (and also timing differences, but no
> > luck there) was that bisecting pointed me at commit fec94f8c995 ("media:
> > cedrus: h264: Optimize mv col buffer allocation") which seemed to make
> > the problem 2-4× likely to occur (but looking back, I think I might have
> > been bisecting noise there).
> > 
> > 
> > I hope you folks can make more sense of this to figure out a proper fix.
> > For the short term my problem is solved (I'll just apply this patch and ship
> > it), so will not be investing more time right now.
> > 
> > If there's anything I can contribute, just let me know (I have done some tests
> > with extra debug output added to the kernel, and a testscript for repeatedly
> > testing the decoding that I could share, and I am happy to test any
> > patches/hypothesis that you have).
> > 
> > Gr.
> > 
> > Matthijs
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > index dfb401df138..7cd3b6a5434 100644
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
> >        ctx->codec.h264.neighbor_info_buf =
> >                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
> >                                &ctx->codec.h264.neighbor_info_buf_dma,
> > -                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
> > +                               GFP_KERNEL, 0);
> >        if (!ctx->codec.h264.neighbor_info_buf) {
> >                ret = -ENOMEM;
> >                goto err_pic_buf;
> > @@ -634,7 +634,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx)
> >        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
> >                       ctx->codec.h264.neighbor_info_buf,
> >                       ctx->codec.h264.neighbor_info_buf_dma,
> > -                      DMA_ATTR_NO_KERNEL_MAPPING);
> > +                      0);
> > 
> > err_pic_buf:
> >        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
> > @@ -670,7 +670,7 @@ static void cedrus_h264_stop(struct cedrus_ctx *ctx)
> >        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
> >                       ctx->codec.h264.neighbor_info_buf,
> >                       ctx->codec.h264.neighbor_info_buf_dma,
> > -                      DMA_ATTR_NO_KERNEL_MAPPING);
> > +                      0);
> >        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
> >                       ctx->codec.h264.pic_info_buf,
> >                       ctx->codec.h264.pic_info_buf_dma,
diff mbox series

Patch

--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -581,7 +581,7 @@  static int cedrus_h264_start(struct cedrus_ctx *ctx)
        ctx->codec.h264.neighbor_info_buf =
                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
                                &ctx->codec.h264.neighbor_info_buf_dma,
-                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
+                               GFP_KERNEL, 0);

This does not seem like a proper fix to me (or maybe it is - I do not really
understand the problem yet), but it very effective.


The actual problem is that sometimes (about 10%-50%) the first frame
of a h264 stream incorrectly decodes, producing all-zero data in the bottom
part of the frame. See attached error-green.png for an example. In some even
more rare cases, more subtle decoding errors would occur, for example a lighter
rectangle in the bottom left dark blue band in error-blue-bar.png, or some
misplaced pixels in the bottom right "noise" area (compare error-noise.png with
error-blue-bar.png to see).

To reproduce:

	# Create a h264-encoded 320x240 single-frame stream
	gst-launch-1.0 videotestsrc num-buffers=1 ! openh264enc ! filesink location=test.h264
	# Decode it again using hardware decoding (repeat until seeing a broken frame)
	gst-launch-1.0 filesrc location=test.h264 ! v4l2slh264dec ! videoconvert ! pngenc ! filesink location=test.png

I've been testing on an Orange Pi PC with an allwinner H3 running
Armbian 22.04 (based on Ubuntu Jammy), a 6.7.12 kernel (mostly with
Armbian-patched, but also clean mainline) and gstreamer
1.20.3. The problem also occurred on older kernel versions (tested clean
mainline down to 6.2 and then a bit earlier, but 6.1 mainline would
not boot for me). I have also observed the problem (or at least the same
symptom) on the Armbian-patched 6.1 kernel, but there it was extremely unlikely
(saw it once in 700 or so playbacks).

I've seen three variants of this issue:

 1. With the 320x240 test pattern, showing the green area in the frame.
    In this case, the decoder would actually return an error IRQ status
    (VE_H264_STATUS register would be 0x10010003 or 0x70010003, instead
    of the normal 0x60000001, bit 0 is VE_H264_STATUS_SLICE_DECODE_INT,
    bit 1 is VE_H264_STATUS_DECODE_ERR_INT).

 2. With the 320x240 test pattern, showing more subtle decoding errors (as
    shown in attachments).  In these cases, the IRQ status would not indicate a
    decoding error.

 3. With another test video (1920x1080) that I cannot share, there would
    be the same green areas (occasionally also green squares splattered
    around the frame), but *no* decoding error in the IRQ status.

None of these three issues occur anymore with the workaround applied (both on
clean 6.7.12 and on top of the Armbian patches), the gstreamer decoding then
produces bit-for-bit identical results every time (tested hundreds of
decodings).


I have no clue why removing DMA_ATTR_NO_KERNEL_MAPPING from this
allocation helps - I just tried it on a whim when I nothing else seemed
to help. I suspect that mapping the memory into kernel space might have
some side effect that helps, but I am not familiar enough with either
the cedrus hardware or the DMA system to tell.

I initially thought there might be an alignment issue and tried
increasing the allocation sizes of all DMA buffers to 256K (which
I think enforces a bigger alignment), but that did not help. The reason
I was looking at memory allocation (and also timing differences, but no
luck there) was that bisecting pointed me at commit fec94f8c995 ("media:
cedrus: h264: Optimize mv col buffer allocation") which seemed to make
the problem 2-4× likely to occur (but looking back, I think I might have
been bisecting noise there).


I hope you folks can make more sense of this to figure out a proper fix.
For the short term my problem is solved (I'll just apply this patch and ship
it), so will not be investing more time right now.

If there's anything I can contribute, just let me know (I have done some tests
with extra debug output added to the kernel, and a testscript for repeatedly
testing the decoding that I could share, and I am happy to test any
patches/hypothesis that you have).

Gr.

Matthijs

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index dfb401df138..7cd3b6a5434 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -581,7 +581,7 @@  static int cedrus_h264_start(struct cedrus_ctx *ctx)
        ctx->codec.h264.neighbor_info_buf =
                dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
                                &ctx->codec.h264.neighbor_info_buf_dma,
-                               GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING);
+                               GFP_KERNEL, 0);
        if (!ctx->codec.h264.neighbor_info_buf) {
                ret = -ENOMEM;
                goto err_pic_buf;
@@ -634,7 +634,7 @@  static int cedrus_h264_start(struct cedrus_ctx *ctx)
        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
                       ctx->codec.h264.neighbor_info_buf,
                       ctx->codec.h264.neighbor_info_buf_dma,
-                      DMA_ATTR_NO_KERNEL_MAPPING);
+                      0);
 
 err_pic_buf:
        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
@@ -670,7 +670,7 @@  static void cedrus_h264_stop(struct cedrus_ctx *ctx)
        dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE,
                       ctx->codec.h264.neighbor_info_buf,
                       ctx->codec.h264.neighbor_info_buf_dma,
-                      DMA_ATTR_NO_KERNEL_MAPPING);
+                      0);
        dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size,
                       ctx->codec.h264.pic_info_buf,
                       ctx->codec.h264.pic_info_buf_dma,