[v2] media: verisilicon: Fix crash when probing encoder

Message ID	20230413104756.356695-1-benjamin.gaignard@collabora.com
State	Accepted
Commit	f100ce3bbd6aa0093075b20b9dbd006686f6aedf
Headers	show Return-Path: <linux-media-owner@vger.kernel.org> sender: benjamin.gaignard) by madras.collabora.co.uk (Postfix) with ESMTPSA id 5A6AC660320E; Thu, 13 Apr 2023 11:48:04 +0100 (BST) From: Benjamin Gaignard <benjamin.gaignard@collabora.com> To: ezequiel@vanguardiasur.com.ar, p.zabel@pengutronix.de, mchehab@kernel.org, m.szyprowski@samsung.com Cc: linux-media@vger.kernel.org, linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org, kernel@collabora.com, Benjamin Gaignard <benjamin.gaignard@collabora.com> Subject: [PATCH v2] media: verisilicon: Fix crash when probing encoder Date: Thu, 13 Apr 2023 12:47:56 +0200 Message-Id: <20230413104756.356695-1-benjamin.gaignard@collabora.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] media: verisilicon: Fix crash when probing encoder \| expand [v2] media: verisilicon: Fix crash when probing encoder

Benjamin Gaignard April 13, 2023, 10:47 a.m. UTC

ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
so assigne it to vpu_fmt led to crash the kernel.
Like for decoder case use 'fmt' as format for encoder and clean up
the code.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
---
version 2:
- Remove useless vpu_fmt.

 drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Diederik de Haas May 19, 2023, 10:34 p.m. UTC | #1

Hi,

On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> > Benjamin,
> > 
> > Please include the crash stracktrace in the commit.
> 
> Careful with HTML message, they don't always make it in these ML and tooling
> might not play well with the tooling. Perhaps it can be edited while
> pulling ? Here's the info from Marek's bug report:
> 
> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> /dev/video0
> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> /dev/video1
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000008
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
> Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> videobuf2_dma_contig snd_soc_simple_card display_connector
> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> x_tables ipv6
> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> Hardware name: Hardkernel ODROID-M1 (DT)
> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> ...
> Call trace:
>   hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>   hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
>   hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
>   hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
>   hantro_reset_fmts+0x94/0xcc [hantro_vpu]
>   hantro_open+0xd4/0x20c [hantro_vpu]
>   v4l2_open+0x80/0x120 [videodev]
>   chrdev_open+0xc0/0x22c
>   do_dentry_open+0x13c/0x490
>   vfs_open+0x2c/0x38
>   path_openat+0x550/0x938
>   do_filp_open+0x80/0x12c
>   do_sys_openat2+0xb4/0x16c
>   __arm64_sys_openat+0x64/0xac
>   invoke_syscall+0x48/0x114
>   el0_svc_common.constprop.0+0xfc/0x11c
>   do_el0_svc+0x38/0xa4
>   el0_svc+0x48/0xb8
>   el0t_64_sync_handler+0xb8/0xbc
>   el0t_64_sync+0x190/0x194
> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> ---[ end trace 0000000000000000 ]---

When I booted into my 6.4-rc1 (but also rc2) kernel on my
Pine64 Quartz64 Model A, I noticed a crash which seems the same as
above, but I didn't have such a crash with my 6.3 kernel.
Searching for 'hantro' led me to this commit as the most likely culprit
but when I build a new 6.4-rcX kernel with this commit reverted,
I still had this crash.
Do you have suggestions which commit would then be the likely culprit?

Cheers,
  Diederik

For completeness, this is the error I got with 6.4-rcX:

[   26.976766] panfrost fde60000.gpu: clock rate = 594000000
[   26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000
[   26.996012] random: crng init done
[   27.072438] videodev: Linux video capture interface: v2.00
[   27.119012] Registered IR keymap rc-cec
[   27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0
[   27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0
[   27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1
[   27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400
[   27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7
[   27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1
[   27.148920] gpio-fan gpio_fan: GPIO fan initialized
[   27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1
[   27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0
[   27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0
[   27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1
[   27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1
[   27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   27.574238] Mem abort info:
[   27.574499]   ESR = 0x0000000096000004
[   27.574836]   EC = 0x25: DABT (current EL), IL = 32 bits
[   27.575310]   SET = 0, FnV = 0
[   27.575586]   EA = 0, S1PTW = 0
[   27.575868]   FSC = 0x04: level 0 translation fault
[   27.576368] Data abort info:
[   27.576637]   ISV = 0, ISS = 0x00000004
[   27.576980]   CM = 0, WnR = 0
[   27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000
[   27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[   27.578430] Internal error: Oops: 0000000096000004 [#1] SMP
[   27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E)
[   27.579108]  crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E)
[   27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G            E      6.4.0-0-pine64-arm64 #1  Debian 6.4~rc2-1~pine64
[   27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
[   27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[   27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu]
[   27.594811] sp : ffff80000aa9b670
[   27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000
[   27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0
[   27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000
[   27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010
[   27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0
[   27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001
[   27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002
[   27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003
[   27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff
[   27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253
[   27.601470] Call trace:
[   27.601693]  hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[   27.602143]  hantro_set_fmt_out+0x44/0x388 [hantro_vpu]
[   27.602617]  hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu]
[   27.603096]  hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu]
[   27.603575]  hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu]
[   27.604086]  hantro_reset_fmts+0x20/0x48 [hantro_vpu]
[   27.604545]  hantro_open+0xe0/0x208 [hantro_vpu]
[   27.604965]  v4l2_open+0x84/0x130 [videodev]
[   27.605389]  chrdev_open+0xd8/0x2d8
[   27.605712]  do_dentry_open+0x1bc/0x490
[   27.606060]  vfs_open+0x34/0x40
[   27.606345]  path_openat+0x9c8/0xf20
[   27.606670]  do_filp_open+0xa4/0x160
[   27.606991]  do_sys_openat2+0xc8/0x188
[   27.607327]  __arm64_sys_openat+0x6c/0xb8
[   27.607687]  invoke_syscall+0x78/0x108
[   27.608029]  el0_svc_common.constprop.0+0xd4/0x100
[   27.608456]  do_el0_svc+0x40/0xa8
[   27.608755]  el0_svc+0x34/0xd8
[   27.609036]  el0t_64_sync_handler+0xf4/0x120
[   27.609420]  el0t_64_sync+0x190/0x198
[   27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821) 
[   27.610297] ---[ end trace 0000000000000000 ]---

Diederik de Haas May 22, 2023, 10:38 p.m. UTC | #2

On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> >> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> >>> Benjamin,
> >>> 
> >>> Please include the crash stracktrace in the commit.
> >> 
> >> Careful with HTML message, they don't always make it in these ML and
> >> tooling might not play well with the tooling. Perhaps it can be edited
> >> while pulling ? Here's the info from Marek's bug report:
> >> 
> >> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> >> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> >> /dev/video0
> >> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> >> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> >> /dev/video1
> >> Unable to handle kernel NULL pointer dereference at virtual address
> >> 0000000000000008
> >> Mem abort info:
> >> ESR = 0x0000000096000004
> >> EC = 0x25: DABT (current EL), IL = 32 bits
> >> SET = 0, FnV = 0
> >> EA = 0, S1PTW = 0
> >> FSC = 0x04: level 0 translation fault
> >> Data abort info:
> >> ISV = 0, ISS = 0x00000004
> >> CM = 0, WnR = 0
> >> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> >> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> >> videobuf2_dma_contig snd_soc_simple_card display_connector
> >> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> >> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> >> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> >> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> >> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> >> x_tables ipv6
> >> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> >> Hardware name: Hardkernel ODROID-M1 (DT)
> >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> >> ...
> >> Call trace:
> >> hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
> >> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
> >> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
> >> hantro_reset_fmts+0x94/0xcc [hantro_vpu]
> >> hantro_open+0xd4/0x20c [hantro_vpu]
> >> v4l2_open+0x80/0x120 [videodev]
> >> chrdev_open+0xc0/0x22c
> >> do_dentry_open+0x13c/0x490
> >> vfs_open+0x2c/0x38
> >> path_openat+0x550/0x938
> >> do_filp_open+0x80/0x12c
> >> do_sys_openat2+0xb4/0x16c
> >> __arm64_sys_openat+0x64/0xac
> >> invoke_syscall+0x48/0x114
> >> el0_svc_common.constprop.0+0xfc/0x11c
> >> do_el0_svc+0x38/0xa4
> >> el0_svc+0x48/0xb8
> >> el0t_64_sync_handler+0xb8/0xbc
> >> el0t_64_sync+0x190/0x194
> >> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> >> ---[ end trace 0000000000000000 ]---
> > 
> > When I booted into my 6.4-rc1 (but also rc2) kernel on my
> > Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> > above, but I didn't have such a crash with my 6.3 kernel.
> > Searching for 'hantro' led me to this commit as the most likely culprit
> > but when I build a new 6.4-rcX kernel with this commit reverted,
> > I still had this crash.
> > Do you have suggestions which commit would then be the likely culprit?
> 
> This patch fix the crash at boot time, revert it doesn't seem to be the
> solution. Maybe this proposal from Marek can help you ?
> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.223646
> 3-1-m.szyprowski@samsung.com/

That helped :) After applying that patch I no longer have the crash.
Thanks!

Regards,
  Diederik

Linux regression tracking (Thorsten Leemhuis) May 23, 2023, 10:50 a.m. UTC | #3

CCing the Regression list and a bunch of other people that were CCed in
threads that look related:

On 23.05.23 00:38, Diederik de Haas wrote:
> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> [...]
>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>> above, but I didn't have such a crash with my 6.3 kernel.
>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>> I still had this crash.
>>> Do you have suggestions which commit would then be the likely culprit?
>>
>> This patch fix the crash at boot time, revert it doesn't seem to be the
>> solution. Maybe this proposal from Marek can help you ?
>>
>> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
> 
> That helped :) After applying that patch I no longer have the crash.
> Thanks!

That regression fix is now a month old, but not yet merged afaics --
guess due to Nicolas comment that wasn't addressed yet and likely
requires a updated patch.

Michael afaics a week ago posted a patch that to my *very limited
understanding of things* (I hope I don't confuse matters here!) seems to
address the same problem, but slightly differently:
https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/

No reply yet.

That's all a bit unfortunate, as it's not how regression fixes should be
dealt with -- and caused multiple people headaches that could have been
avoided. :-/

But well, things happen. But it leads to the question:

How can we finally address the issue quickly now to ensure is doesn't
cause headaches for even more people?

Marek, Michael, could you work on a patch together that we then get
somewhat fast-tracked to Linus to avoid him getting even more unhappy
about the state of things[1]?

Ciao, Thorsten

[1] see
https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=NvnHRhxKWC7Wpw@mail.gmail.com/

Michael Tretter May 23, 2023, 2:54 p.m. UTC | #4

On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> CCing the Regression list and a bunch of other people that were CCed in
> threads that look related:

Thanks!

> 
> On 23.05.23 00:38, Diederik de Haas wrote:
> > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> >> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> >>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> > [...]
> >>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> >>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> >>> above, but I didn't have such a crash with my 6.3 kernel.
> >>> Searching for 'hantro' led me to this commit as the most likely culprit
> >>> but when I build a new 6.4-rcX kernel with this commit reverted,
> >>> I still had this crash.
> >>> Do you have suggestions which commit would then be the likely culprit?
> >>
> >> This patch fix the crash at boot time, revert it doesn't seem to be the
> >> solution. Maybe this proposal from Marek can help you ?
> >>
> >> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
> > 
> > That helped :) After applying that patch I no longer have the crash.
> > Thanks!
> 
> That regression fix is now a month old, but not yet merged afaics --
> guess due to Nicolas comment that wasn't addressed yet and likely
> requires a updated patch.

I agree with Nicolas comment on that patch and it needs to be updated.

> 
> Michael afaics a week ago posted a patch that to my *very limited
> understanding of things* (I hope I don't confuse matters here!) seems to
> address the same problem, but slightly differently:
> https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/

Correct, my patch addresses the same problem.

> 
> No reply yet.
> 
> That's all a bit unfortunate, as it's not how regression fixes should be
> dealt with -- and caused multiple people headaches that could have been
> avoided. :-/
> 
> But well, things happen. But it leads to the question:
> 
> How can we finally address the issue quickly now to ensure is doesn't
> cause headaches for even more people?
> 
> Marek, Michael, could you work on a patch together that we then get
> somewhat fast-tracked to Linus to avoid him getting even more unhappy
> about the state of things[1]?

Marek, if you have an updated patch, I will happily test and review it.
Otherwise, please take a look at my patch.

Michael

Marek Szyprowski May 23, 2023, 4:32 p.m. UTC | #5

On 23.05.2023 16:54, Michael Tretter wrote:
> On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> CCing the Regression list and a bunch of other people that were CCed in
>> threads that look related:
> Thanks!
>
>> On 23.05.23 00:38, Diederik de Haas wrote:
>>> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>>>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>>> [...]
>>>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>>>> above, but I didn't have such a crash with my 6.3 kernel.
>>>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>>>> I still had this crash.
>>>>> Do you have suggestions which commit would then be the likely culprit?
>>>> This patch fix the crash at boot time, revert it doesn't seem to be the
>>>> solution. Maybe this proposal from Marek can help you ?
>>>>
>>>> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
>>> That helped :) After applying that patch I no longer have the crash.
>>> Thanks!
>> That regression fix is now a month old, but not yet merged afaics --
>> guess due to Nicolas comment that wasn't addressed yet and likely
>> requires a updated patch.
> I agree with Nicolas comment on that patch and it needs to be updated.
>
>> Michael afaics a week ago posted a patch that to my *very limited
>> understanding of things* (I hope I don't confuse matters here!) seems to
>> address the same problem, but slightly differently:
>> https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/
> Correct, my patch addresses the same problem.
>
>> No reply yet.
>>
>> That's all a bit unfortunate, as it's not how regression fixes should be
>> dealt with -- and caused multiple people headaches that could have been
>> avoided. :-/
>>
>> But well, things happen. But it leads to the question:
>>
>> How can we finally address the issue quickly now to ensure is doesn't
>> cause headaches for even more people?
>>
>> Marek, Michael, could you work on a patch together that we then get
>> somewhat fast-tracked to Linus to avoid him getting even more unhappy
>> about the state of things[1]?
> Marek, if you have an updated patch, I will happily test and review it.
> Otherwise, please take a look at my patch.

Go ahead with your, my was just a blind try eliminating the oops during 
driver probe.


Best regards

Michael Tretter June 1, 2023, 1:27 p.m. UTC | #6

Hi Benjamin,

On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
> 
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
> ---
> version 2:
> - Remove useless vpu_fmt.
> 
>  drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> index 8f1414085f47..d71f79471396 100644
> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  			  struct v4l2_pix_format_mplane *pix_mp,
>  			  enum v4l2_buf_type type)
>  {
> -	const struct hantro_fmt *fmt, *vpu_fmt;
> +	const struct hantro_fmt *fmt;
>  	bool capture = V4L2_TYPE_IS_CAPTURE(type);
>  	bool coded;
>  
> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  
>  	if (coded) {
>  		pix_mp->num_planes = 1;
> -		vpu_fmt = fmt;
> -	} else if (ctx->is_encoder) {
> -		vpu_fmt = ctx->vpu_dst_fmt;
> -	} else {
> -		vpu_fmt = fmt;
> +	} else if (!ctx->is_encoder) {
>  		/*
>  		 * Width/height on the CAPTURE end of a decoder are ignored and
>  		 * replaced by the OUTPUT ones.
> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  	pix_mp->field = V4L2_FIELD_NONE;
>  
>  	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> -				       &vpu_fmt->frmsize);
> +				       &fmt->frmsize);

This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
encoder will be a raw format. This results in width and height to be clamped
to 0.

I think the correct fix would be to apply the frmsize constraints of the
currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
before calling this code, I don't know how to get the coded format.

Michael

>  
>  	if (!coded) {
>  		/* Fill remaining fields */
> -- 
> 2.34.1
>

[v2] media: verisilicon: Fix crash when probing encoder

Commit Message

Comments

Patch