Message ID | 20240827093011.18621-14-nbd@nbd.name |
---|---|
State | New |
Series | [v2,01/24] mt76: mt7603: fix mixed declarations and code |
On Tue, Aug 27, 2024 at 11:30:01AM +0200, Felix Fietkau wrote:
> In some cases MCU messages can get lost. Instead of failing completely,
> attempt to recover by re-sending them.
>
> Signed-off-by: Felix Fietkau <nbd@nbd.name>

Hi,

KernelCI has identified a regression originating from this patch. I've verified
that reverting it fixes the issue.

Regression's impact: Unable to boot

Affected platforms:
* mt8186-corsola-steelix-sku131072

Relevant kernel logs:

[ 3.457006] ------------[ cut here ]------------
[ 3.466050] kernel BUG at net/core/skbuff.c:2255!
[ 3.466055] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 3.466059] Modules linked in: mt7921s mtk_vcodec_dbgfs mt76_sdio mtk_jpeg mtk_vcodec_common mt7921_common mtk_jpeg_enc_hw
[ 3.484734] mt792x_lib mt76_connac_lib mtk_vpu mtk_jpeg_dec_hw v4l2_mem2mem
[ 3.496464] mt76 videobuf2_dma_contig btmtksdio cros_ec_rpmsg btmtk videobuf2_memops cbmem mac80211 libarc4 bluetooth videobuf2_v4l2 videodev
[ 3.510198] ecdh_generic cros_ec_sensors cros_ec_lid_angle cfg80211 ecc
[ 3.522273] videobuf2_common mediatek_drm cros_ec_sensors_core crct10dif_ce industrialio_triggered_buffer
[ 3.534348] kfifo_buf leds_cros_ec cros_ec_typec cros_ec_chardev mc
[ 3.545814] sbs_battery elan_i2c phy_mtk_mipi_dsi_drv mtk_mmsys mtk_svs
[ 3.562574] drm_dma_helper rfkill snd_sof_mt8186 mtk_adsp_common
[ 3.574821] snd_sof_xtensa_dsp snd_sof_of snd_sof mtk_scp
[ 3.594963] mtk_mutex mtk_rpmsg hid_multitouch mtk_scp_ipi lvts_thermal mt6577_auxadc
[ 3.610771] snd_sof_utils mtk_wdt coreboot_table
[ 3.626141] ramoops reed_solomon pwm_bl backlight
[ 3.637694]
[ 3.637698] CPU: 4 UID: 0 PID: 235 Comm: mt76-sdio-txrx Not tainted 6.11.0-rc7-next-20240913 #1
[ 3.651764] Hardware name: Google Steelix board (DT)
[ 3.651767] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.664875] pc : pskb_expand_head+0x2cc/0x3c4
[ 3.729971] usb 1-1: new high-speed USB device number 2 using xhci-mtk
[ 3.733209] lr : mt76s_tx_run_queue+0x27c/0x410 [mt76_sdio]
[ 3.744152] sp : ffff8000810d3c30
[ 3.756220] x29: ffff8000810d3c30 x28: 0000000000000000
[ 3.765341] x27: ffff6a16c81a7780
[ 3.778973] x26: 0000000000000000 x25: ffff6a16cc2576b0 x24: 0000000000000140
[ 3.790005] x23: ffff6a16cb910080
[ 3.799991] x22: 0000000000000028
[ 3.816224] x21: ffff6a16cc252000
[ 3.821005] x20: 0000000000000000 x19: ffff6a16c81a4300 x18: 0000000000000000
[ 3.858429] x17: ffffc9f99aab0000 x16: ffffa01e5a692f5c x15: 0000000000000000
[ 3.865556] x14: 0000000000000352 x13: 0000000000000352 x12: 0000000000000001
[ 3.872682] x11: 0000000000000057 x10: ffff6a16c8aeeb88 x9 : ffff6a16c81a4300
[ 3.879808] x8 : ffff6a16c8aeeb98 x7 : ffff6a17f6d91428 x6 : 0000000000000001
[ 3.886935] x5 : 0000000000000000 x4 : ffff6a16c827b3c0 x3 : 0000000000000820
[ 3.894060] x2 : 0000000000000200 x1 : 0000000000000002 x0 : ffff6a16c81a4300
[ 3.901186] Call trace:
[ 3.903621] pskb_expand_head+0x2cc/0x3c4
[ 3.907622] mt76s_tx_run_queue+0x27c/0x410 [mt76_sdio]
[ 3.912839] mt76s_txrx_worker+0xc8/0xde4 [mt76_sdio]
[ 3.917881] mt7921s_txrx_worker+0x5c/0xec [mt7921s]
[ 3.922839] __mt76_worker_fn+0x80/0x120 [mt76]
[ 3.927380] kthread+0x114/0x118
[ 3.930601] ret_from_fork+0x10/0x20
[ 3.934171] Code: 17ffffb5 f9002bfb d4210000 f9002bfb (d4210000)
[ 3.940252] ---[ end trace 0000000000000000 ]---
[ 3.948178] note: mt76-sdio-txrx [235] exited with irqs disabled
[ 3.954227] note: mt76-sdio-txrx [235] exited with preempt_count 1
[ 3.960491] ------------[ cut here ]------------

[ 11.486135] ------------[ cut here ]------------
[ 11.490749] WARNING: CPU: 7 PID: 54 at kernel/kthread.c:657 kthread_park+0xa4/0xd0
[ 11.498319] Modules linked in: ip_tables x_tables ipv6 ax88796b asix onboard_usb_dev panel_edp uvcvideo uvc videobuf2_vmalloc mtk_vcodec_dec_hw mtk_vcodec_dec v4l2_vp9 mtk_vcodec_enc v4l2_h264 mt7921s mtk_jpeg mtk_vcodec_dbgfs btmtksdio mtk_vcodec_common mt76_sdio mtk_jpeg_enc_hw mtk_vpu mtk_jpeg_dec_hw btmtk mt7921_common mt792x_lib v4l2_mem2mem cros_ec_rpmsg videobuf2_dma_contig cbmem mt76_connac_lib videobuf2_memops videobuf2_v4l2 mt76 mac80211 bluetooth crct10dif_ce videodev ecdh_generic cros_ec_lid_angle videobuf2_common cros_ec_sensors ecc libarc4 mc mediatek_drm leds_cros_ec cros_ec_sensors_core cfg80211 mtk_mutex phy_mtk_mipi_dsi_drv mtk_mmsys drm_dma_helper industrialio_triggered_buffer sbs_battery rfkill cros_ec_chardev kfifo_buf snd_sof_mt8186 hid_multitouch mtk_adsp_common cros_ec_typec elan_i2c snd_sof_xtensa_dsp snd_sof_of snd_sof mtk_wdt mtk_scp snd_sof_utils lvts_thermal mtk_svs mt6577_auxadc pwm_bl backlight mtk_rpmsg mtk_scp_ipi ramoops reed_solomon coreboot_table
[ 11.585320] CPU: 7 UID: 0 PID: 54 Comm: kworker/7:0 Tainted: G D W 6.11.0-rc7-next-20240913 #1
[ 11.585329] Tainted: [D]=DIE, [W]=WARN
[ 11.585331] Hardware name: Google Steelix board (DT)
[ 11.585334] Workqueue: events mt7921_init_work [mt7921_common]
[ 11.585349] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 11.616606] pc : kthread_park+0xa4/0xd0
[ 11.620431] lr : mt7921s_init_reset+0x80/0x24c [mt7921s]
[ 11.625731] sp : ffff800080473cd0
[ 11.629032] x29: ffff800080473cd0 x28: 0000000000000000 x27: 0000000000000000
[ 11.636153] x26: ffff559736df45a8 x25: ffff559606502148 x24: ffff55960650a148
[ 11.643274] x23: 000000000041f23c x22: ffff559606502000 x21: ffff559606507738
[ 11.650394] x20: 0000000000000000 x19: ffff5596086e9140 x18: 0000000000000001
[ 11.657515] x17: 000000040044ffff x16: ffffb34762ec4fbc x15: 0000000000000000
[ 11.664636] x14: 00000000000001cc x13: 0000000000000000 x12: 0000000000000000
[ 11.671757] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffff800080473bb0
[ 11.678878] x8 : ffff559736debcc0 x7 : ffff559736df4c40 x6 : 00000000000249f0
[ 11.685998] x5 : ffff559606502068 x4 : ffff559606502060 x3 : ffff559606507740
[ 11.693119] x2 : ffff559606507740 x1 : 0000000000005800 x0 : 000000000020804c
[ 11.700240] Call trace:
[ 11.702673] kthread_park+0xa4/0xd0
[ 11.706150] mt7921s_init_reset+0x80/0x24c [mt7921s]
[ 11.711100] mt7921_init_work+0x190/0x240 [mt7921_common]
[ 11.716486] process_one_work+0x14c/0x28c
[ 11.720483] worker_thread+0x2d0/0x3d8
[ 11.724219] kthread+0x114/0x118
[ 11.727435] ret_from_fork+0x10/0x20
[ 11.731000] ---[ end trace 0000000000000000 ]---
[ 11.848160] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 11.856958] Mem abort info:
[ 11.859747] ESR = 0x0000000096000004
[ 11.863490] EC = 0x25: DABT (current EL), IL = 32 bits
[ 11.868795] SET = 0, FnV = 0
[ 11.871842] EA = 0, S1PTW = 0
[ 11.874976] FSC = 0x04: level 0 translation fault
[ 11.879847] Data abort info:
[ 11.882721] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 11.888199] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 11.893247] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 11.898555] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010815f000
[ 11.904993] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 11.918032] Modules linked in: ip_tables x_tables ipv6 ax88796b asix onboard_usb_dev panel_edp uvcvideo uvc videobuf2_vmalloc mtk_vcodec_dec_hw mtk_vcodec_dec v4l2_vp9 mtk_vcodec_enc v4l2_h264 mt7921s mtk_jpeg mtk_vcodec_dbgfs btmtksdio mtk_vcodec_common mt76_sdio mtk_jpeg_enc_hw mtk_vpu mtk_jpeg_dec_hw btmtk mt7921_common mt792x_lib v4l2_mem2mem cros_ec_rpmsg videobuf2_dma_contig cbmem mt76_connac_lib videobuf2_memops videobuf2_v4l2 mt76 mac80211 bluetooth crct10dif_ce videodev ecdh_generic cros_ec_lid_angle videobuf2_common cros_ec_sensors ecc libarc4 mc mediatek_drm leds_cros_ec cros_ec_sensors_core cfg80211 mtk_mutex phy_mtk_mipi_dsi_drv mtk_mmsys drm_dma_helper industrialio_triggered_buffer sbs_battery rfkill cros_ec_chardev kfifo_buf snd_sof_mt8186 hid_multitouch mtk_adsp_common cros_ec_typec elan_i2c snd_sof_xtensa_dsp snd_sof_of snd_sof mtk_wdt mtk_scp snd_sof_utils lvts_thermal mtk_svs mt6577_auxadc pwm_bl backlight mtk_rpmsg mtk_scp_ipi ramoops reed_solomon coreboot_table
[ 12.005017] CPU: 7 UID: 0 PID: 54 Comm: kworker/7:0 Tainted: G D W 6.11.0-rc7-next-20240913 #1
[ 12.014835] Tainted: [D]=DIE, [W]=WARN
[ 12.018572] Hardware name: Google Steelix board (DT)
[ 12.023523] Workqueue: events mt7921_init_work [mt7921_common]
[ 12.029355] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.036305] pc : kthread_unpark+0x1c/0xb4
[ 12.040311] lr : mt7921s_init_reset+0xc0/0x24c [mt7921s]
[ 12.045614] sp : ffff800080473cc0
[ 12.048915] x29: ffff800080473cc0 x28: 0000000000000000 x27: 0000000000000000
[ 12.056040] x26: ffff559736df45a8 x25: ffff559606502148 x24: ffff55960650a148
[ 12.063163] x23: 000000000041f23c x22: ffff559606502000 x21: ffff5596065076b0
[ 12.070288] x20: ffff559606507800 x19: 0000000000000000 x18: ffff55973eea827c
[ 12.077411] x17: 00000000000a9818 x16: ffffb34762ec4a30 x15: 0000000000000000
[ 12.084535] x14: ffff559600304580 x13: 0000000000000050 x12: 0000000000000001
[ 12.091659] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffff800080473850
[ 12.098783] x8 : 0000000000000100 x7 : ffff559736078000 x6 : 0000000000000018
[ 12.105907] x5 : 00000000ffff8f00 x4 : 00ffffffffffffff x3 : 0000000000001099
[ 12.113031] x2 : 00000000fffee699 x1 : 000000000020804c x0 : ffff5596086e9140
[ 12.120155] Call trace:
[ 12.122590] kthread_unpark+0x1c/0xb4
[ 12.126244] mt7921s_init_reset+0xc0/0x24c [mt7921s]
[ 12.131197] mt7921_init_work+0x190/0x240 [mt7921_common]
[ 12.136587] process_one_work+0x14c/0x28c
[ 12.140585] worker_thread+0x2d0/0x3d8
[ 12.144323] kthread+0x114/0x118
[ 12.147542] ret_from_fork+0x10/0x20
[ 12.151111] Code: f9000bf3 b9402c01 36a804c1 f942cc13 (f9400261)
[ 12.157190] ---[ end trace 0000000000000000 ]---

(Full logs available here: http://0x0.st/XxI-.txt)

Happy to provide any other details necessary.

Please add
Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI
when fixing this.

#regzbot introduced: next-20240909..next-20240910
#regzbot title: Boot regression on mt8186-corsola-steelix-sku131072 due to bug in mcu message sending logic in mt76

Thanks,
Nícolas
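The report does not name a root cause. One possible reading, offered here as an assumption and not as a confirmed diagnosis: pskb_expand_head() starts with BUG_ON(skb_shared(skb)), and the patch takes an extra reference with skb_get() before calling the send callback, so any send path that has to grow the buffer would then see a shared skb. The userspace sketch below models only that interaction. struct buf, buf_shared(), buf_expand() and send_path() are hypothetical stand-ins, not kernel or mt76 code, and the BUG is modelled as an error return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: a refcount and some tailroom. */
struct buf {
	int refcnt;
	int tailroom;
};

/* Mirrors the idea of skb_shared(): more than one holder of the buffer. */
static bool buf_shared(const struct buf *b)
{
	return b->refcnt != 1;
}

/*
 * Mirrors the precondition of pskb_expand_head(): the buffer must not be
 * shared. Returns -1 where the kernel would hit BUG_ON(skb_shared(skb)).
 */
static int buf_expand(struct buf *b, int extra)
{
	assert(extra >= 0);
	if (buf_shared(b))
		return -1;
	b->tailroom += extra;
	return 0;
}

/*
 * Hypothetical send path that needs to grow the buffer. When the caller
 * keeps an extra reference for a possible retry, the buffer is shared at
 * the moment the send path tries to expand it.
 */
static int send_path(struct buf *b, bool keep_ref_for_retry)
{
	int ret;

	if (keep_ref_for_retry)
		b->refcnt++;		/* like orig_skb = skb_get(skb) */

	ret = buf_expand(b, 64);

	if (keep_ref_for_retry)
		b->refcnt--;		/* like dev_kfree_skb(orig_skb) */

	return ret;
}
```

In this model, send_path(&b, false) grows the buffer, while send_path(&b, true) fails before touching it, which is the shape of the first trace (pskb_expand_head called from mt76s_tx_run_queue).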
Nícolas F. R. A. Prado <nfraprado@collabora.com> writes:

> On Tue, Aug 27, 2024 at 11:30:01AM +0200, Felix Fietkau wrote:
>> In some cases MCU messages can get lost. Instead of failing completely,
>> attempt to recover by re-sending them.
>>
>> Signed-off-by: Felix Fietkau <nbd@nbd.name>
>
> Hi,
>
> KernelCI has identified a regression originating from this patch. I've verified
> that reverting it fixes the issue.
>
> [...]
>
> #regzbot introduced: next-20240909..next-20240910
> #regzbot title: Boot regression on mt8186-corsola-steelix-sku131072
> due to bug in mcu message sending logic in mt76

I don't see this in regzbot so let's try again:

#regzbot introduced: 3688c18b65ae ^
diff --git a/drivers/net/wireless/mediatek/mt76/mcu.c b/drivers/net/wireless/mediatek/mt76/mcu.c
index a8cafa39a56d..98da82b74094 100644
--- a/drivers/net/wireless/mediatek/mt76/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mcu.c
@@ -73,6 +73,8 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
 				  int cmd, bool wait_resp,
 				  struct sk_buff **ret_skb)
 {
+	unsigned int retry = 0;
+	struct sk_buff *orig_skb = NULL;
 	unsigned long expires;
 	int ret, seq;
 
@@ -81,6 +83,14 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
 
 	mutex_lock(&dev->mcu.mutex);
 
+	if (dev->mcu_ops->mcu_skb_prepare_msg) {
+		ret = dev->mcu_ops->mcu_skb_prepare_msg(dev, skb, cmd, &seq);
+		if (ret < 0)
+			goto out;
+	}
+
+retry:
+	orig_skb = skb_get(skb);
 	ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, &seq);
 	if (ret < 0)
 		goto out;
@@ -94,6 +104,14 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
 
 	do {
 		skb = mt76_mcu_get_response(dev, expires);
+		if (!skb && !test_bit(MT76_MCU_RESET, &dev->phy.state) &&
+		    retry++ < dev->mcu_ops->max_retry) {
+			dev_err(dev->dev, "Retry message %08x (seq %d)\n",
+				cmd, seq);
+			skb = orig_skb;
+			goto retry;
+		}
+
 		ret = dev->mcu_ops->mcu_parse_response(dev, cmd, skb, seq);
 		if (!ret && ret_skb)
 			*ret_skb = skb;
@@ -101,7 +119,9 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
 			dev_kfree_skb(skb);
 	} while (ret == -EAGAIN);
 
+
 out:
+	dev_kfree_skb(orig_skb);
 	mutex_unlock(&dev->mcu.mutex);
 
 	return ret;
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 43e743b510ba..794cd33be68b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -230,11 +230,14 @@ struct mt76_queue {
 };
 
 struct mt76_mcu_ops {
+	unsigned int max_retry;
 	u32 headroom;
 	u32 tailroom;
 
 	int (*mcu_send_msg)(struct mt76_dev *dev, int cmd, const void *data,
 			    int len, bool wait_resp);
+	int (*mcu_skb_prepare_msg)(struct mt76_dev *dev, struct sk_buff *skb,
+				   int cmd, int *seq);
 	int (*mcu_skb_send_msg)(struct mt76_dev *dev, struct sk_buff *skb,
 				int cmd, int *seq);
 	int (*mcu_parse_response)(struct mt76_dev *dev, int cmd,
diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
index 2ef8d90132dd..0cde1b3c7d41 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
@@ -191,11 +191,6 @@ mt7915_mcu_send_message(struct mt76_dev *mdev, struct sk_buff *skb,
 {
 	struct mt7915_dev *dev = container_of(mdev, struct mt7915_dev, mt76);
 	enum mt76_mcuq_id qid;
-	int ret;
-
-	ret = mt76_connac2_mcu_fill_message(mdev, skb, cmd, wait_seq);
-	if (ret)
-		return ret;
 
 	if (cmd == MCU_CMD(FW_SCATTER))
 		qid = MT_MCUQ_FWDL;
@@ -2382,7 +2377,9 @@ int mt7915_mcu_init_firmware(struct mt7915_dev *dev)
 int mt7915_mcu_init(struct mt7915_dev *dev)
 {
 	static const struct mt76_mcu_ops mt7915_mcu_ops = {
+		.max_retry = 3,
 		.headroom = sizeof(struct mt76_connac2_mcu_txd),
+		.mcu_skb_prepare_msg = mt76_connac2_mcu_fill_message,
 		.mcu_skb_send_msg = mt7915_mcu_send_message,
 		.mcu_parse_response = mt7915_mcu_parse_response,
 	};
In some cases MCU messages can get lost. Instead of failing completely,
attempt to recover by re-sending them.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 drivers/net/wireless/mediatek/mt76/mcu.c      | 20 +++++++++++++++++++
 drivers/net/wireless/mediatek/mt76/mt76.h     |  3 +++
 .../net/wireless/mediatek/mt76/mt7915/mcu.c   |  7 ++-----
 3 files changed, 25 insertions(+), 5 deletions(-)
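The re-send idea in the commit message can be sketched in userspace C, as a rough illustration and not the driver's code. The sketch assumes a transport that consumes the caller's reference on every send, so the sender keeps one extra reference to have something left to re-send, in the way the patch uses skb_get() and dev_kfree_skb(). struct buf, struct transport, transport_send() and send_with_retry() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only a reference count. */
struct buf {
	int refcnt;
};

static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->refcnt = 1;
	return b;
}

/* Like skb_get(): take one more reference on the same buffer. */
static struct buf *buf_get(struct buf *b)
{
	b->refcnt++;
	return b;
}

/* Like dev_kfree_skb(): drop one reference, free on the last one. */
static void buf_put(struct buf *b)
{
	if (!b)
		return;
	assert(b->refcnt > 0);
	if (--b->refcnt == 0)
		free(b);
}

/*
 * Hypothetical transport: every send consumes one reference, and a
 * response only arrives from attempt number ok_on_attempt onwards.
 */
struct transport {
	int attempts;
	int ok_on_attempt;
};

static bool transport_send(struct transport *t, struct buf *b)
{
	t->attempts++;
	buf_put(b);
	return t->attempts >= t->ok_on_attempt;
}

/* Send b, re-sending up to max_retry times when no response arrives. */
static int send_with_retry(struct transport *t, struct buf *b,
			   unsigned int max_retry)
{
	unsigned int retry = 0;
	struct buf *orig;

	for (;;) {
		orig = buf_get(b);		/* keep a copy of the handle */
		if (transport_send(t, b)) {	/* consumes one reference */
			buf_put(orig);
			return 0;
		}
		if (retry++ >= max_retry) {
			buf_put(orig);
			return -1;		/* gave up, message lost */
		}
		b = orig;			/* re-send using the kept reference */
	}
}
```

With max_retry set to 3, as in the mt7915 ops table, a message is sent at most four times in this model: the initial attempt plus three retries.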