Message ID | 20250429122351.108684-1-usama.anjum@collabora.com |
---|---|
State | Superseded |
Series | [v3] bus: mhi: host: don't free bhie tables during suspend/hibernation |
Hi,

On Tue, Apr 29, 2025 at 05:23:35PM +0500, Muhammad Usama Anjum wrote:
> Fix dma_direct_alloc() failure at resume time during bhie_table
> allocation. There is a crash report where at resume time, the memory
> from the dma doesn't get allocated and MHI fails to re-initialize.
> There is fragmentation/memory pressure.
>
> To fix it, don't free the memory at power down during suspend /
> hibernation. Instead, use the same allocated memory again after every
> resume / hibernation. This patch has been tested with resume and
> hibernation both.
>
> The rddm is of constant size for a given hardware. While the fbc_image
> size depends on the firmware. If the firmware changes, we'll free and
> allocate new memory for it.
>
> Here are the crash logs:
>
> [ 3029.338587] mhi mhi0: Requested to power ON
> [ 3029.338621] mhi mhi0: Power on setup success
> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> [ 3029.668717] Call Trace:
> [ 3029.668722] <TASK>
> [ 3029.668728] dump_stack_lvl+0x4e/0x70
> [ 3029.668738] warn_alloc+0x164/0x190
> [ 3029.668747] ? srso_return_thunk+0x5/0x5f
> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360
> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> [ 3029.668774] __alloc_pages_noprof+0x321/0x350
> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290
> [ 3029.668790] dma_direct_alloc+0x70/0x270
> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668844] ? srso_return_thunk+0x5/0x5f
> [ 3029.668853] process_one_work+0x17e/0x330
> [ 3029.668861] worker_thread+0x2ce/0x3f0
> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10
> [ 3029.668873] kthread+0xd2/0x100
> [ 3029.668879] ? __pfx_kthread+0x10/0x10
> [ 3029.668885] ret_from_fork+0x34/0x50
> [ 3029.668892] ? __pfx_kthread+0x10/0x10
> [ 3029.668898] ret_from_fork_asm+0x1a/0x30
> [ 3029.668910] </TASK>
>
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---

This breaks ath12k on my T14s Snapdragon with WCN785x. After a
suspend/resume cycle the following is in my logs (and the resume is
super slow).

Additionally at shutdown ath12k crashes with a NULL pointer dereference
in mhi_deinit_dev_ctxt, which got called by mhi_unprepare_after_power_down,
which got called by ath12k_mhi_stop. This happens after filesystem umount
and I don't have anything configured right now to get logs from that
point, so it is not included in the log from the suspend/resume cycle
down below:

...
[ 28.385370] ath12k_pci 0004:01:00.0: failed to set mhi state INIT(0) in current mhi state (0x1)
[ 28.385379] ath12k_pci 0004:01:00.0: failed to set mhi state: INIT(0)
[ 28.385383] ath12k_pci 0004:01:00.0: failed to start mhi: -22
[ 28.385387] ath12k_pci 0004:01:00.0: failed to power up hif during resume: -22
[ 28.385391] ath12k_pci 0004:01:00.0: failed to early resume core: -22
[ 28.385393] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume_early returns -22
[ 28.385413] ath12k_pci 0004:01:00.0: PM: failed to resume async early: error -22
[ 28.385513] qcom_mhi_qrtr mhi0_IPCR: Current EE: DISABLE Required EE Mask: 0x4
[ 28.385521] qcom_mhi_qrtr mhi0_IPCR: failed to prepare for autoqueue transfer -107
[ 28.385526] qcom_mhi_qrtr mhi0_IPCR: PM: dpm_run_callback(): qcom_mhi_qrtr_pm_resume_early [qrtr_mhi] returns -107
[ 28.385541] qcom_mhi_qrtr mhi0_IPCR: PM: failed to resume early: error -107
[ 50.146823] ath12k_pci 0004:01:00.0: timeout while waiting for restart complete
[ 50.146830] ath12k_pci 0004:01:00.0: failed to resume core: -110
[ 50.146834] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110
[ 50.146849] ath12k_pci 0004:01:00.0: PM: failed to resume async: error -110
[ 53.218794] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[ 53.218801] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 53.218808] ath12k_pci 0004:01:00.0: failed to set ac override for ARP: -11
[ 53.218813] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
[ 53.218817] ------------[ cut here ]------------
[ 53.218820] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[ 53.218855] WARNING: CPU: 2 PID: 1958 at net/mac80211/util.c:1829 ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.218936] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[ 53.219100] ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[ 53.219234] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Not tainted 6.15.0-rc4+ #95 PREEMPT
[ 53.219241] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[ 53.219245] Workqueue: async async_run_entry_fn
[ 53.219258] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 53.219265] pc : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.219315] lr : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.219362] sp : ffff8000853ebb30
[ 53.219364] x29: ffff8000853ebbf0 x28: 0000000000000000 x27: 0000000000000000
[ 53.219373] x26: ffff1ce140047428 x25: 0000000000000000 x24: ffff1ce1408f7c05
[ 53.219380] x23: ffff1ce14aaa05b8 x22: 0000000000000010 x21: 00000000fffffff5
[ 53.219387] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[ 53.219394] x17: 72617774666f7320 x16: 6120656220646c75 x15: 6f63207369685420
[ 53.219401] x14: 2e656d7573657220 x13: 0a2e657573736920 x12: 6572617764726168
[ 53.219408] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffdacf6aa7749c
[ 53.219415] x8 : 0000000000000507 x7 : ffffdacf6d031138 x6 : ffffdacf6d031138
[ 53.219422] x5 : ffff1ce8bbe76508 x4 : 0000000000000000 x3 : ffff42194efd1000
[ 53.219429] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1ce149caa300
[ 53.219437] Call trace:
[ 53.219440] ieee80211_reconfig+0x37c/0x1718 [mac80211] (P)
[ 53.219490] ieee80211_resume+0x54/0x78 [mac80211]
[ 53.219541] wiphy_resume+0x8c/0x200 [cfg80211]
[ 53.219603] dpm_run_callback+0x50/0x188
[ 53.219614] device_resume+0xc4/0x1f8
[ 53.219621] async_resume+0x2c/0x50
[ 53.219628] async_run_entry_fn+0x3c/0x160
[ 53.219634] process_one_work+0x158/0x3c8
[ 53.219643] worker_thread+0x2e0/0x418
[ 53.219650] kthread+0x14c/0x230
[ 53.219657] ret_from_fork+0x10/0x20
[ 53.219666] ---[ end trace 0000000000000000 ]---
[ 53.220154] ------------[ cut here ]------------
[ 53.220158] WARNING: CPU: 2 PID: 1958 at net/mac80211/driver-ops.c:41 drv_stop+0x1cc/0x1e8 [mac80211]
[ 53.220235] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[ 53.220351] ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[ 53.220444] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Tainted: G W 6.15.0-rc4+ #95 PREEMPT
[ 53.220452] Tainted: [W]=WARN
[ 53.220455] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[ 53.220458] Workqueue: async async_run_entry_fn
[ 53.220467] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 53.220472] pc : drv_stop+0x1cc/0x1e8 [mac80211]
[ 53.220521] lr : ieee80211_stop_device+0x8c/0xa8 [mac80211]
[ 53.220580] sp : ffff8000853eb9f0
[ 53.220582] x29: ffff8000853eb9f0 x28: 0000000000000000 x27: 0000000000000000
[ 53.220591] x26: ffff1ce140047428 x25: ffff8000853eba50 x24: ffff8000853eba50
[ 53.220598] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[ 53.220604] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[ 53.220611] x17: ffff42194efd1000 x16: ffff800080010000 x15: 6f63207369685420
[ 53.220618] x14: 000000000000037f x13: 000000000000037f x12: 071c71c71c71c71c
[ 53.220625] x11: ffff1ce8bbe88b8c x10: 1f0348adc6bb8584 x9 : ffffdacf67622b3c
[ 53.220633] x8 : ffff1ce149e1e550 x7 : 0000000000000000 x6 : 000000000000003f
[ 53.220640] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000003
[ 53.220646] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[ 53.220652] Call trace:
[ 53.220654] drv_stop+0x1cc/0x1e8 [mac80211] (P)
[ 53.220702] ieee80211_stop_device+0x8c/0xa8 [mac80211]
[ 53.220751] ieee80211_do_stop+0x644/0x830 [mac80211]
[ 53.220798] ieee80211_stop+0x60/0x1b0 [mac80211]
[ 53.220845] __dev_close_many+0xbc/0x1f0
[ 53.220857] dev_close_many+0x94/0x160
[ 53.220863] netif_close+0x78/0xa0
[ 53.220868] dev_close+0x3c/0x70
[ 53.220876] cfg80211_shutdown_all_interfaces+0x4c/0x118 [cfg80211]
[ 53.220935] wiphy_resume+0xc0/0x200 [cfg80211]
[ 53.220985] dpm_run_callback+0x50/0x188
[ 53.220992] device_resume+0xc4/0x1f8
[ 53.220999] async_resume+0x2c/0x50
[ 53.221006] async_run_entry_fn+0x3c/0x160
[ 53.221012] process_one_work+0x158/0x3c8
[ 53.221020] worker_thread+0x2e0/0x418
[ 53.221027] kthread+0x14c/0x230
[ 53.221033] ret_from_fork+0x10/0x20
[ 53.221039] ---[ end trace 0000000000000000 ]---
[ 53.221223] ieee80211 phy0: PM: dpm_run_callback(): wiphy_resume [cfg80211] returns -11
[ 53.221277] ieee80211 phy0: PM: failed to resume async: error -11
[ 53.667179] OOM killer enabled.
[ 53.667182] Restarting tasks ... done.
[ 53.668270] random: crng reseeded on system resumption
[ 53.668317] PM: suspend exit
[ 56.804822] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[ 56.804845] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 56.804859] ath12k_pci 0004:01:00.0: failed to enable PMF QOS: (-11
[ 56.804872] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
...

--
Sebastian
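
One possible reading of the first failure in the log: the message "failed to set mhi state INIT(0) in current mhi state (0x1)" suggests that the driver tracks its MHI lifecycle as a bitmask and refuses INIT while bit 0 is still set. With the patch, the suspend branch of ath12k_mhi_stop() no longer runs the DEINIT transition, so that bit would survive into resume. The sketch below is a simplified userspace model of that reading, not the ath12k code. The bit layout beyond "INIT is bit 0", the enum, the struct and both functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state bits; only "INIT is bit 0" is taken from the log. */
#define MODEL_BIT_INIT		(1UL << 0)
#define MODEL_BIT_POWER_ON	(1UL << 1)

/* Hypothetical transition names, loosely following the diff. */
enum model_op {
	MODEL_OP_INIT,
	MODEL_OP_DEINIT,
	MODEL_OP_POWER_ON,
	MODEL_OP_POWER_OFF_KEEP_DEV,
};

struct model_dev {
	unsigned long state;	/* bitmask of lifecycle stages reached */
};

/* Apply one transition; returns 0 on success, -EINVAL if it is not allowed. */
static int model_set_state(struct model_dev *dev, enum model_op op)
{
	assert(dev);

	switch (op) {
	case MODEL_OP_INIT:
		/* INIT is only valid when the INIT bit is clear. */
		if (dev->state & MODEL_BIT_INIT)
			return -EINVAL;
		dev->state |= MODEL_BIT_INIT;
		return 0;
	case MODEL_OP_DEINIT:
		/* DEINIT needs an initialized, powered-off device. */
		if (!(dev->state & MODEL_BIT_INIT) ||
		    (dev->state & MODEL_BIT_POWER_ON))
			return -EINVAL;
		dev->state &= ~MODEL_BIT_INIT;
		return 0;
	case MODEL_OP_POWER_ON:
		if (!(dev->state & MODEL_BIT_INIT) ||
		    (dev->state & MODEL_BIT_POWER_ON))
			return -EINVAL;
		dev->state |= MODEL_BIT_POWER_ON;
		return 0;
	case MODEL_OP_POWER_OFF_KEEP_DEV:
		if (!(dev->state & MODEL_BIT_POWER_ON))
			return -EINVAL;
		dev->state &= ~MODEL_BIT_POWER_ON;
		return 0;
	}
	return -EINVAL;
}

/*
 * Suspend followed by the first step of resume. run_deinit != 0 models the
 * pre-patch suspend path (power off, then DEINIT); run_deinit == 0 models
 * the patched path, which skips DEINIT.
 */
static int model_suspend_resume(struct model_dev *dev, int run_deinit)
{
	int ret = model_set_state(dev, MODEL_OP_POWER_OFF_KEEP_DEV);

	if (ret)
		return ret;
	if (run_deinit) {
		ret = model_set_state(dev, MODEL_OP_DEINIT);
		if (ret)
			return ret;
	}
	return model_set_state(dev, MODEL_OP_INIT);
}
```

In this model, a device that was initialized and powered on goes through model_suspend_resume(dev, 0) with its state left at 0x1 and the INIT step returning -EINVAL, which lines up with the "(0x1)" and "-22" values in the log. If the reading is right, the resume path would need to skip INIT (or the suspend path would need to keep running DEINIT) for ath12k.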
Hi Muhammad,

kernel test robot noticed the following build warnings:

[auto build test WARNING on ath/ath-next]
[also build test WARNING on next-20250430]
[cannot apply to mani-mhi/mhi-next char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus staging/staging-testing staging/staging-next staging/staging-linus usb/usb-testing usb/usb-next usb/usb-linus linus/master v6.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Muhammad-Usama-Anjum/bus-mhi-host-don-t-free-bhie-tables-during-suspend-hibernation/20250429-202649
base: https://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git ath-next
patch link: https://lore.kernel.org/r/20250429122351.108684-1-usama.anjum%40collabora.com
patch subject: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
config: arm-randconfig-001-20250430 (https://download.01.org/0day-ci/archive/20250430/202504302208.7JSH4wb6-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250430/202504302208.7JSH4wb6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504302208.7JSH4wb6-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/bus/mhi/host/pm.c:1246:6: warning: no previous prototype for 'mhi_power_down_unprepare_keep_dev' [-Wmissing-prototypes]
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/mhi_power_down_unprepare_keep_dev +1246 drivers/bus/mhi/host/pm.c

  1245
> 1246	void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
  1247	{
  1248		mhi_cntrl->bhi = NULL;
  1249		mhi_cntrl->bhie = NULL;
  1250
  1251		mhi_deinit_dev_ctxt(mhi_cntrl);
  1252	}
  1253
Hi Muhammad,

kernel test robot noticed the following build warnings:

[auto build test WARNING on ath/ath-next]
[also build test WARNING on next-20250430]
[cannot apply to mani-mhi/mhi-next char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus staging/staging-testing staging/staging-next staging/staging-linus usb/usb-testing usb/usb-next usb/usb-linus linus/master v6.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Muhammad-Usama-Anjum/bus-mhi-host-don-t-free-bhie-tables-during-suspend-hibernation/20250429-202649
base: https://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git ath-next
patch link: https://lore.kernel.org/r/20250429122351.108684-1-usama.anjum%40collabora.com
patch subject: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
config: arm64-randconfig-002-20250430 (https://download.01.org/0day-ci/archive/20250501/202505010037.1PMLamw8-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250501/202505010037.1PMLamw8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505010037.1PMLamw8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/bus/mhi/host/pm.c:1246:6: warning: no previous prototype for function 'mhi_power_down_unprepare_keep_dev' [-Wmissing-prototypes]
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         |      ^
   drivers/bus/mhi/host/pm.c:1246:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         | ^
         | static
   1 warning generated.

vim +/mhi_power_down_unprepare_keep_dev +1246 drivers/bus/mhi/host/pm.c

  1245
> 1246	void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
  1247	{
  1248		mhi_cntrl->bhi = NULL;
  1249		mhi_cntrl->bhie = NULL;
  1250
  1251		mhi_deinit_dev_ctxt(mhi_cntrl);
  1252	}
  1253
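
Both robot reports flag the same thing: mhi_power_down_unprepare_keep_dev() has external linkage but no earlier declaration. clang's note suggests `static`, which would fit here because the diff only calls the helper from pm.c; the alternative is a prototype in a header if other files are meant to use it. A minimal sketch of the `static` variant follows, using a stand-in struct so that it compiles outside the kernel. struct mock_controller, its dev_ctxt_ready field and all mock_* functions are hypothetical and are not the MHI definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct mhi_controller; only what the helper touches. */
struct mock_controller {
	void *bhi;
	void *bhie;
	int dev_ctxt_ready;	/* hypothetical flag replacing the real context */
};

/* Stand-in for mhi_deinit_dev_ctxt(). */
static void mock_deinit_dev_ctxt(struct mock_controller *c)
{
	c->dev_ctxt_ready = 0;
}

/*
 * Internal linkage: -Wmissing-prototypes only applies to non-static
 * functions defined without a previous declaration, so marking the
 * file-local helper static avoids the warning.
 */
static void mock_power_down_unprepare_keep_dev(struct mock_controller *c)
{
	c->bhi = NULL;
	c->bhie = NULL;

	mock_deinit_dev_ctxt(c);
}

/* The exported-style entry point keeps a prototype ahead of its definition. */
void mock_power_down_keep_dev(struct mock_controller *c);

void mock_power_down_keep_dev(struct mock_controller *c)
{
	assert(c);
	mock_power_down_unprepare_keep_dev(c);
}
```

In the real tree the prototype for an exported function would live in a header such as include/linux/mhi.h rather than next to the definition; it is placed inline here only to keep the example in one file.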
diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index efa3b6dddf4d2..bc8459798bbee 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -584,10 +584,17 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 	 * device transitioning into MHI READY state
 	 */
 	if (fw_load_type == MHI_FW_LOAD_FBC) {
-		ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
-		if (ret) {
-			release_firmware(firmware);
-			goto error_fw_load;
+		if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) {
+			mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
+			mhi_cntrl->fbc_image = NULL;
+		}
+		if (!mhi_cntrl->fbc_image) {
+			ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
+			if (ret) {
+				release_firmware(firmware);
+				goto error_fw_load;
+			}
+			mhi_cntrl->prev_fw_sz = fw_sz;
 		}
 
 		/* Load the firmware into BHIE vec table */
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 13e7a55f54ff4..a7663ad16bfc6 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
 		/*
 		 * Allocate RDDM table for debugging purpose if specified
 		 */
-		mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
-				     mhi_cntrl->rddm_size);
+		if (!mhi_cntrl->rddm_image)
+			mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
+					     mhi_cntrl->rddm_size);
 		if (mhi_cntrl->rddm_image) {
 			ret = mhi_rddm_prepare(mhi_cntrl,
 					       mhi_cntrl->rddm_image);
diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c
index e6c3ff62bab1d..b726b000d8a5d 100644
--- a/drivers/bus/mhi/host/pm.c
+++ b/drivers/bus/mhi/host/pm.c
@@ -1259,10 +1259,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful)
 }
 EXPORT_SYMBOL_GPL(mhi_power_down);
 
+void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
+{
+	mhi_cntrl->bhi = NULL;
+	mhi_cntrl->bhie = NULL;
+
+	mhi_deinit_dev_ctxt(mhi_cntrl);
+}
+
 void mhi_power_down_keep_dev(struct mhi_controller *mhi_cntrl,
 			       bool graceful)
 {
 	__mhi_power_down(mhi_cntrl, graceful, false);
+	mhi_power_down_unprepare_keep_dev(mhi_cntrl);
 }
 EXPORT_SYMBOL_GPL(mhi_power_down_keep_dev);
 
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c
index acd76e9392d31..c5dc776b23643 100644
--- a/drivers/net/wireless/ath/ath11k/mhi.c
+++ b/drivers/net/wireless/ath/ath11k/mhi.c
@@ -460,12 +460,12 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath11k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true);
-	else
+	} else {
 		mhi_power_down(ab_pci->mhi_ctrl, true);
-
-	mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+		mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+	}
 }
 
 int ath11k_mhi_suspend(struct ath11k_pci *ab_pci)
diff --git a/drivers/net/wireless/ath/ath12k/mhi.c b/drivers/net/wireless/ath/ath12k/mhi.c
index 08f44baf182a5..cb7f789d873f2 100644
--- a/drivers/net/wireless/ath/ath12k/mhi.c
+++ b/drivers/net/wireless/ath/ath12k/mhi.c
@@ -635,12 +635,12 @@ void ath12k_mhi_stop(struct ath12k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath12k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF_KEEP_DEV);
-	else
+	} else {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF);
-
-	ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+	}
 }
 
 void ath12k_mhi_suspend(struct ath12k_pci *ab_pci)
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index dd372b0123a6d..6fd218a877855 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -306,6 +306,7 @@ struct mhi_controller_config {
  *           if fw_image is NULL and fbc_download is true (optional)
  * @fw_sz: Firmware image data size for normal booting, used only if fw_image
  *         is NULL and fbc_download is true (optional)
+ * @prev_fw_sz: Previous firmware image data size, when fbc_download is true
  * @edl_image: Firmware image name for emergency download mode (optional)
  * @rddm_size: RAM dump size that host should allocate for debugging purpose
  * @sbl_size: SBL image size downloaded through BHIe (optional)
@@ -382,6 +383,7 @@ struct mhi_controller {
 	const char *fw_image;
 	const u8 *fw_data;
 	size_t fw_sz;
+	size_t prev_fw_sz;
 	const char *edl_image;
 	size_t rddm_size;
 	size_t sbl_size;
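
The boot.c hunk boils down to a cache-and-reuse rule: keep the previously allocated table unless the firmware size changed, and only then free and reallocate. A userspace sketch of that rule is shown here, with malloc()/free() standing in for the DMA-coherent BHIe table allocation. struct fw_cache, fw_cache_get(), fw_cache_release() and the alloc_calls counter are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fw_cache {
	void *image;			/* plays the role of fbc_image */
	size_t prev_sz;			/* plays the role of prev_fw_sz */
	unsigned int alloc_calls;	/* instrumentation for the example */
};

/* Return a buffer of fw_sz bytes, reusing the cached one when it still fits. */
static void *fw_cache_get(struct fw_cache *c, size_t fw_sz)
{
	assert(c && fw_sz > 0);

	/* Firmware size changed: drop the stale buffer first. */
	if (c->image && fw_sz != c->prev_sz) {
		free(c->image);
		c->image = NULL;
	}

	/* Allocate only when nothing is cached (first boot or size change). */
	if (!c->image) {
		c->image = malloc(fw_sz);
		if (!c->image)
			return NULL;
		c->prev_sz = fw_sz;
		c->alloc_calls++;
	}

	return c->image;
}

/* Full teardown, as on device removal rather than suspend. */
static void fw_cache_release(struct fw_cache *c)
{
	free(c->image);
	c->image = NULL;
	c->prev_sz = 0;
}
```

Calling fw_cache_get() twice with the same size returns the same pointer and allocates once, which is the property the patch relies on to avoid a large allocation at resume time. The cost of this design is that the buffer stays resident across suspend, so a final release on the non-suspend path is still needed.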
Fix dma_direct_alloc() failure at resume time during bhie_table
allocation. There is a crash report where at resume time, the memory
from the dma doesn't get allocated and MHI fails to re-initialize.
There is fragmentation/memory pressure.

To fix it, don't free the memory at power down during suspend /
hibernation. Instead, use the same allocated memory again after every
resume / hibernation. This patch has been tested with resume and
hibernation both.

The rddm is of constant size for a given hardware. While the fbc_image
size depends on the firmware. If the firmware changes, we'll free and
allocate new memory for it.

Here are the crash logs:

[ 3029.338587] mhi mhi0: Requested to power ON
[ 3029.338621] mhi mhi0: Power on setup success
[ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
[ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
[ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
[ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
[ 3029.668717] Call Trace:
[ 3029.668722] <TASK>
[ 3029.668728] dump_stack_lvl+0x4e/0x70
[ 3029.668738] warn_alloc+0x164/0x190
[ 3029.668747] ? srso_return_thunk+0x5/0x5f
[ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360
[ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70
[ 3029.668774] __alloc_pages_noprof+0x321/0x350
[ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290
[ 3029.668790] dma_direct_alloc+0x70/0x270
[ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668844] ? srso_return_thunk+0x5/0x5f
[ 3029.668853] process_one_work+0x17e/0x330
[ 3029.668861] worker_thread+0x2ce/0x3f0
[ 3029.668868] ? __pfx_worker_thread+0x10/0x10
[ 3029.668873] kthread+0xd2/0x100
[ 3029.668879] ? __pfx_kthread+0x10/0x10
[ 3029.668885] ret_from_fork+0x34/0x50
[ 3029.668892] ? __pfx_kthread+0x10/0x10
[ 3029.668898] ret_from_fork_asm+0x1a/0x30
[ 3029.668910] </TASK>

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes since v1:
- Don't free bhie tables during suspend/hibernation only
- Handle fbc_image changed size correctly
- Remove fbc_image getting set to NULL in *free_bhie_table()

Changes since v2:
- Remove the new mhi_partial_unprepare_after_power_down() and instead
  update mhi_power_down_keep_dev() to use
  mhi_power_down_unprepare_keep_dev() as suggested by Mani
- Update all users of this API such as ath12k (previously only ath11k
  was updated)
- Define prev_fw_sz in docs
- Do better alignment of comments

Tested on ath11k.
---
 drivers/bus/mhi/host/boot.c           | 15 +++++++++++----
 drivers/bus/mhi/host/init.c           |  5 +++--
 drivers/bus/mhi/host/pm.c             |  9 +++++++++
 drivers/net/wireless/ath/ath11k/mhi.c |  8 ++++----
 drivers/net/wireless/ath/ath12k/mhi.c |  8 ++++----
 include/linux/mhi.h                   |  2 ++
 6 files changed, 33 insertions(+), 14 deletions(-)
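
For scale, the "order:7" in the page allocation failure line means the page allocator was asked for 2^7 physically contiguous pages. The small helper below makes the arithmetic explicit. The 4 KiB page size is an assumption (the usual base page size on x86-64, which the trace appears to come from), and order_to_bytes() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by a page-allocator request of the given order. */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
	assert(order < 8 * sizeof(size_t));	/* keep the shift defined */
	return page_size << order;
}
```

With 4096-byte pages, order_to_bytes(7, 4096) is 524288 bytes, that is 512 KiB in one contiguous run. Requests of that size are the ones most likely to fail once memory has become fragmented, which is why reusing the buffer across suspend sidesteps the problem.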