mbox series

[v4,0/3] wifi: ath11k: hibernation support

Message ID 20240228022243.17762-1-quic_bqiang@quicinc.com
Headers show
Series wifi: ath11k: hibernation support | expand

Message

Baochen Qiang Feb. 28, 2024, 2:22 a.m. UTC
Currently in ath11k we keep the firmware running on the WLAN device when the
network interface (wlan0) is down. The problem is that this will break
hibernation, obviously the firmware can't be running after the whole system is
powered off. To power down the ath11k firmware for suspend/hibernation some
changes both in MHI subsystem and ath11k are needed.

This patchset fixes a longstanding bug report about broken hibernation support:

https://bugzilla.kernel.org/show_bug.cgi?id=214649

There already is an RFC version which has been tested by multiple users with
positive results:

https://patchwork.kernel.org/project/linux-wireless/cover/20231127162022.518834-1-kvalo@kernel.org/

Basically the RFC version adds two APIs to MHI stack: with the first one ath11k
is able to keep MHI devices when going to suspend/hibernation, getting us rid of
the probe deferral issue when resume back. while with the second one ath11k could
manually prepare/unprepare MHI channels by itself, which is needed because QRTR
doesn't probe those channels automatically in this case.

Mani, the MHI maintainer, firstly doesn't like that version and insists that an
MHI device should be destroyed when suspend/hibernation, according to his
understanding on device driver model. See

https://lore.kernel.org/mhi/20231127162022.518834-1-kvalo@kernel.org/

After a long discussion Mani thought we might need a new PM callback with which
ath11k is able to wait until kernel unblocks device probe and thus MHI channels
get probed. So we came to the kernel PM list and there Mani realized that his
understanding is not correct so he finally agrees to keep MHI device during
suspend/hibernation. See

https://lore.kernel.org/all/21cd2098-97e1-4947-a5bb-a97582902ead@quicinc.com/

Mani also pointed out that an MHI controller driver (ath11k here) should not touch
MHI channels directly because those channels are managed by the corresponding MHI
client driver (QRTR here). To address this, we come up with this version.

Compared with that RFC version, this version adds PM callbacks in QRTR module:
suspend callback unprepares MHI channels during suspend and resume callback
prepares those channels during resume. In this way ath11k doesn't need to do
unprepare/prepare work by itself so those two APIs added in RFC version are
removed now.

The power down/up procedure requires a specific sequence in which PM callbacks
of wiphy, ath11k and QRTR are called, this is achieved by exploiting the
child-father relationship between their device struct, and also the PM framework
which separates whole suspend/resume process into several stages. Details in
patch [3/3].

v4:
 - resend v3 as v4 to CC netdev folks. No changes in patches themselves.

v3:
 - skip QRTR suspend/resume if MHI device is found to be in suspend state.

v2:
 - add comment on why destroying the device is optional in
   mhi_pm_disable_transition().
 - rename mhi_power_down_no_destroy() as mhi_power_down_keep_dev().
 - refine API description of mhi_power_down() and
   mhi_power_down_keep_dev().
 - add/remove __maybe_unused to QRTR PM callbacks.
 - remove '#ifdef CONFIG_PM'.
 - refine commit log of patch 1/3 and 2/3.

Baochen Qiang (3):
  bus: mhi: host: add mhi_power_down_keep_dev()
  net: qrtr: support suspend/hibernation
  wifi: ath11k: support hibernation

 drivers/bus/mhi/host/internal.h        |   4 +-
 drivers/bus/mhi/host/pm.c              |  42 ++++++++--
 drivers/net/wireless/ath/ath11k/ahb.c  |   6 +-
 drivers/net/wireless/ath/ath11k/core.c | 105 +++++++++++++++++--------
 drivers/net/wireless/ath/ath11k/core.h |   6 +-
 drivers/net/wireless/ath/ath11k/hif.h  |  14 +++-
 drivers/net/wireless/ath/ath11k/mhi.c  |  12 ++-
 drivers/net/wireless/ath/ath11k/mhi.h  |   5 +-
 drivers/net/wireless/ath/ath11k/pci.c  |  44 +++++++++--
 drivers/net/wireless/ath/ath11k/qmi.c  |   2 +-
 include/linux/mhi.h                    |  18 ++++-
 net/qrtr/mhi.c                         |  46 +++++++++++
 12 files changed, 244 insertions(+), 60 deletions(-)


base-commit: c39a5cfa0448f3afbee78373f16d87815a674f11

Comments

Baochen Qiang Feb. 29, 2024, 2:30 a.m. UTC | #1
On 2/28/2024 11:31 PM, Jeff Johnson wrote:
> On 2/27/2024 6:22 PM, Baochen Qiang wrote:
>> Now that all infrastructure is in place and ath11k is fixed to handle all the
>> corner cases, power down the ath11k firmware during suspend and power it back
>> up during resume. This fixes the problem when using hibernation with ath11k PCI
>> devices.
>>
>> For suspend, two conditions needs to be satisfied:
>>          1. since MHI channel unprepare would be done in late suspend stage,
>>             ath11k needs to get all QMI-dependent things done before that stage.
>>          2. and because unprepare MHI channels requires a working MHI stack,
>>             ath11k is not allowed to call mhi_power_down() until that finishes.
>> So the original suspend callback is separated into two parts: the first part
>> handles all QMI-dependent things in suspend callback; while the second part
>> powers down MHI in suspend_late callback. This is valid because kernel calls
>> ath11k's suspend callback before all suspend_late callbacks, making the first
>> condition happy. And because MHI devices are children of ath11k device
>> (ab->dev), kernel guarantees that ath11k's suspend_late callback is called
>> after QRTR's suspend_late callback, this satisfies the second condition.
>>
>> Above analysis also applies to resume process. so the original resume
>> callback is separated into two parts: the first part powers up MHI stack
>> in resume_early callback, this guarantees MHI stack is working when
>> QRTR tries to prepare MHI channels (kernel calls QRTR's resume_early callback
>> after ath11k's resume_early callback, due to the child-father relationship);
>> the second part waits for the completion of restart, which won't fail now
>> since MHI channels are ready for use by QMI.
>>
>> Another notable change is in power down path, we tell mhi_power_down() to not
>> to destroy MHI devices, making it possible for QRTR to help unprepare/prepare
>> MHI channels, and finally get us rid of the probe-defer issue when resume.
>>
>> Also change related code due to interface changes.
>>
>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30
>>
>> Tested-by: Takashi Iwai <tiwai@suse.de>
>> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
>> ---
>>   drivers/net/wireless/ath/ath11k/ahb.c  |   6 +-
>>   drivers/net/wireless/ath/ath11k/core.c | 105 +++++++++++++++++--------
>>   drivers/net/wireless/ath/ath11k/core.h |   6 +-
>>   drivers/net/wireless/ath/ath11k/hif.h  |  14 +++-
>>   drivers/net/wireless/ath/ath11k/mhi.c  |  12 ++-
>>   drivers/net/wireless/ath/ath11k/mhi.h  |   5 +-
>>   drivers/net/wireless/ath/ath11k/pci.c  |  44 +++++++++--
>>   drivers/net/wireless/ath/ath11k/qmi.c  |   2 +-
>>   8 files changed, 142 insertions(+), 52 deletions(-)
> ...snip...
>> +int ath11k_core_resume_early(struct ath11k_base *ab)
>> +{
>> +	int ret;
>> +	struct ath11k_pdev *pdev;
>> +	struct ath11k *ar;
>> +
>> +	if (!ab->hw_params.supports_suspend)
>> +		return -EOPNOTSUPP;
>> +
>> +	/* so far signle_pdev_only chips have supports_suspend as true
> 
> nit: s/signle/single/
> 
>> +	 * and only the first pdev is valid.
>> +	 */
>> +	pdev = ath11k_core_get_single_pdev(ab);
>> +	ar = pdev->ar;
>> +	if (!ar || ar->state != ATH11K_STATE_OFF)
>> +		return 0;
>> +
>> +	reinit_completion(&ab->restart_completed);
>> +	ret = ath11k_hif_power_up(ab);
>> +	if (ret)
>> +		ath11k_warn(ab, "failed to power up hif during resume: %d\n", ret);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(ath11k_core_resume_early);
>>   
>>   int ath11k_core_resume(struct ath11k_base *ab)
>>   {
>>   	int ret;
>>   	struct ath11k_pdev *pdev;
>>   	struct ath11k *ar;
>> +	long time_left;
>>   
>>   	if (!ab->hw_params.supports_suspend)
>>   		return -EOPNOTSUPP;
>> @@ -940,29 +990,19 @@ int ath11k_core_resume(struct ath11k_base *ab)
>>   	if (!ar || ar->state != ATH11K_STATE_OFF)
>>   		return 0;
>>   
>> -	ret = ath11k_hif_resume(ab);
>> -	if (ret) {
>> -		ath11k_warn(ab, "failed to resume hif during resume: %d\n", ret);
>> -		return ret;
>> +	time_left = wait_for_completion_timeout(&ab->restart_completed,
>> +						ATH11K_RESET_TIMEOUT_HZ);
>> +	if (time_left == 0) {
>> +		ath11k_warn(ab, "timeout while waiting for restart complete");
>> +		return -ETIMEDOUT;
>>   	}
>>   
>> -	ath11k_hif_ce_irq_enable(ab);
>> -	ath11k_hif_irq_enable(ab);
> 
> these are disabled in suspend_late()
> do you need to enable in resume_early()?
> or are they expected to be enabled via ath11k_wow_op_resume()?
> 
> and if that is the case, why isn't the disable in
> ath11k_wow_op_suspend() sufficient? can the disables in suspend_late()
> be removed?
There are two user cases here:
1. if WoWLAN is enabled, IRQ enable/disable only happens in 
ath11k_wow_op_suspend()/resume(), ath11k_core_suspend()/late_suspend() 
and ath11k_core_resume()/early_resume() do nothing but return directly 
due to below check:
		if (!ar || ar->state != ATH11K_STATE_OFF)
			return 0;
so this is symmetric and no issues here.

2. if WoWLAN is disabled, ath11k_wow_op_suspend()/resume() won't get 
called, see the check on 'local->wowlan' in __ieee80211_suspend(). Then 
IRQ is disabled in ath11k_core_suspend_late(). The reason why IRQ is not 
enabled in ath11k_core_resume()/early_resume() is because in this case 
we power down/up firmware, and during power up we go the reset path 
where IRQ would be enabled back in below calls:
	CE irqs: ath11k_core_qmi_firmware_ready() -> ath11k_core_start() -> 
ath11k_hif_start() -> ath11k_pci_start() -> ath11k_pcic_start() -> 
ath11k_pcic_ce_irqs_enable()
	DP irqs: ath11k_core_qmi_firmware_ready() -> ath11k_hif_irq_enable()
> 
> just concerned about the lack of symmetry here
Yes, also noticed it but didn't figure out a better way.

> 
> /jeff
Kalle Valo Feb. 29, 2024, 7:35 p.m. UTC | #2
Baochen Qiang <quic_bqiang@quicinc.com> writes:

> On 2/29/2024 6:12 PM, Manivannan Sadhasivam wrote:
>> On Wed, Feb 28, 2024 at 10:22:41AM +0800, Baochen Qiang wrote:
>>> ath11k fails to resume:
>>>
>>> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete
>>>
>>> This happens because when calling mhi_sync_power_up() the MHI subsystem
>>> eventually calls device_add() from mhi_create_devices() but the device
>>> creation is deferred:
>>>
>>> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral
>>>
>>> The reason for deferring device creation is explained in dpm_prepare():
>>>
>>>          /*
>>>           * It is unsafe if probing of devices will happen during suspend or
>>>           * hibernation and system behavior will be unpredictable in this case.
>>>           * So, let's prohibit device's probing here and defer their probes
>>>           * instead. The normal behavior will be restored in dpm_complete().
>>>           */
>>>
>>> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not
>>> called and thus MHI channels are not prepared:
>>>
>>> So what this means that QRTR is not delivering messages and the QMI connection
>>> is not working between ath11k and the firmware, resulting a failure in firmware
>>> initialization.
>>>
>>> To fix this add new function mhi_power_down_keep_dev() which doesn't destroy
>>> the devices for channels during power down. This way we avoid probe defer issue
>>> and finally can get ath11k hibernation working with the following patches.
>>>
>>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30
>>>
>>> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
>>
>> Did Kalle co-author this patch? If so, his Co-developed-by tag should
>> be added.
>
> Hmm, I'm not sure...  I would like Kalle's thoughts on this.

IIRC I did only some simple cleanup before submitting the patch so I
don't think Co-developed-by is justified. I'm also fine with removing my
Signed-off-by.