mbox series

[v2,0/3] Work around missing MSA_READY indicator for msm8998 devices

Message ID fd26ce4a-a9f3-4ada-8d46-ed36fb2456ca@freebox.fr
Headers show
Series Work around missing MSA_READY indicator for msm8998 devices | expand

Message

Marc Gonzalez March 28, 2024, 5:24 p.m. UTC
Work around missing MSA_READY indicator in ath10k driver
(apply work-around for all msm8998 devices)

Marc Gonzalez (3):
  dt-bindings: net: wireless: ath10k: add qcom,no-msa-ready-indicator prop
  wifi: ath10k: fake missing MSA_READY indicator
  arm64: dts: qcom: msm8998: set qcom,no-msa-ready-indicator for wifi

 Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml | 8 ++++++++
 arch/arm64/boot/dts/qcom/msm8998.dtsi                           | 1 +
 drivers/net/wireless/ath/ath10k/qmi.c                           | 7 +++++++
 drivers/net/wireless/ath/ath10k/qmi.h                           | 1 +
 4 files changed, 17 insertions(+)

Comments

Krzysztof Kozlowski March 30, 2024, 6:20 p.m. UTC | #1
On 28/03/2024 18:36, Marc Gonzalez wrote:
> The ath10k driver waits for an "MSA_READY" indicator
> to complete initialization. If the indicator is not
> received, then the device remains unusable.
> 
> cf. ath10k_qmi_driver_event_work()
> 
> Several msm8998-based devices are affected by this issue.
> Oddly, it seems safe to NOT wait for the indicator, and
> proceed immediately when QMI_EVENT_SERVER_ARRIVE.
> 

This is v2, so where is the changelog?

Best regards,
Krzysztof
Krzysztof Kozlowski March 30, 2024, 6:25 p.m. UTC | #2
On 28/03/2024 18:39, Marc Gonzalez wrote:
> The ath10k driver waits for an "MSA_READY" indicator
> to complete initialization. If the indicator is not
> received, then the device remains unusable.
> 
> cf. ath10k_qmi_driver_event_work()
> 
> Several msm8998-based devices are affected by this issue.
> Oddly, it seems safe to NOT wait for the indicator, and
> proceed immediately when QMI_EVENT_SERVER_ARRIVE.
> 
> Jeff Johnson wrote:
> 
>   The feedback I received was "it might be ok to change all ath10k qmi
>   to skip waiting for msa_ready", and it was pointed out that ath11k
>   (and ath12k) do not wait for it.
> 
>   However with so many deployed devices, "might be ok" isn't a strong
>   argument for changing the default behavior.
> 

I think you got pretty clear comments:

"This sounds more like a firmware feature, not a hardware feature."

"This is why having this property in DT does not look right
place for this."

Best regards,
Krzysztof
Jeff Johnson April 2, 2024, 6:22 p.m. UTC | #3
On 4/2/2024 8:55 AM, Dmitry Baryshkov wrote:
> I'd say, we should take a step back and actually verify how this was
> handled in the vendor kernel.

(error handling and other non-QMI code removed from the following for readability)

In ath10k we unconditionally call the following in
ath10k_qmi_event_server_arrive():
	ret = ath10k_qmi_host_cap_send_sync(qmi);
	ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi);
	ret = ath10k_qmi_setup_msa_permissions(qmi);
	ret = ath10k_qmi_msa_ready_send_sync_msg(qmi);
	ret = ath10k_qmi_cap_send_sync_msg(qmi);

In vendor icnss2 there is conditional logic in icnss_driver_event_server_arrive():
	if (priv->device_id == WCN6750_DEVICE_ID ||
	    priv->device_id == WCN6450_DEVICE_ID) {
		ret = wlfw_host_cap_send_sync(priv);
	}

	if (priv->device_id == ADRASTEA_DEVICE_ID) {
		ret = wlfw_msa_mem_info_send_sync_msg(priv);
		ret = wlfw_msa_ready_send_sync_msg(priv);
	}

	ret = wlfw_cap_send_sync_msg(priv);

reference:
https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/wlan/platform/-/blob/wlan-platform.lnx.1.0.r1-rel/icnss2/main.c?ref_type=heads#L890

/jeff
Dmitry Baryshkov April 2, 2024, 7:15 p.m. UTC | #4
On Tue, 2 Apr 2024 at 21:22, Jeff Johnson <quic_jjohnson@quicinc.com> wrote:
>
> On 4/2/2024 8:55 AM, Dmitry Baryshkov wrote:
> > I'd say, we should take a step back and actually verify how this was
> > handled in the vendor kernel.
>
> (error handling and other non-QMI code removed from the following for readability)
>
> In ath10k we unconditionally call the following in
> ath10k_qmi_event_server_arrive():
>         ret = ath10k_qmi_host_cap_send_sync(qmi);
>         ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi);
>         ret = ath10k_qmi_setup_msa_permissions(qmi);
>         ret = ath10k_qmi_msa_ready_send_sync_msg(qmi);
>         ret = ath10k_qmi_cap_send_sync_msg(qmi);
>
> In vendor icnss2 there is conditional logic in icnss_driver_event_server_arrive():

Note, wcn3990 is icnss, not icnss2

>         if (priv->device_id == WCN6750_DEVICE_ID ||
>             priv->device_id == WCN6450_DEVICE_ID) {
>                 ret = wlfw_host_cap_send_sync(priv);
>         }
>
>         if (priv->device_id == ADRASTEA_DEVICE_ID) {
>                 ret = wlfw_msa_mem_info_send_sync_msg(priv);
>                 ret = wlfw_msa_ready_send_sync_msg(priv);
>         }

The problem with applying this approach is that here the discriminator
is the WiFi device ID. WCN6750, WCN6450 and this ADRASTEA are
different WiFi/BT chips. However for msm8998 and e.g. sdm845 there is
no easy way to distinguish the WiFi chips. Both platforms use wcn3990
device.

>
>         ret = wlfw_cap_send_sync_msg(priv);
>
> reference:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/wlan/platform/-/blob/wlan-platform.lnx.1.0.r1-rel/icnss2/main.c?ref_type=heads#L890
>
> /jeff
Marc Gonzalez April 3, 2024, 6:16 p.m. UTC | #5
On 03/04/2024 16:12, Dmitry Baryshkov wrote:

> From [Jeff's] message it looks like we are expected to get MSA READY even on msm8998.

This is the code we're using:

https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/drivers/net/wireless/ath/ath10k/qmi.c

When ATH10K_SNOC_DRIVER_EVENT_SERVER_ARRIVE,
driver registers an "indicator handler"
ath10k_snoc_qmi_wlfw_clnt_ind()

It handles QMI_WLFW_FW_READY_IND_V01 by posting
ATH10K_SNOC_DRIVER_EVENT_FW_READY_IND
which is handled in the
ath10k_snoc_driver_event_work() work queue.

But QMI_WLFW_MSA_READY_IND_V01 only triggers
a debug log and setting qmi_cfg->msa_ready = true;

$ git grep '\<msa_ready\>'
drivers/net/wireless/ath/ath10k/qmi.c:          qmi_cfg->msa_ready = true;
drivers/net/wireless/ath/ath10k/qmi.c:  qmi_cfg->msa_ready = false;
drivers/net/wireless/ath/ath10k/qmi.h: * msa_ready: wlan firmware msa memory ready for board data download
drivers/net/wireless/ath/ath10k/qmi.h:  bool msa_ready;

So basically, the vendor ath10k driver ignores QMI_WLFW_MSA_READY_IND_V01.


I will test the following patch which aligns the behavior
of mainline driver to that of vendor driver:

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 38e939f572a9e..0e1ab5aca663b 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -1040,6 +1040,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
 		switch (event->type) {
 		case ATH10K_QMI_EVENT_SERVER_ARRIVE:
 			ath10k_qmi_event_server_arrive(qmi);
+			ath10k_qmi_event_msa_ready(qmi);
 			break;
 		case ATH10K_QMI_EVENT_SERVER_EXIT:
 			ath10k_qmi_event_server_exit(qmi);
@@ -1048,7 +1049,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
 			ath10k_qmi_event_fw_ready_ind(qmi);
 			break;
 		case ATH10K_QMI_EVENT_MSA_READY_IND:
-			ath10k_qmi_event_msa_ready(qmi);
+			printk(KERN_WARNING "IGNORING MSA_READY INDICATOR");
 			break;
 		default:
 			ath10k_warn(ar, "invalid event type: %d", event->type);


Dmitry Baryshkov reported:
Works on sm8150, sdm845, qrb2210

Regards
Dmitry Baryshkov April 4, 2024, 1:14 p.m. UTC | #6
On Thu, 4 Apr 2024 at 15:30, Marc Gonzalez <mgonzalez@freebox.fr> wrote:
>
> On 04/04/2024 13:57, Kalle Valo wrote:
>
> > Dmitry Baryshkov wrote:
> >
> >> I'd say, we should take a step back and actually verify how this was
> >> handled in the vendor kernel.
> >
> > One comment related to this: usually vendor driver and firmware branches
> > go "hand in hand", meaning that a version of driver supports only one
> > specific firmware branch. And there can be a lot of branches. So even if
> > one branch might have a check for something specific, there are no
> > guarantees what the other N+1 branches do :/
>
> The consequences and ramifications of the above comment are not clear to me.
>
> Does this mean:
> "It is pointless to analyze a given version (or even several versions)
> of the vendor driver downstream, because there are exist a large number
> of variations of the code." ?
>
> And thus, "it is nonsensical to try to "align" the mainline driver to
> "the" vendor driver, as there is no single "vendor driver"" ?
>
> Thus, the following patch (or one functionally-equivalent) is not acceptable?

For reference, I tested this patch on sdm845 (db845c), qcm2290 aka
qrb2210 (rb1), sm6115 aka qrb4210 (rb2) and sm8150 platforms.
I was not able to fully test it on sda660, modem crashes without this
patch (there is no MSA_READY indication) and with the patch applied
the device hangs, most likely because of the IOMMU or clocking issue.

>
> diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
> index 38e939f572a9e..fd9ac9717488a 100644
> --- a/drivers/net/wireless/ath/ath10k/qmi.c
> +++ b/drivers/net/wireless/ath/ath10k/qmi.c
> @@ -1040,6 +1040,8 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
>                 switch (event->type) {
>                 case ATH10K_QMI_EVENT_SERVER_ARRIVE:
>                         ath10k_qmi_event_server_arrive(qmi);
> +                       printk(KERN_NOTICE "NOT WAITING FOR MSA_READY INDICATOR");
> +                       ath10k_qmi_event_msa_ready(qmi);
>                         break;
>                 case ATH10K_QMI_EVENT_SERVER_EXIT:
>                         ath10k_qmi_event_server_exit(qmi);
> @@ -1048,7 +1050,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
>                         ath10k_qmi_event_fw_ready_ind(qmi);
>                         break;
>                 case ATH10K_QMI_EVENT_MSA_READY_IND:
> -                       ath10k_qmi_event_msa_ready(qmi);
> +                       printk(KERN_NOTICE "IGNORING ACTUAL MSA_READY INDICATOR");
>                         break;
>                 default:
>                         ath10k_warn(ar, "invalid event type: %d", event->type);
>
>
>
> Regards
>
Marc Gonzalez April 8, 2024, 3:53 p.m. UTC | #7
On 04/04/2024 17:28, Kalle Valo wrote:

> Marc Gonzalez wrote:
> 
>> On 04/04/2024 13:57, Kalle Valo wrote:
>>
>>> Dmitry Baryshkov wrote:
>>>
>>>> I'd say, we should take a step back and actually verify how this was
>>>> handled in the vendor kernel.
>>>
>>> One comment related to this: usually vendor driver and firmware branches
>>> go "hand in hand", meaning that a version of driver supports only one
>>> specific firmware branch. And there can be a lot of branches. So even if
>>> one branch might have a check for something specific, there are no
>>> guarantees what the other N+1 branches do :/
>>
>> The consequences and ramifications of the above comment are not clear to me.
>>
>> Does this mean:
>> "It is pointless to analyze a given version (or even several versions)
>> of the vendor driver downstream, because there are exist a large number
>> of variations of the code." ?
> 
> I was trying to say that because the design philosophy between vendor
> drivers and upstream drivers is very different, we can't 100% trust
> vendor drivers. It's a very good idea to check what vendor drivers do
> but we just need to be careful before making any conclusions. Testing
> real hardware (and corresponding firmware) is the most reliable way to
> know how different products/firmware work, unfortunately.
> 
>> And thus, "it is nonsensical to try to "align" the mainline driver to
>> "the" vendor driver, as there is no single "vendor driver"" ?
> 
> No no, I'm not saying that. I have suffered this "N+1 different firmware
> branches behaving slighly differently" problem since ath6kl days so for
> me this is business as usual, sadly. I'm sure we can find a solution for
> ath10k.

Hello Kalle,

I can spin a v3, no problem.

Do you prefer:

Option A = never waiting for the MSA_READY indicator for ANYONE
Option B = not waiting for the MSA_READY indicator when qcom,no-msa-ready-indicator is defined
Option C = not waiting for the MSA_READY indicator for certain platforms (based on root compatible)
Option D = some other solution not yet discussed

Dmitry has tested Option A on 5 platforms, where it does not induce regressions.
I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi.

Please provide guidance :)

Regards
Kalle Valo April 25, 2024, 9:42 a.m. UTC | #8
Marc Gonzalez <mgonzalez@freebox.fr> writes:

> On 04/04/2024 17:28, Kalle Valo wrote:
>
>> Marc Gonzalez wrote:
>> 
>>> On 04/04/2024 13:57, Kalle Valo wrote:
>>>
>>>> Dmitry Baryshkov wrote:
>>>>
>>>>> I'd say, we should take a step back and actually verify how this was
>>>>> handled in the vendor kernel.
>>>>
>>>> One comment related to this: usually vendor driver and firmware branches
>>>> go "hand in hand", meaning that a version of driver supports only one
>>>> specific firmware branch. And there can be a lot of branches. So even if
>>>> one branch might have a check for something specific, there are no
>>>> guarantees what the other N+1 branches do :/
>>>
>>> The consequences and ramifications of the above comment are not clear to me.
>>>
>>> Does this mean:
>>> "It is pointless to analyze a given version (or even several versions)
>>> of the vendor driver downstream, because there are exist a large number
>>> of variations of the code." ?
>> 
>> I was trying to say that because the design philosophy between vendor
>> drivers and upstream drivers is very different, we can't 100% trust
>> vendor drivers. It's a very good idea to check what vendor drivers do
>> but we just need to be careful before making any conclusions. Testing
>> real hardware (and corresponding firmware) is the most reliable way to
>> know how different products/firmware work, unfortunately.
>> 
>>> And thus, "it is nonsensical to try to "align" the mainline driver to
>>> "the" vendor driver, as there is no single "vendor driver"" ?
>> 
>> No no, I'm not saying that. I have suffered this "N+1 different firmware
>> branches behaving slighly differently" problem since ath6kl days so for
>> me this is business as usual, sadly. I'm sure we can find a solution for
>> ath10k.
>
> Hello Kalle,
>
> I can spin a v3, no problem.
>
> Do you prefer:
>
> Option A = never waiting for the MSA_READY indicator for ANYONE
> Option B = not waiting for the MSA_READY indicator when
> qcom,no-msa-ready-indicator is defined
> Option C = not waiting for the MSA_READY indicator for certain
> platforms (based on root compatible)
> Option D = some other solution not yet discussed

After firmware-N.bin solution didn't work (sorry about that!) my
prerence is option B.

> Dmitry has tested Option A on 5 platforms, where it does not induce regressions.
> I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi.

What do you mean here? Are you saying that option A works on all
devices? I'm guessing I'm misunderstanding something.
Marc Gonzalez April 25, 2024, 11:48 a.m. UTC | #9
On 25/04/2024 11:42, Kalle Valo wrote:

> Marc Gonzalez wrote:
> 
>> Do you prefer:
>>
>> Option A = never waiting for the MSA_READY indicator for ANYONE
>> Option B = not waiting for the MSA_READY indicator when
>> qcom,no-msa-ready-indicator is defined
>> Option C = not waiting for the MSA_READY indicator for certain
>> platforms (based on root compatible)
>> Option D = some other solution not yet discussed
> 
> After firmware-N.bin solution didn't work (sorry about that!) my
> preference is option B.

Actually, Option B is this patch series.
Could you formally review it?

Perhaps one thing I could do slightly differently is to NOT call
ath10k_qmi_event_msa_ready() a second time if we DO receive the
indicator later.


>> Dmitry has tested Option A on 5 platforms, where it does not induce regressions.
>> I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi.
> 
> What do you mean here? Are you saying that option A works on all
> devices? I'm guessing I'm misunderstanding something.

No one serious would ever claim "this works on all devices".

Dmitry and I tested the following patch:

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 38e939f572a9e..fd9ac9717488a 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -1040,6 +1040,8 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
                switch (event->type) {
                case ATH10K_QMI_EVENT_SERVER_ARRIVE:
                        ath10k_qmi_event_server_arrive(qmi);
+                       printk(KERN_NOTICE "NOT WAITING FOR MSA_READY INDICATOR");
+                       ath10k_qmi_event_msa_ready(qmi);
                        break;
                case ATH10K_QMI_EVENT_SERVER_EXIT:
                        ath10k_qmi_event_server_exit(qmi);
@@ -1048,7 +1050,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
                        ath10k_qmi_event_fw_ready_ind(qmi);
                        break;
                case ATH10K_QMI_EVENT_MSA_READY_IND:
-                       ath10k_qmi_event_msa_ready(qmi);
+                       printk(KERN_NOTICE "IGNORING ACTUAL MSA_READY INDICATOR");
                        break;
                default:
                        ath10k_warn(ar, "invalid event type: %d", event->type);


Dmitry tested several platforms:

> For reference, I tested this patch on sdm845 (db845c), qcm2290 aka
> qrb2210 (rb1), sm6115 aka qrb4210 (rb2) and sm8150 platforms.
> I was not able to fully test it on sda660, modem crashes without this
> patch (there is no MSA_READY indication) and with the patch applied
> the device hangs, most likely because of the IOMMU or clocking issue.

I tested on apq8098 (msm8998 sibling).
Patch makes adapter work on my msm8998 platform.

Regards
Kalle Valo April 25, 2024, 3:42 p.m. UTC | #10
Marc Gonzalez <mgonzalez@freebox.fr> writes:

> On 25/04/2024 11:42, Kalle Valo wrote:
>
>> Marc Gonzalez wrote:
>> 
>>> Do you prefer:
>>>
>>> Option A = never waiting for the MSA_READY indicator for ANYONE
>>> Option B = not waiting for the MSA_READY indicator when
>>> qcom,no-msa-ready-indicator is defined
>>> Option C = not waiting for the MSA_READY indicator for certain
>>> platforms (based on root compatible)
>>> Option D = some other solution not yet discussed
>> 
>> After firmware-N.bin solution didn't work (sorry about that!) my
>> preference is option B.
>
> Actually, Option B is this patch series.
> Could you formally review it?

I'm happy with this series and would take it to ath.git, just need an
ack from DT maintainers:

https://patchwork.kernel.org/project/linux-wireless/patch/84f20fb5-5d48-419c-8eff-d7044afb81c0@freebox.fr/

> Perhaps one thing I could do slightly differently is to NOT call
> ath10k_qmi_event_msa_ready() a second time if we DO receive the
> indicator later.

Good point. And maybe add an ath10k_warn() message so that we notice if
there's a mismatch.
Conor Dooley April 25, 2024, 4:02 p.m. UTC | #11
On Thu, Apr 25, 2024 at 06:42:16PM +0300, Kalle Valo wrote:
> Marc Gonzalez <mgonzalez@freebox.fr> writes:
> 
> > On 25/04/2024 11:42, Kalle Valo wrote:
> >
> >> Marc Gonzalez wrote:
> >> 
> >>> Do you prefer:
> >>>
> >>> Option A = never waiting for the MSA_READY indicator for ANYONE
> >>> Option B = not waiting for the MSA_READY indicator when
> >>> qcom,no-msa-ready-indicator is defined
> >>> Option C = not waiting for the MSA_READY indicator for certain
> >>> platforms (based on root compatible)
> >>> Option D = some other solution not yet discussed
> >> 
> >> After firmware-N.bin solution didn't work (sorry about that!) my
> >> preference is option B.
> >
> > Actually, Option B is this patch series.
> > Could you formally review it?
> 
> I'm happy with this series and would take it to ath.git, just need an
> ack from DT maintainers:

As far as I can tell, Krzysztof wanted a commit message update for 1/3.
Usually this dts patch would go via the qcom dts tree, so presumably
there's a need for an Ack from Bjorn or Konrad?

> 
> https://patchwork.kernel.org/project/linux-wireless/patch/84f20fb5-5d48-419c-8eff-d7044afb81c0@freebox.fr/
> 
> > Perhaps one thing I could do slightly differently is to NOT call
> > ath10k_qmi_event_msa_ready() a second time if we DO receive the
> > indicator later.
> 
> Good point. And maybe add an ath10k_warn() message so that we notice if
> there's a mismatch.
> 
> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Kalle Valo April 25, 2024, 4:39 p.m. UTC | #12
Conor Dooley <conor@kernel.org> writes:

> On Thu, Apr 25, 2024 at 06:42:16PM +0300, Kalle Valo wrote:
>
>> Marc Gonzalez <mgonzalez@freebox.fr> writes:
>> 
>> > On 25/04/2024 11:42, Kalle Valo wrote:
>> >
>> >> Marc Gonzalez wrote:
>> >> 
>> >>> Do you prefer:
>> >>>
>> >>> Option A = never waiting for the MSA_READY indicator for ANYONE
>> >>> Option B = not waiting for the MSA_READY indicator when
>> >>> qcom,no-msa-ready-indicator is defined
>> >>> Option C = not waiting for the MSA_READY indicator for certain
>> >>> platforms (based on root compatible)
>> >>> Option D = some other solution not yet discussed
>> >> 
>> >> After firmware-N.bin solution didn't work (sorry about that!) my
>> >> preference is option B.
>> >
>> > Actually, Option B is this patch series.
>> > Could you formally review it?
>> 
>> I'm happy with this series and would take it to ath.git, just need an
>> ack from DT maintainers:
>
> As far as I can tell, Krzysztof wanted a commit message update for
> 1/3.

That's my understanding as well, I assume Marc will submit v3. I marked
this patchset as 'Changes Requested' in patchwork.

> Usually this dts patch would go via the qcom dts tree, so presumably
> there's a need for an Ack from Bjorn or Konrad?

Thanks pointing this out. What I meant to say earlier that I'm happy to
take patches 1-2 to ath.git but I prefer that patch 3 goes via qcom dts
tree.