Message ID | 20230601033840.2997-1-quic_bqiang@quicinc.com |
---|---|
State | Superseded |
Headers | show |
Series | wifi: ath11k: fix boot failure with one MSI vector | expand |
Baochen Qiang <quic_bqiang@quicinc.com> writes: > Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from > PCI common code") breaks with one MSI vector because it moves > affinity setting after IRQ request, see below log: > > [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling.. > [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout > [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 > [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 > > The detail is, if do affinity request after IRQ activated, > which is done in request_irq(), kernel caches that request and > returns success directly. Later when a subsequent MHI interrupt is > fired, kernel will do the real affinity setting work, as a result, > changs the MSI vector. However at that time host has configured > old vector to hardware, so host never receives CE or DP interrupts. > > Fix it by setting affinity before registering MHI controller > where host is, for the first time, doing IRQ request. > > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 > > Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code") > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> I'm worried if this breaks WCN6750 support. Manikanta, would able to test this patch on WCN6750 and let us know if breaks anything?
On 6/9/2023 8:26 PM, Kalle Valo wrote: > Baochen Qiang <quic_bqiang@quicinc.com> writes: > >> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from >> PCI common code") breaks with one MSI vector because it moves >> affinity setting after IRQ request, see below log: >> >> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling.. >> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout >> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 >> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 >> >> The detail is, if do affinity request after IRQ activated, >> which is done in request_irq(), kernel caches that request and >> returns success directly. Later when a subsequent MHI interrupt is >> fired, kernel will do the real affinity setting work, as a result, >> changs the MSI vector. However at that time host has configured >> old vector to hardware, so host never receives CE or DP interrupts. >> >> Fix it by setting affinity before registering MHI controller >> where host is, for the first time, doing IRQ request. >> >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 >> >> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code") >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > I'm worried if this breaks WCN6750 support. Manikanta, would able to > test this patch on WCN6750 and let us know if breaks anything? Hi Manikanta, could you help test this patch on WCN6750?
On 8/7/2023 12:44 PM, Baochen Qiang wrote: > > On 6/9/2023 8:26 PM, Kalle Valo wrote: >> Baochen Qiang <quic_bqiang@quicinc.com> writes: >> >>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from >>> PCI common code") breaks with one MSI vector because it moves >>> affinity setting after IRQ request, see below log: >>> >>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control >>> response completion, polling.. >>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout >>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 >>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 >>> >>> The detail is, if do affinity request after IRQ activated, >>> which is done in request_irq(), kernel caches that request and >>> returns success directly. Later when a subsequent MHI interrupt is >>> fired, kernel will do the real affinity setting work, as a result, >>> changs the MSI vector. However at that time host has configured >>> old vector to hardware, so host never receives CE or DP interrupts. >>> >>> Fix it by setting affinity before registering MHI controller >>> where host is, for the first time, doing IRQ request. >>> >>> Tested-on: WCN6855 hw2.0 PCI >>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 >>> >>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI >>> common code") >>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >> I'm worried if this breaks WCN6750 support. Manikanta, would able to >> test this patch on WCN6750 and let us know if breaks anything? > Hi Manikanta, could you help test this patch on WCN6750? Sure, we will test and update this here. Thanks, Manikanta
On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote: > On 8/7/2023 12:44 PM, Baochen Qiang wrote: >> >> On 6/9/2023 8:26 PM, Kalle Valo wrote: >>> Baochen Qiang <quic_bqiang@quicinc.com> writes: >>> >>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from >>>> PCI common code") breaks with one MSI vector because it moves >>>> affinity setting after IRQ request, see below log: >>>> >>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control >>>> response completion, polling.. >>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout >>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 >>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 >>>> >>>> The detail is, if do affinity request after IRQ activated, >>>> which is done in request_irq(), kernel caches that request and >>>> returns success directly. Later when a subsequent MHI interrupt is >>>> fired, kernel will do the real affinity setting work, as a result, >>>> changs the MSI vector. However at that time host has configured >>>> old vector to hardware, so host never receives CE or DP interrupts. >>>> >>>> Fix it by setting affinity before registering MHI controller >>>> where host is, for the first time, doing IRQ request. >>>> >>>> Tested-on: WCN6855 hw2.0 PCI >>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 >>>> >>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI >>>> common code") >>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >>> I'm worried if this breaks WCN6750 support. Manikanta, would able to >>> test this patch on WCN6750 and let us know if breaks anything? >> Hi Manikanta, could you help test this patch on WCN6750? > > Sure, we will test and update this here. > No impact on WCN6750. Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> Thanks, Manikanta
On 8/7/2023 6:16 PM, Manikanta Pubbisetty wrote: > On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote: >> On 8/7/2023 12:44 PM, Baochen Qiang wrote: >>> >>> On 6/9/2023 8:26 PM, Kalle Valo wrote: >>>> Baochen Qiang <quic_bqiang@quicinc.com> writes: >>>> >>>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from >>>>> PCI common code") breaks with one MSI vector because it moves >>>>> affinity setting after IRQ request, see below log: >>>>> >>>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control >>>>> response completion, polling.. >>>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout >>>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: >>>>> -110 >>>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 >>>>> >>>>> The detail is, if do affinity request after IRQ activated, >>>>> which is done in request_irq(), kernel caches that request and >>>>> returns success directly. Later when a subsequent MHI interrupt is >>>>> fired, kernel will do the real affinity setting work, as a result, >>>>> changs the MSI vector. However at that time host has configured >>>>> old vector to hardware, so host never receives CE or DP interrupts. >>>>> >>>>> Fix it by setting affinity before registering MHI controller >>>>> where host is, for the first time, doing IRQ request. >>>>> >>>>> Tested-on: WCN6855 hw2.0 PCI >>>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 >>>>> >>>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI >>>>> common code") >>>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >>>> I'm worried if this breaks WCN6750 support. Manikanta, would able to >>>> test this patch on WCN6750 and let us know if breaks anything? >>> Hi Manikanta, could you help test this patch on WCN6750? >> >> Sure, we will test and update this here. >> > > No impact on WCN6750. > > Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> > > > Thanks, > Manikanta Thanks Mani. Hi Kalle, any more comments?
On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote: > No impact on WCN6750. > > Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> Can you provide a Tested-on: flag that Kalle can add to the commit text? > > > Thanks, > Manikanta
On 8/7/2023 8:59 PM, Jeff Johnson wrote: > On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote: >> No impact on WCN6750. >> >> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> > > Can you provide a Tested-on: flag that Kalle can add to the commit text? > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1 Thanks, Manikanta
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > On 8/7/2023 8:59 PM, Jeff Johnson wrote: >> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote: >>> No impact on WCN6750. >>> >>> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> >> Can you provide a Tested-on: flag that Kalle can add to the commit >> text? >> > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1 Excellent, thanks. The more Tested-on tags I see on the patch the more confidence I have for the patch.
On 8/22/2023 5:50 AM, Kalle Valo wrote: > Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > >> On 8/7/2023 8:59 PM, Jeff Johnson wrote: >>> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote: >>>> No impact on WCN6750. >>>> >>>> Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> >>> Can you provide a Tested-on: flag that Kalle can add to the commit >>> text? >>> >> >> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1 > > Excellent, thanks. The more Tested-on tags I see on the patch the more > confidence I have for the patch. > This fixes my spurious bootup failure :) Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
Baochen Qiang <quic_bqiang@quicinc.com> wrote: > Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from > PCI common code") breaks with one MSI vector because it moves > affinity setting after IRQ request, see below log: > > [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling.. > [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout > [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 > [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 > > The detail is, if do affinity request after IRQ activated, > which is done in request_irq(), kernel caches that request and > returns success directly. Later when a subsequent MHI interrupt is > fired, kernel will do the real affinity setting work, as a result, > changs the MSI vector. However at that time host has configured > old vector to hardware, so host never receives CE or DP interrupts. > > Fix it by setting affinity before registering MHI controller > where host is, for the first time, doing IRQ request. > > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 > > Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code") > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > Tested-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> Fails to apply, please rebase. error: patch failed: drivers/net/wireless/ath/ath11k/pci.c:888 error: drivers/net/wireless/ath/ath11k/pci.c: patch does not apply stg import: Diff does not apply cleanly Patch set to Changes Requested.
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c index 7b33731a50ee..a67576226380 100644 --- a/drivers/net/wireless/ath/ath11k/pci.c +++ b/drivers/net/wireless/ath/ath11k/pci.c @@ -850,10 +850,16 @@ static int ath11k_pci_probe(struct pci_dev *pdev, if (ret) goto err_pci_disable_msi; + ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0)); + if (ret) { + ath11k_err(ab, "failed to set irq affinity %d\n", ret); + goto err_pci_disable_msi; + } + ret = ath11k_mhi_register(ab_pci); if (ret) { ath11k_err(ab, "failed to register mhi: %d\n", ret); - goto err_pci_disable_msi; + goto err_irq_affinity_cleanup; } ret = ath11k_hal_srng_init(ab); @@ -874,12 +880,6 @@ static int ath11k_pci_probe(struct pci_dev *pdev, goto err_ce_free; } - ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0)); - if (ret) { - ath11k_err(ab, "failed to set irq affinity %d\n", ret); - goto err_free_irq; - } - /* kernel may allocate a dummy vector before request_irq and * then allocate a real vector when request_irq is called. * So get msi_data here again to avoid spurious interrupt @@ -888,19 +888,16 @@ static int ath11k_pci_probe(struct pci_dev *pdev, ret = ath11k_pci_config_msi_data(ab_pci); if (ret) { ath11k_err(ab, "failed to config msi_data: %d\n", ret); - goto err_irq_affinity_cleanup; + goto err_free_irq; } ret = ath11k_core_init(ab); if (ret) { ath11k_err(ab, "failed to init core: %d\n", ret); - goto err_irq_affinity_cleanup; + goto err_free_irq; } return 0; -err_irq_affinity_cleanup: - ath11k_pci_set_irq_affinity_hint(ab_pci, NULL); - err_free_irq: ath11k_pcic_free_irq(ab); @@ -913,6 +910,9 @@ static int ath11k_pci_probe(struct pci_dev *pdev, err_mhi_unregister: ath11k_mhi_unregister(ab_pci); +err_irq_affinity_cleanup: + ath11k_pci_set_irq_affinity_hint(ab_pci, NULL); + err_pci_disable_msi: ath11k_pci_free_msi(ab_pci);
Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from PCI common code") breaks with one MSI vector because it moves affinity setting after IRQ request, see below log: [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling.. [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110 [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110 The detail is, if do affinity request after IRQ activated, which is done in request_irq(), kernel caches that request and returns success directly. Later when a subsequent MHI interrupt is fired, kernel will do the real affinity setting work, as a result, changs the MSI vector. However at that time host has configured old vector to hardware, so host never receives CE or DP interrupts. Fix it by setting affinity before registering MHI controller where host is, for the first time, doing IRQ request. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code") Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> --- drivers/net/wireless/ath/ath11k/pci.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) base-commit: 4ab7f08db5310ded48a5c1f3ec3f2e177ba6b1c2