diff mbox series

[v2,1/2] wifi: ath9k: work around AR_CFG 0xdeadbeef chip hang

Message ID 20241106090439.3487958-1-ih@simonwunderlich.de
State New
Headers show
Series [v2,1/2] wifi: ath9k: work around AR_CFG 0xdeadbeef chip hang | expand

Commit Message

Issam Hamdi Nov. 6, 2024, 9:04 a.m. UTC
From: Simon Wunderlich <simon.wunderlich@open-mesh.com>

QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a state in
which a read of AR_CFG always returns 0xdeadbeef. This should not happen
when the power_mode of the device is ATH9K_PM_AWAKE.

This problem is not yet detected by any other workaround in ath9k. No way
is known to reproduce the problem easily.

This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
---
v2: update the Co-developed-by to Signed-off-by
---
 drivers/net/wireless/ath/ath9k/ath9k.h |  3 +++
 drivers/net/wireless/ath/ath9k/debug.c |  1 +
 drivers/net/wireless/ath/ath9k/debug.h |  1 +
 drivers/net/wireless/ath/ath9k/init.c  |  1 +
 drivers/net/wireless/ath/ath9k/link.c  | 31 ++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath9k/main.c  |  4 ++++
 6 files changed, 41 insertions(+)


base-commit: 2b94751626a6d49bbe42a19cc1503bd391016bd5

Comments

Sven Eckelmann Nov. 6, 2024, 10:05 a.m. UTC | #1
On Wednesday, 6 November 2024 10:04:39 CET Issam Hamdi wrote:
[...]
> This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
> and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"

Am I the only person which finds this style of adding information about "Co-
authors" weird?

[...]
> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
> Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
> Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
> ---
> v2: change the "Co-developed-by" to "Signed-off-by", remove the dependency

I think Kalle meant that "Co-developed-by" should be followed by a 
"Signed-off-by" - not that "Co-developed-by" should be removed.

I was not part of the delivery path for this version of the patch. But
current Signed-off-by seem to suggest this.

> on CONFIG_ATH9K_DEBUGFS and add more information in the commit description

And please don't reply to the old thread when sending a new patchset - this 
becomes really unreadable after a while. You can simply use the method which 
b4 uses and just reference the old thread in your mail. Something like:

Changes in v2:
- change the "Co-developed-by" to "Signed-off-by"
- remove the dependency on CONFIG_ATH9K_DEBUGFS
- add more information in the commit description
- Link to v1: https://lore.kernel.org/r/20241104171627.3789199-1-ih@simonwunderlich.de

[...]
> +static bool ath_hw_hang_deaf(struct ath_softc *sc)
> +{
> +#ifdef CONFIG_ATH9K_TX99
> +	return false;
> +#else
> +	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
> +	u32 interrupts, interrupt_per_s;
> +	unsigned int interval;
> +
> +	/* get historic data */
> +	interval = jiffies_to_msecs(jiffies - sc->last_check_time);
> +	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
> +		interrupts = sc->debug.stats.istats.rxlp;
> +	else
> +		interrupts = sc->debug.stats.istats.rxok;

You can't simply access sc->debug.stats.istats. sc->debug is only available 
when building with CONFIG_ATH9K_DEBUGFS. See ath9k.c

struct ath_softc {
[...]
#ifdef CONFIG_ATH9K_DEBUGFS
	struct ath9k_debug debug;
#endif
[...]
}

> +       /* sanity check, should be 4 seconds */
> +       if (interval > 10000 || interval < 1000)

Here you have hardcoded values but the actual interval is hidden behind 
ATH_HANG_WORK_INTERVAL. Two things which now are rather disconnected and might 
cause problems in the future (when somebody fiddles around with 
ATH_HANG_WORK_INTERVAL).

Overall, the proposal from Toke seems to be a lot better integrated in the HW 
check style which was introduced by Felix in the beginning of 2017 [1].

At the same time there was a proposal by Felix [2] - which diverged too much 
from our original patch (and as a result caused too many resets) [3]. I would 
therefore propose to check Toke's version and test handles the problem 
correctly.

Kind regards,
	Sven

[1] https://github.com/openwrt/openwrt/commit/b94177e10fc72f9309eae7459c3570e5c080e960
[2] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@nbd.name/
[3] https://lore.kernel.org/all/2081606.z26xgMiW1A@prime/
Toke Høiland-Jørgensen Nov. 6, 2024, 12:41 p.m. UTC | #2
Sven Eckelmann <se@simonwunderlich.de> writes:

> Overall, the proposal from Toke seems to be a lot better integrated in the HW 
> check style which was introduced by Felix in the beginning of 2017 [1].
>
> At the same time there was a proposal by Felix [2] - which diverged too much 
> from our original patch (and as a result caused too many resets) [3]. I would 
> therefore propose to check Toke's version and test handles the problem 
> correctly.

Yes, agreed. I did actually write up a commit message for that, let me
send it as a proper patch...

-Toke
Kalle Valo Nov. 7, 2024, 8:03 a.m. UTC | #3
Sven Eckelmann <se@simonwunderlich.de> writes:

> On Wednesday, 6 November 2024 10:04:39 CET Issam Hamdi wrote:
> [...]
>> This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
>> and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"
>
> Am I the only person which finds this style of adding information about "Co-
> authors" weird?

No, you are not. I also find it weird.

> [...]
>> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
>> Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
>> Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
>> ---
>> v2: change the "Co-developed-by" to "Signed-off-by", remove the dependency
>
> I think Kalle meant that "Co-developed-by" should be followed by a 
> "Signed-off-by" - not that "Co-developed-by" should be removed.

Correct, to my understanding having both c-d-b and s-o-b is the
preferred format.

>> on CONFIG_ATH9K_DEBUGFS and add more information in the commit description
>
> And please don't reply to the old thread when sending a new patchset - this 
> becomes really unreadable after a while. You can simply use the method which 
> b4 uses and just reference the old thread in your mail. Something like:
>
> Changes in v2:
> - change the "Co-developed-by" to "Signed-off-by"
> - remove the dependency on CONFIG_ATH9K_DEBUGFS
> - add more information in the commit description
> - Link to v1: https://lore.kernel.org/r/20241104171627.3789199-1-ih@simonwunderlich.de

Yes, this style is very much recommended. Having links to older versions
helps reviewers.
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 29ca65a732a6..c1ce081445a9 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -739,11 +739,13 @@  void ath9k_csa_update(struct ath_softc *sc);
 #define ATH_ANI_MAX_SKIP_COUNT    10
 #define ATH_PAPRD_TIMEOUT         100 /* msecs */
 #define ATH_PLL_WORK_INTERVAL     100
+#define ATH_HANG_WORK_INTERVAL    4000
 
 void ath_hw_check_work(struct work_struct *work);
 void ath_reset_work(struct work_struct *work);
 bool ath_hw_check(struct ath_softc *sc);
 void ath_hw_pll_work(struct work_struct *work);
+void ath_hw_hang_work(struct work_struct *work);
 void ath_paprd_calibrate(struct work_struct *work);
 void ath_ani_calibrate(struct timer_list *t);
 void ath_start_ani(struct ath_softc *sc);
@@ -1044,6 +1046,7 @@  struct ath_softc {
 #endif
 	struct delayed_work hw_check_work;
 	struct delayed_work hw_pll_work;
+	struct delayed_work hw_hang_work;
 	struct timer_list sleep_timer;
 
 #ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index eff894958a73..6b2469a01f17 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -750,6 +750,7 @@  static int read_file_reset(struct seq_file *file, void *data)
 		[RESET_TYPE_CALIBRATION] = "Calibration error",
 		[RESET_TX_DMA_ERROR] = "Tx DMA stop error",
 		[RESET_RX_DMA_ERROR] = "Rx DMA stop error",
+		[RESET_TYPE_DEADBEEF] = "deadbeef hang",
 	};
 	int i;
 
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index 389459c04d14..6ebb6053a8c1 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -53,6 +53,7 @@  enum ath_reset_type {
 	RESET_TYPE_CALIBRATION,
 	RESET_TX_DMA_ERROR,
 	RESET_RX_DMA_ERROR,
+	RESET_TYPE_DEADBEEF,
 	__RESET_TYPE_MAX
 };
 
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index f9e77c4624d9..833474d7281f 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -740,6 +740,7 @@  static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
 	INIT_WORK(&sc->paprd_work, ath_paprd_calibrate);
 	INIT_DELAYED_WORK(&sc->hw_pll_work, ath_hw_pll_work);
 	INIT_DELAYED_WORK(&sc->hw_check_work, ath_hw_check_work);
+	INIT_DELAYED_WORK(&sc->hw_hang_work, ath_hw_hang_work);
 
 	ath9k_init_channel_context(sc);
 
diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index d1e5767aab3c..37438960c278 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -142,6 +142,37 @@  void ath_hw_pll_work(struct work_struct *work)
 				     msecs_to_jiffies(ATH_PLL_WORK_INTERVAL));
 }
 
+static bool ath_hw_hang_deadbeef(struct ath_softc *sc)
+{
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	u32 reg;
+
+	/* check for stucked MAC */
+	ath9k_ps_wakeup(sc);
+	reg = REG_READ(sc->sc_ah, AR_CFG);
+	ath9k_ps_restore(sc);
+
+	if (reg != 0xdeadbeef)
+		return false;
+
+	ath_dbg(common, RESET,
+		"0xdeadbeef hang is detected. Schedule chip reset\n");
+	ath9k_queue_reset(sc, RESET_TYPE_DEADBEEF);
+
+	return true;
+}
+
+void ath_hw_hang_work(struct work_struct *work)
+{
+	struct ath_softc *sc = container_of(work, struct ath_softc,
+					    hw_hang_work.work);
+
+	ath_hw_hang_deadbeef(sc);
+
+	ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
+				     msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
+}
+
 /*
  * PA Pre-distortion.
  */
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index b92c89dad8de..024028ce8417 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -186,6 +186,7 @@  static void __ath_cancel_work(struct ath_softc *sc)
 	cancel_work_sync(&sc->paprd_work);
 	cancel_delayed_work_sync(&sc->hw_check_work);
 	cancel_delayed_work_sync(&sc->hw_pll_work);
+	cancel_delayed_work_sync(&sc->hw_hang_work);
 
 #ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
 	if (ath9k_hw_mci_is_enabled(sc->sc_ah))
@@ -208,6 +209,9 @@  void ath_restart_work(struct ath_softc *sc)
 		ieee80211_queue_delayed_work(sc->hw, &sc->hw_pll_work,
 				     msecs_to_jiffies(ATH_PLL_WORK_INTERVAL));
 
+	ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
+				     msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
+
 	ath_start_ani(sc);
 }