diff mbox series

[v2] wifi: cfg80211: Lock wiphy in cfg80211_get_station

Message ID 983b24a6a176e0800c01aedcd74480d9b551cb13.1716046653.git.repk@triplefau.lt
State New
Headers show
Series [v2] wifi: cfg80211: Lock wiphy in cfg80211_get_station | expand

Commit Message

Remi Pommarel May 21, 2024, 7:47 p.m. UTC
Wiphy should be locked before calling rdev_get_station() (see lockdep
assert in ieee80211_get_station()).

This fixes the following kernel NULL dereference:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
 Mem abort info:
   ESR = 0x0000000096000006
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x06: level 2 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000006
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000
 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000
 Internal error: Oops: 0000000096000006 [#1] SMP
 Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath
 CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705
 Hardware name: RPT (r1) (DT)
 Workqueue: bat_events batadv_v_elp_throughput_metric_update
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
 lr : sta_set_sinfo+0xcc/0xbd4
 sp : ffff000007b43ad0
 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98
 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000
 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc
 x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000
 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d
 x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e
 x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000
 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000
 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90
 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000
 Call trace:
  ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
  sta_set_sinfo+0xcc/0xbd4
  ieee80211_get_station+0x2c/0x44
  cfg80211_get_station+0x80/0x154
  batadv_v_elp_get_throughput+0x138/0x1fc
  batadv_v_elp_throughput_metric_update+0x1c/0xa4
  process_one_work+0x1ec/0x414
  worker_thread+0x70/0x46c
  kthread+0xdc/0xe0
  ret_from_fork+0x10/0x20
 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)

This happens because STA has time to disconnect and reconnect before
batadv_v_elp_throughput_metric_update() delayed work gets scheduled. In
this situation, ath10k_sta_state() can be in the middle of resetting
arsta data when the work queue get chance to be scheduled and ends up
accessing it. Locking wiphy prevents that.

Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Nicolas Escande <nico.escande@gmail.com>
Acked-by: Antonio Quartulli <a@unstable.cc>
---
There are mainly no change since v1 it is only resent because both
linux-wireless and linux-kernel vger mailing list dropped it.

Full thread archive can be found here [0].

Tags from v1 has been added as well as a note on why this lock issue
could end with NULL pointer derefence.

[0]: https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/983b24a6a176e0800c01aedcd74480d9b551cb13.1716046653.git.repk@triplefau.lt/
---
 net/wireless/util.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Nicolas Escande May 21, 2024, 7:01 a.m. UTC | #1
On Sat May 18, 2024 at 5:50 PM CEST, Remi Pommarel wrote:
> Wiphy should be locked before calling rdev_get_station() (see lockdep
> assert in ieee80211_get_station()).
>
> This fixes the following kernel NULL dereference:
>
>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
>  Mem abort info:
>    ESR = 0x0000000096000006
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x06: level 2 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000006
>    CM = 0, WnR = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000
>  [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000
>  Internal error: Oops: 0000000096000006 [#1] SMP
>  Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath
>  CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705
>  Hardware name: RPT (r1) (DT)
>  Workqueue: bat_events batadv_v_elp_throughput_metric_update
>  pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
>  lr : sta_set_sinfo+0xcc/0xbd4
>  sp : ffff000007b43ad0
>  x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98
>  x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000
>  x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc
>  x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000
>  x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d
>  x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e
>  x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000
>  x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000
>  x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90
>  x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000
>  Call trace:
>   ath10k_sta_statistics+0x10/0x2dc [ath10k_core]
>   sta_set_sinfo+0xcc/0xbd4
>   ieee80211_get_station+0x2c/0x44
>   cfg80211_get_station+0x80/0x154
>   batadv_v_elp_get_throughput+0x138/0x1fc
>   batadv_v_elp_throughput_metric_update+0x1c/0xa4
>   process_one_work+0x1ec/0x414
>   worker_thread+0x70/0x46c
>   kthread+0xdc/0xe0
>   ret_from_fork+0x10/0x20
>  Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)
>
> Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
> Signed-off-by: Remi Pommarel <repk@triplefau.lt>

Reviewed-by: Nicolas Escande <nico.escande@gmail.com>
Johannes Berg May 21, 2024, 7:12 a.m. UTC | #2
On Sat, 2024-05-18 at 17:50 +0200, Remi Pommarel wrote:
> Wiphy should be locked before calling rdev_get_station() (see lockdep
> assert in ieee80211_get_station()).
> 
> This fixes the following kernel NULL dereference:

How do you get a NULL pointer dereference from a locking issue? Was
something else removing the station simultaneously?

johannes
Antonio Quartulli May 21, 2024, 7:43 a.m. UTC | #3
Hi,

On 18/05/2024 17:50, Remi Pommarel wrote:
> Wiphy should be locked before calling rdev_get_station() (see lockdep
> assert in ieee80211_get_station()).

Adding the lock is fine as nowadays it is taken in pre_doit and released 
in post_doit (with some exceptions). Therefore when invoking 
.get_station from a side path the lock should be taken too.

It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced 
this requirement AFAICS.

> 
> This fixes the following kernel NULL dereference:

As already said by Johannes, I am not sure it truly fixes this NULL 
dereference though.

Have you checked where in ath10k_sta_statistics this is exactly 
happening? Do you think some sta was partly released and thus fields 
were NULLified?

Regards,
Antonio Quartulli May 21, 2024, 2:51 p.m. UTC | #4
Hi,

On 21/05/2024 14:15, Remi Pommarel wrote:
> On Tue, May 21, 2024 at 09:43:56AM +0200, Antonio Quartulli wrote:
>> Hi,
>>
>> On 18/05/2024 17:50, Remi Pommarel wrote:
>>> Wiphy should be locked before calling rdev_get_station() (see lockdep
>>> assert in ieee80211_get_station()).
>>
>> Adding the lock is fine as nowadays it is taken in pre_doit and released in
>> post_doit (with some exceptions). Therefore when invoking .get_station from
>> a side path the lock should be taken too.
>>
>> It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced
>> this requirement AFAICS.
> 
> IIUC lock requirement was already there before this commit, only it was on
> rtnl lock to be taken instead of wiphy one.

You're right.

> 
>>
>>>
>>> This fixes the following kernel NULL dereference:
>>
>> As already said by Johannes, I am not sure it truly fixes this NULL
>> dereference though.
>>
>> Have you checked where in ath10k_sta_statistics this is exactly happening?
>> Do you think some sta was partly released and thus fields were NULLified?
> 
> ath10k_sta_statistics+0x10 is associated with the arsta->arvif->ar
> statement. It crashes because arsta->arvif is NULL.
> 
> Here is a scenario that explains the crash where the same sta
> disconnects then reconnects quickly (e.g. OPEN network):
> 
> 
> CPU0:                                       CPU1:
> 
> batadv_v_elp_periodic_work()
>   queue_work(batadv_v_elp_get_throughput)
> 
>                                              ieee80211_del_station()
>                                              ieee80211_add_station()
>                                               sta_info_insert()
>                                                list_add_tail_rcu()
>                                                ath10k_sta_state()
>                                                 memset(arsta, 0, sizeof(arsta))
> batadv_v_elp_get_throughput()
>   cfg80211_get_station()
>    ieee80211_get_station() <-- Find sta with its MAC in list
>     ath10k_sta_statistics()
>      arsta->arvif->ar <-- arsta is still zeroed
> 
> 
> In other words if the same sta has enough time to disconnect then
> reconnect before batadv_v_elp_get_throughput get scheduled, sta could be
> partially initialized when ath10k_sta_statistics() is called. Locking
> the wiphy ensure sta is fully initialized if found.
> 

We were just wondering how you could get the arvif being NULL and your 
explanation makes sense.

This said, holding the lock is required when invoking get_station via 
netlink, therefore it's meaningful to hold it even when side invoking it 
from another module.

Acked-by: Antonio Quartulli <a@unstable.cc>
diff mbox series

Patch

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 2bde8a354631..082c6f9c5416 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -2549,6 +2549,7 @@  int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr,
 {
 	struct cfg80211_registered_device *rdev;
 	struct wireless_dev *wdev;
+	int ret;
 
 	wdev = dev->ieee80211_ptr;
 	if (!wdev)
@@ -2560,7 +2561,11 @@  int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr,
 
 	memset(sinfo, 0, sizeof(*sinfo));
 
-	return rdev_get_station(rdev, dev, mac_addr, sinfo);
+	wiphy_lock(&rdev->wiphy);
+	ret = rdev_get_station(rdev, dev, mac_addr, sinfo);
+	wiphy_unlock(&rdev->wiphy);
+
+	return ret;
 }
 EXPORT_SYMBOL(cfg80211_get_station);