diff mbox series

wifi: cfg80211: fix bss rbn double erase issue

Message ID 20231025103304.22082-1-jiazi.li@transsion.com
State New
Headers show
Series wifi: cfg80211: fix bss rbn double erase issue | expand

Commit Message

Jiazi Li Oct. 25, 2023, 10:33 a.m. UTC
If cfg80211_update_assoc_bss_entry call rb_insert_bss re-insert bss
failed because cmp_bss return 0, this bss->rbn will continue to hold
expired data, such as __rd_parent_color.
And this bss still in rdev->bss_list, maybe double erase in
__cfg80211_bss_expire later.
Double erase a rbtree node(with expired parent and color data) maybe
corrupt rbtree, so add a in_rbtree flag to fix this issue.

Signed-off-by: Jiazi Li <jiazi.li@transsion.com>
---
 net/wireless/core.h |  1 +
 net/wireless/scan.c | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Johannes Berg Dec. 14, 2023, 1:13 p.m. UTC | #1
> > 
> > Ok that's bad - so you hit the WARN_ON there? How that? We should fix
> > that too?
> > 
> Yes, hit this WARN_ON in the test of direct connection between mobile
> phones and PC. Here is the log:
> [ 2741.982362] -----------[ cut here ]-----------
> [ 2741.982446] WARNING: CPU: 6 PID: 2175 at net/wireless/scan.c:1496 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]

Right, so you can reproduce that - can you find a fix for it?

> I don't know why this is happening yet.

OK ...

We have some basic kunit infrastructure, maybe you can work out
something there.

> > > this bss->rbn will continue to hold
> > > expired data, such as __rd_parent_color.
> > 
> > Does that matter in any way?
> > 
> It caused a null pointer issue in rb_erase:

Well, OK, so the thing isn't about it holding a color or 'expired' data
or whatever, it's about it being still on the rbtree, no?

> > > And this bss still in rdev->bss_list, maybe double erase in
> > > __cfg80211_bss_expire later.
> > > Double erase a rbtree node(with expired parent and color data) maybe
> > > corrupt rbtree, so add a in_rbtree flag to fix this issue.
> > 
> > This seems overly complex - couldn't we just remove it from the list too
> > or something? It's already a case that "should never happen" so ... not
> > sure we need to do something "good"?
> > 
> Will remove it from list when re-insert fail cause confusion in it's
> refcount? Which could lead to leakage or use-after-free?
> > 

It's a warn-on anyway, better we leak it than crash?

johannes
Jiazi Li Dec. 15, 2023, 10:21 a.m. UTC | #2
On Thu, Dec 14, 2023 at 02:13:38PM +0100, Johannes Berg wrote:
> > > 
> > > Ok that's bad - so you hit the WARN_ON there? How that? We should fix
> > > that too?
> > > 
> > Yes, hit this WARN_ON in the test of direct connection between mobile
> > phones and PC. Here is the log:
> > [ 2741.982362] -----------[ cut here ]-----------
> > [ 2741.982446] WARNING: CPU: 6 PID: 2175 at net/wireless/scan.c:1496 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]
> 
> Right, so you can reproduce that - can you find a fix for it?
> 

I am responsible for kernel stability and I am not very familiar with wireless code.
The colleague in charge of the WiFi module also couldn't find the root cause, so we
used the workaround solution I mentioned earlier to address this issue.

> > I don't know why this is happening yet.
> 
> OK ...
> 
> We have some basic kunit infrastructure, maybe you can work out
> something there.
>

I would suggest that my WiFi colleagues use basic kunit infrastructure to identify
the root cause of the problem.

> > > > this bss->rbn will continue to hold
> > > > expired data, such as __rd_parent_color.
> > > 
> > > Does that matter in any way?
> > > 
> > It caused a null pointer issue in rb_erase:
> 
> Well, OK, so the thing isn't about it holding a color or 'expired' data
> or whatever, it's about it being still on the rbtree, no?
> 

I think the thing is rbn has been erased in cfg80211_update_assoc_bss_entry, but
it couldn't be reinserted into rbtree:
  	rb_erase(&cbss->rbn, &rdev->bss_tree);
  	rb_insert_bss(rdev, cbss);//reinsert fail
  	rdev->bss_generation++;

So bss->rbn is not in rbtree now, but bss still in rdev->bss_list.
This leads to erasing this rbn that is not in the rbtree in __cfg80211_bss_expire.
Expired __rb_parent_color, rb_right, and rb_left in this rbn may cause various crash issues:
1.
[56994.336470][T312578] WARNING: CPU: 3 PID: 12578 at net/wireless/scan.c:1495 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]
......
[57049.728279][T712578] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
......
[57049.765847][T712578] pc : [0xffffffe95cc51388] cmp_bss+0x30/0x3a4 [cfg80211]
[57049.765973][T712578] lr : [0xffffffe95cc50c68] cfg80211_bss_update+0x7c/0x76c [cfg80211]
2.
[12114.124799][T412320] WARNING: CPU: 4 PID: 12320 at net/wireless/scan.c:1496 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]
......
[12418.513153][T212320] Unable to handle kernel paging request at virtual address ffff81b1d69f1928
......
[12418.548440][T212320] pc : [0xffffffec9e669630] rb_insert_color+0xe4/0x164
[12418.548762][T212320] lr : [0xffffffec99766fb0] cfg80211_bss_update+0x3d4/0x76c [cfg80211]

I have encountered crashes in the code of other modules, and it is speculated that the use after free of rbn has damaged the memory used by other modules.
3.
[ 3981.870858][T510216] WARNING: CPU: 5 PID: 10216 at net/wireless/scan.c:1496 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]
......
[ 4020.227747][ T4070] list_del corruption. prev->next should be ffffff80ebeace00, but was ffffff81950e8b30
[ 4020.227892][ T4070] ------------[ cut here ]------------
[ 4020.227913][ T4070] kernel BUG at lib/list_debug.c:61!
[ 4020.359413][ T4070] pc : [0xffffffd0985eecf8] __list_del_entry_valid+0xc0/0xd4
[ 4020.359438][ T4070] lr : [0xffffffd0985eecf8] __list_del_entry_valid+0xc0/0xd4
4.
[ 4858.776299][T102099] WARNING: CPU: 1 PID: 2099 at net/wireless/scan.c:1496 cfg80211_update_assoc_bss_entry+0x350/0x378 [cfg80211]
......
[ 5557.453407][T732106] Unable to handle kernel paging request at virtual address 20ffffff813691f8
......
[ 5557.466271][T732106] pc : [0xffffffd0ce5bda08] binder_open+0x208/0x608
[ 5557.466273][T732106] lr : [0xffffffd0ce5bd9f4] binder_open+0x1f4/0x608

> > > > And this bss still in rdev->bss_list, maybe double erase in
> > > > __cfg80211_bss_expire later.
> > > > Double erase a rbtree node(with expired parent and color data) maybe
> > > > corrupt rbtree, so add a in_rbtree flag to fix this issue.
> > > 
> > > This seems overly complex - couldn't we just remove it from the list too
> > > or something? It's already a case that "should never happen" so ... not
> > > sure we need to do something "good"?
> > > 
> > Will remove it from list when re-insert fail cause confusion in it's
> > refcount? Which could lead to leakage or use-after-free?
> > > 
> 
> It's a warn-on anyway, better we leak it than crash?
> 

Inaccurate refcount may cause use after free, which can also lead to crash.

> johannes
diff mbox series

Patch

diff --git a/net/wireless/core.h b/net/wireless/core.h
index 79b1c6d17847..36dc1e9de6b9 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -178,6 +178,7 @@  struct cfg80211_internal_bss {
 	unsigned long ts;
 	unsigned long refcount;
 	atomic_t hold;
+	bool in_rbtree;
 
 	/* time at the start of the reception of the first octet of the
 	 * timestamp field of the last beacon/probe received for this BSS.
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 8d114faf4842..d54cb47c1be6 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -202,7 +202,8 @@  static bool __cfg80211_unlink_bss(struct cfg80211_registered_device *rdev,
 
 	list_del_init(&bss->list);
 	list_del_init(&bss->pub.nontrans_list);
-	rb_erase(&bss->rbn, &rdev->bss_tree);
+	if (bss->in_rbtree)
+		rb_erase(&bss->rbn, &rdev->bss_tree);
 	rdev->bss_entries--;
 	WARN_ONCE((rdev->bss_entries == 0) ^ list_empty(&rdev->bss_list),
 		  "rdev bss entries[%d]/list[empty:%d] corruption\n",
@@ -1563,6 +1564,7 @@  static void rb_insert_bss(struct cfg80211_registered_device *rdev,
 
 		if (WARN_ON(!cmp)) {
 			/* will sort of leak this BSS */
+			bss->in_rbtree = false;
 			return;
 		}
 
@@ -1572,6 +1574,7 @@  static void rb_insert_bss(struct cfg80211_registered_device *rdev,
 			p = &(*p)->rb_right;
 	}
 
+	bss->in_rbtree = true;
 	rb_link_node(&bss->rbn, parent, p);
 	rb_insert_color(&bss->rbn, &rdev->bss_tree);
 }
@@ -3061,7 +3064,8 @@  void cfg80211_update_assoc_bss_entry(struct wireless_dev *wdev,
 			rdev->bss_generation++;
 	}
 
-	rb_erase(&cbss->rbn, &rdev->bss_tree);
+	if (cbss->in_rbtree)
+		rb_erase(&cbss->rbn, &rdev->bss_tree);
 	rb_insert_bss(rdev, cbss);
 	rdev->bss_generation++;
 
@@ -3070,7 +3074,8 @@  void cfg80211_update_assoc_bss_entry(struct wireless_dev *wdev,
 				 nontrans_list) {
 		bss = bss_from_pub(nontrans_bss);
 		bss->pub.channel = chan;
-		rb_erase(&bss->rbn, &rdev->bss_tree);
+		if (bss->in_rbtree)
+			rb_erase(&bss->rbn, &rdev->bss_tree);
 		rb_insert_bss(rdev, bss);
 		rdev->bss_generation++;
 	}