Message ID | 20201215172352.5311-1-youghand@codeaurora.org |
---|---|
State | New |
Series | mac80211: Trigger disconnect for STA during recovery |
On 2020-12-15 23:10, Felix Fietkau wrote:
> On 2020-12-15 18:23, Youghandhar Chintala wrote:
>> [...]
>> In order to fix this, we trigger a sta disconnect, for the targets
>> which expose this corresponding wiphy flag, in case of target hw
>> restart. After this there will be a fresh connection and thereby
>> avoiding the dropping of frames by remote peer.
>>
>> The right fix would be to pull the entire data path into the host
>> which is not feasible or would need lots of complex changes and
>> will still be inefficient.
> How about simply tracking which tids have aggregation enabled and send
> DELBA frames for those after the restart?
> It would mean less disruption for affected stations and less ugly hacks
> in the stack for unreliable hardware.
>
> - Felix

Hi Felix,

We did try sending an ADDBA frame to the AP once the SSR happened. The AP acked the frame, and a new BA session with a renewed sequence number was established. But the AP still did not respond to the ping requests carrying the new sequence numbers. It did not respond until one of the following happened:

1. The sequence number exceeded the sequence number that the DUT had used before the SSR.
2. The DUT disconnected and then reconnected.

The other option is to send a DELBA frame to the AP so that the AP is forced to re-establish the BA session from its side. We feel this can cause interoperability issues, as some APs may not honour the DELBA frame and will continue to use the BA session they had established earlier.

Given that re-negotiating the BA session is prone to IOT issues, we feel it would be better to go with the disconnect/reconnect solution, which is foolproof and will work in all scenarios.

Regards,
Youghandhar
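Observation 1 above is the behaviour a receiver's 12-bit sequence-number window produces. A minimal sketch, assuming a simplified model in which `head_sn` is the next SN the AP expects on a TID; `sn_is_older()` and `rx_accepts()` are hypothetical names that follow the usual modulo-4096 half-space comparison and are not mac80211's reorder-buffer code. As the later messages in this thread establish, the CCMP PN check turned out to be the decisive factor for protected traffic, so this covers only the sequence-number part:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SN_MODULO 4096u            /* 802.11 sequence numbers are 12 bits */
#define SN_MASK   (SN_MODULO - 1u)

/* True if sn1 comes before sn2 in modulo-4096 (half-space) order. */
static bool sn_is_older(uint16_t sn1, uint16_t sn2)
{
	return ((uint16_t)(sn1 - sn2) & SN_MASK) > (SN_MODULO >> 1);
}

/*
 * Hypothetical reorder-window check: head_sn is the next SN the
 * receiver expects. A frame whose SN is "older" than head_sn looks like
 * an old duplicate and is discarded instead of being delivered.
 */
static bool rx_accepts(uint16_t head_sn, uint16_t sn)
{
	return !sn_is_older((uint16_t)(sn & SN_MASK), head_sn);
}
```

For example, if the AP expects SN 39 on a TID, `rx_accepts(39, 0)` is false while `rx_accepts(39, 40)` is true. In this model a restarted SN of 0 is discarded only while `head_sn` lies between 1 and 2047; for larger values the half-space rule reads SN 0 as a forward jump instead.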
On Tue, Dec 15, 2020 at 10:53:52PM +0530, Youghandhar Chintala wrote:
> [ ... ]
> @@ -2643,6 +2646,19 @@ int ieee80211_reconfig(struct ieee80211_local *local)
>  					IEEE80211_QUEUE_STOP_REASON_SUSPEND,
>  					false);
>
> +	if ((hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART) &&
> +	    !reconfig_due_to_wowlan) {
> +		list_for_each_entry(sdata, &local->interfaces, list) {
> +			if (!ieee80211_sdata_running(sdata))
> +				continue;
> +			if (sdata->vif.type == NL80211_IFTYPE_STATION) {
> +				sdata->flags |=
> +					IEEE80211_SDATA_DISCONNECT_HW_RESTART;
> +				ieee80211_sta_restart(sdata);

If CONFIG_PM=n:

ERROR: "ieee80211_sta_restart" [net/mac80211/mac80211.ko] undefined!

Guenter

> +			}
> +		}
> +	}
> +
>  	/*
>  	 * If this is for hw restart things are still running.
>  	 * We may want to change that later, however.
On Fri, 2021-02-05 at 13:51 -0800, Abhishek Kumar wrote:
> Since using DELBA frame to APs to re-establish BA session has a
> dependency on APs and also some APs may not honour the DELBA frame.

That's completely out of spec ... Can you say which AP this was?

You could also try sending a BAR that updates the SN.

johannes
On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> The right fix would be to pull the entire data path into the host
> +++ b/net/mac80211/ieee80211_i.h
> @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
>   *	back to wireless media and to the local net stack.
>   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
>   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> + *	recovery

How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but then
didn't check how that's actually used?

Please change it so that the two models are the same. You really don't
need the wiphy flag.

johannes
On Fri, 2021-02-12 at 09:42 +0100, Johannes Berg wrote:
> On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> > [...]
> > + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> > + *	recovery
>
> How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but than
> didn't check how that's actually used?
>
> Please change it so that the two models are the same. You really don't
> need the wiphy flag.

In fact, you could even simply generalize IEEE80211_SDATA_DISCONNECT_RESUME
and ieee80211_resume_disconnect() to _reconfig_ instead of _resume_, and
call it from the driver just before requesting HW restart.

johannes
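The design suggested above, a one-shot per-interface flag that the driver sets right before it requests a hardware restart and that the reconfiguration path consumes, can be modelled outside the kernel. A minimal sketch; every identifier below (`VIF_FLAG_DISCONNECT_RECONFIG`, `struct vif_model`, `request_disconnect_on_reconfig()`, `reconfig_one()`) is hypothetical and only mirrors the shape of `IEEE80211_SDATA_DISCONNECT_RESUME` / `ieee80211_resume_disconnect()` as described in the review, not their actual implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface flag, analogous in spirit to
 * IEEE80211_SDATA_DISCONNECT_RESUME but meant for any reconfig. */
#define VIF_FLAG_DISCONNECT_RECONFIG (1u << 0)

enum vif_action { VIF_RESUME_TRAFFIC, VIF_DISCONNECT };

struct vif_model {
	unsigned int flags;
	bool is_station;
	bool associated;
};

/* Driver side: mark the interface just before requesting a HW restart. */
static void request_disconnect_on_reconfig(struct vif_model *vif)
{
	vif->flags |= VIF_FLAG_DISCONNECT_RECONFIG;
}

/*
 * Stack side: run once per interface while reconfiguring after the
 * restart. The request is one-shot, so the flag is always cleared.
 */
static enum vif_action reconfig_one(struct vif_model *vif)
{
	bool wanted = vif->flags & VIF_FLAG_DISCONNECT_RECONFIG;

	vif->flags &= ~VIF_FLAG_DISCONNECT_RECONFIG;
	if (wanted && vif->is_station && vif->associated) {
		vif->associated = false;   /* model: report connection loss */
		return VIF_DISCONNECT;
	}
	return VIF_RESUME_TRAFFIC;
}
```

Because the request is made per interface and per restart, nothing in this model needs a wiphy-wide capability flag, which is the point of the review comment.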
Hi Johannes and Felix,

We have tested the DELBA experiment after SSR: the DUT packet sequence number and TX PN reset to 0 as expected, but the AP (Netgear R8000) is not honoring the TX PN from the DUT.
Whereas when we ran the DELBA experiment with a Linux Android device as the SAP and the DUT as the STA, we did not see any issue. Ping resumed after SSR without a disconnect.

Please find below the logs collected during my test for reference.

192.168.0.15 (AtherosC_12:af:af) ===> DUT IP and MAC

192.168.0.55 (Netgear_d2:93:3d) ===> AP IP and MAC

No. | Time | Source | Destination | Protocol | Channel | Sequence number | Protected flag | Block Ack Starting Sequence Control (SSC) | CCMP Ext. Initialization Vector | Action code | TID | Info
---|---|---|---|---|---|---|---|---|---|---|---|---
474 | 22.186433 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 37 | Data is protected | | 0x000000000026 | | 0 | Echo (ping) request id=0x0d00, seq=256/1, ttl=64 (reply in 480)
480 | 22.188371 | 192.168.0.55 | 192.168.0.15 | ICMP | 44 | 5 | Data is protected | | 0x000000000011 | | 6 | Echo (ping) reply id=0x0d00, seq=256/1, ttl=64 (request in 474)
483 | 22.246335 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 38 | Data is protected | | 0x000000000027 | | 0 | Echo (ping) request id=0x1258, seq=11/2816, ttl=64 (reply in 489)
489 | 22.248127 | 192.168.0.55 | 192.168.0.15 | ICMP | 44 | 13 | Data is protected | | 0x000000000012 | | 0 | Echo (ping) reply id=0x1258, seq=11/2816, ttl=64 (request in 483)

The above pings (with TID 0) were sent before SSR. As soon as the DUT recovers after SSR, it sends DELBAs to the AP:

No. | Time | Source | Destination | Protocol | Channel | Sequence number | Protected flag | Block Ack Starting Sequence Control (SSC) | CCMP Ext. Initialization Vector | Action code | TID | Info
---|---|---|---|---|---|---|---|---|---|---|---|---
546 | 26.129127 | AtherosC_12:af:af | Netgear_d2:93:3d | 802.11 | 44 | 4 | Data is not protected | | | Delete Block Ack | 0x0 | Action, SN=4, FN=0, Flags=........C
548 | 26.129977 | AtherosC_12:af:af | Netgear_d2:93:3d | 802.11 | 44 | 5 | Data is not protected | | | Delete Block Ack | 0x6 | Action, SN=5, FN=0, Flags=........C

After SSR, we started ping traffic with TID 7 and TID 0. Ping succeeds for TID 7 and fails for TID 0. For TID 0, the TX PN of the ping requests is reset to 0, but it seems the AP has not reset its PN, hence the ping failure for TID 0. The TID 7 ping succeeds because the TID 7 traffic was only started after SSR.

No. | Time | Source | Destination | Protocol | Channel | Sequence number | Protected flag | Block Ack Starting Sequence Control (SSC) | CCMP Ext. Initialization Vector | Action code | TID | Info
---|---|---|---|---|---|---|---|---|---|---|---|---
557 | 26.355256 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 0 | Data is protected | | 0x000000000001 | | 0 | Echo (ping) request id=0x1258, seq=15/3840, ttl=64 (no response found!)
571 | 27.376895 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 1 | Data is protected | | 0x000000000002 | | 0 | Echo (ping) request id=0x1258, seq=16/4096, ttl=64 (no response found!)
588 | 28.400946 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 2 | Data is protected | | 0x000000000003 | | 0 | Echo (ping) request id=0x1258, seq=17/4352, ttl=64 (no response found!)
600 | 29.424881 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 3 | Data is protected | | 0x000000000004 | | 0 | Echo (ping) request id=0x1258, seq=18/4608, ttl=64 (no response found!)

The ping packets below are with TID 7:

No. | Time | Source | Destination | Protocol | Channel | Sequence number | Protected flag | Block Ack Starting Sequence Control (SSC) | CCMP Ext. Initialization Vector | Action code | TID | Info
---|---|---|---|---|---|---|---|---|---|---|---|---
622 | 30.898249 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 0 | Data is protected | | 0x000000000006 | | 7 | Echo (ping) request id=0x1276, seq=1/256, ttl=64 (reply in 626)
626 | 30.900015 | 192.168.0.55 | 192.168.0.15 | ICMP | 44 | 0 | Data is protected | | 0x000000000013 | | 7 | Echo (ping) reply id=0x1276, seq=1/256, ttl=64 (request in 622)
644 | 31.897456 | 192.168.0.15 | 192.168.0.55 | ICMP | 44 | 1 | Data is protected | | 0x000000000008 | | 7 | Echo (ping) request id=0x1276, seq=2/512, ttl=64 (reply in 648)
648 | 31.899266 | 192.168.0.55 | 192.168.0.15 | ICMP | 44 | 1 | Data is protected | | 0x000000000014 | | 7 | Echo (ping) reply id=0x1276, seq=2/512, ttl=64 (request in 644)

Regards,
Youghandhar

On 2021-02-12 14:07, Johannes Berg wrote:
> On Fri, 2021-02-05 at 13:51 -0800, Abhishek Kumar wrote:
>> Since using DELBA frame to APs to re-establish BA session has a
>> dependency on APs and also some APs may not honor the DELBA frame.
>
> That's completely out of spec ... Can you say which AP this was?
>
> You could also try sending a BAR that updates the SN.
>
> johannes

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
On Fri, 2021-09-24 at 13:07 +0530, Youghandhar Chintala wrote:
> Hi Johannes and felix,
>
> We have tested with DELBA experiment during post SSR, DUT packet seq
> number and tx pn is resetting to 0 as expected but AP(Netgear R8000) is
> not honoring the tx pn from DUT.
> Whereas when we tested with DELBA experiment by making Linux android
> device as SAP and DUT as STA with which we don’t see any issue. Ping got
> resumed post SSR without disconnect.

Hm. That's a lot of data, and not a lot of explanation :)

I don't understand how DelBA and PN are related?

johannes
Hi Johannes,

We thought sending the DELBA would solve the problem, but the actual problem is with the TX PN in secure mode.
It is not because of the DELBA that the sequence number and TX PN are reset to zero; they are reset because of the HW restart.
Since the FW/HW is the one that decides the TX PN, all these parameters are reset when it goes through SSR.
The other peer, say an AP, does not know anything about the SSR on the peer device. It expects each new TX PN to be greater than the last PN it accepted (current PN + 1 or higher). Since the TX PN starts from zero after SSR, the PN check at the AP fails and it silently drops every packet whose PN is not above that value.

Regards,
Youghandhar

On 2021-09-24 13:09, Johannes Berg wrote:
> [...]
> Hm. That's a lot of data, and not a lot of explanation :)
>
> I don't understand how DelBA and PN are related?
>
> johannes
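The PN check described above can be modelled in a few lines. A receiver treats a CCMP-protected frame as a replay unless its PN is strictly greater than the last PN it accepted, and for QoS data it keeps a separate replay counter per TID, which fits the TID 0 / TID 7 difference in the capture above. A minimal sketch; `struct rx_replay_state` and `ccmp_replay_ok()` are hypothetical names, the TID 0 counter is seeded with 0x27 (the last pre-SSR TID 0 PN from the DUT in the capture), and the TID 7 counter is assumed to be 0 because no TID 7 traffic appears before SSR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TIDS 16                 /* model: one replay counter per TID */
#define PN_MASK  0xffffffffffffULL  /* the CCMP PN is 48 bits */

struct rx_replay_state {
	uint64_t last_pn[NUM_TIDS];     /* last PN accepted on each TID */
};

/*
 * Accept a protected frame only if its PN is strictly greater than the
 * last PN accepted on the same TID; otherwise treat it as a replay.
 */
static bool ccmp_replay_ok(struct rx_replay_state *st, unsigned int tid,
			   uint64_t pn)
{
	assert(tid < NUM_TIDS);
	pn &= PN_MASK;
	if (pn <= st->last_pn[tid])
		return false;           /* dropped silently, sender is not told */
	st->last_pn[tid] = pn;
	return true;
}
```

Fed the post-SSR values from the capture, PN 0x1 to 0x4 on TID 0 (frames 557 to 600) fail against 0x27, while PN 0x6 and 0x8 on TID 7 (frames 622 and 644) pass.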
Hi,

> We thought sending the delba would solve the problem as earlier thought
> but the actual problem is with TX PN in a secure mode.
> It is not because of delba that the Seq number and TX PN are reset to
> zero.
> It’s because of the HW restart, these parameters are reset to zero.
> Since FW/HW is the one which decides the TX PN, when it goes through
> SSR, all these parameters are reset.

Right, we solved this problem too - in a sense the driver reads the
database (not just TX PN btw, also RX replay counters) when the firmware
crashes, and sends it back after the restart. mac80211 has some hooks
for that.

johannes
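The approach described above amounts to saving the firmware's key counters before the reset and writing them back afterwards. A minimal model under stated assumptions: the structures and functions below (`struct fw_key_state`, `struct host_key_backup`, `backup_on_crash()`, `fw_restart()`, `restore_after_restart()`) are hypothetical and do not correspond to the mac80211 hooks mentioned; the `readable` field stands in for the case, raised in the next reply, where the counter cannot be read before the reset. Only the TX PN is modelled; RX replay counters would need the same treatment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical firmware-side TX PN state for one key. */
struct fw_key_state {
	uint64_t next_tx_pn;  /* PN the firmware will put in the next frame */
	bool readable;        /* can the host still read it after the crash? */
};

/* Hypothetical host-side copy kept across the restart. */
struct host_key_backup {
	uint64_t next_tx_pn;
	bool valid;
};

/* Crash handling: save the counter before the firmware is reset. */
static void backup_on_crash(const struct fw_key_state *fw,
			    struct host_key_backup *bk)
{
	bk->valid = fw->readable;
	if (bk->valid)
		bk->next_tx_pn = fw->next_tx_pn;
}

/* Restart: the firmware comes back with its counter cleared. */
static void fw_restart(struct fw_key_state *fw)
{
	fw->next_tx_pn = 0;
	fw->readable = true;
}

/* Recovery: push the saved value back; fails if nothing was saved. */
static bool restore_after_restart(struct fw_key_state *fw,
				  const struct host_key_backup *bk)
{
	if (!bk->valid)
		return false;
	fw->next_tx_pn = bk->next_tx_pn;
	return true;
}
```

When `restore_after_restart()` returns false, the counter stays at its reset value and some other recovery is needed, which is the gap the disconnect/reconnect patch targets.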
On Fri, Sep 24, 2021 at 11:20:50AM +0200, Johannes Berg wrote:
> > We thought sending the delba would solve the problem as earlier thought
> > but the actual problem is with TX PN in a secure mode.
> > It is not because of delba that the Seq number and TX PN are reset to
> > zero.
> > It’s because of the HW restart, these parameters are reset to zero.
> > Since FW/HW is the one which decides the TX PN, when it goes through
> > SSR, all these parameters are reset.
>
> Right, we solved this problem too - in a sense the driver reads the
> database (not just TX PN btw, also RX replay counters) when the firmware
> crashes, and sending it back after the restart. mac80211 has some hooks
> for that.

This might be doable for some cases where the firmware is the component
assigning the PN values on TX and the firmware is still in a state where
the counter used for this can be fetched after a crash or detected
misbehavior. However, this does not sound like a very reliable mechanism
for cases where the firmware state for this cannot be trusted, or for
cases where the TX PN is actually assigned by the hardware (which would
get cleared on that restart, and the value might be unreadable before
that restart).

Trying to poll for this information periodically before the issue is
detected does not sound like a very robust design either, since that
would both waste resources and have a race condition with the lower
layers having transmitted additional frames.

Obviously it would be nice to be able to restore this type of state
accurately in all cases, but that may not really be a viable approach
for all designs, and it would seem to make sense to provide an
alternative approach to minimize the user-visible impact from the rare
cases of having to restart some low-level components during an
association.

--
Jouni Malinen                                            PGP id EFC895FA
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index cde2e3f..8cbeb5f 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
  *	back to wireless media and to the local net stack.
  * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
  * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
+ * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
+ *	recovery
  */
 enum ieee80211_sub_if_data_flags {
 	IEEE80211_SDATA_ALLMULTI		= BIT(0),
@@ -755,6 +757,7 @@ enum ieee80211_sub_if_data_flags {
 	IEEE80211_SDATA_DONT_BRIDGE_PACKETS	= BIT(3),
 	IEEE80211_SDATA_DISCONNECT_RESUME	= BIT(4),
 	IEEE80211_SDATA_IN_DRIVER		= BIT(5),
+	IEEE80211_SDATA_DISCONNECT_HW_RESTART	= BIT(6),
 };
 
 /**
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 6adfcb9..e4d0d16 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -4769,6 +4769,15 @@ void ieee80211_sta_restart(struct ieee80211_sub_if_data *sdata)
 					      true);
 		sdata_unlock(sdata);
 		return;
+	} else if (sdata->flags & IEEE80211_SDATA_DISCONNECT_HW_RESTART) {
+		sdata->flags &= ~IEEE80211_SDATA_DISCONNECT_HW_RESTART;
+		mlme_dbg(sdata, "driver requested disconnect after hardware restart\n");
+		ieee80211_sta_connection_lost(sdata,
+					      ifmgd->associated->bssid,
+					      WLAN_REASON_UNSPECIFIED,
+					      true);
+		sdata_unlock(sdata);
+		return;
 	}
 	sdata_unlock(sdata);
 }
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 8c3c01a..98567a3 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -2567,9 +2567,12 @@ int ieee80211_reconfig(struct ieee80211_local *local)
 	}
 	mutex_unlock(&local->sta_mtx);
 
-	/* add back keys */
-	list_for_each_entry(sdata, &local->interfaces, list)
-		ieee80211_reenable_keys(sdata);
+
+	if (!(hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART)) {
+		/* add back keys */
+		list_for_each_entry(sdata, &local->interfaces, list)
+			ieee80211_reenable_keys(sdata);
+	}
 
 	/* Reconfigure sched scan if it was interrupted by FW restart */
 	mutex_lock(&local->mtx);
@@ -2643,6 +2646,19 @@ int ieee80211_reconfig(struct ieee80211_local *local)
 					IEEE80211_QUEUE_STOP_REASON_SUSPEND,
 					false);
 
+	if ((hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART) &&
+	    !reconfig_due_to_wowlan) {
+		list_for_each_entry(sdata, &local->interfaces, list) {
+			if (!ieee80211_sdata_running(sdata))
+				continue;
+			if (sdata->vif.type == NL80211_IFTYPE_STATION) {
+				sdata->flags |=
+					IEEE80211_SDATA_DISCONNECT_HW_RESTART;
+				ieee80211_sta_restart(sdata);
+			}
+		}
+	}
+
 	/*
 	 * If this is for hw restart things are still running.
 	 * We may want to change that later, however.
Currently, in case of a target hardware restart, we just reconfig and
re-enable the security keys and enable the network queues to start
data traffic back from where it was interrupted.

Many ath10k wifi chipsets have sequence numbers for the data
packets assigned by firmware, and the MAC sequence number will
restart from zero after a target hardware restart, leading to a mismatch
between the sequence number expected by the remote peer and the sequence
number of the frame sent by the target firmware.

This mismatch in sequence number will cause out-of-order packets
on the remote peer, and all the frames sent by the device are dropped
until we reach the sequence number which was sent before we restarted
the target hardware.

In order to fix this, we trigger a STA disconnect, for the targets
which expose this corresponding wiphy flag, in case of a target HW
restart. After this there will be a fresh connection, thereby
avoiding the dropping of frames by the remote peer.

The right fix would be to pull the entire data path into the host,
which is not feasible or would need lots of complex changes and
will still be inefficient.

Tested on ath10k using WCN3990, QCA6174

Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org>
---
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/mlme.c        |  9 +++++++++
 net/mac80211/util.c        | 22 +++++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)