Message ID | 20230609103651.313194-1-treapking@chromium.org |
---|---|
State | New |
Series | wifi: mwifiex: Replace RX workqueues with kthreads |
Hi,

Thanks Pin-yen for most of the investigation here and for pushing the
patch. With some additional information though, I might suggest *not*
landing this patch at the moment. More details appended:

On Sat, Jun 10, 2023 at 01:41:51AM +0800, Pin-yen Lin wrote:
> I realized that I might have over-simplified the background and the
> impact of this patch...
>
> The short answer to the question is that the throughput improved from
> 100 Mbps to 180 Mbps. The test was run on ChromeOS's v5.15 kernel
> fork. A more detailed test setup is described in [1].
>
> However, the throughput of the same test case on our v4.19 kernel is
> 320 Mbps. That is, we observed a 320 Mbps --> 100 Mbps regression when
> we tried to update the kernel version. This patch is more of a
> mitigation for the regression: it improves the throughput, even though
> the result is still not as good as on the older kernel.
>
> That being said, this patch does improve the throughput, so we think
> it can be landed in the mainline kernel.
>
> Best regards,
> Pin-yen
>
> [1]: https://lore.kernel.org/all/ZFvpJb9Dh0FCkLQA@google.com/

I was (we were?) optimistic this would be an improvement (or at least no
worse) due to some of the reasoning at [1]. And the work here is just a
single work item, queued repeatedly to the same unbound workqueue, so
conceptually it shouldn't be much different from a kthread_worker,
except for scheduler details -- where, again, we'd expect an
improvement, since the scheduler would now better track load for the
task (mwifiex RX) in question.

But additional testing on other mwifiex-based systems (RK3399 + PCIE
8997) showed the inverse: throughput drops on similar benchmarks, from
110 Mbps to 80 Mbps. (Frankly, both numbers are significantly below
where we'd like them...)

Considering we still don't have a full explanation for the performance
differences we've been seeing (on either test platform), and that at
least one of our platforms showed a (smaller) regression, I think we
should do more research before committing to this.

Brian
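For readers following the thread, the two mechanisms Brian compares look
roughly as follows. This is a minimal sketch of the two kernel APIs, not
code from the driver; the identifiers (rx_wq, rx_fn, "example_rx", and
so on) are placeholders.

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>

static struct workqueue_struct *rx_wq;
static struct work_struct rx_work;
static struct kthread_worker *rx_worker;
static struct kthread_work rx_kwork;

/* Pattern A: the work item runs on shared kworker threads drawn from a
 * pool; the scheduler only sees generic kworker activity. */
static void rx_fn(struct work_struct *work)
{
	/* RX processing here */
}

/* Pattern B: the work item runs on one dedicated, named kthread, so the
 * scheduler can track the RX load as a single task. */
static void rx_kfn(struct kthread_work *work)
{
	/* RX processing here */
}

static int rx_setup(void)
{
	/* A: dedicated unbound, high-priority workqueue, max_active = 1 */
	rx_wq = alloc_workqueue("example_rx",
				WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
	if (!rx_wq)
		return -ENOMEM;
	INIT_WORK(&rx_work, rx_fn);
	queue_work(rx_wq, &rx_work);

	/* B: dedicated kthread_worker hosting the same kind of work item */
	rx_worker = kthread_create_worker(0, "example_rx");
	if (IS_ERR(rx_worker)) {
		destroy_workqueue(rx_wq);
		return PTR_ERR(rx_worker);
	}
	kthread_init_work(&rx_kwork, rx_kfn);
	kthread_queue_work(rx_worker, &rx_kwork);
	return 0;
}

With max_active = 1 and a single work item, pattern A already serializes
all RX processing, which is why the conversion to pattern B was expected
to be behavior-neutral apart from how the scheduler accounts for the
work.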
Brian Norris <briannorris@chromium.org> writes:

[...]

> Considering we still don't have a full explanation for the performance
> differences we've been seeing (on either test platform), and that at
> least one of our platforms showed a (smaller) regression, I think we
> should do more research before committing to this.

Yeah, I agree, and I'll drop this. This is a really weird problem; I
hope you can get to the bottom of it.
diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
index 391793a16adc..431f6dc61f5b 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
@@ -198,7 +198,7 @@ mwifiex_del_rx_reorder_entry(struct mwifiex_private *priv,
 	priv->adapter->rx_locked = true;
 	if (priv->adapter->rx_processing) {
 		spin_unlock_bh(&priv->adapter->rx_proc_lock);
-		flush_workqueue(priv->adapter->rx_workqueue);
+		kthread_flush_worker(priv->adapter->rx_thread);
 	} else {
 		spin_unlock_bh(&priv->adapter->rx_proc_lock);
 	}
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index bcd564dc3554..f985bff4e52c 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -872,7 +872,7 @@ static int mwifiex_deinit_priv_params(struct mwifiex_private *priv)
 	adapter->rx_locked = true;
 	if (adapter->rx_processing) {
 		spin_unlock_bh(&adapter->rx_proc_lock);
-		flush_workqueue(adapter->rx_workqueue);
+		kthread_flush_worker(adapter->rx_thread);
 	} else {
 		spin_unlock_bh(&adapter->rx_proc_lock);
 	}
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ea22a08e6c08..bab963f3db83 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -168,7 +168,7 @@ static void mwifiex_queue_rx_work(struct mwifiex_adapter *adapter)
 		spin_unlock_bh(&adapter->rx_proc_lock);
 	} else {
 		spin_unlock_bh(&adapter->rx_proc_lock);
-		queue_work(adapter->rx_workqueue, &adapter->rx_work);
+		kthread_queue_work(adapter->rx_thread, &adapter->rx_work);
 	}
 }
@@ -526,9 +526,10 @@ static void mwifiex_terminate_workqueue(struct mwifiex_adapter *adapter)
 		adapter->workqueue = NULL;
 	}
-	if (adapter->rx_workqueue) {
-		destroy_workqueue(adapter->rx_workqueue);
-		adapter->rx_workqueue = NULL;
+	if (adapter->rx_thread) {
+		kthread_flush_worker(adapter->rx_thread);
+		kthread_destroy_worker(adapter->rx_thread);
+		adapter->rx_thread = NULL;
 	}
 }
@@ -1394,7 +1395,7 @@ int is_command_pending(struct mwifiex_adapter *adapter)
  *
  * It handles the RX operations.
  */
-static void mwifiex_rx_work_queue(struct work_struct *work)
+static void mwifiex_rx_work_queue(struct kthread_work *work)
 {
 	struct mwifiex_adapter *adapter =
 		container_of(work, struct mwifiex_adapter, rx_work);
@@ -1554,13 +1555,10 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
 	INIT_WORK(&adapter->main_work, mwifiex_main_work_queue);
 
 	if (adapter->rx_work_enabled) {
-		adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
-							WQ_HIGHPRI |
-							WQ_MEM_RECLAIM |
-							WQ_UNBOUND, 1);
-		if (!adapter->rx_workqueue)
+		adapter->rx_thread = kthread_create_worker(0, "MWIFIEX_RX");
+		if (IS_ERR(adapter->rx_thread))
 			goto err_kmalloc;
-		INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
+		kthread_init_work(&adapter->rx_work, mwifiex_rx_work_queue);
 	}
 
 	/* Register the device. Fill up the private data structure with
@@ -1709,14 +1707,11 @@ mwifiex_add_card(void *card, struct completion *fw_done,
 	INIT_WORK(&adapter->main_work, mwifiex_main_work_queue);
 
 	if (adapter->rx_work_enabled) {
-		adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
-							WQ_HIGHPRI |
-							WQ_MEM_RECLAIM |
-							WQ_UNBOUND, 1);
-		if (!adapter->rx_workqueue)
+		adapter->rx_thread = kthread_create_worker(0, "MWIFIEX_RX");
+		if (IS_ERR(adapter->rx_thread))
 			goto err_kmalloc;
-		INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
+		kthread_init_work(&adapter->rx_work, mwifiex_rx_work_queue);
 	}
 
 	/* Register the device. Fill up the private data structure with relevant
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index b95886e1413e..3255f9a5c2d4 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -32,6 +32,7 @@
 #include <linux/gfp.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/kthread.h>
 #include <linux/of_gpio.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
@@ -886,8 +887,8 @@ struct mwifiex_adapter {
 	atomic_t tx_hw_pending;
 	struct workqueue_struct *workqueue;
 	struct work_struct main_work;
-	struct workqueue_struct *rx_workqueue;
-	struct work_struct rx_work;
+	struct kthread_worker *rx_thread;
+	struct kthread_work rx_work;
 	struct workqueue_struct *dfs_workqueue;
 	struct work_struct dfs_work;
 	bool rx_work_enabled;
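One kernel API detail relevant to the hunks above: kthread_create_worker()
reports failure through ERR_PTR(), never NULL, so its return value must be
checked with IS_ERR() rather than a NULL test. A minimal sketch of the
correct error handling; the helper name start_rx_worker() is hypothetical,
not part of the driver:

#include <linux/err.h>
#include <linux/kthread.h>

static int start_rx_worker(struct mwifiex_adapter *adapter)
{
	adapter->rx_thread = kthread_create_worker(0, "MWIFIEX_RX");
	if (IS_ERR(adapter->rx_thread)) {
		/* A NULL check here would never catch the failure. */
		int err = PTR_ERR(adapter->rx_thread);

		adapter->rx_thread = NULL;
		return err;
	}
	return 0;
}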
This improves the RX throughput, likely due to better CPU locality for
the workloads. We tested this patch on MediaTek MT8173 Chromebooks, and
it shows a ~80% RX throughput improvement in high data rate test cases.

Signed-off-by: Pin-yen Lin <treapking@chromium.org>
---
 .../wireless/marvell/mwifiex/11n_rxreorder.c |  2 +-
 .../net/wireless/marvell/mwifiex/cfg80211.c  |  2 +-
 drivers/net/wireless/marvell/mwifiex/main.c  | 29 ++++++++-----------
 drivers/net/wireless/marvell/mwifiex/main.h  |  5 ++--
 4 files changed, 17 insertions(+), 21 deletions(-)