
[wireless] ar5523: Fix deadlock bugs caused by cancel_work_sync in ar5523_stop

Message ID 20220522133055.96405-1-duoming@zju.edu.cn
State New

Commit Message

Duoming Zhou May 22, 2022, 1:30 p.m. UTC
If a work item is running, cancel_work_sync() in ar5523_stop() does not
return until that work item has finished. ar5523_stop() calls
cancel_work_sync() while holding ar->mutex, but work items such as
ar5523_tx_wd_work() and ar5523_tx_work() also need to take ar->mutex.
As a result, ar5523_stop() can block forever. One of the race conditions
is shown below:

    (Thread 1)             |   (Thread 2)
ar5523_stop                |
  mutex_lock(&ar->mutex)   | ar5523_tx_wd_work
                           |   mutex_lock(&ar->mutex)
  cancel_work_sync         |   ...

This patch moves the cancel_work_sync() calls out of the section protected
by ar->mutex in order to avoid this deadlock.

Fixes: b7d572e1871d ("ar5523: Add new driver")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/net/wireless/ath/ar5523/ar5523.c | 2 ++
 1 file changed, 2 insertions(+)
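
The lock ordering described in the commit message can be illustrated with a
small userspace sketch. This is only an analogy and not driver code:
pthread_join() stands in for cancel_work_sync(), and the hypothetical
worker() and stop_without_deadlock() stand in for a work item such as
ar5523_tx_wd_work() and for ar5523_stop(). It shows the ordering the patch
proposes, where the lock is dropped before waiting for the worker.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int work_done;

/* Stand-in for a work item that needs the same lock as the stop path. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	work_done = 1;
	pthread_mutex_unlock(&lock);
	return NULL;
}

/*
 * Stand-in for the stop path. Returns 1 if the worker ran to completion
 * before this function returned, 0 if the worker could not be started.
 */
static int stop_without_deadlock(void)
{
	pthread_t t;
	int done;

	pthread_mutex_lock(&lock);
	if (pthread_create(&t, NULL, worker, NULL) != 0) {
		pthread_mutex_unlock(&lock);
		return 0;
	}

	/*
	 * Calling pthread_join() at this point, with the lock still held,
	 * would never return: worker() is blocked on the lock we hold.
	 * That is the ordering the commit message describes.
	 */
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);		/* wait only after dropping the lock */

	pthread_mutex_lock(&lock);
	done = work_done;
	pthread_mutex_unlock(&lock);
	return done;
}
```

Built with -pthread and called from main(), stop_without_deadlock() is
expected to return 1. The sketch does not show whether state protected by
the lock stays consistent while the lock is dropped, which is a separate
question for the real driver.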

Comments

Kalle Valo May 30, 2022, 11:24 a.m. UTC | #1
Duoming Zhou <duoming@zju.edu.cn> writes:

> If the work item is running, the cancel_work_sync in ar5523_stop will
> not return until work item is finished. If we hold mutex_lock and use
> cancel_work_sync to wait the work item to finish, the work item such as
> ar5523_tx_wd_work and ar5523_tx_work also require mutex_lock. As a result,
> the ar5523_stop will be blocked forever. One of the race conditions is
> shown below:
>
>     (Thread 1)             |   (Thread 2)
> ar5523_stop                |
>   mutex_lock(&ar->mutex)   | ar5523_tx_wd_work
>                            |   mutex_lock(&ar->mutex)
>   cancel_work_sync         |   ...
>
> This patch moves cancel_work_sync out of mutex_lock in order to mitigate
> deadlock bugs.
>
> Fixes: b7d572e1871d ("ar5523: Add new driver")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>

I assume you found this with a static checker tool; it would be good to
document which tool you are using. And if you have not tested this with
real hardware, clearly mention that with "Compile tested only".

> ---
>  drivers/net/wireless/ath/ar5523/ar5523.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
> index 9cabd342d15..99d6b13ffcf 100644
> --- a/drivers/net/wireless/ath/ar5523/ar5523.c
> +++ b/drivers/net/wireless/ath/ar5523/ar5523.c
> @@ -1071,8 +1071,10 @@ static void ar5523_stop(struct ieee80211_hw *hw)
>  	ar5523_cmd_write(ar, WDCMSG_TARGET_STOP, NULL, 0, 0);
>  
>  	del_timer_sync(&ar->tx_wd_timer);
> +	mutex_unlock(&ar->mutex);
>  	cancel_work_sync(&ar->tx_wd_work);
>  	cancel_work_sync(&ar->rx_refill_work);
> +	mutex_lock(&ar->mutex);
>  	ar5523_cancel_rx_bufs(ar);
>  	mutex_unlock(&ar->mutex);
>  }

Releasing a lock and taking it again looks like a hack to me. Please
test with a real device and try to find a better solution.

Duoming Zhou May 31, 2022, 7:50 a.m. UTC | #2
Hello,

On Mon, 30 May 2022 14:24:04 +0300 Kalle Valo wrote:

> Duoming Zhou <duoming@zju.edu.cn> writes:
> 
> > If the work item is running, the cancel_work_sync in ar5523_stop will
> > not return until work item is finished. If we hold mutex_lock and use
> > cancel_work_sync to wait the work item to finish, the work item such as
> > ar5523_tx_wd_work and ar5523_tx_work also require mutex_lock. As a result,
> > the ar5523_stop will be blocked forever. One of the race conditions is
> > shown below:
> >
> >     (Thread 1)             |   (Thread 2)
> > ar5523_stop                |
> >   mutex_lock(&ar->mutex)   | ar5523_tx_wd_work
> >                            |   mutex_lock(&ar->mutex)
> >   cancel_work_sync         |   ...
> >
> > This patch moves cancel_work_sync out of mutex_lock in order to mitigate
> > deadlock bugs.
> >
> > Fixes: b7d572e1871d ("ar5523: Add new driver")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> 
> I assume you have found this with a static checker tool, it would be
> good document what tool you are using. And if you have not tested this
> with real hardware clearly mention that with "Compile tested only".
> 
> > ---
> >  drivers/net/wireless/ath/ar5523/ar5523.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
> > index 9cabd342d15..99d6b13ffcf 100644
> > --- a/drivers/net/wireless/ath/ar5523/ar5523.c
> > +++ b/drivers/net/wireless/ath/ar5523/ar5523.c
> > @@ -1071,8 +1071,10 @@ static void ar5523_stop(struct ieee80211_hw *hw)
> >  	ar5523_cmd_write(ar, WDCMSG_TARGET_STOP, NULL, 0, 0);
> >  
> >  	del_timer_sync(&ar->tx_wd_timer);
> > +	mutex_unlock(&ar->mutex);
> >  	cancel_work_sync(&ar->tx_wd_work);
> >  	cancel_work_sync(&ar->rx_refill_work);
> > +	mutex_lock(&ar->mutex);
> >  	ar5523_cancel_rx_bufs(ar);
> >  	mutex_unlock(&ar->mutex);
> >  }
> 
> Releasing a lock and taking it again looks like a hack to me. Please
> test with a real device and try to find a better solution.

The following is a new solution:

diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
index 9cabd342d15..8adae85fcb9 100644
--- a/drivers/net/wireless/ath/ar5523/ar5523.c
+++ b/drivers/net/wireless/ath/ar5523/ar5523.c
@@ -910,7 +910,11 @@ static void ar5523_tx_wd_work(struct work_struct *work)
         * recover seems to be to reset the dongle.
         */

-       mutex_lock(&ar->mutex);
+       if (!mutex_trylock(&ar->mutex)) {
+               if (test_bit(AR5523_HW_UP, &ar->flags))
+                       ieee80211_queue_work(ar->hw, &ar->tx_wd_work);
+               return;
+       }
        ar5523_err(ar, "TX queue stuck (tot %d pend %d)\n",
                   atomic_read(&ar->tx_nr_total),
                   atomic_read(&ar->tx_nr_pending));

If ar5523_stop() has acquired the "ar->mutex" lock, mutex_trylock() fails and
ar5523_tx_wd_work() returns directly instead of blocking. If the "ar->mutex" lock
has been acquired by a function other than ar5523_stop(), so that AR5523_HW_UP is
still set, ar5523_tx_wd_work() re-queues itself and tries again later.

So this solution could avoid the deadlock between ar5523_stop() and ar5523_tx_wd_work().
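
The try-lock-or-requeue logic described above can be sketched in userspace
as follows. This is an illustration under assumptions, not the driver code:
struct fake_dev, watchdog_work() and the wd_result values are hypothetical
names, hw_up stands in for the AR5523_HW_UP flag, and the requeued counter
stands in for a call to ieee80211_queue_work(). It also assumes that the stop
path clears the flag while holding the lock.

```c
#include <pthread.h>
#include <stdbool.h>

enum wd_result { WD_RAN, WD_REQUEUED, WD_DROPPED };

struct fake_dev {
	pthread_mutex_t mutex;	/* stands in for ar->mutex */
	bool hw_up;		/* stands in for the AR5523_HW_UP flag */
	int requeued;		/* counts simulated re-queue requests */
};

/*
 * Never blocks on the lock. Note the return conventions differ:
 * pthread_mutex_trylock() returns 0 on success, while the kernel's
 * mutex_trylock() returns 1 on success.
 */
static enum wd_result watchdog_work(struct fake_dev *dev)
{
	if (pthread_mutex_trylock(&dev->mutex) != 0) {
		/* Lock is busy: retry later only if the device is still up. */
		if (dev->hw_up) {
			dev->requeued++;
			return WD_REQUEUED;
		}
		/* Device is going down: give up so the stop path can finish. */
		return WD_DROPPED;
	}

	/* ... recovery work would run here with the lock held ... */

	pthread_mutex_unlock(&dev->mutex);
	return WD_RAN;
}
```

With the lock free, watchdog_work() returns WD_RAN. With the lock held and
hw_up set, it returns WD_REQUEUED. With the lock held and hw_up clear, it
returns WD_DROPPED. One cost of this design is that the work polls for as
long as another holder keeps the lock.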

Best regards,
Duoming Zhou

Patch

diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
index 9cabd342d15..99d6b13ffcf 100644
--- a/drivers/net/wireless/ath/ar5523/ar5523.c
+++ b/drivers/net/wireless/ath/ar5523/ar5523.c
@@ -1071,8 +1071,10 @@  static void ar5523_stop(struct ieee80211_hw *hw)
 	ar5523_cmd_write(ar, WDCMSG_TARGET_STOP, NULL, 0, 0);
 
 	del_timer_sync(&ar->tx_wd_timer);
+	mutex_unlock(&ar->mutex);
 	cancel_work_sync(&ar->tx_wd_work);
 	cancel_work_sync(&ar->rx_refill_work);
+	mutex_lock(&ar->mutex);
 	ar5523_cancel_rx_bufs(ar);
 	mutex_unlock(&ar->mutex);
 }