
io_uring: fix early sqd_list removal sqpoll hangs

Message ID: 1592cc2b0418a0512c83898dbef0b1c9722e8645.1618310545.git.asml.silence@gmail.com
State: Accepted
Commit: c7d95613c7d6e003969722a290397b8271bdad17

Commit Message

Pavel Begunkov April 13, 2021, 10:43 a.m. UTC
[  245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
[  245.463334] task:iou-sqp-1374    state:D flags:0x00004000
[  245.463345] Call Trace:
[  245.463352]  __schedule+0x36b/0x950
[  245.463376]  schedule+0x68/0xe0
[  245.463385]  __io_uring_cancel+0xfb/0x1a0
[  245.463407]  do_exit+0xc0/0xb40
[  245.463423]  io_sq_thread+0x49b/0x710
[  245.463445]  ret_from_fork+0x22/0x30

It happens when the sqpoll thread forgets to run park_task_work and goes
to exit: the exiting user may then remove the ctx from sqd_list, so the
corresponding io_sq_thread() -> io_uring_cancel_sqpoll() won't be
executed. Hopefully, it just gets stuck in do_exit() in this case.

Cc: stable@vger.kernel.org
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Pavel Begunkov April 14, 2021, 10:46 a.m. UTC | #1
On 13/04/2021 11:43, Pavel Begunkov wrote:
> [  245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
> [  245.463334] task:iou-sqp-1374    state:D flags:0x00004000
> [  245.463345] Call Trace:
> [  245.463352]  __schedule+0x36b/0x950
> [  245.463376]  schedule+0x68/0xe0
> [  245.463385]  __io_uring_cancel+0xfb/0x1a0
> [  245.463407]  do_exit+0xc0/0xb40
> [  245.463423]  io_sq_thread+0x49b/0x710
> [  245.463445]  ret_from_fork+0x22/0x30
>
> It happens when the sqpoll thread forgets to run park_task_work and goes
> to exit: the exiting user may then remove the ctx from sqd_list, so the
> corresponding io_sq_thread() -> io_uring_cancel_sqpoll() won't be
> executed. Hopefully, it just gets stuck in do_exit() in this case.


FWIW, this is actually a 5.12 problem, and I have a reasonably reliable
way to reproduce it.


> Cc: stable@vger.kernel.org
> Reported-by: Joakim Hassila <joj@mac.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/io_uring.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index cadd7a65a7f4..f390914666b1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6817,6 +6817,9 @@ static int io_sq_thread(void *data)
>  	current->flags |= PF_NO_SETAFFINITY;
>  
>  	mutex_lock(&sqd->lock);
> +	/* a user may have exited before the thread started */
> +	io_run_task_work_head(&sqd->park_task_work);
> +
>  	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
>  		int ret;
>  		bool cap_entries, sqt_spin, needs_sched;
> @@ -6833,10 +6836,10 @@ static int io_sq_thread(void *data)
>  			}
>  			cond_resched();
>  			mutex_lock(&sqd->lock);
> -			if (did_sig)
> -				break;
>  			io_run_task_work();
>  			io_run_task_work_head(&sqd->park_task_work);
> +			if (did_sig)
> +				break;
>  			timeout = jiffies + sqd->sq_thread_idle;
>  			continue;
>  		}
>


-- 
Pavel Begunkov

Jens Axboe April 14, 2021, 4:19 p.m. UTC | #2
On 4/13/21 4:43 AM, Pavel Begunkov wrote:
> [  245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
> [  245.463334] task:iou-sqp-1374    state:D flags:0x00004000
> [  245.463345] Call Trace:
> [  245.463352]  __schedule+0x36b/0x950
> [  245.463376]  schedule+0x68/0xe0
> [  245.463385]  __io_uring_cancel+0xfb/0x1a0
> [  245.463407]  do_exit+0xc0/0xb40
> [  245.463423]  io_sq_thread+0x49b/0x710
> [  245.463445]  ret_from_fork+0x22/0x30
>
> It happens when the sqpoll thread forgets to run park_task_work and goes
> to exit: the exiting user may then remove the ctx from sqd_list, so the
> corresponding io_sq_thread() -> io_uring_cancel_sqpoll() won't be
> executed. Hopefully, it just gets stuck in do_exit() in this case.


Added for 5.12, thanks.

-- 
Jens Axboe

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index cadd7a65a7f4..f390914666b1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6817,6 +6817,9 @@ static int io_sq_thread(void *data)
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
+	/* a user may have exited before the thread started */
+	io_run_task_work_head(&sqd->park_task_work);
+
 	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
 		int ret;
 		bool cap_entries, sqt_spin, needs_sched;
@@ -6833,10 +6836,10 @@ static int io_sq_thread(void *data)
 			}
 			cond_resched();
 			mutex_lock(&sqd->lock);
-			if (did_sig)
-				break;
 			io_run_task_work();
 			io_run_task_work_head(&sqd->park_task_work);
+			if (did_sig)
+				break;
 			timeout = jiffies + sqd->sq_thread_idle;
 			continue;
 		}
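
To illustrate why the second hunk moves the did_sig check below the
task-work calls, here is a minimal userspace sketch in C. It is a
simplified model under stated assumptions, not the kernel code:
struct work_item, struct sq_model, queue_work(), run_park_work() and
parked_branch() are hypothetical names invented for this example, and
run_park_work() only loosely stands in for
io_run_task_work_head(&sqd->park_task_work). In the old order a signalled
thread leaves the loop while work queued by an exiting user is still
pending; in the new order the list is drained first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued park_task_work entry. */
struct work_item {
	struct work_item *next;
	bool done;
};

/* Hypothetical stand-in for the sqpoll data: only the pending-work list. */
struct sq_model {
	struct work_item *park_work;
};

/* Queue an item, as an exiting user would before waiting for it to run. */
static void queue_work(struct sq_model *sqd, struct work_item *w)
{
	assert(w != NULL);
	w->done = false;
	w->next = sqd->park_work;
	sqd->park_work = w;
}

/* Drain the whole list and mark each item done; returns how many ran. */
static int run_park_work(struct sq_model *sqd)
{
	int n = 0;

	while (sqd->park_work) {
		struct work_item *w = sqd->park_work;

		sqd->park_work = w->next;
		w->done = true;
		n++;
	}
	return n;
}

/*
 * One pass through the modelled branch. Returns true if the thread
 * leaves its main loop. drain_first == false models the old order
 * (check did_sig, then run work); true models the patched order.
 */
static bool parked_branch(struct sq_model *sqd, bool did_sig, bool drain_first)
{
	if (!drain_first && did_sig)
		return true;	/* old order: exit with work still queued */
	run_park_work(sqd);
	return did_sig;		/* new order: exit only after draining */
}
```

Calling parked_branch() with did_sig set and drain_first false leaves a
queued item on the list with done still false, which corresponds to the
waiter that never gets woken. With drain_first true the same call marks
the item done before reporting that the thread should exit.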