[v2] nvme-fc: fix racing controller reset and create association

Message ID 20210309005126.58460-1-jsmart2021@gmail.com
State New
Series [v2] nvme-fc: fix racing controller reset and create association

Commit Message

James Smart March 9, 2021, 12:51 a.m. UTC
Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in
interrupt context results in a possible race condition. A controller
reset results in errored io completions, which schedules error
work. The change of error work to a work element allows it to fire
after the ctrl state transition to NVME_CTRL_CONNECTING, causing
any outstanding io (used to initialize the controller) to fail and
cause problems for connect_work.

Add a state check to only schedule error work if not in the RESETTING
state.

Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Cc: <stable@vger.kernel.org> # v5.10+

Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>

---
v2: clean up typo in commit header
---
 drivers/nvme/host/fc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ewan Milne March 11, 2021, 5:33 p.m. UTC | #1
On Mon, 2021-03-08 at 16:51 -0800, James Smart wrote:
> Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in
> interrupt context results in a possible race condition. A controller
> reset results in errored io completions, which schedules error
> work. The change of error work to a work element allows it to fire
> after the ctrl state transition to NVME_CTRL_CONNECTING, causing
> any outstanding io (used to initialize the controller) to fail and
> cause problems for connect_work.
> 
> Add a state check to only schedule error work if not in the RESETTING
> state.
> 
> Fixes: 19fce0470f05 ("nvme-fc: avoid calling
> _nvme_fc_abort_outstanding_ios from interrupt context")
> Cc: <stable@vger.kernel.org> # v5.10+
> 
> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> 
> ---
> v2: clean up typo in commit header
> ---
>  drivers/nvme/host/fc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 20dadd86e981..0f92bd12123e 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2055,7 +2055,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
>  		nvme_fc_complete_rq(rq);
>  
>  check_error:
> -	if (terminate_assoc)
> +	if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING)
>  		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
>  }
>  

This fix resolves the frequent -EBUSY / -ENETRESET errors I saw when
resetting the controller via sysfs, as well as the eventual hang with
the controller stuck in the _CONNECTING state, thanks.  Looks good.

-Ewan
Patch

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 20dadd86e981..0f92bd12123e 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2055,7 +2055,7 @@  nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 		nvme_fc_complete_rq(rq);
 
 check_error:
-	if (terminate_assoc)
+	if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING)
 		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
 }
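
For readers who want to see the patched condition in isolation, here is a
minimal user-space sketch of the check at check_error. It is not driver code:
the enum and the function names (model_ctrl_state, model_should_queue_ioerr,
model_example) are hypothetical stand-ins invented for illustration, and the
sketch models only the boolean decision, not the workqueue or the race itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for a few controller states. These are NOT the
 * kernel's enum nvme_ctrl_state; they exist only for this illustration.
 */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
};

/*
 * Mirrors the condition the patch installs at check_error: error work is
 * queued only when the association should be terminated and the controller
 * is not in the RESETTING state.
 */
static bool model_should_queue_ioerr(bool terminate_assoc,
				     enum model_ctrl_state state)
{
	return terminate_assoc && state != MODEL_CTRL_RESETTING;
}

/* Walks through the cases the condition distinguishes. */
static void model_example(void)
{
	/* Errored completion during a reset: no error work is queued. */
	assert(!model_should_queue_ioerr(true, MODEL_CTRL_RESETTING));

	/* Errored completion while live: error work is queued as before. */
	assert(model_should_queue_ioerr(true, MODEL_CTRL_LIVE));

	/* No termination requested: nothing is queued in any state. */
	assert(!model_should_queue_ioerr(false, MODEL_CTRL_LIVE));
}
```

Note that the check filters only on RESETTING; in this model a completion that
requests termination in any other state, including CONNECTING, still queues
error work, which matches the one-line condition in the diff.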