Message ID | 20210403232333.212927-4-michael.christie@oracle.com |
---|---|
State | Superseded |
Headers | show |
Series | iscsi lock and refcount fix ups | expand |
On 4/3/21 4:22 PM, Mike Christie wrote: > The patch: > > commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock") > > added an extra task->state because for > > commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix > list corruption regression") > > we didn't know why we ended up with cmds on the list and thought it > might have been a bad target sending a response while we were still > sending the cmd. It turns out the bug was just a race in libiscsi/ > libiscsi_tcp where > > 1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue. > 2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t. > It dequeues it and iscsi_tcp_task_xmit starts to process it. > 3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the > requeue list. > 4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of > data, so the cmd is done. > 5. target sends the response. > 6. On a different CPU from #3, iscsi_complete_task processes the response. > Since there was no common lock for the list, the lists/tasks pointers are > not fully in sync, so could end up with list corruption. > > Since it was just a race on our side, this patch removes the extra check. > > Signed-off-by: Mike Christie <michael.christie@oracle.com> > --- > drivers/scsi/libiscsi.c | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 643edc4eb6fe..94cb9410230a 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -562,16 +562,19 @@ static bool cleanup_queued_task(struct iscsi_task *task) > struct iscsi_conn *conn = task->conn; > bool early_complete = false; > > - /* Bad target might have completed task while it was still running */ > + /* > + * We might have raced where we handled a R2T early and got a response > + * but have not yet taken the task off the requeue list, then a TMF or > + * recovery happened and so we can still see it here. > + */ > if (task->state == ISCSI_TASK_COMPLETED) > early_complete = true; > > if (!list_empty(&task->running)) { > list_del_init(&task->running); > /* > - * If it's on a list but still running, this could be from > - * a bad target sending a rsp early, cleanup from a TMF, or > - * session recovery. > + * If it's on a list but still running this could be cleanup > + * from a TMF or session recovery. > */ > if (task->state == ISCSI_TASK_RUNNING || > task->state == ISCSI_TASK_COMPLETED) > @@ -1470,7 +1473,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task, > } > /* regular RX path uses back_lock */ > spin_lock(&conn->session->back_lock); > - if (rc && task->state == ISCSI_TASK_RUNNING) { > + if (rc) { > /* > * get an extra ref that is released next time we access it > * as conn->task above. > Reviewed-by: Lee Duncan <lduncan@suse.com>
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 643edc4eb6fe..94cb9410230a 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -562,16 +562,19 @@ static bool cleanup_queued_task(struct iscsi_task *task) struct iscsi_conn *conn = task->conn; bool early_complete = false; - /* Bad target might have completed task while it was still running */ + /* + * We might have raced where we handled a R2T early and got a response + * but have not yet taken the task off the requeue list, then a TMF or + * recovery happened and so we can still see it here. + */ if (task->state == ISCSI_TASK_COMPLETED) early_complete = true; if (!list_empty(&task->running)) { list_del_init(&task->running); /* - * If it's on a list but still running, this could be from - * a bad target sending a rsp early, cleanup from a TMF, or - * session recovery. + * If it's on a list but still running this could be cleanup + * from a TMF or session recovery. */ if (task->state == ISCSI_TASK_RUNNING || task->state == ISCSI_TASK_COMPLETED) @@ -1470,7 +1473,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task, } /* regular RX path uses back_lock */ spin_lock(&conn->session->back_lock); - if (rc && task->state == ISCSI_TASK_RUNNING) { + if (rc) { /* * get an extra ref that is released next time we access it * as conn->task above.
The patch: commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock") added an extra task->state because for commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix list corruption regression") we didn't know why we ended up with cmds on the list and thought it might have been a bad target sending a response while we were still sending the cmd. It turns out the bug was just a race in libiscsi/ libiscsi_tcp where 1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue. 2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t. It dequeues it and iscsi_tcp_task_xmit starts to process it. 3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the requeue list. 4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of data, so the cmd is done. 5. target sends the response. 6. On a different CPU from #3, iscsi_complete_task processes the response. Since there was no common lock for the list, the lists/tasks pointers are not fully in sync, so could end up with list corruption. Since it was just a race on our side, this patch removes the extra check. Signed-off-by: Mike Christie <michael.christie@oracle.com> --- drivers/scsi/libiscsi.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)