Message ID | 20210716205220.1101150-1-tyreld@linux.ibm.com |
---|---|
State | Accepted |
Commit | a264cf5e81c78e2b9918b8b9ef2ace9dde1850df |
Headers | show |
Series | ibmvfc: fix command state accounting and stale response detection | expand |
Tyrel Datwyler <tyreld@linux.ibm.com> a écrit : > Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside > the host/queue lock") responses to commands were completed sequentially > with the host lock held such that a command had a basic binary state of > active or free. It was therefore a simple affair of ensuring the > assocaiated ibmvfc_event to a VIOS response was valid by testing that it > was not already free. The lock relexation work to complete commands > outside the lock inadverdently made it a trinary command state such that > a command is either in flight, received and being completed, or > completed and now free. This breaks the stale command detection logic as > a command may be still marked active and been placed on the delayed > completion list when a second stale response for the same command > arrives. This can lead to double completions and list corruption. This > issue was exposed by a recent VIOS regression were a missing memory > barrier could occasionally result in the ibmvfc client receiveing a > duplicate response for the same command. > > Fix the issue by introducing the atomic ibmvfc_event.active to track the > trinary state of a command. The state is explicitly set to 1 when a > command is successfully sent. The CRQ response handlers use > atomic_dec_if_positive() to test for stale responses and correctly > transition to the completion state when a active command is received. > Finally, atomic_dec_and_test() is used to sanity check transistions > when commands are freed as a result of a completion, or moved to the > purge list as a result of error handling or adapter reset. > > Cc: stable@vger.kernel.org > Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the > host/queue lock") > Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> > --- > drivers/scsi/ibmvscsi/ibmvfc.c | 19 +++++++++++++++++-- > drivers/scsi/ibmvscsi/ibmvfc.h | 1 + > 2 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c > index bee1bec49c09..935b01ee44b7 100644 > --- a/drivers/scsi/ibmvscsi/ibmvfc.c > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c > @@ -807,6 +807,13 @@ static int ibmvfc_init_event_pool(struct > ibmvfc_host *vhost, > for (i = 0; i < size; ++i) { > struct ibmvfc_event *evt = &pool->events[i]; > > + /* > + * evt->active states > + * 1 = in flight > + * 0 = being completed > + * -1 = free/freed > + */ > + atomic_set(&evt->active, -1); > atomic_set(&evt->free, 1); > evt->crq.valid = 0x80; > evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); > @@ -1017,6 +1024,7 @@ static void ibmvfc_free_event(struct ibmvfc_event *evt) > > BUG_ON(!ibmvfc_valid_event(pool, evt)); > BUG_ON(atomic_inc_return(&evt->free) != 1); > + BUG_ON(atomic_dec_and_test(&evt->active)); Avoid new BUG_ONs. See https://www.kernel.org/doc/html/latest/process/deprecated.html > > spin_lock_irqsave(&evt->queue->l_lock, flags); > list_add_tail(&evt->queue_list, &evt->queue->free); > @@ -1072,6 +1080,12 @@ static void ibmvfc_complete_purge(struct > list_head *purge_list) > **/ > static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code) > { > + /* > + * Anything we are failing should still be active. Otherwise, it > + * implies we already got a response for the command and are doing > + * something bad like double completing it. > + */ > + BUG_ON(!atomic_dec_and_test(&evt->active)); Same
On Fri, 16 Jul 2021 14:52:20 -0600, Tyrel Datwyler wrote: > Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside > the host/queue lock") responses to commands were completed sequentially > with the host lock held such that a command had a basic binary state of > active or free. It was therefore a simple affair of ensuring the > assocaiated ibmvfc_event to a VIOS response was valid by testing that it > was not already free. The lock relexation work to complete commands > outside the lock inadverdently made it a trinary command state such that > a command is either in flight, received and being completed, or > completed and now free. This breaks the stale command detection logic as > a command may be still marked active and been placed on the delayed > completion list when a second stale response for the same command > arrives. This can lead to double completions and list corruption. This > issue was exposed by a recent VIOS regression were a missing memory > barrier could occasionally result in the ibmvfc client receiveing a > duplicate response for the same command. > > [...] Applied to 5.14/scsi-fixes, thanks! [1/1] ibmvfc: fix command state accounting and stale response detection https://git.kernel.org/mkp/scsi/c/73bfdf707d01 -- Martin K. Petersen Oracle Linux Engineering
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index bee1bec49c09..935b01ee44b7 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -807,6 +807,13 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost, for (i = 0; i < size; ++i) { struct ibmvfc_event *evt = &pool->events[i]; + /* + * evt->active states + * 1 = in flight + * 0 = being completed + * -1 = free/freed + */ + atomic_set(&evt->active, -1); atomic_set(&evt->free, 1); evt->crq.valid = 0x80; evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i)); @@ -1017,6 +1024,7 @@ static void ibmvfc_free_event(struct ibmvfc_event *evt) BUG_ON(!ibmvfc_valid_event(pool, evt)); BUG_ON(atomic_inc_return(&evt->free) != 1); + BUG_ON(atomic_dec_and_test(&evt->active)); spin_lock_irqsave(&evt->queue->l_lock, flags); list_add_tail(&evt->queue_list, &evt->queue->free); @@ -1072,6 +1080,12 @@ static void ibmvfc_complete_purge(struct list_head *purge_list) **/ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code) { + /* + * Anything we are failing should still be active. Otherwise, it + * implies we already got a response for the command and are doing + * something bad like double completing it. + */ + BUG_ON(!atomic_dec_and_test(&evt->active)); if (evt->cmnd) { evt->cmnd->result = (error_code << 16); evt->done = ibmvfc_scsi_eh_done; @@ -1723,6 +1737,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt, evt->done(evt); } else { + atomic_set(&evt->active, 1); spin_unlock_irqrestore(&evt->queue->l_lock, flags); ibmvfc_trc_start(evt); } @@ -3251,7 +3266,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost, return; } - if (unlikely(atomic_read(&evt->free))) { + if (unlikely(atomic_dec_if_positive(&evt->active))) { dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", crq->ioba); return; @@ -3778,7 +3793,7 @@ static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost return; } - if (unlikely(atomic_read(&evt->free))) { + if (unlikely(atomic_dec_if_positive(&evt->active))) { dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", crq->ioba); return; diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 4f0f3baefae4..92fb889d7eb0 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -745,6 +745,7 @@ struct ibmvfc_event { struct ibmvfc_target *tgt; struct scsi_cmnd *cmnd; atomic_t free; + atomic_t active; union ibmvfc_iu *xfer_iu; void (*done)(struct ibmvfc_event *evt); void (*_done)(struct ibmvfc_event *evt);
Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the host/queue lock") responses to commands were completed sequentially with the host lock held such that a command had a basic binary state of active or free. It was therefore a simple affair of ensuring the assocaiated ibmvfc_event to a VIOS response was valid by testing that it was not already free. The lock relexation work to complete commands outside the lock inadverdently made it a trinary command state such that a command is either in flight, received and being completed, or completed and now free. This breaks the stale command detection logic as a command may be still marked active and been placed on the delayed completion list when a second stale response for the same command arrives. This can lead to double completions and list corruption. This issue was exposed by a recent VIOS regression were a missing memory barrier could occasionally result in the ibmvfc client receiveing a duplicate response for the same command. Fix the issue by introducing the atomic ibmvfc_event.active to track the trinary state of a command. The state is explicitly set to 1 when a command is successfully sent. The CRQ response handlers use atomic_dec_if_positive() to test for stale responses and correctly transition to the completion state when a active command is received. Finally, atomic_dec_and_test() is used to sanity check transistions when commands are freed as a result of a completion, or moved to the purge list as a result of error handling or adapter reset. Cc: stable@vger.kernel.org Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the host/queue lock") Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> --- drivers/scsi/ibmvscsi/ibmvfc.c | 19 +++++++++++++++++-- drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 18 insertions(+), 2 deletions(-)