Message ID | 20220311184359.2345319-1-djeffery@redhat.com |
---|---|
State | New |
Headers | show |
Series | fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race | expand |
On Fri, 2022-03-11 at 13:43 -0500, David Jeffery wrote: > When aborting a scsi command through fnic, there is a race with the > fnic > interrupt handler which can result in the scsi command and its > request > being completed twice. If the interrupt handler claims the command by > setting CMD_SP to NULL first, the abort handler assumes the interrupt > handler has completed the command and returns SUCCESS, causing the > request > for the scsi_cmnd to be re-queued. > > But the interrupt handler may not have finished the command yet. > After it > drops the spinlock protecting CMD_SP, it does memory cleanup before > finally calling scsi_done to complete the scsi_cmnd. If the call to > scsi_done occurs after the abort handler finishes and re-queues the > request, the completion of the scsi_cmnd will advance and try to > double > complete a request already queued for retry. > > This patch fixes the issue by moving scsi_done and any other use of > scsi_cmnd to before the spinlock is released by the interrupt > handler. > > Signed-off-by: David Jeffery <djeffery@redhat.com> > --- > drivers/scsi/fnic/fnic_scsi.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/scsi/fnic/fnic_scsi.c > b/drivers/scsi/fnic/fnic_scsi.c > index 88c549f257db..40a52feb315d 100644 > --- a/drivers/scsi/fnic/fnic_scsi.c > +++ b/drivers/scsi/fnic/fnic_scsi.c > @@ -986,8 +986,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct > fnic *fnic, > CMD_SP(sc) = NULL; > CMD_FLAGS(sc) |= FNIC_IO_DONE; > > - spin_unlock_irqrestore(io_lock, flags); > - > if (hdr_status != FCPIO_SUCCESS) { > atomic64_inc(&fnic_stats->io_stats.io_failures); > shost_printk(KERN_ERR, fnic->lport->host, "hdr status = > %s\n", > @@ -996,8 +994,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct > fnic *fnic, > > fnic_release_ioreq_buf(fnic, io_req, sc); > > - mempool_free(io_req, fnic->io_req_pool); > - > cmd_trace = ((u64)hdr_status << 56) | > (u64)icmnd_cmpl->scsi_status << 48 | > (u64)icmnd_cmpl->flags << 40 | (u64)sc->cmnd[0] << 32 > | > @@ -1021,6 +1017,12 @@ static void > fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, > } else > fnic->lport->host_stats.fcp_control_requests++; > > + /* Call SCSI completion function to complete the IO */ > + scsi_done(sc); > + spin_unlock_irqrestore(io_lock, flags); > + > + mempool_free(io_req, fnic->io_req_pool); > + > atomic64_dec(&fnic_stats->io_stats.active_ios); > if (atomic64_read(&fnic->io_cmpl_skip)) > atomic64_dec(&fnic->io_cmpl_skip); > @@ -1049,9 +1051,6 @@ static void > fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, > if(io_duration_time > atomic64_read(&fnic_stats- > >io_stats.current_max_io_time)) > atomic64_set(&fnic_stats- > >io_stats.current_max_io_time, io_duration_time); > } > - > - /* Call SCSI completion function to complete the IO */ > - scsi_done(sc); > } > > /* fnic_fcpio_itmf_cmpl_handler This patch was also presented to Ming who agreed with David's changes. Its been sent to a customer for full testing to see if it avoids the panics. The trigger is a sequence of these and then we get the double completion. WHile its not easy to reproduce and not often seen this customer can make it happen at will it seems. [1363787.139752] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH [1363787.139822] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH [1363787.139870] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH [1363787.139916] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH [1363787.139961] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH [1363787.140006] scsi host7: hdr status = FCPIO_DATA_CNT_MISMATCH Reviewed-by: Laurence Oberman <loberman@redhat.com> Thanks very much
On Fri, Mar 11, 2022 at 01:43:59PM -0500, David Jeffery wrote: > When aborting a scsi command through fnic, there is a race with the fnic > interrupt handler which can result in the scsi command and its request > being completed twice. If the interrupt handler claims the command by > setting CMD_SP to NULL first, the abort handler assumes the interrupt > handler has completed the command and returns SUCCESS, causing the request > for the scsi_cmnd to be re-queued. > > But the interrupt handler may not have finished the command yet. After it > drops the spinlock protecting CMD_SP, it does memory cleanup before > finally calling scsi_done to complete the scsi_cmnd. If the call to > scsi_done occurs after the abort handler finishes and re-queues the > request, the completion of the scsi_cmnd will advance and try to double > complete a request already queued for retry. > > This patch fixes the issue by moving scsi_done and any other use of > scsi_cmnd to before the spinlock is released by the interrupt handler. This way provides one simple fix for the race between normal completion and abort, looks fine: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming
On Fri, 11 Mar 2022 13:43:59 -0500, David Jeffery wrote: > When aborting a scsi command through fnic, there is a race with the fnic > interrupt handler which can result in the scsi command and its request > being completed twice. If the interrupt handler claims the command by > setting CMD_SP to NULL first, the abort handler assumes the interrupt > handler has completed the command and returns SUCCESS, causing the request > for the scsi_cmnd to be re-queued. > > [...] Applied to 5.17/scsi-fixes, thanks! [1/1] fnic: finish scsi_cmnd before dropping the spinlock to prevent abort race https://git.kernel.org/mkp/scsi/c/733ab7e1b5d1
diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c index 88c549f257db..40a52feb315d 100644 --- a/drivers/scsi/fnic/fnic_scsi.c +++ b/drivers/scsi/fnic/fnic_scsi.c @@ -986,8 +986,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, CMD_SP(sc) = NULL; CMD_FLAGS(sc) |= FNIC_IO_DONE; - spin_unlock_irqrestore(io_lock, flags); - if (hdr_status != FCPIO_SUCCESS) { atomic64_inc(&fnic_stats->io_stats.io_failures); shost_printk(KERN_ERR, fnic->lport->host, "hdr status = %s\n", @@ -996,8 +994,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, fnic_release_ioreq_buf(fnic, io_req, sc); - mempool_free(io_req, fnic->io_req_pool); - cmd_trace = ((u64)hdr_status << 56) | (u64)icmnd_cmpl->scsi_status << 48 | (u64)icmnd_cmpl->flags << 40 | (u64)sc->cmnd[0] << 32 | @@ -1021,6 +1017,12 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, } else fnic->lport->host_stats.fcp_control_requests++; + /* Call SCSI completion function to complete the IO */ + scsi_done(sc); + spin_unlock_irqrestore(io_lock, flags); + + mempool_free(io_req, fnic->io_req_pool); + atomic64_dec(&fnic_stats->io_stats.active_ios); if (atomic64_read(&fnic->io_cmpl_skip)) atomic64_dec(&fnic->io_cmpl_skip); @@ -1049,9 +1051,6 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic, if(io_duration_time > atomic64_read(&fnic_stats->io_stats.current_max_io_time)) atomic64_set(&fnic_stats->io_stats.current_max_io_time, io_duration_time); } - - /* Call SCSI completion function to complete the IO */ - scsi_done(sc); } /* fnic_fcpio_itmf_cmpl_handler
When aborting a scsi command through fnic, there is a race with the fnic interrupt handler which can result in the scsi command and its request being completed twice. If the interrupt handler claims the command by setting CMD_SP to NULL first, the abort handler assumes the interrupt handler has completed the command and returns SUCCESS, causing the request for the scsi_cmnd to be re-queued. But the interrupt handler may not have finished the command yet. After it drops the spinlock protecting CMD_SP, it does memory cleanup before finally calling scsi_done to complete the scsi_cmnd. If the call to scsi_done occurs after the abort handler finishes and re-queues the request, the completion of the scsi_cmnd will advance and try to double complete a request already queued for retry. This patch fixes the issue by moving scsi_done and any other use of scsi_cmnd to before the spinlock is released by the interrupt handler. Signed-off-by: David Jeffery <djeffery@redhat.com> --- drivers/scsi/fnic/fnic_scsi.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)