Message ID | 2179ecb8-183f-a500-4d65-10f64f0f43cc@suse.de |
---|---|
State | New |
Headers | show |
On 11/02/2016 01:09 AM, Andrey Grodzovsky wrote: > Problem: > This is a work around for a bug with LSI Fusion MPT SAS2 when > pefroming secure erase. Due to the very long time the operation > takes commands issued during the erase will time out and will trigger > execution of abort hook. Even though the abort hook is called for > the specifc command which timed out this leads to entire device halt > (scsi_state terminated) and premature termination of the secured erase. > > Fix: > Set device state to busy while erase in progress to reject any incoming > commands until the erase is done. The device is blocked any way during > this time and cannot execute any other command. > More data and logs can be found here - > https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view > > v2: Update according to example patch by Hannes Reinecke to apply > the blocking logic to any ATA 12/16 command. > > Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com> > Cc: <linux-scsi@vger.kernel.org> > Cc: Sathya Prakash <sathya.prakash@broadcom.com> > Cc: Chaitra P B <chaitra.basappa@broadcom.com> > Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> > Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: <stable@vger.kernel.org> > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 5a97e32..43ab0cc 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -3500,6 +3500,20 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) > SAM_STAT_CHECK_CONDITION; > } > > +/** > + * This is a work around for a bug with LSI Fusion MPT SAS2 when > + * pefroming secure erase. Due to the verly long time the operation > + * takes commands issued during the erase will time out and will trigger > + * execution of abort hook. This leads to device reset and premature > + * termination of the secured erase. > + * > + */ > +static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) > +{ > + return (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85); > +} > + > + > > /** > * _scsih_qcmd - main scsi request entry point > @@ -3528,6 +3542,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) > scsi_print_command(scmd); > #endif > > + /** > + * Lock the device for any subsequent command until > + * command is done. > + */ > + if (ata_12_16_cmd(scmd)) > + scsi_internal_device_block(scmd->device); > + > + > sas_device_priv_data = scmd->device->hostdata; > if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { > scmd->result = DID_NO_CONNECT << 16; > @@ -4062,6 +4084,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) > if (scmd == NULL) > return 1; > > + if (ata_12_16_cmd(scmd)) > + scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); > + > + > mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); > > if (mpi_reply == NULL) { > Yeah, it's ugly, but I can't think of a better solution for the moment. Thanks for debugging this. Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
Hannes> Checking with SAT-3 (section 6.2.4: Commands the SATL queues
Hannes> internally) the implemented behaviour is standards conformant,
Hannes> although the standard also allows for returning 'TASK SET FULL'
Hannes> or 'BUSY' in these cases. Doing so would nicely solve this
Hannes> issue.
I agree with Hannes that it would be appropriate for the SATL to report
busy when it makes an non-queued command queueable.
--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/24/2018 11:09 AM, Steffen Maier wrote: > > On 11/04/2016 05:35 PM, Martin K. Petersen wrote: >>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes: >> >> Hannes> Checking with SAT-3 (section 6.2.4: Commands the SATL queues >> Hannes> internally) the implemented behaviour is standards conformant, >> Hannes> although the standard also allows for returning 'TASK SET FULL' >> Hannes> or 'BUSY' in these cases. Doing so would nicely solve this >> Hannes> issue. >> >> I agree with Hannes that it would be appropriate for the SATL to report >> busy when it makes an non-queued command queueable. > > Wouldn't this potentially still cause problems if the secure erase takes > longer than max_retries * scmd_tmo. I.e. the command timing out by > default after 180 seconds as in > https://www.spinics.net/lists/linux-block/msg24837.html ? > > The fix approach here seems to also handle this gracefully. > Well, yes, of course the command will be terminated after it timed out. But typically secure erase is invoked from userspace via sg ioctls, and it's in the responsibility of the application to set the correct timeout. Cheers, Hannes
From 1556746987c3b4c1a1a4705625280b1136554f89 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke <hare@suse.de> Date: Sun, 30 Oct 2016 14:24:44 +0100 Subject: [PATCH] mpt3sas: hack: disable concurrent commands for ATA_16/ATA_12 There's a bug in the mpt3sas driver/firmware which would not return BUSY if it's busy processing requests (eg 'erase') and cannot respond to other commands. Hence these commands will timeout and eventually start the error handler. This patch disallows request processing whenever an ATA_12 or ATA_16 command is received, thereby avoiding this problem. Signed-off-by: Hannes Reinecke <hare@suse.com> --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 97987e7..18b9f09 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -4096,6 +4096,13 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) sas_device_priv_data->block) return SCSI_MLQUEUE_DEVICE_BUSY; + /* + * Hack: block the device for any ATA_12/ATA_16 command + */ + if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) { + sas_device_priv_data = scmd->device->hostdata; + _scsih_internal_device_block(scmd->device, sas_device_priv_data); + } if (scmd->sc_data_direction == DMA_FROM_DEVICE) mpi_control = MPI2_SCSIIO_CONTROL_READ; else if (scmd->sc_data_direction == DMA_TO_DEVICE) @@ -4835,6 +4842,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) out: + if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) { + sas_device_priv_data = scmd->device->hostdata; + _scsih_internal_device_unblock(scmd->device, sas_device_priv_data); + } scsi_dma_unmap(scmd); scmd->scsi_done(scmd); -- 2.6.6