Message ID | 20221208105947.2399894-1-niklas.cassel@wdc.com |
---|---|
Headers | show |
Series | Add Command Duration Limits support | expand |
> Kind regards, > Niklas & Damien > > Damien Le Moal (14): > ata: libata: simplify qc_fill_rtf port operation interface > ata: libata-scsi: improve ata_scsiop_maint_in() > scsi: support retrieving sub-pages of mode pages > scsi: support service action in scsi_report_opcode() > block: introduce duration-limits priority class > block: introduce BLK_STS_DURATION_LIMIT > ata: libata: detect support for command duration limits > ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() > ata: libata-scsi: add support for CDL pages mode sense > ata: libata: add ATA feature control sub-page translation > ata: libata: set read/write commands CDL index > scsi: sd: detect support for command duration limits > scsi: sd: set read/write commands CDL index > Documentation: sysfs-block-device: document command duration limits > > Niklas Cassel (11): > ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH > ata: libata: move NCQ related ATA_DFLAGs > ata: libata: fix broken NCQ command status handling > ata: libata: respect successfully completed commands during errors > ata: libata: allow ata_scsi_set_sense() to not set CHECK_CONDITION > ata: libata: allow ata_eh_request_sense() to not set CHECK_CONDITION > ata: libata-scsi: do not overwrite SCSI ML and status bytes > scsi: core: allow libata to complete successful commands via EH > scsi: move get_scsi_ml_byte() to scsi_priv.h > scsi: sd: handle read/write CDL timeout failures > ata: libata: handle completion of CDL commands using policy 0xD > Out of 25 patches linux-block mailing list only got 3, was this on purpose ? see this and [1] :- https://marc.info/?l=linux-block&w=2&r=1&s=Command+Duration+limit&q=b -ck [1] 7. 2022-12-08 [1] [PATCH 1/4] sbitmap: remove unnecessary calculation o linux-blo Kemeng Shi 8. 2022-12-08 [1] [PATCH 0/4] A few cleanup patches for sbitmap linux-blo Kemeng Shi 9. 2022-12-08 [1] [PATCH 2/4] sbitmap: remove redundant check in __sbit linux-blo Kemeng Shi 10. 2022-12-08 [1] [PATCH 3/4] sbitmap: rewrite sbitmap_find_bit_in_inde linux-blo Kemeng Shi 11. 2022-12-08 [1] [PATCH 4/4] sbitmap: add sbitmap_find_bit to remove r linux-blo Kemeng Shi 12. 2022-12-08 [3] Re: [PATCH 3/9] bfq: Split shared queues on move betw linux-blo Yu Kuai * 13. 2022-12-08 [1] [PATCH 15/25] block: introduce BLK_STS_DURATION_LIMIT linux-blo Niklas Cassel* * 14. 2022-12-08 [1] [PATCH 14/25] block: introduce duration-limits priori linux-blo Niklas Cassel* * 15. 2022-12-08 [1] [PATCH 00/25] Add Command Duration Limits support linux-blo Niklas Cassel* 16. 2022-12-08 [1] [PATCH V9 8/8] block, bfq: balance I/O injection amon linux-blo Paolo Valente 17. 2022-12-08 [1] [PATCH V9 7/8] block, bfq: inject I/O to underutilize linux-blo Paolo Valente 18. 2022-12-08 [1] [PATCH V9 6/8] block, bfq: retrieve independent acces linux-blo Paolo Valente
On 12/8/22 4:59 AM, Niklas Cassel wrote: > Commands using a duration limit descriptor that has limit policies set > to a value other than 0x0 may be failed by the device if one of the > limits are exceeded. For such commands, since the failure is the result > of the user duration limit configuration and workload, the commands > should not be retried and terminated immediately. Furthermore, to allow > the user to differentiate these "soft" failures from hard errors due to > hardware problem, a different error code than EIO should be returned. > > There are 2 cases to consider: > (1) The failure is due to a limit policy failing the command with a > check condition sense key, that is, any limit policy other than 0xD. > For this case, scsi_check_sense() is modified to detect failures with > the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING > or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING > PROCESSING DUE TO ERROR RECOVERY additional sense code. For these > failures, a SUCCESS disposition is returned so that > scsi_finish_command() is called to terminate the command. > > (2) The failure is due to a limit policy set to 0xD, which result in the > command being terminated with a GOOD status, COMPLETED sense key, and > DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case, > the scsi_check_sense() is modified to return a SUCCESS disposition so > that scsi_finish_command() is called to terminate the command. > In addition, scsi_decide_disposition() has to be modified to see if a > command being terminated with GOOD status has sense data. > This is as defined in SCSI Primary Commands - 6 (SPC-6), so all > according to spec, even if GOOD status commands were not checked before. > > If scsi_check_sense() detects sense data representing a duration limit, > scsi_check_sense() will set the newly introduced SCSI ML byte > SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in > scsi_noretry_cmd(), so that a command that failed because of a CDL > timeout cannot be retried. The SCSI ML byte is also checked in > scsi_result_to_blk_status() to complete the command request with the > BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME > errors for the failed commands. > > Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> > --- > drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++ > drivers/scsi/scsi_lib.c | 4 ++++ > drivers/scsi/scsi_priv.h | 1 + > 3 files changed, 51 insertions(+) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 51aa5c1e31b5..1bdab5385985 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status) > */ > enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > { > + struct request *req = scsi_cmd_to_rq(scmd); > struct scsi_device *sdev = scmd->device; > struct scsi_sense_hdr sshdr; > > @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > if (sshdr.asc == 0x10) /* DIF */ > return SUCCESS; > > + /* > + * Check aborts due to command duration limit policy: > + * ABORTED COMMAND additional sense code with the > + * COMMAND TIMEOUT BEFORE PROCESSING or > + * COMMAND TIMEOUT DURING PROCESSING or > + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY > + * additional sense code qualifiers. > + */ > + if (sshdr.asc == 0x2e && > + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { > + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); > + req->cmd_flags |= REQ_FAILFAST_DEV; Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it? I thought you might have set it because DID_TIME_OUT was set and you wanted to hit that check in scsi_noretry_cmd. However, I see that patch where you added the new flag so DID_TIME_OUT does not get set sometimes so you probably don't hit that path, and you have that check for SCSIML_STAT_DL_TIMEOUT in there below. > + req->rq_flags |= RQF_QUIET; > + return SUCCESS; > + } > + > if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF) > return ADD_TO_MLQUEUE; > if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 && > @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > } > return SUCCESS; > > + case COMPLETED: > + if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) { > + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); > + req->cmd_flags |= REQ_FAILFAST_DEV; > + req->rq_flags |= RQF_QUIET; > + return SUCCESS; > + } > + return SUCCESS; > + > default: > return SUCCESS; > } > @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd) > switch (get_status_byte(scmd)) { > case SAM_STAT_GOOD: > scsi_handle_queue_ramp_up(scmd->device); > + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) > + /* > + * If we have sense data, call scsi_check_sense() in > + * order to set the correct SCSI ML byte (if any). > + * No point in checking the return value, since the > + * command has already completed successfully. > + */ > + scsi_check_sense(scmd); > fallthrough; > case SAM_STAT_COMMAND_TERMINATED: > return SUCCESS; > @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd) > return !!(req->cmd_flags & REQ_FAILFAST_DRIVER); > } > > + /* Never retry commands aborted due to a duration limit timeout */ > + if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT) > + return true; > + > if (!scsi_status_is_check_condition(scmd->result)) > return false; > > @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd) > if (scmd->cmnd[0] == REPORT_LUNS) > scmd->device->sdev_target->expecting_lun_change = 0; > scsi_handle_queue_ramp_up(scmd->device); > + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) > + /* > + * If we have sense data, call scsi_check_sense() in > + * order to set the correct SCSI ML byte (if any). > + * No point in checking the return value, since the > + * command has already completed successfully. > + */ > + scsi_check_sense(scmd); > fallthrough; > case SAM_STAT_COMMAND_TERMINATED: > return SUCCESS; > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index e64fd8f495d7..4f317c6593aa 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result) > return BLK_STS_MEDIUM; > case SCSIML_STAT_TGT_FAILURE: > return BLK_STS_TARGET; > + case SCSIML_STAT_DL_TIMEOUT: > + return BLK_STS_DURATION_LIMIT; > } > > switch (host_byte(result)) { > @@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result) > blk_stat = BLK_STS_ZONE_OPEN_RESOURCE; > } > break; > + case COMPLETED: > + fallthrough; > default: > action = ACTION_FAIL; > break; > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h > index 4f97e126c6fb..f8da92428ff6 100644 > --- a/drivers/scsi/scsi_priv.h > +++ b/drivers/scsi/scsi_priv.h > @@ -27,6 +27,7 @@ enum scsi_ml_status { > SCSIML_STAT_NOSPC = 0x02, /* Space allocation on the dev failed */ > SCSIML_STAT_MED_ERROR = 0x03, /* Medium error */ > SCSIML_STAT_TGT_FAILURE = 0x04, /* Permanent target failure */ > + SCSIML_STAT_DL_TIMEOUT = 0x05, /* Command Duration Limit timeout */ > }; > > static inline u8 get_scsi_ml_byte(int result)
On 12/9/22 09:13, Mike Christie wrote: >> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> if (sshdr.asc == 0x10) /* DIF */ >> return SUCCESS; >> >> + /* >> + * Check aborts due to command duration limit policy: >> + * ABORTED COMMAND additional sense code with the >> + * COMMAND TIMEOUT BEFORE PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY >> + * additional sense code qualifiers. >> + */ >> + if (sshdr.asc == 0x2e && >> + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { >> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); >> + req->cmd_flags |= REQ_FAILFAST_DEV; > > Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it? > > I thought you might have set it because DID_TIME_OUT was set and you wanted > to hit that check in scsi_noretry_cmd. However, I see that patch where you > added the new flag so DID_TIME_OUT does not get set sometimes so you probably > don't hit that path, and you have that check for SCSIML_STAT_DL_TIMEOUT in there > below. This is for the block layer (blk_noretry_request() helper) so that the remainder of the request is not retried. Retrying a CDL command that timedout goes against the goal of CDL.
On 12/9/22 03:18, Chaitanya Kulkarni wrote: > >> Kind regards, >> Niklas & Damien >> >> Damien Le Moal (14): >> ata: libata: simplify qc_fill_rtf port operation interface >> ata: libata-scsi: improve ata_scsiop_maint_in() >> scsi: support retrieving sub-pages of mode pages >> scsi: support service action in scsi_report_opcode() >> block: introduce duration-limits priority class >> block: introduce BLK_STS_DURATION_LIMIT >> ata: libata: detect support for command duration limits >> ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() >> ata: libata-scsi: add support for CDL pages mode sense >> ata: libata: add ATA feature control sub-page translation >> ata: libata: set read/write commands CDL index >> scsi: sd: detect support for command duration limits >> scsi: sd: set read/write commands CDL index >> Documentation: sysfs-block-device: document command duration limits >> >> Niklas Cassel (11): >> ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH >> ata: libata: move NCQ related ATA_DFLAGs >> ata: libata: fix broken NCQ command status handling >> ata: libata: respect successfully completed commands during errors >> ata: libata: allow ata_scsi_set_sense() to not set CHECK_CONDITION >> ata: libata: allow ata_eh_request_sense() to not set CHECK_CONDITION >> ata: libata-scsi: do not overwrite SCSI ML and status bytes >> scsi: core: allow libata to complete successful commands via EH >> scsi: move get_scsi_ml_byte() to scsi_priv.h >> scsi: sd: handle read/write CDL timeout failures >> ata: libata: handle completion of CDL commands using policy 0xD >> > > Out of 25 patches linux-block mailing list only got 3, > was this on purpose ? see this and [1] :- Not sure how Niklas sent the series. Niklas, For the next rev (we will need one to at least rebase on 6.2-rc1 I think), please make sure to send all patches to all lists/maintainers.