Message ID | 20240910073035.25974-3-peter.wang@mediatek.com |
---|---|
State | New |
Headers | show |
Series | fix abort defect | expand |
On Wed, 2024-09-11 at 12:11 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/10/24 11:03 PM, Peter Wang (王信友) wrote: > > This statement is not quite accurate becasue in UFSHIC2.1, SDB mode > > specification already have OCS: ABORTED (0x6) define. > > And it is used in below UTRLCLR description: > > 'which means a Transfer Request was "aborted"' > > Therefore, the host controller should follow the > > specification and fill the OCS field with OCS: ABORTED. > > If not so, at what point does your host controller use the > > OCS: ABORTED status? > > Hmm ... I have not been able to find any explanation in the UFSHCI > 2.1 > specification that says when the OCS status is set to aborted. Did I > perhaps overlook something? > > This is what I found in the UTRLCLR description: "The host software > shall use this field only when a UTP Transfer Request is expected to > not be completed, e.g., when the host software receives a “FUNCTION > COMPLETE” Task Management response which means a Transfer Request was > aborted." This does not mean that the host controller is expected to > set the OCS status to "ABORTED". I will send an email to the JC-64 > mailing list to request clarification. > Hi Bart, Yes, you're right, just as I mentioned earlier, I also think the spec does not explicitly state that UTRLC should have corresponding behavior for OCS. This might lead to inconsistencies in how host controllers operate. > >>> +/* > >>> + * When the host software receives a "FUNCTION COMPLETE", set > flag > >>> + * to requeue command after receive response with OCS_ABORTED > >>> + * SDB mode: UTRLCLR Task Management response which means a > >> Transfer > >>> + * Request was aborted. > >>> + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ > >> cleanup > >>> + * This flag is set because ufshcd_abort_all forcibly aborts all > >>> + * commands, and the host will automatically fill in the OCS > field > >>> + * of the corresponding response with OCS_ABORTED. > >>> + * Therefore, upon receiving this response, it needs to be > >> requeued. > >>> + */ > >>> +if (!err) > >>> +lrbp->abort_initiated_by_err = true; > >>> + > >>> err = ufshcd_clear_cmd(hba, tag); > >>> if (err) > >>> dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err > %d\n", > >> > >> The above change is misplaced. ufshcd_try_to_abort_task() can be > >> called > >> when the SCSI core decides to abort a command while > >> abort_initiated_by_err must not be set in that case. Please move > the > >> above code block into ufshcd_abort_one(). > > > > But move to ufshcd_abort_one may have race condition, beacause we > > need set this flag before ufshcd_clear_cmd host controller fill > > OCS_ABORTED to response. I will add check ufshcd_eh_in_progress. > > Calling ufshcd_clear_cmd() does not affect the OCS status as far as I > know. Did I perhaps overlook something? > Because after ufshcd_clear_cmd, in MCQ mode: ufshcd_mcq_sq_cleanup, host controller will post CQ response with OCS ABORTED. in SDB mode: ufshcd_utrl_clear set UTRLC, Mediatek host controller (may not all host controller) will post response with OCS ABORTED. In both cases, we have an interrupt sent to the host, and there may be a race condition before we set this flag for requeue. So I need to set this flag before ufshcd_clear_cmd. Thanks. Peter > Thanks, > > Bart.
On 9/12/24 6:31 AM, Peter Wang (王信友) wrote: > in SDB mode: > ufshcd_utrl_clear set UTRLC, Mediatek host controller > (may not all host controller) will post response with OCS ABORTED. > > In both cases, we have an interrupt sent to the host, and there > may be a race condition before we set this flag for requeue. > So I need to set this flag before ufshcd_clear_cmd. If a completion interrupt is sent to the host if a command has been cleared in SDB mode (I doubt this is what happens), I think that's a severe controller bug. A UFSHCI controller is not allowed to send a completion interrupt to the host if a command is cleared by writing into the UTRLCLR register. Thanks, Bart.
On Thu, 2024-09-12 at 14:17 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/12/24 6:31 AM, Peter Wang (王信友) wrote: > > in SDB mode: > > ufshcd_utrl_clear set UTRLC, Mediatek host controller > > (may not all host controller) will post response with OCS ABORTED. > > > > In both cases, we have an interrupt sent to the host, and there > > may be a race condition before we set this flag for requeue. > > So I need to set this flag before ufshcd_clear_cmd. > > If a completion interrupt is sent to the host if a command has been > cleared in SDB mode (I doubt this is what happens), I think that's a > severe controller bug. A UFSHCI controller is not allowed to send a > completion interrupt to the host if a command is cleared by writing > into > the UTRLCLR register. > > Thanks, > > Bart. Hi Bart, Because the MediaTek UFS controller uses UTRLCLR to clear commands and fills OCS with ABORTED. Regarding the specification of UTRCS: This bit is set to '1' by the host controller upon one of the following: Overall command Status (OCS) of the completed command is not equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0' So, MediaTek host controller will send interrupt in this case. Thanks. Peter
On 9/13/24 12:10 AM, Peter Wang (王信友) wrote: > Because the MediaTek UFS controller uses UTRLCLR to clear > commands and fills OCS with ABORTED. > > Regarding the specification of UTRCS: > This bit is set to '1' by the host controller upon one of the > following: > Overall command Status (OCS) of the completed command is not > equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0' > > So, MediaTek host controller will send interrupt in this case. Hi Peter, Thank you for having shared this information. Please consider introducing a quirk for ignoring completions triggered by clearing a command, e.g. as follows (there may be better approaches): * In ufshcd_clear_cmd(), before a command is cleared, initialize the completion that will be used for waiting for the completion interrupt. After a command has been cleared, call wait_for_completion_timeout(). * In ufshcd_compl_one_cqe(), check whether the completion is the result of a command being cleared. If so, call complete() instead of executing the regular completion code. Thanks, Bart.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index a6f818cdef0e..615da47c1727 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp); lrbp->req_abort_skip = false; + lrbp->abort_initiated_by_err = false; ufshcd_comp_scsi_upiu(hba, lrbp); @@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, } break; case OCS_ABORTED: - result |= DID_ABORT << 16; + if (lrbp->abort_initiated_by_err) + result |= DID_REQUEUE << 16; + else + result |= DID_ABORT << 16; break; case OCS_INVALID_COMMAND_STATUS: result |= DID_REQUEUE << 16; @@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv) struct scsi_device *sdev = cmd->device; struct Scsi_Host *shost = sdev->host; struct ufs_hba *hba = shost_priv(shost); - struct ufshcd_lrb *lrbp = &hba->lrb[tag]; - struct ufs_hw_queue *hwq; - unsigned long flags; *ret = ufshcd_try_to_abort_task(hba, tag); dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag, hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1, *ret ? "failed" : "succeeded"); - /* Release cmd in MCQ mode if abort succeeds */ - if (hba->mcq_enabled && (*ret == 0)) { - hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd)); - if (!hwq) - return 0; - spin_lock_irqsave(&hwq->cq_lock, flags); - if (ufshcd_cmd_inflight(lrbp->cmd)) - ufshcd_release_scsi_cmd(hba, lrbp); - spin_unlock_irqrestore(&hwq->cq_lock, flags); - } - return *ret == 0; } @@ -7561,6 +7551,20 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag) goto out; } + /* + * When the host software receives a "FUNCTION COMPLETE", set flag + * to requeue command after receive response with OCS_ABORTED + * SDB mode: UTRLCLR Task Management response which means a Transfer + * Request was aborted. + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup + * This flag is set because ufshcd_abort_all forcibly aborts all + * commands, and the host will automatically fill in the OCS field + * of the corresponding response with OCS_ABORTED. + * Therefore, upon receiving this response, it needs to be requeued. + */ + if (!err) + lrbp->abort_initiated_by_err = true; + err = ufshcd_clear_cmd(hba, tag); if (err) dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n", diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h index 0fd2aebac728..15b357672ca5 100644 --- a/include/ufs/ufshcd.h +++ b/include/ufs/ufshcd.h @@ -173,6 +173,8 @@ struct ufs_pm_lvl_states { * @crypto_key_slot: the key slot to use for inline crypto (-1 if none) * @data_unit_num: the data unit number for the first block for inline crypto * @req_abort_skip: skip request abort task flag + * @abort_initiated_by_err: The flag is specifically used to handle aborts + * caused by errors due to host/device communication */ struct ufshcd_lrb { struct utp_transfer_req_desc *utr_descriptor_ptr; @@ -202,6 +204,7 @@ struct ufshcd_lrb { #endif bool req_abort_skip; + bool abort_initiated_by_err; }; /**