Message ID | 20240923080344.19084-1-peter.wang@mediatek.com |
---|---|
Headers | show |
Series | fix abort defect | expand |
On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote: > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index b5c7bc50a27e..b42079c3d634 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, > } > break; > case OCS_ABORTED: > - result |= DID_ABORT << 16; > + if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) > + result |= DID_REQUEUE << 16; > + else > + result |= DID_ABORT << 16; > dev_warn(hba->dev, > "OCS aborted from controller = %x for tag %d\n", > ocs, lrbp->task_tag); I think the approach of this patch is racy: the cmd->result assignment by ufshcd_transfer_rsp_status() races with the cmd->result assignment by ufshcd_abort_one(). How about addressing this race as follows? * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is encountered, set a completion. * In the code that aborts SCSI commands, for MediaTek controllers only, wait for that completion (based on a quirk). * Instead of introducing an if-statement in ufshcd_transfer_rsp_status(), rely on the cmd->result value assigned by ufshcd_abort_one(). Thanks, Bart.
On Mon, 2024-09-23 at 11:15 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index b5c7bc50a27e..b42079c3d634 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba > *hba, struct ufshcd_lrb *lrbp, > > } > > break; > > case OCS_ABORTED: > > -result |= DID_ABORT << 16; > > +if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) > > +result |= DID_REQUEUE << 16; > > +else > > +result |= DID_ABORT << 16; > > dev_warn(hba->dev, > > "OCS aborted from controller = %x for tag %d\n", > > ocs, lrbp->task_tag); > > I think the approach of this patch is racy: the cmd->result > assignment > by ufshcd_transfer_rsp_status() races with the cmd->result assignment > by > ufshcd_abort_one(). How about addressing this race as follows? > * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is > encountered, > set a completion. > * In the code that aborts SCSI commands, for MediaTek controllers > only, > wait for that completion (based on a quirk). > * Instead of introducing an if-statement in > ufshcd_transfer_rsp_status(), rely on the cmd->result value > assigned > by ufshcd_abort_one(). > > Thanks, > > Bart. Hi Bart, Sorry, I might not have understood the potential racing issue here. The SCSI abort command doesn't need to wait for completion because SCSI doesn't care about cmd->result, right? The error handler abort also doesn't need to wait for completion, because it should have a guaranteed order? Firstly, ufshcd_abort_one is only likely to be filled back with OCS: ABORTED by MediaTek UFS controller after ufshcd_try_to_abort_task, and the sequence is as follows. ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // trigger mediatek controller fill OCS: ABORTED ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE In addition, the ISR will use the outstanding_lock with ufshcd_err_handler to provide protection, so there won't be any racing that causes the command to be released repeatedly. The only possible issue might be that after ufshcd_abort_one, the MediaTek UFS controller has not yet filled in OCS: ABORTED and has entered ufshcd_transfer_rsp_status for checking. But this doesn't matter, because it will just be treated as OCS_INVALID_COMMAND_STATUS. Is there any corner case that I might have overlooked? Thanks. Peter >
On 9/24/24 1:51 AM, Peter Wang (王信友) wrote:
> Is there any corner case that I might have overlooked?
Maybe I misunderstood how MediaTek controllers work. Anyway, if a
MediaTek controller reports the completion status OCS_ABORTED, is
there any chance that this status will be reported after
ufshcd_release_scsi_cmd() has been called? Is there any chance that
the controller writing OCS_ABORTED into the status field will race
with the host software overwriting that status field while submitting
a new command?
Thanks,
Bart.
From: Peter Wang <peter.wang@mediatek.com> V8: - Remove the abort variable to simplify the abort process. - Correct error handler successfully aborts release flow. - Ingore MCQ OCS: ABORTED. V7: - Use a variable instead of a flag. - Add a check for MCQ mode when setting this variable to UFS_ERR_HANDLER. - Print OCS information for OCS_ABORTED and OCS_INVALID_COMMAND_STATUS. - Add a MediaTek quirk for handling OCS_ABORTED in SDB mode. - Skip notifying SCSI from ISR during SCSI abort (ufshcd_abort()). V6: - Add err handler check before set flag true. V5: - Change flag name. - Amend comment and patch description. V4: - Remove nullify SQ entry abort requeue. - Add more comment for flag usage and set description. - Fix build warning. V3: - Change comment and use variable(rtc) for error print - Change flag name and move flag set before ufshcd_clear_cmd - Add SDB mode clear UTRLCLR tag receive OCS_ABORTED requeue V2: - Fix mcq_enabled build error. Peter Wang (3): ufs: core: fix the issue of ICU failure ufs: core: fix error handler process for MCQ abort ufs: core: add a quirk for MediaTek SDB mode aborted drivers/ufs/core/ufs-mcq.c | 15 ++++++++------- drivers/ufs/core/ufshcd.c | 28 ++++++++++++++++++++++++++-- drivers/ufs/host/ufs-mediatek.c | 1 + include/ufs/ufshcd.h | 6 ++++++ 4 files changed, 41 insertions(+), 9 deletions(-)