mbox series

[v8,0/3] fix abort defect

Message ID 20240923080344.19084-1-peter.wang@mediatek.com
Headers show
Series fix abort defect | expand

Message

Peter Wang (王信友) Sept. 23, 2024, 8:03 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

V8:
 - Remove the abort variable to simplify the abort process.
 - Correct error handler successfully aborts release flow.
 - Ingore MCQ OCS: ABORTED.

V7:
 - Use a variable instead of a flag.
 - Add a check for MCQ mode when setting this variable to UFS_ERR_HANDLER.
 - Print OCS information for OCS_ABORTED and OCS_INVALID_COMMAND_STATUS.
 - Add a MediaTek quirk for handling OCS_ABORTED in SDB mode.
 - Skip notifying SCSI from ISR during SCSI abort (ufshcd_abort()).

V6:
 - Add err handler check before set flag true.

V5:
 - Change flag name.
 - Amend comment and patch description.

V4:
 - Remove nullify SQ entry abort requeue.
 - Add more comment for flag usage and set description.
 - Fix build warning.

V3:
 - Change comment and use variable(rtc) for error print
 - Change flag name and move flag set before ufshcd_clear_cmd
 - Add SDB mode clear UTRLCLR tag receive OCS_ABORTED requeue

V2:
 - Fix mcq_enabled build error.

Peter Wang (3):
  ufs: core: fix the issue of ICU failure
  ufs: core: fix error handler process for MCQ abort
  ufs: core: add a quirk for MediaTek SDB mode aborted

 drivers/ufs/core/ufs-mcq.c      | 15 ++++++++-------
 drivers/ufs/core/ufshcd.c       | 28 ++++++++++++++++++++++++++--
 drivers/ufs/host/ufs-mediatek.c |  1 +
 include/ufs/ufshcd.h            |  6 ++++++
 4 files changed, 41 insertions(+), 9 deletions(-)

Comments

Bart Van Assche Sept. 23, 2024, 6:15 p.m. UTC | #1
On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote:
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index b5c7bc50a27e..b42079c3d634 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
>   		}
>   		break;
>   	case OCS_ABORTED:
> -		result |= DID_ABORT << 16;
> +		if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED)
> +			result |= DID_REQUEUE << 16;
> +		else
> +			result |= DID_ABORT << 16;
>   		dev_warn(hba->dev,
>   				"OCS aborted from controller = %x for tag %d\n",
>   				ocs, lrbp->task_tag);

I think the approach of this patch is racy: the cmd->result assignment
by ufshcd_transfer_rsp_status() races with the cmd->result assignment by
ufshcd_abort_one(). How about addressing this race as follows?
* In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is encountered,
   set a completion.
* In the code that aborts SCSI commands, for MediaTek controllers only,
   wait for that completion (based on a quirk).
* Instead of introducing an if-statement in
   ufshcd_transfer_rsp_status(), rely on the cmd->result value assigned
   by ufshcd_abort_one().

Thanks,

Bart.
Peter Wang (王信友) Sept. 24, 2024, 8:51 a.m. UTC | #2
On Mon, 2024-09-23 at 11:15 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote:
> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> > index b5c7bc50a27e..b42079c3d634 100644
> > --- a/drivers/ufs/core/ufshcd.c
> > +++ b/drivers/ufs/core/ufshcd.c
> > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba
> *hba, struct ufshcd_lrb *lrbp,
> >   }
> >   break;
> >   case OCS_ABORTED:
> > -result |= DID_ABORT << 16;
> > +if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED)
> > +result |= DID_REQUEUE << 16;
> > +else
> > +result |= DID_ABORT << 16;
> >   dev_warn(hba->dev,
> >   "OCS aborted from controller = %x for tag %d\n",
> >   ocs, lrbp->task_tag);
> 
> I think the approach of this patch is racy: the cmd->result
> assignment
> by ufshcd_transfer_rsp_status() races with the cmd->result assignment
> by
> ufshcd_abort_one(). How about addressing this race as follows?
> * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is
> encountered,
>    set a completion.
> * In the code that aborts SCSI commands, for MediaTek controllers
> only,
>    wait for that completion (based on a quirk).
> * Instead of introducing an if-statement in
>    ufshcd_transfer_rsp_status(), rely on the cmd->result value
> assigned
>    by ufshcd_abort_one().
> 
> Thanks,
> 
> Bart.

Hi Bart,

Sorry, I might not have understood the potential racing issue here.

The SCSI abort command doesn't need to wait for completion because 
SCSI doesn't care about cmd->result, right?

The error handler abort also doesn't need to wait for completion, 
because it should have a guaranteed order?
Firstly, ufshcd_abort_one is only likely to be filled 
back with OCS: ABORTED by MediaTek UFS controller after 
ufshcd_try_to_abort_task, and the sequence is as follows.

ufshcd_err_handler()
  ufshcd_abort_all()
    ufshcd_abort_one()
      ufshcd_try_to_abort_task()	// trigger mediatek controller
fill OCS: ABORTED
    ufshcd_complete_requests()
      ufshcd_transfer_req_compl()
        ufshcd_poll()
          get outstanding_lock
          clear outstanding_reqs tag
          release outstanding_lock	
          __ufshcd_transfer_req_compl()
            ufshcd_compl_one_cqe()
              cmd->result = DID_REQUEUE // mediatek may need quirk
change DID_ABORT to DID_REQUEUE


In addition, the ISR will use the outstanding_lock with 
ufshcd_err_handler to provide protection, so there won't be any 
racing that causes the command to be released repeatedly. 
The only possible issue might be that after ufshcd_abort_one, 
the MediaTek UFS controller has not yet filled in OCS: ABORTED 
and has entered ufshcd_transfer_rsp_status for checking. 
But this doesn't matter, because it will just be treated 
as OCS_INVALID_COMMAND_STATUS.

Is there any corner case that I might have overlooked?


Thanks.
Peter



>
Bart Van Assche Sept. 24, 2024, 7:36 p.m. UTC | #3
On 9/24/24 1:51 AM, Peter Wang (王信友) wrote:
> Is there any corner case that I might have overlooked?

Maybe I misunderstood how MediaTek controllers work. Anyway, if a
MediaTek controller reports the completion status OCS_ABORTED, is
there any chance that this status will be reported after
ufshcd_release_scsi_cmd() has been called? Is there any chance that
the controller writing OCS_ABORTED into the status field will race
with the host software overwriting that status field while submitting
a new command?

Thanks,

Bart.