mbox series

[v9,0/3] fix abort defect

Message ID 20240925095546.19492-1-peter.wang@mediatek.com
Headers show
Series fix abort defect | expand

Message

Peter Wang (王信友) Sept. 25, 2024, 9:55 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

V9:
 - Revise the OCS content printed.
 
V8:
 - Remove the abort variable to simplify the abort process.
 - Correct error handler successfully aborts release flow.
 - Ingore MCQ OCS: ABORTED.

V7:
 - Use a variable instead of a flag.
 - Add a check for MCQ mode when setting this variable to UFS_ERR_HANDLER.
 - Print OCS information for OCS_ABORTED and OCS_INVALID_COMMAND_STATUS.
 - Add a MediaTek quirk for handling OCS_ABORTED in SDB mode.
 - Skip notifying SCSI from ISR during SCSI abort (ufshcd_abort()).

V6:
 - Add err handler check before set flag true.

V5:
 - Change flag name.
 - Amend comment and patch description.

V4:
 - Remove nullify SQ entry abort requeue.
 - Add more comment for flag usage and set description.
 - Fix build warning.

V3:
 - Change comment and use variable(rtc) for error print
 - Change flag name and move flag set before ufshcd_clear_cmd
 - Add SDB mode clear UTRLCLR tag receive OCS_ABORTED requeue

V2:
 - Fix mcq_enabled build error.

Peter Wang (3):
  ufs: core: fix the issue of ICU failure
  ufs: core: fix error handler process for MCQ abort
  ufs: core: add a quirk for MediaTek SDB mode aborted

 drivers/ufs/core/ufs-mcq.c      | 15 ++++++++-------
 drivers/ufs/core/ufshcd.c       | 28 ++++++++++++++++++++++++++--
 drivers/ufs/host/ufs-mediatek.c |  1 +
 include/ufs/ufshcd.h            |  6 ++++++
 4 files changed, 41 insertions(+), 9 deletions(-)

Comments

Peter Wang (王信友) Sept. 26, 2024, 3:45 a.m. UTC | #1
On Wed, 2024-09-25 at 09:49 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/25/24 2:55 AM, peter.wang@mediatek.com wrote:
> >   case OCS_INVALID_COMMAND_STATUS:
> >   result |= DID_REQUEUE << 16;
> > +dev_warn(hba->dev,
> > +"OCS invaild from controller for tag %d\n",
> > +lrbp->task_tag);
> 
> Please change "invaild" into "invalid". Once that change has been
> made,
> feel free to add:
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>


Hi Bart,

Sorry, this typo wasn't corrected.

However, I still feel that this patch is not quite reasonable.
The reason is that in the first patch,
"ufs: core: fix the issue of ICU failure"
we corrected the ICU problem, allowing the SQ to clean up 
correctly, while the CQ will have a corresponding response. 
But the second patch directly ignores the CQ's response and 
continues to use a workaround to release the command right 
after ufshcd_try_to_abort_task. 
The Legacy SDB mode's approach would not release the 
command after ufshcd_try_to_abort_task. 
Instead, it releases after the DBR is cleared.

Therefore, as I previously suggested, it would probably 
be more reasonable to directly requeue the ABORTED commands 
as shown in the patch below.

---------------------------------------------------------------------
---
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 24a32e2fd75e..06aa4ed1a9e6 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
struct ufshcd_lrb *lrbp,
 		}
 		break;
 	case OCS_ABORTED:
-		result |= DID_ABORT << 16;
-		break;
 	case OCS_INVALID_COMMAND_STATUS:
 		result |= DID_REQUEUE << 16;
+		dev_warn(hba->dev,
+				"OCS %s from controller for tag %d\n",
+				(ocs == OCS_ABORTED? "aborted" :
"invalid"),
+				lrbp->task_tag);
 		break;
 	case OCS_INVALID_CMD_TABLE_ATTR:
 	case OCS_INVALID_PRDT_ATTR:
@@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request
*rq, void *priv)
 	struct scsi_device *sdev = cmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	struct ufs_hba *hba = shost_priv(shost);
-	struct ufshcd_lrb *lrbp = &hba->lrb[tag];
-	struct ufs_hw_queue *hwq;
-	unsigned long flags;
 
 	*ret = ufshcd_try_to_abort_task(hba, tag);
 	dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
 		hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
 		*ret ? "failed" : "succeeded");
 
-	/* Release cmd in MCQ mode if abort succeeds */
-	if (hba->mcq_enabled && (*ret == 0)) {
-		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp-
>cmd));
-		if (!hwq)
-			return 0;
-		spin_lock_irqsave(&hwq->cq_lock, flags);
-		if (ufshcd_cmd_inflight(lrbp->cmd))
-			ufshcd_release_scsi_cmd(hba, lrbp);
-		spin_unlock_irqrestore(&hwq->cq_lock, flags);
-	}
-
 	return *ret == 0;
 }
 
---------------------------------------------------------------------
---


This patch has several advantages:

1. It makes the patch 'ufs: core: fix the issue of ICU failure' 
   seem valuable.
2. The patch is more concise.
3. There is no need to fetch OCS to determine OCS: ABORTED 
   on every CQ completion, which increases ISR time.
4. The err_handler flow for SDB and MCQ would be consistent.
5. There is no need for the MediaTek SDB quirk.


What do you think?"
Bart Van Assche Sept. 26, 2024, 6:26 p.m. UTC | #2
On 9/25/24 8:45 PM, Peter Wang (王信友) wrote:
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 24a32e2fd75e..06aa4ed1a9e6 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
> struct ufshcd_lrb *lrbp,
>   		}
>   		break;
>   	case OCS_ABORTED:
> -		result |= DID_ABORT << 16;
> -		break;
>   	case OCS_INVALID_COMMAND_STATUS:
>   		result |= DID_REQUEUE << 16;
> +		dev_warn(hba->dev,
> +				"OCS %s from controller for tag %d\n",
> +				(ocs == OCS_ABORTED? "aborted" :
> "invalid"),
> +				lrbp->task_tag);
>   		break;
>   	case OCS_INVALID_CMD_TABLE_ATTR:
>   	case OCS_INVALID_PRDT_ATTR:
> @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request
> *rq, void *priv)
>   	struct scsi_device *sdev = cmd->device;
>   	struct Scsi_Host *shost = sdev->host;
>   	struct ufs_hba *hba = shost_priv(shost);
> -	struct ufshcd_lrb *lrbp = &hba->lrb[tag];
> -	struct ufs_hw_queue *hwq;
> -	unsigned long flags;
>   
>   	*ret = ufshcd_try_to_abort_task(hba, tag);
>   	dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
>   		hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
>   		*ret ? "failed" : "succeeded");
>   
> -	/* Release cmd in MCQ mode if abort succeeds */
> -	if (hba->mcq_enabled && (*ret == 0)) {
> -		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp-
>> cmd));
> -		if (!hwq)
> -			return 0;
> -		spin_lock_irqsave(&hwq->cq_lock, flags);
> -		if (ufshcd_cmd_inflight(lrbp->cmd))
> -			ufshcd_release_scsi_cmd(hba, lrbp);
> -		spin_unlock_irqrestore(&hwq->cq_lock, flags);
> -	}
> -
>   	return *ret == 0;
>   }
>   
> ---------------------------------------------------------------------
> 
> 
> This patch has several advantages:
> 
> 1. It makes the patch 'ufs: core: fix the issue of ICU failure'
>     seem valuable.
> 2. The patch is more concise.
> 3. There is no need to fetch OCS to determine OCS: ABORTED
>     on every CQ completion, which increases ISR time.
> 4. The err_handler flow for SDB and MCQ would be consistent.
> 5. There is no need for the MediaTek SDB quirk.
> 
> 
> What do you think?"

Hi Peter,

Is the above patch sufficient? In MCQ mode, aborting a command happens
as follows (simplified):
(1) Send the ABORT TASK TMF. If this TMF succeeds, no SQE will be
     generated. If this TMF succeeds it means that the SCSI command has
     reached the UFS device and hence is no longer present in any
     submission queue (SQ).
(2) If the command is still in a submission queue, nullify the SQE. In
     this case a CQE will be generated with status ABORTED.

It seems to me that the above patch handles (2) but not (1)?

Thanks,

Bart.