diff mbox series

[v6,2/2] ufs: core: requeue aborted request

Message ID 20240911095622.19225-3-peter.wang@mediatek.com
State Superseded
Headers show
Series fix abort defect | expand

Commit Message

Peter Wang (王信友) Sept. 11, 2024, 9:56 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

ufshcd_abort_all forcibly aborts all on-going commands and the host
controller will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.

MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.

SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.

For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.

Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 35 ++++++++++++++++++++---------------
 include/ufs/ufshcd.h      |  3 +++
 2 files changed, 23 insertions(+), 15 deletions(-)

Comments

Bart Van Assche Sept. 11, 2024, 7:37 p.m. UTC | #1
On 9/11/24 2:56 AM, peter.wang@mediatek.com wrote:
> ufshcd_abort_all forcibly aborts all on-going commands and the host
> controller will automatically fill in the OCS field of the corresponding
> response with OCS_ABORTED based on different working modes.
> 
> MCQ mode: aborts a command using SQ cleanup, The host controller
> will post a Completion Queue entry with OCS = ABORTED.
> 
> SDB mode: aborts a command using UTRLCLR. Task Management response
> which means a Transfer Request was aborted.

I think this is incorrect. The UFSHCI specification does not require a
host controller to set the OCS field if a SCSI command is aborted by the
ABORT TMF nor if its resources are freed by writing into the UTRLCLR
register.

> @@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
>   		}
>   		break;
>   	case OCS_ABORTED:
> -		result |= DID_ABORT << 16;
> +		if (lrbp->abort_initiated_by_eh)
> +			result |= DID_REQUEUE << 16;
> +		else
> +			result |= DID_ABORT << 16;
>   		break;

Is the above change necessary? ufshcd_abort_one() aborts commands by
submitting an ABORT TMF. Hence, ufshcd_transfer_rsp_status() won't be
called if aborting the command succeeds.

> @@ -7561,6 +7551,21 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
>   		goto out;
>   	}
>   
> +	/*
> +	 * When the host software receives a "FUNCTION COMPLETE", set flag
> +	 * to requeue command after receive response with OCS_ABORTED
> +	 * SDB mode: UTRLCLR Task Management response which means a Transfer
> +	 *           Request was aborted.
> +	 * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
> +	 *
> +	 * This flag is set because error handler ufshcd_abort_all forcibly
> +	 * aborts all commands, and the host controller will automatically
> +	 * fill in the OCS field of the corresponding response with OCS_ABORTED.
> +	 * Therefore, upon receiving this response, it needs to be requeued.
> +	 */
> +	if (!err && ufshcd_eh_in_progress(hba))
> +		lrbp->abort_initiated_by_eh = true;
> +
>   	err = ufshcd_clear_cmd(hba, tag);
>   	if (err)
>   		dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",

The above change will cause lrbp->abort_initiated_by_eh to be set not
only if the UFS error handler decides to abort a command but also if the
SCSI core decides to abort a command. I think this is wrong.

Is this patch 2/2 necessary or is patch 1/2 perhaps sufficient?

Thanks,

Bart.
Peter Wang (王信友) Sept. 12, 2024, 1:49 p.m. UTC | #2
On Wed, 2024-09-11 at 12:37 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/11/24 2:56 AM, peter.wang@mediatek.com wrote:
> > ufshcd_abort_all forcibly aborts all on-going commands and the host
> > controller will automatically fill in the OCS field of the
> corresponding
> > response with OCS_ABORTED based on different working modes.
> > 
> > MCQ mode: aborts a command using SQ cleanup, The host controller
> > will post a Completion Queue entry with OCS = ABORTED.
> > 
> > SDB mode: aborts a command using UTRLCLR. Task Management response
> > which means a Transfer Request was aborted.
> 
> I think this is incorrect. The UFSHCI specification does not require
> a
> host controller to set the OCS field if a SCSI command is aborted by
> the
> ABORT TMF nor if its resources are freed by writing into the UTRLCLR
> register.
> 

Hi Bart,

I'm thinking of removing the SDB descriptive comment, so that 
it can be applicable in cases with OCS ABORTED, and it won't 
matter if there isn't OCS ABORTED because setting this flag 
won't have any effect. 
Or I add more MCQ check only set this flag in MCQ mode. 
What do you think?



> > @@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba
> *hba, struct ufshcd_lrb *lrbp,
> >   }
> >   break;
> >   case OCS_ABORTED:
> > -result |= DID_ABORT << 16;
> > +if (lrbp->abort_initiated_by_eh)
> > +result |= DID_REQUEUE << 16;
> > +else
> > +result |= DID_ABORT << 16;
> >   break;
> 
> Is the above change necessary? ufshcd_abort_one() aborts commands by
> submitting an ABORT TMF. Hence, ufshcd_transfer_rsp_status() won't be
> called if aborting the command succeeds.
> 

Yes, Just becasue ABORT TMF COMPLETE, in spec description:
"A response of FUNCTION COMPLETE shall indicate that the 
command was aborted or was not in the task set. In either case, 
the SCSI target device shall guarantee that no further 
requests or responses are sent from the command."

So at this time, the device will not have a corresponding 
response coming back. The host controller will automatically 
fill in the response for the corresponding command based on 
the results of SQ cleanup (MCQ) or UTRLCLR (DBR, mediatek), 
with the COS content being ABORTED by interrupt.
ISR will handle the response which is from host, not from device.


> > @@ -7561,6 +7551,21 @@ int ufshcd_try_to_abort_task(struct ufs_hba
> *hba, int tag)
> >   goto out;
> >   }
> >   
> > +/*
> > + * When the host software receives a "FUNCTION COMPLETE", set flag
> > + * to requeue command after receive response with OCS_ABORTED
> > + * SDB mode: UTRLCLR Task Management response which means a
> Transfer
> > + *           Request was aborted.
> > + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ
> cleanup
> > + *
> > + * This flag is set because error handler ufshcd_abort_all
> forcibly
> > + * aborts all commands, and the host controller will automatically
> > + * fill in the OCS field of the corresponding response with
> OCS_ABORTED.
> > + * Therefore, upon receiving this response, it needs to be
> requeued.
> > + */
> > +if (!err && ufshcd_eh_in_progress(hba))
> > +lrbp->abort_initiated_by_eh = true;
> > +
> >   err = ufshcd_clear_cmd(hba, tag);
> >   if (err)
> >   dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
> 
> The above change will cause lrbp->abort_initiated_by_eh to be set not
> only if the UFS error handler decides to abort a command but also if
> the
> SCSI core decides to abort a command. I think this is wrong.
> 
> Is this patch 2/2 necessary or is patch 1/2 perhaps sufficient?
> 
> Thanks,
> 
> Bart.

Sorry, I might have missed something, but I didn't see 
scsi abort (ufshcd_abort) calling ufshcd_set_eh_in_progress. 
So, during a SCSI abort, ufshcd_eh_in_progress(hba) 
should return false and not set this flag, right?


Additionally, SCSI abort (ufshcd_abort) will have different
return values for MCQ mode and DBR mode when the device 
does not respond with a response.
MCQ mode will receivce OCS_ABORTED (nullify)
	case OCS_ABORTED:
		result |= DID_ABORT << 16;
		break;
But DBR mode, OCS won't change, it is 0x0F
	case OCS_INVALID_COMMAND_STATUS:
		result |= DID_REQUEUE << 16;
		break;
In this case, should we also return DID_ABORT for DBR mode?


Thanks.
Peter
Bart Van Assche Sept. 12, 2024, 9:11 p.m. UTC | #3
On 9/12/24 6:49 AM, Peter Wang (王信友) wrote:
> On Wed, 2024-09-11 at 12:37 -0700, Bart Van Assche wrote:
[ ... ]
> So at this time, the device will not have a corresponding
> response coming back. The host controller will automatically
> fill in the response for the corresponding command based on
> the results of SQ cleanup (MCQ) or UTRLCLR (DBR, mediatek),
> with the COS content being ABORTED by interrupt.

I don't think so. In SDB mode, writing into the UTRLCLR register does
NOT cause the ABORTED status to be written into the OCS field. Even if
there would be UFSHCI controllers that do this, the UFSHCI specification
does not require this behavior and hence the UFS driver should not
assume this behavior.

>> The above change will cause lrbp->abort_initiated_by_eh to be set not
>> only if the UFS error handler decides to abort a command but also if
>> the
>> SCSI core decides to abort a command. I think this is wrong.
> 
> Sorry, I might have missed something, but I didn't see
> scsi abort (ufshcd_abort) calling ufshcd_set_eh_in_progress.
> So, during a SCSI abort, ufshcd_eh_in_progress(hba)
> should return false and not set this flag, right?

I think you are right so please ignore my comment above.

> Additionally, SCSI abort (ufshcd_abort) will have different
> return values for MCQ mode and DBR mode when the device
> does not respond with a response.
> MCQ mode will receivce OCS_ABORTED (nullify)
> 	case OCS_ABORTED:
> 		result |= DID_ABORT << 16;
> 		break;

No. ufshcd_abort() submits an abort TMF and the OCS status is not
modified if the abort TMF succeeds.

> But DBR mode, OCS won't change, it is 0x0F
> 	case OCS_INVALID_COMMAND_STATUS:
> 		result |= DID_REQUEUE << 16;
> 		break;
> In this case, should we also return DID_ABORT for DBR mode?

The above code comes from ufshcd_transfer_rsp_status(), isn't it?
ufshcd_transfer_rsp_status() should not be called if the SCSI core
aborts a SCSI command (ufshcd_abort()). It is not allowed to call
scsi_done() from a SCSI abort handler like ufshcd_abort(). SCSI
abort handlers must return SUCCESS, FAILED or FAST_IO_FAIL and let
the SCSI core decide whether to complete or whether to resubmit the
SCSI command.

Thanks,

Bart.
Peter Wang (王信友) Sept. 13, 2024, 7:18 a.m. UTC | #4
On Thu, 2024-09-12 at 14:11 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/12/24 6:49 AM, Peter Wang (王信友) wrote:
> > On Wed, 2024-09-11 at 12:37 -0700, Bart Van Assche wrote:
> [ ... ]
> > So at this time, the device will not have a corresponding
> > response coming back. The host controller will automatically
> > fill in the response for the corresponding command based on
> > the results of SQ cleanup (MCQ) or UTRLCLR (DBR, mediatek),
> > with the COS content being ABORTED by interrupt.
> 
> I don't think so. In SDB mode, writing into the UTRLCLR register does
> NOT cause the ABORTED status to be written into the OCS field. Even
> if
> there would be UFSHCI controllers that do this, the UFSHCI
> specification
> does not require this behavior and hence the UFS driver should not
> assume this behavior.
> 

Hi Bart,

Okay, I will add MCQ check only set this flag in MCQ mode.


> >> The above change will cause lrbp->abort_initiated_by_eh to be set
> not
> >> only if the UFS error handler decides to abort a command but also
> if
> >> the
> >> SCSI core decides to abort a command. I think this is wrong.
> > 
> > Sorry, I might have missed something, but I didn't see
> > scsi abort (ufshcd_abort) calling ufshcd_set_eh_in_progress.
> > So, during a SCSI abort, ufshcd_eh_in_progress(hba)
> > should return false and not set this flag, right?
> 
> I think you are right so please ignore my comment above.
> 
> > Additionally, SCSI abort (ufshcd_abort) will have different
> > return values for MCQ mode and DBR mode when the device
> > does not respond with a response.
> > MCQ mode will receivce OCS_ABORTED (nullify)
> > case OCS_ABORTED:
> > result |= DID_ABORT << 16;
> > break;
> 
> No. ufshcd_abort() submits an abort TMF and the OCS status is not
> modified if the abort TMF succeeds.
> 

Yes, TMF won't update the OCS status.
MCQ mode update OCS status by 
1. nullify: ufshcd_mcq_nullify_sqe
2. sq cleanup: ufshcd_mcq_sq_cleanup
In both case, host will fills OCS with ABORTED.



> > But DBR mode, OCS won't change, it is 0x0F
> > case OCS_INVALID_COMMAND_STATUS:
> > result |= DID_REQUEUE << 16;
> > break;
> > In this case, should we also return DID_ABORT for DBR mode?
> 
> The above code comes from ufshcd_transfer_rsp_status(), isn't it?
> ufshcd_transfer_rsp_status() should not be called if the SCSI core
> aborts a SCSI command (ufshcd_abort()). It is not allowed to call
> scsi_done() from a SCSI abort handler like ufshcd_abort(). SCSI
> abort handlers must return SUCCESS, FAILED or FAST_IO_FAIL and let
> the SCSI core decide whether to complete or whether to resubmit the
> SCSI command.
> 


Yes, it is from ufshcd_transfer_rsp_status, another ISR process.

Regarding the specification of UTRLDBR:
When a transfer request is "completed" (with success or error),
the corresponding bit is cleared to '0' by the host controller.

ufshcd_abort is error case that host controller clear UTRLDBR 
tag bit by UTRLCLR.

Regarding the specification of UTRCS:
This bit is set to '1' by the host controller upon one of the
following:
	Completion of a UTP transfer request with its UTRD
	Interrupt bit set to "1"
	Overall command Status (OCS) of the completed command is not 
	equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0'

So, this case is transfer request is "completed" with error.
Whether the OCS is 'ABORTED' or 'INVALID_OCS_VALUE'
We should always receive an interrupt upon completion 
becasue UTRCS will set to '1'


> Thanks,
> 
> Bart.
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..a0548a495f30 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@  static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
 
 	lrbp->req_abort_skip = false;
+	lrbp->abort_initiated_by_eh = false;
 
 	ufshcd_comp_scsi_upiu(hba, lrbp);
 
@@ -5404,7 +5405,10 @@  ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
 		}
 		break;
 	case OCS_ABORTED:
-		result |= DID_ABORT << 16;
+		if (lrbp->abort_initiated_by_eh)
+			result |= DID_REQUEUE << 16;
+		else
+			result |= DID_ABORT << 16;
 		break;
 	case OCS_INVALID_COMMAND_STATUS:
 		result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@  static bool ufshcd_abort_one(struct request *rq, void *priv)
 	struct scsi_device *sdev = cmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	struct ufs_hba *hba = shost_priv(shost);
-	struct ufshcd_lrb *lrbp = &hba->lrb[tag];
-	struct ufs_hw_queue *hwq;
-	unsigned long flags;
 
 	*ret = ufshcd_try_to_abort_task(hba, tag);
 	dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
 		hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
 		*ret ? "failed" : "succeeded");
 
-	/* Release cmd in MCQ mode if abort succeeds */
-	if (hba->mcq_enabled && (*ret == 0)) {
-		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
-		if (!hwq)
-			return 0;
-		spin_lock_irqsave(&hwq->cq_lock, flags);
-		if (ufshcd_cmd_inflight(lrbp->cmd))
-			ufshcd_release_scsi_cmd(hba, lrbp);
-		spin_unlock_irqrestore(&hwq->cq_lock, flags);
-	}
-
 	return *ret == 0;
 }
 
@@ -7561,6 +7551,21 @@  int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
 		goto out;
 	}
 
+	/*
+	 * When the host software receives a "FUNCTION COMPLETE", set flag
+	 * to requeue command after receive response with OCS_ABORTED
+	 * SDB mode: UTRLCLR Task Management response which means a Transfer
+	 *           Request was aborted.
+	 * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+	 *
+	 * This flag is set because error handler ufshcd_abort_all forcibly
+	 * aborts all commands, and the host controller will automatically
+	 * fill in the OCS field of the corresponding response with OCS_ABORTED.
+	 * Therefore, upon receiving this response, it needs to be requeued.
+	 */
+	if (!err && ufshcd_eh_in_progress(hba))
+		lrbp->abort_initiated_by_eh = true;
+
 	err = ufshcd_clear_cmd(hba, tag);
 	if (err)
 		dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..07b5e60e6c44 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@  struct ufs_pm_lvl_states {
  * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
  * @data_unit_num: the data unit number for the first block for inline crypto
  * @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_eh: The flag is specifically used to handle
+ *                         ufshcd_err_handler forcibly aborts.
  */
 struct ufshcd_lrb {
 	struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@  struct ufshcd_lrb {
 #endif
 
 	bool req_abort_skip;
+	bool abort_initiated_by_eh;
 };
 
 /**