mbox series

[v2,0/2] Add UFS LINERESET handling

Message ID 1599099873-32579-1-git-send-email-cang@codeaurora.org
Headers show
Series Add UFS LINERESET handling | expand

Message

Can Guo Sept. 3, 2020, 2:24 a.m. UTC
PA Layer issues a LINERESET to the PHY at the recovery step in the Power
Mode change operation. If it happens during auto or mannual hibern8 enter,
even if hibern8 enter succeeds, UFS power mode shall be set to PWM-G1 mode
and kept in that mode after exit from hibern8, leading to bad performance.
Handle the LINERESET in the eh_work by restoring power mode to HS mode
after all pending reqs and tasks are cleared from doorbell.

Change since v1:
- Made some cleanup to the 2nd change.

Can Guo (2):
  scsi: ufs: Abort tasks before clear them from doorbell
  scsi: ufs: Handle LINERESET indication in err handler

 drivers/scsi/ufs/ufshcd.c | 283 +++++++++++++++++++++++++++++++---------------
 drivers/scsi/ufs/ufshcd.h |   2 +
 drivers/scsi/ufs/ufshci.h |   1 +
 drivers/scsi/ufs/unipro.h |   3 +
 4 files changed, 196 insertions(+), 93 deletions(-)

Comments

James Bottomley Sept. 9, 2020, 5:05 a.m. UTC | #1
I can't reconcile this hunk:

On Wed, 2020-09-02 at 19:24 -0700, Can Guo wrote:
> @@ -6504,6 +6505,80 @@ static void ufshcd_set_req_abort_skip(struct
> ufs_hba *hba, unsigned long bitmap)
>   * issued. To avoid that, first issue UFS_QUERY_TASK to check if the
> command is
>   * really issued and then try to abort it.
>   *
> + * Returns zero on success, non-zero on failure
> + */
> +static int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
> +{
> +	struct ufshcd_lrb *lrbp = &hba->lrb[tag];
> +	int err = 0;
> +	int poll_cnt;
> +	u8 resp = 0xF;
> +	u32 reg;
> +
> +	for (poll_cnt = 100; poll_cnt; poll_cnt--) {
> +		err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp-
> >task_tag,
> +				UFS_QUERY_TASK, &resp);
> +		if (!err && resp ==
> UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) {
> +			/* cmd pending in the device */
> +			dev_err(hba->dev, "%s: cmd pending in the
> device. tag = %d\n",
> +				__func__, tag);
> +			break;
> +		} else if (!err && resp ==
> UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
> +			/*
> +			 * cmd not pending in the device, check if
> it is
> +			 * in transition.
> +			 */
> +			dev_err(hba->dev, "%s: cmd at tag %d not
> pending in the device.\n",
> +				__func__, tag);
> +			reg = ufshcd_readl(hba,
> REG_UTP_TRANSFER_REQ_DOOR_BELL);
> +			if (reg & (1 << tag)) {
> +				/* sleep for max. 200us to stabilize
> */
> +				usleep_range(100, 200);
> +				continue;
> +			}
> +			/* command completed already */
> +			dev_err(hba->dev, "%s: cmd at tag %d
> successfully cleared from DB.\n",
> +				__func__, tag);
> +			goto out;
> +		} else {
> +			dev_err(hba->dev,
> +				"%s: no response from device. tag =
> %d, err %d\n",
> +				__func__, tag, err);
> +			if (!err)
> +				err = resp; /* service response
> error */
> +			goto out;
> +		}
> +	}
> +
> +	if (!poll_cnt) {
> +		err = -EBUSY;
> +		goto out;
> +	}
> +
> +	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
> +			UFS_ABORT_TASK, &resp);
> +	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
> +		if (!err) {
> +			err = resp; /* service response error */
> +			dev_err(hba->dev, "%s: issued. tag = %d, err
> %d\n",
> +				__func__, tag, err);
> +		}
> +		goto out;
> +	}
> +
> +	err = ufshcd_clear_cmd(hba, tag);
> +	if (err)
> +		dev_err(hba->dev, "%s: Failed clearing cmd at tag
> %d, err %d\n",
> +			__func__, tag, err);
> +
> +out:
> +	return err;
> +}
> +
> +/**
> + * ufshcd_abort - scsi host template eh_abort_handler callback
> + * @cmd: SCSI command pointer
> + *
>   * Returns SUCCESS/FAILED
>   */
>  static int ufshcd_abort(struct scsi_cmnd *cmd)
> @@ -6513,8 +6588,6 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
>  	unsigned long flags;
>  	unsigned int tag;
>  	int err = 0;
> -	int poll_cnt;
> -	u8 resp = 0xF;
>  	struct ufshcd_lrb *lrbp;
>  	u32 reg;
>  
> @@ -6583,63 +6656,9 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
>  		goto out;
>  	}
>  
> -	for (poll_cnt = 100; poll_cnt; poll_cnt--) {
> -		err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp-
> >task_tag,
> -				UFS_QUERY_TASK, &resp);
> -		if (!err && resp ==
> UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) {
> -			/* cmd pending in the device */
> -			dev_err(hba->dev, "%s: cmd pending in the
> device. tag = %d\n",
> -				__func__, tag);
> -			break;
> -		} else if (!err && resp ==
> UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
> -			/*
> -			 * cmd not pending in the device, check if
> it is
> -			 * in transition.
> -			 */
> -			dev_err(hba->dev, "%s: cmd at tag %d not
> pending in the device.\n",
> -				__func__, tag);
> -			reg = ufshcd_readl(hba,
> REG_UTP_TRANSFER_REQ_DOOR_BELL);
> -			if (reg & (1 << tag)) {
> -				/* sleep for max. 200us to stabilize
> */
> -				usleep_range(100, 200);
> -				continue;
> -			}
> -			/* command completed already */
> -			dev_err(hba->dev, "%s: cmd at tag %d
> successfully cleared from DB.\n",
> -				__func__, tag);
> -			goto out;
> -		} else {
> -			dev_err(hba->dev,
> -				"%s: no response from device. tag =
> %d, err %d\n",
> -				__func__, tag, err);
> -			if (!err)
> -				err = resp; /* service response
> error */
> -			goto out;
> -		}
> -	}
> -
> -	if (!poll_cnt) {
> -		err = -EBUSY;
> -		goto out;
> -	}
> -
> -	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
> -			UFS_ABORT_TASK, &resp);
> -	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
> -		if (!err) {
> -			err = resp; /* service response error */
> -			dev_err(hba->dev, "%s: issued. tag = %d, err
> %d\n",
> -				__func__, tag, err);
> -		}
> -		goto out;
> -	}
> -
> -	err = ufshcd_clear_cmd(hba, tag);
> -	if (err) {
> -		dev_err(hba->dev, "%s: Failed clearing cmd at tag
> %d, err %d\n",
> -			__func__, tag, err);
> +	err = ufshcd_try_to_abort_task(hba, tag);
> +	if (err)
>  		goto out;
> -	}
>  
>  	spin_lock_irqsave(host->host_lock, flags);
>  	__ufshcd_transfer_req_compl(hba, (1UL << tag));


With the change in this fix:

commit b10178ee7fa88b68a9e8adc06534d2605cb0ec23
Author: Stanley Chu <stanley.chu@mediatek.com>
Date:   Tue Aug 11 16:18:58 2020 +0200

    scsi: ufs: Clean up completed request without interrupt
notification


It looks like there have to be two separate error returns from your new
ufshcd_try_to_abort_function() so it knows to continue with
usfhcd_transfer_req_complete(), or the whole function needs to be
refactored, but if this goes upstream as is it looks like it will
eliminate the bug fix.

James
Martin K. Petersen Sept. 10, 2020, 2:32 a.m. UTC | #2
Can and Stanley,

> I can't reconcile this hunk:

Please provide a resolution for these conflicting commits in fixes and
queue:

307348f6ab14 scsi: ufs: Abort tasks before clearing them from doorbell

b10178ee7fa8 scsi: ufs: Clean up completed request without interrupt
notification

Thanks!
Stanley Chu Sept. 10, 2020, 2:48 a.m. UTC | #3
Hi Martin, Can,

On Wed, 2020-09-09 at 22:32 -0400, Martin K. Petersen wrote:
> Can and Stanley,

> 

> > I can't reconcile this hunk:

> 

> Please provide a resolution for these conflicting commits in fixes and

> queue:

> 

> 307348f6ab14 scsi: ufs: Abort tasks before clearing them from doorbell

> 

> b10178ee7fa8 scsi: ufs: Clean up completed request without interrupt

> notification

> 


Can's patch has considered my fix in the new flow.

To be more clear, for the fixing case in my patch,
ufshcd_try_to_abort_task() will return 0 (err = 0) and finally the
target tag can be completed and cleared by __ufshcd_transfer_req_compl()
in Can's new flow.

Thus I think the resolution can just using the code in Can's patch.

Can, please correct me if I was wrong.

Thanks,
Stanley Chu


> Thanks!

>
Stanley Chu Sept. 10, 2020, 8:18 a.m. UTC | #4
Hi James,

On Wed, 2020-09-09 at 23:18 -0700, James Bottomley wrote:
> On Thu, 2020-09-10 at 10:48 +0800, Stanley Chu wrote:

> > Hi Martin, Can,

> > 

> > On Wed, 2020-09-09 at 22:32 -0400, Martin K. Petersen wrote:

> > > Can and Stanley,

> > > 

> > > > I can't reconcile this hunk:

> > > 

> > > Please provide a resolution for these conflicting commits in fixes

> > > and

> > > queue:

> > > 

> > > 307348f6ab14 scsi: ufs: Abort tasks before clearing them from

> > > doorbell

> > > 

> > > b10178ee7fa8 scsi: ufs: Clean up completed request without

> > > interrupt

> > > notification

> > > 

> > 

> > Can's patch has considered my fix in the new flow.

> > 

> > To be more clear, for the fixing case in my patch,

> > ufshcd_try_to_abort_task() will return 0 (err = 0) and finally the

> > target tag can be completed and cleared by

> > __ufshcd_transfer_req_compl()

> > in Can's new flow.

> > 

> > Thus I think the resolution can just using the code in Can's patch.

> > 

> > Can, please correct me if I was wrong.

> 

> Well, that really doesn't make for an easy merge. The resolution I took

> is below.

> 

> James

> 

> ---

> 

> commit 5399a4aa684d491c35a386effe385c06b41398fa

> Merge: 59958f7a956b 8c6572356646

> Author: James Bottomley <James.Bottomley@HansenPartnership.com>

> Date:   Wed Sep 9 23:12:52 2020 -0700

> 

>     Merge branch 'misc' into for-next

>     

>     Conflicts:

>             drivers/scsi/ufs/ufshcd.c

>             drivers/scsi/ufs/ufshcd.h

> 

> diff --cc drivers/scsi/ufs/ufshcd.c

> index 34e1ab407b05,05716f62febe..49478c8a601f

> --- a/drivers/scsi/ufs/ufshcd.c

> +++ b/drivers/scsi/ufs/ufshcd.c

> @@@ -6574,84 -6739,22 +6736,25 @@@ static int ufshcd_abort(struct scsi_cmn

>   	}

>   	hba->req_abort_count++;

>   

>  -	/* Skip task abort in case previous aborts failed and report failure */

>  -	if (lrbp->req_abort_skip) {

>  -		err = -EIO;

>  -		goto out;

>  +	if (!(reg & (1 << tag))) {

>  +		dev_err(hba->dev,

>  +		"%s: cmd was completed, but without a notifying intr, tag = %d",

>  +		__func__, tag);

>  +		goto cleanup;

>   	}

>   

>  -	err = ufshcd_try_to_abort_task(hba, tag);

>  -	if (err)

>  -		goto out;

>  -

>  -	spin_lock_irqsave(host->host_lock, flags);

>  -	__ufshcd_transfer_req_compl(hba, (1UL << tag));

>  -	spin_unlock_irqrestore(host->host_lock, flags);

>  +	/* Skip task abort in case previous aborts failed and report failure */

> - 	if (lrbp->req_abort_skip) {

> ++	if (lrbp->req_abort_skip)

>  +		err = -EIO;

> - 		goto out;

> - 	}

> - 

> - 	for (poll_cnt = 100; poll_cnt; poll_cnt--) {

> - 		err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,

> - 				UFS_QUERY_TASK, &resp);

> - 		if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) {

> - 			/* cmd pending in the device */

> - 			dev_err(hba->dev, "%s: cmd pending in the device. tag = %d\n",

> - 				__func__, tag);

> - 			break;

> - 		} else if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_COMPL) {

> - 			/*

> - 			 * cmd not pending in the device, check if it is

> - 			 * in transition.

> - 			 */

> - 			dev_err(hba->dev, "%s: cmd at tag %d not pending in the device.\n",

> - 				__func__, tag);

> - 			reg = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);

> - 			if (reg & (1 << tag)) {

> - 				/* sleep for max. 200us to stabilize */

> - 				usleep_range(100, 200);

> - 				continue;

> - 			}

> - 			/* command completed already */

> - 			dev_err(hba->dev, "%s: cmd at tag %d successfully cleared from DB.\n",

> - 				__func__, tag);

> - 			goto cleanup;

> - 		} else {

> - 			dev_err(hba->dev,

> - 				"%s: no response from device. tag = %d, err %d\n",

> - 				__func__, tag, err);

> - 			if (!err)

> - 				err = resp; /* service response error */

> - 			goto out;

> - 		}

> - 	}

> - 

> - 	if (!poll_cnt) {

> - 		err = -EBUSY;

> - 		goto out;

> - 	}

> - 

> - 	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,

> - 			UFS_ABORT_TASK, &resp);

> - 	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {

> - 		if (!err) {

> - 			err = resp; /* service response error */

> - 			dev_err(hba->dev, "%s: issued. tag = %d, err %d\n",

> - 				__func__, tag, err);

> - 		}

> - 		goto out;

> - 	}

> - 

> - 	err = ufshcd_clear_cmd(hba, tag);

> - 	if (err) {

> - 		dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",

> - 			__func__, tag, err);

> - 		goto out;

> - 	}

> ++	else

> ++		err = ufshcd_try_to_abort_task(hba, tag);

>   

>  -out:

> + 	if (!err) {

>  +cleanup:


Yeah, considering Bean Huo's patch "scsi: ufs: No need to send Abort
Task if the task in DB was cleared", "cleanup" label shall be added back
here.

So your resolution looks good to me.

Thanks so much : )

Stanley Chu

> - 	spin_lock_irqsave(host->host_lock, flags);

> - 	__ufshcd_transfer_req_compl(hba, (1UL << tag));

> - 	spin_unlock_irqrestore(host->host_lock, flags);

> - 

> ++		spin_lock_irqsave(host->host_lock, flags);

> ++		__ufshcd_transfer_req_compl(hba, (1UL << tag));

> ++		spin_unlock_irqrestore(host->host_lock, flags);

>  +out:

> - 	if (!err) {

>   		err = SUCCESS;

>   	} else {

>   		dev_err(hba->dev, "%s: failed with err %d\n", __func__, err);

> diff --cc drivers/scsi/ufs/ufshcd.h

> index b5b2761456fb,8011fdc89fb1..6663325ed8a0

> --- a/drivers/scsi/ufs/ufshcd.h

> +++ b/drivers/scsi/ufs/ufshcd.h

> @@@ -531,11 -531,10 +531,16 @@@ enum ufshcd_quirks 

>   	 */

>   	UFSHCD_QUIRK_BROKEN_OCS_FATAL_ERROR		= 1 << 10,

>   

>  +	/*

>  +	 * This quirk needs to be enabled if the host controller has

>  +	 * auto-hibernate capability but it doesn't work.

>  +	 */

>  +	UFSHCD_QUIRK_BROKEN_AUTO_HIBERN8		= 1 << 11,

> ++

> + 	/*

> + 	 * This quirk needs to disable manual flush for write booster

> + 	 */

>  -	UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL		= 1 << 11,

> ++	UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL		= 1 << 12,

>   };

>   

>   enum ufshcd_caps {
James Bottomley Sept. 10, 2020, 4:09 p.m. UTC | #5
On Thu, 2020-09-10 at 16:18 +0800, Stanley Chu wrote:
[...]
> > + 	if (!err) {
> >  +cleanup:
> 
> Yeah, considering Bean Huo's patch "scsi: ufs: No need to send Abort
> Task if the task in DB was cleared", "cleanup" label shall be added
> back here.
> 
> So your resolution looks good to me.
> 
> Thanks so much : )

You're welcome ... but just remember I have to explain this to Linus
when the merge window opens.  It would be a lot easier if this hadn't
happened so please don't make it any worse ...

James
Can Guo Sept. 11, 2020, 2:16 a.m. UTC | #6
Hi James and Stanley,

On 2020-09-11 00:09, James Bottomley wrote:
> On Thu, 2020-09-10 at 16:18 +0800, Stanley Chu wrote:
> [...]
>> > + 	if (!err) {
>> >  +cleanup:
>> 
>> Yeah, considering Bean Huo's patch "scsi: ufs: No need to send Abort
>> Task if the task in DB was cleared", "cleanup" label shall be added
>> back here.
>> 
>> So your resolution looks good to me.
>> 
>> Thanks so much : )
> 
> You're welcome ... but just remember I have to explain this to Linus
> when the merge window opens.  It would be a lot easier if this hadn't
> happened so please don't make it any worse ...
> 
> James

Sorry that my changes got you confused and thank you for help resolve 
the
conflicts. My change ("scsi: ufs: Abort tasks before clearing them from
doorbell") is to serve my fixes to ufs error recovery which only got 
picked
up on scsi-queue-5.10. So I checked out to scsi-queue-5.10 and made my
changes on the tip of scsi-queue-5.10, below 2 changes were not even
present in scsi-queue-5.10 back that time.

scsi: ufs: Clean up completed request without interrupt notification
scsi: ufs: No need to send Abort Task if the task in DB was cleared

Is there anything wrong with my work flow above? Please let me know the
right process so that I can avoid such conflicts in my next changes, 
which
also touch the func ufshcd_abort(). Thanks!

Regards,

Can Guo.
Can Guo Sept. 15, 2020, 3:14 a.m. UTC | #7
Hi Bean,

On 2020-09-11 17:09, Bean Huo wrote:
> On Fri, 2020-09-11 at 02:16 +0000, Can Guo wrote:

>> > >

>> > > So your resolution looks good to me.

>> > >

>> > > Thanks so much : )

>> >

>> > You're welcome ... but just remember I have to explain this to

>> > Linus

>> > when the merge window opens.  It would be a lot easier if this

>> > hadn't

>> > happened so please don't make it any worse ...

>> >

>> > James

>> 

>> Sorry that my changes got you confused and thank you for help

>> resolve

>> the

>> conflicts. My change ("scsi: ufs: Abort tasks before clearing them

>> from

>> doorbell") is to serve my fixes to ufs error recovery which only got

>> picked

>> up on scsi-queue-5.10. So I checked out to scsi-queue-5.10 and made

>> my

>> changes on the tip of scsi-queue-5.10, below 2 changes were not even

>> present in scsi-queue-5.10 back that time.

> 

> I mentioned here https://patchwork.kernel.org/patch/11734713/


Do you know when can this change be picked up in scsi-queue-5.10?
If I push my fixes to ufshcd_abort() on scsi-queue-5.10, they will
run into conflicts with this one again, right? How should I move
forward now? Thanks.

Regards,

Can Guo.

> 

> this change (scsi: ufs: Abort tasks before clearing them from doorbell)

> has conflicts with the scsi-fixes branch. I don't know which branch is

> the main branch we should focus on.

> 

> 

> Bean
Martin K. Petersen Sept. 15, 2020, 8:21 p.m. UTC | #8
Can,

> Do you know when can this change be picked up in scsi-queue-5.10?

> If I push my fixes to ufshcd_abort() on scsi-queue-5.10, they will

> run into conflicts with this one again, right? How should I move

> forward now?


You should be able to use 5.10/scsi-queue as baseline now.

For 5.11 I think I'll do a separate branch for UFS.

-- 
Martin K. Petersen	Oracle Linux Engineering
Can Guo Sept. 16, 2020, 6:34 a.m. UTC | #9
On 2020-09-16 04:21, Martin K. Petersen wrote:
> Can,
> 
>> Do you know when can this change be picked up in scsi-queue-5.10?
>> If I push my fixes to ufshcd_abort() on scsi-queue-5.10, they will
>> run into conflicts with this one again, right? How should I move
>> forward now?
> 
> You should be able to use 5.10/scsi-queue as baseline now.
> 
> For 5.11 I think I'll do a separate branch for UFS.

Thanks for the information.

Regards,

Can Guo.