Message ID | 20230927033557.13801-1-peter.wang@mediatek.com |
---|---|
State | New |
Headers | show |
Series | [v5] ufs: core: wlun send SSU timeout recovery | expand |
On Wed, 2023-09-27 at 11:31 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/26/23 20:35, peter.wang@mediatek.com wrote: > > +if (hba->pm_op_in_progress) { > > +if (ufshcd_link_recovery(hba)) > > +err = FAILED; > > + > > +return err; > > +} > > This patch looks good to me but I wish the above code would have been > written using the style that is preferred in the Linux kernel: > > if (hba->pm_op_in_progress && ufshcd_link_recovery(hba) < 0) > return FAILED; > > Thanks, > > Bart. > Hi Bart, It looks more concise but cannot help in this deadlock case. Because if pm_op_in_progress is true, and ufshcd_link_recovery return 0, we should return SUCCESS directly, else go forward in this function eh_work will be triggered and deadlock happen. Thanks. Peter
On 9/26/23 20:35, peter.wang@mediatek.com wrote: > When runtime pm send SSU times out, the SCSI core invokes > eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler > schedule eh_work and stuck at wait flush_work(&hba->eh_work). > However, ufshcd_err_handler hangs in wait rpm resume. > Do link recovery only in this case. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On Wed, Sep 27, 2023 at 1:43 PM <peter.wang@mediatek.com> wrote: > > From: Peter Wang <peter.wang@mediatek.com> > > When runtime pm send SSU times out, the SCSI core invokes > eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler > schedule eh_work and stuck at wait flush_work(&hba->eh_work). > However, ufshcd_err_handler hangs in wait rpm resume. > Do link recovery only in this case. > Below is IO hang stack dump in kernel-6.1 Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Peter, > When runtime pm send SSU times out, the SCSI core invokes > eh_host_reset_handler, which hooks function > ufshcd_eh_host_reset_handler schedule eh_work and stuck at wait > flush_work(&hba->eh_work). However, ufshcd_err_handler hangs in wait > rpm resume. Do link recovery only in this case. Applied to 6.7/scsi-staging, thanks!
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index c2df07545f96..0619cefa092e 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7716,6 +7716,20 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) hba = shost_priv(cmd->device->host); + /* + * If runtime pm send SSU and got timeout, scsi_error_handler + * stuck at this function to wait for flush_work(&hba->eh_work). + * And ufshcd_err_handler(eh_work) stuck at wait for runtime pm active. + * Do ufshcd_link_recovery instead schedule eh_work can prevent + * dead lock to happen. + */ + if (hba->pm_op_in_progress) { + if (ufshcd_link_recovery(hba)) + err = FAILED; + + return err; + } + spin_lock_irqsave(hba->host->host_lock, flags); hba->force_reset = true; ufshcd_schedule_eh_work(hba);