Message ID | 20210722033439.26550-12-bvanassche@acm.org |
---|---|
State | New |
Headers | show |
Series | UFS patches for kernel v5.15 | expand |
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote: > Using the UTRLCNR register involves two MMIO accesses in the hot path > while > > using the doorbell register only involves a single MMIO access. Since > MMIO > > accesses take time, do not use the UTRLCNR register. The spinlock > contention > > on the SCSI host lock that is reintroduced by this patch will be > addressed > > by a later patch. > > > > This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105. Bart, This commit is the key change in "Optimize host lock on TR send/compl paths and utilize UTRLCNR" https://patchwork.kernel.org/project/linux-scsi/cover/1621845419-14194-1-git-send-email-cang@codeaurora.org/. How did you compare the performance gain/loss after reverting this commit? Kind regards, Bean
On 7/29/21 1:03 AM, Bean Huo wrote: > On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote: >> Using the UTRLCNR register involves two MMIO accesses in the hot path >> while >> >> using the doorbell register only involves a single MMIO access. Since >> MMIO >> >> accesses take time, do not use the UTRLCNR register. The spinlock >> contention >> >> on the SCSI host lock that is reintroduced by this patch will be >> addressed >> >> by a later patch. >> >> >> >> This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105. > > Bart, > This commit is the key change in "Optimize host lock on TR send/compl > paths and utilize UTRLCNR" > https://patchwork.kernel.org/project/linux-scsi/cover/1621845419-14194-1-git-send-email-cang@codeaurora.org/. > > How did you compare the performance gain/loss after reverting this > commit? Hi Bean, I measured the performance impact of patches 11 and 12 combined. In a 4 KB read IOPS test I see that these two patches combined improve performance by 1%. This illustrates that the two MMIO accesses of the UTRLCNR register are slower than the single MMIO access of the doorbell register. Thanks, Bart.
On 7/29/21 9:10 AM, Bart Van Assche wrote: > On 7/29/21 1:03 AM, Bean Huo wrote: >> How did you compare the performance gain/loss after reverting this >> commit? > > I measured the performance impact of patches 11 and 12 combined. In a 4 > KB read IOPS test I see that these two patches combined improve > performance by 1%. This illustrates that the two MMIO accesses of the > UTRLCNR register are slower than the single MMIO access of the doorbell > register. A correction: I wanted to refer to the combination of patches 11 and 13 instead of 11 and 12. Bart.
On Thu, 2021-07-29 at 09:13 -0700, Bart Van Assche wrote: > > I measured the performance impact of patches 11 and 12 combined. In > > a 4 > > KB read IOPS test I see that these two patches combined improve > > performance by 1%. This illustrates that the two MMIO accesses of > > the > > UTRLCNR register are slower than the single MMIO access of the > > doorbell > > register. > > > A correction: I wanted to refer to the combination of patches 11 and > 13 > > instead of 11 and 12. > > > > Bart. Bart, thanks for your explanation. I tested this commit before, indeed there is performance improvement. Based on your comments, I will verify your statement. Bean
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote: > Using the UTRLCNR register involves two MMIO accesses in the hot path > while > using the doorbell register only involves a single MMIO access. Since > MMIO > accesses take time, do not use the UTRLCNR register. The spinlock > contention > on the SCSI host lock that is reintroduced by this patch will be > addressed > by a later patch. > > This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105. > > Cc: Adrian Hunter <adrian.hunter@intel.com> > Cc: Stanley Chu <stanley.chu@mediatek.com> > Cc: Can Guo <cang@codeaurora.org> > Cc: Asutosh Das <asutoshd@codeaurora.org> > Cc: Avri Altman <avri.altman@wdc.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > Hi Bart, I did the comparison test on my platform, it is very difficult to get a very clear and fair result between two changes. but lamely speaking, on the small chunk size read/write, your changes wins. but on the big chunk size, It is not very clear, the gap between the two changes can be ignored. Tested-by: Bean Huo <beanhuo@micron.com> Bean
On 8/2/21 8:24 AM, Bean Huo wrote: > I did the comparison test on my platform, it is very difficult to get a > very clear and fair result between two changes. but lamely speaking, on > the small chunk size read/write, your changes wins. but on the big > chunk size, It is not very clear, the gap between the two changes can > be ignored. > > Tested-by: Bean Huo <beanhuo@micron.com> Thanks for having tested this patch :-) Bart.
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 4c6d832f5e81..cb588b705fbb 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -2087,6 +2087,7 @@ static inline void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag) { struct ufshcd_lrb *lrbp = &hba->lrb[task_tag]; + unsigned long flags; lrbp->issue_time_stamp = ktime_get(); lrbp->compl_time_stamp = ktime_set(0, 0); @@ -2095,19 +2096,10 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag) ufshcd_clk_scaling_start_busy(hba); if (unlikely(ufshcd_should_inform_monitor(hba, lrbp))) ufshcd_start_monitor(hba, lrbp); - if (ufshcd_has_utrlcnr(hba)) { - set_bit(task_tag, &hba->outstanding_reqs); - ufshcd_writel(hba, 1 << task_tag, - REG_UTP_TRANSFER_REQ_DOOR_BELL); - } else { - unsigned long flags; - - spin_lock_irqsave(hba->host->host_lock, flags); - set_bit(task_tag, &hba->outstanding_reqs); - ufshcd_writel(hba, 1 << task_tag, - REG_UTP_TRANSFER_REQ_DOOR_BELL); - spin_unlock_irqrestore(hba->host->host_lock, flags); - } + spin_lock_irqsave(hba->host->host_lock, flags); + set_bit(task_tag, &hba->outstanding_reqs); + ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL); + spin_unlock_irqrestore(hba->host->host_lock, flags); /* Make sure that doorbell is committed immediately */ wmb(); } @@ -5234,17 +5226,17 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba, } /** - * ufshcd_trc_handler - handle transfer requests completion + * ufshcd_transfer_req_compl - handle SCSI and query command completion * @hba: per adapter instance - * @use_utrlcnr: get completed requests from UTRLCNR * * Returns * IRQ_HANDLED - If interrupt is valid * IRQ_NONE - If invalid interrupt */ -static irqreturn_t ufshcd_trc_handler(struct ufs_hba *hba, bool use_utrlcnr) +static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba) { - unsigned long completed_reqs = 0; + unsigned long completed_reqs, flags; + u32 tr_doorbell; /* Resetting interrupt aggregation counters first and reading the * DOOR_BELL afterward allows us to handle all the completed requests. @@ -5257,24 +5249,10 @@ static irqreturn_t ufshcd_trc_handler(struct ufs_hba *hba, bool use_utrlcnr) !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR)) ufshcd_reset_intr_aggr(hba); - if (use_utrlcnr) { - u32 utrlcnr; - - utrlcnr = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL); - if (utrlcnr) { - ufshcd_writel(hba, utrlcnr, - REG_UTP_TRANSFER_REQ_LIST_COMPL); - completed_reqs = utrlcnr; - } - } else { - unsigned long flags; - u32 tr_doorbell; - - spin_lock_irqsave(hba->host->host_lock, flags); - tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL); - completed_reqs = tr_doorbell ^ hba->outstanding_reqs; - spin_unlock_irqrestore(hba->host->host_lock, flags); - } + spin_lock_irqsave(hba->host->host_lock, flags); + tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL); + completed_reqs = tr_doorbell ^ hba->outstanding_reqs; + spin_unlock_irqrestore(hba->host->host_lock, flags); if (completed_reqs) { __ufshcd_transfer_req_compl(hba, completed_reqs); @@ -5756,7 +5734,7 @@ static void ufshcd_exception_event_handler(struct work_struct *work) /* Complete requests that have door-bell cleared */ static void ufshcd_complete_requests(struct ufs_hba *hba) { - ufshcd_trc_handler(hba, false); + ufshcd_transfer_req_compl(hba); ufshcd_tmc_handler(hba); } @@ -6397,7 +6375,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status) retval |= ufshcd_tmc_handler(hba); if (intr_status & UTP_TRANSFER_REQ_COMPL) - retval |= ufshcd_trc_handler(hba, ufshcd_has_utrlcnr(hba)); + retval |= ufshcd_transfer_req_compl(hba); return retval; } diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 8dcf0df770a2..a44baec43dd5 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -1176,11 +1176,6 @@ static inline u32 ufshcd_vops_get_ufs_hci_version(struct ufs_hba *hba) return ufshcd_readl(hba, REG_UFS_VERSION); } -static inline bool ufshcd_has_utrlcnr(struct ufs_hba *hba) -{ - return (hba->ufs_version >= ufshci_version(3, 0)); -} - static inline int ufshcd_vops_clk_scale_notify(struct ufs_hba *hba, bool up, enum ufs_notify_change_status status) { diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h index 5affb1fce5ad..de95be5d11d4 100644 --- a/drivers/scsi/ufs/ufshci.h +++ b/drivers/scsi/ufs/ufshci.h @@ -39,7 +39,6 @@ enum { REG_UTP_TRANSFER_REQ_DOOR_BELL = 0x58, REG_UTP_TRANSFER_REQ_LIST_CLEAR = 0x5C, REG_UTP_TRANSFER_REQ_LIST_RUN_STOP = 0x60, - REG_UTP_TRANSFER_REQ_LIST_COMPL = 0x64, REG_UTP_TASK_REQ_LIST_BASE_L = 0x70, REG_UTP_TASK_REQ_LIST_BASE_H = 0x74, REG_UTP_TASK_REQ_DOOR_BELL = 0x78,
Using the UTRLCNR register involves two MMIO accesses in the hot path while using the doorbell register only involves a single MMIO access. Since MMIO accesses take time, do not use the UTRLCNR register. The spinlock contention on the SCSI host lock that is reintroduced by this patch will be addressed by a later patch. This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Can Guo <cang@codeaurora.org> Cc: Asutosh Das <asutoshd@codeaurora.org> Cc: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/ufs/ufshcd.c | 52 +++++++++++---------------------------- drivers/scsi/ufs/ufshcd.h | 5 ---- drivers/scsi/ufs/ufshci.h | 1 - 3 files changed, 15 insertions(+), 43 deletions(-)