From patchwork Mon Jun 26 13:38:46 2017
X-Patchwork-Submitter: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
X-Patchwork-Id: 106326
From: Zhen Lei <thunder.leizhen@huawei.com>
To: Will Deacon, Joerg Roedel, linux-arm-kernel, iommu, Robin Murphy, linux-kernel
Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei, John Garry
Subject: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock contention
Date: Mon, 26 Jun 2017 21:38:46 +0800
Message-ID: <1498484330-10840-2-git-send-email-thunder.leizhen@huawei.com>
In-Reply-To: <1498484330-10840-1-git-send-email-thunder.leizhen@huawei.com>
References: <1498484330-10840-1-git-send-email-thunder.leizhen@huawei.com>
X-Mailer: git-send-email 1.9.5.msysgit.0

Every TLBI command must eventually be followed by a SYNC command that
guarantees the invalidation has fully completed. We can therefore just
add TLBI commands to the command queue and put off kicking the hardware
until a SYNC (or any other command) arrives. To prevent a subsequent
SYNC command from waiting too long because too many commands have been
delayed, the number of delayed commands is capped. In my tests, this
change gives the same performance as replacing writel with
writel_relaxed in queue_inc_prod. (A minimal user-space sketch of this
batching idea is included after the diff below.)

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

-- 
2.5.0

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 291da5f..4481123 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -337,6 +337,7 @@
 /* Command queue */
 #define CMDQ_ENT_DWORDS			2
 #define CMDQ_MAX_SZ_SHIFT		8
+#define CMDQ_MAX_DELAYED		32
 
 #define CMDQ_ERR_SHIFT			24
 #define CMDQ_ERR_MASK			0x7f
@@ -472,6 +473,7 @@ struct arm_smmu_cmdq_ent {
 		};
 	} cfgi;
 
+	#define CMDQ_OP_TLBI_NH_ALL	0x10
 	#define CMDQ_OP_TLBI_NH_ASID	0x11
 	#define CMDQ_OP_TLBI_NH_VA	0x12
 	#define CMDQ_OP_TLBI_EL2_ALL	0x20
@@ -499,6 +501,7 @@ struct arm_smmu_cmdq_ent {
 
 struct arm_smmu_queue {
 	int				irq; /* Wired interrupt */
+	u32				nr_delay;
 
 	__le64				*base;
 	dma_addr_t			base_dma;
@@ -722,11 +725,16 @@ static int queue_sync_prod(struct arm_smmu_queue *q)
 	return ret;
 }
 
-static void queue_inc_prod(struct arm_smmu_queue *q)
+static void queue_inc_swprod(struct arm_smmu_queue *q)
 {
-	u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
+	u32 prod = q->prod + 1;
 
 	q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
+}
+
+static void queue_inc_prod(struct arm_smmu_queue *q)
+{
+	queue_inc_swprod(q);
 	writel(q->prod, q->prod_reg);
 }
 
@@ -761,13 +769,24 @@ static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
 		*dst++ = cpu_to_le64(*src++);
 }
 
-static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
+static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
 {
 	if (queue_full(q))
 		return -ENOSPC;
 
 	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
-	queue_inc_prod(q);
+
+	/*
+	 * We don't want too many commands to be delayed, as that may cause
+	 * the following sync command to wait for a long time.
+	 */
+	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
+		queue_inc_swprod(q);
+	} else {
+		queue_inc_prod(q);
+		q->nr_delay = 0;
+	}
+
 	return 0;
 }
 
@@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 				    struct arm_smmu_cmdq_ent *ent)
 {
+	int optimize = 0;
 	u64 cmd[CMDQ_ENT_DWORDS];
 	unsigned long flags;
 	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
@@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 		return;
 	}
 
+	/*
+	 * All TLBI commands should be followed by a sync command later.
+	 * The same holds for CFGI commands, but they are rarely issued,
+	 * so only TLBI commands are optimized for now to keep this check cheap.
+	 */
+	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
+	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
+		optimize = 1;
+
 	spin_lock_irqsave(&smmu->cmdq.lock, flags);
-	while (queue_insert_raw(q, cmd) == -ENOSPC) {
+	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
 		if (queue_poll_cons(q, false, wfe))
 			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
 	}
@@ -1953,6 +1982,8 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 		     << Q_BASE_LOG2SIZE_SHIFT;
 
 	q->prod = q->cons = 0;
+	q->nr_delay = 0;
+
 	return 0;
 }
 
@@ -2512,6 +2543,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 		dev_err(smmu->dev, "unit-length command queue not supported\n");
 		return -ENXIO;
 	}
+	BUILD_BUG_ON(CMDQ_MAX_DELAYED >= (1 << CMDQ_MAX_SZ_SHIFT));
 
 	smmu->evtq.q.max_n_shift = min((u32)EVTQ_MAX_SZ_SHIFT,
 				       reg >> IDR1_EVTQ_SHIFT & IDR1_EVTQ_MASK);
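
Note (illustration only): the hunks above split the old queue_inc_prod() into
queue_inc_swprod(), which only advances the software producer index, and
queue_inc_prod(), which also writes the hardware producer register. The
minimal, self-contained user-space sketch below models that batching scheme;
it is not the driver code, and the names (demo_insert, fake_prod_reg,
DEMO_MAX_DELAYED, etc.) and the cap value are invented for the example.

/*
 * User-space model of the "delay TLBI, kick on SYNC" idea from this patch.
 * Illustrative sketch only; the hardware doorbell is emulated with a
 * plain variable (fake_prod_reg) and the queue-full case is ignored.
 */
#include <stdio.h>
#include <stdbool.h>

#define DEMO_QUEUE_SLOTS	8	/* hypothetical queue depth */
#define DEMO_MAX_DELAYED	4	/* cap on commands left "unkicked" */

enum demo_opcode { CMD_TLBI = 0x12, CMD_SYNC = 0x46 };

static unsigned int sw_prod;		/* software producer index */
static unsigned int fake_prod_reg;	/* stands in for the PROD register */
static unsigned int nr_delay;		/* commands queued but not kicked */

/* Equivalent of queue_inc_swprod(): advance the software index only. */
static void demo_inc_swprod(void)
{
	sw_prod = (sw_prod + 1) % DEMO_QUEUE_SLOTS;
}

/* Equivalent of queue_inc_prod(): advance and "write" the register. */
static void demo_inc_prod(void)
{
	demo_inc_swprod();
	fake_prod_reg = sw_prod;
	printf("  -> doorbell written, prod=%u\n", fake_prod_reg);
}

static void demo_insert(enum demo_opcode op)
{
	bool delayable = (op == CMD_TLBI);

	printf("insert opcode 0x%x\n", (unsigned int)op);
	if (delayable && ++nr_delay < DEMO_MAX_DELAYED) {
		demo_inc_swprod();	/* defer the doorbell */
	} else {
		demo_inc_prod();	/* kick the hardware now */
		nr_delay = 0;
	}
}

int main(void)
{
	/*
	 * A burst of invalidations followed by a sync: only the 4th TLBI
	 * (cap reached) and the SYNC trigger a doorbell write.
	 */
	for (int i = 0; i < 5; i++)
		demo_insert(CMD_TLBI);
	demo_insert(CMD_SYNC);
	return 0;
}

Built with any C99 compiler, the sketch prints a doorbell write only when the
delay cap is hit or when CMD_SYNC is inserted, which is the behaviour the
patch aims for on the real command queue.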