From patchwork Tue Feb 2 12:46:25 2016
From: Will Deacon <will.deacon@arm.com>
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 3/4] arm64: lib: improve copy_page to deal with 128 bytes at a time
Date: Tue, 2 Feb 2016 12:46:25 +0000
Message-Id: <1454417186-21828-4-git-send-email-will.deacon@arm.com>
In-Reply-To: <1454417186-21828-1-git-send-email-will.deacon@arm.com>
References: <1454417186-21828-1-git-send-email-will.deacon@arm.com>
Cc: catalin.marinas@arm.com, Will Deacon <will.deacon@arm.com>,
 apinski@cavium.com, suzuki.poulose@arm.com

We want to avoid lots of different copy_page implementations, settling
for something that is "good enough" everywhere and hopefully easy to
understand and maintain whilst we're at it.

This patch reworks our copy_page implementation based on discussions
with Cavium on the list and benchmarking on Cortex-A processors so that:

  - The loop is unrolled to copy 128 bytes per iteration

  - The reads are offset so that we read from the next 128-byte block
    in the same iteration that we store the previous block

  - Explicit prefetch instructions are removed for now, since they hurt
    performance on CPUs with hardware prefetching

  - The loop exit condition is calculated at the start of the loop

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/lib/copy_page.S | 46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

-- 
2.1.4

diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index 512b9a7b980e..2534533ceb1d 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -27,20 +27,50 @@
  *	x1 - src
  */
 ENTRY(copy_page)
-	/* Assume cache line size is 64 bytes. */
-	prfm	pldl1strm, [x1, #64]
-1:	ldp	x2, x3, [x1]
+	ldp	x2, x3, [x1]
 	ldp	x4, x5, [x1, #16]
 	ldp	x6, x7, [x1, #32]
 	ldp	x8, x9, [x1, #48]
-	add	x1, x1, #64
-	prfm	pldl1strm, [x1, #64]
+	ldp	x10, x11, [x1, #64]
+	ldp	x12, x13, [x1, #80]
+	ldp	x14, x15, [x1, #96]
+	ldp	x16, x17, [x1, #112]
+
+	mov	x18, #(PAGE_SIZE - 128)
+	add	x1, x1, #128
+1:
+	subs	x18, x18, #128
 	stnp	x2, x3, [x0]
+	ldp	x2, x3, [x1]
 	stnp	x4, x5, [x0, #16]
+	ldp	x4, x5, [x1, #16]
 	stnp	x6, x7, [x0, #32]
+	ldp	x6, x7, [x1, #32]
 	stnp	x8, x9, [x0, #48]
-	add	x0, x0, #64
-	tst	x1, #(PAGE_SIZE - 1)
-	b.ne	1b
+	ldp	x8, x9, [x1, #48]
+	stnp	x10, x11, [x0, #64]
+	ldp	x10, x11, [x1, #64]
+	stnp	x12, x13, [x0, #80]
+	ldp	x12, x13, [x1, #80]
+	stnp	x14, x15, [x0, #96]
+	ldp	x14, x15, [x1, #96]
+	stnp	x16, x17, [x0, #112]
+	ldp	x16, x17, [x1, #112]
+
+	add	x0, x0, #128
+	add	x1, x1, #128
+
+	b.gt	1b
+
+	stnp	x2, x3, [x0]
+	stnp	x4, x5, [x0, #16]
+	stnp	x6, x7, [x0, #32]
+	stnp	x8, x9, [x0, #48]
+	stnp	x10, x11, [x0, #64]
+	stnp	x12, x13, [x0, #80]
+	stnp	x14, x15, [x0, #96]
+	stnp	x16, x17, [x0, #112]
+
 	ret
 ENDPROC(copy_page)
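
For readers less comfortable with the assembly, the copy scheme above can
be sketched roughly in C as follows. This is purely illustrative and not
part of the patch: the function name copy_page_sketch, the fixed 4K
PAGE_SIZE, and the use of memcpy with a uint64_t buffer standing in for
the x2-x17 ldp/stnp register pairs are all assumptions of the sketch. The
first 128-byte block is loaded before the loop, each iteration stores the
previous block while loading the next, and the last block is stored after
the loop exits.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096	/* assumption: 4K pages; the real code uses the kernel's PAGE_SIZE */

void copy_page_sketch(void *dst, const void *src)
{
	uint64_t blk[16];		/* stands in for x2-x17: one 128-byte block */
	const char *s = src;
	char *d = dst;
	long remaining = PAGE_SIZE - 128;	/* mov x18, #(PAGE_SIZE - 128) */

	memcpy(blk, s, 128);		/* load block 0 before entering the loop */
	s += 128;

	while (remaining > 0) {		/* subs x18 / b.gt in the assembly */
		remaining -= 128;
		memcpy(d, blk, 128);	/* stnp run: store the previously loaded block */
		memcpy(blk, s, 128);	/* ldp run: load the next block */
		d += 128;
		s += 128;
	}

	memcpy(d, blk, 128);		/* final stnp run: drain the last block */
}

The interleaving of the loads for the next block with the stores for the
previous one is the "reads are offset" point from the second bullet in
the commit message.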