From patchwork Wed Jan 13 07:08:16 2016
X-Patchwork-Submitter: Andrew Pinski <apinski@cavium.com>
X-Patchwork-Id: 59653
From: Andrew Pinski <apinski@cavium.com>
To: pinskia@gmail.com,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Will Deacon <will.deacon@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>, Will Deacon <will.deacon@arm.com>
Subject: [PATCH 2/5] ARM64: Improve copy_page for 128 byte cache line
Date: Tue, 12 Jan 2016 23:08:16 -0800
Message-Id: <1452668899-3553-3-git-send-email-apinski@cavium.com>
In-Reply-To: <1452668899-3553-1-git-send-email-apinski@cavium.com>
References: <1452668899-3553-1-git-send-email-apinski@cavium.com>

For a 128 byte cache line, unrolling the copy loop to handle 128 bytes
per iteration is faster than the current 64 byte version. This is
adapted from: https://lkml.org/lkml/2016/1/6/497

Note this removes the explicit prefetch instructions, as they are
harmful on processors that include hardware prefetching. The next patch
in the series patches software prefetching back in for one target.

Signed-off-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/lib/copy_page.S |   47 ++++++++++++++++++++++++++++++++++++-------
 1 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index 512b9a7..dfb0316 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -19,6 +19,7 @@
 #include <asm/assembler.h>
 #include <asm/page.h>
 
+
 /*
  * Copy a page from src to dest (both are page aligned)
  *
@@ -27,20 +28,50 @@
  *	x1 - src
  */
 ENTRY(copy_page)
-	/* Assume cache line size is 64 bytes. */
-	prfm	pldl1strm, [x1, #64]
-1:	ldp	x2, x3, [x1]
+	ldp	x2, x3, [x1]
+	ldp	x4, x5, [x1, #16]
+	ldp	x6, x7, [x1, #32]
+	ldp	x8, x9, [x1, #48]
+	ldp	x10, x11, [x1, #64]
+	ldp	x12, x13, [x1, #80]
+	ldp	x14, x15, [x1, #96]
+	ldp	x16, x17, [x1, #112]
+
+	mov	x18, #(PAGE_SIZE - 128)
+	add	x1, x1, #128
+1:
+	subs	x18, x18, #128
+
+	stnp	x2, x3, [x0]
+	ldp	x2, x3, [x1]
+	stnp	x4, x5, [x0, #16]
 	ldp	x4, x5, [x1, #16]
+	stnp	x6, x7, [x0, #32]
 	ldp	x6, x7, [x1, #32]
+	stnp	x8, x9, [x0, #48]
 	ldp	x8, x9, [x1, #48]
-	add	x1, x1, #64
-	prfm	pldl1strm, [x1, #64]
+	stnp	x10, x11, [x0, #64]
+	ldp	x10, x11, [x1, #64]
+	stnp	x12, x13, [x0, #80]
+	ldp	x12, x13, [x1, #80]
+	stnp	x14, x15, [x0, #96]
+	ldp	x14, x15, [x1, #96]
+	stnp	x16, x17, [x0, #112]
+	ldp	x16, x17, [x1, #112]
+
+	add	x0, x0, #128
+	add	x1, x1, #128
+
+	b.gt	1b
+
 	stnp	x2, x3, [x0]
 	stnp	x4, x5, [x0, #16]
 	stnp	x6, x7, [x0, #32]
 	stnp	x8, x9, [x0, #48]
-	add	x0, x0, #64
-	tst	x1, #(PAGE_SIZE - 1)
-	b.ne	1b
+	stnp	x10, x11, [x0, #64]
+	stnp	x12, x13, [x0, #80]
+	stnp	x14, x15, [x0, #96]
+	stnp	x16, x17, [x0, #112]
+
 	ret
ENDPROC(copy_page)
-- 
1.7.2.5
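
The new loop is software-pipelined: the stnp stores for one 128 byte
block are interleaved with the ldp loads for the next block, so a batch
of loads is always in flight while the previous stores drain, and the
loop exits with one block still in registers for the epilogue to store.
A rough C model of that structure follows. It is a sketch only, assuming
4K pages; the SKETCH_* macros and copy_page_sketch() are invented names
for illustration, and plain memcpy() cannot express the non-temporal
hint that stnp carries.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4K pages */
#define SKETCH_BLOCK      128	/* bytes per iteration = one cache line */

/* Models the pipelined copy; buf[] stands in for registers x2..x17. */
static void copy_page_sketch(void *dst, const void *src)
{
	uint64_t buf[SKETCH_BLOCK / sizeof(uint64_t)];
	const char *s = src;
	char *d = dst;
	long remaining = SKETCH_PAGE_SIZE - SKETCH_BLOCK; /* mov x18, ... */

	memcpy(buf, s, SKETCH_BLOCK);	/* prologue: first ldp batch */
	s += SKETCH_BLOCK;		/* add x1, x1, #128 */

	do {
		remaining -= SKETCH_BLOCK;	/* subs x18, x18, #128 */
		memcpy(d, buf, SKETCH_BLOCK);	/* stnp batch for block N */
		memcpy(buf, s, SKETCH_BLOCK);	/* ldp batch for block N+1 */
		d += SKETCH_BLOCK;
		s += SKETCH_BLOCK;
	} while (remaining > 0);		/* b.gt 1b */

	memcpy(d, buf, SKETCH_BLOCK);	/* epilogue: store the last block */
}

int main(void)
{
	char *src = malloc(SKETCH_PAGE_SIZE);
	char *dst = malloc(SKETCH_PAGE_SIZE);
	int i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		src[i] = (char)i;
	copy_page_sketch(dst, src);
	printf("%s\n", memcmp(dst, src, SKETCH_PAGE_SIZE) ? "mismatch" : "ok");
	free(src);
	free(dst);
	return 0;
}

The stores stay stnp in the real routine because a freshly copied page
is unlikely to be read back immediately, so the non-temporal hint avoids
displacing useful lines from the cache; interleaving the next block's
loads between those stores is what hides the load latency once the
explicit prfm instructions are gone.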