From patchwork Tue Mar 17 18:05:13 2015
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 45895
From: Ard Biesheuvel
To: will.deacon@arm.com, catalin.marinas@arm.com, herbert@gondor.apana.org.au,
    linux-arm-kernel@lists.infradead.org, linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel
Subject: [PATCH] arm64/crypto: issue aese/aesmc instructions in pairs
Date: Tue, 17 Mar 2015 19:05:13 +0100
Message-Id: <1426615513-28587-1-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 1.8.3.2
X-Mailing-List: linux-crypto@vger.kernel.org

This changes the AES core transform implementations to issue aese/aesmc
(and aesd/aesimc) instructions in pairs. This enables a micro-architectural
optimization in recent Cortex-A5x cores that improves performance by 50-90%.

Measured performance in cycles per byte (Cortex-A57):

            CBC enc    CBC dec    CTR
  before    3.64       1.34       1.32
  after     1.95       0.85       0.93

Note that this results in a ~5% performance decrease for older cores.

Signed-off-by: Ard Biesheuvel
---
Will,

This is the optimization you mentioned to me about a year ago
(or perhaps even longer?)
Anyway, we have now been able to confirm it on a sample 'in the wild'
(i.e., a Galaxy S6 phone).

 arch/arm64/crypto/aes-ce-ccm-core.S | 12 ++++++------
 arch/arm64/crypto/aes-ce.S          | 10 +++-------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 432e4841cd81..a2a7fbcacc14 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -101,19 +101,19 @@ ENTRY(ce_aes_ccm_final)
 0:	mov	v4.16b, v3.16b
 1:	ld1	{v5.2d}, [x2], #16		/* load next round key */
 	aese	v0.16b, v4.16b
-	aese	v1.16b, v4.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
 2:	ld1	{v3.2d}, [x2], #16		/* load next round key */
 	aese	v0.16b, v5.16b
-	aese	v1.16b, v5.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
 3:	ld1	{v4.2d}, [x2], #16		/* load next round key */
 	subs	w3, w3, #3
 	aese	v0.16b, v3.16b
-	aese	v1.16b, v3.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
 	aesmc	v1.16b, v1.16b
 	bpl	1b
 	aese	v0.16b, v4.16b
@@ -146,19 +146,19 @@ ENDPROC(ce_aes_ccm_final)
 	ld1	{v5.2d}, [x10], #16		/* load 2nd round key */
 2:	/* inner loop: 3 rounds, 2x interleaved */
 	aese	v0.16b, v4.16b
-	aese	v1.16b, v4.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
 3:	ld1	{v3.2d}, [x10], #16		/* load next round key */
 	aese	v0.16b, v5.16b
-	aese	v1.16b, v5.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
 4:	ld1	{v4.2d}, [x10], #16		/* load next round key */
 	subs	w7, w7, #3
 	aese	v0.16b, v3.16b
-	aese	v1.16b, v3.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
 	aesmc	v1.16b, v1.16b
 	ld1	{v5.2d}, [x10], #16		/* load next round key */
 	bpl	2b
diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
index 685a18f731eb..78f3cfe92c08 100644
--- a/arch/arm64/crypto/aes-ce.S
+++ b/arch/arm64/crypto/aes-ce.S
@@ -45,18 +45,14 @@
 	.macro	do_enc_Nx, de, mc, k, i0, i1, i2, i3
 	aes\de	\i0\().16b, \k\().16b
-	.ifnb	\i1
-	aes\de	\i1\().16b, \k\().16b
-	.ifnb	\i3
-	aes\de	\i2\().16b, \k\().16b
-	aes\de	\i3\().16b, \k\().16b
-	.endif
-	.endif
 	aes\mc	\i0\().16b, \i0\().16b
 	.ifnb	\i1
+	aes\de	\i1\().16b, \k\().16b
 	aes\mc	\i1\().16b, \i1\().16b
 	.ifnb	\i3
+	aes\de	\i2\().16b, \k\().16b
 	aes\mc	\i2\().16b, \i2\().16b
+	aes\de	\i3\().16b, \k\().16b
 	aes\mc	\i3\().16b, \i3\().16b
 	.endif
 	.endif
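For readers skimming the diff, the effect of the reordering can be modelled in a few lines of Python (a toy sketch, not part of the patch; the instruction names and registers are taken from the first hunk of the 2x-interleaved round loop):

```python
# Toy model of the instruction schedules before and after the change,
# for the 2x-interleaved round in ce_aes_ccm_* (registers v0/v1, one key).
before = [
    ("aese", "v0"), ("aese", "v1"),
    ("aesmc", "v0"), ("aesmc", "v1"),
]
after = [
    ("aese", "v0"), ("aesmc", "v0"),
    ("aese", "v1"), ("aesmc", "v1"),
]

def fused_pairs(schedule):
    """Count aese instructions immediately followed by the matching aesmc,
    the back-to-back pattern that recent Cortex-A5x cores can fuse."""
    return sum(
        1
        for (op1, reg1), (op2, reg2) in zip(schedule, schedule[1:])
        if op1 == "aese" and op2 == "aesmc" and reg1 == reg2
    )

print(fused_pairs(before), fused_pairs(after))  # prints: 0 2
```

The same register contents flow through both schedules; only adjacency changes, which is why the patch is a pure reordering with no functional difference.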