From patchwork Tue Oct 11 18:15:18 2016
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 77518
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 herbert@gondor.apana.org.au
Cc: will.deacon@arm.com, catalin.marinas@arm.com, linux@arm.linux.org.uk,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH v2 6/8] crypto: arm64/aes-neon - fix for big endian
Date: Tue, 11 Oct 2016 19:15:18 +0100
Message-Id: <1476209720-21114-7-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1476209720-21114-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1476209720-21114-1-git-send-email-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

The AES implementation using pure NEON instructions relies on the generic
AES key schedule generation routines, which store the round keys as arrays
of 32-bit quantities held in memory in native endianness. This means we
should refer to these round keys using 4x4 loads (ld1 {v.4s}, four 32-bit
words) rather than 16x1 loads (ld1 {v.16b}, sixteen individual bytes).

In addition, the ShiftRows tables are loaded using a single scalar load,
which is also affected by endianness, so emit these tables in the correct
order depending on whether we are building for big endian or not.

Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-neon.S | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

-- 
2.7.4

diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S
index b93170e1cc93..85f07ead7c5c 100644
--- a/arch/arm64/crypto/aes-neon.S
+++ b/arch/arm64/crypto/aes-neon.S
@@ -9,6 +9,7 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/assembler.h>
 
 #define AES_ENTRY(func)		ENTRY(neon_ ## func)
 #define AES_ENDPROC(func)	ENDPROC(neon_ ## func)
@@ -83,13 +84,13 @@
 	.endm
 
 	.macro		do_block, enc, in, rounds, rk, rkp, i
-	ld1		{v15.16b}, [\rk]
+	ld1		{v15.4s}, [\rk]
 	add		\rkp, \rk, #16
 	mov		\i, \rounds
 1111:	eor		\in\().16b, \in\().16b, v15.16b	/* ^round key */
 	tbl		\in\().16b, {\in\().16b}, v13.16b	/* ShiftRows */
 	sub_bytes	\in
-	ld1		{v15.16b}, [\rkp], #16
+	ld1		{v15.4s}, [\rkp], #16
 	subs		\i, \i, #1
 	beq		2222f
 	.if		\enc == 1
@@ -229,7 +230,7 @@
 	.endm
 
 	.macro		do_block_2x, enc, in0, in1 rounds, rk, rkp, i
-	ld1		{v15.16b}, [\rk]
+	ld1		{v15.4s}, [\rk]
 	add		\rkp, \rk, #16
 	mov		\i, \rounds
 1111:	eor		\in0\().16b, \in0\().16b, v15.16b	/* ^round key */
@@ -237,7 +238,7 @@
 	sub_bytes_2x	\in0, \in1
 	tbl		\in0\().16b, {\in0\().16b}, v13.16b	/* ShiftRows */
 	tbl		\in1\().16b, {\in1\().16b}, v13.16b	/* ShiftRows */
-	ld1		{v15.16b}, [\rkp], #16
+	ld1		{v15.4s}, [\rkp], #16
 	subs		\i, \i, #1
 	beq		2222f
 	.if		\enc == 1
@@ -254,7 +255,7 @@
 	.endm
 
 	.macro		do_block_4x, enc, in0, in1, in2, in3, rounds, rk, rkp, i
-	ld1		{v15.16b}, [\rk]
+	ld1		{v15.4s}, [\rk]
 	add		\rkp, \rk, #16
 	mov		\i, \rounds
 1111:	eor		\in0\().16b, \in0\().16b, v15.16b	/* ^round key */
@@ -266,7 +267,7 @@
 	tbl		\in1\().16b, {\in1\().16b}, v13.16b	/* ShiftRows */
 	tbl		\in2\().16b, {\in2\().16b}, v13.16b	/* ShiftRows */
 	tbl		\in3\().16b, {\in3\().16b}, v13.16b	/* ShiftRows */
-	ld1		{v15.16b}, [\rkp], #16
+	ld1		{v15.4s}, [\rkp], #16
 	subs		\i, \i, #1
 	beq		2222f
 	.if		\enc == 1
@@ -306,12 +307,16 @@
 	.text
 	.align		4
 .LForward_ShiftRows:
-	.byte		0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3
-	.byte		0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb
+CPU_LE(	.byte		0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3	)
+CPU_LE(	.byte		0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb	)
+CPU_BE(	.byte		0xb, 0x6, 0x1, 0xc, 0x7, 0x2, 0xd, 0x8	)
+CPU_BE(	.byte		0x3, 0xe, 0x9, 0x4, 0xf, 0xa, 0x5, 0x0	)
 
 .LReverse_ShiftRows:
-	.byte		0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb
-	.byte		0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3
+CPU_LE(	.byte		0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb	)
+CPU_LE(	.byte		0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3	)
+CPU_BE(	.byte		0x3, 0x6, 0x9, 0xc, 0xf, 0x2, 0x5, 0x8	)
+CPU_BE(	.byte		0xb, 0xe, 0x1, 0x4, 0x7, 0xa, 0xd, 0x0	)
 
 .LForward_Sbox:
 	.byte		0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5
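
For readers unfamiliar with the two load forms: the following stand-alone
user-space C sketch (illustrative only, not part of the patch) shows why a
byte-wise view of natively-stored 32-bit round-key words differs between
little and big endian, which is what makes ld1 {v15.16b} unsuitable here
while ld1 {v15.4s} behaves the same on both configurations.

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/*
		 * One 32-bit round-key word, as the generic key schedule
		 * stores it: a native-endian 32-bit quantity in memory.
		 */
		uint32_t rk_word = 0x03020100;
		uint8_t bytes[4];

		memcpy(bytes, &rk_word, sizeof(rk_word));

		/*
		 * Prints "00 01 02 03" on little endian and "03 02 01 00"
		 * on big endian. ld1 {v15.16b} mirrors this memory byte
		 * order into the register, so the register contents would
		 * differ between the two configurations; ld1 {v15.4s}
		 * instead loads four 32-bit elements in native byte order
		 * and yields the same lane values on both.
		 */
		for (int i = 0; i < 4; i++)
			printf("%02x ", bytes[i]);
		printf("\n");

		return 0;
	}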
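The CPU_BE() table rows above are simply the CPU_LE() rows in reverse byte
order: on a big-endian CPU, the single 128-bit scalar load that fills the
ShiftRows register places the bytes into the register in reversed order, so
emitting the table pre-reversed makes the register contents (and thus the
tbl-based ShiftRows permutation) identical on both configurations. A small
stand-alone check of that mirror property (illustrative only; the array
names are made up):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* .LForward_ShiftRows as emitted for little endian */
		static const uint8_t le[16] = {
			0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3,
			0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb,
		};
		/* .LForward_ShiftRows as emitted for big endian */
		static const uint8_t be[16] = {
			0xb, 0x6, 0x1, 0xc, 0x7, 0x2, 0xd, 0x8,
			0x3, 0xe, 0x9, 0x4, 0xf, 0xa, 0x5, 0x0,
		};

		/* Confirm be[] is le[] with its 16 bytes reversed */
		for (int i = 0; i < 16; i++)
			if (be[i] != le[15 - i])
				return 1;	/* tables not mirrored */

		puts("CPU_BE table is the byte-reversed CPU_LE table");
		return 0;
	}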