From patchwork Thu Jun 12 14:43:07 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 31831
From: Ard Biesheuvel
To: catalin.marinas@arm.com, will.deacon@arm.com
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel
Subject: [PATCH] arm64/crypto: fix and improve GHASH secure hash implementation
Date: Thu, 12 Jun 2014 16:43:07 +0200
Message-Id: <1402584187-17114-1-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 1.8.3.2
X-Original-Sender: ard.biesheuvel@linaro.org

This fixes a bug in the arm64 GHASH implementation, and switches to a
faster, polynomial multiplication based reduction instead of one that
uses shifts and rotates.

Signed-off-by: Ard Biesheuvel
---
This is a bug fix and a performance optimization in a single patch. As the
code has never worked correctly and was merged just a couple of days ago, I am
assuming this is OK but if anyone would prefer the bug fix separately, I'm
happy to split them as well.

Ard.

 arch/arm64/crypto/ghash-ce-core.S | 92 ++++++++++++++++-----------------------
 arch/arm64/crypto/ghash-ce-glue.c |  5 ++-
 2 files changed, 41 insertions(+), 56 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index b9e6eaf41c9b..dc457015884e 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -3,14 +3,6 @@
  *
  * Copyright (C) 2014 Linaro Ltd.
  *
- * Based on arch/x86/crypto/ghash-pmullni-intel_asm.S
- *
- * Copyright (c) 2009 Intel Corp.
- *   Author: Huang Ying
- *           Vinodh Gopal
- *           Erdinc Ozturk
- *           Deniz Karakoyunlu
- *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
  * by the Free Software Foundation.
@@ -19,13 +11,15 @@
 #include
 #include
 
-	DATA	.req	v0
-	SHASH	.req	v1
-	IN1	.req	v2
+	SHASH	.req	v0
+	SHASH2	.req	v1
 	T1	.req	v2
 	T2	.req	v3
-	T3	.req	v4
-	VZR	.req	v5
+	MASK	.req	v4
+	XL	.req	v5
+	XM	.req	v6
+	XH	.req	v7
+	IN1	.req	v7
 
 	.text
 	.arch		armv8-a+crypto
@@ -35,61 +29,51 @@
  *			       struct ghash_key const *k, const char *head)
  */
 ENTRY(pmull_ghash_update)
-	ld1		{DATA.16b}, [x1]
 	ld1		{SHASH.16b}, [x3]
-	eor		VZR.16b, VZR.16b, VZR.16b
+	ld1		{XL.16b}, [x1]
+	movi		MASK.16b, #0xe1
+	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
+	shl		MASK.2d, MASK.2d, #57
+	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 
 	/* do the head block first, if supplied */
 	cbz		x4, 0f
-	ld1		{IN1.2d}, [x4]
+	ld1		{T1.2d}, [x4]
 	b		1f
 
-0:	ld1		{IN1.2d}, [x2], #16
+0:	ld1		{T1.2d}, [x2], #16
 	sub		w0, w0, #1
-1:	ext		IN1.16b, IN1.16b, IN1.16b, #8
-CPU_LE(	rev64		IN1.16b, IN1.16b	)
-	eor		DATA.16b, DATA.16b, IN1.16b
 
-	/* multiply DATA by SHASH in GF(2^128) */
-	ext		T2.16b, DATA.16b, DATA.16b, #8
-	ext		T3.16b, SHASH.16b, SHASH.16b, #8
-	eor		T2.16b, T2.16b, DATA.16b
-	eor		T3.16b, T3.16b, SHASH.16b
+1:	/* multiply XL by SHASH in GF(2^128) */
+CPU_LE(	rev64		T1.16b, T1.16b	)
 
-	pmull2		T1.1q, SHASH.2d, DATA.2d	// a1 * b1
-	pmull		DATA.1q, SHASH.1d, DATA.1d	// a0 * b0
-	pmull		T2.1q, T2.1d, T3.1d		// (a1 + a0)(b1 + b0)
-	eor		T2.16b, T2.16b, T1.16b		// (a0 * b1) + (a1 * b0)
-	eor		T2.16b, T2.16b, DATA.16b
+	ext		T2.16b, XL.16b, XL.16b, #8
+	ext		IN1.16b, T1.16b, T1.16b, #8
+	eor		T1.16b, T1.16b, T2.16b
+	eor		XL.16b, XL.16b, IN1.16b
 
-	ext		T3.16b, VZR.16b, T2.16b, #8
-	ext		T2.16b, T2.16b, VZR.16b, #8
-	eor		DATA.16b, DATA.16b, T3.16b
-	eor		T1.16b, T1.16b, T2.16b	// is result of
-						// carry-less multiplication
+	pmull2		XH.1q, SHASH.2d, XL.2d		// a1 * b1
+	eor		T1.16b, T1.16b, XL.16b
+	pmull		XL.1q, SHASH.1d, XL.1d		// a0 * b0
+	pmull		XM.1q, SHASH2.1d, T1.1d		// (a1 + a0)(b1 + b0)
 
-	/* first phase of the reduction */
-	shl		T3.2d, DATA.2d, #1
-	eor		T3.16b, T3.16b, DATA.16b
-	shl		T3.2d, T3.2d, #5
-	eor		T3.16b, T3.16b, DATA.16b
-	shl		T3.2d, T3.2d, #57
-	ext		T2.16b, VZR.16b, T3.16b, #8
-	ext		T3.16b, T3.16b, VZR.16b, #8
-	eor		DATA.16b, DATA.16b, T2.16b
-	eor		T1.16b, T1.16b, T3.16b
+	ext		T1.16b, XL.16b, XH.16b, #8
+	eor		T2.16b, XL.16b, XH.16b
+	eor		XM.16b, XM.16b, T1.16b
+	eor		XM.16b, XM.16b, T2.16b
+	pmull		T2.1q, XL.1d, MASK.1d
 
-	/* second phase of the reduction */
-	ushr		T2.2d, DATA.2d, #5
-	eor		T2.16b, T2.16b, DATA.16b
-	ushr		T2.2d, T2.2d, #1
-	eor		T2.16b, T2.16b, DATA.16b
-	ushr		T2.2d, T2.2d, #1
-	eor		T1.16b, T1.16b, T2.16b
-	eor		DATA.16b, DATA.16b, T1.16b
+	mov		XH.d[0], XM.d[1]
+	mov		XM.d[1], XL.d[0]
+
+	eor		XL.16b, XM.16b, T2.16b
+	ext		T2.16b, XL.16b, XL.16b, #8
+	pmull		XL.1q, XL.1d, MASK.1d
+	eor		T2.16b, T2.16b, XH.16b
+	eor		XL.16b, XL.16b, T2.16b
 
 	cbnz		w0, 0b
 
-	st1		{DATA.16b}, [x1]
+	st1		{XL.16b}, [x1]
 	ret
 ENDPROC(pmull_ghash_update)
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index b92baf3f68c7..833ec1e3f3e9 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -67,11 +67,12 @@ static int ghash_update(struct shash_desc *desc, const u8 *src,
 		blocks = len / GHASH_BLOCK_SIZE;
 		len %= GHASH_BLOCK_SIZE;
 
-		kernel_neon_begin_partial(6);
+		kernel_neon_begin_partial(8);
 		pmull_ghash_update(blocks, ctx->digest, src, key,
 				   partial ? ctx->buf : NULL);
 		kernel_neon_end();
 		src += blocks * GHASH_BLOCK_SIZE;
+		partial = 0;
 	}
 	if (len)
 		memcpy(ctx->buf + partial, src, len);
@@ -88,7 +89,7 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
 
 		memset(ctx->buf + partial, 0, GHASH_BLOCK_SIZE - partial);
 
-		kernel_neon_begin_partial(6);
+		kernel_neon_begin_partial(8);
 		pmull_ghash_update(1, ctx->digest, ctx->buf, key, NULL);
 		kernel_neon_end();
 	}
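
Reviewer note, not part of the patch: when checking a PMULL-based reduction like the one above, a slow bit-serial model of the GF(2^128) multiply can serve as a cross-check. The following is only a minimal sketch, assuming the bit ordering and multiplication algorithm of NIST SP 800-38D. The names gf128_mul() and ghash_update_ref() are hypothetical and do not exist in the kernel, and the sketch works on blocks in their in-memory byte order, so it does not model the byte swaps or the register layout used by the assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GHASH_BLOCK_SIZE	16

/*
 * Hypothetical reference helper: z = x * y in GF(2^128), using the GCM
 * convention where bit 0 is the most significant bit of byte 0.
 * z may alias x or y, since the result is accumulated in a local buffer.
 */
static void gf128_mul(uint8_t z[GHASH_BLOCK_SIZE],
		      const uint8_t x[GHASH_BLOCK_SIZE],
		      const uint8_t y[GHASH_BLOCK_SIZE])
{
	uint8_t acc[GHASH_BLOCK_SIZE] = { 0 };
	uint8_t v[GHASH_BLOCK_SIZE];
	int i, j;

	memcpy(v, y, GHASH_BLOCK_SIZE);

	for (i = 0; i < 128; i++) {
		int lsb = v[GHASH_BLOCK_SIZE - 1] & 1;

		/* if bit i of x is set, add (xor) the current multiple of y */
		if (x[i / 8] & (0x80 >> (i % 8)))
			for (j = 0; j < GHASH_BLOCK_SIZE; j++)
				acc[j] ^= v[j];

		/* shift v right by one bit across the whole 128-bit string */
		for (j = GHASH_BLOCK_SIZE - 1; j > 0; j--)
			v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
		v[0] >>= 1;

		/* a bit fell off the end: fold it back in with R = 0xe1 || 0^120 */
		if (lsb)
			v[0] ^= 0xe1;
	}

	memcpy(z, acc, GHASH_BLOCK_SIZE);
}

/*
 * Hypothetical reference helper: fold 'blocks' 16-byte blocks from src
 * into digest, i.e. digest = (digest ^ block) * h for each block.
 */
static void ghash_update_ref(uint8_t digest[GHASH_BLOCK_SIZE],
			     const uint8_t h[GHASH_BLOCK_SIZE],
			     const uint8_t *src, size_t blocks)
{
	int j;

	while (blocks--) {
		for (j = 0; j < GHASH_BLOCK_SIZE; j++)
			digest[j] ^= src[j];
		gf128_mul(digest, digest, h);
		src += GHASH_BLOCK_SIZE;
	}
}
```

The 0xe1 byte in the model is the reduction constant from SP 800-38D, and the patch loads the same byte value into MASK before shifting it. Feeding identical key and input blocks to both the model and the accelerated path should yield the same digest, provided the caller accounts for whatever byte-order conversion the glue code applies to the key.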