From patchwork Tue Dec 26 10:29:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 122736 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp780484qgn; Tue, 26 Dec 2017 02:32:28 -0800 (PST) X-Google-Smtp-Source: ACJfBosnnDrpzjMyvafi8FQ2m7WnFTiS6XRhNc/znOd/MdKPEgJ3pDuPB9SnfzhP/XtzTD7GnToS X-Received: by 10.101.82.196 with SMTP id z4mr18226614pgp.397.1514284348764; Tue, 26 Dec 2017 02:32:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1514284348; cv=none; d=google.com; s=arc-20160816; b=WwfKh7/pYSs4uHFDrN9PHfWNVlT+Hjwg7N2EUj11WMsc2cKqGpYYanFBooWpgeq9nD BBEFK9C89AU15eziKdbKRl+17gQpHNnlpJAhV9Vg53IEv+e1JD/iqCAxXyG/hPX4IKKK Pykt8bDsW99teDxPjaTUoVz9qQ7VhprWms3JNwPF6bSAe6mhgpEXkQRVguI5N8NUNFRD ZgKj+X2RWU5kLeCsCOycAPaEpOAbyngX6W5v0z9OpUnLtvnwhtZbOQmrW5Vn7MhQeB5v q4T5HHpy54cxdWBy6aZfkvn/z2hjz0kHggMuyd3TJeoo5qhV/ijNxG/pTWimnTOLhLIp rWcg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=c8h70L5BRtQTT5f04B6joUgfojSgImC7lcKFhq/xK/A=; b=FkQ5IGstDh99YZkMMVAytoqBEUa8c3neLx/aajMIBpSNJLfyWXZFZYtey5bhrcrHfs u4X+s0U4OVyr10O/SfJ/psjjk9nP/UiHiHokHtcaQoAjMOmOYJuWPosrrNoEQovvD4+S BVFix9z3tsZ7ZfM7dDHaDQ16BKy1dPwPEVBU4RRSkOUKRUG9d9T/oHxKszvEbugSImiN NNmHP4nNYG/0ENtTkHefJfgHiB3Oomsqz7T9kuRAjbDuHNOyHE5pkGq/353en5PTdn7Y 32h9PMgbkdv7pi7ZW1HiX5o+d4j71NNXPlOudV2/eW2mfMaEGLAHfWzuUH0cpj1uf1S8 Z46g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=V89yUpfI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b3si22623175plc.740.2017.12.26.02.32.28; Tue, 26 Dec 2017 02:32:28 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=V89yUpfI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751493AbdLZKc0 (ORCPT + 28 others); Tue, 26 Dec 2017 05:32:26 -0500 Received: from mail-wr0-f193.google.com ([209.85.128.193]:40278 "EHLO mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751336AbdLZKb2 (ORCPT ); Tue, 26 Dec 2017 05:31:28 -0500 Received: by mail-wr0-f193.google.com with SMTP id p17so10575312wre.7 for ; Tue, 26 Dec 2017 02:31:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=c8h70L5BRtQTT5f04B6joUgfojSgImC7lcKFhq/xK/A=; b=V89yUpfIECPuTCDiiO5Axk8+it0v2lJO4XNYVlkB3ShlZm7C+y3LaLe1eNGO/7B2TU uN5ja/q7e9NiyjWLN/lRPDTgb0yna8KK14YY2Lw5l2itHkS5YQbtCz3mEyhEIt52K/CV ekFiduVKoUM/mhrRrG+0OPqzK7Ow81D66UOLk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=c8h70L5BRtQTT5f04B6joUgfojSgImC7lcKFhq/xK/A=; b=FNL8KsqX8V6+Dj+l1zde0wFOhQvTfNb9iiTAOrbtb8//smzOPaRC54uR1HkpixI1MH KAWf89LQFbS2MJc0gCGGQRbcLbIekSXrE8HGwz7IuYXK7lGwQaos31+olSGNI+A8JifF h8wJp03t9xw7TofRZgB6tJfoa06Q8OSpgKsJUx+cCmgEFBe9sYk9Kz1sx7xznFLtE4rk F0FP7PgasH42XErlGDFo+JEao0h9i4ZNQIvw1VBknP/OInJ0hgsnEOy3xUP75+cjeXyS b9tZCKRcRy9GQLL0LC48csijWDK/gB+aFt7LughV1/MP9VitSZJuN5MX+95AnJSPKNtT M7eg== X-Gm-Message-State: AKGB3mJorN+4GPUiGgSgfOTeyVXz9EwlRzrSCQoBpH37uw2qIAtlJZAn jrfxYgxUIzcNxTszi2FGbAr6j9PpYi0= X-Received: by 10.223.131.132 with SMTP id 4mr23352367wre.217.1514284286660; Tue, 26 Dec 2017 02:31:26 -0800 (PST) Received: from localhost.localdomain ([160.171.216.245]) by smtp.gmail.com with ESMTPSA id l142sm13974036wmb.43.2017.12.26.02.31.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 26 Dec 2017 02:31:25 -0800 (PST) From: Ard Biesheuvel To: linux-kernel@vger.kernel.org Cc: Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v4 17/20] crypto: arm64/aes-ghash - yield NEON after every block of input Date: Tue, 26 Dec 2017 10:29:37 +0000 Message-Id: <20171226102940.26908-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org> References: <20171226102940.26908-1-ard.biesheuvel@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------ arch/arm64/crypto/ghash-ce-glue.c | 28 +++-- 2 files changed, 97 insertions(+), 44 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..dcffb9e77589 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -213,22 +213,31 @@ .endm .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE( rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnz w0, 0b + cbz w19, 3f + + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + do_cond_yield_neon + b 0b + endif_yield_neon + + b 1b - st1 {XL.2d}, [x1] +3: st1 {XL.2d}, [x20] + frame_pop ret .endm @@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8] // load lower counter + frame_push 10 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + .if \enc == 1 + ldr x27, [sp, #96] // first stacked arg + .endif + + ldr x28, [x24, #8] // load lower counter +CPU_LE( rev x28, x28 ) + +0: mov x0, x25 + load_round_keys w26, x0 + ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] movi MASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE( rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x27] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b}, [x22], #16 + rev x9, x28 + add x28, x28, #1 + sub w19, w19, #1 ins CTR.d[1], x9 // set lower counter .if \enc == 1 eor INP.16b, INP.16b, KS.16b // encrypt input - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif rev64 T1.16b, INP.16b - cmp w6, #12 - b.ge 2f // AES-192/256? + cmp w26, #12 + b.ge 4f // AES-192/256? -1: enc_round CTR, v21 +2: enc_round CTR, v21 ext T2.16b, XL.16b, XL.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8 @@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 ) .if \enc == 0 eor INP.16b, INP.16b, KS.16b - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif - cbnz w0, 0b + cbz w19, 3f -CPU_LE( rev x8, x8 ) - st1 {XL.2d}, [x1] - str x8, [x5, #8] // store lower counter + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + .if \enc == 1 + st1 {KS.16b}, [x27] + .endif + do_cond_yield_neon + b 0b + endif_yield_neon + b 1b + +3: st1 {XL.2d}, [x20] .if \enc == 1 - st1 {KS.16b}, [x7] + st1 {KS.16b}, [x27] .endif +CPU_LE( rev x28, x28 ) + str x28, [x24, #8] // store lower counter + + frame_pop ret -2: b.eq 3f // AES-192? +4: b.eq 5f // AES-192? enc_round CTR, v17 enc_round CTR, v18 -3: enc_round CTR, v19 +5: enc_round CTR, v19 enc_round CTR, v20 - b 1b + b 2b .endm /* diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..7cf0b1aa6ea8 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src, asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds, u8 ks[]); + u8 ctr[], u32 const rk[], int rounds, + u8 ks[]); asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds); + u8 ctr[], u32 const rk[], int rounds); asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[], u32 const rk[], int rounds); @@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key), ks); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key)); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE;