From patchwork Sat Aug 4 18:46:24 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 143449
Delivered-To: patch@linaro.org
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
    jerome.forissier@linaro.org, jens.wiklander@linaro.org,
    Ard Biesheuvel
Subject: [PATCH 1/2] crypto: arm64/ghash-ce - replace NEON yield check with block limit
Date: Sat, 4 Aug 2018 20:46:24 +0200
Message-Id: <20180804184625.28523-2-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
References: <20180804184625.28523-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores with
fast crypto instructions and comparatively slow memory accesses.

On algorithms such as GHASH, which executes at ~1 cycle per byte on cores that
implement support for 64-bit polynomial multiplication, there is really no
need to check the TIF_NEED_RESCHED flag particularly often, and so we can
remove the NEON yield check from the assembler routines.

However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take arbitrary
input lengths, and so there needs to be some sanity check to ensure that we
don't hog the CPU for excessive amounts of time.

So let's simply cap the maximum input size that is processed in one go to
64 KB.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-core.S | 39 ++++++--------------
 arch/arm64/crypto/ghash-ce-glue.c | 16 ++++++--
 2 files changed, 23 insertions(+), 32 deletions(-)

-- 
2.18.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 913e49932ae6..344811c6a0ca 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -213,31 +213,23 @@
 	.endm

 	.macro		__pmull_ghash, pn
-	frame_push	5
-
-	mov		x19, x0
-	mov		x20, x1
-	mov		x21, x2
-	mov		x22, x3
-	mov		x23, x4
-
-0:	ld1		{SHASH.2d}, [x22]
-	ld1		{XL.2d}, [x20]
+	ld1		{SHASH.2d}, [x3]
+	ld1		{XL.2d}, [x1]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	__pmull_pre_\pn

 	/* do the head block first, if supplied */
-	cbz		x23, 1f
-	ld1		{T1.2d}, [x23]
-	mov		x23, xzr
-	b		2f
+	cbz		x4, 0f
+	ld1		{T1.2d}, [x4]
+	mov		x4, xzr
+	b		1f

-1:	ld1		{T1.2d}, [x21], #16
-	sub		w19, w19, #1
+0:	ld1		{T1.2d}, [x2], #16
+	sub		w0, w0, #1

-2:	/* multiply XL by SHASH in GF(2^128) */
+1:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)

 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -259,18 +251,9 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b

-	cbz		w19, 3f
-
-	if_will_cond_yield_neon
-	st1		{XL.2d}, [x20]
-	do_cond_yield_neon
-	b		0b
-	endif_yield_neon
-
-	b		1b
+	cbnz		w0, 0b

-3:	st1		{XL.2d}, [x20]
-	frame_pop
+	st1		{XL.2d}, [x1]
 	ret
 	.endm

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 88e3d93fa7c7..03ce71ea81a2 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -113,6 +113,9 @@ static void ghash_do_update(int blocks, u64 dg[], const char *src,
 	}
 }

+/* avoid hogging the CPU for too long */
+#define MAX_BLOCKS	(SZ_64K / GHASH_BLOCK_SIZE)
+
 static int ghash_update(struct shash_desc *desc, const u8 *src,
 			unsigned int len)
 {
@@ -136,11 +139,16 @@ static int ghash_update(struct shash_desc *desc, const u8 *src,
 		blocks = len / GHASH_BLOCK_SIZE;
 		len %= GHASH_BLOCK_SIZE;

-		ghash_do_update(blocks, ctx->digest, src, key,
-				partial ? ctx->buf : NULL);
+		do {
+			int chunk = min(blocks, MAX_BLOCKS);
+
+			ghash_do_update(chunk, ctx->digest, src, key,
+					partial ? ctx->buf : NULL);

-		src += blocks * GHASH_BLOCK_SIZE;
-		partial = 0;
+			blocks -= chunk;
+			src += chunk * GHASH_BLOCK_SIZE;
+			partial = 0;
+		} while (unlikely(blocks > 0));
 	}
 	if (len)
 		memcpy(ctx->buf + partial, src, len);