From patchwork Tue Jul 24 17:12:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 142817
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, will.deacon@arm.com, dave.martin@arm.com,
 vakul.garg@nxp.com, bigeasy@linutronix.de, Ard Biesheuvel
Subject: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks
Date: Tue, 24 Jul 2018 19:12:20 +0200
Message-Id: <20180724171224.17363-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
X-Mailing-List: linux-crypto@vger.kernel.org

Vakul reports a considerable performance hit when running the accelerated
arm64 crypto routines with CONFIG_PREEMPT=y, now that they have been
updated to take the TIF_NEED_RESCHED flag into account.

The issue appears to be caused by the fact that Cortex-A53, the core in
question, has a high-end implementation of the Crypto Extensions, and
has a shallow pipeline, which means even sequential algorithms that may be
held back by pipeline stalls on high-end out-of-order cores run at maximum
speed. This means SHA-1, SHA-2, GHASH and AES in GCM and CCM modes run at a
speed on the order of 2 to 4 cycles per byte, and are currently implemented
to check the TIF_NEED_RESCHED flag after each iteration, which may process
as little as 16 bytes (for GHASH).

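To put rough numbers on that, here is a back-of-the-envelope sketch of how
few cycles of useful work separate two consecutive flag checks. The 2 to 4
cycles per byte range and the 16-byte GHASH iteration are taken from the
paragraph above; the other block sizes are the standard ones for each
algorithm, and the function name and `blocks_per_check` parameter are
made up for illustration only.

```python
# Bytes consumed per inner-loop iteration. GHASH's 16 bytes is from the text;
# 16 bytes for AES-CCM and 64 bytes for SHA-1/SHA-2 (SHA-224/256) are the
# standard block sizes of those algorithms.
BLOCK_BYTES = {"ghash": 16, "aes-ccm": 16, "sha1": 64, "sha2": 64}


def cycles_between_checks(block_bytes, cycles_per_byte, blocks_per_check=1):
    """Approximate cycles of work done between two TIF_NEED_RESCHED checks.

    blocks_per_check is a hypothetical knob: 1 models a check after every
    iteration, larger values model checking less often.
    """
    return block_bytes * cycles_per_byte * blocks_per_check


if __name__ == "__main__":
    for name, size in BLOCK_BYTES.items():
        fast = cycles_between_checks(size, 2)  # 2 cycles per byte
        slow = cycles_between_checks(size, 4)  # 4 cycles per byte
        print(f"{name}: {fast}-{slow} cycles between checks")
```

With one check per iteration this gives only 32 to 64 cycles between checks
for GHASH and 128 to 256 for the SHA routines, so even a handful of extra
cycles per check is a noticeable fraction of the loop.
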
Obviously, every cycle of overhead hurts in this context, and given that
the A53's load/store unit is not quite high end, any delays caused by
memory accesses that occur in the inner loop of the algorithms are going
to be quite significant, hence the performance regression.

So reduce the frequency at which the NEON yield checks are performed, so
that they occur roughly once every 1000 cycles, which is hopefully a
reasonable tradeoff between throughput and worst case scheduling latency.

Ard Biesheuvel (4):
  crypto/arm64: ghash - reduce performance impact of NEON yield checks
  crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks
  crypto/arm64: sha1 - reduce performance impact of NEON yield checks
  crypto/arm64: sha2 - reduce performance impact of NEON yield checks

 arch/arm64/crypto/aes-ce-ccm-core.S |  3 +++
 arch/arm64/crypto/ghash-ce-core.S   | 12 +++++++++---
 arch/arm64/crypto/sha1-ce-core.S    |  3 +++
 arch/arm64/crypto/sha2-ce-core.S    |  3 +++
 4 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.11.0