From patchwork Tue Dec 4 13:13:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 152805
Delivered-To: patch@linaro.org
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel, Eric Biggers, Martin Willi
Subject: [PATCH v2 0/3] crypto: arm64/chacha - performance improvements
Date: Tue, 4 Dec 2018 14:13:30 +0100
Message-Id: <20181204131333.15046-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.19.2
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

Improve the performance of NEON based ChaCha:

Patch #1 adds a block size of 1472 to the tcrypt test template so we have
something that reflects the VPN case.

Patch #2 improves performance for arbitrary length inputs: on deep pipelines,
throughput increases ~30% when running on input blocks whose size is drawn
randomly from the interval [64, 1024).

Patch #3 adopts the OpenSSL approach to use the ALU in parallel with the
SIMD unit to process a fifth block while the SIMD is operating on 4 blocks.

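As background for patch #3: the work done per block, whether on the SIMD unit or on the scalar ALU, is built from the ChaCha quarter round (32-bit add, xor, rotate). The Python sketch below only illustrates that generic operation as defined in RFC 7539; it is not code from this series, which implements the rounds in arm64 assembly, and the names `rotl32` and `quarter_round` are chosen here for illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # ChaCha operates on 32-bit words


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int) -> Tuple[int, int, int, int]:
    """Apply one ChaCha quarter round to four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

The function can be checked against the quarter round test vector in RFC 7539, section 2.1.1. Because it uses only integer adds, xors and rotates, the same sequence maps onto general purpose registers as well as onto NEON lanes, which is what makes a scalar fifth block feasible.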
Performance on Cortex-A57:

BEFORE:
=======
testing speed of async chacha20 (chacha20-neon) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2528223 operations in 1 seconds (40451568 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2518155 operations in 1 seconds (161161920 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1207948 operations in 1 seconds (309234688 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 332194 operations in 1 seconds (340166656 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 185659 operations in 1 seconds (273290048 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 41829 operations in 1 seconds (342663168 bytes)

AFTER:
======
testing speed of async chacha20 (chacha20-neon) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2530018 operations in 1 seconds (40480288 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2518270 operations in 1 seconds (161169280 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1187760 operations in 1 seconds (304066560 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 361652 operations in 1 seconds (370331648 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 280971 operations in 1 seconds (413589312 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 53654 operations in 1 seconds (439533568 bytes)

Zinc:
=====
testing speed of async chacha20 (chacha20-software) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2510300 operations in 1 seconds (40164800 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2663794 operations in 1 seconds (170482816 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1237617 operations in 1 seconds (316829952 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 364645 operations in 1 seconds (373396480 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 251548 operations in 1 seconds (370278656 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 47650 operations in 1 seconds (390348800 bytes)

Cc: Eric Biggers
Cc: Martin Willi

Ard Biesheuvel (3):
  crypto: tcrypt - add block size of 1472 to skcipher template
  crypto: arm64/chacha - optimize for arbitrary length inputs
  crypto: arm64/chacha - use combined SIMD/ALU routine for more speed

 arch/arm64/crypto/chacha-neon-core.S | 396 +++++++++++++++++++-
 arch/arm64/crypto/chacha-neon-glue.c |  59 ++-
 crypto/tcrypt.c                      |   2 +-
 3 files changed, 404 insertions(+), 53 deletions(-)

-- 
2.19.2
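
The tcrypt figures quoted in this cover letter can be turned into relative throughput changes with a few lines of Python. This is a rough sketch for reading the numbers, not part of the series: the operation counts are copied from the BEFORE and AFTER runs, each run lasts 1 second so bytes processed equals bytes per second, and the helper names are made up for this example.

```python
# Block sizes and operations-per-second copied from the tcrypt output
# quoted in the cover letter (chacha20-neon, 256 bit key, Cortex-A57).
BLOCK_SIZES = [16, 64, 256, 1024, 1472, 8192]
BEFORE_OPS = [2528223, 2518155, 1207948, 332194, 185659, 41829]
AFTER_OPS = [2530018, 2518270, 1187760, 361652, 280971, 53654]


def bytes_per_second(ops: int, block_size: int) -> int:
    """Throughput of a 1 second tcrypt run: operations times block size."""
    return ops * block_size


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def summarize():
    """Return (block size, before B/s, after B/s, change in %) per test."""
    rows = []
    for size, before, after in zip(BLOCK_SIZES, BEFORE_OPS, AFTER_OPS):
        rows.append((size,
                     bytes_per_second(before, size),
                     bytes_per_second(after, size),
                     percent_change(before, after)))
    return rows


if __name__ == "__main__":
    for size, before_bps, after_bps, delta in summarize():
        print(f"{size:5d} byte blocks: {before_bps / 1e6:7.1f} -> "
              f"{after_bps / 1e6:7.1f} MB/s ({delta:+.1f}%)")
```

With these inputs the 1472 byte case improves by roughly 51% and the 8192 byte case by roughly 28%, while the 16 and 64 byte cases are essentially unchanged and the 256 byte case is slightly slower.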