From patchwork Thu Jul 21 13:05:05 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 592139
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Florian Weimer, Carlos O'Donell
Cc: "Paul E . Murphy"
Subject: [PATCH v11 7/9] powerpc64: Add optimized chacha20
Date: Thu, 21 Jul 2022 10:05:05 -0300
Message-Id: <20220721130507.3017393-8-adhemerval.zanella@linaro.org>
In-Reply-To: <20220721130507.3017393-1-adhemerval.zanella@linaro.org>
References: <20220721130507.3017393-1-adhemerval.zanella@linaro.org>

This adds a vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c.  It targets POWER8 and is used by default for LE.

On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):

POWER8 GENERIC                             MB/s
-----------------------------------------------
arc4random [single-thread]               138.77
arc4random_buf(16) [single-thread]       174.36
arc4random_buf(32) [single-thread]       228.11
arc4random_buf(48) [single-thread]       252.31
arc4random_buf(64) [single-thread]       270.11
arc4random_buf(80) [single-thread]       278.97
arc4random_buf(96) [single-thread]       287.78
arc4random_buf(112) [single-thread]      291.92
arc4random_buf(128) [single-thread]      295.25

POWER8                                     MB/s
-----------------------------------------------
arc4random [single-thread]               198.06
arc4random_buf(16) [single-thread]       278.79
arc4random_buf(32) [single-thread]       448.89
arc4random_buf(48) [single-thread]       551.09
arc4random_buf(64) [single-thread]       646.12
arc4random_buf(80) [single-thread]       698.04
arc4random_buf(96) [single-thread]       756.06
arc4random_buf(112) [single-thread]      784.12
arc4random_buf(128) [single-thread]      808.04
-----------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.

Reviewed-by: Paul E. Murphy
---
 LICENSES                                      |   3 +-
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 +
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 +
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 +++
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 +
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 ++++++++++++++++++
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 +++
 7 files changed, 347 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 80168d0b1b..e177af6035 100644
--- a/LICENSES
+++ b/LICENSES
@@ -391,7 +391,8 @@ Copyright 2001 by Stephen L. Moshier
 .  */
 
 sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-and sysdeps/x86_64/chacha20-amd64-avx2.S imports code from libgcrypt,
+sysdeps/x86_64/chacha20-amd64-avx2.S, and
+sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c imports code from libgcrypt,
 with the following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
new file mode 100644
index 0000000000..8c75165f7f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
new file mode 100644
index 0000000000..cf9e735326
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
@@ -0,0 +1 @@
+#include
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
new file mode 100644
index 0000000000..08494dc045
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
@@ -0,0 +1,42 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+                                        const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+                  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+  unsigned long int hwcap = GLRO(dl_hwcap);
+  unsigned long int hwcap2 = GLRO(dl_hwcap2);
+  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
+    __chacha20_power8_blocks4 (state, dst, src,
+                               CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+  else
+    chacha20_crypt_generic (state, dst, src, bytes);
+}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index 71a59529f3..abb0aa3f11 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,3 +1,8 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
+
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
new file mode 100644
index 0000000000..0bbdcb9363
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
@@ -0,0 +1,256 @@
+/* Optimized PowerPC implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
+   Copyright (C) 2019 Jussi Kivilinna
+
+   This file is part of Libgcrypt.
+
+   Libgcrypt is free software; you can redistribute it and/or modify
+   it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of
+   the License, or (at your option) any later version.
+
+   Libgcrypt is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with this program; if not, see .
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+typedef vector unsigned char vector16x_u8;
+typedef vector unsigned int vector4x_u32;
+typedef vector unsigned long long vector2x_u64;
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+static const vector16x_u8 le_bswap_const =
+  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+#endif
+
+static inline vector4x_u32
+vec_rol_elems (vector4x_u32 v, unsigned int idx)
+{
+#if __BYTE_ORDER != __BIG_ENDIAN
+  return vec_sld (v, v, (16 - (4 * idx)) & 15);
+#else
+  return vec_sld (v, v, (4 * idx) & 15);
+#endif
+}
+
+static inline vector4x_u32
+vec_load_le (unsigned long offset, const unsigned char *ptr)
+{
+  vector4x_u32 vec;
+  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
+#if __BYTE_ORDER == __BIG_ENDIAN
+  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
+                                 le_bswap_const);
+#endif
+  return vec;
+}
+
+static inline void
+vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
+{
+#if __BYTE_ORDER == __BIG_ENDIAN
+  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
+                               le_bswap_const);
+#endif
+  vec_vsx_st (vec, offset, (uint32_t *)ptr);
+}
+
+
+static inline vector4x_u32
+vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
+{
+#if __BYTE_ORDER == __BIG_ENDIAN
+  static const vector16x_u8 swap32 =
+    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
+  vector2x_u64 vec, add, sum;
+
+  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
+  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
+  sum = vec + add;
+  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
+#else
+  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
+#endif
+}
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define ROTATE(v1,rolv) \
+  __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
+
+#define PLUS(ds,s) \
+  ((ds) += (s))
+
+#define XOR(ds,s) \
+  ((ds) ^= (s))
+
+#define ADD_U64(v,a) \
+  (v = vec_add_ctr_u64(v, a))
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3) ({ \
+  vector4x_u32 t1 = vec_mergeh(x0, x2); \
+  vector4x_u32 t2 = vec_mergel(x0, x2); \
+  vector4x_u32 t3 = vec_mergeh(x1, x3); \
+  x3 = vec_mergel(x1, x3); \
+  x0 = vec_mergeh(t1, t3); \
+  x1 = vec_mergel(t1, t3); \
+  x2 = vec_mergeh(t2, x3); \
+  x3 = vec_mergel(t2, x3); \
+  })
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \
+  PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+  ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \
+  PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+  ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \
+  PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+  ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \
+  PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+  ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
+
+unsigned int attribute_hidden
+__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
+                           size_t nblks)
+{
+  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
+  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
+  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
+  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
+  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
+  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
+  vector4x_u32 state0, state1, state2, state3;
+  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
+  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
+  vector4x_u32 tmp;
+  int i;
+
+  /* Force preload of constants to vector registers.  */
+  __asm__ ("": "+v" (counters_0123) :: "memory");
+  __asm__ ("": "+v" (counter_4) :: "memory");
+  __asm__ ("": "+v" (rotate_16) :: "memory");
+  __asm__ ("": "+v" (rotate_12) :: "memory");
+  __asm__ ("": "+v" (rotate_8) :: "memory");
+  __asm__ ("": "+v" (rotate_7) :: "memory");
+
+  state0 = vec_vsx_ld (0 * 16, state);
+  state1 = vec_vsx_ld (1 * 16, state);
+  state2 = vec_vsx_ld (2 * 16, state);
+  state3 = vec_vsx_ld (3 * 16, state);
+
+  do
+    {
+      v0 = vec_splat (state0, 0);
+      v1 = vec_splat (state0, 1);
+      v2 = vec_splat (state0, 2);
+      v3 = vec_splat (state0, 3);
+      v4 = vec_splat (state1, 0);
+      v5 = vec_splat (state1, 1);
+      v6 = vec_splat (state1, 2);
+      v7 = vec_splat (state1, 3);
+      v8 = vec_splat (state2, 0);
+      v9 = vec_splat (state2, 1);
+      v10 = vec_splat (state2, 2);
+      v11 = vec_splat (state2, 3);
+      v12 = vec_splat (state3, 0);
+      v13 = vec_splat (state3, 1);
+      v14 = vec_splat (state3, 2);
+      v15 = vec_splat (state3, 3);
+
+      v12 += counters_0123;
+      v13 -= vec_cmplt (v12, counters_0123);
+
+      for (i = 20; i > 0; i -= 2)
+        {
+          QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13)
+          QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15)
+          QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12)
+          QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14)
+        }
+
+      v0 += vec_splat (state0, 0);
+      v1 += vec_splat (state0, 1);
+      v2 += vec_splat (state0, 2);
+      v3 += vec_splat (state0, 3);
+      v4 += vec_splat (state1, 0);
+      v5 += vec_splat (state1, 1);
+      v6 += vec_splat (state1, 2);
+      v7 += vec_splat (state1, 3);
+      v8 += vec_splat (state2, 0);
+      v9 += vec_splat (state2, 1);
+      v10 += vec_splat (state2, 2);
+      v11 += vec_splat (state2, 3);
+      tmp = vec_splat( state3, 0);
+      tmp += counters_0123;
+      v12 += tmp;
+      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
+      v14 += vec_splat (state3, 2);
+      v15 += vec_splat (state3, 3);
+      ADD_U64 (state3, counter_4);
+
+      transpose_4x4 (v0, v1, v2, v3);
+      transpose_4x4 (v4, v5, v6, v7);
+      transpose_4x4 (v8, v9, v10, v11);
+      transpose_4x4 (v12, v13, v14, v15);
+
+      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
+      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
+      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
+      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
+
+      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
+      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
+      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
+      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
+
+      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
+      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
+      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
+      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
+
+      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
+      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
+      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
+      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
+
+      src += 4*64;
+      dst += 4*64;
+
+      nblks -= 4;
+    }
+  while (nblks);
+
+  vec_vsx_st (state3, 3 * 16, state);
+
+  return 0;
+}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
new file mode 100644
index 0000000000..ded06762b6
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
@@ -0,0 +1,37 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+                                        const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+                  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+  __chacha20_power8_blocks4 (state, dst, src,
+                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+}
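
Note for readers comparing the vector code with the scalar algorithm:
the sketch below is a plain-C rendering of what one half of
QUARTERROUND2 does to a single lane, and of how the per-lane block
counter carries from the low word (v12) into the high word (v13).  It
is not part of the patch.  The names rotl32, quarterround_ref and
lane_counter_ref are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; valid for 0 < n < 32.  */
static inline uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> (32 - n));
}

/* Scalar quarter round.  It follows the same add/xor/rotate order and
   the same rotation amounts (16, 12, 8, 7) that QUARTERROUND2 applies
   to each 32-bit lane.  Hypothetical helper, not taken from the patch.  */
static void
quarterround_ref (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Per-lane 64-bit block counter.  The vector code adds the lane index
   to the low word and then subtracts the result of vec_cmplt, which is
   all ones (that is, -1) in lanes where the sum wrapped around.
   Subtracting -1 adds the carry to the high word.  The scalar
   equivalent is an unsigned compare used as a 0/1 carry.  */
static void
lane_counter_ref (uint32_t lo, uint32_t hi, uint32_t lane,
                  uint32_t *out_lo, uint32_t *out_hi)
{
  *out_lo = lo + lane;
  *out_hi = hi + (*out_lo < lane);
}
```

The quarter round can be checked against the test vector in RFC 8439,
section 2.1.1.  The counter helper can be checked with a low word close
to UINT32_MAX, which is the case the vec_cmplt adjustment handles.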