[v3,00/19] crypto: Provide clmul.h and host accel

Message ID 20230821161854.419893-1-richard.henderson@linaro.org

Message

Richard Henderson Aug. 21, 2023, 4:18 p.m. UTC
Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
carry-less multiply under emulation.

Changes for v3:
  * Update target/i386 ops_sse.h.
  * Apply r-b.

Changes for v2:
  * Only accelerate clmul_64; keep generic helpers for other sizes.
  * Drop most of the Int128 interfaces, except for clmul_64.
  * Use the same acceleration format as aes-round.h.
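For readers unfamiliar with the operation, here is a minimal sketch of a
portable 64x64 -> 128-bit carry-less multiply, the kind of generic fallback
that a host-accelerated version would replace. It is not the code from this
series: u128_sketch and clmul_64_sketch are hypothetical names chosen for
illustration, and the real helper and its Int128 return type may differ.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type for this sketch; not QEMU's Int128. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_sketch;

/*
 * Carry-less (polynomial, GF(2)) multiply of two 64-bit values.
 * For every set bit i of m, XOR in n shifted left by i. XOR takes the
 * place of addition, so no carries propagate between bit positions.
 */
static u128_sketch clmul_64_sketch(uint64_t n, uint64_t m)
{
    u128_sketch r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((m >> i) & 1) {
            r.lo ^= n << i;
            /* Bits of n that spill into the high half. Skip i == 0,
             * because a shift by 64 is undefined behaviour in C. */
            if (i != 0) {
                r.hi ^= n >> (64 - i);
            }
        }
    }
    return r;
}
```

As a quick check, clmul_64_sketch(3, 3) gives lo == 5 and hi == 0, since
(x + 1)^2 = x^2 + 1 over GF(2), whereas the ordinary product 3 * 3 is 9.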


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-ardb@kernel.org/


Richard Henderson (19):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/i386: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h

 host/include/aarch64/host/cpuinfo.h      |   1 +
 host/include/aarch64/host/crypto/clmul.h |  41 +++++
 host/include/generic/host/crypto/clmul.h |  15 ++
 host/include/i386/host/cpuinfo.h         |   1 +
 host/include/i386/host/crypto/clmul.h    |  29 ++++
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h                   |  83 ++++++++++
 include/qemu/cpuid.h                     |   3 +
 target/arm/tcg/vec_internal.h            |  11 --
 target/i386/ops_sse.h                    |  40 ++---
 crypto/clmul.c                           | 112 ++++++++++++++
 target/arm/tcg/mve_helper.c              |  16 +-
 target/arm/tcg/vec_helper.c              | 102 ++-----------
 target/ppc/int_helper.c                  |  64 ++++----
 target/s390x/tcg/vec_int_helper.c        | 186 ++++++++++-------------
 util/cpuinfo-aarch64.c                   |   4 +-
 util/cpuinfo-i386.c                      |   1 +
 crypto/meson.build                       |   9 +-
 18 files changed, 434 insertions(+), 285 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

Comments

Ard Biesheuvel Aug. 21, 2023, 6:08 p.m. UTC | #1
On Mon, 21 Aug 2023 at 18:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.

OK, I did the OpenSSL benchmark this time, using an x86_64 cross build
on arm64/ThunderX2, and the speedup is 7x (\o/)

Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>



Distro qemu (no acceleration):

$ qemu-x86_64 --version
qemu-x86_64 version 7.2.4 (Debian 1:7.2+dfsg-7+deb12u1)

$ apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM       8856.13k    13820.95k    17375.49k    16826.37k    16870.06k    17208.66k


QEMU built with this series applied onto latest master:

$ ~/build/qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfffa320b0fcbfffd:0x8041020c01dc47a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM      14237.01k    34176.34k    70633.13k    97372.84k   119668.74k   122049.88k

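The per-block-size speedup behind the "7x" figure can be worked out from the
two result rows above. The sketch below only divides the reported numbers.
The array and function names are made up for illustration. Note that the
baseline is distro QEMU 7.2.4 while the second run is master plus this
series, so the ratio may reflect more than the clmul changes alone.

```c
#include <stddef.h>
#include <stdio.h>

/* Block sizes and throughput (in 1000s of bytes/s) copied from the runs above. */
static const int block_bytes[] = { 16, 64, 256, 1024, 8192, 16384 };
static const double baseline_kbps[] = {
    8856.13, 13820.95, 17375.49, 16826.37, 16870.06, 17208.66
};
static const double patched_kbps[] = {
    14237.01, 34176.34, 70633.13, 97372.84, 119668.74, 122049.88
};

/* Ratio of patched to baseline throughput for column i. */
static double speedup(size_t i)
{
    return patched_kbps[i] / baseline_kbps[i];
}

/* Print one line per block size. */
static void print_speedups(void)
{
    size_t n = sizeof(block_bytes) / sizeof(block_bytes[0]);

    for (size_t i = 0; i < n; i++) {
        printf("%6d bytes: %.2fx\n", block_bytes[i], speedup(i));
    }
}
```

The ratio grows with block size: about 1.6x at 16 bytes, and about 7.1x at
8192 and 16384 bytes, which is where the 7x figure comes from.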
Richard Henderson Aug. 21, 2023, 6:25 p.m. UTC | #2
On 8/21/23 11:08, Ard Biesheuvel wrote:
> OK, I did the OpenSSL benchmark this time, using an x86_64 cross build
> on arm64/ThunderX2, and the speedup is 7x (\o/)

Excellent, thanks.


r~

Richard Henderson Sept. 9, 2023, 6:58 p.m. UTC | #3
Ping.

Still missing r-b on patches 1, 4, 5, 8, 9, 12, 13, 18.

r~