Message ID | 20230603023426.1064431-1-richard.henderson@linaro.org |
---|---|
Series | crypto: Provide aes-round.h and host accel |
On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> under emulation, provide a set of primitives that maps between
> the guest and host fragments.
>
> There is a small guest correctness test case.
>
> I think the end result is quite a bit cleaner, since the logic
> is now centralized, rather than spread across 4 different guests.
>
> Further work could clean up crypto/aes.c itself to use these
> instead of the tables directly. I'm sure that's just an ultimate
> fallback when an appropriate system library is not available, and
> so not terribly important, but it could still significantly reduce
> the amount of code we carry.
>
> I would imagine structuring a polynomial multiplication header
> in a similar way. There are 4 or 5 versions of those spread across
> the different guests.
>
> Anyway, please review.
>
>
> r~
>
>
> Richard Henderson (35):
>   tests/multiarch: Add test-aes
>   target/arm: Move aesmc and aesimc tables to crypto/aes.c
>   crypto/aes: Add constants for ShiftRows, InvShiftRows
>   crypto: Add aesenc_SB_SR
>   target/i386: Use aesenc_SB_SR
>   target/arm: Demultiplex AESE and AESMC
>   target/arm: Use aesenc_SB_SR
>   target/ppc: Use aesenc_SB_SR
>   target/riscv: Use aesenc_SB_SR
>   crypto: Add aesdec_ISB_ISR
>   target/i386: Use aesdec_ISB_ISR
>   target/arm: Use aesdec_ISB_ISR
>   target/ppc: Use aesdec_ISB_ISR
>   target/riscv: Use aesdec_ISB_ISR
>   crypto: Add aesenc_MC
>   target/arm: Use aesenc_MC
>   crypto: Add aesdec_IMC
>   target/i386: Use aesdec_IMC
>   target/arm: Use aesdec_IMC
>   target/riscv: Use aesdec_IMC
>   crypto: Add aesenc_SB_SR_MC_AK
>   target/i386: Use aesenc_SB_SR_MC_AK
>   target/ppc: Use aesenc_SB_SR_MC_AK
>   target/riscv: Use aesenc_SB_SR_MC_AK
>   crypto: Add aesdec_ISB_ISR_IMC_AK
>   target/i386: Use aesdec_ISB_ISR_IMC_AK
>   target/riscv: Use aesdec_ISB_ISR_IMC_AK
>   crypto: Add aesdec_ISB_ISR_AK_IMC
>   target/ppc: Use aesdec_ISB_ISR_AK_IMC
>   host/include/i386: Implement aes-round.h
>   host/include/aarch64: Implement aes-round.h
>   crypto: Remove AES_shifts, AES_ishifts
>   crypto: Implement aesdec_IMC with AES_imc_rot
>   crypto: Remove AES_imc
>   crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
>

This is looking very good - it is clearly a much better abstraction
than what I proposed, and I'd expect the performance boost to be the
same.
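
For readers new to the naming in the series above: the suffixes appear to follow the standard AES round steps (SB = SubBytes, SR = ShiftRows, MC = MixColumns, AK = AddRoundKey, with an I prefix for the inverse step). The following is a minimal sketch of what a combined SubBytes + ShiftRows fragment computes on a 16-byte state. It is not QEMU's code: the name `sb_sr_sketch`, the column-major byte layout, and the on-the-fly S-box construction are assumptions made for illustration only.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            # Brute-force inverse; fine for a one-off 256-entry table.
            inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()

# Output byte i is taken from input byte SHIFT_ROWS[i], with the state
# stored column-major (byte index = row + 4 * column).
SHIFT_ROWS = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]


def sb_sr_sketch(state: bytes) -> bytes:
    """Hypothetical helper: apply SubBytes and ShiftRows to one AES state."""
    if len(state) != 16:
        raise ValueError("AES state must be exactly 16 bytes")
    return bytes(SBOX[state[src]] for src in SHIFT_ROWS)


if __name__ == "__main__":
    print(sb_sr_sketch(bytes(range(16))).hex())
```

SubBytes acts on each byte independently and ShiftRows only permutes byte positions, so the two steps commute and can be fused into a single table lookup through a permuted index, as done here.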
On Sat, 3 Jun 2023 at 15:23, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 3 Jun 2023 at 04:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > [...]
> >
>
> This is looking very good - it is clearly a much better abstraction
> than what I proposed, and I'd expect the performance boost to be the
> same.

Benchmark results for OpenSSL running in emulation on TX2:

Without acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR         25146.07k    50482.19k    69373.44k    76236.80k    78391.98k    78381.06k

With acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR         28774.46k    81173.59k   162346.24k   206301.53k   224214.22k   225600.56k
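
To make the two runs easier to compare, here is a small sketch that divides the accelerated throughput by the baseline for each block size. The figures are copied from the output above; the variable names are only illustrative.

```python
# Throughput in 1000s of bytes per second, copied from the two runs above.
block_sizes = [16, 64, 256, 1024, 8192, 16384]
baseline = [25146.07, 50482.19, 69373.44, 76236.80, 78391.98, 78381.06]
accelerated = [28774.46, 81173.59, 162346.24, 206301.53, 224214.22, 225600.56]

# Speedup factor per block size: accelerated / baseline.
speedup = {
    size: fast / slow
    for size, slow, fast in zip(block_sizes, baseline, accelerated)
}

for size, ratio in speedup.items():
    print(f"{size:>6} bytes: {ratio:.2f}x")
```

The ratio grows with block size, from roughly 1.1x at 16 bytes to roughly 2.9x at 16384 bytes.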