mbox series

[v7,0/9] crypto: HCTR2 support

Message ID 20220509191107.3556468-1-nhuck@google.com
Headers show
Series crypto: HCTR2 support | expand

Message

Nathan Huckleberry May 9, 2022, 7:10 p.m. UTC
HCTR2 is a length-preserving encryption mode that is efficient on
processors with instructions to accelerate AES and carryless
multiplication, e.g. x86 processors with AES-NI and CLMUL, and ARM
processors with the ARMv8 Crypto Extensions.

HCTR2 is specified in https://ia.cr/2021/1441 "Length-preserving encryption
with HCTR2" which shows that if AES is secure and HCTR2 is instantiated
with AES, then HCTR2 is secure.  Reference code and test vectors are at
https://github.com/google/hctr2.

As a length-preserving encryption mode, HCTR2 is suitable for applications
such as storage encryption where ciphertext expansion is not possible, and
thus authenticated encryption cannot be used.  Currently, such applications
usually use XTS, or in some cases Adiantum.  XTS has the disadvantage that
it is a narrow-block mode: a bitflip will only change 16 bytes in the
resulting ciphertext or plaintext.  This reveals more information to an
attacker than necessary.

HCTR2 is a wide-block mode, so it provides a stronger security property: a
bitflip will change the entire message.  HCTR2 is somewhat similar to
Adiantum, which is also a wide-block mode.  However, HCTR2 is designed to
take advantage of existing crypto instructions, while Adiantum targets
devices without such hardware support.  Adiantum is also designed with
longer messages in mind, while HCTR2 is designed to be efficient even on
short messages.

The first intended use of this mode in the kernel is for the encryption of
filenames, where for efficiency reasons encryption must be fully
deterministic (only one ciphertext for each plaintext) and the existing CBC
solution leaks more information than necessary for filenames with common
prefixes.

HCTR2 uses two passes of an ε-almost-∆-universal hash function called
POLYVAL and one pass of a block cipher mode called XCTR.  POLYVAL is a
polynomial hash designed for efficiency on modern processors and was
originally specified for use in AES-GCM-SIV (RFC 8452).  XCTR mode is a
variant of CTR mode that is more efficient on little-endian machines.

This patchset adds HCTR2 to Linux's crypto API, including generic
implementations of XCTR and POLYVAL, hardware accelerated implementations
of XCTR and POLYVAL for both x86-64 and ARM64, a templated implementation
of HCTR2, and an fscrypt policy for using HCTR2 for filename encryption.

Changes in v7:
 * Added/modified some comments in ARM64 XCTR/CTR implementation
 * Various small style fixes

Changes in v6:
 * Split ARM64 XCTR/CTR refactoring into separate patch
 * Allow simd POLYVAL implementations to be preempted
 * Fix uninitialized bug in HCTR2
 * Fix streamcipher name handling bug in HCTR2
 * Various small style fixes

Changes in v5:
 * Refactor HCTR2 tweak hashing
 * Remove non-AVX x86-64 XCTR implementation
 * Combine arm64 CTR and XCTR modes
 * Comment and alias CTR and XCTR modes
 * Move generic fallback code for simd POLYVAL into polyval-generic.c
 * Various small style fixes

Changes in v4:
 * Small style fixes in generic POLYVAL and XCTR
 * Move HCTR2 hash exporting/importing to helper functions
 * Rewrite montgomery reduction for x86-64 POLYVAL
 * Rewrite partial block handling for x86-64 POLYVAL
 * Optimize x86-64 POLYVAL loop handling
 * Remove ahash wrapper from x86-64 POLYVAL
 * Add simd-unavailable handling to x86-64 POLYVAL
 * Rewrite montgomery reduction for ARM64 POLYVAL
 * Rewrite partial block handling for ARM64 POLYVAL
 * Optimize ARM64 POLYVAL loop handling
 * Remove ahash wrapper from ARM64 POLYVAL
 * Add simd-unavailable handling to ARM64 POLYVAL

Changes in v3:
 * Improve testvec coverage for XCTR, POLYVAL and HCTR2
 * Fix endianness bug in xctr.c
 * Fix alignment issues in polyval-generic.c
 * Optimize hctr2.c by exporting/importing hash states
 * Fix blockcipher name derivation in hctr2.c
 * Move x86-64 XCTR implementation into aes_ctrby8_avx-x86_64.S
 * Reuse ARM64 CTR mode tail handling in ARM64 XCTR
 * Fix x86-64 POLYVAL comments
 * Fix x86-64 POLYVAL key_powers type to match asm
 * Fix ARM64 POLYVAL comments
 * Fix ARM64 POLYVAL key_powers type to match asm
 * Add XTS + HCTR2 policy to fscrypt

Nathan Huckleberry (9):
  crypto: xctr - Add XCTR support
  crypto: polyval - Add POLYVAL support
  crypto: hctr2 - Add HCTR2 support
  crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
  crypto: arm64/aes-xctr: Add accelerated implementation of XCTR
  crypto: arm64/aes-xctr: Improve readability of XCTR and CTR modes
  crypto: x86/polyval: Add PCLMULQDQ accelerated implementation of
    POLYVAL
  crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL
  fscrypt: Add HCTR2 support for filename encryption

 Documentation/filesystems/fscrypt.rst   |   22 +-
 arch/arm64/crypto/Kconfig               |    9 +-
 arch/arm64/crypto/Makefile              |    3 +
 arch/arm64/crypto/aes-glue.c            |   82 +-
 arch/arm64/crypto/aes-modes.S           |  338 +++--
 arch/arm64/crypto/polyval-ce-core.S     |  361 ++++++
 arch/arm64/crypto/polyval-ce-glue.c     |  193 +++
 arch/x86/crypto/Makefile                |    3 +
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |  232 ++--
 arch/x86/crypto/aesni-intel_glue.c      |  114 +-
 arch/x86/crypto/polyval-clmulni_asm.S   |  321 +++++
 arch/x86/crypto/polyval-clmulni_glue.c  |  203 +++
 crypto/Kconfig                          |   39 +-
 crypto/Makefile                         |    3 +
 crypto/hctr2.c                          |  581 +++++++++
 crypto/polyval-generic.c                |  245 ++++
 crypto/tcrypt.c                         |   10 +
 crypto/testmgr.c                        |   20 +
 crypto/testmgr.h                        | 1536 +++++++++++++++++++++++
 crypto/xctr.c                           |  191 +++
 fs/crypto/fscrypt_private.h             |    2 +-
 fs/crypto/keysetup.c                    |    7 +
 fs/crypto/policy.c                      |   14 +-
 include/crypto/polyval.h                |   22 +
 include/uapi/linux/fscrypt.h            |    3 +-
 25 files changed, 4355 insertions(+), 199 deletions(-)
 create mode 100644 arch/arm64/crypto/polyval-ce-core.S
 create mode 100644 arch/arm64/crypto/polyval-ce-glue.c
 create mode 100644 arch/x86/crypto/polyval-clmulni_asm.S
 create mode 100644 arch/x86/crypto/polyval-clmulni_glue.c
 create mode 100644 crypto/hctr2.c
 create mode 100644 crypto/polyval-generic.c
 create mode 100644 crypto/xctr.c
 create mode 100644 include/crypto/polyval.h

Comments

Eric Biggers May 9, 2022, 9:41 p.m. UTC | #1
On Mon, May 09, 2022 at 07:11:06PM +0000, Nathan Huckleberry wrote:
> Add hardware accelerated version of POLYVAL for ARM64 CPUs with
> Crypto Extensions support.
> 
> This implementation is accelerated using PMULL instructions to perform
> the finite field computations.  For added efficiency, 8 blocks of the
> message are processed simultaneously by precomputing the first 8
> powers of the key.
> 
> Karatsuba multiplication is used instead of Schoolbook multiplication
> because it was found to be slightly faster on ARM64 CPUs.  Montgomery
> reduction must be used instead of Barrett reduction due to the
> difference in modulus between POLYVAL's field and other finite fields.
> 
> More information on POLYVAL can be found in the HCTR2 paper:
> "Length-preserving encryption with HCTR2":
> https://eprint.iacr.org/2021/1441.pdf
> 
> Signed-off-by: Nathan Huckleberry <nhuck@google.com>
> Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/crypto/Kconfig           |   5 +
>  arch/arm64/crypto/Makefile          |   3 +
>  arch/arm64/crypto/polyval-ce-core.S | 361 ++++++++++++++++++++++++++++
>  arch/arm64/crypto/polyval-ce-glue.c | 193 +++++++++++++++
>  4 files changed, 562 insertions(+)
>  create mode 100644 arch/arm64/crypto/polyval-ce-core.S
>  create mode 100644 arch/arm64/crypto/polyval-ce-glue.c

Reviewed-by: Eric Biggers <ebiggers@google.com>

- Eric
Eric Biggers May 9, 2022, 9:44 p.m. UTC | #2
On Mon, May 09, 2022 at 07:11:05PM +0000, Nathan Huckleberry wrote:
> diff --git a/arch/x86/crypto/polyval-clmulni_asm.S b/arch/x86/crypto/polyval-clmulni_asm.S
[...]
> +/*
> + * Computes the product of two 128-bit polynomials at the memory locations
> + * specified by (MSG + 16*i) and (KEY_POWERS + 16*i) and XORs the components of
> + * the 256-bit product into LO, MI, HI.
> + *
> + * Given:
> + *   X = [X_1 : X_0]
> + *   Y = [Y_1 : Y_0]
> + *
> + * We compute:
> + *   LO += X_0 * Y_0
> + *   MI += (X_0 + X_1) * (Y_0 + Y_1)
> + *   HI += X_1 * Y_1

The above comment (changed in v7) is describing Karatsuba multiplication, but
the actual code is using schoolbook multiplication.

Otherwise this looks good:

Reviewed-by: Eric Biggers <ebiggers@google.com>

- Eric