Message ID | 20241125041129.192999-1-ebiggers@kernel.org |
---|---|
Series | x86: new optimized CRC functions, with VPCLMULQDQ support |

On Mon, Nov 25, 2024 at 09:33:46AM +0100, Ingo Molnar wrote:
>
> * Eric Biggers <ebiggers@kernel.org> wrote:
>
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup
> > code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is
> > set when the CPU is on this list.
> >
> > This allows other code in arch/x86/, such as the CRC library code, to
> > apply the same exclusion list when deciding whether to execute 256-bit
> > or 512-bit optimized functions.
> >
> > Note that full AVX512 support including zmm registers is still exposed
> > to userspace and is still supported for in-kernel use. This flag just
> > indicates whether in-kernel code should prefer to use ymm registers.
> >
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  arch/x86/crypto/aesni-intel_glue.c | 22 +---------------------
> >  arch/x86/include/asm/cpufeatures.h |  1 +
> >  arch/x86/kernel/cpu/intel.c        | 22 ++++++++++++++++++++++
> >  3 files changed, 24 insertions(+), 21 deletions(-)
>
> Acked-by: Ingo Molnar <mingo@kernel.org>
>
> I suppose you'd like to carry this in the crypto tree?

I am planning to carry CRC-related patches myself
(https://lore.kernel.org/lkml/20241117002244.105200-12-ebiggers@kernel.org/).

> > +/*
> > + * This is a list of Intel CPUs that are known to suffer from downclocking when
> > + * zmm registers (512-bit vectors) are used. On these CPUs, when the kernel
> > + * executes SIMD-optimized code such as cryptography functions or CRCs, it
> > + * should prefer 256-bit (ymm) code to 512-bit (zmm) code.
> > + */
>
> One speling nit, could you please do:
>
>   s/ymm/YMM
>   s/zmm/ZMM
>
> ... to make it consistent with how the rest of the x86 code is
> capitalizing the names of FPU vector register classes. Just like
> we are capitalizing CPU and CRC properly ;-)
>

Will do, thanks.

- Eric
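
The role of the new flag can be modeled with a small sketch. This is not the kernel's code: the function name, the feature strings other than PREFER_YMM, and the exact order of the checks are assumptions made for illustration only. It shows how a caller might fall back from 512-bit to 256-bit code when the CPU is on the exclusion list.

```python
# Hypothetical model of vector-width selection. Only "PREFER_YMM" corresponds
# to a flag named in the patch (X86_FEATURE_PREFER_YMM); the other feature
# strings and the order of the checks are assumptions for illustration.

def pick_crc_vector_width(cpu_features):
    """Return the vector width in bits that a CRC routine might choose.

    A return value of 0 means "no SIMD path, use the table-based code".
    """
    if "VPCLMULQDQ" in cpu_features and "AVX512" in cpu_features:
        # CPUs on the exclusion list downclock when zmm registers are used,
        # so prefer the 256-bit (ymm) variant there.
        if "PREFER_YMM" in cpu_features:
            return 256
        return 512
    if "VPCLMULQDQ" in cpu_features and "AVX2" in cpu_features:
        return 256
    if "PCLMULQDQ" in cpu_features:
        return 128
    return 0
```

Note that in this model the flag only changes the choice between the 512-bit and 256-bit paths; it does not hide AVX512 itself, which matches the commit message's statement that full AVX512 support remains available.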

On Mon, 25 Nov 2024 at 05:12, Eric Biggers <ebiggers@kernel.org> wrote:
>
> This patchset is also available in git via:
>
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git crc-x86-v1
>
> This patchset applies on top of my other recent CRC patchsets
> https://lore.kernel.org/r/20241103223154.136127-1-ebiggers@kernel.org/ and
> https://lore.kernel.org/r/20241117002244.105200-1-ebiggers@kernel.org/ .
> Consider it a preview for what may be coming next, as my priority is
> getting those two other patchsets merged first.
>
> This patchset adds a new assembly macro that expands into the body of a
> CRC function for x86 for the specified number of bits, bit order, vector
> length, and AVX level. There's also a new script that generates the
> constants needed by this function, given a CRC generator polynomial.
>
> This approach allows easily wiring up an x86-optimized implementation of
> any variant of CRC-8, CRC-16, CRC-32, or CRC-64, including full support
> for VPCLMULQDQ. On long messages the resulting functions are up to 4x
> faster than the existing PCLMULQDQ optimized functions when they exist,
> or up to 29x faster than the existing table-based functions.
>
> This patchset starts by wiring up the new macro for crc32_le,
> crc_t10dif, and crc32_be. Later I'd also like to wire up crc64_be and
> crc64_rocksoft, once the design of the library functions for those has
> been fixed to be like what I'm doing for crc32* and crc_t10dif.
>
> A similar approach of sharing code between CRC variants, and vector
> lengths when applicable, should work for other architectures. The CRC
> constant generation script should be mostly reusable.
>
> Eric Biggers (6):
>   x86: move zmm exclusion list into CPU feature flag
>   scripts/crc: add gen-crc-consts.py
>   x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
>   x86/crc32: implement crc32_le using new template
>   x86/crc-t10dif: implement crc_t10dif using new template
>   x86/crc32: implement crc32_be using new template
>

Good stuff!

Acked-by: Ard Biesheuvel <ardb@kernel.org>

Would indeed be nice to get CRC-64 implemented this way as well, so we
can use it on both x86 and arm64.
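
For readers unfamiliar with what "constants ... given a CRC generator polynomial" means: carryless-multiplication CRC code generally relies on precomputed values of the form x^k mod G(x) over GF(2). The sketch below is not taken from gen-crc-consts.py; the function name and calling convention are hypothetical, and which exponents k the real script emits is not stated in this thread. It only shows how one such constant can be derived.

```python
def pow_x_mod(k, poly, nbits):
    """Return x^k mod G(x) over GF(2) as an integer.

    Hypothetical helper, not part of gen-crc-consts.py.
    poly holds the low nbits coefficients of G (the x^nbits term is
    implied), with bit i being the coefficient of x^i.
    """
    mask = (1 << nbits) - 1
    r = 1  # the polynomial x^0
    for _ in range(k):
        # Multiply by x, then reduce if the result reached degree nbits.
        carry = (r >> (nbits - 1)) & 1
        r = (r << 1) & mask
        if carry:
            r ^= poly
    return r


# x^32 mod G for the CRC-32 generator equals G's own low 32 bits.
crc32_poly = 0x04C11DB7
x32 = pow_x_mod(32, crc32_poly, 32)
```

The loop is linear in k, which is fine for a one-off generator script; a square-and-multiply variant would be the usual choice if very large exponents were needed.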

On Fri, Nov 29, 2024 at 05:16:42PM +0100, Ard Biesheuvel wrote:
> On Mon, 25 Nov 2024 at 05:12, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > This patchset is also available in git via:
> >
> >     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git crc-x86-v1
> >
> > This patchset applies on top of my other recent CRC patchsets
> > https://lore.kernel.org/r/20241103223154.136127-1-ebiggers@kernel.org/ and
> > https://lore.kernel.org/r/20241117002244.105200-1-ebiggers@kernel.org/ .
> > Consider it a preview for what may be coming next, as my priority is
> > getting those two other patchsets merged first.
> >
> > This patchset adds a new assembly macro that expands into the body of a
> > CRC function for x86 for the specified number of bits, bit order, vector
> > length, and AVX level. There's also a new script that generates the
> > constants needed by this function, given a CRC generator polynomial.
> >
> > This approach allows easily wiring up an x86-optimized implementation of
> > any variant of CRC-8, CRC-16, CRC-32, or CRC-64, including full support
> > for VPCLMULQDQ. On long messages the resulting functions are up to 4x
> > faster than the existing PCLMULQDQ optimized functions when they exist,
> > or up to 29x faster than the existing table-based functions.
> >
> > This patchset starts by wiring up the new macro for crc32_le,
> > crc_t10dif, and crc32_be. Later I'd also like to wire up crc64_be and
> > crc64_rocksoft, once the design of the library functions for those has
> > been fixed to be like what I'm doing for crc32* and crc_t10dif.
> >
> > A similar approach of sharing code between CRC variants, and vector
> > lengths when applicable, should work for other architectures. The CRC
> > constant generation script should be mostly reusable.
> >
> > Eric Biggers (6):
> >   x86: move zmm exclusion list into CPU feature flag
> >   scripts/crc: add gen-crc-consts.py
> >   x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
> >   x86/crc32: implement crc32_le using new template
> >   x86/crc-t10dif: implement crc_t10dif using new template
> >   x86/crc32: implement crc32_be using new template
> >
>
> Good stuff!
>
> Acked-by: Ard Biesheuvel <ardb@kernel.org>
>
> Would indeed be nice to get CRC-64 implemented this way as well, so we
> can use it on both x86 and arm64.

Thanks! The template actually supports CRC-64 already (both LSB and MSB-first
variants) and I've tested it in userspace. I just haven't wired it up to the
kernel's CRC-64 functions yet.

- Eric
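
The "LSB and MSB-first variants" differ only in which end of the register the message bits enter. A bit-at-a-time reference makes the distinction concrete and is the kind of slow oracle an optimized routine can be compared against in userspace. The function names below are hypothetical, and this is a sketch, not the test code used for the patchset. Both functions take and return the raw CRC state, so any initial value or final XOR is applied by the caller.

```python
import zlib


def crc_msb_first(crc, data, poly, nbits):
    """Bitwise MSB-first CRC; poly omits the implied x^nbits term.

    Assumes nbits >= 8.
    """
    top = 1 << (nbits - 1)
    mask = (1 << nbits) - 1
    for byte in data:
        crc ^= byte << (nbits - 8)  # feed the byte into the high end
        for _ in range(8):
            if crc & top:
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    return crc


def crc_lsb_first(crc, data, reflected_poly):
    """Bitwise LSB-first CRC; reflected_poly is the bit-reversed generator."""
    for byte in data:
        crc ^= byte  # feed the byte into the low end
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ reflected_poly
            else:
                crc >>= 1
    return crc


data = b"123456789"
# Cross-check the LSB-first routine against zlib's CRC-32, which uses an
# all-ones initial value and a final inversion.
assert crc_lsb_first(0xFFFFFFFF, data, 0xEDB88320) ^ 0xFFFFFFFF == zlib.crc32(data)
```

The same two functions cover CRC-64 by passing a 64-bit generator (and `nbits=64` for the MSB-first case), since Python integers are arbitrary precision.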