Message ID | 20220813230431.2666-1-elliott@hpe.com |
---|---|
State | Accepted |
Commit | aa031b8f702e7941b4c86022348a366c335d389a |
Series | crypto: x86/sha512 - load based on CPU features |
On Fri, 2022-08-26 at 10:52 +0800, Herbert Xu wrote:
> On Fri, Aug 26, 2022 at 02:40:58AM +0000, Elliott, Robert (Servers) wrote:
> > Suggestion: please revert the sha512-x86 patch for a while.
>
> This problem would have existed anyway if the module was built
> into the kernel.
>
> > Do these functions need to break up their processing into smaller chunks
> > (e.g., a few Megabytes), calling kernel_fpu_end() periodically to
> > allow the scheduler to take over the CPUs if needed? If so, what
> > chunk size would be appropriate?
>
> Yes these should be limited to 4K each. It appears that all the
> sha* helpers in arch/x86/crypto have the same problem.
>

I think limiting chunk to 4K is reasonable to prevent the CPU from
being hogged by the crypto code for too long.

Tim
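The chunking idea discussed above can be modeled in userspace roughly as follows. This is only a sketch of the general approach, not the fix that was eventually merged: `fake_fpu_begin()`, `fake_fpu_end()`, `fake_transform()`, `chunked_update()` and `FPU_CHUNK_BYTES` are hypothetical names, with the first two standing in for `kernel_fpu_begin()` and `kernel_fpu_end()`.

```c
#include <stddef.h>

/* The 4K limit suggested in the thread. */
#define FPU_CHUNK_BYTES 4096u

/* Hypothetical userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end().
 * The counter records how many separate FPU sections were opened. */
static unsigned int fpu_sections;
static void fake_fpu_begin(void) { fpu_sections++; }
static void fake_fpu_end(void) { }

/* Hypothetical stand-in for an optimized transform; it only counts bytes. */
static size_t bytes_processed;
static void fake_transform(const unsigned char *data, size_t len)
{
	(void)data;
	bytes_processed += len;
}

/* Process the input in pieces of at most FPU_CHUNK_BYTES, closing the
 * FPU section between pieces so the scheduler could run in the gaps. */
static void chunked_update(const unsigned char *data, size_t len)
{
	while (len) {
		size_t n = len < FPU_CHUNK_BYTES ? len : FPU_CHUNK_BYTES;

		fake_fpu_begin();
		fake_transform(data, n);
		fake_fpu_end();	/* in the kernel, preemption becomes possible here */

		data += n;
		len -= n;
	}
}
```

With a 10000-byte buffer, this opens three FPU sections (4096 + 4096 + 1808 bytes) instead of one long one. A real implementation would also need to respect the hash block size when splitting, which this sketch ignores.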
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 30e70f4fe2f7..6d3b85e53d0e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -36,6 +36,7 @@
 #include <linux/types.h>
 #include <crypto/sha2.h>
 #include <crypto/sha512_base.h>
+#include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 
 asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
@@ -284,6 +285,13 @@ static int register_sha512_avx2(void)
 			ARRAY_SIZE(sha512_avx2_algs));
 	return 0;
 }
+static const struct x86_cpu_id module_cpu_ids[] = {
+	X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+	X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+	X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
 
 static void unregister_sha512_avx2(void)
 {
@@ -294,6 +302,8 @@ static void unregister_sha512_avx2(void)
 
 static int __init sha512_ssse3_mod_init(void)
 {
+	if (!x86_match_cpu(module_cpu_ids))
+		return -ENODEV;
 
 	if (register_sha512_ssse3())
 		goto fail;
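The init-time gate in the diff above follows a common pattern: a sentinel-terminated table of acceptable CPU features, and an early `-ENODEV` return when none match. A minimal userspace model of that pattern, assuming made-up feature bits (`FEAT_*`) and hypothetical helpers `match_cpu()` and `mod_init()` in place of `X86_FEATURE_*`, `x86_match_cpu()` and the real module init function, might look like this:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical feature bits; the kernel uses X86_FEATURE_* values instead. */
#define FEAT_SSSE3 0x1u
#define FEAT_AVX   0x2u
#define FEAT_AVX2  0x4u

/* Simplified stand-in for struct x86_cpu_id; feature == 0 ends the table. */
struct cpu_id {
	unsigned int feature;
};

static const struct cpu_id module_cpu_ids[] = {
	{ FEAT_AVX2 },
	{ FEAT_AVX },
	{ FEAT_SSSE3 },
	{ 0 }
};

/* Return the first table entry whose feature the CPU has, or NULL. */
static const struct cpu_id *match_cpu(const struct cpu_id *ids,
				      unsigned int cpu_features)
{
	for (; ids->feature; ids++)
		if (cpu_features & ids->feature)
			return ids;
	return NULL;
}

/* Mirror of the gate added to the init function: refuse to load
 * when the CPU has none of the listed features. */
static int mod_init(unsigned int cpu_features)
{
	if (!match_cpu(module_cpu_ids, cpu_features))
		return -ENODEV;
	return 0;
}
```

In the real patch the same table also feeds `MODULE_DEVICE_TABLE(x86cpu, ...)`, which is what generates the module aliases; this sketch only models the runtime check.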
x86 optimized crypto modules built as modules rather than built-in
to the kernel end up as .ko files in the filesystem, e.g., in
/usr/lib/modules. If the filesystem itself is a module, these might
not be available when the crypto API is initialized, resulting in
the generic implementation being used (e.g., sha512_transform rather
than sha512_transform_avx2).

In one test case, CPU utilization in the sha512 function dropped
from 15.34% to 7.18% after forcing loading of the optimized module.

Add module aliases for this x86 optimized crypto module based on CPU
feature bits so udev gets a chance to load them later in the boot
process when the filesystems are all running.

Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/crypto/sha512_ssse3_glue.c | 10 ++++++++++
 1 file changed, 10 insertions(+)