crypto: x86/sha512 - load based on CPU features

Message ID 20220813230431.2666-1-elliott@hpe.com
State Accepted
Commit aa031b8f702e7941b4c86022348a366c335d389a

Commit Message

Elliott, Robert (Servers) Aug. 13, 2022, 11:04 p.m. UTC
x86-optimized crypto code built as loadable modules rather than built
into the kernel ends up as .ko files in the filesystem, e.g., in
/usr/lib/modules. If the filesystem itself is a module, these files
might not be available when the crypto API is initialized, so the
generic implementation is used instead (e.g., sha512_transform rather
than sha512_transform_avx2).

In one test case, CPU utilization in the sha512 function dropped
from 15.34% to 7.18% after forcing loading of the optimized module.

Add module aliases for this x86-optimized crypto module based on CPU
feature bits so udev gets a chance to load it later in the boot
process when the filesystems are all running.

Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/crypto/sha512_ssse3_glue.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
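
The feature-based matching described in the commit message can be pictured
with a small userspace sketch. It is only an illustration: `cpu_id_entry`,
`sketch_cpu_ids`, `match_cpu()`, `sketch_mod_init()` and the `FEAT_*` values
are hypothetical names invented here, and the CPU's features are passed in as
a plain bitmask. The kernel's x86_match_cpu() works on struct x86_cpu_id and
can match on more than a feature bit, so treat this as a simplified model of
"walk a zero-terminated table and take the first hit".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical feature numbers, for this sketch only. */
enum cpu_feature { FEAT_NONE = 0, FEAT_SSSE3 = 1, FEAT_AVX = 2, FEAT_AVX2 = 3 };

/* Hypothetical, much reduced stand-in for struct x86_cpu_id. */
struct cpu_id_entry {
	int feature;		/* FEAT_NONE terminates the table */
};

/* Same ordering as the patch: most capable feature first, empty entry last. */
static const struct cpu_id_entry sketch_cpu_ids[] = {
	{ FEAT_AVX2 },
	{ FEAT_AVX },
	{ FEAT_SSSE3 },
	{ FEAT_NONE }
};

/* Return the first entry whose feature bit is set in cpu_features, or NULL. */
static const struct cpu_id_entry *match_cpu(const struct cpu_id_entry *table,
					    unsigned int cpu_features)
{
	for (; table->feature != FEAT_NONE; table++) {
		if (cpu_features & (1u << table->feature))
			return table;
	}
	return NULL;
}

/* Mirrors the early exit added to the module init function. */
static int sketch_mod_init(unsigned int cpu_features)
{
	if (!match_cpu(sketch_cpu_ids, cpu_features))
		return -ENODEV;
	return 0;
}
```

With this model, a CPU that reports none of the three features makes
`sketch_mod_init()` return -ENODEV, while a CPU that reports any one of them
lets initialization continue.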

Comments

Tim Chen Aug. 30, 2022, 5:40 p.m. UTC | #1
On Fri, 2022-08-26 at 10:52 +0800, Herbert Xu wrote:
> On Fri, Aug 26, 2022 at 02:40:58AM +0000, Elliott, Robert (Servers) wrote:
> > Suggestion: please revert the sha512-x86 patch for a while.
> 
> This problem would have existed anyway if the module was built
> into the kernel.
> 
> > Do these functions need to break up their processing into smaller chunks
> > (e.g., a few Megabytes), calling kernel_fpu_end() periodically to 
> > allow the scheduler to take over the CPUs if needed? If so, what
> > chunk size would be appropriate?
> 
> Yes these should be limited to 4K each.  It appears that all the
> sha* helpers in arch/x86/crypto have the same problem.
> 

I think limiting each chunk to 4K is reasonable to prevent the CPU from
being hogged by the crypto code for too long.

Tim
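
As a rough illustration of the 4K limit discussed above, the following
userspace sketch splits a run of 128-byte SHA-512 blocks into chunks of at
most 4096 bytes and brackets each chunk with begin/end calls. All names in it
(`fake_fpu_begin()`, `fake_fpu_end()`, `fake_transform()`,
`process_chunked()`, `FPU_CHUNK_BYTES`) are hypothetical and invented for this
sketch; the stubs merely stand in for kernel_fpu_begin(), kernel_fpu_end() and
the assembly transform, and the code is not taken from any kernel patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cap on bytes processed per FPU section (the 4K from the thread). */
#define FPU_CHUNK_BYTES		4096u
#define SHA512_BLOCK_BYTES	128u

/* Userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static unsigned int fpu_sections;	/* number of begin/end pairs entered */
static int fpu_active;			/* 1 while inside a section */

static void fake_fpu_begin(void)
{
	fpu_active = 1;
	fpu_sections++;
}

static void fake_fpu_end(void)
{
	fpu_active = 0;
}

/* Stand-in for the SIMD transform: it only counts the blocks it is given. */
static size_t blocks_done;

static void fake_transform(const uint8_t *data, size_t blocks)
{
	(void)data;
	blocks_done += blocks;
}

/* Process whole blocks, leaving the FPU section after every 4K of input. */
static void process_chunked(const uint8_t *data, size_t blocks)
{
	const size_t max_blocks = FPU_CHUNK_BYTES / SHA512_BLOCK_BYTES;

	while (blocks) {
		size_t n = blocks < max_blocks ? blocks : max_blocks;

		fake_fpu_begin();
		fake_transform(data, n);
		fake_fpu_end();	/* point where the scheduler could step in */

		data += n * SHA512_BLOCK_BYTES;
		blocks -= n;
	}
}
```

With these constants each section covers at most 32 blocks, so a 100-block
input is handled as four sections of 32, 32, 32 and 4 blocks.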

Patch

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 30e70f4fe2f7..6d3b85e53d0e 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -36,6 +36,7 @@ 
 #include <linux/types.h>
 #include <crypto/sha2.h>
 #include <crypto/sha512_base.h>
+#include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 
 asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
@@ -284,6 +285,13 @@  static int register_sha512_avx2(void)
 			ARRAY_SIZE(sha512_avx2_algs));
 	return 0;
 }
+static const struct x86_cpu_id module_cpu_ids[] = {
+	X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
+	X86_MATCH_FEATURE(X86_FEATURE_AVX, NULL),
+	X86_MATCH_FEATURE(X86_FEATURE_SSSE3, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
 
 static void unregister_sha512_avx2(void)
 {
@@ -294,6 +302,8 @@  static void unregister_sha512_avx2(void)
 
 static int __init sha512_ssse3_mod_init(void)
 {
+	if (!x86_match_cpu(module_cpu_ids))
+		return -ENODEV;
 
 	if (register_sha512_ssse3())
 		goto fail;