Message ID | 20220129224529.76887-3-ardb@kernel.org |
---|---|
State | Superseded |
Headers | show |
Series | xor: enable auto-vectorization in Clang | expand |
On Sat, Jan 29, 2022 at 2:45 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > The ARM version of the accelerated XOR routines are simply the 8-way C > routines passed through the auto-vectorizer with SIMD codegen enabled. > This used to require GCC version 4.6 at least, but given that 5.1 is now > the baseline, this check is no longer necessary, and actually > misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as > well, but makes no attempt at doing this in a way that conveys feature > parity with a certain version of GCC (which would not be a great idea in > the first place). > > So let's drop the version check, and make the auto-vectorize pragma > (which is based on a GCC-specific command line option) GCC-only. Since > Clang performs SIMD auto-vectorization by default at -O2, no pragma is > necessary here. > > Tested-by: Nathan Chancellor <nathan@kernel.org> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Thanks for the patch! Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/496 Link: https://github.com/ClangBuiltLinux/linux/issues/503 > --- > arch/arm/lib/xor-neon.c | 12 +++--------- > 1 file changed, 3 insertions(+), 9 deletions(-) > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c > index b99dd8e1c93f..522510baed49 100644 > --- a/arch/arm/lib/xor-neon.c > +++ b/arch/arm/lib/xor-neon.c > @@ -17,17 +17,11 @@ MODULE_LICENSE("GPL"); > /* > * Pull in the reference implementations while instructing GCC (through > * -ftree-vectorize) to attempt to exploit implicit parallelism and emit > - * NEON instructions. > + * NEON instructions. Clang does this by default at O2 so no pragma is > + * needed. > */ > -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6) > +#ifdef CONFIG_CC_IS_GCC > #pragma GCC optimize "tree-vectorize" > -#else > -/* > - * While older versions of GCC do not generate incorrect code, they fail to > - * recognize the parallel nature of these functions, and emit plain ARM code, > - * which is known to be slower than the optimized ARM code in asm-arm/xor.h. > - */ > -#warning This code requires at least version 4.6 of GCC > #endif > > #pragma GCC diagnostic ignored "-Wunused-variable" > -- > 2.30.2 >
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c index b99dd8e1c93f..522510baed49 100644 --- a/arch/arm/lib/xor-neon.c +++ b/arch/arm/lib/xor-neon.c @@ -17,17 +17,11 @@ MODULE_LICENSE("GPL"); /* * Pull in the reference implementations while instructing GCC (through * -ftree-vectorize) to attempt to exploit implicit parallelism and emit - * NEON instructions. + * NEON instructions. Clang does this by default at O2 so no pragma is + * needed. */ -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6) +#ifdef CONFIG_CC_IS_GCC #pragma GCC optimize "tree-vectorize" -#else -/* - * While older versions of GCC do not generate incorrect code, they fail to - * recognize the parallel nature of these functions, and emit plain ARM code, - * which is known to be slower than the optimized ARM code in asm-arm/xor.h. - */ -#warning This code requires at least version 4.6 of GCC #endif #pragma GCC diagnostic ignored "-Wunused-variable"