Message ID | 55FBD3B4.9050709@arm.com |
---|---|
State | New |
Headers | show |
Hi Christian, On 21/09/15 14:43, Christian Bruel wrote: > Hi Kyrill, > > Thanks for your comments. Answers interleaved and the new patch attached. > > On 09/18/2015 11:04 AM, Kyrill Tkachov wrote: >> On 15/09/15 11:47, Christian Bruel wrote: >>> On 09/14/2015 04:30 PM, Christian Bruel wrote: >>>> Finally, the final part of the patch set does the attribute target >>>> parsing and checking, redefines the preprocessor macros and implements >>>> the inlining rules. >>>> >>>> testcases and documentation included. >>>> >>> new version to remove a shadowed remnant piece of code. >>> >>> >>> > thanks >>> > >>> > Christian >>> > >> + /* OK to inline between different modes. >> + Function with mode specific instructions, e.g using asm, >> + must be explicitely protected with noinline. */ >> >> s/explicitely/explicitly/ >> > thanks > >> + const struct arm_fpu_desc *fpu_desc1 >> + = &all_fpus[caller_opts->x_arm_fpu_index]; >> + const struct arm_fpu_desc *fpu_desc2 >> + = &all_fpus[callee_opts->x_arm_fpu_index]; >> >> Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way > ok > >> + >> + /* Can't inline NEON extension if the caller doesn't support it. */ >> + if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON) >> + && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON)) >> + return false; >> + >> + /* Can't inline CRYPTO extension if the caller doesn't support it. */ >> + if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO) >> + && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO)) >> + return false; >> + >> >> We also need to take into account FPU_FL_FP16... >> In general what we want is for the callee FPU features to be >> a subset of the callers features, similar to the way we handle >> the x_aarch64_isa_flags handling in aarch64_can_inline_p from the >> aarch64 port. I think that's the way to go here rather than explicitly >> writing down a check for each feature. > ok, with FL_FP16 now, > >> @@ -242,6 +239,8 @@ >> >> /* Update macros. */ >> gcc_assert (cur_opt->x_target_flags == target_flags); >> + /* This one can be redefined by the pragma without warning. */ >> + cpp_undef (parse_in, "__ARM_FP"); >> arm_cpu_builtins (parse_in); >> >> Could you elaborate why the cpp_undef here? >> If you want to undefine __ARM_FP so you can redefine it to a new value >> in arm_cpu_builtins then I think you should just undefine it in that function. > This is to avoid a warning: "__ARM_FP" redefined when creating a new > pragma scope. (See the test attr-crypto.c). > > We cannot call the cpp_undef inside arm_cpu_builtins, because it is also > used for the TARGET_CPU_CPP_BUILTINS hook and then would prevent real > illegitimate redefinitions. > > Alternatively, I thought to reset the warn_builtin_macro_redefined flag, > but that doesn't work as the macro is not NODE_BUILTIN (see the > definition of warn_of_redefinition in libcpp). > We might need to change this later : should target macros be marked as > NOTE_BUILTIN ? We can discuss this separately (I can open a defect) as > we have the cpp_undep solution for now, if you agree. > >> >> diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi >> --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi 2015-09-10 12:21:00.698911244 +0200 >> +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi 2015-09-14 10:27:20.281932581 +0200 >> @@ -13360,6 +13363,8 @@ >> floating-point arithmetic (in particular denormal values are treated as >> zero), so the use of NEON instructions may lead to a loss of precision. >> >> +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}). >> + >> >> s/"mfpu="/"fpu=" >> > thanks > >> --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c 1970-01-01 01:00:00.000000000 +0100 >> +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c 2015-09-14 16:12:08.449698268 +0200 >> @@ -0,0 +1,26 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target arm_neon_ok } */ >> +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */ >> + >> +void >> +f3(int n, int x[], int y[]) { >> + int i; >> + for (i = 0; i < n; ++i) >> + y[i] = x[i] << 3; >> +} >> + >> >> What if GCC has been configured with --with-fpu=neon? >> Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options. > Ah yes. I've added ((target("fpu=vfp")) instead, since we are testing > the attribute. > 2015-05-26 Christian Bruel<christian.bruel@st.com> PR target/65837 * config/arm/arm-c.c (arm_cpu_builtins): Set or reset __ARM_FEATURE_CRYPTO, __VFP_FP__, __ARM_NEON__ (arm_pragma_target_parse): Change check for arm_cpu_builtins. undefine __ARM_FP. * config/arm/arm.c (arm_can_inline_p): Check FPUs. (arm_valid_target_attribute_rec): Handle -mfpu attribute target. * doc/invoke.texi (-mfpu=): Mention attribute and pragma. * doc/extend.texi (-mfpu=): Describe attribute. 2015-09-14 Christian Bruel<christian.bruel@st.com> PR target/65837 gcc.target/arm/lto/pr65837_0.c gcc.target/arm/attr-neon2.c gcc.target/arm/attr-neon.c gcc.target/arm/attr-neon-builtin-fail.c gcc.target/arm/attr-crypto.c The parts in this patch look ok to me. However, I think we need some more functionality In aarch64 we support compiling a file with no simd, including arm_neon.h and using arm_neon.h intrinsics within functions tagged with simd support. We want to support such functionality on arm i.e. compile a file with -mfpu=vfp and use arm_neon.h intrinsics in a function tagged with an fpu=neon attribute. For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, like in the aarch64 version of arm_neon.h Thanks, Kyrill
Hi Christian, On 12/11/15 14:54, Christian Bruel wrote: > Hi Kyril, > >> ... >> The parts in this patch look ok to me. >> However, I think we need some more functionality >> In aarch64 we support compiling a file with no simd, including arm_neon.h and using arm_neon.h intrinsics >> within functions tagged with simd support. >> We want to support such functionality on arm i.e. compile a file with -mfpu=vfp and use arm_neon.h intrinsics >> in a function tagged with an fpu=neon attribute. >> For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, like in the aarch64 version of arm_neon.h > > As discussed, here is arm_neon.h for aarch32/neon with the same programming model than aarch64/simd. As you said lets use one of the fpu=neon attributes even if the file is compiled with -mfpu=vfp. > > The drawback for this is that now we unconditionally makes available every neon intrinsics, introducing a small legacy change with regards to error checking (that you didn't have with aarch64). Then it's worth to stress that: > > - One cannot check #include "arm_neon.h" to check if the compiler can use neon instruction. Instead use #ifndef __ARM_NEON__. (Found in target-supports.exp) Checking the macro is the 'canonical' way to check for NEON support, so I reckon we can live with that. > > > - Types cannot be checked. For instance: > > #include <arm_neon.h> > > poly128_t > foo (poly128_t* ptr) > { > return vldrq_p128 (ptr); > } > > compiled with -mfpu=neon used to be rejected with > > error: unknown type name 'poly128_t' ... > > Now the error, as a side effect from the inlining rules between incompatible modes, becomes > > error: inlining failed in call to always_inline 'vldrq_p128': target specific option mismatch ... Well, the previous message is misleading anyway since the user error there is not a type issue but failure to specify the correct -mfpu option. > > I found this more confusing, so I was a little bit reluctant to implement this, but the code is correctly rejected and the message makes sense, after all. Just a different check. > > This patch applies on top of the preceding attribute/pragma target fpu= series. Tested with arm-none-eabi configured with default and --with-cpu=cortex-a9 --with-fp --with-float=hard Do you mean --with-fpu=<something>? > > Also fixes a few macro that depends on fpu=, that I forgot to redefine. Can you please split those changes into a separate patch and ChangeLog and commit the separately? That part is preapproved. This patch is ok then with above comment about splitting the arm-c.c changes separately. Thanks for doing this! I believe all patches in this series are approved then so you can go ahead and start committing. Kyrill > > Christian >
diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi 2015-09-10 12:21:00.698911244 +0200 +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi 2015-09-14 10:27:20.281932581 +0200 @@ -13360,6 +13363,8 @@ floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision. +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}). + s/"mfpu="/"fpu=" --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c 2015-09-14 16:12:08.449698268 +0200 @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */ + +void +f3(int n, int x[], int y[]) { + int i; + for (i = 0; i < n; ++i) + y[i] = x[i] << 3; +} + What if GCC has been configured with --with-fpu=neon? Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.