| Message ID | 1441893770-27517-1-git-send-email-james.greenhalgh@arm.com |
|---|---|
| State | Superseded |
On 10 September 2015 at 16:02, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select.  This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat.  That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?

The testcase should be modified so that it is skipped on arm* targets.

Christophe.

> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
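For readers of the testcase in the patch below, the lane semantics it pins down can be summarised by a small scalar model. This is an editorial sketch, not part of the patch, and the helper name is invented: the 64-bit low-half argument supplies result lanes 0 and 1, while the narrowed 128-bit argument supplies lanes 2 and 3, where lane numbers are the ACLE/intrinsic lane numbers and do not change with endianness. The bug discussed above is that GCC's internal RTL did not preserve this mapping on big-endian.

```c
/* Editorial sketch only: a scalar reference model of vcvt_high_f32_f64.
   The helper name is hypothetical; lane numbers are ACLE lane numbers.  */
#include <arm_neon.h>

static float32x4_t
vcvt_high_f32_f64_model (float32x2_t lo, float64x2_t wide)
{
  float ref[4];
  ref[0] = vget_lane_f32 (lo, 0);             /* low half copied through  */
  ref[1] = vget_lane_f32 (lo, 1);
  ref[2] = (float) vgetq_lane_f64 (wide, 0);  /* narrowed into lanes 2..3 */
  ref[3] = (float) vgetq_lane_f64 (wide, 1);
  return vld1q_f32 (ref);
}
```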
[Resending in plain text]

This makes sense to me now, although I find your comment slightly confusing:

[....] in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector

You mean vec_unpacks_{hi,lo} (which seems to go against the *architectural*
bit after this), or hi/lo in cases other than vec_unpack (=> not "always"),
or something else?  Maybe s/always/usually/ or s/always/otherwise/?

Cheers, Alan

On 10 September 2015 at 15:02, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select.  This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat.  That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a4eaeca..8be9b97 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1703,6 +1703,13 @@
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
+;; is inconsistent with vector ordering elsewhere in the compiler, in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector.  Thus, while the patterns below look incorrect in that
+;; vec_unpacks_hi always extracts the high *architectural* lanes of a
+;; vector, their behaviour is as required.
+
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "")
    (match_operand:VQ_HSF 1 "register_operand" "")]
@@ -1757,17 +1764,42 @@
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_<Vdbl>"
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_le"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
     (vec_concat:<VDBL>
       (match_operand:VDF 1 "register_operand" "0")
       (float_truncate:VDF
        (match_operand:<VWIDE> 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_be"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+    (vec_concat:<VDBL>
+      (float_truncate:VDF
+       (match_operand:<VWIDE> 2 "register_operand" "w"))
+      (match_operand:VDF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_<Vdbl>"
+  [(match_operand:<VDBL> 0 "register_operand" "=w")
+   (match_operand:VDF 1 "register_operand" "0")
+   (match_operand:<VWIDE> 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+                             ? gen_aarch64_float_truncate_hi_<Vdbl>_be
+                             : gen_aarch64_float_truncate_hi_<Vdbl>_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
new file mode 100644
index 0000000..492d6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
@@ -0,0 +1,97 @@
+#include "arm_neon.h"
+#include <inttypes.h>
+
+void abort (void);
+
+void
+foo (void)
+{
+  /* Test vcvt_high_f32_f64.  */
+  float32x2_t arg1;
+  float64x2_t arg2;
+  float32x4_t result;
+  arg1 = vcreate_f32 (UINT64_C (0x3f0db5793f6e1892));
+  arg2 = vcombine_f64 (vcreate_f64 (UINT64_C (0x3fe8e49d23fb575d)),
+                       vcreate_f64 (UINT64_C (0x3fd921291b3df73e)));
+  // Expect: "result" = 3ec909483f4724e93f0db5793f6e1892
+  result = vcvt_high_f32_f64 (arg1, arg2);
+  float32_t got;
+  float32_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f32 (result, 0);
+  exp = ((float32_t) 0.9300624132156372);
+  if (((((exp / got) < ((float32_t) 0.999))
+        || ((exp / got) > ((float32_t) 1.001)))
+       && (((exp - got) < ((float32_t) -1.0e-4))
+           || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f32 (result, 1);
+  exp = ((float32_t) 0.5535503029823303);
+  if (((((exp / got) < ((float32_t) 0.999))
+        || ((exp / got) > ((float32_t) 1.001)))
+       && (((exp - got) < ((float32_t) -1.0e-4))
+           || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 2);
+  exp = ((float32_t) 0.7779069617051665);
+  if (((((exp / got) < ((float32_t) 0.999))
+        || ((exp / got) > ((float32_t) 1.001)))
+       && (((exp - got) < ((float32_t) -1.0e-4))
+           || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 3.  */
+  got = vgetq_lane_f32 (result, 3);
+  exp = ((float32_t) 0.3926489606891329);
+  if (((((exp / got) < ((float32_t) 0.999))
+        || ((exp / got) > ((float32_t) 1.001)))
+       && (((exp - got) < ((float32_t) -1.0e-4))
+           || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+}
+
+void
+bar (void)
+{
+  /* Test vcvt_high_f64_f32.  */
+  float32x4_t arg1;
+  float64x2_t result;
+  arg1 = vcombine_f32 (vcreate_f32 (UINT64_C (0x3f7c5cf13f261f74)),
+                       vcreate_f32 (UINT64_C (0x3e3a7bc03f6ccc1d)));
+  // Expect: "result" = 3fc74f78000000003fed9983a0000000
+  result = vcvt_high_f64_f32 (arg1);
+
+  float64_t got;
+  float64_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 0);
+  exp = 0.9249895215034485;
+  if (((((exp / got) < 0.999)
+        || ((exp / got) > 1.001))
+       && (((exp - got) < -1.0e-4)
+           || ((exp - got) > 1.0e-4))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f64 (result, 1);
+  exp = 0.1821126937866211;
+  if (((((exp / got) < 0.999)
+        || ((exp / got) > 1.001))
+       && (((exp - got) < -1.0e-4)
+           || ((exp - got) > 1.0e-4))))
+    abort ();
+}
+
+int
+main (int argc, char **argv)
+{
+  foo ();
+  bar ();
+  return 0;
+}
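Christophe's review comment above asks for the new test to be skipped on arm* targets, since the float64x2_t intrinsics it uses exist only on AArch64. As an illustrative sketch only (the directives in the committed test may differ), one way to express that with DejaGnu in the shared advsimd-intrinsics directory would be to add the following at the top of vcvt_high_1.c:

```c
/* Illustrative, not from the posted patch: run the test, but skip it when
   the shared advsimd-intrinsics harness runs it for 32-bit ARM targets.  */
/* { dg-do run } */
/* { dg-skip-if "AArch64-only intrinsics" { arm*-*-* } } */
```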