Message ID: 1441787488-19661-1-git-send-email-james.greenhalgh@arm.com
State: New
On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>

Does this actually fix an existing testcase?

> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
>         as an expand.
>         (vec_unpacks_hi_v4sf): Likewise.
>         (aarch64_float_extend_lo_v2df): Rename to...
>         (aarch64_fcvtl_v4sf): This.
>         (aarch64_fcvtl2_v4sf): New.
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
the same issue as my patch 12 "with BE RTL fix"?
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
r227551 last night and since this reparameterizes the patterns I don't think
your patch will apply to current HEAD.

If my patch is wrong...well, that may be, I haven't understood the issue yet.
But it sounds like the first thing we need is a decent testcase? (Or is the
confusion just in the RTL representation, so that a testcase would require
getting constant-folding to happen in RTL, which I tried, but failed, to make
happen myself?)

--Alan

On 09/09/15 09:31, James Greenhalgh wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
>         as an expand.
>         (vec_unpacks_hi_v4sf): Likewise.
>         (aarch64_float_extend_lo_v2df): Rename to...
>         (aarch64_fcvtl_v4sf): This.
>         (aarch64_fcvtl2_v4sf): New.
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
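A minimal execution test along the following lines would exercise
vec_unpacks_lo_v4sf and vec_unpacks_hi_v4sf through the vectorizer. This is a
hypothetical sketch rather than a testcase from the thread, and the names in
it are made up; built with -O2 -ftree-vectorize for both aarch64 and
aarch64_be, it should abort if the low/high halves of the widened vectors
come back in the wrong lane order.

/* Hypothetical sketch of such a testcase (not from the thread): a
   float -> double widening loop which the vectorizer is expected to
   implement with vec_unpacks_lo_v4sf/vec_unpacks_hi_v4sf.  */

extern void abort (void);

#define N 64

float in[N];
double out[N];

void __attribute__ ((noinline))
widen (void)
{
  /* Expected to vectorize: V4SF loads widened to pairs of V2DF halves.  */
  for (int i = 0; i < N; i++)
    out[i] = (double) in[i];
}

int
main (void)
{
  for (int i = 0; i < N; i++)
    in[i] = (float) i;

  widen ();

  /* The volatile induction variable keeps this check loop scalar, so a
     lane-ordering bug in the widening above is not masked by the same
     bug appearing here.  */
  for (volatile int i = 0; i < N; i++)
    if (out[i] != (double) i)
      abort ();

  return 0;
}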
On 09/09/15 11:31, Alan Lawrence wrote:
> Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
> the same issue as my patch 12 "with BE RTL fix"?
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
> r227551 last night and since this reparameterizes the patterns I don't think
> your patch will apply to current HEAD.
>
> If my patch is wrong...well, that may be, I haven't understood the issue yet.

In particular, we should expect the vec_unpacks standard pattern to have
different behaviour (from a tree POV), as this is what I find searching for
VEC_UNPACK in tree-vect-stmts.c:

bool
supportable_widening_operation
{
  ...
  switch (code)
    {
    ...
    CASE_CONVERT:
      c1 = VEC_UNPACK_LO_EXPR;
      c2 = VEC_UNPACK_HI_EXPR;
      break;

    case FLOAT_EXPR:
      c1 = VEC_UNPACK_FLOAT_LO_EXPR;
      c2 = VEC_UNPACK_FLOAT_HI_EXPR;
      break;

    case FIX_TRUNC_EXPR:
      /* ??? Not yet implemented due to missing
         VEC_UNPACK_FIX_TRUNC_HI_EXPR/VEC_UNPACK_FIX_TRUNC_LO_EXPR tree codes
         and optabs used for computing the operation.  */
      return false;

    default:
      gcc_unreachable ();
    }

  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    std::swap (c1, c2);

Yes, IIUC this goes against the principle of tree being the same regardless of
underlying endianness.

--Alan
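To make the consequence of that std::swap concrete, here is a small
standalone sketch (not taken from the thread; it merely restates the quoted
CASE_CONVERT logic with the tree codes reduced to strings) showing which code
each endianness ends up with:

#include <stdio.h>

/* Rough restatement of the quoted logic: c1 starts as the LO code and c2 as
   the HI code, and big-endian targets swap them.  */
static void
pick_codes (int bytes_big_endian)
{
  const char *c1 = "VEC_UNPACK_LO_EXPR";
  const char *c2 = "VEC_UNPACK_HI_EXPR";

  if (bytes_big_endian)   /* The std::swap (c1, c2) quoted above.  */
    {
      const char *tmp = c1;
      c1 = c2;
      c2 = tmp;
    }

  printf ("%s-endian: c1 = %s, c2 = %s\n",
          bytes_big_endian ? "big" : "little", c1, c2);
}

int
main (void)
{
  pick_codes (0);   /* little-endian: c1 = LO, c2 = HI */
  pick_codes (1);   /* big-endian:    c1 = HI, c2 = LO */
  return 0;
}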
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 75fa0ab..c7ae956 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,39 +1691,65 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
+(define_insn "aarch64_float_extend_lo_v2df"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
        (float_extend:V2DF
-         (vec_select:V2SF
-           (match_operand:V4SF 1 "register_operand" "w")
-           (parallel [(const_int 0) (const_int 1)])
-         )))]
+         (match_operand:V2SF 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
+(define_insn "aarch64_fcvtl_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
        (float_extend:V2DF
-         (match_operand:V2SF 1 "register_operand" "w")))]
+         (vec_select:V2SF
+           (match_operand:V4SF 1 "register_operand" "w")
+           (match_operand:V4SF 2 "vect_par_cnst_lo_half" ""))))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
+(define_insn "aarch64_fcvtl2_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
        (float_extend:V2DF
          (vec_select:V2SF
            (match_operand:V4SF 1 "register_operand" "w")
-           (parallel [(const_int 2) (const_int 3)])
-         )))]
+           (match_operand:V4SF 2 "vect_par_cnst_hi_half" ""))))]
   "TARGET_SIMD"
   "fcvtl2\\t%0.2d, %1.4s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+(define_expand "vec_unpacks_lo_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, false);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+                               ? gen_aarch64_fcvtl2_v4sf
+                               : gen_aarch64_fcvtl_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
+(define_expand "vec_unpacks_hi_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, true);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+                               ? gen_aarch64_fcvtl_v4sf
+                               : gen_aarch64_fcvtl2_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
 ;; Float narrowing operations.
 
 (define_insn "aarch64_float_truncate_lo_v2sf"
@@ -1735,17 +1761,42 @@
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_v4sf"
+(define_insn "aarch64_float_truncate_hi_v4sf_le"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
     (vec_concat:V4SF
       (match_operand:V2SF 1 "register_operand" "0")
       (float_truncate:V2SF
        (match_operand:V2DF 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.4s, %2.2d"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_v4sf_be"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+    (vec_concat:V4SF
+      (float_truncate:V2SF
+       (match_operand:V2DF 2 "register_operand" "w"))
+      (match_operand:V2SF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.4s, %2.2d"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_v4sf"
+  [(match_operand:V4SF 0 "register_operand" "=w")
+   (match_operand:V2SF 1 "register_operand" "0")
+   (match_operand:V2DF 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+                               ? gen_aarch64_float_truncate_hi_v4sf_be
+                               : gen_aarch64_float_truncate_hi_v4sf_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
        (vec_concat:V4SF