diff mbox

[ARM] Use vector wide add for mixed-mode adds

Message ID 566995BE.8040206@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Dec. 10, 2015, 3:09 p.m. UTC
Hi Michael,

A few comments while I look deeper into this patch...

On 30/11/15 01:18, Michael Collison wrote:
>

> This is a modified version of my previous patch that supports vector wide add. I added support for vaddw on big endian when generating the parallel operand for the vector select.

>

> There are four failing test cases on arm big endian with similar code. They are:

>

> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test

> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test

> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test

> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

>

>

> The failures occur without my patch and are related to a bug with vector loads using VUZP operations.

>

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

>

> Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

>

> 2015-11-29  Michael Collison  <michael.collison@linaro.org>

>

>     * config/arm/neon.md (widen_<us>sum<mode>): New patterns where

>     mode is VQI to improve mixed mode vectorization.

>     * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New

>     define_insn to match low half of signed vaddw.

>     * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New

>     define_insn to match high half of signed vaddw.

>     * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New

>     define_insn to match low half of unsigned vaddw.

>     * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New

>     define_insn to match high half of unsigned vaddw.

>     * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.

>     (aarch32_simd_check_vect_par_cnst_half): Likewise.

>     * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype

>     for new function.

>     (aarch32_simd_check_vect_par_cnst_half): Likewise.

>     * config/arm/predicates.md (vect_par_constant_high): Support

>     big endian and simplify by calling

>     aarch32_simd_check_vect_par_cnst_half

>     (vect_par_constant_low): Likewise.

>     * testsuite/gcc.target/arm/neon-vaddws16.c: New test.

>     * testsuite/gcc.target/arm/neon-vaddws32.c: New test.

>     * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.

>     * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.

>     * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.

>     * testsuite/lib/target-supports.exp

>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate

>     that arm neon support vector widen sum of HImode TO SImode.

>

> Okay for trunk?

>
diff mbox

Patch

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@  extern tree arm_builtin_decl (unsigned code, bool initialize_p
  			      ATTRIBUTE_UNUSED);
  extern void arm_init_builtins (void);
  extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+						   bool high);


Please use arm instead of aarch32 in the name to be consistent with the rest of the
backend. Also, for functions that return a bool without side-effects it's preferable
to finish their name with '_p'. So for the second one I'd drop the 'check' and call
it something like "arm_vector_of_lane_nums_p ", is that a more descriptive name?

+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+

aarch64?

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@ 
  
  ;; Widening operations
  
+(define_expand "widen_ssum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1  = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2  = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }

Please format these properly to avoid long lines.
Thanks,
Kyrill