Message ID | 87bmkyxg9d.fsf@linaro.org |
---|---|
State | New |
Series | [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab |
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> SVE needs a way of broadcasting a scalar to a variable-length vector.
> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree
> equivalent of the existing rtl code VEC_DUPLICATE.
>
> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
> to mark constant nodes, but in response to last year's RFC, Richard B.
> suggested it would be better to have separate codes for the constant
> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
> as a normal unary operation and avoids the previous need for treating
> it as a GIMPLE_SINGLE_RHS.
>
> It might make sense to use VEC_DUPLICATE_CST for all duplicated
> vector constants, since it's a bit more compact than VECTOR_CST
> in that case, and is potentially more efficient to process.
> However, the nice thing about keeping it restricted to variable-length
> vectors is that there is then no need to handle combinations of
> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
> VECTOR_CST or never use it.
>
> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);

VEC_DUPLICATE_EXPR handling? Looks like for VEC_DUPLICATE_CST it could directly
return true. I didn't see uniform_vector_p being updated?
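For VEC_DUPLICATE_CST the uniformity check needs no element scan, which is the shortcut asked about above. A minimal sketch in plain C of how such a check can return the shared element directly. The toy_* names and the int element type are hypothetical stand-ins, not GCC's tree API; the tree.c hunk of the patch adds an early return of this shape to uniform_vector_p.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the two constant tree codes; NOT the real tree API.  */
enum toy_code { TOY_VECTOR_CST, TOY_VEC_DUPLICATE_CST };

struct toy_vec
{
  enum toy_code code;
  size_t nelts;     /* Number of stored elements; 1 for a duplicate.  */
  const int *elts;  /* The stored element values.  */
};

/* Return a pointer to the common element if every element of VEC is
   equal, otherwise NULL.  This mirrors the shape of uniform_vector_p.  */
static const int *
toy_uniform_vector_p (const struct toy_vec *vec)
{
  assert (vec->nelts > 0);

  /* A duplicate constant is uniform by construction: no scan needed.  */
  if (vec->code == TOY_VEC_DUPLICATE_CST)
    return &vec->elts[0];

  /* An explicit element list has to be compared element by element.  */
  for (size_t i = 1; i < vec->nelts; ++i)
    if (vec->elts[i] != vec->elts[0])
      return NULL;
  return &vec->elts[0];
}
```

In this model a caller only has to test the result against NULL, whichever representation it was handed.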
Can you add verification to either verify_expr or build_vec_duplicate_cst
that the type is one of variable size? And amend tree.def docs
accordingly. Because otherwise we miss a lot of cases in constant
folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).

Otherwise looks ok to me.

Thanks,
Richard.

>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>            Alan Hayward <alan.hawyard@arm.com>
>            David Sherwood <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
>         (VEC_COND_EXPR): Add missing @tindex.
>         * doc/md.texi (vec_duplicate@var{m}): Document.
>         * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
>         * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
>         are used for VEC_DUPLICATE_CST as well.
>         (tree_vector): Access base.n.nelts directly.
>         * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
>         valid codes.
>         (VEC_DUPLICATE_CST_ELT): New macro.
>         (build_vec_duplicate_cst): Declare.
>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>         (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
>         (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
>         (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
>         (build_vec_duplicate_cst): New function.
>         (uniform_vector_p): Handle the new codes.
>         (test_vec_duplicate_predicates_int): New function.
>         (test_vec_duplicate_predicates_float): Likewise.
>         (test_vec_duplicate_predicates): Likewise.
>         (tree_c_tests): Call test_vec_duplicate_predicates.
>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
>         * gimple-expr.h (is_gimple_constant): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * graphite-isl-ast-to-gimple.c
>         (translate_isl_ast_to_gimple::is_constant): Likewise.
>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>         (func_checker::compare_operand): Likewise.
>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>         * match.pd (negate_expr_p): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * tree-chkp.c (chkp_find_bounds_1): Likewise.
>         * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
>         * tree-ssa-loop.c (for_each_index): Likewise.
>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>         (ao_ref_init_from_vn_reference): Likewise.
>         * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
>         * varasm.c (const_hash_1, compare_constant): Likewise.
>         * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
>         (fold_convert_const, operand_equal_p, fold_view_convert_expr)
>         (exact_inverse, fold_checksum_tree): Likewise.
>         (const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
>         (test_vec_duplicate_folding): New function.
>         (fold_const_c_tests): Call it.
>         * optabs.def (vec_duplicate_optab): New optab.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
>         * optabs.h (expand_vector_broadcast): Declare.
>         * optabs.c (expand_vector_broadcast): Make non-static. Try using
>         vec_duplicate_optab.
>         * expr.c (store_constructor): Try using vec_duplicate_optab for
>         uniform vectors.
>         (const_vector_element): New function, split out from...
>         (const_vector_from_tree): ...here.
>         (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
>         (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
>         * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
>         instead of checking for VECTOR_CST.
>         * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
>         (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
>         * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
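The concern about mixing VEC_DUPLICATE_CST and VECTOR_CST in constant folding can be illustrated with a toy model. The sketch below is plain C with hypothetical fold_* names, not GCC code. It assumes a fixed four-element vector for the VECTOR_CST stand-in and mirrors only the shape of the const_binop change listed in the ChangeLog: operands of matching kind fold, while a mixed pair is left unfolded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FOLD_NELTS 4  /* Assumed fixed length for the explicit form.  */

/* Toy stand-ins for the two constant tree codes; NOT the real tree API.  */
enum fold_code { FOLD_VECTOR_CST, FOLD_VEC_DUPLICATE_CST };

struct fold_cst
{
  enum fold_code code;
  int elts[FOLD_NELTS];  /* Only elts[0] is meaningful for a duplicate.  */
};

/* Try to fold A + B into *RES.  Return false when the operand kinds
   differ, which plays the role of const_binop returning NULL_TREE.  */
static bool
fold_plus (const struct fold_cst *a, const struct fold_cst *b,
           struct fold_cst *res)
{
  assert (a != NULL && b != NULL && res != NULL);

  /* Two explicit vectors: fold element by element.  */
  if (a->code == FOLD_VECTOR_CST && b->code == FOLD_VECTOR_CST)
    {
      res->code = FOLD_VECTOR_CST;
      for (size_t i = 0; i < FOLD_NELTS; ++i)
        res->elts[i] = a->elts[i] + b->elts[i];
      return true;
    }

  /* Two duplicates: fold the single element once and re-duplicate.  */
  if (a->code == FOLD_VEC_DUPLICATE_CST && b->code == FOLD_VEC_DUPLICATE_CST)
    {
      res->code = FOLD_VEC_DUPLICATE_CST;
      for (size_t i = 0; i < FOLD_NELTS; ++i)
        res->elts[i] = 0;
      res->elts[0] = a->elts[0] + b->elts[0];
      return true;
    }

  /* Mixed kinds are not handled, so the fold is missed.  */
  return false;
}
```

If each vector type only ever uses one of the two representations, the final branch is unreachable for well-formed input, which is what a check in build_vec_duplicate_cst or verify_expr would guarantee.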
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
>  @tindex FIXED_CST
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
>  double constant node. The first operand is a @code{TREE_LIST} of the
>  constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
>
> +@item VEC_DUPLICATE_CST
> +These nodes represent a vector constant in which every element has the
> +same scalar value. At present only variable-length vectors use
> +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
> +instead. The scalar element value is given by
> +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> +element of a @code{VECTOR_CST}.
> +
>  @item STRING_CST
>  These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}. The
> @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
>
>  @node Vectors
>  @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
>  @tindex VEC_PACK_TRUNC_EXPR
>  @tindex VEC_PACK_SAT_EXPR
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
>  @tindex SAD_EXPR
>
>  @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi > =================================================================== > --- gcc/doc/md.texi 2017-10-23 11:41:22.189466342 +0100 > +++ gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100 > @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val > the vector mode @var{m}, or a vector mode with the same element mode and > smaller number of elements. > > +@cindex @code{vec_duplicate@var{m}} instruction pattern > +@item @samp{vec_duplicate@var{m}} > +Initialize vector output operand 0 so that each element has the value given > +by scalar input operand 1. The vector has mode @var{m} and the scalar has > +the mode appropriate for one element of @var{m}. > + > +This pattern only handles duplicates of non-constant inputs. Constant > +vectors go through the @code{mov@var{m}} pattern instead. > + > +This pattern is not allowed to @code{FAIL}. > + > @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern > @item @samp{vec_cmp@var{m}@var{n}} > Output a vector comparison. Operand 0 of mode @var{n} is the destination for > Index: gcc/tree.def > =================================================================== > --- gcc/tree.def 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/tree.def 2017-10-23 11:41:51.774917721 +0100 > @@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst", > /* Contents are in VECTOR_CST_ELTS field. */ > DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0) > > +/* Represents a vector constant in which every element is equal to > + VEC_DUPLICATE_CST_ELT. */ > +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0) > + > /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */ > DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0) > > @@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr", > 1 and 2 are NULL. The operands are then taken from the cfg edges. 
*/ > DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) > > +/* Represents a vector in which every element is equal to operand 0. */ > +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1) > + > /* Vector conditional expression. It is like COND_EXPR, but with > vector operands. > > Index: gcc/tree-core.h > =================================================================== > --- gcc/tree-core.h 2017-10-23 11:41:25.862065318 +0100 > +++ gcc/tree-core.h 2017-10-23 11:41:51.771059237 +0100 > @@ -975,7 +975,8 @@ struct GTY(()) tree_base { > /* VEC length. This field is only used with TREE_VEC. */ > int length; > > - /* Number of elements. This field is only used with VECTOR_CST. */ > + /* Number of elements. This field is only used with VECTOR_CST > + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */ > unsigned int nelts; > > /* SSA version number. This field is only used with SSA_NAME. */ > @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base { > public_flag: > > TREE_OVERFLOW in > - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST > + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST > > TREE_PUBLIC in > VAR_DECL, FUNCTION_DECL > @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex { > > struct GTY(()) tree_vector { > struct tree_typed typed; > - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1]; > + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1]; > }; > > struct GTY(()) tree_identifier { > Index: gcc/tree.h > =================================================================== > --- gcc/tree.h 2017-10-23 11:41:23.517482774 +0100 > +++ gcc/tree.h 2017-10-23 11:41:51.775882341 +0100 > @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \ > #define TYPE_REF_CAN_ALIAS_ALL(NODE) \ > (PTR_OR_REF_CHECK (NODE)->base.static_flag) > > -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means > - there was an overflow in folding. 
*/ > +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST, > + this means there was an overflow in folding. */ > > #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag) > > @@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C > #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts) > #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX]) > > +/* In a VEC_DUPLICATE_CST node. */ > +#define VEC_DUPLICATE_CST_ELT(NODE) \ > + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0]) > + > /* Define fields and accessors for some special-purpose tree nodes. */ > > #define IDENTIFIER_LENGTH(NODE) \ > @@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI > extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst); > extern tree build_int_cst_type (tree, HOST_WIDE_INT); > extern tree make_vector (unsigned CXX_MEM_STAT_INFO); > +extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO); > extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO); > extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *); > extern tree build_vector_from_val (tree, tree); > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2017-10-23 11:41:23.515548300 +0100 > +++ gcc/tree.c 2017-10-23 11:41:51.774917721 +0100 > @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_ > case FIXED_CST: return TS_FIXED_CST; > case COMPLEX_CST: return TS_COMPLEX; > case VECTOR_CST: return TS_VECTOR; > + case VEC_DUPLICATE_CST: return TS_VECTOR; > case STRING_CST: return TS_STRING; > /* tcc_exceptional cases. 
*/ > case ERROR_MARK: return TS_COMMON; > @@ -816,6 +817,7 @@ tree_code_size (enum tree_code code) > case FIXED_CST: return sizeof (struct tree_fixed_cst); > case COMPLEX_CST: return sizeof (struct tree_complex); > case VECTOR_CST: return sizeof (struct tree_vector); > + case VEC_DUPLICATE_CST: return sizeof (struct tree_vector); > case STRING_CST: gcc_unreachable (); > default: > return lang_hooks.tree_size (code); > @@ -875,6 +877,9 @@ tree_size (const_tree node) > return (sizeof (struct tree_vector) > + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree)); > > + case VEC_DUPLICATE_CST: > + return sizeof (struct tree_vector); > + > case STRING_CST: > return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1; > > @@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x) > && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x))); > } > > +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP. > + > + Note that this function is only suitable for callers that specifically > + need a VEC_DUPLICATE_CST node. Use build_vector_from_val to duplicate > + a general scalar into a general vector type. */ > + > +tree > +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL) > +{ > + int length = sizeof (struct tree_vector); > + > + record_node_allocation_statistics (VEC_DUPLICATE_CST, length); > + > + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT); > + > + TREE_SET_CODE (t, VEC_DUPLICATE_CST); > + TREE_TYPE (t) = type; > + t->base.u.nelts = 1; > + VEC_DUPLICATE_CST_ELT (t) = exp; > + TREE_CONSTANT (t) = 1; > + > + return t; > +} > + > /* Build a newly constructed VECTOR_CST node of length LEN. 
*/ > > tree > @@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2369,6 +2400,8 @@ integer_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return integer_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr) > return 1; > } > > + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST) > + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr)); > + > else if (TREE_CODE (expr) != INTEGER_CST) > return 0; > > @@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr) > int > integer_truep (const_tree expr) > { > - if (TREE_CODE (expr) == VECTOR_CST) > + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST) > return integer_all_onesp (expr); > return integer_onep (expr); > } > @@ -2634,6 +2670,8 @@ real_zerop (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_zerop (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2662,6 +2700,8 @@ real_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h > inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags); > return; > } > + case VEC_DUPLICATE_CST: > + inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate); > + return; > case SSA_NAME: > /* We can just compare by pointer. 
*/ > hstate.add_wide_int (SSA_NAME_VERSION (t)); > @@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init) > return true; > } > > + case VEC_DUPLICATE_CST: > + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init)); > + > case CONSTRUCTOR: > { > unsigned HOST_WIDE_INT idx; > @@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec) > > gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))); > > - if (TREE_CODE (vec) == VECTOR_CST) > + if (TREE_CODE (vec) == VEC_DUPLICATE_CST) > + return VEC_DUPLICATE_CST_ELT (vec); > + > + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR) > + return TREE_OPERAND (vec, 0); > + > + else if (TREE_CODE (vec) == VECTOR_CST) > { > first = VECTOR_CST_ELT (vec, 0); > for (i = 1; i < VECTOR_CST_NELTS (vec); ++i) > @@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE) \ > case REAL_CST: > case FIXED_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case BLOCK: > case PLACEHOLDER_EXPR: > @@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t) > elt = drop_tree_overflow (elt); > } > } > + if (TREE_CODE (t) == VEC_DUPLICATE_CST) > + { > + tree *elt = &VEC_DUPLICATE_CST_ELT (t); > + if (TREE_OVERFLOW (*elt)) > + *elt = drop_tree_overflow (*elt); > + } > return t; > } > > @@ -13798,6 +13859,92 @@ test_integer_constants () > ASSERT_EQ (type, TREE_TYPE (zero)); > } > > +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs > + for integral type TYPE. 
*/ > + > +static void > +test_vec_duplicate_predicates_int (tree type) > +{ > + tree vec_type = build_vector_type (type, 4); > + > + tree zero = build_zero_cst (type); > + tree vec_zero = build_vec_duplicate_cst (vec_type, zero); > + ASSERT_TRUE (integer_zerop (vec_zero)); > + ASSERT_FALSE (integer_onep (vec_zero)); > + ASSERT_FALSE (integer_minus_onep (vec_zero)); > + ASSERT_FALSE (integer_all_onesp (vec_zero)); > + ASSERT_FALSE (integer_truep (vec_zero)); > + ASSERT_TRUE (initializer_zerop (vec_zero)); > + > + tree one = build_one_cst (type); > + tree vec_one = build_vec_duplicate_cst (vec_type, one); > + ASSERT_FALSE (integer_zerop (vec_one)); > + ASSERT_TRUE (integer_onep (vec_one)); > + ASSERT_FALSE (integer_minus_onep (vec_one)); > + ASSERT_FALSE (integer_all_onesp (vec_one)); > + ASSERT_FALSE (integer_truep (vec_one)); > + ASSERT_FALSE (initializer_zerop (vec_one)); > + > + tree minus_one = build_minus_one_cst (type); > + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one); > + ASSERT_FALSE (integer_zerop (vec_minus_one)); > + ASSERT_FALSE (integer_onep (vec_minus_one)); > + ASSERT_TRUE (integer_minus_onep (vec_minus_one)); > + ASSERT_TRUE (integer_all_onesp (vec_minus_one)); > + ASSERT_TRUE (integer_truep (vec_minus_one)); > + ASSERT_FALSE (initializer_zerop (vec_minus_one)); > + > + tree x = create_tmp_var_raw (type, "x"); > + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x); > + ASSERT_EQ (uniform_vector_p (vec_zero), zero); > + ASSERT_EQ (uniform_vector_p (vec_one), one); > + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); > + ASSERT_EQ (uniform_vector_p (vec_x), x); > +} > + > +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point > + type TYPE. 
*/ > + > +static void > +test_vec_duplicate_predicates_float (tree type) > +{ > + tree vec_type = build_vector_type (type, 4); > + > + tree zero = build_zero_cst (type); > + tree vec_zero = build_vec_duplicate_cst (vec_type, zero); > + ASSERT_TRUE (real_zerop (vec_zero)); > + ASSERT_FALSE (real_onep (vec_zero)); > + ASSERT_FALSE (real_minus_onep (vec_zero)); > + ASSERT_TRUE (initializer_zerop (vec_zero)); > + > + tree one = build_one_cst (type); > + tree vec_one = build_vec_duplicate_cst (vec_type, one); > + ASSERT_FALSE (real_zerop (vec_one)); > + ASSERT_TRUE (real_onep (vec_one)); > + ASSERT_FALSE (real_minus_onep (vec_one)); > + ASSERT_FALSE (initializer_zerop (vec_one)); > + > + tree minus_one = build_minus_one_cst (type); > + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one); > + ASSERT_FALSE (real_zerop (vec_minus_one)); > + ASSERT_FALSE (real_onep (vec_minus_one)); > + ASSERT_TRUE (real_minus_onep (vec_minus_one)); > + ASSERT_FALSE (initializer_zerop (vec_minus_one)); > + > + ASSERT_EQ (uniform_vector_p (vec_zero), zero); > + ASSERT_EQ (uniform_vector_p (vec_one), one); > + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); > +} > + > +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ > + > +static void > +test_vec_duplicate_predicates () > +{ > + test_vec_duplicate_predicates_int (integer_type_node); > + test_vec_duplicate_predicates_float (float_type_node); > +} > + > /* Verify identifiers. 
*/ > > static void > @@ -13826,6 +13973,7 @@ test_labels () > tree_c_tests () > { > test_integer_constants (); > + test_vec_duplicate_predicates (); > test_identifiers (); > test_labels (); > } > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2017-10-23 11:41:23.137358624 +0100 > +++ gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100 > @@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp) > case VEC_WIDEN_LSHIFT_HI_EXPR: > case VEC_WIDEN_LSHIFT_LO_EXPR: > case VEC_PERM_EXPR: > + case VEC_DUPLICATE_CST: > + case VEC_DUPLICATE_EXPR: > return NULL; > > /* Misc codes. */ > Index: gcc/tree-pretty-print.c > =================================================================== > --- gcc/tree-pretty-print.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100 > @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t > } > break; > > + case VEC_DUPLICATE_CST: > + pp_string (pp, "{ "); > + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false); > + pp_string (pp, ", ... 
}"); > + break; > + > case FUNCTION_TYPE: > case METHOD_TYPE: > dump_generic_node (pp, TREE_TYPE (node), spc, flags, false); > @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t > pp_string (pp, " > "); > break; > > + case VEC_DUPLICATE_EXPR: > + pp_space (pp); > + for (str = get_tree_code_name (code); *str; str++) > + pp_character (pp, TOUPPER (*str)); > + pp_string (pp, " < "); > + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > + pp_string (pp, " > "); > + break; > + > case VEC_UNPACK_HI_EXPR: > pp_string (pp, " VEC_UNPACK_HI_EXPR < "); > dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > Index: gcc/dwarf2out.c > =================================================================== > --- gcc/dwarf2out.c 2017-10-23 11:41:24.407340836 +0100 > +++ gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100 > @@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type) > switch (TREE_CODE (init)) > { > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > break; > case CONSTRUCTOR: > if (TREE_CONSTANT (init)) > Index: gcc/gimple-expr.h > =================================================================== > --- gcc/gimple-expr.h 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100 > @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t) > case FIXED_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > return true; > > Index: gcc/gimplify.c > =================================================================== > --- gcc/gimplify.c 2017-10-23 11:41:25.531270256 +0100 > +++ gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100 > @@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq > case STRING_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > /* Drop the overflow flag on constants, we do not want > that in the GIMPLE IL. 
*/ > if (TREE_OVERFLOW_P (*expr_p)) > Index: gcc/graphite-isl-ast-to-gimple.c > =================================================================== > --- gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:23.205065216 +0100 > +++ gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:51.767200753 +0100 > @@ -222,7 +222,8 @@ enum phi_node_kind > return TREE_CODE (op) == INTEGER_CST > || TREE_CODE (op) == REAL_CST > || TREE_CODE (op) == COMPLEX_CST > - || TREE_CODE (op) == VECTOR_CST; > + || TREE_CODE (op) == VECTOR_CST > + || TREE_CODE (op) == VEC_DUPLICATE_CST; > } > > private: > Index: gcc/graphite-scop-detection.c > =================================================================== > --- gcc/graphite-scop-detection.c 2017-10-23 11:41:25.533204730 +0100 > +++ gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100 > @@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre > case REAL_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > break; > > default: > Index: gcc/ipa-icf-gimple.c > =================================================================== > --- gcc/ipa-icf-gimple.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100 > @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case REAL_CST: > { > @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1, > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case REAL_CST: > case FUNCTION_DECL: > Index: gcc/ipa-icf.c > =================================================================== > --- gcc/ipa-icf.c 2017-10-23 11:41:25.874639400 +0100 > +++ gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100 > @@ -1478,6 +1478,7 @@ sem_item::add_expr (const_tree exp, inch > case STRING_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > 
inchash::add_expr (exp, hstate); > break; > case CONSTRUCTOR: > @@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2) > > return 1; > } > + case VEC_DUPLICATE_CST: > + return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1), > + VEC_DUPLICATE_CST_ELT (t2)); > case ARRAY_REF: > case ARRAY_RANGE_REF: > { > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/match.pd 2017-10-23 11:41:51.768165374 +0100 > @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (match negate_expr_p > VECTOR_CST > (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) > +(match negate_expr_p > + VEC_DUPLICATE_CST > + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) > > /* (-A) * (-B) -> A * B */ > (simplify > Index: gcc/print-tree.c > =================================================================== > --- gcc/print-tree.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100 > @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref > } > break; > > + case VEC_DUPLICATE_CST: > + print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4); > + break; > + > case COMPLEX_CST: > print_node (file, "real", TREE_REALPART (node), indent + 4); > print_node (file, "imag", TREE_IMAGPART (node), indent + 4); > Index: gcc/tree-chkp.c > =================================================================== > --- gcc/tree-chkp.c 2017-10-23 11:41:23.201196268 +0100 > +++ gcc/tree-chkp.c 2017-10-23 11:41:51.770094616 +0100 > @@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > if (integer_zerop (ptr_src)) > bounds = chkp_get_none_bounds (); > else > Index: gcc/tree-loop-distribution.c > =================================================================== > --- gcc/tree-loop-distribution.c 2017-10-23 
11:41:23.228278904 +0100 > +++ gcc/tree-loop-distribution.c 2017-10-23 11:41:51.771059237 +0100 > @@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val) > && CONSTRUCTOR_NELTS (val) == 0)) > return 0; > > + if (TREE_CODE (val) == VEC_DUPLICATE_CST) > + return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val)); > + > if (real_zerop (val)) > { > /* Only return 0 for +0.0, not for -0.0, which doesn't have > Index: gcc/tree-ssa-loop.c > =================================================================== > --- gcc/tree-ssa-loop.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100 > @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc > case STRING_CST: > case RESULT_DECL: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case COMPLEX_CST: > case INTEGER_CST: > case REAL_CST: > Index: gcc/tree-ssa-pre.c > =================================================================== > --- gcc/tree-ssa-pre.c 2017-10-23 11:41:25.549647760 +0100 > +++ gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100 > @@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_ > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case CONSTRUCTOR: > case VAR_DECL: > Index: gcc/tree-ssa-sccvn.c > =================================================================== > --- gcc/tree-ssa-sccvn.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100 > @@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case FIXED_CST: > case CONSTRUCTOR: > @@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case CONSTRUCTOR: > case CONST_DECL: > Index: gcc/tree-vect-generic.c > 
=================================================================== > --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100 > @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs > ssa_uniform_vector_p (tree op) > { > if (TREE_CODE (op) == VECTOR_CST > + || TREE_CODE (op) == VEC_DUPLICATE_CST > || TREE_CODE (op) == CONSTRUCTOR) > return uniform_vector_p (op); > if (TREE_CODE (op) == SSA_NAME) > Index: gcc/varasm.c > =================================================================== > --- gcc/varasm.c 2017-10-23 11:41:25.822408600 +0100 > +++ gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100 > @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp) > CASE_CONVERT: > return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2; > > + case VEC_DUPLICATE_CST: > + return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3; > + > default: > /* A language specific constant. Just hash the code. */ > return code; > @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t > return 1; > } > > + case VEC_DUPLICATE_CST: > + return compare_constant (VEC_DUPLICATE_CST_ELT (t1), > + VEC_DUPLICATE_CST_ELT (t2)); > + > case CONSTRUCTOR: > { > vec<constructor_elt, va_gc> *v1, *v2; > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2017-10-23 11:41:23.535860278 +0100 > +++ gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100 > @@ -418,6 +418,9 @@ negate_expr_p (tree t) > return true; > } > > + case VEC_DUPLICATE_CST: > + return negate_expr_p (VEC_DUPLICATE_CST_ELT (t)); > + > case COMPLEX_EXPR: > return negate_expr_p (TREE_OPERAND (t, 0)) > && negate_expr_p (TREE_OPERAND (t, 1)); > @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree > return build_vector (type, elts); > } > > + case VEC_DUPLICATE_CST: > + { > + tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t)); > + if (!sub) > + return NULL_TREE; > + return 
build_vector_from_val (type, sub); > + } > + > case COMPLEX_EXPR: > if (negate_expr_p (t)) > return fold_build2_loc (loc, COMPLEX_EXPR, type, > @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a > return build_vector (type, elts); > } > > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && TREE_CODE (arg2) == VEC_DUPLICATE_CST) > + { > + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), > + VEC_DUPLICATE_CST_ELT (arg2)); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (TREE_TYPE (arg1), sub); > + } > + > /* Shifts allow a scalar offset for a vector. */ > if (TREE_CODE (arg1) == VECTOR_CST > && TREE_CODE (arg2) == INTEGER_CST) > @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a > > return build_vector (type, elts); > } > + > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && TREE_CODE (arg2) == INTEGER_CST) > + { > + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (TREE_TYPE (arg1), sub); > + } > return NULL_TREE; > } > > @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty > if (i == count) > return build_vector (type, elements); > } > + else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST) > + { > + tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (arg0)); > + if (sub) > + return build_vector_from_val (type, sub); > + } > break; > > case TRUTH_NOT_EXPR: > @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty > return res; > } > > + case VEC_DUPLICATE_EXPR: > + if (CONSTANT_CLASS_P (arg0)) > + return build_vector_from_val (type, arg0); > + return NULL_TREE; > + > default: > break; > } > @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code, > } > return build_vector (type, v); > } > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && (TYPE_VECTOR_SUBPARTS (type) > + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))) > + { > + tree sub = fold_convert_const (code, TREE_TYPE 
(type), > + VEC_DUPLICATE_CST_ELT (arg1)); > + if (sub) > + return build_vector_from_val (type, sub); > + } > } > return NULL_TREE; > } > @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_ > return 1; > } > > + case VEC_DUPLICATE_CST: > + return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0), > + VEC_DUPLICATE_CST_ELT (arg1), flags); > + > case COMPLEX_CST: > return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1), > flags) > @@ -7492,6 +7547,20 @@ can_native_interpret_type_p (tree type) > static tree > fold_view_convert_expr (tree type, tree expr) > { > + /* Recurse on duplicated vectors if the target type is also a vector > + and if the elements line up. */ > + tree expr_type = TREE_TYPE (expr); > + if (TREE_CODE (expr) == VEC_DUPLICATE_CST > + && VECTOR_TYPE_P (type) > + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type) > + && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type))) > + { > + tree sub = fold_view_convert_expr (TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (expr)); > + if (sub) > + return build_vector_from_val (type, sub); > + } > + > /* We support up to 512-bit values (for V8DFmode). 
*/ > unsigned char buffer[64]; > int len; > @@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst) > return build_vector (type, elts); > } > > + case VEC_DUPLICATE_CST: > + { > + tree sub = exact_inverse (TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (cst)); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (type, sub); > + } > + > default: > return NULL_TREE; > } > @@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str > for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i) > fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht); > break; > + case VEC_DUPLICATE_CST: > + fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht); > + break; > default: > break; > } > @@ -14436,6 +14517,36 @@ test_vector_folding () > ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one))); > } > > +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ > + > +static void > +test_vec_duplicate_folding () > +{ > + tree type = build_vector_type (ssizetype, 4); > + tree dup5 = build_vec_duplicate_cst (type, ssize_int (5)); > + tree dup3 = build_vec_duplicate_cst (type, ssize_int (3)); > + > + tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5); > + ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5)); > + > + tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5); > + ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6)); > + > + tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3); > + ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8)); > + > + tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2)); > + ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20)); > + > + tree size_vector = build_vector_type (sizetype, 4); > + tree size_dup5 = fold_convert (size_vector, dup5); > + ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5)); > + > + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5)); > + tree dup5_cst = build_vector_from_val (type, ssize_int (5)); > + 
ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0)); > +} > + > /* Run all of the selftests within this file. */ > > void > @@ -14443,6 +14554,7 @@ fold_const_c_tests () > { > test_arithmetic_folding (); > test_vector_folding (); > + test_vec_duplicate_folding (); > } > > } // namespace selftest > Index: gcc/optabs.def > =================================================================== > --- gcc/optabs.def 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100 > @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I > > OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") > OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") > + > +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) > Index: gcc/optabs-tree.c > =================================================================== > --- gcc/optabs-tree.c 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100 > @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code > return TYPE_UNSIGNED (type) ? > vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab; > > + case VEC_DUPLICATE_EXPR: > + return vec_duplicate_optab; > + > default: > break; > } > Index: gcc/optabs.h > =================================================================== > --- gcc/optabs.h 2017-10-23 11:38:53.934094740 +0100 > +++ gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100 > @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin > enum optab_methods methods); > extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int, > enum optab_methods); > +extern rtx expand_vector_broadcast (machine_mode, rtx); > > /* Generate code for a simple binary or unary operation. 
"Simple" in > this case means "can be unambiguously described by a (mode, code) > Index: gcc/optabs.c > =================================================================== > --- gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100 > +++ gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100 > @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o > mode of OP must be the element mode of VMODE. If OP is a constant, > then the return value will be a constant. */ > > -static rtx > +rtx > expand_vector_broadcast (machine_mode vmode, rtx op) > { > enum insn_code icode; > @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm > if (CONSTANT_P (op)) > return gen_const_vec_duplicate (vmode, op); > > + icode = optab_handler (vec_duplicate_optab, vmode); > + if (icode != CODE_FOR_nothing) > + { > + struct expand_operand ops[2]; > + create_output_operand (&ops[0], NULL_RTX, vmode); > + create_input_operand (&ops[1], op, GET_MODE (op)); > + expand_insn (icode, 2, ops); > + return ops[0].value; > + } > + > /* ??? If the target doesn't have a vec_init, then we have no easy way > of performing this operation. Most of this sort of generic support > is hidden away in the vector lowering support in gimple. 
*/ > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c 2017-10-23 11:41:39.187050437 +0100 > +++ gcc/expr.c 2017-10-23 11:41:51.764306890 +0100 > @@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target, > constructor_elt *ce; > int i; > int need_to_clear; > - int icode = CODE_FOR_nothing; > + insn_code icode = CODE_FOR_nothing; > + tree elt; > tree elttype = TREE_TYPE (type); > int elt_size = tree_to_uhwi (TYPE_SIZE (elttype)); > machine_mode eltmode = TYPE_MODE (elttype); > @@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target, > unsigned n_elts; > alias_set_type alias; > bool vec_vec_init_p = false; > + machine_mode mode = GET_MODE (target); > > gcc_assert (eltmode != BLKmode); > > + /* Try using vec_duplicate_optab for uniform vectors. */ > + if (!TREE_SIDE_EFFECTS (exp) > + && VECTOR_MODE_P (mode) > + && eltmode == GET_MODE_INNER (mode) > + && ((icode = optab_handler (vec_duplicate_optab, mode)) > + != CODE_FOR_nothing) > + && (elt = uniform_vector_p (exp))) > + { > + struct expand_operand ops[2]; > + create_output_operand (&ops[0], target, mode); > + create_input_operand (&ops[1], expand_normal (elt), eltmode); > + expand_insn (icode, 2, ops); > + if (!rtx_equal_p (target, ops[0].value)) > + emit_move_insn (target, ops[0].value); > + break; > + } > + > n_elts = TYPE_VECTOR_SUBPARTS (type); > - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target))) > + if (REG_P (target) && VECTOR_MODE_P (mode)) > { > - machine_mode mode = GET_MODE (target); > machine_mode emode = eltmode; > > if (CONSTRUCTOR_NELTS (exp) > @@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target, > == n_elts); > emode = TYPE_MODE (etype); > } > - icode = (int) convert_optab_handler (vec_init_optab, mode, emode); > + icode = convert_optab_handler (vec_init_optab, mode, emode); > if (icode != CODE_FOR_nothing) > { > unsigned int i, n = n_elts; > @@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target, > if 
(need_to_clear && size > 0 && !vector) > { > if (REG_P (target)) > - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); > + emit_move_insn (target, CONST0_RTX (mode)); > else > clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL); > cleared = 1; > @@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target, > > /* Inform later passes that the old value is dead. */ > if (!cleared && !vector && REG_P (target)) > - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); > + emit_move_insn (target, CONST0_RTX (mode)); > > if (MEM_P (target)) > alias = MEM_ALIAS_SET (target); > @@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target, > > if (vector) > emit_insn (GEN_FCN (icode) (target, > - gen_rtx_PARALLEL (GET_MODE (target), > - vector))); > + gen_rtx_PARALLEL (mode, vector))); > break; > } > > @@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r > } > > > +/* Expand constant vector element ELT, which has mode MODE. This is used > + for members of VECTOR_CST and VEC_DUPLICATE_CST. */ > + > +static rtx > +const_vector_element (scalar_mode mode, const_tree elt) > +{ > + if (TREE_CODE (elt) == REAL_CST) > + return const_double_from_real_value (TREE_REAL_CST (elt), mode); > + if (TREE_CODE (elt) == FIXED_CST) > + return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode); > + return immed_wide_int_const (wi::to_wide (elt), mode); > +} > + > /* Return a MEM that contains constant EXP. DEFER is as for > output_constant_def and MODIFIER is as for expand_expr. 
*/ > > @@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b > target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target); > return target; > > + case VEC_DUPLICATE_EXPR: > + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier); > + target = expand_vector_broadcast (mode, op0); > + gcc_assert (target); > + return target; > + > case BIT_INSERT_EXPR: > { > unsigned bitpos = tree_to_uhwi (treeop2); > @@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target > tmode, modifier); > } > > + case VEC_DUPLICATE_CST: > + op0 = const_vector_element (GET_MODE_INNER (mode), > + VEC_DUPLICATE_CST_ELT (exp)); > + return gen_const_vec_duplicate (mode, op0); > + > case CONST_DECL: > if (modifier == EXPAND_WRITE) > { > @@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp) > { > rtvec v; > unsigned i, units; > - tree elt; > - machine_mode inner, mode; > + machine_mode mode; > > mode = TYPE_MODE (TREE_TYPE (exp)); > > @@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp) > return const_vector_mask_from_tree (exp); > > units = VECTOR_CST_NELTS (exp); > - inner = GET_MODE_INNER (mode); > > v = rtvec_alloc (units); > > for (i = 0; i < units; ++i) > - { > - elt = VECTOR_CST_ELT (exp, i); > - > - if (TREE_CODE (elt) == REAL_CST) > - RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt), > - inner); > - else if (TREE_CODE (elt) == FIXED_CST) > - RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), > - inner); > - else > - RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner); > - } > + RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode), > + VECTOR_CST_ELT (exp, i)); > > return gen_rtx_CONST_VECTOR (mode, v); > } > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2017-10-23 11:41:23.529089619 +0100 > +++ gcc/internal-fn.c 2017-10-23 11:41:51.767200753 +0100 > @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow 
(location_t > emit_move_insn (cntvar, const0_rtx); > emit_label (loop_lab); > } > - if (TREE_CODE (arg0) != VECTOR_CST) > + if (!CONSTANT_CLASS_P (arg0)) > { > rtx arg0r = expand_normal (arg0); > arg0 = make_tree (TREE_TYPE (arg0), arg0r); > } > - if (TREE_CODE (arg1) != VECTOR_CST) > + if (!CONSTANT_CLASS_P (arg1)) > { > rtx arg1r = expand_normal (arg1); > arg1 = make_tree (TREE_TYPE (arg1), arg1r); > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c 2017-10-23 11:41:25.864967029 +0100 > +++ gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100 > @@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm > case CONJ_EXPR: > break; > > + case VEC_DUPLICATE_EXPR: > + if (TREE_CODE (lhs_type) != VECTOR_TYPE > + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type)) > + { > + error ("vec_duplicate should be from a scalar to a like vector"); > + debug_generic_expr (lhs_type); > + debug_generic_expr (rhs1_type); > + return true; > + } > + return false; > + > default: > gcc_unreachable (); > } > @@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st > case FIXED_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > return res; > > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c 2017-10-23 11:41:25.833048208 +0100 > +++ gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100 > @@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c > case VEC_PACK_FIX_TRUNC_EXPR: > case VEC_WIDEN_LSHIFT_HI_EXPR: > case VEC_WIDEN_LSHIFT_LO_EXPR: > + case VEC_DUPLICATE_EXPR: > > return 1; >
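The fold-const.c hunks quoted above all follow one pattern: fold the single duplicated element, give up if that scalar fold fails, and otherwise rebroadcast the result. The following is a minimal standalone sketch of that pattern, not GCC code. Every name in it (toy_dup_cst, toy_fold_scalar, toy_fold_dup_dup, toy_fold_dup_scalar) is a hypothetical stand-in, and plain long values take the place of tree nodes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VEC_DUPLICATE_CST: a single scalar that is
   conceptually repeated across a vector of unknown length.  */
struct toy_dup_cst
{
  long elt; /* Plays the role of VEC_DUPLICATE_CST_ELT.  */
};

enum toy_binop { TOY_PLUS, TOY_LSHIFT };

/* Fold A <code> B on scalars.  Return false when the toy folder gives up,
   mirroring the NULL_TREE returns in the quoted hunks.  */
static bool
toy_fold_scalar (enum toy_binop code, long a, long b, long *result)
{
  assert (result != NULL);
  switch (code)
    {
    case TOY_PLUS:
      /* Give up rather than overflow.  */
      if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return false;
      *result = a + b;
      return true;

    case TOY_LSHIFT:
      /* Only fold small non-negative shifts that cannot overflow.  */
      if (a < 0 || a > (LONG_MAX >> 16) || b < 0 || b > 15)
        return false;
      *result = a << b;
      return true;
    }
  return false;
}

/* Duplicate op duplicate: fold the two elements once and rebroadcast.  */
static bool
toy_fold_dup_dup (enum toy_binop code, const struct toy_dup_cst *x,
                  const struct toy_dup_cst *y, struct toy_dup_cst *out)
{
  long sub;
  if (!toy_fold_scalar (code, x->elt, y->elt, &sub))
    return false;
  out->elt = sub;
  return true;
}

/* Duplicate op scalar, as for a vector shifted by a scalar amount.  */
static bool
toy_fold_dup_scalar (enum toy_binop code, const struct toy_dup_cst *x,
                     long amount, struct toy_dup_cst *out)
{
  long sub;
  if (!toy_fold_scalar (code, x->elt, amount, &sub))
    return false;
  out->elt = sub;
  return true;
}
```

The cost of the fold is independent of the vector length, which is the property that makes a single-element representation workable when the length is not known at compile time.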
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> SVE needs a way of broadcasting a scalar to a variable-length vector.
>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
>> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree
>> equivalent of the existing rtl code VEC_DUPLICATE.
>>
>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
>> to mark constant nodes, but in response to last year's RFC, Richard B.
>> suggested it would be better to have separate codes for the constant
>> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
>> as a normal unary operation and avoids the previous need for treating
>> it as a GIMPLE_SINGLE_RHS.
>>
>> It might make sense to use VEC_DUPLICATE_CST for all duplicated
>> vector constants, since it's a bit more compact than VECTOR_CST
>> in that case, and is potentially more efficient to process.
>> However, the nice thing about keeping it restricted to variable-length
>> vectors is that there is then no need to handle combinations of
>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
>> VECTOR_CST or never use it.
>>
>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>
> VEC_DUPLICATE_EXPR handling?

Oops, yeah. I could have sworn it was there at one time...
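As a standalone illustration (not GCC code) of the contract that ssa_uniform_vector_p relies on in the hunk quoted above: uniform_vector_p yields the repeated element when there is one and a null value otherwise, so a duplicate constant can answer from its single stored element. The names toy_vec, toy_code and toy_uniform_vector_p are hypothetical, and long values stand in for tree nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two constant codes discussed here.  */
enum toy_code { TOY_VECTOR_CST, TOY_VEC_DUPLICATE_CST };

struct toy_vec
{
  enum toy_code code;
  size_t nelts;     /* Elements stored; always 1 for a duplicate.  */
  const long *elts; /* The stored elements.  */
};

/* Return a pointer to the repeated element if every element of VEC is
   the same, otherwise NULL.  The result is the element itself rather
   than a yes/no answer.  */
static const long *
toy_uniform_vector_p (const struct toy_vec *vec)
{
  assert (vec != NULL && vec->nelts > 0);

  /* A duplicate is uniform by construction: no scan is needed.  */
  if (vec->code == TOY_VEC_DUPLICATE_CST)
    return &vec->elts[0];

  /* An element-by-element constant has to be compared in full.  */
  for (size_t i = 1; i < vec->nelts; ++i)
    if (vec->elts[i] != vec->elts[0])
      return NULL;
  return &vec->elts[0];
}
```

Callers can therefore test the result for null and, when it is non-null, use it directly as the scalar operand.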
> Looks like for VEC_DUPLICATE_CST it could directly return true.

The function is a bit misnamed: it returns the duplicated tree value
rather than a bool.

> I didn't see uniform_vector_p being updated?

That part was there FWIW (for tree.c).

> Can you add verification to either verify_expr or build_vec_duplicate_cst
> that the type is one of variable size? And amend tree.def docs
> accordingly. Because otherwise we miss a lot of cases in constant
> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).

OK, done in the patch below with a gcc_unreachable () bomb in
build_vec_duplicate_cst, which becomes a gcc_assert when variable-length
vectors are added. This meant changing the selftests to use
build_vector_from_val rather than build_vec_duplicate_cst,
but to still get testing of VEC_DUPLICATE_*, we then need to use
the target's preferred vector length instead of always using 4.

Tested as before. OK (given the slightly different selftests)?

Thanks,
Richard


2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
	    Alan Hayward <alan.hawyard@arm.com>
	    David Sherwood <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
	(VEC_COND_EXPR): Add missing @tindex.
	* doc/md.texi (vec_duplicate@var{m}): Document.
	* tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
	* tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
	are used for VEC_DUPLICATE_CST as well.
	(tree_vector): Access base.u.nelts directly.
	* tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
	valid codes.
	(VEC_DUPLICATE_CST_ELT): New macro.
	* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
	(integer_zerop, integer_onep, integer_all_onesp, integer_truep)
	(real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
	(walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
	(build_vec_duplicate_cst): New function.
	(build_vector_from_val): Add stubbed-out handling of variable-length
	vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
	(uniform_vector_p): Handle the new codes.
	(test_vec_duplicate_predicates_int): New function.
	(test_vec_duplicate_predicates_float): Likewise.
	(test_vec_duplicate_predicates): Likewise.
	(tree_c_tests): Call test_vec_duplicate_predicates.
	* cfgexpand.c (expand_debug_expr): Handle the new codes.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
	* dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* graphite-isl-ast-to-gimple.c
	(translate_isl_ast_to_gimple::is_constant): Likewise.
	* graphite-scop-detection.c (scan_tree_for_params): Likewise.
	* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
	(func_checker::compare_operand): Likewise.
	* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
	* match.pd (negate_expr_p): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-chkp.c (chkp_find_bounds_1): Likewise.
	* tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	(ao_ref_init_from_vn_reference): Likewise.
	* varasm.c (const_hash_1, compare_constant): Likewise.
	* fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
	(fold_convert_const, operand_equal_p, fold_view_convert_expr)
	(exact_inverse, fold_checksum_tree): Likewise.
	(const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
	(test_vec_duplicate_folding): New function.
	(fold_const_c_tests): Call it.
	* optabs.def (vec_duplicate_optab): New optab.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
	* optabs.h (expand_vector_broadcast): Declare.
	* optabs.c (expand_vector_broadcast): Make non-static.
	Try using vec_duplicate_optab.
	* expr.c (store_constructor): Try using vec_duplicate_optab for
	uniform vectors.
	(const_vector_element): New function, split out from...
	(const_vector_from_tree): ...here.
	(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
	(expand_expr_real_1): Handle VEC_DUPLICATE_CST.
	* internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_CLASS_P
	instead of checking for VECTOR_CST.
	* tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
	(verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
	* tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi 2017-11-06 12:40:39.845713389 +0000
+++ gcc/doc/generic.texi 2017-11-06 12:40:40.277637153 +0000
@@ -1036,6 +1036,7 @@ As this example indicates, the operands
 @tindex FIXED_CST
 @tindex COMPLEX_CST
 @tindex VECTOR_CST
+@tindex VEC_DUPLICATE_CST
 @tindex STRING_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
@@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
 double constant node. The first operand is a @code{TREE_LIST} of the
 constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
 
+@item VEC_DUPLICATE_CST
+These nodes represent a vector constant in which every element has the
+same scalar value. At present only variable-length vectors use
+@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
+instead. The scalar element value is given by
+@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
+element of a @code{VECTOR_CST}.
+
 @item STRING_CST
 These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
 returns the length of the string, as an @code{int}.
The @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind} @node Vectors @subsection Vectors +@tindex VEC_DUPLICATE_EXPR @tindex VEC_LSHIFT_EXPR @tindex VEC_RSHIFT_EXPR @tindex VEC_WIDEN_MULT_HI_EXPR @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind} @tindex VEC_PACK_TRUNC_EXPR @tindex VEC_PACK_SAT_EXPR @tindex VEC_PACK_FIX_TRUNC_EXPR +@tindex VEC_COND_EXPR @tindex SAD_EXPR @table @code +@item VEC_DUPLICATE_EXPR +This node has a single operand and represents a vector in which every +element is equal to that operand. + @item VEC_LSHIFT_EXPR @itemx VEC_RSHIFT_EXPR These nodes represent whole vector left and right shifts, respectively. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-11-06 12:40:39.845713389 +0000 +++ gcc/doc/md.texi 2017-11-06 12:40:40.278630081 +0000 @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val the vector mode @var{m}, or a vector mode with the same element mode and smaller number of elements. +@cindex @code{vec_duplicate@var{m}} instruction pattern +@item @samp{vec_duplicate@var{m}} +Initialize vector output operand 0 so that each element has the value given +by scalar input operand 1. The vector has mode @var{m} and the scalar has +the mode appropriate for one element of @var{m}. + +This pattern only handles duplicates of non-constant inputs. Constant +vectors go through the @code{mov@var{m}} pattern instead. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern @item @samp{vec_cmp@var{m}@var{n}} Output a vector comparison. Operand 0 of mode @var{n} is the destination for Index: gcc/tree.def =================================================================== --- gcc/tree.def 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree.def 2017-11-06 12:40:40.292531076 +0000 @@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst", /* Contents are in VECTOR_CST_ELTS field. 
*/ DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0) +/* Represents a vector constant in which every element is equal to + VEC_DUPLICATE_CST_ELT. This is only ever used for variable-length + vectors; fixed-length vectors must use VECTOR_CST instead. */ +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0) + /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */ DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0) @@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr", 1 and 2 are NULL. The operands are then taken from the cfg edges. */ DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) +/* Represents a vector in which every element is equal to operand 0. */ +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1) + /* Vector conditional expression. It is like COND_EXPR, but with vector operands. Index: gcc/tree-core.h =================================================================== --- gcc/tree-core.h 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-core.h 2017-11-06 12:40:40.288559363 +0000 @@ -975,7 +975,8 @@ struct GTY(()) tree_base { /* VEC length. This field is only used with TREE_VEC. */ int length; - /* Number of elements. This field is only used with VECTOR_CST. */ + /* Number of elements. This field is only used with VECTOR_CST + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */ unsigned int nelts; /* SSA version number. This field is only used with SSA_NAME. 
*/ @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base { public_flag: TREE_OVERFLOW in - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST TREE_PUBLIC in VAR_DECL, FUNCTION_DECL @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex { struct GTY(()) tree_vector { struct tree_typed typed; - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1]; + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1]; }; struct GTY(()) tree_identifier { Index: gcc/tree.h =================================================================== --- gcc/tree.h 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree.h 2017-11-06 12:40:40.293524004 +0000 @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \ #define TYPE_REF_CAN_ALIAS_ALL(NODE) \ (PTR_OR_REF_CHECK (NODE)->base.static_flag) -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means - there was an overflow in folding. */ +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST, + this means there was an overflow in folding. */ #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag) @@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts) #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX]) +/* In a VEC_DUPLICATE_CST node. */ +#define VEC_DUPLICATE_CST_ELT(NODE) \ + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0]) + /* Define fields and accessors for some special-purpose tree nodes. 
*/ #define IDENTIFIER_LENGTH(NODE) \ Index: gcc/tree.c =================================================================== --- gcc/tree.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree.c 2017-11-06 12:40:40.292531076 +0000 @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_ case FIXED_CST: return TS_FIXED_CST; case COMPLEX_CST: return TS_COMPLEX; case VECTOR_CST: return TS_VECTOR; + case VEC_DUPLICATE_CST: return TS_VECTOR; case STRING_CST: return TS_STRING; /* tcc_exceptional cases. */ case ERROR_MARK: return TS_COMMON; @@ -829,6 +830,7 @@ tree_code_size (enum tree_code code) case FIXED_CST: return sizeof (tree_fixed_cst); case COMPLEX_CST: return sizeof (tree_complex); case VECTOR_CST: return sizeof (tree_vector); + case VEC_DUPLICATE_CST: return sizeof (tree_vector); case STRING_CST: gcc_unreachable (); default: gcc_checking_assert (code >= NUM_TREE_CODES); @@ -890,6 +892,9 @@ tree_size (const_tree node) return (sizeof (struct tree_vector) + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree)); + case VEC_DUPLICATE_CST: + return sizeof (struct tree_vector); + case STRING_CST: return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1; @@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x) && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x))); } +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP. + + This function is only suitable for callers that know TYPE is a + variable-length vector and specifically need a VEC_DUPLICATE_CST node. + Use build_vector_from_val to duplicate a general scalar into a general + vector type. */ + +static tree +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL) +{ + /* Shouldn't be used until we have variable-length vectors. 
*/ + gcc_unreachable (); + + int length = sizeof (struct tree_vector); + + record_node_allocation_statistics (VEC_DUPLICATE_CST, length); + + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT); + + TREE_SET_CODE (t, VEC_DUPLICATE_CST); + TREE_TYPE (t) = type; + t->base.u.nelts = 1; + VEC_DUPLICATE_CST_ELT (t) = exp; + TREE_CONSTANT (t) = 1; + + return t; +} + /* Build a newly constructed VECTOR_CST node of length LEN. */ tree @@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)), TREE_TYPE (vectype))); + if (0) + { + if (CONSTANT_CLASS_P (sc)) + return build_vec_duplicate_cst (vectype, sc); + return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc); + } + if (CONSTANT_CLASS_P (sc)) { auto_vec<tree, 32> v (nunits); @@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2384,6 +2426,8 @@ integer_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return integer_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr) return 1; } + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST) + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr)); + else if (TREE_CODE (expr) != INTEGER_CST) return 0; @@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr) int integer_truep (const_tree expr) { - if (TREE_CODE (expr) == VECTOR_CST) + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST) return integer_all_onesp (expr); return integer_onep (expr); } @@ -2649,6 +2696,8 @@ real_zerop (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return real_zerop (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2677,6 +2726,8 @@ real_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + 
return real_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags); return; } + case VEC_DUPLICATE_CST: + inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate); + return; case SSA_NAME: /* We can just compare by pointer. */ hstate.add_hwi (SSA_NAME_VERSION (t)); @@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init) return true; } + case VEC_DUPLICATE_CST: + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init)); + case CONSTRUCTOR: { unsigned HOST_WIDE_INT idx; @@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec) gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))); - if (TREE_CODE (vec) == VECTOR_CST) + if (TREE_CODE (vec) == VEC_DUPLICATE_CST) + return VEC_DUPLICATE_CST_ELT (vec); + + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR) + return TREE_OPERAND (vec, 0); + + else if (TREE_CODE (vec) == VECTOR_CST) { first = VECTOR_CST_ELT (vec, 0); for (i = 1; i < VECTOR_CST_NELTS (vec); ++i) @@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE) \ case REAL_CST: case FIXED_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case BLOCK: case PLACEHOLDER_EXPR: @@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t) elt = drop_tree_overflow (elt); } } + if (TREE_CODE (t) == VEC_DUPLICATE_CST) + { + tree *elt = &VEC_DUPLICATE_CST_ELT (t); + if (TREE_OVERFLOW (*elt)) + *elt = drop_tree_overflow (*elt); + } return t; } @@ -13850,6 +13922,102 @@ test_integer_constants () ASSERT_EQ (type, TREE_TYPE (zero)); } +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs + for integral type TYPE. 
*/ + +static void +test_vec_duplicate_predicates_int (tree type) +{ + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type); + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode); + /* This will be 1 if VEC_MODE isn't a vector mode. */ + unsigned int nunits = GET_MODE_NUNITS (vec_mode); + + tree vec_type = build_vector_type (type, nunits); + + tree zero = build_zero_cst (type); + tree vec_zero = build_vector_from_val (vec_type, zero); + ASSERT_TRUE (integer_zerop (vec_zero)); + ASSERT_FALSE (integer_onep (vec_zero)); + ASSERT_FALSE (integer_minus_onep (vec_zero)); + ASSERT_FALSE (integer_all_onesp (vec_zero)); + ASSERT_FALSE (integer_truep (vec_zero)); + ASSERT_TRUE (initializer_zerop (vec_zero)); + + tree one = build_one_cst (type); + tree vec_one = build_vector_from_val (vec_type, one); + ASSERT_FALSE (integer_zerop (vec_one)); + ASSERT_TRUE (integer_onep (vec_one)); + ASSERT_FALSE (integer_minus_onep (vec_one)); + ASSERT_FALSE (integer_all_onesp (vec_one)); + ASSERT_FALSE (integer_truep (vec_one)); + ASSERT_FALSE (initializer_zerop (vec_one)); + + tree minus_one = build_minus_one_cst (type); + tree vec_minus_one = build_vector_from_val (vec_type, minus_one); + ASSERT_FALSE (integer_zerop (vec_minus_one)); + ASSERT_FALSE (integer_onep (vec_minus_one)); + ASSERT_TRUE (integer_minus_onep (vec_minus_one)); + ASSERT_TRUE (integer_all_onesp (vec_minus_one)); + ASSERT_TRUE (integer_truep (vec_minus_one)); + ASSERT_FALSE (initializer_zerop (vec_minus_one)); + + tree x = create_tmp_var_raw (type, "x"); + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x); + ASSERT_EQ (uniform_vector_p (vec_zero), zero); + ASSERT_EQ (uniform_vector_p (vec_one), one); + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); + ASSERT_EQ (uniform_vector_p (vec_x), x); +} + +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point + type TYPE. 
*/ + +static void +test_vec_duplicate_predicates_float (tree type) +{ + scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type); + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode); + /* This will be 1 if VEC_MODE isn't a vector mode. */ + unsigned int nunits = GET_MODE_NUNITS (vec_mode); + + tree vec_type = build_vector_type (type, nunits); + + tree zero = build_zero_cst (type); + tree vec_zero = build_vector_from_val (vec_type, zero); + ASSERT_TRUE (real_zerop (vec_zero)); + ASSERT_FALSE (real_onep (vec_zero)); + ASSERT_FALSE (real_minus_onep (vec_zero)); + ASSERT_TRUE (initializer_zerop (vec_zero)); + + tree one = build_one_cst (type); + tree vec_one = build_vector_from_val (vec_type, one); + ASSERT_FALSE (real_zerop (vec_one)); + ASSERT_TRUE (real_onep (vec_one)); + ASSERT_FALSE (real_minus_onep (vec_one)); + ASSERT_FALSE (initializer_zerop (vec_one)); + + tree minus_one = build_minus_one_cst (type); + tree vec_minus_one = build_vector_from_val (vec_type, minus_one); + ASSERT_FALSE (real_zerop (vec_minus_one)); + ASSERT_FALSE (real_onep (vec_minus_one)); + ASSERT_TRUE (real_minus_onep (vec_minus_one)); + ASSERT_FALSE (initializer_zerop (vec_minus_one)); + + ASSERT_EQ (uniform_vector_p (vec_zero), zero); + ASSERT_EQ (uniform_vector_p (vec_one), one); + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); +} + +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ + +static void +test_vec_duplicate_predicates () +{ + test_vec_duplicate_predicates_int (integer_type_node); + test_vec_duplicate_predicates_float (float_type_node); +} + /* Verify identifiers. 
*/ static void @@ -13878,6 +14046,7 @@ test_labels () tree_c_tests () { test_integer_constants (); + test_vec_duplicate_predicates (); test_identifiers (); test_labels (); } Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/cfgexpand.c 2017-11-06 12:40:40.276644225 +0000 @@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp) case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: case VEC_PERM_EXPR: + case VEC_DUPLICATE_CST: + case VEC_DUPLICATE_EXPR: return NULL; /* Misc codes. */ Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-pretty-print.c 2017-11-06 12:40:40.289552291 +0000 @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t } break; + case VEC_DUPLICATE_CST: + pp_string (pp, "{ "); + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false); + pp_string (pp, ", ... 
}"); + break; + case FUNCTION_TYPE: case METHOD_TYPE: dump_generic_node (pp, TREE_TYPE (node), spc, flags, false); @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; + case VEC_DUPLICATE_EXPR: + pp_space (pp); + for (str = get_tree_code_name (code); *str; str++) + pp_character (pp, TOUPPER (*str)); + pp_string (pp, " < "); + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (pp, " > "); + break; + case VEC_UNPACK_HI_EXPR: pp_string (pp, " VEC_UNPACK_HI_EXPR < "); dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-vect-generic.c 2017-11-06 12:40:40.291538147 +0000 @@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs ssa_uniform_vector_p (tree op) { if (TREE_CODE (op) == VECTOR_CST + || TREE_CODE (op) == VEC_DUPLICATE_CST + || TREE_CODE (op) == VEC_DUPLICATE_EXPR || TREE_CODE (op) == CONSTRUCTOR) return uniform_vector_p (op); if (TREE_CODE (op) == SSA_NAME) Index: gcc/dwarf2out.c =================================================================== --- gcc/dwarf2out.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/dwarf2out.c 2017-11-06 12:40:40.280615937 +0000 @@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type) switch (TREE_CODE (init)) { case VECTOR_CST: + case VEC_DUPLICATE_CST: break; case CONSTRUCTOR: if (TREE_CONSTANT (init)) Index: gcc/gimple-expr.h =================================================================== --- gcc/gimple-expr.h 2017-11-06 12:40:39.845713389 +0000 +++ gcc/gimple-expr.h 2017-11-06 12:40:40.282601794 +0000 @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t) case FIXED_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: return true; Index: gcc/gimplify.c =================================================================== --- 
gcc/gimplify.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/gimplify.c 2017-11-06 12:40:40.283594722 +0000 @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq case STRING_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: /* Drop the overflow flag on constants, we do not want that in the GIMPLE IL. */ if (TREE_OVERFLOW_P (*expr_p)) Index: gcc/graphite-isl-ast-to-gimple.c =================================================================== --- gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:40.284587650 +0000 @@ -211,7 +211,8 @@ enum phi_node_kind return TREE_CODE (op) == INTEGER_CST || TREE_CODE (op) == REAL_CST || TREE_CODE (op) == COMPLEX_CST - || TREE_CODE (op) == VECTOR_CST; + || TREE_CODE (op) == VECTOR_CST + || TREE_CODE (op) == VEC_DUPLICATE_CST; } private: Index: gcc/graphite-scop-detection.c =================================================================== --- gcc/graphite-scop-detection.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/graphite-scop-detection.c 2017-11-06 12:40:40.284587650 +0000 @@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre case REAL_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: break; default: Index: gcc/ipa-icf-gimple.c =================================================================== --- gcc/ipa-icf-gimple.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/ipa-icf-gimple.c 2017-11-06 12:40:40.285580578 +0000 @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case REAL_CST: { @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1, case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case REAL_CST: case FUNCTION_DECL: Index: gcc/ipa-icf.c =================================================================== --- gcc/ipa-icf.c 2017-11-06 12:40:39.845713389 +0000 
+++ gcc/ipa-icf.c 2017-11-06 12:40:40.285580578 +0000 @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch case STRING_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: inchash::add_expr (exp, hstate); break; case CONSTRUCTOR: @@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2) return 1; } + case VEC_DUPLICATE_CST: + return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1), + VEC_DUPLICATE_CST_ELT (t2)); case ARRAY_REF: case ARRAY_RANGE_REF: { Index: gcc/match.pd =================================================================== --- gcc/match.pd 2017-11-06 12:40:39.845713389 +0000 +++ gcc/match.pd 2017-11-06 12:40:40.285580578 +0000 @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (match negate_expr_p VECTOR_CST (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) +(match negate_expr_p + VEC_DUPLICATE_CST + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) /* (-A) * (-B) -> A * B */ (simplify Index: gcc/print-tree.c =================================================================== --- gcc/print-tree.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/print-tree.c 2017-11-06 12:40:40.287566435 +0000 @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref } break; + case VEC_DUPLICATE_CST: + print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4); + break; + case COMPLEX_CST: print_node (file, "real", TREE_REALPART (node), indent + 4); print_node (file, "imag", TREE_IMAGPART (node), indent + 4); Index: gcc/tree-chkp.c =================================================================== --- gcc/tree-chkp.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-chkp.c 2017-11-06 12:40:40.288559363 +0000 @@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: if (integer_zerop (ptr_src)) bounds = chkp_get_none_bounds (); else Index: gcc/tree-loop-distribution.c 
=================================================================== --- gcc/tree-loop-distribution.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-loop-distribution.c 2017-11-06 12:40:40.289552291 +0000 @@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val) && CONSTRUCTOR_NELTS (val) == 0)) return 0; + if (TREE_CODE (val) == VEC_DUPLICATE_CST) + return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val)); + if (real_zerop (val)) { /* Only return 0 for +0.0, not for -0.0, which doesn't have Index: gcc/tree-ssa-loop.c =================================================================== --- gcc/tree-ssa-loop.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-ssa-loop.c 2017-11-06 12:40:40.290545219 +0000 @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc case STRING_CST: case RESULT_DECL: case VECTOR_CST: + case VEC_DUPLICATE_CST: case COMPLEX_CST: case INTEGER_CST: case REAL_CST: Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-ssa-pre.c 2017-11-06 12:40:40.290545219 +0000 @@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_ case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case REAL_CST: case CONSTRUCTOR: case VAR_DECL: Index: gcc/tree-ssa-sccvn.c =================================================================== --- gcc/tree-ssa-sccvn.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-ssa-sccvn.c 2017-11-06 12:40:40.291538147 +0000 @@ -866,6 +866,7 @@ copy_reference_ops_from_ref (tree ref, v case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case REAL_CST: case FIXED_CST: case CONSTRUCTOR: @@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case REAL_CST: case CONSTRUCTOR: case CONST_DECL: Index: gcc/varasm.c =================================================================== 
--- gcc/varasm.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/varasm.c 2017-11-06 12:40:40.293524004 +0000 @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp) CASE_CONVERT: return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2; + case VEC_DUPLICATE_CST: + return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3; + default: /* A language specific constant. Just hash the code. */ return code; @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t return 1; } + case VEC_DUPLICATE_CST: + return compare_constant (VEC_DUPLICATE_CST_ELT (t1), + VEC_DUPLICATE_CST_ELT (t2)); + case CONSTRUCTOR: { vec<constructor_elt, va_gc> *v1, *v2; Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/fold-const.c 2017-11-06 12:40:40.282601794 +0000 @@ -418,6 +418,9 @@ negate_expr_p (tree t) return true; } + case VEC_DUPLICATE_CST: + return negate_expr_p (VEC_DUPLICATE_CST_ELT (t)); + case COMPLEX_EXPR: return negate_expr_p (TREE_OPERAND (t, 0)) && negate_expr_p (TREE_OPERAND (t, 1)); @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree return build_vector (type, elts); } + case VEC_DUPLICATE_CST: + { + tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t)); + if (!sub) + return NULL_TREE; + return build_vector_from_val (type, sub); + } + case COMPLEX_EXPR: if (negate_expr_p (t)) return fold_build2_loc (loc, COMPLEX_EXPR, type, @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a return build_vector (type, elts); } + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST + && TREE_CODE (arg2) == VEC_DUPLICATE_CST) + { + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), + VEC_DUPLICATE_CST_ELT (arg2)); + if (!sub) + return NULL_TREE; + return build_vector_from_val (TREE_TYPE (arg1), sub); + } + /* Shifts allow a scalar offset for a vector. 
*/ if (TREE_CODE (arg1) == VECTOR_CST && TREE_CODE (arg2) == INTEGER_CST) @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a return build_vector (type, elts); } + + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST + && TREE_CODE (arg2) == INTEGER_CST) + { + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2); + if (!sub) + return NULL_TREE; + return build_vector_from_val (TREE_TYPE (arg1), sub); + } return NULL_TREE; } @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty if (i == count) return build_vector (type, elements); } + else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST) + { + tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type), + VEC_DUPLICATE_CST_ELT (arg0)); + if (sub) + return build_vector_from_val (type, sub); + } break; case TRUTH_NOT_EXPR: @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty return res; } + case VEC_DUPLICATE_EXPR: + if (CONSTANT_CLASS_P (arg0)) + return build_vector_from_val (type, arg0); + return NULL_TREE; + default: break; } @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code, } return build_vector (type, v); } + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST + && (TYPE_VECTOR_SUBPARTS (type) + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))) + { + tree sub = fold_convert_const (code, TREE_TYPE (type), + VEC_DUPLICATE_CST_ELT (arg1)); + if (sub) + return build_vector_from_val (type, sub); + } } return NULL_TREE; } @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_ return 1; } + case VEC_DUPLICATE_CST: + return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0), + VEC_DUPLICATE_CST_ELT (arg1), flags); + case COMPLEX_CST: return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1), flags) @@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type) static tree fold_view_convert_expr (tree type, tree expr) { + /* Recurse on duplicated vectors if the target type is also a vector + and if the elements line up. 
*/ + tree expr_type = TREE_TYPE (expr); + if (TREE_CODE (expr) == VEC_DUPLICATE_CST + && VECTOR_TYPE_P (type) + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type) + && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type))) + { + tree sub = fold_view_convert_expr (TREE_TYPE (type), + VEC_DUPLICATE_CST_ELT (expr)); + if (sub) + return build_vector_from_val (type, sub); + } + /* We support up to 512-bit values (for V8DFmode). */ unsigned char buffer[64]; int len; @@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst) return build_vector (type, elts); } + case VEC_DUPLICATE_CST: + { + tree sub = exact_inverse (TREE_TYPE (type), + VEC_DUPLICATE_CST_ELT (cst)); + if (!sub) + return NULL_TREE; + return build_vector_from_val (type, sub); + } + default: return NULL_TREE; } @@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i) fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht); break; + case VEC_DUPLICATE_CST: + fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht); + break; default: break; } @@ -14412,6 +14493,41 @@ test_vector_folding () ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one))); } +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ + +static void +test_vec_duplicate_folding () +{ + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype); + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode); + /* This will be 1 if VEC_MODE isn't a vector mode. 
*/ + unsigned int nunits = GET_MODE_NUNITS (vec_mode); + + tree type = build_vector_type (ssizetype, nunits); + tree dup5 = build_vector_from_val (type, ssize_int (5)); + tree dup3 = build_vector_from_val (type, ssize_int (3)); + + tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5); + ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5)); + + tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5); + ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6)); + + tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3); + ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8)); + + tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2)); + ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20)); + + tree size_vector = build_vector_type (sizetype, nunits); + tree size_dup5 = fold_convert (size_vector, dup5); + ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5)); + + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5)); + tree dup5_cst = build_vector_from_val (type, ssize_int (5)); + ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0)); +} + /* Run all of the selftests within this file. 
*/ void @@ -14419,6 +14535,7 @@ fold_const_c_tests () { test_arithmetic_folding (); test_vector_folding (); + test_vec_duplicate_folding (); } } // namespace selftest Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2017-11-06 12:40:39.845713389 +0000 +++ gcc/optabs.def 2017-11-06 12:40:40.286573506 +0000 @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") + +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/optabs-tree.c 2017-11-06 12:40:40.286573506 +0000 @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code return TYPE_UNSIGNED (type) ? vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab; + case VEC_DUPLICATE_EXPR: + return vec_duplicate_optab; + default: break; } Index: gcc/optabs.h =================================================================== --- gcc/optabs.h 2017-11-06 12:40:39.845713389 +0000 +++ gcc/optabs.h 2017-11-06 12:40:40.287566435 +0000 @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin enum optab_methods methods); extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int, enum optab_methods); +extern rtx expand_vector_broadcast (machine_mode, rtx); /* Generate code for a simple binary or unary operation. "Simple" in this case means "can be unambiguously described by a (mode, code) Index: gcc/optabs.c =================================================================== --- gcc/optabs.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/optabs.c 2017-11-06 12:40:40.286573506 +0000 @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o mode of OP must be the element mode of VMODE. If OP is a constant, then the return value will be a constant. 
*/ -static rtx +rtx expand_vector_broadcast (machine_mode vmode, rtx op) { enum insn_code icode; @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm if (valid_for_const_vec_duplicate_p (vmode, op)) return gen_const_vec_duplicate (vmode, op); + icode = optab_handler (vec_duplicate_optab, vmode); + if (icode != CODE_FOR_nothing) + { + struct expand_operand ops[2]; + create_output_operand (&ops[0], NULL_RTX, vmode); + create_input_operand (&ops[1], op, GET_MODE (op)); + expand_insn (icode, 2, ops); + return ops[0].value; + } + /* ??? If the target doesn't have a vec_init, then we have no easy way of performing this operation. Most of this sort of generic support is hidden away in the vector lowering support in gimple. */ Index: gcc/expr.c =================================================================== --- gcc/expr.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/expr.c 2017-11-06 12:40:40.281608865 +0000 @@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target, constructor_elt *ce; int i; int need_to_clear; - int icode = CODE_FOR_nothing; + insn_code icode = CODE_FOR_nothing; + tree elt; tree elttype = TREE_TYPE (type); int elt_size = tree_to_uhwi (TYPE_SIZE (elttype)); machine_mode eltmode = TYPE_MODE (elttype); @@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target, unsigned n_elts; alias_set_type alias; bool vec_vec_init_p = false; + machine_mode mode = GET_MODE (target); gcc_assert (eltmode != BLKmode); + /* Try using vec_duplicate_optab for uniform vectors. 
*/ + if (!TREE_SIDE_EFFECTS (exp) + && VECTOR_MODE_P (mode) + && eltmode == GET_MODE_INNER (mode) + && ((icode = optab_handler (vec_duplicate_optab, mode)) + != CODE_FOR_nothing) + && (elt = uniform_vector_p (exp))) + { + struct expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], expand_normal (elt), eltmode); + expand_insn (icode, 2, ops); + if (!rtx_equal_p (target, ops[0].value)) + emit_move_insn (target, ops[0].value); + break; + } + n_elts = TYPE_VECTOR_SUBPARTS (type); - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target))) + if (REG_P (target) && VECTOR_MODE_P (mode)) { - machine_mode mode = GET_MODE (target); machine_mode emode = eltmode; if (CONSTRUCTOR_NELTS (exp) @@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target, == n_elts); emode = TYPE_MODE (etype); } - icode = (int) convert_optab_handler (vec_init_optab, mode, emode); + icode = convert_optab_handler (vec_init_optab, mode, emode); if (icode != CODE_FOR_nothing) { unsigned int i, n = n_elts; @@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target, if (need_to_clear && size > 0 && !vector) { if (REG_P (target)) - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); + emit_move_insn (target, CONST0_RTX (mode)); else clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL); cleared = 1; @@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target, /* Inform later passes that the old value is dead. */ if (!cleared && !vector && REG_P (target)) - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); + emit_move_insn (target, CONST0_RTX (mode)); if (MEM_P (target)) alias = MEM_ALIAS_SET (target); @@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target, if (vector) emit_insn (GEN_FCN (icode) (target, - gen_rtx_PARALLEL (GET_MODE (target), - vector))); + gen_rtx_PARALLEL (mode, vector))); break; } @@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r } +/* Expand constant vector element ELT, which has mode MODE. 
This is used + for members of VECTOR_CST and VEC_DUPLICATE_CST. */ + +static rtx +const_vector_element (scalar_mode mode, const_tree elt) +{ + if (TREE_CODE (elt) == REAL_CST) + return const_double_from_real_value (TREE_REAL_CST (elt), mode); + if (TREE_CODE (elt) == FIXED_CST) + return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode); + return immed_wide_int_const (wi::to_wide (elt), mode); +} + /* Return a MEM that contains constant EXP. DEFER is as for output_constant_def and MODIFIER is as for expand_expr. */ @@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target); return target; + case VEC_DUPLICATE_EXPR: + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier); + target = expand_vector_broadcast (mode, op0); + gcc_assert (target); + return target; + case BIT_INSERT_EXPR: { unsigned bitpos = tree_to_uhwi (treeop2); @@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target tmode, modifier); } + case VEC_DUPLICATE_CST: + op0 = const_vector_element (GET_MODE_INNER (mode), + VEC_DUPLICATE_CST_ELT (exp)); + return gen_const_vec_duplicate (mode, op0); + case CONST_DECL: if (modifier == EXPAND_WRITE) { @@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp) { rtvec v; unsigned i, units; - tree elt; - machine_mode inner, mode; + machine_mode mode; mode = TYPE_MODE (TREE_TYPE (exp)); @@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp) return const_vector_mask_from_tree (exp); units = VECTOR_CST_NELTS (exp); - inner = GET_MODE_INNER (mode); v = rtvec_alloc (units); for (i = 0; i < units; ++i) - { - elt = VECTOR_CST_ELT (exp, i); - - if (TREE_CODE (elt) == REAL_CST) - RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt), - inner); - else if (TREE_CODE (elt) == FIXED_CST) - RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), - inner); - else - RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner); - } + RTVEC_ELT (v, 
i) = const_vector_element (GET_MODE_INNER (mode), + VECTOR_CST_ELT (exp, i)); return gen_rtx_CONST_VECTOR (mode, v); } Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/internal-fn.c 2017-11-06 12:40:40.284587650 +0000 @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t emit_move_insn (cntvar, const0_rtx); emit_label (loop_lab); } - if (TREE_CODE (arg0) != VECTOR_CST) + if (!CONSTANT_CLASS_P (arg0)) { rtx arg0r = expand_normal (arg0); arg0 = make_tree (TREE_TYPE (arg0), arg0r); } - if (TREE_CODE (arg1) != VECTOR_CST) + if (!CONSTANT_CLASS_P (arg1)) { rtx arg1r = expand_normal (arg1); arg1 = make_tree (TREE_TYPE (arg1), arg1r); Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-cfg.c 2017-11-06 12:40:40.287566435 +0000 @@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm case CONJ_EXPR: break; + case VEC_DUPLICATE_EXPR: + if (TREE_CODE (lhs_type) != VECTOR_TYPE + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type)) + { + error ("vec_duplicate should be from a scalar to a like vector"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + return false; + default: gcc_unreachable (); } @@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st case FIXED_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: return res; Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2017-11-06 12:40:39.845713389 +0000 +++ gcc/tree-inline.c 2017-11-06 12:40:40.289552291 +0000 @@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c case VEC_PACK_FIX_TRUNC_EXPR: case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: + case VEC_DUPLICATE_EXPR: return 1;
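For readers following the thread, here is a minimal standalone C sketch of the idea behind the VEC_DUPLICATE_CST handling in the patch above: the node stores a single element, so predicates and constant folding can work on that element and then rebuild the duplicate. The struct dup_cst type and the dup_* helpers are hypothetical stand-ins invented for illustration. They are not GCC's tree API, and the sketch ignores types, overflow flags and the vector length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VEC_DUPLICATE_CST node: every lane of the
   vector has the value ELT, so only one element is stored.  */
struct dup_cst
{
  long elt;
};

/* Build a duplicate of ELT (loosely analogous to calling
   build_vector_from_val on a constant scalar).  */
static struct dup_cst
dup_build (long elt)
{
  struct dup_cst v = { elt };
  return v;
}

/* Predicates defer to the single element, as the integer_zerop and
   integer_onep cases in the patch do.  */
static bool
dup_zerop (struct dup_cst v)
{
  return v.elt == 0;
}

static bool
dup_onep (struct dup_cst v)
{
  return v.elt == 1;
}

/* Unary folds: fold the element, then duplicate the result.  */
static struct dup_cst
dup_negate (struct dup_cst v)
{
  return dup_build (-v.elt);
}

static struct dup_cst
dup_bit_not (struct dup_cst v)
{
  return dup_build (~v.elt);
}

/* Binary fold of two duplicates: one scalar operation covers all lanes.  */
static struct dup_cst
dup_plus (struct dup_cst a, struct dup_cst b)
{
  return dup_build (a.elt + b.elt);
}

/* Shift of a duplicate by a scalar amount.  This sketch only accepts
   non-negative elements and in-range shift amounts.  */
static struct dup_cst
dup_lshift (struct dup_cst v, unsigned int amount)
{
  assert (v.elt >= 0 && amount < 8 * sizeof (long) - 1);
  return dup_build ((long) ((unsigned long) v.elt << amount));
}
```

The values checked by test_vec_duplicate_folding carry over to this sketch: negating a duplicate of 5 gives a duplicate of -5, the bitwise NOT gives -6, 5 plus 3 gives 8, and 5 shifted left by 2 gives 20.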
On Mon, Nov 6, 2017 at 4:09 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> SVE needs a way of broadcasting a scalar to a variable-length vector. >>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for >>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would >>> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree >>> equivalent of the existing rtl code VEC_DUPLICATE. >>> >>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT >>> to mark constant nodes, but in response to last year's RFC, Richard B. >>> suggested it would be better to have separate codes for the constant >>> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated >>> as a normal unary operation and avoids the previous need for treating >>> it as a GIMPLE_SINGLE_RHS. >>> >>> It might make sense to use VEC_DUPLICATE_CST for all duplicated >>> vector constants, since it's a bit more compact than VECTOR_CST >>> in that case, and is potentially more efficient to process. >>> However, the nice thing about keeping it restricted to variable-length >>> vectors is that there is then no need to handle combinations of >>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use >>> VECTOR_CST or never use it. >>> >>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR. 
>> >> Index: gcc/tree-vect-generic.c >> =================================================================== >> --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100 >> +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100 >> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs >> ssa_uniform_vector_p (tree op) >> { >> if (TREE_CODE (op) == VECTOR_CST >> + || TREE_CODE (op) == VEC_DUPLICATE_CST >> || TREE_CODE (op) == CONSTRUCTOR) >> return uniform_vector_p (op); >> >> VEC_DUPLICATE_EXPR handling? > > Oops, yeah. I could have sworn it was there at one time... > >> Looks like for VEC_DUPLICATE_CST it could directly return true. > > The function is a bit misnamed: it returns the duplicated tree value > rather than a bool. > >> I didn't see uniform_vector_p being updated? > > That part was there FWIW (for tree.c). > >> Can you add verification to either verify_expr or build_vec_duplicate_cst >> that the type is one of variable size? And amend tree.def docs >> accordingly. Because otherwise we miss a lot of cases in constant >> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST). > > OK, done in the patch below with a gcc_unreachable () bomb in > build_vec_duplicate_cst, which becomes a gcc_assert when variable-length > vectors are added. This meant changing the selftests to use > build_vector_from_val rather than build_vec_duplicate_cst, > but to still get testing of VEC_DUPLICATE_*, we then need to use > the target's preferred vector length instead of always using 4. > > Tested as before. OK (given the slightly different selftests)? Ok. I'll leave the missed constant foldings to you to figure out. Richard. > Thanks, > Richard > > > 2017-11-06 Richard Sandiford <richard.sandiford@linaro.org> > Alan Hayward <alan.hawyard@arm.com> > David Sherwood <david.sherwood@arm.com> > > gcc/ > * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document. > (VEC_COND_EXPR): Add missing @tindex. > * doc/md.texi (vec_duplicate@var{m}): Document. 
> * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
> * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
> are used for VEC_DUPLICATE_CST as well.
> (tree_vector): Access base.u.nelts directly.
> * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
> valid codes.
> (VEC_DUPLICATE_CST_ELT): New macro.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
> (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
> (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
> (build_vec_duplicate_cst): New function.
> (build_vector_from_val): Add stubbed-out handling of variable-length
> vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
> (uniform_vector_p): Handle the new codes.
> (test_vec_duplicate_predicates_int): New function.
> (test_vec_duplicate_predicates_float): Likewise.
> (test_vec_duplicate_predicates): Likewise.
> (tree_c_tests): Call test_vec_duplicate_predicates.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-isl-ast-to-gimple.c
> (translate_isl_ast_to_gimple::is_constant): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * match.pd (negate_expr_p): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-chkp.c (chkp_find_bounds_1): Likewise.
> * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
> (fold_convert_const, operand_equal_p, fold_view_convert_expr)
> (exact_inverse, fold_checksum_tree): Likewise.
> (const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
> (test_vec_duplicate_folding): New function.
> (fold_const_c_tests): Call it.
> * optabs.def (vec_duplicate_optab): New optab.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
> * optabs.h (expand_vector_broadcast): Declare.
> * optabs.c (expand_vector_broadcast): Make non-static. Try using
> vec_duplicate_optab.
> * expr.c (store_constructor): Try using vec_duplicate_optab for
> uniform vectors.
> (const_vector_element): New function, split out from...
> (const_vector_from_tree): ...here.
> (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
> (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
> * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
> instead of checking for VECTOR_CST.
> * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
> (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
> * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/doc/generic.texi 2017-11-06 12:40:40.277637153 +0000
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
>  @tindex FIXED_CST
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
>  double constant node.
The first operand is a @code{TREE_LIST} of the > constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}. > > +@item VEC_DUPLICATE_CST > +These nodes represent a vector constant in which every element has the > +same scalar value. At present only variable-length vectors use > +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST} > +instead. The scalar element value is given by > +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the > +element of a @code{VECTOR_CST}. > + > @item STRING_CST > These nodes represent string-constants. The @code{TREE_STRING_LENGTH} > returns the length of the string, as an @code{int}. The > @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind} > > @node Vectors > @subsection Vectors > +@tindex VEC_DUPLICATE_EXPR > @tindex VEC_LSHIFT_EXPR > @tindex VEC_RSHIFT_EXPR > @tindex VEC_WIDEN_MULT_HI_EXPR > @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind} > @tindex VEC_PACK_TRUNC_EXPR > @tindex VEC_PACK_SAT_EXPR > @tindex VEC_PACK_FIX_TRUNC_EXPR > +@tindex VEC_COND_EXPR > @tindex SAD_EXPR > > @table @code > +@item VEC_DUPLICATE_EXPR > +This node has a single operand and represents a vector in which every > +element is equal to that operand. > + > @item VEC_LSHIFT_EXPR > @itemx VEC_RSHIFT_EXPR > These nodes represent whole vector left and right shifts, respectively. > Index: gcc/doc/md.texi > =================================================================== > --- gcc/doc/md.texi 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/doc/md.texi 2017-11-06 12:40:40.278630081 +0000 > @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val > the vector mode @var{m}, or a vector mode with the same element mode and > smaller number of elements. > > +@cindex @code{vec_duplicate@var{m}} instruction pattern > +@item @samp{vec_duplicate@var{m}} > +Initialize vector output operand 0 so that each element has the value given > +by scalar input operand 1. 
The vector has mode @var{m} and the scalar has > +the mode appropriate for one element of @var{m}. > + > +This pattern only handles duplicates of non-constant inputs. Constant > +vectors go through the @code{mov@var{m}} pattern instead. > + > +This pattern is not allowed to @code{FAIL}. > + > @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern > @item @samp{vec_cmp@var{m}@var{n}} > Output a vector comparison. Operand 0 of mode @var{n} is the destination for > Index: gcc/tree.def > =================================================================== > --- gcc/tree.def 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree.def 2017-11-06 12:40:40.292531076 +0000 > @@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst", > /* Contents are in VECTOR_CST_ELTS field. */ > DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0) > > +/* Represents a vector constant in which every element is equal to > + VEC_DUPLICATE_CST_ELT. This is only ever used for variable-length > + vectors; fixed-length vectors must use VECTOR_CST instead. */ > +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0) > + > /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */ > DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0) > > @@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr", > 1 and 2 are NULL. The operands are then taken from the cfg edges. */ > DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) > > +/* Represents a vector in which every element is equal to operand 0. */ > +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1) > + > /* Vector conditional expression. It is like COND_EXPR, but with > vector operands. > > Index: gcc/tree-core.h > =================================================================== > --- gcc/tree-core.h 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-core.h 2017-11-06 12:40:40.288559363 +0000 > @@ -975,7 +975,8 @@ struct GTY(()) tree_base { > /* VEC length. 
This field is only used with TREE_VEC. */ > int length; > > - /* Number of elements. This field is only used with VECTOR_CST. */ > + /* Number of elements. This field is only used with VECTOR_CST > + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */ > unsigned int nelts; > > /* SSA version number. This field is only used with SSA_NAME. */ > @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base { > public_flag: > > TREE_OVERFLOW in > - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST > + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST > > TREE_PUBLIC in > VAR_DECL, FUNCTION_DECL > @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex { > > struct GTY(()) tree_vector { > struct tree_typed typed; > - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1]; > + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1]; > }; > > struct GTY(()) tree_identifier { > Index: gcc/tree.h > =================================================================== > --- gcc/tree.h 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree.h 2017-11-06 12:40:40.293524004 +0000 > @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \ > #define TYPE_REF_CAN_ALIAS_ALL(NODE) \ > (PTR_OR_REF_CHECK (NODE)->base.static_flag) > > -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means > - there was an overflow in folding. */ > +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST, > + this means there was an overflow in folding. */ > > #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag) > > @@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C > #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts) > #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX]) > > +/* In a VEC_DUPLICATE_CST node. */ > +#define VEC_DUPLICATE_CST_ELT(NODE) \ > + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0]) > + > /* Define fields and accessors for some special-purpose tree nodes. 
*/ > > #define IDENTIFIER_LENGTH(NODE) \ > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree.c 2017-11-06 12:40:40.292531076 +0000 > @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_ > case FIXED_CST: return TS_FIXED_CST; > case COMPLEX_CST: return TS_COMPLEX; > case VECTOR_CST: return TS_VECTOR; > + case VEC_DUPLICATE_CST: return TS_VECTOR; > case STRING_CST: return TS_STRING; > /* tcc_exceptional cases. */ > case ERROR_MARK: return TS_COMMON; > @@ -829,6 +830,7 @@ tree_code_size (enum tree_code code) > case FIXED_CST: return sizeof (tree_fixed_cst); > case COMPLEX_CST: return sizeof (tree_complex); > case VECTOR_CST: return sizeof (tree_vector); > + case VEC_DUPLICATE_CST: return sizeof (tree_vector); > case STRING_CST: gcc_unreachable (); > default: > gcc_checking_assert (code >= NUM_TREE_CODES); > @@ -890,6 +892,9 @@ tree_size (const_tree node) > return (sizeof (struct tree_vector) > + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree)); > > + case VEC_DUPLICATE_CST: > + return sizeof (struct tree_vector); > + > case STRING_CST: > return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1; > > @@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x) > && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x))); > } > > +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP. > + > + This function is only suitable for callers that know TYPE is a > + variable-length vector and specifically need a VEC_DUPLICATE_CST node. > + Use build_vector_from_val to duplicate a general scalar into a general > + vector type. */ > + > +static tree > +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL) > +{ > + /* Shouldn't be used until we have variable-length vectors. 
*/ > + gcc_unreachable (); > + > + int length = sizeof (struct tree_vector); > + > + record_node_allocation_statistics (VEC_DUPLICATE_CST, length); > + > + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT); > + > + TREE_SET_CODE (t, VEC_DUPLICATE_CST); > + TREE_TYPE (t) = type; > + t->base.u.nelts = 1; > + VEC_DUPLICATE_CST_ELT (t) = exp; > + TREE_CONSTANT (t) = 1; > + > + return t; > +} > + > /* Build a newly constructed VECTOR_CST node of length LEN. */ > > tree > @@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre > gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)), > TREE_TYPE (vectype))); > > + if (0) > + { > + if (CONSTANT_CLASS_P (sc)) > + return build_vec_duplicate_cst (vectype, sc); > + return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc); > + } > + > if (CONSTANT_CLASS_P (sc)) > { > auto_vec<tree, 32> v (nunits); > @@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2384,6 +2426,8 @@ integer_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return integer_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr) > return 1; > } > > + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST) > + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr)); > + > else if (TREE_CODE (expr) != INTEGER_CST) > return 0; > > @@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr) > int > integer_truep (const_tree expr) > { > - if (TREE_CODE (expr) == VECTOR_CST) > + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST) > return integer_all_onesp (expr); > return integer_onep (expr); > } > @@ -2649,6 +2696,8 @@ real_zerop (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_zerop 
(VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2677,6 +2726,8 @@ real_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr) > return false; > return true; > } > + case VEC_DUPLICATE_CST: > + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr)); > default: > return false; > } > @@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h > inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags); > return; > } > + case VEC_DUPLICATE_CST: > + inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate); > + return; > case SSA_NAME: > /* We can just compare by pointer. */ > hstate.add_hwi (SSA_NAME_VERSION (t)); > @@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init) > return true; > } > > + case VEC_DUPLICATE_CST: > + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init)); > + > case CONSTRUCTOR: > { > unsigned HOST_WIDE_INT idx; > @@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec) > > gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))); > > - if (TREE_CODE (vec) == VECTOR_CST) > + if (TREE_CODE (vec) == VEC_DUPLICATE_CST) > + return VEC_DUPLICATE_CST_ELT (vec); > + > + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR) > + return TREE_OPERAND (vec, 0); > + > + else if (TREE_CODE (vec) == VECTOR_CST) > { > first = VECTOR_CST_ELT (vec, 0); > for (i = 1; i < VECTOR_CST_NELTS (vec); ++i) > @@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE) \ > case REAL_CST: > case FIXED_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case BLOCK: > case PLACEHOLDER_EXPR: > @@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t) > elt = drop_tree_overflow (elt); > } > } > + if (TREE_CODE (t) == VEC_DUPLICATE_CST) > + { > + tree *elt = &VEC_DUPLICATE_CST_ELT (t); > + if (TREE_OVERFLOW (*elt)) > + *elt = drop_tree_overflow (*elt); > + } > return t; > } > > @@ -13850,6 
+13922,102 @@ test_integer_constants () > ASSERT_EQ (type, TREE_TYPE (zero)); > } > > +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs > + for integral type TYPE. */ > + > +static void > +test_vec_duplicate_predicates_int (tree type) > +{ > + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type); > + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode); > + /* This will be 1 if VEC_MODE isn't a vector mode. */ > + unsigned int nunits = GET_MODE_NUNITS (vec_mode); > + > + tree vec_type = build_vector_type (type, nunits); > + > + tree zero = build_zero_cst (type); > + tree vec_zero = build_vector_from_val (vec_type, zero); > + ASSERT_TRUE (integer_zerop (vec_zero)); > + ASSERT_FALSE (integer_onep (vec_zero)); > + ASSERT_FALSE (integer_minus_onep (vec_zero)); > + ASSERT_FALSE (integer_all_onesp (vec_zero)); > + ASSERT_FALSE (integer_truep (vec_zero)); > + ASSERT_TRUE (initializer_zerop (vec_zero)); > + > + tree one = build_one_cst (type); > + tree vec_one = build_vector_from_val (vec_type, one); > + ASSERT_FALSE (integer_zerop (vec_one)); > + ASSERT_TRUE (integer_onep (vec_one)); > + ASSERT_FALSE (integer_minus_onep (vec_one)); > + ASSERT_FALSE (integer_all_onesp (vec_one)); > + ASSERT_FALSE (integer_truep (vec_one)); > + ASSERT_FALSE (initializer_zerop (vec_one)); > + > + tree minus_one = build_minus_one_cst (type); > + tree vec_minus_one = build_vector_from_val (vec_type, minus_one); > + ASSERT_FALSE (integer_zerop (vec_minus_one)); > + ASSERT_FALSE (integer_onep (vec_minus_one)); > + ASSERT_TRUE (integer_minus_onep (vec_minus_one)); > + ASSERT_TRUE (integer_all_onesp (vec_minus_one)); > + ASSERT_TRUE (integer_truep (vec_minus_one)); > + ASSERT_FALSE (initializer_zerop (vec_minus_one)); > + > + tree x = create_tmp_var_raw (type, "x"); > + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x); > + ASSERT_EQ (uniform_vector_p (vec_zero), zero); > + ASSERT_EQ (uniform_vector_p (vec_one), one); > + ASSERT_EQ 
(uniform_vector_p (vec_minus_one), minus_one); > + ASSERT_EQ (uniform_vector_p (vec_x), x); > +} > + > +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point > + type TYPE. */ > + > +static void > +test_vec_duplicate_predicates_float (tree type) > +{ > + scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type); > + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode); > + /* This will be 1 if VEC_MODE isn't a vector mode. */ > + unsigned int nunits = GET_MODE_NUNITS (vec_mode); > + > + tree vec_type = build_vector_type (type, nunits); > + > + tree zero = build_zero_cst (type); > + tree vec_zero = build_vector_from_val (vec_type, zero); > + ASSERT_TRUE (real_zerop (vec_zero)); > + ASSERT_FALSE (real_onep (vec_zero)); > + ASSERT_FALSE (real_minus_onep (vec_zero)); > + ASSERT_TRUE (initializer_zerop (vec_zero)); > + > + tree one = build_one_cst (type); > + tree vec_one = build_vector_from_val (vec_type, one); > + ASSERT_FALSE (real_zerop (vec_one)); > + ASSERT_TRUE (real_onep (vec_one)); > + ASSERT_FALSE (real_minus_onep (vec_one)); > + ASSERT_FALSE (initializer_zerop (vec_one)); > + > + tree minus_one = build_minus_one_cst (type); > + tree vec_minus_one = build_vector_from_val (vec_type, minus_one); > + ASSERT_FALSE (real_zerop (vec_minus_one)); > + ASSERT_FALSE (real_onep (vec_minus_one)); > + ASSERT_TRUE (real_minus_onep (vec_minus_one)); > + ASSERT_FALSE (initializer_zerop (vec_minus_one)); > + > + ASSERT_EQ (uniform_vector_p (vec_zero), zero); > + ASSERT_EQ (uniform_vector_p (vec_one), one); > + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); > +} > + > +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ > + > +static void > +test_vec_duplicate_predicates () > +{ > + test_vec_duplicate_predicates_int (integer_type_node); > + test_vec_duplicate_predicates_float (float_type_node); > +} > + > /* Verify identifiers. 
*/ > > static void > @@ -13878,6 +14046,7 @@ test_labels () > tree_c_tests () > { > test_integer_constants (); > + test_vec_duplicate_predicates (); > test_identifiers (); > test_labels (); > } > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/cfgexpand.c 2017-11-06 12:40:40.276644225 +0000 > @@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp) > case VEC_WIDEN_LSHIFT_HI_EXPR: > case VEC_WIDEN_LSHIFT_LO_EXPR: > case VEC_PERM_EXPR: > + case VEC_DUPLICATE_CST: > + case VEC_DUPLICATE_EXPR: > return NULL; > > /* Misc codes. */ > Index: gcc/tree-pretty-print.c > =================================================================== > --- gcc/tree-pretty-print.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-pretty-print.c 2017-11-06 12:40:40.289552291 +0000 > @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t > } > break; > > + case VEC_DUPLICATE_CST: > + pp_string (pp, "{ "); > + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false); > + pp_string (pp, ", ... 
}"); > + break; > + > case FUNCTION_TYPE: > case METHOD_TYPE: > dump_generic_node (pp, TREE_TYPE (node), spc, flags, false); > @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t > pp_string (pp, " > "); > break; > > + case VEC_DUPLICATE_EXPR: > + pp_space (pp); > + for (str = get_tree_code_name (code); *str; str++) > + pp_character (pp, TOUPPER (*str)); > + pp_string (pp, " < "); > + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > + pp_string (pp, " > "); > + break; > + > case VEC_UNPACK_HI_EXPR: > pp_string (pp, " VEC_UNPACK_HI_EXPR < "); > dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > Index: gcc/tree-vect-generic.c > =================================================================== > --- gcc/tree-vect-generic.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-vect-generic.c 2017-11-06 12:40:40.291538147 +0000 > @@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs > ssa_uniform_vector_p (tree op) > { > if (TREE_CODE (op) == VECTOR_CST > + || TREE_CODE (op) == VEC_DUPLICATE_CST > + || TREE_CODE (op) == VEC_DUPLICATE_EXPR > || TREE_CODE (op) == CONSTRUCTOR) > return uniform_vector_p (op); > if (TREE_CODE (op) == SSA_NAME) > Index: gcc/dwarf2out.c > =================================================================== > --- gcc/dwarf2out.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/dwarf2out.c 2017-11-06 12:40:40.280615937 +0000 > @@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type) > switch (TREE_CODE (init)) > { > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > break; > case CONSTRUCTOR: > if (TREE_CONSTANT (init)) > Index: gcc/gimple-expr.h > =================================================================== > --- gcc/gimple-expr.h 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/gimple-expr.h 2017-11-06 12:40:40.282601794 +0000 > @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t) > case FIXED_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case 
STRING_CST: > return true; > > Index: gcc/gimplify.c > =================================================================== > --- gcc/gimplify.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/gimplify.c 2017-11-06 12:40:40.283594722 +0000 > @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq > case STRING_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > /* Drop the overflow flag on constants, we do not want > that in the GIMPLE IL. */ > if (TREE_OVERFLOW_P (*expr_p)) > Index: gcc/graphite-isl-ast-to-gimple.c > =================================================================== > --- gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:40.284587650 +0000 > @@ -211,7 +211,8 @@ enum phi_node_kind > return TREE_CODE (op) == INTEGER_CST > || TREE_CODE (op) == REAL_CST > || TREE_CODE (op) == COMPLEX_CST > - || TREE_CODE (op) == VECTOR_CST; > + || TREE_CODE (op) == VECTOR_CST > + || TREE_CODE (op) == VEC_DUPLICATE_CST; > } > > private: > Index: gcc/graphite-scop-detection.c > =================================================================== > --- gcc/graphite-scop-detection.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/graphite-scop-detection.c 2017-11-06 12:40:40.284587650 +0000 > @@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre > case REAL_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > break; > > default: > Index: gcc/ipa-icf-gimple.c > =================================================================== > --- gcc/ipa-icf-gimple.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/ipa-icf-gimple.c 2017-11-06 12:40:40.285580578 +0000 > @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case REAL_CST: > { > @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1, > case INTEGER_CST: > case COMPLEX_CST: > case 
VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > case REAL_CST: > case FUNCTION_DECL: > Index: gcc/ipa-icf.c > =================================================================== > --- gcc/ipa-icf.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/ipa-icf.c 2017-11-06 12:40:40.285580578 +0000 > @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch > case STRING_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > inchash::add_expr (exp, hstate); > break; > case CONSTRUCTOR: > @@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2) > > return 1; > } > + case VEC_DUPLICATE_CST: > + return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1), > + VEC_DUPLICATE_CST_ELT (t2)); > case ARRAY_REF: > case ARRAY_RANGE_REF: > { > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/match.pd 2017-11-06 12:40:40.285580578 +0000 > @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (match negate_expr_p > VECTOR_CST > (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) > +(match negate_expr_p > + VEC_DUPLICATE_CST > + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)))) > > /* (-A) * (-B) -> A * B */ > (simplify > Index: gcc/print-tree.c > =================================================================== > --- gcc/print-tree.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/print-tree.c 2017-11-06 12:40:40.287566435 +0000 > @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref > } > break; > > + case VEC_DUPLICATE_CST: > + print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4); > + break; > + > case COMPLEX_CST: > print_node (file, "real", TREE_REALPART (node), indent + 4); > print_node (file, "imag", TREE_IMAGPART (node), indent + 4); > Index: gcc/tree-chkp.c > =================================================================== > --- gcc/tree-chkp.c 2017-11-06 
12:40:39.845713389 +0000 > +++ gcc/tree-chkp.c 2017-11-06 12:40:40.288559363 +0000 > @@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > if (integer_zerop (ptr_src)) > bounds = chkp_get_none_bounds (); > else > Index: gcc/tree-loop-distribution.c > =================================================================== > --- gcc/tree-loop-distribution.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-loop-distribution.c 2017-11-06 12:40:40.289552291 +0000 > @@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val) > && CONSTRUCTOR_NELTS (val) == 0)) > return 0; > > + if (TREE_CODE (val) == VEC_DUPLICATE_CST) > + return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val)); > + > if (real_zerop (val)) > { > /* Only return 0 for +0.0, not for -0.0, which doesn't have > Index: gcc/tree-ssa-loop.c > =================================================================== > --- gcc/tree-ssa-loop.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-ssa-loop.c 2017-11-06 12:40:40.290545219 +0000 > @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc > case STRING_CST: > case RESULT_DECL: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case COMPLEX_CST: > case INTEGER_CST: > case REAL_CST: > Index: gcc/tree-ssa-pre.c > =================================================================== > --- gcc/tree-ssa-pre.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-ssa-pre.c 2017-11-06 12:40:40.290545219 +0000 > @@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_ > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case CONSTRUCTOR: > case VAR_DECL: > Index: gcc/tree-ssa-sccvn.c > =================================================================== > --- gcc/tree-ssa-sccvn.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-ssa-sccvn.c 2017-11-06 12:40:40.291538147 +0000 > @@ -866,6 +866,7 @@ 
copy_reference_ops_from_ref (tree ref, v > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case FIXED_CST: > case CONSTRUCTOR: > @@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r > case INTEGER_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case REAL_CST: > case CONSTRUCTOR: > case CONST_DECL: > Index: gcc/varasm.c > =================================================================== > --- gcc/varasm.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/varasm.c 2017-11-06 12:40:40.293524004 +0000 > @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp) > CASE_CONVERT: > return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2; > > + case VEC_DUPLICATE_CST: > + return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3; > + > default: > /* A language specific constant. Just hash the code. */ > return code; > @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t > return 1; > } > > + case VEC_DUPLICATE_CST: > + return compare_constant (VEC_DUPLICATE_CST_ELT (t1), > + VEC_DUPLICATE_CST_ELT (t2)); > + > case CONSTRUCTOR: > { > vec<constructor_elt, va_gc> *v1, *v2; > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/fold-const.c 2017-11-06 12:40:40.282601794 +0000 > @@ -418,6 +418,9 @@ negate_expr_p (tree t) > return true; > } > > + case VEC_DUPLICATE_CST: > + return negate_expr_p (VEC_DUPLICATE_CST_ELT (t)); > + > case COMPLEX_EXPR: > return negate_expr_p (TREE_OPERAND (t, 0)) > && negate_expr_p (TREE_OPERAND (t, 1)); > @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree > return build_vector (type, elts); > } > > + case VEC_DUPLICATE_CST: > + { > + tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t)); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (type, sub); > + } > + > case COMPLEX_EXPR: > if (negate_expr_p (t)) > return 
fold_build2_loc (loc, COMPLEX_EXPR, type, > @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a > return build_vector (type, elts); > } > > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && TREE_CODE (arg2) == VEC_DUPLICATE_CST) > + { > + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), > + VEC_DUPLICATE_CST_ELT (arg2)); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (TREE_TYPE (arg1), sub); > + } > + > /* Shifts allow a scalar offset for a vector. */ > if (TREE_CODE (arg1) == VECTOR_CST > && TREE_CODE (arg2) == INTEGER_CST) > @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a > > return build_vector (type, elts); > } > + > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && TREE_CODE (arg2) == INTEGER_CST) > + { > + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (TREE_TYPE (arg1), sub); > + } > return NULL_TREE; > } > > @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty > if (i == count) > return build_vector (type, elements); > } > + else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST) > + { > + tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (arg0)); > + if (sub) > + return build_vector_from_val (type, sub); > + } > break; > > case TRUTH_NOT_EXPR: > @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty > return res; > } > > + case VEC_DUPLICATE_EXPR: > + if (CONSTANT_CLASS_P (arg0)) > + return build_vector_from_val (type, arg0); > + return NULL_TREE; > + > default: > break; > } > @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code, > } > return build_vector (type, v); > } > + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST > + && (TYPE_VECTOR_SUBPARTS (type) > + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))) > + { > + tree sub = fold_convert_const (code, TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (arg1)); > + if (sub) > + return build_vector_from_val (type, sub); 
> + } > } > return NULL_TREE; > } > @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_ > return 1; > } > > + case VEC_DUPLICATE_CST: > + return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0), > + VEC_DUPLICATE_CST_ELT (arg1), flags); > + > case COMPLEX_CST: > return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1), > flags) > @@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type) > static tree > fold_view_convert_expr (tree type, tree expr) > { > + /* Recurse on duplicated vectors if the target type is also a vector > + and if the elements line up. */ > + tree expr_type = TREE_TYPE (expr); > + if (TREE_CODE (expr) == VEC_DUPLICATE_CST > + && VECTOR_TYPE_P (type) > + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type) > + && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type))) > + { > + tree sub = fold_view_convert_expr (TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (expr)); > + if (sub) > + return build_vector_from_val (type, sub); > + } > + > /* We support up to 512-bit values (for V8DFmode). */ > unsigned char buffer[64]; > int len; > @@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst) > return build_vector (type, elts); > } > > + case VEC_DUPLICATE_CST: > + { > + tree sub = exact_inverse (TREE_TYPE (type), > + VEC_DUPLICATE_CST_ELT (cst)); > + if (!sub) > + return NULL_TREE; > + return build_vector_from_val (type, sub); > + } > + > default: > return NULL_TREE; > } > @@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str > for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i) > fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht); > break; > + case VEC_DUPLICATE_CST: > + fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht); > + break; > default: > break; > } > @@ -14412,6 +14493,41 @@ test_vector_folding () > ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one))); > } > > +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. 
*/ > + > +static void > +test_vec_duplicate_folding () > +{ > + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype); > + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode); > + /* This will be 1 if VEC_MODE isn't a vector mode. */ > + unsigned int nunits = GET_MODE_NUNITS (vec_mode); > + > + tree type = build_vector_type (ssizetype, nunits); > + tree dup5 = build_vector_from_val (type, ssize_int (5)); > + tree dup3 = build_vector_from_val (type, ssize_int (3)); > + > + tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5); > + ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5)); > + > + tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5); > + ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6)); > + > + tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3); > + ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8)); > + > + tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2)); > + ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20)); > + > + tree size_vector = build_vector_type (sizetype, nunits); > + tree size_dup5 = fold_convert (size_vector, dup5); > + ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5)); > + > + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5)); > + tree dup5_cst = build_vector_from_val (type, ssize_int (5)); > + ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0)); > +} > + > /* Run all of the selftests within this file. 
*/ > > void > @@ -14419,6 +14535,7 @@ fold_const_c_tests () > { > test_arithmetic_folding (); > test_vector_folding (); > + test_vec_duplicate_folding (); > } > > } // namespace selftest > Index: gcc/optabs.def > =================================================================== > --- gcc/optabs.def 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/optabs.def 2017-11-06 12:40:40.286573506 +0000 > @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I > > OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") > OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") > + > +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) > Index: gcc/optabs-tree.c > =================================================================== > --- gcc/optabs-tree.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/optabs-tree.c 2017-11-06 12:40:40.286573506 +0000 > @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code > return TYPE_UNSIGNED (type) ? > vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab; > > + case VEC_DUPLICATE_EXPR: > + return vec_duplicate_optab; > + > default: > break; > } > Index: gcc/optabs.h > =================================================================== > --- gcc/optabs.h 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/optabs.h 2017-11-06 12:40:40.287566435 +0000 > @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin > enum optab_methods methods); > extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int, > enum optab_methods); > +extern rtx expand_vector_broadcast (machine_mode, rtx); > > /* Generate code for a simple binary or unary operation. 
"Simple" in > this case means "can be unambiguously described by a (mode, code) > Index: gcc/optabs.c > =================================================================== > --- gcc/optabs.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/optabs.c 2017-11-06 12:40:40.286573506 +0000 > @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o > mode of OP must be the element mode of VMODE. If OP is a constant, > then the return value will be a constant. */ > > -static rtx > +rtx > expand_vector_broadcast (machine_mode vmode, rtx op) > { > enum insn_code icode; > @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm > if (valid_for_const_vec_duplicate_p (vmode, op)) > return gen_const_vec_duplicate (vmode, op); > > + icode = optab_handler (vec_duplicate_optab, vmode); > + if (icode != CODE_FOR_nothing) > + { > + struct expand_operand ops[2]; > + create_output_operand (&ops[0], NULL_RTX, vmode); > + create_input_operand (&ops[1], op, GET_MODE (op)); > + expand_insn (icode, 2, ops); > + return ops[0].value; > + } > + > /* ??? If the target doesn't have a vec_init, then we have no easy way > of performing this operation. Most of this sort of generic support > is hidden away in the vector lowering support in gimple. 
*/ > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/expr.c 2017-11-06 12:40:40.281608865 +0000 > @@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target, > constructor_elt *ce; > int i; > int need_to_clear; > - int icode = CODE_FOR_nothing; > + insn_code icode = CODE_FOR_nothing; > + tree elt; > tree elttype = TREE_TYPE (type); > int elt_size = tree_to_uhwi (TYPE_SIZE (elttype)); > machine_mode eltmode = TYPE_MODE (elttype); > @@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target, > unsigned n_elts; > alias_set_type alias; > bool vec_vec_init_p = false; > + machine_mode mode = GET_MODE (target); > > gcc_assert (eltmode != BLKmode); > > + /* Try using vec_duplicate_optab for uniform vectors. */ > + if (!TREE_SIDE_EFFECTS (exp) > + && VECTOR_MODE_P (mode) > + && eltmode == GET_MODE_INNER (mode) > + && ((icode = optab_handler (vec_duplicate_optab, mode)) > + != CODE_FOR_nothing) > + && (elt = uniform_vector_p (exp))) > + { > + struct expand_operand ops[2]; > + create_output_operand (&ops[0], target, mode); > + create_input_operand (&ops[1], expand_normal (elt), eltmode); > + expand_insn (icode, 2, ops); > + if (!rtx_equal_p (target, ops[0].value)) > + emit_move_insn (target, ops[0].value); > + break; > + } > + > n_elts = TYPE_VECTOR_SUBPARTS (type); > - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target))) > + if (REG_P (target) && VECTOR_MODE_P (mode)) > { > - machine_mode mode = GET_MODE (target); > machine_mode emode = eltmode; > > if (CONSTRUCTOR_NELTS (exp) > @@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target, > == n_elts); > emode = TYPE_MODE (etype); > } > - icode = (int) convert_optab_handler (vec_init_optab, mode, emode); > + icode = convert_optab_handler (vec_init_optab, mode, emode); > if (icode != CODE_FOR_nothing) > { > unsigned int i, n = n_elts; > @@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target, > if 
(need_to_clear && size > 0 && !vector) > { > if (REG_P (target)) > - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); > + emit_move_insn (target, CONST0_RTX (mode)); > else > clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL); > cleared = 1; > @@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target, > > /* Inform later passes that the old value is dead. */ > if (!cleared && !vector && REG_P (target)) > - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); > + emit_move_insn (target, CONST0_RTX (mode)); > > if (MEM_P (target)) > alias = MEM_ALIAS_SET (target); > @@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target, > > if (vector) > emit_insn (GEN_FCN (icode) (target, > - gen_rtx_PARALLEL (GET_MODE (target), > - vector))); > + gen_rtx_PARALLEL (mode, vector))); > break; > } > > @@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r > } > > > +/* Expand constant vector element ELT, which has mode MODE. This is used > + for members of VECTOR_CST and VEC_DUPLICATE_CST. */ > + > +static rtx > +const_vector_element (scalar_mode mode, const_tree elt) > +{ > + if (TREE_CODE (elt) == REAL_CST) > + return const_double_from_real_value (TREE_REAL_CST (elt), mode); > + if (TREE_CODE (elt) == FIXED_CST) > + return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode); > + return immed_wide_int_const (wi::to_wide (elt), mode); > +} > + > /* Return a MEM that contains constant EXP. DEFER is as for > output_constant_def and MODIFIER is as for expand_expr. 
*/ > > @@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b > target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target); > return target; > > + case VEC_DUPLICATE_EXPR: > + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier); > + target = expand_vector_broadcast (mode, op0); > + gcc_assert (target); > + return target; > + > case BIT_INSERT_EXPR: > { > unsigned bitpos = tree_to_uhwi (treeop2); > @@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target > tmode, modifier); > } > > + case VEC_DUPLICATE_CST: > + op0 = const_vector_element (GET_MODE_INNER (mode), > + VEC_DUPLICATE_CST_ELT (exp)); > + return gen_const_vec_duplicate (mode, op0); > + > case CONST_DECL: > if (modifier == EXPAND_WRITE) > { > @@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp) > { > rtvec v; > unsigned i, units; > - tree elt; > - machine_mode inner, mode; > + machine_mode mode; > > mode = TYPE_MODE (TREE_TYPE (exp)); > > @@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp) > return const_vector_mask_from_tree (exp); > > units = VECTOR_CST_NELTS (exp); > - inner = GET_MODE_INNER (mode); > > v = rtvec_alloc (units); > > for (i = 0; i < units; ++i) > - { > - elt = VECTOR_CST_ELT (exp, i); > - > - if (TREE_CODE (elt) == REAL_CST) > - RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt), > - inner); > - else if (TREE_CODE (elt) == FIXED_CST) > - RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), > - inner); > - else > - RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner); > - } > + RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode), > + VECTOR_CST_ELT (exp, i)); > > return gen_rtx_CONST_VECTOR (mode, v); > } > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/internal-fn.c 2017-11-06 12:40:40.284587650 +0000 > @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow 
(location_t > emit_move_insn (cntvar, const0_rtx); > emit_label (loop_lab); > } > - if (TREE_CODE (arg0) != VECTOR_CST) > + if (!CONSTANT_CLASS_P (arg0)) > { > rtx arg0r = expand_normal (arg0); > arg0 = make_tree (TREE_TYPE (arg0), arg0r); > } > - if (TREE_CODE (arg1) != VECTOR_CST) > + if (!CONSTANT_CLASS_P (arg1)) > { > rtx arg1r = expand_normal (arg1); > arg1 = make_tree (TREE_TYPE (arg1), arg1r); > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-cfg.c 2017-11-06 12:40:40.287566435 +0000 > @@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm > case CONJ_EXPR: > break; > > + case VEC_DUPLICATE_EXPR: > + if (TREE_CODE (lhs_type) != VECTOR_TYPE > + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type)) > + { > + error ("vec_duplicate should be from a scalar to a like vector"); > + debug_generic_expr (lhs_type); > + debug_generic_expr (rhs1_type); > + return true; > + } > + return false; > + > default: > gcc_unreachable (); > } > @@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st > case FIXED_CST: > case COMPLEX_CST: > case VECTOR_CST: > + case VEC_DUPLICATE_CST: > case STRING_CST: > return res; > > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c 2017-11-06 12:40:39.845713389 +0000 > +++ gcc/tree-inline.c 2017-11-06 12:40:40.289552291 +0000 > @@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c > case VEC_PACK_FIX_TRUNC_EXPR: > case VEC_WIDEN_LSHIFT_HI_EXPR: > case VEC_WIDEN_LSHIFT_LO_EXPR: > + case VEC_DUPLICATE_EXPR: > > return 1; >
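The constant-folding hunks quoted above all follow one shape: fold the single repeated element, give up if that scalar fold fails, and otherwise rebuild a duplicate from the result. Below is a minimal standalone sketch of that shape, assuming C++17. DupConst, ScalarFold, fold_dup_binop, fold_plus and fold_div are hypothetical names invented for illustration; they are not GCC's tree API, and std::nullopt merely stands in for the NULL_TREE "cannot fold" result.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>

// Toy stand-in for a duplicated vector constant: one element value that is
// conceptually repeated in every lane.  Hypothetical type, not a GCC tree.
struct DupConst {
  std::int64_t elt;
};

// A scalar folder returns std::nullopt when it cannot fold, playing the
// role that NULL_TREE plays in the quoted patch.
using ScalarFold =
    std::function<std::optional<std::int64_t>(std::int64_t, std::int64_t)>;

// Fold a binary operation on two duplicated constants by folding their
// elements once.  The lane count never enters the computation.
std::optional<DupConst> fold_dup_binop(const ScalarFold &op, DupConst a,
                                       DupConst b) {
  std::optional<std::int64_t> sub = op(a.elt, b.elt);
  if (!sub)
    return std::nullopt;  // propagate "cannot fold"
  return DupConst{*sub};
}

// Addition that refuses to fold on signed overflow.
std::optional<std::int64_t> fold_plus(std::int64_t a, std::int64_t b) {
  const std::int64_t max = std::numeric_limits<std::int64_t>::max();
  const std::int64_t min = std::numeric_limits<std::int64_t>::min();
  if ((b > 0 && a > max - b) || (b < 0 && a < min - b))
    return std::nullopt;
  return a + b;
}

// Division that refuses to fold on a zero divisor or on overflow.
std::optional<std::int64_t> fold_div(std::int64_t a, std::int64_t b) {
  if (b == 0 ||
      (a == std::numeric_limits<std::int64_t>::min() && b == -1))
    return std::nullopt;
  return a / b;
}
```

In this model, folding the duplicates of 5 and 3 with fold_plus gives a duplicate of 8, mirroring the dup5_plus_dup3 check in the quoted selftest, while fold_div with a zero divisor yields no result at all. Because only one element is ever inspected, the same logic works whether or not the number of lanes is known at compile time.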
This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
isn't needed with the new VECTOR_CST layout.  It's really just the
original patch with bits removed, but just in case:

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?

Richard


2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
            Alan Hayward  <alan.hawyard@arm.com>
            David Sherwood  <david.sherwood@arm.com>

gcc/
        * doc/generic.texi (VEC_DUPLICATE_EXPR): Document.
        (VEC_COND_EXPR): Add missing @tindex.
        * doc/md.texi (vec_duplicate@var{m}): Document.
        * tree.def (VEC_DUPLICATE_EXPR): New tree codes.
        * tree.c (build_vector_from_val): Add stubbed-out handling of
        variable-length vectors, using VEC_DUPLICATE_EXPR.
        (uniform_vector_p): Handle VEC_DUPLICATE_EXPR.
        * cfgexpand.c (expand_debug_expr): Likewise.
        * tree-cfg.c (verify_gimple_assign_unary): Likewise.
        * tree-inline.c (estimate_operator_cost): Likewise.
        * tree-pretty-print.c (dump_generic_node): Likewise.
        * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
        * fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant.
        (test_vec_duplicate_folding): New function.
        (fold_const_c_tests): Call it.
        * optabs.def (vec_duplicate_optab): New optab.
        * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
        * optabs.h (expand_vector_broadcast): Declare.
        * optabs.c (expand_vector_broadcast): Make non-static.  Try using
        vec_duplicate_optab.
        * expr.c (store_constructor): Try using vec_duplicate_optab for
        uniform vectors.
        (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.

Index: gcc/doc/generic.texi =================================================================== --- gcc/doc/generic.texi 2017-12-15 00:24:47.213516622 +0000 +++ gcc/doc/generic.texi 2017-12-15 00:24:47.498459276 +0000 @@ -1768,6 +1768,7 @@ a value from @code{enum annot_expr_kind} @node Vectors @subsection Vectors +@tindex VEC_DUPLICATE_EXPR @tindex VEC_LSHIFT_EXPR @tindex VEC_RSHIFT_EXPR @tindex VEC_WIDEN_MULT_HI_EXPR @@ -1779,9 +1780,14 @@ a value from @code{enum annot_expr_kind} @tindex VEC_PACK_TRUNC_EXPR @tindex VEC_PACK_SAT_EXPR @tindex VEC_PACK_FIX_TRUNC_EXPR +@tindex VEC_COND_EXPR @tindex SAD_EXPR @table @code +@item VEC_DUPLICATE_EXPR +This node has a single operand and represents a vector in which every +element is equal to that operand. + @item VEC_LSHIFT_EXPR @itemx VEC_RSHIFT_EXPR These nodes represent whole vector left and right shifts, respectively. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-12-15 00:24:47.213516622 +0000 +++ gcc/doc/md.texi 2017-12-15 00:24:47.499459075 +0000 @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val the vector mode @var{m}, or a vector mode with the same element mode and smaller number of elements. +@cindex @code{vec_duplicate@var{m}} instruction pattern +@item @samp{vec_duplicate@var{m}} +Initialize vector output operand 0 so that each element has the value given +by scalar input operand 1. The vector has mode @var{m} and the scalar has +the mode appropriate for one element of @var{m}. + +This pattern only handles duplicates of non-constant inputs. Constant +vectors go through the @code{mov@var{m}} pattern instead. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern @item @samp{vec_cmp@var{m}@var{n}} Output a vector comparison. 
Operand 0 of mode @var{n} is the destination for Index: gcc/tree.def =================================================================== --- gcc/tree.def 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree.def 2017-12-15 00:24:47.505457868 +0000 @@ -537,6 +537,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr", 1 and 2 are NULL. The operands are then taken from the cfg edges. */ DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) +/* Represents a vector in which every element is equal to operand 0. */ +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1) + /* Vector conditional expression. It is like COND_EXPR, but with vector operands. Index: gcc/tree.c =================================================================== --- gcc/tree.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree.c 2017-12-15 00:24:47.505457868 +0000 @@ -1785,6 +1785,8 @@ build_vector_from_val (tree vectype, tre v.quick_push (sc); return v.build (); } + else if (0) + return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc); else { vec<constructor_elt, va_gc> *v; @@ -10468,7 +10470,10 @@ uniform_vector_p (const_tree vec) gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))); - if (TREE_CODE (vec) == VECTOR_CST) + if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR) + return TREE_OPERAND (vec, 0); + + else if (TREE_CODE (vec) == VECTOR_CST) { if (VECTOR_CST_NPATTERNS (vec) == 1 && VECTOR_CST_DUPLICATE_P (vec)) return VECTOR_CST_ENCODED_ELT (vec, 0); Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/cfgexpand.c 2017-12-15 00:24:47.498459276 +0000 @@ -5069,6 +5069,7 @@ expand_debug_expr (tree exp) case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: case VEC_PERM_EXPR: + case VEC_DUPLICATE_EXPR: return NULL; /* Misc codes. 
*/ Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree-cfg.c 2017-12-15 00:24:47.503458270 +0000 @@ -3857,6 +3857,17 @@ verify_gimple_assign_unary (gassign *stm case CONJ_EXPR: break; + case VEC_DUPLICATE_EXPR: + if (TREE_CODE (lhs_type) != VECTOR_TYPE + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type)) + { + error ("vec_duplicate should be from a scalar to a like vector"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + return false; + default: gcc_unreachable (); } Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree-inline.c 2017-12-15 00:24:47.504458069 +0000 @@ -3928,6 +3928,7 @@ estimate_operator_cost (enum tree_code c case VEC_PACK_FIX_TRUNC_EXPR: case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: + case VEC_DUPLICATE_EXPR: return 1; Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree-pretty-print.c 2017-12-15 00:24:47.504458069 +0000 @@ -3178,6 +3178,15 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; + case VEC_DUPLICATE_EXPR: + pp_space (pp); + for (str = get_tree_code_name (code); *str; str++) + pp_character (pp, TOUPPER (*str)); + pp_string (pp, " < "); + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (pp, " > "); + break; + case VEC_UNPACK_HI_EXPR: pp_string (pp, " VEC_UNPACK_HI_EXPR < "); dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/tree-vect-generic.c 2017-12-15 00:24:47.504458069 +0000 @@ -1418,6 +1418,7 
@@ lower_vec_perm (gimple_stmt_iterator *gs ssa_uniform_vector_p (tree op) { if (TREE_CODE (op) == VECTOR_CST + || TREE_CODE (op) == VEC_DUPLICATE_EXPR || TREE_CODE (op) == CONSTRUCTOR) return uniform_vector_p (op); if (TREE_CODE (op) == SSA_NAME) Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/fold-const.c 2017-12-15 00:24:47.501458673 +0000 @@ -1771,6 +1771,11 @@ const_unop (enum tree_code code, tree ty return elts.build (); } + case VEC_DUPLICATE_EXPR: + if (CONSTANT_CLASS_P (arg0)) + return build_vector_from_val (type, arg0); + return NULL_TREE; + default: break; } @@ -14442,6 +14447,22 @@ test_vector_folding () ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one))); } +/* Verify folding of VEC_DUPLICATE_EXPRs. */ + +static void +test_vec_duplicate_folding () +{ + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype); + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode); + /* This will be 1 if VEC_MODE isn't a vector mode. */ + unsigned int nunits = GET_MODE_NUNITS (vec_mode); + + tree type = build_vector_type (ssizetype, nunits); + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5)); + tree dup5_cst = build_vector_from_val (type, ssize_int (5)); + ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0)); +} + /* Run all of the selftests within this file. 
*/ void @@ -14449,6 +14470,7 @@ fold_const_c_tests () { test_arithmetic_folding (); test_vector_folding (); + test_vec_duplicate_folding (); } } // namespace selftest Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2017-12-15 00:24:47.213516622 +0000 +++ gcc/optabs.def 2017-12-15 00:24:47.502458472 +0000 @@ -363,3 +363,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") + +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/optabs-tree.c 2017-12-15 00:24:47.501458673 +0000 @@ -199,6 +199,9 @@ optab_for_tree_code (enum tree_code code return TYPE_UNSIGNED (type) ? vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab; + case VEC_DUPLICATE_EXPR: + return vec_duplicate_optab; + default: break; } Index: gcc/optabs.h =================================================================== --- gcc/optabs.h 2017-12-15 00:24:47.213516622 +0000 +++ gcc/optabs.h 2017-12-15 00:24:47.502458472 +0000 @@ -182,6 +182,7 @@ extern rtx simplify_expand_binop (machin enum optab_methods methods); extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int, enum optab_methods); +extern rtx expand_vector_broadcast (machine_mode, rtx); /* Generate code for a simple binary or unary operation. "Simple" in this case means "can be unambiguously described by a (mode, code) Index: gcc/optabs.c =================================================================== --- gcc/optabs.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/optabs.c 2017-12-15 00:24:47.502458472 +0000 @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o mode of OP must be the element mode of VMODE. If OP is a constant, then the return value will be a constant. 
*/ -static rtx +rtx expand_vector_broadcast (machine_mode vmode, rtx op) { enum insn_code icode; @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm if (valid_for_const_vec_duplicate_p (vmode, op)) return gen_const_vec_duplicate (vmode, op); + icode = optab_handler (vec_duplicate_optab, vmode); + if (icode != CODE_FOR_nothing) + { + struct expand_operand ops[2]; + create_output_operand (&ops[0], NULL_RTX, vmode); + create_input_operand (&ops[1], op, GET_MODE (op)); + expand_insn (icode, 2, ops); + return ops[0].value; + } + /* ??? If the target doesn't have a vec_init, then we have no easy way of performing this operation. Most of this sort of generic support is hidden away in the vector lowering support in gimple. */ Index: gcc/expr.c =================================================================== --- gcc/expr.c 2017-12-15 00:24:47.213516622 +0000 +++ gcc/expr.c 2017-12-15 00:24:47.500458874 +0000 @@ -6598,7 +6598,8 @@ store_constructor (tree exp, rtx target, constructor_elt *ce; int i; int need_to_clear; - int icode = CODE_FOR_nothing; + insn_code icode = CODE_FOR_nothing; + tree elt; tree elttype = TREE_TYPE (type); int elt_size = tree_to_uhwi (TYPE_SIZE (elttype)); machine_mode eltmode = TYPE_MODE (elttype); @@ -6608,13 +6609,30 @@ store_constructor (tree exp, rtx target, unsigned n_elts; alias_set_type alias; bool vec_vec_init_p = false; + machine_mode mode = GET_MODE (target); gcc_assert (eltmode != BLKmode); + /* Try using vec_duplicate_optab for uniform vectors. 
*/ + if (!TREE_SIDE_EFFECTS (exp) + && VECTOR_MODE_P (mode) + && eltmode == GET_MODE_INNER (mode) + && ((icode = optab_handler (vec_duplicate_optab, mode)) + != CODE_FOR_nothing) + && (elt = uniform_vector_p (exp))) + { + struct expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], expand_normal (elt), eltmode); + expand_insn (icode, 2, ops); + if (!rtx_equal_p (target, ops[0].value)) + emit_move_insn (target, ops[0].value); + break; + } + n_elts = TYPE_VECTOR_SUBPARTS (type); - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target))) + if (REG_P (target) && VECTOR_MODE_P (mode)) { - machine_mode mode = GET_MODE (target); machine_mode emode = eltmode; if (CONSTRUCTOR_NELTS (exp) @@ -6626,7 +6644,7 @@ store_constructor (tree exp, rtx target, == n_elts); emode = TYPE_MODE (etype); } - icode = (int) convert_optab_handler (vec_init_optab, mode, emode); + icode = convert_optab_handler (vec_init_optab, mode, emode); if (icode != CODE_FOR_nothing) { unsigned int i, n = n_elts; @@ -6674,7 +6692,7 @@ store_constructor (tree exp, rtx target, if (need_to_clear && size > 0 && !vector) { if (REG_P (target)) - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); + emit_move_insn (target, CONST0_RTX (mode)); else clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL); cleared = 1; @@ -6682,7 +6700,7 @@ store_constructor (tree exp, rtx target, /* Inform later passes that the old value is dead. 
*/ if (!cleared && !vector && REG_P (target)) - emit_move_insn (target, CONST0_RTX (GET_MODE (target))); + emit_move_insn (target, CONST0_RTX (mode)); if (MEM_P (target)) alias = MEM_ALIAS_SET (target); @@ -6733,8 +6751,7 @@ store_constructor (tree exp, rtx target, if (vector) emit_insn (GEN_FCN (icode) (target, - gen_rtx_PARALLEL (GET_MODE (target), - vector))); + gen_rtx_PARALLEL (mode, vector))); break; } @@ -9563,6 +9580,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target); return target; + case VEC_DUPLICATE_EXPR: + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier); + target = expand_vector_broadcast (mode, op0); + gcc_assert (target); + return target; + case BIT_INSERT_EXPR: { unsigned bitpos = tree_to_uhwi (treeop2);
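The patch above documents VEC_DUPLICATE_EXPR as "a vector in which every element is equal to that operand", and store_constructor only takes the vec_duplicate path when uniform_vector_p finds such a repeated element. The following is a small standalone model of those two ideas, assuming C++17 and fixed-length std::vector lanes. broadcast and uniform_element are hypothetical helpers written for illustration and are not the GCC functions of similar purpose.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Model of a broadcast: every lane of the result holds VALUE.
std::vector<int> broadcast(int value, std::size_t lanes) {
  return std::vector<int>(lanes, value);
}

// Model of a uniformity query: return the repeated element if all lanes
// agree, or std::nullopt if the vector is empty or the lanes differ.
std::optional<int> uniform_element(const std::vector<int> &v) {
  if (v.empty())
    return std::nullopt;
  for (int lane : v)
    if (lane != v.front())
      return std::nullopt;
  return v.front();
}
```

Here uniform_element(broadcast(5, 4)) recovers 5, whereas a vector such as {1, 2} has no uniform element and would, in the terms of the patch, fall through to the general vec_init handling. The real uniform_vector_p answers the VEC_DUPLICATE_EXPR case without looking at any lanes, by returning the operand directly, which is what makes it usable for variable-length vectors.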
On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
> isn't needed with the new VECTOR_CST layout.  It's really just the
> original patch with bits removed, but just in case:
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?

To keep things simple at this point OK.

Note that I'd eventually like to see this as VEC_PERM_EXPR
<scalar_type_1, scalar_type_1, { 0, ... }>.  For reductions when we need
{ x, 0, ... } we now have to use a VEC_DUPLICATE_EXPR to make x a vector
and then a VEC_PERM_EXPR to merge it with {0, ... }, right?  Rather than
VEC_PERM_EXPR <x_1, 0, { 0, 1, 1, 1.... }>

Thanks,
Richard.

> Richard
>
>
> 2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hawyard@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_DUPLICATE_EXPR): Document.
>         (VEC_COND_EXPR): Add missing @tindex.
>         * doc/md.texi (vec_duplicate@var{m}): Document.
>         * tree.def (VEC_DUPLICATE_EXPR): New tree codes.
>         * tree.c (build_vector_from_val): Add stubbed-out handling of
>         variable-length vectors, using VEC_DUPLICATE_EXPR.
>         (uniform_vector_p): Handle VEC_DUPLICATE_EXPR.
>         * cfgexpand.c (expand_debug_expr): Likewise.
>         * tree-cfg.c (verify_gimple_assign_unary): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
>         * fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant.
>         (test_vec_duplicate_folding): New function.
>         (fold_const_c_tests): Call it.
>         * optabs.def (vec_duplicate_optab): New optab.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
>         * optabs.h (expand_vector_broadcast): Declare.
>         * optabs.c (expand_vector_broadcast): Make non-static.  Try using
>         vec_duplicate_optab.
> * expr.c (store_constructor): Try using vec_duplicate_optab for > uniform vectors. > (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR. > > Index: gcc/doc/generic.texi > =================================================================== > --- gcc/doc/generic.texi 2017-12-15 00:24:47.213516622 +0000 > +++ gcc/doc/generic.texi 2017-12-15 00:24:47.498459276 +0000 > @@ -1768,6 +1768,7 @@ a value from @code{enum annot_expr_kind} > > @node Vectors > @subsection Vectors > +@tindex VEC_DUPLICATE_EXPR > @tindex VEC_LSHIFT_EXPR > @tindex VEC_RSHIFT_EXPR > @tindex VEC_WIDEN_MULT_HI_EXPR > @@ -1779,9 +1780,14 @@ a value from @code{enum annot_expr_kind} > @tindex VEC_PACK_TRUNC_EXPR > @tindex VEC_PACK_SAT_EXPR > @tindex VEC_PACK_FIX_TRUNC_EXPR > +@tindex VEC_COND_EXPR > @tindex SAD_EXPR > > @table @code > +@item VEC_DUPLICATE_EXPR > +This node has a single operand and represents a vector in which every > +element is equal to that operand. > + > @item VEC_LSHIFT_EXPR > @itemx VEC_RSHIFT_EXPR > These nodes represent whole vector left and right shifts, respectively. > Index: gcc/doc/md.texi > =================================================================== > --- gcc/doc/md.texi 2017-12-15 00:24:47.213516622 +0000 > +++ gcc/doc/md.texi 2017-12-15 00:24:47.499459075 +0000 > @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val > the vector mode @var{m}, or a vector mode with the same element mode and > smaller number of elements. > > +@cindex @code{vec_duplicate@var{m}} instruction pattern > +@item @samp{vec_duplicate@var{m}} > +Initialize vector output operand 0 so that each element has the value given > +by scalar input operand 1. The vector has mode @var{m} and the scalar has > +the mode appropriate for one element of @var{m}. > + > +This pattern only handles duplicates of non-constant inputs. Constant > +vectors go through the @code{mov@var{m}} pattern instead. > + > +This pattern is not allowed to @code{FAIL}. 
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison. Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.def 2017-12-15 00:24:47.505457868 +0000
> @@ -537,6 +537,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
>     1 and 2 are NULL. The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0. */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.c 2017-12-15 00:24:47.505457868 +0000
> @@ -1785,6 +1785,8 @@ build_vector_from_val (tree vectype, tre
>        v.quick_push (sc);
>        return v.build ();
>      }
> +  else if (0)
> +    return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
>    else
>      {
>        vec<constructor_elt, va_gc> *v;
> @@ -10468,7 +10470,10 @@ uniform_vector_p (const_tree vec)
>
>    gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> -  if (TREE_CODE (vec) == VECTOR_CST)
> +  if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> +    return TREE_OPERAND (vec, 0);
> +
> +  else if (TREE_CODE (vec) == VECTOR_CST)
>      {
>        if (VECTOR_CST_NPATTERNS (vec) == 1 && VECTOR_CST_DUPLICATE_P (vec))
>          return VECTOR_CST_ENCODED_ELT (vec, 0);
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/cfgexpand.c 2017-12-15 00:24:47.498459276 +0000
> @@ -5069,6 +5069,7 @@ expand_debug_expr (tree exp)
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_PERM_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>        return NULL;
>
>      /* Misc codes. */
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-cfg.c 2017-12-15 00:24:47.503458270 +0000
> @@ -3857,6 +3857,17 @@ verify_gimple_assign_unary (gassign *stm
>      case CONJ_EXPR:
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +          || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +        {
> +          error ("vec_duplicate should be from a scalar to a like vector");
> +          debug_generic_expr (lhs_type);
> +          debug_generic_expr (rhs1_type);
> +          return true;
> +        }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-inline.c 2017-12-15 00:24:47.504458069 +0000
> @@ -3928,6 +3928,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>
>        return 1;
>
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-pretty-print.c 2017-12-15 00:24:47.504458069 +0000
> @@ -3178,6 +3178,15 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      pp_space (pp);
> +      for (str = get_tree_code_name (code); *str; str++)
> +        pp_character (pp, TOUPPER (*str));
> +      pp_string (pp, " < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_UNPACK_HI_EXPR:
>        pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-vect-generic.c 2017-12-15 00:24:47.504458069 +0000
> @@ -1418,6 +1418,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_EXPR
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>    if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/fold-const.c 2017-12-15 00:24:47.501458673 +0000
> @@ -1771,6 +1771,11 @@ const_unop (enum tree_code code, tree ty
>          return elts.build ();
>        }
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (CONSTANT_CLASS_P (arg0))
> +        return build_vector_from_val (type, arg0);
> +      return NULL_TREE;
> +
>      default:
>        break;
>      }
> @@ -14442,6 +14447,22 @@ test_vector_folding ()
>    ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
>  }
>
> +/* Verify folding of VEC_DUPLICATE_EXPRs. */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> +  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> +  /* This will be 1 if VEC_MODE isn't a vector mode. */
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> +  tree type = build_vector_type (ssizetype, nunits);
> +  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> +  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> +  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
>  /* Run all of the selftests within this file. */
>
>  void
> @@ -14449,6 +14470,7 @@ fold_const_c_tests ()
>  {
>    test_arithmetic_folding ();
>    test_vector_folding ();
> +  test_vec_duplicate_folding ();
>  }
>
>  } // namespace selftest
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.def 2017-12-15 00:24:47.502458472 +0000
> @@ -363,3 +363,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
>  OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs-tree.c 2017-12-15 00:24:47.501458673 +0000
> @@ -199,6 +199,9 @@ optab_for_tree_code (enum tree_code code
>        return TYPE_UNSIGNED (type) ?
>          vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> +    case VEC_DUPLICATE_EXPR:
> +      return vec_duplicate_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.h 2017-12-15 00:24:47.502458472 +0000
> @@ -182,6 +182,7 @@ extern rtx simplify_expand_binop (machin
>                                    enum optab_methods methods);
>  extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
>                                  enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
>  /* Generate code for a simple binary or unary operation. "Simple" in
>     this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.c 2017-12-15 00:24:47.502458472 +0000
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
>     mode of OP must be the element mode of VMODE. If OP is a constant,
>     then the return value will be a constant. */
>
> -static rtx
> +rtx
>  expand_vector_broadcast (machine_mode vmode, rtx op)
>  {
>    enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
>    if (valid_for_const_vec_duplicate_p (vmode, op))
>      return gen_const_vec_duplicate (vmode, op);
>
> +  icode = optab_handler (vec_duplicate_optab, vmode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      struct expand_operand ops[2];
> +      create_output_operand (&ops[0], NULL_RTX, vmode);
> +      create_input_operand (&ops[1], op, GET_MODE (op));
> +      expand_insn (icode, 2, ops);
> +      return ops[0].value;
> +    }
> +
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation. Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple. */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/expr.c 2017-12-15 00:24:47.500458874 +0000
> @@ -6598,7 +6598,8 @@ store_constructor (tree exp, rtx target,
>          constructor_elt *ce;
>          int i;
>          int need_to_clear;
> -        int icode = CODE_FOR_nothing;
> +        insn_code icode = CODE_FOR_nothing;
> +        tree elt;
>          tree elttype = TREE_TYPE (type);
>          int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
>          machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6608,13 +6609,30 @@ store_constructor (tree exp, rtx target,
>          unsigned n_elts;
>          alias_set_type alias;
>          bool vec_vec_init_p = false;
> +        machine_mode mode = GET_MODE (target);
>
>          gcc_assert (eltmode != BLKmode);
>
> +        /* Try using vec_duplicate_optab for uniform vectors. */
> +        if (!TREE_SIDE_EFFECTS (exp)
> +            && VECTOR_MODE_P (mode)
> +            && eltmode == GET_MODE_INNER (mode)
> +            && ((icode = optab_handler (vec_duplicate_optab, mode))
> +                != CODE_FOR_nothing)
> +            && (elt = uniform_vector_p (exp)))
> +          {
> +            struct expand_operand ops[2];
> +            create_output_operand (&ops[0], target, mode);
> +            create_input_operand (&ops[1], expand_normal (elt), eltmode);
> +            expand_insn (icode, 2, ops);
> +            if (!rtx_equal_p (target, ops[0].value))
> +              emit_move_insn (target, ops[0].value);
> +            break;
> +          }
> +
>          n_elts = TYPE_VECTOR_SUBPARTS (type);
> -        if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> +        if (REG_P (target) && VECTOR_MODE_P (mode))
>            {
> -            machine_mode mode = GET_MODE (target);
>              machine_mode emode = eltmode;
>
>              if (CONSTRUCTOR_NELTS (exp)
> @@ -6626,7 +6644,7 @@ store_constructor (tree exp, rtx target,
>                              == n_elts);
>                  emode = TYPE_MODE (etype);
>                }
> -            icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> +            icode = convert_optab_handler (vec_init_optab, mode, emode);
>              if (icode != CODE_FOR_nothing)
>                {
>                  unsigned int i, n = n_elts;
> @@ -6674,7 +6692,7 @@ store_constructor (tree exp, rtx target,
>          if (need_to_clear && size > 0 && !vector)
>            {
>              if (REG_P (target))
> -              emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +              emit_move_insn (target, CONST0_RTX (mode));
>              else
>                clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
>              cleared = 1;
> @@ -6682,7 +6700,7 @@ store_constructor (tree exp, rtx target,
>
>          /* Inform later passes that the old value is dead. */
>          if (!cleared && !vector && REG_P (target))
> -          emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +          emit_move_insn (target, CONST0_RTX (mode));
>
>          if (MEM_P (target))
>            alias = MEM_ALIAS_SET (target);
> @@ -6733,8 +6751,7 @@ store_constructor (tree exp, rtx target,
>
>          if (vector)
>            emit_insn (GEN_FCN (icode) (target,
> -                                      gen_rtx_PARALLEL (GET_MODE (target),
> -                                                        vector)));
> +                                      gen_rtx_PARALLEL (mode, vector)));
>          break;
>        }
>
> @@ -9563,6 +9580,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
>
> +    case VEC_DUPLICATE_EXPR:
> +      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> +      target = expand_vector_broadcast (mode, op0);
> +      gcc_assert (target);
> +      return target;
> +
>      case BIT_INSERT_EXPR:
>        {
>          unsigned bitpos = tree_to_uhwi (treeop2);
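[Illustration, not part of the patch.] The store_constructor hunk quoted above tries a single broadcast when the constructor is uniform and the target provides vec_duplicate, and otherwise falls through to the existing element-wise path. A minimal stand-alone sketch of that decision, assuming plain int elements, might look as follows. The names uniform_element, InitStrategy and choose_init_strategy are hypothetical, and the bool parameter only stands in for the optab_handler check; none of this is GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for uniform_vector_p: if every element of ELTS is
// equal, store the common element in *VALUE and return true.
bool uniform_element(const std::vector<int>& elts, int* value) {
    assert(!elts.empty() && value != nullptr);
    for (std::size_t i = 1; i < elts.size(); ++i)
        if (elts[i] != elts[0])
            return false;
    *value = elts[0];
    return true;
}

// Hypothetical labels for the two ways of initializing a vector.
enum class InitStrategy { Broadcast, ElementWise };

// Pick a broadcast only when the target claims to support one (an assumed
// flag standing in for the vec_duplicate_optab handler check) and the
// elements are uniform; otherwise fall back to element-wise initialization.
InitStrategy choose_init_strategy(const std::vector<int>& elts,
                                  bool target_has_vec_duplicate) {
    int common = 0;
    if (target_has_vec_duplicate && uniform_element(elts, &common))
        return InitStrategy::Broadcast;
    return InitStrategy::ElementWise;
}
```

With this model, choose_init_strategy({7, 7}, true) picks the broadcast, while a non-uniform input or a target without the pattern takes the element-wise path.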
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>> isn't needed with the new VECTOR_CST layout. It's really just the
>> original patch with bits removed, but just in case:
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install?
>
> To keep things simple at this point OK. Note that I'd eventually
> like to see this as VEC_PERM_EXPR <scalar_type_1, scalar_type_1, { 0, ... }>.
> For reductions when we need { x, 0, ... } we now have to use a
> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
> to merge it with {0, ... }, right? Rather than VEC_PERM_EXPR <x_1, 0,
> { 0, 1, 1, 1.... }>

That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
comes in. But yeah, allowing scalars as operands to VEC_PERM_EXPRs
would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
I guess the question is whether that's better than extending CONSTRUCTOR
(or a replacement) to use the VECTOR_CST encoding. I realise you don't
like CONSTRUCTOR in gimple though...

I promise to look at either of those for GCC 9 if you think they're
better, but they'll be more invasive for other targets.

Thanks,
Richard
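[Illustration, not part of the thread.] The idea of letting VEC_PERM_EXPR take scalar operands can be modelled outside GCC with a toy permute over std::vector<int>, where a scalar is treated as a one-element vector. The sketch below is only a model under that assumption: permute, duplicate, insert_first and shl_insert are hypothetical names, and shl_insert assumes that shift-left-and-insert moves each element one lane away from element 0 and writes the scalar into element 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a two-input permute: element i of the result is taken from
// the concatenation of A and B at position SEL[i].  A scalar operand is
// modelled as a one-element vector (the extension discussed in the thread).
std::vector<int> permute(const std::vector<int>& a, const std::vector<int>& b,
                         const std::vector<std::size_t>& sel) {
    std::vector<int> result;
    result.reserve(sel.size());
    for (std::size_t idx : sel) {
        assert(idx < a.size() + b.size());
        result.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
    }
    return result;
}

// Broadcast X to N lanes: the selector is { 0, 0, ... }.
std::vector<int> duplicate(int x, std::size_t n) {
    return permute({x}, {x}, std::vector<std::size_t>(n, 0));
}

// Build { x, 0, 0, ... } in one step: selector { 0, 1, 1, ... } over <x, 0>.
std::vector<int> insert_first(int x, std::size_t n) {
    assert(n >= 1);
    std::vector<std::size_t> sel(n, 1);
    sel[0] = 0;
    return permute({x}, {0}, sel);
}

// Assumed model of shift-left-and-insert: move every element one lane away
// from element 0 and write X into element 0.
std::vector<int> shl_insert(const std::vector<int>& v, int x) {
    assert(!v.empty());
    std::vector<int> result(v.size());
    result[0] = x;
    for (std::size_t i = 1; i < v.size(); ++i)
        result[i] = v[i - 1];
    return result;
}
```

In this model duplicate(5, 4) and insert_first(7, 4) are both single permutes, and shl_insert applied to an all-zero vector gives the same { x, 0, ... } result as insert_first, which is the overlap the exchange above points at.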
On Fri, Dec 15, 2017 at 1:52 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>>> isn't needed with the new VECTOR_CST layout. It's really just the
>>> original patch with bits removed, but just in case:
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install?
>>
>> To keep things simple at this point OK. Note that I'd eventually
>> like to see this as VEC_PERM_EXPR <scalar_type_1, scalar_type_1, { 0, ... }>.
>> For reductions when we need { x, 0, ... } we now have to use a
>> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
>> to merge it with {0, ... }, right? Rather than VEC_PERM_EXPR <x_1, 0,
>> { 0, 1, 1, 1.... }>
>
> That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
> comes in. But yeah, allowing scalars as operands to VEC_PERM_EXPRs
> would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
> I guess the question is whether that's better than extending CONSTRUCTOR
> (or a replacement) to use the VECTOR_CST encoding. I realise you don't
> like CONSTRUCTOR in gimple though...
>
> I promise to look at either of those for GCC 9 if you think they're
> better, but they'll be more invasive for other targets.

Thanks.
Richard.

> Thanks,
> Richard
Index: gcc/doc/generic.texi =================================================================== --- gcc/doc/generic.texi 2017-10-23 11:38:53.934094740 +0100 +++ gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100 @@ -1036,6 +1036,7 @@ As this example indicates, the operands @tindex FIXED_CST @tindex COMPLEX_CST @tindex VECTOR_CST +@tindex VEC_DUPLICATE_CST @tindex STRING_CST @findex TREE_STRING_LENGTH @findex TREE_STRING_POINTER @@ -1089,6 +1090,14 @@ constant nodes. Each individual constan double constant node. The first operand is a @code{TREE_LIST} of the constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}. +@item VEC_DUPLICATE_CST +These nodes represent a vector constant in which every element has the +same scalar value. At present only variable-length vectors use +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST} +instead. The scalar element value is given by +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the +element of a @code{VECTOR_CST}. + @item STRING_CST These nodes represent string-constants. The @code{TREE_STRING_LENGTH} returns the length of the string, as an @code{int}. The @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind} @node Vectors @subsection Vectors +@tindex VEC_DUPLICATE_EXPR @tindex VEC_LSHIFT_EXPR @tindex VEC_RSHIFT_EXPR @tindex VEC_WIDEN_MULT_HI_EXPR @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind} @tindex VEC_PACK_TRUNC_EXPR @tindex VEC_PACK_SAT_EXPR @tindex VEC_PACK_FIX_TRUNC_EXPR +@tindex VEC_COND_EXPR @tindex SAD_EXPR @table @code +@item VEC_DUPLICATE_EXPR +This node has a single operand and represents a vector in which every +element is equal to that operand. + @item VEC_LSHIFT_EXPR @itemx VEC_RSHIFT_EXPR These nodes represent whole vector left and right shifts, respectively. 
Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-10-23 11:41:22.189466342 +0100 +++ gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100 @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val the vector mode @var{m}, or a vector mode with the same element mode and smaller number of elements. +@cindex @code{vec_duplicate@var{m}} instruction pattern +@item @samp{vec_duplicate@var{m}} +Initialize vector output operand 0 so that each element has the value given +by scalar input operand 1. The vector has mode @var{m} and the scalar has +the mode appropriate for one element of @var{m}. + +This pattern only handles duplicates of non-constant inputs. Constant +vectors go through the @code{mov@var{m}} pattern instead. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern @item @samp{vec_cmp@var{m}@var{n}} Output a vector comparison. Operand 0 of mode @var{n} is the destination for Index: gcc/tree.def =================================================================== --- gcc/tree.def 2017-10-23 11:38:53.934094740 +0100 +++ gcc/tree.def 2017-10-23 11:41:51.774917721 +0100 @@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst", /* Contents are in VECTOR_CST_ELTS field. */ DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0) +/* Represents a vector constant in which every element is equal to + VEC_DUPLICATE_CST_ELT. */ +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0) + /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */ DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0) @@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr", 1 and 2 are NULL. The operands are then taken from the cfg edges. */ DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) +/* Represents a vector in which every element is equal to operand 0. 
*/ +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1) + /* Vector conditional expression. It is like COND_EXPR, but with vector operands. Index: gcc/tree-core.h =================================================================== --- gcc/tree-core.h 2017-10-23 11:41:25.862065318 +0100 +++ gcc/tree-core.h 2017-10-23 11:41:51.771059237 +0100 @@ -975,7 +975,8 @@ struct GTY(()) tree_base { /* VEC length. This field is only used with TREE_VEC. */ int length; - /* Number of elements. This field is only used with VECTOR_CST. */ + /* Number of elements. This field is only used with VECTOR_CST + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */ unsigned int nelts; /* SSA version number. This field is only used with SSA_NAME. */ @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base { public_flag: TREE_OVERFLOW in - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST TREE_PUBLIC in VAR_DECL, FUNCTION_DECL @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex { struct GTY(()) tree_vector { struct tree_typed typed; - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1]; + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1]; }; struct GTY(()) tree_identifier { Index: gcc/tree.h =================================================================== --- gcc/tree.h 2017-10-23 11:41:23.517482774 +0100 +++ gcc/tree.h 2017-10-23 11:41:51.775882341 +0100 @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \ #define TYPE_REF_CAN_ALIAS_ALL(NODE) \ (PTR_OR_REF_CHECK (NODE)->base.static_flag) -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means - there was an overflow in folding. */ +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST, + this means there was an overflow in folding. 
*/ #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag) @@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts) #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX]) +/* In a VEC_DUPLICATE_CST node. */ +#define VEC_DUPLICATE_CST_ELT(NODE) \ + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0]) + /* Define fields and accessors for some special-purpose tree nodes. */ #define IDENTIFIER_LENGTH(NODE) \ @@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst); extern tree build_int_cst_type (tree, HOST_WIDE_INT); extern tree make_vector (unsigned CXX_MEM_STAT_INFO); +extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO); extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO); extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *); extern tree build_vector_from_val (tree, tree); Index: gcc/tree.c =================================================================== --- gcc/tree.c 2017-10-23 11:41:23.515548300 +0100 +++ gcc/tree.c 2017-10-23 11:41:51.774917721 +0100 @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_ case FIXED_CST: return TS_FIXED_CST; case COMPLEX_CST: return TS_COMPLEX; case VECTOR_CST: return TS_VECTOR; + case VEC_DUPLICATE_CST: return TS_VECTOR; case STRING_CST: return TS_STRING; /* tcc_exceptional cases. 
*/ case ERROR_MARK: return TS_COMMON; @@ -816,6 +817,7 @@ tree_code_size (enum tree_code code) case FIXED_CST: return sizeof (struct tree_fixed_cst); case COMPLEX_CST: return sizeof (struct tree_complex); case VECTOR_CST: return sizeof (struct tree_vector); + case VEC_DUPLICATE_CST: return sizeof (struct tree_vector); case STRING_CST: gcc_unreachable (); default: return lang_hooks.tree_size (code); @@ -875,6 +877,9 @@ tree_size (const_tree node) return (sizeof (struct tree_vector) + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree)); + case VEC_DUPLICATE_CST: + return sizeof (struct tree_vector); + case STRING_CST: return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1; @@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x) && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x))); } +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP. + + Note that this function is only suitable for callers that specifically + need a VEC_DUPLICATE_CST node. Use build_vector_from_val to duplicate + a general scalar into a general vector type. */ + +tree +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL) +{ + int length = sizeof (struct tree_vector); + + record_node_allocation_statistics (VEC_DUPLICATE_CST, length); + + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT); + + TREE_SET_CODE (t, VEC_DUPLICATE_CST); + TREE_TYPE (t) = type; + t->base.u.nelts = 1; + VEC_DUPLICATE_CST_ELT (t) = exp; + TREE_CONSTANT (t) = 1; + + return t; +} + /* Build a newly constructed VECTOR_CST node of length LEN. 
*/ tree @@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2369,6 +2400,8 @@ integer_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return integer_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr) return 1; } + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST) + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr)); + else if (TREE_CODE (expr) != INTEGER_CST) return 0; @@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr) int integer_truep (const_tree expr) { - if (TREE_CODE (expr) == VECTOR_CST) + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST) return integer_all_onesp (expr); return integer_onep (expr); } @@ -2634,6 +2670,8 @@ real_zerop (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return real_zerop (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2662,6 +2700,8 @@ real_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return real_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr) return false; return true; } + case VEC_DUPLICATE_CST: + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr)); default: return false; } @@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags); return; } + case VEC_DUPLICATE_CST: + inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate); + return; case SSA_NAME: /* We can just compare by pointer. 
*/ hstate.add_wide_int (SSA_NAME_VERSION (t)); @@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init) return true; } + case VEC_DUPLICATE_CST: + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init)); + case CONSTRUCTOR: { unsigned HOST_WIDE_INT idx; @@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec) gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec))); - if (TREE_CODE (vec) == VECTOR_CST) + if (TREE_CODE (vec) == VEC_DUPLICATE_CST) + return VEC_DUPLICATE_CST_ELT (vec); + + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR) + return TREE_OPERAND (vec, 0); + + else if (TREE_CODE (vec) == VECTOR_CST) { first = VECTOR_CST_ELT (vec, 0); for (i = 1; i < VECTOR_CST_NELTS (vec); ++i) @@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE) \ case REAL_CST: case FIXED_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case BLOCK: case PLACEHOLDER_EXPR: @@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t) elt = drop_tree_overflow (elt); } } + if (TREE_CODE (t) == VEC_DUPLICATE_CST) + { + tree *elt = &VEC_DUPLICATE_CST_ELT (t); + if (TREE_OVERFLOW (*elt)) + *elt = drop_tree_overflow (*elt); + } return t; } @@ -13798,6 +13859,92 @@ test_integer_constants () ASSERT_EQ (type, TREE_TYPE (zero)); } +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs + for integral type TYPE. 
*/ + +static void +test_vec_duplicate_predicates_int (tree type) +{ + tree vec_type = build_vector_type (type, 4); + + tree zero = build_zero_cst (type); + tree vec_zero = build_vec_duplicate_cst (vec_type, zero); + ASSERT_TRUE (integer_zerop (vec_zero)); + ASSERT_FALSE (integer_onep (vec_zero)); + ASSERT_FALSE (integer_minus_onep (vec_zero)); + ASSERT_FALSE (integer_all_onesp (vec_zero)); + ASSERT_FALSE (integer_truep (vec_zero)); + ASSERT_TRUE (initializer_zerop (vec_zero)); + + tree one = build_one_cst (type); + tree vec_one = build_vec_duplicate_cst (vec_type, one); + ASSERT_FALSE (integer_zerop (vec_one)); + ASSERT_TRUE (integer_onep (vec_one)); + ASSERT_FALSE (integer_minus_onep (vec_one)); + ASSERT_FALSE (integer_all_onesp (vec_one)); + ASSERT_FALSE (integer_truep (vec_one)); + ASSERT_FALSE (initializer_zerop (vec_one)); + + tree minus_one = build_minus_one_cst (type); + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one); + ASSERT_FALSE (integer_zerop (vec_minus_one)); + ASSERT_FALSE (integer_onep (vec_minus_one)); + ASSERT_TRUE (integer_minus_onep (vec_minus_one)); + ASSERT_TRUE (integer_all_onesp (vec_minus_one)); + ASSERT_TRUE (integer_truep (vec_minus_one)); + ASSERT_FALSE (initializer_zerop (vec_minus_one)); + + tree x = create_tmp_var_raw (type, "x"); + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x); + ASSERT_EQ (uniform_vector_p (vec_zero), zero); + ASSERT_EQ (uniform_vector_p (vec_one), one); + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); + ASSERT_EQ (uniform_vector_p (vec_x), x); +} + +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point + type TYPE. 
*/ + +static void +test_vec_duplicate_predicates_float (tree type) +{ + tree vec_type = build_vector_type (type, 4); + + tree zero = build_zero_cst (type); + tree vec_zero = build_vec_duplicate_cst (vec_type, zero); + ASSERT_TRUE (real_zerop (vec_zero)); + ASSERT_FALSE (real_onep (vec_zero)); + ASSERT_FALSE (real_minus_onep (vec_zero)); + ASSERT_TRUE (initializer_zerop (vec_zero)); + + tree one = build_one_cst (type); + tree vec_one = build_vec_duplicate_cst (vec_type, one); + ASSERT_FALSE (real_zerop (vec_one)); + ASSERT_TRUE (real_onep (vec_one)); + ASSERT_FALSE (real_minus_onep (vec_one)); + ASSERT_FALSE (initializer_zerop (vec_one)); + + tree minus_one = build_minus_one_cst (type); + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one); + ASSERT_FALSE (real_zerop (vec_minus_one)); + ASSERT_FALSE (real_onep (vec_minus_one)); + ASSERT_TRUE (real_minus_onep (vec_minus_one)); + ASSERT_FALSE (initializer_zerop (vec_minus_one)); + + ASSERT_EQ (uniform_vector_p (vec_zero), zero); + ASSERT_EQ (uniform_vector_p (vec_one), one); + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one); +} + +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */ + +static void +test_vec_duplicate_predicates () +{ + test_vec_duplicate_predicates_int (integer_type_node); + test_vec_duplicate_predicates_float (float_type_node); +} + /* Verify identifiers. */ static void @@ -13826,6 +13973,7 @@ test_labels () tree_c_tests () { test_integer_constants (); + test_vec_duplicate_predicates (); test_identifiers (); test_labels (); } Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2017-10-23 11:41:23.137358624 +0100 +++ gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100 @@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp) case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: case VEC_PERM_EXPR: + case VEC_DUPLICATE_CST: + case VEC_DUPLICATE_EXPR: return NULL; /* Misc codes. 
*/ Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2017-10-23 11:38:53.934094740 +0100 +++ gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100 @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t } break; + case VEC_DUPLICATE_CST: + pp_string (pp, "{ "); + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false); + pp_string (pp, ", ... }"); + break; + case FUNCTION_TYPE: case METHOD_TYPE: dump_generic_node (pp, TREE_TYPE (node), spc, flags, false); @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; + case VEC_DUPLICATE_EXPR: + pp_space (pp); + for (str = get_tree_code_name (code); *str; str++) + pp_character (pp, TOUPPER (*str)); + pp_string (pp, " < "); + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (pp, " > "); + break; + case VEC_UNPACK_HI_EXPR: pp_string (pp, " VEC_UNPACK_HI_EXPR < "); dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); Index: gcc/dwarf2out.c =================================================================== --- gcc/dwarf2out.c 2017-10-23 11:41:24.407340836 +0100 +++ gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100 @@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type) switch (TREE_CODE (init)) { case VECTOR_CST: + case VEC_DUPLICATE_CST: break; case CONSTRUCTOR: if (TREE_CONSTANT (init)) Index: gcc/gimple-expr.h =================================================================== --- gcc/gimple-expr.h 2017-10-23 11:38:53.934094740 +0100 +++ gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100 @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t) case FIXED_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: return true; Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c 2017-10-23 11:41:25.531270256 +0100 +++ gcc/gimplify.c 2017-10-23 
11:41:51.766236132 +0100 @@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq case STRING_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: /* Drop the overflow flag on constants, we do not want that in the GIMPLE IL. */ if (TREE_OVERFLOW_P (*expr_p)) Index: gcc/graphite-isl-ast-to-gimple.c =================================================================== --- gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:23.205065216 +0100 +++ gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:51.767200753 +0100 @@ -222,7 +222,8 @@ enum phi_node_kind return TREE_CODE (op) == INTEGER_CST || TREE_CODE (op) == REAL_CST || TREE_CODE (op) == COMPLEX_CST - || TREE_CODE (op) == VECTOR_CST; + || TREE_CODE (op) == VECTOR_CST + || TREE_CODE (op) == VEC_DUPLICATE_CST; } private: Index: gcc/graphite-scop-detection.c =================================================================== --- gcc/graphite-scop-detection.c 2017-10-23 11:41:25.533204730 +0100 +++ gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100 @@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre case REAL_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: break; default: Index: gcc/ipa-icf-gimple.c =================================================================== --- gcc/ipa-icf-gimple.c 2017-10-23 11:38:53.934094740 +0100 +++ gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100 @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case REAL_CST: { @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1, case INTEGER_CST: case COMPLEX_CST: case VECTOR_CST: + case VEC_DUPLICATE_CST: case STRING_CST: case REAL_CST: case FUNCTION_DECL: Index: gcc/ipa-icf.c =================================================================== --- gcc/ipa-icf.c 2017-10-23 11:41:25.874639400 +0100 +++ gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100 @@ -1478,6 +1478,7 @@ 
sem_item::add_expr (const_tree exp, inch
     case STRING_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       inchash::add_expr (exp, hstate);
       break;
     case CONSTRUCTOR:
@@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2)

         return 1;
       }
+    case VEC_DUPLICATE_CST:
+      return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
+                                   VEC_DUPLICATE_CST_ELT (t2));
     case ARRAY_REF:
     case ARRAY_RANGE_REF:
       {
Index: gcc/match.pd
===================================================================
--- gcc/match.pd 2017-10-23 11:38:53.934094740 +0100
+++ gcc/match.pd 2017-10-23 11:41:51.768165374 +0100
@@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match negate_expr_p
  VECTOR_CST
  (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+(match negate_expr_p
+ VEC_DUPLICATE_CST
+ (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))

 /* (-A) * (-B) -> A * B  */
 (simplify
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
@@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
             }
           break;

+        case VEC_DUPLICATE_CST:
+          print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
+          break;
+
         case COMPLEX_CST:
           print_node (file, "real", TREE_REALPART (node), indent + 4);
           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-chkp.c
===================================================================
--- gcc/tree-chkp.c 2017-10-23 11:41:23.201196268 +0100
+++ gcc/tree-chkp.c 2017-10-23 11:41:51.770094616 +0100
@@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       if (integer_zerop (ptr_src))
         bounds = chkp_get_none_bounds ();
       else
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c 2017-10-23 11:41:23.228278904 +0100
+++ gcc/tree-loop-distribution.c 2017-10-23 11:41:51.771059237 +0100
@@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val)
           && CONSTRUCTOR_NELTS (val) == 0))
     return 0;

+  if (TREE_CODE (val) == VEC_DUPLICATE_CST)
+    return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
+
   if (real_zerop (val))
     {
       /* Only return 0 for +0.0, not for -0.0, which doesn't have
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
@@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
         case STRING_CST:
         case RESULT_DECL:
         case VECTOR_CST:
+        case VEC_DUPLICATE_CST:
         case COMPLEX_CST:
         case INTEGER_CST:
         case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c 2017-10-23 11:41:25.549647760 +0100
+++ gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
@@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case REAL_CST:
     case CONSTRUCTOR:
     case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
@@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v
         case INTEGER_CST:
         case COMPLEX_CST:
         case VECTOR_CST:
+        case VEC_DUPLICATE_CST:
         case REAL_CST:
         case FIXED_CST:
         case CONSTRUCTOR:
@@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
         case INTEGER_CST:
         case COMPLEX_CST:
         case VECTOR_CST:
+        case VEC_DUPLICATE_CST:
         case REAL_CST:
         case CONSTRUCTOR:
         case CONST_DECL:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);
   if (TREE_CODE (op) == SSA_NAME)
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-10-23 11:41:25.822408600 +0100
+++ gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
@@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
     CASE_CONVERT:
       return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;

+    case VEC_DUPLICATE_CST:
+      return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
+
     default:
       /* A language specific constant.  Just hash the code.  */
       return code;
@@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
         return 1;
       }

+    case VEC_DUPLICATE_CST:
+      return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
+                               VEC_DUPLICATE_CST_ELT (t2));
+
     case CONSTRUCTOR:
       {
         vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-10-23 11:41:23.535860278 +0100
+++ gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
@@ -418,6 +418,9 @@ negate_expr_p (tree t)
         return true;
       }

+    case VEC_DUPLICATE_CST:
+      return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+
     case COMPLEX_EXPR:
       return negate_expr_p (TREE_OPERAND (t, 0))
              && negate_expr_p (TREE_OPERAND (t, 1));
@@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
         return build_vector (type, elts);
       }

+    case VEC_DUPLICATE_CST:
+      {
+        tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
+        if (!sub)
+          return NULL_TREE;
+        return build_vector_from_val (type, sub);
+      }
+
     case COMPLEX_EXPR:
       if (negate_expr_p (t))
         return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
       return build_vector (type, elts);
     }

+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
+                              VEC_DUPLICATE_CST_ELT (arg2));
+      if (!sub)
+        return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
+
   /* Shifts allow a scalar offset for a vector.  */
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == INTEGER_CST)
@@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a

       return build_vector (type, elts);
     }
+
+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == INTEGER_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
+      if (!sub)
+        return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
   return NULL_TREE;
 }

@@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
           if (i == count)
             return build_vector (type, elements);
         }
+      else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
+        {
+          tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
+                                 VEC_DUPLICATE_CST_ELT (arg0));
+          if (sub)
+            return build_vector_from_val (type, sub);
+        }
       break;

     case TRUTH_NOT_EXPR:
@@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
         return res;
       }

+    case VEC_DUPLICATE_EXPR:
+      if (CONSTANT_CLASS_P (arg0))
+        return build_vector_from_val (type, arg0);
+      return NULL_TREE;
+
     default:
       break;
     }
@@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
           }
         return build_vector (type, v);
       }
+      if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+          && (TYPE_VECTOR_SUBPARTS (type)
+              == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+        {
+          tree sub = fold_convert_const (code, TREE_TYPE (type),
+                                         VEC_DUPLICATE_CST_ELT (arg1));
+          if (sub)
+            return build_vector_from_val (type, sub);
+        }
     }
   return NULL_TREE;
 }
@@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
           return 1;
         }

+      case VEC_DUPLICATE_CST:
+        return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
+                                VEC_DUPLICATE_CST_ELT (arg1), flags);
+
       case COMPLEX_CST:
         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
                                  flags)
@@ -7492,6 +7547,20 @@
can_native_interpret_type_p (tree type)
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
+  /* Recurse on duplicated vectors if the target type is also a vector
+     and if the elements line up.  */
+  tree expr_type = TREE_TYPE (expr);
+  if (TREE_CODE (expr) == VEC_DUPLICATE_CST
+      && VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+      && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
+    {
+      tree sub = fold_view_convert_expr (TREE_TYPE (type),
+                                         VEC_DUPLICATE_CST_ELT (expr));
+      if (sub)
+        return build_vector_from_val (type, sub);
+    }
+
   /* We support up to 512-bit values (for V8DFmode).  */
   unsigned char buffer[64];
   int len;
@@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst)
         return build_vector (type, elts);
       }

+    case VEC_DUPLICATE_CST:
+      {
+        tree sub = exact_inverse (TREE_TYPE (type),
+                                  VEC_DUPLICATE_CST_ELT (cst));
+        if (!sub)
+          return NULL_TREE;
+        return build_vector_from_val (type, sub);
+      }
+
     default:
       return NULL_TREE;
     }
@@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str
           for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
             fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
           break;
+        case VEC_DUPLICATE_CST:
+          fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
+          break;
         default:
           break;
         }
@@ -14436,6 +14517,36 @@ test_vector_folding ()
   ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
 }

+/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_folding ()
+{
+  tree type = build_vector_type (ssizetype, 4);
+  tree dup5 = build_vec_duplicate_cst (type, ssize_int (5));
+  tree dup3 = build_vec_duplicate_cst (type, ssize_int (3));
+
+  tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
+
+  tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
+
+  tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
+  ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
+
+  tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
+  ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
+
+  tree size_vector = build_vector_type (sizetype, 4);
+  tree size_dup5 = fold_convert (size_vector, dup5);
+  ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
+
+  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
 /* Run all of the selftests within this file.  */

 void
@@ -14443,6 +14554,7 @@ fold_const_c_tests ()
 {
   test_arithmetic_folding ();
   test_vector_folding ();
+  test_vec_duplicate_folding ();
 }

 } // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
@@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I

 OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
@@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
       return TYPE_UNSIGNED (type) ?
         vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;

+    case VEC_DUPLICATE_EXPR:
+      return vec_duplicate_optab;
+
     default:
       break;
     }
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
@@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
                                   enum optab_methods methods);
 extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
                                 enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);

 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100
+++ gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
    mode of OP must be the element mode of VMODE.  If OP is a constant,
    then the return value will be a constant.
*/

-static rtx
+rtx
 expand_vector_broadcast (machine_mode vmode, rtx op)
 {
   enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
   if (CONSTANT_P (op))
     return gen_const_vec_duplicate (vmode, op);

+  icode = optab_handler (vec_duplicate_optab, vmode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[2];
+      create_output_operand (&ops[0], NULL_RTX, vmode);
+      create_input_operand (&ops[1], op, GET_MODE (op));
+      expand_insn (icode, 2, ops);
+      return ops[0].value;
+    }
+
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:41:39.187050437 +0100
+++ gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
@@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target,
         constructor_elt *ce;
         int i;
         int need_to_clear;
-        int icode = CODE_FOR_nothing;
+        insn_code icode = CODE_FOR_nothing;
+        tree elt;
         tree elttype = TREE_TYPE (type);
         int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
         machine_mode eltmode = TYPE_MODE (elttype);
@@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target,
         unsigned n_elts;
         alias_set_type alias;
         bool vec_vec_init_p = false;
+        machine_mode mode = GET_MODE (target);

         gcc_assert (eltmode != BLKmode);

+        /* Try using vec_duplicate_optab for uniform vectors.  */
+        if (!TREE_SIDE_EFFECTS (exp)
+            && VECTOR_MODE_P (mode)
+            && eltmode == GET_MODE_INNER (mode)
+            && ((icode = optab_handler (vec_duplicate_optab, mode))
+                != CODE_FOR_nothing)
+            && (elt = uniform_vector_p (exp)))
+          {
+            struct expand_operand ops[2];
+            create_output_operand (&ops[0], target, mode);
+            create_input_operand (&ops[1], expand_normal (elt), eltmode);
+            expand_insn (icode, 2, ops);
+            if (!rtx_equal_p (target, ops[0].value))
+              emit_move_insn (target, ops[0].value);
+            break;
+          }
+
         n_elts = TYPE_VECTOR_SUBPARTS (type);
-        if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+        if (REG_P (target) && VECTOR_MODE_P (mode))
           {
-            machine_mode mode = GET_MODE (target);
             machine_mode emode = eltmode;

             if (CONSTRUCTOR_NELTS (exp)
@@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target,
                             == n_elts);
                 emode = TYPE_MODE (etype);
               }
-            icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+            icode = convert_optab_handler (vec_init_optab, mode, emode);
             if (icode != CODE_FOR_nothing)
               {
                 unsigned int i, n = n_elts;
@@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target,
         if (need_to_clear && size > 0 && !vector)
           {
             if (REG_P (target))
-              emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+              emit_move_insn (target, CONST0_RTX (mode));
             else
               clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
             cleared = 1;
@@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target,

         /* Inform later passes that the old value is dead.  */
         if (!cleared && !vector && REG_P (target))
-          emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+          emit_move_insn (target, CONST0_RTX (mode));

         if (MEM_P (target))
           alias = MEM_ALIAS_SET (target);
@@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target,

         if (vector)
           emit_insn (GEN_FCN (icode) (target,
-                                      gen_rtx_PARALLEL (GET_MODE (target),
-                                                        vector)));
+                                      gen_rtx_PARALLEL (mode, vector)));
         break;
       }

@@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r
 }


+/* Expand constant vector element ELT, which has mode MODE.  This is used
+   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
+
+static rtx
+const_vector_element (scalar_mode mode, const_tree elt)
+{
+  if (TREE_CODE (elt) == REAL_CST)
+    return const_double_from_real_value (TREE_REAL_CST (elt), mode);
+  if (TREE_CODE (elt) == FIXED_CST)
+    return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+  return immed_wide_int_const (wi::to_wide (elt), mode);
+}
+
 /* Return a MEM that contains constant EXP.  DEFER is as for
    output_constant_def and MODIFIER is as for expand_expr.  */

@@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;

+    case VEC_DUPLICATE_EXPR:
+      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+      target = expand_vector_broadcast (mode, op0);
+      gcc_assert (target);
+      return target;
+
     case BIT_INSERT_EXPR:
       {
         unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target
                             tmode, modifier);
       }

+    case VEC_DUPLICATE_CST:
+      op0 = const_vector_element (GET_MODE_INNER (mode),
+                                  VEC_DUPLICATE_CST_ELT (exp));
+      return gen_const_vec_duplicate (mode, op0);
+
     case CONST_DECL:
       if (modifier == EXPAND_WRITE)
         {
@@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp)
 {
   rtvec v;
   unsigned i, units;
-  tree elt;
-  machine_mode inner, mode;
+  machine_mode mode;

   mode = TYPE_MODE (TREE_TYPE (exp));

@@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp)
     return const_vector_mask_from_tree (exp);

   units = VECTOR_CST_NELTS (exp);
-  inner = GET_MODE_INNER (mode);

   v = rtvec_alloc (units);

   for (i = 0; i < units; ++i)
-    {
-      elt = VECTOR_CST_ELT (exp, i);
-
-      if (TREE_CODE (elt) == REAL_CST)
-        RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
-                                                         inner);
-      else if (TREE_CODE (elt) == FIXED_CST)
-        RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
-                                                         inner);
-      else
-        RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
-    }
+    RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
+                                             VECTOR_CST_ELT (exp, i));

   return gen_rtx_CONST_VECTOR (mode, v);
 }
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c 2017-10-23 11:41:23.529089619 +0100
+++ gcc/internal-fn.c 2017-10-23 11:41:51.767200753 +0100
@@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
           emit_move_insn (cntvar, const0_rtx);
           emit_label (loop_lab);
         }
-      if (TREE_CODE (arg0) != VECTOR_CST)
+      if (!CONSTANT_CLASS_P (arg0))
         {
           rtx arg0r = expand_normal (arg0);
           arg0 = make_tree (TREE_TYPE (arg0), arg0r);
         }
-      if (TREE_CODE (arg1) != VECTOR_CST)
+      if (!CONSTANT_CLASS_P (arg1))
         {
           rtx arg1r = expand_normal (arg1);
           arg1 = make_tree (TREE_TYPE (arg1), arg1r);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c 2017-10-23 11:41:25.864967029 +0100
+++ gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
@@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm
     case CONJ_EXPR:
       break;

+    case VEC_DUPLICATE_EXPR:
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+          || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+        {
+          error ("vec_duplicate should be from a scalar to a like vector");
+          debug_generic_expr (lhs_type);
+          debug_generic_expr (rhs1_type);
+          return true;
+        }
+      return false;
+
     default:
       gcc_unreachable ();
     }
@@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st
     case FIXED_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
       return res;

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c 2017-10-23 11:41:25.833048208 +0100
+++ gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
@@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_DUPLICATE_EXPR:
       return 1;