Message ID | 87lgiblc9q.fsf@linaro.org
---|---
State | New |
Series | Make VEC_PERM_EXPR work for variable-length vectors
On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > This patch changes vec_perm_indices from a plain vec<> to a class > that stores a canonicalised permutation, using the same encoding > as for VECTOR_CSTs. This means that vec_perm_indices now carries > information about the number of vectors being permuted (currently > always 1 or 2) and the number of elements in each input vector. Before I dive into the C++ details can you explain why it needs this info and how it encodes it for variable-length vectors? To interleave two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we can directly encode N here, can we? extract even/odd should just work as { 0, 2, 4, 6, ...} without knowledge of whether we permute one or two vectors (the one vector case just has two times the same vector) or how many elements each of the vectors (or the result) has. Richard. > A new vec_perm_builder class is used to actually build up the vector, > like tree_vector_builder does for trees. vec_perm_indices is the > completed representation, a bit like VECTOR_CST is for trees. > > The patch just does a mechanical conversion of the code to > vec_perm_builder: a later patch uses explicit encodings where possible. > > The point of all this is that it makes the representation suitable > for variable-length vectors. It's no longer necessary for the > underlying vec<>s to store every element explicitly. > > In int-vector-builder.h, "using the same encoding as tree and rtx constants" > describes the endpoint -- adding the rtx encoding comes later. > > > 2017-12-09 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * int-vector-builder.h: New file. > * vec-perm-indices.h: Include int-vector-builder.h. > (vec_perm_indices): Redefine as an int_vector_builder. > (auto_vec_perm_indices): Delete. > (vec_perm_builder): Redefine as a stand-alone class. > (vec_perm_indices::vec_perm_indices): New function. > (vec_perm_indices::clamp): Likewise. > * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h. > (vec_perm_indices::new_vector): New function. > (vec_perm_indices::new_expanded_vector): Update for new > vec_perm_indices class. > (vec_perm_indices::rotate_inputs): New function. > (vec_perm_indices::all_in_range_p): Operate directly on the > encoded form, without computing elided elements. > (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST > encoding. Update for new vec_perm_indices class. > * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for > the given vec_perm_builder. > (expand_vec_perm_var): Update vec_perm_builder constructor. > (expand_mult_highpart): Use vec_perm_builder instead of > auto_vec_perm_indices. > * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and > vec_perm_indices instead of auto_vec_perm_indices. Use a single > or double series encoding as appropriate. > * fold-const.c (fold_ternary_loc): Use vec_perm_builder and > vec_perm_indices instead of auto_vec_perm_indices. > * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. > * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. > (vect_permute_store_chain): Likewise. > (vect_grouped_load_supported): Likewise. > (vect_permute_load_chain): Likewise. > (vect_shift_permute_load_chain): Likewise. > * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. > (vect_transform_slp_perm_load): Likewise. > (vect_schedule_slp_instance): Likewise. > * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. > (vectorizable_mask_load_store): Likewise. 
> (vectorizable_bswap): Likewise. > (vectorizable_store): Likewise. > (vectorizable_load): Likewise. > * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and > vec_perm_indices instead of auto_vec_perm_indices. Use > tree_to_vec_perm_builder to read the vector from a tree. > * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a > vec_perm_builder instead of a vec_perm_indices. > (have_whole_vector_shift): Use vec_perm_builder and > vec_perm_indices instead of auto_vec_perm_indices. Leave the > truncation to calc_vec_perm_mask_for_shift. > (vect_create_epilog_for_reduction): Likewise. > * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change > from auto_vec_perm_indices to vec_perm_indices. > (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm > instead of changing individual elements. > (aarch64_vectorize_vec_perm_const): Use new_vector to install > the vector in d.perm. > * config/arm/arm.c (expand_vec_perm_d::perm): Change > from auto_vec_perm_indices to vec_perm_indices. > (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm > instead of changing individual elements. > (arm_vectorize_vec_perm_const): Use new_vector to install > the vector in d.perm. > * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even): > Update vec_perm_builder constructor. > (rs6000_expand_interleave): Likewise. > * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise. > (rs6000_expand_interleave): Likewise. > > Index: gcc/int-vector-builder.h > =================================================================== > --- /dev/null 2017-12-09 13:59:56.352713187 +0000 > +++ gcc/int-vector-builder.h 2017-12-09 22:48:47.545825268 +0000 > @@ -0,0 +1,90 @@ > +/* A class for building vector integer constants. > + Copyright (C) 2017 Free Software Foundation, Inc. > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef GCC_INT_VECTOR_BUILDER_H > +#define GCC_INT_VECTOR_BUILDER_H 1 > + > +#include "vector-builder.h" > + > +/* This class is used to build vectors of integer type T using the same > + encoding as tree and rtx constants. See vector_builder for more > + details. */ > +template<typename T> > +class int_vector_builder : public vector_builder<T, int_vector_builder<T> > > +{ > + typedef vector_builder<T, int_vector_builder> parent; > + friend class vector_builder<T, int_vector_builder>; > + > +public: > + int_vector_builder () {} > + int_vector_builder (unsigned int, unsigned int, unsigned int); > + > + using parent::new_vector; > + > +private: > + bool equal_p (T, T) const; > + bool allow_steps_p () const { return true; } > + bool integral_p (T) const { return true; } > + T step (T, T) const; > + T apply_step (T, unsigned int, T) const; > + bool can_elide_p (T) const { return true; } > + void note_representative (T *, T) {} > +}; > + > +/* Create a new builder for a vector with FULL_NELTS elements. 
> + Initially encode the value as NPATTERNS interleaved patterns with > + NELTS_PER_PATTERN elements each. */ > + > +template<typename T> > +inline > +int_vector_builder<T>::int_vector_builder (unsigned int full_nelts, > + unsigned int npatterns, > + unsigned int nelts_per_pattern) > +{ > + new_vector (full_nelts, npatterns, nelts_per_pattern); > +} > + > +/* Return true if elements ELT1 and ELT2 are equal. */ > + > +template<typename T> > +inline bool > +int_vector_builder<T>::equal_p (T elt1, T elt2) const > +{ > + return elt1 == elt2; > +} > + > +/* Return the value of element ELT2 minus the value of element ELT1. */ > + > +template<typename T> > +inline T > +int_vector_builder<T>::step (T elt1, T elt2) const > +{ > + return elt2 - elt1; > +} > + > +/* Return a vector element with the value BASE + FACTOR * STEP. */ > + > +template<typename T> > +inline T > +int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const > +{ > + return base + factor * step; > +} > + > +#endif > Index: gcc/vec-perm-indices.h > =================================================================== > --- gcc/vec-perm-indices.h 2017-12-09 22:47:27.885318101 +0000 > +++ gcc/vec-perm-indices.h 2017-12-09 22:48:47.548825399 +0000 > @@ -20,30 +20,102 @@ Software Foundation; either version 3, o > #ifndef GCC_VEC_PERN_INDICES_H > #define GCC_VEC_PERN_INDICES_H 1 > > +#include "int-vector-builder.h" > + > +/* A vector_builder for building constant permutation vectors. > + The elements do not need to be clamped to a particular range > + of input elements. */ > +typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder; > + > /* This class represents a constant permutation vector, such as that used > - as the final operand to a VEC_PERM_EXPR. */ > -class vec_perm_indices : public auto_vec<unsigned short, 32> > + as the final operand to a VEC_PERM_EXPR. The vector is canonicalized > + for a particular number of input vectors and for a particular number > + of elements per input. The class copes with cases in which the > + input and output vectors have different numbers of elements. */ > +class vec_perm_indices > { > - typedef unsigned short element_type; > - typedef auto_vec<element_type, 32> parent_type; > + typedef HOST_WIDE_INT element_type; > > public: > - vec_perm_indices () {} > - vec_perm_indices (unsigned int nunits) : parent_type (nunits) {} > + vec_perm_indices (); > + vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int); > > + void new_vector (const vec_perm_builder &, unsigned int, unsigned int); > void new_expanded_vector (const vec_perm_indices &, unsigned int); > + void rotate_inputs (int delta); > + > + /* Return the underlying vector encoding. */ > + const vec_perm_builder &encoding () const { return m_encoding; } > + > + /* Return the number of output elements. This is called length () > + so that we present a more vec-like interface. */ > + unsigned int length () const { return m_encoding.full_nelts (); } > + > + /* Return the number of input vectors being permuted. */ > + unsigned int ninputs () const { return m_ninputs; } > > + /* Return the number of elements in each input vector. */ > + unsigned int nelts_per_input () const { return m_nelts_per_input; } > + > + /* Return the total number of input elements. 
*/ > + unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; } > + > + element_type clamp (element_type) const; > + element_type operator[] (unsigned int i) const; > bool all_in_range_p (element_type, element_type) const; > > private: > vec_perm_indices (const vec_perm_indices &); > -}; > > -/* Temporary. */ > -typedef vec_perm_indices vec_perm_builder; > -typedef vec_perm_indices auto_vec_perm_indices; > + vec_perm_builder m_encoding; > + unsigned int m_ninputs; > + unsigned int m_nelts_per_input; > +}; > > bool tree_to_vec_perm_builder (vec_perm_builder *, tree); > rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &); > > +inline > +vec_perm_indices::vec_perm_indices () > + : m_ninputs (0), > + m_nelts_per_input (0) > +{ > +} > + > +/* Construct a permutation vector that selects between NINPUTS vector > + inputs that have NELTS_PER_INPUT elements each. Take the elements of > + the new vector from ELEMENTS, clamping each one to be in range. */ > + > +inline > +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements, > + unsigned int ninputs, > + unsigned int nelts_per_input) > +{ > + new_vector (elements, ninputs, nelts_per_input); > +} > + > +/* Return the canonical value for permutation vector element ELT, > + taking into account the current number of input elements. */ > + > +inline vec_perm_indices::element_type > +vec_perm_indices::clamp (element_type elt) const > +{ > + element_type limit = input_nelts (); > + elt %= limit; > + /* Treat negative elements as counting from the end. This only matters > + if the vector size is not a power of 2. */ > + if (elt < 0) > + elt += limit; > + return elt; > +} > + > +/* Return the value of vector element I, which might or might not be > + explicitly encoded. */ > + > +inline vec_perm_indices::element_type > +vec_perm_indices::operator[] (unsigned int i) const > +{ > + return clamp (m_encoding.elt (i)); > +} > + > #endif > Index: gcc/vec-perm-indices.c > =================================================================== > --- gcc/vec-perm-indices.c 2017-12-09 22:47:27.885318101 +0000 > +++ gcc/vec-perm-indices.c 2017-12-09 22:48:47.548825399 +0000 > @@ -22,11 +22,33 @@ Software Foundation; either version 3, o > #include "coretypes.h" > #include "vec-perm-indices.h" > #include "tree.h" > +#include "fold-const.h" > +#include "tree-vector-builder.h" > #include "backend.h" > #include "rtl.h" > #include "memmodel.h" > #include "emit-rtl.h" > > +/* Switch to a new permutation vector that selects between NINPUTS vector > + inputs that have NELTS_PER_INPUT elements each. Take the elements of the > + new permutation vector from ELEMENTS, clamping each one to be in range. */ > + > +void > +vec_perm_indices::new_vector (const vec_perm_builder &elements, > + unsigned int ninputs, > + unsigned int nelts_per_input) > +{ > + m_ninputs = ninputs; > + m_nelts_per_input = nelts_per_input; > + /* Expand the encoding and clamp each element. E.g. { 0, 2, 4, ... } > + might wrap halfway if there is only one vector input. */ > + unsigned int full_nelts = elements.full_nelts (); > + m_encoding.new_vector (full_nelts, full_nelts, 1); > + for (unsigned int i = 0; i < full_nelts; ++i) > + m_encoding.quick_push (clamp (elements.elt (i))); > + m_encoding.finalize (); > +} > + > /* Switch to a new permutation vector that selects the same input elements > as ORIG, but with each element split into FACTOR pieces. 
For example, > if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is > @@ -36,14 +58,31 @@ Software Foundation; either version 3, o > vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig, > unsigned int factor) > { > - truncate (0); > - reserve (orig.length () * factor); > - for (unsigned int i = 0; i < orig.length (); ++i) > + m_ninputs = orig.m_ninputs; > + m_nelts_per_input = orig.m_nelts_per_input * factor; > + m_encoding.new_vector (orig.m_encoding.full_nelts () * factor, > + orig.m_encoding.npatterns () * factor, > + orig.m_encoding.nelts_per_pattern ()); > + unsigned int encoded_nelts = orig.m_encoding.encoded_nelts (); > + for (unsigned int i = 0; i < encoded_nelts; ++i) > { > - element_type base = orig[i] * factor; > + element_type base = orig.m_encoding[i] * factor; > for (unsigned int j = 0; j < factor; ++j) > - quick_push (base + j); > + m_encoding.quick_push (base + j); > } > + m_encoding.finalize (); > +} > + > +/* Rotate the inputs of the permutation right by DELTA inputs. This changes > + the values of the permutation vector but it doesn't change the way that > + the elements are encoded. */ > + > +void > +vec_perm_indices::rotate_inputs (int delta) > +{ > + element_type element_delta = delta * m_nelts_per_input; > + for (unsigned int i = 0; i < m_encoding.length (); ++i) > + m_encoding[i] = clamp (m_encoding[i] + element_delta); > } > > /* Return true if all elements of the permutation vector are in the range > @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c > bool > vec_perm_indices::all_in_range_p (element_type start, element_type size) const > { > - for (unsigned int i = 0; i < length (); ++i) > - if ((*this)[i] < start || ((*this)[i] - start) >= size) > + /* Check the first two elements of each pattern. */ > + unsigned int npatterns = m_encoding.npatterns (); > + unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern (); > + unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2); > + for (unsigned int i = 0; i < base_nelts; ++i) > + if (m_encoding[i] < start || (m_encoding[i] - start) >= size) > return false; > + > + /* For stepped encodings, check the full range of the series. */ > + if (nelts_per_pattern == 3) > + { > + element_type limit = input_nelts (); > + > + /* The number of elements in each pattern beyond the first two > + that we checked above. */ > + unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2; > + for (unsigned int i = 0; i < npatterns; ++i) > + { > + /* BASE1 has been checked but BASE2 hasn't. */ > + element_type base1 = m_encoding[i + npatterns]; > + element_type base2 = m_encoding[i + base_nelts]; > + > + /* The step to add to get from BASE1 to each subsequent value. */ > + element_type step = clamp (base2 - base1); > + > + /* STEP has no inherent sign, so a value near LIMIT can > + act as a negative step. The series is in range if it > + is in range according to one of the two interpretations. > + > + Since we're dealing with clamped values, ELEMENT_TYPE is > + wide enough for overflow not to be a problem. 
*/ > + element_type headroom_down = base1 - start; > + element_type headroom_up = size - headroom_down - 1; > + if (headroom_up < step * step_nelts > + && headroom_down < (limit - step) * step_nelts) > + return false; > + } > + } > return true; > } > > @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen > bool > tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst) > { > - unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)); > - for (unsigned int i = 0; i < nelts; ++i) > - if (!tree_fits_shwi_p (vector_cst_elt (cst, i))) > + unsigned int encoded_nelts = vector_cst_encoded_nelts (cst); > + for (unsigned int i = 0; i < encoded_nelts; ++i) > + if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i))) > return false; > > - builder->reserve (nelts); > - for (unsigned int i = 0; i < nelts; ++i) > - builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i)) > - & (2 * nelts - 1)); > + builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)), > + VECTOR_CST_NPATTERNS (cst), > + VECTOR_CST_NELTS_PER_PATTERN (cst)); > + for (unsigned int i = 0; i < encoded_nelts; ++i) > + builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i))); > return true; > } > > Index: gcc/optabs.c > =================================================================== > --- gcc/optabs.c 2017-12-09 22:47:27.881318099 +0000 > +++ gcc/optabs.c 2017-12-09 22:48:47.546825312 +0000 > @@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode > rtx_insn *last = get_last_insn (); > > bool single_arg_p = rtx_equal_p (v0, v1); > + /* Always specify two input vectors here and leave the target to handle > + cases in which the inputs are equal. Not all backends can cope with > + the single-input representation when testing for a double-input > + target instruction. */ > + vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode)); > > /* See if this can be handled with a vec_shr. We only do this if the > second vector is all zeroes. */ > @@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode > && (shift_code != CODE_FOR_nothing > || shift_code_qi != CODE_FOR_nothing)) > { > - rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel); > + rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices); > if (shift_amt) > { > struct expand_operand ops[3]; > @@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode > else > v1 = force_reg (mode, v1); > > - if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel)) > + if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) > return target; > } > > @@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode > rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX; > if (qimode != VOIDmode) > { > - qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode)); > + qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode)); > target_qi = gen_reg_rtx (qimode); > v0_qi = gen_lowpart (qimode, v0); > v1_qi = gen_lowpart (qimode, v1); > @@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode > REQUIRED_SEL_MODE is OK. 
*/ > if (sel_mode != required_sel_mode) > { > - if (!selector_fits_mode_p (required_sel_mode, sel)) > + if (!selector_fits_mode_p (required_sel_mode, indices)) > { > delete_insns_since (last); > return NULL_RTX; > @@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode > insn_code icode = direct_optab_handler (vec_perm_optab, mode); > if (icode != CODE_FOR_nothing) > { > - rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel); > + rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices); > rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx); > if (tmp) > return tmp; > @@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode, > gcc_assert (sel != NULL); > > /* Broadcast the low byte each element into each of its bytes. */ > - vec_perm_builder const_sel (w); > + vec_perm_builder const_sel (w, w, 1); > for (i = 0; i < w; ++i) > { > int this_e = i / u * u; > @@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode, > expand_insn (optab_handler (tab2, mode), 3, eops); > m2 = gen_lowpart (mode, eops[0].value); > > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > if (method == 2) > { > for (i = 0; i < nunits; ++i) > Index: gcc/optabs-query.c > =================================================================== > --- gcc/optabs-query.c 2017-12-09 22:47:27.881318099 +0000 > +++ gcc/optabs-query.c 2017-12-09 22:48:47.545825268 +0000 > @@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode, > op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; > if (optab_handler (op, mode) != CODE_FOR_nothing) > { > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > for (i = 0; i < nunits; ++i) > sel.quick_push (!BYTES_BIG_ENDIAN > + (i & ~1) > + ((i & 1) ? nunits : 0)); > - if (can_vec_perm_const_p (mode, sel)) > + vec_perm_indices indices (sel, 2, nunits); > + if (can_vec_perm_const_p (mode, indices)) > return 2; > } > } > @@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode, > op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab; > if (optab_handler (op, mode) != CODE_FOR_nothing) > { > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > for (i = 0; i < nunits; ++i) > sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1)); > - if (can_vec_perm_const_p (mode, sel)) > + vec_perm_indices indices (sel, 2, nunits); > + if (can_vec_perm_const_p (mode, indices)) > return 3; > } > } > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2017-12-09 22:47:27.881318099 +0000 > +++ gcc/fold-const.c 2017-12-09 22:48:47.545825268 +0000 > @@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t > { > unsigned int nelts = VECTOR_CST_NELTS (arg0), i; > gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); > - auto_vec_perm_indices sel (nelts); > + vec_perm_builder sel (nelts, nelts, 1); > for (i = 0; i < nelts; i++) > { > tree val = VECTOR_CST_ELT (arg0, i); > @@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t > else /* Currently unreachable. */ > return NULL_TREE; > } > - tree t = fold_vec_perm (type, arg1, arg2, sel); > + tree t = fold_vec_perm (type, arg1, arg2, > + vec_perm_indices (sel, 2, nelts)); > if (t != NULL_TREE) > return t; > } > @@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t > mask2 = 2 * nelts - 1; > mask = single_arg ? 
(nelts - 1) : mask2; > gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); > - auto_vec_perm_indices sel (nelts); > - auto_vec_perm_indices sel2 (nelts); > + vec_perm_builder sel (nelts, nelts, 1); > + vec_perm_builder sel2 (nelts, nelts, 1); > for (i = 0; i < nelts; i++) > { > tree val = VECTOR_CST_ELT (arg2, i); > @@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t > need_mask_canon = true; > } > > + vec_perm_indices indices (sel, 2, nelts); > if ((TREE_CODE (op0) == VECTOR_CST > || TREE_CODE (op0) == CONSTRUCTOR) > && (TREE_CODE (op1) == VECTOR_CST > || TREE_CODE (op1) == CONSTRUCTOR)) > { > - tree t = fold_vec_perm (type, op0, op1, sel); > + tree t = fold_vec_perm (type, op0, op1, indices); > if (t != NULL_TREE) > return t; > } > @@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t > argument permutation while still allowing an equivalent > 2-argument version. */ > if (need_mask_canon && arg2 == op2 > - && !can_vec_perm_const_p (TYPE_MODE (type), sel, false) > - && can_vec_perm_const_p (TYPE_MODE (type), sel2, false)) > + && !can_vec_perm_const_p (TYPE_MODE (type), indices, false) > + && can_vec_perm_const_p (TYPE_MODE (type), > + vec_perm_indices (sel2, 2, nelts), > + false)) > { > need_mask_canon = need_mask_canon2; > - sel = sel2; > + sel.truncate (0); > + sel.splice (sel2); > } > > if (need_mask_canon && arg2 == op2) > Index: gcc/tree-ssa-forwprop.c > =================================================================== > --- gcc/tree-ssa-forwprop.c 2017-12-09 22:47:27.883318100 +0000 > +++ gcc/tree-ssa-forwprop.c 2017-12-09 22:48:47.546825312 +0000 > @@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt > elem_type = TREE_TYPE (type); > elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); > > - auto_vec_perm_indices sel (nelts); > + vec_perm_builder sel (nelts, nelts, 1); > orig = NULL; > conv_code = ERROR_MARK; > maybe_ident = true; > @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt > { > tree mask_type; > > - if (!can_vec_perm_const_p (TYPE_MODE (type), sel)) > + vec_perm_indices indices (sel, 1, nelts); > + if (!can_vec_perm_const_p (TYPE_MODE (type), indices)) > return false; > mask_type > = build_vector_type (build_nonstandard_integer_type (elem_size, 1), > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-12-09 22:47:27.883318100 +0000 > +++ gcc/tree-vect-data-refs.c 2017-12-09 22:48:47.546825312 +0000 > @@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty > if (VECTOR_MODE_P (mode)) > { > unsigned int i, nelt = GET_MODE_NUNITS (mode); > - auto_vec_perm_indices sel (nelt); > + vec_perm_builder sel (nelt, nelt, 1); > sel.quick_grow (nelt); > > if (count == 3) > @@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty > unsigned int j0 = 0, j1 = 0, j2 = 0; > unsigned int i, j; > > + vec_perm_indices indices; > for (j = 0; j < 3; j++) > { > int nelt0 = ((3 - j) * nelt) % 3; > @@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty > if (3 * i + nelt2 < nelt) > sel[3 * i + nelt2] = 0; > } > - if (!can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (mode, indices)) > { > if (dump_enabled_p ()) > dump_printf (MSG_MISSED_OPTIMIZATION, > @@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty > if (3 * i + nelt2 < nelt) > sel[3 * i + nelt2] = nelt + j2++; > } > - if (!can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if 
(!can_vec_perm_const_p (mode, indices)) > { > if (dump_enabled_p ()) > dump_printf (MSG_MISSED_OPTIMIZATION, > @@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty > sel[i * 2] = i; > sel[i * 2 + 1] = i + nelt; > } > - if (can_vec_perm_const_p (mode, sel)) > + vec_perm_indices indices (sel, 2, nelt); > + if (can_vec_perm_const_p (mode, indices)) > { > for (i = 0; i < nelt; i++) > sel[i] += nelt / 2; > - if (can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if (can_vec_perm_const_p (mode, indices)) > return true; > } > } > @@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec<tree> dr_c > unsigned int i, n, log_length = exact_log2 (length); > unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype); > > - auto_vec_perm_indices sel (nelt); > + vec_perm_builder sel (nelt, nelt, 1); > sel.quick_grow (nelt); > > result_chain->quick_grow (length); > @@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec<tree> dr_c > { > unsigned int j0 = 0, j1 = 0, j2 = 0; > > + vec_perm_indices indices; > for (j = 0; j < 3; j++) > { > int nelt0 = ((3 - j) * nelt) % 3; > @@ -4757,7 +4763,8 @@ vect_permute_store_chain (vec<tree> dr_c > if (3 * i + nelt2 < nelt) > sel[3 * i + nelt2] = 0; > } > - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < nelt; i++) > { > @@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec<tree> dr_c > if (3 * i + nelt2 < nelt) > sel[3 * i + nelt2] = nelt + j2++; > } > - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); > > vect1 = dr_chain[0]; > vect2 = dr_chain[1]; > @@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec<tree> dr_c > sel[i * 2] = i; > sel[i * 2 + 1] = i + nelt; > } > - perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); > + vec_perm_indices indices (sel, 2, nelt); > + perm_mask_high = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < nelt; i++) > sel[i] += nelt / 2; > - perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm_mask_low = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0, n = log_length; i < n; i++) > { > @@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp > if (VECTOR_MODE_P (mode)) > { > unsigned int i, j, nelt = GET_MODE_NUNITS (mode); > - auto_vec_perm_indices sel (nelt); > + vec_perm_builder sel (nelt, nelt, 1); > sel.quick_grow (nelt); > > if (count == 3) > { > + vec_perm_indices indices; > unsigned int k; > for (k = 0; k < 3; k++) > { > @@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp > sel[i] = 3 * i + k; > else > sel[i] = 0; > - if (!can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (mode, indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp > sel[i] = i; > else > sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); > - if (!can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (mode, indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp > { > /* If length is not equal to 3 then only power of 2 is supported. 
*/ > gcc_assert (pow2p_hwi (count)); > + > for (i = 0; i < nelt; i++) > sel[i] = i * 2; > - if (can_vec_perm_const_p (mode, sel)) > + vec_perm_indices indices (sel, 2, nelt); > + if (can_vec_perm_const_p (mode, indices)) > { > for (i = 0; i < nelt; i++) > sel[i] = i * 2 + 1; > - if (can_vec_perm_const_p (mode, sel)) > + indices.new_vector (sel, 2, nelt); > + if (can_vec_perm_const_p (mode, indices)) > return true; > } > } > @@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec<tree> dr_ch > unsigned int i, j, log_length = exact_log2 (length); > unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); > > - auto_vec_perm_indices sel (nelt); > + vec_perm_builder sel (nelt, nelt, 1); > sel.quick_grow (nelt); > > result_chain->quick_grow (length); > @@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec<tree> dr_ch > { > unsigned int k; > > + vec_perm_indices indices; > for (k = 0; k < 3; k++) > { > for (i = 0; i < nelt; i++) > @@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec<tree> dr_ch > sel[i] = 3 * i + k; > else > sel[i] = 0; > - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0, j = 0; i < nelt; i++) > if (3 * i + k < 2 * nelt) > sel[i] = i; > else > sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); > - > - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); > > first_vect = dr_chain[0]; > second_vect = dr_chain[1]; > @@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec<tree> dr_ch > > for (i = 0; i < nelt; ++i) > sel[i] = i * 2; > - perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); > + vec_perm_indices indices (sel, 2, nelt); > + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < nelt; ++i) > sel[i] = i * 2 + 1; > - perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); > + indices.new_vector (sel, 2, nelt); > + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < log_length; i++) > { > @@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec<tree> > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > > - auto_vec_perm_indices sel (nelt); > + vec_perm_builder sel (nelt, nelt, 1); > sel.quick_grow (nelt); > > result_chain->quick_grow (length); > @@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec<tree> > sel[i] = i * 2; > for (i = 0; i < nelt / 2; ++i) > sel[nelt / 2 + i] = i * 2 + 1; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + vec_perm_indices indices (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec<tree> > supported by target\n"); > return false; > } > - perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel); > + perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < nelt / 2; ++i) > sel[i] = i * 2 + 1; > for (i = 0; i < nelt / 2; ++i) > sel[nelt / 2 + i] = i * 2; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec<tree> > supported by 
target\n"); > return false; > } > - perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel); > + perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to shift all elements. > For vector length 8 it is {4 5 6 7 8 9 10 11}. */ > for (i = 0; i < nelt; i++) > sel[i] = nelt / 2 + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "shift permutation is not supported by target\n"); > return false; > } > - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); > + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to select vector from 2. > For vector length 8 it is {0 1 2 3 12 13 14 15}. */ > @@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec<tree> > sel[i] = i; > for (i = nelt / 2; i < nelt; i++) > sel[i] = nelt + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "select is not supported by target\n"); > return false; > } > - select_mask = vect_gen_perm_mask_checked (vectype, sel); > + select_mask = vect_gen_perm_mask_checked (vectype, indices); > > for (i = 0; i < log_length; i++) > { > @@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec<tree> > sel[i] = 3 * k + (l % 3); > k++; > } > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + vec_perm_indices indices (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec<tree> > supported by target\n"); > return false; > } > - perm3_mask = vect_gen_perm_mask_checked (vectype, sel); > + perm3_mask = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to shift all elements. > For vector length 8 it is {6 7 8 9 10 11 12 13}. */ > for (i = 0; i < nelt; i++) > sel[i] = 2 * (nelt / 3) + (nelt % 3) + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "shift permutation is not supported by target\n"); > return false; > } > - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); > + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to shift all elements. > For vector length 8 it is {5 6 7 8 9 10 11 12}. */ > for (i = 0; i < nelt; i++) > sel[i] = 2 * (nelt / 3) + 1 + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "shift permutation is not supported by target\n"); > return false; > } > - shift2_mask = vect_gen_perm_mask_checked (vectype, sel); > + shift2_mask = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to shift all elements. > For vector length 8 it is {3 4 5 6 7 8 9 10}. 
*/ > for (i = 0; i < nelt; i++) > sel[i] = (nelt / 3) + (nelt % 3) / 2 + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "shift permutation is not supported by target\n"); > return false; > } > - shift3_mask = vect_gen_perm_mask_checked (vectype, sel); > + shift3_mask = vect_gen_perm_mask_checked (vectype, indices); > > /* Generating permutation constant to shift all elements. > For vector length 8 it is {5 6 7 8 9 10 11 12}. */ > for (i = 0; i < nelt; i++) > sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i; > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "shift permutation is not supported by target\n"); > return false; > } > - shift4_mask = vect_gen_perm_mask_checked (vectype, sel); > + shift4_mask = vect_gen_perm_mask_checked (vectype, indices); > > for (k = 0; k < 3; k++) > { > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000 > +++ gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000 > @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, > && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) > { > unsigned int count = TYPE_VECTOR_SUBPARTS (vectype); > - auto_vec_perm_indices sel (count); > + vec_perm_builder sel (count, count, 1); > for (i = 0; i < count; ++i) > { > unsigned int elt = i; > @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, > elt += count; > sel.quick_push (elt); > } > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + vec_perm_indices indices (sel, 2, count); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > { > for (i = 0; i < group_size; ++i) > if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code) > @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n > (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); > mask_type = get_vectype_for_scalar_type (mask_element_type); > nunits = TYPE_VECTOR_SUBPARTS (vectype); > - auto_vec_perm_indices mask (nunits); > + vec_perm_builder mask (nunits, nunits, 1); > mask.quick_grow (nunits); > + vec_perm_indices indices; > > /* Initialize the vect stmts of NODE to properly insert the generated > stmts later. */ > @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n > noop_p = false; > mask[index++] = mask_element; > > - if (index == nunits) > + if (index == nunits && !noop_p) > { > - if (! noop_p > - && ! can_vec_perm_const_p (mode, mask)) > + indices.new_vector (mask, 2, nunits); > + if (!can_vec_perm_const_p (mode, indices)) > { > if (dump_enabled_p ()) > { > @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n > vect_location, > "unsupported vect permute { "); > for (i = 0; i < nunits; ++i) > - dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]); > + dump_printf (MSG_MISSED_OPTIMIZATION, > + HOST_WIDE_INT_PRINT_DEC " ", mask[i]); > dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); > } > gcc_assert (analyze_only); > return false; > } > > - if (! 
noop_p) > - ++*n_perms; > + ++*n_perms; > + } > > + if (index == nunits) > + { > if (!analyze_only) > { > tree mask_vec = NULL_TREE; > @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod > enum tree_code code0 = gimple_assign_rhs_code (stmt); > enum tree_code ocode = ERROR_MARK; > gimple *ostmt; > - auto_vec_perm_indices mask (group_size); > + vec_perm_builder mask (group_size, group_size, 1); > FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt) > if (gimple_assign_rhs_code (ostmt) != code0) > { > Index: gcc/tree-vect-stmts.c > =================================================================== > --- gcc/tree-vect-stmts.c 2017-12-09 22:47:27.885318101 +0000 > +++ gcc/tree-vect-stmts.c 2017-12-09 22:48:47.548825399 +0000 > @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype) > > nunits = TYPE_VECTOR_SUBPARTS (vectype); > > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > for (i = 0; i < nunits; ++i) > sel.quick_push (nunits - 1 - i); > > - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) > + vec_perm_indices indices (sel, 1, nunits); > + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) > return NULL_TREE; > - return vect_gen_perm_mask_checked (vectype, sel); > + return vect_gen_perm_mask_checked (vectype, indices); > } > > /* A subroutine of get_load_store_type, with a subset of the same > @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st > { > modifier = WIDEN; > > - auto_vec_perm_indices sel (gather_off_nunits); > + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); > for (i = 0; i < gather_off_nunits; ++i) > sel.quick_push (i | nunits); > > - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); > + vec_perm_indices indices (sel, 1, gather_off_nunits); > + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, > + indices); > } > else if (nunits == gather_off_nunits * 2) > { > modifier = NARROW; > > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > sel.quick_grow (nunits); > for (i = 0; i < nunits; ++i) > sel[i] = i < gather_off_nunits > ? i : i + nunits - gather_off_nunits; > + vec_perm_indices indices (sel, 2, nunits); > + perm_mask = vect_gen_perm_mask_checked (vectype, indices); > > - perm_mask = vect_gen_perm_mask_checked (vectype, sel); > ncopies *= 2; > + > for (i = 0; i < nunits; ++i) > sel[i] = i | gather_off_nunits; > - mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel); > + indices.new_vector (sel, 2, gather_off_nunits); > + mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices); > } > else > gcc_unreachable (); > @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple > unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype); > unsigned word_bytes = num_bytes / nunits; > > - auto_vec_perm_indices elts (num_bytes); > + vec_perm_builder elts (num_bytes, num_bytes, 1); > for (unsigned i = 0; i < nunits; ++i) > for (unsigned j = 0; j < word_bytes; ++j) > elts.quick_push ((i + 1) * word_bytes - j - 1); > > - if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts)) > + vec_perm_indices indices (elts, 1, num_bytes); > + if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices)) > return false; > > if (! 
vec_stmt) > @@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple > { > modifier = WIDEN; > > - auto_vec_perm_indices sel (scatter_off_nunits); > + vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1); > for (i = 0; i < (unsigned int) scatter_off_nunits; ++i) > sel.quick_push (i | nunits); > > - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); > + vec_perm_indices indices (sel, 1, scatter_off_nunits); > + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, > + indices); > gcc_assert (perm_mask != NULL_TREE); > } > else if (nunits == (unsigned int) scatter_off_nunits * 2) > { > modifier = NARROW; > > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > for (i = 0; i < (unsigned int) nunits; ++i) > sel.quick_push (i | scatter_off_nunits); > > - perm_mask = vect_gen_perm_mask_checked (vectype, sel); > + vec_perm_indices indices (sel, 2, nunits); > + perm_mask = vect_gen_perm_mask_checked (vectype, indices); > gcc_assert (perm_mask != NULL_TREE); > ncopies *= 2; > } > @@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_ > { > modifier = WIDEN; > > - auto_vec_perm_indices sel (gather_off_nunits); > + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); > for (i = 0; i < gather_off_nunits; ++i) > sel.quick_push (i | nunits); > > - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); > + vec_perm_indices indices (sel, 1, gather_off_nunits); > + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, > + indices); > } > else if (nunits == gather_off_nunits * 2) > { > modifier = NARROW; > > - auto_vec_perm_indices sel (nunits); > + vec_perm_builder sel (nunits, nunits, 1); > for (i = 0; i < nunits; ++i) > sel.quick_push (i < gather_off_nunits > ? i : i + nunits - gather_off_nunits); > > - perm_mask = vect_gen_perm_mask_checked (vectype, sel); > + vec_perm_indices indices (sel, 2, nunits); > + perm_mask = vect_gen_perm_mask_checked (vectype, indices); > ncopies *= 2; > } > else > Index: gcc/tree-vect-generic.c > =================================================================== > --- gcc/tree-vect-generic.c 2017-12-09 22:47:27.883318100 +0000 > +++ gcc/tree-vect-generic.c 2017-12-09 22:48:47.547825355 +0000 > @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs > mask = gimple_assign_rhs1 (def_stmt); > } > > - if (TREE_CODE (mask) == VECTOR_CST) > - { > - auto_vec_perm_indices sel_int (elements); > - > - for (i = 0; i < elements; ++i) > - sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) > - & (2 * elements - 1)); > + vec_perm_builder sel_int; > > - if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int)) > + if (TREE_CODE (mask) == VECTOR_CST > + && tree_to_vec_perm_builder (&sel_int, mask)) > + { > + vec_perm_indices indices (sel_int, 2, elements); > + if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices)) > { > gimple_assign_set_rhs3 (stmt, mask); > update_stmt (stmt); > @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs > != CODE_FOR_nothing > && TREE_CODE (vec1) == VECTOR_CST > && initializer_zerop (vec1) > - && sel_int[0] > - && sel_int[0] < elements) > + && indices[0] > + && indices[0] < elements) > { > for (i = 1; i < elements; ++i) > { > - unsigned int expected = i + sel_int[0]; > + unsigned int expected = i + indices[0]; > /* Indices into the second vector are all equivalent. 
*/ > - if (MIN (elements, (unsigned) sel_int[i]) > + if (MIN (elements, (unsigned) indices[i]) > != MIN (elements, expected)) > break; > } > Index: gcc/tree-vect-loop.c > =================================================================== > --- gcc/tree-vect-loop.c 2017-12-09 22:47:27.884318101 +0000 > +++ gcc/tree-vect-loop.c 2017-12-09 22:48:47.547825355 +0000 > @@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop > vector elements (not bits) for a vector with NELT elements. */ > static void > calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt, > - vec_perm_indices *sel) > + vec_perm_builder *sel) > { > - unsigned int i; > - > - for (i = 0; i < nelt; i++) > - sel->quick_push ((i + offset) & (2 * nelt - 1)); > + sel->new_vector (nelt, nelt, 1); > + for (unsigned int i = 0; i < nelt; i++) > + sel->quick_push (i + offset); > } > > /* Checks whether the target supports whole-vector shifts for vectors of mode > @@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo > return true; > > unsigned int i, nelt = GET_MODE_NUNITS (mode); > - auto_vec_perm_indices sel (nelt); > - > + vec_perm_builder sel; > + vec_perm_indices indices; > for (i = nelt/2; i >= 1; i/=2) > { > - sel.truncate (0); > calc_vec_perm_mask_for_shift (i, nelt, &sel); > - if (!can_vec_perm_const_p (mode, sel, false)) > + indices.new_vector (sel, 2, nelt); > + if (!can_vec_perm_const_p (mode, indices, false)) > return false; > } > return true; > @@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec<tr > if (reduce_with_shift && !slp_reduc) > { > int nelements = vec_size_in_bits / element_bitsize; > - auto_vec_perm_indices sel (nelements); > + vec_perm_builder sel; > + vec_perm_indices indices; > > int elt_offset; > > @@ -5052,9 +5052,9 @@ vect_create_epilog_for_reduction (vec<tr > elt_offset >= 1; > elt_offset /= 2) > { > - sel.truncate (0); > calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); > - tree mask = vect_gen_perm_mask_any (vectype, sel); > + indices.new_vector (sel, 2, nelements); > + tree mask = vect_gen_perm_mask_any (vectype, indices); > epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR, > new_temp, zero_vec, mask); > new_name = make_ssa_name (vec_dest, epilog_stmt); > Index: gcc/config/aarch64/aarch64.c > =================================================================== > --- gcc/config/aarch64/aarch64.c 2017-12-09 22:47:27.856318084 +0000 > +++ gcc/config/aarch64/aarch64.c 2017-12-09 22:48:47.535824832 +0000 > @@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16 > struct expand_vec_perm_d > { > rtx target, op0, op1; > - auto_vec_perm_indices perm; > + vec_perm_indices perm; > machine_mode vmode; > bool one_vector_p; > bool testing_p; > @@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct > unsigned int nelt = d->perm.length (); > if (d->perm[0] >= nelt) > { > - gcc_assert (nelt == (nelt & -nelt)); > - for (unsigned int i = 0; i < nelt; ++i) > - d->perm[i] ^= nelt; /* Keep the same index, but in the other vector. */ > - > + d->perm.rotate_inputs (1); > std::swap (d->op0, d->op1); > } > > @@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin > > /* Calculate whether all elements are in one vector. */ > unsigned int nelt = sel.length (); > - d.perm.reserve (nelt); > for (i = which = 0; i < nelt; ++i) > { > unsigned int ei = sel[i] & (2 * nelt - 1); > which |= (ei < nelt ? 1 : 2); > - d.perm.quick_push (ei); > } > > switch (which) > @@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin > input vector. */ > /* Fall Through. 
*/ > case 2: > - for (i = 0; i < nelt; ++i) > - d.perm[i] &= nelt - 1; > d.op0 = op1; > d.one_vector_p = true; > break; > @@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin > break; > } > > + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); > + > if (!d.testing_p) > return aarch64_expand_vec_perm_const_1 (&d); > > Index: gcc/config/arm/arm.c > =================================================================== > --- gcc/config/arm/arm.c 2017-12-09 22:47:27.858318085 +0000 > +++ gcc/config/arm/arm.c 2017-12-09 22:48:47.538824963 +0000 > @@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16 > struct expand_vec_perm_d > { > rtx target, op0, op1; > - auto_vec_perm_indices perm; > + vec_perm_indices perm; > machine_mode vmode; > bool one_vector_p; > bool testing_p; > @@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa > unsigned int nelt = d->perm.length (); > if (d->perm[0] >= nelt) > { > - for (unsigned int i = 0; i < nelt; ++i) > - d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); > - > + d->perm.rotate_inputs (1); > std::swap (d->op0, d->op1); > } > > @@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo > d.testing_p = !target; > > nelt = GET_MODE_NUNITS (d.vmode); > - d.perm.reserve (nelt); > for (i = which = 0; i < nelt; ++i) > { > int ei = sel[i] & (2 * nelt - 1); > which |= (ei < nelt ? 1 : 2); > - d.perm.quick_push (ei); > } > > switch (which) > @@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo > input vector. */ > /* FALLTHRU */ > case 2: > - for (i = 0; i < nelt; ++i) > - d.perm[i] &= nelt - 1; > d.op0 = op1; > d.one_vector_p = true; > break; > @@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo > break; > } > > + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); > + > if (d.testing_p) > return arm_expand_vec_perm_const_1 (&d); > > Index: gcc/config/powerpcspe/powerpcspe.c > =================================================================== > --- gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:47:27.871318093 +0000 > +++ gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:48:47.541825094 +0000 > @@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target, > { > machine_mode vmode = GET_MODE (target); > unsigned i, nelt = GET_MODE_NUNITS (vmode); > - vec_perm_builder perm (nelt); > + vec_perm_builder perm (nelt, nelt, 1); > > for (i = 0; i < nelt; i++) > perm.quick_push (i * 2); > @@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt > { > machine_mode vmode = GET_MODE (target); > unsigned i, high, nelt = GET_MODE_NUNITS (vmode); > - vec_perm_builder perm (nelt); > + vec_perm_builder perm (nelt, nelt, 1); > > high = (highp ? 0 : nelt / 2); > for (i = 0; i < nelt / 2; i++) > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c 2017-12-09 22:47:27.874318095 +0000 > +++ gcc/config/rs6000/rs6000.c 2017-12-09 22:48:47.544825224 +0000 > @@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target, > { > machine_mode vmode = GET_MODE (target); > unsigned i, nelt = GET_MODE_NUNITS (vmode); > - vec_perm_builder perm (nelt); > + vec_perm_builder perm (nelt, nelt, 1); > > for (i = 0; i < nelt; i++) > perm.quick_push (i * 2); > @@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt > { > machine_mode vmode = GET_MODE (target); > unsigned i, high, nelt = GET_MODE_NUNITS (vmode); > - vec_perm_builder perm (nelt); > + vec_perm_builder perm (nelt, nelt, 1); > > high = (highp ? 
0 : nelt / 2); > for (i = 0; i < nelt / 2; i++)
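For readers following the thread, here is a small stand-alone sketch (not part of the patch; all names below are invented, and the pattern rules are assumed to match the VECTOR_CST scheme the patch refers to: a pattern with one or two encoded elements ends in a repeated value, while a three-element pattern continues as a linear series). It shows how the extract-even and interleave selectors mentioned above can be described by a handful of encoded elements, independently of the full vector length:

/* Illustrative stand-alone sketch, not GCC code: it mimics the encoding
   that int_vector_builder/vec_perm_builder use.  */
#include <cstdint>
#include <cstdio>
#include <vector>

typedef int64_t elem_t;

/* Expand an encoded selector into FULL_NELTS explicit elements.  Encoded
   elements are stored in "interleaved" order: element I of the full
   vector belongs to pattern I % NPATTERNS.  */
static std::vector<elem_t>
expand_selector (unsigned int full_nelts, unsigned int npatterns,
                 unsigned int nelts_per_pattern,
                 const std::vector<elem_t> &encoded)
{
  std::vector<elem_t> out (full_nelts);
  for (unsigned int i = 0; i < full_nelts; ++i)
    {
      unsigned int pattern = i % npatterns;
      unsigned int count = i / npatterns;
      if (count < nelts_per_pattern)
        out[i] = encoded[count * npatterns + pattern];
      else if (nelts_per_pattern < 3)
        /* Patterns with 1 or 2 encoded elements end in a repeated value.  */
        out[i] = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
      else
        {
          /* Three-element patterns continue the linear series defined by
             their second and third encoded elements.  */
          elem_t base1 = encoded[npatterns + pattern];
          elem_t base2 = encoded[2 * npatterns + pattern];
          out[i] = base2 + (elem_t) (count - 2) * (base2 - base1);
        }
    }
  return out;
}

int
main ()
{
  unsigned int nelts = 8;  /* pretend each input vector has 8 elements */

  /* Extract even: a single stepped pattern { 0, 2, 4 } expands to
     0 2 4 6 8 10 12 14, whatever the full length is.  */
  std::vector<elem_t> even = expand_selector (nelts, 1, 3, { 0, 2, 4 });

  /* Interleave low of two inputs: two stepped patterns describe
     { 0, N, 1, N+1, ... } without listing every element.  */
  std::vector<elem_t> ilo
    = expand_selector (nelts, 2, 3,
                       { 0, (elem_t) nelts, 1, (elem_t) nelts + 1,
                         2, (elem_t) nelts + 2 });

  for (elem_t e : even)
    printf ("%lld ", (long long) e);
  printf ("\n");
  for (elem_t e : ilo)
    printf ("%lld ", (long long) e);
  printf ("\n");
  return 0;
}

The point is that the number of encoded elements depends only on the number of patterns, not on full_nelts, which is what makes the representation workable when the vector length is a runtime quantity.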
Richard Biener <richard.guenther@gmail.com> writes:
> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch changes vec_perm_indices from a plain vec<> to a class
>> that stores a canonicalised permutation, using the same encoding
>> as for VECTOR_CSTs. This means that vec_perm_indices now carries
>> information about the number of vectors being permuted (currently
>> always 1 or 2) and the number of elements in each input vector.
>
> Before I dive into the C++ details can you explain why it needs this
> info and how it encodes it for variable-length vectors? To interleave
> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
> can directly encode N here, can we? extract even/odd should just
> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
> one or two vectors (the one vector case just has two times the same
> vector) or how many elements each of the vectors (or the result) has.

One of the later patches switches the element types to HOST_WIDE_INT,
so that we can represent all ssizetypes. Then there's a poly_int
patch (not yet posted) to make that poly_int64, so that we can
represent the N even for variable-length vectors.

The class needs to know the number of elements because that affects
the canonical representation. E.g. extract even on fixed-length
vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
which we can't encode as a simple series. Interleave low with both
inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
variable-length vectors.

Also, operator[] is supposed to return an in-range selector even if
the selector element is only implicitly encoded. So we need to know
the number of input elements there.

Separating the number of input elements into the number of inputs
and the number of elements per input isn't really necessary, but made
it easier to provide routines for testing whether all selected
elements come from a particular input, and for rotating the selector
by a whole number of inputs.

Thanks,
Richard
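A minimal sketch of the clamping rule described here (it mirrors vec_perm_indices::clamp from the patch; the surrounding code is invented for illustration), showing the two canonical forms mentioned above for a single 8-element input:

/* Not GCC code: indices are reduced modulo the total number of input
   elements, with negative values counting from the end.  */
#include <cstdint>
#include <cstdio>

typedef int64_t elem_t;

static elem_t
clamp_index (elem_t elt, elem_t input_nelts)
{
  elt %= input_nelts;
  if (elt < 0)
    elt += input_nelts;
  return elt;
}

static void
show (const char *name, const elem_t *sel, unsigned int n, elem_t input_nelts)
{
  printf ("%s:", name);
  for (unsigned int i = 0; i < n; ++i)
    printf (" %lld", (long long) clamp_index (sel[i], input_nelts));
  printf ("\n");
}

int
main ()
{
  /* One input of 8 elements, so input_nelts = 1 * 8.  */
  elem_t extract_even[] = { 0, 2, 4, 6, 8, 10, 12, 14 };
  elem_t interleave_lo[] = { 0, 8, 1, 9, 2, 10, 3, 11 };

  /* Prints 0 2 4 6 0 2 4 6: no longer a single series, which is why the
     builder has to know the input length.  */
  show ("extract even", extract_even, 8, 8);

  /* Prints 0 0 1 1 2 2 3 3: the same shape for fixed-length and
     variable-length vectors.  */
  show ("interleave low", interleave_lo, 8, 8);
  return 0;
}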
On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>> that stores a canonicalised permutation, using the same encoding
>>> as for VECTOR_CSTs. This means that vec_perm_indices now carries
>>> information about the number of vectors being permuted (currently
>>> always 1 or 2) and the number of elements in each input vector.
>>
>> Before I dive into the C++ details can you explain why it needs this
>> info and how it encodes it for variable-length vectors? To interleave
>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>> can directly encode N here, can we? extract even/odd should just
>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>> one or two vectors (the one vector case just has two times the same
>> vector) or how many elements each of the vectors (or the result) has.
>
> One of the later patches switches the element types to HOST_WIDE_INT,
> so that we can represent all ssizetypes. Then there's a poly_int
> patch (not yet posted) to make that poly_int64, so that we can
> represent the N even for variable-length vectors.
>
> The class needs to know the number of elements because that affects
> the canonical representation. E.g. extract even on fixed-length
> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
> which we can't encode as a simple series. Interleave low with both
> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
> variable-length vectors.

Huh? extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector
are referencing concat'ed input vectors. So yes, for two same vectors
that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
that should
be the canonical form?

> Also, operator[] is supposed to return an in-range selector even if
> the selector element is only implicitly encoded. So we need to know
> the number of input elements there.
>
> Separating the number of input elements into the number of inputs
> and the number of elements per input isn't really necessary, but made
> it easier to provide routines for testing whether all selected
> elements come from a particular input, and for rotating the selector
> by a whole number of inputs.
>
> Thanks,
> Richard
Richard Biener <richard.guenther@gmail.com> writes: > On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> This patch changes vec_perm_indices from a plain vec<> to a class >>>> that stores a canonicalised permutation, using the same encoding >>>> as for VECTOR_CSTs. This means that vec_perm_indices now carries >>>> information about the number of vectors being permuted (currently >>>> always 1 or 2) and the number of elements in each input vector. >>> >>> Before I dive into the C++ details can you explain why it needs this >>> info and how it encodes it for variable-length vectors? To interleave >>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we >>> can directly encode N here, can we? extract even/odd should just >>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute >>> one or two vectors (the one vector case just has two times the same >>> vector) or how many elements each of the vectors (or the result) has. >> >> One of the later patches switches the element types to HOST_WIDE_INT, >> so that we can represent all ssizetypes. Then there's a poly_int >> patch (not yet posted) to make that poly_int64, so that we can >> represent the N even for variable-length vectors. >> >> The class needs to know the number of elements because that affects >> the canonical representation. E.g. extract even on fixed-length >> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... }, >> which we can't encode as a simple series. Interleave low with both >> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and >> variable-length vectors. > > Huh? extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector > are referencing concat'ed input vectors. So yes, for two same vectors > that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why > that should > be the canonical form? Current practice is to use the single-input form where possible, if both inputs are the same (see e.g. the VEC_PERM_EXPR handling in fold-const.c). It means that things like: _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>; _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>; _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>; get folded to the same sequence, and so can be CSEd. We could instead convert the single-input form to use the two-input selector, but that would be harder. The advantage of treating the single-input form as canonical is that it works even for irregular permutes. Thanks, Richard >> Also, operator[] is supposed to return an in-range selector even if >> the selector element is only implicitly encoded. So we need to know >> the number of input elements there. >> >> Separating the number of input elements into the number of inputs >> and the number of elements per input isn't really necessary, but made >> it easier to provide routines for testing whether all selected >> elements come from a particular input, and for rotating the selector >> by a whole number of inputs. >> >> Thanks, >> Richard
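
To spell out the CSE argument with the selectors quoted above, here is a standalone sketch in plain C++ (not GCC code; the helper names are invented for the example). _1 and _2 permute the same input, so their indices only matter modulo the 8 elements of "a", and _3 never reads its second operand, so all three reduce to the same single-input selector.

```c++
// Standalone sketch, not GCC code: the three VEC_PERM_EXPRs above
// canonicalise to the same single-input selector.
#include <cassert>
#include <vector>

// Reduce a same-input selector modulo the number of elements per input.
static std::vector<long> to_single_input (const std::vector<long> &sel,
                                          unsigned int nelts_per_input)
{
  std::vector<long> out;
  for (long e : sel)
    out.push_back (e % (long) nelts_per_input);
  return out;
}

// True if a two-input selector only ever reads its first input.
static bool uses_first_input_only (const std::vector<long> &sel,
                                   unsigned int nelts_per_input)
{
  for (long e : sel)
    if (e >= (long) nelts_per_input)
      return false;
  return true;
}

int main ()
{
  std::vector<long> sel1 = { 0, 2, 4, 6, 0, 2, 4, 6 };    // _1: <a, a>
  std::vector<long> sel2 = { 0, 2, 4, 6, 8, 10, 12, 14 }; // _2: <a, a>
  std::vector<long> sel3 = { 0, 2, 4, 6, 0, 2, 4, 6 };    // _3: <a, b>

  // _1 and _2 have identical inputs, so their indices are equivalent
  // modulo 8; _3 has distinct inputs but never reads "b".  All three
  // describe the same single-input permutation of "a", so folding them
  // to that form lets later passes CSE them.
  assert (to_single_input (sel1, 8) == to_single_input (sel2, 8));
  assert (uses_first_input_only (sel3, 8));
  assert (sel3 == to_single_input (sel1, 8));
  return 0;
}
```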
On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> This patch changes vec_perm_indices from a plain vec<> to a class >>>>> that stores a canonicalised permutation, using the same encoding >>>>> as for VECTOR_CSTs. This means that vec_perm_indices now carries >>>>> information about the number of vectors being permuted (currently >>>>> always 1 or 2) and the number of elements in each input vector. >>>> >>>> Before I dive into the C++ details can you explain why it needs this >>>> info and how it encodes it for variable-length vectors? To interleave >>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we >>>> can directly encode N here, can we? extract even/odd should just >>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute >>>> one or two vectors (the one vector case just has two times the same >>>> vector) or how many elements each of the vectors (or the result) has. >>> >>> One of the later patches switches the element types to HOST_WIDE_INT, >>> so that we can represent all ssizetypes. Then there's a poly_int >>> patch (not yet posted) to make that poly_int64, so that we can >>> represent the N even for variable-length vectors. >>> >>> The class needs to know the number of elements because that affects >>> the canonical representation. E.g. extract even on fixed-length >>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... }, >>> which we can't encode as a simple series. Interleave low with both >>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and >>> variable-length vectors. >> >> Huh? extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector >> are referencing concat'ed input vectors. So yes, for two same vectors >> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why >> that should >> be the canonical form? > > Current practice is to use the single-input form where possible, > if both inputs are the same (see e.g. the VEC_PERM_EXPR handling > in fold-const.c). It means that things like: > > _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>; > _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>; > _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>; > > get folded to the same sequence, and so can be CSEd. > > We could instead convert the single-input form to use the two-input > selector, but that would be harder. The advantage of treating the > single-input form as canonical is that it works even for irregular > permutes. Ok, I see. Maybe adding a comment along this helps. Thanks, Richard. > Thanks, > Richard > >>> Also, operator[] is supposed to return an in-range selector even if >>> the selector element is only implicitly encoded. So we need to know >>> the number of input elements there. >>> >>> Separating the number of input elements into the number of inputs >>> and the number of elements per input isn't really necessary, but made >>> it easier to provide routines for testing whether all selected >>> elements come from a particular input, and for rotating the selector >>> by a whole number of inputs. >>> >>> Thanks, >>> Richard
Richard Biener <richard.guenther@gmail.com> writes: > On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Richard Biener <richard.guenther@gmail.com> writes: >>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> This patch changes vec_perm_indices from a plain vec<> to a class >>>>>> that stores a canonicalised permutation, using the same encoding >>>>>> as for VECTOR_CSTs. This means that vec_perm_indices now carries >>>>>> information about the number of vectors being permuted (currently >>>>>> always 1 or 2) and the number of elements in each input vector. >>>>> >>>>> Before I dive into the C++ details can you explain why it needs this >>>>> info and how it encodes it for variable-length vectors? To interleave >>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we >>>>> can directly encode N here, can we? extract even/odd should just >>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute >>>>> one or two vectors (the one vector case just has two times the same >>>>> vector) or how many elements each of the vectors (or the result) has. >>>> >>>> One of the later patches switches the element types to HOST_WIDE_INT, >>>> so that we can represent all ssizetypes. Then there's a poly_int >>>> patch (not yet posted) to make that poly_int64, so that we can >>>> represent the N even for variable-length vectors. >>>> >>>> The class needs to know the number of elements because that affects >>>> the canonical representation. E.g. extract even on fixed-length >>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... }, >>>> which we can't encode as a simple series. Interleave low with both >>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and >>>> variable-length vectors. >>> >>> Huh? extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector >>> are referencing concat'ed input vectors. So yes, for two same vectors >>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why >>> that should >>> be the canonical form? >> >> Current practice is to use the single-input form where possible, >> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling >> in fold-const.c). It means that things like: >> >> _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>; >> _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>; >> _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>; >> >> get folded to the same sequence, and so can be CSEd. >> >> We could instead convert the single-input form to use the two-input >> selector, but that would be harder. The advantage of treating the >> single-input form as canonical is that it works even for irregular >> permutes. > > Ok, I see. Maybe adding a comment along this helps. OK, thanks, installed as below with that change. Richard 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * int-vector-builder.h: New file. * vec-perm-indices.h: Include int-vector-builder.h. (vec_perm_indices): Redefine as an int_vector_builder. (auto_vec_perm_indices): Delete. (vec_perm_builder): Redefine as a stand-alone class. (vec_perm_indices::vec_perm_indices): New function. (vec_perm_indices::clamp): Likewise. * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h. 
(vec_perm_indices::new_vector): New function. (vec_perm_indices::new_expanded_vector): Update for new vec_perm_indices class. (vec_perm_indices::rotate_inputs): New function. (vec_perm_indices::all_in_range_p): Operate directly on the encoded form, without computing elided elements. (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST encoding. Update for new vec_perm_indices class. * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for the given vec_perm_builder. (expand_vec_perm_var): Update vec_perm_builder constructor. (expand_mult_highpart): Use vec_perm_builder instead of auto_vec_perm_indices. * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use a single or double series encoding as appropriate. * fold-const.c (fold_ternary_loc): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_mask_load_store): Likewise. (vectorizable_bswap): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use tree_to_vec_perm_builder to read the vector from a tree. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a vec_perm_builder instead of a vec_perm_indices. (have_whole_vector_shift): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Leave the truncation to calc_vec_perm_mask_for_shift. (vect_create_epilog_for_reduction): Likewise. * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (aarch64_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/arm/arm.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (arm_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even): Update vec_perm_builder constructor. (rs6000_expand_interleave): Likewise. * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise. (rs6000_expand_interleave): Likewise. ------------------------------------------------------------------------------ Index: gcc/int-vector-builder.h =================================================================== --- /dev/null 2017-12-30 11:27:13.464311244 +0000 +++ gcc/int-vector-builder.h 2018-01-02 17:01:28.746627393 +0000 @@ -0,0 +1,90 @@ +/* A class for building vector integer constants. + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. 
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_INT_VECTOR_BUILDER_H +#define GCC_INT_VECTOR_BUILDER_H 1 + +#include "vector-builder.h" + +/* This class is used to build vectors of integer type T using the same + encoding as tree and rtx constants. See vector_builder for more + details. */ +template<typename T> +class int_vector_builder : public vector_builder<T, int_vector_builder<T> > +{ + typedef vector_builder<T, int_vector_builder> parent; + friend class vector_builder<T, int_vector_builder>; + +public: + int_vector_builder () {} + int_vector_builder (unsigned int, unsigned int, unsigned int); + + using parent::new_vector; + +private: + bool equal_p (T, T) const; + bool allow_steps_p () const { return true; } + bool integral_p (T) const { return true; } + T step (T, T) const; + T apply_step (T, unsigned int, T) const; + bool can_elide_p (T) const { return true; } + void note_representative (T *, T) {} +}; + +/* Create a new builder for a vector with FULL_NELTS elements. + Initially encode the value as NPATTERNS interleaved patterns with + NELTS_PER_PATTERN elements each. */ + +template<typename T> +inline +int_vector_builder<T>::int_vector_builder (unsigned int full_nelts, + unsigned int npatterns, + unsigned int nelts_per_pattern) +{ + new_vector (full_nelts, npatterns, nelts_per_pattern); +} + +/* Return true if elements ELT1 and ELT2 are equal. */ + +template<typename T> +inline bool +int_vector_builder<T>::equal_p (T elt1, T elt2) const +{ + return elt1 == elt2; +} + +/* Return the value of element ELT2 minus the value of element ELT1. */ + +template<typename T> +inline T +int_vector_builder<T>::step (T elt1, T elt2) const +{ + return elt2 - elt1; +} + +/* Return a vector element with the value BASE + FACTOR * STEP. */ + +template<typename T> +inline T +int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const +{ + return base + factor * step; +} + +#endif Index: gcc/vec-perm-indices.h =================================================================== --- gcc/vec-perm-indices.h 2018-01-02 17:01:26.633719678 +0000 +++ gcc/vec-perm-indices.h 2018-01-02 18:25:11.081076126 +0000 @@ -20,30 +20,117 @@ Software Foundation; either version 3, o #ifndef GCC_VEC_PERN_INDICES_H #define GCC_VEC_PERN_INDICES_H 1 +#include "int-vector-builder.h" + +/* A vector_builder for building constant permutation vectors. + The elements do not need to be clamped to a particular range + of input elements. */ +typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder; + /* This class represents a constant permutation vector, such as that used - as the final operand to a VEC_PERM_EXPR. */ -class vec_perm_indices : public auto_vec<unsigned short, 32> + as the final operand to a VEC_PERM_EXPR. + + Permutation vectors select indices modulo the number of input elements, + and the class canonicalizes each permutation vector for a particular + number of input vectors and for a particular number of elements per + input. 
For example, the gimple statements: + + _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>; + _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>; + _3 = VEC_PERM_EXPR <a, a, { 0, 2, 20, 22, 24, 2, 4, 14 }>; + + effectively have only a single vector input "a". If "a" has 8 + elements, the indices select elements modulo 8, which makes all three + VEC_PERM_EXPRs equivalent. The canonical form is for the indices to be + in the range [0, number of input elements - 1], so the class treats the + second and third permutation vectors as though they had been the first. + + The class copes with cases in which the input and output vectors have + different numbers of elements. */ +class vec_perm_indices { - typedef unsigned short element_type; - typedef auto_vec<element_type, 32> parent_type; + typedef HOST_WIDE_INT element_type; public: - vec_perm_indices () {} - vec_perm_indices (unsigned int nunits) : parent_type (nunits) {} + vec_perm_indices (); + vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int); + void new_vector (const vec_perm_builder &, unsigned int, unsigned int); void new_expanded_vector (const vec_perm_indices &, unsigned int); + void rotate_inputs (int delta); + + /* Return the underlying vector encoding. */ + const vec_perm_builder &encoding () const { return m_encoding; } + + /* Return the number of output elements. This is called length () + so that we present a more vec-like interface. */ + unsigned int length () const { return m_encoding.full_nelts (); } + /* Return the number of input vectors being permuted. */ + unsigned int ninputs () const { return m_ninputs; } + + /* Return the number of elements in each input vector. */ + unsigned int nelts_per_input () const { return m_nelts_per_input; } + + /* Return the total number of input elements. */ + unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; } + + element_type clamp (element_type) const; + element_type operator[] (unsigned int i) const; bool all_in_range_p (element_type, element_type) const; private: vec_perm_indices (const vec_perm_indices &); -}; -/* Temporary. */ -typedef vec_perm_indices vec_perm_builder; -typedef vec_perm_indices auto_vec_perm_indices; + vec_perm_builder m_encoding; + unsigned int m_ninputs; + unsigned int m_nelts_per_input; +}; bool tree_to_vec_perm_builder (vec_perm_builder *, tree); rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &); +inline +vec_perm_indices::vec_perm_indices () + : m_ninputs (0), + m_nelts_per_input (0) +{ +} + +/* Construct a permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of + the new vector from ELEMENTS, clamping each one to be in range. */ + +inline +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + new_vector (elements, ninputs, nelts_per_input); +} + +/* Return the canonical value for permutation vector element ELT, + taking into account the current number of input elements. */ + +inline vec_perm_indices::element_type +vec_perm_indices::clamp (element_type elt) const +{ + element_type limit = input_nelts (); + elt %= limit; + /* Treat negative elements as counting from the end. This only matters + if the vector size is not a power of 2. */ + if (elt < 0) + elt += limit; + return elt; +} + +/* Return the value of vector element I, which might or might not be + explicitly encoded. 
*/ + +inline vec_perm_indices::element_type +vec_perm_indices::operator[] (unsigned int i) const +{ + return clamp (m_encoding.elt (i)); +} + #endif Index: gcc/vec-perm-indices.c =================================================================== --- gcc/vec-perm-indices.c 2018-01-02 17:01:26.632719721 +0000 +++ gcc/vec-perm-indices.c 2018-01-02 17:01:28.750627219 +0000 @@ -22,11 +22,33 @@ Software Foundation; either version 3, o #include "coretypes.h" #include "vec-perm-indices.h" #include "tree.h" +#include "fold-const.h" +#include "tree-vector-builder.h" #include "backend.h" #include "rtl.h" #include "memmodel.h" #include "emit-rtl.h" +/* Switch to a new permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of the + new permutation vector from ELEMENTS, clamping each one to be in range. */ + +void +vec_perm_indices::new_vector (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + m_ninputs = ninputs; + m_nelts_per_input = nelts_per_input; + /* Expand the encoding and clamp each element. E.g. { 0, 2, 4, ... } + might wrap halfway if there is only one vector input. */ + unsigned int full_nelts = elements.full_nelts (); + m_encoding.new_vector (full_nelts, full_nelts, 1); + for (unsigned int i = 0; i < full_nelts; ++i) + m_encoding.quick_push (clamp (elements.elt (i))); + m_encoding.finalize (); +} + /* Switch to a new permutation vector that selects the same input elements as ORIG, but with each element split into FACTOR pieces. For example, if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is @@ -36,14 +58,31 @@ Software Foundation; either version 3, o vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig, unsigned int factor) { - truncate (0); - reserve (orig.length () * factor); - for (unsigned int i = 0; i < orig.length (); ++i) + m_ninputs = orig.m_ninputs; + m_nelts_per_input = orig.m_nelts_per_input * factor; + m_encoding.new_vector (orig.m_encoding.full_nelts () * factor, + orig.m_encoding.npatterns () * factor, + orig.m_encoding.nelts_per_pattern ()); + unsigned int encoded_nelts = orig.m_encoding.encoded_nelts (); + for (unsigned int i = 0; i < encoded_nelts; ++i) { - element_type base = orig[i] * factor; + element_type base = orig.m_encoding[i] * factor; for (unsigned int j = 0; j < factor; ++j) - quick_push (base + j); + m_encoding.quick_push (base + j); } + m_encoding.finalize (); +} + +/* Rotate the inputs of the permutation right by DELTA inputs. This changes + the values of the permutation vector but it doesn't change the way that + the elements are encoded. */ + +void +vec_perm_indices::rotate_inputs (int delta) +{ + element_type element_delta = delta * m_nelts_per_input; + for (unsigned int i = 0; i < m_encoding.length (); ++i) + m_encoding[i] = clamp (m_encoding[i] + element_delta); } /* Return true if all elements of the permutation vector are in the range @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c bool vec_perm_indices::all_in_range_p (element_type start, element_type size) const { - for (unsigned int i = 0; i < length (); ++i) - if ((*this)[i] < start || ((*this)[i] - start) >= size) + /* Check the first two elements of each pattern. 
*/ + unsigned int npatterns = m_encoding.npatterns (); + unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern (); + unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2); + for (unsigned int i = 0; i < base_nelts; ++i) + if (m_encoding[i] < start || (m_encoding[i] - start) >= size) return false; + + /* For stepped encodings, check the full range of the series. */ + if (nelts_per_pattern == 3) + { + element_type limit = input_nelts (); + + /* The number of elements in each pattern beyond the first two + that we checked above. */ + unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2; + for (unsigned int i = 0; i < npatterns; ++i) + { + /* BASE1 has been checked but BASE2 hasn't. */ + element_type base1 = m_encoding[i + npatterns]; + element_type base2 = m_encoding[i + base_nelts]; + + /* The step to add to get from BASE1 to each subsequent value. */ + element_type step = clamp (base2 - base1); + + /* STEP has no inherent sign, so a value near LIMIT can + act as a negative step. The series is in range if it + is in range according to one of the two interpretations. + + Since we're dealing with clamped values, ELEMENT_TYPE is + wide enough for overflow not to be a problem. */ + element_type headroom_down = base1 - start; + element_type headroom_up = size - headroom_down - 1; + if (headroom_up < step * step_nelts + && headroom_down < (limit - step) * step_nelts) + return false; + } + } return true; } @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen bool tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst) { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)); - for (unsigned int i = 0; i < nelts; ++i) - if (!tree_fits_shwi_p (vector_cst_elt (cst, i))) + unsigned int encoded_nelts = vector_cst_encoded_nelts (cst); + for (unsigned int i = 0; i < encoded_nelts; ++i) + if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i))) return false; - builder->reserve (nelts); - for (unsigned int i = 0; i < nelts; ++i) - builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i)) - & (2 * nelts - 1)); + builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)), + VECTOR_CST_NPATTERNS (cst), + VECTOR_CST_NELTS_PER_PATTERN (cst)); + for (unsigned int i = 0; i < encoded_nelts; ++i) + builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i))); return true; } Index: gcc/optabs.c =================================================================== --- gcc/optabs.c 2018-01-02 17:01:27.762670303 +0000 +++ gcc/optabs.c 2018-01-02 17:01:28.747627350 +0000 @@ -5476,6 +5476,11 @@ expand_vec_perm_const (machine_mode mode rtx_insn *last = get_last_insn (); bool single_arg_p = rtx_equal_p (v0, v1); + /* Always specify two input vectors here and leave the target to handle + cases in which the inputs are equal. Not all backends can cope with + the single-input representation when testing for a double-input + target instruction. */ + vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode)); /* See if this can be handled with a vec_shr. We only do this if the second vector is all zeroes. 
*/ @@ -5488,7 +5493,7 @@ expand_vec_perm_const (machine_mode mode && (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)) { - rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel); + rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices); if (shift_amt) { struct expand_operand ops[3]; @@ -5520,7 +5525,7 @@ expand_vec_perm_const (machine_mode mode else v1 = force_reg (mode, v1); - if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel)) + if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) return target; } @@ -5529,7 +5534,7 @@ expand_vec_perm_const (machine_mode mode rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX; if (qimode != VOIDmode) { - qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode)); + qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode)); target_qi = gen_reg_rtx (qimode); v0_qi = gen_lowpart (qimode, v0); v1_qi = gen_lowpart (qimode, v1); @@ -5556,7 +5561,7 @@ expand_vec_perm_const (machine_mode mode REQUIRED_SEL_MODE is OK. */ if (sel_mode != required_sel_mode) { - if (!selector_fits_mode_p (required_sel_mode, sel)) + if (!selector_fits_mode_p (required_sel_mode, indices)) { delete_insns_since (last); return NULL_RTX; @@ -5567,7 +5572,7 @@ expand_vec_perm_const (machine_mode mode insn_code icode = direct_optab_handler (vec_perm_optab, mode); if (icode != CODE_FOR_nothing) { - rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel); + rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices); rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx); if (tmp) return tmp; @@ -5642,7 +5647,7 @@ expand_vec_perm_var (machine_mode mode, gcc_assert (sel != NULL); /* Broadcast the low byte each element into each of its bytes. */ - vec_perm_builder const_sel (w); + vec_perm_builder const_sel (w, w, 1); for (i = 0; i < w; ++i) { int this_e = i / u * u; @@ -5890,7 +5895,7 @@ expand_mult_highpart (machine_mode mode, expand_insn (optab_handler (tab2, mode), 3, eops); m2 = gen_lowpart (mode, eops[0].value); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); if (method == 2) { for (i = 0; i < nunits; ++i) Index: gcc/optabs-query.c =================================================================== --- gcc/optabs-query.c 2018-01-02 17:01:27.761670346 +0000 +++ gcc/optabs-query.c 2018-01-02 17:01:28.746627393 +0000 @@ -516,12 +516,13 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1) + ((i & 1) ? nunits : 0)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 2; } } @@ -532,10 +533,11 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 
0 : 1)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 3; } } Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2018-01-02 17:01:26.628719897 +0000 +++ gcc/fold-const.c 2018-01-02 17:01:28.746627393 +0000 @@ -11373,7 +11373,7 @@ fold_ternary_loc (location_t loc, enum t { unsigned int nelts = VECTOR_CST_NELTS (arg0), i; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg0, i); @@ -11384,7 +11384,8 @@ fold_ternary_loc (location_t loc, enum t else /* Currently unreachable. */ return NULL_TREE; } - tree t = fold_vec_perm (type, arg1, arg2, sel); + tree t = fold_vec_perm (type, arg1, arg2, + vec_perm_indices (sel, 2, nelts)); if (t != NULL_TREE) return t; } @@ -11716,8 +11717,8 @@ fold_ternary_loc (location_t loc, enum t mask2 = 2 * nelts - 1; mask = single_arg ? (nelts - 1) : mask2; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); - auto_vec_perm_indices sel2 (nelts); + vec_perm_builder sel (nelts, nelts, 1); + vec_perm_builder sel2 (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg2, i); @@ -11762,12 +11763,13 @@ fold_ternary_loc (location_t loc, enum t need_mask_canon = true; } + vec_perm_indices indices (sel, 2, nelts); if ((TREE_CODE (op0) == VECTOR_CST || TREE_CODE (op0) == CONSTRUCTOR) && (TREE_CODE (op1) == VECTOR_CST || TREE_CODE (op1) == CONSTRUCTOR)) { - tree t = fold_vec_perm (type, op0, op1, sel); + tree t = fold_vec_perm (type, op0, op1, indices); if (t != NULL_TREE) return t; } @@ -11779,11 +11781,14 @@ fold_ternary_loc (location_t loc, enum t argument permutation while still allowing an equivalent 2-argument version. 
*/ if (need_mask_canon && arg2 == op2 - && !can_vec_perm_const_p (TYPE_MODE (type), sel, false) - && can_vec_perm_const_p (TYPE_MODE (type), sel2, false)) + && !can_vec_perm_const_p (TYPE_MODE (type), indices, false) + && can_vec_perm_const_p (TYPE_MODE (type), + vec_perm_indices (sel2, 2, nelts), + false)) { need_mask_canon = need_mask_canon2; - sel = sel2; + sel.truncate (0); + sel.splice (sel2); } if (need_mask_canon && arg2 == op2) Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c 2018-01-02 17:01:26.630719809 +0000 +++ gcc/tree-ssa-forwprop.c 2018-01-02 17:01:28.747627350 +0000 @@ -2018,7 +2018,7 @@ simplify_vector_constructor (gimple_stmt elem_type = TREE_TYPE (type); elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); orig = NULL; conv_code = ERROR_MARK; maybe_ident = true; @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt { tree mask_type; - if (!can_vec_perm_const_p (TYPE_MODE (type), sel)) + vec_perm_indices indices (sel, 1, nelts); + if (!can_vec_perm_const_p (TYPE_MODE (type), indices)) return false; mask_type = build_vector_type (build_nonstandard_integer_type (elem_size, 1), Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-01-02 17:01:26.630719809 +0000 +++ gcc/tree-vect-data-refs.c 2018-01-02 17:01:28.748627306 +0000 @@ -4579,7 +4579,7 @@ vect_grouped_store_supported (tree vecty if (VECTOR_MODE_P (mode)) { unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) @@ -4587,6 +4587,7 @@ vect_grouped_store_supported (tree vecty unsigned int j0 = 0, j1 = 0, j2 = 0; unsigned int i, j; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4601,7 +4602,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4618,7 +4620,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4638,11 +4641,13 @@ vect_grouped_store_supported (tree vecty sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -4744,7 +4749,7 @@ vect_permute_store_chain (vec<tree> dr_c unsigned int i, n, log_length = exact_log2 (length); unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -4755,6 +4760,7 @@ vect_permute_store_chain (vec<tree> dr_c { unsigned int j0 = 0, j1 = 0, j2 = 0; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4770,7 +4776,8 @@ vect_permute_store_chain 
(vec<tree> dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) { @@ -4781,7 +4788,8 @@ vect_permute_store_chain (vec<tree> dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); vect1 = dr_chain[0]; vect2 = dr_chain[1]; @@ -4818,11 +4826,13 @@ vect_permute_store_chain (vec<tree> dr_c sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_high = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, n = log_length; i < n; i++) { @@ -5167,11 +5177,12 @@ vect_grouped_load_supported (tree vectyp if (VECTOR_MODE_P (mode)) { unsigned int i, j, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) { + vec_perm_indices indices; unsigned int k; for (k = 0; k < 3; k++) { @@ -5180,7 +5191,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = 3 * i + k; else sel[i] = 0; - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5193,7 +5205,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5208,13 +5221,16 @@ vect_grouped_load_supported (tree vectyp { /* If length is not equal to 3 then only power of 2 is supported. 
*/ gcc_assert (pow2p_hwi (count)); + for (i = 0; i < nelt; i++) sel[i] = i * 2; - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] = i * 2 + 1; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -5329,7 +5345,7 @@ vect_permute_load_chain (vec<tree> dr_ch unsigned int i, j, log_length = exact_log2 (length); unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5340,6 +5356,7 @@ vect_permute_load_chain (vec<tree> dr_ch { unsigned int k; + vec_perm_indices indices; for (k = 0; k < 3; k++) { for (i = 0; i < nelt; i++) @@ -5347,15 +5364,16 @@ vect_permute_load_chain (vec<tree> dr_ch sel[i] = 3 * i + k; else sel[i] = 0; - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, j = 0; i < nelt; i++) if (3 * i + k < 2 * nelt) sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); first_vect = dr_chain[0]; second_vect = dr_chain[1]; @@ -5387,11 +5405,13 @@ vect_permute_load_chain (vec<tree> dr_ch for (i = 0; i < nelt; ++i) sel[i] = i * 2; - perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; ++i) sel[i] = i * 2 + 1; - perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5527,7 +5547,7 @@ vect_shift_permute_load_chain (vec<tree> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5541,7 +5561,8 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = i * 2; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2 + 1; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5549,13 +5570,14 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt / 2; ++i) sel[i] = i * 2 + 1; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5563,20 +5585,21 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. 
For vector length 8 it is {4 5 6 7 8 9 10 11}. */ for (i = 0; i < nelt; i++) sel[i] = nelt / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to select vector from 2. For vector length 8 it is {0 1 2 3 12 13 14 15}. */ @@ -5584,14 +5607,15 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = i; for (i = nelt / 2; i < nelt; i++) sel[i] = nelt + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "select is not supported by target\n"); return false; } - select_mask = vect_gen_perm_mask_checked (vectype, sel); + select_mask = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5647,7 +5671,8 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = 3 * k + (l % 3); k++; } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5655,59 +5680,63 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm3_mask = vect_gen_perm_mask_checked (vectype, sel); + perm3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {6 7 8 9 10 11 12 13}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + 1 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift2_mask = vect_gen_perm_mask_checked (vectype, sel); + shift2_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {3 4 5 6 7 8 9 10}. 
*/ for (i = 0; i < nelt; i++) sel[i] = (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift3_mask = vect_gen_perm_mask_checked (vectype, sel); + shift3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift4_mask = vect_gen_perm_mask_checked (vectype, sel); + shift4_mask = vect_gen_perm_mask_checked (vectype, indices); for (k = 0; k < 3; k++) { Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2018-01-02 17:01:26.632719721 +0000 +++ gcc/tree-vect-slp.c 2018-01-02 17:01:28.749627263 +0000 @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) { unsigned int count = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (count); + vec_perm_builder sel (count, count, 1); for (i = 0; i < count; ++i) { unsigned int elt = i; @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, elt += count; sel.quick_push (elt); } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, count); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { for (i = 0; i < group_size; ++i) if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code) @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); mask_type = get_vectype_for_scalar_type (mask_element_type); nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices mask (nunits); + vec_perm_builder mask (nunits, nunits, 1); mask.quick_grow (nunits); + vec_perm_indices indices; /* Initialize the vect stmts of NODE to properly insert the generated stmts later. */ @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n noop_p = false; mask[index++] = mask_element; - if (index == nunits) + if (index == nunits && !noop_p) { - if (! noop_p - && ! can_vec_perm_const_p (mode, mask)) + indices.new_vector (mask, 2, nunits); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) { @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n vect_location, "unsupported vect permute { "); for (i = 0; i < nunits; ++i) - dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]); + dump_printf (MSG_MISSED_OPTIMIZATION, + HOST_WIDE_INT_PRINT_DEC " ", mask[i]); dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); } gcc_assert (analyze_only); return false; } - if (! 
noop_p) - ++*n_perms; + ++*n_perms; + } + if (index == nunits) + { if (!analyze_only) { tree mask_vec = NULL_TREE; @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod enum tree_code code0 = gimple_assign_rhs_code (stmt); enum tree_code ocode = ERROR_MARK; gimple *ostmt; - auto_vec_perm_indices mask (group_size); + vec_perm_builder mask (group_size, group_size, 1); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt) if (gimple_assign_rhs_code (ostmt) != code0) { Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2018-01-02 17:01:26.632719721 +0000 +++ gcc/tree-vect-stmts.c 2018-01-02 17:01:28.750627219 +0000 @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype) nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (nunits - 1 - i); - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 1, nunits); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) return NULL_TREE; - return vect_gen_perm_mask_checked (vectype, sel); + return vect_gen_perm_mask_checked (vectype, indices); } /* A subroutine of get_load_store_type, with a subset of the same @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); sel.quick_grow (nunits); for (i = 0; i < nunits; ++i) sel[i] = i < gather_off_nunits ? i : i + nunits - gather_off_nunits; + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); ncopies *= 2; + for (i = 0; i < nunits; ++i) sel[i] = i | gather_off_nunits; - mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel); + indices.new_vector (sel, 2, gather_off_nunits); + mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices); } else gcc_unreachable (); @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype); unsigned word_bytes = num_bytes / nunits; - auto_vec_perm_indices elts (num_bytes); + vec_perm_builder elts (num_bytes, num_bytes, 1); for (unsigned i = 0; i < nunits; ++i) for (unsigned j = 0; j < word_bytes; ++j) elts.quick_push ((i + 1) * word_bytes - j - 1); - if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts)) + vec_perm_indices indices (elts, 1, num_bytes); + if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices)) return false; if (! 
vec_stmt) @@ -5826,22 +5833,25 @@ vectorizable_store (gimple *stmt, gimple { modifier = WIDEN; - auto_vec_perm_indices sel (scatter_off_nunits); + vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1); for (i = 0; i < (unsigned int) scatter_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, scatter_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); gcc_assert (perm_mask != NULL_TREE); } else if (nunits == (unsigned int) scatter_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < (unsigned int) nunits; ++i) sel.quick_push (i | scatter_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); gcc_assert (perm_mask != NULL_TREE); ncopies *= 2; } @@ -6862,22 +6872,25 @@ vectorizable_load (gimple *stmt, gimple_ { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (i < gather_off_nunits ? i : i + nunits - gather_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); ncopies *= 2; } else Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2018-01-02 17:01:26.631719765 +0000 +++ gcc/tree-vect-generic.c 2018-01-02 17:01:28.748627306 +0000 @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs mask = gimple_assign_rhs1 (def_stmt); } - if (TREE_CODE (mask) == VECTOR_CST) - { - auto_vec_perm_indices sel_int (elements); - - for (i = 0; i < elements; ++i) - sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) - & (2 * elements - 1)); + vec_perm_builder sel_int; - if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int)) + if (TREE_CODE (mask) == VECTOR_CST + && tree_to_vec_perm_builder (&sel_int, mask)) + { + vec_perm_indices indices (sel_int, 2, elements); + if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices)) { gimple_assign_set_rhs3 (stmt, mask); update_stmt (stmt); @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs != CODE_FOR_nothing && TREE_CODE (vec1) == VECTOR_CST && initializer_zerop (vec1) - && sel_int[0] - && sel_int[0] < elements) + && indices[0] + && indices[0] < elements) { for (i = 1; i < elements; ++i) { - unsigned int expected = i + sel_int[0]; + unsigned int expected = i + indices[0]; /* Indices into the second vector are all equivalent. 
*/ - if (MIN (elements, (unsigned) sel_int[i]) + if (MIN (elements, (unsigned) indices[i]) != MIN (elements, expected)) break; } Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2018-01-02 17:01:26.631719765 +0000 +++ gcc/tree-vect-loop.c 2018-01-02 17:01:28.749627263 +0000 @@ -3713,12 +3713,11 @@ vect_estimate_min_profitable_iters (loop vector elements (not bits) for a vector with NELT elements. */ static void calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt, - vec_perm_indices *sel) + vec_perm_builder *sel) { - unsigned int i; - - for (i = 0; i < nelt; i++) - sel->quick_push ((i + offset) & (2 * nelt - 1)); + sel->new_vector (nelt, nelt, 1); + for (unsigned int i = 0; i < nelt; i++) + sel->quick_push (i + offset); } /* Checks whether the target supports whole-vector shifts for vectors of mode @@ -3731,13 +3730,13 @@ have_whole_vector_shift (machine_mode mo return true; unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); - + vec_perm_builder sel; + vec_perm_indices indices; for (i = nelt/2; i >= 1; i/=2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (i, nelt, &sel); - if (!can_vec_perm_const_p (mode, sel, false)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices, false)) return false; } return true; @@ -5055,7 +5054,8 @@ vect_create_epilog_for_reduction (vec<tr if (reduce_with_shift && !slp_reduc) { int nelements = vec_size_in_bits / element_bitsize; - auto_vec_perm_indices sel (nelements); + vec_perm_builder sel; + vec_perm_indices indices; int elt_offset; @@ -5079,9 +5079,9 @@ vect_create_epilog_for_reduction (vec<tr elt_offset >= 1; elt_offset /= 2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); - tree mask = vect_gen_perm_mask_any (vectype, sel); + indices.new_vector (sel, 2, nelements); + tree mask = vect_gen_perm_mask_any (vectype, indices); epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR, new_temp, zero_vec, mask); new_name = make_ssa_name (vec_dest, epilog_stmt); Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2018-01-02 17:01:26.602721036 +0000 +++ gcc/config/aarch64/aarch64.c 2018-01-02 17:01:28.736627829 +0000 @@ -13252,7 +13252,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -13642,10 +13642,7 @@ aarch64_expand_vec_perm_const_1 (struct unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - gcc_assert (nelt == (nelt & -nelt)); - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] ^= nelt; /* Keep the same index, but in the other vector. */ - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -13685,12 +13682,10 @@ aarch64_vectorize_vec_perm_const (machin /* Calculate whether all elements are in one vector. */ unsigned int nelt = sel.length (); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { unsigned int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -13709,8 +13704,6 @@ aarch64_vectorize_vec_perm_const (machin input vector. */ /* Fall Through. 
*/ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -13721,6 +13714,8 @@ aarch64_vectorize_vec_perm_const (machin break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (!d.testing_p) return aarch64_expand_vec_perm_const_1 (&d); Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2018-01-02 17:01:26.604720948 +0000 +++ gcc/config/arm/arm.c 2018-01-02 17:01:28.739627698 +0000 @@ -28854,7 +28854,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -29362,9 +29362,7 @@ arm_expand_vec_perm_const_1 (struct expa unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -29404,12 +29402,10 @@ arm_vectorize_vec_perm_const (machine_mo d.testing_p = !target; nelt = GET_MODE_NUNITS (d.vmode); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -29428,8 +29424,6 @@ arm_vectorize_vec_perm_const (machine_mo input vector. */ /* FALLTHRU */ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -29440,6 +29434,8 @@ arm_vectorize_vec_perm_const (machine_mo break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (d.testing_p) return arm_expand_vec_perm_const_1 (&d); Index: gcc/config/powerpcspe/powerpcspe.c =================================================================== --- gcc/config/powerpcspe/powerpcspe.c 2018-01-02 17:01:26.617720379 +0000 +++ gcc/config/powerpcspe/powerpcspe.c 2018-01-02 17:01:28.742627568 +0000 @@ -38782,7 +38782,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -38797,7 +38797,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++) Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2018-01-02 17:01:26.621720203 +0000 +++ gcc/config/rs6000/rs6000.c 2018-01-02 17:01:28.745627437 +0000 @@ -36042,7 +36042,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -36057,7 +36057,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++)
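
To make the encoding that the hunks above rely on easier to follow, here is a small standalone sketch. It is my own toy model rather than part of the patch: the names toy_encoding and decode_elt are invented for illustration, and only the decoding rule is meant to mirror the npatterns x nelts_per_pattern scheme that vec_perm_builder inherits from int_vector_builder, assuming trailing elements repeat when a pattern has fewer than three encoded elements and continue the series otherwise.

/* Standalone sketch, not GCC code: a toy model of the encoded selector.  */
#include <cassert>
#include <cstdio>
#include <vector>

struct toy_encoding
{
  unsigned int full_nelts;         /* Number of selector elements overall.  */
  unsigned int npatterns;          /* Number of interleaved patterns.  */
  unsigned int nelts_per_pattern;  /* 1 = splat, 2 = leading element plus a
                                      repeated value, 3 = leading element
                                      plus a linear series.  */
  std::vector<long> encoded;       /* npatterns * nelts_per_pattern values.  */

  /* Return element I of the full selector, extrapolating the elements
     that the encoding leaves implicit.  */
  long decode_elt (unsigned int i) const
  {
    unsigned int pattern = i % npatterns;
    unsigned int index = i / npatterns;
    if (index < nelts_per_pattern)
      return encoded[index * npatterns + pattern];
    long last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
    if (nelts_per_pattern < 3)
      return last;                 /* Trailing elements repeat.  */
    long prev = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
    return last + (long) (index - (nelts_per_pattern - 1)) * (last - prev);
  }
};

int main ()
{
  /* Interleave-low of two 8-element inputs, { 0, 8, 1, 9, 2, 10, ... }:
     two stepped patterns, so six encoded elements whatever the length.  */
  toy_encoding interleave = { 8, 2, 3, { 0, 8, 1, 9, 2, 10 } };
  /* Extract-even, { 0, 2, 4, ... }: a single stepped pattern.  */
  toy_encoding even = { 8, 1, 3, { 0, 2, 4 } };
  for (unsigned int i = 0; i < 8; ++i)
    printf ("%ld %ld\n", interleave.decode_elt (i), even.decode_elt (i));
  assert (interleave.decode_elt (7) == 11 && even.decode_elt (7) == 14);
  return 0;
}

The point of the sketch is that an interleave or extract-even selector needs the same fixed number of encoded elements however long the vector is, which is what lets the call sites above build a short vec_perm_builder and leave the expansion to vec_perm_indices.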
Index: gcc/int-vector-builder.h =================================================================== --- /dev/null 2017-12-09 13:59:56.352713187 +0000 +++ gcc/int-vector-builder.h 2017-12-09 22:48:47.545825268 +0000 @@ -0,0 +1,90 @@ +/* A class for building vector integer constants. + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_INT_VECTOR_BUILDER_H +#define GCC_INT_VECTOR_BUILDER_H 1 + +#include "vector-builder.h" + +/* This class is used to build vectors of integer type T using the same + encoding as tree and rtx constants. See vector_builder for more + details. */ +template<typename T> +class int_vector_builder : public vector_builder<T, int_vector_builder<T> > +{ + typedef vector_builder<T, int_vector_builder> parent; + friend class vector_builder<T, int_vector_builder>; + +public: + int_vector_builder () {} + int_vector_builder (unsigned int, unsigned int, unsigned int); + + using parent::new_vector; + +private: + bool equal_p (T, T) const; + bool allow_steps_p () const { return true; } + bool integral_p (T) const { return true; } + T step (T, T) const; + T apply_step (T, unsigned int, T) const; + bool can_elide_p (T) const { return true; } + void note_representative (T *, T) {} +}; + +/* Create a new builder for a vector with FULL_NELTS elements. + Initially encode the value as NPATTERNS interleaved patterns with + NELTS_PER_PATTERN elements each. */ + +template<typename T> +inline +int_vector_builder<T>::int_vector_builder (unsigned int full_nelts, + unsigned int npatterns, + unsigned int nelts_per_pattern) +{ + new_vector (full_nelts, npatterns, nelts_per_pattern); +} + +/* Return true if elements ELT1 and ELT2 are equal. */ + +template<typename T> +inline bool +int_vector_builder<T>::equal_p (T elt1, T elt2) const +{ + return elt1 == elt2; +} + +/* Return the value of element ELT2 minus the value of element ELT1. */ + +template<typename T> +inline T +int_vector_builder<T>::step (T elt1, T elt2) const +{ + return elt2 - elt1; +} + +/* Return a vector element with the value BASE + FACTOR * STEP. */ + +template<typename T> +inline T +int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const +{ + return base + factor * step; +} + +#endif Index: gcc/vec-perm-indices.h =================================================================== --- gcc/vec-perm-indices.h 2017-12-09 22:47:27.885318101 +0000 +++ gcc/vec-perm-indices.h 2017-12-09 22:48:47.548825399 +0000 @@ -20,30 +20,102 @@ Software Foundation; either version 3, o #ifndef GCC_VEC_PERN_INDICES_H #define GCC_VEC_PERN_INDICES_H 1 +#include "int-vector-builder.h" + +/* A vector_builder for building constant permutation vectors. + The elements do not need to be clamped to a particular range + of input elements. 
*/ +typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder; + /* This class represents a constant permutation vector, such as that used - as the final operand to a VEC_PERM_EXPR. */ -class vec_perm_indices : public auto_vec<unsigned short, 32> + as the final operand to a VEC_PERM_EXPR. The vector is canonicalized + for a particular number of input vectors and for a particular number + of elements per input. The class copes with cases in which the + input and output vectors have different numbers of elements. */ +class vec_perm_indices { - typedef unsigned short element_type; - typedef auto_vec<element_type, 32> parent_type; + typedef HOST_WIDE_INT element_type; public: - vec_perm_indices () {} - vec_perm_indices (unsigned int nunits) : parent_type (nunits) {} + vec_perm_indices (); + vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int); + void new_vector (const vec_perm_builder &, unsigned int, unsigned int); void new_expanded_vector (const vec_perm_indices &, unsigned int); + void rotate_inputs (int delta); + + /* Return the underlying vector encoding. */ + const vec_perm_builder &encoding () const { return m_encoding; } + + /* Return the number of output elements. This is called length () + so that we present a more vec-like interface. */ + unsigned int length () const { return m_encoding.full_nelts (); } + + /* Return the number of input vectors being permuted. */ + unsigned int ninputs () const { return m_ninputs; } + /* Return the number of elements in each input vector. */ + unsigned int nelts_per_input () const { return m_nelts_per_input; } + + /* Return the total number of input elements. */ + unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; } + + element_type clamp (element_type) const; + element_type operator[] (unsigned int i) const; bool all_in_range_p (element_type, element_type) const; private: vec_perm_indices (const vec_perm_indices &); -}; -/* Temporary. */ -typedef vec_perm_indices vec_perm_builder; -typedef vec_perm_indices auto_vec_perm_indices; + vec_perm_builder m_encoding; + unsigned int m_ninputs; + unsigned int m_nelts_per_input; +}; bool tree_to_vec_perm_builder (vec_perm_builder *, tree); rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &); +inline +vec_perm_indices::vec_perm_indices () + : m_ninputs (0), + m_nelts_per_input (0) +{ +} + +/* Construct a permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of + the new vector from ELEMENTS, clamping each one to be in range. */ + +inline +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + new_vector (elements, ninputs, nelts_per_input); +} + +/* Return the canonical value for permutation vector element ELT, + taking into account the current number of input elements. */ + +inline vec_perm_indices::element_type +vec_perm_indices::clamp (element_type elt) const +{ + element_type limit = input_nelts (); + elt %= limit; + /* Treat negative elements as counting from the end. This only matters + if the vector size is not a power of 2. */ + if (elt < 0) + elt += limit; + return elt; +} + +/* Return the value of vector element I, which might or might not be + explicitly encoded. 
*/ + +inline vec_perm_indices::element_type +vec_perm_indices::operator[] (unsigned int i) const +{ + return clamp (m_encoding.elt (i)); +} + #endif Index: gcc/vec-perm-indices.c =================================================================== --- gcc/vec-perm-indices.c 2017-12-09 22:47:27.885318101 +0000 +++ gcc/vec-perm-indices.c 2017-12-09 22:48:47.548825399 +0000 @@ -22,11 +22,33 @@ Software Foundation; either version 3, o #include "coretypes.h" #include "vec-perm-indices.h" #include "tree.h" +#include "fold-const.h" +#include "tree-vector-builder.h" #include "backend.h" #include "rtl.h" #include "memmodel.h" #include "emit-rtl.h" +/* Switch to a new permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of the + new permutation vector from ELEMENTS, clamping each one to be in range. */ + +void +vec_perm_indices::new_vector (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + m_ninputs = ninputs; + m_nelts_per_input = nelts_per_input; + /* Expand the encoding and clamp each element. E.g. { 0, 2, 4, ... } + might wrap halfway if there is only one vector input. */ + unsigned int full_nelts = elements.full_nelts (); + m_encoding.new_vector (full_nelts, full_nelts, 1); + for (unsigned int i = 0; i < full_nelts; ++i) + m_encoding.quick_push (clamp (elements.elt (i))); + m_encoding.finalize (); +} + /* Switch to a new permutation vector that selects the same input elements as ORIG, but with each element split into FACTOR pieces. For example, if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is @@ -36,14 +58,31 @@ Software Foundation; either version 3, o vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig, unsigned int factor) { - truncate (0); - reserve (orig.length () * factor); - for (unsigned int i = 0; i < orig.length (); ++i) + m_ninputs = orig.m_ninputs; + m_nelts_per_input = orig.m_nelts_per_input * factor; + m_encoding.new_vector (orig.m_encoding.full_nelts () * factor, + orig.m_encoding.npatterns () * factor, + orig.m_encoding.nelts_per_pattern ()); + unsigned int encoded_nelts = orig.m_encoding.encoded_nelts (); + for (unsigned int i = 0; i < encoded_nelts; ++i) { - element_type base = orig[i] * factor; + element_type base = orig.m_encoding[i] * factor; for (unsigned int j = 0; j < factor; ++j) - quick_push (base + j); + m_encoding.quick_push (base + j); } + m_encoding.finalize (); +} + +/* Rotate the inputs of the permutation right by DELTA inputs. This changes + the values of the permutation vector but it doesn't change the way that + the elements are encoded. */ + +void +vec_perm_indices::rotate_inputs (int delta) +{ + element_type element_delta = delta * m_nelts_per_input; + for (unsigned int i = 0; i < m_encoding.length (); ++i) + m_encoding[i] = clamp (m_encoding[i] + element_delta); } /* Return true if all elements of the permutation vector are in the range @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c bool vec_perm_indices::all_in_range_p (element_type start, element_type size) const { - for (unsigned int i = 0; i < length (); ++i) - if ((*this)[i] < start || ((*this)[i] - start) >= size) + /* Check the first two elements of each pattern. 
*/ + unsigned int npatterns = m_encoding.npatterns (); + unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern (); + unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2); + for (unsigned int i = 0; i < base_nelts; ++i) + if (m_encoding[i] < start || (m_encoding[i] - start) >= size) return false; + + /* For stepped encodings, check the full range of the series. */ + if (nelts_per_pattern == 3) + { + element_type limit = input_nelts (); + + /* The number of elements in each pattern beyond the first two + that we checked above. */ + unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2; + for (unsigned int i = 0; i < npatterns; ++i) + { + /* BASE1 has been checked but BASE2 hasn't. */ + element_type base1 = m_encoding[i + npatterns]; + element_type base2 = m_encoding[i + base_nelts]; + + /* The step to add to get from BASE1 to each subsequent value. */ + element_type step = clamp (base2 - base1); + + /* STEP has no inherent sign, so a value near LIMIT can + act as a negative step. The series is in range if it + is in range according to one of the two interpretations. + + Since we're dealing with clamped values, ELEMENT_TYPE is + wide enough for overflow not to be a problem. */ + element_type headroom_down = base1 - start; + element_type headroom_up = size - headroom_down - 1; + if (headroom_up < step * step_nelts + && headroom_down < (limit - step) * step_nelts) + return false; + } + } return true; } @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen bool tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst) { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)); - for (unsigned int i = 0; i < nelts; ++i) - if (!tree_fits_shwi_p (vector_cst_elt (cst, i))) + unsigned int encoded_nelts = vector_cst_encoded_nelts (cst); + for (unsigned int i = 0; i < encoded_nelts; ++i) + if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i))) return false; - builder->reserve (nelts); - for (unsigned int i = 0; i < nelts; ++i) - builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i)) - & (2 * nelts - 1)); + builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)), + VECTOR_CST_NPATTERNS (cst), + VECTOR_CST_NELTS_PER_PATTERN (cst)); + for (unsigned int i = 0; i < encoded_nelts; ++i) + builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i))); return true; } Index: gcc/optabs.c =================================================================== --- gcc/optabs.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/optabs.c 2017-12-09 22:48:47.546825312 +0000 @@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode rtx_insn *last = get_last_insn (); bool single_arg_p = rtx_equal_p (v0, v1); + /* Always specify two input vectors here and leave the target to handle + cases in which the inputs are equal. Not all backends can cope with + the single-input representation when testing for a double-input + target instruction. */ + vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode)); /* See if this can be handled with a vec_shr. We only do this if the second vector is all zeroes. 
*/ @@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode && (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)) { - rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel); + rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices); if (shift_amt) { struct expand_operand ops[3]; @@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode else v1 = force_reg (mode, v1); - if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel)) + if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) return target; } @@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX; if (qimode != VOIDmode) { - qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode)); + qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode)); target_qi = gen_reg_rtx (qimode); v0_qi = gen_lowpart (qimode, v0); v1_qi = gen_lowpart (qimode, v1); @@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode REQUIRED_SEL_MODE is OK. */ if (sel_mode != required_sel_mode) { - if (!selector_fits_mode_p (required_sel_mode, sel)) + if (!selector_fits_mode_p (required_sel_mode, indices)) { delete_insns_since (last); return NULL_RTX; @@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode insn_code icode = direct_optab_handler (vec_perm_optab, mode); if (icode != CODE_FOR_nothing) { - rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel); + rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices); rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx); if (tmp) return tmp; @@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode, gcc_assert (sel != NULL); /* Broadcast the low byte each element into each of its bytes. */ - vec_perm_builder const_sel (w); + vec_perm_builder const_sel (w, w, 1); for (i = 0; i < w; ++i) { int this_e = i / u * u; @@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode, expand_insn (optab_handler (tab2, mode), 3, eops); m2 = gen_lowpart (mode, eops[0].value); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); if (method == 2) { for (i = 0; i < nunits; ++i) Index: gcc/optabs-query.c =================================================================== --- gcc/optabs-query.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/optabs-query.c 2017-12-09 22:48:47.545825268 +0000 @@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1) + ((i & 1) ? nunits : 0)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 2; } } @@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 
0 : 1)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 3; } } Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/fold-const.c 2017-12-09 22:48:47.545825268 +0000 @@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t { unsigned int nelts = VECTOR_CST_NELTS (arg0), i; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg0, i); @@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t else /* Currently unreachable. */ return NULL_TREE; } - tree t = fold_vec_perm (type, arg1, arg2, sel); + tree t = fold_vec_perm (type, arg1, arg2, + vec_perm_indices (sel, 2, nelts)); if (t != NULL_TREE) return t; } @@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t mask2 = 2 * nelts - 1; mask = single_arg ? (nelts - 1) : mask2; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); - auto_vec_perm_indices sel2 (nelts); + vec_perm_builder sel (nelts, nelts, 1); + vec_perm_builder sel2 (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg2, i); @@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t need_mask_canon = true; } + vec_perm_indices indices (sel, 2, nelts); if ((TREE_CODE (op0) == VECTOR_CST || TREE_CODE (op0) == CONSTRUCTOR) && (TREE_CODE (op1) == VECTOR_CST || TREE_CODE (op1) == CONSTRUCTOR)) { - tree t = fold_vec_perm (type, op0, op1, sel); + tree t = fold_vec_perm (type, op0, op1, indices); if (t != NULL_TREE) return t; } @@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t argument permutation while still allowing an equivalent 2-argument version. 
*/ if (need_mask_canon && arg2 == op2 - && !can_vec_perm_const_p (TYPE_MODE (type), sel, false) - && can_vec_perm_const_p (TYPE_MODE (type), sel2, false)) + && !can_vec_perm_const_p (TYPE_MODE (type), indices, false) + && can_vec_perm_const_p (TYPE_MODE (type), + vec_perm_indices (sel2, 2, nelts), + false)) { need_mask_canon = need_mask_canon2; - sel = sel2; + sel.truncate (0); + sel.splice (sel2); } if (need_mask_canon && arg2 == op2) Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-ssa-forwprop.c 2017-12-09 22:48:47.546825312 +0000 @@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt elem_type = TREE_TYPE (type); elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); orig = NULL; conv_code = ERROR_MARK; maybe_ident = true; @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt { tree mask_type; - if (!can_vec_perm_const_p (TYPE_MODE (type), sel)) + vec_perm_indices indices (sel, 1, nelts); + if (!can_vec_perm_const_p (TYPE_MODE (type), indices)) return false; mask_type = build_vector_type (build_nonstandard_integer_type (elem_size, 1), Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-vect-data-refs.c 2017-12-09 22:48:47.546825312 +0000 @@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty if (VECTOR_MODE_P (mode)) { unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) @@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty unsigned int j0 = 0, j1 = 0, j2 = 0; unsigned int i, j; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec<tree> dr_c unsigned int i, n, log_length = exact_log2 (length); unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec<tree> dr_c { unsigned int j0 = 0, j1 = 0, j2 = 0; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4757,7 +4763,8 @@ vect_permute_store_chain 
(vec<tree> dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) { @@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec<tree> dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); vect1 = dr_chain[0]; vect2 = dr_chain[1]; @@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec<tree> dr_c sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_high = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, n = log_length; i < n; i++) { @@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp if (VECTOR_MODE_P (mode)) { unsigned int i, j, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) { + vec_perm_indices indices; unsigned int k; for (k = 0; k < 3; k++) { @@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = 3 * i + k; else sel[i] = 0; - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp { /* If length is not equal to 3 then only power of 2 is supported. 
*/ gcc_assert (pow2p_hwi (count)); + for (i = 0; i < nelt; i++) sel[i] = i * 2; - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] = i * 2 + 1; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec<tree> dr_ch unsigned int i, j, log_length = exact_log2 (length); unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec<tree> dr_ch { unsigned int k; + vec_perm_indices indices; for (k = 0; k < 3; k++) { for (i = 0; i < nelt; i++) @@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec<tree> dr_ch sel[i] = 3 * i + k; else sel[i] = 0; - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, j = 0; i < nelt; i++) if (3 * i + k < 2 * nelt) sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); first_vect = dr_chain[0]; second_vect = dr_chain[1]; @@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec<tree> dr_ch for (i = 0; i < nelt; ++i) sel[i] = i * 2; - perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; ++i) sel[i] = i * 2 + 1; - perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec<tree> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = i * 2; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2 + 1; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt / 2; ++i) sel[i] = i * 2 + 1; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. 
For vector length 8 it is {4 5 6 7 8 9 10 11}. */ for (i = 0; i < nelt; i++) sel[i] = nelt / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to select vector from 2. For vector length 8 it is {0 1 2 3 12 13 14 15}. */ @@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = i; for (i = nelt / 2; i < nelt; i++) sel[i] = nelt + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "select is not supported by target\n"); return false; } - select_mask = vect_gen_perm_mask_checked (vectype, sel); + select_mask = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec<tree> sel[i] = 3 * k + (l % 3); k++; } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec<tree> supported by target\n"); return false; } - perm3_mask = vect_gen_perm_mask_checked (vectype, sel); + perm3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {6 7 8 9 10 11 12 13}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + 1 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift2_mask = vect_gen_perm_mask_checked (vectype, sel); + shift2_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {3 4 5 6 7 8 9 10}. 
*/ for (i = 0; i < nelt; i++) sel[i] = (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift3_mask = vect_gen_perm_mask_checked (vectype, sel); + shift3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift4_mask = vect_gen_perm_mask_checked (vectype, sel); + shift4_mask = vect_gen_perm_mask_checked (vectype, indices); for (k = 0; k < 3; k++) { Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000 +++ gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000 @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) { unsigned int count = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (count); + vec_perm_builder sel (count, count, 1); for (i = 0; i < count; ++i) { unsigned int elt = i; @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, elt += count; sel.quick_push (elt); } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, count); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { for (i = 0; i < group_size; ++i) if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code) @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); mask_type = get_vectype_for_scalar_type (mask_element_type); nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices mask (nunits); + vec_perm_builder mask (nunits, nunits, 1); mask.quick_grow (nunits); + vec_perm_indices indices; /* Initialize the vect stmts of NODE to properly insert the generated stmts later. */ @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n noop_p = false; mask[index++] = mask_element; - if (index == nunits) + if (index == nunits && !noop_p) { - if (! noop_p - && ! can_vec_perm_const_p (mode, mask)) + indices.new_vector (mask, 2, nunits); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) { @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n vect_location, "unsupported vect permute { "); for (i = 0; i < nunits; ++i) - dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]); + dump_printf (MSG_MISSED_OPTIMIZATION, + HOST_WIDE_INT_PRINT_DEC " ", mask[i]); dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); } gcc_assert (analyze_only); return false; } - if (! 
noop_p) - ++*n_perms; + ++*n_perms; + } + if (index == nunits) + { if (!analyze_only) { tree mask_vec = NULL_TREE; @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod enum tree_code code0 = gimple_assign_rhs_code (stmt); enum tree_code ocode = ERROR_MARK; gimple *ostmt; - auto_vec_perm_indices mask (group_size); + vec_perm_builder mask (group_size, group_size, 1); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt) if (gimple_assign_rhs_code (ostmt) != code0) { Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-12-09 22:47:27.885318101 +0000 +++ gcc/tree-vect-stmts.c 2017-12-09 22:48:47.548825399 +0000 @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype) nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (nunits - 1 - i); - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 1, nunits); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) return NULL_TREE; - return vect_gen_perm_mask_checked (vectype, sel); + return vect_gen_perm_mask_checked (vectype, indices); } /* A subroutine of get_load_store_type, with a subset of the same @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); sel.quick_grow (nunits); for (i = 0; i < nunits; ++i) sel[i] = i < gather_off_nunits ? i : i + nunits - gather_off_nunits; + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); ncopies *= 2; + for (i = 0; i < nunits; ++i) sel[i] = i | gather_off_nunits; - mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel); + indices.new_vector (sel, 2, gather_off_nunits); + mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices); } else gcc_unreachable (); @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype); unsigned word_bytes = num_bytes / nunits; - auto_vec_perm_indices elts (num_bytes); + vec_perm_builder elts (num_bytes, num_bytes, 1); for (unsigned i = 0; i < nunits; ++i) for (unsigned j = 0; j < word_bytes; ++j) elts.quick_push ((i + 1) * word_bytes - j - 1); - if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts)) + vec_perm_indices indices (elts, 1, num_bytes); + if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices)) return false; if (! 
vec_stmt) @@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple { modifier = WIDEN; - auto_vec_perm_indices sel (scatter_off_nunits); + vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1); for (i = 0; i < (unsigned int) scatter_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, scatter_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); gcc_assert (perm_mask != NULL_TREE); } else if (nunits == (unsigned int) scatter_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < (unsigned int) nunits; ++i) sel.quick_push (i | scatter_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); gcc_assert (perm_mask != NULL_TREE); ncopies *= 2; } @@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_ { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (i < gather_off_nunits ? i : i + nunits - gather_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); ncopies *= 2; } else Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-vect-generic.c 2017-12-09 22:48:47.547825355 +0000 @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs mask = gimple_assign_rhs1 (def_stmt); } - if (TREE_CODE (mask) == VECTOR_CST) - { - auto_vec_perm_indices sel_int (elements); - - for (i = 0; i < elements; ++i) - sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) - & (2 * elements - 1)); + vec_perm_builder sel_int; - if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int)) + if (TREE_CODE (mask) == VECTOR_CST + && tree_to_vec_perm_builder (&sel_int, mask)) + { + vec_perm_indices indices (sel_int, 2, elements); + if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices)) { gimple_assign_set_rhs3 (stmt, mask); update_stmt (stmt); @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs != CODE_FOR_nothing && TREE_CODE (vec1) == VECTOR_CST && initializer_zerop (vec1) - && sel_int[0] - && sel_int[0] < elements) + && indices[0] + && indices[0] < elements) { for (i = 1; i < elements; ++i) { - unsigned int expected = i + sel_int[0]; + unsigned int expected = i + indices[0]; /* Indices into the second vector are all equivalent. 
*/ - if (MIN (elements, (unsigned) sel_int[i]) + if (MIN (elements, (unsigned) indices[i]) != MIN (elements, expected)) break; } Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-12-09 22:47:27.884318101 +0000 +++ gcc/tree-vect-loop.c 2017-12-09 22:48:47.547825355 +0000 @@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop vector elements (not bits) for a vector with NELT elements. */ static void calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt, - vec_perm_indices *sel) + vec_perm_builder *sel) { - unsigned int i; - - for (i = 0; i < nelt; i++) - sel->quick_push ((i + offset) & (2 * nelt - 1)); + sel->new_vector (nelt, nelt, 1); + for (unsigned int i = 0; i < nelt; i++) + sel->quick_push (i + offset); } /* Checks whether the target supports whole-vector shifts for vectors of mode @@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo return true; unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); - + vec_perm_builder sel; + vec_perm_indices indices; for (i = nelt/2; i >= 1; i/=2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (i, nelt, &sel); - if (!can_vec_perm_const_p (mode, sel, false)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices, false)) return false; } return true; @@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec<tr if (reduce_with_shift && !slp_reduc) { int nelements = vec_size_in_bits / element_bitsize; - auto_vec_perm_indices sel (nelements); + vec_perm_builder sel; + vec_perm_indices indices; int elt_offset; @@ -5052,9 +5052,9 @@ vect_create_epilog_for_reduction (vec<tr elt_offset >= 1; elt_offset /= 2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); - tree mask = vect_gen_perm_mask_any (vectype, sel); + indices.new_vector (sel, 2, nelements); + tree mask = vect_gen_perm_mask_any (vectype, indices); epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR, new_temp, zero_vec, mask); new_name = make_ssa_name (vec_dest, epilog_stmt); Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2017-12-09 22:47:27.856318084 +0000 +++ gcc/config/aarch64/aarch64.c 2017-12-09 22:48:47.535824832 +0000 @@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - gcc_assert (nelt == (nelt & -nelt)); - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] ^= nelt; /* Keep the same index, but in the other vector. */ - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin /* Calculate whether all elements are in one vector. */ unsigned int nelt = sel.length (); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { unsigned int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin input vector. */ /* Fall Through. 
*/ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (!d.testing_p) return aarch64_expand_vec_perm_const_1 (&d); Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2017-12-09 22:47:27.858318085 +0000 +++ gcc/config/arm/arm.c 2017-12-09 22:48:47.538824963 +0000 @@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo d.testing_p = !target; nelt = GET_MODE_NUNITS (d.vmode); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo input vector. */ /* FALLTHRU */ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (d.testing_p) return arm_expand_vec_perm_const_1 (&d); Index: gcc/config/powerpcspe/powerpcspe.c =================================================================== --- gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:47:27.871318093 +0000 +++ gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:48:47.541825094 +0000 @@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++) Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2017-12-09 22:47:27.874318095 +0000 +++ gcc/config/rs6000/rs6000.c 2017-12-09 22:48:47.544825224 +0000 @@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++)
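
A second standalone sketch, again my own toy code with hypothetical names rather than GCC code: it models the clamp () and rotate_inputs () arithmetic from vec-perm-indices.c, which is what lets the aarch64/arm hunks replace the per-element index-flipping loops with a single rotate_inputs (1) before swapping the operands. toy_indices is a stand-in for vec_perm_indices; only the arithmetic is meant to match.

/* Standalone sketch, not GCC code: clamping and input rotation.  */
#include <cassert>
#include <vector>

struct toy_indices
{
  unsigned int ninputs;          /* Number of vectors being permuted.  */
  unsigned int nelts_per_input;  /* Elements in each input vector.  */
  std::vector<long> elts;        /* Selector elements, kept canonical.  */

  /* Reduce ELT modulo the total number of input elements, treating
     negative values as counting from the end.  */
  long clamp (long elt) const
  {
    long limit = (long) ninputs * nelts_per_input;
    elt %= limit;
    return elt < 0 ? elt + limit : elt;
  }

  /* Rotate the inputs right by DELTA inputs: add DELTA * nelts_per_input
     to every element and clamp the result.  */
  void rotate_inputs (int delta)
  {
    long element_delta = (long) delta * nelts_per_input;
    for (long &elt : elts)
      elt = clamp (elt + element_delta);
  }
};

int main ()
{
  /* A selector for two 4-element inputs that only reads the second
     input: { 5, 6, 7, 4 } picks elements 1, 2, 3, 0 of input 1.  */
  toy_indices perm = { 2, 4, { 5, 6, 7, 4 } };
  /* After rotating the inputs by one (and swapping op0/op1 at the call
     site), the same selector reads from the first input instead.  */
  perm.rotate_inputs (1);
  assert ((perm.elts == std::vector<long> { 1, 2, 3, 0 }));
  return 0;
}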