Message ID | 874lozmr9a.fsf@linaro.org |
---|---|
State | New |
Series | Make VEC_PERM_EXPR work for variable-length vectors |
On Sun, Dec 10, 2017 at 12:10 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
> for testing permute operations with variable selection vectors, and
> can_vec_perm_const_p for testing permute operations with specific
> constant selection vectors.  This means that we can pass the constant
> selection vector by reference.
>
> Constant permutes can still use a variable permute as a fallback.
> A later patch adds a check to make sure that we don't truncate the
> vector indices when doing this.
>
> However, have_whole_vector_shift checked:
>
>   if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
>     return false;
>
> which had the effect of disallowing the fallback to variable permutes.
> I'm not sure whether that was the intention or whether it was just
> supposed to short-cut the loop on targets that don't support permutes.
> (But then why bother?  The first check in the loop would fail and
> we'd bail out straightaway.)
>
> The patch adds a parameter for disallowing the fallback.  I think it
> makes sense to do this for the following code in the VEC_PERM_EXPR
> folder:
>
>       /* Some targets are deficient and fail to expand a single
>          argument permutation while still allowing an equivalent
>          2-argument version.  */
>       if (need_mask_canon && arg2 == op2
>           && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
>           && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
>
> since it's really testing whether the expand_vec_perm_const code expects
> a particular form.

Ok.

> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs-query.h (can_vec_perm_p): Delete.
>         (can_vec_perm_var_p, can_vec_perm_const_p): Declare.
>         * optabs-query.c (can_vec_perm_p): Split into...
>         (can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
>         (can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
>         particular selector is valid.
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
>         * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
>         (vect_grouped_load_supported): Likewise.
>         (vect_shift_permute_load_chain): Likewise.
>         * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
>         (vect_transform_slp_perm_load): Likewise.
>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>         (vectorizable_bswap): Likewise.
>         (vect_gen_perm_mask_checked): Likewise.
>         * fold-const.c (fold_ternary_loc): Likewise.  Don't take
>         implementations of variable permutation vectors into account
>         when deciding which selector to use.
>         * tree-vect-loop.c (have_whole_vector_shift): Don't check whether
>         vec_perm_const_optab is supported; instead use can_vec_perm_const_p
>         with a false third argument.
>         * tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
>         to test whether the constant selector is valid and can_vec_perm_var_p
>         to test whether a variable selector is valid.
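For anyone following the thread without the GCC sources at hand, the decision logic described above can be modelled roughly as follows. This is only a sketch under stated assumptions: MockTarget, mock_can_perm_var and mock_can_perm_const are made-up names, not GCC interfaces, and the QImode fallback in can_vec_perm_var_p is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a target's permute support.  None of these
// names exist in GCC; they only mirror the checks described in the patch.
struct MockTarget
{
  bool has_vec_perm = false;        // models vec_perm_optab for the mode
  bool has_vec_perm_const = false;  // models vec_perm_const_optab for the mode
  // Models targetm.vectorize.vec_perm_const_ok; empty means the hook is unset.
  std::function<bool (const std::vector<unsigned> &)> const_ok;
};

// Models can_vec_perm_var_p.  Assumption: the QImode fallback is omitted.
bool
mock_can_perm_var (const MockTarget &t)
{
  return t.has_vec_perm;
}

// Models can_vec_perm_const_p, with the same default for the third argument.
bool
mock_can_perm_const (const MockTarget &t, const std::vector<unsigned> &sel,
                     bool allow_variable = true)
{
  // Variable permutes are tried first and accept any selector.
  if (allow_variable && mock_can_perm_var (t))
    return true;

  // Otherwise the constant pattern must exist and accept this selector.
  if (t.has_vec_perm_const && (!t.const_ok || t.const_ok (sel)))
    return true;

  return false;
}

// Small self-check showing the effect of the third argument.
void
mock_self_check ()
{
  MockTarget var_only;
  var_only.has_vec_perm = true;
  assert (mock_can_perm_const (var_only, {3, 2, 1, 0}));
  assert (!mock_can_perm_const (var_only, {3, 2, 1, 0}, false));
}
```

On a target that only has the variable pattern, the default call succeeds while the call with a false third argument fails, which is the distinction have_whole_vector_shift and fold_ternary_loc rely on in the patch.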
Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h 2017-12-09 22:47:14.730310076 +0000
+++ gcc/optabs-query.h 2017-12-09 22:47:21.534314227 +0000
@@ -175,7 +175,9 @@ enum insn_code can_float_p (machine_mode
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
 opt_machine_mode qimode_for_vec_perm (machine_mode);
-bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
+bool can_vec_perm_var_p (machine_mode);
+bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
+                           bool = true);
 /* Find a widening optab even if it doesn't widen as much as we want.  */
 #define find_widening_optab_handler(A, B, C) \
   find_widening_optab_handler_and_mode (A, B, C, NULL)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c 2017-12-09 22:47:14.729310075 +0000
+++ gcc/optabs-query.c 2017-12-09 22:47:21.534314227 +0000
@@ -361,58 +361,75 @@ qimode_for_vec_perm (machine_mode mode)
   return opt_machine_mode ();
 }
 
-/* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
-   expanded using SIMD extensions of the CPU.  SEL may be NULL, which
-   stands for an unknown constant.  Note that additional permutations
-   representing whole-vector shifts may also be handled via the vec_shr
-   optab, but only where the second input vector is entirely constant
-   zeroes; this case is not dealt with here.  */
+/* Return true if VEC_PERM_EXPRs with variable selector operands can be
+   expanded using SIMD extensions of the CPU.  MODE is the mode of the
+   vectors being permuted.  */
 
 bool
-can_vec_perm_p (machine_mode mode, bool variable, vec_perm_indices *sel)
+can_vec_perm_var_p (machine_mode mode)
 {
-  machine_mode qimode;
-
   /* If the target doesn't implement a vector mode for the vector type,
      then no operations are supported.  */
   if (!VECTOR_MODE_P (mode))
     return false;
 
-  if (!variable)
-    {
-      if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
-          && (sel == NULL
-              || targetm.vectorize.vec_perm_const_ok == NULL
-              || targetm.vectorize.vec_perm_const_ok (mode, *sel)))
-        return true;
-    }
-
   if (direct_optab_handler (vec_perm_optab, mode) != CODE_FOR_nothing)
     return true;
 
   /* We allow fallback to a QI vector mode, and adjust the mask.  */
+  machine_mode qimode;
   if (!qimode_for_vec_perm (mode).exists (&qimode))
     return false;
 
-  /* ??? For completeness, we ought to check the QImode version of
-      vec_perm_const_optab.  But all users of this implicit lowering
-      feature implement the variable vec_perm_optab.  */
   if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
     return false;
 
   /* In order to support the lowering of variable permutations,
      we need to support shifts and adds.  */
-  if (variable)
+  if (GET_MODE_UNIT_SIZE (mode) > 2
+      && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
+      && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
+    return false;
+  if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
+    return false;
+
+  return true;
+}
+
+/* Return true if the target directly supports VEC_PERM_EXPRs on vectors
+   of mode MODE using the selector SEL.  ALLOW_VARIABLE_P is true if it
+   is acceptable to force the selector into a register and use a variable
+   permute (if the target supports that).
+
+   Note that additional permutations representing whole-vector shifts may
+   also be handled via the vec_shr optab, but only where the second input
+   vector is entirely constant zeroes; this case is not dealt with here.  */
+
+bool
+can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
+                      bool allow_variable_p)
+{
+  /* If the target doesn't implement a vector mode for the vector type,
+     then no operations are supported.  */
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  /* It's probably cheaper to test for the variable case first.  */
+  if (allow_variable_p && can_vec_perm_var_p (mode))
+    return true;
+
+  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
     {
-      if (GET_MODE_UNIT_SIZE (mode) > 2
-          && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
-          && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
-        return false;
-      if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
-        return false;
+      if (targetm.vectorize.vec_perm_const_ok == NULL
+          || targetm.vectorize.vec_perm_const_ok (mode, sel))
+        return true;
+
+      /* ??? For completeness, we ought to check the QImode version of
+         vec_perm_const_optab.  But all users of this implicit lowering
+         feature implement the variable vec_perm_optab.  */
     }
 
-  return true;
+  return false;
 }
 
 /* Find a widening optab even if it doesn't widen as much as we want.
@@ -472,7 +489,7 @@ can_mult_highpart_p (machine_mode mode,
             sel.quick_push (!BYTES_BIG_ENDIAN
                             + (i & ~1)
                             + ((i & 1) ? nunits : 0));
-          if (can_vec_perm_p (mode, false, &sel))
+          if (can_vec_perm_const_p (mode, sel))
             return 2;
         }
     }
@@ -486,7 +503,7 @@ can_mult_highpart_p (machine_mode mode,
           auto_vec_perm_indices sel (nunits);
           for (i = 0; i < nunits; ++i)
             sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-          if (can_vec_perm_p (mode, false, &sel))
+          if (can_vec_perm_const_p (mode, sel))
             return 3;
         }
     }
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c 2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-ssa-forwprop.c 2017-12-09 22:47:21.534314227 +0000
@@ -2108,7 +2108,7 @@ simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_p (TYPE_MODE (type), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
         return false;
       mask_type
         = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c 2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-data-refs.c 2017-12-09 22:47:21.535314227 +0000
@@ -4587,11 +4587,11 @@ vect_grouped_store_supported (tree vecty
                   if (3 * i + nelt2 < nelt)
                     sel[3 * i + nelt2] = 0;
                 }
-              if (!can_vec_perm_p (mode, false, &sel))
+              if (!can_vec_perm_const_p (mode, sel))
                 {
                   if (dump_enabled_p ())
                     dump_printf (MSG_MISSED_OPTIMIZATION,
-                                 "permutaion op not supported by target.\n");
+                                 "permutation op not supported by target.\n");
                   return false;
                 }
 
@@ -4604,11 +4604,11 @@ vect_grouped_store_supported (tree vecty
                   if (3 * i + nelt2 < nelt)
                     sel[3 * i + nelt2] = nelt + j2++;
                 }
-              if (!can_vec_perm_p (mode, false, &sel))
+              if (!can_vec_perm_const_p (mode, sel))
                 {
                   if (dump_enabled_p ())
                     dump_printf (MSG_MISSED_OPTIMIZATION,
-                                 "permutaion op not supported by target.\n");
+                                 "permutation op not supported by target.\n");
                   return false;
                 }
             }
@@ -4624,11 +4624,11 @@ vect_grouped_store_supported (tree vecty
               sel[i * 2] = i;
               sel[i * 2 + 1] = i + nelt;
             }
-          if (can_vec_perm_p (mode, false, &sel))
+          if (can_vec_perm_const_p (mode, sel))
             {
               for (i = 0; i < nelt; i++)
                 sel[i] += nelt / 2;
-              if (can_vec_perm_p (mode, false, &sel))
+              if (can_vec_perm_const_p (mode, sel))
                 return true;
             }
         }
@@ -5166,7 +5166,7 @@ vect_grouped_load_supported (tree vectyp
                   sel[i] = 3 * i + k;
                 else
                   sel[i] = 0;
-              if (!can_vec_perm_p (mode, false, &sel))
+              if (!can_vec_perm_const_p (mode, sel))
                 {
                   if (dump_enabled_p ())
                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5179,7 +5179,7 @@ vect_grouped_load_supported (tree vectyp
                   sel[i] = i;
                 else
                   sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-              if (!can_vec_perm_p (mode, false, &sel))
+              if (!can_vec_perm_const_p (mode, sel))
                 {
                   if (dump_enabled_p ())
                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5196,11 +5196,11 @@ vect_grouped_load_supported (tree vectyp
           gcc_assert (pow2p_hwi (count));
           for (i = 0; i < nelt; i++)
             sel[i] = i * 2;
-          if (can_vec_perm_p (mode, false, &sel))
+          if (can_vec_perm_const_p (mode, sel))
             {
               for (i = 0; i < nelt; i++)
                 sel[i] = i * 2 + 1;
-              if (can_vec_perm_p (mode, false, &sel))
+              if (can_vec_perm_const_p (mode, sel))
                 return true;
             }
         }
@@ -5527,7 +5527,7 @@ vect_shift_permute_load_chain (vec<tree>
         sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
         sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5541,7 +5541,7 @@ vect_shift_permute_load_chain (vec<tree>
         sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
         sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5555,7 +5555,7 @@ vect_shift_permute_load_chain (vec<tree>
          For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
         sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5570,7 +5570,7 @@ vect_shift_permute_load_chain (vec<tree>
         sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
         sel[i] = nelt + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5633,7 +5633,7 @@ vect_shift_permute_load_chain (vec<tree>
           sel[i] = 3 * k + (l % 3);
           k++;
         }
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5647,7 +5647,7 @@ vect_shift_permute_load_chain (vec<tree>
          For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
         sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5660,7 +5660,7 @@ vect_shift_permute_load_chain (vec<tree>
          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
         sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5673,7 +5673,7 @@ vect_shift_permute_load_chain (vec<tree>
          For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
         sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5686,7 +5686,7 @@ vect_shift_permute_load_chain (vec<tree>
          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
         sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c 2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-slp.c 2017-12-09 22:47:21.536314228 +0000
@@ -901,7 +901,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
             elt += count;
           sel.quick_push (elt);
         }
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
         {
           for (i = 0; i < group_size; ++i)
             if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3646,7 +3646,7 @@ vect_transform_slp_perm_load (slp_tree n
           if (index == nunits)
             {
               if (! noop_p
-                  && ! can_vec_perm_p (mode, false, &mask))
+                  && ! can_vec_perm_const_p (mode, mask))
                 {
                   if (dump_enabled_p ())
                     {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c 2017-12-09 22:47:19.119312754 +0000
+++ gcc/tree-vect-stmts.c 2017-12-09 22:47:21.537314229 +0000
@@ -1720,7 +1720,7 @@ perm_mask_for_reverse (tree vectype)
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
     return NULL_TREE;
   return vect_gen_perm_mask_checked (vectype, sel);
 }
@@ -2502,7 +2502,7 @@ vectorizable_bswap (gimple *stmt, gimple
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, &elts))
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
     return false;
 
   if (! vec_stmt)
@@ -6502,7 +6502,7 @@ vectorizable_store (gimple *stmt, gimple
 
 /* Given a vector type VECTYPE, turns permutation SEL into the equivalent
    VECTOR_CST mask.  No checks are made that the target platform supports the
-   mask, so callers may wish to test can_vec_perm_p separately, or use
+   mask, so callers may wish to test can_vec_perm_const_p separately, or use
    vect_gen_perm_mask_checked.  */
 
 tree
@@ -6523,13 +6523,13 @@ vect_gen_perm_mask_any (tree vectype, co
   return mask_elts.build ();
 }
 
-/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p,
+/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_const_p,
    i.e. that the target supports the pattern _for arbitrary input vectors_.  */
 
 tree
 vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
 {
-  gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
+  gcc_assert (can_vec_perm_const_p (TYPE_MODE (vectype), sel));
   return vect_gen_perm_mask_any (vectype, sel);
 }
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-12-09 22:47:19.119312754 +0000
+++ gcc/fold-const.c 2017-12-09 22:47:21.534314227 +0000
@@ -11620,8 +11620,8 @@ fold_ternary_loc (location_t loc, enum t
                  argument permutation while still allowing an equivalent
                  2-argument version.  */
               if (need_mask_canon && arg2 == op2
-                  && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
-                  && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
+                  && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
+                  && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
                 {
                   need_mask_canon = need_mask_canon2;
                   sel = sel2;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c 2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-loop.c 2017-12-09 22:47:21.536314228 +0000
@@ -3730,9 +3730,6 @@ have_whole_vector_shift (machine_mode mo
   if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
     return true;
 
-  if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
-    return false;
-
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
   auto_vec_perm_indices sel (nelt);
 
@@ -3740,7 +3737,7 @@ have_whole_vector_shift (machine_mode mo
     {
       sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_p (mode, false, &sel))
+      if (!can_vec_perm_const_p (mode, sel, false))
         return false;
     }
   return true;
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-generic.c 2017-12-09 22:47:21.535314227 +0000
@@ -1306,7 +1306,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
         sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
                             & (2 * elements - 1));
 
-      if (can_vec_perm_p (TYPE_MODE (vect_type), false, &sel_int))
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
         {
           gimple_assign_set_rhs3 (stmt, mask);
           update_stmt (stmt);
@@ -1337,7 +1337,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
             }
         }
     }
-  else if (can_vec_perm_p (TYPE_MODE (vect_type), true, NULL))
+  else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
     return;
 
   warning_at (loc, OPT_Wvector_operation_performance,
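As a reading aid for the selectors that the callers above hand to can_vec_perm_const_p, here is a small standalone sketch of what a few of them select. It is not GCC code: Selector, vec_perm and the *_selector helpers are hypothetical names, plain std::vector stands in for vec_perm_indices, and the loop bound of nelt / 2 in the interleave helper is an assumption, since the hunk does not show the enclosing loop. The index formulas themselves are taken from the hunks (perm_mask_for_reverse, vect_grouped_store_supported, vect_grouped_load_supported).

```cpp
#include <cassert>
#include <vector>

using Selector = std::vector<unsigned>;

// Two-input permute: indices below nelt pick from A, the rest from B.
// Indices are reduced modulo 2 * nelt, matching the masking with
// (2 * elements - 1) seen in lower_vec_perm.
std::vector<int>
vec_perm (const std::vector<int> &a, const std::vector<int> &b,
          const Selector &sel)
{
  const unsigned nelt = a.size ();
  std::vector<int> out;
  out.reserve (sel.size ());
  for (unsigned idx : sel)
    {
      idx %= 2 * nelt;
      out.push_back (idx < nelt ? a[idx] : b[idx - nelt]);
    }
  return out;
}

// sel[i] = nunits - 1 - i, as in perm_mask_for_reverse.
Selector
reverse_selector (unsigned nelt)
{
  Selector sel (nelt);
  for (unsigned i = 0; i < nelt; ++i)
    sel[i] = nelt - 1 - i;
  return sel;
}

// sel[i * 2] = i; sel[i * 2 + 1] = i + nelt, optionally followed by
// sel[i] += nelt / 2 for the high halves (vect_grouped_store_supported).
Selector
interleave_selector (unsigned nelt, bool high)
{
  Selector sel (nelt);
  for (unsigned i = 0; i < nelt / 2; ++i)
    {
      sel[i * 2] = i;
      sel[i * 2 + 1] = i + nelt;
    }
  if (high)
    for (unsigned i = 0; i < nelt; ++i)
      sel[i] += nelt / 2;
  return sel;
}

// sel[i] = i * 2 (even) or i * 2 + 1 (odd), as in vect_grouped_load_supported.
Selector
extract_selector (unsigned nelt, bool odd)
{
  Selector sel (nelt);
  for (unsigned i = 0; i < nelt; ++i)
    sel[i] = i * 2 + (odd ? 1 : 0);
  return sel;
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13}, the low interleave selector is {0, 4, 1, 5} and yields {0, 10, 1, 11}, while the even extraction selector {0, 2, 4, 6} yields {0, 2, 10, 12}.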