Message ID | 6b66b279-eba2-80be-58f0-f72dc2316fcb@arm.com |
---|---|
State | New |
Headers | show |
Series | [SLP] SLP vectorization: vectorize vector constructors | expand |
On Wed, 30 Oct 2019, Joel Hutton wrote: > On 15/10/2019 13:11, Richard Biener wrote: > >> > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS > >> > (we can have omitted trailing zeros). > > > > ^^^ > > > > I don't see this being handled? You give up on non-SSA names > > but not on the omitted trailing zeros. > > I had thought checking the number of vectors produced would work. But that's too late to fail. > I've added that check. > I'm slightly confused about what should be done for non-SSA names. > There's no scalar stmt to gather for a constant in a vector constructor. For now simply give up. What could be done is "compact" the operands to vectorize to only contain SSA names, try to SLP match and vectorize them and then combine the vectorized result with the constant elements. Not worth doing I guess unless it's sth regular like { a, b, c, d, 0, 0, 0, 0 } or so. But this can be handled as followup if necessary. > > > > You build a CONSTRUCTOR of vectors, thus > > > > _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... }; > I've added code to do this, and a testcase which triggers it. Great. > > > > + > > + if (constructor) > > + { > > + SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt; > > + } > > + else > > + SLP_INSTANCE_ROOT_STMT (new_instance) = NULL; > > + > > > > too much vertical space, no {} around single-stmt if clauses > Fixed. > > > > > > > @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb, > > stmt_vec_info use_stmt_info = vinfo->lookup_stmt > > (use_stmt); > > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) > > { > > + /* Check this is not a constructor that will be > > vectorized > > + away. */ > > + if (BB_VINFO_GROUPED_STORES (vinfo).contains > > (use_stmt_info)) > > + continue; > > > > hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead? > > In theory the stmt should be part of the SLP tree itself but that's > > probably too awkward to be made work at the moment ;) > I did try this, but it was indeed very awkward to be made to work. > > > > > vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new > > functions which need comments. > Fixed. I've also taken the 'vectorize the root' out into a separate > function. > > > > > + /* For vector constructors, the same SSA name must be used to > maintain > > data > > + flow into other basic blocks. */ > > + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) > > + && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1 > > + && SLP_TREE_VEC_STMTS (node).exists ()) > > + { > > > > it should read > > > > /* Vectorize the instance root. */ > > > > and be in vect_schedule_slp after the vect_schedule_slp_instance. > > As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1, > > you also cannot simply do "nothing" here, "failing" vectorization > > (well, you probably can - DCE will remove the vectorized code - but > > at least if there were calls in the SLP tree they will be mangled > > by vectorization so the scalar code is wrecked). SO it should be > > > > if (SLP_INSTANCE_ROOT_STMT (instance)) > > .. you may not fail to replace the scalar root stmt here .. > > > So what should be done in the case that CONSTRUCTOR_NELTS < > TYPE_VECTOR_SUBPARTS? You have to fail SLP discovery, not code generation. The check you did above should have done this. > > + if (CONSTRUCTOR_NELTS (rhs) == 0) > > + vectorizable = false; > > + > > > > if you use continue; you can elide the 'vectorizable' variable. > Done. > > > > > + if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt))) > > + vectorizable = false; > > + > > > > why's that? no comments that clarify ;) The vector may be > > used as argument to a call or as source of a store. So I'd simply > > remove this check (and the function). > > Done. The thinking was that if the vector was used as a source of a > store the SLP tree would be built from the grouped store instead. Will > it not cause problems if both the grouped store and the vector > constructor are used to build SLP trees? > > > > 2019-10-10 Joel Hutton <Joel.Hutton@arm.com> > > * expr.c (store_constructor): Add case for constructor of vectors. Why do you need this? The vectorizer already creates such CTORs. Any testcase that you can show? > * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. > (vect_bb_slp_scalar_cost): Likewise. > (vect_slp_check_for_constructors): New function. > (vect_slp_analyze_bb_1): Add check for vector constructors. > (vect_schedule_slp_instance): Add case to fixup vector constructor stmt. > (vectorize_slp_instance_root_stmt): New function > * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info->stmt\ + : NULL; SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess... @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt); if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) { + /* Check this is not a constructor that will be vectorized + away. */ + if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info)) + continue; (*life)[i] = true; ... which you then simply mark as PURE_SLP_STMT where we call vect_mark_slp_stmts in vect_slp_analyze_bb_1. I see you have the TYPE_VECTOR_SUBPARTS check still at transform stage in vectorize_slp_instance_root_stmt, please simply move it to vect_slp_check_for_constructors or to vect_analyze_slp_instance where you have the other rejects (non-SSA names in the ctor). That is, vectorize_slp_instance_root_stmt may not fail. +bool +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) +{ + extra newline + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) the stmt replacing can be commonized between == 1 and > 1 cases. FOR_EACH_VEC_ELT (slp_instances, i, instance) { + slp_tree node = SLP_INSTANCE_TREE (instance); /* Schedule the tree of INSTANCE. */ - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), + vect_schedule_slp_instance (node, instance, bst_map); + + /* Vectorize the instance root. */ + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) + && SLP_TREE_VEC_STMTS (node).exists ()) + if (!vectorize_slp_instance_root_stmt (node, instance)) + { instance->root == node is always true. Likewise SLP_TREE_VEC_STMTS (node).exists (). @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) if (is_a <loop_vec_info> (vinfo)) the ChangeLog doesn't mention this. I guess this isn't necessary unless you fail vectorizing late which you shouldn't. diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -151,6 +151,10 @@ public: /* The root of SLP tree. */ slp_tree root; + /* For vector constructors, the constructor stmt that the SLP tree is built + from, NULL otherwise. */ + gimple *root_stmt; as said, make this a stmt_vec_info Thanks, Richard. > gcc/testsuite/ChangeLog: > > 2019-10-10 Joel Hutton <Joel.Hutton@arm.com> > > * gcc.dg/vect/bb-slp-40.c: New test. > * gcc.dg/vect/bb-slp-41.c: New test. > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
On 30/10/2019 13:49, Richard Biener wrote: > Why do you need this? The vectorizer already creates such CTORs. Any > testcase that you can show? typedef long v2di __attribute__((vector_size(16))); v2di v; void foo() { v = (v2di){v[1], v[0]}; } >> * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. >> (vect_bb_slp_scalar_cost): Likewise. >> (vect_slp_check_for_constructors): New function. >> (vect_slp_analyze_bb_1): Add check for vector constructors. >> (vect_schedule_slp_instance): Add case to fixup vector constructor stmt. >> (vectorize_slp_instance_root_stmt): New function >> * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? > stmt_info->stmt\ > + : NULL; > > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess... > > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, > stmt_vec_info use_stmt_info = vinfo->lookup_stmt > (use_stmt); > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) > { > + /* Check this is not a constructor that will be > vectorized > + away. */ > + if (BB_VINFO_GROUPED_STORES (vinfo).contains > (use_stmt_info)) > + continue; > (*life)[i] = true; > > ... which you then simply mark as PURE_SLP_STMT where we call > vect_mark_slp_stmts in vect_slp_analyze_bb_1. > > I see you have the TYPE_VECTOR_SUBPARTS check still at transform > stage in vectorize_slp_instance_root_stmt, please simply move > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance > where you have the other rejects (non-SSA names in the ctor). If the check is in vect_slp_check_for_constructors, which vector is used as the input to tYPE_VECTOR_SUBPARTS, the lhs of the constructor? > That is, vectorize_slp_instance_root_stmt may not fail. > > +bool > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) > +{ > + > > extra newline > > + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) > > the stmt replacing can be commonized between == 1 and > 1 cases. > > FOR_EACH_VEC_ELT (slp_instances, i, instance) > { > + slp_tree node = SLP_INSTANCE_TREE (instance); > /* Schedule the tree of INSTANCE. */ > - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), > + vect_schedule_slp_instance (node, > instance, bst_map); > + > + /* Vectorize the instance root. */ > + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) > + && SLP_TREE_VEC_STMTS (node).exists ()) > + if (!vectorize_slp_instance_root_stmt (node, instance)) > + { > > instance->root == node is always true. Likewise > SLP_TREE_VEC_STMTS (node).exists (). > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > if (is_a <loop_vec_info> (vinfo)) > > the ChangeLog doesn't mention this. I guess this isn't necessary > unless you fail vectorizing late which you shouldn't. > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index > 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 > 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -151,6 +151,10 @@ public: > /* The root of SLP tree. */ > slp_tree root; > > + /* For vector constructors, the constructor stmt that the SLP tree is > built > + from, NULL otherwise. */ > + gimple *root_stmt; > > as said, make this a stmt_vec_info > > Thanks, > Richard. > > >> gcc/testsuite/ChangeLog: >> >> 2019-10-10 Joel Hutton <Joel.Hutton@arm.com> >> >> * gcc.dg/vect/bb-slp-40.c: New test. >> * gcc.dg/vect/bb-slp-41.c: New test. >> >>
On Wed, 30 Oct 2019, Joel Hutton wrote: > On 30/10/2019 13:49, Richard Biener wrote: > > Why do you need this? The vectorizer already creates such CTORs. Any > > testcase that you can show? > > typedef long v2di __attribute__((vector_size(16))); > v2di v; > void > foo() > { > v = (v2di){v[1], v[0]}; > } Huh. On what architecture? Is that for a V2DI constructor of V1DI vectors maybe? On x86 the code doesn't trigger. The condition was indeed to check for vector mode elements so maybe it should simply read if (VECTOR_MODE_P (emode)) ? eltmode is always scalar. > > >> * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. > >> (vect_bb_slp_scalar_cost): Likewise. > >> (vect_slp_check_for_constructors): New function. > >> (vect_slp_analyze_bb_1): Add check for vector constructors. > >> (vect_schedule_slp_instance): Add case to fixup vector constructor stmt. > >> (vectorize_slp_instance_root_stmt): New function > >> * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > > + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? > > stmt_info->stmt\ > > + : NULL; > > > > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess... > > > > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, > > stmt_vec_info use_stmt_info = vinfo->lookup_stmt > > (use_stmt); > > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) > > { > > + /* Check this is not a constructor that will be > > vectorized > > + away. */ > > + if (BB_VINFO_GROUPED_STORES (vinfo).contains > > (use_stmt_info)) > > + continue; > > (*life)[i] = true; > > > > ... which you then simply mark as PURE_SLP_STMT where we call > > vect_mark_slp_stmts in vect_slp_analyze_bb_1. > > > > I see you have the TYPE_VECTOR_SUBPARTS check still at transform > > stage in vectorize_slp_instance_root_stmt, please simply move > > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance > > where you have the other rejects (non-SSA names in the ctor). > If the check is in vect_slp_check_for_constructors, which vector is used > as the input to tYPE_VECTOR_SUBPARTS, the lhs of the constructor? The type of the constructor itself. > > That is, vectorize_slp_instance_root_stmt may not fail. > > > > +bool > > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) > > +{ > > + > > > > extra newline > > > > + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) > > > > the stmt replacing can be commonized between == 1 and > 1 cases. > > > > FOR_EACH_VEC_ELT (slp_instances, i, instance) > > { > > + slp_tree node = SLP_INSTANCE_TREE (instance); > > /* Schedule the tree of INSTANCE. */ > > - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), > > + vect_schedule_slp_instance (node, > > instance, bst_map); > > + > > + /* Vectorize the instance root. */ > > + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) > > + && SLP_TREE_VEC_STMTS (node).exists ()) > > + if (!vectorize_slp_instance_root_stmt (node, instance)) > > + { > > > > instance->root == node is always true. Likewise > > SLP_TREE_VEC_STMTS (node).exists (). > > > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > if (is_a <loop_vec_info> (vinfo)) > > > > the ChangeLog doesn't mention this. I guess this isn't necessary > > unless you fail vectorizing late which you shouldn't. > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > index > > 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 > > 100644 > > --- a/gcc/tree-vectorizer.h > > +++ b/gcc/tree-vectorizer.h > > @@ -151,6 +151,10 @@ public: > > /* The root of SLP tree. */ > > slp_tree root; > > > > + /* For vector constructors, the constructor stmt that the SLP tree is > > built > > + from, NULL otherwise. */ > > + gimple *root_stmt; > > > > as said, make this a stmt_vec_info > > > > Thanks, > > Richard. > > > > > >> gcc/testsuite/ChangeLog: > >> > >> 2019-10-10 Joel Hutton <Joel.Hutton@arm.com> > >> > >> * gcc.dg/vect/bb-slp-40.c: New test. > >> * gcc.dg/vect/bb-slp-41.c: New test. > >> > >> > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
On 30/10/2019 14:51, Richard Biener wrote: > On Wed, 30 Oct 2019, Joel Hutton wrote: > >> On 30/10/2019 13:49, Richard Biener wrote: >>> Why do you need this? The vectorizer already creates such CTORs. Any >>> testcase that you can show? >> typedef long v2di __attribute__((vector_size(16))); >> v2di v; >> void >> foo() >> { >> v = (v2di){v[1], v[0]}; >> } > Huh. On what architecture? Is that for a V2DI constructor of > V1DI vectors maybe? On x86 the code doesn't trigger. On aarch64. emode = E_DImode eltmode = E_DImode > The condition was indeed to check for vector mode elements so > maybe it should simply read > > if (VECTOR_MODE_P (emode)) > > ? eltmode is always scalar. I'll try that, thanks.
On 30/10/2019 13:49, Richard Biener wrote: >> >> * expr.c (store_constructor): Add case for constructor of vectors. > Why do you need this? The vectorizer already creates such CTORs. Any > testcase that you can show? > > typedef long v2di __attribute__((vector_size(16))); > v2di v; > void > foo() > { > v = (v2di){v[1], v[0]}; > } There is a special case for single element vectors, where the vector mode and the element mode are the same. I've changed this slightly, as your solution of using VECTOR_MODE_P didn't work for my case where: emode = E_DImode eltmode = E_DImode On aarch64. As E_DImode is not a vector mode. Based on some checks I found in verify_gimple, I set the type of the replacement constructor to the same as the original constructor, as verify_gimple seems to expect flattened types. i.e. a vector of vectors of ints should have type vector of ints. > What could be done is "compact" the > operands to vectorize to only contain SSA names, try to SLP > match and vectorize them and then combine the vectorized result > with the constant elements. > > Not worth doing I guess unless it's sth regular like > > { a, b, c, d, 0, 0, 0, 0 } > > or so. But this can be handled as followup if necessary. > I agree, it should be possible to support this in future patches. > + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? > stmt_info->stmt\ > + : NULL; > > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess... > > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, > stmt_vec_info use_stmt_info = vinfo->lookup_stmt > (use_stmt); > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) > { > + /* Check this is not a constructor that will be > vectorized > + away. */ > + if (BB_VINFO_GROUPED_STORES (vinfo).contains > (use_stmt_info)) > + continue; > (*life)[i] = true; > > ... which you then simply mark as PURE_SLP_STMT where we call > vect_mark_slp_stmts in vect_slp_analyze_bb_1. Done. > I see you have the TYPE_VECTOR_SUBPARTS check still at transform > stage in vectorize_slp_instance_root_stmt, please simply move > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance > where you have the other rejects (non-SSA names in the ctor). > Done. > > +bool > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) > +{ > + > > extra newline Fixed. > + /* For vector constructors, the constructor stmt that the SLP tree is > built > + from, NULL otherwise. */ > + gimple *root_stmt; > > as said, make this a stmt_vec_info Done. > + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) > > the stmt replacing can be commonized between == 1 and > 1 cases. Done. > + > + /* Vectorize the instance root. */ > + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) > + && SLP_TREE_VEC_STMTS (node).exists ()) > + if (!vectorize_slp_instance_root_stmt (node, instance)) > + { > > instance->root == node is always true. Likewise > SLP_TREE_VEC_STMTS (node).exists (). Done. > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > if (is_a <loop_vec_info> (vinfo)) > > the ChangeLog doesn't mention this. I guess this isn't necessary > unless you fail vectorizing late which you shouldn't. > It's necessary to avoid: removing stmts twice for constructors of the form {_1,_1,_1,_1} removing stmts that define ssa names used elsewhere (which previously wasn't a consideration because the scalar_stmts were stores to memory, not assignments to ssa names) Updated changelog (and patch) 2019-10-31 Joel Hutton <Joel.Hutton@arm.com> * expr.c (store_constructor): Modify to handle single element vectors. * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. (vect_slp_check_for_constructors): New function. (vect_slp_analyze_bb_1): Call new function to check for vector constructors. (vectorize_slp_instance_root_stmt): New function. (vect_schedule_slp): Call new function to vectorize root stmt of vector constructors. * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. gcc/testsuite/ChangeLog: 2019-10-31 Joel Hutton <Joel.Hutton@arm.com> * gcc.dg/vect/bb-slp-40.c: New test. * gcc.dg/vect/bb-slp-41.c: New test. From 35942aebea1e93497280e11a17af0ca393539e2f Mon Sep 17 00:00:00 2001 From: Joel Hutton <Joel.Hutton@arm.com> Date: Tue, 22 Oct 2019 10:05:19 +0100 Subject: [PATCH] SLP Vectorization: Vectorize vector constructors --- gcc/expr.c | 5 +- gcc/testsuite/gcc.dg/vect/bb-slp-40.c | 34 ++++++ gcc/testsuite/gcc.dg/vect/bb-slp-41.c | 61 +++++++++++ gcc/tree-vect-slp.c | 151 +++++++++++++++++++++++++- gcc/tree-vectorizer.h | 5 + 5 files changed, 252 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-41.c diff --git a/gcc/expr.c b/gcc/expr.c index 476c6865f20828fc68f455e70d4874eaabd9d08d..2b7811d30a0986913415d6d6d759976d5f15b26b 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -6809,12 +6809,13 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, && n_elts.is_constant (&const_n_elts)) { machine_mode emode = eltmode; + tree etype = NULL; if (CONSTRUCTOR_NELTS (exp) && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value)) == VECTOR_TYPE)) { - tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value); + etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value); gcc_assert (known_eq (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype), n_elts)); @@ -6825,7 +6826,7 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, { unsigned int n = const_n_elts; - if (emode != eltmode) + if (etype && TREE_CODE (etype) == VECTOR_TYPE) { n = CONSTRUCTOR_NELTS (exp); vec_vec_init_p = true; diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c new file mode 100644 index 0000000000000000000000000000000000000000..a1dd372184623f34f8f2825aa5da50dc70c98084 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp-all" } */ +/* { dg-require-effective-target vect_int } */ + +char g_d[1024], g_s1[1024], g_s2[1024]; +void foo(void) +{ + char *d = g_d, *s1 = g_s1, *s2 = g_s2; + + for ( int y = 0; y < 128; y++ ) + { + d[0 ] = s1[0 ] + s2[0 ]; + d[1 ] = s1[1 ] + s2[1 ]; + d[2 ] = s1[2 ] + s2[2 ]; + d[3 ] = s1[3 ] + s2[3 ]; + d[4 ] = s1[4 ] + s2[4 ]; + d[5 ] = s1[5 ] + s2[5 ]; + d[6 ] = s1[6 ] + s2[6 ]; + d[7 ] = s1[7 ] + s2[7 ]; + d[8 ] = s1[8 ] + s2[8 ]; + d[9 ] = s1[9 ] + s2[9 ]; + d[10] = s1[10] + s2[10]; + d[11] = s1[11] + s2[11]; + d[12] = s1[12] + s2[12]; + d[13] = s1[13] + s2[13]; + d[14] = s1[14] + s2[14]; + d[15] = s1[15] + s2[15]; + d += 16; + } +} + +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 1 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c new file mode 100644 index 0000000000000000000000000000000000000000..b4a71241be588744c4c4480eaa8548172e7ddc9d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */ +/* { dg-require-effective-target vect_int } */ + +#define ARR_SIZE 1000 + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + a[i] = b[0] + b[1] + b[i+1] + b[i+2]; +} + +void bar (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = b[0]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+2]; + } +} + +int main () +{ + int a1[ARR_SIZE]; + int a2[ARR_SIZE]; + int b[ARR_SIZE]; + int i; + + for (i = 0; i < ARR_SIZE; i++) + { + a1[i] = 0; + a2[i] = 0; + b[i] = i; + } + + foo (a1, b); + bar (a2, b); + + for (i = 0; i < ARR_SIZE; i++) + if (a1[i] != a2[i]) + return 1; + + return 0; + +} +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 12 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" } } */ diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 4b1a231bb6ddf60b4e943fe9066fb87f97a4993a..ad762a808a40c9d591c6d858ada711a02e0d3472 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1993,6 +1993,7 @@ vect_analyze_slp_instance (vec_info *vinfo, unsigned int i; struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); vec<stmt_vec_info> scalar_stmts; + bool constructor = false; if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) { @@ -2006,6 +2007,13 @@ vect_analyze_slp_instance (vec_info *vinfo, vectype = STMT_VINFO_VECTYPE (stmt_info); group_size = REDUC_GROUP_SIZE (stmt_info); } + else if (is_gimple_assign (stmt_info->stmt) + && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR) + { + vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt)); + group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt)); + constructor = true; + } else { gcc_assert (is_a <loop_vec_info> (vinfo)); @@ -2053,6 +2061,25 @@ vect_analyze_slp_instance (vec_info *vinfo, STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ())); } + else if (constructor) + { + tree rhs = gimple_assign_rhs1 (stmt_info->stmt); + tree val; + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) + { + if (TREE_CODE (val) == SSA_NAME) + { + gimple* def = SSA_NAME_DEF_STMT (val); + stmt_vec_info def_info = vinfo->lookup_stmt (def); + /* Value is defined in another basic block. */ + if (!def_info) + return false; + scalar_stmts.safe_push (def_info); + } + else + return false; + } + } else { /* Collect reduction statements. */ @@ -2138,6 +2165,8 @@ vect_analyze_slp_instance (vec_info *vinfo, SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size; SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor; SLP_INSTANCE_LOADS (new_instance) = vNULL; + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info : NULL; + vect_gather_slp_loads (new_instance, node); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -2955,6 +2984,44 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo) return true; } +/* Find any vectorizable constructors, and add them to the grouped_store + array. */ + +static void +vect_slp_check_for_constructors (bb_vec_info bb_vinfo) +{ + gimple_stmt_iterator gsi; + + for (gsi = bb_vinfo->region_begin; + gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (is_gimple_assign (stmt) + && gimple_assign_rhs_code (stmt) == CONSTRUCTOR + && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME + && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE) + { + tree rhs = gimple_assign_rhs1 (stmt); + + if (CONSTRUCTOR_NELTS (rhs) == 0) + continue; + + poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)); + + if (!subparts.is_constant () || !(subparts.to_constant () + == CONSTRUCTOR_NELTS (rhs))) + continue; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Found vectorizable constructor: %G\n", stmt); + stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt); + BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info); + } + } +} + /* Check if the region described by BB_VINFO can be vectorized, returning true if so. When returning false, set FATAL to true if the same failure would prevent vectorization at other vector sizes, false if it is still @@ -3002,6 +3069,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal) return false; } + vect_slp_check_for_constructors (bb_vinfo); + /* If there are no grouped stores in the region there is no need to continue with pattern recog as vect_analyze_slp will fail anyway. */ @@ -3058,6 +3127,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal) relevant. */ vect_mark_slp_stmts (SLP_INSTANCE_TREE (instance)); vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance)); + if (SLP_INSTANCE_ROOT_STMT (instance)) + STMT_SLP_TYPE (SLP_INSTANCE_ROOT_STMT (instance)) = pure_slp; i++; } @@ -4074,6 +4145,50 @@ vect_remove_slp_scalar_calls (slp_tree node) vect_remove_slp_scalar_calls (node, visited); } +/* Vectorize the instance root. Return success or failure. */ + +void +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) +{ + gassign *rstmt; + + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) + { + stmt_vec_info child_stmt_info; + int j; + + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + tree vect_lhs = gimple_get_lhs (child_stmt_info->stmt); + tree root_lhs = gimple_get_lhs (instance->root_stmt->stmt); + rstmt = gimple_build_assign (root_lhs, vect_lhs); + break; + } + } + else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1) + { + int nelts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); + stmt_vec_info child_stmt_info; + int j; + vec<constructor_elt, va_gc> *v; + vec_alloc (v, nelts); + + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + CONSTRUCTOR_APPEND_ELT (v, + NULL_TREE, + gimple_get_lhs (child_stmt_info->stmt)); + } + tree lhs = gimple_get_lhs (instance->root_stmt->stmt); + tree rtype = TREE_TYPE (gimple_assign_rhs1 (instance->root_stmt->stmt)); + tree r_constructor = build_constructor (rtype, v); + rstmt = gimple_build_assign (lhs, r_constructor); + } + gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt->stmt); + gsi_replace (&rgsi, rstmt, true); + SLP_INSTANCE_ROOT_STMT (instance) = NULL; +} + /* Generate vector code for all SLP instances in the loop/basic block. */ void @@ -4088,9 +4203,14 @@ vect_schedule_slp (vec_info *vinfo) slp_instances = vinfo->slp_instances; FOR_EACH_VEC_ELT (slp_instances, i, instance) { + slp_tree node = SLP_INSTANCE_TREE (instance); /* Schedule the tree of INSTANCE. */ - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), + vect_schedule_slp_instance (node, instance, bst_map); + + if (SLP_INSTANCE_ROOT_STMT (instance)) + vectorize_slp_instance_root_stmt (node, instance); + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vectorizing stmts using SLP.\n"); @@ -4113,15 +4233,42 @@ vect_schedule_slp (vec_info *vinfo) if (is_a <loop_vec_info> (vinfo)) vect_remove_slp_scalar_calls (root); + auto_vec<stmt_vec_info> removed; + for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info) && j < SLP_INSTANCE_GROUP_SIZE (instance); j++) { if (!STMT_VINFO_DATA_REF (store_info)) break; + if (removed.contains (store_info)) + continue; + store_info = vect_orig_stmt (store_info); + tree lhs = gimple_get_lhs (store_info->stmt); + bool used = false; + /* We must ensure we don't remove def stmts for ssa names used + elsewhere. */ + if (TREE_CODE (lhs) == SSA_NAME) + { + gimple *use_stmt; + imm_use_iterator use_iter; + + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs) + { + if (use_stmt != store_info->stmt) + { + used = true; + BREAK_FROM_IMM_USE_STMT (use_iter); + } + } + } + /* Free the attached stmt_vec_info and remove the stmt. */ - vinfo->remove_stmt (store_info); + + if (!used) + vinfo->remove_stmt (store_info); + removed.safe_push (store_info); } } } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 71b5f380e2c91a7a551f6e26920bb17809abedf0..0fa64357be2c154662378af85642632aac50c523 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -152,6 +152,10 @@ public: /* The root of SLP tree. */ slp_tree root; + /* For vector constructors, the constructor stmt that the SLP tree is built + from, NULL otherwise. */ + stmt_vec_info root_stmt; + /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s. */ unsigned int group_size; @@ -171,6 +175,7 @@ public: #define SLP_INSTANCE_GROUP_SIZE(S) (S)->group_size #define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor #define SLP_INSTANCE_LOADS(S) (S)->loads +#define SLP_INSTANCE_ROOT_STMT(S) (S)->root_stmt #define SLP_TREE_CHILDREN(S) (S)->children #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts -- 2.17.1
On Fri, 1 Nov 2019, Joel Hutton wrote: > On 30/10/2019 13:49, Richard Biener wrote: > >> > >> * expr.c (store_constructor): Add case for constructor of > vectors. > > Why do you need this? The vectorizer already creates such CTORs. Any > > testcase that you can show? > > > > typedef long v2di __attribute__((vector_size(16))); > > v2di v; > > void > > foo() > > { > > v = (v2di){v[1], v[0]}; > > } > There is a special case for single element vectors, where the vector > mode and > the element mode are the same. > I've changed this slightly, as your solution of using VECTOR_MODE_P didn't > work for my case where: > emode = E_DImode > eltmode = E_DImode > On aarch64. As E_DImode is not a vector mode. > Based on some checks I found in verify_gimple, I set the type of the > replacement > constructor to the same as the original constructor, as verify_gimple > seems to > expect flattened types. i.e. a vector of vectors of ints should have > type vector > of ints. Huh. On aarch64 for the above testcase I see mode V2DImode and emode = eltmode = DImode. That's just a regular CTOR with non-vector elements so not sure why you need any adjustment at all here? It looks like your vectorization of the CTOR introduces a V2DImode CTOR of vector(1) long elements which incidentially have DImode. That's because somehow V2DI vectorization isn't profitable but V1DI one is. In the end it's a noop transform but the testcase shows that for V2DI vectorization we fail to cost the CTOR in the scalar cost computation (you have to include the pattern root there I guess). Technically we feed valid GIMPLE to the expander so we indeed shouldn't ICE. So I'd like to have the earlier keying on vec_vec_init_p match the later assert we run into, thus sth like Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 277765) +++ gcc/expr.c (working copy) @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, && n_elts.is_constant (&const_n_elts)) { machine_mode emode = eltmode; + bool vector_typed_elts_p = false; if (CONSTRUCTOR_NELTS (exp) && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value)) @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, * TYPE_VECTOR_SUBPARTS (etype), n_elts)); emode = TYPE_MODE (etype); + vector_typed_elts_p = true; } icode = convert_optab_handler (vec_init_optab, mode, emode); if (icode != CODE_FOR_nothing) { unsigned int n = const_n_elts; - if (emode != eltmode) + if (vector_typed_elts_p) { n = CONSTRUCTOR_NELTS (exp); vec_vec_init_p = true; > > > What could be done is "compact" the > > operands to vectorize to only contain SSA names, try to SLP > > match and vectorize them and then combine the vectorized result > > with the constant elements. > > > > Not worth doing I guess unless it's sth regular like > > > > { a, b, c, d, 0, 0, 0, 0 } > > > > or so. But this can be handled as followup if necessary. > > > I agree, it should be possible to support this in future patches. > > > > + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? > > stmt_info->stmt\ > > + : NULL; > > > > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess... > > > > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, > > stmt_vec_info use_stmt_info = vinfo->lookup_stmt > > (use_stmt); > > if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) > > { > > + /* Check this is not a constructor that will be > > vectorized > > + away. */ > > + if (BB_VINFO_GROUPED_STORES (vinfo).contains > > (use_stmt_info)) > > + continue; > > (*life)[i] = true; > > > > ... which you then simply mark as PURE_SLP_STMT where we call > > vect_mark_slp_stmts in vect_slp_analyze_bb_1. > Done. > > > > I see you have the TYPE_VECTOR_SUBPARTS check still at transform > > stage in vectorize_slp_instance_root_stmt, please simply move > > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance > > where you have the other rejects (non-SSA names in the ctor). > > > Done. > > > > > > +bool > > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) > > +{ > > + > > > > extra newline > Fixed. > > > > + /* For vector constructors, the constructor stmt that the SLP tree is > > built > > + from, NULL otherwise. */ > > + gimple *root_stmt; > > > > as said, make this a stmt_vec_info > Done. > > > > + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) > > > > the stmt replacing can be commonized between == 1 and > 1 cases. > Done. > > > > + > > + /* Vectorize the instance root. */ > > + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) > > + && SLP_TREE_VEC_STMTS (node).exists ()) > > + if (!vectorize_slp_instance_root_stmt (node, instance)) > > + { > > > > instance->root == node is always true. Likewise > > SLP_TREE_VEC_STMTS (node).exists (). > Done. > > > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > if (is_a <loop_vec_info> (vinfo)) > > > > the ChangeLog doesn't mention this. I guess this isn't necessary > > unless you fail vectorizing late which you shouldn't. > > > It's necessary to avoid: > > removing stmts twice for constructors of the form {_1,_1,_1,_1} > removing stmts that define ssa names used elsewhere (which > previously wasn't a consideration because the scalar_stmts were stores > to memory, not assignments to ssa names) OK, but the code you are patching is just supposed to remove stores from the final scalar stmt seeds - it doesn't expect any loads there which is probably what happens. I'd instead add a /* Do not CSE the stmts feeding a CTOR, they might have uses outside of the vectorized stmts. */ if (SLP_INSTANCE_ROOT_STMT (instance)) continue; before the loop over SLP_TREE_SCALAR_STMTS. > Updated changelog (and patch) + if (!subparts.is_constant () || !(subparts.to_constant () + == CONSTRUCTOR_NELTS (rhs))) + continue; can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs))) +/* Vectorize the instance root. Return success or failure. */ + +void since it's now void the comment has to be adjusted. + vect_schedule_slp_instance (node, instance, bst_map); this now fits in one line. OK with those changes. Thanks, Richard. > 2019-10-31 Joel Hutton <Joel.Hutton@arm.com> > > * expr.c (store_constructor): Modify to handle single element > vectors. > * tree-vect-slp.c (vect_analyze_slp_instance): Add case for > vector constructors. > (vect_slp_check_for_constructors): New function. > (vect_slp_analyze_bb_1): Call new function to check for vector > constructors. > (vectorize_slp_instance_root_stmt): New function. > (vect_schedule_slp): Call new function to vectorize root stmt > of vector constructors. > * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > > gcc/testsuite/ChangeLog: > > 2019-10-31 Joel Hutton <Joel.Hutton@arm.com> > > * gcc.dg/vect/bb-slp-40.c: New test. > * gcc.dg/vect/bb-slp-41.c: New test. > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
First copy bounced from mailing list. > > On 30/10/2019 13:49, Richard Biener wrote: > > >> > > >> * expr.c (store_constructor): Add case for constructor of > > vectors. > > > Why do you need this? The vectorizer already creates such CTORs. Any > > > testcase that you can show? > > > > > > typedef long v2di __attribute__((vector_size(16))); > > > v2di v; > > > void > > > foo() > > > { > > > v = (v2di){v[1], v[0]}; > > > } > > There is a special case for single element vectors, where the vector > > mode and > > the element mode are the same. > > I've changed this slightly, as your solution of using VECTOR_MODE_P didn't > > work for my case where: > > emode = E_DImode > > eltmode = E_DImode > > On aarch64. As E_DImode is not a vector mode. > > Based on some checks I found in verify_gimple, I set the type of the > > replacement > > constructor to the same as the original constructor, as verify_gimple > > seems to > > expect flattened types. i.e. a vector of vectors of ints should have > > type vector > > of ints. > > Huh. On aarch64 for the above testcase I see mode V2DImode and > emode = eltmode = DImode. That's just a regular CTOR with > non-vector elements so not sure why you need any adjustment at all > here? > > It looks like your vectorization of the CTOR introduces a > V2DImode CTOR of vector(1) long elements which incidentially > have DImode. That's because somehow V2DI vectorization isn't > profitable but V1DI one is. In the end it's a noop transform > but the testcase shows that for V2DI vectorization we fail > to cost the CTOR in the scalar cost computation (you have to > include the pattern root there I guess). > Yes, sorry, I was unclear about this, it's the new constructor that the change is needed for. > Technically we feed valid GIMPLE to the expander so we indeed > shouldn't ICE. So I'd like to have the earlier keying on > vec_vec_init_p match the later assert we run into, thus > sth like > > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c (revision 277765) > +++ gcc/expr.c (working copy) > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, > && n_elts.is_constant (&const_n_elts)) > { > machine_mode emode = eltmode; > + bool vector_typed_elts_p = false; > > if (CONSTRUCTOR_NELTS (exp) > && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, > 0)->value)) > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, > * TYPE_VECTOR_SUBPARTS (etype), > n_elts)); > emode = TYPE_MODE (etype); > + vector_typed_elts_p = true; > } > icode = convert_optab_handler (vec_init_optab, mode, emode); > if (icode != CODE_FOR_nothing) > { > unsigned int n = const_n_elts; > > - if (emode != eltmode) > + if (vector_typed_elts_p) > { > n = CONSTRUCTOR_NELTS (exp); > vec_vec_init_p = true; > > I've used this, thank you. > > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > > if (is_a <loop_vec_info> (vinfo)) > > > > > > the ChangeLog doesn't mention this. I guess this isn't necessary > > > unless you fail vectorizing late which you shouldn't. > > > > > It's necessary to avoid: > > > > removing stmts twice for constructors of the form {_1,_1,_1,_1} > > removing stmts that define ssa names used elsewhere (which > > previously wasn't a consideration because the scalar_stmts were stores > > to memory, not assignments to ssa names) > > OK, but the code you are patching is just supposed to remove stores > from the final scalar stmt seeds - it doesn't expect any loads there > which is probably what happens. I'd instead add a > > /* Do not CSE the stmts feeding a CTOR, they might have uses > outside of the vectorized stmts. */ > if (SLP_INSTANCE_ROOT_STMT (instance)) > continue; > > before the loop over SLP_TREE_SCALAR_STMTS. Done. > + if (!subparts.is_constant () || !(subparts.to_constant () > + == CONSTRUCTOR_NELTS (rhs))) > + continue; > > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs))) Done. > +/* Vectorize the instance root. Return success or failure. */ > + > +void > > since it's now void the comment has to be adjusted. Done. > + vect_schedule_slp_instance (node, > instance, bst_map); > > this now fits in one line. Done. > OK with those changes. Tested and bootstrapped on aarch64. OK? 2019-11-04 Joel Hutton <Joel.Hutton@arm.com> * expr.c (store_constructor): Modify to handle single element vectors. * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. (vect_slp_check_for_constructors): New function. (vect_slp_analyze_bb_1): Call new function to check for vector constructors. (vectorize_slp_instance_root_stmt): New function. (vect_schedule_slp): Call new function to vectorize root stmt of vector constructors. * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. gcc/testsuite/ChangeLog: 2019-11-04 Joel Hutton <Joel.Hutton@arm.com> * gcc.dg/vect/bb-slp-40.c: New test. * gcc.dg/vect/bb-slp-41.c: New test. From a79d17a40f376ee689862f2fecc0d8cbe626aa10 Mon Sep 17 00:00:00 2001 From: Joel Hutton <Joel.Hutton@arm.com> Date: Tue, 22 Oct 2019 10:05:19 +0100 Subject: [PATCH] SLP Vectorization: Vectorize vector constructors --- gcc/expr.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-40.c | 34 +++++++ gcc/testsuite/gcc.dg/vect/bb-slp-41.c | 61 +++++++++++++ gcc/tree-vect-slp.c | 124 +++++++++++++++++++++++++- gcc/tree-vectorizer.h | 5 ++ 5 files changed, 225 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-41.c diff --git a/gcc/expr.c b/gcc/expr.c index 476c6865f20828fc68f455e70d4874eaabd9d08d..12165af21cb0ded943b63e9e503e42e267de8ce5 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, && n_elts.is_constant (&const_n_elts)) { machine_mode emode = eltmode; + bool vector_typed_elts_p = false; if (CONSTRUCTOR_NELTS (exp) && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value)) @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, * TYPE_VECTOR_SUBPARTS (etype), n_elts)); emode = TYPE_MODE (etype); + vector_typed_elts_p = true; } icode = convert_optab_handler (vec_init_optab, mode, emode); if (icode != CODE_FOR_nothing) { unsigned int n = const_n_elts; - if (emode != eltmode) + if (vector_typed_elts_p) { n = CONSTRUCTOR_NELTS (exp); vec_vec_init_p = true; diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c new file mode 100644 index 0000000000000000000000000000000000000000..a1dd372184623f34f8f2825aa5da50dc70c98084 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp-all" } */ +/* { dg-require-effective-target vect_int } */ + +char g_d[1024], g_s1[1024], g_s2[1024]; +void foo(void) +{ + char *d = g_d, *s1 = g_s1, *s2 = g_s2; + + for ( int y = 0; y < 128; y++ ) + { + d[0 ] = s1[0 ] + s2[0 ]; + d[1 ] = s1[1 ] + s2[1 ]; + d[2 ] = s1[2 ] + s2[2 ]; + d[3 ] = s1[3 ] + s2[3 ]; + d[4 ] = s1[4 ] + s2[4 ]; + d[5 ] = s1[5 ] + s2[5 ]; + d[6 ] = s1[6 ] + s2[6 ]; + d[7 ] = s1[7 ] + s2[7 ]; + d[8 ] = s1[8 ] + s2[8 ]; + d[9 ] = s1[9 ] + s2[9 ]; + d[10] = s1[10] + s2[10]; + d[11] = s1[11] + s2[11]; + d[12] = s1[12] + s2[12]; + d[13] = s1[13] + s2[13]; + d[14] = s1[14] + s2[14]; + d[15] = s1[15] + s2[15]; + d += 16; + } +} + +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 1 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c new file mode 100644 index 0000000000000000000000000000000000000000..b4a71241be588744c4c4480eaa8548172e7ddc9d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */ +/* { dg-require-effective-target vect_int } */ + +#define ARR_SIZE 1000 + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + a[i] = b[0] + b[1] + b[i+1] + b[i+2]; +} + +void bar (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = b[0]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+2]; + } +} + +int main () +{ + int a1[ARR_SIZE]; + int a2[ARR_SIZE]; + int b[ARR_SIZE]; + int i; + + for (i = 0; i < ARR_SIZE; i++) + { + a1[i] = 0; + a2[i] = 0; + b[i] = i; + } + + foo (a1, b); + bar (a2, b); + + for (i = 0; i < ARR_SIZE; i++) + if (a1[i] != a2[i]) + return 1; + + return 0; + +} +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 12 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" } } */ diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 4b1a231bb6ddf60b4e943fe9066fb87f97a4993a..5549d053bf3cd102a4c4fcdc2e890c596927bd55 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1993,6 +1993,7 @@ vect_analyze_slp_instance (vec_info *vinfo, unsigned int i; struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); vec<stmt_vec_info> scalar_stmts; + bool constructor = false; if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) { @@ -2006,6 +2007,13 @@ vect_analyze_slp_instance (vec_info *vinfo, vectype = STMT_VINFO_VECTYPE (stmt_info); group_size = REDUC_GROUP_SIZE (stmt_info); } + else if (is_gimple_assign (stmt_info->stmt) + && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR) + { + vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt)); + group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt)); + constructor = true; + } else { gcc_assert (is_a <loop_vec_info> (vinfo)); @@ -2053,6 +2061,25 @@ vect_analyze_slp_instance (vec_info *vinfo, STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ())); } + else if (constructor) + { + tree rhs = gimple_assign_rhs1 (stmt_info->stmt); + tree val; + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) + { + if (TREE_CODE (val) == SSA_NAME) + { + gimple* def = SSA_NAME_DEF_STMT (val); + stmt_vec_info def_info = vinfo->lookup_stmt (def); + /* Value is defined in another basic block. */ + if (!def_info) + return false; + scalar_stmts.safe_push (def_info); + } + else + return false; + } + } else { /* Collect reduction statements. */ @@ -2138,6 +2165,8 @@ vect_analyze_slp_instance (vec_info *vinfo, SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size; SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor; SLP_INSTANCE_LOADS (new_instance) = vNULL; + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info : NULL; + vect_gather_slp_loads (new_instance, node); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -2955,6 +2984,43 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo) return true; } +/* Find any vectorizable constructors and add them to the grouped_store + array. */ + +static void +vect_slp_check_for_constructors (bb_vec_info bb_vinfo) +{ + gimple_stmt_iterator gsi; + + for (gsi = bb_vinfo->region_begin; + gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (is_gimple_assign (stmt) + && gimple_assign_rhs_code (stmt) == CONSTRUCTOR + && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME + && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE) + { + tree rhs = gimple_assign_rhs1 (stmt); + + if (CONSTRUCTOR_NELTS (rhs) == 0) + continue; + + poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)); + + if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs))) + continue; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Found vectorizable constructor: %G\n", stmt); + stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt); + BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info); + } + } +} + /* Check if the region described by BB_VINFO can be vectorized, returning true if so. When returning false, set FATAL to true if the same failure would prevent vectorization at other vector sizes, false if it is still @@ -3002,6 +3068,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal) return false; } + vect_slp_check_for_constructors (bb_vinfo); + /* If there are no grouped stores in the region there is no need to continue with pattern recog as vect_analyze_slp will fail anyway. */ @@ -3058,6 +3126,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal) relevant. */ vect_mark_slp_stmts (SLP_INSTANCE_TREE (instance)); vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance)); + if (SLP_INSTANCE_ROOT_STMT (instance)) + STMT_SLP_TYPE (SLP_INSTANCE_ROOT_STMT (instance)) = pure_slp; i++; } @@ -4074,6 +4144,49 @@ vect_remove_slp_scalar_calls (slp_tree node) vect_remove_slp_scalar_calls (node, visited); } +/* Vectorize the instance root. */ + +void +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) +{ + gassign *rstmt; + + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) + { + stmt_vec_info child_stmt_info; + int j; + + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + tree vect_lhs = gimple_get_lhs (child_stmt_info->stmt); + tree root_lhs = gimple_get_lhs (instance->root_stmt->stmt); + rstmt = gimple_build_assign (root_lhs, vect_lhs); + break; + } + } + else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1) + { + int nelts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); + stmt_vec_info child_stmt_info; + int j; + vec<constructor_elt, va_gc> *v; + vec_alloc (v, nelts); + + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + CONSTRUCTOR_APPEND_ELT (v, + NULL_TREE, + gimple_get_lhs (child_stmt_info->stmt)); + } + tree lhs = gimple_get_lhs (instance->root_stmt->stmt); + tree rtype = TREE_TYPE (gimple_assign_rhs1 (instance->root_stmt->stmt)); + tree r_constructor = build_constructor (rtype, v); + rstmt = gimple_build_assign (lhs, r_constructor); + } + gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt->stmt); + gsi_replace (&rgsi, rstmt, true); +} + /* Generate vector code for all SLP instances in the loop/basic block. */ void @@ -4088,9 +4201,13 @@ vect_schedule_slp (vec_info *vinfo) slp_instances = vinfo->slp_instances; FOR_EACH_VEC_ELT (slp_instances, i, instance) { + slp_tree node = SLP_INSTANCE_TREE (instance); /* Schedule the tree of INSTANCE. */ - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), - instance, bst_map); + vect_schedule_slp_instance (node, instance, bst_map); + + if (SLP_INSTANCE_ROOT_STMT (instance)) + vectorize_slp_instance_root_stmt (node, instance); + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vectorizing stmts using SLP.\n"); @@ -4119,6 +4236,9 @@ vect_schedule_slp (vec_info *vinfo) if (!STMT_VINFO_DATA_REF (store_info)) break; + if (SLP_INSTANCE_ROOT_STMT (instance)) + continue; + store_info = vect_orig_stmt (store_info); /* Free the attached stmt_vec_info and remove the stmt. */ vinfo->remove_stmt (store_info); diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 71b5f380e2c91a7a551f6e26920bb17809abedf0..0fa64357be2c154662378af85642632aac50c523 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -152,6 +152,10 @@ public: /* The root of SLP tree. */ slp_tree root; + /* For vector constructors, the constructor stmt that the SLP tree is built + from, NULL otherwise. */ + stmt_vec_info root_stmt; + /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s. */ unsigned int group_size; @@ -171,6 +175,7 @@ public: #define SLP_INSTANCE_GROUP_SIZE(S) (S)->group_size #define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor #define SLP_INSTANCE_LOADS(S) (S)->loads +#define SLP_INSTANCE_ROOT_STMT(S) (S)->root_stmt #define SLP_TREE_CHILDREN(S) (S)->children #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts -- 2.17.1
On November 4, 2019 4:30:57 PM GMT+01:00, Joel Hutton <Joel.Hutton@arm.com> wrote: >First copy bounced from mailing list. > > > On 30/10/2019 13:49, Richard Biener wrote: > > > >> >> > >> * expr.c (store_constructor): Add case for constructor >of > > > vectors. > > > > Why do you need this? The vectorizer already creates such >CTORs. Any > > > > testcase that you can show? > > > > > > > > typedef long v2di __attribute__((vector_size(16))); > > > > v2di v; > > > > void > > > > foo() > > > > { > > > > v = (v2di){v[1], v[0]}; > > > > } >> > There is a special case for single element vectors, where the >vector > > > mode and > > > the element mode are the same. >> > I've changed this slightly, as your solution of using VECTOR_MODE_P > >didn't > > > work for my case where: > > > emode = E_DImode > > > eltmode = E_DImode > > > On aarch64. As E_DImode is not a vector mode. >> > Based on some checks I found in verify_gimple, I set the type of >the > > > replacement >> > constructor to the same as the original constructor, as >verify_gimple > > > seems to >> > expect flattened types. i.e. a vector of vectors of ints should >have > > > type vector > > > of ints. > > > > Huh. On aarch64 for the above testcase I see mode V2DImode and > > emode = eltmode = DImode. That's just a regular CTOR with > > non-vector elements so not sure why you need any adjustment at all > > here? > > > > It looks like your vectorization of the CTOR introduces a > > V2DImode CTOR of vector(1) long elements which incidentially > > have DImode. That's because somehow V2DI vectorization isn't > > profitable but V1DI one is. In the end it's a noop transform > > but the testcase shows that for V2DI vectorization we fail > > to cost the CTOR in the scalar cost computation (you have to > > include the pattern root there I guess). > > >Yes, sorry, I was unclear about this, it's the new constructor that the > >change is needed for. > > > Technically we feed valid GIMPLE to the expander so we indeed > > shouldn't ICE. So I'd like to have the earlier keying on > > vec_vec_init_p match the later assert we run into, thus > > sth like > > > > Index: gcc/expr.c > > =================================================================== > > --- gcc/expr.c (revision 277765) > > +++ gcc/expr.c (working copy) > > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, > > && n_elts.is_constant (&const_n_elts)) > > { > > machine_mode emode = eltmode; > > + bool vector_typed_elts_p = false; > > > > if (CONSTRUCTOR_NELTS (exp) > > && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, > > 0)->value)) > > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, >> * TYPE_VECTOR_SUBPARTS (etype), > > n_elts)); > > emode = TYPE_MODE (etype); > > + vector_typed_elts_p = true; > > } >> icode = convert_optab_handler (vec_init_optab, mode, >emode); > > if (icode != CODE_FOR_nothing) > > { > > unsigned int n = const_n_elts; > > > > - if (emode != eltmode) > > + if (vector_typed_elts_p) > > { > > n = CONSTRUCTOR_NELTS (exp); > > vec_vec_init_p = true; > > > > >I've used this, thank you. > > > > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > > > if (is_a <loop_vec_info> (vinfo)) > > > > >> > > the ChangeLog doesn't mention this. I guess this isn't >necessary > > > > unless you fail vectorizing late which you shouldn't. > > > > > > > It's necessary to avoid: > > > > > > removing stmts twice for constructors of the form >{_1,_1,_1,_1} >> > removing stmts that define ssa names used elsewhere (which >> > previously wasn't a consideration because the scalar_stmts were >stores > > > to memory, not assignments to ssa names) > > > > OK, but the code you are patching is just supposed to remove stores > > from the final scalar stmt seeds - it doesn't expect any loads there > > which is probably what happens. I'd instead add a > > > > /* Do not CSE the stmts feeding a CTOR, they might have uses > > outside of the vectorized stmts. */ > > if (SLP_INSTANCE_ROOT_STMT (instance)) > > continue; > > > > before the loop over SLP_TREE_SCALAR_STMTS. >Done. > > > + if (!subparts.is_constant () || !(subparts.to_constant () >> + == CONSTRUCTOR_NELTS >(rhs))) > > + continue; > > > > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS >(rhs))) >Done. > > > +/* Vectorize the instance root. Return success or failure. */ > > + > > +void > > > > since it's now void the comment has to be adjusted. >Done. > > > + vect_schedule_slp_instance (node, > > instance, bst_map); > > > > this now fits in one line. >Done. > > > OK with those changes. >Tested and bootstrapped on aarch64. >OK? Ok. Thanks, Richard. >2019-11-04 Joel Hutton <Joel.Hutton@arm.com> > > * expr.c (store_constructor): Modify to handle single element >vectors. > * tree-vect-slp.c (vect_analyze_slp_instance): Add case for >vector constructors. > (vect_slp_check_for_constructors): New function. > (vect_slp_analyze_bb_1): Call new function to check for vector >constructors. > (vectorize_slp_instance_root_stmt): New function. > (vect_schedule_slp): Call new function to vectorize root stmt >of vector constructors. > * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > >gcc/testsuite/ChangeLog: > >2019-11-04 Joel Hutton <Joel.Hutton@arm.com> > > * gcc.dg/vect/bb-slp-40.c: New test. > * gcc.dg/vect/bb-slp-41.c: New test.
From 902510bd498acfc9e30636f8267b57027bc63254 Mon Sep 17 00:00:00 2001 From: Joel Hutton <Joel.Hutton@arm.com> Date: Tue, 22 Oct 2019 10:05:19 +0100 Subject: [PATCH] SLP Vectorization: Vectorize vector constructors --- gcc/expr.c | 5 +- gcc/testsuite/gcc.dg/vect/bb-slp-40.c | 34 +++++ gcc/testsuite/gcc.dg/vect/bb-slp-41.c | 61 +++++++++ gcc/tree-vect-slp.c | 171 +++++++++++++++++++++++++- gcc/tree-vectorizer.h | 5 + 5 files changed, 273 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-41.c diff --git a/gcc/expr.c b/gcc/expr.c index 476c6865f20828fc68f455e70d4874eaabd9d08d..ea4705e47c7e8424cca5e19319ef21db415f496a 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -6825,7 +6825,10 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, { unsigned int n = const_n_elts; - if (emode != eltmode) + if (emode != eltmode + || (TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE + && TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT(exp, 0)->value)) + == VECTOR_TYPE)) { n = CONSTRUCTOR_NELTS (exp); vec_vec_init_p = true; diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c new file mode 100644 index 0000000000000000000000000000000000000000..a1dd372184623f34f8f2825aa5da50dc70c98084 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp-all" } */ +/* { dg-require-effective-target vect_int } */ + +char g_d[1024], g_s1[1024], g_s2[1024]; +void foo(void) +{ + char *d = g_d, *s1 = g_s1, *s2 = g_s2; + + for ( int y = 0; y < 128; y++ ) + { + d[0 ] = s1[0 ] + s2[0 ]; + d[1 ] = s1[1 ] + s2[1 ]; + d[2 ] = s1[2 ] + s2[2 ]; + d[3 ] = s1[3 ] + s2[3 ]; + d[4 ] = s1[4 ] + s2[4 ]; + d[5 ] = s1[5 ] + s2[5 ]; + d[6 ] = s1[6 ] + s2[6 ]; + d[7 ] = s1[7 ] + s2[7 ]; + d[8 ] = s1[8 ] + s2[8 ]; + d[9 ] = s1[9 ] + s2[9 ]; + d[10] = s1[10] + s2[10]; + d[11] = s1[11] + s2[11]; + d[12] = s1[12] + s2[12]; + d[13] = s1[13] + s2[13]; + d[14] = s1[14] + s2[14]; + d[15] = s1[15] + s2[15]; + d += 16; + } +} + +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 1 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c new file mode 100644 index 0000000000000000000000000000000000000000..618d1d0ef3f6da93b53df62296620d589ad2e21a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */ +/* { dg-require-effective-target vect_int } */ + +#define ARR_SIZE 1000 + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + a[i] = b[0] + b[1] + b[i+1] + b[i+2]; +} + +void bar (int *a, int *b) +{ + int i; + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = b[0]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+1]; + } + for (i = 0; i < (ARR_SIZE - 2); ++i) + { + a[i] = a[i] + b[i+2]; + } +} + +int main () +{ + int a1[ARR_SIZE]; + int a2[ARR_SIZE]; + int b[ARR_SIZE]; + int i; + + for (i = 0; i < ARR_SIZE; i++) + { + a1[i] = 0; + a2[i] = 0; + b[i] = i; + } + + foo (a1, b); + bar (a2, b); + + for (i = 0; i < ARR_SIZE; i++) + if (a1[i] != a2[i]) + return 1; + + return 0; + +} +/* See that we vectorize an SLP instance. */ +/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 4 "slp1" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp1" } } */ diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index e1061ede061751f5fd6c55edb56671a854e7456e..be7bb27899da06d22ddc4833b3e90a2f8c000177 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1983,6 +1983,7 @@ vect_analyze_slp_instance (vec_info *vinfo, unsigned int i; struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); vec<stmt_vec_info> scalar_stmts; + bool constructor = false; if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) { @@ -1996,6 +1997,13 @@ vect_analyze_slp_instance (vec_info *vinfo, vectype = STMT_VINFO_VECTYPE (stmt_info); group_size = REDUC_GROUP_SIZE (stmt_info); } + else if (is_gimple_assign (stmt_info->stmt) + && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR) + { + vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt)); + group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt)); + constructor = true; + } else { gcc_assert (is_a <loop_vec_info> (vinfo)); @@ -2042,6 +2050,25 @@ vect_analyze_slp_instance (vec_info *vinfo, STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ())); } + else if (constructor) + { + tree rhs = gimple_assign_rhs1 (stmt_info->stmt); + tree val; + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) + { + if (TREE_CODE (val) == SSA_NAME) + { + gimple* def = SSA_NAME_DEF_STMT (val); + stmt_vec_info def_info = vinfo->lookup_stmt (def); + /* Value is defined in another basic block. */ + if (!def_info) + return false; + scalar_stmts.safe_push (def_info); + } + else + return false; + } + } else { /* Collect reduction statements. */ @@ -2099,6 +2126,8 @@ vect_analyze_slp_instance (vec_info *vinfo, SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size; SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor; SLP_INSTANCE_LOADS (new_instance) = vNULL; + SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info->stmt\ + : NULL; vect_gather_slp_loads (new_instance, node); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb, stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt); if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) { + /* Check this is not a constructor that will be vectorized + away. */ + if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info)) + continue; (*life)[i] = true; BREAK_FROM_IMM_USE_STMT (use_iter); } @@ -2912,6 +2945,38 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo) return true; } +/* Find any vectorizable constructors, and add them to the grouped_store + array. */ + +static void +vect_slp_check_for_constructors (bb_vec_info bb_vinfo) +{ + gimple_stmt_iterator gsi; + + for (gsi = bb_vinfo->region_begin; + gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (is_gimple_assign (stmt) + && gimple_assign_rhs_code (stmt) == CONSTRUCTOR + && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME + && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE) + { + tree rhs = gimple_assign_rhs1 (stmt); + + if (CONSTRUCTOR_NELTS (rhs) == 0) + continue; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Found vectorizable constructor: %G\n", stmt); + stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt); + BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info); + } + } +} + /* Check if the region described by BB_VINFO can be vectorized, returning true if so. When returning false, set FATAL to true if the same failure would prevent vectorization at other vector sizes, false if it is still @@ -2959,6 +3024,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal) return false; } + vect_slp_check_for_constructors (bb_vinfo); + /* If there are no grouped stores in the region there is no need to continue with pattern recog as vect_analyze_slp will fail anyway. */ @@ -4022,6 +4089,68 @@ vect_remove_slp_scalar_calls (slp_tree node) vect_remove_slp_scalar_calls (node, visited); } +/* Vectorize the instance root. Return success or failure. */ + +bool +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) +{ + + if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1) + { + stmt_vec_info child_stmt_info; + int j; + tree constructor; + constructor = gimple_assign_rhs1 (SLP_INSTANCE_ROOT_STMT (instance)); + + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + tree vect_lhs = gimple_get_lhs (child_stmt_info->stmt); + gassign *rstmt + = gimple_build_assign (gimple_get_lhs (instance->root_stmt), + gimple_get_lhs (child_stmt_info->stmt)); + gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt); + if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect_lhs)).is_constant () + && (CONSTRUCTOR_NELTS (constructor) + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect_lhs)).to_constant ())) + { + gsi_replace (&rgsi, rstmt, true); + SLP_INSTANCE_ROOT_STMT (instance) = NULL; + } + break; + } + } + else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1) + { + int nelts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); + stmt_vec_info child_stmt_info; + int j; + vec<constructor_elt, va_gc> *v; + vec_alloc (v, nelts); + FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info) + { + CONSTRUCTOR_APPEND_ELT (v, + NULL_TREE, + gimple_get_lhs (child_stmt_info->stmt)); + } + tree type = TREE_TYPE (gimple_assign_rhs1 (instance->root_stmt)); + tree r_constructor = build_constructor (type, v); + gassign *rstmt + = gimple_build_assign (gimple_get_lhs (instance->root_stmt), + r_constructor); + gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt); + gsi_replace (&rgsi, rstmt, true); + SLP_INSTANCE_ROOT_STMT (instance) = NULL; + } + else + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vectorization of constructor failed.\n"); + return false; + } + return true; +} + /* Generate vector code for all SLP instances in the loop/basic block. */ void @@ -4036,9 +4165,20 @@ vect_schedule_slp (vec_info *vinfo) slp_instances = vinfo->slp_instances; FOR_EACH_VEC_ELT (slp_instances, i, instance) { + slp_tree node = SLP_INSTANCE_TREE (instance); /* Schedule the tree of INSTANCE. */ - vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), + vect_schedule_slp_instance (node, instance, bst_map); + + /* Vectorize the instance root. */ + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) + && SLP_TREE_VEC_STMTS (node).exists ()) + if (!vectorize_slp_instance_root_stmt (node, instance)) + { + delete bst_map; + return; + } + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vectorizing stmts using SLP.\n"); @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) if (is_a <loop_vec_info> (vinfo)) vect_remove_slp_scalar_calls (root); + auto_vec<stmt_vec_info> removed; + for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info) && j < SLP_INSTANCE_GROUP_SIZE (instance); j++) { if (!STMT_VINFO_DATA_REF (store_info)) break; + if (removed.contains (store_info)) + continue; + store_info = vect_orig_stmt (store_info); + tree lhs = gimple_get_lhs (store_info->stmt); + bool used = false; + /* We must ensure we don't remove def stmts for ssa names used + elsewhere. */ + if (TREE_CODE (lhs) == SSA_NAME) + { + gimple *use_stmt; + imm_use_iterator use_iter; + + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs) + { + if (use_stmt != store_info->stmt) + { + used = true; + BREAK_FROM_IMM_USE_STMT (use_iter); + } + } + } + /* Free the attached stmt_vec_info and remove the stmt. */ - vinfo->remove_stmt (store_info); + + if (!used) + vinfo->remove_stmt (store_info); + removed.safe_push (store_info); } } } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -151,6 +151,10 @@ public: /* The root of SLP tree. */ slp_tree root; + /* For vector constructors, the constructor stmt that the SLP tree is built + from, NULL otherwise. */ + gimple *root_stmt; + /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s. */ unsigned int group_size; @@ -170,6 +174,7 @@ public: #define SLP_INSTANCE_GROUP_SIZE(S) (S)->group_size #define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor #define SLP_INSTANCE_LOADS(S) (S)->loads +#define SLP_INSTANCE_ROOT_STMT(S) (S)->root_stmt #define SLP_TREE_CHILDREN(S) (S)->children #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts -- 2.17.1