Message ID | 87pobvr5t8.fsf@linaro.org |
---|---|
State | New |
Richard Sandiford <richard.sandiford@linaro.org> writes:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>     DR_BASE_ADDRESS: p or q
>     DR_OFFSET:       0
>     DR_INIT:         0 or 4
>     DR_STEP:         16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>     DR_BASE_ADDRESS: p or q
>     DR_OFFSET:       (intptr_t) i
>     DR_INIT:         0 or 4
>     DR_STEP:         0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that split_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair uses
> chrecs rather than generic exprs for the offset part, with the chrec
> being analysed in the innermost loop for which the offset isn't
> invariant.  This allows us to vectorise the second function in the
> testcase as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> 	PR tree-optimization/81635
> 	* tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
> 	chrec_init.
> 	(DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
> 	(dr_equal_offsets_p): Delete.
> 	(dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
> 	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> 	(split_constant_offset): Handle POLYNOMIAL_CHREC.
> 	(dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
> 	(operator ==): Use dr_chrec_offsets_equal_p and compare the
> 	DR_CHREC_INITs.
> 	(prune_runtime_alias_test_list): Likewise.
> 	(comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
> 	the DR_CHREC_INITs.
> 	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> 	(analyze_offset_scev): New function.
> 	(compute_offset_chrecs): Likewise.
> 	(dr_chrec_offsets_equal_p): Likewise.
> 	(dr_chrec_offsets_compare): Likewise.
> 	* tree-loop-distribution.c (compute_alias_check_pairs): Use
> 	dr_chrec_offsets_compare.
> 	* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
> 	dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
> 	(dr_group_sort_cmp): Likewise.
> 	(vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
> 	(vect_no_alias_p): Likewise.
> 	(vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
> 	compare the DR_CHREC_INITs.
> 	(vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
> 	* tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
> 	of DR_INIT.
>
> gcc/testsuite/
> 	PR tree-optimization/81635
> 	* gcc.dg/vect/bb-slp-pr81635.c: New test.

Sorry, forgot I was going to convert this to a context diff...
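[For reference, the first loop under discussion comes from the new
testcase at the end of the patch (gcc.dg/vect/bb-slp-pr81635.c):]

double p[1000];
double q[1000];

void
f1 (void)
{
  for (unsigned int i = 0; i < 1000; i += 4)
    {
      double a = q[i] + p[i];          /* the [i] accesses */
      double b = q[i + 1] + p[i + 1];  /* the [i + 1] accesses whose
					  "+ 1" fails to split off */
      q[i] = a;
      q[i + 1] = b;
    }
}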
Index: gcc/tree-data-ref.h
===================================================================
*** gcc/tree-data-ref.h	2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.h	2017-08-16 14:34:56.611554296 +0100
*************** struct innermost_loop_behavior
*** 52,57 ****
--- 52,78 ----
    tree init;
    tree step;
  
+   /* The scalar evolution of OFFSET + INIT, split into non-constant and
+      constant terms in the same way as OFFSET and INIT.  For example,
+      if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be:
+ 
+ 	chrec_offset	{x, +, 3}_1
+ 	chrec_init	4
+ 
+      If OFFSET + INIT cannot be represented as a scalar evolution,
+      the fields are simply a copy of OFFSET and INIT.
+ 
+      This split is useful when analyzing the relationship between two
+      data references with the same base.  OFFSET and INIT are generic
+      expressions that can be used to build related data references,
+      while CHREC_OFFSET and CHREC_INIT are our best attempt at
+      canonicalizing the value of OFFSET + INIT.
+ 
+      These fields are only initialized lazily, by a call to
+      compute_offset_chrecs.  */
+   tree chrec_offset;
+   tree chrec_init;
+ 
    /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
       from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
       if we had:
*************** #define DR_IS_CONDITIONAL_IN_STMT(DR) (D
*** 187,192 ****
--- 208,215 ----
  #define DR_BASE_ADDRESS(DR)        (DR)->innermost.base_address
  #define DR_OFFSET(DR)              (DR)->innermost.offset
  #define DR_INIT(DR)                (DR)->innermost.init
+ #define DR_CHREC_OFFSET(DR)        (DR)->innermost.chrec_offset
+ #define DR_CHREC_INIT(DR)          (DR)->innermost.chrec_init
  #define DR_STEP(DR)                (DR)->innermost.step
  #define DR_PTR_INFO(DR)            (DR)->alias.ptr_info
  #define DR_BASE_ALIGNMENT(DR)      (DR)->innermost.base_alignment
*************** dr_alignment (data_reference *dr)
*** 466,473 ****
  extern bool dr_may_alias_p (const struct data_reference *,
  			    const struct data_reference *, bool);
! extern bool dr_equal_offsets_p (struct data_reference *,
! 				struct data_reference *);
  
  extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
  extern int data_ref_compare_tree (tree, tree);
--- 489,498 ----
  extern bool dr_may_alias_p (const struct data_reference *,
  			    const struct data_reference *, bool);
! extern bool dr_chrec_offsets_equal_p (struct data_reference *,
! 				      struct data_reference *);
! extern int dr_chrec_offsets_compare (struct data_reference *,
! 				     struct data_reference *);
  
  extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
  extern int data_ref_compare_tree (tree, tree);
Index: gcc/tree-data-ref.c
===================================================================
*** gcc/tree-data-ref.c	2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.c	2017-08-16 14:34:56.611554296 +0100
*************** Software Foundation; either version 3, o
*** 86,91 ****
--- 86,92 ----
  #include "expr.h"
  #include "gimple-iterator.h"
  #include "tree-ssa-loop-niter.h"
+ #include "tree-ssa-loop-ivopts.h"
  #include "tree-ssa-loop.h"
  #include "tree-ssa.h"
  #include "cfgloop.h"
*************** split_constant_offset (tree exp, tree *v
*** 730,736 ****
    *off = ssize_int (0);
    STRIP_NOPS (exp);
  
!   if (tree_is_chrec (exp)
        || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS)
      return;
  
--- 731,749 ----
    *off = ssize_int (0);
    STRIP_NOPS (exp);
  
!   if (TREE_CODE (exp) == POLYNOMIAL_CHREC)
!     {
!       split_constant_offset (CHREC_LEFT (exp), &op0, &op1);
!       if (op0 != CHREC_LEFT (exp))
! 	{
! 	  *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp),
! 			 op0, CHREC_RIGHT (exp));
! 	  *off = op1;
! 	}
!       return;
!     }
! 
!   if (automatically_generated_chrec_p (exp)
        || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS)
      return;
  
*************** dr_analyze_innermost (innermost_loop_beh
*** 924,929 ****
--- 937,944 ----
    drb->offset = fold_convert (ssizetype, offset_iv.base);
    drb->init = init;
    drb->step = step;
+   drb->chrec_offset = NULL_TREE;
+   drb->chrec_init = NULL_TREE;
    drb->base_alignment = base_alignment;
    drb->base_misalignment = base_misalignment & (base_alignment - 1);
    drb->offset_alignment = highest_pow2_factor (offset_iv.base);
*************** runtime_alias_check_p (ddr_p ddr, struct
*** 1324,1334 ****
  operator == (const dr_with_seg_len& d1,
  	     const dr_with_seg_len& d2)
  {
!   return operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0)
! 	 && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
! 	 && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
! 	 && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
  }
  
  /* Comparison function for sorting objects of dr_with_seg_len_pair_t
--- 1339,1350 ----
  operator == (const dr_with_seg_len& d1,
  	     const dr_with_seg_len& d2)
  {
!   return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0)
! 	  && dr_chrec_offsets_equal_p (d1.dr, d2.dr)
! 	  && data_ref_compare_tree (DR_CHREC_INIT (d1.dr),
! 				    DR_CHREC_INIT (d2.dr)) == 0
! 	  && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0);
  }
  
  /* Comparison function for sorting objects of dr_with_seg_len_pair_t
*************** comp_dr_with_seg_len_pair (const void *p
*** 1360,1376 ****
    if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
  					 DR_STEP (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr),
! 					 DR_OFFSET (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr),
! 					 DR_INIT (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr),
! 					 DR_OFFSET (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr),
! 					 DR_INIT (b2.dr))) != 0)
      return comp_res;
  
    return 0;
--- 1376,1390 ----
    if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
  					 DR_STEP (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = dr_chrec_offsets_compare (a1.dr, b1.dr)) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a1.dr),
! 					 DR_CHREC_INIT (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = dr_chrec_offsets_compare (a2.dr, b2.dr)) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a2.dr),
! 					 DR_CHREC_INIT (b2.dr))) != 0)
      return comp_res;
  
    return 0;
*************** prune_runtime_alias_test_list (vec<dr_wi
*** 1455,1464 ****
        if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
  			    DR_BASE_ADDRESS (dr_a2->dr), 0)
! 	  || !operand_equal_p (DR_OFFSET (dr_a1->dr),
! 			       DR_OFFSET (dr_a2->dr), 0)
! 	  || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
! 	  || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
  	continue;
  
        /* Only merge const step data references.  */
--- 1469,1477 ----
        if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
  			    DR_BASE_ADDRESS (dr_a2->dr), 0)
! 	  || !dr_chrec_offsets_equal_p (dr_a1->dr, dr_a2->dr)
! 	  || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a1->dr))
! 	  || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a2->dr)))
  	continue;
  
        /* Only merge const step data references.  */
*************** prune_runtime_alias_test_list (vec<dr_wi
*** 1484,1494 ****
  	continue;
  
        /* Make sure dr_a1 starts left of dr_a2.  */
!       if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
  	std::swap (*dr_a1, *dr_a2);
  
        bool do_remove = false;
!       wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr));
        wide_int min_seg_len_b;
        tree new_seg_len;
--- 1497,1509 ----
  	continue;
  
        /* Make sure dr_a1 starts left of dr_a2.  */
!       if (tree_int_cst_lt (DR_CHREC_INIT (dr_a2->dr),
! 			   DR_CHREC_INIT (dr_a1->dr)))
  	std::swap (*dr_a1, *dr_a2);
  
        bool do_remove = false;
!       wide_int diff = wi::sub (DR_CHREC_INIT (dr_a2->dr),
! 			       DR_CHREC_INIT (dr_a1->dr));
        wide_int min_seg_len_b;
        tree new_seg_len;
*************** create_runtime_alias_checks (struct loop
*** 1826,1871 ****
      }
  }
  
! /* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical
!    expressions.  */
! static bool
! dr_equal_offsets_p1 (tree offset1, tree offset2)
  {
!   bool res;
! 
!   STRIP_NOPS (offset1);
!   STRIP_NOPS (offset2);
! 
!   if (offset1 == offset2)
!     return true;
! 
!   if (TREE_CODE (offset1) != TREE_CODE (offset2)
!       || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1)))
!     return false;
! 
!   res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0),
! 			     TREE_OPERAND (offset2, 0));
! 
!   if (!res || !BINARY_CLASS_P (offset1))
!     return res;
! 
!   res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1),
! 			     TREE_OPERAND (offset2, 1));
! 
!   return res;
  }
  
! /* Check if DRA and DRB have equal offsets.  */
! bool
! dr_equal_offsets_p (struct data_reference *dra,
! 		    struct data_reference *drb)
! {
!   tree offset1, offset2;
! 
!   offset1 = DR_OFFSET (dra);
!   offset2 = DR_OFFSET (drb);
! 
!   return dr_equal_offsets_p1 (offset1, offset2);
  }
  
  /* Returns true if FNA == FNB.  */
--- 1841,1923 ----
      }
  }
  
! /* Analyze the scalar evolution of OFFSET in the innermost parent of
!    LOOP for which it isn't invariant.  */
! 
! static tree
! analyze_offset_scev (struct loop *loop, tree offset)
  {
!   struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset);
!   if (inv_loop != NULL)
!     {
!       if (loop_depth (inv_loop) == 0)
! 	return offset;
!       loop = loop_outer (inv_loop);
!     }
!   return analyze_scalar_evolution (loop, offset);
! }
! 
! /* Analyze the scalar evolution of DRB->offset + DRB->init in a form
!    that is valid for use in LOOP.  Store the result DRB->chrec_offset
!    and DRB->chrec_init.  */
! 
! static void
! compute_offset_chrecs (struct loop *loop, innermost_loop_behavior *drb)
! {
!   tree scev = analyze_offset_scev (loop, drb->offset);
!   if (TREE_CODE (scev) == POLYNOMIAL_CHREC)
!     {
!       tree init_term;
!       split_constant_offset (scev, &drb->chrec_offset, &init_term);
!       drb->chrec_init = fold_build2 (PLUS_EXPR, ssizetype, init_term,
! 				     drb->init);
!     }
!   else
!     {
!       drb->chrec_offset = drb->offset;
!       drb->chrec_init = drb->init;
!     }
! }
! 
! /* Analyze the scalar evolution of DR_OFFSET (DR) + DR_INIT (DR) wrt
!    the loop that contains DR_STMT (DR), storing the result in
!    DR_CHREC_OFFSET (DR) and DR_CHREC_INIT (DR).  */
! 
! static void
! compute_offset_chrecs (data_reference *dr)
! {
!   return compute_offset_chrecs (loop_containing_stmt (DR_STMT (dr)),
! 				&DR_INNERMOST (dr));
! }
! 
! /* Check if DRA and DRB have equal DR_CHREC_OFFSETs, computing them
!    first if necessary.  */
! 
! bool
! dr_chrec_offsets_equal_p (struct data_reference *dra,
! 			  struct data_reference *drb)
! {
!   if (!DR_CHREC_OFFSET (dra))
!     compute_offset_chrecs (dra);
!   if (!DR_CHREC_OFFSET (drb))
!     compute_offset_chrecs (drb);
!   return eq_evolutions_p (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb));
  }
  
! /* Compare the DR_CHREC_OFFSETs of DRA and DRB for sorting purposes,
!    computing them first if necessary.  */
! 
! int
! dr_chrec_offsets_compare (struct data_reference *dra,
! 			  struct data_reference *drb)
! {
!   if (!DR_CHREC_OFFSET (dra))
!     compute_offset_chrecs (dra);
!   if (!DR_CHREC_OFFSET (drb))
!     compute_offset_chrecs (drb);
!   return data_ref_compare_tree (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb));
  }
  
  /* Returns true if FNA == FNB.  */
Index: gcc/tree-loop-distribution.c
===================================================================
*** gcc/tree-loop-distribution.c	2017-08-16 08:50:32.415427038 +0100
--- gcc/tree-loop-distribution.c	2017-08-16 14:34:56.612554255 +0100
*************** compute_alias_check_pairs (struct loop *
*** 2204,2210 ****
  					  DR_BASE_ADDRESS (dr_b));
        if (comp_res == 0)
! 	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
        gcc_assert (comp_res != 0);
  
        if (latch_dominated_by_data_ref (loop, dr_a))
--- 2204,2210 ----
  					  DR_BASE_ADDRESS (dr_b));
        if (comp_res == 0)
! 	comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
        gcc_assert (comp_res != 0);
  
        if (latch_dominated_by_data_ref (loop, dr_a))
Index: gcc/tree-vect-data-refs.c
===================================================================
*** gcc/tree-vect-data-refs.c	2017-08-04 11:42:45.939105152 +0100
--- gcc/tree-vect-data-refs.c	2017-08-16 14:34:56.613554213 +0100
*************** vect_find_same_alignment_drs (struct dat
*** 2187,2199 ****
      return;
  
    if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
!       || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0)
        || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
      return;
  
    /* Two references with distance zero have the same alignment.  */
!   offset_int diff = (wi::to_offset (DR_INIT (dra))
! 		     - wi::to_offset (DR_INIT (drb)));
    if (diff != 0)
      {
        /* Get the wider of the two alignments.  */
--- 2187,2199 ----
      return;
  
    if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
!       || !dr_chrec_offsets_equal_p (dra, drb)
        || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
      return;
  
    /* Two references with distance zero have the same alignment.  */
!   offset_int diff = (wi::to_offset (DR_CHREC_INIT (dra))
! 		     - wi::to_offset (DR_CHREC_INIT (drb)));
    if (diff != 0)
      {
        /* Get the wider of the two alignments.  */
*************** vect_analyze_group_access_1 (struct data
*** 2434,2440 ****
    gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
    struct data_reference *data_ref = dr;
    unsigned int count = 1;
!   tree prev_init = DR_INIT (data_ref);
    gimple *prev = stmt;
    HOST_WIDE_INT diff, gaps = 0;
--- 2434,2440 ----
    gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
    struct data_reference *data_ref = dr;
    unsigned int count = 1;
!   tree prev_init = DR_CHREC_INIT (data_ref);
    gimple *prev = stmt;
    HOST_WIDE_INT diff, gaps = 0;
*************** vect_analyze_group_access_1 (struct data
*** 2444,2452 ****
  	 data-ref (supported only for loads), we vectorize only the first
  	 stmt, and the rest get their vectorized loads from the first
  	 one.  */
!       if (!tree_int_cst_compare (DR_INIT (data_ref),
! 				 DR_INIT (STMT_VINFO_DATA_REF (
! 						   vinfo_for_stmt (next)))))
  	{
  	  if (DR_IS_WRITE (data_ref))
  	    {
--- 2444,2452 ----
  	 data-ref (supported only for loads), we vectorize only the first
  	 stmt, and the rest get their vectorized loads from the first
  	 one.  */
!       if (!tree_int_cst_compare (DR_CHREC_INIT (data_ref),
! 				 DR_CHREC_INIT (STMT_VINFO_DATA_REF
! 						(vinfo_for_stmt (next)))))
  	{
  	  if (DR_IS_WRITE (data_ref))
  	    {
*************** vect_analyze_group_access_1 (struct data
*** 2476,2482 ****
  	  /* Check that the distance between two accesses is equal to the type
  	     size.  Otherwise, we have gaps.  */
! 	  diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
  		  - TREE_INT_CST_LOW (prev_init)) / type_size;
  	  if (diff != 1)
  	    {
--- 2476,2482 ----
  	  /* Check that the distance between two accesses is equal to the type
  	     size.  Otherwise, we have gaps.  */
! 	  diff = (TREE_INT_CST_LOW (DR_CHREC_INIT (data_ref))
  		  - TREE_INT_CST_LOW (prev_init)) / type_size;
  	  if (diff != 1)
  	    {
*************** vect_analyze_group_access_1 (struct data
*** 2499,2505 ****
  	     gap in the access, GROUP_GAP is always 1.  */
  	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
  
! 	  prev_init = DR_INIT (data_ref);
  	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
  	  /* Count the number of data-refs in the chain.  */
  	  count++;
--- 2499,2505 ----
  	     gap in the access, GROUP_GAP is always 1.  */
  	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
  
! 	  prev_init = DR_CHREC_INIT (data_ref);
  	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
  	  /* Count the number of data-refs in the chain.  */
  	  count++;
*************** dr_group_sort_cmp (const void *dra_, con
*** 2715,2727 ****
        return cmp;
      }
  
!   /* And according to DR_OFFSET.  */
!   if (!dr_equal_offsets_p (dra, drb))
!     {
!       cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
!       if (cmp != 0)
! 	return cmp;
!     }
  
    /* Put reads before writes.  */
    if (DR_IS_READ (dra) != DR_IS_READ (drb))
--- 2715,2724 ----
        return cmp;
      }
  
!   /* And according to DR_CHREC_OFFSET.  */
!   cmp = dr_chrec_offsets_compare (dra, drb);
!   if (cmp != 0)
!     return cmp;
  
    /* Put reads before writes.  */
    if (DR_IS_READ (dra) != DR_IS_READ (drb))
*************** dr_group_sort_cmp (const void *dra_, con
*** 2745,2752 ****
        return cmp;
      }
  
!   /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
!   cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
    if (cmp == 0)
      return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
    return cmp;
--- 2742,2750 ----
        return cmp;
      }
  
!   /* Then sort after DR_CHREC_INIT.  In case of identical DRs sort after
!      stmt UID.  */
!   cmp = tree_int_cst_compare (DR_CHREC_INIT (dra), DR_CHREC_INIT (drb));
    if (cmp == 0)
      return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
    return cmp;
*************** vect_analyze_data_ref_accesses (vec_info
*** 2817,2823 ****
  	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
  	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
  				   DR_BASE_ADDRESS (drb), 0)
! 	      || !dr_equal_offsets_p (dra, drb)
  	      || !gimple_assign_single_p (DR_STMT (dra))
  	      || !gimple_assign_single_p (DR_STMT (drb)))
  	    break;
--- 2815,2821 ----
  	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
  	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
  				   DR_BASE_ADDRESS (drb), 0)
! 	      || !dr_chrec_offsets_equal_p (dra, drb)
  	      || !gimple_assign_single_p (DR_STMT (dra))
  	      || !gimple_assign_single_p (DR_STMT (drb)))
  	    break;
*************** vect_analyze_data_ref_accesses (vec_info
*** 2835,2841 ****
  	    break;
  
  	  /* Do not place the same access in the interleaving chain twice.  */
! 	  if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0)
  	    break;
  
  	  /* Check the types are compatible.
--- 2833,2840 ----
  	    break;
  
  	  /* Do not place the same access in the interleaving chain twice.  */
! 	  if (tree_int_cst_compare (DR_CHREC_INIT (dra),
! 				    DR_CHREC_INIT (drb)) == 0)
  	    break;
  
  	  /* Check the types are compatible.
*************** vect_analyze_data_ref_accesses (vec_info
*** 2844,2852 ****
  				 TREE_TYPE (DR_REF (drb))))
  	    break;
  
! 	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
! 	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
! 	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
  	  gcc_assert (init_a <= init_b);
  
  	  /* If init_b == init_a + the size of the type * k, we have an
--- 2843,2852 ----
  				 TREE_TYPE (DR_REF (drb))))
  	    break;
  
! 	  /* Sorting has ensured that
! 	     DR_CHREC_INIT (dra) <= DR_CHREC_INIT (drb).  */
! 	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CHREC_INIT (dra));
! 	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CHREC_INIT (drb));
  	  gcc_assert (init_a <= init_b);
  
  	  /* If init_b == init_a + the size of the type * k, we have an
*************** vect_analyze_data_ref_accesses (vec_info
*** 2859,2868 ****
  	  /* If we have a store, the accesses are adjacent.  This splits
  	     groups into chunks we support (we don't support vectorization
  	     of stores with gaps).  */
  	  if (!DR_IS_READ (dra)
! 	      && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW
  			     (DR_INIT (datarefs_copy[i-1]))
  		  != type_size_a))
  	    break;
  
  	  /* If the step (if not zero or non-constant) is greater than the
--- 2859,2868 ----
  	  /* If we have a store, the accesses are adjacent.  This splits
  	     groups into chunks we support (we don't support vectorization
  	     of stores with gaps).  */
+ 	  HOST_WIDE_INT prev_init
+ 	    = TREE_INT_CST_LOW (DR_CHREC_INIT (datarefs_copy[i - 1]));
  	  if (!DR_IS_READ (dra)
! 	      && (init_b - prev_init) != type_size_a)
  	    break;
  
  	  /* If the step (if not zero or non-constant) is greater than the
*************** vect_vfa_segment_size (struct data_refer
*** 2974,2985 ****
  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
  		 tree segment_length_a, tree segment_length_b)
  {
!   gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
! 	      && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
!   if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
      return false;
  
!   tree seg_a_min = DR_INIT (a);
    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
  				seg_a_min, segment_length_a);
    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
--- 2974,2985 ----
  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
  		 tree segment_length_a, tree segment_length_b)
  {
!   gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
! 	      && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
!   if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
      return false;
  
!   tree seg_a_min = DR_CHREC_INIT (a);
    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
  				seg_a_min, segment_length_a);
    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
*************** vect_no_alias_p (struct data_reference *
*** 2990,2999 ****
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
  			       seg_a_max, unit_size);
!       seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
! 			       DR_INIT (a), unit_size);
      }
!   tree seg_b_min = DR_INIT (b);
    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
  				seg_b_min, segment_length_b);
    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
--- 2990,2999 ----
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
  			       seg_a_max, unit_size);
!       seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
! 			       DR_CHREC_INIT (a), unit_size);
      }
!   tree seg_b_min = DR_CHREC_INIT (b);
    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
  				seg_b_min, segment_length_b);
    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
*************** vect_no_alias_p (struct data_reference *
*** 3001,3008 ****
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
        seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
  			       seg_b_max, unit_size);
!       seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
! 			       DR_INIT (b), unit_size);
      }
  
    if (tree_int_cst_le (seg_a_max, seg_b_min)
--- 3001,3008 ----
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
        seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
  			       seg_b_max, unit_size);
!       seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (b)),
! 			       DR_CHREC_INIT (b), unit_size);
      }
  
    if (tree_int_cst_le (seg_a_max, seg_b_min)
*************** vect_prune_runtime_alias_test_list (loop
*** 3148,3155 ****
  	  comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
  					    DR_BASE_ADDRESS (dr_b));
  	  if (comp_res == 0)
! 	    comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
! 					      DR_OFFSET (dr_b));
  
  	  /* Alias is known at compilation time.  */
  	  if (comp_res == 0
--- 3148,3154 ----
  	  comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
  					    DR_BASE_ADDRESS (dr_b));
  	  if (comp_res == 0)
! 	    comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
  
  	  /* Alias is known at compilation time.  */
  	  if (comp_res == 0
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c	2017-08-03 10:40:55.650125801 +0100
--- gcc/tree-vect-stmts.c	2017-08-16 14:34:56.614554172 +0100
*************** vectorizable_load (gimple *stmt, gimple_
*** 7423,7430 ****
  	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
  	  tree diff = fold_convert (sizetype,
  				    size_binop (MINUS_EXPR,
! 						DR_INIT (first_dr),
! 						DR_INIT (ptrdr)));
  	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
  					 stmt, diff);
  	}
--- 7423,7430 ----
  	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
  	  tree diff = fold_convert (sizetype,
  				    size_binop (MINUS_EXPR,
! 						DR_CHREC_INIT (first_dr),
! 						DR_CHREC_INIT (ptrdr)));
  	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
  					 stmt, diff);
  	}
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
*** /dev/null	2017-08-15 18:19:06.926776201 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-16 14:34:56.610554337 +0100
***************
*** 0 ****
--- 1,32 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ double p[1000];
+ double q[1000];
+ 
+ void
+ f1 (void)
+ {
+   for (unsigned int i = 0; i < 1000; i += 4)
+     {
+       double a = q[i] + p[i];
+       double b = q[i + 1] + p[i + 1];
+       q[i] = a;
+       q[i + 1] = b;
+     }
+ }
+ 
+ void
+ f2 (void)
+ {
+   for (unsigned int i = 0; i < 500; i += 6)
+     for (unsigned int j = 0; j < 500; j += 4)
+       {
+ 	double a = q[j] + p[i];
+ 	double b = q[j + 1] + p[i + 1];
+ 	q[i] = a;
+ 	q[i + 1] = b;
+       }
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */
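[A worked illustration (ours, not part of the patch) of the new
POLYNOMIAL_CHREC case in split_constant_offset, using the values from
the innermost_loop_behavior comment above:]

/* Given a chrec whose left operand carries a constant term:

     exp = {x + 4, +, 3}_1

   the new case recurses on CHREC_LEFT (exp) = x + 4, obtaining
   op0 = x and op1 = 4, and rebuilds the chrec around the variable
   part only:

     *var = {x, +, 3}_1
     *off = 4

   The constant 4 then ends up in DR_CHREC_INIT via
   compute_offset_chrecs, while DR_CHREC_OFFSET keeps the
   canonicalized variable part.  */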
On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This is also what we'd like to have for the unsigned "i", but the
> problem is that split_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

Didn't read the change in detail, so sorry if I misunderstood the issue.
I made changes in scev to better fold type conversions using various
sources of information, for example vrp, niters, undefined overflow
behavior etc.  In theory this information should be available to other
optimizers without querying scev.  For the mentioned test, vrp should
compute an accurate range for "i", so that we can fold
(intptr_t) (i + 1) without worrying about overflow.  Note we don't do
this in generic folding because (intptr_t) (i) + 1 could be more
expensive (especially in the case of (T)(i + j)), or because the CST
part is in bigger precision after conversion.
But such folding is wanted in several places, e.g. IVOPTs.  To provide
such an interface, we changed tree-affine and made it fold aggressively.
I am curious whether it's possible to use aff_tree to implement
split_constant_offset here, since aggressive folding is wanted.  After
all, using an additional chrec looks a little heavyweight for the
simple test.

Thanks,
bin
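[To make the wrapping concern concrete, here is a minimal standalone
example of ours, assuming a 32-bit unsigned int and a 64-bit intptr_t,
of why the conversion cannot blindly be distributed over the addition:]

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  unsigned int i = 0xffffffffu;

  /* The 32-bit addition wraps to 0 before the widening conversion...  */
  intptr_t a = (intptr_t) (i + 1);

  /* ...whereas converting first and then adding gives 0x100000000.  */
  intptr_t b = (intptr_t) i + 1;

  /* Prints "a = 0, b = 0x100000000" on an LP64 target: the two forms
     only agree when i + 1 is known not to wrap, e.g. from loop bounds
     or array sizes.  */
  printf ("a = %#llx, b = %#llx\n", (long long) a, (long long) b);
  return 0;
}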
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> The first loop in the testcase regressed after my recent changes to >> dr_analyze_innermost. Previously we would treat "i" as an iv even >> for bb analysis and end up with: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: 0 >> DR_INIT: 0 or 4 >> DR_STEP: 16 >> >> We now always keep the step as 0 instead, so for an int "i" we'd have: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: (intptr_t) i >> DR_INIT: 0 or 4 >> DR_STEP: 0 >> >> This is also what we'd like to have for the unsigned "i", but the >> problem is that strip_constant_offset thinks that the "i + 1" in >> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >> longer seem to be related to the [i] ones. > > Didn't read the change in detail, so sorry if I mis-understood the issue. > I made changes in scev to better fold type conversion by various sources > of information, for example, vrp, niters, undefined overflow behavior etc. > In theory these information should be available for other optimizers without > querying scev. For the mentioned test, vrp should compute accurate range > information for "i" so that we can fold (intptr_t) (i + 1) it without worrying > overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 > could be more expensive (especially in case of (T)(i + j)), or because the > CST part is in bigger precision after conversion. > But such folding is wanted in several places, e.g, IVOPTs. To provide such > an interface, we changed tree-affine and made it do aggressive fold. I am > curious if it's possible to use aff_tree to implement strip_constant_offset > here since aggressive folding is wanted. After all, using additional chrec > looks like a little heavy wrto the simple test. Yeah, using aff_tree does work here when the bounds are constant. It doesn't look like it works for things like: double p[1000]; double q[1000]; void f4 (unsigned int n) { for (unsigned int i = 0; i < n; i += 4) { double a = q[i] + p[i]; double b = q[i + 1] + p[i + 1]; q[i] = a; q[i + 1] = b; } } though, where the bounds on the global arrays guarantee that [i + 1] can't overflow, even though "n" is unconstrained. The patch as posted handles this case too. Thanks, Richard > > Thanks, > bin >> >> There seem to be two main uses of DR_OFFSET and DR_INIT. One type >> expects DR_OFFSET and DR_INIT to be generic expressions whose sum >> gives the initial offset from DR_BASE_ADDRESS. The other type uses >> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data >> references, with the distance between their start addresses being >> the difference between their DR_INITs. For this second type, the >> exact form of DR_OFFSET doesn't really matter, and the more it is >> able to canonicalise the non-constant offset, the better. >> >> The patch fixes the PR by adding another offset/init pair to the >> innermost loop behaviour for this second group. The new pair use chrecs >> rather than generic exprs for the offset part, with the chrec being >> analysed in the innermost loop for which the offset isn't invariant. >> This allows us to vectorise the second function in the testcase >> as well, which wasn't possible before the earlier patch. >> >> Tested on x86_64-linux-gnu and aarch64-linux-gnu. OK to install? 
>> >> Richard >> >> >> gcc/ >> PR tree-optimization/81635 >> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and >> chrec_init. >> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros. >> (dr_equal_offsets_p): Delete. >> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare. >> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h. >> (split_constant_offset): Handle POLYNOMIAL_CHREC. >> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init. >> (operator ==): Use dr_chrec_offsets_equal_p and compare the >> DR_CHREC_INITs. >> (prune_runtime_alias_test_list): Likewise. >> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare >> the DR_CHREC_INITs. >> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete. >> (analyze_offset_scev): New function. >> (compute_offset_chrecs): Likewise. >> (dr_chrec_offsets_equal_p): Likewise. >> (dr_chrec_offsets_compare): Likewise. >> * tree-loop-distribution.c (compute_alias_check_pairs): Use >> dr_chrec_offsets_compare. >> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use >> dr_chrec_offsets_compare and compare the DR_CHREC_INITs. >> (dr_group_sort_cmp): Likewise. >> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT. >> (vect_no_alias_p): Likewise. >> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and >> compare the DR_CHREC_INITs. >> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare. >> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead >> of DR_INIT. >> >> gcc/testsuite/ >> PR tree-optimization/81635 >> * gcc.dg/vect/bb-slp-pr81635.c: New test. >>
On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like [f4 above], though,
> where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted
> handles this case too.

BTW, is this a missed optimization in value range analysis?  The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.  I think that's what niters/scev do in the analysis.

Thanks,
bin
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> The first loop in the testcase regressed after my recent changes to >>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>> for bb analysis and end up with: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: 0 >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 16 >>>> >>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: (intptr_t) i >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 0 >>>> >>>> This is also what we'd like to have for the unsigned "i", but the >>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>> longer seem to be related to the [i] ones. >>> >>> Didn't read the change in detail, so sorry if I mis-understood the issue. >>> I made changes in scev to better fold type conversion by various sources >>> of information, for example, vrp, niters, undefined overflow behavior etc. >>> In theory these information should be available for other optimizers without >>> querying scev. For the mentioned test, vrp should compute accurate range >>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>> worrying >>> overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 >>> could be more expensive (especially in case of (T)(i + j)), or because the >>> CST part is in bigger precision after conversion. >>> But such folding is wanted in several places, e.g, IVOPTs. To provide such >>> an interface, we changed tree-affine and made it do aggressive fold. I am >>> curious if it's possible to use aff_tree to implement strip_constant_offset >>> here since aggressive folding is wanted. After all, using additional chrec >>> looks like a little heavy wrto the simple test. >> >> Yeah, using aff_tree does work here when the bounds are constant. >> It doesn't look like it works for things like: >> >> double p[1000]; >> double q[1000]; >> >> void >> f4 (unsigned int n) >> { >> for (unsigned int i = 0; i < n; i += 4) >> { >> double a = q[i] + p[i]; >> double b = q[i + 1] + p[i + 1]; >> q[i] = a; >> q[i + 1] = b; >> } >> } >> >> though, where the bounds on the global arrays guarantee that [i + 1] can't >> overflow, even though "n" is unconstrained. The patch as posted handles >> this case too. > BTW is this a missed optimization in value range analysis? The range > information for i should flow in a way like: array boundary -> niters > -> scev/vrp. > I think that's what niters/scev do in analysis. Yeah, maybe :-) It looks like the problem is that when SLP runs, the previous VRP pass came before loop header copying, so the (single) header has to cope with n == 0 case. 
Thus we get: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 0] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, 0] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, 0] Visiting statement: _3 = i_15 + 1; Match-and-simplified i_15 + 1 to 1 Intersecting [1, 1] and [0, +INF] to [1, 1] Found new range for _3: [1, 1] (where _3 is the index we care about), followed by: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 4] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, n_9(D) + 4294967295] Visiting statement: _3 = i_15 + 1; Intersecting [0, +INF] and [0, +INF] to [0, +INF] Found new range for _3: [0, +INF] I guess in this case it would be better to intersect the i_15 ranges to [0, 1000] rather than [0, n_9(D) + 4294967295]. It does work if another VRP pass runs after CH. But even then, is it a good idea to rely on the range info being kept up-to-date all the way through to SLP? A lot happens inbetween. FWIW, the old simple_iv check that I removed for bb data-ref analysis relies on SCEV analysis too, so I don't think this is more expensive than what we had before. Thanks, Richard
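[For context, a rough source-level sketch of ours, not from the thread,
of what loop header copying (pass_ch) does, and why it matters for the
ranges above: it peels the first header test, so inside the resulting
do-while body the guard is known to have held.]

/* Before pass_ch: a single loop header tests "i < n" both on entry
   and on each back edge, so range info derived at the header must
   also cover the immediate-exit (n == 0) path.  */
void
before_ch (unsigned int n, double *q)
{
  for (unsigned int i = 0; i < n; i += 4)
    q[i] = 0.0;
}

/* After pass_ch: the header test is peeled in front of the loop and
   the loop becomes a do-while, so within the body "i < n" is known
   to hold, giving VRP a tighter range for i.  */
void
after_ch (unsigned int n, double *q)
{
  if (0 < n)
    {
      unsigned int i = 0;
      do
	{
	  q[i] = 0.0;
	  i += 4;
	}
      while (i < n);
    }
}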
On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with the n == 0 case.

Ah, there are several passes in between vrp and pass_ch; I am not sure
if any such pass depends on vrp intensively.  I would suggest reordering
the two passes, or a standalone VRP interface for updating the
information for a loop region after its header is copied.  This is a
non-trivial issue that needs to be fixed.  The niters analyzer relies
heavily on simplify_using_initial_conditions to get the same
information, which in my opinion should be provided by VRP.  Though
this won't be able to obsolete simplify_using_initial_conditions,
because VRP is weak in symbolic ranges...

> It does work if another VRP pass runs after CH.  But even then,
> is it a good idea to rely on the range info being kept up-to-date
> all the way through to SLP?  A lot happens in between.

To some extent, yes.  Now I understand that SCEV uses niters
information to prove no_overflow.  Niters analysis does infer better
information from array boundaries, while VRP fails to do that.  I don't
worry much about the gap between the vrp pass and slp; it's basically
the same for niters.  Both kinds of information are analyzed at one
point and meant to be used by following passes.  It's also not common
for vrp information to become invalid given we are in SSA form?

Now that the data address is not analyzed against the loop, VRP would
be the only information we could use unless scev analysis is added
back.  IMHO, the patch is doing so in another way than before, and it
requires an additional chrec data structure.  I remember the previous
patch enabled more slp vectorization; is that because of the "step"
information from scev?  In this patch, step information is simply
discarded.  I am wondering if it is possible to always analyze scev
within the innermost loop for slp while discarding the step
information.

Thanks,
bin
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> The first loop in the testcase regressed after my recent changes to >>>>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>>>> for bb analysis and end up with: >>>>>> >>>>>> DR_BASE_ADDRESS: p or q >>>>>> DR_OFFSET: 0 >>>>>> DR_INIT: 0 or 4 >>>>>> DR_STEP: 16 >>>>>> >>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>>>> >>>>>> DR_BASE_ADDRESS: p or q >>>>>> DR_OFFSET: (intptr_t) i >>>>>> DR_INIT: 0 or 4 >>>>>> DR_STEP: 0 >>>>>> >>>>>> This is also what we'd like to have for the unsigned "i", but the >>>>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>>>> longer seem to be related to the [i] ones. >>>>> >>>>> Didn't read the change in detail, so sorry if I mis-understood the issue. >>>>> I made changes in scev to better fold type conversion by various sources >>>>> of information, for example, vrp, niters, undefined overflow behavior etc. >>>>> In theory these information should be available for other >>>>> optimizers without >>>>> querying scev. For the mentioned test, vrp should compute accurate range >>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>>>> worrying >>>>> overflow. Note we don't do it in generic folding because >>>>> (intptr_t) (i) + 1 >>>>> could be more expensive (especially in case of (T)(i + j)), or because the >>>>> CST part is in bigger precision after conversion. >>>>> But such folding is wanted in several places, e.g, IVOPTs. To provide such >>>>> an interface, we changed tree-affine and made it do aggressive fold. I am >>>>> curious if it's possible to use aff_tree to implement strip_constant_offset >>>>> here since aggressive folding is wanted. After all, using additional chrec >>>>> looks like a little heavy wrto the simple test. >>>> >>>> Yeah, using aff_tree does work here when the bounds are constant. >>>> It doesn't look like it works for things like: >>>> >>>> double p[1000]; >>>> double q[1000]; >>>> >>>> void >>>> f4 (unsigned int n) >>>> { >>>> for (unsigned int i = 0; i < n; i += 4) >>>> { >>>> double a = q[i] + p[i]; >>>> double b = q[i + 1] + p[i + 1]; >>>> q[i] = a; >>>> q[i + 1] = b; >>>> } >>>> } >>>> >>>> though, where the bounds on the global arrays guarantee that [i + 1] can't >>>> overflow, even though "n" is unconstrained. The patch as posted handles >>>> this case too. >>> BTW is this a missed optimization in value range analysis? The range >>> information for i should flow in a way like: array boundary -> niters >>> -> scev/vrp. >>> I think that's what niters/scev do in analysis. >> >> Yeah, maybe :-) It looks like the problem is that when SLP runs, >> the previous VRP pass came before loop header copying, so the (single) >> header has to cope with n == 0 case. Thus we get: > Ah, there are several passes in between vrp and pass_ch, not sure if > any such pass depends on vrp intensively. 
> I would suggest reordering
> the two passes, or a standalone VRP interface updating information for
> the loop region after the header is copied?  This is a non-trivial
> issue that needs to be fixed.  The niters analyzer relies heavily on
> simplify_using_initial_conditions to get the same information, which
> in my opinion should be provided by VRP.  Though this won't be able
> to obsolete simplify_using_initial_conditions, because VRP is weak in
> symbolic ranges...
>
>> [... VRP dump quoted earlier trimmed ...]
>>
>> It does work if another VRP pass runs after CH.  But even then,
>> is it a good idea to rely on the range info being kept up-to-date
>> all the way through to SLP?  A lot happens in between.
> To some extent, yes.  [...]  It's also not common for vrp information
> to become invalid, given we are on SSA?

I'm not worried so much about vrp information becoming invalid on
an SSA name that existed when VRP was run.  It's more a question
of what happens with SSA names that get introduced after VRP,
e.g. by things like dom, reassoc, PRE, etc.

> Now that the data address is not analyzed against the loop, VRP would
> be the only information we could use, unless we add back scev
> analysis.  IMHO, the patch is doing so in another way than before; it
> requires an additional chrec data structure.  I remember the previous
> patch enables more slp vectorization -- is that because of "step"
> information from scev?

Do you mean that:

2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>

	* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
	parameter with a "loop" parameter and use it instead of the
	loop containing DR_STMT.  Don't check simple_iv when doing
	BB analysis.  Describe the two analysis modes in the comment.

enabled more SLP vectorisation in bb-slp-pr65935.c?
That was due to us not doing IV analysis for BB vectorisation, and
ensuring that the step was always zero.

> In this patch, step information is simply discarded.  I am
> wondering if it's possible to always analyze scev within the innermost
> loop for slp while discarding the step information.

Well, we don't calculate a step for bb analysis (i.e. it's not the case
that the patch calculates step information and then simply discards it).
I don't see how that would work.  For bb analysis, DR_OFFSET + DR_INIT
has to give the offset for every execution of the block, not just the
first iteration of the containing loop.  So if we get back a nonzero
step, we have to do something with it.

But:

(a) the old simple_iv analysis is more expensive than simply calling
    analyze_scev, so I don't think this is a win in terms of complexity.

(b) for bb analysis, there's nothing particularly special about the
    innermost loop.  It makes more sense to analyse the offset in the
    innermost loop for which it isn't invariant, as shown by the second
    testcase in the patch.

(c) the patch helps with loop vectorisation too, since analysing the
    starting DR_OFFSET in the context of the containing loop can help
    in a similar way as analysing the full offset does for SLP.

Thanks,
Richard

>
> Thanks,
> bin
>>
>> FWIW, the old simple_iv check that I removed for bb data-ref analysis
>> relies on SCEV analysis too, so I don't think this is more expensive
>> than what we had before.
>>
>> Thanks,
>> Richard
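To illustrate (b), a sketch of the kind of case involved (not the
actual second testcase from the patch):

  double p[1000];
  double q[1000];

  void
  f (unsigned int n)
  {
    for (unsigned int i = 0; i < n; i += 4)
      for (unsigned int j = 0; j < 16; j++)
        {
          double a = q[i] + p[i];       /* offsets invariant in the j loop */
          double b = q[i + 1] + p[i + 1];
          q[i] = a;
          q[i + 1] = b;
        }
  }

The offsets of the accesses in the inner-loop body are invariant in the
j loop, so analysing them there gives nothing useful; it's the chrec in
the i loop that shows the [i] and [i + 1] accesses to be a constant
8 bytes apart.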
On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> [... full quote of the earlier exchange trimmed ...]
>
> I'm not worried so much about vrp information becoming invalid on
> an SSA name that existed when VRP was run.  It's more a question
> of what happens with SSA names that get introduced after VRP,
> e.g. by things like dom, reassoc, PRE, etc.
For the induction variables in question, these passes shouldn't
aggressively introduce new variables, I think.

>> [...]
>
> Do you mean that:
>
> 2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>
>
> 	* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
> 	parameter with a "loop" parameter and use it instead of the
> 	loop containing DR_STMT.  Don't check simple_iv when doing
> 	BB analysis.  Describe the two analysis modes in the comment.
>
> enabled more SLP vectorisation in bb-slp-pr65935.c?  That was due
> to us not doing IV analysis for BB vectorisation, and ensuring that
> the step was always zero.
Which means the vectorizer code can handle an offset that hasn't been
IV-analyzed, but not the analyzed form?

> Well, we don't calculate a step for bb analysis (i.e. it's not the case
> that the patch calculates step information and then simply discards
> it).  I don't see how that would work.  For bb analysis, DR_OFFSET +
> DR_INIT has to give the offset for every execution of the block, not
> just the first iteration of the containing loop.  So if we get back a
> nonzero step, we have to do something with it.
Yeah.

> But:
>
> (a) the old simple_iv analysis is more expensive than simply calling
>     analyze_scev, so I don't think this is a win in terms of complexity.
>
> (b) for bb analysis, there's nothing particularly special about the
>     innermost loop.  It makes more sense to analyse the offset in the
>     innermost loop for which it isn't invariant, as shown by the second
>     testcase in the patch.
>
> (c) the patch helps with loop vectorisation too, since analysing the
>     starting DR_OFFSET in the context of the containing loop can help
>     in a similar way as analysing the full offset does for SLP.

I have to admit I am not very much into this method.  It complicates
the structure as well as the code, mostly because dr_init is now split
into two different fields, one of which is lazily computed.
For example:

> @@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer
>  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
>                   tree segment_length_a, tree segment_length_b)
>  {
> -  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
> -              && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
> -  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
> +  gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
> +              && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
> +  if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
>      return false;
>
> -  tree seg_a_min = DR_INIT (a);
> +  tree seg_a_min = DR_CHREC_INIT (a);
>    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
>                                  seg_a_min, segment_length_a);
>    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
> @@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference *
>        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
>        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
>                                 seg_a_max, unit_size);
> -      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
> -                               DR_INIT (a), unit_size);
> +      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
> +                               DR_CHREC_INIT (a), unit_size);
>      }
> -  tree seg_b_min = DR_INIT (b);
> +  tree seg_b_min = DR_CHREC_INIT (b);
>    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
>                                  seg_b_min, segment_length_b);
>    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)

Uses of DR_INIT are simply replaced by DR_CHREC_INIT.  Is it safe to do
so when DR_INIT is non-zero?  It worries me that I may need to think
twice before referring to DR_INIT, because it's not clear when
DR_OFFSET is split and DR_CHREC_INIT becomes non-zero.  It may simply
be that I am too dumb to handle this; I will leave it to richi.

Thanks,
bin
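To make the two fields concrete for the unsigned "i" case from the
start of the thread (roughly; assuming 4-byte elements and eliding the
exact tree shapes):

  [i] access:      DR_OFFSET = (intptr_t) i * 4
                   DR_INIT = 0, DR_CHREC_INIT = 0
  [i + 1] access:  DR_OFFSET = _3 * 4, where _3 = (intptr_t) (i + 1)
                   DR_INIT = 0, DR_CHREC_INIT = 4

So DR_CHREC_INIT becomes non-zero precisely where split_constant_offset
had to give up for overflow reasons but the chrec analysis could still
peel off the constant.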
On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> [... full quote of the earlier exchange trimmed ...]
>
> I have to admit I am not very much into this method.  It complicates
> the structure as well as the code, mostly because dr_init is now split
> into two different fields, one of which is lazily computed.
>
> For example:
>> [... diff quoted in full above trimmed ...]
>
> Uses of DR_INIT are simply replaced by DR_CHREC_INIT.  Is it safe to
> do so when DR_INIT is non-zero?  It worries me that I may need to
> think twice before referring to DR_INIT, because it's not clear when
> DR_OFFSET is split and DR_CHREC_INIT becomes non-zero.  It may simply
> be that I am too dumb to handle this; I will leave it to richi.

I'm trying to understand this a bit (not liking it very much in its
current form).

Can code currently using DR_OFFSET and DR_INIT simply use
DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
to doing that?  If not, then why not?

That said, I'm all for making the DR info more powerful.  But I detest
having extra fields and adding confusion as to when to use which.
Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
function accessing CHREC_LEFT where suitable, plus expose
DR_OFFSET_CHREC, that would make me happier.

One downside might be that the scalar evolution of the offset might
pull SSA vars into "DR_OFFSET" that are "far away" and thus less
optimal for code generation when one re-constructs a ref by adding the
components?

Thanks,
Richard.
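Spelled out, the accessor arrangement suggested here might look
something like this (a sketch only; the offset_chrec field and the
DR_OFFSET_CHREC name are hypothetical):

  /* The underlying field: the offset, possibly in chrec form.  */
  #define DR_OFFSET_CHREC(DR) (DR)->innermost.offset_chrec

  /* DR_OFFSET keeps its current meaning: the invariant part of the
     offset, taken from CHREC_LEFT when the analysis produced a chrec
     (peeling recursively in case of nested chrecs).  */
  static inline tree
  dr_offset (struct data_reference *dr)
  {
    tree offset = DR_OFFSET_CHREC (dr);
    while (TREE_CODE (offset) == POLYNOMIAL_CHREC)
      offset = CHREC_LEFT (offset);
    return offset;
  }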
On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> [... full quote of the earlier exchange trimmed ...]
>
> I'm trying to understand this a bit (not liking it very much in its
> current form).
>
> Can code currently using DR_OFFSET and DR_INIT simply use
> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
> DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
> to doing that?  If not, then why not?
>
> That said, I'm all for making the DR info more powerful.  But I detest
> having extra fields and adding confusion as to when to use which.
> Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
> function accessing CHREC_LEFT where suitable, plus expose
> DR_OFFSET_CHREC, that would make me happier.

And if we want to make it opt-in, do a dr_analyze_me_harder () which
will re-write DR_OFFSET/INIT into the new form.

But the DR_OFFSET and DR_INIT (read) accessors would maintain their
semantics, while DR_OFFSET_CHREC would expose the CHREC if available.

Richard.

> One downside might be that the scalar evolution of the offset might
> pull SSA vars into "DR_OFFSET" that are "far away" and thus less
> optimal for code generation when one re-constructs a ref by adding the
> components?
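As a usage sketch of that opt-in scheme (dr_analyze_me_harder is the
placeholder name from above; everything here is hypothetical):

  /* Build the data ref with the cheap analysis, as today (arguments
     elided).  DR_OFFSET and DR_INIT keep their current meaning and no
     chrec has been computed yet.  */
  struct data_reference *dr = ...;

  /* Opt in to the deeper analysis.  */
  dr_analyze_me_harder (dr);

  /* Now valid; might be NULL_TREE if the scev analysis failed, in
     which case consumers fall back to DR_OFFSET/DR_INIT.  */
  tree chrec = DR_OFFSET_CHREC (dr);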
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> [... full quote of the earlier exchange trimmed ...]
>>
>> I'm trying to understand this a bit (not liking it very much in its
>> current form).
>>
>> Can code currently using DR_OFFSET and DR_INIT simply use
>> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
>> DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
>> to doing that?  If not, then why not?

There's nothing particularly special about the CHREC_LEFT for users of
the drs.  The chrec as a whole describes the variable part of the
offset.

>> That said, I'm all for making the DR info more powerful.  But I detest
>> having extra fields and adding confusion as to when to use which.
>> Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
>> function accessing CHREC_LEFT where suitable, plus expose
>> DR_OFFSET_CHREC, that would make me happier.
>
> And if we want to make it opt-in, do a dr_analyze_me_harder () which
> will re-write DR_OFFSET/INIT into the new form.
>
> But the DR_OFFSET and DR_INIT (read) accessors would maintain their
> semantics, while DR_OFFSET_CHREC would expose the CHREC if available.

After the changes, the only place that actually cared about the split
between the "old" DR_OFFSET and DR_INIT was tree-predcom.c:
ref_at_iteration.  Everywhere else just wanted the sum of OFFSET and
INIT, and for that it would have been more convenient to have a
combined field.

So maybe one way of trying to avoid the confusion would be to keep
DR_OFFSET together as the full starting offset from the base, then
provide DR_VAR_OFFSET and DR_CONST_OFFSET as the split forms, with
DR_VAR_OFFSET being more like an abstract value number.
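For the f4 example from earlier in the thread (doubles, i += 4, so the
byte offset advances by 32 per iteration), the proposed fields would
come out roughly as:

  q[i]:      DR_BASE_ADDRESS &q
             DR_VAR_OFFSET {0, +, 32}_1 (or an equivalent chrec)
             DR_CONST_OFFSET 0
  q[i + 1]:  DR_BASE_ADDRESS &q
             DR_VAR_OFFSET {0, +, 32}_1
             DR_CONST_OFFSET 8

Equal base addresses and DR_VAR_OFFSETs, so the two accesses are
related and known to be 8 bytes apart, which is exactly what the
grouping and alias-checking code wants.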
See the comment at the start of the patch below for more details.

I'm not sure whether the main reason for splitting the offset was to
make life easier for the users of the drs, or whether it was to try to
avoid creating new trees.  But then, the unconditional scev analysis
that we did previously already generated new trees.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


2017-08-22  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/81635
	* tree-data-ref.h (innermost_loop_behavior): Remove init field.
	Add var_offset and const_offset fields.  Rename offset_alignment
	to var_offset_alignment.
	(DR_INIT): Delete.
	(DR_CONST_OFFSET, DR_VAR_OFFSET): New macros.
	(DR_OFFSET_ALIGNMENT): Replace with...
	(DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
	(dr_analyze_innermost): Add a gimple * argument.
	(dr_equal_offsets_p): Delete.
	(dr_var_offsets_equal_p, dr_var_offsets_compare): Declare.
	* tree-vectorizer.h (STMT_VINFO_DR_INIT): Delete.
	(STMT_VINFO_DR_VAR_OFFSET, STMT_VINFO_DR_CONST_OFFSET): New macros.
	(STMT_VINFO_DR_OFFSET_ALIGNMENT): Replace with...
	(STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
	* tree.c (tree_ctz): Handle POLYNOMIAL_CHREC.
	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
	(split_constant_offset): Handle POLYNOMIAL_CHREC.
	(analyze_offset_scev): New function.
	(dr_analyze_innermost): Add a gimple * argument.  Update after
	changes to innermost_loop_behavior.  Initialize var_offset and
	const_offset.
	(create_data_ref): Update call to dr_analyze_innermost.  Update
	dump after changes to innermost_loop_behavior.
	(operator ==): Use dr_var_offsets_equal_p and compare the
	DR_CONST_OFFSETs.
	(prune_runtime_alias_test_list): Likewise.
	(comp_dr_with_seg_len_pair): Use dr_var_offsets_compare and compare
	the DR_CONST_OFFSETs.
	(create_intersect_range_checks): Use DR_OFFSET without adding
	DR_INIT.
	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
	(dr_alignment): Use const_offset instead of init and
	var_offset_alignment instead of offset_alignment.
	* tree-if-conv.c (innermost_loop_behavior_hash::hash): Don't test
	the init field.
	(innermost_loop_behavior_hash::equal): Likewise.
	(ifcvt_memrefs_wont_trap): Likewise.
	(if_convertible_loop_p_1): Likewise.
	* tree-loop-distribution.c (build_addr_arg_loc): Use DR_OFFSET
	without adding DR_INIT.
	(build_rdg_partition_for_vertex): Don't check DR_INIT.
	(share_memory_accesses): Likewise.
	(pg_add_dependence_edges): Likewise.
	(compute_alias_check_pairs): Use dr_var_offsets_compare.
	* tree-predcom.c (aff_combination_dr_offset): Use DR_OFFSET without
	adding DR_INIT.
	(determine_offset): Likewise.
	(valid_initializer_p): Likewise.
	(find_looparound_phi): Update call to dr_analyze_innermost.
	(ref_at_iteration): Use split_constant_offset to split the offset.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
	const_offset instead of init and var_offset_alignment instead of
	offset_alignment.
	(vect_find_same_alignment_drs): Use dr_var_offsets_compare and
	compare the DR_CONST_OFFSETs.
	(dr_group_sort_cmp): Likewise.
	(vect_analyze_group_access_1): Use DR_CONST_OFFSET instead of
	DR_INIT.
	(vect_no_alias_p): Likewise.
	(vect_analyze_data_ref_accesses): Use dr_var_offsets_equal_p and
	compare the DR_CONST_OFFSETs.
	(vect_prune_runtime_alias_test_list): Use dr_var_offsets_compare.
	(vect_analyze_data_refs): Don't check DR_INIT and use DR_OFFSET
	without adding DR_INIT.  Use DR_VAR_OFFSET_ALIGNMENT instead of
	DR_OFFSET_ALIGNMENT.  Update call to dr_analyze_innermost, and
	update subsequent dump.
        (vect_create_addr_base_for_vector_ref): Use DR_OFFSET without
        adding DR_INIT.
        * tree-vect-stmts.c (vectorizable_store): Likewise.
        (vectorizable_load): Likewise.  Use DR_CONST_OFFSET instead
        of DR_INIT.

gcc/testsuite/
        PR tree-optimization/81635
        * gcc.dg/vect/bb-slp-pr81635.c: New test.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2017-08-21 10:42:51.088530428 +0100
+++ gcc/tree-data-ref.h	2017-08-22 14:54:48.630563940 +0100
@@ -28,29 +28,44 @@ #define GCC_TREE_DATA_REF_H
   innermost_loop_behavior describes the evolution of the address of the memory
   reference in the innermost enclosing loop.  The address is expressed as
   BASE + STEP * # of iteration, and base is further decomposed as the base
-  pointer (BASE_ADDRESS), loop invariant offset (OFFSET) and
-  constant offset (INIT).  Examples, in loop nest
+  pointer (BASE_ADDRESS) and the loop invariant offset (OFFSET).
+  OFFSET is further expressed as the sum of a zero or non-constant term
+  (VAR_OFFSET) and a constant term (CONST_OFFSET).  VAR_OFFSET should be
+  treated as an abstract representation; in particular, it may contain
+  chrecs.  CONST_OFFSET is always an INTEGER_CST.

-  for (i = 0; i < 100; i++)
-    for (j = 3; j < 100; j++)
+  Examples, in loop nest

-                 Example 1                Example 2
-  data-ref       a[j].b[i][j]             *(p + x + 16B + 4B * j)
+  1: for (i = 0; i < 100; i++)
+  2:   for (j = 3; j < 100; j++)

+                 Example 1                           Example 2
+  data-ref       a[j].b[i][j]                        *(p + x + 16B + 4B * j)

-  innermost_loop_behavior
-  base_address   &a                       p
-  offset         i * D_i                  x
-  init           3 * D_j + offsetof (b)   28
-  step           D_j                      4
-  */
+  innermost_loop_behavior
+  base_address   &a                                  p
+  offset         i * D_i + 3 * D_j + offsetof (b)    x + 28
+  var_offset     {0, +, D_i}_1                       x (or an equiv. chrec)
+  const_offset   3 * D_j + offsetof (b)              28
+  step           D_j                                 4
+
+  The main two uses of VAR_OFFSET and CONST_OFFSET are:
+
+  1. to better analyze the alignment, since CONST_OFFSET can be treated as
+     the misalignment wrt the alignment of VAR_OFFSET.
+
+  2. to find data references that are a constant number of bytes apart.
+     If two data references have the same BASE_ADDRESS and VAR_OFFSET,
+     the distance between them is given by the difference in their
+     CONST_OFFSETs.  */
 struct innermost_loop_behavior
 {
   tree base_address;
   tree offset;
-  tree init;
   tree step;
+  tree var_offset;
+  tree const_offset;

   /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
      from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
@@ -91,7 +106,7 @@ struct innermost_loop_behavior
   /* The largest power of two that divides OFFSET, capped to a suitably
      high value if the offset is zero.  This is a byte rather than a bit
      quantity.  */
-  unsigned int offset_alignment;
+  unsigned int var_offset_alignment;

   /* Likewise for STEP.
*/ unsigned int step_alignment; @@ -186,12 +201,13 @@ #define DR_IS_WRITE(DR) (!DR_ #define DR_IS_CONDITIONAL_IN_STMT(DR) (DR)->is_conditional_in_stmt #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address #define DR_OFFSET(DR) (DR)->innermost.offset -#define DR_INIT(DR) (DR)->innermost.init +#define DR_VAR_OFFSET(DR) (DR)->innermost.var_offset +#define DR_CONST_OFFSET(DR) (DR)->innermost.const_offset #define DR_STEP(DR) (DR)->innermost.step #define DR_PTR_INFO(DR) (DR)->alias.ptr_info #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment #define DR_BASE_MISALIGNMENT(DR) (DR)->innermost.base_misalignment -#define DR_OFFSET_ALIGNMENT(DR) (DR)->innermost.offset_alignment +#define DR_VAR_OFFSET_ALIGNMENT(DR) (DR)->innermost.var_offset_alignment #define DR_STEP_ALIGNMENT(DR) (DR)->innermost.step_alignment #define DR_INNERMOST(DR) (DR)->innermost @@ -412,7 +428,8 @@ #define DDR_REVERSED_P(DDR) (DDR)->rever #define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p -bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); +bool dr_analyze_innermost (innermost_loop_behavior *, tree, + gimple *, struct loop *); extern bool compute_data_dependences_for_loop (struct loop *, bool, vec<loop_p> *, vec<data_reference_p> *, @@ -466,8 +483,6 @@ dr_alignment (data_reference *dr) extern bool dr_may_alias_p (const struct data_reference *, const struct data_reference *, bool); -extern bool dr_equal_offsets_p (struct data_reference *, - struct data_reference *); extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); extern int data_ref_compare_tree (tree, tree); @@ -675,4 +690,23 @@ lambda_matrix_new (int m, int n, struct return mat; } +/* Check if DRA and DRB have equal DR_VAR_OFFSETs. */ + +inline bool +dr_var_offsets_equal_p (struct data_reference *dra, + struct data_reference *drb) +{ + return eq_evolutions_p (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); +} + +/* Compare the DR_VAR_OFFSETs of DRA and DRB for sorting purposes, + returning a qsort-style result. 
*/ + +inline int +dr_var_offsets_compare (struct data_reference *dra, + struct data_reference *drb) +{ + return data_ref_compare_tree (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); +} + #endif /* GCC_TREE_DATA_REF_H */ Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-08-04 11:42:45.939105152 +0100 +++ gcc/tree-vectorizer.h 2017-08-22 14:54:48.633563940 +0100 @@ -740,14 +740,15 @@ #define STMT_VINFO_VEC_CONST_COND_REDUC_ #define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address -#define STMT_VINFO_DR_INIT(S) (S)->dr_wrt_vec_loop.init #define STMT_VINFO_DR_OFFSET(S) (S)->dr_wrt_vec_loop.offset +#define STMT_VINFO_DR_VAR_OFFSET(S) (S)->dr_wrt_vec_loop.var_offset +#define STMT_VINFO_DR_CONST_OFFSET(S) (S)->dr_wrt_vec_loop.const_offset #define STMT_VINFO_DR_STEP(S) (S)->dr_wrt_vec_loop.step #define STMT_VINFO_DR_BASE_ALIGNMENT(S) (S)->dr_wrt_vec_loop.base_alignment #define STMT_VINFO_DR_BASE_MISALIGNMENT(S) \ (S)->dr_wrt_vec_loop.base_misalignment -#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \ - (S)->dr_wrt_vec_loop.offset_alignment +#define STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT(S) \ + (S)->dr_wrt_vec_loop.var_offset_alignment #define STMT_VINFO_DR_STEP_ALIGNMENT(S) \ (S)->dr_wrt_vec_loop.step_alignment Index: gcc/tree.c =================================================================== --- gcc/tree.c 2017-08-21 12:14:47.159835474 +0100 +++ gcc/tree.c 2017-08-22 14:54:48.634563941 +0100 @@ -2601,6 +2601,12 @@ tree_ctz (const_tree expr) return MIN (ret1, prec); } return 0; + case POLYNOMIAL_CHREC: + ret1 = tree_ctz (CHREC_LEFT (expr)); + if (ret1 == 0) + return ret1; + ret2 = tree_ctz (CHREC_RIGHT (expr)); + return MIN (ret1, ret2); default: return 0; } Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-data-ref.c 2017-08-22 14:54:48.629563940 +0100 @@ -86,6 +86,7 @@ Software Foundation; either version 3, o #include "expr.h" #include "gimple-iterator.h" #include "tree-ssa-loop-niter.h" +#include "tree-ssa-loop-ivopts.h" #include "tree-ssa-loop.h" #include "tree-ssa.h" #include "cfgloop.h" @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v *off = ssize_int (0); STRIP_NOPS (exp); - if (tree_is_chrec (exp) + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) + { + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); + if (op0 != CHREC_LEFT (exp)) + { + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), + op0, CHREC_RIGHT (exp)); + *off = op1; + } + return; + } + + if (automatically_generated_chrec_p (exp) || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) return; @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a return build_fold_addr_expr (TREE_OPERAND (addr, 0)); } -/* Analyze the behavior of memory reference REF. There are two modes: +/* Analyze the scalar evolution of OFFSET in the innermost parent of + LOOP for which it isn't invariant. Return OFFSET itself if the + value is invariant or if it's too complex to analyze. 
*/ + +static tree +analyze_offset_scev (struct loop *loop, tree offset) +{ + struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset); + if (inv_loop != NULL) + { + if (loop_depth (inv_loop) == 0) + return offset; + loop = loop_outer (inv_loop); + } + tree res = analyze_scalar_evolution (loop, offset); + if (chrec_contains_undetermined (res)) + return offset; + return res; +} + +/* Analyze the behavior of memory reference REF, which occurs in STMT. + There are two modes: - BB analysis. In this case we simply split the address into base, init and offset components, without reference to any containing loop. @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a bool dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, - struct loop *loop) + gimple *stmt, struct loop *loop) { HOST_WIDE_INT pbitsize, pbitpos; tree base, poffset; machine_mode pmode; int punsignedp, preversep, pvolatilep; affine_iv base_iv, offset_iv; - tree init, dinit, step; + tree dinit; bool in_loop = (loop && loop->num); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -885,7 +919,7 @@ dr_analyze_innermost (innermost_loop_beh } } - init = ssize_int (pbitpos / BITS_PER_UNIT); + tree init = ssize_int (pbitpos / BITS_PER_UNIT); /* Subtract any constant component from the base and add it to INIT instead. Adjust the misalignment to reflect the amount we subtracted. */ @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh init = size_binop (PLUS_EXPR, init, dinit); base_misalignment -= TREE_INT_CST_LOW (dinit); - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); - init = size_binop (PLUS_EXPR, init, dinit); - - step = size_binop (PLUS_EXPR, - fold_convert (ssizetype, base_iv.step), - fold_convert (ssizetype, offset_iv.step)); - base = canonicalize_base_object_address (base_iv.base); + tree offset = size_binop (PLUS_EXPR, + fold_convert (ssizetype, offset_iv.base), + init); + tree step = size_binop (PLUS_EXPR, + fold_convert (ssizetype, base_iv.step), + fold_convert (ssizetype, offset_iv.step)); + tree scev = analyze_offset_scev (loop_containing_stmt (stmt), offset); /* See if get_pointer_alignment can guarantee a higher alignment than the one we calculated above. */ @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh } drb->base_address = base; - drb->offset = fold_convert (ssizetype, offset_iv.base); - drb->init = init; + drb->offset = offset; drb->step = step; + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); drb->base_alignment = base_alignment; drb->base_misalignment = base_misalignment & (base_alignment - 1); - drb->offset_alignment = highest_pow2_factor (offset_iv.base); + drb->var_offset_alignment = highest_pow2_factor (drb->var_offset); drb->step_alignment = highest_pow2_factor (step); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -1154,7 +1188,7 @@ create_data_ref (loop_p nest, loop_p loo DR_IS_READ (dr) = is_read; DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt; - dr_analyze_innermost (&DR_INNERMOST (dr), memref, + dr_analyze_innermost (&DR_INNERMOST (dr), memref, stmt, nest != NULL ? 
loop : NULL); dr_analyze_indices (dr, nest, loop); dr_analyze_alias (dr); @@ -1166,15 +1200,17 @@ create_data_ref (loop_p nest, loop_p loo print_generic_expr (dump_file, DR_BASE_ADDRESS (dr), TDF_SLIM); fprintf (dump_file, "\n\toffset from base address: "); print_generic_expr (dump_file, DR_OFFSET (dr), TDF_SLIM); - fprintf (dump_file, "\n\tconstant offset from base address: "); - print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM); + fprintf (dump_file, "\n\tvariable part of offset: "); + print_generic_expr (dump_file, DR_VAR_OFFSET (dr), TDF_SLIM); + fprintf (dump_file, "\n\tconstant part of offset: "); + print_generic_expr (dump_file, DR_CONST_OFFSET (dr), TDF_SLIM); fprintf (dump_file, "\n\tstep: "); print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM); fprintf (dump_file, "\n\tbase alignment: %d", DR_BASE_ALIGNMENT (dr)); fprintf (dump_file, "\n\tbase misalignment: %d", DR_BASE_MISALIGNMENT (dr)); - fprintf (dump_file, "\n\toffset alignment: %d", - DR_OFFSET_ALIGNMENT (dr)); + fprintf (dump_file, "\n\tvariable offset alignment: %d", + DR_VAR_OFFSET_ALIGNMENT (dr)); fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr)); fprintf (dump_file, "\n\tbase_object: "); print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM); @@ -1324,11 +1360,12 @@ runtime_alias_check_p (ddr_p ddr, struct operator == (const dr_with_seg_len& d1, const dr_with_seg_len& d2) { - return operand_equal_p (DR_BASE_ADDRESS (d1.dr), + return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0) - && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0 - && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0 - && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0; + && dr_var_offsets_equal_p (d1.dr, d2.dr) + && data_ref_compare_tree (DR_CONST_OFFSET (d1.dr), + DR_CONST_OFFSET (d2.dr)) == 0 + && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0); } /* Comparison function for sorting objects of dr_with_seg_len_pair_t @@ -1360,17 +1397,15 @@ comp_dr_with_seg_len_pair (const void *p if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr), DR_STEP (b2.dr))) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr), - DR_OFFSET (b1.dr))) != 0) + if ((comp_res = dr_var_offsets_compare (a1.dr, b1.dr)) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr), - DR_INIT (b1.dr))) != 0) + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a1.dr), + DR_CONST_OFFSET (b1.dr))) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr), - DR_OFFSET (b2.dr))) != 0) + if ((comp_res = dr_var_offsets_compare (a2.dr, b2.dr)) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr), - DR_INIT (b2.dr))) != 0) + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a2.dr), + DR_CONST_OFFSET (b2.dr))) != 0) return comp_res; return 0; @@ -1455,10 +1490,9 @@ prune_runtime_alias_test_list (vec<dr_wi if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr), DR_BASE_ADDRESS (dr_a2->dr), 0) - || !operand_equal_p (DR_OFFSET (dr_a1->dr), - DR_OFFSET (dr_a2->dr), 0) - || !tree_fits_shwi_p (DR_INIT (dr_a1->dr)) - || !tree_fits_shwi_p (DR_INIT (dr_a2->dr))) + || !dr_var_offsets_equal_p (dr_a1->dr, dr_a2->dr) + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a1->dr)) + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a2->dr))) continue; /* Only merge const step data references. */ @@ -1484,11 +1518,13 @@ prune_runtime_alias_test_list (vec<dr_wi continue; /* Make sure dr_a1 starts left of dr_a2. 
*/ - if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr))) + if (tree_int_cst_lt (DR_CONST_OFFSET (dr_a2->dr), + DR_CONST_OFFSET (dr_a1->dr))) std::swap (*dr_a1, *dr_a2); bool do_remove = false; - wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)); + wide_int diff = wi::sub (DR_CONST_OFFSET (dr_a2->dr), + DR_CONST_OFFSET (dr_a1->dr)); wide_int min_seg_len_b; tree new_seg_len; @@ -1756,10 +1792,6 @@ create_intersect_range_checks (struct lo tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr); tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr); - offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a), - offset_a, DR_INIT (dr_a.dr)); - offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b), - offset_b, DR_INIT (dr_b.dr)); addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a); addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b); @@ -1826,48 +1858,6 @@ create_runtime_alias_checks (struct loop } } -/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical - expressions. */ -static bool -dr_equal_offsets_p1 (tree offset1, tree offset2) -{ - bool res; - - STRIP_NOPS (offset1); - STRIP_NOPS (offset2); - - if (offset1 == offset2) - return true; - - if (TREE_CODE (offset1) != TREE_CODE (offset2) - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) - return false; - - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), - TREE_OPERAND (offset2, 0)); - - if (!res || !BINARY_CLASS_P (offset1)) - return res; - - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), - TREE_OPERAND (offset2, 1)); - - return res; -} - -/* Check if DRA and DRB have equal offsets. */ -bool -dr_equal_offsets_p (struct data_reference *dra, - struct data_reference *drb) -{ - tree offset1, offset2; - - offset1 = DR_OFFSET (dra); - offset2 = DR_OFFSET (drb); - - return dr_equal_offsets_p1 (offset1, offset2); -} - /* Returns true if FNA == FNB. */ static bool @@ -5083,13 +5073,13 @@ dr_alignment (innermost_loop_behavior *d /* Get the alignment of BASE_ADDRESS + INIT. */ unsigned int alignment = drb->base_alignment; unsigned int misalignment = (drb->base_misalignment - + TREE_INT_CST_LOW (drb->init)); + + TREE_INT_CST_LOW (drb->const_offset)); if (misalignment != 0) alignment = MIN (alignment, misalignment & -misalignment); /* Cap it to the alignment of OFFSET. */ if (!integer_zerop (drb->offset)) - alignment = MIN (alignment, drb->offset_alignment); + alignment = MIN (alignment, drb->var_offset_alignment); /* Cap it to the alignment of STEP. 
*/ if (!integer_zerop (drb->step)) Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c 2017-07-13 09:25:12.666266733 +0100 +++ gcc/tree-if-conv.c 2017-08-22 14:54:48.630563940 +0100 @@ -149,7 +149,6 @@ innermost_loop_behavior_hash::hash (cons hash = iterative_hash_expr (e->base_address, 0); hash = iterative_hash_expr (e->offset, hash); - hash = iterative_hash_expr (e->init, hash); return iterative_hash_expr (e->step, hash); } @@ -161,8 +160,6 @@ innermost_loop_behavior_hash::equal (con || (!e1->base_address && e2->base_address) || (!e1->offset && e2->offset) || (e1->offset && !e2->offset) - || (!e1->init && e2->init) - || (e1->init && !e2->init) || (!e1->step && e2->step) || (e1->step && !e2->step)) return false; @@ -173,9 +170,6 @@ innermost_loop_behavior_hash::equal (con if (e1->offset && e2->offset && !operand_equal_p (e1->offset, e2->offset, 0)) return false; - if (e1->init && e2->init - && !operand_equal_p (e1->init, e2->init, 0)) - return false; if (e1->step && e2->step && !operand_equal_p (e1->step, e2->step, 0)) return false; @@ -856,8 +850,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v innermost_loop_behavior *innermost = &DR_INNERMOST (a); gcc_assert (DR_STMT (a) == stmt); - gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) - || DR_INIT (a) || DR_STEP (a)); + gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) || DR_STEP (a)); master_dr = innermost_DR_map->get (innermost); gcc_assert (master_dr != NULL); @@ -1433,8 +1426,7 @@ if_convertible_loop_p_1 (struct loop *lo if (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR || TREE_CODE (ref) == REALPART_EXPR - || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) - || DR_INIT (dr) || DR_STEP (dr))) + || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) || DR_STEP (dr))) { while (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR Index: gcc/tree-loop-distribution.c =================================================================== --- gcc/tree-loop-distribution.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-loop-distribution.c 2017-08-22 14:54:48.630563940 +0100 @@ -903,8 +903,7 @@ build_addr_arg_loc (location_t loc, data { tree addr_base; - addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr)); - addr_base = fold_convert_loc (loc, sizetype, addr_base); + addr_base = fold_convert_loc (loc, sizetype, DR_OFFSET (dr)); /* Test for a negative stride, iterating over every element. */ if (tree_int_cst_sgn (DR_STEP (dr)) == -1) @@ -1289,8 +1288,7 @@ build_rdg_partition_for_vertex (struct g /* Partition can only be executed sequentially if there is any unknown data reference. 
*/ - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) - || !DR_INIT (dr) || !DR_STEP (dr)) + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) partition->type = PTYPE_SEQUENTIAL; bitmap_set_bit (partition->datarefs, idx); @@ -1507,21 +1505,18 @@ share_memory_accesses (struct graph *rdg { dr1 = datarefs_vec[i]; - if (!DR_BASE_ADDRESS (dr1) - || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) + if (!DR_BASE_ADDRESS (dr1) || !DR_OFFSET (dr1) || !DR_STEP (dr1)) continue; EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) { dr2 = datarefs_vec[j]; - if (!DR_BASE_ADDRESS (dr2) - || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) + if (!DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr2) || !DR_STEP (dr2)) continue; if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) - && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) return true; } @@ -1705,7 +1700,6 @@ pg_add_dependence_edges (struct graph *r runtime alias check. */ if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr1) || !DR_OFFSET (dr2) - || !DR_INIT (dr1) || !DR_INIT (dr2) || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1)) || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2)) || res == 0) @@ -2203,7 +2197,7 @@ compute_alias_check_pairs (struct loop * DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); + comp_res = dr_var_offsets_compare (dr_a, dr_b); gcc_assert (comp_res != 0); if (latch_dominated_by_data_ref (loop, dr_a)) Index: gcc/tree-predcom.c =================================================================== --- gcc/tree-predcom.c 2017-08-10 14:36:08.057471267 +0100 +++ gcc/tree-predcom.c 2017-08-22 14:54:48.631563940 +0100 @@ -666,18 +666,13 @@ suitable_reference_p (struct data_refere return true; } -/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ +/* Stores DR_OFFSET (DR) to OFFSET. */ static void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) { - tree type = TREE_TYPE (DR_OFFSET (dr)); - aff_tree delta; - - tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, - &name_expansions); - aff_combination_const (&delta, type, wi::to_widest (DR_INIT (dr))); - aff_combination_add (offset, &delta); + tree_to_aff_combination_expand (DR_OFFSET (dr), TREE_TYPE (DR_OFFSET (dr)), + offset, &name_expansions); } /* Determines number of iterations of the innermost enclosing loop before B @@ -710,8 +705,7 @@ determine_offset (struct data_reference /* If the references have loop invariant address, check that they access exactly the same location. */ *off = 0; - return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0) - && operand_equal_p (DR_INIT (a), DR_INIT (b), 0)); + return operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0); } /* Compare the offsets of the addresses, and check whether the difference @@ -1171,8 +1165,7 @@ valid_initializer_p (struct data_referen /* If the address of the reference is invariant, initializer must access exactly the same location. */ if (integer_zerop (DR_STEP (root))) - return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0) - && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0)); + return operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0); /* Verify that this index of REF is equal to the root's index at -DISTANCE-th iteration. 
*/ @@ -1247,7 +1240,7 @@ find_looparound_phi (struct loop *loop, memset (&init_dr, 0, sizeof (struct data_reference)); DR_REF (&init_dr) = init_ref; DR_STMT (&init_dr) = phi; - if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, loop)) + if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, phi, loop)) return NULL; if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) @@ -1481,8 +1474,7 @@ replace_ref_with (gimple *stmt, tree new ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts, tree niters = NULL_TREE) { - tree off = DR_OFFSET (dr); - tree coff = DR_INIT (dr); + tree off, coff; tree ref = DR_REF (dr); enum tree_code ref_code = ERROR_MARK; tree ref_type = NULL_TREE; @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i tree ref_op2 = NULL_TREE; tree new_offset; + split_constant_offset (DR_OFFSET (dr), &off, &coff); if (iter != 0) { new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-vect-data-refs.c 2017-08-22 14:54:48.632563940 +0100 @@ -866,7 +866,7 @@ vect_compute_data_ref_alignment (struct base_misalignment = (*entry)->base_misalignment; } - if (drb->offset_alignment < vector_alignment + if (drb->var_offset_alignment < vector_alignment || !step_preserves_misalignment_p /* We need to know whether the step wrt the vectorized loop is negative when computing the starting misalignment below. */ @@ -928,7 +928,7 @@ vect_compute_data_ref_alignment (struct base_misalignment = 0; } unsigned int misalignment = (base_misalignment - + TREE_INT_CST_LOW (drb->init)); + + TREE_INT_CST_LOW (drb->const_offset)); /* If this is a backward running DR then first access in the larger vectype actually is N-1 elements before the address in the DR. @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat return; if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) + || !dr_var_offsets_equal_p (dra, drb) || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) return; /* Two references with distance zero have the same alignment. */ - offset_int diff = (wi::to_offset (DR_INIT (dra)) - - wi::to_offset (DR_INIT (drb))); + offset_int diff = (wi::to_offset (DR_CONST_OFFSET (dra)) + - wi::to_offset (DR_CONST_OFFSET (drb))); if (diff != 0) { /* Get the wider of the two alignments. */ @@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt)); struct data_reference *data_ref = dr; unsigned int count = 1; - tree prev_init = DR_INIT (data_ref); + tree prev_init = DR_CONST_OFFSET (data_ref); gimple *prev = stmt; HOST_WIDE_INT diff, gaps = 0; @@ -2444,9 +2444,10 @@ vect_analyze_group_access_1 (struct data data-ref (supported only for loads), we vectorize only the first stmt, and the rest get their vectorized loads from the first one. 
*/ - if (!tree_int_cst_compare (DR_INIT (data_ref), - DR_INIT (STMT_VINFO_DATA_REF ( - vinfo_for_stmt (next))))) + data_reference *next_ref + = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); + if (!tree_int_cst_compare (DR_CONST_OFFSET (data_ref), + DR_CONST_OFFSET (next_ref))) { if (DR_IS_WRITE (data_ref)) { @@ -2469,14 +2470,14 @@ vect_analyze_group_access_1 (struct data } prev = next; - data_ref = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); + data_ref = next_ref; /* All group members have the same STEP by construction. */ gcc_checking_assert (operand_equal_p (DR_STEP (data_ref), step, 0)); /* Check that the distance between two accesses is equal to the type size. Otherwise, we have gaps. */ - diff = (TREE_INT_CST_LOW (DR_INIT (data_ref)) + diff = (TREE_INT_CST_LOW (DR_CONST_OFFSET (data_ref)) - TREE_INT_CST_LOW (prev_init)) / type_size; if (diff != 1) { @@ -2499,7 +2500,7 @@ vect_analyze_group_access_1 (struct data gap in the access, GROUP_GAP is always 1. */ GROUP_GAP (vinfo_for_stmt (next)) = diff; - prev_init = DR_INIT (data_ref); + prev_init = DR_CONST_OFFSET (data_ref); next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next)); /* Count the number of data-refs in the chain. */ count++; @@ -2715,13 +2716,10 @@ dr_group_sort_cmp (const void *dra_, con return cmp; } - /* And according to DR_OFFSET. */ - if (!dr_equal_offsets_p (dra, drb)) - { - cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)); - if (cmp != 0) - return cmp; - } + /* And according to DR_VAR_OFFSET. */ + cmp = dr_var_offsets_compare (dra, drb); + if (cmp != 0) + return cmp; /* Put reads before writes. */ if (DR_IS_READ (dra) != DR_IS_READ (drb)) @@ -2745,8 +2743,9 @@ dr_group_sort_cmp (const void *dra_, con return cmp; } - /* Then sort after DR_INIT. In case of identical DRs sort after stmt UID. */ - cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)); + /* Then sort after DR_CONST_OFFSET. In case of identical DRs sort after + stmt UID. */ + cmp = tree_int_cst_compare (DR_CONST_OFFSET (dra), DR_CONST_OFFSET (drb)); if (cmp == 0) return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1; return cmp; @@ -2817,7 +2816,7 @@ vect_analyze_data_ref_accesses (vec_info if (DR_IS_READ (dra) != DR_IS_READ (drb) || !operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !dr_equal_offsets_p (dra, drb) + || !dr_var_offsets_equal_p (dra, drb) || !gimple_assign_single_p (DR_STMT (dra)) || !gimple_assign_single_p (DR_STMT (drb))) break; @@ -2835,7 +2834,8 @@ vect_analyze_data_ref_accesses (vec_info break; /* Do not place the same access in the interleaving chain twice. */ - if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0) + if (tree_int_cst_compare (DR_CONST_OFFSET (dra), + DR_CONST_OFFSET (drb)) == 0) break; /* Check the types are compatible. @@ -2844,9 +2844,10 @@ vect_analyze_data_ref_accesses (vec_info TREE_TYPE (DR_REF (drb)))) break; - /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ - HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); - HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); + /* Sorting has ensured that + DR_CONST_OFFSET (dra) <= DR_CONST_OFFSET (drb). */ + HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CONST_OFFSET (dra)); + HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CONST_OFFSET (drb)); gcc_assert (init_a <= init_b); /* If init_b == init_a + the size of the type * k, we have an @@ -2859,10 +2860,10 @@ vect_analyze_data_ref_accesses (vec_info /* If we have a store, the accesses are adjacent. 
This splits groups into chunks we support (we don't support vectorization of stores with gaps). */ + HOST_WIDE_INT prev_init + = TREE_INT_CST_LOW (DR_CONST_OFFSET (datarefs_copy[i - 1])); if (!DR_IS_READ (dra) - && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW - (DR_INIT (datarefs_copy[i-1])) - != type_size_a)) + && (init_b - prev_init) != type_size_a) break; /* If the step (if not zero or non-constant) is greater than the @@ -2974,12 +2975,12 @@ vect_vfa_segment_size (struct data_refer vect_no_alias_p (struct data_reference *a, struct data_reference *b, tree segment_length_a, tree segment_length_b) { - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) + gcc_assert (TREE_CODE (DR_CONST_OFFSET (a)) == INTEGER_CST + && TREE_CODE (DR_CONST_OFFSET (b)) == INTEGER_CST); + if (tree_int_cst_equal (DR_CONST_OFFSET (a), DR_CONST_OFFSET (b))) return false; - tree seg_a_min = DR_INIT (a); + tree seg_a_min = DR_CONST_OFFSET (a); tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), seg_a_min, segment_length_a); /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT @@ -2990,10 +2991,10 @@ vect_no_alias_p (struct data_reference * tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), seg_a_max, unit_size); - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), - DR_INIT (a), unit_size); + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (a)), + DR_CONST_OFFSET (a), unit_size); } - tree seg_b_min = DR_INIT (b); + tree seg_b_min = DR_CONST_OFFSET (b); tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), seg_b_min, segment_length_b); if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) @@ -3001,8 +3002,8 @@ vect_no_alias_p (struct data_reference * tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b))); seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max), seg_b_max, unit_size); - seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)), - DR_INIT (b), unit_size); + seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (b)), + DR_CONST_OFFSET (b), unit_size); } if (tree_int_cst_le (seg_a_max, seg_b_min) @@ -3148,8 +3149,7 @@ vect_prune_runtime_alias_test_list (loop comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), - DR_OFFSET (dr_b)); + comp_res = dr_var_offsets_compare (dr_a, dr_b); /* Alias is known at compilation time. */ if (comp_res == 0 @@ -3455,7 +3455,7 @@ vect_analyze_data_refs (vec_info *vinfo, { gimple *stmt; stmt_vec_info stmt_info; - tree base, offset, init; + tree base, offset; enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE; bool simd_lane_access = false; int vf; @@ -3488,8 +3488,7 @@ vect_analyze_data_refs (vec_info *vinfo, } /* Check that analysis of the data-ref succeeded. 
*/ - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr) - || !DR_STEP (dr)) + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) { bool maybe_gather = DR_IS_READ (dr) @@ -3515,7 +3514,6 @@ vect_analyze_data_refs (vec_info *vinfo, gcc_assert (newdr != NULL && DR_REF (newdr)); if (DR_BASE_ADDRESS (newdr) && DR_OFFSET (newdr) - && DR_INIT (newdr) && DR_STEP (newdr) && integer_zerop (DR_STEP (newdr))) { @@ -3523,8 +3521,7 @@ vect_analyze_data_refs (vec_info *vinfo, { tree off = DR_OFFSET (newdr); STRIP_NOPS (off); - if (TREE_CODE (DR_INIT (newdr)) == INTEGER_CST - && TREE_CODE (off) == MULT_EXPR + if (TREE_CODE (off) == MULT_EXPR && tree_fits_uhwi_p (TREE_OPERAND (off, 1))) { tree step = TREE_OPERAND (off, 1); @@ -3555,7 +3552,7 @@ vect_analyze_data_refs (vec_info *vinfo, { DR_OFFSET (newdr) = ssize_int (0); DR_STEP (newdr) = step; - DR_OFFSET_ALIGNMENT (newdr) + DR_VAR_OFFSET_ALIGNMENT (newdr) = BIGGEST_ALIGNMENT; DR_STEP_ALIGNMENT (newdr) = highest_pow2_factor (step); @@ -3665,7 +3662,6 @@ vect_analyze_data_refs (vec_info *vinfo, base = unshare_expr (DR_BASE_ADDRESS (dr)); offset = unshare_expr (DR_OFFSET (dr)); - init = unshare_expr (DR_INIT (dr)); if (is_gimple_call (stmt) && (!gimple_call_internal_p (stmt) @@ -3701,9 +3697,7 @@ vect_analyze_data_refs (vec_info *vinfo, inner loop: *(BASE + INIT + OFFSET). By construction, this address must be invariant in the inner loop, so we can consider it as being used in the outer loop. */ - tree init_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), - init, offset); - tree init_addr = fold_build_pointer_plus (base, init_offset); + tree init_addr = fold_build_pointer_plus (base, offset); tree init_ref = build_fold_indirect_ref (init_addr); if (dump_enabled_p ()) @@ -3715,7 +3709,7 @@ vect_analyze_data_refs (vec_info *vinfo, } if (!dr_analyze_innermost (&STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info), - init_ref, loop)) + init_ref, stmt, loop)) /* dr_analyze_innermost already explained the failure. 
 */
         return false;

@@ -3728,10 +3722,14 @@ vect_analyze_data_refs (vec_info *vinfo,
               dump_printf (MSG_NOTE, "\n\touter offset from base address: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
                                  STMT_VINFO_DR_OFFSET (stmt_info));
-              dump_printf (MSG_NOTE,
-                           "\n\touter constant offset from base address: ");
+              dump_printf (MSG_NOTE, "\n\tvariable part of outer offset"
+                           " from base address: ");
+              dump_generic_expr (MSG_NOTE, TDF_SLIM,
+                                 STMT_VINFO_DR_VAR_OFFSET (stmt_info));
+              dump_printf (MSG_NOTE, "\n\tconstant part of outer offset"
+                           " from base address: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
-                                 STMT_VINFO_DR_INIT (stmt_info));
+                                 STMT_VINFO_DR_CONST_OFFSET (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter step: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
                                  STMT_VINFO_DR_STEP (stmt_info));
@@ -3739,8 +3737,9 @@
                            STMT_VINFO_DR_BASE_ALIGNMENT (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter base misalignment: %d\n",
                            STMT_VINFO_DR_BASE_MISALIGNMENT (stmt_info));
-              dump_printf (MSG_NOTE, "\n\touter offset alignment: %d\n",
-                           STMT_VINFO_DR_OFFSET_ALIGNMENT (stmt_info));
+              dump_printf (MSG_NOTE,
+                           "\n\touter variable offset alignment: %d\n",
+                           STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter step alignment: %d\n",
                            STMT_VINFO_DR_STEP_ALIGNMENT (stmt_info));
             }
@@ -4055,23 +4054,18 @@ vect_create_addr_base_for_vector_ref (gi
   innermost_loop_behavior *drb = vect_dr_behavior (dr);
   tree data_ref_base = unshare_expr (drb->base_address);
-  tree base_offset = unshare_expr (drb->offset);
-  tree init = unshare_expr (drb->init);
-
+  tree base_offset;
   if (loop_vinfo)
-    base_name = get_name (data_ref_base);
+    {
+      base_name = get_name (data_ref_base);
+      base_offset = fold_convert (sizetype, unshare_expr (drb->offset));
+    }
   else
     {
-      base_offset = ssize_int (0);
-      init = ssize_int (0);
+      base_offset = size_int (0);
       base_name = get_name (DR_REF (dr));
     }

-  /* Create base_offset */
-  base_offset = size_binop (PLUS_EXPR,
-                            fold_convert (sizetype, base_offset),
-                            fold_convert (sizetype, init));
-
   if (offset)
     {
       offset = fold_build2 (MULT_EXPR, sizetype,

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-08-21 15:50:48.664709938 +0100
+++ gcc/tree-vect-stmts.c	2017-08-22 14:54:48.633563940 +0100
@@ -5970,9 +5970,7 @@ vectorizable_store (gimple *stmt, gimple
       stride_base
         = fold_build_pointer_plus
            (unshare_expr (DR_BASE_ADDRESS (first_dr)),
-            size_binop (PLUS_EXPR,
-                        convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))),
-                        convert_to_ptrofftype (DR_INIT (first_dr))));
+            convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))));
       stride_step = fold_convert (sizetype, unshare_expr (DR_STEP (first_dr)));

       /* For a store with loop-invariant (but other than power-of-2)
@@ -6299,7 +6297,6 @@ vectorizable_store (gimple *stmt, gimple
          && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
          && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
          && integer_zerop (DR_OFFSET (first_dr))
-         && integer_zerop (DR_INIT (first_dr))
          && alias_sets_conflict_p (get_alias_set (aggr_type),
                                    get_alias_set (TREE_TYPE (ref_type))))
        {
@@ -6993,9 +6990,7 @@ vectorizable_load (gimple *stmt, gimple_
        stride_base
          = fold_build_pointer_plus
              (DR_BASE_ADDRESS (first_dr),
-              size_binop (PLUS_EXPR,
-                          convert_to_ptrofftype (DR_OFFSET (first_dr)),
-                          convert_to_ptrofftype (DR_INIT (first_dr))));
+              convert_to_ptrofftype (DR_OFFSET (first_dr)));
        stride_step = fold_convert (sizetype, DR_STEP (first_dr));

        /* For a load with
           loop-invariant (but other than power-of-2)

@@ -7394,7 +7389,6 @@ vectorizable_load (gimple *stmt, gimple_
          && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
          && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
          && integer_zerop (DR_OFFSET (first_dr))
-         && integer_zerop (DR_INIT (first_dr))
          && alias_sets_conflict_p (get_alias_set (aggr_type),
                                    get_alias_set (TREE_TYPE (ref_type)))
          && (alignment_support_scheme == dr_aligned
@@ -7417,8 +7411,8 @@ vectorizable_load (gimple *stmt, gimple_
                = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
              tree diff = fold_convert (sizetype,
                                        size_binop (MINUS_EXPR,
-                                                   DR_INIT (first_dr),
-                                                   DR_INIT (ptrdr)));
+                                                   DR_CONST_OFFSET (first_dr),
+                                                   DR_CONST_OFFSET (ptrdr)));
              dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
                                             stmt, diff);
            }

Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
--- /dev/null	2017-08-21 17:11:06.647307681 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-22 14:54:48.628563940 +0100
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_unpack } */
+
+double p[1000];
+double q[1000];
+
+void
+f1 (void)
+{
+  for (unsigned int i = 0; i < 1000; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+void
+f2 (void)
+{
+  for (unsigned int i = 0; i < 500; i += 6)
+    for (unsigned int j = 0; j < 500; j += 4)
+      {
+        double a = q[j] + p[i];
+        double b = q[j + 1] + p[i + 1];
+        q[i] = a;
+        q[i + 1] = b;
+      }
+}
+
+void
+f3 (void)
+{
+  for (unsigned int i = 2; i < 1000; i += 4)
+    {
+      double a = q[i - 2] + p[i - 2];
+      double b = q[i - 1] + p[i - 1];
+      q[i - 2] = a;
+      q[i - 1] = b;
+    }
+}
+
+void
+f4 (unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } */
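One way to sanity-check the tree.c change above: a chrec {left, +, right}
only ever takes the values left + k * right, so any power of two that
divides both operands divides every value, and min (ctz (left),
ctz (right)) is a safe lower bound.  That is what keeps
highest_pow2_factor, and hence the new var_offset_alignment, meaningful
when the offset is a chrec.  A minimal standalone sketch of the
reasoning, with plain integers standing in for the INTEGER_CST operands
(chrec_ctz here is an illustrative stand-in, not a GCC function):

    #include <stdio.h>

    /* Mirror of the new POLYNOMIAL_CHREC case in tree_ctz: take the
       trailing-zero count of each operand, with the same early exit
       when the CHREC_LEFT count is already 0.  The "64" cap stands in
       for tree_ctz's precision-based cap for zero values.  */
    static int
    chrec_ctz (unsigned long left, unsigned long right)
    {
      int ret1 = left ? __builtin_ctzl (left) : 64;
      if (ret1 == 0)
        return ret1;
      int ret2 = right ? __builtin_ctzl (right) : 64;
      return ret1 < ret2 ? ret1 : ret2;
    }

    int
    main (void)
    {
      /* {8, +, 32} takes the values 8, 40, 72, ...: all multiples of 8,
         so at least three trailing zero bits are guaranteed.  */
      printf ("guaranteed ctz of {8, +, 32}: %d\n", chrec_ctz (8, 32));
      for (unsigned long k = 0; k < 4; k++)
        printf ("iteration %lu: value %lu has ctz %d\n",
                k, 8 + 32 * k, __builtin_ctzl (8 + 32 * k));
      return 0;
    }

For a var_offset like {0, +, 32}_1 in the testcase this gives a 32-byte
alignment guarantee, with const_offset then acting as the misalignment
within it, matching use 1 in the new innermost_loop_behavior comment.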
On Tue, Aug 22, 2017 at 4:19 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford >>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford >>>>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>>>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>>>>>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>>>>>> The first loop in the testcase regressed after my recent changes to >>>>>>>>>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>>>>>>>>> for bb analysis and end up with: >>>>>>>>>>> >>>>>>>>>>> DR_BASE_ADDRESS: p or q >>>>>>>>>>> DR_OFFSET: 0 >>>>>>>>>>> DR_INIT: 0 or 4 >>>>>>>>>>> DR_STEP: 16 >>>>>>>>>>> >>>>>>>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>>>>>>>>> >>>>>>>>>>> DR_BASE_ADDRESS: p or q >>>>>>>>>>> DR_OFFSET: (intptr_t) i >>>>>>>>>>> DR_INIT: 0 or 4 >>>>>>>>>>> DR_STEP: 0 >>>>>>>>>>> >>>>>>>>>>> This is also what we'd like to have for the unsigned "i", but the >>>>>>>>>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>>>>>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>>>>>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>>>>>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>>>>>>>>> longer seem to be related to the [i] ones. >>>>>>>>>> >>>>>>>>>> Didn't read the change in detail, so sorry if I mis-understood >>>>>>>>>> the issue. >>>>>>>>>> I made changes in scev to better fold type conversion by >>>>>>>>>> various sources >>>>>>>>>> of information, for example, vrp, niters, undefined overflow >>>>>>>>>> behavior etc. >>>>>>>>>> In theory these information should be available for other >>>>>>>>>> optimizers without >>>>>>>>>> querying scev. For the mentioned test, vrp should compute >>>>>>>>>> accurate range >>>>>>>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>>>>>>>>> worrying >>>>>>>>>> overflow. Note we don't do it in generic folding because >>>>>>>>>> (intptr_t) (i) + 1 >>>>>>>>>> could be more expensive (especially in case of (T)(i + j)), or >>>>>>>>>> because the >>>>>>>>>> CST part is in bigger precision after conversion. >>>>>>>>>> But such folding is wanted in several places, e.g, IVOPTs. To >>>>>>>>>> provide such >>>>>>>>>> an interface, we changed tree-affine and made it do aggressive >>>>>>>>>> fold. I am >>>>>>>>>> curious if it's possible to use aff_tree to implement >>>>>>>>>> strip_constant_offset >>>>>>>>>> here since aggressive folding is wanted. After all, using >>>>>>>>>> additional chrec >>>>>>>>>> looks like a little heavy wrto the simple test. >>>>>>>>> >>>>>>>>> Yeah, using aff_tree does work here when the bounds are constant. 
>>>>>>>>> It doesn't look like it works for things like: >>>>>>>>> >>>>>>>>> double p[1000]; >>>>>>>>> double q[1000]; >>>>>>>>> >>>>>>>>> void >>>>>>>>> f4 (unsigned int n) >>>>>>>>> { >>>>>>>>> for (unsigned int i = 0; i < n; i += 4) >>>>>>>>> { >>>>>>>>> double a = q[i] + p[i]; >>>>>>>>> double b = q[i + 1] + p[i + 1]; >>>>>>>>> q[i] = a; >>>>>>>>> q[i + 1] = b; >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> though, where the bounds on the global arrays guarantee that [i + 1] can't >>>>>>>>> overflow, even though "n" is unconstrained. The patch as posted handles >>>>>>>>> this case too. >>>>>>>> BTW is this a missed optimization in value range analysis? The range >>>>>>>> information for i should flow in a way like: array boundary -> niters >>>>>>>> -> scev/vrp. >>>>>>>> I think that's what niters/scev do in analysis. >>>>>>> >>>>>>> Yeah, maybe :-) It looks like the problem is that when SLP runs, >>>>>>> the previous VRP pass came before loop header copying, so the (single) >>>>>>> header has to cope with n == 0 case. Thus we get: >>>>>> Ah, there are several passes in between vrp and pass_ch, not sure if >>>>>> any such pass depends on vrp intensively. I would suggestion reorder >>>>>> the two passes, or standalone VRP interface updating information for >>>>>> loop region after header copied? This is a non-trivial issue that >>>>>> needs to be fixed. Niters analyzer rely on >>>>>> simplify_using_initial_conditions heavily to get the same information, >>>>>> which in my opinion should be provided by VRP. Though this won't be >>>>>> able to obsolete simplify_using_initial_conditions because VRP is weak >>>>>> in symbolic range... >>>>>> >>>>>>> >>>>>>> Visiting statement: >>>>>>> i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; >>>>>>> Intersecting >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 0] >>>>>>> to >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Intersecting >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 1000] >>>>>>> to >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Found new range for i_15: [0, 0] >>>>>>> >>>>>>> Visiting statement: >>>>>>> _3 = i_15 + 1; >>>>>>> Match-and-simplified i_15 + 1 to 1 >>>>>>> Intersecting >>>>>>> [1, 1] >>>>>>> and >>>>>>> [0, +INF] >>>>>>> to >>>>>>> [1, 1] >>>>>>> Found new range for _3: [1, 1] >>>>>>> >>>>>>> (where _3 is the index we care about), followed by: >>>>>>> >>>>>>> Visiting statement: >>>>>>> i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; >>>>>>> Intersectings/aarch64-linux/trunk-orig/debug/gcc' >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 4] >>>>>>> to >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Intersecting >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 1000] >>>>>>> to >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Found new range for i_15: [0, n_9(D) + 4294967295] >>>>>>> >>>>>>> Visiting statement: >>>>>>> _3 = i_15 + 1; >>>>>>> Intersecting >>>>>>> [0, +INF] >>>>>>> and >>>>>>> [0, +INF] >>>>>>> to >>>>>>> [0, +INF] >>>>>>> Found new range for _3: [0, +INF] >>>>>>> >>>>>>> I guess in this case it would be better to intersect the i_15 ranges >>>>>>> to [0, 1000] rather than [0, n_9(D) + 4294967295]. >>>>>>> >>>>>>> It does work if another VRP pass runs after CH. But even then, >>>>>>> is it a good idea to rely on the range info being kept up-to-date >>>>>>> all the way through to SLP? 
A lot happens inbetween. >>>>>> To some extend yes. Now I understand that SCEV uses niters >>>>>> information to prove no_overflow. Niters analysis does infer better >>>>>> information from array boundary, while VRP fails to do that. I don't >>>>>> worry much about gap between vrp pass and slp, it's basically the same >>>>>> as niters. Both information are analyzed at one point and meant to be >>>>>> used by following passes. It's also not common for vrp information >>>>>> become invalid given we are on SSA? >>>>> >>>>> I'm not worried so much about vrp information becoming invalid on >>>>> an SSA name that existed when VRP was run. It's more a question >>>>> of what happens about SSA names that get introduced after VRP, >>>>> e.g. by things like dom, reassoc, PRE, etc. >>>> For induction variables in concern, these passes shouldn't >>>> aggressively introduces new variables I think. >>>>> >>>>>> Now that data address is not analyzed against loop, VRP would be the >>>>>> only information we can use unless adding back scev analysis. IMHO, >>>>>> the patch is doing so in another way than before. It requires >>>>>> additional chrec data structure. I remember the previous patch >>>>>> enables more slp vectorization, is it because of "step" information >>>>>> from scev? >>>>> >>>>> Do you mean that: >>>>> >>>>> 2017-07-03 Richard Sandiford <richard.sandiford@linaro.org> >>>>> >>>>> * tree-data-ref.c (dr_analyze_innermost): Replace the "nest" >>>>> parameter with a "loop" parameter and use it instead of the >>>>> loop containing DR_STMT. Don't check simple_iv when doing >>>>> BB analysis. Describe the two analysis modes in the comment. >>>>> >>>>> enabled more SLP vectorisation in bb-slp-pr65935.c? That was due >>>>> to us not doing IV analysis for BB vectorisation, and ensuring that >>>>> the step was always zero. >>>> Which means vectorizer code can handle not IV-analyzed offset, but >>>> can't for analyzed form? >>>>> >>>>>> In this patch, step information is simply discarded. I am >>>>>> wondering if possible to always analyze scev within innermost loop for >>>>>> slp while discards step information. >>>>> >>>>> Well, we don't calculate a step for bb analysis (i.e. it's not the case >>>>> the patch calculates step information and then simply discards it). >>>>> I don't see how that would work. For bb analysis, the DR_OFFSET + DR_INIT >>>>> has to give the offset for every execution of the block, not just the >>>>> first iteration of the containing loop. So if we get back a nonzero >>>>> step, we have to do something with it. >>>> Yeah. >>>>> >>>>> But: >>>>> >>>>> (a) the old simple_iv analysis is more expensive than simply calling >>>>> analyze_scev, so I don't think this is a win in terms of complexity. >>>>> >>>>> (b) for bb analysis, there's nothing particularly special about the >>>>> innermost loop. It makes more sense to analyse it in the innermost >>>>> loop for which the offset is invariant, as shown by the second >>>>> testcase in the patch. >>>>> >>>>> (c) The patch helps with loop vectorisation too, since analysing the >>>>> starting DR_OFFSET in the context of the containing loop can help >>>>> in a similar way as analysing the full offset does for SLP. >>>> >>>> I have to admit I am not very much into this method. It complicates >>>> structure as well as code. >>>> Mostly because now dr_init are split into two different fields and one >>>> of it is lazily computed. 
>>>> >>>> For example: >>>>> @@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer >>>>> vect_no_alias_p (struct data_reference *a, struct data_reference *b, >>>>> tree segment_length_a, tree segment_length_b) >>>>> { >>>>> - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST >>>>> - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); >>>>> - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) >>>>> + gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST >>>>> + && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST); >>>>> + if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b))) >>>>> return false; >>>>> >>>>> - tree seg_a_min = DR_INIT (a); >>>>> + tree seg_a_min = DR_CHREC_INIT (a); >>>>> tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), >>>>> seg_a_min, segment_length_a); >>>>> /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT >>>>> @@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference * >>>>> tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); >>>>> seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), >>>>> seg_a_max, unit_size); >>>>> - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), >>>>> - DR_INIT (a), unit_size); >>>>> + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)), >>>>> + DR_CHREC_INIT (a), unit_size); >>>>> } >>>>> - tree seg_b_min = DR_INIT (b); >>>>> + tree seg_b_min = DR_CHREC_INIT (b); >>>>> tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), >>>>> seg_b_min, segment_length_b); >>>>> if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) >>>> >>>> Use of DR_INIT is simply replaced by DR_CHREC_INIT. Is it safe to do >>>> so in case of non-ZERO >>>> DR_INIT? It worries me that I may need to think twice before >>>> referring to DR_INIT because it's >>>> not clear when DR_OFFSET is split and DR_CHREC_INIT becomes non-ZERO. >>>> It may simply >>>> because I am too dumb to handle this. I will leave this to richi. >>> >>> I'm trying to understand this a bit (not liking it very much in its >>> current form). >>> >>> Can code currently using DR_OFFSET and DR_INIT simply use >>> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET >>> if DR_CHREC_OFFSET is not a CHREC)? If yes, would there be any downside >>> in doing that? If not, then why? > > There's nothing particularly special about the CHREC_LEFT for users of > the drs. The chrec as a whole describes the variable part of the offset. > >>> That said, I'm all for making DR info more powerful. But I detest >>> having extra fields >>> and adding confusion as to when to use which. Thus if we can make >>> DR_CHREC_INIT >>> be DR_INIT and DR_OFFSET an inline function accessing CHREC_LEFT if >>> suitable plus exposing DR_OFFSET_CHREC that would make me more happy. >> >> And if we want to make it opt-in do a dr_analyze_me_harder () which will >> re-write DR_OFFSET/INIT into the new form. >> >> But DR_OFFSET and DR_INIT (read) accessors would maintain their >> semantics while DR_OFFSET_CHREC would expose the CHREC if >> available. > > After the changes, the only place that actually cared about the split > between the "old" DR_OFFSET and DR_INIT was tree-predcom.c: > ref_at_iteration. Everywhere else just wanted the sum of OFFSET and INIT, > and for them it would have been more convenient to have a combined field. 
>
> So maybe one way of trying to avoid the confusion would be to keep
> DR_OFFSET together as the full starting offset from the base, then
> provide DR_VAR_OFFSET and DR_CONST_OFFSET as the split forms, with
> DR_VAR_OFFSET being more like an abstract value number.  See the comment
> at the start of the patch below for more details.
>
> I'm not sure whether the main reason for splitting the offset was to
> make life easier for the users of the drs, or whether it was to try
> to avoid creating new trees.  But then, the unconditional scev analysis
> that we did previously already generated new trees.

Make life easier, and allow the same base + variable offset but differing
const offset to be detected easily -- a[i] vs. a[i+1].

@@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a
 bool
 dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
-                      struct loop *loop)
+                      gimple *stmt, struct loop *loop)
 {

please amend the function comment with what STMT is about (DR_STMT
I suppose).

@@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh
       init = size_binop (PLUS_EXPR, init, dinit);
       base_misalignment -= TREE_INT_CST_LOW (dinit);

-  split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
-  init = size_binop (PLUS_EXPR, init, dinit);
-
-  step = size_binop (PLUS_EXPR,
-                     fold_convert (ssizetype, base_iv.step),
-                     fold_convert (ssizetype, offset_iv.step));
-
   base = canonicalize_base_object_address (base_iv.base);
+  tree offset = size_binop (PLUS_EXPR,
+                            fold_convert (ssizetype, offset_iv.base),
+                            init);

so you remove the split_constant_offset handling on offset_iv.base.
This may end up no longer walking and expanding def stmts of SSA names
contained therein.  I suppose this is fully intended so that re-computing
the ref address using DR_BASE/DR_OFFSET doesn't end up expanding that
redundant code?  For analysis relying on this one now needs to resort to
DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar
expansions?  Just something to watch out for ...

@@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh
     }

   drb->base_address = base;
-  drb->offset = fold_convert (ssizetype, offset_iv.base);
-  drb->init = init;
+  drb->offset = offset;
   drb->step = step;
+  split_constant_offset (scev, &drb->var_offset, &drb->const_offset);

so is the full-fledged split_constant_offset (with its SSA name handling)
still needed here?  Something to eventually address with a followup.

@@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i
   tree ref_op2 = NULL_TREE;
   tree new_offset;

+  split_constant_offset (DR_OFFSET (dr), &off, &coff);
   if (iter != 0)
     {
       new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));

likewise here?  Why do you think ref_at_iteration cares?  Is that because
of codegen quality?  I'd have done with coff == size_zero_node plus
simplifications that arise from that.

Thanks,
Richard.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Richard
>
>
> 2017-08-22  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         PR tree-optimization/81635
>         * tree-data-ref.h (innermost_loop_behavior): Remove init field.
>         Add var_offset and const_offset fields.  Rename offset_alignment
>         to var_offset_alignment.
>         (DR_INIT): Delete.
>         (DR_CONST_OFFSET, DR_VAR_OFFSET): New macros.
>         (DR_OFFSET_ALIGNMENT): Replace with...
>         (DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
>         (dr_analyze_innermost): Add a gimple * argument.
>         (dr_equal_offsets_p): Delete.
>         (dr_var_offsets_equal_p, dr_var_offsets_compare): Declare.
>         * tree-vectorizer.h (STMT_VINFO_DR_INIT): Delete.
> 	(STMT_VINFO_DR_VAR_OFFSET, STMT_VINFO_DR_CONST_OFFSET): New macros.
> 	(STMT_VINFO_DR_OFFSET_ALIGNMENT): Replace with...
> 	(STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
> 	* tree.c (tree_ctz): Handle POLYNOMIAL_CHREC.
> 	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> 	(split_constant_offset): Handle POLYNOMIAL_CHREC.
> 	(analyze_offset_scev): New function.
> 	(dr_analyze_innermost): Add a gimple * argument.  Update after
> 	changes to innermost_loop_behavior.  Initialize var_offset and
> 	const_offset.
> 	(create_data_ref): Update call to dr_analyze_innermost.
> 	Update dump after changes to innermost_loop_behavior.
> 	(operator ==): Use dr_var_offsets_equal_p and compare the
> 	DR_CONST_OFFSETs.
> 	(prune_runtime_alias_test_list): Likewise.
> 	(comp_dr_with_seg_len_pair): Use dr_var_offsets_compare and compare
> 	the DR_CONST_OFFSETs.
> 	(create_intersect_range_checks): Use DR_OFFSET without adding
> 	DR_INIT.
> 	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> 	(dr_alignment): Use const_offset instead of init and
> 	var_offset_alignment instead of offset_alignment.
> 	* tree-if-conv.c (innermost_loop_behavior_hash::hash): Don't
> 	test the init field.
> 	(innermost_loop_behavior_hash::equal): Likewise.
> 	(ifcvt_memrefs_wont_trap): Likewise.
> 	(if_convertible_loop_p_1): Likewise.
> 	* tree-loop-distribution.c (build_addr_arg_loc): Use DR_OFFSET
> 	without adding DR_INIT.
> 	(build_rdg_partition_for_vertex): Don't check DR_INIT.
> 	(share_memory_accesses): Likewise.
> 	(pg_add_dependence_edges): Likewise.
> 	(compute_alias_check_pairs): Use dr_var_offsets_compare.
> 	* tree-predcom.c (aff_combination_dr_offset): Use DR_OFFSET without
> 	adding DR_INIT.
> 	(determine_offset): Likewise.
> 	(valid_initializer_p): Likewise.
> 	(find_looparound_phi): Update call to dr_analyze_innermost.
> 	(ref_at_iteration): Use split_constant_offset to split the offset.
> 	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
> 	const_offset instead of init and var_offset_alignment instead of
> 	offset_alignment.
> 	(vect_find_same_alignment_drs): Use dr_var_offsets_compare and
> 	compare the DR_CONST_OFFSETs.
> 	(dr_group_sort_cmp): Likewise.
> 	(vect_analyze_group_access_1): Use DR_CONST_OFFSET instead of DR_INIT.
> 	(vect_no_alias_p): Likewise.
> 	(vect_analyze_data_ref_accesses): Use dr_var_offsets_equal_p and
> 	compare the DR_CONST_OFFSETs.
> 	(vect_prune_runtime_alias_test_list): Use dr_var_offsets_compare.
> 	(vect_analyze_data_refs): Don't check DR_INIT and use DR_OFFSET
> 	without adding DR_INIT.  Use DR_VAR_OFFSET_ALIGNMENT instead of
> 	DR_OFFSET_ALIGNMENT.  Update call to dr_analyze_innermost, and
> 	update subsequent dump.
> 	(vect_create_addr_base_for_vector_ref): Use DR_OFFSET without
> 	adding DR_INIT.
> 	* tree-vect-stmts.c (vectorizable_store): Likewise.
> 	(vectorizable_load): Likewise.  Use DR_CONST_OFFSET instead
> 	of DR_INIT.
>
> gcc/testsuite/
> 	PR tree-optimization/81635
> 	* gcc.dg/vect/bb-slp-pr81635.c: New test.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h 2017-08-21 10:42:51.088530428 +0100
> +++ gcc/tree-data-ref.h 2017-08-22 14:54:48.630563940 +0100
> @@ -28,29 +28,44 @@ #define GCC_TREE_DATA_REF_H
> innermost_loop_behavior describes the evolution of the address of the memory
> reference in the innermost enclosing loop.  The address is expressed as
> BASE + STEP * # of iteration, and base is further decomposed as the base
> - pointer (BASE_ADDRESS), loop invariant offset (OFFSET) and
> - constant offset (INIT).
Examples, in loop nest > + pointer (BASE_ADDRESS) and the loop invariant offset (OFFSET). > + OFFSET is further expressed as the sum of a zero or non-constant term > + (VAR_OFFSET) and a constant term (CONST_OFFSET). VAR_OFFSET should be > + treated as an abstract representation; in particular, it may contain > + chrecs. CONST_OFFSET is always an INTEGER_CST. > > - for (i = 0; i < 100; i++) > - for (j = 3; j < 100; j++) > + Examples, in loop nest > > - Example 1 Example 2 > - data-ref a[j].b[i][j] *(p + x + 16B + 4B * j) > + 1: for (i = 0; i < 100; i++) > + 2: for (j = 3; j < 100; j++) > > + Example 1 Example 2 > + data-ref a[j].b[i][j] *(p + x + 16B + 4B * j) > > - innermost_loop_behavior > - base_address &a p > - offset i * D_i x > - init 3 * D_j + offsetof (b) 28 > - step D_j 4 > > - */ > + innermost_loop_behavior > + base_address &a p > + offset i * D_i + 3 * D_j + offsetof (b) x + 28 > + var_offset {0, +, D_i}_1 x (or an equiv. chrec) > + const_offset 3 * D_j + offsetof (b) 28 > + step D_j 4 > + > + The main two uses of VAR_OFFSET and CONST_OFFSET are: > + > + 1. to better analyze the alignment, since CONST_OFFSET can be treated as > + the misalignment wrt the alignment of VAR_OFFSET. > + > + 2. to find data references that are a constant number of bytes apart. > + If two data references have the same BASE_ADDRESS and VAR_OFFSET, > + the distance between them is given by the difference in their > + CONST_OFFSETs. */ > struct innermost_loop_behavior > { > tree base_address; > tree offset; > - tree init; > tree step; > + tree var_offset; > + tree const_offset; > > /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes > from an alignment boundary of BASE_ALIGNMENT bytes. For example, > @@ -91,7 +106,7 @@ struct innermost_loop_behavior > /* The largest power of two that divides OFFSET, capped to a suitably > high value if the offset is zero. This is a byte rather than a bit > quantity. */ > - unsigned int offset_alignment; > + unsigned int var_offset_alignment; > > /* Likewise for STEP. 
*/ > unsigned int step_alignment; > @@ -186,12 +201,13 @@ #define DR_IS_WRITE(DR) (!DR_ > #define DR_IS_CONDITIONAL_IN_STMT(DR) (DR)->is_conditional_in_stmt > #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address > #define DR_OFFSET(DR) (DR)->innermost.offset > -#define DR_INIT(DR) (DR)->innermost.init > +#define DR_VAR_OFFSET(DR) (DR)->innermost.var_offset > +#define DR_CONST_OFFSET(DR) (DR)->innermost.const_offset > #define DR_STEP(DR) (DR)->innermost.step > #define DR_PTR_INFO(DR) (DR)->alias.ptr_info > #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment > #define DR_BASE_MISALIGNMENT(DR) (DR)->innermost.base_misalignment > -#define DR_OFFSET_ALIGNMENT(DR) (DR)->innermost.offset_alignment > +#define DR_VAR_OFFSET_ALIGNMENT(DR) (DR)->innermost.var_offset_alignment > #define DR_STEP_ALIGNMENT(DR) (DR)->innermost.step_alignment > #define DR_INNERMOST(DR) (DR)->innermost > > @@ -412,7 +428,8 @@ #define DDR_REVERSED_P(DDR) (DDR)->rever > #define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > -bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); > +bool dr_analyze_innermost (innermost_loop_behavior *, tree, > + gimple *, struct loop *); > extern bool compute_data_dependences_for_loop (struct loop *, bool, > vec<loop_p> *, > vec<data_reference_p> *, > @@ -466,8 +483,6 @@ dr_alignment (data_reference *dr) > > extern bool dr_may_alias_p (const struct data_reference *, > const struct data_reference *, bool); > -extern bool dr_equal_offsets_p (struct data_reference *, > - struct data_reference *); > > extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); > extern int data_ref_compare_tree (tree, tree); > @@ -675,4 +690,23 @@ lambda_matrix_new (int m, int n, struct > return mat; > } > > +/* Check if DRA and DRB have equal DR_VAR_OFFSETs. */ > + > +inline bool > +dr_var_offsets_equal_p (struct data_reference *dra, > + struct data_reference *drb) > +{ > + return eq_evolutions_p (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); > +} > + > +/* Compare the DR_VAR_OFFSETs of DRA and DRB for sorting purposes, > + returning a qsort-style result. 
*/ > + > +inline int > +dr_var_offsets_compare (struct data_reference *dra, > + struct data_reference *drb) > +{ > + return data_ref_compare_tree (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); > +} > + > #endif /* GCC_TREE_DATA_REF_H */ > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-08-04 11:42:45.939105152 +0100 > +++ gcc/tree-vectorizer.h 2017-08-22 14:54:48.633563940 +0100 > @@ -740,14 +740,15 @@ #define STMT_VINFO_VEC_CONST_COND_REDUC_ > > #define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop > #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address > -#define STMT_VINFO_DR_INIT(S) (S)->dr_wrt_vec_loop.init > #define STMT_VINFO_DR_OFFSET(S) (S)->dr_wrt_vec_loop.offset > +#define STMT_VINFO_DR_VAR_OFFSET(S) (S)->dr_wrt_vec_loop.var_offset > +#define STMT_VINFO_DR_CONST_OFFSET(S) (S)->dr_wrt_vec_loop.const_offset > #define STMT_VINFO_DR_STEP(S) (S)->dr_wrt_vec_loop.step > #define STMT_VINFO_DR_BASE_ALIGNMENT(S) (S)->dr_wrt_vec_loop.base_alignment > #define STMT_VINFO_DR_BASE_MISALIGNMENT(S) \ > (S)->dr_wrt_vec_loop.base_misalignment > -#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \ > - (S)->dr_wrt_vec_loop.offset_alignment > +#define STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT(S) \ > + (S)->dr_wrt_vec_loop.var_offset_alignment > #define STMT_VINFO_DR_STEP_ALIGNMENT(S) \ > (S)->dr_wrt_vec_loop.step_alignment > > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2017-08-21 12:14:47.159835474 +0100 > +++ gcc/tree.c 2017-08-22 14:54:48.634563941 +0100 > @@ -2601,6 +2601,12 @@ tree_ctz (const_tree expr) > return MIN (ret1, prec); > } > return 0; > + case POLYNOMIAL_CHREC: > + ret1 = tree_ctz (CHREC_LEFT (expr)); > + if (ret1 == 0) > + return ret1; > + ret2 = tree_ctz (CHREC_RIGHT (expr)); > + return MIN (ret1, ret2); > default: > return 0; > } > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-data-ref.c 2017-08-22 14:54:48.629563940 +0100 > @@ -86,6 +86,7 @@ Software Foundation; either version 3, o > #include "expr.h" > #include "gimple-iterator.h" > #include "tree-ssa-loop-niter.h" > +#include "tree-ssa-loop-ivopts.h" > #include "tree-ssa-loop.h" > #include "tree-ssa.h" > #include "cfgloop.h" > @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v > *off = ssize_int (0); > STRIP_NOPS (exp); > > - if (tree_is_chrec (exp) > + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) > + { > + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); > + if (op0 != CHREC_LEFT (exp)) > + { > + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), > + op0, CHREC_RIGHT (exp)); > + *off = op1; > + } > + return; > + } > + > + if (automatically_generated_chrec_p (exp) > || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) > return; > > @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a > return build_fold_addr_expr (TREE_OPERAND (addr, 0)); > } > > -/* Analyze the behavior of memory reference REF. There are two modes: > +/* Analyze the scalar evolution of OFFSET in the innermost parent of > + LOOP for which it isn't invariant. Return OFFSET itself if the > + value is invariant or if it's too complex to analyze. 
*/ > + > +static tree > +analyze_offset_scev (struct loop *loop, tree offset) > +{ > + struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset); > + if (inv_loop != NULL) > + { > + if (loop_depth (inv_loop) == 0) > + return offset; > + loop = loop_outer (inv_loop); > + } > + tree res = analyze_scalar_evolution (loop, offset); > + if (chrec_contains_undetermined (res)) > + return offset; > + return res; > +} > + > +/* Analyze the behavior of memory reference REF, which occurs in STMT. > + There are two modes: > > - BB analysis. In this case we simply split the address into base, > init and offset components, without reference to any containing loop. > @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a > > bool > dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, > - struct loop *loop) > + gimple *stmt, struct loop *loop) > { > HOST_WIDE_INT pbitsize, pbitpos; > tree base, poffset; > machine_mode pmode; > int punsignedp, preversep, pvolatilep; > affine_iv base_iv, offset_iv; > - tree init, dinit, step; > + tree dinit; > bool in_loop = (loop && loop->num); > > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -885,7 +919,7 @@ dr_analyze_innermost (innermost_loop_beh > } > } > > - init = ssize_int (pbitpos / BITS_PER_UNIT); > + tree init = ssize_int (pbitpos / BITS_PER_UNIT); > > /* Subtract any constant component from the base and add it to INIT instead. > Adjust the misalignment to reflect the amount we subtracted. */ > @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh > init = size_binop (PLUS_EXPR, init, dinit); > base_misalignment -= TREE_INT_CST_LOW (dinit); > > - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); > - init = size_binop (PLUS_EXPR, init, dinit); > - > - step = size_binop (PLUS_EXPR, > - fold_convert (ssizetype, base_iv.step), > - fold_convert (ssizetype, offset_iv.step)); > - > base = canonicalize_base_object_address (base_iv.base); > + tree offset = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, offset_iv.base), > + init); > + tree step = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, base_iv.step), > + fold_convert (ssizetype, offset_iv.step)); > + tree scev = analyze_offset_scev (loop_containing_stmt (stmt), offset); > > /* See if get_pointer_alignment can guarantee a higher alignment than > the one we calculated above. */ > @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh > } > > drb->base_address = base; > - drb->offset = fold_convert (ssizetype, offset_iv.base); > - drb->init = init; > + drb->offset = offset; > drb->step = step; > + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); > drb->base_alignment = base_alignment; > drb->base_misalignment = base_misalignment & (base_alignment - 1); > - drb->offset_alignment = highest_pow2_factor (offset_iv.base); > + drb->var_offset_alignment = highest_pow2_factor (drb->var_offset); > drb->step_alignment = highest_pow2_factor (step); > > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -1154,7 +1188,7 @@ create_data_ref (loop_p nest, loop_p loo > DR_IS_READ (dr) = is_read; > DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt; > > - dr_analyze_innermost (&DR_INNERMOST (dr), memref, > + dr_analyze_innermost (&DR_INNERMOST (dr), memref, stmt, > nest != NULL ? 
loop : NULL); > dr_analyze_indices (dr, nest, loop); > dr_analyze_alias (dr); > @@ -1166,15 +1200,17 @@ create_data_ref (loop_p nest, loop_p loo > print_generic_expr (dump_file, DR_BASE_ADDRESS (dr), TDF_SLIM); > fprintf (dump_file, "\n\toffset from base address: "); > print_generic_expr (dump_file, DR_OFFSET (dr), TDF_SLIM); > - fprintf (dump_file, "\n\tconstant offset from base address: "); > - print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM); > + fprintf (dump_file, "\n\tvariable part of offset: "); > + print_generic_expr (dump_file, DR_VAR_OFFSET (dr), TDF_SLIM); > + fprintf (dump_file, "\n\tconstant part of offset: "); > + print_generic_expr (dump_file, DR_CONST_OFFSET (dr), TDF_SLIM); > fprintf (dump_file, "\n\tstep: "); > print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM); > fprintf (dump_file, "\n\tbase alignment: %d", DR_BASE_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tbase misalignment: %d", > DR_BASE_MISALIGNMENT (dr)); > - fprintf (dump_file, "\n\toffset alignment: %d", > - DR_OFFSET_ALIGNMENT (dr)); > + fprintf (dump_file, "\n\tvariable offset alignment: %d", > + DR_VAR_OFFSET_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tbase_object: "); > print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM); > @@ -1324,11 +1360,12 @@ runtime_alias_check_p (ddr_p ddr, struct > operator == (const dr_with_seg_len& d1, > const dr_with_seg_len& d2) > { > - return operand_equal_p (DR_BASE_ADDRESS (d1.dr), > + return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), > DR_BASE_ADDRESS (d2.dr), 0) > - && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0 > - && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0 > - && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0; > + && dr_var_offsets_equal_p (d1.dr, d2.dr) > + && data_ref_compare_tree (DR_CONST_OFFSET (d1.dr), > + DR_CONST_OFFSET (d2.dr)) == 0 > + && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0); > } > > /* Comparison function for sorting objects of dr_with_seg_len_pair_t > @@ -1360,17 +1397,15 @@ comp_dr_with_seg_len_pair (const void *p > if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr), > DR_STEP (b2.dr))) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr), > - DR_OFFSET (b1.dr))) != 0) > + if ((comp_res = dr_var_offsets_compare (a1.dr, b1.dr)) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr), > - DR_INIT (b1.dr))) != 0) > + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a1.dr), > + DR_CONST_OFFSET (b1.dr))) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr), > - DR_OFFSET (b2.dr))) != 0) > + if ((comp_res = dr_var_offsets_compare (a2.dr, b2.dr)) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr), > - DR_INIT (b2.dr))) != 0) > + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a2.dr), > + DR_CONST_OFFSET (b2.dr))) != 0) > return comp_res; > > return 0; > @@ -1455,10 +1490,9 @@ prune_runtime_alias_test_list (vec<dr_wi > > if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr), > DR_BASE_ADDRESS (dr_a2->dr), 0) > - || !operand_equal_p (DR_OFFSET (dr_a1->dr), > - DR_OFFSET (dr_a2->dr), 0) > - || !tree_fits_shwi_p (DR_INIT (dr_a1->dr)) > - || !tree_fits_shwi_p (DR_INIT (dr_a2->dr))) > + || !dr_var_offsets_equal_p (dr_a1->dr, dr_a2->dr) > + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a1->dr)) > + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a2->dr))) > continue; > > /* Only merge 
const step data references. */ > @@ -1484,11 +1518,13 @@ prune_runtime_alias_test_list (vec<dr_wi > continue; > > /* Make sure dr_a1 starts left of dr_a2. */ > - if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr))) > + if (tree_int_cst_lt (DR_CONST_OFFSET (dr_a2->dr), > + DR_CONST_OFFSET (dr_a1->dr))) > std::swap (*dr_a1, *dr_a2); > > bool do_remove = false; > - wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)); > + wide_int diff = wi::sub (DR_CONST_OFFSET (dr_a2->dr), > + DR_CONST_OFFSET (dr_a1->dr)); > wide_int min_seg_len_b; > tree new_seg_len; > > @@ -1756,10 +1792,6 @@ create_intersect_range_checks (struct lo > tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr); > tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr); > > - offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a), > - offset_a, DR_INIT (dr_a.dr)); > - offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b), > - offset_b, DR_INIT (dr_b.dr)); > addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a); > addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b); > > @@ -1826,48 +1858,6 @@ create_runtime_alias_checks (struct loop > } > } > > -/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical > - expressions. */ > -static bool > -dr_equal_offsets_p1 (tree offset1, tree offset2) > -{ > - bool res; > - > - STRIP_NOPS (offset1); > - STRIP_NOPS (offset2); > - > - if (offset1 == offset2) > - return true; > - > - if (TREE_CODE (offset1) != TREE_CODE (offset2) > - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) > - return false; > - > - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), > - TREE_OPERAND (offset2, 0)); > - > - if (!res || !BINARY_CLASS_P (offset1)) > - return res; > - > - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), > - TREE_OPERAND (offset2, 1)); > - > - return res; > -} > - > -/* Check if DRA and DRB have equal offsets. */ > -bool > -dr_equal_offsets_p (struct data_reference *dra, > - struct data_reference *drb) > -{ > - tree offset1, offset2; > - > - offset1 = DR_OFFSET (dra); > - offset2 = DR_OFFSET (drb); > - > - return dr_equal_offsets_p1 (offset1, offset2); > -} > - > /* Returns true if FNA == FNB. */ > > static bool > @@ -5083,13 +5073,13 @@ dr_alignment (innermost_loop_behavior *d > /* Get the alignment of BASE_ADDRESS + INIT. */ > unsigned int alignment = drb->base_alignment; > unsigned int misalignment = (drb->base_misalignment > - + TREE_INT_CST_LOW (drb->init)); > + + TREE_INT_CST_LOW (drb->const_offset)); > if (misalignment != 0) > alignment = MIN (alignment, misalignment & -misalignment); > > /* Cap it to the alignment of OFFSET. */ > if (!integer_zerop (drb->offset)) > - alignment = MIN (alignment, drb->offset_alignment); > + alignment = MIN (alignment, drb->var_offset_alignment); > > /* Cap it to the alignment of STEP. 
*/ > if (!integer_zerop (drb->step)) > Index: gcc/tree-if-conv.c > =================================================================== > --- gcc/tree-if-conv.c 2017-07-13 09:25:12.666266733 +0100 > +++ gcc/tree-if-conv.c 2017-08-22 14:54:48.630563940 +0100 > @@ -149,7 +149,6 @@ innermost_loop_behavior_hash::hash (cons > > hash = iterative_hash_expr (e->base_address, 0); > hash = iterative_hash_expr (e->offset, hash); > - hash = iterative_hash_expr (e->init, hash); > return iterative_hash_expr (e->step, hash); > } > > @@ -161,8 +160,6 @@ innermost_loop_behavior_hash::equal (con > || (!e1->base_address && e2->base_address) > || (!e1->offset && e2->offset) > || (e1->offset && !e2->offset) > - || (!e1->init && e2->init) > - || (e1->init && !e2->init) > || (!e1->step && e2->step) > || (e1->step && !e2->step)) > return false; > @@ -173,9 +170,6 @@ innermost_loop_behavior_hash::equal (con > if (e1->offset && e2->offset > && !operand_equal_p (e1->offset, e2->offset, 0)) > return false; > - if (e1->init && e2->init > - && !operand_equal_p (e1->init, e2->init, 0)) > - return false; > if (e1->step && e2->step > && !operand_equal_p (e1->step, e2->step, 0)) > return false; > @@ -856,8 +850,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v > innermost_loop_behavior *innermost = &DR_INNERMOST (a); > > gcc_assert (DR_STMT (a) == stmt); > - gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) > - || DR_INIT (a) || DR_STEP (a)); > + gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) || DR_STEP (a)); > > master_dr = innermost_DR_map->get (innermost); > gcc_assert (master_dr != NULL); > @@ -1433,8 +1426,7 @@ if_convertible_loop_p_1 (struct loop *lo > if (TREE_CODE (ref) == COMPONENT_REF > || TREE_CODE (ref) == IMAGPART_EXPR > || TREE_CODE (ref) == REALPART_EXPR > - || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) > - || DR_INIT (dr) || DR_STEP (dr))) > + || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) || DR_STEP (dr))) > { > while (TREE_CODE (ref) == COMPONENT_REF > || TREE_CODE (ref) == IMAGPART_EXPR > Index: gcc/tree-loop-distribution.c > =================================================================== > --- gcc/tree-loop-distribution.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-loop-distribution.c 2017-08-22 14:54:48.630563940 +0100 > @@ -903,8 +903,7 @@ build_addr_arg_loc (location_t loc, data > { > tree addr_base; > > - addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr)); > - addr_base = fold_convert_loc (loc, sizetype, addr_base); > + addr_base = fold_convert_loc (loc, sizetype, DR_OFFSET (dr)); > > /* Test for a negative stride, iterating over every element. */ > if (tree_int_cst_sgn (DR_STEP (dr)) == -1) > @@ -1289,8 +1288,7 @@ build_rdg_partition_for_vertex (struct g > > /* Partition can only be executed sequentially if there is any > unknown data reference. 
*/ > - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) > - || !DR_INIT (dr) || !DR_STEP (dr)) > + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) > partition->type = PTYPE_SEQUENTIAL; > > bitmap_set_bit (partition->datarefs, idx); > @@ -1507,21 +1505,18 @@ share_memory_accesses (struct graph *rdg > { > dr1 = datarefs_vec[i]; > > - if (!DR_BASE_ADDRESS (dr1) > - || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) > + if (!DR_BASE_ADDRESS (dr1) || !DR_OFFSET (dr1) || !DR_STEP (dr1)) > continue; > > EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) > { > dr2 = datarefs_vec[j]; > > - if (!DR_BASE_ADDRESS (dr2) > - || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) > + if (!DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr2) || !DR_STEP (dr2)) > continue; > > if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) > && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) > - && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) > && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) > return true; > } > @@ -1705,7 +1700,6 @@ pg_add_dependence_edges (struct graph *r > runtime alias check. */ > if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2) > || !DR_OFFSET (dr1) || !DR_OFFSET (dr2) > - || !DR_INIT (dr1) || !DR_INIT (dr2) > || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1)) > || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2)) > || res == 0) > @@ -2203,7 +2197,7 @@ compute_alias_check_pairs (struct loop * > DR_BASE_ADDRESS (dr_b)); > > if (comp_res == 0) > - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); > + comp_res = dr_var_offsets_compare (dr_a, dr_b); > gcc_assert (comp_res != 0); > > if (latch_dominated_by_data_ref (loop, dr_a)) > Index: gcc/tree-predcom.c > =================================================================== > --- gcc/tree-predcom.c 2017-08-10 14:36:08.057471267 +0100 > +++ gcc/tree-predcom.c 2017-08-22 14:54:48.631563940 +0100 > @@ -666,18 +666,13 @@ suitable_reference_p (struct data_refere > return true; > } > > -/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ > +/* Stores DR_OFFSET (DR) to OFFSET. */ > > static void > aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) > { > - tree type = TREE_TYPE (DR_OFFSET (dr)); > - aff_tree delta; > - > - tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, > - &name_expansions); > - aff_combination_const (&delta, type, wi::to_widest (DR_INIT (dr))); > - aff_combination_add (offset, &delta); > + tree_to_aff_combination_expand (DR_OFFSET (dr), TREE_TYPE (DR_OFFSET (dr)), > + offset, &name_expansions); > } > > /* Determines number of iterations of the innermost enclosing loop before B > @@ -710,8 +705,7 @@ determine_offset (struct data_reference > /* If the references have loop invariant address, check that they access > exactly the same location. */ > *off = 0; > - return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0) > - && operand_equal_p (DR_INIT (a), DR_INIT (b), 0)); > + return operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0); > } > > /* Compare the offsets of the addresses, and check whether the difference > @@ -1171,8 +1165,7 @@ valid_initializer_p (struct data_referen > /* If the address of the reference is invariant, initializer must access > exactly the same location. 
*/ > if (integer_zerop (DR_STEP (root))) > - return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0) > - && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0)); > + return operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0); > > /* Verify that this index of REF is equal to the root's index at > -DISTANCE-th iteration. */ > @@ -1247,7 +1240,7 @@ find_looparound_phi (struct loop *loop, > memset (&init_dr, 0, sizeof (struct data_reference)); > DR_REF (&init_dr) = init_ref; > DR_STMT (&init_dr) = phi; > - if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, loop)) > + if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, phi, loop)) > return NULL; > > if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) > @@ -1481,8 +1474,7 @@ replace_ref_with (gimple *stmt, tree new > ref_at_iteration (data_reference_p dr, int iter, > gimple_seq *stmts, tree niters = NULL_TREE) > { > - tree off = DR_OFFSET (dr); > - tree coff = DR_INIT (dr); > + tree off, coff; > tree ref = DR_REF (dr); > enum tree_code ref_code = ERROR_MARK; > tree ref_type = NULL_TREE; > @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i > tree ref_op2 = NULL_TREE; > tree new_offset; > > + split_constant_offset (DR_OFFSET (dr), &off, &coff); > if (iter != 0) > { > new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-vect-data-refs.c 2017-08-22 14:54:48.632563940 +0100 > @@ -866,7 +866,7 @@ vect_compute_data_ref_alignment (struct > base_misalignment = (*entry)->base_misalignment; > } > > - if (drb->offset_alignment < vector_alignment > + if (drb->var_offset_alignment < vector_alignment > || !step_preserves_misalignment_p > /* We need to know whether the step wrt the vectorized loop is > negative when computing the starting misalignment below. */ > @@ -928,7 +928,7 @@ vect_compute_data_ref_alignment (struct > base_misalignment = 0; > } > unsigned int misalignment = (base_misalignment > - + TREE_INT_CST_LOW (drb->init)); > + + TREE_INT_CST_LOW (drb->const_offset)); > > /* If this is a backward running DR then first access in the larger > vectype actually is N-1 elements before the address in the DR. > @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat > return; > > if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) > - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) > + || !dr_var_offsets_equal_p (dra, drb) > || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) > return; > > /* Two references with distance zero have the same alignment. */ > - offset_int diff = (wi::to_offset (DR_INIT (dra)) > - - wi::to_offset (DR_INIT (drb))); > + offset_int diff = (wi::to_offset (DR_CONST_OFFSET (dra)) > + - wi::to_offset (DR_CONST_OFFSET (drb))); > if (diff != 0) > { > /* Get the wider of the two alignments. */ > @@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data > gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt)); > struct data_reference *data_ref = dr; > unsigned int count = 1; > - tree prev_init = DR_INIT (data_ref); > + tree prev_init = DR_CONST_OFFSET (data_ref); > gimple *prev = stmt; > HOST_WIDE_INT diff, gaps = 0; > > @@ -2444,9 +2444,10 @@ vect_analyze_group_access_1 (struct data > data-ref (supported only for loads), we vectorize only the first > stmt, and the rest get their vectorized loads from the first > one. 
*/ > - if (!tree_int_cst_compare (DR_INIT (data_ref), > - DR_INIT (STMT_VINFO_DATA_REF ( > - vinfo_for_stmt (next))))) > + data_reference *next_ref > + = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); > + if (!tree_int_cst_compare (DR_CONST_OFFSET (data_ref), > + DR_CONST_OFFSET (next_ref))) > { > if (DR_IS_WRITE (data_ref)) > { > @@ -2469,14 +2470,14 @@ vect_analyze_group_access_1 (struct data > } > > prev = next; > - data_ref = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); > + data_ref = next_ref; > > /* All group members have the same STEP by construction. */ > gcc_checking_assert (operand_equal_p (DR_STEP (data_ref), step, 0)); > > /* Check that the distance between two accesses is equal to the type > size. Otherwise, we have gaps. */ > - diff = (TREE_INT_CST_LOW (DR_INIT (data_ref)) > + diff = (TREE_INT_CST_LOW (DR_CONST_OFFSET (data_ref)) > - TREE_INT_CST_LOW (prev_init)) / type_size; > if (diff != 1) > { > @@ -2499,7 +2500,7 @@ vect_analyze_group_access_1 (struct data > gap in the access, GROUP_GAP is always 1. */ > GROUP_GAP (vinfo_for_stmt (next)) = diff; > > - prev_init = DR_INIT (data_ref); > + prev_init = DR_CONST_OFFSET (data_ref); > next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next)); > /* Count the number of data-refs in the chain. */ > count++; > @@ -2715,13 +2716,10 @@ dr_group_sort_cmp (const void *dra_, con > return cmp; > } > > - /* And according to DR_OFFSET. */ > - if (!dr_equal_offsets_p (dra, drb)) > - { > - cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)); > - if (cmp != 0) > - return cmp; > - } > + /* And according to DR_VAR_OFFSET. */ > + cmp = dr_var_offsets_compare (dra, drb); > + if (cmp != 0) > + return cmp; > > /* Put reads before writes. */ > if (DR_IS_READ (dra) != DR_IS_READ (drb)) > @@ -2745,8 +2743,9 @@ dr_group_sort_cmp (const void *dra_, con > return cmp; > } > > - /* Then sort after DR_INIT. In case of identical DRs sort after stmt UID. */ > - cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)); > + /* Then sort after DR_CONST_OFFSET. In case of identical DRs sort after > + stmt UID. */ > + cmp = tree_int_cst_compare (DR_CONST_OFFSET (dra), DR_CONST_OFFSET (drb)); > if (cmp == 0) > return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1; > return cmp; > @@ -2817,7 +2816,7 @@ vect_analyze_data_ref_accesses (vec_info > if (DR_IS_READ (dra) != DR_IS_READ (drb) > || !operand_equal_p (DR_BASE_ADDRESS (dra), > DR_BASE_ADDRESS (drb), 0) > - || !dr_equal_offsets_p (dra, drb) > + || !dr_var_offsets_equal_p (dra, drb) > || !gimple_assign_single_p (DR_STMT (dra)) > || !gimple_assign_single_p (DR_STMT (drb))) > break; > @@ -2835,7 +2834,8 @@ vect_analyze_data_ref_accesses (vec_info > break; > > /* Do not place the same access in the interleaving chain twice. */ > - if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0) > + if (tree_int_cst_compare (DR_CONST_OFFSET (dra), > + DR_CONST_OFFSET (drb)) == 0) > break; > > /* Check the types are compatible. > @@ -2844,9 +2844,10 @@ vect_analyze_data_ref_accesses (vec_info > TREE_TYPE (DR_REF (drb)))) > break; > > - /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ > - HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); > - HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); > + /* Sorting has ensured that > + DR_CONST_OFFSET (dra) <= DR_CONST_OFFSET (drb). 
*/ > + HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CONST_OFFSET (dra)); > + HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CONST_OFFSET (drb)); > gcc_assert (init_a <= init_b); > > /* If init_b == init_a + the size of the type * k, we have an > @@ -2859,10 +2860,10 @@ vect_analyze_data_ref_accesses (vec_info > /* If we have a store, the accesses are adjacent. This splits > groups into chunks we support (we don't support vectorization > of stores with gaps). */ > + HOST_WIDE_INT prev_init > + = TREE_INT_CST_LOW (DR_CONST_OFFSET (datarefs_copy[i - 1])); > if (!DR_IS_READ (dra) > - && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW > - (DR_INIT (datarefs_copy[i-1])) > - != type_size_a)) > + && (init_b - prev_init) != type_size_a) > break; > > /* If the step (if not zero or non-constant) is greater than the > @@ -2974,12 +2975,12 @@ vect_vfa_segment_size (struct data_refer > vect_no_alias_p (struct data_reference *a, struct data_reference *b, > tree segment_length_a, tree segment_length_b) > { > - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST > - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); > - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) > + gcc_assert (TREE_CODE (DR_CONST_OFFSET (a)) == INTEGER_CST > + && TREE_CODE (DR_CONST_OFFSET (b)) == INTEGER_CST); > + if (tree_int_cst_equal (DR_CONST_OFFSET (a), DR_CONST_OFFSET (b))) > return false; > > - tree seg_a_min = DR_INIT (a); > + tree seg_a_min = DR_CONST_OFFSET (a); > tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), > seg_a_min, segment_length_a); > /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT > @@ -2990,10 +2991,10 @@ vect_no_alias_p (struct data_reference * > tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); > seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), > seg_a_max, unit_size); > - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), > - DR_INIT (a), unit_size); > + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (a)), > + DR_CONST_OFFSET (a), unit_size); > } > - tree seg_b_min = DR_INIT (b); > + tree seg_b_min = DR_CONST_OFFSET (b); > tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), > seg_b_min, segment_length_b); > if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) > @@ -3001,8 +3002,8 @@ vect_no_alias_p (struct data_reference * > tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b))); > seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max), > seg_b_max, unit_size); > - seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)), > - DR_INIT (b), unit_size); > + seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (b)), > + DR_CONST_OFFSET (b), unit_size); > } > > if (tree_int_cst_le (seg_a_max, seg_b_min) > @@ -3148,8 +3149,7 @@ vect_prune_runtime_alias_test_list (loop > comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a), > DR_BASE_ADDRESS (dr_b)); > if (comp_res == 0) > - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), > - DR_OFFSET (dr_b)); > + comp_res = dr_var_offsets_compare (dr_a, dr_b); > > /* Alias is known at compilation time. */ > if (comp_res == 0 > @@ -3455,7 +3455,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > gimple *stmt; > stmt_vec_info stmt_info; > - tree base, offset, init; > + tree base, offset; > enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE; > bool simd_lane_access = false; > int vf; > @@ -3488,8 +3488,7 @@ vect_analyze_data_refs (vec_info *vinfo, > } > > /* Check that analysis of the data-ref succeeded. 
*/ > - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr) > - || !DR_STEP (dr)) > + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) > { > bool maybe_gather > = DR_IS_READ (dr) > @@ -3515,7 +3514,6 @@ vect_analyze_data_refs (vec_info *vinfo, > gcc_assert (newdr != NULL && DR_REF (newdr)); > if (DR_BASE_ADDRESS (newdr) > && DR_OFFSET (newdr) > - && DR_INIT (newdr) > && DR_STEP (newdr) > && integer_zerop (DR_STEP (newdr))) > { > @@ -3523,8 +3521,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > tree off = DR_OFFSET (newdr); > STRIP_NOPS (off); > - if (TREE_CODE (DR_INIT (newdr)) == INTEGER_CST > - && TREE_CODE (off) == MULT_EXPR > + if (TREE_CODE (off) == MULT_EXPR > && tree_fits_uhwi_p (TREE_OPERAND (off, 1))) > { > tree step = TREE_OPERAND (off, 1); > @@ -3555,7 +3552,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > DR_OFFSET (newdr) = ssize_int (0); > DR_STEP (newdr) = step; > - DR_OFFSET_ALIGNMENT (newdr) > + DR_VAR_OFFSET_ALIGNMENT (newdr) > = BIGGEST_ALIGNMENT; > DR_STEP_ALIGNMENT (newdr) > = highest_pow2_factor (step); > @@ -3665,7 +3662,6 @@ vect_analyze_data_refs (vec_info *vinfo, > > base = unshare_expr (DR_BASE_ADDRESS (dr)); > offset = unshare_expr (DR_OFFSET (dr)); > - init = unshare_expr (DR_INIT (dr)); > > if (is_gimple_call (stmt) > && (!gimple_call_internal_p (stmt) > @@ -3701,9 +3697,7 @@ vect_analyze_data_refs (vec_info *vinfo, > inner loop: *(BASE + INIT + OFFSET). By construction, > this address must be invariant in the inner loop, so we > can consider it as being used in the outer loop. */ > - tree init_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), > - init, offset); > - tree init_addr = fold_build_pointer_plus (base, init_offset); > + tree init_addr = fold_build_pointer_plus (base, offset); > tree init_ref = build_fold_indirect_ref (init_addr); > > if (dump_enabled_p ()) > @@ -3715,7 +3709,7 @@ vect_analyze_data_refs (vec_info *vinfo, > } > > if (!dr_analyze_innermost (&STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info), > - init_ref, loop)) > + init_ref, stmt, loop)) > /* dr_analyze_innermost already explained the failure. 
*/
> return false;
>
> @@ -3728,10 +3722,14 @@ vect_analyze_data_refs (vec_info *vinfo,
> dump_printf (MSG_NOTE, "\n\touter offset from base address: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> STMT_VINFO_DR_OFFSET (stmt_info));
> - dump_printf (MSG_NOTE,
> - "\n\touter constant offset from base address: ");
> + dump_printf (MSG_NOTE, "\n\tvariable part of outer offset"
> + " from base address: ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + STMT_VINFO_DR_VAR_OFFSET (stmt_info));
> + dump_printf (MSG_NOTE, "\n\tconstant part of outer offset"
> + " from base address: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - STMT_VINFO_DR_INIT (stmt_info));
> + STMT_VINFO_DR_CONST_OFFSET (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter step: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> STMT_VINFO_DR_STEP (stmt_info));
> @@ -3739,8 +3737,9 @@ vect_analyze_data_refs (vec_info *vinfo,
> STMT_VINFO_DR_BASE_ALIGNMENT (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter base misalignment: %d\n",
> STMT_VINFO_DR_BASE_MISALIGNMENT (stmt_info));
> - dump_printf (MSG_NOTE, "\n\touter offset alignment: %d\n",
> - STMT_VINFO_DR_OFFSET_ALIGNMENT (stmt_info));
> + dump_printf (MSG_NOTE,
> + "\n\touter variable offset alignment: %d\n",
> + STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter step alignment: %d\n",
> STMT_VINFO_DR_STEP_ALIGNMENT (stmt_info));
> }
> @@ -4055,23 +4054,18 @@ vect_create_addr_base_for_vector_ref (gi
> innermost_loop_behavior *drb = vect_dr_behavior (dr);
>
> tree data_ref_base = unshare_expr (drb->base_address);
> - tree base_offset = unshare_expr (drb->offset);
> - tree init = unshare_expr (drb->init);
> -
> + tree base_offset;
> if (loop_vinfo)
> - base_name = get_name (data_ref_base);
> + {
> + base_name = get_name (data_ref_base);
> + base_offset = fold_convert (sizetype, unshare_expr (drb->offset));
> + }
> else
> {
> - base_offset = ssize_int (0);
> - init = ssize_int (0);
> + base_offset = size_int (0);
> base_name = get_name (DR_REF (dr));
> }
>
> - /* Create base_offset */
> - base_offset = size_binop (PLUS_EXPR,
> - fold_convert (sizetype, base_offset),
> - fold_convert (sizetype, init));
> -
> if (offset)
> {
> offset = fold_build2 (MULT_EXPR, sizetype,
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c 2017-08-21 15:50:48.664709938 +0100
> +++ gcc/tree-vect-stmts.c 2017-08-22 14:54:48.633563940 +0100
> @@ -5970,9 +5970,7 @@ vectorizable_store (gimple *stmt, gimple
> stride_base
> = fold_build_pointer_plus
> (unshare_expr (DR_BASE_ADDRESS (first_dr)),
> - size_binop (PLUS_EXPR,
> - convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))),
> - convert_to_ptrofftype (DR_INIT (first_dr))));
> + convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))));
> stride_step = fold_convert (sizetype, unshare_expr (DR_STEP (first_dr)));
>
> /* For a store with loop-invariant (but other than power-of-2)
> @@ -6299,7 +6297,6 @@ vectorizable_store (gimple *stmt, gimple
> && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
> && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
> && integer_zerop (DR_OFFSET (first_dr))
> - && integer_zerop (DR_INIT (first_dr))
> && alias_sets_conflict_p (get_alias_set (aggr_type),
> get_alias_set (TREE_TYPE (ref_type))))
> {
> @@ -6993,9 +6990,7 @@ vectorizable_load (gimple *stmt, gimple_
> stride_base
> = fold_build_pointer_plus
> (DR_BASE_ADDRESS (first_dr),
> - size_binop (PLUS_EXPR,
> - convert_to_ptrofftype (DR_OFFSET
(first_dr)), > - convert_to_ptrofftype (DR_INIT (first_dr)))); > + convert_to_ptrofftype (DR_OFFSET (first_dr))); > stride_step = fold_convert (sizetype, DR_STEP (first_dr)); > > /* For a load with loop-invariant (but other than power-of-2) > @@ -7394,7 +7389,6 @@ vectorizable_load (gimple *stmt, gimple_ > && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR > && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0)) > && integer_zerop (DR_OFFSET (first_dr)) > - && integer_zerop (DR_INIT (first_dr)) > && alias_sets_conflict_p (get_alias_set (aggr_type), > get_alias_set (TREE_TYPE (ref_type))) > && (alignment_support_scheme == dr_aligned > @@ -7417,8 +7411,8 @@ vectorizable_load (gimple *stmt, gimple_ > = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr)); > tree diff = fold_convert (sizetype, > size_binop (MINUS_EXPR, > - DR_INIT (first_dr), > - DR_INIT (ptrdr))); > + DR_CONST_OFFSET (first_dr), > + DR_CONST_OFFSET (ptrdr))); > dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, > stmt, diff); > } > Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c > =================================================================== > --- /dev/null 2017-08-21 17:11:06.647307681 +0100 > +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c 2017-08-22 14:54:48.628563940 +0100 > @@ -0,0 +1,56 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_unpack } */ > + > +double p[1000]; > +double q[1000]; > + > +void > +f1 (void) > +{ > + for (unsigned int i = 0; i < 1000; i += 4) > + { > + double a = q[i] + p[i]; > + double b = q[i + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +void > +f2 (void) > +{ > + for (unsigned int i = 0; i < 500; i += 6) > + for (unsigned int j = 0; j < 500; j += 4) > + { > + double a = q[j] + p[i]; > + double b = q[j + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +void > +f3 (void) > +{ > + for (unsigned int i = 2; i < 1000; i += 4) > + { > + double a = q[i - 2] + p[i - 2]; > + double b = q[i - 1] + p[i - 1]; > + q[i - 2] = a; > + q[i - 1] = b; > + } > +} > + > +void > +f4 (unsigned int n) > +{ > + for (unsigned int i = 0; i < n; i += 4) > + { > + double a = q[i] + p[i]; > + double b = q[i + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } */
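[Editor's note: a worked example, using f1 from the testcase above, of the
new POLYNOMIAL_CHREC case in split_constant_offset; the concrete trees are
assumed for illustration rather than taken from compiler output.]

    /* In f1, with double elements and i += 4, the byte offsets of the
       q[i] and q[i + 1] accesses evolve in loop 1 as:

         q[i]      ->  {0, +, 32}_1
         q[i + 1]  ->  {8, +, 32}_1

       split_constant_offset recurses on CHREC_LEFT and peels off the
       constant term:

         {8, +, 32}_1  ->  var_offset {0, +, 32}_1, const_offset 8

       so both accesses end up with the same DR_VAR_OFFSET and differ
       only in DR_CONST_OFFSET, which is what
       vect_analyze_data_ref_accesses needs in order to group them.  */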
Richard Biener <richard.guenther@gmail.com> writes: > @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a > > bool > dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, > - struct loop *loop) > + gimple *stmt, struct loop *loop) > { > > please amend the function comment with what STMT is about > (DR_STMT I suppose). I'd done that with: ------ @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a return build_fold_addr_expr (TREE_OPERAND (addr, 0)); } -/* Analyze the behavior of memory reference REF. There are two modes: [...] +/* Analyze the behavior of memory reference REF, which occurs in STMT. + There are two modes: - BB analysis. In this case we simply split the address into base, init and offset components, without reference to any containing loop. ------ but it wasn't obvious because of the elided stuff. > @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh > init = size_binop (PLUS_EXPR, init, dinit); > base_misalignment -= TREE_INT_CST_LOW (dinit); > > - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); > - init = size_binop (PLUS_EXPR, init, dinit); > - > - step = size_binop (PLUS_EXPR, > - fold_convert (ssizetype, base_iv.step), > - fold_convert (ssizetype, offset_iv.step)); > - > base = canonicalize_base_object_address (base_iv.base); > + tree offset = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, offset_iv.base), > + init); > > so you remove the split_constant_offset handling on offset_iv.base. > This may end up no longer walking and expanding def stmts of > SSA names contained therein. I suppose this is fully intended > so that re-computing the ref address using DR_BASE/DR_OFFSET > doesn't end up expanding that redundant code? Yeah. I think DR_OFFSET is only going to be used by code that wants to construct a new address, so there doesn't seem much point taking apart the offset only to regimplify it when building a new reference. > For analysis relying on this one now needs to resort to > DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar > expansions? We still apply split_constant_offset to the DR_VAR_OFFSET as well; the scev analysis applies on top of that rather than replacing it. > Just sth to watch at ... > > @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh > } > > drb->base_address = base; > - drb->offset = fold_convert (ssizetype, offset_iv.base); > - drb->init = init; > + drb->offset = offset; > drb->step = step; > + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); > > so is the full-fledged split_constant_offset (with its SSA name handling) > still needed here? Sth to eventually address with a followup. Yeah, I think we still need it. The SCEV stuff is only really useful if the starting offset has a simple evolution in a containing loop. For other cases we punt and just use the original generic expression. In particular, if we're doing SLP on a single-block function, we need everything that split_constant_offset currently does. > @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i > tree ref_op2 = NULL_TREE; > tree new_offset; > > + split_constant_offset (DR_OFFSET (dr), &off, &coff); > if (iter != 0) > { > new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); > > likewise here? Why do you think ref_at_iteration cares? Is that because > of codegen quality? I'd have done with coff == size_zero_node plus > simplifications that arise from that. Yeah, I assumed it was codegen quality. 
But maybe it was just a way to compensate for the fact that the MEM_REF is built directly (rather than via a fold) and so this was the easiest way of getting a nonzero second operand to the MEM_REF. Thanks, Richard
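[Editor's note: a hedged sketch of the ref_at_iteration point above; the
gimple shown is schematic, not from a dump.]  With the constant split
back out, the variable part can be gimplified into the address while
coff supplies a nonzero second operand for the MEM_REF:

    addr_1 = p_2 + off_3;                   /* variable part only */
    ... = MEM[(double *) addr_1 + 8B];      /* coff as second operand */

If off and coff were kept as a single generic tree, the "+ 8" would be
forced into the gimplified address computation and the MEM_REF offset
would always be 0B.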
Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-08-04 11:42:45.938134820 +0100 +++ gcc/tree-data-ref.h 2017-08-16 14:34:56.611554296 +0100 @@ -52,6 +52,27 @@ struct innermost_loop_behavior tree init; tree step; + /* The scalar evolution of OFFSET + INIT, split into non-constant and + constant terms in the same way as OFFSET and INIT. For example, + if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be: + + chrec_offset {x, +, 3}_1 + chrec_init 4 + + If OFFSET + INIT cannot be represented as a scalar evolution, + the fields are simply a copy of OFFSET and INIT. + + This split is useful when analyzing the relationship between two + data references with the same base. OFFSET and INIT are generic + expressions that can be used to build related data references, + while CHREC_OFFSET and CHREC_INIT are our best attempt at + canonicalizing the value of OFFSET + INIT. + + These fields are only initialized lazily, by a call to + compute_offset_chrecs. */ + tree chrec_offset; + tree chrec_init; + /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes from an alignment boundary of BASE_ALIGNMENT bytes. For example, if we had: @@ -187,6 +208,8 @@ #define DR_IS_CONDITIONAL_IN_STMT(DR) (D #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address #define DR_OFFSET(DR) (DR)->innermost.offset #define DR_INIT(DR) (DR)->innermost.init +#define DR_CHREC_OFFSET(DR) (DR)->innermost.chrec_offset +#define DR_CHREC_INIT(DR) (DR)->innermost.chrec_init #define DR_STEP(DR) (DR)->innermost.step #define DR_PTR_INFO(DR) (DR)->alias.ptr_info #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment @@ -466,8 +489,10 @@ dr_alignment (data_reference *dr) extern bool dr_may_alias_p (const struct data_reference *, const struct data_reference *, bool); -extern bool dr_equal_offsets_p (struct data_reference *, - struct data_reference *); +extern bool dr_chrec_offsets_equal_p (struct data_reference *, + struct data_reference *); +extern int dr_chrec_offsets_compare (struct data_reference *, + struct data_reference *); extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); extern int data_ref_compare_tree (tree, tree); Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-08-04 11:42:45.938134820 +0100 +++ gcc/tree-data-ref.c 2017-08-16 14:34:56.611554296 +0100 @@ -86,6 +86,7 @@ Software Foundation; either version 3, o #include "expr.h" #include "gimple-iterator.h" #include "tree-ssa-loop-niter.h" +#include "tree-ssa-loop-ivopts.h" #include "tree-ssa-loop.h" #include "tree-ssa.h" #include "cfgloop.h" @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v *off = ssize_int (0); STRIP_NOPS (exp); - if (tree_is_chrec (exp) + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) + { + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); + if (op0 != CHREC_LEFT (exp)) + { + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), + op0, CHREC_RIGHT (exp)); + *off = op1; + } + return; + } + + if (automatically_generated_chrec_p (exp) || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) return; @@ -924,6 +937,8 @@ dr_analyze_innermost (innermost_loop_beh drb->offset = fold_convert (ssizetype, offset_iv.base); drb->init = init; drb->step = step; + drb->chrec_offset = NULL_TREE; + drb->chrec_init = NULL_TREE; drb->base_alignment = base_alignment; drb->base_misalignment = base_misalignment & (base_alignment - 1); drb->offset_alignment = highest_pow2_factor 
(offset_iv.base);
@@ -1324,11 +1339,12 @@ runtime_alias_check_p (ddr_p ddr, struct
 operator == (const dr_with_seg_len& d1,
 const dr_with_seg_len& d2)
 {
- return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
+ return (operand_equal_p (DR_BASE_ADDRESS (d1.dr),
 DR_BASE_ADDRESS (d2.dr), 0)
- && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
- && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
- && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
+ && dr_chrec_offsets_equal_p (d1.dr, d2.dr)
+ && data_ref_compare_tree (DR_CHREC_INIT (d1.dr),
+ DR_CHREC_INIT (d2.dr)) == 0
+ && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0);
 }

 /* Comparison function for sorting objects of dr_with_seg_len_pair_t
@@ -1360,17 +1376,15 @@ comp_dr_with_seg_len_pair (const void *p
 if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
 DR_STEP (b2.dr))) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr),
- DR_OFFSET (b1.dr))) != 0)
+ if ((comp_res = dr_chrec_offsets_compare (a1.dr, b1.dr)) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr),
- DR_INIT (b1.dr))) != 0)
+ if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a1.dr),
+ DR_CHREC_INIT (b1.dr))) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr),
- DR_OFFSET (b2.dr))) != 0)
+ if ((comp_res = dr_chrec_offsets_compare (a2.dr, b2.dr)) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr),
- DR_INIT (b2.dr))) != 0)
+ if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a2.dr),
+ DR_CHREC_INIT (b2.dr))) != 0)
 return comp_res;

 return 0;
@@ -1455,10 +1469,9 @@ prune_runtime_alias_test_list (vec<dr_wi
 if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
 DR_BASE_ADDRESS (dr_a2->dr), 0)
- || !operand_equal_p (DR_OFFSET (dr_a1->dr),
- DR_OFFSET (dr_a2->dr), 0)
- || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
- || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
+ || !dr_chrec_offsets_equal_p (dr_a1->dr, dr_a2->dr)
+ || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a1->dr))
+ || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a2->dr)))
 continue;

 /* Only merge const step data references.  */
@@ -1484,11 +1497,13 @@ prune_runtime_alias_test_list (vec<dr_wi
 continue;

 /* Make sure dr_a1 starts left of dr_a2.  */
- if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
+ if (tree_int_cst_lt (DR_CHREC_INIT (dr_a2->dr),
+ DR_CHREC_INIT (dr_a1->dr)))
 std::swap (*dr_a1, *dr_a2);

 bool do_remove = false;
- wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr));
+ wide_int diff = wi::sub (DR_CHREC_INIT (dr_a2->dr),
+ DR_CHREC_INIT (dr_a1->dr));
 wide_int min_seg_len_b;
 tree new_seg_len;

@@ -1826,46 +1841,83 @@ create_runtime_alias_checks (struct loop
 }
 }

-/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical
- expressions.  */
-static bool
-dr_equal_offsets_p1 (tree offset1, tree offset2)
+/* Analyze the scalar evolution of OFFSET in the innermost parent of
+ LOOP for which it isn't invariant.  */
+
+static tree
+analyze_offset_scev (struct loop *loop, tree offset)
 {
- bool res;
+ struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset);
+ if (inv_loop != NULL)
+ {
+ if (loop_depth (inv_loop) == 0)
+ return offset;
+ loop = loop_outer (inv_loop);
+ }
+ return analyze_scalar_evolution (loop, offset);
+}

- STRIP_NOPS (offset1);
- STRIP_NOPS (offset2);

+/* Analyze the scalar evolution of DRB->offset + DRB->init in a form
+ that is valid for use in LOOP.  Store the result in DRB->chrec_offset
+ and DRB->chrec_init.
*/ - if (offset1 == offset2) - return true; +static void +compute_offset_chrecs (struct loop *loop, innermost_loop_behavior *drb) +{ + tree scev = analyze_offset_scev (loop, drb->offset); + if (TREE_CODE (scev) == POLYNOMIAL_CHREC) + { + tree init_term; + split_constant_offset (scev, &drb->chrec_offset, &init_term); + drb->chrec_init = fold_build2 (PLUS_EXPR, ssizetype, init_term, + drb->init); + } + else + { + drb->chrec_offset = drb->offset; + drb->chrec_init = drb->init; + } +} - if (TREE_CODE (offset1) != TREE_CODE (offset2) - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) - return false; +/* Analyze the scalar evolution of DR_OFFSET (DR) + DR_INIT (DR) wrt + the loop that contains DR_STMT (DR), storing the result in + DR_CHREC_OFFSET (DR) and DR_CHREC_INIT (DR). */ - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), - TREE_OPERAND (offset2, 0)); +static void +compute_offset_chrecs (data_reference *dr) +{ + return compute_offset_chrecs (loop_containing_stmt (DR_STMT (dr)), + &DR_INNERMOST (dr)); +} - if (!res || !BINARY_CLASS_P (offset1)) - return res; +/* Check if DRA and DRB have equal DR_CHREC_OFFSETs, computing them + first if necessary. */ - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), - TREE_OPERAND (offset2, 1)); +bool +dr_chrec_offsets_equal_p (struct data_reference *dra, + struct data_reference *drb) +{ + if (!DR_CHREC_OFFSET (dra)) + compute_offset_chrecs (dra); + if (!DR_CHREC_OFFSET (drb)) + compute_offset_chrecs (drb); - return res; + return eq_evolutions_p (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb)); } -/* Check if DRA and DRB have equal offsets. */ -bool -dr_equal_offsets_p (struct data_reference *dra, - struct data_reference *drb) -{ - tree offset1, offset2; +/* Compare the DR_CHREC_OFFSETs of DRA and DRB for sorting purposes, + computing them first if necessary. */ - offset1 = DR_OFFSET (dra); - offset2 = DR_OFFSET (drb); +int +dr_chrec_offsets_compare (struct data_reference *dra, + struct data_reference *drb) +{ + if (!DR_CHREC_OFFSET (dra)) + compute_offset_chrecs (dra); + if (!DR_CHREC_OFFSET (drb)) + compute_offset_chrecs (drb); - return dr_equal_offsets_p1 (offset1, offset2); + return data_ref_compare_tree (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb)); } /* Returns true if FNA == FNB. */ Index: gcc/tree-loop-distribution.c =================================================================== --- gcc/tree-loop-distribution.c 2017-08-16 08:50:32.415427038 +0100 +++ gcc/tree-loop-distribution.c 2017-08-16 14:34:56.612554255 +0100 @@ -2204,7 +2204,7 @@ compute_alias_check_pairs (struct loop * DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); + comp_res = dr_chrec_offsets_compare (dr_a, dr_b); gcc_assert (comp_res != 0); if (latch_dominated_by_data_ref (loop, dr_a)) Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-08-04 11:42:45.939105152 +0100 +++ gcc/tree-vect-data-refs.c 2017-08-16 14:34:56.613554213 +0100 @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat return; if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) + || !dr_chrec_offsets_equal_p (dra, drb) || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) return; /* Two references with distance zero have the same alignment. 
-  offset_int diff = (wi::to_offset (DR_INIT (dra))
-		     - wi::to_offset (DR_INIT (drb)));
+  offset_int diff = (wi::to_offset (DR_CHREC_INIT (dra))
+		     - wi::to_offset (DR_CHREC_INIT (drb)));
   if (diff != 0)
     {
       /* Get the wider of the two alignments.  */
@@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data
       gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
       struct data_reference *data_ref = dr;
       unsigned int count = 1;
-      tree prev_init = DR_INIT (data_ref);
+      tree prev_init = DR_CHREC_INIT (data_ref);
       gimple *prev = stmt;
       HOST_WIDE_INT diff, gaps = 0;
 
@@ -2444,9 +2444,9 @@ vect_analyze_group_access_1 (struct data
 	     data-ref (supported only for loads), we vectorize only the first
 	     stmt, and the rest get their vectorized loads from the first
 	     one.  */
-	  if (!tree_int_cst_compare (DR_INIT (data_ref),
-				     DR_INIT (STMT_VINFO_DATA_REF (
-						vinfo_for_stmt (next)))))
+	  if (!tree_int_cst_compare (DR_CHREC_INIT (data_ref),
+				     DR_CHREC_INIT (STMT_VINFO_DATA_REF
+						    (vinfo_for_stmt (next)))))
 	    {
 	      if (DR_IS_WRITE (data_ref))
 		{
@@ -2476,7 +2476,7 @@ vect_analyze_group_access_1 (struct data
 
 	  /* Check that the distance between two accesses is equal to the type
 	     size. Otherwise, we have gaps.  */
-	  diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
+	  diff = (TREE_INT_CST_LOW (DR_CHREC_INIT (data_ref))
 		  - TREE_INT_CST_LOW (prev_init)) / type_size;
 	  if (diff != 1)
 	    {
@@ -2499,7 +2499,7 @@ vect_analyze_group_access_1 (struct data
 	     gap in the access, GROUP_GAP is always 1.  */
 	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
 
-	  prev_init = DR_INIT (data_ref);
+	  prev_init = DR_CHREC_INIT (data_ref);
 	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
 	  /* Count the number of data-refs in the chain.  */
 	  count++;
@@ -2715,13 +2715,10 @@ dr_group_sort_cmp (const void *dra_, con
 	return cmp;
     }
 
-  /* And according to DR_OFFSET.  */
-  if (!dr_equal_offsets_p (dra, drb))
-    {
-      cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
-      if (cmp != 0)
-	return cmp;
-    }
+  /* And according to DR_CHREC_OFFSET.  */
+  cmp = dr_chrec_offsets_compare (dra, drb);
+  if (cmp != 0)
+    return cmp;
 
   /* Put reads before writes.  */
   if (DR_IS_READ (dra) != DR_IS_READ (drb))
@@ -2745,8 +2742,9 @@ dr_group_sort_cmp (const void *dra_, con
 	return cmp;
     }
 
-  /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
-  cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
+  /* Then sort after DR_CHREC_INIT.  In case of identical DRs sort after
+     stmt UID.  */
+  cmp = tree_int_cst_compare (DR_CHREC_INIT (dra), DR_CHREC_INIT (drb));
   if (cmp == 0)
     return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
   return cmp;
@@ -2817,7 +2815,7 @@ vect_analyze_data_ref_accesses (vec_info
 	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
 	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
 				   DR_BASE_ADDRESS (drb), 0)
-	      || !dr_equal_offsets_p (dra, drb)
+	      || !dr_chrec_offsets_equal_p (dra, drb)
 	      || !gimple_assign_single_p (DR_STMT (dra))
 	      || !gimple_assign_single_p (DR_STMT (drb)))
 	    break;
@@ -2835,7 +2833,8 @@ vect_analyze_data_ref_accesses (vec_info
 	    break;
 
 	  /* Do not place the same access in the interleaving chain twice.  */
-	  if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0)
+	  if (tree_int_cst_compare (DR_CHREC_INIT (dra),
+				    DR_CHREC_INIT (drb)) == 0)
 	    break;
 
 	  /* Check the types are compatible.
@@ -2844,9 +2843,10 @@ vect_analyze_data_ref_accesses (vec_info
 				       TREE_TYPE (DR_REF (drb))))
 	    break;
 
-	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
-	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
-	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
+	  /* Sorting has ensured that
+	     DR_CHREC_INIT (dra) <= DR_CHREC_INIT (drb).  */
+	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CHREC_INIT (dra));
+	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CHREC_INIT (drb));
 	  gcc_assert (init_a <= init_b);
 
 	  /* If init_b == init_a + the size of the type * k, we have an
@@ -2859,10 +2859,10 @@ vect_analyze_data_ref_accesses (vec_info
 	  /* If we have a store, the accesses are adjacent.  This splits
 	     groups into chunks we support (we don't support vectorization
 	     of stores with gaps).  */
+	  HOST_WIDE_INT prev_init
+	    = TREE_INT_CST_LOW (DR_CHREC_INIT (datarefs_copy[i - 1]));
 	  if (!DR_IS_READ (dra)
-	      && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW
-			     (DR_INIT (datarefs_copy[i-1]))
-		  != type_size_a))
+	      && (init_b - prev_init) != type_size_a)
 	    break;
 
 	  /* If the step (if not zero or non-constant) is greater than the
@@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer
 vect_no_alias_p (struct data_reference *a, struct data_reference *b,
 		 tree segment_length_a, tree segment_length_b)
 {
-  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
-	      && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
-  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
+  gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
+	      && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
+  if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
     return false;
 
-  tree seg_a_min = DR_INIT (a);
+  tree seg_a_min = DR_CHREC_INIT (a);
   tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
 				seg_a_min, segment_length_a);
   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
@@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference *
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
       seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
 			       seg_a_max, unit_size);
-      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
-			       DR_INIT (a), unit_size);
+      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
+			       DR_CHREC_INIT (a), unit_size);
     }
-  tree seg_b_min = DR_INIT (b);
+  tree seg_b_min = DR_CHREC_INIT (b);
   tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
 				seg_b_min, segment_length_b);
   if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
@@ -3001,8 +3001,8 @@ vect_no_alias_p (struct data_reference *
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
       seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
 			       seg_b_max, unit_size);
-      seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
-			       DR_INIT (b), unit_size);
+      seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (b)),
+			       DR_CHREC_INIT (b), unit_size);
     }
 
   if (tree_int_cst_le (seg_a_max, seg_b_min)
@@ -3148,8 +3148,7 @@ vect_prune_runtime_alias_test_list (loop
       comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
 					DR_BASE_ADDRESS (dr_b));
       if (comp_res == 0)
-	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
-					  DR_OFFSET (dr_b));
+	comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
 
       /* Alias is known at compilation time.  */
       if (comp_res == 0
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-08-03 10:40:55.650125801 +0100
+++ gcc/tree-vect-stmts.c	2017-08-16 14:34:56.614554172 +0100
@@ -7423,8 +7423,8 @@ vectorizable_load (gimple *stmt, gimple_
 	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
 	  tree diff = fold_convert (sizetype,
 				    size_binop (MINUS_EXPR,
-						DR_INIT (first_dr),
-						DR_INIT (ptrdr)));
+						DR_CHREC_INIT (first_dr),
+						DR_CHREC_INIT (ptrdr)));
 	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
 					 stmt, diff);
 	}
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
--- /dev/null	2017-08-15 18:19:06.926776201 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-16 14:34:56.610554337 +0100
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_unpack } */
+
+double p[1000];
+double q[1000];
+
+void
+f1 (void)
+{
+  for (unsigned int i = 0; i < 1000; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+void
+f2 (void)
+{
+  for (unsigned int i = 0; i < 500; i += 6)
+    for (unsigned int j = 0; j < 500; j += 4)
+      {
+	double a = q[j] + p[i];
+	double b = q[j + 1] + p[i + 1];
+	q[i] = a;
+	q[i + 1] = b;
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */
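
As a footnote for readers who want to see the wrapping hazard in
isolation: for a 32-bit unsigned "i", converting to intptr_t and adding 1
do not commute once "i + 1" wraps.  The following standalone program
(purely illustrative, not part of the patch; it assumes an LP64 target,
where intptr_t is wider than unsigned int) shows the two values diverging
at the wrap point:

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      unsigned int i = UINT_MAX;
      /* The addition is done in unsigned int, wraps to 0, and only
         then gets widened.  */
      intptr_t inner = (intptr_t) (i + 1);
      /* Peeling the "+ 1" outside the conversion widens first, so
         nothing wraps.  */
      intptr_t outer = (intptr_t) i + 1;
      printf ("%lld %lld\n", (long long) inner, (long long) outer);
      return 0;
    }

On such a target this prints "0 4294967296": the two forms agree only
when the unsigned addition is known not to wrap, which is the condition
split_constant_offset has to prove before peeling the constant.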
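And a sketch of what the new fields are intended to contain for the q[]
accesses in f1 above, written in the {base, +, step}_loop notation that
the scalar evolution dumps use (hand-derived from the patch, not
verified dump output).  With "i" advancing by 4 doubles per iteration,
the byte offset of q[i] grows by 32 each time round the loop:

    access      DR_CHREC_OFFSET   DR_CHREC_INIT
    q[i]        {0, +, 32}_1      0
    q[i + 1]    {0, +, 32}_1      8

Both accesses share one canonical offset and differ only in their
constant inits, which is what lets the group analysis recognise them as
neighbours again.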