Message ID | 87pobvr5t8.fsf@linaro.org |
---|---|
State | New |
Richard Sandiford <richard.sandiford@linaro.org> writes:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>     DR_BASE_ADDRESS: p or q
>     DR_OFFSET:       0
>     DR_INIT:         0 or 4
>     DR_STEP:         16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>     DR_BASE_ADDRESS: p or q
>     DR_OFFSET:       (intptr_t) i
>     DR_INIT:         0 or 4
>     DR_STEP:         0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that split_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair uses
> chrecs rather than generic exprs for the offset part, with the chrec
> being analysed in the innermost loop for which the offset isn't
> invariant.  This allows us to vectorise the second function in the
> testcase as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> 	PR tree-optimization/81635
> 	* tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
> 	chrec_init.
> 	(DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
> 	(dr_equal_offsets_p): Delete.
> 	(dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
> 	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> 	(split_constant_offset): Handle POLYNOMIAL_CHREC.
> 	(dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
> 	(operator ==): Use dr_chrec_offsets_equal_p and compare the
> 	DR_CHREC_INITs.
> 	(prune_runtime_alias_test_list): Likewise.
> 	(comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
> 	the DR_CHREC_INITs.
> 	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> 	(analyze_offset_scev): New function.
> 	(compute_offset_chrecs): Likewise.
> 	(dr_chrec_offsets_equal_p): Likewise.
> 	(dr_chrec_offsets_compare): Likewise.
> 	* tree-loop-distribution.c (compute_alias_check_pairs): Use
> 	dr_chrec_offsets_compare.
> 	* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
> 	dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
> 	(dr_group_sort_cmp): Likewise.
> 	(vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
> 	(vect_no_alias_p): Likewise.
> 	(vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
> 	compare the DR_CHREC_INITs.
> 	(vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
> 	* tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
> 	of DR_INIT.
>
> gcc/testsuite/
> 	PR tree-optimization/81635
> 	* gcc.dg/vect/bb-slp-pr81635.c: New test.

Sorry, forgot I was going to convert this to a context diff...
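[For reference, the first loop under discussion comes from the new
testcase at the end of the patch (gcc.dg/vect/bb-slp-pr81635.c):]

double p[1000];
double q[1000];

void
f1 (void)
{
  for (unsigned int i = 0; i < 1000; i += 4)
    {
      double a = q[i] + p[i];          /* the [i] accesses */
      double b = q[i + 1] + p[i + 1];  /* the [i + 1] accesses whose
					  "+ 1" fails to split off */
      q[i] = a;
      q[i + 1] = b;
    }
}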
Index: gcc/tree-data-ref.h
===================================================================
*** gcc/tree-data-ref.h	2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.h	2017-08-16 14:34:56.611554296 +0100
*************** struct innermost_loop_behavior
*** 52,57 ****
--- 52,78 ----
    tree init;
    tree step;
  
+   /* The scalar evolution of OFFSET + INIT, split into non-constant and
+      constant terms in the same way as OFFSET and INIT.  For example,
+      if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be:
+ 
+ 	chrec_offset	{x, +, 3}_1
+ 	chrec_init	4
+ 
+      If OFFSET + INIT cannot be represented as a scalar evolution,
+      the fields are simply a copy of OFFSET and INIT.
+ 
+      This split is useful when analyzing the relationship between two
+      data references with the same base.  OFFSET and INIT are generic
+      expressions that can be used to build related data references,
+      while CHREC_OFFSET and CHREC_INIT are our best attempt at
+      canonicalizing the value of OFFSET + INIT.
+ 
+      These fields are only initialized lazily, by a call to
+      compute_offset_chrecs.  */
+   tree chrec_offset;
+   tree chrec_init;
+ 
    /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
       from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
       if we had:
*************** #define DR_IS_CONDITIONAL_IN_STMT(DR) (D
*** 187,192 ****
--- 208,215 ----
  #define DR_BASE_ADDRESS(DR)        (DR)->innermost.base_address
  #define DR_OFFSET(DR)              (DR)->innermost.offset
  #define DR_INIT(DR)                (DR)->innermost.init
+ #define DR_CHREC_OFFSET(DR)        (DR)->innermost.chrec_offset
+ #define DR_CHREC_INIT(DR)          (DR)->innermost.chrec_init
  #define DR_STEP(DR)                (DR)->innermost.step
  #define DR_PTR_INFO(DR)            (DR)->alias.ptr_info
  #define DR_BASE_ALIGNMENT(DR)      (DR)->innermost.base_alignment
*************** dr_alignment (data_reference *dr)
*** 466,473 ****
  extern bool dr_may_alias_p (const struct data_reference *,
  			    const struct data_reference *, bool);
! extern bool dr_equal_offsets_p (struct data_reference *,
! 				struct data_reference *);
  
  extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
  extern int data_ref_compare_tree (tree, tree);
--- 489,498 ----
  extern bool dr_may_alias_p (const struct data_reference *,
  			    const struct data_reference *, bool);
! extern bool dr_chrec_offsets_equal_p (struct data_reference *,
! 				      struct data_reference *);
! extern int dr_chrec_offsets_compare (struct data_reference *,
! 				     struct data_reference *);
  
  extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
  extern int data_ref_compare_tree (tree, tree);
Index: gcc/tree-data-ref.c
===================================================================
*** gcc/tree-data-ref.c	2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.c	2017-08-16 14:34:56.611554296 +0100
*************** Software Foundation; either version 3, o
*** 86,91 ****
--- 86,92 ----
  #include "expr.h"
  #include "gimple-iterator.h"
  #include "tree-ssa-loop-niter.h"
+ #include "tree-ssa-loop-ivopts.h"
  #include "tree-ssa-loop.h"
  #include "tree-ssa.h"
  #include "cfgloop.h"
*************** split_constant_offset (tree exp, tree *v
*** 730,736 ****
    *off = ssize_int (0);
    STRIP_NOPS (exp);
  
!   if (tree_is_chrec (exp)
        || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS)
      return;
  
--- 731,749 ----
    *off = ssize_int (0);
    STRIP_NOPS (exp);
  
!   if (TREE_CODE (exp) == POLYNOMIAL_CHREC)
!     {
!       split_constant_offset (CHREC_LEFT (exp), &op0, &op1);
!       if (op0 != CHREC_LEFT (exp))
! 	{
! 	  *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp),
! 			 op0, CHREC_RIGHT (exp));
! 	  *off = op1;
! 	}
!       return;
!     }
! 
!   if (automatically_generated_chrec_p (exp)
        || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS)
      return;
  
*************** dr_analyze_innermost (innermost_loop_beh
*** 924,929 ****
--- 937,944 ----
    drb->offset = fold_convert (ssizetype, offset_iv.base);
    drb->init = init;
    drb->step = step;
+   drb->chrec_offset = NULL_TREE;
+   drb->chrec_init = NULL_TREE;
    drb->base_alignment = base_alignment;
    drb->base_misalignment = base_misalignment & (base_alignment - 1);
    drb->offset_alignment = highest_pow2_factor (offset_iv.base);
*************** runtime_alias_check_p (ddr_p ddr, struct
*** 1324,1334 ****
  operator == (const dr_with_seg_len& d1,
  	     const dr_with_seg_len& d2)
  {
!   return operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0)
! 	 && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
! 	 && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
! 	 && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
  }
  
  /* Comparison function for sorting objects of dr_with_seg_len_pair_t
--- 1339,1350 ----
  operator == (const dr_with_seg_len& d1,
  	     const dr_with_seg_len& d2)
  {
!   return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0)
! 	  && dr_chrec_offsets_equal_p (d1.dr, d2.dr)
! 	  && data_ref_compare_tree (DR_CHREC_INIT (d1.dr),
! 				    DR_CHREC_INIT (d2.dr)) == 0
! 	  && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0);
  }
  
  /* Comparison function for sorting objects of dr_with_seg_len_pair_t
*************** comp_dr_with_seg_len_pair (const void *p
*** 1360,1376 ****
    if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
  					 DR_STEP (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr),
! 					 DR_OFFSET (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr),
! 					 DR_INIT (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr),
! 					 DR_OFFSET (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr),
! 					 DR_INIT (b2.dr))) != 0)
      return comp_res;
  
    return 0;
--- 1376,1390 ----
    if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
  					 DR_STEP (b2.dr))) != 0)
      return comp_res;
!   if ((comp_res = dr_chrec_offsets_compare (a1.dr, b1.dr)) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a1.dr),
! 					 DR_CHREC_INIT (b1.dr))) != 0)
      return comp_res;
!   if ((comp_res = dr_chrec_offsets_compare (a2.dr, b2.dr)) != 0)
      return comp_res;
!   if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a2.dr),
! 					 DR_CHREC_INIT (b2.dr))) != 0)
      return comp_res;
  
    return 0;
*************** prune_runtime_alias_test_list (vec<dr_wi
*** 1455,1464 ****
        if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
  			    DR_BASE_ADDRESS (dr_a2->dr), 0)
! 	  || !operand_equal_p (DR_OFFSET (dr_a1->dr),
! 			       DR_OFFSET (dr_a2->dr), 0)
! 	  || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
! 	  || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
  	continue;
  
        /* Only merge const step data references.  */
--- 1469,1477 ----
        if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
  			    DR_BASE_ADDRESS (dr_a2->dr), 0)
! 	  || !dr_chrec_offsets_equal_p (dr_a1->dr, dr_a2->dr)
! 	  || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a1->dr))
! 	  || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a2->dr)))
  	continue;
  
        /* Only merge const step data references.  */
*************** prune_runtime_alias_test_list (vec<dr_wi
*** 1484,1494 ****
  	continue;
  
        /* Make sure dr_a1 starts left of dr_a2.  */
!       if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
  	std::swap (*dr_a1, *dr_a2);
  
        bool do_remove = false;
!       wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr));
        wide_int min_seg_len_b;
        tree new_seg_len;
--- 1497,1509 ----
  	continue;
  
        /* Make sure dr_a1 starts left of dr_a2.  */
!       if (tree_int_cst_lt (DR_CHREC_INIT (dr_a2->dr),
! 			   DR_CHREC_INIT (dr_a1->dr)))
  	std::swap (*dr_a1, *dr_a2);
  
        bool do_remove = false;
!       wide_int diff = wi::sub (DR_CHREC_INIT (dr_a2->dr),
! 			       DR_CHREC_INIT (dr_a1->dr));
        wide_int min_seg_len_b;
        tree new_seg_len;
*************** create_runtime_alias_checks (struct loop
*** 1826,1871 ****
      }
  }
  
! /* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical
!    expressions.  */
! static bool
! dr_equal_offsets_p1 (tree offset1, tree offset2)
  {
!   bool res;
! 
!   STRIP_NOPS (offset1);
!   STRIP_NOPS (offset2);
! 
!   if (offset1 == offset2)
!     return true;
! 
!   if (TREE_CODE (offset1) != TREE_CODE (offset2)
!       || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1)))
!     return false;
! 
!   res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0),
! 			     TREE_OPERAND (offset2, 0));
! 
!   if (!res || !BINARY_CLASS_P (offset1))
!     return res;
! 
!   res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1),
! 			     TREE_OPERAND (offset2, 1));
! 
!   return res;
  }
  
! /* Check if DRA and DRB have equal offsets.  */
! bool
! dr_equal_offsets_p (struct data_reference *dra,
! 		    struct data_reference *drb)
! {
!   tree offset1, offset2;
! 
!   offset1 = DR_OFFSET (dra);
!   offset2 = DR_OFFSET (drb);
! 
!   return dr_equal_offsets_p1 (offset1, offset2);
  }
  
  /* Returns true if FNA == FNB.  */
--- 1841,1923 ----
      }
  }
  
! /* Analyze the scalar evolution of OFFSET in the innermost parent of
!    LOOP for which it isn't invariant.  */
! 
! static tree
! analyze_offset_scev (struct loop *loop, tree offset)
  {
!   struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset);
!   if (inv_loop != NULL)
!     {
!       if (loop_depth (inv_loop) == 0)
! 	return offset;
!       loop = loop_outer (inv_loop);
!     }
!   return analyze_scalar_evolution (loop, offset);
! }
! 
! /* Analyze the scalar evolution of DRB->offset + DRB->init in a form
!    that is valid for use in LOOP.  Store the result DRB->chrec_offset
!    and DRB->chrec_init.  */
! 
! static void
! compute_offset_chrecs (struct loop *loop, innermost_loop_behavior *drb)
! {
!   tree scev = analyze_offset_scev (loop, drb->offset);
!   if (TREE_CODE (scev) == POLYNOMIAL_CHREC)
!     {
!       tree init_term;
!       split_constant_offset (scev, &drb->chrec_offset, &init_term);
!       drb->chrec_init = fold_build2 (PLUS_EXPR, ssizetype, init_term,
! 				     drb->init);
!     }
!   else
!     {
!       drb->chrec_offset = drb->offset;
!       drb->chrec_init = drb->init;
!     }
! }
! 
! /* Analyze the scalar evolution of DR_OFFSET (DR) + DR_INIT (DR) wrt
!    the loop that contains DR_STMT (DR), storing the result in
!    DR_CHREC_OFFSET (DR) and DR_CHREC_INIT (DR).  */
! 
! static void
! compute_offset_chrecs (data_reference *dr)
! {
!   return compute_offset_chrecs (loop_containing_stmt (DR_STMT (dr)),
! 				&DR_INNERMOST (dr));
! }
! 
! /* Check if DRA and DRB have equal DR_CHREC_OFFSETs, computing them
!    first if necessary.  */
! 
! bool
! dr_chrec_offsets_equal_p (struct data_reference *dra,
! 			  struct data_reference *drb)
! {
!   if (!DR_CHREC_OFFSET (dra))
!     compute_offset_chrecs (dra);
!   if (!DR_CHREC_OFFSET (drb))
!     compute_offset_chrecs (drb);
!   return eq_evolutions_p (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb));
  }
  
! /* Compare the DR_CHREC_OFFSETs of DRA and DRB for sorting purposes,
!    computing them first if necessary.  */
! 
! int
! dr_chrec_offsets_compare (struct data_reference *dra,
! 			  struct data_reference *drb)
! {
!   if (!DR_CHREC_OFFSET (dra))
!     compute_offset_chrecs (dra);
!   if (!DR_CHREC_OFFSET (drb))
!     compute_offset_chrecs (drb);
!   return data_ref_compare_tree (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb));
  }
  
  /* Returns true if FNA == FNB.  */
Index: gcc/tree-loop-distribution.c
===================================================================
*** gcc/tree-loop-distribution.c	2017-08-16 08:50:32.415427038 +0100
--- gcc/tree-loop-distribution.c	2017-08-16 14:34:56.612554255 +0100
*************** compute_alias_check_pairs (struct loop *
*** 2204,2210 ****
  					  DR_BASE_ADDRESS (dr_b));
        if (comp_res == 0)
! 	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
        gcc_assert (comp_res != 0);
  
        if (latch_dominated_by_data_ref (loop, dr_a))
--- 2204,2210 ----
  					  DR_BASE_ADDRESS (dr_b));
        if (comp_res == 0)
! 	comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
        gcc_assert (comp_res != 0);
  
        if (latch_dominated_by_data_ref (loop, dr_a))
Index: gcc/tree-vect-data-refs.c
===================================================================
*** gcc/tree-vect-data-refs.c	2017-08-04 11:42:45.939105152 +0100
--- gcc/tree-vect-data-refs.c	2017-08-16 14:34:56.613554213 +0100
*************** vect_find_same_alignment_drs (struct dat
*** 2187,2199 ****
      return;
  
    if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
!       || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0)
        || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
      return;
  
    /* Two references with distance zero have the same alignment.  */
!   offset_int diff = (wi::to_offset (DR_INIT (dra))
! 		     - wi::to_offset (DR_INIT (drb)));
    if (diff != 0)
      {
        /* Get the wider of the two alignments.  */
--- 2187,2199 ----
      return;
  
    if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
!       || !dr_chrec_offsets_equal_p (dra, drb)
        || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
      return;
  
    /* Two references with distance zero have the same alignment.  */
!   offset_int diff = (wi::to_offset (DR_CHREC_INIT (dra))
! 		     - wi::to_offset (DR_CHREC_INIT (drb)));
    if (diff != 0)
      {
        /* Get the wider of the two alignments.  */
*************** vect_analyze_group_access_1 (struct data
*** 2434,2440 ****
    gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
    struct data_reference *data_ref = dr;
    unsigned int count = 1;
!   tree prev_init = DR_INIT (data_ref);
    gimple *prev = stmt;
    HOST_WIDE_INT diff, gaps = 0;
--- 2434,2440 ----
    gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
    struct data_reference *data_ref = dr;
    unsigned int count = 1;
!   tree prev_init = DR_CHREC_INIT (data_ref);
    gimple *prev = stmt;
    HOST_WIDE_INT diff, gaps = 0;
*************** vect_analyze_group_access_1 (struct data
*** 2444,2452 ****
  	 data-ref (supported only for loads), we vectorize only the first
  	 stmt, and the rest get their vectorized loads from the first
  	 one.  */
!       if (!tree_int_cst_compare (DR_INIT (data_ref),
! 				 DR_INIT (STMT_VINFO_DATA_REF (
! 						   vinfo_for_stmt (next)))))
  	{
  	  if (DR_IS_WRITE (data_ref))
  	    {
--- 2444,2452 ----
  	 data-ref (supported only for loads), we vectorize only the first
  	 stmt, and the rest get their vectorized loads from the first
  	 one.  */
!       if (!tree_int_cst_compare (DR_CHREC_INIT (data_ref),
! 				 DR_CHREC_INIT (STMT_VINFO_DATA_REF
! 						(vinfo_for_stmt (next)))))
  	{
  	  if (DR_IS_WRITE (data_ref))
  	    {
*************** vect_analyze_group_access_1 (struct data
*** 2476,2482 ****
  	  /* Check that the distance between two accesses is equal to the type
  	     size.  Otherwise, we have gaps.  */
! 	  diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
  		  - TREE_INT_CST_LOW (prev_init)) / type_size;
  	  if (diff != 1)
  	    {
--- 2476,2482 ----
  	  /* Check that the distance between two accesses is equal to the type
  	     size.  Otherwise, we have gaps.  */
! 	  diff = (TREE_INT_CST_LOW (DR_CHREC_INIT (data_ref))
  		  - TREE_INT_CST_LOW (prev_init)) / type_size;
  	  if (diff != 1)
  	    {
*************** vect_analyze_group_access_1 (struct data
*** 2499,2505 ****
  	     gap in the access, GROUP_GAP is always 1.  */
  	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
  
! 	  prev_init = DR_INIT (data_ref);
  	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
  	  /* Count the number of data-refs in the chain.  */
  	  count++;
--- 2499,2505 ----
  	     gap in the access, GROUP_GAP is always 1.  */
  	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
  
! 	  prev_init = DR_CHREC_INIT (data_ref);
  	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
  	  /* Count the number of data-refs in the chain.  */
  	  count++;
*************** dr_group_sort_cmp (const void *dra_, con
*** 2715,2727 ****
        return cmp;
      }
  
!   /* And according to DR_OFFSET.  */
!   if (!dr_equal_offsets_p (dra, drb))
!     {
!       cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
!       if (cmp != 0)
! 	return cmp;
!     }
  
    /* Put reads before writes.  */
    if (DR_IS_READ (dra) != DR_IS_READ (drb))
--- 2715,2724 ----
        return cmp;
      }
  
!   /* And according to DR_CHREC_OFFSET.  */
!   cmp = dr_chrec_offsets_compare (dra, drb);
!   if (cmp != 0)
!     return cmp;
  
    /* Put reads before writes.  */
    if (DR_IS_READ (dra) != DR_IS_READ (drb))
*************** dr_group_sort_cmp (const void *dra_, con
*** 2745,2752 ****
        return cmp;
      }
  
!   /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
!   cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
    if (cmp == 0)
      return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
    return cmp;
--- 2742,2750 ----
        return cmp;
      }
  
!   /* Then sort after DR_CHREC_INIT.  In case of identical DRs sort after
!      stmt UID.  */
!   cmp = tree_int_cst_compare (DR_CHREC_INIT (dra), DR_CHREC_INIT (drb));
    if (cmp == 0)
      return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
    return cmp;
*************** vect_analyze_data_ref_accesses (vec_info
*** 2817,2823 ****
  	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
  	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
  				   DR_BASE_ADDRESS (drb), 0)
! 	      || !dr_equal_offsets_p (dra, drb)
  	      || !gimple_assign_single_p (DR_STMT (dra))
  	      || !gimple_assign_single_p (DR_STMT (drb)))
  	    break;
--- 2815,2821 ----
  	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
  	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
  				   DR_BASE_ADDRESS (drb), 0)
! 	      || !dr_chrec_offsets_equal_p (dra, drb)
  	      || !gimple_assign_single_p (DR_STMT (dra))
  	      || !gimple_assign_single_p (DR_STMT (drb)))
  	    break;
*************** vect_analyze_data_ref_accesses (vec_info
*** 2835,2841 ****
  	    break;
  
  	  /* Do not place the same access in the interleaving chain twice.  */
! 	  if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0)
  	    break;
  
  	  /* Check the types are compatible.
--- 2833,2840 ----
  	    break;
  
  	  /* Do not place the same access in the interleaving chain twice.  */
! 	  if (tree_int_cst_compare (DR_CHREC_INIT (dra),
! 				    DR_CHREC_INIT (drb)) == 0)
  	    break;
  
  	  /* Check the types are compatible.
*************** vect_analyze_data_ref_accesses (vec_info
*** 2844,2852 ****
  				 TREE_TYPE (DR_REF (drb))))
  	    break;
  
! 	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
! 	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
! 	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
  	  gcc_assert (init_a <= init_b);
  
  	  /* If init_b == init_a + the size of the type * k, we have an
--- 2843,2852 ----
  				 TREE_TYPE (DR_REF (drb))))
  	    break;
  
! 	  /* Sorting has ensured that
! 	     DR_CHREC_INIT (dra) <= DR_CHREC_INIT (drb).  */
! 	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CHREC_INIT (dra));
! 	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CHREC_INIT (drb));
  	  gcc_assert (init_a <= init_b);
  
  	  /* If init_b == init_a + the size of the type * k, we have an
*************** vect_analyze_data_ref_accesses (vec_info
*** 2859,2868 ****
  	  /* If we have a store, the accesses are adjacent.  This splits
  	     groups into chunks we support (we don't support vectorization
  	     of stores with gaps).  */
  	  if (!DR_IS_READ (dra)
! 	      && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW
  			     (DR_INIT (datarefs_copy[i-1]))
  		  != type_size_a))
  	    break;
  
  	  /* If the step (if not zero or non-constant) is greater than the
--- 2859,2868 ----
  	  /* If we have a store, the accesses are adjacent.  This splits
  	     groups into chunks we support (we don't support vectorization
  	     of stores with gaps).  */
+ 	  HOST_WIDE_INT prev_init
+ 	    = TREE_INT_CST_LOW (DR_CHREC_INIT (datarefs_copy[i - 1]));
  	  if (!DR_IS_READ (dra)
! 	      && (init_b - prev_init) != type_size_a)
  	    break;
  
  	  /* If the step (if not zero or non-constant) is greater than the
*************** vect_vfa_segment_size (struct data_refer
*** 2974,2985 ****
  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
  		 tree segment_length_a, tree segment_length_b)
  {
!   gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
! 	      && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
!   if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
      return false;
  
!   tree seg_a_min = DR_INIT (a);
    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
  				seg_a_min, segment_length_a);
    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
--- 2974,2985 ----
  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
  		 tree segment_length_a, tree segment_length_b)
  {
!   gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
! 	      && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
!   if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
      return false;
  
!   tree seg_a_min = DR_CHREC_INIT (a);
    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
  				seg_a_min, segment_length_a);
    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
*************** vect_no_alias_p (struct data_reference *
*** 2990,2999 ****
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
  			       seg_a_max, unit_size);
!       seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
! 			       DR_INIT (a), unit_size);
      }
!   tree seg_b_min = DR_INIT (b);
    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
  				seg_b_min, segment_length_b);
    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
--- 2990,2999 ----
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
  			       seg_a_max, unit_size);
!       seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
! 			       DR_CHREC_INIT (a), unit_size);
      }
!   tree seg_b_min = DR_CHREC_INIT (b);
    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
  				seg_b_min, segment_length_b);
    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
*************** vect_no_alias_p (struct data_reference *
*** 3001,3008 ****
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
        seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
  			       seg_b_max, unit_size);
!       seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
! 			       DR_INIT (b), unit_size);
      }
  
    if (tree_int_cst_le (seg_a_max, seg_b_min)
--- 3001,3008 ----
        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
        seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
  			       seg_b_max, unit_size);
!       seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (b)),
! 			       DR_CHREC_INIT (b), unit_size);
      }
  
    if (tree_int_cst_le (seg_a_max, seg_b_min)
*************** vect_prune_runtime_alias_test_list (loop
*** 3148,3155 ****
  	  comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
  					    DR_BASE_ADDRESS (dr_b));
  	  if (comp_res == 0)
! 	    comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
! 					      DR_OFFSET (dr_b));
  
  	  /* Alias is known at compilation time.  */
  	  if (comp_res == 0
--- 3148,3154 ----
  	  comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
  					    DR_BASE_ADDRESS (dr_b));
  	  if (comp_res == 0)
! 	    comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
  
  	  /* Alias is known at compilation time.  */
  	  if (comp_res == 0
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c	2017-08-03 10:40:55.650125801 +0100
--- gcc/tree-vect-stmts.c	2017-08-16 14:34:56.614554172 +0100
*************** vectorizable_load (gimple *stmt, gimple_
*** 7423,7430 ****
  	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
  	  tree diff = fold_convert (sizetype,
  				    size_binop (MINUS_EXPR,
! 						DR_INIT (first_dr),
! 						DR_INIT (ptrdr)));
  	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
  					 stmt, diff);
  	}
--- 7423,7430 ----
  	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
  	  tree diff = fold_convert (sizetype,
  				    size_binop (MINUS_EXPR,
! 						DR_CHREC_INIT (first_dr),
! 						DR_CHREC_INIT (ptrdr)));
  	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
  					 stmt, diff);
  	}
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
*** /dev/null	2017-08-15 18:19:06.926776201 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-16 14:34:56.610554337 +0100
***************
*** 0 ****
--- 1,32 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ double p[1000];
+ double q[1000];
+ 
+ void
+ f1 (void)
+ {
+   for (unsigned int i = 0; i < 1000; i += 4)
+     {
+       double a = q[i] + p[i];
+       double b = q[i + 1] + p[i + 1];
+       q[i] = a;
+       q[i + 1] = b;
+     }
+ }
+ 
+ void
+ f2 (void)
+ {
+   for (unsigned int i = 0; i < 500; i += 6)
+     for (unsigned int j = 0; j < 500; j += 4)
+       {
+ 	double a = q[j] + p[i];
+ 	double b = q[j + 1] + p[i + 1];
+ 	q[i] = a;
+ 	q[i + 1] = b;
+       }
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */
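[A worked illustration (ours, not part of the patch) of the new
POLYNOMIAL_CHREC case in split_constant_offset, using the values from
the innermost_loop_behavior comment above:]

/* Given a chrec whose left operand carries a constant term:

     exp = {x + 4, +, 3}_1

   the new case recurses on CHREC_LEFT (exp) = x + 4, obtaining
   op0 = x and op1 = 4, and rebuilds the chrec around the variable
   part only:

     *var = {x, +, 3}_1
     *off = 4

   The constant 4 then ends up in DR_CHREC_INIT via
   compute_offset_chrecs, while DR_CHREC_OFFSET keeps the
   canonicalized variable part.  */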
On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This is also what we'd like to have for the unsigned "i", but the
> problem is that split_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

Didn't read the change in detail, so sorry if I misunderstood the issue.
I made changes in scev to better fold type conversions using various
sources of information, for example vrp, niters, undefined overflow
behavior etc.  In theory this information should be available to other
optimizers without querying scev.  For the mentioned test, vrp should
compute an accurate range for "i", so that we can fold
(intptr_t) (i + 1) without worrying about overflow.  Note we don't do
this in generic folding because (intptr_t) (i) + 1 could be more
expensive (especially in the case of (T)(i + j)), or because the CST
part is in bigger precision after conversion.
But such folding is wanted in several places, e.g. IVOPTs.  To provide
such an interface, we changed tree-affine and made it fold aggressively.
I am curious whether it's possible to use aff_tree to implement
split_constant_offset here, since aggressive folding is wanted.  After
all, using an additional chrec looks a little heavyweight for the
simple test.

Thanks,
bin
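[To make the wrapping concern concrete, here is a minimal standalone
example of ours, assuming a 32-bit unsigned int and a 64-bit intptr_t,
of why the conversion cannot blindly be distributed over the addition:]

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  unsigned int i = 0xffffffffu;

  /* The 32-bit addition wraps to 0 before the widening conversion...  */
  intptr_t a = (intptr_t) (i + 1);

  /* ...whereas converting first and then adding gives 0x100000000.  */
  intptr_t b = (intptr_t) i + 1;

  /* Prints "a = 0, b = 0x100000000" on an LP64 target: the two forms
     only agree when i + 1 is known not to wrap, e.g. from loop bounds
     or array sizes.  */
  printf ("a = %#llx, b = %#llx\n", (long long) a, (long long) b);
  return 0;
}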
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> The first loop in the testcase regressed after my recent changes to >> dr_analyze_innermost. Previously we would treat "i" as an iv even >> for bb analysis and end up with: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: 0 >> DR_INIT: 0 or 4 >> DR_STEP: 16 >> >> We now always keep the step as 0 instead, so for an int "i" we'd have: >> >> DR_BASE_ADDRESS: p or q >> DR_OFFSET: (intptr_t) i >> DR_INIT: 0 or 4 >> DR_STEP: 0 >> >> This is also what we'd like to have for the unsigned "i", but the >> problem is that strip_constant_offset thinks that the "i + 1" in >> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >> longer seem to be related to the [i] ones. > > Didn't read the change in detail, so sorry if I mis-understood the issue. > I made changes in scev to better fold type conversion by various sources > of information, for example, vrp, niters, undefined overflow behavior etc. > In theory these information should be available for other optimizers without > querying scev. For the mentioned test, vrp should compute accurate range > information for "i" so that we can fold (intptr_t) (i + 1) it without worrying > overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 > could be more expensive (especially in case of (T)(i + j)), or because the > CST part is in bigger precision after conversion. > But such folding is wanted in several places, e.g, IVOPTs. To provide such > an interface, we changed tree-affine and made it do aggressive fold. I am > curious if it's possible to use aff_tree to implement strip_constant_offset > here since aggressive folding is wanted. After all, using additional chrec > looks like a little heavy wrto the simple test. Yeah, using aff_tree does work here when the bounds are constant. It doesn't look like it works for things like: double p[1000]; double q[1000]; void f4 (unsigned int n) { for (unsigned int i = 0; i < n; i += 4) { double a = q[i] + p[i]; double b = q[i + 1] + p[i + 1]; q[i] = a; q[i + 1] = b; } } though, where the bounds on the global arrays guarantee that [i + 1] can't overflow, even though "n" is unconstrained. The patch as posted handles this case too. Thanks, Richard > > Thanks, > bin >> >> There seem to be two main uses of DR_OFFSET and DR_INIT. One type >> expects DR_OFFSET and DR_INIT to be generic expressions whose sum >> gives the initial offset from DR_BASE_ADDRESS. The other type uses >> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data >> references, with the distance between their start addresses being >> the difference between their DR_INITs. For this second type, the >> exact form of DR_OFFSET doesn't really matter, and the more it is >> able to canonicalise the non-constant offset, the better. >> >> The patch fixes the PR by adding another offset/init pair to the >> innermost loop behaviour for this second group. The new pair use chrecs >> rather than generic exprs for the offset part, with the chrec being >> analysed in the innermost loop for which the offset isn't invariant. >> This allows us to vectorise the second function in the testcase >> as well, which wasn't possible before the earlier patch. >> >> Tested on x86_64-linux-gnu and aarch64-linux-gnu. OK to install? 
>> >> Richard >> >> >> gcc/ >> PR tree-optimization/81635 >> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and >> chrec_init. >> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros. >> (dr_equal_offsets_p): Delete. >> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare. >> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h. >> (split_constant_offset): Handle POLYNOMIAL_CHREC. >> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init. >> (operator ==): Use dr_chrec_offsets_equal_p and compare the >> DR_CHREC_INITs. >> (prune_runtime_alias_test_list): Likewise. >> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare >> the DR_CHREC_INITs. >> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete. >> (analyze_offset_scev): New function. >> (compute_offset_chrecs): Likewise. >> (dr_chrec_offsets_equal_p): Likewise. >> (dr_chrec_offsets_compare): Likewise. >> * tree-loop-distribution.c (compute_alias_check_pairs): Use >> dr_chrec_offsets_compare. >> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use >> dr_chrec_offsets_compare and compare the DR_CHREC_INITs. >> (dr_group_sort_cmp): Likewise. >> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT. >> (vect_no_alias_p): Likewise. >> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and >> compare the DR_CHREC_INITs. >> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare. >> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead >> of DR_INIT. >> >> gcc/testsuite/ >> PR tree-optimization/81635 >> * gcc.dg/vect/bb-slp-pr81635.c: New test. >>
On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like [f4 above], though,
> where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted
> handles this case too.

BTW, is this a missed optimization in value range analysis?  The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.  I think that's what niters/scev do in the analysis.

Thanks,
bin
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> The first loop in the testcase regressed after my recent changes to >>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>> for bb analysis and end up with: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: 0 >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 16 >>>> >>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: (intptr_t) i >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 0 >>>> >>>> This is also what we'd like to have for the unsigned "i", but the >>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>> longer seem to be related to the [i] ones. >>> >>> Didn't read the change in detail, so sorry if I mis-understood the issue. >>> I made changes in scev to better fold type conversion by various sources >>> of information, for example, vrp, niters, undefined overflow behavior etc. >>> In theory these information should be available for other optimizers without >>> querying scev. For the mentioned test, vrp should compute accurate range >>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>> worrying >>> overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 >>> could be more expensive (especially in case of (T)(i + j)), or because the >>> CST part is in bigger precision after conversion. >>> But such folding is wanted in several places, e.g, IVOPTs. To provide such >>> an interface, we changed tree-affine and made it do aggressive fold. I am >>> curious if it's possible to use aff_tree to implement strip_constant_offset >>> here since aggressive folding is wanted. After all, using additional chrec >>> looks like a little heavy wrto the simple test. >> >> Yeah, using aff_tree does work here when the bounds are constant. >> It doesn't look like it works for things like: >> >> double p[1000]; >> double q[1000]; >> >> void >> f4 (unsigned int n) >> { >> for (unsigned int i = 0; i < n; i += 4) >> { >> double a = q[i] + p[i]; >> double b = q[i + 1] + p[i + 1]; >> q[i] = a; >> q[i + 1] = b; >> } >> } >> >> though, where the bounds on the global arrays guarantee that [i + 1] can't >> overflow, even though "n" is unconstrained. The patch as posted handles >> this case too. > BTW is this a missed optimization in value range analysis? The range > information for i should flow in a way like: array boundary -> niters > -> scev/vrp. > I think that's what niters/scev do in analysis. Yeah, maybe :-) It looks like the problem is that when SLP runs, the previous VRP pass came before loop header copying, so the (single) header has to cope with n == 0 case. 
Thus we get: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 0] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, 0] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, 0] Visiting statement: _3 = i_15 + 1; Match-and-simplified i_15 + 1 to 1 Intersecting [1, 1] and [0, +INF] to [1, 1] Found new range for _3: [1, 1] (where _3 is the index we care about), followed by: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 4] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, n_9(D) + 4294967295] Visiting statement: _3 = i_15 + 1; Intersecting [0, +INF] and [0, +INF] to [0, +INF] Found new range for _3: [0, +INF] I guess in this case it would be better to intersect the i_15 ranges to [0, 1000] rather than [0, n_9(D) + 4294967295]. It does work if another VRP pass runs after CH. But even then, is it a good idea to rely on the range info being kept up-to-date all the way through to SLP? A lot happens inbetween. FWIW, the old simple_iv check that I removed for bb data-ref analysis relies on SCEV analysis too, so I don't think this is more expensive than what we had before. Thanks, Richard
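[For context, a rough source-level sketch of ours, not from the thread,
of what loop header copying (pass_ch) does, and why it matters for the
ranges above: it peels the first header test, so inside the resulting
do-while body the guard is known to have held.]

/* Before pass_ch: a single loop header tests "i < n" both on entry
   and on each back edge, so range info derived at the header must
   also cover the immediate-exit (n == 0) path.  */
void
before_ch (unsigned int n, double *q)
{
  for (unsigned int i = 0; i < n; i += 4)
    q[i] = 0.0;
}

/* After pass_ch: the header test is peeled in front of the loop and
   the loop becomes a do-while, so within the body "i < n" is known
   to hold, giving VRP a tighter range for i.  */
void
after_ch (unsigned int n, double *q)
{
  if (0 < n)
    {
      unsigned int i = 0;
      do
	{
	  q[i] = 0.0;
	  i += 4;
	}
      while (i < n);
    }
}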
On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with the n == 0 case.

Ah, there are several passes in between vrp and pass_ch; I am not sure
if any such pass depends on vrp intensively.  I would suggest reordering
the two passes, or a standalone VRP interface for updating the
information for a loop region after its header is copied.  This is a
non-trivial issue that needs to be fixed.  The niters analyzer relies
heavily on simplify_using_initial_conditions to get the same
information, which in my opinion should be provided by VRP.  Though
this won't be able to obsolete simplify_using_initial_conditions,
because VRP is weak in symbolic ranges...

> It does work if another VRP pass runs after CH.  But even then,
> is it a good idea to rely on the range info being kept up-to-date
> all the way through to SLP?  A lot happens in between.

To some extent, yes.  Now I understand that SCEV uses niters
information to prove no_overflow.  Niters analysis does infer better
information from array boundaries, while VRP fails to do that.  I don't
worry much about the gap between the vrp pass and slp; it's basically
the same for niters.  Both kinds of information are analyzed at one
point and meant to be used by following passes.  It's also not common
for vrp information to become invalid given we are in SSA form?

Now that the data address is not analyzed against the loop, VRP would
be the only information we could use unless scev analysis is added
back.  IMHO, the patch is doing so in another way than before, and it
requires an additional chrec data structure.  I remember the previous
patch enabled more slp vectorization; is that because of the "step"
information from scev?  In this patch, step information is simply
discarded.  I am wondering if it is possible to always analyze scev
within the innermost loop for slp while discarding the step
information.

Thanks,
bin
"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> The first loop in the testcase regressed after my recent changes to >>>>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>>>> for bb analysis and end up with: >>>>>> >>>>>> DR_BASE_ADDRESS: p or q >>>>>> DR_OFFSET: 0 >>>>>> DR_INIT: 0 or 4 >>>>>> DR_STEP: 16 >>>>>> >>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>>>> >>>>>> DR_BASE_ADDRESS: p or q >>>>>> DR_OFFSET: (intptr_t) i >>>>>> DR_INIT: 0 or 4 >>>>>> DR_STEP: 0 >>>>>> >>>>>> This is also what we'd like to have for the unsigned "i", but the >>>>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>>>> longer seem to be related to the [i] ones. >>>>> >>>>> Didn't read the change in detail, so sorry if I mis-understood the issue. >>>>> I made changes in scev to better fold type conversion by various sources >>>>> of information, for example, vrp, niters, undefined overflow behavior etc. >>>>> In theory these information should be available for other >>>>> optimizers without >>>>> querying scev. For the mentioned test, vrp should compute accurate range >>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>>>> worrying >>>>> overflow. Note we don't do it in generic folding because >>>>> (intptr_t) (i) + 1 >>>>> could be more expensive (especially in case of (T)(i + j)), or because the >>>>> CST part is in bigger precision after conversion. >>>>> But such folding is wanted in several places, e.g, IVOPTs. To provide such >>>>> an interface, we changed tree-affine and made it do aggressive fold. I am >>>>> curious if it's possible to use aff_tree to implement strip_constant_offset >>>>> here since aggressive folding is wanted. After all, using additional chrec >>>>> looks like a little heavy wrto the simple test. >>>> >>>> Yeah, using aff_tree does work here when the bounds are constant. >>>> It doesn't look like it works for things like: >>>> >>>> double p[1000]; >>>> double q[1000]; >>>> >>>> void >>>> f4 (unsigned int n) >>>> { >>>> for (unsigned int i = 0; i < n; i += 4) >>>> { >>>> double a = q[i] + p[i]; >>>> double b = q[i + 1] + p[i + 1]; >>>> q[i] = a; >>>> q[i + 1] = b; >>>> } >>>> } >>>> >>>> though, where the bounds on the global arrays guarantee that [i + 1] can't >>>> overflow, even though "n" is unconstrained. The patch as posted handles >>>> this case too. >>> BTW is this a missed optimization in value range analysis? The range >>> information for i should flow in a way like: array boundary -> niters >>> -> scev/vrp. >>> I think that's what niters/scev do in analysis. >> >> Yeah, maybe :-) It looks like the problem is that when SLP runs, >> the previous VRP pass came before loop header copying, so the (single) >> header has to cope with n == 0 case. Thus we get: > Ah, there are several passes in between vrp and pass_ch, not sure if > any such pass depends on vrp intensively. 
> I would suggest reordering
> the two passes, or a standalone VRP interface updating information for
> the loop region after the header is copied?  This is a non-trivial
> issue that needs to be fixed.  The niters analyzer relies heavily on
> simplify_using_initial_conditions to get the same information, which
> in my opinion should be provided by VRP.  Though this won't be able
> to obsolete simplify_using_initial_conditions, because VRP is weak in
> symbolic ranges...
>
>> [... VRP dump quoted earlier trimmed ...]
>>
>> It does work if another VRP pass runs after CH.  But even then,
>> is it a good idea to rely on the range info being kept up-to-date
>> all the way through to SLP?  A lot happens in between.
> To some extent, yes.  [...]  It's also not common for vrp information
> to become invalid, given we are on SSA?

I'm not worried so much about vrp information becoming invalid on
an SSA name that existed when VRP was run.  It's more a question
of what happens with SSA names that get introduced after VRP,
e.g. by things like dom, reassoc, PRE, etc.

> Now that the data address is not analyzed against the loop, VRP would
> be the only information we could use, unless we add back scev
> analysis.  IMHO, the patch is doing so in another way than before; it
> requires an additional chrec data structure.  I remember the previous
> patch enables more slp vectorization -- is that because of "step"
> information from scev?

Do you mean that:

2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>

	* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
	parameter with a "loop" parameter and use it instead of the
	loop containing DR_STMT.  Don't check simple_iv when doing
	BB analysis.  Describe the two analysis modes in the comment.

enabled more SLP vectorisation in bb-slp-pr65935.c?
That was due to us not doing IV analysis for BB vectorisation, and
ensuring that the step was always zero.

> In this patch, step information is simply discarded.  I am
> wondering if it's possible to always analyze scev within the innermost
> loop for slp while discarding the step information.

Well, we don't calculate a step for bb analysis (i.e. it's not the case
that the patch calculates step information and then simply discards it).
I don't see how that would work.  For bb analysis, DR_OFFSET + DR_INIT
has to give the offset for every execution of the block, not just the
first iteration of the containing loop.  So if we get back a nonzero
step, we have to do something with it.

But:

(a) the old simple_iv analysis is more expensive than simply calling
    analyze_scev, so I don't think this is a win in terms of complexity.

(b) for bb analysis, there's nothing particularly special about the
    innermost loop.  It makes more sense to analyse the offset in the
    innermost loop for which it isn't invariant, as shown by the second
    testcase in the patch.

(c) the patch helps with loop vectorisation too, since analysing the
    starting DR_OFFSET in the context of the containing loop can help
    in a similar way as analysing the full offset does for SLP.

Thanks,
Richard

>
> Thanks,
> bin
>>
>> FWIW, the old simple_iv check that I removed for bb data-ref analysis
>> relies on SCEV analysis too, so I don't think this is more expensive
>> than what we had before.
>>
>> Thanks,
>> Richard
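To illustrate (b), a sketch of the kind of case involved (not the
actual second testcase from the patch):

  double p[1000];
  double q[1000];

  void
  f (unsigned int n)
  {
    for (unsigned int i = 0; i < n; i += 4)
      for (unsigned int j = 0; j < 16; j++)
        {
          double a = q[i] + p[i];       /* offsets invariant in the j loop */
          double b = q[i + 1] + p[i + 1];
          q[i] = a;
          q[i + 1] = b;
        }
  }

The offsets of the accesses in the inner-loop body are invariant in the
j loop, so analysing them there gives nothing useful; it's the chrec in
the i loop that shows the [i] and [i + 1] accesses to be a constant
8 bytes apart.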
On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> [... full quote of the earlier exchange trimmed ...]
>
> I'm not worried so much about vrp information becoming invalid on
> an SSA name that existed when VRP was run.  It's more a question
> of what happens with SSA names that get introduced after VRP,
> e.g. by things like dom, reassoc, PRE, etc.
For the induction variables in question, these passes shouldn't
aggressively introduce new variables, I think.

>> [...]
>
> Do you mean that:
>
> 2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>
>
> 	* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
> 	parameter with a "loop" parameter and use it instead of the
> 	loop containing DR_STMT.  Don't check simple_iv when doing
> 	BB analysis.  Describe the two analysis modes in the comment.
>
> enabled more SLP vectorisation in bb-slp-pr65935.c?  That was due
> to us not doing IV analysis for BB vectorisation, and ensuring that
> the step was always zero.
Which means the vectorizer code can handle an offset that hasn't been
IV-analyzed, but not the analyzed form?

> Well, we don't calculate a step for bb analysis (i.e. it's not the case
> that the patch calculates step information and then simply discards
> it).  I don't see how that would work.  For bb analysis, DR_OFFSET +
> DR_INIT has to give the offset for every execution of the block, not
> just the first iteration of the containing loop.  So if we get back a
> nonzero step, we have to do something with it.
Yeah.

> But:
>
> (a) the old simple_iv analysis is more expensive than simply calling
>     analyze_scev, so I don't think this is a win in terms of complexity.
>
> (b) for bb analysis, there's nothing particularly special about the
>     innermost loop.  It makes more sense to analyse the offset in the
>     innermost loop for which it isn't invariant, as shown by the second
>     testcase in the patch.
>
> (c) the patch helps with loop vectorisation too, since analysing the
>     starting DR_OFFSET in the context of the containing loop can help
>     in a similar way as analysing the full offset does for SLP.

I have to admit I am not very much into this method.  It complicates
the structure as well as the code, mostly because dr_init is now split
into two different fields, one of which is lazily computed.
For example:

> @@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer
>  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
>                   tree segment_length_a, tree segment_length_b)
>  {
> -  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
> -              && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
> -  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
> +  gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
> +              && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
> +  if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
>      return false;
>
> -  tree seg_a_min = DR_INIT (a);
> +  tree seg_a_min = DR_CHREC_INIT (a);
>    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
>                                  seg_a_min, segment_length_a);
>    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
> @@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference *
>        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
>        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
>                                 seg_a_max, unit_size);
> -      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
> -                               DR_INIT (a), unit_size);
> +      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
> +                               DR_CHREC_INIT (a), unit_size);
>      }
> -  tree seg_b_min = DR_INIT (b);
> +  tree seg_b_min = DR_CHREC_INIT (b);
>    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
>                                  seg_b_min, segment_length_b);
>    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)

Uses of DR_INIT are simply replaced by DR_CHREC_INIT.  Is it safe to do
so when DR_INIT is non-zero?  It worries me that I may need to think
twice before referring to DR_INIT, because it's not clear when
DR_OFFSET is split and DR_CHREC_INIT becomes non-zero.  It may simply
be that I am too dumb to handle this; I will leave it to richi.

Thanks,
bin
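To make the two fields concrete for the unsigned "i" case from the
start of the thread (roughly; assuming 4-byte elements and eliding the
exact tree shapes):

  [i] access:      DR_OFFSET = (intptr_t) i * 4
                   DR_INIT = 0, DR_CHREC_INIT = 0
  [i + 1] access:  DR_OFFSET = _3 * 4, where _3 = (intptr_t) (i + 1)
                   DR_INIT = 0, DR_CHREC_INIT = 4

So DR_CHREC_INIT becomes non-zero precisely where split_constant_offset
had to give up for overflow reasons but the chrec analysis could still
peel off the constant.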
On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> [... full quote of the earlier exchange trimmed ...]
>
> I have to admit I am not very much into this method.  It complicates
> the structure as well as the code, mostly because dr_init is now split
> into two different fields, one of which is lazily computed.
>
> For example:
>> [... diff quoted in full above trimmed ...]
>
> Uses of DR_INIT are simply replaced by DR_CHREC_INIT.  Is it safe to
> do so when DR_INIT is non-zero?  It worries me that I may need to
> think twice before referring to DR_INIT, because it's not clear when
> DR_OFFSET is split and DR_CHREC_INIT becomes non-zero.  It may simply
> be that I am too dumb to handle this; I will leave it to richi.

I'm trying to understand this a bit (not liking it very much in its
current form).

Can code currently using DR_OFFSET and DR_INIT simply use
DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
to doing that?  If not, then why not?

That said, I'm all for making the DR info more powerful.  But I detest
having extra fields and adding confusion as to when to use which.
Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
function accessing CHREC_LEFT where suitable, plus expose
DR_OFFSET_CHREC, that would make me happier.

One downside might be that the scalar evolution of the offset might
pull SSA vars into "DR_OFFSET" that are "far away" and thus less
optimal for code generation when one re-constructs a ref by adding the
components?

Thanks,
Richard.
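Spelled out, the accessor arrangement suggested here might look
something like this (a sketch only; the offset_chrec field and the
DR_OFFSET_CHREC name are hypothetical):

  /* The underlying field: the offset, possibly in chrec form.  */
  #define DR_OFFSET_CHREC(DR) (DR)->innermost.offset_chrec

  /* DR_OFFSET keeps its current meaning: the invariant part of the
     offset, taken from CHREC_LEFT when the analysis produced a chrec
     (peeling recursively in case of nested chrecs).  */
  static inline tree
  dr_offset (struct data_reference *dr)
  {
    tree offset = DR_OFFSET_CHREC (dr);
    while (TREE_CODE (offset) == POLYNOMIAL_CHREC)
      offset = CHREC_LEFT (offset);
    return offset;
  }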
On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> [... full quote of the earlier exchange trimmed ...]
>
> I'm trying to understand this a bit (not liking it very much in its
> current form).
>
> Can code currently using DR_OFFSET and DR_INIT simply use
> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
> DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
> to doing that?  If not, then why not?
>
> That said, I'm all for making the DR info more powerful.  But I detest
> having extra fields and adding confusion as to when to use which.
> Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
> function accessing CHREC_LEFT where suitable, plus expose
> DR_OFFSET_CHREC, that would make me happier.

And if we want to make it opt-in, do a dr_analyze_me_harder () which
will re-write DR_OFFSET/INIT into the new form.

But the DR_OFFSET and DR_INIT (read) accessors would maintain their
semantics, while DR_OFFSET_CHREC would expose the CHREC if available.

Richard.

> One downside might be that the scalar evolution of the offset might
> pull SSA vars into "DR_OFFSET" that are "far away" and thus less
> optimal for code generation when one re-constructs a ref by adding the
> components?
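As a usage sketch of that opt-in scheme (dr_analyze_me_harder is the
placeholder name from above; everything here is hypothetical):

  /* Build the data ref with the cheap analysis, as today (arguments
     elided).  DR_OFFSET and DR_INIT keep their current meaning and no
     chrec has been computed yet.  */
  struct data_reference *dr = ...;

  /* Opt in to the deeper analysis.  */
  dr_analyze_me_harder (dr);

  /* Now valid; might be NULL_TREE if the scev analysis failed, in
     which case consumers fall back to DR_OFFSET/DR_INIT.  */
  tree chrec = DR_OFFSET_CHREC (dr);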
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> [... full quote of the earlier exchange trimmed ...]
>>
>> I'm trying to understand this a bit (not liking it very much in its
>> current form).
>>
>> Can code currently using DR_OFFSET and DR_INIT simply use
>> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET if
>> DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
>> to doing that?  If not, then why not?

There's nothing particularly special about the CHREC_LEFT for users of
the drs.  The chrec as a whole describes the variable part of the
offset.

>> That said, I'm all for making the DR info more powerful.  But I detest
>> having extra fields and adding confusion as to when to use which.
>> Thus if we can make DR_CHREC_INIT be DR_INIT, and DR_OFFSET an inline
>> function accessing CHREC_LEFT where suitable, plus expose
>> DR_OFFSET_CHREC, that would make me happier.
>
> And if we want to make it opt-in, do a dr_analyze_me_harder () which
> will re-write DR_OFFSET/INIT into the new form.
>
> But the DR_OFFSET and DR_INIT (read) accessors would maintain their
> semantics, while DR_OFFSET_CHREC would expose the CHREC if available.

After the changes, the only place that actually cared about the split
between the "old" DR_OFFSET and DR_INIT was tree-predcom.c:
ref_at_iteration.  Everywhere else just wanted the sum of OFFSET and
INIT, and for that it would have been more convenient to have a
combined field.

So maybe one way of trying to avoid the confusion would be to keep
DR_OFFSET together as the full starting offset from the base, then
provide DR_VAR_OFFSET and DR_CONST_OFFSET as the split forms, with
DR_VAR_OFFSET being more like an abstract value number.
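For the f4 example from earlier in the thread (doubles, i += 4, so the
byte offset advances by 32 per iteration), the proposed fields would
come out roughly as:

  q[i]:      DR_BASE_ADDRESS &q
             DR_VAR_OFFSET {0, +, 32}_1 (or an equivalent chrec)
             DR_CONST_OFFSET 0
  q[i + 1]:  DR_BASE_ADDRESS &q
             DR_VAR_OFFSET {0, +, 32}_1
             DR_CONST_OFFSET 8

Equal base addresses and DR_VAR_OFFSETs, so the two accesses are
related and known to be 8 bytes apart, which is exactly what the
grouping and alias-checking code wants.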
See the comment at the start of the patch below for more details.

I'm not sure whether the main reason for splitting the offset was to
make life easier for the users of the drs, or whether it was to try to
avoid creating new trees.  But then, the unconditional scev analysis
that we did previously already generated new trees.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


2017-08-22  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/81635
	* tree-data-ref.h (innermost_loop_behavior): Remove init field.
	Add var_offset and const_offset fields.  Rename offset_alignment
	to var_offset_alignment.
	(DR_INIT): Delete.
	(DR_CONST_OFFSET, DR_VAR_OFFSET): New macros.
	(DR_OFFSET_ALIGNMENT): Replace with...
	(DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
	(dr_analyze_innermost): Add a gimple * argument.
	(dr_equal_offsets_p): Delete.
	(dr_var_offsets_equal_p, dr_var_offsets_compare): Declare.
	* tree-vectorizer.h (STMT_VINFO_DR_INIT): Delete.
	(STMT_VINFO_DR_VAR_OFFSET, STMT_VINFO_DR_CONST_OFFSET): New macros.
	(STMT_VINFO_DR_OFFSET_ALIGNMENT): Replace with...
	(STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
	* tree.c (tree_ctz): Handle POLYNOMIAL_CHREC.
	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
	(split_constant_offset): Handle POLYNOMIAL_CHREC.
	(analyze_offset_scev): New function.
	(dr_analyze_innermost): Add a gimple * argument.  Update after
	changes to innermost_loop_behavior.  Initialize var_offset and
	const_offset.
	(create_data_ref): Update call to dr_analyze_innermost.  Update
	dump after changes to innermost_loop_behavior.
	(operator ==): Use dr_var_offsets_equal_p and compare the
	DR_CONST_OFFSETs.
	(prune_runtime_alias_test_list): Likewise.
	(comp_dr_with_seg_len_pair): Use dr_var_offsets_compare and compare
	the DR_CONST_OFFSETs.
	(create_intersect_range_checks): Use DR_OFFSET without adding
	DR_INIT.
	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
	(dr_alignment): Use const_offset instead of init and
	var_offset_alignment instead of offset_alignment.
	* tree-if-conv.c (innermost_loop_behavior_hash::hash): Don't test
	the init field.
	(innermost_loop_behavior_hash::equal): Likewise.
	(ifcvt_memrefs_wont_trap): Likewise.
	(if_convertible_loop_p_1): Likewise.
	* tree-loop-distribution.c (build_addr_arg_loc): Use DR_OFFSET
	without adding DR_INIT.
	(build_rdg_partition_for_vertex): Don't check DR_INIT.
	(share_memory_accesses): Likewise.
	(pg_add_dependence_edges): Likewise.
	(compute_alias_check_pairs): Use dr_var_offsets_compare.
	* tree-predcom.c (aff_combination_dr_offset): Use DR_OFFSET without
	adding DR_INIT.
	(determine_offset): Likewise.
	(valid_initializer_p): Likewise.
	(find_looparound_phi): Update call to dr_analyze_innermost.
	(ref_at_iteration): Use split_constant_offset to split the offset.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
	const_offset instead of init and var_offset_alignment instead of
	offset_alignment.
	(vect_find_same_alignment_drs): Use dr_var_offsets_compare and
	compare the DR_CONST_OFFSETs.
	(dr_group_sort_cmp): Likewise.
	(vect_analyze_group_access_1): Use DR_CONST_OFFSET instead of
	DR_INIT.
	(vect_no_alias_p): Likewise.
	(vect_analyze_data_ref_accesses): Use dr_var_offsets_equal_p and
	compare the DR_CONST_OFFSETs.
	(vect_prune_runtime_alias_test_list): Use dr_var_offsets_compare.
	(vect_analyze_data_refs): Don't check DR_INIT and use DR_OFFSET
	without adding DR_INIT.  Use DR_VAR_OFFSET_ALIGNMENT instead of
	DR_OFFSET_ALIGNMENT.  Update call to dr_analyze_innermost, and
	update subsequent dump.
        (vect_create_addr_base_for_vector_ref): Use DR_OFFSET without
        adding DR_INIT.
        * tree-vect-stmts.c (vectorizable_store): Likewise.
        (vectorizable_load): Likewise.  Use DR_CONST_OFFSET instead
        of DR_INIT.

gcc/testsuite/
        PR tree-optimization/81635
        * gcc.dg/vect/bb-slp-pr81635.c: New test.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2017-08-21 10:42:51.088530428 +0100
+++ gcc/tree-data-ref.h	2017-08-22 14:54:48.630563940 +0100
@@ -28,29 +28,44 @@ #define GCC_TREE_DATA_REF_H
   innermost_loop_behavior describes the evolution of the address of the memory
   reference in the innermost enclosing loop.  The address is expressed as
   BASE + STEP * # of iteration, and base is further decomposed as the base
-  pointer (BASE_ADDRESS), loop invariant offset (OFFSET) and
-  constant offset (INIT).  Examples, in loop nest
+  pointer (BASE_ADDRESS) and the loop invariant offset (OFFSET).
+  OFFSET is further expressed as the sum of a zero or non-constant term
+  (VAR_OFFSET) and a constant term (CONST_OFFSET).  VAR_OFFSET should be
+  treated as an abstract representation; in particular, it may contain
+  chrecs.  CONST_OFFSET is always an INTEGER_CST.

-  for (i = 0; i < 100; i++)
-    for (j = 3; j < 100; j++)
+  Examples, in loop nest

-                 Example 1                Example 2
-  data-ref       a[j].b[i][j]             *(p + x + 16B + 4B * j)
+  1: for (i = 0; i < 100; i++)
+  2:   for (j = 3; j < 100; j++)

+                 Example 1                           Example 2
+  data-ref       a[j].b[i][j]                        *(p + x + 16B + 4B * j)

-  innermost_loop_behavior
-  base_address   &a                       p
-  offset         i * D_i                  x
-  init           3 * D_j + offsetof (b)   28
-  step           D_j                      4
-  */
+  innermost_loop_behavior
+  base_address   &a                                  p
+  offset         i * D_i + 3 * D_j + offsetof (b)    x + 28
+  var_offset     {0, +, D_i}_1                       x (or an equiv. chrec)
+  const_offset   3 * D_j + offsetof (b)              28
+  step           D_j                                 4
+
+  The main two uses of VAR_OFFSET and CONST_OFFSET are:
+
+  1. to better analyze the alignment, since CONST_OFFSET can be treated as
+     the misalignment wrt the alignment of VAR_OFFSET.
+
+  2. to find data references that are a constant number of bytes apart.
+     If two data references have the same BASE_ADDRESS and VAR_OFFSET,
+     the distance between them is given by the difference in their
+     CONST_OFFSETs.  */
 struct innermost_loop_behavior
 {
   tree base_address;
   tree offset;
-  tree init;
   tree step;
+  tree var_offset;
+  tree const_offset;

   /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
      from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
@@ -91,7 +106,7 @@ struct innermost_loop_behavior
   /* The largest power of two that divides OFFSET, capped to a suitably
      high value if the offset is zero.  This is a byte rather than a bit
      quantity.  */
-  unsigned int offset_alignment;
+  unsigned int var_offset_alignment;

   /* Likewise for STEP.
*/ unsigned int step_alignment; @@ -186,12 +201,13 @@ #define DR_IS_WRITE(DR) (!DR_ #define DR_IS_CONDITIONAL_IN_STMT(DR) (DR)->is_conditional_in_stmt #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address #define DR_OFFSET(DR) (DR)->innermost.offset -#define DR_INIT(DR) (DR)->innermost.init +#define DR_VAR_OFFSET(DR) (DR)->innermost.var_offset +#define DR_CONST_OFFSET(DR) (DR)->innermost.const_offset #define DR_STEP(DR) (DR)->innermost.step #define DR_PTR_INFO(DR) (DR)->alias.ptr_info #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment #define DR_BASE_MISALIGNMENT(DR) (DR)->innermost.base_misalignment -#define DR_OFFSET_ALIGNMENT(DR) (DR)->innermost.offset_alignment +#define DR_VAR_OFFSET_ALIGNMENT(DR) (DR)->innermost.var_offset_alignment #define DR_STEP_ALIGNMENT(DR) (DR)->innermost.step_alignment #define DR_INNERMOST(DR) (DR)->innermost @@ -412,7 +428,8 @@ #define DDR_REVERSED_P(DDR) (DDR)->rever #define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p -bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); +bool dr_analyze_innermost (innermost_loop_behavior *, tree, + gimple *, struct loop *); extern bool compute_data_dependences_for_loop (struct loop *, bool, vec<loop_p> *, vec<data_reference_p> *, @@ -466,8 +483,6 @@ dr_alignment (data_reference *dr) extern bool dr_may_alias_p (const struct data_reference *, const struct data_reference *, bool); -extern bool dr_equal_offsets_p (struct data_reference *, - struct data_reference *); extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); extern int data_ref_compare_tree (tree, tree); @@ -675,4 +690,23 @@ lambda_matrix_new (int m, int n, struct return mat; } +/* Check if DRA and DRB have equal DR_VAR_OFFSETs. */ + +inline bool +dr_var_offsets_equal_p (struct data_reference *dra, + struct data_reference *drb) +{ + return eq_evolutions_p (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); +} + +/* Compare the DR_VAR_OFFSETs of DRA and DRB for sorting purposes, + returning a qsort-style result. 
*/ + +inline int +dr_var_offsets_compare (struct data_reference *dra, + struct data_reference *drb) +{ + return data_ref_compare_tree (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); +} + #endif /* GCC_TREE_DATA_REF_H */ Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-08-04 11:42:45.939105152 +0100 +++ gcc/tree-vectorizer.h 2017-08-22 14:54:48.633563940 +0100 @@ -740,14 +740,15 @@ #define STMT_VINFO_VEC_CONST_COND_REDUC_ #define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address -#define STMT_VINFO_DR_INIT(S) (S)->dr_wrt_vec_loop.init #define STMT_VINFO_DR_OFFSET(S) (S)->dr_wrt_vec_loop.offset +#define STMT_VINFO_DR_VAR_OFFSET(S) (S)->dr_wrt_vec_loop.var_offset +#define STMT_VINFO_DR_CONST_OFFSET(S) (S)->dr_wrt_vec_loop.const_offset #define STMT_VINFO_DR_STEP(S) (S)->dr_wrt_vec_loop.step #define STMT_VINFO_DR_BASE_ALIGNMENT(S) (S)->dr_wrt_vec_loop.base_alignment #define STMT_VINFO_DR_BASE_MISALIGNMENT(S) \ (S)->dr_wrt_vec_loop.base_misalignment -#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \ - (S)->dr_wrt_vec_loop.offset_alignment +#define STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT(S) \ + (S)->dr_wrt_vec_loop.var_offset_alignment #define STMT_VINFO_DR_STEP_ALIGNMENT(S) \ (S)->dr_wrt_vec_loop.step_alignment Index: gcc/tree.c =================================================================== --- gcc/tree.c 2017-08-21 12:14:47.159835474 +0100 +++ gcc/tree.c 2017-08-22 14:54:48.634563941 +0100 @@ -2601,6 +2601,12 @@ tree_ctz (const_tree expr) return MIN (ret1, prec); } return 0; + case POLYNOMIAL_CHREC: + ret1 = tree_ctz (CHREC_LEFT (expr)); + if (ret1 == 0) + return ret1; + ret2 = tree_ctz (CHREC_RIGHT (expr)); + return MIN (ret1, ret2); default: return 0; } Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-data-ref.c 2017-08-22 14:54:48.629563940 +0100 @@ -86,6 +86,7 @@ Software Foundation; either version 3, o #include "expr.h" #include "gimple-iterator.h" #include "tree-ssa-loop-niter.h" +#include "tree-ssa-loop-ivopts.h" #include "tree-ssa-loop.h" #include "tree-ssa.h" #include "cfgloop.h" @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v *off = ssize_int (0); STRIP_NOPS (exp); - if (tree_is_chrec (exp) + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) + { + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); + if (op0 != CHREC_LEFT (exp)) + { + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), + op0, CHREC_RIGHT (exp)); + *off = op1; + } + return; + } + + if (automatically_generated_chrec_p (exp) || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) return; @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a return build_fold_addr_expr (TREE_OPERAND (addr, 0)); } -/* Analyze the behavior of memory reference REF. There are two modes: +/* Analyze the scalar evolution of OFFSET in the innermost parent of + LOOP for which it isn't invariant. Return OFFSET itself if the + value is invariant or if it's too complex to analyze. 
*/ + +static tree +analyze_offset_scev (struct loop *loop, tree offset) +{ + struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset); + if (inv_loop != NULL) + { + if (loop_depth (inv_loop) == 0) + return offset; + loop = loop_outer (inv_loop); + } + tree res = analyze_scalar_evolution (loop, offset); + if (chrec_contains_undetermined (res)) + return offset; + return res; +} + +/* Analyze the behavior of memory reference REF, which occurs in STMT. + There are two modes: - BB analysis. In this case we simply split the address into base, init and offset components, without reference to any containing loop. @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a bool dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, - struct loop *loop) + gimple *stmt, struct loop *loop) { HOST_WIDE_INT pbitsize, pbitpos; tree base, poffset; machine_mode pmode; int punsignedp, preversep, pvolatilep; affine_iv base_iv, offset_iv; - tree init, dinit, step; + tree dinit; bool in_loop = (loop && loop->num); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -885,7 +919,7 @@ dr_analyze_innermost (innermost_loop_beh } } - init = ssize_int (pbitpos / BITS_PER_UNIT); + tree init = ssize_int (pbitpos / BITS_PER_UNIT); /* Subtract any constant component from the base and add it to INIT instead. Adjust the misalignment to reflect the amount we subtracted. */ @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh init = size_binop (PLUS_EXPR, init, dinit); base_misalignment -= TREE_INT_CST_LOW (dinit); - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); - init = size_binop (PLUS_EXPR, init, dinit); - - step = size_binop (PLUS_EXPR, - fold_convert (ssizetype, base_iv.step), - fold_convert (ssizetype, offset_iv.step)); - base = canonicalize_base_object_address (base_iv.base); + tree offset = size_binop (PLUS_EXPR, + fold_convert (ssizetype, offset_iv.base), + init); + tree step = size_binop (PLUS_EXPR, + fold_convert (ssizetype, base_iv.step), + fold_convert (ssizetype, offset_iv.step)); + tree scev = analyze_offset_scev (loop_containing_stmt (stmt), offset); /* See if get_pointer_alignment can guarantee a higher alignment than the one we calculated above. */ @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh } drb->base_address = base; - drb->offset = fold_convert (ssizetype, offset_iv.base); - drb->init = init; + drb->offset = offset; drb->step = step; + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); drb->base_alignment = base_alignment; drb->base_misalignment = base_misalignment & (base_alignment - 1); - drb->offset_alignment = highest_pow2_factor (offset_iv.base); + drb->var_offset_alignment = highest_pow2_factor (drb->var_offset); drb->step_alignment = highest_pow2_factor (step); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -1154,7 +1188,7 @@ create_data_ref (loop_p nest, loop_p loo DR_IS_READ (dr) = is_read; DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt; - dr_analyze_innermost (&DR_INNERMOST (dr), memref, + dr_analyze_innermost (&DR_INNERMOST (dr), memref, stmt, nest != NULL ? 
loop : NULL); dr_analyze_indices (dr, nest, loop); dr_analyze_alias (dr); @@ -1166,15 +1200,17 @@ create_data_ref (loop_p nest, loop_p loo print_generic_expr (dump_file, DR_BASE_ADDRESS (dr), TDF_SLIM); fprintf (dump_file, "\n\toffset from base address: "); print_generic_expr (dump_file, DR_OFFSET (dr), TDF_SLIM); - fprintf (dump_file, "\n\tconstant offset from base address: "); - print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM); + fprintf (dump_file, "\n\tvariable part of offset: "); + print_generic_expr (dump_file, DR_VAR_OFFSET (dr), TDF_SLIM); + fprintf (dump_file, "\n\tconstant part of offset: "); + print_generic_expr (dump_file, DR_CONST_OFFSET (dr), TDF_SLIM); fprintf (dump_file, "\n\tstep: "); print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM); fprintf (dump_file, "\n\tbase alignment: %d", DR_BASE_ALIGNMENT (dr)); fprintf (dump_file, "\n\tbase misalignment: %d", DR_BASE_MISALIGNMENT (dr)); - fprintf (dump_file, "\n\toffset alignment: %d", - DR_OFFSET_ALIGNMENT (dr)); + fprintf (dump_file, "\n\tvariable offset alignment: %d", + DR_VAR_OFFSET_ALIGNMENT (dr)); fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr)); fprintf (dump_file, "\n\tbase_object: "); print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM); @@ -1324,11 +1360,12 @@ runtime_alias_check_p (ddr_p ddr, struct operator == (const dr_with_seg_len& d1, const dr_with_seg_len& d2) { - return operand_equal_p (DR_BASE_ADDRESS (d1.dr), + return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), DR_BASE_ADDRESS (d2.dr), 0) - && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0 - && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0 - && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0; + && dr_var_offsets_equal_p (d1.dr, d2.dr) + && data_ref_compare_tree (DR_CONST_OFFSET (d1.dr), + DR_CONST_OFFSET (d2.dr)) == 0 + && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0); } /* Comparison function for sorting objects of dr_with_seg_len_pair_t @@ -1360,17 +1397,15 @@ comp_dr_with_seg_len_pair (const void *p if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr), DR_STEP (b2.dr))) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr), - DR_OFFSET (b1.dr))) != 0) + if ((comp_res = dr_var_offsets_compare (a1.dr, b1.dr)) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr), - DR_INIT (b1.dr))) != 0) + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a1.dr), + DR_CONST_OFFSET (b1.dr))) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr), - DR_OFFSET (b2.dr))) != 0) + if ((comp_res = dr_var_offsets_compare (a2.dr, b2.dr)) != 0) return comp_res; - if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr), - DR_INIT (b2.dr))) != 0) + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a2.dr), + DR_CONST_OFFSET (b2.dr))) != 0) return comp_res; return 0; @@ -1455,10 +1490,9 @@ prune_runtime_alias_test_list (vec<dr_wi if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr), DR_BASE_ADDRESS (dr_a2->dr), 0) - || !operand_equal_p (DR_OFFSET (dr_a1->dr), - DR_OFFSET (dr_a2->dr), 0) - || !tree_fits_shwi_p (DR_INIT (dr_a1->dr)) - || !tree_fits_shwi_p (DR_INIT (dr_a2->dr))) + || !dr_var_offsets_equal_p (dr_a1->dr, dr_a2->dr) + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a1->dr)) + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a2->dr))) continue; /* Only merge const step data references. */ @@ -1484,11 +1518,13 @@ prune_runtime_alias_test_list (vec<dr_wi continue; /* Make sure dr_a1 starts left of dr_a2. 
*/ - if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr))) + if (tree_int_cst_lt (DR_CONST_OFFSET (dr_a2->dr), + DR_CONST_OFFSET (dr_a1->dr))) std::swap (*dr_a1, *dr_a2); bool do_remove = false; - wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)); + wide_int diff = wi::sub (DR_CONST_OFFSET (dr_a2->dr), + DR_CONST_OFFSET (dr_a1->dr)); wide_int min_seg_len_b; tree new_seg_len; @@ -1756,10 +1792,6 @@ create_intersect_range_checks (struct lo tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr); tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr); - offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a), - offset_a, DR_INIT (dr_a.dr)); - offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b), - offset_b, DR_INIT (dr_b.dr)); addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a); addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b); @@ -1826,48 +1858,6 @@ create_runtime_alias_checks (struct loop } } -/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical - expressions. */ -static bool -dr_equal_offsets_p1 (tree offset1, tree offset2) -{ - bool res; - - STRIP_NOPS (offset1); - STRIP_NOPS (offset2); - - if (offset1 == offset2) - return true; - - if (TREE_CODE (offset1) != TREE_CODE (offset2) - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) - return false; - - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), - TREE_OPERAND (offset2, 0)); - - if (!res || !BINARY_CLASS_P (offset1)) - return res; - - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), - TREE_OPERAND (offset2, 1)); - - return res; -} - -/* Check if DRA and DRB have equal offsets. */ -bool -dr_equal_offsets_p (struct data_reference *dra, - struct data_reference *drb) -{ - tree offset1, offset2; - - offset1 = DR_OFFSET (dra); - offset2 = DR_OFFSET (drb); - - return dr_equal_offsets_p1 (offset1, offset2); -} - /* Returns true if FNA == FNB. */ static bool @@ -5083,13 +5073,13 @@ dr_alignment (innermost_loop_behavior *d /* Get the alignment of BASE_ADDRESS + INIT. */ unsigned int alignment = drb->base_alignment; unsigned int misalignment = (drb->base_misalignment - + TREE_INT_CST_LOW (drb->init)); + + TREE_INT_CST_LOW (drb->const_offset)); if (misalignment != 0) alignment = MIN (alignment, misalignment & -misalignment); /* Cap it to the alignment of OFFSET. */ if (!integer_zerop (drb->offset)) - alignment = MIN (alignment, drb->offset_alignment); + alignment = MIN (alignment, drb->var_offset_alignment); /* Cap it to the alignment of STEP. 
*/ if (!integer_zerop (drb->step)) Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c 2017-07-13 09:25:12.666266733 +0100 +++ gcc/tree-if-conv.c 2017-08-22 14:54:48.630563940 +0100 @@ -149,7 +149,6 @@ innermost_loop_behavior_hash::hash (cons hash = iterative_hash_expr (e->base_address, 0); hash = iterative_hash_expr (e->offset, hash); - hash = iterative_hash_expr (e->init, hash); return iterative_hash_expr (e->step, hash); } @@ -161,8 +160,6 @@ innermost_loop_behavior_hash::equal (con || (!e1->base_address && e2->base_address) || (!e1->offset && e2->offset) || (e1->offset && !e2->offset) - || (!e1->init && e2->init) - || (e1->init && !e2->init) || (!e1->step && e2->step) || (e1->step && !e2->step)) return false; @@ -173,9 +170,6 @@ innermost_loop_behavior_hash::equal (con if (e1->offset && e2->offset && !operand_equal_p (e1->offset, e2->offset, 0)) return false; - if (e1->init && e2->init - && !operand_equal_p (e1->init, e2->init, 0)) - return false; if (e1->step && e2->step && !operand_equal_p (e1->step, e2->step, 0)) return false; @@ -856,8 +850,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v innermost_loop_behavior *innermost = &DR_INNERMOST (a); gcc_assert (DR_STMT (a) == stmt); - gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) - || DR_INIT (a) || DR_STEP (a)); + gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) || DR_STEP (a)); master_dr = innermost_DR_map->get (innermost); gcc_assert (master_dr != NULL); @@ -1433,8 +1426,7 @@ if_convertible_loop_p_1 (struct loop *lo if (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR || TREE_CODE (ref) == REALPART_EXPR - || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) - || DR_INIT (dr) || DR_STEP (dr))) + || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) || DR_STEP (dr))) { while (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR Index: gcc/tree-loop-distribution.c =================================================================== --- gcc/tree-loop-distribution.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-loop-distribution.c 2017-08-22 14:54:48.630563940 +0100 @@ -903,8 +903,7 @@ build_addr_arg_loc (location_t loc, data { tree addr_base; - addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr)); - addr_base = fold_convert_loc (loc, sizetype, addr_base); + addr_base = fold_convert_loc (loc, sizetype, DR_OFFSET (dr)); /* Test for a negative stride, iterating over every element. */ if (tree_int_cst_sgn (DR_STEP (dr)) == -1) @@ -1289,8 +1288,7 @@ build_rdg_partition_for_vertex (struct g /* Partition can only be executed sequentially if there is any unknown data reference. 
*/ - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) - || !DR_INIT (dr) || !DR_STEP (dr)) + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) partition->type = PTYPE_SEQUENTIAL; bitmap_set_bit (partition->datarefs, idx); @@ -1507,21 +1505,18 @@ share_memory_accesses (struct graph *rdg { dr1 = datarefs_vec[i]; - if (!DR_BASE_ADDRESS (dr1) - || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) + if (!DR_BASE_ADDRESS (dr1) || !DR_OFFSET (dr1) || !DR_STEP (dr1)) continue; EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) { dr2 = datarefs_vec[j]; - if (!DR_BASE_ADDRESS (dr2) - || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) + if (!DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr2) || !DR_STEP (dr2)) continue; if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) - && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) return true; } @@ -1705,7 +1700,6 @@ pg_add_dependence_edges (struct graph *r runtime alias check. */ if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr1) || !DR_OFFSET (dr2) - || !DR_INIT (dr1) || !DR_INIT (dr2) || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1)) || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2)) || res == 0) @@ -2203,7 +2197,7 @@ compute_alias_check_pairs (struct loop * DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); + comp_res = dr_var_offsets_compare (dr_a, dr_b); gcc_assert (comp_res != 0); if (latch_dominated_by_data_ref (loop, dr_a)) Index: gcc/tree-predcom.c =================================================================== --- gcc/tree-predcom.c 2017-08-10 14:36:08.057471267 +0100 +++ gcc/tree-predcom.c 2017-08-22 14:54:48.631563940 +0100 @@ -666,18 +666,13 @@ suitable_reference_p (struct data_refere return true; } -/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ +/* Stores DR_OFFSET (DR) to OFFSET. */ static void aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) { - tree type = TREE_TYPE (DR_OFFSET (dr)); - aff_tree delta; - - tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, - &name_expansions); - aff_combination_const (&delta, type, wi::to_widest (DR_INIT (dr))); - aff_combination_add (offset, &delta); + tree_to_aff_combination_expand (DR_OFFSET (dr), TREE_TYPE (DR_OFFSET (dr)), + offset, &name_expansions); } /* Determines number of iterations of the innermost enclosing loop before B @@ -710,8 +705,7 @@ determine_offset (struct data_reference /* If the references have loop invariant address, check that they access exactly the same location. */ *off = 0; - return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0) - && operand_equal_p (DR_INIT (a), DR_INIT (b), 0)); + return operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0); } /* Compare the offsets of the addresses, and check whether the difference @@ -1171,8 +1165,7 @@ valid_initializer_p (struct data_referen /* If the address of the reference is invariant, initializer must access exactly the same location. */ if (integer_zerop (DR_STEP (root))) - return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0) - && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0)); + return operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0); /* Verify that this index of REF is equal to the root's index at -DISTANCE-th iteration. 
*/ @@ -1247,7 +1240,7 @@ find_looparound_phi (struct loop *loop, memset (&init_dr, 0, sizeof (struct data_reference)); DR_REF (&init_dr) = init_ref; DR_STMT (&init_dr) = phi; - if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, loop)) + if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, phi, loop)) return NULL; if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) @@ -1481,8 +1474,7 @@ replace_ref_with (gimple *stmt, tree new ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts, tree niters = NULL_TREE) { - tree off = DR_OFFSET (dr); - tree coff = DR_INIT (dr); + tree off, coff; tree ref = DR_REF (dr); enum tree_code ref_code = ERROR_MARK; tree ref_type = NULL_TREE; @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i tree ref_op2 = NULL_TREE; tree new_offset; + split_constant_offset (DR_OFFSET (dr), &off, &coff); if (iter != 0) { new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-08-21 10:42:51.088530428 +0100 +++ gcc/tree-vect-data-refs.c 2017-08-22 14:54:48.632563940 +0100 @@ -866,7 +866,7 @@ vect_compute_data_ref_alignment (struct base_misalignment = (*entry)->base_misalignment; } - if (drb->offset_alignment < vector_alignment + if (drb->var_offset_alignment < vector_alignment || !step_preserves_misalignment_p /* We need to know whether the step wrt the vectorized loop is negative when computing the starting misalignment below. */ @@ -928,7 +928,7 @@ vect_compute_data_ref_alignment (struct base_misalignment = 0; } unsigned int misalignment = (base_misalignment - + TREE_INT_CST_LOW (drb->init)); + + TREE_INT_CST_LOW (drb->const_offset)); /* If this is a backward running DR then first access in the larger vectype actually is N-1 elements before the address in the DR. @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat return; if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) + || !dr_var_offsets_equal_p (dra, drb) || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) return; /* Two references with distance zero have the same alignment. */ - offset_int diff = (wi::to_offset (DR_INIT (dra)) - - wi::to_offset (DR_INIT (drb))); + offset_int diff = (wi::to_offset (DR_CONST_OFFSET (dra)) + - wi::to_offset (DR_CONST_OFFSET (drb))); if (diff != 0) { /* Get the wider of the two alignments. */ @@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt)); struct data_reference *data_ref = dr; unsigned int count = 1; - tree prev_init = DR_INIT (data_ref); + tree prev_init = DR_CONST_OFFSET (data_ref); gimple *prev = stmt; HOST_WIDE_INT diff, gaps = 0; @@ -2444,9 +2444,10 @@ vect_analyze_group_access_1 (struct data data-ref (supported only for loads), we vectorize only the first stmt, and the rest get their vectorized loads from the first one. 
*/ - if (!tree_int_cst_compare (DR_INIT (data_ref), - DR_INIT (STMT_VINFO_DATA_REF ( - vinfo_for_stmt (next))))) + data_reference *next_ref + = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); + if (!tree_int_cst_compare (DR_CONST_OFFSET (data_ref), + DR_CONST_OFFSET (next_ref))) { if (DR_IS_WRITE (data_ref)) { @@ -2469,14 +2470,14 @@ vect_analyze_group_access_1 (struct data } prev = next; - data_ref = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); + data_ref = next_ref; /* All group members have the same STEP by construction. */ gcc_checking_assert (operand_equal_p (DR_STEP (data_ref), step, 0)); /* Check that the distance between two accesses is equal to the type size. Otherwise, we have gaps. */ - diff = (TREE_INT_CST_LOW (DR_INIT (data_ref)) + diff = (TREE_INT_CST_LOW (DR_CONST_OFFSET (data_ref)) - TREE_INT_CST_LOW (prev_init)) / type_size; if (diff != 1) { @@ -2499,7 +2500,7 @@ vect_analyze_group_access_1 (struct data gap in the access, GROUP_GAP is always 1. */ GROUP_GAP (vinfo_for_stmt (next)) = diff; - prev_init = DR_INIT (data_ref); + prev_init = DR_CONST_OFFSET (data_ref); next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next)); /* Count the number of data-refs in the chain. */ count++; @@ -2715,13 +2716,10 @@ dr_group_sort_cmp (const void *dra_, con return cmp; } - /* And according to DR_OFFSET. */ - if (!dr_equal_offsets_p (dra, drb)) - { - cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)); - if (cmp != 0) - return cmp; - } + /* And according to DR_VAR_OFFSET. */ + cmp = dr_var_offsets_compare (dra, drb); + if (cmp != 0) + return cmp; /* Put reads before writes. */ if (DR_IS_READ (dra) != DR_IS_READ (drb)) @@ -2745,8 +2743,9 @@ dr_group_sort_cmp (const void *dra_, con return cmp; } - /* Then sort after DR_INIT. In case of identical DRs sort after stmt UID. */ - cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)); + /* Then sort after DR_CONST_OFFSET. In case of identical DRs sort after + stmt UID. */ + cmp = tree_int_cst_compare (DR_CONST_OFFSET (dra), DR_CONST_OFFSET (drb)); if (cmp == 0) return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1; return cmp; @@ -2817,7 +2816,7 @@ vect_analyze_data_ref_accesses (vec_info if (DR_IS_READ (dra) != DR_IS_READ (drb) || !operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !dr_equal_offsets_p (dra, drb) + || !dr_var_offsets_equal_p (dra, drb) || !gimple_assign_single_p (DR_STMT (dra)) || !gimple_assign_single_p (DR_STMT (drb))) break; @@ -2835,7 +2834,8 @@ vect_analyze_data_ref_accesses (vec_info break; /* Do not place the same access in the interleaving chain twice. */ - if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0) + if (tree_int_cst_compare (DR_CONST_OFFSET (dra), + DR_CONST_OFFSET (drb)) == 0) break; /* Check the types are compatible. @@ -2844,9 +2844,10 @@ vect_analyze_data_ref_accesses (vec_info TREE_TYPE (DR_REF (drb)))) break; - /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ - HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); - HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); + /* Sorting has ensured that + DR_CONST_OFFSET (dra) <= DR_CONST_OFFSET (drb). */ + HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CONST_OFFSET (dra)); + HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CONST_OFFSET (drb)); gcc_assert (init_a <= init_b); /* If init_b == init_a + the size of the type * k, we have an @@ -2859,10 +2860,10 @@ vect_analyze_data_ref_accesses (vec_info /* If we have a store, the accesses are adjacent. 
This splits groups into chunks we support (we don't support vectorization of stores with gaps). */ + HOST_WIDE_INT prev_init + = TREE_INT_CST_LOW (DR_CONST_OFFSET (datarefs_copy[i - 1])); if (!DR_IS_READ (dra) - && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW - (DR_INIT (datarefs_copy[i-1])) - != type_size_a)) + && (init_b - prev_init) != type_size_a) break; /* If the step (if not zero or non-constant) is greater than the @@ -2974,12 +2975,12 @@ vect_vfa_segment_size (struct data_refer vect_no_alias_p (struct data_reference *a, struct data_reference *b, tree segment_length_a, tree segment_length_b) { - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) + gcc_assert (TREE_CODE (DR_CONST_OFFSET (a)) == INTEGER_CST + && TREE_CODE (DR_CONST_OFFSET (b)) == INTEGER_CST); + if (tree_int_cst_equal (DR_CONST_OFFSET (a), DR_CONST_OFFSET (b))) return false; - tree seg_a_min = DR_INIT (a); + tree seg_a_min = DR_CONST_OFFSET (a); tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), seg_a_min, segment_length_a); /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT @@ -2990,10 +2991,10 @@ vect_no_alias_p (struct data_reference * tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), seg_a_max, unit_size); - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), - DR_INIT (a), unit_size); + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (a)), + DR_CONST_OFFSET (a), unit_size); } - tree seg_b_min = DR_INIT (b); + tree seg_b_min = DR_CONST_OFFSET (b); tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), seg_b_min, segment_length_b); if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) @@ -3001,8 +3002,8 @@ vect_no_alias_p (struct data_reference * tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b))); seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max), seg_b_max, unit_size); - seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)), - DR_INIT (b), unit_size); + seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (b)), + DR_CONST_OFFSET (b), unit_size); } if (tree_int_cst_le (seg_a_max, seg_b_min) @@ -3148,8 +3149,7 @@ vect_prune_runtime_alias_test_list (loop comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), - DR_OFFSET (dr_b)); + comp_res = dr_var_offsets_compare (dr_a, dr_b); /* Alias is known at compilation time. */ if (comp_res == 0 @@ -3455,7 +3455,7 @@ vect_analyze_data_refs (vec_info *vinfo, { gimple *stmt; stmt_vec_info stmt_info; - tree base, offset, init; + tree base, offset; enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE; bool simd_lane_access = false; int vf; @@ -3488,8 +3488,7 @@ vect_analyze_data_refs (vec_info *vinfo, } /* Check that analysis of the data-ref succeeded. 
*/ - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr) - || !DR_STEP (dr)) + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) { bool maybe_gather = DR_IS_READ (dr) @@ -3515,7 +3514,6 @@ vect_analyze_data_refs (vec_info *vinfo, gcc_assert (newdr != NULL && DR_REF (newdr)); if (DR_BASE_ADDRESS (newdr) && DR_OFFSET (newdr) - && DR_INIT (newdr) && DR_STEP (newdr) && integer_zerop (DR_STEP (newdr))) { @@ -3523,8 +3521,7 @@ vect_analyze_data_refs (vec_info *vinfo, { tree off = DR_OFFSET (newdr); STRIP_NOPS (off); - if (TREE_CODE (DR_INIT (newdr)) == INTEGER_CST - && TREE_CODE (off) == MULT_EXPR + if (TREE_CODE (off) == MULT_EXPR && tree_fits_uhwi_p (TREE_OPERAND (off, 1))) { tree step = TREE_OPERAND (off, 1); @@ -3555,7 +3552,7 @@ vect_analyze_data_refs (vec_info *vinfo, { DR_OFFSET (newdr) = ssize_int (0); DR_STEP (newdr) = step; - DR_OFFSET_ALIGNMENT (newdr) + DR_VAR_OFFSET_ALIGNMENT (newdr) = BIGGEST_ALIGNMENT; DR_STEP_ALIGNMENT (newdr) = highest_pow2_factor (step); @@ -3665,7 +3662,6 @@ vect_analyze_data_refs (vec_info *vinfo, base = unshare_expr (DR_BASE_ADDRESS (dr)); offset = unshare_expr (DR_OFFSET (dr)); - init = unshare_expr (DR_INIT (dr)); if (is_gimple_call (stmt) && (!gimple_call_internal_p (stmt) @@ -3701,9 +3697,7 @@ vect_analyze_data_refs (vec_info *vinfo, inner loop: *(BASE + INIT + OFFSET). By construction, this address must be invariant in the inner loop, so we can consider it as being used in the outer loop. */ - tree init_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), - init, offset); - tree init_addr = fold_build_pointer_plus (base, init_offset); + tree init_addr = fold_build_pointer_plus (base, offset); tree init_ref = build_fold_indirect_ref (init_addr); if (dump_enabled_p ()) @@ -3715,7 +3709,7 @@ vect_analyze_data_refs (vec_info *vinfo, } if (!dr_analyze_innermost (&STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info), - init_ref, loop)) + init_ref, stmt, loop)) /* dr_analyze_innermost already explained the failure. 
 */
         return false;

@@ -3728,10 +3722,14 @@ vect_analyze_data_refs (vec_info *vinfo,
               dump_printf (MSG_NOTE, "\n\touter offset from base address: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
                                  STMT_VINFO_DR_OFFSET (stmt_info));
-              dump_printf (MSG_NOTE,
-                           "\n\touter constant offset from base address: ");
+              dump_printf (MSG_NOTE, "\n\tvariable part of outer offset"
+                           " from base address: ");
+              dump_generic_expr (MSG_NOTE, TDF_SLIM,
+                                 STMT_VINFO_DR_VAR_OFFSET (stmt_info));
+              dump_printf (MSG_NOTE, "\n\tconstant part of outer offset"
+                           " from base address: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
-                                 STMT_VINFO_DR_INIT (stmt_info));
+                                 STMT_VINFO_DR_CONST_OFFSET (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter step: ");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
                                  STMT_VINFO_DR_STEP (stmt_info));
@@ -3739,8 +3737,9 @@
                            STMT_VINFO_DR_BASE_ALIGNMENT (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter base misalignment: %d\n",
                            STMT_VINFO_DR_BASE_MISALIGNMENT (stmt_info));
-              dump_printf (MSG_NOTE, "\n\touter offset alignment: %d\n",
-                           STMT_VINFO_DR_OFFSET_ALIGNMENT (stmt_info));
+              dump_printf (MSG_NOTE,
+                           "\n\touter variable offset alignment: %d\n",
+                           STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT (stmt_info));
               dump_printf (MSG_NOTE, "\n\touter step alignment: %d\n",
                            STMT_VINFO_DR_STEP_ALIGNMENT (stmt_info));
             }
@@ -4055,23 +4054,18 @@ vect_create_addr_base_for_vector_ref (gi
   innermost_loop_behavior *drb = vect_dr_behavior (dr);
   tree data_ref_base = unshare_expr (drb->base_address);
-  tree base_offset = unshare_expr (drb->offset);
-  tree init = unshare_expr (drb->init);
-
+  tree base_offset;
   if (loop_vinfo)
-    base_name = get_name (data_ref_base);
+    {
+      base_name = get_name (data_ref_base);
+      base_offset = fold_convert (sizetype, unshare_expr (drb->offset));
+    }
   else
     {
-      base_offset = ssize_int (0);
-      init = ssize_int (0);
+      base_offset = size_int (0);
       base_name = get_name (DR_REF (dr));
     }

-  /* Create base_offset */
-  base_offset = size_binop (PLUS_EXPR,
-                            fold_convert (sizetype, base_offset),
-                            fold_convert (sizetype, init));
-
   if (offset)
     {
       offset = fold_build2 (MULT_EXPR, sizetype,

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-08-21 15:50:48.664709938 +0100
+++ gcc/tree-vect-stmts.c	2017-08-22 14:54:48.633563940 +0100
@@ -5970,9 +5970,7 @@ vectorizable_store (gimple *stmt, gimple
       stride_base
         = fold_build_pointer_plus
            (unshare_expr (DR_BASE_ADDRESS (first_dr)),
-            size_binop (PLUS_EXPR,
-                        convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))),
-                        convert_to_ptrofftype (DR_INIT (first_dr))));
+            convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))));
       stride_step = fold_convert (sizetype, unshare_expr (DR_STEP (first_dr)));

       /* For a store with loop-invariant (but other than power-of-2)
@@ -6299,7 +6297,6 @@ vectorizable_store (gimple *stmt, gimple
          && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
          && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
          && integer_zerop (DR_OFFSET (first_dr))
-         && integer_zerop (DR_INIT (first_dr))
          && alias_sets_conflict_p (get_alias_set (aggr_type),
                                    get_alias_set (TREE_TYPE (ref_type))))
        {
@@ -6993,9 +6990,7 @@ vectorizable_load (gimple *stmt, gimple_
        stride_base
          = fold_build_pointer_plus
              (DR_BASE_ADDRESS (first_dr),
-              size_binop (PLUS_EXPR,
-                          convert_to_ptrofftype (DR_OFFSET (first_dr)),
-                          convert_to_ptrofftype (DR_INIT (first_dr))));
+              convert_to_ptrofftype (DR_OFFSET (first_dr)));
        stride_step = fold_convert (sizetype, DR_STEP (first_dr));

        /* For a load with
           loop-invariant (but other than power-of-2)

@@ -7394,7 +7389,6 @@ vectorizable_load (gimple *stmt, gimple_
          && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
          && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
          && integer_zerop (DR_OFFSET (first_dr))
-         && integer_zerop (DR_INIT (first_dr))
          && alias_sets_conflict_p (get_alias_set (aggr_type),
                                    get_alias_set (TREE_TYPE (ref_type)))
          && (alignment_support_scheme == dr_aligned
@@ -7417,8 +7411,8 @@ vectorizable_load (gimple *stmt, gimple_
                = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
              tree diff = fold_convert (sizetype,
                                        size_binop (MINUS_EXPR,
-                                                   DR_INIT (first_dr),
-                                                   DR_INIT (ptrdr)));
+                                                   DR_CONST_OFFSET (first_dr),
+                                                   DR_CONST_OFFSET (ptrdr)));
              dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
                                             stmt, diff);
            }

Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
--- /dev/null	2017-08-21 17:11:06.647307681 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-22 14:54:48.628563940 +0100
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_unpack } */
+
+double p[1000];
+double q[1000];
+
+void
+f1 (void)
+{
+  for (unsigned int i = 0; i < 1000; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+void
+f2 (void)
+{
+  for (unsigned int i = 0; i < 500; i += 6)
+    for (unsigned int j = 0; j < 500; j += 4)
+      {
+        double a = q[j] + p[i];
+        double b = q[j + 1] + p[i + 1];
+        q[i] = a;
+        q[i + 1] = b;
+      }
+}
+
+void
+f3 (void)
+{
+  for (unsigned int i = 2; i < 1000; i += 4)
+    {
+      double a = q[i - 2] + p[i - 2];
+      double b = q[i - 1] + p[i - 1];
+      q[i - 2] = a;
+      q[i - 1] = b;
+    }
+}
+
+void
+f4 (unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } */
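One way to sanity-check the tree.c change above: a chrec {left, +, right}
only ever takes the values left + k * right, so any power of two that
divides both operands divides every value, and min (ctz (left),
ctz (right)) is a safe lower bound.  That is what keeps
highest_pow2_factor, and hence the new var_offset_alignment, meaningful
when the offset is a chrec.  A minimal standalone sketch of the
reasoning, with plain integers standing in for the INTEGER_CST operands
(chrec_ctz here is an illustrative stand-in, not a GCC function):

    #include <stdio.h>

    /* Mirror of the new POLYNOMIAL_CHREC case in tree_ctz: take the
       trailing-zero count of each operand, with the same early exit
       when the CHREC_LEFT count is already 0.  The "64" cap stands in
       for tree_ctz's precision-based cap for zero values.  */
    static int
    chrec_ctz (unsigned long left, unsigned long right)
    {
      int ret1 = left ? __builtin_ctzl (left) : 64;
      if (ret1 == 0)
        return ret1;
      int ret2 = right ? __builtin_ctzl (right) : 64;
      return ret1 < ret2 ? ret1 : ret2;
    }

    int
    main (void)
    {
      /* {8, +, 32} takes the values 8, 40, 72, ...: all multiples of 8,
         so at least three trailing zero bits are guaranteed.  */
      printf ("guaranteed ctz of {8, +, 32}: %d\n", chrec_ctz (8, 32));
      for (unsigned long k = 0; k < 4; k++)
        printf ("iteration %lu: value %lu has ctz %d\n",
                k, 8 + 32 * k, __builtin_ctzl (8 + 32 * k));
      return 0;
    }

For a var_offset like {0, +, 32}_1 in the testcase this gives a 32-byte
alignment guarantee, with const_offset then acting as the misalignment
within it, matching use 1 in the new innermost_loop_behavior comment.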
On Tue, Aug 22, 2017 at 4:19 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford >>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford >>>>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>>>>>>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>>>>>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>>>>>> The first loop in the testcase regressed after my recent changes to >>>>>>>>>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>>>>>>>>> for bb analysis and end up with: >>>>>>>>>>> >>>>>>>>>>> DR_BASE_ADDRESS: p or q >>>>>>>>>>> DR_OFFSET: 0 >>>>>>>>>>> DR_INIT: 0 or 4 >>>>>>>>>>> DR_STEP: 16 >>>>>>>>>>> >>>>>>>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>>>>>>>>> >>>>>>>>>>> DR_BASE_ADDRESS: p or q >>>>>>>>>>> DR_OFFSET: (intptr_t) i >>>>>>>>>>> DR_INIT: 0 or 4 >>>>>>>>>>> DR_STEP: 0 >>>>>>>>>>> >>>>>>>>>>> This is also what we'd like to have for the unsigned "i", but the >>>>>>>>>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>>>>>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>>>>>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>>>>>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>>>>>>>>> longer seem to be related to the [i] ones. >>>>>>>>>> >>>>>>>>>> Didn't read the change in detail, so sorry if I mis-understood >>>>>>>>>> the issue. >>>>>>>>>> I made changes in scev to better fold type conversion by >>>>>>>>>> various sources >>>>>>>>>> of information, for example, vrp, niters, undefined overflow >>>>>>>>>> behavior etc. >>>>>>>>>> In theory these information should be available for other >>>>>>>>>> optimizers without >>>>>>>>>> querying scev. For the mentioned test, vrp should compute >>>>>>>>>> accurate range >>>>>>>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>>>>>>>>> worrying >>>>>>>>>> overflow. Note we don't do it in generic folding because >>>>>>>>>> (intptr_t) (i) + 1 >>>>>>>>>> could be more expensive (especially in case of (T)(i + j)), or >>>>>>>>>> because the >>>>>>>>>> CST part is in bigger precision after conversion. >>>>>>>>>> But such folding is wanted in several places, e.g, IVOPTs. To >>>>>>>>>> provide such >>>>>>>>>> an interface, we changed tree-affine and made it do aggressive >>>>>>>>>> fold. I am >>>>>>>>>> curious if it's possible to use aff_tree to implement >>>>>>>>>> strip_constant_offset >>>>>>>>>> here since aggressive folding is wanted. After all, using >>>>>>>>>> additional chrec >>>>>>>>>> looks like a little heavy wrto the simple test. >>>>>>>>> >>>>>>>>> Yeah, using aff_tree does work here when the bounds are constant. 
>>>>>>>>> It doesn't look like it works for things like: >>>>>>>>> >>>>>>>>> double p[1000]; >>>>>>>>> double q[1000]; >>>>>>>>> >>>>>>>>> void >>>>>>>>> f4 (unsigned int n) >>>>>>>>> { >>>>>>>>> for (unsigned int i = 0; i < n; i += 4) >>>>>>>>> { >>>>>>>>> double a = q[i] + p[i]; >>>>>>>>> double b = q[i + 1] + p[i + 1]; >>>>>>>>> q[i] = a; >>>>>>>>> q[i + 1] = b; >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> though, where the bounds on the global arrays guarantee that [i + 1] can't >>>>>>>>> overflow, even though "n" is unconstrained. The patch as posted handles >>>>>>>>> this case too. >>>>>>>> BTW is this a missed optimization in value range analysis? The range >>>>>>>> information for i should flow in a way like: array boundary -> niters >>>>>>>> -> scev/vrp. >>>>>>>> I think that's what niters/scev do in analysis. >>>>>>> >>>>>>> Yeah, maybe :-) It looks like the problem is that when SLP runs, >>>>>>> the previous VRP pass came before loop header copying, so the (single) >>>>>>> header has to cope with n == 0 case. Thus we get: >>>>>> Ah, there are several passes in between vrp and pass_ch, not sure if >>>>>> any such pass depends on vrp intensively. I would suggestion reorder >>>>>> the two passes, or standalone VRP interface updating information for >>>>>> loop region after header copied? This is a non-trivial issue that >>>>>> needs to be fixed. Niters analyzer rely on >>>>>> simplify_using_initial_conditions heavily to get the same information, >>>>>> which in my opinion should be provided by VRP. Though this won't be >>>>>> able to obsolete simplify_using_initial_conditions because VRP is weak >>>>>> in symbolic range... >>>>>> >>>>>>> >>>>>>> Visiting statement: >>>>>>> i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; >>>>>>> Intersecting >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 0] >>>>>>> to >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Intersecting >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 1000] >>>>>>> to >>>>>>> [0, 0] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Found new range for i_15: [0, 0] >>>>>>> >>>>>>> Visiting statement: >>>>>>> _3 = i_15 + 1; >>>>>>> Match-and-simplified i_15 + 1 to 1 >>>>>>> Intersecting >>>>>>> [1, 1] >>>>>>> and >>>>>>> [0, +INF] >>>>>>> to >>>>>>> [1, 1] >>>>>>> Found new range for _3: [1, 1] >>>>>>> >>>>>>> (where _3 is the index we care about), followed by: >>>>>>> >>>>>>> Visiting statement: >>>>>>> i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; >>>>>>> Intersectings/aarch64-linux/trunk-orig/debug/gcc' >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 4] >>>>>>> to >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Intersecting >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> and >>>>>>> [0, 1000] >>>>>>> to >>>>>>> [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) >>>>>>> Found new range for i_15: [0, n_9(D) + 4294967295] >>>>>>> >>>>>>> Visiting statement: >>>>>>> _3 = i_15 + 1; >>>>>>> Intersecting >>>>>>> [0, +INF] >>>>>>> and >>>>>>> [0, +INF] >>>>>>> to >>>>>>> [0, +INF] >>>>>>> Found new range for _3: [0, +INF] >>>>>>> >>>>>>> I guess in this case it would be better to intersect the i_15 ranges >>>>>>> to [0, 1000] rather than [0, n_9(D) + 4294967295]. >>>>>>> >>>>>>> It does work if another VRP pass runs after CH. But even then, >>>>>>> is it a good idea to rely on the range info being kept up-to-date >>>>>>> all the way through to SLP? 
A lot happens inbetween. >>>>>> To some extend yes. Now I understand that SCEV uses niters >>>>>> information to prove no_overflow. Niters analysis does infer better >>>>>> information from array boundary, while VRP fails to do that. I don't >>>>>> worry much about gap between vrp pass and slp, it's basically the same >>>>>> as niters. Both information are analyzed at one point and meant to be >>>>>> used by following passes. It's also not common for vrp information >>>>>> become invalid given we are on SSA? >>>>> >>>>> I'm not worried so much about vrp information becoming invalid on >>>>> an SSA name that existed when VRP was run. It's more a question >>>>> of what happens about SSA names that get introduced after VRP, >>>>> e.g. by things like dom, reassoc, PRE, etc. >>>> For induction variables in concern, these passes shouldn't >>>> aggressively introduces new variables I think. >>>>> >>>>>> Now that data address is not analyzed against loop, VRP would be the >>>>>> only information we can use unless adding back scev analysis. IMHO, >>>>>> the patch is doing so in another way than before. It requires >>>>>> additional chrec data structure. I remember the previous patch >>>>>> enables more slp vectorization, is it because of "step" information >>>>>> from scev? >>>>> >>>>> Do you mean that: >>>>> >>>>> 2017-07-03 Richard Sandiford <richard.sandiford@linaro.org> >>>>> >>>>> * tree-data-ref.c (dr_analyze_innermost): Replace the "nest" >>>>> parameter with a "loop" parameter and use it instead of the >>>>> loop containing DR_STMT. Don't check simple_iv when doing >>>>> BB analysis. Describe the two analysis modes in the comment. >>>>> >>>>> enabled more SLP vectorisation in bb-slp-pr65935.c? That was due >>>>> to us not doing IV analysis for BB vectorisation, and ensuring that >>>>> the step was always zero. >>>> Which means vectorizer code can handle not IV-analyzed offset, but >>>> can't for analyzed form? >>>>> >>>>>> In this patch, step information is simply discarded. I am >>>>>> wondering if possible to always analyze scev within innermost loop for >>>>>> slp while discards step information. >>>>> >>>>> Well, we don't calculate a step for bb analysis (i.e. it's not the case >>>>> the patch calculates step information and then simply discards it). >>>>> I don't see how that would work. For bb analysis, the DR_OFFSET + DR_INIT >>>>> has to give the offset for every execution of the block, not just the >>>>> first iteration of the containing loop. So if we get back a nonzero >>>>> step, we have to do something with it. >>>> Yeah. >>>>> >>>>> But: >>>>> >>>>> (a) the old simple_iv analysis is more expensive than simply calling >>>>> analyze_scev, so I don't think this is a win in terms of complexity. >>>>> >>>>> (b) for bb analysis, there's nothing particularly special about the >>>>> innermost loop. It makes more sense to analyse it in the innermost >>>>> loop for which the offset is invariant, as shown by the second >>>>> testcase in the patch. >>>>> >>>>> (c) The patch helps with loop vectorisation too, since analysing the >>>>> starting DR_OFFSET in the context of the containing loop can help >>>>> in a similar way as analysing the full offset does for SLP. >>>> >>>> I have to admit I am not very much into this method. It complicates >>>> structure as well as code. >>>> Mostly because now dr_init are split into two different fields and one >>>> of it is lazily computed. 
>>>> >>>> For example: >>>>> @@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer >>>>> vect_no_alias_p (struct data_reference *a, struct data_reference *b, >>>>> tree segment_length_a, tree segment_length_b) >>>>> { >>>>> - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST >>>>> - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); >>>>> - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) >>>>> + gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST >>>>> + && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST); >>>>> + if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b))) >>>>> return false; >>>>> >>>>> - tree seg_a_min = DR_INIT (a); >>>>> + tree seg_a_min = DR_CHREC_INIT (a); >>>>> tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), >>>>> seg_a_min, segment_length_a); >>>>> /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT >>>>> @@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference * >>>>> tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); >>>>> seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), >>>>> seg_a_max, unit_size); >>>>> - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), >>>>> - DR_INIT (a), unit_size); >>>>> + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)), >>>>> + DR_CHREC_INIT (a), unit_size); >>>>> } >>>>> - tree seg_b_min = DR_INIT (b); >>>>> + tree seg_b_min = DR_CHREC_INIT (b); >>>>> tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), >>>>> seg_b_min, segment_length_b); >>>>> if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) >>>> >>>> Use of DR_INIT is simply replaced by DR_CHREC_INIT. Is it safe to do >>>> so in case of non-ZERO >>>> DR_INIT? It worries me that I may need to think twice before >>>> referring to DR_INIT because it's >>>> not clear when DR_OFFSET is split and DR_CHREC_INIT becomes non-ZERO. >>>> It may simply >>>> because I am too dumb to handle this. I will leave this to richi. >>> >>> I'm trying to understand this a bit (not liking it very much in its >>> current form). >>> >>> Can code currently using DR_OFFSET and DR_INIT simply use >>> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET >>> if DR_CHREC_OFFSET is not a CHREC)? If yes, would there be any downside >>> in doing that? If not, then why? > > There's nothing particularly special about the CHREC_LEFT for users of > the drs. The chrec as a whole describes the variable part of the offset. > >>> That said, I'm all for making DR info more powerful. But I detest >>> having extra fields >>> and adding confusion as to when to use which. Thus if we can make >>> DR_CHREC_INIT >>> be DR_INIT and DR_OFFSET an inline function accessing CHREC_LEFT if >>> suitable plus exposing DR_OFFSET_CHREC that would make me more happy. >> >> And if we want to make it opt-in do a dr_analyze_me_harder () which will >> re-write DR_OFFSET/INIT into the new form. >> >> But DR_OFFSET and DR_INIT (read) accessors would maintain their >> semantics while DR_OFFSET_CHREC would expose the CHREC if >> available. > > After the changes, the only place that actually cared about the split > between the "old" DR_OFFSET and DR_INIT was tree-predcom.c: > ref_at_iteration. Everywhere else just wanted the sum of OFFSET and INIT, > and for them it would have been more convenient to have a combined field. 
>
> So maybe one way of trying to avoid the confusion would be to keep
> DR_OFFSET together as the full starting offset from the base, then
> provide DR_VAR_OFFSET and DR_CONST_OFFSET as the split forms, with
> DR_VAR_OFFSET being more like an abstract value number.  See the comment
> at the start of the patch below for more details.
>
> I'm not sure whether the main reason for splitting the offset was to
> make life easier for the users of the drs, or whether it was to try
> to avoid creating new trees.  But then, the unconditional scev analysis
> that we did previously already generated new trees.

Make life easier, and allow the same base + variable offset but differing
const offset to be detected easily -- a[i] vs. a[i+1].

@@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a
 bool
 dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
-                      struct loop *loop)
+                      gimple *stmt, struct loop *loop)
 {

please amend the function comment with what STMT is about (DR_STMT
I suppose).

@@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh
       init = size_binop (PLUS_EXPR, init, dinit);
       base_misalignment -= TREE_INT_CST_LOW (dinit);

-  split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
-  init = size_binop (PLUS_EXPR, init, dinit);
-
-  step = size_binop (PLUS_EXPR,
-                     fold_convert (ssizetype, base_iv.step),
-                     fold_convert (ssizetype, offset_iv.step));
-
   base = canonicalize_base_object_address (base_iv.base);
+  tree offset = size_binop (PLUS_EXPR,
+                            fold_convert (ssizetype, offset_iv.base),
+                            init);

so you remove the split_constant_offset handling on offset_iv.base.
This may end up no longer walking and expanding def stmts of SSA names
contained therein.  I suppose this is fully intended so that re-computing
the ref address using DR_BASE/DR_OFFSET doesn't end up expanding that
redundant code?  For analysis relying on this one now needs to resort to
DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar
expansions?  Just something to watch out for ...

@@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh
     }

   drb->base_address = base;
-  drb->offset = fold_convert (ssizetype, offset_iv.base);
-  drb->init = init;
+  drb->offset = offset;
   drb->step = step;
+  split_constant_offset (scev, &drb->var_offset, &drb->const_offset);

so is the full-fledged split_constant_offset (with its SSA name handling)
still needed here?  Something to eventually address with a followup.

@@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i
   tree ref_op2 = NULL_TREE;
   tree new_offset;

+  split_constant_offset (DR_OFFSET (dr), &off, &coff);
   if (iter != 0)
     {
       new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));

likewise here?  Why do you think ref_at_iteration cares?  Is that because
of codegen quality?  I'd have done with coff == size_zero_node plus
simplifications that arise from that.

Thanks,
Richard.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Richard
>
>
> 2017-08-22  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         PR tree-optimization/81635
>         * tree-data-ref.h (innermost_loop_behavior): Remove init field.
>         Add var_offset and const_offset fields.  Rename offset_alignment
>         to var_offset_alignment.
>         (DR_INIT): Delete.
>         (DR_CONST_OFFSET, DR_VAR_OFFSET): New macros.
>         (DR_OFFSET_ALIGNMENT): Replace with...
>         (DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
>         (dr_analyze_innermost): Add a gimple * argument.
>         (dr_equal_offsets_p): Delete.
>         (dr_var_offsets_equal_p, dr_var_offsets_compare): Declare.
>         * tree-vectorizer.h (STMT_VINFO_DR_INIT): Delete.
> 	(STMT_VINFO_DR_VAR_OFFSET, STMT_VINFO_DR_CONST_OFFSET): New macros.
> 	(STMT_VINFO_DR_OFFSET_ALIGNMENT): Replace with...
> 	(STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT): ...this new macro.
> 	* tree.c (tree_ctz): Handle POLYNOMIAL_CHREC.
> 	* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> 	(split_constant_offset): Handle POLYNOMIAL_CHREC.
> 	(analyze_offset_scev): New function.
> 	(dr_analyze_innermost): Add a gimple * argument.  Update after
> 	changes to innermost_loop_behavior.  Initialize var_offset and
> 	const_offset.
> 	(create_data_ref): Update call to dr_analyze_innermost.
> 	Update dump after changes to innermost_loop_behavior.
> 	(operator ==): Use dr_var_offsets_equal_p and compare the
> 	DR_CONST_OFFSETs.
> 	(prune_runtime_alias_test_list): Likewise.
> 	(comp_dr_with_seg_len_pair): Use dr_var_offsets_compare and compare
> 	the DR_CONST_OFFSETs.
> 	(create_intersect_range_checks): Use DR_OFFSET without adding
> 	DR_INIT.
> 	(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> 	(dr_alignment): Use const_offset instead of init and
> 	var_offset_alignment instead of offset_alignment.
> 	* tree-if-conv.c (innermost_loop_behavior_hash::hash): Don't
> 	test the init field.
> 	(innermost_loop_behavior_hash::equal): Likewise.
> 	(ifcvt_memrefs_wont_trap): Likewise.
> 	(if_convertible_loop_p_1): Likewise.
> 	* tree-loop-distribution.c (build_addr_arg_loc): Use DR_OFFSET
> 	without adding DR_INIT.
> 	(build_rdg_partition_for_vertex): Don't check DR_INIT.
> 	(share_memory_accesses): Likewise.
> 	(pg_add_dependence_edges): Likewise.
> 	(compute_alias_check_pairs): Use dr_var_offsets_compare.
> 	* tree-predcom.c (aff_combination_dr_offset): Use DR_OFFSET without
> 	adding DR_INIT.
> 	(determine_offset): Likewise.
> 	(valid_initializer_p): Likewise.
> 	(find_looparound_phi): Update call to dr_analyze_innermost.
> 	(ref_at_iteration): Use split_constant_offset to split the offset.
> 	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
> 	const_offset instead of init and var_offset_alignment instead of
> 	offset_alignment.
> 	(vect_find_same_alignment_drs): Use dr_var_offsets_compare and
> 	compare the DR_CONST_OFFSETs.
> 	(dr_group_sort_cmp): Likewise.
> 	(vect_analyze_group_access_1): Use DR_CONST_OFFSET instead of DR_INIT.
> 	(vect_no_alias_p): Likewise.
> 	(vect_analyze_data_ref_accesses): Use dr_var_offsets_equal_p and
> 	compare the DR_CONST_OFFSETs.
> 	(vect_prune_runtime_alias_test_list): Use dr_var_offsets_compare.
> 	(vect_analyze_data_refs): Don't check DR_INIT and use DR_OFFSET
> 	without adding DR_INIT.  Use DR_VAR_OFFSET_ALIGNMENT instead of
> 	DR_OFFSET_ALIGNMENT.  Update call to dr_analyze_innermost, and
> 	update subsequent dump.
> 	(vect_create_addr_base_for_vector_ref): Use DR_OFFSET without
> 	adding DR_INIT.
> 	* tree-vect-stmts.c (vectorizable_store): Likewise.
> 	(vectorizable_load): Likewise.  Use DR_CONST_OFFSET instead
> 	of DR_INIT.
>
> gcc/testsuite/
> 	PR tree-optimization/81635
> 	* gcc.dg/vect/bb-slp-pr81635.c: New test.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h 2017-08-21 10:42:51.088530428 +0100
> +++ gcc/tree-data-ref.h 2017-08-22 14:54:48.630563940 +0100
> @@ -28,29 +28,44 @@ #define GCC_TREE_DATA_REF_H
> innermost_loop_behavior describes the evolution of the address of the memory
> reference in the innermost enclosing loop.  The address is expressed as
> BASE + STEP * # of iteration, and base is further decomposed as the base
> - pointer (BASE_ADDRESS), loop invariant offset (OFFSET) and
> - constant offset (INIT).
Examples, in loop nest > + pointer (BASE_ADDRESS) and the loop invariant offset (OFFSET). > + OFFSET is further expressed as the sum of a zero or non-constant term > + (VAR_OFFSET) and a constant term (CONST_OFFSET). VAR_OFFSET should be > + treated as an abstract representation; in particular, it may contain > + chrecs. CONST_OFFSET is always an INTEGER_CST. > > - for (i = 0; i < 100; i++) > - for (j = 3; j < 100; j++) > + Examples, in loop nest > > - Example 1 Example 2 > - data-ref a[j].b[i][j] *(p + x + 16B + 4B * j) > + 1: for (i = 0; i < 100; i++) > + 2: for (j = 3; j < 100; j++) > > + Example 1 Example 2 > + data-ref a[j].b[i][j] *(p + x + 16B + 4B * j) > > - innermost_loop_behavior > - base_address &a p > - offset i * D_i x > - init 3 * D_j + offsetof (b) 28 > - step D_j 4 > > - */ > + innermost_loop_behavior > + base_address &a p > + offset i * D_i + 3 * D_j + offsetof (b) x + 28 > + var_offset {0, +, D_i}_1 x (or an equiv. chrec) > + const_offset 3 * D_j + offsetof (b) 28 > + step D_j 4 > + > + The main two uses of VAR_OFFSET and CONST_OFFSET are: > + > + 1. to better analyze the alignment, since CONST_OFFSET can be treated as > + the misalignment wrt the alignment of VAR_OFFSET. > + > + 2. to find data references that are a constant number of bytes apart. > + If two data references have the same BASE_ADDRESS and VAR_OFFSET, > + the distance between them is given by the difference in their > + CONST_OFFSETs. */ > struct innermost_loop_behavior > { > tree base_address; > tree offset; > - tree init; > tree step; > + tree var_offset; > + tree const_offset; > > /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes > from an alignment boundary of BASE_ALIGNMENT bytes. For example, > @@ -91,7 +106,7 @@ struct innermost_loop_behavior > /* The largest power of two that divides OFFSET, capped to a suitably > high value if the offset is zero. This is a byte rather than a bit > quantity. */ > - unsigned int offset_alignment; > + unsigned int var_offset_alignment; > > /* Likewise for STEP. 
*/ > unsigned int step_alignment; > @@ -186,12 +201,13 @@ #define DR_IS_WRITE(DR) (!DR_ > #define DR_IS_CONDITIONAL_IN_STMT(DR) (DR)->is_conditional_in_stmt > #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address > #define DR_OFFSET(DR) (DR)->innermost.offset > -#define DR_INIT(DR) (DR)->innermost.init > +#define DR_VAR_OFFSET(DR) (DR)->innermost.var_offset > +#define DR_CONST_OFFSET(DR) (DR)->innermost.const_offset > #define DR_STEP(DR) (DR)->innermost.step > #define DR_PTR_INFO(DR) (DR)->alias.ptr_info > #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment > #define DR_BASE_MISALIGNMENT(DR) (DR)->innermost.base_misalignment > -#define DR_OFFSET_ALIGNMENT(DR) (DR)->innermost.offset_alignment > +#define DR_VAR_OFFSET_ALIGNMENT(DR) (DR)->innermost.var_offset_alignment > #define DR_STEP_ALIGNMENT(DR) (DR)->innermost.step_alignment > #define DR_INNERMOST(DR) (DR)->innermost > > @@ -412,7 +428,8 @@ #define DDR_REVERSED_P(DDR) (DDR)->rever > #define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > -bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *); > +bool dr_analyze_innermost (innermost_loop_behavior *, tree, > + gimple *, struct loop *); > extern bool compute_data_dependences_for_loop (struct loop *, bool, > vec<loop_p> *, > vec<data_reference_p> *, > @@ -466,8 +483,6 @@ dr_alignment (data_reference *dr) > > extern bool dr_may_alias_p (const struct data_reference *, > const struct data_reference *, bool); > -extern bool dr_equal_offsets_p (struct data_reference *, > - struct data_reference *); > > extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); > extern int data_ref_compare_tree (tree, tree); > @@ -675,4 +690,23 @@ lambda_matrix_new (int m, int n, struct > return mat; > } > > +/* Check if DRA and DRB have equal DR_VAR_OFFSETs. */ > + > +inline bool > +dr_var_offsets_equal_p (struct data_reference *dra, > + struct data_reference *drb) > +{ > + return eq_evolutions_p (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); > +} > + > +/* Compare the DR_VAR_OFFSETs of DRA and DRB for sorting purposes, > + returning a qsort-style result. 
*/ > + > +inline int > +dr_var_offsets_compare (struct data_reference *dra, > + struct data_reference *drb) > +{ > + return data_ref_compare_tree (DR_VAR_OFFSET (dra), DR_VAR_OFFSET (drb)); > +} > + > #endif /* GCC_TREE_DATA_REF_H */ > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-08-04 11:42:45.939105152 +0100 > +++ gcc/tree-vectorizer.h 2017-08-22 14:54:48.633563940 +0100 > @@ -740,14 +740,15 @@ #define STMT_VINFO_VEC_CONST_COND_REDUC_ > > #define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop > #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address > -#define STMT_VINFO_DR_INIT(S) (S)->dr_wrt_vec_loop.init > #define STMT_VINFO_DR_OFFSET(S) (S)->dr_wrt_vec_loop.offset > +#define STMT_VINFO_DR_VAR_OFFSET(S) (S)->dr_wrt_vec_loop.var_offset > +#define STMT_VINFO_DR_CONST_OFFSET(S) (S)->dr_wrt_vec_loop.const_offset > #define STMT_VINFO_DR_STEP(S) (S)->dr_wrt_vec_loop.step > #define STMT_VINFO_DR_BASE_ALIGNMENT(S) (S)->dr_wrt_vec_loop.base_alignment > #define STMT_VINFO_DR_BASE_MISALIGNMENT(S) \ > (S)->dr_wrt_vec_loop.base_misalignment > -#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \ > - (S)->dr_wrt_vec_loop.offset_alignment > +#define STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT(S) \ > + (S)->dr_wrt_vec_loop.var_offset_alignment > #define STMT_VINFO_DR_STEP_ALIGNMENT(S) \ > (S)->dr_wrt_vec_loop.step_alignment > > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2017-08-21 12:14:47.159835474 +0100 > +++ gcc/tree.c 2017-08-22 14:54:48.634563941 +0100 > @@ -2601,6 +2601,12 @@ tree_ctz (const_tree expr) > return MIN (ret1, prec); > } > return 0; > + case POLYNOMIAL_CHREC: > + ret1 = tree_ctz (CHREC_LEFT (expr)); > + if (ret1 == 0) > + return ret1; > + ret2 = tree_ctz (CHREC_RIGHT (expr)); > + return MIN (ret1, ret2); > default: > return 0; > } > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-data-ref.c 2017-08-22 14:54:48.629563940 +0100 > @@ -86,6 +86,7 @@ Software Foundation; either version 3, o > #include "expr.h" > #include "gimple-iterator.h" > #include "tree-ssa-loop-niter.h" > +#include "tree-ssa-loop-ivopts.h" > #include "tree-ssa-loop.h" > #include "tree-ssa.h" > #include "cfgloop.h" > @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v > *off = ssize_int (0); > STRIP_NOPS (exp); > > - if (tree_is_chrec (exp) > + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) > + { > + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); > + if (op0 != CHREC_LEFT (exp)) > + { > + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), > + op0, CHREC_RIGHT (exp)); > + *off = op1; > + } > + return; > + } > + > + if (automatically_generated_chrec_p (exp) > || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) > return; > > @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a > return build_fold_addr_expr (TREE_OPERAND (addr, 0)); > } > > -/* Analyze the behavior of memory reference REF. There are two modes: > +/* Analyze the scalar evolution of OFFSET in the innermost parent of > + LOOP for which it isn't invariant. Return OFFSET itself if the > + value is invariant or if it's too complex to analyze. 
*/ > + > +static tree > +analyze_offset_scev (struct loop *loop, tree offset) > +{ > + struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset); > + if (inv_loop != NULL) > + { > + if (loop_depth (inv_loop) == 0) > + return offset; > + loop = loop_outer (inv_loop); > + } > + tree res = analyze_scalar_evolution (loop, offset); > + if (chrec_contains_undetermined (res)) > + return offset; > + return res; > +} > + > +/* Analyze the behavior of memory reference REF, which occurs in STMT. > + There are two modes: > > - BB analysis. In this case we simply split the address into base, > init and offset components, without reference to any containing loop. > @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a > > bool > dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, > - struct loop *loop) > + gimple *stmt, struct loop *loop) > { > HOST_WIDE_INT pbitsize, pbitpos; > tree base, poffset; > machine_mode pmode; > int punsignedp, preversep, pvolatilep; > affine_iv base_iv, offset_iv; > - tree init, dinit, step; > + tree dinit; > bool in_loop = (loop && loop->num); > > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -885,7 +919,7 @@ dr_analyze_innermost (innermost_loop_beh > } > } > > - init = ssize_int (pbitpos / BITS_PER_UNIT); > + tree init = ssize_int (pbitpos / BITS_PER_UNIT); > > /* Subtract any constant component from the base and add it to INIT instead. > Adjust the misalignment to reflect the amount we subtracted. */ > @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh > init = size_binop (PLUS_EXPR, init, dinit); > base_misalignment -= TREE_INT_CST_LOW (dinit); > > - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); > - init = size_binop (PLUS_EXPR, init, dinit); > - > - step = size_binop (PLUS_EXPR, > - fold_convert (ssizetype, base_iv.step), > - fold_convert (ssizetype, offset_iv.step)); > - > base = canonicalize_base_object_address (base_iv.base); > + tree offset = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, offset_iv.base), > + init); > + tree step = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, base_iv.step), > + fold_convert (ssizetype, offset_iv.step)); > + tree scev = analyze_offset_scev (loop_containing_stmt (stmt), offset); > > /* See if get_pointer_alignment can guarantee a higher alignment than > the one we calculated above. */ > @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh > } > > drb->base_address = base; > - drb->offset = fold_convert (ssizetype, offset_iv.base); > - drb->init = init; > + drb->offset = offset; > drb->step = step; > + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); > drb->base_alignment = base_alignment; > drb->base_misalignment = base_misalignment & (base_alignment - 1); > - drb->offset_alignment = highest_pow2_factor (offset_iv.base); > + drb->var_offset_alignment = highest_pow2_factor (drb->var_offset); > drb->step_alignment = highest_pow2_factor (step); > > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -1154,7 +1188,7 @@ create_data_ref (loop_p nest, loop_p loo > DR_IS_READ (dr) = is_read; > DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt; > > - dr_analyze_innermost (&DR_INNERMOST (dr), memref, > + dr_analyze_innermost (&DR_INNERMOST (dr), memref, stmt, > nest != NULL ? 
loop : NULL); > dr_analyze_indices (dr, nest, loop); > dr_analyze_alias (dr); > @@ -1166,15 +1200,17 @@ create_data_ref (loop_p nest, loop_p loo > print_generic_expr (dump_file, DR_BASE_ADDRESS (dr), TDF_SLIM); > fprintf (dump_file, "\n\toffset from base address: "); > print_generic_expr (dump_file, DR_OFFSET (dr), TDF_SLIM); > - fprintf (dump_file, "\n\tconstant offset from base address: "); > - print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM); > + fprintf (dump_file, "\n\tvariable part of offset: "); > + print_generic_expr (dump_file, DR_VAR_OFFSET (dr), TDF_SLIM); > + fprintf (dump_file, "\n\tconstant part of offset: "); > + print_generic_expr (dump_file, DR_CONST_OFFSET (dr), TDF_SLIM); > fprintf (dump_file, "\n\tstep: "); > print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM); > fprintf (dump_file, "\n\tbase alignment: %d", DR_BASE_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tbase misalignment: %d", > DR_BASE_MISALIGNMENT (dr)); > - fprintf (dump_file, "\n\toffset alignment: %d", > - DR_OFFSET_ALIGNMENT (dr)); > + fprintf (dump_file, "\n\tvariable offset alignment: %d", > + DR_VAR_OFFSET_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr)); > fprintf (dump_file, "\n\tbase_object: "); > print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM); > @@ -1324,11 +1360,12 @@ runtime_alias_check_p (ddr_p ddr, struct > operator == (const dr_with_seg_len& d1, > const dr_with_seg_len& d2) > { > - return operand_equal_p (DR_BASE_ADDRESS (d1.dr), > + return (operand_equal_p (DR_BASE_ADDRESS (d1.dr), > DR_BASE_ADDRESS (d2.dr), 0) > - && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0 > - && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0 > - && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0; > + && dr_var_offsets_equal_p (d1.dr, d2.dr) > + && data_ref_compare_tree (DR_CONST_OFFSET (d1.dr), > + DR_CONST_OFFSET (d2.dr)) == 0 > + && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0); > } > > /* Comparison function for sorting objects of dr_with_seg_len_pair_t > @@ -1360,17 +1397,15 @@ comp_dr_with_seg_len_pair (const void *p > if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr), > DR_STEP (b2.dr))) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr), > - DR_OFFSET (b1.dr))) != 0) > + if ((comp_res = dr_var_offsets_compare (a1.dr, b1.dr)) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr), > - DR_INIT (b1.dr))) != 0) > + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a1.dr), > + DR_CONST_OFFSET (b1.dr))) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr), > - DR_OFFSET (b2.dr))) != 0) > + if ((comp_res = dr_var_offsets_compare (a2.dr, b2.dr)) != 0) > return comp_res; > - if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr), > - DR_INIT (b2.dr))) != 0) > + if ((comp_res = data_ref_compare_tree (DR_CONST_OFFSET (a2.dr), > + DR_CONST_OFFSET (b2.dr))) != 0) > return comp_res; > > return 0; > @@ -1455,10 +1490,9 @@ prune_runtime_alias_test_list (vec<dr_wi > > if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr), > DR_BASE_ADDRESS (dr_a2->dr), 0) > - || !operand_equal_p (DR_OFFSET (dr_a1->dr), > - DR_OFFSET (dr_a2->dr), 0) > - || !tree_fits_shwi_p (DR_INIT (dr_a1->dr)) > - || !tree_fits_shwi_p (DR_INIT (dr_a2->dr))) > + || !dr_var_offsets_equal_p (dr_a1->dr, dr_a2->dr) > + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a1->dr)) > + || !tree_fits_shwi_p (DR_CONST_OFFSET (dr_a2->dr))) > continue; > > /* Only merge 
const step data references. */ > @@ -1484,11 +1518,13 @@ prune_runtime_alias_test_list (vec<dr_wi > continue; > > /* Make sure dr_a1 starts left of dr_a2. */ > - if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr))) > + if (tree_int_cst_lt (DR_CONST_OFFSET (dr_a2->dr), > + DR_CONST_OFFSET (dr_a1->dr))) > std::swap (*dr_a1, *dr_a2); > > bool do_remove = false; > - wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)); > + wide_int diff = wi::sub (DR_CONST_OFFSET (dr_a2->dr), > + DR_CONST_OFFSET (dr_a1->dr)); > wide_int min_seg_len_b; > tree new_seg_len; > > @@ -1756,10 +1792,6 @@ create_intersect_range_checks (struct lo > tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr); > tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr); > > - offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a), > - offset_a, DR_INIT (dr_a.dr)); > - offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b), > - offset_b, DR_INIT (dr_b.dr)); > addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a); > addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b); > > @@ -1826,48 +1858,6 @@ create_runtime_alias_checks (struct loop > } > } > > -/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical > - expressions. */ > -static bool > -dr_equal_offsets_p1 (tree offset1, tree offset2) > -{ > - bool res; > - > - STRIP_NOPS (offset1); > - STRIP_NOPS (offset2); > - > - if (offset1 == offset2) > - return true; > - > - if (TREE_CODE (offset1) != TREE_CODE (offset2) > - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) > - return false; > - > - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), > - TREE_OPERAND (offset2, 0)); > - > - if (!res || !BINARY_CLASS_P (offset1)) > - return res; > - > - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), > - TREE_OPERAND (offset2, 1)); > - > - return res; > -} > - > -/* Check if DRA and DRB have equal offsets. */ > -bool > -dr_equal_offsets_p (struct data_reference *dra, > - struct data_reference *drb) > -{ > - tree offset1, offset2; > - > - offset1 = DR_OFFSET (dra); > - offset2 = DR_OFFSET (drb); > - > - return dr_equal_offsets_p1 (offset1, offset2); > -} > - > /* Returns true if FNA == FNB. */ > > static bool > @@ -5083,13 +5073,13 @@ dr_alignment (innermost_loop_behavior *d > /* Get the alignment of BASE_ADDRESS + INIT. */ > unsigned int alignment = drb->base_alignment; > unsigned int misalignment = (drb->base_misalignment > - + TREE_INT_CST_LOW (drb->init)); > + + TREE_INT_CST_LOW (drb->const_offset)); > if (misalignment != 0) > alignment = MIN (alignment, misalignment & -misalignment); > > /* Cap it to the alignment of OFFSET. */ > if (!integer_zerop (drb->offset)) > - alignment = MIN (alignment, drb->offset_alignment); > + alignment = MIN (alignment, drb->var_offset_alignment); > > /* Cap it to the alignment of STEP. 
*/ > if (!integer_zerop (drb->step)) > Index: gcc/tree-if-conv.c > =================================================================== > --- gcc/tree-if-conv.c 2017-07-13 09:25:12.666266733 +0100 > +++ gcc/tree-if-conv.c 2017-08-22 14:54:48.630563940 +0100 > @@ -149,7 +149,6 @@ innermost_loop_behavior_hash::hash (cons > > hash = iterative_hash_expr (e->base_address, 0); > hash = iterative_hash_expr (e->offset, hash); > - hash = iterative_hash_expr (e->init, hash); > return iterative_hash_expr (e->step, hash); > } > > @@ -161,8 +160,6 @@ innermost_loop_behavior_hash::equal (con > || (!e1->base_address && e2->base_address) > || (!e1->offset && e2->offset) > || (e1->offset && !e2->offset) > - || (!e1->init && e2->init) > - || (e1->init && !e2->init) > || (!e1->step && e2->step) > || (e1->step && !e2->step)) > return false; > @@ -173,9 +170,6 @@ innermost_loop_behavior_hash::equal (con > if (e1->offset && e2->offset > && !operand_equal_p (e1->offset, e2->offset, 0)) > return false; > - if (e1->init && e2->init > - && !operand_equal_p (e1->init, e2->init, 0)) > - return false; > if (e1->step && e2->step > && !operand_equal_p (e1->step, e2->step, 0)) > return false; > @@ -856,8 +850,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v > innermost_loop_behavior *innermost = &DR_INNERMOST (a); > > gcc_assert (DR_STMT (a) == stmt); > - gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) > - || DR_INIT (a) || DR_STEP (a)); > + gcc_assert (DR_BASE_ADDRESS (a) || DR_OFFSET (a) || DR_STEP (a)); > > master_dr = innermost_DR_map->get (innermost); > gcc_assert (master_dr != NULL); > @@ -1433,8 +1426,7 @@ if_convertible_loop_p_1 (struct loop *lo > if (TREE_CODE (ref) == COMPONENT_REF > || TREE_CODE (ref) == IMAGPART_EXPR > || TREE_CODE (ref) == REALPART_EXPR > - || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) > - || DR_INIT (dr) || DR_STEP (dr))) > + || !(DR_BASE_ADDRESS (dr) || DR_OFFSET (dr) || DR_STEP (dr))) > { > while (TREE_CODE (ref) == COMPONENT_REF > || TREE_CODE (ref) == IMAGPART_EXPR > Index: gcc/tree-loop-distribution.c > =================================================================== > --- gcc/tree-loop-distribution.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-loop-distribution.c 2017-08-22 14:54:48.630563940 +0100 > @@ -903,8 +903,7 @@ build_addr_arg_loc (location_t loc, data > { > tree addr_base; > > - addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr)); > - addr_base = fold_convert_loc (loc, sizetype, addr_base); > + addr_base = fold_convert_loc (loc, sizetype, DR_OFFSET (dr)); > > /* Test for a negative stride, iterating over every element. */ > if (tree_int_cst_sgn (DR_STEP (dr)) == -1) > @@ -1289,8 +1288,7 @@ build_rdg_partition_for_vertex (struct g > > /* Partition can only be executed sequentially if there is any > unknown data reference. 
*/ > - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) > - || !DR_INIT (dr) || !DR_STEP (dr)) > + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) > partition->type = PTYPE_SEQUENTIAL; > > bitmap_set_bit (partition->datarefs, idx); > @@ -1507,21 +1505,18 @@ share_memory_accesses (struct graph *rdg > { > dr1 = datarefs_vec[i]; > > - if (!DR_BASE_ADDRESS (dr1) > - || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) > + if (!DR_BASE_ADDRESS (dr1) || !DR_OFFSET (dr1) || !DR_STEP (dr1)) > continue; > > EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) > { > dr2 = datarefs_vec[j]; > > - if (!DR_BASE_ADDRESS (dr2) > - || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) > + if (!DR_BASE_ADDRESS (dr2) || !DR_OFFSET (dr2) || !DR_STEP (dr2)) > continue; > > if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) > && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) > - && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) > && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) > return true; > } > @@ -1705,7 +1700,6 @@ pg_add_dependence_edges (struct graph *r > runtime alias check. */ > if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2) > || !DR_OFFSET (dr1) || !DR_OFFSET (dr2) > - || !DR_INIT (dr1) || !DR_INIT (dr2) > || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1)) > || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2)) > || res == 0) > @@ -2203,7 +2197,7 @@ compute_alias_check_pairs (struct loop * > DR_BASE_ADDRESS (dr_b)); > > if (comp_res == 0) > - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); > + comp_res = dr_var_offsets_compare (dr_a, dr_b); > gcc_assert (comp_res != 0); > > if (latch_dominated_by_data_ref (loop, dr_a)) > Index: gcc/tree-predcom.c > =================================================================== > --- gcc/tree-predcom.c 2017-08-10 14:36:08.057471267 +0100 > +++ gcc/tree-predcom.c 2017-08-22 14:54:48.631563940 +0100 > @@ -666,18 +666,13 @@ suitable_reference_p (struct data_refere > return true; > } > > -/* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ > +/* Stores DR_OFFSET (DR) to OFFSET. */ > > static void > aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) > { > - tree type = TREE_TYPE (DR_OFFSET (dr)); > - aff_tree delta; > - > - tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, > - &name_expansions); > - aff_combination_const (&delta, type, wi::to_widest (DR_INIT (dr))); > - aff_combination_add (offset, &delta); > + tree_to_aff_combination_expand (DR_OFFSET (dr), TREE_TYPE (DR_OFFSET (dr)), > + offset, &name_expansions); > } > > /* Determines number of iterations of the innermost enclosing loop before B > @@ -710,8 +705,7 @@ determine_offset (struct data_reference > /* If the references have loop invariant address, check that they access > exactly the same location. */ > *off = 0; > - return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0) > - && operand_equal_p (DR_INIT (a), DR_INIT (b), 0)); > + return operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0); > } > > /* Compare the offsets of the addresses, and check whether the difference > @@ -1171,8 +1165,7 @@ valid_initializer_p (struct data_referen > /* If the address of the reference is invariant, initializer must access > exactly the same location. 
*/ > if (integer_zerop (DR_STEP (root))) > - return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0) > - && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0)); > + return operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0); > > /* Verify that this index of REF is equal to the root's index at > -DISTANCE-th iteration. */ > @@ -1247,7 +1240,7 @@ find_looparound_phi (struct loop *loop, > memset (&init_dr, 0, sizeof (struct data_reference)); > DR_REF (&init_dr) = init_ref; > DR_STMT (&init_dr) = phi; > - if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, loop)) > + if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, phi, loop)) > return NULL; > > if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) > @@ -1481,8 +1474,7 @@ replace_ref_with (gimple *stmt, tree new > ref_at_iteration (data_reference_p dr, int iter, > gimple_seq *stmts, tree niters = NULL_TREE) > { > - tree off = DR_OFFSET (dr); > - tree coff = DR_INIT (dr); > + tree off, coff; > tree ref = DR_REF (dr); > enum tree_code ref_code = ERROR_MARK; > tree ref_type = NULL_TREE; > @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i > tree ref_op2 = NULL_TREE; > tree new_offset; > > + split_constant_offset (DR_OFFSET (dr), &off, &coff); > if (iter != 0) > { > new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-08-21 10:42:51.088530428 +0100 > +++ gcc/tree-vect-data-refs.c 2017-08-22 14:54:48.632563940 +0100 > @@ -866,7 +866,7 @@ vect_compute_data_ref_alignment (struct > base_misalignment = (*entry)->base_misalignment; > } > > - if (drb->offset_alignment < vector_alignment > + if (drb->var_offset_alignment < vector_alignment > || !step_preserves_misalignment_p > /* We need to know whether the step wrt the vectorized loop is > negative when computing the starting misalignment below. */ > @@ -928,7 +928,7 @@ vect_compute_data_ref_alignment (struct > base_misalignment = 0; > } > unsigned int misalignment = (base_misalignment > - + TREE_INT_CST_LOW (drb->init)); > + + TREE_INT_CST_LOW (drb->const_offset)); > > /* If this is a backward running DR then first access in the larger > vectype actually is N-1 elements before the address in the DR. > @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat > return; > > if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) > - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) > + || !dr_var_offsets_equal_p (dra, drb) > || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) > return; > > /* Two references with distance zero have the same alignment. */ > - offset_int diff = (wi::to_offset (DR_INIT (dra)) > - - wi::to_offset (DR_INIT (drb))); > + offset_int diff = (wi::to_offset (DR_CONST_OFFSET (dra)) > + - wi::to_offset (DR_CONST_OFFSET (drb))); > if (diff != 0) > { > /* Get the wider of the two alignments. */ > @@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data > gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt)); > struct data_reference *data_ref = dr; > unsigned int count = 1; > - tree prev_init = DR_INIT (data_ref); > + tree prev_init = DR_CONST_OFFSET (data_ref); > gimple *prev = stmt; > HOST_WIDE_INT diff, gaps = 0; > > @@ -2444,9 +2444,10 @@ vect_analyze_group_access_1 (struct data > data-ref (supported only for loads), we vectorize only the first > stmt, and the rest get their vectorized loads from the first > one. 
*/ > - if (!tree_int_cst_compare (DR_INIT (data_ref), > - DR_INIT (STMT_VINFO_DATA_REF ( > - vinfo_for_stmt (next))))) > + data_reference *next_ref > + = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); > + if (!tree_int_cst_compare (DR_CONST_OFFSET (data_ref), > + DR_CONST_OFFSET (next_ref))) > { > if (DR_IS_WRITE (data_ref)) > { > @@ -2469,14 +2470,14 @@ vect_analyze_group_access_1 (struct data > } > > prev = next; > - data_ref = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); > + data_ref = next_ref; > > /* All group members have the same STEP by construction. */ > gcc_checking_assert (operand_equal_p (DR_STEP (data_ref), step, 0)); > > /* Check that the distance between two accesses is equal to the type > size. Otherwise, we have gaps. */ > - diff = (TREE_INT_CST_LOW (DR_INIT (data_ref)) > + diff = (TREE_INT_CST_LOW (DR_CONST_OFFSET (data_ref)) > - TREE_INT_CST_LOW (prev_init)) / type_size; > if (diff != 1) > { > @@ -2499,7 +2500,7 @@ vect_analyze_group_access_1 (struct data > gap in the access, GROUP_GAP is always 1. */ > GROUP_GAP (vinfo_for_stmt (next)) = diff; > > - prev_init = DR_INIT (data_ref); > + prev_init = DR_CONST_OFFSET (data_ref); > next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next)); > /* Count the number of data-refs in the chain. */ > count++; > @@ -2715,13 +2716,10 @@ dr_group_sort_cmp (const void *dra_, con > return cmp; > } > > - /* And according to DR_OFFSET. */ > - if (!dr_equal_offsets_p (dra, drb)) > - { > - cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)); > - if (cmp != 0) > - return cmp; > - } > + /* And according to DR_VAR_OFFSET. */ > + cmp = dr_var_offsets_compare (dra, drb); > + if (cmp != 0) > + return cmp; > > /* Put reads before writes. */ > if (DR_IS_READ (dra) != DR_IS_READ (drb)) > @@ -2745,8 +2743,9 @@ dr_group_sort_cmp (const void *dra_, con > return cmp; > } > > - /* Then sort after DR_INIT. In case of identical DRs sort after stmt UID. */ > - cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)); > + /* Then sort after DR_CONST_OFFSET. In case of identical DRs sort after > + stmt UID. */ > + cmp = tree_int_cst_compare (DR_CONST_OFFSET (dra), DR_CONST_OFFSET (drb)); > if (cmp == 0) > return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1; > return cmp; > @@ -2817,7 +2816,7 @@ vect_analyze_data_ref_accesses (vec_info > if (DR_IS_READ (dra) != DR_IS_READ (drb) > || !operand_equal_p (DR_BASE_ADDRESS (dra), > DR_BASE_ADDRESS (drb), 0) > - || !dr_equal_offsets_p (dra, drb) > + || !dr_var_offsets_equal_p (dra, drb) > || !gimple_assign_single_p (DR_STMT (dra)) > || !gimple_assign_single_p (DR_STMT (drb))) > break; > @@ -2835,7 +2834,8 @@ vect_analyze_data_ref_accesses (vec_info > break; > > /* Do not place the same access in the interleaving chain twice. */ > - if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0) > + if (tree_int_cst_compare (DR_CONST_OFFSET (dra), > + DR_CONST_OFFSET (drb)) == 0) > break; > > /* Check the types are compatible. > @@ -2844,9 +2844,10 @@ vect_analyze_data_ref_accesses (vec_info > TREE_TYPE (DR_REF (drb)))) > break; > > - /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ > - HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); > - HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); > + /* Sorting has ensured that > + DR_CONST_OFFSET (dra) <= DR_CONST_OFFSET (drb). 
*/ > + HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CONST_OFFSET (dra)); > + HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CONST_OFFSET (drb)); > gcc_assert (init_a <= init_b); > > /* If init_b == init_a + the size of the type * k, we have an > @@ -2859,10 +2860,10 @@ vect_analyze_data_ref_accesses (vec_info > /* If we have a store, the accesses are adjacent. This splits > groups into chunks we support (we don't support vectorization > of stores with gaps). */ > + HOST_WIDE_INT prev_init > + = TREE_INT_CST_LOW (DR_CONST_OFFSET (datarefs_copy[i - 1])); > if (!DR_IS_READ (dra) > - && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW > - (DR_INIT (datarefs_copy[i-1])) > - != type_size_a)) > + && (init_b - prev_init) != type_size_a) > break; > > /* If the step (if not zero or non-constant) is greater than the > @@ -2974,12 +2975,12 @@ vect_vfa_segment_size (struct data_refer > vect_no_alias_p (struct data_reference *a, struct data_reference *b, > tree segment_length_a, tree segment_length_b) > { > - gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST > - && TREE_CODE (DR_INIT (b)) == INTEGER_CST); > - if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b))) > + gcc_assert (TREE_CODE (DR_CONST_OFFSET (a)) == INTEGER_CST > + && TREE_CODE (DR_CONST_OFFSET (b)) == INTEGER_CST); > + if (tree_int_cst_equal (DR_CONST_OFFSET (a), DR_CONST_OFFSET (b))) > return false; > > - tree seg_a_min = DR_INIT (a); > + tree seg_a_min = DR_CONST_OFFSET (a); > tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min), > seg_a_min, segment_length_a); > /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT > @@ -2990,10 +2991,10 @@ vect_no_alias_p (struct data_reference * > tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a))); > seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max), > seg_a_max, unit_size); > - seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)), > - DR_INIT (a), unit_size); > + seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (a)), > + DR_CONST_OFFSET (a), unit_size); > } > - tree seg_b_min = DR_INIT (b); > + tree seg_b_min = DR_CONST_OFFSET (b); > tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min), > seg_b_min, segment_length_b); > if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0) > @@ -3001,8 +3002,8 @@ vect_no_alias_p (struct data_reference * > tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b))); > seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max), > seg_b_max, unit_size); > - seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)), > - DR_INIT (b), unit_size); > + seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CONST_OFFSET (b)), > + DR_CONST_OFFSET (b), unit_size); > } > > if (tree_int_cst_le (seg_a_max, seg_b_min) > @@ -3148,8 +3149,7 @@ vect_prune_runtime_alias_test_list (loop > comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a), > DR_BASE_ADDRESS (dr_b)); > if (comp_res == 0) > - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), > - DR_OFFSET (dr_b)); > + comp_res = dr_var_offsets_compare (dr_a, dr_b); > > /* Alias is known at compilation time. */ > if (comp_res == 0 > @@ -3455,7 +3455,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > gimple *stmt; > stmt_vec_info stmt_info; > - tree base, offset, init; > + tree base, offset; > enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE; > bool simd_lane_access = false; > int vf; > @@ -3488,8 +3488,7 @@ vect_analyze_data_refs (vec_info *vinfo, > } > > /* Check that analysis of the data-ref succeeded. 
*/ > - if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr) > - || !DR_STEP (dr)) > + if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_STEP (dr)) > { > bool maybe_gather > = DR_IS_READ (dr) > @@ -3515,7 +3514,6 @@ vect_analyze_data_refs (vec_info *vinfo, > gcc_assert (newdr != NULL && DR_REF (newdr)); > if (DR_BASE_ADDRESS (newdr) > && DR_OFFSET (newdr) > - && DR_INIT (newdr) > && DR_STEP (newdr) > && integer_zerop (DR_STEP (newdr))) > { > @@ -3523,8 +3521,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > tree off = DR_OFFSET (newdr); > STRIP_NOPS (off); > - if (TREE_CODE (DR_INIT (newdr)) == INTEGER_CST > - && TREE_CODE (off) == MULT_EXPR > + if (TREE_CODE (off) == MULT_EXPR > && tree_fits_uhwi_p (TREE_OPERAND (off, 1))) > { > tree step = TREE_OPERAND (off, 1); > @@ -3555,7 +3552,7 @@ vect_analyze_data_refs (vec_info *vinfo, > { > DR_OFFSET (newdr) = ssize_int (0); > DR_STEP (newdr) = step; > - DR_OFFSET_ALIGNMENT (newdr) > + DR_VAR_OFFSET_ALIGNMENT (newdr) > = BIGGEST_ALIGNMENT; > DR_STEP_ALIGNMENT (newdr) > = highest_pow2_factor (step); > @@ -3665,7 +3662,6 @@ vect_analyze_data_refs (vec_info *vinfo, > > base = unshare_expr (DR_BASE_ADDRESS (dr)); > offset = unshare_expr (DR_OFFSET (dr)); > - init = unshare_expr (DR_INIT (dr)); > > if (is_gimple_call (stmt) > && (!gimple_call_internal_p (stmt) > @@ -3701,9 +3697,7 @@ vect_analyze_data_refs (vec_info *vinfo, > inner loop: *(BASE + INIT + OFFSET). By construction, > this address must be invariant in the inner loop, so we > can consider it as being used in the outer loop. */ > - tree init_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), > - init, offset); > - tree init_addr = fold_build_pointer_plus (base, init_offset); > + tree init_addr = fold_build_pointer_plus (base, offset); > tree init_ref = build_fold_indirect_ref (init_addr); > > if (dump_enabled_p ()) > @@ -3715,7 +3709,7 @@ vect_analyze_data_refs (vec_info *vinfo, > } > > if (!dr_analyze_innermost (&STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info), > - init_ref, loop)) > + init_ref, stmt, loop)) > /* dr_analyze_innermost already explained the failure. 
*/
> return false;
>
> @@ -3728,10 +3722,14 @@ vect_analyze_data_refs (vec_info *vinfo,
> dump_printf (MSG_NOTE, "\n\touter offset from base address: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> STMT_VINFO_DR_OFFSET (stmt_info));
> - dump_printf (MSG_NOTE,
> - "\n\touter constant offset from base address: ");
> + dump_printf (MSG_NOTE, "\n\tvariable part of outer offset"
> + " from base address: ");
> + dump_generic_expr (MSG_NOTE, TDF_SLIM,
> + STMT_VINFO_DR_VAR_OFFSET (stmt_info));
> + dump_printf (MSG_NOTE, "\n\tconstant part of outer offset"
> + " from base address: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> - STMT_VINFO_DR_INIT (stmt_info));
> + STMT_VINFO_DR_CONST_OFFSET (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter step: ");
> dump_generic_expr (MSG_NOTE, TDF_SLIM,
> STMT_VINFO_DR_STEP (stmt_info));
> @@ -3739,8 +3737,9 @@ vect_analyze_data_refs (vec_info *vinfo,
> STMT_VINFO_DR_BASE_ALIGNMENT (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter base misalignment: %d\n",
> STMT_VINFO_DR_BASE_MISALIGNMENT (stmt_info));
> - dump_printf (MSG_NOTE, "\n\touter offset alignment: %d\n",
> - STMT_VINFO_DR_OFFSET_ALIGNMENT (stmt_info));
> + dump_printf (MSG_NOTE,
> + "\n\touter variable offset alignment: %d\n",
> + STMT_VINFO_DR_VAR_OFFSET_ALIGNMENT (stmt_info));
> dump_printf (MSG_NOTE, "\n\touter step alignment: %d\n",
> STMT_VINFO_DR_STEP_ALIGNMENT (stmt_info));
> }
> @@ -4055,23 +4054,18 @@ vect_create_addr_base_for_vector_ref (gi
> innermost_loop_behavior *drb = vect_dr_behavior (dr);
>
> tree data_ref_base = unshare_expr (drb->base_address);
> - tree base_offset = unshare_expr (drb->offset);
> - tree init = unshare_expr (drb->init);
> -
> + tree base_offset;
> if (loop_vinfo)
> - base_name = get_name (data_ref_base);
> + {
> + base_name = get_name (data_ref_base);
> + base_offset = fold_convert (sizetype, unshare_expr (drb->offset));
> + }
> else
> {
> - base_offset = ssize_int (0);
> - init = ssize_int (0);
> + base_offset = size_int (0);
> base_name = get_name (DR_REF (dr));
> }
>
> - /* Create base_offset */
> - base_offset = size_binop (PLUS_EXPR,
> - fold_convert (sizetype, base_offset),
> - fold_convert (sizetype, init));
> -
> if (offset)
> {
> offset = fold_build2 (MULT_EXPR, sizetype,
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c 2017-08-21 15:50:48.664709938 +0100
> +++ gcc/tree-vect-stmts.c 2017-08-22 14:54:48.633563940 +0100
> @@ -5970,9 +5970,7 @@ vectorizable_store (gimple *stmt, gimple
> stride_base
> = fold_build_pointer_plus
> (unshare_expr (DR_BASE_ADDRESS (first_dr)),
> - size_binop (PLUS_EXPR,
> - convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))),
> - convert_to_ptrofftype (DR_INIT (first_dr))));
> + convert_to_ptrofftype (unshare_expr (DR_OFFSET (first_dr))));
> stride_step = fold_convert (sizetype, unshare_expr (DR_STEP (first_dr)));
>
> /* For a store with loop-invariant (but other than power-of-2)
> @@ -6299,7 +6297,6 @@ vectorizable_store (gimple *stmt, gimple
> && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
> && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
> && integer_zerop (DR_OFFSET (first_dr))
> - && integer_zerop (DR_INIT (first_dr))
> && alias_sets_conflict_p (get_alias_set (aggr_type),
> get_alias_set (TREE_TYPE (ref_type))))
> {
> @@ -6993,9 +6990,7 @@ vectorizable_load (gimple *stmt, gimple_
> stride_base
> = fold_build_pointer_plus
> (DR_BASE_ADDRESS (first_dr),
> - size_binop (PLUS_EXPR,
> - convert_to_ptrofftype (DR_OFFSET
(first_dr)), > - convert_to_ptrofftype (DR_INIT (first_dr)))); > + convert_to_ptrofftype (DR_OFFSET (first_dr))); > stride_step = fold_convert (sizetype, DR_STEP (first_dr)); > > /* For a load with loop-invariant (but other than power-of-2) > @@ -7394,7 +7389,6 @@ vectorizable_load (gimple *stmt, gimple_ > && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR > && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0)) > && integer_zerop (DR_OFFSET (first_dr)) > - && integer_zerop (DR_INIT (first_dr)) > && alias_sets_conflict_p (get_alias_set (aggr_type), > get_alias_set (TREE_TYPE (ref_type))) > && (alignment_support_scheme == dr_aligned > @@ -7417,8 +7411,8 @@ vectorizable_load (gimple *stmt, gimple_ > = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr)); > tree diff = fold_convert (sizetype, > size_binop (MINUS_EXPR, > - DR_INIT (first_dr), > - DR_INIT (ptrdr))); > + DR_CONST_OFFSET (first_dr), > + DR_CONST_OFFSET (ptrdr))); > dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, > stmt, diff); > } > Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c > =================================================================== > --- /dev/null 2017-08-21 17:11:06.647307681 +0100 > +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c 2017-08-22 14:54:48.628563940 +0100 > @@ -0,0 +1,56 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_unpack } */ > + > +double p[1000]; > +double q[1000]; > + > +void > +f1 (void) > +{ > + for (unsigned int i = 0; i < 1000; i += 4) > + { > + double a = q[i] + p[i]; > + double b = q[i + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +void > +f2 (void) > +{ > + for (unsigned int i = 0; i < 500; i += 6) > + for (unsigned int j = 0; j < 500; j += 4) > + { > + double a = q[j] + p[i]; > + double b = q[j + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +void > +f3 (void) > +{ > + for (unsigned int i = 2; i < 1000; i += 4) > + { > + double a = q[i - 2] + p[i - 2]; > + double b = q[i - 1] + p[i - 1]; > + q[i - 2] = a; > + q[i - 1] = b; > + } > +} > + > +void > +f4 (unsigned int n) > +{ > + for (unsigned int i = 0; i < n; i += 4) > + { > + double a = q[i] + p[i]; > + double b = q[i + 1] + p[i + 1]; > + q[i] = a; > + q[i + 1] = b; > + } > +} > + > +/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } */
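[Editor's note: a worked example, using f1 from the testcase above, of the
new POLYNOMIAL_CHREC case in split_constant_offset; the concrete trees are
assumed for illustration rather than taken from compiler output.]

    /* In f1, with double elements and i += 4, the byte offsets of the
       q[i] and q[i + 1] accesses evolve in loop 1 as:

         q[i]      ->  {0, +, 32}_1
         q[i + 1]  ->  {8, +, 32}_1

       split_constant_offset recurses on CHREC_LEFT and peels off the
       constant term:

         {8, +, 32}_1  ->  var_offset {0, +, 32}_1, const_offset 8

       so both accesses end up with the same DR_VAR_OFFSET and differ
       only in DR_CONST_OFFSET, which is what
       vect_analyze_data_ref_accesses needs in order to group them.  */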
Richard Biener <richard.guenther@gmail.com> writes: > @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a > > bool > dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, > - struct loop *loop) > + gimple *stmt, struct loop *loop) > { > > please amend the function comment with what STMT is about > (DR_STMT I suppose). I'd done that with: ------ @@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a return build_fold_addr_expr (TREE_OPERAND (addr, 0)); } -/* Analyze the behavior of memory reference REF. There are two modes: [...] +/* Analyze the behavior of memory reference REF, which occurs in STMT. + There are two modes: - BB analysis. In this case we simply split the address into base, init and offset components, without reference to any containing loop. ------ but it wasn't obvious because of the elided stuff. > @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh > init = size_binop (PLUS_EXPR, init, dinit); > base_misalignment -= TREE_INT_CST_LOW (dinit); > > - split_constant_offset (offset_iv.base, &offset_iv.base, &dinit); > - init = size_binop (PLUS_EXPR, init, dinit); > - > - step = size_binop (PLUS_EXPR, > - fold_convert (ssizetype, base_iv.step), > - fold_convert (ssizetype, offset_iv.step)); > - > base = canonicalize_base_object_address (base_iv.base); > + tree offset = size_binop (PLUS_EXPR, > + fold_convert (ssizetype, offset_iv.base), > + init); > > so you remove the split_constant_offset handling on offset_iv.base. > This may end up no longer walking and expanding def stmts of > SSA names contained therein. I suppose this is fully intended > so that re-computing the ref address using DR_BASE/DR_OFFSET > doesn't end up expanding that redundant code? Yeah. I think DR_OFFSET is only going to be used by code that wants to construct a new address, so there doesn't seem much point taking apart the offset only to regimplify it when building a new reference. > For analysis relying on this one now needs to resort to > DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar > expansions? We still apply split_constant_offset to the DR_VAR_OFFSET as well; the scev analysis applies on top of that rather than replacing it. > Just sth to watch at ... > > @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh > } > > drb->base_address = base; > - drb->offset = fold_convert (ssizetype, offset_iv.base); > - drb->init = init; > + drb->offset = offset; > drb->step = step; > + split_constant_offset (scev, &drb->var_offset, &drb->const_offset); > > so is the full-fledged split_constant_offset (with its SSA name handling) > still needed here? Sth to eventually address with a followup. Yeah, I think we still need it. The SCEV stuff is only really useful if the starting offset has a simple evolution in a containing loop. For other cases we punt and just use the original generic expression. In particular, if we're doing SLP on a single-block function, we need everything that split_constant_offset currently does. > @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i > tree ref_op2 = NULL_TREE; > tree new_offset; > > + split_constant_offset (DR_OFFSET (dr), &off, &coff); > if (iter != 0) > { > new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)); > > likewise here? Why do you think ref_at_iteration cares? Is that because > of codegen quality? I'd have done with coff == size_zero_node plus > simplifications that arise from that. Yeah, I assumed it was codegen quality. 
But maybe it was just a way to compensate for the fact that the MEM_REF is built directly (rather than via a fold) and so this was the easiest way of getting a nonzero second operand to the MEM_REF. Thanks, Richard
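[Editor's note: a hedged sketch of the ref_at_iteration point above; the
gimple shown is schematic, not from a dump.]  With the constant split
back out, the variable part can be gimplified into the address while
coff supplies a nonzero second operand for the MEM_REF:

    addr_1 = p_2 + off_3;                   /* variable part only */
    ... = MEM[(double *) addr_1 + 8B];      /* coff as second operand */

If off and coff were kept as a single generic tree, the "+ 8" would be
forced into the gimplified address computation and the MEM_REF offset
would always be 0B.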
Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-08-04 11:42:45.938134820 +0100 +++ gcc/tree-data-ref.h 2017-08-16 14:34:56.611554296 +0100 @@ -52,6 +52,27 @@ struct innermost_loop_behavior tree init; tree step; + /* The scalar evolution of OFFSET + INIT, split into non-constant and + constant terms in the same way as OFFSET and INIT. For example, + if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be: + + chrec_offset {x, +, 3}_1 + chrec_init 4 + + If OFFSET + INIT cannot be represented as a scalar evolution, + the fields are simply a copy of OFFSET and INIT. + + This split is useful when analyzing the relationship between two + data references with the same base. OFFSET and INIT are generic + expressions that can be used to build related data references, + while CHREC_OFFSET and CHREC_INIT are our best attempt at + canonicalizing the value of OFFSET + INIT. + + These fields are only initialized lazily, by a call to + compute_offset_chrecs. */ + tree chrec_offset; + tree chrec_init; + /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes from an alignment boundary of BASE_ALIGNMENT bytes. For example, if we had: @@ -187,6 +208,8 @@ #define DR_IS_CONDITIONAL_IN_STMT(DR) (D #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address #define DR_OFFSET(DR) (DR)->innermost.offset #define DR_INIT(DR) (DR)->innermost.init +#define DR_CHREC_OFFSET(DR) (DR)->innermost.chrec_offset +#define DR_CHREC_INIT(DR) (DR)->innermost.chrec_init #define DR_STEP(DR) (DR)->innermost.step #define DR_PTR_INFO(DR) (DR)->alias.ptr_info #define DR_BASE_ALIGNMENT(DR) (DR)->innermost.base_alignment @@ -466,8 +489,10 @@ dr_alignment (data_reference *dr) extern bool dr_may_alias_p (const struct data_reference *, const struct data_reference *, bool); -extern bool dr_equal_offsets_p (struct data_reference *, - struct data_reference *); +extern bool dr_chrec_offsets_equal_p (struct data_reference *, + struct data_reference *); +extern int dr_chrec_offsets_compare (struct data_reference *, + struct data_reference *); extern bool runtime_alias_check_p (ddr_p, struct loop *, bool); extern int data_ref_compare_tree (tree, tree); Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-08-04 11:42:45.938134820 +0100 +++ gcc/tree-data-ref.c 2017-08-16 14:34:56.611554296 +0100 @@ -86,6 +86,7 @@ Software Foundation; either version 3, o #include "expr.h" #include "gimple-iterator.h" #include "tree-ssa-loop-niter.h" +#include "tree-ssa-loop-ivopts.h" #include "tree-ssa-loop.h" #include "tree-ssa.h" #include "cfgloop.h" @@ -730,7 +731,19 @@ split_constant_offset (tree exp, tree *v *off = ssize_int (0); STRIP_NOPS (exp); - if (tree_is_chrec (exp) + if (TREE_CODE (exp) == POLYNOMIAL_CHREC) + { + split_constant_offset (CHREC_LEFT (exp), &op0, &op1); + if (op0 != CHREC_LEFT (exp)) + { + *var = build3 (POLYNOMIAL_CHREC, type, CHREC_VAR (exp), + op0, CHREC_RIGHT (exp)); + *off = op1; + } + return; + } + + if (automatically_generated_chrec_p (exp) || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS) return; @@ -924,6 +937,8 @@ dr_analyze_innermost (innermost_loop_beh drb->offset = fold_convert (ssizetype, offset_iv.base); drb->init = init; drb->step = step; + drb->chrec_offset = NULL_TREE; + drb->chrec_init = NULL_TREE; drb->base_alignment = base_alignment; drb->base_misalignment = base_misalignment & (base_alignment - 1); drb->offset_alignment = highest_pow2_factor 
(offset_iv.base);
@@ -1324,11 +1339,12 @@ runtime_alias_check_p (ddr_p ddr, struct
 operator == (const dr_with_seg_len& d1,
 const dr_with_seg_len& d2)
 {
- return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
+ return (operand_equal_p (DR_BASE_ADDRESS (d1.dr),
 DR_BASE_ADDRESS (d2.dr), 0)
- && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
- && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
- && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
+ && dr_chrec_offsets_equal_p (d1.dr, d2.dr)
+ && data_ref_compare_tree (DR_CHREC_INIT (d1.dr),
+ DR_CHREC_INIT (d2.dr)) == 0
+ && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0);
 }

 /* Comparison function for sorting objects of dr_with_seg_len_pair_t
@@ -1360,17 +1376,15 @@ comp_dr_with_seg_len_pair (const void *p
 if ((comp_res = data_ref_compare_tree (DR_STEP (a2.dr),
 DR_STEP (b2.dr))) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_OFFSET (a1.dr),
- DR_OFFSET (b1.dr))) != 0)
+ if ((comp_res = dr_chrec_offsets_compare (a1.dr, b1.dr)) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_INIT (a1.dr),
- DR_INIT (b1.dr))) != 0)
+ if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a1.dr),
+ DR_CHREC_INIT (b1.dr))) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_OFFSET (a2.dr),
- DR_OFFSET (b2.dr))) != 0)
+ if ((comp_res = dr_chrec_offsets_compare (a2.dr, b2.dr)) != 0)
 return comp_res;
- if ((comp_res = data_ref_compare_tree (DR_INIT (a2.dr),
- DR_INIT (b2.dr))) != 0)
+ if ((comp_res = data_ref_compare_tree (DR_CHREC_INIT (a2.dr),
+ DR_CHREC_INIT (b2.dr))) != 0)
 return comp_res;

 return 0;
@@ -1455,10 +1469,9 @@ prune_runtime_alias_test_list (vec<dr_wi
 if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
 DR_BASE_ADDRESS (dr_a2->dr), 0)
- || !operand_equal_p (DR_OFFSET (dr_a1->dr),
- DR_OFFSET (dr_a2->dr), 0)
- || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
- || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
+ || !dr_chrec_offsets_equal_p (dr_a1->dr, dr_a2->dr)
+ || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a1->dr))
+ || !tree_fits_shwi_p (DR_CHREC_INIT (dr_a2->dr)))
 continue;

 /* Only merge const step data references.  */
@@ -1484,11 +1497,13 @@ prune_runtime_alias_test_list (vec<dr_wi
 continue;

 /* Make sure dr_a1 starts left of dr_a2.  */
- if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
+ if (tree_int_cst_lt (DR_CHREC_INIT (dr_a2->dr),
+ DR_CHREC_INIT (dr_a1->dr)))
 std::swap (*dr_a1, *dr_a2);

 bool do_remove = false;
- wide_int diff = wi::sub (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr));
+ wide_int diff = wi::sub (DR_CHREC_INIT (dr_a2->dr),
+ DR_CHREC_INIT (dr_a1->dr));
 wide_int min_seg_len_b;
 tree new_seg_len;

@@ -1826,46 +1841,83 @@ create_runtime_alias_checks (struct loop
 }
 }

-/* Check if OFFSET1 and OFFSET2 (DR_OFFSETs of some data-refs) are identical
- expressions.  */
-static bool
-dr_equal_offsets_p1 (tree offset1, tree offset2)
+/* Analyze the scalar evolution of OFFSET in the innermost parent of
+ LOOP for which it isn't invariant.  */
+
+static tree
+analyze_offset_scev (struct loop *loop, tree offset)
 {
- bool res;
+ struct loop *inv_loop = outermost_invariant_loop_for_expr (loop, offset);
+ if (inv_loop != NULL)
+ {
+ if (loop_depth (inv_loop) == 0)
+ return offset;
+ loop = loop_outer (inv_loop);
+ }
+ return analyze_scalar_evolution (loop, offset);
+}

- STRIP_NOPS (offset1);
- STRIP_NOPS (offset2);

+/* Analyze the scalar evolution of DRB->offset + DRB->init in a form
+ that is valid for use in LOOP.  Store the result in DRB->chrec_offset
+ and DRB->chrec_init.
*/ - if (offset1 == offset2) - return true; +static void +compute_offset_chrecs (struct loop *loop, innermost_loop_behavior *drb) +{ + tree scev = analyze_offset_scev (loop, drb->offset); + if (TREE_CODE (scev) == POLYNOMIAL_CHREC) + { + tree init_term; + split_constant_offset (scev, &drb->chrec_offset, &init_term); + drb->chrec_init = fold_build2 (PLUS_EXPR, ssizetype, init_term, + drb->init); + } + else + { + drb->chrec_offset = drb->offset; + drb->chrec_init = drb->init; + } +} - if (TREE_CODE (offset1) != TREE_CODE (offset2) - || (!BINARY_CLASS_P (offset1) && !UNARY_CLASS_P (offset1))) - return false; +/* Analyze the scalar evolution of DR_OFFSET (DR) + DR_INIT (DR) wrt + the loop that contains DR_STMT (DR), storing the result in + DR_CHREC_OFFSET (DR) and DR_CHREC_INIT (DR). */ - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 0), - TREE_OPERAND (offset2, 0)); +static void +compute_offset_chrecs (data_reference *dr) +{ + return compute_offset_chrecs (loop_containing_stmt (DR_STMT (dr)), + &DR_INNERMOST (dr)); +} - if (!res || !BINARY_CLASS_P (offset1)) - return res; +/* Check if DRA and DRB have equal DR_CHREC_OFFSETs, computing them + first if necessary. */ - res = dr_equal_offsets_p1 (TREE_OPERAND (offset1, 1), - TREE_OPERAND (offset2, 1)); +bool +dr_chrec_offsets_equal_p (struct data_reference *dra, + struct data_reference *drb) +{ + if (!DR_CHREC_OFFSET (dra)) + compute_offset_chrecs (dra); + if (!DR_CHREC_OFFSET (drb)) + compute_offset_chrecs (drb); - return res; + return eq_evolutions_p (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb)); } -/* Check if DRA and DRB have equal offsets. */ -bool -dr_equal_offsets_p (struct data_reference *dra, - struct data_reference *drb) -{ - tree offset1, offset2; +/* Compare the DR_CHREC_OFFSETs of DRA and DRB for sorting purposes, + computing them first if necessary. */ - offset1 = DR_OFFSET (dra); - offset2 = DR_OFFSET (drb); +int +dr_chrec_offsets_compare (struct data_reference *dra, + struct data_reference *drb) +{ + if (!DR_CHREC_OFFSET (dra)) + compute_offset_chrecs (dra); + if (!DR_CHREC_OFFSET (drb)) + compute_offset_chrecs (drb); - return dr_equal_offsets_p1 (offset1, offset2); + return data_ref_compare_tree (DR_CHREC_OFFSET (dra), DR_CHREC_OFFSET (drb)); } /* Returns true if FNA == FNB. */ Index: gcc/tree-loop-distribution.c =================================================================== --- gcc/tree-loop-distribution.c 2017-08-16 08:50:32.415427038 +0100 +++ gcc/tree-loop-distribution.c 2017-08-16 14:34:56.612554255 +0100 @@ -2204,7 +2204,7 @@ compute_alias_check_pairs (struct loop * DR_BASE_ADDRESS (dr_b)); if (comp_res == 0) - comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b)); + comp_res = dr_chrec_offsets_compare (dr_a, dr_b); gcc_assert (comp_res != 0); if (latch_dominated_by_data_ref (loop, dr_a)) Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-08-04 11:42:45.939105152 +0100 +++ gcc/tree-vect-data-refs.c 2017-08-16 14:34:56.613554213 +0100 @@ -2187,13 +2187,13 @@ vect_find_same_alignment_drs (struct dat return; if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) + || !dr_chrec_offsets_equal_p (dra, drb) || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) return; /* Two references with distance zero have the same alignment. 
-  offset_int diff = (wi::to_offset (DR_INIT (dra))
-		     - wi::to_offset (DR_INIT (drb)));
+  offset_int diff = (wi::to_offset (DR_CHREC_INIT (dra))
+		     - wi::to_offset (DR_CHREC_INIT (drb)));
   if (diff != 0)
     {
       /* Get the wider of the two alignments.  */
@@ -2434,7 +2434,7 @@ vect_analyze_group_access_1 (struct data
       gimple *next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (stmt));
       struct data_reference *data_ref = dr;
       unsigned int count = 1;
-      tree prev_init = DR_INIT (data_ref);
+      tree prev_init = DR_CHREC_INIT (data_ref);
       gimple *prev = stmt;
       HOST_WIDE_INT diff, gaps = 0;
 
@@ -2444,9 +2444,9 @@ vect_analyze_group_access_1 (struct data
 	     data-ref (supported only for loads), we vectorize only the first
 	     stmt, and the rest get their vectorized loads from the first
 	     one.  */
-	  if (!tree_int_cst_compare (DR_INIT (data_ref),
-				     DR_INIT (STMT_VINFO_DATA_REF (
-						vinfo_for_stmt (next)))))
+	  if (!tree_int_cst_compare (DR_CHREC_INIT (data_ref),
+				     DR_CHREC_INIT (STMT_VINFO_DATA_REF
+						    (vinfo_for_stmt (next)))))
 	    {
 	      if (DR_IS_WRITE (data_ref))
 		{
@@ -2476,7 +2476,7 @@ vect_analyze_group_access_1 (struct data
 
 	  /* Check that the distance between two accesses is equal to the type
 	     size. Otherwise, we have gaps.  */
-	  diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
+	  diff = (TREE_INT_CST_LOW (DR_CHREC_INIT (data_ref))
 		  - TREE_INT_CST_LOW (prev_init)) / type_size;
 	  if (diff != 1)
 	    {
@@ -2499,7 +2499,7 @@ vect_analyze_group_access_1 (struct data
 	     gap in the access, GROUP_GAP is always 1.  */
 	  GROUP_GAP (vinfo_for_stmt (next)) = diff;
 
-	  prev_init = DR_INIT (data_ref);
+	  prev_init = DR_CHREC_INIT (data_ref);
 	  next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next));
 	  /* Count the number of data-refs in the chain.  */
 	  count++;
@@ -2715,13 +2715,10 @@ dr_group_sort_cmp (const void *dra_, con
 	return cmp;
     }
 
-  /* And according to DR_OFFSET.  */
-  if (!dr_equal_offsets_p (dra, drb))
-    {
-      cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
-      if (cmp != 0)
-	return cmp;
-    }
+  /* And according to DR_CHREC_OFFSET.  */
+  cmp = dr_chrec_offsets_compare (dra, drb);
+  if (cmp != 0)
+    return cmp;
 
   /* Put reads before writes.  */
   if (DR_IS_READ (dra) != DR_IS_READ (drb))
@@ -2745,8 +2742,9 @@ dr_group_sort_cmp (const void *dra_, con
 	return cmp;
     }
 
-  /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
-  cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
+  /* Then sort after DR_CHREC_INIT.  In case of identical DRs sort after
+     stmt UID.  */
+  cmp = tree_int_cst_compare (DR_CHREC_INIT (dra), DR_CHREC_INIT (drb));
   if (cmp == 0)
     return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
   return cmp;
@@ -2817,7 +2815,7 @@ vect_analyze_data_ref_accesses (vec_info
 	  if (DR_IS_READ (dra) != DR_IS_READ (drb)
 	      || !operand_equal_p (DR_BASE_ADDRESS (dra),
 				   DR_BASE_ADDRESS (drb), 0)
-	      || !dr_equal_offsets_p (dra, drb)
+	      || !dr_chrec_offsets_equal_p (dra, drb)
 	      || !gimple_assign_single_p (DR_STMT (dra))
 	      || !gimple_assign_single_p (DR_STMT (drb)))
 	    break;
@@ -2835,7 +2833,8 @@ vect_analyze_data_ref_accesses (vec_info
 	    break;
 
 	  /* Do not place the same access in the interleaving chain twice.  */
-	  if (tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) == 0)
+	  if (tree_int_cst_compare (DR_CHREC_INIT (dra),
+				    DR_CHREC_INIT (drb)) == 0)
 	    break;
 
 	  /* Check the types are compatible.
@@ -2844,9 +2843,10 @@ vect_analyze_data_ref_accesses (vec_info
 				       TREE_TYPE (DR_REF (drb))))
 	    break;
 
-	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
-	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
-	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
+	  /* Sorting has ensured that
+	     DR_CHREC_INIT (dra) <= DR_CHREC_INIT (drb).  */
+	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_CHREC_INIT (dra));
+	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_CHREC_INIT (drb));
 	  gcc_assert (init_a <= init_b);
 
 	  /* If init_b == init_a + the size of the type * k, we have an
@@ -2859,10 +2859,10 @@ vect_analyze_data_ref_accesses (vec_info
 	  /* If we have a store, the accesses are adjacent.  This splits
 	     groups into chunks we support (we don't support vectorization
 	     of stores with gaps).  */
+	  HOST_WIDE_INT prev_init
+	    = TREE_INT_CST_LOW (DR_CHREC_INIT (datarefs_copy[i - 1]));
 	  if (!DR_IS_READ (dra)
-	      && (init_b - (HOST_WIDE_INT) TREE_INT_CST_LOW
-			     (DR_INIT (datarefs_copy[i-1]))
-		  != type_size_a))
+	      && (init_b - prev_init) != type_size_a)
 	    break;
 
 	  /* If the step (if not zero or non-constant) is greater than the
@@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer
 vect_no_alias_p (struct data_reference *a, struct data_reference *b,
 		 tree segment_length_a, tree segment_length_b)
 {
-  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
-	      && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
-  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
+  gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
+	      && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
+  if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
     return false;
 
-  tree seg_a_min = DR_INIT (a);
+  tree seg_a_min = DR_CHREC_INIT (a);
   tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
 				seg_a_min, segment_length_a);
   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
@@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference *
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
       seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
 			       seg_a_max, unit_size);
-      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
-			       DR_INIT (a), unit_size);
+      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
+			       DR_CHREC_INIT (a), unit_size);
     }
-  tree seg_b_min = DR_INIT (b);
+  tree seg_b_min = DR_CHREC_INIT (b);
   tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
 				seg_b_min, segment_length_b);
   if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
@@ -3001,8 +3001,8 @@ vect_no_alias_p (struct data_reference *
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
       seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
 			       seg_b_max, unit_size);
-      seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
-			       DR_INIT (b), unit_size);
+      seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (b)),
+			       DR_CHREC_INIT (b), unit_size);
     }
 
   if (tree_int_cst_le (seg_a_max, seg_b_min)
@@ -3148,8 +3148,7 @@ vect_prune_runtime_alias_test_list (loop
       comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
 					DR_BASE_ADDRESS (dr_b));
       if (comp_res == 0)
-	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
-					  DR_OFFSET (dr_b));
+	comp_res = dr_chrec_offsets_compare (dr_a, dr_b);
 
       /* Alias is known at compilation time.  */
       if (comp_res == 0
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-08-03 10:40:55.650125801 +0100
+++ gcc/tree-vect-stmts.c	2017-08-16 14:34:56.614554172 +0100
@@ -7423,8 +7423,8 @@ vectorizable_load (gimple *stmt, gimple_
 	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
 	  tree diff = fold_convert (sizetype,
 				    size_binop (MINUS_EXPR,
-						DR_INIT (first_dr),
-						DR_INIT (ptrdr)));
+						DR_CHREC_INIT (first_dr),
+						DR_CHREC_INIT (ptrdr)));
 	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
 					 stmt, diff);
 	}
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c
===================================================================
--- /dev/null	2017-08-15 18:19:06.926776201 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635.c	2017-08-16 14:34:56.610554337 +0100
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_unpack } */
+
+double p[1000];
+double q[1000];
+
+void
+f1 (void)
+{
+  for (unsigned int i = 0; i < 1000; i += 4)
+    {
+      double a = q[i] + p[i];
+      double b = q[i + 1] + p[i + 1];
+      q[i] = a;
+      q[i + 1] = b;
+    }
+}
+
+void
+f2 (void)
+{
+  for (unsigned int i = 0; i < 500; i += 6)
+    for (unsigned int j = 0; j < 500; j += 4)
+      {
+	double a = q[j] + p[i];
+	double b = q[j + 1] + p[i + 1];
+	q[i] = a;
+	q[i + 1] = b;
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */
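
As a footnote for readers who want to see the wrapping hazard in
isolation: for a 32-bit unsigned "i", converting to intptr_t and adding 1
do not commute once "i + 1" wraps.  The following standalone program
(purely illustrative, not part of the patch; it assumes an LP64 target,
where intptr_t is wider than unsigned int) shows the two values diverging
at the wrap point:

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      unsigned int i = UINT_MAX;
      /* The addition is done in unsigned int, wraps to 0, and only
         then gets widened.  */
      intptr_t inner = (intptr_t) (i + 1);
      /* Peeling the "+ 1" outside the conversion widens first, so
         nothing wraps.  */
      intptr_t outer = (intptr_t) i + 1;
      printf ("%lld %lld\n", (long long) inner, (long long) outer);
      return 0;
    }

On such a target this prints "0 4294967296": the two forms agree only
when the unsigned addition is known not to wrap, which is the condition
split_constant_offset has to prove before peeling the constant.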
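And a sketch of what the new fields are intended to contain for the q[]
accesses in f1 above, written in the {base, +, step}_loop notation that
the scalar evolution dumps use (hand-derived from the patch, not
verified dump output).  With "i" advancing by 4 doubles per iteration,
the byte offset of q[i] grows by 32 each time round the loop:

    access      DR_CHREC_OFFSET   DR_CHREC_INIT
    q[i]        {0, +, 32}_1      0
    q[i + 1]    {0, +, 32}_1      8

Both accesses share one canonical offset and differ only in their
constant inits, which is what lets the group analysis recognise them as
neighbours again.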