[RFC,PR82479] missing popcount builtin detection

Message ID	CAELXzTNXvW1jo4hbn3LHgo5ne=u-0tFJB5Ri3-P=1z-_YHNnFA@mail.gmail.com
State	New
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of gcc-patches-return-471991-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=x77iktAUDa8YIf3nXsKrQ+KkQ6mbslMnCsZt5Rnz4HqkK3 VOoTBlqANUNIZxgkKJjJzPZoAwd7UP04xNW/jPbGvFUurLyWMR0KqtRY2TPdZJRJ I5LinelTGqQAC+AmFqGP6SlZNITeC5WjgBPYzLsW3uzoKXoscDRQZAc4rcwz0= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org MIME-Version: 1.0 From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> Date: Thu, 25 Jan 2018 08:56:37 +1100 Message-ID: <CAELXzTNXvW1jo4hbn3LHgo5ne=u-0tFJB5Ri3-P=1z-_YHNnFA@mail.gmail.com> Subject: [RFC][PR82479] missing popcount builtin detection To: gcc-patches@gcc.gnu.org Content-Type: multipart/mixed; boundary="94eb2c0543a8361c9705638cbfc3"
Series	[RFC,PR82479] missing popcount builtin detection \| expand [RFC,PR82479] missing popcount builtin detection

Kugan Vivekanandarajah Jan. 24, 2018, 9:56 p.m. UTC

Hi All,

Here is a patch for popcount builtin detection similar to LLVM. I
would like to queue this for review for next stage 1.

1. This is done part of loop-distribution and effective for -O3 and above.
2. This does not distribute loop to detect popcount (like
memcpy/memmove). I dont think that happens in practice. Please correct
me if I am wrong.

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Thanks,
Kugan

gcc/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    PR middle-end/82479
    * tree-loop-distribution.c (handle_popcount): New.
    (pass_loop_distribution::execute): Use handle_popcount.

gcc/testsuite/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    PR middle-end/82479
    * gcc.dg/tree-ssa/popcount.c: New test.

Richard Biener Jan. 25, 2018, 9:04 a.m. UTC | #1

On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi All,

>

> Here is a patch for popcount builtin detection similar to LLVM. I

> would like to queue this for review for next stage 1.

>

> 1. This is done part of loop-distribution and effective for -O3 and above.

> 2. This does not distribute loop to detect popcount (like

> memcpy/memmove). I dont think that happens in practice. Please correct

> me if I am wrong.


But then it has no business inside loop distribution but instead is
doing final value
replacement, right?  You are pattern-matching the whole loop after all.  I think
final value replacement would already do the correct thing if you
teached number of
iteration analysis that niter for

  <bb 3> [local count: 955630224]:
  # b_11 = PHI <b_5(5), b_8(6)>
  _1 = b_11 + -1;
  b_8 = _1 & b_11;
  if (b_8 != 0)
    goto <bb 6>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 3>; [100.00%]

is related to popcount (b_5).

Richard.

> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>

> Thanks,

> Kugan

>

> gcc/ChangeLog:

>

> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     PR middle-end/82479

>     * tree-loop-distribution.c (handle_popcount): New.

>     (pass_loop_distribution::execute): Use handle_popcount.

>

> gcc/testsuite/ChangeLog:

>

> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     PR middle-end/82479

>     * gcc.dg/tree-ssa/popcount.c: New test.

Kugan Vivekanandarajah Jan. 31, 2018, 10:28 a.m. UTC | #2

Hi Richard,

Thanks for the review.
On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi All,

>>

>> Here is a patch for popcount builtin detection similar to LLVM. I

>> would like to queue this for review for next stage 1.

>>

>> 1. This is done part of loop-distribution and effective for -O3 and above.

>> 2. This does not distribute loop to detect popcount (like

>> memcpy/memmove). I dont think that happens in practice. Please correct

>> me if I am wrong.

>

> But then it has no business inside loop distribution but instead is

> doing final value

> replacement, right?  You are pattern-matching the whole loop after all.  I think

> final value replacement would already do the correct thing if you

> teached number of

> iteration analysis that niter for

>

>   <bb 3> [local count: 955630224]:

>   # b_11 = PHI <b_5(5), b_8(6)>

>   _1 = b_11 + -1;

>   b_8 = _1 & b_11;

>   if (b_8 != 0)

>     goto <bb 6>; [89.00%]

>   else

>     goto <bb 8>; [11.00%]

>

>   <bb 6> [local count: 850510900]:

>   goto <bb 3>; [100.00%]


I am looking into this approach. What should be the scalar evolution
for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
and can this be represented with the scev?

Thanks,
Kugan
>

> is related to popcount (b_5).

>

> Richard.

>

>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>

>> Thanks,

>> Kugan

>>

>> gcc/ChangeLog:

>>

>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     PR middle-end/82479

>>     * tree-loop-distribution.c (handle_popcount): New.

>>     (pass_loop_distribution::execute): Use handle_popcount.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     PR middle-end/82479

>>     * gcc.dg/tree-ssa/popcount.c: New test.

Richard Biener Jan. 31, 2018, 10:39 a.m. UTC | #3

On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> Thanks for the review.

> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi All,

>>>

>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>> would like to queue this for review for next stage 1.

>>>

>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>> 2. This does not distribute loop to detect popcount (like

>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>> me if I am wrong.

>>

>> But then it has no business inside loop distribution but instead is

>> doing final value

>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>> final value replacement would already do the correct thing if you

>> teached number of

>> iteration analysis that niter for

>>

>>   <bb 3> [local count: 955630224]:

>>   # b_11 = PHI <b_5(5), b_8(6)>

>>   _1 = b_11 + -1;

>>   b_8 = _1 & b_11;

>>   if (b_8 != 0)

>>     goto <bb 6>; [89.00%]

>>   else

>>     goto <bb 8>; [11.00%]

>>

>>   <bb 6> [local count: 850510900]:

>>   goto <bb 3>; [100.00%]

>

> I am looking into this approach. What should be the scalar evolution

> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

> and can this be represented with the scev?


No, it's not affine and thus cannot be represented.  You only need the
scalar evolution of the counting IV which is already handled and
the number of iteration analysis needs to handle the above IV - this
is the missing part.

Richard.

> Thanks,

> Kugan

>>

>> is related to popcount (b_5).

>>

>> Richard.

>>

>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>

>>> Thanks,

>>> Kugan

>>>

>>> gcc/ChangeLog:

>>>

>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     PR middle-end/82479

>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>

>>> gcc/testsuite/ChangeLog:

>>>

>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     PR middle-end/82479

>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Kugan Vivekanandarajah Feb. 1, 2018, 4:07 a.m. UTC | #4

Hi Richard,

On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> Thanks for the review.

>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi All,

>>>>

>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>> would like to queue this for review for next stage 1.

>>>>

>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>> 2. This does not distribute loop to detect popcount (like

>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>> me if I am wrong.

>>>

>>> But then it has no business inside loop distribution but instead is

>>> doing final value

>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>> final value replacement would already do the correct thing if you

>>> teached number of

>>> iteration analysis that niter for

>>>

>>>   <bb 3> [local count: 955630224]:

>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>   _1 = b_11 + -1;

>>>   b_8 = _1 & b_11;

>>>   if (b_8 != 0)

>>>     goto <bb 6>; [89.00%]

>>>   else

>>>     goto <bb 8>; [11.00%]

>>>

>>>   <bb 6> [local count: 850510900]:

>>>   goto <bb 3>; [100.00%]

>>

>> I am looking into this approach. What should be the scalar evolution

>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>> and can this be represented with the scev?

>

> No, it's not affine and thus cannot be represented.  You only need the

> scalar evolution of the counting IV which is already handled and

> the number of iteration analysis needs to handle the above IV - this

> is the missing part.

Thanks for the clarification. I am now matching this loop pattern in
number_of_iterations_exit when number_of_iterations_exit_assumptions
fails. If the pattern matches, I am inserting the _builtin_popcount in
the loop preheater and setting the loop niter with this. This will be
used by the final value replacement. Is this what you wanted?

In general, there is also a condition gating the loop like

if (b_4 != 0)
  goto loop;
else
  end:

This of course will not be removed by final value replacement. Since
popcount (0) is defined, this is redundant and could be removed but
not removed.

Thanks,
Kugan

>

> Richard.

>

>> Thanks,

>> Kugan

>>>

>>> is related to popcount (b_5).

>>>

>>> Richard.

>>>

>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>> gcc/ChangeLog:

>>>>

>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     PR middle-end/82479

>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>

>>>> gcc/testsuite/ChangeLog:

>>>>

>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     PR middle-end/82479

>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Richard Biener Feb. 1, 2018, 12:21 p.m. UTC | #5

On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> Thanks for the review.

>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi All,

>>>>>

>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>> would like to queue this for review for next stage 1.

>>>>>

>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>> 2. This does not distribute loop to detect popcount (like

>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>> me if I am wrong.

>>>>

>>>> But then it has no business inside loop distribution but instead is

>>>> doing final value

>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>> final value replacement would already do the correct thing if you

>>>> teached number of

>>>> iteration analysis that niter for

>>>>

>>>>   <bb 3> [local count: 955630224]:

>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>   _1 = b_11 + -1;

>>>>   b_8 = _1 & b_11;

>>>>   if (b_8 != 0)

>>>>     goto <bb 6>; [89.00%]

>>>>   else

>>>>     goto <bb 8>; [11.00%]

>>>>

>>>>   <bb 6> [local count: 850510900]:

>>>>   goto <bb 3>; [100.00%]

>>>

>>> I am looking into this approach. What should be the scalar evolution

>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>> and can this be represented with the scev?

>>

>> No, it's not affine and thus cannot be represented.  You only need the

>> scalar evolution of the counting IV which is already handled and

>> the number of iteration analysis needs to handle the above IV - this

>> is the missing part.

> Thanks for the clarification. I am now matching this loop pattern in

> number_of_iterations_exit when number_of_iterations_exit_assumptions

> fails. If the pattern matches, I am inserting the _builtin_popcount in

> the loop preheater and setting the loop niter with this. This will be

> used by the final value replacement. Is this what you wanted?


No, you shouldn't insert a popcount stmt but instead the niter
GENERIC tree should be a CALL_EXPR to popcount with the
appropriate argument.

> In general, there is also a condition gating the loop like

>

> if (b_4 != 0)

>   goto loop;

> else

>   end:

>

> This of course will not be removed by final value replacement. Since

> popcount (0) is defined, this is redundant and could be removed but

> not removed.


Yeah, that's probably sth for another pass though.  I suppose the
end: case just uses zero in which case you'll have a PHI and you
can optimize

  if (b != 0)
    return popcount (b);
  return 0;

in phiopt.

Richard.

> Thanks,

> Kugan

>

>>

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>>

>>>> is related to popcount (b_5).

>>>>

>>>> Richard.

>>>>

>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>> gcc/ChangeLog:

>>>>>

>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     PR middle-end/82479

>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>

>>>>> gcc/testsuite/ChangeLog:

>>>>>

>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     PR middle-end/82479

>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Kugan Vivekanandarajah Feb. 8, 2018, 12:41 a.m. UTC | #6

Hi Richard,

On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> Thanks for the review.

>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi All,

>>>>>>

>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>> would like to queue this for review for next stage 1.

>>>>>>

>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>> me if I am wrong.

>>>>>

>>>>> But then it has no business inside loop distribution but instead is

>>>>> doing final value

>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>> final value replacement would already do the correct thing if you

>>>>> teached number of

>>>>> iteration analysis that niter for

>>>>>

>>>>>   <bb 3> [local count: 955630224]:

>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>   _1 = b_11 + -1;

>>>>>   b_8 = _1 & b_11;

>>>>>   if (b_8 != 0)

>>>>>     goto <bb 6>; [89.00%]

>>>>>   else

>>>>>     goto <bb 8>; [11.00%]

>>>>>

>>>>>   <bb 6> [local count: 850510900]:

>>>>>   goto <bb 3>; [100.00%]

>>>>

>>>> I am looking into this approach. What should be the scalar evolution

>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>> and can this be represented with the scev?

>>>

>>> No, it's not affine and thus cannot be represented.  You only need the

>>> scalar evolution of the counting IV which is already handled and

>>> the number of iteration analysis needs to handle the above IV - this

>>> is the missing part.

>> Thanks for the clarification. I am now matching this loop pattern in

>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>> the loop preheater and setting the loop niter with this. This will be

>> used by the final value replacement. Is this what you wanted?

>

> No, you shouldn't insert a popcount stmt but instead the niter

> GENERIC tree should be a CALL_EXPR to popcount with the

> appropriate argument.


Thats what I tried earlier but ran into some ICEs. I wasn't sure if
niter in tree_niter_desc can take such.

Attached patch now does this. Also had to add support for CALL_EXPR in
few places to handle niter with CALL_EXPR. Does this look OK?

Thanks,
Kugan


gcc/ChangeLog:

2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
    * tree-chrec.c (chrec_convert_1): Likewise.
    * tree-scalar-evolution.c (expression_expensive_p): Likewise.
    * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
    * tree-ssa-loop-niter.c (check_popcount_pattern): New.
    (number_of_iterations_exit): Record niter for popcount patern.
    * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

gcc/testsuite/ChangeLog:

2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gcc.dg/tree-ssa/popcount.c: New test.


>

>> In general, there is also a condition gating the loop like

>>

>> if (b_4 != 0)

>>   goto loop;

>> else

>>   end:

>>

>> This of course will not be removed by final value replacement. Since

>> popcount (0) is defined, this is redundant and could be removed but

>> not removed.

>

> Yeah, that's probably sth for another pass though.  I suppose the

> end: case just uses zero in which case you'll have a PHI and you

> can optimize

>

>   if (b != 0)

>     return popcount (b);

>   return 0;

>

> in phiopt.

>

> Richard.

>

>> Thanks,

>> Kugan

>>

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>>

>>>>> is related to popcount (b_5).

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>

>>>>>> gcc/ChangeLog:

>>>>>>

>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>

>>>>>>     PR middle-end/82479

>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>

>>>>>> gcc/testsuite/ChangeLog:

>>>>>>

>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>

>>>>>>     PR middle-end/82479

>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
From 296efa2c09cdd797e3295f0c29a5a943dc87e9f2 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 8 Feb 2018 09:28:57 +1100
Subject: [PATCH] popcount v2

---
 gcc/gimple-expr.c                        |   3 +-
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c |  41 ++++++++++
 gcc/tree-chrec.c                         |   2 +
 gcc/tree-scalar-evolution.c              |   2 +
 gcc/tree-ssa-loop-ivopts.c               |  10 +++
 gcc/tree-ssa-loop-niter.c                | 124 ++++++++++++++++++++++++++++++-
 gcc/tree-ssa.c                           |   2 +-
 7 files changed, 181 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c

diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index 56caaca..3fc04ff 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -548,7 +548,8 @@ extract_ops_from_tree (tree expr, enum tree_code *subcode_p, tree *op1_p,
       *op2_p = NULL_TREE;
       *op3_p = NULL_TREE;
     }
-  else if (grhs_class == GIMPLE_SINGLE_RHS)
+  else if (grhs_class == GIMPLE_SINGLE_RHS
+	   || TREE_CODE (expr) == CALL_EXPR)
     {
       *op1_p = expr;
       *op2_p = NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c
index 924df04..098f46f 100644
--- a/gcc/tree-chrec.c
+++ b/gcc/tree-chrec.c
@@ -1382,6 +1382,8 @@ keep_cast:
 						  CHREC_RIGHT (chrec)));
       res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics, from);
     }
+  else if (TREE_CODE (chrec) == CALL_EXPR)
+    res = fold_build1 (NOP_EXPR, TREE_TYPE (TREE_TYPE (chrec)), chrec);
   else
     res = fold_convert (type, chrec);
 
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0ba1aa8..ff313c9 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3488,6 +3488,8 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR)
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index fa49abf..470ba2f 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,111 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If so, Set NITER to  __builtin_popcount (b) - 1
+    return true if we did, false otherwise.  */
+
+static bool
+check_popcount_pattern (loop_p loop, tree *niter)
+{
+  tree lhs, rhs;
+  tree dest, src;
+  gimple *and_minus_one;
+  int count = 0;
+  gphi *count_phi = NULL;
+
+  if (!single_exit (loop))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || !zerop (gimple_cond_rhs (stmt)))
+    return false;
+
+  /* Cheeck "b = b & (b - 1)" is calculated.  */
+  lhs = gimple_cond_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      ;
+  else
+    return false;
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+    return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      count_phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  /* Make sure we have a count by one and it starts from zero.  */
+  rhs = gimple_phi_arg_def (count_phi, 0);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if (!is_gimple_assign (stmt)
+      || (gimple_assign_rhs_code (stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (stmt)))
+    return false;
+  rhs = gimple_assign_rhs1 (stmt);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if (gimple_code (stmt) != GIMPLE_PHI
+      || !integer_zerop (gimple_phi_arg_def (stmt,
+					     loop_preheader_edge (loop)->dest_idx)))
+    return false;
+
+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+  var = build_call_expr (fn, 1, src);
+  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+			build_int_cst (TREE_TYPE (dest), 1));
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2546,24 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      tree count;
+      if (check_popcount_pattern (loop, &count))
+	{
+	  niter->assumptions = boolean_false_node;
+	  niter->control.base = NULL_TREE;
+	  niter->control.step = NULL_TREE;
+	  niter->control.no_overflow = false;
+	  niter->niter = count;
+	  niter->assumptions = boolean_true_node;
+	  niter->may_be_zero = boolean_false_node;
+	  niter->max = -1;
+	  niter->bound = NULL_TREE;
+	  niter->cmp = ERROR_MARK;
+	  return true;
+	}
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index ee311ce..572419c 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1047,7 +1047,7 @@ verify_ssa (bool check_modified_stmt, bool check_ssa_operands)
 	  verify_ssa_name (name, virtual_operand_p (name));
 
 	  stmt = SSA_NAME_DEF_STMT (name);
-	  if (!gimple_nop_p (stmt))
+	  if (stmt && !gimple_nop_p (stmt))
 	    {
 	      basic_block bb = gimple_bb (stmt);
 	      if (verify_def (bb, definition_block,

Richard Biener March 5, 2018, 3:24 p.m. UTC | #7

On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> Thanks for the review.

>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi All,

>>>>>>>

>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>

>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>> me if I am wrong.

>>>>>>

>>>>>> But then it has no business inside loop distribution but instead is

>>>>>> doing final value

>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>> final value replacement would already do the correct thing if you

>>>>>> teached number of

>>>>>> iteration analysis that niter for

>>>>>>

>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>   _1 = b_11 + -1;

>>>>>>   b_8 = _1 & b_11;

>>>>>>   if (b_8 != 0)

>>>>>>     goto <bb 6>; [89.00%]

>>>>>>   else

>>>>>>     goto <bb 8>; [11.00%]

>>>>>>

>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>   goto <bb 3>; [100.00%]

>>>>>

>>>>> I am looking into this approach. What should be the scalar evolution

>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>> and can this be represented with the scev?

>>>>

>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>> scalar evolution of the counting IV which is already handled and

>>>> the number of iteration analysis needs to handle the above IV - this

>>>> is the missing part.

>>> Thanks for the clarification. I am now matching this loop pattern in

>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>> the loop preheater and setting the loop niter with this. This will be

>>> used by the final value replacement. Is this what you wanted?

>>

>> No, you shouldn't insert a popcount stmt but instead the niter

>> GENERIC tree should be a CALL_EXPR to popcount with the

>> appropriate argument.

>

> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

> niter in tree_niter_desc can take such.

>

> Attached patch now does this. Also had to add support for CALL_EXPR in

> few places to handle niter with CALL_EXPR. Does this look OK?


Overall this looks ok - the patch includes changes in places that I don't think
need changes such as chrec_convert_1 or extract_ops_from_tree.
The expression_expensive_p change should be more specific than making
all calls inexpensive as well.

The verify_ssa change looks bogus, you do

+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+  var = build_call_expr (fn, 1, src);
+  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+                       build_int_cst (TREE_TYPE (dest), 1));

why do you allocate a new SSA name here?  It seems unused
as you overwrive 'var' with the CALL_EXPR immediately.

I didn't review the pattern matching thoroughly nor the exact place you
call it.  But

+      if (check_popcount_pattern (loop, &count))
+       {
+         niter->assumptions = boolean_false_node;
+         niter->control.base = NULL_TREE;
+         niter->control.step = NULL_TREE;
+         niter->control.no_overflow = false;
+         niter->niter = count;
+         niter->assumptions = boolean_true_node;
+         niter->may_be_zero = boolean_false_node;
+         niter->max = -1;
+         niter->bound = NULL_TREE;
+         niter->cmp = ERROR_MARK;
+         return true;
+       }

simply setting may_be_zero to false looks fishy.  Try
with -fno-tree-loop-ch.  Also max should not be negative,
it should be the number of bits in the IV type?

A related testcase could be that we can completely peel
a loop like the following which iterates at most 8 times:

int a[8];
void foo (unsigned char ctrl)
{
  int c = 0;
  while (ctrl)
    {
       ctrl = ctrl & (ctrl - 1);
       a[c++] = ctrl;
    }
}

This is now stage1 material so please update and re-post.  Maybe Bin has
further suggestions as well.

Thanks,
Richard.

> Thanks,

> Kugan

>

>

> gcc/ChangeLog:

>

> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>     * tree-chrec.c (chrec_convert_1): Likewise.

>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>     (number_of_iterations_exit): Record niter for popcount patern.

>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>

> gcc/testsuite/ChangeLog:

>

> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     * gcc.dg/tree-ssa/popcount.c: New test.

>

>

>>

>>> In general, there is also a condition gating the loop like

>>>

>>> if (b_4 != 0)

>>>   goto loop;

>>> else

>>>   end:

>>>

>>> This of course will not be removed by final value replacement. Since

>>> popcount (0) is defined, this is redundant and could be removed but

>>> not removed.

>>

>> Yeah, that's probably sth for another pass though.  I suppose the

>> end: case just uses zero in which case you'll have a PHI and you

>> can optimize

>>

>>   if (b != 0)

>>     return popcount (b);

>>   return 0;

>>

>> in phiopt.

>>

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>

>>>>

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>>

>>>>>> is related to popcount (b_5).

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>

>>>>>>> gcc/ChangeLog:

>>>>>>>

>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>

>>>>>>>     PR middle-end/82479

>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>

>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>

>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>

>>>>>>>     PR middle-end/82479

>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Bin.Cheng March 6, 2018, 4:20 p.m. UTC | #8

On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> Thanks for the review.

>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi All,

>>>>>>>>

>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>

>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>> me if I am wrong.

>>>>>>>

>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>> doing final value

>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>> final value replacement would already do the correct thing if you

>>>>>>> teached number of

>>>>>>> iteration analysis that niter for

>>>>>>>

>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>   _1 = b_11 + -1;

>>>>>>>   b_8 = _1 & b_11;

>>>>>>>   if (b_8 != 0)

>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>   else

>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>

>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>

>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>> and can this be represented with the scev?

>>>>>

>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>> scalar evolution of the counting IV which is already handled and

>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>> is the missing part.

>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>> the loop preheater and setting the loop niter with this. This will be

>>>> used by the final value replacement. Is this what you wanted?

>>>

>>> No, you shouldn't insert a popcount stmt but instead the niter

>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>> appropriate argument.

>>

>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>> niter in tree_niter_desc can take such.

>>

>> Attached patch now does this. Also had to add support for CALL_EXPR in

>> few places to handle niter with CALL_EXPR. Does this look OK?

>

> Overall this looks ok - the patch includes changes in places that I don't think

> need changes such as chrec_convert_1 or extract_ops_from_tree.

> The expression_expensive_p change should be more specific than making

> all calls inexpensive as well.

>

> The verify_ssa change looks bogus, you do

>

> +  dest = gimple_phi_result (count_phi);

> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +

> +  var = build_call_expr (fn, 1, src);

> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

> +                       build_int_cst (TREE_TYPE (dest), 1));

>

> why do you allocate a new SSA name here?  It seems unused

> as you overwrive 'var' with the CALL_EXPR immediately.

>

> I didn't review the pattern matching thoroughly nor the exact place you

> call it.  But

>

> +      if (check_popcount_pattern (loop, &count))

> +       {

> +         niter->assumptions = boolean_false_node;

> +         niter->control.base = NULL_TREE;

> +         niter->control.step = NULL_TREE;

> +         niter->control.no_overflow = false;

> +         niter->niter = count;

> +         niter->assumptions = boolean_true_node;

> +         niter->may_be_zero = boolean_false_node;

> +         niter->max = -1;

> +         niter->bound = NULL_TREE;

> +         niter->cmp = ERROR_MARK;

> +         return true;

> +       }

>

> simply setting may_be_zero to false looks fishy.  Try

> with -fno-tree-loop-ch.  Also max should not be negative,

> it should be the number of bits in the IV type?

>

> A related testcase could be that we can completely peel

> a loop like the following which iterates at most 8 times:

>

> int a[8];

> void foo (unsigned char ctrl)

> {

>   int c = 0;

>   while (ctrl)

>     {

>        ctrl = ctrl & (ctrl - 1);

>        a[c++] = ctrl;

>     }

> }

>

> This is now stage1 material so please update and re-post.  Maybe Bin has

> further suggestions as well.

Sorry for being late on this.  Actually I thought about popcount in
distribution before.  More like the first patch, but handled as
another distribution pattern rather than a special case.  It's a bit
strange to compute and store the info in niters.  It's also not
straight forward when/where the transformation is finally done.
I haven't looked into the details so not sure how appropriate it will
be as a distribution pattern (current ones are only about data
references).  So I am okay with this version if it's more appropriate.

Thanks,
bin
>

> Thanks,

> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>> gcc/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>     (number_of_iterations_exit): Record niter for popcount patern.

>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>

>>

>>>

>>>> In general, there is also a condition gating the loop like

>>>>

>>>> if (b_4 != 0)

>>>>   goto loop;

>>>> else

>>>>   end:

>>>>

>>>> This of course will not be removed by final value replacement. Since

>>>> popcount (0) is defined, this is redundant and could be removed but

>>>> not removed.

>>>

>>> Yeah, that's probably sth for another pass though.  I suppose the

>>> end: case just uses zero in which case you'll have a PHI and you

>>> can optimize

>>>

>>>   if (b != 0)

>>>     return popcount (b);

>>>   return 0;

>>>

>>> in phiopt.

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>>

>>>>>>> is related to popcount (b_5).

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>

>>>>>>>> gcc/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>

>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Richard Biener March 7, 2018, 8:26 a.m. UTC | #9

On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> Thanks for the review.

>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi All,

>>>>>>>>>

>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>

>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>> me if I am wrong.

>>>>>>>>

>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>> doing final value

>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>> teached number of

>>>>>>>> iteration analysis that niter for

>>>>>>>>

>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>   if (b_8 != 0)

>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>   else

>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>

>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>

>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>> and can this be represented with the scev?

>>>>>>

>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>> is the missing part.

>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>> used by the final value replacement. Is this what you wanted?

>>>>

>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>> appropriate argument.

>>>

>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>> niter in tree_niter_desc can take such.

>>>

>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>

>> Overall this looks ok - the patch includes changes in places that I don't think

>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>> The expression_expensive_p change should be more specific than making

>> all calls inexpensive as well.

>>

>> The verify_ssa change looks bogus, you do

>>

>> +  dest = gimple_phi_result (count_phi);

>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +

>> +  var = build_call_expr (fn, 1, src);

>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>

>> why do you allocate a new SSA name here?  It seems unused

>> as you overwrive 'var' with the CALL_EXPR immediately.

>>

>> I didn't review the pattern matching thoroughly nor the exact place you

>> call it.  But

>>

>> +      if (check_popcount_pattern (loop, &count))

>> +       {

>> +         niter->assumptions = boolean_false_node;

>> +         niter->control.base = NULL_TREE;

>> +         niter->control.step = NULL_TREE;

>> +         niter->control.no_overflow = false;

>> +         niter->niter = count;

>> +         niter->assumptions = boolean_true_node;

>> +         niter->may_be_zero = boolean_false_node;

>> +         niter->max = -1;

>> +         niter->bound = NULL_TREE;

>> +         niter->cmp = ERROR_MARK;

>> +         return true;

>> +       }

>>

>> simply setting may_be_zero to false looks fishy.  Try

>> with -fno-tree-loop-ch.  Also max should not be negative,

>> it should be the number of bits in the IV type?

>>

>> A related testcase could be that we can completely peel

>> a loop like the following which iterates at most 8 times:

>>

>> int a[8];

>> void foo (unsigned char ctrl)

>> {

>>   int c = 0;

>>   while (ctrl)

>>     {

>>        ctrl = ctrl & (ctrl - 1);

>>        a[c++] = ctrl;

>>     }

>> }

>>

>> This is now stage1 material so please update and re-post.  Maybe Bin has

>> further suggestions as well.

> Sorry for being late on this.  Actually I thought about popcount in

> distribution before.  More like the first patch, but handled as

> another distribution pattern rather than a special case.  It's a bit

> strange to compute and store the info in niters.  It's also not

> straight forward when/where the transformation is finally done.


It's done at final value replacement if the counter is used.  But with
it in place we should also be able to compute niters for the case
where it isn't (like the above case).  So I believe that in the end handling
this in niter analysis is more powerful.

> I haven't looked into the details so not sure how appropriate it will

> be as a distribution pattern (current ones are only about data

> references).  So I am okay with this version if it's more appropriate.


Yeah, adding distribution patterns for scalar reduction forms is certainly
appropriate but then this is the same or at least similar as final
value replacement.

Richard.

> Thanks,

> bin

>>

>> Thanks,

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>

>>>

>>> gcc/ChangeLog:

>>>

>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>

>>> gcc/testsuite/ChangeLog:

>>>

>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>

>>>

>>>>

>>>>> In general, there is also a condition gating the loop like

>>>>>

>>>>> if (b_4 != 0)

>>>>>   goto loop;

>>>>> else

>>>>>   end:

>>>>>

>>>>> This of course will not be removed by final value replacement. Since

>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>> not removed.

>>>>

>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>> end: case just uses zero in which case you'll have a PHI and you

>>>> can optimize

>>>>

>>>>   if (b != 0)

>>>>     return popcount (b);

>>>>   return 0;

>>>>

>>>> in phiopt.

>>>>

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>>

>>>>>>>> is related to popcount (b_5).

>>>>>>>>

>>>>>>>> Richard.

>>>>>>>>

>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>

>>>>>>>>> Thanks,

>>>>>>>>> Kugan

>>>>>>>>>

>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>

>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>

>>>>>>>>>     PR middle-end/82479

>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>

>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>

>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>

>>>>>>>>>     PR middle-end/82479

>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Bin.Cheng March 7, 2018, 11:25 a.m. UTC | #10

On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

>> <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi Richard,

>>>>>>>>

>>>>>>>> Thanks for the review.

>>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>>> Hi All,

>>>>>>>>>>

>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>>

>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>>> me if I am wrong.

>>>>>>>>>

>>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>>> doing final value

>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>>> teached number of

>>>>>>>>> iteration analysis that niter for

>>>>>>>>>

>>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>>   if (b_8 != 0)

>>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>>   else

>>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>>

>>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>>

>>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>>> and can this be represented with the scev?

>>>>>>>

>>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>>> is the missing part.

>>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>>> used by the final value replacement. Is this what you wanted?

>>>>>

>>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>>> appropriate argument.

>>>>

>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>>> niter in tree_niter_desc can take such.

>>>>

>>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>>

>>> Overall this looks ok - the patch includes changes in places that I don't think

>>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>>> The expression_expensive_p change should be more specific than making

>>> all calls inexpensive as well.

>>>

>>> The verify_ssa change looks bogus, you do

>>>

>>> +  dest = gimple_phi_result (count_phi);

>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>> +

>>> +  var = build_call_expr (fn, 1, src);

>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>>

>>> why do you allocate a new SSA name here?  It seems unused

>>> as you overwrive 'var' with the CALL_EXPR immediately.

>>>

>>> I didn't review the pattern matching thoroughly nor the exact place you

>>> call it.  But

>>>

>>> +      if (check_popcount_pattern (loop, &count))

>>> +       {

>>> +         niter->assumptions = boolean_false_node;

>>> +         niter->control.base = NULL_TREE;

>>> +         niter->control.step = NULL_TREE;

>>> +         niter->control.no_overflow = false;

>>> +         niter->niter = count;

>>> +         niter->assumptions = boolean_true_node;

>>> +         niter->may_be_zero = boolean_false_node;

>>> +         niter->max = -1;

>>> +         niter->bound = NULL_TREE;

>>> +         niter->cmp = ERROR_MARK;

>>> +         return true;

>>> +       }

>>>

>>> simply setting may_be_zero to false looks fishy.  Try

>>> with -fno-tree-loop-ch.  Also max should not be negative,

>>> it should be the number of bits in the IV type?

>>>

>>> A related testcase could be that we can completely peel

>>> a loop like the following which iterates at most 8 times:

>>>

>>> int a[8];

>>> void foo (unsigned char ctrl)

>>> {

>>>   int c = 0;

>>>   while (ctrl)

>>>     {

>>>        ctrl = ctrl & (ctrl - 1);

>>>        a[c++] = ctrl;

>>>     }

>>> }

>>>

>>> This is now stage1 material so please update and re-post.  Maybe Bin has

>>> further suggestions as well.

>> Sorry for being late on this.  Actually I thought about popcount in

>> distribution before.  More like the first patch, but handled as

>> another distribution pattern rather than a special case.  It's a bit

>> strange to compute and store the info in niters.  It's also not

>> straight forward when/where the transformation is finally done.

>

> It's done at final value replacement if the counter is used.  But with

> it in place we should also be able to compute niters for the case

> where it isn't (like the above case).  So I believe that in the end handling

> this in niter analysis is more powerful.

Yes, you are right, as distribution pattern, we would only be able to
handle standalone popcount loop.

Thanks,
bin
>

>> I haven't looked into the details so not sure how appropriate it will

>> be as a distribution pattern (current ones are only about data

>> references).  So I am okay with this version if it's more appropriate.

>

> Yeah, adding distribution patterns for scalar reduction forms is certainly

> appropriate but then this is the same or at least similar as final

> value replacement.

>

> Richard.

>

>> Thanks,

>> bin

>>>

>>> Thanks,

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>

>>>> gcc/ChangeLog:

>>>>

>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>>

>>>> gcc/testsuite/ChangeLog:

>>>>

>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>>

>>>>

>>>>>

>>>>>> In general, there is also a condition gating the loop like

>>>>>>

>>>>>> if (b_4 != 0)

>>>>>>   goto loop;

>>>>>> else

>>>>>>   end:

>>>>>>

>>>>>> This of course will not be removed by final value replacement. Since

>>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>>> not removed.

>>>>>

>>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>>> end: case just uses zero in which case you'll have a PHI and you

>>>>> can optimize

>>>>>

>>>>>   if (b != 0)

>>>>>     return popcount (b);

>>>>>   return 0;

>>>>>

>>>>> in phiopt.

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>>

>>>>>>>>> is related to popcount (b_5).

>>>>>>>>>

>>>>>>>>> Richard.

>>>>>>>>>

>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>>

>>>>>>>>>> Thanks,

>>>>>>>>>> Kugan

>>>>>>>>>>

>>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>>

>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>

>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>>

>>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>>

>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>

>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Kugan Vivekanandarajah March 8, 2018, 10:06 p.m. UTC | #11

Hi Richard and Bin,

Thanks for your comments. I will revise the patch and post it as soon
as stage-1 opens.

Kugan

On 7 March 2018 at 22:25, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

>>> <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi Richard,

>>>>>>>>>

>>>>>>>>> Thanks for the review.

>>>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>>>> Hi All,

>>>>>>>>>>>

>>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>>>

>>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>>>> me if I am wrong.

>>>>>>>>>>

>>>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>>>> doing final value

>>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>>>> teached number of

>>>>>>>>>> iteration analysis that niter for

>>>>>>>>>>

>>>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>>>   if (b_8 != 0)

>>>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>>>   else

>>>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>>>

>>>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>>>

>>>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>>>> and can this be represented with the scev?

>>>>>>>>

>>>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>>>> is the missing part.

>>>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>>>> used by the final value replacement. Is this what you wanted?

>>>>>>

>>>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>>>> appropriate argument.

>>>>>

>>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>>>> niter in tree_niter_desc can take such.

>>>>>

>>>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>>>

>>>> Overall this looks ok - the patch includes changes in places that I don't think

>>>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>>>> The expression_expensive_p change should be more specific than making

>>>> all calls inexpensive as well.

>>>>

>>>> The verify_ssa change looks bogus, you do

>>>>

>>>> +  dest = gimple_phi_result (count_phi);

>>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>>> +

>>>> +  var = build_call_expr (fn, 1, src);

>>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>>>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>>>

>>>> why do you allocate a new SSA name here?  It seems unused

>>>> as you overwrive 'var' with the CALL_EXPR immediately.

>>>>

>>>> I didn't review the pattern matching thoroughly nor the exact place you

>>>> call it.  But

>>>>

>>>> +      if (check_popcount_pattern (loop, &count))

>>>> +       {

>>>> +         niter->assumptions = boolean_false_node;

>>>> +         niter->control.base = NULL_TREE;

>>>> +         niter->control.step = NULL_TREE;

>>>> +         niter->control.no_overflow = false;

>>>> +         niter->niter = count;

>>>> +         niter->assumptions = boolean_true_node;

>>>> +         niter->may_be_zero = boolean_false_node;

>>>> +         niter->max = -1;

>>>> +         niter->bound = NULL_TREE;

>>>> +         niter->cmp = ERROR_MARK;

>>>> +         return true;

>>>> +       }

>>>>

>>>> simply setting may_be_zero to false looks fishy.  Try

>>>> with -fno-tree-loop-ch.  Also max should not be negative,

>>>> it should be the number of bits in the IV type?

>>>>

>>>> A related testcase could be that we can completely peel

>>>> a loop like the following which iterates at most 8 times:

>>>>

>>>> int a[8];

>>>> void foo (unsigned char ctrl)

>>>> {

>>>>   int c = 0;

>>>>   while (ctrl)

>>>>     {

>>>>        ctrl = ctrl & (ctrl - 1);

>>>>        a[c++] = ctrl;

>>>>     }

>>>> }

>>>>

>>>> This is now stage1 material so please update and re-post.  Maybe Bin has

>>>> further suggestions as well.

>>> Sorry for being late on this.  Actually I thought about popcount in

>>> distribution before.  More like the first patch, but handled as

>>> another distribution pattern rather than a special case.  It's a bit

>>> strange to compute and store the info in niters.  It's also not

>>> straight forward when/where the transformation is finally done.

>>

>> It's done at final value replacement if the counter is used.  But with

>> it in place we should also be able to compute niters for the case

>> where it isn't (like the above case).  So I believe that in the end handling

>> this in niter analysis is more powerful.

> Yes, you are right, as distribution pattern, we would only be able to

> handle standalone popcount loop.

>

> Thanks,

> bin

>>

>>> I haven't looked into the details so not sure how appropriate it will

>>> be as a distribution pattern (current ones are only about data

>>> references).  So I am okay with this version if it's more appropriate.

>>

>> Yeah, adding distribution patterns for scalar reduction forms is certainly

>> appropriate but then this is the same or at least similar as final

>> value replacement.

>>

>> Richard.

>>

>>> Thanks,

>>> bin

>>>>

>>>> Thanks,

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>>

>>>>> gcc/ChangeLog:

>>>>>

>>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>>>

>>>>> gcc/testsuite/ChangeLog:

>>>>>

>>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>>>

>>>>>

>>>>>>

>>>>>>> In general, there is also a condition gating the loop like

>>>>>>>

>>>>>>> if (b_4 != 0)

>>>>>>>   goto loop;

>>>>>>> else

>>>>>>>   end:

>>>>>>>

>>>>>>> This of course will not be removed by final value replacement. Since

>>>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>>>> not removed.

>>>>>>

>>>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>>>> end: case just uses zero in which case you'll have a PHI and you

>>>>>> can optimize

>>>>>>

>>>>>>   if (b != 0)

>>>>>>     return popcount (b);

>>>>>>   return 0;

>>>>>>

>>>>>> in phiopt.

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>

>>>>>>>>

>>>>>>>> Richard.

>>>>>>>>

>>>>>>>>> Thanks,

>>>>>>>>> Kugan

>>>>>>>>>>

>>>>>>>>>> is related to popcount (b_5).

>>>>>>>>>>

>>>>>>>>>> Richard.

>>>>>>>>>>

>>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>>>

>>>>>>>>>>> Thanks,

>>>>>>>>>>> Kugan

>>>>>>>>>>>

>>>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>>>

>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>>

>>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>>>

>>>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>>>

>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>>

>>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Kugan Vivekanandarajah May 17, 2018, 1:39 a.m. UTC | #12

Hi Richard,

On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> Thanks for the review.

>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi All,

>>>>>>>>

>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>

>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>> me if I am wrong.

>>>>>>>

>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>> doing final value

>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>> final value replacement would already do the correct thing if you

>>>>>>> teached number of

>>>>>>> iteration analysis that niter for

>>>>>>>

>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>   _1 = b_11 + -1;

>>>>>>>   b_8 = _1 & b_11;

>>>>>>>   if (b_8 != 0)

>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>   else

>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>

>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>

>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>> and can this be represented with the scev?

>>>>>

>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>> scalar evolution of the counting IV which is already handled and

>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>> is the missing part.

>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>> the loop preheater and setting the loop niter with this. This will be

>>>> used by the final value replacement. Is this what you wanted?

>>>

>>> No, you shouldn't insert a popcount stmt but instead the niter

>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>> appropriate argument.

>>

>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>> niter in tree_niter_desc can take such.

>>

>> Attached patch now does this. Also had to add support for CALL_EXPR in

>> few places to handle niter with CALL_EXPR. Does this look OK?

>

> Overall this looks ok - the patch includes changes in places that I don't think

> need changes such as chrec_convert_1 or extract_ops_from_tree.

> The expression_expensive_p change should be more specific than making

> all calls inexpensive as well.


Changed it.

>

> The verify_ssa change looks bogus, you do

>

> +  dest = gimple_phi_result (count_phi);

> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +

> +  var = build_call_expr (fn, 1, src);

> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

> +                       build_int_cst (TREE_TYPE (dest), 1));

>

> why do you allocate a new SSA name here?  It seems unused

> as you overwrive 'var' with the CALL_EXPR immediately.

Changed now.

>

> I didn't review the pattern matching thoroughly nor the exact place you

> call it.  But

>

> +      if (check_popcount_pattern (loop, &count))

> +       {

> +         niter->assumptions = boolean_false_node;

> +         niter->control.base = NULL_TREE;

> +         niter->control.step = NULL_TREE;

> +         niter->control.no_overflow = false;

> +         niter->niter = count;

> +         niter->assumptions = boolean_true_node;

> +         niter->may_be_zero = boolean_false_node;

> +         niter->max = -1;

> +         niter->bound = NULL_TREE;

> +         niter->cmp = ERROR_MARK;

> +         return true;

> +       }

>

> simply setting may_be_zero to false looks fishy.

Should I set this to (argument to popcount == zero)?

> Try with -fno-tree-loop-ch.

I changed the pattern matching to handle loop without header copying
too. Looks a bit complicated checking all the conditions. Wondering if
this can be done in a simpler and easier to read way.

>  Also max should not be negative,

> it should be the number of bits in the IV type?


Changed this too.
>

> A related testcase could be that we can completely peel

> a loop like the following which iterates at most 8 times:

>

> int a[8];

> void foo (unsigned char ctrl)

> {

>   int c = 0;

>   while (ctrl)

>     {

>        ctrl = ctrl & (ctrl - 1);

>        a[c++] = ctrl;

>     }

> }


Hmm, this is an interesting test case but as I am now trying to match
a loop which does popcount, this is not handled.


Attaching the current version for review.


Thanks,
Kugan

>

> This is now stage1 material so please update and re-post.  Maybe Bin has

> further suggestions as well.

>

> Thanks,

> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>> gcc/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>     (number_of_iterations_exit): Record niter for popcount patern.

>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>

>>

>>>

>>>> In general, there is also a condition gating the loop like

>>>>

>>>> if (b_4 != 0)

>>>>   goto loop;

>>>> else

>>>>   end:

>>>>

>>>> This of course will not be removed by final value replacement. Since

>>>> popcount (0) is defined, this is redundant and could be removed but

>>>> not removed.

>>>

>>> Yeah, that's probably sth for another pass though.  I suppose the

>>> end: case just uses zero in which case you'll have a PHI and you

>>> can optimize

>>>

>>>   if (b != 0)

>>>     return popcount (b);

>>>   return 0;

>>>

>>> in phiopt.

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>>

>>>>>>> is related to popcount (b_5).

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>

>>>>>>>> gcc/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>

>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
From 69093cbf4c2a2c86628f78acd563a56081e234f6 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I49b8557601cabffb5257e1daabeb5c59979fca96
---
 gcc/ipa-fnsummary.c                       |   4 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 ++++++
 gcc/tree-scalar-evolution.c               |   7 ++
 gcc/tree-ssa-loop-ivopts.c                |  10 ++
 gcc/tree-ssa-loop-niter.c                 | 148 +++++++++++++++++++++++++++++-
 7 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..4dc156a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    {
+      return false;
+    }
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..52afc2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..0c69d97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..967b154 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1984,6 +1984,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3472,6 +3473,7 @@ bool
 expression_expensive_p (tree expr)
 {
   enum tree_code code;
+  tree fndecl;
 
   if (is_gimple_val (expr))
     return false;
@@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR
+      && (fndecl = get_callee_fndecl (expr))
+      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..f390321 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If so, Set NITER to  __builtin_popcount (b) - 1
+    return true if we did, false otherwise.  */
+
+static bool
+check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)
+{
+  tree lhs, rhs;
+  tree dest;
+  gimple *and_minus_one;
+  gimple *phi;
+  int count = 0;
+  gimple *count_stmt = NULL;
+  bool adjust = true;
+
+  if (!single_exit (loop))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || !zerop (gimple_cond_rhs (stmt)))
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+  count_stmt = SSA_NAME_DEF_STMT (rhs);
+  lhs = gimple_cond_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (count_stmt) == GIMPLE_PHI)
+    {
+      tree t;
+      if (gimple_code (and_stmt) != GIMPLE_PHI
+	  || gimple_bb (and_stmt) != single_exit (loop)->src
+	  || gimple_bb (count_stmt) != single_exit (loop)->src)
+	return false;
+      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      count_stmt = SSA_NAME_DEF_STMT (t);
+      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure we have a count by one.  */
+  if (!is_gimple_assign (count_stmt)
+      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
+    return false;
+
+  /* Cheeck "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      ;
+  else
+    return false;
+
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+    return false;
+
+  /* Check the recurrence.  */
+  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  dest = gimple_assign_lhs (count_stmt);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  if (adjust)
+    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
+			  build_call_expr (fn, 1, src),
+			  build_int_cst (TREE_TYPE (dest), 1));
+  else
+    *niter = build_call_expr (fn, 1, src);
+  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      tree count;
+      HOST_WIDE_INT max;
+      if (check_popcount_pattern (loop, &count, &max))
+	{
+	  niter->assumptions = boolean_false_node;
+	  niter->control.base = NULL_TREE;
+	  niter->control.step = NULL_TREE;
+	  niter->control.no_overflow = false;
+	  niter->niter = count;
+	  niter->assumptions = boolean_true_node;
+	  niter->may_be_zero = boolean_false_node;
+	  niter->max = max;
+	  niter->bound = NULL_TREE;
+	  niter->cmp = ERROR_MARK;
+	  return true;
+	}
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;

Bin.Cheng May 17, 2018, 9:56 a.m. UTC | #13

On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> Thanks for the review.

>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi All,

>>>>>>>>>

>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>

>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>> me if I am wrong.

>>>>>>>>

>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>> doing final value

>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>> teached number of

>>>>>>>> iteration analysis that niter for

>>>>>>>>

>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>   if (b_8 != 0)

>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>   else

>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>

>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>

>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>> and can this be represented with the scev?

>>>>>>

>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>> is the missing part.

>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>> used by the final value replacement. Is this what you wanted?

>>>>

>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>> appropriate argument.

>>>

>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>> niter in tree_niter_desc can take such.

>>>

>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>

>> Overall this looks ok - the patch includes changes in places that I don't think

>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>> The expression_expensive_p change should be more specific than making

>> all calls inexpensive as well.

>

> Changed it.

>

>>

>> The verify_ssa change looks bogus, you do

>>

>> +  dest = gimple_phi_result (count_phi);

>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +

>> +  var = build_call_expr (fn, 1, src);

>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>

>> why do you allocate a new SSA name here?  It seems unused

>> as you overwrive 'var' with the CALL_EXPR immediately.

> Changed now.

>

>>

>> I didn't review the pattern matching thoroughly nor the exact place you

>> call it.  But

>>

>> +      if (check_popcount_pattern (loop, &count))

>> +       {

>> +         niter->assumptions = boolean_false_node;

>> +         niter->control.base = NULL_TREE;

>> +         niter->control.step = NULL_TREE;

>> +         niter->control.no_overflow = false;

>> +         niter->niter = count;

>> +         niter->assumptions = boolean_true_node;

>> +         niter->may_be_zero = boolean_false_node;

>> +         niter->max = -1;

>> +         niter->bound = NULL_TREE;

>> +         niter->cmp = ERROR_MARK;

>> +         return true;

>> +       }

>>

>> simply setting may_be_zero to false looks fishy.

> Should I set this to (argument to popcount == zero)?

No, I think that's unnecessary.  The number of iterations is computed
as: may_be_zero ? 0 : niter;
Here niter is ZERO even when may_be_zero is set to false, and niters
is computed correctly.

I think the point is that may_be_zero is false doesn't imply that
niters is non-zero.

>

>> Try with -fno-tree-loop-ch.

> I changed the pattern matching to handle loop without header copying

> too. Looks a bit complicated checking all the conditions. Wondering if

> this can be done in a simpler and easier to read way.

>

>>  Also max should not be negative,

>> it should be the number of bits in the IV type?

>

> Changed this too.

>>

>> A related testcase could be that we can completely peel

>> a loop like the following which iterates at most 8 times:

>>

>> int a[8];

>> void foo (unsigned char ctrl)

>> {

>>   int c = 0;

>>   while (ctrl)

>>     {

>>        ctrl = ctrl & (ctrl - 1);

>>        a[c++] = ctrl;

>>     }

>> }

>

> Hmm, this is an interesting test case but as I am now trying to match

> a loop which does popcount, this is not handled.

>

>

> Attaching the current version for review.

Here are some comments.

> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c

> index 7a54c5f..f390321 100644

> --- a/gcc/tree-ssa-loop-niter.c

> +++ b/gcc/tree-ssa-loop-niter.c

> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,

>    return (!integer_zerop (niter->assumptions));

>  }

>

> +/* See if LOOP is a popcout implementation of the form

> +

> +    int c = 0;

> +    while (b) {

> +    b = b & (b - 1);

> +    c++;

> +    }

> +

> +    If so, Set NITER to  __builtin_popcount (b) - 1

> +    return true if we did, false otherwise.  */

> +

> +static bool

> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)

> +{

> +  tree lhs, rhs;

> +  tree dest;

> +  gimple *and_minus_one;

> +  gimple *phi;

> +  int count = 0;

> +  gimple *count_stmt = NULL;

> +  bool adjust = true;

> +

> +  if (!single_exit (loop))

> +    return false;

> +

> +  /* Check loop terminating branch is like

> +     if (b != 0).  */

> +  gimple *stmt = last_stmt (loop->header);

> +  if (!stmt

> +      || gimple_code (stmt) != GIMPLE_COND

> +      || !zerop (gimple_cond_rhs (stmt)))

The check doesn't fully match the comment.  NE is not checked here.
Also can move below "(TREE_CODE (lhs) != SSA_NAME)" check up to this
point, making simple checks earlier.

> +    return false;

> +

> +  /* Check the loop closed SSA definition for just the variable c defined in

> +     loop.  */

> +  basic_block bb = single_exit (loop)->dest;

single_exit is repeatedly called various times, call it once and use
the returning edge instead?  I am not sure GCC is smart enough
removing repeated call instances.  Same to loop_latch_edge.

> +  for (gphi_iterator gpi = gsi_start_phis (bb);

> +       !gsi_end_p (gpi); gsi_next (&gpi))

> +    {

> +      phi = gpi.phi ();

> +      count++;

> +    }

> +

> +  if (count != 1)

> +    return false;

> +

> +  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);

> +  if (TREE_CODE (rhs) != SSA_NAME)

> +    return false;

> +  count_stmt = SSA_NAME_DEF_STMT (rhs);

> +  lhs = gimple_cond_lhs (stmt);

> +  if (TREE_CODE (lhs) != SSA_NAME)

> +    return false;

> +  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);

> +

> +  /* Depending on copy-header is performed, feeding PHI stmts might be in

> +     the loop header or loop exit, handle this.  */

> +  if (gimple_code (count_stmt) == GIMPLE_PHI)

> +    {

> +      tree t;

> +      if (gimple_code (and_stmt) != GIMPLE_PHI

> +      || gimple_bb (and_stmt) != single_exit (loop)->src

> +      || gimple_bb (count_stmt) != single_exit (loop)->src)

> +    return false;

> +      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);

> +      if (TREE_CODE (t) != SSA_NAME)

> +    return false;

> +      count_stmt = SSA_NAME_DEF_STMT (t);

> +      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);

> +      if (TREE_CODE (t) != SSA_NAME)

> +    return false;

> +      and_stmt = SSA_NAME_DEF_STMT (t);

> +      adjust = false;

> +    }

> +

> +  /* Make sure we have a count by one.  */

> +  if (!is_gimple_assign (count_stmt)

> +      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)

> +      || !integer_onep (gimple_assign_rhs2 (count_stmt)))

> +    return false;

> +

> +  /* Cheeck "b = b & (b - 1)" is calculated.  */

Typo.

> +  if (!is_gimple_assign (and_stmt)

> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> +    return false;

> +

> +  lhs = gimple_assign_rhs1 (and_stmt);

> +  rhs = gimple_assign_rhs2 (and_stmt);

> +  if (TREE_CODE (lhs) == SSA_NAME

> +      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))

> +      && is_gimple_assign (and_minus_one)

> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

> +      lhs = rhs;

> +  else if (TREE_CODE (rhs) == SSA_NAME

> +      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))

> +      && is_gimple_assign (and_minus_one)

> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

Could you avoid duplication by factoring the condition into an inline
function?  They are exactly the same for lhs/rhs.

> +      ;

> +  else

> +    return false;

> +

> +  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))

> +      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))

Here you already got lhs correctly, so don't need to check on and_stmt
rhs directly.  You can even merge this check into above one.

> +    return false;

> +

> +  /* Check the recurrence.  */

> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> +  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);

> +  if (gimple_code (phi) != GIMPLE_PHI

> +      || gimple_code (src_phi) != GIMPLE_PHI)

I think this is redundant since you have lhs equals to
gimple_assign_rhs1 (and_minus_one).  So phi == src_phi is always true?

> +    return false;

> +

> +  dest = gimple_assign_lhs (count_stmt);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> +  if (adjust)

> +    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> +              build_call_expr (fn, 1, src),

> +              build_int_cst (TREE_TYPE (dest), 1));

> +  else

> +    *niter = build_call_expr (fn, 1, src);

> +  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> +  return true;

> +}

> +

> +

>  /* Like number_of_iterations_exit_assumptions, but return TRUE only if

>     the niter information holds unconditionally.  */

>

> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,

>    gcond *stmt;

>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

>                            &stmt, every_iteration))

> -    return false;

> +    {

> +      tree count;

> +      HOST_WIDE_INT max;

> +      if (check_popcount_pattern (loop, &count, &max))

> +    {

> +      niter->assumptions = boolean_false_node;

> +      niter->control.base = NULL_TREE;

> +      niter->control.step = NULL_TREE;

> +      niter->control.no_overflow = false;

> +      niter->niter = count;

> +      niter->assumptions = boolean_true_node;

> +      niter->may_be_zero = boolean_false_node;

> +      niter->max = max;

> +      niter->bound = NULL_TREE;

> +      niter->cmp = ERROR_MARK;

> +      return true;

> +    }

Better to merge these compound statement into check_popcount_pattern
and rename it into something like number_of_iterations_popcount.

I wondered if the more inefficient version popcount should be checked, like:

int count = 0;
while (x)
{
  count += x & 1;
  x = x >> 1;
}

Thanks,
bin

> +      return false;

> +    }

>

>    if (integer_nonzerop (niter->assumptions))

>      return true;

> --

> 2.7.4

>

Kugan Vivekanandarajah May 31, 2018, 3:51 a.m. UTC | #14

Hi Bin,

Thanks for the review. Please find the revised patch based on the
review comments.

Thanks,
Kugan

On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi Richard,

>>>>>>>>

>>>>>>>> Thanks for the review.

>>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>>> Hi All,

>>>>>>>>>>

>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>>

>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>>> me if I am wrong.

>>>>>>>>>

>>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>>> doing final value

>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>>> teached number of

>>>>>>>>> iteration analysis that niter for

>>>>>>>>>

>>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>>   if (b_8 != 0)

>>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>>   else

>>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>>

>>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>>

>>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>>> and can this be represented with the scev?

>>>>>>>

>>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>>> is the missing part.

>>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>>> used by the final value replacement. Is this what you wanted?

>>>>>

>>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>>> appropriate argument.

>>>>

>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>>> niter in tree_niter_desc can take such.

>>>>

>>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>>

>>> Overall this looks ok - the patch includes changes in places that I don't think

>>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>>> The expression_expensive_p change should be more specific than making

>>> all calls inexpensive as well.

>>

>> Changed it.

>>

>>>

>>> The verify_ssa change looks bogus, you do

>>>

>>> +  dest = gimple_phi_result (count_phi);

>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>> +

>>> +  var = build_call_expr (fn, 1, src);

>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>>

>>> why do you allocate a new SSA name here?  It seems unused

>>> as you overwrive 'var' with the CALL_EXPR immediately.

>> Changed now.

>>

>>>

>>> I didn't review the pattern matching thoroughly nor the exact place you

>>> call it.  But

>>>

>>> +      if (check_popcount_pattern (loop, &count))

>>> +       {

>>> +         niter->assumptions = boolean_false_node;

>>> +         niter->control.base = NULL_TREE;

>>> +         niter->control.step = NULL_TREE;

>>> +         niter->control.no_overflow = false;

>>> +         niter->niter = count;

>>> +         niter->assumptions = boolean_true_node;

>>> +         niter->may_be_zero = boolean_false_node;

>>> +         niter->max = -1;

>>> +         niter->bound = NULL_TREE;

>>> +         niter->cmp = ERROR_MARK;

>>> +         return true;

>>> +       }

>>>

>>> simply setting may_be_zero to false looks fishy.

>> Should I set this to (argument to popcount == zero)?

> No, I think that's unnecessary.  The number of iterations is computed

> as: may_be_zero ? 0 : niter;

> Here niter is ZERO even when may_be_zero is set to false, and niters

> is computed correctly.

>

> I think the point is that may_be_zero is false doesn't imply that

> niters is non-zero.

>

>>

>>> Try with -fno-tree-loop-ch.

>> I changed the pattern matching to handle loop without header copying

>> too. Looks a bit complicated checking all the conditions. Wondering if

>> this can be done in a simpler and easier to read way.

>>

>>>  Also max should not be negative,

>>> it should be the number of bits in the IV type?

>>

>> Changed this too.

>>>

>>> A related testcase could be that we can completely peel

>>> a loop like the following which iterates at most 8 times:

>>>

>>> int a[8];

>>> void foo (unsigned char ctrl)

>>> {

>>>   int c = 0;

>>>   while (ctrl)

>>>     {

>>>        ctrl = ctrl & (ctrl - 1);

>>>        a[c++] = ctrl;

>>>     }

>>> }

>>

>> Hmm, this is an interesting test case but as I am now trying to match

>> a loop which does popcount, this is not handled.

>>

>>

>> Attaching the current version for review.

> Here are some comments.

>

>> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c

>> index 7a54c5f..f390321 100644

>> --- a/gcc/tree-ssa-loop-niter.c

>> +++ b/gcc/tree-ssa-loop-niter.c

>> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,

>>    return (!integer_zerop (niter->assumptions));

>>  }

>>

>> +/* See if LOOP is a popcout implementation of the form

>> +

>> +    int c = 0;

>> +    while (b) {

>> +    b = b & (b - 1);

>> +    c++;

>> +    }

>> +

>> +    If so, Set NITER to  __builtin_popcount (b) - 1

>> +    return true if we did, false otherwise.  */

>> +

>> +static bool

>> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)

>> +{

>> +  tree lhs, rhs;

>> +  tree dest;

>> +  gimple *and_minus_one;

>> +  gimple *phi;

>> +  int count = 0;

>> +  gimple *count_stmt = NULL;

>> +  bool adjust = true;

>> +

>> +  if (!single_exit (loop))

>> +    return false;

>> +

>> +  /* Check loop terminating branch is like

>> +     if (b != 0).  */

>> +  gimple *stmt = last_stmt (loop->header);

>> +  if (!stmt

>> +      || gimple_code (stmt) != GIMPLE_COND

>> +      || !zerop (gimple_cond_rhs (stmt)))

> The check doesn't fully match the comment.  NE is not checked here.

> Also can move below "(TREE_CODE (lhs) != SSA_NAME)" check up to this

> point, making simple checks earlier.

>

>> +    return false;

>> +

>> +  /* Check the loop closed SSA definition for just the variable c defined in

>> +     loop.  */

>> +  basic_block bb = single_exit (loop)->dest;

> single_exit is repeatedly called various times, call it once and use

> the returning edge instead?  I am not sure GCC is smart enough

> removing repeated call instances.  Same to loop_latch_edge.

>

>> +  for (gphi_iterator gpi = gsi_start_phis (bb);

>> +       !gsi_end_p (gpi); gsi_next (&gpi))

>> +    {

>> +      phi = gpi.phi ();

>> +      count++;

>> +    }

>> +

>> +  if (count != 1)

>> +    return false;

>> +

>> +  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);

>> +  if (TREE_CODE (rhs) != SSA_NAME)

>> +    return false;

>> +  count_stmt = SSA_NAME_DEF_STMT (rhs);

>> +  lhs = gimple_cond_lhs (stmt);

>> +  if (TREE_CODE (lhs) != SSA_NAME)

>> +    return false;

>> +  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);

>> +

>> +  /* Depending on copy-header is performed, feeding PHI stmts might be in

>> +     the loop header or loop exit, handle this.  */

>> +  if (gimple_code (count_stmt) == GIMPLE_PHI)

>> +    {

>> +      tree t;

>> +      if (gimple_code (and_stmt) != GIMPLE_PHI

>> +      || gimple_bb (and_stmt) != single_exit (loop)->src

>> +      || gimple_bb (count_stmt) != single_exit (loop)->src)

>> +    return false;

>> +      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);

>> +      if (TREE_CODE (t) != SSA_NAME)

>> +    return false;

>> +      count_stmt = SSA_NAME_DEF_STMT (t);

>> +      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);

>> +      if (TREE_CODE (t) != SSA_NAME)

>> +    return false;

>> +      and_stmt = SSA_NAME_DEF_STMT (t);

>> +      adjust = false;

>> +    }

>> +

>> +  /* Make sure we have a count by one.  */

>> +  if (!is_gimple_assign (count_stmt)

>> +      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)

>> +      || !integer_onep (gimple_assign_rhs2 (count_stmt)))

>> +    return false;

>> +

>> +  /* Cheeck "b = b & (b - 1)" is calculated.  */

> Typo.

>

>> +  if (!is_gimple_assign (and_stmt)

>> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

>> +    return false;

>> +

>> +  lhs = gimple_assign_rhs1 (and_stmt);

>> +  rhs = gimple_assign_rhs2 (and_stmt);

>> +  if (TREE_CODE (lhs) == SSA_NAME

>> +      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))

>> +      && is_gimple_assign (and_minus_one)

>> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

>> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

>> +      lhs = rhs;

>> +  else if (TREE_CODE (rhs) == SSA_NAME

>> +      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))

>> +      && is_gimple_assign (and_minus_one)

>> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

>> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

> Could you avoid duplication by factoring the condition into an inline

> function?  They are exactly the same for lhs/rhs.

>

>> +      ;

>> +  else

>> +    return false;

>> +

>> +  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))

>> +      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))

> Here you already got lhs correctly, so don't need to check on and_stmt

> rhs directly.  You can even merge this check into above one.

>

>> +    return false;

>> +

>> +  /* Check the recurrence.  */

>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

>> +  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);

>> +  if (gimple_code (phi) != GIMPLE_PHI

>> +      || gimple_code (src_phi) != GIMPLE_PHI)

> I think this is redundant since you have lhs equals to

> gimple_assign_rhs1 (and_minus_one).  So phi == src_phi is always true?

>

>> +    return false;

>> +

>> +  dest = gimple_assign_lhs (count_stmt);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>> +  if (adjust)

>> +    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>> +              build_call_expr (fn, 1, src),

>> +              build_int_cst (TREE_TYPE (dest), 1));

>> +  else

>> +    *niter = build_call_expr (fn, 1, src);

>> +  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

>> +  return true;

>> +}

>> +

>> +

>>  /* Like number_of_iterations_exit_assumptions, but return TRUE only if

>>     the niter information holds unconditionally.  */

>>

>> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,

>>    gcond *stmt;

>>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

>>                            &stmt, every_iteration))

>> -    return false;

>> +    {

>> +      tree count;

>> +      HOST_WIDE_INT max;

>> +      if (check_popcount_pattern (loop, &count, &max))

>> +    {

>> +      niter->assumptions = boolean_false_node;

>> +      niter->control.base = NULL_TREE;

>> +      niter->control.step = NULL_TREE;

>> +      niter->control.no_overflow = false;

>> +      niter->niter = count;

>> +      niter->assumptions = boolean_true_node;

>> +      niter->may_be_zero = boolean_false_node;

>> +      niter->max = max;

>> +      niter->bound = NULL_TREE;

>> +      niter->cmp = ERROR_MARK;

>> +      return true;

>> +    }

> Better to merge these compound statement into check_popcount_pattern

> and rename it into something like number_of_iterations_popcount.

>

> I wondered if the more inefficient version popcount should be checked, like:

>

> int count = 0;

> while (x)

> {

>   count += x & 1;

>   x = x >> 1;

> }

>

> Thanks,

> bin

>

>> +      return false;

>> +    }

>>

>>    if (integer_nonzerop (niter->assumptions))

>>      return true;

>> --

>> 2.7.4

>>
From 413aafbb5d53812546c4af5f556f362aafdb32d4 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I2f796dd4518937495cc1855852b0dfa5c4ff1143
---
 gcc/ipa-fnsummary.c                       |   4 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 ++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 ++++++
 gcc/tree-scalar-evolution.c               |   7 ++
 gcc/tree-ssa-loop-ivopts.c                |  10 ++
 gcc/tree-ssa-loop-niter.c                 | 156 +++++++++++++++++++++++++++++-
 7 files changed, 274 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..4dc156a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    {
+      return false;
+    }
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..52afc2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..0c69d97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..967b154 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1984,6 +1984,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3472,6 +3473,7 @@ bool
 expression_expensive_p (tree expr)
 {
   enum tree_code code;
+  tree fndecl;
 
   if (is_gimple_val (expr))
     return false;
@@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR
+      && (fndecl = get_callee_fndecl (expr))
+      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..66f9b4f 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,156 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  If that is the case, set it to STMT.  */
+
+static bool
+ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)
+{
+  if (TREE_CODE (op) == SSA_NAME
+      && (*stmt = SSA_NAME_DEF_STMT (op))
+      && is_gimple_assign (*stmt)
+      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)
+      && val == gimple_assign_rhs1 (*stmt)
+      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))
+    return true;
+  else
+    return false;
+}
+
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If popcount pattern, update NITER accordingly.
+    i.e., set NITER to  __builtin_popcount (b)
+    return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, struct tree_niter_desc *niter)
+{
+  tree rhs1, rhs2;
+  tree dest;
+  gimple *and_minus_one;
+  gimple *phi;
+  int count = 0;
+  gimple *count_stmt = NULL;
+  bool adjust = true;
+  edge exit;
+  tree iter;
+  HOST_WIDE_INT max;
+
+  if (!(exit = single_exit (loop)))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_code (stmt) != NE_EXPR
+      || !zerop (gimple_cond_rhs (stmt))
+      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  basic_block bb = exit->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  rhs1 = gimple_phi_arg_def (phi, exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;
+  count_stmt = SSA_NAME_DEF_STMT (rhs1);
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (count_stmt) == GIMPLE_PHI)
+    {
+      tree t;
+      if (gimple_code (and_stmt) != GIMPLE_PHI
+	  || gimple_bb (and_stmt) != exit->src
+	  || gimple_bb (count_stmt) != exit->src)
+	return false;
+      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      count_stmt = SSA_NAME_DEF_STMT (t);
+      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure we have a count by one.  */
+  if (!is_gimple_assign (count_stmt)
+      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
+    return false;
+
+  /* Check "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (and_stmt);
+  rhs2 = gimple_assign_rhs2 (and_stmt);
+
+  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))
+    rhs1 = rhs2;
+  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))
+    ;
+  else
+    return false;
+
+  /* Check the recurrence.  */
+  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  dest = gimple_assign_lhs (count_stmt);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
+			build_call_expr (fn, 1, src),
+			build_int_cst (TREE_TYPE (dest), 1));
+  else
+    iter = build_call_expr (fn, 1, src);
+  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = NULL_TREE;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2591,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      if (number_of_iterations_popcount (loop, niter))
+	return true;
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;

Bin.Cheng May 31, 2018, 5:45 p.m. UTC | #15

On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Bin,

>

> Thanks for the review. Please find the revised patch based on the

> review comments.

>

> Thanks,

> Kugan

>

> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>


Hi,
Thanks very much for working.

> +/* Utility function to check if OP is defined by a stmt

> +   that is a val - 1.  If that is the case, set it to STMT.  */

> +

> +static bool

> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

This is checking if op is defined as val - 1, so name it as
ssa_defined_by_minus_one_stmt_p?

> +{

> +  if (TREE_CODE (op) == SSA_NAME

> +      && (*stmt = SSA_NAME_DEF_STMT (op))

> +      && is_gimple_assign (*stmt)

> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

> +      && val == gimple_assign_rhs1 (*stmt)

> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

> +    return true;

> +  else

> +    return false;

You can simply return the boolean condition.

> +}

> +

> +/* See if LOOP is a popcout implementation of the form

...
> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> +

> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> +    rhs1 = rhs2;

> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> +    ;

> +  else

> +    return false;

> +

> +  /* Check the recurrence.  */

> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please
use rhs1 directly.

> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

I think this is checking wrong thing and is redundant.  Either rhs2
equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,
the check duplicates checking on phi; for the latter, it's never a PHI
stmt and shouldn't be checked.

> +  if (gimple_code (phi) != GIMPLE_PHI

> +      || gimple_code (src_phi) != GIMPLE_PHI)

> +    return false;

> +

> +  dest = gimple_assign_lhs (count_stmt);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> +  if (adjust)

> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> +            build_call_expr (fn, 1, src),

> +            build_int_cst (TREE_TYPE (dest), 1));

> +  else

> +    iter = build_call_expr (fn, 1, src);

Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as
niters type.  Though unsigned type is unnecessary in this case, but
better to follow existing behavior?

> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

As richi suggested, max should be the number of bits in type of IV.

> +

> +  niter->assumptions = boolean_false_node;

Redundant.

> +  niter->control.base = NULL_TREE;

> +  niter->control.step = NULL_TREE;

> +  niter->control.no_overflow = false;

> +  niter->niter = iter;

> +  niter->assumptions = boolean_true_node;

> +  niter->may_be_zero = boolean_false_node;

> +  niter->max = max;

> +  niter->bound = NULL_TREE;

> +  niter->cmp = ERROR_MARK;

> +  return true;

> +}

> +

> +

Appology if these are nitpickings.

Thanks,
bin

Kugan Vivekanandarajah June 1, 2018, 9:56 a.m. UTC | #16

Hi Bin,

Thanks a lo for the review.

On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Bin,

>>

>> Thanks for the review. Please find the revised patch based on the

>> review comments.

>>

>> Thanks,

>> Kugan

>>

>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>

> Hi,

> Thanks very much for working.

>

>> +/* Utility function to check if OP is defined by a stmt

>> +   that is a val - 1.  If that is the case, set it to STMT.  */

>> +

>> +static bool

>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

> This is checking if op is defined as val - 1, so name it as

> ssa_defined_by_minus_one_stmt_p?

>

>> +{

>> +  if (TREE_CODE (op) == SSA_NAME

>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

>> +      && is_gimple_assign (*stmt)

>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

>> +      && val == gimple_assign_rhs1 (*stmt)

>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

>> +    return true;

>> +  else

>> +    return false;

> You can simply return the boolean condition.

Done.

>

>> +}

>> +

>> +/* See if LOOP is a popcout implementation of the form

> ...

>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> +

>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> +    rhs1 = rhs2;

>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> +    ;

>> +  else

>> +    return false;

>> +

>> +  /* Check the recurrence.  */

>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

> use rhs1 directly.


Done.
>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

> I think this is checking wrong thing and is redundant.  Either rhs2

> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

> the check duplicates checking on phi; for the latter, it's never a PHI

> stmt and shouldn't be checked.

>

>> +  if (gimple_code (phi) != GIMPLE_PHI

>> +      || gimple_code (src_phi) != GIMPLE_PHI)

>> +    return false;

>> +

>> +  dest = gimple_assign_lhs (count_stmt);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>> +  if (adjust)

>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>> +            build_call_expr (fn, 1, src),

>> +            build_int_cst (TREE_TYPE (dest), 1));

>> +  else

>> +    iter = build_call_expr (fn, 1, src);

> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

> niters type.  Though unsigned type is unnecessary in this case, but

> better to follow existing behavior?


Done.
>

>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> As richi suggested, max should be the number of bits in type of IV.

>

>> +

>> +  niter->assumptions = boolean_false_node;

> Redundant.


Not sure I understand. If I remove this,  I am getting ICE
(segmentation fault). What is the expectation here?

>> +  niter->control.base = NULL_TREE;

>> +  niter->control.step = NULL_TREE;

>> +  niter->control.no_overflow = false;

>> +  niter->niter = iter;

>> +  niter->assumptions = boolean_true_node;

>> +  niter->may_be_zero = boolean_false_node;

>> +  niter->max = max;

>> +  niter->bound = NULL_TREE;

>> +  niter->cmp = ERROR_MARK;

>> +  return true;

>> +}

>> +

>> +

> Appology if these are nitpickings.

Thanks for the review. I am happy to make the changes needed to get it
to how it should be :)

Thanks,
Kugan

>

> Thanks,

> bin
From f45179c777d846731d2d899a142c45a36ab35fd1 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I10c1f990e5b9c9900cf7c93678df924f0463b72e
---
 gcc/ipa-fnsummary.c                       |   4 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 ++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 ++++++
 gcc/tree-scalar-evolution.c               |   7 ++
 gcc/tree-ssa-loop-ivopts.c                |  10 ++
 gcc/tree-ssa-loop-niter.c                 | 152 +++++++++++++++++++++++++++++-
 7 files changed, 270 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..4dc156a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    {
+      return false;
+    }
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..52afc2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..0c69d97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..967b154 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1984,6 +1984,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3472,6 +3473,7 @@ bool
 expression_expensive_p (tree expr)
 {
   enum tree_code code;
+  tree fndecl;
 
   if (is_gimple_val (expr))
     return false;
@@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR
+      && (fndecl = get_callee_fndecl (expr))
+      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..fde9245 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,152 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  If that is the case, set it to STMT.  */
+
+static bool
+ssa_defined_by_minus_one_stmt_p (tree op, tree val, gimple **stmt)
+{
+  return (TREE_CODE (op) == SSA_NAME
+	  && (*stmt = SSA_NAME_DEF_STMT (op))
+	  && is_gimple_assign (*stmt)
+	  && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)
+	  && val == gimple_assign_rhs1 (*stmt)
+	  && integer_minus_onep (gimple_assign_rhs2 (*stmt)));
+}
+
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If popcount pattern, update NITER accordingly.
+    i.e., set NITER to  __builtin_popcount (b)
+    return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, struct tree_niter_desc *niter)
+{
+  tree rhs1, rhs2;
+  tree dest;
+  gimple *and_minus_one;
+  gimple *phi;
+  int count = 0;
+  gimple *count_stmt = NULL;
+  bool adjust = true;
+  edge exit;
+  tree iter;
+  HOST_WIDE_INT max;
+
+  if (!(exit = single_exit (loop)))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_code (stmt) != NE_EXPR
+      || !zerop (gimple_cond_rhs (stmt))
+      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  basic_block bb = exit->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  rhs1 = gimple_phi_arg_def (phi, exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;
+  count_stmt = SSA_NAME_DEF_STMT (rhs1);
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (count_stmt) == GIMPLE_PHI)
+    {
+      tree t;
+      if (gimple_code (and_stmt) != GIMPLE_PHI
+	  || gimple_bb (and_stmt) != exit->src
+	  || gimple_bb (count_stmt) != exit->src)
+	return false;
+      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      count_stmt = SSA_NAME_DEF_STMT (t);
+      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure we have a count by one.  */
+  if (!is_gimple_assign (count_stmt)
+      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
+    return false;
+
+  /* Check "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (and_stmt);
+  rhs2 = gimple_assign_rhs2 (and_stmt);
+
+  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))
+    std::swap (rhs1, rhs2);
+  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))
+    ;
+  else
+    return false;
+
+  /* Check the recurrence.  */
+  phi = SSA_NAME_DEF_STMT (rhs1);
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;
+
+  dest = gimple_assign_lhs (count_stmt);
+  tree utype = unsigned_type_for (TREE_TYPE (dest));
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+			build_call_expr (fn, 1, src),
+			build_int_cst (utype, 1));
+  else
+    iter = fold_convert (utype, build_call_expr (fn, 1, src));
+  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = NULL_TREE;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2587,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      if (number_of_iterations_popcount (loop, niter))
+	return true;
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;

Bin.Cheng June 1, 2018, 10:06 a.m. UTC | #17

On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Bin,

>

> Thanks a lo for the review.

>

> On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Bin,

>>>

>>> Thanks for the review. Please find the revised patch based on the

>>> review comments.

>>>

>>> Thanks,

>>> Kugan

>>>

>>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>

>> Hi,

>> Thanks very much for working.

>>

>>> +/* Utility function to check if OP is defined by a stmt

>>> +   that is a val - 1.  If that is the case, set it to STMT.  */

>>> +

>>> +static bool

>>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

>> This is checking if op is defined as val - 1, so name it as

>> ssa_defined_by_minus_one_stmt_p?

>>

>>> +{

>>> +  if (TREE_CODE (op) == SSA_NAME

>>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

>>> +      && is_gimple_assign (*stmt)

>>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

>>> +      && val == gimple_assign_rhs1 (*stmt)

>>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

>>> +    return true;

>>> +  else

>>> +    return false;

>> You can simply return the boolean condition.

> Done.

>

>>

>>> +}

>>> +

>>> +/* See if LOOP is a popcout implementation of the form

>> ...

>>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

>>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

>>> +

>>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>>> +    rhs1 = rhs2;

>>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>>> +    ;

>>> +  else

>>> +    return false;

>>> +

>>> +  /* Check the recurrence.  */

>>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

>> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

>> use rhs1 directly.

>

> Done.

>>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

>> I think this is checking wrong thing and is redundant.  Either rhs2

>> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

>> the check duplicates checking on phi; for the latter, it's never a PHI

>> stmt and shouldn't be checked.

>>

>>> +  if (gimple_code (phi) != GIMPLE_PHI

>>> +      || gimple_code (src_phi) != GIMPLE_PHI)

>>> +    return false;

>>> +

>>> +  dest = gimple_assign_lhs (count_stmt);

>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>>> +  if (adjust)

>>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>>> +            build_call_expr (fn, 1, src),

>>> +            build_int_cst (TREE_TYPE (dest), 1));

>>> +  else

>>> +    iter = build_call_expr (fn, 1, src);

>> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

>> niters type.  Though unsigned type is unnecessary in this case, but

>> better to follow existing behavior?

>

> Done.

>>

>>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

>> As richi suggested, max should be the number of bits in type of IV.

>>

>>> +

>>> +  niter->assumptions = boolean_false_node;

>> Redundant.

>

> Not sure I understand. If I remove this,  I am getting ICE

> (segmentation fault). What is the expectation here?

Is it a simple typo?  Because assumptions is set to boolean_true_node
just 5 lines below?
The niters part looks good for me with this change.  You may need
richi's approval for other parts?

Thanks,
bin
>

>>> +  niter->control.base = NULL_TREE;

>>> +  niter->control.step = NULL_TREE;

>>> +  niter->control.no_overflow = false;

>>> +  niter->niter = iter;

>>> +  niter->assumptions = boolean_true_node;

>>> +  niter->may_be_zero = boolean_false_node;

>>> +  niter->max = max;

>>> +  niter->bound = NULL_TREE;

>>> +  niter->cmp = ERROR_MARK;

>>> +  return true;

>>> +}

>>> +

>>> +

>> Appology if these are nitpickings.

> Thanks for the review. I am happy to make the changes needed to get it

> to how it should be :)

>

> Thanks,

> Kugan

>

>>

>> Thanks,

>> bin

Richard Biener June 1, 2018, 1:12 p.m. UTC | #18

On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
>

> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

> > Hi Bin,

> >

> > Thanks a lo for the review.

> >

> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >>> Hi Bin,

> >>>

> >>> Thanks for the review. Please find the revised patch based on the

> >>> review comments.

> >>>

> >>> Thanks,

> >>> Kugan

> >>>

> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >>>>> Hi Richard,

> >>>>>

> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >>>>>>> Hi Richard,

> >>>>>>>

> >>

> >> Hi,

> >> Thanks very much for working.

> >>

> >>> +/* Utility function to check if OP is defined by a stmt

> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

> >>> +

> >>> +static bool

> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

> >> This is checking if op is defined as val - 1, so name it as

> >> ssa_defined_by_minus_one_stmt_p?

> >>

> >>> +{

> >>> +  if (TREE_CODE (op) == SSA_NAME

> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

> >>> +      && is_gimple_assign (*stmt)

> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

> >>> +      && val == gimple_assign_rhs1 (*stmt)

> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

> >>> +    return true;

> >>> +  else

> >>> +    return false;

> >> You can simply return the boolean condition.

> > Done.

> >

> >>

> >>> +}

> >>> +

> >>> +/* See if LOOP is a popcout implementation of the form

> >> ...

> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >>> +

> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >>> +    rhs1 = rhs2;

> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >>> +    ;

> >>> +  else

> >>> +    return false;

> >>> +

> >>> +  /* Check the recurrence.  */

> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

> >> use rhs1 directly.

> >

> > Done.

> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

> >> I think this is checking wrong thing and is redundant.  Either rhs2

> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

> >> the check duplicates checking on phi; for the latter, it's never a PHI

> >> stmt and shouldn't be checked.

> >>

> >>> +  if (gimple_code (phi) != GIMPLE_PHI

> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

> >>> +    return false;

> >>> +

> >>> +  dest = gimple_assign_lhs (count_stmt);

> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> >>> +  if (adjust)

> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> >>> +            build_call_expr (fn, 1, src),

> >>> +            build_int_cst (TREE_TYPE (dest), 1));

> >>> +  else

> >>> +    iter = build_call_expr (fn, 1, src);

> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

> >> niters type.  Though unsigned type is unnecessary in this case, but

> >> better to follow existing behavior?

> >

> > Done.

> >>

> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> >> As richi suggested, max should be the number of bits in type of IV.

> >>

> >>> +

> >>> +  niter->assumptions = boolean_false_node;

> >> Redundant.

> >

> > Not sure I understand. If I remove this,  I am getting ICE

> > (segmentation fault). What is the expectation here?

> Is it a simple typo?  Because assumptions is set to boolean_true_node

> just 5 lines below?

> The niters part looks good for me with this change.  You may need

> richi's approval for other parts?


diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..4dc156a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct
ipa_node_params *info,
                                               nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    {
+      return false;
+    }
   else
     {
       debug_tree (expr);

I'd return true here instead - we don't want to track popcount() in
predicates.  Also unnecessary braces.

@@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
        return true;
     }
+  if (code == CALL_EXPR
+      && (fndecl = get_callee_fndecl (expr))
+      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
+    return false;

please use

    if (code == CALL_EXPR)
     {
         if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
           return true;
         FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
             if (expression_expensive_p (arg))
                return true;
         return false;
     }

instead.

+  /* Check "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (and_stmt);
+  rhs2 = gimple_assign_rhs2 (and_stmt);
+
+  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))
+    std::swap (rhs1, rhs2);
+  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))
+    ;
+  else
+    return false;

so the powers of match-and-simplify let you add sth like the following to
match.pd:

#if GIMPLE
(match (b_bit_and_b_minus_one @0)
 (bit_and:c @0 (plus @0 integer_minus_onep)))
#endif

and then use it like

   extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));
   if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),
&rhs1, NULL))
      return false;

note you need to explicitely declare the matcher as there's no header
file generated
for them.  It would be also the first user for this "feature" and at
some point I
considered using in-line source markup (and sth like GTY_FILES for
what files to scan)
to gather patterns.

+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  basic_block bb = exit->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      phi = gpi.phi ();
+      count++;
+    }

I don't understand why you are checking for this - isn't the number of
iterations
independent on the rest of the loop-closed PHIs?  That is, even for just

 while (b) { b = b & (b-1); }

the number of iterations is popcount (b).  So it's enough to check
the exit condition?  In fact you are coming from number_of_iterations_exit
and thus are asked for a specific exit and thus

+  if (!(exit = single_exit (loop)))
+    return false;

is premature.  That is, for

  while (b) { b = b & (b - 1); if (--c == 0) break; }

still should have popcount (b) iterations for the exit involving the
while (b) test.

So please simplify (and thus genericize) the code accordingly.  Do so
without the match.pd trick for now, I think we can simplify the current
beast down enough to not need the factored out function.

You then also can handle replacing

int c = 0;
while (b) { b = b & (b-1); c+=3; }

with 3 * popcount (b);

Richard.

> Thanks,

> bin

> >

> >>> +  niter->control.base = NULL_TREE;

> >>> +  niter->control.step = NULL_TREE;

> >>> +  niter->control.no_overflow = false;

> >>> +  niter->niter = iter;

> >>> +  niter->assumptions = boolean_true_node;

> >>> +  niter->may_be_zero = boolean_false_node;

> >>> +  niter->max = max;

> >>> +  niter->bound = NULL_TREE;

> >>> +  niter->cmp = ERROR_MARK;

> >>> +  return true;

> >>> +}

> >>> +

> >>> +

> >> Appology if these are nitpickings.

> > Thanks for the review. I am happy to make the changes needed to get it

> > to how it should be :)

> >

> > Thanks,

> > Kugan

> >

> >>

> >> Thanks,

> >> bin

Kugan Vivekanandarajah June 5, 2018, 8:59 a.m. UTC | #19

Hi Richard,

Thanks for the review.

On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

>>

>> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>> > Hi Bin,

>> >

>> > Thanks a lo for the review.

>> >

>> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

>> >> <kugan.vivekanandarajah@linaro.org> wrote:

>> >>> Hi Bin,

>> >>>

>> >>> Thanks for the review. Please find the revised patch based on the

>> >>> review comments.

>> >>>

>> >>> Thanks,

>> >>> Kugan

>> >>>

>> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >>>>> Hi Richard,

>> >>>>>

>> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >>>>>>> Hi Richard,

>> >>>>>>>

>> >>

>> >> Hi,

>> >> Thanks very much for working.

>> >>

>> >>> +/* Utility function to check if OP is defined by a stmt

>> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

>> >>> +

>> >>> +static bool

>> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

>> >> This is checking if op is defined as val - 1, so name it as

>> >> ssa_defined_by_minus_one_stmt_p?

>> >>

>> >>> +{

>> >>> +  if (TREE_CODE (op) == SSA_NAME

>> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

>> >>> +      && is_gimple_assign (*stmt)

>> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

>> >>> +      && val == gimple_assign_rhs1 (*stmt)

>> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

>> >>> +    return true;

>> >>> +  else

>> >>> +    return false;

>> >> You can simply return the boolean condition.

>> > Done.

>> >

>> >>

>> >>> +}

>> >>> +

>> >>> +/* See if LOOP is a popcout implementation of the form

>> >> ...

>> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> >>> +

>> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> >>> +    rhs1 = rhs2;

>> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> >>> +    ;

>> >>> +  else

>> >>> +    return false;

>> >>> +

>> >>> +  /* Check the recurrence.  */

>> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

>> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

>> >> use rhs1 directly.

>> >

>> > Done.

>> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

>> >> I think this is checking wrong thing and is redundant.  Either rhs2

>> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

>> >> the check duplicates checking on phi; for the latter, it's never a PHI

>> >> stmt and shouldn't be checked.

>> >>

>> >>> +  if (gimple_code (phi) != GIMPLE_PHI

>> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

>> >>> +    return false;

>> >>> +

>> >>> +  dest = gimple_assign_lhs (count_stmt);

>> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>> >>> +  if (adjust)

>> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>> >>> +            build_call_expr (fn, 1, src),

>> >>> +            build_int_cst (TREE_TYPE (dest), 1));

>> >>> +  else

>> >>> +    iter = build_call_expr (fn, 1, src);

>> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

>> >> niters type.  Though unsigned type is unnecessary in this case, but

>> >> better to follow existing behavior?

>> >

>> > Done.

>> >>

>> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

>> >> As richi suggested, max should be the number of bits in type of IV.

>> >>

>> >>> +

>> >>> +  niter->assumptions = boolean_false_node;

>> >> Redundant.

>> >

>> > Not sure I understand. If I remove this,  I am getting ICE

>> > (segmentation fault). What is the expectation here?

>> Is it a simple typo?  Because assumptions is set to boolean_true_node

>> just 5 lines below?

>> The niters part looks good for me with this change.  You may need

>> richi's approval for other parts?

>

> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

> index bdf9ba1..4dc156a 100644

> --- a/gcc/ipa-fnsummary.c

> +++ b/gcc/ipa-fnsummary.c

> @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

> ipa_node_params *info,

>                                                nonconstant_names);

>        return p2.or_with (summary->conds, p1);

>      }

> +  else if (TREE_CODE (expr) == CALL_EXPR)

> +    {

> +      return false;

> +    }

>    else

>      {

>        debug_tree (expr);

>

> I'd return true here instead - we don't want to track popcount() in

> predicates.  Also unnecessary braces.

>

> @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

>        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

>         return true;

>      }

> +  if (code == CALL_EXPR

> +      && (fndecl = get_callee_fndecl (expr))

> +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

> +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

> +    return false;

>

> please use

>

>     if (code == CALL_EXPR)

>      {

>          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

>            return true;

>          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

>              if (expression_expensive_p (arg))

>                 return true;

>          return false;

>      }

>

> instead.

Done.

>

> +  /* Check "b = b & (b - 1)" is calculated.  */

> +  if (!is_gimple_assign (and_stmt)

> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> +    return false;

> +

> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> +

> +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> +    std::swap (rhs1, rhs2);

> +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> +    ;

> +  else

> +    return false;

>

> so the powers of match-and-simplify let you add sth like the following to

> match.pd:

>

> #if GIMPLE

> (match (b_bit_and_b_minus_one @0)

>  (bit_and:c @0 (plus @0 integer_minus_onep)))

> #endif

>

> and then use it like

>

>    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

>    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

> &rhs1, NULL))

>       return false;

>

> note you need to explicitely declare the matcher as there's no header

> file generated

> for them.  It would be also the first user for this "feature" and at

> some point I

> considered using in-line source markup (and sth like GTY_FILES for

> what files to scan)

> to gather patterns.

>

> +  /* Check the loop closed SSA definition for just the variable c defined in

> +     loop.  */

> +  basic_block bb = exit->dest;

> +  for (gphi_iterator gpi = gsi_start_phis (bb);

> +       !gsi_end_p (gpi); gsi_next (&gpi))

> +    {

> +      phi = gpi.phi ();

> +      count++;

> +    }

>

> I don't understand why you are checking for this - isn't the number of

> iterations

I am trying to be overly conservative. I have removed it now and adjusted.

> independent on the rest of the loop-closed PHIs?  That is, even for just

>

>  while (b) { b = b & (b-1); }

>

> the number of iterations is popcount (b).  So it's enough to check

> the exit condition?  In fact you are coming from number_of_iterations_exit

> and thus are asked for a specific exit and thus

>

> +  if (!(exit = single_exit (loop)))

> +    return false;

Ok, changed it.


> is premature.  That is, for

>

>   while (b) { b = b & (b - 1); if (--c == 0) break; }

>

> still should have popcount (b) iterations for the exit involving the

> while (b) test.

>

> So please simplify (and thus genericize) the code accordingly.  Do so

> without the match.pd trick for now, I think we can simplify the current

> beast down enough to not need the factored out function.

I am not using match.pd changes as you have asked. Please let me know
if you want that to be included as well.

>

> You then also can handle replacing

>

> int c = 0;

> while (b) { b = b & (b-1); c+=3; }

>

> with 3 * popcount (b);

Made to handle this too. But, there are still cases we don't handle. I
am not sure if it is worth the complexity handling all the possible
cases.

Is the attached patch looks better now?

Thanks,
Kugan


>

> Richard.

>

>> Thanks,

>> bin

>> >

>> >>> +  niter->control.base = NULL_TREE;

>> >>> +  niter->control.step = NULL_TREE;

>> >>> +  niter->control.no_overflow = false;

>> >>> +  niter->niter = iter;

>> >>> +  niter->assumptions = boolean_true_node;

>> >>> +  niter->may_be_zero = boolean_false_node;

>> >>> +  niter->max = max;

>> >>> +  niter->bound = NULL_TREE;

>> >>> +  niter->cmp = ERROR_MARK;

>> >>> +  return true;

>> >>> +}

>> >>> +

>> >>> +

>> >> Appology if these are nitpickings.

>> > Thanks for the review. I am happy to make the changes needed to get it

>> > to how it should be :)

>> >

>> > Thanks,

>> > Kugan

>> >

>> >>

>> >> Thanks,

>> >> bin
From a21020831f44915c69e7a49692ba1e98b7be3760 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I383178e01051c2929647af474c7e77a5781373f0
---
 gcc/ipa-fnsummary.c                       |   2 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 ++++++
 gcc/tree-scalar-evolution.c               |  15 ++++
 gcc/tree-ssa-loop-ivopts.c                |  10 +++
 gcc/tree-ssa-loop-niter.c                 | 143 +++++++++++++++++++++++++++++-
 7 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..feb1c9e 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,8 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    return true;
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..52afc2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..0c69d97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..4b0ec02 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -281,6 +281,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
+#include "builtins.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -1984,6 +1985,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3493,6 +3495,19 @@ expression_expensive_p (tree expr)
 	return true;
     }
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+
+      if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
+	return true;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (expression_expensive_p (arg))
+	  return true;
+      return false;
+    }
+
   switch (TREE_CODE_CLASS (code))
     {
     case tcc_binary:
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..34c7d15 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,143 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  */
+
+static bool
+ssa_defined_by_minus_one_stmt_p (tree op, tree val)
+{
+  gimple *stmt;
+  return (TREE_CODE (op) == SSA_NAME
+	  && (stmt = SSA_NAME_DEF_STMT (op))
+	  && is_gimple_assign (stmt)
+	  && (gimple_assign_rhs_code (stmt) == PLUS_EXPR)
+	  && val == gimple_assign_rhs1 (stmt)
+	  && integer_minus_onep (gimple_assign_rhs2 (stmt)));
+}
+
+
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If popcount pattern, update NITER accordingly.
+    i.e., set NITER to  __builtin_popcount (b)
+    return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, edge exit,
+			       struct tree_niter_desc *niter)
+{
+  tree rhs1, rhs2;
+  tree dest;
+  bool adjust = true;
+  tree iter;
+  HOST_WIDE_INT max;
+  tree times;
+  adjust = true;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_code (stmt) != NE_EXPR
+      || !zerop (gimple_cond_rhs (stmt))
+      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  gphi_iterator gpi = gsi_start_phis (exit->dest);
+  if (gsi_end_p (gpi))
+    return false;
+  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;
+
+  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (count_stmt) == GIMPLE_PHI
+      && gimple_code (and_stmt) == GIMPLE_PHI
+      && gimple_bb (and_stmt) == exit->src
+      && gimple_bb (count_stmt) == exit->src
+      && (TREE_CODE (gimple_phi_arg_def (count_stmt,
+					 loop_latch_edge (loop)->dest_idx))
+	  == SSA_NAME)
+      && (TREE_CODE (gimple_phi_arg_def (and_stmt,
+					 loop_latch_edge (loop)->dest_idx))
+	  == SSA_NAME))
+    {
+      tree t;
+      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
+      count_stmt = SSA_NAME_DEF_STMT (t);
+      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure we have a count by one.  */
+  if (!is_gimple_assign (count_stmt)
+      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
+      || TREE_CODE (gimple_assign_rhs2 (count_stmt)) != INTEGER_CST)
+    return false;
+  times = gimple_assign_rhs2 (count_stmt);
+
+  /* Check "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (and_stmt);
+  rhs2 = gimple_assign_rhs2 (and_stmt);
+
+  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2))
+    std::swap (rhs1, rhs2);
+  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1))
+    ;
+  else
+    return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (rhs1);
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;
+
+  dest = gimple_assign_lhs (count_stmt);
+  tree utype = unsigned_type_for (TREE_TYPE (dest));
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+			build_call_expr (fn, 1, src),
+			build_int_cst (utype, 1));
+  else
+    iter = fold_convert (utype, build_call_expr (fn, 1, src));
+  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = times;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      if (number_of_iterations_popcount (loop, exit, niter))
+	return true;
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;

Richard Biener June 5, 2018, 11:25 a.m. UTC | #20

On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>

> Hi Richard,

>

> Thanks for the review.

>

> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:

> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

> >>

> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> > Hi Bin,

> >> >

> >> > Thanks a lo for the review.

> >> >

> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >>> Hi Bin,

> >> >>>

> >> >>> Thanks for the review. Please find the revised patch based on the

> >> >>> review comments.

> >> >>>

> >> >>> Thanks,

> >> >>> Kugan

> >> >>>

> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >>>>> Hi Richard,

> >> >>>>>

> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >>>>>>> Hi Richard,

> >> >>>>>>>

> >> >>

> >> >> Hi,

> >> >> Thanks very much for working.

> >> >>

> >> >>> +/* Utility function to check if OP is defined by a stmt

> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

> >> >>> +

> >> >>> +static bool

> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

> >> >> This is checking if op is defined as val - 1, so name it as

> >> >> ssa_defined_by_minus_one_stmt_p?

> >> >>

> >> >>> +{

> >> >>> +  if (TREE_CODE (op) == SSA_NAME

> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

> >> >>> +      && is_gimple_assign (*stmt)

> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

> >> >>> +      && val == gimple_assign_rhs1 (*stmt)

> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

> >> >>> +    return true;

> >> >>> +  else

> >> >>> +    return false;

> >> >> You can simply return the boolean condition.

> >> > Done.

> >> >

> >> >>

> >> >>> +}

> >> >>> +

> >> >>> +/* See if LOOP is a popcout implementation of the form

> >> >> ...

> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >> >>> +

> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >> >>> +    rhs1 = rhs2;

> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >> >>> +    ;

> >> >>> +  else

> >> >>> +    return false;

> >> >>> +

> >> >>> +  /* Check the recurrence.  */

> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

> >> >> use rhs1 directly.

> >> >

> >> > Done.

> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

> >> >> I think this is checking wrong thing and is redundant.  Either rhs2

> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

> >> >> the check duplicates checking on phi; for the latter, it's never a PHI

> >> >> stmt and shouldn't be checked.

> >> >>

> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI

> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

> >> >>> +    return false;

> >> >>> +

> >> >>> +  dest = gimple_assign_lhs (count_stmt);

> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> >> >>> +  if (adjust)

> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> >> >>> +            build_call_expr (fn, 1, src),

> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));

> >> >>> +  else

> >> >>> +    iter = build_call_expr (fn, 1, src);

> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

> >> >> niters type.  Though unsigned type is unnecessary in this case, but

> >> >> better to follow existing behavior?

> >> >

> >> > Done.

> >> >>

> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> >> >> As richi suggested, max should be the number of bits in type of IV.

> >> >>

> >> >>> +

> >> >>> +  niter->assumptions = boolean_false_node;

> >> >> Redundant.

> >> >

> >> > Not sure I understand. If I remove this,  I am getting ICE

> >> > (segmentation fault). What is the expectation here?

> >> Is it a simple typo?  Because assumptions is set to boolean_true_node

> >> just 5 lines below?

> >> The niters part looks good for me with this change.  You may need

> >> richi's approval for other parts?

> >

> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

> > index bdf9ba1..4dc156a 100644

> > --- a/gcc/ipa-fnsummary.c

> > +++ b/gcc/ipa-fnsummary.c

> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

> > ipa_node_params *info,

> >                                                nonconstant_names);

> >        return p2.or_with (summary->conds, p1);

> >      }

> > +  else if (TREE_CODE (expr) == CALL_EXPR)

> > +    {

> > +      return false;

> > +    }

> >    else

> >      {

> >        debug_tree (expr);

> >

> > I'd return true here instead - we don't want to track popcount() in

> > predicates.  Also unnecessary braces.

> >

> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

> >         return true;

> >      }

> > +  if (code == CALL_EXPR

> > +      && (fndecl = get_callee_fndecl (expr))

> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

> > +    return false;

> >

> > please use

> >

> >     if (code == CALL_EXPR)

> >      {

> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

> >            return true;

> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

> >              if (expression_expensive_p (arg))

> >                 return true;

> >          return false;

> >      }

> >

> > instead.

> Done.

>

> >

> > +  /* Check "b = b & (b - 1)" is calculated.  */

> > +  if (!is_gimple_assign (and_stmt)

> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> > +    return false;

> > +

> > +  rhs1 = gimple_assign_rhs1 (and_stmt);

> > +  rhs2 = gimple_assign_rhs2 (and_stmt);

> > +

> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> > +    std::swap (rhs1, rhs2);

> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> > +    ;

> > +  else

> > +    return false;

> >

> > so the powers of match-and-simplify let you add sth like the following to

> > match.pd:

> >

> > #if GIMPLE

> > (match (b_bit_and_b_minus_one @0)

> >  (bit_and:c @0 (plus @0 integer_minus_onep)))

> > #endif

> >

> > and then use it like

> >

> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

> > &rhs1, NULL))

> >       return false;

> >

> > note you need to explicitely declare the matcher as there's no header

> > file generated

> > for them.  It would be also the first user for this "feature" and at

> > some point I

> > considered using in-line source markup (and sth like GTY_FILES for

> > what files to scan)

> > to gather patterns.

> >

> > +  /* Check the loop closed SSA definition for just the variable c defined in

> > +     loop.  */

> > +  basic_block bb = exit->dest;

> > +  for (gphi_iterator gpi = gsi_start_phis (bb);

> > +       !gsi_end_p (gpi); gsi_next (&gpi))

> > +    {

> > +      phi = gpi.phi ();

> > +      count++;

> > +    }

> >

> > I don't understand why you are checking for this - isn't the number of

> > iterations

> I am trying to be overly conservative. I have removed it now and adjusted.

>

> > independent on the rest of the loop-closed PHIs?  That is, even for just

> >

> >  while (b) { b = b & (b-1); }

> >

> > the number of iterations is popcount (b).  So it's enough to check

> > the exit condition?  In fact you are coming from number_of_iterations_exit

> > and thus are asked for a specific exit and thus

> >

> > +  if (!(exit = single_exit (loop)))

> > +    return false;

> Ok, changed it.

>

>

> > is premature.  That is, for

> >

> >   while (b) { b = b & (b - 1); if (--c == 0) break; }

> >

> > still should have popcount (b) iterations for the exit involving the

> > while (b) test.

> >

> > So please simplify (and thus genericize) the code accordingly.  Do so

> > without the match.pd trick for now, I think we can simplify the current

> > beast down enough to not need the factored out function.

> I am not using match.pd changes as you have asked. Please let me know

> if you want that to be included as well.

>

> >

> > You then also can handle replacing

> >

> > int c = 0;

> > while (b) { b = b & (b-1); c+=3; }

> >

> > with 3 * popcount (b);

> Made to handle this too. But, there are still cases we don't handle. I

> am not sure if it is worth the complexity handling all the possible

> cases.

>

> Is the attached patch looks better now?


+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt

this should now look at exit->src instead of loop->header.

+      || !zerop (gimple_cond_rhs (stmt))

use integer_zerop

+  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);

you still didn't remove the whole count-stmt stuff.  It's unrelated
and is not required at all.  Only the GIMPLE_COND lhs def-use
cycle is interesting for niter purposes.

Please adjust accordingly.

@@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
                                              &stmt, every_iteration))
-    return false;
+    {
+      if (number_of_iterations_popcount (loop, exit, niter))
+       return true;
+      return false;
+    }

this should be done in number_of_iterations_exit_assumptions.  I realize
you want to do it only if that fails but in that process you have to be
sure to properly handle every_iteration & friends which is much easier
to do in number_of_iterations_exit_assumptions.  So please put it
where that fails for this kind of IVs:

  tree iv0_niters = NULL_TREE;
  if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
                              op0, &iv0, safe ? &iv0_niters : NULL, false))
^^^ here
    return false;


Thanks,
Richard.

> Thanks,

> Kugan

>

>

> >

> > Richard.

> >

> >> Thanks,

> >> bin

> >> >

> >> >>> +  niter->control.base = NULL_TREE;

> >> >>> +  niter->control.step = NULL_TREE;

> >> >>> +  niter->control.no_overflow = false;

> >> >>> +  niter->niter = iter;

> >> >>> +  niter->assumptions = boolean_true_node;

> >> >>> +  niter->may_be_zero = boolean_false_node;

> >> >>> +  niter->max = max;

> >> >>> +  niter->bound = NULL_TREE;

> >> >>> +  niter->cmp = ERROR_MARK;

> >> >>> +  return true;

> >> >>> +}

> >> >>> +

> >> >>> +

> >> >> Appology if these are nitpickings.

> >> > Thanks for the review. I am happy to make the changes needed to get it

> >> > to how it should be :)

> >> >

> >> > Thanks,

> >> > Kugan

> >> >

> >> >>

> >> >> Thanks,

> >> >> bin

Kugan Vivekanandarajah June 7, 2018, 12:51 a.m. UTC | #21

Hi Richard,

Thanks for the review.

On 5 June 2018 at 21:25, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>>

>> Hi Richard,

>>

>> Thanks for the review.

>>

>> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:

>> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >>

>> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

>> >> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> > Hi Bin,

>> >> >

>> >> > Thanks a lo for the review.

>> >> >

>> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

>> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >>> Hi Bin,

>> >> >>>

>> >> >>> Thanks for the review. Please find the revised patch based on the

>> >> >>> review comments.

>> >> >>>

>> >> >>> Thanks,

>> >> >>> Kugan

>> >> >>>

>> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >>>>> Hi Richard,

>> >> >>>>>

>> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >>>>>>> Hi Richard,

>> >> >>>>>>>

>> >> >>

>> >> >> Hi,

>> >> >> Thanks very much for working.

>> >> >>

>> >> >>> +/* Utility function to check if OP is defined by a stmt

>> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

>> >> >>> +

>> >> >>> +static bool

>> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

>> >> >> This is checking if op is defined as val - 1, so name it as

>> >> >> ssa_defined_by_minus_one_stmt_p?

>> >> >>

>> >> >>> +{

>> >> >>> +  if (TREE_CODE (op) == SSA_NAME

>> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

>> >> >>> +      && is_gimple_assign (*stmt)

>> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

>> >> >>> +      && val == gimple_assign_rhs1 (*stmt)

>> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

>> >> >>> +    return true;

>> >> >>> +  else

>> >> >>> +    return false;

>> >> >> You can simply return the boolean condition.

>> >> > Done.

>> >> >

>> >> >>

>> >> >>> +}

>> >> >>> +

>> >> >>> +/* See if LOOP is a popcout implementation of the form

>> >> >> ...

>> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> >> >>> +

>> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> >> >>> +    rhs1 = rhs2;

>> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> >> >>> +    ;

>> >> >>> +  else

>> >> >>> +    return false;

>> >> >>> +

>> >> >>> +  /* Check the recurrence.  */

>> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

>> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

>> >> >> use rhs1 directly.

>> >> >

>> >> > Done.

>> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

>> >> >> I think this is checking wrong thing and is redundant.  Either rhs2

>> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

>> >> >> the check duplicates checking on phi; for the latter, it's never a PHI

>> >> >> stmt and shouldn't be checked.

>> >> >>

>> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI

>> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

>> >> >>> +    return false;

>> >> >>> +

>> >> >>> +  dest = gimple_assign_lhs (count_stmt);

>> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>> >> >>> +  if (adjust)

>> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>> >> >>> +            build_call_expr (fn, 1, src),

>> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));

>> >> >>> +  else

>> >> >>> +    iter = build_call_expr (fn, 1, src);

>> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

>> >> >> niters type.  Though unsigned type is unnecessary in this case, but

>> >> >> better to follow existing behavior?

>> >> >

>> >> > Done.

>> >> >>

>> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

>> >> >> As richi suggested, max should be the number of bits in type of IV.

>> >> >>

>> >> >>> +

>> >> >>> +  niter->assumptions = boolean_false_node;

>> >> >> Redundant.

>> >> >

>> >> > Not sure I understand. If I remove this,  I am getting ICE

>> >> > (segmentation fault). What is the expectation here?

>> >> Is it a simple typo?  Because assumptions is set to boolean_true_node

>> >> just 5 lines below?

>> >> The niters part looks good for me with this change.  You may need

>> >> richi's approval for other parts?

>> >

>> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

>> > index bdf9ba1..4dc156a 100644

>> > --- a/gcc/ipa-fnsummary.c

>> > +++ b/gcc/ipa-fnsummary.c

>> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

>> > ipa_node_params *info,

>> >                                                nonconstant_names);

>> >        return p2.or_with (summary->conds, p1);

>> >      }

>> > +  else if (TREE_CODE (expr) == CALL_EXPR)

>> > +    {

>> > +      return false;

>> > +    }

>> >    else

>> >      {

>> >        debug_tree (expr);

>> >

>> > I'd return true here instead - we don't want to track popcount() in

>> > predicates.  Also unnecessary braces.

>> >

>> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

>> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

>> >         return true;

>> >      }

>> > +  if (code == CALL_EXPR

>> > +      && (fndecl = get_callee_fndecl (expr))

>> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

>> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

>> > +    return false;

>> >

>> > please use

>> >

>> >     if (code == CALL_EXPR)

>> >      {

>> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

>> >            return true;

>> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

>> >              if (expression_expensive_p (arg))

>> >                 return true;

>> >          return false;

>> >      }

>> >

>> > instead.

>> Done.

>>

>> >

>> > +  /* Check "b = b & (b - 1)" is calculated.  */

>> > +  if (!is_gimple_assign (and_stmt)

>> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

>> > +    return false;

>> > +

>> > +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> > +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> > +

>> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> > +    std::swap (rhs1, rhs2);

>> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> > +    ;

>> > +  else

>> > +    return false;

>> >

>> > so the powers of match-and-simplify let you add sth like the following to

>> > match.pd:

>> >

>> > #if GIMPLE

>> > (match (b_bit_and_b_minus_one @0)

>> >  (bit_and:c @0 (plus @0 integer_minus_onep)))

>> > #endif

>> >

>> > and then use it like

>> >

>> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

>> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

>> > &rhs1, NULL))

>> >       return false;

>> >

>> > note you need to explicitely declare the matcher as there's no header

>> > file generated

>> > for them.  It would be also the first user for this "feature" and at

>> > some point I

>> > considered using in-line source markup (and sth like GTY_FILES for

>> > what files to scan)

>> > to gather patterns.

>> >

>> > +  /* Check the loop closed SSA definition for just the variable c defined in

>> > +     loop.  */

>> > +  basic_block bb = exit->dest;

>> > +  for (gphi_iterator gpi = gsi_start_phis (bb);

>> > +       !gsi_end_p (gpi); gsi_next (&gpi))

>> > +    {

>> > +      phi = gpi.phi ();

>> > +      count++;

>> > +    }

>> >

>> > I don't understand why you are checking for this - isn't the number of

>> > iterations

>> I am trying to be overly conservative. I have removed it now and adjusted.

>>

>> > independent on the rest of the loop-closed PHIs?  That is, even for just

>> >

>> >  while (b) { b = b & (b-1); }

>> >

>> > the number of iterations is popcount (b).  So it's enough to check

>> > the exit condition?  In fact you are coming from number_of_iterations_exit

>> > and thus are asked for a specific exit and thus

>> >

>> > +  if (!(exit = single_exit (loop)))

>> > +    return false;

>> Ok, changed it.

>>

>>

>> > is premature.  That is, for

>> >

>> >   while (b) { b = b & (b - 1); if (--c == 0) break; }

>> >

>> > still should have popcount (b) iterations for the exit involving the

>> > while (b) test.

>> >

>> > So please simplify (and thus genericize) the code accordingly.  Do so

>> > without the match.pd trick for now, I think we can simplify the current

>> > beast down enough to not need the factored out function.

>> I am not using match.pd changes as you have asked. Please let me know

>> if you want that to be included as well.

>>

>> >

>> > You then also can handle replacing

>> >

>> > int c = 0;

>> > while (b) { b = b & (b-1); c+=3; }

>> >

>> > with 3 * popcount (b);

>> Made to handle this too. But, there are still cases we don't handle. I

>> am not sure if it is worth the complexity handling all the possible

>> cases.

>>

>> Is the attached patch looks better now?

>

> +  /* Check loop terminating branch is like

> +     if (b != 0).  */

> +  gimple *stmt = last_stmt (loop->header);

> +  if (!stmt

>

> this should now look at exit->src instead of loop->header.

Done.

>

> +      || !zerop (gimple_cond_rhs (stmt))

>

> use integer_zerop

Done.

>

> +  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);

>

> you still didn't remove the whole count-stmt stuff.  It's unrelated

> and is not required at all.  Only the GIMPLE_COND lhs def-use

> cycle is interesting for niter purposes.

Changed it.

>

> Please adjust accordingly.

>

> @@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,

>    gcond *stmt;

>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

>                                               &stmt, every_iteration))

> -    return false;

> +    {

> +      if (number_of_iterations_popcount (loop, exit, niter))

> +       return true;

> +      return false;

> +    }

>

> this should be done in number_of_iterations_exit_assumptions.  I realize

> you want to do it only if that fails but in that process you have to be

> sure to properly handle every_iteration & friends which is much easier

> to do in number_of_iterations_exit_assumptions.  So please put it

> where that fails for this kind of IVs:

>

>   tree iv0_niters = NULL_TREE;

>   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),

>                               op0, &iv0, safe ? &iv0_niters : NULL, false))

> ^^^ here

>     return false;


Done.

Is this OK now. Regression tested and bootstrapped on x86-linux-gnu
with no new regressions.

Thanks,
Kugan
>

>

> Thanks,

> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>> >

>> > Richard.

>> >

>> >> Thanks,

>> >> bin

>> >> >

>> >> >>> +  niter->control.base = NULL_TREE;

>> >> >>> +  niter->control.step = NULL_TREE;

>> >> >>> +  niter->control.no_overflow = false;

>> >> >>> +  niter->niter = iter;

>> >> >>> +  niter->assumptions = boolean_true_node;

>> >> >>> +  niter->may_be_zero = boolean_false_node;

>> >> >>> +  niter->max = max;

>> >> >>> +  niter->bound = NULL_TREE;

>> >> >>> +  niter->cmp = ERROR_MARK;

>> >> >>> +  return true;

>> >> >>> +}

>> >> >>> +

>> >> >>> +

>> >> >> Appology if these are nitpickings.

>> >> > Thanks for the review. I am happy to make the changes needed to get it

>> >> > to how it should be :)

>> >> >

>> >> > Thanks,

>> >> > Kugan

>> >> >

>> >> >>

>> >> >> Thanks,

>> >> >> bin
From 93723ba9de05b8f065f6d0f98bb5536a76215211 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I3f75f807c1b3ed2bbfbd7ab8455cb28a28c68b1a
---
 gcc/ipa-fnsummary.c                       |   2 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 ++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 +++++++
 gcc/tree-scalar-evolution.c               |  15 ++++
 gcc/tree-ssa-loop-ivopts.c                |  10 +++
 gcc/tree-ssa-loop-niter.c                 | 125 +++++++++++++++++++++++++++++-
 7 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..feb1c9e 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,8 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    return true;
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..362d835
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..9096c6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..4b0ec02 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -281,6 +281,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
+#include "builtins.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -1984,6 +1985,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3493,6 +3495,19 @@ expression_expensive_p (tree expr)
 	return true;
     }
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+
+      if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
+	return true;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (expression_expensive_p (arg))
+	  return true;
+      return false;
+    }
+
   switch (TREE_CODE_CLASS (code))
     {
     case tcc_binary:
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..5064681 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -63,6 +63,9 @@ struct bounds
   mpz_t below, up;
 };
 
+static bool number_of_iterations_popcount (loop_p loop, edge exit,
+					   struct tree_niter_desc *niter);
+
 
 /* Splits expression EXPR to a variable part VAR and constant OFFSET.  */
 
@@ -2357,7 +2360,7 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
 			      op0, &iv0, safe ? &iv0_niters : NULL, false))
-    return false;
+    return number_of_iterations_popcount (loop, exit, niter);
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
 			      op1, &iv1, safe ? &iv1_niters : NULL, false))
@@ -2430,6 +2433,126 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  */
+
+static bool
+ssa_defined_by_minus_one_stmt_p (tree op, tree val)
+{
+  gimple *stmt;
+  return (TREE_CODE (op) == SSA_NAME
+	  && (stmt = SSA_NAME_DEF_STMT (op))
+	  && is_gimple_assign (stmt)
+	  && (gimple_assign_rhs_code (stmt) == PLUS_EXPR)
+	  && val == gimple_assign_rhs1 (stmt)
+	  && integer_minus_onep (gimple_assign_rhs2 (stmt)));
+}
+
+
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If popcount pattern, update NITER accordingly.
+    i.e., set NITER to  __builtin_popcount (b)
+    return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, edge exit,
+			       struct tree_niter_desc *niter)
+{
+  tree rhs1, rhs2;
+  bool adjust = true;
+  tree iter;
+  HOST_WIDE_INT max;
+  adjust = true;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (exit->src);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_code (stmt) != NE_EXPR
+      || !integer_zerop (gimple_cond_rhs (stmt))
+      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  gphi_iterator gpi = gsi_start_phis (exit->dest);
+  if (gsi_end_p (gpi))
+    return false;
+  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;
+
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (and_stmt) == GIMPLE_PHI
+      && gimple_bb (and_stmt) == exit->src
+      && gimple_phi_num_args (and_stmt) == 2
+      && (TREE_CODE (gimple_phi_arg_def (and_stmt,
+					 loop_latch_edge (loop)->dest_idx))
+	  == SSA_NAME))
+    {
+      tree t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Check "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (and_stmt);
+  rhs2 = gimple_assign_rhs2 (and_stmt);
+
+  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2))
+    std::swap (rhs1, rhs2);
+  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1))
+    ;
+  else
+    return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (rhs1);
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;
+
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  tree call = build_call_expr (fn, 1, src);
+  tree utype = unsigned_type_for (TREE_TYPE (call));
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+			build_call_expr (fn, 1, src),
+			build_int_cst (utype, 1));
+  else
+    iter = fold_convert (utype, build_call_expr (fn, 1, src));
+  max = int_cst_value (TYPE_MAX_VALUE (utype));
+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = NULL_TREE;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */

Richard Biener June 7, 2018, 9:21 a.m. UTC | #22

On Thu, Jun 7, 2018 at 2:52 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>

> Hi Richard,

>

> Thanks for the review.

>

> On 5 June 2018 at 21:25, Richard Biener <richard.guenther@gmail.com> wrote:

> > On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah

> > <kugan.vivekanandarajah@linaro.org> wrote:

> >>

> >> Hi Richard,

> >>

> >> Thanks for the review.

> >>

> >> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:

> >> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >>

> >> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> > Hi Bin,

> >> >> >

> >> >> > Thanks a lo for the review.

> >> >> >

> >> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

> >> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >>> Hi Bin,

> >> >> >>>

> >> >> >>> Thanks for the review. Please find the revised patch based on the

> >> >> >>> review comments.

> >> >> >>>

> >> >> >>> Thanks,

> >> >> >>> Kugan

> >> >> >>>

> >> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

> >> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >>>>> Hi Richard,

> >> >> >>>>>

> >> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

> >> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> >> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >>>>>>> Hi Richard,

> >> >> >>>>>>>

> >> >> >>

> >> >> >> Hi,

> >> >> >> Thanks very much for working.

> >> >> >>

> >> >> >>> +/* Utility function to check if OP is defined by a stmt

> >> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

> >> >> >>> +

> >> >> >>> +static bool

> >> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

> >> >> >> This is checking if op is defined as val - 1, so name it as

> >> >> >> ssa_defined_by_minus_one_stmt_p?

> >> >> >>

> >> >> >>> +{

> >> >> >>> +  if (TREE_CODE (op) == SSA_NAME

> >> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

> >> >> >>> +      && is_gimple_assign (*stmt)

> >> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

> >> >> >>> +      && val == gimple_assign_rhs1 (*stmt)

> >> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

> >> >> >>> +    return true;

> >> >> >>> +  else

> >> >> >>> +    return false;

> >> >> >> You can simply return the boolean condition.

> >> >> > Done.

> >> >> >

> >> >> >>

> >> >> >>> +}

> >> >> >>> +

> >> >> >>> +/* See if LOOP is a popcout implementation of the form

> >> >> >> ...

> >> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >> >> >>> +

> >> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >> >> >>> +    rhs1 = rhs2;

> >> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >> >> >>> +    ;

> >> >> >>> +  else

> >> >> >>> +    return false;

> >> >> >>> +

> >> >> >>> +  /* Check the recurrence.  */

> >> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> >> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

> >> >> >> use rhs1 directly.

> >> >> >

> >> >> > Done.

> >> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

> >> >> >> I think this is checking wrong thing and is redundant.  Either rhs2

> >> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

> >> >> >> the check duplicates checking on phi; for the latter, it's never a PHI

> >> >> >> stmt and shouldn't be checked.

> >> >> >>

> >> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI

> >> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

> >> >> >>> +    return false;

> >> >> >>> +

> >> >> >>> +  dest = gimple_assign_lhs (count_stmt);

> >> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> >> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> >> >> >>> +  if (adjust)

> >> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> >> >> >>> +            build_call_expr (fn, 1, src),

> >> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));

> >> >> >>> +  else

> >> >> >>> +    iter = build_call_expr (fn, 1, src);

> >> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

> >> >> >> niters type.  Though unsigned type is unnecessary in this case, but

> >> >> >> better to follow existing behavior?

> >> >> >

> >> >> > Done.

> >> >> >>

> >> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> >> >> >> As richi suggested, max should be the number of bits in type of IV.

> >> >> >>

> >> >> >>> +

> >> >> >>> +  niter->assumptions = boolean_false_node;

> >> >> >> Redundant.

> >> >> >

> >> >> > Not sure I understand. If I remove this,  I am getting ICE

> >> >> > (segmentation fault). What is the expectation here?

> >> >> Is it a simple typo?  Because assumptions is set to boolean_true_node

> >> >> just 5 lines below?

> >> >> The niters part looks good for me with this change.  You may need

> >> >> richi's approval for other parts?

> >> >

> >> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

> >> > index bdf9ba1..4dc156a 100644

> >> > --- a/gcc/ipa-fnsummary.c

> >> > +++ b/gcc/ipa-fnsummary.c

> >> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

> >> > ipa_node_params *info,

> >> >                                                nonconstant_names);

> >> >        return p2.or_with (summary->conds, p1);

> >> >      }

> >> > +  else if (TREE_CODE (expr) == CALL_EXPR)

> >> > +    {

> >> > +      return false;

> >> > +    }

> >> >    else

> >> >      {

> >> >        debug_tree (expr);

> >> >

> >> > I'd return true here instead - we don't want to track popcount() in

> >> > predicates.  Also unnecessary braces.

> >> >

> >> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

> >> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

> >> >         return true;

> >> >      }

> >> > +  if (code == CALL_EXPR

> >> > +      && (fndecl = get_callee_fndecl (expr))

> >> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

> >> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

> >> > +    return false;

> >> >

> >> > please use

> >> >

> >> >     if (code == CALL_EXPR)

> >> >      {

> >> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

> >> >            return true;

> >> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

> >> >              if (expression_expensive_p (arg))

> >> >                 return true;

> >> >          return false;

> >> >      }

> >> >

> >> > instead.

> >> Done.

> >>

> >> >

> >> > +  /* Check "b = b & (b - 1)" is calculated.  */

> >> > +  if (!is_gimple_assign (and_stmt)

> >> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> >> > +    return false;

> >> > +

> >> > +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >> > +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >> > +

> >> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >> > +    std::swap (rhs1, rhs2);

> >> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >> > +    ;

> >> > +  else

> >> > +    return false;

> >> >

> >> > so the powers of match-and-simplify let you add sth like the following to

> >> > match.pd:

> >> >

> >> > #if GIMPLE

> >> > (match (b_bit_and_b_minus_one @0)

> >> >  (bit_and:c @0 (plus @0 integer_minus_onep)))

> >> > #endif

> >> >

> >> > and then use it like

> >> >

> >> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

> >> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

> >> > &rhs1, NULL))

> >> >       return false;

> >> >

> >> > note you need to explicitely declare the matcher as there's no header

> >> > file generated

> >> > for them.  It would be also the first user for this "feature" and at

> >> > some point I

> >> > considered using in-line source markup (and sth like GTY_FILES for

> >> > what files to scan)

> >> > to gather patterns.

> >> >

> >> > +  /* Check the loop closed SSA definition for just the variable c defined in

> >> > +     loop.  */

> >> > +  basic_block bb = exit->dest;

> >> > +  for (gphi_iterator gpi = gsi_start_phis (bb);

> >> > +       !gsi_end_p (gpi); gsi_next (&gpi))

> >> > +    {

> >> > +      phi = gpi.phi ();

> >> > +      count++;

> >> > +    }

> >> >

> >> > I don't understand why you are checking for this - isn't the number of

> >> > iterations

> >> I am trying to be overly conservative. I have removed it now and adjusted.

> >>

> >> > independent on the rest of the loop-closed PHIs?  That is, even for just

> >> >

> >> >  while (b) { b = b & (b-1); }

> >> >

> >> > the number of iterations is popcount (b).  So it's enough to check

> >> > the exit condition?  In fact you are coming from number_of_iterations_exit

> >> > and thus are asked for a specific exit and thus

> >> >

> >> > +  if (!(exit = single_exit (loop)))

> >> > +    return false;

> >> Ok, changed it.

> >>

> >>

> >> > is premature.  That is, for

> >> >

> >> >   while (b) { b = b & (b - 1); if (--c == 0) break; }

> >> >

> >> > still should have popcount (b) iterations for the exit involving the

> >> > while (b) test.

> >> >

> >> > So please simplify (and thus genericize) the code accordingly.  Do so

> >> > without the match.pd trick for now, I think we can simplify the current

> >> > beast down enough to not need the factored out function.

> >> I am not using match.pd changes as you have asked. Please let me know

> >> if you want that to be included as well.

> >>

> >> >

> >> > You then also can handle replacing

> >> >

> >> > int c = 0;

> >> > while (b) { b = b & (b-1); c+=3; }

> >> >

> >> > with 3 * popcount (b);

> >> Made to handle this too. But, there are still cases we don't handle. I

> >> am not sure if it is worth the complexity handling all the possible

> >> cases.

> >>

> >> Is the attached patch looks better now?

> >

> > +  /* Check loop terminating branch is like

> > +     if (b != 0).  */

> > +  gimple *stmt = last_stmt (loop->header);

> > +  if (!stmt

> >

> > this should now look at exit->src instead of loop->header.

> Done.

>

> >

> > +      || !zerop (gimple_cond_rhs (stmt))

> >

> > use integer_zerop

> Done.

>

> >

> > +  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);

> >

> > you still didn't remove the whole count-stmt stuff.  It's unrelated

> > and is not required at all.  Only the GIMPLE_COND lhs def-use

> > cycle is interesting for niter purposes.

> Changed it.

>

> >

> > Please adjust accordingly.

> >

> > @@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,

> >    gcond *stmt;

> >    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

> >                                               &stmt, every_iteration))

> > -    return false;

> > +    {

> > +      if (number_of_iterations_popcount (loop, exit, niter))

> > +       return true;

> > +      return false;

> > +    }

> >

> > this should be done in number_of_iterations_exit_assumptions.  I realize

> > you want to do it only if that fails but in that process you have to be

> > sure to properly handle every_iteration & friends which is much easier

> > to do in number_of_iterations_exit_assumptions.  So please put it

> > where that fails for this kind of IVs:

> >

> >   tree iv0_niters = NULL_TREE;

> >   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),

> >                               op0, &iv0, safe ? &iv0_niters : NULL, false))

> > ^^^ here

> >     return false;

>

> Done.

>

> Is this OK now. Regression tested and bootstrapped on x86-linux-gnu

> with no new regressions.


Please pass down 'code' from number_of_iterations_exit_assumptions becaue
that correctly does

  /* We want the condition for staying inside loop.  */
  code = gimple_cond_code (stmt);
  if (exit->flags & EDGE_TRUE_VALUE)
    code = invert_tree_comparison (code, false);

alternatively if you want to keep the function more isolated from the actual
caller you'd have to do sth like the above in number_of_iterations_popcount.

+  gphi_iterator gpi = gsi_start_phis (exit->dest);
+  if (gsi_end_p (gpi))
+    return false;
+  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;

drop this, it's not necessary.

+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */

the loop header or loop latch

+  if (gimple_code (and_stmt) == GIMPLE_PHI
+      && gimple_bb (and_stmt) == exit->src

this would be loop->header still, not exit->src, otherwise the
gimple_phi_arg_def call on the latch-edge might ICE.

+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (rhs1);
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;

I think you need to verify that the latch edge argument of this
PHI has the proper value which should be the LHS of the
and_stmt.

What would make the code clearer is probably naming
variables after a pattern you put in comments.  For example

 /* We match
      # b_1 = PHI <..., b_2>
      tem_3 = b_1 + -1;
      b_2 = b_1 & tem_3;  */

and then have local vars b1 and b2 instead of rhs1/rhs2, etc.

The overall function comment is now misleading since
it still mentions 'c'.

+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  tree call = build_call_expr (fn, 1, src);

POPCOUNT expects an 'unsigned int' argument so you have to choose
POPCOUNT, POPCOUNTL or POPCOUNTLL based on the type of SRC
and also convert it properly (from say signed int to unsigned int).  The
testcases all use 'long' but no actual value that would make a difference
if you use popcount ((int)long-value).

Note there's an internal functiuon as well but it is only supported if the
target natively supports popcount which isn't what is desired here.
So you have to deal with the type of src, making sure you zero-extent
it if you need extension (for a loop that works on singed char or short)
because sign-extension can change the popcount result (likely you'll
need a GIMPLE testcase to trigger a bogus transform here).

That said, for simplicity and not complicating this even further - heh - I
suggest you do

  if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))
    fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
 else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
(long_integer_type_node))
   fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
 else if
    ... LL case

  /* ???  Support promoting char/short to int.  */
  if (!fn)
    return false;

  src = fold_convert (unsigned_type_for (TREE_TYPE (src)), src);

+  tree utype = unsigned_type_for (TREE_TYPE (call));
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+                       build_call_expr (fn, 1, src),

you need to convert the call result to utype, thus wrap it
in fold_convert (utype, build_call_expr (...)).

+                       build_int_cst (utype, 1));

so if adjust is true then you have to guard against the initial value
being zero because then your niter is -1U.  You do this by
setting may_be_zero to

  fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst
(TREE_TYPE (src)));

+  else
+    iter = fold_convert (utype, build_call_expr (fn, 1, src));
+  max = int_cst_value (TYPE_MAX_VALUE (utype));

Surely max is not TYPE_MAX_VALUE of utype but instead
TYPE_PRECISION (TREE_TYPE (src)) (of the original
unpromoted src).  In case of adjust == true it is even one less.

Sorry for the many (correctness) issues again :/

Thanks,
Richard.

+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = NULL_TREE;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;



> Thanks,

> Kugan

> >

> >

> > Thanks,

> > Richard.

> >

> >> Thanks,

> >> Kugan

> >>

> >>

> >> >

> >> > Richard.

> >> >

> >> >> Thanks,

> >> >> bin

> >> >> >

> >> >> >>> +  niter->control.base = NULL_TREE;

> >> >> >>> +  niter->control.step = NULL_TREE;

> >> >> >>> +  niter->control.no_overflow = false;

> >> >> >>> +  niter->niter = iter;

> >> >> >>> +  niter->assumptions = boolean_true_node;

> >> >> >>> +  niter->may_be_zero = boolean_false_node;

> >> >> >>> +  niter->max = max;

> >> >> >>> +  niter->bound = NULL_TREE;

> >> >> >>> +  niter->cmp = ERROR_MARK;

> >> >> >>> +  return true;

> >> >> >>> +}

> >> >> >>> +

> >> >> >>> +

> >> >> >> Appology if these are nitpickings.

> >> >> > Thanks for the review. I am happy to make the changes needed to get it

> >> >> > to how it should be :)

> >> >> >

> >> >> > Thanks,

> >> >> > Kugan

> >> >> >

> >> >> >>

> >> >> >> Thanks,

> >> >> >> bin

Kugan Vivekanandarajah June 12, 2018, 3:09 a.m. UTC | #23

Hi Richard,

Thanks for the review.

On 7 June 2018 at 19:21, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 2:52 AM Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>>

>> Hi Richard,

>>

>> Thanks for the review.

>>

>> On 5 June 2018 at 21:25, Richard Biener <richard.guenther@gmail.com> wrote:

>> > On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah

>> > <kugan.vivekanandarajah@linaro.org> wrote:

>> >>

>> >> Hi Richard,

>> >>

>> >> Thanks for the review.

>> >>

>> >> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:

>> >> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> >>

>> >> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

>> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >> > Hi Bin,

>> >> >> >

>> >> >> > Thanks a lo for the review.

>> >> >> >

>> >> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

>> >> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >> >>> Hi Bin,

>> >> >> >>>

>> >> >> >>> Thanks for the review. Please find the revised patch based on the

>> >> >> >>> review comments.

>> >> >> >>>

>> >> >> >>> Thanks,

>> >> >> >>> Kugan

>> >> >> >>>

>> >> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> >> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

>> >> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >> >>>>> Hi Richard,

>> >> >> >>>>>

>> >> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>> >> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> >> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>> >> >> >>>>>>> Hi Richard,

>> >> >> >>>>>>>

>> >> >> >>

>> >> >> >> Hi,

>> >> >> >> Thanks very much for working.

>> >> >> >>

>> >> >> >>> +/* Utility function to check if OP is defined by a stmt

>> >> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

>> >> >> >>> +

>> >> >> >>> +static bool

>> >> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

>> >> >> >> This is checking if op is defined as val - 1, so name it as

>> >> >> >> ssa_defined_by_minus_one_stmt_p?

>> >> >> >>

>> >> >> >>> +{

>> >> >> >>> +  if (TREE_CODE (op) == SSA_NAME

>> >> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

>> >> >> >>> +      && is_gimple_assign (*stmt)

>> >> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

>> >> >> >>> +      && val == gimple_assign_rhs1 (*stmt)

>> >> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

>> >> >> >>> +    return true;

>> >> >> >>> +  else

>> >> >> >>> +    return false;

>> >> >> >> You can simply return the boolean condition.

>> >> >> > Done.

>> >> >> >

>> >> >> >>

>> >> >> >>> +}

>> >> >> >>> +

>> >> >> >>> +/* See if LOOP is a popcout implementation of the form

>> >> >> >> ...

>> >> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> >> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> >> >> >>> +

>> >> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> >> >> >>> +    rhs1 = rhs2;

>> >> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> >> >> >>> +    ;

>> >> >> >>> +  else

>> >> >> >>> +    return false;

>> >> >> >>> +

>> >> >> >>> +  /* Check the recurrence.  */

>> >> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

>> >> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

>> >> >> >> use rhs1 directly.

>> >> >> >

>> >> >> > Done.

>> >> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

>> >> >> >> I think this is checking wrong thing and is redundant.  Either rhs2

>> >> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

>> >> >> >> the check duplicates checking on phi; for the latter, it's never a PHI

>> >> >> >> stmt and shouldn't be checked.

>> >> >> >>

>> >> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI

>> >> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

>> >> >> >>> +    return false;

>> >> >> >>> +

>> >> >> >>> +  dest = gimple_assign_lhs (count_stmt);

>> >> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> >> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

>> >> >> >>> +  if (adjust)

>> >> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

>> >> >> >>> +            build_call_expr (fn, 1, src),

>> >> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));

>> >> >> >>> +  else

>> >> >> >>> +    iter = build_call_expr (fn, 1, src);

>> >> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

>> >> >> >> niters type.  Though unsigned type is unnecessary in this case, but

>> >> >> >> better to follow existing behavior?

>> >> >> >

>> >> >> > Done.

>> >> >> >>

>> >> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

>> >> >> >> As richi suggested, max should be the number of bits in type of IV.

>> >> >> >>

>> >> >> >>> +

>> >> >> >>> +  niter->assumptions = boolean_false_node;

>> >> >> >> Redundant.

>> >> >> >

>> >> >> > Not sure I understand. If I remove this,  I am getting ICE

>> >> >> > (segmentation fault). What is the expectation here?

>> >> >> Is it a simple typo?  Because assumptions is set to boolean_true_node

>> >> >> just 5 lines below?

>> >> >> The niters part looks good for me with this change.  You may need

>> >> >> richi's approval for other parts?

>> >> >

>> >> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

>> >> > index bdf9ba1..4dc156a 100644

>> >> > --- a/gcc/ipa-fnsummary.c

>> >> > +++ b/gcc/ipa-fnsummary.c

>> >> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

>> >> > ipa_node_params *info,

>> >> >                                                nonconstant_names);

>> >> >        return p2.or_with (summary->conds, p1);

>> >> >      }

>> >> > +  else if (TREE_CODE (expr) == CALL_EXPR)

>> >> > +    {

>> >> > +      return false;

>> >> > +    }

>> >> >    else

>> >> >      {

>> >> >        debug_tree (expr);

>> >> >

>> >> > I'd return true here instead - we don't want to track popcount() in

>> >> > predicates.  Also unnecessary braces.

>> >> >

>> >> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

>> >> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

>> >> >         return true;

>> >> >      }

>> >> > +  if (code == CALL_EXPR

>> >> > +      && (fndecl = get_callee_fndecl (expr))

>> >> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

>> >> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

>> >> > +    return false;

>> >> >

>> >> > please use

>> >> >

>> >> >     if (code == CALL_EXPR)

>> >> >      {

>> >> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

>> >> >            return true;

>> >> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

>> >> >              if (expression_expensive_p (arg))

>> >> >                 return true;

>> >> >          return false;

>> >> >      }

>> >> >

>> >> > instead.

>> >> Done.

>> >>

>> >> >

>> >> > +  /* Check "b = b & (b - 1)" is calculated.  */

>> >> > +  if (!is_gimple_assign (and_stmt)

>> >> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

>> >> > +    return false;

>> >> > +

>> >> > +  rhs1 = gimple_assign_rhs1 (and_stmt);

>> >> > +  rhs2 = gimple_assign_rhs2 (and_stmt);

>> >> > +

>> >> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

>> >> > +    std::swap (rhs1, rhs2);

>> >> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

>> >> > +    ;

>> >> > +  else

>> >> > +    return false;

>> >> >

>> >> > so the powers of match-and-simplify let you add sth like the following to

>> >> > match.pd:

>> >> >

>> >> > #if GIMPLE

>> >> > (match (b_bit_and_b_minus_one @0)

>> >> >  (bit_and:c @0 (plus @0 integer_minus_onep)))

>> >> > #endif

>> >> >

>> >> > and then use it like

>> >> >

>> >> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

>> >> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

>> >> > &rhs1, NULL))

>> >> >       return false;

>> >> >

>> >> > note you need to explicitely declare the matcher as there's no header

>> >> > file generated

>> >> > for them.  It would be also the first user for this "feature" and at

>> >> > some point I

>> >> > considered using in-line source markup (and sth like GTY_FILES for

>> >> > what files to scan)

>> >> > to gather patterns.

>> >> >

>> >> > +  /* Check the loop closed SSA definition for just the variable c defined in

>> >> > +     loop.  */

>> >> > +  basic_block bb = exit->dest;

>> >> > +  for (gphi_iterator gpi = gsi_start_phis (bb);

>> >> > +       !gsi_end_p (gpi); gsi_next (&gpi))

>> >> > +    {

>> >> > +      phi = gpi.phi ();

>> >> > +      count++;

>> >> > +    }

>> >> >

>> >> > I don't understand why you are checking for this - isn't the number of

>> >> > iterations

>> >> I am trying to be overly conservative. I have removed it now and adjusted.

>> >>

>> >> > independent on the rest of the loop-closed PHIs?  That is, even for just

>> >> >

>> >> >  while (b) { b = b & (b-1); }

>> >> >

>> >> > the number of iterations is popcount (b).  So it's enough to check

>> >> > the exit condition?  In fact you are coming from number_of_iterations_exit

>> >> > and thus are asked for a specific exit and thus

>> >> >

>> >> > +  if (!(exit = single_exit (loop)))

>> >> > +    return false;

>> >> Ok, changed it.

>> >>

>> >>

>> >> > is premature.  That is, for

>> >> >

>> >> >   while (b) { b = b & (b - 1); if (--c == 0) break; }

>> >> >

>> >> > still should have popcount (b) iterations for the exit involving the

>> >> > while (b) test.

>> >> >

>> >> > So please simplify (and thus genericize) the code accordingly.  Do so

>> >> > without the match.pd trick for now, I think we can simplify the current

>> >> > beast down enough to not need the factored out function.

>> >> I am not using match.pd changes as you have asked. Please let me know

>> >> if you want that to be included as well.

>> >>

>> >> >

>> >> > You then also can handle replacing

>> >> >

>> >> > int c = 0;

>> >> > while (b) { b = b & (b-1); c+=3; }

>> >> >

>> >> > with 3 * popcount (b);

>> >> Made to handle this too. But, there are still cases we don't handle. I

>> >> am not sure if it is worth the complexity handling all the possible

>> >> cases.

>> >>

>> >> Is the attached patch looks better now?

>> >

>> > +  /* Check loop terminating branch is like

>> > +     if (b != 0).  */

>> > +  gimple *stmt = last_stmt (loop->header);

>> > +  if (!stmt

>> >

>> > this should now look at exit->src instead of loop->header.

>> Done.

>>

>> >

>> > +      || !zerop (gimple_cond_rhs (stmt))

>> >

>> > use integer_zerop

>> Done.

>>

>> >

>> > +  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);

>> >

>> > you still didn't remove the whole count-stmt stuff.  It's unrelated

>> > and is not required at all.  Only the GIMPLE_COND lhs def-use

>> > cycle is interesting for niter purposes.

>> Changed it.

>>

>> >

>> > Please adjust accordingly.

>> >

>> > @@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,

>> >    gcond *stmt;

>> >    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

>> >                                               &stmt, every_iteration))

>> > -    return false;

>> > +    {

>> > +      if (number_of_iterations_popcount (loop, exit, niter))

>> > +       return true;

>> > +      return false;

>> > +    }

>> >

>> > this should be done in number_of_iterations_exit_assumptions.  I realize

>> > you want to do it only if that fails but in that process you have to be

>> > sure to properly handle every_iteration & friends which is much easier

>> > to do in number_of_iterations_exit_assumptions.  So please put it

>> > where that fails for this kind of IVs:

>> >

>> >   tree iv0_niters = NULL_TREE;

>> >   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),

>> >                               op0, &iv0, safe ? &iv0_niters : NULL, false))

>> > ^^^ here

>> >     return false;

>>

>> Done.

>>

>> Is this OK now. Regression tested and bootstrapped on x86-linux-gnu

>> with no new regressions.

>

> Please pass down 'code' from number_of_iterations_exit_assumptions becaue

> that correctly does


Done.

>

>   /* We want the condition for staying inside loop.  */

>   code = gimple_cond_code (stmt);

>   if (exit->flags & EDGE_TRUE_VALUE)

>     code = invert_tree_comparison (code, false);

>

> alternatively if you want to keep the function more isolated from the actual

> caller you'd have to do sth like the above in number_of_iterations_popcount.

>

> +  gphi_iterator gpi = gsi_start_phis (exit->dest);

> +  if (gsi_end_p (gpi))

> +    return false;

> +  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);

> +  if (TREE_CODE (rhs1) != SSA_NAME)

> +    return false;

>

> drop this, it's not necessary.

Done.

>

> +  /* Depending on copy-header is performed, feeding PHI stmts might be in

> +     the loop header or loop exit, handle this.  */

>

> the loop header or loop latch

>

> +  if (gimple_code (and_stmt) == GIMPLE_PHI

> +      && gimple_bb (and_stmt) == exit->src

>

> this would be loop->header still, not exit->src, otherwise the

> gimple_phi_arg_def call on the latch-edge might ICE.


Done.

>

> +  /* Check the recurrence.  */

> +  gimple *phi = SSA_NAME_DEF_STMT (rhs1);

> +  if (gimple_code (phi) != GIMPLE_PHI)

> +    return false;

>

> I think you need to verify that the latch edge argument of this

> PHI has the proper value which should be the LHS of the

> and_stmt.

Done.


>

> What would make the code clearer is probably naming

> variables after a pattern you put in comments.  For example

>

>  /* We match

>       # b_1 = PHI <..., b_2>

>       tem_3 = b_1 + -1;

>       b_2 = b_1 & tem_3;  */

>

> and then have local vars b1 and b2 instead of rhs1/rhs2, etc.

Done.

>

> The overall function comment is now misleading since

> it still mentions 'c'.

>

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);

> +  tree call = build_call_expr (fn, 1, src);

>

> POPCOUNT expects an 'unsigned int' argument so you have to choose

> POPCOUNT, POPCOUNTL or POPCOUNTLL based on the type of SRC

> and also convert it properly (from say signed int to unsigned int).  The

> testcases all use 'long' but no actual value that would make a difference

> if you use popcount ((int)long-value).

>

> Note there's an internal functiuon as well but it is only supported if the

> target natively supports popcount which isn't what is desired here.

> So you have to deal with the type of src, making sure you zero-extent

> it if you need extension (for a loop that works on singed char or short)

> because sign-extension can change the popcount result (likely you'll

> need a GIMPLE testcase to trigger a bogus transform here).

>

> That said, for simplicity and not complicating this even further - heh - I

> suggest you do

>

>   if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))

>     fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION

> (long_integer_type_node))

>    fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);

>  else if

>     ... LL case

>

>   /* ???  Support promoting char/short to int.  */

>   if (!fn)

>     return false;

>

>   src = fold_convert (unsigned_type_for (TREE_TYPE (src)), src);

>

> +  tree utype = unsigned_type_for (TREE_TYPE (call));

> +  if (adjust)

> +    iter = fold_build2 (MINUS_EXPR, utype,

> +                       build_call_expr (fn, 1, src),

>

> you need to convert the call result to utype, thus wrap it

> in fold_convert (utype, build_call_expr (...)).

>

> +                       build_int_cst (utype, 1));

>

> so if adjust is true then you have to guard against the initial value

> being zero because then your niter is -1U.  You do this by

> setting may_be_zero to

Done. But in this case now we dont do the final value replacement for
the case when we do copy header.

When we do copy header (where we do -1), loop seems to always have the
zero check at the beginning. So is it still necessary?

>

>   fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst

> (TREE_TYPE (src)));

>

> +  else

> +    iter = fold_convert (utype, build_call_expr (fn, 1, src));

> +  max = int_cst_value (TYPE_MAX_VALUE (utype));

>

> Surely max is not TYPE_MAX_VALUE of utype but instead

> TYPE_PRECISION (TREE_TYPE (src)) (of the original

> unpromoted src).  In case of adjust == true it is even one less.

Done.



>

> Sorry for the many (correctness) issues again :/


Thanks again for the review.

Kugan
>

> Thanks,

> Richard.

>

> +

> +  niter->assumptions = boolean_false_node;

> +  niter->control.base = NULL_TREE;

> +  niter->control.step = NULL_TREE;

> +  niter->control.no_overflow = false;

> +  niter->niter = iter;

> +  niter->assumptions = boolean_true_node;

> +  niter->may_be_zero = boolean_false_node;

> +  niter->max = max;

> +  niter->bound = NULL_TREE;

> +  niter->cmp = ERROR_MARK;

>

>

>

>> Thanks,

>> Kugan

>> >

>> >

>> > Thanks,

>> > Richard.

>> >

>> >> Thanks,

>> >> Kugan

>> >>

>> >>

>> >> >

>> >> > Richard.

>> >> >

>> >> >> Thanks,

>> >> >> bin

>> >> >> >

>> >> >> >>> +  niter->control.base = NULL_TREE;

>> >> >> >>> +  niter->control.step = NULL_TREE;

>> >> >> >>> +  niter->control.no_overflow = false;

>> >> >> >>> +  niter->niter = iter;

>> >> >> >>> +  niter->assumptions = boolean_true_node;

>> >> >> >>> +  niter->may_be_zero = boolean_false_node;

>> >> >> >>> +  niter->max = max;

>> >> >> >>> +  niter->bound = NULL_TREE;

>> >> >> >>> +  niter->cmp = ERROR_MARK;

>> >> >> >>> +  return true;

>> >> >> >>> +}

>> >> >> >>> +

>> >> >> >>> +

>> >> >> >> Appology if these are nitpickings.

>> >> >> > Thanks for the review. I am happy to make the changes needed to get it

>> >> >> > to how it should be :)

>> >> >> >

>> >> >> > Thanks,

>> >> >> > Kugan

>> >> >> >

>> >> >> >>

>> >> >> >> Thanks,

>> >> >> >> bin
From 01257de5b8b19d5f0bd8fde6edfa3908b74aba90 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: Ib211004cc42998f1da8caedfedfaa72ba3aaf5e2
---
 gcc/ipa-fnsummary.c                       |   2 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 +++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 +++++
 gcc/tree-scalar-evolution.c               |  15 +++
 gcc/tree-ssa-loop-ivopts.c                |  10 ++
 gcc/tree-ssa-loop-niter.c                 | 176 +++++++++++++++++++++++++++++-
 7 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..feb1c9e 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,8 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    return true;
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..362d835
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..9096c6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..4b0ec02 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -281,6 +281,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
+#include "builtins.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -1984,6 +1985,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3493,6 +3495,19 @@ expression_expensive_p (tree expr)
 	return true;
     }
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+
+      if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
+	return true;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (expression_expensive_p (arg))
+	  return true;
+      return false;
+    }
+
   switch (TREE_CODE_CLASS (code))
     {
     case tcc_binary:
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..8422d92 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -63,6 +63,10 @@ struct bounds
   mpz_t below, up;
 };
 
+static bool number_of_iterations_popcount (loop_p loop, edge exit,
+					   enum tree_code code,
+					   struct tree_niter_desc *niter);
+
 
 /* Splits expression EXPR to a variable part VAR and constant OFFSET.  */
 
@@ -2357,7 +2361,7 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
 			      op0, &iv0, safe ? &iv0_niters : NULL, false))
-    return false;
+    return number_of_iterations_popcount (loop, exit, code, niter);
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
 			      op1, &iv1, safe ? &iv1_niters : NULL, false))
@@ -2430,6 +2434,176 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  */
+
+static bool
+ssa_defined_by_minus_one_stmt_p (tree op, tree val)
+{
+  gimple *stmt;
+  return (TREE_CODE (op) == SSA_NAME
+	  && (stmt = SSA_NAME_DEF_STMT (op))
+	  && is_gimple_assign (stmt)
+	  && (gimple_assign_rhs_code (stmt) == PLUS_EXPR)
+	  && val == gimple_assign_rhs1 (stmt)
+	  && integer_minus_onep (gimple_assign_rhs2 (stmt)));
+}
+
+
+/* See if LOOP is a popcout implementation, determine NITER for the loop
+
+   We match:
+   <bb 2>
+   goto <bb 4>
+
+   <bb 3>
+   _1 = b_11 + -1
+   b_6 = _1 & b_11
+
+   <bb 4>
+   b_11 = PHI <b_5(D)(2), b_6(3)>
+
+   exit block
+   if (b_11 != 0)
+	goto <bb 3>
+   else
+	goto <bb 5>
+
+   OR we match copy-header version:
+   if (b_5 != 0)
+	goto <bb 3>
+   else
+	goto <bb 4>
+
+   <bb 3>
+   b_11 = PHI <b_5(2), b_6(3)>
+   _1 = b_11 + -1
+   b_6 = _1 & b_11
+
+   exit block
+   if (b_6 != 0)
+	goto <bb 3>
+   else
+	goto <bb 4>
+
+   If popcount pattern, update NITER accordingly.
+   i.e., set NITER to  __builtin_popcount (b)
+   return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, edge exit,
+			       enum tree_code code,
+			       struct tree_niter_desc *niter)
+{
+  bool adjust = true;
+  tree iter;
+  HOST_WIDE_INT max;
+  adjust = true;
+  tree fn = NULL_TREE;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (exit->src);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || code != NE_EXPR
+      || !integer_zerop (gimple_cond_rhs (stmt))
+      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop latch, handle this.  */
+  if (gimple_code (and_stmt) == GIMPLE_PHI
+      && gimple_bb (and_stmt) == loop->header
+      && gimple_phi_num_args (and_stmt) == 2
+      && (TREE_CODE (gimple_phi_arg_def (and_stmt,
+					 loop_latch_edge (loop)->dest_idx))
+	  == SSA_NAME))
+    {
+      /* SSA used in exit condition is defined by PHI stmt
+	b_11 = PHI <b_5(D)(2), b_6(3)>
+	from the PHI stmt, get the and_stmt
+	b_6 = _1 & b_11.  */
+      tree t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure it is indeed an and stmt (b_6 = _1 & b_11).  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  tree b_11 = gimple_assign_rhs1 (and_stmt);
+  tree _1 = gimple_assign_rhs2 (and_stmt);
+
+  /* Check that _1 is defined by _b11 + -1 (_1 = b_11 + -1).
+     Also make sure that b_11 is the same in and_stmt and _1 defining stmt.
+     Also canonicalize if _1 and _b11 are revrsed.  */
+  if (ssa_defined_by_minus_one_stmt_p (b_11, _1))
+    std::swap (b_11, _1);
+  else if (ssa_defined_by_minus_one_stmt_p (_1, b_11))
+    ;
+  else
+    return false;
+  /* Check the recurrence:
+   ... = PHI <b_5(2), b_6(3)>.  */
+  gimple *phi = SSA_NAME_DEF_STMT (b_11);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || (gimple_assign_lhs (and_stmt)
+	  != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx)))
+    return false;
+
+  /* We found a match. Get the corresponding popcount builtin.  */
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))
+    fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
+	   (long_integer_type_node))
+    fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
+  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
+	   (long_long_integer_type_node))
+    fn = builtin_decl_implicit (BUILT_IN_POPCOUNTLL);
+
+  /* ??? Support promoting char/short to int.  */
+  if (!fn)
+    return false;
+
+  /* Update NITER params accordingly  */
+  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (src)));
+  tree utype = unsigned_type_for (TREE_TYPE (src));
+  src = fold_convert (utype, src);
+  tree call = fold_convert (utype, build_call_expr (fn, 1, src));
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+			call,
+			build_int_cst (utype, 1));
+  else
+    iter = call;
+  if (adjust)
+    max = max - 1;
+
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  if (adjust)
+    niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
+				      build_zero_cst
+				      (TREE_TYPE (src)));
+  else
+    niter->may_be_zero = boolean_false_node;
+
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */

Richard Biener June 14, 2018, 1:57 p.m. UTC | #24

On Tue, Jun 12, 2018 at 5:10 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>

> Hi Richard,

>

> Thanks for the review.

>

> On 7 June 2018 at 19:21, Richard Biener <richard.guenther@gmail.com> wrote:

> > On Thu, Jun 7, 2018 at 2:52 AM Kugan Vivekanandarajah

> > <kugan.vivekanandarajah@linaro.org> wrote:

> >>

> >> Hi Richard,

> >>

> >> Thanks for the review.

> >>

> >> On 5 June 2018 at 21:25, Richard Biener <richard.guenther@gmail.com> wrote:

> >> > On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah

> >> > <kugan.vivekanandarajah@linaro.org> wrote:

> >> >>

> >> >> Hi Richard,

> >> >>

> >> >> Thanks for the review.

> >> >>

> >> >> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:

> >> >> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> >>

> >> >> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah

> >> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >> > Hi Bin,

> >> >> >> >

> >> >> >> > Thanks a lo for the review.

> >> >> >> >

> >> >> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah

> >> >> >> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >> >>> Hi Bin,

> >> >> >> >>>

> >> >> >> >>> Thanks for the review. Please find the revised patch based on the

> >> >> >> >>> review comments.

> >> >> >> >>>

> >> >> >> >>> Thanks,

> >> >> >> >>> Kugan

> >> >> >> >>>

> >> >> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> >> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah

> >> >> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >> >>>>> Hi Richard,

> >> >> >> >>>>>

> >> >> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

> >> >> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> >> >> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >> >> >> >>>>>>> Hi Richard,

> >> >> >> >>>>>>>

> >> >> >> >>

> >> >> >> >> Hi,

> >> >> >> >> Thanks very much for working.

> >> >> >> >>

> >> >> >> >>> +/* Utility function to check if OP is defined by a stmt

> >> >> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */

> >> >> >> >>> +

> >> >> >> >>> +static bool

> >> >> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)

> >> >> >> >> This is checking if op is defined as val - 1, so name it as

> >> >> >> >> ssa_defined_by_minus_one_stmt_p?

> >> >> >> >>

> >> >> >> >>> +{

> >> >> >> >>> +  if (TREE_CODE (op) == SSA_NAME

> >> >> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))

> >> >> >> >>> +      && is_gimple_assign (*stmt)

> >> >> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)

> >> >> >> >>> +      && val == gimple_assign_rhs1 (*stmt)

> >> >> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))

> >> >> >> >>> +    return true;

> >> >> >> >>> +  else

> >> >> >> >>> +    return false;

> >> >> >> >> You can simply return the boolean condition.

> >> >> >> > Done.

> >> >> >> >

> >> >> >> >>

> >> >> >> >>> +}

> >> >> >> >>> +

> >> >> >> >>> +/* See if LOOP is a popcout implementation of the form

> >> >> >> >> ...

> >> >> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >> >> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >> >> >> >>> +

> >> >> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >> >> >> >>> +    rhs1 = rhs2;

> >> >> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >> >> >> >>> +    ;

> >> >> >> >>> +  else

> >> >> >> >>> +    return false;

> >> >> >> >>> +

> >> >> >> >>> +  /* Check the recurrence.  */

> >> >> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> >> >> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please

> >> >> >> >> use rhs1 directly.

> >> >> >> >

> >> >> >> > Done.

> >> >> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);

> >> >> >> >> I think this is checking wrong thing and is redundant.  Either rhs2

> >> >> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,

> >> >> >> >> the check duplicates checking on phi; for the latter, it's never a PHI

> >> >> >> >> stmt and shouldn't be checked.

> >> >> >> >>

> >> >> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI

> >> >> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)

> >> >> >> >>> +    return false;

> >> >> >> >>> +

> >> >> >> >>> +  dest = gimple_assign_lhs (count_stmt);

> >> >> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> >> >> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> >> >> >> >>> +  if (adjust)

> >> >> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> >> >> >> >>> +            build_call_expr (fn, 1, src),

> >> >> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));

> >> >> >> >>> +  else

> >> >> >> >>> +    iter = build_call_expr (fn, 1, src);

> >> >> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as

> >> >> >> >> niters type.  Though unsigned type is unnecessary in this case, but

> >> >> >> >> better to follow existing behavior?

> >> >> >> >

> >> >> >> > Done.

> >> >> >> >>

> >> >> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> >> >> >> >> As richi suggested, max should be the number of bits in type of IV.

> >> >> >> >>

> >> >> >> >>> +

> >> >> >> >>> +  niter->assumptions = boolean_false_node;

> >> >> >> >> Redundant.

> >> >> >> >

> >> >> >> > Not sure I understand. If I remove this,  I am getting ICE

> >> >> >> > (segmentation fault). What is the expectation here?

> >> >> >> Is it a simple typo?  Because assumptions is set to boolean_true_node

> >> >> >> just 5 lines below?

> >> >> >> The niters part looks good for me with this change.  You may need

> >> >> >> richi's approval for other parts?

> >> >> >

> >> >> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c

> >> >> > index bdf9ba1..4dc156a 100644

> >> >> > --- a/gcc/ipa-fnsummary.c

> >> >> > +++ b/gcc/ipa-fnsummary.c

> >> >> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct

> >> >> > ipa_node_params *info,

> >> >> >                                                nonconstant_names);

> >> >> >        return p2.or_with (summary->conds, p1);

> >> >> >      }

> >> >> > +  else if (TREE_CODE (expr) == CALL_EXPR)

> >> >> > +    {

> >> >> > +      return false;

> >> >> > +    }

> >> >> >    else

> >> >> >      {

> >> >> >        debug_tree (expr);

> >> >> >

> >> >> > I'd return true here instead - we don't want to track popcount() in

> >> >> > predicates.  Also unnecessary braces.

> >> >> >

> >> >> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)

> >> >> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))

> >> >> >         return true;

> >> >> >      }

> >> >> > +  if (code == CALL_EXPR

> >> >> > +      && (fndecl = get_callee_fndecl (expr))

> >> >> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

> >> >> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))

> >> >> > +    return false;

> >> >> >

> >> >> > please use

> >> >> >

> >> >> >     if (code == CALL_EXPR)

> >> >> >      {

> >> >> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))

> >> >> >            return true;

> >> >> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

> >> >> >              if (expression_expensive_p (arg))

> >> >> >                 return true;

> >> >> >          return false;

> >> >> >      }

> >> >> >

> >> >> > instead.

> >> >> Done.

> >> >>

> >> >> >

> >> >> > +  /* Check "b = b & (b - 1)" is calculated.  */

> >> >> > +  if (!is_gimple_assign (and_stmt)

> >> >> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> >> >> > +    return false;

> >> >> > +

> >> >> > +  rhs1 = gimple_assign_rhs1 (and_stmt);

> >> >> > +  rhs2 = gimple_assign_rhs2 (and_stmt);

> >> >> > +

> >> >> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))

> >> >> > +    std::swap (rhs1, rhs2);

> >> >> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))

> >> >> > +    ;

> >> >> > +  else

> >> >> > +    return false;

> >> >> >

> >> >> > so the powers of match-and-simplify let you add sth like the following to

> >> >> > match.pd:

> >> >> >

> >> >> > #if GIMPLE

> >> >> > (match (b_bit_and_b_minus_one @0)

> >> >> >  (bit_and:c @0 (plus @0 integer_minus_onep)))

> >> >> > #endif

> >> >> >

> >> >> > and then use it like

> >> >> >

> >> >> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));

> >> >> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),

> >> >> > &rhs1, NULL))

> >> >> >       return false;

> >> >> >

> >> >> > note you need to explicitely declare the matcher as there's no header

> >> >> > file generated

> >> >> > for them.  It would be also the first user for this "feature" and at

> >> >> > some point I

> >> >> > considered using in-line source markup (and sth like GTY_FILES for

> >> >> > what files to scan)

> >> >> > to gather patterns.

> >> >> >

> >> >> > +  /* Check the loop closed SSA definition for just the variable c defined in

> >> >> > +     loop.  */

> >> >> > +  basic_block bb = exit->dest;

> >> >> > +  for (gphi_iterator gpi = gsi_start_phis (bb);

> >> >> > +       !gsi_end_p (gpi); gsi_next (&gpi))

> >> >> > +    {

> >> >> > +      phi = gpi.phi ();

> >> >> > +      count++;

> >> >> > +    }

> >> >> >

> >> >> > I don't understand why you are checking for this - isn't the number of

> >> >> > iterations

> >> >> I am trying to be overly conservative. I have removed it now and adjusted.

> >> >>

> >> >> > independent on the rest of the loop-closed PHIs?  That is, even for just

> >> >> >

> >> >> >  while (b) { b = b & (b-1); }

> >> >> >

> >> >> > the number of iterations is popcount (b).  So it's enough to check

> >> >> > the exit condition?  In fact you are coming from number_of_iterations_exit

> >> >> > and thus are asked for a specific exit and thus

> >> >> >

> >> >> > +  if (!(exit = single_exit (loop)))

> >> >> > +    return false;

> >> >> Ok, changed it.

> >> >>

> >> >>

> >> >> > is premature.  That is, for

> >> >> >

> >> >> >   while (b) { b = b & (b - 1); if (--c == 0) break; }

> >> >> >

> >> >> > still should have popcount (b) iterations for the exit involving the

> >> >> > while (b) test.

> >> >> >

> >> >> > So please simplify (and thus genericize) the code accordingly.  Do so

> >> >> > without the match.pd trick for now, I think we can simplify the current

> >> >> > beast down enough to not need the factored out function.

> >> >> I am not using match.pd changes as you have asked. Please let me know

> >> >> if you want that to be included as well.

> >> >>

> >> >> >

> >> >> > You then also can handle replacing

> >> >> >

> >> >> > int c = 0;

> >> >> > while (b) { b = b & (b-1); c+=3; }

> >> >> >

> >> >> > with 3 * popcount (b);

> >> >> Made to handle this too. But, there are still cases we don't handle. I

> >> >> am not sure if it is worth the complexity handling all the possible

> >> >> cases.

> >> >>

> >> >> Is the attached patch looks better now?

> >> >

> >> > +  /* Check loop terminating branch is like

> >> > +     if (b != 0).  */

> >> > +  gimple *stmt = last_stmt (loop->header);

> >> > +  if (!stmt

> >> >

> >> > this should now look at exit->src instead of loop->header.

> >> Done.

> >>

> >> >

> >> > +      || !zerop (gimple_cond_rhs (stmt))

> >> >

> >> > use integer_zerop

> >> Done.

> >>

> >> >

> >> > +  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);

> >> >

> >> > you still didn't remove the whole count-stmt stuff.  It's unrelated

> >> > and is not required at all.  Only the GIMPLE_COND lhs def-use

> >> > cycle is interesting for niter purposes.

> >> Changed it.

> >>

> >> >

> >> > Please adjust accordingly.

> >> >

> >> > @@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,

> >> >    gcond *stmt;

> >> >    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

> >> >                                               &stmt, every_iteration))

> >> > -    return false;

> >> > +    {

> >> > +      if (number_of_iterations_popcount (loop, exit, niter))

> >> > +       return true;

> >> > +      return false;

> >> > +    }

> >> >

> >> > this should be done in number_of_iterations_exit_assumptions.  I realize

> >> > you want to do it only if that fails but in that process you have to be

> >> > sure to properly handle every_iteration & friends which is much easier

> >> > to do in number_of_iterations_exit_assumptions.  So please put it

> >> > where that fails for this kind of IVs:

> >> >

> >> >   tree iv0_niters = NULL_TREE;

> >> >   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),

> >> >                               op0, &iv0, safe ? &iv0_niters : NULL, false))

> >> > ^^^ here

> >> >     return false;

> >>

> >> Done.

> >>

> >> Is this OK now. Regression tested and bootstrapped on x86-linux-gnu

> >> with no new regressions.

> >

> > Please pass down 'code' from number_of_iterations_exit_assumptions becaue

> > that correctly does

>

> Done.

>

> >

> >   /* We want the condition for staying inside loop.  */

> >   code = gimple_cond_code (stmt);

> >   if (exit->flags & EDGE_TRUE_VALUE)

> >     code = invert_tree_comparison (code, false);

> >

> > alternatively if you want to keep the function more isolated from the actual

> > caller you'd have to do sth like the above in number_of_iterations_popcount.

> >

> > +  gphi_iterator gpi = gsi_start_phis (exit->dest);

> > +  if (gsi_end_p (gpi))

> > +    return false;

> > +  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);

> > +  if (TREE_CODE (rhs1) != SSA_NAME)

> > +    return false;

> >

> > drop this, it's not necessary.

> Done.

>

> >

> > +  /* Depending on copy-header is performed, feeding PHI stmts might be in

> > +     the loop header or loop exit, handle this.  */

> >

> > the loop header or loop latch

> >

> > +  if (gimple_code (and_stmt) == GIMPLE_PHI

> > +      && gimple_bb (and_stmt) == exit->src

> >

> > this would be loop->header still, not exit->src, otherwise the

> > gimple_phi_arg_def call on the latch-edge might ICE.

>

> Done.

>

> >

> > +  /* Check the recurrence.  */

> > +  gimple *phi = SSA_NAME_DEF_STMT (rhs1);

> > +  if (gimple_code (phi) != GIMPLE_PHI)

> > +    return false;

> >

> > I think you need to verify that the latch edge argument of this

> > PHI has the proper value which should be the LHS of the

> > and_stmt.

> Done.

>

>

> >

> > What would make the code clearer is probably naming

> > variables after a pattern you put in comments.  For example

> >

> >  /* We match

> >       # b_1 = PHI <..., b_2>

> >       tem_3 = b_1 + -1;

> >       b_2 = b_1 & tem_3;  */

> >

> > and then have local vars b1 and b2 instead of rhs1/rhs2, etc.

> Done.

>

> >

> > The overall function comment is now misleading since

> > it still mentions 'c'.

> >

> > +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> > +  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);

> > +  tree call = build_call_expr (fn, 1, src);

> >

> > POPCOUNT expects an 'unsigned int' argument so you have to choose

> > POPCOUNT, POPCOUNTL or POPCOUNTLL based on the type of SRC

> > and also convert it properly (from say signed int to unsigned int).  The

> > testcases all use 'long' but no actual value that would make a difference

> > if you use popcount ((int)long-value).

> >

> > Note there's an internal functiuon as well but it is only supported if the

> > target natively supports popcount which isn't what is desired here.

> > So you have to deal with the type of src, making sure you zero-extent

> > it if you need extension (for a loop that works on singed char or short)

> > because sign-extension can change the popcount result (likely you'll

> > need a GIMPLE testcase to trigger a bogus transform here).

> >

> > That said, for simplicity and not complicating this even further - heh - I

> > suggest you do

> >

> >   if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))

> >     fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> >  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION

> > (long_integer_type_node))

> >    fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);

> >  else if

> >     ... LL case

> >

> >   /* ???  Support promoting char/short to int.  */

> >   if (!fn)

> >     return false;

> >

> >   src = fold_convert (unsigned_type_for (TREE_TYPE (src)), src);

> >

> > +  tree utype = unsigned_type_for (TREE_TYPE (call));

> > +  if (adjust)

> > +    iter = fold_build2 (MINUS_EXPR, utype,

> > +                       build_call_expr (fn, 1, src),

> >

> > you need to convert the call result to utype, thus wrap it

> > in fold_convert (utype, build_call_expr (...)).

> >

> > +                       build_int_cst (utype, 1));

> >

> > so if adjust is true then you have to guard against the initial value

> > being zero because then your niter is -1U.  You do this by

> > setting may_be_zero to

> Done. But in this case now we dont do the final value replacement for

> the case when we do copy header.

>

> When we do copy header (where we do -1), loop seems to always have the

> zero check at the beginning. So is it still necessary?


Yes, this is necessary because if you write the loop as

  do {
     b &= b - 1;
  } while (b);

and enter with b == 0 then the number of latch executions is zero
and not -1 (popcount (0) is still zero).  And this is a do-while loop
already so nobody will insert an artificial if (b) around the loop.

The failure to replace the final value is something that needs to be
addressed in final_value_replacement_loop () which should use
number_of_iterations_exit () and appropriately handle
niter.assumptions and niter.may_be_zero when doing the replacement
(giving up on the former and somehow handling the latter)

> >

> >   fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst

> > (TREE_TYPE (src)));

> >

> > +  else

> > +    iter = fold_convert (utype, build_call_expr (fn, 1, src));

> > +  max = int_cst_value (TYPE_MAX_VALUE (utype));

> >

> > Surely max is not TYPE_MAX_VALUE of utype but instead

> > TYPE_PRECISION (TREE_TYPE (src)) (of the original

> > unpromoted src).  In case of adjust == true it is even one less.

> Done.


I don't see where you did this?  max is still computed as

+  /* Update NITER params accordingly  */
+  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (src)));

rather than

   max = TYPE_PRECISION (TREE_TYPE (src));

Ok with this change.

Thanks,
Richard.

>

>

> >

> > Sorry for the many (correctness) issues again :/

>

> Thanks again for the review.

>

> Kugan

> >

> > Thanks,

> > Richard.

> >

> > +

> > +  niter->assumptions = boolean_false_node;

> > +  niter->control.base = NULL_TREE;

> > +  niter->control.step = NULL_TREE;

> > +  niter->control.no_overflow = false;

> > +  niter->niter = iter;

> > +  niter->assumptions = boolean_true_node;

> > +  niter->may_be_zero = boolean_false_node;

> > +  niter->max = max;

> > +  niter->bound = NULL_TREE;

> > +  niter->cmp = ERROR_MARK;

> >

> >

> >

> >> Thanks,

> >> Kugan

> >> >

> >> >

> >> > Thanks,

> >> > Richard.

> >> >

> >> >> Thanks,

> >> >> Kugan

> >> >>

> >> >>

> >> >> >

> >> >> > Richard.

> >> >> >

> >> >> >> Thanks,

> >> >> >> bin

> >> >> >> >

> >> >> >> >>> +  niter->control.base = NULL_TREE;

> >> >> >> >>> +  niter->control.step = NULL_TREE;

> >> >> >> >>> +  niter->control.no_overflow = false;

> >> >> >> >>> +  niter->niter = iter;

> >> >> >> >>> +  niter->assumptions = boolean_true_node;

> >> >> >> >>> +  niter->may_be_zero = boolean_false_node;

> >> >> >> >>> +  niter->max = max;

> >> >> >> >>> +  niter->bound = NULL_TREE;

> >> >> >> >>> +  niter->cmp = ERROR_MARK;

> >> >> >> >>> +  return true;

> >> >> >> >>> +}

> >> >> >> >>> +

> >> >> >> >>> +

> >> >> >> >> Appology if these are nitpickings.

> >> >> >> > Thanks for the review. I am happy to make the changes needed to get it

> >> >> >> > to how it should be :)

> >> >> >> >

> >> >> >> > Thanks,

> >> >> >> > Kugan

> >> >> >> >

> >> >> >> >>

> >> >> >> >> Thanks,

> >> >> >> >> bin

[RFC,PR82479] missing popcount builtin detection

Commit Message

Comments

Patch