From patchwork Thu Jun 19 09:44:55 2014
X-Patchwork-Submitter: Zhenqiang Chen
X-Patchwork-Id: 32196
Date: Thu, 19 Jun 2014 17:44:55 +0800
Subject: Re: [PATCH, cprop] Check rtx_cost when propagating constant
From: Zhenqiang Chen
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, Andrew Pinski

On 17 June 2014 17:42, Zhenqiang Chen wrote:
> On 17 June 2014 16:15, Richard Biener wrote:
>> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen wrote:
>>> Hi,
>>>
>>> For some large constants, ports like ARM need one more instruction to
>>> operate on them, e.g.
>>>
>>> #define MASK 0xfe00ff
>>> void maskdata (int * data, int len)
>>> {
>>>   int i = len;
>>>   for (; i > 0; i -= 2)
>>>     {
>>>       data[i] &= MASK;
>>>       data[i + 1] &= MASK;
>>>     }
>>> }
>>>
>>> Each AND operation needs two instructions:
>>>
>>>   and r3, r3, #16711935
>>>   bic r3, r3, #65536
>>>
>>> If we keep MASK in a register, the loop2_invariant pass can hoist it
>>> out of the loop, and it can be shared by different references.
>>>
>>> So the patch skips constant propagation if it makes the INSN's cost
>>> higher.
>>
>> So cprop undoes invariant motion's work here?
>
> Yes. GLOBAL CONST-PROP will undo invariant motion.
>
>> Should we make sure we add a REG_EQUAL note when not propagating?
>
> Logs show there is already a REG_EQUAL note.
>
>>> Bootstrap and no make check regression on X86-64 and ARM Chromebook.
>>>
>>> OK for trunk?
>>>
>>> Thanks!
>>> -Zhenqiang
>>>
>>> ChangeLog:
>>> 2014-06-17  Zhenqiang Chen
>>>
>>>         * cprop.c (try_replace_reg): Check cost for constants.
>>>
>>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>>> index aef3ee8..c9cf02a 100644
>>> --- a/gcc/cprop.c
>>> +++ b/gcc/cprop.c
>>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>>    rtx src = 0;
>>>    int success = 0;
>>>    rtx set = single_set (insn);
>>> +  int old_cost = 0;
>>> +  bool copy_p = false;
>>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>>> +
>>> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>>> +    copy_p = true;
>>> +  else
>>> +    old_cost = set_rtx_cost (set, speed);
>>
>> Looks bogus for set == NULL?
>
> set_rtx_cost has checked it. If it is NULL, the function will return 0.
>
>> Also what about register pressure?
>
> Do you think it has a big register pressure impact? I think it does not
> increase register pressure.
>
>> I think this kind of change needs wider testing as RTX costs are
>> usually not fully implemented and you introduce a new use kind
>> (or is it already used elsewhere in this way to compute the cost
>> difference of a set with s/reg/const?).
>
> Passes like fwprop, cse and auto_inc_dec use RTX costs to make the
> decision. E.g. the function attempt_change in auto-inc-dec.c has a
> code segment like:
>
>   old_cost = (set_src_cost (mem, speed)
>               + set_rtx_cost (PATTERN (inc_insn.insn), speed));
>   new_cost = set_src_cost (mem_tmp, speed);
>   ...
>   if (old_cost < new_cost)
>     {
>       ...
>       return false;
>     }
>
> The usage of RTX costs in this patch is similar.
>
> I had run X86-64 bootstrap and regression tests with
> --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java
>
> and ARM bootstrap and regression tests with
> --enable-languages=c,c++,fortran,lto,objc,obj-c++
>
> I will run tests on i686. What other tests do you think I have to run?
>
>> What kind of performance difference do you see?
>
> I had run Coremark, Dhrystone and EEMBC on ARM Cortex-M4 (with some arm
> backend changes). Coremark with some options shows >10% performance
> improvement. Dhrystone is a little better. There is some fluctuation in
> EEMBC, but the overall result is better.
>
> I will run SPEC2000 on X86-64 and ARM, and get back to you about the
> performance changes.

Please ignore my previous comments about Cortex-M4 performance since
they were not based on clean code.

Here is a summary of the performance results on X86-64 and ARM.

For X86-64, I ran SPEC2000 INT and FP (-O3). There is no improvement or
regression. As a test, I moved the code segment to the end of function
try_replace_reg and checked the insns that meet "success && new_cost >
old_cost". Logs show only 52 occurrences for the whole SPEC2000 build,
and only one instruction pattern, *adddi_1, is impacted. For *adddi_1,
rtx_cost increases from 8 to 10 when changing a register operand to a
constant.

For ARM Cortex-M4, there are minimal changes for Coremark, Dhrystone
and EEMBC.

For the ARM Chromebook (Cortex-A15), there is some fluctuation in the
SPEC2000 INT tests, but the final result shows no improvement or
regression.

The patch is updated to remove the "bogus" code and keep more
constants.

Bootstrap and no make check regression on X86-64, i686 and ARM.
diff --git a/gcc/cprop.c b/gcc/cprop.c
index aef3ee8..6ea6be0 100644
--- a/gcc/cprop.c
+++ b/gcc/cprop.c
@@ -733,6 +733,28 @@ try_replace_reg (rtx from, rtx to, rtx insn)
   rtx src = 0;
   int success = 0;
   rtx set = single_set (insn);
+  int old_cost = 0;
+  bool const_p = false;
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
+
+  if (set && SET_SRC (set))
+    {
+      rtx src = SET_SRC (set);
+      if (REG_P (src) || GET_CODE (src) == SUBREG)
+        const_p = true;
+      else
+        {
+          if (note != 0
+              && REG_NOTE_KIND (note) == REG_EQUAL
+              && (GET_CODE (XEXP (note, 0)) == CONST
+                  || CONSTANT_P (XEXP (note, 0))))
+            {
+              const_p = true;
+            }
+          else
+            old_cost = set_rtx_cost (set, speed);
+        }
+    }
 
   /* Usually we substitute easy stuff, so we won't copy everything.
      We however need to take care to not duplicate non-trivial CONST
@@ -740,6 +762,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
   to = copy_rtx (to);
 
   validate_replace_src_group (from, to, insn);
+
+  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
+     And it can be shared by different references.  So skip propagation if
+     it makes INSN's rtx cost higher.  */
+  if (set && SET_SRC (set) && !const_p && CONSTANT_P (to))
+    {
+      if (!CONSTANT_P (SET_SRC (set))
+          && (set_rtx_cost (set, speed) > old_cost))
+        {
+          cancel_changes (0);
+          return false;
+        }
+    }
+
   if (num_changes_pending () && apply_change_group ())
     success = 1;
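
To make the intent concrete, here is a hand-written, source-level sketch
of the form the cost check tries to preserve (the function name
maskdata_hoisted is made up for illustration only, and actual code
generation still depends on the target): with the large constant kept in
a register, loop2_invariant can hoist the single constant load out of
the loop and both AND operations share it, so each AND inside the loop
is one instruction instead of an and/bic pair.

#define MASK 0xfe00ff

/* Sketch only: roughly the shape that should survive after cprop
   declines to substitute the constant back into each AND.  */
void maskdata_hoisted (int *data, int len)
{
  int mask = MASK;          /* materialized once, kept in a register */
  int i = len;

  for (; i > 0; i -= 2)
    {
      data[i] &= mask;      /* single AND using the register copy */
      data[i + 1] &= mask;  /* shares the same register */
    }
}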