From patchwork Thu Jun  2 08:46:38 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ira Rosen <ira.rosen@linaro.org>
X-Patchwork-Id: 1715
In-Reply-To: <BANLkTi=wF=x0iMhGsa90Sa5M=QWBs=J8XQ@mail.gmail.com>
References: <BANLkTi=wUC7Lf2oqY3Gt4m8s0JfwbcJ78Q@mail.gmail.com>
 <BANLkTinO0ifhMQ3nkeQgaeu3pN1a64NxAw@mail.gmail.com>
 <BANLkTi=kwCb71z5_n6U1P497uggA5boMyw@mail.gmail.com>
 <BANLkTi=wF=x0iMhGsa90Sa5M=QWBs=J8XQ@mail.gmail.com>
Date: Thu, 2 Jun 2011 11:46:38 +0300
Message-ID: <BANLkTikSzRMm_QFL6YKKN6dxNtAEx8DNqg@mail.gmail.com>
Subject: Re: [patch] Improve detection of widening multiplication in the
 vectorizer
From: Ira Rosen <ira.rosen@linaro.org>
To: Richard Guenther <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org, Patch Tracking <patches@linaro.org>

On 1 June 2011 15:14, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 1 June 2011 12:42, Richard Guenther <richard.guenther@gmail.com> wrote:
>>
>>> Did you think about moving pass_optimize_widening_mul before
>>> loop optimizations?  Does that pass catch the cases you are
>>> teaching the pattern recognizer?  I think we should try to expose
>>> these more complicated instructions to loop optimizers.
>>>
>>
>> pass_optimize_widening_mul doesn't catch these cases, but I can try to
>> teach it instead of the vectorizer.
>> I am now testing
>>
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 174391)
>> +++ passes.c    (working copy)
>> @@ -870,6 +870,7 @@
>>       NEXT_PASS (pass_split_crit_edges);
>>       NEXT_PASS (pass_pre);
>>       NEXT_PASS (pass_sink_code);
>> +      NEXT_PASS (pass_optimize_widening_mul);
>>       NEXT_PASS (pass_tree_loop);
>>        {
>>          struct opt_pass **p = &pass_tree_loop.pass.sub;
>> @@ -934,7 +935,6 @@
>>       NEXT_PASS (pass_forwprop);
>>       NEXT_PASS (pass_phiopt);
>>       NEXT_PASS (pass_fold_builtins);
>> -      NEXT_PASS (pass_optimize_widening_mul);
>>       NEXT_PASS (pass_tail_calls);
>>       NEXT_PASS (pass_rename_ssa_copies);
>>       NEXT_PASS (pass_uncprop);
>>
>> to see how it affects other loop optimizations (vectorizer pattern
>> tests obviously fail).

Looks like it needs copy_prop and dce as well (see the updated patch at
the end of this message); otherwise I get (on x86_64-suse-linux)

FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd
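
For reference, a minimal sketch of the scalar shapes involved. The
function names are hypothetical and the code is not taken from the
testsuite. As far as I understand, pass_optimize_widening_mul handles
both widening multiplies and multiply-add (FMA) candidates, which would
explain why the fma4 scan-assembler tests are sensitive to where the
pass runs.

```c
#include <stdint.h>

/* Hypothetical example: both operands are 16-bit values promoted to
   32 bits before the multiply, i.e. a 16x16->32 widening multiply.  */
int32_t
widen_mul (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* Hypothetical example: the same operation inside a loop, the shape
   the vectorizer's pattern recognizer looks for.  */
void
widen_mul_loop (const int16_t *a, const int16_t *b, int32_t *out, int n)
{
  int i;

  for (i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}

/* Hypothetical example: a multiply whose only use is an add, the
   candidate shape for a fused multiply-add.  */
float
mul_add (float a, float b, float c)
{
  return a * b + c;
}
```

Whether the compiler actually emits a widening multiply or an FMA for
these depends on the target and the options in use.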

Ira

>
> Thanks.  I would hope that we eventually can get rid of the
> pattern recognizer ... at least for SSE there is also always
> a scalar variant instruction for each vectorized one.
>
> Richard.
>

Index: passes.c
===================================================================
--- passes.c    (revision 174391)
+++ passes.c    (working copy)
@@ -870,6 +870,9 @@
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
        {
          struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -934,7 +937,6 @@
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);