From patchwork Thu Jun 2 08:46:38 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ira Rosen
X-Patchwork-Id: 1715
Return-Path:
Delivered-To: unknown
Received: from imap.gmail.com (74.125.159.109) by localhost6.localdomain6
 with IMAP4-SSL; 08 Jun 2011 14:54:44 -0000
Delivered-To: patches@linaro.org
Received: by 10.52.181.10 with SMTP id ds10cs357879vdc;
 Thu, 2 Jun 2011 01:46:41 -0700 (PDT)
Received: by 10.42.180.10 with SMTP id bs10mr949743icb.321.1307004400759;
 Thu, 02 Jun 2011 01:46:40 -0700 (PDT)
Received: from mail-pv0-f178.google.com (mail-pv0-f178.google.com
 [74.125.83.178]) by mx.google.com with ESMTPS id
 i6si1795534icv.133.2011.06.02.01.46.39 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 02 Jun 2011 01:46:39 -0700 (PDT)
Received-SPF: neutral (google.com: 74.125.83.178 is neither permitted nor
 denied by best guess record for domain of ira.rosen@linaro.org)
 client-ip=74.125.83.178;
Authentication-Results: mx.google.com; spf=neutral (google.com:
 74.125.83.178 is neither permitted nor denied by best guess record for
 domain of ira.rosen@linaro.org) smtp.mail=ira.rosen@linaro.org
Received: by pvg7 with SMTP id 7so341512pvg.37 for ;
 Thu, 02 Jun 2011 01:46:38 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.143.63.17 with SMTP id q17mr83105wfk.186.1307004398614;
 Thu, 02 Jun 2011 01:46:38 -0700 (PDT)
Received: by 10.143.61.7 with HTTP; Thu, 2 Jun 2011 01:46:38 -0700 (PDT)
In-Reply-To:
References:
Date: Thu, 2 Jun 2011 11:46:38 +0300
Message-ID:
Subject: Re: [patch] Improve detection of widening multiplication in the
 vectorizer
From: Ira Rosen
To: Richard Guenther
Cc: gcc-patches@gcc.gnu.org, Patch Tracking

On 1 June 2011 15:14, Richard Guenther wrote:
> On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen wrote:
>> On 1 June 2011 12:42, Richard Guenther wrote:
>>
>>> Did you think about moving pass_optimize_widening_mul before
>>> loop optimizations?  Does that pass catch the cases you are
>>> teaching the pattern recognizer?  I think we should try to expose
>>> these more complicated instructions to loop optimizers.
>>>
>>
>> pass_optimize_widening_mul doesn't catch these cases, but I can try to
>> teach it instead of the vectorizer.
>> I am now testing
>>
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 174391)
>> +++ passes.c    (working copy)
>> @@ -870,6 +870,7 @@
>>       NEXT_PASS (pass_split_crit_edges);
>>       NEXT_PASS (pass_pre);
>>       NEXT_PASS (pass_sink_code);
>> +      NEXT_PASS (pass_optimize_widening_mul);
>>       NEXT_PASS (pass_tree_loop);
>>        {
>>          struct opt_pass **p = &pass_tree_loop.pass.sub;
>> @@ -934,7 +935,6 @@
>>       NEXT_PASS (pass_forwprop);
>>       NEXT_PASS (pass_phiopt);
>>       NEXT_PASS (pass_fold_builtins);
>> -      NEXT_PASS (pass_optimize_widening_mul);
>>       NEXT_PASS (pass_tail_calls);
>>       NEXT_PASS (pass_rename_ssa_copies);
>>       NEXT_PASS (pass_uncprop);
>>
>> to see how it affects other loop optimizations (vectorizer pattern
>> tests obviously fail).

Looks like it needs copy_prop and dce as well:
otherwise I get (on x86_64-suse-linux)
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd

Ira

>
> Thanks.  I would hope that we eventually can get rid of the
> pattern recognizer ... at least for SSE there is also always
> a scalar variant instruction for each vectorized one.
>
> Richard.
>

Index: passes.c
===================================================================
--- passes.c	(revision 174391)
+++ passes.c	(working copy)
@@ -870,6 +870,9 @@
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
         {
           struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -934,7 +937,6 @@
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);
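
For readers unfamiliar with the term, here is a minimal sketch of what
"widening multiplication" means at the source level: two narrow operands
are multiplied into a result of twice the width. The function names
widen_mult and widen_mult_loop are hypothetical and do not come from the
GCC testsuite, and the sketch makes no claim about which of these forms
pass_optimize_widening_mul or the vectorizer pattern recognizer handles
at this revision.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar form: both 16-bit operands are converted to 32 bits before the
   multiplication, so the full product is kept.  (Hypothetical example,
   not taken from the GCC sources.)  */
int32_t
widen_mult (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* Loop form of the same operation: each iteration multiplies two 16-bit
   elements into one 32-bit element.  */
void
widen_mult_loop (int32_t *restrict out, const int16_t *restrict a,
                 const int16_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}
```

The loop form is the shape relevant to the question of pass ordering
above, since the multiplication sits inside a loop body that the loop
optimizers run over.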