From patchwork Mon Aug 4 08:24:15 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhenqiang Chen X-Patchwork-Id: 34767 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-qg0-f71.google.com (mail-qg0-f71.google.com [209.85.192.71]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id C2D3020F2E for ; Mon, 4 Aug 2014 08:24:36 +0000 (UTC) Received: by mail-qg0-f71.google.com with SMTP id f51sf14078668qge.10 for ; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:sender :delivered-to:mime-version:in-reply-to:references:date:message-id :subject:from:to:cc:x-original-sender :x-original-authentication-results:content-type; bh=0rQs3BLRHLDAW8rqelDnie3WdascloBPtYt6UmMQjDU=; b=GuZD/KiFBrl//Olk1qmOMGtEfZKmN7I9sz+jjBC3JcVJRVvMoV6M7d3kCmT0yYY/t9 GcBgba1oXFLzNRxtPR6OirQz3hBRD3YdTMOUNW9rNsx+oXD7ghcYWYPBhU5JTJL+J7R6 fYZWLJ8bY77GA36myW5PrdiYk0pmuS07UcAGgzYt8ArNwpZV9SHWTV3LkwJ/4WmWy+PV 1l6KlP/jZBiwUIKzdJTLGb40orhSR8/h3iXiFb/qmTrikVRJhqXI06qaLwIeDuRfglwu h4dq24XA6PYz7t4ywd1exuNAyAmS1e0a587fYjxjwgYcE8QkCG+lVX5dHJ6gH94v8zC7 gB5w== X-Gm-Message-State: ALoCoQk7c5ZsfYiFRv32C0fVYgjwsAIS1t8A1qlLFJ2I7O73u4LLLJmoaO1Xzj0NvUWk7nENmfL7 X-Received: by 10.236.44.205 with SMTP id n53mr7105164yhb.12.1407140676628; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.102.18 with SMTP id v18ls2185281qge.51.gmail; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) X-Received: by 10.220.166.9 with SMTP id k9mr21758863vcy.20.1407140676473; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) Received: from mail-vc0-x22f.google.com (mail-vc0-x22f.google.com [2607:f8b0:400c:c03::22f]) by mx.google.com with ESMTPS id tc7si11530449vcb.24.2014.08.04.01.24.36 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 04 Aug 2014 01:24:36 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2607:f8b0:400c:c03::22f as permitted sender) client-ip=2607:f8b0:400c:c03::22f; Received: by mail-vc0-f175.google.com with SMTP id ik5so10313208vcb.6 for ; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) X-Received: by 10.52.166.10 with SMTP id zc10mr1340141vdb.61.1407140676350; Mon, 04 Aug 2014 01:24:36 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.221.37.5 with SMTP id tc5csp286899vcb; Mon, 4 Aug 2014 01:24:35 -0700 (PDT) X-Received: by 10.66.233.170 with SMTP id tx10mr22675471pac.68.1407140675060; Mon, 04 Aug 2014 01:24:35 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id az7si8402096pdb.476.2014.08.04.01.24.34 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Aug 2014 01:24:35 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-374001-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 26277 invoked by alias); 4 Aug 2014 08:24:21 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 26268 invoked by uid 89); 4 Aug 2014 08:24:21 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-la0-f47.google.com Received: from mail-la0-f47.google.com (HELO mail-la0-f47.google.com) (209.85.215.47) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 04 Aug 2014 08:24:19 +0000 Received: by mail-la0-f47.google.com with SMTP id mc6so5052179lab.6 for ; Mon, 04 Aug 2014 01:24:15 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.112.60.5 with SMTP id d5mr21125876lbr.26.1407140655546; Mon, 04 Aug 2014 01:24:15 -0700 (PDT) Received: by 10.112.136.66 with HTTP; Mon, 4 Aug 2014 01:24:15 -0700 (PDT) In-Reply-To: <53C73EBD.9070909@redhat.com> References: <53C73EBD.9070909@redhat.com> Date: Mon, 4 Aug 2014 16:24:15 +0800 Message-ID: Subject: Re: [PATCH] Fix PR 61225 From: Zhenqiang Chen To: Jeff Law Cc: Steven Bosscher , "gcc-patches@gcc.gnu.org" , Jakub Jelinek X-IsSubscribed: yes X-Original-Sender: zhenqiang.chen@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2607:f8b0:400c:c03::22f as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 On 17 July 2014 11:10, Jeff Law wrote: > On 05/22/14 03:52, Zhenqiang Chen wrote: >> >> On 21 May 2014 20:43, Steven Bosscher wrote: >>> >>> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote: >>>> >>>> Hi, >>>> >>>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The >>>> test case tends to check a peephole2 optimization, which optimizes the >>>> following sequence >>>> >>>> 2: bx:SI=ax:SI >>>> 25: ax:SI=[bx:SI] >>>> 7: {ax:SI=ax:SI-0x1;clobber flags:CC;} >>>> 8: [bx:SI]=ax:SI >>>> 9: flags:CCZ=cmp(ax:SI,0) >>>> to >>>> 2: bx:SI=ax:SI >>>> 41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;} >>>> >>>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward >>>> changes the INSN 25 to >>>> >>>> 25: ax:SI=[ax:SI] >>>> >>>> Then peephole2 can not optimize it since two memory_operands look like >>>> different. >>>> >>>> To fix it, the patch adds another peephole2 rule to read one more >>>> insn. From the register copy, it knows the address is the same. >>> >>> >>> That is one complex peephole2 to deal with a transformation like this. >>> It seems to be like it's a too specific solution for a bigger problem. >>> >>> Could you please try one of the following solutions instead: >>> >>> 1. Track register values for peephole2 and try different alternatives >>> based on known register equivalences? E.g. in your example, perhaps >>> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after >>> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to >>> [bx:SI] at that point (or if that information is not available, it is >>> not very difficult to make it available). Then you could try applying >>> peephole2 on the original pattern but also on patterns modified with >>> the known equivalences (i.e. try peephole2 on multiple equivalent >>> patterns for the same insn). This may expose other peephole2 >>> opportunities, not just the specific one your patch addresses. >> >> >> Patch is updated according to the comment. There is no REG_EQUAL. So I >> add it when replace_oldest_value_reg. >> >> ChangeLog: >> 2014-05-22 Zhenqiang Chen >> >> Part of PR rtl-optimization/61225 >> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New >> proto. >> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. >> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when >> propagating to SET. > > I can't help but wonder why the new 4 insn combination code isn't presenting > this as a nice big fat insn to the x86 backend which would eliminate the > need for the peep2. > > But, assuming there's a fundamental reason why that's not kicking in... Current combine pass can only handle I0 -> I1 -> I2 -> I3. I0, I1 -> I2, I2 -> I3. I0 -> I2; I1, I2 -> I3. I0 -> I1; I1, I2 -> I3. For the case, it is I1 -> I2 -> I3; I2 -> INSN I3 and INSN looks like not related. But INSN is a COMPARE to set CC and I3 can also set CC. I3 and INSN can be combined together as one instruction to set CC. The following patch enhances combine pass to handle the case. A new parameter is added for try_combine to accept INSN as TO_COMBINED_INSN. It reuses the 3-insn combine method to combine I1 -> I2 -> I3. If there is TO_COMBINED_INSN, combine I2 -> TO_COMBINED_INSN. Then create an new insn "parallel (TO_COMBINED_INSN, I3)". refer_same_reg_p is some check to make sure the change is safe. Bootstrap and no make check regression on X86-64 and i686. X86-64 bootstrap logs show 358 cases were combined by the patch. Ok for trunk? Thanks! -Zhenqiang ChangeLog 2014-08-04 Zhenqiang Chen Part of PR rtl-optimization/61225 * combine.c (refer_same_reg_p): New function. (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. (try_combine): Add one more parameter TO_COMBINED_INSN, which is used to create a new insn parallel (TO_COMBINED_INSN, I3). testsuite/ChangeLog: 2014-08-04 Zhenqiang Chen * gcc.target/i386/pr61225.c: New test. +/* { dg-final { cleanup-rtl-dump "combine" } } */ > In replace_oldest_value_reg, why not use reg_overlap_mentioned_p to > determine if the REGNO of NEW_RTX is modified by INSN? I'd look to avoid > some of those calls to single_set (insn). Just call it once and reuse the > value. > > Shouldn't you be ensuring the REG_EQUAL note is unique? I think we have a > routine to avoid creating a note that already exists. > > Don't you have to ensure that the value in the REG_EQUAL note has not > changed? A REG_EQUAL note denotes an equivalence that holds at the single > insn where it appears. If you want to use the value elsewhere you'd have to > ensure the value hasn't been changed. If RTX referred to by the REG_EQUAL > note is a MEM, this can be relatively difficult due to aliasing issues. > > Jeff > > > > diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx); static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *); static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *); static int contains_muldiv (rtx); -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx); +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx); static void undo_all (void); static void undo_commit (void); static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b) #endif return false; } + +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. + It returns TRUE, if reg1 == reg2, and no other refer of reg1 + except A and B. */ + +static bool +refer_same_reg_p (rtx a, rtx b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) + || !seta || !setb) + return false; + + if (GET_CODE (SET_SRC (seta)) != COMPARE + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC + || !REG_P (XEXP (SET_SRC (seta), 0)) + || !const0_rtx + || !REG_P (SET_SRC (setb)) + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) + return false; + + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) + { + df_ref use; + rtx insn; + unsigned int i = REGNO (SET_SRC (setb)); + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) + { + insn = DF_REF_INSN (use); + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn))) + return false; + } + } + + return true; +} + ^L /* Main entry point for combiner. F is the first insn of the function. NREGS is the first unused pseudo-reg number. @@ -1108,10 +1148,7 @@ insn_a_feeds_b (rtx a, rtx b) static int combine_instructions (rtx f, unsigned int nregs) { - rtx insn, next; -#ifdef HAVE_cc0 - rtx prev; -#endif + rtx insn, next, prev; struct insn_link *links, *nextlinks; rtx first; basic_block last_bb; @@ -1258,7 +1295,7 @@ combine_instructions (rtx f, unsigned int nregs) FOR_EACH_LOG_LINK (links, insn) if ((next = try_combine (insn, links->insn, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "two-insn combine", 1); goto retry; @@ -1279,7 +1316,7 @@ combine_instructions (rtx f, unsigned int nregs) FOR_EACH_LOG_LINK (nextlinks, link) if ((next = try_combine (insn, link, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); goto retry; @@ -1301,13 +1338,13 @@ combine_instructions (rtx f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; } @@ -1321,13 +1358,13 @@ combine_instructions (rtx f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; } @@ -1343,7 +1380,7 @@ combine_instructions (rtx f, unsigned int nregs) && sets_cc0_p (PATTERN (prev)) && (next = try_combine (insn, links->insn, prev, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) goto retry; #endif @@ -1356,7 +1393,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, links->insn, nextlinks->insn, NULL_RTX, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); @@ -1385,7 +1422,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1396,7 +1433,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1413,7 +1450,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1423,7 +1460,7 @@ combine_instructions (rtx f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL_RTX)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs) } } + /* Try to combine a compare insn that sets CC + with a preceding insn that can set CC, and maybe with its + logical predecessor as well. + We need this special code because data flow connections + do not get entered in LOG_LINKS. */ + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX + && refer_same_reg_p (insn, prev) + && max_combine >= 4) + { + struct insn_link *next1; + FOR_EACH_LOG_LINK (next1, prev) + { + rtx link1 = next1->insn; + if (NOTE_P (link1)) + continue; + /* I1 -> I2 -> I3; I2 -> insn; + output parallel (insn, I3). */ + FOR_EACH_LOG_LINK (nextlinks, link1) + if ((next = try_combine (prev, link1, + nextlinks->insn, NULL_RTX, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "four-insn combine", 1); + goto retry; + } + /* I2 -> I3; I2 -> insn + output next = parallel (insn, I3). */ + if ((next = try_combine (prev, link1, + NULL_RTX, NULL_RTX, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "three-insn combine", 1); + goto retry; + } + } + } /* Try this insn with each REG_EQUAL note it links back to. */ FOR_EACH_LOG_LINK (links, insn) { @@ -1456,7 +1537,7 @@ combine_instructions (rtx f, unsigned int nregs) i2mod_new_rhs = copy_rtx (note); next = try_combine (insn, i2mod, NULL_RTX, NULL_RTX, &new_direct_jump_p, - last_combined_insn); + last_combined_insn, NULL_RTX); i2mod = NULL_RTX; if (next) { @@ -2450,11 +2531,14 @@ update_cfg_for_uncondjump (rtx insn) LAST_COMBINED_INSN is either I3, or some insn after I3 that has been I3 passed to an earlier try_combine within the same basic - block. */ + block. + + TO_COMBINED_INSN is an insn after I3 that sets CC flags. It is used to + combine with I3 to make a new insn. */ static rtx try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, - rtx last_combined_insn) + rtx last_combined_insn, rtx to_combined_insn) { /* New patterns for I3 and I2, respectively. */ rtx newpat, newi2pat = 0; @@ -2543,6 +2627,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, || cant_combine_insn_p (i2) || (i1 && cant_combine_insn_p (i1)) || (i0 && cant_combine_insn_p (i0)) + || (to_combined_insn && cant_combine_insn_p (to_combined_insn)) || likely_spilled_retval_p (i3)) return 0; @@ -3216,7 +3301,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, rtx old = newpat; total_sets = 1 + extra_sets; newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); - XVECEXP (newpat, 0, 0) = old; + + if (to_combined_insn) + XVECEXP (newpat, 0, --total_sets) = old; + else + XVECEXP (newpat, 0, 0) = old; } if (added_sets_0) @@ -3239,6 +3328,21 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n) t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0); + if (to_combined_insn) + { + rtx todest = NULL_RTX, tosrc = NULL_RTX; + if (can_combine_p (i2, to_combined_insn, NULL_RTX, NULL_RTX, + i3, NULL_RTX, &todest, &tosrc)) + { + rtx pat = copy_rtx (PATTERN (to_combined_insn)); + t = subst (pat, todest, tosrc, 0, 0, 0); + } + else + { + undo_all (); + return 0; + } + } XVECEXP (newpat, 0, --total_sets) = t; } } diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c b/gcc/testsuite/gcc.target/i386/pr61225.c new file mode 100644 index 0000000..9c08102 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr61225.c @@ -0,0 +1,16 @@ +/* PR rtl-optimization/61225 */ +/* { dg-do compile } */ +/* { dg-options "-Os -fdump-rtl-combine-details" } */ + +void foo (void *); + +int * +f1 (int *x) +{ + if (!--*x) + foo (x); + return x; +} + +/* { dg-final { scan-rtl-dump "Successfully matched this instruction" "combine" } } */