From patchwork Wed Aug 13 07:46:06 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhenqiang Chen X-Patchwork-Id: 35328 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-qg0-f71.google.com (mail-qg0-f71.google.com [209.85.192.71]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id D7817203C5 for ; Wed, 13 Aug 2014 07:46:27 +0000 (UTC) Received: by mail-qg0-f71.google.com with SMTP id f51sf31272892qge.10 for ; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:sender :delivered-to:mime-version:in-reply-to:references:date:message-id :subject:from:to:cc:x-original-sender :x-original-authentication-results:content-type; bh=MQd+TjWeGtkpxhwiCf93x/mwm63XsMHLIbheT3q9K/I=; b=kgO/QFYb4O/fKWDF+bKmZi746uCVdr/N5sQqlzP2ogLsKHJqxcqcxbnntlP90HiSs3 kyjjbMz7Py9DR7X1e7igiwxsnNDVMZvAfriatSF6fnsU3Le6NyoJGu9EGpCliJ1IUfMe WZk8Syk6ZxXexVnY+pO0LtTRwVCiK7TDYe0x7y9ID6I54hfj7Ydmcok1MYFeDrkcF4Qg QbpraNbX7O/DGU7ghD8ibq0N8WNeQdcvgHquOaJd8jYV5Opv0girdbjpTcloXh1QmMhM RGgTYk7qBYZfz5DeWoOLsLkVjEAY0WstsyhYd+1vQehGLGvTOL21S3h9bWJ5OEqIbinl CM5g== X-Gm-Message-State: ALoCoQlRY+JBwfLTDAy6oycc5GLTPPx+tqFFkNh9f6Km/vfR8WsvgF/fA+NaAVDrtxfAZPo8Vv3v X-Received: by 10.236.1.198 with SMTP id 46mr1667109yhd.16.1407915987617; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.104.130 with SMTP id a2ls432676qgf.90.gmail; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) X-Received: by 10.221.68.66 with SMTP id xx2mr2236129vcb.1.1407915987441; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) Received: from mail-vc0-x22b.google.com (mail-vc0-x22b.google.com [2607:f8b0:400c:c03::22b]) by mx.google.com with ESMTPS id h6si640011vdw.73.2014.08.13.00.46.27 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 13 Aug 2014 00:46:27 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2607:f8b0:400c:c03::22b as permitted sender) client-ip=2607:f8b0:400c:c03::22b; Received: by mail-vc0-f171.google.com with SMTP id hq11so14622345vcb.16 for ; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) X-Received: by 10.221.44.69 with SMTP id uf5mr2294959vcb.4.1407915987267; Wed, 13 Aug 2014 00:46:27 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.221.37.5 with SMTP id tc5csp309820vcb; Wed, 13 Aug 2014 00:46:26 -0700 (PDT) X-Received: by 10.68.162.3 with SMTP id xw3mr2554886pbb.142.1407915985824; Wed, 13 Aug 2014 00:46:25 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id rn13si886749pab.178.2014.08.13.00.46.25 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 13 Aug 2014 00:46:25 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-375045-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 31706 invoked by alias); 13 Aug 2014 07:46:13 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 31695 invoked by uid 89); 13 Aug 2014 07:46:13 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-la0-f53.google.com Received: from mail-la0-f53.google.com (HELO mail-la0-f53.google.com) (209.85.215.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 13 Aug 2014 07:46:10 +0000 Received: by mail-la0-f53.google.com with SMTP id gl10so8619481lab.26 for ; Wed, 13 Aug 2014 00:46:07 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.152.4.97 with SMTP id j1mr2944875laj.10.1407915967184; Wed, 13 Aug 2014 00:46:07 -0700 (PDT) Received: by 10.112.140.169 with HTTP; Wed, 13 Aug 2014 00:46:06 -0700 (PDT) In-Reply-To: <53E9DD61.2050208@arm.com> References: <53E9DD61.2050208@arm.com> Date: Wed, 13 Aug 2014 15:46:06 +0800 Message-ID: Subject: Re: [PATCH, ARM] Keep constants in register when expanding From: Zhenqiang Chen To: Richard Earnshaw Cc: "gcc-patches@gcc.gnu.org" , Ramana radhakrishnan X-IsSubscribed: yes X-Original-Sender: zhenqiang.chen@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2607:f8b0:400c:c03::22b as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 On 12 August 2014 17:24, Richard Earnshaw wrote: > On 05/08/14 10:31, Zhenqiang Chen wrote: >> Hi, >> >> For some large constants, ARM will split them during expanding, which >> makes impossible to hoist them out the loop or shared by different >> references (refer the test case in the patch). >> >> The patch keeps some constants in registers. If the constant can not >> be optimized, the cprop and combine passes can optimize them as what >> we do in current expand pass with >> >> define_insn_and_split "*arm_subsi3_insn" >> define_insn_and_split "*arm_andsi3_insn" >> define_insn_and_split "*iorsi3_insn" >> define_insn_and_split "*arm_xorsi3" >> >> The patch does not modify addsi3 since the define_insn_and_split >> "*arm_addsi3" is only valid when (reload_completed || >> !arm_eliminable_register (operands[1])). The cprop and combine passes >> can not optimize the large constant if we put it in register, which >> will lead to regression. >> >> For logic operators, the patch skips changes for constants: >> >> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >> >> since expand pass always uses "sign-extend" to get the value >> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >> logs show most negative values are UNSIGNED when they are TREE node. >> And combine pass is smart enough to recover the negative value to >> positive value. >> >> Bootstrap and no make check regression on Chrome book. >> For coremark, dhrystone and eembcv1, no any code size and performance >> change on Cortex-M4. >> No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. >> No Spec2000 performance regression on Chrome book and dumped assemble >> codes only show very few difference. >> >> OK for trunk? >> > > Not yet. > > I think the principle is sound, but there are some issues that need > sorting out. > > 1) We shouldn't do this when not optimizing; there's no advantage to > hoisting out the constants in that case. Indeed, it may not be > worth-while at -O1 either. Updated. Only apply it when optimize >= 2 > 2) I think you should be using const_ok_for_op rather than working > through all the permitted values. If that function isn't doing the > right thing, then it probably needs extending to handle those cases as well. There are some difference. I want to filter cases for which arm_split_constant will get better result. Patch is updated to add a function const_ok_for_split, in which (i < 0) && const_ok_for_arm (-i) and 0xffff can not fit into const_ok_for_op. > 3) Why haven't you handled addsi3? As I had mentioned in the original mail, since the define_insn_and_split "*arm_addsi3" is only valid when (reload_completed || !arm_eliminable_register (operands[1])). The cprop and combine passes can not optimize the large constant if we put it in register (and it is not hoisted out of the loop), which will lead to regression. Any reason for *arm_addsi3 with additional condition (reload_completed || !arm_eliminable_register (operands[1]))? Without it, addsi3 can be handled as subsi3. > See below for some specific comments. All are updated. Thanks! -Zhenqiang >> >> ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some >> large constants in register other than split them. >> >> testsuite/ChangeLog: >> 2014-08-05 Zhenqiang Chen >> >> * gcc.target/arm/maskdata.c: New test. >> >> >> skip-split.patch >> >> >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index bd8ea8f..c8b3001 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -1162,9 +1162,16 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (MINUS, SImode, NULL_RTX, >> - INTVAL (operands[1]), operands[0], >> - operands[2], optimize && can_create_pseudo_p ()); >> + if (!const_ok_for_arm (INTVAL (operands[1])) >> + && can_create_pseudo_p ()) >> + { >> + operands[1] = force_reg (SImode, operands[1]); >> + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); > > The final emit_insn shouldn't be be needed. Once you've forced > operands[1] into a register, you can just fall out the bottom of the > pattern and let the default expansion take over. But... >> + } >> + else >> + arm_split_constant (MINUS, SImode, NULL_RTX, >> + INTVAL (operands[1]), operands[0], operands[2], >> + optimize && can_create_pseudo_p ()); >> DONE; > > ... You'll need to move the DONE inside the else clause. > >> } >> else /* TARGET_THUMB1 */ >> @@ -2077,6 +2084,17 @@ >> emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], >> operands[1])); >> } >> + else if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || const_ok_for_arm (~INTVAL (operands[2])) >> + /* zero_extendhi instruction is efficient enough. */ >> + || INTVAL (operands[2]) == 0xffff >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); > > Same here. > >> + } >> else >> arm_split_constant (AND, SImode, NULL_RTX, >> INTVAL (operands[2]), operands[0], >> @@ -2882,9 +2900,20 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (IOR, SImode, NULL_RTX, >> - INTVAL (operands[2]), operands[0], operands[1], >> - optimize && can_create_pseudo_p ()); >> + if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || (TARGET_THUMB2 >> + && const_ok_for_arm (~INTVAL (operands[2]))) >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); > > And here. > >> + } >> + else >> + arm_split_constant (IOR, SImode, NULL_RTX, >> + INTVAL (operands[2]), operands[0], operands[1], >> + optimize && can_create_pseudo_p ()); >> DONE; >> } >> else /* TARGET_THUMB1 */ >> @@ -3052,9 +3081,18 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (XOR, SImode, NULL_RTX, >> - INTVAL (operands[2]), operands[0], operands[1], >> - optimize && can_create_pseudo_p ()); >> + if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); > > And here. > >> + } >> + else >> + arm_split_constant (XOR, SImode, NULL_RTX, >> + INTVAL (operands[2]), operands[0], operands[1], >> + optimize && can_create_pseudo_p ()); >> DONE; >> } >> else /* TARGET_THUMB1 */ >> >> diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c >> new file mode 100644 >> index 0000000..b4231a4 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/maskdata.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options " -O2 -fno-gcse " } */ >> +/* { dg-require-effective-target arm_thumb2_ok } */ >> + >> +#define MASK 0xfe00ff >> +void maskdata (int * data, int len) >> +{ >> + int i = len; >> + for (; i > 0; i -= 2) >> + { >> + data[i] &= MASK; >> + data[i + 1] &= MASK; >> + } >> +} >> +/* { dg-final { scan-assembler "254" } } */ >> +/* { dg-final { scan-assembler "255" } } */ >> > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index be5e72a..f913035 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -51,6 +51,7 @@ extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode); extern int const_ok_for_arm (HOST_WIDE_INT); extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code); extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); +extern int const_ok_for_split (HOST_WIDE_INT, enum rtx_code); extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int); extern int legitimate_pic_operand_p (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7f62ca4..78a624d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3584,6 +3584,38 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code code) } } +/* Return true if I is a valid constant for split with the operation CODE. + The condition should align with the constrain of the corresponding + define_insn_and_split pattern to make sure later pass can optimize + the constants. */ +int +const_ok_for_split (HOST_WIDE_INT i, enum rtx_code code) +{ + if (optimize < 2 + || !can_create_pseudo_p () + || const_ok_for_arm (i) + /* Since expand pass always uses "sign-extend" to get the value + (trunc_int_for_mode called from immed_wide_int_const) for rtl, + and logs show most negative values are UNSIGNED when they are + TREE node. And combine pass is smart enough to recover the + negative value to positive value. */ + || ((i < 0) && const_ok_for_arm (-i))) + return 1; + + switch (code) + { + case AND: + /* zero_extendhi instruction is efficient. */ + return const_ok_for_arm (~i) || (i == 0xffff); + + case IOR: + return TARGET_THUMB2 && const_ok_for_arm (~i); + + default: + return 1; + } +} + /* Emit a sequence of insns to handle a large constant. CODE is the code of the operation required, it can be any of SET, PLUS, IOR, AND, XOR, MINUS; diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index cd9ab6c..f3ca849 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,10 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[1]), MINUS)) + operands[1] = force_reg (SImode, operands[1]); + else + { + arm_split_constant (MINUS, SImode, NULL_RTX, + INTVAL (operands[1]), operands[0], + operands[2], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ operands[1] = force_reg (SImode, operands[1]); @@ -2076,14 +2082,19 @@ operands[1] = convert_to_mode (QImode, operands[1], 1); emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); + DONE; } + else if (!const_ok_for_split (INTVAL (operands[2]), AND)) + operands[2] = force_reg (SImode, operands[2]); else - arm_split_constant (AND, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], - operands[1], - optimize && can_create_pseudo_p ()); + { + arm_split_constant (AND, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); - DONE; + DONE; + } } } else /* TARGET_THUMB1 */ @@ -2882,10 +2893,16 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[2]), IOR)) + operands[2] = force_reg (SImode, operands[2]); + else + { + arm_split_constant (IOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ { @@ -3052,10 +3069,16 @@ { if (TARGET_32BIT) { - arm_split_constant (XOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[2]), XOR)) + operands[2] = force_reg (SImode, operands[2]); + else + { + arm_split_constant (XOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ { diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c new file mode 100644 index 0000000..6960c75 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/maskdata.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options " -O2 -fno-gcse " } */ +/* { dg-require-effective-target arm_thumb2_ok } */ + +#define MASK 0xfe00ff +void maskdata (int * data, int len) +{ + int i = len; + for (; i > 0; i -= 2) + { + data[i] &= MASK; + data[i + 1] &= MASK; + } +} +/* { dg-final { scan-assembler-not "65536" } } */