From patchwork Tue Nov 1 15:22:37 2016
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 80375
Message-ID: <5818B33D.7040208@foss.arm.com>
Date: Tue, 01 Nov 2016 15:22:37 +0000
From: Kyrill Tkachov
To: Andrew Pinski
CC: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: Re: [PATCH][AArch64] Expand DImode constant stores to two SImode
 stores when profitable
References: <580E1A3E.6090108@foss.arm.com> <581730F0.1040007@foss.arm.com>
 <5818B2F9.3040401@foss.arm.com>
In-Reply-To: <5818B2F9.3040401@foss.arm.com>

And here is the patch itself.

On 01/11/16 15:21, Kyrill Tkachov wrote:
>
> On 31/10/16 11:54, Kyrill Tkachov wrote:
>>
>> On 24/10/16 17:15, Andrew Pinski wrote:
>>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>>> currently synthesize the repeating 32-bit pattern twice and perform a
>>>> single X-store.  With this patch we synthesize the 32-bit pattern once
>>>> into a W register and store that twice using an STP.  This reduces
>>>> codesize bloat from synthesising the same constant multiple times, at
>>>> the expense of converting a store to a store-pair.
>>>> It will only trigger when it saves two or more instructions, so it will
>>>> transform:
>>>>         mov     x1, 49370
>>>>         movk    x1, 0xc0da, lsl 32
>>>>         str     x1, [x0]
>>>>
>>>> into:
>>>>
>>>>         mov     w1, 49370
>>>>         stp     w1, w1, [x0]
>>>>
>>>> only when optimising for size (-Os), whereas it will always transform a
>>>> 4-insn synthesis sequence into a two-insn sequence + STP (see comments
>>>> in the patch).
>>>>
>>>> This patch triggers already, but will trigger more with the store
>>>> merging pass that I'm working on, since that will generate more of
>>>> these repeating 64-bit constants.  This helps improve codegen on
>>>> 456.hmmer, where store merging can sometimes create very complex
>>>> repeating constants and target-specific expand needs to break them
>>>> down.
>>>
>>> Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
>>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>>> for slow unaligned store pair word is with the pattern or not.
>>
>> I can't get it to ICE with -mcpu=thunderx.
>> The restriction is just on the STP-forming code in the sched-fusion
>> hooks AFAIK.
>>
>>> Basically the rule is:
>>> 1) if 4 byte aligned, then it is better to do two str.
>>> 2) If 8 byte aligned, then doing stp is good
>>> 3) Otherwise it is better to do two str.
>>
>> Ok, then I'll make the function just emit two stores and depend on the
>> sched-fusion machinery to fuse them into an STP when appropriate, since
>> that has the logic that takes ThunderX into account.
>>
>
> Here it is.
> I've confirmed that it emits two STRs for 4-byte-aligned stores with
> -mtune=thunderx and still generates STP for the other tunings, though now
> sched-fusion is responsible for merging them, which is ok by me.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
>
> 2016-11-01  Kyrylo Tkachov
>
>     * config/aarch64/aarch64.md (mov<mode>): Call
>     aarch64_split_dimode_const_store on DImode constant stores.
>     * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>     New prototype.
>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>     function.
>
> 2016-11-01  Kyrylo Tkachov
>
>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>
>>
>>
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-10-24  Kyrylo Tkachov
>>>>
>>>>     * config/aarch64/aarch64.md (mov<mode>): Call
>>>>     aarch64_split_dimode_const_store on DImode constant stores.
>>>>     * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>>>>     New prototype.
>>>>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>>     function.
>>>>
>>>> 2016-10-24  Kyrylo Tkachov
>>>>
>>>>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>
>

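For illustration only (this snippet is not part of the patch): a stand-alone
example of the 4-byte-aligned case discussed above.  The wrapper struct and
function name are made up for the example, and the exact instructions emitted
depend on the tuning in effect.

/* The constant still has equal 32-bit halves, so the split into two SImode
   stores can apply even though the destination is only 4-byte aligned;
   whether the two STRs are then fused into an STP is left to the
   sched-fusion hooks, which keep two STRs for 4-byte-aligned addresses
   with -mtune=thunderx, as confirmed above.  */

/* Hypothetical wrapper type, used only to obtain a 4-byte-aligned
   64-bit field.  */
struct u64_align4
{
  unsigned long long val;
} __attribute__ ((packed, aligned (4)));

void
store_repeating_underaligned (struct u64_align4 *p)
{
  p->val = 0x0140c0da0140c0daULL;
}
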
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4f4989d8b0da91096000d0b51736eaa5b0aa9474..b6ca3dfacb0dc88e5d688905d9d013263d4e8d7f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -337,6 +337,7 @@ bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode);
 bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_simd_valid_immediate (rtx, machine_mode, bool,
			struct simd_immediate_info *);
+bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a5c72a65451db639a5a623d15ecc61ceb60d1707..ec77d1cb2a6c63ac1703efc75fc67946e7d24c8e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13178,6 +13178,63 @@ aarch64_expand_movmem (rtx *operands)
   return true;
 }
 
+/* Split a DImode store of a CONST_INT SRC to MEM DST as two
+   SImode stores.  Handle the case when the constant has identical
+   bottom and top halves.  This is beneficial when the two stores can be
+   merged into an STP and we avoid synthesising potentially expensive
+   immediates twice.  Return true if such a split is possible.  */
+
+bool
+aarch64_split_dimode_const_store (rtx dst, rtx src)
+{
+  rtx lo = gen_lowpart (SImode, src);
+  rtx hi = gen_highpart_mode (SImode, DImode, src);
+
+  bool size_p = optimize_function_for_size_p (cfun);
+
+  if (!rtx_equal_p (lo, hi))
+    return false;
+
+  unsigned int orig_cost
+    = aarch64_internal_mov_immediate (NULL_RTX, src, false, DImode);
+  unsigned int lo_cost
+    = aarch64_internal_mov_immediate (NULL_RTX, lo, false, SImode);
+
+  /* We want to transform:
+     MOV	x1, 49370
+     MOVK	x1, 0x140, lsl 16
+     MOVK	x1, 0xc0da, lsl 32
+     MOVK	x1, 0x140, lsl 48
+     STR	x1, [x0]
+   into:
+     MOV	w1, 49370
+     MOVK	w1, 0x140, lsl 16
+     STP	w1, w1, [x0]
+   So we want to perform this only when we save two instructions
+   or more.  When optimizing for size, however, accept any code size
+   savings we can.  */
+  if (size_p && orig_cost <= lo_cost)
+    return false;
+
+  if (!size_p
+      && (orig_cost <= lo_cost + 1))
+    return false;
+
+  rtx mem_lo = adjust_address (dst, SImode, 0);
+  if (!aarch64_mem_pair_operand (mem_lo, SImode))
+    return false;
+
+  rtx tmp_reg = gen_reg_rtx (SImode);
+  aarch64_expand_mov_immediate (tmp_reg, lo);
+  rtx mem_hi = aarch64_move_pointer (mem_lo, GET_MODE_SIZE (SImode));
+  /* Don't emit an explicit store pair as this may not be always profitable.
+     Let the sched-fusion logic decide whether to merge them.  */
+  emit_move_insn (mem_lo, tmp_reg);
+  emit_move_insn (mem_hi, tmp_reg);
+
+  return true;
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8861ac18f4e33ada3bc6dde0e4667fd040b1c213..ec423eb4b048609daaf2f81b0ae874f2e4350f69 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1010,6 +1010,11 @@ (define_expand "mov<mode>"
 	(match_operand:GPI 1 "general_operand" ""))]
   ""
   "
+    if (MEM_P (operands[0]) && CONST_INT_P (operands[1])
+	&& <MODE>mode == DImode
+	&& aarch64_split_dimode_const_store (operands[0], operands[1]))
+      DONE;
+
     if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
       operands[1] = force_reg (<MODE>mode, operands[1]);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..50d456834bcea10760f6cccecc89e3c21f53bb4c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+void
+foo (unsigned long long *a)
+{
+  a[0] = 0x0140c0da0140c0daULL;
+}
+
+/* { dg-final { scan-assembler-times "movk\\tw.*" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]+.*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c421277989adcf446ad8a7b3ab9060602c03a7ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+/* Check that for -Os we synthesize only the bottom half and then
+   store it twice with an STP rather than synthesizing it twice in each
+   half of an X-reg.  */
+
+void
+foo (unsigned long long *a)
+{
+  a[0] = 0xc0da0000c0daULL;
+}
+
+/* { dg-final { scan-assembler-times "mov\\tw.*" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]+.*" 1 } } */
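
For reference, a sketch (not from the patch; the function names below are
hypothetical) of how the orig_cost/lo_cost check plays out for the two
constants used in the new tests, assuming the MOV/MOVK synthesis counts
quoted in the comments above:

void
split_even_at_O2 (unsigned long long *a)
{
  /* 0x0140c0da0140c0da takes MOV + 3x MOVK as a DImode immediate (4 insns)
     but only MOV + MOVK for its 32-bit half (2 insns), so two or more
     instructions are saved and the store is split at -O2 as well:
	 mov	w1, 49370
	 movk	w1, 0x140, lsl 16
	 stp	w1, w1, [x0]	(or two STRs, at sched-fusion's discretion)
     This is what store_repeating_constant_1.c checks.  */
  a[0] = 0x0140c0da0140c0daULL;
}

void
split_only_at_Os (unsigned long long *a)
{
  /* 0x0000c0da0000c0da takes MOV + MOVK as a DImode immediate (2 insns)
     and a single MOV for its 32-bit half, so only one instruction is
     saved; the split is therefore done only when optimizing for size,
     which is what store_repeating_constant_2.c checks with -Os.  */
  a[0] = 0x0000c0da0000c0daULL;
}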