From patchwork Tue Nov  3 15:43:24 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
X-Patchwork-Id: 55944
Delivered-To: patch@linaro.org
Received: by 10.112.61.134 with SMTP id p6csp1905126lbr;
 Tue, 3 Nov 2015 07:43:53 -0800 (PST)
X-Received: by 10.66.163.201 with SMTP id yk9mr2337075pab.107.1446565433821; 
 Tue, 03 Nov 2015 07:43:53 -0800 (PST)
Return-Path: <gcc-patches-return-412517-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 x8si42408907pbt.193.2015.11.03.07.43.53 for <patch@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 03 Nov 2015 07:43:53 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-412517-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
 gcc-patches-return-412517-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-412517-patch=linaro.org@gcc.gnu.org;
 dkim=pass header.i=@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:cc:subject:content-type;
 q=dns; s=default; b=NlaZ+mhPmdFtluONZy41G1Sh/9tGePhTCgfbMpbEqVm
 Q+Fec6MjY+C0xMDMIjmAkgzGpBk41nAHYMBDznVT0TF0tlI6AAkKzFrN86zUS/We
 QrHALbKkkpcDRrFdA1sGv7Z14AIG0FlZeOVjBAGNPBug78+DmgT8YFerr0DJZt1Q
 =
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:cc:subject:content-type;
 s=default; bh=9qzRPc+suK799JLrkwX40kF7S1A=; b=qQAFvgB8cW6820sGH
 juw183T1SxqHWhKThfaLndS5U7LAgBbiCTntu61b9AzixQs0T7qEnnP0NLVk03fk
 t6V+8O+iTjjIpVCHkZmBowIUDR/DPLODnPXEvGbOfFTYluOaeEqPp++WiBA4hyx4
 emS1MB12AA/f86Q2bqEQo32ht0=
Received: (qmail 6350 invoked by alias); 3 Nov 2015 15:43:32 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 6337 invoked by uid 89); 3 Nov 2015 15:43:31 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
 eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by
 sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Tue, 03 Nov 2015 15:43:30 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
 [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id
 uk-mta-15-CNTGU8t8TOq8PnKYGolwLQ-1; Tue, 03 Nov 2015 15:43:25 +0000
Received: from [10.2.206.200] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with
 Microsoft SMTPSVC(6.0.3790.3959); Tue, 3 Nov 2015 15:43:24 +0000
Message-ID: <5638D61C.5060100@arm.com>
Date: Tue, 03 Nov 2015 15:43:24 +0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <marcus.shawcroft@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>,
 James Greenhalgh <james.greenhalgh@arm.com>
Subject: [PATCH][AArch64][v2] Improve comparison with complex immediates
 followed by branch/cset
X-MC-Unique: CNTGU8t8TOq8PnKYGolwLQ-1
X-IsSubscribed: yes

Hi all,

This patch slightly improves sequences where we want to compare against a complex immediate and branch against the result
or perform a cset on it.
This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch.
Similar for cset. Unfortunately I can't just do this by simply matching a (compare (reg) (const_int)) rtx because
this transformation is only valid for equal/not equal comparisons, not greater than/less than ones but the compare instruction
pattern only has the general CC mode. We need to also match the use of the condition code.

I've done this by creating a splitter for the conditional jump where the condition is the comparison between the register
and the complex immediate and splitting it into the sub+subs+condjump sequence. Similar for the cstore pattern.
Thankfully we don't split immediate moves until later in the optimization pipeline so combine can still try the right patterns.
With this patch for the example code:
void g(void);
void f8(int x)
{
    if (x != 0x123456) g();
}

I get:
f8:
         sub     w0, w0, #1191936
         subs    w0, w0, #1110
         beq     .L1
         b       g
         .p2align 3
.L1:
         ret

instead of the previous:
f8:
         mov     w1, 13398
         movk    w1, 0x12, lsl 16
         cmp     w0, w1
         beq     .L1
         b       g
         .p2align 3
.L1:
         ret


The condjump case triggered 130 times across all of SPEC2006 which is, admittedly, not much
whereas the cstore case didn't trigger at all. However, the included testcase in the patch
demonstrates the kind of code that it would trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill


2015-11-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (*condjump): Rename to...
     (condjump): ... This.
     (*compare_condjump<mode>): New define_insn_and_split.
     (*compare_cstore<mode>_insn): Likewise.
     (*cstore<mode>_insn): Rename to...
     (aarch64_cstore<mode>): ... This.
     * config/aarch64/iterators.md (CMP): Handle ne code.
     * config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-11-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/cmpimm_branch_1.c: New test.
     * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

commit 7df013a391532f39932b80c902e3b4bbd841710f
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 126c9c2..1bfc870 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
   }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
 			   (label_ref (match_operand 2 "" ""))
@@ -394,6 +394,40 @@ (define_insn "*condjump"
 		      (const_int 1)))]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov	x0, #imm1
+;; movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp	x1, x0
+;; b<ne,eq> .Label
+;; into the shorter:
+;; sub	x0, #(CST & 0xfff000)
+;; subs	x0, #(CST & 0x000fff)
+;; b<ne,eq> .Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			      (match_operand:GPI 0 "register_operand" "r")
+			      (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:P (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "aarch64_cstore<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 	(match_operator:ALLI 1 "aarch64_comparison_operator"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov	x0, #imm1
+;; movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp	x1, x0
+;; cset	x2, <ne,eq>
+;; into the shorter:
+;; sub	x0, #(CST & 0xfff000)
+;; subs	x0, #(CST & 0x000fff)
+;; cset x1, <ne, eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	 (EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		  (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<EQL:CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c4a1c98..9f63ef2 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -801,7 +801,7 @@ (define_code_attr cmp_2   [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
 			   (ltu "1") (leu "1") (geu "2") (gtu "2")])
 
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-			   (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+			   (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
 
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
 				   (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 046f852..1bcbf62 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..d7a8d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..619c026
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long
+fool (long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */