From patchwork Thu May 24 16:51:13 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 8947
Message-ID: <4FBE6701.7010506@codesourcery.com>
Date: Thu, 24 May 2012 17:51:13 +0100
From: Andrew Stubbs
CC: "gcc-patches@gcc.gnu.org", "patches@linaro.org"
Subject: Re: [PATCH][ARM] 64-bit shifts in NEON.
References: <4F2FD216.6090507@codesourcery.com> <4F43B707.7070908@codesourcery.com> <4F46A336.5000503@codesourcery.com>
In-Reply-To: <4F46A336.5000503@codesourcery.com>

On 23/02/12 20:36, Andrew Stubbs wrote:
> On 21/02/12 15:23, Andrew Stubbs wrote:
>> On 06/02/12 13:13, Andrew Stubbs wrote:
>>> This patch adds DImode shift support in NEON registers/instructions.
>>>
>>> The patch delays any lowering until the split2 pass, after the
>>> register allocator has chosen whether to do the shift in NEON (VFP)
>>> registers, or in core-registers.
>>>
>>> The core-registers case depends on the patch I previously posted here:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01472.html
>>>
>>> The NEON right-shifts make life more interesting by using a left-shift
>>> instruction with a negative offset. This means that the amount has to be
>>> negated. Ideally you'd want to do this at expand time, but the delayed
>>> NEON/core decision makes this impossible, so I've chosen to expand this
>>> in the post-reload split pass. Unfortunately, NEON does not provide a
>>> suitable instruction for negating the shift amount, so that ends up
>>> happening in core-registers.
>>>
>>> Another complication is that the NEON shift instructions use a 64-bit
>>> register for the shift amount, but they only pay attention to the bottom
>>> 8 bits. I did experiment with using a DImode shift amount, but that
>>> didn't work out well; there were unnecessary extends and the
>>> core-registers fall back was less efficient.
>>>
>>> Therefore, I've chosen to create a new register class, VFP_LO_REGS_EVEN,
>>> which includes only the 32-bit low-part of the DImode NEON registers so
>>> the shift amount can be loaded into VFP regs without extending them.
>>> This required a new print format 'E' that converts the low-part name to
>>> the full register name the instructions need. Unfortunately, this does
>>> artificially limit the shift amount to the bottom half of the register
>>> set, but hopefully that's not going to be a big problem.
>>>
>>> The register allocator is causing me trouble though. The problem is that
>>> the compiler just refused to use the NEON variant in all of my toy
>>> examples. It turns out to be simply that the IRA & reload passes do not
>>> change hard-registers already present in the RTL (function parameters,
>>> return values, etc.) unless there is absolutely no alternative that
>>> works with that register. I'm not sure if there's anything that can be
>>> done about this, or not. I'm not even sure if it isn't the right choice
>>> much of the time, cost wise.
>>
>> I've now updated the patch to take into account size optimization.
>>
>> Currently, if optimizing for size the compiler prefers to call the
>> libgcc function, rather than do the shift inline.
>>
>> With my old patch, when NEON is enabled it always used the inline code
>> (either in NEON or core-registers) no matter which optimization flags
>> were set. This is more-or-less correct if the register allocator chooses
>> to do the operation in NEON, but much less space efficient otherwise.
>>
>> The update simply disables the core-registers fall-back option when
>> optimizing for size. Transferring the values to NEON registers and back
>> should be roughly the same size as calling a function, so there
>> shouldn't be a big loss.
>>
>> I'm in two minds about the shift-by-constant cases though, since they
>> expand to fewer instructions. Any thoughts?
>
> And yet another update.
>
> This time I noticed that I didn't discard the "clobber"s after the split
> has determined they're not necessary any more. Presumably the
> unallocated "match_scratch"es were harmless, but the unnecessary CC
> clobbers could affect if-conversion and scheduling.
>
> This patch is the same as the previous, except that I've broken out the
> alternatives that don't need any clobbers.
>
> Ok for 4.8?

Ping!

The pre-requisite patch is now committed so this one is ready for review.

Here's a rebased version of the same patch. The only real difference is
that one of my constraint names is no longer available, so now they're
all renamed.

OK?

Andrew

2012-05-18  Andrew Stubbs

	gcc/
	* config/arm/arm.c (arm_print_operand): Add new 'E' format code.
	* config/arm/arm.h (enum reg_class): Add VFP_LO_REGS_EVEN.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS, IS_VFP_CLASS): Likewise.
	* config/arm/arm.md (opt, opt_enabled): New attributes.
	(enabled): Use opt_enabled.
	(ashldi3, ashrdi3, lshrdi3): Add TARGET_NEON case.
	* config/arm/constraints.md (T): New register constraint.
	(Pf, PF, P1, Pg, Ph): New constraints.
	* config/arm/neon.md (signed_shift_di3_neon, unsigned_shift_di3_neon,
	ashldi3_neon, ashldi3_neon_noclobber, ashrdi3_neon_imm,
	ashrdi3_neon_reg, ashrdi3_neon, ashrdi3_neon_imm_noclobber,
	lshrdi3_neon_imm_noclobber, lshrdi3_neon_imm, lshrdi3_neon_reg,
	lshrdi3_neon): New patterns.
	* config/arm/predicates.md (int_1_to_64): New predicate.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3ad4c75..f581d15 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17973,6 +17973,24 @@ arm_print_operand (FILE *stream, rtx x, int code)
         }
       return;
 
+    /* Print the VFP/Neon double precision register name that overlaps the
+       given single-precision register.  */
+    case 'E':
+      {
+        int mode = GET_MODE (x);
+
+        if (GET_MODE_SIZE (mode) != 4
+            || GET_CODE (x) != REG
+            || !IS_VFP_REGNUM (REGNO (x)))
+          {
+            output_operand_lossage ("invalid operand for code '%c'", code);
+            return;
+          }
+
+        fprintf (stream, "d%d", (REGNO (x) - FIRST_VFP_REGNUM) >> 1);
+      }
+      return;
+
     /* These two codes print the low/high doubleword register of a Neon quad
        register, respectively.  For pair-structure types, can also print
        low/high quadword registers.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f4204e4..ebd77c6 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1043,6 +1043,7 @@ enum reg_class
   CIRRUS_REGS,
   VFP_D0_D7_REGS,
   VFP_LO_REGS,
+  VFP_LO_REGS_EVEN,
   VFP_HI_REGS,
   VFP_REGS,
   IWMMXT_GR_REGS,
@@ -1069,6 +1070,7 @@ enum reg_class
   "CIRRUS_REGS", \
   "VFP_D0_D7_REGS", \
   "VFP_LO_REGS", \
+  "VFP_LO_REGS_EVEN", \
   "VFP_HI_REGS", \
   "VFP_REGS", \
   "IWMMXT_GR_REGS", \
@@ -1094,6 +1096,7 @@ enum reg_class
   { 0xF8000000, 0x000007FF, 0x00000000, 0x00000000 }, /* CIRRUS_REGS */ \
   { 0x00000000, 0x80000000, 0x00007FFF, 0x00000000 }, /* VFP_D0_D7_REGS */ \
   { 0x00000000, 0x80000000, 0x7FFFFFFF, 0x00000000 }, /* VFP_LO_REGS */ \
+  { 0x00000000, 0x80000000, 0x2AAAAAAA, 0x00000000 }, /* VFP_LO_REGS_EVEN */ \
   { 0x00000000, 0x00000000, 0x80000000, 0x7FFFFFFF }, /* VFP_HI_REGS */ \
   { 0x00000000, 0x80000000, 0xFFFFFFFF, 0x7FFFFFFF }, /* VFP_REGS */ \
   { 0x00000000, 0x00007800, 0x00000000, 0x00000000 }, /* IWMMXT_GR_REGS */ \
@@ -1111,7 +1114,7 @@ enum reg_class
 
 /* Any of the VFP register classes.  */
 #define IS_VFP_CLASS(X) \
-  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS \
+  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS || (X) == VFP_LO_REGS_EVEN \
    || (X) == VFP_HI_REGS || (X) == VFP_REGS)
 
 /* The same information, inverted:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index bc97a4a..d68dc0c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -251,6 +251,22 @@
          (const_string "yes")]
         (const_string "no")))
 
+(define_attr "opt" "any,speed,size"
+  (const_string "any"))
+
+(define_attr "opt_enabled" "no,yes"
+  (cond [(eq_attr "opt" "any")
+         (const_string "yes")
+
+         (and (eq_attr "opt" "speed")
+              (match_test "optimize_function_for_speed_p (cfun)"))
+         (const_string "yes")
+
+         (and (eq_attr "opt" "size")
+              (match_test "optimize_function_for_size_p (cfun)"))
+         (const_string "yes")]
+        (const_string "no")))
+
 ; Allows an insn to disable certain alternatives for reasons other than
 ; arch support.
 (define_attr "insn_enabled" "no,yes"
@@ -258,11 +274,15 @@
 
 ; Enable all alternatives that are both arch_enabled and insn_enabled.
 (define_attr "enabled" "no,yes"
-  (if_then_else (eq_attr "insn_enabled" "yes")
-    (if_then_else (eq_attr "arch_enabled" "yes")
-      (const_string "yes")
-      (const_string "no"))
-    (const_string "no")))
+  (cond [(eq_attr "insn_enabled" "no")
+         (const_string "no")
+
+         (eq_attr "arch_enabled" "no")
+         (const_string "no")
+
+         (eq_attr "opt_enabled" "no")
+         (const_string "no")]
+        (const_string "yes")))
 
 ; POOL_RANGE is how far away from a constant pool entry that this insn
 ; can be placed. If the distance is zero, then this insn will never
@@ -3520,8 +3540,15 @@
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3595,8 +3622,15 @@
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3668,8 +3702,15 @@
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_lshrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 0b80e1f..95507fc 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -19,7 +19,7 @@
 ;; .
 
 ;; The following register constraints have been used:
-;; - in ARM/Thumb-2 state: f, t, v, w, x, y, z
+;; - in ARM/Thumb-2 state: f, t, T, v, w, x, y, z
 ;; - in Thumb state: h, b
 ;; - in both states: l, c, k
 ;; In ARM state, 'l' is an alias for 'r'
@@ -29,7 +29,8 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz, Pf, PF,
+;; Pg, Ph, P1
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -45,6 +46,9 @@
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
+(define_register_constraint "T" "TARGET_32BIT ? VFP_LO_REGS_EVEN : NO_REGS"
+ "The even numbered VFP registers @code{s0}-@code{s31}.")
+
 (define_register_constraint "v" "TARGET_ARM ? CIRRUS_REGS : NO_REGS"
  "The Cirrus Maverick co-processor registers.")
 
@@ -177,6 +181,32 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
+(define_constraint "Pf"
+  "@internal In ARM/Thumb-2 state, a constant in the range 0 to 63"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival >= 0 && ival < 64")))
+
+(define_constraint "PF"
+  "@internal In ARM/Thumb-2 state, a constant in the range 1 to 64"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival > 0 && ival <= 64")))
+
+(define_constraint "P1"
+  "@internal In ARM/Thumb2 state, a constant of 1"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival == 1")))
+
+(define_constraint "Pg"
+  "@internal In ARM state, a constant in the range 0 to 63, and in thumb-2 state, 32 to 63"
+  (and (match_code "const_int")
+       (match_test "(TARGET_ARM && ival >= 0 && ival < 64)
+                    || (TARGET_THUMB2 && ival >= 32 && ival < 64)")))
+
+(define_constraint "Ph"
+  "@internal In Thumb-2 state, a constant in the range 0 to 31"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 31")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 4568dea..a671c1b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1133,6 +1133,266 @@
   DONE;
 })
 
+;; 64-bit shifts
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "signed_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand" "=w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+                    (match_operand:SI 2 "s_register_operand" " T")]
+                   UNSPEC_ASHIFT_SIGNED))]
+  "TARGET_NEON && reload_completed"
+  "vshl.s64\t%P0, %P1, %E2 @ ashr %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "unsigned_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand" "=w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+                    (match_operand:SI 2 "s_register_operand" " T")]
+                   UNSPEC_ASHIFT_UNSIGNED))]
+  "TARGET_NEON && reload_completed"
+  "vshl.u64\t%P0, %P1, %E2 @ lshr %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn "ashldi3_neon_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w")
+        (ashift:DI (match_operand:DI 1 "s_register_operand" " w,w")
+                   (match_operand:SI 2 "reg_or_int_operand" "Pf,T")))]
+  "TARGET_NEON && reload_completed"
+  "@
+   vshl.u64\t%P0, %P1, %2
+   vshl.u64\t%P0, %P1, %E2 @ ashl %P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashldi3_neon"
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?r,?r,?r,?w,?w")
+        (ashift:DI (match_operand:DI 1 "s_register_operand" " w,w, 0, r, r, r, r, w, w")
+                   (match_operand:SI 2 "reg_or_int_operand" " T,i, r, r,P1,Pf,Pg, T, i")))
+   (clobber (match_scratch:SI 3 "=X,X, &r, &r, X, X,&r, X, X"))
+   (clobber (match_scratch:SI 4 "=X,X, &r, &r, X, X, X, X, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        if (CONST_INT_P (operands[2]))
+          {
+            if (INTVAL (operands[2]) < 1)
+              {
+                emit_insn (gen_movdi (operands[0], operands[1]));
+                DONE;
+              }
+            else if (INTVAL (operands[2]) > 63)
+              operands[2] = gen_rtx_CONST_INT (VOIDmode, 63);
+          }
+
+        /* Ditch the unnecessary clobbers.  */
+        emit_insn (gen_ashldi3_neon_noclobber (operands[0], operands[1],
+                                               operands[2]));
+      }
+    else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "*,*,24,24,8,12,12,*,*")
+   (set_attr "arch" "nota8,nota8,*,*,*,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,speed,speed,speed,*,*")]
+)
+
+(define_insn "ashrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand" "=w")
+        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+                     (match_operand:SI 2 "int_1_to_64" "PF")))]
+  "TARGET_NEON && reload_completed"
+  "vshr.s64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?r,?r,?r,?w")
+        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+                     (match_operand:SI 2 "int_1_to_64" "PF,P1,Pg,Ph,PF")))
+   (clobber (match_scratch:SI 3 "=X, X, X,&r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_ashrdi3_neon_imm_noclobber (operands[0], operands[1],
+                                                 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "*,8,12,12,*")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr "opt" "*,speed,speed,speed,*")]
+)
+
+(define_insn_and_split "ashrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?&r,?w,?w")
+        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w, w, 0, r, w, w")
+                     (match_operand:SI 2 "s_register_operand" " r, r, r, r, r, r")))
+   (clobber (match_scratch:SI 3 "= 2,&r, &r, &r, 2,&r"))
+   (clobber (match_scratch:SI 4 "=&T,&T, &r, &r,&T,&T"))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+        emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+        emit_insn (gen_signed_shift_di3_neon (operands[0], operands[1],
+                                              operands[4]));
+      }
+    else
+      /* This clobbers CC (ASHIFTRT only).  */
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,*,*")]
+)
+
+(define_expand "ashrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "reg_or_int_operand" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      if (INTVAL (operands[2]) < 1)
+        {
+          emit_insn (gen_movdi (operands[0], operands[1]));
+          DONE;
+        }
+      else if (INTVAL (operands[2]) > 64)
+        operands[2] = gen_rtx_CONST_INT (VOIDmode, 64);
+
+      emit_insn (gen_ashrdi3_neon_imm (operands[0], operands[1], operands[2]));
+    }
+  else
+    emit_insn (gen_ashrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "lshrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand" "=w")
+        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+                     (match_operand:SI 2 "int_1_to_64" "PF")))]
+  "TARGET_NEON && reload_completed"
+  "vshr.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "lshrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?r,?r,?r,?w")
+        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+                     (match_operand:SI 2 "int_1_to_64" "PF,P1,Pg,Ph,PF")))
+   (clobber (match_scratch:SI 3 "=X, X, X,&r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_lshrdi3_neon_imm_noclobber (operands[0], operands[1],
+                                                 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "4,8,12,12,4")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr "opt" "*,speed,speed,speed,*")]
+)
+
+(define_insn_and_split "lshrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?&r,?w,?w")
+        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w, w, 0, r, w, w")
+                     (match_operand:SI 2 "s_register_operand" " r, r, r, r, r, r")))
+   (clobber (match_scratch:SI 3 "= 2,&r, &r, &r, 2,&r"))
+   (clobber (match_scratch:SI 4 "=&T,&T, &r, &r,&T,&T"))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+        emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+        emit_insn (gen_unsigned_shift_di3_neon (operands[0], operands[1],
+                                                operands[4]));
+      }
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr "opt" "*,*,speed,speed,*,*")]
+)
+
+(define_expand "lshrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "reg_or_int_operand" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      if (INTVAL (operands[2]) < 1)
+        {
+          emit_insn (gen_movdi (operands[0], operands[1]));
+          DONE;
+        }
+      else if (INTVAL (operands[2]) > 64)
+        operands[2] = gen_rtx_CONST_INT (VOIDmode, 64);
+
+      emit_insn (gen_lshrdi3_neon_imm (operands[0], operands[1], operands[2]));
+    }
+  else
+    emit_insn (gen_lshrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ;; Widening operations
 
 (define_insn "widen_ssum3"
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index fa2027c..62af222 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -644,6 +644,11 @@
 (define_special_predicate "add_operator"
   (match_code "plus"))
 
+
 (define_predicate "mem_noofs_operand"
   (and (match_code "mem")
        (match_code "reg" "0")))
+
+(define_predicate "int_1_to_64"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 1, 64)")))
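
For reference, the right-shift handling in the *_neon_reg patterns relies on the register form of the NEON shift reading only the bottom 8 bits of the shift operand, as a signed value, and shifting right when that value is negative. The following is a minimal C sketch of that behaviour and is not part of the patch. The names bottom_byte_signed, model_vshl_u64, model_vshl_s64 and check_right_shift_via_negation are hypothetical, invented for illustration. The model also assumes that shifting by 64 or more in either direction yields zero (or all sign bits for a signed right shift).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret the bottom 8 bits of the shift operand
   as a signed value; the rest of the register is ignored.  */
int bottom_byte_signed(uint64_t amount_reg)
{
    int shift = (int)(amount_reg & 0xff);
    return shift >= 128 ? shift - 256 : shift;
}

/* Hypothetical model of an unsigned 64-bit register shift: a non-negative
   amount shifts left, a negative amount shifts right.  */
uint64_t model_vshl_u64(uint64_t value, uint64_t amount_reg)
{
    int shift = bottom_byte_signed(amount_reg);
    if (shift >= 64 || shift <= -64)
        return 0;  /* assumption: everything is shifted out */
    return shift >= 0 ? value << shift : value >> -shift;
}

/* Hypothetical model of the signed variant: right shifts are arithmetic.  */
int64_t model_vshl_s64(int64_t value, uint64_t amount_reg)
{
    int shift = bottom_byte_signed(amount_reg);
    if (shift >= 64)
        return 0;
    if (shift >= 0)
        return (int64_t)((uint64_t)value << shift);
    if (shift <= -64)
        return value < 0 ? -1 : 0;
    /* Portable arithmetic right shift: only non-negative values are shifted.  */
    return value < 0 ? ~(~value >> -shift) : value >> -shift;
}

/* A logical right shift by n equals the modelled left shift by the negated
   32-bit amount, which is what the split computes in a core register
   before moving it into the low half of a D register.  */
void check_right_shift_via_negation(uint64_t x, uint32_t n)
{
    uint32_t negated = 0u - n;
    assert(n >= 1 && n <= 63);
    assert(model_vshl_u64(x, negated) == x >> n);
}
```

With n = 4 the negated amount is 0xfffffffc, whose bottom byte 0xfc reads as -4, so the check holds for any x. This is also why a 32-bit shift amount needs no extension to 64 bits in this model.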