From patchwork Mon Nov 30 01:18:04 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison
X-Patchwork-Id: 57397
Message-ID: <565BA3CC.3050800@linaro.org>
Date: Sun, 29 Nov 2015 18:18:04 -0700
From: Michael Collison
To: gcc Patches, Ramana Radhakrishnan, Kyrill Tkachov
Subject: Re: [ARM] Use vector wide add for mixed-mode adds

This is a modified version of my previous patch that supports vector wide
add. I added support for vaddw on big endian when generating the parallel
operand for the vector select.

There are four failing test cases on arm big endian with similar code.
They are:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

The failures occur without my patch and are related to a bug with vector
loads using VUZP operations.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-11-29  Michael Collison

        * config/arm/neon.md (widen_sum): New patterns where mode is VQI to
        improve mixed mode vectorization.
        * config/arm/neon.md (vec_sel_widen_ssum_lo3): New define_insn to
        match low half of signed vaddw.
        * config/arm/neon.md (vec_sel_widen_ssum_hi3): New define_insn to
        match high half of signed vaddw.
        * config/arm/neon.md (vec_sel_widen_usum_lo3): New define_insn to
        match low half of unsigned vaddw.
        * config/arm/neon.md (vec_sel_widen_usum_hi3): New define_insn to
        match high half of unsigned vaddw.
        * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
        (aarch32_simd_check_vect_par_cnst_half): Likewise.
        * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
        for new function.
        (aarch32_simd_check_vect_par_cnst_half): Likewise.
        * config/arm/predicates.md (vect_par_constant_high): Support
        big endian and simplify by calling
        aarch32_simd_check_vect_par_cnst_half.
        (vect_par_constant_low): Likewise.
        * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
        * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
        * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
        * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
        * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
        * testsuite/lib/target-supports.exp
        (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
        that arm neon supports vector widen sum of HImode to SImode.

Okay for trunk?

-- 
Michael Collison
Linaro Toolchain Working Group
michael.collison@linaro.org

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..26fe370 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code,
                               bool initialize_p ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+                                                   bool high);
 #ifdef RTX_CODE
 extern bool arm_vector_mode_supported_p (machine_mode);
 extern bool arm_small_register_classes_for_mode_p (machine_mode);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 61e2aa2..158c2e8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30111,4 +30111,80 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
   *pri = tmp;
   return;
 }
+
+/* Construct and return a PARALLEL RTX vector with elements numbering the
+   lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
+   the vector - from the perspective of the architecture.  This does not
+   line up with GCC's perspective on lane numbers, so we end up with
+   different masks depending on our target endian-ness.  The diagram
+   below may help.  We must draw the distinction when building masks
+   which select one half of the vector.  An instruction selecting
+   architectural low-lanes for a big-endian target, must be described using
+   a mask selecting GCC high-lanes.
+
+                 Big-Endian             Little-Endian
+
+GCC             0   1   2   3           3   2   1   0
+              | x | x | x | x |       | x | x | x | x |
+Architecture    3   2   1   0           3   2   1   0
+
+Low Mask:         { 2, 3 }                { 0, 1 }
+High Mask:        { 0, 1 }                { 2, 3 }
+*/
+
+rtx
+aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high)
+{
+  int nunits = GET_MODE_NUNITS (mode);
+  rtvec v = rtvec_alloc (nunits / 2);
+  int high_base = nunits / 2;
+  int low_base = 0;
+  int base;
+  rtx t1;
+  int i;
+
+  if (BYTES_BIG_ENDIAN)
+    base = high ? low_base : high_base;
+  else
+    base = high ? high_base : low_base;
+
+  for (i = 0; i < nunits / 2; i++)
+    RTVEC_ELT (v, i) = GEN_INT (base + i);
+
+  t1 = gen_rtx_PARALLEL (mode, v);
+  return t1;
+}
+
+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch32_simd_vect_par_cnst_half for more details.  */
+
+bool
+aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+                                       bool high)
+{
+  rtx ideal = aarch32_simd_vect_par_cnst_half (mode, high);
+  HOST_WIDE_INT count_op = XVECLEN (op, 0);
+  HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
+  int i = 0;
+
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  if (count_op != count_ideal)
+    return false;
+
+  for (i = 0; i < count_ideal; i++)
+    {
+      rtx elt_op = XVECEXP (op, 0, i);
+      rtx elt_ideal = XVECEXP (ideal, 0, i);
+
+      if (!CONST_INT_P (elt_op)
+          || INTVAL (elt_ideal) != INTVAL (elt_op))
+        return false;
+    }
+  return true;
+}
+
 #include "gt-arm.h"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e5a2b0f..8d311b7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+        (plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+               (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1 = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2 = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_low" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+{
+  return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %f1" :
+    "vaddw.\t%q0, %q3, %e1";
+}
+  [(set_attr "type" "neon_add_widen")])
+
+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_high" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+{
+  return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %e1" :
+    "vaddw.\t%q0, %q3, %f1";
+}
+  [(set_attr "type" "neon_add_widen")])
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (sign_extend:
@@ -1184,6 +1229,51 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+        (plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" ""))
+               (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1 = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2 = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_low" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+{
+  return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %f1" :
+    "vaddw.\t%q0, %q3, %e1";
+}
+  [(set_attr "type" "neon_add_widen")])
+
+(define_insn "vec_sel_widen_usum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_high" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+{
+  return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %e1" :
+    "vaddw.\t%q0, %q3, %f1";
+}
+  [(set_attr "type" "neon_add_widen")])
+
 (define_insn "widen_usum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (zero_extend:
@@ -5331,7 +5421,7 @@ if (BYTES_BIG_ENDIAN)
   [(set (match_operand: 0 "register_operand" "=w")
         (mult: (SE: (vec_select:
                            (match_operand:VU 1 "register_operand" "w")
-                           (match_operand:VU 2 "vect_par_constant_low" "")))
+                           (match_operand:VU 2 "vect_par_constant_low" "")))
                         (SE: (vec_select:
                            (match_operand:VU 3 "register_operand" "w")
                            (match_dup 2)))))]
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 48e4ba8..e1c24bd 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -605,59 +605,13 @@
 (define_special_predicate "vect_par_constant_high"
   (match_code "parallel")
 {
-  HOST_WIDE_INT count = XVECLEN (op, 0);
-  int i;
-  int base = GET_MODE_NUNITS (mode);
-
-  if ((count < 1)
-      || (count != base/2))
-    return false;
-
-  if (!VECTOR_MODE_P (mode))
-    return false;
-
-  for (i = 0; i < count; i++)
-    {
-      rtx elt = XVECEXP (op, 0, i);
-      int val;
-
-      if (!CONST_INT_P (elt))
-        return false;
-
-      val = INTVAL (elt);
-      if (val != (base/2) + i)
-        return false;
-    }
-  return true;
+  return aarch32_simd_check_vect_par_cnst_half (op, mode, true);
 })
 
 (define_special_predicate "vect_par_constant_low"
   (match_code "parallel")
 {
-  HOST_WIDE_INT count = XVECLEN (op, 0);
-  int i;
-  int base = GET_MODE_NUNITS (mode);
-
-  if ((count < 1)
-      || (count != base/2))
-    return false;
-
-  if (!VECTOR_MODE_P (mode))
-    return false;
-
-  for (i = 0; i < count; i++)
-    {
-      rtx elt = XVECEXP (op, 0, i);
-      int val;
-
-      if (!CONST_INT_P (elt))
-        return false;
-
-      val = INTVAL (elt);
-      if (val != i)
-        return false;
-    }
-  return true;
+  return aarch32_simd_check_vect_par_cnst_half (op, mode, false);
 })
 
 (define_predicate "const_double_vcvt_power_of_two_reciprocal"
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..96c657e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..1bfdc13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..4a72a39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..9c9c68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b543519..4deca1f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3943,6 +3943,7 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
     } else {
         set et_vect_widen_sum_hi_to_si_pattern_saved 0
         if { [istarget powerpc*-*-*]
+             || [check_effective_target_arm_neon_ok]
              || [istarget ia64-*-*] } {
             set et_vect_widen_sum_hi_to_si_pattern_saved 1
         }
-- 
1.9.1
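
For readers who want to check the lane numbering outside the compiler, the mask selection described in the arm.c comment can be modelled in a few lines of plain C. This is only a sketch and not part of the patch: lane_mask_half is a hypothetical stand-alone helper, the endianness is passed as a parameter instead of being read from BYTES_BIG_ENDIAN, and the mask is written into a caller-supplied int array where the patch builds a PARALLEL rtx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the lane-mask choice (not GCC code).
   Writes into OUT the nunits / 2 GCC lane numbers that select the
   architectural high (high == true) or low (high == false) half of a
   vector with NUNITS lanes, and returns how many lanes were written.  */
size_t
lane_mask_half (int nunits, bool high, bool big_endian, int *out)
{
  assert (nunits > 0 && nunits % 2 == 0);

  int half = nunits / 2;

  /* On little-endian the architectural high half starts at GCC lane
     nunits / 2; on big-endian the two halves swap, so the base is the
     upper half exactly when HIGH and BIG_ENDIAN differ.  */
  int base = (high != big_endian) ? half : 0;

  for (int i = 0; i < half; i++)
    out[i] = base + i;

  return (size_t) half;
}
```

For a four-lane vector this should give { 0, 1 } for the little-endian low half and { 2, 3 } for the big-endian low half, which agrees with the Low Mask row of the diagram.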