From patchwork Thu Dec 10 15:09:50 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
X-Patchwork-Id: 58224
Delivered-To: patch@linaro.org
Received: by 10.112.147.194 with SMTP id tm2csp552283lbb;
 Thu, 10 Dec 2015 07:10:16 -0800 (PST)
X-Received: by 10.98.32.218 with SMTP id m87mr5305905pfj.60.1449760216614;
 Thu, 10 Dec 2015 07:10:16 -0800 (PST)
Return-Path: <gcc-patches-return-416968-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 pb7si20836960pac.48.2015.12.10.07.10.16 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 10 Dec 2015 07:10:16 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-416968-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
 gcc-patches-return-416968-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-416968-patch=linaro.org@gcc.gnu.org;
 dkim=pass header.i=@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:subject:references
 :in-reply-to:content-type:content-transfer-encoding; q=dns; s=
 default; b=YSWQAXDNWVvFNRHJIFXHKf0DGsu8myQt72ROrW67duBtQeImRM/zA
 qJmoEF+FB+TE3aIsu4fwnxQi3rUQq9fFiWGhW2Ha2sdMtVt6YW8VuZSK0dya4aWI
 D/liVx1drfYpBE1592DrwG8JP42BswKQ/azLrM0EOrFipymDWdoGfU=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :message-id:date:from:mime-version:to:subject:references
 :in-reply-to:content-type:content-transfer-encoding; s=default;
 bh=WcBRktZiE1c5M9Uhf6YA5hOBD9g=; b=H2mPkwr1gvleqRL9AuhKKUulkGQ4
 16tbedwxB1EZ40225idvU28cbvCaYcgcsDoS+aWZIe/cy7IeeBAYE5rVCg6SWDWR
 cJtrBtTeHgVmJXmS7L1M94FGN5qQPDiDJ1nBtq3QDbhKjZd+9H3cmrj5rk8CqM0J
 CnioYNJVrIFH14Q=
Received: (qmail 98122 invoked by alias); 10 Dec 2015 15:09:57 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 98112 invoked by uid 89); 10 Dec 2015 15:09:57 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
 eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by
 sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Thu, 10 Dec 2015 15:09:55 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
 [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id
 uk-mta-19-ul59cghNRf2TLWv0warLmg-1; Thu, 10 Dec 2015 15:09:51 +0000
Received: from [10.2.206.200] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with
 Microsoft SMTPSVC(6.0.3790.3959); Thu, 10 Dec 2015 15:09:50 +0000
Message-ID: <566995BE.8040206@arm.com>
Date: Thu, 10 Dec 2015 15:09:50 +0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Michael Collison <michael.collison@linaro.org>,
 gcc Patches <gcc-patches@gcc.gnu.org>,
 Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
References: <565BA3CC.3050800@linaro.org>
In-Reply-To: <565BA3CC.3050800@linaro.org>
X-MC-Unique: ul59cghNRf2TLWv0warLmg-1
X-IsSubscribed: yes

Hi Michael,

A few comments while I look deeper into this patch...

On 30/11/15 01:18, Michael Collison wrote:
>
> This is a modified version of my previous patch that supports vector wide add. I added support for vaddw on big endian when generating the parallel operand for the vector select.
>
> There are four failing test cases on arm big endian with similar code. They are:
>
> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test
>
>
> The failures also occur without my patch and are related to a bug in vector loads that use VUZP operations.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532
>
> Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>
> 2015-11-29  Michael Collison  <michael.collison@linaro.org>
>
>     * config/arm/neon.md (widen_<us>sum<mode>3): New patterns where
>     mode is VQI, to improve mixed-mode vectorization.
>     * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
>     define_insn to match low half of signed vaddw.
>     * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
>     define_insn to match high half of signed vaddw.
>     * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
>     define_insn to match low half of unsigned vaddw.
>     * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
>     define_insn to match high half of unsigned vaddw.
>     * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
>     (aarch32_simd_check_vect_par_cnst_half): Likewise.
>     * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
>     for new function.
>     (aarch32_simd_check_vect_par_cnst_half): Likewise.
>     * config/arm/predicates.md (vect_par_constant_high): Support
>     big endian and simplify by calling
>     aarch32_simd_check_vect_par_cnst_half.
>     (vect_par_constant_low): Likewise.
>     * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
>     * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
>     * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
>     * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
>     * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
>     * testsuite/lib/target-supports.exp
>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
>     that ARM NEON supports vector widen sum of HImode to SImode.
>
> Okay for trunk?
>
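
For context, a reduction of the following shape, where narrow elements are accumulated into a wider sum, is the kind of mixed-mode add that the new widen_ssum/widen_usum patterns are meant to let the vectorizer handle with vaddw. The function below is a hypothetical illustration of the HImode-to-SImode case, not one of the new tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: each 16-bit element is sign-extended and added
   to a 32-bit accumulator, i.e. a widening sum from HImode elements
   into an SImode accumulator.  */
int32_t
sum_s16 (const int16_t *a, size_t n)
{
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];   /* Implicit sign extension to int32_t.  */
  return sum;
}
```

Compiling a loop like this at -O3 with NEON enabled and looking for vaddw.s16 in the output is one way to check whether the patterns are used.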

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
  			      ATTRIBUTE_UNUSED);
  extern void arm_init_builtins (void);
  extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+						   bool high);


Please use 'arm' instead of 'aarch32' in the names, to be consistent with the rest of the
backend. Also, functions that return a bool without side effects should preferably have
names ending in '_p'. So for the second one I'd drop the 'check' and call it something
like "arm_vector_of_lane_nums_p". Is that a more descriptive name?

+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+

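aarch64? This refers to aarch64_simd_vect_par_cnst_half, which is an AArch64 backend
function. Did you mean the arm one?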
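
The comment above describes a check that a PARALLEL lists the lane numbers of one half of a vector. Below is a minimal standalone model of that check and of the generator it compares against, assuming the arm versions follow the same logic as aarch64_simd_vect_par_cnst_half and its check function. It uses plain int arrays in place of rtx PARALLELs and takes the endianness as a parameter instead of reading BYTES_BIG_ENDIAN, so the signatures are hypothetical; only the names follow the renaming suggested above.

```c
#include <stdbool.h>

/* Hypothetical model: write the NUNITS/2 lane numbers that select the
   low (HIGH == false) or high (HIGH == true) half of an NUNITS-lane
   vector into LANES.  As in the AArch64 version, the two bases are
   swapped on big-endian.  */
static void
arm_simd_vect_par_cnst_half (int nunits, bool high, bool big_endian,
                             int *lanes)
{
  int half = nunits / 2;
  int base;

  if (big_endian)
    base = high ? 0 : half;
  else
    base = high ? half : 0;

  for (int i = 0; i < half; i++)
    lanes[i] = base + i;
}

/* Hypothetical model of the suggested predicate: true iff the COUNT lane
   numbers in OP are exactly those the generator above would produce.
   It has no side effects, hence the _p suffix.  */
static bool
arm_vector_of_lane_nums_p (const int *op, int count, int nunits,
                           bool high, bool big_endian)
{
  int ideal[64];   /* Enough for the vector sizes modeled here.  */

  if (count != nunits / 2 || count > 64)
    return false;

  arm_simd_vect_par_cnst_half (nunits, high, big_endian, ideal);
  for (int i = 0; i < count; i++)
    if (op[i] != ideal[i])
      return false;
  return true;
}
```

Expressing the check as a comparison against the generated lane list keeps the two in sync: any change to the endianness handling applies to both.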

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@
  
  ;; Widening operations
  
+(define_expand "widen_ssum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1  = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2  = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }

Please format these lines properly to keep them within 80 columns.
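
For reference, a scalar sketch of what the expansion above computes for V16QI, assuming little-endian lane order: the V8HI accumulator (operands 0 and 2) first adds the sign-extended low half of operand 1, then the high half, one vaddw.s8 each. The function name is hypothetical. Lane i ends up with x[i] + x[i + 8] rather than a strictly lane-wise widening, but the total across lanes equals the accumulator plus the sum of all 16 inputs, which is what a reduction needs.

```c
#include <stdint.h>

/* Hypothetical scalar model of widen_ssum<mode>3 for V16QI -> V8HI.
   The int16_t casts wrap modulo 2^16 like the NEON add (the conversion
   is implementation-defined in ISO C; GCC wraps).  */
static void
widen_ssum_v16qi_model (int16_t acc[8], const int8_t x[16])
{
  /* vec_sel_widen_ssum_lo: add sign-extended lanes 0..7 of X.  */
  for (int i = 0; i < 8; i++)
    acc[i] = (int16_t) (acc[i] + x[i]);

  /* vec_sel_widen_ssum_hi: add sign-extended lanes 8..15 of X.  */
  for (int i = 0; i < 8; i++)
    acc[i] = (int16_t) (acc[i] + x[i + 8]);
}
```
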
Thanks,
Kyrill