From: Kyrill Tkachov
Date: Thu, 10 Dec 2015 15:09:50 +0000
To: Michael Collison, gcc Patches, Ramana Radhakrishnan
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
Message-ID: <566995BE.8040206@arm.com>
In-Reply-To: <565BA3CC.3050800@linaro.org>

Hi Michael,

A few comments while I look deeper into this patch...

On 30/11/15 01:18, Michael Collison wrote:
>
> This is a modified version of my previous patch that supports vector wide add. I added support for vaddw on big endian when generating the parallel operand for the vector select.
>
> There are four failing test cases on arm big endian with similar code. They are:
>
> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test
>
>
> The failures occur without my patch and are related to a bug with vector loads using VUZP operations.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532
>
> Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>
> 2015-11-29 Michael Collison
>
> * config/arm/neon.md (widen_sum): New patterns where
> mode is VQI to improve mixed mode vectorization.
> * config/arm/neon.md (vec_sel_widen_ssum_lo3): New
> define_insn to match low half of signed vaddw.
> * config/arm/neon.md (vec_sel_widen_ssum_hi3): New
> define_insn to match high half of signed vaddw.
> * config/arm/neon.md (vec_sel_widen_usum_lo3): New
> define_insn to match low half of unsigned vaddw.
> * config/arm/neon.md (vec_sel_widen_usum_hi3): New
> define_insn to match high half of unsigned vaddw.
> * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
> (aarch32_simd_check_vect_par_cnst_half): Likewise.
> * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
> for new function.
> (aarch32_simd_check_vect_par_cnst_half): Likewise.
> * config/arm/predicates.md (vect_par_constant_high): Support
> big endian and simplify by calling
> aarch32_simd_check_vect_par_cnst_half
> (vect_par_constant_low): Likewise.
> * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
> * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
> * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
> * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
> * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
> * testsuite/lib/target-supports.exp
> (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
> that arm neon support vector widen sum of HImode TO SImode.
>
> Okay for trunk?
>

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code,
                               bool initialize_p ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+                                                   bool high);

Please use arm instead of aarch32 in the name to be consistent with the rest of the backend.
Also, for functions that return a bool without side-effects it's preferable to finish their
name with '_p'. So for the second one I'd drop the 'check' and call it something like
"arm_vector_of_lane_nums_p". Would that be a more descriptive name?

+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+

aarch64?
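
For reference, here is a minimal standalone sketch of the kind of lane-number check the comment above describes. It is not the GCC code: it uses plain int arrays instead of a PARALLEL rtx, the names lane_nums_for_half, vector_of_lane_nums_p and bytes_big_endian are hypothetical, and it assumes the same convention as the aarch64 backend, where on big endian the architectural high half corresponds to the lower-numbered GCC lanes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's BYTES_BIG_ENDIAN target macro.  */
static bool bytes_big_endian = false;

/* Fill LANES with the NUNITS / 2 lane numbers that select the
   architectural high (HIGH == true) or low half of a vector
   with NUNITS lanes.  */
static void
lane_nums_for_half (int nunits, bool high, int *lanes)
{
  assert (nunits > 0 && nunits % 2 == 0);
  int half = nunits / 2;
  /* Assumption: on big endian the architectural high half holds the
     lower-numbered lanes in GCC's numbering, so the base is swapped.  */
  int base = (high != bytes_big_endian) ? half : 0;
  for (int i = 0; i < half; i++)
    lanes[i] = base + i;
}

/* Return true if LANES (COUNT entries) numbers exactly the requested
   half of a vector with NUNITS lanes.  No side effects, hence '_p'.  */
static bool
vector_of_lane_nums_p (const int *lanes, int count, int nunits, bool high)
{
  if (nunits <= 0 || nunits % 2 != 0 || count != nunits / 2)
    return false;
  int base = (high != bytes_big_endian) ? nunits / 2 : 0;
  for (int i = 0; i < count; i++)
    if (lanes[i] != base + i)
      return false;
  return true;
}
```

The point of keeping the builder and the predicate symmetric is that a predicate in predicates.md can then accept exactly the vectors the expander generates, on either endianness.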

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@

 ;; Widening operations

+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+        (plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+               (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1 = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2 = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }

Please format these properly to avoid long lines.

Thanks,
Kyrill