From patchwork Tue Sep 22 23:52:25 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison
X-Patchwork-Id: 54011
Message-ID: <5601E9B9.5060600@linaro.org>
Date: Tue, 22 Sep 2015 16:52:25 -0700
From: Michael Collison
To: GCC Patches, Ramana.Radhakrishnan@arm.com
Subject: Re: [ARM] Use vector wide add for mixed-mode adds

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

    int t6(int len, void * dummy, short * __restrict x)
    {
      len = len & ~31;
      int result = 0;
      __asm volatile ("");
      for (int i = 0; i < len; i++)
        result += x[i];
      return result;
    }

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison

	* config/arm/neon.md (widen_sum): New patterns where mode
	is VQI to improve mixed mode add vectorization.

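For readers unfamiliar with the widening add, the following is a minimal
scalar sketch of what the vectorized form of the loop above is expected to
compute for the signed 16-bit case. It is an illustration only and not part
of the patch: the function names widen_add_half and sum_widening are
hypothetical, the 8-lane input and 4-lane accumulator model a 128-bit
vector of shorts and a 128-bit vector of ints, and the length is assumed
to be rounded down to a multiple of 8.

```c
#include <stdint.h>

/* Hypothetical scalar model of one widening add: sign-extend four
   16-bit lanes, starting at lane `first`, and add them to four
   32-bit accumulator lanes. */
static void widen_add_half(int32_t acc[4], const int16_t x[8], int first)
{
    for (int j = 0; j < 4; j++)
        acc[j] += (int32_t) x[first + j];
}

/* Hypothetical model of the reduction: each group of eight shorts is
   folded into the accumulator with two widening adds, one for the low
   half of the input vector and one for the high half. */
static int32_t sum_widening(const int16_t *x, int len)
{
    int32_t acc[4] = { 0, 0, 0, 0 };

    len &= ~7;  /* assumption: only whole groups of 8 are processed */
    for (int i = 0; i < len; i += 8) {
        widen_add_half(acc, x + i, 0);  /* low half of the vector  */
        widen_add_half(acc, x + i, 4);  /* high half of the vector */
    }

    /* Final horizontal reduction of the four accumulator lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The two calls per group correspond to the low-half and high-half patterns
in the patch, which accumulate into the same destination register.
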
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..54623fe 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,57 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+	       (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+			     (match_operand:VQI 2 "vect_par_constant_low" "")))
+	       (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+			     (match_operand:VQI 2 "vect_par_constant_high" "")))
+	       (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
 	(plus: (sign_extend:
@@ -1184,4 +1235,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" ""))
+	       (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+			     (match_operand:VQI 2 "vect_par_constant_low" "")))
+	       (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_usum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+	(plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+			     (match_operand:VQI 2 "vect_par_constant_high" "")))
+	       (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..94bf0c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..2e9af56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..de2ad8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
+
+
+
-- 
1.9.1