From patchwork Tue Sep 22 23:52:25 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison <michael.collison@linaro.org>
X-Patchwork-Id: 54011
Message-ID: <5601E9B9.5060600@linaro.org>
Date: Tue, 22 Sep 2015 16:52:25 -0700
From: Michael Collison <michael.collison@linaro.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>, Ramana.Radhakrishnan@arm.com
Subject: Re: [ARM] Use vector wide add for mixed-mode adds

This is a modified version of the previous patch with the documentation
and read-md.c fixes removed. Those fixes have been submitted separately
and approved.

This patch addresses code that was not being vectorized because the ARM
backend lacked widening add patterns for full-width vector modes, for
example:

int t6(int len, void * dummy, short * __restrict x)
{
   len = len & ~31;
   int result = 0;
   __asm volatile ("");
   for (int i = 0; i < len; i++)
     result += x[i];
   return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi, 
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
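
As a rough illustration only (this is not part of the patch), the new
expanders can be modelled in plain C as two widening adds into the same
accumulator: one over the low half of the narrow input vector and one
over the high half. The names NUNITS, HALF, widen_add_half and
widen_ssum are made up for this sketch, and the lane count of 8 assumes
a 128-bit vector of 16-bit elements.

```c
#include <assert.h>
#include <stdint.h>

#define NUNITS 8            /* assumed: 16-bit lanes in a 128-bit vector */
#define HALF (NUNITS / 2)   /* lanes handled by each widening add */

/* Scalar model of one widening add: each selected 16-bit lane is
   sign-extended to 32 bits and added to the matching accumulator lane.  */
static void
widen_add_half (int32_t acc[HALF], const int16_t src[NUNITS], int start)
{
  assert (start == 0 || start == HALF);
  for (int i = 0; i < HALF; i++)
    acc[i] += (int32_t) src[start + i];
}

/* Scalar model of the expander: accumulate the low half of the input,
   then the high half, into the same double-width accumulator.  */
static void
widen_ssum (int32_t acc[HALF], const int16_t src[NUNITS])
{
  widen_add_half (acc, src, 0);      /* low half, lanes 0 .. HALF-1 */
  widen_add_half (acc, src, HALF);   /* high half, lanes HALF .. NUNITS-1 */
}
```

Calling widen_ssum once per group of NUNITS input elements and summing
the accumulator lanes at the end gives the same result as the scalar
loop above, barring overflow.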

2015-09-22  Michael Collison  <michael.collison@linaro.org>

     * config/arm/neon.md (widen_<us>sum<mode>): New patterns
     where mode is VQI to improve mixed mode add vectorization.
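
For readers unfamiliar with the lane selectors, here is a small
standalone C sketch of how the low and high index lists are laid out.
It mirrors the two loops in the expander but uses plain int arrays
instead of rtvec; the helper name build_half_selectors is hypothetical
and does not exist in GCC.

```c
#include <assert.h>

/* Fill LO with lane indices 0 .. nunits/2 - 1 and HI with lane indices
   nunits/2 .. nunits - 1.  Each array must hold nunits/2 entries.
   Hypothetical helper, for illustration only.  */
static void
build_half_selectors (int nunits, int *lo, int *hi)
{
  assert (nunits > 0 && nunits % 2 == 0);
  int half = nunits / 2;

  for (int i = 0; i < half; i++)
    lo[i] = i;

  for (int i = half; i < nunits; i++)
    hi[i - half] = i;
}
```

For an 8-lane input this yields {0, 1, 2, 3} for the low selector and
{4, 5, 6, 7} for the high selector.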

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..54623fe 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,57 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
 (define_insn "widen_ssum<mode>3"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(plus:<V_widen> (sign_extend:<V_widen>
@@ -1184,4 +1235,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (zero_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)
+
 
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int 
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..94bf0c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+int 
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int 
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..2e9af56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+int 
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..de2ad8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int 
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
+
+
+
-- 
1.9.1