From patchwork Wed Oct 26 22:29:25 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cesar Philippidis <cesar@codesourcery.com>
X-Patchwork-Id: 79549
Delivered-To: patch@linaro.org
Received: by 10.140.97.247 with SMTP id m110csp311686qge;
 Wed, 26 Oct 2016 15:29:57 -0700 (PDT)
X-Received: by 10.98.7.148 with SMTP id 20mr8369064pfh.18.1477520997226;
 Wed, 26 Oct 2016 15:29:57 -0700 (PDT)
Return-Path: <gcc-patches-return-439668-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 y10si4864996pgc.54.2016.10.26.15.29.56 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 26 Oct 2016 15:29:57 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-439668-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org;
 spf=pass (google.com: domain of
 gcc-patches-return-439668-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-439668-patch=linaro.org@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :subject:to:message-id:date:mime-version:content-type; q=dns; s=
 default; b=uffGwpIJM+wGy8QGDvdVcOuIMqvDzAzd2cFNL00Nz650m6b0RZ6HG
 EUrS0R8bnDcGaWZebTLXKF/CYBBLw2IoeXZy9EWgbURRgz9vcV4fCXGGuMqkxkFl
 4t2fzZlf+xOzPgoMWktrbvRXUopwYwLu8AnuYmRLup/sRnQ3cZOuH0=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :subject:to:message-id:date:mime-version:content-type; s=
 default; bh=jbsjswfmYRIQCTLSA9DHAHC3OJE=; b=IpuWfw+KGESYc7Re7+Dz
 nRuzUqG79i2CBlWiZvmQnexZfuhQO6TWjHr4QjtwDntmT1JU00CTmRHbnXlvhTSD
 R/yTPE4rESTOC/xw7TF0DkB+iwB3uYhisBBJd1A451JAOlScM2GDE/N8NzuAxg0d
 sWAMLgv/RnrR6ni9t8zTriM=
Received: (qmail 90153 invoked by alias); 26 Oct 2016 22:29:42 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 90144 invoked by uid 89); 26 Oct 2016 22:29:42 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_NONE, SPF_PASS,
 URIBL_RED autolearn=ham version=3.3.2 spammy=HTo:U*nathan,
 H*Ad:U*nathan, gang, tidy
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
 ESMTP; Wed, 26 Oct 2016 22:29:31 +0000
Received: from svr-orw-mbx-04.mgc.mentorg.com ([147.34.90.204])	by
 relay1.mentorg.com with esmtp id 1bzWhM-0001xW-Vo from
 Cesar_Philippidis@mentor.com ; Wed, 26 Oct 2016 15:29:29 -0700
Received: from [127.0.0.1] (147.34.91.1) by SVR-ORW-MBX-04.mgc.mentorg.com
 (147.34.90.204) with Microsoft SMTP Server (TLS) id
 15.0.1210.3; Wed, 26 Oct 2016 15:29:26 -0700
From: Cesar Philippidis <cesar@codesourcery.com>
Subject: [nvptx] propagating conditionals in worker-vector partitioned loops
To: Nathan Sidwell <nathan@acm.org>,
 "gcc-patches@gcc.gnu.org"	<gcc-patches@gcc.gnu.org>
Message-ID: <2d3d57d2-e5a8-04e9-7ad1-72e4b21beae4@codesourcery.com>
Date: Wed, 26 Oct 2016 15:29:25 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
X-ClientProxiedBy: svr-orw-mbx-01.mgc.mentorg.com (147.34.90.201) To
 SVR-ORW-MBX-04.mgc.mentorg.com (147.34.90.204)

Currently, the nvptx backend is only neutering the worker axis when
propagating variables used in conditional expressions across the worker
and vector axes. That's a problem with the worker-state spill and fill
propagation implementation because all of the vector threads in worker 0
all write the the same address location being spilled. As the attached
test case demonstrates, this might cause an infinite loop depending on
the values in the vector threads being propagated.

This patch fixes this issue by introducing a new worker-vector
predicate, so that both the worker and vector threads can be predicated
together, not separately. I.e., instead of first neutering worker axis,
then neutering the vector axis, this patch uses a single predicate for
tid.x == 0 && tid.y == 0.

Is this patch ok for trunk?

Cesar

2016-10-26  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (nvptx_single): Use a single predicate
	for loops partitioned across both worker and vector axes.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-1.c: New test.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 7bf5987..4e6ed60 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3507,11 +3507,38 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
   /* Insert the vector test inside the worker test.  */
   unsigned mode;
   rtx_insn *before = tail;
+  rtx wvpred = NULL_RTX;
+  bool skip_vector = false;
+
+  /* Create a single predicate for loops containing both worker and
+     vectors.  */
+  if (cond_branch
+      && (GOMP_DIM_MASK (GOMP_DIM_WORKER) & mask)
+      && (GOMP_DIM_MASK (GOMP_DIM_VECTOR) & mask))
+    {
+      rtx regx = gen_reg_rtx (SImode);
+      rtx regy = gen_reg_rtx (SImode);
+      rtx tmp = gen_reg_rtx (SImode);
+      wvpred = gen_reg_rtx (BImode);
+
+      emit_insn_before (gen_oacc_dim_pos (regx, const1_rtx), head);
+      emit_insn_before (gen_oacc_dim_pos (regy, const2_rtx), head);
+      emit_insn_before (gen_rtx_SET (tmp, gen_rtx_IOR (SImode, regx, regy)),
+			head);
+      emit_insn_before (gen_rtx_SET (wvpred, gen_rtx_NE (BImode, tmp,
+							 const0_rtx)),
+			head);
+
+      skip_mask &= ~(GOMP_DIM_MASK (GOMP_DIM_VECTOR));
+      skip_vector = true;
+    }
+
   for (mode = GOMP_DIM_WORKER; mode <= GOMP_DIM_VECTOR; mode++)
     if (GOMP_DIM_MASK (mode) & skip_mask)
       {
 	rtx_code_label *label = gen_label_rtx ();
-	rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
+	rtx pred = skip_vector ? wvpred
+	  : cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
 
 	if (!pred)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-1.c
new file mode 100644
index 0000000..4dcb60d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-1.c
@@ -0,0 +1,49 @@
+/* Ensure that worker-vector state conditional expressions are
+   properly handled by the nvptx backend.  */
+
+#include <assert.h>
+#include <math.h>
+
+
+#define N 1024
+
+int A[N][N] ;
+
+void test(int x)
+{
+#pragma acc parallel  num_gangs(16) num_workers(4) vector_length(32) copyout(A)
+  {
+#pragma acc loop gang
+    for(int j=0;j<N;j++)
+      {
+	if (x==1)
+	  {
+#pragma acc loop worker vector
+	    for(int i=0;i<N;i++)
+	      A[i][j] = 1;
+	  }
+	else
+	  {
+#pragma acc loop worker vector
+	    for(int i=0;i<N;i++)
+	      A[i][j] = -1;
+	  }
+      }
+  }
+}
+
+
+int main(void)
+{
+  test (0);
+  for (int i = 0; i < N; i++)
+    for (int j = 0; j < N; j++)
+      assert (A[i][j] == -1);
+
+  test (1);
+  for (int i = 0; i < N; i++)
+    for (int j = 0; j < N; j++)
+      assert (A[i][j] == 1);
+
+  return 0;
+}