From patchwork Tue Nov 22 17:25:45 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Monakov X-Patchwork-Id: 83459 Delivered-To: patch@linaro.org Received: by 10.182.1.168 with SMTP id 8csp2269199obn; Tue, 22 Nov 2016 09:26:24 -0800 (PST) X-Received: by 10.98.150.206 with SMTP id s75mr27327809pfk.155.1479835584335; Tue, 22 Nov 2016 09:26:24 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id 92si742635plc.147.2016.11.22.09.26.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Nov 2016 09:26:24 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-442269-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-442269-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-442269-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:in-reply-to:message-id:references :mime-version:content-type; q=dns; s=default; b=JxLBZPA96WhWAb0b Otu29mXzPL1aBam3NZtYfpNI60QcBHEYPsN63939eJHDviFr6bkQN6WyTwe4PKba CutYqM7S5pK2BCQUOHm2YiUAWzjjpYIyKXUVnXjujKaZbLNO6PVCCHA+Iy1B/70z HoCtKduc9t2nuu7lKxCVeDQhPdI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:in-reply-to:message-id:references :mime-version:content-type; s=default; bh=lGlR1YjOYJhVdsJ044t51c 8JSsg=; b=k2JCayxLHcPA490rbvrG8pm/JhLAaNPWZvBQEncgOVkRyXRxnk0K1A x9Nn/BJQLw38SSs9lT/ShHzv0PYfi6PqkTMnUj188aS26QjOiPJ2S4ZS2r+RQzx/ 33ySDB8mV5nuewHpN3ugZVyHHMs9YgpftaXQnBQgvBTO4h0OUdZLo= Received: (qmail 60657 invoked by alias); 22 Nov 2016 17:26:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 60639 invoked by uid 89); 22 Nov 2016 17:26:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL, BAYES_50, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=worker, sk:enable_, as_a, gang X-HELO: smtp.ispras.ru Received: from bran.ispras.ru (HELO smtp.ispras.ru) (83.149.199.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 22 Nov 2016 17:26:06 +0000 Received: from monopod.intra.ispras.ru (monopod.intra.ispras.ru [10.10.3.121]) by smtp.ispras.ru (Postfix) with ESMTP id ED3B9612B1; Tue, 22 Nov 2016 20:26:02 +0300 (MSK) Date: Tue, 22 Nov 2016 20:25:45 +0300 (MSK) From: Alexander Monakov To: Jakub Jelinek cc: gcc-patches@gcc.gnu.org Subject: Re: gomp-nvptx branch - middle-end changes In-Reply-To: <20161111081221.GP3541@tucnak.redhat.com> Message-ID: References: <20161111081221.GP3541@tucnak.redhat.com> User-Agent: Alpine 2.20.13 (LNX 116 2015-12-14) MIME-Version: 1.0 On Fri, 11 Nov 2016, Jakub Jelinek wrote: > Ok for trunk, once the needed corresponding config/nvptx bits are committed, > with one nit below that needs immediate action and the rest can be resolved > incrementally. I'd like to check in afterwards the attached patch, at least > for now, so that non-offloaded SIMD code is less affected. Testing your patch revealed an issue in Fortran offloaded code; types of boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower runs) don't match. I'm attaching a revised patch that addresses it by simply using an integer type (there are also two other minor issues, below). > Please change this into > (ENABLE_OFFLOADING && (flag_openmp || in_lto)) > for now, so that we don't waste compile time even when clearly it > isn't needed, and incrementally change the inliner to propagate > the property. As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally needs to accept ACCEL_COMPILER. Applied like this: + virtual bool gate (function *ARG_UNUSED (fun)) + { + /* FIXME: this should use PROP_gimple_lomp_dev. */ +#ifdef ACCEL_COMPILER + return true; +#else + return ENABLE_OFFLOADING && (flag_openmp || in_lto_p); +#endif + } In your GOMP_USE_SIMT() patch, > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v > if (max_vf == 0) > { > max_vf = omp_max_vf (); > + if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt), > + OMP_CLAUSE__SIMT_)) > + { > + int max_simt = omp_max_simt_vf (); > + max_vf = MAX (max_vf, max_simt); > + } I don't believe here there's a need to take a maximum. Cloning the loop upfront means that SIMD+SIMT styles are not going to mix within a single loop. I've simplified it to an if-then-else in the revised patch. > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi > bool offloaded = cgraph_node::get (current_function_decl)->offloadable; > for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer) > offloaded = rgn->type == GIMPLE_OMP_TARGET; > - bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1; > + bool is_simt > + = (offloaded > + && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt), > + OMP_CLAUSE__SIMT_) > + && safelen_int > 1); Here computation of 'offloaded' is no longer needed, because presence of OMP_CLAUSE__SIMT_ would imply that. Removed in the revised patch. I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering. Not sure if that's intentional -- as I understand it's possible to look at the enclosing context's clauses because 'omp ordered' must be closely nested with the corresponding loop. I've added a FIXME in the patch. Alexander * internal-fn.c (expand_GOMP_USE_SIMT): New function. * tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands. (omp_clause_code_name): Add _simt_ name. (walk_tree_1): Handle OMP_CLAUSE__SIMT_. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_. (scan_omp_simd): New function. (scan_omp_1_stmt): Use it in target regions if needed. (omp_max_vf): Don't max with omp_max_simt_vf. (lower_rec_simd_input_clauses): Use omp_max_simt_vf if OMP_CLAUSE__SIMT_ is present. (lower_rec_input_clauses): Compute maybe_simt from presence of OMP_CLAUSE__SIMT_. (lower_lastprivate_clauses): Likewise. (expand_omp_simd): Likewise. (execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT. * internal-fn.def (GOMP_USE_SIMT): New internal function. * tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_. diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 6cd8522..b1dbc98 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -158,6 +158,14 @@ expand_ANNOTATE (internal_fn, gcall *) gcc_unreachable (); } +/* This should get expanded in omp_device_lower pass. */ + +static void +expand_GOMP_USE_SIMT (internal_fn, gcall *) +{ + gcc_unreachable (); +} + /* Lane index on SIMT targets: thread index in the warp on NVPTX. On targets without SIMT execution this should be expanded in omp_device_lower pass. */ diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index f055230..9a03e17 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary) DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary) DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary) +DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 6c52bff..eab0af5 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -278,6 +278,7 @@ static bool omp_any_child_fn_dumped; static void scan_omp (gimple_seq *, omp_context *); static tree scan_omp_1_op (tree *, int *, void *); static gphi *find_phi_with_arg_on_edge (tree, edge); +static int omp_max_simt_vf (void); #define WALK_SUBSTMTS \ case GIMPLE_BIND: \ @@ -2192,6 +2193,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE__SIMT_: break; case OMP_CLAUSE_ALIGNED: @@ -2363,6 +2365,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: case OMP_CLAUSE__GRIDDIM_: + case OMP_CLAUSE__SIMT_: break; case OMP_CLAUSE_TILE: @@ -3066,6 +3069,48 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx) scan_omp (gimple_omp_body_ptr (stmt), ctx); } +/* Duplicate #pragma omp simd, one for SIMT, another one for SIMD. */ + +static void +scan_omp_simd (gimple_stmt_iterator *gsi, gomp_for *stmt, + omp_context *outer_ctx) +{ + gbind *bind = gimple_build_bind (NULL, NULL, NULL); + gsi_replace (gsi, bind, false); + gimple_seq seq = NULL; + gimple *g = gimple_build_call_internal (IFN_GOMP_USE_SIMT, 0); + tree cond = create_tmp_var_raw (integer_type_node); + DECL_CONTEXT (cond) = current_function_decl; + DECL_SEEN_IN_BIND_EXPR_P (cond) = 1; + gimple_bind_set_vars (bind, cond); + gimple_call_set_lhs (g, cond); + gimple_seq_add_stmt (&seq, g); + tree lab1 = create_artificial_label (UNKNOWN_LOCATION); + tree lab2 = create_artificial_label (UNKNOWN_LOCATION); + tree lab3 = create_artificial_label (UNKNOWN_LOCATION); + g = gimple_build_cond (NE_EXPR, cond, integer_zero_node, lab1, lab2); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_label (lab1); + gimple_seq_add_stmt (&seq, g); + gimple_seq new_seq = copy_gimple_seq_and_replace_locals (stmt); + gomp_for *new_stmt = as_a (new_seq); + tree clause = build_omp_clause (gimple_location (stmt), OMP_CLAUSE__SIMT_); + OMP_CLAUSE_CHAIN (clause) = gimple_omp_for_clauses (new_stmt); + gimple_omp_for_set_clauses (new_stmt, clause); + gimple_seq_add_stmt (&seq, new_stmt); + g = gimple_build_goto (lab3); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_label (lab2); + gimple_seq_add_stmt (&seq, g); + gimple_seq_add_stmt (&seq, stmt); + g = gimple_build_label (lab3); + gimple_seq_add_stmt (&seq, g); + gimple_bind_set_body (bind, seq); + update_stmt (bind); + scan_omp_for (new_stmt, outer_ctx); + scan_omp_for (stmt, outer_ctx); +} + /* Scan an OpenMP sections directive. */ static void @@ -3969,7 +4014,13 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p, break; case GIMPLE_OMP_FOR: - scan_omp_for (as_a (stmt), ctx); + if (((gimple_omp_for_kind (as_a (stmt)) + & GF_OMP_FOR_KIND_MASK) == GF_OMP_FOR_KIND_SIMD) + && omp_maybe_offloaded_ctx (ctx) + && omp_max_simt_vf ()) + scan_omp_simd (gsi, as_a (stmt), ctx); + else + scan_omp_for (as_a (stmt), ctx); break; case GIMPLE_OMP_SECTIONS: @@ -4316,8 +4367,7 @@ omp_max_vf (void) if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT) vf = GET_MODE_NUNITS (vqimode); } - int svf = omp_max_simt_vf (); - return MAX (vf, svf); + return vf; } /* Helper function of lower_rec_input_clauses, used for #pragma omp simd @@ -4329,7 +4379,11 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf, { if (max_vf == 0) { - max_vf = omp_max_vf (); + if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt), + OMP_CLAUSE__SIMT_)) + max_vf = omp_max_simt_vf (); + else + max_vf = omp_max_vf (); if (max_vf > 1) { tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt), @@ -4405,8 +4459,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, int pass; bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD); - bool maybe_simt - = is_simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1; + bool maybe_simt = is_simd && find_omp_clause (clauses, OMP_CLAUSE__SIMT_); int max_vf = 0; tree lane = NULL_TREE, idx = NULL_TREE; tree simt_lane = NULL_TREE; @@ -5497,7 +5550,7 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list, if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD) { - maybe_simt = omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1; + maybe_simt = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMT_); simduid = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMDUID_); if (simduid) simduid = OMP_CLAUSE__SIMDUID__DECL (simduid); @@ -10749,10 +10802,9 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd) } tree step = fd->loop.step; - bool offloaded = cgraph_node::get (current_function_decl)->offloadable; - for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer) - offloaded = rgn->type == GIMPLE_OMP_TARGET; - bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1; + bool is_simt = (safelen_int > 1 + && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt), + OMP_CLAUSE__SIMT_)); tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE; if (is_simt) { @@ -15006,6 +15058,8 @@ lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *bind; bool simd = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt), OMP_CLAUSE_SIMD); + /* FIXME: this should check presence of OMP_CLAUSE__SIMT_ on the enclosing + loop. */ bool maybe_simt = simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1; bool threads = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt), @@ -20167,6 +20221,9 @@ execute_omp_device_lower () tree type = lhs ? TREE_TYPE (lhs) : integer_type_node; switch (gimple_call_internal_fn (stmt)) { + case IFN_GOMP_USE_SIMT: + rhs = vf == 1 ? integer_zero_node : integer_one_node; + break; case IFN_GOMP_SIMT_LANE: case IFN_GOMP_SIMT_LAST_LANE: rhs = vf == 1 ? build_zero_cst (type) : NULL_TREE; diff --git a/gcc/tree-core.h b/gcc/tree-core.h index a3d220d..eec2d4f3 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -438,6 +438,10 @@ enum omp_clause_code { /* Internally used only clause, holding SIMD uid. */ OMP_CLAUSE__SIMDUID_, + /* Internally used only clause, flag whether this is SIMT simd + loop or not. */ + OMP_CLAUSE__SIMT_, + /* Internally used only clause, holding _Cilk_for # of iterations on OMP_PARALLEL. */ OMP_CLAUSE__CILK_FOR_COUNT_, diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 096eefd..95db710 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -893,6 +893,10 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags) pp_right_paren (pp); break; + case OMP_CLAUSE__SIMT_: + pp_string (pp, "_simt_"); + break; + case OMP_CLAUSE_GANG: pp_string (pp, "gang"); if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE) diff --git a/gcc/tree.c b/gcc/tree.c index a4c5b1b..9b0b806 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -320,6 +320,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_HINT */ 0, /* OMP_CLAUSE_DEFALTMAP */ 1, /* OMP_CLAUSE__SIMDUID_ */ + 0, /* OMP_CLAUSE__SIMT_ */ 1, /* OMP_CLAUSE__CILK_FOR_COUNT_ */ 0, /* OMP_CLAUSE_INDEPENDENT */ 1, /* OMP_CLAUSE_WORKER */ @@ -391,6 +392,7 @@ const char * const omp_clause_code_name[] = "hint", "defaultmap", "_simduid_", + "_simt_", "_Cilk_for_count_", "independent", "worker", @@ -11893,6 +11895,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data, case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: case OMP_CLAUSE_TILE: + case OMP_CLAUSE__SIMT_: WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); case OMP_CLAUSE_LASTPRIVATE: