From patchwork Fri Nov 17 21:58:50 2017
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 119229
Delivered-To: patch@linaro.org
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Add support for SVE gather loads
Date: Fri, 17 Nov 2017 21:58:50 +0000
Message-ID: <87ine8worp.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0

This patch adds support for SVE gather loads.  It uses basically the
same analysis code as the AVX gather support, but after that there are
two major differences:

- It uses new internal functions rather than target built-ins.
  The interface is:

     IFN_GATHER_LOAD (base, offsets, scale)
     IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)

  which should be reasonably generic (see the illustrative sketch
  below).  One of the advantages of using internal functions is that
  other passes can understand what the functions do, but a more
  immediate advantage is that we can query the underlying target
  pattern to see which scales it supports.

- It uses pattern recognition to convert the offset to the right
  width, if it was originally narrower than the vector element width.
  This avoids having to do a widening operation as part of the gather
  expansion itself.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-17  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* doc/md.texi (gather_load@var{m}): Document.
	(mask_gather_load@var{m}): Likewise.
	* genopinit.c (main): Add supports_vec_gather_load and
	supports_vec_gather_load_cached to target_optabs.
	* optabs-tree.c (init_tree_optimization_optabs): Use
	ggc_cleared_alloc to allocate target_optabs.
	* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
	* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
	functions.
	* internal-fn.h (internal_load_fn_p): Declare.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* internal-fn.c (gather_load_direct): New macro.
	(expand_gather_load_optab_fn): New function.
	(direct_gather_load_optab_supported_p): New macro.
	(direct_internal_fn_optab): New function.
	(internal_load_fn_p): Likewise.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* optabs-query.c (supports_at_least_one_mode_p): New function.
	(supports_vec_gather_load_p): Likewise.
	* optabs-query.h (supports_vec_gather_load_p): Declare.
	* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
	and memory_type fields.
	(NUM_PATTERNS): Bump to 15.
	* tree-vect-data-refs.c (vect_gather_scatter_fn_p): New function.
	(vect_describe_gather_scatter_call): Likewise.
	(vect_check_gather_scatter): Try using internal functions for
	gather loads.  Recognize existing calls to a gather load function.
	(vect_analyze_data_refs): Consider using gather loads if
	supports_vec_gather_load_p.
	* tree-vect-patterns.c (vect_get_load_store_mask): New function.
	(vect_get_gather_scatter_offset_type): Likewise.
	(vect_convert_mask_for_vectype): Likewise.
	(vect_add_conversion_to_patterm): Likewise.
	(vect_try_gather_scatter_pattern): Likewise.
	(vect_recog_gather_scatter_pattern): New pattern recognizer.
	(vect_vect_recog_func_ptrs): Add it.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_mask_index and internal_gather_scatter_fn_p.
	(check_load_store_masking): Take the gather_scatter_info as an
	argument and handle gather loads.
	(vect_get_gather_scatter_ops): New function.
	(vectorizable_call): Check internal_load_fn_p.
	(vectorizable_load): Likewise.
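
For illustration only (this sketch is not part of the patch): the loop
below is the kind of non-affine read the new code targets, and mirrors
the loops in the new sve_gather_load tests.  Because the offsets here
are 32-bit while the data elements are 64-bit, the new gather/scatter
pattern recognizer first widens the offsets to the element width, and
the load is then represented as a call to the new internal function,
conceptually IFN_GATHER_LOAD (base, offsets, scale), or
IFN_MASK_GATHER_LOAD (base, offsets, scale, mask) when the access is
conditional.  Element i of the result is loaded from the address formed
by extending offset element i to address width, multiplying it by the
scale and adding it to the base, as documented in the md.texi hunk
below.

#include <stdint.h>

/* Compiled with e.g. -O2 -ftree-vectorize -march=armv8-a+sve, the
   indexed read src[indices[i]] cannot use a contiguous vector load,
   so the vectorizer is expected to use a gather load for it.  */
void
gather_example (double *restrict dest, const double *restrict src,
		const int32_t *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[indices[i]];
}
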
Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. * config/aarch64/aarch64-sve.md (gather_load): New expander. (mask_gather_load): New insns. gcc/testsuite/ * gcc.target/aarch64/sve_gather_load_1.c: New test. * gcc.target/aarch64/sve_gather_load_2.c: Likewise. * gcc.target/aarch64/sve_gather_load_3.c: Likewise. * gcc.target/aarch64/sve_gather_load_4.c: Likewise. * gcc.target/aarch64/sve_gather_load_5.c: Likewise. * gcc.target/aarch64/sve_gather_load_6.c: Likewise. * gcc.target/aarch64/sve_gather_load_7.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve_mask_gather_load_7.c: Likewise. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-11-17 21:57:43.531042657 +0000 +++ gcc/doc/md.texi 2017-11-17 21:57:43.915004222 +0000 @@ -4905,6 +4905,35 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n} This pattern is not allowed to @code{FAIL}. +@cindex @code{gather_load@var{m}} instruction pattern +@item @samp{gather_load@var{m}} +Load several separate memory locations into a vector of mode @var{m}. +Operand 1 is a scalar base address and operand 2 is a vector of +offsets from that base. Operand 0 is a destination vector with the +same number of elements as the offset. For each element index @var{i}: + +@itemize @bullet +@item +extend the offset element @var{i} to address width, using zero +extension if operand 3 is 1 and sign extension if operand 3 is zero; +@item +multiply the extended offset by operand 4; +@item +add the result to the base; and +@item +load the value at that address into element @var{i} of operand 0. +@end itemize + +The value of operand 3 does not matter if the offsets are already +address width. + +@cindex @code{mask_gather_load@var{m}} instruction pattern +@item @samp{mask_gather_load@var{m}} +Like @samp{gather_load@var{m}}, but takes an extra mask operand as +operand 5. Bit @var{i} of the mask is set if element @var{i} +of the result should be loaded from memory and clear if element @var{i} +of the result should be set to zero. + @cindex @code{vec_set@var{m}} instruction pattern @item @samp{vec_set@var{m}} Set given field in the vector value. Operand 0 is the vector to modify, Index: gcc/genopinit.c =================================================================== --- gcc/genopinit.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/genopinit.c 2017-11-17 21:57:43.915004222 +0000 @@ -234,6 +234,11 @@ main (int argc, const char **argv) "struct target_optabs {\n" " /* Patterns that are used by optabs that are enabled for this target. */\n" " bool pat_enable[NUM_OPTAB_PATTERNS];\n" + "\n" + " /* Cache if the target supports vec_gather_load for at least one vector\n" + " mode. 
*/\n" + " bool supports_vec_gather_load;\n" + " bool supports_vec_gather_load_cached;\n" "};\n" "extern void init_all_optabs (struct target_optabs *);\n" "\n" Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/optabs-tree.c 2017-11-17 21:57:43.916004122 +0000 @@ -380,7 +380,7 @@ init_tree_optimization_optabs (tree optn if (tmp_optabs) memset (tmp_optabs, 0, sizeof (struct target_optabs)); else - tmp_optabs = ggc_alloc (); + tmp_optabs = ggc_cleared_alloc (); /* Generate a new set of optabs into tmp_optabs. */ init_all_optabs (tmp_optabs); Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2017-11-17 21:57:43.531042657 +0000 +++ gcc/optabs.def 2017-11-17 21:57:43.917004022 +0000 @@ -390,6 +390,9 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") +OPTAB_D (gather_load_optab, "gather_load$a") +OPTAB_D (mask_gather_load_optab, "mask_gather_load$a") + OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2017-11-17 21:57:43.531042657 +0000 +++ gcc/internal-fn.def 2017-11-17 21:57:43.916004122 +0000 @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. - mask_load: currently just maskload - load_lanes: currently just vec_load_lanes - mask_load_lanes: currently just vec_mask_load_lanes + - gather_load: used for {mask_,}gather_load - mask_store: currently just maskstore - store_lanes: currently just vec_store_lanes @@ -110,6 +111,10 @@ DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_C DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE, vec_mask_load_lanes, mask_load_lanes) +DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load) +DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE, + mask_gather_load, gather_load) + DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store) DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes) DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0, Index: gcc/internal-fn.h =================================================================== --- gcc/internal-fn.h 2017-11-17 21:57:43.531042657 +0000 +++ gcc/internal-fn.h 2017-11-17 21:57:43.916004122 +0000 @@ -192,6 +192,12 @@ extern bool set_edom_supported_p (void); extern internal_fn get_conditional_internal_fn (tree_code, tree); +extern bool internal_load_fn_p (internal_fn); +extern bool internal_gather_scatter_fn_p (internal_fn); +extern int internal_fn_mask_index (internal_fn); +extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree, + tree, signop, int); + extern void expand_internal_call (gcall *); extern void expand_internal_call (internal_fn, gcall *); extern void expand_PHI (internal_fn, gcall *); Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/internal-fn.c 2017-11-17 21:57:43.916004122 +0000 @@ -83,6 +83,7 @@ #define not_direct { -2, -2, false } #define mask_load_direct { -1, 2, false } #define load_lanes_direct { -1, -1, false } #define mask_load_lanes_direct { -1, -1, false } +#define gather_load_direct { -1, 1, false } #define mask_store_direct { 3, 
2, false } #define store_lanes_direct { 0, 0, false } #define mask_store_lanes_direct { 0, 0, false } @@ -2676,6 +2677,38 @@ expand_LAUNDER (internal_fn, gcall *call expand_assignment (lhs, gimple_call_arg (call, 0), false); } +/* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB. */ + +static void +expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab) +{ + tree lhs = gimple_call_lhs (stmt); + tree base = gimple_call_arg (stmt, 0); + tree offset = gimple_call_arg (stmt, 1); + tree scale = gimple_call_arg (stmt, 2); + + rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + rtx base_rtx = expand_normal (base); + rtx offset_rtx = expand_normal (offset); + HOST_WIDE_INT scale_int = tree_to_shwi (scale); + + int i = 0; + struct expand_operand ops[6]; + create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs))); + create_address_operand (&ops[i++], base_rtx); + create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset))); + create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset))); + create_integer_operand (&ops[i++], scale_int); + if (optab == mask_gather_load_optab) + { + tree mask = gimple_call_arg (stmt, 3); + rtx mask_rtx = expand_normal (mask); + create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask))); + } + insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs))); + expand_insn (icode, i, ops); +} + /* Expand DIVMOD() using: a) optab handler for udivmod/sdivmod if it is available. b) If optab_handler doesn't exist, generate call to @@ -2915,12 +2948,32 @@ #define direct_cond_binary_optab_support #define direct_mask_load_optab_supported_p direct_optab_supported_p #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p +#define direct_gather_load_optab_supported_p direct_optab_supported_p #define direct_mask_store_optab_supported_p direct_optab_supported_p #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_while_optab_supported_p convert_optab_supported_p #define direct_fold_extract_optab_supported_p direct_optab_supported_p +/* Return the optab used by internal function FN. */ + +static optab +direct_internal_fn_optab (internal_fn fn) +{ + switch (fn) + { +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ + case IFN_##CODE: break; +#define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) \ + case IFN_##CODE: return OPTAB##_optab; +#include "internal-fn.def" + + case IFN_LAST: + break; + } + gcc_unreachable (); +} + /* Return true if FN is supported for the types in TYPES when the optimization type is OPT_TYPE. The types are those associated with the "type0" and "type1" fields of FN's direct_internal_fn_info @@ -3022,6 +3075,87 @@ get_conditional_internal_fn (tree_code c } } +/* Return true if IFN is some form of load from memory. */ + +bool +internal_load_fn_p (internal_fn fn) +{ + switch (fn) + { + case IFN_MASK_LOAD: + case IFN_LOAD_LANES: + case IFN_MASK_LOAD_LANES: + case IFN_GATHER_LOAD: + case IFN_MASK_GATHER_LOAD: + return true; + + default: + return false; + } +} + +/* Return true if IFN is some form of gather load or scatter store. 
*/ + +bool +internal_gather_scatter_fn_p (internal_fn fn) +{ + switch (fn) + { + case IFN_GATHER_LOAD: + case IFN_MASK_GATHER_LOAD: + return true; + + default: + return false; + } +} + +/* If FN takes a vector mask argument, return the index of that argument, + otherwise return -1. */ + +int +internal_fn_mask_index (internal_fn fn) +{ + switch (fn) + { + case IFN_MASK_LOAD: + case IFN_MASK_LOAD_LANES: + case IFN_MASK_STORE: + case IFN_MASK_STORE_LANES: + return 2; + + case IFN_MASK_GATHER_LOAD: + return 3; + + default: + return -1; + } +} + +/* Return true if the target supports gather load or scatter store function + IFN. For loads, VECTOR_TYPE is the vector type of the load result, + while for stores it is the vector type of the stored data argument. + MEMORY_ELEMENT_TYPE is the type of the memory elements being loaded + or stored. OFFSET_SIGN is the sign of the offset argument, which is + only relevant when the offset is narrower than an address. SCALE is + the amount by which the offset should be multiplied *after* it has + been extended to address width. */ + +bool +internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, + tree memory_element_type, + signop offset_sign, int scale) +{ + if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)), + TYPE_SIZE (memory_element_type))) + return false; + optab optab = direct_internal_fn_optab (ifn); + insn_code icode = direct_optab_handler (optab, TYPE_MODE (vector_type)); + return (icode != CODE_FOR_nothing + && insn_operand_matches (icode, 3, GEN_INT (offset_sign == UNSIGNED)) + && insn_operand_matches (icode, 4, GEN_INT (scale))); +} + /* Expand STMT as though it were a call to internal function FN. */ void Index: gcc/optabs-query.c =================================================================== --- gcc/optabs-query.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/optabs-query.c 2017-11-17 21:57:43.916004122 +0000 @@ -621,3 +621,32 @@ lshift_cheap_p (bool speed_p) return cheap[speed_p]; } + +/* Return true if optab OP supports at least one mode. */ + +static bool +supports_at_least_one_mode_p (optab op) +{ + for (int i = 0; i < NUM_MACHINE_MODES; ++i) + if (direct_optab_handler (op, (machine_mode) i) != CODE_FOR_nothing) + return true; + + return false; +} + +/* Return true if vec_gather_load is available for at least one vector + mode. */ + +bool +supports_vec_gather_load_p () +{ + if (this_fn_optabs->supports_vec_gather_load_cached) + return this_fn_optabs->supports_vec_gather_load; + + this_fn_optabs->supports_vec_gather_load_cached = true; + + this_fn_optabs->supports_vec_gather_load + = supports_at_least_one_mode_p (gather_load_optab); + + return this_fn_optabs->supports_vec_gather_load; +} Index: gcc/optabs-query.h =================================================================== --- gcc/optabs-query.h 2017-11-17 21:57:43.531042657 +0000 +++ gcc/optabs-query.h 2017-11-17 21:57:43.916004122 +0000 @@ -187,6 +187,7 @@ bool can_compare_and_swap_p (machine_mod bool can_atomic_exchange_p (machine_mode, bool); bool can_atomic_load_p (machine_mode); bool lshift_cheap_p (bool); +bool supports_vec_gather_load_p (); /* Version of find_widening_optab_handler_and_mode that operates on specific mode types. */ Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-11-17 21:57:43.531042657 +0000 +++ gcc/tree-vectorizer.h 2017-11-17 21:57:43.920003721 +0000 @@ -844,7 +844,12 @@ typedef struct _stmt_vec_info { /* Information about a gather/scatter call. 
*/ struct gather_scatter_info { - /* The FUNCTION_DECL for the built-in gather/scatter function. */ + /* The internal function to use for the gather/scatter operation, + or IFN_LAST if a built-in function should be used instead. */ + internal_fn ifn; + + /* The FUNCTION_DECL for the built-in gather/scatter function, + or null if an internal function should be used instead. */ tree decl; /* The loop-invariant base value. */ @@ -862,6 +867,12 @@ struct gather_scatter_info { /* The type of the vectorized offset. */ tree offset_vectype; + + /* The type of the scalar elements after loading or before storing. */ + tree element_type; + + /* The type of the scalar elements being loaded or stored. */ + tree memory_type; }; /* Access Functions. */ @@ -1529,7 +1540,7 @@ extern void duplicate_and_interleave (gi Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple *(* vect_recog_func_ptr) (vec *, tree *, tree *); -#define NUM_PATTERNS 14 +#define NUM_PATTERNS 15 void vect_pattern_recog (vec_info *); /* In tree-vectorizer.c. */ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/tree-vect-data-refs.c 2017-11-17 21:57:43.919003822 +0000 @@ -3296,6 +3296,74 @@ vect_prune_runtime_alias_test_list (loop return true; } +/* Check whether we can use an internal function for a gather load + or scatter store. READ_P is true for loads and false for stores. + MASKED_P is true if the load or store is conditional. MEMORY_TYPE is + the type of the memory elements being loaded or stored. OFFSET_BITS + is the number of bits in each scalar offset and OFFSET_SIGN is the + sign of the offset. SCALE is the amount by which the offset should + be multiplied *after* it has been converted to address width. + + Return true if the function is supported, storing the function + id in *IFN_OUT and the type of a vector element in *ELEMENT_TYPE_OUT. */ + +static bool +vect_gather_scatter_fn_p (bool read_p, bool masked_p, tree vectype, + tree memory_type, unsigned int offset_bits, + signop offset_sign, int scale, + internal_fn *ifn_out, tree *element_type_out) +{ + unsigned int memory_bits = tree_to_uhwi (TYPE_SIZE (memory_type)); + unsigned int element_bits = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); + if (offset_bits > element_bits) + /* Internal functions require the offset to be the same width as + the vector elements. We can extend narrower offsets, but it isn't + safe to truncate wider offsets. */ + return false; + + if (element_bits != memory_bits) + /* For now the vector elements must be the same width as the + memory elements. */ + return false; + + /* Work out which function we need. */ + internal_fn ifn; + if (read_p) + ifn = masked_p ? IFN_MASK_GATHER_LOAD : IFN_GATHER_LOAD; + else + return false; + + /* Test whether the target supports this combination. */ + if (!internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type, + offset_sign, scale)) + return false; + + *ifn_out = ifn; + *element_type_out = TREE_TYPE (vectype); + return true; +} + +/* CALL is a call to an internal gather load or scatter store function. + Describe the operation in INFO. 
*/ + +static void +vect_describe_gather_scatter_call (gcall *call, gather_scatter_info *info) +{ + stmt_vec_info stmt_info = vinfo_for_stmt (call); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + info->ifn = gimple_call_internal_fn (call); + info->decl = NULL_TREE; + info->base = gimple_call_arg (call, 0); + info->offset = gimple_call_arg (call, 1); + info->offset_dt = vect_unknown_def_type; + info->offset_vectype = NULL_TREE; + info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 2)); + info->element_type = TREE_TYPE (vectype); + info->memory_type = TREE_TYPE (DR_REF (dr)); +} + /* Return true if a non-affine read or write in STMT is suitable for a gather load or scatter store. Describe the operation in *INFO if so. */ @@ -3309,17 +3377,38 @@ vect_check_gather_scatter (gimple *stmt, stmt_vec_info stmt_info = vinfo_for_stmt (stmt); struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); tree offtype = NULL_TREE; - tree decl, base, off; + tree decl = NULL_TREE, base, off; + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree memory_type = TREE_TYPE (DR_REF (dr)); machine_mode pmode; int punsignedp, reversep, pvolatilep = 0; + internal_fn ifn; + tree element_type; + bool masked_p = false; + + /* See whether this is already a call to a gather/scatter internal function. + If not, see whether it's a masked load or store. */ + gcall *call = dyn_cast (stmt); + if (call && gimple_call_internal_p (call)) + { + ifn = gimple_call_internal_fn (stmt); + if (internal_gather_scatter_fn_p (ifn)) + { + vect_describe_gather_scatter_call (call, info); + return true; + } + masked_p = (ifn == IFN_MASK_LOAD || ifn == IFN_MASK_STORE); + } + + /* True if we should aim to use internal functions rather than + built-in functions. */ + bool use_ifn_p = (DR_IS_READ (dr) + && supports_vec_gather_load_p ()); base = DR_REF (dr); /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF, see if we can use the def stmt of the address. */ - if (is_gimple_call (stmt) - && gimple_call_internal_p (stmt) - && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD - || gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + if (masked_p && TREE_CODE (base) == MEM_REF && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME && integer_zerop (TREE_OPERAND (base, 1)) @@ -3450,7 +3539,17 @@ vect_check_gather_scatter (gimple *stmt, case MULT_EXPR: if (scale == 1 && tree_fits_shwi_p (op1)) { - scale = tree_to_shwi (op1); + int new_scale = tree_to_shwi (op1); + /* Only treat this as a scaling operation if the target + supports it. */ + if (use_ifn_p + && !vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, + vectype, memory_type, 1, + TYPE_SIGN (TREE_TYPE (op0)), + new_scale, &ifn, + &element_type)) + break; + scale = new_scale; off = op0; continue; } @@ -3468,6 +3567,15 @@ vect_check_gather_scatter (gimple *stmt, off = op0; continue; } + + /* The internal functions need the offset to be the same width + as the elements of VECTYPE. Don't include operations that + cast the offset from that width to a different width. 
*/ + if (use_ifn_p + && (int_size_in_bytes (TREE_TYPE (vectype)) + == int_size_in_bytes (TREE_TYPE (off)))) + break; + if (TYPE_PRECISION (TREE_TYPE (op0)) < TYPE_PRECISION (TREE_TYPE (off))) { @@ -3492,22 +3600,37 @@ vect_check_gather_scatter (gimple *stmt, if (offtype == NULL_TREE) offtype = TREE_TYPE (off); - if (DR_IS_READ (dr)) - decl = targetm.vectorize.builtin_gather (STMT_VINFO_VECTYPE (stmt_info), - offtype, scale); + if (use_ifn_p) + { + if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype, + memory_type, TYPE_PRECISION (offtype), + TYPE_SIGN (offtype), scale, &ifn, + &element_type)) + return false; + } else - decl = targetm.vectorize.builtin_scatter (STMT_VINFO_VECTYPE (stmt_info), - offtype, scale); + { + if (DR_IS_READ (dr)) + decl = targetm.vectorize.builtin_gather (vectype, offtype, scale); + else + decl = targetm.vectorize.builtin_scatter (vectype, offtype, scale); - if (decl == NULL_TREE) - return false; + if (!decl) + return false; + + ifn = IFN_LAST; + element_type = TREE_TYPE (vectype); + } + info->ifn = ifn; info->decl = decl; info->base = base; info->offset = off; info->offset_dt = vect_unknown_def_type; info->offset_vectype = NULL_TREE; info->scale = scale; + info->element_type = element_type; + info->memory_type = memory_type; return true; } @@ -3588,7 +3711,8 @@ vect_analyze_data_refs (vec_info *vinfo, bool maybe_gather = DR_IS_READ (dr) && !TREE_THIS_VOLATILE (DR_REF (dr)) - && targetm.vectorize.builtin_gather != NULL; + && (targetm.vectorize.builtin_gather != NULL + || supports_vec_gather_load_p ()); bool maybe_scatter = DR_IS_WRITE (dr) && !TREE_THIS_VOLATILE (DR_REF (dr)) Index: gcc/tree-vect-patterns.c =================================================================== --- gcc/tree-vect-patterns.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/tree-vect-patterns.c 2017-11-17 21:57:43.919003822 +0000 @@ -69,6 +69,7 @@ static gimple *vect_recog_mixed_size_con tree *, tree *); static gimple *vect_recog_bool_pattern (vec *, tree *, tree *); static gimple *vect_recog_mask_conversion_pattern (vec *, tree *, tree *); +static gimple *vect_recog_gather_scatter_pattern (vec *, tree *, tree *); struct vect_recog_func { @@ -93,6 +94,10 @@ static vect_recog_func vect_vect_recog_f { vect_recog_mult_pattern, "mult" }, { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" }, { vect_recog_bool_pattern, "bool" }, + /* This must come before mask conversion, and includes the parts + of mask conversion that are needed for gather and scatter + internal functions. */ + { vect_recog_gather_scatter_pattern, "gather_scatter" }, { vect_recog_mask_conversion_pattern, "mask_conversion" } }; @@ -4090,6 +4095,202 @@ vect_recog_mask_conversion_pattern (vec< return pattern_stmt; } +/* STMT is a load or store. If the load or store is conditional, return + the boolean condition under which it occurs, otherwise return null. */ + +static tree +vect_get_load_store_mask (gimple *stmt) +{ + if (gassign *def_assign = dyn_cast (stmt)) + { + gcc_assert (gimple_assign_single_p (def_assign)); + return NULL_TREE; + } + + if (gcall *def_call = dyn_cast (stmt)) + { + internal_fn ifn = gimple_call_internal_fn (def_call); + int mask_index = internal_fn_mask_index (ifn); + return gimple_call_arg (def_call, mask_index); + } + + gcc_unreachable (); +} + +/* Return the scalar offset type that an internal gather/scatter function + should use. GS_INFO describes the gather/scatter operation. 
*/ + +static tree +vect_get_gather_scatter_offset_type (gather_scatter_info *gs_info) +{ + tree offset_type = TREE_TYPE (gs_info->offset); + unsigned int element_bits = tree_to_uhwi (TYPE_SIZE (gs_info->element_type)); + + /* Enforced by vect_check_gather_scatter. */ + unsigned int offset_bits = TYPE_PRECISION (offset_type); + gcc_assert (element_bits >= offset_bits); + + /* If the offset is narrower than the elements, extend it according + to its sign. */ + if (element_bits > offset_bits) + return build_nonstandard_integer_type (element_bits, + TYPE_UNSIGNED (offset_type)); + + return offset_type; +} + +/* Return MASK if MASK is suitable for masking an operation on vectors + of type VECTYPE, otherwise convert it into such a form and return + the result. Associate any conversion statements with STMT_INFO's + pattern. */ + +static tree +vect_convert_mask_for_vectype (tree mask, tree vectype, + stmt_vec_info stmt_info, vec_info *vinfo) +{ + tree mask_type = search_type_for_mask (mask, vinfo); + if (mask_type) + { + tree mask_vectype = get_mask_type_for_scalar_type (mask_type); + if (mask_vectype + && may_ne (TYPE_VECTOR_SUBPARTS (vectype), + TYPE_VECTOR_SUBPARTS (mask_vectype))) + mask = build_mask_conversion (mask, vectype, stmt_info, vinfo); + } + return mask; +} + +/* Return the equivalent of: + + fold_convert (TYPE, VALUE) + + with the expectation that the operation will be vectorized. + If new statements are needed, add them as pattern statements + to STMT_INFO. */ + +static tree +vect_add_conversion_to_patterm (tree type, tree value, + stmt_vec_info stmt_info, + vec_info *vinfo) +{ + if (useless_type_conversion_p (type, TREE_TYPE (value))) + return value; + + tree new_value = vect_recog_temp_ssa_var (type, NULL); + gassign *conversion = gimple_build_assign (new_value, CONVERT_EXPR, value); + stmt_vec_info new_stmt_info = new_stmt_vec_info (conversion, vinfo); + set_vinfo_for_stmt (conversion, new_stmt_info); + STMT_VINFO_VECTYPE (new_stmt_info) = get_vectype_for_scalar_type (type); + append_pattern_def_seq (stmt_info, conversion); + return new_value; +} + +/* Try to convert STMT into a call to a gather load or scatter store + internal function. Return the final statement on success and set + *TYPE_IN and *TYPE_OUT to the vector type being loaded or stored. + + This function only handles gathers and scatters that were recognized + as such from the outset (indicated by STMT_VINFO_GATHER_SCATTER_P). */ + +static gimple * +vect_try_gather_scatter_pattern (gimple *stmt, stmt_vec_info last_stmt_info, + tree *type_in, tree *type_out) +{ + /* Currently we only support this for loop vectorization. */ + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + loop_vec_info loop_vinfo = dyn_cast (stmt_info->vinfo); + if (!loop_vinfo) + return NULL; + + /* Make sure that we're looking at a gather load or scatter store. */ + data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + if (!dr || !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + return NULL; + + /* Reject stores for now. */ + if (!DR_IS_READ (dr)) + return NULL; + + /* Get the boolean that controls whether the load or store happens. + This is null if the operation is unconditional. */ + tree mask = vect_get_load_store_mask (stmt); + + /* Make sure that the target supports an appropriate internal + function for the gather/scatter operation. */ + gather_scatter_info gs_info; + if (!vect_check_gather_scatter (stmt, loop_vinfo, &gs_info)) + return NULL; + + /* Convert the mask to the right form. 
*/ + tree gs_vectype = get_vectype_for_scalar_type (gs_info.element_type); + if (mask) + mask = vect_convert_mask_for_vectype (mask, gs_vectype, last_stmt_info, + loop_vinfo); + + /* Get the invariant base and non-invariant offset, converting the + latter to the same width as the vector elements. */ + tree base = gs_info.base; + tree offset_type = vect_get_gather_scatter_offset_type (&gs_info); + tree offset = vect_add_conversion_to_patterm (offset_type, gs_info.offset, + last_stmt_info, loop_vinfo); + + /* Build the new pattern statement. */ + tree scale = size_int (gs_info.scale); + gcall *pattern_stmt; + if (DR_IS_READ (dr)) + { + if (mask != NULL) + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base, + offset, scale, mask); + else + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 3, base, + offset, scale); + tree load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); + gimple_call_set_lhs (pattern_stmt, load_lhs); + } + else + /* Not yet supported. */ + gcc_unreachable (); + gimple_call_set_nothrow (pattern_stmt, true); + + /* Copy across relevant vectorization info and associate DR with the + new pattern statement instead of the original statement. */ + stmt_vec_info pattern_stmt_info = new_stmt_vec_info (pattern_stmt, + loop_vinfo); + set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info); + STMT_VINFO_DATA_REF (pattern_stmt_info) = dr; + STMT_VINFO_DR_WRT_VEC_LOOP (pattern_stmt_info) + = STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info); + STMT_VINFO_GATHER_SCATTER_P (pattern_stmt_info) + = STMT_VINFO_GATHER_SCATTER_P (stmt_info); + DR_STMT (dr) = pattern_stmt; + + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + *type_out = vectype; + *type_in = vectype; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "gather/scatter pattern detected:\n"); + + return pattern_stmt; +} + +/* Pattern wrapper around vect_try_gather_scatter_pattern. */ + +static gimple * +vect_recog_gather_scatter_pattern (vec *stmts, tree *type_in, + tree *type_out) +{ + gimple *last_stmt = stmts->pop (); + stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt); + gimple *pattern_stmt = vect_try_gather_scatter_pattern (last_stmt, + last_stmt_info, + type_in, type_out); + if (pattern_stmt) + stmts->safe_push (last_stmt); + return pattern_stmt; +} /* Mark statements that are involved in a pattern. */ Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-11-17 21:57:43.531042657 +0000 +++ gcc/tree-vect-stmts.c 2017-11-17 21:57:43.920003721 +0000 @@ -389,21 +389,19 @@ exist_non_indexing_operands_for_use_p (t { if (is_gimple_call (stmt) && gimple_call_internal_p (stmt)) - switch (gimple_call_internal_fn (stmt)) - { - case IFN_MASK_STORE: - operand = gimple_call_arg (stmt, 3); - if (operand == use) - return true; - /* FALLTHRU */ - case IFN_MASK_LOAD: - operand = gimple_call_arg (stmt, 2); - if (operand == use) - return true; - break; - default: - break; - } + { + internal_fn ifn = gimple_call_internal_fn (stmt); + int mask_index = internal_fn_mask_index (ifn); + if (mask_index >= 0 + && use == gimple_call_arg (stmt, mask_index)) + return true; + if (internal_gather_scatter_fn_p (ifn) + && use == gimple_call_arg (stmt, 1)) + return true; + if (ifn == IFN_MASK_STORE + && use == gimple_call_arg (stmt, 3)) + return true; + } return false; } @@ -1725,6 +1723,8 @@ static tree permute_vec_elements (tree, is the type of the vector being loaded or stored. 
MEMORY_ACCESS_TYPE says how the load or store is going to be implemented and GROUP_SIZE is the number of load or store statements in the containing group. + If the access is a gather load or scatter store, GS_INFO describes + its arguments. Clear LOOP_VINFO_CAN_FULLY_MASK_P if a fully-masked loop is not supported, otherwise record the required mask types. */ @@ -1732,7 +1732,8 @@ static tree permute_vec_elements (tree, static void check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, vec_load_store_type vls_type, int group_size, - vect_memory_access_type memory_access_type) + vect_memory_access_type memory_access_type, + gather_scatter_info *gs_info) { /* Invariant loads need no special support. */ if (memory_access_type == VMAT_INVARIANT) @@ -1760,6 +1761,29 @@ check_load_store_masking (loop_vec_info return; } + if (memory_access_type == VMAT_GATHER_SCATTER) + { + gcc_assert (is_load); + tree offset_type = TREE_TYPE (gs_info->offset); + if (!internal_gather_scatter_fn_supported_p (IFN_MASK_GATHER_LOAD, + vectype, + gs_info->memory_type, + TYPE_SIGN (offset_type), + gs_info->scale)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "can't use a fully-masked loop because the" + " target doesn't have an appropriate masked" + " gather load instruction.\n"); + LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false; + return; + } + unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype); + vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype); + return; + } + if (memory_access_type != VMAT_CONTIGUOUS && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) { @@ -2591,6 +2615,31 @@ vect_build_gather_load_calls (gimple *st } } +/* Prepare the base and offset in GS_INFO for vectorization. + Set *DATAREF_PTR to the loop-invariant base address and *VEC_OFFSET + to the vectorized offset argument for the first copy of STMT. STMT + is the statement described by GS_INFO and LOOP is the containing loop. */ + +static void +vect_get_gather_scatter_ops (struct loop *loop, gimple *stmt, + gather_scatter_info *gs_info, + tree *dataref_ptr, tree *vec_offset) +{ + gimple_seq stmts = NULL; + *dataref_ptr = force_gimple_operand (gs_info->base, &stmts, true, NULL_TREE); + if (stmts != NULL) + { + basic_block new_bb; + edge pe = loop_preheader_edge (loop); + new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); + gcc_assert (!new_bb); + } + tree offset_type = TREE_TYPE (gs_info->offset); + tree offset_vectype = get_vectype_for_scalar_type (offset_type); + *vec_offset = vect_get_vec_def_for_operand (gs_info->offset, stmt, + offset_vectype); +} + /* Check and perform vectorization of BUILT_IN_BSWAP{16,32,64}. */ static bool @@ -2780,7 +2829,7 @@ vectorizable_call (gimple *gs, gimple_st return false; if (gimple_call_internal_p (stmt) - && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + && (internal_load_fn_p (gimple_call_internal_fn (stmt)) || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) /* Handled by vectorizable_load and vectorizable_store. */ return false; @@ -5965,7 +6014,7 @@ vectorizable_store (gimple *stmt, gimple if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)) check_load_store_masking (loop_vinfo, vectype, vls_type, group_size, - memory_access_type); + memory_access_type, &gs_info); STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; /* The SLP costs are calculated during SLP analysis. 
*/ @@ -6937,7 +6986,11 @@ vectorizable_load (gimple *stmt, gimple_ else { gcall *call = dyn_cast (stmt); - if (!call || !gimple_call_internal_p (call, IFN_MASK_LOAD)) + if (!call || !gimple_call_internal_p (call)) + return false; + + internal_fn ifn = gimple_call_internal_fn (call); + if (!internal_load_fn_p (ifn)) return false; scalar_dest = gimple_call_lhs (call); @@ -6952,9 +7005,13 @@ vectorizable_load (gimple *stmt, gimple_ return false; } - mask = gimple_call_arg (call, 2); - if (!vect_check_load_store_mask (stmt, mask, &mask_vectype)) - return false; + int mask_index = internal_fn_mask_index (ifn); + if (mask_index >= 0) + { + mask = gimple_call_arg (call, mask_index); + if (!vect_check_load_store_mask (stmt, mask, &mask_vectype)) + return false; + } } if (!STMT_VINFO_DATA_REF (stmt_info)) @@ -7078,7 +7135,7 @@ vectorizable_load (gimple *stmt, gimple_ TYPE_MODE (mask_vectype), true)) return false; } - else if (memory_access_type == VMAT_GATHER_SCATTER) + else if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info.decl)); tree masktype @@ -7092,7 +7149,8 @@ vectorizable_load (gimple *stmt, gimple_ return false; } } - else if (memory_access_type != VMAT_LOAD_STORE_LANES) + else if (memory_access_type != VMAT_LOAD_STORE_LANES + && memory_access_type != VMAT_GATHER_SCATTER) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -7109,7 +7167,7 @@ vectorizable_load (gimple *stmt, gimple_ if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)) check_load_store_masking (loop_vinfo, vectype, VLS_LOAD, group_size, - memory_access_type); + memory_access_type, &gs_info); STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; /* The SLP costs are calculated during SLP analysis. 
*/ @@ -7131,7 +7189,7 @@ vectorizable_load (gimple *stmt, gimple_ ensure_base_align (dr); - if (memory_access_type == VMAT_GATHER_SCATTER) + if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { vect_build_gather_load_calls (stmt, gsi, vec_stmt, &gs_info, mask); return true; @@ -7576,6 +7634,7 @@ vectorizable_load (gimple *stmt, gimple_ aggr_type = vectype; tree vec_mask = NULL_TREE; + tree vec_offset = NULL_TREE; prev_stmt_info = NULL; poly_uint64 group_elt = 0; vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); @@ -7618,6 +7677,12 @@ vectorizable_load (gimple *stmt, gimple_ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, diff); } + else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + { + vect_get_gather_scatter_ops (loop, stmt, &gs_info, + &dataref_ptr, &vec_offset); + inv_p = false; + } else dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, at_loop, @@ -7633,6 +7698,13 @@ vectorizable_load (gimple *stmt, gimple_ if (dataref_offset) dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, TYPE_SIZE_UNIT (aggr_type)); + else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + { + gimple *def_stmt; + vect_def_type dt; + vect_is_simple_use (vec_offset, loop_vinfo, &def_stmt, &dt); + vec_offset = vect_get_vec_def_for_stmt_copy (dt, vec_offset); + } else dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, TYPE_SIZE_UNIT (aggr_type)); @@ -7721,6 +7793,24 @@ vectorizable_load (gimple *stmt, gimple_ { unsigned int align, misalign; + if (memory_access_type == VMAT_GATHER_SCATTER) + { + tree scale = size_int (gs_info.scale); + gcall *call; + if (masked_loop_p) + call = gimple_build_call_internal + (IFN_MASK_GATHER_LOAD, 4, dataref_ptr, + vec_offset, scale, final_mask); + else + call = gimple_build_call_internal + (IFN_GATHER_LOAD, 3, dataref_ptr, + vec_offset, scale); + gimple_call_set_nothrow (call, true); + new_stmt = call; + data_ref = NULL_TREE; + break; + } + align = DR_TARGET_ALIGNMENT (dr); if (alignment_support_scheme == dr_aligned) { Index: gcc/config/aarch64/aarch64.md =================================================================== --- gcc/config/aarch64/aarch64.md 2017-11-17 21:57:43.531042657 +0000 +++ gcc/config/aarch64/aarch64.md 2017-11-17 21:57:43.914004322 +0000 @@ -151,6 +151,7 @@ (define_c_enum "unspec" [ UNSPEC_XPACLRI UNSPEC_LD1_SVE UNSPEC_ST1_SVE + UNSPEC_LD1_GATHER UNSPEC_MERGE_PTRUE UNSPEC_PTEST_PTRUE UNSPEC_UNPACKSHI Index: gcc/config/aarch64/iterators.md =================================================================== --- gcc/config/aarch64/iterators.md 2017-11-17 21:57:43.531042657 +0000 +++ gcc/config/aarch64/iterators.md 2017-11-17 21:57:43.914004322 +0000 @@ -276,6 +276,12 @@ (define_mode_iterator SVE_HSF [VNx8HF VN ;; All SVE vector modes that have 32-bit or 64-bit elements. (define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF]) +;; All SVE vector modes that have 32-bit elements. +(define_mode_iterator SVE_S [VNx4SI VNx4SF]) + +;; All SVE vector modes that have 64-bit elements. +(define_mode_iterator SVE_D [VNx2DI VNx2DF]) + ;; All SVE integer vector modes that have 32-bit or 64-bit elements. 
(define_mode_iterator SVE_SDI [VNx4SI VNx2DI]) Index: gcc/config/aarch64/predicates.md =================================================================== --- gcc/config/aarch64/predicates.md 2017-11-17 21:57:43.531042657 +0000 +++ gcc/config/aarch64/predicates.md 2017-11-17 21:57:43.914004322 +0000 @@ -596,3 +596,11 @@ (define_predicate "aarch64_sve_float_mul (define_predicate "aarch64_sve_vec_perm_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_constant_vector_operand"))) + +(define_predicate "aarch64_gather_scale_operand_w" + (and (match_code "const_int") + (match_test "INTVAL (op) == 1 || INTVAL (op) == 4"))) + +(define_predicate "aarch64_gather_scale_operand_d" + (and (match_code "const_int") + (match_test "INTVAL (op) == 1 || INTVAL (op) == 8"))) Index: gcc/config/aarch64/aarch64-sve.md =================================================================== --- gcc/config/aarch64/aarch64-sve.md 2017-11-17 21:57:43.531042657 +0000 +++ gcc/config/aarch64/aarch64-sve.md 2017-11-17 21:57:43.913004422 +0000 @@ -189,6 +189,63 @@ (define_insn "maskstore" "st1\t%1., %2, %0" ) +;; Unpredicated gather loads. +(define_expand "gather_load" + [(set (match_operand:SVE_SD 0 "register_operand") + (unspec:SVE_SD + [(match_dup 5) + (match_operand:DI 1 "aarch64_reg_or_zero") + (match_operand: 2 "register_operand") + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + "TARGET_SVE" + { + operands[5] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated gather loads for 32-bit elements. Operand 3 is true for +;; unsigned extension and false for signed extensions. +(define_insn "mask_gather_load" + [(set (match_operand:SVE_S 0 "register_operand" "=w, w, w, w, w") + (unspec:SVE_S + [(match_operand: 5 "register_operand" "Upl, Upl, Upl, Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "Z, rk, rk, rk, rk") + (match_operand: 2 "register_operand" "w, w, w, w, w") + (match_operand:DI 3 "const_int_operand" "i, Z, Ui1, Z, Ui1") + (match_operand:DI 4 "aarch64_gather_scale_operand_w" "Ui1, Ui1, Ui1, i, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + "TARGET_SVE" + "@ + ld1w\t%0.s, %5/z, [%2.s] + ld1w\t%0.s, %5/z, [%1, %2.s, sxtw] + ld1w\t%0.s, %5/z, [%1, %2.s, uxtw] + ld1w\t%0.s, %5/z, [%1, %2.s, sxtw %p4] + ld1w\t%0.s, %5/z, [%1, %2.s, uxtw %p4]" +) + +;; Predicated gather loads for 64-bit elements. The value of operand 3 +;; doesn't matter in this case. +(define_insn "mask_gather_load" + [(set (match_operand:SVE_D 0 "register_operand" "=w, w, w") + (unspec:SVE_D + [(match_operand: 5 "register_operand" "Upl, Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "Z, rk, rk") + (match_operand: 2 "register_operand" "w, w, w") + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_d" "Ui1, Ui1, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + "TARGET_SVE" + "@ + ld1d\t%0.d, %5/z, [%2.d] + ld1d\t%0.d, %5/z, [%1, %2.d] + ld1d\t%0.d, %5/z, [%1, %2.d, lsl %p4]" +) + ;; SVE structure moves. 
(define_expand "mov" [(set (match_operand:SVE_STRUCT 0 "nonimmediate_operand") Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_1.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_1.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,32 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#include + +#ifndef INDEX32 +#define INDEX32 int32_t +#define INDEX64 int64_t +#endif + +/* Invoked 18 times for each data size. */ +#define TEST_LOOP(DATA_TYPE, BITS) \ + void __attribute__ ((noinline, noclone)) \ + f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \ + INDEX##BITS *indices, int n) \ + { \ + for (int i = 9; i < n; ++i) \ + dest[i] += src[indices[i]]; \ + } + +#define TEST_ALL(T) \ + T (int32_t, 32) \ + T (uint32_t, 32) \ + T (float, 32) \ + T (int64_t, 64) \ + T (uint64_t, 64) \ + T (double, 64) + +TEST_ALL (TEST_LOOP) + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 3 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_2.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_2.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,10 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define INDEX32 uint32_t +#define INDEX64 uint64_t + +#include "sve_gather_load_1.c" + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, uxtw 2\]\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 3 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_3.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_3.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,32 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#include + +#ifndef INDEX32 +#define INDEX32 int32_t +#define INDEX64 int64_t +#endif + +/* Invoked 18 times for each data size. 
*/ +#define TEST_LOOP(DATA_TYPE, BITS) \ + void __attribute__ ((noinline, noclone)) \ + f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \ + INDEX##BITS *indices, int n) \ + { \ + for (int i = 9; i < n; ++i) \ + dest[i] += *(DATA_TYPE *) ((char *) src + indices[i]); \ + } + +#define TEST_ALL(T) \ + T (int32_t, 32) \ + T (uint32_t, 32) \ + T (float, 32) \ + T (int64_t, 64) \ + T (uint64_t, 64) \ + T (double, 64) + +TEST_ALL (TEST_LOOP) + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d\]\n} 3 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_4.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_4.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,10 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define INDEX32 uint32_t +#define INDEX64 uint64_t + +#include "sve_gather_load_3.c" + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d\]\n} 3 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_5.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_5.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,23 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#include + +/* Invoked 18 times for each data size. */ +#define TEST_LOOP(DATA_TYPE) \ + void __attribute__ ((noinline, noclone)) \ + f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict *src, \ + int n) \ + { \ + for (int i = 9; i < n; ++i) \ + dest[i] += *src[i]; \ + } + +#define TEST_ALL(T) \ + T (int64_t) \ + T (uint64_t) \ + T (double) + +TEST_ALL (TEST_LOOP) + +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+.d\]\n} 3 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_6.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_6.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,36 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -fwrapv -march=armv8-a+sve --save-temps" } */ + +#include + +#ifndef INDEX32 +#define INDEX16 int16_t +#define INDEX32 int32_t +#endif + +/* Invoked 18 times for each data size. 
*/ +#define TEST_LOOP(DATA_TYPE, BITS) \ + void __attribute__ ((noinline, noclone)) \ + f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \ + INDEX##BITS *indices, INDEX##BITS mask, int n) \ + { \ + for (int i = 9; i < n; ++i) \ + dest[i] = src[(INDEX##BITS) (indices[i] | mask)]; \ + } + +#define TEST_ALL(T) \ + T (int32_t, 16) \ + T (uint32_t, 16) \ + T (float, 16) \ + T (int64_t, 32) \ + T (uint64_t, 32) \ + T (double, 32) + +TEST_ALL (TEST_LOOP) + +/* { dg-final { scan-assembler-times {\tsunpkhi\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tsunpklo\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tsunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tsunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 6 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 6 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_gather_load_7.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_gather_load_7.c 2017-11-17 21:57:43.917004022 +0000 @@ -0,0 +1,15 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define INDEX16 uint16_t +#define INDEX32 uint32_t + +#include "sve_gather_load_6.c" + +/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */ +/* Either extension type is OK here. 
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, [us]xtw 2\]\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 6 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_1.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_1.c 2017-11-17 21:57:43.917004022 +0000
@@ -0,0 +1,52 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE, BITS) \
+ void \
+ f_##DATA_TYPE##_##CMP_TYPE \
+ (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+ CMP_TYPE *cmp1, CMP_TYPE *cmp2, INDEX##BITS *indices, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ if (cmp1[i] == cmp2[i]) \
+ dest[i] += src[indices[i]]; \
+ }
+
+#define TEST32(T, DATA_TYPE) \
+ T (DATA_TYPE, int32_t, 32) \
+ T (DATA_TYPE, uint32_t, 32) \
+ T (DATA_TYPE, float, 32)
+
+#define TEST64(T, DATA_TYPE) \
+ T (DATA_TYPE, int64_t, 64) \
+ T (DATA_TYPE, uint64_t, 64) \
+ T (DATA_TYPE, double, 64)
+
+#define TEST_ALL(T) \
+ TEST32 (T, int32_t) \
+ TEST32 (T, uint32_t) \
+ TEST32 (T, float) \
+ TEST64 (T, int64_t) \
+ TEST64 (T, uint64_t) \
+ TEST64 (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, x[0-9]+, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_2.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_2.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_mask_gather_load_1.c"
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, x[0-9]+, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_3.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_3.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,52 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -ffast-math --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE, BITS) \
+ void \
+ f_##DATA_TYPE##_##CMP_TYPE \
+ (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+ CMP_TYPE *cmp1, CMP_TYPE *cmp2, INDEX##BITS *indices, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ if (cmp1[i] == cmp2[i]) \
+ dest[i] += *(DATA_TYPE *) ((char *) src + indices[i]); \
+ }
+
+#define TEST32(T, DATA_TYPE) \
+ T (DATA_TYPE, int32_t, 32) \
+ T (DATA_TYPE, uint32_t, 32) \
+ T (DATA_TYPE, float, 32)
+
+#define TEST64(T, DATA_TYPE) \
+ T (DATA_TYPE, int64_t, 64) \
+ T (DATA_TYPE, uint64_t, 64) \
+ T (DATA_TYPE, double, 64)
+
+#define TEST_ALL(T) \
+ TEST32 (T, int32_t) \
+ TEST32 (T, uint32_t) \
+ TEST32 (T, float) \
+ TEST64 (T, int64_t) \
+ TEST64 (T, uint64_t) \
+ TEST64 (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, x[0-9]+, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_4.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_4.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -ffast-math --save-temps" } */
+
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_mask_gather_load_3.c"
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, x[0-9]+, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_5.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_5.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -ffast-math --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE) \
+ void \
+ f_##DATA_TYPE##_##CMP_TYPE \
+ (DATA_TYPE *restrict dest, DATA_TYPE *restrict *restrict src, \
+ CMP_TYPE *cmp1, CMP_TYPE *cmp2, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ if (cmp1[i] == cmp2[i]) \
+ dest[i] += *src[i]; \
+ }
+
+#define TEST_TYPE(T, DATA_TYPE) \
+ T (DATA_TYPE, int64_t) \
+ T (DATA_TYPE, uint64_t) \
+ T (DATA_TYPE, double)
+
+#define TEST_ALL(T) \
+ TEST_TYPE (T, int64_t) \
+ TEST_TYPE (T, uint64_t) \
+ TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, x[0-9]+, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_6.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_6.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE, INDEX_TYPE) \
+ void \
+ f_##DATA_TYPE##_##CMP_TYPE##_##INDEX_TYPE \
+ (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+ CMP_TYPE *cmp1, CMP_TYPE *cmp2, INDEX_TYPE *indices, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ if (cmp1[i] == cmp2[i]) \
+ dest[i] += src[indices[i]]; \
+ }
+
+#define TEST32(T, DATA_TYPE) \
+ T (DATA_TYPE, int64_t, int32_t) \
+ T (DATA_TYPE, uint64_t, int32_t) \
+ T (DATA_TYPE, double, int32_t) \
+ T (DATA_TYPE, int64_t, uint32_t) \
+ T (DATA_TYPE, uint64_t, uint32_t) \
+ T (DATA_TYPE, double, uint32_t)
+
+#define TEST_ALL(T) \
+ TEST32 (T, int32_t) \
+ TEST32 (T, uint32_t) \
+ TEST32 (T, float)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 72 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, x[0-9]+, lsl 2\]\n} 18 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_7.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_gather_load_7.c 2017-11-17 21:57:43.918003922 +0000
@@ -0,0 +1,53 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE, INDEX_TYPE) \
+ void \
+ f_##DATA_TYPE##_##CMP_TYPE##_##INDEX_TYPE \
+ (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+ CMP_TYPE *cmp1, CMP_TYPE *cmp2, INDEX_TYPE *indices, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ if (cmp1[i] == cmp2[i]) \
+ dest[i] += src[indices[i]]; \
+ }
+
+#define TEST32(T, DATA_TYPE) \
+ T (DATA_TYPE, int16_t, int32_t) \
+ T (DATA_TYPE, uint16_t, int32_t) \
+ T (DATA_TYPE, _Float16, int32_t) \
+ T (DATA_TYPE, int16_t, uint32_t) \
+ T (DATA_TYPE, uint16_t, uint32_t) \
+ T (DATA_TYPE, _Float16, uint32_t)
+
+#define TEST64(T, DATA_TYPE) \
+ T (DATA_TYPE, int32_t, int64_t) \
+ T (DATA_TYPE, uint32_t, int64_t) \
+ T (DATA_TYPE, float, int64_t) \
+ T (DATA_TYPE, int32_t, uint64_t) \
+ T (DATA_TYPE, uint32_t, uint64_t) \
+ T (DATA_TYPE, float, uint64_t)
+
+#define TEST_ALL(T) \
+ TEST32 (T, int32_t) \
+ TEST32 (T, uint32_t) \
+ TEST32 (T, float) \
+ TEST64 (T, int64_t) \
+ TEST64 (T, uint64_t) \
+ TEST64 (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 1\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 2\]\n} 18 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 2\]\n} 18 } } */
+
+/* Also used for the TEST32 indices. */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 72 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 36 } } */
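
For anyone reading the macro-heavy tests above, here is a hand expansion of one instantiation as a reading aid; it is illustrative only and not part of the patch.  It shows roughly what TEST_LOOP (int32_t, 32) in sve_gather_load_3.c produces with the default INDEX32 of int32_t; the f_##DATA_TYPE naming and the byte-offset addressing come straight from that macro:

#include <stdint.h>

/* Hand expansion of TEST_LOOP (int32_t, 32); illustrative only.  */
void __attribute__ ((noinline, noclone))
f_int32_t (int32_t *restrict dest, int32_t *restrict src,
           int32_t *indices, int n)
{
  for (int i = 9; i < n; ++i)
    dest[i] += *(int32_t *) ((char *) src + indices[i]);
}

With the options used in the tests (-O2 -ftree-vectorize -march=armv8-a+sve), this loop is expected to vectorize into the ld1w gather with sign-extended 32-bit offsets that the first scan-assembler-times pattern for sve_gather_load_3.c checks for.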