From patchwork Thu Oct 22 16:05:26 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 55444
Message-ID: <56290946.50804@arm.com>
Date: Thu, 22 Oct 2015 17:05:26 +0100
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Enable autoprefetcher modelling in the scheduler

Hi all,

This patch enables the autoprefetcher heuristic for scheduling in AArch64.
It is enabled for the Cortex-A53 and Cortex-A57 cores and is off for the
other cores, leaving their behaviour unchanged.

When enabled, the scheduler will try to sort groups of loads or stores in
order of their offset from a common base register. From what I understand
of the relevant scheduling hooks, there are essentially three levels of this:

1) Don't use the autoprefetcher heuristic.
2) Use it to order loads/stores but allow other scheduling heuristics to
   reorder them again to maximise multi-issue opportunities.
3) Use it to order loads/stores and keep that order, even if it can harm
   multi-issue opportunities.

With this patch I get a 0.4% improvement in SPECINT 2006 and a 1.7%
improvement in SPECFP 2006 on a Cortex-A57, as well as improvements in
various streaming workloads. On Cortex-A53 I see improvements in various
streaming workloads and there are no regressions or improvements on SPEC2000.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-10-22  Kyrylo Tkachov

    * config/aarch64/aarch64-protos.h (struct tune_params): Add
    autoprefetcher_model field.
    * config/aarch64/aarch64.c: Include params.h.
    (generic_tunings): Specify autoprefetcher_model value.
    (cortexa53_tunings): Likewise.
    (cortexa57_tunings): Likewise.
    (cortexa72_tunings): Likewise.
    (thunderx_tunings): Likewise.
    (xgene1_tunings): Likewise.
    (aarch64_first_cycle_multipass_dfa_lookahead_guard): New function.
    (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD): Define.
    (aarch64_override_options_internal): Set
    PARAM_SCHED_AUTOPREF_QUEUE_DEPTH param.
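
For illustration only (this is not part of the patch): a minimal standalone
C++ sketch of the three levels described in the cover letter. The enum values
and the queue-depth mapping mirror the switch the patch adds to
aarch64_override_options_internal. Everything else is an assumption made for
the example: MemAccess, queue_depth_for and sort_by_offset are hypothetical
names, not GCC types or functions, max_queue_index stands in for
max_insn_queue_index, and increasing offset order is assumed. The real
ordering is performed by the scheduler in haifa-sched.c and is considerably
more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mirrors the three levels of the patch's aarch64_autoprefetch_model enum.
enum class AutoprefetchModel { Off, Weak, Strong };

// Hypothetical stand-in for a load or store; not a GCC data structure.
struct MemAccess {
  int base_reg;  // number of the register holding the base address
  long offset;   // constant offset from that base
};

// Same mapping as the switch in the patch: Off -> -1, Weak -> 0,
// Strong -> max_insn_queue_index + 1. The index is passed in as a
// parameter here because this sketch has no scheduler state.
int queue_depth_for(AutoprefetchModel model, int max_queue_index) {
  assert(max_queue_index >= 0);
  switch (model) {
    case AutoprefetchModel::Off:
      return -1;
    case AutoprefetchModel::Weak:
      return 0;
    case AutoprefetchModel::Strong:
      return max_queue_index + 1;
  }
  return -1;  // not reached for valid enum values
}

// Order a group of accesses that share a base register by increasing
// offset (an assumption of this sketch). stable_sort keeps the original
// relative order of accesses with equal offsets.
std::vector<MemAccess> sort_by_offset(std::vector<MemAccess> group) {
  std::stable_sort(group.begin(), group.end(),
                   [](const MemAccess &a, const MemAccess &b) {
                     return a.offset < b.offset;
                   });
  return group;
}
```

In this sketch, a group with offsets 16, 0, 8 from the same base comes back
as 0, 8, 16. Under the weak model other heuristics may still reorder that
sequence to improve multi-issue; under the strong model the sorted order is
preferred even when that costs multi-issue opportunities.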
commit da29c21db2050a6fb3b8c428eb0fc20e63856b6c
Author: Kyrylo Tkachov
Date:   Wed Sep 30 09:29:59 2015 +0100

    [AArch64] Enable autoprefetcher modelling in the scheduler

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index baaf1bd..07839ef 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -194,6 +194,23 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+
+/* An enum specifying how to take into account CPU autoprefetch capabilities
+   during instruction scheduling:
+   - AUTOPREFETCHER_OFF: Do not take autoprefetch capabilities into account.
+   - AUTOPREFETCHER_WEAK: Attempt to sort sequences of loads/store in order of
+     offsets but allow the pipeline hazard recognizer to alter that order to
+     maximize multi-issue opportunities.
+   - AUTOPREFETCHER_STRONG: Attempt to sort sequences of loads/store in order of
+     offsets and prefer this even if it restricts multi-issue opportunities. */
+
+  enum aarch64_autoprefetch_model
+  {
+    AUTOPREFETCHER_OFF,
+    AUTOPREFETCHER_WEAK,
+    AUTOPREFETCHER_STRONG
+  } autoprefetcher_model;
+
   unsigned int extra_tuning_flags;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d4c5665..4c69dc8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -64,6 +64,7 @@
 #include "gimple-fold.h"
 #include "tree-eh.h"
 #include "gimplify.h"
+#include "params.h"
 #include "optabs.h"
 #include "dwarf2.h"
 #include "cfgloop.h"
@@ -364,6 +365,7 @@ static const struct tune_params generic_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
@@ -386,6 +388,7 @@ static const struct tune_params cortexa53_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
@@ -408,6 +411,7 @@ static const struct tune_params cortexa57_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */
 };
 
@@ -430,6 +434,7 @@ static const struct tune_params cortexa72_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
@@ -451,6 +456,7 @@ static const struct tune_params thunderx_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
@@ -472,6 +478,7 @@ static const struct tune_params xgene1_tunings =
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
@@ -7024,6 +7031,19 @@ aarch64_sched_first_cycle_multipass_dfa_lookahead (void)
 
   return issue_rate > 1 && !sched_fusion ? issue_rate : 0;
 }
+
+/* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD as
+   autopref_multipass_dfa_lookahead_guard from haifa-sched.c. It only
+   has an effect if PARAM_SCHED_AUTOPREF_QUEUE_DEPTH > 0. */
+
+static int
+aarch64_first_cycle_multipass_dfa_lookahead_guard (rtx_insn *insn,
+                                                   int ready_index)
+{
+  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+}
+
+
 /* Vectorizer cost model target hooks. */
 
 /* Implement targetm.vectorize.builtin_vectorization_cost. */
@@ -7615,6 +7635,29 @@ aarch64_override_options_internal (struct gcc_options *opts)
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
 
+  int queue_depth = 0;
+  switch (aarch64_tune_params.autoprefetcher_model)
+    {
+      case tune_params::AUTOPREFETCHER_OFF:
+        queue_depth = -1;
+        break;
+      case tune_params::AUTOPREFETCHER_WEAK:
+        queue_depth = 0;
+        break;
+      case tune_params::AUTOPREFETCHER_STRONG:
+        queue_depth = max_insn_queue_index + 1;
+        break;
+      default:
+        gcc_unreachable ();
+    }
+
+  /* We don't mind passing in global_options_set here as we don't use
+     the *options_set structs anyway. */
+  maybe_set_param_value (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH,
+                         queue_depth,
+                         opts->x_param_values,
+                         global_options_set.x_param_values);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
@@ -13481,6 +13524,10 @@ aarch64_promoted_type (const_tree t)
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   aarch64_sched_first_cycle_multipass_dfa_lookahead
 
+#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD \
+  aarch64_first_cycle_multipass_dfa_lookahead_guard
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init