From patchwork Thu Oct 22 16:05:26 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
X-Patchwork-Id: 55444
Return-Path: <patchwork-forward+bncBDFONDVM3EGBBZMSUSYQKGQEEFKSATI@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-wi0-f197.google.com (mail-wi0-f197.google.com
 [209.85.212.197])
 by patches.linaro.org (Postfix) with ESMTPS id BCEEF22AA5
 for <linaro@patches.linaro.org>; Thu, 22 Oct 2015 16:05:58 +0000 (UTC)
Received: by wikv3 with SMTP id v3sf36631703wik.1
 for <linaro@patches.linaro.org>; Thu, 22 Oct 2015 09:05:58 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :delivered-to:message-id:date:from:user-agent:mime-version:to:cc
 :subject:content-type:x-original-sender
 :x-original-authentication-results;
 bh=vyE+y4EOgmKU4+UWhv6reyavEID4sTFQk3N8upvWhT4=;
 b=RYhYiLEzUqzjnlVPJ/UuofH9VKgmIHxxImkKDQcloBaTAn49ggRRulGEMB4WkIDheX
 q20U0usEtzi0uiVHGl6crISeW4R2yHAUhQhm34lq0zj8mgYeoZ0MBehGkT4Jlsq+9Go2
 IaeqXvTSdXhrlFLmkTuOwaWADC48jODigzP2mv2cdPmQwwL/EDZVAVVIxipoMEo+rAiR
 CennSEsXiPQw+/tS2Xb32/MtyrR6BIx3y1VceUCyLfiU04PKhM+yZQMDoRbNC5SHMROx
 4WsGYofygniUGsXYMwNT+v0ObNJIf7dqynPFv0ItqF/9GsAYO+lG5/8KdGLxWCNfU8cT
 IPHQ==
X-Gm-Message-State: ALoCoQnD2RJBjAGsV4zf8JYfYZMzH8pS2tCXmWsXTubZZBN2wXRv1vJE4tIiQxyqqyJaHNcmsY/S
X-Received: by 10.112.169.34 with SMTP id ab2mr3876809lbc.23.1445529957947; 
 Thu, 22 Oct 2015 09:05:57 -0700 (PDT)
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.25.44.79 with SMTP id s76ls356441lfs.29.gmail; Thu, 22 Oct
 2015 09:05:57 -0700 (PDT)
X-Received: by 10.112.180.230 with SMTP id dr6mr8795805lbc.72.1445529957670; 
 Thu, 22 Oct 2015 09:05:57 -0700 (PDT)
Received: from mail-lf0-x22a.google.com (mail-lf0-x22a.google.com.
 [2a00:1450:4010:c07::22a]) by mx.google.com with ESMTPS id
 t1si10062805lbk.72.2015.10.22.09.05.57
 for <patchwork-forward@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 22 Oct 2015 09:05:57 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c07::22a as permitted sender)
 client-ip=2a00:1450:4010:c07::22a; 
Received: by lfbn126 with SMTP id n126so19417833lfb.2
 for <patchwork-forward@linaro.org>;
 Thu, 22 Oct 2015 09:05:57 -0700 (PDT)
X-Received: by 10.25.40.130 with SMTP id o124mr5614707lfo.41.1445529957456; 
 Thu, 22 Oct 2015 09:05:57 -0700 (PDT)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patch@linaro.org
Received: by 10.112.59.35 with SMTP id w3csp692578lbq;
 Thu, 22 Oct 2015 09:05:56 -0700 (PDT)
X-Received: by 10.68.166.68 with SMTP id ze4mr18478572pbb.74.1445529956121; 
 Thu, 22 Oct 2015 09:05:56 -0700 (PDT)
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 gy9si22032601pbc.83.2015.10.22.09.05.55 for <patch@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 22 Oct 2015 09:05:56 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-411075-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Received: (qmail 27103 invoked by alias); 22 Oct 2015 16:05:40 -0000
Mailing-List: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
Precedence: list
List-Id: <patchwork-forward.linaro.org>
List-Unsubscribe: <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>, 
 <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 27093 invoked by uid 89); 22 Oct 2015 16:05:39 -0000
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
 eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by
 sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Thu, 22 Oct 2015 16:05:33 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com
 [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id
 uk-mta-10-d-KOLBkUS5yTcaMfq_O2_A-1; Thu, 22 Oct 2015 17:05:27 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with
 Microsoft SMTPSVC(6.0.3790.3959); Thu, 22 Oct 2015 17:05:27 +0100
Message-ID: <56290946.50804@arm.com>
Date: Thu, 22 Oct 2015 17:05:26 +0100
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <marcus.shawcroft@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>,
 James Greenhalgh <james.greenhalgh@arm.com>
Subject: [PATCH][AArch64] Enable autoprefetcher modelling in the scheduler
X-MC-Unique: d-KOLBkUS5yTcaMfq_O2_A-1
X-IsSubscribed: yes
X-Original-Sender: kyrylo.tkachov@arm.com
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com:
 domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c07::22a as permitted sender)
 smtp.mailfrom=patch+caf_=patchwork-forward=linaro.org@linaro.org;
 dkim=pass header.i=@gcc.gnu.org
X-Google-Group-Id: 836684582541

Hi all,

This patch enables the autoprefetcher heuristic for scheduling in AArch64.
It is enabled for the Cortex-A53 and Cortex-A57 cores and is off for all other cores,
leaving their behaviour unchanged.

When enabled, the scheduler will try to sort groups of loads or stores in order of the offset from
a common base register.
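
To make the ordering idea concrete, here is a minimal standalone sketch. It is
not the haifa-sched.c implementation: MemAccess, same_group and
sort_group_by_offset are hypothetical names invented for illustration, and
ascending offset order is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a memory access; not a GCC data structure.
struct MemAccess
{
  int base_reg;   // number of the register used as the base address
  long offset;    // constant offset from that base
  bool is_store;  // true for a store, false for a load
};

// Two accesses belong to the same group when they use the same base
// register and are both loads or both stores.
inline bool
same_group (const MemAccess &a, const MemAccess &b)
{
  return a.base_reg == b.base_reg && a.is_store == b.is_store;
}

// Sort one group by offset.  Ascending order is an assumption made for this
// sketch.  std::stable_sort keeps the original relative order of accesses
// that have equal offsets.
inline void
sort_group_by_offset (std::vector<MemAccess> &group)
{
  std::stable_sort (group.begin (), group.end (),
		    [] (const MemAccess &a, const MemAccess &b)
		    { return a.offset < b.offset; });
}
```

For example, three loads from the same base register with offsets 16, 0 and 8
would come out of sort_group_by_offset in the order 0, 8, 16.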

As far as I understand the relevant scheduling hooks, there are essentially three levels of this:
1) Don't use the autoprefetcher heuristic.
2) Use it to order loads/stores but allow other scheduling heuristics to reorder them again to maximise multi-issue opportunities.
3) Use it to order loads/stores and keep that order, even if it can harm multi-issue opportunities.
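
The three levels can be sketched as a mapping onto a queue depth, using the
same values the diff in this patch passes to PARAM_SCHED_AUTOPREF_QUEUE_DEPTH.
This is a standalone illustration only: AutoprefetchModel, queue_depth_for and
lookahead_guard_active are hypothetical names, and max_queue_index is a plain
parameter standing in for GCC's max_insn_queue_index.

```cpp
#include <cassert>

// Hypothetical stand-in for the three levels described above.
enum class AutoprefetchModel { Off, Weak, Strong };

// Map a level to a queue depth, mirroring the switch statement in the diff:
// Off -> -1, Weak -> 0, Strong -> max_queue_index + 1.
inline int
queue_depth_for (AutoprefetchModel model, int max_queue_index)
{
  switch (model)
    {
    case AutoprefetchModel::Off:
      return -1;
    case AutoprefetchModel::Weak:
      return 0;
    case AutoprefetchModel::Strong:
      return max_queue_index + 1;
    }
  return -1;  // Not reached for valid enumerators.
}

// Per the comment in the patch, the lookahead guard only has an effect
// when the queue depth is greater than zero.
inline bool
lookahead_guard_active (int queue_depth)
{
  return queue_depth > 0;
}
```

With these definitions only the Strong level yields a positive depth, so only
Strong activates the lookahead guard that keeps the sorted order.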

With this patch I get a 0.4% improvement in SPECINT 2006 and a 1.7% improvement in SPECFP 2006 on a Cortex-A57,
as well as improvements in various streaming workloads.

On Cortex-A53 I see improvements in various streaming workloads, and there are no regressions or improvements on SPEC2000.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-10-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64-protos.h
     (struct tune_params): Add autoprefetcher_model field.
     * config/aarch64/aarch64.c: Include params.h.
     (generic_tunings): Specify autoprefetcher_model value.
     (cortexa53_tunings): Likewise.
     (cortexa57_tunings): Likewise.
     (cortexa72_tunings): Likewise.
     (thunderx_tunings): Likewise.
     (xgene1_tunings): Likewise.
     (aarch64_first_cycle_multipass_dfa_lookahead_guard): New function.
     (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD): Define.
     (aarch64_override_options_internal): Set
     PARAM_SCHED_AUTOPREF_QUEUE_DEPTH param.

commit da29c21db2050a6fb3b8c428eb0fc20e63856b6c
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Sep 30 09:29:59 2015 +0100

    [AArch64] Enable autoprefetcher modelling in the scheduler

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index baaf1bd..07839ef 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -194,6 +194,23 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+
+/* An enum specifying how to take into account CPU autoprefetch capabilities
+   during instruction scheduling:
+   - AUTOPREFETCHER_OFF: Do not take autoprefetch capabilities into account.
+   - AUTOPREFETCHER_WEAK: Attempt to sort sequences of loads/store in order of
+   offsets but allow the pipeline hazard recognizer to alter that order to
+   maximize multi-issue opportunities.
+   - AUTOPREFETCHER_STRONG: Attempt to sort sequences of loads/store in order of
+   offsets and prefer this even if it restricts multi-issue opportunities.  */
+
+  enum aarch64_autoprefetch_model
+  {
+    AUTOPREFETCHER_OFF,
+    AUTOPREFETCHER_WEAK,
+    AUTOPREFETCHER_STRONG
+  } autoprefetcher_model;
+
   unsigned int extra_tuning_flags;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d4c5665..4c69dc8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -64,6 +64,7 @@
 #include "gimple-fold.h"
 #include "tree-eh.h"
 #include "gimplify.h"
+#include "params.h"
 #include "optabs.h"
 #include "dwarf2.h"
 #include "cfgloop.h"
@@ -364,6 +365,7 @@ static const struct tune_params generic_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
@@ -386,6 +388,7 @@ static const struct tune_params cortexa53_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
@@ -408,6 +411,7 @@ static const struct tune_params cortexa57_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
 };
 
@@ -430,6 +434,7 @@ static const struct tune_params cortexa72_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
@@ -451,6 +456,7 @@ static const struct tune_params thunderx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
@@ -472,6 +478,7 @@ static const struct tune_params xgene1_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
@@ -7024,6 +7031,19 @@ aarch64_sched_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate > 1 && !sched_fusion ? issue_rate : 0;
 }
 
+
+/* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD as
+   autopref_multipass_dfa_lookahead_guard from haifa-sched.c.  It only
+   has an effect if PARAM_SCHED_AUTOPREF_QUEUE_DEPTH > 0.  */
+
+static int
+aarch64_first_cycle_multipass_dfa_lookahead_guard (rtx_insn *insn,
+						    int ready_index)
+{
+  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+}
+
+
 /* Vectorizer cost model target hooks.  */
 
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
@@ -7615,6 +7635,29 @@ aarch64_override_options_internal (struct gcc_options *opts)
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
 
+  int queue_depth = 0;
+  switch (aarch64_tune_params.autoprefetcher_model)
+    {
+      case tune_params::AUTOPREFETCHER_OFF:
+	queue_depth = -1;
+	break;
+      case tune_params::AUTOPREFETCHER_WEAK:
+	queue_depth = 0;
+	break;
+      case tune_params::AUTOPREFETCHER_STRONG:
+	queue_depth = max_insn_queue_index + 1;
+	break;
+      default:
+	gcc_unreachable ();
+    }
+
+  /* We don't mind passing in global_options_set here as we don't use
+     the *options_set structs anyway.  */
+  maybe_set_param_value (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH,
+			 queue_depth,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
@@ -13481,6 +13524,10 @@ aarch64_promoted_type (const_tree t)
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   aarch64_sched_first_cycle_multipass_dfa_lookahead
 
+#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD \
+  aarch64_first_cycle_multipass_dfa_lookahead_guard
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init