From patchwork Thu Oct 15 15:27:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
X-Patchwork-Id: 55042
Return-Path: <patchwork-forward+bncBDFONDVM3EGBBDUM76YAKGQEVXOUD6Y@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-lb0-f200.google.com (mail-lb0-f200.google.com
 [209.85.217.200])
 by patches.linaro.org (Postfix) with ESMTPS id 62E8F2301F
 for <linaro@patches.linaro.org>; Thu, 15 Oct 2015 15:28:15 +0000 (UTC)
Received: by lbbms9 with SMTP id ms9sf11838109lbb.3
 for <linaro@patches.linaro.org>; Thu, 15 Oct 2015 08:28:14 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :delivered-to:message-id:date:from:user-agent:mime-version:to:cc
 :subject:references:in-reply-to:content-type:x-original-sender
 :x-original-authentication-results;
 bh=xT/24E65bfm/dXBrVD/mIl/WMi/SD1Bb+Y22V3LubM4=;
 b=jQKFdNQMHZ+vXQJBTigBKvzN/xSVYOHvID9yJ4hnm3CcS0ClGgh1/nd1LhwYEkZGSg
 Nu3wcfP/YCVOLSZ3Y0Lq8v2SLAyTc8okYG0A8feAO1wlluAjozzZT3K8FZxDXt/ZyWmr
 n9MYoxvI9Ep6jURb7anP+G5OIF8Te60EL4yGrzB2xGQFnjVPwTfpi8+n5NWWdvd3VDM8
 G/lMyoesR1YHQMjRr8NXAe906oVed1wZezvHRM8NafA7ERPmKx7qXNz0Yo9kUet3sLzU
 kQ4xSMk51c3PE7PcUEl8C6LSOvkBIMxV7Wll83zl06tC3U/uqfQ2rbypcH03oNbGvxuY
 yO7A==
X-Gm-Message-State: ALoCoQll5XJZyM0Jo/u1h2Y5t9bGt+aXUUAfAvDkvJX8YIvfCYJ6tnUebs+ktUs8B/iq8yPJiiDJ
X-Received: by 10.112.158.202 with SMTP id ww10mr2393567lbb.13.1444922894238; 
 Thu, 15 Oct 2015 08:28:14 -0700 (PDT)
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.25.155.70 with SMTP id d67ls160832lfe.25.gmail; Thu, 15 Oct
 2015 08:28:14 -0700 (PDT)
X-Received: by 10.112.63.233 with SMTP id j9mr4925409lbs.113.1444922894090; 
 Thu, 15 Oct 2015 08:28:14 -0700 (PDT)
Received: from mail-lb0-x233.google.com (mail-lb0-x233.google.com.
 [2a00:1450:4010:c04::233]) by mx.google.com with ESMTPS id
 wk2si9483923lbb.42.2015.10.15.08.28.13
 for <patchwork-forward@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 15 Oct 2015 08:28:13 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::233 as permitted sender)
 client-ip=2a00:1450:4010:c04::233; 
Received: by lbbpp2 with SMTP id pp2so44210209lbb.0
 for <patchwork-forward@linaro.org>;
 Thu, 15 Oct 2015 08:28:13 -0700 (PDT)
X-Received: by 10.112.132.7 with SMTP id oq7mr5054904lbb.32.1444922893315;
 Thu, 15 Oct 2015 08:28:13 -0700 (PDT)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patch@linaro.org
Received: by 10.112.59.35 with SMTP id w3csp702103lbq;
 Thu, 15 Oct 2015 08:28:11 -0700 (PDT)
X-Received: by 10.50.12.8 with SMTP id u8mr10958886igb.56.1444922891806;
 Thu, 15 Oct 2015 08:28:11 -0700 (PDT)
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 qh9si21753287igb.88.2015.10.15.08.28.11 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 15 Oct 2015 08:28:11 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-410268-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Received: (qmail 67948 invoked by alias); 15 Oct 2015 15:27:53 -0000
Mailing-List: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
Precedence: list
List-Id: <patchwork-forward.linaro.org>
List-Unsubscribe: <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>, 
 <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 67931 invoked by uid 89); 15 Oct 2015 15:27:52 -0000
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO
 eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by
 sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Thu, 15 Oct 2015 15:27:46 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
 [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id
 uk-mta-38-JVUwCx2HQ1q7wTOMQOav6Q-1; Thu, 15 Oct 2015 16:27:40 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with
 Microsoft SMTPSVC(6.0.3790.3959); Thu, 15 Oct 2015 16:27:40 +0100
Message-ID: <561FC5EB.6090906@arm.com>
Date: Thu, 15 Oct 2015 16:27:39 +0100
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Bernd Schmidt <bschmidt@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
CC: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>,
 Vladimir Makarov <vmakarov@redhat.com>
Subject: Re: [PATCH][haifa-sched] model load/store multiples properly in
 autoprefetcher scheduling
References: <561F748B.5090705@arm.com> <561F7D03.70708@redhat.com>
In-Reply-To: <561F7D03.70708@redhat.com>
X-MC-Unique: JVUwCx2HQ1q7wTOMQOav6Q-1
X-IsSubscribed: yes
X-Original-Sender: kyrylo.tkachov@arm.com
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com:
 domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::233 as permitted sender)
 smtp.mailfrom=patch+caf_=patchwork-forward=linaro.org@linaro.org;
 dkim=pass header.i=@gcc.gnu.org
X-Google-Group-Id: 836684582541

On 15/10/15 11:16, Bernd Schmidt wrote:
> On 10/15/2015 11:40 AM, Kyrill Tkachov wrote:
>> The code that analyzes the offsets of the loads/stores doesn't try to
>> handle load/store-multiple insns.
>> These appear rather frequently in memory streaming workloads on aarch64
>> in the form of load-pair/store-pair instructions
>> i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a
>> subsequent peephole and during sched2 they appear
>> as PARALLEL rtxes of multiple SETs to/from memory.
>>
>
>>      * sched-int.h (struct autopref_multipass_data_): Remove offset
>>      field.  Add min_offset, max_offset, multi_mem_insn_p fields.
>>      * haifa-sched.c (analyze_set_insn_for_autopref): New function.
>>      (autopref_multipass_init): Use it.  Handle PARALLEL sets.
>>      (autopref_rank_data): New function.
>>      (autopref_rank_for_schedule): Use it.
>>      (autopref_multipass_dfa_lookahead_guard_1): Likewise.
>
> Looks pretty reasonable to me. Ok to commit with a few changes next Wednesday unless you hear from Vlad in the meantime (I just want to give him time to look at it).

Thanks, I'll wait as you suggested (and cc'ing Vlad).
In the meantime, here's the updated patch with the suggested changes for the record.

Kyrill

2015-10-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * sched-int.h (struct autopref_multipass_data_): Remove offset
     field.  Add min_offset, max_offset, multi_mem_insn_p fields.
     * haifa-sched.c (analyze_set_insn_for_autopref): New function.
     (autopref_multipass_init): Use it.  Handle PARALLEL sets.
     (autopref_rank_data): New function.
     (autopref_rank_for_schedule): Use it.
     (autopref_multipass_dfa_lookahead_guard_1): Likewise.

>
>> +/* Helper for autopref_multipass_init. Given a SET insn in PAT and whether
>> +   we're expecting a memory WRITE or not, check that the insn is relevant to
>> +   the autoprefetcher modelling code.  Return true iff that is the case.
>> +   If it is relevant record the base register of the memory op in BASE and
>> +   the offset in OFFSET.  */
>
> Comma before "record". I think I'd just use "SET" rather than "SET insn", because in my mind an insn is always more than just a (part of a) pattern.
>
>> +static bool
>> +analyze_set_insn_for_autopref (rtx pat, int write, rtx *base, int *offset)
>
> bool write?
>
>> +  /* This insn is relevant for auto-prefetcher.
>
> "the auto-prefetcher".
>
>>     if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
>>       return 0;
>>
>> -  if (rtx_equal_p (data1->base, data2->base)
>> -      && data1->offset > data2->offset)
>> +  bool delay_p = rtx_equal_p (data1->base, data2->base)
>> +          && autopref_rank_data (data1, data2) > 0;
>> +  if (delay_p)
>
> If you want to do this you still need parentheses around the multi-line expression. I'd just keep it inside the if condition.
>
>> +  /* Is this a load/store-multiple instruction.  */
>> +  bool multi_mem_insn_p;
>
> Maybe write "True if this is a ..."
>
>
> Bernd
>

commit e90bcc38961f2911f5ea158a415dcde807e3f551
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Sep 29 16:58:05 2015 +0100

    model load/store pairs properly in autoprefetcher scheduling

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index c35d777..46751fe 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5533,6 +5533,35 @@ insn_finishes_cycle_p (rtx_insn *insn)
   return false;
 }
 
+/* Helper for autopref_multipass_init.  Given a SET in PAT and whether
+   we're expecting a memory WRITE or not, check that the insn is relevant to
+   the autoprefetcher modelling code.  Return true iff that is the case.
+   If it is relevant, record the base register of the memory op in BASE and
+   the offset in OFFSET.  */
+
+static bool
+analyze_set_insn_for_autopref (rtx pat, bool write, rtx *base, int *offset)
+{
+  if (GET_CODE (pat) != SET)
+    return false;
+
+  rtx mem = write ? SET_DEST (pat) : SET_SRC (pat);
+  if (!MEM_P (mem))
+    return false;
+
+  struct address_info info;
+  decompose_mem_address (&info, mem);
+
+  /* TODO: Currently only (base+const) addressing is supported.  */
+  if (info.base == NULL || !REG_P (*info.base)
+      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
+    return false;
+
+  *base = *info.base;
+  *offset = info.disp ? INTVAL (*info.disp) : 0;
+  return true;
+}
+
 /* Functions to model cache auto-prefetcher.
 
    Some of the CPUs have cache auto-prefetcher, which /seems/ to initiate
@@ -5557,30 +5586,139 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 
   gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
   data->base = NULL_RTX;
-  data->offset = 0;
+  data->min_offset = 0;
+  data->max_offset = 0;
+  data->multi_mem_insn_p = false;
   /* Set insn entry initialized, but not relevant for auto-prefetcher.  */
   data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
 
+  rtx pat = PATTERN (insn);
+
+  /* We have a multi-set insn like a load-multiple or store-multiple.
+     We care about these as long as all the memory ops inside the PARALLEL
+     have the same base register.  We care about the minimum and maximum
+     offsets from that base but don't check for the order of those offsets
+     within the PARALLEL insn itself.  */
+  if (GET_CODE (pat) == PARALLEL)
+    {
+      int n_elems = XVECLEN (pat, 0);
+
+      int i = 0;
+      rtx prev_base = NULL_RTX;
+      int min_offset;
+      int max_offset;
+
+      for (i = 0; i < n_elems; i++)
+	{
+	  rtx set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) != SET)
+	    return;
+
+	  rtx base = NULL_RTX;
+	  int offset = 0;
+	  if (!analyze_set_insn_for_autopref (set, write, &base, &offset))
+	    return;
+
+	  if (i == 0)
+	    {
+	      prev_base = base;
+	      min_offset = offset;
+	      max_offset = offset;
+	    }
+	  /* Ensure that all memory operations in the PARALLEL use the same
+	     base register.  */
+	  else if (REGNO (base) != REGNO (prev_base))
+	    return;
+	  else
+	    {
+	      min_offset = MIN (min_offset, offset);
+	      max_offset = MAX (max_offset, offset);
+	    }
+	}
+
+      /* If we reached here then we have a valid PARALLEL of multiple memory
+	 ops with prev_base as the base and min_offset and max_offset
+	 containing the offsets range.  */
+      gcc_assert (prev_base);
+      data->base = prev_base;
+      data->min_offset = min_offset;
+      data->max_offset = max_offset;
+      data->multi_mem_insn_p = true;
+      data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+
+      return;
+    }
+
+  /* Otherwise this is a single set memory operation.  */
   rtx set = single_set (insn);
   if (set == NULL_RTX)
     return;
 
-  rtx mem = write ? SET_DEST (set) : SET_SRC (set);
-  if (!MEM_P (mem))
+  if (!analyze_set_insn_for_autopref (set, write, &data->base,
+				       &data->min_offset))
     return;
 
-  struct address_info info;
-  decompose_mem_address (&info, mem);
+  /* This insn is relevant for the auto-prefetcher.
+     The base and offset fields will have been filled in the
+     analyze_set_insn_for_autopref call above.  */
+  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+}
 
-  /* TODO: Currently only (base+const) addressing is supported.  */
-  if (info.base == NULL || !REG_P (*info.base)
-      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
-    return;
 
-  /* This insn is relevant for auto-prefetcher.  */
-  data->base = *info.base;
-  data->offset = info.disp ? INTVAL (*info.disp) : 0;
-  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+/* Helper for autopref_rank_for_schedule.  Given the data of two
+   insns relevant to the auto-prefetcher modelling code DATA1 and DATA2
+   return their comparison result.  Return 0 if there is no sensible
+   ranking order for the two insns.  */
+
+static int
+autopref_rank_data (autopref_multipass_data_t data1,
+		     autopref_multipass_data_t data2)
+{
+  /* Simple case when both insns are simple single memory ops.  */
+  if (!data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    return data1->min_offset - data2->min_offset;
+
+  /* Two load/store multiple insns.  Return 0 if the offset ranges
+     overlap and the difference between the minimum offsets otherwise.  */
+  else if (data1->multi_mem_insn_p && data2->multi_mem_insn_p)
+    {
+      int min1 = data1->min_offset;
+      int max1 = data1->max_offset;
+      int min2 = data2->min_offset;
+      int max2 = data2->max_offset;
+
+      if (max1 < min2 || min1 > max2)
+	return min1 - min2;
+      else
+	return 0;
+    }
+
+  /* The other two cases is a pair of a load/store multiple and
+     a simple memory op.  Return 0 if the single op's offset is within the
+     range of the multi-op insn and the difference between the single offset
+     and the minimum offset of the multi-set insn otherwise.  */
+  else if (data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    {
+      int max1 = data1->max_offset;
+      int min1 = data1->min_offset;
+
+      if (data2->min_offset >= min1
+	  && data2->min_offset <= max1)
+	return 0;
+      else
+	return min1 - data2->min_offset;
+    }
+  else
+    {
+      int max2 = data2->max_offset;
+      int min2 = data2->min_offset;
+
+      if (data1->min_offset >= min2
+	  && data1->min_offset <= max2)
+	return 0;
+      else
+	return data1->min_offset - min2;
+    }
 }
 
 /* Helper function for rank_for_schedule sorting.  */
@@ -5607,7 +5745,7 @@ autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
       if (!rtx_equal_p (data1->base, data2->base))
 	continue;
 
-      return data1->offset - data2->offset;
+      return autopref_rank_data (data1, data2);
     }
 
   return 0;
@@ -5633,7 +5771,7 @@ autopref_multipass_dfa_lookahead_guard_1 (const rtx_insn *insn1,
     return 0;
 
   if (rtx_equal_p (data1->base, data2->base)
-      && data1->offset > data2->offset)
+      && autopref_rank_data (data1, data2) > 0)
     {
       if (sched_verbose >= 2)
 	{
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 800262c..86f5821 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -807,8 +807,17 @@ struct autopref_multipass_data_
 {
   /* Base part of memory address.  */
   rtx base;
-  /* Memory offset.  */
-  int offset;
+
+  /* Memory offsets from the base.  For single simple sets
+     only min_offset is valid.  For multi-set insns min_offset
+     and max_offset record the minimum and maximum offsets from the same
+     base among the sets inside the PARALLEL.  */
+  int min_offset;
+  int max_offset;
+
+  /* True if this is a load/store-multiple instruction.  */
+  bool multi_mem_insn_p;
+
   /* Entry status.  */
   enum autopref_multipass_data_status status;
 };