From patchwork Fri Nov 14 00:56:13 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Pinski
X-Patchwork-Id: 40794
From: Andrew Pinski
To: gcc-patches@gcc.gnu.org
Cc: Andrew Pinski
Subject: [PATCH 2/3] [AARCH64] Add scheduler for ThunderX
Date: Thu, 13 Nov 2014 16:56:13 -0800
Message-Id: <1415926574-3080-3-git-send-email-apinski@cavium.com>
In-Reply-To: <1415926574-3080-1-git-send-email-apinski@cavium.com>
References: <1415926574-3080-1-git-send-email-apinski@cavium.com>
X-Original-Sender: apinski@cavium.com

This adds the scheduling model for ThunderX.  There are a few TODOs:
not all of the SIMD instructions are modeled yet.  The idea of a simple
shift/extend is not modeled either, so every instruction with a
shift/extend is treated as non-simple and takes two cycles rather than
the correct value of one cycle.  The 32-bit and 64-bit divides have
different cycle counts, but there is currently no way to model that.
Multiply high takes one cycle more than a normal multiply, but there is
currently no way to model that either.

Built and tested for aarch64-elf with no regressions.

ChangeLog:

	* config/aarch64/aarch64-cores.def (thunderx): Change the scheduler
	over to thunderx.
	* config/aarch64/aarch64.md: Include thunderx.md.
	(generic_sched): Set to no for thunderx.
	* config/aarch64/thunderx.md: New file.
---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md        |   3 +-
 gcc/config/aarch64/thunderx.md       | 260 ++++++++++++++++++++++++++++++++++
 3 files changed, 263 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/thunderx.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index b3318c3..471cdd6 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,7 +36,7 @@
 
 AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa53)
 AARCH64_CORE("cortex-a57", cortexa15, cortexa15, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57)
-AARCH64_CORE("thunderx", thunderx, cortexa53, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
+AARCH64_CORE("thunderx", thunderx, thunderx, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 17570ba..80f2db7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -191,13 +191,14 @@
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-          (eq_attr "tune" "cortexa53,cortexa15")
+          (eq_attr "tune" "cortexa53,cortexa15,thunderx")
           (const_string "no")
           (const_string "yes"))))
 
 ;; Scheduling
 (include "../arm/cortex-a53.md")
 (include "../arm/cortex-a15.md")
+(include "thunderx.md")
 
 ;; -------------------------------------------------------------------
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
new file mode 100644
index 0000000..30e4395
--- /dev/null
+++ b/gcc/config/aarch64/thunderx.md
@@ -0,0 +1,260 @@
+;; Cavium ThunderX pipeline description
+;; Copyright (C) 2014 Free Software Foundation, Inc.
+;;
+;; Written by Andrew Pinski
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;; Copyright (C) 2004, 2005, 2006 Cavium Networks.
+
+
+;; Thunder is a dual-issue processor that can issue all instructions on
+;; pipe0 and a subset on pipe1.
+
+
+(define_automaton "thunderx_main, thunderx_mult, thunderx_divide, thunderx_simd")
+
+(define_cpu_unit "thunderx_pipe0" "thunderx_main")
+(define_cpu_unit "thunderx_pipe1" "thunderx_main")
+(define_cpu_unit "thunderx_mult" "thunderx_mult")
+(define_cpu_unit "thunderx_divide" "thunderx_divide")
+(define_cpu_unit "thunderx_simd" "thunderx_simd")
+
+(define_insn_reservation "thunderx_add" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "adc_imm,adc_reg,adr,alu_imm,alu_sreg,alus_imm,alus_sreg,extend,logic_imm,logic_reg,logics_imm,logics_reg,mov_imm,mov_reg"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+(define_insn_reservation "thunderx_shift" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "bfm,extend,shift_imm,shift_reg"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+
+;; Arithmetic instructions with an extra shift or extend take two cycles.
+;; FIXME: This needs more attributes on aarch64 than what is currently there;
+;; this is conservative for now.
+;; Strictly, two cycles applies only for !(LSL && shift by 0/1/2/3)
+;; and only for !(zero extend); the simple forms take one cycle.
+
+(define_insn_reservation "thunderx_arith_shift" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "alu_ext,alu_shift_imm,alu_shift_reg,alus_ext,logic_shift_imm,logic_shift_reg,logics_shift_imm,logics_shift_reg,alus_shift_imm"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+(define_insn_reservation "thunderx_csel" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "csel"))
+  "thunderx_pipe0 | thunderx_pipe1")
+
+;; Multiply, multiply-accumulate and count leading zeros can only happen on pipe 1
+
+(define_insn_reservation "thunderx_mul" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "mul,muls,mla,mlas,clz,smull,umull,smlal,umlal"))
+  "thunderx_pipe1 + thunderx_mult")
+
+;; Multiply high instructions take an extra cycle and cause the multiply unit to
+;; be busy for an extra cycle.
+
+;(define_insn_reservation "thunderx_mul_high" 5
+;  (and (eq_attr "tune" "thunderx")
+;       (eq_attr "type" "smull,umull"))
+;  "thunderx_pipe1 + thunderx_mult")
+
+(define_insn_reservation "thunderx_div32" 22
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "udiv,sdiv"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide * 21")
+
+;(define_insn_reservation "thunderx_div64" 38
+;  (and (eq_attr "tune" "thunderx")
+;       (eq_attr "type" "udiv,sdiv")
+;       (eq_attr "mode" "DI"))
+;  "thunderx_pipe1 + thunderx_divide, thunderx_divide * 34")
+
+;; Stores take one cycle in pipe 0
+(define_insn_reservation "thunderx_store" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "store1"))
+  "thunderx_pipe0")
+
+;; Store pairs are single issued
+(define_insn_reservation "thunderx_storepair" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "store2"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+
+;; Loads (and load pairs) from L1 take 3 cycles in pipe 0
+(define_insn_reservation "thunderx_load" 3
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "load1, load2"))
+  "thunderx_pipe0")
+
+(define_insn_reservation "thunderx_brj" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "branch,trap,call"))
+  "thunderx_pipe1")
+
+;; FPU
+
+(define_insn_reservation "thunderx_fadd" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "faddd,fadds"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fconst" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fconsts,fconstd"))
+  "thunderx_pipe1")
+
+;; Moves between fp are 2 cycles including min/max/select/abs/neg
+(define_insn_reservation "thunderx_fmov" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fmov,f_minmaxs,f_minmaxd,fcsel,ffarithd,ffariths"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fmovgpr" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_mrc, f_mcr"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fmul" 6
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fmacs,fmacd,fmuls,fmuld"))
+  "thunderx_pipe1")
+
+(define_insn_reservation "thunderx_fdivs" 12
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fdivs"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*8")
+
+(define_insn_reservation "thunderx_fdivd" 22
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fdivd"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*18")
+
+(define_insn_reservation "thunderx_fsqrts" 17
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fsqrts"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*13")
+
+(define_insn_reservation "thunderx_fsqrtd" 28
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "fsqrtd"))
+  "thunderx_pipe1 + thunderx_divide, thunderx_divide*31")
+
+;; The rounding conversion inside fp is 4 cycles
+(define_insn_reservation "thunderx_frint" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_rints,f_rintd"))
+  "thunderx_pipe1")
+
+;; Float to integer with a move from int to/from float is 6 cycles
+(define_insn_reservation "thunderx_f_cvt" 6
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_cvt,f_cvtf2i,f_cvti2f"))
+  "thunderx_pipe1")
+
+;; FP/SIMD loads/stores happen in pipe 0
+;; 64bit loads (register and pair) take 4 cycles from L1
+(define_insn_reservation "thunderx_64simd_fp_load" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_loadd,f_loads,neon_load1_1reg,\
+			neon_load1_1reg_q,neon_load1_2reg"))
+  "thunderx_pipe0")
+
+;; 128bit load pair is single issued and takes 4 cycles from L1
+(define_insn_reservation "thunderx_128simd_pair_load" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_load1_2reg_q"))
+  "thunderx_pipe0+thunderx_pipe1")
+
+;; FP/SIMD stores take one cycle in pipe 0
+(define_insn_reservation "thunderx_simd_fp_store" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "f_stored,f_stores,neon_store1_1reg,neon_store1_1reg_q"))
+  "thunderx_pipe0")
+
+;; 64bit neon store pairs are single issued for one cycle
+(define_insn_reservation "thunderx_64neon_storepair" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_store1_2reg"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+;; 128bit neon store pairs are single issued for two cycles
+(define_insn_reservation "thunderx_128neon_storepair" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_store1_2reg_q"))
+  "(thunderx_pipe0 + thunderx_pipe1)*2")
+
+
+;; SIMD/NEON (q forms take an extra cycle)
+
+;; Thunder simd move instruction types - 2/3 cycles
+(define_insn_reservation "thunderx_neon_move" 2
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_logic, neon_bsl, neon_fp_compare_s, \
+			neon_fp_compare_d, neon_move"))
+  "thunderx_pipe1 + thunderx_simd")
+
+(define_insn_reservation "thunderx_neon_move_q" 3
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_logic_q, neon_bsl_q, neon_fp_compare_s_q, \
+			neon_fp_compare_d_q, neon_move_q"))
+  "thunderx_pipe1 + thunderx_simd, thunderx_simd")
+
+
+;; Thunder simd simple/add instruction types - 4/5 cycles
+
+(define_insn_reservation "thunderx_neon_add" 4
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_reduc_add, neon_reduc_minmax, neon_fp_reduc_add_s, \
+			neon_fp_reduc_add_d, neon_fp_to_int_s, neon_fp_to_int_d, \
+			neon_add_halve, neon_sub_halve, neon_qadd, neon_compare, \
+			neon_compare_zero, neon_minmax, neon_abd, neon_add, neon_sub, \
+			neon_fp_minmax_s, neon_fp_minmax_d, neon_reduc_add, neon_cls, \
+			neon_qabs, neon_qneg, neon_fp_addsub_s, neon_fp_addsub_d"))
+  "thunderx_pipe1 + thunderx_simd")
+
+;; BIG NOTE: neon_add_long/neon_sub_long don't have a q form which is incorrect
+
+(define_insn_reservation "thunderx_neon_add_q" 5
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "neon_reduc_add_q, neon_reduc_minmax_q, neon_fp_reduc_add_s_q, \
+			neon_fp_reduc_add_d_q, neon_fp_to_int_s_q, neon_fp_to_int_d_q, \
+			neon_add_halve_q, neon_sub_halve_q, neon_qadd_q, neon_compare_q, \
+			neon_compare_zero_q, neon_minmax_q, neon_abd_q, neon_add_q, neon_sub_q, \
+			neon_fp_minmax_s_q, neon_fp_minmax_d_q, neon_reduc_add_q, neon_cls_q, \
+			neon_qabs_q, neon_qneg_q, neon_fp_addsub_s_q, neon_fp_addsub_d_q, \
+			neon_add_long, neon_sub_long"))
+  "thunderx_pipe1 + thunderx_simd, thunderx_simd")
+
+
+;; Thunder 128bit SIMD reads the upper half in cycle 2 and writes in the last cycle
+(define_bypass 2 "thunderx_neon_move_q" "thunderx_neon_move_q, thunderx_neon_add_q")
+(define_bypass 4 "thunderx_neon_add_q" "thunderx_neon_move_q, thunderx_neon_add_q")
+
+;; Assume both pipes are needed for unknown and multiple-instruction
+;; patterns.
+
+(define_insn_reservation "thunderx_unknown" 1
+  (and (eq_attr "tune" "thunderx")
+       (eq_attr "type" "untyped,multiple"))
+  "thunderx_pipe0 + thunderx_pipe1")
+
+
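A quick cross-check of the long-latency reservations in thunderx.md, offered as a sketch and not as part of the patch.  The Python helper below counts how many cycles a reservation string spans and compares that with the declared latency.  It assumes only the subset of the reservation syntax used in this file (',' at top level for sequencing, '*' for repetition of a unit or a parenthesised group).  The names reservation_cycles, RESERVATIONS and over_latency are hypothetical; the table entries are copied from the patch.

```python
def reservation_cycles(reservation: str) -> int:
    """Return the number of cycles a reservation string spans.

    Assumption: ',' appears only at top level and '*' is followed by a
    plain integer, which holds for every reservation in thunderx.md.
    """
    total = 0
    for step in reservation.split(","):
        step = step.strip()
        if "*" in step:
            # "unit * N" or "(a + b)*N" occupies N consecutive cycles.
            _, count = step.rsplit("*", 1)
            total += int(count)
        else:
            # "a", "a + b" and "a | b" each occupy a single cycle.
            total += 1
    return total


# (name, declared latency, reservation) copied from the patch.
RESERVATIONS = [
    ("thunderx_div32", 22, "thunderx_pipe1 + thunderx_divide, thunderx_divide * 21"),
    ("thunderx_fdivs", 12, "thunderx_pipe1 + thunderx_divide, thunderx_divide*8"),
    ("thunderx_fdivd", 22, "thunderx_pipe1 + thunderx_divide, thunderx_divide*18"),
    ("thunderx_fsqrts", 17, "thunderx_pipe1 + thunderx_divide, thunderx_divide*13"),
    ("thunderx_fsqrtd", 28, "thunderx_pipe1 + thunderx_divide, thunderx_divide*31"),
]


def over_latency(table):
    """Names of entries whose reservation spans more cycles than the latency."""
    return [name for name, latency, res in table
            if reservation_cycles(res) > latency]


if __name__ == "__main__":
    for name, latency, res in RESERVATIONS:
        print(name, "latency", latency, "reserved", reservation_cycles(res))
    print("reservation longer than latency:", over_latency(RESERVATIONS))
```

With the values above, the only entry whose reservation outlasts its declared latency is thunderx_fsqrtd (1 + 31 cycles against a latency of 28).  That may be intentional, but it differs from the pattern of the other divide and square-root entries and may be worth double-checking.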