From patchwork Tue Jun 28 01:03:01 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jim Wilson
X-Patchwork-Id: 70967
In-Reply-To: <575E8497.2090301@foss.arm.com>
References: <575E8497.2090301@foss.arm.com>
From: Jim Wilson
Date: Mon, 27 Jun 2016 18:03:01 -0700
Subject: Re: [PATCH, AARCH64] add qdf24xx tuning structure
To: Kyrill Tkachov, James Greenhalgh
Cc: "gcc-patches@gcc.gnu.org"

On Mon, Jun 13, 2016 at 3:01 AM, Kyrill Tkachov wrote:
> Hi Jim,
>
> On 10/06/16 23:48, Jim Wilson wrote:
>>
>> This adds a tuning structure for qdf24xx.  This was tested with an
>> aarch64-linux bootstrap and a make check, with no regressions.  I also
>> tested it with an x86_64-linux C make check to verify that I didn't
>> break the testsuite for non aarch64 targets.
>
>
> As this also changes code in the arm backend
> it also needs a bootstrap and test on an arm target
> (arm-none-linux-gnueabihf for example).
> Can you please confirm that this passes successfully?

Yes, I forgot to do the bootstrap and make check on arm.  I tried to
do that testing, and ran into problems with the armv8-a assembler
warning
    IT blocks containing 32-bit Thumb instructions are deprecated in ARMv8
which messed up my testsuite results so much that they were unusable.
I had to rerun the tests to work around that, and then got distracted
by other problems, but I have now done an armhf bootstrap and make
check with unpatched (cortex-a57) and patched (qdf24xx) trees and got
the same results.

During the delay, the aarch64 tuning structure changed how the recip
square root approx is handled, so I had to make a trivial change to my
patch to compensate for that, and then redo the aarch64 bootstrap to
make sure it was still OK.

The new patch is attached, which is otherwise the same as the previous
patch.  I'm assuming this is still OK to install, as the previous patch
was approved pending test results, but will wait a bit in case someone
wants to object.

Jim

gcc/
	* config/aarch64/aarch64-cores.def (qdf24xx): Use qdf24xx tuning.
	* config/aarch64/aarch64.c (qdf24xx_addrcost_table, qdf24xx_regmove_cost,
	qdf24xx_tunings): New.
	* config/arm/aarch-cost-tables.h (qdf24xx_extra_costs): New.
	* config/arm/arm-cores.def (qdf24xx): Use qdf24xx tuning.
	* config/arm/arm.c (arm_qdf24xx_tune): New.

gcc/testsuite/
	* gcc.dg/asr_div1.c: Add aarch64 specific dg-options.

Index: config/aarch64/aarch64-cores.def
===================================================================
--- config/aarch64/aarch64-cores.def (revision 237800)
+++ config/aarch64/aarch64-cores.def (working copy)
@@ -46,7 +46,7 @@ AARCH64_CORE("cortex-a57", cortexa57, cortexa57,
 AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
 AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, "0x53", "0x001")
-AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
+AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, "0x51", "0x800")
 AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1")
 AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c (revision 237800)
+++ config/aarch64/aarch64.c (working copy)
@@ -250,6 +250,22 @@ static const struct cpu_addrcost_table xgene1_addr
   0, /* imm_offset */
 };
 
+static const struct cpu_addrcost_table qdf24xx_addrcost_table =
+{
+  {
+    1, /* hi */
+    0, /* si */
+    0, /* di */
+    1, /* ti */
+  },
+  0, /* pre_modify */
+  0, /* post_modify */
+  0, /* register_offset */
+  0, /* register_sextend */
+  0, /* register_zextend */
+  0 /* imm_offset */
+};
+
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   1, /* GP2GP */
@@ -308,6 +324,15 @@ static const struct cpu_regmove_cost xgene1_regmov
   2 /* FP2FP */
 };
 
+static const struct cpu_regmove_cost qdf24xx_regmove_cost =
+{
+  2, /* GP2GP */
+  /* Avoid the use of int<->fp moves for spilling. */
+  6, /* GP2FP */
+  6, /* FP2GP */
+  4 /* FP2FP */
+};
+
 /* Generic costs for vector insn classes. */
 static const struct cpu_vector_cost generic_vector_cost =
 {
@@ -647,6 +672,32 @@ static const struct tune_params xgene1_tunings =
   (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
 };
 
+static const struct tune_params qdf24xx_tunings =
+{
+  &qdf24xx_extra_costs,
+  &qdf24xx_addrcost_table,
+  &qdf24xx_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  4, /* memmov_cost */
+  4, /* issue_rate */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */
+  16, /* function_align. */
+  8, /* jump_align. */
+  16, /* loop_align. */
+  2, /* int_reassoc_width. */
+  4, /* fp_reassoc_width. */
+  1, /* vec_reassoc_width. */
+  2, /* min_div_recip_mul_sf. */
+  2, /* min_div_recip_mul_df. */
+  0, /* max_case_values. */
+  64, /* cache_line_size. */
+  tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */
+  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
+};
+
 /* Support for fine-grained override of the tuning structures. */
 struct aarch64_tuning_override_function
 {
Index: config/arm/aarch-cost-tables.h
===================================================================
--- config/arm/aarch-cost-tables.h (revision 237800)
+++ config/arm/aarch-cost-tables.h (working copy)
@@ -537,4 +537,107 @@ const struct cpu_cost_table xgene1_extra_costs =
   }
 };
 
+const struct cpu_cost_table qdf24xx_extra_costs =
+{
+  /* ALU */
+  {
+    0, /* arith. */
+    0, /* logical. */
+    0, /* shift. */
+    0, /* shift_reg. */
+    COSTS_N_INSNS (1), /* arith_shift. */
+    COSTS_N_INSNS (1), /* arith_shift_reg. */
+    0, /* log_shift. */
+    0, /* log_shift_reg. */
+    0, /* extend. */
+    0, /* extend_arith. */
+    0, /* bfi. */
+    0, /* bfx. */
+    0, /* clz. */
+    0, /* rev. */
+    0, /* non_exec. */
+    true /* non_exec_costs_exec. */
+  },
+  {
+    /* MULT SImode */
+    {
+      COSTS_N_INSNS (2), /* simple. */
+      COSTS_N_INSNS (2), /* flag_setting. */
+      COSTS_N_INSNS (2), /* extend. */
+      COSTS_N_INSNS (2), /* add. */
+      COSTS_N_INSNS (2), /* extend_add. */
+      COSTS_N_INSNS (4) /* idiv. */
+    },
+    /* MULT DImode */
+    {
+      COSTS_N_INSNS (3), /* simple. */
+      0, /* flag_setting (N/A). */
+      COSTS_N_INSNS (3), /* extend. */
+      COSTS_N_INSNS (3), /* add. */
+      COSTS_N_INSNS (3), /* extend_add. */
+      COSTS_N_INSNS (9) /* idiv. */
+    }
+  },
+  /* LD/ST */
+  {
+    COSTS_N_INSNS (2), /* load. */
+    COSTS_N_INSNS (2), /* load_sign_extend. */
+    COSTS_N_INSNS (2), /* ldrd. */
+    COSTS_N_INSNS (2), /* ldm_1st. */
+    1, /* ldm_regs_per_insn_1st. */
+    2, /* ldm_regs_per_insn_subsequent. */
+    COSTS_N_INSNS (2), /* loadf. */
+    COSTS_N_INSNS (2), /* loadd. */
+    COSTS_N_INSNS (3), /* load_unaligned. */
+    0, /* store. */
+    0, /* strd. */
+    0, /* stm_1st. */
+    1, /* stm_regs_per_insn_1st. */
+    2, /* stm_regs_per_insn_subsequent. */
+    0, /* storef. */
+    0, /* stored. */
+    COSTS_N_INSNS (1), /* store_unaligned. */
+    COSTS_N_INSNS (1), /* loadv. */
+    COSTS_N_INSNS (1) /* storev. */
+  },
+  {
+    /* FP SFmode */
+    {
+      COSTS_N_INSNS (6), /* div. */
+      COSTS_N_INSNS (5), /* mult. */
+      COSTS_N_INSNS (5), /* mult_addsub. */
+      COSTS_N_INSNS (5), /* fma. */
+      COSTS_N_INSNS (3), /* addsub. */
+      COSTS_N_INSNS (1), /* fpconst. */
+      COSTS_N_INSNS (1), /* neg. */
+      COSTS_N_INSNS (2), /* compare. */
+      COSTS_N_INSNS (4), /* widen. */
+      COSTS_N_INSNS (4), /* narrow. */
+      COSTS_N_INSNS (4), /* toint. */
+      COSTS_N_INSNS (4), /* fromint. */
+      COSTS_N_INSNS (2) /* roundint. */
+    },
+    /* FP DFmode */
+    {
+      COSTS_N_INSNS (11), /* div. */
+      COSTS_N_INSNS (6), /* mult. */
+      COSTS_N_INSNS (6), /* mult_addsub. */
+      COSTS_N_INSNS (6), /* fma. */
+      COSTS_N_INSNS (3), /* addsub. */
+      COSTS_N_INSNS (1), /* fpconst. */
+      COSTS_N_INSNS (1), /* neg. */
+      COSTS_N_INSNS (2), /* compare. */
+      COSTS_N_INSNS (4), /* widen. */
+      COSTS_N_INSNS (4), /* narrow. */
+      COSTS_N_INSNS (4), /* toint. */
+      COSTS_N_INSNS (4), /* fromint. */
+      COSTS_N_INSNS (2) /* roundint. */
+    }
+  },
+  /* Vector */
+  {
+    COSTS_N_INSNS (1) /* alu. */
+  }
+};
+
 #endif /* GCC_AARCH_COST_TABLES_H */
Index: config/arm/arm-cores.def
===================================================================
--- config/arm/arm-cores.def (revision 237800)
+++ config/arm/arm-cores.def (working copy)
@@ -173,7 +173,7 @@ ARM_CORE("cortex-a57", cortexa57, cortexa57, 8A, A
 ARM_CORE("cortex-a72", cortexa72, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a73", cortexa73, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
 ARM_CORE("exynos-m1", exynosm1, exynosm1, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
-ARM_CORE("qdf24xx", qdf24xx, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ARM_CORE("qdf24xx", qdf24xx, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
 ARM_CORE("xgene1", xgene1, xgene1, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8A), xgene1)
 
 /* V8 big.LITTLE implementations */
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 237800)
+++ config/arm/arm.c (working copy)
@@ -2052,6 +2052,29 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
+const struct tune_params arm_qdf24xx_tune =
+{
+  arm_9e_rtx_costs,
+  &qdf24xx_extra_costs,
+  NULL, /* Scheduler cost adjustment. */
+  arm_default_branch_cost,
+  &arm_default_vec_cost, /* Vectorizer costs. */
+  1, /* Constant limit. */
+  2, /* Max cond insns. */
+  8, /* Memset max inline. */
+  4, /* Issue rate. */
+  ARM_PREFETCH_BENEFICIAL (0, -1, 64),
+  tune_params::PREF_CONST_POOL_FALSE,
+  tune_params::PREF_LDRD_TRUE,
+  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
+  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
+  tune_params::DISPARAGE_FLAGS_ALL,
+  tune_params::PREF_NEON_64_FALSE,
+  tune_params::PREF_NEON_STRINGOPS_TRUE,
+  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
+  tune_params::SCHED_AUTOPREF_FULL
+};
+
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
    less appealing. Set max_insns_skipped to a low value. */
 
Index: testsuite/gcc.dg/asr_div1.c
===================================================================
--- testsuite/gcc.dg/asr_div1.c (revision 237800)
+++ testsuite/gcc.dg/asr_div1.c (working copy)
@@ -1,6 +1,7 @@
 /* Test division by const int generates only one shift. */
 /* { dg-do run } */
 /* { dg-options "-O2 -fdump-rtl-combine-all" } */
+/* { dg-options "-O2 -fdump-rtl-combine-all -mtune=cortex-a53" { target aarch64*-*-* } } */
 /* { dg-require-effective-target int32plus } */
 
 extern void abort (void);