From patchwork Fri Sep 22 15:49:17 2017
X-Patchwork-Submitter: Jim Wilson
X-Patchwork-Id: 114060
From: Jim Wilson
To: gcc-patches@gcc.gnu.org
Cc: Jim Wilson, wilson@tuliptree.org
Subject: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
Date: Fri, 22 Sep 2017 08:49:17 -0700
Message-Id: <1506095357-3334-1-git-send-email-jim.wilson@linaro.org>

On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.

Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13%
performance increase on stream copy and a 16% performance increase on stream
scale with this patch.  I also see a small performance increase on SPEC
CPU2006 of around 0.2% for int and 0.4% for FP at -O3.

	gcc/
	* config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p): New.
	* config/aarch64/aarch64-simd.md (aarch64_simd_mov): Use Utf.
	* config/aarch64/aarch64-tuning-flags.def
	(SLOW_REGOFFSET_QUADWORD_STORE): New.
	* config/aarch64/aarch64.c (qdf24xx_tunings): Add
	SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
	(aarch64_movti_target_operand_p): New.
	* config/aarch64/aarch64.md (movti_aarch64): Use Utf.
	(movtf_aarch64): Likewise.
	* config/aarch64/constraints.md (Utf): New.
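
For illustration only, loops of roughly the following shape are the kind of
code where the vectorizer can produce quad-word (128-bit) stores whose address
is a base register plus an index register.  This is a minimal sketch: the
function names stream_copy and stream_scale are hypothetical and the code is
not taken from the lmbench sources.  Whether a given compiler actually emits a
reg+reg store here depends on its version, options and tuning.

```c
#include <stddef.h>

/* Hypothetical STREAM-style kernels; not the lmbench implementation.
   When vectorized for AArch64, each vector iteration may store one
   128-bit Q register to dst.  If the address is formed as base + index
   register, that is the reg+reg quad-word store this patch avoids when
   tuning for Falkor; the alternative is an add followed by a store
   through a single base register.  */

void stream_copy(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

void stream_scale(double *restrict dst, const double *restrict src,
                  double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = scalar * src[i];
}
```

Compiling such a file with and without the patch and comparing the store
instructions in the generated assembly is one way to see the effect of the
new tuning flag.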
---
 gcc/config/aarch64/aarch64-protos.h         |  1 +
 gcc/config/aarch64/aarch64-simd.md          |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 ++++
 gcc/config/aarch64/aarch64.c                | 14 +++++++++++++-
 gcc/config/aarch64/aarch64.md               |  6 +++---
 gcc/config/aarch64/constraints.md           |  6 ++++++
 6 files changed, 29 insertions(+), 6 deletions(-)

-- 
2.7.4

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e67c2ed..2dfd057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -379,6 +379,7 @@ const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_movti_target_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 70e9339..88bf210 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -133,9 +133,9 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-        "=w, Umq, m, w, ?r, ?w, ?r, w")
+        "=w, Umq, Utf, w, ?r, ?w, ?r, w")
         (match_operand:VQ 1 "general_operand"
-        "m, Dz, w, w, w, r, r, Dn"))]
+        "m, Dz, w, w, w, r, r, Dn"))]
   "TARGET_SIMD
    && (register_operand (operands[0], mode)
        || aarch64_simd_reg_or_zero (operands[1], mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index f48642c..7d0b104 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
    are not considered cheap. */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store. */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+                             SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5e26cb7..d6f1133 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -818,7 +818,7 @@ static const struct tune_params qdf24xx_tunings =
   2, /* min_div_recip_mul_df. */
   0, /* max_case_values. */
   tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
+  (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE), /* tune_flags. */
   &qdf24xx_prefetch_tune
 };
 
@@ -11821,6 +11821,18 @@ aarch64_simd_mem_operand_p (rtx op)
           || REG_P (XEXP (op, 0)));
 }
 
+/* Return TRUE if OP uses an efficient memory address for quad-word target. */
+bool
+aarch64_movti_target_operand_p (rtx op)
+{
+  if (! optimize_size
+      && (aarch64_tune_params.extra_tuning_flags
+          & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE))
+    return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS
+                            && ! CONST_INT_P (XEXP (XEXP (op, 0), 1)));
+  return MEM_P (op);
+}
+
 /* Emit a register copy from operand to operand, taking care not to
    early-clobber source registers in the process.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8cdb06..9c7e356 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1023,7 +1023,7 @@
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-        "nonimmediate_operand" "=r, w,r,w,r,m,m,w,m")
+        "nonimmediate_operand" "=r, w,r,w,r,m,m,w,Utf")
         (match_operand:TI 1
         "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
@@ -1170,9 +1170,9 @@
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-        "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m")
+        "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,Utf,?r,m ,m")
         (match_operand:TF 1
-        "general_operand" " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))]
+        "general_operand" " w,?r, ?r,w ,Y,Y ,m,w ,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
     || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 3649fb4..b1defb6 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -171,6 +171,12 @@
   (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
                                              PARALLEL, 1)")))
 
+(define_memory_constraint "Utf"
+  "@internal
+   An efficient memory address for a quad-word target operand."
+  (and (match_code "mem")
+       (match_test "aarch64_movti_target_operand_p (op)")))
+
 (define_memory_constraint "Utv"
   "@internal
    An address valid for loading/storing opaque structure