From patchwork Mon Nov 2 14:15:27 2015
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 55899
Message-ID: <56376FFF.3070008@arm.com>
Date: Mon, 02 Nov 2015 14:15:27 +0000
From: Kyrill Tkachov
To: gcc Patches
CC: Segher Boessenkool
Subject: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

Hi all,

This patch attempts to stop combine from transforming ZERO_EXTEND and
SIGN_EXTEND operations into and-bitmask and weird SUBREG expressions when
they appear inside MULT expressions.  This is because a MULT rtx containing
these extend operations is usually a well-understood widening multiply
operation.

However, if the costs for simple zero- or sign-extend moves happen to line
up in a particular way, expand_compound_operation will end up mangling a
perfectly innocent extend+mult+add rtx like:

  (set (reg/f:DI 393)
       (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
                         (sign_extend:DI (reg:SI 606)))
                (reg:DI 600)))

into:

  (set (reg/f:DI 393)
       (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
                                 (const_int 202 [0xca]))
                         (sign_extend:DI (reg/v:SI 425 [ selected ])))
                (reg:DI 600)))

because it decides that the and-subreg form is cheaper than a sign-extend,
even though the resulting expression can no longer match a widening
multiply-add pattern.  The and-subreg form may well be cheaper as the
SET_SRC of a simple set, but it is unlikely to be meaningful inside a MULT
expression.
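For reference, the kind of source that is expected to produce such an
extend+mult+add rtx (and, when the extends are left alone, should become a
single smaddl on aarch64) is a plain widening multiply-accumulate.  The
function below is purely illustrative and not part of the patch:

  /* Illustrative only: a 32x32->64-bit signed widening multiply-accumulate.
     The rtl we want combine to preserve is
     (plus:DI (mult:DI (sign_extend:DI ...) (sign_extend:DI ...)) ...).  */
  long long
  madd (long long acc, int a, int b)
  {
    return acc + (long long) a * b;
  }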
AFAIK the accepted way of representing widening multiplies is with MULT
expressions containing zero/sign-extends, so rather than adding duplicate
patterns in the backend to represent the different ways an extend operation
can be expressed by combine, I think we should stop combine from mangling
the representation where possible.

This patch fixes that by stopping the transformation on the extend operands
of a MULT when the target has a widening multiply optab for the modes
involved, in the two places where I saw this happen.
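To make that condition concrete: written out purely for illustration, the
force_to_mode version of the check would look roughly like the predicate
below (the helper and its name are made up and not part of the patch); the
subst hunk performs the same optab test on the single extend operand being
substituted:

  /* Illustration only: return true if X is a MULT of two same-kind extends
     that the target can handle directly as a widening multiply.  */
  static bool
  widening_mult_with_extends_p (rtx x)
  {
    if (GET_CODE (x) != MULT)
      return false;

    rtx op0 = XEXP (x, 0);
    rtx op1 = XEXP (x, 1);

    if (GET_CODE (op0) != GET_CODE (op1)
        || (GET_CODE (op0) != SIGN_EXTEND && GET_CODE (op0) != ZERO_EXTEND)
        || GET_MODE (op0) != GET_MODE (op1))
      return false;

    optab op = (GET_CODE (op0) == ZERO_EXTEND
                ? umul_widen_optab : smul_widen_optab);
    return (widening_optab_handler (op, GET_MODE (x),
                                    GET_MODE (XEXP (op0, 0)))
            != CODE_FOR_nothing);
  }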
With this patch on aarch64 I see increased generation of smaddl and umaddl
instructions where before the combination of smull and add instructions was
rejected because the extend operands had been transformed into these weird
equivalent expressions.  Overall, I see 24% more [su]maddl instructions
generated for SPEC2006, with no performance regressions.

The testcase in the patch is the most minimal one I could produce that
demonstrates the issue I'm trying to solve.

Does this approach look ok?

Thanks,
Kyrill

2015-11-01  Kyrylo Tkachov

    * combine.c (subst): Restrict transformation when handling
    extend ops inside a MULT.
    (force_to_mode, binop case): Likewise.

2015-11-01  Kyrylo Tkachov

    * gcc.target/aarch64/umaddl_combine_1.c: New test.

commit 2005243d896cbeb3d22421a63f080699ab357610
Author: Kyrylo Tkachov
Date:   Fri Oct 30 13:56:10 2015 +0000

    [combine] Don't transform sign and zero extends inside mults

diff --git a/gcc/combine.c b/gcc/combine.c
index fa1a93f..be0f5ae 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5369,6 +5369,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	      n_occurrences++;
 	    }
 	  else
+	    {
 	      /* If we are in a SET_DEST, suppress most cases unless we
 		 have gone inside a MEM, in which case we want to
 		 simplify the address.  We assume here that things that
@@ -5376,15 +5377,33 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		 parts in the first expression.  This is true for SUBREG,
 		 STRICT_LOW_PART, and ZERO_EXTRACT, which are the only
 		 things aside from REG and MEM that should appear in a
-		 SET_DEST.  */
-	      new_rtx = subst (XEXP (x, i), from, to,
+		 SET_DEST.  Also, restrict transformations on SIGN_EXTENDs
+		 and ZERO_EXTENDs if they appear inside multiplications if
+		 the target supports widening multiplies.  This is to avoid
+		 mangling such expressions beyond recognition.  */
+	      bool restrict_extend_p = false;
+	      rtx_code inner_code = GET_CODE (XEXP (x, i));
+
+	      if (code == MULT
+		  && (inner_code == SIGN_EXTEND
+		      || inner_code == ZERO_EXTEND)
+		  && widening_optab_handler (inner_code == ZERO_EXTEND
+					     ? umul_widen_optab
+					     : smul_widen_optab,
+					     GET_MODE (x),
+					     GET_MODE (XEXP (XEXP (x, i), 0)))
+		     != CODE_FOR_nothing)
+		restrict_extend_p = true;
+
+	      new_rtx = subst (XEXP (x, i), from, to,
 			       (((in_dest
 				  && (code == SUBREG || code == STRICT_LOW_PART
 				      || code == ZERO_EXTRACT))
 				 || code == SET)
-				&& i == 0),
+				&& i == 0) || restrict_extend_p,
 			       code == IF_THEN_ELSE && i == 0,
 			       unique_copy);
+	    }
 
 	  /* If we found that we will have to reject this combination,
 	     indicate that by returning the CLOBBER ourselves, rather than
@@ -8509,8 +8528,30 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
 
       /* For most binary operations, just propagate into the operation and
 	 change the mode if we have an operation of that mode.  */
-      op0 = force_to_mode (XEXP (x, 0), mode, mask, next_select);
-      op1 = force_to_mode (XEXP (x, 1), mode, mask, next_select);
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* If we have a widening multiply avoid messing with the
+	 ZERO/SIGN_EXTEND operands so that we can still match the
+	 appropriate smul/umul patterns.  */
+      if (GET_CODE (x) == MULT
+	  && GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == SIGN_EXTEND
+	      || GET_CODE (op0) == ZERO_EXTEND)
+	  && GET_MODE (op0) == GET_MODE (op1)
+	  && widening_optab_handler (GET_CODE (op0) == ZERO_EXTEND
+				     ? umul_widen_optab
+				     : smul_widen_optab,
+				     GET_MODE (x),
+				     GET_MODE (XEXP (op0, 0)))
+	     != CODE_FOR_nothing)
+	;
+      else
+	{
+	  op0 = force_to_mode (op0, mode, mask, next_select);
+	  op1 = force_to_mode (op1, mode, mask, next_select);
+	}
+
       /* If we ended up truncating both operands, truncate the result of
 	 the operation instead.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
new file mode 100644
index 0000000..703c94d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a53" } */
+
+enum reg_class
+{
+  NO_REGS,
+  AD_REGS,
+  ALL_REGS, LIM_REG_CLASSES
+};
+
+extern enum reg_class
+  reg_class_subclasses[((int) LIM_REG_CLASSES)][((int) LIM_REG_CLASSES)];
+
+void
+init_reg_sets_1 ()
+{
+  unsigned int i, j;
+  {
+    for (j = i + 1; j < ((int) LIM_REG_CLASSES); j++)
+      {
+        enum reg_class *p;
+        p = &reg_class_subclasses[j][0];
+        while (*p != LIM_REG_CLASSES)
+          p++;
+      }
+  }
+}
+
+/* { dg-final { scan-assembler-not "umull\tx\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */