From patchwork Mon Nov 7 10:39:06 2016
X-Patchwork-Submitter: Eric Botcazou
X-Patchwork-Id: 81030
From: Eric Botcazou
To: gcc-patches@gcc.gnu.org
Subject: [RFC] Fix PR rtl-optimization/59461
Date: Mon, 07 Nov 2016 11:39:06 +0100
Message-ID: <3031993.Iryjn4xZby@polaris>

It's a missed optimization of a redundant zero-extension on the SPARC,
which originally comes from PR rtl-optimization/58295 for ARM.
The extension is eliminated on the ARM because the load is explicitly
zero-extended in RTL; on the SPARC the load is implicitly zero-extended
by means of LOAD_EXTEND_OP and the combiner is blocked by limitations
of the nonzero_bits machinery.

The approach is two-pronged:

1. it lifts a limitation in reg_nonzero_bits_for_combine that was
recently added (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03782.html)
and that prevents the combiner from reasoning on larger modes under
certain circumstances;

2. it makes nonzero_bits1 propagate results from inner REGs to
paradoxical SUBREGs if both WORD_REGISTER_OPERATIONS and LOAD_EXTEND_OP
are set.

This also eliminates quite a few zero-extensions in the compile.exp
testsuite at -O2 on the SPARC.

Tested on x86-64/Linux and SPARC/Solaris.

2016-11-07  Eric Botcazou

	PR rtl-optimization/59461
	* doc/rtl.texi (paradoxical subregs): Add missing word.
	* combine.c (reg_nonzero_bits_for_combine): Do not discard results
	in modes with precision larger than that of last_set_mode.
	* rtlanal.c (nonzero_bits1) <SUBREG>: If WORD_REGISTER_OPERATIONS
	is set and LOAD_EXTEND_OP is appropriate, propagate results from
	inner REGs to paradoxical SUBREGs.
	(num_sign_bit_copies1) <SUBREG>: Likewise.  Check that the mode is
	not larger than a word before invoking LOAD_EXTEND_OP on it.

2016-11-07  Eric Botcazou

	* gcc.target/sparc/pr59461.c: New test.

-- 
Eric Botcazou

Index: doc/rtl.texi
===================================================================
--- doc/rtl.texi	(revision 241856)
+++ doc/rtl.texi	(working copy)
@@ -1882,7 +1882,7 @@ When used as an rvalue, the low-order bi
 taken from @var{reg} while the high-order bits may or may not be
 defined.
 
-The high-order bits of rvalues are in the following circumstances:
+The high-order bits of rvalues are defined in the following circumstances:
 
 @itemize
 @item @code{subreg}s of @code{mem}
Index: combine.c
===================================================================
--- combine.c	(revision 241856)
+++ combine.c	(working copy)
@@ -9878,18 +9878,17 @@ reg_nonzero_bits_for_combine (const_rtx
 	       (DF_LR_IN (ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb),
 		REGNO (x)))))
     {
-      unsigned HOST_WIDE_INT mask = rsp->last_set_nonzero_bits;
-
-      if (GET_MODE_PRECISION (rsp->last_set_mode) < GET_MODE_PRECISION (mode))
-	/* We don't know anything about the upper bits.  */
-	mask |= GET_MODE_MASK (mode) ^ GET_MODE_MASK (rsp->last_set_mode);
-
-      *nonzero &= mask;
+      /* Note that, even if the precision of last_set_mode is lower than that
+	 of mode, record_value_for_reg invoked nonzero_bits on the register
+	 with nonzero_bits_mode (because last_set_mode is necessarily integral
+	 and HWI_COMPUTABLE_MODE_P in this case) so bits in nonzero_bits_mode
+	 are all valid, hence in mode too since nonzero_bits_mode is defined
+	 to the largest HWI_COMPUTABLE_MODE_P mode.  */
+      *nonzero &= rsp->last_set_nonzero_bits;
       return NULL;
     }
 
   tem = get_last_value (x);
-
   if (tem)
     {
       if (SHORT_IMMEDIATES_SIGN_EXTEND)
@@ -9898,7 +9897,8 @@ reg_nonzero_bits_for_combine (const_rtx
 
       return tem;
     }
-  else if (nonzero_sign_valid && rsp->nonzero_bits)
+
+  if (nonzero_sign_valid && rsp->nonzero_bits)
     {
       unsigned HOST_WIDE_INT mask = rsp->nonzero_bits;
Index: rtlanal.c
===================================================================
--- rtlanal.c	(revision 241856)
+++ rtlanal.c	(working copy)
@@ -4242,7 +4242,7 @@ cached_nonzero_bits (const_rtx x, machin
 
 /* Given an expression, X, compute which bits in X can be nonzero.
    We don't care about bits outside of those defined in MODE.
-   For most X this is simply GET_MODE_MASK (GET_MODE (MODE)), but if X is
+   For most X this is simply GET_MODE_MASK (GET_MODE (X)), but if X is
    an arithmetic operation, we can do better.  */
 
 static unsigned HOST_WIDE_INT
@@ -4549,18 +4549,17 @@ nonzero_bits1 (const_rtx x, machine_mode
       /* If this is a SUBREG formed for a promoted variable that has
         been zero-extended, we know that at least the high-order bits
         are zero, though others might be too.  */
-
      if (SUBREG_PROMOTED_VAR_P (x) && SUBREG_PROMOTED_UNSIGNED_P (x))
        nonzero = GET_MODE_MASK (GET_MODE (x))
                  & cached_nonzero_bits (SUBREG_REG (x), GET_MODE (x),
                                         known_x, known_mode, known_ret);
 
-      inner_mode = GET_MODE (SUBREG_REG (x));
      /* If the inner mode is a single word for both the host and target
         machines, we can compute this from which bits of the inner
         object might be nonzero.  */
+      inner_mode = GET_MODE (SUBREG_REG (x));
      if (GET_MODE_PRECISION (inner_mode) <= BITS_PER_WORD
-	  && (GET_MODE_PRECISION (inner_mode) <= HOST_BITS_PER_WIDE_INT))
+	  && GET_MODE_PRECISION (inner_mode) <= HOST_BITS_PER_WIDE_INT)
        {
          nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
                                          known_x, known_mode, known_ret);
@@ -4568,19 +4567,17 @@ nonzero_bits1 (const_rtx x, machine_mode
 
          /* On many CISC machines, accessing an object in a wider mode
             causes the high-order bits to become undefined.  So they are
             not known to be zero.  */
-	  if (!WORD_REGISTER_OPERATIONS
-	      /* If this is a typical RISC machine, we only have to worry
-		 about the way loads are extended.  */
-	      || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
-		   ? val_signbit_known_set_p (inner_mode, nonzero)
-		   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
-		  || !MEM_P (SUBREG_REG (x))))
-	    {
-	      if (GET_MODE_PRECISION (GET_MODE (x))
+	  if ((!WORD_REGISTER_OPERATIONS
+	       /* If this is a typical RISC machine, we only have to worry
+		  about the way loads are extended.  */
+	       || (LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
+		   ? val_signbit_known_set_p (inner_mode, nonzero)
+		   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
+	       || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
+	      && GET_MODE_PRECISION (GET_MODE (x))
                 > GET_MODE_PRECISION (inner_mode))
-		nonzero |= (GET_MODE_MASK (GET_MODE (x))
-			    & ~GET_MODE_MASK (inner_mode));
-	    }
+	    nonzero
+	      |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK (inner_mode));
        }
      break;
@@ -4785,6 +4782,7 @@ num_sign_bit_copies1 (const_rtx x, machi
 {
   enum rtx_code code = GET_CODE (x);
   unsigned int bitwidth = GET_MODE_PRECISION (mode);
+  machine_mode inner_mode;
   int num0, num1, result;
   unsigned HOST_WIDE_INT nonzero;
@@ -4892,13 +4890,13 @@ num_sign_bit_copies1 (const_rtx x, machi
        }
 
      /* For a smaller object, just ignore the high bits.  */
-      if (bitwidth <= GET_MODE_PRECISION (GET_MODE (SUBREG_REG (x))))
+      inner_mode = GET_MODE (SUBREG_REG (x));
+      if (bitwidth <= GET_MODE_PRECISION (inner_mode))
        {
          num0 = cached_num_sign_bit_copies (SUBREG_REG (x), VOIDmode,
                                             known_x, known_mode, known_ret);
-	  return MAX (1, (num0
-			  - (int) (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (x)))
-				   - bitwidth)));
+	  return
+	    MAX (1, num0 - (int) (GET_MODE_PRECISION (inner_mode) - bitwidth));
        }
 
      /* For paradoxical SUBREGs on machines where all register operations
@@ -4912,9 +4910,10 @@ num_sign_bit_copies1 (const_rtx x, machi
         to the stack.  */
 
      if (WORD_REGISTER_OPERATIONS
+	  && GET_MODE_PRECISION (inner_mode) <= BITS_PER_WORD
+	  && LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
          && paradoxical_subreg_p (x)
-	  && LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (x))) == SIGN_EXTEND
-	  && MEM_P (SUBREG_REG (x)))
+	  && (MEM_P (SUBREG_REG (x)) || REG_P (SUBREG_REG (x))))
        return cached_num_sign_bit_copies (SUBREG_REG (x), mode,
                                           known_x, known_mode, known_ret);
      break;