From patchwork Fri Nov 11 14:45:34 2016
X-Patchwork-Submitter: Thomas Preudhomme
X-Patchwork-Id: 81846
To: Richard Biener, Jakub Jelinek, "gcc-patches@gcc.gnu.org"
From: Thomas Preudhomme
Subject: [PATCH, GCC] Recognize partial load of source expression on big endian targets
Date: Fri, 11 Nov 2016 14:45:34 +0000

Hi,

To fix PR69714, code was added to disable the bswap optimization when the
resulting symbolic expression (a load or a load + byte swap) is smaller than
the source expression (e.g. when some of the bytes accessed in the source
code are bitwise ANDed with 0). As explained in [1], there were already two
pieces of code in bswap, written independently, to deal with that case, and
it is the interaction of the two that caused the bug.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00948.html
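
To give an idea of the kind of pattern involved (a minimal sketch, not the
testcase from PR69714), consider a value assembled from byte loads whose top
bytes are then ANDed with 0:

#include <stdint.h>

/* All four bytes p[0..3] are accessed when the 32-bit value is assembled,
   but the AND with 0xffff zeroes the two most significant bytes of the
   result.  The symbolic expression seen by the bswap pass is then narrower
   than the source expression: in effect a 16-bit load of p[0..1] on a
   little endian target, or a 16-bit load followed by a byte swap on a big
   endian one.  */
uint32_t
load32_keep_low16 (const unsigned char *p)
{
  uint32_t v = (uint32_t) p[0]
               | ((uint32_t) p[1] << 8)
               | ((uint32_t) p[2] << 16)
               | ((uint32_t) p[3] << 24);
  return v & 0xffff;
}
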
PR69714 proves that this pattern does occur in real code, so this patch sets
out to re-enable the optimization and to remove the big endian adjustment in
bswap_replace: the change in find_bswap_or_nop ensures that either the
optimization is cancelled or, if it is not, no offset adjustment is needed.

As explained in [2], the current code only supports losing bytes at the
highest addresses because there is no code to adjust the address of the
load. However, on little endian and big endian targets the bytes at the
highest addresses translate into different byte significance in the result.
This patch therefore first separates the cmpxchg and cmpnop adjustment into
two steps and then deals with endianness correctly in the second step.

[2] https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00119.html
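
To make the two steps concrete, here is a small standalone sketch (not GCC
code) of the masking now done in find_bswap_or_nop for a 4-byte source
expression of which only the 2 low result bytes survive; BITS_PER_MARKER and
the starting CMPNOP/CMPXCHG patterns below mirror my reading of
tree-ssa-math-opts.c and should be treated as illustrative:

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_MARKER 8  /* one marker byte per byte of the value */

int
main (void)
{
  /* Starting patterns (illustrative).  */
  uint64_t cmpnop = 0x0807060504030201ULL;   /* native-order read (nop) */
  uint64_t cmpxchg = 0x0102030405060708ULL;  /* full byte swap */
  int range = 4;    /* bytes accessed by the original gimple expression */
  int rsize = 2;    /* bytes actually used in the result */
  uint64_t mask;

  /* Step 1: drop the marker bytes for bytes never accessed by the original
     expression.  This step is endian independent.  */
  mask = ((uint64_t) 1 << (range * BITS_PER_MARKER)) - 1;
  cmpxchg >>= (64 / BITS_PER_MARKER - range) * BITS_PER_MARKER;
  cmpnop &= mask;
  printf ("step 1: cmpxchg=%#010llx cmpnop=%#010llx\n",
          (unsigned long long) cmpxchg, (unsigned long long) cmpnop);

  /* Step 2: drop the marker bytes for result bytes that are zero.  Which
     end of each pattern survives depends on the target's byte order.  */
  int bytes_big_endian = 1;  /* set to 0 for the little endian variant */
  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
  if (bytes_big_endian)
    {
      cmpxchg &= mask;
      cmpnop >>= (range - rsize) * BITS_PER_MARKER;
    }
  else
    {
      cmpxchg >>= (range - rsize) * BITS_PER_MARKER;
      cmpnop &= mask;
    }
  printf ("step 2: cmpxchg=%#06llx cmpnop=%#06llx\n",
          (unsigned long long) cmpxchg, (unsigned long long) cmpnop);
  return 0;
}

The two printed patterns are what the symbolic number is compared against to
decide between a plain (possibly narrower) load and a load + byte swap.
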
Ideally we would still want to be able to do the adjustment so as to handle
a load or load + bswap at an offset from the lowest accessed memory address,
but this would require more code to recognize the situation properly for
both little endian and big endian targets, and it will thus have to wait for
GCC 8 stage 1.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-11-10  Thomas Preud'homme

        * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in
        cmpxchg and cmpnop in two steps: first the ones not accessed in the
        original gimple expression in an endian-independent way and then the
        ones not accessed in the final result in an endian-specific way.
        (bswap_replace): Stop doing big endian adjustment.

The testsuite shows no regression with an armeb-none-eabi GCC cross-compiler
targeting ARM Cortex-M3, nor with a bootstrapped native x86_64-linux-gnu GCC
compiler. A bootstrap on powerpc is in progress.

Is this OK for trunk, provided that the powerpc bootstrap succeeds?

Best regards,

Thomas

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c315da88ce4feea1196a0416e4ea02e2a75a4377..b28c808c55489ae1ae16c173d66c561c1897e6ab 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2504,9 +2504,11 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit)
 static gimple *
 find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 {
-  /* The number which the find_bswap_or_nop_1 result should match in order
-     to have a full byte swap.  The number is shifted to the right
-     according to the size of the symbolic number before using it.  */
+  unsigned rsize;
+  uint64_t tmpn, mask;
+/* The number which the find_bswap_or_nop_1 result should match in order
+   to have a full byte swap.  The number is shifted to the right
+   according to the size of the symbolic number before using it.  */
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
@@ -2527,28 +2529,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 
   /* Find real size of result (highest non-zero byte).  */
   if (n->base_addr)
-    {
-      unsigned HOST_WIDE_INT rsize;
-      uint64_t tmpn;
-
-      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
-      if (BYTES_BIG_ENDIAN && n->range != rsize)
-	/* This implies an offset, which is currently not handled by
-	   bswap_replace.  */
-	return NULL;
-      n->range = rsize;
-    }
+    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
+  else
+    rsize = n->range;
 
-  /* Zero out the extra bits of N and CMP*.  */
+  /* Zero out the bits corresponding to untouched bytes in original gimple
+     expression.  */
   if (n->range < (int) sizeof (int64_t))
     {
-      uint64_t mask;
-
       mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
       cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
       cmpnop &= mask;
     }
 
+  /* Zero out the bits corresponding to unused bytes in the result of the
+     gimple expression.  */
+  if (rsize < n->range)
+    {
+      if (BYTES_BIG_ENDIAN)
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg &= mask;
+	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
+	}
+      else
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
+	  cmpnop &= mask;
+	}
+      n->range = rsize;
+    }
+
   /* A complete byte swap should make the symbolic number to start with
      the largest digit in the highest order byte.  Unchanged symbolic
      number indicates a read with same endianness as target architecture.  */
@@ -2636,26 +2648,6 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
       HOST_WIDE_INT load_offset = 0;
 
       align = get_object_alignment (src);
-      /* If the new access is smaller than the original one, we need
-	 to perform big endian adjustment.  */
-      if (BYTES_BIG_ENDIAN)
-	{
-	  HOST_WIDE_INT bitsize, bitpos;
-	  machine_mode mode;
-	  int unsignedp, reversep, volatilep;
-	  tree offset;
-
-	  get_inner_reference (src, &bitsize, &bitpos, &offset, &mode,
-			       &unsignedp, &reversep, &volatilep);
-	  if (n->range < (unsigned HOST_WIDE_INT) bitsize)
-	    {
-	      load_offset = (bitsize - n->range) / BITS_PER_UNIT;
-	      unsigned HOST_WIDE_INT l
-		= (load_offset * BITS_PER_UNIT) & (align - 1);
-	      if (l)
-		align = least_bit_hwi (l);
-	    }
-	}
       if (bswap
	  && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))