From patchwork Wed Nov 23 10:42:04 2016
X-Patchwork-Submitter: Thomas Preudhomme
X-Patchwork-Id: 83628
To: Richard Biener, Jakub Jelinek, "gcc-patches@gcc.gnu.org"
From: Thomas Preudhomme
Subject: [PATCH, GCC] Fix PR77673: bswap loads passed end of object
Date: Wed, 23 Nov 2016 10:42:04 +0000

Hi,

In its current form, when the bswap optimization pass recognizes a load in a specific endianness, it assumes that all the smaller loads in the original expression are part of a linear chain of basic blocks (i.e. they are either in the same basic block or there is no conditional branching in the blocks involved). This assumption is used when replacing the original set of loads by a new wider load: the wider load is always placed just before the original load with the smallest address.
This can mean accessing past the end of an object when the other loads in the original expression are executed conditionally, because the code would be changed to perform the wider load unconditionally. Please see the initial comment in PR77673 for the problem in action.

This patch changes the pass to select the load in the most dominated basic block among all the loads of the original expression as the location at which to insert the new wider load. It also cancels the optimization if that location does not dominate the statement computing the final value of the original expression. These two checks, together with the fact that there is necessarily a flow path that includes all the loads and the computing expression (otherwise the expression's value would be undefined), ensure that (i) the new load is only made if all the original loads would have been made, and (ii) the load is always made when the computing expression would be executed.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-11-22  Thomas Preud'homme

	PR tree-optimization/77673
	* tree-ssa-math-opts.c (struct symbolic_number): Add new src field.
	(init_symbolic_number): Initialize src field from src parameter.
	(perform_symbolic_merge): Select most dominated statement as the
	source statement.  Set src field of resulting n structure from the
	input src with the lowest address.
	(find_bswap_or_nop): Rename source_stmt into ins_stmt.
	(bswap_replace): Rename src_stmt into ins_stmt.  Initially get
	source of load from src field rather than insertion statement.
	Cancel optimization if statement analyzed is not dominated by the
	insertion statement.
	(pass_optimize_bswap::execute): Rename src_stmt to ins_stmt.
	Compute dominance information.

*** gcc/testsuite/ChangeLog ***

2016-11-22  Thomas Preud'homme

	PR tree-optimization/77673
	* gcc.dg/pr77673.c: New test.

Bootstrapped on arm-linux-gnueabihf and x86_64-linux-gnu with the regression testsuite coming back clean in both cases. Is this ok for trunk?
Best regards,

Thomas

diff --git a/gcc/testsuite/gcc.dg/pr77673.c b/gcc/testsuite/gcc.dg/pr77673.c
new file mode 100644
index 0000000000000000000000000000000000000000..e03bcb9284d1e5a0e496cfa547fdbab630bec04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr77673.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-options "-O2 -fPIC -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390*-*-* } } */
+
+void mach_parse_compressed(unsigned char* ptr, unsigned long int* val)
+{
+  if (ptr[0] < 0xC0U) {
+    *val = ptr[0] + ptr[1];
+    return;
+  }
+
+  *val = ((unsigned long int)(ptr[0]) << 24)
+	 | ((unsigned long int)(ptr[1]) << 16)
+	 | ((unsigned long int)(ptr[2]) << 8)
+	 | ptr[3];
+}
+
+/* { dg-final { scan-tree-dump-not "load_dst_\\d+ =.* if \\(" "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index deccdf1ad14ece93ad56153ba0cfb8c555f9ceb0..4a47254d223e24caf1cd611f434a578729ba205d 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1964,6 +1964,7 @@ struct symbolic_number {
   tree base_addr;
   tree offset;
   HOST_WIDE_INT bytepos;
+  tree src;
   tree alias_set;
   tree vuse;
   unsigned HOST_WIDE_INT range;
@@ -2068,6 +2069,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
     return false;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+  n->src = src;
 
   /* Set up the symbolic number N by setting each byte to a value
      between 1 and the byte size of rhs1.  The highest order byte is
      set to n->size and the
@@ -2192,6 +2194,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
   uint64_t inc;
   HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
   struct symbolic_number *toinc_n_ptr, *n_end;
+  basic_block bb1, bb2;
 
   if (!n1->base_addr || !n2->base_addr
       || !operand_equal_p (n1->base_addr, n2->base_addr, 0))
@@ -2205,15 +2208,20 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
     {
       n_start = n1;
       start_sub = n2->bytepos - n1->bytepos;
-      source_stmt = source_stmt1;
     }
   else
     {
       n_start = n2;
       start_sub = n1->bytepos - n2->bytepos;
-      source_stmt = source_stmt2;
     }
 
+  bb1 = gimple_bb (source_stmt1);
+  bb2 = gimple_bb (source_stmt2);
+  if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+    source_stmt = source_stmt1;
+  else
+    source_stmt = source_stmt2;
+
   /* Find the highest address at which a load is performed and
      compute related info.  */
   end1 = n1->bytepos + (n1->range - 1);
@@ -2270,6 +2278,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
   n->vuse = n_start->vuse;
   n->base_addr = n_start->base_addr;
   n->offset = n_start->offset;
+  n->src = n_start->src;
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
@@ -2519,7 +2528,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
-  gimple *source_stmt;
+  gimple *ins_stmt;
   int limit;
 
   /* The last parameter determines the depth search limit.  It usually
@@ -2529,9 +2538,9 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
      in libgcc, and for initial shift/and operation of the src operand.  */
   limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
   limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
-  source_stmt = find_bswap_or_nop_1 (stmt, n, limit);
+  ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
 
-  if (!source_stmt)
+  if (!ins_stmt)
     return NULL;
 
   /* Find real size of result (highest non-zero byte).  */
@@ -2583,7 +2592,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
     return NULL;
 
   n->range *= BITS_PER_UNIT;
-  return source_stmt;
+  return ins_stmt;
 }
 
 namespace {
@@ -2632,7 +2641,7 @@ public:
    changing of basic block.  */
 
 static bool
-bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
+bswap_replace (gimple *cur_stmt, gimple *ins_stmt, tree fndecl,
	       tree bswap_type, tree load_type, struct symbolic_number *n,
	       bool bswap)
 {
@@ -2641,25 +2650,31 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
   gimple *bswap_stmt;
 
   gsi = gsi_for_stmt (cur_stmt);
-  src = gimple_assign_rhs1 (src_stmt);
+  src = n->src;
   tgt = gimple_assign_lhs (cur_stmt);
 
   /* Need to load the value from memory first.  */
   if (n->base_addr)
     {
-      gimple_stmt_iterator gsi_ins = gsi_for_stmt (src_stmt);
+      gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
       tree addr_expr, addr_tmp, val_expr, val_tmp;
       tree load_offset_ptr, aligned_load_type;
       gimple *addr_stmt, *load_stmt;
       unsigned align;
       HOST_WIDE_INT load_offset = 0;
+      basic_block ins_bb, cur_bb;
+
+      ins_bb = gimple_bb (ins_stmt);
+      cur_bb = gimple_bb (cur_stmt);
+      if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
+	return false;
 
       align = get_object_alignment (src);
 
       /* Move cur_stmt just before one of the load of the original
	 to ensure it has the same VUSE.  See PR61517 for what could
	 go wrong.  */
-      if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+      if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
	reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
       gsi_move_before (&gsi, &gsi_ins);
       gsi = gsi_for_stmt (cur_stmt);
@@ -2834,6 +2849,7 @@ pass_optimize_bswap::execute (function *fun)
 
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -2845,7 +2861,7 @@ pass_optimize_bswap::execute (function *fun)
	 variant wouldn't be detected.  */
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
	{
-	  gimple *src_stmt, *cur_stmt = gsi_stmt (gsi);
+	  gimple *ins_stmt, *cur_stmt = gsi_stmt (gsi);
	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
	  enum tree_code code;
	  struct symbolic_number n;
@@ -2878,9 +2894,9 @@ pass_optimize_bswap::execute (function *fun)
	      continue;
	    }
 
-	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
-	  if (!src_stmt)
+	  if (!ins_stmt)
	    continue;
 
	  switch (n.range)
@@ -2914,7 +2930,7 @@ pass_optimize_bswap::execute (function *fun)
	  if (bswap && !fndecl && n.range != 16)
	    continue;
 
-	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,
+	  if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
			     &n, bswap))
	    changed = true;
	}