From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: Add an alternative vector loop iv mechanism
Date: Fri, 13 Oct 2017 15:10:09 +0100
Message-ID: <87o9pbrvem.fsf@linaro.org>

Normally we adjust the vector loop so that it iterates:

   (original number of scalar iterations - number of peels) / VF

times, enforcing this using an IV that starts at zero and increments
by one each iteration.  However, dividing by VF would be expensive
for variable VF, so this patch adds an alternative in which the IV
increments by VF each iteration instead.  We then need to take care
to handle possible overflow in the IV.

The new mechanism isn't used yet; a later patch replaces the "if (1)"
with a check for variable VF.  If the patch is OK, I'll hold off
applying it until the follow-on is ready to go in.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install when the time comes?

Richard


2017-10-13  Richard Sandiford

gcc/
	* tree-vect-loop-manip.c: Include gimple-fold.h.
	(slpeel_make_loop_iterate_ntimes): Add step, final_iv and
	niters_maybe_zero parameters.  Handle other cases besides a step of 1.
	(vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
	Add a path that uses a step of VF instead of 1, but disable it
	for now.
	(vect_do_peeling): Add step_vector and niters_vector_mult_vf_var
	parameters.  Update calls to slpeel_make_loop_iterate_ntimes and
	vect_gen_vector_loop_niters.  Create a new SSA name if the latter
	chooses to use a step other than one, and return it via
	niters_vector_mult_vf_var.
	* tree-vect-loop.c (vect_transform_loop): Update calls to
	vect_do_peeling, vect_gen_vector_loop_niters and
	slpeel_make_loop_iterate_ntimes.
	* tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
	(vect_gen_vector_loop_niters): Update declarations after above changes.

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2017-10-13 15:01:40.144777367 +0100
+++ gcc/tree-vect-loop-manip.c	2017-10-13 15:01:40.296014347 +0100
@@ -41,6 +41,7 @@ Software Foundation; either version 3, o
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "tree-ssa-loop-ivopts.h"
+#include "gimple-fold.h"
 
 /*************************************************************************
   Simple Loop Peeling Utilities
@@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
                         gimple_bb (update_phi));
 }
 
-/* Make the LOOP iterate NITERS times. This is done by adding a new IV
-   that starts at zero, increases by one and its limit is NITERS.
+/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
+   where NITERS is known to be outside the range [1, STEP - 1].
+   This is equivalent to making the loop execute NITERS / STEP
+   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
+   where M is the precision of NITERS.
+
+   NITERS_MAYBE_ZERO is true if NITERS can be zero, false if it is known
+   to be >= STEP.  In the latter case N is always NITERS / STEP.
+
+   If FINAL_IV is nonnull, it is an SSA name that should be set to
+   N * STEP on exit from the loop.
 
    Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
 
 void
-slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
+slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
+                                 tree final_iv, bool niters_maybe_zero)
 {
   tree indx_before_incr, indx_after_incr;
   gcond *cond_stmt;
   gcond *orig_cond;
+  edge pe = loop_preheader_edge (loop);
   edge exit_edge = single_exit (loop);
   gimple_stmt_iterator loop_cond_gsi;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  tree init = build_int_cst (TREE_TYPE (niters), 0);
-  tree step = build_int_cst (TREE_TYPE (niters), 1);
   source_location loop_loc;
   enum tree_code code;
+  tree niters_type = TREE_TYPE (niters);
 
   orig_cond = get_loop_exit_condition (loop);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
+  tree init, limit;
+  if (!niters_maybe_zero && integer_onep (step))
+    {
+      /* In this case we can use a simple 0-based IV:
+
+         A:
+           x = 0;
+           do
+             {
+               ...
+               x += 1;
+             }
+           while (x < NITERS);  */
+      code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
+      init = build_zero_cst (niters_type);
+      limit = niters;
+    }
+  else
+    {
+      /* The following works for all values of NITERS except 0:
+
+         B:
+           x = 0;
+           do
+             {
+               ...
+               x += STEP;
+             }
+           while (x <= NITERS - STEP);
+
+         so that the loop continues to iterate if x + STEP - 1 < NITERS
+         but stops if x + STEP - 1 >= NITERS.
+
+         However, if NITERS is zero, x never hits a value above NITERS - STEP
+         before wrapping around.  There are two obvious ways of dealing with
+         this:
+
+         - start at STEP - 1 and compare x before incrementing it
+         - start at -1 and compare x after incrementing it
+
+         The latter is simpler and is what we use.  The loop in this case
+         looks like:
+
+         C:
+           x = -1;
+           do
+             {
+               ...
+               x += STEP;
+             }
+           while (x < NITERS - STEP);
+
+         In both cases the loop limit is NITERS - STEP.  */
+      gimple_seq seq = NULL;
+      limit = force_gimple_operand (niters, &seq, true, NULL_TREE);
+      limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit, step);
+      if (seq)
+        {
+          basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+          gcc_assert (!new_bb);
+        }
+      if (niters_maybe_zero)
+        {
+          /* Case C.  */
+          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
+          init = build_all_ones_cst (niters_type);
+        }
+      else
+        {
+          /* Case B.  */
+          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GT_EXPR : LE_EXPR;
+          init = build_zero_cst (niters_type);
+        }
+    }
+
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
   create_iv (init, step, NULL_TREE, loop,
              &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
@@ -278,11 +364,10 @@ slpeel_make_loop_iterate_ntimes (struct
   indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
                                               true, NULL_TREE, true,
                                               GSI_SAME_STMT);
-  niters = force_gimple_operand_gsi (&loop_cond_gsi, niters, true, NULL_TREE,
+  limit = force_gimple_operand_gsi (&loop_cond_gsi, limit, true, NULL_TREE,
                                      true, GSI_SAME_STMT);
 
-  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
-  cond_stmt = gimple_build_cond (code, indx_after_incr, niters, NULL_TREE,
+  cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
                                  NULL_TREE);
 
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
@@ -301,8 +386,23 @@ slpeel_make_loop_iterate_ntimes (struct
     }
 
   /* Record the number of latch iterations.  */
-  loop->nb_iterations = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
-                                     build_int_cst (TREE_TYPE (niters), 1));
+  if (limit == niters)
+    /* Case A: the loop iterates NITERS times.  Subtract one to get the
+       latch count.  */
+    loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
+                                       build_int_cst (niters_type, 1));
+  else
+    /* Case B or C: the loop iterates (NITERS - STEP) / STEP + 1 times.
+       Subtract one from this to get the latch count.  */
+    loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
+                                       limit, step);
+
+  if (final_iv)
+    {
+      gassign *assign = gimple_build_assign (final_iv, MINUS_EXPR,
+                                             indx_after_incr, init);
+      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+    }
 }
 
 /* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
@@ -1170,23 +1270,32 @@ vect_gen_scalar_loop_niters (tree niters
   return niters;
 }
 
-/* This function generates the following statements:
+/* NITERS is the number of times that the original scalar loop executes
+   after peeling.  Work out the maximum number of iterations N that can
+   be handled by the vectorized form of the loop and then either:
+
+   a) set *STEP_VECTOR_PTR to the vectorization factor and generate:
+
+        niters_vector = N
+
+   b) set *STEP_VECTOR_PTR to one and generate:
 
-   niters = number of iterations loop executes (after peeling)
-   niters_vector = niters / vf
+        niters_vector = N / vf
 
-   and places them on the loop preheader edge.  NITERS_NO_OVERFLOW is
-   true if NITERS doesn't overflow.  */
+   In both cases, store niters_vector in *NITERS_VECTOR_PTR and add
+   any new statements on the loop preheader edge.  NITERS_NO_OVERFLOW
+   is true if NITERS doesn't overflow (i.e. if NITERS is always nonzero).  */
 
 void
 vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
-                             tree *niters_vector_ptr, bool niters_no_overflow)
+                             tree *niters_vector_ptr, tree *step_vector_ptr,
+                             bool niters_no_overflow)
 {
   tree ni_minus_gap, var;
-  tree niters_vector, type = TREE_TYPE (niters);
+  tree niters_vector, step_vector, type = TREE_TYPE (niters);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
-  tree log_vf = build_int_cst (type, exact_log2 (vf));
+  tree log_vf = NULL_TREE;
 
   /* If epilogue loop is required because of data accesses with gaps,
      we subtract one iteration from the total number of iterations here for
@@ -1207,21 +1316,32 @@ vect_gen_vector_loop_niters (loop_vec_in
   else
     ni_minus_gap = niters;
 
-  /* Create: niters >> log2(vf) */
-  /* If it's known that niters == number of latch executions + 1 doesn't
-     overflow, we can generate niters >> log2(vf); otherwise we generate
-     (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
-     will be at least one.  */
-  if (niters_no_overflow)
-    niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
+  if (1)
+    {
+      /* Create: niters >> log2(vf) */
+      /* If it's known that niters == number of latch executions + 1 doesn't
+         overflow, we can generate niters >> log2(vf); otherwise we generate
+         (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
+         will be at least one.  */
+      log_vf = build_int_cst (type, exact_log2 (vf));
+      if (niters_no_overflow)
+        niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
+      else
+        niters_vector
+          = fold_build2 (PLUS_EXPR, type,
+                         fold_build2 (RSHIFT_EXPR, type,
+                                      fold_build2 (MINUS_EXPR, type,
+                                                   ni_minus_gap,
+                                                   build_int_cst (type, vf)),
+                                      log_vf),
+                         build_int_cst (type, 1));
+      step_vector = build_one_cst (type);
+    }
   else
-    niters_vector
-      = fold_build2 (PLUS_EXPR, type,
-                     fold_build2 (RSHIFT_EXPR, type,
-                                  fold_build2 (MINUS_EXPR, type, ni_minus_gap,
-                                               build_int_cst (type, vf)),
-                                  log_vf),
-                     build_int_cst (type, 1));
+    {
+      niters_vector = ni_minus_gap;
+      step_vector = build_int_cst (type, vf);
+    }
 
   if (!is_gimple_val (niters_vector))
     {
@@ -1231,7 +1351,7 @@ vect_gen_vector_loop_niters (loop_vec_in
       gsi_insert_seq_on_edge_immediate (pe, stmts);
       /* Peeling algorithm guarantees that vector loop bound is at least ONE,
          we set range information to make niters analyzer's life easier.  */
-      if (stmts != NULL)
+      if (stmts != NULL && log_vf)
         set_range_info (niters_vector, VR_RANGE,
                         wi::to_wide (build_int_cst (type, 1)),
                         wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
@@ -1239,6 +1359,7 @@ vect_gen_vector_loop_niters (loop_vec_in
                                                   log_vf)));
     }
   *niters_vector_ptr = niters_vector;
+  *step_vector_ptr = step_vector;
   return;
 }
 
@@ -1600,7 +1721,12 @@ slpeel_update_phi_nodes_for_lcssa (struc
    - TH, CHECK_PROFITABILITY: Threshold of niters to vectorize loop if
                               CHECK_PROFITABILITY is true.
    Output:
-   - NITERS_VECTOR: The number of iterations of loop after vectorization.
+   - *NITERS_VECTOR and *STEP_VECTOR describe how the main loop should
+     iterate after vectorization; see slpeel_make_loop_iterate_ntimes
+     for details.
+   - *NITERS_VECTOR_MULT_VF_VAR is either null or an SSA name that
+     should be set to the number of scalar iterations handled by the
+     vector loop.  The SSA name is only used on exit from the loop.
 
    This function peels prolog and epilog from the loop, adds guards skipping
    PROLOG and EPILOG for various conditions.  As a result, the changed CFG
@@ -1657,8 +1783,9 @@ slpeel_update_phi_nodes_for_lcssa (struc
 
 struct loop *
 vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
-                 tree *niters_vector, int th, bool check_profitability,
-                 bool niters_no_overflow)
+                 tree *niters_vector, tree *step_vector,
+                 tree *niters_vector_mult_vf_var, int th,
+                 bool check_profitability, bool niters_no_overflow)
 {
   edge e, guard_e;
   tree type = TREE_TYPE (niters), guard_cond;
@@ -1754,7 +1881,9 @@ vect_do_peeling (loop_vec_info loop_vinf
       /* Generate and update the number of iterations for prolog loop.  */
       niters_prolog = vect_gen_prolog_loop_niters (loop_vinfo, anchor,
                                                    &bound_prolog);
-      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog);
+      tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
+      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog, step_prolog,
+                                       NULL_TREE, false);
 
       /* Skip the prolog loop.  */
       if (skip_prolog)
@@ -1867,9 +1996,20 @@ vect_do_peeling (loop_vec_info loop_vinf
          overflows.  */
       niters_no_overflow |= (prolog_peeling > 0);
       vect_gen_vector_loop_niters (loop_vinfo, niters,
-                                   niters_vector, niters_no_overflow);
-      vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
-                                           &niters_vector_mult_vf);
+                                   niters_vector, step_vector,
+                                   niters_no_overflow);
+      if (!integer_onep (*step_vector))
+        {
+          /* On exit from the loop we will have an easy way of calculating
+             NITERS_VECTOR / STEP * STEP.  Install a dummy definition
+             until then.  */
+          niters_vector_mult_vf = make_ssa_name (TREE_TYPE (*niters_vector));
+          SSA_NAME_DEF_STMT (niters_vector_mult_vf) = gimple_build_nop ();
+          *niters_vector_mult_vf_var = niters_vector_mult_vf;
+        }
+      else
+        vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
+                                             &niters_vector_mult_vf);
       /* Update IVs of original loop as if they were advanced by
          niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-13 15:01:40.144777367 +0100
+++ gcc/tree-vect-loop.c	2017-10-13 15:01:40.296014347 +0100
@@ -7273,7 +7273,9 @@ vect_transform_loop (loop_vec_info loop_
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
   int i;
-  tree niters_vector = NULL;
+  tree niters_vector = NULL_TREE;
+  tree step_vector = NULL_TREE;
+  tree niters_vector_mult_vf = NULL_TREE;
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   bool grouped_store;
   bool slp_scheduled = false;
@@ -7342,17 +7344,21 @@ vect_transform_loop (loop_vec_info loop_
   LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
   tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
   bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
-  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
+  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
+                              &step_vector, &niters_vector_mult_vf, th,
                               check_profitability, niters_no_overflow);
   if (niters_vector == NULL_TREE)
     {
       if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
-        niters_vector
-          = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
-                           LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
+        {
+          niters_vector
+            = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
+                             LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
+          step_vector = build_one_cst (TREE_TYPE (niters));
+        }
       else
         vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
-                                     niters_no_overflow);
+                                     &step_vector, niters_no_overflow);
     }
 
   /* 1) Make sure the loop header has exactly two entries
@@ -7603,7 +7609,13 @@ vect_transform_loop (loop_vec_info loop_
         }                       /* stmts in BB */
     }                           /* BBs in loop */
 
-  slpeel_make_loop_iterate_ntimes (loop, niters_vector);
+  /* The vectorization factor is always > 1, so if we use an IV increment of 1,
+     a zero NITERS becomes a nonzero NITERS_VECTOR.  */
+  if (integer_onep (step_vector))
+    niters_no_overflow = true;
+  slpeel_make_loop_iterate_ntimes (loop, niters_vector, step_vector,
+                                   niters_vector_mult_vf,
+                                   !niters_no_overflow);
 
   scale_profile_for_vect_loop (loop, vf);
 
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-10-13 15:01:40.144777367 +0100
+++ gcc/tree-vectorizer.h	2017-10-13 15:01:40.296014347 +0100
@@ -1138,13 +1138,14 @@ vect_get_scalar_dr_size (struct data_ref
 
 /* Simple loop peeling and versioning utilities for vectorizer's purposes -
    in tree-vect-loop-manip.c.  */
-extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
+extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree, tree,
+                                             tree, bool);
 extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
 struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
                                                      struct loop *, edge);
 extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
 extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
-                                     tree *, int, bool, bool);
+                                     tree *, tree *, tree *, int, bool, bool);
 extern source_location find_loop_location (struct loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
@@ -1258,7 +1259,8 @@ extern gimple *vect_force_simple_reducti
 /* Drive for loop analysis stage.  */
 extern loop_vec_info vect_analyze_loop (struct loop *, loop_vec_info);
 extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
-extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *, bool);
+extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
+                                         tree *, bool);
 /* Drive for loop transformation stage.  */
 extern struct loop *vect_transform_loop (loop_vec_info);
 extern loop_vec_info vect_analyze_loop_form (struct loop *);