Message ID | CACgzC7A8cVyoyFQO24eOo9t0+UEuKcYtWdANiJvgukcg_P9baw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > Hi, > > For some large constant, ports like ARM, need one more instructions to > operate it. e.g > > #define MASK 0xfe00ff > void maskdata (int * data, int len) > { > int i = len; > for (; i > 0; i -= 2) > { > data[i] &= MASK; > data[i + 1] &= MASK; > } > } > > Need two instructions for each AND operation: > > and r3, r3, #16711935 > bic r3, r3, #65536 > > If we keep the MASK in a register, loop2_invariant pass can hoist it > out the loop. And it can be shared by different references. > > So the patch skips constant propagation if it makes INSN's cost higher. So cprop undos invariant motions work here? Should we make sure we add a REG_EQUAL note when not propagating? > Bootstrap and no make check regression on X86-64 and ARM Chrome book. > > OK for trunk? > > Thanks! > -Zhenqiang > > ChangeLog: > 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * cprop.c (try_replace_reg): Check cost for constants. > > diff --git a/gcc/cprop.c b/gcc/cprop.c > index aef3ee8..c9cf02a 100644 > --- a/gcc/cprop.c > +++ b/gcc/cprop.c > @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) > rtx src = 0; > int success = 0; > rtx set = single_set (insn); > + int old_cost = 0; > + bool copy_p = false; > + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); > + > + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) > + copy_p = true; > + else > + old_cost = set_rtx_cost (set, speed); Looks bogus for set == NULL? Also what about register pressure? I think this kind of change needs wider testing as RTX costs are usually not fully implemented and you introduce a new use kind (or is it already used elsewhere in this way to compute cost difference of a set with s/reg/const?). What kind of performance difference do you see? Thanks, Richard. > /* Usually we substitute easy stuff, so we won't copy everything. > We however need to take care to not duplicate non-trivial CONST > @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) > to = copy_rtx (to); > > validate_replace_src_group (from, to, insn); > + > + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. > + And it can be shared by different references. So skip propagation if > + it makes INSN's rtx cost higher. */ > + if (set && !copy_p && CONSTANT_P (to)) > + { > + int new_cost = set_rtx_cost (set, speed); > + if (new_cost > old_cost) > + { > + cancel_changes (0); > + return false; > + } > + } > + > if (num_changes_pending () && apply_change_group ()) > success = 1;
On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen > <zhenqiang.chen@linaro.org> wrote: >> Hi, >> >> For some large constant, ports like ARM, need one more instructions to >> operate it. e.g >> >> #define MASK 0xfe00ff >> void maskdata (int * data, int len) >> { >> int i = len; >> for (; i > 0; i -= 2) >> { >> data[i] &= MASK; >> data[i + 1] &= MASK; >> } >> } >> >> Need two instructions for each AND operation: >> >> and r3, r3, #16711935 >> bic r3, r3, #65536 >> >> If we keep the MASK in a register, loop2_invariant pass can hoist it >> out the loop. And it can be shared by different references. >> >> So the patch skips constant propagation if it makes INSN's cost higher. > > So cprop undos invariant motions work here? Yes. GLOBAL CONST-PROP will undo invariant motions. > Should we make sure we add a REG_EQUAL note when not propagating? Logs show there already has REG_EQUAL note. >> Bootstrap and no make check regression on X86-64 and ARM Chrome book. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * cprop.c (try_replace_reg): Check cost for constants. >> >> diff --git a/gcc/cprop.c b/gcc/cprop.c >> index aef3ee8..c9cf02a 100644 >> --- a/gcc/cprop.c >> +++ b/gcc/cprop.c >> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> rtx src = 0; >> int success = 0; >> rtx set = single_set (insn); >> + int old_cost = 0; >> + bool copy_p = false; >> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); >> + >> + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) >> + copy_p = true; >> + else >> + old_cost = set_rtx_cost (set, speed); > > Looks bogus for set == NULL? set_rtx_cost has checked it. If it is NULL, the function will return 0; > Also what about register pressure? Do you think it has big register pressure impact? I think it does not increase register pressure. > I think this kind of change needs wider testing as RTX costs are > usually not fully implemented and you introduce a new use kind > (or is it already used elsewhere in this way to compute cost > difference of a set with s/reg/const?). Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the decision. e.g. in function attempt_change of auto-inc-dec.c, it has code segments like: old_cost = (set_src_cost (mem, speed) + set_rtx_cost (PATTERN (inc_insn.insn), speed)); new_cost = set_src_cost (mem_tmp, speed); ... if (old_cost < new_cost) { ... return false; } The usage of RTX costs in this patch is similar. I had run X86-64 bootstrap and regression tests with --enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java And ARM bootstrap and regression tests with --enable-languages=c,c++,fortran,lto,objc,obj-c++ I will run tests on i686. What other tests do you think I have to run? > What kind of performance difference do you see? I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm backend changes). Coremark with some options show >10% performance improvement. dhrystone is a little better. Some wave in eembc, but overall result is better. I will run spec2000 on X86-64 and ARM, and back to you about the performance changes. Thanks! -Zhenqiang > Thanks, > Richard. > >> /* Usually we substitute easy stuff, so we won't copy everything. >> We however need to take care to not duplicate non-trivial CONST >> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) >> to = copy_rtx (to); >> >> validate_replace_src_group (from, to, insn); >> + >> + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. >> + And it can be shared by different references. So skip propagation if >> + it makes INSN's rtx cost higher. */ >> + if (set && !copy_p && CONSTANT_P (to)) >> + { >> + int new_cost = set_rtx_cost (set, speed); >> + if (new_cost > old_cost) >> + { >> + cancel_changes (0); >> + return false; >> + } >> + } >> + >> if (num_changes_pending () && apply_change_group ()) >> success = 1;
diff --git a/gcc/cprop.c b/gcc/cprop.c index aef3ee8..c9cf02a 100644 --- a/gcc/cprop.c +++ b/gcc/cprop.c @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn) rtx src = 0; int success = 0; rtx set = single_set (insn); + int old_cost = 0; + bool copy_p = false; + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); + + if (set && SET_SRC (set) && REG_P (SET_SRC (set))) + copy_p = true; + else + old_cost = set_rtx_cost (set, speed); /* Usually we substitute easy stuff, so we won't copy everything. We however need to take care to not duplicate non-trivial CONST @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn) to = copy_rtx (to); validate_replace_src_group (from, to, insn); + + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop. + And it can be shared by different references. So skip propagation if + it makes INSN's rtx cost higher. */ + if (set && !copy_p && CONSTANT_P (to)) + { + int new_cost = set_rtx_cost (set, speed); + if (new_cost > old_cost) + { + cancel_changes (0); + return false; + } + } + if (num_changes_pending () && apply_change_group ()) success = 1;