Message ID | 1441724029-3124-1-git-send-email-james.greenhalgh@arm.com |
---|---|
State | New |
Headers | show |
On 09/08/2015 04:53 PM, James Greenhalgh wrote: > One big question I have with this patch is how I ought to write a meaningful > cost model I've used. It seems like yet another misuse of RTX costs, and > another bit of stuff for targets to carefully balance. Now, if the > relative cost of branches and conditional move instructions is not > carefully managed, you may enable or disable these optimisations. This is > probably acceptable, but I dislike adding more and more gotcha's to > target costs, as I get bitten by them hard enough as is! The code you have seems reasonable, except that for compile time it might make sense to not even attempt the optimization if the number of sets is too large. I'm not too worried about that, but maybe you could bail out early if your cost estimate goes too much above the branch cost. > + /* If we were supposed to read from an earlier write in this block, > + we've changed the register allocation. Rewire the read. While > + we are looking, also try to catch a swap idiom. */ So this is one interesting case; do you also have to worry about others (such as maybe setting the same register multiple times)? > + /* We must have seen some sort of insn to insert, otherwise we were > + given an empty BB to convert, and we can't handle that. */ > + if (unmodified_insns.is_empty ()) > + { > + end_sequence (); > + return FALSE; > + } Looks like some of the error conditions are tested twice across the two new functions? I think it would be better to get rid of one copy or turn the second one into a gcc_assert. > No testcase provided, as currently I don't know of targets with a high > enough branch cost to actually trigger the optimisation. Hmm, so the code would not actually be used right now? In that case I'll leave it to others to decide whether we want to apply it. Other than the points above it looks OK to me. Bernd
On 09/10/2015 12:23 PM, Bernd Schmidt wrote: > > > No testcase provided, as currently I don't know of targets with a high > > enough branch cost to actually trigger the optimisation. > > Hmm, so the code would not actually be used right now? In that case I'll > leave it to others to decide whether we want to apply it. Other than the > points above it looks OK to me. Some targets have -mbranch-cost to allow overriding the default costing. visium has a branch cost of 10! Several ports have a cost of 6 either unconditionally or when the branch is not well predicted. Presumably James is more interested in the ARM/AArch64 targets ;-) I think that's probably what James is most interested in getting some ideas around -- the cost model. I think the fundamental problem is BRANCH_COST isn't actually relative to anything other than the default value of "1". It doesn't directly correspond to COSTS_N_INSNS or anything else. So while using COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually doesn't. It's not even clear how a value of 10 relates to a value of 1 other than it's more expensive. ifcvt (and others) comparing to magic #s is more than a bit lame. But with BRANCH_COST having no meaning relative to anything else I can see why Richard did things that way. In an ideal world we'd find some mapping from BRANCH_COST that relates to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist and we'll likely regress targets with any simplistic mapping. But maybe now is the time to address that fundamental problem and deal with the fallout. jeff
On 10/09/15 22:11, Jeff Law wrote: > On 09/10/2015 12:23 PM, Bernd Schmidt wrote: >> > No testcase provided, as currently I don't know of targets with a high >> > enough branch cost to actually trigger the optimisation. >> >> Hmm, so the code would not actually be used right now? In that case I'll >> leave it to others to decide whether we want to apply it. Other than the >> points above it looks OK to me. > Some targets have -mbranch-cost to allow overriding the default costing. > visium has a branch cost of 10! Several ports have a cost of 6 either > unconditionally or when the branch is not well predicted. > > Presumably James is more interested in the ARM/AArch64 targets ;-) > > I think that's probably what James is most interested in getting some > ideas around -- the cost model. > > I think the fundamental problem is BRANCH_COST isn't actually relative > to anything other than the default value of "1". It doesn't directly > correspond to COSTS_N_INSNS or anything else. So while using > COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > doesn't. It's not even clear how a value of 10 relates to a value of 1 > other than it's more expensive. > > ifcvt (and others) comparing to magic #s is more than a bit lame. But > with BRANCH_COST having no meaning relative to anything else I can see > why Richard did things that way. Out of interest, what was the intended original meaning of branch costs if it was not to be relative to instructions? Thanks, Kyrill > In an ideal world we'd find some mapping from BRANCH_COST that relates > to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > and we'll likely regress targets with any simplistic mapping. But maybe > now is the time to address that fundamental problem and deal with the > fallout. > > jeff > >
On 09/10/2015 11:11 PM, Jeff Law wrote: > I think that's probably what James is most interested in getting some > ideas around -- the cost model. > > I think the fundamental problem is BRANCH_COST isn't actually relative > to anything other than the default value of "1". It doesn't directly > correspond to COSTS_N_INSNS or anything else. So while using > COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > doesn't. It's not even clear how a value of 10 relates to a value of 1 > other than it's more expensive. > > ifcvt (and others) comparing to magic #s is more than a bit lame. But > with BRANCH_COST having no meaning relative to anything else I can see > why Richard did things that way. > > In an ideal world we'd find some mapping from BRANCH_COST that relates > to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > and we'll likely regress targets with any simplistic mapping. But maybe > now is the time to address that fundamental problem and deal with the > fallout. I think the right approach if we want to fix this is a new branch_cost_ninsns target hook (maybe with arguments taken_percentage, predictability), and gradually move everything to use that instead of BRANCH_COST. Bernd
On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote: > On 09/10/2015 11:11 PM, Jeff Law wrote: > >I think that's probably what James is most interested in getting some > >ideas around -- the cost model. > > > >I think the fundamental problem is BRANCH_COST isn't actually relative > >to anything other than the default value of "1". It doesn't directly > >correspond to COSTS_N_INSNS or anything else. So while using > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > >doesn't. It's not even clear how a value of 10 relates to a value of 1 > >other than it's more expensive. > > > >ifcvt (and others) comparing to magic #s is more than a bit lame. But > >with BRANCH_COST having no meaning relative to anything else I can see > >why Richard did things that way. > > > >In an ideal world we'd find some mapping from BRANCH_COST that relates > >to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > >and we'll likely regress targets with any simplistic mapping. But maybe > >now is the time to address that fundamental problem and deal with the > >fallout. > > I think the right approach if we want to fix this is a new > branch_cost_ninsns target hook (maybe with arguments > taken_percentage, predictability), and gradually move everything to > use that instead of BRANCH_COST. Perhaps providing backends with the entire if-then-else block along with the above mentioned information being if converted may be another approach, it allows the backends to analyse what cases are good to if-convert as per the ISA or micro-architecture and what aren't. regards Ramana
On Fri, Sep 11, 2015 at 10:04:12AM +0100, Ramana Radhakrishnan wrote: > On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote: > > On 09/10/2015 11:11 PM, Jeff Law wrote: > > >I think that's probably what James is most interested in getting some > > >ideas around -- the cost model. > > > > > >I think the fundamental problem is BRANCH_COST isn't actually relative > > >to anything other than the default value of "1". It doesn't directly > > >correspond to COSTS_N_INSNS or anything else. So while using > > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > > >doesn't. It's not even clear how a value of 10 relates to a value of 1 > > >other than it's more expensive. > > > > > >ifcvt (and others) comparing to magic #s is more than a bit lame. But > > >with BRANCH_COST having no meaning relative to anything else I can see > > >why Richard did things that way. > > > > > >In an ideal world we'd find some mapping from BRANCH_COST that relates > > >to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > > >and we'll likely regress targets with any simplistic mapping. But maybe > > >now is the time to address that fundamental problem and deal with the > > >fallout. > > > > I think the right approach if we want to fix this is a new > > branch_cost_ninsns target hook (maybe with arguments > > taken_percentage, predictability), and gradually move everything to > > use that instead of BRANCH_COST. > > Perhaps providing backends with the entire if-then-else block along > with the above mentioned information being if converted may be another > approach, it allows the backends to analyse what cases are good to > if-convert as per the ISA or micro-architecture and what aren't. I'm not sure how much of this is likely to be target-dependent and how much can just be abstracted to common ifcvt code resuing rtx_costs. I've been sketching out a rough idea of a more applicable cost model for RTL ifcvt, taking in to consideration what David mentioned regarding the talks at cauldron. The question we want to ask is: Which is preferable between: Before: (COSTS_N_INSNS cost of the compare+branch insns at the tail of the if BB. ??? (possibly) some factor related to BRANCH_COST) + weighted cost of then BB. + (if needed) weighted cost of else BB. After: seq_cost the candidate new sequence. The weighted cost of the two BBs should mix in some idea as to the relative probability that we execute them. The tough part is figuring out how to (reasonably) factor in branch cost. The reason that is tough is that BRANCH_COST is used inconsistently. Normally it is not measured relative to anything, but is compared against magic numbers for optimizations (each of which are really their own question to be posed as above). I don't have a good answer to that, nor a good answer as to what BRANCH_COST should represent in future. The use in the compiler is sort-of consistent with a measurement against instruction counts (i.e. a branch cost of 3 means a branch is equivalent to 3 cheap instructions), but is sometimes just used as a measure of expensive (a branch cost of >= 2 means that abs should be expanded using a sequence of bit operations). I'll look in to how the code in ifcvt starts to look with a modified cost model and get back to you... James
On 09/11/2015 02:49 AM, Kyrill Tkachov wrote: > > On 10/09/15 22:11, Jeff Law wrote: >> On 09/10/2015 12:23 PM, Bernd Schmidt wrote: >>> > No testcase provided, as currently I don't know of targets with a >>> high >>> > enough branch cost to actually trigger the optimisation. >>> >>> Hmm, so the code would not actually be used right now? In that case I'll >>> leave it to others to decide whether we want to apply it. Other than the >>> points above it looks OK to me. >> Some targets have -mbranch-cost to allow overriding the default costing. >> visium has a branch cost of 10! Several ports have a cost of 6 either >> unconditionally or when the branch is not well predicted. >> >> Presumably James is more interested in the ARM/AArch64 targets ;-) >> >> I think that's probably what James is most interested in getting some >> ideas around -- the cost model. >> >> I think the fundamental problem is BRANCH_COST isn't actually relative >> to anything other than the default value of "1". It doesn't directly >> correspond to COSTS_N_INSNS or anything else. So while using >> COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually >> doesn't. It's not even clear how a value of 10 relates to a value of 1 >> other than it's more expensive. >> >> ifcvt (and others) comparing to magic #s is more than a bit lame. But >> with BRANCH_COST having no meaning relative to anything else I can see >> why Richard did things that way. > > Out of interest, what was the intended original meaning > of branch costs if it was not to be relative to instructions? I don't think it ever had one. It's self-relative. A cost of 2 is greater than a cost of 1. No more, no less IIRC. Lame? Yes. Short-sighted? Yes. Should we try to fix it. Yes. If you look at how BRANCH_COST actually gets used, AFAIK it's tested only against "magic constants", which are themselves lame, short-sighted and need to be fixed. jeff
> Some targets have -mbranch-cost to allow overriding the default costing. > visium has a branch cost of 10! Yeah, the GR5 variant is pipelined but has no branch prediction; moreover there is an additional adverse effect coming for the instructions bus... > Several ports have a cost of 6 either unconditionally or when the branch > is not well predicted. 9 for UltraSPARC3, although this should probably be lowered if predictable_p.
Hi, In relation to the patch I put up for review a few weeks ago to teach RTL if-convert to handle multiple sets in a basic block [1], I was asking about a sensible cost model to use. There was some consensus at Cauldron that what should be done in this situation is to introduce a target hook that delegates answering the question to the target. This patch series introduces that new target hook to provide cost decisions for the RTL ifcvt pass. The idea is to give the target full visibility of the proposed transformation, and allow it to respond as to whether if-conversion in that way is profitable. In order to preserve current behaviour across targets, we will need the default implementation to keep to the strategy of simply comparing branch cost against a magic number. Patch 1/3 performs this refactoring, which is a bit hairy in some corner cases. Patch 2/3 is a simple code move, pulling the definition of the if_info structure used by RTL if-convert in to ifcvt.h where it can be included by targets. Patch 3/3 then introduces the new target hook, with the same default behaviour as was previously in noce_is_profitable_p. The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and I've verified with Spec2000 and Spec2006 runs that there are no code generation differences for any of these three targets after the patch. I also gave ultrasparc3 a quick go, from what I could see, I changed the register allocation for the floating-point condition code registers. Presumably this is a side effect of first constructing RTXen that I then discard. I didn't see anything which looked like more frequent reloads or substantial code generation changes, though I'm not familiar with the intricacies of the Sparc condition registers :). I've included a patch 4/3, to give an example of what a target might want to do with this hook. It needs work for tuning and deciding how the function should actually behave, but works if it is thought of as more of a strawman/prototype than a patch submission. Are parts 1, 2 and 3 OK? Thanks, James [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html --- [Patch ifcvt 1/3] Factor out cost calculations from noce cases 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> * ifcvt.c (noce_if_info): Add a magic_number field :-(. (noce_is_profitable_p): New. (noce_try_store_flag_constants): Move cost calculation to after sequence generation, factor it out to noce_is_profitable_p. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_cmove_arith): Likewise. (noce_try_sign_mask): Add comment regarding cost calculations. [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> * ifcvt.c (noce_if_info): Move to... * ifcvt.h (noce_if_info): ...Here. [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> * target.def (costs): New hook vector. (ifcvt_noce_profitable_p): New hook. * doc/tm.texi.in: Document it. * doc/tm.texi: Regenerate. * targhooks.h (default_ifcvt_noce_profitable_p): New. * targhooks.c (default_ifcvt_noce_profitable_p): New. * ifcvt.c (noce_profitable_p): Use new target hook. [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> * config/aarch64/aarch64.c (aarch64_additional_branch_cost_for_probability): New. (aarch64_ifcvt_noce_profitable_p): Likewise. (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh <james.greenhalgh@arm.com> wrote: > Hi, > > In relation to the patch I put up for review a few weeks ago to teach > RTL if-convert to handle multiple sets in a basic block [1], I was > asking about a sensible cost model to use. There was some consensus at > Cauldron that what should be done in this situation is to introduce a > target hook that delegates answering the question to the target. Err - the consensus was to _not_ add gazillion of special target hooks but instead enhance what we have with rtx_cost so that passes can rely on comparing before and after costs of a sequence of insns. Richard. > This patch series introduces that new target hook to provide cost > decisions for the RTL ifcvt pass. > > The idea is to give the target full visibility of the proposed > transformation, and allow it to respond as to whether if-conversion in that > way is profitable. > > In order to preserve current behaviour across targets, we will need the > default implementation to keep to the strategy of simply comparing branch > cost against a magic number. Patch 1/3 performs this refactoring, which is > a bit hairy in some corner cases. > > Patch 2/3 is a simple code move, pulling the definition of the if_info > structure used by RTL if-convert in to ifcvt.h where it can be included > by targets. > > Patch 3/3 then introduces the new target hook, with the same default > behaviour as was previously in noce_is_profitable_p. > > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and > I've verified with Spec2000 and Spec2006 runs that there are no code > generation differences for any of these three targets after the patch. > > I also gave ultrasparc3 a quick go, from what I could see, I changed the > register allocation for the floating-point condition code registers. > Presumably this is a side effect of first constructing RTXen that I then > discard. I didn't see anything which looked like more frequent reloads or > substantial code generation changes, though I'm not familiar with the > intricacies of the Sparc condition registers :). > > I've included a patch 4/3, to give an example of what a target might want > to do with this hook. It needs work for tuning and deciding how the function > should actually behave, but works if it is thought of as more of a > strawman/prototype than a patch submission. > > Are parts 1, 2 and 3 OK? > > Thanks, > James > > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html > > --- > [Patch ifcvt 1/3] Factor out cost calculations from noce cases > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > * ifcvt.c (noce_if_info): Add a magic_number field :-(. > (noce_is_profitable_p): New. > (noce_try_store_flag_constants): Move cost calculation > to after sequence generation, factor it out to noce_is_profitable_p. > (noce_try_addcc): Likewise. > (noce_try_store_flag_mask): Likewise. > (noce_try_cmove): Likewise. > (noce_try_cmove_arith): Likewise. > (noce_try_sign_mask): Add comment regarding cost calculations. > > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > * ifcvt.c (noce_if_info): Move to... > * ifcvt.h (noce_if_info): ...Here. > > [Patch ifcvt 3/3] Create a new target hook for deciding profitability > of noce if-conversion > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > * target.def (costs): New hook vector. > (ifcvt_noce_profitable_p): New hook. > * doc/tm.texi.in: Document it. > * doc/tm.texi: Regenerate. > * targhooks.h (default_ifcvt_noce_profitable_p): New. > * targhooks.c (default_ifcvt_noce_profitable_p): New. > * ifcvt.c (noce_profitable_p): Use new target hook. > > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs > hook for AArch64 > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > * config/aarch64/aarch64.c > (aarch64_additional_branch_cost_for_probability): New. > (aarch64_ifcvt_noce_profitable_p): Likewise. > (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote: > On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh > <james.greenhalgh@arm.com> wrote: > > Hi, > > > > In relation to the patch I put up for review a few weeks ago to teach > > RTL if-convert to handle multiple sets in a basic block [1], I was > > asking about a sensible cost model to use. There was some consensus at > > Cauldron that what should be done in this situation is to introduce a > > target hook that delegates answering the question to the target. > > Err - the consensus was to _not_ add gazillion of special target hooks > but instead enhance what we have with rtx_cost so that passes can > rely on comparing before and after costs of a sequence of insns. Ah, I was not able to attend Cauldron this year, so I was trying to pick out "consensus" from the video. Rewatching it now, I see a better phrase would be "suggestion with some support". Watching the video a second time, it seems your proposal is that we improve the RTX costs infrastructure to handle sequences of Gimple/RTX. That would get us some way to making a smart decision in if-convert, but I'm not convinced it allows us to answer the question we are interested in. We have the rtx for before and after, and we can generate costs for these sequences. This allows us to calculate some weighted cost of the instructions based on the calculated probabilities that each block is executed. However, we are missing information on how expensive the branch is, and we have no way to get that through an RTX-costs infrastructure. We could add a hook to give a cost in COSTS_N_INSNS units to a branch based on its predictability. This is difficult as COSTS_N_INSNS units can differ depending on whether you are talking about floating-point or integer code. By this I mean, the compiler considers a SET which costs more than COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost of both an integer SET and a floating-point SET to both be COSTS_N_INSNS (1). In reality, these instructions may have different latency performance characteristics. What real world quantity are we trying to invoke when we say a branch costs the same as 3 SET instructions of any type? It certainly isn't mispredict penalty (likely measured in cycles, not relative to the cost of a SET instruction, which may well be completely free on modern x86 processors), nor is it the cost of executing the branch instruction which is often constant to resolve regardless of predicted/mispredicted status. On the other side of the equation, we want a cost for the converted sequence. We can build a cost of the generated rtl sequence, but for targets like AArch64 this is going to be wildly off. AArch64 will expand (a > b) ? x : y; as a set to the CC register, followed by a conditional move based on the CC register. Consequently, where we have multiple sets back to back we end up with: set CC (a > b) set x1 (CC ? x : y) set CC (a > b) set x2 (CC ? x : z) set CC (a > b) set x3 (CC ? x : k) Which we know will be simplified later to: set CC (a > b) set x1 (CC ? x : y) set x2 (CC ? x : z) set x3 (CC ? x : k) I imagine other targets have something similar in their expansion of mov<mode>cc (though I haven't looked). Our comparison for if-conversion then must be: weighted_old_cost = (taken_probability * (then_bb_cost) - (1 - taken_probability) * (else_bb_cost)); branch_cost = branch_cost_in_insns (taken_probability) weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence) profitable = weighted_new_cost <= weighted_old_cost + branch_cost And we must define: branch_cost_in_insns (taken_probability) redundancy_factor (new_sequence) At that point, I feel you are better giving the entire sequence to the target and asking it to implement whatever logic is needed to return a profitable/unprofitable analysis of the transformation. The "redundancy_factor" in particular is pretty tough to define in a way which makes sense outside of if_convert, without adding some pretty detailed analysis to decide what might or might not be eliminated by later passes. The alternative is to weight the other side of the equation by tuning the cost of branch_cost_in_insns high. This only serves to increase the disconnect between a real-world cost and a number to tweak to game code generation. If you have a different way of phrasing the if-conversion question that avoids the two very specific hooks, I'd be happy to try taking the patches in that direction. I don't see a way to implement this as just queries to a costing function which does not need substantial target and pass dependent tweaking to make behave correctly. Thanks, James > > This patch series introduces that new target hook to provide cost > > decisions for the RTL ifcvt pass. > > > > The idea is to give the target full visibility of the proposed > > transformation, and allow it to respond as to whether if-conversion in that > > way is profitable. > > > > In order to preserve current behaviour across targets, we will need the > > default implementation to keep to the strategy of simply comparing branch > > cost against a magic number. Patch 1/3 performs this refactoring, which is > > a bit hairy in some corner cases. > > > > Patch 2/3 is a simple code move, pulling the definition of the if_info > > structure used by RTL if-convert in to ifcvt.h where it can be included > > by targets. > > > > Patch 3/3 then introduces the new target hook, with the same default > > behaviour as was previously in noce_is_profitable_p. > > > > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and > > I've verified with Spec2000 and Spec2006 runs that there are no code > > generation differences for any of these three targets after the patch. > > > > I also gave ultrasparc3 a quick go, from what I could see, I changed the > > register allocation for the floating-point condition code registers. > > Presumably this is a side effect of first constructing RTXen that I then > > discard. I didn't see anything which looked like more frequent reloads or > > substantial code generation changes, though I'm not familiar with the > > intricacies of the Sparc condition registers :). > > > > I've included a patch 4/3, to give an example of what a target might want > > to do with this hook. It needs work for tuning and deciding how the function > > should actually behave, but works if it is thought of as more of a > > strawman/prototype than a patch submission. > > > > Are parts 1, 2 and 3 OK? > > > > Thanks, > > James > > > > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html > > > > --- > > [Patch ifcvt 1/3] Factor out cost calculations from noce cases > > > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > > > * ifcvt.c (noce_if_info): Add a magic_number field :-(. > > (noce_is_profitable_p): New. > > (noce_try_store_flag_constants): Move cost calculation > > to after sequence generation, factor it out to noce_is_profitable_p. > > (noce_try_addcc): Likewise. > > (noce_try_store_flag_mask): Likewise. > > (noce_try_cmove): Likewise. > > (noce_try_cmove_arith): Likewise. > > (noce_try_sign_mask): Add comment regarding cost calculations. > > > > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h > > > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > > > * ifcvt.c (noce_if_info): Move to... > > * ifcvt.h (noce_if_info): ...Here. > > > > [Patch ifcvt 3/3] Create a new target hook for deciding profitability > > of noce if-conversion > > > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > > > * target.def (costs): New hook vector. > > (ifcvt_noce_profitable_p): New hook. > > * doc/tm.texi.in: Document it. > > * doc/tm.texi: Regenerate. > > * targhooks.h (default_ifcvt_noce_profitable_p): New. > > * targhooks.c (default_ifcvt_noce_profitable_p): New. > > * ifcvt.c (noce_profitable_p): Use new target hook. > > > > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs > > hook for AArch64 > > > > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> > > > > * config/aarch64/aarch64.c > > (aarch64_additional_branch_cost_for_probability): New. > > (aarch64_ifcvt_noce_profitable_p): Likewise. > > (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise. >
On Sep 29, 2015, at 7:31 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote: > On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote: >> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh >> <james.greenhalgh@arm.com> wrote: >>> >>> In relation to the patch I put up for review a few weeks ago to teach >>> RTL if-convert to handle multiple sets in a basic block [1], I was >>> asking about a sensible cost model to use. There was some consensus at >>> Cauldron that what should be done in this situation is to introduce a >>> target hook that delegates answering the question to the target. >> >> Err - the consensus was to _not_ add gazillion of special target hooks >> but instead enhance what we have with rtx_cost so that passes can >> rely on comparing before and after costs of a sequence of insns. > > Ah, I was not able to attend Cauldron this year, so I was trying to pick out > "consensus" from the video. Rewatching it now, I see a better phrase would > be "suggestion with some support”. I’m not a big fan of rtx_cost. To me it feels more like a crude, sledge hammer. Now, that is the gcc way, we have a ton of these things, but would be nice to refine the tools so that the big escape hatch isn’t used as often and we have more finer grained ways of doing things. rtx_cost should be what a code-generator generates with most new ports when they use the nice api to do a port. The old sledge hammer wielding ports may well always define rtx_cost themselves, but, we should shoot for something better. As a concrete example, I now have a code-generator for enum reg_class, N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES, FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES, REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and oh my, it is so much nicer to user than the original api. If you only ever have to write once these things, fine, but, if you develop and prototype CPUs, the existing interface is, well, less than ideal. I can do things like: gccrclass rc_gprs = “GENERAL”; r gpr[] = { rc_gprs, Fixed, Used, "$zero", "$sp", "$fp", "$lr" }; r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"), "$sav1", "$sav2", "$sav3", "$sav4”, and get all the other goop I need for free. I’d encourage people to find a way to do up an rtx_cost generator. If you're a port maintainer, and want to redo your port to use a nicer api to do the registers, let me know. I’d love to see progress made to rid gcc of the old crappy apis.
On Tue, Sep 29, 2015 at 9:23 PM, Mike Stump <mikestump@comcast.net> wrote: > On Sep 29, 2015, at 7:31 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote: >> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote: >>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh >>> <james.greenhalgh@arm.com> wrote: >>>> >>>> In relation to the patch I put up for review a few weeks ago to teach >>>> RTL if-convert to handle multiple sets in a basic block [1], I was >>>> asking about a sensible cost model to use. There was some consensus at >>>> Cauldron that what should be done in this situation is to introduce a >>>> target hook that delegates answering the question to the target. >>> >>> Err - the consensus was to _not_ add gazillion of special target hooks >>> but instead enhance what we have with rtx_cost so that passes can >>> rely on comparing before and after costs of a sequence of insns. >> >> Ah, I was not able to attend Cauldron this year, so I was trying to pick out >> "consensus" from the video. Rewatching it now, I see a better phrase would >> be "suggestion with some support”. > > I’m not a big fan of rtx_cost. To me it feels more like a crude, sledge hammer. Now, that is the gcc way, we have a ton of these things, but would be nice to refine the tools so that the big escape hatch isn’t used as often and we have more finer grained ways of doing things. rtx_cost should be what a code-generator generates with most new ports when they use the nice api to do a port. The old sledge hammer wielding ports may well always define rtx_cost themselves, but, we should shoot for something better. > > As a concrete example, I now have a code-generator for enum reg_class, N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES, FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES, REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and oh my, it is so much nicer to user than the original api. If you only ever have to write once these things, fine, but, if you develop and prototype CPUs, the existing interface is, well, less than ideal. I can do things like: > > gccrclass > rc_gprs = “GENERAL”; > > r gpr[] = { rc_gprs, Fixed, Used, > "$zero", "$sp", "$fp", "$lr" }; > r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"), > "$sav1", "$sav2", "$sav3", "$sav4”, > > and get all the other goop I need for free. I’d encourage people to find a way to do up an rtx_cost generator. If you're a port maintainer, and want to redo your port to use a nicer api to do the registers, let me know. I’d love to see progress made to rid gcc of the old crappy apis. I agree that rtx_cost isn't the nicest thing either. But adding hooks like my_transform_proftable_p (X *) with 'X' being pass private data structure isn't very great design either. In fact it's worse IMHO. Richard.
On Tue, Sep 29, 2015 at 4:31 PM, James Greenhalgh <james.greenhalgh@arm.com> wrote: > On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote: >> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh >> <james.greenhalgh@arm.com> wrote: >> > Hi, >> > >> > In relation to the patch I put up for review a few weeks ago to teach >> > RTL if-convert to handle multiple sets in a basic block [1], I was >> > asking about a sensible cost model to use. There was some consensus at >> > Cauldron that what should be done in this situation is to introduce a >> > target hook that delegates answering the question to the target. >> >> Err - the consensus was to _not_ add gazillion of special target hooks >> but instead enhance what we have with rtx_cost so that passes can >> rely on comparing before and after costs of a sequence of insns. > > Ah, I was not able to attend Cauldron this year, so I was trying to pick out > "consensus" from the video. Rewatching it now, I see a better phrase would > be "suggestion with some support". > > Watching the video a second time, it seems your proposal is that we improve > the RTX costs infrastructure to handle sequences of Gimple/RTX. That would > get us some way to making a smart decision in if-convert, but I'm not > convinced it allows us to answer the question we are interested in. > > We have the rtx for before and after, and we can generate costs for these > sequences. This allows us to calculate some weighted cost of the > instructions based on the calculated probabilities that each block is > executed. However, we are missing information on how expensive the branch > is, and we have no way to get that through an RTX-costs infrastructure. Yeah, though during the meeting at the Cauldron I was asking on whether we maybe want a replacement_cost hook that can assume the to-be-replaced sequence is in the IL thus the hook can inspect insn_bb and thus get at branch costs ... Surely the proposed rtx_cost infrastructure enhancements will not cover all cases so the thing I wanted to throw in was that there was _not_ consensus that cost should be computed by pass specific target hooks that allow the target to inspect pass private data. Because that's a maintainance nightmare if you change a pass and have to second guess 50 targets cost hook implementations. > We could add a hook to give a cost in COSTS_N_INSNS units to a branch based > on its predictability. This is difficult as COSTS_N_INSNS units can differ > depending on whether you are talking about floating-point or integer code. Yes, which is why I suggested a replacement cost ... Of course the question is what you feed that hook as in principle it would be nice to avoid building the replacement RTXen until we know doing that will be necessary (the replacement is profitable). Maybe as soon as CFG changes are involved we need to think about adding a BB frequency to the hook though factoring in that can be done in the passes. What is the interesting part for the target is probably the cost of the branch itself as that can vary depending on predictability (which is usually very hard to assess at compile-time anyway). So what about a branch_cost hook that takes taken/not-taken probabilities as argument? Note that the agreement during the discussion was that all costs need to be comparable. > By this I mean, the compiler considers a SET which costs more than > COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost > of both an integer SET and a floating-point SET to both be COSTS_N_INSNS (1). > In reality, these instructions may have different latency performance > characteristics. What real world quantity are we trying to invoke when we > say a branch costs the same as 3 SET instructions of any type? It certainly > isn't mispredict penalty (likely measured in cycles, not relative to the cost > of a SET instruction, which may well be completely free on modern x86 > processors), nor is it the cost of executing the branch instruction which > is often constant to resolve regardless of predicted/mispredicted status. > > On the other side of the equation, we want a cost for the converted > sequence. We can build a cost of the generated rtl sequence, but for > targets like AArch64 this is going to be wildly off. AArch64 will expand > (a > b) ? x : y; as a set to the CC register, followed by a conditional > move based on the CC register. Consequently, where we have multiple sets > back to back we end up with: > > set CC (a > b) > set x1 (CC ? x : y) > set CC (a > b) > set x2 (CC ? x : z) > set CC (a > b) > set x3 (CC ? x : k) > > Which we know will be simplified later to: > > set CC (a > b) > set x1 (CC ? x : y) > set x2 (CC ? x : z) > set x3 (CC ? x : k) > > I imagine other targets have something similar in their expansion of > mov<mode>cc (though I haven't looked). > > Our comparison for if-conversion then must be: > > weighted_old_cost = (taken_probability * (then_bb_cost) > - (1 - taken_probability) * (else_bb_cost)); > branch_cost = branch_cost_in_insns (taken_probability) > weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence) > > profitable = weighted_new_cost <= weighted_old_cost + branch_cost > > And we must define: > > branch_cost_in_insns (taken_probability) > redundancy_factor (new_sequence) > > At that point, I feel you are better giving the entire sequence to the > target and asking it to implement whatever logic is needed to return a > profitable/unprofitable analysis of the transformation. Sure, that was what was suggested at the Cauldron - rtx_cost on individual insns (rtx cost doesn't even work on that level usually!) isn't coarse enough. We're C++ now so we'd pass the hook an iterator which it can use to iterate over the insns it should compute the cost of. > The "redundancy_factor" in particular is pretty tough to define in a way > which makes sense outside of if_convert, without adding some pretty > detailed analysis to decide what might or might not be eliminated by > later passes. The alternative is to weight the other side of the equation > by tuning the cost of branch_cost_in_insns high. This only serves to increase > the disconnect between a real-world cost and a number to tweak to game > code generation. > > If you have a different way of phrasing the if-conversion question that > avoids the two very specific hooks, I'd be happy to try taking the patches > in that direction. I don't see a way to implement this as just queries to > a costing function which does not need substantial target and pass > dependent tweaking to make behave correctly. From the patch I can't even see what the question is ;) I only see "is the transform profitable". I saw from your hook implementation that it's basically a set of magic numbers. So I suggested to make those numbers target configurable instead. IMHO we shouldn't accept new cost hooks that are only implemented by a single target. Instead it should be proved that the hook can improve code on multiple targets. Richard. > Thanks, > James > >> > This patch series introduces that new target hook to provide cost >> > decisions for the RTL ifcvt pass. >> > >> > The idea is to give the target full visibility of the proposed >> > transformation, and allow it to respond as to whether if-conversion in that >> > way is profitable. >> > >> > In order to preserve current behaviour across targets, we will need the >> > default implementation to keep to the strategy of simply comparing branch >> > cost against a magic number. Patch 1/3 performs this refactoring, which is >> > a bit hairy in some corner cases. >> > >> > Patch 2/3 is a simple code move, pulling the definition of the if_info >> > structure used by RTL if-convert in to ifcvt.h where it can be included >> > by targets. >> > >> > Patch 3/3 then introduces the new target hook, with the same default >> > behaviour as was previously in noce_is_profitable_p. >> > >> > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and >> > I've verified with Spec2000 and Spec2006 runs that there are no code >> > generation differences for any of these three targets after the patch. >> > >> > I also gave ultrasparc3 a quick go, from what I could see, I changed the >> > register allocation for the floating-point condition code registers. >> > Presumably this is a side effect of first constructing RTXen that I then >> > discard. I didn't see anything which looked like more frequent reloads or >> > substantial code generation changes, though I'm not familiar with the >> > intricacies of the Sparc condition registers :). >> > >> > I've included a patch 4/3, to give an example of what a target might want >> > to do with this hook. It needs work for tuning and deciding how the function >> > should actually behave, but works if it is thought of as more of a >> > strawman/prototype than a patch submission. >> > >> > Are parts 1, 2 and 3 OK? >> > >> > Thanks, >> > James >> > >> > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html >> > >> > --- >> > [Patch ifcvt 1/3] Factor out cost calculations from noce cases >> > >> > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> >> > >> > * ifcvt.c (noce_if_info): Add a magic_number field :-(. >> > (noce_is_profitable_p): New. >> > (noce_try_store_flag_constants): Move cost calculation >> > to after sequence generation, factor it out to noce_is_profitable_p. >> > (noce_try_addcc): Likewise. >> > (noce_try_store_flag_mask): Likewise. >> > (noce_try_cmove): Likewise. >> > (noce_try_cmove_arith): Likewise. >> > (noce_try_sign_mask): Add comment regarding cost calculations. >> > >> > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h >> > >> > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> >> > >> > * ifcvt.c (noce_if_info): Move to... >> > * ifcvt.h (noce_if_info): ...Here. >> > >> > [Patch ifcvt 3/3] Create a new target hook for deciding profitability >> > of noce if-conversion >> > >> > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> >> > >> > * target.def (costs): New hook vector. >> > (ifcvt_noce_profitable_p): New hook. >> > * doc/tm.texi.in: Document it. >> > * doc/tm.texi: Regenerate. >> > * targhooks.h (default_ifcvt_noce_profitable_p): New. >> > * targhooks.c (default_ifcvt_noce_profitable_p): New. >> > * ifcvt.c (noce_profitable_p): Use new target hook. >> > >> > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs >> > hook for AArch64 >> > >> > 2015-09-26 James Greenhalgh <james.greenhalgh@arm.com> >> > >> > * config/aarch64/aarch64.c >> > (aarch64_additional_branch_cost_for_probability): New. >> > (aarch64_ifcvt_noce_profitable_p): Likewise. >> > (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise. >>
On Sep 30, 2015, at 1:04 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> So what about a branch_cost hook that takes taken/not-taken probabilities as argument?
So, for my port, I need to know %prediction as well to calculate cost. I know, kinda sucks. Or put another way, I want to explain the cost taken, predicted, not-taken, predicted, taken, mis-predicted, and not-taken-mis-predicted and let the caller sort out if the branch will be predicted or mis-predicted, as it can do the math itself and that math is target independent.
On 09/29/2015 04:31 PM, James Greenhalgh wrote: > On the other side of the equation, we want a cost for the converted > sequence. We can build a cost of the generated rtl sequence, but for > targets like AArch64 this is going to be wildly off. AArch64 will expand > (a > b) ? x : y; as a set to the CC register, followed by a conditional > move based on the CC register. Consequently, where we have multiple sets > back to back we end up with: > > set CC (a > b) > set x1 (CC ? x : y) > set CC (a > b) > set x2 (CC ? x : z) > set CC (a > b) > set x3 (CC ? x : k) > > Which we know will be simplified later to: > > set CC (a > b) > set x1 (CC ? x : y) > set x2 (CC ? x : z) > set x3 (CC ? x : k) I guess the transformation you want to make is a bit unusual in that it generates such extra instructions. rtx_cost has problems taking such secondary considerations into account. I haven't quite made up my mind about the new target hook, but I wonder if it might be a good idea to try and simplify the above sequence on the spot before calculating costs for it. (Incidentally, which pass removes the extra CC sets?) Bernd
On 10/01/2015 11:37 AM, Bernd Schmidt wrote: > On 09/29/2015 04:31 PM, James Greenhalgh wrote: >> On the other side of the equation, we want a cost for the converted >> sequence. We can build a cost of the generated rtl sequence, but for >> targets like AArch64 this is going to be wildly off. AArch64 will expand >> (a > b) ? x : y; as a set to the CC register, followed by a conditional >> move based on the CC register. Consequently, where we have multiple sets >> back to back we end up with: >> >> set CC (a > b) >> set x1 (CC ? x : y) >> set CC (a > b) >> set x2 (CC ? x : z) >> set CC (a > b) >> set x3 (CC ? x : k) >> >> Which we know will be simplified later to: >> >> set CC (a > b) >> set x1 (CC ? x : y) >> set x2 (CC ? x : z) >> set x3 (CC ? x : k) > > I guess the transformation you want to make is a bit unusual in that it > generates such extra instructions. rtx_cost has problems taking such > secondary considerations into account. > > I haven't quite made up my mind about the new target hook, but I wonder > if it might be a good idea to try and simplify the above sequence on the > spot before calculating costs for it. (Incidentally, which pass removes > the extra CC sets?) I don't know whether you've done any more work on the patch series, but I think I've made up my mind that optimizing the sequence before computing its cost would be a good thing to try first. Either with a better expander interface which generates the right thing immediately, or running something like cselib over the generated code. I think this would give more reliable results rather than guesstimating a redundancy factor in the cost function. Bernd
On 10/09/2015 05:28 AM, Bernd Schmidt wrote: > > I don't know whether you've done any more work on the patch series, but > I think I've made up my mind that optimizing the sequence before > computing its cost would be a good thing to try first. Either with a > better expander interface which generates the right thing immediately, > or running something like cselib over the generated code. I think this > would give more reliable results rather than guesstimating a redundancy > factor in the cost function. Another great example of what we could do if we could give an RTL sequence and mini CFG to the various optimizers and have them just work on the sequence (making the appropriate assumptions about live-in and live-out objects). Retrofitting this capability would likely be very hard at this point :( Even just DCE & CSE would probably be a notable step forward. jeff
Hi, When I was working in the area last year, I promised to revisit the cost model for noce if-conversion and see if I could improve the modeling. This turned out to be more tricky than I expected. This patch set rewrites the cost model for noce if-conversion. The goal is to rationalise the units used in the calculations away from BRANCH_COST, which is only defined relative to itself. I've introduced a new target hook "rtx_branch_cost" which is defined to return the cost of a branch in RTX cost units, suitable for comparing directly with the calculated cost of a conditional move, or the conditionally executed branches. If you're looking at that and thinking it doesn't sound much different from our current call to BRANCH_COST, you're right. This isn't as large a departure from the existing cost model as I had originally intended. I started out experimenting with much larger hooks (with many parameters/pass-specific data), or hooks that passed the whole edge to the target asking for the cost. These ended up feeling quite silly to write in the target, and don't match with the direction discussed at last year's cauldron. We don't want to go around leaking pass-internal data around to back-ends. That is a path of madness as the passes change but find that targets have invented baroque calculations to help invent a magic number. I tried implementing a "replacement_cost" hook, which would take before and after code sequences and try to guess profitability, but because you want to take edge probabilities in to consideration while trying to calculate the costs of an if-then-else structure, the code gets hairy quickly. Worse is that this would need duplicating across any target implementing the hook. I found that I was constructing lots of RTX just to throw it away again, and sometimes we were constructing RTX that would trivially be optimised by a future pass. As a metric for if-conversion, this hook seemed more harmful than useful for both the quality of the decision we'd make, and for the quality of the GCC source. As I iterated through versions of this patch set, I realised that all we really wanted for ifcvt was a way to estimate the cost of a branch in units that were comparable to the cost of instructions. The trouble with BRANCH_COST wasn't that it was returning a magic number, it was just that it was returning a magic number which had inconsistent meanings in the compiler. Otherwise, BRANCH_COST was a straightforward, low-complexity target hook. So the new hook simply defines the relative units that it will use and splits off the use in ifcvt from other BRANCH_COST calls. Having introduced the hook, and added some framework to make use of it, the rest of the patch set works through each of the cost models in ifcvt.c, makes them consistent, and moves them to the new hook. This act of making the cost models consistent will cause code generation changes on a number of targets - most notably x86_64. On x86_64 the RTX cost of a conditional move comes out at "20" - this is far higher than COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots of if-conversion. The easy fix for this would be to implement the new hook. I measured the performance impact on Spec2000 as a smoke test, it didn't seem to harm anything, and the result was a slight (< 3%) uplift on Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for the reasons. Having worked through the patch set, I'd say it is probably a small improvement over what we currently do, but I'm not very happy with it. I'm posting it for comment so we can discuss any directions for costs that I haven't thought about or prototyped. I'm also happy to drop the costs rewrite if this seems like complexity for no benefit. Any thoughts? I've bootstrapped and tested the patch set on x86_64 and aarch64, but they probably need some further polishing if we were to decide this was a useful direction. Thanks, James James Greenhalgh (6): [RFC: Patch 1/6] New target hook: rtx_branch_cost [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt [RFC: Patch 3/6] Remove if_info->branch_cost [RFC: Patch 4/6] Modify cost model for noce_cmove_arith [RFC: Patch 5/6] Improve the cost model for multiple-sets [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask gcc/doc/tm.texi | 10 +++ gcc/doc/tm.texi.in | 2 + gcc/ifcvt.c | 204 +++++++++++++++++++++++++++++++++++++++++------------ gcc/target.def | 14 ++++ gcc/targhooks.c | 10 +++ gcc/targhooks.h | 2 + 6 files changed, 197 insertions(+), 45 deletions(-)
On Thu, Jun 09, 2016 at 10:58:52AM -0600, Jeff Law wrote: > On 06/02/2016 10:53 AM, James Greenhalgh wrote: > >Hi, > > > >When I was working in the area last year, I promised to revisit the cost > >model for noce if-conversion and see if I could improve the modeling. This > >turned out to be more tricky than I expected. > > > >This patch set rewrites the cost model for noce if-conversion. The goal is > >to rationalise the units used in the calculations away from BRANCH_COST, > >which is only defined relative to itself. > Right. I think we all agreed that the key weakness of BRANCH_COST > was that its meaning is only defined relative to itself. > > What we want is a costing metric that would allow us to estimate the > cost of different forms of the computation, which might include > branches and which may include edge probabilty information. > > > > >If you're looking at that and thinking it doesn't sound much different from > >our current call to BRANCH_COST, you're right. This isn't as large a > >departure from the existing cost model as I had originally intended. > Perhaps not as large of a change as you intended, but I think you're > hitting the key issue with BRANCH_COST. > > > >This act of making the cost models consistent will cause code generation > >changes on a number of targets - most notably x86_64. On x86_64 the RTX > >cost of a conditional move comes out at "20" - this is far higher than > >COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots > >of if-conversion. The easy fix for this would be to implement the new hook. > >I measured the performance impact on Spec2000 as a smoke test, it didn't > >seem to harm anything, and the result was a slight (< 3%) uplift on > >Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for > >the reasons. > I'd be comfortable with Uros guiding the implementation of the > target hook for x86 so that we don't take a major step backward. The trouble I'm having with all targets is noce_try_cmove_arith and the testcases added as part of that patch set. The current cost model for noce_try_cmove_arith doesn't take in to consideration the cost of a conditional move at all, it just checks the cost of each branch, takes the maximum of that, and compares it against COSTS_N_INSNS (BRANCH_COST). As we move to my patch set, I want to compute the cost of the whole if-converted sequence, which is going to include the cost of a conditional move. On x86_64, the total cost for a simple conditional move is COSTS_N_INSNS (5). This alone would prevent if-conversion for most x86_64 subtargets, once you add in the max cost of one of the two branches, you guarantee that no x86_64 target will be converting through noce_try_cmove_arith. From the test results I'm seeing, this holds true for other targets which show regressions with the new cost models. Clearly my idea for the default hook implementation is not going to fly. If more targets had implemented the PARAM_MAX_RTL_IF_CONVERSION_INSNS hook, I could use that to guide the default implementation, but that is currently x86_64 only. I could add a multiplier against BRANCH_COST in the new hook, but then we'd be guaranteeing that the "cheap" if-conversions, where we use STORE_FLAG_VALUE rather than introducing a conditional move, always fired unless the target had a branch_cost of 0 (this might not be a bad model actually...). Finding what this multiplier should be will be tough, as it depends directly on the cost of a conditional move, which targets don't expose easily, and which I don't want to construct junk conditional moves just to find through rtx_cost. I'll keep giving it some thought, and I'd appreciate any suggestions for the default hook implementation. I've respun the patch set around Bernd's suggestion, and now that we don't check the cost model until after constructing the new sequence the patch set looks much nicer. I'll wait a bit before sending the respin out in the hope that I'll have a good idea for the default hook implementation to reduce the number of performance changes I'll introduce. Thanks, James
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 157a716..059bd89 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, return false; } +/* We have something like: + + if (x > y) + { i = a; j = b; k = c; } + + Make it: + + tmp_i = (x > y) ? a : i; + tmp_j = (x > y) ? b : j; + tmp_k = (x > y) ? c : k; + i = tmp_i; <- Should be cleaned up + j = tmp_j; <- Likewise. + k = tmp_k; <- Likewise. + + Look for special cases such as use of temporary registers (for + example in a swap idiom). + + IF_INFO contains the useful information about the block structure and + jump instructions. */ + +static int +noce_convert_multiple_sets (struct noce_if_info *if_info) +{ + basic_block test_bb = if_info->test_bb; + basic_block then_bb = if_info->then_bb; + basic_block join_bb = if_info->join_bb; + rtx_insn *jump = if_info->jump; + rtx_insn *cond_earliest; + unsigned int cost = 0; + rtx_insn *insn; + + start_sequence (); + + /* Decompose the condition attached to the jump. */ + rtx cond = noce_get_condition (jump, &cond_earliest, false); + rtx x = XEXP (cond, 0); + rtx y = XEXP (cond, 1); + rtx_code cond_code = GET_CODE (cond); + + /* The true targets for a conditional move. */ + vec<rtx> targets = vNULL; + /* The temporaries introduced to allow us to not consider register + overlap. */ + vec<rtx> temporaries = vNULL; + /* The insns we've emitted. */ + vec<rtx_insn *> unmodified_insns = vNULL; + unsigned count = 0; + + FOR_BB_INSNS (then_bb, insn) + { + /* Skip over non-insns. */ + if (!active_insn_p (insn)) + continue; + + rtx set = single_set (insn); + gcc_checking_assert (set); + + rtx target = SET_DEST (set); + rtx temp = gen_reg_rtx (GET_MODE (target)); + rtx new_val = SET_SRC (set); + rtx old_val = target; + + /* If we were supposed to read from an earlier write in this block, + we've changed the register allocation. Rewire the read. While + we are looking, also try to catch a swap idiom. */ + rtx candidate_rewire = new_val; + for (unsigned i = 0; i < count; i++) + { + if (reg_overlap_mentioned_p (new_val, targets[i])) + { + /* Catch a "swap" style idiom. */ + if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX) + { + /* The write to targets[i] is only live until the read + here. As the condition codes match, we can propagate + the set to here. */ + candidate_rewire + = SET_SRC (single_set (unmodified_insns[i])); + + /* Discount the cost calculation by one conditional + set instruction. As we are just putting out + a group of SET instructions, any will do. */ + cost -= insn_rtx_cost (PATTERN (get_last_insn ()), + optimize_bb_for_speed_p (test_bb)); + } + else + candidate_rewire = temporaries[i]; + } + } + new_val = candidate_rewire; + + /* If we had a non-canonical conditional jump (i.e. one where + the fallthrough is to the "else" case) we need to reverse + the conditional select. */ + if (if_info->then_else_reversed) + std::swap (old_val, new_val); + + /* Actually emit the conditional move. */ + rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code, + x, y, new_val, old_val); + + /* If we failed to expand the conditional move, drop out and don't + try to continue. */ + if (temp_dest == NULL_RTX) + { + end_sequence (); + return FALSE; + } + + /* Track the cost of building these conditional instructions. */ + cost += insn_rtx_cost (PATTERN (get_last_insn ()), + optimize_bb_for_speed_p (test_bb)); + + /* Bookkeeping. */ + count++; + targets.safe_push (target); + temporaries.safe_push (temp_dest); + unmodified_insns.safe_push (insn); + } + + /* We must have seen some sort of insn to insert, otherwise we were + given an empty BB to convert, and we can't handle that. */ + if (unmodified_insns.is_empty ()) + { + end_sequence (); + return FALSE; + } + + /* Check if this is actually beneficial. */ + if (cost > COSTS_N_INSNS (if_info->branch_cost)) + { + end_sequence (); + return FALSE; + } + + /* Now fixup the assignments. */ + for (unsigned i = 0; i < count; i++) + noce_emit_move_insn (targets[i], temporaries[i]); + + /* Actually emit the sequence. */ + rtx_insn *seq = get_insns (); + + for (insn = seq; insn; insn = NEXT_INSN (insn)) + set_used_flags (insn); + + unshare_all_rtl_in_chain (seq); + end_sequence (); + + if (!seq) + return FALSE; + + emit_insn_before_setloc (seq, if_info->jump, + INSN_LOCATION (unmodified_insns.last ())); + + /* Clean up THEN_BB and the edges in and out of it. */ + remove_edge (find_edge (test_bb, join_bb)); + remove_edge (find_edge (then_bb, join_bb)); + redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb); + delete_basic_block (then_bb); + num_true_changes++; + + /* Maybe merge blocks now the jump is simple enough. */ + if (can_merge_blocks_p (test_bb, join_bb)) + { + merge_blocks (test_bb, join_bb); + num_true_changes++; + } + + num_updated_if_blocks++; + return TRUE; +} + +/* Return true iff basic block TEST_BB is comprised of only + (SET (REG) (REG)) insns suitable for conversion to a series + of conditional moves. */ + +static bool +bb_ok_for_noce_convert_multiple_sets (basic_block test_bb) +{ + rtx_insn *insn; + + /* We must have at least one real insn to convert, or there will + be trouble! */ + bool bb_is_not_empty = false; + FOR_BB_INSNS (test_bb, insn) + { + /* Skip over notes etc. */ + if (!active_insn_p (insn)) + continue; + + /* We only handle SET insns. */ + rtx set = single_set (insn); + if (set == NULL_RTX) + return false; + + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + + /* We can possibly relax this, but for now only handle REG to REG + moves. This avoids any issues that might come from introducing + loads/stores that might violate data-race-freedom guarantees. */ + if (!(REG_P (src) && REG_P (dest))) + return false; + + /* Destination must be appropriate for a conditional write. */ + if (!noce_operand_ok (dest)) + return false; + + /* We must be able to conditionally move in this mode. */ + if (!can_conditionally_move_p (GET_MODE (dest))) + return false; + + bb_is_not_empty = true; + } + return bb_is_not_empty; +} + /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert it without using conditional execution. Return TRUE if we were successful at converting the block. */ @@ -3004,12 +3221,22 @@ noce_process_if_block (struct noce_if_info *if_info) (1) if (...) x = a; else x = b; (2) x = b; if (...) x = a; (3) if (...) x = a; // as if with an initial x = x. - + (4) if (...) { x = a; y = b; z = c; } // Like 3, for multiple SETS. The later patterns require jumps to be more expensive. For the if (...) x = a; else x = b; case we allow multiple insns inside the then and else blocks as long as their only effect is to calculate a value for x. - ??? For future expansion, look for multiple X in such patterns. */ + ??? For future expansion, further expand the "multiple X" rules. */ + + /* First look for multiple SETS. */ + if (!else_bb + && HAVE_conditional_move + && !HAVE_cc0 + && bb_ok_for_noce_convert_multiple_sets (then_bb)) + { + if (noce_convert_multiple_sets (if_info)) + return TRUE; + } if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost, &if_info->then_simple))