Message ID | CAAgBjM=mScG0DeJ=wSn1eoQCrpZZdr8T9=CpGQbfiQ1WKazMyQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | Transform (x >> cst) != 0 to x >= (1 << cst) and (x >> cst) == 0 to x < (1 << cst) | expand |
On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote: > Hi, > This follow-up patch implements the patterns mentioned in $subject. > Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu. > OK to commit if passes ? > > Thanks, > Prathamesh > 2017-10-03 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> > > * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern. > ((X >> CST) != 0 -> X >= (1 << CST)): Likewise. > > testsuite/ > * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4. Why this way and not the other way around? E.g. i?86/x86_64 and various other targets have shift instructions which set condition codes, so (X >> 51) == 0 is certainly smaller smaller and I believe cheaper than the latter. Try: void foo (void); void bar (unsigned long long x) { if ((x >> 51) == 0) foo (); } void baz (unsigned long long x) { if (x < (1LL << 51)) foo (); } with -Os on x86_64, the first function is 4 insns, 12 bytes, the second one 5 insns, 21 bytes. I wonder if this kind of instruction selection stuff shouldn't be done in target.pd instead, with input from the target. Jakub
On Tue, 3 Oct 2017, Jakub Jelinek wrote: > On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote: >> Hi, >> This follow-up patch implements the patterns mentioned in $subject. >> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu. >> OK to commit if passes ? >> >> Thanks, >> Prathamesh > >> 2017-10-03 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> >> >> * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern. >> ((X >> CST) != 0 -> X >= (1 << CST)): Likewise. build_int_cstu doesn't work for vectors, you want build_one_cst. I never know if we should check single_use or not :-( >> testsuite/ >> * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4. > > Why this way and not the other way around? For high level gimple optimizations, X < CST is more convenient (and smaller, just one insn) than (X >> CST) == 0. > E.g. i?86/x86_64 and various other targets have shift instructions which > set condition codes, so (X >> 51) == 0 is certainly smaller > smaller and I believe cheaper than the latter. > Try: > void foo (void); > > void > bar (unsigned long long x) > { > if ((x >> 51) == 0) > foo (); > } > > void > baz (unsigned long long x) > { > if (x < (1LL << 51)) > foo (); > } > with -Os on x86_64, the first function is 4 insns, 12 bytes, > the second one 5 insns, 21 bytes. > > I wonder if this kind of instruction selection stuff shouldn't be > done in target.pd instead, with input from the target. At a late stage, maybe during an RTL pass or expansion (or just before expansion) it would indeed be good to generate a shift for such comparisons, on targets where that sets a cc. The lack of this transformation could be considered a blocker for the other one, to avoid regressing on bar. -- Marc Glisse
On 10/03/2017 03:00 PM, Marc Glisse wrote: > On Tue, 3 Oct 2017, Jakub Jelinek wrote: > >> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote: >>> Hi, >>> This follow-up patch implements the patterns mentioned in $subject. >>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and >>> aarch64-linux-gnu. >>> OK to commit if passes ? >>> >>> Thanks, >>> Prathamesh >> >>> 2017-10-03 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> >>> >>> * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern. >>> ((X >> CST) != 0 -> X >= (1 << CST)): Likewise. > > build_int_cstu doesn't work for vectors, you want build_one_cst. I never > know if we should check single_use or not :-( > >>> testsuite/ >>> * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4. >> >> Why this way and not the other way around? > > For high level gimple optimizations, X < CST is more convenient (and > smaller, just one insn) than (X >> CST) == 0. Right. One could easily argue that Prathamesh's form should be the preferred form because it is simpler at the gimple level -- and that x86-isms should be dealt with later in the pipeline. > >> E.g. i?86/x86_64 and various other targets have shift instructions which >> set condition codes, so (X >> 51) == 0 is certainly smaller >> smaller and I believe cheaper than the latter. >> Try: >> void foo (void); >> >> void >> bar (unsigned long long x) >> { >> if ((x >> 51) == 0) >> foo (); >> } >> >> void >> baz (unsigned long long x) >> { >> if (x < (1LL << 51)) >> foo (); >> } >> with -Os on x86_64, the first function is 4 insns, 12 bytes, >> the second one 5 insns, 21 bytes. >> >> I wonder if this kind of instruction selection stuff shouldn't be >> done in target.pd instead, with input from the target. Right, but I think that argues that Prathamesh's patch is the right direction and that to move forward what needs to happen is something needs to be fixed at the gimple/rtl border to ensure we get good x86 code. > > At a late stage, maybe during an RTL pass or expansion (or just before > expansion) it would indeed be good to generate a shift for such > comparisons, on targets where that sets a cc. The lack of this > transformation could be considered a blocker for the other one, to avoid > regressing on bar. Right. In fact, I think Jakub's test ought to be added to this work as part of its basic testing. jeff
On 3 October 2017 at 14:10, Jeff Law <law@redhat.com> wrote: > On 10/03/2017 03:00 PM, Marc Glisse wrote: >> On Tue, 3 Oct 2017, Jakub Jelinek wrote: >> >>> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote: >>>> Hi, >>>> This follow-up patch implements the patterns mentioned in $subject. >>>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and >>>> aarch64-linux-gnu. >>>> OK to commit if passes ? >>>> >>>> Thanks, >>>> Prathamesh >>> >>>> 2017-10-03 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> >>>> >>>> * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern. >>>> ((X >> CST) != 0 -> X >= (1 << CST)): Likewise. >> >> build_int_cstu doesn't work for vectors, you want build_one_cst. I never >> know if we should check single_use or not :-( Thanks for the pointers, I wasn't aware about build_one_cst. >> >>>> testsuite/ >>>> * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4. >>> >>> Why this way and not the other way around? >> >> For high level gimple optimizations, X < CST is more convenient (and >> smaller, just one insn) than (X >> CST) == 0. > Right. One could easily argue that Prathamesh's form should be the > preferred form because it is simpler at the gimple level -- and that > x86-isms should be dealt with later in the pipeline. > > >> >>> E.g. i?86/x86_64 and various other targets have shift instructions which >>> set condition codes, so (X >> 51) == 0 is certainly smaller >>> smaller and I believe cheaper than the latter. >>> Try: >>> void foo (void); >>> >>> void >>> bar (unsigned long long x) >>> { >>> if ((x >> 51) == 0) >>> foo (); >>> } >>> >>> void >>> baz (unsigned long long x) >>> { >>> if (x < (1LL << 51)) >>> foo (); >>> } >>> with -Os on x86_64, the first function is 4 insns, 12 bytes, >>> the second one 5 insns, 21 bytes. Indeed, I can reproduce this behavior. Thanks for pointing it out. >>> >>> I wonder if this kind of instruction selection stuff shouldn't be >>> done in target.pd instead, with input from the target. > Right, but I think that argues that Prathamesh's patch is the right > direction and that to move forward what needs to happen is something > needs to be fixed at the gimple/rtl border to ensure we get good x86 code. > >> >> At a late stage, maybe during an RTL pass or expansion (or just before >> expansion) it would indeed be good to generate a shift for such >> comparisons, on targets where that sets a cc. The lack of this >> transformation could be considered a blocker for the other one, to avoid >> regressing on bar. > Right. In fact, I think Jakub's test ought to be added to this work as > part of its basic testing. Um, how to write the above-test case for bar() in DejaGNU format ? Is it possible to test for number of insns or to use nm to test for size of a function with dejagnu directive ? Or should I scan for "cmpq" since the pattern transforms a right shift and cmp against 0 to cmp between operands ? Thanks, Prathamesh > > jeff >
On Tue, 3 Oct 2017, Jeff Law wrote: > On 10/03/2017 03:00 PM, Marc Glisse wrote: > > On Tue, 3 Oct 2017, Jakub Jelinek wrote: > > > >> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote: > >>> Hi, > >>> This follow-up patch implements the patterns mentioned in $subject. > >>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and > >>> aarch64-linux-gnu. > >>> OK to commit if passes ? > >>> > >>> Thanks, > >>> Prathamesh > >> > >>> 2017-10-03 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> > >>> > >>> * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern. > >>> ((X >> CST) != 0 -> X >= (1 << CST)): Likewise. > > > > build_int_cstu doesn't work for vectors, you want build_one_cst. I never > > know if we should check single_use or not :-( > > > >>> testsuite/ > >>> * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4. > >> > >> Why this way and not the other way around? > > > > For high level gimple optimizations, X < CST is more convenient (and > > smaller, just one insn) than (X >> CST) == 0. > Right. One could easily argue that Prathamesh's form should be the > preferred form because it is simpler at the gimple level -- and that > x86-isms should be dealt with later in the pipeline. > > > > > >> E.g. i?86/x86_64 and various other targets have shift instructions which > >> set condition codes, so (X >> 51) == 0 is certainly smaller > >> smaller and I believe cheaper than the latter. > >> Try: > >> void foo (void); > >> > >> void > >> bar (unsigned long long x) > >> { > >> if ((x >> 51) == 0) > >> foo (); > >> } > >> > >> void > >> baz (unsigned long long x) > >> { > >> if (x < (1LL << 51)) > >> foo (); > >> } > >> with -Os on x86_64, the first function is 4 insns, 12 bytes, > >> the second one 5 insns, 21 bytes. > >> > >> I wonder if this kind of instruction selection stuff shouldn't be > >> done in target.pd instead, with input from the target. > Right, but I think that argues that Prathamesh's patch is the right > direction and that to move forward what needs to happen is something > needs to be fixed at the gimple/rtl border to ensure we get good x86 code. For this case it should be even "easy" within the current RTL expansion framework since all we need is to look at a single comparison and its operands. So I'd suggest to go forward with the canonicalization as proposed and address the expansion issue as followup. It's currently a missed optimization for baz anyway. Richard. > > > > At a late stage, maybe during an RTL pass or expansion (or just before > > expansion) it would indeed be good to generate a shift for such > > comparisons, on targets where that sets a cc. The lack of this > > transformation could be considered a blocker for the other one, to avoid > > regressing on bar. > Right. In fact, I think Jakub's test ought to be added to this work as > part of its basic testing. > > jeff > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 10/04/2017 12:34 PM, Prathamesh Kulkarni wrote: >>> At a late stage, maybe during an RTL pass or expansion (or just before >>> expansion) it would indeed be good to generate a shift for such >>> comparisons, on targets where that sets a cc. The lack of this >>> transformation could be considered a blocker for the other one, to avoid >>> regressing on bar. >> Right. In fact, I think Jakub's test ought to be added to this work as >> part of its basic testing. > Um, how to write the above-test case for bar() in DejaGNU format ? > Is it possible to test for number of insns or to use nm to test for > size of a function with dejagnu directive ? > Or should I scan for "cmpq" since the pattern transforms a right shift > and cmp against 0 to cmp between operands ? I've used a variety of approaches. The difficulty in the x86 world is that there's enough tuning variants that change the resulting code which can make scanning problematical. So one approach is to look at the total object size if that's a reliable indicator of the issue at hand. As an example see gcc.target/m68k/pr52076-?.c. I'm sure there's others if you were to search for "object-size" in the testsuite. You could try to count the insns or search for specific key insns that indicate we generated desirable or undesirable code. But what might ultimately be best would be to scan the rtl at expansion time. That catches things at the earliest point we can observe them. Another alternative would be to include some dumping code to indicate when we transform the canonical gimple form into the form we want for the x86 backend at rtl expansion. You could then scan the debugging dumps for those annotations. jeff
diff --git a/gcc/match.pd b/gcc/match.pd index 43ab226a705..883ad5ba53c 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1287,6 +1287,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0)))) (ocmp @0 @1)))) +/* Transform + (x >> cst) != 0 -> x >= (1 << cst) + (x >> cst) == 0 -> x < (1 << cst) + if x, cst are unsigned. */ +(for cmp (eq ne) + ocmp (lt ge) + (simplify + (cmp (rshift @0 INTEGER_CST@1) integer_zerop) + (if (TYPE_UNSIGNED (TREE_TYPE (@0)) + && (VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0)))) + (ocmp @0 (lshift { build_int_cstu (TREE_TYPE (@0), 1); } @1))))) + /* X == C - X can never be true if C is odd. */ (for cmp (eq ne) (simplify diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c index 14161f5ea6f..fc5bc8c3674 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c @@ -15,4 +15,19 @@ _Bool f2(unsigned x, unsigned y) return t2; } +_Bool f3(unsigned x) +{ + unsigned t1 = x >> 4; + _Bool t2 = (t1 != 0); + return t2; +} + +_Bool f4(unsigned x) +{ + unsigned t1 = x >> 4; + _Bool t2 = (t1 == 0); + return t2; +} + /* { dg-final { scan-tree-dump-not "trunc_div_expr" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "rshift_expr" "optimized" } } */