Message ID | CAAgBjMkJnnkCZMH9RtFkQCeZ2E2U_6B0HXwYYnWL2d+XsYJFnw@mail.gmail.com |
---|---|
State | Superseded |
On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> The attached patch tries to fix PR35691, by adding the following two
> transforms to match.pd:
> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>
> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
> it gets transformed to bit_and_expr
> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>
> #if GENERIC
> (for op (truth_andif truth_orif)
> #elif GIMPLE
> (for op (bit_and bit_ior)
> #endif
>
> Is that OK ?

As you are not removing the fold-const.c variant I'd say you should
simply not look for truth_* and only handle GIMPLE.  Note that we
have tree-ssa-ifcombine.c which should handle the variant with
control-flow (but I guess it does not and your patch wouldn't help
it either).

The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Richard.
On 3 November 2016 at 16:13, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> The attached patch tries to fix PR35691, by adding the following two
>> transforms to match.pd:
>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>
>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>> it gets transformed to bit_and_expr
>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>
>> #if GENERIC
>> (for op (truth_andif truth_orif)
>> #elif GIMPLE
>> (for op (bit_and bit_ior)
>> #endif
>>
>> Is that OK ?
>
> As you are not removing the fold-const.c variant I'd say you should
> simply not look for truth_* and only handle GIMPLE.  Note that we
> have tree-ssa-ifcombine.c which should handle the variant with
> control-flow (but I guess it does not and your patch wouldn't help
> it either).
>
> The transform would also work for vectors (element_precision for
> the test but also a value-matching zero which should ensure the
> same number of elements).
Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?
Thanks!
>
> Richard.
On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:

> On 3 November 2016 at 16:13, Richard Biener <rguenther@suse.de> wrote:
>> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>>
>>> Hi Richard,
>>> The attached patch tries to fix PR35691, by adding the following two
>>> transforms to match.pd:
>>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>>
>>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>>> it gets transformed to bit_and_expr
>>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>>
>>> #if GENERIC
>>> (for op (truth_andif truth_orif)
>>> #elif GIMPLE
>>> (for op (bit_and bit_ior)
>>> #endif
>>>
>>> Is that OK ?
>>
>> As you are not removing the fold-const.c variant I'd say you should
>> simply not look for truth_* and only handle GIMPLE.  Note that we
>> have tree-ssa-ifcombine.c which should handle the variant with
>> control-flow (but I guess it does not and your patch wouldn't help
>> it either).
>>
>> The transform would also work for vectors (element_precision for
>> the test but also a value-matching zero which should ensure the
>> same number of elements).
> Um sorry, I didn't get how to check vectors to be of equal length by a
> matching zero.
> Could you please elaborate on that ?

He may have meant something like:

  (op (cmp @0 integer_zerop@2) (cmp @1 @2))

So the last operand is checked with operand_equal_p instead of
integer_zerop.  But the fact that we could compute bit_ior on the
comparison results should already imply that the number of elements is
the same.

This would also prevent the case where one vector is signed and the
other unsigned, which requires a view_convert (I don't remember if
convert automatically becomes view_convert here as in fold_convert or
not).

For (some_int == 0) & (some_long == 0), doing
((long)some_int | some_long) == 0 should also work (and it doesn't
matter if we pick a sign- or zero-extension), but that's more
complicated, not necessary for a first version.

On platforms that have IOR on floats (at least x86 with SSE, maybe some
vector mode on s390?), it would be cool to do the same for floats (most
likely at the RTL level).

--
Marc Glisse
On Thu, 3 Nov 2016, Richard Biener wrote:

> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> The attached patch tries to fix PR35691, by adding the following two
>> transforms to match.pd:
>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>
>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>> it gets transformed to bit_and_expr
>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>
>> #if GENERIC
>> (for op (truth_andif truth_orif)
>> #elif GIMPLE
>> (for op (bit_and bit_ior)
>> #endif
>>
>> Is that OK ?
>
> As you are not removing the fold-const.c variant I'd say you should
> simply not look for truth_* and only handle GIMPLE.  Note that we
> have tree-ssa-ifcombine.c which should handle the variant with
> control-flow (but I guess it does not and your patch wouldn't help
> it either).
>
> The transform would also work for vectors (element_precision for
> the test but also a value-matching zero which should ensure the
> same number of elements).

On the other hand, now that we are using VEC_COND_EXPR all over the
place, it may be hard to write a testcase that makes this kind of
pattern appear... (related to PR 68714)

--
Marc Glisse
On November 3, 2016 6:11:07 PM GMT+01:00, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 3 November 2016 at 16:13, Richard Biener <rguenther@suse.de> wrote:
>>> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>>>
>>>> Hi Richard,
>>>> The attached patch tries to fix PR35691, by adding the following two
>>>> transforms to match.pd:
>>>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>>>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>>>
>>>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>>>> it gets transformed to bit_and_expr
>>>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>>>
>>>> #if GENERIC
>>>> (for op (truth_andif truth_orif)
>>>> #elif GIMPLE
>>>> (for op (bit_and bit_ior)
>>>> #endif
>>>>
>>>> Is that OK ?
>>>
>>> As you are not removing the fold-const.c variant I'd say you should
>>> simply not look for truth_* and only handle GIMPLE.  Note that we
>>> have tree-ssa-ifcombine.c which should handle the variant with
>>> control-flow (but I guess it does not and your patch wouldn't help
>>> it either).
>>>
>>> The transform would also work for vectors (element_precision for
>>> the test but also a value-matching zero which should ensure the
>>> same number of elements).
>> Um sorry, I didn't get how to check vectors to be of equal length by a
>> matching zero.
>> Could you please elaborate on that ?
>
> He may have meant something like:
>
>   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

I meant with one being @@2 to allow signed vs. unsigned @0/@1 which was
the point of the pattern.

> So the last operand is checked with operand_equal_p instead of
> integer_zerop.  But the fact that we could compute bit_ior on the
> comparison results should already imply that the number of elements is
> the same.

Though for equality compares we also allow scalar results IIRC.

> This would also prevent the case where one vector is signed and the
> other unsigned, which requires a view_convert (I don't remember if
> convert automatically becomes view_convert here as in fold_convert or
> not).

No, but the other way around (for sign changes).

> For (some_int == 0) & (some_long == 0), doing
> ((long)some_int | some_long) == 0 should also work (and it doesn't
> matter if we pick a sign- or zero-extension), but that's more
> complicated, not necessary for a first version.

Yeah.

> On platforms that have IOR on floats (at least x86 with SSE, maybe some
> vector mode on s390?), it would be cool to do the same for floats (most
> likely at the RTL level).

On GIMPLE view-converts could come to the rescue here as well.  Or we
can just allow bit-and/or on floats as much as we allow them on
pointers.

Richard.
On Thu, 3 Nov 2016, Richard Biener wrote:

>>>> The transform would also work for vectors (element_precision for
>>>> the test but also a value-matching zero which should ensure the
>>>> same number of elements).
>>> Um sorry, I didn't get how to check vectors to be of equal length by a
>>> matching zero.
>>> Could you please elaborate on that ?
>>
>> He may have meant something like:
>>
>>   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>
> I meant with one being @@2 to allow signed vs. unsigned @0/@1 which was
> the point of the pattern.

Oups, that's what I had written first, and then I somehow managed to
confuse myself enough to remove it so as to remove the call to
types_match :-(

>> So the last operand is checked with operand_equal_p instead of
>> integer_zerop.  But the fact that we could compute bit_ior on the
>> comparison results should already imply that the number of elements is
>> the same.
>
> Though for equality compares we also allow scalar results IIRC.

Oh, right, I keep forgetting that :-(  And I have no idea how to
generate one for a testcase, at least until the GIMPLE FE lands...

>> On platforms that have IOR on floats (at least x86 with SSE, maybe some
>> vector mode on s390?), it would be cool to do the same for floats (most
>> likely at the RTL level).
>
> On GIMPLE view-converts could come to the rescue here as well.  Or we
> can just allow bit-and/or on floats as much as we allow them on
> pointers.

Would that generate sensible code on targets that do not have logic
insns for floats?  Actually, even on x86_64 that generates inefficient
code, so there would be some work (for instance grep finds no
gen_iordf3, only gen_iorv2df3).

I am also a bit wary of doing those obfuscating optimizations too
early... a==0 is something that other optimizations might use.
long c=(long&)a|(long&)b; (double&)c==0; less so...

(and I am assuming that signaling NaNs don't make the whole
transformation impossible, which might be wrong)

--
Marc Glisse
On Thu, 3 Nov 2016, Marc Glisse wrote:

> On Thu, 3 Nov 2016, Richard Biener wrote:
>
>>>>> The transform would also work for vectors (element_precision for
>>>>> the test but also a value-matching zero which should ensure the
>>>>> same number of elements).
>>>> Um sorry, I didn't get how to check vectors to be of equal length by a
>>>> matching zero.
>>>> Could you please elaborate on that ?
>>>
>>> He may have meant something like:
>>>
>>>   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>>
>> I meant with one being @@2 to allow signed vs. unsigned @0/@1 which was
>> the point of the pattern.
>
> Oups, that's what I had written first, and then I somehow managed to
> confuse myself enough to remove it so as to remove the call to
> types_match :-(
>
>>> So the last operand is checked with operand_equal_p instead of
>>> integer_zerop.  But the fact that we could compute bit_ior on the
>>> comparison results should already imply that the number of elements
>>> is the same.
>>
>> Though for equality compares we also allow scalar results IIRC.
>
> Oh, right, I keep forgetting that :-(  And I have no idea how to
> generate one for a testcase, at least until the GIMPLE FE lands...
>
>>> On platforms that have IOR on floats (at least x86 with SSE, maybe
>>> some vector mode on s390?), it would be cool to do the same for
>>> floats (most likely at the RTL level).
>>
>> On GIMPLE view-converts could come to the rescue here as well.  Or we
>> can just allow bit-and/or on floats as much as we allow them on
>> pointers.
>
> Would that generate sensible code on targets that do not have logic
> insns for floats?  Actually, even on x86_64 that generates inefficient
> code, so there would be some work (for instance grep finds no
> gen_iordf3, only gen_iorv2df3).
>
> I am also a bit wary of doing those obfuscating optimizations too
> early... a==0 is something that other optimizations might use.
> long c=(long&)a|(long&)b; (double&)c==0; less so...
>
> (and I am assuming that signaling NaNs don't make the whole
> transformation impossible, which might be wrong)

Yeah.  I also think it's not so much important - I just wanted to
mention vectors...

Btw, I still think we need a more sensible infrastructure for passes to
gather, analyze and modify complex conditions.  (I'm always pointing to
tree-affine.c as an, albeit not very good, example for handling a
similar problem)

Richard.
diff --git a/gcc/match.pd b/gcc/match.pd
index 48f7351..65930bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -519,6 +519,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TYPE_UNSIGNED (type))
    (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)))))
 
+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+
+#if GENERIC
+(for op (truth_andif truth_orif)
+#elif GIMPLE
+(for op (bit_and bit_ior)
+#endif
+     cmp (eq ne)
+ (simplify
+  (op (cmp @0 integer_zerop) (cmp @1 integer_zerop))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+   (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); }))))
+
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
  (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c
new file mode 100644
index 0000000..25a7ace
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1)
+{
+  return (z0 == 0) && (z1 == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c
new file mode 100644
index 0000000..5211f815
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == 0);
+  int t1 = (z1 == 0);
+  int t2 = (t0 && t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-3.c b/gcc/testsuite/gcc.dg/pr35691-3.c
new file mode 100644
index 0000000..134bbdf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1)
+{
+  return (z0 != 0) || (z1 != 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-4.c b/gcc/testsuite/gcc.dg/pr35691-4.c
new file mode 100644
index 0000000..90cbf6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != 0);
+  int t1 = (z1 != 0);
+  int t2 = (t0 || t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */