Message ID | CAAgBjMnP8EK=wgJ9kwZFOPM-Cmmsb7iYTBKDNOy88-0t3T6Ovw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 7 November 2016 at 23:06, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote: >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote: >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >>> > >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >>> >> >>> >> > > > > The transform would also work for vectors (element_precision for >>> >> > > > > the test but also a value-matching zero which should ensure the >>> >> > > > > same number of elements). >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a >>> >> > > > matching zero. >>> >> > > > Could you please elaborate on that ? >>> >> > > >>> >> > > He may have meant something like: >>> >> > > >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >>> >> > >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the >>> >> > point of the pattern. >>> >> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse >>> >> myself enough to remove it so as to remove the call to types_match :-( >>> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the >>> >> > > comparison results should already imply that the number of elements is the >>> >> > > same. >>> >> > >>> >> > Though for equality compares we also allow scalar results IIRC. >>> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one >>> >> for a testcase, at least until the GIMPLE FE lands... >>> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most >>> >> > > likely at the RTL level). >>> >> > >>> >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. >>> >> >>> >> Would that generate sensible code on targets that do not have logic insns for >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3). >>> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too early... >>> >> a==0 is something that other optimizations might use. long >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >>> >> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation >>> >> impossible, which might be wrong) >>> > >>> > Yeah. I also think it's not so much important - I just wanted to mention >>> > vectors... >>> > >>> > Btw, I still think we need a more sensible infrastructure for passes >>> > to gather, analyze and modify complex conditions. (I'm always pointing >>> > to tree-affine.c as an, albeit not very good, example for handling >>> > a similar problem) >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >>> this match.pd feature. >>> The current patch keeps it restricted to only bitwise operators on integers. >>> Bootstrap+test running on x86_64-unknown-linux-gnu. >>> OK to commit if passes ? >> >> +/* PR35691: Transform >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >> + >> >> Please omit the vertical space >> >> +(for bitop (bit_and bit_ior) >> + cmp (eq ne) >> + (simplify >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >> >> if you capture the first integer_zerop as @2 then you can re-use it... >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >> (@1))) >> + (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >> >> ... here inplace of the { build_zero_cst ... }. >> >> Ok with that changes. > Thanks, committed the attached version as r241915. ugh, the svn commit message has: testsuite/ * gcc.dg/pr35691-1.c: New test-case. * gcc.dg/pr35691-4.c: Likewise. pr35691-4.c was a typo, should be pr35691-2.c :/ However testsuite/ChangeLog correctly has entry for pr35691-2.c Is it possible to edit the commit message for r241915 ? Sorry about this. Regards, Prathamesh > >> >> Richard.
On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: > On 7 November 2016 at 23:06, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote: > >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > >> > >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote: > >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: > >>> > > >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: > >>> >> > >>> >> > > > > The transform would also work for vectors (element_precision for > >>> >> > > > > the test but also a value-matching zero which should ensure the > >>> >> > > > > same number of elements). > >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a > >>> >> > > > matching zero. > >>> >> > > > Could you please elaborate on that ? > >>> >> > > > >>> >> > > He may have meant something like: > >>> >> > > > >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) > >>> >> > > >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the > >>> >> > point of the pattern. > >>> >> > >>> >> Oups, that's what I had written first, and then I somehow managed to confuse > >>> >> myself enough to remove it so as to remove the call to types_match :-( > >>> >> > >>> >> > > So the last operand is checked with operand_equal_p instead of > >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the > >>> >> > > comparison results should already imply that the number of elements is the > >>> >> > > same. > >>> >> > > >>> >> > Though for equality compares we also allow scalar results IIRC. > >>> >> > >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one > >>> >> for a testcase, at least until the GIMPLE FE lands... > >>> >> > >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some > >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most > >>> >> > > likely at the RTL level). > >>> >> > > >>> >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab > >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. > >>> >> > >>> >> Would that generate sensible code on targets that do not have logic insns for > >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there > >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3). > >>> >> > >>> >> I am also a bit wary of doing those obfuscating optimizations too early... > >>> >> a==0 is something that other optimizations might use. long > >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... > >>> >> > >>> >> (and I am assuming that signaling NaNs don't make the whole transformation > >>> >> impossible, which might be wrong) > >>> > > >>> > Yeah. I also think it's not so much important - I just wanted to mention > >>> > vectors... > >>> > > >>> > Btw, I still think we need a more sensible infrastructure for passes > >>> > to gather, analyze and modify complex conditions. (I'm always pointing > >>> > to tree-affine.c as an, albeit not very good, example for handling > >>> > a similar problem) > >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of > >>> this match.pd feature. > >>> The current patch keeps it restricted to only bitwise operators on integers. > >>> Bootstrap+test running on x86_64-unknown-linux-gnu. > >>> OK to commit if passes ? > >> > >> +/* PR35691: Transform > >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. > >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > >> + > >> > >> Please omit the vertical space > >> > >> +(for bitop (bit_and bit_ior) > >> + cmp (eq ne) > >> + (simplify > >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) > >> > >> if you capture the first integer_zerop as @2 then you can re-use it... > >> > >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE > >> (@1))) > >> + (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); > >> > >> ... here inplace of the { build_zero_cst ... }. > >> > >> Ok with that changes. > > Thanks, committed the attached version as r241915. > ugh, the svn commit message has: > > testsuite/ > * gcc.dg/pr35691-1.c: New test-case. > * gcc.dg/pr35691-4.c: Likewise. > > pr35691-4.c was a typo, should be pr35691-2.c :/ > However testsuite/ChangeLog correctly has entry for pr35691-2.c > Is it possible to edit the commit message for r241915 ? > Sorry about this. No, just leave it as-is. Richard. > Regards, > Prathamesh > > > >> > >> Richard. > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote: > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: > >> On 7 November 2016 at 23:06, Prathamesh Kulkarni >> <prathamesh.kulkarni@linaro.org> wrote: >> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote: >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote: >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> >>> > >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >>> >> >> >>> >> > > > > The transform would also work for vectors (element_precision for >> >>> >> > > > > the test but also a value-matching zero which should ensure the >> >>> >> > > > > same number of elements). >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a >> >>> >> > > > matching zero. >> >>> >> > > > Could you please elaborate on that ? >> >>> >> > > >> >>> >> > > He may have meant something like: >> >>> >> > > >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >>> >> > >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the >> >>> >> > point of the pattern. >> >>> >> >> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse >> >>> >> myself enough to remove it so as to remove the call to types_match :-( >> >>> >> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the >> >>> >> > > comparison results should already imply that the number of elements is the >> >>> >> > > same. >> >>> >> > >> >>> >> > Though for equality compares we also allow scalar results IIRC. >> >>> >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one >> >>> >> for a testcase, at least until the GIMPLE FE lands... >> >>> >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some >> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most >> >>> >> > > likely at the RTL level). >> >>> >> > >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab >> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. >> >>> >> >> >>> >> Would that generate sensible code on targets that do not have logic insns for >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there >> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3). >> >>> >> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too early... >> >>> >> a==0 is something that other optimizations might use. long >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >>> >> >> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation >> >>> >> impossible, which might be wrong) >> >>> > >> >>> > Yeah. I also think it's not so much important - I just wanted to mention >> >>> > vectors... >> >>> > >> >>> > Btw, I still think we need a more sensible infrastructure for passes >> >>> > to gather, analyze and modify complex conditions. (I'm always pointing >> >>> > to tree-affine.c as an, albeit not very good, example for handling >> >>> > a similar problem) >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >> >>> this match.pd feature. >> >>> The current patch keeps it restricted to only bitwise operators on integers. >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu. >> >>> OK to commit if passes ? >> >> >> >> +/* PR35691: Transform >> >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >> >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >> >> + >> >> >> >> Please omit the vertical space >> >> >> >> +(for bitop (bit_and bit_ior) >> >> + cmp (eq ne) >> >> + (simplify >> >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >> >> >> >> if you capture the first integer_zerop as @2 then you can re-use it... >> >> >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >> >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >> >> (@1))) >> >> + (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >> >> >> >> ... here inplace of the { build_zero_cst ... }. >> >> >> >> Ok with that changes. >> > Thanks, committed the attached version as r241915. >> ugh, the svn commit message has: >> >> testsuite/ >> * gcc.dg/pr35691-1.c: New test-case. >> * gcc.dg/pr35691-4.c: Likewise. >> >> pr35691-4.c was a typo, should be pr35691-2.c :/ >> However testsuite/ChangeLog correctly has entry for pr35691-2.c >> Is it possible to edit the commit message for r241915 ? >> Sorry about this. > > No, just leave it as-is. Hi, Chritstophe reported to me that the commit caused test-cases pr35691-1.c and pr35691-2.c (which were added by the commit) to FAIL for cortex-a5: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt It seems truth_andif_expr is not simplified to bit_and_expr on cortex-a5 as for x86_64 (and other arm variants). The differences in dumps start from 004t.gimple for pr35691-1.c: x86_64 gimple dump: foo (int z0, unsigned int z1) { int D.1800; int t0; int t1; int t2; _1 = z0 == 0; t0 = (int) _1; _2 = z1 == 0; t1 = (int) _2; _3 = t0 != 0; _4 = t1 != 0; _5 = _3 & _4; t2 = (int) _5; D.1800 = t2; return D.1800; } cortex-a5 gimple dump: foo (int z0, unsigned int z1) { int iftmp.0; int D.4176; int t0; int t1; int t2; _1 = z0 == 0; t0 = (int) _1; _2 = z1 == 0; t1 = (int) _2; if (t0 != 0) goto <D.4174>; else goto <D.4172>; <D.4174>: if (t1 != 0) goto <D.4175>; else goto <D.4172>; <D.4175>: iftmp.0 = 1; goto <D.4173>; <D.4172>: iftmp.0 = 0; <D.4173>: t2 = iftmp.0; D.4176 = t2; return D.4176; } Since the pattern expects truth_andif_expr to be converted to bit_and_expr, it fails to match for cortex-a5. This seems to happen only for cortex-a5 (the other variants a9, a15, a57 are OK). Is my assumption that truth_andif_expr would be always converted to bit_and_expr for above case incorrect ? Thanks, Prathamesh > > Richard. > >> Regards, >> Prathamesh >> > >> >> >> >> Richard. >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote: > On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote: > > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 7 November 2016 at 23:06, Prathamesh Kulkarni > >> <prathamesh.kulkarni@linaro.org> wrote: > >> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote: > >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > >> >> > >> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote: > >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: > >> >>> > > >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: > >> >>> >> > >> >>> >> > > > > The transform would also work for vectors (element_precision for > >> >>> >> > > > > the test but also a value-matching zero which should ensure the > >> >>> >> > > > > same number of elements). > >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a > >> >>> >> > > > matching zero. > >> >>> >> > > > Could you please elaborate on that ? > >> >>> >> > > > >> >>> >> > > He may have meant something like: > >> >>> >> > > > >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) > >> >>> >> > > >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the > >> >>> >> > point of the pattern. > >> >>> >> > >> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse > >> >>> >> myself enough to remove it so as to remove the call to types_match :-( > >> >>> >> > >> >>> >> > > So the last operand is checked with operand_equal_p instead of > >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the > >> >>> >> > > comparison results should already imply that the number of elements is the > >> >>> >> > > same. > >> >>> >> > > >> >>> >> > Though for equality compares we also allow scalar results IIRC. > >> >>> >> > >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one > >> >>> >> for a testcase, at least until the GIMPLE FE lands... > >> >>> >> > >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some > >> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most > >> >>> >> > > likely at the RTL level). > >> >>> >> > > >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab > >> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. > >> >>> >> > >> >>> >> Would that generate sensible code on targets that do not have logic insns for > >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there > >> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3). > >> >>> >> > >> >>> >> I am also a bit wary of doing those obfuscating optimizations too early... > >> >>> >> a==0 is something that other optimizations might use. long > >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... > >> >>> >> > >> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation > >> >>> >> impossible, which might be wrong) > >> >>> > > >> >>> > Yeah. I also think it's not so much important - I just wanted to mention > >> >>> > vectors... > >> >>> > > >> >>> > Btw, I still think we need a more sensible infrastructure for passes > >> >>> > to gather, analyze and modify complex conditions. (I'm always pointing > >> >>> > to tree-affine.c as an, albeit not very good, example for handling > >> >>> > a similar problem) > >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of > >> >>> this match.pd feature. > >> >>> The current patch keeps it restricted to only bitwise operators on integers. > >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu. > >> >>> OK to commit if passes ? > >> >> > >> >> +/* PR35691: Transform > >> >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. > >> >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > >> >> + > >> >> > >> >> Please omit the vertical space > >> >> > >> >> +(for bitop (bit_and bit_ior) > >> >> + cmp (eq ne) > >> >> + (simplify > >> >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) > >> >> > >> >> if you capture the first integer_zerop as @2 then you can re-use it... > >> >> > >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > >> >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > >> >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE > >> >> (@1))) > >> >> + (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); > >> >> > >> >> ... here inplace of the { build_zero_cst ... }. > >> >> > >> >> Ok with that changes. > >> > Thanks, committed the attached version as r241915. > >> ugh, the svn commit message has: > >> > >> testsuite/ > >> * gcc.dg/pr35691-1.c: New test-case. > >> * gcc.dg/pr35691-4.c: Likewise. > >> > >> pr35691-4.c was a typo, should be pr35691-2.c :/ > >> However testsuite/ChangeLog correctly has entry for pr35691-2.c > >> Is it possible to edit the commit message for r241915 ? > >> Sorry about this. > > > > No, just leave it as-is. > Hi, > Chritstophe reported to me that the commit caused test-cases > pr35691-1.c and pr35691-2.c (which were added by the commit) > to FAIL for cortex-a5: > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt > > It seems truth_andif_expr is not simplified to bit_and_expr on > cortex-a5 as for x86_64 (and other arm variants). > The differences in dumps start from 004t.gimple for pr35691-1.c: > > x86_64 gimple dump: > foo (int z0, unsigned int z1) > { > int D.1800; > int t0; > int t1; > int t2; > > _1 = z0 == 0; > t0 = (int) _1; > _2 = z1 == 0; > t1 = (int) _2; > _3 = t0 != 0; > _4 = t1 != 0; > _5 = _3 & _4; > t2 = (int) _5; > D.1800 = t2; > return D.1800; > } > > cortex-a5 gimple dump: > foo (int z0, unsigned int z1) > { > int iftmp.0; > int D.4176; > int t0; > int t1; > int t2; > > _1 = z0 == 0; > t0 = (int) _1; > _2 = z1 == 0; > t1 = (int) _2; > if (t0 != 0) goto <D.4174>; else goto <D.4172>; > <D.4174>: > if (t1 != 0) goto <D.4175>; else goto <D.4172>; > <D.4175>: > iftmp.0 = 1; > goto <D.4173>; > <D.4172>: > iftmp.0 = 0; > <D.4173>: > t2 = iftmp.0; > D.4176 = t2; > return D.4176; > } > > Since the pattern expects truth_andif_expr to be converted to bit_and_expr, > it fails to match for cortex-a5. > This seems to happen only for cortex-a5 (the other variants a9, a15, > a57 are OK). > > Is my assumption that truth_andif_expr would be always converted to bit_and_expr > for above case incorrect ? Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT. Richard. > Thanks, > Prathamesh > > > > Richard. > > > >> Regards, > >> Prathamesh > >> > > >> >> > >> >> Richard. > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 08/11/16 11:16, Richard Biener wrote: > On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote: > >> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote: >>> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: >>> >>>> On 7 November 2016 at 23:06, Prathamesh Kulkarni >>>> <prathamesh.kulkarni@linaro.org> wrote: >>>>> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote: >>>>>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >>>>>> >>>>>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote: >>>>>>>> On Thu, 3 Nov 2016, Marc Glisse wrote: >>>>>>>> >>>>>>>>> On Thu, 3 Nov 2016, Richard Biener wrote: >>>>>>>>> >>>>>>>>>>>>> The transform would also work for vectors (element_precision for >>>>>>>>>>>>> the test but also a value-matching zero which should ensure the >>>>>>>>>>>>> same number of elements). >>>>>>>>>>>> Um sorry, I didn't get how to check vectors to be of equal length by a >>>>>>>>>>>> matching zero. >>>>>>>>>>>> Could you please elaborate on that ? >>>>>>>>>>> He may have meant something like: >>>>>>>>>>> >>>>>>>>>>> (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >>>>>>>>>> I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the >>>>>>>>>> point of the pattern. >>>>>>>>> Oups, that's what I had written first, and then I somehow managed to confuse >>>>>>>>> myself enough to remove it so as to remove the call to types_match :-( >>>>>>>>> >>>>>>>>>>> So the last operand is checked with operand_equal_p instead of >>>>>>>>>>> integer_zerop. But the fact that we could compute bit_ior on the >>>>>>>>>>> comparison results should already imply that the number of elements is the >>>>>>>>>>> same. >>>>>>>>>> Though for equality compares we also allow scalar results IIRC. >>>>>>>>> Oh, right, I keep forgetting that :-( And I have no idea how to generate one >>>>>>>>> for a testcase, at least until the GIMPLE FE lands... >>>>>>>>> >>>>>>>>>>> On platforms that have IOR on floats (at least x86 with SSE, maybe some >>>>>>>>>>> vector mode on s390?), it would be cool to do the same for floats (most >>>>>>>>>>> likely at the RTL level). >>>>>>>>>> On GIMPLE view-converts could come to the rescue here as well. Or we cab >>>>>>>>>> just allow bit-and/or on floats as much as we allow them on pointers. >>>>>>>>> Would that generate sensible code on targets that do not have logic insns for >>>>>>>>> floats? Actually, even on x86_64 that generates inefficient code, so there >>>>>>>>> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3). >>>>>>>>> >>>>>>>>> I am also a bit wary of doing those obfuscating optimizations too early... >>>>>>>>> a==0 is something that other optimizations might use. long >>>>>>>>> c=(long&)a|(long&)b; (double&)c==0; less so... >>>>>>>>> >>>>>>>>> (and I am assuming that signaling NaNs don't make the whole transformation >>>>>>>>> impossible, which might be wrong) >>>>>>>> Yeah. I also think it's not so much important - I just wanted to mention >>>>>>>> vectors... >>>>>>>> >>>>>>>> Btw, I still think we need a more sensible infrastructure for passes >>>>>>>> to gather, analyze and modify complex conditions. (I'm always pointing >>>>>>>> to tree-affine.c as an, albeit not very good, example for handling >>>>>>>> a similar problem) >>>>>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >>>>>>> this match.pd feature. >>>>>>> The current patch keeps it restricted to only bitwise operators on integers. >>>>>>> Bootstrap+test running on x86_64-unknown-linux-gnu. >>>>>>> OK to commit if passes ? >>>>>> +/* PR35691: Transform >>>>>> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >>>>>> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >>>>>> + >>>>>> >>>>>> Please omit the vertical space >>>>>> >>>>>> +(for bitop (bit_and bit_ior) >>>>>> + cmp (eq ne) >>>>>> + (simplify >>>>>> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >>>>>> >>>>>> if you capture the first integer_zerop as @2 then you can re-use it... >>>>>> >>>>>> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >>>>>> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >>>>>> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >>>>>> (@1))) >>>>>> + (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >>>>>> >>>>>> ... here inplace of the { build_zero_cst ... }. >>>>>> >>>>>> Ok with that changes. >>>>> Thanks, committed the attached version as r241915. >>>> ugh, the svn commit message has: >>>> >>>> testsuite/ >>>> * gcc.dg/pr35691-1.c: New test-case. >>>> * gcc.dg/pr35691-4.c: Likewise. >>>> >>>> pr35691-4.c was a typo, should be pr35691-2.c :/ >>>> However testsuite/ChangeLog correctly has entry for pr35691-2.c >>>> Is it possible to edit the commit message for r241915 ? >>>> Sorry about this. >>> No, just leave it as-is. >> Hi, >> Chritstophe reported to me that the commit caused test-cases >> pr35691-1.c and pr35691-2.c (which were added by the commit) >> to FAIL for cortex-a5: >> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt >> >> It seems truth_andif_expr is not simplified to bit_and_expr on >> cortex-a5 as for x86_64 (and other arm variants). >> The differences in dumps start from 004t.gimple for pr35691-1.c: >> >> x86_64 gimple dump: >> foo (int z0, unsigned int z1) >> { >> int D.1800; >> int t0; >> int t1; >> int t2; >> >> _1 = z0 == 0; >> t0 = (int) _1; >> _2 = z1 == 0; >> t1 = (int) _2; >> _3 = t0 != 0; >> _4 = t1 != 0; >> _5 = _3 & _4; >> t2 = (int) _5; >> D.1800 = t2; >> return D.1800; >> } >> >> cortex-a5 gimple dump: >> foo (int z0, unsigned int z1) >> { >> int iftmp.0; >> int D.4176; >> int t0; >> int t1; >> int t2; >> >> _1 = z0 == 0; >> t0 = (int) _1; >> _2 = z1 == 0; >> t1 = (int) _2; >> if (t0 != 0) goto <D.4174>; else goto <D.4172>; >> <D.4174>: >> if (t1 != 0) goto <D.4175>; else goto <D.4172>; >> <D.4175>: >> iftmp.0 = 1; >> goto <D.4173>; >> <D.4172>: >> iftmp.0 = 0; >> <D.4173>: >> t2 = iftmp.0; >> D.4176 = t2; >> return D.4176; >> } >> >> Since the pattern expects truth_andif_expr to be converted to bit_and_expr, >> it fails to match for cortex-a5. >> This seems to happen only for cortex-a5 (the other variants a9, a15, >> a57 are OK). >> >> Is my assumption that truth_andif_expr would be always converted to bit_and_expr >> for above case incorrect ? > Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT. Right, and that varies by core in the arm backend. looking at the tuning structures in arm.c I'd expect exynosm1 to also exhibit this Kyrill > Richard. > >> Thanks, >> Prathamesh >>> Richard. >>> >>>> Regards, >>>> Prathamesh >>>>>> Richard. >>>> >>> -- >>> Richard Biener <rguenther@suse.de> >>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >>
> Chritstophe reported to me that the commit caused test-cases > pr35691-1.c and pr35691-2.c (which were added by the commit) > to FAIL for cortex-a5: > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt I also see the test fail on powerpc64le. The forwprop1 tree dump is attached in case it helps. Martin ;; Function foo (foo, funcdef_no=0, decl_uid=2506, cgraph_uid=0, symbol_order=0) Applying pattern match.pd:2422, gimple-match.c:1704 Applying pattern match.pd:913, gimple-match.c:621 Applying pattern match.pd:901, gimple-match.c:164 Applying pattern match.pd:2685, gimple-match.c:59999 gimple_simplified to if (_1 != 0) Applying pattern match.pd:913, generic-match.c:429 Applying pattern match.pd:901, generic-match.c:136 Applying pattern match.pd:2685, generic-match.c:30968 Replaced '_1 != 0' with 'z0_4(D) == 0' Applying pattern match.pd:2422, gimple-match.c:1704 Applying pattern match.pd:913, gimple-match.c:621 Applying pattern match.pd:901, gimple-match.c:164 Applying pattern match.pd:2685, gimple-match.c:59999 gimple_simplified to if (_2 != 0) Applying pattern match.pd:913, generic-match.c:429 Applying pattern match.pd:901, generic-match.c:136 Applying pattern match.pd:2685, generic-match.c:30968 Replaced '_2 != 0' with 'z1_6(D) == 0' gimple_simplified to _11 = iftmp.0_3; foo (int z0, unsigned int z1) { int t2; int t1; int t0; _Bool _1; _Bool _2; int iftmp.0_3; int _11; <bb 2>: _1 = z0_4(D) == 0; t0_5 = (int) _1; _2 = z1_6(D) == 0; t1_7 = (int) _2; if (z0_4(D) == 0) goto <bb 3>; else goto <bb 4>; <bb 3>: if (z1_6(D) == 0) goto <bb 5>; else goto <bb 4>; <bb 4>: <bb 5>: # iftmp.0_3 = PHI <1(3), 0(4)> t2_10 = iftmp.0_3; _11 = iftmp.0_3; return _11; }
diff --git a/gcc/match.pd b/gcc/match.pd index 48f7351..29ddcd8 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -519,6 +519,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (TYPE_UNSIGNED (type)) (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1))))) +/* PR35691: Transform + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ +(for bitop (bit_and bit_ior) + cmp (eq ne) + (simplify + (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop)) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) + (cmp (bit_ior @0 (convert @1)) @2)))) + /* Fold (A & ~B) - (A & B) into (A ^ B) - B. */ (simplify (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1)) diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c new file mode 100644 index 0000000..5211f815 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr35691-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ + +int foo(int z0, unsigned z1) +{ + int t0 = (z0 == 0); + int t1 = (z1 == 0); + int t2 = (t0 && t1); + return t2; +} + +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c new file mode 100644 index 0000000..90cbf6d --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr35691-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ + +int foo(int z0, unsigned z1) +{ + int t0 = (z0 != 0); + int t1 = (z1 != 0); + int t2 = (t0 || t1); + return t2; +} + +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */