Message ID | CAAgBjM=gRR+C8GwrT6=9nHQZ3vapJKuOPsrDMHjEWmEnFCa3Rw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: > > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > > > >> Hi, > >> Consider following test-case: > >> > >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> { > >> __builtin_memcpy (a1, a2, a3); > >> return a1; > >> } > >> > >> return a1 can be considered equivalent to return value of memcpy, > >> and the call could be emitted as a tail-call. > >> gcc doesn't emit the above call to memcpy as a tail-call, > >> but if it is changed to: > >> > >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> return t1; > >> > >> Then memcpy is emitted as a tail-call. > >> The attached patch tries to handle the former case. > >> > >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> Cross tested on arm*-*-*, aarch64*-*-* > >> Does this patch look OK ? > > > > +/* Return arg, if function returns it's argument or NULL if it doesn't. > > */ > > +tree > > +gimple_call_return_arg (gcall *call_stmt) > > +{ > > > > > > Please just inline it at the single use - the name is not terribly > > informative. > > > > I'm not sure you can rely on code-generation working if you not > > effectively change the IL to > > > > a1 = __builtin_memcpy (a1, a2, a3); > > return a1; > > > > someone more familiar with RTL expansion plus tail call emission on > > RTL needs to chime in. > Well I was trying to copy-propagate function's argument into uses of > it's return value if > function returned that argument, so the assignment to lhs of call > could be made redundant. > > eg: > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > { > void *t1 = __builtin_memcpy (a1, a2, a3); > return t1; > } > > After patch, copyprop transformed it into: > t1 = __builtin_memcpy (a1, a2, a3); > return a1; But that's a bad transform -- if we know that t1 == a1 then it's better to use t1 as that's readily available in the return register while the register for a1 might have been clobbered and thus we need to spill it for the later return. > But this now interferes with tail-call optimization, because it is not > able to emit memcpy > as tail-call anymore due to which the patch regressed 20050503-1.c. > I am not sure how to workaround this. > > Thanks, > Prathamesh > > > > Richard. > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote: > > eg: > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > > { > > void *t1 = __builtin_memcpy (a1, a2, a3); > > return t1; > > } > > > > After patch, copyprop transformed it into: > > t1 = __builtin_memcpy (a1, a2, a3); > > return a1; > > But that's a bad transform -- if we know that t1 == a1 then it's > better to use t1 as that's readily available in the return register > while the register for a1 might have been clobbered and thus we > need to spill it for the later return. I thought that the RA is aware of this and is able to undo it. Jakub
On Thu, 24 Nov 2016, Jakub Jelinek wrote: > On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote: > > > eg: > > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > > > { > > > void *t1 = __builtin_memcpy (a1, a2, a3); > > > return t1; > > > } > > > > > > After patch, copyprop transformed it into: > > > t1 = __builtin_memcpy (a1, a2, a3); > > > return a1; > > > > But that's a bad transform -- if we know that t1 == a1 then it's > > better to use t1 as that's readily available in the return register > > while the register for a1 might have been clobbered and thus we > > need to spill it for the later return. > > I thought that the RA is aware of this and is able to undo it. The RA has some feature here, yes. Not sure what it can do exactly and if it can do something when the return value is optimized away. Richard.
On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi, >> >> Consider following test-case: >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> { >> >> __builtin_memcpy (a1, a2, a3); >> >> return a1; >> >> } >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> and the call could be emitted as a tail-call. >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> but if it is changed to: >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> return t1; >> >> >> >> Then memcpy is emitted as a tail-call. >> >> The attached patch tries to handle the former case. >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> Does this patch look OK ? >> > >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. >> > */ >> > +tree >> > +gimple_call_return_arg (gcall *call_stmt) >> > +{ >> > >> > >> > Please just inline it at the single use - the name is not terribly >> > informative. >> > >> > I'm not sure you can rely on code-generation working if you not >> > effectively change the IL to >> > >> > a1 = __builtin_memcpy (a1, a2, a3); >> > return a1; >> > >> > someone more familiar with RTL expansion plus tail call emission on >> > RTL needs to chime in. >> Well I was trying to copy-propagate function's argument into uses of >> it's return value if >> function returned that argument, so the assignment to lhs of call >> could be made redundant. >> >> eg: >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> { >> void *t1 = __builtin_memcpy (a1, a2, a3); >> return t1; >> } >> >> After patch, copyprop transformed it into: >> t1 = __builtin_memcpy (a1, a2, a3); >> return a1; > > But that's a bad transform -- if we know that t1 == a1 then it's > better to use t1 as that's readily available in the return register > while the register for a1 might have been clobbered and thus we > need to spill it for the later return. Oh I didn't realize this could possibly pessimize RA. For test-case: void *t1 = memcpy (dest, src, n); if (t1 != dest) __builtin_abort (); we could copy-propagate t1 into cond_expr and make the condition redundant. However I suppose this particular case could be handled with VRP instead (t1 and dest should be marked equivalent) ? Thanks, Prathamesh > >> But this now interferes with tail-call optimization, because it is not >> able to emit memcpy >> as tail-call anymore due to which the patch regressed 20050503-1.c. >> I am not sure how to workaround this. >> >> Thanks, >> Prathamesh >> > >> > Richard. >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: > > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: > >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> Hi, > >> >> Consider following test-case: > >> >> > >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> { > >> >> __builtin_memcpy (a1, a2, a3); > >> >> return a1; > >> >> } > >> >> > >> >> return a1 can be considered equivalent to return value of memcpy, > >> >> and the call could be emitted as a tail-call. > >> >> gcc doesn't emit the above call to memcpy as a tail-call, > >> >> but if it is changed to: > >> >> > >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> return t1; > >> >> > >> >> Then memcpy is emitted as a tail-call. > >> >> The attached patch tries to handle the former case. > >> >> > >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> >> Cross tested on arm*-*-*, aarch64*-*-* > >> >> Does this patch look OK ? > >> > > >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. > >> > */ > >> > +tree > >> > +gimple_call_return_arg (gcall *call_stmt) > >> > +{ > >> > > >> > > >> > Please just inline it at the single use - the name is not terribly > >> > informative. > >> > > >> > I'm not sure you can rely on code-generation working if you not > >> > effectively change the IL to > >> > > >> > a1 = __builtin_memcpy (a1, a2, a3); > >> > return a1; > >> > > >> > someone more familiar with RTL expansion plus tail call emission on > >> > RTL needs to chime in. > >> Well I was trying to copy-propagate function's argument into uses of > >> it's return value if > >> function returned that argument, so the assignment to lhs of call > >> could be made redundant. > >> > >> eg: > >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> { > >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> return t1; > >> } > >> > >> After patch, copyprop transformed it into: > >> t1 = __builtin_memcpy (a1, a2, a3); > >> return a1; > > > > But that's a bad transform -- if we know that t1 == a1 then it's > > better to use t1 as that's readily available in the return register > > while the register for a1 might have been clobbered and thus we > > need to spill it for the later return. > Oh I didn't realize this could possibly pessimize RA. > For test-case: > > void *t1 = memcpy (dest, src, n); > if (t1 != dest) > __builtin_abort (); > > we could copy-propagate t1 into cond_expr and make the condition redundant. > However I suppose this particular case could be handled with VRP instead > (t1 and dest should be marked equivalent) ? Yeah, exposing this to value-numbering in general can enable some optimizations (but I wouldn't put it in copyprop). Note it's then difficult to avoid copy-propgating things... The user can also write void *f(void *a1, void *a2, __SIZE_TYPE__ a3) { __builtin_memcpy (a1, a2, a3); return a1; } so it's good to improve code-gen for that (for the tailcall issue). Richard. > Thanks, > Prathamesh > > > >> But this now interferes with tail-call optimization, because it is not > >> able to emit memcpy > >> as tail-call anymore due to which the patch regressed 20050503-1.c. > >> I am not sure how to workaround this. > >> > >> Thanks, > >> Prathamesh > >> > > >> > Richard. > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote: > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> Hi, >> >> >> Consider following test-case: >> >> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> { >> >> >> __builtin_memcpy (a1, a2, a3); >> >> >> return a1; >> >> >> } >> >> >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> >> and the call could be emitted as a tail-call. >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> >> but if it is changed to: >> >> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> return t1; >> >> >> >> >> >> Then memcpy is emitted as a tail-call. >> >> >> The attached patch tries to handle the former case. >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> >> Does this patch look OK ? >> >> > >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. >> >> > */ >> >> > +tree >> >> > +gimple_call_return_arg (gcall *call_stmt) >> >> > +{ >> >> > >> >> > >> >> > Please just inline it at the single use - the name is not terribly >> >> > informative. >> >> > >> >> > I'm not sure you can rely on code-generation working if you not >> >> > effectively change the IL to >> >> > >> >> > a1 = __builtin_memcpy (a1, a2, a3); >> >> > return a1; >> >> > >> >> > someone more familiar with RTL expansion plus tail call emission on >> >> > RTL needs to chime in. >> >> Well I was trying to copy-propagate function's argument into uses of >> >> it's return value if >> >> function returned that argument, so the assignment to lhs of call >> >> could be made redundant. >> >> >> >> eg: >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> { >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> return t1; >> >> } >> >> >> >> After patch, copyprop transformed it into: >> >> t1 = __builtin_memcpy (a1, a2, a3); >> >> return a1; >> > >> > But that's a bad transform -- if we know that t1 == a1 then it's >> > better to use t1 as that's readily available in the return register >> > while the register for a1 might have been clobbered and thus we >> > need to spill it for the later return. >> Oh I didn't realize this could possibly pessimize RA. >> For test-case: >> >> void *t1 = memcpy (dest, src, n); >> if (t1 != dest) >> __builtin_abort (); >> >> we could copy-propagate t1 into cond_expr and make the condition redundant. >> However I suppose this particular case could be handled with VRP instead >> (t1 and dest should be marked equivalent) ? > > Yeah, exposing this to value-numbering in general can enable some > optimizations (but I wouldn't put it in copyprop). Note it's then > difficult to avoid copy-propgating things... > > The user can also write > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > { > __builtin_memcpy (a1, a2, a3); > return a1; > } > > so it's good to improve code-gen for that (for the tailcall issue). For the tail-call, issue should we artificially create a lhs and use that as return value (perhaps by a separate pass before tailcall) ? __builtin_memcpy (a1, a2, a3); return a1; gets transformed to: _1 = __builtin_memcpy (a1, a2, a3) return _1; So tail-call optimization pass would see the IL in it's expected form. Thanks, Prathamesh > > Richard. > >> Thanks, >> Prathamesh >> > >> >> But this now interferes with tail-call optimization, because it is not >> >> able to emit memcpy >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. >> >> I am not sure how to workaround this. >> >> >> >> Thanks, >> >> Prathamesh >> >> > >> >> > Richard. >> >> >> > >> > -- >> > Richard Biener <rguenther@suse.de> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote: > > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: > >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: > >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> Hi, > >> >> >> Consider following test-case: > >> >> >> > >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> { > >> >> >> __builtin_memcpy (a1, a2, a3); > >> >> >> return a1; > >> >> >> } > >> >> >> > >> >> >> return a1 can be considered equivalent to return value of memcpy, > >> >> >> and the call could be emitted as a tail-call. > >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, > >> >> >> but if it is changed to: > >> >> >> > >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return t1; > >> >> >> > >> >> >> Then memcpy is emitted as a tail-call. > >> >> >> The attached patch tries to handle the former case. > >> >> >> > >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> >> >> Cross tested on arm*-*-*, aarch64*-*-* > >> >> >> Does this patch look OK ? > >> >> > > >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. > >> >> > */ > >> >> > +tree > >> >> > +gimple_call_return_arg (gcall *call_stmt) > >> >> > +{ > >> >> > > >> >> > > >> >> > Please just inline it at the single use - the name is not terribly > >> >> > informative. > >> >> > > >> >> > I'm not sure you can rely on code-generation working if you not > >> >> > effectively change the IL to > >> >> > > >> >> > a1 = __builtin_memcpy (a1, a2, a3); > >> >> > return a1; > >> >> > > >> >> > someone more familiar with RTL expansion plus tail call emission on > >> >> > RTL needs to chime in. > >> >> Well I was trying to copy-propagate function's argument into uses of > >> >> it's return value if > >> >> function returned that argument, so the assignment to lhs of call > >> >> could be made redundant. > >> >> > >> >> eg: > >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> { > >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> return t1; > >> >> } > >> >> > >> >> After patch, copyprop transformed it into: > >> >> t1 = __builtin_memcpy (a1, a2, a3); > >> >> return a1; > >> > > >> > But that's a bad transform -- if we know that t1 == a1 then it's > >> > better to use t1 as that's readily available in the return register > >> > while the register for a1 might have been clobbered and thus we > >> > need to spill it for the later return. > >> Oh I didn't realize this could possibly pessimize RA. > >> For test-case: > >> > >> void *t1 = memcpy (dest, src, n); > >> if (t1 != dest) > >> __builtin_abort (); > >> > >> we could copy-propagate t1 into cond_expr and make the condition redundant. > >> However I suppose this particular case could be handled with VRP instead > >> (t1 and dest should be marked equivalent) ? > > > > Yeah, exposing this to value-numbering in general can enable some > > optimizations (but I wouldn't put it in copyprop). Note it's then > > difficult to avoid copy-propgating things... > > > > The user can also write > > > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > > { > > __builtin_memcpy (a1, a2, a3); > > return a1; > > } > > > > so it's good to improve code-gen for that (for the tailcall issue). > For the tail-call, issue should we artificially create a lhs and use that > as return value (perhaps by a separate pass before tailcall) ? > > __builtin_memcpy (a1, a2, a3); > return a1; > > gets transformed to: > _1 = __builtin_memcpy (a1, a2, a3) > return _1; > > So tail-call optimization pass would see the IL in it's expected form. As said, a RTL expert needs to chime in here. Iff then tail-call itself should do this rewrite. But if this form is required to make things work (I suppose you checked it _does_ actually work?) then we'd need to make sure later passes do not undo it. So it looks fragile to me. OTOH I seem to remember that the flags we set on GIMPLE are merely a hint to RTL expansion and the tailcalling is verified again there? Thanks, Richard. > Thanks, > Prathamesh > > > > Richard. > > > >> Thanks, > >> Prathamesh > >> > > >> >> But this now interferes with tail-call optimization, because it is not > >> >> able to emit memcpy > >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. > >> >> I am not sure how to workaround this. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> > Richard. > >> >> > >> > > >> > -- > >> > Richard Biener <rguenther@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote: > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote: >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> > >> >> >> >> Hi, >> >> >> >> Consider following test-case: >> >> >> >> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> { >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> >> >> return a1; >> >> >> >> } >> >> >> >> >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> >> >> and the call could be emitted as a tail-call. >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> >> >> but if it is changed to: >> >> >> >> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return t1; >> >> >> >> >> >> >> >> Then memcpy is emitted as a tail-call. >> >> >> >> The attached patch tries to handle the former case. >> >> >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> >> >> Does this patch look OK ? >> >> >> > >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. >> >> >> > */ >> >> >> > +tree >> >> >> > +gimple_call_return_arg (gcall *call_stmt) >> >> >> > +{ >> >> >> > >> >> >> > >> >> >> > Please just inline it at the single use - the name is not terribly >> >> >> > informative. >> >> >> > >> >> >> > I'm not sure you can rely on code-generation working if you not >> >> >> > effectively change the IL to >> >> >> > >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); >> >> >> > return a1; >> >> >> > >> >> >> > someone more familiar with RTL expansion plus tail call emission on >> >> >> > RTL needs to chime in. >> >> >> Well I was trying to copy-propagate function's argument into uses of >> >> >> it's return value if >> >> >> function returned that argument, so the assignment to lhs of call >> >> >> could be made redundant. >> >> >> >> >> >> eg: >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> { >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> return t1; >> >> >> } >> >> >> >> >> >> After patch, copyprop transformed it into: >> >> >> t1 = __builtin_memcpy (a1, a2, a3); >> >> >> return a1; >> >> > >> >> > But that's a bad transform -- if we know that t1 == a1 then it's >> >> > better to use t1 as that's readily available in the return register >> >> > while the register for a1 might have been clobbered and thus we >> >> > need to spill it for the later return. >> >> Oh I didn't realize this could possibly pessimize RA. >> >> For test-case: >> >> >> >> void *t1 = memcpy (dest, src, n); >> >> if (t1 != dest) >> >> __builtin_abort (); >> >> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant. >> >> However I suppose this particular case could be handled with VRP instead >> >> (t1 and dest should be marked equivalent) ? >> > >> > Yeah, exposing this to value-numbering in general can enable some >> > optimizations (but I wouldn't put it in copyprop). Note it's then >> > difficult to avoid copy-propgating things... >> > >> > The user can also write >> > >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> > { >> > __builtin_memcpy (a1, a2, a3); >> > return a1; >> > } >> > >> > so it's good to improve code-gen for that (for the tailcall issue). >> For the tail-call, issue should we artificially create a lhs and use that >> as return value (perhaps by a separate pass before tailcall) ? >> >> __builtin_memcpy (a1, a2, a3); >> return a1; >> >> gets transformed to: >> _1 = __builtin_memcpy (a1, a2, a3) >> return _1; >> >> So tail-call optimization pass would see the IL in it's expected form. > > As said, a RTL expert needs to chime in here. Iff then tail-call > itself should do this rewrite. But if this form is required to make > things work (I suppose you checked it _does_ actually work?) then > we'd need to make sure later passes do not undo it. So it looks > fragile to me. OTOH I seem to remember that the flags we set on > GIMPLE are merely a hint to RTL expansion and the tailcalling is > verified again there? Yeah, I verified the form works: void *f(void *a1, void *a2, __SIZE_TYPE__ a3) { void *t1 = __builtin_memcpy (a1, a2, a3); return t1; } assembly: f: .LFB0: .cfi_startproc jmp memcpy .cfi_endproc Thanks, Prathamesh > > Thanks, > Richard. > >> Thanks, >> Prathamesh >> > >> > Richard. >> > >> >> Thanks, >> >> Prathamesh >> >> > >> >> >> But this now interferes with tail-call optimization, because it is not >> >> >> able to emit memcpy >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. >> >> >> I am not sure how to workaround this. >> >> >> >> >> >> Thanks, >> >> >> Prathamesh >> >> >> > >> >> >> > Richard. >> >> >> >> >> > >> >> > -- >> >> > Richard Biener <rguenther@suse.de> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> >> >> >> > >> > -- >> > Richard Biener <rguenther@suse.de> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote: > > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote: > >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: > >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: > >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> >> > > >> >> >> >> Hi, > >> >> >> >> Consider following test-case: > >> >> >> >> > >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> >> { > >> >> >> >> __builtin_memcpy (a1, a2, a3); > >> >> >> >> return a1; > >> >> >> >> } > >> >> >> >> > >> >> >> >> return a1 can be considered equivalent to return value of memcpy, > >> >> >> >> and the call could be emitted as a tail-call. > >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, > >> >> >> >> but if it is changed to: > >> >> >> >> > >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> >> return t1; > >> >> >> >> > >> >> >> >> Then memcpy is emitted as a tail-call. > >> >> >> >> The attached patch tries to handle the former case. > >> >> >> >> > >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* > >> >> >> >> Does this patch look OK ? > >> >> >> > > >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. > >> >> >> > */ > >> >> >> > +tree > >> >> >> > +gimple_call_return_arg (gcall *call_stmt) > >> >> >> > +{ > >> >> >> > > >> >> >> > > >> >> >> > Please just inline it at the single use - the name is not terribly > >> >> >> > informative. > >> >> >> > > >> >> >> > I'm not sure you can rely on code-generation working if you not > >> >> >> > effectively change the IL to > >> >> >> > > >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); > >> >> >> > return a1; > >> >> >> > > >> >> >> > someone more familiar with RTL expansion plus tail call emission on > >> >> >> > RTL needs to chime in. > >> >> >> Well I was trying to copy-propagate function's argument into uses of > >> >> >> it's return value if > >> >> >> function returned that argument, so the assignment to lhs of call > >> >> >> could be made redundant. > >> >> >> > >> >> >> eg: > >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> { > >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return t1; > >> >> >> } > >> >> >> > >> >> >> After patch, copyprop transformed it into: > >> >> >> t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return a1; > >> >> > > >> >> > But that's a bad transform -- if we know that t1 == a1 then it's > >> >> > better to use t1 as that's readily available in the return register > >> >> > while the register for a1 might have been clobbered and thus we > >> >> > need to spill it for the later return. > >> >> Oh I didn't realize this could possibly pessimize RA. > >> >> For test-case: > >> >> > >> >> void *t1 = memcpy (dest, src, n); > >> >> if (t1 != dest) > >> >> __builtin_abort (); > >> >> > >> >> we could copy-propagate t1 into cond_expr and make the condition redundant. > >> >> However I suppose this particular case could be handled with VRP instead > >> >> (t1 and dest should be marked equivalent) ? > >> > > >> > Yeah, exposing this to value-numbering in general can enable some > >> > optimizations (but I wouldn't put it in copyprop). Note it's then > >> > difficult to avoid copy-propgating things... > >> > > >> > The user can also write > >> > > >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> > { > >> > __builtin_memcpy (a1, a2, a3); > >> > return a1; > >> > } > >> > > >> > so it's good to improve code-gen for that (for the tailcall issue). > >> For the tail-call, issue should we artificially create a lhs and use that > >> as return value (perhaps by a separate pass before tailcall) ? > >> > >> __builtin_memcpy (a1, a2, a3); > >> return a1; > >> > >> gets transformed to: > >> _1 = __builtin_memcpy (a1, a2, a3) > >> return _1; > >> > >> So tail-call optimization pass would see the IL in it's expected form. > > > > As said, a RTL expert needs to chime in here. Iff then tail-call > > itself should do this rewrite. But if this form is required to make > > things work (I suppose you checked it _does_ actually work?) then > > we'd need to make sure later passes do not undo it. So it looks > > fragile to me. OTOH I seem to remember that the flags we set on > > GIMPLE are merely a hint to RTL expansion and the tailcalling is > > verified again there? > > Yeah, I verified the form works: > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > { > void *t1 = __builtin_memcpy (a1, a2, a3); > return t1; > } > > assembly: > f: > .LFB0: > .cfi_startproc > jmp memcpy > .cfi_endproc I meant the void *f(void *a1, void *a2, __SIZE_TYPE__ a3) { __builtin_memcpy (a1, a2, a3); return a1; } form after your patch to the tailcall pass. Richard. > Thanks, > Prathamesh > > > > Thanks, > > Richard. > > > >> Thanks, > >> Prathamesh > >> > > >> > Richard. > >> > > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> >> But this now interferes with tail-call optimization, because it is not > >> >> >> able to emit memcpy > >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. > >> >> >> I am not sure how to workaround this. > >> >> >> > >> >> >> Thanks, > >> >> >> Prathamesh > >> >> >> > > >> >> >> > Richard. > >> >> >> > >> >> > > >> >> > -- > >> >> > Richard Biener <rguenther@suse.de> > >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> >> > >> >> > >> > > >> > -- > >> > Richard Biener <rguenther@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 25 November 2016 at 13:55, Richard Biener <rguenther@suse.de> wrote: > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > >> On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote: >> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote: >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote: >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> > >> >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote: >> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >> > >> >> >> >> >> Hi, >> >> >> >> >> Consider following test-case: >> >> >> >> >> >> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> >> { >> >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> >> >> >> return a1; >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> >> >> >> and the call could be emitted as a tail-call. >> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> >> >> >> but if it is changed to: >> >> >> >> >> >> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> >> return t1; >> >> >> >> >> >> >> >> >> >> Then memcpy is emitted as a tail-call. >> >> >> >> >> The attached patch tries to handle the former case. >> >> >> >> >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> >> >> >> Does this patch look OK ? >> >> >> >> > >> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. >> >> >> >> > */ >> >> >> >> > +tree >> >> >> >> > +gimple_call_return_arg (gcall *call_stmt) >> >> >> >> > +{ >> >> >> >> > >> >> >> >> > >> >> >> >> > Please just inline it at the single use - the name is not terribly >> >> >> >> > informative. >> >> >> >> > >> >> >> >> > I'm not sure you can rely on code-generation working if you not >> >> >> >> > effectively change the IL to >> >> >> >> > >> >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> > return a1; >> >> >> >> > >> >> >> >> > someone more familiar with RTL expansion plus tail call emission on >> >> >> >> > RTL needs to chime in. >> >> >> >> Well I was trying to copy-propagate function's argument into uses of >> >> >> >> it's return value if >> >> >> >> function returned that argument, so the assignment to lhs of call >> >> >> >> could be made redundant. >> >> >> >> >> >> >> >> eg: >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> { >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return t1; >> >> >> >> } >> >> >> >> >> >> >> >> After patch, copyprop transformed it into: >> >> >> >> t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return a1; >> >> >> > >> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's >> >> >> > better to use t1 as that's readily available in the return register >> >> >> > while the register for a1 might have been clobbered and thus we >> >> >> > need to spill it for the later return. >> >> >> Oh I didn't realize this could possibly pessimize RA. >> >> >> For test-case: >> >> >> >> >> >> void *t1 = memcpy (dest, src, n); >> >> >> if (t1 != dest) >> >> >> __builtin_abort (); >> >> >> >> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant. >> >> >> However I suppose this particular case could be handled with VRP instead >> >> >> (t1 and dest should be marked equivalent) ? >> >> > >> >> > Yeah, exposing this to value-numbering in general can enable some >> >> > optimizations (but I wouldn't put it in copyprop). Note it's then >> >> > difficult to avoid copy-propgating things... >> >> > >> >> > The user can also write >> >> > >> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> > { >> >> > __builtin_memcpy (a1, a2, a3); >> >> > return a1; >> >> > } >> >> > >> >> > so it's good to improve code-gen for that (for the tailcall issue). >> >> For the tail-call, issue should we artificially create a lhs and use that >> >> as return value (perhaps by a separate pass before tailcall) ? >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> return a1; >> >> >> >> gets transformed to: >> >> _1 = __builtin_memcpy (a1, a2, a3) >> >> return _1; >> >> >> >> So tail-call optimization pass would see the IL in it's expected form. >> > >> > As said, a RTL expert needs to chime in here. Iff then tail-call >> > itself should do this rewrite. But if this form is required to make >> > things work (I suppose you checked it _does_ actually work?) then >> > we'd need to make sure later passes do not undo it. So it looks >> > fragile to me. OTOH I seem to remember that the flags we set on >> > GIMPLE are merely a hint to RTL expansion and the tailcalling is >> > verified again there? >> >> Yeah, I verified the form works: >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> { >> void *t1 = __builtin_memcpy (a1, a2, a3); >> return t1; >> } >> >> assembly: >> f: >> .LFB0: >> .cfi_startproc >> jmp memcpy >> .cfi_endproc > > I meant the > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > { > __builtin_memcpy (a1, a2, a3); > return a1; > } > > form after your patch to the tailcall pass. Oops, sorry -;) Yes, before the patch: f: .LFB0: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 call memcpy addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc after patch: f: .LFB0: .cfi_startproc jmp memcpy .cfi_endproc Thanks, Prathamesh > > Richard. > >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Richard. >> > >> >> Thanks, >> >> Prathamesh >> >> > >> >> > Richard. >> >> > >> >> >> Thanks, >> >> >> Prathamesh >> >> >> > >> >> >> >> But this now interferes with tail-call optimization, because it is not >> >> >> >> able to emit memcpy >> >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. >> >> >> >> I am not sure how to workaround this. >> >> >> >> >> >> >> >> Thanks, >> >> >> >> Prathamesh >> >> >> >> > >> >> >> >> > Richard. >> >> >> >> >> >> >> > >> >> >> > -- >> >> >> > Richard Biener <rguenther@suse.de> >> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> >> >> >> >> >> >> > >> >> > -- >> >> > Richard Biener <rguenther@suse.de> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> >> >> >> > >> > -- >> > Richard Biener <rguenther@suse.de> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 11/25/2016 01:07 AM, Richard Biener wrote: >> For the tail-call, issue should we artificially create a lhs and use that >> as return value (perhaps by a separate pass before tailcall) ? >> >> __builtin_memcpy (a1, a2, a3); >> return a1; >> >> gets transformed to: >> _1 = __builtin_memcpy (a1, a2, a3) >> return _1; >> >> So tail-call optimization pass would see the IL in it's expected form. > > As said, a RTL expert needs to chime in here. Iff then tail-call > itself should do this rewrite. But if this form is required to make > things work (I suppose you checked it _does_ actually work?) then > we'd need to make sure later passes do not undo it. So it looks > fragile to me. OTOH I seem to remember that the flags we set on > GIMPLE are merely a hint to RTL expansion and the tailcalling is > verified again there? So tail calling actually sits on the border between trees and RTL. Essentially it's an expand-time decision as we use information from trees as well as low level target information. I would not expect the former sequence to tail call. The tail calling code does not know that the return value from memcpy will be a1. Thus the tail calling code has to assume that it'll have to copy a1 into the return register after returning from memcpy, which obviously can't be done if we tail called memcpy. The second form is much more likely to turn into a tail call sequence because the return value from memcpy will be sitting in the proper register. This form out to work for most calling conventions that allow tail calls. We could (in theory) try and exploit the fact that memcpy returns its first argument as a return value, but that would only be helpful on a target where the first argument and return value use the same register. So I'd have a slight preference to rewriting per Prathamesh's suggestion above since it's more general. Jeff
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c new file mode 100644 index 0000000..8319e9f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-copyprop-slim" } */ + +void f(char *dest, const char *src, __SIZE_TYPE__ n) +{ + char *t1 = __builtin_strcpy (dest, src); + if (t1 != dest) + __builtin_abort (); + + char *t2 = __builtin_strncpy (dest, src, n); + if (t2 != dest) + __builtin_abort (); + + char *t3 = __builtin_strcat (dest, src); + if (t3 != dest) + __builtin_abort (); + + char *t4 = __builtin_strncat (dest, src, n); + if (t4 != dest) + __builtin_abort (); + + void *t5 = __builtin_memmove (dest, src, n); + if (t5 != dest) + __builtin_abort (); + + void *t6 = __builtin_memcpy (dest, src, n); + if (t6 != dest) + __builtin_abort (); + + void *t7 = __builtin_memset (dest, 0, n); + if (t7 != dest) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "copyprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c new file mode 100644 index 0000000..199074d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-copyprop" } */ + +char *f(char *dest, const char *src) +{ + char *t1 = __builtin_strcpy (dest, src); + return t1; +} + +/* { dg-final { scan-tree-dump "return dest_\[0-9\]*\\(D\\);" "copyprop1" } } */ diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c index abc6205..77f4ad7 100644 --- a/gcc/tree-ssa-copy.c +++ b/gcc/tree-ssa-copy.c @@ -80,6 +80,9 @@ stmt_may_generate_copy (gimple *stmt) if (gimple_code (stmt) == GIMPLE_PHI) return !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (stmt)); + if (gimple_code (stmt) == GIMPLE_CALL) + return true; + if (gimple_code (stmt) != GIMPLE_ASSIGN) return false; @@ -252,6 +255,40 @@ copy_prop_visit_cond_stmt (gimple *stmt, edge *taken_edge_p) return retval; } +/* Check if call returns one of it's arguments, and in that + case set copy-of (lhs) to the returned argument. */ + +static enum ssa_prop_result +copy_prop_visit_call (gcall *call_stmt, tree *result_p) +{ + tree lhs = gimple_call_lhs (call_stmt); + if (!lhs || TREE_CODE (lhs) != SSA_NAME) + return SSA_PROP_VARYING; + + unsigned rf = gimple_call_return_flags (call_stmt); + if (rf & ERF_RETURNS_ARG) + { + unsigned argnum = rf & ERF_RETURN_ARG_MASK; + if (argnum < gimple_call_num_args (call_stmt)) + { + tree arg = gimple_call_arg (call_stmt, argnum); + if (TREE_CODE (arg) == SSA_NAME) + { + arg = valueize_val (arg); + if (!may_propagate_copy (lhs, arg)) + return SSA_PROP_VARYING; + + *result_p = lhs; + if (set_copy_of_val (*result_p, arg)) + return SSA_PROP_INTERESTING; + else + return SSA_PROP_NOT_INTERESTING; + } + } + } + + return SSA_PROP_VARYING; +} /* Evaluate statement STMT. If the statement produces a new output value, return SSA_PROP_INTERESTING and store the SSA_NAME holding @@ -290,6 +327,8 @@ copy_prop_visit_stmt (gimple *stmt, edge *taken_edge_p, tree *result_p) jump. */ retval = copy_prop_visit_cond_stmt (stmt, taken_edge_p); } + else if (gcall *call_stmt = dyn_cast<gcall *> (stmt)) + retval = copy_prop_visit_call (call_stmt, result_p); else retval = SSA_PROP_VARYING;