Message ID | 20190329133529.22523-2-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | powerpc floating-point optimization refactor | expand |
On Fri, 29 Mar 2019, Adhemerval Zanella wrote: > Since be2e25bbd78f9fdf the generic ieee754 implementation uses > compiler builtin which generates fabs{f} for all supported targets. One reason for the existing version might be a microoptimization for code size to make the float and double versions aliases, as permitted by the ABIs and instruction set in this case. (This is not an objection to this patch.) -- Joseph S. Myers joseph@codesourcery.com
On 02/04/2019 03:04, Joseph Myers wrote: > On Fri, 29 Mar 2019, Adhemerval Zanella wrote: > >> Since be2e25bbd78f9fdf the generic ieee754 implementation uses >> compiler builtin which generates fabs{f} for all supported targets. > > One reason for the existing version might be a microoptimization for code > size to make the float and double versions aliases, as permitted by the > ABIs and instruction set in this case. (This is not an objection to this > patch.) > Indeed powerpc does such microoptimizations for some implementations, but if the idea is indeed to push such optimizations couldn't add on generic implementation though a flag defined in arch-specific headers?
On Wed, Apr 03 2019, Adhemerval Zanella wrote: > > > On 02/04/2019 03:04, Joseph Myers wrote: > > On Fri, 29 Mar 2019, Adhemerval Zanella wrote: > > > >> Since be2e25bbd78f9fdf the generic ieee754 implementation uses > >> compiler builtin which generates fabs{f} for all supported targets. > > > > One reason for the existing version might be a microoptimization for code > > size to make the float and double versions aliases, as permitted by the > > ABIs and instruction set in this case. (This is not an objection to this > > patch.) > > > > Indeed powerpc does such microoptimizations for some implementations, but The code generated by these functions is as follows (on powerpc64le): Before the patch: 000000000004dac0 <fabs>: 4dac0: 0e 00 4c 3c addis r2,r12,14 4dac4: 40 9b 42 38 addi r2,r2,-25792 4dac8: 10 0a 20 fc fabs f1,f1 4dacc: 20 00 80 4e blr After the patch: 000000000004dac0 <fabs>: 4dac0: 10 0a 20 fc fabs f1,f1 4dac4: 20 00 80 4e blr (this is true when --enable-stack-protector is not in use, or when it is set to strong, as is the case for the distros I checked (Debian, Fedora and OpenSuse); when set to all, there would be a lot more differences before and after the patch, as the assembly implementation wouldn't get stack protection code, but that's not a problem, imo). (also, the absence of the TOC-setup code is fine, as the function does not need anything from the TOC). > if the idea is indeed to push such optimizations couldn't add on generic > implementation though a flag defined in arch-specific headers? Given the size of these functions, I believe that this is not worth the effort, and I believe this patch is OK as is. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
"Gabriel F. T. Gomes" <gabriel@inconstante.eti.br> writes: > The code generated by these functions is as follows (on powerpc64le): > > Before the patch: > 000000000004dac0 <fabs>: > 4dac0: 0e 00 4c 3c addis r2,r12,14 > 4dac4: 40 9b 42 38 addi r2,r2,-25792 > 4dac8: 10 0a 20 fc fabs f1,f1 > 4dacc: 20 00 80 4e blr > > After the patch: > 000000000004dac0 <fabs>: > 4dac0: 10 0a 20 fc fabs f1,f1 > 4dac4: 20 00 80 4e blr This is showing that __fabs could have been using ENTRY_TOCLESS instead of ENTRY. ;-) -- Tulio Magno
On 15/04/2019 18:32, Tulio Magno Quites Machado Filho wrote: > "Gabriel F. T. Gomes" <gabriel@inconstante.eti.br> writes: > >> The code generated by these functions is as follows (on powerpc64le): >> >> Before the patch: >> 000000000004dac0 <fabs>: >> 4dac0: 0e 00 4c 3c addis r2,r12,14 >> 4dac4: 40 9b 42 38 addi r2,r2,-25792 >> 4dac8: 10 0a 20 fc fabs f1,f1 >> 4dacc: 20 00 80 4e blr >> >> After the patch: >> 000000000004dac0 <fabs>: >> 4dac0: 10 0a 20 fc fabs f1,f1 >> 4dac4: 20 00 80 4e blr > > This is showing that __fabs could have been using ENTRY_TOCLESS instead of > ENTRY. ;-) > And another point try use C implementation where possible (since it abstracts that kind of possible ABI issues) ;)
diff --git a/sysdeps/powerpc/fpu/s_fabs.S b/sysdeps/powerpc/fpu/s_fabs.S deleted file mode 100644 index 7147f6b4a2..0000000000 --- a/sysdeps/powerpc/fpu/s_fabs.S +++ /dev/null @@ -1,33 +0,0 @@ -/* Floating-point absolute value. PowerPC version. - Copyright (C) 1997-2019 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <http://www.gnu.org/licenses/>. */ - -#include <sysdep.h> -#include <libm-alias-float.h> -#include <libm-alias-double.h> - -ENTRY(__fabs) -/* double [f1] fabs (double [f1] x); */ - fabs fp1,fp1 - blr -END(__fabs) - -libm_alias_double (__fabs, fabs) - -/* It turns out that it's safe to use this code even for single-precision. */ -strong_alias(__fabs,__fabsf) -libm_alias_float (__fabs, fabs) diff --git a/sysdeps/powerpc/fpu/s_fabsf.S b/sysdeps/powerpc/fpu/s_fabsf.S deleted file mode 100644 index 877c710ce8..0000000000 --- a/sysdeps/powerpc/fpu/s_fabsf.S +++ /dev/null @@ -1 +0,0 @@ -/* __fabsf is in s_fabs.S */