Message ID | 20250131191844.2582716-10-adhemerval.zanella@linaro.org |
---|---|
State | New |
Series | Add c23 CORE-MATH binary32 implementations to libm |
I confirm we get correct rounding for all rounding modes and all binary32 inputs on x86_64.

Paul

> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
>     Joseph Myers <josmyers@redhat.com>,
>     Paul Zimmermann <Paul.Zimmermann@inria.fr>,
>     Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:13 -0300
>
> The CORE-MATH implementation is correctly rounded (for any rounding mode)
> and shows better performance than the generic asinpif.
>
> The code was adapted to glibc style and to use the definitions in
> math_config.h (to handle errno, overflow, and underflow).
>
> Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
> gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
>
> latency                master     patched    improvement
> x86_64                 46.4996    51.0456         -9.78%
> x86_64-v2              46.7551    52.1317        -11.50%
> x86_64-v3              42.6235    34.8162         18.32%
> aarch64 (Neoverse)     17.4161    14.3604         17.55%
> power8                 10.7347     9.0193         15.98%
> power10                10.6420     9.0362         15.09%
>
> reciprocal-throughput  master     patched    improvement
> x86_64                 24.7208    29.0812        -17.64%
> x86_64-v2              24.2177    29.7166        -22.71%
> x86_64-v3              20.5617    12.3679         39.85%
> aarch64 (Neoverse)     13.4827     7.17613        46.78%
> power8                  6.46134    3.56089        44.89%
> power10                 5.79007    3.49544        39.63%
>
> x86_64 and x86_64-v2 show slower performance due to the use of an fma
> operation in the fast patch; only x86_64-v3 provides it without a
> function call.
> ---
>  SHARED-FILES                               |   4 +
>  sysdeps/aarch64/libm-test-ulps             |   4 -
>  sysdeps/arc/fpu/libm-test-ulps             |   4 -
>  sysdeps/arc/nofpu/libm-test-ulps           |   1 -
>  sysdeps/arm/libm-test-ulps                 |   4 -
>  sysdeps/hppa/fpu/libm-test-ulps            |   4 -
>  sysdeps/i386/fpu/libm-test-ulps            |   4 -
>  .../i386/i686/fpu/multiarch/libm-test-ulps |   4 -
>  sysdeps/ieee754/flt-32/s_asinpif.c         | 136 ++++++++++++++++++
>  sysdeps/loongarch/lp64/libm-test-ulps      |   4 -
>  sysdeps/mips/mips64/libm-test-ulps         |   4 -
>  sysdeps/or1k/fpu/libm-test-ulps            |   4 -
>  sysdeps/or1k/nofpu/libm-test-ulps          |   1 -
>  sysdeps/powerpc/fpu/libm-test-ulps         |   4 -
>  sysdeps/riscv/nofpu/libm-test-ulps         |   1 -
>  sysdeps/riscv/rvd/libm-test-ulps           |   4 -
>  sysdeps/s390/fpu/libm-test-ulps            |   4 -
>  sysdeps/sparc/fpu/libm-test-ulps           |   4 -
>  sysdeps/x86_64/fpu/libm-test-ulps          |   4 -
>  19 files changed, 140 insertions(+), 59 deletions(-)
>  create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
>
> diff --git a/SHARED-FILES b/SHARED-FILES
> index 3fde72644a..e700f4b155 100644
> --- a/SHARED-FILES
> +++ b/SHARED-FILES
> @@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
>    (src/binary32/acospi/acospif.c in CORE-MATH)
>    - the code was adapted to use glibc code style and internal
>      functions to handle errno, overflow, and underflow.
> +sysdeps/ieee754/flt-32/s_asinpif.c:
> +  (src/binary32/asinpi/asinpif.c in CORE-MATH)
> +  - the code was adapted to use glibc code style and internal
> +    functions to handle errno, overflow, and underflow.
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> index 1a403d95b6..abb0611ee5 100644
> --- a/sysdeps/aarch64/libm-test-ulps
> +++ b/sysdeps/aarch64/libm-test-ulps
> @@ -115,22 +115,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> index c0c5daa589..35aebba38a 100644
> --- a/sysdeps/arc/fpu/libm-test-ulps
> +++ b/sysdeps/arc/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> index 2b34f5a0ab..325546e582 100644
> --- a/sysdeps/arc/nofpu/libm-test-ulps
> +++ b/sysdeps/arc/nofpu/libm-test-ulps
> @@ -18,7 +18,6 @@ double: 2
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> index afb0532a66..0927fdb980 100644
> --- a/sysdeps/arm/libm-test-ulps
> +++ b/sysdeps/arm/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> index b9959c8a12..02cc3b5ddc 100644
> --- a/sysdeps/hppa/fpu/libm-test-ulps
> +++ b/sysdeps/hppa/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> index 85c58f34e9..69d0eb1eec 100644
> --- a/sysdeps/i386/fpu/libm-test-ulps
> +++ b/sysdeps/i386/fpu/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 2
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> index bc14e7e115..392d7d252c 100644
> --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 2
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>
> diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
> new file mode 100644
> index 0000000000..585dc3f06e
> --- /dev/null
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -0,0 +1,136 @@
> +/* Correctly-rounded half-revolution arc-sine function for binary32 value.
> +
> +Copyright (c) 2022-2025 Alexei Sibidanov.
> +
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> +
> +*/
> +
> +#include <errno.h>
> +#include <math.h>
> +#include <stdint.h>
> +#include <libm-alias-float.h>
> +#include "math_config.h"
> +
> +float
> +__asinpif (float x)
> +{
> +  float ax = fabsf (x);
> +  double az = ax;
> +  double z = x;
> +  uint32_t t = asuint (x);
> +  int32_t e = (t >> 23) & 0xff;
> +  if (__glibc_unlikely (e >= 127))
> +    {
> +      if (ax == 1.0f)
> +        return copysignf (0.5f, x);
> +      if (e == 0xff && (t << 9))
> +        return x + x; /* nan */
> +      return __math_edomf ((x - x) / (x - x)); /* nan */
> +    }
> +  int32_t s = 146 - e;
> +  int32_t i = 0;
> +  if (__glibc_likely (s < 32))
> +    i = ((t & (~0u >> 9)) | 1 << 23) >> s;
> +  static const double ch[][8] =
> +    {
> +      { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
> +        0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
> +        0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
> +      { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
> +        -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
> +        0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
> +      { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
> +        -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
> +        0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
> +      { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
> +        -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
> +        0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
> +      { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
> +        -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
> +        0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
> +      { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
> +        -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
> +        0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
> +      { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
> +        -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
> +        0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
> +      { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
> +        -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
> +        0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
> +      { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
> +        -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
> +        0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
> +      { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
> +        -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
> +        0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
> +      { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
> +        -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
> +        0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
> +      { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
> +        -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> +        0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> +      { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
> +        -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> +        0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> +      { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
> +        -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> +        0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> +      { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
> +        -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
> +        0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> +      { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
> +        -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> +        0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> +      }
> +    };
> +  const double *c = ch[i];
> +  double z2 = z * z;
> +  double z4 = z2 * z2;
> +  if (__glibc_unlikely (i == 0))
> +    {
> +      double c0 = c[0] + z2 * c[1];
> +      double c2 = c[2] + z2 * c[3];
> +      double c4 = c[4] + z2 * c[5];
> +      double c6 = c[6] + z2 * c[7];
> +      c0 += c2 * z4;
> +      c4 += c6 * z4;
> +      c0 += c4 * (z4 * z4);
> +      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> +        __set_errno (ERANGE);
> +      return z * c0;
> +    }
> +  else
> +    {
> +      double f = sqrt (1 - az);
> +      double c0 = fma (az, c[1], c[0]);
> +      double c2 = c[2] + az * c[3];
> +      double c4 = c[4] + az * c[5];
> +      double c6 = c[6] + az * c[7];
> +      c0 += c2 * z2;
> +      c4 += c6 * z2;
> +      c0 += c4 * z4;
> +      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> +      return r;
> +    }
> +}
> +libm_alias_float (__asinpi, asinpi)
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index ce84ddf1e6..33dd6718ba 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 67c37dfd5e..869ceff928 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index d3b1036d29..75db236e09 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 14b7e0f3f9..a1f7c80097 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -54,7 +54,6 @@ double: 3
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index c9c86de147..fa3cf2e844 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -107,25 +107,21 @@ ldouble: 7
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 4
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 4
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 4
>
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 6206a9531a..a5184ecad9 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -71,7 +71,6 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 124ca4b719..3bfc9668d5 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 364ccf3326..7d61bf1cef 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 1174972002..426f45893e 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>
>  Function: "atan":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 5ed5112b49..d4c4bfa42b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -180,25 +180,21 @@ float: 1
>
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>
> --
> 2.43.0
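Reading the two branches of __asinpif in the patch above, the approximation appears to be split as follows. This is a summary inferred from the code, not a description taken from the CORE-MATH sources; c_{i,k} stands for ch[i][k] and a = |x|. Row 0 is used whenever |x| < 1/16, and its leading coefficient 0x1.45f306dc9c882p-2 is 1/π; rows i = ⌊16a⌋ ≥ 1 hold a degree-7 polynomial in a that is then scaled by √(1 − a).

```latex
\operatorname{asinpi}(x) = \frac{\arcsin x}{\pi} \approx
\begin{cases}
  x \displaystyle\sum_{k=0}^{7} c_{0,k}\, x^{2k}, & |x| < 2^{-4}, \\[6pt]
  \operatorname{sign}(x) \Bigl( \tfrac{1}{2} - \sqrt{1-a}\, \displaystyle\sum_{k=0}^{7} c_{i,k}\, a^{k} \Bigr),
    & a = |x| \in [2^{-4}, 1),\ i = \lfloor 16a \rfloor,
\end{cases}
\qquad
\sum_{k=0}^{7} c_{i,k}\, a^{k} \approx g(a) = \frac{\arccos a}{\pi \sqrt{1-a}} .
```

The second form rests on arcsin a = π/2 − arccos a: g stays bounded on [0, 1) (g(0) = 1/2 and g(a) → √2/π ≈ 0.450 as a → 1) even though arcsin itself has an unbounded derivative at a = 1. The final fma (-c0, copysign (f, x), copysign (0.5, x)) evaluates the sign-adjusted difference in double precision before the result is rounded to float on return.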
I suggest the following change, which should improve performance on x86_64/x86_64-v2:

--- a/sysdeps/ieee754/flt-32/s_asinpif.c
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -122,7 +122,7 @@ __asinpif (float x)
   else
     {
       double f = sqrt (1 - az);
-      double c0 = fma (az, c[1], c[0]);
+      double c0 = c[0] + az * c[1];
       double c2 = c[2] + az * c[3];
       double c4 = c[4] + az * c[5];
       double c6 = c[6] + az * c[7];

Moreover, "fast patch" should be "fast path".

Paul
On 03/02/25 03:40, Paul Zimmermann wrote:
> I suggest the following change which should improve performance on x86_64/x86_64-v2:
>
> --- a/sysdeps/ieee754/flt-32/s_asinpif.c
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -122,7 +122,7 @@ __asinpif (float x)
>    else
>      {
>        double f = sqrt (1 - az);
> -      double c0 = fma (az, c[1], c[0]);
> +      double c0 = c[0] + az * c[1];
>        double c2 = c[2] + az * c[3];
>        double c4 = c[4] + az * c[5];
>        double c6 = c[6] + az * c[7];
>
> Moreover "fast patch" should be fast path.

Thanks. This lets us drop the last patch in the series, which adds an
asinpif ifunc variant for x86_64-v3, because the new code is now faster
on all x86_64 ABIs.

>
> Paul
>
>> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>> Cc: DJ Delorie <dj@redhat.com>,
>>  Joseph Myers <josmyers@redhat.com>,
>>  Paul Zimmermann <Paul.Zimmermann@inria.fr>,
>>  Alexei Sibidanov <sibid@uvic.ca>
>> Date: Fri, 31 Jan 2025 16:17:13 -0300
>>
>> The CORE-MATH implementation is correctly rounded (for any rounding mode)
>> and shows better performance than the generic asinpif.
>>
>> The code was adapted to glibc style and to use the definition of
>> math_config.h (to handle errno, overflow, and underflow).
diff --git a/SHARED-FILES b/SHARED-FILES
index 3fde72644a..e700f4b155 100644
--- a/SHARED-FILES
+++ b/SHARED-FILES
@@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
   (src/binary32/acospi/acospif.c in CORE-MATH)
   - the code was adapted to use glibc code style and internal
     functions to handle errno, overflow, and underflow.
+sysdeps/ieee754/flt-32/s_asinpif.c:
+  (src/binary32/asinpi/asinpif.c in CORE-MATH)
+  - the code was adapted to use glibc code style and internal
+    functions to handle errno, overflow, and underflow.
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 1a403d95b6..abb0611ee5 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -115,22 +115,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
index c0c5daa589..35aebba38a 100644
--- a/sysdeps/arc/fpu/libm-test-ulps
+++ b/sysdeps/arc/fpu/libm-test-ulps
@@ -63,19 +63,15 @@ double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
index 2b34f5a0ab..325546e582 100644
--- a/sysdeps/arc/nofpu/libm-test-ulps
+++ b/sysdeps/arc/nofpu/libm-test-ulps
@@ -18,7 +18,6 @@ double: 2
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
index afb0532a66..0927fdb980 100644
--- a/sysdeps/arm/libm-test-ulps
+++ b/sysdeps/arm/libm-test-ulps
@@ -63,19 +63,15 @@ double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
index b9959c8a12..02cc3b5ddc 100644
--- a/sysdeps/hppa/fpu/libm-test-ulps
+++ b/sysdeps/hppa/fpu/libm-test-ulps
@@ -63,19 +63,15 @@ double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
index 85c58f34e9..69d0eb1eec 100644
--- a/sysdeps/i386/fpu/libm-test-ulps
+++ b/sysdeps/i386/fpu/libm-test-ulps
@@ -101,25 +101,21 @@ ldouble: 5
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 2
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2
 
diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
index bc14e7e115..392d7d252c 100644
--- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
+++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
@@ -101,25 +101,21 @@ ldouble: 5
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 2
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2
 
diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
new file mode 100644
index 0000000000..585dc3f06e
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -0,0 +1,136 @@
+/* Correctly-rounded half-revolution arc-sine function for binary32 value.
+
+Copyright (c) 2022-2025 Alexei Sibidanov.
+
+The original version of this file was copied from the CORE-MATH
+project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+*/
+
+#include <errno.h>
+#include <math.h>
+#include <stdint.h>
+#include <libm-alias-float.h>
+#include "math_config.h"
+
+float
+__asinpif (float x)
+{
+  float ax = fabsf (x);
+  double az = ax;
+  double z = x;
+  uint32_t t = asuint (x);
+  int32_t e = (t >> 23) & 0xff;
+  if (__glibc_unlikely (e >= 127))
+    {
+      if (ax == 1.0f)
+        return copysignf (0.5f, x);
+      if (e == 0xff && (t << 9))
+        return x + x; /* nan */
+      return __math_edomf ((x - x) / (x - x)); /* nan */
+    }
+  int32_t s = 146 - e;
+  int32_t i = 0;
+  if (__glibc_likely (s < 32))
+    i = ((t & (~0u >> 9)) | 1 << 23) >> s;
+  static const double ch[][8] =
+  {
+    { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
+      0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
+      0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
+    { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
+      -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
+      0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
+    { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
+      -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
+      0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
+    { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
+      -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
+      0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
+    { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
+      -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
+      0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
+    { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
+      -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
+      0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
+    { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
+      -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
+      0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
+    { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
+      -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
+      0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
+    { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
+      -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
+      0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
+    { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
+      -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
+      0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
+    { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
+      -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
+      0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
+    { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
+      -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
+      0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
+    { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
+      -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
+      0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
+    { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
+      -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
+      0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
+    { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
+      -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
+      0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
+    { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
+      -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
+      0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
+    }
+  };
+  const double *c = ch[i];
+  double z2 = z * z;
+  double z4 = z2 * z2;
+  if (__glibc_unlikely (i == 0))
+    {
+      double c0 = c[0] + z2 * c[1];
+      double c2 = c[2] + z2 * c[3];
+      double c4 = c[4] + z2 * c[5];
+      double c6 = c[6] + z2 * c[7];
+      c0 += c2 * z4;
+      c4 += c6 * z4;
+      c0 += c4 * (z4 * z4);
+      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
+        __set_errno (ERANGE);
+      return z * c0;
+    }
+  else
+    {
+      double f = sqrt (1 - az);
+      double c0 = fma (az, c[1], c[0]);
+      double c2 = c[2] + az * c[3];
+      double c4 = c[4] + az * c[5];
+      double c6 = c[6] + az * c[7];
+      c0 += c2 * z2;
+      c4 += c6 * z2;
+      c0 += c4 * z4;
+      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
+      return r;
+    }
+}
+libm_alias_float (__asinpi, asinpi)
diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
index ce84ddf1e6..33dd6718ba 100644
--- a/sysdeps/loongarch/lp64/libm-test-ulps
+++ b/sysdeps/loongarch/lp64/libm-test-ulps
@@ -83,22 +83,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
index 67c37dfd5e..869ceff928 100644
--- a/sysdeps/mips/mips64/libm-test-ulps
+++ b/sysdeps/mips/mips64/libm-test-ulps
@@ -83,22 +83,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
index d3b1036d29..75db236e09 100644
--- a/sysdeps/or1k/fpu/libm-test-ulps
+++ b/sysdeps/or1k/fpu/libm-test-ulps
@@ -63,19 +63,15 @@ double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
index 14b7e0f3f9..a1f7c80097 100644
--- a/sysdeps/or1k/nofpu/libm-test-ulps
+++ b/sysdeps/or1k/nofpu/libm-test-ulps
@@ -54,7 +54,6 @@ double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index c9c86de147..fa3cf2e844 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -107,25 +107,21 @@ ldouble: 7
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 float128: 2
 ldouble: 4
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 4
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 4
 
diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
index 6206a9531a..a5184ecad9 100644
--- a/sysdeps/riscv/nofpu/libm-test-ulps
+++ b/sysdeps/riscv/nofpu/libm-test-ulps
@@ -71,7 +71,6 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
index 124ca4b719..3bfc9668d5 100644
--- a/sysdeps/riscv/rvd/libm-test-ulps
+++ b/sysdeps/riscv/rvd/libm-test-ulps
@@ -83,22 +83,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
index 364ccf3326..7d61bf1cef 100644
--- a/sysdeps/s390/fpu/libm-test-ulps
+++ b/sysdeps/s390/fpu/libm-test-ulps
@@ -83,22 +83,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
index 1174972002..426f45893e 100644
--- a/sysdeps/sparc/fpu/libm-test-ulps
+++ b/sysdeps/sparc/fpu/libm-test-ulps
@@ -83,22 +83,18 @@ ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 5ed5112b49..d4c4bfa42b 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -180,25 +180,21 @@ float: 1
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2