[09/15] math: Use asinpif from CORE-MATH

Message ID: 20250131191844.2582716-10-adhemerval.zanella@linaro.org
State: New
Series: Add c23 CORE-MATH binary32 implementations to libm

Commit Message

Adhemerval Zanella Netto Jan. 31, 2025, 7:17 p.m. UTC
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
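
As an illustration of the errno handling mentioned above, here is a minimal
sketch of how inputs outside (-1, 1) could be sorted into cases. The names
asinpi_class, classify_input and special_result are made up for this note.
The patch itself relies on the glibc-internal asuint and __math_edomf helpers
from math_config.h; they are replaced here by memcpy and a plain errno
assignment (an assumption) so that the snippet builds outside glibc.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the four input classes.  */
enum asinpi_class
{
  ASINPI_REGULAR,       /* |x| < 1: handled by the polynomial path */
  ASINPI_EXACT_HALF,    /* |x| == 1: result is exactly +-0.5 */
  ASINPI_NAN_INPUT,     /* NaN in, NaN out, errno untouched */
  ASINPI_DOMAIN_ERROR   /* |x| > 1, infinities included: EDOM */
};

static enum asinpi_class
classify_input (float x)
{
  uint32_t t;
  memcpy (&t, &x, sizeof t);          /* stand-in for glibc's asuint */
  uint32_t e = (t >> 23) & 0xff;      /* biased exponent */
  if (e < 127)
    return ASINPI_REGULAR;
  if (fabsf (x) == 1.0f)
    return ASINPI_EXACT_HALF;
  if (e == 0xff && (t << 9) != 0)     /* all-ones exponent, nonzero fraction */
    return ASINPI_NAN_INPUT;
  return ASINPI_DOMAIN_ERROR;
}

/* Result for every class except ASINPI_REGULAR.  */
static float
special_result (float x)
{
  enum asinpi_class c = classify_input (x);
  assert (c != ASINPI_REGULAR);
  if (c == ASINPI_EXACT_HALF)
    return copysignf (0.5f, x);
  if (c == ASINPI_NAN_INPUT)
    return x + x;                     /* propagate the NaN */
  errno = EDOM;                       /* stand-in for __math_edomf */
  return (x - x) / (x - x);           /* produces a NaN */
}
```

Testing the biased exponent first keeps the common |x| < 1 case down to a
single integer comparison before any of the rarer cases are examined.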

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 46.4996        51.0456        -9.78%
x86_64v2               46.7551        52.1317       -11.50%
x86_64v3               42.6235        34.8162        18.32%
aarch64 (Neoverse)     17.4161        14.3604        17.55%
power8                 10.7347         9.0193        15.98%
power10                10.6420         9.0362        15.09%

reciprocal-throughput   master        patched   improvement
x86_64                 24.7208        29.0812       -17.64%
x86_64v2               24.2177        29.7166       -22.71%
x86_64v3               20.5617        12.3679        39.85%
aarch64 (Neoverse)     13.4827        7.17613        46.78%
power8                 6.46134        3.56089        44.89%
power10                5.79007        3.49544        39.63%
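
For reference, the improvement column in both tables is consistent with
(master - patched) / master, expressed in percent, so a negative value means
the patched build is slower. A small sketch of that arithmetic follows; the
helper name improvement_pct is made up for this note and is not part of the
benchtests.

```c
#include <assert.h>

/* Relative improvement in percent: positive when the patched timing
   is lower than the one measured on master.  */
static double
improvement_pct (double master, double patched)
{
  assert (master > 0.0);
  return (master - patched) / master * 100.0;
}
```

For example, improvement_pct (46.4996, 51.0456) gives about -9.78, which
matches the x86_64 latency row, and improvement_pct (20.5617, 12.3679) gives
about 39.85, which matches the x86_64v3 reciprocal-throughput row.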

x86_64/x86_64-v2 shows slower performance due to the use of an fma
operation in the fast patch; only x86_64-v3 provides it without a
function call.
---
 SHARED-FILES                                  |   4 +
 sysdeps/aarch64/libm-test-ulps                |   4 -
 sysdeps/arc/fpu/libm-test-ulps                |   4 -
 sysdeps/arc/nofpu/libm-test-ulps              |   1 -
 sysdeps/arm/libm-test-ulps                    |   4 -
 sysdeps/hppa/fpu/libm-test-ulps               |   4 -
 sysdeps/i386/fpu/libm-test-ulps               |   4 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   4 -
 sysdeps/ieee754/flt-32/s_asinpif.c            | 136 ++++++++++++++++++
 sysdeps/loongarch/lp64/libm-test-ulps         |   4 -
 sysdeps/mips/mips64/libm-test-ulps            |   4 -
 sysdeps/or1k/fpu/libm-test-ulps               |   4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   1 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   4 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   1 -
 sysdeps/riscv/rvd/libm-test-ulps              |   4 -
 sysdeps/s390/fpu/libm-test-ulps               |   4 -
 sysdeps/sparc/fpu/libm-test-ulps              |   4 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   4 -
 19 files changed, 140 insertions(+), 59 deletions(-)
 create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c

Comments

Paul Zimmermann Feb. 1, 2025, 8:28 a.m. UTC | #1
I confirm we get correct rounding for all rounding modes and all binary32 inputs
on x86_64.

Paul

> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
> 	Joseph Myers <josmyers@redhat.com>,
> 	Paul Zimmermann <Paul.Zimmermann@inria.fr>,
> 	Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:13 -0300
> 
> diff --git a/SHARED-FILES b/SHARED-FILES
> index 3fde72644a..e700f4b155 100644
> --- a/SHARED-FILES
> +++ b/SHARED-FILES
> @@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
>    (src/binary32/acospi/acospif.c in CORE-MATH)
>    - the code was adapted to use glibc code style and internal
>      functions to handle errno, overflow, and underflow.
> +sysdeps/ieee754/flt-32/s_asinpif.c:
> +  (src/binary32/asinpi/asinpif.c in CORE-MATH)
> +  - the code was adapted to use glibc code style and internal
> +    functions to handle errno, overflow, and underflow.
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> index 1a403d95b6..abb0611ee5 100644
> --- a/sysdeps/aarch64/libm-test-ulps
> +++ b/sysdeps/aarch64/libm-test-ulps
> @@ -115,22 +115,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> index c0c5daa589..35aebba38a 100644
> --- a/sysdeps/arc/fpu/libm-test-ulps
> +++ b/sysdeps/arc/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> index 2b34f5a0ab..325546e582 100644
> --- a/sysdeps/arc/nofpu/libm-test-ulps
> +++ b/sysdeps/arc/nofpu/libm-test-ulps
> @@ -18,7 +18,6 @@ double: 2
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> index afb0532a66..0927fdb980 100644
> --- a/sysdeps/arm/libm-test-ulps
> +++ b/sysdeps/arm/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> index b9959c8a12..02cc3b5ddc 100644
> --- a/sysdeps/hppa/fpu/libm-test-ulps
> +++ b/sysdeps/hppa/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> index 85c58f34e9..69d0eb1eec 100644
> --- a/sysdeps/i386/fpu/libm-test-ulps
> +++ b/sysdeps/i386/fpu/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 2
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>  
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> index bc14e7e115..392d7d252c 100644
> --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 2
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>  
> diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
> new file mode 100644
> index 0000000000..585dc3f06e
> --- /dev/null
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -0,0 +1,136 @@
> +/* Correctly-rounded half-revolution arc-sine function for binary32 value.
> +
> +Copyright (c) 2022-2025 Alexei Sibidanov.
> +
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> +
> +*/
> +
> +#include <errno.h>
> +#include <math.h>
> +#include <stdint.h>
> +#include <libm-alias-float.h>
> +#include "math_config.h"
> +
> +float
> +__asinpif (float x)
> +{
> +  float ax = fabsf (x);
> +  double az = ax;
> +  double z = x;
> +  uint32_t t = asuint (x);
> +  int32_t e = (t >> 23) & 0xff;
> +  if (__glibc_unlikely (e >= 127))
> +    {
> +      if (ax == 1.0f)
> +	return copysignf (0.5f, x);
> +      if (e == 0xff && (t << 9))
> +	return x + x; /* nan */
> +      return __math_edomf ((x - x) / (x - x)); /* nan */
> +    }
> +  int32_t s = 146 - e;
> +  int32_t i = 0;
> +  if (__glibc_likely (s < 32))
> +    i = ((t & (~0u >> 9)) | 1 << 23) >> s;
> +  static const double ch[][8] =
> +    {
> +      {  0x1.45f306dc9c882p-2,   0x1.b2995e7b7dc2fp-5,  0x1.8723a1cf50c7ep-6,
> +	 0x1.d1a4591d16a29p-7,   0x1.3ce3aa68ddaeep-7,  0x1.d3182ab0cc1bfp-8,
> +	 0x1.62b379a8b88e3p-8,   0x1.6811411fcfec2p-8 },
> +      {  0x1.ffffffffd3cdap-2,  -0x1.17cc1b3355fdcp-4,  0x1.d067a1e8d5a99p-6,
> +	-0x1.08e16fb09314ap-6,   0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
> +	 0x1.5dab64e2dcf15p-8,  -0x1.59270e30797acp-9 },
> +      {  0x1.fffffff7c4617p-2,  -0x1.17cc149ded3a2p-4,  0x1.d0654d4cb2c1ap-6,
> +	-0x1.08c3ba713d33ap-6,   0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
> +	 0x1.303baca167dddp-8,  -0x1.dee8d16d06b38p-10 },
> +      {  0x1.ffffffa749848p-2,  -0x1.17cbe7155935p-4,   0x1.d05a312269adfp-6,
> +	-0x1.0862b3ee617d7p-6,   0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
> +	 0x1.02b82478f95d7p-8,  -0x1.52a7b8579e729p-10 },
> +      {  0x1.fffffe1f92bb5p-2,  -0x1.17cb3e74c64e3p-4,  0x1.d03af67311cbfp-6,
> +	-0x1.079441cbfc7ap-6,    0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
> +	 0x1.b2f1210d9701bp-9,  -0x1.e740ddc25afd6p-11 },
> +      {  0x1.fffff92beb6e2p-2,  -0x1.17c986fe9518bp-4,  0x1.cff98167c9a5ep-6,
> +	-0x1.0638b591eae52p-6,   0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
> +	 0x1.6b9a7ba05dfcep-9,  -0x1.640521a43b2dp-11 },
> +      {  0x1.ffffeccee5bfcp-2,  -0x1.17c5f1753f5eap-4,  0x1.cf874e4fe258fp-6,
> +	-0x1.043e6cf77b256p-6,   0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
> +	 0x1.2f6543162bc61p-9,  -0x1.07d5da05822b6p-11 },
> +      {  0x1.ffffd2f64431dp-2,  -0x1.17bf8208c10c1p-4,  0x1.ced7487cdb124p-6,
> +	-0x1.01a0d30932905p-6,   0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
> +	 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
> +      {  0x1.ffffa36d1712ep-2,  -0x1.17b523971bd4ep-4,  0x1.cddee26de2deep-6,
> +	-0x1.fccb00abaaabcp-7,   0x1.269afc3622342p-7, -0x1.2933152686752p-8,
> +	 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
> +      {  0x1.ffff5402ab3a1p-2,  -0x1.17a5ba85da77ap-4,  0x1.cc96894e05c02p-6,
> +	-0x1.f532143cb832ep-7,   0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
> +	 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
> +      {  0x1.fffed8d639751p-2,  -0x1.1790349f3ae76p-4,  0x1.caf9a4fd1b398p-6,
> +	-0x1.ec986b111342ep-7,   0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
> +	 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
> +      {  0x1.fffe24b714161p-2,  -0x1.177394fbcb719p-4,  0x1.c90652d920ebdp-6,
> +	-0x1.e3239197bddf1p-7,   0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> +	 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> +      {  0x1.fffd298bec9e2p-2,  -0x1.174efbfd34648p-4,  0x1.c6bcfe48ea92bp-6,
> +	-0x1.d8f9f2a16157cp-7,   0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> +	 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> +      {  0x1.fffbd8b784c4dp-2,  -0x1.1721abdd3722ep-4,  0x1.c41fee756d4bp-6,
> +	-0x1.ce40bccf8065fp-7,   0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> +	 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> +      {  0x1.fffa23749cf88p-2,  -0x1.16eb0a8285c06p-4,  0x1.c132d762e1b0dp-6,
> +	-0x1.c31a959398f4ep-7,   0x1.ac1c5b46bc8ap-8,  -0x1.3e34f1abe51dcp-9,
> +	 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> +      {  0x1.fff7fb25bb407p-2,  -0x1.16aaa14d7564p-4,   0x1.bdfa75fca5ff2p-6,
> +	-0x1.b7a6e260d079cp-7,   0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> +	 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> +      }
> +  };
> +  const double *c = ch[i];
> +  double z2 = z * z;
> +  double z4 = z2 * z2;
> +  if (__glibc_unlikely (i == 0))
> +    {
> +      double c0 = c[0] + z2 * c[1];
> +      double c2 = c[2] + z2 * c[3];
> +      double c4 = c[4] + z2 * c[5];
> +      double c6 = c[6] + z2 * c[7];
> +      c0 += c2 * z4;
> +      c4 += c6 * z4;
> +      c0 += c4 * (z4 * z4);
> +      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> +	__set_errno (ERANGE);
> +      return z * c0;
> +    }
> +  else
> +    {
> +      double f = sqrt (1 - az);
> +      double c0 = fma (az, c[1], c[0]);
> +      double c2 = c[2] + az * c[3];
> +      double c4 = c[4] + az * c[5];
> +      double c6 = c[6] + az * c[7];
> +      c0 += c2 * z2;
> +      c4 += c6 * z2;
> +      c0 += c4 * z4;
> +      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> +      return r;
> +    }
> +}
> +libm_alias_float (__asinpi, asinpi)
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index ce84ddf1e6..33dd6718ba 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 67c37dfd5e..869ceff928 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index d3b1036d29..75db236e09 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 14b7e0f3f9..a1f7c80097 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -54,7 +54,6 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index c9c86de147..fa3cf2e844 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -107,25 +107,21 @@ ldouble: 7
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 4
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 4
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 4
>  
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 6206a9531a..a5184ecad9 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -71,7 +71,6 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 124ca4b719..3bfc9668d5 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 364ccf3326..7d61bf1cef 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 1174972002..426f45893e 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 5ed5112b49..d4c4bfa42b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -180,25 +180,21 @@ float: 1
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>  
> -- 
> 2.43.0
> 
>
Paul Zimmermann Feb. 3, 2025, 6:40 a.m. UTC | #2
I suggest the following change, which should improve performance on x86_64/x86_64-v2:

--- a/sysdeps/ieee754/flt-32/s_asinpif.c
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -122,7 +122,7 @@ __asinpif (float x)
   else
     {
       double f = sqrt (1 - az);
-      double c0 = fma (az, c[1], c[0]);
+      double c0 = c[0] + az * c[1];
       double c2 = c[2] + az * c[3];
       double c4 = c[4] + az * c[5];
       double c6 = c[6] + az * c[7];

Moreover, "fast patch" in the commit message should read "fast path".

Paul

> +	-0x1.e3239197bddf1p-7,   0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> +	 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> +      {  0x1.fffd298bec9e2p-2,  -0x1.174efbfd34648p-4,  0x1.c6bcfe48ea92bp-6,
> +	-0x1.d8f9f2a16157cp-7,   0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> +	 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> +      {  0x1.fffbd8b784c4dp-2,  -0x1.1721abdd3722ep-4,  0x1.c41fee756d4bp-6,
> +	-0x1.ce40bccf8065fp-7,   0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> +	 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> +      {  0x1.fffa23749cf88p-2,  -0x1.16eb0a8285c06p-4,  0x1.c132d762e1b0dp-6,
> +	-0x1.c31a959398f4ep-7,   0x1.ac1c5b46bc8ap-8,  -0x1.3e34f1abe51dcp-9,
> +	 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> +      {  0x1.fff7fb25bb407p-2,  -0x1.16aaa14d7564p-4,   0x1.bdfa75fca5ff2p-6,
> +	-0x1.b7a6e260d079cp-7,   0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> +	 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> +      }
> +  };
> +  const double *c = ch[i];
> +  double z2 = z * z;
> +  double z4 = z2 * z2;
> +  if (__glibc_unlikely (i == 0))
> +    {
> +      double c0 = c[0] + z2 * c[1];
> +      double c2 = c[2] + z2 * c[3];
> +      double c4 = c[4] + z2 * c[5];
> +      double c6 = c[6] + z2 * c[7];
> +      c0 += c2 * z4;
> +      c4 += c6 * z4;
> +      c0 += c4 * (z4 * z4);
> +      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> +	__set_errno (ERANGE);
> +      return z * c0;
> +    }
> +  else
> +    {
> +      double f = sqrt (1 - az);
> +      double c0 = fma (az, c[1], c[0]);
> +      double c2 = c[2] + az * c[3];
> +      double c4 = c[4] + az * c[5];
> +      double c6 = c[6] + az * c[7];
> +      c0 += c2 * z2;
> +      c4 += c6 * z2;
> +      c0 += c4 * z4;
> +      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> +      return r;
> +    }
> +}
> +libm_alias_float (__asinpi, asinpi)
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index ce84ddf1e6..33dd6718ba 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 67c37dfd5e..869ceff928 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index d3b1036d29..75db236e09 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 14b7e0f3f9..a1f7c80097 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -54,7 +54,6 @@ double: 3
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  
>  Function: "atan":
>  double: 1
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index c9c86de147..fa3cf2e844 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -107,25 +107,21 @@ ldouble: 7
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 4
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 4
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 4
>  
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 6206a9531a..a5184ecad9 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -71,7 +71,6 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 124ca4b719..3bfc9668d5 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 364ccf3326..7d61bf1cef 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 1174972002..426f45893e 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  ldouble: 1
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  ldouble: 2
>  
>  Function: "atan":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 5ed5112b49..d4c4bfa42b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -180,25 +180,21 @@ float: 1
>  
>  Function: "asinpi":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_downward":
>  double: 1
> -float: 1
>  float128: 2
>  ldouble: 2
>  
>  Function: "asinpi_towardzero":
>  double: 1
> -float: 2
>  float128: 1
>  ldouble: 2
>  
>  Function: "asinpi_upward":
>  double: 2
> -float: 2
>  float128: 2
>  ldouble: 2
>  
> -- 
> 2.43.0
> 
>
Adhemerval Zanella Netto Feb. 3, 2025, 12:27 p.m. UTC | #3
On 03/02/25 03:40, Paul Zimmermann wrote:
> I suggest the following change which should improve performance on x86_64/x86_64-v2:
> 
> --- a/sysdeps/ieee754/flt-32/s_asinpif.c
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -122,7 +122,7 @@ __asinpif (float x)
>    else
>      {
>        double f = sqrt (1 - az);
> -      double c0 = fma (az, c[1], c[0]);
> +      double c0 = c[0] + az * c[1];
>        double c2 = c[2] + az * c[3];
>        double c4 = c[4] + az * c[5];
>        double c6 = c[6] + az * c[7];
> 
> Moreover, "fast patch" should be "fast path".

Thanks, this allows dropping the last patch, which adds an asinpif ifunc variant
for x86_64-v3, since performance now improves on all x86_64 ABIs.

> 
> Paul
> 

Patch

diff --git a/SHARED-FILES b/SHARED-FILES
index 3fde72644a..e700f4b155 100644
--- a/SHARED-FILES
+++ b/SHARED-FILES
@@ -338,3 +338,7 @@  sysdeps/ieee754/flt-32/s_acospif.c:
   (src/binary32/acospi/acospif.c in CORE-MATH)
   - the code was adapted to use glibc code style and internal
     functions to handle errno, overflow, and underflow.
+sysdeps/ieee754/flt-32/s_asinpif.c:
+  (src/binary32/asinpi/asinpif.c in CORE-MATH)
+  - the code was adapted to use glibc code style and internal
+    functions to handle errno, overflow, and underflow.
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 1a403d95b6..abb0611ee5 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -115,22 +115,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
index c0c5daa589..35aebba38a 100644
--- a/sysdeps/arc/fpu/libm-test-ulps
+++ b/sysdeps/arc/fpu/libm-test-ulps
@@ -63,19 +63,15 @@  double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
index 2b34f5a0ab..325546e582 100644
--- a/sysdeps/arc/nofpu/libm-test-ulps
+++ b/sysdeps/arc/nofpu/libm-test-ulps
@@ -18,7 +18,6 @@  double: 2
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
index afb0532a66..0927fdb980 100644
--- a/sysdeps/arm/libm-test-ulps
+++ b/sysdeps/arm/libm-test-ulps
@@ -63,19 +63,15 @@  double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
index b9959c8a12..02cc3b5ddc 100644
--- a/sysdeps/hppa/fpu/libm-test-ulps
+++ b/sysdeps/hppa/fpu/libm-test-ulps
@@ -63,19 +63,15 @@  double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
index 85c58f34e9..69d0eb1eec 100644
--- a/sysdeps/i386/fpu/libm-test-ulps
+++ b/sysdeps/i386/fpu/libm-test-ulps
@@ -101,25 +101,21 @@  ldouble: 5
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 2
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2
 
diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
index bc14e7e115..392d7d252c 100644
--- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
+++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
@@ -101,25 +101,21 @@  ldouble: 5
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 2
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2
 
diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
new file mode 100644
index 0000000000..585dc3f06e
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -0,0 +1,136 @@ 
+/* Correctly-rounded half-revolution arc-sine function for binary32 value.
+
+Copyright (c) 2022-2025 Alexei Sibidanov.
+
+The original version of this file was copied from the CORE-MATH
+project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+*/
+
+#include <errno.h>
+#include <math.h>
+#include <stdint.h>
+#include <libm-alias-float.h>
+#include "math_config.h"
+
+float
+__asinpif (float x)
+{
+  float ax = fabsf (x);
+  double az = ax;
+  double z = x;
+  uint32_t t = asuint (x);
+  int32_t e = (t >> 23) & 0xff;
+  if (__glibc_unlikely (e >= 127))
+    {
+      if (ax == 1.0f)
+	return copysignf (0.5f, x);
+      if (e == 0xff && (t << 9))
+	return x + x; /* NaN input: propagate it (quiets a signaling NaN).  */
+      return __math_edomf ((x - x) / (x - x)); /* |x| > 1: EDOM, NaN.  */
+    }
+  int32_t s = 146 - e;
+  int32_t i = 0;
+  if (__glibc_likely (s < 32))
+    i = ((t & (~0u >> 9)) | 1 << 23) >> s; /* i = floor (16 * |x|).  */
+  static const double ch[][8] =
+    {
+      {  0x1.45f306dc9c882p-2,   0x1.b2995e7b7dc2fp-5,  0x1.8723a1cf50c7ep-6,
+	 0x1.d1a4591d16a29p-7,   0x1.3ce3aa68ddaeep-7,  0x1.d3182ab0cc1bfp-8,
+	 0x1.62b379a8b88e3p-8,   0x1.6811411fcfec2p-8 },
+      {  0x1.ffffffffd3cdap-2,  -0x1.17cc1b3355fdcp-4,  0x1.d067a1e8d5a99p-6,
+	-0x1.08e16fb09314ap-6,   0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
+	 0x1.5dab64e2dcf15p-8,  -0x1.59270e30797acp-9 },
+      {  0x1.fffffff7c4617p-2,  -0x1.17cc149ded3a2p-4,  0x1.d0654d4cb2c1ap-6,
+	-0x1.08c3ba713d33ap-6,   0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
+	 0x1.303baca167dddp-8,  -0x1.dee8d16d06b38p-10 },
+      {  0x1.ffffffa749848p-2,  -0x1.17cbe7155935p-4,   0x1.d05a312269adfp-6,
+	-0x1.0862b3ee617d7p-6,   0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
+	 0x1.02b82478f95d7p-8,  -0x1.52a7b8579e729p-10 },
+      {  0x1.fffffe1f92bb5p-2,  -0x1.17cb3e74c64e3p-4,  0x1.d03af67311cbfp-6,
+	-0x1.079441cbfc7ap-6,    0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
+	 0x1.b2f1210d9701bp-9,  -0x1.e740ddc25afd6p-11 },
+      {  0x1.fffff92beb6e2p-2,  -0x1.17c986fe9518bp-4,  0x1.cff98167c9a5ep-6,
+	-0x1.0638b591eae52p-6,   0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
+	 0x1.6b9a7ba05dfcep-9,  -0x1.640521a43b2dp-11 },
+      {  0x1.ffffeccee5bfcp-2,  -0x1.17c5f1753f5eap-4,  0x1.cf874e4fe258fp-6,
+	-0x1.043e6cf77b256p-6,   0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
+	 0x1.2f6543162bc61p-9,  -0x1.07d5da05822b6p-11 },
+      {  0x1.ffffd2f64431dp-2,  -0x1.17bf8208c10c1p-4,  0x1.ced7487cdb124p-6,
+	-0x1.01a0d30932905p-6,   0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
+	 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
+      {  0x1.ffffa36d1712ep-2,  -0x1.17b523971bd4ep-4,  0x1.cddee26de2deep-6,
+	-0x1.fccb00abaaabcp-7,   0x1.269afc3622342p-7, -0x1.2933152686752p-8,
+	 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
+      {  0x1.ffff5402ab3a1p-2,  -0x1.17a5ba85da77ap-4,  0x1.cc96894e05c02p-6,
+	-0x1.f532143cb832ep-7,   0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
+	 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
+      {  0x1.fffed8d639751p-2,  -0x1.1790349f3ae76p-4,  0x1.caf9a4fd1b398p-6,
+	-0x1.ec986b111342ep-7,   0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
+	 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
+      {  0x1.fffe24b714161p-2,  -0x1.177394fbcb719p-4,  0x1.c90652d920ebdp-6,
+	-0x1.e3239197bddf1p-7,   0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
+	 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
+      {  0x1.fffd298bec9e2p-2,  -0x1.174efbfd34648p-4,  0x1.c6bcfe48ea92bp-6,
+	-0x1.d8f9f2a16157cp-7,   0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
+	 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
+      {  0x1.fffbd8b784c4dp-2,  -0x1.1721abdd3722ep-4,  0x1.c41fee756d4bp-6,
+	-0x1.ce40bccf8065fp-7,   0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
+	 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
+      {  0x1.fffa23749cf88p-2,  -0x1.16eb0a8285c06p-4,  0x1.c132d762e1b0dp-6,
+	-0x1.c31a959398f4ep-7,   0x1.ac1c5b46bc8ap-8,  -0x1.3e34f1abe51dcp-9,
+	 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
+      {  0x1.fff7fb25bb407p-2,  -0x1.16aaa14d7564p-4,   0x1.bdfa75fca5ff2p-6,
+	-0x1.b7a6e260d079cp-7,   0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
+	 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
+      }
+  };
+  const double *c = ch[i];
+  double z2 = z * z;
+  double z4 = z2 * z2;
+  if (__glibc_unlikely (i == 0))
+    {
+      double c0 = c[0] + z2 * c[1];
+      double c2 = c[2] + z2 * c[3];
+      double c4 = c[4] + z2 * c[5];
+      double c6 = c[6] + z2 * c[7];
+      c0 += c2 * z4;
+      c4 += c6 * z4;
+      c0 += c4 * (z4 * z4);
+      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
+	__set_errno (ERANGE);
+      return z * c0;
+    }
+  else
+    {
+      double f = sqrt (1 - az);
+      double c0 = fma (az, c[1], c[0]);
+      double c2 = c[2] + az * c[3];
+      double c4 = c[4] + az * c[5];
+      double c6 = c[6] + az * c[7];
+      c0 += c2 * z2;
+      c4 += c6 * z2;
+      c0 += c4 * z4;
+      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
+      return r;
+    }
+}
+libm_alias_float (__asinpi, asinpi)
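
Note on the table lookup in s_asinpif.c above: for |x| < 1 the shift
s = 146 - e applied to the 24-bit significand appears to select one of the
16 rows of ch[] as floor (16 * |x|), with s >= 32 (tiny or zero inputs)
mapped to row 0. The standalone sketch below mirrors that computation so it
can be checked outside glibc. The names float_bits, asinpif_table_index and
asinpif_table_index_ref are hypothetical helpers invented for illustration
and are not part of the patch; float_bits stands in for the internal asuint
from math_config.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for glibc's internal asuint: return the bit
   pattern of a binary32 value.  */
static uint32_t
float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Mirror of the index computation in the patch.  Only meaningful for
   |x| < 1, i.e. a biased exponent below 127.  */
static int
asinpif_table_index (float x)
{
  uint32_t t = float_bits (x);
  int32_t e = (t >> 23) & 0xff;   /* biased exponent */
  assert (e < 127);               /* caller must ensure |x| < 1 */
  int32_t s = 146 - e;
  if (s >= 32)                    /* shifting by >= 32 would be undefined */
    return 0;
  /* Significand with the implicit bit restored, then shifted so that
     only the integer part of 16 * |x| remains.  */
  return (int) (((t & (~0u >> 9)) | 1u << 23) >> s);
}

/* The same index computed arithmetically: truncating a non-negative
   value is the same as taking its floor.  */
static int
asinpif_table_index_ref (float x)
{
  float ax = x < 0 ? -x : x;
  return (int) (16.0f * ax);
}
```

Under these assumptions the two helpers should agree for every binary32
input with |x| < 1, which makes an exhaustive loop over all such bit
patterns a cheap sanity check when adapting the table.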
diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
index ce84ddf1e6..33dd6718ba 100644
--- a/sysdeps/loongarch/lp64/libm-test-ulps
+++ b/sysdeps/loongarch/lp64/libm-test-ulps
@@ -83,22 +83,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
index 67c37dfd5e..869ceff928 100644
--- a/sysdeps/mips/mips64/libm-test-ulps
+++ b/sysdeps/mips/mips64/libm-test-ulps
@@ -83,22 +83,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
index d3b1036d29..75db236e09 100644
--- a/sysdeps/or1k/fpu/libm-test-ulps
+++ b/sysdeps/or1k/fpu/libm-test-ulps
@@ -63,19 +63,15 @@  double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
index 14b7e0f3f9..a1f7c80097 100644
--- a/sysdeps/or1k/nofpu/libm-test-ulps
+++ b/sysdeps/or1k/nofpu/libm-test-ulps
@@ -54,7 +54,6 @@  double: 3
 
 Function: "asinpi":
 double: 1
-float: 1
 
 Function: "atan":
 double: 1
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index c9c86de147..fa3cf2e844 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -107,25 +107,21 @@  ldouble: 7
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 float128: 2
 ldouble: 4
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 4
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 4
 
diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
index 6206a9531a..a5184ecad9 100644
--- a/sysdeps/riscv/nofpu/libm-test-ulps
+++ b/sysdeps/riscv/nofpu/libm-test-ulps
@@ -71,7 +71,6 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
index 124ca4b719..3bfc9668d5 100644
--- a/sysdeps/riscv/rvd/libm-test-ulps
+++ b/sysdeps/riscv/rvd/libm-test-ulps
@@ -83,22 +83,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
index 364ccf3326..7d61bf1cef 100644
--- a/sysdeps/s390/fpu/libm-test-ulps
+++ b/sysdeps/s390/fpu/libm-test-ulps
@@ -83,22 +83,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
index 1174972002..426f45893e 100644
--- a/sysdeps/sparc/fpu/libm-test-ulps
+++ b/sysdeps/sparc/fpu/libm-test-ulps
@@ -83,22 +83,18 @@  ldouble: 4
 
 Function: "asinpi":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 ldouble: 1
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 ldouble: 2
 
 Function: "atan":
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 5ed5112b49..d4c4bfa42b 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -180,25 +180,21 @@  float: 1
 
 Function: "asinpi":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_downward":
 double: 1
-float: 1
 float128: 2
 ldouble: 2
 
 Function: "asinpi_towardzero":
 double: 1
-float: 2
 float128: 1
 ldouble: 2
 
 Function: "asinpi_upward":
 double: 2
-float: 2
 float128: 2
 ldouble: 2