[00/23] Add remaining CORE-MATH binary32 implementations to libm

Message ID	20241129132032.476978-1-adhemerval.zanella@linaro.org
Headers	Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of libc-alpha-bounces~patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B824F3858D29 From: Adhemerval Zanella <adhemerval.zanella@linaro.org> To: libc-alpha@sourceware.org Cc: DJ Delorie <dj@redhat.com> Subject: [PATCH 00/23] Add remaining CORE-MATH binary32 implementations to libm Date: Fri, 29 Nov 2024 10:17:24 -0300 Message-ID: <20241129132032.476978-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: libc-alpha-bounces~patch=linaro.org@sourceware.org
Series	Add remaining CORE-MATH binary32 implementations to libm [00/23] Add remaining CORE-MATH binary32 implementations to libm [01/23] benchtests: Add acosf benchmark [02/23] benchtests: Add acoshf benchmark [03/23] benchtests: Add asinf benchmark [04/23] benchtests: Add asinhf benchmark [05/23] benchtests: Add atanf benchmark [06/23] benchtests: Add atan2f benchmark [07/23] benchtests: Add atanhf benchmark [08/23] benchtests: Add coshf benchmark [09/23] benchtests: Add sinhf benchmark [10/23] benchtests: Add tanhf benchmark [11/23] math: Add inf support on gen-auto-libm-tests.c [12/23] math: Fix the expected atanf (inf) results [13/23] math: Fix the expected atan2f (inf) results [14/23] math: Use acosf from CORE-MATH [15/23] math: Use acoshf from CORE-MATH [16/23] math: Use asinf from CORE-MATH [18/23] math: Use atanf from CORE-MATH [19/23] math: Use atan2f from CORE-MATH [20/23] math: Use atanhf from CORE-MATH [21/23] math: Use coshf from CORE-MATH [22/23] math: Use sinhf from CORE-MATH [23/23] math: Use tanhf from CORE-MATH

Message

Adhemerval Zanella Netto Nov. 29, 2024, 1:17 p.m. UTC

This patchset adds the optimized and correctly rounded acosf, acoshf,
asinf, asinhf, atanf, atan2f, atanhf, coshf, sinhf, and tanhf from
CORE-MATH [1].  Each implementation has a benchmark to evaluate the
performance improvements.

In general, the results are good, with a few remarks:

* acosf/asinf put heavy pressure on the branch predictor in the range
  x < 0x1.c2a1dcp-1 for acosf and x < 0x1.852p+126 for asinf.

  For acosf, this is required to make it correctly rounded in non-default
  rounding modes, and for asinf, it is a fast-path optimization.

  The performance profiles differ widely depending on the chip: I see
  regressions compared to the current glibc implementation on AMD Zen3 when
  targeting x86-64/x86-64-v2, but large improvements for x86-64-v3 and on
  aarch64 Neoverse-N1 and powerpc POWER10.

  I think it should not be a blocker for integration.

* coshf performance is the only regression compared to the current glibc.
  This is mostly due to the benchmark used (which I modeled on the CORE-MATH
  input range), which shows that the glibc code hotspot is in expf, an
  optimized version from ARM-optimized routines.

  Neither the current expf nor coshf is correctly rounded, and the maximum
  error is 2 ulps for FE_TONEAREST and 3 ulps for the other rounding modes.

  I am not sure if this would be a blocker, and I plan to remove the old SVID
  compat wrapper in subsequent patches (which should improve the function
  performance by about 10%).

[1] https://gitlab.inria.fr/core-math/core-math

Adhemerval Zanella (23):
  benchtests: Add acosf benchmark
  benchtests: Add acoshf benchmark
  benchtests: Add asinf benchmark
  benchtests: Add asinhf benchmark
  benchtests: Add atanf benchmark
  benchtests: Add atan2f benchmark
  benchtests: Add atanhf benchmark
  benchtests: Add coshf benchmark
  benchtests: Add sinhf benchmark
  benchtests: Add tanhf benchmark
  math: Add inf support on gen-auto-libm-tests.c
  math: Fix the expected atanf (inf) results
  math: Fix the expected atan2f (inf) results
  math: Use acosf from CORE-MATH
  math: Use acoshf from CORE-MATH
  math: Use asinf from CORE-MATH
  math: Use asinhf from CORE-MATH
  math: Use atanf from CORE-MATH
  math: Use atan2f from CORE-MATH
  math: Use atanhf from CORE-MATH
  math: Use coshf from CORE-MATH
  math: Use sinhf from CORE-MATH
  math: Use tanhf from CORE-MATH

 SHARED-FILES                                  |   40 +
 benchtests/Makefile                           |   10 +
 benchtests/acosf-inputs                       | 2710 +++++++++++++++++
 benchtests/acoshf-inputs                      | 1005 ++++++
 benchtests/asinf-inputs                       | 2710 +++++++++++++++++
 benchtests/asinhf-inputs                      | 2005 ++++++++++++
 benchtests/atan2f-inputs                      | 2005 ++++++++++++
 benchtests/atanf-inputs                       | 2005 ++++++++++++
 benchtests/atanhf-inputs                      | 2005 ++++++++++++
 benchtests/coshf-inputs                       | 2005 ++++++++++++
 benchtests/sinhf-inputs                       | 2005 ++++++++++++
 benchtests/tanhf-inputs                       | 2005 ++++++++++++
 math/auto-libm-test-in                        |   52 +
 math/auto-libm-test-out-atan                  |   50 +
 math/auto-libm-test-out-atan2                 | 2316 ++++++++++++++
 math/gen-auto-libm-tests.c                    |   23 +-
 math/libm-test-atan.inc                       |    2 -
 math/libm-test-atan2.inc                      |   56 -
 sysdeps/aarch64/libm-test-ulps                |   44 +-
 sysdeps/alpha/fpu/libm-test-ulps              |   40 -
 sysdeps/arc/fpu/libm-test-ulps                |   40 -
 sysdeps/arc/nofpu/libm-test-ulps              |   10 -
 sysdeps/arm/libm-test-ulps                    |   48 +-
 sysdeps/csky/fpu/libm-test-ulps               |   40 -
 sysdeps/csky/nofpu/libm-test-ulps             |   40 -
 sysdeps/hppa/fpu/libm-test-ulps               |   40 -
 sysdeps/i386/fpu/e_acosf.S                    |   23 -
 sysdeps/i386/fpu/e_acoshf.S                   |  101 -
 sysdeps/i386/fpu/e_asinf.S                    |   38 -
 sysdeps/i386/fpu/e_atan2f.S                   |   30 -
 sysdeps/i386/fpu/e_atanhf.S                   |  110 -
 sysdeps/i386/fpu/libm-test-ulps               |   25 -
 sysdeps/i386/fpu/s_asinhf.S                   |  139 -
 sysdeps/i386/fpu/s_atanf.S                    |   30 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   25 -
 sysdeps/ieee754/flt-32/e_acosf.c              |  191 +-
 sysdeps/ieee754/flt-32/e_acoshf.c             |  230 +-
 sysdeps/ieee754/flt-32/e_asinf.c              |  210 +-
 sysdeps/ieee754/flt-32/e_atan2f.c             |  337 +-
 sysdeps/ieee754/flt-32/e_atanhf.c             |  210 +-
 sysdeps/ieee754/flt-32/e_coshf.c              |  156 +-
 sysdeps/ieee754/flt-32/e_sinhf.c              |  169 +-
 sysdeps/ieee754/flt-32/s_asinhf.c             |  219 +-
 sysdeps/ieee754/flt-32/s_atanf.c              |  186 +-
 sysdeps/ieee754/flt-32/s_tanhf.c              |  131 +-
 sysdeps/loongarch/lp64/libm-test-ulps         |   44 +-
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    2 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   11 -
 sysdeps/microblaze/libm-test-ulps             |   10 -
 sysdeps/mips/mips32/libm-test-ulps            |   40 -
 sysdeps/mips/mips64/libm-test-ulps            |   40 -
 sysdeps/or1k/fpu/libm-test-ulps               |   40 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   40 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   44 +-
 sysdeps/powerpc/nofpu/libm-test-ulps          |   40 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   40 -
 sysdeps/riscv/rvd/libm-test-ulps              |   40 -
 sysdeps/s390/fpu/libm-test-ulps               |   40 -
 sysdeps/sh/libm-test-ulps                     |   20 -
 sysdeps/sparc/fpu/libm-test-ulps              |   40 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   46 +-
 61 files changed, 24381 insertions(+), 2027 deletions(-)
 create mode 100644 benchtests/acosf-inputs
 create mode 100644 benchtests/acoshf-inputs
 create mode 100644 benchtests/asinf-inputs
 create mode 100644 benchtests/asinhf-inputs
 create mode 100644 benchtests/atan2f-inputs
 create mode 100644 benchtests/atanf-inputs
 create mode 100644 benchtests/atanhf-inputs
 create mode 100644 benchtests/coshf-inputs
 create mode 100644 benchtests/sinhf-inputs
 create mode 100644 benchtests/tanhf-inputs
 delete mode 100644 sysdeps/i386/fpu/e_acosf.S
 delete mode 100644 sysdeps/i386/fpu/e_acoshf.S
 delete mode 100644 sysdeps/i386/fpu/e_asinf.S
 delete mode 100644 sysdeps/i386/fpu/e_atan2f.S
 delete mode 100644 sysdeps/i386/fpu/e_atanhf.S
 delete mode 100644 sysdeps/i386/fpu/s_asinhf.S
 delete mode 100644 sysdeps/i386/fpu/s_atanf.S

Comments

Florian Weimer Dec. 2, 2024, 2:39 p.m. UTC | #1

* Adhemerval Zanella:

> * acosf/asinf put heavy pressure on the branch predictor in the range
>   x < 0x1.c2a1dcp-1 for acosf and x < 0x1.852p+126 for asinf.
>
>   For acosf, this is required to make it correctly rounded in non-default
>   rounding modes, and for asinf, it is a fast-path optimization.
>
>   The performance profiles differ widely depending on the chip: I see
>   regressions compared to the current glibc implementation on AMD Zen3 when
>   targeting x86-64/x86-64-v2, but large improvements for x86-64-v3 and on
>   aarch64 Neoverse-N1 and powerpc POWER10.

We can add an FMA variant for acosf/asinf as an IFUNC.  This should
avoid the performance regression for x86-64 on most machines.

According to the commit message, that would also help with coshf for
x86-64?

Thanks,
Florian