[00/11] Add more CORE-math implementations to libm


Message ID

20241111134740.1410635-1-adhemerval.zanella@linaro.org

Headers

Received-SPF: pass (google.com: domain of
 libc-alpha-bounces~patch=linaro.org@sourceware.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9A3003858D21
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Subject: [PATCH 00/11] Add more CORE-math implementations to libm
Date: Mon, 11 Nov 2024 10:45:38 -0300
Message-ID: <20241111134740.1410635-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: libc-alpha-bounces~patch=linaro.org@sourceware.org

Series

Add more CORE-math implementations to libm

Message

Adhemerval Zanella Netto Nov. 11, 2024, 1:45 p.m. UTC

This patchset adds the optimized, correctly rounded CORE-MATH
implementations of cbrtf, erff, erfcf, lgammaf, and tanf.  Each
implementation comes with a new benchmark to evaluate the performance
improvement.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, POWER10 for powerpc, and
Loongson-3C5000L-LL for loongarch), and all of them show good
performance improvements.  Like the implementations from ARM optimized
routines, the CORE-MATH ones take advantage of recent ISA and platform
support (such as FMA and rounding instructions, along with higher FP
throughput).

Adhemerval Zanella (11):
  benchtests: Add cbrtf benchmark
  benchtests: Add erff benchmark
  benchtests: Add erfcf benchmark
  benchtests: Add lgammaf benchmark
  benchtests: Add tanf benchmark
  math: Use cbrtf from CORE-MATH
  math: Split s_erfF in erff and erfc
  math: Use erff from CORE-MATH
  math: Use erfcf from CORE-MATH
  math: Use lgammaf from CORE-MATH
  math: Use tanf from CORE-MATH

 SHARED-FILES                                  |   26 +
 benchtests/Makefile                           |    5 +
 benchtests/cbrtf-inputs                       | 1005 ++++++
 benchtests/erfcf-inputs                       |  795 +++++
 benchtests/erff-inputs                        |  795 +++++
 benchtests/lgammaf-inputs                     | 1005 ++++++
 benchtests/tanf-inputs                        | 3005 +++++++++++++++++
 math/Makefile                                 |    1 +
 sysdeps/aarch64/libm-test-ulps                |   13 -
 sysdeps/alpha/fpu/libm-test-ulps              |   20 -
 sysdeps/arc/fpu/libm-test-ulps                |   20 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   22 -
 sysdeps/csky/fpu/libm-test-ulps               |   22 -
 sysdeps/csky/nofpu/libm-test-ulps             |   22 -
 sysdeps/generic/math_int128.h                 |  144 +
 sysdeps/hppa/fpu/libm-test-ulps               |   20 -
 sysdeps/i386/fpu/libm-test-ulps               |   16 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   13 -
 sysdeps/ieee754/dbl-64/s_erfc.c               |    1 +
 sysdeps/ieee754/float128/s_erfcf128.c         |    1 +
 sysdeps/ieee754/flt-32/e_lgammaf_r.c          |  576 ++--
 sysdeps/ieee754/flt-32/k_tanf.c               |  102 +-
 sysdeps/ieee754/flt-32/lgamma_negf.c          |  283 +-
 sysdeps/ieee754/flt-32/s_cbrtf.c              |  136 +-
 sysdeps/ieee754/flt-32/s_erfcf.c              |  185 +
 sysdeps/ieee754/flt-32/s_erff.c               |  470 +--
 sysdeps/ieee754/flt-32/s_tanf.c               |  220 +-
 sysdeps/ieee754/ldbl-128/s_erfcl.c            |    1 +
 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c         |    1 +
 sysdeps/ieee754/ldbl-96/s_erfcl.c             |    1 +
 sysdeps/loongarch/lp64/libm-test-ulps         |   20 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    1 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   16 -
 sysdeps/microblaze/libm-test-ulps             |    7 -
 sysdeps/mips/mips32/libm-test-ulps            |   22 -
 sysdeps/mips/mips64/libm-test-ulps            |   20 -
 sysdeps/nios2/libm-test-ulps                  |    7 -
 sysdeps/or1k/fpu/libm-test-ulps               |   22 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   22 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   20 -
 sysdeps/powerpc/nofpu/libm-test-ulps          |   20 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   20 -
 sysdeps/riscv/rvd/libm-test-ulps              |   20 -
 sysdeps/s390/fpu/libm-test-ulps               |   20 -
 sysdeps/sh/libm-test-ulps                     |   12 -
 sysdeps/sparc/fpu/libm-test-ulps              |   20 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 -
 48 files changed, 7815 insertions(+), 1407 deletions(-)
 create mode 100644 benchtests/cbrtf-inputs
 create mode 100644 benchtests/erfcf-inputs
 create mode 100644 benchtests/erff-inputs
 create mode 100644 benchtests/lgammaf-inputs
 create mode 100644 benchtests/tanf-inputs
 create mode 100644 sysdeps/generic/math_int128.h
 create mode 100644 sysdeps/ieee754/dbl-64/s_erfc.c
 create mode 100644 sysdeps/ieee754/float128/s_erfcf128.c
 create mode 100644 sysdeps/ieee754/flt-32/s_erfcf.c
 create mode 100644 sysdeps/ieee754/ldbl-128/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-96/s_erfcl.c