From patchwork Mon Nov 11 13:45:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 842455
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH 00/11] Add more CORE-math implementations to libm
Date: Mon, 11 Nov 2024 10:45:38 -0300
Message-ID: <20241111134740.1410635-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0
List-Id: Libc-alpha mailing list

This patchset adds optimized, correctly rounded implementations of
cbrtf, erff, erfcf, lgammaf, and tanf from CORE-MATH.  Each function
gets a benchmark to evaluate the performance improvement.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, POWER10 for powerpc, and
Loongson-3C5000L-LL for loongarch), and all of them show good
performance improvements.  Like the implementations from ARM optimized
routines, the CORE-MATH ones take advantage of recent ISA and platform
support (such as FMA and rounding instructions, as well as FP
throughput).
Adhemerval Zanella (11):
  benchtests: Add cbrtf benchmark
  benchtests: Add erff benchmark
  benchtests: Add erfcf benchmark
  benchtests: Add lgammaf benchmark
  benchtests: Add tanf benchmark
  math: Use cbrtf from CORE-MATH
  math: Split s_erfF in erff and erfc
  math: Use erff from CORE-MATH
  math: Use erfcf from CORE-MATH
  math: Use lgammaf from CORE-MATH
  math: Use tanf from CORE-MATH

 SHARED-FILES                               |   26 +
 benchtests/Makefile                        |    5 +
 benchtests/cbrtf-inputs                    | 1005 ++++++
 benchtests/erfcf-inputs                    |  795 +++++
 benchtests/erff-inputs                     |  795 +++++
 benchtests/lgammaf-inputs                  | 1005 ++++++
 benchtests/tanf-inputs                     | 3005 +++++++++++++++++
 math/Makefile                              |    1 +
 sysdeps/aarch64/libm-test-ulps             |   13 -
 sysdeps/alpha/fpu/libm-test-ulps           |   20 -
 sysdeps/arc/fpu/libm-test-ulps             |   20 -
 sysdeps/arc/nofpu/libm-test-ulps           |    7 -
 sysdeps/arm/libm-test-ulps                 |   22 -
 sysdeps/csky/fpu/libm-test-ulps            |   22 -
 sysdeps/csky/nofpu/libm-test-ulps          |   22 -
 sysdeps/generic/math_int128.h              |  144 +
 sysdeps/hppa/fpu/libm-test-ulps            |   20 -
 sysdeps/i386/fpu/libm-test-ulps            |   16 -
 .../i386/i686/fpu/multiarch/libm-test-ulps |   13 -
 sysdeps/ieee754/dbl-64/s_erfc.c            |    1 +
 sysdeps/ieee754/float128/s_erfcf128.c      |    1 +
 sysdeps/ieee754/flt-32/e_lgammaf_r.c       |  576 ++--
 sysdeps/ieee754/flt-32/k_tanf.c            |  102 +-
 sysdeps/ieee754/flt-32/lgamma_negf.c       |  283 +-
 sysdeps/ieee754/flt-32/s_cbrtf.c           |  136 +-
 sysdeps/ieee754/flt-32/s_erfcf.c           |  185 +
 sysdeps/ieee754/flt-32/s_erff.c            |  470 +--
 sysdeps/ieee754/flt-32/s_tanf.c            |  220 +-
 sysdeps/ieee754/ldbl-128/s_erfcl.c         |    1 +
 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c      |    1 +
 sysdeps/ieee754/ldbl-96/s_erfcl.c          |    1 +
 sysdeps/loongarch/lp64/libm-test-ulps      |   20 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps   |    1 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps     |   16 -
 sysdeps/microblaze/libm-test-ulps          |    7 -
 sysdeps/mips/mips32/libm-test-ulps         |   22 -
 sysdeps/mips/mips64/libm-test-ulps         |   20 -
 sysdeps/nios2/libm-test-ulps               |    7 -
 sysdeps/or1k/fpu/libm-test-ulps            |   22 -
 sysdeps/or1k/nofpu/libm-test-ulps          |   22 -
 sysdeps/powerpc/fpu/libm-test-ulps         |   20 -
 sysdeps/powerpc/nofpu/libm-test-ulps       |   20 -
 sysdeps/riscv/nofpu/libm-test-ulps         |   20 -
 sysdeps/riscv/rvd/libm-test-ulps           |   20 -
 sysdeps/s390/fpu/libm-test-ulps            |   20 -
 sysdeps/sh/libm-test-ulps                  |   12 -
 sysdeps/sparc/fpu/libm-test-ulps           |   20 -
 sysdeps/x86_64/fpu/libm-test-ulps          |   20 -
 48 files changed, 7815 insertions(+), 1407 deletions(-)
 create mode 100644 benchtests/cbrtf-inputs
 create mode 100644 benchtests/erfcf-inputs
 create mode 100644 benchtests/erff-inputs
 create mode 100644 benchtests/lgammaf-inputs
 create mode 100644 benchtests/tanf-inputs
 create mode 100644 sysdeps/generic/math_int128.h
 create mode 100644 sysdeps/ieee754/dbl-64/s_erfc.c
 create mode 100644 sysdeps/ieee754/float128/s_erfcf128.c
 create mode 100644 sysdeps/ieee754/flt-32/s_erfcf.c
 create mode 100644 sysdeps/ieee754/ldbl-128/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-96/s_erfcl.c