From patchwork Fri Nov 29 13:17:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 846120
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: DJ Delorie
Subject: [PATCH 00/23] Add remaining CORE-MATH binary32 implementations to libm
Date: Fri, 29 Nov 2024 10:17:24 -0300
Message-ID: <20241129132032.476978-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0
List-Id: Libc-alpha mailing list

This patchset adds the optimized and correctly rounded acosf, acoshf,
asinf, asinhf, atanf, atan2f, atanhf, coshf, sinhf, and tanhf from
CORE-MATH [1].

Each implementation has a benchmark to evaluate the performance
improvements. In general, the results are pretty good, with just some
remarks:

  * acosf/asinf put heavy pressure on branch prediction on the range
    x < 0x1.c2a1dcp-1 for acosf and x < 0x1.852p+126 for asinf. For
    acosf, the branch is required to make it correctly rounded in
    non-default rounding modes; for asinf, it is a fast-path
    optimization. The performance profiles differ widely depending on
    the chip: I see regressions compared to the current glibc
    implementation on AMD Zen3 when targeting x86-64/x86-64-v2, but I
    also see large improvements for x86-64-v3 and on aarch64
    Neoverse-N1 and powerpc POWER10. I think it should not be a
    blocker for integration.

  * coshf performance is the only regression compared to the current
    glibc. This is mostly due to the benchmark used (which I modeled
    on the CORE-MATH input range): it shows that the glibc code
    hotspot is in expf, an optimized version from ARM-optimized
    routines.
    Neither the current expf nor the current coshf is correctly
    rounded, and the maximum error is 2 ulps for FE_TONEAREST and
    3 ulps for the other rounding modes. I am not sure if this would
    be a blocker, and I plan to remove the old SVID compat wrapper in
    subsequent patches (which should improve the function performance
    by about 10%).

[1] https://gitlab.inria.fr/core-math/core-math

Adhemerval Zanella (23):
  benchtests: Add acosf benchmark
  benchtests: Add acoshf benchmark
  benchtests: Add asinf benchmark
  benchtests: Add asinhf benchmark
  benchtests: Add atanf benchmark
  benchtests: Add atan2f benchmark
  benchtests: Add atanhf benchmark
  benchtests: Add coshf benchmark
  benchtests: Add sinhf benchmark
  benchtests: Add tanhf benchmark
  math: Add inf support on gen-auto-libm-tests.c
  math: Fix the expected atanf (inf) results
  math: Fix the expected atan2f (inf) results
  math: Use acosf from CORE-MATH
  math: Use acoshf from CORE-MATH
  math: Use asinf from CORE-MATH
  math: Use asinhf from CORE-MATH
  math: Use atanf from CORE-MATH
  math: Use atan2f from CORE-MATH
  math: Use atanhf from CORE-MATH
  math: Use coshf from CORE-MATH
  math: Use sinhf from CORE-MATH
  math: Use tanhf from CORE-MATH

 SHARED-FILES                               |   40 +
 benchtests/Makefile                        |   10 +
 benchtests/acosf-inputs                    | 2710 +++++++++++++++++
 benchtests/acoshf-inputs                   | 1005 ++++++
 benchtests/asinf-inputs                    | 2710 +++++++++++++++++
 benchtests/asinhf-inputs                   | 2005 ++++++++++++
 benchtests/atan2f-inputs                   | 2005 ++++++++++++
 benchtests/atanf-inputs                    | 2005 ++++++++++++
 benchtests/atanhf-inputs                   | 2005 ++++++++++++
 benchtests/coshf-inputs                    | 2005 ++++++++++++
 benchtests/sinhf-inputs                    | 2005 ++++++++++++
 benchtests/tanhf-inputs                    | 2005 ++++++++++++
 math/auto-libm-test-in                     |   52 +
 math/auto-libm-test-out-atan               |   50 +
 math/auto-libm-test-out-atan2              | 2316 ++++++++++++++
 math/gen-auto-libm-tests.c                 |   23 +-
 math/libm-test-atan.inc                    |    2 -
 math/libm-test-atan2.inc                   |   56 -
 sysdeps/aarch64/libm-test-ulps             |   44 +-
 sysdeps/alpha/fpu/libm-test-ulps           |   40 -
 sysdeps/arc/fpu/libm-test-ulps             |   40 -
 sysdeps/arc/nofpu/libm-test-ulps           |   10 -
 sysdeps/arm/libm-test-ulps                 |   48 +-
 sysdeps/csky/fpu/libm-test-ulps            |   40 -
 sysdeps/csky/nofpu/libm-test-ulps          |   40 -
 sysdeps/hppa/fpu/libm-test-ulps            |   40 -
 sysdeps/i386/fpu/e_acosf.S                 |   23 -
 sysdeps/i386/fpu/e_acoshf.S                |  101 -
 sysdeps/i386/fpu/e_asinf.S                 |   38 -
 sysdeps/i386/fpu/e_atan2f.S                |   30 -
 sysdeps/i386/fpu/e_atanhf.S                |  110 -
 sysdeps/i386/fpu/libm-test-ulps            |   25 -
 sysdeps/i386/fpu/s_asinhf.S                |  139 -
 sysdeps/i386/fpu/s_atanf.S                 |   30 -
 .../i386/i686/fpu/multiarch/libm-test-ulps |   25 -
 sysdeps/ieee754/flt-32/e_acosf.c           |  191 +-
 sysdeps/ieee754/flt-32/e_acoshf.c          |  230 +-
 sysdeps/ieee754/flt-32/e_asinf.c           |  210 +-
 sysdeps/ieee754/flt-32/e_atan2f.c          |  337 +-
 sysdeps/ieee754/flt-32/e_atanhf.c          |  210 +-
 sysdeps/ieee754/flt-32/e_coshf.c           |  156 +-
 sysdeps/ieee754/flt-32/e_sinhf.c           |  169 +-
 sysdeps/ieee754/flt-32/s_asinhf.c          |  219 +-
 sysdeps/ieee754/flt-32/s_atanf.c           |  186 +-
 sysdeps/ieee754/flt-32/s_tanhf.c           |  131 +-
 sysdeps/loongarch/lp64/libm-test-ulps      |   44 +-
 sysdeps/m68k/coldfire/fpu/libm-test-ulps   |    2 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps     |   11 -
 sysdeps/microblaze/libm-test-ulps          |   10 -
 sysdeps/mips/mips32/libm-test-ulps         |   40 -
 sysdeps/mips/mips64/libm-test-ulps         |   40 -
 sysdeps/or1k/fpu/libm-test-ulps            |   40 -
 sysdeps/or1k/nofpu/libm-test-ulps          |   40 -
 sysdeps/powerpc/fpu/libm-test-ulps         |   44 +-
 sysdeps/powerpc/nofpu/libm-test-ulps       |   40 -
 sysdeps/riscv/nofpu/libm-test-ulps         |   40 -
 sysdeps/riscv/rvd/libm-test-ulps           |   40 -
 sysdeps/s390/fpu/libm-test-ulps            |   40 -
 sysdeps/sh/libm-test-ulps                  |   20 -
 sysdeps/sparc/fpu/libm-test-ulps           |   40 -
 sysdeps/x86_64/fpu/libm-test-ulps          |   46 +-
 61 files changed, 24381 insertions(+), 2027 deletions(-)
 create mode 100644 benchtests/acosf-inputs
 create mode 100644 benchtests/acoshf-inputs
 create mode 100644 benchtests/asinf-inputs
 create mode 100644 benchtests/asinhf-inputs
 create mode 100644 benchtests/atan2f-inputs
 create mode 100644 benchtests/atanf-inputs
 create mode 100644 benchtests/atanhf-inputs
 create mode 100644 benchtests/coshf-inputs
 create mode 100644 benchtests/sinhf-inputs
 create mode 100644 benchtests/tanhf-inputs
 delete mode 100644 sysdeps/i386/fpu/e_acosf.S
 delete mode 100644 sysdeps/i386/fpu/e_acoshf.S
 delete mode 100644 sysdeps/i386/fpu/e_asinf.S
 delete mode 100644 sysdeps/i386/fpu/e_atan2f.S
 delete mode 100644 sysdeps/i386/fpu/e_atanhf.S
 delete mode 100644 sysdeps/i386/fpu/s_asinhf.S
 delete mode 100644 sysdeps/i386/fpu/s_atanf.S