From patchwork Tue Nov 19 12:23:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 844267
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann
Subject: [PATCH] math: Add internal roundeven_finite
Date: Tue, 19 Nov 2024 09:23:58 -0300
Message-ID: <20241119122522.3290493-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0
List-Id: Libc-alpha mailing list

Some CORE-MATH routines use roundeven, and most ISAs do not have a
specific instruction for the operation.  In that case, the call is
routed to the generic implementation.

However, if the ISA does support round() and ctz(), there is a better
alternative (as used by CORE-MATH).
This patch adds this optimization and also enables it on powerpc.
On a power10 it shows the following improvement:

expm1f latency     master    patched   improvement
power10            4.3910     2.6595        39.43%
power10           15.8168    11.9993        24.14%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_gammaf_r.c  |  2 +-
 sysdeps/ieee754/flt-32/math_config.h | 29 ++++++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/s_expm1f.c    |  2 +-
 sysdeps/powerpc/fpu/math_private.h   |  5 +++++
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index 6b1f95d50f..66e8caee0b 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -140,7 +140,7 @@ __ieee754_gammaf_r (float x, int *signgamp)
     };
 
   double m = z - 0x1.7p+1;
-  double i = roundeven (m);
+  double i = roundeven_finite (m);
   double step = copysign (1.0, i);
   double d = m - i, d2 = d * d, d4 = d2 * d2, d8 = d4 * d4;
   double f = (c[0] + d * c[1]) + d2 * (c[2] + d * c[3])
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index dc07ebd459..4336beb926 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -57,6 +57,35 @@ static inline int32_t
 converttoint (double_t x);
 #endif
 
+#ifndef ROUNDEVEN_INTRINSICS
+/* When set, roundeven_finite will route to the internal roundeven function. */
+# define ROUNDEVEN_INTRINSICS 1
+#endif
+
+#if ROUNDEVEN_INTRINSICS
+/* Round x to nearest integer value in floating-point format, rounding halfway
+   cases to even.  If the input is non-finite the result is unspecified. */
+static inline double_t
+roundeven_finite (double_t x)
+{
+  return roundeven (x);
+}
+#else
+static inline double
+roundeven_finite (double_t x)
+{
+  double_t y = round (x);
+  if (fabs (x - y) == 0.5)
+    {
+      union { double f; uint64_t i; } u = {y};
+      union { double f; uint64_t i; } v = {(x > 0) ? y - 1.0 : y + 1.0};
+      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
+        y = v.f;
+    }
+  return y;
+}
+#endif
+
 static inline uint32_t
 asuint (float f)
 {
diff --git a/sysdeps/ieee754/flt-32/s_expm1f.c b/sysdeps/ieee754/flt-32/s_expm1f.c
index edd7c9acf8..a36e5781f5 100644
--- a/sysdeps/ieee754/flt-32/s_expm1f.c
+++ b/sysdeps/ieee754/flt-32/s_expm1f.c
@@ -95,7 +95,7 @@ __expm1f (float x)
       return __math_oflowf (0);
     }
   double a = iln2 * z;
-  double ia = roundeven (a);
+  double ia = roundeven_finite (a);
   double h = a - ia;
   double h2 = h * h;
   uint64_t u = asuint64 (ia + big);
diff --git a/sysdeps/powerpc/fpu/math_private.h b/sysdeps/powerpc/fpu/math_private.h
index 9ef35b20cd..b22f53d366 100644
--- a/sysdeps/powerpc/fpu/math_private.h
+++ b/sysdeps/powerpc/fpu/math_private.h
@@ -59,4 +59,9 @@ __ieee754_sqrtf128 (_Float128 __x)
 #define _GL_HAS_BUILTIN_ILOGB 0
 #endif
 
+#ifdef _ARCH_PWR6
+/* ISA 2.03 provides frin/round() and cntlzw/ctzll(). */
+# define ROUNDEVEN_INTRINSICS 0
+#endif
+
 #endif /* _PPC_MATH_PRIVATE_H_ */