From patchwork Sun May 16 12:33:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 439842
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, Alex Bennée
Subject: [PULL 06/46] softfloat: Move the binary point to the msb
Date: Sun, 16 May 2021 07:33:51 -0500
Message-Id: <20210516123431.718318-7-richard.henderson@linaro.org>
In-Reply-To: <20210516123431.718318-1-richard.henderson@linaro.org>
References: <20210516123431.718318-1-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.25.1

Rather than point the binary point at msb-1, put it at the msb.
Use uadd64_overflow to detect when addition overflows instead
of DECOMPOSED_OVERFLOW_BIT.

This reduces the number of special cases within the code, such
as shifting an int64_t either left or right during conversion.
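
For readers outside the QEMU tree, the carry-out handling described above
can be sketched in isolation. This is only a minimal illustration under
stated assumptions, not code from the patch: Parts, add_carry_u64,
shift_right_jam1 and add_aligned are hypothetical names standing in for
FloatParts, uadd64_overflow and shift64RightJamming, and the operands are
assumed to be same-sign magnitudes with already aligned exponents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from fpu/softfloat.c. */
#define IMPLICIT_BIT (1ULL << 63)   /* binary point at the msb */

typedef struct {
    int32_t exp;
    uint64_t frac;                  /* the larger operand has its msb set */
} Parts;

/* Portable carry-out detection, playing the role of uadd64_overflow(). */
static bool add_carry_u64(uint64_t x, uint64_t y, uint64_t *sum)
{
    *sum = x + y;                   /* wraps modulo 2^64 */
    return *sum < x;                /* true iff the addition carried out */
}

/* Shift right by one, folding the lost bit into the lsb (sticky bit). */
static uint64_t shift_right_jam1(uint64_t x)
{
    return (x >> 1) | (x & 1);
}

/* Add two same-sign magnitudes whose exponents are already aligned. */
static Parts add_aligned(Parts a, Parts b)
{
    assert(a.exp == b.exp);
    if (add_carry_u64(a.frac, b.frac, &a.frac)) {
        /* The carry is the new msb: shift it back in and bump exp. */
        a.frac = shift_right_jam1(a.frac) | IMPLICIT_BIT;
        a.exp += 1;
    }
    return a;
}
```

With no spare bit above the implicit bit, the carry flag is the only
place the overflow can be observed, which is why the renormalization
has to OR the implicit bit back in after the shift.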
Reviewed-by: Alex Bennée
Signed-off-by: Richard Henderson
---
 fpu/softfloat.c | 169 +++++++++++++++++++-----------------------------
 1 file changed, 66 insertions(+), 103 deletions(-)

-- 
2.25.1

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..cd777743f1 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -503,9 +503,8 @@ typedef struct {
     bool sign;
 } FloatParts;
 
-#define DECOMPOSED_BINARY_POINT (64 - 2)
+#define DECOMPOSED_BINARY_POINT 63
 #define DECOMPOSED_IMPLICIT_BIT (1ull << DECOMPOSED_BINARY_POINT)
-#define DECOMPOSED_OVERFLOW_BIT (DECOMPOSED_IMPLICIT_BIT << 1)
 
 /* Structure holding all of the relevant parameters for a format.
  *   exp_size: the size of the exponent field
@@ -657,7 +656,7 @@ static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
             part.cls = float_class_zero;
             part.frac = 0;
         } else {
-            int shift = clz64(part.frac) - 1;
+            int shift = clz64(part.frac);
             part.cls = float_class_normal;
             part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
             part.frac <<= shift;
@@ -727,9 +726,8 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
         if (likely(exp > 0)) {
             if (frac & round_mask) {
                 flags |= float_flag_inexact;
-                frac += inc;
-                if (frac & DECOMPOSED_OVERFLOW_BIT) {
-                    frac >>= 1;
+                if (uadd64_overflow(frac, inc, &frac)) {
+                    frac = (frac >> 1) | DECOMPOSED_IMPLICIT_BIT;
                     exp++;
                 }
             }
@@ -758,9 +756,12 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
                 p.cls = float_class_zero;
                 goto do_zero;
             } else {
-                bool is_tiny = s->tininess_before_rounding
-                            || (exp < 0)
-                            || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
+                bool is_tiny = s->tininess_before_rounding || (exp < 0);
+
+                if (!is_tiny) {
+                    uint64_t discard;
+                    is_tiny = !uadd64_overflow(frac, inc, &discard);
+                }
 
                 shift64RightJamming(frac, 1 - exp, &frac);
                 if (frac & round_mask) {
@@ -985,7 +986,7 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
                 a.cls = float_class_zero;
                 a.sign = s->float_rounding_mode == float_round_down;
             } else {
-                int shift = clz64(a.frac) - 1;
+                int shift = clz64(a.frac);
                 a.frac = a.frac << shift;
                 a.exp = a.exp - shift;
                 a.sign = a_sign;
@@ -1022,9 +1023,10 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
                 shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
                 a.exp = b.exp;
             }
-            a.frac += b.frac;
-            if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+
+            if (uadd64_overflow(a.frac, b.frac, &a.frac)) {
                 shift64RightJamming(a.frac, 1, &a.frac);
+                a.frac |= DECOMPOSED_IMPLICIT_BIT;
                 a.exp += 1;
             }
             return a;
@@ -1219,16 +1221,17 @@ static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
         int exp = a.exp + b.exp;
 
         mul64To128(a.frac, b.frac, &hi, &lo);
-        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
-        if (lo & DECOMPOSED_OVERFLOW_BIT) {
-            shift64RightJamming(lo, 1, &lo);
+        if (hi & DECOMPOSED_IMPLICIT_BIT) {
             exp += 1;
+        } else {
+            hi <<= 1;
         }
+        hi |= (lo != 0);
 
         /* Re-use a */
         a.exp = exp;
         a.sign = sign;
-        a.frac = lo;
+        a.frac = hi;
         return a;
     }
     /* handle all the NaN cases */
@@ -1411,56 +1414,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
 
     p_exp = a.exp + b.exp;
 
-    /* Multiply of 2 62-bit numbers produces a (2*62) == 124-bit
-     * result.
-     */
     mul64To128(a.frac, b.frac, &hi, &lo);
-    /* binary point now at bit 124 */
 
-    /* check for overflow */
-    if (hi & (1ULL << (DECOMPOSED_BINARY_POINT * 2 + 1 - 64))) {
-        shift128RightJamming(hi, lo, 1, &hi, &lo);
+    /* Renormalize to the msb. */
+    if (hi & DECOMPOSED_IMPLICIT_BIT) {
         p_exp += 1;
+    } else {
+        shortShift128Left(hi, lo, 1, &hi, &lo);
     }
 
     /* + add/sub */
-    if (c.cls == float_class_zero) {
-        /* move binary point back to 62 */
-        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
-    } else {
+    if (c.cls != float_class_zero) {
         int exp_diff = p_exp - c.exp;
         if (p_sign == c.sign) {
             /* Addition */
             if (exp_diff <= 0) {
-                shift128RightJamming(hi, lo,
-                                     DECOMPOSED_BINARY_POINT - exp_diff,
-                                     &hi, &lo);
-                lo += c.frac;
+                shift64RightJamming(hi, -exp_diff, &hi);
                 p_exp = c.exp;
+                if (uadd64_overflow(hi, c.frac, &hi)) {
+                    shift64RightJamming(hi, 1, &hi);
+                    hi |= DECOMPOSED_IMPLICIT_BIT;
+                    p_exp += 1;
+                }
             } else {
-                uint64_t c_hi, c_lo;
-                /* shift c to the same binary point as the product (124) */
-                c_hi = c.frac >> 2;
-                c_lo = 0;
-                shift128RightJamming(c_hi, c_lo,
-                                     exp_diff,
-                                     &c_hi, &c_lo);
-                add128(hi, lo, c_hi, c_lo, &hi, &lo);
-                /* move binary point back to 62 */
-                shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+                uint64_t c_hi, c_lo, over;
+                shift128RightJamming(c.frac, 0, exp_diff, &c_hi, &c_lo);
+                add192(0, hi, lo, 0, c_hi, c_lo, &over, &hi, &lo);
+                if (over) {
+                    shift64RightJamming(hi, 1, &hi);
+                    hi |= DECOMPOSED_IMPLICIT_BIT;
+                    p_exp += 1;
+                }
             }
-
-            if (lo & DECOMPOSED_OVERFLOW_BIT) {
-                shift64RightJamming(lo, 1, &lo);
-                p_exp += 1;
-            }
-
         } else {
             /* Subtraction */
-            uint64_t c_hi, c_lo;
-            /* make C binary point match product at bit 124 */
-            c_hi = c.frac >> 2;
-            c_lo = 0;
+            uint64_t c_hi = c.frac, c_lo = 0;
 
             if (exp_diff <= 0) {
                 shift128RightJamming(hi, lo, -exp_diff, &hi, &lo);
@@ -1495,20 +1483,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
                 /* Normalizing to a binary point of 124 is the
                    correct adjust for the exponent.  However since we're
                    shifting, we might as well put the binary point back
-                   at 62 where we really want it.  Therefore shift as
+                   at 63 where we really want it.  Therefore shift as
                    if we're leaving 1 bit at the top of the word, but
                    adjust the exponent as if we're leaving 3 bits.  */
-                shift -= 1;
-                if (shift >= 64) {
-                    lo = lo << (shift - 64);
-                } else {
-                    hi = (hi << shift) | (lo >> (64 - shift));
-                    lo = hi | ((lo << shift) != 0);
-                }
-                p_exp -= shift - 2;
+                shift128Left(hi, lo, shift, &hi, &lo);
+                p_exp -= shift;
             }
         }
     }
+    hi |= (lo != 0);
 
     if (flags & float_muladd_halve_result) {
         p_exp -= 1;
@@ -1518,7 +1501,7 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
     a.cls = float_class_normal;
     a.sign = p_sign ^ sign_flip;
     a.exp = p_exp;
-    a.frac = lo;
+    a.frac = hi;
 
     return a;
 }
@@ -1742,25 +1725,17 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
          * exponent to match.
          *
          * The udiv_qrnnd algorithm that we're using requires normalization,
-         * i.e. the msb of the denominator must be set.  Since we know that
-         * DECOMPOSED_BINARY_POINT is msb-1, the inputs must be shifted left
-         * by one (more), and the remainder must be shifted right by one.
+         * i.e. the msb of the denominator must be set, which is already true.
          */
         if (a.frac < b.frac) {
             exp -= 1;
-            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 2, &n1, &n0);
-        } else {
             shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1, &n1, &n0);
+        } else {
+            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT, &n1, &n0);
         }
-        q = udiv_qrnnd(&r, n1, n0, b.frac << 1);
+        q = udiv_qrnnd(&r, n1, n0, b.frac);
 
-        /*
-         * Set lsb if there is a remainder, to set inexact.
-         * As mentioned above, to find the actual value of the remainder we
-         * would need to shift right, but (1) we are only concerned about
-         * non-zero-ness, and (2) the remainder will always be even because
-         * both inputs to the division primitive are even.
-         */
+        /* Set lsb if there is a remainder, to set inexact. */
         a.frac = q | (r != 0);
         a.sign = sign;
         a.exp = exp;
@@ -2135,12 +2110,12 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
 
                 if (a.frac & rnd_mask) {
                     s->float_exception_flags |= float_flag_inexact;
-                    a.frac += inc;
-                    a.frac &= ~rnd_mask;
-                    if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+                    if (uadd64_overflow(a.frac, inc, &a.frac)) {
                         a.frac >>= 1;
+                        a.frac |= DECOMPOSED_IMPLICIT_BIT;
                         a.exp++;
                     }
+                    a.frac &= ~rnd_mask;
                 }
             }
             break;
@@ -2213,10 +2188,8 @@ static int64_t round_to_int_and_pack(FloatParts in, FloatRoundMode rmode,
     case float_class_zero:
         return 0;
     case float_class_normal:
-        if (p.exp < DECOMPOSED_BINARY_POINT) {
+        if (p.exp <= DECOMPOSED_BINARY_POINT) {
             r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-        } else if (p.exp - DECOMPOSED_BINARY_POINT < 2) {
-            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
         } else {
             r = UINT64_MAX;
         }
@@ -2498,10 +2471,8 @@ static uint64_t round_to_uint_and_pack(FloatParts in, FloatRoundMode rmode,
             return 0;
         }
 
-        if (p.exp < DECOMPOSED_BINARY_POINT) {
+        if (p.exp <= DECOMPOSED_BINARY_POINT) {
             r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-        } else if (p.exp - DECOMPOSED_BINARY_POINT < 2) {
-            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
         } else {
             s->float_exception_flags = orig_flags | float_flag_invalid;
             return max;
@@ -2765,11 +2736,11 @@ static FloatParts int_to_float(int64_t a, int scale, float_status *status)
             f = -f;
             r.sign = true;
         }
-        shift = clz64(f) - 1;
+        shift = clz64(f);
         scale = MIN(MAX(scale, -0x10000), 0x10000);
 
         r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-        r.frac = (shift < 0 ? DECOMPOSED_IMPLICIT_BIT : f << shift);
+        r.frac = f << shift;
     }
 
     return r;
@@ -2920,21 +2891,16 @@ bfloat16 int16_to_bfloat16(int16_t a, float_status *status)
 static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
 {
     FloatParts r = { .sign = false };
+    int shift;
 
     if (a == 0) {
         r.cls = float_class_zero;
     } else {
         scale = MIN(MAX(scale, -0x10000), 0x10000);
+        shift = clz64(a);
         r.cls = float_class_normal;
-        if ((int64_t)a < 0) {
-            r.exp = DECOMPOSED_BINARY_POINT + 1 + scale;
-            shift64RightJamming(a, 1, &a);
-            r.frac = a;
-        } else {
-            int shift = clz64(a) - 1;
-            r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-            r.frac = a << shift;
-        }
+        r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
+        r.frac = a << shift;
     }
 
     return r;
@@ -3475,12 +3441,9 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     /* We need two overflow bits at the top. Adding room for that is a
      * right shift. If the exponent is odd, we can discard the low bit
      * by multiplying the fraction by 2; that's a left shift. Combine
-     * those and we shift right if the exponent is even.
+     * those and we shift right by 1 if the exponent is odd, otherwise 2.
      */
-    a_frac = a.frac;
-    if (!(a.exp & 1)) {
-        a_frac >>= 1;
-    }
+    a_frac = a.frac >> (2 - (a.exp & 1));
     a.exp >>= 1;
 
     /* Bit-by-bit computation of sqrt. */
@@ -3488,10 +3451,10 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     s_frac = 0;
 
     /* Iterate from implicit bit down to the 3 extra bits to compute a
-     * properly rounded result. Remember we've inserted one more bit
-     * at the top, so these positions are one less.
+     * properly rounded result. Remember we've inserted two more bits
+     * at the top, so these positions are two less.
      */
-    bit = DECOMPOSED_BINARY_POINT - 1;
+    bit = DECOMPOSED_BINARY_POINT - 2;
     last_bit = MAX(p->frac_shift - 4, 0);
     do {
         uint64_t q = 1ULL << bit;
@@ -3507,7 +3470,7 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     /* Undo the right shift done above. If there is any remaining
      * fraction, the result is inexact. Set the sticky bit.
      */
-    a.frac = (r_frac << 1) + (a_frac != 0);
+    a.frac = (r_frac << 2) + (a_frac != 0);
 
     return a;
 }
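
The simplification in the uint_to_float and round_to_uint_and_pack hunks
can be seen in a stripped-down form below. This is a rough sketch under
stated assumptions rather than the QEMU implementation: Parts, from_u64
and to_u64 are hypothetical names, __builtin_clzll is the GCC/Clang
builtin assumed here in place of QEMU's clz64, and the scale argument,
rounding and exception flags are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define BINARY_POINT 63            /* mirrors DECOMPOSED_BINARY_POINT */

/* Hypothetical cut-down stand-in for FloatParts (sign and class omitted). */
typedef struct {
    int32_t exp;
    uint64_t frac;
} Parts;

/* Nonzero integer -> normalized parts: one left shift covers every input. */
static Parts from_u64(uint64_t a)
{
    assert(a != 0);                          /* clz of 0 is undefined */
    int shift = __builtin_clzll(a);          /* 0..63 */
    Parts r = { BINARY_POINT - shift, a << shift };
    return r;
}

/* Normalized parts -> integer, truncating; saturate when out of range. */
static uint64_t to_u64(Parts p)
{
    if (p.exp < 0) {
        return 0;                            /* magnitude below 1 */
    }
    if (p.exp <= BINARY_POINT) {
        return p.frac >> (BINARY_POINT - p.exp);
    }
    return UINT64_MAX;
}
```

Because the implicit bit now sits at bit 63, an input with its top bit
set needs no separate right-shift path, and the conversion back never
has to shift left.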