From patchwork Wed May 2 15:43:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Alex_Benn=C3=A9e?= X-Patchwork-Id: 134834 Delivered-To: patch@linaro.org Received: by 10.46.151.6 with SMTP id r6csp835523lji; Wed, 2 May 2018 08:46:15 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpijdfSZYfiq6jez2U3Z/g30s5w6iYs9HhjFFMP/CWQk05q8xxPdhgYwxP0sa/EgI/OTBOo X-Received: by 2002:a1f:214b:: with SMTP id h72-v6mr18731904vkh.176.1525275975827; Wed, 02 May 2018 08:46:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525275975; cv=none; d=google.com; s=arc-20160816; b=YIeRb17eiiEodMtFoOHEXeKbwwYhwOp3NFctfUKkN9G0t2PVFI3a39aSEgQkxDJqKG HyDrsjuY2J3IhqEBgKvdXHyy2tz0I+cuL1jDuTowY2rxd30dk29o3gDmwvGwgBnka5gR 9OhmXNGvctbwLJg9ZrKw+gwUfu5iE/eCzisYYFkL5RAVYP1OkaMHtu77Nq+08uGkaS7E V0Z5ouDUGlGypzW0KwVkM9hkgK7OGzsH1ZAwuveKJ2CvKJOuWXZ69rsLpCZWr8ZiT/21 hI534Pi3zVM6gRnmdz7HMPPt0OqMIiX3uHLXWLN/Be3DyiQJJkBTQIZ6c4WIaARmErBI OYdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=WgT+Wr8Tz0UuWlgnaK7RE3vyeRRoG38WZNTN03T8YSg=; b=grXPc+Mg+g8odIHXRyUua/A3Lyd/IyRIJd5SgZ4lCkTRrhncFimfgIjYJFi4YE09wB CPuIucbcDaUjHiTarH4/4WE1gKfYJORKohQeuaygv1At41Qs900BLjrPRgU7lM7C9Rko RaMDm53IHdFZox7mpJZxgM6d7Y/4SCejv6hD4l7yqyd3YEkNMq8IfOHFCwSir6paIlp3 roPeDkB2QZLydDMQ4og98nm9Xe7J9WG/gVHpLwGWYUrVGqQOnjIIejb88oygKBpkH6cr HhD1PTJQEt4uTzusgr3pcan6U2nYC294zyaXmWbA3ygZ4Yy0uViqBU0Y1f+lYMw8t+r8 YztA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=E7Z5S+Np; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id e58si5908138uaa.158.2018.05.02.08.46.15 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 02 May 2018 08:46:15 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=E7Z5S+Np; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:51149 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDtxP-0006AR-8j for patch@linaro.org; Wed, 02 May 2018 11:46:15 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46833) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fDtv5-0004Mz-Aw for qemu-devel@nongnu.org; Wed, 02 May 2018 11:43:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fDtv2-0008MN-Ee for qemu-devel@nongnu.org; Wed, 02 May 2018 11:43:51 -0400 Received: from mail-wr0-x242.google.com ([2a00:1450:400c:c0c::242]:44201) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fDtv2-0008Lq-2p for qemu-devel@nongnu.org; Wed, 02 May 2018 11:43:48 -0400 Received: by mail-wr0-x242.google.com with SMTP id y15-v6so2808085wrg.11 for ; Wed, 02 May 2018 08:43:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WgT+Wr8Tz0UuWlgnaK7RE3vyeRRoG38WZNTN03T8YSg=; b=E7Z5S+Np6o//0v0oxnoKp2bFc5WiQ+9+u38BMp+a+qEa64Fl7XbHRib8HvH/C7WvLb 5Erj/UW8esB4dBcZlmwEU1eEfpjSzGF0Tt5E+V3EfhmsIDuyO9tN3HEYBHRTJZKcqpFx N8WAKvil2z8SEBk8PXNfI/fLdmvpjwjNw8kpk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WgT+Wr8Tz0UuWlgnaK7RE3vyeRRoG38WZNTN03T8YSg=; b=fLCfdM1eAi0iDml4xTihHeG+fR3WEMG8ux+Gs5VtIsiiT09du9sHXskbPVev7uXf6W dMYtNpHOTlOd2QIQYmiztjbyE8wMsQqzft+B6LiJNe8W2Gqz1rgB98Outoll8LRE8+hm RL4hro9mDpZ/d2wdhlPEJQdHFWaru8rvHbBRgaGbh7yT0PjQzywmiSrPWNjfshlnm9i7 tqm/fA9koMctLtVOxXnKKlaTjPVySRIEpimt1h9d23nGhkut8oZRY158yeQ1D8p0Su8d diFHXZgBsz896snqlCAsGrBlVVwCMriyMe51UfhsXgDbruV3kqB+KCN1uJb14Dva0IL9 q1DA== X-Gm-Message-State: ALQs6tB0edPwpLMUTpdma2yxSkSQVRvHlqkPLlegKsR7ba+pc6T8+zp9 qJRAlbHk7BFqN4Qc1QeRfAn9oA== X-Received: by 2002:adf:b839:: with SMTP id h54-v6mr15310886wrf.141.1525275826582; Wed, 02 May 2018 08:43:46 -0700 (PDT) Received: from zen.linaro.local ([81.128.185.34]) by smtp.gmail.com with ESMTPSA id r14-v6sm18300072wra.41.2018.05.02.08.43.44 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 02 May 2018 08:43:45 -0700 (PDT) Received: from zen.linaroharston (localhost [127.0.0.1]) by zen.linaro.local (Postfix) with ESMTP id 4D4953E034C; Wed, 2 May 2018 16:43:44 +0100 (BST) From: =?utf-8?q?Alex_Benn=C3=A9e?= To: peter.maydell@linaro.org Date: Wed, 2 May 2018 16:43:42 +0100 Message-Id: <20180502154344.10585-2-alex.bennee@linaro.org> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180502154344.10585-1-alex.bennee@linaro.org> References: <20180502154344.10585-1-alex.bennee@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c0c::242 Subject: [Qemu-devel] [PATCH v2 1/3] fpu/softfloat: re-factor float to float conversions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Alex_Benn=C3=A9e?= , qemu-arm@nongnu.org, richard.henderson@linaro.org, qemu-devel@nongnu.org, Aurelien Jarno Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" This allows us to delete a lot of additional boilerplate code which is no longer needed. Currently the ieee flag is ignored (everything is assumed to be ieee). Handling for ARM AHP will be in the next patch. Signed-off-by: Alex Bennée --- v2 - pass FloatFmt to float_to_float instead of sizes - split AHP handling to another patch - use rth's suggested re-packing (+ setting .exp) --- fpu/softfloat-specialize.h | 40 ---- fpu/softfloat.c | 443 +++++++------------------------------ include/fpu/softfloat.h | 8 +- 3 files changed, 88 insertions(+), 403 deletions(-) -- 2.17.0 diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h index 27834af0de..a20b440159 100644 --- a/fpu/softfloat-specialize.h +++ b/fpu/softfloat-specialize.h @@ -293,46 +293,6 @@ float16 float16_maybe_silence_nan(float16 a_, float_status *status) return a_; } -/*---------------------------------------------------------------------------- -| Returns the result of converting the half-precision floating-point NaN -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid -| exception is raised. -*----------------------------------------------------------------------------*/ - -static commonNaNT float16ToCommonNaN(float16 a, float_status *status) -{ - commonNaNT z; - - if (float16_is_signaling_nan(a, status)) { - float_raise(float_flag_invalid, status); - } - z.sign = float16_val(a) >> 15; - z.low = 0; - z.high = ((uint64_t) float16_val(a)) << 54; - return z; -} - -/*---------------------------------------------------------------------------- -| Returns the result of converting the canonical NaN `a' to the half- -| precision floating-point format. -*----------------------------------------------------------------------------*/ - -static float16 commonNaNToFloat16(commonNaNT a, float_status *status) -{ - uint16_t mantissa = a.high >> 54; - - if (status->default_nan_mode) { - return float16_default_nan(status); - } - - if (mantissa) { - return make_float16(((((uint16_t) a.sign) << 15) - | (0x1F << 10) | mantissa)); - } else { - return float16_default_nan(status); - } -} - #ifdef NO_SIGNALING_NANS int float32_is_quiet_nan(float32 a_, float_status *status) { diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 70e0c40a1c..28b9f4f79b 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -1194,6 +1194,90 @@ float64 float64_div(float64 a, float64 b, float_status *status) return float64_round_pack_canonical(pr, status); } +/* + * Float to Float conversions + * + * Returns the result of converting one float format to another. The + * conversion is performed according to the IEC/IEEE Standard for + * Binary Floating-Point Arithmetic. + * + * The float_to_float helper only needs to take care of raising + * invalid exceptions and handling the conversion on NaNs. + */ + +static FloatParts float_to_float(FloatParts a, + const FloatFmt *srcf, const FloatFmt *dstf, + float_status *s) +{ + if (is_nan(a.cls)) { + + if (is_snan(a.cls)) { + s->float_exception_flags |= float_flag_invalid; + } + + if (s->default_nan_mode) { + a.cls = float_class_dnan; + return a; + } + + /* + * Our only option now is to "re-pack" the NaN. As the + * canonilization process doesn't mess with fraction bits for + * NaNs we do it all here. We also reset a.exp to the + * destination format exp_max as the maybe_silence_nan code + * assumes it is correct (which is would be for non-conversions). + */ + a.frac = a.frac << (64 - srcf->frac_size) >> (64 - dstf->frac_size); + a.exp = dstf->exp_max; + a.cls = float_class_msnan; + } + + return a; +} + +float32 float16_to_float32(float16 a, bool ieee, float_status *s) +{ + FloatParts p = float16_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float16_params, &float32_params, s); + return float32_round_pack_canonical(pr, s); +} + +float64 float16_to_float64(float16 a, bool ieee, float_status *s) +{ + FloatParts p = float16_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float16_params, &float64_params, s); + return float64_round_pack_canonical(pr, s); +} + +float16 float32_to_float16(float32 a, bool ieee, float_status *s) +{ + FloatParts p = float32_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float32_params, &float16_params, s); + return float16_round_pack_canonical(pr, s); +} + +float64 float32_to_float64(float32 a, float_status *s) +{ + FloatParts p = float32_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float32_params, &float64_params, s); + return float64_round_pack_canonical(pr, s); +} + +float16 float64_to_float16(float64 a, bool ieee, float_status *s) +{ + FloatParts p = float64_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float64_params, &float16_params, s); + return float16_round_pack_canonical(pr, s); +} + +float32 float64_to_float32(float64 a, float_status *s) +{ + FloatParts p = float64_unpack_canonical(a, s); + FloatParts pr = float_to_float(p, &float64_params, &float32_params, s); + return float32_round_pack_canonical(pr, s); +} + + /* * Rounds the floating-point value `a' to an integer, and returns the * result as a floating-point value. The operation is performed @@ -3142,41 +3226,6 @@ float128 uint64_to_float128(uint64_t a, float_status *status) return normalizeRoundAndPackFloat128(0, 0x406E, a, 0, status); } - - - -/*---------------------------------------------------------------------------- -| Returns the result of converting the single-precision floating-point value -| `a' to the double-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - -float64 float32_to_float64(float32 a, float_status *status) -{ - flag aSign; - int aExp; - uint32_t aSig; - a = float32_squash_input_denormal(a, status); - - aSig = extractFloat32Frac( a ); - aExp = extractFloat32Exp( a ); - aSign = extractFloat32Sign( a ); - if ( aExp == 0xFF ) { - if (aSig) { - return commonNaNToFloat64(float32ToCommonNaN(a, status), status); - } - return packFloat64( aSign, 0x7FF, 0 ); - } - if ( aExp == 0 ) { - if ( aSig == 0 ) return packFloat64( aSign, 0, 0 ); - normalizeFloat32Subnormal( aSig, &aExp, &aSig ); - --aExp; - } - return packFloat64( aSign, aExp + 0x380, ( (uint64_t) aSig )<<29 ); - -} - /*---------------------------------------------------------------------------- | Returns the result of converting the single-precision floating-point value | `a' to the extended double-precision floating-point format. The conversion @@ -3695,173 +3744,6 @@ int float32_unordered_quiet(float32 a, float32 b, float_status *status) return 0; } - -/*---------------------------------------------------------------------------- -| Returns the result of converting the double-precision floating-point value -| `a' to the single-precision floating-point format. The conversion is -| performed according to the IEC/IEEE Standard for Binary Floating-Point -| Arithmetic. -*----------------------------------------------------------------------------*/ - -float32 float64_to_float32(float64 a, float_status *status) -{ - flag aSign; - int aExp; - uint64_t aSig; - uint32_t zSig; - a = float64_squash_input_denormal(a, status); - - aSig = extractFloat64Frac( a ); - aExp = extractFloat64Exp( a ); - aSign = extractFloat64Sign( a ); - if ( aExp == 0x7FF ) { - if (aSig) { - return commonNaNToFloat32(float64ToCommonNaN(a, status), status); - } - return packFloat32( aSign, 0xFF, 0 ); - } - shift64RightJamming( aSig, 22, &aSig ); - zSig = aSig; - if ( aExp || zSig ) { - zSig |= 0x40000000; - aExp -= 0x381; - } - return roundAndPackFloat32(aSign, aExp, zSig, status); - -} - - -/*---------------------------------------------------------------------------- -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a -| half-precision floating-point value, returning the result. After being -| shifted into the proper positions, the three fields are simply added -| together to form the result. This means that any integer portion of `zSig' -| will be added into the exponent. Since a properly normalized significand -| will have an integer portion equal to 1, the `zExp' input should be 1 less -| than the desired result exponent whenever `zSig' is a complete, normalized -| significand. -*----------------------------------------------------------------------------*/ -static float16 packFloat16(flag zSign, int zExp, uint16_t zSig) -{ - return make_float16( - (((uint32_t)zSign) << 15) + (((uint32_t)zExp) << 10) + zSig); -} - -/*---------------------------------------------------------------------------- -| Takes an abstract floating-point value having sign `zSign', exponent `zExp', -| and significand `zSig', and returns the proper half-precision floating- -| point value corresponding to the abstract input. Ordinarily, the abstract -| value is simply rounded and packed into the half-precision format, with -| the inexact exception raised if the abstract input cannot be represented -| exactly. However, if the abstract value is too large, the overflow and -| inexact exceptions are raised and an infinity or maximal finite value is -| returned. If the abstract value is too small, the input value is rounded to -| a subnormal number, and the underflow and inexact exceptions are raised if -| the abstract input cannot be represented exactly as a subnormal half- -| precision floating-point number. -| The `ieee' flag indicates whether to use IEEE standard half precision, or -| ARM-style "alternative representation", which omits the NaN and Inf -| encodings in order to raise the maximum representable exponent by one. -| The input significand `zSig' has its binary point between bits 22 -| and 23, which is 13 bits to the left of the usual location. This shifted -| significand must be normalized or smaller. If `zSig' is not normalized, -| `zExp' must be 0; in that case, the result returned is a subnormal number, -| and it must not require rounding. In the usual case that `zSig' is -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent. -| Note the slightly odd position of the binary point in zSig compared with the -| other roundAndPackFloat functions. This should probably be fixed if we -| need to implement more float16 routines than just conversion. -| The handling of underflow and overflow follows the IEC/IEEE Standard for -| Binary Floating-Point Arithmetic. -*----------------------------------------------------------------------------*/ - -static float16 roundAndPackFloat16(flag zSign, int zExp, - uint32_t zSig, flag ieee, - float_status *status) -{ - int maxexp = ieee ? 29 : 30; - uint32_t mask; - uint32_t increment; - bool rounding_bumps_exp; - bool is_tiny = false; - - /* Calculate the mask of bits of the mantissa which are not - * representable in half-precision and will be lost. - */ - if (zExp < 1) { - /* Will be denormal in halfprec */ - mask = 0x00ffffff; - if (zExp >= -11) { - mask >>= 11 + zExp; - } - } else { - /* Normal number in halfprec */ - mask = 0x00001fff; - } - - switch (status->float_rounding_mode) { - case float_round_nearest_even: - increment = (mask + 1) >> 1; - if ((zSig & mask) == increment) { - increment = zSig & (increment << 1); - } - break; - case float_round_ties_away: - increment = (mask + 1) >> 1; - break; - case float_round_up: - increment = zSign ? 0 : mask; - break; - case float_round_down: - increment = zSign ? mask : 0; - break; - default: /* round_to_zero */ - increment = 0; - break; - } - - rounding_bumps_exp = (zSig + increment >= 0x01000000); - - if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) { - if (ieee) { - float_raise(float_flag_overflow | float_flag_inexact, status); - return packFloat16(zSign, 0x1f, 0); - } else { - float_raise(float_flag_invalid, status); - return packFloat16(zSign, 0x1f, 0x3ff); - } - } - - if (zExp < 0) { - /* Note that flush-to-zero does not affect half-precision results */ - is_tiny = - (status->float_detect_tininess == float_tininess_before_rounding) - || (zExp < -1) - || (!rounding_bumps_exp); - } - if (zSig & mask) { - float_raise(float_flag_inexact, status); - if (is_tiny) { - float_raise(float_flag_underflow, status); - } - } - - zSig += increment; - if (rounding_bumps_exp) { - zSig >>= 1; - zExp++; - } - - if (zExp < -10) { - return packFloat16(zSign, 0, 0); - } - if (zExp < 0) { - zSig >>= -zExp; - zExp = 0; - } - return packFloat16(zSign, zExp, zSig >> 13); -} - /*---------------------------------------------------------------------------- | If `a' is denormal and we are in flush-to-zero mode then set the | input-denormal exception and return zero. Otherwise just return the value. @@ -3877,163 +3759,6 @@ float16 float16_squash_input_denormal(float16 a, float_status *status) return a; } -static void normalizeFloat16Subnormal(uint32_t aSig, int *zExpPtr, - uint32_t *zSigPtr) -{ - int8_t shiftCount = countLeadingZeros32(aSig) - 21; - *zSigPtr = aSig << shiftCount; - *zExpPtr = 1 - shiftCount; -} - -/* Half precision floats come in two formats: standard IEEE and "ARM" format. - The latter gains extra exponent range by omitting the NaN/Inf encodings. */ - -float32 float16_to_float32(float16 a, flag ieee, float_status *status) -{ - flag aSign; - int aExp; - uint32_t aSig; - - aSign = extractFloat16Sign(a); - aExp = extractFloat16Exp(a); - aSig = extractFloat16Frac(a); - - if (aExp == 0x1f && ieee) { - if (aSig) { - return commonNaNToFloat32(float16ToCommonNaN(a, status), status); - } - return packFloat32(aSign, 0xff, 0); - } - if (aExp == 0) { - if (aSig == 0) { - return packFloat32(aSign, 0, 0); - } - - normalizeFloat16Subnormal(aSig, &aExp, &aSig); - aExp--; - } - return packFloat32( aSign, aExp + 0x70, aSig << 13); -} - -float16 float32_to_float16(float32 a, flag ieee, float_status *status) -{ - flag aSign; - int aExp; - uint32_t aSig; - - a = float32_squash_input_denormal(a, status); - - aSig = extractFloat32Frac( a ); - aExp = extractFloat32Exp( a ); - aSign = extractFloat32Sign( a ); - if ( aExp == 0xFF ) { - if (aSig) { - /* Input is a NaN */ - if (!ieee) { - float_raise(float_flag_invalid, status); - return packFloat16(aSign, 0, 0); - } - return commonNaNToFloat16( - float32ToCommonNaN(a, status), status); - } - /* Infinity */ - if (!ieee) { - float_raise(float_flag_invalid, status); - return packFloat16(aSign, 0x1f, 0x3ff); - } - return packFloat16(aSign, 0x1f, 0); - } - if (aExp == 0 && aSig == 0) { - return packFloat16(aSign, 0, 0); - } - /* Decimal point between bits 22 and 23. Note that we add the 1 bit - * even if the input is denormal; however this is harmless because - * the largest possible single-precision denormal is still smaller - * than the smallest representable half-precision denormal, and so we - * will end up ignoring aSig and returning via the "always return zero" - * codepath. - */ - aSig |= 0x00800000; - aExp -= 0x71; - - return roundAndPackFloat16(aSign, aExp, aSig, ieee, status); -} - -float64 float16_to_float64(float16 a, flag ieee, float_status *status) -{ - flag aSign; - int aExp; - uint32_t aSig; - - aSign = extractFloat16Sign(a); - aExp = extractFloat16Exp(a); - aSig = extractFloat16Frac(a); - - if (aExp == 0x1f && ieee) { - if (aSig) { - return commonNaNToFloat64( - float16ToCommonNaN(a, status), status); - } - return packFloat64(aSign, 0x7ff, 0); - } - if (aExp == 0) { - if (aSig == 0) { - return packFloat64(aSign, 0, 0); - } - - normalizeFloat16Subnormal(aSig, &aExp, &aSig); - aExp--; - } - return packFloat64(aSign, aExp + 0x3f0, ((uint64_t)aSig) << 42); -} - -float16 float64_to_float16(float64 a, flag ieee, float_status *status) -{ - flag aSign; - int aExp; - uint64_t aSig; - uint32_t zSig; - - a = float64_squash_input_denormal(a, status); - - aSig = extractFloat64Frac(a); - aExp = extractFloat64Exp(a); - aSign = extractFloat64Sign(a); - if (aExp == 0x7FF) { - if (aSig) { - /* Input is a NaN */ - if (!ieee) { - float_raise(float_flag_invalid, status); - return packFloat16(aSign, 0, 0); - } - return commonNaNToFloat16( - float64ToCommonNaN(a, status), status); - } - /* Infinity */ - if (!ieee) { - float_raise(float_flag_invalid, status); - return packFloat16(aSign, 0x1f, 0x3ff); - } - return packFloat16(aSign, 0x1f, 0); - } - shift64RightJamming(aSig, 29, &aSig); - zSig = aSig; - if (aExp == 0 && zSig == 0) { - return packFloat16(aSign, 0, 0); - } - /* Decimal point between bits 22 and 23. Note that we add the 1 bit - * even if the input is denormal; however this is harmless because - * the largest possible single-precision denormal is still smaller - * than the smallest representable half-precision denormal, and so we - * will end up ignoring aSig and returning via the "always return zero" - * codepath. - */ - zSig |= 0x00800000; - aExp -= 0x3F1; - - return roundAndPackFloat16(aSign, aExp, zSig, ieee, status); -} - /*---------------------------------------------------------------------------- | Returns the result of converting the double-precision floating-point value | `a' to the extended double-precision floating-point format. The conversion diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index 36626a501b..01ef1c6b81 100644 --- a/include/fpu/softfloat.h +++ b/include/fpu/softfloat.h @@ -211,10 +211,10 @@ float128 uint64_to_float128(uint64_t, float_status *status); /*---------------------------------------------------------------------------- | Software half-precision conversion routines. *----------------------------------------------------------------------------*/ -float16 float32_to_float16(float32, flag, float_status *status); -float32 float16_to_float32(float16, flag, float_status *status); -float16 float64_to_float16(float64 a, flag ieee, float_status *status); -float64 float16_to_float64(float16 a, flag ieee, float_status *status); +float16 float32_to_float16(float32, bool ieee, float_status *status); +float32 float16_to_float32(float16, bool ieee, float_status *status); +float16 float64_to_float16(float64 a, bool ieee, float_status *status); +float64 float16_to_float64(float16 a, bool ieee, float_status *status); int16_t float16_to_int16(float16, float_status *status); uint16_t float16_to_uint16(float16 a, float_status *status); int16_t float16_to_int16_round_to_zero(float16, float_status *status);