[24/76] fpu: allow flushing of output denormals to be after rounding

Message ID	20250124162836.2332150-25-peter.maydell@linaro.org
State	Superseded
Headers	From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 24/76] fpu: allow flushing of output denormals to be after rounding
Date: Fri, 24 Jan 2025 16:27:44 +0000
In-Reply-To: <20250124162836.2332150-1-peter.maydell@linaro.org>
References: <20250124162836.2332150-1-peter.maydell@linaro.org>
Series	target/arm: Implement FEAT_AFP and FEAT_RPRES

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index 4cb30a48220..a4c1a4fa3b8 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -109,6 +109,12 @@ static inline void set_flush_inputs_to_zero(bool val, float_status *status)
     status->flush_inputs_to_zero = val;
 }
 
+static inline void set_float_detect_ftz(FloatFTZDetection d,
+                                        float_status *status)
+{
+    status->ftz_detection = d;
+}
+
 static inline void set_default_nan_mode(bool val, float_status *status)
 {
     status->default_nan_mode = val;
@@ -183,4 +189,9 @@ static inline bool get_default_nan_mode(const float_status *status)
     return status->default_nan_mode;
 }
 
+static inline FloatFTZDetection get_float_detect_ftz(const float_status *status)
+{
+    return status->ftz_detection;
+}
+
 #endif /* SOFTFLOAT_HELPERS_H */
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index b9b4e8e55fc..77cfed9d52e 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -304,6 +304,22 @@ typedef enum __attribute__((__packed__)) {
     float_infzeronan_suppress_invalid = (1 << 2),
 } FloatInfZeroNaNRule;
 
+/*
+ * When flush_to_zero is set, should we detect denormal results to
+ * be flushed before or after rounding? For most architectures this
+ * should be set to match the tininess_before_rounding setting,
+ * but a few architectures, e.g. MIPS MSA, detect FTZ before
+ * rounding but tininess after rounding.
+ *
+ * This enum is arranged so that the default if the target doesn't
+ * configure it matches the default for tininess_before_rounding
+ * (i.e. "after rounding").
+ */
+typedef enum __attribute__((__packed__)) {
+    detect_ftz_after_rounding = 0,
+    detect_ftz_before_rounding = 1,
+} FloatFTZDetection;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -321,6 +337,8 @@ typedef struct float_status {
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set output_denormal_flushed? */
     bool flush_to_zero;
+    /* do we detect and flush denormal results before or after rounding? */
+    FloatFTZDetection ftz_detection;
     /* should denormalised inputs go to zero and set input_denormal_flushed? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index 6ad1e466cfd..042f7e02c03 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -84,6 +84,12 @@ static inline void fp_reset(CPUMIPSState *env)
      */
     set_float_2nan_prop_rule(float_2nan_prop_s_ab,
                              &env->active_fpu.fp_status);
+    /*
+     * TODO: the spec doesn't say clearly whether FTZ happens before
+     * or after rounding for normal FPU operations.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding,
+                         &env->active_fpu.fp_status);
 }
 
 /* MSA */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index e1b898e5755..d4bffd58834 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -202,6 +202,13 @@ static void alpha_cpu_initfn(Object *obj)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN: sign bit clear, msb frac bit set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * section 4.7.7.11 says that we flush to zero for underflow cases, so
+     * this should be detect_ftz_after_rounding to match the
+     * tininess_after_rounding (which is specified in section 4.7.5).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7a83b9ee34f..0b4cd872d27 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -185,6 +185,7 @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 static void arm_set_default_fp_behaviours(float_status *s)
 {
     set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_detect_ftz(detect_ftz_before_rounding, s);
     set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, s);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, s);
diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index 239c027ec52..a0f01e3e734 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -67,6 +67,17 @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     set_float_infzeronan_rule(float_infzeronan_dnan_never, &env->fp_status);
     /* Default NaN: sign bit clear, msb-1 frac bit set */
     set_float_default_nan_pattern(0b00100000, &env->fp_status);
+    /*
+     * "PA-RISC 2.0 Architecture" says it is IMPDEF whether the flushing
+     * enabled by FPSR.D happens before or after rounding. We pick "before"
+     * for consistency with tininess detection.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+    /*
+     * TODO: "PA-RISC 2.0 Architecture" chapter 10 says that we should
+     * detect tininess before rounding, but we don't set that here so we
+     * get the default tininess after rounding.
+     */
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index de6d0b252ec..9bf23fdd0f6 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -188,6 +188,14 @@ void cpu_init_fp_statuses(CPUX86State *env)
     set_float_default_nan_pattern(0b11000000, &env->fp_status);
     set_float_default_nan_pattern(0b11000000, &env->mmx_status);
     set_float_default_nan_pattern(0b11000000, &env->sse_status);
+    /*
+     * TODO: x86 does flush-to-zero detection after rounding (the SDM
+     * section 10.2.3.3 on the FTZ bit of MXCSR says that we flush
+     * when we detect underflow, which x86 does after rounding).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->mmx_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->sse_status);
 }
 
 static inline uint8_t save_exception_flags(CPUX86State *env)
diff --git a/target/mips/msa.c b/target/mips/msa.c
index fc77bfc7b9a..2899577e8e5 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -48,6 +48,15 @@ void msa_reset(CPUMIPSState *env)
     /* tininess detected after rounding.*/
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
+    /*
+     * MSACSR.FS detects tiny results to flush to zero before rounding
+     * (per "MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD
+     * Architecture Module, Revision 1.1" section 3.5.4), even though it
+     * detects tininess after rounding for underflow purposes (section 3.4.2
+     * table 3.3).
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding,
+                         &env->active_tc.msa_fp_status);
 
     /*
      * According to MIPS specifications, if one of the two operands is
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index c05c2dc42dc..8fa41307370 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7262,6 +7262,9 @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /* Similarly for flush-to-zero */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
+
     /*
      * PowerPC propagation rules:
      * 1. A if it sNaN or qNaN
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index 8c50c7a1bc8..a18c3d81e38 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -103,6 +103,14 @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
     /* Default NaN value: sign bit clear, set frac msb */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
+    /*
+     * TODO: "RX Family RXv1 Instruction Set Architecture" is not 100% clear
+     * on whether flush-to-zero should happen before or after rounding, but
+     * section 1.3.2 says that it happens when underflow is detected, and
+     * implies that underflow is detected after rounding. So this may not
+     * be the correct setting.
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 24a22724c61..cade4463119 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -130,6 +130,14 @@ static void superh_cpu_reset_hold(Object *obj, ResetType type)
     set_default_nan_mode(1, &env->fp_status);
     /* sign bit clear, set all frac bits other than msb */
     set_float_default_nan_pattern(0b00111111, &env->fp_status);
+    /*
+     * TODO: "SH-4 CPU Core Architecture ADCS 7182230F" doesn't say whether
+     * it detects tininess before or after rounding. Section 6.4 is clear
+     * that flush-to-zero happens when the result underflows, though, so
+     * either this should be "detect ftz after rounding" or else we should
+     * be setting "detect tininess before rounding".
+     */
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
 }
 
 static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index e8b0ec51611..df4a2b5b9d8 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -116,6 +116,7 @@ void fpu_set_state(CPUTriCoreState *env)
     set_flush_inputs_to_zero(1, &env->fp_status);
     set_flush_to_zero(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &env->fp_status);
     set_default_nan_mode(1, &env->fp_status);
     /* Default NaN pattern: sign bit clear, frac msb set */
     set_float_default_nan_pattern(0b01000000, &env->fp_status);
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index eacb39b99cb..9e3694bc4e1 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -496,6 +496,7 @@ static void run_bench(void)
     set_float_3nan_prop_rule(float_3nan_prop_s_cab, &soft_status);
     set_float_infzeronan_rule(float_infzeronan_dnan_if_qnan, &soft_status);
     set_float_default_nan_pattern(0b01000000, &soft_status);
+    set_float_detect_ftz(detect_ftz_before_rounding, &soft_status);
 
     f = bench_funcs[operation][precision];
     g_assert(f);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 0122b35008a..324e67de259 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -334,7 +334,8 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             p->frac_lo &= ~round_mask;
         }
         frac_shr(p, frac_shift);
-    } else if (s->flush_to_zero) {
+    } else if (s->flush_to_zero &&
+               s->ftz_detection == detect_ftz_before_rounding) {
         flags |= float_flag_output_denormal_flushed;
         p->cls = float_class_zero;
         exp = 0;
@@ -381,11 +382,19 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) && !fmt->m68k_denormal;
         frac_shr(p, frac_shift);
 
-        if (is_tiny && (flags & float_flag_inexact)) {
-            flags |= float_flag_underflow;
-        }
-        if (exp == 0 && frac_eqz(p)) {
-            p->cls = float_class_zero;
+        if (is_tiny) {
+            if (s->flush_to_zero) {
+                assert(s->ftz_detection == detect_ftz_after_rounding);
+                flags |= float_flag_output_denormal_flushed;
+                p->cls = float_class_zero;
+                exp = 0;
+                frac_clear(p);
+            } else if (flags & float_flag_inexact) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac_eqz(p)) {
+                p->cls = float_class_zero;
+            }
         }
     }
     p->exp = exp;
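
The difference between the two FloatFTZDetection settings only shows up for results that are just below the smallest normal number but round up to it. The following is a minimal standalone sketch of that distinction for float32, not QEMU code: the names ToyFTZDetection and toy_round_ftz are hypothetical, the "exact" result is approximated by a double, and the "after rounding" case is simplified to use the host's ordinary double-to-float conversion rather than rounding with an unbounded exponent range.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's FloatFTZDetection enum. */
typedef enum {
    toy_ftz_after_rounding = 0,
    toy_ftz_before_rounding = 1,
} ToyFTZDetection;

/*
 * Round an "exact" result (approximated here by a double) to float32
 * with flush-to-zero enabled. *flushed is set to true when the output
 * was replaced by a signed zero.
 *
 * Assumption: the host converts double to float with round-to-nearest
 * and without its own flush-to-zero mode enabled.
 */
static float toy_round_ftz(double exact, ToyFTZDetection d, bool *flushed)
{
    assert(flushed != NULL);
    *flushed = false;

    if (exact == 0.0) {
        return (float)exact;
    }

    double mag = exact < 0 ? -exact : exact;
    float zero = exact < 0 ? -0.0f : 0.0f;

    if (d == toy_ftz_before_rounding) {
        /* Decide on the unrounded value: anything below FLT_MIN is flushed. */
        if (mag < (double)FLT_MIN) {
            *flushed = true;
            return zero;
        }
        return (float)exact;
    }

    /* Decide on the rounded value: flush only if it is still denormal. */
    volatile float rounded = (float)exact;
    float rmag = rounded < 0 ? -rounded : rounded;
    if (rmag < FLT_MIN) {
        *flushed = true;
        return zero;
    }
    return rounded;
}
```

With an input such as 0x1p-126 - 0x1p-156, the "before rounding" setting flushes to zero because the unrounded value is below FLT_MIN, while the "after rounding" setting returns FLT_MIN because the value rounds up to the smallest normal. An input that stays denormal after rounding, such as 0x1p-140, is flushed under either setting.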
