[28/76] target/arm: Implement FPCR.FIZ handling

Message ID	20250124162836.2332150-29-peter.maydell@linaro.org
State	Superseded
Headers	From: Peter Maydell <peter.maydell@linaro.org>
	To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
	Subject: [PATCH 28/76] target/arm: Implement FPCR.FIZ handling
	Date: Fri, 24 Jan 2025 16:27:48 +0000
	Message-Id: <20250124162836.2332150-29-peter.maydell@linaro.org>
	In-Reply-To: <20250124162836.2332150-1-peter.maydell@linaro.org>
	References: <20250124162836.2332150-1-peter.maydell@linaro.org>
Series	target/arm: Implement FEAT_AFP and FEAT_RPRES

Commit Message

Peter Maydell Jan. 24, 2025, 4:27 p.m. UTC

Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
flushing of single and double precision denormal inputs to zero for
AArch64 floating point instructions.  (For half-precision, the
existing FPCR.FZ16 control remains the only one.)

FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
only because of FPCR.FIZ then we should *not* set the cumulative
exception bit FPSR.IDC.

FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
applies when FPCR.AH is 0.

We can implement this by setting the "flush inputs to zero" state
appropriately when FPCR is written, and by not reflecting the
float_flag_input_denormal_flushed status flag into FPSR reads when it
is the result only of FPCR.FIZ.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 48 insertions(+), 10 deletions(-)
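
The flush rule in the commit message can be sketched as a standalone C
fragment. This is only an illustration, not the QEMU code: the SK_ masks and
sk_ functions are hypothetical names made up for this example, and the bit
positions are assumed to follow the architectural FPCR layout (FIZ bit 0,
AH bit 1, FZ bit 24).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in masks for this sketch only (assumed: FIZ bit 0, AH bit 1, FZ bit 24). */
#define SK_FPCR_FIZ (UINT32_C(1) << 0)
#define SK_FPCR_AH  (UINT32_C(1) << 1)
#define SK_FPCR_FZ  (UINT32_C(1) << 24)

/* AArch32: FIZ and AH have no effect, so FZ alone controls input flushing. */
static inline bool sk_a32_flush_inputs(uint32_t fpcr)
{
    return (fpcr & SK_FPCR_FZ) != 0;
}

/* AArch64: flush if FIZ = 1, or if FZ = 1 while AH = 0. */
static inline bool sk_a64_flush_inputs(uint32_t fpcr)
{
    bool fz_applies = (fpcr & (SK_FPCR_FZ | SK_FPCR_AH)) == SK_FPCR_FZ;
    return (fpcr & SK_FPCR_FIZ) != 0 || fz_applies;
}

/* Walk all eight FIZ/AH/FZ combinations and check the two invariants. */
static inline void sk_check_all(void)
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t fpcr = ((i & 1) ? SK_FPCR_FIZ : 0) |
                        ((i & 2) ? SK_FPCR_AH : 0) |
                        ((i & 4) ? SK_FPCR_FZ : 0);
        /* FIZ = 1 always flushes A64 inputs, whatever FZ and AH say. */
        if (fpcr & SK_FPCR_FIZ) {
            assert(sk_a64_flush_inputs(fpcr));
        }
        /* With FIZ = 0 and AH = 1, FZ no longer flushes A64 inputs. */
        if (!(fpcr & SK_FPCR_FIZ) && (fpcr & SK_FPCR_AH)) {
            assert(!sk_a64_flush_inputs(fpcr));
        }
    }
}
```

Keeping the AArch32 and AArch64 predicates separate mirrors the way the patch
sets the flush state on fp_status_a32 and fp_status_a64 independently.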

Comments

Richard Henderson Jan. 25, 2025, 5:25 p.m. UTC | #1

On 1/24/25 08:27, Peter Maydell wrote:
> Part of FEAT_AFP is the new control bit FPCR.FIZ.  This bit affects
> flushing of single and double precision denormal inputs to zero for
> AArch64 floating point instructions.  (For half-precision, the
> existing FPCR.FZ16 control remains the only one.)
> 
> FPCR.FIZ differs from FPCR.FZ in that if we flush an input denormal
> only because of FPCR.FIZ then we should *not* set the cumulative
> exception bit FPSR.IDC.
> 
> FEAT_AFP also defines that in AArch64 the existing FPCR.FZ only
> applies when FPCR.AH is 0.
> 
> We can implement this by setting the "flush inputs to zero" state
> appropriately when FPCR is written, and by not reflecting the
> float_flag_input_denormal_flushed status flag into FPSR reads when it
> is the result only of FPCR.FIZ.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/vfp_helper.c | 58 ++++++++++++++++++++++++++++++++++-------
>   1 file changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 8c79ab4fc8a..5a0b389f7a3 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -61,19 +61,29 @@ static inline uint32_t vfp_exceptbits_from_host(int host_bits)
>   
>   static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
>   {
> -    uint32_t i = 0;
> +    uint32_t a32_flags = 0, a64_flags = 0;
>   
> -    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
> -    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
> -    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
> +    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
> +    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
>       /* FZ16 does not generate an input denormal exception.  */
> -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
> +    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
>             & ~float_flag_input_denormal_flushed);
> -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
> +    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
>             & ~float_flag_input_denormal_flushed);
> -    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
> +
> +    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
> +    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
>             & ~float_flag_input_denormal_flushed);
> -    return vfp_exceptbits_from_host(i);
> +    /*
> +     * Flushing an input denormal only because FPCR.FIZ == 1 does
> +     * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
> +     * We only do this for the a64 flags because FIZ has no effect
> +     * on AArch32 even if it is set.
> +     */
> +    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
> +        a64_flags &= ~float_flag_input_denormal_flushed;
> +    }

It might be worth pointing to the FPUnpackBase pseudocode to say that if both FZ and
FIZ are set, FZ takes precedence for setting IDC.

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
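
The precedence point in the review comment can be illustrated with a small
standalone C sketch of the FPSR read-side filter. It is a simplified model
under stated assumptions, not the QEMU helper: the SK_ masks, the host flag
bits, and sk_filter_a64_flags() are hypothetical names for this example only,
and the FPCR bit positions are assumed to be FIZ bit 0, AH bit 1, FZ bit 24.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in FPCR masks for this sketch only. */
#define SK_FPCR_FIZ (UINT32_C(1) << 0)
#define SK_FPCR_AH  (UINT32_C(1) << 1)
#define SK_FPCR_FZ  (UINT32_C(1) << 24)

/* Hypothetical host-side flag bits; the values are arbitrary. */
#define SK_FLAG_INPUT_DENORMAL_FLUSHED (UINT32_C(1) << 0)
#define SK_FLAG_INEXACT                (UINT32_C(1) << 1)

/*
 * Drop the "input denormal flushed" flag unless the flush is one that
 * FZ is responsible for (FZ = 1 with AH = 0). When FZ and FIZ are both
 * set, the FZ condition still holds, so the flag is kept.
 */
static inline uint32_t sk_filter_a64_flags(uint32_t fpcr, uint32_t host_flags)
{
    if ((fpcr & (SK_FPCR_FZ | SK_FPCR_AH)) != SK_FPCR_FZ) {
        host_flags &= ~SK_FLAG_INPUT_DENORMAL_FLUSHED;
    }
    return host_flags;
}
```

In this model, FZ = 1 together with FIZ = 1 (and AH = 0) keeps the flag, while
FIZ = 1 alone or any setting with AH = 1 drops it; unrelated flags such as
inexact pass through unchanged.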

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 8c79ab4fc8a..5a0b389f7a3 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -61,19 +61,29 @@  static inline uint32_t vfp_exceptbits_from_host(int host_bits)
 
 static uint32_t vfp_get_fpsr_from_host(CPUARMState *env)
 {
-    uint32_t i = 0;
+    uint32_t a32_flags = 0, a64_flags = 0;
 
-    i |= get_float_exception_flags(&env->vfp.fp_status_a32);
-    i |= get_float_exception_flags(&env->vfp.fp_status_a64);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    a32_flags |= get_float_exception_flags(&env->vfp.fp_status_a32);
+    a32_flags |= get_float_exception_flags(&env->vfp.standard_fp_status);
     /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
+    a32_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a32)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
+    a32_flags |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
           & ~float_flag_input_denormal_flushed);
-    i |= (get_float_exception_flags(&env->vfp.standard_fp_status_f16)
+
+    a64_flags |= get_float_exception_flags(&env->vfp.fp_status_a64);
+    a64_flags |= (get_float_exception_flags(&env->vfp.fp_status_f16_a64)
           & ~float_flag_input_denormal_flushed);
-    return vfp_exceptbits_from_host(i);
+    /*
+     * Flushing an input denormal only because FPCR.FIZ == 1 does
+     * not set FPSR.IDC. So squash it unless (FPCR.AH == 0 && FPCR.FZ == 1).
+     * We only do this for the a64 flags because FIZ has no effect
+     * on AArch32 even if it is set.
+     */
+    if ((env->vfp.fpcr & (FPCR_FZ | FPCR_AH)) != FPCR_FZ) {
+        a64_flags &= ~float_flag_input_denormal_flushed;
+    }
+    return vfp_exceptbits_from_host(a32_flags | a64_flags);
 }
 
 static void vfp_clear_float_status_exc_flags(CPUARMState *env)
@@ -91,6 +101,17 @@  static void vfp_clear_float_status_exc_flags(CPUARMState *env)
     set_float_exception_flags(0, &env->vfp.standard_fp_status_f16);
 }
 
+static void vfp_sync_and_clear_float_status_exc_flags(CPUARMState *env)
+{
+    /*
+     * Synchronize any pending exception-flag information in the
+     * float_status values into env->vfp.fpsr, and then clear out
+     * the float_status data.
+     */
+    env->vfp.fpsr |= vfp_get_fpsr_from_host(env);
+    vfp_clear_float_status_exc_flags(env);
+}
+
 static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     uint64_t changed = env->vfp.fpcr;
@@ -130,9 +151,18 @@  static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
     if (changed & FPCR_FZ) {
         bool ftz_enabled = val & FPCR_FZ;
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
         set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a64);
+        /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_a32);
+    }
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        /*
+         * A64: Flush denormalized inputs to zero if FPCR.FIZ = 1, or
+         * both FPCR.AH = 0 and FPCR.FZ = 1.
+         */
+        bool fitz_enabled = (val & FPCR_FIZ) ||
+            (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ;
+        set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status_a64);
     }
     if (changed & FPCR_DN) {
         bool dnan_enabled = val & FPCR_DN;
@@ -141,6 +171,14 @@  static void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask)
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a32);
         set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16_a64);
     }
+    /*
+     * If any bits changed that we look at in vfp_get_fpsr_from_host(),
+     * we must sync the float_status flags into vfp.fpsr now (under the
+     * old regime) before we update vfp.fpcr.
+     */
+    if (changed & (FPCR_FZ | FPCR_AH | FPCR_FIZ)) {
+        vfp_sync_and_clear_float_status_exc_flags(env);
+    }
 }
 
 #else
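
The reason for syncing the flags before FPCR changes can be shown with a toy
model in standalone C. Everything here is an assumption made for illustration:
SkFpState and the sk_ functions are hypothetical, a single boolean stands in
for the pending float_status flag, and the bit positions (FPCR.FIZ bit 0,
FPCR.AH bit 1, FPCR.FZ bit 24, FPSR.IDC bit 7) are assumed to match the
architectural layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FPCR_FIZ (UINT32_C(1) << 0)
#define SK_FPCR_AH  (UINT32_C(1) << 1)
#define SK_FPCR_FZ  (UINT32_C(1) << 24)
#define SK_FPSR_IDC (UINT32_C(1) << 7)

typedef struct {
    uint32_t fpcr;       /* current control register value */
    uint32_t fpsr;       /* cumulative flags already folded in */
    bool pending_flush;  /* an A64 input was flushed since the last sync */
} SkFpState;

/* Interpret the pending flag under the FPCR value that is current now. */
static inline uint32_t sk_pending_bits(const SkFpState *s)
{
    bool fz_applies = (s->fpcr & (SK_FPCR_FZ | SK_FPCR_AH)) == SK_FPCR_FZ;
    return (s->pending_flush && fz_applies) ? SK_FPSR_IDC : 0;
}

/* Fold pending information into fpsr, then clear it. */
static inline void sk_sync_and_clear(SkFpState *s)
{
    s->fpsr |= sk_pending_bits(s);
    s->pending_flush = false;
}

/* sync_first = false models the ordering bug the patch avoids. */
static inline void sk_write_fpcr(SkFpState *s, uint32_t val, bool sync_first)
{
    uint32_t changed = s->fpcr ^ val;
    if (sync_first && (changed & (SK_FPCR_FZ | SK_FPCR_AH | SK_FPCR_FIZ))) {
        sk_sync_and_clear(s);   /* still under the old FPCR value */
    }
    s->fpcr = val;
}

static inline uint32_t sk_read_fpsr(const SkFpState *s)
{
    return s->fpsr | sk_pending_bits(s);
}
```

In this model, a flush that happened under FZ = 1 is still reported as IDC
after the guest switches to FIZ-only mode when the sync runs first. Without
the sync, the pending flag is reinterpreted under the new FPCR value and the
IDC is lost, or in the reverse direction a spurious IDC appears.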
