
[39/76] target/arm: Handle FPCR.NEP for BFCVT scalar

Message ID	20250124162836.2332150-40-peter.maydell@linaro.org
State	Superseded
Headers	From: Peter Maydell <peter.maydell@linaro.org> To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 39/76] target/arm: Handle FPCR.NEP for BFCVT scalar Date: Fri, 24 Jan 2025 16:27:59 +0000 Message-Id: <20250124162836.2332150-40-peter.maydell@linaro.org> In-Reply-To: <20250124162836.2332150-1-peter.maydell@linaro.org> References: <20250124162836.2332150-1-peter.maydell@linaro.org>
Series	target/arm: Implement FEAT_AFP and FEAT_RPRES

Commit Message

Peter Maydell Jan. 24, 2025, 4:27 p.m. UTC

Currently we implement BFCVT scalar via do_fp1_scalar().  This works
even though BFCVT is a narrowing operation from 32 to 16 bits,
because we can use write_fp_sreg() for float16. However, FPCR.NEP
support requires that we use write_fp_hreg_merging() for float16
outputs, so we can't continue to borrow the non-narrowing
do_fp1_scalar() function for this. Split out trans_BFCVT_s()
into its own implementation that honours FPCR.NEP.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
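
Note (not part of the commit): the merging behaviour described above can be
illustrated with a small standalone model. This is only a sketch under stated
assumptions, not QEMU code: the VReg type and model_write_hreg_merging() are
hypothetical names invented for illustration, and the model assumes that with
FPCR.NEP = 0 the lanes above the scalar result are zeroed, while with
FPCR.NEP = 1 they are taken from a merge source register (for BFCVT the patch
passes a->rd for both, so the destination keeps its own upper lanes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one 128-bit vector register as eight 16-bit lanes. */
typedef struct {
    uint16_t h[8];
} VReg;

/*
 * Hypothetical stand-in for a "merging" halfword write.
 * Lane 0 of regs[rd] receives the 16-bit scalar result.
 * nep == false: all other lanes of rd are zeroed.
 * nep == true:  all other lanes of rd are copied from regs[rm].
 */
static void model_write_hreg_merging(VReg *regs, int rd, int rm,
                                     uint16_t value, bool nep)
{
    VReg out;

    assert(rd >= 0 && rm >= 0);

    if (nep) {
        out = regs[rm];               /* keep the merge source's lanes */
    } else {
        memset(&out, 0, sizeof(out)); /* non-merging: clear everything */
    }
    out.h[0] = value;                 /* scalar result goes in lane 0 */
    regs[rd] = out;
}
```

Calling the model with rd == rm mirrors the write_fp_hreg_merging(s, a->rd,
a->rd, t32) call in the patch: with nep set, lanes 1..7 of the destination
survive the conversion instead of being cleared.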

Comments

Richard Henderson Jan. 25, 2025, 5:55 p.m. UTC | #1

On 1/24/25 08:27, Peter Maydell wrote:
> Currently we implement BFCVT scalar via do_fp1_scalar().  This works
> even though BFCVT is a narrowing operation from 32 to 16 bits,
> because we can use write_fp_sreg() for float16. However, FPCR.NEP
> support requires that we use write_fp_hreg_merging() for float16
> outputs, so we can't continue to borrow the non-narrowing
> do_fp1_scalar() function for this. Split out trans_BFCVT_s()
> into its own implementation that honours FPCR.NEP.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-a64.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


Patch

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 66c214ed278..944bdf8cafe 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8582,10 +8582,27 @@  static const FPScalar1 f_scalar_frintx = {
 };
 TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1)
 
-static const FPScalar1 f_scalar_bfcvt = {
-    .gen_s = gen_helper_bfcvt,
-};
-TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar_ah, a, &f_scalar_bfcvt, -1)
+static bool trans_BFCVT_s(DisasContext *s, arg_rr_e *a)
+{
+    ARMFPStatusFlavour fpsttype = s->fpcr_ah ? FPST_FPCR_AH : FPST_FPCR_A64;
+    TCGv_i32 t32;
+    int check;
+
+    if (!dc_isar_feature(aa64_bf16, s)) {
+        return false;
+    }
+
+    check = fp_access_check_scalar_hsd(s, a->esz);
+
+    if (check <= 0) {
+        return check == 0;
+    }
+
+    t32 = read_fp_sreg(s, a->rn);
+    gen_helper_bfcvt(t32, t32, fpstatus_ptr(fpsttype));
+    write_fp_hreg_merging(s, a->rd, a->rd, t32);
+    return true;
+}
 
 static const FPScalar1 f_scalar_frint32 = {
     NULL,