[15/57] target/arm: Expand vfp neg and abs inline

Message ID	20240506010403.6204-16-richard.henderson@linaro.org
State	Superseded
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Cc: qemu-arm@nongnu.org Subject: [PATCH 15/57] target/arm: Expand vfp neg and abs inline Date: Sun, 5 May 2024 18:03:21 -0700 Message-Id: <20240506010403.6204-16-richard.henderson@linaro.org> In-Reply-To: <20240506010403.6204-1-richard.henderson@linaro.org> References: <20240506010403.6204-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::102e; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org
Series	target/arm: Convert a64 advsimd to decodetree (part 1) \| expand [00/57] target/arm: Convert a64 advsimd to decodetree (part 1) [01/57] target/arm: Split out gengvec.c [02/57] target/arm: Split out gengvec64.c [03/57] target/arm: Convert Cryptographic AES to decodetree [04/57] target/arm: Convert Cryptographic 3-register SHA to decodetree [05/57] target/arm: Convert Cryptographic 2-register SHA to decodetree [06/57] target/arm: Convert Cryptographic 3-register SHA512 to decodetree [07/57] target/arm: Convert Cryptographic 2-register SHA512 to decodetree [08/57] target/arm: Convert Cryptographic 4-register to decodetree [09/57] target/arm: Convert Cryptographic 3-register, imm2 to decodetree [10/57] target/arm: Convert XAR to decodetree [11/57] target/arm: Convert Advanced SIMD copy to decodetree [12/57] target/arm: Convert FMULX to decodetree [13/57] target/arm: Convert FADD, FSUB, FDIV, FMUL to decodetree [14/57] target/arm: Convert FMAX, FMIN, FMAXNM, FMINNM to decodetree [15/57] target/arm: Expand vfp neg and abs inline [16/57] target/arm: Convert FNMUL to decodetree [17/57] target/arm: Convert FMLA, FMLS to decodetree [18/57] target/arm: Convert FCMEQ, FCMGE, FCMGT, FACGE, FACGT to decodetree [19/57] target/arm: Convert FABD to decodetree [20/57] target/arm: Convert FRECPS, FRSQRTS to decodetree [21/57] target/arm: Convert FADDP to decodetree [22/57] target/arm: Convert FMAXP, FMINP, FMAXNMP, FMINNMP to decodetree [23/57] target/arm: Use gvec for neon faddp, fmaxp, fminp [24/57] target/arm: Convert ADDP to decodetree [25/57] target/arm: Use gvec for neon padd [26/57] target/arm: Convert SMAXP, SMINP, UMAXP, UMINP to decodetree [27/57] target/arm: Use gvec for neon pmax, pmin [28/57] target/arm: Convert FMLAL, FMLSL to decodetree [29/57] target/arm: Convert disas_simd_3same_logic to decodetree [30/57] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB [31/57] target/arm: Convert SUQADD and USQADD to gvec [32/57] target/arm: Inline scalar SUQADD and USQADD [33/57] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB [34/57] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree [35/57] target/arm: Convert SUQADD, USQADD to decodetree [36/57] target/arm: Convert SSHL, USHL to decodetree [37/57] target/arm: Convert SRSHL and URSHL (register) to gvec [38/57] target/arm: Convert SRSHL, URSHL to decodetree [39/57] target/arm: Convert SQSHL and UQSHL (register) to gvec [40/57] target/arm: Convert SQSHL, UQSHL to decodetree [41/57] target/arm: Convert SQRSHL and UQRSHL (register) to gvec [42/57] target/arm: Convert SQRSHL, UQRSHL to decodetree [43/57] target/arm: Convert ADD, SUB (vector) to decodetree [44/57] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ to decodetree [45/57] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32,i64} [46/57] target/arm: Convert SHADD, UHADD to gvec [47/57] target/arm: Convert SHADD, UHADD to decodetree [48/57] target/arm: Convert SHSUB, UHSUB to gvec [49/57] target/arm: Convert SHSUB, UHSUB to decodetree [50/57] target/arm: Convert SRHADD, URHADD to gvec [51/57] target/arm: Convert SRHADD, URHADD to decodetree [52/57] target/arm: Convert SMAX, SMIN, UMAX, UMIN to decodetree [53/57] target/arm: Convert SABA, SABD, UABA, UABD to decodetree [54/57] target/arm: Convert MUL, PMUL to decodetree [55/57] target/arm: Convert MLA, MLS to decodetree [56/57] target/arm: Tidy SQDMULH, SQRDMULH (vector) [57/57] target/arm: Convert SQDMULH, SQRDMULH to decodetree

Message ID

20240506010403.6204-16-richard.henderson@linaro.org

State

Superseded

Headers

Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org
Subject: [PATCH 15/57] target/arm: Expand vfp neg and abs inline
Date: Sun,  5 May 2024 18:03:21 -0700
Message-Id: <20240506010403.6204-16-richard.henderson@linaro.org>
In-Reply-To: <20240506010403.6204-1-richard.henderson@linaro.org>
References: <20240506010403.6204-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::102e;
 envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102e.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org

Series

target/arm: Convert a64 advsimd to decodetree (part 1) | expand

Commit Message

Richard Henderson May 6, 2024, 1:03 a.m. UTC

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h            |  6 ----
 target/arm/tcg/translate.h     | 30 +++++++++++++++++++
 target/arm/tcg/translate-a64.c | 44 +++++++++++++--------------
 target/arm/tcg/translate-vfp.c | 54 +++++++++++++++++-----------------
 target/arm/vfp_helper.c        | 30 -------------------
 5 files changed, 79 insertions(+), 85 deletions(-)

Comments

Peter Maydell May 23, 2024, 11:45 a.m. UTC | #1

On Mon, 6 May 2024 at 02:06, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h            |  6 ----
>  target/arm/tcg/translate.h     | 30 +++++++++++++++++++
>  target/arm/tcg/translate-a64.c | 44 +++++++++++++--------------
>  target/arm/tcg/translate-vfp.c | 54 +++++++++++++++++-----------------
>  target/arm/vfp_helper.c        | 30 -------------------
>  5 files changed, 79 insertions(+), 85 deletions(-)

> +static inline void gen_vfp_absh(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_andi_i32(d, s, INT16_MAX);
> +}
> +
> +static inline void gen_vfp_abss(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_andi_i32(d, s, INT32_MAX);
> +}
> +
> +static inline void gen_vfp_absd(TCGv_i64 d, TCGv_i64 s)
> +{
> +    tcg_gen_andi_i64(d, s, INT64_MAX);
> +}
> +
> +static inline void gen_vfp_negh(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_xori_i32(d, s, 1u << 15);
> +}
> +
> +static inline void gen_vfp_negs(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_xori_i32(d, s, 1u << 31);
> +}
> +
> +static inline void gen_vfp_negd(TCGv_i64 d, TCGv_i64 s)
> +{
> +    tcg_gen_xori_i64(d, s, 1ull << 63);
> +}

These will get a bit more complicated when we get to
handling FEAT_AFP, where abs or neg on a NaN doesn't
change its sign bit. But I guess we'll cross that bridge
when we get to it.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

Peter Maydell May 23, 2024, 4:29 p.m. UTC | #2

On Mon, 6 May 2024 at 02:06, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h            |  6 ----
>  target/arm/tcg/translate.h     | 30 +++++++++++++++++++
>  target/arm/tcg/translate-a64.c | 44 +++++++++++++--------------
>  target/arm/tcg/translate-vfp.c | 54 +++++++++++++++++-----------------
>  target/arm/vfp_helper.c        | 30 -------------------
>  5 files changed, 79 insertions(+), 85 deletions(-)

> +static inline void gen_vfp_absh(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_andi_i32(d, s, INT16_MAX);
> +}
> +
> +static inline void gen_vfp_abss(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_andi_i32(d, s, INT32_MAX);
> +}
> +
> +static inline void gen_vfp_absd(TCGv_i64 d, TCGv_i64 s)
> +{
> +    tcg_gen_andi_i64(d, s, INT64_MAX);
> +}
> +
> +static inline void gen_vfp_negh(TCGv_i32 d, TCGv_i32 s)
> +{
> +    tcg_gen_xori_i32(d, s, 1u << 15);

Just noticed something here -- we take a 32-bit input,
so if there is junk in the top half, we will leave it.
In contrast the old helper function:

> -dh_ctype_f16 VFP_HELPER(neg, h)(dh_ctype_f16 a)
> -{
> -    return float16_chs(a);
> -}

passed the value through the float16 type which the
float16_chs() function passes and returns, which is
a uint16_t. So it will zero out the top half, which
I think is the semantics we want for halfprec ops.

Maybe

static inline void gen_vfp_negh(TCGv_i32 d, TCGv_i32 s)
{
    tcg_gen_xori_i32(d, s, 1u << 15);
    tcg_gen_ext16u_i32(d, d);
}

?

There are probably places where the zero-extend
is redundant, but there are definitely places where
it is not, eg A32 VFP vneg.f16, because do_vfp_2op_hp()
doesn't do a "load a 16 bit float value", it does
vfp_load_reg32(), which is just a ld_i32.

Or we could change the VFP codegen to work like the A64
codegen, which has a read_fp_hreg() that does a
ld16u to read the input value.

gen_vfp_absh() doesn't have this problem because the
AND operation clears out the top half anyway.

thanks
-- PMM

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 7ee15b9651..0fd01c9c52 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -132,12 +132,6 @@  DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
 DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
-DEF_HELPER_1(vfp_negh, f16, f16)
-DEF_HELPER_1(vfp_negs, f32, f32)
-DEF_HELPER_1(vfp_negd, f64, f64)
-DEF_HELPER_1(vfp_absh, f16, f16)
-DEF_HELPER_1(vfp_abss, f32, f32)
-DEF_HELPER_1(vfp_absd, f64, f64)
 DEF_HELPER_2(vfp_sqrth, f16, f16, env)
 DEF_HELPER_2(vfp_sqrts, f32, f32, env)
 DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index ecfa242eef..b05a9eb668 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -406,6 +406,36 @@  static inline void gen_swstep_exception(DisasContext *s, int isv, int ex)
  */
 uint64_t vfp_expand_imm(int size, uint8_t imm8);
 
+static inline void gen_vfp_absh(TCGv_i32 d, TCGv_i32 s)
+{
+    tcg_gen_andi_i32(d, s, INT16_MAX);
+}
+
+static inline void gen_vfp_abss(TCGv_i32 d, TCGv_i32 s)
+{
+    tcg_gen_andi_i32(d, s, INT32_MAX);
+}
+
+static inline void gen_vfp_absd(TCGv_i64 d, TCGv_i64 s)
+{
+    tcg_gen_andi_i64(d, s, INT64_MAX);
+}
+
+static inline void gen_vfp_negh(TCGv_i32 d, TCGv_i32 s)
+{
+    tcg_gen_xori_i32(d, s, 1u << 15);
+}
+
+static inline void gen_vfp_negs(TCGv_i32 d, TCGv_i32 s)
+{
+    tcg_gen_xori_i32(d, s, 1u << 31);
+}
+
+static inline void gen_vfp_negd(TCGv_i64 d, TCGv_i64 s)
+{
+    tcg_gen_xori_i64(d, s, 1ull << 63);
+}
+
 /* Vector operations shared between ARM and AArch64.  */
 void gen_gvec_ceq0(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
                    uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c7b379385f..e6c3da5b2a 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6592,10 +6592,10 @@  static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
         tcg_gen_mov_i32(tcg_res, tcg_op);
         break;
     case 0x1: /* FABS */
-        tcg_gen_andi_i32(tcg_res, tcg_op, 0x7fff);
+        gen_vfp_absh(tcg_res, tcg_op);
         break;
     case 0x2: /* FNEG */
-        tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
+        gen_vfp_negh(tcg_res, tcg_op);
         break;
     case 0x3: /* FSQRT */
         fpst = fpstatus_ptr(FPST_FPCR_F16);
@@ -6646,10 +6646,10 @@  static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
         tcg_gen_mov_i32(tcg_res, tcg_op);
         goto done;
     case 0x1: /* FABS */
-        gen_helper_vfp_abss(tcg_res, tcg_op);
+        gen_vfp_abss(tcg_res, tcg_op);
         goto done;
     case 0x2: /* FNEG */
-        gen_helper_vfp_negs(tcg_res, tcg_op);
+        gen_vfp_negs(tcg_res, tcg_op);
         goto done;
     case 0x3: /* FSQRT */
         gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env);
@@ -6721,10 +6721,10 @@  static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn)
 
     switch (opcode) {
     case 0x1: /* FABS */
-        gen_helper_vfp_absd(tcg_res, tcg_op);
+        gen_vfp_absd(tcg_res, tcg_op);
         goto done;
     case 0x2: /* FNEG */
-        gen_helper_vfp_negd(tcg_res, tcg_op);
+        gen_vfp_negd(tcg_res, tcg_op);
         goto done;
     case 0x3: /* FSQRT */
         gen_helper_vfp_sqrtd(tcg_res, tcg_op, tcg_env);
@@ -6950,7 +6950,7 @@  static void handle_fp_2src_single(DisasContext *s, int opcode,
     switch (opcode) {
     case 0x8: /* FNMUL */
         gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
-        gen_helper_vfp_negs(tcg_res, tcg_res);
+        gen_vfp_negs(tcg_res, tcg_res);
         break;
     default:
     case 0x0: /* FMUL */
@@ -6984,7 +6984,7 @@  static void handle_fp_2src_double(DisasContext *s, int opcode,
     switch (opcode) {
     case 0x8: /* FNMUL */
         gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
-        gen_helper_vfp_negd(tcg_res, tcg_res);
+        gen_vfp_negd(tcg_res, tcg_res);
         break;
     default:
     case 0x0: /* FMUL */
@@ -7018,7 +7018,7 @@  static void handle_fp_2src_half(DisasContext *s, int opcode,
     switch (opcode) {
     case 0x8: /* FNMUL */
         gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
-        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+        gen_vfp_negh(tcg_res, tcg_res);
         break;
     default:
     case 0x0: /* FMUL */
@@ -7103,11 +7103,11 @@  static void handle_fp_3src_single(DisasContext *s, bool o0, bool o1,
      * flipped if it is a negated-input.
      */
     if (o1 == true) {
-        gen_helper_vfp_negs(tcg_op3, tcg_op3);
+        gen_vfp_negs(tcg_op3, tcg_op3);
     }
 
     if (o0 != o1) {
-        gen_helper_vfp_negs(tcg_op1, tcg_op1);
+        gen_vfp_negs(tcg_op1, tcg_op1);
     }
 
     gen_helper_vfp_muladds(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
@@ -7135,11 +7135,11 @@  static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
      * flipped if it is a negated-input.
      */
     if (o1 == true) {
-        gen_helper_vfp_negd(tcg_op3, tcg_op3);
+        gen_vfp_negd(tcg_op3, tcg_op3);
     }
 
     if (o0 != o1) {
-        gen_helper_vfp_negd(tcg_op1, tcg_op1);
+        gen_vfp_negd(tcg_op1, tcg_op1);
     }
 
     gen_helper_vfp_muladdd(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
@@ -9240,7 +9240,7 @@  static void handle_3same_float(DisasContext *s, int size, int elements,
             switch (fpopcode) {
             case 0x39: /* FMLS */
                 /* As usual for ARM, separate negation for fused multiply-add */
-                gen_helper_vfp_negd(tcg_op1, tcg_op1);
+                gen_vfp_negd(tcg_op1, tcg_op1);
                 /* fall through */
             case 0x19: /* FMLA */
                 read_vec_element(s, tcg_res, rd, pass, MO_64);
@@ -9264,7 +9264,7 @@  static void handle_3same_float(DisasContext *s, int size, int elements,
                 break;
             case 0x7a: /* FABD */
                 gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
-                gen_helper_vfp_absd(tcg_res, tcg_res);
+                gen_vfp_absd(tcg_res, tcg_res);
                 break;
             case 0x7c: /* FCMGT */
                 gen_helper_neon_cgt_f64(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -9298,7 +9298,7 @@  static void handle_3same_float(DisasContext *s, int size, int elements,
             switch (fpopcode) {
             case 0x39: /* FMLS */
                 /* As usual for ARM, separate negation for fused multiply-add */
-                gen_helper_vfp_negs(tcg_op1, tcg_op1);
+                gen_vfp_negs(tcg_op1, tcg_op1);
                 /* fall through */
             case 0x19: /* FMLA */
                 read_vec_element_i32(s, tcg_res, rd, pass, MO_32);
@@ -9322,7 +9322,7 @@  static void handle_3same_float(DisasContext *s, int size, int elements,
                 break;
             case 0x7a: /* FABD */
                 gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
-                gen_helper_vfp_abss(tcg_res, tcg_res);
+                gen_vfp_abss(tcg_res, tcg_res);
                 break;
             case 0x7c: /* FCMGT */
                 gen_helper_neon_cgt_f32(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -9735,10 +9735,10 @@  static void handle_2misc_64(DisasContext *s, int opcode, bool u,
         }
         break;
     case 0x2f: /* FABS */
-        gen_helper_vfp_absd(tcg_rd, tcg_rn);
+        gen_vfp_absd(tcg_rd, tcg_rn);
         break;
     case 0x6f: /* FNEG */
-        gen_helper_vfp_negd(tcg_rd, tcg_rn);
+        gen_vfp_negd(tcg_rd, tcg_rn);
         break;
     case 0x7f: /* FSQRT */
         gen_helper_vfp_sqrtd(tcg_rd, tcg_rn, tcg_env);
@@ -12561,10 +12561,10 @@  static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     }
                     break;
                 case 0x2f: /* FABS */
-                    gen_helper_vfp_abss(tcg_res, tcg_op);
+                    gen_vfp_abss(tcg_res, tcg_op);
                     break;
                 case 0x6f: /* FNEG */
-                    gen_helper_vfp_negs(tcg_res, tcg_op);
+                    gen_vfp_negs(tcg_res, tcg_op);
                     break;
                 case 0x7f: /* FSQRT */
                     gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env);
@@ -13285,7 +13285,7 @@  static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             switch (16 * u + opcode) {
             case 0x05: /* FMLS */
                 /* As usual for ARM, separate negation for fused multiply-add */
-                gen_helper_vfp_negd(tcg_op, tcg_op);
+                gen_vfp_negd(tcg_op, tcg_op);
                 /* fall through */
             case 0x01: /* FMLA */
                 read_vec_element(s, tcg_res, rd, pass, MO_64);
diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index b9af03b7c3..ee53257687 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -1763,7 +1763,7 @@  static void gen_VMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_mulh(tmp, vn, vm, fpst);
-    gen_helper_vfp_negh(tmp, tmp);
+    gen_vfp_negh(tmp, tmp);
     gen_helper_vfp_addh(vd, vd, tmp, fpst);
 }
 
@@ -1781,7 +1781,7 @@  static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_muls(tmp, vn, vm, fpst);
-    gen_helper_vfp_negs(tmp, tmp);
+    gen_vfp_negs(tmp, tmp);
     gen_helper_vfp_adds(vd, vd, tmp, fpst);
 }
 
@@ -1799,7 +1799,7 @@  static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
     TCGv_i64 tmp = tcg_temp_new_i64();
 
     gen_helper_vfp_muld(tmp, vn, vm, fpst);
-    gen_helper_vfp_negd(tmp, tmp);
+    gen_vfp_negd(tmp, tmp);
     gen_helper_vfp_addd(vd, vd, tmp, fpst);
 }
 
@@ -1819,7 +1819,7 @@  static void gen_VNMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_mulh(tmp, vn, vm, fpst);
-    gen_helper_vfp_negh(vd, vd);
+    gen_vfp_negh(vd, vd);
     gen_helper_vfp_addh(vd, vd, tmp, fpst);
 }
 
@@ -1839,7 +1839,7 @@  static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_muls(tmp, vn, vm, fpst);
-    gen_helper_vfp_negs(vd, vd);
+    gen_vfp_negs(vd, vd);
     gen_helper_vfp_adds(vd, vd, tmp, fpst);
 }
 
@@ -1859,7 +1859,7 @@  static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
     TCGv_i64 tmp = tcg_temp_new_i64();
 
     gen_helper_vfp_muld(tmp, vn, vm, fpst);
-    gen_helper_vfp_negd(vd, vd);
+    gen_vfp_negd(vd, vd);
     gen_helper_vfp_addd(vd, vd, tmp, fpst);
 }
 
@@ -1874,8 +1874,8 @@  static void gen_VNMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_mulh(tmp, vn, vm, fpst);
-    gen_helper_vfp_negh(tmp, tmp);
-    gen_helper_vfp_negh(vd, vd);
+    gen_vfp_negh(tmp, tmp);
+    gen_vfp_negh(vd, vd);
     gen_helper_vfp_addh(vd, vd, tmp, fpst);
 }
 
@@ -1890,8 +1890,8 @@  static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
     TCGv_i32 tmp = tcg_temp_new_i32();
 
     gen_helper_vfp_muls(tmp, vn, vm, fpst);
-    gen_helper_vfp_negs(tmp, tmp);
-    gen_helper_vfp_negs(vd, vd);
+    gen_vfp_negs(tmp, tmp);
+    gen_vfp_negs(vd, vd);
     gen_helper_vfp_adds(vd, vd, tmp, fpst);
 }
 
@@ -1906,8 +1906,8 @@  static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
     TCGv_i64 tmp = tcg_temp_new_i64();
 
     gen_helper_vfp_muld(tmp, vn, vm, fpst);
-    gen_helper_vfp_negd(tmp, tmp);
-    gen_helper_vfp_negd(vd, vd);
+    gen_vfp_negd(tmp, tmp);
+    gen_vfp_negd(vd, vd);
     gen_helper_vfp_addd(vd, vd, tmp, fpst);
 }
 
@@ -1935,7 +1935,7 @@  static void gen_VNMUL_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* VNMUL: -(fn * fm) */
     gen_helper_vfp_mulh(vd, vn, vm, fpst);
-    gen_helper_vfp_negh(vd, vd);
+    gen_vfp_negh(vd, vd);
 }
 
 static bool trans_VNMUL_hp(DisasContext *s, arg_VNMUL_sp *a)
@@ -1947,7 +1947,7 @@  static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
 {
     /* VNMUL: -(fn * fm) */
     gen_helper_vfp_muls(vd, vn, vm, fpst);
-    gen_helper_vfp_negs(vd, vd);
+    gen_vfp_negs(vd, vd);
 }
 
 static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
@@ -1959,7 +1959,7 @@  static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
 {
     /* VNMUL: -(fn * fm) */
     gen_helper_vfp_muld(vd, vn, vm, fpst);
-    gen_helper_vfp_negd(vd, vd);
+    gen_vfp_negd(vd, vd);
 }
 
 static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_dp *a)
@@ -2110,12 +2110,12 @@  static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
-        gen_helper_vfp_negh(vn, vn);
+        gen_vfp_negh(vn, vn);
     }
     vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
-        gen_helper_vfp_negh(vd, vd);
+        gen_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
@@ -2169,12 +2169,12 @@  static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
-        gen_helper_vfp_negs(vn, vn);
+        gen_vfp_negs(vn, vn);
     }
     vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
-        gen_helper_vfp_negs(vd, vd);
+        gen_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
@@ -2234,12 +2234,12 @@  static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
-        gen_helper_vfp_negd(vn, vn);
+        gen_vfp_negd(vn, vn);
     }
     vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
-        gen_helper_vfp_negd(vd, vd);
+        gen_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
@@ -2409,13 +2409,13 @@  static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 DO_VFP_VMOV(VMOV_reg, sp, tcg_gen_mov_i32)
 DO_VFP_VMOV(VMOV_reg, dp, tcg_gen_mov_i64)
 
-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
+DO_VFP_2OP(VABS, hp, gen_vfp_absh, aa32_fp16_arith)
+DO_VFP_2OP(VABS, sp, gen_vfp_abss, aa32_fpsp_v2)
+DO_VFP_2OP(VABS, dp, gen_vfp_absd, aa32_fpdp_v2)
 
-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
+DO_VFP_2OP(VNEG, hp, gen_vfp_negh, aa32_fp16_arith)
+DO_VFP_2OP(VNEG, sp, gen_vfp_negs, aa32_fpsp_v2)
+DO_VFP_2OP(VNEG, dp, gen_vfp_negd, aa32_fpdp_v2)
 
 static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 3e5e37abbe..ce26b8a71a 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -281,36 +281,6 @@  VFP_BINOP(minnum)
 VFP_BINOP(maxnum)
 #undef VFP_BINOP
 
-dh_ctype_f16 VFP_HELPER(neg, h)(dh_ctype_f16 a)
-{
-    return float16_chs(a);
-}
-
-float32 VFP_HELPER(neg, s)(float32 a)
-{
-    return float32_chs(a);
-}
-
-float64 VFP_HELPER(neg, d)(float64 a)
-{
-    return float64_chs(a);
-}
-
-dh_ctype_f16 VFP_HELPER(abs, h)(dh_ctype_f16 a)
-{
-    return float16_abs(a);
-}
-
-float32 VFP_HELPER(abs, s)(float32 a)
-{
-    return float32_abs(a);
-}
-
-float64 VFP_HELPER(abs, d)(float64 a)
-{
-    return float64_abs(a);
-}
-
 dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, CPUARMState *env)
 {
     return float16_sqrt(a, &env->vfp.fp_status_f16);

[15/57] target/arm: Expand vfp neg and abs inline

Commit Message

Comments

Patch