From patchwork Fri Mar 14 18:38:13 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 26294 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-pd0-f200.google.com (mail-pd0-f200.google.com [209.85.192.200]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id CF301202DD for ; Fri, 14 Mar 2014 18:38:27 +0000 (UTC) Received: by mail-pd0-f200.google.com with SMTP id p10sf6458956pdj.3 for ; Fri, 14 Mar 2014 11:38:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:x-original-sender :x-original-authentication-results:precedence:mailing-list:list-id :list-post:list-help:list-archive:list-unsubscribe:content-type :content-transfer-encoding; bh=r8V+VKeKTeUiJJb28UG4RpcQDuwv3/YNWgGb3Owy//A=; b=GR94f/QYxiIXHibH64u3kvvKqge1N/3uaoJnt2n5PhipGVw/PC+ek4wAz3ZpifaRum v3A1+U97z3ZjaKV8OFlVZflGQ2/hMuN2Y1A0cL4QOzqAZ/CuiFvae35SbRUG5iGZbYMw xyk19kLrWt8XO7A/raP9/Pn+cng9wxnQTJJ5YJYYE1y3peISUn4SrKj5+wQkMZUcjxCV K0P6x94eADC1l8YiabLxTBWZKXhCaPdjrZKWTDsN/U3JV93H5d2p51Zeg85esyq2ZWwM vjGFjyefOgR002Uv1YB5bVIC2pCWZt5WqaNqeOWVzFU0QIt/+HEmKha6lNx4hF6LNOLb E9BQ== X-Gm-Message-State: ALoCoQmeekG1ak6ic8UIzJTLjEmnTPJM/9lrPGexsfH2ZjMaVV92P16cO9v+6jWhLwU7k/CIeWbZ X-Received: by 10.66.146.65 with SMTP id ta1mr3776780pab.19.1394822307086; Fri, 14 Mar 2014 11:38:27 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.19.81 with SMTP id 75ls102897qgg.75.gmail; Fri, 14 Mar 2014 11:38:27 -0700 (PDT) X-Received: by 10.221.61.210 with SMTP id wx18mr2090592vcb.27.1394822306991; Fri, 14 Mar 2014 11:38:26 -0700 (PDT) Received: from mail-vc0-f178.google.com (mail-vc0-f178.google.com [209.85.220.178]) by mx.google.com with ESMTPS id tr5si2411504vdc.65.2014.03.14.11.38.26 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 14 Mar 2014 11:38:26 -0700 (PDT) Received-SPF: neutral (google.com: 209.85.220.178 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.220.178; Received: by mail-vc0-f178.google.com with SMTP id im17so3286000vcb.9 for ; Fri, 14 Mar 2014 11:38:26 -0700 (PDT) X-Received: by 10.58.207.74 with SMTP id lu10mr7407419vec.15.1394822306897; Fri, 14 Mar 2014 11:38:26 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patches@linaro.org Received: by 10.220.78.9 with SMTP id i9csp43452vck; Fri, 14 Mar 2014 11:38:26 -0700 (PDT) X-Received: by 10.204.169.81 with SMTP id x17mr3938bky.132.1394822302914; Fri, 14 Mar 2014 11:38:22 -0700 (PDT) Received: from mnementh.archaic.org.uk (mnementh.archaic.org.uk. [2001:8b0:1d0::1]) by mx.google.com with ESMTPS id yy2si2951499bkb.174.2014.03.14.11.38.21 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 14 Mar 2014 11:38:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of pm215@archaic.org.uk designates 2001:8b0:1d0::1 as permitted sender) client-ip=2001:8b0:1d0::1; Received: from pm215 by mnementh.archaic.org.uk with local (Exim 4.80) (envelope-from ) id 1WOWzn-0003tK-Qu; Fri, 14 Mar 2014 18:38:15 +0000 From: Peter Maydell To: qemu-devel@nongnu.org Cc: patches@linaro.org, Alexander Graf , Michael Matz , Dirk Mueller , Laurent Desnogues , kvmarm@lists.cs.columbia.edu, Richard Henderson , =?UTF-8?q?Alex=20Benn=C3=A9e?= , Christoffer Dall , Will Newton , Peter Crosthwaite Subject: [PATCH v2 24/25] target-arm: A64: Add [UF]RSQRTE (reciprocal root estimate) Date: Fri, 14 Mar 2014 18:38:13 +0000 Message-Id: <1394822294-14837-25-git-send-email-peter.maydell@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1394822294-14837-1-git-send-email-peter.maydell@linaro.org> References: <1394822294-14837-1-git-send-email-peter.maydell@linaro.org> MIME-Version: 1.0 X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: peter.maydell@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.220.178 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org List-ID: X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , From: Alex Bennée This adds support for [UF]RSQRTE instructions. It utilises the existing NEON helpers with some changes. The changes include an explicit passing of fpstatus (so the correct one is used between arm32 and aarch64), denormilzation, more correct error handling and also proper scaling of the fraction going into the estimate. Signed-off-by: Alex Bennée --- target-arm/helper.c | 133 ++++++++++++++++++++++++++++++++++++--------- target-arm/helper.h | 5 +- target-arm/translate-a64.c | 27 +++++++-- target-arm/translate.c | 12 +++- 4 files changed, 140 insertions(+), 37 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 9059dea..64d982a 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -4720,12 +4720,12 @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp) /* The algorithm that must be used to calculate the estimate * is specified by the ARM ARM. */ -static float64 recip_sqrt_estimate(float64 a, CPUARMState *env) +static float64 recip_sqrt_estimate(float64 a, float_status *real_fp_status) { /* These calculations mustn't set any fp exception flags, * so we use a local copy of the fp_status. */ - float_status dummy_status = env->vfp.standard_fp_status; + float_status dummy_status = *real_fp_status; float_status *s = &dummy_status; float64 q; int64_t q_int; @@ -4772,49 +4772,64 @@ static float64 recip_sqrt_estimate(float64 a, CPUARMState *env) return float64_div(int64_to_float64(q_int, s), float64_256, s); } -float32 HELPER(rsqrte_f32)(float32 a, CPUARMState *env) +float32 HELPER(rsqrte_f32)(float32 input, void *fpstp) { - float_status *s = &env->vfp.standard_fp_status; + float_status *s = fpstp; + float32 f32 = float32_squash_input_denormal(input, s); + uint32_t val = float32_val(f32); + uint32_t f32_sbit = 0x80000000 & val; + int32_t f32_exp = extract32(val, 23, 8); + uint32_t f32_frac = extract32(val, 0, 23); + uint64_t f64_frac; + uint64_t val64; int result_exp; float64 f64; - uint32_t val; - uint64_t val64; - val = float32_val(a); - - if (float32_is_any_nan(a)) { - if (float32_is_signaling_nan(a)) { + if (float32_is_any_nan(f32)) { + float32 nan = f32; + if (float32_is_signaling_nan(f32)) { float_raise(float_flag_invalid, s); + nan = float32_maybe_silence_nan(f32); } - return float32_default_nan; - } else if (float32_is_zero_or_denormal(a)) { - if (!float32_is_zero(a)) { - float_raise(float_flag_input_denormal, s); + if (s->default_nan_mode) { + nan = float32_default_nan; } + return nan; + } else if (float32_is_zero(f32)) { float_raise(float_flag_divbyzero, s); - return float32_set_sign(float32_infinity, float32_is_neg(a)); - } else if (float32_is_neg(a)) { + return float32_set_sign(float32_infinity, float32_is_neg(f32)); + } else if (float32_is_neg(f32)) { float_raise(float_flag_invalid, s); return float32_default_nan; - } else if (float32_is_infinity(a)) { + } else if (float32_is_infinity(f32)) { return float32_zero; } - /* Normalize to a double-precision value between 0.25 and 1.0, + /* Scale and normalize to a double-precision value between 0.25 and 1.0, * preserving the parity of the exponent. */ - if ((val & 0x800000) == 0) { - f64 = make_float64(((uint64_t)(val & 0x80000000) << 32) + + f64_frac = ((uint64_t) f32_frac) << 29; + if (f32_exp == 0) { + while (extract64(f64_frac, 51, 1) == 0) { + f64_frac = f64_frac << 1; + f32_exp = f32_exp-1; + } + f64_frac = extract64(f64_frac, 0, 51) << 1; + } + + if (extract64(f32_exp, 0, 1) == 0) { + f64 = make_float64(((uint64_t) f32_sbit) << 32 | (0x3feULL << 52) - | ((uint64_t)(val & 0x7fffff) << 29)); + | f64_frac); } else { - f64 = make_float64(((uint64_t)(val & 0x80000000) << 32) + f64 = make_float64(((uint64_t) f32_sbit) << 32 | (0x3fdULL << 52) - | ((uint64_t)(val & 0x7fffff) << 29)); + | f64_frac); } - result_exp = (380 - ((val & 0x7f800000) >> 23)) / 2; + result_exp = (380 - f32_exp) / 2; - f64 = recip_sqrt_estimate(f64, env); + f64 = recip_sqrt_estimate(f64, s); val64 = float64_val(f64); @@ -4823,6 +4838,69 @@ float32 HELPER(rsqrte_f32)(float32 a, CPUARMState *env) return make_float32(val); } +float64 HELPER(rsqrte_f64)(float64 input, void *fpstp) +{ + float_status *s = fpstp; + float64 f64 = float64_squash_input_denormal(input, s); + uint64_t val = float64_val(f64); + uint64_t f64_sbit = 0x8000000000000000ULL & val; + int64_t f64_exp = extract64(val, 52, 11); + uint64_t f64_frac = extract64(val, 0, 52); + int64_t result_exp; + uint64_t result_frac; + + if (float64_is_any_nan(f64)) { + float64 nan = f64; + if (float64_is_signaling_nan(f64)) { + float_raise(float_flag_invalid, s); + nan = float64_maybe_silence_nan(f64); + } + if (s->default_nan_mode) { + nan = float64_default_nan; + } + return nan; + } else if (float64_is_zero(f64)) { + float_raise(float_flag_divbyzero, s); + return float64_set_sign(float64_infinity, float64_is_neg(f64)); + } else if (float64_is_neg(f64)) { + float_raise(float_flag_invalid, s); + return float64_default_nan; + } else if (float64_is_infinity(f64)) { + return float64_zero; + } + + /* Scale and normalize to a double-precision value between 0.25 and 1.0, + * preserving the parity of the exponent. */ + + if (f64_exp == 0) { + while (extract64(f64_frac, 51, 1) == 0) { + f64_frac = f64_frac << 1; + f64_exp = f64_exp - 1; + } + f64_frac = extract64(f64_frac, 0, 51) << 1; + } + + if (extract64(f64_exp, 0, 1) == 0) { + f64 = make_float64(f64_sbit + | (0x3feULL << 52) + | f64_frac); + } else { + f64 = make_float64(f64_sbit + | (0x3fdULL << 52) + | f64_frac); + } + + result_exp = (3068 - f64_exp) / 2; + + f64 = recip_sqrt_estimate(f64, s); + + result_frac = extract64(float64_val(f64), 0, 52); + + return make_float64(f64_sbit | + ((result_exp & 0x7ff) << 52) | + result_frac); +} + uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp) { float_status *s = fpstp; @@ -4840,8 +4918,9 @@ uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp) return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff); } -uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUARMState *env) +uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp) { + float_status *fpst = fpstp; float64 f64; if ((a & 0xc0000000) == 0) { @@ -4856,7 +4935,7 @@ uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUARMState *env) | ((uint64_t)(a & 0x3fffffff) << 22)); } - f64 = recip_sqrt_estimate(f64, env); + f64 = recip_sqrt_estimate(f64, fpst); return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff); } diff --git a/target-arm/helper.h b/target-arm/helper.h index f96a824..a3d6f32 100644 --- a/target-arm/helper.h +++ b/target-arm/helper.h @@ -169,9 +169,10 @@ DEF_HELPER_3(recps_f32, f32, f32, f32, env) DEF_HELPER_3(rsqrts_f32, f32, f32, f32, env) DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr) DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr) -DEF_HELPER_2(rsqrte_f32, f32, f32, env) +DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, ptr) +DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, ptr) DEF_HELPER_2(recpe_u32, i32, i32, ptr) -DEF_HELPER_2(rsqrte_u32, i32, i32, env) +DEF_HELPER_FLAGS_2(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32, ptr) DEF_HELPER_5(neon_tbl, i32, env, i32, i32, i32, i32) DEF_HELPER_3(shl_cc, i32, env, i32, i32) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 235f880..befffac 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -7146,6 +7146,9 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, case 0x3f: /* FRECPX */ gen_helper_frecpx_f64(tcg_res, tcg_op, fpst); break; + case 0x7d: /* FRSQRTE */ + gen_helper_rsqrte_f64(tcg_res, tcg_op, fpst); + break; default: g_assert_not_reached(); } @@ -7181,6 +7184,9 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, case 0x3f: /* FRECPX */ gen_helper_frecpx_f32(tcg_res, tcg_op, fpst); break; + case 0x7d: /* FRSQRTE */ + gen_helper_rsqrte_f32(tcg_res, tcg_op, fpst); + break; default: g_assert_not_reached(); } @@ -7378,6 +7384,7 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ + case 0x7d: /* FRSQRTE */ handle_2misc_reciprocal(s, opcode, true, u, true, size, rn, rd); return; case 0x1a: /* FCVTNS */ @@ -7404,9 +7411,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_narrow(s, true, opcode, u, false, size - 1, rn, rd); return; - case 0x7d: /* FRSQRTE */ - unsupported_encoding(s, insn); - return; default: unallocated_encoding(s); return; @@ -9255,6 +9259,11 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } /* fall through */ case 0x3d: /* FRECPE */ + case 0x7d: /* FRSQRTE */ + if (size == 3 && !is_q) { + unallocated_encoding(s); + return; + } handle_2misc_reciprocal(s, opcode, false, u, is_q, size, rn, rd); return; case 0x56: /* FCVTXN, FCVTXN2 */ @@ -9297,9 +9306,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } break; case 0x7c: /* URSQRTE */ - case 0x7d: /* FRSQRTE */ - unsupported_encoding(s, insn); - return; + if (size == 3) { + unallocated_encoding(s); + return; + } + need_fpstatus = true; + break; default: unallocated_encoding(s); return; @@ -9432,6 +9444,9 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x59: /* FRINTX */ gen_helper_rints_exact(tcg_res, tcg_op, tcg_fpstatus); break; + case 0x7c: /* URSQRTE */ + gen_helper_rsqrte_u32(tcg_res, tcg_op, tcg_fpstatus); + break; default: g_assert_not_reached(); } diff --git a/target-arm/translate.c b/target-arm/translate.c index 3771953..56e3b4b 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -6689,8 +6689,12 @@ static int disas_neon_data_insn(CPUARMState * env, DisasContext *s, uint32_t ins break; } case NEON_2RM_VRSQRTE: - gen_helper_rsqrte_u32(tmp, tmp, cpu_env); + { + TCGv_ptr fpstatus = get_fpstatus_ptr(1); + gen_helper_rsqrte_u32(tmp, tmp, fpstatus); + tcg_temp_free_ptr(fpstatus); break; + } case NEON_2RM_VRECPE_F: { TCGv_ptr fpstatus = get_fpstatus_ptr(1); @@ -6699,8 +6703,12 @@ static int disas_neon_data_insn(CPUARMState * env, DisasContext *s, uint32_t ins break; } case NEON_2RM_VRSQRTE_F: - gen_helper_rsqrte_f32(cpu_F0s, cpu_F0s, cpu_env); + { + TCGv_ptr fpstatus = get_fpstatus_ptr(1); + gen_helper_rsqrte_f32(cpu_F0s, cpu_F0s, fpstatus); + tcg_temp_free_ptr(fpstatus); break; + } case NEON_2RM_VCVT_FS: /* VCVT.F32.S32 */ gen_vfp_sito(0, 1); break;